Right, so technically for me an agent is the node that connects to the satellite (agent).
Yes, this is confusing, as the terms "agent" and "agent" mean two different things and both are right; there is also an agent on the master node, but we just call it the master.
So I like to keep it simple
master
satellite (optional)
agent (the server you want to monitor)
Anyhow, that still does not solve your initial question, I suppose!
back to post 1:
Yes, the master receives the results after it tells the satellite the configuration. The satellite will then ask the agent to do the check.
zone files, configuration and logs mostly…
By putting them in the proper folder on the master, /zones.d/zone_name, they will go to the satellite, which passes them on to the agent. The agent needs to allow the satellite's incoming connection in its configuration for this to work.
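For illustration, on the agent side that usually comes down to the api feature accepting config and commands from its parent node. A minimal sketch (typically in /etc/icinga2/features-available/api.conf; exact attributes vary by version, treat this as an example):

object ApiListener "api" {
  accept_config = true     // let the parent (satellite) push zone config to this node
  accept_commands = true   // let the parent schedule checks on this node
}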
This is excellent information. I really appreciate you sticking with me on this and helping clear things up. I am starting to gain traction on all of this.
I have gone through the configuration both on the command line and through director and spotted a few mistakes I made. I have corrected them but I am still unable to get a successful ping/ssh/http/winrm check on a host that only the satellite should be able to ping/ssh/http/winrm. I have confirmed that I can ping the host in question from the satellite too. If I can get this demo host working then I can take everything I have learned and start applying it everywhere else.
I did everything through Director, but I have been using the preview tab to see where it saves the files and then inspecting them on the command line to see what is going on and to learn what it does, etc.
So it would appear that these two test agents are being checked by the correct satellite which is good news. Assuming that my hosts are setup correctly, then that means I setup my services wrong?
Other than check-ping being null I am not seeing anything that stands out?
Right, so if you look at your screenshot, you will see ping being "not connected".
That is not typically the error message you get from a ping service. It indicates it is trying to connect to the Icinga process on the agent WDC-BOUNCE-01 and have it ping itself, which is in 100% of cases not useful.
You want the first piece of Icinga infrastructure above the agent to ping the agent (satellite or master).
So it being null in your config is definitely a problem here (probably the same for winrm/ssh; not very useful if the agent tries to check itself!).
It's kinda like having a doctor as a parent and still googling your symptoms yourself and saying "I am fine".
So your response brings us back full circle to my original desire of doing monitoring without having to go around and install the icinga agent on all of my hosts. I thought that the Satellites in the zones of the agents would be able to do the basic ssh/ping/http/winrm checks?
Hi! No TG here in the EU, but thanks for asking. Yourself?
I love a good circle
Right, so you could let your satellite do all the heavy lifting. Whether that is a good thing to do?
Depends.
In any case, you are looking for the command_endpoint setting; there is one in your host template that you can set to your satellite, and that should give you the behaviour you want.
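For illustration, such a host template could look something like this (the template and endpoint names here are examples, not taken from your setup):

template Host "checked-by-satellite" {
  check_command = "hostalive"
  command_endpoint = "my-satellite-endpoint"   // the satellite runs the check, not the host itself
}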
So I am glad that you mentioned command_endpoint as I told my host template to use the Satellite but was unsure if it was the correct thing to do. Now I know it was which is good.
I went to the Satellite and confirmed that the password in /etc/icinga2/conf.d/api-users.conf was correct
/**
* The APIUser objects are used for authentication against the API.
*/
object ApiUser "WDC" {
password = "SuperSecretPassword123!"
// client_cn = ""
permissions = [ "*" ]
}
I then took that password and verified on master that the API user had that password.
So I tried to set this manually but after restarting the icinga service using systemctl restart icinga, director changed the file back. I guess this is to be expected and actually a good thing in a way. I poked around the WebUI but could not find any way to manually change this.
I am assuming that you are referring to /etc/icinga2/conf.d/hosts.conf. If so, I checked in there and I do not have vars.agent_endpoint = name set in that file. I will go ahead and add it.
object Host NodeName {
/* Import the default host template defined in `templates.conf`. */
import "generic-host"
/* Specify the address attributes for checks e.g. `ssh` or `http`. */
address = "127.0.0.1"
address6 = "::1"
/* Set custom attribute `os` for hostgroup assignment in `groups.conf`. */
vars.os = "Linux"
/* Define http vhost attributes for service apply rules in `services.conf`. */
vars.http_vhosts["http"] = {
http_uri = "/"
}
/* Uncomment if you've sucessfully installed Icinga Web 2. */
//vars.http_vhosts["Icinga Web 2"] = {
// http_uri = "/icingaweb2"
//}
/* Define disks and attributes for service apply rules in `services.conf`. */
vars.disks["disk"] = {
/* No parameters. */
}
vars.disks["disk /"] = {
disk_partitions = "/"
}
/* Define notification mail attributes for notification apply rules in `notifications.conf`. */
vars.notification["mail"] = {
/* The UserGroup `icingaadmins` is defined in `users.conf`. */
groups = [ "icingaadmins" ]
}
/* For use with our service template */
vars.agent_endpoint = name
}
(This change persisted through a restart, probably because it is not in the "director" folder, i.e. Director doesn't control it.)
Did you mean the hosts config on your master, and if not, which configuration are you referring to?
Although as I type this, I am realizing that what I said above about Director changing the file back when Icinga is restarted will probably also happen here. So no manual changes can be made to the files, which is a bummer.
Regardless, after making the suggested change of setting run on agent to no on the service templates my service_templates.conf now looks like this
I feel like we are really close to figuring this out. Each post gets me further and teaches me something new so again, thank you for all your help. I created a simulated multi-site setup on my homelab using firewall rules and vlans so that I can work through all of this from scratch and document the whole process. If you are alright with it, I plan on giving a special mention / shoutout to you at the bottom of each part of the guide so that if anyone ever reads my guide, they will know the true source of my knowledge .
That is great stuff. If you change the template to yes, this will change it for the specific service of your choosing. It should monitor your service on the specific host, and that should be the end result you are looking for?
It is really weird that it is working for two of them but not the other two. I wonder if it is because those two checks are not being used, since WDC-Bounce-01 is a Windows Server. Let me add a Linux server and see what happens.
Yup!
Now I am getting the same error for the SSH and HTTP service checks that I was for the winrm and ping.
information/cli: Icinga application loader (version: r2.8.1-1)
information/cli: Loading configuration file(s).
information/ConfigItem: Committing config item(s).
information/ApiListener: My API identity: AWS-NOC-02
critical/config: Error: Validation failed for object 'WDC-UTIL-01!check-http' of type 'Service'; Attribute 'command_endpoint': Object 'WDC-UTIL-01' of type 'Endpoint' does not exist.
Location: in [stage]/zones.d/director-global/service_templates.conf: 8:5-8:32
[stage]/zones.d/director-global/service_templates.conf(6): template Service "check-http" {
[stage]/zones.d/director-global/service_templates.conf(7): check_command = "http"
[stage]/zones.d/director-global/service_templates.conf(8): command_endpoint = host_name
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[stage]/zones.d/director-global/service_templates.conf(9): }
[stage]/zones.d/director-global/service_templates.conf(10):
critical/config: Error: Validation failed for object 'WDC-UTIL-01!check-ssh' of type 'Service'; Attribute 'command_endpoint': Object 'WDC-UTIL-01' of type 'Endpoint' does not exist.
Location: in [stage]/zones.d/director-global/service_templates.conf: 3:5-3:32
[stage]/zones.d/director-global/service_templates.conf(1): template Service "check-ssh" {
[stage]/zones.d/director-global/service_templates.conf(2): check_command = "ssh"
[stage]/zones.d/director-global/service_templates.conf(3): command_endpoint = host_name
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[stage]/zones.d/director-global/service_templates.conf(4): }
[stage]/zones.d/director-global/service_templates.conf(5):
critical/config: 2 errors
(I am thinking out loud at this point, trying to enumerate through everything and check each little thing in an effort to debug this.)
Endpoints are in Director -> infrastructure
Alright I just noticed something interesting. If I go to the dashboard and click on the host then it will show its check source as being the Satellite.
Let's take a step back and look at all the configuration made at this point. I have only ever used Director and nothing else, so all the conf files should be under the /var/lib/icinga2/api/zones directory.
root@AWS-NOC-02:/var/lib/icinga2/api/zones# ls AWS-NOC-02/director/
endpoints.conf hosts.conf zones.conf
root@AWS-NOC-02:/var/lib/icinga2/api/zones# cat AWS-NOC-02/director/endpoints.conf
object Endpoint "WDC-NOCSat-01" {
host = "10.10.10.20"
}
root@AWS-NOC-02:/var/lib/icinga2/api/zones# cat AWS-NOC-02/director/hosts.conf
object Host "WDC-UTIL-01" {
import "basic-linux"
display_name = "WDC Linux Server"
address = "10.10.10.30"
}
root@AWS-NOC-02:/var/lib/icinga2/api/zones# cat AWS-NOC-02/director/zones.conf
object Zone "WDC" {
parent = "AWS-NOC-02"
endpoints = [ "WDC-NOCSat-01" ]
}
root@AWS-NOC-02:/var/lib/icinga2/api/zones# ls WDC/director/
hosts.conf
root@AWS-NOC-02:/var/lib/icinga2/api/zones# cat WDC/director/hosts.conf
object Host "WDC-NOCSat-01" {
import "WDC-NOCSat-Template"
display_name = "WDC Icinga Satellite"
address = "10.10.10.20"
}
object Host "WDC112SW01" {
import "wdc-switches"
display_name = "Meraki Switch Test"
address = "10.10.40.80"
}
object Host "WDC-BOUNCE-01" {
import "wdc-basic-windows"
display_name = "WDC-BOUNCE-01"
address = "10.10.10.50"
}
root@AWS-NOC-02:/var/lib/icinga2/api/zones# ls director-global/director/
001-director-basics.conf endpoint_templates.conf host_templates.conf service_templates.conf servicesets.conf
root@AWS-NOC-02:/var/lib/icinga2/api/zones# cat director-global/director/001-director-basics.conf
const DirectorStageDir = dirname(dirname(current_filename))
globals.directorWarnedOnceForThresholds = false;
globals.directorWarnOnceForThresholds = function() {
if (globals.directorWarnedOnceForThresholds == false) {
globals.directorWarnedOnceForThresholds = true
log(LogWarning, "config", "Director: flapping_threshold_high/low is not supported in this Icinga 2 version!")
}
}
const DirectorOverrideTemplate = "host var overrides (Director)"
if (! globals.contains(DirectorOverrideTemplate)) {
const DirectorOverrideVars = "_override_servicevars"
globals.directorWarnedOnceForServiceWithoutHost = false;
globals.directorWarnOnceForServiceWithoutHost = function() {
if (globals.directorWarnedOnceForServiceWithoutHost == false) {
globals.directorWarnedOnceForServiceWithoutHost = true
log(
LogWarning,
"config",
"Director: Custom Variable Overrides will not work in this Icinga 2 version. See Director issue #1579"
)
}
}
template Service DirectorOverrideTemplate {
/**
* Seems that host is missing when used in a service object, works fine for
* apply rules
*/
if (! host) {
var host = get_host(host_name)
}
if (! host) {
globals.directorWarnOnceForServiceWithoutHost()
}
if (vars) {
vars += host.vars[DirectorOverrideVars][name]
} else {
vars = host.vars[DirectorOverrideVars][name]
}
}
}
root@AWS-NOC-02:/var/lib/icinga2/api/zones# cat director-global/director/endpoint_templates.conf
template Endpoint "WDC Endpoint Tester" {
host = "10.10.10.20"
port = "5665"
}
root@AWS-NOC-02:/var/lib/icinga2/api/zones# cat director-global/director/host_templates.conf
template Host "basic-linux" {
check_command = "hostalive"
command_endpoint = "WDC-NOCSat-01"
}
template Host "WDC-NOCSat-Template" {
check_command = "hostalive"
command_endpoint = "WDC-NOCSat-01"
notes = "Dummy template for the WDC Icinga Satellite"
}
template Host "wdc-basic-windows" {
check_command = "hostalive"
command_endpoint = "WDC-NOCSat-01"
}
template Host "wdc-switches" {
check_command = "hostalive"
command_endpoint = "WDC-NOCSat-01"
}
root@AWS-NOC-02:/var/lib/icinga2/api/zones# cat director-global/director/service_templates.conf
template Service "check-ssh" {
check_command = "ssh"
command_endpoint = null
}
template Service "check-http" {
check_command = "http"
command_endpoint = null
}
template Service "check-ping" {
check_command = "hostalive"
command_endpoint = null
}
template Service "check-winrm " {
check_command = "tcp"
command_endpoint = null
vars.tcp_port = "5985"
}
root@AWS-NOC-02:/var/lib/icinga2/api/zones# cat director-global/director/servicesets.conf
/**
* Service Set: basic-linux-services
*
* HTTP/PING/SSH
*/
/**
* Service Set: basic-linux-services
* on host basic-linux
*/
apply Service "check-http" {
import "check-http"
assign where "basic-linux" in host.templates
import DirectorOverrideTemplate
}
apply Service "check-ssh" {
import "check-ssh"
assign where "basic-linux" in host.templates
import DirectorOverrideTemplate
}
apply Service "check-ping" {
import "check-ping"
assign where "basic-linux" in host.templates
import DirectorOverrideTemplate
}
/**
* Service Set: basic-windows-services
*
* ping and winrm
*/
/**
* Service Set: basic-ping
*
* ping check
*/
/**
* Service Set: basic-windows-services
* on host wdc-basic-windows
*/
apply Service "check-ping" {
import "check-ping"
assign where "wdc-basic-windows" in host.templates
import DirectorOverrideTemplate
}
apply Service "check-winrm " {
import "check-winrm "
assign where "wdc-basic-windows" in host.templates
import DirectorOverrideTemplate
}
Looking through this, I am not seeing any issues. Are you?
Ideally they look something like this, UNLESS the service needs to run somewhere else:
template Service "generic-service" {
max_check_attempts = 5
check_interval = 5m
retry_interval = 30s
command_endpoint = host.vars.agent_endpoint
// this points to "object Host" -> "vars.agent_endpoint = name"
}
Your satellite's hostalive check pings itself:
template Host "WDC-NOCSat-Template" {
check_command = "hostalive"
command_endpoint = "WDC-NOCSat-01"
notes = "Dummy template for the WDC Icinga Satellite"
}
Ideally:
template Host "WDC-NOCSat-Template" {
check_command = "hostalive"
command_endpoint = "AW-NOC-02"
// IF this is indeed your master?
notes = "Dummy template for the WDC Icinga Satellite"
}
Your check-ping is defined twice and then applied twice, once for Windows and once for Linux:
apply Service "check-ping" {
import "check-ping"
assign where "basic-linux" in host.templates
and
apply Service "check-ping" {
import "check-ping"
assign where "wdc-basic-windows" in host.templates
Now, this might be a Director thing, but ideally you keep assigns to a minimum, like this:
apply Service "check-ping" {
import "check-ping"
assign where host.address
This reduces the load on your master when reloading the config.
So there is still a lot to fix in your config, but that is alright. I hope it helps you fix a few things.
Thank you so much for looking through that. I ended up making a lot of changes to help my sanity. I ended up reinstalling the Satellite due to it just being completely out of whack. There is so much that I have learned since I first installed it and this allowed me to check each step to ensure everything was correct. This also fixed the issue of the Master and Satellite not running the same version of Icinga2.
Deleted the check-ping service template, since check_command = "hostalive" already does a ping check
Ensured all firewall, acl, ufw, security groups were properly configured
Followed the documentation to setup the new Satellite
Used curl -k -s -u WDC:$PASS https://WDC-NOCSat-02.domain.corp:5665/v1 on Master to confirm that it could properly talk to the Satellite over the API.
<html><head><title>Icinga 2</title></head><h1>Hello from Icinga 2 (Version: r2.8.1-1)!</h1><p>You are authenticated as <b>WDC</b>. Your user has the following permissions:</p> <ul><li>*</li></ul><p>More information about API requests is available in the <a href="https://docs.icinga.com/icinga2/latest" target="_blank">documentation</a>.</p></html>
Fixed all the host templates so that they only have check_command = "hostalive"
Fixed my Satellite host configuration so that its command_endpoint is now the Master
I understand that each host should have a parent. This does make sense. I just didn’t realize that they would need their own Zone? If I have 100 hosts, then each of them will need their own Zone?
I am not seeing any way to get command_endpoint = host.vars.agent_endpoint into the configuration with Director UI. I can’t change it manually because Director will just change it back.
I feel that showing you the configuration files was very helpful, so I'll leave them here at the bottom again for you.
I will check it out tomorrow morning. In the meantime, could you specify where you would like more help? That will help me enormously. Sort of a morning puzzle to wake up the brain, haha.
I see 3 problems in your first screenshot. Are those the problems you are trying to solve at this point in time?
No rush at all (I am about to head home for the day)! I appreciate all the time you have spent helping.
My current goal is the same one I started with. It's been a fun puzzle.
I am just looking to setup distributed monitoring. Master communicates with Satellites which communicate with Agents. Currently the setup is just a simple POC setup. Master server, Satellite server, Windows agent, Linux agent, and a Meraki switch are the only devices involved here. All three of those agents should only ever be checked by the Satellite. Satellite reports back to Master. Master displays the dashboard. The checks being performed are just ping (hostalive), ssh (check-ssh), http (check-http), and winrm (check-winrm).
The Icinga 2 hierarchy consists of so-called zone objects. Zones depend on a parent-child relationship in order to trust each other.
found here: Distributed Monitoring - Icinga 2
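For reference, a minimal parent-child zone layout for that hierarchy looks roughly like this (endpoint and zone names taken from the configs shown earlier in this thread; treat it as a sketch, not a drop-in config):

object Endpoint "AWS-NOC-02" { }                  // the master
object Zone "AWS-NOC-02" { endpoints = [ "AWS-NOC-02" ] }

object Endpoint "WDC-NOCSat-02" { }               // the satellite
object Zone "WDC" {
  parent = "AWS-NOC-02"                           // trust flows along the parent-child link
  endpoints = [ "WDC-NOCSat-02" ]
}

// Anything placed under zones.d/WDC/ on the master is synced down to the satellite.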
I am not seeing any way to get command_endpoint = host.vars.agent_endpoint into the configuration with Director UI. I can’t change it manually because Director will just change it back.
Ah yes, true, that might be a Director limitation, and it wants to set it to host.name, which would work too; that is what you should see when you do run on agent = yes.
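As seen earlier in this thread, what Director generates for run on agent = yes is something like the following, and it only validates when an Endpoint object named after the host actually exists:

template Service "check-ssh" {
  check_command = "ssh"
  command_endpoint = host_name   // requires an "object Endpoint" with the host's name
}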
I do not see any weird stuff specifically in your new config; at first glance it is a way better setup.
When selecting a command_endpoint I don’t have the ability to manually enter in a string like that unfortunately. It feels weird for Director to have that limitation.
However, this also forces me to select a value for establish connection and accepts config. I am selecting no here because these agents do not have Icinga installed. It goes back to the goal of not needing to install Icinga on every host I want to monitor; I just need the satellites to do the ping/http/winrm/ssh checks.
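For illustration, the agentless end result I am after would look roughly like the generic-service sketch from earlier, with the host var pointing at the satellite instead of at an agent on the host itself (the template name, variable value and endpoint name here are examples only):

template Host "agentless-linux" {
  check_command = "hostalive"
  vars.agent_endpoint = "my-satellite"           // the satellite that should run the checks
}

apply Service "check-ssh" {
  check_command = "ssh"
  command_endpoint = host.vars.agent_endpoint    // executed on the satellite, no agent needed
  assign where host.vars.agent_endpoint && host.address
}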
Now for the host template that is applied to the satellite host: if I tell it that the Icinga agent is installed (which it is, since it is a satellite), I cannot tell it that its command endpoint is the master.
I can however add multiple templates to a host so let me try that. I will use the above template to tell the satellite that it is in the WDC zone and its command endpoint is AWS-NOC-02 (master). Then I will use this 2nd template to tell icinga that the satellite has an agent installed and accept configuration.
information/cli: Icinga application loader (version: r2.8.1-1)
information/cli: Loading configuration file(s).
information/ConfigItem: Committing config item(s).
information/ApiListener: My API identity: AWS-NOC-02
critical/config: Error: Validation failed for object 'WDC-NOCSat-02' of type 'Host'; Attribute 'command_endpoint': Command endpoint must be in zone 'WDC' or in a direct child zone thereof.
Location: in [stage]/zones.d/WDC/hosts.conf: 22:1-22:27
[stage]/zones.d/WDC/hosts.conf(20): }
[stage]/zones.d/WDC/hosts.conf(21):
[stage]/zones.d/WDC/hosts.conf(22): object Host "WDC-NOCSat-02" {
^^^^^^^^^^^^^^^^^^^^^^^^^^^
[stage]/zones.d/WDC/hosts.conf(23): import "wdc-satellites-endpoint"
[stage]/zones.d/WDC/hosts.conf(24): import "wdc-satellites-agent"
critical/config: 1 error
So the command_endpoint, which in this case is AWS-NOC-02, needs to be in the WDC zone. But there is no way it can be in that zone, because it is the master and in its own zone. If we look at the endpoints, we can see that it is not in the WDC zone.
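For illustration, the failing piece in isolation looks roughly like this (reconstructed from the error above, not the literal Director output):

// Rendered into zones.d/WDC/hosts.conf, so the object belongs to zone "WDC"
object Host "WDC-NOCSat-02" {
  check_command = "hostalive"
  command_endpoint = "AWS-NOC-02"   // endpoint sits in the parent zone, not in "WDC" or a child -> validation error
}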
critical/config: Error: Validation failed for object 'WDC-NOCSat-02' of type 'Host'; Attribute 'command_endpoint': Command endpoint must be in zone 'WDC' or in a direct child zone thereof.
Location: in [stage]/zones.d/WDC/hosts.conf: 22:1-22:27
[stage]/zones.d/WDC/hosts.conf(20): }
[stage]/zones.d/WDC/hosts.conf(21):
[stage]/zones.d/WDC/hosts.conf(22): object Host "WDC-NOCSat-02" {
This is a sneaky problem; I ran into it just last week.
No idea how to fix it in the Director, but I configured this:
Very interesting! That’s good information to have. I think the next best steps are to do some manual configuration of the conf files to establish a proper setup between the master and satellites and then use director kickstart to import them as objects. That would bypass a lot of the issues we are seeing here.