I recently installed the Director in a small environment consisting of one master, one satellite and one Windows host with the agent installed. I imported the agent endpoint via the kickstart wizard and created a host. I can run service checks for disk, load and memory without problems. The command endpoint for these checks is the satellite.
The problem I run into is that when I configure a host check, it always fails with a message like this:
execvpe(/usr/lib/nagios/plugins/check_disk.exe) failed: No such file or directory
I am not able to run any checks other than the dummy one. Something seems to be missing, but I can't determine where. The referenced files like check_disk.exe exist on every Icinga server in /usr/lib/nagios/plugins/, except on the Windows system, where they are in C:\Program Files\ICINGA2\sbin, which is also the default directory.
I can only speculate that Icinga tries to run the checks from /usr/lib/nagios/plugins/ on the Windows machine and fails because that directory doesn't exist there.
I hope somebody can help me or point me in the right direction.
I changed the command endpoint to the Windows host (dc01), but I can't deploy the configuration because of the following error:
[2019-12-10 14:24:38 +0100] information/cli: Icinga application loader (version: r2.11.2-1)
[2019-12-10 14:24:38 +0100] information/cli: Loading configuration file(s).
[2019-12-10 14:24:38 +0100] information/ConfigItem: Committing config item(s).
[2019-12-10 14:24:38 +0100] information/ApiListener: My API identity: master-01
[2019-12-10 14:24:38 +0100] critical/config: Error: Validation failed for object 'dc01' of type 'Host'; Attribute 'command_endpoint': Command endpoint must be in zone 'dc01' or in a direct child zone thereof.
Location: in [stage]/zones.d/dc01/hosts.conf: 1:0-1:44
[stage]/zones.d/dc01/hosts.conf(1): object Host "dc01" {
^^^^^^^^^^^^^^^^^^
[stage]/zones.d/dc01/hosts.conf(2): import "Windows Agent Satllit01"
[stage]/zones.d/dc01/hosts.conf(3):
[2019-12-10 14:24:38 +0100] critical/config: 1 error
[2019-12-10 14:24:38 +0100] critical/cli: Config validation failed. Re-run with 'icinga2 daemon -C' after fixing the config.
The strange thing is that the endpoint dc01 is in the zone dc01. I configured the zone hierarchy manually before I ran the kickstart wizard (master - satellite - dc01), with each endpoint in its respective zone. I know there is a difference between host and endpoint. But the satellite and dc01 were integrated into the environment in the same way, so I don't really understand why everything except the host check works with the satellite as command endpoint, yet I can't deploy if I change it to dc01.
Maybe it is worth noting that the satellite doesn't know the IP of dc01, but dc01 knows the IP of the satellite.
Setting the option "run on agent" in the service template to yes should solve the problem. The check is then always run on the host itself (be it the master, a satellite or an agent).
For this to work, the host object has to be named exactly like the zone and endpoint objects of the satellite/agent host.
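A minimal sketch of that naming, using the names from this thread (the dummy check command stands in for whatever host check should run on the agent):

object Endpoint "dc01" {
}

object Zone "dc01" {
  endpoints = [ "dc01" ]
  parent = "satellit01"
}

object Host "dc01" {
  check_command = "dummy"    // placeholder for the actual host check
  command_endpoint = "dc01"  // matches the host, endpoint and zone name
}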
Can you show the zones.conf of the master, satellite and agent?
The option "run on agent" is enabled for all services I use. The problem is the check command for the host object of the Windows host.
The Windows endpoint and zone have the same name. The master and the satellite zone are named differently than their endpoints. Do you think I should change them?
Following are the zones.conf files of each server. I mainly used the zones.d directory on the master for importing endpoints and zones into Icinga with the kickstart wizard. The configuration in those files doesn't differ from the zones.conf.
Master zones.conf
object Endpoint "mon-master-01" {
}

object Zone "master" {
  endpoints = [ "mon-master-01" ]
}

object Endpoint "mon-satellit01" {
  host = "10.10.22.4"
}

object Zone "satellit01" {
  endpoints = [ "mon-satellit01" ]
  parent = "master"
}

object Zone "global-templates" {
  global = true
}

object Zone "director-global" {
  global = true
}
Satellite zones.conf
object Endpoint "mon-master-01" {
  host = "10.10.20.4"
  port = "5665"
}

object Zone "master" {
  endpoints = [ "mon-master-01" ]
}

object Endpoint "mon-satellit01" {
}

object Zone "satellit01" {
  endpoints = [ "mon-satellit01" ]
  parent = "master"
}

object Zone "global-templates" {
  global = true
}

object Zone "director-global" {
  global = true
}
Agent zones.conf
object Endpoint "mon-satellit01" {
  host = "1.2.3.4" // public satellite IP
  port = "5665"
}

object Zone "satellit01" {
  endpoints = [ "mon-satellit01" ]
}

object Endpoint "dc01" {
}

object Zone "dc01" {
  endpoints = [ "dc01" ]
  parent = "satellit01"
}

object Zone "global-templates" {
  global = true
}

object Zone "director-global" {
  global = true
}
What does Icinga say about the check_source? That's the system which tries to execute the check. For a disk check it should be the actual client/agent, not the satellite server.
The respective service object should have the directive command_endpoint == host.name
If all these are correct, have a look into the constants.conf of your windoze machine; the PluginDir constant should point to the directory containing check_disk.exe (C:\Program Files\ICINGA2\sbin by default).
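On a default Windows installation that would look roughly like this in constants.conf (a sketch; note that backslashes have to be escaped in Icinga 2 strings):

const PluginDir = "C:\\Program Files\\ICINGA2\\sbin"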
The zones.conf files look okay, although I hope the endpoint names are redacted and the real names match the fqdn of the respective system.
If they are missing, Icinga 2 doesn't know about the agent and you will have to redeploy it to the agent host. Best via the script provided by the Director (it shows up when you configure the Agent tab inside the host config).
A second (more advanced) option could be creating a host template, configuring the Agent tab there to create a self-service API key, and having the host register itself.
See https://icinga.com/docs/director/latest/doc/74-Self-Service-API/ for more info.
I only saw hosts.conf in the preview, so I tried both of your recommended options. I still don't have the endpoint and zones.conf in the preview, but most of the services are executed on the agent and return useful results. Regarding the agent, I want to mention that I don't know its IP address, but it can connect to the satellite on a public IP. I had to add this IP manually to the satellite endpoint in the icinga2.conf on the Windows host.
However, I'm getting more and more confused because nothing seems to behave in a consistent way.
The host is DOWN despite using the dummy check, which seems to be executed on schedule. But it still shows the plugin output of the failed check_disk. Check source is the satellite. The host is marked as reachable; I guess this refers to the satellite?
The disk service check worked yesterday but doesn't work anymore. Check source is the agent. After redeploying the service, it stays in "outstanding" like the next one. "Check now" doesn't seem to have any effect. The host is marked as not reachable; I guess this also refers to the check source.
Finally, the load check works as intended (like the memory check "nscp-local-memory") and returns a correct status every minute. Check source is the agent, but it is marked as not reachable.
All services were configured with the Director in exactly the same way, except for the check commands of course. No service has a predefined check source, because if I set it to the agent I can't deploy the config due to the error message in my first reply. In some cases the check source is the agent, in others there is nothing.
The "reachable" information is a calculation by the monitoring system of whether this host or service can be reached from the parent system. But without configured dependencies between hosts this is no real indicator. At least that is my understanding (I hope this is correct).
Please implement a check for the dc01 zone and show its output. Let the check run on both the master and the satellite system.
Use the check command cluster-zone from the ITL, add a variable called cluster_zone and set it to the dc01 zone name.
E.g. a minimal sketch (the service name is illustrative; the assign rule assumes host objects exist for the master and the satellite):
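apply Service "dc01-connection" {
  check_command = "cluster-zone"

  // name of the zone whose connection state should be checked
  vars.cluster_zone = "dc01"

  // run the check on both the master and the satellite
  assign where host.name in [ "mon-master-01", "mon-satellit01" ]
}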
The check source is the system where the check is actually executed; the "reachable" indicator refers to that system. Your load check looks like it should: the source is dc01. But the disk check is executed by your satellite (if it were a Linux check, it would show the disk space of your satellite server instead of dc01).
Please show us the config preview of the disk check (hit "Modifizieren"/"Modify" on the check's page and then "Preview" on the right side).
Maybe the same for the correctly configured load check, for reference.
Edit regarding the cluster-zone check: looks good. dc01 should only connect to the satellite if that is its parent zone.
Please test if the behavior changes after you remove the command_endpoint from the host template.
Normally you don't need to set this. As the host is "inside" the satellit01 zone, it will be checked by the satellite by default. If you want to move check execution to the agent itself, just set the "Icinga2 Agent" option to yes in the Director host/service template, roughly as sketched below.
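A rough plain-config equivalent of that suggestion (a sketch, not literal Director output; service and template names are taken from the deployment log below):

object Host "dc01" {
  check_command = "dummy"  // no command_endpoint: executed by the satellite,
                           // because the host lives in zone "satellit01"
}

apply Service "windows disk" {
  import "windows disk"

  command_endpoint = host.name  // the agent option makes the check run
                                // on the dc01 endpoint itself
  assign where host.name == "dc01"
}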
[2019-12-13 13:39:59 +0100] information/cli: Icinga application loader (version: r2.11.2-1)
[2019-12-13 13:39:59 +0100] information/cli: Loading configuration file(s).
[2019-12-13 13:39:59 +0100] information/ConfigItem: Committing config item(s).
[2019-12-13 13:39:59 +0100] information/ApiListener: My API identity: mon-master-01
[2019-12-13 13:39:59 +0100] critical/config: Error: Validation failed for object 'dc01!windows uptime' of type 'Service'; Attribute 'command_endpoint': Command endpoint must be in zone 'master' or in a direct child zone thereof.
Location: in [stage]/zones.d/director-global/servicesets.conf: 33:1-33:30
[stage]/zones.d/director-global/servicesets.conf(31): }
[stage]/zones.d/director-global/servicesets.conf(32):
[stage]/zones.d/director-global/servicesets.conf(33): apply Service "windows uptime" {
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[stage]/zones.d/director-global/servicesets.conf(34): import "windows uptime"
[stage]/zones.d/director-global/servicesets.conf(35):
[2019-12-13 13:39:59 +0100] critical/config: Error: Validation failed for object 'dc01!windows memory' of type 'Service'; Attribute 'command_endpoint': Command endpoint must be in zone 'master' or in a direct child zone thereof.
Location: in [stage]/zones.d/director-global/servicesets.conf: 17:1-17:30
[stage]/zones.d/director-global/servicesets.conf(15): }
[stage]/zones.d/director-global/servicesets.conf(16):
[stage]/zones.d/director-global/servicesets.conf(17): apply Service "windows memory" {
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[stage]/zones.d/director-global/servicesets.conf(18): import "windows memory"
[stage]/zones.d/director-global/servicesets.conf(19):
[2019-12-13 13:39:59 +0100] critical/config: Error: Validation failed for object 'dc01!windows disk' of type 'Service'; Attribute 'command_endpoint': Command endpoint must be in zone 'master' or in a direct child zone thereof.
Location: in [stage]/zones.d/director-global/servicesets.conf: 25:1-25:28
[stage]/zones.d/director-global/servicesets.conf(23): }
[stage]/zones.d/director-global/servicesets.conf(24):
[stage]/zones.d/director-global/servicesets.conf(25): apply Service "windows disk" {
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[stage]/zones.d/director-global/servicesets.conf(26): import "windows disk"
[stage]/zones.d/director-global/servicesets.conf(27):
[2019-12-13 13:39:59 +0100] critical/config: Error: Validation failed for object 'dc01!windows load' of type 'Service'; Attribute 'command_endpoint': Command endpoint must be in zone 'master' or in a direct child zone thereof.
Location: in [stage]/zones.d/director-global/servicesets.conf: 9:1-9:28
[stage]/zones.d/director-global/servicesets.conf(7): */
[stage]/zones.d/director-global/servicesets.conf(8):
[stage]/zones.d/director-global/servicesets.conf(9): apply Service "windows load" {
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[stage]/zones.d/director-global/servicesets.conf(10): import "windows load"
[stage]/zones.d/director-global/servicesets.conf(11):
[2019-12-13 13:39:59 +0100] critical/config: 4 errors
[2019-12-13 13:39:59 +0100] critical/cli: Config validation failed. Re-run with 'icinga2 daemon -C' after fixing the config.
Addition: I was able to bypass this error message by changing the cluster zone directly on the host object to the host zone itself and then changing the cluster zone on the template. This has absolutely no effect at all, but it deploys successfully.
I also don't really think a host check should be executed on the host itself, because it is mainly used to determine the state of the host. Or am I wrong about this?
Why is there so much white space in front of the endpoint name?
Also, please show the endpoints and zones inside the Director. There still seems to be a problem with them, or with the Director not knowing them, hence the errors on deployment.
Also, something new developed over the weekend: the two previously successful checks (load and memory) are now not working correctly either.
When I first saw this, the "next check" timer was the same as the "last check" timer. Services are still marked as "OK" but shown as delayed in the web interface. At least it is consistent again: nothing works.
I'm out of ideas...
I'm sure we are missing just one simple thing, but can't figure out what exactly.
Is it possible for you to install the agent on the Windows host from scratch with the script provided by the Director, when you configure the Agent tab on the host?
@dnsmichi, do you have any further ideas where to look for the problem?