Setting the option “run on agent” in the service template to yes should solve the problem. Then the check is always run on the host itself (be it the master, a satellite or an agent).
For this to work, the host object has to be named exactly like the zone and endpoint objects of the satellite/agent host.
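For illustration, the naming requirement could look like this (a sketch with assumed names and addresses; the point is only that the Host, Endpoint and Zone objects share the same name):

```icinga2
// Sketch: for "run on agent" to work, the Host object name must
// exactly match the agent's Endpoint and Zone object names.
object Endpoint "agent01.example.com" {
}

object Zone "agent01.example.com" {
  endpoints = [ "agent01.example.com" ]
  parent = "satellit01"                // assumed parent zone
}

object Host "agent01.example.com" {    // same name as Endpoint and Zone
  check_command = "hostalive"
  address = "192.0.2.10"               // assumed address
}
```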
Can you show the zones.conf of the master, satellite and agent?
The option “run on agent” is enabled for all services I use. The problem is the check command for the host object of the Windows host.
The Windows endpoint and zone have the same name. The master and the satellite zones are named differently than their endpoints. Do you think I should change them?
Below are the zones.conf files of each server. I mainly used the zones.d directory on the master for importing endpoints and zones into Icinga with the kickstart wizard. The configuration in those files doesn’t differ from the zones.conf.
Master zones.conf
object Endpoint "mon-master-01" {
}

object Zone "master" {
  endpoints = [ "mon-master-01" ]
}

object Endpoint "mon-satellit01" {
  host = "10.10.22.4"
}

object Zone "satellit01" {
  endpoints = [ "mon-satellit01" ]
  parent = "master"
}

object Zone "global-templates" {
  global = true
}

object Zone "director-global" {
  global = true
}
Satellite zones.conf
object Endpoint "mon-master-01" {
  host = "10.10.20.4"
  port = "5665"
}

object Zone "master" {
  endpoints = [ "mon-master-01" ]
}

object Endpoint "mon-satellit01" {
}

object Zone "satellit01" {
  endpoints = [ "mon-satellit01" ]
  parent = "master"
}

object Zone "global-templates" {
  global = true
}

object Zone "director-global" {
  global = true
}
Agent zones.conf
object Endpoint "mon-satellit01" {
  host = "1.2.3.4"  // public satellite IP
  port = "5665"
}

object Zone "satellit01" {
  endpoints = [ "mon-satellit01" ]
}

object Endpoint "dc01" {
}

object Zone "dc01" {
  endpoints = [ "dc01" ]
  parent = "satellit01"
}

object Zone "global-templates" {
  global = true
}

object Zone "director-global" {
  global = true
}
What does icinga say about the check_source? That’s the system which tries to execute the check. For a disk check it should be the actual client/agent, not the satellite server.
The respective service object should have the directive command_endpoint = host.name.
If all of these are correct, have a look at the constants.conf of your Windows machine; the PluginDir constant should point to the directory that contains check_disk.exe.
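Put together, a service that runs on the agent itself could look like this in plain Icinga 2 DSL (a sketch; disk-windows is the ITL check command, while the imported template and the assign condition are assumptions):

```icinga2
apply Service "disk" {
  import "generic-service"                 // assumed service template
  check_command = "disk-windows"           // ITL check for Windows disks
  command_endpoint = host.name             // execute on the agent, not the satellite
  assign where host.vars.os == "Windows"   // assumed custom variable
}
```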
The zones.conf files look okay, although I hope the endpoint names are redacted and the real names match the FQDN of the respective system.
If they are missing, Icinga 2 doesn’t know about the agent and you will have to redeploy it to the agent host. This is best done via the script provided by the Director (it shows up when you configure the Agent tab inside the host config).
A second (more advanced) option is to create a host template, configure the Agent tab there to create a self-service API key, and have the host register itself.
See https://icinga.com/docs/director/latest/doc/74-Self-Service-API/ for more info.
I only saw hosts.conf in the preview, so I tried both of your recommended options. I still don’t have the endpoint and zones.conf in the preview, but most of the services are executed on the agent and return useful results. What I want to mention regarding the agent: I don’t know its IP address, but it can connect to the satellite on a public IP. I had to add this IP manually to the satellite endpoint in the icinga2.conf on the Windows host.
However, I’m getting more and more confused because nothing seems to behave consistently.
The host is DOWN despite using the dummy check, which seems to run on schedule. But the plugin output still shows the failed check_disk. Check source is the satellite. The host is marked reachable; I guess this refers to the satellite?
The disk service check worked yesterday but doesn’t work anymore. Check source is the agent. After redeploying the service, it stays in “outstanding”, like the next one. “Check now” doesn’t seem to have any effect. The host is not reachable; I guess this also refers to the check source.
Finally, the load check. It works as intended (like the memory check “nscp-local-memory”) and returns a correct status every minute. Check source is the agent, but not reachable.
All services were configured with the Director in exactly the same way, except for the check commands of course. No service has a pre-defined check source, because if I set it to the agent I can’t deploy the config due to the error message in my first reply. In some cases it is the agent, and then again there is nothing.
The “reachable” information is calculated by the monitoring system: whether this host or service can be reached by the parent system. But without configured dependencies between hosts this is no real indicator. At least that is my understanding. (I hope this is correct.)
Please implement a check for the dc01 zone and show its output. Let the check run on both the master and the satellite system.
Use the check command cluster-zone from the ITL, add a variable called cluster_zone and set it to the dc01 zone name.
E.g.
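Such a check could be written like this (a sketch; the assign condition assumes there are host objects named after the two endpoints from the zones.conf above):

```icinga2
apply Service "cluster-zone dc01" {
  check_command = "cluster-zone"        // ITL check for zone connectivity
  vars.cluster_zone = "dc01"            // the agent zone to probe
  // run it on both the master and the satellite
  assign where host.name in [ "mon-master-01", "mon-satellit01" ]
}
```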
The check source is the system where the check is actually executed; the “reachable” indicator refers to that system. Your load check looks right: the source is dc01. But the disk check is executed by your satellite (if it were a Linux check, it would show the disk space of your satellite server instead of dc01).
Please show us the config preview of the disk check (hit “Modifizieren”/“Modify” on the check’s page and then “Preview” on the right side).
Maybe the same for the correctly configured load check, for reference.
Edit regarding the cluster-zone check: Looks good. dc01 should only connect to the satellite if that is its parent zone.
Please test if the behavior changes after you remove the command_endpoint from the host template.
Normally you don’t need to set this. As the host is “inside” the satellit01 zone, it will be checked by the satellite by default. If you want to move check execution to the agent itself, just set the “Icinga2 Agent” option to yes in the host/service template in the Director.
[2019-12-13 13:39:59 +0100] information/cli: Icinga application loader (version: r2.11.2-1)
[2019-12-13 13:39:59 +0100] information/cli: Loading configuration file(s).
[2019-12-13 13:39:59 +0100] information/ConfigItem: Committing config item(s).
[2019-12-13 13:39:59 +0100] information/ApiListener: My API identity: mon-master-01
[2019-12-13 13:39:59 +0100] critical/config: Error: Validation failed for object 'dc01!windows uptime' of type 'Service'; Attribute 'command_endpoint': Command endpoint must be in zone 'master' or in a direct child zone thereof.
Location: in [stage]/zones.d/director-global/servicesets.conf: 33:1-33:30
[stage]/zones.d/director-global/servicesets.conf(31): }
[stage]/zones.d/director-global/servicesets.conf(32):
[stage]/zones.d/director-global/servicesets.conf(33): apply Service "windows uptime" {
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[stage]/zones.d/director-global/servicesets.conf(34): import "windows uptime"
[stage]/zones.d/director-global/servicesets.conf(35):
[2019-12-13 13:39:59 +0100] critical/config: Error: Validation failed for object 'dc01!windows memory' of type 'Service'; Attribute 'command_endpoint': Command endpoint must be in zone 'master' or in a direct child zone thereof.
Location: in [stage]/zones.d/director-global/servicesets.conf: 17:1-17:30
[stage]/zones.d/director-global/servicesets.conf(15): }
[stage]/zones.d/director-global/servicesets.conf(16):
[stage]/zones.d/director-global/servicesets.conf(17): apply Service "windows memory" {
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[stage]/zones.d/director-global/servicesets.conf(18): import "windows memory"
[stage]/zones.d/director-global/servicesets.conf(19):
[2019-12-13 13:39:59 +0100] critical/config: Error: Validation failed for object 'dc01!windows disk' of type 'Service'; Attribute 'command_endpoint': Command endpoint must be in zone 'master' or in a direct child zone thereof.
Location: in [stage]/zones.d/director-global/servicesets.conf: 25:1-25:28
[stage]/zones.d/director-global/servicesets.conf(23): }
[stage]/zones.d/director-global/servicesets.conf(24):
[stage]/zones.d/director-global/servicesets.conf(25): apply Service "windows disk" {
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[stage]/zones.d/director-global/servicesets.conf(26): import "windows disk"
[stage]/zones.d/director-global/servicesets.conf(27):
[2019-12-13 13:39:59 +0100] critical/config: Error: Validation failed for object 'dc01!windows load' of type 'Service'; Attribute 'command_endpoint': Command endpoint must be in zone 'master' or in a direct child zone thereof.
Location: in [stage]/zones.d/director-global/servicesets.conf: 9:1-9:28
[stage]/zones.d/director-global/servicesets.conf(7): */
[stage]/zones.d/director-global/servicesets.conf(8):
[stage]/zones.d/director-global/servicesets.conf(9): apply Service "windows load" {
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[stage]/zones.d/director-global/servicesets.conf(10): import "windows load"
[stage]/zones.d/director-global/servicesets.conf(11):
[2019-12-13 13:39:59 +0100] critical/config: 4 errors
[2019-12-13 13:39:59 +0100] critical/cli: Config validation failed. Re-run with 'icinga2 daemon -C' after fixing the config.
Addition: I was able to bypass this error message by changing the cluster zone directly on the host object to the host’s own zone and then changing the cluster zone on the template. This has absolutely no effect, but it deploys successfully.
I also don’t really think a host check should be executed on the host itself, because it is mainly used to determine the state of the host, or am I wrong about this?
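For comparison, the dummy check mentioned earlier sets the host state statically without contacting the agent at all (a sketch; the state and text values are assumptions, using the ITL dummy command):

```icinga2
object Host "dc01" {
  check_command = "dummy"
  vars.dummy_state = 0                       // 0 = UP
  vars.dummy_text = "Host state set statically"
}
```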
Why is there so much white space in front of the endpoint name?
Also, please show the endpoints and zones inside the Director. There still seems to be a problem with them, or with the Director not knowing them, hence the errors on deployment.
Also, something new developed over the weekend. The two previously successful checks (load and memory) are now also not working correctly:
When I first saw this, the “next check” timer was the same as the “last check” timer. Services are still marked ‘OK’ but shown as delayed in the web interface. At least it is consistent again: nothing works.
I’m out of ideas…
I’m sure we are missing just one simple thing, but I can’t figure out what exactly.
Could you install the agent on the Windows host from scratch with the script provided by the Director (it appears when you configure the Agent tab on the host)?
@dnsmichi do you have any further ideas where to look for the problem?
I have run into the same issue as @ibex.
I have a simple setup: a single master server and a remote Windows machine with the agent installed. The Windows machine cannot be reached from outside, so the agent connects to the master and receives the top-down scheduled checks to run.
Everything configured via director.
Service checks are being executed fine with the windows agent.
I cannot get the host check to be executed by the Windows agent itself.
I assume that “command_endpoint” needs to be set so that the command gets executed by the agent itself (e.g. as it is set for services to “command_endpoint = host_name” when the option “run on agent” is set to Yes).
But would that really mean that I have to configure a zone per host (= agent) so that I can select it as the “cluster zone”? Or did I miss an option in the Director?
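In plain DSL, a host check executed on the agent itself would amount to something like this (a sketch; the host/endpoint name is an assumption, and as the validation errors above show, the endpoint has to be in the config's zone or a direct child zone of it):

```icinga2
object Host "winagent01" {          // assumed name; must equal the Endpoint name
  check_command = "icinga"          // ITL check, reports on the local Icinga instance
  command_endpoint = "winagent01"   // force execution on the agent
}
```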