BTW: Haven’t been doing anything except analyzing (i.e. no restart yet).
Last restart on master was 2019-02-13 10:35:54 and the clients reported properly i.e.:
[2019-02-13 10:36:03 +0100] information/ApiListener: Finished syncing endpoint ‘master.example.com’ in zone ‘example.com’.
Name resolving is still working.
On the master: yes. On the client: no (which means the default applies: const ZoneName = NodeName).
Maybe my dependency rule? However, there was no reason for it to do so (no relevant event in the history). Notifications were disabled at [2019-02-13 17:43:56 +0100].
That sounds strange, I know, and I had a long, hard discussion with Lennart about this topic (and finally I could convince him). Background: I manage a default setup which is deployed one by one for every customer. Before Director 1.6.0 there was no option to create services on a “template machine” and distribute them to all customer sites. Therefore, we decided to go with a mix (which is not recommended, of course): hosts, host templates and some other objects are handled via Director, but the service definitions live in conf files (distributed with deb packages). The “connection” between these two worlds is made by the assign rules.
And this setup works for all other clients, and it worked for the mentioned client until 5 days ago.
Unfortunately, it does, but I’ll follow Lennart’s advice to use host.name instead (I have not had enough time yet to replace it everywhere).
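To make the “connection” concrete, here is a minimal sketch of such a static service definition. The service name and the custom variable host.vars.customer_role are purely hypothetical; the only assumption is that the Director-managed host carries some attribute the assign rule can match on.

```
// Hypothetical static definition shipped in a conf file via the deb package.
apply Service "disk" {
  check_command = "disk"
  // Applies to every Director-managed host that sets this (assumed) custom variable.
  assign where host.vars.customer_role == "linux-server"
}
```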
It is still a setup that is hard to debug and troubleshoot. Since you’re saying that you tried to convince Lennart, you already know that this isn’t a long-term solution and needs a proper configuration at some point in the future.
I’m curious what the host object for this service looks like. I suspect there’s a mix in place between local check execution in a zone and the command endpoint triggered by the parent node.
icinga2 object list --type Host --name client.example.com
What happens in the debug logs on the master and the agent host when you force a re-check of this service?
Either the check should be executed by the agent itself, by setting its zone to example.com,
or use the command endpoint execution bridge and leave the host’s zone as master, setting the service’s command_endpoint to the agent’s endpoint (with host.name in the static config, later via the Director agent settings).
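A minimal sketch of the second option, assuming the parent zone is really named master and the agent is client.example.com. The Endpoint and Zone objects would typically sit in zones.conf on the master, the Host and Service in static config below zones.d/master/. The address and the disk check are placeholders, not taken from your setup.

```
object Endpoint "client.example.com" {
}

object Zone "client.example.com" {
  endpoints = [ "client.example.com" ]
  parent = "master"   // assumed name of the parent zone
}

object Host "client.example.com" {
  check_command = "hostalive"
  address = "192.0.2.10"   // placeholder address
  // No explicit zone attribute here: placing the file below
  // zones.d/master/ makes master the authoritative zone.
}

apply Service "disk" {
  check_command = "disk"
  // The master schedules the check, the agent's endpoint executes it.
  command_endpoint = host.name
  assign where host.name == "client.example.com"
}
```

With this layout the host stays in the parent zone and only the check execution is handed over to the agent.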
I’m sorry, I’m getting confused. If the agent is to execute the check itself, I’d assume the zone should be client.example.com. However, with Icinga Agent Discussion in mind, this is not fully supported and thus not recommended.
The responsible zone (“authoritative for this object”) for initiating a remote command endpoint check should always be the parent zone of an agent, e.g. master or satellite. In your example output from object list, the host explicitly sets the zone to example.com, not master.
This is a common cause of checks not being executed, which is why I am asking. If you’ve already changed that to master, everything should be fine.