It looks like Top Down Config Sync is no longer valid?

Have you tried to override the services’ zones?

No, what is your recommendation?

To put the Host in the agent Zone and to override the Services’ Zones as desired.
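
Roughly like this, as a minimal sketch (the host, zone and check names are made up):

// Host pinned to its own agent zone
object Host "agent1.example.com" {
  address = "192.0.2.10"
  check_command = "hostalive"
  zone = "agent1.example.com"   // the agent's own zone
}

// Any service that should run elsewhere overrides the zone explicitly
apply Service "ssh" {
  check_command = "ssh"
  zone = "master"               // override: schedule this check in the parent zone
  assign where host.name == "agent1.example.com"
}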

Cool, thanks. This is working partly:

  • All service checks run as expected
  • The zone for the host’s check_command cannot be reconfigured separately, so there is no suitable solution here?
  • Perfdata is replayed but not stored in InfluxDB as described here, which now applies to Linux as well

But you put the Host into the desired Zone, didn’t you?

Yes, in the agent’s zone. Hence, the host check_command is executed locally on that host (which does not make sense).

BTW: Shall I add a comment for the InfluxDB issue that it happens on Linux as well?

  1. What would make sense?
  2. Yes, please.

  1. e.g. hostalive or cluster-zone running on the parent
  2. Done.

Please try setting the services’ zones accordingly.

For the check_command on the host object?

No, the zone attribute on the hostalive and cluster-zone services.
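
Something like this sketch (the parent zone name "master" and the agent_endpoint custom var are assumptions):

apply Service "agent-reachable" {
  check_command = "cluster-zone"
  vars.cluster_zone = host.name   // agent zones are named after their host
  zone = "master"                 // run the check in the parent zone
  assign where host.vars.agent_endpoint
}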

The services are working fine. The host’s check_command is the problem.

The host’s… check_command?

Yes:

object Host "my-server1" {
  address = "10.0.0.1"
  check_command = "hostalive"
}

What if you set the zone attribute on the Host and all Services as needed?

All service checks are fine then, but the host’s check_command is scheduled on the agent, which does not make sense.

Hmm, it looks like I found a solution:

/etc/icinga2/zones.d/iatl.em.lan/host.conf:

object Host "iatl.em.lan" {
   address = "192.168.1.223"
   check_command = "hostalive4"
   zone = get_object(Zone, name).parent
   vars.os = "Linux"
}

/etc/icinga2/zones.d/global-templates/ssh.conf (example of a check scheduled on the parent):

apply Service "ssh" {
   check_command = "ssh"
   check_interval = 10s
   assign where host.vars.os == "Linux"
}

/etc/icinga2/zones.d/global-templates/cpuload.conf (example of a check scheduled on the agent):

apply Service "linux_cpuload" {
   check_command = "load"
   check_interval = 10s
   zone = host.name
   assign where host.vars.os == "Linux"
}

Can anybody confirm?

Did you get this solved? I think I have a pretty similar use case:

I have multiple agents running sensor checks (among others) that collect performance data. As far as I understand, when I set Run on agent in the Director, the Command Endpoint mode is used to execute checks, and I lose performance data when the host is down.

In this particular case it is quite normal for the host to lose internet for hours. If my understanding is correct, I should use the Config Sync method? Is this done simply by setting Run on agent to false? Does the agent automatically run those checks when accept_config = true and the checker feature is enabled?
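
For context, the agent-side settings I am asking about would look roughly like this (a sketch, values assumed), together with the checker feature enabled via icinga2 feature enable checker:

// Sketch of /etc/icinga2/features-available/api.conf on the agent
object ApiListener "api" {
  accept_config = true     // accept zone config synced from the master
  accept_commands = true   // allow the master to send commands
}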

By setting execute_active_checks = true on the host object, the hostalive check is run from the master, but it seems to also run the Command Endpoint checks and the local agent checks, causing the issue described in the troubleshooting guide. This results in random/broken/false notifications. Disabling the checker feature on the agents seems to solve this case, but then I am missing performance data.

I think it should work like this: the agent accepts the config from the master, runs the checks on its own (unaffected by the connection to the master), and reports its results back immediately, or via the replay log once it is online again. The hostalive check, however, should be executed from the master (checking reachability of the agent), since I need a notification when the host stays down for multiple days.

The Director does not support Top Down Config Sync at all. accept_config is a different story and not relevant for scheduling.

I’ve described how to configure it in my previous post, but in the meantime I had a discussion with one of the Netways consultants, and he advised against using this approach since it works now but may fail in the future due to a code change. In my opinion such a change has already happened, and I’d assume it was with 2.11.

Thank you for the clarification. I currently run 2.11 with the Director and the icingadb release candidate to evaluate all the new features on our existing use cases.

So when using the Director, currently only the Command Endpoint mode is possible, at the cost of losing performance data while the host is not reachable. Following the rules from the troubleshooting guide (disable the checker feature on the agents and do not include the conf.d directory) should avoid stray notifications.
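
For the conf.d part, a sketch of the relevant lines in the agent's /etc/icinga2/icinga2.conf (rest of the default file omitted); the checker feature is disabled with icinga2 feature disable checker:

include "constants.conf"
include <itl>
include <plugins>
include "features-enabled/*.conf"
// keep the conf.d include commented out so no local check objects are defined:
// include_recursive "conf.d"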

Additionally, while reading the documentation I asked myself whether I should use an agent-health service as described in Dependencies for Agent Checks, or just trust the traditional hostalive dependency described in Implicit Dependencies for Services on Host for agents.
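
For reference, the agent-health pattern from Dependencies for Agent Checks looks roughly like this (a sketch; the agent_endpoint custom var and object names are assumptions):

apply Service "agent-health" {
  check_command = "cluster-zone"
  vars.cluster_zone = host.name   // the agent zone is named after the host
  assign where host.vars.agent_endpoint
}

apply Dependency "agent-health-check" to Service {
  parent_service_name = "agent-health"
  states = [ OK ]                 // fulfilled only while agent-health is OK
  disable_notifications = true    // suppress service notifications while the agent is down
  assign where host.vars.agent_endpoint
  ignore where service.name == "agent-health"
}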

I observed that UNKNOWN service notifications (agent not connected) were triggered when the agent went down.

  • Hosts: max_check_attempts 3, check_interval 2m, retry_interval 1m
  • Services: max_check_attempts 4, check_interval 2m, retry_interval 1m (see the sketch below)
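
In template form, those intervals would read roughly like this (template names made up):

template Host "agent-host-defaults" {
  max_check_attempts = 3
  check_interval = 2m
  retry_interval = 1m
}

template Service "agent-service-defaults" {
  max_check_attempts = 4
  check_interval = 2m
  retry_interval = 1m
}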

Still, the hostalive check sometimes runs at an interval of up to 90s while on the retry interval, allowing service checks to reach a hard state before the hostalive check does.

I am not sure if this is somehow related to the dependency configuration mentioned above.