We’ve encountered a problem: the network connection to one of our satellites went down, but there was no indication of this in Icinga.
Sadly, Icinga 2 did not show a problem on any affected host, i.e. the hosts did not switch to status “critical” or “down” but kept showing status “up” with that small “pending” clock next to it.
Even the status of the icinga2 satellite-host did not change.
Does anyone have an idea how to modify the configuration on the master node or the Icinga Web 2 node so that we get an error status when the satellite is not reachable and no information about the hosts monitored by this satellite is available?
Instead of the cluster check command, I would suggest the cluster-zone check mentioned in the agent health docs, implemented once for each satellite zone.
The cluster check command has (imo) the drawback that it is one single check for all zones, so you do not get notified about failures of additional zones after the first failure.
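A minimal sketch of such a per-zone check, assuming the built-in `cluster-zone` check command and its `cluster_zone` custom variable. The host name `master1.example.com` and the zone name `satellite-zone-a` are hypothetical placeholders, and the intervals are only example values; repeat the object once per satellite zone.

```
// Hypothetical names: replace "master1.example.com" and "satellite-zone-a"
// with the master host object and the satellite zone from your own zones.conf.
object Service "satellite-zone-a-health" {
  host_name = "master1.example.com"   // attach the check to the master host
  check_command = "cluster-zone"

  check_interval = 1m                 // example values, tune as needed
  retry_interval = 30s

  // Zone whose connection state this service reports on.
  vars.cluster_zone = "satellite-zone-a"
}
```

Because the service is attached to the master host, it should keep being evaluated while the satellite is unreachable, and each zone gets its own state and notifications.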
Yes, that is exactly why I use both; the cluster check still provides a nice overview.