No error status shown when satellite/zone information is not available

mawolff · April 24, 2023, 6:34am

Hi all,

We’ve encountered a problem where the network connection to one of our satellites went down, but there was no indication of this problem in Icinga.

Sadly, Icinga 2 did not show a problem for any of the affected hosts, i.e. the hosts did not switch to status “critical” or “down” but kept showing status “up” with the small “pending” clock next to it.
Even the status of the Icinga 2 satellite host itself did not change.

Does anyone have an idea how to modify the configuration on the master node or the Icinga Web 2 node so that we get an error status when the satellite is not reachable and no information about the hosts monitored by this satellite is available?

Icinga Web 2 version 2.11.4

Kind regards
Markus

rivad · April 24, 2023, 3:00pm

This is by design, so that you are not swamped with messages about every host and service behind the satellite.

I use an “Icinga Cluster Health” service on the master to let me know about such things.
The check command is called “cluster”.
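A minimal sketch of such a service, assuming it is defined in the master's configuration and that the master's Host object is named `icinga2-master1.localdomain` (a placeholder, use your own). The intervals are example values only:

```
// Placeholder host name: use the Host object that represents your master.
object Service "cluster" {
  host_name = "icinga2-master1.localdomain"
  display_name = "Icinga Cluster Health"

  // ITL check command that reports the state of the cluster connections
  check_command = "cluster"

  // Example values; tune them to how quickly you want to notice an outage
  check_interval = 1m
  retry_interval = 30s
}
```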

I also use an agent health check on the endpoints.
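An agent health check along the lines of the docs linked below could look like this sketch. It assumes the convention that each agent's zone name equals its Host object name, and that agent hosts carry a custom variable `agent_endpoint`; both are assumptions, so adjust them to your setup:

```
// Evaluated on the parent (master or satellite), not on the agent itself,
// so it turns critical when the agent's zone is no longer connected.
apply Service "agent-health" {
  check_command = "cluster-zone"
  display_name = "cluster-health-" + host.name

  // Assumes the agent's zone name equals the Host object name
  vars.cluster_zone = host.name

  // Assumes agent hosts define vars.agent_endpoint
  assign where host.vars.agent_endpoint
}
```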

https://icinga.com/docs/icinga-2/latest/doc/06-distributed-monitoring/#health-checks

log1c · April 25, 2023, 8:23am

Instead of the cluster check command, I would suggest using the cluster-zone check mentioned in the agent health docs and implementing one for each satellite zone.
The cluster check command (imo) has the drawback of being only a single check for all zones, so you don't get notified about failures of any additional zones after the first failure.
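A minimal sketch of one cluster-zone service per satellite zone, defined on the master. The zone names `satellite-zone-a` and `satellite-zone-b` and the host name `icinga2-master1.localdomain` are placeholders; use the Zone names from your `zones.conf` and the Host object of your master:

```
// One service per satellite zone, all attached to the (placeholder) master host.
object Service "zone-health-satellite-zone-a" {
  host_name = "icinga2-master1.localdomain"
  check_command = "cluster-zone"

  // Placeholder: name of the satellite Zone object to evaluate
  vars.cluster_zone = "satellite-zone-a"
}

object Service "zone-health-satellite-zone-b" {
  host_name = "icinga2-master1.localdomain"
  check_command = "cluster-zone"

  // Placeholder: name of the satellite Zone object to evaluate
  vars.cluster_zone = "satellite-zone-b"
}
```

Because each zone has its own service, a second satellite going offline produces its own state change and notification instead of being hidden behind a single cluster check that is already failing.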

rivad · April 25, 2023, 8:40am

> The cluster check command (imo) has the drawback of being only a single check for all zones, so you don't get notified about failures of any additional zones after the first failure.

Yes, that's exactly why I use both, but the cluster check still provides a nice overview.