I’ve been building out an Icinga2/IcingaWeb2 system and am impressed by its capabilities and configuration. However (the fly in the ointment), I’m seeing a lot of both service-critical and host-down notifications that, I think, should be suppressed by dependencies.
My setup is fairly straightforward. A few servers depend on their
upstream router, which lies between the servers and a single Icinga2
instance. Each runs a “Nagios NRPE” (tcp-nrpe) Service that has an
implicit dependency on its host. In turn, each host has many NRPE
services that have an explicit dependency on “Nagios NRPE”, as well as
their implicit host dependency.
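
As a rough sketch, the layout described above could be expressed in Icinga 2 config along these lines. This is not the actual configuration: the host name "router-1", the address, the custom variable `vars.upstream`, and the assumption that the NRPE-based services use `check_command = "nrpe"` are all placeholders for illustration.

```
// Hypothetical names throughout; adjust to the real objects.

object Host "router-1" {
  check_command = "hostalive"      // ping-only monitoring
  address = "192.0.2.1"            // placeholder documentation address
}

// Servers behind the router depend on the router host.
apply Dependency "upstream-router" to Host {
  parent_host_name = "router-1"
  disable_checks = false
  disable_notifications = true

  // vars.upstream is an assumed custom variable set on each server
  assign where host.vars.upstream == "router-1"
}

// NRPE-based services depend on the "Nagios NRPE" service on the same host.
apply Dependency "nrpe-agent" to Service {
  parent_service_name = "Nagios NRPE"
  states = [ OK ]                  // parent states in which the dependency holds
  disable_notifications = true

  assign where service.check_command == "nrpe"
  ignore where service.name == "Nagios NRPE"
}
```

The service-to-host dependency is not written out here because Icinga 2 adds it implicitly for every service.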
The router host has only ping monitoring. When the router no longer
pings, I see two things quickly afterwards. “Nagios NRPE” notifies,
then the router notifies again, within a minute. I may or may not see
a few random services on the servers notify in that same timeframe.
Trying to follow best practices, I have the router on a shorter check and
retry interval than the downstream servers.
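
To illustrate what "shorter check and retry interval" means in config terms, here is a minimal sketch. The interval values, the host name "router-1", and the template name "downstream-server" are assumptions, not the settings actually in use.

```
// Illustrative values only.

object Host "router-1" {
  check_command = "hostalive"
  check_interval = 1m              // checked more often than the servers
  retry_interval = 15s             // fast rechecks while in a soft state
  max_check_attempts = 3
}

template Host "downstream-server" {
  check_interval = 5m
  retry_interval = 1m
  max_check_attempts = 3
}
```

The intent is that the router reaches a hard DOWN state before the servers behind it finish their own retries. The Dependency object's `ignore_soft_states` attribute, which defaults to true, may also matter here, since with that default a parent that is still in a soft state is treated as reachable.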
I tried to force the dependencies by making the implicit host dependencies explicit,
only to get duplicate dependency errors for every one of them when checking the config.
As a result of this set of behaviors, Icinga notifies at roughly 3
times the rate of the older Nagios system.
I’ll summarize my configs in a moment. Thank you for taking a look.