Passive Checks - Reliable enough?

Hi,

Is the passive check implementation reliable enough? I have limited experience with it, and so far it does not seem reliable to me.

Is anyone out there using passive checks on a large scale and willing to share their experience?

Thanks

Not using them on a large scale, but just a few.

Internally I use some passive checks to collect information from our SCOM environment: an active check collects the data and distributes the messages it finds to the corresponding passive checks on the Windows host objects in Icinga 2.

For a customer I set up a passive check that is fed by SNMP traps whenever there is an auth failure on his WLC. This works very well: on every failure the check is updated, and if there is none, the check resets itself via an active dummy check.

The reason I don’t use passive checks heavily is that I find them a tad more complicated to set up (maybe due to a lack of experience on my part) and I don’t see that many use cases for me (apart from SNMP traps, which I try to avoid anyway).

Hi,

When you go the passive check result route, you leave the “pass a threshold to a plugin from Icinga” approach behind, which means the check result sender has to determine:

  • the output
  • the performance data metrics
  • the final state / exit code

This can become cumbersome when that logic lives in scripts on the remote end: you will need at least some sort of config management (git, Puppet, etc.) to keep it in sync and rolled out everywhere it is used.
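To illustrate what "the sender determines everything" means in practice, here is a minimal sketch of the logic a remote script would have to carry itself. The function name `evaluate`, the "higher is worse" threshold semantics, and the example label are assumptions made for illustration, not something taken from Icinga; only the exit codes 0 to 3 and the `label=value;warn;crit` performance data layout follow the usual plugin conventions.

```python
from typing import Optional

# Standard plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(label: str, value: Optional[float], warn: float, crit: float) -> dict:
    """Turn a raw measurement into state, output and performance data.

    Assumption: higher values are worse, so value >= crit is CRITICAL
    and value >= warn is WARNING.
    """
    if value is None:
        # No measurement available: report UNKNOWN without metrics.
        return {
            "exit_status": UNKNOWN,
            "plugin_output": f"UNKNOWN - no value for {label}",
            "performance_data": [],
        }
    if value >= crit:
        state = CRITICAL
    elif value >= warn:
        state = WARNING
    else:
        state = OK
    return {
        "exit_status": state,
        "plugin_output": f"{STATE_NAMES[state]} - {label} is {value}",
        "performance_data": [f"{label}={value};{warn};{crit}"],
    }


result = evaluate("disk_used_pct", 85.0, warn=80.0, crit=90.0)
print(result["exit_status"], result["plugin_output"])
```

Note that the thresholds are now baked into the sender side, which is exactly the part that has to be kept in sync across hosts.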

Typically you should use the REST API as the secure remote transport for sending in check results. The legacy external command pipe (1) is to be deprecated and removed, and (2) needs an SSH tunnel for remote access, including public key setup.

The docs already point towards the REST API, whose process-check-result action is exactly what you’re looking for. If you have a pattern for calculating check results and states at large scale from a mass of results, you can also use the filter parameter to apply the same result to multiple matching objects in one request.
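As a rough sketch of what such a request could look like, the following builds (but does not send) a POST to the process-check-result action using only the Python standard library. The helper name `build_check_result_request`, the host `icinga.example.com`, the API user and the object names are placeholders invented for this example; it also assumes host and service names contain no double quotes, since they are interpolated into the filter string.

```python
import base64
import json
import urllib.request


def build_check_result_request(base_url, user, password, host, service,
                               exit_status, plugin_output, performance_data=None):
    """Build a POST request for /v1/actions/process-check-result."""
    body = {
        "type": "Service",
        # The filter selects the target object(s); a broader filter
        # would apply the same result to every matching service.
        "filter": 'host.name=="%s" && service.name=="%s"' % (host, service),
        "exit_status": exit_status,
        "plugin_output": plugin_output,
        "performance_data": performance_data or [],
    }
    # HTTP basic auth for the API user
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url}/v1/actions/process-check-result",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


req = build_check_result_request(
    "https://icinga.example.com:5665", "apiuser", "secret",
    "web01", "disk", 1, "WARNING - disk_used_pct is 85.0",
    ["disk_used_pct=85.0;80.0;90.0"],
)
print(req.get_method(), req.full_url)
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen` together with an `ssl` context that trusts the CA certificate of your Icinga 2 instance; that part is left out here because it depends on the local setup.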

Cheers,
Michael

Hi,

I have a large-scale Icinga installation in a distributed setup. I also run passive checks that send updates for around 7000 services at the same time, but some of the services always remain in the UNKNOWN state due to timeouts. Increasing check_timeout did not help; the logs tell me the API connection is frequently disconnected and then tries to reconnect. In my experience, passive checks that depend entirely on the API connection are not that reliable for such a large setup: the connection cannot be kept up for long and keeps disconnecting and reconnecting.