Check Result Freshness from Endpoint

I’m looking for a way to verify, from the master or a satellite, that the check results for checks executed on a client are still fresh:

I’m using the config sync method to configure the checks on my clients. Whenever the connection to a client is lost, the master recognizes that the client’s checks are overdue, but because the checks are only executed on the client side, the master (or satellite) does not perform any freshness check itself. As a result, the overdue services stay in the “OK” state (or whatever the last received check result was) and never turn “CRITICAL” or “UNKNOWN”.

How can I achieve that checks executed on a client turn “UNKNOWN” whenever they are overdue because the client has lost its connection to the master/satellite?

You could change the host check to the cluster-zone check. The host will then go critical as soon as it is no longer connected to the master/satellite.
But be aware that the check will also go critical on reloads, so choose max_check_attempts and retry_interval so that the host stays in a soft state until the client has reconnected to the master/satellite.
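
A minimal sketch of such a host object, assuming the client's zone is named after the host. The host name and all interval values are placeholders picked for illustration and need to be tuned to how long your reloads and reconnects really take:

```
object Host "client1.example.com" {
  // Hypothetical host name. cluster_zone defaults to the host name in
  // the ITL; it is set explicitly here only to make the link visible.
  check_command = "cluster-zone"
  vars.cluster_zone = "client1.example.com"

  // Placeholder timing: the first failed check plus four retries at 30s
  // keep the host in a SOFT state for roughly two minutes before it
  // turns HARD critical.
  check_interval = 1m
  retry_interval = 30s
  max_check_attempts = 5
}
```

The check has to be executed on the master/satellite itself (no command_endpoint pointing at the client), otherwise it cannot notice the missing connection.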


You could also create a DSL-based cluster check that collects the service states and compares last_check and next_check against what you would expect from check_interval/retry_interval. Something along those lines is described in the troubleshooting docs, so you’ll need to do a little finger dance here :wink:
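
As a starting point, here is an untested sketch of such a check built on the dummy check command. The service name, the host it is attached to, and the "older than two check intervals" threshold are my assumptions, so treat it as an idea rather than a finished solution:

```
object Service "late-check-results" {
  // Hypothetical: attach this to the master/satellite host object.
  host_name = "master1.example.com"
  check_command = "dummy"
  check_interval = 1m

  // Evaluated at check time: UNKNOWN (3) if any service that has been
  // checked at least once has a result older than two check intervals.
  vars.dummy_state = {{
    var late = get_objects(Service).filter(s => s.last_check > 0 && s.last_check < get_time() - 2 * s.check_interval)
    if (len(late) > 0) {
      return 3
    }
    return 0
  }}

  // Lists the full names of the overdue services in the plugin output.
  vars.dummy_text = {{
    var late = get_objects(Service).filter(s => s.last_check > 0 && s.last_check < get_time() - 2 * s.check_interval)
    return "Overdue: " + late.map(s => s.__name).join(", ")
  }}
}
```

Note that this only reports the overdue services in one summary check; it does not change the state of the individual services themselves.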

Please share your attempts and findings here :slight_smile: