I am using Icinga with one master and many agents (almost 300 agents, with 20 services per agent). All of the monitoring is based on my CMDB.
I have a question.
Sometimes a server is in the process of dying (it takes about 30 minutes), but during this time the master can still reach it (hostalive / ping). None of the service check results reach the master, so there is a lag and I get the message “Remote Icinga instance HOSTNAME is not connected to master” on all 20 services of the host, with state UNKNOWN.
So my problem is that I don't get any CRITICAL alert until about 30 minutes later, when the server is really dead and the hostalive check fails. Is it possible to change the state to CRITICAL instead of UNKNOWN when I get the message “Remote Icinga instance … is not connected …”? Or maybe, if all services are in state UNKNOWN, change the host state to CRITICAL?
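To make the second idea concrete, here is a rough sketch of the logic I have in mind. It is plain Python, not Icinga configuration, and the names (`ServiceResult`, `agent_disconnected`, `effective_state`) are hypothetical ones I made up for illustration. The only things taken from the real setup are the standard plugin exit codes (2 = CRITICAL, 3 = UNKNOWN) and the text of the message.

```python
from dataclasses import dataclass
from typing import List

# Standard monitoring plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Fragment of the message shown on every service when the agent is unreachable.
NOT_CONNECTED = "is not connected to"


@dataclass
class ServiceResult:
    """Hypothetical container for the last result of one service."""
    name: str
    state: int
    output: str


def agent_disconnected(services: List[ServiceResult]) -> bool:
    """True only if there is at least one service and every service is
    UNKNOWN with the 'not connected' message."""
    return bool(services) and all(
        s.state == UNKNOWN and NOT_CONNECTED in s.output for s in services
    )


def effective_state(current_state: int, services: List[ServiceResult]) -> int:
    """Escalate to CRITICAL when the agent looks disconnected,
    otherwise keep the state that was already computed."""
    return CRITICAL if agent_disconnected(services) else current_state
```

With this, a host that still answers ping but whose 20 services all report the "not connected" message would be treated as CRITICAL right away, while a host with a single UNKNOWN service would be left alone.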
Of course, if you have a better solution, I would love to hear it!