Hi !
I have an environment with many zones, each with 2 satellites.
I have some passive checks that work well when data is sent regularly (every 5 min) from all satellites,
but something looks strange to me:
when a passive check result is sent from only 1 satellite, Icinga treats the service as overdue and puts it in a soft UNKNOWN state.
As a result, the service alternates between OK and UNKNOWN.
I’m interested in what you’re trying to do here. dummy_state = 3 will cause an unknown whenever the active check runs.
Are the satellites aware of each other? They’ll both run all the checks if not. They should have each other’s IP addresses in their respective zones.conf but not their own.
The dummy_state 3 is only applied when check_interval is overdue. In this case it seems the overdue calculation is done per satellite rather than for the service as a whole, and this is my problem.
Here I can’t run the check on both satellites because it reports a value about the satellite itself (the script reports on the satellite it runs on).
To keep it simple: every 5 min I run a Python script on the satellite itself. It does its work and, when it finishes successfully, sends an API request to localhost port 5665 with this function:
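A minimal sketch of what such a helper could look like, assuming it targets Icinga 2's `process-check-result` API action. The function names `build_payload` and `send_passive_result` are hypothetical, the parameter names follow the list given in this post, and TLS verification is disabled only because the local API often uses a self-signed certificate; this is not the poster's actual code.

```python
import base64
import json
import ssl
import urllib.request


def build_payload(status, hostname, servicename, output, perfs, checkHostName):
    """Build the JSON body for the process-check-result action."""
    return {
        "type": "Service",
        # Select exactly one service on one host.
        "filter": f'host.name=="{hostname}" && service.name=="{servicename}"',
        "exit_status": status,          # 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN
        "plugin_output": output,
        "performance_data": perfs,      # list of perfdata strings
        "check_source": checkHostName,  # which node produced the result
    }


def send_passive_result(status, hostname, servicename, output, perfs,
                        user, password, checkHostName,
                        url="https://localhost:5665/v1/actions/process-check-result"):
    """POST a passive check result to the local Icinga 2 API (sketch)."""
    body = json.dumps(
        build_payload(status, hostname, servicename, output, perfs, checkHostName)
    ).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )
    # Assumption: self-signed certificate on localhost, so verification is
    # skipped here. Point the context at the Icinga CA file in real use.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context, timeout=10) as response:
        return json.load(response)
```

Keeping the payload construction separate from the HTTP call makes it easy to inspect what would be sent without a running Icinga instance.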
This function is used by many passive checks and works well.
In my context it is executed from 1 satellite only, with these parameters:
status : 0 (OK)
hostname : satellite name in icinga
servicename : the service I want to update
output : a string that says “script ok”
perfs : some perf data related to the script
user: icinga api user
password : icinga api password
checkHostName : satellite name in icinga
I don’t understand why the master does not merge the satellites’ results to determine the service state.
There’s a little more to your zone that can cause duplicate checks, mainly the endpoints. Can we see those? For example, satellite 1 would look like this:
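A rough sketch of such a file, written as an Icinga 2 configuration fragment. The name `sat1.lan` comes from this thread; `sat2.lan`, the zone name `zone1`, and the parent zone `master` are assumptions for illustration, and the master's own Endpoint and Zone objects are left out.

```
// zones.conf on sat1.lan (sketch; names other than sat1.lan are assumed)

object Endpoint "sat1.lan" {
  // Local endpoint: no host attribute, so the node does not try to connect to itself.
}

object Endpoint "sat2.lan" {
  // Zone partner: host tells sat1 where to reach sat2.
  host = "sat2.lan"
}

object Zone "zone1" {
  endpoints = [ "sat1.lan", "sat2.lan" ]
  parent = "master"
}
```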
Similarly, the endpoint objects for satellite 2 would have a host entry for satellite 1, but not itself. They need this in order to coordinate with each other. This gets really scary when they fire event handlers at the same time.
So on sat1.lan, you can remove the host line for itself but leave sat2’s host, and vice versa. This way they’re not trying to connect to themselves but will connect to each other. You’ll then need to restart each satellite.
We have updated every zones.conf to follow these best practices. Check results are now merged correctly and the passive checks no longer flap.