Hi all, I have a passive check configured, and I’m getting unexpected behavior after the service state changes to something other than OK. Basically, I have a cron job that runs daily at 02:00 and then sends the result via process-check-result. I want the freshness check to run between 09:00 and 09:15 and, if no result has been received in the past 24 hours, change the state to UNKNOWN. What I am seeing with the following code is that the cron job sets the service to WARNING as expected, but then almost 6 hours later the freshness check kicks in and changes it to UNKNOWN. Any thoughts on what I’m doing wrong?
Thanks for the response, but I actually want it to return UNKNOWN if the freshness check fails. I’m experiencing a few things I don’t understand:
- the status changes to UNKNOWN well before the check_period starts
- the status changes to UNKNOWN even though the check_interval hasn’t come close to elapsing
Maybe Icinga can’t do what I want, but essentially: when a result is sent to Icinga, the check_interval clock starts. When the check_period is reached, the freshness check should execute dummy only if no result has been posted within the check_interval.
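For readers following along, the freshness setup described above maps roughly onto the attributes below. This is a hypothetical sketch (the host, service and TimePeriod names are made up), not the poster’s actual configuration. It assumes Icinga 2’s documented freshness behaviour for passive services: if no check result arrives within check_interval, the check_command (here dummy) is executed. As the replies point out, combining check_period and check_interval like this may not give the intended schedule, so read it as a description of the goal rather than a working fix.

```
object Service "nightly-cron" {        // hypothetical service name
  host_name = "backup-host"            // hypothetical host name
  check_command = "dummy"              // only executed when no fresh result arrived
  check_interval = 24h                 // freshness threshold: 24 hours without a result
  check_period = "morning-window"      // hypothetical TimePeriod covering 09:00-09:15

  vars.dummy_state = 3                 // 3 = UNKNOWN
  vars.dummy_text = "No check result received within check_interval"
}
```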
Did you send a TTL and mess up the freshness by doing so?
Sorry, scrap that, but keep it in mind, as it can mess up your scheduling!
@moreamazingnick is right, it will not work the way you think.
You need to calculate vars.dummy_state via the DSL. This code from the link above could give you some ideas:
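```
vars.dummy_text = {{
  // Look up the service object this check belongs to
  var service = get_service(macro("$host.name$"), macro("$service.name$"))
  // Convert the last check timestamp into a readable date string
  var lastCheck = DateTime(service.last_check).to_string()
  return "No check results received. Last result time: " + lastCheck
}}
```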
This code is for vars.dummy_text, but you can adapt it for vars.dummy_state: check whether service.last_check is older than now minus 7h 15min, and if so return 3 (UNKNOWN); if it is newer, return 0 (OK).
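A minimal sketch of that suggestion, built on the dummy_text example above. The 7h 15min threshold comes from the post (a 02:00 cron run checked against a window ending at 09:15), and get_time() returns the current Unix timestamp, so the difference to service.last_check is an age in seconds. Treat it as a starting point rather than a tested solution.

```
vars.dummy_state = {{
  // Look up the service this freshness check runs for (same as the dummy_text example)
  var service = get_service(macro("$host.name$"), macro("$service.name$"))

  // Threshold from the thread: 02:00 cron run, window ends 09:15 -> 7h 15min
  var maxAge = 7h + 15m

  // get_time() is the current Unix timestamp; last_check is a timestamp as well
  if (get_time() - service.last_check > maxAge) {
    return 3  // UNKNOWN: no result since the expected cron run
  }
  return 0    // OK: a result arrived recently enough
}}
```

One side effect to keep in mind: whenever this dummy check runs, its result replaces the current state, so returning 0 would overwrite a WARNING reported by the 02:00 cron run with OK.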
Icinga has a built-in programming language. I highly doubt it can’t do what you want, though maybe not in the way you want.
Well, the following day we didn’t experience the same issue, even though the status and results were sent the same way. The ongoing theory is that it was due to retry_interval, so I’ve set it to 24h as well.
If you still have the check attempts (max_check_attempts) set to 5, you need 5 cycles to reach a hard state.
If you set the retry interval to 24h, this means you would only recognise an error after roughly 5 days.
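To make that arithmetic explicit, assume check_interval and retry_interval are both 24h and max_check_attempts is 5. In the worst case, the first non-OK result takes up to one check_interval to show up. Each of the remaining 4 attempts then happens one retry_interval later, because Icinga uses retry_interval while the service is in a SOFT state:

```latex
\begin{aligned}
t_{\mathrm{HARD}} &\le \texttt{check\_interval} + (\texttt{max\_check\_attempts} - 1)\cdot\texttt{retry\_interval} \\
                  &= 24\,\mathrm{h} + (5 - 1)\cdot 24\,\mathrm{h} = 120\,\mathrm{h} = 5\ \text{days}
\end{aligned}
```

With max_check_attempts = 1 the second term drops to zero, so a non-OK result would become HARD on the first attempt, if that fits how you want to be alerted.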