Passive Check Behavior

robvan · September 11, 2023, 11:52pm

Hi all, I have a passive check configured, and I’m seeing unexpected behavior once the service state is not OK. A cron job runs daily at 02:00 and sends a process-check-result. I want the freshness check to run between 09:00 and 09:15 and change the state to UNKNOWN if no result has arrived in the past 24 hours. With the following config, the cron job sets the service to WARNING as expected, but almost 6 hours later the freshness check kicks in and changes the state to UNKNOWN. Any thoughts on what I’m doing wrong?

template Service "generic-service" {
  max_check_attempts = 5
  check_interval = 1m
  retry_interval = 30s
  enable_perfdata = false
}

object TimePeriod "0900to0915" {
  ranges = {
    "monday"    = "09:00-09:15"
    "tuesday"   = "09:00-09:15"
    "wednesday" = "09:00-09:15"
    "thursday"  = "09:00-09:15"
    "friday"    = "09:00-09:15"
    "saturday"  = "09:00-09:15"
    "sunday"    = "09:00-09:15"
  }
}

apply Service "test_service" {
  import "generic-service"
  check_command = "dummy"
  
  enable_active_checks = true
  enable_passive_checks = true
  
  check_interval = 24h
  max_check_attempts = 1
  check_period = "0900to0915"
  
  vars.dummy_state = 3
  vars.dummy_text = {{
    return "No check results received."
  }}
}

moreamazingnick · September 12, 2023, 9:12am

vars.dummy_state = 3 is the return code for UNKNOWN.

Between 09:00 and 09:15 the active dummy check will run and, in your case, return 3 (UNKNOWN) every minute, so about 14 times in this interval.

In your case I would do something like this:
https://icinga.com/docs/icinga-2/latest/doc/08-advanced-topics/#check-result-freshness
and adapt the return code based on the age of the last check result.
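As a rough, untested sketch of the pattern on that page: the check_interval itself acts as the freshness threshold, because each incoming passive result pushes the next active check out by one check_interval. The 31h 15m value is my assumption (24h between cron runs plus the 7h 15m from 02:00 to 09:15), and the assign rule with host.vars.passive_test is hypothetical.

```
apply Service "test_service" {
  import "generic-service"
  check_command = "dummy"

  enable_active_checks = true
  enable_passive_checks = true

  /* Freshness threshold (assumption): 24h between cron runs
   * plus 7h 15m, so a result missing at 02:00 is flagged
   * around 09:15 the same day. No check_period is used here. */
  check_interval = 31h + 15m

  /* Go to a hard state on the first stale result. */
  max_check_attempts = 1

  /* State and output used only when no fresh result arrived. */
  vars.dummy_state = 3
  vars.dummy_text = "No check results received."

  /* Hypothetical assign rule, replace with your own. */
  assign where host.vars.passive_test == true
}
```

With this approach the dummy check should only run when no passive result has arrived within the interval, so the static state 3 is enough.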

robvan · September 12, 2023, 9:38am

Thanks for the response, but I actually want it to return UNKNOWN if freshness fails. I’m experiencing a few things I don’t understand:

Status changing to UNKNOWN well before the check_period
Status changing to UNKNOWN when check_interval hasn’t even come close to elapsing

Maybe Icinga can’t do what I want it to do, but essentially: when a result is sent to Icinga, the check_interval clock should start. When check_period is reached, the freshness check should execute dummy only if no result has been posted within the check_interval.

rivad · September 13, 2023, 5:18pm

Did you send a TTL and mess up the freshness by doing so?
Sorry, scrap that but keep it in mind as it can mess up your scheduling!

@moreamazingnick is right, it will not work the way you think.
You need to calculate vars.dummy_state via the DSL. This code from the link above could give you some ideas:

{{
  var service = get_service(macro("$host.name$"), macro("$service.name$"))
  var lastCheck = DateTime(service.last_check).to_string()

  return "No check results received. Last result time: " + lastCheck
}}

This code is for vars.dummy_text, but you can adapt it for vars.dummy_state: check whether service.last_check is more than 7h 15min in the past, and return the number 3 if it is, or the number 0 if it is newer.
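Something along these lines might work as a starting point. It is an untested sketch: the max_age name is mine, and the 7h 15m threshold assumes the cron job reports at 02:00 and the check runs at 09:15.

```
vars.dummy_state = {{
  /* Look up the service this check is being executed for. */
  var service = get_service(macro("$host.name$"), macro("$service.name$"))

  /* Age of the most recent check result, in seconds. */
  var age = get_time() - service.last_check

  /* Assumed threshold: 02:00 cron run to 09:15 check window. */
  var max_age = 7h + 15m

  if (age > max_age) {
    return 3 /* UNKNOWN: no recent result */
  }

  return 0 /* OK: a result arrived recently */
}}
```

One caveat, as far as I understand it: the result of the dummy check is itself stored as the last check result, so returning 0 would replace a WARNING sent by the cron job with OK and reset last_check. You may want to return the service's current state instead of 0 in the fresh case.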

Icinga has a built-in programming language. I highly doubt it can’t do what you want, but maybe not the way you want.

robvan · September 14, 2023, 4:40am

Well, the following day we didn’t experience the same issue, even though the status and results were sent the same way. The ongoing theory is that it was due to retry_interval, so I’ve set it to 24h as well.

moreamazingnick · September 14, 2023, 5:12am

If you still have max_check_attempts set to 5, you need 5 cycles to reach a hard state.
If you set the retry interval to 24h, this would mean you will recognise an error only after about 5 days:

check attempt 1 → Softstate critical
passive checkresult → hardstate OK
24h
check attempt 1 → Softstate critical
24h
check attempt 2 → Softstate critical
24h
check attempt 3 → Softstate critical
24h
check attempt 4 → Softstate critical
24h
check attempt 5 → Hardstate critical → notification

robvan · September 14, 2023, 8:55am

Thanks for the reply. I’m actually overriding the default from the template with max_check_attempts = 1 in the service.