Service Next Check is always late (negative value)

dargx · May 20, 2022, 10:39am

Hi guys. I’ve been reading lots of posts but I’m unable to find a solution to this issue.

I just want to check a service once a day between 14:30 and 15:30. However, it never gets checked. The service check is always late: “Next Check” always shows a negative value.

These are my related config files (Icinga2 version is 2.6.2):

Timeperiod config file:

object TimePeriod "1x7" {
  import "legacy-timeperiod"
  display_name = "Time 1x7"
  ranges = {
    "monday" = "14:30-15:30"
    "tuesday" = "14:30-15:30"
    "wednesday" = "14:30-15:30"
    "thursday" = "14:30-15:30"
    "friday" = "14:30-15:30"
    "saturday" = "14:30-15:30"
    "sunday" = "14:30-15:30"
  }
}

Service config file:

apply Service "Service-1x7" {
  import "generic-service"
  import "generic-service-vars"
  check_command = "command.to.execute"
  assign where "virt.datacenter" in host.templates
  check_interval = 1h
  check_period = "1x7"
}

Thanks in advance for any comment.

Regards.

rivad · May 20, 2022, 1:09pm

Maybe use a cronjob/scheduled task to execute command.to.execute and submit the result via the API to a Service that is configured as a passive check with freshness.
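A minimal sketch of what such a cron-driven submitter could look like, assuming the Icinga 2 REST API is enabled and using its `process-check-result` action. The URL, API user, password, CA path, host name and plugin command line below are all hypothetical placeholders, and the passive Service with freshness still has to be defined on the Icinga side.

```python
import base64
import json
import ssl
import subprocess
import urllib.request

# Hypothetical values: adjust to your own setup.
API_URL = "https://icinga.example.com:5665"
API_USER = "cron-submitter"
API_PASSWORD = "secret"
CA_FILE = "/path/to/icinga2/ca.crt"


def run_plugin(cmd):
    """Run the check plugin; return (exit code, first chunk of its stdout)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout.strip()


def build_request(api_url, user, password, host, service, exit_status, output):
    """Build the POST request for the process-check-result API action."""
    payload = {
        "type": "Service",
        "filter": f'host.name=="{host}" && service.name=="{service}"',
        "exit_status": exit_status,  # 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN
        "plugin_output": output,
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{api_url}/v1/actions/process-check-result",
        data=json.dumps(payload).encode(),
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder command line standing in for "command.to.execute".
    status, text = run_plugin(["/path/to/command.to.execute"])
    req = build_request(API_URL, API_USER, API_PASSWORD,
                        "vm-host-01", "Service-1x7", status, text)
    ctx = ssl.create_default_context(cafile=CA_FILE)
    with urllib.request.urlopen(req, context=ctx) as resp:
        print(resp.read().decode())
```

Run from cron at, say, 14:45 every day, this would take the timing out of Icinga's scheduler entirely; Icinga then only has to notice when no result arrived.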

dargx · May 20, 2022, 1:15pm

Thanks for the hint Dominik.

However, it’s hard for me to understand how an apparently easy configuration can be so hard to get working.
I’ve been thinking I must be making some kind of configuration mistake …

Regards.

rivad · May 20, 2022, 2:43pm

Checking only once per day isn’t easy, because the scheduler has some jitter and drift. With a 1h window, checking every 30min can give you 1 or 2 check results, and checking every 1h can give you 1 or 0.
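To make the counting argument concrete, here is a toy simulation. It is not Icinga's real scheduler: it simply assumes every check runs `interval` minutes plus a small random delay after the previous one, and counts how many checks per day land in the 14:30-15:30 window. The function name and the 1-minute maximum delay are made up for illustration.

```python
import random

WINDOW_START = 14 * 60 + 30  # 14:30, in minutes since midnight
WINDOW_END = 15 * 60 + 30    # 15:30 (exclusive)
DAY = 24 * 60                # minutes per day


def hits_per_day(interval_min, max_delay_min, days, seed):
    """Return, for each simulated day, the number of checks inside the window.

    Toy model (an assumption, not Icinga's algorithm): each check happens
    interval_min plus a random delay of 0..max_delay_min minutes after the
    previous one, so the schedule slowly drifts across the day.
    """
    rng = random.Random(seed)
    counts = [0] * days
    t = rng.uniform(0, interval_min)  # random phase of the very first check
    while t < days * DAY:
        day, minute_of_day = divmod(t, DAY)
        if WINDOW_START <= minute_of_day < WINDOW_END:
            counts[int(day)] += 1
        t += interval_min + rng.uniform(0, max_delay_min)
    return counts


if __name__ == "__main__":
    for interval in (30, 60):
        counts = hits_per_day(interval, max_delay_min=1.0, days=365, seed=1)
        seen = sorted(set(counts))
        print(f"interval={interval}min: checks per day inside window: {seen}")
```

Under this model a 60-minute interval can never put two checks into the 60-minute window, but a check that lands just before 14:30 pushes the next one past 15:30, so a day with zero checks is possible. A 30-minute interval always yields at least one check and at most two.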

Outside of the check window, “Next Check” is also negative on my system.

That is why, for infrequent and precisely timed checks, I recommend using an external scheduler, either to submit the check result via the API or to send a “check now” command, which avoids the drift.

BTW, the jitter and drift are intentional, as the scheduler is supposed to space out the checks as much as possible to even out the load.

poing · May 23, 2022, 3:54am

Like @rivad said, 1h check_interval in a 1h check_period is hit-or-miss.

The check_command is being run, but it’s 50:50 whether the response will be received within the check_period.

I would set a 24h check_interval, limit it to 1 check a day, and give it a few retries within that 1h window. The below should get you what you’re looking for.

apply Service "Service-1x7" {
  import "generic-service"
  import "generic-service-vars"
  check_command = "command.to.execute"
  assign where "virt.datacenter" in host.templates
- check_interval = 1h
+ check_interval = 24h
+ max_check_attempts = 1
+ retry_interval = 10m
  check_period = "1x7"
}

rivad · May 23, 2022, 8:03am

Welcome @poing,

Do retries help getting out of the overdue state? In other words will the check run in the first 10min of the 1h window?

poing · May 23, 2022, 9:20am

Technically no. The check_interval = 24h will force freshness and help get out of the overdue status quicker.

max_check_attempts limits it to 1 check, but a state change triggers retry_interval.

retry_interval is more about handling a state change: because it rechecks every 10m, you could clear any issue during the check period. It is best used together with max_check_attempts.

bkai · May 24, 2022, 6:46pm

Hi @dargx - there may be other causes for your negative “next check” times. Check out my first post on this forum from quite a while ago: Too many late service checks

Good luck, in any case!

dargx · May 25, 2022, 9:44am

Thanks a lot for your help guys. @poing, your suggestion worked like a charm

@bkai, your thread was the first one I had a look at when I found the issue

Regards.

bkai · May 25, 2022, 10:54am

(Great that your problem has been solved. BTW, it’s usual to mark the oldest post that contains the solution, so that it is shown at the top of the thread, right after your 1st post. Among other things, it tells others that they no longer have to try to help you.)