Unknown state removes Acknowledgement from Critical service

ken · March 18, 2020, 10:41pm

This behavior is probably by design, but I wonder if there is a way to change the behavior.

Let’s say a service is Critical and it’s a known problem, so we Ack it with no expiration and put a ticket number in the comment. We expect that when it comes back to the OK state, both the Critical and the Ack will go away.

But in the meantime, the host becomes temporarily unreachable, and the check times out and goes to Unknown. At that point we lose the Ack, and when the host is reachable again, the Critical shows up at the top level without the Ack.

Is there a way to preserve the Ack through these Critical->Unknown->Critical state changes?

rsx · March 19, 2020, 7:43am

I’d recommend adding a dependency that disables service checks while the host is unreachable. This keeps the service in the Critical state, including the Ack.

ken · March 19, 2020, 12:09pm

Hmm. I was thinking of modifying the state transitions, but that could work. The problem is that the service-to-host dependency is already created implicitly, so how do I modify it? I’ve tried to replace it, and I get an error that it already exists.

rsx · March 19, 2020, 12:43pm

I’m simply using this one:

apply Dependency "disable-host-service-checks" to Service {
  disable_checks = true
  assign where true
}

Please be aware that the host has to reach a hard state before any service check runs against it. That means you may need to adapt your check_interval, retry_interval, and/or max_check_attempts.
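
As a rough illustration of that tuning, here is a minimal sketch. The host name, address, service name, and all interval values are made up for the example and are not recommendations. The idea is only that the host retries quickly and needs few attempts, so it can reach a hard state before the slower service check comes around again:

```
// Hypothetical host: short retry_interval and low max_check_attempts.
// After the first failed check, two retries 15s apart are needed,
// so a hard state is reached roughly 30s later (plus check runtime).
object Host "example-host" {
  check_command = "hostalive"
  address = "192.0.2.10"        // documentation address, placeholder

  check_interval = 1m
  retry_interval = 15s
  max_check_attempts = 3
}

// Hypothetical service on that host, checked less often than the host.
apply Service "example-service" {
  check_command = "ping4"

  check_interval = 5m
  retry_interval = 1m
  max_check_attempts = 5

  assign where host.name == "example-host"
}
```

Whether this is enough depends on when the service check happens to be scheduled relative to the outage, so treat it as a starting point to experiment with.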

ken · March 19, 2020, 1:30pm

Thanks.
That’s right, I have elaborated on my setup in this thread: Notifications despite Dependencies

ken · November 29, 2021, 8:28pm

Still an issue, so I created a bug report/enhancement request.