Delay state change from warning to critical

Oliveter · September 5, 2019, 2:44pm

Hi,

I’m new to Icinga, but I like it a lot so far. There are a lot of new things to learn.

I want the state change from warning to critical to behave similarly to a change from OK to something else. When leaving the OK state we have max_check_attempts and a soft state: we can define, for example, 3 attempts and a retry interval of 1 minute before the state becomes a hard warning.
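As a minimal sketch of the settings described above (the host name, service name and check command are placeholders for illustration, not taken from a real setup):

```
object Service "example-service" {
  host_name = "example-host"      // hypothetical host
  check_command = "ping4"         // placeholder check command

  check_interval = 5m             // interval used in OK and HARD states
  retry_interval = 1m             // interval used while the state is SOFT
  max_check_attempts = 3          // the third consecutive non-OK result makes the state HARD
}
```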

Is it possible to have the same behavior when going from warning to critical state?

Regards,
Erik

dnsmichi · September 5, 2019, 4:20pm

Hi,

you can turn any state change into a HARD one by enabling the volatile setting. I’m not sure, though, whether this really satisfies your need. What’s the idea behind that warning-to-critical window?

Cheers,
Michael

Oliveter · September 5, 2019, 4:47pm

Hi,

The idea is to get the notification when the state changes from warning to critical only after a number of checks, not instantly. For example, if a service is in warning and the next check returns critical, a notification is sent immediately. I would like Icinga to check a few more times before sending it. By then the state may have gone back to warning, in which case the critical notification would never be sent. I hope that makes sense.

From my understanding the volatile option does not help with this?

I’m very new to Icinga so perhaps I’m missing something obvious here.

Thanks.

Regards,
Erik

dnsmichi · September 6, 2019, 7:11am

Hi,

no, then forget about volatile. I don’t think this is supported natively, but maybe you can build such logic using event handlers. These are fired during SOFT states as well, so they can detect the state changes and keep count themselves. That said, you would need a script with its own state machine and storage … not the average beginner’s task.

Regarding the chain: the notification is only sent once the HARD state is reached. That happens when max_check_attempts is finally hit, e.g. check 3 of 3. If the service changed from warning to critical and back to warning before that, and is still warning when the HARD state is reached, only the warning notification is sent.

If you then have notification state filters which exclude warning, you won’t get an alert at that point. You would only get one if the service were critical at that very moment.
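A rough sketch of how such an event handler could be wired up. The script path and its option names are hypothetical; the script itself, with its counting logic and storage, would still have to be written. The service definition is a placeholder as well.

```
object EventCommand "track-warn-to-crit" {
  // Hypothetical script that keeps its own counter per host/service
  command = [ "/usr/local/bin/track_warn_to_crit" ]

  arguments = {
    "--host" = "$host.name$"
    "--service" = "$service.name$"
    "--state" = "$service.state$"            // OK, WARNING, CRITICAL or UNKNOWN
    "--last-state" = "$service.last_state$"  // state before the current one
    "--state-type" = "$service.state_type$"  // SOFT or HARD
    "--attempt" = "$service.check_attempt$"
  }
}

object Service "example-service" {
  host_name = "example-host"            // hypothetical host
  check_command = "ping4"               // placeholder check command
  event_command = "track-warn-to-crit"  // run the handler on state changes
}
```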

Does that clear things up a little?

Cheers,
Michael

log1c · September 6, 2019, 7:45am

Would setting a notification delay satisfy your needs? With this you can define a delay, let’s say 15m, before the notification is sent. A check with a 5m interval then runs roughly three times between the change to the HARD state and the sending of the notification.

Example:

apply Notification "mail-service-notification-autoticket-CHECKS_15m-Delay" to Service {
    times = {
        begin = 15m
    }
    command = "mail-service-notification-autoticket"
    interval = 0s
    period = "24x7"
    assign where service.name == "this" || ... and many more
    states = [ Critical, OK ]
    types = [ Custom, Problem, Recovery ]
    users = [ "Autoticket" ]
}

Oliveter · September 6, 2019, 8:47am

Thanks for your answers, I understand the problem a bit better now.

Michael, I think I could probably build something like that myself. I would have to learn a bit more about Icinga first, but that could be fun.

log1c, I actually thought about the notification delay myself, but then I run into the issue that when the state goes from OK to critical, the notification is delayed for longer than I want. Thanks for the idea though.

Would it be possible to add something like this to the notification? But then I guess the notification would be sent even if the state has been changed back to OK or WARNING when the delay window is over?

apply Notification "mail-service-notification-autoticket-CHECKS_15m-Delay" to Service {
    if ( $service.last_state$ == "WARNING") {
        times = {
            begin = 15m
        }
    }
    command = "mail-service-notification-autoticket"
    interval = 0s
    period = "24x7"
    assign where service.name == "this" || ... and many more
    states = [ Critical, OK ]
    types = [ Custom, Problem, Recovery ]
    users = [ "Autoticket" ]
}

Regards,
Erik

Oliveter · September 6, 2019, 8:53am

Wouldn’t this send the notification even if the state has changed back to OK after 5 minutes?

Regards,
Erik

log1c · September 6, 2019, 9:40am

Yes and no. It will create the OK notification, which is needed for resetting the notification state, so that there will be further notifications for warning/critical problems.

I myself don’t send out these OK notifications; I configure the user objects with just the warning/critical states and the problem type.
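A minimal sketch of such a user object. The user name is borrowed from the notification example earlier in the thread and the email address is a placeholder; adjust both to your setup.

```
object User "Autoticket" {
  email = "autoticket@example.com"   // placeholder address

  states = [ Warning, Critical ]     // no OK, so recovery mails are filtered out
  types = [ Problem ]                // problem notifications only
}
```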

Oliveter · September 6, 2019, 1:15pm

Great, thanks. Then this is solved.
Thanks for your help!

Regards,
Erik

pparent76 · May 31, 2021, 2:58pm

@Oliveter

I have the exact same problem as you had. Unfortunately, with the information you gave I have not managed to solve it. Adding an “if” to my config file as you wrote above does not seem to work.

Can you sum up how you overcame the problem?

Thanks,
Pierre.

Oliveter · June 2, 2021, 9:35pm

I don’t think I have this in use currently, and I don’t remember whether it worked or not.
I believe the example above is not correct as written, though, if you copied and pasted it.

I think you need to use “service.last_state” without the $ signs for it to work. If you don’t get any errors when applying the config, I don’t see why it shouldn’t work.
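For reference, a version of the snippet without the $ signs might look like the following. It is an untested sketch: last_state is compared against the numeric state 1 (WARNING), and the assign rule is shortened to a single name. One caveat to verify before relying on it: apply rules are evaluated when the configuration is loaded, so the condition may not follow state changes at runtime.

```
apply Notification "mail-service-notification-autoticket-CHECKS_15m-Delay" to Service {
  // Assumption to verify: this condition is evaluated at config load,
  // not on every state change
  if (service.last_state == 1) {   // 1 = WARNING
    times = {
      begin = 15m
    }
  }

  command = "mail-service-notification-autoticket"
  interval = 0s
  period = "24x7"
  states = [ Critical, OK ]
  types = [ Custom, Problem, Recovery ]
  users = [ "Autoticket" ]

  assign where service.name == "this"   // the original list of names is elided
}
```

Running the config validation (icinga2 daemon -C) before reloading should at least show whether the syntax is accepted.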