Acknowledgment auto remove on state change

jens.bavenmark · December 3, 2020, 8:56am

Hi.

I have a question about acknowledgments that should be removed when a monitoring point changes state. This works well for alerts that go from critical/warning to OK.

But it does not work when a monitoring point goes from warning to critical. This is probably because moving from warning to critical isn't a proper state change, since both are problem states.
I just want to make sure that this is the way it is supposed to work and that I don't have a problem with my setup.

And if this is in fact the way it is meant to work, should it?
Wouldn't it be good if the acknowledgment was removed when going from a warning to a critical state? I presume that, for most of us, critical is when we need to act. Warnings are a good way to know that we need to look into something, but critical means "fix this now".
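To make the two behaviours being compared easier to see, here is a minimal Python sketch. `State`, `clear_on_recovery` and `clear_on_recovery_or_escalation` are hypothetical names for illustration only, not Icinga code or API. The first function models what this thread describes (the acknowledgment is only dropped on recovery to OK); the second models the requested option, where an escalation to a more severe state also drops it.

```python
from enum import IntEnum


class State(IntEnum):
    """Service states ordered by severity (hypothetical model, not Icinga's API)."""
    OK = 0
    WARNING = 1
    CRITICAL = 2


def clear_on_recovery(old: State, new: State) -> bool:
    """Behaviour described in this thread: the acknowledgment is only
    dropped when the check returns to OK, so WARNING -> CRITICAL keeps it."""
    return old != State.OK and new == State.OK


def clear_on_recovery_or_escalation(old: State, new: State) -> bool:
    """Requested option (hypothetical): also drop the acknowledgment when
    the problem becomes more severe, e.g. WARNING -> CRITICAL."""
    return old != State.OK and (new == State.OK or new > old)


if __name__ == "__main__":
    transitions = [
        (State.WARNING, State.OK),
        (State.CRITICAL, State.OK),
        (State.WARNING, State.CRITICAL),
        (State.CRITICAL, State.WARNING),
    ]
    for old, new in transitions:
        print(f"{old.name} -> {new.name}: "
              f"recovery only={clear_on_recovery(old, new)}, "
              f"recovery or escalation={clear_on_recovery_or_escalation(old, new)}")
```

The only transition where the two policies disagree is WARNING -> CRITICAL, which is exactly the case raised here; de-escalation (CRITICAL -> WARNING) keeps the acknowledgment under both.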

And I know that I could simply not acknowledge warnings and that would solve it, but since we create Jira tickets (with the great plugin) and acknowledge the alerts at the same time, I can easily see which alerts haven't been handled.

stevie-sy · December 3, 2020, 10:26am

Hi and welcome!

Yes, this is correct.

I think this is a matter of definition and not the same for every Icinga user or company. Much depends on how strictly or generously the thresholds are set. You could set the warning threshold so that it shows you a problem is just beginning, or you could set it at a point where it is already a problem for you. We, for example, very often set the thresholds so that they show a problem that could be "growing", and then we observe it, for example by looking at the Grafana graph to see how fast the value is increasing. Other people will handle these cases in different ways.

For this we also have our own solution. We can also create tickets for our internal specialists, depending on the problem (network, server, etc.). But we don't do this automatically, because it sometimes happens that a colleague forgets to tell us or to create a downtime while doing maintenance on a device.

jens.bavenmark · December 3, 2020, 10:47am

Hi Stevie.

Thanks for your reply. Now I know that it's working as intended, and I will adjust our routines accordingly.

I understand your point, and usually this is not an issue, as the warning threshold is normally set so that we receive the alert long before it becomes a problem.

But take a disk check, for example. Let's say we set warning at 60% full and critical at 85%. That would give us plenty of time to rectify the issue, and, as you say, we would look at the graphs and determine how time-sensitive it is. But that doesn't help if the disk for some reason spikes to 95% because a user does something they shouldn't. In this case, if the warning had been acknowledged, the critical wouldn't trigger any notifications. It might seem like an edge case, but there is a reason I'm using it as an example.
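To make the disk example concrete, here is a small Python sketch that classifies readings against the 60% / 85% thresholds from the post. `disk_state` and the "equal to the threshold counts as reaching it" rule are illustrative assumptions, not how any particular check plugin is implemented.

```python
from enum import IntEnum


class State(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


# Thresholds from the example above, in percent of disk space used.
WARN_AT = 60.0
CRIT_AT = 85.0


def disk_state(used_percent: float,
               warn: float = WARN_AT,
               crit: float = CRIT_AT) -> State:
    """Map a disk usage reading to a state.

    Assumption: a reading equal to a threshold counts as reaching it; real
    check plugins may compare differently. The critical threshold is tested
    first so the more severe state wins."""
    if used_percent >= crit:
        return State.CRITICAL
    if used_percent >= warn:
        return State.WARNING
    return State.OK


if __name__ == "__main__":
    # A slow climb passes through WARNING; the user-caused spike to 95%
    # jumps straight from WARNING to CRITICAL.
    for reading in (55.0, 62.0, 70.0, 95.0):
        print(f"{reading:.0f}% used -> {disk_state(reading).name}")
```

The 95% reading comes in as CRITICAL, but with the behaviour confirmed in this thread, an acknowledgment set while the disk was at 62% or 70% (WARNING) is still in place, so that CRITICAL goes out silently.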

In the previous monitoring system we used, changes between warning and critical were treated as state changes (which I then assumed would be the same here, but you know what they say about assuming).

But I'll adapt. Now you know that there is at least one company using Icinga that would like that option.

And also, thanks Icinga for a great presentation last night.

stevie-sy · December 3, 2020, 11:45am

We still do the same with disk checks in our company. In our case it is very often a huge log file or our server admin installing Windows updates. If it's the latter, we ignore it for a while.

About notifications: you can adjust which notifications everyone receives, for example only critical ones, or warnings as well. Maybe this helps a little.
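The per-recipient filtering can be modelled in a few lines of Python. This is a hypothetical sketch: `Recipient` and `problem_recipients` are made-up names that loosely mirror the `states` filter on Icinga 2 `Notification` objects, not a reproduction of Icinga's notification logic. It also shows why the filter alone doesn't cover the case discussed here: as described in this thread, once a problem is acknowledged no further problem notification goes out, so a recipient filtered to CRITICAL never hears about the escalation.

```python
from dataclasses import dataclass
from enum import IntEnum


class State(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class Recipient:
    name: str
    # Problem states this recipient wants to be notified about
    # (hypothetical; loosely modelled on a per-notification state filter).
    states: frozenset[State]


def problem_recipients(state: State, acknowledged: bool,
                       recipients: list[Recipient]) -> list[str]:
    """Return the names of recipients that get a problem notification.

    An acknowledged problem sends no further problem notifications,
    regardless of what each recipient's state filter allows."""
    if state == State.OK or acknowledged:
        return []
    return [r.name for r in recipients if state in r.states]


if __name__ == "__main__":
    team = [
        Recipient("firstline", frozenset({State.WARNING, State.CRITICAL})),
        Recipient("oncall", frozenset({State.CRITICAL})),
    ]
    print(problem_recipients(State.WARNING, False, team))
    print(problem_recipients(State.CRITICAL, False, team))
    # First line acknowledged the WARNING; the escalation reaches nobody.
    print(problem_recipients(State.CRITICAL, True, team))
```

The design choice worth noting is that the acknowledgment check happens before the state filter, which is why tuning the filter cannot route an acknowledged escalation to on-call.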

Yes, @theFeu and @bsheqa did a great job with the presentation. And of course the talks from @tgelf and @cstein were great too.

jens.bavenmark · December 3, 2020, 1:51pm

Hi again

Yes, that is set up, but since first line has acknowledged the warning, it doesn't go to on-call when it hits critical. But I guess I need to make first line actually fix the issue before it hits critical.