aNAG notifications on soft state fails

fraouf · June 14, 2019, 11:19am

I hope I’m posting this in the correct section and providing enough info.

I’m being driven mad by my monitoring client, aNAG for Android, notifying me whenever a monitored service experiences a soft-state failure, i.e. when check 1 (out of 5 in my case) is above the warning or critical threshold that I’ve set. I do not get notification emails from Icinga2 when this happens - it is only aNAG that notifies me.

I can’t figure out if it is a bug in aNAG, or an error in the configuration I’ve used for the API user I’ve set up for aNAG, a configuration error I’ve made in aNAG, or some other configuration error I might have made in icinga2.

These soft failures are visible in Icinga2 Web. They are real events that Icinga2 correctly notices and shows if you happen to be looking. But they last only very briefly and I do not want to be warned about them through aNAG. That’s what soft states are for, right? If we reach the number of checks I’ve set (5) THEN I expect a hard fail and to be notified. But not before.

aNAG has a specific setting, “Don’t fetch soft state services”, which is normally ticked. I’ve tried it both ticked and unticked, but it makes no difference.

I’m using 2.6.3 on CentOS 7.

Please can someone suggest some configuration element I might look at?

In api-users.conf, for the user I’ve created specifically for aNAG, after filtering only the hosts and services I want visible based on the value of host.vars.anaghost and service.vars.anagservice, I only have:

permission = "actions/*"

Should there be more? Or less, maybe? Or maybe this has nothing to do with my problem?

In my service template, I have:
max_check_attempts = 5
check_interval = 1m
retry_interval = 30s
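
For reference, here is a rough sketch of how long a problem stays soft with these values, written in the Icinga 2 config language. It assumes that the first failing check counts as attempt 1 and that every following recheck runs at retry_interval. The variable names are made up for illustration, and check execution time is ignored.

```
// Values copied from the service template above.
var attempts = 5   // max_check_attempts
var retry = 30s    // retry_interval; duration literals evaluate to seconds

// Attempt 1 is the first failing check; attempts 2 to 5 follow at retry_interval.
// Under that assumption the hard state is reached after (5 - 1) * 30s = 120 seconds.
var seconds_until_hard = (attempts - 1) * retry
```

Under these assumptions a failure has to persist for roughly two minutes before it turns hard. Anything shorter should stay soft.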

I believe there’s a “volatile” setting that can cause a problem like the one I’m experiencing, but this is definitely not the case in my config. I have never knowingly set it, and grepping the config does not find the word at all.

That’s all I can think of to offer in terms of my configuration. Maybe it will give someone an indication of where I’ve not looked, or of things I’ve not thought of changing?

I’ve tried searching for aNAG and for “softfail”, “soft fail” and “soft state”, to no avail.

dnsmichi · June 18, 2019, 7:33am

Hi,

it would be interesting to know which queries and attributes are used by aNag. To my knowledge, the source is not available, so it’ll be a little harder to extract that - for example via the debug log, getting the request body too.

The permissions required for fetching the objects are not inside the actions/* tree, but in objects/query/* - see the permissions table in the Icinga 2 API documentation. That may already be the problem.
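
A minimal sketch of what such an ApiUser could look like, assuming the custom variables host.vars.anaghost and service.vars.anagservice mentioned above are used as filters. The user name "anag" and the password are placeholders, and which permissions aNag actually needs is an assumption, since its queries are not known.

```
object ApiUser "anag" {
  password = "CHANGE_ME"  // placeholder, not a real credential

  permissions = [
    // Read access, limited to the objects flagged for the app.
    {
      permission = "objects/query/Host"
      filter = {{ host.vars.anaghost }}
    },
    {
      permission = "objects/query/Service"
      filter = {{ service.vars.anagservice }}
    },
    // Actions such as acknowledging problems or rescheduling checks.
    "actions/*"
  ]
}
```

The query permissions cover reading objects. The actions/* entry is kept only so that acknowledging and rescheduling from the app keep working.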

Cheers,
Michael

fraouf · June 18, 2019, 1:31pm

Thanks Michael.

I think I had to add actions/* to allow aNAG to acknowledge problems, reschedule checks, etc. Maybe I’m mistaken. I will investigate.

I will see about trying to get some logs. I’ve also been trying to reach out to the aNAG author.

In the meantime, does anyone have any recommendations for another good monitoring app for Android that I might try?

Creamers158 · September 16, 2020, 5:48pm

Hi, I think it’s a good question. I would expect more questions about the mobile side of Icinga2.
Also, I’m curious what others on this forum are using to get live notifications on mobile (not talking about mails ;)
@fraouf did you find an alternative that is just as good as aNag?

fraouf · September 16, 2020, 6:35pm

Oh my god, you scared me. Only yesterday I disabled the work-around I had in place to get around this (I had enabled the T3 feature with 3min delay - not ideal but it worked). Then this message came through and…well…my mind boggled.

Anyway, I was able to contact the very helpful developer, who pushed an update that was supposed to resolve the softfail issue. It was some considerable time ago. I did not take any action at the time in terms of disabling the workaround I mentioned and like I say, I only got round to doing it today.

I’ve not had any alerts on softfails so far, but then maybe I’ve not had any softfails yet.