CRITICAL states changing content and saving history

Hi,
I have written a custom check in bash that reads status data provided as JSON by a service endpoint.
Basically, it checks some statuses and returns, for example:
“CRITICAL - Main status is DOWN and there are 1 modules reporting DOWN”
or
“CRITICAL - Main status is DOWN and there are 4 modules reporting DOWN”
plus some perf data.
In the web panel I see that states are only recorded when they change from one to another, plus some notifications.
Is it possible for Icinga to record when the output of a CRITICAL state changes while the state itself stays CRITICAL?

By content, do you mean the output? Or just a history of changes?

It sounds like you’re using the API. That can pull things like last state change, last state, output, etc. It’s only the immediate preceding check, though.

I would recommend logging into your IDO database and running the following:

SELECT * FROM icinga_notifications;

And just browse this a bit. This will likely only be showing you things that went into HARD state, though.

Can I know a little more detail about your check? I’ve written some similar things.

By content I mean the output.

About the API: I don’t think that will help. I have logged into IDO and run the following:

select state_time, state,output from icinga_statehistory where object_id = 3942;

and it showed only UNKNOWN entries and then a single change to CRITICAL.
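The query above selects only the output column. A minimal sketch of a variant that also returns long_output and state_type (SOFT/HARD) is below. statehistory_query is a hypothetical helper name, and the column names are assumed from the standard IDO schema. As far as I know, icinga_statehistory only gets a row when the state itself changes, so a change in output while the service stays CRITICAL would not show up here either way.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a state-history query for one IDO object id.
# Column names are assumed from the standard IDO schema.
statehistory_query() {
    local object_id="$1"
    # Accept only digits so the value is safe to embed in the SQL text.
    case "$object_id" in
        ''|*[!0-9]*) echo "object_id must be numeric" >&2; return 1 ;;
    esac
    cat <<SQL
SELECT state_time, state, state_type, output, long_output
FROM icinga_statehistory
WHERE object_id = ${object_id}
ORDER BY state_time;
SQL
}
```

The printed query can then be piped into whatever database client you use for IDO, for example `statehistory_query 3942 | mysql -u icinga -p icinga` (user and database name here are assumptions).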

The thing is that the CRITICAL output can change, and I would like to make that data available to the customer if needed.

About the check: it’s a simple bash script with 4 functions (88 lines) that uses ‘jq’ and ‘curl’. I use NRPE v3 to gather the data for the Service from the Host.
Please let me know if you would like to know more.

Well, we could take a couple approaches here.

Since notification history holds output, does this check go into hard state right away? You could set up a dummy notification for the sake of recording in that table.

Are there any details other than the number of modules being down? From what I’ve seen so far (which given 4 functions probably isn’t even the half of it), I’d try to get more out of my graphing solution for this.

The ‘dummy notification’ sounds reasonable. There are 3 checks (each 10 minutes apart) for SOFT and then it goes into HARD state.

Yes, you are right: there are more details than the number of modules. Specifically, there are details for each module, in a form that most closely resembles JSON.

I have a graphing solution (the Graphite module) which gives me the number of modules_down, _up and _reporting.

Okay, so sounds like you’re collecting more metrics than module1, module2, etc. I mean you can totally blow it up but sounds potentially super messy.

For the long run, though, if you’re writing a lot of custom stuff, I added a database called icingatrash and I dump all kinds of stuff in there so I can store and report on it however I want.

Forgot to mention that details for the module are only stored for modules in DOWN state.

Can you please elaborate on the ‘icingatrash’ approach?

That is not a standard naming convention or anything. Writing custom monitoring is one of my primary job functions so I just wanted a place to store things. I generally query Icinga’s API or IDO for retrieving info from Icinga itself though. More of programmer advice than Icinga.

I’d still probably try to leverage Graphite here though. Also, have you looked into the Business Process module? I was going to start tinkering with that soon, but it tracks groups of checks that are alerting in a dashboard and pages accordingly. It doesn’t replace some of the custom checks I’ve written that behave similarly (and might not replace this one), but it’ll probably add a lot of benefit.

Actually, are you comfortable posting a slightly scrubbed version of one of the more verbose output messages?

I think so:

CRITICAL - Main status is DOWN and there are 1 modules reporting DOWN

xxx1 is DOWN because { "cei": { "status": "UP", "connection": { "status": { "status": "UP" }, "uri": "domain1.com", "socketTest": { "status": "UP" } } }, "sub_xxx1": { "status": "DOWN", "connection": { "status": { "status": "DOWN" }, "uri": "domain2.com", "socketTest": { "status": "UP" }, "sslSocketTest": { "status": "DOWN", "error": "java.security.cert.CertificateException: No name matching domain2" } } } } -

That’s pretty cool, actually.

So, as I’m following now, you want to be able to retrieve that status at any given time?

Honestly, I might not lean on Icinga at all for storing that part of the info. I’d definitely use it for tracking, but digging through IDO for the details sounds cumbersome, especially if we only get them recorded by telling the check to skip soft states.

Have you considered adding a folder under /var/log/icinga2 for that, and just having bash save the output to a timestamped .json file?
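Something along these lines could work as a rough sketch. save_check_output, the ARCHIVE_DIR variable and the file naming scheme are all hypothetical, and the directory would need to be writable by whichever user runs the check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: archive one check payload as a timestamped .json file.
# ARCHIVE_DIR and the "status-<UTC timestamp>.json" naming are assumptions.
ARCHIVE_DIR="${ARCHIVE_DIR:-/var/log/icinga2/module-status}"

save_check_output() {
    local payload="$1"
    local stamp file
    # UTC timestamp so file names sort chronologically.
    stamp="$(date -u +%Y%m%dT%H%M%SZ)"
    file="${ARCHIVE_DIR}/status-${stamp}.json"
    mkdir -p "$ARCHIVE_DIR" || return 1
    printf '%s\n' "$payload" > "$file" || return 1
    # Print the path so the caller can log or reference it.
    printf '%s\n' "$file"
}
```

Inside the check you would call it with the JSON you already have from curl, e.g. `save_check_output "$json"`. Keep in mind that with NRPE the script runs on the monitored host, so the files would end up there rather than on the Icinga server, and with second-resolution timestamps two runs in the same second would overwrite each other.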

So, potentially wrong rabbit hole and I think I’ve given some crap advice this week.

I must have had brain fog on how icinga_statehistory actually works. Were you saying the “output” column only said UNKNOWN or CRITICAL and nothing else, while the actual output you see in Icinga is the JSON object? Did you check long_output?
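For context on the long_output question: as far as I know, Icinga 2 stores the first line of plugin output in output and any remaining lines in long_output. A minimal sketch of a result laid out that way is below. emit_critical is a hypothetical helper, and the modules_down perfdata label is assumed from the Graphite metric name mentioned earlier, not taken from the actual script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a CRITICAL result in the usual plugin layout.
# Line 1: summary, then "|" and perfdata. Following lines: the details.
emit_critical() {
    local down_count="$1" details="$2"
    printf 'CRITICAL - Main status is DOWN and there are %s modules reporting DOWN | modules_down=%s\n' \
        "$down_count" "$down_count"
    # Everything after the first line is treated as long output.
    printf '%s\n' "$details"
    return 2   # conventional plugin exit code for CRITICAL
}
```

If the module details are printed on the same line as the summary instead, they would be expected in output rather than long_output.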
