Get number of unhandled critical services from performance data of localhost

krappi01 · March 24, 2022, 5:22pm

Hi Icinga community,

I want to get the total amount of all unhandled critical services from performance data
of localhost icinga service check (that one returning “Icinga 2 has been running for…”)

I tried num_services_problem - num_services_handled - num_services_warning (too few) and
num_services_critical - num_services_acknowledged (too much)

I expect to get the same number that is displayed in the red box in icingaweb2 next to the menu entry “Service Problems”.

Anyone an idea which performance data values I can use to calculate the same exact vlaue?

Example values:
num_services_problem 313.00
num_services_critical 276.00
num_services_warning 9.00
num_services_handled 309.00
num_services_acknowledged 74.00

Expected value from ui = 5

Thanks and regards
Markus

moreamazingnick · March 24, 2022, 7:27pm

|num_services_unknown|0.00|
|num_services_critical|6.00|
|num_services_warning|1.00|
|num_services_acknowledged|2.00|
|num_services_problem|7.00|
|num_services_in_downtime|1.00|
|num_services_handled|3.00|
|num_services_acknowledged|2.00|

In my case, I have:
6 Criticals
2 of them acknowledged
1 Warning, this warning is in downtime
my badge shows 4

As you noticed the badge stands for unhandled criticals.
But you can’t get that out if the icinga performance data since you don’t know if there are warnings handled by an downtime, acknowledged or not.

I would suggest you create a feature request via github for these performance values, since unhandeld_criticals is very usable.

krappi01 · March 24, 2022, 9:44pm

thanks for your quick reply. Then it seems to be impossible right now.
but maybe you know how to get the value from the icinga or icingaweb databse?
Thanks

moreamazingnick · March 24, 2022, 11:17pm

I used the icingacli --showsql to generate the base query and wrapped something around to get the needed output:
run it agains your ido db

Select * from (SELECT so.name1 AS host_name, CASE WHEN hs.has_been_checked = 0 OR
hs.has_been_checked IS NULL THEN 99 ELSE hs.current_state END AS
host_state, hs.output AS host_output, CASE WHEN
(hs.problem_has_been_acknowledged + hs.scheduled_downtime_depth) > 0
THEN 1 ELSE 0 END AS host_handled, hs.problem_has_been_acknowledged AS
host_acknowledged, CASE WHEN (hs.scheduled_downtime_depth = 0) THEN 0
ELSE 1 END AS host_in_downtime, so.name2 AS service_description, CASE
WHEN ss.has_been_checked = 0 OR ss.has_been_checked IS NULL THEN 99 ELSE
ss.current_state END AS service_state, ss.problem_has_been_acknowledged
AS service_acknowledged, CASE WHEN (ss.scheduled_downtime_depth = 0 OR
ss.scheduled_downtime_depth IS NULL) THEN 0 ELSE 1 END AS
service_in_downtime, CASE WHEN (ss.problem_has_been_acknowledged +
ss.scheduled_downtime_depth + COALESCE(hs.current_state, 0)) > 0 THEN 1
ELSE 0 END AS service_handled, ss.output AS service_output, ss.perfdata
AS service_perfdata, UNIX_TIMESTAMP(ss.last_state_change) AS
service_last_state_change FROM icinga_objects AS so
 INNER JOIN icinga_services AS s ON s.service_object_id = so.object_id
AND so.is_active = 1 AND so.objecttype_id = 2
 INNER JOIN icinga_hoststatus AS hs ON hs.host_object_id =
s.host_object_id
 INNER JOIN icinga_servicestatus AS ss ON ss.service_object_id =
so.object_id WHERE (CASE WHEN COALESCE(ss.current_state, 0) = 0 THEN 0
ELSE 1 END = '1') ORDER BY s.display_name COLLATE latin1_general_ci ASC) as b where b.service_handled = 0 and b.service_state =2

if you use “SELECT count (*)” you get the count

krappi01 · March 25, 2022, 1:08am

wow - thanks a lot!
i simplified the query a little bit:

SELECT count(*) as unhandledCritical
FROM icinga_objects AS so
INNER JOIN icinga_services AS s ON s.service_object_id = so.object_id AND so.is_active = 1 AND so.objecttype_id = 2
INNER JOIN icinga_hoststatus AS hs ON hs.host_object_id = s.host_object_id
INNER JOIN icinga_servicestatus AS ss ON ss.service_object_id = so.object_id
WHERE (CASE WHEN COALESCE(ss.current_state, 0) = 0 THEN 0 ELSE 1 END = '1')
and (CASE WHEN (ss.problem_has_been_acknowledged + ss.scheduled_downtime_depth + COALESCE(hs.current_state, 0)) > 0 THEN 1 ELSE 0 END) = 0
and (CASE WHEN ss.has_been_checked = 0 OR ss.has_been_checked IS NULL THEN 99 ELSE ss.current_state END) = 2

Tomorrow I will put in our dashboards

krappi01 · March 25, 2022, 12:26pm

Do you know if it’s possible to get the value over a specific timerange e.g.
13:25 6
13:26 7
13:27 5

Or do I have to make a cronjob to save the historical values?

moreamazingnick · March 25, 2022, 2:40pm

you need to build something arround.
you could create a standard mysql check so you get performance data → these are data over time
you could create a own check that writes these data as perforance data → same as above
or you make your cron job and write it to disk/db/ether/blockchain/print_it_out

krappi01 · March 25, 2022, 5:32pm

Yes I made a mysql check for it that returns it as performance data.
Great idea - Thanks a lot!

marribork · July 6, 2022, 1:46pm

tell me how to get values from icinga database or icinga web?

krappi01 · July 6, 2022, 1:58pm

Run the above sql query on the icinga database

krappi01 · July 12, 2022, 1:56pm

in case anyone needs those statistic values from the new Icinga DB:

SELECT (SUM(CASE WHEN service_state.is_acknowledged = 'y' THEN 1 ELSE 0 END)) AS services_acknowledged, (SUM(CASE WHEN service.active_checks_enabled = 'y' THEN 1 ELSE 0 END)) AS services_active_checks_enabled, (SUM(CASE WHEN service.passive_checks_enabled = 'y' THEN 1 ELSE 0 END)) AS services_passive_checks_enabled, (SUM(CASE WHEN service_state.soft_state = 2 AND service_state.is_handled = 'y' THEN 1 ELSE 0 END)) AS services_critical_handled, (SUM(CASE WHEN service_state.soft_state = 2 AND service_state.is_handled = 'n' THEN 1 ELSE 0 END)) AS services_critical_unhandled, (SUM(CASE WHEN service.event_handler_enabled = 'y' THEN 1 ELSE 0 END)) AS services_event_handler_enabled, (SUM(CASE WHEN service.flapping_enabled = 'y' THEN 1 ELSE 0 END)) AS services_flapping_enabled, (SUM(CASE WHEN service.notifications_enabled = 'y' THEN 1 ELSE 0 END)) AS services_notifications_enabled, (SUM(CASE WHEN service_state.soft_state = 0 THEN 1 ELSE 0 END)) AS services_ok, (SUM(CASE WHEN service_state.soft_state = 99 THEN 1 ELSE 0 END)) AS services_pending, (SUM(CASE WHEN service_state.is_problem = 'y' AND service_state.is_acknowledged = 'n' THEN 1 ELSE 0 END)) AS services_problems_unacknowledged, (SUM(CASE WHEN service.id IS NOT NULL THEN 1 ELSE 0 END)) AS services_total, (SUM(CASE WHEN service_state.soft_state = 3 AND service_state.is_handled = 'y' THEN 1 ELSE 0 END)) AS services_unknown_handled, (SUM(CASE WHEN service_state.soft_state = 3 AND service_state.is_handled = 'n' THEN 1 ELSE 0 END)) AS services_unknown_unhandled, (SUM(CASE WHEN service_state.soft_state = 1 AND service_state.is_handled = 'y' THEN 1 ELSE 0 END)) AS services_warning_handled, (SUM(CASE WHEN service_state.soft_state = 1 AND service_state.is_handled = 'n' THEN 1 ELSE 0 END)) AS services_warning_unhandled FROM service LEFT JOIN service_state ON service_state.service_id = service.id