Author: @mfriedrich
Version: v0.1
Tested with Icinga 2 v2.10.x & v2.11.x
Rationale
We created this “live” while learning more about the Icinga DSL during a NETWAYS trainee workshop.
The idea is to get a list of check plugins with a usage count for each. That gives you a starting point for analysing check latency and other performance issues.
In contrast to the static configuration and CheckCommand objects, this fetches live information from the executed check results. If many checks are still pending after an initial start, wait a while before running it.
Resources
DSL
Use the debug console to connect to a running Icinga 2 API.
$ icinga2 console --connect 'https://root:icinga@localhost:5665/'
Run the following snippet.
// Collect the result, key is the plugin name, value is an incremented counter
res = {};

// Iterate over all service objects and fetch the check result
for (s in get_objects(Service)) {
  var cr = s.last_check_result;
  if (!cr) { continue; }

  var command = cr.command

  // Default is an array, but there may be a String as well.
  if (typeof(command) == Array) {
    for (a in command) {
      if (match("*check_*", a)) {
        res[a] += 1
      }
    }
  } else if (typeof(command) == String) {
    res[command] += 1
  }
};

// Print the result
res
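The loop above only visits Service objects. A minimal sketch of the same counting for Host objects, assuming it runs in the same console session so that res from the snippet above is still defined:

```
// Assumption: res already exists from the service loop above
for (h in get_objects(Host)) {
  var cr = h.last_check_result;
  if (!cr) { continue; }

  var command = cr.command

  // Same handling as for services: Array by default, String otherwise
  if (typeof(command) == Array) {
    for (a in command) {
      if (match("*check_*", a)) {
        res[a] += 1
      }
    }
  } else if (typeof(command) == String) {
    res[command] += 1
  }
};
```

Print res again afterwards to see the combined host and service counts.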
Examples
<8> => res
{
"/usr/lib64/nagios/plugins/check_disk" = 2.000000
"/usr/lib64/nagios/plugins/check_dns" = 2.000000
"/usr/lib64/nagios/plugins/check_http" = 9.000000
"/usr/lib64/nagios/plugins/check_load" = 3.000000
"/usr/lib64/nagios/plugins/check_mysql_health" = 11.000000
"/usr/lib64/nagios/plugins/check_ping" = 6.000000
"/usr/lib64/nagios/plugins/check_procs" = 1.000000
"/usr/lib64/nagios/plugins/check_ssh" = 4.000000
"/usr/lib64/nagios/plugins/check_swap" = 1.000000
"/usr/lib64/nagios/plugins/check_users" = 1.000000
}
<52> => res
{
"/usr/local/sbin/check_dns_health_check.sh" = 1.000000
dummy = 4.000000
random = 2.000000
}
Note: dummy and random are internal checks; collecting them requires v2.11.
Ideas
- The match for check_ might not be sufficient with plugins named differently, consider that for the result set.
- The above works for Service objects only, Host objects need to be added in a separate loop.
- Use the schedule_* and execution_* check result attributes to calculate the check latency.
- Use the above to calculate an average check latency for each check plugin.
- Move the code snippet into a global function, and call it from a dummy check similar to the cluster checks shown in the docs.
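The latency idea from the list above could be sketched as follows. This is an untested sketch, not part of the original snippet. It assumes latency is the scheduling window minus the execution time, which to my knowledge is how Icinga 2 itself derives it. To keep it short, it groups by the CheckCommand name (s.check_command) rather than the plugin path. The names lat, avg, sum and count are made up for illustration.

```
// Per CheckCommand name: sum of latencies and number of samples
lat = {};

for (s in get_objects(Service)) {
  var cr = s.last_check_result;
  if (!cr) { continue; }

  // Both values are in seconds
  var execution_time = cr.execution_end - cr.execution_start
  var latency = (cr.schedule_end - cr.schedule_start) - execution_time
  if (latency < 0) { latency = 0 }

  // Assumption: group by CheckCommand name, not by plugin path
  var key = s.check_command
  if (!lat.contains(key)) {
    lat[key] = { sum = 0, count = 0 }
  }

  lat[key].sum += latency
  lat[key].count += 1
};

// Average latency per CheckCommand name
avg = {};
for (k => v in lat) {
  avg[k] = v.sum / v.count
};

// Print the result
avg
```

Only the last check result of each service is sampled, so the averages are a snapshot across objects and not a history over time.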