DSL: Count check plugin usage from service checks

Author: @mfriedrich
Version: v0.1

Tested with Icinga 2 v2.10.x & v2.11.x

Rationale

We wrote this snippet live while learning more about the Icinga DSL during a NETWAYS trainee workshop.

The idea is to count how often each check plugin is used. The counts give you a starting point for analysing check latency and other performance issues.

Instead of reading the static configuration and CheckCommand objects, this reads the command stored in each executed check result. Services that are still pending have no check result yet and are not counted, so after a fresh start wait until most checks have run at least once.

Resources

DSL

Use the debug console to connect to a running Icinga 2 API.

$ icinga2 console --connect 'https://root:icinga@localhost:5665/'

Run the following snippet.

// Collect the result, key is the plugin name, value is an incremented counter
res = {};

// Iterate over all service objects and fetch the check result
for (s in get_objects(Service)) { 
  var cr = s.last_check_result;

  if (!cr) { continue; }

  var command = cr.command

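  // The command is usually an Array (plugin path plus its arguments),
  // but it may be a String as well.
  if (typeof(command) == Array) {
    for (a in command) {
      // Count only the elements that look like a plugin, not the arguments
      if (match("*check_*", a)) {
        res[a] += 1
      }
    }
  } else if (typeof(command) == String) {
    res[command] += 1
  }
}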

// Print the result
res
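
Services without a check result are skipped by the loop above, so the counts stay incomplete while checks are still pending. A minimal sketch for the same console session to see how many services that affects (the variable name pending is only an illustration, not part of the original snippet):

```
// Count the services that have no check result yet
pending = 0

for (s in get_objects(Service)) {
  if (!s.last_check_result) {
    pending += 1
  }
}

// Print the number of pending services
pending
```

If the number is still high, wait a bit and run the counting snippet again.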

Examples

<8> => res
{
	"/usr/lib64/nagios/plugins/check_disk" = 2.000000
	"/usr/lib64/nagios/plugins/check_dns" = 2.000000
	"/usr/lib64/nagios/plugins/check_http" = 9.000000
	"/usr/lib64/nagios/plugins/check_load" = 3.000000
	"/usr/lib64/nagios/plugins/check_mysql_health" = 11.000000
	"/usr/lib64/nagios/plugins/check_ping" = 6.000000
	"/usr/lib64/nagios/plugins/check_procs" = 1.000000
	"/usr/lib64/nagios/plugins/check_ssh" = 4.000000
	"/usr/lib64/nagios/plugins/check_swap" = 1.000000
	"/usr/lib64/nagios/plugins/check_users" = 1.000000
}
<52> => res
{
	"/usr/local/sbin/check_dns_health_check.sh" = 1.000000
	dummy = 4.000000
	random = 2.000000
}

Note: dummy and random are internal checks. Collecting them requires v2.11.

Ideas

  • The match for check_ misses plugins that are named differently, and it also counts any argument that contains check_. Keep that in mind when reading the result set.
  • The above works for Service objects only. Host objects need to be added in a separate loop.
  • Use the schedule_* and execution_* attributes of the check result to calculate the check latency.
  • Use the above to calculate an average check latency for each check plugin.
  • Move the code snippet into a global function, and call it from a dummy check similar to the cluster checks shown in the docs.
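
A minimal sketch for the host and latency ideas above, untested and meant as a starting point only. It assumes the check result exposes schedule_start, schedule_end, execution_start and execution_end, and that latency is the scheduling window minus the execution time. The names latency_sum, latency_cnt, avg and kind are made up for this example.

```
// Sum of latencies and number of samples, keyed by plugin name
latency_sum = {}
latency_cnt = {}

// Cover both Service and Host objects
for (kind in [ Service, Host ]) {
  for (o in get_objects(kind)) {
    var cr = o.last_check_result

    if (!cr) { continue; }

    // Assumption: latency = scheduling window minus execution time
    var latency = (cr.schedule_end - cr.schedule_start) - (cr.execution_end - cr.execution_start)

    if (latency < 0) {
      latency = 0
    }

    // Collect the plugin names the same way as the counting snippet
    var command = cr.command
    var plugins = []

    if (typeof(command) == Array) {
      for (a in command) {
        if (match("*check_*", a)) {
          plugins.add(a)
        }
      }
    } else if (typeof(command) == String) {
      plugins.add(command)
    }

    for (p in plugins) {
      latency_sum[p] += latency
      latency_cnt[p] += 1
    }
  }
}

// Average latency in seconds for each plugin
avg = {}

for (p => sum in latency_sum) {
  avg[p] = sum / latency_cnt[p]
}

// Print the result
avg
```

The average only reflects the last check result of each object, so treat it as a rough snapshot rather than a long-term metric.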