Get Performance Data details (warn crit min max plugin_output) via API

We use https://codeberg.org/oxzi/icinga-prometheus-remote-writer, but performance data details such as warn, crit, min, max, and plugin_output are missing.

For example, for a Linux disk check we get the disk name (/boot) and the unit (MB), but the disk size is missing:
icinga_check_result_perf{host="", label="/boot", object_type="service", service="Check Linux Disk", unit="MB"}

Thanks

Maybe @apenning has an idea how to implement this? :smiley:

First and foremost: Welcome to the Icinga community and thanks for trying out my little Icinga Prometheus Remote Writer. However, I feel obliged to mention that this software should be considered in an early or proof-of-concept state.

Following Prometheus’s data model, the static attributes of a check are metric labels, while the changing result is the value.

For icinga_check_result_perf, the metric labels are object_type (host or service), host, service (if this is a service check) together with label and unit for each performance data entry. This is also described in the project’s README under “Metrics”.

Thus, if you are using the good old check_disk, there should be multiple icinga_check_result_perf entries, distinguishable via the metric label named label, which is /boot in your example. When querying for this, you should get the current value (or a historical one, depending on the kind of query).

$ promtool query instant http://localhost:9090 'icinga_check_result_perf{object_type="service", service="disk"}'
icinga_check_result_perf{host="example.com", label="/", object_type="service", service="disk", unit="B"} => 133169152 @[1761550803.964]
icinga_check_result_perf{host="example.com", label="/backup", object_type="service", service="disk", unit="B"} => 0 @[1761550803.964]
icinga_check_result_perf{host="example.com", label="/home", object_type="service", service="disk", unit="B"} => 18874368 @[1761550803.964]
icinga_check_result_perf{host="example.com", label="/tmp", object_type="service", service="disk", unit="B"} => 0 @[1761550803.964]
icinga_check_result_perf{host="example.com", label="/usr", object_type="service", service="disk", unit="B"} => 1700790272 @[1761550803.964]
icinga_check_result_perf{host="example.com", label="/usr/X11R6", object_type="service", service="disk", unit="B"} => 454033408 @[1761550803.964]
icinga_check_result_perf{host="example.com", label="/usr/local", object_type="service", service="disk", unit="B"} => 839909376 @[1761550803.964]
icinga_check_result_perf{host="example.com", label="/usr/obj", object_type="service", service="disk", unit="B"} => 0 @[1761550803.964]
icinga_check_result_perf{host="example.com", label="/usr/src", object_type="service", service="disk", unit="B"} => 0 @[1761550803.964]
icinga_check_result_perf{host="example.com", label="/var", object_type="service", service="disk", unit="B"} => 1307574272 @[1761550803.964]
[ . . . ]

For a specific example, we can query the current value of the /home mount point. This returns the value field of the check_disk performance data, which contains the utilization in bytes.

$ promtool query instant http://localhost:9090 'icinga_check_result_perf{object_type="service", service="disk", host="example.com", label="/home"}'
icinga_check_result_perf{host="example.com", label="/home", object_type="service", service="disk", unit="B"} => 18874368 @[1761550868.143]

$ python3 -c 'print(18874368 / 1024**2, "MiB")'
18.0 MiB

Thus, at the moment 18 MiB are used in /home. What a waste.

Btw, instead of using promtool, these queries can also be made from Prometheus’ web interface or by using Grafana.

The optional performance data fields warn, crit, min, and max are missing at the moment. I could extend the program to create additional metrics like icinga_check_result_perf_warn, icinga_check_result_perf_crit and so on, with additional metric labels as for icinga_check_result_perf.
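To make the optional fields concrete, here is a minimal Python sketch that splits one performance data entry of the common plugin form 'label'=value[UOM];[warn];[crit];[min];[max] into its parts. This is not how the remote writer parses anything; the function name, the regular expression, and the sample numbers are assumptions made up for illustration. Thresholds are kept as strings because they may be ranges such as 10:20.

```python
import re
from typing import NamedTuple, Optional


class PerfData(NamedTuple):
    label: str
    value: float
    unit: str
    warn: Optional[str]      # kept as text, thresholds may be ranges like "10:20"
    crit: Optional[str]
    minimum: Optional[float]
    maximum: Optional[float]


# Illustrative pattern for: 'label'=value[UOM];[warn];[crit];[min];[max]
_PERF_RE = re.compile(
    r"^'?(?P<label>[^'=]+)'?=(?P<value>-?[0-9.]+)(?P<unit>[^;]*)(?P<rest>(?:;[^;]*)*)$"
)


def parse_perfdata(entry: str) -> PerfData:
    """Split a single performance data entry into its fields (hypothetical helper)."""
    match = _PERF_RE.match(entry.strip())
    if match is None:
        raise ValueError(f"not a performance data entry: {entry!r}")
    # "rest" looks like ";warn;crit;min;max"; trailing fields may be absent.
    fields = match.group("rest").split(";")[1:]
    fields += [""] * (4 - len(fields))
    warn, crit, low, high = fields[:4]
    return PerfData(
        label=match.group("label"),
        value=float(match.group("value")),
        unit=match.group("unit"),
        warn=warn or None,
        crit=crit or None,
        minimum=float(low) if low else None,
        maximum=float(high) if high else None,
    )


# Made-up sample entry, not taken from a real check run.
sample = parse_perfdata("/boot=892MB;900;950;0;1006")
```

With such a split, the value would map to icinga_check_result_perf, while warn, crit, min, and max are the candidates for the additional metrics mentioned above.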

However, what do you mean by plugin_output? The check command's exit status is available in the icinga_check_result_exit_status metric and the check state in icinga_check_result_state.

The latter would allow us to query, for example, for failing pings.

$ promtool query range --start 1761465308 --end 1761551708 http://localhost:9090 'icinga_check_result_state{service=~"ping.*"} != 0'
icinga_check_result_state{host="alpha.example.com", object_type="service", service="ping6"} =>
2 @[1761531893]
2 @[1761532238]
icinga_check_result_state{host="beta.example.com", object_type="service", service="ping4"} =>
2 @[1761465653]
icinga_check_result_state{host="beta.example.com", object_type="service", service="ping6"} =>
2 @[1761469448]
[ . . . ]

Is this what you mean by plugin_output, or do you want the exact string returned by the check plugin? In that case, I cannot think of an idiomatic way to store it in Prometheus, which is still a time series database and all.


Thank you for the answer, Alvar.

It would be great to have warn, crit, min, and max. A set of four labels could be more effective than a set of four new dedicated performance metrics. After all, these parameters rarely change:
icinga_check_result_perf{host="example.com", label="/usr/local", object_type="service", service="disk", unit="B", warn="", crit="", min="", max=""}

By Icinga plugin output I mean a string. Here is an example:
DISK CRITICAL - free space: / 0 MB (0% inode=96%); /sys/firmware/efi/efivars 0 MB (84% inode=-); /boot 892 MB (64% inode=99%); /boot/efi 1066 MB (99% inode=-);

The only way I can access this output is the GUI. I am unable to get it via the API or the Graphite DB.

Pardon, but this would not be more effective; it contradicts Prometheus’ Data Model. The measured values should not be in labels, as otherwise two consecutive measurements of the same check would have different labels. The label data should be “static” in this regard.
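The effect can be sketched in a few lines of Python. In Prometheus, a time series is identified by its metric name together with its full label set, so the sketch models that identity as a tuple. The helper name, the threshold values, and the icinga_check_result_perf_warn metric are assumptions for illustration only; that metric does not exist at the moment.

```python
def series_identity(name: str, labels: dict) -> tuple:
    """Model a series identity as metric name plus the complete label set."""
    return (name, frozenset(labels.items()))


base = {"host": "example.com", "object_type": "service",
        "service": "disk", "label": "/home", "unit": "B"}

# Variant 1: threshold stored as a label (hypothetical values).
# Changing the threshold changes the label set, so a new series starts.
before = series_identity("icinga_check_result_perf", {**base, "warn": "100"})
after = series_identity("icinga_check_result_perf", {**base, "warn": "200"})
threshold_as_label_keeps_series = before == after

# Variant 2: threshold stored as the sample value of a dedicated,
# hypothetical metric. The labels stay static, only the value changes.
warn_before = series_identity("icinga_check_result_perf_warn", base)
warn_after = series_identity("icinga_check_result_perf_warn", base)
dedicated_metric_keeps_series = warn_before == warn_after
```

In the first variant the history of one check is split across several series whenever a threshold is adjusted, while in the second variant the threshold is just another value over time.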

As an alternative to four additional metrics, some “kind” label (one of value, warn, crit, min, max) could be introduced, similar to this documentation’s example.

However, imo, these are just implementation details and I am not familiar enough with Prometheus to tell you which is more efficient or idiomatic.

Yes, because this is a string output and Prometheus is a time series database. Storing strings is not a feature of a TSDB and, AFAIK, not planned upstream. The same should apply to Graphite.

Thus, what do you actually want to achieve? The output logs are available in Icinga DB’s relational database. Furthermore, you could send them to some logging service, such as syslog-ng, Elasticsearch/OpenSearch, or whatnot.

So, what problem do you want to solve?


Yes, I agree, labels are not a good option then. Additional metrics would be appreciated.

Exposing plugin_output via syslog sounds like a good solution.

The problem to be solved is getting disk utilization in % in Grafana by recalculating it there.

Warn and crit thresholds would be nice to have.
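A rough sketch of the percentage recalculation, assuming the value is the used space and that a separate icinga_check_result_perf_max metric with the same labels existed (it does not at the moment; the name only follows the naming idea from earlier in this thread). Both function names are made up for illustration.

```python
def used_percent(used: float, maximum: float) -> float:
    """Return the utilization in percent from a used value and its max field."""
    if maximum <= 0:
        raise ValueError("max must be positive to compute a percentage")
    return used / maximum * 100.0


def used_percent_promql(selector: str) -> str:
    """Build a PromQL expression dividing the value by the hypothetical max metric.

    The series are matched on host, service, and label; adjust this list if
    the label sets of the two metrics differ in other ways.
    """
    return (
        f"100 * icinga_check_result_perf{{{selector}}}"
        f" / on(host, service, label) icinga_check_result_perf_max{{{selector}}}"
    )


# The expression string could be pasted into a Grafana panel query.
expr = used_percent_promql('service="disk", label="/home"')
```

The same arithmetic works in plain Python for a quick check, for example used_percent(used, maximum) with two values read from promtool.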