After updating to 2.13.1, the values of all disk, memory and load checks dropped by about 10%. It doesn’t matter whether it’s a Linux or Windows system. It seems to be a problem with interpreting the perfdata: a disk with 80 GB shows up in Grafana as only 72 GB, and all thresholds are affected the same way.
Before updating, we ran version 2.12.5. Has something changed that could cause this? I couldn’t find an answer in the changelog.
2.13 introduced support for new units of measurement (see “Service Monitoring” in the Icinga 2 documentation).
So disk and memory values have likely changed if something was using the wrong conversion (GB vs. GiB), but the changed load values must have another cause.
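To put numbers on the GB vs. GiB confusion, the sketch below uses only the SI (powers of 1000) and IEC (powers of 1024) prefix definitions to show how much a value that is really in a binary unit gets understated when it is read as the decimal unit of the same name. It does not model what Icinga 2 does internally; the `PAIRS` table and the `understatement_pct` helper are just for illustration.

```python
# SI (decimal) prefixes paired with their IEC (binary) counterparts, in bytes.
PAIRS = [
    ("kB", 10**3, "KiB", 2**10),
    ("MB", 10**6, "MiB", 2**20),
    ("GB", 10**9, "GiB", 2**30),
    ("TB", 10**12, "TiB", 2**40),
]


def understatement_pct(si_factor: int, iec_factor: int) -> float:
    """How much smaller a binary-unit value looks when read as the decimal unit."""
    return (1 - si_factor / iec_factor) * 100


for si, si_factor, iec, iec_factor in PAIRS:
    print(f"1 {iec} = {iec_factor / si_factor:.4f} {si}; "
          f"reading {iec} as {si} loses {understatement_pct(si_factor, iec_factor):.2f}%")
```

The gap grows with the prefix, from about 2.3% for kB to about 9% for TB, so a mix-up shows up as a different percentage depending on which unit a plugin reports in.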
Monitoring plugins report sizes in powers of 2. The default unit for check_disk is megabytes, so if Icinga has started dividing that by 1048576 a second time, it will simply show wrong values.
# /usr/local/libexec/nagios/check_disk -w 80% -c 90% -u bytes /
DISK CRITICAL - free space: / 4845006848 B (25% inode=93%);| /=14286000128B;-2147483648;2079456870;0;20794568704
# /usr/local/libexec/nagios/check_disk -w 80% -c 90% /
DISK CRITICAL - free space: / 4620 MB (25% inode=93%);| /=13624MB;3966;1983;0;19831
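The two runs above can be cross-checked: dividing the byte values of the first run by 2^20 (and truncating) reproduces the “MB” figures of the second run, while dividing by 10^6 does not, so check_disk’s “MB” here is really MiB. The dictionaries just hold the numbers copied from the output; the runs were taken at slightly different moments, so exact agreement is not guaranteed in general.

```python
# Figures for "/" copied from the two check_disk runs above.
bytes_run = {"free": 4845006848, "used": 14286000128, "total": 20794568704}
mb_run = {"free": 4620, "used": 13624, "total": 19831}

for key, value in bytes_run.items():
    binary_mb = value // 2**20    # MiB, truncated
    decimal_mb = value // 10**6   # SI megabytes, truncated
    print(f"{key:>5}: {binary_mb} MiB vs {decimal_mb} MB; check_disk said {mb_run[key]} MB")
```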
Yes, but in that case the monitoring plugin would be wrong, since powers of 2 are not MB but MiB. The Nagios plugins fixed this a while ago (2.3.0) and later changed the default (2.3.2). For the monitoring plugins, an issue about this mismatch has been open since 2017.
The monitoring plugins spec dates from the mid-1990s, when the “MiB” nonsense hadn’t been invented yet, and a lot of accompanying software follows it.
The obvious solution is to always use --unit bytes (or the equivalent option of other plugins) to work around the confusion. I’d suggest adding a bold notice to the documentation.
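A minimal sketch of why raw bytes sidestep the problem: anything that turns perfdata into bytes has to pick a convention for “MB”, and the two conventions disagree, whereas a value already in “B” converts the same way under both. The parser is simplified (no quoted labels, no threshold ranges such as 10:20), and the `parse_perfdata`/`to_bytes` helpers and both unit tables are hypothetical, not Icinga’s implementation. The two sample items are the perfdata for / from the check_disk runs earlier in the thread.

```python
import re

# Two possible readings of the same unit names, in bytes per unit.
DECIMAL = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}
BINARY = {"B": 1, "KB": 2**10, "MB": 2**20, "GB": 2**30, "TB": 2**40}


def _num(field: str):
    """Empty perfdata fields mean 'not set'."""
    return float(field) if field else None


def parse_perfdata(item: str) -> dict:
    """Parse 'label=value[UOM];warn;crit;min;max' (simplified)."""
    label, rest = item.split("=", 1)
    fields = rest.split(";") + [""] * 4  # pad missing optional fields
    match = re.fullmatch(r"(-?[0-9.]+)([A-Za-z%]*)", fields[0])
    if match is None:
        raise ValueError(f"cannot parse value: {fields[0]!r}")
    value, uom = match.groups()
    return {
        "label": label,
        "value": float(value),
        "uom": uom,
        "warn": _num(fields[1]),
        "crit": _num(fields[2]),
        "min": _num(fields[3]),
        "max": _num(fields[4]),
    }


def to_bytes(value: float, uom: str, table: dict) -> float:
    """Convert a size to bytes using the given unit table."""
    return value * table[uom.upper()]


for item in ["/=13624MB;3966;1983;0;19831",
             "/=14286000128B;-2147483648;2079456870;0;20794568704"]:
    p = parse_perfdata(item)
    print(item)
    print("  decimal:", to_bytes(p["value"], p["uom"], DECIMAL))
    print("  binary: ", to_bytes(p["value"], p["uom"], BINARY))
```

For the “MB” item the two readings give different byte counts; for the “B” item they are identical, which is the point of always requesting bytes.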
I’ve tried using the --units flag for the disk check. Unfortunately, this seems to cause an overflow or something similar: InfluxDB2 then stores -2147483648 as the warning and critical values, which makes IcingaWeb2 show -2GiB as the warning and critical values for the disk check.
Have you experienced something similar, or have I simply missed something?
Seems like there’s a bug in the Nagios plugin’s perfdata. Using
Raspbian GNU/Linux 10 (buster)
DISK OK - free space: / 45327069184 B (75% inode=91%); /boot 213937664 B (80% inode=-);
| /=2147483647B;2147483647;2147483647;0;2147483647 /boot=50351616B;224645888;237860352;0;264289280
DISK OK - free space: / 45327085568 B (75.35% inode=91%); /boot 213937664 B (80.94% inode=-);
| /=1937420288B;;;0;-1693577216 /boot=50351616B;224645888;237860352;0;264289280
Both are wrong, but in different ways.
For comparison, with
DISK OK - free space: / 43227 MiB (75.35% inode=91%); /boot 204 MiB (80.94% inode=-);
| /=14135MiB;50850;53841;0;59824 /boot=48MiB;214;226;0;252
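The first two outputs are consistent with the same byte counts being squeezed into a signed 32-bit integer in two different ways: the first saturates (value, thresholds and max all read 2147483647, the int32 maximum), the second wraps around modulo 2^32 (hence the negative max of -1693577216). The sketch below assumes this 32-bit explanation rather than anything taken from the plugin source. It recovers the real used space on / from the wrapped value plus the rough 14135 MiB figure of the MiB run, then reproduces both broken numbers from it; `clamp_int32`, `wrap_int32` and `unwrap` are hypothetical helpers.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def clamp_int32(x: int) -> int:
    """Saturate to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, x))


def wrap_int32(x: int) -> int:
    """Keep the low 32 bits and read them as a signed (two's-complement) int."""
    return (x + 2**31) % 2**32 - 2**31


def unwrap(wrapped: int, approx: int) -> int:
    """Undo a 32-bit wraparound, given a rough estimate of the true value."""
    k = round((approx - wrapped) / 2**32)
    return wrapped + k * 2**32


# Used space on "/": wrapped value from the second output, rough size from the MiB output.
used = unwrap(1937420288, 14135 * 2**20)
print(used)               # real used bytes, roughly 13.8 GiB
print(clamp_int32(used))  # the 2147483647 of the first output
print(wrap_int32(used))   # the 1937420288 of the second output
```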
CentOS 7 seems fine:
CentOS Linux 7 (Core)
DISK OK - free space: / 29723013120 B (36.91% inode=98%);
I’m on monitoring-plugins 2.3.1-1 (check_disk 2.3.1) on Debian 11, and thanks to your post I’ve noticed that the check_disk command itself outputs the negative value -2147483648 for warning and critical, although the total disk size is correct:
# ./check_disk -w20% -c10% --unit bytes /
DISK WARNING - free space: / 1583017984 B (20% inode=83%);| /=6319013888B;-2147483648;-2147483648;0;8348200960
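The numbers fit a simple pattern. -w/-c are free-space percentages, so the perfdata thresholds are used-space byte counts of total * (100 - pct) / 100. Every threshold that ends up above 2147483647 (the int32 maximum) shows up as -2147483648, while the 10% critical threshold of the earlier -w 80% -c 90% -u bytes run stays below the limit and is printed correctly, as are the used and total values. One plausible mechanism, which is an assumption and not taken from the plugin source, is a C cast of an out-of-range double to a 32-bit int: that is undefined behaviour in C, and on x86 the usual conversion instruction (cvttsd2si) returns 0x80000000, i.e. -2147483648. The hypothetical `x86_cast_to_int32` below models only that assumption.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def used_threshold(total_bytes: int, free_pct: float) -> float:
    """Used-space threshold for 'alert when free space < free_pct %'."""
    return total_bytes * (100 - free_pct) / 100


def x86_cast_to_int32(value: float) -> int:
    """Assumed behaviour of (int)value on x86: out-of-range -> INT32_MIN."""
    if INT32_MIN <= value <= INT32_MAX:
        return int(value)  # truncates toward zero, like a C cast
    return INT32_MIN


# (total bytes, -w %, -c %) taken from the two "bytes" runs in this thread
runs = [(20794568704, 80, 90), (8348200960, 20, 10)]

for total, warn_pct, crit_pct in runs:
    for name, pct in (("warn", warn_pct), ("crit", crit_pct)):
        raw = used_threshold(total, pct)
        print(f"total={total} {name}={raw:.1f} -> {x86_cast_to_int32(raw)}")
```

Under this model, the exact thresholds printed in both outputs are reproduced, which is why a larger unit (keeping every value under 2^31) avoids the symptom.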
You may want to file a bug report with those folks, or simply use a more suitable unit for your disks.
I’ve filed a bug report, and it has already received a reply and a PR. FYI: