Dear Icinga Community,
For a couple of nodes, the “load” and “procs” checks are flapping.
The higher warning/critical thresholds defined in the configuration files are periodically ignored and revert to the lower Icinga defaults, which makes the service checks flap.
A couple of seconds later the higher values from the config files take effect again and the critical state clears, but then they are overridden by the low default values once more, and the cycle repeats several times a minute.
The Icinga version is r2.13.6-1, running on CentOS 7.9 Linux with kernel 3.10.0-1160.76.1.el7.x86_64.
In /etc/icinga2/zones.d/master/services.conf, non-default (higher) threshold values are set explicitly.
For the load check:
vars.load_wload1 = "70"
vars.load_wload5 = "70"
vars.load_wload15 = "70"
vars.load_cload1 = "90"
vars.load_cload5 = "90"
vars.load_cload15 = "90"
For the procs check:
vars.procs_warning = 3000
vars.procs_critical = 6000
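For context, here is a minimal sketch of how such overrides usually sit inside apply rules in services.conf. The `generic-service` template (from the default sample config) and the `assign where` conditions are assumptions for illustration, not necessarily what my file contains; `load` and `procs` are the Icinga Template Library CheckCommands that read these custom variables.

```
// Sketch only: template name and assign conditions are assumptions.
apply Service "load" {
  import "generic-service"   // assumed template from the sample config

  check_command = "load"     // ITL CheckCommand wrapping check_load

  // Custom thresholds that should replace the ITL defaults
  vars.load_wload1 = "70"
  vars.load_wload5 = "70"
  vars.load_wload15 = "70"
  vars.load_cload1 = "90"
  vars.load_cload5 = "90"
  vars.load_cload15 = "90"

  assign where host.vars.os == "Linux"   // hypothetical assign condition
}

apply Service "procs" {
  import "generic-service"   // assumed template from the sample config

  check_command = "procs"    // ITL CheckCommand wrapping check_procs

  // Custom thresholds that should replace the ITL defaults
  vars.procs_warning = 3000
  vars.procs_critical = 6000

  assign where host.vars.os == "Linux"   // hypothetical assign condition
}
```

Variables set on the Service object like this take precedence over the defaults defined on the CheckCommand, which is why I expected the higher values to apply consistently.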
The same values are also defined in /usr/share/icinga2/include/command-plugins.conf, since that is where the low default values were still present. I hoped that changing both config files might fix the issue, but it did not.
After the config changes, the icinga2 service was restarted several times, but that did not stop the flapping either.
In the web interface, critical states pop up periodically for a few hosts (multiple times per minute), and at those moments the web interface shows the low default warning/critical thresholds.
For example:
Performance data

| Label | Value | Warning | Critical |
|---|---|---|---|
| procs | 999.00 | 250.00 | 400.00 |
A couple of seconds later the higher thresholds defined in the config files take effect. At that point the warning/critical values displayed in the web interface are the higher ones from the config files, the red critical states disappear, and everything is green:
Performance data

| Label | Value | Warning | Critical |
|---|---|---|---|
| procs | 1,005.00 | 3,000.00 | 6,000.00 |
A couple of seconds later the thresholds are reset to the low default values again, and the critical states reappear in red.
This repeats several times a minute, which makes the check annoying and effectively useless.
I just cannot figure out where those pesky default values are still stored, or why the thresholds keep switching between the low Icinga defaults and the higher custom values.
Could you please give me some hints on what to change or check?
Best regards