Hi,
In my master-satellite Icinga 2 setup I’m monitoring the crond
process with the check_procs plugin.
On some CentOS 6 servers this check flips between OK and CRITICAL roughly every 30 seconds.
I checked the crond process manually with watch: it is running, it is stable, and its PID does not change.
I enabled the debug log on the satellite, and I see many lines like these:
[root@satellite ~]# grep cron /var/log/icinga2/debug.log
[2019-07-15 15:16:12 +0200] notice/Process: Running command '/usr/lib64/nagios/plugins/check_procs' '-C' 'cron' '-c' '1:' '-w' '250': PID 26610
[2019-07-15 15:16:12 +0200] notice/Process: PID 26610 ('/usr/lib64/nagios/plugins/check_procs' '-C' 'cron' '-c' '1:' '-w' '250') terminated with exit code 2
[2019-07-15 15:16:40 +0200] notice/Process: Running command '/usr/lib64/nagios/plugins/check_procs' '-C' 'crond' '-c' '1:' '-w' '250': PID 26805
[2019-07-15 15:16:40 +0200] notice/Process: PID 26805 ('/usr/lib64/nagios/plugins/check_procs' '-C' 'crond' '-c' '1:' '-w' '250') terminated with exit code 0
[2019-07-15 15:17:09 +0200] notice/Process: Running command '/usr/lib64/nagios/plugins/check_procs' '-C' 'cron' '-c' '1:' '-w' '250': PID 26917
[2019-07-15 15:17:09 +0200] notice/Process: PID 26917 ('/usr/lib64/nagios/plugins/check_procs' '-C' 'cron' '-c' '1:' '-w' '250') terminated with exit code 2
[2019-07-15 15:17:37 +0200] notice/Process: Running command '/usr/lib64/nagios/plugins/check_procs' '-C' 'crond' '-c' '1:' '-w' '250': PID 27110
[2019-07-15 15:17:37 +0200] notice/Process: PID 27110 ('/usr/lib64/nagios/plugins/check_procs' '-C' 'crond' '-c' '1:' '-w' '250') terminated with exit code 0
[2019-07-15 15:18:09 +0200] notice/Process: Running command '/usr/lib64/nagios/plugins/check_procs' '-C' 'cron' '-c' '1:' '-w' '250': PID 27244
[2019-07-15 15:18:09 +0200] notice/Process: PID 27244 ('/usr/lib64/nagios/plugins/check_procs' '-C' 'cron' '-c' '1:' '-w' '250') terminated with exit code 2
[2019-07-15 15:18:37 +0200] notice/Process: Running command '/usr/lib64/nagios/plugins/check_procs' '-C' 'crond' '-c' '1:' '-w' '250': PID 27433
[2019-07-15 15:18:37 +0200] notice/Process: PID 27433 ('/usr/lib64/nagios/plugins/check_procs' '-C' 'crond' '-c' '1:' '-w' '250') terminated with exit code 0
[2019-07-15 15:19:09 +0200] notice/Process: Running command '/usr/lib64/nagios/plugins/check_procs' '-C' 'cron' '-c' '1:' '-w' '250': PID 27569
[2019-07-15 15:19:09 +0200] notice/Process: PID 27569 ('/usr/lib64/nagios/plugins/check_procs' '-C' 'cron' '-c' '1:' '-w' '250') terminated with exit code 2
[2019-07-15 15:19:37 +0200] notice/Process: Running command '/usr/lib64/nagios/plugins/check_procs' '-C' 'crond' '-c' '1:' '-w' '250': PID 27820
[2019-07-15 15:19:37 +0200] notice/Process: PID 27820 ('/usr/lib64/nagios/plugins/check_procs' '-C' 'crond' '-c' '1:' '-w' '250') terminated with exit code 0
[...]
So check_procs seems to alternate between exit code 2 (CRITICAL) and exit code 0 (OK) roughly every 30 seconds.
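To see whether the exit code depends on the arguments, the debug log lines can be summarised per process name passed to -C (the lines quoted above contain both 'cron' and 'crond'). This is only a minimal sketch, assuming the log format shown above; tally_check_procs is a hypothetical helper name, not part of Icinga 2 or the monitoring plugins.

```shell
#!/bin/sh
# Hypothetical helper: tally check_procs exit codes per '-C' argument.
# Reads Icinga 2 debug log lines on stdin, in the format quoted above.
tally_check_procs() {
    # Keep only check_procs lines, then extract "<name> <exit code>"
    # from the "terminated with exit code N" lines.
    grep "check_procs" |
    sed -n "s/.*'-C' '\([^']*\)'.*terminated with exit code \([0-9][0-9]*\).*/\1 \2/p" |
    sort | uniq -c |
    awk '{ printf "%s exit=%s count=%s\n", $2, $3, $1 }'
}
```

It could be run on the satellite as tally_check_procs < /var/log/icinga2/debug.log, which prints one line per combination of process name and exit code.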
I tried to reproduce the problem as the icinga user on the satellite, but when I ran /usr/lib64/nagios/plugins/check_procs -C crond -c 1: -w 250
1000 times in a row, every single run returned exit code 0 (OK):
bash-4.1$ for i in $(seq 1 1000); do /usr/lib64/nagios/plugins/check_procs -C crond -c 1: -w 250; done
PROCS OK: 1 process with command name 'crond' | procs=1;250;1:;0;
PROCS OK: 1 process with command name 'crond' | procs=1;250;1:;0;
PROCS OK: 1 process with command name 'crond' | procs=1;250;1:;0;
PROCS OK: 1 process with command name 'crond' | procs=1;250;1:;0;
PROCS OK: 1 process with command name 'crond' | procs=1;250;1:;0;
[...]
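Rather than reading 1000 lines of plugin output, the same loop can count exit codes, which also makes it easy to repeat the test with a different -C value such as the 'cron' that appears in the debug log. This is only a sketch under assumptions: count_exit_codes is a hypothetical helper name, and the thresholds are copied from the command above.

```shell
#!/bin/sh
# Hypothetical helper: run a check_procs-style plugin repeatedly and
# count how often each exit code occurs.
# Usage: count_exit_codes <plugin path> <process name> <number of runs>
count_exit_codes() {
    plugin=$1
    name=$2
    runs=$3
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Same thresholds as in the manual test; plugin output is discarded.
        "$plugin" -C "$name" -c 1: -w 250 >/dev/null 2>&1
        echo "$?"
        i=$((i + 1))
    done | sort | uniq -c | awk '{ printf "exit=%s count=%s\n", $2, $1 }'
}
```

For example, count_exit_codes /usr/lib64/nagios/plugins/check_procs crond 1000 followed by the same call with cron would show whether the two names behave differently on that host.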
Could you help me to understand what’s going on, please?
Thank you very much!