icinga2 - The Icinga 2 network monitoring daemon (version: 2.13.1-1)
Kernel 3.10.0-1160.25.1.el7.x86_64
CentOS Linux release 7.9.2009
Sudoers I/O plugin version 1.8.23
On an Icinga client I see many similar errors, which produce a lot of sudo zombie processes:
[2021-11-08 14:58:28 +0100] warning/Process: Killing process group 18816 (’/usr/bin/sudo’ ‘/usr/lib64/nagios/plugins/site/privileged/check_bind.sh’) after timeout of 66 seconds
[2021-11-08 14:58:28 +0100] warning/Process: Couldn’t kill the process group 18816 (’/usr/bin/sudo’ ‘/usr/lib64/nagios/plugins/site/privileged/check_bind.sh’): [errno 1] Operation not permitted
[2021-11-08 14:58:28 +0100] warning/PluginCheckTask: Check command for object ‘my.hidden.host!bind’ (PID: 18816, arguments: ‘/usr/bin/sudo’ ‘/usr/lib64/nagios/plugins/site/privileged/check_bind.sh’) terminated with exit code 128, output:
[2021-11-08 15:00:26 +0100] warning/Process: Terminating process 18938 (’/usr/bin/sudo’ ‘/usr/lib64/nagios/plugins/site/privileged/check_bind.sh’) after timeout of 60 seconds
[2021-11-08 15:00:26 +0100] warning/Process: Couldn’t terminate the process 18938 (’/usr/bin/sudo’ ‘/usr/lib64/nagios/plugins/site/privileged/check_bind.sh’): [errno 1] Operation not permitted
[2021-11-08 15:00:29 +0100] warning/PluginCheckTask: Check command for object ‘my.hidden.host!bind’ (PID: 18938, arguments: ‘/usr/bin/sudo’ ‘/usr/lib64/nagios/plugins/site/privileged/check_bind.sh’) terminated with exit code 128, output: tac: write error
[2021-11-08 15:05:01 +0100] warning/Process: Terminating process 19213 (’/usr/bin/sudo’ ‘/usr/lib64/nagios/plugins/site/privileged/check_bind.sh’) after timeout of 60 seconds
[2021-11-08 15:05:01 +0100] warning/Process: Couldn’t terminate the process 19213 (’/usr/bin/sudo’ ‘/usr/lib64/nagios/plugins/site/privileged/check_bind.sh’): [errno 1] Operation not permitted
[2021-11-08 15:05:07 +0100] warning/Process: Killing process group 19213 (’/usr/bin/sudo’ ‘/usr/lib64/nagios/plugins/site/privileged/check_bind.sh’) after timeout of 66 seconds
[2021-11-08 15:05:07 +0100] warning/Process: Couldn’t kill the process group 19213 (’/usr/bin/sudo’ ‘/usr/lib64/nagios/plugins/site/privileged/check_bind.sh’): [errno 1] Operation not permitted
[2021-11-08 15:05:07 +0100] warning/PluginCheckTask: Check command for object ‘my.hidden.host!bind’ (PID: 19213, arguments: ‘/usr/bin/sudo’ ‘/usr/lib64/nagios/plugins/site/privileged/check_bind.sh’) terminated with exit code 128, output:
[2021-11-08 15:07:06 +0100] warning/Process: Terminating process 19395 (’/usr/bin/sudo’ ‘/usr/lib64/nagios/plugins/site/privileged/check_bind.sh’) after timeout of 60 seconds
[2021-11-08 15:07:06 +0100] warning/Process: Couldn’t terminate the process 19395 (’/usr/bin/sudo’ ‘/usr/lib64/nagios/plugins/site/privileged/check_bind.sh’): [errno 1] Operation not permitted
[2021-11-08 15:07:12 +0100] warning/Process: Killing process group 19395 (’/usr/bin/sudo’ ‘/usr/lib64/nagios/plugins/site/privileged/check_bind.sh’) after timeout of 66 seconds
[2021-11-08 15:07:12 +0100] warning/Process: Couldn’t kill the process group 19395 (’/usr/bin/sudo’ ‘/usr/lib64/nagios/plugins/site/privileged/check_bind.sh’): [errno 1] Operation not permitted
[2021-11-08 15:07:12 +0100] warning/PluginCheckTask: Check command for object ‘my.hidden.host!bind’ (PID: 19395, arguments: ‘/usr/bin/sudo’ ‘/usr/lib64/nagios/plugins/site/privileged/check_bind.sh’) terminated with exit code 128, output:
[2021-11-08 15:09:11 +0100] warning/Process: Terminating process 19463 (’/usr/bin/sudo’ ‘/usr/lib64/nagios/plugins/site/privileged/check_bind.sh’) after timeout of 60 seconds
[2021-11-08 15:09:11 +0100] warning/Process: Couldn’t terminate the process 19463 (’/usr/bin/sudo’ ‘/usr/lib64/nagios/plugins/site/privileged/check_bind.sh’): [errno 1] Operation not permitted
[2021-11-08 15:09:18 +0100] warning/Process: Killing process group 19463 (’/usr/bin/sudo’ ‘/usr/lib64/nagios/plugins/site/privileged/check_bind.sh’) after timeout of 66 seconds
[2021-11-08 15:09:18 +0100] warning/Process: Couldn’t kill the process group 19463 (’/usr/bin/sudo’ ‘/usr/lib64/nagios/plugins/site/privileged/check_bind.sh’): [errno 1] Operation not permitted
[2021-11-08 15:09:18 +0100] warning/PluginCheckTask: Check command for object ‘my.hidden.host!bind’ (PID: 19463, arguments: ‘/usr/bin/sudo’ ‘/usr/lib64/nagios/plugins/site/privileged/check_bind.sh’) terminated with exit code 128, output:
[root@my tmp]# ps ax | grep defunc
19213 ? ZNs 0:00 [sudo]
19395 ? ZNs 0:00 [sudo]
19463 ? ZNs 0:00 [sudo]
19771 ? ZNs 0:00 [sudo]
19890 ? ZNs 0:00 [sudo]
[root@my ~]# ps ax | grep defunc | wc -l
53
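As a side note, `grep defunc` can also match the grep process itself, so the count above may be off by one. A minimal sketch for listing only real zombies together with their parent PID is below; the helper name `filter_zombies` is made up for illustration, and the usage lines assume the procps `ps` shipped with CentOS 7.

```shell
#!/usr/bin/env bash
# filter_zombies: hypothetical helper (not part of Icinga or sudo).
# Reads "PID PPID STAT COMMAND" lines on stdin and prints only the rows
# whose STAT column starts with Z (zombie / defunct).
filter_zombies() {
    awk '$3 ~ /^Z/ { print $0 }'
}

# Usage (assumes procps ps):
#   ps -eo pid,ppid,stat,comm | filter_zombies          # list zombies with parent PID
#   ps -eo pid,ppid,stat,comm | filter_zombies | wc -l  # count them
```

The PPID column shows which process has not yet reaped the zombies, which may help to tell whether they hang off the icinga2 daemon or something else.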
The script, which is executed via sudo, has all the necessary privileges and works like a charm when I run it from bash:
[root@my tmp]# sudo -u icinga /usr/bin/sudo /usr/lib64/nagios/plugins/site/privileged/check_bind.sh
Bind9 is running. 206 successfull requests, 0 referrals, 25 nxdomains since last check. | ‘success’=206 ‘referral’=0 ‘nxrrset’=69 ‘nxdomain’=25 ‘recursion’=0 ‘failure’=0 ‘duplicate’=0 ‘dropped’=0
From bash it works even when I run it every second in a while true loop, but when it runs within Icinga it constantly produces the same error, almost every minute. I have just spent all day on this problem. Has anyone experienced the same?
I found that there are some known sudo bugs, such as the one described in "sudo hangs and leaves the executed program as “zombie” | /contrib/famzah", but that does not seem to be my case, since I get no issues or sudo zombie processes when I run sudo from bash directly. When invoked from Icinga, however, sudo always leaves a lot of zombies.
The script looks fine as well. It is the pretty old check_bind.sh from Nagios Exchange, but it works fine on other machines.
Can someone point out what else can be done to investigate whether this is a script bug or a sudo bug?
Sometimes I see errors from the tac command like these:
[2021-11-08 04:13:53 +0100] warning/PluginCheckTask: Check command for object ‘my.hidden.host!bind’ (PID: 32657, arguments: ‘/usr/bin/sudo’ ‘/usr/lib64/nagios/plugins/site/privileged/check_bind.sh’) terminated with exit code 128, output: tac: write error: Broken pipe
[2021-11-08 04:26:21 +0100] warning/PluginCheckTask: Check command for object ‘my.hidden.host!bind’ (PID: 1507, arguments: ‘/usr/bin/sudo’ ‘/usr/lib64/nagios/plugins/site/privileged/check_bind.sh’) terminated with exit code 128, output: tac: write error: Broken pipe
[2021-11-08 04:32:33 +0100] warning/PluginCheckTask: Check command for object ‘my.hidden.host!bind’ (PID: 1897, arguments: ‘/usr/bin/sudo’ ‘/usr/lib64/nagios/plugins/site/privileged/check_bind.sh’) terminated with exit code 128, output: tac: write error
[2021-11-08 05:44:30 +0100] warning/PluginCheckTask: Check command for object ‘my.hidden.host!bind’ (PID: 7810, arguments: ‘/usr/bin/sudo’ ‘/usr/lib64/nagios/plugins/site/privileged/check_bind.sh’) terminated with exit code 128, output: tac: write error: Broken pipe
[2021-11-08 05:47:39 +0100] warning/PluginCheckTask: Check command for object ‘my.hidden.host!bind’ (PID: 8043, arguments: ‘/usr/bin/sudo’ ‘/usr/lib64/nagios/plugins/site/privileged/check_bind.sh’) terminated with exit code 128, output: tac: write error
[2021-11-08 07:29:11 +0100] warning/PluginCheckTask: Check command for object ‘my.hidden.host!bind’ (PID: 16830, arguments: ‘/usr/bin/sudo’ ‘/usr/lib64/nagios/plugins/site/privileged/check_bind.sh’) terminated with exit code 128, output: tac: write error
[2021-11-08 08:33:58 +0100] warning/PluginCheckTask: Check command for object ‘my.hidden.host!bind’ (PID: 22126, arguments: ‘/usr/bin/sudo’ ‘/usr/lib64/nagios/plugins/site/privileged/check_bind.sh’) terminated with exit code 128, output: tac: write error
[2021-11-08 09:10:05 +0100] warning/PluginCheckTask: Check command for object ‘my.hidden.host!bind’ (PID: 25123, arguments: ‘/usr/bin/sudo’ ‘/usr/lib64/nagios/plugins/site/privileged/check_bind.sh’) terminated with exit code 128, output: tac: write error: Broken pipe
[2021-11-08 11:43:26 +0100] warning/PluginCheckTask: Check command for object ‘my.hidden.host!bind’ (PID: 32083, arguments: ‘/usr/bin/sudo’ ‘/usr/lib64/nagios/plugins/site/privileged/check_bind.sh’) terminated with exit code 128, output: tac: write error
[2021-11-08 15:00:29 +0100] warning/PluginCheckTask: Check command for object ‘my.hidden.host!bind’ (PID: 18938, arguments: ‘/usr/bin/sudo’ ‘/usr/lib64/nagios/plugins/site/privileged/check_bind.sh’) terminated with exit code 128, output: tac: write error
But again, when I run the script manually via sudo as the icinga user, about 5 times a second for about 10 minutes, I don’t see any slowness, bugs, zombies, or these tac errors.
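For reference, the manual runs described above were roughly of the following shape. This is only a sketch: the helper name `run_repeatedly` is made up for illustration, fractional `sleep` values assume GNU coreutils, and a non-zero exit code is counted as a failure even though a Nagios-style plugin also returns non-zero for WARNING or CRITICAL.

```shell
#!/usr/bin/env bash
# run_repeatedly: hypothetical helper (illustration only).
# Runs a command COUNT times, sleeping INTERVAL seconds between runs,
# and prints the number of runs that returned a non-zero exit code.
run_repeatedly() {
    local count=$1 interval=$2
    shift 2
    local failures=0 i
    for ((i = 0; i < count; i++)); do
        # Discard plugin output; only the exit status is of interest here.
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        sleep "$interval"
    done
    echo "$failures"
}

# Usage: about 5 runs per second for about 10 minutes (3000 runs).
#   run_repeatedly 3000 0.2 sudo -u icinga /usr/bin/sudo \
#       /usr/lib64/nagios/plugins/site/privileged/check_bind.sh
```

After such a run, the zombie count can be checked again with ps to compare against what happens under Icinga.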