Hi,
My icinga2.log occasionally gets these:
$ grep "keep up" /var/log/icinga2/icinga2.log
[2020-08-31 02:39:06 +0000] information/WorkQueue: #5 (IdoMysqlConnection, ido-mysql) items: 412, rate: 45.7667/s (2746/min 15010/5min 44566/15min); empty in infinite time, your task handler isn't able to keep up
[2020-08-31 04:20:15 +0000] information/WorkQueue: #5 (IdoMysqlConnection, ido-mysql) items: 352, rate: 40.4667/s (2428/min 14643/5min 43157/15min); empty in infinite time, your task handler isn't able to keep up
[2020-08-31 05:06:06 +0000] information/WorkQueue: #5 (IdoMysqlConnection, ido-mysql) items: 197, rate: 38.15/s (2289/min 14581/5min 44453/15min); empty in infinite time, your task handler isn't able to keep up
[2020-08-31 14:18:03 +0000] information/WorkQueue: #5 (IdoMysqlConnection, ido-mysql) items: 327, rate: 45.2/s (2712/min 13649/5min 42860/15min); empty in infinite time, your task handler isn't able to keep up
As you can see this happens only a handful of times a day, and on either side of each occurrence the queue seems to be draining rapidly. For example, here are the 5 WorkQueue log lines before and after the most recent “isn’t able to keep up” message:
$ tac /var/log/icinga2/icinga2.log | grep "WorkQueue.*IdoMysql" | grep -m1 -A 5 -B 5 "keep up"
[2020-08-31 14:20:09 +0000] information/WorkQueue: #5 (IdoMysqlConnection, ido-mysql) items: 318, rate: 46.8167/s (2809/min 13697/5min 42966/15min); empty in less than 1 millisecond
[2020-08-31 14:19:44 +0000] information/WorkQueue: #5 (IdoMysqlConnection, ido-mysql) items: 352, rate: 41.45/s (2487/min 13672/5min 43281/15min); empty in less than 1 millisecond
[2020-08-31 14:19:18 +0000] information/WorkQueue: #5 (IdoMysqlConnection, ido-mysql) items: 424, rate: 39.9667/s (2398/min 13270/5min 42915/15min); empty in 8 minutes and 36 seconds
[2020-08-31 14:18:53 +0000] information/WorkQueue: #5 (IdoMysqlConnection, ido-mysql) items: 404, rate: 40.1333/s (2408/min 13365/5min 43185/15min); empty in 7 minutes and 26 seconds
[2020-08-31 14:18:28 +0000] information/WorkQueue: #5 (IdoMysqlConnection, ido-mysql) items: 381, rate: 44.3333/s (2660/min 13667/5min 43104/15min); empty in 2 minutes and 59 seconds
[2020-08-31 14:18:03 +0000] information/WorkQueue: #5 (IdoMysqlConnection, ido-mysql) items: 327, rate: 45.2/s (2712/min 13649/5min 42860/15min); empty in infinite time, your task handler isn't able to keep up
[2020-08-31 14:17:37 +0000] information/WorkQueue: #5 (IdoMysqlConnection, ido-mysql) items: 327, rate: 41.7667/s (2506/min 13697/5min 42893/15min); empty in less than 1 millisecond
[2020-08-31 14:17:12 +0000] information/WorkQueue: #5 (IdoMysqlConnection, ido-mysql) items: 333, rate: 36.5667/s (2194/min 13367/5min 42944/15min); empty in less than 1 millisecond
[2020-08-31 14:16:47 +0000] information/WorkQueue: #5 (IdoMysqlConnection, ido-mysql) items: 452, rate: 40.1167/s (2407/min 13354/5min 43148/15min); empty in 24 seconds
[2020-08-31 14:16:22 +0000] information/WorkQueue: #5 (IdoMysqlConnection, ido-mysql) items: 1, rate: 50.5333/s (3032/min 13759/5min 43541/15min);
[2020-08-31 14:15:57 +0000] information/WorkQueue: #5 (IdoMysqlConnection, ido-mysql) items: 1, rate: 50.05/s (3003/min 13997/5min 43231/15min);
So the “empty in” estimate ranges from less than 1 ms to several seconds or minutes, and then suddenly “infinite”.
Should I be concerned?
What does it actually mean? My guess: there is a sudden surge of rows being added to the WorkQueue, the drain rate briefly falls below the insertion rate, and for that short period Icinga concludes the queue will never empty; then the surge passes and it drains normally again. Is that right?
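To make my guess concrete, here is a back-of-the-envelope model of how I imagine the estimate works (this is my reading of the log messages, not Icinga 2's actual code; the function name and numbers are mine):

```python
# Rough model of the WorkQueue "empty in" estimate: with `items` pending
# and a smoothed processing rate, the ETA is items / net_drain_rate.
# If new items arrive at least as fast as they are processed, the net
# rate is <= 0 and the ETA is "infinite" -- the "isn't able to keep up"
# case. (Hypothetical sketch, not Icinga 2's implementation.)

def empty_eta(items, drain_rate, arrival_rate):
    """Seconds until the queue is empty, or None for 'infinite'."""
    net = drain_rate - arrival_rate
    if net <= 0:
        return None  # queue not shrinking -> "isn't able to keep up"
    return items / net

# A short arrival burst flips the estimate to infinite...
assert empty_eta(327, 45.2, 50.0) is None
# ...and it recovers as soon as arrivals dip back below the drain rate.
assert round(empty_eta(318, 46.8, 45.0), 1) == 176.7
```

If that model is roughly right, a momentary “infinite” estimate during a burst would be cosmetic as long as the queue empties again shortly afterwards, which matches the log excerpts above.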
I haven’t noticed any actual problems, other than this logging. The Icingaweb late-checks dashboard shows nothing, but I haven’t managed to look at it at the exact moment an “isn’t able to keep up” message appears.
This is a pretty small setup:
$ sudo icinga2 feature list
Disabled features: compatlog debuglog elasticsearch gelf graphite icingadb icingastatus influxdb livestatus opentsdb perfdata statusdata syslog
Enabled features: api checker command ido-mysql mainlog notification
$ sudo icinga2 daemon -C
[2020-08-31 16:48:29 +0000] information/cli: Icinga application loader (version: r2.12.0-1)
[2020-08-31 16:48:29 +0000] information/cli: Loading configuration file(s).
[2020-08-31 16:48:29 +0000] information/ConfigItem: Committing config item(s).
[2020-08-31 16:48:29 +0000] information/ApiListener: My API identity: mon0.example.com
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 1 NotificationComponent.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 251 Hosts.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 1 Downtime.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 7 NotificationCommands.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 1 FileLogger.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 12 Comments.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 2914 Notifications.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 1 IcingaApplication.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 180 HostGroups.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 1333 Dependencies.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 1 CheckerComponent.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 3 Zones.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 1 Endpoint.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 1 ExternalCommandListener.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 1 IdoMysqlConnection.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 1 ApiUser.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 1 ApiListener.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 238 CheckCommands.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 3 TimePeriods.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 168 UserGroups.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 192 Users.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 2325 Services.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 98 ServiceGroups.
[2020-08-31 16:48:31 +0000] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2020-08-31 16:48:31 +0000] information/cli: Finished validating the configuration file(s).
(Although 3 zones are defined, only 1 is used.)
It runs on a virtual machine with 4 GiB RAM and 8 CPUs; storage is SSD, and MariaDB runs on the same host. I don’t see excessive memory usage, and the system is stable with a load average hovering around 2. One icinga2 process appears to use 100% CPU, but it has always done this, so I assume that’s normal.
$ top -b -n1 -o '%CPU' | head -40 | tee top.txt
top - 17:06:40 up 2 days, 1:31, 5 users, load average: 2.41, 2.21, 2.09
Tasks: 209 total, 3 running, 195 sleeping, 2 stopped, 9 zombie
%Cpu0 : 1.1 us, 18.5 sy, 3.9 ni, 74.9 id, 0.0 wa, 0.0 hi, 0.0 si, 1.7 st
%Cpu1 : 1.1 us, 18.4 sy, 3.8 ni, 74.9 id, 0.0 wa, 0.0 hi, 0.0 si, 1.8 st
%Cpu2 : 1.0 us, 18.5 sy, 3.8 ni, 74.9 id, 0.0 wa, 0.0 hi, 0.0 si, 1.7 st
%Cpu3 : 1.6 us, 18.3 sy, 3.8 ni, 74.5 id, 0.0 wa, 0.0 hi, 0.1 si, 1.7 st
%Cpu4 : 1.1 us, 18.3 sy, 3.8 ni, 75.0 id, 0.1 wa, 0.0 hi, 0.0 si, 1.7 st
%Cpu5 : 1.3 us, 18.0 sy, 3.8 ni, 74.9 id, 0.0 wa, 0.0 hi, 0.1 si, 1.8 st
%Cpu6 : 1.1 us, 18.5 sy, 3.9 ni, 74.8 id, 0.0 wa, 0.0 hi, 0.0 si, 1.7 st
%Cpu7 : 1.4 us, 18.4 sy, 3.8 ni, 74.6 id, 0.0 wa, 0.0 hi, 0.0 si, 1.7 st
KiB Mem : 4099432 total, 2342880 free, 844668 used, 911884 buff/cache
KiB Swap: 1047548 total, 1047548 free, 0 used. 3143744 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1803 nagios 20 0 2026868 194116 17196 S 138.1 4.7 190:13.86 /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2 --no-stack-rlimit daemo+
2130 nagios 20 0 846180 78064 4764 R 100.0 1.9 2949:06 /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2 --no-stack-rlimit daemo+
845 mysql 20 0 1141600 420684 17384 S 33.3 10.3 7:55.53 /usr/sbin/mysqld
5646 nagios 25 5 19112 6016 3148 R 33.3 0.1 0:00.07 /usr/bin/snmpget -Le -t 10 -r 5 -m ALL -v 1 -c 192.168.80.223:+
5627 andy 20 0 44932 3896 3264 R 14.3 0.1 0:00.05 top -b -n1 -o %CPU
5639 nagios 25 5 0 0 0 Z 9.5 0.0 0:00.02 [check_http] <defunct>
5634 nagios 25 5 0 0 0 Z 4.8 0.0 0:00.01 [check_dig] <defunct>
1 root 20 0 204668 7020 5356 S 0.0 0.2 0:46.17 /sbin/init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.08 [kthreadd]
3 root 20 0 0 0 0 S 0.0 0.0 0:05.43 [ksoftirqd/0]
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 [kworker/0:0H]
7 root 20 0 0 0 0 S 0.0 0.0 9:17.29 [rcu_sched]
8 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [rcu_bh]
9 root rt 0 0 0 0 S 0.0 0.0 0:19.34 [migration/0]
10 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 [lru-add-drain]
11 root rt 0 0 0 0 S 0.0 0.0 0:01.12 [watchdog/0]
12 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [cpuhp/0]
13 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [cpuhp/1]
14 root rt 0 0 0 0 S 0.0 0.0 0:01.18 [watchdog/1]
15 root rt 0 0 0 0 S 0.0 0.0 0:19.17 [migration/1]
16 root 20 0 0 0 0 S 0.0 0.0 0:05.38 [ksoftirqd/1]
18 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 [kworker/1:0H]
19 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [cpuhp/2]
20 root rt 0 0 0 0 S 0.0 0.0 0:01.10 [watchdog/2]
21 root rt 0 0 0 0 S 0.0 0.0 0:19.08 [migration/2]
22 root 20 0 0 0 0 S 0.0 0.0 0:04.93 [ksoftirqd/2]
Anything to be concerned about?
Thanks,
Andy