Occasional IdoMysql "isn't able to keep up" logs

Hi,

My icinga2.log occasionally gets these:

$ grep "keep up" /var/log/icinga2/icinga2.log
[2020-08-31 02:39:06 +0000] information/WorkQueue: #5 (IdoMysqlConnection, ido-mysql) items: 412, rate: 45.7667/s (2746/min 15010/5min 44566/15min); empty in infinite time, your task handler isn't able to keep up
[2020-08-31 04:20:15 +0000] information/WorkQueue: #5 (IdoMysqlConnection, ido-mysql) items: 352, rate: 40.4667/s (2428/min 14643/5min 43157/15min); empty in infinite time, your task handler isn't able to keep up
[2020-08-31 05:06:06 +0000] information/WorkQueue: #5 (IdoMysqlConnection, ido-mysql) items: 197, rate: 38.15/s (2289/min 14581/5min 44453/15min); empty in infinite time, your task handler isn't able to keep up
[2020-08-31 14:18:03 +0000] information/WorkQueue: #5 (IdoMysqlConnection, ido-mysql) items: 327, rate: 45.2/s (2712/min 13649/5min 42860/15min); empty in infinite time, your task handler isn't able to keep up

As you can see, this doesn’t happen very often, and on either side of any occurrence the queue seems to be draining rapidly. For example, here are the 5 WorkQueue log lines before and after the most recent “isn’t able to keep up” log:

$ tac /var/log/icinga2/icinga2.log | grep "WorkQueue.*IdoMysql" | grep -m1 -A 5 -B 5  "keep up"                                                                            
[2020-08-31 14:20:09 +0000] information/WorkQueue: #5 (IdoMysqlConnection, ido-mysql) items: 318, rate: 46.8167/s (2809/min 13697/5min 42966/15min); empty in less than 1 millisecond
[2020-08-31 14:19:44 +0000] information/WorkQueue: #5 (IdoMysqlConnection, ido-mysql) items: 352, rate: 41.45/s (2487/min 13672/5min 43281/15min); empty in less than 1 millisecond
[2020-08-31 14:19:18 +0000] information/WorkQueue: #5 (IdoMysqlConnection, ido-mysql) items: 424, rate: 39.9667/s (2398/min 13270/5min 42915/15min); empty in 8 minutes and 36 seconds
[2020-08-31 14:18:53 +0000] information/WorkQueue: #5 (IdoMysqlConnection, ido-mysql) items: 404, rate: 40.1333/s (2408/min 13365/5min 43185/15min); empty in 7 minutes and 26 seconds
[2020-08-31 14:18:28 +0000] information/WorkQueue: #5 (IdoMysqlConnection, ido-mysql) items: 381, rate: 44.3333/s (2660/min 13667/5min 43104/15min); empty in 2 minutes and 59 seconds
[2020-08-31 14:18:03 +0000] information/WorkQueue: #5 (IdoMysqlConnection, ido-mysql) items: 327, rate: 45.2/s (2712/min 13649/5min 42860/15min); empty in infinite time, your task handler isn't able to keep up
[2020-08-31 14:17:37 +0000] information/WorkQueue: #5 (IdoMysqlConnection, ido-mysql) items: 327, rate: 41.7667/s (2506/min 13697/5min 42893/15min); empty in less than 1 millisecond
[2020-08-31 14:17:12 +0000] information/WorkQueue: #5 (IdoMysqlConnection, ido-mysql) items: 333, rate: 36.5667/s (2194/min 13367/5min 42944/15min); empty in less than 1 millisecond
[2020-08-31 14:16:47 +0000] information/WorkQueue: #5 (IdoMysqlConnection, ido-mysql) items: 452, rate: 40.1167/s (2407/min 13354/5min 43148/15min); empty in 24 seconds
[2020-08-31 14:16:22 +0000] information/WorkQueue: #5 (IdoMysqlConnection, ido-mysql) items: 1, rate: 50.5333/s (3032/min 13759/5min 43541/15min);
[2020-08-31 14:15:57 +0000] information/WorkQueue: #5 (IdoMysqlConnection, ido-mysql) items: 1, rate: 50.05/s (3003/min 13997/5min 43231/15min);

So the “empty in” estimate ranges from less than 1 millisecond through seconds and minutes to “infinite”.

Should I be concerned?

What does it actually mean? Is it that there’s a sudden surge of rows being added to the WorkQueue, so that for a short period the rate of draining is less than the rate of insertion and Icinga believes the queue will never drain, until things calm down and it does drain?
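
To put my guess into numbers (just back-of-envelope arithmetic on my part, not what Icinga actually computes): if the queue holds N items, items arrive at rate A and drain at rate D, then a naive time-to-empty is N / (D - A), which becomes “infinite” whenever D <= A. For example, taking the 327 queued items from the 14:18:03 line and two made-up rates:

$ awk -v items=327 -v arrive=45.2 -v drain=44.0 \
    'BEGIN { net = drain - arrive;
             if (net > 0) print items / net " seconds to empty";
             else         print "infinite" }'
infinite

Bump drain to 46.0 and the same command prints “408.75 seconds to empty”, so a small swing in the relative rates is enough to flip the estimate between seconds and “infinite”.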

I haven’t noticed any problems as such, other than this logging. In the Icingaweb late checks dashboard I see nothing, but I haven’t managed to look at it exactly when an “isn’t able to keep up” log happens.

This is a pretty small setup:

$ sudo icinga2 feature list
Disabled features: compatlog debuglog elasticsearch gelf graphite icingadb icingastatus influxdb livestatus opentsdb perfdata statusdata syslog
Enabled features: api checker command ido-mysql mainlog notification
$ sudo icinga2 daemon -C                                                   
[2020-08-31 16:48:29 +0000] information/cli: Icinga application loader (version: r2.12.0-1)                                                                                 
[2020-08-31 16:48:29 +0000] information/cli: Loading configuration file(s).
[2020-08-31 16:48:29 +0000] information/ConfigItem: Committing config item(s).
[2020-08-31 16:48:29 +0000] information/ApiListener: My API identity: mon0.example.com
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 1 NotificationComponent.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 251 Hosts.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 1 Downtime.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 7 NotificationCommands.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 1 FileLogger.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 12 Comments.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 2914 Notifications.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 1 IcingaApplication.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 180 HostGroups.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 1333 Dependencies.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 1 CheckerComponent.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 3 Zones.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 1 Endpoint.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 1 ExternalCommandListener.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 1 IdoMysqlConnection.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 1 ApiUser.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 1 ApiListener.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 238 CheckCommands.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 3 TimePeriods.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 168 UserGroups.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 192 Users.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 2325 Services.
[2020-08-31 16:48:31 +0000] information/ConfigItem: Instantiated 98 ServiceGroups.
[2020-08-31 16:48:31 +0000] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2020-08-31 16:48:31 +0000] information/cli: Finished validating the configuration file(s).

(Although 3 zones are defined, only 1 is used.)

It runs on a virtual machine with 4 GiB of RAM and 8 CPUs; the storage is SSD. MariaDB runs on the same host. I don’t see excessive memory usage and the system is stable, with a load average hovering around 2. One icinga2 process appears to use 100% CPU, but it has always done this, so I assume that is normal.

$ top -b -n1 -o '%CPU' | head -40 | tee top.txt
top - 17:06:40 up 2 days,  1:31,  5 users,  load average: 2.41, 2.21, 2.09
Tasks: 209 total,   3 running, 195 sleeping,   2 stopped,   9 zombie
%Cpu0  :  1.1 us, 18.5 sy,  3.9 ni, 74.9 id,  0.0 wa,  0.0 hi,  0.0 si,  1.7 st
%Cpu1  :  1.1 us, 18.4 sy,  3.8 ni, 74.9 id,  0.0 wa,  0.0 hi,  0.0 si,  1.8 st
%Cpu2  :  1.0 us, 18.5 sy,  3.8 ni, 74.9 id,  0.0 wa,  0.0 hi,  0.0 si,  1.7 st
%Cpu3  :  1.6 us, 18.3 sy,  3.8 ni, 74.5 id,  0.0 wa,  0.0 hi,  0.1 si,  1.7 st
%Cpu4  :  1.1 us, 18.3 sy,  3.8 ni, 75.0 id,  0.1 wa,  0.0 hi,  0.0 si,  1.7 st
%Cpu5  :  1.3 us, 18.0 sy,  3.8 ni, 74.9 id,  0.0 wa,  0.0 hi,  0.1 si,  1.8 st
%Cpu6  :  1.1 us, 18.5 sy,  3.9 ni, 74.8 id,  0.0 wa,  0.0 hi,  0.0 si,  1.7 st
%Cpu7  :  1.4 us, 18.4 sy,  3.8 ni, 74.6 id,  0.0 wa,  0.0 hi,  0.0 si,  1.7 st
KiB Mem :  4099432 total,  2342880 free,   844668 used,   911884 buff/cache
KiB Swap:  1047548 total,  1047548 free,        0 used.  3143744 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1803 nagios    20   0 2026868 194116  17196 S 138.1  4.7 190:13.86 /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2 --no-stack-rlimit daemo+
 2130 nagios    20   0  846180  78064   4764 R 100.0  1.9   2949:06 /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2 --no-stack-rlimit daemo+
  845 mysql     20   0 1141600 420684  17384 S  33.3 10.3   7:55.53 /usr/sbin/mysqld
 5646 nagios    25   5   19112   6016   3148 R  33.3  0.1   0:00.07 /usr/bin/snmpget -Le -t 10 -r 5 -m ALL -v 1 -c          192.168.80.223:+
 5627 andy      20   0   44932   3896   3264 R  14.3  0.1   0:00.05 top -b -n1 -o %CPU
 5639 nagios    25   5       0      0      0 Z   9.5  0.0   0:00.02 [check_http] <defunct>
 5634 nagios    25   5       0      0      0 Z   4.8  0.0   0:00.01 [check_dig] <defunct>
    1 root      20   0  204668   7020   5356 S   0.0  0.2   0:46.17 /sbin/init
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.08 [kthreadd]
    3 root      20   0       0      0      0 S   0.0  0.0   0:05.43 [ksoftirqd/0]
    5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 [kworker/0:0H]
    7 root      20   0       0      0      0 S   0.0  0.0   9:17.29 [rcu_sched]
    8 root      20   0       0      0      0 S   0.0  0.0   0:00.00 [rcu_bh]
    9 root      rt   0       0      0      0 S   0.0  0.0   0:19.34 [migration/0]
   10 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 [lru-add-drain]
   11 root      rt   0       0      0      0 S   0.0  0.0   0:01.12 [watchdog/0]
   12 root      20   0       0      0      0 S   0.0  0.0   0:00.00 [cpuhp/0]
   13 root      20   0       0      0      0 S   0.0  0.0   0:00.00 [cpuhp/1]
   14 root      rt   0       0      0      0 S   0.0  0.0   0:01.18 [watchdog/1]
   15 root      rt   0       0      0      0 S   0.0  0.0   0:19.17 [migration/1]
   16 root      20   0       0      0      0 S   0.0  0.0   0:05.38 [ksoftirqd/1]
   18 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 [kworker/1:0H]
   19 root      20   0       0      0      0 S   0.0  0.0   0:00.00 [cpuhp/2]
   20 root      rt   0       0      0      0 S   0.0  0.0   0:01.10 [watchdog/2]
   21 root      rt   0       0      0      0 S   0.0  0.0   0:19.08 [migration/2]
   22 root      20   0       0      0      0 S   0.0  0.0   0:04.93 [ksoftirqd/2]
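
(If it helps, a per-thread view should show which thread that ~100% is; 1803 is the icinga2 PID from the output above, though I haven’t looked any closer than this:)

$ top -b -n1 -H -p 1803 | head -20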

Anything to be concerned about?

Thanks,
Andy

Hi @grifferz,

the “empty in” estimate can sometimes behave quite oddly. In such cases it’s just important that the items don’t pile up in the work queue. In your case everything seems to work as expected.
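
If you want to double-check that, you can pull the queue sizes out of the log and confirm they stay bounded, e.g. (assuming the default log path, as in your grep above):

$ grep 'ido-mysql' /var/log/icinga2/icinga2.log \
    | grep -o 'items: [0-9]*' | awk '{ print $2 }' | sort -n | tail -3

As long as the largest values stay in the few-hundred range you’re already seeing, nothing is accumulating.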

That time calculation definitely needs some fixing, though.

Greetings
Noah