Icinga2 IDO-MySQL pending queries spike in 10 seconds

Hi all,

We have a simple Icinga2 2.11.8 master zone with two endpoints, icinga-01 and icinga-02. IDO-MySQL connects to a Galera cluster, and the IDO-MySQL HA feature is enabled. The active IDO-MySQL node, usually icinga-02, can suddenly queue a huge number of database queries within, say, 10 seconds:

Logs (redacted):

[2021-05-04 15:24:04 +0000] information/IdoMysqlConnection: Pending queries: 7507 (Input: 1295/s; Output: 580/s)
[2021-05-04 15:24:19 +0000] information/ApiListener: New client connection from [192.168.65.245]:48098 (no client certificate)
[2021-05-04 15:24:19 +0000] information/HttpServerConnection: Request: GET /v1/status/CIB (from [192.168.65.245]:48098), user: root, agent: curl/7.29.0, status: OK).
[2021-05-04 15:24:19 +0000] information/HttpServerConnection: HTTP client disconnected (from [192.168.65.245]:48098)
[2021-05-04 15:24:27 +0000] information/Checkable: Checkable 'xxxxxxxxxx'  has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2021-05-04 15:24:30 +0000] information/Checkable: Notifications are disabled for checkable 'xxxxxxxx'.
[2021-05-04 15:24:49 +0000] information/ApiListener: New client connection from [192.168.1.251]:47632 (no client certificate)
[2021-05-04 15:24:49 +0000] information/HttpServerConnection: Request: GET /v1/status/CIB (from [192.168.1.251]:47632), user: root, agent: curl/7.29.0, status: OK).
[2021-05-04 15:24:49 +0000] information/HttpServerConnection: HTTP client disconnected (from [192.168.1.251]:47632)
[2021-05-04 15:25:01 +0000] information/Checkable: Checkable 'xxxxx' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2021-05-04 15:25:02 +0000] information/Checkable: Notifications are disabled for checkable 'xxxxx'.
[2021-05-04 15:25:02 +0000] information/Checkable: Notifications are disabled for checkable 'xxx'.
[2021-05-04 15:25:04 +0000] information/IdoMysqlConnection: Pending queries: 661413 (Input: 69374/s; Output: 3265/s).
.......

The DB was not blocked and was working well during that time. The IDO query input rate spiked from 1295/s to 69374/s and went even higher later:

[2021-05-04 15:25:14 +0000] information/IdoMysqlConnection: Pending queries: 1726627 (Input: 104354/s; Output: 20/s)
.........
[2021-05-04 15:25:34 +0000] information/IdoMysqlConnection: Pending queries: 4588794 (Input: 149639/s; Output: 13758/s)
.........
[2021-05-04 15:25:44 +0000] information/IdoMysqlConnection: Pending queries: 5708509 (Input: 126442/s; Output: 15204/s)
.........
[2021-05-04 15:26:24 +0000] information/IdoMysqlConnection: Pending queries: 9079370 (Input: 52845/s; Output: 17177/s)

Eventually Icinga2 ran out of memory and was killed by the OOM killer. It’s an 8GB box that normally has 6GB of memory free.
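
To put numbers on the spike, here is a minimal Python sketch that pulls the "Pending queries" lines out of the log and compares the observed backlog growth with the reported Input minus Output rate. It assumes the log format is exactly what the excerpts above show; the names `parse_ido_lines` and `backlog_growth` are made up for this post and are not part of Icinga2.

```python
import re
from datetime import datetime

# Matches the IdoMysqlConnection lines as they appear in the excerpts above.
LINE_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) [+-]\d{4}\] "
    r"information/IdoMysqlConnection: Pending queries: (?P<pending>\d+) "
    r"\(Input: (?P<inp>\d+)/s; Output: (?P<out>\d+)/s\)"
)

# Lines copied from the log excerpts in this post.
SAMPLE = """\
[2021-05-04 15:24:04 +0000] information/IdoMysqlConnection: Pending queries: 7507 (Input: 1295/s; Output: 580/s)
[2021-05-04 15:25:04 +0000] information/IdoMysqlConnection: Pending queries: 661413 (Input: 69374/s; Output: 3265/s).
[2021-05-04 15:25:14 +0000] information/IdoMysqlConnection: Pending queries: 1726627 (Input: 104354/s; Output: 20/s)
[2021-05-04 15:25:34 +0000] information/IdoMysqlConnection: Pending queries: 4588794 (Input: 149639/s; Output: 13758/s)
[2021-05-04 15:25:44 +0000] information/IdoMysqlConnection: Pending queries: 5708509 (Input: 126442/s; Output: 15204/s)
[2021-05-04 15:26:24 +0000] information/IdoMysqlConnection: Pending queries: 9079370 (Input: 52845/s; Output: 17177/s)
"""


def parse_ido_lines(text):
    """Return (timestamp, pending, input_rate, output_rate) per matching line."""
    samples = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            samples.append((
                datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S"),
                int(m["pending"]),
                int(m["inp"]),
                int(m["out"]),
            ))
    return samples


def backlog_growth(samples):
    """For each consecutive pair of samples, return
    (timestamp, observed backlog growth per second, reported input - output)."""
    rows = []
    for (t0, p0, _, _), (t1, p1, inp, out) in zip(samples, samples[1:]):
        secs = (t1 - t0).total_seconds()
        rows.append((t1, (p1 - p0) / secs, inp - out))
    return rows


samples = parse_ido_lines(SAMPLE)
rows = backlog_growth(samples)
for ts, observed, reported in rows:
    print(f"{ts:%H:%M:%S}  observed {observed:>10.0f}/s  reported net {reported:>7d}/s")
```

The point of comparing the two columns is only to check that the queue really grows at roughly the rate the log claims, i.e. that the input side is flooding the queue rather than the output side stalling on its own.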

There are no errors in the Icinga2 log. Everything looks normal there and in the system logs.

During that time we didn’t do anything special, just some regular Icingaweb2 UI operations. No Icinga2 reload/restart. Galera cluster load was low and no errors were found in the InnoDB status.

My question is: what can cause such a large number of database queries in such a short period of time?

This happens every few days. We also experienced it with 2.12.3, where it was more serious and ran out of memory in less time. We decided to downgrade to 2.11.8, where it is less severe but still not avoidable.

I have searched Icinga2 bug reports and discussions to no avail. I am at my wit’s end so any input or clues are appreciated.

  • Version used (icinga2 --version)
    2.11.8-1.el7.icinga

  • Operating System and version
    CentOS7

  • Enabled features (icinga2 feature list)
    api checker command compatlog ido-mysql mainlog notification statusdata

  • Icinga Web 2 version and modules (System - About)
    2.8.2-1.el7.icinga,

  • Config validation (icinga2 daemon -C)
    Passed

  • If you run multiple Icinga 2 instances, the zones.conf file (or icinga2 object list --type Endpoint and icinga2 object list --type Zone) from all affected nodes

Just one master zone with two endpoints. No distributed checks. Checks are done through NRPE.

Endpoint (redacted):

  % declared in '/etc/icinga2/zones.conf', lines 23:3-23:44
  * __name = "icinga-01"
  * host = "icinga-01"
    % = modified in '/etc/icinga2/zones.conf', lines 24:5-24:26
  * log_duration = 86400
  * name = "icinga-01"
  * package = "_etc"
  * port = "5665"
    % = modified in '/etc/icinga2/zones.conf', lines 25:5-25:15
  * source_location
    * first_column = 3
    * first_line = 23
    * last_column = 44
    * last_line = 23
    * path = "/etc/icinga2/zones.conf"
  * templates = [ "icinga-01" ]
    % = modified in '/etc/icinga2/zones.conf', lines 23:3-23:44
  * type = "Endpoint"
  * zone = ""

Object 'icinga-02' of type 'Endpoint':
  % declared in '/etc/icinga2/zones.conf', lines 28:3-28:44
  * __name = "icinga-02"
  * host = ""
  * log_duration = 86400
  * name = "icinga-02"
  * package = "_etc"
  * port = "5665"
  * source_location
    * first_column = 3
    * first_line = 28
    * last_column = 44
    * last_line = 28
    * path = "/etc/icinga2/zones.conf"
  * templates = [ "icinga-02" ]
    % = modified in '/etc/icinga2/zones.conf', lines 28:3-28:44
  * type = "Endpoint"
  * zone = ""

Zones (redacted):

  * __name = "master"
  * endpoints = [ "icinga-01", "icinga-02" ]
    % = modified in '/etc/icinga2/zones.conf', lines 32:5-32:74
  * global = false
  * name = "master"
  * package = "_etc"
  * parent = ""
  * source_location
    * first_column = 3
    * first_line = 31
    * last_column = 22
    * last_line = 31
    * path = "/etc/icinga2/zones.conf"
  * templates = [ "master" ]
    % = modified in '/etc/icinga2/zones.conf', lines 31:3-31:22
  * type = "Zone"
  * zone = ""

There are a lot of problems to deal with here before we can get to a solution:

  • You are running quite an old version: GitHub - Icinga/icinga2 at v2.11.8
  • CentOS 7 is old. CentOS 8 is the current edition, but looking at Download, 8 is EOL and 7 will go EOL in 2024. No idea what is going on there!

I seriously recommend that you run a modern OS for Icinga2. It is still under quite heavy development and there are a lot of moving parts. If you need an LTS OS then Ubuntu 20.04 works fine and that still has four years on the clock. I run some phone exchanges on CentOS but I would really not recommend Icinga2 on it.

If you need a hand migrating to a new system, close this post and open a new one focused on that task and we’ll do our best to help out.

Hi,

  1. We did use 2.12.3 in the beginning and the OOM happened more frequently. Because we didn’t want to use IcingaDB (not mature enough?) and other new features yet, we decided to go back to 2.11.x. The OOM is now less frequent but still not avoidable.
    IMHO, 2.11.8 is not really an old version. The lifecycle policy of Icinga has never been clear, but 2.11.8 was released on the same day as 2.12.3 (2020/Dec/15) and it was/is still supported.
    (I am also curious why 2.11.8 is labeled as “the latest release” on GitHub. Could that mean it’s the official version, rather than 2.12.3? :smiley: )

  2. I am in no position to call the shots about the OS choice… :slight_smile:

Thanks for your reply.

“I am also curious why 2.11.8” - I was trying to show you how old that version is.

I don’t use IcingaDB yet, probably for the same reasons as you.

You will have to sort out your approach to OS choice. CentOS is sort of EOL - CentOS Linux is dead—and Red Hat says Stream is “not a replacement” | Ars Technica

I’m not trying to be difficult but you will have to get someone to look into your OS choice.