Hi all,
We have a simple Icinga2 2.11.8 master zone with two endpoints, icinga-01 and icinga-02. IDO-MySQL connects to a Galera cluster and has the HA feature enabled. The node holding the active IDO-MySQL connection, usually icinga-02, can suddenly produce a huge number of database transactions within, say, 10 seconds:
Logs (redacted):
[2021-05-04 15:24:04 +0000] information/IdoMysqlConnection: Pending queries: 7507 (Input: 1295/s; Output: 580/s)
[2021-05-04 15:24:19 +0000] information/ApiListener: New client connection from [192.168.65.245]:48098 (no client certificate)
[2021-05-04 15:24:19 +0000] information/HttpServerConnection: Request: GET /v1/status/CIB (from [192.168.65.245]:48098), user: root, agent: curl/7.29.0, status: OK).
[2021-05-04 15:24:19 +0000] information/HttpServerConnection: HTTP client disconnected (from [192.168.65.245]:48098)
[2021-05-04 15:24:27 +0000] information/Checkable: Checkable 'xxxxxxxxxx' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2021-05-04 15:24:30 +0000] information/Checkable: Notifications are disabled for checkable 'xxxxxxxx'.
[2021-05-04 15:24:49 +0000] information/ApiListener: New client connection from [192.168.1.251]:47632 (no client certificate)
[2021-05-04 15:24:49 +0000] information/HttpServerConnection: Request: GET /v1/status/CIB (from [192.168.1.251]:47632), user: root, agent: curl/7.29.0, status: OK).
[2021-05-04 15:24:49 +0000] information/HttpServerConnection: HTTP client disconnected (from [192.168.1.251]:47632)
[2021-05-04 15:25:01 +0000] information/Checkable: Checkable 'xxxxx' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2021-05-04 15:25:02 +0000] information/Checkable: Notifications are disabled for checkable 'xxxxx'.
[2021-05-04 15:25:02 +0000] information/Checkable: Notifications are disabled for checkable 'xxx'.
[2021-05-04 15:25:04 +0000] information/IdoMysqlConnection: Pending queries: 661413 (Input: 69374/s; Output: 3265/s).
.......
The DB was not blocked and was working well during that time. DB transaction input spiked from 1295/s to 69374/s, and went even higher later:
[2021-05-04 15:25:14 +0000] information/IdoMysqlConnection: Pending queries: 1726627 (Input: 104354/s; Output: 20/s)
.........
[2021-05-04 15:25:34 +0000] information/IdoMysqlConnection: Pending queries: 4588794 (Input: 149639/s; Output: 13758/s)
.........
[2021-05-04 15:25:44 +0000] information/IdoMysqlConnection: Pending queries: 5708509 (Input: 126442/s; Output: 15204/s)
.........
[2021-05-04 15:26:24 +0000] information/IdoMysqlConnection: Pending queries: 9079370 (Input: 52845/s; Output: 17177/s)
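To make the growth easier to see, here is a rough Python sketch that pulls the "Pending queries" lines out of the log and computes the net backlog growth per second (input minus output). The regex and the names `PENDING_RE` and `parse_pending` are my own and not part of Icinga2. It assumes the log format shown above.

```python
import re

# Matches IdoMysqlConnection status lines in the format shown in this post, e.g.
# [2021-05-04 15:25:04 +0000] information/IdoMysqlConnection: Pending queries: 661413 (Input: 69374/s; Output: 3265/s).
PENDING_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\] information/IdoMysqlConnection: "
    r"Pending queries: (?P<pending>\d+) "
    r"\(Input: (?P<inp>\d+)/s; Output: (?P<out>\d+)/s\)"
)


def parse_pending(lines):
    """Return one dict per matching log line; non-matching lines are skipped."""
    rows = []
    for line in lines:
        m = PENDING_RE.search(line)
        if m is None:
            continue
        inp = int(m.group("inp"))
        out = int(m.group("out"))
        rows.append({
            "ts": m.group("ts"),
            "pending": int(m.group("pending")),
            "input": inp,
            "output": out,
            # Positive value: the queue grows by this many queries per second.
            "net": inp - out,
        })
    return rows


# Three lines copied from the log excerpt in this post.
SAMPLE = [
    "[2021-05-04 15:24:04 +0000] information/IdoMysqlConnection: Pending queries: 7507 (Input: 1295/s; Output: 580/s)",
    "[2021-05-04 15:24:49 +0000] information/HttpServerConnection: HTTP client disconnected (from [192.168.1.251]:47632)",
    "[2021-05-04 15:25:04 +0000] information/IdoMysqlConnection: Pending queries: 661413 (Input: 69374/s; Output: 3265/s).",
]

if __name__ == "__main__":
    for row in parse_pending(SAMPLE):
        print(row["ts"], row["pending"], row["net"])
```

Run over the real log file (for example `parse_pending(open(path))`, with `path` pointing at the Icinga2 main log), this shows that the output rate never gets close to the input rate once the spike starts, so the queue only grows.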
Eventually Icinga2 ran out of memory and was killed by the OOM killer. It’s an 8GB box that normally has 6GB of free memory.
There are no errors in the Icinga2 log. Everything looks normal there and in the system logs.
During that time we didn’t do anything special, just some regular Icingaweb2 UI operations. No Icinga2 reload/restart. Galera cluster load was low and no errors were found in the InnoDB status.
My question is: what can cause such a large number of database transactions in such a short period of time?
This happens every few days. We first experienced it with 2.12.3, where it was more serious and hit OOM in less time. We decided to downgrade to 2.11.8, where it is less severe but still unavoidable.
I have searched Icinga2 bug reports and discussions to no avail. I am at my wit’s end so any input or clues are appreciated.
- Version used (icinga2 --version): 2.11.8-1.el7.icinga
- Operating System and version: CentOS7
- Enabled features (icinga2 feature list): api checker command compatlog ido-mysql mainlog notification statusdata
- Icinga Web 2 version and modules (System - About): 2.8.2-1.el7.icinga
- Config validation (icinga2 daemon -C): Passed
- If you run multiple Icinga 2 instances, the zones.conf file (or icinga2 object list --type Endpoint and icinga2 object list --type Zone) from all affected nodes:
Just one master zone with two endpoints. No distributed checks. Checks are done through NRPE.
Endpoint (redacted):
% declared in '/etc/icinga2/zones.conf', lines 23:3-23:44
* __name = "icinga-01"
* host = "icinga-01"
% = modified in '/etc/icinga2/zones.conf', lines 24:5-24:26
* log_duration = 86400
* name = "icinga-01"
* package = "_etc"
* port = "5665"
% = modified in '/etc/icinga2/zones.conf', lines 25:5-25:15
* source_location
* first_column = 3
* first_line = 23
* last_column = 44
* last_line = 23
* path = "/etc/icinga2/zones.conf"
* templates = [ "icinga-01" ]
% = modified in '/etc/icinga2/zones.conf', lines 23:3-23:44
* type = "Endpoint"
* zone = ""
Object 'icinga-02' of type 'Endpoint':
% declared in '/etc/icinga2/zones.conf', lines 28:3-28:44
* __name = "icinga-02"
* host = ""
* log_duration = 86400
* name = "icinga-02"
* package = "_etc"
* port = "5665"
* source_location
* first_column = 3
* first_line = 28
* last_column = 44
* last_line = 28
* path = "/etc/icinga2/zones.conf"
* templates = [ "icinga-02" ]
% = modified in '/etc/icinga2/zones.conf', lines 28:3-28:44
* type = "Endpoint"
* zone = ""
Zones (redacted):
* __name = "master"
* endpoints = [ "icinga-01", "icinga-02" ]
% = modified in '/etc/icinga2/zones.conf', lines 32:5-32:74
* global = false
* name = "master"
* package = "_etc"
* parent = ""
* source_location
* first_column = 3
* first_line = 31
* last_column = 22
* last_line = 31
* path = "/etc/icinga2/zones.conf"
* templates = [ "master" ]
% = modified in '/etc/icinga2/zones.conf', lines 31:3-31:22
* type = "Zone"
* zone = ""