Icinga 2.13.1 crashes with segmentation fault

Igor-Petrov · October 25, 2021, 8:26am

Icinga 2 consumes more than 60 GB of memory (see attached) and eventually fails with exit code 139.

Version: 2.13.1-1
Operating System and version: CentOS Linux release 7.7.1908 (Core). It is running in docker.
Enabled features: api checker gelf ido-mysql mainlog
Icinga Web 2 version: 2.7.3
Config validation:

[2021-10-25 07:46:35 +0000] information/cli: Icinga application loader (version: 2.13.1-1)
[2021-10-25 07:46:35 +0000] information/cli: Loading configuration file(s).
[2021-10-25 07:46:38 +0000] warning/config: Ignoring directory '/var/lib/icinga2/api/zones/***-Satellite' for unknown zone '***-Satellite'.
[2021-10-25 07:46:38 +0000] warning/config: Ignoring directory '/var/lib/icinga2/api/zones/paas-***-training' for unknown zone 'paas-***-training'.
[2021-10-25 07:46:38 +0000] information/ConfigItem: Committing config item(s).
[2021-10-25 07:46:38 +0000] information/ApiListener: My API identity: icinga2
[2021-10-25 07:46:39 +0000] warning/ApplyRule: Apply rule 'mail-icingaadmin' (in /etc/icinga2/conf.d/notifications.conf: 11:1-11:45) for type 'Notification' does not match anywhere!
[2021-10-25 07:46:39 +0000] warning/ApplyRule: Apply rule 'mail-icingaadmin' (in /etc/icinga2/conf.d/notifications.conf: 23:1-23:48) for type 'Notification' does not match anywhere!
[2021-10-25 07:46:39 +0000] warning/ApplyRule: Apply rule 'backup-downtime' (in /etc/icinga2/conf.d/downtimes.conf: 5:1-5:52) for type 'ScheduledDowntime' does not match anywhere!
[2021-10-25 07:46:39 +0000] information/ConfigItem: Instantiated 1 GelfWriter.
[2021-10-25 07:46:39 +0000] information/ConfigItem: Instantiated 1 IdoMysqlConnection.
[2021-10-25 07:46:39 +0000] information/ConfigItem: Instantiated 1 CheckerComponent.
[2021-10-25 07:46:39 +0000] information/ConfigItem: Instantiated 1 User.
[2021-10-25 07:46:39 +0000] information/ConfigItem: Instantiated 1 UserGroup.
[2021-10-25 07:46:39 +0000] information/ConfigItem: Instantiated 3 ServiceGroups.
[2021-10-25 07:46:39 +0000] information/ConfigItem: Instantiated 3 TimePeriods.
[2021-10-25 07:46:39 +0000] information/ConfigItem: Instantiated 46 Zones.
[2021-10-25 07:46:39 +0000] information/ConfigItem: Instantiated 2 NotificationCommands.
[2021-10-25 07:46:39 +0000] information/ConfigItem: Instantiated 2 HostGroups.
[2021-10-25 07:46:39 +0000] information/ConfigItem: Instantiated 1 IcingaApplication.
[2021-10-25 07:46:39 +0000] information/ConfigItem: Instantiated 55 Endpoints.
[2021-10-25 07:46:39 +0000] information/ConfigItem: Instantiated 1 FileLogger.
[2021-10-25 07:46:39 +0000] information/ConfigItem: Instantiated 1 ApiUser.
[2021-10-25 07:46:39 +0000] information/ConfigItem: Instantiated 282 CheckCommands.
[2021-10-25 07:46:39 +0000] information/ConfigItem: Instantiated 1 ApiListener.
[2021-10-25 07:46:39 +0000] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2021-10-25 07:46:39 +0000] information/cli: Finished validating the configuration file(s).

The node is a master node. There are 45 zones and 55 endpoints in the cluster. No zone has more than 2 endpoints. Icinga and IDO checks report OK. It doesn’t use much CPU.

There are ~120k checks deployed to the cluster; some are active, some are passive. Most of the checks run on satellites.

Icinga2 doesn’t log any errors before the failure.

I believe that 60 GB is too much even for 120k checks. Any ideas what could cause such high memory consumption and what we can do to fix it?
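
For reference, exit code 139 follows the common shell convention of 128 plus the signal number, so it corresponds to signal 11 (SIGSEGV), which matches the segmentation fault in the title. A minimal Python sketch of that decoding, assuming a Linux host; the function name `describe_exit_code` is made up for illustration:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Decode a shell-style exit status (128 + N means 'killed by signal N')."""
    if code > 128:
        try:
            return f"killed by {signal.Signals(code - 128).name}"
        except ValueError:
            # Not a known signal number on this platform; fall through.
            pass
    return f"exited with status {code}"


print(describe_exit_code(139))
```

This only names the signal. It does not say why the process received it; a core dump or the Icinga 2 crash log would be needed for that.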

steaksauce · October 25, 2021, 2:49pm

From experience, API calls to the Icinga2 API (such as those used in passive checks) use memory as the main resource.

We submit tens of thousands of check results via the API, among other API calls. It seems that each time I write something that uses the API, the memory footprint increases.
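
For context, a passive check result is typically submitted with a POST to the `/v1/actions/process-check-result` action of the Icinga 2 API. The sketch below only builds such a request and does not send it. The host name, service name, credentials, and master FQDN are hypothetical placeholders, and `build_check_result_request` is an illustrative helper, not something from this thread:

```python
import base64
import json
import urllib.request


def build_check_result_request(base_url, user, password, host, service,
                               exit_status, output):
    """Build (but do not send) a POST to the process-check-result action."""
    payload = {
        "type": "Service",
        "filter": f'host.name=="{host}" && service.name=="{service}"',
        "exit_status": exit_status,  # 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN
        "plugin_output": output,
    }
    # HTTP basic auth for the ApiUser.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url}/v1/actions/process-check-result",
        data=json.dumps(payload).encode(),
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


req = build_check_result_request(
    "https://icinga-master.example.com:5665",  # hypothetical FQDN, default API port
    "api-user", "secret",                      # hypothetical credentials
    "web01", "disk", 0, "DISK OK",             # hypothetical host/service
)
print(req.get_method(), req.full_url)
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen` with an SSL context that trusts the Icinga CA. With tens of thousands of results, each one is a separate HTTP request handled by the master.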

Igor-Petrov · October 25, 2021, 3:31pm

Thank you @steaksauce. Indeed, we use the Icinga API a lot. Is there a way to make Icinga release memory or decrease the memory footprint?

steaksauce · October 25, 2021, 3:41pm

Unfortunately, not that I am aware of.

We had the thought of introducing a second master, but afaik they don’t actually load balance like that: only one master has the IDO role, so it’s more of an active-passive HA setup.

In our case, however, we were able to significantly reduce our memory footprint by moving all of the passive checks to another server and pointing the calls at the FQDN of the Icinga 2 API. Not sure if this applies to you, but worth a shot.

Worth noting that this was a one-time fix and as we added more scripts/processes that required API calls, we had to increase server resources.

drapiti · October 29, 2021, 7:04pm

Two things. First, you might consider using two masters with HA disabled. This means each master has its own DB, and for passive checks submitted via the API it lets you balance the load across two servers rather than one. Second, try using jemalloc; it will be integrated in version 2.14, but you can load it on current versions. The memory and CPU usage is much better.
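
On Linux, an alternative allocator such as jemalloc is usually loaded into an existing binary through the dynamic linker's `LD_PRELOAD` environment variable. The library path and where that variable is set for the Icinga 2 process are not given in this thread, so treat them as something to look up for your distribution or container image. A small Python sketch, assuming a Linux `/proc` filesystem, for checking whether a running process actually has jemalloc mapped; the sample map lines are made up:

```python
def uses_jemalloc(maps_text: str) -> bool:
    """Return True if any mapped file in a /proc/<pid>/maps dump looks like jemalloc."""
    return any("libjemalloc" in line for line in maps_text.splitlines())


def process_uses_jemalloc(pid: int) -> bool:
    """Read the memory map of a running process (Linux only, needs permission)."""
    with open(f"/proc/{pid}/maps", encoding="utf-8") as maps:
        return uses_jemalloc(maps.read())


# Made-up sample resembling two lines of /proc/<pid>/maps.
sample = (
    "7f1c2a000000-7f1c2a021000 r-xp 00000000 08:01 123 /usr/lib64/libjemalloc.so.1\n"
    "7f1c2b000000-7f1c2b1c0000 r-xp 00000000 08:01 456 /usr/lib64/libc-2.17.so\n"
)
print(uses_jemalloc(sample))
```

Calling `process_uses_jemalloc` with the PID of the main icinga2 process, from inside the container, shows whether the preload took effect.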