Icinga2 stuck at startup

Hey,

After I upgraded my first satellite from 2.8 to 2.10, Icinga2 hangs for ~10 minutes at startup. Once it has started, checks begin to run and then, after a few seconds, Icinga2 crashes.

To work around it, I increased the server's memory from 4GB to 8GB. The satellite now seems stable.

Does 2.10 need more memory to start? Or is my Icinga2 configuration going bad? ^^

Version:

icinga2 - The Icinga 2 network monitoring daemon (version: r2.10.5-1)

Copyright (c) 2012-2019 Icinga GmbH (https://icinga.com/)
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

System information:
  Platform: Debian GNU/Linux
  Platform version: 9 (stretch)
  Kernel: Linux
  Kernel version: 4.9.0-9-amd64
  Architecture: x86_64

Build information:
  Compiler: GNU 6.3.0
  Build host: cb654124b660

Application information:

General paths:
  Config directory: /etc/icinga2
  Data directory: /var/lib/icinga2
  Log directory: /var/log/icinga2
  Cache directory: /var/cache/icinga2
  Spool directory: /var/spool/icinga2
  Run directory: /run/icinga2

Old paths (deprecated):
  Installation root: /usr
  Sysconf directory: /etc
  Run directory (base): /run
  Local state directory: /var

Internal paths:
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid

Thanks for reading :slight_smile:
Kevin

Have you checked icinga2.log? What does it say around the time the crash/hang occurred?

The output of icinga2 daemon -C might also be helpful.
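A minimal sketch of how that log check could be done. Each "'main-log' started" line marks a fresh daemon start, so the lines just before it are the last ones written before the crash. The function names are made up for illustration, and the OOM check is only an assumption based on the fact that adding RAM helped; nothing in this thread confirms that the kernel killed the process.

```shell
# List every daemon (re)start recorded in an Icinga 2 log file.
list_restarts() {
    # $1: path to the log file (defaults to the Log directory shown above)
    grep -n "information/FileLogger: 'main-log' started" "${1:-/var/log/icinga2/icinga2.log}"
}

# Print the last N log lines written before each restart.
lines_before_restarts() {
    # $1: log file, $2: number of context lines (default 3)
    grep -B "${2:-3}" "information/FileLogger: 'main-log' started" "$1"
}

# Assumption: a crash that goes away after adding RAM may be the kernel
# OOM killer, which logs to the kernel ring buffer, not to icinga2.log.
# Usually needs root.
check_oom() {
    dmesg -T 2>/dev/null | grep -iE 'out of memory|oom-killer|killed process'
}
```

For example, `lines_before_restarts /var/log/icinga2/icinga2.log 5` shows what was logged right before each restart, and `check_oom` prints nothing if the kernel log holds no OOM events.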


Here is the output of icinga2 daemon -C:

[2019-06-04 10:37:42 +0200] information/cli: Icinga application loader (version: r2.10.5-1)
[2019-06-04 10:37:42 +0200] information/cli: Loading configuration file(s).
[2019-06-04 10:39:03 +0200] information/ConfigItem: Committing config item(s).
[2019-06-04 10:39:04 +0200] information/ApiListener: My API identity: icinga-satellite1.localdomain
[2019-06-04 10:39:13 +0200] information/WorkQueue: #4 (DaemonUtility::LoadConfigFiles) items: 0, rate: 2.86667/s (172/min 172/5min 172/15min);
[2019-06-04 10:39:14 +0200] information/WorkQueue: #5 (ApiListener, RelayQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2019-06-04 10:39:14 +0200] information/WorkQueue: #6 (ApiListener, SyncQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2019-06-04 10:39:43 +0200] warning/ApplyRule: Apply rule 'ADFS-tokenrequest' (in /var/lib/icinga2/api/zones/director-global/director/service_apply.conf: 89:1-89:33) for type 'Service' does not match anywhere!
[2019-06-04 10:39:43 +0200] information/ConfigItem: Instantiated 23404 Services.
[2019-06-04 10:39:43 +0200] information/ConfigItem: Instantiated 1 IcingaApplication.
[2019-06-04 10:39:43 +0200] information/ConfigItem: Instantiated 1739 Hosts.
[2019-06-04 10:39:43 +0200] information/ConfigItem: Instantiated 1 EventCommand.
[2019-06-04 10:39:43 +0200] information/ConfigItem: Instantiated 1 FileLogger.
[2019-06-04 10:39:43 +0200] information/ConfigItem: Instantiated 1 NotificationCommand.
[2019-06-04 10:39:43 +0200] information/ConfigItem: Instantiated 49 HostGroups.
[2019-06-04 10:39:43 +0200] information/ConfigItem: Instantiated 1 ApiListener.
[2019-06-04 10:39:43 +0200] information/ConfigItem: Instantiated 133 Downtimes.
[2019-06-04 10:39:43 +0200] information/ConfigItem: Instantiated 179718 Comments.
[2019-06-04 10:39:43 +0200] information/ConfigItem: Instantiated 1 CheckerComponent.
[2019-06-04 10:39:43 +0200] information/ConfigItem: Instantiated 4 Zones.
[2019-06-04 10:39:43 +0200] information/ConfigItem: Instantiated 1 ExternalCommandListener.
[2019-06-04 10:39:43 +0200] information/ConfigItem: Instantiated 4 Endpoints.
[2019-06-04 10:39:43 +0200] information/ConfigItem: Instantiated 266 CheckCommands.
[2019-06-04 10:39:43 +0200] information/ConfigItem: Instantiated 6 ServiceGroups.
[2019-06-04 10:39:43 +0200] information/ConfigItem: Instantiated 7 TimePeriods.
[2019-06-04 10:39:43 +0200] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2019-06-04 10:39:44 +0200] information/cli: Finished validating the configuration file(s).

There are a ton of comments there, but I haven't found a way to remove them. I tried the remove-comment action (https://icinga.com/docs/icinga2/latest/doc/12-icinga2-api/#remove-comment); the calls went OK, but the comments are still instantiated here. Did I miss something?
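To see at a glance which object type dominates, the "Instantiated N <Type>." lines of the validation output can be ranked by count. This is a small sketch, and `rank_instantiated` is a hypothetical helper name, not part of Icinga 2.

```shell
# Rank the "Instantiated N <Type>." lines of a config validation run.
# Reads the output of `icinga2 daemon -C` on stdin and prints
# "<count> <Type>" pairs, largest count first.
rank_instantiated() {
    sed -n 's/.*Instantiated \([0-9][0-9]*\) \([A-Za-z]*\)\.$/\1 \2/p' | sort -rn
}
```

Usage: `icinga2 daemon -C 2>&1 | rank_instantiated | head -n 5`. With the output above, Comments would come first, well ahead of Services and Hosts.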

Another observation: systemctl start icinga2.service returns instantly on a 2.8 satellite, but takes a while (several minutes) on 2.10. Is that normal?

There are no details about the crash in icinga2.log.
Here is an example:

[2019-06-04 02:37:25 +0200] information/FileLogger: 'main-log' started.
[2019-06-04 02:37:33 +0200] information/WorkQueue: #8 (DaemonCommand::Run) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2019-06-04 02:39:26 +0200] information/ApiListener: 'api' started.
[2019-06-04 02:39:26 +0200] information/ApiListener: Started new listener on '[0.0.0.0]:5665'
[2019-06-04 02:39:26 +0200] information/ApiListener: Reconnecting to endpoint 'icinga-master1.localdomain' via host '172.16.0.151' and port '5665'
[2019-06-04 02:39:26 +0200] information/ApiListener: Reconnecting to endpoint 'icinga-master2.localdomain' via host '172.16.0.152' and port '5665'
[2019-06-04 02:39:26 +0200] information/ApiListener: Reconnecting to endpoint 'icinga-satellite2.localdomain' via host '172.16.0.162' and port '5665'
[2019-06-04 02:39:26 +0200] information/ApiListener: New client connection for identity 'icinga-master1.localdomain' to [172.16.0.151]:5665
[2019-06-04 02:39:26 +0200] information/ApiListener: New client connection for identity 'icinga-satellite2.localdomain' to [172.16.0.162]:5665
[2019-06-04 02:39:26 +0200] information/ApiListener: New client connection for identity 'icinga-master2.localdomain' to [172.16.0.152]:5665
[2019-06-04 02:39:27 +0200] information/ApiListener: Finished reconnecting to endpoint 'icinga-master1.localdomain' via host '172.16.0.151' and port '5665'
[2019-06-04 02:39:27 +0200] information/ApiListener: Finished reconnecting to endpoint 'icinga-satellite2.localdomain' via host '172.16.0.162' and port '5665'
[2019-06-04 02:39:27 +0200] information/ApiListener: Sending config updates for endpoint 'icinga-satellite2.localdomain' in zone 'satellite'.
[2019-06-04 02:39:27 +0200] information/ApiListener: Requesting new certificate for this Icinga instance from endpoint 'icinga-master1.localdomain'.
[2019-06-04 02:39:27 +0200] information/ApiListener: Finished reconnecting to endpoint 'icinga-master2.localdomain' via host '172.16.0.152' and port '5665'
[2019-06-04 02:39:27 +0200] information/ApiListener: Requesting new certificate for this Icinga instance from endpoint 'icinga-master2.localdomain'.
[2019-06-04 02:39:27 +0200] information/ApiListener: Syncing configuration files for global zone 'director-global' to endpoint 'icinga-satellite2.localdomain'.
[2019-06-04 02:39:27 +0200] information/ApiListener: Sending config updates for endpoint 'icinga-master2.localdomain' in zone 'master'.
[2019-06-04 02:39:27 +0200] information/ApiListener: Finished sending config file updates for endpoint 'icinga-master2.localdomain' in zone 'master'.
[2019-06-04 02:39:27 +0200] information/ApiListener: Syncing runtime objects to endpoint 'icinga-master2.localdomain'.
[2019-06-04 02:39:27 +0200] information/ApiListener: Sending config updates for endpoint 'icinga-master1.localdomain' in zone 'master'.
[2019-06-04 02:39:27 +0200] information/ApiListener: Finished sending config file updates for endpoint 'icinga-master1.localdomain' in zone 'master'.
[2019-06-04 02:39:27 +0200] information/ApiListener: Syncing runtime objects to endpoint 'icinga-master1.localdomain'.
[2019-06-04 02:39:27 +0200] information/ApiListener: Applying config update from endpoint 'icinga-master1.localdomain' of zone 'master'.
[2019-06-04 02:39:27 +0200] information/ApiListener: Applying config update from endpoint 'icinga-satellite2.localdomain' of zone 'satellite'.
[2019-06-04 02:39:27 +0200] information/ApiListener: Applying config update from endpoint 'icinga-master2.localdomain' of zone 'master'.
[2019-06-04 02:39:27 +0200] information/ApiListener: Syncing configuration files for zone 'satellite' to endpoint 'icinga-satellite2.localdomain'.
[2019-06-04 02:39:27 +0200] information/ApiListener: Finished sending config file updates for endpoint 'icinga-satellite2.localdomain' in zone 'satellite'.
[2019-06-04 02:39:27 +0200] information/ApiListener: Syncing runtime objects to endpoint 'icinga-satellite2.localdomain'.
[2019-06-04 02:39:27 +0200] information/ExternalCommandListener: 'command' started.
[2019-06-04 02:39:27 +0200] information/CheckerComponent: 'checker' started.
[2019-06-04 02:39:27 +0200] information/ConfigItem: Activated all objects.
[2019-06-04 02:39:27 +0200] information/cli: Closing console log.
**###### CRASH HERE ######**
[2019-06-04 03:04:16 +0200] information/FileLogger: 'main-log' started.
[2019-06-04 03:04:24 +0200] information/WorkQueue: #8 (DaemonCommand::Run) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2019-06-04 03:06:17 +0200] information/ApiListener: 'api' started.

I’ve checked the upgrade documentation again, but I can’t find what I missed :confused:

You can manually purge the comment files from /var/lib/icinga2/api/packages/_api/*/conf.d/comments. Use with care.

cd /var/lib/icinga2/api/packages/_api
ls -la $(cat active-stage)/conf.d/comments
rm $(cat active-stage)/conf.d/comments/*

Cheers,
Michael

It did the job :slight_smile:
The rm did not work for me, so I used the following command instead:

find $(cat active-stage)/conf.d/comments/ -maxdepth 1 -name "*.conf" -print0 | xargs -0 rm

Now the satellite starts much faster, though still a bit slower than with version 2.8. It’s stable and memory consumption looks normal.

Thank you for your help :slight_smile:
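For later readers, here is a sketch of the same cleanup with a dry run first. The post does not say why the plain rm failed; one plausible cause with this many files is that the glob expanded past the kernel's argument-length limit, which find avoids by deleting the files itself. The function name and the `--delete` switch are made up for this example, and stopping icinga2 before deleting is probably the safer order, though that is an assumption and not something confirmed in the thread.

```shell
# Count, and optionally delete, the runtime comment files of the active
# _api package stage. Nothing is deleted unless the second argument is
# "--delete".
purge_comment_files() {
    # $1: _api package directory, $2: optional "--delete"
    pkg_dir=${1:-/var/lib/icinga2/api/packages/_api}
    stage=$(cat "$pkg_dir/active-stage") || return 1
    dir="$pkg_dir/$stage/conf.d/comments"
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }

    count=$(find "$dir" -maxdepth 1 -type f -name '*.conf' | wc -l | tr -d ' ')
    echo "comment files: $count"
    if [ "${2:-}" = "--delete" ]; then
        # find removes the files itself, so no long argument list is built
        find "$dir" -maxdepth 1 -type f -name '*.conf' -delete
    fi
}
```

Run `purge_comment_files` once to see the count, then `purge_comment_files /var/lib/icinga2/api/packages/_api --delete` to remove the files.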


I would still investigate why the satellite had that many comments (179718 according to the validation output above). Are there any external scripts which create comments, or acknowledgements with persistent comments? Grep the logs to find some answers here.

Cheers,
Michael
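A rough sketch of such a log search, counting matching lines per day. The patterns are assumptions: ADD_HOST_COMMENT, ADD_SVC_COMMENT and ACKNOWLEDGE_*_PROBLEM are external command names, while add-comment and acknowledge-problem are REST API actions. Whether these requests show up in icinga2.log at all depends on the configured log severity, and `comment_activity_per_day` is a hypothetical helper name.

```shell
# Count log lines per day that look like comment or acknowledgement
# creation, via external commands or the REST API.
comment_activity_per_day() {
    # $1: log file whose lines start with "[YYYY-MM-DD hh:mm:ss +zone]"
    grep -E 'ADD_(HOST|SVC)_COMMENT|ACKNOWLEDGE_(HOST|SVC)_PROBLEM|actions/(add-comment|acknowledge-problem)' "$1" \
        | cut -c2-11 | sort | uniq -c
}
```

Usage: `comment_activity_per_day /var/log/icinga2/icinga2.log`. A spike on one day of the month would point at a scheduled script.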

You’re right, the comments are created by an external script.

Every month, we roll out updates to our Windows hosts (~1300) via this script. It puts a comment on each host and its services to announce the beginning of the process. That is about 1300*10 comments every month…

So I’ll fix that script to put a comment on the host only. I’ll also configure the cleanup attributes of IdoMySqlConnection to flush the comments after a while (for those who read this topic later: https://icinga.com/docs/icinga2/latest/doc/09-object-types/#idomysqlconnection)
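A sketch of what the host-only variant might look like with the add-comment API action. Using "type": "Host" attaches the comment to each matching host and to none of its services, so a run over ~1300 hosts would create about 1300 comments instead of 1300*10. The API user "updater", the ICINGA_API_PASS variable, the localhost endpoint and both function names are assumptions for illustration; the original script is not shown in this thread. Note also that, as far as the linked documentation describes it, the cleanup dictionary purges IDO history tables, so it is worth verifying separately that the comment objects themselves go away.

```shell
# Build the JSON body for the add-comment API action, targeting hosts only.
host_comment_payload() {
    # $1: host filter expression, $2: author, $3: comment text
    # Arguments are inserted as-is, so any double quotes inside them
    # must already be escaped as \" by the caller.
    printf '{ "type": "Host", "filter": "%s", "author": "%s", "comment": "%s" }\n' "$1" "$2" "$3"
}

# Send the comment. Assumed: API user "updater" with its password in
# $ICINGA_API_PASS, API listening on localhost:5665. -k skips TLS
# verification and is only acceptable for a quick local test.
announce_update() {
    curl -k -s -u "updater:${ICINGA_API_PASS}" \
        -H 'Accept: application/json' -X POST \
        'https://localhost:5665/v1/actions/add-comment' \
        -d "$(host_comment_payload "$1" "$2" "$3")"
}
```

Usage, with a hypothetical custom variable as the filter: `announce_update 'host.vars.os == \"Windows\"' 'patch-script' 'Monthly Windows update started'`.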
