Monitoring backend 'icinga' is not running, Terminated by signal 9 (Killed), permanently reconnecting clients

Hi,

I guess there is one root cause behind several issues.
Any ideas would be welcome.

Setup is:
Icinga 2.10.7-1.stretch, but the problems have been the same since 2.7.

 icinga2 feature list
Disabled features: compatlog debuglog elasticsearch gelf graphite livestatus opentsdb perfdata statusdata syslog
Enabled features: api checker command ido-pgsql influxdb mainlog notification

 icinga2 daemon -C
[2021-02-05 18:43:26 +0100] information/cli: Icinga application loader (version: r2.10.7-1)
[2021-02-05 18:43:26 +0100] information/cli: Loading configuration file(s).
[2021-02-05 18:43:26 +0100] information/ConfigItem: Committing config item(s).
[2021-02-05 18:43:26 +0100] information/ApiListener: My API identity: svr-dbmonitor.local
[2021-02-05 18:43:28 +0100] warning/ApplyRule: Apply rule 'satellite-host' (in /etc/icinga2/conf.d/satellite.conf: 29:1-29:41) for type 'Dependency' does not match anywhere!
[2021-02-05 18:43:28 +0100] warning/ApplyRule: Apply rule 'backup-downtime' (in /etc/icinga2/conf.d/downtimes.conf: 5:1-5:52) for type 'ScheduledDowntime' does not match anywhere!
[2021-02-05 18:43:28 +0100] warning/ApplyRule: Apply rule 'check-smart-attributes' (in /etc/icinga2/conf.d/services.conf: 284:1-284:38) for type 'Service' does not match anywhere!
[2021-02-05 18:43:28 +0100] information/ConfigItem: Instantiated 772 Services.
[2021-02-05 18:43:28 +0100] information/ConfigItem: Instantiated 1 InfluxdbWriter.
[2021-02-05 18:43:28 +0100] information/ConfigItem: Instantiated 1 IcingaApplication.
[2021-02-05 18:43:28 +0100] information/ConfigItem: Instantiated 32 Hosts.
[2021-02-05 18:43:28 +0100] information/ConfigItem: Instantiated 1 FileLogger.
[2021-02-05 18:43:28 +0100] information/ConfigItem: Instantiated 6 NotificationCommands.
[2021-02-05 18:43:28 +0100] information/ConfigItem: Instantiated 722 Notifications.
[2021-02-05 18:43:28 +0100] information/ConfigItem: Instantiated 1 NotificationComponent.
[2021-02-05 18:43:28 +0100] information/ConfigItem: Instantiated 2 HostGroups.
[2021-02-05 18:43:28 +0100] information/ConfigItem: Instantiated 1 ApiListener.
[2021-02-05 18:43:28 +0100] information/ConfigItem: Instantiated 61 Comments.
[2021-02-05 18:43:28 +0100] information/ConfigItem: Instantiated 1 CheckerComponent.
[2021-02-05 18:43:28 +0100] information/ConfigItem: Instantiated 20 Zones.
[2021-02-05 18:43:28 +0100] information/ConfigItem: Instantiated 1 ExternalCommandListener.
[2021-02-05 18:43:28 +0100] information/ConfigItem: Instantiated 24 Endpoints.
[2021-02-05 18:43:28 +0100] information/ConfigItem: Instantiated 1 ApiUser.
[2021-02-05 18:43:28 +0100] information/ConfigItem: Instantiated 4 Users.
[2021-02-05 18:43:28 +0100] information/ConfigItem: Instantiated 239 CheckCommands.
[2021-02-05 18:43:28 +0100] information/ConfigItem: Instantiated 1 IdoPgsqlConnection.
[2021-02-05 18:43:28 +0100] information/ConfigItem: Instantiated 3 UserGroups.
[2021-02-05 18:43:28 +0100] information/ConfigItem: Instantiated 9 ServiceGroups.
[2021-02-05 18:43:28 +0100] information/ConfigItem: Instantiated 3 TimePeriods.
[2021-02-05 18:43:28 +0100] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2021-02-05 18:43:28 +0100] information/cli: Finished validating the configuration file(s).

Icinga Web 2 Version
2.8.2
Git commit
8a89839af94a247ee2149b2336c73b8251b477c0
PHP Version
7.0.33-0+deb9u10
Git commit date
2020-08-17
Copyright

Loaded modules
Name	Version
doc 	0.0.0 
grafana 	1.3.4 
monitoring 	2.8.2 
setup 	2.8.2 

Problems are:

Host checks fail with <Terminated by signal 9 (Killed).>, anywhere from once every few hours to twice an hour.

Remote Icinga instance ‘foobar.local’ is not connected to ‘svr-dbmonitor.local’
All clients are reconnecting permanently.
“Duration” and “Attempt” fields in nagstamon are reset every 30 to 60 minutes for most of the checks.

Monitoring Health frequently reports “Backend icinga is not running”,
and then:
icinga has been up and running with PID 57298 for 19m 41s
However, according to systemd this PID has been running for 2h 41min:

 service icinga2 status
● icinga2.service - Icinga host/service/network monitoring system
   Loaded: loaded (/lib/systemd/system/icinga2.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/icinga2.service.d
           └─limits.conf
   Active: active (running) since Fri 2021-02-05 13:41:02 CET; 2h 41min ago
  Process: 57159 ExecReload=/usr/lib/icinga2/safe-reload /etc/default/icinga2 (code=exited, status=0/SUCCESS)
 Main PID: 57298 (icinga2)

There are I/O read-wait peaks on the monitoring server, but all other system check results are normal.

Icinga is logging in UTC, right?
icinga_programstatus is updated regularly:

icinga2db=# set timezone to 'UTC';
SET
icinga2db=# select now(), status_update_time from icinga_programstatus;
              now              | status_update_time  
-------------------------------+---------------------
 2021-02-05 17:34:42.974882+00 | 2021-02-05 17:34:35
(1 row)
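
To put a number on "regularly", the lag between the two timestamps in the query output can be computed with a minimal Python sketch like the one below. The function names and the 60-second `STALE_AFTER_SECONDS` threshold are assumptions made for illustration only; this thread does not confirm which threshold Icinga Web 2 actually applies. The sketch also assumes that `status_update_time` is stored in UTC, which is exactly the open question above.

```python
from datetime import datetime

# Hypothetical staleness threshold in seconds (an assumption for illustration,
# not a value confirmed for Icinga Web 2).
STALE_AFTER_SECONDS = 60


def parse_utc(ts: str) -> datetime:
    """Parse a psql timestamp that was printed with the session timezone set to UTC."""
    if ts.endswith("+00"):
        ts = ts[:-3]  # drop the UTC offset suffix printed by psql
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in ts else "%Y-%m-%d %H:%M:%S"
    return datetime.strptime(ts, fmt)


def update_lag_seconds(now: str, status_update_time: str) -> float:
    """Return how many seconds status_update_time lags behind now()."""
    return (parse_utc(now) - parse_utc(status_update_time)).total_seconds()


# Values copied from the query output above.
lag = update_lag_seconds("2021-02-05 17:34:42.974882+00", "2021-02-05 17:34:35")
print(f"lag: {lag:.1f}s, stale: {lag > STALE_AFTER_SECONDS}")
```

With the values above the lag is about 8 seconds, so at that moment the status row was fresh. Running the same comparison while the "not running" message is displayed would show whether the row is actually stale at that point.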

Rgds, Jens

Might be a boring/not really helpful default answer, but:
Your icinga2 version is quite old (Oct 2019). Have you thought about updating?
There have been many fixes, especially regarding the cluster/client connections.

What comes to mind about the clients reconnecting: Are they on a newer version than the master?
