All checks displayed as "overdue" after Icinga update

Give as much information as you can, e.g.

  • Icinga 2.16, icingaweb 2.13.0, icingadb 1.5.1, icingadb-web 1.4.0

  • Used modules and their versions (System - About)

    • icingadb 1.4.0
    • businessprocess 2.6.0
    • dependencies 1.0.3
    • doc 2.13.0
    • grafana 3.1.2
    • map 2.0.0
    • migrate 2.13.0
    • reactbundle 0.7.0
    • x509 1.3.2
  • Icinga 2 version used (icinga2 --version) r2.16.0-1

  • PHP version used (php --version) 8.3.30

  • Server operating system and version: RHEL 8

After updating the master server today, all checks initially looked good. Then some checks based on check_oracle_health and the Oracle Instant Client started to fail. I tried to upgrade the Instant Client to fix this, but after that ALL checks started to show as overdue in Icingaweb. So I downgraded the Instant Client again and rebooted the system. The checks briefly recovered but then returned to the overdue status. I am not sure whether the Instant Client upgrade was the cause, but it was something I did after the OS and application upgrade.

Restarting the service temporarily fixes the issue. At that moment all pending notifications are sent at once.

The load of the host looks ok, no high CPU or memory usage, no I/O bottlenecks.

The checks themselves seem to work and the data seems to arrive: some checks show as overdue, but in the history tab the last state is already more up to date. So is this probably only a display error in Icingaweb/icingadb-web, or a communication problem between the icinga2 and icingaweb services / DB backend?

All Icinga services, Redis and MariaDB are running and I can see no errors in the logs.

Any suggestion what could be the cause of this issue or what else I could check?

regards,

Andreas

Hi !

Without more information it is hard to say what may cause this behaviour. Can you post the output of

icinga2 daemon -C? It might also be necessary to use icinga2 daemon --validate --dump-objects.

As check_oracle_health is relatively CPU intensive: does it work on the CLI without icinga2?

What does /var/log/icinga2/icinga2.log report when check_oracle_health is triggered? Does it time out?

Maybe you need to increase the check timeout so the check can deliver its results back in time?
As it is a RHEL system: does it conflict with SELinux settings?
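
To test the plugin outside of icinga2 and see whether it would hit a timeout, something like the following could be used. This is only a minimal sketch: the command shown is a placeholder, and the 60 seconds are an assumed value that should be replaced by whatever timeout is configured for the check.

```python
import subprocess
import sys
import time


def run_with_timeout(command, timeout_seconds):
    """Run a check command and return (exit_code, seconds, output).

    exit_code is None when the command did not finish within the timeout.
    """
    started = time.monotonic()
    try:
        result = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
        exit_code, output = result.returncode, result.stdout.strip()
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising this exception.
        exit_code, output = None, "timed out"
    return exit_code, time.monotonic() - started, output


if __name__ == "__main__":
    # Placeholder command (assumption): replace it with the real plugin call,
    # i.e. the check_oracle_health command line from the service definition.
    command = [sys.executable, "-c", "print('OK - placeholder check')"]
    code, seconds, output = run_with_timeout(command, timeout_seconds=60)
    print(f"exit={code} duration={seconds:.2f}s output={output}")
```

Running it as the same user as the icinga2 service makes the result more comparable, since environment variables and permissions can differ between users.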

Regards

David

Hello David,

According to Dirk Götz the problem sounds very much like the one described here:

I will try to downgrade my instance back to 2.15.2 and will get back to you.

regards,

Andreas

I’m also forced to downgrade.

Downgrade to 2.15.3 seems to have resolved the issue for now.

Interesting.
We also downgraded back to 2.15.3 in our DEV environment yesterday, after the Icinga master suddenly stopped working three times this week.

I first wanted to verify that the problem does not occur with the v2.15.3 version before creating an issue/community post.

What I found in the logs so far is:

The first “crash” happened on Sunday (2026-05-03) morning. As it was the DEV env, I didn’t bother looking into it then.
On Monday I found that the icinga2.log file had blown up to 96GB and filled up the server with

May 04 08:01:34 msd-ic-ma02 icinga2[744938]: Cannot accept new connection: Too many open files
May 04 08:01:34 msd-ic-ma02 icinga2[744938]: Cannot accept new connection: Too many open files
May 04 08:01:34 msd-ic-ma02 icinga2[744938]: Cannot accept new connection: Too many open files
May 04 08:01:34 msd-ic-ma02 icinga2[744938]: Cannot accept new connection: Too many open files
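
For anyone who wants to watch the descriptor usage before it reaches this point, here is a minimal sketch that compares the number of open file descriptors of a process with its soft limit. It assumes a Linux system with /proc mounted, and reading the data of another process needs root or the same user. The demo at the bottom inspects the script's own PID, which is only a stand-in for the PID of the icinga2 main process.

```python
import os


def count_open_fds(pid):
    """Count the file descriptors a process currently holds (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def soft_nofile_limit(pid):
    """Return the soft 'Max open files' limit of a process, or None if unlimited."""
    with open(f"/proc/{pid}/limits") as handle:
        for line in handle:
            if line.startswith("Max open files"):
                # Columns: "Max open files  <soft>  <hard>  files"
                value = line.split()[3]
                return None if value == "unlimited" else int(value)
    raise LookupError("no 'Max open files' entry found")


if __name__ == "__main__":
    # Stand-in PID (assumption): use the PID of the icinga2 main process here.
    pid = os.getpid()
    print(f"pid={pid} open_fds={count_open_fds(pid)} soft_limit={soft_nofile_limit(pid)}")
```

Sampling this every minute or so should show whether the descriptor count climbs steadily while the duplicate connection attempts come in.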

Looking at the rotated logs:

After some time the “normal” log entries (such as information/WorkQueue, information/ApiListener, information/HttpServerConnection, information/IcingaDB, …) are mostly replaced by

[2026-05-02 17:06:02 +0200] information/ApiListener: Ignoring JSON-RPC connection from [::ffff:10.138.1.6]:42816. We're already connected to Endpoint 'msd-ic-so02' (last message sent: 2026-05-02 16:26:10, last message received: 2026-05-02 16:25:12).
[2026-05-02 17:06:02 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 12007, rate:  0/s (0/min 0/5min 0/15min); empty in 2 hours, 22 minutes and 56 seconds
[2026-05-02 17:06:02 +0200] information/ApiListener: New client connection for identity 'por-lm1-fra' from [::ffff:10.254.3.4]:59066
[2026-05-02 17:06:02 +0200] information/ApiListener: Ignoring JSON-RPC connection from [::ffff:10.254.3.4]:59066. We're already connected to Endpoint 'por-lm1-fra' (last message sent: 2026-05-02 15:50:31, last message received: 2026-05-02 15:49:32).
[2026-05-02 17:06:04 +0200] information/ApiListener: New client connection from [::ffff:10.254.1.70]:54230 (no client certificate)
[2026-05-02 17:06:04 +0200] information/ApiListener: New client connection from [::ffff:10.254.1.70]:54238 (no client certificate)
[2026-05-02 17:06:05 +0200] information/JsonRpcConnection: No messages for identity 'msd-ic-sl02' have been received in the last 60 seconds.
[2026-05-02 17:06:05 +0200] information/ApiListener: Removing API client for endpoint 'msd-ic-sl02'. 0 API clients left.
[2026-05-02 17:06:05 +0200] information/JsonRpcConnection: API client disconnected for identity 'msd-ic-sl02'
[2026-05-02 17:06:07 +0200] information/ApiListener: New client connection for identity 'msd-ic-sl01' from [::ffff:10.132.1.13]:52052
[2026-05-02 17:06:07 +0200] information/ApiListener: Ignoring JSON-RPC connection from [::ffff:10.132.1.13]:52052. We're already connected to Endpoint 'msd-ic-sl01' (last message sent: 2026-05-02 16:15:25, last message received: 2026-05-02 16:14:26).
[2026-05-02 17:06:07 +0200] information/ApiListener: New client connection for identity 'por-mgmt04' from [::ffff:10.254.3.3]:36826
[2026-05-02 17:06:07 +0200] information/ApiListener: Ignoring JSON-RPC connection from [::ffff:10.254.3.3]:36826. We're already connected to Endpoint 'por-mgmt04' (last message sent: 2026-05-02 15:52:55, last message received: 2026-05-02 15:51:57).
[2026-05-02 17:06:09 +0200] information/ApiListener: New client connection for identity 'mvd-lm1-fra' from [::ffff:10.254.3.138]:33852
[2026-05-02 17:06:09 +0200] information/ApiListener: Ignoring JSON-RPC connection from [::ffff:10.254.3.138]:33852. We're already connected to Endpoint 'mvd-lm1-fra' (last message sent: 2026-05-02 15:44:45, last message received: 2026-05-02 15:44:49).
[2026-05-02 17:06:09 +0200] information/ApiListener: New client connection for identity 'mvd-lm1-mde01' from [::ffff:10.254.1.70]:54256
[2026-05-02 17:06:09 +0200] information/ApiListener: Ignoring JSON-RPC connection from [::ffff:10.254.1.70]:54256. We're already connected to Endpoint 'mvd-lm1-mde01' (last message sent: 2026-05-02 15:48:25, last message received: 2026-05-02 15:46:59).
[2026-05-02 17:06:10 +0200] information/ApiListener: New client connection for identity 'q1cv-lm1-fra' from [::ffff:10.254.3.106]:60816
[2026-05-02 17:06:10 +0200] information/ApiListener: Ignoring JSON-RPC connection from [::ffff:10.254.3.106]:60816. We're already connected to Endpoint 'q1cv-lm1-fra' (last message sent: 2026-05-02 16:21:37, last message received: 2026-05-02 16:20:40).
[2026-05-02 17:06:12 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 12029, rate:  0/s (0/min 0/5min 0/15min); empty in 1 hour, 31 minutes and 7 seconds
[2026-05-02 17:06:12 +0200] information/ApiListener: New client connection for identity 'q1sc2-lm1-fra' from [::ffff:10.254.3.154]:56894
[2026-05-02 17:06:12 +0200] information/ApiListener: Ignoring JSON-RPC connection from [::ffff:10.254.3.154]:56894. We're already connected to Endpoint 'q1sc2-lm1-fra' (last message sent: 2026-05-02 16:55:42, last message received: 2026-05-02 16:54:42).
[2026-05-02 17:06:15 +0200] information/ApiListener: New client connection for identity 'msd-ic-so01' from [::ffff:10.138.1.5]:36706
[2026-05-02 17:06:15 +0200] information/ApiListener: Ignoring JSON-RPC connection from [::ffff:10.138.1.5]:36706. We're already connected to Endpoint 'msd-ic-so01' (last message sent: 2026-05-02 16:29:24, last message received: 2026-05-02 16:28:25).
[2026-05-02 17:06:15 +0200] information/ApiListener: New client connection for identity 'q1poc-lm1-mde01' from [::ffff:10.254.1.54]:52520
[2026-05-02 17:06:15 +0200] information/ApiListener: Ignoring JSON-RPC connection from [::ffff:10.254.1.54]:52520. We're already connected to Endpoint 'q1poc-lm1-mde01' (last message sent: 2026-05-02 15:51:43, last message received: 2026-05-02 15:50:45).
[2026-05-02 17:06:15 +0200] information/ApiListener: New client connection from [::ffff:10.254.1.70]:60432 (no client certificate)
[2026-05-02 17:06:16 +0200] information/ApiListener: New client connection for identity 'q1sc3-lm1-fra' from [::ffff:10.254.3.130]:53038
[2026-05-02 17:06:16 +0200] information/ApiListener: Ignoring JSON-RPC connection from [::ffff:10.254.3.130]:53038. We're already connected to Endpoint 'q1sc3-lm1-fra' (last message sent: 2026-05-02 16:57:05, last message received: 2026-05-02 16:56:06).
[2026-05-02 17:06:17 +0200] information/ApiListener: New client connection for identity 'msd-ic-sl01' from [::ffff:10.132.1.13]:56480
[2026-05-02 17:06:17 +0200] information/ApiListener: Ignoring JSON-RPC connection from [::ffff:10.132.1.13]:56480. We're already connected to Endpoint 'msd-ic-sl01' (last message sent: 2026-05-02 16:15:25, last message received: 2026-05-02 16:14:26).
[2026-05-02 17:06:17 +0200] information/ApiListener: New client connection for identity 'por-mgmt04' from [::ffff:10.254.3.3]:47784
[2026-05-02 17:06:17 +0200] information/ApiListener: Ignoring JSON-RPC connection from [::ffff:10.254.3.3]:47784. We're already connected to Endpoint 'por-mgmt04' (last message sent: 2026-05-02 15:52:55, last message received: 2026-05-02 15:51:57).
[2026-05-02 17:06:19 +0200] information/ApiListener: New client connection from [::ffff:10.132.1.13]:56486 (no client certificate)
[2026-05-02 17:06:19 +0200] information/ApiListener: New client connection for identity 'mvd-lm1-mde01' from [::ffff:10.254.1.70]:60444
[2026-05-02 17:06:19 +0200] information/ApiListener: Ignoring JSON-RPC connection from [::ffff:10.254.1.70]:60444. We're already connected to Endpoint 'mvd-lm1-mde01' (last message sent: 2026-05-02 15:48:25, last message received: 2026-05-02 15:46:59).
[2026-05-02 17:06:21 +0200] information/WorkQueue: #7 (ApiListener, SyncQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2026-05-02 17:06:21 +0200] information/WorkQueue: #6 (ApiListener, RelayQueue) items: 0, rate: 19.9/s (1194/min 5606/5min 16685/15min);
[2026-05-02 17:06:22 +0200] information/ApiListener: New client connection for identity 'msd-ic-so02' from [::ffff:10.138.1.6]:37270
[2026-05-02 17:06:22 +0200] information/ApiListener: Ignoring JSON-RPC connection from [::ffff:10.138.1.6]:37270. We're already connected to Endpoint 'msd-ic-so02' (last message sent: 2026-05-02 16:26:10, last message received: 2026-05-02 16:25:12).
[2026-05-02 17:06:22 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 12044, rate:  0/s (0/min 0/5min 0/15min); empty in 2 hours, 13 minutes and 49 seconds
[2026-05-02 17:06:22 +0200] information/ApiListener: New client connection for identity 'por-lm1-fra' from [::ffff:10.254.3.4]:42886
[2026-05-02 17:06:22 +0200] information/ApiListener: Ignoring JSON-RPC connection from [::ffff:10.254.3.4]:42886. We're already connected to Endpoint 'por-lm1-fra' (last message sent: 2026-05-02 15:50:31, last message received: 2026-05-02 15:49:32).
[2026-05-02 17:06:22 +0200] information/ApiListener: New client connection for identity 'q1sc2-lm1-fra' from [::ffff:10.254.3.154]:41620
[2026-05-02 17:06:22 +0200] information/ApiListener: Ignoring JSON-RPC connection from [::ffff:10.254.3.154]:41620. We're already connected to Endpoint 'q1sc2-lm1-fra' (last message sent: 2026-05-02 16:55:42, last message received: 2026-05-02 16:54:42).
[2026-05-02 17:06:25 +0200] information/ApiListener: New client connection for identity 'msd-ic-sl02' from [::ffff:10.138.1.3]:40920
[2026-05-02 17:06:25 +0200] information/ApiListener: Sending config updates for endpoint 'msd-ic-sl02' in zone 'msd-private_vcloud'.
[2026-05-02 17:06:25 +0200] information/ApiListener: Syncing configuration files for global zone 'director-global' to endpoint 'msd-ic-sl02'.
[2026-05-02 17:06:25 +0200] information/ApiListener: Syncing configuration files for global zone 'global-satellites' to endpoint 'msd-ic-sl02'.
[2026-05-02 17:06:25 +0200] information/ApiListener: Syncing configuration files for global zone 'global-templates' to endpoint 'msd-ic-sl02'.
[2026-05-02 17:06:25 +0200] information/ApiListener: Syncing configuration files for zone 'msd-private_vcloud' to endpoint 'msd-ic-sl02'.
[2026-05-02 17:06:25 +0200] information/ApiListener: Finished sending config file updates for endpoint 'msd-ic-sl02' in zone 'msd-private_vcloud'.
[2026-05-02 17:06:25 +0200] information/ApiListener: Syncing runtime objects to endpoint 'msd-ic-sl02'.
[2026-05-02 17:06:25 +0200] information/ApiListener: Finished syncing runtime objects to endpoint 'msd-ic-sl02'.
[2026-05-02 17:06:25 +0200] information/ApiListener: Finished sending runtime config updates for endpoint 'msd-ic-sl02' in zone 'msd-private_vcloud'.
[2026-05-02 17:06:25 +0200] information/ApiListener: Sending replay log for endpoint 'msd-ic-sl02' in zone 'msd-private_vcloud'.
[2026-05-02 17:06:25 +0200] information/ApiListener: Finished sending replay log for endpoint 'msd-ic-sl02' in zone 'msd-private_vcloud'.
[2026-05-02 17:06:25 +0200] information/ApiListener: Finished syncing endpoint 'msd-ic-sl02' in zone 'msd-private_vcloud'.
[2026-05-02 17:06:27 +0200] information/ApiListener: New client connection for identity 'msd-ic-sl01' from [::ffff:10.132.1.13]:36446
[2026-05-02 17:06:27 +0200] information/ApiListener: Ignoring JSON-RPC connection from [::ffff:10.132.1.13]:36446. We're already connected to Endpoint 'msd-ic-sl01' (last message sent: 2026-05-02 16:15:25, last message received: 2026-05-02 16:14:26).
[2026-05-02 17:06:27 +0200] information/ApiListener: New client connection for identity 'por-mgmt04' from [::ffff:10.254.3.3]:35696
[2026-05-02 17:06:27 +0200] information/ApiListener: Ignoring JSON-RPC connection from [::ffff:10.254.3.3]:35696. We're already connected to Endpoint 'por-mgmt04' (last message sent: 2026-05-02 15:52:55, last message received: 2026-05-02 15:51:57).
[2026-05-02 17:06:29 +0200] information/ApiListener: New client connection from [::ffff:10.132.1.13]:36448 (no client certificate)
[2026-05-02 17:06:29 +0200] information/ApiListener: New client connection for identity 'mvd-lm1-fra' from [::ffff:10.254.3.138]:37178
[2026-05-02 17:06:29 +0200] information/ApiListener: Ignoring JSON-RPC connection from [::ffff:10.254.3.138]:37178. We're already connected to Endpoint 'mvd-lm1-fra' (last message sent: 2026-05-02 15:44:45, last message received: 2026-05-02 15:44:49).
[2026-05-02 17:06:29 +0200] information/ApiListener: New client connection for identity 'mvd-lm1-mde01' from [::ffff:10.254.1.70]:52048
[2026-05-02 17:06:29 +0200] information/ApiListener: Ignoring JSON-RPC connection from [::ffff:10.254.1.70]:52048. We're already connected to Endpoint 'mvd-lm1-mde01' (last message sent: 2026-05-02 15:48:25, last message received: 2026-05-02 15:46:59).
[2026-05-02 17:06:30 +0200] information/ApiListener: New client connection for identity 'q1cv-lm1-fra' from [::ffff:10.254.3.106]:42144
[2026-05-02 17:06:30 +0200] information/ApiListener: Ignoring JSON-RPC connection from [::ffff:10.254.3.106]:42144. We're already connected to Endpoint 'q1cv-lm1-fra' (last message sent: 2026-05-02 16:21:37, last message received: 2026-05-02 16:20:40).
[2026-05-02 17:06:31 +0200] information/ApiListener: New client connection from [::ffff:10.254.1.70]:52054 (no client certificate)
[2026-05-02 17:06:32 +0200] information/ApiListener: New client connection for identity 'msd-ic-so02' from [::ffff:10.138.1.6]:55818
[2026-05-02 17:06:32 +0200] information/ApiListener: Ignoring JSON-RPC connection from [::ffff:10.138.1.6]:55818. We're already connected to Endpoint 'msd-ic-so02' (last message sent: 2026-05-02 16:26:10, last message received: 2026-05-02 16:25:12).
[2026-05-02 17:06:32 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 12057, rate:  0/s (0/min 0/5min 0/15min); empty in 2 hours, 34 minutes and 34 seconds
[2026-05-02 17:06:32 +0200] information/ApiListener: New client connection for identity 'q1sc2-lm1-fra' from [::ffff:10.254.3.154]:44092
[2026-05-02 17:06:32 +0200] information/ApiListener: Ignoring JSON-RPC connection from [::ffff:10.254.3.154]:44092. We're already connected to Endpoint 'q1sc2-lm1-fra' (last message sent: 2026-05-02 16:55:42, last message received: 2026-05-02 16:54:42).
[2026-05-02 17:06:35 +0200] information/ApiListener: New client connection for identity 'msd-ic-so01' from [::ffff:10.138.1.5]:40054
[2026-05-02 17:06:35 +0200] information/ApiListener: Ignoring JSON-RPC connection from [::ffff:10.138.1.5]:40054. We're already connected to Endpoint 'msd-ic-so01' (last message sent: 2026-05-02 16:29:24, last message received: 2026-05-02 16:28:25).
[2026-05-02 17:06:35 +0200] information/ApiListener: New client connection for identity 'q1poc-lm1-mde01' from [::ffff:10.254.1.54]:36412
[2026-05-02 17:06:35 +0200] information/ApiListener: Ignoring JSON-RPC connection from [::ffff:10.254.1.54]:36412. We're already connected to Endpoint 'q1poc-lm1-mde01' (last message sent: 2026-05-02 15:51:43, last message received: 2026-05-02 15:50:45).
[2026-05-02 17:06:36 +0200] information/ApiListener: New client connection for identity 'q1sc3-lm1-fra' from [::ffff:10.254.3.130]:33678
[2026-05-02 17:06:36 +0200] information/ApiListener: Ignoring JSON-RPC connection from [::ffff:10.254.3.130]:33678. We're already connected to Endpoint 'q1sc3-lm1-fra' (last message sent: 2026-05-02 16:57:05, last message received: 2026-05-02 16:56:06).
[2026-05-02 17:06:37 +0200] information/ApiListener: New client connection for identity 'msd-ic-sl01' from [::ffff:10.132.1.13]:37562
[2026-05-02 17:06:37 +0200] information/ApiListener: Ignoring JSON-RPC connection from [::ffff:10.132.1.13]:37562. We're already connected to Endpoint 'msd-ic-sl01' (last message sent: 2026-05-02 16:15:25, last message received: 2026-05-02 16:14:26).
[2026-05-02 17:06:37 +0200] information/ApiListener: New client connection for identity 'por-mgmt04' from [::ffff:10.254.3.3]:50872
[2026-05-02 17:06:37 +0200] information/ApiListener: Ignoring JSON-RPC connection from [::ffff:10.254.3.3]:50872. We're already connected to Endpoint 'por-mgmt04' (last message sent: 2026-05-02 15:52:55, last message received: 2026-05-02 15:51:57).
[2026-05-02 17:06:39 +0200] information/ApiListener: New client connection for identity 'mvd-lm1-mde01' from [::ffff:10.254.1.70]:49742
[2026-05-02 17:06:39 +0200] information/ApiListener: Ignoring JSON-RPC connection from [::ffff:10.254.1.70]:49742. We're already connected to Endpoint 'mvd-lm1-mde01' (last message sent: 2026-05-02 15:48:25, last message received: 2026-05-02 15:46:59).
[2026-05-02 17:06:40 +0200] information/ApiListener: New client connection for identity 'q1cv-lm1-fra' from [::ffff:10.254.3.106]:44786
[2026-05-02 17:06:40 +0200] information/ApiListener: Ignoring JSON-RPC connection from [::ffff:10.254.3.106]:44786. We're already connected to Endpoint 'q1cv-lm1-fra' (last message sent: 2026-05-02 16:21:37, last message received: 2026-05-02 16:20:40).
[2026-05-02 17:06:41 +0200] information/ApiListener: New client connection from [::ffff:10.132.1.13]:37578 (no client certificate)
[2026-05-02 17:06:42 +0200] information/ApiListener: New client connection for identity 'msd-ic-so02' from [::ffff:10.138.1.6]:60620

These messages started on 2026-05-02 at ~15:30, and it took about 16 hours for the icinga service to completely stop executing checks and accepting API requests.
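
To get an overview from a large log file, the “Ignoring JSON-RPC connection” lines can be summarised per endpoint. The following is a rough sketch that counts the ignored attempts and shows how old the last received message was at the most recent attempt. The regular expression is derived only from the lines quoted above, and it assumes that the “last message” timestamps are in the same local time zone as the log timestamp.

```python
import re
from datetime import datetime

FORMAT = "%Y-%m-%d %H:%M:%S"
STAMP = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"
PATTERN = re.compile(
    r"^\[(?P<logged>" + STAMP + r") [+-]\d{4}\] "
    r"information/ApiListener: Ignoring JSON-RPC connection from \S+ "
    r"We're already connected to Endpoint '(?P<endpoint>[^']+)' "
    r"\(last message sent: [^,]+, "
    r"last message received: (?P<received>" + STAMP + r")\)"
)

# Two lines copied from the log excerpt above.
SAMPLE = [
    "[2026-05-02 17:06:02 +0200] information/ApiListener: Ignoring JSON-RPC connection from [::ffff:10.138.1.6]:42816. We're already connected to Endpoint 'msd-ic-so02' (last message sent: 2026-05-02 16:26:10, last message received: 2026-05-02 16:25:12).",
    "[2026-05-02 17:06:22 +0200] information/ApiListener: Ignoring JSON-RPC connection from [::ffff:10.138.1.6]:37270. We're already connected to Endpoint 'msd-ic-so02' (last message sent: 2026-05-02 16:26:10, last message received: 2026-05-02 16:25:12).",
]


def stale_endpoints(lines):
    """Map endpoint -> (ignored attempts, seconds of silence at the latest attempt)."""
    summary = {}
    for line in lines:
        match = PATTERN.match(line)
        if not match:
            continue
        logged = datetime.strptime(match["logged"], FORMAT)
        received = datetime.strptime(match["received"], FORMAT)
        attempts, _ = summary.get(match["endpoint"], (0, 0.0))
        silence = (logged - received).total_seconds()
        summary[match["endpoint"]] = (attempts + 1, silence)
    return summary


if __name__ == "__main__":
    for endpoint, (attempts, silence) in stale_endpoints(SAMPLE).items():
        print(f"{endpoint}: {attempts} ignored attempts, silent for {silence:.0f}s")
```

In the excerpt the existing connections had been silent for 40 minutes or more while the master still treated them as connected, which is what this summary makes visible across all endpoints.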

I have the log files for the day of the crash (meaning no more active checks are executed, which we notice via an external endpoint no longer being pinged) and the days prior, though no debug logs.

I have the feeling that the master is in a deployment loop and can’t get out of it.

A customer also informed me that their updated Icinga stopped working. But they rebooted before I could take a look at it, so I’m waiting to see whether it happens again, so that I can look at the problem and save logs from there as well.

Just downgraded the customer setup as well, as it stopped working shortly after midnight.

No “extraordinary” messages in the log, but there are some agents that constantly try to connect with a name that is not in the monitoring config (customer error with cloned VMs…). There is also one agent with the “we’re already connected to” message, and the InfluxDB queue message:

[2026-05-08 00:07:13 +0200] information/ApiListener: New client connection for identity 's049cx30' from [::ffff:10.49.35.117]:57454 (no Endpoint object found for identity)
[2026-05-08 00:07:13 +0200] warning/ApiListener: Unknown endpoint 's049cx30' with valid certificate. Aborting JSON-RPC connection.
[2026-05-08 00:07:14 +0200] information/ApiListener: New client connection for identity 's049ix43' from [::ffff:10.49.31.246]:55784 (no Endpoint object found for identity)
[2026-05-08 00:07:14 +0200] warning/ApiListener: Unknown endpoint 's049ix43' with valid certificate. Aborting JSON-RPC connection.
[2026-05-08 00:07:14 +0200] information/WorkQueue: #9 (Influxdb2Writer, influxdb2) items: 99328, rate:  0/s (0/min 0/5min 0/15min); empty in infinite time, your task handler isn't able to keep up
[2026-05-08 00:07:16 +0200] information/ApiListener: New client connection for identity 's049cx30' from [::ffff:10.49.35.129]:55725 (no Endpoint object found for identity)
[2026-05-08 00:07:16 +0200] warning/ApiListener: Unknown endpoint 's049cx30' with valid certificate. Aborting JSON-RPC connection.
[2026-05-08 00:07:17 +0200] information/ApiListener: New client connection for identity 's049ix41' from [::ffff:10.49.31.245]:59681 (no Endpoint object found for identity)
[2026-05-08 00:07:17 +0200] warning/ApiListener: Unknown endpoint 's049ix41' with valid certificate. Aborting JSON-RPC connection.
[2026-05-08 00:07:17 +0200] information/ApiListener: New client connection for identity 's049cx30' from [::ffff:10.49.35.197]:59917 (no Endpoint object found for identity)
[2026-05-08 00:07:17 +0200] warning/ApiListener: Unknown endpoint 's049cx30' with valid certificate. Aborting JSON-RPC connection.
[2026-05-08 00:07:17 +0200] information/ApiListener: New client connection for identity 's049ix04' from [::ffff:10.49.31.241]:58466 (no Endpoint object found for identity)
[2026-05-08 00:07:17 +0200] warning/ApiListener: Unknown endpoint 's049ix04' with valid certificate. Aborting JSON-RPC connection.
[2026-05-08 00:07:18 +0200] information/ApiListener: New client connection for identity 's049cx30' from [::ffff:10.49.35.198]:56544 (no Endpoint object found for identity)
[2026-05-08 00:07:18 +0200] warning/ApiListener: Unknown endpoint 's049cx30' with valid certificate. Aborting JSON-RPC connection.
[2026-05-08 00:07:18 +0200] information/ApiListener: New client connection for identity 's049cx30' from [::ffff:10.49.35.199]:55452 (no Endpoint object found for identity)
[2026-05-08 00:07:18 +0200] warning/ApiListener: Unknown endpoint 's049cx30' with valid certificate. Aborting JSON-RPC connection.
[2026-05-08 00:07:19 +0200] information/ApiListener: New client connection for identity 's049ix05' from [::ffff:10.49.31.242]:52763 (no Endpoint object found for identity)
[2026-05-08 00:07:19 +0200] warning/ApiListener: Unknown endpoint 's049ix05' with valid certificate. Aborting JSON-RPC connection.
[2026-05-08 00:07:20 +0200] information/ApiListener: New client connection for identity 's049cx30' from [::ffff:10.49.35.146]:64855 (no Endpoint object found for identity)
[2026-05-08 00:07:20 +0200] warning/ApiListener: Unknown endpoint 's049cx30' with valid certificate. Aborting JSON-RPC connection.
[2026-05-08 00:07:23 +0200] information/ApiListener: New client connection for identity 's049db55' from [::ffff:10.49.38.94]:52899
[2026-05-08 00:07:23 +0200] information/ApiListener: Ignoring JSON-RPC connection from [::ffff:10.49.38.94]:52899. We're already connected to Endpoint 's049db55' (last message sent: 2026-05-07 23:15:32, last message received: 2026-05-07 23:15:32).

Not sure if the “many” agent connections or the Influx queue might be the problem.
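
To quantify the “many” agent connections, the rejected identities can be grouped by source address. This is a small sketch based only on the log lines quoted above; an identity that shows up from several addresses would match the cloned-VM explanation.

```python
import ipaddress
import re

UNKNOWN_RE = re.compile(
    r"New client connection for identity '(?P<identity>[^']+)' "
    r"from \[(?P<address>[^\]]+)\]:\d+ "
    r"\(no Endpoint object found for identity\)"
)

# Three lines copied from the log excerpt above.
SAMPLE = [
    "[2026-05-08 00:07:13 +0200] information/ApiListener: New client connection for identity 's049cx30' from [::ffff:10.49.35.117]:57454 (no Endpoint object found for identity)",
    "[2026-05-08 00:07:14 +0200] information/ApiListener: New client connection for identity 's049ix43' from [::ffff:10.49.31.246]:55784 (no Endpoint object found for identity)",
    "[2026-05-08 00:07:16 +0200] information/ApiListener: New client connection for identity 's049cx30' from [::ffff:10.49.35.129]:55725 (no Endpoint object found for identity)",
]


def normalise(address):
    """Turn an IPv4-mapped IPv6 address such as ::ffff:10.0.0.1 into 10.0.0.1."""
    ip = ipaddress.ip_address(address)
    mapped = getattr(ip, "ipv4_mapped", None)
    return str(mapped or ip)


def addresses_per_identity(lines):
    """Map each unknown identity to the sorted list of addresses it connected from."""
    seen = {}
    for line in lines:
        match = UNKNOWN_RE.search(line)
        if match:
            seen.setdefault(match["identity"], set()).add(normalise(match["address"]))
    return {identity: sorted(addresses) for identity, addresses in seen.items()}


if __name__ == "__main__":
    for identity, addresses in addresses_per_identity(SAMPLE).items():
        print(f"{identity}: {len(addresses)} source address(es): {', '.join(addresses)}")
```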

Neither setup (our DEV and the customer’s) uses Graphite (mentioning this because of the linked Graphite issue above).

I can confirm the issue. Downgrading to 2.15.3 seems to help.

Question: All of that comes from a clanker?

We are having the same issue as described by others in this thread. The one thing I did notice is that we begin seeing rates of 0/s and items begin piling up for InfluxDB2Writer.

[2026-05-18 11:03:04 -0500] information/WorkQueue: #9 (Influxdb2Writer, influxdb2) items: 7685, rate:  0/s (0/min 5123/5min 30974/15min); empty in 2 minutes and 46 seconds
[2026-05-18 11:03:14 -0500] information/WorkQueue: #9 (Influxdb2Writer, influxdb2) items: 8135, rate:  0/s (0/min 4733/5min 30603/15min); empty in 3 minutes
....<removed hundreds of repeated lines counting upward>....
[2026-05-18 20:56:24 -0500] information/WorkQueue: #9 (Influxdb2Writer, influxdb2) items: 1570489, rate:  0/s (0/min 0/5min 0/15min); empty in 10 hours, 24 minutes and 43 seconds
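
For comparing setups, the WorkQueue lines can be checked automatically. The sketch below flags every queue whose latest sample shows a rate of 0/s while its backlog has grown since the first sample. The regular expression and the criterion are my own assumptions based on the lines above, not an official health check.

```python
import re

QUEUE_RE = re.compile(
    r"/WorkQueue: #\d+ \((?P<queue>[^)]+)\) "
    r"items: (?P<items>\d+), rate:\s+(?P<rate>[\d.]+)/s"
)

# Lines copied from log excerpts in this thread; the IcingaDB line is a healthy queue.
SAMPLE = [
    "[2026-05-18 11:03:04 -0500] information/WorkQueue: #9 (Influxdb2Writer, influxdb2) items: 7685, rate:  0/s (0/min 5123/5min 30974/15min); empty in 2 minutes and 46 seconds",
    "[2026-05-18 11:03:14 -0500] information/WorkQueue: #9 (Influxdb2Writer, influxdb2) items: 8135, rate:  0/s (0/min 4733/5min 30603/15min); empty in 3 minutes",
    "[2026-05-18 20:56:24 -0500] information/WorkQueue: #9 (Influxdb2Writer, influxdb2) items: 1570489, rate:  0/s (0/min 0/5min 0/15min); empty in 10 hours, 24 minutes and 43 seconds",
    "[2026-05-17 22:51:12 +0200] notice/WorkQueue: #8 (IcingaDB) items: 0, rate: 0.0166667/s (1/min 1/5min 1/15min);",
]


def stalled_queues(lines):
    """Map each stalled queue to (first backlog seen, latest backlog seen)."""
    first_seen = {}
    latest = {}
    for line in lines:
        match = QUEUE_RE.search(line)
        if not match:
            continue
        items, rate = int(match["items"]), float(match["rate"])
        first_seen.setdefault(match["queue"], items)
        latest[match["queue"]] = (items, rate)
    return {
        queue: (first_seen[queue], items)
        for queue, (items, rate) in latest.items()
        if rate == 0.0 and items > first_seen[queue]
    }


if __name__ == "__main__":
    for queue, (first, last) in stalled_queues(SAMPLE).items():
        print(f"{queue}: backlog grew from {first} to {last} items at 0/s")
```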

Eventually memory usage climbs until RAM is full, and then swap space fills up as well. The Icinga2 log continues to be written, and alert notifications are still sent. Icingaweb2 stalls, shows all checks as overdue, and does not update until the icinga2 service is restarted.

So far, the only workaround here has been setting up cronjob restarts of the icinga2 service to clear things up every so often…

Debian 12 Bookworm OS - Icinga2 2.16.0-1
Icinga 2.16, icingaweb 2.13.0, icingadb 1.5.1, icingadb-web 1.4.0

It seems I can confirm this issue, though I can’t quite figure out the trigger in our setup.

In our DEV env we run automated test deployments each night. After we re-applied the 2.16 update last Friday, Icinga stopped working during these automated deployments, when they trigger Director deployments or Icinga service restarts.

Afterwards the logs fill with config sync loops and WorkQueue messages regarding the InfluxDB writer

[2026-05-17 22:51:12 +0200] notice/WorkQueue: #8 (IcingaDB) items: 0, rate: 0.0166667/s (1/min 1/5min 1/15min);
[2026-05-17 22:51:12 +0200] information/WorkQueue: #9 (Influxdb2Writer, msd-ic-db01) items: 0, rate: 1.28333/s (77/min 77/5min 77/15min);
[2026-05-17 22:51:12 +0200] information/WorkQueue: #10 (Influxdb2Writer, msd-ic-db02) items: 0, rate: 1.28333/s (77/min 77/5min 77/15min);
[2026-05-17 22:51:12 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 0, rate: 1.28333/s (77/min 77/5min 77/15min);
[2026-05-17 22:51:12 +0200] information/WorkQueue: #11 (Influxdb2Writer, msd-ic-db04) items: 0, rate: 1.28333/s (77/min 77/5min 77/15min);
[2026-05-17 22:52:12 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 87, rate: 21.55/s (1293/min 1378/5min 1378/15min); empty in 10 seconds
[2026-05-17 22:52:22 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 315, rate: 16.7333/s (1004/min 1378/5min 1378/15min); empty in 13 seconds
[2026-05-17 22:52:32 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 540, rate: 12.6167/s (757/min 1378/5min 1378/15min); empty in 23 seconds
[2026-05-17 22:52:42 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 763, rate: 8.86667/s (532/min 1378/5min 1378/15min); empty in 34 seconds
[2026-05-17 22:52:52 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 995, rate: 5.08333/s (305/min 1378/5min 1378/15min); empty in 42 seconds
[2026-05-17 22:53:02 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 1188, rate:  2/s (120/min 1378/5min 1378/15min); empty in 1 minute and 1 second
[2026-05-17 22:53:12 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 1364, rate:  0/s (0/min 1378/5min 1378/15min); empty in 1 minute and 17 seconds
[2026-05-17 22:53:22 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 1585, rate:  0/s (0/min 1378/5min 1378/15min); empty in 1 minute and 11 seconds
[2026-05-17 22:53:32 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 1786, rate:  0/s (0/min 1378/5min 1378/15min); empty in 1 minute and 28 seconds
[2026-05-17 22:53:42 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 1962, rate:  0/s (0/min 1378/5min 1378/15min); empty in 1 minute and 51 seconds
[2026-05-17 22:53:52 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 2164, rate:  0/s (0/min 1378/5min 1378/15min); empty in 1 minute and 47 seconds
[2026-05-17 22:54:02 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 2341, rate:  0/s (0/min 1378/5min 1378/15min); empty in 2 minutes and 12 seconds
[2026-05-17 22:54:12 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 2560, rate:  0/s (0/min 1378/5min 1378/15min); empty in 1 minute and 56 seconds
[2026-05-17 22:54:22 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 2762, rate:  0/s (0/min 1378/5min 1378/15min); empty in 2 minutes and 16 seconds
[2026-05-17 22:54:32 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 3014, rate:  0/s (0/min 1378/5min 1378/15min); empty in 1 minute and 59 seconds
[2026-05-17 22:54:42 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 3224, rate:  0/s (0/min 1378/5min 1378/15min); empty in 2 minutes and 33 seconds
[2026-05-17 22:54:52 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 3281, rate:  0/s (0/min 1378/5min 1378/15min); empty in 9 minutes and 35 seconds
[2026-05-17 22:55:02 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 3288, rate:  0/s (0/min 1378/5min 1378/15min); empty in 1 hour, 18 minutes and 17 seconds
[2026-05-17 22:55:12 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 3301, rate:  0/s (0/min 1378/5min 1378/15min); empty in 42 minutes and 19 seconds
[2026-05-17 22:55:22 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 3318, rate:  0/s (0/min 1378/5min 1378/15min); empty in 32 minutes and 31 seconds
[2026-05-17 22:55:32 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 3337, rate:  0/s (0/min 1378/5min 1378/15min); empty in 29 minutes and 16 seconds
[2026-05-17 22:55:42 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 3354, rate:  0/s (0/min 1378/5min 1378/15min); empty in 32 minutes and 52 seconds
[2026-05-17 22:55:52 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 3366, rate:  0/s (0/min 1378/5min 1378/15min); empty in 46 minutes and 45 seconds
[2026-05-17 22:56:02 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 3376, rate:  0/s (0/min 1378/5min 1378/15min); empty in 56 minutes and 16 seconds
[2026-05-17 22:56:12 +0200] information/WorkQueue: #6 (ApiListener, RelayQueue) items: 0, rate: 13.6/s (816/min 21447/5min 27663/15min);
[2026-05-17 22:56:12 +0200] information/WorkQueue: #7 (ApiListener, SyncQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2026-05-17 22:56:12 +0200] notice/WorkQueue: #8 (IcingaDB) items: 0, rate:  0/s (0/min 0/5min 1/15min);
[2026-05-17 22:56:12 +0200] information/WorkQueue: #10 (Influxdb2Writer, msd-ic-db02) items: 0, rate: 1.51667/s (91/min 4706/5min 4791/15min);
[2026-05-17 22:56:12 +0200] information/WorkQueue: #9 (Influxdb2Writer, msd-ic-db01) items: 0, rate: 1.51667/s (91/min 4706/5min 4791/15min);
[2026-05-17 22:56:12 +0200] information/WorkQueue: #11 (Influxdb2Writer, msd-ic-db04) items: 0, rate: 1.51667/s (91/min 4706/5min 4791/15min);
[2026-05-17 22:56:12 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 3388, rate:  0/s (0/min 1293/5min 1378/15min); empty in 47 minutes and 3 seconds
[2026-05-17 22:56:22 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 3388, rate:  0/s (0/min 1004/5min 1378/15min); empty in infinite time, your task handler isn't able to keep up
[2026-05-17 22:56:32 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 3388, rate:  0/s (0/min 757/5min 1378/15min); empty in infinite time, your task handler isn't able to keep up
[2026-05-17 22:56:42 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 3388, rate:  0/s (0/min 532/5min 1378/15min); empty in infinite time, your task handler isn't able to keep up
[2026-05-17 22:56:52 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 3388, rate:  0/s (0/min 305/5min 1378/15min); empty in infinite time, your task handler isn't able to keep up
[2026-05-17 22:57:02 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 3388, rate:  0/s (0/min 120/5min 1378/15min); empty in infinite time, your task handler isn't able to keep up
[2026-05-17 22:57:12 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 3388, rate:  0/s (0/min 0/5min 1378/15min); empty in infinite time, your task handler isn't able to keep up
[2026-05-17 22:57:22 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 3388, rate:  0/s (0/min 0/5min 1378/15min); empty in infinite time, your task handler isn't able to keep up
[2026-05-17 22:57:32 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 3388, rate:  0/s (0/min 0/5min 1378/15min); empty in infinite time, your task handler isn't able to keep up
[2026-05-17 22:57:42 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 3388, rate:  0/s (0/min 0/5min 1378/15min); empty in infinite time, your task handler isn't able to keep up
[2026-05-17 22:57:52 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 3388, rate:  0/s (0/min 0/5min 1378/15min); empty in infinite time, your task handler isn't able to keep up
[2026-05-17 22:58:02 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 3388, rate:  0/s (0/min 0/5min 1378/15min); empty in infinite time, your task handler isn't able to keep up
[2026-05-17 22:58:12 +0200] information/WorkQueue: #12 (Influxdb2Writer, victoria-metrics) items: 3388, rate:  0/s (0/min 0/5min 1378/15min); empty in infinite time, your task handler isn't able to keep up
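
To spot a stalled queue such as #12 in the excerpt above without reading every line, the WorkQueue messages can be parsed with a short script. This is only a rough sketch: the regular expression is derived from the log lines shown here, and the function names and the "stalled" heuristic (backlog never shrinks, rate stays at 0/s) are my own assumptions, not anything Icinga provides.

```python
import re

# Pattern derived from the WorkQueue lines in the log excerpt above (assumption:
# other Icinga 2 versions may format these lines differently).
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+\w+/WorkQueue:\s+#(?P<id>\d+)\s+\((?P<name>[^)]*)\)\s+"
    r"items:\s+(?P<items>\d+),\s+rate:\s+(?P<rate>[\d.]+)/s"
)


def parse_workqueue_line(line):
    """Return a dict for a WorkQueue status line, or None for any other line."""
    m = LINE_RE.match(line)
    if not m:
        return None
    return {
        "ts": m.group("ts"),
        "id": int(m.group("id")),
        "name": m.group("name"),
        "items": int(m.group("items")),
        "rate": float(m.group("rate")),
    }


def stalled_queues(lines, min_samples=3):
    """Hypothetical heuristic: a queue counts as stalled if its backlog never
    shrinks, is non-empty at the end, and the reported rate is always 0/s."""
    history = {}
    for line in lines:
        rec = parse_workqueue_line(line)
        if rec is not None:
            history.setdefault(rec["name"], []).append(rec)

    stalled = []
    for name, recs in history.items():
        if len(recs) < min_samples:
            continue
        items = [r["items"] for r in recs]
        never_shrinks = all(b >= a for a, b in zip(items, items[1:]))
        if never_shrinks and items[-1] > 0 and all(r["rate"] == 0 for r in recs):
            stalled.append(name)
    return stalled
```

Feeding it the lines of the log file, e.g. `stalled_queues(open(path))` with `path` pointing at your icinga2.log, returns the names of the writers whose queue only grows.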

If this state persists long enough, the log also starts reporting “Too many open files”. These messages are written many times per second and fill up the drive.

May 04 08:01:34 msd-ic-ma02 icinga2[744938]: Cannot accept new connection: Too many open files
May 04 08:01:34 msd-ic-ma02 icinga2[744938]: Cannot accept new connection: Too many open files
May 04 08:01:34 msd-ic-ma02 icinga2[744938]: Cannot accept new connection: Too many open files
May 04 08:01:34 msd-ic-ma02 icinga2[744938]: Cannot accept new connection: Too many open files
May 04 08:01:34 msd-ic-ma02 icinga2[744938]: Cannot accept new connection: Too many open files
May 04 08:01:34 msd-ic-ma02 icinga2[744938]: Cannot accept new connection: Too many open files
May 04 08:01:34 msd-ic-ma02 icinga2[744938]: Cannot accept new connection: Too many open files
May 04 08:01:34 msd-ic-ma02 icinga2[744938]: Cannot accept new connection: Too many open files
May 04 08:01:34 msd-ic-ma02 icinga2[744938]: Cannot accept new connection: Too many open files
May 04 08:01:34 msd-ic-ma02 icinga2[744938]: Cannot accept new connection: Too many open files
May 04 08:01:34 msd-ic-ma02 icinga2[744938]: Cannot accept new connection: Too many open files
May 04 08:01:34 msd-ic-ma02 icinga2[744938]: Cannot accept new connection: Too many open files
May 04 08:01:34 msd-ic-ma02 icinga2[744938]: Cannot accept new connection: Too many open files
May 04 08:01:34 msd-ic-ma02 icinga2[744938]: Cannot accept new connection: Too many open files
May 04 08:01:34 msd-ic-ma02 icinga2[744938]: Cannot accept new connection: Too many open files
May 04 08:01:34 msd-ic-ma02 icinga2[744938]: Cannot accept new connection: Too many open files
May 04 08:01:34 msd-ic-ma02 icinga2[744938]: Cannot accept new connection: Too many open files
May 04 08:01:34 msd-ic-ma02 icinga2[744938]: Cannot accept new connection: Too many open files
May 04 08:01:34 msd-ic-ma02 icinga2[744938]: Cannot accept new connection: Too many open files
May 04 08:01:34 msd-ic-ma02 icinga2[744938]: Cannot accept new connection: Too many open files
May 04 08:01:34 msd-ic-ma02 icinga2[744938]: Cannot accept new connection: Too many open files
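
To see how close the icinga2 process is to its file descriptor limit before the messages above appear, the count of open descriptors can be compared with the soft limit in `/proc`. The following is a minimal sketch for Linux only; the function names are made up for illustration, and reading `/proc/<pid>/fd` of another user's process normally requires root.

```python
import os


def parse_soft_nofile(limits_text):
    """Extract the soft 'Max open files' limit from the text of /proc/<pid>/limits.

    Returns an int, or None if the limit is 'unlimited' or the row is missing.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: "Max open files", soft limit, hard limit, unit
            soft = line.split()[3]
            return None if soft == "unlimited" else int(soft)
    return None


def fd_usage(pid):
    """Return (open_fd_count, soft_limit) for a running process (Linux /proc)."""
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as fh:
        soft_limit = parse_soft_nofile(fh.read())
    return open_fds, soft_limit


if __name__ == "__main__":
    # Example: inspect the current Python process; replace with the icinga2 PID.
    used, limit = fd_usage(os.getpid())
    print(f"open file descriptors: {used}, soft limit: {limit}")
```

If the count keeps climbing towards the limit while the writer queue is stuck, that would fit the idea that connections to the metrics backend are piling up, but this is only an indication, not proof.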

The Icinga team is working on it, so hopefully there will be a fix soon :slight_smile:

I have now disabled the Influx writer that we were using as a test to write into VictoriaMetrics, to see whether it is the sole culprit. In the future we can use the new OpenTelemetry writer for that :+1: