Connection to agent timed out but no UNKNOWN service check status and no notification

Hi everyone,

I have searched, so please forgive me if the answer is already out there somewhere.
I have a Top Down Command Endpoint setup, and the connection to one of my agent nodes timed out.
Still, I received no warning about the check being late / unknown.

How do I make sure that the service check status becomes UNKNOWN so I get a notification?
This is the log at /var/log/icinga2/icinga2.log on master:

[2020-03-10 17:52:23 +0100] warning/JsonRpcConnection: API client disconnected for identity 'icinga2-agent1.localdomain'
[2020-03-10 17:52:23 +0100] warning/ApiListener: Removing API client for endpoint 'icinga2-agent1.localdomain'. 0 API clients left.
[2020-03-10 17:52:23 +0100] information/ApiListener: Reconnecting to endpoint 'icinga2-agent1.localdomain' via host 'ip.address.agent1' and port '5665'
[2020-03-10 17:52:42 +0100] information/WorkQueue: #8 (IdoMysqlConnection, ido-mysql) items: 0, rate: 8.2/s (492/min 2616/5min 7986/15min);
[2020-03-10 17:52:53 +0100] information/WorkQueue: #7 (ApiListener, SyncQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2020-03-10 17:52:53 +0100] information/WorkQueue: #5 (InfluxdbWriter, influxdb) items: 0, rate: 2.9/s (174/min 929/5min 2867/15min);
[2020-03-10 17:53:12 +0100] information/WorkQueue: #6 (ApiListener, RelayQueue) items: 0, rate: 10.95/s (657/min 3525/5min 10957/15min);
[2020-03-10 17:54:03 +0100] information/ConfigObject: Dumping program state to file '/var/lib/icinga2/icinga2.state'
[2020-03-10 17:54:33 +0100] critical/ApiListener: Cannot connect to host 'ip.address.agent1' on port '5665': Connection timed out
[2020-03-10 17:54:43 +0100] information/ApiListener: Reconnecting to endpoint 'icinga2-agent1.localdomain' via host 'ip.address.agent1' and port '5665'
[2020-03-10 17:56:52 +0100] critical/ApiListener: Cannot connect to host 'ip.address.agent1' on port '5665': Connection timed out
[2020-03-10 17:56:53 +0100] information/ApiListener: Reconnecting to endpoint 'icinga2-agent1.localdomain' via host 'ip.address.agent1' and port '5665'
[2020-03-10 17:57:52 +0100] information/WorkQueue: #8 (IdoMysqlConnection, ido-mysql) items: 0, rate: 7.78333/s (467/min 2455/5min 7752/15min);
[2020-03-10 17:58:03 +0100] information/WorkQueue: #5 (InfluxdbWriter, influxdb) items: 0, rate: 2.86667/s (172/min 866/5min 2768/15min);
[2020-03-10 17:58:03 +0100] information/WorkQueue: #7 (ApiListener, SyncQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2020-03-10 17:58:22 +0100] information/WorkQueue: #6 (ApiListener, RelayQueue) items: 0, rate: 11/s (660/min 3314/5min 10552/15min);
[2020-03-10 17:59:03 +0100] information/ConfigObject: Dumping program state to file '/var/lib/icinga2/icinga2.state'
[2020-03-10 17:59:04 +0100] critical/ApiListener: Cannot connect to host 'ip.address.agent1' on port '5665': Connection timed out
[2020-03-10 17:59:13 +0100] information/ApiListener: Reconnecting to endpoint 'icinga2-agent1.localdomain' via host 'ip.address.agent1' and port '5665'
[2020-03-10 18:01:23 +0100] critical/ApiListener: Cannot connect to host 'ip.address.agent1' on port '5665': Connection timed out
[2020-03-10 18:01:33 +0100] information/ApiListener: Reconnecting to endpoint 'icinga2-agent1.localdomain' via host 'ip.address.agent1' and port '5665'
[2020-03-10 18:03:02 +0100] information/WorkQueue: #8 (IdoMysqlConnection, ido-mysql) items: 0, rate: 8.11667/s (487/min 2477/5min 7576/15min);
[2020-03-10 18:03:13 +0100] information/WorkQueue: #7 (ApiListener, SyncQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2020-03-10 18:03:13 +0100] information/WorkQueue: #5 (InfluxdbWriter, influxdb) items: 0, rate: 2.9/s (174/min 869/5min 2662/15min);
[2020-03-10 18:03:32 +0100] information/WorkQueue: #6 (ApiListener, RelayQueue) items: 0, rate: 11.0333/s (662/min 3317/5min 10130/15min);
[2020-03-10 18:03:42 +0100] critical/ApiListener: Cannot connect to host 'ip.address.agent1' on port '5665': Connection timed out
[2020-03-10 18:03:43 +0100] information/ApiListener: Reconnecting to endpoint 'icinga2-agent1.localdomain' via host 'ip.address.agent1' and port '5665'
[2020-03-10 18:04:03 +0100] information/ConfigObject: Dumping program state to file '/var/lib/icinga2/icinga2.state'
[2020-03-10 18:05:53 +0100] critical/ApiListener: Cannot connect to host 'ip.address.agent1' on port '5665': Connection timed out
[2020-03-10 18:06:03 +0100] information/ApiListener: Reconnecting to endpoint 'icinga2-agent1.localdomain' via host 'ip.address.agent1' and port '5665'
[2020-03-10 18:08:12 +0100] information/WorkQueue: #8 (IdoMysqlConnection, ido-mysql) items: 0, rate: 7.81667/s (469/min 2460/5min 7439/15min);
[2020-03-10 18:08:12 +0100] critical/ApiListener: Cannot connect to host 'ip.address.agent1' on port '5665': Connection timed out
[2020-03-10 18:08:13 +0100] information/ApiListener: Reconnecting to endpoint 'icinga2-agent1.localdomain' via host 'ip.address.agent1' and port '5665'
[2020-03-10 18:08:23 +0100] information/WorkQueue: #7 (ApiListener, SyncQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2020-03-10 18:08:23 +0100] information/WorkQueue: #5 (InfluxdbWriter, influxdb) items: 0, rate: 2.86667/s (172/min 869/5min 2606/15min);
[2020-03-10 18:08:42 +0100] information/WorkQueue: #6 (ApiListener, RelayQueue) items: 0, rate: 11.05/s (663/min 3317/5min 9943/15min);
[2020-03-10 18:09:03 +0100] information/ConfigObject: Dumping program state to file '/var/lib/icinga2/icinga2.state'
[2020-03-10 18:10:23 +0100] critical/ApiListener: Cannot connect to host 'ip.address.agent1' on port '5665': Connection timed out
[2020-03-10 18:10:33 +0100] information/ApiListener: Reconnecting to endpoint 'icinga2-agent1.localdomain' via host 'ip.address.agent1' and port '5665'
[2020-03-10 18:12:43 +0100] critical/ApiListener: Cannot connect to host 'ip.address.agent1' on port '5665': Connection timed out
[2020-03-10 18:12:53 +0100] information/ApiListener: Reconnecting to endpoint 'icinga2-agent1.localdomain' via host 'ip.address.agent1' and port '5665'
[2020-03-10 18:13:22 +0100] information/WorkQueue: #8 (IdoMysqlConnection, ido-mysql) items: 0, rate: 8.05/s (483/min 2469/5min 7449/15min);
[2020-03-10 18:13:33 +0100] information/WorkQueue: #7 (ApiListener, SyncQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2020-03-10 18:13:33 +0100] information/WorkQueue: #5 (InfluxdbWriter, influxdb) items: 0, rate: 2.88333/s (173/min 869/5min 2608/15min);
[2020-03-10 18:13:52 +0100] information/WorkQueue: #6 (ApiListener, RelayQueue) items: 0, rate: 10.45/s (627/min 3280/5min 9905/15min);
[2020-03-10 18:14:03 +0100] information/ConfigObject: Dumping program state to file '/var/lib/icinga2/icinga2.state'
[2020-03-10 18:15:02 +0100] critical/ApiListener: Cannot connect to host 'ip.address.agent1' on port '5665': Connection timed out
[2020-03-10 18:15:03 +0100] information/ApiListener: Reconnecting to endpoint 'icinga2-agent1.localdomain' via host 'ip.address.agent1' and port '5665'
[2020-03-10 18:17:13 +0100] critical/ApiListener: Cannot connect to host 'ip.address.agent1' on port '5665': Connection timed out

I actually found the solution!

https://icinga.com/docs/icinga2/latest/doc/06-distributed-monitoring/#health-checks
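
For anyone landing here later: the linked section describes a health check that the master runs itself, using the `cluster-zone` check command, to verify that an agent zone is still connected. The snippet below is only a minimal sketch of that idea. It assumes that agent hosts carry a custom variable `agent_endpoint` and that each agent zone is named after its host object; neither is stated in this thread, so adapt both to your own configuration.

```
apply Service "cluster-health" {
  check_command = "cluster-zone"

  display_name = "cluster-health-" + host.name

  // Assumption: the agent zone name is identical to the host object name.
  vars.cluster_zone = host.name

  // Assumption: agent hosts set vars.agent_endpoint.
  // No command_endpoint is set here, so the check runs on the master
  // and still produces a result while the agent is unreachable.
  assign where host.vars.agent_endpoint
}
```

Because this service is executed on the master rather than on the agent, it should go into a non-OK state when the agent zone is disconnected, and a regular notification rule applied to it can then alert you.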


Hi @ahakla and welcome to the community!

It’s great that you found the solution. I would have advised using the cluster-zone check as well, but it’s even better that you found it yourself.

Would you please mark your second posting with the “Solution” checkbox so others can see that you don’t need any more help?


Done, thanks for the warm welcome! I’ve seen others struggling with this issue too, so hopefully I’ve helped others as well. :)
