Check result is late

cooljay · June 20, 2019, 9:46pm

Hello all,
I successfully configured an Icinga 2 master and satellite nodes. The Icinga satellite is connected to the master through a VPN connection (OpenVPN). I have some hosts connected to the satellite, and everything works fine. Now I have discovered that after some hours, check commands can no longer be executed on the hosts connected to the satellite. I also see the message “check result is late” for those hosts in Icinga Web 2.
Does anyone know what the problem could be?

Regards icinga

aflatto · June 25, 2019, 8:35am

The result you see here is ICMP-based and gives no indication of whether the satellite is actually working; all it tells you is that the VPN is up.

I have encountered something similar in the past.
Can you log in to the satellite and make sure that the pushed Icinga configuration is valid?
icinga2 daemon -C
It might be that the latest config change pushed to the satellite passed validation on the master but broke on the remote end, so the satellite is no longer communicating effectively with the master.
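To make that check repeatable on the satellite, the validation can be wrapped in a small helper. This is only a sketch: the function name `check_icinga_config` and the `ICINGA2_BIN` override are hypothetical, added so the binary can be swapped out (for example in a test). It relies on `icinga2 daemon -C` exiting with a non-zero status when the configuration fails to validate.

```shell
#!/bin/sh
# Hypothetical override so the icinga2 binary can be replaced (e.g. in tests).
ICINGA2_BIN="${ICINGA2_BIN:-icinga2}"

# Validate the configuration on this node without starting the daemon.
# Prints a one-line verdict on stdout; on failure the full validator
# output is sent to stderr. Returns 0 if valid, 1 otherwise.
check_icinga_config() {
    if output=$("$ICINGA2_BIN" daemon -C 2>&1); then
        echo "config OK"
        return 0
    else
        echo "config INVALID"
        printf '%s\n' "$output" >&2
        return 1
    fi
}
```

On the satellite you would run something like `check_icinga_config || exit 1` after each config push from the master; the validator output printed on failure points at the file and line that broke.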

cooljay · June 25, 2019, 9:16am

Hello Assaf,

Yes, my Icinga satellite picks up its configuration from the master, and the connected nodes can be viewed in Icinga Web. I noticed that this always happens at exactly 10:20 pm every day and lasts exactly two hours, after which the red dot goes away and checks are again performed on time for the clients connected to the satellite.

aflatto · June 25, 2019, 11:00am

Could it be related to the backup window?

Or to high traffic on the network or the VPN server?

bkai · June 26, 2019, 1:00pm

Essentially, a check is shown as late (the “next check” time is negative or in the past in the GUI) if a dependency is suppressing it, or if a specific acknowledgement or downtime is doing so. Since you see this regular (!) behaviour, I would guess it is the latter, i.e. you have a recurring downtime configured somewhere (at 10:20 pm for 2 hours) that affects this check directly or its host.

I started a “late checks” thread some time ago, when we had a massive late-checks problem, but the cause and resolution seem to differ from your case (in ours it was a badly configured dependency).

cooljay · June 26, 2019, 3:03pm

Hello Kai,
I found the issue: it lies in the time synchronisation of the satellite server. After troubleshooting and fixing the system's UTC, RTC and local time settings, it started working fine again.

Regards
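On systemd-based Linux hosts, `timedatectl` reports local time, UTC, RTC time and whether the clock is NTP-synchronized, i.e. the three clocks mentioned above. The sketch below assumes a systemd release new enough to support the machine-readable `timedatectl show` output, and flags the two settings most likely to cause this kind of drift: NTP not synchronized, and the RTC kept in local time. The function name `check_time_sync` is hypothetical; it reads from stdin so that saved output can be fed to it as well.

```shell
#!/bin/sh
# Hypothetical helper: reads `timedatectl show` key=value lines on stdin
# and prints one warning per suspicious setting. Returns 1 if any were found.
check_time_sync() {
    problems=0
    # The `|| [ -n "$key" ]` part also handles a final line without a newline.
    while IFS='=' read -r key value || [ -n "$key" ]; do
        case "$key" in
            NTPSynchronized)
                if [ "$value" != "yes" ]; then
                    echo "WARN: system clock is not NTP-synchronized"
                    problems=1
                fi ;;
            LocalRTC)
                if [ "$value" = "yes" ]; then
                    echo "WARN: RTC is kept in local time (prefer UTC)"
                    problems=1
                fi ;;
        esac
    done
    if [ "$problems" -eq 0 ]; then
        echo "time settings look OK"
    fi
    return "$problems"
}
```

Usage: `timedatectl show | check_time_sync`. If it warns, `timedatectl set-ntp true` enables NTP synchronization and `timedatectl set-local-rtc 0` keeps the hardware clock in UTC.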

bkai · June 26, 2019, 4:09pm

Okay, thanks. That is probably another reason to always have an NTP service active on every server… But I do have one question. From your capture in the first post, it seems that the “next check” in your case looked normal (i.e. in the future). So your check was not even considered a late check by Icinga and would not have appeared in the GUI’s late-checks overview (Dashboard -> Overdue (tab) -> middle column)?