Icinga last_check incorrect after NTP issues

gigagoirle · June 29, 2022, 8:42am

We use icinga2 to monitor our customers. Last weekend, on the 25th of June, one of the endpoint icinga2 sites (CentOS 7) got migrated to a new VMware platform where the ESX time was not correct.

The system got a weird time (I assume from the hwclock), and after a while chrony corrected it. The time on the ESX also got fixed in the meantime, so there is no NTP issue anymore.

The problem now is that the last_check of the hosts in our monitoring is the time right before the VM migration and it will not update to the current time.
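A minimal Python sketch of how the symptom above could be spotted across many hosts: given each host's last_check as a Unix timestamp, list the hosts whose value has stopped advancing. The function name `stale_hosts`, the input mapping, and the one-hour threshold are hypothetical choices for illustration; how you obtain the timestamps is left open.

```python
def stale_hosts(last_checks, now, max_age_s=3600):
    """Return the sorted names of hosts whose last check is older than max_age_s.

    last_checks: mapping of host name -> last_check as Unix epoch seconds
    now:         current time as Unix epoch seconds
    max_age_s:   hypothetical threshold; pick something well above the check interval
    """
    return sorted(
        name for name, ts in last_checks.items()
        if now - ts > max_age_s
    )
```

Passing `now` in explicitly keeps the comparison independent of the local clock, which is useful when the local clock is the thing under suspicion.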

So no data is being sent to the Icinga master, and when I click “Check now” nothing happens.

I did all the communication tests between the icinga master and the endpoint (ping, reachable on port 5665 etc…) so the communication should not be a problem.
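For reference, the port test mentioned above can be sketched in Python roughly like this. It only checks that a TCP connection to port 5665 can be opened; it says nothing about TLS or whether the cluster connection itself is healthy. The function name `port_reachable` and the timeout are illustrative assumptions.

```python
import socket

def port_reachable(host, port=5665, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Called as `port_reachable("satellite.example.com")` (a placeholder host name), it returns True or False. A True result only rules out basic network problems.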

Is there a fix for this? Right now we can't monitor the customer.

icinga2 - The Icinga 2 network monitoring daemon (version: r2.10.5-1)

dgoetz · June 29, 2022, 9:01am

There is a closed issue, “Changing system time backwards causes CheckerComponent to no longer execute active checks” (Issue #9278, Icinga/icinga2 on GitHub), which says this works as expected. So the only workaround I know of would be “resetting” the specific system by deleting the state file, with all the negative impacts. As it is only a satellite, that is what I would do.

Pooh · June 29, 2022, 9:00am

Can you tell us, at least approximately, what date/time the system got set to?

At least whether it was “way in the past” or “way in the future” would help.

Was it hours, days or years off?

Antony.

gigagoirle · June 29, 2022, 9:51am

The time jumped forward to 5 Dec 2034 and chrony corrected it. I deleted the state files in /var/lib/icinga2 on both the master and the endpoint, and restarted the master, db, satellite etc… nothing helps. On another blog I found that another user corrected the last check time in the database, so I did that too, but it still does not work.

dgoetz · June 29, 2022, 10:39am

The database is only an output, so I think it should not be blocking anything; the rows should even be updated without looking at the date.

The state file is the only thing that lets Icinga 2 know that something was done already. You wrote that you deleted it and then restarted, which could explain why there was no effect: Icinga 2 writes the state during stop. Can you try again: stop the service on all affected nodes, delete the file, and then start the service?
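The order of operations is the important part, so here is a rough Python sketch of it. It assumes the service is managed by systemd under the name `icinga2` and that the state file is `/var/lib/icinga2/icinga2.state`; both are assumptions to verify on your own nodes, and the function name `reset_state` is made up for this example. It needs root and has to be run on every affected node.

```python
import subprocess
from pathlib import Path

# Assumption: default state file path of a package install; check your system.
STATE_FILE = Path("/var/lib/icinga2/icinga2.state")

def reset_state(state_file=STATE_FILE, service="icinga2", run=subprocess.run):
    """Stop the service, delete the state file, then start the service again."""
    # Stop first: the daemon writes its state while stopping, so a file
    # deleted while the daemon is still running would simply be recreated.
    run(["systemctl", "stop", service], check=True)
    try:
        state_file.unlink()
    except FileNotFoundError:
        pass  # Nothing to delete; continue with the start.
    run(["systemctl", "start", service], check=True)
```

The `run` parameter exists only so the sequence can be tried out with a stand-in instead of the real `systemctl`. Keep in mind that deleting the file discards whatever the daemon had persisted there, which is the negative impact mentioned earlier in the thread.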

gigagoirle · June 29, 2022, 11:00am

Thanks! This solved the problem. I am very grateful!