Last check in future? Unable to reschedule

Hi there,

Something very strange seems to be happening with my deployment of Icinga.

A zone had a few servers rebooted, and the host checks went into a failed state, but for some reason the last check time is showing as November 8th 2020.

Clicking the Reschedule or Check now button does not appear to properly reschedule these checks. All other checks in the satellite zone are working fine.

Both the host check and services on those hosts seem to be affected - but in this case it is saying the last check is late and the next check is November 8th.

I've checked to ensure NTP is working on all agents in the zone, and all appears to be fine.

Any ideas? I'm completely lost!

Hi,

As far as I remember, there was a bug in versions < 2.11 that made it impossible to reschedule a check that was "run in the future". What version are you using?

Things like that can happen when NTP is not syncing correctly or is syncing late, or when there are problems with your hardware clock.

Cheers,
Thomas

Ah, that would explain it. We're on a custom build of 2.10.5 to include our OpenTSDBWriter changes, but those have since been merged into the last release, so I may upgrade us to the latest.

I can see in the debuglog that the CheckerComponent and notice/Process logs show the program being executed correctly, but it looks like the result isn't making its way back to the masters.

I'll upgrade to the latest version and report back.

I think what may have happened is that the server that was rebooted was a Hyper-V host, and that has potentially interfered with the hardware clock/NTP on the domain controller(s).

Yes - I regularly see problems with time on some virtualization platforms.

Please do as you suggested. I'm looking forward to hearing whether this solved the problem.

After some further troubleshooting, we were able to fix this up.

I stopped the problematic satellite, stopped both masters, and removed the state file from all three.

Then I jumped into the database, looked at the icinga_hoststatus table and sorted by 'Last Check'. After taking a backup, I modified the problematic rows to be a date from this morning.

I also made the same change to the problematic services in the icinga_servicestatus table.
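
For anyone who wants to script this, here is a rough sketch of the idea rather than the exact statements I ran. It uses an in-memory SQLite table as a stand-in for the real IDO database (which is MySQL or PostgreSQL), and it assumes a `last_check` column stored as epoch seconds; the function name `reset_future_checks` and the 60 second tolerance are made up for illustration. Take a backup and stop Icinga before trying anything like this against real data.

```python
import sqlite3
import time

# Tables this sketch is willing to touch; anything else is rejected.
ALLOWED_TABLES = ("icinga_hoststatus", "icinga_servicestatus")


def reset_future_checks(conn, table, now=None, tolerance=60):
    """Set last_check to `now` for rows whose last_check lies in the future.

    `tolerance` (seconds) is a hypothetical safety margin so that small
    clock differences are not treated as broken rows.
    """
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table: {table}")
    now = int(time.time()) if now is None else now
    cur = conn.execute(
        f"UPDATE {table} SET last_check = ? WHERE last_check > ?",
        (now, now + tolerance),
    )
    conn.commit()
    return cur.rowcount  # number of rows that were reset


# Demo against a simplified stand-in table (assumption: the real table
# has many more columns and uses a timestamp type instead of integers).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE icinga_hoststatus ("
    "host_object_id INTEGER PRIMARY KEY, last_check INTEGER)"
)
now = 1_600_000_000
conn.executemany(
    "INSERT INTO icinga_hoststatus VALUES (?, ?)",
    [(1, now - 300), (2, now + 30 * 86400)],  # host 2 is "in the future"
)
fixed = reset_future_checks(conn, "icinga_hoststatus", now=now)
print(f"rows reset: {fixed}")
```

The same function can be pointed at `icinga_servicestatus` for the affected services. Only rows clearly ahead of the current time are touched, so healthy checks keep their real timestamps.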

I can now see services being scheduled and checked again.

We do plan on upgrading ASAP to the latest version but it looks like our OpenTSDB changes are only in the latest RC, and given that there is only one bug in GitHub we are happy to wait for the official release to come out.

Thanks again for your help @twidhalm!

Ok, great! Could you please flag one of the posts as "Solution" so other users can see that you don't need any more help?