Icinga new service checks are stuck in Pending - Some never get a check source, some do get a check source but both are Pending

Hi all,

We have Icinga configured with one master and several agents. This setup has been working fine for over a year. We use a number of custom imports together with custom commands, service templates, and service apply rules, all of which have been working fine.

Recently (within the last two weeks), we’ve noticed that our new hosts (both automatically and manually created, it seems) are stuck in Pending. One specific trend I have noticed is that some of them have a check source and some do not. If the host gets a check source, then its services work, but the host check itself stays stuck in Pending. If, on the other hand, the host does not get a check source, then both its services and its host check stay stuck in Pending.
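To make that split easier to see across many hosts, the two groups can be separated from the output of the Icinga 2 API. This is only a rough sketch: it assumes a response shaped like the one returned by `/v1/objects/hosts`, where each entry has `attrs.last_check_result`, which is null until the core has stored a first result. The function name and the sample host names are made up for illustration.

```python
def split_by_check_result(api_response):
    """Group object names by whether the core has stored any check result.

    Returns (no_result, with_source):
      no_result   - names whose last_check_result is null (never checked)
      with_source - mapping of name -> check_source of the last result
    """
    no_result = []
    with_source = {}
    for obj in api_response.get("results", []):
        attrs = obj.get("attrs", {})
        result = attrs.get("last_check_result")
        if result is None:
            no_result.append(obj["name"])
        else:
            with_source[obj["name"]] = result.get("check_source")
    return no_result, with_source


# Hypothetical sample data in the assumed response shape.
sample = {
    "results": [
        {"name": "new-host-a", "type": "Host",
         "attrs": {"last_check_result": None}},
        {"name": "new-host-b", "type": "Host",
         "attrs": {"last_check_result": {"check_source": "agent-1"}}},
    ]
}

never_checked, checked = split_by_check_result(sample)
print("no result yet:", never_checked)
print("has a result from:", checked)
```

If a host shows up with a result here while the web interface still lists it as Pending, that may point at the path between the core and the web interface's database rather than at check scheduling, but that is a guess to verify, not a diagnosis.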

These service checks and host checks are from the same apply rules that ARE working for other pre-existing hosts/services, but they seem to be failing for new ones.


The only change that I’m aware of happening approximately at that time was an adjustment to our imports to assign a variable that is used to apply a hostgroup instead of directly assigning a hostgroup in the import, but this, as far as I can tell, is unrelated.

One other note is we have a custom auto-downtime check that sets a variable with the current time at host creation and sets downtime for the host every hour. After a period of time, it assigns the host to a group that removes it from the service apply rule that applies the auto-downtime check. I also do not believe this is related as it has been in place for months but wanted to mention it.


We’ve rebooted all 5 servers and had no luck with resolution. We also checked the debug log for errors and haven’t seen any. Does anyone have any suggestions for next troubleshooting steps to try to track down this issue?

Thanks!


System details:

icinga2 version: 2.13.2-1
Browser - firefox
Icinga Web 2 Version - 2.9.5
PHP Version - 7.4.33
OS - Alma Linux 8.8

Modules:
| Module | Version |
|---|---|
| director | 1.9.0 |
| fileshipper | 1.2.0 |
| grafana | 1.4.2 |
| incubator | 0.20.0 |
| ipl | 0.0.0 |
| migrate | 2.9.5 |
| monitoring | 2.9.5 |
| mygrafana | 0.0.0 |
| reactbundle | 0.0.0 |
| setup | 2.9.5 |
| vspheredb | 1.6.0 |

One common cause is the local time difference between the master and an agent getting too big.
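A quick way to rule this out is to collect the current epoch time from each node (for example with `date +%s`) and compare it against the master. The sketch below only does the comparison; the function name, the host names, and the 60-second tolerance are illustrative assumptions, not Icinga settings.

```python
def hosts_with_clock_drift(reference_epoch, host_epochs, tolerance_seconds=60.0):
    """Return {host: offset_in_seconds} for hosts whose clock differs from
    the reference (e.g. the master) by more than the given tolerance.

    A positive offset means the host's clock is ahead of the reference.
    """
    drifted = {}
    for host, epoch in host_epochs.items():
        offset = epoch - reference_epoch
        if abs(offset) > tolerance_seconds:
            drifted[host] = offset
    return drifted


# Hypothetical readings, taken at (roughly) the same moment on each node.
master_time = 1_700_000_000
agent_times = {
    "agent-1": 1_700_000_002,   # 2 s ahead: within tolerance
    "agent-2": 1_699_999_700,   # 300 s behind: flagged
}

print(hosts_with_clock_drift(master_time, agent_times))
```

The readings are never taken at exactly the same instant, so a tolerance well above the collection delay avoids false alarms.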

Thanks! Unfortunately, in this case the server times are all in sync. They all poll a local NTP server, and I also manually verified that their clocks match.

Any other suggestions of potential issues or ways to trace out the cause?

Other common causes are certificate failures and host name mismatches. However, those would be reported in the logs, which you have already checked.
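If it helps to re-check the logs more systematically, something along these lines could filter them. It is a minimal sketch that assumes the usual Icinga 2 log line layout of `[timestamp] severity/Component: message`; the keyword list is my own guess at useful search terms, and the sample lines are invented for the example rather than copied from a real log.

```python
import re

# Assumed layout: [timestamp] severity/Component: message
LOG_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\] (?P<severity>\w+)/(?P<component>\w+): (?P<message>.*)$"
)


def suspicious_lines(lines, keywords=("certificate", "tls", "handshake", "hostname")):
    """Return (severity, component, message) for lines that are either
    warning/critical or mention one of the keywords (case-insensitive).

    Lines that do not match the assumed layout are skipped.
    """
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue
        severity = match["severity"].lower()
        message = match["message"].lower()
        if severity in ("warning", "critical") or any(k in message for k in keywords):
            hits.append((match["severity"], match["component"], match["message"]))
    return hits


# Invented sample lines, only to show the filter at work.
sample_log = [
    "[2023-08-01 10:00:00 +0000] information/ApiListener: illustrative message, nothing wrong",
    "[2023-08-01 10:00:01 +0000] warning/ApiListener: illustrative message about a certificate problem",
    "[2023-08-01 10:00:02 +0000] notice/JsonRpcConnection: illustrative TLS handshake note",
    "not a log line",
]

for severity, component, message in suspicious_lines(sample_log):
    print(severity, component, message)
```

Matching on keywords as well as severity is deliberate: connection-related messages are not necessarily logged at warning level, so searching only for errors can miss them.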