Host and Service checks hang on Pending indefinitely on 1 satellite only

Hello all,

I’m having issues with one of my satellite installations. This installation is new, and for some reason the host and guests stay in the Pending state with no update whatsoever. I’ve already tried running the Node Wizard again, but that doesn’t change anything. The master has 4 satellites that work fine, but the fifth is causing these issues. The time and timezone are set and synced correctly, and communication is being established.

One odd thing I noticed: if I add a new endpoint (Windows Agent) and complete the wizard, the endpoint isn’t visible on the master server until I restart the icinga2 service on the satellite or the master. It’s as if the connection is initiated, but nothing happens afterwards.

Does anybody know what could be going on? What logging would you need?

Hi,

Did you read https://icinga.com/docs/icinga2/latest/doc/15-troubleshooting/#cluster-and-clients-troubleshooting? Especially the section on config sync issues? You did not share enough details for us to know your exact setup, but I guess the troubleshooting docs will help here.

My first guess (and it is just a shot in the dark) would be that the host is not mapped to the correct zone or endpoint.

You can also use the icinga2 command line client to verify that the configuration sync works. For example, does icinga2 object list --type Endpoint show the expected endpoint?
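The check suggested above can be scripted. This is a minimal sketch, not an official tool: it assumes each object in the CLI output starts with a header line of the form Object '<name>' of type 'Endpoint': and that the type name is written capitalized. The helper names are made up for illustration.

```python
import re
import subprocess

# Assumed header format printed by `icinga2 object list` for each object,
# e.g.  Object 'sat01.domain.local' of type 'Endpoint':
OBJECT_HEADER = re.compile(
    r"^Object '(?P<name>[^']+)' of type 'Endpoint':", re.MULTILINE
)


def endpoint_names(cli_output):
    """Return the Endpoint object names found in `icinga2 object list` output."""
    return [m.group("name") for m in OBJECT_HEADER.finditer(cli_output)]


def list_endpoints():
    """Run the CLI on the local node and return the endpoint names it knows."""
    result = subprocess.run(
        ["icinga2", "object", "list", "--type", "Endpoint"],
        capture_output=True, text=True, check=True,
    )
    return endpoint_names(result.stdout)


def missing_endpoints(expected, known):
    """Names that should exist on this node but are absent from its config."""
    return sorted(set(expected) - set(known))

# Example, run on the master or the satellite:
# print(missing_endpoints(["dc01.domain.local"], list_endpoints()))
```

If the agent's name shows up as missing on the master until a restart, that would point towards the config sync rather than the network connection.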

Hi,

please share additional details on your setup such as

  • involved versions of all nodes
  • zones.conf of the involved problematic nodes - master, the satellite, agents
  • monitoring health in general, the stats shown via /v1/status on the REST API
  • Logs, ideally debug logs, which should show the specific problem for these routes:
    • agent signing requests
    • check execution and check result receipt
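For the /v1/status item, a rough Python sketch of how the stats could be fetched from the REST API is shown here. It assumes the API listens on the default port 5665, that an ApiUser with basic auth credentials exists (the user name and password in the example are placeholders), and that the CA certificate sits at /var/lib/icinga2/certs/ca.crt. Adjust all of these to your installation.

```python
import base64
import json
import ssl
import urllib.request


def build_status_request(host, user, password, port=5665):
    """Build a GET request for /v1/status with HTTP basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"https://{host}:{port}/v1/status",
        headers={
            "Accept": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


def fetch_status(host, user, password, ca_file):
    """Fetch and decode the status document, verifying TLS against ca_file."""
    context = ssl.create_default_context(cafile=ca_file)
    request = build_status_request(host, user, password)
    with urllib.request.urlopen(request, context=context) as response:
        return json.load(response)

# Example with placeholder credentials (assumptions, not values from this thread):
# status = fetch_status("sat01.domain.local", "root", "secret",
#                       "/var/lib/icinga2/certs/ca.crt")
# print(json.dumps(status, indent=2))
```

The host name passed in has to match the name in the node's certificate, otherwise TLS verification will reject the connection.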

Cheers,
Michael

All of a sudden it started working partially (waited for 3+ hours though…). Right now the host is at least reachable and I’m getting an ICMP response, but the services are not responding.

Icinga is telling me: Remote Icinga instance ‘dc01.domain.local’ is not connected to ‘sat01.domain.local’ for the other services. It is connected, but domain.local is already in use, and I can’t use the same object name twice across different satellites because the Director complains that duplicates are not allowed.

I’ve therefore referenced it as domain2.local. To do that, I added an extra DNS zone called domain2.local, copied over the DNS entries so that everything also has a domain2.local name, and added domain2.local to the search domains on my satellite.

Resolution is successful, but the status is still Unknown.

When I’m running tail -f /var/log/icinga2/icinga2.log, it shows the following entry repeatedly:

[2019-08-07 10:05:34 +0200] warning/ApiListener: Unexpected certificate common name while connecting to endpoint ‘dc01.domain2.local’: got ‘dc01.domain.local’
Context:
(0) Handling new API client connection
[2019-08-07 10:08:14 +0200] information/ApiListener: Finished reconnecting to endpoint ‘as03.domain2.local’ via host ‘as03.domain.local’ and port ‘5665’
[2019-08-07 10:08:14 +0200] information/ApiListener: Finished reconnecting to endpoint ‘dc01.domain2.local’ via host ‘dc01.domain.local’ and port ‘5665’
[2019-08-07 10:08:17 +0200] warning/JsonRpcConnection: API client disconnected for identity ‘dc01.domain.local’
[2019-08-07 10:08:19 +0200] warning/JsonRpcConnection: API client disconnected for identity ‘as03.domain.local’
[2019-08-07 10:08:21 +0200] information/ApiListener: New client connection for identity ‘dc01.domain.local’ from [10.0.14.1]:58924 (no Endpoint object found for identity)

Is there a switch in icinga2 which will allow me to ignore the common name mismatch (as it’s correct, but not correct) on the satellite for an Endpoint (Windows Agent)?

The client certificate needs to be regenerated. The certificate’s common name is stored and read on connect, which in turn means that you need to remove the old certificates and re-run the signing request with the setup wizard.
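To see which agents are affected before regenerating certificates, the warnings quoted earlier can be summarized with a small script. This is only a sketch: the pattern is derived from the log lines posted in this thread, accepts both straight and typographic quotes, and the function name is made up for illustration.

```python
import re

# Pattern taken from the warning text posted in this thread.
MISMATCH = re.compile(
    r"Unexpected certificate common name while connecting to endpoint "
    r"['‘](?P<endpoint>[^'’]+)['’]: got ['‘](?P<cn>[^'’]+)['’]"
)


def cn_mismatches(log_lines):
    """Return the set of (endpoint name, certificate CN) pairs seen in warnings."""
    pairs = set()
    for line in log_lines:
        match = MISMATCH.search(line)
        if match:
            pairs.add((match.group("endpoint"), match.group("cn")))
    return pairs

# Example, using the log path mentioned above:
# with open("/var/log/icinga2/icinga2.log", encoding="utf-8") as log:
#     for endpoint, cn in sorted(cn_mismatches(log)):
#         print(f"endpoint {endpoint} presents a certificate for {cn}")
```

Each pair it reports is an endpoint whose configured name and certificate common name disagree.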

Michael,

The problem is that with 2 domains (domain.local and domain2.local), pinging the address sometimes resolves to dc01.domain.local and sometimes to dc01.domain2.local. Will that be a problem for Icinga, or does it just compare the hostname (FQDN) to the host (FQDN) it connects to?