I’ve been adding some notifications recently and wanted to check if they all work. For that I turned off Icinga on one of my agents but my Web-Frontend continues to show that it is active and okay.
I checked the service object in case it’s accidentally monitoring something different. But the “hostname” was the right one. So now I’m questioning what the Icinga Service even checks.
This would be the log whenever I check:
[2022-02-02 08:18:03 +0100] information/ApiListener: New client connection from [::ffff:127.0.0.1]:37620 (no client certificate)
[2022-02-02 08:18:03 +0100] information/HttpServerConnection: Request: POST /v1/actions/reschedule-check (from [::ffff:127.0.0.1]:37620), user: icingaweb2, agent: , status: OK).
[2022-02-02 08:18:03 +0100] information/HttpServerConnection: HTTP client disconnected (from [::ffff:127.0.0.1]:37620)
The agent is directly underneath my master, without a satellite, and is within the master's zone. I only have one master. I'm working on a Debian system.
Ok, multiple things to explain here.
First, what the CheckCommand icinga checks: it queries the local Icinga 2 service internally to gather metrics, issues a warning if the last reload failed, and can optionally check for a minimum version. On the master and satellites I recommend it for the metrics. It is also great for spotting an inconsistent configuration that is valid on the master but not on a satellite (or sometimes an agent), instead of searching for why something is not running as expected. The minimum version check is great for keeping agents from falling behind.
Second, if the agent is down and the check is still OK, look at the check source: the CheckCommand icinga has to run on the agent, and a check executed on the agent would go UNKNOWN if the connection is not established.
Third, use the CheckCommand cluster-zone in addition to check agent connectivity, and optionally add a dependency on it for all agent checks. This shows more clearly when the agent is down on a host that is still running.
And last but not least, what you see in the log is a connection to the API from you rescheduling the check (probably in the web interface). The actual check execution will only be shown in the debug log if it does not fail.
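To make the second and third points concrete, here is a minimal sketch of how this could look in the Icinga 2 configuration. It assumes that every agent host sets a custom variable vars.agent_endpoint to the name of its Endpoint object, and that the agent's zone has the same name as the host object. The object names "icinga", "agent-health" and "agent-health-check" and the version number are only examples, so adjust them to your setup. Note that host_name only attaches a service to a host object; it is command_endpoint that decides where the check is actually executed.

```
// Assumption: agent hosts set vars.agent_endpoint to their Endpoint object name.
apply Service "icinga" {
  check_command = "icinga"
  // Execute the check on the agent instead of the master.
  // The check source in the web interface should then show the agent.
  command_endpoint = host.vars.agent_endpoint
  // Optional minimum version check; the value here is just an example.
  vars.icinga_min_version = "2.13.0"
  assign where host.vars.agent_endpoint
}

// Runs on the master and reports whether the agent's zone is connected.
apply Service "agent-health" {
  check_command = "cluster-zone"
  // Assumption: the agent's zone name equals the host object name.
  vars.cluster_zone = host.name
  assign where host.vars.agent_endpoint
}

// Makes all other services on an agent host depend on the connectivity check.
apply Dependency "agent-health-check" to Service {
  parent_service_name = "agent-health"
  // The dependency fails as soon as the parent is no longer OK.
  states = [ OK ]
  // Suppress notifications from the child services while the agent is unreachable.
  disable_notifications = true
  assign where host.vars.agent_endpoint
  // Avoid the parent service depending on itself.
  ignore where service.name == "agent-health"
}
```

With something like this in place, stopping Icinga on the agent should turn "agent-health" non-OK on the master, and the dependency keeps the other agent checks from notifying on top of it.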
Thank you very much for the explanation!
The Check Source is set to the master on all of the agents' checks. I thought that would be fine since the service object has the hostname set to the agent. But I’ll figure out how to set the Check Source to the agent!
Added the cluster-zone and all of them are showing either critical or unknown, so I think we have discovered the issue.
Thank you very much. I’m pretty new to Icinga.