I’m looking for some guidance on how to properly monitor up/down on multiple hosts behind the same firewall (and therefore all clients have same external IP).
I have set up a distributed solution: my master node lives in the cloud and has a public IP. I have no issues setting up monitoring of other agents in the cloud.
I have multiple physical and virtual servers and switches in my HQ that I want to monitor as well. Unfortunately, none of these have publicly-routable IPs.
If all the HQ machines are NAT’d and the hostalive check_command tracks UP/DOWN by pinging the IP, I am really only monitoring whether the firewall is UP/DOWN, not each individual host.
Is there a way that I can monitor a host’s UP/DOWN status based on the active tcp connection instead of ping?
(I’d also mention that I am able to properly monitor SERVICES on each host behind the firewall, my only issue at present is properly monitoring UP/DOWN)
Hello, yup, what you need to do is change the related check command object tied to the host objects you want to check. You would basically have something like this.
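One possible shape for such a host object, as a minimal sketch: it assumes the `tcp` CheckCommand from the Icinga Template Library and its `tcp_port` custom variable. The host name and address are hypothetical placeholders, and the check only helps if port 5665 is actually reachable at that address from wherever the check runs.

```
// Hypothetical host; name and address are placeholders.
object Host "hq-server1" {
  // Replace the default ping-based "hostalive" with a TCP connect check.
  check_command = "tcp"
  address = "203.0.113.10"   // documentation-range IP, not a real host
  vars.tcp_port = 5665       // default port the Icinga 2 agent listens on
}
```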
Thank you for the help. It sounds like this is the right path (monitoring the server for an open listening socket on port 5665), but I’m clearly implementing it wrong. Can you make any further suggestions?
Additionally, I believe that the “check_command” under a Host definition is run from the monitoring master’s side, not on the agent. I’m specifically trying to determine whether I can use a check_command to tell if TCP 5665 is open on the agent end.
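One way to approach this from the master’s side is Icinga 2’s built-in `cluster-zone` check, which reports whether a given zone is currently connected. The sketch below assumes each agent has its own zone named after the host, and that the host object is checked by the master with no `command_endpoint` set; the host name is hypothetical.

```
object Host "hq-server1" {
  // Evaluated on the master: UP while the agent's zone is connected, DOWN otherwise.
  check_command = "cluster-zone"
  vars.cluster_zone = "hq-server1"  // assumed: agent zone name equals host name
}
```

Because the result comes from the master’s view of the connection rather than from the agent itself, it should still change state when the agent stops reporting.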
I know you’ve solved this already, but have you looked into running a satellite node at your HQ? This would make more sense.
e.g. Master <----- Satellite <----- Clients
If you put the satellite and clients in the same zone, the satellite node will be able to monitor the HQ clients up/down status with their local IP and you won’t have to worry about NAT.
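As a rough sketch of that layout, assuming hypothetical endpoint names and addresses, the zone hierarchy might be declared along these lines. File locations follow the usual `zones.conf` and `zones.d/` conventions. Note that in Icinga 2 an agent normally gets its own zone with the satellite zone as parent; the example below only shows the satellite pinging an HQ host directly.

```
// zones.conf as seen from the satellite (sketch); all names and addresses are placeholders
object Endpoint "master1" {
  host = "198.51.100.5"  // public IP of the cloud master, so the satellite connects outbound
}
object Zone "master" {
  endpoints = [ "master1" ]
}

object Endpoint "hq-satellite" { }
object Zone "hq" {
  endpoints = [ "hq-satellite" ]
  parent = "master"
}

// zones.d/hq/hosts.conf on the master: synced to and checked by the satellite inside the LAN
object Host "hq-server1" {
  check_command = "hostalive"
  address = "192.168.1.20"  // private IP, reachable only from behind the firewall
}
```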
We set up a satellite and zone for every site and it has worked very well for us (~50 zones).
Tbh I would also vote for a satellite behind the firewall that checks the servers there.
With your solution, server1 pings itself on tcp/5665. This will “always” be OK unless the Icinga Agent is not running, and you won’t notice that, since no check updates are sent when the agent is down.