Handling multiple agents behind same Firewall/Same NAT IP

I’m looking for some guidance on how to properly monitor up/down on multiple hosts behind the same firewall (and therefore all clients have same external IP).

I have set up a distributed solution: my master node lives in the cloud and has a public IP. I have no issues monitoring other agents in the cloud.

I have multiple physical and virtual servers and switches in my HQ that I want to monitor as well. Unfortunately, none of these have publicly-routable IPs.

If all the HQ machines are NAT’d and the hostalive check_command tracks UP/DOWN by pinging the IP, I am really only monitoring whether the firewall is UP/DOWN, not each individual host.

Is there a way that I can monitor a host’s UP/DOWN status based on the active tcp connection instead of ping?

(I’d also mention that I am able to properly monitor SERVICES on each host behind the firewall, my only issue at present is properly monitoring UP/DOWN)

Hello, yup. What you need to do is change the check command tied to the host objects you want to check. You would basically have something like this:

object Host "host" {
    check_command = "tcp"
    [...]
}

As for which check to choose, I’m not entirely sure what fits your need, but this one seems promising:
https://icinga.com/docs/icinga2/latest/doc/10-icinga-template-library/#tcp

Thank you for the help. It sounds like this is the right path (monitoring the server for an open listening port on 5665), but I’m clearly implementing it wrong. Can you make any further suggestions?

My host definition:

object Host "server1" {
        check_command = "tcp"
        vars.tcp_port = "5665"
        vars.client_endpoint = name
}

After reloading and restarting icinga2 service on the Master and server1, I see this in the dashboard:

TCP CRITICAL - Invalid hostname, address or socket:

but when I run the check directly on the machine, it shows success:

root@server1# /usr/lib64/nagios/plugins/check_tcp -p 5665
TCP OK - 0.000 second response time on port 5665|time=0.000346s;;;0.000000;10.000000

Any suggestions are greatly appreciated.

Additionally, I believe that the “check_command” under a Host definition is run from the monitoring master’s side, not on the agent. I’m specifically trying to determine whether I can use a check_command to tell if tcp 5665 is open on the agent end.

I got this to work. I had to set command_endpoint to the client endpoint so that the check runs on the agent itself:

object Host "server1" {
    check_command = "tcp"
    check_interval = 1m
    max_check_attempts = 2
    retry_interval = 30s
    vars.tcp_port = "5665"
    vars.address = "localhost"
    vars.client_endpoint = name
    command_endpoint = vars.client_endpoint
}

Great! Sorry I couldn’t follow up on the thread to help; I have been quite busy recently.

No worries, your nudge really helped! Thank you.

I know you’ve solved this already, but have you looked into running a satellite node at your HQ? This would make more sense.

e.g. Master <----- Satellite <----- Clients

If you put the satellite and clients in the same zone, the satellite node will be able to monitor the HQ clients up/down status with their local IP and you won’t have to worry about NAT.

We set up a satellite and zone for every site and it has worked very well for us (~50 zones).
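
As a rough sketch only, the zone layout could look something like the following. All names and the address here (master1, hq-satellite, the "hq" zone, 192.168.10.21) are hypothetical placeholders, and your real zones.conf will differ depending on which side opens the connection:

// zones.conf on the master (hypothetical names)
object Endpoint "master1" {
}

// The satellite at HQ is assumed to connect outbound to the master,
// so no "host" attribute is set for it here.
object Endpoint "hq-satellite" {
}

object Zone "master" {
    endpoints = [ "master1" ]
}

object Zone "hq" {
    endpoints = [ "hq-satellite" ]
    parent = "master"
}

// zones.d/hq/hosts.conf on the master: objects placed in the "hq" zone
// directory are synced to the satellite and checked from there.
object Host "server1" {
    check_command = "hostalive"
    address = "192.168.10.21" // hypothetical private IP, reachable from the satellite
}

Because the hostalive ping originates from the satellite inside the HQ network, it reaches each host on its private address and the NAT on the firewall no longer matters.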


Tbh I would also vote for a satellite behind the firewall that checks the servers there.
With your solution you have server1 ping itself on tcp/5665. This will “always” be OK unless the Icinga Agent is not running, and that case you won’t notice, since no check updates are sent while the Agent is down.

Another possibility, if all the hosts have agents installed, is using the cluster-zone check command:
https://icinga.com/docs/icinga2/latest/doc/10-icinga-template-library/#cluster-zone

With that you can check (from the master) if a specified zone is connected (each agent has its own zone/endpoint object).
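
A minimal sketch of that, assuming the agent’s zone is named the same as the host object (here "server1") and is a direct child of the master zone:

object Host "server1" {
    check_command = "cluster-zone"
    // Zone to test for connectivity; assumed to match the host name.
    vars.cluster_zone = name
    // No command_endpoint here, so the check runs on the master.
}

Since the check is executed on the master rather than on the agent, a stopped or unreachable agent shows up as a disconnected zone instead of just going silent.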
