Satellite zone offline but still UP in Icinga

toth.szabolcs · April 29, 2019, 10:30am

Hi everyone!

I have a problem with my master-satellite-client configuration, and I don't know why it happens.
I created everything in Icinga Director, and I have many satellite zones which are working great.

I get the check results correctly on my main (master) Icinga, but if my satellite host goes offline, whether unexpectedly or planned, my main Icinga shows everything on this satellite host and on the hosts in its satellite zone as OK. There is no DOWN or PROBLEM state.

In normal operation, Check Now also works correctly for hosts and services, but if the satellite is down, nothing happens.

I have both active and passive checks enabled in the host config file:

object Host "watch.vigado.local" {
  display_name = "[VIGADO][SERVER] - watch.vigado.local"
  address = "192.168.20.245"
  check_command = "hostalive"  // executed inside vigado-zone, i.e. by the satellite itself
  max_check_attempts = "3"
  check_period = "ALWAYS"
  check_interval = 1m
  retry_interval = 20s
  enable_notifications = true
  enable_active_checks = true
  enable_passive_checks = true
  enable_event_handler = false
  enable_perfdata = true
  volatile = false
  zone = "vigado-zone"  // the check is scheduled in the satellite zone, not on the master
  notes = "apt-get install nagios-nrpe-plugin nagios-plugins nagios-plugins-basic nagios-plugins-common\r\n\r\napt install nagios-nrpe-plugin nagios-plugins monitoring-plugins-basic monitoring-plugins-common monitoring-plugins-standard"
  icon_image = "debian.png"
  groups = [ "VIGADO" ]
  vars.AgentInstalled = "true"
  vars.alert = "24x7"
  vars.os = "linux"
  vars.web = false
}

So what did I do wrong? Why don't I get an UNKNOWN state like the one in my screenshot?

My servers are up to date, and icinga2, Icinga Web 2 and the Nagios plugins are up to date as well.

Thanks for any idea about what is wrong with my Icinga checks.

Best regards, Szabolcs

rsx · April 29, 2019, 10:47am

Your screenshot shows the wrong check source, which means hostalive will ping itself.

toth.szabolcs · April 29, 2019, 11:35am

Yes, because it's a satellite host.
My main master Icinga is in a public network (e.g. watch.publicdomain), and this satellite is behind a firewall in a local network (e.g. watch.vigado.local). The satellite connects to the master, but the master can't check the satellite with hostalive because they are not on the same network. In the local network, all the other servers have watch.vigado.local as their master or parent, so if you use zones, the check endpoint is the local master, if I'm not wrong.

So the check endpoint for all of my local servers is my watch.vigado.local host, which is a satellite of my master and the master for my local hosts.

I don't think it is a problem that the satellite checks its own hostalive, because the satellite pushes the result to the master. Again, the master can't ping the satellite, but why doesn't it show an UNKNOWN state if it can't get any result from the satellite zone?

dnsmichi · April 29, 2019, 1:26pm

For cluster health checks, you can use the cluster-zone check command, which checks the connection to the zone and turns red if it is lost. The default host and service checks won't be changed, since satellites cache ongoing check results in their replay log and synchronise them again on reconnect. Therefore the master cannot modify the check history; otherwise you would have two lines of history: one on the master for connection issues, and one from the satellite with the actual results.
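As an alternative to turning the satellite's host object into a cluster-zone check, the same health check can be attached as a separate service to a host the master already monitors. The sketch below assumes it is defined in the master zone's configuration, so that the master itself evaluates whether "vigado-zone" is connected; the service name "cluster-zone-vigado" and the host name "master.example.com" are hypothetical placeholders, not names from this thread. Only `cluster-zone` and its `vars.cluster_zone` custom variable come from the Icinga Template Library.

```
// Hypothetical sketch: a connection health check for the satellite zone,
// defined in the master zone so the master runs it.
object Service "cluster-zone-vigado" {
  host_name = "master.example.com"   // placeholder: a host object monitored by the master
  check_command = "cluster-zone"     // ITL command: is the target zone currently connected?
  check_interval = 30s
  retry_interval = 10s

  vars.cluster_zone = "vigado-zone"  // the satellite zone whose connection is checked
}
```

With this approach the satellite's own host and service history stays untouched, and connection loss shows up as a separate problem on the master.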

Cheers,
Michael

toth.szabolcs · April 29, 2019, 2:50pm

Wow! That was what I was looking for. It works perfectly! Thanks, I owe you a beer!

object Host "watch.vigado.local" {
  display_name = "[VIGADO][SERVER] - watch.vigado.local"
  address = "192.168.20.245"
  check_command = "cluster-zone"  // checks whether the zone in vars.cluster_zone is connected
  max_check_attempts = "3"
  check_period = "ALWAYS"
  check_interval = 1m
  retry_interval = 20s
  enable_notifications = true
  enable_active_checks = true
  enable_passive_checks = true
  enable_event_handler = false
  enable_perfdata = true
  volatile = false
  zone = "public-zone_master"  // evaluated on the master side, not by the satellite
  command_endpoint = "watch…"
  icon_image = "debian.png"
  groups = [ "VIGADO" ]
  vars.cluster_zone = "vigado-zone"  // the satellite zone to monitor
  vars.os = "linux"
  vars.web = false
}

dnsmichi · April 30, 2019, 6:43am

Perfect, glad you've figured it out. Please pick a reply and mark it as the solution: this helps others see immediately how questions have been solved, and it grants users a badge for being very helpful too.

More here: Mark a topic as solved