3 Tier distributed setup - cross zone service checks

I’m looking at an inherited system running Icinga 2 2.11.3/2.11.4 that I’m trying to understand
and possibly fix, and I’d like clarification on whether something like the following would be expected to work at all.

  • As far as I can tell, the whole thing is a 3-tier distributed master/satellite/agent setup as described in the Icinga 2 documentation.
  • No Director
  • Master with multiple satellite zones (one per global region)
    → EMEA
    → NAFTA
    → APAC
  • Hosts usually only reachable from their respective satellite
  • A backup server in zone EMEA with a local Icinga 2 agent installation
  • Hundreds of Host objects in all the satellite zones
  • A service check “Backup-Status” assigned to all applicable hosts
  • The service check object has its zone property set to “EMEA” and command_endpoint property to the name of the endpoint object of the Backup Server
  • Currently, checks outside of the zone of the Backup Server (EMEA) never get scheduled (no check source).
  • In the past this seems to have once worked, as on older hosts there are stale check results with plausible result (months old)
  • On newly added hosts the service check stays in “PENDING”
  • “Check Now” only triggers a forced service check log entry in the debug.log on master
  • Generally, {hostname} does not and cannot run an Icinga 2 agent (they are mostly firewall or router appliances)
  • The host, endpoint and zone objects for the master, the satellites and the backup server are present on all master/satellite/agent nodes
  • “icinga2 object list --name '{hostname}!Backup-Status'” yields the Service object on the master and
    the satellite of the zone of {hostname}, but nowhere else. They differ only in the value of “package” (“_etc” on the master, “_cluster” on the satellite of the zone of {hostname})
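
From the bullets above, my guess at the shape of the inherited service definition (before seeing the real config, which is pending approval) is roughly:

```
// Sketch of the inherited setup as described above -- all names are placeholders,
// and the assign condition is an assumption.
// The zone attribute pins the service to the EMEA satellite zone, which would
// explain why hosts in the NAFTA/APAC zones never get the check scheduled.
apply Service "Backup-Status" {
  import "generic-service"
  zone = "EMEA"
  command_endpoint = "backup-server.doma.in"
  assign where host.vars.do_backup_check
}
```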

I’ll probably be able to provide anonymized command output and config files (pending approval).
But these would have to have all hostnames, IP addresses and other identifying information replaced by unique placeholders (e.g. master.doma.in, satellite-emea-1.doma.in, satellite-emea-2.doma.in, satellite-nafta.doma.in, backup-server.doma.in).

Hi & welcome,

With that:

the service will only be available for hosts on the EMEA satellite. You need to put the service object in a global zone instead.

Ok, if I understand this right, a zone with the property “global = true”, right?
And since the backup server is running an icinga agent, its endpoint object would need that global zone as parent, right?

If I configure both of the above, config validation barfs on the master with:
critical/config: Error: Zone 'backupserver.doma.in' can not have a global zone as parent.

If I only put the service check object in the shared zone, config validation complains about the command_endpoint being in the wrong zone:
critical/config: Error: Validation failed for object '{hostname}!Backup Status' of type 'Service'; Attribute 'command_endpoint': Command endpoint must be in zone 'shared-test' or in a direct child zone thereof.

object Zone "shared-test" {
  global = true
}

object Zone "backupserver.doma.in" {
  endpoints = [ "backupserver.doma.in" ]
  parent = "shared-test"
}

object Endpoint "backupserver.doma.in" {
  host = "backupserver.doma.in"
}

apply Service "Backup Status" {
  import "generic-service"
  zone = "shared-test"
  command_endpoint = "backupserver.doma.in"
  assign where host.vars.do_backup_check
}

A global zone can exist on every Icinga 2 instance (master, satellite and/or agent) by simply adding its definition to the zones.conf of all those instances. No parent is needed. There is usually one global zone, global-templates, defined by default.
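
For illustration (using the zone name from this thread), the same definition would go into zones.conf on every instance:

```
// zones.conf on master, satellites and agents alike -- identical everywhere.
// A global zone has no endpoints and no parent attribute; config placed in its
// zone directory is synced to every instance that declares the zone.
object Zone "shared-test" {
  global = true
}
```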

It looks like you are already using it (perhaps by accident) by storing backupstatus.conf in its directory. That is fine, but you override the service’s zone with

zone = “shared-test”

So, please remove this.

BTW: command_endpoint doesn’t need to be set to a specific endpoint name; best practice is to use

command_endpoint = host.name
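
As an illustration of that general pattern (check command and assign condition are made-up examples, and note it only works when the monitored hosts actually run an agent, which per the question is not the case for the appliances here):

```
// General best-practice pattern: each host executes its own check via its agent.
// Requires an Endpoint object whose name matches host.name -- not applicable to
// agentless appliances like the routers/firewalls in this thread.
apply Service "disk" {
  import "generic-service"
  check_command = "disk"
  command_endpoint = host.name
  assign where host.vars.agent_endpoint
}
```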

Zone and Endpoint objects are not allowed to be defined within a zone (directory). The zone concept is pretty hard to understand at the beginning, but it’s crucial, and you should get familiar with it, e.g. by fully reading Distributed Monitoring.