Master-Master Cluster with Satellite not working

hi

I am running an Icinga 2 master-master cluster and am now trying to add a satellite to it.
On the 1st master node the satellite folder exists and the satellite is listed in Icinga Web 2.
But most of the time the 2nd master is the one writing to the IDO, and then the hosts defined on the satellite are not visible in Icinga Web 2.

Describe the bug

The configuration in /var/lib/icinga2/api/zones/ is missing for the satellite.
Hosts that are assigned to the satellite are missing in Icinga Web 2.

To Reproduce

  1. Edited zones.conf on the master via SSH and added the new satellite as Endpoint and Zone.
  2. Did the same on the satellite: edited zones.conf and added the masters.
  3. In Director, used the kickstart wizard to import the endpoint and zone.
  4. Created a new host with the same name as the satellite; the cluster zone is the satellite zone and the Icinga 2 agent is enabled.
  5. Deployed, but the new host is not visible in Icinga Web 2.
  6. The same happens with additional hosts. When I change the cluster zone to master, the hosts are visible, but then the checks run on the master and not on the satellite.

Environment

Include as many relevant details about the environment you experienced the problem in

  • Icinga2 r2.11.0-1
  • Debian 10
  • Enabled features: api checker ido-mysql mainlog notification
  • Icinga Web 2 2.6.2 with director and monitoring module
  • Config validation (information/cli: Finished validating the configuration file(s).)

master config:

object Endpoint "master1" {
  host = "192.168.0.1"
}
object Endpoint "master2" {
  host = "192.168.0.2"
}
object Zone "master" {
  endpoints = [ "master1", "master2" ]
}

object Zone "global-templates" {
  global = true
}

object Zone "director-global" {
  global = true
}

object Endpoint "sat1" {
}
object Zone "sat1" {
  endpoints = [ "sat1" ]
  parent = "master"
}

endpoint config:

object Endpoint "sat1" {
}
object Zone "sat1" {
  endpoints = [ "sat1" ]
  parent = "master"
}
object Endpoint "master1" {
  host = "master1"
}
object Endpoint "master2" {
  host = "master2"
}
object Zone "master" {
  endpoints = [ "master1", "master2" ]
}

object Zone "global-templates" {
  global = true
}

object Zone "director-global" {
  global = true
}

Hello,
Please run icinga2 daemon -C on all masters and satellites to make sure there are no errors. Furthermore, is the zone configuration the same on both masters? Also, can the satellite resolve the hosts master1 and master2 via DNS or /etc/hosts?
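
A minimal sketch of the name-resolution part of that check, assuming a Linux node where getent is available. The function names check_resolves and report_resolution are made up for illustration and are not part of Icinga 2.

```shell
# Succeed if the given name resolves via DNS or /etc/hosts
# (getent consults the same sources as the system resolver).
check_resolves() {
  getent hosts "$1" >/dev/null
}

# Print one status line per name passed as an argument.
report_resolution() {
  for name in "$@"; do
    if check_resolves "$name"; then
      echo "$name: resolves"
    else
      echo "$name: does NOT resolve"
    fi
  done
}

# Usage on the satellite (endpoint names taken from the zones.conf above):
#   report_resolution master1 master2
# Config validation still has to be run separately on every node:
#   icinga2 daemon -C
```

A name that does not resolve on the satellite would explain connection problems towards that master, since the Endpoint objects there use host = "master1" and host = "master2".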

Regards,
Carsten

Hi Carsten

Thanks for the feedback.
The configuration on both masters and the satellite looks good (green).
Both masters are reachable from the satellite.
ping and telnet work fine.

I also tried removing all files in /var/lib/icinga2/api/zones/ on both masters, restarting icinga2 and deploying again. After that, sat1 is still missing on master2.

Can you check the log on master2? Maybe there is a sync problem.
You can also run icinga2 object list --type host --name SATELLITE-HOSTNAME on master2 to check whether it received the configuration.

Did you adjust the api config on the endpoint and allow commands and config?
Example:

object ApiListener "api" {
  accept_config = true
  accept_commands = true
}

Is your master2 server working as expected? I.e. load-balancing checks or taking over when master1 fails…

Even if there are 2 masters, the 2nd master has to be set up like a satellite to get configs from master1.
Docs: HA Master

icinga2 object list --type host --name sat1
This returns nothing

accept_config = true
accept_commands = true
is configured.

Failover works; master2 is the one writing to the IDO most of the time.
When I stop icinga2 on master2, it fails over to master1.

What does icinga2 object list --type endpoint show?

on all nodes:

Object ‘sat1’ of type ‘Endpoint’:
% declared in ‘/etc/icinga2/zones.conf’, lines 25:1-25:33

  • __name = “sat1”
  • host = “”
  • log_duration = 86400
  • name = “sat1”
  • package = “_etc”
  • port = “5665”
  • source_location
    • first_column = 1
    • first_line = 25
    • last_column = 33
    • last_line = 25
    • path = “/etc/icinga2/zones.conf”
  • templates = [ “sat1” ]
    % = modified in ‘/etc/icinga2/zones.conf’, lines 25:1-25:33
  • type = “Endpoint”
  • zone = “”

Object ‘master2’ of type ‘Endpoint’:
% declared in ‘/etc/icinga2/zones.conf’, lines 9:1-9:33

  • __name = “master2”
  • host = “”
  • log_duration = 86400
  • name = “master2”
  • package = “_etc”
  • port = “5665”
  • source_location
    • first_column = 1
    • first_line = 9
    • last_column = 33
    • last_line = 9
    • path = “/etc/icinga2/zones.conf”
  • templates = [ “master2” ]
    % = modified in ‘/etc/icinga2/zones.conf’, lines 9:1-9:33
  • type = “Endpoint”
  • zone = “”

Object ‘master1’ of type ‘Endpoint’:
% declared in ‘/etc/icinga2/zones.conf’, lines 6:1-6:33

  • __name = “master1”
  • host = “master1”
    % = modified in ‘/etc/icinga2/zones.conf’, lines 7:2-7:25
  • log_duration = 86400
  • name = “master1”
  • package = “_etc”
  • port = “5665”
  • source_location
    • first_column = 1
    • first_line = 6
    • last_column = 33
    • last_line = 6
    • path = “/etc/icinga2/zones.conf”
  • templates = [ “master1” ]
    % = modified in ‘/etc/icinga2/zones.conf’, lines 6:1-6:33
  • type = “Endpoint”
  • zone = “”

Can you please share the full rendered configuration from the deployments tab? Obfuscate host names as needed.

Cheers,
Michael

Can you add the “host” line in your endpoint definition? Currently on masters and on the sat1 endpoint your configuration shows this:

object Endpoint "sat1" {
}
object Zone "sat1" {
  endpoints = [ "sat1" ]
  parent = "master"
}

Change it to

object Endpoint "sat1" {
  host = "sat1"
}
object Zone "sat1" {
  endpoints = [ "sat1" ]
  parent = "master"
}

(similar to the master endpoints)

Then restart all endpoints (masters and sat1) and try again.
I’ve had cases where this was needed for the communication to work correctly.

14 files rendered in 0.17s
Do you need all 14 files from the rendered config?

I verified all nodes and added:
host = "sat1"
I also restarted icinga2 on all of them, but the folder for sat1 is still missing:

ls /var/lib/icinga2/api/zones/

director-global master

What happens if you manually create an example host inside the zone?
Make sure /etc/icinga2/zones.d/sat1 exists and create a config file in it, e.g. /etc/icinga2/zones.d/sat1/host1.conf. Just add a basic host object which should be checked from the satellite sat1.
Then reload Icinga2. Does this host show up, does the sat1 zone show up in the path?
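
A minimal sketch of that manual test, assuming a host named host1 with the placeholder address 192.0.2.10 (both are illustrative, as is the helper name render_test_host). hostalive is the ping-based check command shipped with the Icinga Template Library.

```shell
# Print a basic Host object; the name and address are passed as arguments.
render_test_host() {
  cat <<EOF
object Host "$1" {
  check_command = "hostalive"
  address = "$2"
}
EOF
}

# Usage on the config master (paths from the post above):
#   mkdir -p /etc/icinga2/zones.d/sat1
#   render_test_host host1 192.0.2.10 > /etc/icinga2/zones.d/sat1/host1.conf
#   icinga2 daemon -C && systemctl reload icinga2
#   ls /var/lib/icinga2/api/zones/    # does a sat1 directory appear now?
```

Because the file lives below zones.d/sat1, the host belongs to the sat1 zone without setting a zone attribute explicitly.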

I already have 4 hosts on sat1 in /var/lib/icinga2/api/zones/sat1/director/hosts.conf.
Performance data for these hosts is also submitted, working and visible.
I excluded the zones.d folder from icinga.cfg.

OK now I’m confused. You want to use zones but you disabled the zones.d folder?
From the docs (https://icinga.com/docs/icinga2/latest/doc/06-distributed-monitoring/):

Navigate to /etc/icinga2/zones.d on your master node icinga2-master1.localdomain and create a new directory with the same name as your satellite/agent zone name

I use Icinga Director, which creates the config via the API.

I’d leave zones.d in the config nonetheless. I don’t know if Icinga 2 really likes if you remove it, even when it’s empty.

To avoid further confusion @twidhalm @Napsty, here are some insights:

  • Managing checkable object files and having them automatically synced throughout the cluster goes by the term “zones.d”.
  • That can be static config in /etc/icinga2/zones.d, which is automatically detected by Icinga. Never include that manually in icinga2.conf. That will create a duplicate inclusion.
  • Or the REST API config packages allow you to define “zones.d” as the target, which works the same way: zone directories with file content are registered for the cluster config sync.

The portions I don’t understand from the problem analysis:

  • /var/lib/icinga2/api/zones/sat1/director/ is Icinga’s internal cache directory, and follows the convention with api / zones / zonename / packagename. If files exist in there, there is no direct guarantee that the core actually has read them.
  • Always verify configuration objects with object list (requires a reload) or better, via the REST API. Create a local ApiUser object on the involved nodes, and query the REST API directly for the runtime state of the host/service.
  • If the host/service doesn’t exist via REST API on the satellite, dig deeper on this side.
  • If the host/service exists, analyse whether it will be checked on the satellite
  • If the satellite does it all fine, check whether the connection to the parent masters works fine
  • Connect to the API event stream on the master(s) and filter for CheckResults coming in for this specific host/service.
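
A minimal sketch of those API checks, assuming a throwaway ApiUser named troubleshoot with the password change-me and the default API port 5665. The credentials and all helper function names are hypothetical; adjust node and host names to your setup.

```shell
# Hypothetical credentials; they must match the ApiUser printed below.
API_USER="troubleshoot"
API_PASS="change-me"

# Print an ApiUser limited to host/service queries and CheckResult events.
render_api_user() {
  cat <<EOF
object ApiUser "$API_USER" {
  password = "$API_PASS"
  permissions = [ "objects/query/Host", "objects/query/Service", "events/CheckResult" ]
}
EOF
}

# Build the object query URL: host_url <node> <hostname>
host_url() {
  printf 'https://%s:5665/v1/objects/hosts/%s\n' "$1" "$2"
}

# Ask a node whether the host exists at runtime: query_host <node> <hostname>
# -k skips certificate verification, acceptable only for a quick debug session.
query_host() {
  curl -k -s -S -u "$API_USER:$API_PASS" \
    -H 'Accept: application/json' "$(host_url "$1" "$2")"
}

# Follow incoming check results for one host: stream_check_results <node> <hostname>
stream_check_results() {
  curl -k -s -S -u "$API_USER:$API_PASS" \
    -H 'Accept: application/json' \
    -X POST "https://$1:5665/v1/events" \
    -d "{ \"queue\": \"debug-$2\", \"types\": [ \"CheckResult\" ], \"filter\": \"event.host==\\\"$2\\\"\" }"
}

# Usage:
#   render_api_user          # save into a file the node's icinga2.conf includes, then reload
#   query_host sat1 host1    # does the object exist on the satellite?
#   stream_check_results master2 host1   # do check results arrive on the master?
```

If query_host returns a 404 on the satellite, the object never reached the core there, regardless of which files sit in /var/lib/icinga2/api/zones/.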

All of the above is available in the troubleshooting docs as well.

On a related note - since an HA-enabled zone is in play here, an upgrade to 2.11.2 might influence or already solve the problem (if my guess is correct).

Cheers,
Michael

hi michael

thanks for feedback.
I updated to Icinga 2.11.2 now, but it is still the same issue.
The endpoints are listed with icinga2 object list --type endpoint.

And how about the other bullet points from my reply?