Subsequently add second master node for HA-setup via Director

Hi Icinga-Community,

we are currently trying to add a second master node for high availability to our running production Icinga setup. Unfortunately, we have encountered some difficulties with the Icinga configuration on the new, second master node.

We started with a fresh machine, installed Icinga and ran the Node Wizard (setting up the second master node as a satellite). For the following steps we used the official documentation: Distributed Monitoring - Icinga 2
Each of our master nodes has its own database backend.
By modifying the zones.conf under “/etc/icinga2/” we were able to get both nodes running. A look at the network traffic (via netstat) showed that the satellites were already communicating with the newly added master node. We have also successfully added the new node as a host via Director in our icingaweb2 frontend. But no new check results could be found in the database of the second master node.

So we looked further into the documentation and found the following:

And from this point it gets more and more messy :confused:
We did this initial sync, but afterwards we could not restart the second master node successfully because of a re-definition error.

E.g.:

critical/config: Error: Object ‘secondmaster.xyz.net’ of type ‘Endpoint’ re-defined: in /var/lib/icinga2/api/packages/director/5484465a-5088-4025-971b-72eeb3ca059c/zones.d/master/agent_endpoints.conf: 1:0-1:39; previous definition: in /etc/icinga2/zones.conf

If we comment out the corresponding object in /etc/icinga2/zones.conf, we get error messages about missing objects.

E.g.:

[2021-05-12 15:19:09 +0200] information/cli: Icinga application loader (version: r2.12.3-1)
[2021-05-12 15:19:09 +0200] information/cli: Loading configuration file(s).
[2021-05-12 15:19:09 +0200] information/ConfigItem: Committing config item(s).
[2021-05-12 15:19:09 +0200] information/ApiListener: My API identity: secondmaster.xyz.net
[2021-05-12 15:19:09 +0200] critical/config: Error: Validation failed for object ‘node-446.xyz.net!win_disk!Teams_service’ of type ‘Notification’; Attribute ‘command’: Object ‘mail-service-notification’ of type ‘NotificationCommand’ does not exist.
Location: in /var/lib/icinga2/api/packages/director/5484465a-5088-4025-971b-72eeb3ca059c/zones.d/master/notification_templates.conf: 2:5-2:41
/var/lib/icinga2/api/packages/director/5484465a-5088-4025-971b-72eeb3ca059c/zones.d/master/notification_templates.conf(1): template Notification “Teams_serviceNotification” {
/var/lib/icinga2/api/packages/director/5484465a-5088-4025-971b-72eeb3ca059c/zones.d/master/notification_templates.conf(2): command = “mail-service-notification”
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
/var/lib/icinga2/api/packages/director/5484465a-5088-4025-971b-72eeb3ca059c/zones.d/master/notification_templates.conf(3): users = [ “team” ]
/var/lib/icinga2/api/packages/director/5484465a-5088-4025-971b-72eeb3ca059c/zones.d/master/notification_templates.conf(4): }
…

This error log goes on endlessly and seems to list every service on every host we have.

Note: For stability reasons we configured our main config master through the local conf files under /etc/icinga2/ (only check execution and the master zone) and not with the Director. Notifications etc. were defined entirely via the Director.

Now we are unsure whether and how to do this initial sync. We searched the Icinga community, GitHub and the web and couldn’t find a clear answer.

Thank you for your help :slight_smile:

We’ve run into the same issue trying to introduce a second master. It’s like a chicken and the egg type problem – if you do not define the new master in newmaster:/etc/icinga2/zones.conf as an endpoint belonging to a zone, it will not start. But if you do that, you get an error stating that it’s a duplicate object when the new master tries to sync everything over.
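
For illustration, the local definition in question would look roughly like the sketch below. This is only an assumption about a typical two-master zones.conf: secondmaster.xyz.net is taken from the error message above, while firstmaster.xyz.net is a placeholder name. If the Director also deploys an Endpoint or Zone with the same name, the synced config collides with this file and produces the re-defined error.

```
// /etc/icinga2/zones.conf on the new master (sketch, names are placeholders)

object Endpoint "firstmaster.xyz.net" {
  host = "firstmaster.xyz.net"   // hypothetical name of the existing master
}

object Endpoint "secondmaster.xyz.net" {
  host = "secondmaster.xyz.net"  // the new master, as named in the error above
}

// Both masters are members of the same zone for HA
object Zone "master" {
  endpoints = [ "firstmaster.xyz.net", "secondmaster.xyz.net" ]
}
```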

Looks like I’ve run into this issue before:

I did some unscheduled digging into our production environment over the weekend while traffic was down – it seems like pushing out the initial master node and/or the master zones with director would cause these issues when introducing a 2nd master.

I am attempting to schedule a maintenance window with my internal customer for tomorrow night to reconfigure the master zone and nodes to NOT be pushed with the director, and then attempt to reintroduce the 2nd master.

Assuming this thread gets bumped, does anyone have any input?

It is difficult for me to explain all the steps because we used paid consulting, but as a result, no zones are defined in the director anymore.
Also, all masters and satellites are defined externally. Only nodes are defined in the director.

As Konrad mentioned, the problem was solved by removing all zone definitions from the director. Therefore, this issue is no longer relevant for us.

I figured as much – based on the other post where I had a similar problem I am going to remove at least the master zone (probably all zones) from Director… It can’t duplicate a zone and/or endpoint it doesn’t know about.

Things worked out great for me – I still deploy my 3 “agent zones” via director, but I no longer push out the master or any of the master endpoints with the director.

Once I resolved this, the 2nd master started accepting configs, but now I’m fighting a different issue.

If someone stumbles upon this, the “fix” is to:

  • Add the master endpoints and master zone to ALL agent /etc/icinga2/zones.conf files
  • Ensure that the /etc/icinga2/zones.conf file on each master has the master endpoints and zone configured.
  • Either remove the master endpoints and master zone from the director, or better yet drop into the director database and “disable” the endpoints and master zone (easier to roll back if it fails). You can go a step further in the db and also make them “external_objects” – the tables are icinga_zone and icinga_endpoint – you may need to trigger a new config by simply adding or changing a custom variable for a host/service
  • Restart icinga2 on your agents
  • Things should be going.

The steps above can be changed from the master zone to apply to any zone you might be having trouble with. If I recall, you also never want to deploy the “director-global” zone via the director.
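
As a rough sketch of the zones.conf part of the fix, an agent's file might look like the following. All hostnames except secondmaster.xyz.net (from the first post) are placeholders, and the global zones are listed only on the assumption that your setup uses them; adjust to your environment.

```
// /etc/icinga2/zones.conf on an agent (sketch, names are placeholders)

// Master endpoints and master zone, defined locally instead of via the Director
object Endpoint "firstmaster.xyz.net" {
  host = "firstmaster.xyz.net"
}

object Endpoint "secondmaster.xyz.net" {
  host = "secondmaster.xyz.net"
}

object Zone "master" {
  endpoints = [ "firstmaster.xyz.net", "secondmaster.xyz.net" ]
}

// The agent itself and its own zone, attached to the master zone
object Endpoint "agent1.xyz.net" {
}

object Zone "agent1.xyz.net" {
  endpoints = [ "agent1.xyz.net" ]
  parent = "master"
}

// Global zones, also kept in local files rather than deployed by the Director
object Zone "global-templates" {
  global = true
}

object Zone "director-global" {
  global = true
}
```

Running `icinga2 daemon -C` before restarting validates the config and should surface any remaining re-definition errors.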