Second master ignores zones config (Invalid endpoint origin (client not allowed) // Ignoring config update from endpoint)

Hello,

we have just encountered a strange problem.

In our dev setup, consisting of two masters and ~5 satellite zones, the second master is discarding all zones except the master and global zones. As a result, the configuration validation fails for all objects outside those zones.

The normal icinga2.log shows the following messages on reload/restart:

warning/ApiListener: Ignoring config update from endpoint 'msd-ic-ma01' for unknown zone 'msd-azure'.
warning/ApiListener: Ignoring config update from endpoint 'msd-ic-ma01' for unknown zone 'msd-private_vcloud'.
warning/ApiListener: Ignoring config update from endpoint 'msd-ic-ma01' for unknown zone 'mvd-azure'.
warning/ApiListener: Ignoring config update from endpoint 'msd-ic-ma01' for unknown zone 'por-azure'.
warning/ApiListener: Ignoring config update from endpoint 'msd-ic-ma01' for unknown zone 'q1abc-private_vcloud'.
warning/ApiListener: Ignoring config update from endpoint 'msd-ic-ma01' for unknown zone 'q1ale-private_vcloud'.
warning/ApiListener: Ignoring config update from endpoint 'msd-ic-ma01' for unknown zone 'q1au1-azure'.
warning/ApiListener: Ignoring config update from endpoint 'msd-ic-ma01' for unknown zone 'q1au2-azure'.
warning/ApiListener: Ignoring config update from endpoint 'msd-ic-ma01' for unknown zone 'q1au3-private_vcloud'.
warning/ApiListener: Ignoring config update from endpoint 'msd-ic-ma01' for unknown zone 'q1def-azure'.
warning/ApiListener: Ignoring config update from endpoint 'msd-ic-ma01' for unknown zone 'q1dmd-private_vcloud'.
warning/ApiListener: Ignoring config update from endpoint 'msd-ic-ma01' for unknown zone 'q1ini-azure'.
warning/ApiListener: Ignoring config update from endpoint 'msd-ic-ma01' for unknown zone 'q1jkl-azure'.
warning/ApiListener: Ignoring config update from endpoint 'msd-ic-ma01' for unknown zone 'q1mno-azure'.
warning/ApiListener: Ignoring config update from endpoint 'msd-ic-ma01' for unknown zone 'q1poc-azure'.
warning/ApiListener: Ignoring config update from endpoint 'msd-ic-ma01' for unknown zone 'q1poc-private_vcloud'.
warning/ApiListener: Ignoring config update from endpoint 'msd-ic-ma01' for unknown zone 'q1xyz-azure'.

After enabling the debug log, we saw the following messages for the first time ever
(filtered with the command below):

# grep "client not allowed" /var/log/icinga2/debug.log | awk -v FS=Discarding '{print $2}' | sort -n | uniq -c
   2800  'check result' message from 'msd-ic-sl02': Invalid endpoint origin (client not allowed).
    603  'check result' message from 'q1mno-mgmt02': Invalid endpoint origin (client not allowed).
    348  'config update object' message from 'msd-ic-sl01': Invalid endpoint origin (client not allowed).
    116  'config update object' message from 'msd-ic-sl02': Invalid endpoint origin (client not allowed).
    268  'config update object' message from 'q1def-mgmt02': Invalid endpoint origin (client not allowed).
    716  'config update object' message from 'q1mno-mgmt02': Invalid endpoint origin (client not allowed).
   2796  'last_check_started changed' message from 'msd-ic-sl02': Invalid endpoint origin (client not allowed).
    601  'last_check_started changed' message from 'q1mno-mgmt02': Invalid endpoint origin (client not allowed).
   6468  'next check changed' message from 'msd-ic-sl02': Invalid endpoint origin (client not allowed).
   1599  'next check changed' message from 'q1mno-mgmt02': Invalid endpoint origin (client not allowed).
      4  'send notification' message from 'msd-ic-sl02': Invalid endpoint origin (client not allowed).

At the moment we have no idea why this is happening or what caused it.
We have tried clearing the /var/lib/icinga2/api folder, which didn’t help.

Configuration of the master and global zones is coming from ma01 in /etc/icinga2/zones.conf.
All other zones are configured in the Icinga Director via API.

ma01 zones.conf
(config master)

/*
 * Generated by Icinga 2 node setup commands
 * on 2020-03-13 08:55:51 +0100
 */

object Endpoint "msd-ic-ma01" {
  host = "10.138.1.2"
}

object Endpoint "msd-ic-ma02" {
  host = "10.132.1.14"
}

object Zone "master" {
        endpoints = [ "msd-ic-ma01", "msd-ic-ma02" ]
}

object Zone "global-templates" {
        global = true
}

object Zone "director-global" {
        global = true
}

object Zone "global-satellites" {
        global = true
}

ma02 zones.conf

object Endpoint "msd-ic-ma01" {
  host = "10.138.1.2"
}

object Endpoint "msd-ic-ma02" {
}

object Zone "master" {
        endpoints = [ "msd-ic-ma01", "msd-ic-ma02" ]
}

object Zone "global-templates" {
        global = true
}

object Zone "director-global" {
        global = true
}

object Zone "global-satellites" {
        global = true
}

features-enabled/api.conf on both masters

object ApiListener "api" {
  accept_commands = true
  accept_config = true
  ticket_salt = TicketSalt
}

Any ideas about where to look next or what to try are much appreciated.

Icinga2 version is 2.13.2 running on RHEL8

:v:


Had another look in the debug log.

Not sure if this is relevant, but I found it interesting.
When both masters start, each one considers itself the current master of the zone:
ma01:

[2022-10-25 11:54:28 +0200] notice/ApiListener: Current zone master: msd-ic-ma01

ma02:

[2022-10-25 11:55:00 +0200] notice/ApiListener: Current zone master: msd-ic-ma02

I really don’t understand why ma02 started ignoring the configs from all zones but the master zone…

We fixed this by moving the configuration objects for zones and endpoints of our satellites from the Icinga Director to the /etc/icinga2/zones.conf file on both masters.
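As a rough sketch of that fix, the satellite definitions added to /etc/icinga2/zones.conf on both masters could look like the fragment below. The endpoint and zone names are taken from the log excerpts earlier in the thread, but which endpoint belongs to which zone, the `host` addresses, and the choice of a single example zone are assumptions for illustration only, not the actual configuration.

```
// Hypothetical example: one satellite zone defined statically in
// /etc/icinga2/zones.conf (same content on msd-ic-ma01 and msd-ic-ma02).

object Endpoint "msd-ic-sl01" {
  host = "192.0.2.11"  // placeholder address (assumption)
}

object Endpoint "msd-ic-sl02" {
  host = "192.0.2.12"  // placeholder address (assumption)
}

// Assumed mapping of the two endpoints to this zone.
object Zone "msd-azure" {
  endpoints = [ "msd-ic-sl01", "msd-ic-sl02" ]
  parent = "master"    // satellites are children of the master zone
}
```

With the zones declared in zones.conf, both masters know them at startup, before any synced or Director-deployed configuration is loaded.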

With the endpoints and zones defined in the Director, our HA cluster had recently only worked while the config master was active. The secondary master refused to reload the config because it “suddenly” no longer knew the zones.

This is most likely due to the order in which the config is loaded: first /etc/icinga2, then “the rest”.

Hello,

Just my 2 cents way too late, but maybe somebody will stumble upon this thread and this might help.

I had the same issue with both masters believing they were the config master. Once one of them went down, things pushed to it via Icinga Director suddenly “disappeared”, because the other master was not serving the latest configuration.
The reason is here: Technical Concepts - Icinga 2
More exactly: “Only one config master is allowed. This one identifies itself with configuration files in /etc/icinga2/zones.d.”
I do remember seeing another documentation page which clearly stated that multiple masters having data in zones.d is not supported, but I cannot find it anymore at this moment.

In other words, one should have folders and files in zones.d on master-01, but not on master-02.
Then Icinga Director will push configuration data to master-01.
The resulting configuration (Icinga Director data + what master-01 has in zones.conf + zones.d/*) will be synced to master-02.
If master-01 goes down, -02 will continue to serve the latest configuration and become config master only when -01 is down.

Thank you.

That is wrong, afaik.

If your config master goes down, you are not able to deploy new configurations.
Icinga Director needs to point to the configuration master as well.

master02 will take over checks from master01, but will simply continue with its running config in /var/lib/icinga2/api/zones.
It will not sync any configuration to any child host.

Child hosts still get the config from the second master, but you are right that changes in Icinga Director cannot be deployed to any node if the config master is down. The second master will distribute the last config it has.


My bad, the 2 posts above are exactly what I was (poorly) trying to say.

There is only one configuration master, The One with zone folders and config files in zones.d, which is also The One which gets configuration data from Icinga Director.

What I meant to say was that once master-01 goes down, master-02 becomes “Current zone master” as logged in the debug.log. Once 01 is back, it becomes the current zone master again.
And yes, while 01 is down, all checks are scheduled and performed by 02 only.

Thank you.