Icinga for Windows - Config validation failed for staged cluster config sync

Good Morning all,
I have never used the Icinga agent before (SSH and WMI did their job, but wmic seems to be going away
in the future). I am now trying the new Icinga for Windows solution and run into config validation problems as soon as one Windows agent is configured. The cluster looks like this:

  • 1 Master server with icingaweb2 in master zone
  • 2 Satellites, zone name is “dc1”, parent is master
  • several Linux clients monitored over SSH (works perfectly) and one Windows client with Icinga for Windows for testing, parent zone = dc1
  • As we use Git (and CI) for the configuration, all of the config should be done on the master server and synced to the satellites. The Director is not installed, because we love monitoring as code. Every agent needs its own Endpoint and Zone configuration, but I don’t want to edit /etc/icinga2/zones.conf on every node every time. Instead, I tried to configure the agent Zone and Endpoint objects inside zones.d. All Windows agents should be connected from the dc1 satellites (not the master): local agent checks should be executed on the agent itself via command_endpoint = host.vars.agent_endpoint, while checks like http should be done by the dc1 satellites (see the sketch after this list).
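
To illustrate the intended split, here is a minimal sketch of how the services could look; the service names, check commands and assign rules are my own assumptions, not part of the actual setup:

apply Service "cpu-load" {
  check_command = "Invoke-IcingaCheckCPU"        // Icinga for Windows CheckCommand (assumed to be imported)
  command_endpoint = host.vars.agent_endpoint    // run locally on the Windows agent
  assign where host.vars.os == "windows" && host.vars.agent_endpoint
}

apply Service "http" {
  check_command = "http"                         // ITL network check, executed by the dc1 satellites (the host's zone)
  assign where host.vars.os == "windows" && host.address
}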

I tried to configure it on the master node in the dc1 zone like this:

vi /etc/icinga2/zones.d/dc1/win-1234.conf

object Host "win-1234" {
  import "generic-windows"
  address = "1.2.3.4"
}

object Endpoint "win-1234" {
  host = "1.2.3.4"
}

object Zone "win-1234" {
  endpoints = [ "win-1234" ]
  parent = "dc1"
}

# for completeness
template Host "generic-windows" {
  import "generic-host"
  vars.os = "windows"
  vars.agent_endpoint = name
  icon_image = "win.png"
  vars.load_core = "_Total"
  vars.load_warning = "90"
  vars.load_critical = "100"
}


When this configuration goes live for the first time, everything works. The satellites connect to the agent, local checks are executed correctly, etc.
But as soon as I change something in the configuration again, I get validation failures for the staged config sync on both satellites:

- critical/ApiListener: Config validation failed for staged cluster config sync in '/var/lib/icinga2/api/zones-stage/'. Aborting. Logs: '/var/lib/icinga2/api/zones-stage//startup.log'
[2021-11-29 08:33:00 +0100] critical/config: Error: Object 'win-1234' of type 'Host' re-defined: in /var/lib/icinga2/api/zones-stage//win-1234/_etc/win-1234.conf: 1:0-1:28; previous definition: in /var/lib/icinga2/api/zones-stage//dc1/_etc/host/win-1234.conf: 1:0-1:28
Location: in /var/lib/icinga2/api/zones-stage//win-1234/_etc/win-1234.conf: 1:0-1:28
/var/lib/icinga2/api/zones-stage//win-1234/_etc/win-1234.conf(1): object Host "win-1234" {
                                                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
/var/lib/icinga2/api/zones-stage//win-1234/_etc/win-1234.conf(2):   import "generic-windows"
/var/lib/icinga2/api/zones-stage//win-1234/_etc/win-1234.conf(3):   address = "1.2.3.4"
[2021-11-29 08:33:00 +0100] critical/cli: Config validation failed. Re-run with 'icinga2 daemon -C' after fixing the config.

So it seems that as soon as a new agent is configured, a host object gets created in the agent zone during the config reload, which then conflicts with every following config reload, because the host is internally defined twice even though it appears only once in the config files. Right now I don’t know how to fix this, other than not configuring it under zones.d and instead putting the Endpoint and Zone objects into zones.conf on every node.
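
For reference, the zones.conf alternative I am trying to avoid would look roughly like this on the master and on both satellites (just a sketch using the names from above):

object Endpoint "win-1234" {
  host = "1.2.3.4"
}

object Zone "win-1234" {
  endpoints = [ "win-1234" ]
  parent = "dc1"
}
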
Any help is very much appreciated.
Thanks and cheers!

Marcus

Since v2.11 you cannot define zones inside zones as described here. That means you need to move your Zone and Endpoint objects out of /etc/icinga2/zones.d.

Hi @rsx

But the docs you linked also say

The thing you can do: For command_endpoint agents like inside the Director: Host → Agent → yes, there is no config sync for this zone in place. Therefore it is valid to just sync their zones via the config sync.

And this is exactly what I want to do.

Additional thought: I think the internal creation of /var/lib/icinga2/api/zones-stage//win-1234/_etc/win-1234.conf is simply wrong. Nothing in the config told Icinga to create a host object in this zone.

I’d recommend this issue for better understanding.

I understand what you want to tell me, but imho that is not in the scope of my issue.
There is a host object named “win-1234” automatically created by Icinga in the zone “win-1234”, which imho should not happen, because nothing in my config tells Icinga to do that. The only host object I defined myself is in the dc1 zone.
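
As a side note, the object list CLI shows the file and line each loaded object was declared in, which can help track down duplicates, e.g. on a satellite:

icinga2 object list --type Host --name win-1234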

But to stop going round in circles: you think I have to define every agent Endpoint and Zone on every Icinga node in zones.conf?

I use zones.conf for the master, satellite and global zone objects only. I’ve added an include_recursive to icinga2.conf for the host objects. The included directory has one conf file per host object, including the Endpoint and Zone objects if needed. And we don’t use the config sync any longer, for security reasons.
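
A minimal sketch of that layout (the directory name is my own choice, not a convention):

# /etc/icinga2/icinga2.conf (excerpt)
include "zones.conf"
include_recursive "hosts.d"

# /etc/icinga2/hosts.d/win-1234.conf – one file per host, with Endpoint/Zone objects if needed
object Host "win-1234" {
  import "generic-windows"
  address = "1.2.3.4"
}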

It’s up to you how you arrange your conf files.

Well, I am happy it works your way. But unfortunately that still doesn’t answer why Icinga is creating a host object out of nowhere, which results in this re-defined error :slight_smile:

Never mind, it is working now (exactly the way I thought it has to work). It seems that a broken configuration at the very beginning left behind a stale staging state that was never cleaned up.
After manually deleting /var/lib/icinga2/api/zones-stage/ and restarting all nodes, everything works as expected and the endpoints / zones are synced.
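
For anyone running into the same thing, the cleanup on each affected node looked roughly like this (assuming systemd; the path is the one from the error message above):

systemctl stop icinga2
rm -rf /var/lib/icinga2/api/zones-stage/
systemctl start icinga2
icinga2 daemon -C    # validate the local config afterwards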