Hi everyone. I hope this will be a much shorter thread than my previous one, even though it is related.
I have deployed 1 master, 2 satellites and several agents through Puppet. 8 agents are behind sat-01 and 4 behind sat-02. Since everything is deployed through Puppet, the configuration is fully consistent. On the master it is structured as follows:
[root@mon-00.local zones.d]# tree
.
├── agent-01.local
│   └── hosts.conf
├── agent-02.local
│   └── hosts.conf
├── agent-03.local
│   └── hosts.conf
├── agent-04.local
│   └── hosts.conf
├── agent-05.local
│   └── hosts.conf
├── agent-06.local
│   └── hosts.conf
├── agent-07.local
│   └── hosts.conf
├── agent-08.local
│   └── hosts.conf
├── agent-09.local
│   └── hosts.conf
├── agent-20.local
│   └── hosts.conf
├── agent-30.local
│   └── hosts.conf
├── agent-40.local
│   └── hosts.conf
├── global-templates
│   ├── commands.conf
│   ├── groups.conf
│   ├── services.conf
│   ├── templates.conf
│   ├── timeperiods.conf
│   └── users.conf
├── master
│   ├── api-users.conf
│   ├── hostgroups.conf
│   ├── hosts.conf
│   └── my-icinga2.pp
├── my-safereload.pp
├── my-safereload.te
├── README
├── sat-01
│   └── hosts.conf
└── sat-02
    └── hosts.conf
Now, as you will see below, on satellite 1 the configuration in the api directory is duplicated in a zones-stage directory for no apparent reason. I only noticed this because the agents behave inconsistently: the nodes behind satellite 1 fail the config update (and fail to restart if I try) when I add the "zone = master" trick to the hosts to force the check (example below), while the agents behind satellite 2 accept the same configuration with no problems at all.
For example, agent-01.local is behind satellite 1 and misbehaves just like all the other agents behind this satellite:
object Host "agent-01.local" {
  address = "10.X.X.X"
  display_name = "agent-01.local"
  check_command = "hostalive"
  zone = "master" <--- screws the node up here
  vars.client_endpoint = name
}
When it syncs its configuration you can see:
critical/Application: Found error in config: reloading aborted
and if it is restarted, it fails with:
display_name = "agent-01.local"
check_command = "hostalive"
zone = "master"
^^^^^^^^^^^^^^^
Surprisingly, the same configuration is perfectly fine for the agents behind satellite 2:
object Host "agent-08.local" {
  address = "10.X.X.X"
  display_name = "agent-08.local"
  check_command = "hostalive"
  zone = "master" <--- perfectly fine here
  vars.client_endpoint = name
}
SATELLITE 1 (mon-02.local)
> root@mon-02.local:/var/lib/icinga2/api# tree
> .
> |-- log
> |   |-- 1594803999
> |   |-- 1594804326
> |   `-- current
> |-- packages
> |   `-- _api
> |       |-- 5cff5d89-ca2e-45bc-b504-d45272c1f187
> |       |   |-- conf.d
> |       |   |-- include.conf
> |       |   `-- zones.d
> |       |-- active-stage
> |       |-- active.conf
> |       `-- include.conf
> |-- repository
> |-- zones
> |   |-- agent-01.local
> |   |   `-- _etc
> |   |       `-- hosts.conf
> |   |-- agent-02.local
> |   |   `-- _etc
> |   |       `-- hosts.conf
> |   |-- agent-03.local
> |   |   `-- _etc
> |   |       `-- hosts.conf
> |   |-- agent-04.local
> |   |   `-- _etc
> |   |       `-- hosts.conf
> |   |-- agent-05.local
> |   |   `-- _etc
> |   |       `-- hosts.conf
> |   |-- agent-20.local
> |   |   `-- _etc
> |   |       `-- hosts.conf
> |   |-- agent-30.local
> |   |   `-- _etc
> |   |       `-- hosts.conf
> |   |-- agent-40.local
> |   |   `-- _etc
> |   |       `-- hosts.conf
> |   |-- global-templates
> |   |   `-- _etc
> |   |       |-- commands.conf
> |   |       |-- groups.conf
> |   |       |-- services.conf
> |   |       |-- templates.conf
> |   |       |-- timeperiods.conf
> |   |       `-- users.conf
> |   `-- sat-01
> |       `-- _etc
> |           `-- hosts.conf
> `-- zones-stage <------ WHY IS THIS AND ITS CONTENT HERE???
>     |-- agent-01.local
>     |   `-- _etc
>     |       `-- hosts.conf
>     |-- agent-02.local
>     |   `-- _etc
>     |       `-- hosts.conf
>     |-- agent-03.local
>     |   `-- _etc
>     |       `-- hosts.conf
>     |-- agent-04.local
>     |   `-- _etc
>     |       `-- hosts.conf
>     |-- agent-05.local
>     |   `-- _etc
>     |       `-- hosts.conf
>     |-- agent-20.local
>     |   `-- _etc
>     |       `-- hosts.conf
>     |-- agent-30.local
>     |   `-- _etc
>     |       `-- hosts.conf
>     |-- agent-40.local
>     |   `-- _etc
>     |       `-- hosts.conf
>     |-- global-templates
>     |   `-- _etc
>     |       |-- commands.conf
>     |       |-- groups.conf
>     |       |-- services.conf
>     |       |-- templates.conf
>     |       |-- timeperiods.conf
>     |       `-- users.conf
>     `-- sat-01
>         `-- _etc
>             `-- hosts.conf
>
> 49 directories, 37 files
SATELLITE 2 (mon-03.local)
> root@mon-03.local:/var/lib/icinga2/api# tree
> .
> |-- log
> |   |-- 1594733785
> |   |-- 1594734875
> |   |-- 1594735042
> |   |-- 1594735045
> |   |-- 1594735135
> |   |-- 1594737135
> |   |-- 1594737655
> |   |-- 1594745290
> |   |-- 1594745293
> |   |-- 1594746113
> |   |-- 1594747992
> |   |-- 1594747994
> |   |-- 1594748234
> |   |-- 1594748235
> |   |-- 1594801541
> |   |-- 1594802231
> |   |-- 1594802233
> |   |-- 1594802300
> |   |-- 1594803860
> |   |-- 1594803869
> |   `-- current
> |-- packages
> |   `-- _api
> |       |-- active-stage
> |       |-- active.conf
> |       |-- ff1fd466-8cb7-44ad-abcc-b43820b1e0b1
> |       |   |-- conf.d
> |       |   |   `-- downtimes
> |       |   |-- include.conf
> |       |   `-- zones.d
> |       `-- include.conf
> |-- repository
> `-- zones
>     |-- agent-06.local
>     |   `-- _etc
>     |       `-- hosts.conf
>     |-- agent-07.local
>     |   `-- _etc
>     |       `-- hosts.conf
>     |-- agent-08.local
>     |   `-- _etc
>     |       `-- hosts.conf
>     |-- agent-09.local
>     |   `-- _etc
>     |       `-- hosts.conf
>     |-- global-templates
>     |   `-- _etc
>     |       |-- commands.conf
>     |       |-- groups.conf
>     |       |-- services.conf
>     |       |-- templates.conf
>     |       |-- timeperiods.conf
>     |       `-- users.conf
>     `-- sat-02
>         `-- _etc
>             `-- hosts.conf
>
> 21 directories, 36 files
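For anyone who wants to check whether the zones and zones-stage trees on satellite 1 really hold the same content, here is a minimal sketch. It assumes zones-stage is a staging copy of the synced configuration, which is only my reading of the listing and is not confirmed anywhere in this thread. The function name compare_zone_trees is made up for illustration; the two paths are the ones from the satellite 1 listing.

```shell
#!/bin/sh
# Hypothetical helper (not part of Icinga 2): report files that differ
# between two synced config trees.
# Exit status: 0 = identical, 1 = differences found, 2 = a directory is missing.
compare_zone_trees() {
    active_dir=$1
    stage_dir=$2
    if [ ! -d "$active_dir" ] || [ ! -d "$stage_dir" ]; then
        echo "missing directory: $active_dir or $stage_dir" >&2
        return 2
    fi
    # -r recurses into subdirectories, -q only names the files that differ.
    diff -rq "$active_dir" "$stage_dir"
}

# Example call, to be run on the satellite itself (paths from the listing):
# compare_zone_trees /var/lib/icinga2/api/zones /var/lib/icinga2/api/zones-stage
```

If the two trees differ, the differing hosts.conf files are the first place I would look. Running `icinga2 daemon -C` on the failing node validates the configuration on disk and should print the complete error message rather than the short excerpt shown above.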