No agents on master when config master is down

I have the setup as drawn in my beautiful colorful picture :slight_smile:

On my config master I have the zones.conf with all the (red) nodes and in “/zones.d/satellitezone/zones.conf” I have the (blue) agents.

If my config master is down the second master loses the connection to the two agents below the satellite.

What did I do wrong? I needed to fill in the zones.conf on master B to get it working properly, but should I also create the zones manually on master B so it knows about those agents when config master A goes down?

Everything else works perfectly! Thanks Icinga2 !

Hi,

thanks for the picture, that helps :slight_smile:

Can you please share the content of both zones.conf from both masters? I’d say that something is missing here.

Also, which connection direction is used for the agents? Is it the master connecting to them, or do they connect themselves? If the latter applies, they need to have the parent zone “master” configured with both master endpoints inside.
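
For illustration, here is a minimal sketch of what such an agent's /etc/icinga2/zones.conf could look like; the endpoint names, the agent name and the addresses below are placeholders, not taken from your setup:

// Hypothetical agent zones.conf - all names and addresses are placeholders.

object Endpoint "master1" {
  host = "192.0.2.10"   // host/port set here so the agent establishes the connection
  port = "5665"
}

object Endpoint "master2" {
  host = "192.0.2.11"
  port = "5665"
}

// the parent zone "master" with both master endpoints inside
object Zone "master" {
  endpoints = [ "master1", "master2" ]
}

// the agent itself
object Endpoint "agent01" {
}

object Zone "agent01" {
  endpoints = [ "agent01" ]
  parent = "master"
}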

Cheers,
Michael

@dnsmichi thanks for your reply!

I compared the two using Notepad++ and everything is exactly the same, except that the 2nd master has the host IP of the config master set. See below. (I removed the other endpoints as they are the same in both files.)

I use the latter, so the agents connect to the master. And yes, every agent has both master endpoints with IP and the master zone with the two endpoints, see picture:

piece of config master:

/*
 * Master Zone with icinga Endpoint
 * child cosmos Zone in parent masterzone
 */

object Zone "master" {
  endpoints = [ "icinga", "icinga2" ]
}

object Endpoint "icinga" {
  // that's us
}

object Endpoint "icinga2" {
}

//object Zone "icinga2" {
//  endpoints = [ "icinga2" ]
//  parent = "master"
//}

object Zone "global-templates" {
  global = true
}

object Zone "director-global" {
  global = true
}

piece of 2nd master:

/*
 * Master Zone with icinga Endpoint
 * child cosmos Zone in parent masterzone
 */

object Zone "master" {
  endpoints = [ "icinga", "icinga2" ]
}

object Endpoint "icinga" {
  host = "ip.ip.ip.ip"
  port = "5665"
}

object Endpoint "icinga2" {
  // that's us
}

object Zone "global-templates" {
  global = true
}

object Zone "director-global" {
  global = true
}

Tried to keep the colours for Christmas :slight_smile:

Hi,

if I read the image and configuration correctly, the masters do not know about the satellite zone. Edit /etc/icinga2/zones.conf and add the following entries on both masters:

object Endpoint "satellite" {

}

object Zone "satellite" {
  endpoints = [ "satellite" ]
  parent = "master"
}

That is needed because otherwise both masters won’t trust the satellite zone, nor will they sync the configuration to that zone, including the agents below the satellite.

Cheers,
Michael


Hi @dnsmichi,

Sorry for the late reply, but I didn’t forget about it :wink:!

The two (blue) agents were monitored correctly unless the config master is down, as described above.

Are you suggesting adding the /icinga2/zones.d/satellite/zones.conf content to /icinga2/zones.conf?

If I try that on the 2nd master, it tells me that there is a ‘duplicate’ (if my interpretation is right).

So I first changed the config master and added the /icinga2/zones.d/satellite/zones.conf content to the /icinga2/zones.conf file to let it sync. I restarted the service on the config master, but now I get this warning:
[screenshot of the warning]

I really thought that adding a (blue) zone to the subfolder in zones.d was the right way to add a satellite with agents to it and that it would sync to other masters.

I would love to know the best way to solve this. Any reference to a complete example with two masters and satellites, including agents and the corresponding zones.conf, would help too!

For now my satellite and two agents don’t seem to recover and show an error in Icinga Web. The satellite cluster seems to be flapping. Files appear and disappear on the satellite when looking at these two folders :slight_smile:
./var/lib/icinga2/api/zones-stage/global-templates/_etc
./var/lib/icinga2/api/zones/validator02/_etc

Thanks!

I don’t know its content, but if there are Zone/Endpoint objects for the satellite zone in there, I’d recommend moving them to /etc/icinga2/zones.conf.

Cheers,
Michael

Are you saying that I need to add all zones/endpoints from the RED zone (see picture) to the masters’ config file (zones.conf), and that I need to add all endpoints/zones of the BLUE zone into the satellite’s zones.conf?

I thought that using a zone folder on the master would prevent this manual editing on the satellite?

For indirectly connected zones over 3 levels, this works with regard to agents as command endpoints. Everything else needs to be configured inside zones.conf for 1) allowing the zone trust levels and 2) enabling the cluster config sync.

That being said, the master needs to know about the satellite zone, outside of zones.d, thus in zones.conf. That’s what’s written in the documentation too.
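
To make that split concrete with a hypothetical example (the host, service and endpoint names are placeholders, not taken from your setup): the satellite Zone/Endpoint objects belong in /etc/icinga2/zones.conf on both masters, while checks that run on an agent via command_endpoint can stay under zones.d/satellite/ on the config master and get synced down, roughly like this:

// /etc/icinga2/zones.d/satellite/hosts.conf on the config master (hypothetical)

object Host "agent01" {
  check_command = "hostalive"
  address = "192.0.2.21"
  vars.agent_endpoint = name   // endpoint name matches the host name
}

apply Service "disk" {
  check_command = "disk"
  // execute the check on the agent itself (top-down command endpoint)
  command_endpoint = host.vars.agent_endpoint
  assign where host.vars.agent_endpoint
}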

Cheers,
Michael

Sorry for starting to ask noob questions, but what should I do now? (I don’t completely understand what you are saying.)

(I already have all 2nd level endpoints/zones (RED) in the zones.conf file on the master (= 1st level). I only added the (blue) 3rd level endpoints/agents to the satellite/zones.conf.)

Is this assumption correct ?

“Are you saying that I need to add all zones/endpoints from the RED zone (see picture) to the masters’ config file (zones.conf), and that I need to add all endpoints/zones of the BLUE zone into the satellite’s zones.conf (and thus NOT into a folder on the master)?”

added picture:

Do as suggested above, edit the zones.conf on the config master and the secondary master and add

object Endpoint "satellite" {

}

object Zone "satellite" {
  endpoints = [ "satellite" ]
  parent = "master"
}

Remove the objects which are colliding in /etc/icinga2/zones.d/satellite/zones.conf (you can also comment them out).

Restart Icinga.
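
Assuming the colliding objects in /etc/icinga2/zones.d/satellite/zones.conf are the satellite’s own Zone/Endpoint definitions, commenting them out would look roughly like this:

// moved to /etc/icinga2/zones.conf on both masters
//object Endpoint "satellite" {
//}
//
//object Zone "satellite" {
//  endpoints = [ "satellite" ]
//  parent = "master"
//}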


Btw, cleaning up /var/lib/icinga2/api on a lot of clients seems to solve a lot of problems, but not all of them.

1#
Also, I previously disabled the 2nd master; enabling this 2nd master again solved the remaining cluster endpoint problems, but when I stop the 2nd master service again, one agent on the 3rd level gives the cluster error again. This behaviour repeats when stopping and starting the 2nd master.
Shouldn’t those automatically check for the other master?

2#
Reaction to the previous post: that’s what I wanted to point out. I already had the red zone as in the picture in the zones.conf of the config master (including the satellite endpoint). I only had the blue zone’s endpoints in the satellite/zones.conf file. Now I placed everything in the zones.conf file, and the blue 3rd level endpoints also (manually) in the satellite’s zones.conf file.

So now we are back to the original question: why do my two agents on the 3rd level disconnect when stopping the 2nd master?

Do you have logs from all three levels and endpoints at the timestamp this happens?

I could choose a time and stop the second master, then grep the logs from the first master, the satellite and the two endpoints. That’s what you mean, right? If yes, where should I send them?

Back. ( @dnsmichi )

Your suggestion to move the agents of the satellite zone from the zones.d file (/etc/icinga2/zones.d/satellite/zones.conf) to the master’s zones.conf eventually (after some very strange sync problems) seemed to solve my problem,
BUT I also had to write the same agents (/etc/icinga2/zones.d/satellite/zones.conf) into the zones.conf of the satellite, otherwise I get a lot of errors with “icinga2 daemon -C”.

Is this the way to go when using a satellite zone? Do I need to write the agents of the satellite zone into the zones.conf of both the master(s) and the satellite?
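
For what it’s worth, here is a sketch of what the satellite’s own /etc/icinga2/zones.conf could contain with that approach; the agent names below are placeholders, and whether an endpoint gets a host attribute depends on which side establishes the connection:

// Hypothetical satellite zones.conf - agent names are placeholders.

// parent zone with both masters
object Endpoint "icinga" {
}

object Endpoint "icinga2" {
}

object Zone "master" {
  endpoints = [ "icinga", "icinga2" ]
}

// the satellite itself
object Endpoint "satellite" {
  // that's us
}

object Zone "satellite" {
  endpoints = [ "satellite" ]
  parent = "master"
}

// the agents below the satellite
object Endpoint "agent01" {
}

object Zone "agent01" {
  endpoints = [ "agent01" ]
  parent = "satellite"
}

object Endpoint "agent02" {
}

object Zone "agent02" {
  endpoints = [ "agent02" ]
  parent = "satellite"
}

object Zone "global-templates" {
  global = true
}

object Zone "director-global" {
  global = true
}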