Losing agent connections when deploying new config in Director since 2.11 upgrade

Hi all,

I have had problems with my Icinga deployment since upgrading to 2.11.0 and Director 1.7.0.

Every time a new configuration is deployed using Director, lots of hosts (and their services) lose their connection to the Satellite/Master servers. Then, if I wait 1 or 2 minutes, all of them reconnect and everything seems to work fine.

I have read in other topics that this might be caused by an incorrect Zones configuration and synchronization issues.

Since I started working with Icinga, everything has been managed using Director, and it’s the first time I have seen problems.

I have also read something about a possible fix in 2.11.1 for this specific problem (Zones).

How can I troubleshoot this issue and confirm whether it is related to Zones?

This message is shown when applying the config, so it seems clear:

Warning: you’re running Icinga v2.11.0 and our configuration looks like you could face issue #7530. We’re already working on a solution. The GitHub Issue and our Upgrading documentation contain related details.

Thanks!

To me this looks like normal behavior, as the Director initiates a reload and every connected satellite/agent needs to reconnect.

Hi Roland,

Makes sense, but it was not happening before 2.11.0, or at least I was not aware of it.

We get more than 300 unknown statuses until the services are rechecked.

Also, the message I get every time I deploy a config points to some errors in the config.

I noticed this behavior with older versions as well.

Second, there was a change regarding config sync handling. I’d recommend checking whether your deployment is affected by this change.

Will take a look. I am also waiting for 2.11.1, which seems to “solve” the issue.

Hmm, I don’t know if 2.11.1 will solve this problem. It is not clear how to re-implement the old behaviour while keeping the bug fix implemented in 2.11; that’s also mentioned in the linked GitHub issue. At this time, I rather doubt that this will make it into a bugfix release anytime soon.

Since you’re seeing the message inside the Director, I strongly recommend changing the Director configuration: remove the master/satellite endpoint and zone details from it and move them back into zones.conf, outside the Director.

There are some additional hints for restoring the correct behaviour in this issue: https://github.com/Icinga/icinga2/issues/7542#issuecomment-535976245

Cheers,
Michael

Hi Michael,

Of course, I would like to fix the configuration issue I have right now and get rid of the Director warning message.

I will check it.

Thanks!!!

Hi,

After installing Icinga 2 2.11.1 on the Master/Satellite nodes, the Director’s warning about issue #7530 no longer appears.

I still get hundreds of hosts/services disconnected (unknown status) and then reconnected after 1 minute every time a new config is deployed.

/Marcos

Hi,

did you apply the configuration changes as suggested?

Cheers,
Michael

Hi Michael,

Not yet. Will try tonight.

Should the zones.conf files include information about agents, or just the master/satellite servers?

Thanks

/Marcos

Hi,

zones.conf on each master/satellite server has been updated with endpoints and zones.
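
For readers following along, a zones.conf of this kind might look roughly like the sketch below. All host names, addresses, and the zone names "master" and "satellite" are made-up placeholders, not the values from this setup, and the global zones shown are only the ones commonly used with Director, so they may differ in your environment.

```
// Hypothetical zones.conf sketch; every name and address is a placeholder.

object Endpoint "master1.example.com" {
  host = "192.0.2.10"
}

object Zone "master" {
  endpoints = [ "master1.example.com" ]
}

object Endpoint "satellite1.example.com" {
  host = "192.0.2.20"
}

// The satellite zone is a child of the master zone.
object Zone "satellite" {
  endpoints = [ "satellite1.example.com" ]
  parent = "master"
}

// Global zones are synced to all nodes that define them.
object Zone "global-templates" {
  global = true
}

object Zone "director-global" {
  global = true
}
```

The point of keeping these objects in the static file is that the master/satellite endpoints and zones no longer come from the Director deployment, which is what the recommendation earlier in this thread asks for.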

Director database modified:
update icinga_endpoint set object_type = 'external_object' where object_type = 'object';
update icinga_zone set object_type = 'external_object' where object_type = 'object';

Icinga service restarted on all master/satellite servers.

Everything seems to be working as expected, but I am still losing agent connections while deploying a new config.
2 minutes later all agents are connected again.

/Marcos