2nd master sends notifications for acknowledged hosts

Hi,

So we have a setup with 2x master and 2x satellite. All worked fine until last week, when I noticed the icinga service on the 2nd master had stopped. I restarted the service and all was hunky-dory for some time, but in the night the second master started to send out notifications for services and hosts that were already acknowledged. This came only from the 2nd master.
To explain: most service checks are done by the 2x satellites, but all notifications are done by the 2x masters, in a “load-balanced” way. So I stopped the service on master2 and all is fine.
I read a bit, looked at the logs, and noticed something, so I decided to upgrade from 2.11 -> r2.12.1-1.
All looked fine, but after some time the same started to happen.

Output of grep “information/Api” on the master02 log:

[2020-11-24 09:33:04 +0100] information/ApiListener: 'api' started.
[2020-11-24 09:33:04 +0100] information/ApiListener: Started new listener on '[0.0.0.0]:5665'
[2020-11-24 09:33:04 +0100] information/ApiListener: Reconnecting to endpoint 'icingag01.cnt.int' via host 'icingag01.cnt.int' and port '5665'
[2020-11-24 09:33:04 +0100] information/ApiListener: Reconnecting to endpoint 'icingam01.cnt.int' via host 'icingam01.cnt.int' and port '5665'
[2020-11-24 09:33:04 +0100] information/ApiListener: Reconnecting to endpoint 'monitoringvmdb01.cnt.int' via host 'monitoringvmdb01.cnt.int' and port '5665'
[2020-11-24 09:33:04 +0100] information/ApiListener: Reconnecting to endpoint 'icingas01.cnt.int' via host 'icingas01.cnt.int' and port '5665'
[2020-11-24 09:33:04 +0100] information/ApiListener: Reconnecting to endpoint 'icingas02.cnt.int' via host 'icingas02.cnt.int' and port '5665'
[2020-11-24 09:33:04 +0100] information/ApiListener: New client connection for identity 'icingas02.cnt.int' to [10.222.220.62]:5665
[2020-11-24 09:33:04 +0100] information/ApiListener: Sending config updates for endpoint 'icingas02.cnt.int' in zone 'zone_bk'.
[2020-11-24 09:33:04 +0100] information/ApiListener: Syncing configuration files for global zone 'global-templates' to endpoint 'icingas02.cnt.int'.
[2020-11-24 09:33:04 +0100] information/ApiListener: New client connection for identity 'icingas01.cnt.int' to [10.222.220.61]:5665
[2020-11-24 09:33:04 +0100] information/ApiListener: Sending config updates for endpoint 'icingas01.cnt.int' in zone 'zone_bk'.
[2020-11-24 09:33:04 +0100] information/ApiListener: Syncing configuration files for global zone 'global-templates' to endpoint 'icingas01.cnt.int'.
[2020-11-24 09:33:04 +0100] information/ApiListener: Syncing configuration files for global zone 'director-global' to endpoint 'icingas02.cnt.int'.
[2020-11-24 09:33:04 +0100] information/ApiListener: Syncing configuration files for global zone 'director-global' to endpoint 'icingas01.cnt.int'.
[2020-11-24 09:33:04 +0100] information/ApiListener: Syncing configuration files for zone 'zone_bk' to endpoint 'icingas02.cnt.int'.
[2020-11-24 09:33:04 +0100] information/ApiListener: Syncing configuration files for zone 'zone_bk' to endpoint 'icingas01.cnt.int'.
[2020-11-24 09:33:04 +0100] information/ApiListener: New client connection for identity 'monitoringvmdb01.cnt.int' to [10.222.220.40]:5665
[2020-11-24 09:33:04 +0100] information/ApiListener: Sending config updates for endpoint 'monitoringvmdb01.cnt.int' in zone 'zone_db'.
[2020-11-24 09:33:04 +0100] information/ApiListener: Syncing configuration files for global zone 'global-templates' to endpoint 'monitoringvmdb01.cnt.int'.
[2020-11-24 09:33:04 +0100] information/ApiListener: Syncing configuration files for global zone 'director-global' to endpoint 'monitoringvmdb01.cnt.int'.
[2020-11-24 09:33:04 +0100] information/ApiListener: Finished sending config file updates for endpoint 'monitoringvmdb01.cnt.int' in zone 'zone_db'.
[2020-11-24 09:33:04 +0100] information/ApiListener: Syncing runtime objects to endpoint 'monitoringvmdb01.cnt.int'.
[2020-11-24 09:33:04 +0100] information/ApiListener: New client connection for identity 'icingag01.cnt.int' to [10.222.220.50]:5665
[2020-11-24 09:33:04 +0100] information/ApiListener: Sending config updates for endpoint 'icingag01.cnt.int' in zone 'zone_gr'.
[2020-11-24 09:33:04 +0100] information/ApiListener: Syncing configuration files for global zone 'global-templates' to endpoint 'icingag01.cnt.int'.
[2020-11-24 09:33:04 +0100] information/ApiListener: Syncing configuration files for global zone 'director-global' to endpoint 'icingag01.cnt.int'.
[2020-11-24 09:33:04 +0100] information/ApiListener: Finished sending config file updates for endpoint 'icingag01.cnt.int' in zone 'zone_gr'.
[2020-11-24 09:33:04 +0100] information/ApiListener: Syncing runtime objects to endpoint 'icingag01.cnt.int'.
[2020-11-24 09:33:04 +0100] information/ApiListener: New client connection for identity 'icingam01.cnt.int' to [10.222.220.51]:5665
[2020-11-24 09:33:04 +0100] information/ApiListener: Sending config updates for endpoint 'icingam01.cnt.int' in zone 'master'.
[2020-11-24 09:33:04 +0100] information/ApiListener: Syncing configuration files for global zone 'global-templates' to endpoint 'icingam01.cnt.int'.
[2020-11-24 09:33:04 +0100] information/ApiListener: Syncing configuration files for global zone 'director-global' to endpoint 'icingam01.cnt.int'.
[2020-11-24 09:33:04 +0100] information/ApiListener: Syncing configuration files for zone 'master' to endpoint 'icingam01.cnt.int'.
[2020-11-24 09:33:04 +0100] information/ApiListener: Syncing configuration files for zone 'zone_bk' to endpoint 'icingam01.cnt.int'.
[2020-11-24 09:33:05 +0100] information/ApiListener: Finished syncing runtime objects to endpoint 'icingag01.cnt.int'.
[2020-11-24 09:33:05 +0100] information/ApiListener: Finished sending runtime config updates for endpoint 'icingag01.cnt.int' in zone 'zone_gr'.
[2020-11-24 09:33:05 +0100] information/ApiListener: Finished syncing runtime objects to endpoint 'monitoringvmdb01.cnt.int'.
[2020-11-24 09:33:05 +0100] information/ApiListener: Sending replay log for endpoint 'icingag01.cnt.int' in zone 'zone_gr'.
[2020-11-24 09:33:05 +0100] information/ApiListener: Finished sending runtime config updates for endpoint 'monitoringvmdb01.cnt.int' in zone 'zone_db'.
[2020-11-24 09:33:05 +0100] information/ApiListener: Sending replay log for endpoint 'monitoringvmdb01.cnt.int' in zone 'zone_db'.
[2020-11-24 09:33:05 +0100] information/ApiListener: Finished sending config file updates for endpoint 'icingas01.cnt.int' in zone 'zone_bk'.
[2020-11-24 09:33:05 +0100] information/ApiListener: Syncing runtime objects to endpoint 'icingas01.cnt.int'.
[2020-11-24 09:33:05 +0100] information/ApiListener: Finished sending config file updates for endpoint 'icingas02.cnt.int' in zone 'zone_bk'.
[2020-11-24 09:33:05 +0100] information/ApiListener: Syncing runtime objects to endpoint 'icingas02.cnt.int'.
[2020-11-24 09:33:05 +0100] information/ApiListener: Finished sending config file updates for endpoint 'icingam01.cnt.int' in zone 'master'.
[2020-11-24 09:33:05 +0100] information/ApiListener: Syncing runtime objects to endpoint 'icingam01.cnt.int'.
[2020-11-24 09:33:05 +0100] information/ApiListener: Finished syncing runtime objects to endpoint 'icingam01.cnt.int'.
[2020-11-24 09:33:05 +0100] information/ApiListener: Finished sending runtime config updates for endpoint 'icingam01.cnt.int' in zone 'master'.
[2020-11-24 09:33:05 +0100] information/ApiListener: Sending replay log for endpoint 'icingam01.cnt.int' in zone 'master'.
[2020-11-24 09:33:05 +0100] information/ApiListener: Finished syncing runtime objects to endpoint 'icingas02.cnt.int'.
[2020-11-24 09:33:05 +0100] information/ApiListener: Finished sending runtime config updates for endpoint 'icingas02.cnt.int' in zone 'zone_bk'.
[2020-11-24 09:33:05 +0100] information/ApiListener: Sending replay log for endpoint 'icingas02.cnt.int' in zone 'zone_bk'.
[2020-11-24 09:33:05 +0100] information/ApiListener: Finished syncing runtime objects to endpoint 'icingas01.cnt.int'.
[2020-11-24 09:33:05 +0100] information/ApiListener: Finished sending runtime config updates for endpoint 'icingas01.cnt.int' in zone 'zone_bk'.
[2020-11-24 09:33:05 +0100] information/ApiListener: Sending replay log for endpoint 'icingas01.cnt.int' in zone 'zone_bk'.
[2020-11-24 09:33:05 +0100] information/ApiListener: Finished sending replay log for endpoint 'monitoringvmdb01.cnt.int' in zone 'zone_db'.
[2020-11-24 09:33:05 +0100] information/ApiListener: Finished syncing endpoint 'monitoringvmdb01.cnt.int' in zone 'zone_db'.
[2020-11-24 09:33:05 +0100] information/ApiListener: Finished reconnecting to endpoint 'monitoringvmdb01.cnt.int' via host 'monitoringvmdb01.cnt.int' and port '5665'
[2020-11-24 09:33:05 +0100] information/ApiListener: Finished sending replay log for endpoint 'icingag01.cnt.int' in zone 'zone_gr'.
[2020-11-24 09:33:05 +0100] information/ApiListener: Finished syncing endpoint 'icingag01.cnt.int' in zone 'zone_gr'.
[2020-11-24 09:33:05 +0100] information/ApiListener: Finished reconnecting to endpoint 'icingag01.cnt.int' via host 'icingag01.cnt.int' and port '5665'
[2020-11-24 09:33:05 +0100] information/ApiListener: Finished sending replay log for endpoint 'icingas02.cnt.int' in zone 'zone_bk'.
[2020-11-24 09:33:05 +0100] information/ApiListener: Finished syncing endpoint 'icingas02.cnt.int' in zone 'zone_bk'.
[2020-11-24 09:33:05 +0100] information/ApiListener: Finished reconnecting to endpoint 'icingas02.cnt.int' via host 'icingas02.cnt.int' and port '5665'
[2020-11-24 09:33:05 +0100] information/ApiListener: Finished sending replay log for endpoint 'icingas01.cnt.int' in zone 'zone_bk'.
[2020-11-24 09:33:05 +0100] information/ApiListener: Finished syncing endpoint 'icingas01.cnt.int' in zone 'zone_bk'.
[2020-11-24 09:33:05 +0100] information/ApiListener: Finished reconnecting to endpoint 'icingas01.cnt.int' via host 'icingas01.cnt.int' and port '5665'
[2020-11-24 09:33:05 +0100] information/ApiListener: Finished sending replay log for endpoint 'icingam01.cnt.int' in zone 'master'.
[2020-11-24 09:33:05 +0100] information/ApiListener: Finished syncing endpoint 'icingam01.cnt.int' in zone 'master'.
[2020-11-24 09:33:05 +0100] information/ApiListener: Finished reconnecting to endpoint 'icingam01.cnt.int' via host 'icingam01.cnt.int' and port '5665'
[2020-11-24 09:33:17 +0100] information/ApiListener: Applying config update from endpoint 'icingam01.cnt.int' of zone 'master'.
[2020-11-24 09:33:17 +0100] information/ApiListener: Received configuration for zone 'director-global' from endpoint 'icingam01.cnt.int'. Comparing the timestamp and checksums.
[2020-11-24 09:33:17 +0100] information/ApiListener: Our production configuration is more recent than the received configuration update. Ignoring configuration file update for path '/var/lib/icinga2/api/zones-stage/director-global'. Current timestamp '2020-11-24 05:15:40 +0100' (1606191340.990582) >= received timestamp '2020-11-24 05:15:40 +0100' (1606191340.990582).
[2020-11-24 09:33:17 +0100] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/director-global//director/001-director-basics.conf' for zone 'director-global'.
[2020-11-24 09:33:17 +0100] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/director-global//director/commands.conf' for zone 'director-global'.
[2020-11-24 09:33:17 +0100] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/director-global//director/host_templates.conf' for zone 'director-global'.
[2020-11-24 09:33:17 +0100] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/director-global//director/hostgroups.conf' for zone 'director-global'.
[2020-11-24 09:33:17 +0100] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/director-global//director/service_apply.conf' for zone 'director-global'.
[2020-11-24 09:33:17 +0100] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/director-global//director/service_templates.conf' for zone 'director-global'.
[2020-11-24 09:33:17 +0100] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/director-global//director/user_templates.conf' for zone 'director-global'.
[2020-11-24 09:33:17 +0100] information/ApiListener: Applying configuration file update for path '/var/lib/icinga2/api/zones-stage/director-global' (12796 Bytes).
[2020-11-24 09:33:17 +0100] information/ApiListener: Received configuration for zone 'global-templates' from endpoint 'icingam01.cnt.int'. Comparing the timestamp and checksums.
[2020-11-24 09:33:17 +0100] information/ApiListener: Our production configuration is more recent than the received configuration update. Ignoring configuration file update for path '/var/lib/icinga2/api/zones-stage/global-templates'. Current timestamp '2020-11-24 05:15:40 +0100' (1606191340.896365) >= received timestamp '2020-11-24 05:15:40 +0100' (1606191340.896365).
[2020-11-24 09:33:17 +0100] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/global-templates//_etc/commands.conf' for zone 'global-templates'.
[2020-11-24 09:33:17 +0100] information/ApiListener: Applying configuration file update for path '/var/lib/icinga2/api/zones-stage/global-templates' (8591 Bytes).
[2020-11-24 09:33:17 +0100] information/ApiListener: Received configuration for zone 'master' from endpoint 'icingam01.cnt.int'. Comparing the timestamp and checksums.
[2020-11-24 09:33:17 +0100] information/ApiListener: Our production configuration is more recent than the received configuration update. Ignoring configuration file update for path '/var/lib/icinga2/api/zones-stage/master'. Current timestamp '2020-11-24 05:15:40 +0100' (1606191340.895572) >= received timestamp '2020-11-24 05:15:40 +0100' (1606191340.895572).
[2020-11-24 09:33:17 +0100] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/master//director/host_templates.conf' for zone 'master'.
[2020-11-24 09:33:17 +0100] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/master//director/hosts.conf' for zone 'master'.
[2020-11-24 09:33:17 +0100] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/master//director/notification_apply.conf' for zone 'master'.
[2020-11-24 09:33:17 +0100] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/master//director/notification_templates.conf' for zone 'master'.
[2020-11-24 09:33:17 +0100] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/master//director/services.conf' for zone 'master'.
[2020-11-24 09:33:17 +0100] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/master//director/usergroups.conf' for zone 'master'.
[2020-11-24 09:33:17 +0100] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/master//director/users.conf' for zone 'master'.
[2020-11-24 09:33:17 +0100] information/ApiListener: Applying configuration file update for path '/var/lib/icinga2/api/zones-stage/master' (78811 Bytes).
[2020-11-24 09:33:17 +0100] information/ApiListener: Received configuration for zone 'zone_bk' from endpoint 'icingam01.cnt.int'. Comparing the timestamp and checksums.
[2020-11-24 09:33:17 +0100] information/ApiListener: Our production configuration is more recent than the received configuration update. Ignoring configuration file update for path '/var/lib/icinga2/api/zones-stage/zone_bk'. Current timestamp '2020-11-24 05:15:40 +0100' (1606191340.988809) >= received timestamp '2020-11-24 05:15:40 +0100' (1606191340.988809).
[2020-11-24 09:33:17 +0100] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/zone_bk//director/host_templates.conf' for zone 'zone_bk'.
[2020-11-24 09:33:17 +0100] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/zone_bk//director/hosts.conf' for zone 'zone_bk'.
[2020-11-24 09:33:17 +0100] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/zone_bk//director/service_apply.conf' for zone 'zone_bk'.
[2020-11-24 09:33:17 +0100] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/zone_bk//director/service_templates.conf' for zone 'zone_bk'.
[2020-11-24 09:33:17 +0100] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/zone_bk//director/services.conf' for zone 'zone_bk'.
[2020-11-24 09:33:17 +0100] information/ApiListener: Applying configuration file update for path '/var/lib/icinga2/api/zones-stage/zone_bk' (3852850 Bytes).
[2020-11-24 09:33:17 +0100] information/ApiListener: Received configuration updates (4) from endpoint 'icingam01.cnt.int' are equal to production, skipping validation and reload.

Hi @grootwitbaas Welcome to the community!

Be sure to check out its guidelines :smiley:
https://community.icinga.com/guidelines

Have you tried restarting master2 like this:

systemctl stop icinga2 && rm -rf /var/lib/icinga2/api/zones-stage/* && rm -rf /var/lib/icinga2/api/zones/* && systemctl start icinga2

You don't seem to have any problems in your syncing, but this will force a resync between the 2 masters, from master1 -> master2. It might not solve your problem, but it's worth a try. I found it here originally:

https://community.icinga.com/t/last-zone-sync-stage-validation-failed/

Based on some posts I found while looking into this, I did try the following. It seems to be similar to the above, but I could be wrong (I have been proven wrong from time to time):
Stop the service.

sudo rm -rf /var/lib/icinga2/api/{packages,zones,zones-stage}/*

Start the service again.
After this, in the log tail I can see it gets the configuration from m1, but it still starts to send those mails again.

Yes, that will do the same :slight_smile:
It was worth a try, assuming the syncing is now complete and not faulty.

Have you tried enabling debug logging and then triggering an alert?

Not yet, since I’m not sure how to enable debug logging.
Triggering an alert is easy, they come every 30 minutes. I have replaced my email scripts with blanks so as not to get bombarded by emails while testing, but for now I have just shut down the icinga2 service on m2 and have no problems from m1.
But yes, the above was worth a try.

You can find that here:

Tried:

sudo icinga2 daemon -x debug

Just a flood of logs ... I will try dumping it to a file and see what I find, thanks.
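For reference, running the daemon in the foreground with -x debug prints everything to the terminal. A route that is usually easier to work with is the debuglog feature, which writes to a file. The following is a minimal sketch, assuming a packaged install managed by systemd and the default debug log path /var/log/icinga2/debug.log; both function names are made up for illustration.

```shell
# Sketch only: assumes a systemd-managed package install; run as root.
enable_debuglog() {
  icinga2 feature enable debuglog && systemctl restart icinga2
}

# Hypothetical helper: keep only notification-related lines so the flood
# is easier to read. $1 = log file (defaults to the assumed standard path).
notification_lines() {
  grep -i 'notification' "${1:-/var/log/icinga2/debug.log}"
}
```

After calling enable_debuglog, wait for the next unwanted mail and run `notification_lines | tail -n 50`. The debug log grows quickly, so `icinga2 feature disable debuglog` followed by a restart is worth doing once the test is over.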

Hmm, it seems like I lose the connection between m01 and m02 ... m02 then assumes it is the active one, or am I misreading the logs below?

[2020-11-24 13:58:58 +0100] information/JsonRpcConnection: The certificate for CN 'icingas01.cnt.int' is valid and uptodate. Skipping automated renewal.
[2020-11-24 13:59:00 +0100] information/JsonRpcConnection: Received certificate request for CN 'icingas02.cnt.int' signed by our CA.
[2020-11-24 13:59:00 +0100] information/JsonRpcConnection: The certificate for CN 'icingas02.cnt.int' is valid and uptodate. Skipping automated renewal.
[2020-11-24 13:59:57 +0100] information/JsonRpcConnection: No messages for identity 'icingam01.cnt.int' have been received in the last 60 seconds.
[2020-11-24 13:59:57 +0100] warning/JsonRpcConnection: API client disconnected for identity 'icingam01.cnt.int'
[2020-11-24 13:59:57 +0100] warning/ApiListener: Removing API client for endpoint 'icingam01.cnt.int'. 0 API clients left.
[2020-11-24 14:00:06 +0100] information/ApiListener: Reconnecting to endpoint 'icingam01.cnt.int' via host 'icingam01.cnt.int' and port '5665'
[2020-11-24 14:00:06 +0100] information/ApiListener: New client connection for identity 'icingam01.cnt.int' to [10.2.2.51]:5665
[2020-11-24 14:00:06 +0100] information/ApiListener: Sending config updates for endpoint 'icingam01.cnt.int' in zone 'master'.
[2020-11-24 14:00:06 +0100] information/ApiListener: Syncing configuration files for global zone 'global-templates' to endpoint 'icingam01.cnt.int'.
[2020-11-24 14:00:06 +0100] information/ApiListener: Syncing configuration files for zone 'master' to endpoint 'icingam01.cnt.int'.
[2020-11-24 14:00:06 +0100] information/ApiListener: Syncing configuration files for global zone 'director-global' to endpoint 'icingam01.cnt.int'.
[2020-11-24 14:00:06 +0100] information/ApiListener: Syncing configuration files for zone 'zone_bk' to endpoint 'icingam01.cnt.int'.
[2020-11-24 14:00:06 +0100] information/ApiListener: Finished sending config file updates for endpoint 'icingam01.cnt.int' in zone 'master'.
[2020-11-24 14:00:06 +0100] information/ApiListener: Syncing runtime objects to endpoint 'icingam01.cnt.int'.
[2020-11-24 14:00:06 +0100] information/ApiListener: Finished syncing runtime objects to endpoint 'icingam01.cnt.int'.
[2020-11-24 14:00:06 +0100] information/ApiListener: Finished sending runtime config updates for endpoint 'icingam01.cnt.int' in zone 'master'.
[2020-11-24 14:00:06 +0100] information/ApiListener: Sending replay log for endpoint 'icingam01.cnt.int' in zone 'master'.
[2020-11-24 14:00:06 +0100] information/ApiListener: Replayed 578 messages.
[2020-11-24 14:00:06 +0100] information/ApiListener: Finished sending replay log for endpoint 'icingam01.cnt.int' in zone 'master'.
[2020-11-24 14:00:06 +0100] information/ApiListener: Finished syncing endpoint 'icingam01.cnt.int' in zone 'master'.
[2020-11-24 14:00:06 +0100] information/ApiListener: Finished reconnecting to endpoint 'icingam01.cnt.int' via host 'icingam01.cnt.int' and port '5665'
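The sequence in this excerpt (no messages for 60 seconds, disconnect, reconnect, replay) can be pulled out of a longer log to see how often it happens. A minimal sketch, assuming the main log is at /var/log/icinga2/icinga2.log and keeps the timestamp format shown above; the helper name is hypothetical.

```shell
# Hypothetical helper: print the timestamp of every dropped API connection
# to one endpoint. $1 = endpoint name, $2 = log file.
peer_disconnects() {
  # -F matches the text literally, so dots in the endpoint name are not wildcards.
  # Characters 2-26 of each line are the timestamp between the square brackets.
  grep -F "API client disconnected for identity '$1'" "$2" | cut -c2-26
}
```

Called as `peer_disconnects icingam01.cnt.int /var/log/icinga2/icinga2.log`, it prints one timestamp per drop. If the drops line up with the times the unwanted notifications went out, that would support the theory that m02 takes over whenever it loses sight of m01.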

No, I think you are right there. I assume both hosts are in 10.2.2.X/24 and in the same datacenter? And if you ping them, is it a 125 ms response?

I found these based on your logs:

Perhaps it's worth your time reading this

Strangely, this started after a reboot of all hosts.
Yes, all are hosted on-prem, in the same VM cluster.
I'll go read and will report back.
Thanks for the help so far.

I had a similar problem in the past where the state file was not in sync because of earlier outages that lasted longer than the replay log retention.

Have a look at https://icinga.com/docs/icinga2/latest/doc/06-distributed-monitoring/#initial-sync-for-new-endpoints-in-a-zone

Hi Dirk, yes, I have read this before. However, I don’t think it is relevant, since I already removed all sync data, see above:

Maybe I should also explain my setup in a bit more detail.

Master1 and Master2 - Zone Master -> dedicated Master HA responsible only for checking Icinga, core network and sending notifications.
Satelite1 and Satelite2 - Zone BK -> dedicated for running hostalive and custom service checks, >3300 hosts, 5 services per host
Graph01 - Zone GR -> dedicated icingaweb2 and Grafana server
DataBase01 - Zone DB -> dedicated DB server

contents of /etc/icinga2/zones.conf

object Endpoint "Master1" {
        host = "Master1"
}

object Endpoint "Master2" {
        host = "Master2"
}

object Endpoint "Satelite1" {
        host = "Satelite1"
}

object Endpoint "Satelite2" {
        host = "Satelite2"
}

object Endpoint "Graph01" {
        host = "Graph01"
}

object Endpoint "DataBase01" {
        host = "DataBase01"
}


object Zone "master" {
        endpoints = [ "Master1","Master2" ]
}

object Zone "zone_bk" {
        endpoints = ["Satelite1","Satelite2"]
        parent = "master"
}

object Zone "zone_gr" {
        endpoints = [ "Graph01" ]
        parent = "master"
}

object Zone "zone_db" {
        endpoints = [ "DataBase01" ]
        parent = "master"
}

object Zone "global-templates" {
        global = true
}

object Zone "director-global" {
        global = true
}

Now all works fine as long as master 2 is shut down (or the icinga2 service is stopped).
If I start the service, I can see in the log that it syncs with Master1, but after some time it starts to send emails for already acknowledged hosts and services.
Based on the logs posted above, it seems master 2 loses the connection to m1, takes control, reconnects and tries to push the m2 config to m1, since it thinks m1 was down. M1 rejects this since it was not down. Somehow m2 then starts sending those notifications, but I can’t seem to put my finger on why.
I have restarted all servers again (after the upgrade yesterday).
Currently I just leave m2 shut down.
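One way to test the theory above is to ask each master directly whether it still considers a host acknowledged. A minimal sketch using the Icinga 2 REST API on port 5665; API_USER and API_PASS are placeholders for an ApiUser that is assumed to exist already on both masters, and the function names are hypothetical.

```shell
# Hypothetical helper: build the REST API URL that returns only the
# acknowledgement attribute of one host object.
# $1 = master to query, $2 = name of the monitored host object
ack_url() {
  printf 'https://%s:5665/v1/objects/hosts/%s?attrs=acknowledgement\n' "$1" "$2"
}

# Query one master. API_USER and API_PASS are placeholders (assumption).
# -k skips TLS verification and is only acceptable for a quick internal test.
ack_state() {
  curl -k -s -u "${API_USER}:${API_PASS}" "$(ack_url "$1" "$2")"
}
```

Running ack_state once against m1 and once against m2 for the same acknowledged host, then comparing the acknowledgement value in the JSON (0 = none, 1 = normal, 2 = sticky), would show whether m2 has lost the acknowledgement. Host object names containing spaces or other special characters would need URL encoding first, which this sketch does not do.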

Hi @grootwitbaas

Master1 and Master2 - Zone Master -> dedicated Master HA responsible only for checking Icinga, core network and sending notifications.
Satelite1 and Satelite2 - Zone BK -> dedicated for running hostalive and custom service checks, >3300 hosts, 5 services per host
Graph01 - Zone GR -> dedicated icingaweb2 and Grafana server
DataBase01 - Zone DB -> dedicated DB server

Why not make an extra satellite in the master zone:

zone1:
- master <- sat1,2,3
- sat1 <- agents

zone2:
- sat2

and put the DB and grapher in that zone and keep things a little simpler?
We run it that way to offload all tasks to the satellites and keep the master free for graphing and DB stuff.

I am sure you thought about this :slight_smile: but the way it is now, you are kind of mixing 2 Icinga topologies.

From my understanding, any zone, master or satellite, can only have 2 servers; agents can be more. This was pointed out to me before (Thilo from Netways also confirmed this during our setup some years ago) and it is also in the documentation https://icinga.com/docs/icinga2/latest/doc/06-distributed-monitoring/#distributed-monitoring-scenarios (but again, I have been proven wrong in my life :slight_smile: )
I have no agents; all service checks are done from the satellites (load-balanced) and this works.
I run the icinga service on the DB and grapher only to check their status and show the complete cluster.

Hmm, interesting. I have not yet come across that statement in the documentation, hence I am running 3 satellites under the master :slight_smile: (all in their own zone: US1, US2 and EU)

As I only have one master, it is hard to replicate your problem :slight_smile: I'll see if I get to play with some VMs in my spare time :smiley:

Yes, you have them in zones ... that's it, per zone only 2. You can have 20 on the same master, but per zone only 2. Got to work on my reading skillz :smiley:
