Clustering issues and apparently random assignment of Recurring Downtimes

Thanks for your efforts, very appreciated… I wonder where my mistake is.
Do you use the latest CentOS 7 too?

CentOS 8 :slight_smile: But the old setup was on the latest CentOS 7 and also worked.

I found my mistake… I missed the parent directive in the child zone definition. Guess this is what we call “Betriebsblind” (too familiar with your own setup to see the obvious).
I’ll correct the configs on the soon-to-be-production machines and check whether my downtime creation problem vanishes as well.
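
For anyone running into the same thing, this was the missing piece; a minimal sketch of a child zone definition with placeholder endpoint and zone names (not my actual config):

object Endpoint "satellite1.example.org" {
}

object Zone "site-hal" {
  endpoints = [ "satellite1.example.org" ]
  parent = "master"   // this parent directive was what I had forgotten
}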

2 Likes

Further testing showed that the cluster communication works fine.
My initial problem with the ScheduledDowntimes persisted, however: only about 50% of the hosts got the actual downtime:

apply ScheduledDowntime "test3" to Host {
  author = "someone"
  comment = "no comment"
  ranges = { "2019-11-25" = "10:10-10:15" }
  assign where match("*hal*", host.name) // matches all 100 dummy-hal-XXX hosts
  //assign where match("*kln*", host.name) // same behaviour
}

The ScheduledDowntime is defined in the global-templates zone, the dummy hosts in their zones site-hal and site-kln.
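
For context, the layout on the config master is roughly the following (file names are illustrative, not my exact config):

/etc/icinga2/zones.d/global-templates/downtimes.conf   # the apply ScheduledDowntime rule above
/etc/icinga2/zones.d/site-hal/hosts.conf               # dummy-hal-XXX hosts
/etc/icinga2/zones.d/site-kln/hosts.conf               # dummy-kln-XXX hosts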

icinga2 daemon -C shows 100 ScheduledDowntime Objects on all three servers, but when the actual Downtime Objects are created, the following happens:
master1: Instantiated 51 Downtimes (49 in other iterations of the tests)
master2: Instantiated 100 Downtimes
satellite: Instantiated 100 Downtimes (I once saw 149 Objects for a short moment when the downtimes were for the other zone)

Following the icinga2.log on all machines, it occurred to me that downtime creation may be a distributed task in an HA zone, and that master1 ignores all objects created on its partner because accept_config = false in features-enabled/api.conf.
After changing that setting and restarting, the behaviour is as expected: 100 of 100 hosts get their downtime objects.
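
For reference, the relevant bit in features-enabled/api.conf on master1 now looks roughly like this (a minimal sketch, other ApiListener attributes omitted):

object ApiListener "api" {
  accept_config = true   // was false before; with false, master1 ignored downtime objects created on master2
}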

The current documentation isn’t clear about that setting on the config master, and I am pretty sure that in older versions of Icinga 2 it had to be false on the config master to avoid conflicts between /etc/icinga2/zones.d and the API config.

@dnsmichi as you are following the topic: can you clarify this? Is it safe to have accept_config = true on master1?

I’m not really following closely, but regarding this statement: since 2.11 it is safe to enable accept_config on master1, since we’ve fixed a long-standing looping bug with downtimes/comments.

2 Likes

Thank you. It looks like my issues are all solved; after reactivating the Director and the production config, everything seems fine: 5784 of 5784 downtime objects are created.

Thanks Carsten and Michael for helping out :slight_smile:

1 Like