HA satellites without using config sync

ibbm · May 9, 2019, 3:28pm

Hello,

I am currently trying to set up a highly available Icinga satellite zone. All nodes in the network receive their Icinga configuration from Puppet, i.e. we do not use Icinga's configuration synchronization mechanism (the conf.d directory is absent everywhere). The tutorials, how-tos and documentation all recommend and describe setups that use config sync (apart from using Icinga Director), but none of them says whether this is the only way to achieve HA.
Adding a second endpoint to the satellite zone causes certain check results for the client never to reach the master (at least, in icingaweb2 the "Next check" timer is in the past). I assume the affected checks are the ones scheduled by the secondary satellite.

The zones.conf is identical on icinga-satellite01-test.example.com and icinga-satellite02-test.example.com:

object Endpoint "icinga-client01-test.example.com" {
  host = "icinga-client01-test.example.com"
  log_duration = 0
}

object Endpoint "icinga-satellite01-test.example.com" {
  host = "icinga-satellite01-test.example.com"
  log_duration = 3600
}

object Endpoint "icinga-satellite02-test.example.com" {
  host = "icinga-satellite02-test.example.com"
  log_duration = 3600
}

object Endpoint "icinga-master01-test.example.com" {
  log_duration = 0
}

object Zone "director-global" {
  global = true
}

object Zone "global-templates" {
  global = true
}

object Zone "icinga-client01-test.example.com" {
  endpoints = [ "icinga-client01-test.example.com", ]
  parent = "test"
}

object Zone "master" {
  endpoints = [ "icinga-master01-test.example.com", ]
}

object Zone "test" {
  endpoints = [ "icinga-satellite01-test.example.com", "icinga-satellite02-test.example.com", ]
  parent = "master"
}

Because we use Puppet to manage the configuration in /etc/icinga2, accept_config is set to false:

object ApiListener "api" {
  accept_commands = true
  accept_config = false
}

The log on the secondary satellite (icinga-satellite02-test.example.com) at least shows connectivity between the involved endpoints:

notice/ApiListener: Current zone master: icinga-satellite01-test.example.com
notice/ApiListener: Connected endpoints: icinga-client01-test.example.com (1), icinga-satellite01-test.example.com (1) and icinga-master01-test.example.com (1)

The zone master (icinga-satellite01-test.example.com) rejects the results:

notice/ClusterEvents: Discarding 'next check changed' message for checkable 'icinga-client01-test.example.com!memory' from 'icinga-satellite02-test.example.com': Unauthorized access.

The satellites and client hosts run icinga2 r2.10.4-1.

Later on, we would also like to set up the master zone with two nodes, although I have no idea yet how to synchronize Icinga objects that are not managed by Puppet (announcements, acknowledgements, downtimes).
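For the two-node master zone mentioned above, the zones.conf on the satellites could be extended in the same style as the existing "test" zone. This is a minimal sketch that only covers the zone topology. It assumes a second master named icinga-master02-test.example.com, which is a hypothetical name and not part of the original setup:

```icinga2
// Sketch for the satellites' zones.conf; "icinga-master02-test.example.com" is hypothetical.
// As in the original config, the master endpoints have no host attribute here,
// so the masters connect down to the satellites, not the other way round.
object Endpoint "icinga-master01-test.example.com" {
  log_duration = 0
}

object Endpoint "icinga-master02-test.example.com" {
  log_duration = 0
}

object Zone "master" {
  endpoints = [ "icinga-master01-test.example.com", "icinga-master02-test.example.com", ]
}
```

On the masters themselves, at least one of the two master endpoints would additionally need a host attribute so that the two masters can connect to each other.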

dnsmichi · May 9, 2019, 3:36pm

Hi,

The docs, how-tos and blog posts are correct about this: running a distributed environment without the config sync is neither recommended nor best practice. You can somehow make it work by managing the objects with Puppet and setting the zone attribute, but debugging is hard, especially when messages are denied, check results arrive late, or a "Check now" in the web interface doesn't have the intended effect.

That being said, I remember a discussion about this a while ago where I said that I wouldn't help with that. Unfortunately for you, that's still the case, so why are you taking the hard way instead of using the config sync here?

Cheers,
Michael
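The approach Michael mentions, managing the objects with Puppet and setting the zone attribute explicitly, could look roughly like the following sketch. It assumes that Puppet writes identical object definitions into /etc/icinga2 on both satellites and that the client checks are executed via command_endpoint. The check commands are placeholders, not taken from the original setup. With config sync, objects placed in zones.d/test on the master would get this zone implicitly.

```icinga2
// Sketch only: deployed by Puppet, identical on icinga-satellite01 and icinga-satellite02.
object Host "icinga-client01-test.example.com" {
  address = "icinga-client01-test.example.com"
  check_command = "hostalive"   // placeholder check command
  zone = "test"                 // explicit zone instead of the implicit zones.d/test assignment
}

object Service "memory" {
  host_name = "icinga-client01-test.example.com"
  check_command = "mem"         // placeholder; substitute the CheckCommand your setup uses
  command_endpoint = "icinga-client01-test.example.com" // check is executed on the client
  zone = "test"                 // scheduled by the HA satellite zone "test"
}
```

The master presumably needs the same definitions as well, since it can only process check results for objects it knows about.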

ibbm · May 10, 2019, 6:49am

Hello,

Thank you for the clarification. The reason we want to manage the config with Puppet is that we get reports about what changed on each node. The Icinga config sync distributes the files on its own, but there is no record of when changes were actually applied on each node.

dnsmichi · May 10, 2019, 7:49am

Hi,

I'm not sure that's important to know from a Puppet report. Such sanity checks could be applied at runtime instead, e.g. by noticing when no check results are received from the satellite, or by checking the satellite's API for object existence when needed.

Anyhow, I am not a deep Puppet user in this regard; maybe @mfrosch or @bsheqa know better.

Cheers,
Michael

ch2 · December 5, 2022, 12:58pm

I am also currently encountering the message "notice/ClusterEvents: Discarding 'next check changed' message for checkable '…' from '…': Unauthorized access." in debug.log on a host that has both accept_config = true and accept_commands = true (i.e. with config sync). What could explain this message in that situation?