Satellite load balancing

Hello,

I have deployed an HA satellite setup with two satellites in the same zone. I was expecting the satellites to load-balance my checks, as per the documentation. However, looking at the tcpdump output, the checks (hostalive) are performed by both satellites instead.

High-level info

  • Icinga : r2.13.1-1
  • Operating System : Ubuntu 20.04.3 LTS with 5.4.0-1045-aws kernel.
  • Enabled features (master) : api checker ido-pgsql influxdb2 mainlog notification
  • Enabled features (satellites) : api checker mainlog

Config files

zones.conf - master

object Zone "global-templates" {
  global = true
}

object Endpoint "master01.lab" { }

object Zone "master" {
  endpoints = [ "master01.lab" ]
}

object Endpoint "satellite01.lab" {
  # log_duration = 0       # Use command mode instead; does not help.
}

object Endpoint "satellite02.lab" {
  # log_duration = 0       # Use command mode instead; does not help.
}

object Zone "satellite-germany" {
  endpoints = [ "satellite01.lab", "satellite02.lab" ]
  parent = "master"
}

zones.conf - satellites

object Zone "global-templates" {
  global = true
}

object Endpoint "master-01.lab" {
  host = "10.0.10.100"
}

object Zone "master" {
  endpoints = [ "master-01.lab" ]
}

object Endpoint "satellite01.lab" {
  # log_duration = 0       # Use command mode instead; does not help.
  # host = "10.0.20.150"   # Let satellite02 connect to satellite01; does not help.
}

object Endpoint "satellite02.lab" {
  # log_duration = 0       # Use command mode instead; does not help.
}

object Zone "satellite-germany" {
  endpoints = [ "satellite01.lab", "satellite02.lab" ]
  parent = "master"
}

Host

object Host "ExampleHost" {
  address = "10.20.50.222"
  check_command = "hostalive"
  check_interval = 2m
  max_check_attempts = 10
  zone = "satellite-germany"
}

Tcpdump

Example tcpdump output below; both satellites ping the host every 2 minutes.

satellite01

09:58:01.375136 IP 10.0.20.150 > 10.20.50.222: ICMP echo request, id 58014, seq 1, length 64
09:58:01.403512 IP 10.20.50.222 > 10.0.20.150: ICMP echo reply, id 58014, seq 1, length 64
10:00:01.374583 IP 10.0.20.150 > 10.20.50.222: ICMP echo request, id 59444, seq 1, length 64
10:00:01.402910 IP 10.20.50.222 > 10.0.20.150: ICMP echo reply, id 59444, seq 1, length 64
10:02:01.374966 IP 10.0.20.150 > 10.20.50.222: ICMP echo request, id 60877, seq 1, length 64
10:02:01.403399 IP 10.20.50.222 > 10.0.20.150: ICMP echo reply, id 60877, seq 1, length 64
10:04:01.373899 IP 10.0.20.150 > 10.20.50.222: ICMP echo request, id 62305, seq 1, length 64
10:04:01.402275 IP 10.20.50.222 > 10.0.20.150: ICMP echo reply, id 62305, seq 1, length 64

satellite02

09:58:54.774500 IP 10.0.20.160 > 10.20.50.222: ICMP echo request, id 36910, seq 1, length 64
09:58:54.804295 IP 10.20.50.222 > 10.0.20.160: ICMP echo reply, id 36910, seq 1, length 64
10:00:54.773566 IP 10.0.20.160 > 10.20.50.222: ICMP echo request, id 38343, seq 1, length 64
10:00:54.803389 IP 10.20.50.222 > 10.0.20.160: ICMP echo reply, id 38343, seq 1, length 64
10:02:54.773816 IP 10.0.20.160 > 10.20.50.222: ICMP echo request, id 39768, seq 1, length 64
10:02:54.803616 IP 10.20.50.222 > 10.0.20.160: ICMP echo reply, id 39768, seq 1, length 64
10:04:54.773972 IP 10.0.20.160 > 10.20.50.222: ICMP echo request, id 41196, seq 1, length 64
10:04:54.803758 IP 10.20.50.222 > 10.0.20.160: ICMP echo reply, id 41196, seq 1, length 64
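
A quick way to read the captures above is to compute the gap between consecutive echo requests on each satellite. The following is only a sketch: the sample lines are copied from the output above, and the helper name request_intervals is made up for illustration.

```python
from datetime import datetime

# Echo request lines copied from the two captures in this post.
SATELLITE01 = """\
09:58:01.375136 IP 10.0.20.150 > 10.20.50.222: ICMP echo request, id 58014, seq 1, length 64
10:00:01.374583 IP 10.0.20.150 > 10.20.50.222: ICMP echo request, id 59444, seq 1, length 64
10:02:01.374966 IP 10.0.20.150 > 10.20.50.222: ICMP echo request, id 60877, seq 1, length 64
"""

SATELLITE02 = """\
09:58:54.774500 IP 10.0.20.160 > 10.20.50.222: ICMP echo request, id 36910, seq 1, length 64
10:00:54.773566 IP 10.0.20.160 > 10.20.50.222: ICMP echo request, id 38343, seq 1, length 64
10:02:54.773816 IP 10.0.20.160 > 10.20.50.222: ICMP echo request, id 39768, seq 1, length 64
"""


def request_intervals(dump):
    """Return the gaps, rounded to whole seconds, between echo requests."""
    times = [
        datetime.strptime(line.split()[0], "%H:%M:%S.%f")
        for line in dump.splitlines()
        if "ICMP echo request" in line
    ]
    return [round((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


print("satellite01:", request_intervals(SATELLITE01))
print("satellite02:", request_intervals(SATELLITE02))
```

Each satellite sends a request on the full 2-minute check_interval, which suggests both are scheduling the check on their own rather than sharing it.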

Hi,

Can you please show us the running configuration details for the host in question? Use the following command to fetch the running configuration for the host object (the pattern is quoted so the shell does not expand it):

icinga2 object list --name '*Example*' --type Host

Please run the command on both satellites and post the output.

Best regards
Michael

Here you go. I am not sure if it makes a difference, but that host was added through the API.

Note that I did omit some of the config that didn’t seem relevant (e.g. retry interval or ping packet count) and/or was sensitive (e.g. custom variables). If you need to see some of this info let me know.

satellite01

Object 'ExampleHost' of type 'Host':
  % declared in '/var/lib/icinga2/api/packages/_api/491ad95a-5a14-42d4-92c7-2e7c5ac0d5b1/conf.d/hosts/ExampleHost.conf', lines 1:0-1:29
  * __name = "ExampleHost"
  * action_url = ""
  * address = "10.20.50.222"
    % = modified in '/var/lib/icinga2/api/packages/_api/491ad95a-5a14-42d4-92c7-2e7c5ac0d5b1/conf.d/hosts/ExampleHost.conf', lines 4:2-4:27
  * address6 = ""
  * check_command = "hostalive"
    % = modified in '/var/lib/icinga2/api/zones/global-templates/_etc/hosts.conf', lines 4:5-4:31
  * check_interval = 120
    % = modified in '/var/lib/icinga2/api/zones/global-templates/_etc/hosts.conf', lines 6:5-6:23
  * check_period = ""
  * check_timeout = null
  * command_endpoint = ""
  * display_name = "ExampleHost"
  * enable_active_checks = true
  * enable_event_handler = true
  * enable_flapping = false
  * enable_notifications = true
  * enable_passive_checks = true
  * enable_perfdata = true
  * event_command = ""
  * flapping_ignore_states = null
  * flapping_threshold = 0
  * flapping_threshold_high = 30
  * flapping_threshold_low = 25
  * groups = [ ]
  * icon_image = ""
  * icon_image_alt = ""
  * max_check_attempts = 10
    % = modified in '/var/lib/icinga2/api/zones/global-templates/_etc/hosts.conf', lines 11:5-11:27
  * name = "ExampleHost"
  * notes = ""
  * notes_url = ""
  * package = "_api"
  * retry_interval = 60
    % = modified in '/var/lib/icinga2/api/zones/global-templates/_etc/hosts.conf', lines 9:5-9:23
  * source_location
    * first_column = 0
    * first_line = 1
    * last_column = 29
    * last_line = 1
    * path = "/var/lib/icinga2/api/packages/_api/491ad95a-5a14-42d4-92c7-2e7c5ac0d5b1/conf.d/hosts/ExampleHost.conf"
  * templates = [ "ExampleHost", "host-basic" ]
    % = modified in '/var/lib/icinga2/api/packages/_api/491ad95a-5a14-42d4-92c7-2e7c5ac0d5b1/conf.d/hosts/ExampleHost.conf', lines 1:0-1:29
    % = modified in '/var/lib/icinga2/api/zones/global-templates/_etc/hosts.conf', lines 2:1-2:26
  * type = "Host"
  * vars
    [...OMITTED...]
  * volatile = false
  * zone = "sattelite-germany"
    % = modified in '/var/lib/icinga2/api/packages/_api/491ad95a-5a14-42d4-92c7-2e7c5ac0d5b1/conf.d/hosts/ExampleHost.conf', lines 15:2-15:27

satellite02

Object 'ExampleHost' of type 'Host':
    % declared in '/var/lib/icinga2/api/packages/_api/71a1f45e-daed-41c8-a7d3-1c0c9dd7b5c5/conf.d/hosts/ExampleHost.conf', lines 1:0-1:29
    * __name = "ExampleHost"
    * action_url = ""
    * address = "10.20.50.222"
      % = modified in '/var/lib/icinga2/api/packages/_api/71a1f45e-daed-41c8-a7d3-1c0c9dd7b5c5/conf.d/hosts/ExampleHost.conf', lines 4:2-4:27
    * address6 = ""
    * check_command = "hostalive"
      % = modified in '/var/lib/icinga2/api/zones/global-templates/_etc/hosts.conf', lines 4:5-4:31
    * check_interval = 120
      % = modified in '/var/lib/icinga2/api/zones/global-templates/_etc/hosts.conf', lines 6:5-6:23
    * check_period = ""
    * check_timeout = null
    * command_endpoint = ""
    * display_name = "ExampleHost"
    * enable_active_checks = true
    * enable_event_handler = true
    * enable_flapping = false
    * enable_notifications = true
    * enable_passive_checks = true
    * enable_perfdata = true
    * event_command = ""
    * flapping_ignore_states = null
    * flapping_threshold = 0
    * flapping_threshold_high = 30
    * flapping_threshold_low = 25
    * groups = [ ]
    * icon_image = ""
    * icon_image_alt = ""
    * max_check_attempts = 10
      % = modified in '/var/lib/icinga2/api/zones/global-templates/_etc/hosts.conf', lines 11:5-11:27
    * name = "ExampleHost"
    * notes = ""
    * notes_url = ""
    * package = "_api"
    * retry_interval = 60
      % = modified in '/var/lib/icinga2/api/zones/global-templates/_etc/hosts.conf', lines 9:5-9:23
    * source_location
      * first_column = 0
      * first_line = 1
      * last_column = 29
      * last_line = 1
      * path = "/var/lib/icinga2/api/packages/_api/71a1f45e-daed-41c8-a7d3-1c0c9dd7b5c5/conf.d/hosts/ExampleHost.conf"
    * templates = [ "ExampleHost", "host-basic" ]
      % = modified in '/var/lib/icinga2/api/packages/_api/71a1f45e-daed-41c8-a7d3-1c0c9dd7b5c5/conf.d/hosts/ExampleHost.conf', lines 1:0-1:29
      % = modified in '/var/lib/icinga2/api/zones/global-templates/_etc/hosts.conf', lines 2:1-2:26
    * type = "Host"
    * vars
      [...OMITTED...]
    * volatile = false
    * zone = "sattelite-germany"
      % = modified in '/var/lib/icinga2/api/packages/_api/71a1f45e-daed-41c8-a7d3-1c0c9dd7b5c5/conf.d/hosts/ExampleHost.conf', lines 15:2-15:27

Hi,

The satellite hosts need a connection between them. Otherwise they run in split-brain mode: each satellite thinks the other one is down and that it has to take over, so both end up running every check.
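
To make the split-brain effect concrete, here is a toy model. It is an assumption for illustration only and not Icinga 2's actual scheduling code: each node hashes the object name and picks one endpoint from the list of zone endpoints it can currently see. The function names and the use of crc32 are made up for this sketch.

```python
import zlib


def authoritative_endpoint(object_name, visible_endpoints):
    """Pick the endpoint that should run the check for object_name.

    Toy rule (an assumption, not Icinga 2's real algorithm): hash the
    object name and index into the sorted list of visible endpoints.
    """
    candidates = sorted(visible_endpoints)
    return candidates[zlib.crc32(object_name.encode()) % len(candidates)]


def runs_check(me, object_name, visible_endpoints):
    """Return True if endpoint `me` considers itself responsible."""
    return authoritative_endpoint(object_name, visible_endpoints) == me


zone = ["satellite01.lab", "satellite02.lab"]

# Connected: both satellites see the same endpoint list,
# so exactly one of them runs the check.
connected = [s for s in zone if runs_check(s, "ExampleHost", zone)]

# Split-brain: each satellite only sees itself, so both run the check.
split_brain = [s for s in zone if runs_check(s, "ExampleHost", [s])]

print("connected:  ", connected)
print("split-brain:", split_brain)
```

In the connected case the list has a single entry; in the split-brain case it contains both satellites, which matches the duplicate pings seen in the tcpdump output.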

Adjust your zones.conf file on both satellites as follows:

zones.conf - satellite-1

[...]
object Endpoint "satellite01.lab" {
  # that's us
}

object Endpoint "satellite02.lab" {
  host = "<ip address of satellite-2>"
}

object Zone "satellite-germany" {
  endpoints = [ "satellite01.lab", "satellite02.lab" ]
  parent = "master"
}
[...]

zones.conf - satellite-2

[...]
object Endpoint "satellite01.lab" {
  host = "<ip address of satellite-1>"
}

object Endpoint "satellite02.lab" {
  # that's us
}

object Zone "satellite-germany" {
  endpoints = [ "satellite01.lab", "satellite02.lab" ]
  parent = "master"
}
[...]

Now restart Icinga 2 on both satellite hosts.

Best regards
Michael

That was it. I had tried that config before, but I messed up the firewall rules and the satellites were unable to talk to each other. It is working as expected now. Thanks!
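
For anyone hitting the same firewall problem, a small reachability test can rule it out before touching zones.conf. This is a minimal sketch, assuming the satellites listen on Icinga 2's default cluster port 5665; the function name can_connect is made up. It only checks that a TCP connection can be opened, not that the TLS handshake or the cluster connection itself succeeds.

```python
import socket


def can_connect(host, port=5665, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example (run on satellite01, then the reverse on satellite02):
#   can_connect("10.0.20.160")
```

With the zones.conf shown earlier in the thread, each satellite has a host attribute for the other, so it makes sense to run the test in both directions.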