Load balancing on satellites not working as expected

Hello Community,

I am currently running a high-availability cluster, and this is the setup for this question:
(master01/master02) → satellite01 (internal) → (satellite01/satellite02) (external).
Inside the external satellite zone, about 1k hosts (14k services) are checked via NRPE from the external satellites.
The setup with two satellites in the external zone (previously there was only one) has been running for a few days now, and I expected both satellites to load-balance the checks. But judging by the process counts, satellite01 has <200 procs while satellite02 has >400. Comparing systemctl status icinga2.service also shows that satellite02 is clearly running more checks than satellite01.
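
For reference, the masters' zones.conf mirrors this hierarchy roughly as follows (only a sketch; the zone name "master" is assumed, the endpoint names are taken from the description above, and the real IPs are omitted):

object Endpoint "master01" { }

object Endpoint "master02" {
        host = "ip"
        port = "5665"
}

object Zone "master" {
        endpoints = [ "master01", "master02" ]
}

object Endpoint "icingaproxy01.internal.de" {
        host = "ip"
        port = "5665"
}

object Zone "icingaproxy01.internal.de" {
        endpoints = [ "icingaproxy01.internal.de" ]
        parent = "master"
}

// plus the external zone and global-templates, as in the satellite configs below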

Each master/satellite is checked by the icinga check. I tried to read the active_service_checks metric, but the value is the same for both satellites.

  1. Shouldn't the value be different if one satellite does more checks than the other?

  2. Any idea why the load balancing is not working properly?

satellite01 (external) zones.conf (I replaced the real domains with .internal and .external, as in the description above):

object Endpoint "icingaproxy01.internal.de" {
        host = "ip"
        port = "5665"
}

object Zone "icingaproxy01.internal.de" {
        endpoints = [ "icingaproxy01.internal.de" ]
}

object Endpoint "icingaproxy01.external.de" {
}

object Endpoint "icingaproxy02.external.de" {
        host = "ip"
        port = "5665"
}

object Zone "icingaproxy01.external.de" {
        endpoints = [ "icingaproxy01.external.de", "icingaproxy02.external.de" ]
        parent = "icingaproxy01.internal.de"
}

object Zone "global-templates" {
        global = true
}

satellite02 (external) zones.conf:

object Endpoint "icingaproxy01.internal.de" {
        host = "ip"
        port = "5665"
}

object Zone "icingaproxy01.internal.de" {
        endpoints = [ "icingaproxy01.internal.de" ]
}

object Endpoint "icingaproxy01.external.de" {
        host = "ip"
        port = "5665"
}

object Endpoint "icingaproxy02.external.de" {
}

object Zone "icingaproxy01.external.de" {
        endpoints = [ "icingaproxy01.external.de", "icingaproxy02.external.de" ]
        parent = "icingaproxy01.internal.de"
}

object Zone "global-templates" {
        global = true
}
  • Version used: r2.13.2-1
  • Operating system and version: Debian 11
  • Enabled features: api checker mainlog (satellites); api checker ido-mysql influxdb2 mainlog notification (masters)

Hello @rafi01010!

Unfortunately, Icinga doesn't distribute checkables perfectly equally; it assigns each one to an endpoint based on isEven(hash(name)).

If you shut down one of the two sats and the other one ends up with all 600 procs, and the other way around as well, you know everything's configured fine; you can't do much about the split itself.
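
In other words, each checkable is pinned to one endpoint purely by its name, never by current load. A minimal Python sketch of the idea (not Icinga's actual code; md5 merely stands in for the internal hash, and the checkable names are made up):

import hashlib

def is_even(n):
    return n % 2 == 0

def assign(name, endpoints):
    # the mapping depends only on the checkable's name, never on load
    h = int(hashlib.md5(name.encode()).hexdigest(), 16)
    return endpoints[0] if is_even(h) else endpoints[-1]

endpoints = ["icingaproxy01.external.de", "icingaproxy02.external.de"]
checkables = [f"host{i:04d}!nrpe-{j}" for i in range(1000) for j in range(14)]

counts = {e: 0 for e in endpoints}
for c in checkables:
    counts[assign(c, endpoints)] += 1
print(counts)  # deterministic, but not guaranteed to be a 50/50 split

# failover: with only one endpoint alive, it receives everything
assert all(assign(c, [endpoints[0]]) == endpoints[0] for c in checkables)

Because the mapping depends only on the names, the split is deterministic but not necessarily even for a real set of host and service names.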

Best,
A/K

Hello @Al2Klimov,
sorry for my late reply.
Then I think everything is working fine :+1:

Can you tell me something about my first question with the metrics?

Maybe the icinga checks aren't pinned via command_endpoint to the nodes they are supposed to check?

But that's the point: the two satellites split the checks, and I don't want to pin the checks to one specific endpoint. Both satellites show the sum of both satellites in the metric, not each the number of its own checks.

Please share how this is configured.

apply Service "Icinga" {
    import "generic-service"

    check_command = "icinga"
    command_endpoint = host.vars.client_endpoint

    assign where "icinga" in host.groups && host.vars.client_endpoint
}

Each Icinga master/satellite host is in the hostgroup "icinga", and a client_endpoint is defined (the FQDN of the master/satellite).
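
For illustration, one of those host objects looks roughly like this (the template name and address are placeholders):

object Host "icingaproxy02.external.de" {
    import "generic-host"

    address = "ip"
    groups = [ "icinga" ]
    vars.client_endpoint = name    // the object name is the FQDN, reused as the endpoint name
}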

And this one yields equal metrics despite different process counts?

The values are slightly different because I didn't take the second screenshot immediately.
The check source is also always the respective server.

icingaproxy01: [screenshot]

icingaproxy02: [screenshot]