HA and LB functionality

Hey community,

Do you happen to know if the following configuration would work with Icinga2?

Master Zone:

  • 4x Masters within the same zone

Zone 1:

  • 4x Satellites within the same zone
  • 12x Clients with the 4 Satellites as parents

Zone 2:

  • 2x Satellites within the same zone
  • 12x Clients with the 2 Satellites as parents
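
In zones.conf terms, I mean roughly the following (all host names are placeholders, and the per-node Endpoint objects are omitted for brevity):

    object Zone "master" {
      endpoints = [ "master1", "master2", "master3", "master4" ]
    }

    object Zone "zone1" {
      endpoints = [ "sat1-1", "sat1-2", "sat1-3", "sat1-4" ]
      parent = "master"
    }

    object Zone "zone2" {
      endpoints = [ "sat2-1", "sat2-2" ]
      parent = "master"
    }

    // plus 12 client endpoints attached below each satellite zone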
    

Questions:

  • Would H/A work fine between masters?
  • Would H/A work fine between satellites within the same zone?
  • Would H/A work fine between Clients within the same zone?
  • Would Clients connect to the 2nd satellite if the 1st goes down?
  • Would balancing work?
  • Should I specify all client endpoints on the satellite nodes?
  • Should all clients within a zone know about each other in case we have 12 clients?

Asking due to the well-known bug https://github.com/Icinga/icinga2/issues/3533.

Thanks everyone for helping!

Hi,

the issue is still open, so no, the master and satellite zones cannot be built like this. Reduce the endpoints per zone to 2, and then the following answers apply:

Yes.

Yes.

Yes, but that doesn’t make sense. Agents/Clients should only live in their own local zone.

Agents/satellites need to have all their parent endpoints configured. They will actively connect to them if the host attribute in the Endpoint objects is specified. Either way, the cluster needs these active connections; there is no "first left, then right" decision making here.
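
A minimal sketch of what that looks like from the child’s side, assuming placeholder names and addresses:

    // zones.conf on the connecting child (agent or satellite)
    object Endpoint "sat1.example.com" {
      host = "192.0.2.11"   // "host" set: this instance actively connects to the parent
    }

    object Endpoint "sat2.example.com" {
      host = "192.0.2.12"
    }

    object Zone "satellite-zone" {
      endpoints = [ "sat1.example.com", "sat2.example.com" ]
    }

    // the child's own Endpoint/Zone objects are omitted here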

Balancing checkable and notification objects in an HA zone works just fine with 2 endpoints.

You need to; otherwise the satellite neither executes checks there nor trusts the agent/client when receiving its check results.

No, each agent/client needs its own zone. Best practice is to use the same name as the endpoint’s FQDN.
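
On the satellite, such a per-agent zone could look roughly like this (FQDN and address are placeholders):

    // zones.conf on the satellite
    object Endpoint "agent1.example.com" {
      host = "192.0.2.51"   // optional; if set, the satellite connects to the agent
    }

    object Zone "agent1.example.com" {   // zone name matches the endpoint's FQDN
      endpoints = [ "agent1.example.com" ]
      parent = "satellite-zone"          // the agent's parent (satellite) zone
    }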

Cheers,
Michael

Thank you very much @dnsmichi!

So, if we have each agent/client in its own zone, how do I specify which hosts each agent/client is responsible for?

Currently, I have the following:

Zone: satellite-europe
endpoints: satellite1, satellite2

Zone: europe
client 1, client 2, client 3, client 4

Zone: satellite-canada
endpoints: satellite1, satellite2

Zone: canada
client 1, client 2, client 3, client 4

If I specify that the infrastructure in Canada should be polled from the zone canada, everything seems to work fine, but I haven’t tried with more than 4 clients yet. How will the zones file look if each agent/client should have its own zone?

Will I need to specify the satellite zone (satellite-canada) to poll hosts from Canada only, instead of the canada zone? Or what is the better way to balance the number of hosts within a region?

Thank you in advance for help!

I’d suggest following the scenarios chapter in the docs to get a better idea of how things are organized with zones and endpoints. A zone isn’t necessarily a location or area; it is a defined container for trust between parents, children and same-level endpoints.

Thank you very much @dnsmichi! I really appreciate your help with my case! Owe you a lunch on the next meetup! :slight_smile:

Yeah, that’s the problem. I have read the scenarios and the entire documentation at least 10 times, but it doesn’t say anything about huge setups like the one I’m going to have. I totally understand that a zone isn’t a location, and the well-known bug completely breaks my theory.

Is there any example of balancing all hosts monitored on all agents/clients within a location? Or could you please describe your thoughts on that?

I’m guessing that in such complex setups the Icinga Director is useless, right?

Thanks!

Hi,

The original topic doesn’t really cover the scenario you’re going to build, so I am not sure where you’re heading. The third scenario in the docs builds a 3-level cluster, which is typically what large customer environments have: a master zone, many satellite zones where the checks are scheduled, and agents which only execute local checks.
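
As a rough zones.conf sketch of that 3-level tree (all names are placeholders, Endpoint objects omitted):

    object Zone "master" {
      endpoints = [ "master1.example.com", "master2.example.com" ]
    }

    object Zone "satellite-europe" {
      endpoints = [ "sat1.example.com", "sat2.example.com" ]
      parent = "master"
    }

    object Zone "agent1.example.com" {
      endpoints = [ "agent1.example.com" ]
      parent = "satellite-europe"
    }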

An agent/client is just an execution bridge for running local checks: disk, IO, load, etc., or any type of local access to an RDBMS which cannot be monitored remotely.

The satellite zone specifies where checks are scheduled; they may be executed on the agents via command endpoint, or run remotely from the satellite itself, e.g. tcp, ping or http checks.

Such a satellite zone can be put into a DMZ within a specific location. Giving it two endpoints ensures that the location is still reachable when one of them dies or reloads.

With multiple satellite zones, you’ll define the hosts and services to be checked in each of them: as said, either directly on the agents or as remotely accessible services.
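
For example, on the config master such definitions could be put into the satellite zone’s directory (names are placeholders; "hostalive", "disk" and "http" are standard ITL check commands):

    // zones.d/satellite-europe/agents.conf on the config master
    object Host "agent1.example.com" {
      check_command = "hostalive"
      address = "192.0.2.51"
    }

    // executed on the agent via command endpoint
    // (requires agent1.example.com to exist as Endpoint and child Zone of satellite-europe)
    object Service "disk" {
      host_name = "agent1.example.com"
      check_command = "disk"
      command_endpoint = "agent1.example.com"
    }

    // executed remotely by the satellite itself
    object Service "http" {
      host_name = "agent1.example.com"
      check_command = "http"
    }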

More

Once you’ve built such a scenario with a 3-level cluster, it works, and you know its edge cases and how to troubleshoot it, you can look into more complex setups.

E.g. adding another layer with two or more satellite zones: master → satellite-country → satellite-city → agent. This is fully supported and also works with the signing methods for setting up agents and satellites, as well as with config syncs and synchronized check results.
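
Configuration-wise, the extra layer is just another Zone with a parent, roughly (placeholder names):

    object Zone "satellite-city" {
      endpoints = [ "sat-city1", "sat-city2" ]
      parent = "satellite-country"   // which in turn has parent = "master"
    }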

Still, the more layers there are in your cluster tree, the more complex it gets. Typically, granting enough resources to the satellite endpoints executing the checks ensures that Icinga scales horizontally.

The key is to try things out before putting them into production. Or get a training or workshop on this, if it is still unclear.

Director

The Director can be used in any of the described scenarios; you just need to learn how agent hosts and cluster zones work in the background. I’ve found it very convenient, once you know how the Icinga distributed system works, of course.

Cheers,
Michael

Thank you very much @dnsmichi. I really appreciate your authoritative opinion on my case. I’d love to take a training, but based on the schedule there are no trainings in the US yet…

The scenario you described fits perfectly. I have a country and two cities; each city has 2 datacenters, and they are fully isolated from each other.

I just can’t understand 2 things:

  1. I have a single import from SQL of 10k hosts for a single datacenter. What zone should I specify if, as you say, I should keep the clients in separate zones?

  2. How do I tell the satellites to run active checks on the Clients (let’s say I have 30 Clients connected to a Satellite to monitor 10k hosts), or instead let the Clients balance the work among each other?

Thank you for your help!

If there is any option to sign up for a paid training or workshop in the US, I’d much appreciate any advice.

Thank you!

Hi,

in terms of training and more help, I’d suggest getting in touch with my colleagues who may route you better.

I’m not entirely sure what

"I have single import from SQL for 10k hosts for single datacenter"

means, but it likely refers to a Director import. The hosts should be put into the zone whose endpoints schedule the checks. If all these hosts belong to a country and a specific city, put them into the city zone.

Icinga knows about indirectly connected zones when they are configured in zones.conf, and will sync the city zone’s configuration to the country zone, whose endpoints forward it further down to the city endpoints.
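
A rough sketch of how that could look on the config master, assuming a zone named "satellite-city" (host name and address are placeholders; with the Director you’d achieve the same by setting the host’s cluster zone to the city zone):

    // /etc/icinga2/zones.d/satellite-city/hosts.conf on the config master
    object Host "db01.example.com" {
      check_command = "hostalive"
      address = "10.0.1.21"
    }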

The agent/client only needs its own Endpoint/Zone for configuration reasons. You don’t sync any local config to it from the zones.d directory.

In that description, a client on the second level is really a satellite, which actively executes the scheduled checks. When an agent/client instance of Icinga 2 runs on the monitored hosts themselves, that makes them agents.

Start simple. Install a test environment with a master, a satellite for a country, below it a satellite for a city, and then an agent, until you are able to schedule a check and see its result on the master.

Then continue with more agents, and once that works, continue with more cities and then countries.
Or get in touch with an Icinga partner as said :slight_smile:

Cheers,
Michael