A question about zones

gsf · May 9, 2019, 9:55am

Hello there!

I have a question regarding zones. I know about the bug (no more than 2 endpoints per zone), so I thought about doing something like this:

1 “meta” zone which has the master zone as parent, called “META-ZONE”
Let’s say 2 satellite zones with satellite endpoints in them, each with “META-ZONE” as parent, called “SAT-ZONE-1” and “SAT-ZONE-2”
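
For illustration, the layout described here might look roughly like the following in zones.conf. This is only a sketch: the endpoint names and the name of the master zone ("master") are made up, and whether a zone without endpoints behaves as hoped is exactly the open question.

```
// zones.conf sketch; all endpoint names and the "master" zone name are hypothetical
object Endpoint "master1.example.com" { }
object Endpoint "sat1a.example.com" { }
object Endpoint "sat1b.example.com" { }
object Endpoint "sat2a.example.com" { }
object Endpoint "sat2b.example.com" { }

object Zone "master" {
  endpoints = [ "master1.example.com" ]
}

// The intermediate "meta" zone: it has a parent but no endpoints of its own
object Zone "META-ZONE" {
  parent = "master"
}

// Two satellite zones with two endpoints each, both children of META-ZONE
object Zone "SAT-ZONE-1" {
  parent = "META-ZONE"
  endpoints = [ "sat1a.example.com", "sat1b.example.com" ]
}

object Zone "SAT-ZONE-2" {
  parent = "META-ZONE"
  endpoints = [ "sat2a.example.com", "sat2b.example.com" ]
}
```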

Now, if I were to add a host to the “META-ZONE”, would the satellites check the new host?

If not, what are the best practices regarding zone management? I’d rather not have 1 satellite zone with 2 humungous satellites in it.

Thanks in advance!

blakehartshorn · May 9, 2019, 2:14pm

Hi Giuseppe,

Can I get a better idea of the layout of your infrastructure? Icinga 2 is pretty efficient. I’ve got more than 20,000 service checks running in my various satellite zones and it only takes a pair of small virtual machines in each. Not sure why we would need these buffers.

If you put something within a parent zone, the child zones aren’t going to see it unless it’s defined as global. Putting hosts in there has the potential to make a mess, though.
(my lingo on this last line might have been totally wrong and I am currently too foggy to fix it.)
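
To make that distinction concrete, here is a minimal sketch. It assumes config is deployed from the master via zones.d; the zone name "global-templates", the file path, the host name, and the address are illustrative only. A zone marked global is synced to all endpoints, while a host placed in a satellite zone's directory belongs to that zone and is checked by that zone's endpoints.

```
// zones.conf: a global zone, synced to every endpoint (the name is a common convention, not required)
object Zone "global-templates" {
  global = true
}

// zones.d/SAT-ZONE-1/hosts.conf on the config master (hypothetical file and host)
// Objects in this directory belong to SAT-ZONE-1, not to its parent zone.
object Host "app01.example.com" {
  check_command = "hostalive"
  address = "192.0.2.10"
}
```

Templates and commands shared by all zones are the usual candidates for the global zone; host objects generally go into the zone that should check them.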

gsf · May 9, 2019, 3:01pm

Hi Blake,

Thank you for your answer. We have more than 1,700 hosts we want to monitor with about three dedicated satellite zones. One zone is going to be pretty big, with about 90% of the hosts I mentioned.

Right now we have about 10,000 service checks in our old environment and this is with about 800 hosts.

Because of that we want to have more than two satellites in that zone. But it seems like you’re saying two would be sufficient for this number of hosts?

What you described in the second part makes perfect sense; I suspected that, too. Just wanted to be sure.

blakehartshorn · May 9, 2019, 3:09pm

Oh you’re going to be fine in that case. I have 2 satellites in my Vegas DC, which is our biggest zone. Roughly 1,000 hosts and 20,000 service checks in that zone. On top of that, I have a ton of checks written in Python that run locally on the satellites about every 90 seconds each. The satellites have 6 CPU cores each, hovering around 60% usage at the moment, and Icinga is only using 587M of memory.

You could probably start with fewer cores if you’re less abusive than I am. Increase as needed. Give a little extra to your masters if they’re also monitoring a DC.

gsf · May 9, 2019, 3:14pm

Thanks a lot! You helped a ton with that.

dnsmichi · May 9, 2019, 3:32pm

Also note that an old environment being Icinga 1.x or Nagios Core means limited scaling horizontally. Icinga 2 takes the CPUs and resources it gets assigned, and is optimized for less resource usage too. If not, that’s typically a bug - that’s also a reason for rewriting the network stack in 2.11 for example.

20k services is something where I would say you should have at least 4 cores, better 8. Depending on the checks, 8 GB RAM - if the masters run MySQL and/or Graphite, double this.

If you experience problems with too many long-running checks executing in parallel, try the MaxConcurrentChecks setting.
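
As a rough sketch, MaxConcurrentChecks is a global constant and would typically be set in constants.conf on the node that executes the checks. The value below is purely illustrative, not a recommendation; as far as I know the documented default is 512.

```
// constants.conf on the satellite (the value 256 is an arbitrary example)
const MaxConcurrentChecks = 256
```

Restart Icinga 2 after changing it, then watch check latency and load before adjusting further.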

Cheers,
Michael