Troubleshooting on satellite

Hi

I have a problem with my Icinga configuration.
I have 2 masters and many satellites / zones. Everything works fine in all zones except one.

There are 2 satellites in this zone.
The zone has 5000 hosts with 2 services per host.
Sometimes one satellite goes down for whatever reason (hardware, network, …). When that happens, the load on my second satellite explodes (3 -> 300, for example).

Many of my checks then time out for as long as this lasts.
Once the problem on the first satellite is fixed, everything returns to normal within a few minutes.

Do you have any idea how to improve this?
Should I split my config across more satellites? Are there any options to change this behaviour?
Thanks for your help

icinga2 - The Icinga 2 network monitoring daemon (version: r2.11.3-1)

Alex

Hello Alex

One thing that can help with the load on the remaining satellite is to make both machines larger (more cores and RAM), so that each one can withstand the loss of the other.

Having more than 2 Satellites in the same zone is not recommended, as it can cause problems with replication and sync between the nodes and, in essence, cause a DoS between them and the masters.
What you may want to do is create another zone in parallel to that one and split the checks between the two zones, so that you have 2500 hosts per zone with 2 Satellites in each zone.
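
As a rough sketch of what that split could look like in zones.conf: all endpoint, zone and host names below are made up for illustration, and the parent zone name "master" is an assumption about your setup, so adapt them to your environment.

```
// zones.conf (sketch only; all names are hypothetical, not taken from this thread)

// Two satellites for the first half of the hosts
object Endpoint "sat-a1" { host = "sat-a1.example.com" }
object Endpoint "sat-a2" { host = "sat-a2.example.com" }

// Two satellites for the second half of the hosts
object Endpoint "sat-b1" { host = "sat-b1.example.com" }
object Endpoint "sat-b2" { host = "sat-b2.example.com" }

// Each zone keeps at most 2 endpoints and reports to the master zone
object Zone "zone-a" {
  endpoints = [ "sat-a1", "sat-a2" ]
  parent = "master"   // assumption: your master zone is named "master"
}

object Zone "zone-b" {
  endpoints = [ "sat-b1", "sat-b2" ]
  parent = "master"
}
```

If you sync the config from the masters, the host and service objects would then go under zones.d/zone-a/ and zones.d/zone-b/ respectively, roughly 2500 hosts in each.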

Regards

Having more than 2 Satellites in the same zone is not recommended,

Agreed.

What you may want to do is create another zone in parallel

with 2 Satellites in each zone.

Er, sorry?

Antony.