Master / Sat / remote Sat / agent

10RUPTiV · March 9, 2022, 1:13pm

Hey guys…

I need your thoughts on our current setup…

Right now we have:

1x master
25x remote endpoints (connected to the master; we call them pollers, as each one is a physical box at a remote location)
300x hosts (agents), 95% of which connect to one of the 25 remote endpoints, with the rest connecting directly to the master

Should we change our setup to put 2 satellites between the master and the 25 remote endpoints, to take load off the master?

Pooh · March 9, 2022, 1:36pm

My immediate answer to “Should we do X to offload the workload on the master?”
is to ask “How busy is your master?”

What sort of workload does it currently have, and does that seem reasonable
for whatever spec server it is?

Antony.

10RUPTiV · March 9, 2022, 1:58pm

@Pooh

The master itself is running at 15% CPU all the time… and the same “master” is also running Icinga Web 2.

My concern is more about the optimal setup. Right now all 25 of our remote endpoints are connected directly to the master. If we set up 2 satellites at 2 locations separate from the master, will that add some layer of HA?

Pooh · March 9, 2022, 2:25pm

> The master itself is running at 15% CPU all the time… and
> the same “master” is also running Icinga Web 2.

That sounds pretty reasonable to me.

> My concern is more about the optimal setup. Right now all 25 of our remote
> endpoints are connected directly to the master. If we set up 2 satellites at
> 2 locations separate from the master, will that add some layer of HA?

I wouldn’t quite call that HA, but you might regard it as an improvement:

- your current Master is clearly a Single Point Of Failure
- implementing a Satellite for 12 endpoints and another one for 13 of them
  means you can still see half your network if one of the Satellites dies
- it also means the Satellites will collect data from the endpoints even if
  they (either, or both) cannot see the Master
- if the Master goes down, the Satellites will continue doing their job; you
  just can’t reconfigure stuff until the Master comes back
- you can run Icingaweb2 on the Satellites as well if you wish; it will only
  show the endpoints connected to that Satellite
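
As a rough sketch of the split described above, the two Satellites could each get their own zone with the Master zone as parent. All host names here are hypothetical placeholders, and connection details (host, port) are left out on purpose; treat it as an illustration of the idea rather than a tested configuration.

```
// zones.conf sketch; every name below is a hypothetical placeholder.
object Endpoint "master1.example.com" { }

object Zone "master" {
  endpoints = [ "master1.example.com" ]
}

object Endpoint "satellite-a.example.com" { }
object Endpoint "satellite-b.example.com" { }

// One Satellite zone would serve 12 of the remote endpoints...
object Zone "satellite-a" {
  endpoints = [ "satellite-a.example.com" ]
  parent = "master"
}

// ...and the other one the remaining 13.
object Zone "satellite-b" {
  endpoints = [ "satellite-b.example.com" ]
  parent = "master"
}
```

Each remote endpoint's zone would then name one of the two Satellite zones as its parent instead of the Master zone.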

Icinga can also do 2-Master HA which may be what you want:
https://icinga.com/docs/icinga-2/latest/doc/06-distributed-monitoring/#high-availability-master-with-agents
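
For orientation, a minimal sketch of what a two-endpoint Master zone looks like in zones.conf. The host names and the address are hypothetical placeholders (192.0.2.0/24 is a documentation range); the linked documentation covers the full procedure.

```
// zones.conf sketch for a 2-Master zone; names and address are placeholders.
object Endpoint "master1.example.com" { }

object Endpoint "master2.example.com" {
  host = "192.0.2.12"  // where the second master can be reached (assumed)
}

// Both endpoints belong to the same zone.
object Zone "master" {
  endpoints = [ "master1.example.com", "master2.example.com" ]
}
```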

Antony.

10RUPTiV · March 9, 2022, 3:08pm

@Pooh

What about the fact that 2 satellites can load-balance between themselves? I saw that somewhere…

Would our setup look like this?

aclark6996 · March 9, 2022, 6:43pm

You should add a 2nd master server, as @Pooh said, if you’re worried about high availability. Once you add the 2nd master, your checks and notifications will get load-balanced between both masters, which will help reduce the load on your current master.
I’m not sure what a ‘remote endpoint’ is. I did not see that term used anywhere in the documentation, so I cannot comment on it.

Alex

10RUPTiV · March 9, 2022, 6:56pm

@aclark6996

We are monitoring remote sites, and the “agents” at each remote location CAN’T reach the master directly, so we have a physical machine at each location that acts as a remote satellite!

We have one zone per physical machine at each remote location.

The parent of those zones is the master (internally we call these machines pollers).
Instead of having ALL the pollers connect to the master, we would like to connect them to the 2 new satellites, as in our drawing…

That would work something like this:

agent1 will connect to its local “poller”; this local poller will send data to satellite1/2, which will then pass it on to the master…

agent1 zone parent will be poller1
poller1 zone parent will be satellite1/2 zone
satellite1/2 zone parent will be master

But I’m not sure that’s possible… as I understand it, it’s like having a zone in a zone?
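
Written out as zone objects, the parent chain described above would look roughly like this. It is only a sketch of the intended hierarchy: all host names are hypothetical placeholders, and it does not by itself show whether this depth of nesting is supported.

```
// zones.conf sketch of the intended chain; every name is a placeholder.
object Endpoint "master1.example.com" { }
object Endpoint "satellite1.example.com" { }
object Endpoint "satellite2.example.com" { }
object Endpoint "poller1.example.com" { }
object Endpoint "agent1.example.com" { }

object Zone "master" {
  endpoints = [ "master1.example.com" ]
}

// satellite1/2 share one zone whose parent is the master zone.
object Zone "satellite" {
  endpoints = [ "satellite1.example.com", "satellite2.example.com" ]
  parent = "master"
}

// The poller zone's parent is the satellite zone.
object Zone "poller1" {
  endpoints = [ "poller1.example.com" ]
  parent = "satellite"
}

// The agent zone's parent is its local poller zone.
object Zone "agent1" {
  endpoints = [ "agent1.example.com" ]
  parent = "poller1"
}
```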

10RUPTiV · March 9, 2022, 7:07pm

Found my answer

https://icinga.com/docs/icinga-2/latest/doc/15-troubleshooting/#zones-in-zones-doesnt-work

aclark6996 · March 9, 2022, 7:16pm

Glad you found your answer. Icinga is open source, so you can set it up in many ways, but I have never heard of a zone inside a zone.
Regarding your setup, I would change the local ‘poller’ to an Icinga satellite if possible. All the local agents that cannot communicate with the master will communicate with the local Icinga satellite. Set up the Icinga satellite to communicate with the master. You still have the problem of a single point of failure at your master and at each site. You can add a second master and a second satellite at each site to solve the SPOF. This is described here in the documentation.

Regards
Alex