Region / Site Specific Monitoring

Hello!

I am currently deploying Icinga to monitor our company's infrastructure. We have servers homed in various regions, with staff managing each region. Currently all of the checks report up to a master server via satellites, per the distributed monitoring guide. We are using Director to configure our checks.

Is there any way to set it up so that each region has its own Icinga Web view of its own satellites and agents, without any insight into other regions / zones? We don’t necessarily want the UK team to see the alerts for the US servers and vice versa.

We still want to have one overarching master so that HQ can monitor both regions simultaneously but each individual region should only have insight into their local deployment.

Thank you for your help!

Hello!

I am currently deploying Icinga to monitor our company's infrastructure. We
have servers homed in various regions, with staff managing each region.
Currently all of the checks report up to a master server via satellites,
per the distributed monitoring guide. We are using Director to configure
our checks.

Is there any way to set it up so that each region has its own Icinga Web
view of its own satellites and agents, without any insight into other
regions / zones? We don’t necessarily want the UK team to see the alerts
for the US servers and vice versa.

Yes - simple - just install Icingaweb2 on each Satellite, and that will then
show the checks performed by that machine and those below it.

It won’t see the Master above it or any peer-level Satellites.

We still want to have one overarching master so that HQ can monitor both
regions simultaneously but each individual region should only have insight
into their local deployment.

I do precisely this for monitoring multiple customer networks - each customer
can see their own network, and I can see all of them, plus our own network, in
one big summary.

Antony.


This sounds like exactly what I need,

How did you configure the IDO and Icinga DB databases on the satellite? During the initial configuration it asks for those, but my understanding is that all of this is reported up to the master server rather than being stored on the satellite.

Is it necessary to reconfigure the checks on the satellite itself essentially making it a “sub-master” server of sorts?

You can also build your own dashboards for each group/user in Icinga Web 2 and bind them to users under /etc/icingaweb2/dashboards.
That way you only have to maintain the Icinga Web 2 instance on the master server.
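As a rough sketch of what such a per-user dashboard file could look like (the dashboard name and the "uk-servers" host group in the URL filters are made-up examples, not from this thread):

```ini
; /etc/icingaweb2/dashboards/<username>/dashboard.ini
; filtering on a hypothetical "uk-servers" host group
[UK Overview]
title = "UK Overview"

[UK Overview.Host Problems]
title = "Host Problems"
url = "monitoring/list/hosts?host_problem=1&hostgroup_name=uk-servers"

[UK Overview.Service Problems]
title = "Service Problems"
url = "monitoring/list/services?service_problem=1&hostgroup_name=uk-servers"
```

Each `[Pane.Dashlet]` section is just a link into the monitoring module with a filter, so any list view you can build in the UI can be pinned to a user's dashboard this way.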

You can just activate the ido feature on your satellites, just like you did on your masters, but with a local database. Then set up Icinga Web 2 to connect to this database.

All satellites ship their information to the masters, but they will copy their local information to the IDO as well, as soon as the ido feature is enabled.
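A minimal sketch of what that could look like on a satellite, assuming ido-mysql with a local MariaDB/MySQL (the database name, user, and password below are placeholders):

```
# icinga2 feature enable ido-mysql && systemctl restart icinga2
# /etc/icinga2/features-enabled/ido-mysql.conf
library "db_ido_mysql"

object IdoMysqlConnection "ido-mysql" {
  host     = "localhost"
  database = "icinga"
  user     = "icinga"
  password = "CHANGEME"
}
```

Icinga Web 2 on the satellite would then point its monitoring backend at this local database instead of the masters' IDO.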

Hi,

I strongly advise against satellites with local IDO and web interfaces. While technically that’s possible, it doubles up the maintenance burden.

The thing you can do with the current feature set is to put all these hosts into specific host groups, for instance, and then adjust the permissions in Icinga Web 2 roles. Whenever users from an assigned role (zone) log in, they only see their hosts. This is typically called RBAC if you google it.

Another idea would be to assign custom variables to these hosts, and build the permission filters based on this in Icinga Web.
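For illustration, roles with such restrictions might look like this in /etc/icingaweb2/roles.ini (the user, group, host group, and custom variable names are assumptions, not taken from the thread):

```ini
; UK staff only see hosts in a hypothetical "uk-servers" host group
[uk-team]
users = "alice, bob"
permissions = "module/monitoring"
monitoring/filter/objects = "hostgroup_name=uk-servers"

; alternative: filter on a host custom variable, e.g. vars.region = "us"
[us-team]
groups = "us-staff"
permissions = "module/monitoring"
monitoring/filter/objects = "_host_region=us"

; HQ admins see everything
[hq-admins]
users = "admin"
permissions = "*"
```

The `monitoring/filter/objects` restriction is applied to every view the role's users open, so the UK team simply never sees US hosts or their services.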

This solution allows the admin role to see everything, and you only have to troubleshoot one interface for permission or other problems, rather than one per region. Another thing: decentralized interfaces also pose a security risk, since you have to manage several different (AD) logins and whatnot.

Cheers,
Michael


We are looking into a similar setup with multiple large datacenters and the requirement of being able to monitor a specific site even if the main “master” zone is offline. It sounds like having local IDO and web interfaces on the datacenter satellites would be a solution. I see that you don’t suggest it, so is there any other possible topology that would work?

Our fear is that a network interruption to our HQ datacenter with our masters would result in all alerting being down for our satellite zones.

Having the ability to monitor all of our infrastructure from Icinga Web 2 in a master zone, while retaining the ability to receive and check alerts from datacenters a/b/c via their local Icinga + Icinga Web 2 instances if the master is down, would be ideal if possible.

When your masters go offline, your satellites will keep monitoring. The bad thing about it is that you won’t be able to see the results, because your data source for Icinga Web 2 is offline, and you won’t get any notifications. But the monitoring goes on, and when the masters come back online, you should see a correct history of what happened in the meantime.

If it’s just about SLA calculation, a standard setup should be sufficient.

I’ve been running such a deployment for 1.5 years at one customer. The master and every satellite has the icinga2 core, icinga2-ido, and icingaweb2 (incl. InfluxDB, Grafana etc.), and each satellite site acts on its own, e.g. acknowledging alarms, managing downtimes etc. This can also be done on the master. Each site has its own MTA which sends mails directly/locally. If the master is down, or the connection between master and satellite(s) is down, everything keeps working, and the check results incl. perfdata are (re)sent once the connection is back online.

The only downside is that only one Director is allowed in this setup, and it has to be on the master.

We haven’t experienced any bad behavior or faults caused by this deployment. And for the additional maintenance mentioned by @mfriedrich, we use Ansible.

If the connection drops, the masters will not start alarming you with “late check results”. They wait until the satellite zone reconnects, and then the stored historical events are replayed to keep SLA reporting intact. That being said, the satellite zone continues to run checks, stores the results in a local replay log, and synchronises them again later.

You can and should create a single health check per satellite zone with the cluster-zone check command, which notifies you about an outage between the master and that satellite zone. But only one such check, not many of them.
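As a sketch (the host and zone names are hypothetical), such a health check could be defined on the master, via Director or plain config, like this:

```
// one host object per satellite zone, checked from the master zone
object Host "uk-satellite-zone" {
  check_command = "cluster-zone"
  // cluster-zone defaults to checking a zone named after the host;
  // set the zone to check explicitly
  vars.cluster_zone = "uk"
}
```

If the "uk" zone stops sending messages, this one check goes CRITICAL, so you get a single clear notification about the zone outage instead of a flood of late-check alarms.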

Your scenario describes exactly how the Icinga 2 cluster is designed and intended to be used.

Cheers,
Michael

I have made that setup and it’s working very well. I have a zone with 9 satellites and a master zone where I have 1 server with icingaweb2.
I just didn’t find out how to use our LDAP to map team names from the LDAP to rights, so we are using local accounts for each team.


Thank you all for your input!

It looks like I have a couple of options for getting a setup running as needed. I marked what I see as the best solution, RBAC, though depending on what access / uptime requirements my manager brings to me, I may need to take on the extra workload and install the web interface on the satellites.

This info has been hugely helpful. Thank you all!