HA Master setup in different sites

Hello,

We have a two-master setup with multiple satellite servers. The master servers are each at a different site, and so are the satellites. The sites are connected via VPN in a full mesh (each site has a VPN connection to every other site). The database runs in a Galera cluster with a dummy node at a third site.

We recently had an issue where the VPN connection from the Master 1 site to one satellite site was down. Master 2 was still connected to this satellite zone. Nevertheless, the zone was reported as not connected, because at that moment the check was being executed by Master 1.

Is it possible to create a dependency between the two master servers, so that if one of them reports a check as CRITICAL, the other master checks it as well before an alert is generated?

Hello @ma_bsi
Did you change the enable_ha attribute of the NotificationComponent object from its default in your setup? With the default value (true), only one notification should be sent out when a high-availability setup with two masters is configured.
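For reference, a minimal sketch of the object in question as it looks once the notification feature is enabled. On package installs the definition usually lives in /etc/icinga2/features-available/notification.conf (the path is an assumption about the install). Setting enable_ha explicitly is optional, because true is already the default.

```
// NotificationComponent as enabled by the "notification" feature.
object NotificationComponent "notification" {
  // true (the default): within an HA zone, only one endpoint
  // sends out each notification instead of every master doing so.
  enable_ha = true
}
```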

Alex

Hi Alex,

The enable_ha attribute is set to true and works as it should, but that is not my problem. The problem is that the masters are at two different sites. When the master performing the zone check cannot reach that zone, the zone is reported as CRITICAL, even though the other master can still reach it. I want a dependency between both masters, so that a check is only CRITICAL if both masters have checked it. I hope something like that is possible.

Best regards,
Martin

Hello @ma_bsi,
I believe the configuration you're hoping for is not possible. If set up correctly, both masters should be able to perform checks on all agents. If Master 1 is not reachable, Master 2 should take over all checks until Master 1 is reachable again. It is not possible to have a dependency between both masters.

Did you follow the High Availability documentation when configuring your master servers? It sounds like your zones.conf files are not set up for both masters. Can you share the zones.conf files of your masters and agents?

Alex

Hello Alex,

Yes, I followed the High Availability documentation and everything is working fine. The checks and notifications are load-balanced between both masters.

Anyway, here is the zones.conf of one master. I double-checked: both masters have the same file.

object Zone "global-templates" {
  global = true
}

object Zone "master" {
  endpoints = [ "pavsnmp2.company.enterprise", "parsnmp1.company.enterprise" ]
}

object Zone "bcn" {
  endpoints = [ "bcnsnmp1.company.enterprise" ]
  parent = "master"
}

object Zone "ist" {
  endpoints = [ "istsnmp1.company.enterprise" ]
  parent = "master"
}

object Zone "lax" {
  endpoints = [ "laxsnmp1.company.enterprise" ]
  parent = "master"
}

object Zone "mod" {
  endpoints = [ "modsnmp1.company.enterprise" ]
  parent = "master"
}

object Zone "par" {
  endpoints = [ "parsnmp2.company.enterprise" ]
  parent = "master"
}

object Zone "pav" {
  endpoints = [ "pavsnmp1.company.enterprise" ]
  parent = "master"
}

object Endpoint "pavsnmp2.company.enterprise" {
  host = "10.4.1.162"
}

object Endpoint "parsnmp1.company.enterprise" {
  host = "10.2.1.161"
}

object Endpoint "bcnsnmp1.company.enterprise" {
  host = "10.9.1.161"
}

object Endpoint "istsnmp1.company.enterprise" {
  host = "10.26.1.161"
}

object Endpoint "laxsnmp1.company.enterprise" {
  host = "10.19.1.161"
}

object Endpoint "modsnmp1.company.enterprise" {
  host = "10.7.1.161"
}

object Endpoint "parsnmp2.company.enterprise" {
  host = "10.2.1.159"
}

object Endpoint "pavsnmp1.company.enterprise" {
  host = "10.4.1.161"
}

And here is the zones.conf of one of the satellites:

object Zone "global-templates" {
  global = true
}

object Zone "master" {
  endpoints = [ "pavsnmp2.company.enterprise", "parsnmp1.company.enterprise" ]
}

object Endpoint "pavsnmp2.company.enterprise" {
}

object Endpoint "parsnmp1.company.enterprise" {
}

object Zone "par" {
  endpoints = [ "parsnmp2.company.enterprise" ]
  parent = "master"
}

object Endpoint "parsnmp2.company.enterprise" {
}

To make it highly available, I had to put the master servers at two different sites. Otherwise, if both masters were at the same site and that site went down, I wouldn't have any monitoring at all, which is not HA in my understanding.

If the configuration I'm hoping for is not possible, how do other people build their HA setup across multiple sites? Maybe I'm thinking about it the wrong way.

Martin

Hello Martin,
Is parsnmp2 a true "satellite" server? Does it perform checks on other agents?

In the zones.conf file for parsnmp2, the zone is "master". In the zones.conf file for the master, the zone is "par". I believe this is the problem. I have edited the zones.conf file for your master below, commenting out the "par" zone.

object Zone "global-templates" {
  global = true
}

object Zone "master" {
  endpoints = [ "pavsnmp2.company.enterprise", "parsnmp1.company.enterprise" ]
}

object Zone "bcn" {
  endpoints = [ "bcnsnmp1.company.enterprise" ]
  parent = "master"
}

object Zone "ist" {
  endpoints = [ "istsnmp1.company.enterprise" ]
  parent = "master"
}

object Zone "lax" {
  endpoints = [ "laxsnmp1.company.enterprise" ]
  parent = "master"
}

object Zone "mod" {
  endpoints = [ "modsnmp1.company.enterprise" ]
  parent = "master"
}

// object Zone "par" {
//   endpoints = [ "parsnmp2.company.enterprise" ]
//   parent = "master"
// }

object Zone "pav" {
  endpoints = [ "pavsnmp1.company.enterprise" ]
  parent = "master"
}

object Endpoint "pavsnmp2.company.enterprise" {
  host = "10.4.1.162"
}

object Endpoint "parsnmp1.company.enterprise" {
  host = "10.2.1.161"
}

object Endpoint "bcnsnmp1.company.enterprise" {
  host = "10.9.1.161"
}

object Endpoint "istsnmp1.company.enterprise" {
  host = "10.26.1.161"
}

object Endpoint "laxsnmp1.company.enterprise" {
  host = "10.19.1.161"
}

object Endpoint "modsnmp1.company.enterprise" {
  host = "10.7.1.161"
}

object Endpoint "parsnmp2.company.enterprise" {
  host = "10.2.1.159"
}

object Endpoint "pavsnmp1.company.enterprise" {
  host = "10.4.1.161"
}

In my Icinga configuration I have the setup you are describing: two master servers in two different datacenters. If one datacenter goes offline, Icinga monitoring is still available. I have stopped the Icinga services on Master 1 many times, and the failover works great. This is very helpful when upgrading to a new version of Icinga. However, I do not have any satellite servers in my configuration.

Alex

Hi Alex,

Yes, parsnmp2 is a true satellite server that performs checks on other agents, and my configuration is working fine.

When I shut down Master 1, or when the whole Master 1 site is down, the failover works as expected. That is not the problem.

Let me try to describe my problem differently and in simplified form.
I have three datacenters, each connected via VPN to every other datacenter. So Master 1 has a connection to the satellite, Master 2 has a connection to the satellite, and Master 1 has a connection to Master 2. Now the VPN connection from Master 1 to the satellite goes down. Master 1 and Master 2 still see each other, and Master 2 still sees the satellite, but Master 1 does not. When Master 1 checks the satellite, it reports CRITICAL. If Master 2 checked the satellite, it would report OK.
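To make the scenario concrete, here is a minimal sketch of how such a zone check is commonly defined with the ITL cluster-zone check command. The thread does not show the actual check, so the service name "zone-bcn" is hypothetical; the sketch assumes a Host object named after the bcn satellite endpoint and that the service is defined in the HA "master" zone (for example under zones.d/master). In an HA zone only one master executes a given check at a time, so the result reflects only that master's connection to the zone.

```
// Hypothetical zone health check for the "bcn" satellite zone.
// Assumes a Host object "bcnsnmp1.company.enterprise" exists and that this
// object is defined in the HA "master" zone, so either master may run it.
object Service "zone-bcn" {
  host_name = "bcnsnmp1.company.enterprise"
  check_command = "cluster-zone"

  // cluster-zone evaluates the connection as seen by the endpoint that
  // executes the check: if that master is not connected to the "bcn" zone,
  // the result is CRITICAL even when the other master still reaches it.
  vars.cluster_zone = "bcn"
}
```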

That's why I asked whether it is possible to set up a dependency between the checks of the two master servers.

Martin

Thanks for the simplified explanation of your setup. No, I do not know of a way to add a dependency between masters like you are asking, and I do not think it is possible. I do not have any satellite servers in my setup. Maybe someone else in the Icinga community with satellite servers in their setup can share how they handle this.

Hello @ma_bsi!

Have you tried creating two services instead of one, pinning each to a different master (command_endpoint), and creating a third service with check_command = "dummy" that fetches the other two services' check results and compares them?
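A minimal, untested sketch of this idea. The service names are hypothetical, and it assumes the cluster-zone check, a Host object named after the bcn satellite endpoint, and that all three services are defined in the "master" zone. The first two services run the same zone check but are pinned to one master each via command_endpoint; the third uses the ITL dummy command and computes its state at check time from the other two with get_service(). Whether command_endpoint may point at an endpoint in the same HA zone should be verified against the Icinga 2 version in use.

```
// Untested sketch; service names are hypothetical.
// 1) The same zone check twice, each pinned to one master.
object Service "zone-bcn-via-parsnmp1" {
  host_name = "bcnsnmp1.company.enterprise"
  check_command = "cluster-zone"
  vars.cluster_zone = "bcn"
  command_endpoint = "parsnmp1.company.enterprise"  // always executed on parsnmp1
}

object Service "zone-bcn-via-pavsnmp2" {
  host_name = "bcnsnmp1.company.enterprise"
  check_command = "cluster-zone"
  vars.cluster_zone = "bcn"
  command_endpoint = "pavsnmp2.company.enterprise"  // always executed on pavsnmp2
}

// 2) Aggregate: OK if at least one master sees the zone as OK,
//    CRITICAL only if both pinned checks are non-OK.
object Service "zone-bcn" {
  host_name = "bcnsnmp1.company.enterprise"
  check_command = "dummy"
  check_interval = 1m

  vars.dummy_state = {{
    var h = macro("$host.name$")
    var a = get_service(h, "zone-bcn-via-parsnmp1")
    var b = get_service(h, "zone-bcn-via-pavsnmp2")

    // Service state: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
    // The null checks guard against a missing service object.
    if ((a && a.state == 0) || (b && b.state == 0)) {
      return 0
    }
    return 2
  }}

  vars.dummy_text = "Zone bcn: OK if at least one master is connected to it"
}
```

With this layout, notifications would be attached only to the aggregate service, so one master losing its VPN link to a satellite site shows up as a single non-OK pinned service rather than an alert. A pinned service whose command endpoint is disconnected will typically not be OK either (for example UNKNOWN), which the aggregate tolerates as long as the other master still reports OK. The aggregate lags behind the pinned checks by up to one check_interval.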

Best,
AK

Hello @Al2Klimov,

this seems to be what I'm looking for. I will try it when I have the time.

Thank you,
Martin