Implementing an HA master-master setup

Hello,

I am trying to set up and configure a distributed HA master-master setup, but I'm having trouble understanding the best way to implement it.

Currently, I have installed Icinga 2 and Icinga Web 2 on both servers (each with its own databases: IDO etc.).
You can find the scheme of the installed solution below:

Will this work? There are separate IDO databases on both servers. If I add both masters to one zone, will the databases be replicated automatically?

Any other recommendations on the best possible solution for what I'm trying to achieve are highly welcome.

Best Regards,
Panagiotis

Yes, a setup with two masters where each has its own database does work.
Icinga 2 keeps the data in both databases in sync. For this, you need to set enable_ha=false in the IDO config, so that both masters write into their own database.
Also, Icinga 2 does not have an active-passive setup: all endpoints in the same zone (here, the masters in the master zone) each get a portion of the checks and are primarily responsible for them. Checks only move to the other endpoint in a zone if the endpoint originally holding them is down.
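A minimal zones.conf sketch for this layout, assuming the hypothetical endpoint names master1.example.com and master2.example.com (endpoint names must match each node's certificate CN) and placeholder addresses. The same file would go on both masters:

```
// /etc/icinga2/zones.conf on both masters (names and addresses are placeholders)
object Endpoint "master1.example.com" {
  host = "192.0.2.10"     // hypothetical address of master1
}

object Endpoint "master2.example.com" {
  host = "198.51.100.20"  // hypothetical address of master2
}

// Both masters in one zone: checks are split between them,
// and the remaining endpoint takes over if the other one is down.
object Zone "master" {
  endpoints = [ "master1.example.com", "master2.example.com" ]
}
```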

IIRC, the setup with two masters, each with a local DB, is even favoured over two masters sharing a single central DB instance.


Thank you @log1c for the enlightening answer.
I have successfully set up what I needed.
One question, though.
If I do not set enable_ha=false, one of the two IDO databases gets disabled. According to the documentation, that is the expected behavior.
Does this mean that the databases do not get synced?
If I create client checks on master1, will they get propagated to master2?
Lastly, I want to enable the Director. Will the Director database get synced as well?
Sorry if my questions are somewhat "noobie", but even after reading the documentation many, many times, I still find it somewhat chaotic.

Best Regards,
Panagiotis

Hm, I can't answer that, as I don't know. Maybe someone else can share their thoughts.

Yes, checks are distributed between the two masters, and the remaining one takes over in case of a failure (the same goes for satellites in a zone).
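The same zone pattern applies one level down. A sketch of an HA satellite zone, assuming hypothetical endpoint names and addresses and a parent zone named "master":

```
// zones.conf sketch for two satellites in one HA zone (names are placeholders)
object Endpoint "satellite1.example.com" {
  host = "192.0.2.30"
}

object Endpoint "satellite2.example.com" {
  host = "192.0.2.31"
}

object Zone "satellite" {
  endpoints = [ "satellite1.example.com", "satellite2.example.com" ]
  parent = "master"   // the satellites report to the master zone
}
```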

No, the Director DB does not get synced. Also, only one of the masters can be the config master.
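A sketch of what "config master" can look like in practice, assuming master1 is the config master and using a hypothetical host object: the configuration lives under zones.d only on master1 and is synced from there to master2, which needs accept_config enabled in its api feature:

```
// On master1 only (config master): /etc/icinga2/zones.d/master/hosts.conf
object Host "client1.example.com" {   // hypothetical host
  check_command = "hostalive"
  address = "192.0.2.40"
}

// On master2 (secondary master): /etc/icinga2/features-enabled/api.conf
object ApiListener "api" {
  accept_config = true     // accept the config synced from master1
  accept_commands = true
}
```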

Hi,

Replication inside the MySQL cluster needs to be set up by the user. It is a separate HA cluster and has nothing to do with Icinga and its HA capabilities.

The best option is a central virtual cluster IP address (VIP) that both Icinga instances can write their IDO data to. Since enable_ha is true by default, only one master actively writes to the database backend at a time.
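A sketch of the IDO feature for this scenario, assuming a hypothetical VIP of 192.0.2.50 and placeholder credentials. The same file would be used on both masters:

```
// /etc/icinga2/features-available/ido-mysql.conf on both masters
object IdoMysqlConnection "ido-mysql" {
  host = "192.0.2.50"     // hypothetical VIP of the MySQL cluster
  port = 3306
  user = "icinga"         // placeholder credentials
  password = "changeme"
  database = "icinga"
  // enable_ha is not set, so it keeps its default (true):
  // only one master in the zone writes to the database at a time.
}
```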

If you cannot set up a MySQL cluster with a VIP yourself, you can use a local database on each Icinga master, with each master writing to its own local database. This is what enable_ha=false ensures: both IDO features are active and running.
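For the local-database variant, a sketch of the same feature file with HA disabled, assuming each master runs MySQL locally and using placeholder credentials:

```
// /etc/icinga2/features-available/ido-mysql.conf on each master
object IdoMysqlConnection "ido-mysql" {
  host = "localhost"      // each master writes to its own local database
  user = "icinga"         // placeholder credentials
  password = "changeme"
  database = "icinga"
  enable_ha = false       // keep the IDO feature active on both masters
}
```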

In addition to the two masters with local databases, there are also users who build their own MySQL replication between the nodes. This works™, but if you run into slave replication lag, it will generate a huge kab00m with DB exceptions, huge binlogs, etc. I don't recommend that scenario unless you are an experienced DB admin.

Instead, a central database on a dedicated host, or a DB cluster behind that VIP, works best in an HA-enabled Icinga master scenario. Try it out with a simple setup: create two masters and one DB host, and test the runtime behavior.

Cheers,
Michael


So, a MySQL cluster with a VIP is not an option, since the two Icinga master servers with the local databases are in different regions (one in Frankfurt, the other in the UK).

What I'm trying to achieve here is:

  1. Two Icinga masters
  2. Both in sync (checks and databases/database)
  3. If one goes down, the other one takes over.

I was under the impression that Icinga 2 HA means the IDO database is replicated automagically. Does Icinga 2 HA only cover the Icinga 2 services and not the databases?

If I enable HA on the IDO databases, I lose one database, since only one IDO database is active at a time. Would DNS records (active/failover) work if both databases have the same hostname (with different IPs) and I set up the IDO connection to use the hostname instead of the IP? If one IDO database goes down, will the other one come up?

I'm trying to find the best possible scenario for a truly highly available solution, since these servers will monitor a huge number of servers, and I want to make sure the monitoring system meets a 99.99% uptime SLA.

That’s what I thought, too.
I'm running one setup with two masters where each has its own database server and enable_ha=false. Previously this was a two-node (I know…) Galera "cluster", which I deactivated.
I have never checked whether the databases on the two servers are really identical.

Hi,

Use Vagrant or your cloud provider and build a simple setup with two masters and one DB host as VMs. Test your scenarios there and collect your answers. You will see how replication works, and which parts of HA work as described in the documentation. If you can't find IDO replication described in the docs, it doesn't exist. The docs are complete, and every supported feature is documented there.

I just wanted to add, since you mentioned the different regions with data centers spread around the globe: ensure that this is a low-latency connection. Otherwise the HA syncs (config, check results, etc. as runtime data) will be slow, and you won't gain any benefit from them.

The cluster functionality takes care of syncing runtime events to every node in the same zone. In other words, if you schedule a downtime, or a new check result reaches the master zone, both masters receive the message and forward it to their backends. Within that second, you'll have two check results written to two local databases.

I have the feeling that you're mixing up runtime event replication with true built-in MySQL replication. Icinga does not look into both IDO databases and run a replication sync. That is something MySQL does on its own when configured, e.g. with binlogs and master-master or master-slave replication setups.

MySQL/PostgreSQL as a storage layer is a separate application cluster. Icinga only ensures that both masters are kept in sync, allowing a failover with the same historical and runtime data.

Cheers,
Michael

PS: There are numerous topics in this community where this has been discussed. You'll likely find other opinions there, and you are obviously free to build what's best for you. Just keep in mind that complicated, non-standard setups make it harder for others to read, understand, and provide useful answers.
