Implementing HA Master-Master setup

operat0r · February 12, 2020, 10:38am

Hello,

I am trying to set up and configure an HA Master-Master distributed setup, but I’m having trouble understanding the best implementation for this solution.

Currently, I have installed icinga2 and icingaweb2 on both servers (each with its own databases: IDO etc.).
You can find the scheme of the installed solution below:

Will this work? I mean, there are separate IDO databases on both servers; if I add both masters to a zone, will the databases be replicated automatically?

Any other recommendation regarding the best possible solution for what I’m trying to achieve will be highly welcome.

Best Regards,
Panagiotis

log1c · February 12, 2020, 11:11am

Yes, a setup with 2 masters where each has their own database does work.
Icinga 2 keeps the data synced in both databases. For this you need to set enable_ha=false in the ido config, so both masters write into their own database.
Also, Icinga 2 does not have an active-passive setup. Instead, all endpoints in the same zone (here the masters in the master zone) each get a portion of the checks and are mainly responsible for them. The checks only switch to the other endpoint in the zone if the endpoint originally holding them is down.

IIRC the setup with two masters where each has a local DB is even favoured over two masters with a single central DB instance.
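
As a rough illustration of the zone layout described above, a minimal zones.conf sketch could look like the following. The endpoint names and IP addresses are placeholders and not taken from this thread; the endpoint names are assumed to match each node's own node name and certificate CN.

```
// /etc/icinga2/zones.conf (assumed to be identical on both masters)
// Hostnames and addresses below are hypothetical examples.

object Endpoint "master1.example.com" {
  host = "192.0.2.11"
}

object Endpoint "master2.example.com" {
  host = "192.0.2.12"
}

// Both endpoints in one zone: checks are split between them,
// and the remaining endpoint takes over if the other is down.
object Zone "master" {
  endpoints = [ "master1.example.com", "master2.example.com" ]
}
```

Running `icinga2 daemon -C` validates the configuration before a restart.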

operat0r · February 13, 2020, 6:34am

Thank you @log1c for the enlightening answer.
I have successfully created what is needed.
One question though.
If I do not set enable_ha=false, one of the two IDO databases gets disabled. As far as I can tell from the documentation, that is the expected behavior.
Will this mean that the databases do not get synced?
If I create client checks on master1 will the checks get propagated to master2 ?
Lastly, I want to enable director, will director database get synced as well ?
Sorry if my questions are somewhat “noobie” but after reading the documentation many many times, I still find it somehow chaotic.

Best Regards,
Panagiotis

log1c · February 13, 2020, 7:05am

Hm, I can’t answer the question about the IDO databases as I don’t know. Maybe someone else can share their thoughts.

Yes, checks are distributed between the two masters and the last remaining one takes over in case of a failure (same goes for satellites in a zone).

No, the Director DB does not get synced. Also, only one of the masters can be the config master.

dnsmichi · February 13, 2020, 9:04am

Hi,

replication inside the MySQL cluster needs to be done by the user. This is a separate HA cluster, and has nothing to do with Icinga and its HA capabilities.

The best thing is a central virtual cluster IP address where both Icinga instances can write their IDO data into. With enable_ha being true by default, only one master will actively write to the database backend at a time.

If you don’t have the possibility to create a MySQL cluster with a VIP by yourself, you can use local databases on each Icinga master, where each of them writes to the local database. This is what enable_ha=false ensures: both IDO features are active and running.

In addition to the two masters with local databases, there are also users who build their own MySQL replication between the nodes. This works ™ but if you run into slave lag during synchronisation, this will generate a huge kab00m with DB exceptions, huge bin logs, etc. I don’t recommend that scenario unless you are an experienced DB admin.

Instead, a central database on a dedicated host, or a DB cluster on that IP, works best in an HA-enabled Icinga master scenario. Try it out with a simple setup: create two masters and one DB host, and test the runtime behavior.
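
For the central database variant, a minimal sketch of the IDO feature config on both masters might look like this. The database host name, credentials and the failover_timeout value are placeholders chosen for illustration; only the attribute names come from the IdoMysqlConnection object type.

```
// /etc/icinga2/features-available/ido-mysql.conf
// Assumed to be the same on master1 and master2.
// "db.example.com" stands for the dedicated DB host or cluster VIP (hypothetical).

object IdoMysqlConnection "ido-mysql" {
  host = "db.example.com"
  port = 3306
  user = "icinga"
  password = "CHANGEME"
  database = "icinga"

  // true is the default: only one master writes at a time,
  // the other one takes over the IDO writes on failure.
  enable_ha = true

  // How long the standby waits before taking over (illustrative value).
  failover_timeout = 60s
}
```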

Cheers,
Michael

operat0r · February 14, 2020, 8:35am

So, a MySQL cluster with a VIP is not an option since the two Icinga master servers with the local databases are in different regions (one in Frankfurt and the other in the UK).

What I’m trying to achieve here is:

Two icinga masters
Both synced ( checks and databases/database )
If one goes down the other one takes over.

I was under the impression that icinga2 HA means that the IDO database is being replicated automagically. Is icinga2 HA only for the icinga2 services and not for the databases?

If I enable HA on the IDO databases, I lose one database. Since only one IDO database is active at a time, will DNS records (active/failover) work if both databases have the same hostname (different IPs) and I set the IDO connection to the hostname instead of the IP? If one IDO database goes down, will the other one come up?

I’m trying to find the best possible scenario for a truly highly available solution, since these servers will monitor a huge number of servers and I want to make sure that the monitoring system meets a 99.99% uptime SLA.

log1c · February 14, 2020, 8:57am

That’s what I thought, too.
I’m running one setup with two masters where each has their own database server and enable_ha=false. Previously this was a two-node (I know…) Galera “cluster”, which I deactivated.
I have never checked whether the databases on the two servers are really the same.

dnsmichi · February 14, 2020, 9:19am

Hi,

Use Vagrant or your cloud provider and build a simple setup with 2 masters and 1 DB host as VMs. Test your scenarios in there, and collect your answers. You will see how replication works, and which parts of HA work as described in the documentation. If you miss IDO replication being described in the docs, it doesn’t exist. The docs are complete and every supported functionality is located there.

I just wanted to add, since you mentioned the different regions with data centers spread around the globe: ensure that this isn’t a high-latency connection. Otherwise the HA syncs (config, check results, etc. as runtime data) will be slow and you won’t gain any benefit from that.

The cluster functionality takes care of syncing runtime events to any node in the same zone. Meaning to say, if you schedule a downtime, or a new check result reaches the master zone, both masters receive the message and will forward this to their backends. In that second, you’ll have 2 check results written to 2 local databases.

I have the feeling that you’re mixing runtime event replication with true builtin MySQL replication. Icinga does not look into both IDO databases and runs a replication sync. That is something MySQL on its own does, when configured, i.e. with binlogs and master-master or master-slave replication setups.

MySQL/PostgreSQL as a storage layer is a different application cluster. Icinga only ensures that both masters are kept in sync, so that after a failover both have the same historical and runtime data.

Cheers,
Michael

PS: There are numerous topics in this community where this has been discussed. You’ll likely find other opinions there, and are obviously free to build what’s best for you. Just keep in mind, that complicated non-standard setups will make it harder to read, understand and provide useful answers.

TSt · January 20, 2021, 7:04am

Hi,

I am trying to set up two Icinga2 Servers that both write into their own database similar to this topic.

I’m stuck at the simplest thing: I cannot find the config file where the “enable_ha=false” setting needs to be put.

Thank you in advance.

stevie-sy · January 20, 2021, 7:19am

Hi,

depending on your installed database:
MariaDB/MySQL: /etc/icinga2/features-available/ido-mysql.conf
look here: https://icinga.com/docs/icinga-2/latest/doc/09-object-types/#idomysqlconnection
PostgreSQL: /etc/icinga2/features-available/ido-pgsql.conf
look here: https://icinga.com/docs/icinga-2/latest/doc/09-object-types/#idopgsqlconnection
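
For the MySQL/MariaDB case with a local database on each master, the setting could be placed like this. This is only a sketch: the user, password and database name are placeholders, and your existing file may contain further attributes that should stay as they are.

```
// /etc/icinga2/features-available/ido-mysql.conf (on each master)
// Credentials and database name are hypothetical examples.

object IdoMysqlConnection "ido-mysql" {
  host = "localhost"
  user = "icinga"
  password = "CHANGEME"
  database = "icinga"

  // Disable the IDO HA mode so that every master keeps
  // writing to its own local database.
  enable_ha = false
}
```

The feature has to be enabled (`icinga2 feature enable ido-mysql`) and Icinga 2 restarted for the change to take effect.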

TSt · January 20, 2021, 8:00am

Thank you! I’ve found the file and added the setting.

lobr · December 23, 2021, 9:21am

Hi,

just to be sure, there are only two possibilities regarding a HA master setup with two master instances?

Active - Passive mode: There are two icinga-master-nodes which are communicating to one database. If one master instance goes down, the other takes over. enable_ha attribute must be set to true.
Active - Active mode: There are two icinga-master-nodes which can communicate to their own database. If one master instance goes down, the other one is still available. enable_ha attribute must be set to false.

Is this correct?

And another topic: Is it possible for the database to run on a different VM than the Icinga master instances, or does the DB have to be installed locally on the Icinga master VM?

log1c · December 23, 2021, 10:00am

Not quite.
Instances in a zone (be it master or satellite) always distribute the hosts and services between them (active-active). If one of the instances goes down the remaining instance will take over the checks from the failed instance.

Regarding the enable_ha for the database feature (ido):
If you set this to false each node will write into their own database.
If you set it to true (default) only one of the nodes will write into the database.
But both nodes will still be “active” regarding check execution!

The DB can be set up on a separate server; you then just need to configure the correct parameters in the IDO config file (or during setup of the web interface).
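
As an example of pointing the IDO feature at a separate DB server, here is a minimal sketch for the PostgreSQL variant. The host name and credentials are placeholders, not values from this thread.

```
// /etc/icinga2/features-available/ido-pgsql.conf
// "db.example.com" is a hypothetical remote database server.

object IdoPgsqlConnection "ido-pgsql" {
  host = "db.example.com"
  port = 5432
  user = "icinga"
  password = "CHANGEME"
  database = "icinga"
}
```

The database user must be allowed to connect from the Icinga master hosts, which is configured on the database side rather than in Icinga.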

The docs have a quite elaborate chapter on describing the HA features:
https://icinga.com/docs/icinga-2/latest/doc/19-technical-concepts/#high-availability

lobr · December 23, 2021, 10:28am

Is it possible to set enable_ha=true while each node will have its own database?

If you set it to true (default) only one of the nodes will write into the database.
But both nodes will still be “active” regarding check execution!

Does that mean both master nodes will perform checks simultaneously but only one node does store the results in the DB? If so, this would increase the load on network and nodes to be checked significantly…

log1c · December 23, 2021, 2:33pm

Possible? Yes.
Would I do that? No. If I am not mistaken, that scenario would leave you with two databases in different states, as there is no replication between the databases. If you want multiple DB servers you need some form of DB replication. Icinga does not cover that.

Yes, both nodes in the same zone will split the host objects and their checks between them (see docs).
If enable_ha is set to true, only one of the nodes will write to the DB; the other node will sync its received check results to the node writing to the DB.

Why? Please explain.

steffeneichler · January 20, 2022, 1:34pm

6 or 7 years ago I installed a Master-Master setup in two datacenters, one in Nuremberg and one in Berlin. Both masters wrote to a locally running MariaDB. The MariaDB had Galera installed, so the data was replicated by Galera across the 2 locations.
I used a small VM with only MariaDB and Galera for the quorum.
I left the company some years ago. But as far as I know, the setup is still running without any problem.

lobr · February 9, 2022, 4:40pm

@log1c
Is it correct that in a master HA scenario one instance (e.g. master1) is a complete Icinga2 master installation, while the second instance (master2) consists of a satellite installation?

If so, in case of a master1 failure, master2 wouldn’t be able to send data to master1, so no data would be written to the IDO, since only master1 can manage that?

log1c · February 10, 2022, 8:03am

Master2 starts as a satellite setup. But after that you have to do some manual tasks which elevate it from a satellite to a master (zones config, feature config).
If master1 goes down, master2 will take over the checks normally held by master1 and will start writing to the IDO DB, if enable_ha = true. If the HA functionality for the IDO feature isn’t enabled, both masters need their own DB configured and will always write to this DB.
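
One piece of the feature config on master2 is probably the API listener. A minimal sketch, assuming master1 is the config master and master2 should receive config and commands from it, might look like this:

```
// /etc/icinga2/features-available/api.conf on master2
// Assumption: master1 is the config master for the zone.

object ApiListener "api" {
  // Accept the configuration synced from the config master.
  accept_config = true

  // Accept commands sent from the other master.
  accept_commands = true
}
```

Any further attributes already present in that file would stay as they are. The zones.conf and the IDO feature config on master2 then need to match the ones on master1.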

lobr · February 10, 2022, 8:29am

@log1c
Where are the steps to elevate a satellite to a master described?
Or is it just a matter of copying the zones and features directories?

log1c · February 10, 2022, 8:33am

Check this section of the docs:
https://icinga.com/docs/icinga-2/latest/doc/06-distributed-monitoring/#high-availability-master-with-agents