I am trying to set up and configure a distributed HA master-master setup, but I’m having trouble understanding the best way to implement it.
Currently, I have installed icinga2 and Icinga Web 2 on both servers (each with its own databases: IDO, etc.).
You can find a diagram of the installed setup below:
Will this work? There are separate IDO databases on both servers. If I add both masters to one zone, will the databases be replicated automatically?
Any other recommendations regarding the best possible solution for what I’m trying to achieve are very welcome.
Yes, a setup with 2 masters where each has its own database does work.
Icinga 2 keeps the data in both databases in sync. For this you need to set enable_ha=false in the IDO feature config, so that each master writes into its own database.
Also, Icinga 2 does not use an active-passive setup: all endpoints in the same zone (here, the masters in the master zone) each get a portion of the checks and are primarily responsible for them. Checks only move to the other endpoint in the zone if the endpoint originally holding them is down.
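A rough way to picture this split is the following Python sketch. It is a simplified model, not Icinga 2’s actual scheduling code: the hypothetical `owner()` helper hashes each check name over the endpoints that are currently connected, so every check has exactly one responsible endpoint, and when an endpoint is missing its checks fall to the remaining one. Endpoint and check names are made up.

```python
import hashlib


def owner(check_name, endpoints_up):
    """Pick the endpoint responsible for a check (simplified model).

    Icinga 2 uses its own internal distribution; this sha256-based split only
    illustrates "each endpoint gets a portion of the checks".
    """
    if not endpoints_up:
        raise RuntimeError("no endpoint in the zone is connected")
    ordered = sorted(endpoints_up)
    digest = hashlib.sha256(check_name.encode("utf-8")).digest()
    return ordered[int.from_bytes(digest[:4], "big") % len(ordered)]


def distribute(checks, endpoints_up):
    """Map each connected endpoint to the list of checks it executes."""
    plan = {ep: [] for ep in sorted(endpoints_up)}
    for check in checks:
        plan[owner(check, endpoints_up)].append(check)
    return plan


checks = [f"host{i}!ping" for i in range(10)]
both = distribute(checks, {"master1", "master2"})  # normal operation
only2 = distribute(checks, {"master2"})            # master1 is down

print({ep: len(c) for ep, c in both.items()})
print({ep: len(c) for ep, c in only2.items()})  # → {'master2': 10}
```

The point of the model is that checks are divided, not duplicated: each check runs on one endpoint only, and a failover just recomputes the assignment over the endpoints that are still up.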
IIRC, the setup with two masters that each have a local DB is even favoured over two masters sharing a single central DB instance.
Thank you @log1c for the enlightening answer.
I have successfully created what is needed.
One question though.
If I do not set enable_ha=false, one of the two IDO databases gets disabled. As far as I can tell from the documentation, that is the expected behavior.
Does this mean that the databases do not get synced?
If I create client checks on master1, will they get propagated to master2?
Lastly, I want to enable the Director; will the Director database get synced as well?
Sorry if my questions are somewhat “noobie”, but after reading the documentation many, many times, I still find it somewhat chaotic.
Replication inside the MySQL cluster needs to be set up by the user. This is a separate HA cluster and has nothing to do with Icinga and its HA capabilities.
The best option is a central virtual cluster IP address (VIP) that both Icinga instances can write their IDO data to. Since enable_ha is true by default, only one master actively writes to the database backend at a time.
If you cannot set up a MySQL cluster with a VIP yourself, you can use a local database on each Icinga master, with each master writing to its local database. This is what enable_ha=false ensures: both IDO features are active and running.
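The difference between the two IDO modes can be summed up in a few lines of Python. This is a hypothetical helper (`ido_writers()` is not part of Icinga) that only models which masters actively write IDO data. In particular, which master becomes the single active writer with enable_ha=true is decided by Icinga itself; picking the alphabetically first name here is a placeholder.

```python
def ido_writers(endpoints_up, enable_ha):
    """Return which endpoints actively write IDO data (simplified model).

    enable_ha=True : one connected endpoint writes to the shared database.
    enable_ha=False: every connected endpoint writes to its own database.
    """
    up = sorted(endpoints_up)
    if not up:
        return []
    if enable_ha:
        # Placeholder rule: Icinga decides the active writer, not name order.
        return up[:1]
    return up


print(ido_writers({"master1", "master2"}, enable_ha=True))   # → ['master1']
print(ido_writers({"master1", "master2"}, enable_ha=False))  # → ['master1', 'master2']
print(ido_writers({"master2"}, enable_ha=True))              # → ['master2'] (after failover)
```

In both modes the check execution is still split between the masters; enable_ha only changes how many IDO writers are active.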
Besides two masters with local databases, there are also users who build their own MySQL replication between the nodes. This works™, but if you run into replication lag on the slave, you’ll get a huge kab00m with DB exceptions, huge binlogs, etc. I don’t recommend that scenario unless you are an experienced DB admin.
Instead, a central database on a dedicated host, or a DB cluster behind that VIP, works best in an HA-enabled Icinga master scenario. Try it out with a simple setup: create two masters and one DB host, and test the runtime behavior.
So, a MySQL cluster with a VIP is not an option, since the two Icinga master servers with the local databases are in different regions (one in Frankfurt and the other in the UK).
What I’m trying to achieve here is:
Two Icinga masters
Both in sync (checks and database(s))
If one goes down, the other takes over.
I was under the impression that Icinga 2 HA means that the IDO database is replicated automagically. Does Icinga 2 HA cover only the Icinga 2 services and not the databases?
If I enable HA on the IDO feature, I lose one of the databases, since only one IDO database is active at a time. Would DNS records (active/failover) work if both databases share the same hostname (with different IPs) and I configure the IDO connection with that hostname instead of an IP? If the active IDO database goes down, will the other one take over?
I’m trying to find the best possible scenario for a truly highly available solution, since these servers will monitor a huge number of servers and I want to make sure the monitoring system meets a 99.99% uptime SLA.
That’s what I thought, too.
I’m running one setup with two masters where each has its own database server and enable_ha=false. Previously this was a two-node (I know…) Galera “cluster”, which I deactivated.
I have never checked whether the databases on the two servers are really identical.
Use Vagrant or your cloud provider and build a simple setup with 2 masters and 1 DB host as VMs. Test your scenarios there and collect your answers. You will see how replication works, and which parts of HA work as described in the documentation. If you can’t find IDO replication described in the docs, it doesn’t exist. The docs are complete and every supported functionality is documented there.
I just wanted to add, since you mentioned the different regions with data centers spread around the globe: ensure that this is a low-latency connection. Otherwise the HA syncs (config, check results, etc. as runtime data) will be slow and you won’t gain any benefit from them.
The cluster functionality takes care of syncing runtime events to every node in the same zone. For example, if you schedule a downtime, or a new check result reaches the master zone, both masters receive the message and forward it to their backends. At that moment, you’ll have the same check result written to two local databases.
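A minimal sketch of this fan-out, assuming two masters with enable_ha=false and a Python list standing in for each local IDO database. The `Master` class and `publish()` function are hypothetical and model only message delivery, not Icinga’s cluster protocol.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    name: str
    # Stands in for this master's local IDO database.
    local_db: list = field(default_factory=list)

    def receive(self, event):
        # With enable_ha=false every master's IDO feature writes what it receives.
        self.local_db.append(event)


def publish(zone, event):
    """Deliver a runtime event (check result, downtime, ...) to every master in the zone."""
    for master in zone:
        master.receive(event)


zone = [Master("master1"), Master("master2")]
publish(zone, {"type": "CheckResult", "host": "web01", "state": 0})
publish(zone, {"type": "Downtime", "host": "db01"})
print([len(m.local_db) for m in zone])  # → [2, 2]
```

Both databases end up with the same rows because both masters received the same messages, not because one database is copied into the other.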
I have the feeling that you’re confusing runtime event replication with true built-in MySQL replication. Icinga does not look into both IDO databases and run a replication sync. That is something MySQL does on its own when configured, e.g. with binlogs and master-master or master-slave replication setups.
MySQL/PostgreSQL as the storage layer is a separate application cluster. Icinga only ensures that both masters stay in sync, allowing a failover with the same historical and runtime data.
Cheers,
Michael
PS: There are numerous topics in this community where this has been discussed. You’ll likely find other opinions there, and you are obviously free to build what’s best for you. Just keep in mind that complicated, non-standard setups make it harder for others to read, understand, and provide useful answers.
Just to be sure: there are only two possibilities for an HA master setup with two master instances?
Active-passive mode: two Icinga master nodes communicate with one database. If one master instance goes down, the other takes over. The enable_ha attribute must be set to true.
Active-active mode: two Icinga master nodes each communicate with their own database. If one master instance goes down, the other one is still available. The enable_ha attribute must be set to false.
Is this correct?
Another topic: is it possible to run the database on a different VM than the Icinga master instances, or does the DB have to be installed locally on the Icinga master VM?
Not quite.
Instances in a zone (be it master or satellite) always distribute the hosts and services between them (active-active). If one of the instances goes down the remaining instance will take over the checks from the failed instance.
Regarding the enable_ha for the database feature (ido):
If you set this to false, each node will write into its own database.
If you set it to true (default), only one of the nodes will write into the database.
But both nodes will still be “active” regarding check execution!
The DB can be set up on a separate server; you then just need to configure the correct parameters in the IDO config file (or during setup of the web interface).
Is it possible to set enable_ha=true while each node has its own database?
If you set it to true (default) only one of the nodes will write into the database.
But both nodes will still be “active” regarding check execution!
Does that mean both master nodes will perform checks simultaneously, but only one node stores the results in the DB? If so, this would significantly increase the load on the network and on the nodes being checked…
Possible? Yes.
Would I do that? No. If I am not mistaken, that scenario would leave you with two databases in different states, as there is no replication between them. If you want multiple DB servers, you need some form of DB replication. Icinga does not cover that.
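To make the “different states” concrete, here is a simplified Python model of enable_ha=true with a separate database per master. Assumptions: the single active writer is the alphabetically first connected master (a placeholder rule, not Icinga’s), `simulate_ha_true()` and the event names are hypothetical, and catch-up after a reconnect is not modelled.

```python
def simulate_ha_true(timeline):
    """Replay (event, endpoints_up) pairs through two masters with separate DBs.

    With enable_ha=true only one connected master writes, and it writes into
    its *own* database; nothing copies rows between the two databases.
    """
    dbs = {"master1": [], "master2": []}
    for event, endpoints_up in timeline:
        writer = sorted(endpoints_up)[0]  # placeholder choice of active writer
        dbs[writer].append(event)
    return dbs


timeline = [
    ("cr1", {"master1", "master2"}),
    ("cr2", {"master1", "master2"}),
    ("cr3", {"master2"}),  # master1 is down, master2 takes over writing
]
print(simulate_ha_true(timeline))
# → {'master1': ['cr1', 'cr2'], 'master2': ['cr3']}
```

After the failover neither database holds the complete history, which is why separate databases call for either enable_ha=false or real DB replication.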
Yes, both nodes in the same zone will split the host objects and their checks between them (see docs).
If enable_ha is set to true, only one of the nodes will write to the DB; the other node syncs its received check results to the node writing to the DB.
6 or 7 years ago I installed a master-master setup in two datacenters, one in Nuremberg and one in Berlin. Both masters wrote to a locally running MariaDB. MariaDB had Galera installed, so the data was replicated by Galera across the two locations.
I used a small VM running only MariaDB and Galera for the quorum.
I left the company some years ago. But as far as I know, the setup is still running without any problem.
@log1c
Is it correct that in a master HA scenario one instance (e.g. master1) is a complete Icinga 2 master installation, while the second instance (master2) consists of a satellite installation?
If so, in case of a master1 failure, master2 wouldn’t be able to send data to master1, so no data would be written to the IDO, since only master1 can handle that?
Master2 starts as a satellite setup, but afterwards you have to do some manual tasks that elevate it from a satellite to a master (zones config, feature config).
If master1 goes down, master2 will take over the checks normally held by master1 and will start writing to the IDO DB if enable_ha = true. If HA for the IDO feature isn’t enabled, both masters need their own DB configured and will always write to it.