Confusion about IcingaDB database and HA

NielsH · March 4, 2025, 9:34am

Hi all,

We are in the process of rebuilding our old Icinga cluster with IDO-MySQL into a new one with IcingaDB.
We have the following setup:
Site 1:

Master01 (Icinga + IcingaDB + Redis + Icingaweb + MySQL)
Satellite01a (Icinga)
Satellite01b (Icinga)

Site 2:

Master02 (Icinga + IcingaDB + Redis + Icingaweb + MySQL)
Satellite02a
Satellite02b

We initially configured MySQL Master-Master between the two sites, but we frequently get duplicate key errors, so it seems this is not the right way to go.
In the documentation (Distributed Setups - Icinga DB) we see:

For each Redis® server you need to set up its own dedicated Icinga DB instance that connects to it, but the Icinga DB instances must write to the same database, which of course can be replicated or a cluster. So the steps from the standard Icinga DB installation documentation can be followed. However, as mentioned, the database only needs to be set up once.

It is mentioned here that MySQL Replication is OK, but it seems IcingaDB should write to the same master to prevent any issues. At the same time, the documentation suggests that only a single MySQL instance is fine as well. Because of this, I have the following questions:

What is the role of MySQL? I.e. if the data is written to Redis and MySQL, what does MySQL add that Redis does not?
What happens when the MySQL server goes down? Having only a single MySQL server means it is a SPOF. What breaks when MySQL is unavailable? How long can IcingaDB survive and how long until monitoring breaks?
In addition, will Icingaweb also be able to show data despite the MySQL backend being unavailable? I think it will, because it also connects to Redis, but I would like to be sure of this.

Thanks in advance for clarifying!

apenning · March 4, 2025, 10:33am

Thanks for giving Icinga DB a try and your questions.

In my personal experience, a master-master replication always comes with issues. Having a more conservative master-failover setup proved to be more stable. For similar reasons, PostgreSQL does not even support master-master.

Icinga 2 sends data to the Redis, using it as a queue. The Icinga DB daemon consumes this data and stores it in a relational database. Thus, the Redis is “just” a queue or ephemeral storage, while your MySQL stores the data for eternity, especially the history.
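The queue-and-consumer relationship described above can be pictured with a toy model. This is only a sketch under stated assumptions: a Python `deque` stands in for Redis, an in-memory SQLite database stands in for MySQL, and the `state_history` table and `drain` function are hypothetical names chosen for illustration, not Icinga DB's actual schema or code.

```python
import sqlite3
from collections import deque


def drain(queue, db):
    """Move every queued event into the relational store, oldest first."""
    moved = 0
    while queue:
        host, state, ts = queue[0]
        with db:  # commits on success, rolls back if the insert raises
            db.execute(
                "INSERT INTO state_history (host, state, ts) VALUES (?, ?, ?)",
                (host, state, ts),
            )
        # Drop the event from the queue only after the write succeeded.
        queue.popleft()
        moved += 1
    return moved


# Stand-in for MySQL: the long-term store that keeps the history.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE state_history (host TEXT, state INTEGER, ts INTEGER)")

# Stand-in for Redis: short-lived events waiting to be consumed.
queue = deque([("web01", 0, 1741080000), ("web01", 2, 1741080060)])

moved = drain(queue, db)
rows = db.execute("SELECT COUNT(*) FROM state_history").fetchone()[0]
print(f"moved {moved} events, queue length {len(queue)}, rows stored {rows}")
```

In this model an event leaves the queue only once it has been written, so an unavailable database leaves the queue growing rather than losing data.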

If you are building an HA setup with a single node MySQL, then it is not really HA; correct. So please have an HA database setup, e.g., using a good old two node MySQL setup, one active and one failover node.

If the relational database is gone, Icinga DB will have a hard time and give up after roughly five minutes.

The Icinga DB Web module relies on the relational database for some data, especially the history. If your MySQL is down, Icinga DB Web will also be very unhappy.

So in a nutshell, your Icinga DB setup requires a relational database to be available. Hopefully my words may help you make more sense out of the graph at Distributed Setups - Icinga DB.

NielsH · March 6, 2025, 10:34pm

Hello @Alvar

Thanks for your response. Very enlightening, especially that Redis is just meant as a “queue” before saving data to the database.

We have converted our database setup to active/backup, so data is always written to the same instance. Cheers!

Regarding Redis, I have noticed that the “active” Redis instance uses ± 140MB of RAM, and the Redis on the other master instance (where IcingaDB logs “Another instance is active”) consumes 220MB of RAM. This appears to be a stable baseline that does not decrease. So it seems that this data is not being flushed to the database and freed up. Fwiw this is with ± 600 hosts and ± 17.000 services monitored.
We have allocated 1GB of RAM to Redis.

Should we ensure that the Redis instance is essentially never allowed to reach 100%? Is there a recommendation for the Redis eviction policy? We have now configured noeviction. Is this correct, or is it also fine to let Redis automatically clean up keys based on an LRU policy, for example?

Thank you!
Niels

apenning · March 7, 2025, 8:40am

Glad I could help.

When writing that it “consumes 220MB of RAM”, do you mean that this is static? If so, I would guess that Redis does its memory management thing and keeps it reserved. Unless Redis starts nibbling away your memory, I would not worry.

Should be fine. However, I would advise monitoring the Redis server and Icinga DB itself. For the latter, please check out the icingadb check command.

While Icinga 2 writes into Redis, the Icinga DB daemon consumes the data and frees it up. Thus, there should be no need for a customized eviction policy.

The aforementioned icingadb check command creates multiple performance data metrics indicating the system state, including backlog sizes.
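For the Redis side of the monitoring, a minimal sketch of a memory check might look like the following. It assumes the text of Redis `INFO memory` has already been fetched (for example with `redis-cli INFO memory`) and compares the `used_memory` field against `maxmemory`. The function names and the 80%/90% thresholds are illustrative assumptions, not a recommendation from the Icinga or Redis documentation. The return codes follow the usual monitoring plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
def parse_info(text):
    """Parse the `key:value` lines of Redis INFO output into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and section headers
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def memory_status(info_text, warn=0.80, crit=0.90):
    """Return (exit_code, message) for used_memory relative to maxmemory."""
    fields = parse_info(info_text)
    used = int(fields["used_memory"])
    limit = int(fields["maxmemory"])
    if limit == 0:
        # maxmemory 0 means no limit is configured, so there is no ratio.
        return 3, "UNKNOWN - maxmemory is 0 (no limit configured)"
    ratio = used / limit
    if ratio >= crit:
        code, label = 2, "CRITICAL"
    elif ratio >= warn:
        code, label = 1, "WARNING"
    else:
        code, label = 0, "OK"
    return code, f"{label} - Redis uses {ratio:.0%} of maxmemory"


# Sample input: 220 MiB used out of a 1 GiB limit (values are made up).
SAMPLE_INFO = (
    "# Memory\r\n"
    "used_memory:230686720\r\n"
    "maxmemory:1073741824\r\n"
    "maxmemory_policy:noeviction\r\n"
)

code, message = memory_status(SAMPLE_INFO)
print(code, message)
```

Working on the raw INFO text keeps the sketch independent of any particular Redis client library.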

NielsH · March 7, 2025, 9:23am

Thanks again for your pointers, very useful!

I did indeed mean that it is static.
I made a summary of all the data in Redis, see below:

icinga:runtime:state 29.24 MB
icinga:service:customvar 28.03 MB
icinga:notification 25.87 MB
icinga:service:state 21.81 MB
icinga:service 17.36 MB
icinga:notification:recipient 9.74 MB
icinga:notification:user 9.74 MB
icinga:checksum:notification 5.38 MB
icinga:host:customvar 3.80 MB
icinga:checksum:service 2.59 MB
icinga:checksum:service:state 2.46 MB
icinga:nextupdate:service 2.12 MB
icinga:customvar 1.67 MB
icinga:checkcommand:argument 1.10 MB
icinga:host:state 0.78 MB
icinga:host 0.56 MB
icinga:checksum:checkcommand:argument 0.47 MB
icinga:hostgroup:member 0.43 MB
icinga:checkcommand:customvar 0.35 MB
icinga:zone 0.27 MB
icinga:runtime 0.26 MB
icinga:endpoint 0.22 MB
icinga:checkcommand 0.09 MB
icinga:checksum:zone 0.09 MB
icinga:checksum:host:state 0.09 MB
icinga:checksum:host 0.09 MB
icinga:checksum:endpoint 0.08 MB
icinga:nextupdate:host 0.07 MB
icinga:comment 0.04 MB
icinga:checksum:checkcommand 0.03 MB
icingadb:telemetry:stats 0.02 MB
icinga:notificationcommand:argument 0.02 MB
icinga:hostgroup 0.02 MB
icinga:notificationcommand:customvar 0.01 MB
icinga:timeperiod:range 0.01 MB
icinga:checksum:notificationcommand:argument 0.01 MB
icinga:checksum:comment 0.01 MB
icinga:stats 0.01 MB
icingadb:telemetry:heartbeat 0.01 MB
icinga:notificationcommand:envvar 0.01 MB
icinga:checksum:hostgroup 0.01 MB
icinga:dump 0.00 MB
icinga:schema 0.00 MB
icinga:notificationcommand 0.00 MB
icinga:checkcommand:envvar 0.00 MB
icinga:downtime 0.00 MB
icinga:timeperiod 0.00 MB
icinga:user 0.00 MB
icinga:checksum:notificationcommand:envvar 0.00 MB
icinga:checksum:checkcommand:envvar 0.00 MB
icinga:checksum:notificationcommand 0.00 MB
icinga:checksum:timeperiod 0.00 MB
icinga:history:stream:acknowledgement 0.00 MB
icinga:history:stream:notification 0.00 MB
icinga:history:stream:state 0.00 MB
icinga:history:stream:downtime 0.00 MB
icinga:checksum:downtime 0.00 MB
icinga:checksum:user 0.00 MB

It doesn’t seem very extreme, I just expected it to go down if it is supposed to go to the database.
If these numbers don’t seem strange to you either, I think we can assume everything is OK and as intended.
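For anyone who wants a similar per-key overview, here is one way it could be gathered. This is a sketch and not necessarily how the numbers above were produced. `collect_sizes` assumes a redis-py client and uses its `scan_iter()` and `memory_usage()` methods (the latter wraps the Redis `MEMORY USAGE` command). `summarize` is a plain helper that only formats the result.

```python
def collect_sizes(client):
    """Map every key to its memory usage in bytes.

    `client` is assumed to be a redis-py client, e.g.
    redis.Redis(decode_responses=True), so that keys come back as str.
    memory_usage() returns None for a key that vanished meanwhile.
    """
    return {key: client.memory_usage(key) or 0 for key in client.scan_iter()}


def summarize(sizes):
    """Return 'key size MB' lines, largest key first."""
    ordered = sorted(sizes.items(), key=lambda item: item[1], reverse=True)
    return [f"{key} {size / (1024 * 1024):.2f} MB" for key, size in ordered]


# Usage with made-up byte counts instead of a live Redis connection.
lines = summarize({"icinga:host": 524288, "icinga:service": 2097152})
for line in lines:
    print(line)
```

Against a live instance the two helpers would be combined as `summarize(collect_sizes(client))`. Scanning every key and calling `MEMORY USAGE` on each puts some load on Redis, so it is better suited to an occasional manual check than to a frequently scheduled one.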

We already had Redis memory monitoring but I will check out the icingadb check you mentioned as well!

Cheers
Niels