IcingaDB - Is it production-ready for a large-scale Icinga2 cluster?

Hello,

Has anyone implemented an IcingaDB-based installation for a relatively large-scale Icinga2 cluster in production? If so, did you come across any issues, gotchas, surprises, etc.? Please share if you did.

I’ve been thinking of setting up a cluster in the traditional way (3 tier) without the Icinga2 agent and Director. The IcingaDB and Redis hosts would exist as inactive standalone hosts. Later on, if I decide to go with IcingaDB, I would like to wire them onto the existing setup. Is that possible, and has anyone done it?

Does IcingaDB support two clusters at two different sites? I mean 2x masters at one site and 2x masters at a second site, to provide HA in case of a site issue/disaster?

Please advise,
Thanks

Thanks for your interest in Icinga DB and for considering the switch.

I have seen some setups one would consider large that run Icinga DB next to the IDO or instead of it. Since Icinga DB v1.0 was released over two years ago, lots of trivial bugs have been found and (mostly) addressed. It has reached some maturity.

Compared to the IDO, using Icinga DB might consume more memory in total, but back pressure will not result in a spike of memory consumption in Icinga 2 itself.

Redis is primarily used as a message bus between Icinga 2 and Icinga DB. So when expecting lots of events (history entries, state changes, and the like), it is advisable to keep these three components close to each other to minimize additional delays, e.g., from the network.
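To sketch the wiring on the Icinga 2 side (values are illustrative; 6380 is the icingadb-redis default port), the icingadb feature object simply points Icinga 2 at its Redis:

```
// Sketch of /etc/icinga2/features-available/icingadb.conf
// (host/port here assume a local icingadb-redis on its default port)
object IcingaDB "icingadb" {
  host = "127.0.0.1"
  port = 6380
}
```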

However, there is no reason not to deploy it on another host. Note that when using Icinga DB in an HA setup, each Icinga 2 master needs its own Icinga DB with its own Redis in between. These two Icinga DB instances will then work with the same relational database; Icinga DB’s HA logic also works through this common SQL database. More in the docs: https://icinga.com/docs/icinga-db/latest/doc/05-Distributed-Setups/.
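For illustration, a minimal sketch of what /etc/icingadb/config.yml could look like on each of the two masters, with placeholder hostnames and credentials: both instances share the database section, while each keeps its own local Redis:

```yaml
# Sketch of /etc/icingadb/config.yml on each master
# (hostnames and credentials are placeholders)
database:
  type: mysql            # or pgsql
  host: db.example.com   # the one shared relational database
  database: icingadb
  user: icingadb
  password: CHANGEME

redis:
  host: 127.0.0.1        # each master's own local Redis
  port: 6380
```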

Within Icinga DB’s relational database, there is an environment ID to distinguish between different setups. I am a bit unsure whether I have understood your motivation correctly, but this may help.
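As a rough illustration (assuming the MySQL schema, where the environment ID is stored as a binary column), you could list the environments a given database knows about like this:

```sql
-- Sketch: list the environments present in an Icinga DB database
SELECT LOWER(HEX(id)) AS environment_id, name FROM environment;
```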

Please feel free to give Icinga DB a try, maybe first on an isolated testing host as you already considered.

Thank you, Alvar, for your opinion and insights.

Just wanted to add to the above quote of mine: the masters at each site would share the same group of satellites. So in the event of a site issue, we would make a DNS change for the masters and bring them into the cluster. It should work, yeah?
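To illustrate roughly what I mean (placeholder names only): the satellites would address the master endpoints by DNS name, so repointing the records to the second site should redirect the connections:

```
// Sketch of a satellite's zones.conf, placeholder names only
object Endpoint "master1.example.com" {
  host = "master1.example.com"   // DNS name, repointed on failover
}
object Endpoint "master2.example.com" {
  host = "master2.example.com"
}
object Zone "master" {
  endpoints = [ "master1.example.com", "master2.example.com" ]
}
```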

Without details, I cannot honestly answer this question. However, if this setup works with the IDO, it should also work with Icinga DB.

I added some info on the way we shape Icinga installs for large environments in this post.

I’m currently working on a setup with 50,000+ hosts, and for the most part it is OK.

One of the size problems we notice is that Icinga DB Web can get a bit out of whack: it gets some data from Redis and other data directly from the database, which results in some parts of the screen disagreeing with each other, particularly after a deployment.
We are also seeing 40-60 seconds for a deployment to run while all the config loads. This is partially because the config has some stuff it really shouldn’t have, but config size is a thing. This slowness to deploy also affects restarts.
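If you want to gauge this on your own setup, a quick sketch: config validation loads the full configuration much like a restart does, so timing it gives a rough idea of the cost:

```
# Sketch: time a full config load on the master
time icinga2 daemon -C
```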


Thank you @matthew.smith for sharing your experience.

I do have my own doubts about IcingaDB’s performance and stability. I’m going to test it in QA and stick with the IDO for production for now. Once IcingaDB reaches maturity and stability, I will probably wire it in at some point. I also find the IcingaDB documentation poorly written and confusing.