IcingaDB Redis continuously growing on secondary-master

My setup has a primary and a secondary in the Master zone, plus multiple other zones with two satellites each. Each master has its own independent Redis and icingadb instance, and both share one backend database (MySQL).

Both masters are running checks, and the system seems fine. I got an alert about memory usage on my secondary master, and started looking into it - yesterday Redis RSS was at 1.4GB, today at 1.8GB.

When I check various Redis keys/streams, the secondary’s are much larger than the primary:

The memory usage is also vastly different:

Key Bytes (Secondary Master) Bytes (Primary Master)
icinga:checkcommand 152777 141033
icinga:checkcommand:argument 1133889 1178600
icinga:checkcommand:customvar 376880 376880
icinga:checkcommand:envvar 1923 1923
icinga:checksum:checkcommand 41032 41032
icinga:checksum:checkcommand:argument 507960 507960
icinga:checksum:checkcommand:envvar 720 720
icinga:checksum:comment 296 296
icinga:checksum:downtime 18360 22848
icinga:checksum:endpoint 204192 204192
icinga:checksum:host 221728 221728
icinga:checksum:host:state 221736 221736
icinga:checksum:hostgroup 3144 3144
icinga:checksum:notification 7874264 7874264
icinga:checksum:notificationcommand 720 720
icinga:checksum:notificationcommand:envvar 7256 7256
icinga:checksum:service 8079760 8079760
icinga:checksum:service:state 8079760 8079760
icinga:checksum:servicegroup 1864 1864
icinga:checksum:timeperiod 968 968
icinga:checksum:user 3136 3136
icinga:checksum:usergroup 712 712
icinga:checksum:zone 203368 203368
icinga:comment 1600 1600
icinga:customvar 1983880 1983880
icinga:downtime 114313 149304
icinga:dump 9824 4736
icinga:endpoint 557464 557464
icinga:history:stream:acknowledgement 412 412
icinga:history:stream:downtime 396 396
icinga:history:stream:flapping 396 396
icinga:history:stream:notification 404 396
icinga:history:stream:state 396 404
icinga:host 1477208 396
icinga:host:customvar 1917616 1477208
icinga:host:state 1670368 1917616
icinga:hostgroup 12136 1747628
icinga:hostgroup:member 1054312 12136
icinga:nextupdate:host 191028 1054312
icinga:nextupdate:service 6939187 198272
icinga:notes:url 1008 6592480
icinga:notification 45920528 1008
icinga:notification:customvar 31248 45920528
icinga:notification:recipient 154858456 31248
icinga:notification:user 276544 154858456
icinga:notification:usergroup 14785688 276544
icinga:notificationcommand 2928 14785688
icinga:notificationcommand:envvar 20494 2928
icinga:runtime 600468848 21364
icinga:runtime:state 946190028 380
icinga:schema 1024 388
icinga:service 58567219 4736
icinga:service:customvar 64569712 55722854
icinga:service:state 68522504 64569712
icinga:servicegroup 7432 81322145
icinga:servicegroup:member 1055632 8123
icinga:stats 10880 1055632
icinga:timeperiod 3627 10880
icinga:timeperiod:range 11192 3524
icinga:user 15448 11192
icinga:usergroup 2536 15819
icinga:usergroup:member 5264 2536
icinga:zone 643040 5264
icingadb:overdue:service 14200 643040
icingadb:telemetry:heartbeat 4752 4752
icingadb:telemetry:stats 24088 37424

I’m wondering if there’s an issue here, or if the secondary is just holding more data than the primary for “reasons”, or if it’s somehow misbalanced and the secondary is just doing more work.
I’d have expected entries to be removed from Redis once they have been written out to the DB.

Can anyone advise on the reason for the difference?

  • Icinga DB Web version (System - About): 1.1.3
  • Icinga DB Redis version: 7.2.6
  • Icinga Web 2 version (System - About): 2.12.2
  • Web browser: Safari
  • Icinga 2 version (icinga2 --version): r2.14.3-1
  • Icinga DB version (icingadb --version): v1.2.0
  • PHP version used (php --version): 8.2.28
  • Server operating system and version: Debian 12

Welcome to the Icinga Community and thanks for coming forward with your issue in such a detailed way.

First things first: the versions of your installed Icinga components are quite outdated. Some security updates are also missing, for example for Icinga 2 (not really relevant for Debian 12, though) and Redis. Furthermore, there were quite a few changes in Icinga DB between 1.2.0 and 1.4.0.

How does the memory usage of Redis develop over time? Do you have more data points, for example from a perfdata writer? And how does the memory consumption of the primary master compare?
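If no perfdata writer is in place yet, a few manual samples already give a rough growth rate. The following is only a sketch: it assumes the redis-py client is installed and that Redis listens on localhost:6380 (adjust host and port to your setup). The names `Sample`, `take_sample`, `growth_per_day` and `sample_live` are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    timestamp: float  # Unix time in seconds
    rss_bytes: int    # used_memory_rss from INFO memory


def take_sample(client, now):
    """Read used_memory_rss via any client exposing redis-py's info()."""
    info = client.info("memory")
    return Sample(timestamp=now, rss_bytes=int(info["used_memory_rss"]))


def growth_per_day(samples):
    """Average RSS growth in bytes per day between first and last sample."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    first, last = samples[0], samples[-1]
    elapsed = last.timestamp - first.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be ordered by increasing time")
    return (last.rss_bytes - first.rss_bytes) * 86400 / elapsed


def sample_live(host="localhost", port=6380):
    """Take one sample from a live instance (host and port are assumptions)."""
    import time
    import redis  # redis-py, assumed to be installed

    return take_sample(redis.Redis(host=host, port=port), time.time())
```

With the two figures from the original post (1.4 GB yesterday, 1.8 GB today), `growth_per_day` works out to roughly 400 MB per day. Running the same sampling on both masters shows whether only the secondary is growing.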

Looking at your table, many values are equal or very similar, but there are a few outliers in both directions, e.g., icinga:host (1,477,208 vs 396) or icinga:hostgroup (12,136 vs 1,747,628).
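To make such a comparison repeatable, the per-key sizes can be collected on both masters and diffed automatically. This is a minimal sketch, assuming a redis-py style client (anything offering `scan_iter()` and `memory_usage()`); the function names and the factor-of-two threshold are arbitrary choices for illustration.

```python
def collect_memory_usage(client, pattern="icinga*"):
    """Map each matching key to its MEMORY USAGE result in bytes."""
    usage = {}
    for key in client.scan_iter(match=pattern):
        name = key.decode() if isinstance(key, bytes) else key
        size = client.memory_usage(key)
        if size is not None:  # key may vanish between SCAN and MEMORY USAGE
            usage[name] = size
    return usage


def find_outliers(secondary, primary, ratio=2.0):
    """List (key, secondary_bytes, primary_bytes) for keys whose sizes
    differ by at least `ratio`, or that exist on one side only."""
    outliers = []
    for key in sorted(set(secondary) | set(primary)):
        a, b = secondary.get(key), primary.get(key)
        if a is None or b is None:
            outliers.append((key, a, b))
            continue
        low, high = min(a, b), max(a, b)
        if high >= ratio * max(low, 1):
            outliers.append((key, a, b))
    return outliers
```

Feed it one dictionary per master, each produced by `collect_memory_usage` on that master's Redis. Keep in mind that MEMORY USAGE samples the elements of large collections by default, so the numbers for big keys are estimates.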

More interestingly, do you have the icingadb check command defined for each Icinga DB node? This check reports many useful performance data metrics, such as icinga2_redis_query_backlog, icingadb_history_backlog, and icingadb_runtime_update_backlog. Is any of those values greater than zero, and how do they vary over time? Again, a perfdata writer might be useful.

Another thing to consider is Redis’s default behavior of periodically dumping its state. Please read the “Huge memory footprint and IO usage in large setups” section in the Operations manual and check whether it applies to your setup.
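To see which persistence settings are actually in effect before comparing them with the manual, the relevant values can be read from CONFIG GET save and INFO persistence. Again only a sketch: it assumes redis-py and localhost:6380, and `summarize_persistence` and `inspect_live` are hypothetical helper names.

```python
def summarize_persistence(save_config, persistence_info):
    """Condense CONFIG GET save and INFO persistence into a few fields."""
    save_rules = save_config.get("save", "")
    return {
        # An empty "save" setting means no automatic RDB snapshots.
        "rdb_snapshots_enabled": bool(save_rules.strip()),
        "save_rules": save_rules,
        "aof_enabled": bool(int(persistence_info.get("aof_enabled", 0))),
        "changes_since_last_save": int(
            persistence_info.get("rdb_changes_since_last_save", 0)
        ),
        "last_bgsave_status": persistence_info.get("rdb_last_bgsave_status"),
    }


def inspect_live(host="localhost", port=6380):
    """Query a live instance (host and port are assumptions)."""
    import redis  # redis-py, assumed to be installed

    r = redis.Redis(host=host, port=port, decode_responses=True)
    return summarize_persistence(r.config_get("save"), r.info("persistence"))
```

Comparing the output of `inspect_live()` on both masters also shows whether the two Redis instances are configured identically.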