We believe we have an issue with cache memory on our Icinga2 masters (v2.11.4). Both masters are affected; the worst situation is on the config master (8 cores / 16 GB on both masters). We run around 40 satellites with 2 masters, 1 PostgreSQL DB (8 cores / 16 GB) and 1 Graphite server (8 cores / 16 GB) running 1 carbon-relay and 8 carbon-cache instances (Python version).
Config master: used + cache = 96% of 16 GB (used memory is around 2 GB, the rest is cache).
Second master: used + cache = 75% of 16 GB (used memory is around 2 GB, the rest is cache).
Swap is fine, 100% free on all machines in the environment.
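For reference, this is roughly how we arrive at the numbers above (a sketch assuming a Linux `/proc/meminfo`; "used + cache" here is simply everything that is not completely free, which also includes buffers):

```shell
#!/bin/sh
# Percentage of RAM that is either in use or held by the page cache,
# i.e. MemTotal minus MemFree. Values in /proc/meminfo are in kB.
awk '/^MemTotal:/ {total=$2}
     /^MemFree:/  {free=$2}
     END {printf "used+cache: %d%% of %d GB\n",
                  (total-free)*100/total, total/1024/1024}' /proc/meminfo
```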
Load is very low on all machines except Graphite: around 1.5, sometimes 4, but it has 8 cores so that should be fine.
We have around 10,000 service checks and around 2,000 host objects. I think our resources should be more than enough to run smoothly, but our performance graphs show otherwise: a lot of memory usage. For now everything is stable and Icinga2 is running perfectly, but in the next few months we will add a lot of new devices (the count will double, or we will get even more).
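As a back-of-envelope capacity figure (assuming every object is checked once per 60 seconds, Icinga2's default check_interval; our actual intervals may differ per object):

```shell
#!/bin/sh
# Rough cluster-wide check rate: 10,000 services + 2,000 hosts,
# each checked once per 60 s. Doubling the devices doubles this rate.
objects=$((10000 + 2000))
interval=60
echo "$((objects / interval)) checks/sec"   # prints "200 checks/sec"
```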
- What is your experience with scaling/resourcing Icinga2?
- Do we need to find some config file and set a limit on the cache, or is this more likely related to the Graphite server, the PostgreSQL DB, or other infrastructure elements? We couldn't find much about this on the net.
- Did you have a similar issue, and how did you solve it?
Currently we are investigating everything, including the VM platform, network connectivity to the satellites, and SELinux; we will also test the Go version of carbon-cache, and more.