I’m using Icinga2 in a distributed setup:
- 12x VMware VMs with 8 vCPU & 16 GB each
- In 3 zones
- 1x DB server with 4 vCPU & 8 GB
- 1x Master node with IcingaWeb2 + Director
Checks run against each host mostly every 5 minutes and mostly use SNMP I/O (we are monitoring network devices only).
Currently we have set max_current_checks to 128 so that the servers stay usable; otherwise the OOM killer terminates the Icinga2 process.
So, because of the checks scheduled every minute and the capacity of the farm, checks are not executed on time, sometimes with more than 600 seconds of delay.
All pollers are at 0% CPU idle.
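To make the capacity question more concrete, here is a rough back-of-the-envelope sketch in Python based on Little's law (checks in flight = checks started per second × average check duration). Only the 12 nodes and the limit of 128 come from the setup above. The check count, interval, and average execution time are hypothetical placeholders, not measurements from this farm, and the sketch ignores that checks are only balanced between the nodes of the same zone.

```python
import math


def required_concurrency(num_checks: int, interval_s: float, avg_exec_s: float) -> float:
    """Little's law: average checks in flight = arrival rate * average duration."""
    return (num_checks / interval_s) * avg_exec_s


def max_checks_per_interval(nodes: int, limit_per_node: int,
                            interval_s: float, avg_exec_s: float) -> int:
    """Upper bound on checks the farm can finish in one interval at the given limit."""
    return math.floor(nodes * limit_per_node * interval_s / avg_exec_s)


# From the post: 12 poller VMs, concurrency limit of 128 on each.
NODES = 12
LIMIT_PER_NODE = 128

# Hypothetical placeholders (assumptions, replace with real figures).
NUM_CHECKS = 500_000   # total service checks
INTERVAL_S = 300       # check interval in seconds
AVG_EXEC_S = 2.0       # average SNMP check duration in seconds

needed = required_concurrency(NUM_CHECKS, INTERVAL_S, AVG_EXEC_S)
available = NODES * LIMIT_PER_NODE
capacity = max_checks_per_interval(NODES, LIMIT_PER_NODE, INTERVAL_S, AVG_EXEC_S)

print(f"concurrent slots needed:    {needed:.0f}")
print(f"concurrent slots available: {available}")
print(f"max checks per interval:    {capacity}")
if needed > available:
    print("The limit is lower than the load requires, so checks will queue and run late.")
```

If the needed slots exceed the available ones, growing delays are expected at that limit, whatever the hypervisor. The real average execution time would have to come from the check results themselves.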
So maybe we are doing something wrong, or maybe this is normal given the load.
I also know that having more than 2 clients in a zone is not ideal.
Can you suggest how to set up a better configuration?
Does this mean VMs cannot handle this kind of job well (each one is consuming more than 10 GHz)?
Any help is appreciated.