Server resources guidelines or benchmarks

petew · April 2, 2021, 9:14pm

Are there any guidelines or benchmarks for server resources (CPU, RAM) relative to the number of hosts/services being monitored? E.g. a zone with satellites having X CPUs and Y RAM should be able to handle A hosts and/or B services.

We are looking to set up a new environment that uses smaller servers in favor of more horizontal scaling of satellite zones, but we need some help “right-sizing” the resources, i.e. we want more, smaller satellites, but how small?

our current environment:
icinga2 r2.11.2-1
icingaweb 2.7.3
galera DB

every server:
Ubuntu 16.04.6 LTS (Xenial Xerus)
32 cpu
64g RAM
200g disk

HA setup:
2 masters
12 zones with 2 satellites (24 total)

monitoring:
total hosts = 18824
total services = 164013

Zones are checking anywhere from a maximum of 2011/18070 hosts/services, through a “middle” between 1820/16400 and 1643/13336, down to a minimum of 1246/10794 (a new zone added to scale/rebalance).

Load on the satellites ranges from the high 9s to the low 3s, but is typically in the 6-8 range and does not appear to hamper monitoring.
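
To put the figures above on a common scale, here is a rough sketch that turns the per-zone counts into hosts and services per CPU core. It assumes the two satellites in a zone split the checks evenly and that every satellite has the 32 cores listed above; the zone labels and the `density` helper are made up for illustration, and this is only back-of-the-envelope arithmetic, not a benchmark.

```python
# (hosts, services) per zone, taken from the numbers quoted above.
# The zone labels are illustrative, not real zone names.
ZONES = {
    "max": (2011, 18070),
    "middle_high": (1820, 16400),
    "middle_low": (1643, 13336),
    "min": (1246, 10794),
}

SATELLITES_PER_ZONE = 2   # HA pair per zone, as described above
CORES_PER_SATELLITE = 32  # every server has 32 CPUs


def density(hosts, services,
            satellites=SATELLITES_PER_ZONE, cores=CORES_PER_SATELLITE):
    """Return (hosts per core, services per core) for one zone.

    Assumes checks are spread evenly over all cores in the zone.
    """
    total_cores = satellites * cores
    return hosts / total_cores, services / total_cores


if __name__ == "__main__":
    for name, (hosts, services) in ZONES.items():
        h, s = density(hosts, services)
        print(f"{name:12s} {h:6.1f} hosts/core {s:7.1f} services/core")
```

Ratios like these only describe the current layout; whether they carry over to smaller satellites depends on the checks being run.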

In addition, the new environment will not use the icingaweb dashboard and will have only 1 master. Other than the ‘single point of failure’, are there other considerations/cautions for a single master with multiple satellite zones?

log1c · April 6, 2021, 7:07am

I myself have nearly no experience with such large setups, but maybe the following thread can give you some insights:

This focuses mostly on the IDO DB performance.

Some generic thoughts:
The resource requirements depend heavily on the checks used.
Small checks such as check_by_ssh or check_tcp have a small footprint; other checks, e.g. check_nwc_health or similar, have a heavy footprint and eat up memory, especially if they have a long execution time.

I guess the database and the web interface are/will be on a separate server? This would take some load off the monitoring servers.

The load you describe for your satellites is pretty low for 32 CPU cores, so you don’t seem to have a problem here.
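
As a quick sanity check on that, load average is usually read relative to the core count. A minimal sketch, assuming the reported values are ordinary Linux load averages and using round numbers picked from the ranges quoted in the question (the `load_per_core` helper is just for illustration):

```python
def load_per_core(load_avg, cores=32):
    """Normalize a load average by the number of CPU cores.

    Values well below 1.0 mean the run queue is, on average,
    shorter than the number of cores available.
    """
    return load_avg / cores


if __name__ == "__main__":
    # Sample points picked from the ranges mentioned in the question.
    for load in (3.0, 6.0, 8.0, 9.9):
        print(f"load {load:4.1f} on 32 cores -> {load_per_core(load):.2f} per core")
```

Even the highest reported load comes out at roughly a third of one core's worth per core, which is why the satellites look oversized. Load average alone says nothing about memory, though, which is where the heavier check plugins tend to hurt.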