Advice on hardware requirements

Hello,

I want to set up a master/satellite cluster to monitor 1000 devices, each having 10 services on average, with a polling interval of 300 seconds. Most of the service checks run through SNMP, while a few are based on SSH.
The features that need to be enabled in Icinga are DB IDO, notifications and Graphite.

Can you please advise what hardware (CPU, RAM, disk) I need for the Icinga master, satellites, MySQL and Graphite to monitor my devices efficiently?

Thank you.

Hello,
A question like this is difficult to answer, but I'll try to give an estimate:
As you are planning to distribute the load to satellites, the load on any individual machine will not be too high, so for the master and satellites I'd recommend a 2-core machine with 8 GB RAM and a 40 GB disk as the minimum.
If you plan to run Graphite on the master servers, I'd increase the disk size to at least 60 GB.
For the MySQL server I'd say 4 cores, 16 GB RAM and as big a disk as you can get, as the DB tends to grow without limit unless you, as the DBA, prune the tables.
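
For a rough sense of scale, the numbers in the question work out to about 33 service checks per second in total. Here is a minimal Python sketch of that arithmetic. It assumes the checks are spread evenly over the interval and ignores host checks; the satellite counts in the loop are hypothetical, since the question does not say how many satellites are planned.

```python
# Rough check-rate estimate from the figures in the question.
devices = 1000
services_per_device = 10   # average, per the question
interval_seconds = 300     # polling interval

total_services = devices * services_per_device
checks_per_second = total_services / interval_seconds


def per_satellite(rate, satellites):
    """Even split of the check rate; the satellite count is a hypothetical input."""
    return rate / satellites


print(f"{total_services} services, ~{checks_per_second:.1f} checks/s overall")
for n in (1, 2, 4):  # hypothetical satellite counts
    print(f"{n} satellite(s): ~{per_satellite(checks_per_second, n):.1f} checks/s each")
```

This only gives the average rate. SNMP timeouts and slow SSH checks can make the real load burstier than the average suggests.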

Hope that helps


I have no experience with such big setups yet (sadly).
Just some figures from the biggest one I have set up until now:
Monitoring ~2100 hosts and ~3200 services.
HW setup:
Master-Cluster: 4vCPU, 8GB RAM, 60GB HDD each.
Satellite-Cluster: 4vCPU, 8GB RAM, 30GB HDD each.
DB: 4vCPU, 8GB RAM, 60GB HDD

Stats:
Load:

  • Master
    – is bored, load around 0.6 - 1.1
  • Satellites (doing the most checks ~1650 hosts ~2900 services)
    – still bored, 0.7 - 1.8 mostly
  • DB
    – around 1.0 - 1.3

Memory:

  • Master
    – master1 ~60%, master2 ~10% (assuming this is due to master1 being used as the main web interface)
  • Satellites (doing the most checks ~1650 hosts ~2900 services)
    – ~10%
  • DB
    – ~20%

Disk-/var:

  • Master
    – around 9 - 11 GB
  • DB
    – around 6 GB

The setup has been running for about a year now.

As you plan to run mostly SNMP checks, keep in mind that checks that create temporary cache files (as some checks from check_nwc_health do) can have a considerable performance impact.
