Icinga2 at large scale

Hello community,
I was not sure where to put this topic, so please move it if it’s in the wrong category.

In this topic I would like to cover several questions: system requirements, possible bottlenecks and benchmarking. I hope this will simplify installation for new users and provide “ready to deploy” cluster schemas and strategies.

My main goals are:

  1. define benchmarking tools for stress testing a cluster prototype
  2. determine how scalable Icinga2 is, based on the stress tests
  3. provide system requirements for the server hardware

A little bit of background
Currently I’m working on a proof of concept of Icinga2 for my company’s production environment.
There will be a couple of distributed Icinga2 clusters of different sizes; the biggest one should handle up to 30k clients.
I’ll need to define several standardized cluster sizes, for example:
Cluster S: up to 500 clients
Cluster M: up to 5000 clients
Cluster L: up to 10000 clients
Cluster XL: up to 30000 clients

Preferred environment:

  1. “master” and “satellite” components on cloud instances with 8 CPUs / 16 GB RAM.
  2. MySQL cluster on cloud instances with 8 CPUs / 16 GB RAM and a floating IP.

Optional environment for the biggest clusters:

  1. “master” components on bare-metal servers, “satellites” on cloud instances with 8 CPUs / 16 GB RAM.
  2. MySQL cluster on bare-metal servers with a floating IP.

Possible bottlenecks:
According to the documentation for top-down three-level clusters, the main bottleneck could be the Icinga2 master zone, as its horizontal scaling limit is 2 nodes in HA mode (please correct me if I’m wrong).

First of all, I need a benchmarking tool to emulate 30k real clients; this will allow me to understand whether Icinga2 fits our needs for the largest cluster.
Test requirements: 50 checks per client, each check running once per minute, 30000 emulated clients.
In total, the system would have to handle 1.5 million events per minute, or 25000 events per second.
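
To double-check these numbers, here is a small Python sketch of the arithmetic. The 50 checks per client and the 1-minute interval come from the test requirements above; applying the same profile to the S, M and L clusters is only my assumption for comparison, and the function name `check_rate` is made up for this example.

```python
# Back-of-the-envelope check rate per cluster size.
# Assumption: every client runs the same 50 checks at a 60-second interval,
# which the post only states for the 30000-client test.
CHECKS_PER_CLIENT = 50
CHECK_INTERVAL_SECONDS = 60

# Cluster sizes as listed in the post (maximum number of clients).
CLUSTER_SIZES = {"S": 500, "M": 5_000, "L": 10_000, "XL": 30_000}


def check_rate(clients, checks_per_client=CHECKS_PER_CLIENT,
               interval_seconds=CHECK_INTERVAL_SECONDS):
    """Return (checks per minute, checks per second) for a cluster."""
    checks_per_interval = clients * checks_per_client
    per_minute = checks_per_interval * 60 / interval_seconds
    per_second = checks_per_interval / interval_seconds
    return per_minute, per_second


if __name__ == "__main__":
    for name, clients in CLUSTER_SIZES.items():
        per_minute, per_second = check_rate(clients)
        print(f"Cluster {name}: {clients} clients -> "
              f"{per_minute:.0f} checks/min, {per_second:.0f} checks/s")
```

For the XL cluster this gives 1500000 checks per minute and 25000 checks per second, matching the figures above. Changing the interval or the number of checks per client shows how quickly the load on the master zone changes.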

Does anyone have experience with standardization and benchmarking of Icinga2?

I would like to generate real load with active checks. It could be done with Docker containers, but I’m looking for a more elegant way, because Docker would add a large resource overhead. Please share which tools you used to generate load for your tests.

With best regards,
Dmitriy.