Hi, I’m trying to determine whether my Icinga 2 satellites are overloaded by their current check load.
According to the documentation, there is an ITL CheckCommand named “icinga” that helps measure the performance of an Icinga instance. As descriptive as the performance data names are, I’m having a hard time understanding whether my Icinga instance needs more hardware or not.
I’ve checked a few threads in the community (Number of devices monitored on Icinga, Capacity Planning - Best Practice For iCinga Master / Satellite - #4 by theFeu, Icinga2 at large scale - #7 by Solkren) and the relevant source code (icinga2/checkercomponent.cpp at master · Icinga/icinga2 · GitHub, icinga2/icingachecktask.cpp at ee705bb110e802f8cafd21bab2d8697b0a538b0a · Icinga/icinga2 · GitHub), but I still can’t make sense of the information.
I have the following Grafana graph for an HA zone (two satellites running as many checks as configured) and its corresponding CPU % and CPU load graphs:
- What are the measurement units for avg_execution_time and avg_latency?
- What does checkercomponent_checker_idle mean, and what units does it use?
- What does checkercomponent_checker_pending measure, and what units does it use? (As far as I can tell from the code, the checker waits half a second when more checks than MaxConcurrentChecks are already running.)
- What is the difference between avg_execution_time (is that seconds? minutes?) and avg_latency (seconds?)? How do they correlate, and which times are being measured? (It would be nice to have a timeline showing where each of them starts and ends and how they relate to each other.)
- How can I tell whether the checker component is under heavy load or not?
Thanks in advance,