Metrics to make installations comparable

Hi,

I was thinking of a way to compare several installations of Icinga 2 with each other. There are so many ways in which they can differ, but I want new metrics to measure how “big” an installation is. This way we can more easily check whether the system is doing well and predict how much a system will grow.

What I came up with so far (a rough sketch of how some of these could be collected follows the list):

  • Count of objects per type
  • Count of apply rules (as an indicator for startup time)
  • “cps”: checks per second (in reference to the events per second / eps from log management), including service and host checks. It doesn’t take into account that plugins can have very different resource needs
  • Services per host: whatever this might be good for
  • Log lines per hour in icinga2.log
  • Log lines per hour in debug.log
  • Log lines / size per minute in api/log/current
  • Size / lines of icinga2.state, whatever that is good for
  • Time for config check
  • Time for reload
  • Average check execution time and latency (I hear Icinga 1 calling)
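For a few of these, a minimal Python sketch against the local REST API could look roughly like this; the API user and the exact field names under /v1/status/CIB are assumptions to verify against your version:

```python
#!/usr/bin/env python3
# Minimal sketch (not the diagnostics script itself): pull a few of the
# proposed metrics from a local Icinga 2 API.
import requests

API = "https://localhost:5665/v1"
AUTH = ("root", "secret")  # assumed API user, replace with your own

def cib_status():
    # /v1/status/CIB carries runtime statistics such as check counters
    r = requests.get(f"{API}/status/CIB", auth=AUTH, verify=False)
    r.raise_for_status()
    return r.json()["results"][0]["status"]

def object_count(object_type):
    # only request the name attribute to keep the response small
    r = requests.get(f"{API}/objects/{object_type}?attrs=name", auth=AUTH, verify=False)
    r.raise_for_status()
    return len(r.json()["results"])

if __name__ == "__main__":
    cib = cib_status()
    hosts = object_count("hosts")
    services = object_count("services")
    print("hosts:", hosts, "services:", services)
    print("services per host:", round(services / max(hosts, 1), 2))
    # 1-minute counters divided by 60 as a rough "checks per second" value
    cps = (cib.get("active_host_checks_1min", 0) +
           cib.get("active_service_checks_1min", 0)) / 60
    print("checks per second (approx.):", round(cps, 2))
    print("avg execution time:", cib.get("avg_execution_time"))
    print("avg latency:", cib.get("avg_latency"))
```

Time for the config check could probably just be measured by timing `icinga2 daemon -C`.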

Is this too much? Am I missing something? Is everything I added useful?

I’m thinking of adding the useful ones to Icinga Diagnostics.


Hey,
For comparing Icinga environments, the object/apply rule counts would be good.
If possible, there should also be a way to compare the objects that each apply rule matches. This could be good for troubleshooting deployment issues when apply rules take way too long.

I’m not so sure about the log lines, as they will most likely not be that interesting.
Everything else looks rather interesting. One would need to check which of those parameters are really useful.

Some of the stats, like average execution time and latency, should be generated per service to make them relevant: an overall number can indicate that something might be wrong, but a per-service value could show problems much better.
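A per-service breakdown could probably be pulled from the API along these lines; the CheckResult timestamp fields (execution_start/execution_end, schedule_start/schedule_end) and the latency formula are assumptions to verify against the docs for your version:

```python
# Sketch: per-service execution time and latency from the last check results.
import requests

API = "https://localhost:5665/v1"
AUTH = ("root", "secret")  # assumed API user

r = requests.get(f"{API}/objects/services?attrs=last_check_result",
                 auth=AUTH, verify=False)
r.raise_for_status()

rows = []
for res in r.json()["results"]:
    cr = res["attrs"].get("last_check_result") or {}
    if not cr:
        continue
    execution = cr["execution_end"] - cr["execution_start"]
    # rough approximation of latency: scheduling window minus execution time
    latency = max((cr["schedule_end"] - cr["schedule_start"]) - execution, 0)
    rows.append((res["name"], execution, latency))

# show the ten slowest checks instead of just an overall average
for name, execution, latency in sorted(rows, key=lambda row: -row[1])[:10]:
    print(f"{name}: execution={execution:.2f}s latency={latency:.2f}s")
```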

The main purpose of this is to have a first check whether the system seems to be doing OK overall. E.g. if a standard Icinga 2.12 is writing approximately 5 log lines per object per hour, you can tell that something’s wrong with King Kong when yours is writing 300 log lines per object per hour. :gorilla:
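As a rough sketch of how that value could be computed (the default log path and the `[YYYY-MM-DD HH:MM:SS +ZZZZ]` timestamp format are assumed):

```python
# Sketch for "log lines per object per hour": count lines written to
# icinga2.log within the last hour and divide by the object count.
from datetime import datetime, timedelta, timezone
import re

LOG = "/var/log/icinga2/icinga2.log"
OBJECT_COUNT = 5000  # e.g. hosts + services from the API sketch above

cutoff = datetime.now(timezone.utc) - timedelta(hours=1)
stamp = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) ([+-]\d{4})\]")

lines_last_hour = 0
with open(LOG) as f:
    for line in f:
        m = stamp.match(line)
        if not m:
            continue
        ts = datetime.strptime(f"{m.group(1)} {m.group(2)}", "%Y-%m-%d %H:%M:%S %z")
        if ts >= cutoff:
            lines_last_hour += 1

print("log lines per object per hour:", round(lines_last_hour / OBJECT_COUNT, 2))
```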

A rate of matches per apply rule sounds nice. Is this what you are intending? I’m not sure if this is something we can achieve with diagnostics.

When looking at it this way, log line statistics do sound like a good idea. Also, yes, I meant a rate of matches per apply rule; I will have to think about a way to make this possible (a rough idea below).
I also finished the first rough rewrite of the Python script. If you are up to it, we could talk about it tomorrow or some other day :slight_smile:
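One rough idea for the matches per apply rule, assuming the current `icinga2 object list` output format where each generated object carries a “% declared in” line pointing at the file and line that produced it:

```python
# Sketch: group generated Service objects by the location that declared them,
# which for apply rules roughly gives "matches per apply rule".
import subprocess
from collections import Counter

out = subprocess.run(["icinga2", "object", "list", "--type", "Service"],
                     capture_output=True, text=True, check=True).stdout

locations = Counter()
for line in out.splitlines():
    line = line.strip()
    if line.startswith("% declared in"):
        locations[line.removeprefix("% declared in").strip()] += 1

for location, count in locations.most_common(10):
    print(f"{count:6d}  {location}")
```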


Tomorrow sounds good. I’ll contact you another way so we won’t spam this thread with diagnostics rewrite details. :slight_smile:

You are welcome to spam here :slight_smile:

An overview of the state changes of checks would be nice too.

I would like to see this in Icinga Web as a report summarizing yesterday, showing state changes, executed checks, etc.

Another thing is that every check has a different CPU usage. Maybe a report showing how much load or execution time each check causes, if this is even possible.

@unic : Maybe yours are more feature requests for https://github.com/Icinga/icingaweb2-module-reporting ? I would have to look it up, but it seems like some of the information you want in a report is already collected, though I don’t think it is written to the IDO. So it could be a combined feature request for Icinga 2 and reporting.
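For the per-check execution time part, a rough aggregation per check command via the API might already be possible today (same assumed credentials and CheckResult timestamp fields as in the earlier sketches):

```python
# Sketch: average execution time grouped by check command.
from collections import defaultdict
import requests

API = "https://localhost:5665/v1"
AUTH = ("root", "secret")  # assumed API user

r = requests.get(f"{API}/objects/services?attrs=check_command&attrs=last_check_result",
                 auth=AUTH, verify=False)
r.raise_for_status()

times = defaultdict(list)
for res in r.json()["results"]:
    attrs = res["attrs"]
    cr = attrs.get("last_check_result") or {}
    if cr:
        times[attrs["check_command"]].append(cr["execution_end"] - cr["execution_start"])

for command, values in sorted(times.items(), key=lambda kv: -sum(kv[1]) / len(kv[1])):
    avg = sum(values) / len(values)
    print(f"{command}: avg {avg:.2f}s over {len(values)} services")
```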
