I’m trying to find a way to know what processes are using memory whenever a memory threshold is triggered. Ideally this information would be part of the memory check and not a separate check altogether. There are quite a few checks out there for both *nix and Windows for memory utilization, but I can’t seem to find any that tell you what is actually using the excessive amount of RAM. When you already know the process name or pid, you can craft checks to monitor the info on that particular process, but there’s always a chance of an unknown process using the memory.
Has anyone else tackled this issue? If so, how? How are you storing the historical memory usage per process? Are you able to reference it in any graphs or reports? Ideally I’d like a mouseover in Grafana that shows the top n processes when hovering over a memory usage graph.
Just thinking out loud here… the issue is that almost everything I have is saved in a TSDB, everything from the checks at least. Maybe some perfdata output is stored in the Icinga 2 DB… not sure. But the historical process info would need to go into something that can handle strings, so not Graphite/Whisper, but something meant for event logging. I don’t know, just hoping somebody else has handled this.
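To make the idea a bit more concrete, here is a rough Python sketch of what I have in mind: a memory check that puts the top n processes by RSS into the plugin's long output, plus a JSON line that could be shipped to whatever event/log store ends up holding the strings. Everything here is an assumption on my part, not an existing plugin: the function names are made up, it assumes a Linux box with `/proc/meminfo` and a procps-style `ps`, and "used" is defined as MemTotal minus MemAvailable.

```python
import json
import subprocess
import time


def parse_ps(text):
    """Parse 'pid rss comm' lines (rss in KiB) into a list of dicts."""
    procs = []
    for line in text.strip().splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3 or not parts[0].isdigit():
            continue  # skip a header or malformed line
        procs.append({"pid": int(parts[0]),
                      "rss_kib": int(parts[1]),
                      "name": parts[2].strip()})
    return procs


def top_n(procs, n=5):
    """Return the n processes with the largest RSS, biggest first."""
    return sorted(procs, key=lambda p: p["rss_kib"], reverse=True)[:n]


def mem_used_pct(meminfo_text):
    """Used memory in percent, from the text of /proc/meminfo (assumed layout)."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            fields[key.strip()] = int(rest.split()[0])
    total = fields["MemTotal"]
    return 100.0 * (total - fields["MemAvailable"]) / total


def plugin_output(used_pct, warn, crit, top):
    """Build (exit_code, text): status line with perfdata, then one line per process."""
    if used_pct >= crit:
        state, code = "CRITICAL", 2
    elif used_pct >= warn:
        state, code = "WARNING", 1
    else:
        state, code = "OK", 0
    first = (f"MEMORY {state} - {used_pct:.1f}% used"
             f" | used_pct={used_pct:.1f}%;{warn};{crit};0;100")
    detail = [f"{p['name']} (pid {p['pid']}): {p['rss_kib'] // 1024} MiB"
              for p in top]
    return code, "\n".join([first] + detail)


def event_line(used_pct, top, ts=None):
    """One JSON line per check run, meant for a log/event store rather than a TSDB."""
    return json.dumps({"ts": time.time() if ts is None else ts,
                       "used_pct": used_pct,
                       "top": top})


def main(warn=80, crit=90, n=5):
    """Hypothetical entry point; the thresholds are placeholders."""
    ps = subprocess.run(["ps", "-eo", "pid=,rss=,comm="],
                        capture_output=True, text=True, check=True).stdout
    with open("/proc/meminfo") as fh:
        used = mem_used_pct(fh.read())
    code, text = plugin_output(used, warn, crit, top_n(parse_ps(ps), n))
    print(text)
    return code
```

Calling `raise SystemExit(main())` at the bottom would turn this into something a check command could run. The numeric perfdata would still go to the TSDB as usual, while the per-process lines only live in the check output (or in the `event_line` JSON), which is exactly the part I don't have a home for yet.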
-m, --metric=TYPE
   Check thresholds against metric. Valid types:
   PROCS   - number of processes (default)
   VSZ     - virtual memory size
   RSS     - resident set memory size
   CPU     - percentage CPU
   ELAPSED - time elapsed in seconds
check_procs -w 50000 -c 100000 --metric=VSZ
   Alert if VSZ of any processes over 50K or 100K
check_procs -w 10 -c 20 --metric=CPU
   Alert if CPU of any processes over 10% or 20%
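For the memory case specifically, the same per-process threshold idea can be sketched in a few lines of Python. This is only an illustration of the concept, not how check_procs is implemented: `check_rss` is a made-up name, the input is assumed to be `(pid, rss_kib, name)` tuples collected some other way, and whether a value exactly at the threshold should count is my own choice.

```python
def check_rss(procs, warn_kib, crit_kib):
    """Compare each process's RSS (KiB) to thresholds.

    procs: iterable of (pid, rss_kib, name) tuples.
    Returns (exit_code, message), naming every process at or over a threshold.
    """
    procs = list(procs)
    crit = [p for p in procs if p[1] >= crit_kib]
    warn = [p for p in procs if warn_kib <= p[1] < crit_kib]
    if crit:
        code, state = 2, "CRITICAL"
    elif warn:
        code, state = 1, "WARNING"
    else:
        code, state = 0, "OK"
    # List offenders largest first so the worst one is easy to spot.
    offenders = sorted(crit + warn, key=lambda p: p[1], reverse=True)
    detail = ", ".join(f"{name}[{pid}]={rss} KiB" for pid, rss, name in offenders)
    msg = f"RSS {state}: {len(offenders)} process(es) over threshold"
    if detail:
        msg += f" ({detail})"
    return code, msg
```

With warning at 50000 and critical at 100000, a 150000 KiB process makes the result critical and any process between the two values is listed alongside it, so the alert text itself says which processes to look at.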