Best way to define a variable with the CPU count on an agent

Hi there,

For the well-known check_load plugin, there is the parameter --percpu, which divides the load by the number of installed CPUs. This is very handy in an environment with a lot of VMs, where you don't always know when a VM's owner changes the number of CPU cores.

Now I want the same for check_procs. I would like some kind of multiplier for the thresholds, without changing and recompiling the check myself, or creating a pull request and waiting for years only for it to be ignored because the idea might be stupid.

Anyway. I first looked into the Icinga2 language reference for a way to execute an external command like nproc, which simply outputs the multiplier I am looking for, and store its output in a variable. Unfortunately, there is no command, function or macro that can do that.
So next I thought about a cronjob or a small script that runs at reboot and sets a variable in /etc/icinga2/constants.conf on each agent before the icinga2 service starts. A modified CheckCommand could then take this variable, which can be different on each host, and calculate the proper thresholds automatically depending on the CPU count.
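A minimal sketch of that boot-time idea, assuming the constant is called CpuCount (a hypothetical name) and the script runs as root before icinga2 starts. It counts CPUs the way nproc does (usable CPUs) and rewrites or appends a single `const` line:

```python
#!/usr/bin/env python3
"""Sketch: write the CPU count into Icinga 2's constants.conf.

Assumptions (hypothetical, not official tooling):
- the constant is called CpuCount,
- the script runs as root before the icinga2 service starts.
"""
import os
import re

CONSTANTS_FILE = "/etc/icinga2/constants.conf"
CONST_NAME = "CpuCount"  # hypothetical constant name


def cpu_count() -> int:
    """Usable CPUs, similar to nproc; falls back to the total count."""
    try:
        return len(os.sched_getaffinity(0))
    except AttributeError:  # not available on every platform
        return os.cpu_count() or 1


def set_constant(text: str, name: str, value: int) -> str:
    """Replace an existing 'const <name> = ...' line or append a new one."""
    line = f"const {name} = {value}"
    pattern = re.compile(rf"^const\s+{re.escape(name)}\s*=.*$", re.MULTILINE)
    if pattern.search(text):
        return pattern.sub(line, text, count=1)
    if text and not text.endswith("\n"):
        text += "\n"
    return text + line + "\n"


if __name__ == "__main__":
    with open(CONSTANTS_FILE, encoding="utf-8") as fh:
        current = fh.read()
    updated = set_constant(current, CONST_NAME, cpu_count())
    # Only touch the file when the value actually changed.
    if updated != current:
        with open(CONSTANTS_FILE, "w", encoding="utf-8") as fh:
            fh.write(updated)
```

Replacing the line instead of appending matters because Icinga 2 does not allow a constant to be defined twice. A modified CheckCommand could then reference CpuCount when building its threshold arguments.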
The third idea would be a cronjob that uses the Icinga2 API to set/update a variable on the Host object that belongs to the agent itself. This host variable could then be used to calculate the proper thresholds in a modified CheckCommand.
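A minimal sketch of the API variant, assuming an ApiUser that may modify hosts, a Host object named after the agent's FQDN, and the hypothetical values API_URL, API_USER, API_PASS and CA_FILE. It sends the documented `POST /v1/objects/hosts/<name>` request with an `attrs` body to set `vars.cpu_count`:

```python
#!/usr/bin/env python3
"""Sketch: store this agent's CPU count in vars.cpu_count of its own Host object.

Hypothetical values to adapt: API_URL, API_USER, API_PASS, CA_FILE, and the
assumption that the Host object is named after this machine's FQDN.
"""
import base64
import json
import os
import socket
import ssl
import urllib.request
from urllib.parse import quote

API_URL = "https://icinga-master.example.com:5665"  # hypothetical master
API_USER = "cpu-reporter"                            # hypothetical ApiUser
API_PASS = "change-me"                               # hypothetical password
CA_FILE = "/var/lib/icinga2/certs/ca.crt"            # assumed CA location on the agent


def build_request(base_url, host_name, cpus, user, password):
    """Build (but do not send) a POST that modifies vars.cpu_count on a Host."""
    url = f"{base_url}/v1/objects/hosts/{quote(host_name, safe='')}"
    body = json.dumps({"attrs": {"vars.cpu_count": cpus}}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/json",  # the Icinga 2 API expects this header
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


if __name__ == "__main__":
    req = build_request(API_URL, socket.getfqdn(), os.cpu_count() or 1, API_USER, API_PASS)
    ctx = ssl.create_default_context(cafile=CA_FILE)
    with urllib.request.urlopen(req, context=ctx, timeout=10) as resp:
        print(resp.status, resp.read().decode("utf-8"))
```

Runtime changes made through the API are stored by Icinga 2 as modified attributes, so the value survives a restart without touching the host's config files.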

What do you think is the best way? Or is there an even better one?

  1. There are alternative plugins that might already do that.
  2. Add a small wrapper script that gets the number of cores from /proc and then calls the plugin?
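A minimal sketch of such a wrapper. It reads the core count via Python's os module instead of parsing /proc directly, multiplies hypothetical per-CPU thresholds, and then execs check_procs so its output and exit code reach Icinga unchanged. The plugin path and the argument layout are assumptions:

```python
#!/usr/bin/env python3
"""Sketch of a per-CPU wrapper around check_procs.

Hypothetical usage: check_procs_percpu.py WARN_PER_CPU CRIT_PER_CPU [check_procs args...]
The plugin path below is an assumption (Debian/Ubuntu layout).
"""
import os
import sys

CHECK_PROCS = "/usr/lib/nagios/plugins/check_procs"  # assumed path
UNKNOWN = 3  # plugin exit code for UNKNOWN


def cpu_count() -> int:
    """Usable CPUs, similar to nproc; falls back to the total count."""
    try:
        return len(os.sched_getaffinity(0))
    except AttributeError:  # not available on every platform
        return os.cpu_count() or 1


def scaled_args(warn_per_cpu, crit_per_cpu, cpus, extra):
    """Build the check_procs argv with thresholds multiplied by the CPU count."""
    warn = int(warn_per_cpu * cpus)
    crit = int(crit_per_cpu * cpus)
    return [CHECK_PROCS, "-w", str(warn), "-c", str(crit), *extra]


def main(argv):
    if len(argv) < 3:
        print(f"UNKNOWN - usage: {argv[0]} WARN_PER_CPU CRIT_PER_CPU [check_procs args...]")
        return UNKNOWN
    try:
        warn, crit = float(argv[1]), float(argv[2])
    except ValueError:
        print("UNKNOWN - thresholds must be numbers")
        return UNKNOWN
    args = scaled_args(warn, crit, cpu_count(), argv[3:])
    try:
        # Replace this process with check_procs; on success this never returns.
        os.execv(CHECK_PROCS, args)
    except OSError as exc:
        print(f"UNKNOWN - cannot run {CHECK_PROCS}: {exc}")
        return UNKNOWN


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

On a 4-CPU host, `check_procs_percpu.py 50 100 -a httpd` would run `check_procs -w 200 -c 400 -a httpd`.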

Your second point is also a good idea.

But I don't know of any ready-to-use alternative plugin that might help. So far I have looked into nagios-plugins, monitoring-plugins and the Linuxfabrik check-plugins.

  1. Use some Icinga2 DSL magic in the check arguments.

What do you mean by that? Can you give me an example?

Have a look at this post: How do I call the variable "service.state_id" from a different host and service? (#5 by rivad)

You could use the live output of another check (for example, about-me) to determine the number of CPUs while constructing the arguments of your check_procs command.

Sorry if the following is not really helpful, BUT what do you really want to know?
If you want to get an alert for an overloaded machine, the PSI (Pressure Stall Information) interface is IMHO the best candidate on Linux.
Load is hard to interpret and close to useless, and the number of processes does not really tell me anything about the load on the system.

It might be worth taking a look at check_system_basics or the pressure plugin of the Madrisan collection.
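To illustrate what a PSI check reads: since Linux 4.20 (with PSI enabled), /proc/pressure/cpu contains lines like `some avg10=... avg60=... avg300=... total=...`. A minimal sketch of a plugin that alerts on the "some avg60" value, with hypothetical thresholds that are examples rather than recommendations:

```python
#!/usr/bin/env python3
"""Minimal sketch of a PSI-based CPU pressure check (Linux >= 4.20, PSI enabled).

WARN and CRIT are hypothetical example thresholds, not recommendations.
"""
import sys

PSI_FILE = "/proc/pressure/cpu"
WARN, CRIT = 20.0, 50.0  # hypothetical % thresholds for "some avg60"


def parse_psi(text):
    """Turn lines like 'some avg10=1.23 avg60=... total=456' into nested dicts."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        kind, fields = parts[0], parts[1:]
        result[kind] = {k: float(v) for k, v in (f.split("=", 1) for f in fields)}
    return result


def evaluate(value, warn, crit):
    """Return (exit_code, label) using the usual plugin exit codes."""
    if value >= crit:
        return 2, "CRITICAL"
    if value >= warn:
        return 1, "WARNING"
    return 0, "OK"


def main():
    try:
        with open(PSI_FILE, encoding="ascii") as fh:
            psi = parse_psi(fh.read())
        value = psi["some"]["avg60"]
    except (OSError, KeyError, ValueError) as exc:
        print(f"UNKNOWN - cannot read {PSI_FILE}: {exc}")
        return 3
    code, label = evaluate(value, WARN, CRIT)
    # Plugin output plus performance data: label=value[UOM];warn;crit;min;max
    print(f"{label} - CPU pressure some avg60={value:.2f}% | cpu_some_avg60={value}%;{WARN};{CRIT};0;100")
    return code


if __name__ == "__main__":
    sys.exit(main())
```

Because PSI measures the share of time tasks were actually stalled waiting for CPU, it needs no per-CPU scaling at all, which sidesteps the original problem.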

@lorenz I just opened an issue requesting a PSI check for the Linuxfabrik monitoring plugins.