Author: @MarcusCaepio
Revision: v0.1
Tested with:
- Various Icinga 2 versions
- Various Linux distributions
Introduction
Hi all,
if you monitor your Linux clients agentlessly via check_by_ssh, you may have noticed that /var/log/syslog or /var/log/auth.log on the monitored client fills up with login messages, because every check opens a new SSH session. Instead, the checks can share one persistent SSH connection per host.
This has been possible since OpenSSH 5.6, whose release notes say:
Added a ControlPersist option to ssh_config(5) that automatically starts a background ssh(1) multiplex master when connecting. This connection can stay alive indefinitely, or can be set to automatically close after a user-specified duration of inactivity.
As I did not find any forum topics about persistent SSH connections, here is a short how-to (OS: Ubuntu 16.04):
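On your checking servers (master and satellites), you need a directory for the control sockets that the Icinga 2 daemon user can write to. I use /var/run/icinga2, which already exists. If you create a different directory under /var/run instead, keep in mind that /var/run is a tmpfs on current distributions and is emptied at every reboot.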
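If you want the socket directory defined in exactly one place, one option is a global constant, for example in constants.conf next to PluginDir. SshControlDir is a hypothetical name chosen for this sketch; Icinga 2 does not define it. The ControlPath string can be built from it because the $host.name$ macro simply stays inside the string and is only resolved when a check runs:

```
// constants.conf (sketch): SshControlDir is a hypothetical, user-defined constant
const SshControlDir = "/var/run/icinga2"

// Possible usage inside a service template: the concatenation happens when
// the config is loaded, $host.name$ is resolved later for each check:
// vars.by_ssh_options = [ "ControlMaster=auto", "ControlPath=" + SshControlDir + "/$host.name$", "ControlPersist=10m" ]
```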
Configuration
Create a service template for your SSH checks:
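template Service "by_ssh" {
  import "generic-service"

  check_command = "by_ssh"

  // User and private key that check_by_ssh logs in with (-l and -i)
  vars.by_ssh_logname = "nagios"
  vars.by_ssh_identity = "/<path_to_your>/id_rsa"

  // Passed to ssh as repeated -o options; one control socket per monitored host
  vars.by_ssh_options = [ "ControlMaster=auto", "ControlPath=/var/run/icinga2/$host.name$", "ControlPersist=10m" ]
}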
Where:
- ControlMaster=auto: reuse an existing master connection if one is listening on the control socket, otherwise open a new connection and make it the master
- ControlPath=/var/run/icinga2/$host.name$: path to the control socket, one per monitored host
- ControlPersist=10m: keep the master connection open in a background ssh process for 10 minutes after the last SSH session using it has exited
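Two things are worth knowing about ControlPath. UNIX socket paths are short (108 bytes on Linux), and ssh temporarily appends a random suffix while it creates the socket, so long host names can push the path over that limit. ssh_config(5) also offers tokens such as %C, a hash of the local host name, remote host, port and user, available since OpenSSH 6.7 (Ubuntu 16.04 ships OpenSSH 7.2). A sketch of the same template using it; the template name by_ssh_hashed is hypothetical:

```
// Same as the "by_ssh" template, but the socket name is a hash computed by
// ssh (%C) instead of the Icinga host name; "by_ssh_hashed" is a hypothetical name
template Service "by_ssh_hashed" {
  import "generic-service"

  check_command = "by_ssh"

  vars.by_ssh_logname = "nagios"
  vars.by_ssh_identity = "/<path_to_your>/id_rsa"

  // %C is expanded by ssh itself, so the path length stays constant and the
  // socket also differs per login name and port
  vars.by_ssh_options = [ "ControlMaster=auto", "ControlPath=/var/run/icinga2/ssh-%C", "ControlPersist=10m" ]
}
```

The trade-off is that hashed socket names are harder to map back to hosts when you look into the directory while debugging.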
Then import the template into every service you check via SSH:
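apply Service "load" {
  import "by_ssh"

  // Plugin to run on the remote host (check_by_ssh -C)
  vars.by_ssh_command = "/usr/lib/nagios/plugins/check_load"

  // Arguments for the remote plugin, in the same format as CheckCommand arguments
  vars.by_ssh_arguments = {
    "-w" = {
      value = "$load_wload1$,$load_wload5$,$load_wload15$"
      description = "Exit with WARNING status if load average exceeds WLOADn"
    }
    "-c" = {
      value = "$load_cload1$,$load_cload5$,$load_cload15$"
      description = "Exit with CRITICAL status if load average exceeds CLOADn; the load average format is the same used by 'uptime' and 'w'"
    }
    "-r" = {
      set_if = "$load_percpu$"
      description = "Divide the load averages by the number of CPUs (when possible)"
    }
  }

  // check_command is "by_ssh", so the ITL "load" command does not supply these
  // macros; set them here (ITL defaults shown). Service vars take precedence
  // over host vars, so drop these if you define thresholds per host instead.
  vars.load_wload1 = 5.0
  vars.load_wload5 = 4.0
  vars.load_wload15 = 3.0
  vars.load_cload1 = 10.0
  vars.load_cload5 = 6.0
  vars.load_cload15 = 4.0
  vars.load_percpu = false

  assign where host.vars.os == "linux"
}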
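Further checks follow the same pattern. As an illustration, a disk check for the root file system could look like this; the disk_wfree/disk_cfree variable names mirror the ITL disk command, and the thresholds and partition are example values, not recommendations:

```
apply Service "disk" {
  import "by_ssh"

  // Plugin to run on the remote host
  vars.by_ssh_command = "/usr/lib/nagios/plugins/check_disk"
  vars.by_ssh_arguments = {
    "-w" = {
      value = "$disk_wfree$"
      description = "Exit with WARNING status if less than this amount (or percentage) of disk space is free"
    }
    "-c" = {
      value = "$disk_cfree$"
      description = "Exit with CRITICAL status if less than this amount (or percentage) of disk space is free"
    }
    "-p" = {
      value = "/"
      description = "Path or partition to check (example value)"
    }
  }

  // Example thresholds, set on the service so the macros above resolve
  vars.disk_wfree = "20%"
  vars.disk_cfree = "10%"

  assign where host.vars.os == "linux"
}
```

Because it imports the same template, it uses the same control socket as the load check, so both checks for a host go through one SSH connection.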
and so on…
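The apply rules in this post only match hosts whose custom variable os is exactly "linux" (string comparison is case-sensitive, so "Linux" as used in the Icinga 2 sample configuration would not match). A minimal matching host could look like this; the object name and address are placeholders (192.0.2.10 is a documentation address), and generic-host is the template from the default sample configuration:

```
// Placeholder host: name and address are examples only
object Host "linux-client-01" {
  import "generic-host"

  // check_by_ssh connects here: the ITL by_ssh command defaults
  // by_ssh_address to $check_address$, which resolves to this address
  address = "192.0.2.10"

  // Matches: assign where host.vars.os == "linux"
  vars.os = "linux"
}
```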
Cheers,
Marcus