Check_by_ssh with persistent connections

MarcusCaepio · March 27, 2019, 8:59pm

Author: @MarcusCaepio

Revision: v0.1

Tested with:

Various Icinga 2 versions

Various Linux distributions

Introduction

Hi all,
if you monitor your Linux clients agentless via check_by_ssh, you may have noticed that /var/log/syslog or /var/log/auth.log fills up with messages, because every check opens a new SSH session. Instead, you can use a persistent SSH connection that all checks against a host share.

Since OpenSSH 5.6 (from the release notes):
Added a ControlPersist option to ssh_config(5) that automatically starts a background ssh(1) multiplex master when connecting. This connection can stay alive indefinitely, or can be set to automatically close after a user-specified duration of inactivity.

As I did not find any topics about persistent SSH connections, here is a little how-to (OS: Ubuntu 16.04).
On your checking servers (master, satellites), you need a directory for the control sockets that the Icinga 2 daemon user can write to. I use /var/run/icinga2, which already exists.

Configuration

Create a service template for your SSH checks:

template Service "by_ssh" {
  import "generic-service"
  check_command = "by_ssh"
  vars.by_ssh_logname = "nagios"
  vars.by_ssh_identity = "/<path_to_your>/id_rsa"
  // Passed to check_by_ssh as repeated -o options
  vars.by_ssh_options = [
    "ControlMaster=auto",
    "ControlPath=/var/run/icinga2/$host.name$",
    "ControlPersist=10m"
  ]
}

Where

ControlMaster=auto: reuse an existing master connection to the host if there is one, otherwise create a new one
ControlPersist=10m: keep the master connection open in a background ssh process for 10 minutes after the last SSH session using it has exited
ControlPath=/var/run/icinga2/$host.name$: path to the control socket; $host.name$ gives each monitored host its own socket
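Because $host.name$ resolves to the name of the checked host, every host gets its own control socket and all SSH services on that host share one master connection. A minimal host definition could look like the sketch below; it assumes the generic-host template from the Icinga 2 sample configuration, and the host name and address are hypothetical placeholders. Setting vars.os = "linux" makes it match the assign where rule of the load service.

```
// Hypothetical host; name and address are placeholders
object Host "web01.example.com" {
  import "generic-host"
  // check_by_ssh connects to this address
  address = "192.0.2.10"
  // Matched by: assign where host.vars.os == "linux"
  vars.os = "linux"
}
```

With this host, the SSH checks would use the control socket /var/run/icinga2/web01.example.com.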

Then import the template in your SSH-based services, for example a load check:

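apply Service "load" {
  import "by_ssh"

  // Plugin executed on the remote host
  vars.by_ssh_command = "/usr/lib/nagios/plugins/check_load"

  // Thresholds used by the arguments below; same defaults as the
  // ITL "load" command, adjust as needed
  vars.load_wload1 = 5.0
  vars.load_wload5 = 4.0
  vars.load_wload15 = 3.0
  vars.load_cload1 = 10.0
  vars.load_cload5 = 6.0
  vars.load_cload15 = 4.0
  vars.load_percpu = false

  vars.by_ssh_arguments = {
    "-w" = {
      value = "$load_wload1$,$load_wload5$,$load_wload15$"
      description = "Exit with WARNING status if load average exceeds WLOADn"
    }
    "-c" = {
      value = "$load_cload1$,$load_cload5$,$load_cload15$"
      description = "Exit with CRITICAL status if load average exceeds CLOADn; the load average format is the same used by 'uptime' and 'w'"
    }
    "-r" = {
      set_if = "$load_percpu$"
      description = "Divide the load averages by the number of CPUs (when possible)"
    }
  }

  assign where host.vars.os == "linux"
}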

and so on…
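Further checks follow the same pattern. As one more sketch, assuming check_disk from the monitoring plugins is installed at the same path on the clients, a root file system check could reuse the template. The service name and thresholds are example values, not taken from a tested setup:

```
apply Service "disk-root" {
  import "by_ssh"

  // Plugin executed on the remote host
  vars.by_ssh_command = "/usr/lib/nagios/plugins/check_disk"
  vars.by_ssh_arguments = {
    // Example thresholds: percentage of free space left
    "-w" = "20%"
    "-c" = "10%"
    // Mount point to check
    "-p" = "/"
  }

  assign where host.vars.os == "linux"
}
```

Since it imports the same template, this check goes through the same per-host master connection as the load check.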

Cheers,
Marcus