Hi,
ok, that’s a good basis for both CheckCommand requirements.
Prepare the Master
Communication happens via TLS, so you’ll need to set up a CA key pair and a signed certificate for the master node. Everything is wrapped into the node wizard CLI command as shown in the docs.
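On the master this boils down to running the wizard and restarting the service; a minimal sketch (the wizard prompts differ slightly between versions):

# on the master node
icinga2 node wizard
# answer 'n' when asked for an agent/satellite setup, so this node becomes the master with its own CA
systemctl restart icinga2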
Set up the Icinga Agent
That’s pretty straightforward when following the docs: install the package and run the setup wizard. Here you’ll decide whether to go with a pre-generated ticket (CSR auto-signing), or leave it empty and approve the signing request on the master (CSR on-demand signing).
The master then needs the agent zone and endpoint defined. In this case I’d assume you want the master to actively connect to the agent, so add the host
attribute to the agent’s endpoint.
Host Object Preparations
If not already done, move the Host object into the master
zone on the Master node. Ensure that its object name is the FQDN of the agent host.
object Host "icinga-agent1.example.com" {
  // ... add your existing configuration
}
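Note that the service apply rules further down use assign where host.vars.os == "Linux", so make sure the Host object carries that custom variable, e.g. (a small sketch, assuming a Linux agent):

object Host "icinga-agent1.example.com" {
  // ... existing configuration
  vars.os = "Linux" // used by the disk service apply rule below
}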
Create a new Endpoint object with the same name; it tells Icinga which target endpoint to use for the checks.
object Endpoint "icinga-agent1.example.com" {
  host = "192.168.56.110" // the real IP address where port 5665 is listening on the agent
}
Build the trust relationship by assigning the endpoint to the agent’s zone, which becomes a child of the master
zone. If you miss that, the master will not execute checks on the agent.
object Zone "icinga-agent1.example.com" {
  parent = "master"
  endpoints = [ "icinga-agent1.example.com" ]
}
Global Zone for Command Sync
Pick global-templates; this zone is created by default by the setup CLI wizards on both the master and the agent.
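In case it is missing on one side, the zone itself is just a short definition in zones.conf and has to exist on both the master and the agent for the sync to work:

object Zone "global-templates" {
  global = true
}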
Pre-defined commands
The Icinga Template Library already provides a set of CheckCommand objects. The disk check from nrpe.cfg doesn’t need a new CheckCommand object.
command[check_disk1]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 -p /dev/mapper/os-root
This requires a little review of the plugins you use against the ITL, but it is worth the effort to avoid re-creating all CheckCommands by hand.
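If you want to double-check which parameters the ITL’s disk CheckCommand accepts on your installation, you can list the object on the master (optional sanity check):

icinga2 object list --type CheckCommand --name disk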
For the disk CheckCommand, we translate the three arguments for later use:
- disk_wfree = 20 (from -w 20)
- disk_cfree = 10 (from -c 10)
- disk_partitions = [ "/dev/mapper/os-root" ] (from -p /dev/mapper/os-root)
Custom CheckCommands
The Icinga agent needs these commands defined locally. By using the global-templates zone, those commands can be synced from the config master. You could also define them locally on the agent, but this is known to become a problem with many managed agents.
The dns_health_check requires a new CheckCommand; see the docs for how the syntax and attributes work.
command[dns_health_check]=/usr/local/nagios/libexec/dns_health_check.sh node IP-address
This can be translated into
object CheckCommand "dns_health_check" {
  command = [ PluginDir + "/dns_health_check.sh" ]

  arguments = {
    "-H" = {
      value = "$dns_health_check_host$"
      skip_key = true // the shell script doesn't use getopts; if you decide to use `-H <host>` instead, set this to false
      description = "DNS host"
    }
    "-A" = {
      value = "$dns_health_check_address$"
      skip_key = true
      description = "Expected DNS IP address"
    }
  }
}
At this stage, put this into /etc/icinga2/zones.d/global-templates/commands.conf and run icinga2 daemon -C to verify that the configuration is valid.
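On the config master this could look like the following (paths as in the docs, use whatever editor you like):

# on the config master
mkdir -p /etc/icinga2/zones.d/global-templates
vim /etc/icinga2/zones.d/global-templates/commands.conf   # paste the CheckCommand from above
icinga2 daemon -C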
PluginDir Constant
PluginDir needs to be set to /usr/local/nagios/libexec in the agent’s constants.conf. It is better to use a global constant here than to hardcode the path in every CheckCommand. If you later decide to use a different plugin prefix path, e.g. by using the packaged plugins, you only need to edit constants.conf.
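A sketch of the relevant line in the agent’s constants.conf:

/* /etc/icinga2/constants.conf on the agent */
const PluginDir = "/usr/local/nagios/libexec"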
Restart and Sync
Restart Icinga on the master and verify that the CheckCommand dns_health_check is synced to the agent.
Master:
systemctl restart icinga2
Agent:
icinga2 object list --type CheckCommand --name dns_health_check
Setup Agent Checks via Command Endpoint
Disk
Here I wouldn’t re-use the previous disk check and distinguish between active and passive checks. The Icinga agent can actively connect to the master, or the master connects to the Icinga agent. Either way, checks scheduled by the master can always run; only one side needs to initiate the connection first.
In addition to that, the Host object should explicitly take advantage of specifying the disks as a dictionary instead of mixing all services into an array. This follows the example config from conf.d.
Start simple with a single apply rule:

apply Service "disk root" {
  check_command = "disk" // provided by the ITL
  command_endpoint = host.name

  vars.disk_wfree = 20 // extracted from nrpe.cfg, see above
  vars.disk_cfree = 10
  vars.disk_partitions = [ "/dev/mapper/os-root" ]

  assign where host.vars.os == "Linux"
}
Validate the config with icinga2 daemon -C and restart Icinga 2 with systemctl restart icinga2.
Force a re-check and retrieve the executed command line via REST API. You can also enable the debug log on the agent, and tail/grep for the check plugin’s name.
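A hedged example of both steps with curl, assuming an ApiUser "root" with password "icinga" on the master (adapt host, credentials and object names):

# force a re-check of the service via the master's REST API
curl -k -s -S -u root:icinga -H 'Accept: application/json' \
 -X POST 'https://localhost:5665/v1/actions/reschedule-check' \
 -d '{ "type": "Service", "filter": "host.name==\"icinga-agent1.example.com\" && service.name==\"disk root\"", "force": true, "pretty": true }'

# fetch the last check result, which includes the executed command line
curl -k -s -S -u root:icinga -H 'Accept: application/json' \
 'https://localhost:5665/v1/objects/services/icinga-agent1.example.com!disk%20root?attrs=last_check_result&pretty=1'

# alternatively on the agent: enable the debug log and grep for the plugin
icinga2 feature enable debuglog && systemctl restart icinga2
tail -f /var/log/icinga2/debug.log | grep check_disk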
Configure Service via Host using Apply For
Then prepare the host object again with the disks
dictionary providing the thresholds and checks.
vars.disks["disk /"] = {
  disk_wfree = 20
  disk_cfree = 10
  disk_partitions = [ "/dev/mapper/os-root" ]
}
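In context this lives directly in the Host object on the master, e.g. (a sketch, extending the host from above):

object Host "icinga-agent1.example.com" {
  // ... existing configuration, vars.os = "Linux" etc.

  vars.disks["disk /"] = {
    disk_wfree = 20
    disk_cfree = 10
    disk_partitions = [ "/dev/mapper/os-root" ]
  }
}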
And finally create the service apply for rule, similar to the example config except for the additional command_endpoint attribute.
apply Service for (disk => config in host.vars.disks) {
  import "generic-service"

  check_command = "disk"
  command_endpoint = host.name

  vars += config
}
Validate the config with icinga2 daemon -C and restart Icinga 2 with systemctl restart icinga2.
Force a re-check and retrieve the executed command line via REST API. You can also enable the debug log on the agent, and tail/grep for the check plugin’s name.
DNS
The existing service apply rule stripped down to the important parts …
apply Service "dns_health_check" {
  check_command = "nrpe"
  vars.nrpe_command = "dns_health_check"
  vars.nrpe_timeout = 60

  assign where host.name == "dns"
}
… needs to be changed into the real CheckCommand object reference in check_command
. It also needs the command_endpoint
attribute pointing to the host’s endpoint we’ve defined above. Since the host name is equal to the endpoint name, we can use the trick with command_endpoint = host.name
here.
Further, the dns_health_check had two arguments in nrpe.cfg; we need to define them here too.
This sums up into the following service apply rule:
apply Service "dns_health_check" {
  check_command = "dns_health_check"
  command_endpoint = host.name

  vars.dns_health_check_host = "<NODE>" // maybe you can use `host.vars...` to provide the check details
  vars.dns_health_check_address = "<IP-ADDRESS>"

  assign where host.name == "dns"
}
If you accidentally leave out the command_endpoint attribute, the check will be executed on the master instead of the agent endpoint. This is a common source of errors.
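If you prefer to keep the values on the Host object instead of hardcoding them in the apply rule (as hinted in the comment above), here is a possible sketch; the custom variable names are just an assumption:

// on the Host object of the DNS agent
vars.dns_health_check_host = "<NODE>"
vars.dns_health_check_address = "<IP-ADDRESS>"

// stripped-down apply rule; the macros resolve against the host's custom variables
apply Service "dns_health_check" {
  check_command = "dns_health_check"
  command_endpoint = host.name

  assign where host.vars.dns_health_check_host && host.vars.dns_health_check_address
}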
Validate the config with icinga2 daemon -C and restart Icinga 2 with systemctl restart icinga2.
Force a re-check and retrieve the executed command line via REST API. You can also enable the debug log on the agent, and tail/grep for the check plugin’s name.
Conclusion
This looks longer than it is. I’ve taken the time to step into every little detail you may encounter - once you’ve done it a couple of times, it gets easier and faster.
The most important bit: the command arguments for the agent should be managed by the master, not in each CheckCommand or apply rule; the Host object becomes the source of truth. This one can be generated from a CMDB, as you’ll likely already do, with the advanced DSL code in the service apply rules.
Cheers,
Michael