tl;dr: how to cleanly configure both local and remote disk checks?
I’m trying to get a minimal setup debugged before I deploy full monitoring. My setup is pretty simple: one master host that also runs a number of services, and about ten monitored hosts divided into two host groups. For the moment I have just the master and one client configured, and I’m using disk checks as my primary test case for remote monitoring. Remote monitoring itself works (I can run a remote disk check), but I can’t figure out a clean way to configure all the checks.
Here’s my master host (with some HTTP cruft removed and reduced to only one disk):
object Host "masterhost" {
  import "generic-host"
  display_name = "Masterhost"
  address = "127.0.0.1"
  address6 = "::1"
  vars.os = "Linux"
  vars.no_ping = 1
  vars.disks["disk /"] = {
    disk_partitions = "/"
    disk_wfree = "30%"
  }
  vars.notification["mail"] = {
    groups = ["icingaadmins"]
  }
}
Here’s the client, again with only one disk (the high wfree is for testing):
object Host "clienthost" {
  import "generic-host"
  check_command = "hostalive"
  address = "clienthost"
  display_name = "Endpin"
  vars.group = "iotta"
  vars.os = "Linux"
  vars.client_endpoint = name
  vars.notification["mail"] = {
    groups = ["icingaadmins"]
  }
  vars.disks["disk /some/path"] = {
    disk_partitions = "/some/path"
    disk_wfree = "50%"
    disk_cfree = "10%"
    check_interval = 5m
    // I'd prefer to pick this up from name or vars.client_endpoint,
    // and I'd prefer not to have to do it separately for every disk
    command_endpoint = "clienthost"
  }
}
The problem is that when the services are applied by global-templates/services.conf (taken directly from the distribution) the command_endpoint and check_interval don’t get set. Here’s the relevant apply:
apply Service for (disk => config in host.vars.disks) {
  import "generic-service"
  check_command = "disk"
  vars += config
}
I’ve tried a couple of approaches. My first idea was to run a second apply to correct what I needed (I’ve tried several variations on the body):
apply Service for (disk => config in host.vars.disks) {
  import "generic-service"
  disk.check_interval = 5m
  command_endpoint = host.vars.client_endpoint
  //assign where host.vars.client_endpoint
}
That always gets rejected as being a duplicate of the apply from global-templates.
My next attempt was an ugly for loop to modify things after the fact:
for (var host in get_objects(Host)) {
  for (var disk => var value in host.vars.disks) {
    svc = get_service(host, disk)
    svc.check_interval = 300
    if (host.vars.client_endpoint) {
      svc.command_endpoint = host.vars.client_endpoint
    }
  }
}
Yuck. It didn’t seem to work; “icinga2 object list” still showed a check interval of 60 and no command_endpoint.
The solution I haven’t tried would be to change global-templates to have two apply rules: one would limit itself to hosts that have host.vars.client_endpoint set (or some other way of indicating that they’re zone clients; I don’t know the best practice), and the other would always exclude those hosts.
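To make that concrete, here is a rough, untested sketch of the split I have in mind. It assumes vars.client_endpoint is set only on hosts that are zone clients (as in my clienthost object above), and it only deals with command_endpoint, not check_interval:

```
// Untested sketch. Assumes vars.client_endpoint is set only on zone clients.

// Remote hosts: run the disk check on the client's own endpoint.
apply Service for (disk => config in host.vars.disks) {
  import "generic-service"
  check_command = "disk"
  command_endpoint = host.vars.client_endpoint
  vars += config
  assign where host.vars.client_endpoint
}

// Local hosts: identical body, minus the endpoint.
apply Service for (disk => config in host.vars.disks) {
  import "generic-service"
  check_command = "disk"
  vars += config
  assign where !host.vars.client_endpoint
}
```

The two assign conditions are mutually exclusive, so each host should only ever match one of the rules, which I hope avoids the duplicate error I hit with my second apply. The two bodies are nearly identical, though.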
But that seems inelegant and would violate the DRY principle. Surely there’s some better approach? I would appreciate any suggestions people can come up with.