Disk checks on both local and remote hosts

tl;dr: how to cleanly configure both local and remote disk checks?

I’m trying to get a minimal setup debugged before I deploy full monitoring. My life is pretty simple: one master host that also runs a number of services, about ten monitored hosts divided into two host groups. For the moment I have just the master and one client configured, and I’m using disk checks as my primary test case for remote monitoring. I’ve successfully set up the remote monitoring (I can run a remote disk check) but I can’t figure out a clean way to configure all the checks.

Here’s my master host (with some HTTP cruft removed and reduced to only one disk):

object Host "masterhost" {
    import "generic-host"
    display_name = "Masterhost"
    address = "127.0.0.1"
    address6 = "::1"
    vars.os = "Linux"
    vars.no_ping = 1
    vars.disks["disk /"] = {
        disk_partitions = "/"
        disk_wfree = "30%"
    }
    vars.notification["mail"] = {
        groups = ["icingaadmins"]
    }
}

Here’s the client, again with only one disk (the high wfree is for testing):

object Host "clienthost" {
    import "generic-host"
    check_command = "hostalive"
    address = "clienthost"
    display_name = "Endpin"
    vars.group = "iotta"
    vars.os = "Linux"
    vars.client_endpoint = name
    vars.notification["mail"] = {
        groups = ["icingaadmins"]
    }
    vars.disks["disk /some/path"] = {
        disk_partitions = "/some/path"
        disk_wfree = "50%"
        disk_cfree = "10%"
        check_interval = 5m
        // I'd prefer to pick this up from name or vars.client_endpoint,
        // and I'd prefer not to have to do it separately for every disk
        command_endpoint = "clienthost"
    }
}

The problem is that when the services are applied by global-templates/services.conf (taken directly from the distribution) the command_endpoint and check_interval don’t get set. Here’s the relevant apply:

apply Service for (disk => config in host.vars.disks) {
  import "generic-service"

  check_command = "disk"

  vars += config
}

I’ve tried a couple of approaches. My first idea was to run a second apply to correct what I needed (I’ve tried several variations on the body):

apply Service for (disk => config in host.vars.disks) {
    import "generic-service"
    disk.check_interval = 5m
    command_endpoint = host.vars.client_endpoint
    //assign where host.vars.client_endpoint
}

That always gets rejected as a duplicate of the apply from global-templates.

My next attempt was an ugly for loop to modify things after the fact:

for (var host in get_objects(Host)) {
    for (var disk => var value in host.vars.disks) {
        svc = get_service(host, disk)
        svc.check_interval = 300
        if (host.vars.client_endpoint) {
            svc.command_endpoint = host.vars.client_endpoint
        }
    }
}

Yuck. It didn’t seem to work; “icinga2 object list” still showed a check interval of 60 and no command_endpoint.

The solution I haven’t tried would be to change global-templates to have two apply rules: one would limit itself to hosts that have host.vars.client_endpoint (or some other way of indicating that they’re zone clients; I don’t know the best practice) and the other would always exclude them.

But that seems inelegant and would violate the DRY principle. Surely there’s some better approach? I would appreciate any suggestions people can come up with.

Hi,

welcome to the community :slight_smile: Can you share some configuration bits for the disk check you’ve implemented already?

Cheers,
Michael

Sorry, got bit by some kind of disconnect between the BBS and Atomic-Chrome. I’ve now recreated the long post I originally typed.

Hi,

sorry for the late reply, busy times last week with being in London at GitLab Commit :slight_smile:

Since you are already setting vars.client_endpoint to the host agent name, that’s good preparation.

The basic apply rule for service disks is fine for the local checks. Whenever an agent host is encountered, you’d want this service object to have the command_endpoint attribute set.

You can either solve this with two apply rules like this:

// Local non-client endpoint checks
apply Service for (disk => config in host.vars.disks) {
  import "generic-service"

  check_command = "disk"

  vars += config

  ignore where host.vars.client_endpoint != ""
}

// Remote agent hosts with client_endpoint being set
apply Service for (disk => config in host.vars.disks) {
  import "generic-service"

  check_command = "disk"

  command_endpoint = host.vars.client_endpoint

  vars += config

  assign where host.vars.client_endpoint != ""
}

If you prefer to keep it to a single apply for rule, you can also achieve this using conditions.

apply Service for (disk => config in host.vars.disks) {
  import "generic-service"

  check_command = "disk"

  // set the command endpoint based on the host type being local or an agent with vars.client_endpoint
  if (host.vars.client_endpoint != "") {
    command_endpoint = host.vars.client_endpoint
  }

  vars += config
}

That’s the simplest approach, but conditions can become unreadable without comments, and you’ll likely want to modify the agent apply rule further.
So the first approach might be more usable; its only disadvantage is that you need to take care not to create duplicate objects, hence the assign where/ignore where conditions negating each other.
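
As an example of such a further modification, here’s a sketch that promotes the per-disk check_interval from your vars.disks entries onto the service itself; the if condition is an assumption on my part that not every disk entry carries one:

apply Service for (disk => config in host.vars.disks) {
  import "generic-service"

  check_command = "disk"
  command_endpoint = host.vars.client_endpoint

  // promote a per-disk check_interval, if present, from the
  // dictionary entry to the actual service attribute
  if (config.check_interval) {
    check_interval = config.check_interval
  }

  vars += config

  assign where host.vars.client_endpoint != ""
}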

Does that help with your question?

Cheers,
Michael

PS: The global for loop with get_objects won’t work here; the scope does not allow fetching objects there.

Thanks, Michael, that’s exactly what I wanted to know! I’ll think about it and choose one of the approaches you suggested.

Hi All,

I want to monitor a disk on a remote host, and this is my Host and Service configuration:

Service Configuration:
apply Service for (disk => config in host.vars.disks) {
  import "generic-service"

  check_command = "disk"

  command_endpoint = host.vars.client_endpoint

  vars += config
}

Host Configuration:
object Host "remote-host" {
  import "generic-host"
  address = "remote-host"
  check_interval = 5m
  display_name = "remote-host"
  vars.disks["disk /"] = {
    disk_partitions = "/"
    disk_wfree = "20%"
    disk_cfree = "10%"
    check_interval = 5m
    command_endpoint = "remote-host" // this is my host address
  }
}

But this checks the disk on my icinga server and not on the “remote-host”.

I do not have any agent on "remote-host". Why am I not able to check the disk on the remote host? What am I doing wrong?

Thank you

This is definitely possible; I know because I do it. But your local machine has to have a way to know what the disk space is on the remote host. If you can’t type a command on your local machine that tells you the remote host’s disk space, Icinga can’t do that either.

There are two approaches, and I use both. The first method, which is more powerful and “approved”, is to have a remote agent. Then you can set up a control file in /etc/icinga2/zones.d/master that uses the agent to check space on the remote host. You can also check other things, like the remote load, etc.
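
In rough outline, such a control file pairs an Endpoint and a Zone for the agent with a Host that routes its checks through it. Here’s a minimal sketch; the names are illustrative, and it assumes the agent is already installed and connected to the master:

object Endpoint "remote-host" {
  host = "remote-host"
}

object Zone "remote-host" {
  endpoints = [ "remote-host" ]
  parent = "master"
}

object Host "remote-host" {
  import "generic-host"
  address = "remote-host"
  vars.client_endpoint = name   // picked up by the apply rules shown earlier
  vars.disks["disk /"] = {
    disk_partitions = "/"
    disk_wfree = "20%"
  }
}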

The downside of that approach is that when you’re a beginner with Icinga, it’s quite difficult to set up your first remote monitoring agent. There is lots of documentation, but that means you have to wade through a lot of stuff and make a lot of decisions. It took me several months (very part-time) to get something that worked, and worked the way I wanted it to.

The other approach is good when you don’t have a remote agent or you’re in a hurry. If you can NFS-mount the remote disk onto your local machine, you can then stick the mount point into vars.disks and it’ll Just Work.
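
For instance, here’s a minimal sketch of that trick, assuming the remote filesystem is NFS-mounted at /mnt/remote-host (the path and host name are made up):

object Host "nfs-checker" {
  import "generic-host"
  address = "127.0.0.1"
  // /mnt/remote-host is an NFS mount of the remote machine's disk;
  // the stock "disk" apply rule treats it like any local filesystem
  vars.disks["disk /mnt/remote-host"] = {
    disk_partitions = "/mnt/remote-host"
    disk_wfree = "20%"
  }
}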

In fact, I combine those approaches: my Icinga instance runs on machine A, but A isn’t allowed to mount from machine B. Machine C can mount, so I run a remote agent on C that checks C’s disk space as a local disk, and also mounts all of B’s filesystems via NFS and checks those. (Don’t ask why I don’t run an agent on B; the reasons are silly and I’m going to fix it someday.)

Thank you very much for your suggestions. I will try it and let you know if I need any further help setting it up.