Help please: adapting disk thresholds per host

Hi all,

We recently put in an Icinga monitoring system and have a couple of servers that we would like to change the disk thresholds for because they’re caching servers that constantly sit at a high usage level.

We’re trying to override the thresholds in the host configuration, but Icinga seems to just ignore them. If we adjust vars.disk_cfree or vars.disk_wfree in the services configuration files (as per the commented-out lines in the services.conf extract below), it works perfectly but obviously affects all systems that have that service. If we try this in hosts, nothing changes:

	object Host "server001" {
		address = "1.2.3.4"
		vars.os = "Linux"
		vars.application = "my-application"
		import "my-additonal-application-checks"
		vars.disk_wfree = "2%"
		vars.disk_wfree = "1%"
	}

According to what we’ve seen on variable precedence, we would expect variables in the hosts’ conf files to take priority over those on the service, so we must be doing something wrong.

I’m new to Icinga so please excuse any misunderstandings or superfluous information; I’m not quite sure what’s standard. Our configuration is as below (it seems pretty logical so I imagine it’s fairly normal):

Hosts are configured in conf.d/hosts as below:

	object Host "server001" {
		address = "1.2.3.4"
		vars.os = "Linux"
		vars.application = "my-application"
		import "my-additonal-application-checks"
	}

Hosts are added to hostgroups in hostgroups.conf:

	object HostGroup "linux-servers"  {
	  display_name = "Linux Servers"
	  assign where host.vars.os == "Linux"
	}

Services are then applied according to services.conf files:

	apply Service "Disks" to Host {
	  display_name = "Linux Disks"
	  import "by_ssh"
	  vars.by_ssh_command = "/usr/lib64/nagios/plugins/check_disk"
	  vars.by_ssh_arguments = {
		 "-X" = {
		  value = "$disk_exclude_type$"
		  description = "Ignore all filesystems of indicated type (may be repeated)"
		  repeat_key = true
		}
		"-w" = {
		  value = "$disk_wfree$"
		  description = "Exit with WARNING status if less than INTEGER units of disk are free or Exit with WARNING status if less than PERCENT of disk space is free"
		  required = true
		  order = -3
		}
		"-c" = {
		  value = "$disk_cfree$"
		  description = "Exit with CRITICAL status if less than INTEGER units of disk are free or Exit with CRITICAL status if less than PERCENT of disk space is free"
		  required = true
		  order = -3
		}
		"-u" = {
		  value = "MB"
		}
	  }
	 # vars.disk_wfree = "2%"
	 # vars.disk_cfree = "1%"
	  vars.disk_exclude_type = [
		"none",
		"tmpfs",
		"sysfs",
		"proc",
		"configfs",
		"devtmpfs",
		"devfs",
		"mtmfs",
		"tracefs",
		"cgroup",
		"fuse.gvfsd-fuse",
		"fuse.gvfs-fuse-daemon",
		"fdescfs",
		"overlay",
		"nsfs",
		"squashfs"
	  ]
	  assign where "linux-servers" in host.groups && host.name != NodeName
	}

Hi and welcome :slight_smile:,

You need to hand the parameters over from your host to the service.
You have already defined them at the host level:

	vars.disk_wfree = "2%"
	vars.disk_cfree = "1%"

In the service definition, reference the host's parameters with:

	vars.disk_wfree = host.vars.disk_wfree
	vars.disk_cfree = host.vars.disk_cfree

This requires the parameters to be set on every host, but you can fall back to a default value when they are not set. An example with disk_wfree:

	// check if disk_wfree is set on host level
	if (host.vars.disk_wfree) {
		// use value from host
		vars.disk_wfree = host.vars.disk_wfree
	} else {
		// fall back to default
		vars.disk_wfree = "2%"
	}
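
Applied to both thresholds, the fallback might look roughly like this inside the apply rule from the question. This is only a sketch: the rest of the service definition is elided, and the defaults "2%" and "1%" are just the values used in this thread, not recommended settings.

	apply Service "Disks" to Host {
		import "by_ssh"
		// ... by_ssh_command, by_ssh_arguments and disk_exclude_type as in the question ...

		// warning threshold: host value if set, otherwise the default
		if (host.vars.disk_wfree) {
			vars.disk_wfree = host.vars.disk_wfree
		} else {
			vars.disk_wfree = "2%"
		}

		// critical threshold: same pattern
		if (host.vars.disk_cfree) {
			vars.disk_cfree = host.vars.disk_cfree
		} else {
			vars.disk_cfree = "1%"
		}

		assign where "linux-servers" in host.groups && host.name != NodeName
	}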

PS: Looks like a typo on the host definition, you defined disk_wfree twice.
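
For reference, the host definition from the question with the second line changed to disk_cfree would presumably read as follows (values taken from the lines quoted above):

	object Host "server001" {
		address = "1.2.3.4"
		vars.os = "Linux"
		vars.application = "my-application"
		import "my-additonal-application-checks"
		vars.disk_wfree = "2%"
		vars.disk_cfree = "1%"
	}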

Greetz

Thanks for the great reply Alex - that’s interesting, it seems to work backwards to how I would think it would be done, but I’m happy either way!

Thanks for pointing out the typo - I’d deleted that out of our config but mocked it up again quickly to put in this post and forgot to change a w to a c.

I’ll give it a go and let you know how I get on.

Cheers :relaxed:

Hey Alex,

That worked perfectly, thanks very much.

I got it working and then figured that I would want to restrict the custom thresholds to certain partitions - I don’t really want all my partitions to act as if they’re always going to be within a few percent of being full.

With that in mind, I configured my existing Linux checks service to ignore the customised partitions I set in the hosts file, and added another disk check that covers just the customised ones. It works nicely, except that when there isn’t anything to exclude, the -E parameter in the check gets ignored and I basically end up with two identical standard disk checks, which isn’t great.

This is what I’ve been trying to do in a sort of pseudo code to get around it by cancelling the check for partitions with customised thresholds:

	if (length of host.vars.disk_custom_partitions = 0)
		exit 0 #exit with success - check not required
	else
		continue check

Unfortunately I can’t for the life of me work out how I should achieve this in Icinga - it doesn’t really seem to allow you to just continue with the check or to work with not conditions. Have you any thoughts that might help here please?

	if not (host.vars.disk_custom_partitions) {
		dummy_state 0
	} else {
		// Do nothing, continue the check
	}

Thanks :slight_smile:

Hi,

For that check, I think you could add a second assign where to the service object, something like:

	// assign check where host.vars.disk_custom_partitions is defined
	assign where host.vars.disk_custom_partitions
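
A minimal sketch of how the dedicated check could be restricted this way. The service name "Disks Custom", the example path in the comment, and the use of check_disk's -p option to select the partitions are assumptions for illustration; only host.vars.disk_custom_partitions comes from the posts above. As far as I know, several assign where lines in one apply rule act as alternatives (any one matching is enough), so the sketch combines the conditions with && in a single line.

	// host side (hypothetical path): vars.disk_custom_partitions = [ "/var/cache" ]
	apply Service "Disks Custom" to Host {
		import "by_ssh"
		vars.by_ssh_command = "/usr/lib64/nagios/plugins/check_disk"
		vars.by_ssh_arguments = {
			"-w" = {
				value = "$disk_wfree$"
				order = -3   // as in the question, so thresholds precede -p
			}
			"-c" = {
				value = "$disk_cfree$"
				order = -3
			}
			"-p" = {
				// assumed: one -p per entry in the host's list
				value = "$disk_custom_partitions$"
				repeat_key = true
			}
		}

		// values handed over from the host, as earlier in the thread
		vars.disk_wfree = host.vars.disk_wfree
		vars.disk_cfree = host.vars.disk_cfree
		vars.disk_custom_partitions = host.vars.disk_custom_partitions

		// only create the service on Linux hosts that define custom partitions
		assign where "linux-servers" in host.groups && host.vars.disk_custom_partitions
	}

Hosts without disk_custom_partitions then simply never get this service, so there is nothing to cancel at check time.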

Greetz

Thanks Alex, that makes perfect sense - don’t run the check rather than exit from it.

I think it all works as I want now, thanks for your help!