Problems with argument array "latency" - Error: Non-optional macro 'nn_mounts' used in argument '-p' is missing

Hi,

I’ve been struggling to figure out why the host template field (array) I defined only works intermittently during the day. It almost seems there is a latency in resolving the value for “nn_mounts”, which is used in a local check. I’m not sure what all to provide here, so please let me know if I am missing something. Am I defining things in the wrong place?

Director data fields entry

Host template

zones.d/director-global/host_templates.conf
template Host "hadoop-vars" {
    vars.nn_mounts = [ "/hadoop" ]
}

Inherited by:

zones.d/director-global/host_templates.conf
template Host "Standard Linux Server" {
    import "host-vars"
    import "hadoop-vars"

    check_command = "ping4"
}
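To make the inheritance chain concrete, here is a minimal sketch of a host object that picks up “nn_mounts” through “Standard Linux Server”. It is hand-written rather than rendered by Director, and the host name, address and second mount point are hypothetical placeholders. A plain “=” on the host replaces the template's array entirely; “+=” would append to it instead.

```
// Hypothetical host; name, address and mount points are placeholders.
object Host "hadoop-node-01.example.com" {
    // Pulls in vars.nn_mounts = [ "/hadoop" ] via the "hadoop-vars" template.
    import "Standard Linux Server"

    address = "192.0.2.10"
    groups = [ "dev-mgt-nodes" ]

    // Optional per-host override: replaces the inherited array completely.
    vars.nn_mounts = [ "/hadoop", "/hadoop-data" ]
}
```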

Command: check_disk: https://github.com/nagios-plugins/nagios-plugins

zones.d/director-global/commands.conf
object CheckCommand "check-disk-filedrop" {
import "plugin-check-command"
command = [ PluginDir + "/check_disk" ]
timeout = 1m
arguments += {
    "-A" = {
        description = "Explicitly select all paths. This is equivalent to -R .*"
        order = 1
        set_if = "$disk_all$"
    }
    "-C" = {
        description = "Clear thresholds"
        set_if = "$disk_clear$"
    }
    "-E" = {
        description = "For paths or partitions specified with -p, only check for exact paths"
        set_if = "$disk_exact_match$"
    }
    "-I" = {
        description = "Regular expression to ignore selected path/partition (case insensitive) (may be repeated)"
        order = 2
        repeat_key = true
        value = "$disk_ignore_eregi_path$"
    }
    "-K" = {
        description = "Exit with CRITICAL status if less than PERCENT of inode space is free"
        order = -3
        value = "$disk_inode_cfree$"
    }
    "-L" = {
        description = "Only check local filesystems against thresholds. Yet call stat on remote filesystems to test if they are accessible (e.g. to detect Stale NFS Handles)"
        set_if = "$disk_stat_remote_fs$"
    }
    "-M" = {
        description = "Display the mountpoint instead of the partition"
        set_if = "$disk_mountpoint$"
    }
    "-R" = {
        description = "Case insensitive regular expression for path/partition (may be repeated)"
        repeat_key = true
        value = "$disk_eregi_path$"
    }
    "-W" = {
        description = "Exit with WARNING status if less than PERCENT of inode space is free"
        order = -3
        value = "$disk_inode_wfree$"
    }
    "-X" = {
        description = "Ignore all filesystems of indicated type (may be repeated)"
        repeat_key = true
        value = "$disk_exclude_type$"
    }
    "-c" = {
        description = "Exit with CRITICAL status if less than INTEGER units of disk are free or Exit with CRITICAL status if less than PERCENT of disk space is free"
        order = -3
        required = true
        value = "15%"
    }
    "-e" = {
        description = "Display only devices/mountpoints with errors"
        set_if = "$disk_errors_only$"
    }
    "-f" = {
        description = "Don't account root-reserved blocks into freespace in perfdata"
        set_if = "$disk_ignore_reserved$"
    }
    "-g" = {
        description = "Group paths. Thresholds apply to (free-)space of all partitions together"
        value = "$disk_group$"
    }
    "-i" = {
        description = "Regular expression to ignore selected path or partition (may be repeated)"
        order = 2
        repeat_key = true
        value = "$disk_ignore_ereg_path$"
    }
    "-k" = {
        description = "Same as --units kB"
        set_if = "$disk_kilobytes$"
    }
    "-l" = {
        description = "Only check local filesystems"
        set_if = "$disk_local$"
    }
    "-m" = {
        description = "Same as --units MB"
        set_if = "$disk_megabytes$"
    }
    "-p" = {
        description = "Path or partition (may be repeated)"
        order = 1
        repeat_key = true
        value = "$disk_partitions$"
    }
    "-p_old" = {
        order = 1
        value = "$disk_partition$"
    }
    "-r" = {
        description = "Regular expression for path or partition (may be repeated)"
        repeat_key = true
        value = "$disk_ereg_path$"
    }
    "-t" = {
        description = "Seconds before connection times out (default: 10)"
        value = "$disk_timeout$"
    }
    "-u" = {
        description = "Choose bytes, kB, MB, GB, TB (default: MB)"
        value = "$disk_units$"
    }
    "-w" = {
        description = "Exit with WARNING status if less than INTEGER units of disk are free or Exit with WARNING status if less than PERCENT of disk space is free"
        order = -3
        required = true
        value = "25%"
    }
    "-x" = {
        description = "Ignore device (only works if -p unspecified)"
        value = "$disk_partitions_excluded$"
    }
    "-x_old" = "$disk_partition_excluded$"
}
vars.disk_cfree = "10%"
vars.disk_exclude_type = [
    "none",
    "tmpfs",
    "sysfs",
    "proc",
    "configfs",
    "devtmpfs",
    "devfs",
    "mtmfs",
    "tracefs",
    "cgroup",
    "fuse.gvfsd-fuse",
    "fuse.gvfs-fuse-daemon",
    "fdescfs",
    "overlay",
    "nsfs",
    "squashfs"
]
vars.disk_megabytes = true
vars.disk_wfree = "20%"
}

I tried reordering the field priority and making ‘-p’ always required, but alas, no luck.

Event history snippet:

Sunday, October 27, 2019
DOWNTIME START
14:00:00
 [icingaadmin] Scheduled downtime for Sunday maintenance

Monday, October 21, 2019
OK
15:14:01
 DISK OK - free space: /hadoop 226187 MB (44.19% inode=100%);

UNKNOWN
15:13:02
 [ 1/7 ] Error: Non-optional macro 'nn_mounts' used in argument '-p' is missing.

OK
15:09:03
 DISK OK - free space: /hadoop 226187 MB (44.19% inode=100%);

UNKNOWN
15:08:04
 [ 1/7 ] Error: Non-optional macro 'nn_mounts' used in argument '-p' is missing.

OK
15:04:04
 DISK OK - free space: /hadoop 226187 MB (44.19% inode=100%);

UNKNOWN
15:03:05
 [ 1/7 ] Error: Non-optional macro 'nn_mounts' used in argument '-p' is missing.

OK
14:59:06
 DISK OK - free space: /hadoop 226187 MB (44.19% inode=100%);

Do you use a cluster or multiple endpoints in the zone? Maybe the check command is not synced correctly to every endpoint.

Please post the service apply rule where vars.nn_mounts is set to “$disk_partitions$”.

I guess that would be multiple endpoints in the zone. We have one master, and several hosts connect to it through the Icinga agent.
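A minimal zones.conf sketch for the topology described here (one master, agents connected below it), assuming Director's default global zone “director-global”. All endpoint and zone names are hypothetical. Because the service template sets command_endpoint = host_name, each agent runs check_disk with its own copy of the “check-disk-hadoop” definition, which it only receives if it declares the global zone and accepts synced config.

```
// zones.conf on an agent (hypothetical names).
object Endpoint "icinga-master.example.com" {
    host = "icinga-master.example.com"
    port = 5665
}

object Zone "master" {
    // A single endpoint here means only this master schedules the checks.
    endpoints = [ "icinga-master.example.com" ]
}

// The endpoint name must match the Host object name,
// because the service uses command_endpoint = host_name.
object Endpoint "hadoop-node-01.example.com" { }

object Zone "hadoop-node-01.example.com" {
    endpoints = [ "hadoop-node-01.example.com" ]
    parent = "master"
}

// Global zone carrying the Director-generated templates and commands.
object Zone "director-global" {
    global = true
}

// features-enabled/api.conf on the same agent.
object ApiListener "api" {
    accept_config = true    // receive the synced director-global config
    accept_commands = true  // allow the master to trigger command_endpoint checks
}
```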

Sorry, I pasted the wrong command above. This is the right one; it is similar. I trimmed out the standard check_disk arguments I did not change.

zones.d/director-global/commands.conf
object CheckCommand "check-disk-hadoop" {
    import "plugin-check-command"
    command = [ PluginDir + "/check_disk" ]
    timeout = 1m
    arguments += {
        "-A" = {
            description = "Explicitly select all paths. This is equivalent to -R .*"
            order = 1
            set_if = "$disk_all$"
        }
  [...]
        }
        "-p" = {
            description = "Path or partition (may be repeated)"
            order = 1
            repeat_key = true
            required = true
            value = "$nn_mounts$"
        }
[...]
    vars.disk_cfree = "10%"
    vars.disk_exclude_type = [
        "none",
        "tmpfs",
        "sysfs",
        "proc",
        "configfs",
        "devtmpfs",
        "devfs",
        "mtmfs",
        "tracefs",
        "cgroup",
        "fuse.gvfsd-fuse",
        "fuse.gvfs-fuse-daemon",
        "fdescfs",
        "overlay",
        "nsfs",
        "squashfs"
    ]
    vars.disk_ignore_eregi_path = "/var/lib/docker"
    vars.disk_megabytes = true
    vars.disk_wfree = "20%"
}
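For reference, the “-p” argument from this command, annotated with how Icinga 2 resolves it. This is a sketch based on the documented argument attributes (value, repeat_key, required) and the documented macro lookup order for commands (service vars, then host vars, then command vars); the second mount point in the comment is hypothetical.

```
arguments += {
    "-p" = {
        description = "Path or partition (may be repeated)"
        order = 1
        // An array value repeats the key: [ "/hadoop", "/hadoop-data" ]
        // becomes "-p /hadoop -p /hadoop-data" on the command line.
        repeat_key = true
        // If $nn_mounts$ cannot be resolved, required = true produces the
        // UNKNOWN result "Non-optional macro 'nn_mounts' used in argument
        // '-p' is missing". Without it, "-p" would simply be left out and
        // check_disk would check all mounted filesystems (minus -X types).
        required = true
        // Looked up in service vars first, then host vars
        // (here: the "hadoop-vars" template), then the command's own vars.
        value = "$nn_mounts$"
    }
}
```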

zones.d/director-global/service_templates.conf
template Service "check-disk-hadoop" {
    check_command = "check-disk-hadoop"
    max_check_attempts = "7"
    enable_notifications = true
    enable_active_checks = true
    command_endpoint = host_name
}

zones.d/director-global/service_apply.conf
apply Service "check-hadoop-mount" {
    import "check-disk-hadoop"
    assign where "dev-mgt-nodes" in host.groups

    import DirectorOverrideTemplate
}
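A variant of this apply rule that makes the dependency on the host variable explicit: the service copies the host's array into its own vars (service vars are looked up before host vars) and is only created on hosts that actually define the array. This is a hedged sketch, not a confirmed fix; it would not help if the root cause is an endpoint with out-of-date configuration. With Director, the equivalent settings would go into the apply rule's form, since hand edits to the rendered zones.d files do not survive the next deployment.

```
apply Service "check-hadoop-mount" {
    import "check-disk-hadoop"

    // Copy the host's array onto the service so $nn_mounts$
    // resolves at the service level before the host lookup.
    vars.nn_mounts = host.vars.nn_mounts

    // Only create the service where the host defines nn_mounts as an array.
    assign where "dev-mgt-nodes" in host.groups && typeof(host.vars.nn_mounts) == Array

    import DirectorOverrideTemplate
}
```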

Looks fine to me. I have only had this problem when one zone had multiple endpoints. When the error occurs, the “Check source” shown in Icinga Web 2 should be a different endpoint.