Check_command disk

Hello,

For a few days now, I have had a problem with this check command.
I receive false alerts from Icinga2 at different times of the day.
The state goes to warning, critical, OK, and then the cycle starts again.

If I check over SSH on the server, the disk is fine.

There’s no error in Icinga2’s log.

I found an old bug report that advises adding the following to /usr/share/icinga2/include/command-plugins.conf:
    vars.disk_wfree = "15%"
    vars.disk_cfree = "10%"
    vars.disk_inode_wfree = "15%"
    vars.disk_inode_cfree = "10%"
    vars.disk_megabytes = true
    vars.disk_exclude_type = [
            "none",
            "tmpfs",
            "sysfs",
            "proc",
            "configfs",
            "devtmpfs",
            "devfs",
            "mtmfs",
            "tracefs",
            "cgroup",
            "fuse.gvfsd-fuse",
            "fuse.gvfs-fuse-daemon",
            "fdescfs",
            "overlay",
            "nsfs",

but nothing changed.

Icinga2 is on a Debian GNU/Linux 9.12 (stretch)
icinga2 - The Icinga 2 network monitoring daemon (version: r2.11.3-1)

If someone has a lead on what to check in my configuration, I'm interested.

Hi,

some more information would be helpful. What exactly does critical mean? A plugin error? Does it report a full disk or partition? Is it always the same disk/partition that's affected? Is it a local disk or, for example, an NFS mount?

This would be useful to help.

Cheers,
Marcel

Additionally, if you can go to the check in Icingaweb, click “history” at the top and screenshot a chunk of that, we can see the status changes.

Hi Marcel and Blake,

I found the problem: the server's disk was full.
I moved /var/spool/icinga2/perfdata and /var/spool/icinga2/tmp to another disk and created symbolic links.

But I think I missed something in the definition of the disk service:
all my Linux servers report the disk of the Icinga2 server.

// Linux disk
apply Service "disquelinux" {
    display_name = "Disque"
    import "generic-service"

    check_command = "disk"

    vars.disk_wfree = "15%"
    vars.disk_cfree = "10%"
    vars.disk_inode_wfree = "15%"
    vars.disk_inode_cfree = "10%"
    vars.disk_megabytes = true
    vars.disk_exclude_type = [
            "none",
            "tmpfs",
            "sysfs",
            "proc",
            "configfs",
            "devtmpfs",
            "devfs",
            "mtmfs",
            "tracefs",
            "cgroup",
            "fuse.gvfsd-fuse",
            "fuse.gvfs-fuse-daemon",
            "fdescfs",
            "overlay",
            "nsfs",
            "squashfs"
    ]

    assign where host.address && host.vars.os == "Linux"
}

Best regards

Hello,

I'm affected by this issue as well (see the messages above): I have to use vars.disk_megabytes, but I want the values in gigabytes or terabytes.

As mentioned, grepping /usr/share/icinga2/include/command-plugins.conf for "disk" shows that the command is set up to use megabytes.

By the way, I also need this approach for another purpose: being able to choose the partitions I need.

Best regards,
Moustapha Kourouma
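
A minimal sketch of how the unit and the partition list could be overridden per service instead of editing command-plugins.conf. It assumes the ITL "disk" command exposes disk_units (passed to check_disk as -u) and disk_partitions (passed as -p); the service name "disk-gb" and the partition paths are hypothetical placeholders.

```icinga2
apply Service "disk-gb" {
    import "generic-service"
    check_command = "disk"

    // Assumption: turning disk_megabytes off drops the -m flag, so the
    // unit below is the only one passed to the plugin.
    vars.disk_megabytes = false
    // Assumption: disk_units maps to check_disk's -u option (e.g. "GB" or "TB").
    vars.disk_units = "GB"

    // Hypothetical partition list; replace with the mount points you need.
    vars.disk_partitions = [ "/", "/var" ]

    assign where host.vars.os == "Linux"
}
```

Variables set in the service override the defaults of the check command, so the file under /usr/share/icinga2/include can stay untouched and will not be overwritten on package upgrades.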

Hello @Julien!

In your service apply rule I don't see a command_endpoint attribute. However, one is required to pin checks to specific endpoints, and check_disk in particular has to run on the host you'd like to check.

Are you sure it always actually runs on the desired host?

Best,
AK
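
A minimal sketch of what that could look like for the apply rule above. It assumes each agent host carries a custom variable naming its own Endpoint object; the variable name host.vars.client_endpoint is an assumption borrowed from the configuration posted later in this thread, not something present in Julien's setup.

```icinga2
apply Service "disquelinux" {
    import "generic-service"
    display_name = "Disque"

    check_command = "disk"

    // Run the plugin on the monitored host's agent instead of on the master.
    // Assumption: host.vars.client_endpoint holds the name of that host's
    // Endpoint object.
    command_endpoint = host.vars.client_endpoint

    vars.disk_wfree = "15%"
    vars.disk_cfree = "10%"

    // Only match hosts that actually define an agent endpoint.
    assign where host.vars.os == "Linux" && host.vars.client_endpoint
}
```

Without command_endpoint the check is executed wherever the service object is scheduled, which would explain every Linux host reporting the disk of the Icinga2 server.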

I have set up an Icinga host, no Director yet. I added a few hosts for agent-based monitoring (Step 2 – Setting up Agent-based Monitoring), following this guide: How To Monitor Hosts and Services with Icinga on Ubuntu 16.04 | DigitalOcean

I added a disk check but am having issues. Here is the check:
apply Service "disk" {
    import "generic-service"
    check_command = "disk"
    // vars.disk_all = true
    vars.disk_exclude_type = [
            "none",
            "tmpfs",
            "sysfs",
            "proc",
            "configfs",
            "devtmpfs",
            "devfs",
            "mtmfs",
            "tracefs",
            "cgroup",
            "fuse.gvfsd-fuse",
            "fuse.gvfs-fuse-daemon",
            "fdescfs",
            "overlay",
            "nsfs",
            "squashfs"
    ]
    vars.disk_ereg_path = [ "/" ]
    vars.disk_ignore_ereg_path = [ "/run*", "/var/snap*", "/run/user/1000/doc" ]
    command_endpoint = host.vars.client_endpoint
    assign where host.vars.client_endpoint
}
The issue is that it throws an error, for example:
DISK CRITICAL - /run/user/1000/doc is not accessible: Permission denied
This happens on every other check: one check is fine, and the next one throws this error. I'm not sure why it behaves like that instead of always returning the correct result. Am I doing something wrong?

BTW, as user nagios everything is OK; it just somehow doesn't exclude the unnecessary filesystems:
nagios@workstation-01:/root$ /usr/lib/nagios/plugins/check_disk -w 80 -c 90 -A -i '/run*' -i "/snap*" -X tmpfs -X devtmpfs -X devpts -X hugetlbfs
DISK OK - free space: / 42933 MB (72% inode=94%); /boot/efi 98 MB (94% inode=-);

On the workstation host I see in debug.log that the command is executed differently each time:
'-c' '10%' '-w' '20%' '-X' 'none' '-X' 'tmpfs' '-X' 'sysfs' '-X' 'proc' '-X' 'configfs' '-X' 'devtmpfs' '-X' 'devfs' '-X' 'mtmfs' '-X' 'tracefs' '-X' 'cgroup' '-X' 'fuse.gvfsd-fuse' '-X' 'fuse.gvfs-fuse-daemon' '-X' 'fdescfs' '-X' 'overlay' '-X' 'nsfs' '-X' 'squashfs' '-m') terminated with exit code 2

'-c' '10%' '-w' '20%' '-X' 'none' '-X' 'tmpfs' '-X' 'sysfs' '-X' 'proc' '-X' 'configfs' '-X' 'devtmpfs' '-X' 'devfs' '-X' 'mtmfs' '-X' 'tracefs' '-X' 'cgroup' '-X' 'fuse.gvfsd-fuse' '-X' 'fuse.gvfs-fuse-daemon' '-X' 'fdescfs' '-X' 'overlay' '-X' 'nsfs' '-X' 'squashfs' '-m' '-p' '/') terminated with exit code 0

When the exit code is 0, the '-p' '/' option is present. Why the difference? I have declared the disk service only once.
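
Since the successful run is the one that passes '-p' '/', one possible workaround might be to name the partitions explicitly so that -p is always sent. This is only a sketch under assumptions: it presumes the ITL "disk" command maps disk_partitions to check_disk's -p option, and that "/" and "/boot/efi" (taken from the manual plugin run above) are the only mount points of interest. It does not address why the arguments differ between runs.

```icinga2
apply Service "disk" {
    import "generic-service"
    check_command = "disk"

    // Assumption: listing partitions explicitly makes the plugin look only
    // at these paths, so /run/user/1000/doc is never touched.
    vars.disk_partitions = [ "/", "/boot/efi" ]

    command_endpoint = host.vars.client_endpoint
    assign where host.vars.client_endpoint
}
```

The trade-off is that new mount points are not picked up automatically; they have to be added to the list by hand.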

Hello Venko!

We’re working on it:

Best,
A/K

Thanks, one would think it won't take years :)