Question about plugin output and the service status

Hello,

My name is Bjoern and this is my first post in this community.
I’m in the process of migrating our Icinga1 based monitoring system to Icinga2.
I will not try to convert the old config. Instead, I am starting from scratch to get the most out of the new system.

I’ve already cleared some hurdles along the way, but now I need your help.

While setting up some checks for our network switches, I’ve observed something unusual (at least to me).
I’m using check_interfaces to check the port states and the bandwidth usage.
When the actual bandwidth usage is higher than the configured threshold, the corresponding port shows a warning, but the service itself still shows OK, as you can see in the picture.
[Screenshot: icinga2_1]

I’m not sure whether the status still shows OK because the check was able to query the interface successfully. I’m confused about the overall result: I expected the status to change to WARNING. I have never seen anything like this in Icinga1, and I was not able to find anything related in the Icinga2 documentation.
Can someone please give me a hint whether this is expected behaviour?
What has to be changed to put the whole service into a warning state (and use it for notifications)?
Am I overlooking something?

Kind regards,
Bjoern

Hi,

very good idea :+1:

Regarding the status question on check_interfaces: the plugin calculates the overall state, which is shown as the green OK indicator. The warning you’re seeing comes from the long output, which lists the state of each individual interface.

Can you extract the executed command line for this check? That would help to get an idea of the parameters passed, and you can also run it manually for testing.

Cheers,
Michael

Hi,

First of all, thanks for your fast response :+1:

I hope that I have pulled the right command line from the debug log.

    [2019-08-08 08:43:36 +0200] notice/Process: Running command '/usr/lib/nagios/plugins/check_interfaces' '--bandwidth' '5' '--hostname' '10.1.36.13' '--lastcheck' '1565246557' '--match-aliases' '--perfdata' 'interfaces::check_multi::plugins=1 interfaces::check_multi::time=0.16 device::check_snmp::uptime=35570878s 1_Gigabit_-_Level::check_snmp::inOctets=2975585594814c 1_Gigabit_-_Level::check_snmp::outOctets=4661973286683c 1_Gigabit_-_Level::check_snmp::inDiscards=0c 1_Gigabit_-_Level::check_snmp::outDiscards=0c 1_Gigabit_-_Level::check_snmp::inErrors=2c 1_Gigabit_-_Level::check_snmp::outErrors=0c 1_Gigabit_-_Level::check_snmp::inUcast=3665987313c 1_Gigabit_-_Level::check_snmp::outUcast=632641836c 1_Gigabit_-_Level::check_snmp::speed=1000000000' '--regex' '^UpC': PID 8495
    [2019-08-08 08:43:36 +0200] notice/Process: PID 8495 ('/usr/lib/nagios/plugins/check_interfaces' '--bandwidth' '5' '--hostname' '10.1.36.13' '--lastcheck' '1565246557' '--match-aliases' '--perfdata' 'interfaces::check_multi::plugins=1 interfaces::check_multi::time=0.16 device::check_snmp::uptime=35570878s 1_Gigabit_-_Level::check_snmp::inOctets=2975585594814c 1_Gigabit_-_Level::check_snmp::outOctets=4661973286683c 1_Gigabit_-_Level::check_snmp::inDiscards=0c 1_Gigabit_-_Level::check_snmp::outDiscards=0c 1_Gigabit_-_Level::check_snmp::inErrors=2c 1_Gigabit_-_Level::check_snmp::outErrors=0c 1_Gigabit_-_Level::check_snmp::inUcast=3665987313c 1_Gigabit_-_Level::check_snmp::outUcast=632641836c 1_Gigabit_-_Level::check_snmp::speed=1000000000' '--regex' '^UpC') terminated with exit code 0

I’ve executed the command directly on the command line and here is the result:

    OK: 1 interface found | interfaces::check_multi::plugins=1 time=0.12 device::check_snmp::uptime=35570967s 1_Gigabit_-_Level::check_snmp::inOctets=2977249923470c outOctets=4661983259826c inDiscards=0c outDiscards=0c inErrors=2c outErrors=0c inUcast=3667151309c outUcast=632790975c speed=1000000000
    [WARNING] 1 Gigabit - Level is up   149.60Mbps(14.96%)/896.46kbps(0.09%)

I am a little bit confused about the multiline result. I have never seen this before…

Cheers,
Bjoern

Everything after the pipe symbol is your performance data.

That’s defined in the plugin API. I’ve rewritten our docs for 2.11; here’s the snapshot URL, which describes the allowed plugin output: https://icinga.com/docs/icinga2/snapshot/doc/05-service-monitoring/#output
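As a rough illustration of that layout, here is a minimal C sketch that splits plugin output into status text, performance data and long output. The function names are hypothetical and not part of Icinga 2 or check_interfaces, and the sketch only handles the simple case where the pipe appears on the first line.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of Icinga 2 or check_interfaces.
 * Splits the FIRST line of plugin output at the first '|' into
 * status text and performance data. If there is no pipe, the
 * perfdata buffer is set to an empty string. */
void split_first_line(const char *output,
                      char *text, size_t text_len,
                      char *perfdata, size_t perf_len)
{
    size_t line_len = strcspn(output, "\n");          /* first line only */
    const char *pipe = memchr(output, '|', line_len);
    size_t t_len = pipe ? (size_t)(pipe - output) : line_len;

    /* drop spaces between the status text and the pipe */
    while (t_len > 0 && output[t_len - 1] == ' ')
        t_len--;
    snprintf(text, text_len, "%.*s", (int)t_len, output);

    if (pipe) {
        const char *p = pipe + 1;
        const char *end = output + line_len;
        while (p < end && *p == ' ')                  /* skip spaces after the pipe */
            p++;
        snprintf(perfdata, perf_len, "%.*s", (int)(end - p), p);
    } else if (perf_len > 0) {
        perfdata[0] = '\0';
    }
}

/* Returns a pointer to the long output (everything after the first
 * line), or to the terminating NUL if there is only one line. */
const char *long_output(const char *output)
{
    const char *nl = strchr(output, '\n');
    return nl ? nl + 1 : output + strlen(output);
}
```

Fed the two-line result from the manual run, the first function would yield "OK: 1 interface found" as status text and the counters as performance data, while the second would point at the "[WARNING] 1 Gigabit - Level is up ..." line.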

The warning for the interface itself originates from the --bandwidth parameter: the inbound utilisation (14.96%) is higher than the configured threshold of 5%.
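The arithmetic behind that per-interface warning can be sketched as below, using the figures from the plugin output quoted earlier in this thread (149.60 Mbps in, 896.46 kbps out, 1 Gbit/s link, threshold 5). The helper names are hypothetical; the integer truncation before the comparison mirrors the snmp_bulkget.c excerpt quoted further down in the thread.

```c
#include <stdint.h>

/* Hypothetical helper: link utilisation in percent of the link speed.
 * Returns 0.0 if the speed is unknown (0) to avoid dividing by zero. */
double load_percent(uint64_t bits_per_second, uint64_t speed_bps)
{
    if (speed_bps == 0)
        return 0.0;
    return (double)bits_per_second * 100.0 / (double)speed_bps;
}

/* Hypothetical helper: non-zero if either direction exceeds the
 * threshold. The loads are truncated to int before comparing, and
 * a threshold of 0 or less disables the bandwidth check. */
int exceeds_bandwidth(double in_load, double out_load, int bw_percent)
{
    return bw_percent > 0 &&
           ((int)in_load > bw_percent || (int)out_load > bw_percent);
}
```

With 149,600,000 bit/s on a 1,000,000,000 bit/s link the inbound load is about 14.96%, which truncates to 14 and is above 5, so the interface is flagged even though the outbound load (about 0.09%) is far below the threshold.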

I’m not really familiar with the plugin’s code (although it comes from our dev team and git blame shows my name, I only imported it into the git repo in 2014). From reading it, the overall warning state is calculated from interface errors, not from exceeding the bandwidth threshold.

That explains the different states in the output.

Cheers,
Michael

Thanks for the explanation.
I was not familiar with multiline outputs and the possibilities that come with them.

If I understand it correctly, I have to modify the source code of check_interfaces to get a warning state when the bandwidth limit is exceeded. In my opinion, both criteria (interface errors and bandwidth) should be able to trigger a warning for the service.
I guess this behavioural change should be implemented as an additional command-line switch.

Phew… I will see how far I can get with my C “skills” :wink:

I’m not sure why this is the case. I don’t mind an issue over at GitHub for adjusting the behaviour; IMHO that’s a bug, and I will discuss it with @mhein :slight_smile:

Good morning,

thanks for your help. I am not very experienced with GitHub and would not have filed a bug/issue :smiley:
Meanwhile, I was able to spot a section of code in snmp_bulkget.c where I could “patch” in the desired behaviour:

Original Code (starts at Line 1031 in snmp_bulkget.c):

    if (lastcheck && (interfaces[i].speed || speed)) {
        inbitps = (subtract64(interfaces[i].inOctets, oldperfdata[i].inOctets) / (u64)lastcheck) * 8ULL;
        outbitps = (subtract64(interfaces[i].outOctets, oldperfdata[i].outOctets) / (u64)lastcheck) * 8ULL;
        if (speed) {
            inload = (long double)inbitps / ((long double)speed/100L);
            outload = (long double)outbitps / ((long double)speed/100L);
        } else {
            /* use the interface speed if a speed is not given */
            inload = (long double)inbitps / ((long double)interfaces[i].speed/100L);
            outload = (long double)outbitps / ((long double)interfaces[i].speed/100L);
        }
        if ( (bw > 0) && ((int)inload > bw || (int)outload > bw))
            warn++;
    }

Modified Code:

    if (lastcheck && (interfaces[i].speed || speed)) {
        inbitps = (subtract64(interfaces[i].inOctets, oldperfdata[i].inOctets) / (u64)lastcheck) * 8ULL;
        outbitps = (subtract64(interfaces[i].outOctets, oldperfdata[i].outOctets) / (u64)lastcheck) * 8ULL;
        if (speed) {
            inload = (long double)inbitps / ((long double)speed/100L);
            outload = (long double)outbitps / ((long double)speed/100L);
        } else {
            /* use the interface speed if a speed is not given */
            inload = (long double)inbitps / ((long double)interfaces[i].speed/100L);
            outload = (long double)outbitps / ((long double)interfaces[i].speed/100L);
        }

        if ( (bw > 0) && ((int)inload > bw || (int)outload > bw)) {
            warnflag++;
            warn++;
        }
    }

I’ve extended the last if clause with “warnflag++” and wrapped both statements in curly braces.
After recompiling, the check shows the desired behaviour.
I am not a programmer, and I do not know whether this change has unwanted side effects.
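To make the effect of the change easier to reason about, here is a deliberately simplified model of the two counters. It assumes that `warn` only marks the individual interface in the long output while `warnflag` feeds the plugin's exit code; the excerpt above does not show that part of snmp_bulkget.c, so this is an assumption drawn from the observed behaviour. The struct, the function names and the `patched` parameter are hypothetical.

```c
/* Exit codes as defined by the monitoring plugin API. */
enum { STATE_OK = 0, STATE_WARNING = 1, STATE_CRITICAL = 2, STATE_UNKNOWN = 3 };

/* Hypothetical model of the two counters seen in the excerpt. */
struct counters {
    int warn;      /* assumed: marks the interface line as [WARNING] */
    int warnflag;  /* assumed: contributes to the overall exit code  */
};

/* Called when an interface exceeds the bandwidth threshold.
 * patched == 0 models the original code (only warn is raised),
 * patched != 0 models the modified code (warnflag is raised too). */
void note_bandwidth_exceeded(struct counters *c, int patched)
{
    if (patched)
        c->warnflag++;
    c->warn++;
}

/* Assumed mapping from the flag to the service state. */
int overall_state(const struct counters *c)
{
    return c->warnflag > 0 ? STATE_WARNING : STATE_OK;
}
```

In this model the original code leaves the overall state at OK (exit code 0) while the interface line still carries a warning, which matches the output shown earlier. The patched variant returns WARNING (exit code 1), so the service state changes and notifications can be triggered.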

Cheers,
Bjoern

Hehe ok, that’s what I wanted to do as well, nice catch :+1:

GitHub lets you describe your problem and possible solutions by opening a ticket for the developers. They can comment, assign a release, or say: nice idea, but we cannot do this. That way it won’t be just me replying here; the whole team knows about it.

Don’t be shy, just try it and help improve open source :slight_smile:

Cheers,
Michael

Alright, here we go :+1:
https://github.com/NETWAYS/check_interfaces/issues/8

Please let me know if something could be done better.
Cheers,
Bjoern
