My name is Bjoern and this is my first post in this community.
I’m in the process of migrating our Icinga1-based monitoring system to Icinga2.
I will not try to convert the old config. Instead, I am starting from scratch to get the most out of the new system.
I’ve already cleared some hurdles along the way, but now I need your help.
While setting up some checks for our network switches, I’ve observed something unusual (at least for me).
I’m using check_interfaces to check the port states and the bandwidth usage.
When the actual bandwidth usage is higher than the configured threshold the corresponding port shows a warning. The service itself shows OK as you can see in the picture.
I’m not sure whether the status still shows OK because the check was able to query the interface successfully. I’m confused about the overall result; I expected the status to change to warning. I have never seen anything like this in Icinga1, and I could not find anything related in the Icinga2 documentation.
Can someone please give me a hint whether this is expected behaviour?
What needs to be changed to put the whole service into a warning state (and use it for notifications)?
Am I overlooking something?
In terms of the status question on check_interfaces, the plugin calculates the overall state which is shown as the green OK indicator. The Warning you’re seeing is from the long output and includes everything needed.
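To illustrate the difference, here is a minimal sketch (not the actual check_interfaces code; the helper name and the output wording are made up for illustration). Icinga takes the service state from the plugin's exit code, while the text, including any per-interface WARNING in the long output, is only displayed:

```c
#include <stdio.h>

/* Standard monitoring plugin exit codes. */
enum { STATE_OK = 0, STATE_WARNING = 1, STATE_CRITICAL = 2, STATE_UNKNOWN = 3 };

/*
 * Hypothetical helper: writes a summary line plus one long-output line
 * into buf and returns the value the plugin would exit with.
 * The per-interface text has no influence on the return value.
 */
int render_result(char *buf, size_t len, int overall_state,
                  const char *iface, int iface_over_threshold)
{
    static const char *names[] = { "OK", "WARNING", "CRITICAL", "UNKNOWN" };

    if (overall_state < STATE_OK || overall_state > STATE_UNKNOWN)
        overall_state = STATE_UNKNOWN;

    snprintf(buf, len, "%s: 1 interface checked\n%s is up%s\n",
             names[overall_state], iface,
             iface_over_threshold ? " [WARNING: bandwidth threshold exceeded]" : "");
    return overall_state;
}
```

Calling `render_result(buf, sizeof buf, STATE_OK, "port1", 1)` produces text that contains a WARNING for the port but still returns 0, which is the situation in the screenshot.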
Can you extract the executed command line for this check? That would help to get an idea of the parameters used, and you can also test it manually.
The warning for the interface itself originates from the --bandwidth parameter: the in and out load are higher than this threshold.
I’m not really familiar with the plugin’s code (although it comes from our dev team and git blame points to me, I only imported it into the git repo in 2014). From reading it, the overall warning state is calculated from interface errors, not from exceeding the bandwidth threshold.
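Roughly, my reading of the logic looks like the sketch below. This is a simplified model and not the plugin's code; the struct, its fields and the function name are hypothetical. Interface errors raise the overall state, while an exceeded bandwidth threshold only adds a marker to the output:

```c
#include <stddef.h>

enum { STATE_OK = 0, STATE_WARNING = 1 };

/* Hypothetical per-interface sample; field names are illustrative only. */
struct iface_sample {
    unsigned long long new_errors; /* error counter increase since last check */
    int load_pct;                  /* higher of in/out load, in percent */
};

struct check_result {
    int markers; /* number of warning markers shown in the long output */
    int state;   /* overall plugin state, i.e. the exit code */
};

struct check_result evaluate(const struct iface_sample *s, size_t n, int bw_pct)
{
    struct check_result r = { 0, STATE_OK };

    for (size_t i = 0; i < n; i++) {
        if (s[i].new_errors > 0) {
            r.markers++;
            r.state = STATE_WARNING; /* errors raise the overall state */
        }
        if (bw_pct > 0 && s[i].load_pct > bw_pct)
            r.markers++;             /* bandwidth only marks the interface */
    }
    return r;
}
```

With one interface at 95% load, no new errors and a threshold of 80, this returns one marker and `STATE_OK`, which matches what you observed.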
Thanks for the explanation.
I was not familiar with multiline outputs and the possibilities that come with them.
If I understand it correctly, I have to modify the source code of check_interfaces to get a warning state when the bandwidth limit is exceeded. In my opinion, both criteria (interface errors and bandwidth) should be able to trigger a warning for the service.
I guess this behavioural change should be implemented as an additional command-line switch.
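As a rough idea of what such a switch could look like, here is a small sketch using getopt_long. Note that `--bw-affects-state` is a name I made up and does not exist in check_interfaces; the struct and function are hypothetical too. Only `--bandwidth` is a real parameter, and I am assuming it takes a percentage:

```c
#include <getopt.h>
#include <stdbool.h>
#include <stdlib.h>

struct bw_options {
    int  bw_pct;           /* --bandwidth threshold in percent, 0 = disabled */
    bool bw_affects_state; /* hypothetical opt-in: bandwidth raises the state */
};

/* Values above 255 so that no short options are implied. */
enum { OPT_BANDWIDTH = 1000, OPT_BW_AFFECTS_STATE };

/* Returns 0 on success, -1 on an unknown option or missing argument. */
int parse_bw_options(int argc, char **argv, struct bw_options *out)
{
    static const struct option longopts[] = {
        { "bandwidth",        required_argument, NULL, OPT_BANDWIDTH },
        { "bw-affects-state", no_argument,       NULL, OPT_BW_AFFECTS_STATE }, /* hypothetical */
        { NULL, 0, NULL, 0 }
    };
    int c;

    out->bw_pct = 0;
    out->bw_affects_state = false; /* default keeps the current behaviour */
    optind = 1;                    /* restart parsing on repeated calls */
    opterr = 0;                    /* no error messages from getopt itself */

    while ((c = getopt_long(argc, argv, "", longopts, NULL)) != -1) {
        switch (c) {
        case OPT_BANDWIDTH:
            out->bw_pct = atoi(optarg);
            break;
        case OPT_BW_AFFECTS_STATE:
            out->bw_affects_state = true;
            break;
        default:
            return -1;
        }
    }
    return 0;
}
```

Keeping the default at `false` would mean existing setups behave as before, and only checks that pass the new switch would turn the whole service to warning.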
Phew… I will see how far I can get with my C “skills”.
I’m not sure why this is the case. I don’t mind an issue over at GitHub for adjusting the behaviour, imho that’s a bug and I will discuss it with @mhein
Thanks for your help. I am not so experienced with GitHub, and I would not have filed a bug/issue.
Meanwhile I was able to spot a section of code in snmp_bulkget.c where I could “patch” in the wanted behaviour:
Original Code (starts at Line 1031 in snmp_bulkget.c):
    if (lastcheck && (interfaces[i].speed || speed)) {
        inbitps = (subtract64(interfaces[i].inOctets, oldperfdata[i].inOctets) / (u64)lastcheck) * 8ULL;
        outbitps = (subtract64(interfaces[i].outOctets, oldperfdata[i].outOctets) / (u64)lastcheck) * 8ULL;
        if (speed) {
            inload = (long double)inbitps / ((long double)speed/100L);
            outload = (long double)outbitps / ((long double)speed/100L);
        } else {
            /* use the interface speed if a speed is not given */
            inload = (long double)inbitps / ((long double)interfaces[i].speed/100L);
            outload = (long double)outbitps / ((long double)interfaces[i].speed/100L);
        }
        if ( (bw > 0) && ((int)inload > bw || (int)outload > bw))
            warn++;
    }
Modified Code:
    if (lastcheck && (interfaces[i].speed || speed)) {
        inbitps = (subtract64(interfaces[i].inOctets, oldperfdata[i].inOctets) / (u64)lastcheck) * 8ULL;
        outbitps = (subtract64(interfaces[i].outOctets, oldperfdata[i].outOctets) / (u64)lastcheck) * 8ULL;
        if (speed) {
            inload = (long double)inbitps / ((long double)speed/100L);
            outload = (long double)outbitps / ((long double)speed/100L);
        } else {
            /* use the interface speed if a speed is not given */
            inload = (long double)inbitps / ((long double)interfaces[i].speed/100L);
            outload = (long double)outbitps / ((long double)interfaces[i].speed/100L);
        }
        if ( (bw > 0) && ((int)inload > bw || (int)outload > bw)) {
            warnflag++;
            warn++;
        }
    }
I’ve extended the last if clause with “warnflag++” and put it all in curly braces.
After compiling, the check shows the wanted behaviour.
I am not a programmer and I do not know if this change has unwanted effects.
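To convince myself of what the snippet computes, I rewrote the arithmetic as a small standalone sketch. The function names are my own and not from snmp_bulkget.c, and I am assuming that lastcheck holds the seconds since the previous check and that speed is given in bits per second:

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the rate calculation: bits per second from two
 * octet counter readings taken `seconds` apart. Unsigned subtraction also
 * covers a single wrap of a 64-bit counter. The caller must make sure that
 * seconds is not 0 (the snippet guards this with `lastcheck &&`).
 */
uint64_t bits_per_second(uint64_t now_octets, uint64_t prev_octets, uint64_t seconds)
{
    return ((now_octets - prev_octets) / seconds) * 8ULL;
}

/* Load as a percentage of the link speed: bitps / (speed / 100). */
long double load_percent(uint64_t bitps, uint64_t speed_bps)
{
    return (long double)bitps / ((long double)speed_bps / 100.0L);
}

/* Same comparison as in the snippet: the load is truncated to an int first. */
int exceeds_threshold(long double load, int bw_pct)
{
    return bw_pct > 0 && (int)load > bw_pct;
}
```

Example: on a 1 Gbit/s link, 30,337,500,000 octets in 300 seconds are 809,000,000 bit/s, which is a load of 80.9%. Because of the cast to int this becomes 80, so a threshold of 80 is not exceeded yet; the warning only appears from 81% upwards.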
Hehe ok, that’s what I wanted to do as well. Nice catch!
GitHub provides you with the possibility to describe your problem and possible solutions, thus opening a ticket for developers. They can comment, assign a release, or say - nice idea, but we cannot do this. It won’t be just me replying here, but a whole team knows about this then.
Don’t be shy, just try it and help improve open source.