I’ve recently found that check_nwc_health doesn’t scale for me: it simply consumes too many host resources when querying a couple of switches for multiple checks. As a result I’m looking at the check_interfaces check. My initial use of the check seems to be working OK; however, its behaviour isn’t strictly what I need, or possibly I’m using it wrong. I’d therefore like to ask a few questions about its operation.
Down is OK?
I’ve found that by default the check flags switch interfaces which are down as critical. Using the --down-is-ok flag I’ve been able to change this behaviour. Unfortunately the check now appears to report all interfaces as up and thus OK, rather than keeping the true status (say, Down) and returning OK alongside it. Is that the correct operation, or am I doing something wrong? I’d certainly like to maintain visibility of the true interface state.
I’m passing the data through to InfluxDB and then to Grafana for graphing. As you’d expect, I want to graph the number of in/out octets. Reading through the documentation, it isn’t clear whether the following flags/args are required
I’ve set both as follows, but I’m guessing here; I’m not even sure whether $LASTSERVICECHECK$ and $SERVICEPERDATA$ are available by default.
I’m mainly asking because my initial use of check_interfaces produced counters which simply incremented in Grafana, so I suspect I was doing something wrong.
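For context on the incrementing graphs: in Nagios-style performance data the c unit marks a continuous counter, so the raw value is expected to grow forever and a rate has to be derived from two successive samples, either by the check or in the query layer. The following is only a minimal sketch of that arithmetic in Python. The function name counter_rate and the 64-bit wrap assumption are illustrative, and it says nothing about how check_interfaces itself uses its flags.

```python
def counter_rate(prev_value, prev_time, curr_value, curr_time, counter_bits=64):
    """Per-second rate between two samples of a monotonically increasing counter.

    Assumption: a negative delta means the counter wrapped once at
    2**counter_bits. A device reboot (counter reset to 0) looks the same
    and would need separate handling, e.g. by watching uptime.
    """
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be taken at increasing times")
    delta = curr_value - prev_value
    if delta < 0:
        delta += 2 ** counter_bits
    return delta / elapsed


# Two hypothetical inOctets samples taken 60 seconds apart.
octets_per_second = counter_rate(1000, 0, 4000, 60)
bits_per_second = octets_per_second * 8
print(octets_per_second, bits_per_second)
```

If the raw counters are stored as-is, the same calculation can usually be done at query time instead (InfluxQL has a NON_NEGATIVE_DERIVATIVE function for this purpose).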
Some checks I’ve used previously rely on temporary state files and allow you to set a directory for them. Are state files required for check_interfaces? Can you set a directory?
Terminated by signal 11 (Segmentation fault)
I’m also finding that the same endpoint responds with <Terminated by signal 11 (Segmentation fault).> when querying switches. Has this been seen before? I’ve tried cutting down the number of interfaces processed with a regex, but that doesn’t seem to resolve the issue.
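To narrow down which switches trigger the crash, the plugin can be run by hand for each host and the exit status inspected. This is a rough sketch in Python, not part of the plugin: run_plugin is a hypothetical helper, and it relies on the fact that on POSIX systems subprocess reports a child killed by signal N as return code -N.

```python
import signal
import subprocess


def run_plugin(argv, timeout=60):
    """Run a check command and report whether it exited or was killed by a signal."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    if proc.returncode < 0:
        # Negative return code: the child was terminated by that signal number.
        name = signal.Signals(-proc.returncode).name
        return f"killed by {name}", proc.stdout
    return f"exit {proc.returncode}", proc.stdout


# Example call, using the command line from this post (run from the plugin directory):
# status, output = run_plugin(["./check_interfaces", "--down-is-ok",
#                              "--hostname", "192.168.0.101", "--regex", ".*/A.*"])
# print(status)
```

A result of "killed by SIGSEGV" for some hosts but not others would at least show whether the crash depends on the device being queried.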
I have verified the command line and it works:
/usr/lib/nagios/plugins# ./check_interfaces --down-is-ok --hostname 192.168.0.101 --regex .*/A.*
OK: 16 interfaces found - 6 are administratively down | interfaces::check_multi::plugins=10 time=0.19 device::check_snmp::uptime=8177167s
1-A1::check_snmp::inOctets=3632723651605c outOctets=21729461040411c inDiscards=0c outDiscards=0c inErrors=0c outErrors=0c inUcast=275477388c outUcast=186586049c speed=10000000000
1-A2::check_snmp::inOctets=1471063649353c outOctets=7311080489496c inDiscards=25c outDiscards=0c inErrors=0c outErrors=0c inUcast=3083969105c outUcast=4151200186c speed=10000000000
1-A3::check_snmp::inOctets=5499076683972c outOctets=19846139838770c inDiscards=64027c outDiscards=0c inErrors=1461c outErrors=0c inUcast=3232755079c outUcast=4247765847c speed=10000000000
1-A4::check_snmp::inOctets=1710983268056c outOctets=8346909781954c inDiscards=0c outDiscards=0c inErrors=0c outErrors=0c inUcast=362525952c outUcast=204409267c speed=10000000000
1-A5::check_snmp::inOctets=656568249799493c outOctets=109009535234968c inDiscards=7324c outDiscards=0c inErrors=0c outErrors=0c inUcast=1410028473c outUcast=2404064899c speed=10000000000
1-A6::check_snmp::inOctets=630220973513508c outOctets=54423903572618c inDiscards=18395c outDiscards=0c inErrors=0c outErrors=0c inUcast=913834946c outUcast=356883353c speed=10000000000
1-A7::check_snmp::inOctets=364271407143640c outOctets=339381140960195c inDiscards=115828c outDiscards=0c inErrors=0c outErrors=0c inUcast=3080131355c outUcast=303014111c speed=10000000000
1-A8::check_snmp::inOctets=0c outOctets=0c inDiscards=0c outDiscards=0c inErrors=0c outErrors=0c inUcast=0c outUcast=0c speed=0
2-A1::check_snmp::inOctets=2425180231591c outOctets=8296578815802c inDiscards=50c outDiscards=0c inErrors=0c outErrors=0c inUcast=2572844733c outUcast=1961283746c speed=10000000000
2-A2::check_snmp::inOctets=836158712837c outOctets=4726343224505c inDiscards=98c outDiscards=0c inErrors=0c outErrors=0c inUcast=1807198362c outUcast=2640435916c speed=10000000000
2-A3::check_snmp::inOctets=3125784448603c outOctets=8064388405270c inDiscards=6573c outDiscards=0c inErrors=0c outErrors=0c inUcast=4272857709c outUcast=2617758601c speed=10000000000
2-A4::check_snmp::inOctets=1038366388130c outOctets=2450672298551c inDiscards=103c outDiscards=0c inErrors=0c outErrors=0c inUcast=1897350490c outUcast=2488323239c speed=10000000000
2-A5::check_snmp::inOctets=138139335120593c outOctets=46543000385989c inDiscards=11779c outDiscards=0c inErrors=0c outErrors=0c inUcast=4022151097c outUcast=418871121c speed=10000000000
2-A6::check_snmp::inOctets=285498962710322c outOctets=22849093260086c inDiscards=27847c outDiscards=0c inErrors=0c outErrors=0c inUcast=3188055521c outUcast=2062931179c speed=10000000000
2-A7::check_snmp::inOctets=247704272981331c outOctets=43672644346766c inDiscards=79106c outDiscards=1034c inErrors=0c outErrors=0c inUcast=1447258185c outUcast=328840694c speed=10000000000
2-A8::check_snmp::inOctets=0c outOctets=0c inDiscards=0c outDiscards=0c inErrors=0c outErrors=0c inUcast=0c outUcast=0c speed=0
[OK] 1/A1 is up
[OK] 1/A2 is up
[OK] 1/A3 is up
[OK] 1/A4 is up
[OK] 1/A5 is up
[OK] 1/A6 is up
[OK] 1/A7 is up
[OK] 1/A8 is up
[OK] 2/A1 is up
[OK] 2/A2 is up
[OK] 2/A3 is up
[OK] 2/A4 is up
[OK] 2/A5 is up
[OK] 2/A6 is up
[OK] 2/A7 is up
[OK] 2/A8 is up
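To check what actually ends up in InfluxDB, it can help to split the performance data above into per-interface values. The sketch below is Python and rests on an assumption drawn only from the output shown here: a label of the form name::plugin::metric starts a new context, and the bare labels after it (outOctets, speed and so on) belong to that same context. parse_perfdata is a hypothetical helper, not something shipped with the plugin.

```python
import re

# label=value[unit]; the unit is optional (e.g. "c" for counters, "s" for seconds).
_METRIC = re.compile(r"(?P<label>[^=\s]+)=(?P<value>-?\d+(?:\.\d+)?)(?P<uom>[A-Za-z%]*)")


def parse_perfdata(perfdata):
    """Group metrics by context, returning {context: {metric: (value, unit)}}."""
    metrics = {}
    context = ""  # used for any bare labels seen before the first prefixed one
    for match in _METRIC.finditer(perfdata):
        label = match.group("label")
        if "::" in label:
            # Assumed convention: "1-A1::check_snmp::inOctets" switches context to "1-A1".
            parts = label.split("::")
            context, label = parts[0], parts[-1]
        raw = match.group("value")
        value = float(raw) if "." in raw else int(raw)
        metrics.setdefault(context, {})[label] = (value, match.group("uom"))
    return metrics


# A shortened excerpt of the output above.
sample = (
    "interfaces::check_multi::plugins=10 time=0.19 "
    "device::check_snmp::uptime=8177167s "
    "1-A1::check_snmp::inOctets=3632723651605c outOctets=21729461040411c "
    "speed=10000000000"
)
parsed = parse_perfdata(sample)
print(parsed["1-A1"]["inOctets"])
```

Every octet value carries the c unit, which fits with the raw numbers only ever increasing when graphed directly.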
I really need to fix this: after nearly 3 months’ work I’ve not managed to deliver a scalable solution which can read our switches. I have until the end of the week to deliver, and at present I have nothing which is reliable, scalable, or usable.