SNMP monitoring auto detection

kevin-olbrich · May 4, 2020, 2:06pm

Hi!

I want to extend my monitoring coverage with SNMP. This has been on my to-do list for quite a long time.
There are so many ways to get SNMP counters, but none of them match my expectations.
Monitoring services by hard-coding an OID is easy but I’m searching for a more automated solution.

OpenNMS as well as LibreNMS are good tools but lack remote-system support (which means they only work reliably on the same LAN/location). Also, during my tests, they didn’t detect many aspects I would like to see (like routes per BGP peer on Juniper MX, etc.).

Browsing the docs of various tools, I had the idea of running snmpwalk across all MIBs. It should be possible to learn which OIDs are lists (like network interfaces indexed by ifIndex), correct?
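A rough sketch of that idea, assuming numeric snmpwalk output (net-snmp `-On` style) has already been captured as text lines. The function name `split_scalars_and_columns` is made up for illustration. The heuristic is simple: an OID ending in `.0` is treated as a scalar, and everything else is grouped by its parent OID as a list/table column. It will misgroup columns with multi-part indexes (for example tables indexed by IP address), so treat it as a starting point only.

```python
import re
from collections import defaultdict

# Matches numeric snmpwalk lines such as:
# .1.3.6.1.2.1.2.2.1.2.1 = STRING: "lo"
LINE_RE = re.compile(r"^\.?(\d+(?:\.\d+)*) = (.*)$")


def split_scalars_and_columns(walk_lines):
    """Split walk output into scalars and list-like columns.

    Returns (scalars, columns):
      scalars: {oid_without_instance: value}
      columns: {column_oid: {index: value}}
    """
    scalars = {}
    columns = defaultdict(dict)
    for line in walk_lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip continuation lines and anything unparsable
        oid, value = match.groups()
        parent, _, last = oid.rpartition(".")
        if last == "0":
            # Scalar objects carry the instance suffix ".0"
            scalars[parent] = value
        else:
            # Assumption: the last sub-identifier is a single-part index
            columns[parent][last] = value
    return scalars, dict(columns)
```

Fed with a walk of the interfaces table, the `columns` result would contain one entry per column OID (e.g. `1.3.6.1.2.1.2.2.1.2` for ifDescr) with one value per ifIndex.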

Why do all SNMP tools lack such support? Most enterprise hardware I have access to features a lot of interesting counters and metrics I would like to feed into Icinga2 and graph with the InfluxDB/Grafana attached to it.

How do others set this up? Is everybody else setting this up one by one?

Kind regards
Kevin

stevie-sy · May 5, 2020, 11:00am

Hi,

First of all, what do you want to monitor? Special hardware? Do you want to check actively, or do you want to receive SNMP traps? Both are a little bit difficult to automate, because somebody has to maintain how each OID and its value are translated or handled and, above all, what should be queried for the respective hardware. Every vendor has its own MIB list, which you have to download for your hardware and import into your preferred tool or script. There are some standard OIDs every vendor can use, e.g. uptime. There are a lot of MIB databases on the web where you can look up or download this information, but the ones I know are not complete. Some of the OIDs you only get from your vendor.

For example, take a look at the check_nwc_health plugin. If you check the source, you’ll see that the author included every OID for the network devices in the source code, along with how to handle the value. After building it you have one very, very long script with all the OIDs.

So what you can do, if you don’t want to program your own script, is use existing scripts like check_nwc_health for network devices, use programs that already include the MIBs, or use scripts/programs where you can import the MIBs from your vendor.

Yes, you could do an snmpwalk on every device. But depending on how much information you can fetch from your device, it may take some time. And then you again have the problem of translating the OIDs and values into human-readable values. Or do you know, without googling, that 1.3.6.1.2.1.1.3 is the already mentioned sysUpTime?
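To illustrate the translation problem, here is a minimal sketch that maps numeric OIDs to names by longest-prefix match. The `KNOWN_OIDS` table and the `translate` helper are illustrative only: the table holds just a handful of standard MIB-2 objects, and anything vendor-specific would have to come from the vendor's MIB files.

```python
# A tiny hand-maintained lookup table (standard MIB-2 objects only).
KNOWN_OIDS = {
    "1.3.6.1.2.1.1.1": "sysDescr",
    "1.3.6.1.2.1.1.3": "sysUpTime",
    "1.3.6.1.2.1.1.5": "sysName",
    "1.3.6.1.2.1.2.2.1.2": "ifDescr",
    "1.3.6.1.2.1.2.2.1.10": "ifInOctets",
    "1.3.6.1.2.1.2.2.1.16": "ifOutOctets",
}


def translate(oid):
    """Return 'name.instance' for a known OID, else the numeric OID."""
    parts = oid.strip(".").split(".")
    # Try the longest prefix first, then shorten it step by step
    for cut in range(len(parts), 0, -1):
        prefix = ".".join(parts[:cut])
        if prefix in KNOWN_OIDS:
            suffix = ".".join(parts[cut:])
            name = KNOWN_OIDS[prefix]
            return f"{name}.{suffix}" if suffix else name
    return ".".join(parts)  # unknown: leave it numeric
```

With this table, `translate("1.3.6.1.2.1.1.3.0")` gives `sysUpTime.0`, while an OID below a vendor's enterprise subtree stays numeric until someone adds it to the table. That maintenance effort is exactly the part that is hard to automate.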

As for your question how we handle that:
We are using existing scripts where the required OIDs are already included. If there is none, we have to use the snmp check command from the ITL with the OID we want to check.
For traps we use Logstash, and I created our own translation list, because the SNMP plugin wants the files in YAML format and I never found a working tool to translate the vendor’s MIB files into YAML.
At the OSMC I talked with a community member about this. He is working in a company where his colleague wrote their own snmp tool!

kevin-olbrich · May 5, 2020, 11:12am

Thanks for your reply!
OK, long story short: SNMP is too complex and lacks relations between counters (which a script can bring into relation). That would explain why so many scripts are available.

The problem I see with plugins is that most of them only return OK / WARNING / CRITICAL in terms of Icinga2. (At least that’s my impression from using the Icinga Director.) For a router I would like the general health status, but at the same time traffic stats would be appreciated.
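On the traffic stats side: interface counters such as ifInOctets only ever count up, so a rate needs two samples and some arithmetic. A minimal sketch, where `counter_rate` is a made-up helper name. It assumes at most one counter wrap between samples and cannot tell a wrap from a device reboot that reset the counter.

```python
def counter_rate(prev, curr, seconds, bits=32):
    """Per-second rate between two samples of an SNMP counter.

    prev, curr: raw counter values from two consecutive polls
    seconds:    time between the polls
    bits:       counter width (32 for Counter32, 64 for Counter64)
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    delta = curr - prev
    if delta < 0:
        # Assumption: the counter wrapped around exactly once
        delta += 2 ** bits
    return delta / seconds
```

For octet counters, multiplying the result by 8 gives bits per second. A time series database can also do this step itself if the raw counter values are stored.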

Do most people solve this with a central solution like Icinga2, or in combination with other tools side by side?
My current plan is to feed everything into Icinga2 and forward counter data to InfluxDB using the Icinga2 master’s plugins. That way I can use the Icinga2 cluster stack (which is very, very reliable) for remote locations.

I’ve tested a lot of tools like OpenNMS and LibreNMS, but they lack stable remote execution. Some depend on direct access to SNMP, some lack secure clustering (like OpenNMS, which makes it unsuitable for use over the WWW).

The final question is: Is Icinga2 suitable for statistical data in parallel to UP/DOWN/etc.?

stevie-sy · May 5, 2020, 11:34am

OK, for better understanding: Icinga only triggers (external) scripts or programs and expects return values, possibly with performance data. What the script is doing (fetching data from an API, fetching data through SNMP, ping, etc.), Icinga doesn’t know. How service monitoring works is described here: https://icinga.com/docs/icinga2/latest/doc/05-service-monitoring/#plugins
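To make the "return values plus performance data" part concrete, here is a minimal sketch of what a plugin produces: one line of text, optionally followed by `|` and performance data in the form `label=value[UOM];warn;crit`, plus an exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function names and the `cpu_load` label are made up for illustration; a real plugin would fetch the value via SNMP or an API instead of receiving it as an argument.

```python
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Map a measured value to a plugin exit code (higher is worse)."""
    if value >= crit:
        return 2
    if value >= warn:
        return 1
    return 0


def format_output(label, value, warn, crit, uom=""):
    """Build the exit code and the single output line of a check.

    Everything after the '|' is performance data, which Icinga can
    forward to a backend such as InfluxDB.
    """
    code = evaluate(value, warn, crit)
    text = f"{STATES[code]} - {label} is {value}{uom}"
    perfdata = f"{label}={value}{uom};{warn};{crit}"
    return code, f"{text} | {perfdata}"

# A real plugin would print the line and exit with the code, e.g.:
#   code, line = format_output("cpu_load", 42, 80, 90, "%")
#   print(line); sys.exit(code)
```

This is why a single check can deliver both a state and statistics: the state comes from the exit code, the numbers travel in the performance data.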

In the end it means you need a script: your own, or one you can download from e.g. https://exchange.icinga.com/. If you are not happy with a script, you could fork it.

anon66228339 · May 5, 2020, 11:36am

Yes, for network devices there are a lot of nice plugins like nwc_health, interfacetable etc.
Or what I mostly do now is use Telegraf to feed everything into InfluxDB and then have a plugin query InfluxDB for the parts I want a notification for.
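A rough sketch of the query side of that setup, assuming an InfluxDB 1.x style `/query` endpoint whose JSON answer has the shape `results -> series -> columns/values`. The measurement, field and tag names passed in are placeholders, the helper names are made up, and the HTTP request itself is left out so that only the query building and the evaluation are shown.

```python
def build_query(measurement, field, tag, tag_value, window="5m"):
    """Build an InfluxQL query for the latest value of one field.

    Note: the arguments are not escaped, so only pass trusted values.
    """
    return (
        f'SELECT last("{field}") FROM "{measurement}" '
        f"WHERE \"{tag}\" = '{tag_value}' AND time > now() - {window}"
    )


def extract_last(response):
    """Pull the 'last' column out of a parsed /query JSON response."""
    for result in response.get("results", []):
        for series in result.get("series", []):
            idx = series["columns"].index("last")
            for row in series["values"]:
                return row[idx]
    return None  # no series means no data in the time window


def check(response, warn, crit):
    """Turn the response into (exit_code, message) for Icinga."""
    value = extract_last(response)
    if value is None:
        return 3, "UNKNOWN - no data returned"
    if value >= crit:
        return 2, f"CRITICAL - value is {value}"
    if value >= warn:
        return 1, f"WARNING - value is {value}"
    return 0, f"OK - value is {value}"
```

Returning UNKNOWN when no data arrives matters here, because a stalled collector would otherwise look like a healthy device.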

Regards,
Carsten

stevie-sy · May 5, 2020, 11:37am

And for shipping the received performance data to Grafana with InfluxDB you can use the module https://github.com/Mikesch-mp/icingaweb2-module-grafana from @anon66228339. We are using it and are very happy with it.

kevin-olbrich · May 5, 2020, 11:59am

Ok, I will try that.

The next question would be how you deliver / roll out the plugins. Is there anything better than using the CMDB and related tools to “copy” the check scripts?

stevie-sy · May 5, 2020, 1:08pm

If you mean installing/copying the scripts to your Icinga servers (master, satellites, agents), this question can’t be answered in one sentence, because there are many tools for that: Ansible, Puppet, Salt etc. You have to see which one is best for you and your infrastructure.

kevin-olbrich · May 5, 2020, 2:14pm

That already answers my question. I’m able to deploy them using my CMDB. I thought there might be a more integrated solution for check scripts.

Currently I’m testing integration / import of external commands in the Director.