Detailed hardware slave monitoring with SNMP

palakd · July 15, 2024, 11:56am

Is it possible to do detailed health monitoring using SNMP? Right now the problem is that when one power supply of the slave fails, we are not notified; we only get notified when the slave goes down completely.

We are investigating whether SNMP can give us a full picture of the hardware health. Has anyone tried this and can help here?

Pooh · July 15, 2024, 12:24pm

This is not really an Icinga question - it can monitor anything which it can
get data about.

The answer to your question depends on the SNMP capabilities of the server you
are trying to monitor. Some provide only basic information, others are
extremely detailed.

Regards,

Antony.

rsx · July 15, 2024, 1:07pm

We prefer to use SNMP only if there is no other chance to monitor a device. Examples: We use check_ilo2_health for HPE servers and IPMI Sensor Monitoring Plugin for Dell servers.

palakd · July 29, 2024, 7:36am

Thanks for the reply. I have another question though: if my server is capable of sending SNMP traps, can they be received and displayed in Icinga as well? Is Icinga capable of receiving SNMP traps?

rsx · July 29, 2024, 10:52am

No, not at all. You need at least a process that receives traps, e.g. snmptrapd, and snmptt for translating them into human-readable language. There was a promising solution called trapdirector, but the last commit was 4 years ago.

Our experiences with SNMP traps were simply a nightmare, hence we dislike this approach completely.

log1c · July 31, 2024, 6:40am

Just as an FYI:
Check the check_redfish plugin, which uses the Redfish API instead of ipmi/ilo/idrac or whatever the server manufacturer uses.

I used the ilo2_health plugin for a long time as well and it works pretty well, but as far as I can remember all the information was in one single check.
I have since switched all checks regarding server hardware to the mentioned redfish plugin.
No hassle installing more tools like ipmi. And I didn’t even bother trying anything with Dell’s iDRAC before that.

@palakd
I also wouldn’t bother with SNMP traps!
If you want to monitor server hardware, check the check_redfish plugin mentioned above.
For network devices (like Cisco, HP) try check_nwc_health, which uses SNMP and works with a wide range of network devices.
You can also search exchange.icinga.com for what you need and see what turns up.

palakd · January 17, 2025, 12:35pm

Hi @log1c, since you have implemented the check_redfish plugin: I was trying to configure it too. I installed the plugin, but somehow the command check_redfish.py doesn’t work.

Can you please guide me a bit on how you set it up?

Thanks

bberg · January 17, 2025, 2:47pm

@palakd
What do you mean by “it doesn’t work”?
What happens if you run sudo -u nagios /usr/lib/nagios/plugins/check_redfish.py
(of course depending on your OS)

In my experience it’s pretty straightforward.

bberg · January 17, 2025, 2:50pm

I am a bit late to the SNMP trap party, but still wanted to share what worked pretty well for me last time.

A few weeks ago I had to set up monitoring with SNMP traps for a customer. After looking around a bit I decided to do this with Telegraf: telegraf/plugins/inputs/snmp_trap/README.md at master · influxdata/telegraf · GitHub
It saves its results into InfluxDB, which is often already installed in an Icinga environment.
To get the results from InfluxDB into Icinga 2 I decided to use check_influxdb: GitHub - NETWAYS/check_influxdb: Icinga check plugin to check InfluxDB

With this setup it was pretty easy to monitor the environment using the SNMP traps.

palakd · January 17, 2025, 2:57pm

After the setup on the Icinga client (installing the plugin), I am not sure whether I also have to install anything on the hosts for which I need to get the alerts. Right now, if I try to run a simple command like: ./check_redfish.py --host 10.81.98.81 --storage --power

I get: AttributeError: module ‘redfish’ has no attribute ‘redfish_client’

log1c · January 17, 2025, 3:23pm

Hello.

Did you install the required redfish library mentioned in requirements.txt?
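
A quick way to narrow down an AttributeError like the one above is to see which `redfish` module Python actually imports and whether it exposes `redfish_client`. The following is only a minimal sketch: the helper name `describe_module` is made up for illustration, and the module and attribute names are simply taken from the error message.

```python
import importlib


def describe_module(module_name, required_attr):
    """Report where a module is loaded from and whether it has an attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Module is not installed for this Python interpreter.
        return {"installed": False, "path": None, "has_attr": False}
    return {
        "installed": True,
        # Some modules (e.g. namespace packages) have no __file__.
        "path": getattr(module, "__file__", None),
        "has_attr": hasattr(module, required_attr),
    }


if __name__ == "__main__":
    # "redfish" and "redfish_client" come from the AttributeError in this thread.
    print(describe_module("redfish", "redfish_client"))
```

Run it with the same Python interpreter and user that runs the plugin. If `installed` is true but `has_attr` is false, the `path` value shows what is being imported; one possible explanation is a different package or a local file with the same module name shadowing the library from requirements.txt.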

Also, your command is missing authentication (in case that is the full command you tried):

authentication arguments:
  -u USERNAME, --username USERNAME
                        the login user name
  -p PASSWORD, --password PASSWORD
                        the login password
  -f AUTHFILE, --authfile AUTHFILE
                        authentication file with user name and password
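
To show how the pieces fit together, here is a rough sketch of a small wrapper that assembles the command with authentication and maps the exit code to a state. It only uses flags that appear in this thread (`--host`, `-u`, `-p`, `--storage`, `--power`); the function names, the plugin path and the credentials `monitor` / `secret` are placeholders, not values from anyone's setup.

```python
import subprocess

# Standard monitoring-plugin exit codes.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def build_command(plugin, host, username, password, checks):
    """Assemble the argument list for check_redfish.py.

    Only flags mentioned in this thread are used.
    """
    cmd = [plugin, "--host", host, "-u", username, "-p", password]
    cmd.extend(checks)
    return cmd


def run_check(cmd, timeout=60):
    """Run the plugin and return (state name, plugin output)."""
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    # Any exit code outside 0-3 is treated as UNKNOWN.
    state = STATES.get(result.returncode, "UNKNOWN")
    return state, result.stdout.strip()


if __name__ == "__main__":
    # Placeholder path and credentials; replace with your own.
    cmd = build_command(
        "/usr/lib/nagios/plugins/check_redfish.py",
        "10.81.98.81",
        "monitor",
        "secret",
        ["--storage", "--power"],
    )
    print(run_check(cmd))
```

A password passed with `-p` can show up in the process list, so the `-f AUTHFILE` option listed above is probably the better choice for production; check the README for the expected file format.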

The git repo has a pretty detailed README about the check, its usage and options.

Be sure to check there first in case of issues. You might find the solution there.