Is it possible to set up detailed health monitoring using SNMP? Right now our problem is that when one power supply of the slave fails, we are not notified; we only get a notification when the slave goes down completely.
We are investigating whether SNMP can give us a full picture of the hardware health. Has anyone tried this kind of setup and can help here?
This is not really an Icinga question - it can monitor anything which it can
get data about.
The answer to your question depends on the SNMP capabilities of the server you
are trying to monitor. Some provide only basic information, others are
extremely detailed.
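To see what a particular server actually exposes, one option is to walk its SNMP tree and look at what comes back. Below is a minimal sketch, assuming the net-snmp snmpwalk tool is installed and the device answers SNMP v2c; the function names, the "public" community and the choice of the enterprises subtree (1.3.6.1.4.1, where vendor-specific data usually lives) as the starting point are illustrative assumptions, not something taken from this thread.

```python
import subprocess


def parse_snmpwalk(output):
    """Turn 'OID = TYPE: value' lines into a list of (oid, type, value)."""
    rows = []
    for line in output.splitlines():
        oid, sep, rest = line.partition(" = ")
        if not sep:
            continue  # skip continuation lines and error text
        value_type, sep, value = rest.partition(": ")
        if not sep:
            # Lines without an explicit type, e.g. an empty string value
            value_type, value = "", rest
        rows.append((oid.strip(), value_type.strip(), value.strip()))
    return rows


def walk(host, community="public", root="1.3.6.1.4.1"):
    """Walk a subtree with snmpwalk (assumed to be on PATH) and parse it."""
    argv = ["snmpwalk", "-v2c", "-c", community, host, root]
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return parse_snmpwalk(result.stdout)
```

If the walk returns only a handful of rows, the device probably offers just basic information; a long list with power supply, fan or temperature entries suggests that per-component checks are feasible.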
Thanks for the reply. I have another question, though: if my server is capable of sending SNMP traps, can they be received and displayed in Icinga as well? Is Icinga capable of receiving SNMP traps?
No, not on its own. You need at least a process that receives traps, e.g. snmptrapd, plus snmptt to translate them into human-readable messages. There was a promising solution called trapdirector, but the last commit was 4 years ago.
Our experiences with SNMP traps were simply a nightmare, hence we dislike this kind of approach altogether.
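For anyone who still wants to go down this road: once a trap has been received and translated, it has to be handed to Icinga as a passive check result. A rough sketch of the request body for the Icinga 2 API action process-check-result is shown below; the helper names and the idea of calling this from a trap handler are assumptions for illustration, and the host and service objects must already exist in Icinga.

```python
import json


def build_passive_result(host, service, exit_status, plugin_output):
    """Build the JSON body for Icinga 2's process-check-result action."""
    if exit_status not in (0, 1, 2, 3):
        raise ValueError("exit_status must be 0, 1, 2 or 3")
    return {
        "type": "Service",
        # Note: names containing double quotes would need escaping here.
        "filter": f'host.name=="{host}" && service.name=="{service}"',
        "exit_status": exit_status,
        "plugin_output": plugin_output,
    }


def action_url(icinga_host, port=5665):
    """URL of the action endpoint; 5665 is the default Icinga 2 API port."""
    return f"https://{icinga_host}:{port}/v1/actions/process-check-result"


if __name__ == "__main__":
    body = build_passive_result("server01", "snmp-traps", 2, "PSU 1 failed")
    print(action_url("icinga.example.org"))
    print(json.dumps(body, indent=2))
```

The body would be sent as an authenticated POST with the header Accept: application/json. The names server01, snmp-traps and icinga.example.org are placeholders.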
Just as an FYI:
Check the check_redfish plugin, which uses the Redfish API instead of IPMI/iLO/iDRAC or whatever the server manufacturer uses.
I used the ilo2_health plugin for a long time as well and it works pretty well, but as far as I can remember it reported all information in one single check.
But I have since switched all checks regarding server hardware to the mentioned Redfish plugin.
No hassle installing extra tools for things like IPMI. And I didn’t even bother trying anything with Dell’s iDRAC before that.
@palakd
I also wouldn’t bother with SNMP traps!
If you want to monitor server hardware, check the check_redfish plugin mentioned above.
For network devices (like Cisco, HP) try check_nwc_health, which uses SNMP and works with a wide range of network devices.
You can also search exchange.icinga.com for what you need and see what turns up.
Hi @log1c, since you have implemented the check_redfish plugin: I was trying to configure it too. I installed the plugin, but somehow the check_redfish.py command doesn’t work.
@palakd
What do you mean by “it doesn’t work”?
What happens if you run sudo -u nagios /usr/lib/nagios/plugins/check_redfish.py
(the path and user depend on your OS, of course)?
In my experience it’s pretty straightforward.
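When testing by hand it helps to look at both the exit code and the output, because that is all the monitoring core sees. Here is a small sketch of a wrapper that does this, assuming the usual plugin convention of exit codes 0 to 3; run_plugin is a hypothetical helper name, not part of check_redfish.

```python
import subprocess

# Conventional monitoring-plugin exit codes.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_plugin(argv, timeout=60):
    """Run a plugin command and return (state, output)."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Missing binary, missing permissions or a hang all end up here.
        return "UNKNOWN", str(exc)
    state = STATES.get(proc.returncode, "UNKNOWN")
    # A crashing plugin usually leaves stdout empty and a traceback on stderr.
    output = (proc.stdout or proc.stderr).strip()
    return state, output
```

Keep in mind that an unhandled Python exception makes the interpreter exit with status 1, so a crashing plugin can show up as WARNING; the output text tells the two cases apart.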
After the setup on the Icinga client (installing the plugin), I am not sure whether I also have to install anything on the hosts I want alerts for. Right now, if I try to run a simple command like: ./check_redfish.py --host 10.81.98.81 --storage --power
I get: AttributeError: module 'redfish' has no attribute 'redfish_client'
Did you install the required redfish library mentioned in the requirements.txt?
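As far as I know, this AttributeError typically shows up when some other package that also provides a module named redfish is installed instead of the library the plugin expects. A quick way to see which module Python actually picks up is sketched below; describe_redfish_module is a hypothetical helper written for this post.

```python
import importlib


def describe_redfish_module(module_name="redfish"):
    """Report whether the importable module exposes redfish_client.

    Returns a short human-readable string and does not raise if the
    module is missing.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return f"{module_name}: not installed"
    location = getattr(module, "__file__", "unknown location")
    if hasattr(module, "redfish_client"):
        return f"{module_name}: redfish_client found ({location})"
    return f"{module_name}: redfish_client missing ({location})"


if __name__ == "__main__":
    print(describe_redfish_module())
```

Run it with the same Python interpreter and the same user that runs the plugin. If it reports "missing", the printed path shows which installation is in the way.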
Also, your command is missing authentication (in case that is the full command you tried):
authentication arguments:
  -u USERNAME, --username USERNAME
                        the login user name
  -p PASSWORD, --password PASSWORD
                        the login password
  -f AUTHFILE, --authfile AUTHFILE
                        authentication file with user name and password
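Put together with the command from above, a complete call could be assembled roughly like this. It is only a sketch: build_check_redfish_command is a hypothetical helper, the credentials are placeholders, and it uses only the options that appear in this thread.

```python
def build_check_redfish_command(host, checks, username=None, password=None,
                                authfile=None, plugin="./check_redfish.py"):
    """Assemble the argument list for one check_redfish call."""
    argv = [plugin, "--host", host]
    # Each check is passed as its own flag, e.g. --storage or --power.
    argv += [f"--{check}" for check in checks]
    if authfile:
        argv += ["--authfile", authfile]
    elif username and password:
        argv += ["--username", username, "--password", password]
    else:
        raise ValueError("need either authfile or username and password")
    return argv


if __name__ == "__main__":
    # "monitoring" and "secret" are placeholder credentials.
    print(" ".join(build_check_redfish_command(
        "10.81.98.81", ["storage", "power"],
        username="monitoring", password="secret")))
```

A password given on the command line is visible in the process list, so the authfile option is probably the better choice for production use.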
The git repo has a pretty detailed README about the check, its usage and options.
Be sure to check there first in case of issues; you might find the solution.