Is it possible to set up detailed health monitoring using SNMP? Right now, when one power supply of a slave fails, we are not notified; we only get a notification when the slave goes down completely.
We are investigating whether SNMP can give us a full picture of the hardware health. Has anyone tried this and can help here?
This is not really an Icinga question - Icinga can monitor anything it can
get data about.
The answer to your question depends on the SNMP capabilities of the server you
are trying to monitor. Some provide only basic information, others are
extremely detailed.
Thanks for the reply. I have another question, though: if my server is capable of sending SNMP traps, can they be received and displayed in Icinga as well? Is Icinga capable of receiving SNMP traps?
No, not at all. You need at least a process that receives traps, e.g. snmptrapd, plus snmptt to translate them into human-readable text. There was a promising solution called trapdirector, but the last commit was 4 years ago.
Our experiences with SNMP traps were simply a nightmare, hence we dislike this kind of approach completely.
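For anyone who still wants to try the snmptrapd/snmptt route, one common way to get a translated trap into Icinga 2 is to submit it as a passive check result through the REST API action `/v1/actions/process-check-result`. The sketch below only builds the JSON body for that call. The host name `server01`, the service name `snmp-traps` and the helper `build_passive_result` are made up for illustration, and the actual HTTP POST (API user, port, TLS) is left out because it depends on your setup.

```python
import json


def build_passive_result(host, service, exit_status, output):
    """Build the JSON body for Icinga 2's process-check-result action.

    host and service are hypothetical object names; exit_status follows
    the usual plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).
    """
    return {
        "type": "Service",
        # filter_vars keeps the names out of the filter expression itself
        "filter": "host.name==hn && service.name==sn",
        "filter_vars": {"hn": host, "sn": service},
        "exit_status": exit_status,
        "plugin_output": output,
    }


# Example: a trap that snmptt translated to "Power supply 2 failed"
payload = build_passive_result("server01", "snmp-traps", 2, "Power supply 2 failed")
print(json.dumps(payload, indent=2))
```

The service on the Icinga side would have to accept passive results, and something still has to reset it to OK again, which is part of why trap-based setups tend to get messy.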
Just as an FYI:
Check the check_redfish plugin, which uses the Redfish API instead of IPMI/iLO/iDRAC or whatever the server manufacturer uses.
I used the ilo2_health plugin for a long time as well and it works pretty well, but as far as I can remember all the information was in one single check.
But I have since switched all checks regarding server hardware to the mentioned Redfish plugin.
No hassle installing more tools like IPMI. And I didn’t even bother trying anything with Dell’s iDRAC before that.
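To give an idea of why Redfish fits the original power supply question: the classic Redfish `Power` resource of a chassis lists each power supply with a `Status.Health` value of `OK`, `Warning` or `Critical`. The following is a minimal sketch of turning such a response into a plugin state. It is not how check_redfish is implemented; the function name and the sample payload are invented for illustration, and fetching the JSON from the BMC is omitted.

```python
# Standard monitoring plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Redfish Status.Health values mapped to plugin states
HEALTH_TO_STATE = {"OK": OK, "Warning": WARNING, "Critical": CRITICAL}

# Ranking used to pick the worst state (UNKNOWN ranks just above OK here)
SEVERITY = {OK: 0, UNKNOWN: 1, WARNING: 2, CRITICAL: 3}


def evaluate_power_supplies(power_resource):
    """Return (exit_code, message) for an already parsed Power resource."""
    supplies = power_resource.get("PowerSupplies") or []
    if not supplies:
        return UNKNOWN, "UNKNOWN - no power supplies reported"
    worst = OK
    details = []
    for psu in supplies:
        name = psu.get("Name", "unnamed power supply")
        health = (psu.get("Status") or {}).get("Health")
        # A missing or unexpected health value counts as UNKNOWN
        state = HEALTH_TO_STATE.get(health, UNKNOWN)
        details.append(f"{name}: {health}")
        if SEVERITY[state] > SEVERITY[worst]:
            worst = state
    return worst, f"{LABELS[worst]} - " + ", ".join(details)


# Hypothetical payload: one healthy and one failed power supply
sample = {
    "PowerSupplies": [
        {"Name": "PSU 1", "Status": {"State": "Enabled", "Health": "OK"}},
        {"Name": "PSU 2", "Status": {"State": "Enabled", "Health": "Critical"}},
    ]
}
exit_code, message = evaluate_power_supplies(sample)
print(message)  # CRITICAL - PSU 1: OK, PSU 2: Critical
```

With this kind of check a single failed power supply turns the service CRITICAL while the server itself is still up.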
@palakd
I also wouldn’t bother with SNMP traps!
If you want to monitor server hardware, check the check_redfish plugin mentioned above.
For network devices (like Cisco, HP) try check_nwc_health, which uses SNMP and works with a wide range of network devices.
You can also search exchange.icinga.com for what you need and see what turns up.