Raise Multiple Alerts for same service on same server

radioactive9 · March 2, 2020, 6:09am

Hello

It may be my lack of knowledge, but I am stuck on a very basic alerting problem. I know this can be achieved in monitoring tools from HP, IBM, etc.

Take disks as an example:
Node 1 has multiple disks: /, /var, /opt
Node 2 has multiple disks: /, /var, /data
…
Node n has multiple disks: /, /xy, /abc

I want to monitor all of them with one single service and a general rule: if usage breaches 80%, raise an alarm.

But I want to receive a separate alert (2, 3, 4 alerts, and so on) for each mount point on the same server, all from that same single service X.

Currently only 1 alert is raised on a particular server, whether one or many mount points breach the threshold.

I want to use HostName:Service-Name:Mount-Point as the unique key for raising alerts. That way an alert is raised for each mount point.

The above is just an example. There are many more use cases, such as Windows services, MQ queue managers, or multiple Postgres instances on a single server, that need the same thing: ServerName:ServiceName:$variable$ should identify a unique alert. That $variable$ should be selectable depending on what we want to treat as unique, such as the mount point, Windows service name, database instance name, MQ queue name, etc.
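To make the requested keying concrete, here is a minimal sketch in bash. It is not Icinga functionality, only an illustration of the idea: one alert line per mount point, keyed as HostName:ServiceName:MountPoint. The function name `alerts_per_mount`, the `CRITICAL` label and the "at or above the threshold" comparison are assumptions made for this example. It expects `df -P` style input and does not handle mount points that contain spaces.

```shell
#!/bin/bash
# Sketch only: print one alert line per mount point whose usage is at or
# above a threshold, keyed as HostName:ServiceName:MountPoint.
# Usage (hypothetical): df -P | alerts_per_mount "$(hostname)" disk 80

alerts_per_mount() {
    local host="$1" service="$2" threshold="$3"
    # df -P columns: filesystem, blocks, used, available, capacity, mount point
    awk -v host="$host" -v svc="$service" -v limit="$threshold" '
        NR > 1 {                       # skip the header row
            used = $5
            sub(/%$/, "", used)        # "85%" -> "85"
            if (used + 0 >= limit + 0)
                printf "%s:%s:%s CRITICAL used=%s%%\n", host, svc, $6, used
        }'
}
```

Each output line carries its own key, which is the behaviour asked for above: two breached mount points on one host give two distinct lines.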

stevie-sy · March 2, 2020, 7:41am

Hi,

If you want to specify different thresholds for different directories and it should be only one check, you could write a bash script.

#!/bin/bash
/usr/lib64/nagios/plugins/check_disk -c 5% -w 10% -p /tmp
/usr/lib64/nagios/plugins/check_disk -c 10% -w 15% -p /etc

With the getopts builtin of bash you could generalize it so that the directories and thresholds are passed as arguments.
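A minimal sketch of what such a getopts-based wrapper could look like. The function name `check_multi_disk`, the `PATH:WARN:CRIT` argument format and the `CHECK_DISK` override variable are assumptions for this example; only the plugin path and the `-w`, `-c` and `-p` options of check_disk come from the script above.

```shell
#!/bin/bash
# Sketch of a getopts-based wrapper around check_disk.
# Hypothetical usage: check_multi_disk -p PATH:WARN:CRIT [-p PATH:WARN:CRIT ...]
# WARN and CRIT are passed unchanged to check_disk as -w and -c.

# Plugin path taken from the script above; override with CHECK_DISK if needed.
PLUGIN="${CHECK_DISK:-/usr/lib64/nagios/plugins/check_disk}"

check_multi_disk() {
    local opt path rest warn crit rc worst=0
    OPTIND=1    # reset so the function can be called more than once
    while getopts "p:" opt; do
        case "$opt" in
            p)
                # Reject arguments that are not PATH:WARN:CRIT.
                case "$OPTARG" in
                    *:*:*) ;;
                    *) echo "UNKNOWN: expected PATH:WARN:CRIT, got '$OPTARG'" >&2
                       return 3 ;;
                esac
                path=${OPTARG%%:*}      # text before the first colon
                rest=${OPTARG#*:}       # text after the first colon
                warn=${rest%%:*}
                crit=${rest#*:}
                rc=0
                "$PLUGIN" -w "$warn" -c "$crit" -p "$path" || rc=$?
                # Keep the highest exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).
                if [ "$rc" -gt "$worst" ]; then worst=$rc; fi
                ;;
            *)
                echo "UNKNOWN: usage: check_multi_disk -p PATH:WARN:CRIT ..." >&2
                return 3
                ;;
        esac
    done
    return "$worst"
}

# To use this as a plugin script, end the file with:
#   check_multi_disk "$@"; exit $?
```

For example, `check_multi_disk -p /tmp:10%:5% -p /etc:15%:10%` would run the two checks from the script above. The wrapper prints the output of every check but returns one combined exit code, so from Icinga's point of view it is still a single service with a single state.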

radioactive9 · March 2, 2020, 8:21am

But when, let's say, both /tmp and /etc are full, the service will create only 1 alert. Ideally I want 2 separate alerts, one each for /tmp and /etc, and they should be treated differently. But the underlying service should still be one.

A single service can fire only 1 alert even though two different underlying thresholds are breached. The downside is that we end up creating 1 service for each thing we monitor.

Now consider MQ queues. There are close to 66 thousand queues. If any one of the queues has a problem, it should be reported separately. That means we would have to create 66k individual services, which is almost impossible.

Was I able to explain my concern? Or maybe I am being a little vague?

stevie-sy · March 2, 2020, 9:24am

If you want a separate notification for each alert, you have to create a service for each directory. In this case the Icinga DSL, with apply rules iterating over an array, is your friend.

Or you could try to build notification rules that search the macro “service.output” for specific keywords.

radioactive9 · March 2, 2020, 10:41am

Hello Stevie

Thank you very much for your input. I am a little lost as I am very new to Icinga.

I have 6 QManagers on 1 server for MQ, and each QManager can host thousands of queues. This is IBM MQ.

There are close to 300 servers hosting these. Each queue is treated differently in terms of queue depth and queue age, and each needs to be alerted on separately because the queues belong to different businesses.

There are 33000 queues in total, each with a unique name. Every queue alert is important. If I create a service per queue, I will end up creating 66000 single services in Director (1 each for queue depth and queue age for 33k queues). I believe that is not feasible?

stevie-sy · March 2, 2020, 12:17pm

I think you should start with the docs to get a better feeling for Icinga. Or, if you speak German, there is an Icinga book. Explaining everything in detail here would go too far, especially when you have to start from scratch. You may then get a few ideas that you can discuss individually. However, you cannot map one monitoring system 1:1 onto another without familiarizing yourself with it first.

Icinga with the DSL, the Director, etc. are powerful tools that can do a lot, but they certainly take a different approach than other monitoring tools. I don’t know MQ myself, so it is even more difficult for me to give advice.

rsx · March 2, 2020, 12:51pm

I think you should try two options: “apply for” rules, and the REST API of Icinga or the Director.
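As a rough sketch of the REST API route, the script below creates one service object per queue name read from standard input, so that a large number of services never has to be created by hand. Everything specific in it is an assumption for illustration: the API address, the `ICINGA_AUTH` variable (user:password of an API user), the `generic-service` template, the CheckCommand name `mq_queue_depth`, the custom variable `mq_queue` and the `queue-depth-` naming scheme. It targets the Icinga 2 API (`PUT /v1/objects/services/<host>!<service>`), not the Director API, and it does not escape queue names that contain characters needing URL or JSON encoding.

```shell
#!/bin/bash
# Sketch only: create one Icinga 2 service per MQ queue through the REST API.
# All names below (template, check command, custom variable) are hypothetical.

ICINGA_API="${ICINGA_API:-https://localhost:5665}"

# service_url HOST QUEUE -> URL of the service object to create
service_url() {
    printf '%s/v1/objects/services/%s!queue-depth-%s\n' "$ICINGA_API" "$1" "$2"
}

# service_payload QUEUE -> JSON request body for that service
service_payload() {
    printf '{"templates":["generic-service"],"attrs":{"check_command":"mq_queue_depth","vars.mq_queue":"%s"}}\n' "$1"
}

# create_queue_services HOST  (reads one queue name per line from stdin)
create_queue_services() {
    local host="$1" queue
    while IFS= read -r queue; do
        [ -n "$queue" ] || continue          # skip empty lines
        # -k skips TLS verification; acceptable for a test setup only.
        curl -k -s -u "$ICINGA_AUTH" \
            -H 'Accept: application/json' \
            -X PUT "$(service_url "$host" "$queue")" \
            -d "$(service_payload "$queue")"
        echo
    done
}
```

A call could look like `create_queue_services mqhost01 < queues.txt`, where `mqhost01` and `queues.txt` are placeholders. This still results in one service per queue, as described earlier in the thread; the script only automates their creation.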

radioactive9 · March 2, 2020, 4:21pm

I agree, Stevie. I understand I am pushing this hard.
I did attend an Icinga training conducted by Thilo recently, but sadly I couldn’t clear up these doubts because these requirements were not yet clear to me at that point. Sorry if my questions are silly.

For the Icinga DSL, can you share an example (the Director way, or a screenshot) that manages X mount points on N servers using 1 service definition, but is capable of raising multiple alerts on any single server at the same time when thresholds are breached on multiple mount points? I can then use the example to build the same thing for MQ.

Sorry, don’t curse me, but we really need this solution to handle several problems we are having with Icinga in our production environment.

I really do not want to create 33k services, one per queue, just for the sake of raising multiple alerts for the same issue on the same server.

stevie-sy · March 3, 2020, 7:07am

No problem.

In the docs (as suggested by @rsx) and in the community you will find a lot of examples for the Icinga DSL. But you have to do some work on your own. The difficulty will be finding someone who faced the same situation as you and who moved from MQ to Icinga.

We can only give you some hints.

Here is the language reference: https://icinga.com/docs/icinga2/latest/doc/17-language-reference/