Very strange: Icinga2 does not send SMS after downtimes

Hi!

I have a very strange problem…
We use Icinga2 at the office to monitor many servers. Normally it works without any problems, but sometimes we don’t receive the SMS for a problem…

After some tests I found something reproducible: I set a downtime for a service (or a host), and then something goes wrong with it during the downtime. Since there is a downtime, nothing is sent. Correct!
Then the downtime expires (or is deleted). The service is still in critical (or warning) state.
Now the very strange problem: the E-Mails are sent, but not the SMS…

We defined two groups of services “high_prio” (E-Mails and SMS will be sent) and “low_prio” (only E-Mail will be sent).
Of course, the problem occurs only with “high_prio” services…

I defined the SMS notification as:

template Notification "sms-service-notification" {
  command = "sms-service-notification"

  states = [ OK, Warning, Critical ]
  types = [ Problem ]

  vars += {
    notification_from = "Monitoring <icinga@our.domain.com>"
    notification_logtosyslog = false
  }

  period = "24x7"
}

apply Notification "sms-service-notification" to Service {
  import "sms-service-notification"
  interval = 0 // disable re-notification

  if (service.vars.notificationgroups != "") {
    user_groups = service.vars.notificationgroups
  } else {
    user_groups = host.vars.notificationgroups
  }

  assign where service.vars.priority == "high" || (service.vars.priority == null && host.vars.priority == [ "high" ])
}

We defined a user for these SMS as:

object User "it" {
  import "generic-user"

  display_name = "IT"
  groups = [
    "icingaadmins",
    "admins",
    "high-prio",
    "low-prio"
  ]
  email = "devnull@internal.mail.local"
  vars.mobile = "handy-admin@internal.mail.local"
}

The NotificationCommand sms-service-notification just sends an E-Mail to vars.mobile. The Mailserver will then send the data to a GSM-Modem.

So, after my tests, I see that if the problem occurs during a downtime, then after the downtime (if the problem persists) only the normal E-Mails are sent, and sms-service-notification is simply not called…

Does anyone have an explanation for this problem? And maybe a suggestion for how to solve it?

Thanks a lot
Luca


We have had trouble with SMS: if we use the “email address” of a phone number to send SMS, the carriers block us and blacklist us.

The answer (in our case anyway) is to look at something like Twilio or PagerDuty to send the SMS alerts. We bought time on this by setting up Slack alerts, but I was pushing for PagerDuty.

Hi Ben!

This cannot be our problem, since our internal mail server handles the E-Mail and, in the case of an “SMS-E-Mail”, sends it to a GSM-Modem.
And I can see in the logs that Icinga did not even try to send the “SMS-E-Mail” if the service is in warning or critical state after a downtime. But it sends “normal” E-Mails in this case… Very, very strange…

Any other idea?

Thanks
Luca


Out of ideas from my end :frowning:
Would be interested to see if this gets resolved though.

We always love to see posts formatted according to the formatting guidelines where you can also find some tips on how to format configuration :slight_smile:

You need to show us the 2 notification rule definitions for SMS and mail. In fact all the rules addressing the same objects (target address(es) to be notified, hosts/services, time periods).

It is my experience with Icinga, at least up to & including 2.10, that the tuple containing these objects (i.e. all that is contained in the brackets in my 1st paragraph) must not be identical for different rules, otherwise you have a “last one wins” situation during “apply” time, i.e. when you activate the configuration.

Also, be very careful using “interval=0”; make sure you truly understand how this works: Once the host/service has gone into the erroneous state that triggers the notification, a notification will be sent; if the state never changes from this again in the meantime, no further notifications will be sent again! We have learnt to avoid using “interval=0” - we just set very high values.
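A minimal sketch of the “very high value” alternative, reusing the apply rule from the first post. The `7d` duration is only an assumed example value, not something taken from this thread; any sufficiently long interval expresses the same idea.

```
apply Notification "sms-service-notification" to Service {
  import "sms-service-notification"

  // Assumed example value: re-notify at most once a week
  // instead of disabling re-notification with interval = 0.
  interval = 7d

  if (service.vars.notificationgroups != "") {
    user_groups = service.vars.notificationgroups
  } else {
    user_groups = host.vars.notificationgroups
  }

  assign where service.vars.priority == "high" || (service.vars.priority == null && host.vars.priority == [ "high" ])
}
```

With a long interval a persisting problem is eventually announced again, whereas `interval = 0` sends exactly one notification per problem.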

I’m guessing here & there, of course, but perhaps we blind chickens will find a grain of corn! :slight_smile:

Hi Kai,

here is my complete configuration for E-Mail and SMS:

object NotificationCommand "mail-service-notification" {
  command = [ ConfigDir + "/scripts/mail-service-notification.sh" ]

  arguments += {
    "-4" = "$notification_address$"
    "-6" = "$notification_address6$"
    "-b" = "$notification_author$"
    "-c" = "$notification_comment$"
    "-d" = {
      required = true
      value = "$notification_date$"
    }
    "-e" = {
      required = true
      value = "$notification_servicename$"
    }
    "-f" = {
      value = "$notification_from$"
      description = "Set from address. Requires GNU mailutils (Debian/Ubuntu) or mailx (RHEL/SUSE)"
    }
    "-i" = "$notification_icingaweb2url$"
    "-l" = {
      required = true
      value = "$notification_hostname$"
    }
    "-n" = {
      required = true
      value = "$notification_hostdisplayname$"
    }
    "-o" = {
      required = true
      value = "$notification_serviceoutput$"
    }
    "-r" = {
      required = true
      value = "$notification_useremail$"
    }
    "-s" = {
      required = true
      value = "$notification_servicestate$"
    }
    "-t" = {
      required = true
      value = "$notification_type$"
    }
    "-u" = {
      required = true
      value = "$notification_servicedisplayname$"
    }
    "-v" = "$notification_logtosyslog$"
  }

  vars += {
    notification_address = "$address$"
    notification_address6 = "$address6$"
    notification_author = "$notification.author$"
    notification_comment = "$notification.comment$"
    notification_type = "$notification.type$"
    notification_date = "$icinga.long_date_time$"
    notification_hostname = "$host.name$"
    notification_hostdisplayname = "$host.display_name$"
    notification_servicename = "$service.name$"
    notification_serviceoutput = "$service.output$"
    notification_servicestate = "$service.state$"
    notification_useremail = "$user.email$"
    notification_servicedisplayname = "$service.display_name$"
  }
}

object NotificationCommand "sms-service-notification" {
  command = [ ConfigDir + "/scripts/sms-service-notification.sh" ]

  arguments += {
    "-4" = "$notification_address$"
    "-6" = "$notification_address6$"
    "-b" = "$notification_author$"
    "-c" = "$notification_comment$"
    "-d" = {
      required = true
      value = "$notification_date$"
    }
    "-e" = {
      required = true
      value = "$notification_servicename$"
    }
    "-f" = {
      value = "$notification_from$"
      description = "Set from address. Requires GNU mailutils (Debian/Ubuntu) or mailx (RHEL/SUSE)"
    }
    "-i" = "$notification_icingaweb2url$"
    "-l" = {
      required = true
      value = "$notification_hostname$"
    }
    "-n" = {
      required = true
      value = "$notification_hostdisplayname$"
    }
    "-o" = {
      required = true
      value = "$notification_serviceoutput$"
    }
    "-r" = {
      required = true
      value = "$notification_usermobile$"
    }
    "-s" = {
      required = true
      value = "$notification_servicestate$"
    }
    "-t" = {
      required = true
      value = "$notification_type$"
    }
    "-u" = {
      required = true
      value = "$notification_servicedisplayname$"
    }
    "-v" = "$notification_logtosyslog$"
  }

  vars += {
    notification_address = "$address$"
    notification_address6 = "$address6$"
    notification_author = "$notification.author$"
    notification_comment = "$notification.comment$"
    notification_type = "$notification.type$"
    notification_date = "$icinga.long_date_time$"
    notification_hostname = "$host.name$"
    notification_hostdisplayname = "$host.display_name$"
    notification_servicename = "$service.name$"
    notification_serviceoutput = "$service.output$"
    notification_servicestate = "$service.state$"
    notification_usermobile = "$user.vars.mobile$"
    notification_servicedisplayname = "$service.display_name$"
  }
}

template Notification "mail-service-notification" {
  command = "mail-service-notification"

  states = [ OK, Warning, Critical ]
  types = [ Problem, Recovery ]

  vars += {
    // notification_icingaweb2url = "https://www.example.com/icingaweb2"
    notification_from = "Monitoring <icinga@our.domain.com>"
    notification_logtosyslog = false
  }

  period = "24x7"
}

template Notification "sms-service-notification" {
  command = "sms-service-notification"

  states = [ OK, Warning, Critical ]
  types = [ Problem ]

  vars += {
    // notification_icingaweb2url = "https://www.example.com/icingaweb2"
    notification_from = "Monitoring <icinga@our.domain.com>"
    notification_logtosyslog = false
  }

  period = "24x7"
}

apply Notification "mail-service-notification" to Service {
  import "mail-service-notification"
  interval = 0 // disable re-notification

  if (service.vars.notificationgroups != "") {
    user_groups = service.vars.notificationgroups
  } else {
    user_groups = host.vars.notificationgroups
  }

  assign where service.name && service.vars.notificationtype != "norecovery"
}

apply Notification "sms-service-notification" to Service {
  import "sms-service-notification"
  interval = 0 // disable re-notification

  if (service.vars.notificationgroups != "") {
    user_groups = service.vars.notificationgroups
  } else {
    user_groups = host.vars.notificationgroups
  }

  assign where service.vars.priority == "high" || (service.vars.priority == null && host.vars.priority == [ "high" ])
}

Do you see anything wrong?

We set interval=0 in order to avoid re-notification for the service… Is that wrong?
I read Monitoring Basics - Icinga 2, and I understood that if we don’t want to receive many notifications for the same error, we have to set it to 0…
Did I misunderstand?

Thanks a lot
Luca

Heyhey,
We always love to see posts formatted according to the formatting guidelines - like if you put it in triple backticks ``` your config folds in nicely and makes everything a lot more readable :slight_smile:


I don’t see any errors in your definitions, except that those tuples I mentioned are exactly the same for both types. Try e.g. not defining separate variables for the notification user (I assume that’s the recipient), but just a generic “notification user” and THEN setting different contacts in this for mail & SMS templates. That way you might “break the tuple”…

Concerning “interval=0”, the problem with that is only that, if you exclude recovery notifications, which many people do (or many customers don’t want to see), and which you have done in your SMS case, the Icinga2 logic says “this host/service went CRITICAL and I did my duty and notified once, and since then I have not received any trigger that there’s been a recovery”!! I.e. you need to shut-down/re-trigger somehow after a milestone duration of a notification rule, so that the “interval” sending rule of mail/SMS is “reset”… So we now tend to avoid using “interval=0” altogether… :slight_smile:

Hi Kai,

I’m feeling very dumb, but I really don’t understand what you mean…
Could you maybe give an example?

Do you mean that Icinga sends only one notification (the E-Mail) and then, due to the “interval=0”, does not send the SMS?
Then I can’t understand why we do receive the SMS if the problem happens outside a downtime…
Or maybe I didn’t understand your explanation… :sweat_smile:

Thanks
Luca

Hi again,

I discovered a very very strange behaviour…
Currently my template for sms-service-notification is:

template Notification "sms-service-notification" {
  command = "sms-service-notification"

  states = [ OK, Warning, Critical ]
  types = [ Problem ]

  vars += {
    // notification_icingaweb2url = "https://www.example.com/icingaweb2"
    notification_from = "Monitoring <icinga@our.domain.com>"
    notification_logtosyslog = false
  }

  period = "24x7"
}

and it does not work as expected (I don’t get an SMS if the service has a problem during the downtime).

Now I changed the types to:

types = [ Problem, Recovery ]

same as mail-service-notification.
With that I get the SMS even if the service had a problem during the downtime, as expected (after the downtime, of course).

To me this looks like a bug, since I cannot believe that I must have more than one type…

Your opinion?

Thanks
Luca

Are you saying it went down while in downtime, but also recovered during downtime but you received the Recovery notification after the downtime expired? Perhaps a screenshot of your History page would help put the timeline together?

No, I said it went down during the downtime and remained down after the downtime.
Then I only receive an E-Mail, but no SMS.

And in my last post I said that if I add another type to the types option of the SMS template, it works as expected, so I suppose it is a bug?

Thanks
Luca

I’m not sure if it is a bug, but there is one about hosts not sending notifications after downtimes:

Check if this matches in your scenario.

As a rule of thumb I always configure the notification templates with types Problem and Recovery. In my experience, without the Recovery type configured, none of the hosts will ever notify again, because the no_more_notifications option remains true.
To avoid getting every recovery notification, I filter them at the user level (some users only get problem notifications, some additionally get recoveries).
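A minimal sketch of that approach, assuming the template name from this thread. The user name `oncall-sms` and its address are hypothetical placeholders. The notification rule accepts both Problem and Recovery, while the `types` filter on the User object limits what this particular recipient actually receives.

```
template Notification "sms-service-notification" {
  command = "sms-service-notification"

  states = [ OK, Warning, Critical ]
  // Recovery is included so the rule itself also processes recoveries.
  types = [ Problem, Recovery ]

  period = "24x7"
}

// Hypothetical user: name and address are placeholders.
object User "oncall-sms" {
  import "generic-user"

  groups = [ "high-prio" ]
  vars.mobile = "oncall-sms@example.com"

  // User-level filter: this user only gets Problem notifications.
  types = [ Problem ]
}
```

A second user that omits the `types` filter would receive the recoveries as well.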

Hi,

no, my scenario is different:

  1. Host (or service) in downtime
  2. Host (or service) has a problem
  3. Downtime ends
  4. E-Mail notification will be sent, but no SMS

But, as I said, once I changed types and added Custom (just to add something), I get the SMS, too…

Thanks
Luca

OK, I cheered too early… Last night we had the problem again. Downtime, problem, problem remains after the downtime, only the E-Mail was sent…

Really, I don’t know what to think…

Any other suggestion?

Thanks
Luca

If you can force this kind of behavior (e.g. with a test check), then I would do so and turn on the debuglog beforehand.
Then check what the log says around the time you would normally expect the SMS to arrive.
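For reference, the debug log is just a FileLogger object. The sketch below follows what the stock debuglog feature file looks like as far as I know; the object name and path are assumptions and may differ on your installation. It is normally activated with `icinga2 feature enable debuglog` followed by a restart of Icinga 2.

```
// Assumed to match the packaged debuglog feature; adjust the path if needed.
object FileLogger "debug-file" {
  severity = "debug"
  path = LogDir + "/debug.log"
}
```

The debug log grows quickly, so it is worth disabling the feature again once the test is done.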

It looks like the two apply Notification rules have different conditions defined. Can you confirm the SMS notification is being applied to the service you are testing? icinga2 object list --type Notification - not sure if there’s a better way to filter that down more. You can use --name as well to specify the Notification's name.

Yes, I can confirm that, since if there is no downtime and the host/service has a problem, we receive an E-Mail and an SMS…
Only if the problem happens during the downtime and remains after the downtime do we get just the E-Mail…

Thanks
Luca