Hi there, long-time Icinga user. At the start of the year we moved to a new Icinga2 instance, merging two Icinga2 instances and one Icinga1 instance. Since then I have had an issue that occurs only intermittently, and only for one user, and I cannot get to the bottom of it. I’m running r2.11.3-1, with a primary and a secondary master node that handle notifications, and a number of satellites handling checks.
The general situation: I have a number of hosts that are monitored by an engineer in the evenings and at weekends. Alerts during these times are sent by SMS. In addition, we send an email 24x7.
However, engineers are reporting SMSes outside of the designated times. As far as I can see, it’s only for DOWN states, never UP.
I have an almost identical setup for another set of hosts, where only the TimePeriod is different, and that works as expected.
All the config looks good to me, but perhaps there’s something I’m missing. Does this sound like an issue anyone else has encountered? Any extra troubleshooting steps would also be welcome.
Apply notification with sms-host command for user in period:
apply Notification "sms-host-networks" to Host {
  import "sms-host-notification-tmplt"
  users = [ "networks-sme-user" ]
  period = "networks-rota-timeperiod"
  types = [ Problem, Recovery ]
  states = [ Up, Down ]
  interval = 0 # disable re-notification
  assign where "networks-sme" in host.groups
}
User definition:
template User "generic-user" {
  enable_notifications = true
}
object User "networks-sme-user" {
  import "generic-user"
  display_name = "Networks SME"
  email = NocEmail
  # pager number represents flag given to sms-gateway
  pager = "-o"
}
Time period:
object TimePeriod "networks-rota-timeperiod" {
  display_name = "Networks SME rota notifications"
  ranges = {
    "monday" = "07:00-09:00,17:30-23:00"
    ...
    "friday" = "07:00-09:00,17:30-23:00"
    "saturday" = "08:00-23:00"
    "sunday" = "08:00-23:00"
    # Bank Holidays
    "january 1" = "08:00-23:00"
    ...
    "2020-12-23 - 2021-01-01" = "08:00-23:00"
  }
}
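As a sanity check on my own reading of these ranges, here is a minimal Python sketch. It is not Icinga's evaluation logic, only how I interpret the config. The helper names `parse_ranges` and `in_ranges` are made up for illustration, and treating the end of each range as exclusive is an assumption on my part. It uses the Monday/Friday entry shown above.

```python
from datetime import time

def parse_ranges(spec):
    """Parse 'HH:MM-HH:MM,HH:MM-HH:MM' into a list of (start, end) time pairs."""
    pairs = []
    for part in spec.split(","):
        start_s, end_s = part.strip().split("-")
        start_h, start_m = map(int, start_s.split(":"))
        end_h, end_m = map(int, end_s.split(":"))
        pairs.append((time(start_h, start_m), time(end_h, end_m)))
    return pairs

def in_ranges(spec, moment):
    """True if moment is inside any range (start inclusive, end exclusive: an assumption)."""
    return any(start <= moment < end for start, end in parse_ranges(spec))

# Monday/Friday entry copied from the TimePeriod above.
WEEKDAY = "07:00-09:00,17:30-23:00"

print(in_ranges(WEEKDAY, time(2, 58)))  # 02:58, an early-morning time outside the rota
print(in_ranges(WEEKDAY, time(18, 0)))  # 18:00, an evening time inside the rota
```

By this reading, 02:58 falls outside the weekday ranges and 18:00 falls inside them.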
Notification:
object NotificationCommand "sms-host-notification" {
  import "plugin-notification-command"
  command = [ "/usr/local/bin/sms-gateway" ]
  arguments = {
    "--pager" = {
      "skip_key" = true
      "value" = "$user.pager$"
    }
    "-m" = "$notification.type$ - Host: $host.display_name$ ($address$) is $host.state$"
  }
}
template Notification "sms-host-notification-tmplt" {
  command = "sms-host-notification"
  states = [ Up, Down ]
  types = [ Problem, Acknowledgement, Recovery, Custom,
            FlappingStart, FlappingEnd,
            DowntimeStart, DowntimeEnd, DowntimeRemoved ]
  interval = 0 // disable re-notification
}
Yesterday morning, there was an errant DOWN message. Here’s what the logs look like (P for primary node, S for secondary):
02:58:10 Host goes down, notification sent to both users despite being outside TimePeriod for networks-sme-user
P [2020-04-21 02:58:10 +0100] information/Checkable: Checkable 'hostname' has 2 notification(s). Checking filters for type 'Problem', sends will be logged.
P [2020-04-21 02:58:10 +0100] information/Notification: Sending 'Problem' notification 'hostname!sms-host-networks' for user 'networks-sme-user'
S [2020-04-21 02:58:10 +0100] information/Notification: Completed sending 'Problem' notification 'hostname!sms-host-networks' for checkable 'hostname' and user 'networks-sme-user' using command 'sms-host-notification'.
P [2020-04-21 02:58:10 +0100] information/Checkable: Checkable 'hostname' has 2 notification(s). Checking filters for type 'Problem', sends will be logged.
S [2020-04-21 02:58:10 +0100] information/Notification: Sending 'Problem' notification 'hostname!mail-host-noc' for user 'noc-user'
S [2020-04-21 02:58:10 +0100] information/Notification: Completed sending 'Problem' notification 'hostname!mail-host-noc' for checkable 'hostname' and user 'noc-user' using command 'mail-host-notification'.
03:33:35 Host recovers, only one notification sent (despite “has 2 notification(s)”)
P [2020-04-21 03:33:35 +0100] information/Checkable: Checkable 'hostname' has 2 notification(s). Checking filters for type 'Recovery', sends will be logged.
S [2020-04-21 03:33:35 +0100] information/Checkable: Checkable 'hostname' has 2 notification(s). Checking filters for type 'Recovery', sends will be logged.
S [2020-04-21 03:33:35 +0100] information/Notification: Sending 'Recovery' notification 'hostname!mail-host-noc' for user 'noc-user'
S [2020-04-21 03:33:35 +0100] information/Notification: Completed sending 'Recovery' notification 'hostname!mail-host-noc' for checkable 'hostname' and user 'noc-user' using command 'mail-host-notification'.
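To compare sends across both masters over a longer period, something like the following Python sketch could pull the "Sending" lines out of the logs. It assumes the log format matches the lines above. `SEND_RE` and `find_sends` are names I made up, and the two sample lines are copied from the excerpt (including my P/S prefix).

```python
import re
from datetime import datetime

# Matches "Sending ..." lines only; "Completed sending ..." lines are skipped
# because the pattern requires "Sending" directly after the log facility.
SEND_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\] information/Notification: "
    r"Sending '(?P<type>\w+)' notification '(?P<name>[^']+)' "
    r"for user '(?P<user>[^']+)'"
)

def find_sends(lines):
    """Yield (timestamp, type, notification name, user) for each send line."""
    for line in lines:
        match = SEND_RE.search(line)
        if match:
            ts = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S %z")
            yield ts, match["type"], match["name"], match["user"]

sample = [
    "P [2020-04-21 02:58:10 +0100] information/Notification: Sending 'Problem' "
    "notification 'hostname!sms-host-networks' for user 'networks-sme-user'",
    "S [2020-04-21 02:58:10 +0100] information/Notification: Completed sending 'Problem' "
    "notification 'hostname!sms-host-networks' for checkable 'hostname' and user "
    "'networks-sme-user' using command 'sms-host-notification'.",
]

for ts, ntype, name, user in find_sends(sample):
    print(ts.isoformat(), ntype, name, user)
```

Filtering the result on the SMS notification name and checking each timestamp against the rota hours should show whether the out-of-hours sends always originate from the same node.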