Notifications not honouring timeperiod

Hi there, long-time Icinga user. At the start of the year we moved to a new Icinga 2 instance, which merged two Icinga 2 instances and one Icinga 1 instance. Since then I have had an issue that occurs only intermittently, and only for one user, and I cannot get to the bottom of it. I’m running r2.11.3-1, with a primary and a secondary master node that handle notifications, and a number of satellites handling checks.

The general situation: I have a number of hosts that are covered by an engineer in the evenings and at weekends. Alerts during these times are sent by SMS. In addition, we send an email 24x7.

However, engineers are reporting SMSes outside of the designated times. As far as I can see, it’s only for DOWN states, never UP.

I have an almost identical setup for another set of hosts, but the TimePeriod is different, and that works as expected.

All the config looks good to me, but perhaps there’s something I’m missing, or does it sound like an issue anyone else has encountered? Or any extra troubleshooting steps would also be welcome.

Apply notification with sms-host command for user in period:

apply Notification "sms-host-networks" to Host {
  import "sms-host-notification-tmplt"

  users = [ "networks-sme-user" ]
  period = "networks-rota-timeperiod"

  types = [ Problem, Recovery ]
  states = [ Up, Down ]
  interval = 0 # disable re-notification

  assign where "networks-sme" in host.groups
}

User definition:

template User "generic-user" {
  enable_notifications = true
}

object User "networks-sme-user" {
  import "generic-user"

  display_name = "Networks SME"
  email = NocEmail
  # pager number represents flag given to sms-gateway
  pager = "-o"
}

Time period:

object TimePeriod "networks-rota-timeperiod" {

  display_name = "Networks SME rota notifications"
  ranges = {
    "monday"    = "07:00-09:00,17:30-23:00"
    ...
    "friday"    = "07:00-09:00,17:30-23:00"
    "saturday"  = "08:00-23:00"
    "sunday"    = "08:00-23:00"

# Bank Holidays
    "january 1"         = "08:00-23:00"
    ...
    "2020-12-23 - 2021-01-01" = "08:00-23:00"
  }
}

Notification:

object NotificationCommand "sms-host-notification" {
  import "plugin-notification-command"

  command = [ "/usr/local/bin/sms-gateway" ]

  arguments = {
    "--pager" = {
      "skip_key" = true
      "value" = "$user.pager$"
    }
    "-m" = "$notification.type$ - Host: $host.display_name$ ($address$) is $host.state$"
  }
}

template Notification "sms-host-notification-tmplt" {
  command = "sms-host-notification"

  states = [ Up, Down ]
  types = [ Problem, Acknowledgement, Recovery, Custom,
            FlappingStart, FlappingEnd,
            DowntimeStart, DowntimeEnd, DowntimeRemoved ]

  interval = 0 //disable re-notification
}

Yesterday morning, there was an errant DOWN message. Here’s what the logs look like (P for primary node, S for secondary):

02:58:10 Host goes down, notification sent to both users despite being outside TimePeriod for networks-sme-user

P [2020-04-21 02:58:10 +0100] information/Checkable: Checkable 'hostname' has 2 notification(s). Checking filters for type 'Problem', sends will be logged.
P [2020-04-21 02:58:10 +0100] information/Notification: Sending 'Problem' notification 'hostname!sms-host-networks' for user 'networks-sme-user'
S [2020-04-21 02:58:10 +0100] information/Notification: Completed sending 'Problem' notification 'hostname!sms-host-networks' for checkable 'hostname' and user 'networks-sme-user' using command 'sms-host-notification'.
P [2020-04-21 02:58:10 +0100] information/Checkable: Checkable 'hostname' has 2 notification(s). Checking filters for type 'Problem', sends will be logged.
S [2020-04-21 02:58:10 +0100] information/Notification: Sending 'Problem' notification 'hostname!mail-host-noc' for user 'noc-user'
S [2020-04-21 02:58:10 +0100] information/Notification: Completed sending 'Problem' notification 'hostname!mail-host-noc' for checkable 'hostname' and user 'noc-user' using command 'mail-host-notification'.

03:33:35 Host recovers, only one notification sent (despite “has 2 notification(s)”)

P [2020-04-21 03:33:35 +0100] information/Checkable: Checkable 'hostname' has 2 notification(s). Checking filters for type 'Recovery', sends will be logged.
S [2020-04-21 03:33:35 +0100] information/Checkable: Checkable 'hostname' has 2 notification(s). Checking filters for type 'Recovery', sends will be logged.
S [2020-04-21 03:33:35 +0100] information/Notification: Sending 'Recovery' notification 'hostname!mail-host-noc' for user 'noc-user'
S [2020-04-21 03:33:35 +0100] information/Notification: Completed sending 'Recovery' notification 'hostname!mail-host-noc' for checkable 'hostname' and user 'noc-user' using command 'mail-host-notification'.

Hello,

Can you check which notifications exist for that host/service and post the output here?

icinga2 object list --type notification --name 'hostname!*'
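
It may also be worth checking whether both masters agree on the state of the time period itself. A minimal sketch, assuming the API is enabled and using the debug console connected to each master in turn (the credentials and URL in the comment are placeholders, not values from this setup):

```
// Start a connected console on each master, for example:
//   icinga2 console --connect 'https://<api-user>:<password>@localhost:5665/'
// then evaluate the expression below at the prompt.

// Evaluates to true while that node considers the period active.
// Outside the configured ranges it should be false on both masters.
get_time_period("networks-rota-timeperiod").is_inside
```

If the two masters give different answers outside the configured ranges, that would point at the time period state rather than at the notification filters.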

Regards,
Carsten

Thanks for the reply, Carsten. That command returns a lot of notifications, but most are for services. The two for the host (which is what the notifications are about) are:

Object '<hostname>!mail-host-noc' of type 'Notification':
  % declared in '/etc/icinga2/zones.d/global-templates/notifications.conf', lines 4:1-4:42
  * __name = "<hostname>!mail-host-noc"
  * command = "mail-host-notification"
    % = modified in '/etc/icinga2/zones.d/global-templates/notifications-mail.conf', lines 137:3-137:36
  * command_endpoint = ""
  * host_name = "<hostname>"
    % = modified in '/etc/icinga2/zones.d/global-templates/notifications.conf', lines 4:1-4:42
  * interval = 0
    % = modified in '/etc/icinga2/zones.d/global-templates/notifications-mail.conf', lines 145:3-145:14
    % = modified in '/etc/icinga2/zones.d/global-templates/notifications.conf', lines 12:3-12:14
  * name = "mail-host-noc"
  * package = "_etc"
    % = modified in '/etc/icinga2/zones.d/global-templates/notifications.conf', lines 4:1-4:42
  * period = "24x7-timeperiod"
    % = modified in '/etc/icinga2/zones.d/global-templates/notifications-mail.conf', lines 144:3-144:28
  * service_name = ""
  * source_location
    * first_column = 1
    * first_line = 4
    * last_column = 42
    * last_line = 4
    * path = "/etc/icinga2/zones.d/global-templates/notifications.conf"
  * states = [ "Up", "Down" ]
    % = modified in '/etc/icinga2/zones.d/global-templates/notifications-mail.conf', lines 139:3-139:23
  * templates = [ "mail-host-noc", "mail-host-notification" ]
    % = modified in '/etc/icinga2/zones.d/global-templates/notifications.conf', lines 4:1-4:42
    % = modified in '/etc/icinga2/zones.d/global-templates/notifications-mail.conf', lines 136:1-136:46
  * times = null
  * type = "Notification"
  * types = [ "Problem", "Recovery", "Acknowledgement", "Custom" ]
    % = modified in '/etc/icinga2/zones.d/global-templates/notifications-mail.conf', lines 140:3-142:57
    % = modified in '/etc/icinga2/zones.d/global-templates/notifications.conf', lines 10:3-10:56
  * user_groups = null
  * users = [ "noc-user" ]
    % = modified in '/etc/icinga2/zones.d/global-templates/notifications.conf', lines 7:3-7:24
  * vars = null
  * zone = "network-zone"
    % = modified in '/etc/icinga2/zones.d/global-templates/notifications.conf', lines 4:1-4:42

Object '<hostname>!sms-host-networks' of type 'Notification':
  % declared in '/etc/icinga2/zones.d/global-templates/notifications.conf', lines 32:1-32:46
  * __name = "<hostname>!sms-host-networks"
  * command = "sms-host-notification"
    % = modified in '/etc/icinga2/zones.d/global-templates/notifications-sms.conf', lines 32:3-32:35
  * command_endpoint = ""
  * host_name = "<hostname>"
    % = modified in '/etc/icinga2/zones.d/global-templates/notifications.conf', lines 32:1-32:46
  * interval = 0
    % = modified in '/etc/icinga2/zones.d/global-templates/notifications-sms.conf', lines 39:3-39:14
    % = modified in '/etc/icinga2/zones.d/global-templates/notifications.conf', lines 40:3-40:14
  * name = "sms-host-networks"
  * package = "_etc"
    % = modified in '/etc/icinga2/zones.d/global-templates/notifications.conf', lines 32:1-32:46
  * period = "networks-rota-timeperiod"
    % = modified in '/etc/icinga2/zones.d/global-templates/notifications.conf', lines 36:3-36:37
  * service_name = ""
  * source_location
    * first_column = 1
    * first_line = 32
    * last_column = 46
    * last_line = 32
    * path = "/etc/icinga2/zones.d/global-templates/notifications.conf"
  * states = [ "Up", "Down" ]
    % = modified in '/etc/icinga2/zones.d/global-templates/notifications-sms.conf', lines 34:3-34:23
    % = modified in '/etc/icinga2/zones.d/global-templates/notifications.conf', lines 39:3-39:23
  * templates = [ "sms-host-networks", "sms-host-notification-tmplt" ]
    % = modified in '/etc/icinga2/zones.d/global-templates/notifications.conf', lines 32:1-32:46
    % = modified in '/etc/icinga2/zones.d/global-templates/notifications-sms.conf', lines 31:1-31:51
  * times = null
  * type = "Notification"
  * types = [ "Problem", "Recovery" ]
    % = modified in '/etc/icinga2/zones.d/global-templates/notifications-sms.conf', lines 35:3-37:57
    % = modified in '/etc/icinga2/zones.d/global-templates/notifications.conf', lines 38:3-38:31
  * user_groups = null
  * users = [ "networks-sme-user" ]
    % = modified in '/etc/icinga2/zones.d/global-templates/notifications.conf', lines 35:3-35:33
  * vars = null
  * zone = "network-zone"
    % = modified in '/etc/icinga2/zones.d/global-templates/notifications.conf', lines 32:1-32:46

Hi,

From what I can see, one of the notifications should not have been sent. It looks like there may be a bug with periods on days that have multiple time ranges.
Can you test on a day with only one time range, like Saturday or Sunday, and check whether messages are still sent outside the range?
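
If waiting for a weekend is impractical, a rough sketch of a temporary test setup could look like the following. The object names `networks-rota-test-timeperiod` and `sms-host-networks-test` and the `<test-hostname>` placeholder are made up for illustration, and the ranges are only an example; pick ranges so that the test can be run outside them.

```
// Hypothetical test period: exactly one range per day, no comma-separated ranges.
object TimePeriod "networks-rota-test-timeperiod" {
  display_name = "Networks SME rota (single-range test)"
  ranges = {
    "monday"    = "17:30-23:00"
    "tuesday"   = "17:30-23:00"
    "wednesday" = "17:30-23:00"
    "thursday"  = "17:30-23:00"
    "friday"    = "17:30-23:00"
    "saturday"  = "08:00-23:00"
    "sunday"    = "08:00-23:00"
  }
}

// Hypothetical copy of the existing apply rule; only the period and the
// assign rule differ, so it can be compared directly with the original.
apply Notification "sms-host-networks-test" to Host {
  import "sms-host-notification-tmplt"

  users = [ "networks-sme-user" ]
  period = "networks-rota-test-timeperiod"

  types = [ Problem, Recovery ]
  states = [ Up, Down ]
  interval = 0 // disable re-notification

  // Limit the test to a single host to avoid duplicate SMSes elsewhere.
  assign where host.name == "<test-hostname>"
}
```

If a DOWN on the test host still produces an SMS outside the single range, multiple ranges per day are probably not the trigger.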

Regards,
Carsten

% = modified in '/etc/icinga2/zones.d/global-templates/notifications-sms.conf', lines 35:3-37:57 
% = modified in '/etc/icinga2/zones.d/global-templates/notifications.conf', lines 38:3-38:31

This is strange, as your apply rule only imports one template.

The notifications.conf contains the Apply rule, while the notifications-sms.conf contains the object Template - would that make sense?

Thanks, that would line up with the logs. On Sunday 12 April we had six hosts go down at 02:00, and two notifications were sent at 08:00, so that worked (mostly). Is there a known bug with multiple time ranges?