Notifications are still sent during Scheduled Downtime

This is a repost of a dead topic that was never resolved: "Notifications sent during scheduled downtime - #6 by stevie-sy".

I am on 2.13.0-1 and notifications are sent for all hosts/services during scheduled downtime. This problem existed in 2.11.3-1, which prompted our team to do an emergency upgrade to 2.13.0-1.

Our Network Engineers apply downtime for maintenance windows via Icingaweb2 (not via an apply rule). The downtime is fixed, not “flexible”, and downtime was sent.
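To reproduce this without the web UI, a comparable fixed downtime can also be scheduled through the Icinga 2 REST API endpoint `POST /v1/actions/schedule-downtime`. The sketch below only builds the JSON body; the helper name, host/service names, author and comment are placeholders, and sending the request (authentication, TLS) is left out. It assumes the `filter`/`filter_vars` request parameters, which keep object names out of the filter string itself.

```python
import json
import time


def build_fixed_downtime_payload(host, service, start, end, author, comment):
    """Hypothetical helper: JSON body for POST /v1/actions/schedule-downtime.

    Schedules a *fixed* downtime (fixed=True) on exactly one service,
    mirroring what the engineers set up by hand in Icingaweb2.
    """
    if end <= start:
        raise ValueError("end must be after start")
    return {
        "type": "Service",
        # filter_vars keeps host/service names out of the filter expression
        "filter": "host.name == h && service.name == s",
        "filter_vars": {"h": host, "s": service},
        "start_time": int(start),
        "end_time": int(end),
        "fixed": True,
        "author": author,
        "comment": comment,
    }


if __name__ == "__main__":
    now = time.time()
    body = build_fixed_downtime_payload(
        "example-host", "BGP STATUS", now, now + 3600, "someuser", "maintenance test"
    )
    print(json.dumps(body, indent=2))
```

With `fixed` set, the downtime is active for the whole `start_time`..`end_time` window, so no `duration` is needed.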

Here is an example:


It appears that the notification was sent after the next check interval (20 minutes; this particular check is passive and fed via an API call), despite downtime being scheduled for this service.

  • Version used (icinga2 --version): 2.13.0-1
  • Operating system and version: CentOS 7, kernel 3.10.0-1127.8.2.el7.x86_64
  • Enabled features (icinga2 feature list): api checker command ido-mysql influxdb mainlog notification statusdata
  • Icinga Web 2 version and modules (System - About): 2.7.3
  • Config validation (icinga2 daemon -C): REDACTED, but the config loads fine.
  • If you run multiple Icinga 2 instances, the zones.conf file (or icinga2 object list --type Endpoint and icinga2 object list --type Zone) from all affected nodes: (some nodes may appear to be in two zones or otherwise be listed twice, but they have different domain names that I have redacted)
Object 'icinga02' of type 'Endpoint':
  % declared in '/etc/icinga2/zones.conf', lines 6:1-6:41
  * __name = "icinga02"
  * host = ""
  * log_duration = 86400
  * name = "icinga02"
  * package = "_etc"
  * port = "5665"
  * source_location
    * first_column = 1
    * first_line = 6
    * last_column = 41
    * last_line = 6
    * path = "/etc/icinga2/zones.conf"
  * templates = [ "icinga02" ]
    % = modified in '/etc/icinga2/zones.conf', lines 6:1-6:41
  * type = "Endpoint"
  * zone = ""

Object 'ica02m02n' of type 'Endpoint':
  % declared in '/var/lib/icinga2/api/packages/director/46b75cce-5f66-49dc-9ea0-3f5dfd5daaa7/zones.d/master/endpoints.conf', lines 7:1-7:42
  * __name = "ica02m02n"
  * host = "ica02m02n"
    % = modified in '/var/lib/icinga2/api/packages/director/46b75cce-5f66-49dc-9ea0-3f5dfd5daaa7/zones.d/master/endpoints.conf', lines 8:5-8:37
  * log_duration = 86400
    % = modified in '/var/lib/icinga2/api/packages/director/46b75cce-5f66-49dc-9ea0-3f5dfd5daaa7/zones.d/master/endpoints.conf', lines 10:5-10:21
  * name = "ica02m02n"
  * package = "director"
  * port = "5665"
    % = modified in '/var/lib/icinga2/api/packages/director/46b75cce-5f66-49dc-9ea0-3f5dfd5daaa7/zones.d/master/endpoints.conf', lines 9:5-9:17
  * source_location
    * first_column = 1
    * first_line = 7
    * last_column = 42
    * last_line = 7
    * path = "/var/lib/icinga2/api/packages/director/46b75cce-5f66-49dc-9ea0-3f5dfd5daaa7/zones.d/master/endpoints.conf"
  * templates = [ "ica02m02n" ]
    % = modified in '/var/lib/icinga2/api/packages/director/46b75cce-5f66-49dc-9ea0-3f5dfd5daaa7/zones.d/master/endpoints.conf', lines 7:1-7:42
  * type = "Endpoint"
  * zone = "master"

Object 'ica01m02n' of type 'Endpoint':
  % declared in '/var/lib/icinga2/api/packages/director/46b75cce-5f66-49dc-9ea0-3f5dfd5daaa7/zones.d/master/endpoints.conf', lines 1:0-1:41
  * __name = "ica01m02n"
  * host = "ica01m02n"
    % = modified in '/var/lib/icinga2/api/packages/director/46b75cce-5f66-49dc-9ea0-3f5dfd5daaa7/zones.d/master/endpoints.conf', lines 2:5-2:37
  * log_duration = 86400
    % = modified in '/var/lib/icinga2/api/packages/director/46b75cce-5f66-49dc-9ea0-3f5dfd5daaa7/zones.d/master/endpoints.conf', lines 4:5-4:21
  * name = "ica01m02n"
  * package = "director"
  * port = "5665"
    % = modified in '/var/lib/icinga2/api/packages/director/46b75cce-5f66-49dc-9ea0-3f5dfd5daaa7/zones.d/master/endpoints.conf', lines 3:5-3:17
  * source_location
    * first_column = 0
    * first_line = 1
    * last_column = 41
    * last_line = 1
    * path = "/var/lib/icinga2/api/packages/director/46b75cce-5f66-49dc-9ea0-3f5dfd5daaa7/zones.d/master/endpoints.conf"
  * templates = [ "ica01m02n" ]
    % = modified in '/var/lib/icinga2/api/packages/director/46b75cce-5f66-49dc-9ea0-3f5dfd5daaa7/zones.d/master/endpoints.conf', lines 1:0-1:41
  * type = "Endpoint"
  * zone = "master"

Object 'ica03m02n' of type 'Endpoint':
  % declared in '/var/lib/icinga2/api/packages/director/46b75cce-5f66-49dc-9ea0-3f5dfd5daaa7/zones.d/master/endpoints.conf', lines 25:1-25:42
  * __name = "ica03m02n"
  * host = "ica03m02n"
    % = modified in '/var/lib/icinga2/api/packages/director/46b75cce-5f66-49dc-9ea0-3f5dfd5daaa7/zones.d/master/endpoints.conf', lines 26:5-26:37
  * log_duration = 86400
    % = modified in '/var/lib/icinga2/api/packages/director/46b75cce-5f66-49dc-9ea0-3f5dfd5daaa7/zones.d/master/endpoints.conf', lines 28:5-28:21
  * name = "ica03m02n.nsvltn"
  * package = "director"
  * port = "5665"
    % = modified in '/var/lib/icinga2/api/packages/director/46b75cce-5f66-49dc-9ea0-3f5dfd5daaa7/zones.d/master/endpoints.conf', lines 27:5-27:17
  * source_location
    * first_column = 1
    * first_line = 25
    * last_column = 42
    * last_line = 25
    * path = "/var/lib/icinga2/api/packages/director/46b75cce-5f66-49dc-9ea0-3f5dfd5daaa7/zones.d/master/endpoints.conf"
  * templates = [ "ica03m02n" ]
    % = modified in '/var/lib/icinga2/api/packages/director/46b75cce-5f66-49dc-9ea0-3f5dfd5daaa7/zones.d/master/endpoints.conf', lines 25:1-25:42
  * type = "Endpoint"
  * zone = "master"

Object 'ica04m02n' of type 'Endpoint':
  % declared in '/var/lib/icinga2/api/packages/director/46b75cce-5f66-49dc-9ea0-3f5dfd5daaa7/zones.d/master/endpoints.conf', lines 31:1-31:42
  * __name = "ica04m02n"
  * host = "ica04m02n"
    % = modified in '/var/lib/icinga2/api/packages/director/46b75cce-5f66-49dc-9ea0-3f5dfd5daaa7/zones.d/master/endpoints.conf', lines 32:5-32:37
  * log_duration = 86400
    % = modified in '/var/lib/icinga2/api/packages/director/46b75cce-5f66-49dc-9ea0-3f5dfd5daaa7/zones.d/master/endpoints.conf', lines 34:5-34:21
  * name = "ica04m02n"
  * package = "director"
  * port = "5665"
    % = modified in '/var/lib/icinga2/api/packages/director/46b75cce-5f66-49dc-9ea0-3f5dfd5daaa7/zones.d/master/endpoints.conf', lines 33:5-33:17
  * source_location
    * first_column = 1
    * first_line = 31
    * last_column = 42
    * last_line = 31
    * path = "/var/lib/icinga2/api/packages/director/46b75cce-5f66-49dc-9ea0-3f5dfd5daaa7/zones.d/master/endpoints.conf"
  * templates = [ "ica04m02n" ]
    % = modified in '/var/lib/icinga2/api/packages/director/46b75cce-5f66-49dc-9ea0-3f5dfd5daaa7/zones.d/master/endpoints.conf', lines 31:1-31:42
  * type = "Endpoint"
  * zone = "master"

Object 'ica01m02n' of type 'Endpoint':
  % declared in '/var/lib/icinga2/api/packages/director/46b75cce-5f66-49dc-9ea0-3f5dfd5daaa7/zones.d/master/endpoints.conf', lines 13:1-13:42
  * __name = "ica01m02n"
  * host = "ica01m02n"
    % = modified in '/var/lib/icinga2/api/packages/director/46b75cce-5f66-49dc-9ea0-3f5dfd5daaa7/zones.d/master/endpoints.conf', lines 14:5-14:37
  * log_duration = 86400
    % = modified in '/var/lib/icinga2/api/packages/director/46b75cce-5f66-49dc-9ea0-3f5dfd5daaa7/zones.d/master/endpoints.conf', lines 16:5-16:21
  * name = "ica01m02n"
  * package = "director"
  * port = "5665"
    % = modified in '/var/lib/icinga2/api/packages/director/46b75cce-5f66-49dc-9ea0-3f5dfd5daaa7/zones.d/master/endpoints.conf', lines 15:5-15:17
  * source_location
    * first_column = 1
    * first_line = 13
    * last_column = 42
    * last_line = 13
    * path = "/var/lib/icinga2/api/packages/director/46b75cce-5f66-49dc-9ea0-3f5dfd5daaa7/zones.d/master/endpoints.conf"
  * templates = [ "ica01m02n" ]
    % = modified in '/var/lib/icinga2/api/packages/director/46b75cce-5f66-49dc-9ea0-3f5dfd5daaa7/zones.d/master/endpoints.conf', lines 13:1-13:42
  * type = "Endpoint"
  * zone = "master"

Object 'ica02m02n' of type 'Endpoint':
  % declared in '/var/lib/icinga2/api/packages/director/46b75cce-5f66-49dc-9ea0-3f5dfd5daaa7/zones.d/master/endpoints.conf', lines 19:1-19:42
  * __name = "ica02m02n"
  * host = "ica02m02n"
    % = modified in '/var/lib/icinga2/api/packages/director/46b75cce-5f66-49dc-9ea0-3f5dfd5daaa7/zones.d/master/endpoints.conf', lines 20:5-20:37
  * log_duration = 86400
    % = modified in '/var/lib/icinga2/api/packages/director/46b75cce-5f66-49dc-9ea0-3f5dfd5daaa7/zones.d/master/endpoints.conf', lines 22:5-22:21
  * name = "ica02m02n"
  * package = "director"
  * port = "5665"
    % = modified in '/var/lib/icinga2/api/packages/director/46b75cce-5f66-49dc-9ea0-3f5dfd5daaa7/zones.d/master/endpoints.conf', lines 21:5-21:17
  * source_location
    * first_column = 1
    * first_line = 19
    * last_column = 42
    * last_line = 19
    * path = "/var/lib/icinga2/api/packages/director/46b75cce-5f66-49dc-9ea0-3f5dfd5daaa7/zones.d/master/endpoints.conf"
  * templates = [ "ica02m02n" ]
    % = modified in '/var/lib/icinga2/api/packages/director/46b75cce-5f66-49dc-9ea0-3f5dfd5daaa7/zones.d/master/endpoints.conf', lines 19:1-19:42
  * type = "Endpoint"
  * zone = "master"

Also worth noting: all of my examples are check results processed via the API, not via a plugin. Not sure whether that points to a bug.

In the example above, the notification is sent after the second run of the API call (the script runs on a schedule every 20 minutes).

Bump: trying to avoid opening an issue on GitHub (I always feel like it’s a misconfiguration). I’ll give it until tomorrow morning :slight_smile:

Hello there,
I would like to ask you to refrain from bumping topics this way :slight_smile:
If you don’t get a response to a topic, it’s usually because there wasn’t anyone who could help you at this point - especially if it hasn’t been much more than a day.
If you need more help with a topic and want to put it on the front page again, a good way to do so is giving an update on what you have tried since your last post to give people a few pointers.

In this case your issue reminds me of a similar unsolved one that I read a while back and can’t seem to find at the moment, but I would encourage you to open a bug report on GitHub based on it :slight_smile:


Thanks for the heads-up! I’ll go ahead and open a bug report, since it’s reproducible and I’ve sanity-checked the configs so many times that I’m losing my sanity :sweat_smile:

I’ll post back here if a solution comes from that.


That sounds good, thank you :slight_smile:
I’ll put a good word in with the devs, can’t promise anything though :wink:

GitHub issue opened:

What do the debug log entries for those notifications say? (If you need help activating or using the debug log, say so and I’ll get back to this thread… :upside_down_face:)

P.S.: By the way, we are slowly but surely moving away from scheduled downtimes toward more complex notification (apply) rules whose time periods exclude those “downtime” windows, simply because the Director, which we mainly use for configuration, doesn’t seem to support setting scheduled downtimes via its GUI.


No worries, very familiar with Icinga on the system side (except when I misconfigure something, of course).

TL;DR: the debug log basically says “sure, okay, I’m sending the notification”.
It acts as if the downtime is not there, even though it is present.

Because of the size of our environment, I can’t leave debug logging on for long (it fills up a 500 GB log disk in less than an hour).

I’ve since deleted the debug log from the other day (for now we are only testing downtimes as a process change), but we tried again this morning (again against a passive check) and it still sends the notification. Below are the entries from the standard log (grepped):

[2021-08-13 08:38:15 -0500] information/Checkable: Checkable 'REDACTED HOST!BGP STATUS REDACTED IP' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2021-08-13 08:43:19 -0500] information/Notification: Sending reminder 'Problem' notification 'REDACTED HOST!BGP STATUS REDACTED IP!NOTIFY_rabbitmq_rule' for user 'rabbitmq'
[2021-08-13 08:43:19 -0500] information/Notification: Completed sending 'Problem' notification 'REDACTED HOST!BGP STATUS REDACTED IP!NOTIFY_rabbitmq_rule' for checkable 'REDACTED HOST!BGP STATUS REDACTED IP' and user 'rabbitmq' using command 'Notify-RabbitMQ'
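When collecting more evidence, the "Sending ... notification" lines can be pulled out of the main log mechanically so their timestamps can be lined up against downtime windows. A minimal sketch, assuming the log lines keep exactly the shape quoted above; the regex and the `parse_sends` helper are hypothetical, not part of Icinga.

```python
import re
from datetime import datetime

# Shape of the "Sending [reminder] '<type>' notification '<name>' for user '<user>'"
# lines quoted above from the Icinga 2 main log.
SEND_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [+-]\d{4})\] "
    r"information/Notification: Sending (?P<reminder>reminder )?"
    r"'(?P<type>[^']+)' notification '(?P<name>[^']+)' for user '(?P<user>[^']+)'"
)


def parse_sends(lines):
    """Return one dict per notification send found in the given log lines."""
    sends = []
    for line in lines:
        m = SEND_RE.match(line)
        if m is None:
            continue  # "Checking filters" / "Completed sending" lines are skipped
        sends.append({
            "time": datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S %z"),
            "type": m["type"],
            "reminder": m["reminder"] is not None,
            "notification": m["name"],
            "user": m["user"],
        })
    return sends
```

Fed the three lines above, it yields a single send at 08:43:19 -0500 for user `rabbitmq`; the `reminder` flag is kept because that line is logged as a reminder send.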

I’ve posted screenshots from Icingaweb2 for this in the GitHub issue above, as the “pushback” there was “oh, it’s a downtime-started notification”, but it’s not. Screenshots below so you don’t have to dig them up:

DT Info (screenshot)

Service History (screenshot)

Notification Info (screenshot)

Notification rule, from Director (screenshot)

Notification template used for import on the rule, from Director (screenshot)

P.S.: It’s worth noting that we do NOT use the Director for scheduling DTs. Instead, our Network Engineers and customer TAC schedule downtimes on services for planned maintenance via the Icingaweb2 interface (i.e., select a service, apply the downtime). In most cases, as in the one above, they increment the end date by one month.

Log entry for the downtime (there is no removal of the downtime in the logs):

[2021-08-13 06:47:43 -0500] information/ConfigObjectUtility: Created and activated object 'REDACTED HOST!BGP STATUS REDACTED IP!79d53944-2144-4609-be07-548216910cbc' of type 'Downtime'.
[2021-08-13 06:47:43 -0500] information/Downtime: Added downtime 'REDACTED HOST REDACTED IP!79d53944-2144-4609-be07-548216910cbc' between '2021-08-13 06:47:35' and '2021-09-13 07:47:35', author: 'SOME USER', fixed
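Putting the two log excerpts side by side, the notification time can be checked against the downtime window programmatically. A small sketch, assuming the downtime's start/end strings are in the same local time as the log timestamps (the "Added downtime" line prints them without an offset); the helper names are hypothetical.

```python
import re
from datetime import datetime

# Matches the "Added downtime '<name>' between '<start>' and '<end>'" line quoted above.
ADDED_DT = re.compile(
    r"Added downtime '(?P<name>[^']+)' between "
    r"'(?P<start>[\d\- :]+)' and '(?P<end>[\d\- :]+)'"
)
FMT = "%Y-%m-%d %H:%M:%S"


def parse_downtime_window(line):
    """Return (name, start, end) as naive datetimes, or None if the line doesn't match."""
    m = ADDED_DT.search(line)
    if m is None:
        return None
    return m["name"], datetime.strptime(m["start"], FMT), datetime.strptime(m["end"], FMT)


def inside_window(ts, start, end):
    """True if ts falls within [start, end]."""
    return start <= ts <= end
```

For the entries above, the window runs from 2021-08-13 06:47:35 to 2021-09-13 07:47:35 (one month plus one hour), and the 08:43:19 notification on 2021-08-13 falls inside it.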

Okay, looks like a bug! :crazy_face: (The CRITICAL notification in your example is definitely not a downtime-started one…) BTW, in my answer above I misunderstood you to mean RECURRING scheduled downtimes, sorry; I should have read your initial post more closely…

One final idea (maybe this is a case of “it’s a new feature, not a bug”): is the user setting the downtime via Icingaweb2 actually an admin login user able to do that for any service? I’m thinking that maybe there’s a new role setting in your version of Icinga Web (we still use an older Icinga) that allows certain login users to set downtimes only for certain contact users, or something…


No, the user(s) applying downtime definitely are not admins :slight_smile:

This possible bug was present in 2.11.3-1, which prompted an emergency upgrade to 2.13.0-1, so our version was a bit dated as well.

Still though, could always be some sort of misconfiguration issue. I did look at the code for the passive check via API to make sure it wasn’t forcing notifications (came to me in a dream last night), but it doesn’t do anything weird like that.

The code itself is an old Python 2.7 script that I inherited; it uses the icinga2api package and calls client.actions.process_check_result(*args).

I double-checked the package code on GitHub to ensure it doesn’t also call a notification endpoint, and double-checked the Icinga 2 API docs to make sure the call doesn’t do something weird like trigger notifications.
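For comparison, the same passive result can be submitted straight to the REST endpoint `POST /v1/actions/process-check-result`. The sketch below uses only the Python 3 standard library rather than the icinga2api package the real script uses; the helper name, URL and credentials are placeholders. It builds the request without sending it, which makes it easy to see that the body carries only the check result and no notification-related field.

```python
import base64
import json
import urllib.request


def build_check_result_request(api_url, user, password, host, service, exit_status, output):
    """Hypothetical helper: POST /v1/actions/process-check-result for one service.

    Only the check result is sent; nothing in the body refers to notifications.
    """
    body = {
        "type": "Service",
        "filter": "host.name == h && service.name == s",
        "filter_vars": {"h": host, "s": service},
        "exit_status": exit_status,  # 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN
        "plugin_output": output,
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{api_url}/v1/actions/process-check-result",
        data=json.dumps(body).encode(),
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
```

Actually sending it would be `urllib.request.urlopen(req, context=ctx)` with an `ssl` context that trusts the Icinga CA; whether a notification follows is then entirely up to Icinga's own notification logic.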

Is there even a config option to send notifications even when in scheduled downtime? I’m looking through the sync properties in the director and don’t see anything that would apply that.

We also have a script that goes and cleans up comments/acks that do not match a ticket number on our end, but looking through the database and API, it seems like scheduled downtimes are not tied to a comment in the same way that acknowledgements are, despite having a comment field.

Lastly, I dug up the related configs that the Director is generating:

Command

// Just a script that sends the alert into a rabbitmq queue
object NotificationCommand "Notify-RabbitMQ" {
    import "plugin-notification-command"
    command = [
        "/bin/python3",
        "/usr/local/bin/icingacrmcasecreation/notify_RabbitMQ.py"
    ]
    timeout = 1m
    arguments += {
        "--agent" = "$agent$"
        "--host" = "monitor.ctac.ena.net"
        "--ic2host" = "$host.name$"
        "--ic2message" = "$output$"
        "--ic2service" = "$service$"
        "--status" = "$state$"
        "--time" = "$time$"
        "--zone" = "$zone$"
    }
}

Notification Template

template Notification "Notify_Template_RMQ" {
    times.begin = 5m
    command = "Notify-RabbitMQ"
    interval = 0s
    period = "24x7"
    states = [ Critical, Down ]
    types = [ Problem ]
    users = [ "rabbitmq" ]
    vars.host = "$HOSTNAME$"
    vars.output = "Notification Type: $NOTIFICATIONTYPE$\\nHost: $HOSTNAME$\\nState: $SERVICESTATE$\\nAddress: $HOSTADDRESS$\\n\\nService: $SERVICEDESC$\\nInfo: $SERVICEOUTPUT$\\n\\nDate/Time: $LONGDATETIME$\\n\\nAcknowledged by: $SERVICEACKAUTHOR$\\nAcknowledgement: $SERVICEACKCOMMENT$\\n\\nNotification ID: $SERVICENOTIFICATIONID$\\nEvent ID: $SERVICEEVENTID$\\nLast Event ID: $LASTSERVICEEVENTID$\\n"
    vars.service = "$SERVICEDESC$"
    vars.status = "$SERVICESTATE$"
    vars.time = "$LONGDATETIME$"
}

Notification Apply Rule

apply Notification "NOTIFY_rabbitmq_rule_REDACTED_ONLY__" to Service {
    import "Notify_Template_RMQ"

    interval = 0s
    assign where "SG_redacted" in service.groups
    states = [ Critical ]
    types = [ Problem ]
    vars.agent = "icinga"
    vars.host = "$host.name$"
    vars.output = "\"*** Icinga ***Notification Type:$notification.type$ Host: $host.name$ ($host.address$) Service: $service.display_name$ State: $service.state$ Info: $service.output$ Date/Time: $icinga.long_date_time$\""
    vars.service = "$service.display_name$"
    vars.status = "$service.state$"
    vars.time = "$icinga.long_date_time$"
    vars.zone = "$service.zone$"
}
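To make the expectation explicit, here is a deliberately simplified model of the decision the rule above should lead to. It is NOT Icinga 2's implementation: it ignores time periods (everything here is `24x7` anyway), user-level filters and other notification types, and only encodes that a Problem notification should be suppressed while the service is in downtime (`downtime_depth > 0`), after which the rule's `types`/`states` filters decide. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NotificationFilter:
    """Hypothetical, simplified stand-in for the types/states of a Notification object."""
    types: frozenset
    states: frozenset


def should_send(flt, notification_type, state, downtime_depth):
    """Simplified model, NOT Icinga 2's code: downtime check first, then filters."""
    if notification_type == "Problem" and downtime_depth > 0:
        return False  # service is in downtime: Problem notifications are expected to be suppressed
    return notification_type in flt.types and state in flt.states


# Effective filter of the apply rule above: states = [ Critical ], types = [ Problem ]
RULE = NotificationFilter(types=frozenset({"Problem"}), states=frozenset({"Critical"}))
```

In this model, the only way the rule sends a Critical Problem notification during the maintenance window is if the service's `downtime_depth` is 0 at that moment, which is why checking the live downtime state is a sensible next step.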

User

object User "rabbitmq" {
    import "User Template"

    period = "24x7"
}

Timeperiod

object TimePeriod "24x7" {
    import "legacy-timeperiod"
    import "Timeperiod_Template"

    display_name = "24x7"
    ranges = {
        "friday"        = "00:00-24:00"
        "monday"        = "00:00-24:00"
        "saturday"      = "00:00-24:00"
        "sunday"        = "00:00-24:00"
        "thursday"      = "00:00-24:00"
        "tuesday"       = "00:00-24:00"
        "wednesday"     = "00:00-24:00"
    }
}

Service

object Service "BGP STATUS REDACTED" {
    host_name = "redacted.net"
    import "Service_Template_15min"
    import "BGP ESP STATUS"

    display_name = "BGP STATUS REDACTED"
    groups = [ "SG_ops" ]
    vars.peerip = "10.25.1.1"
}

Strange behavior.
Only idea I have:
Did you check via the API whether the service is actually in a downtime? Not sure what the attribute is called, but I guess it should be visible in the service object.
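One way to do that check: the service object exposes a `downtime_depth` attribute (the number of active downtimes; 0 means Icinga does not consider the service to be in downtime), which can be requested via `GET /v1/objects/services` with the `service` (full `host!service` name) and `attrs` URL parameters. A minimal sketch that builds the URL and parses the JSON answer; the helper names and example host are hypothetical, and authentication is left out.

```python
import json
import urllib.parse


def service_downtime_url(api_url, host, service):
    """Hypothetical helper: URL for GET /v1/objects/services limited to downtime_depth."""
    query = urllib.parse.urlencode(
        {"service": f"{host}!{service}", "attrs": "downtime_depth"},
        quote_via=urllib.parse.quote,  # encode spaces as %20 rather than '+'
    )
    return f"{api_url}/v1/objects/services?{query}"


def downtime_depths(response_body):
    """Map full service name ("host!service") -> downtime_depth from an API response."""
    data = json.loads(response_body)
    return {r["name"]: int(r["attrs"]["downtime_depth"]) for r in data.get("results", [])}
```

A depth of 0 while Icingaweb2 still shows the downtime would mean the API and the web UI disagree about the service's state.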


The downtime is created as far as the logs go. I could query the object to find the downtime when I get my next batch of evidence (i.e., the customer TAC telling me that it alerted again, knowing full well that we are actively investigating :laughing: )

This leads me to another thought: I wonder if the Director is removing and recreating the service. In the Director I am merging the changes, not replacing the object, but it’s worth a look.

No go on the Director object (screenshot of its history):

That’s the entire history for the object, and doesn’t line up with the date(s) we saw for this one (last week).

I found a very recent example. The service was already acknowledged, so no notification was sent; however, I was able to confirm via the API that the DT is NOT present, even though it shows in the web UI. The DT was applied at the host level 20 minutes ago (to all services). It shows in Icingaweb2, but not in the API.
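The downtime objects themselves can be listed through `GET /v1/objects/downtimes`, filtered on the downtime's `host_name` and `service_name` attributes. A sketch, assuming the host and service names contain no double quotes (they are embedded as string literals in the filter expression); the helper names and example values are hypothetical.

```python
import json
import urllib.parse


def downtimes_url(api_url, host, service):
    """Hypothetical helper: URL for GET /v1/objects/downtimes for one service.

    Assumes host/service names contain no double quotes.
    """
    flt = f'downtime.host_name=="{host}" && downtime.service_name=="{service}"'
    query = urllib.parse.urlencode({"filter": flt}, quote_via=urllib.parse.quote)
    return f"{api_url}/v1/objects/downtimes?{query}"


def downtime_summaries(response_body):
    """Return (name, start_time, end_time, fixed) for every downtime object in the response."""
    data = json.loads(response_body)
    return [
        (r["name"], r["attrs"]["start_time"], r["attrs"]["end_time"], r["attrs"]["fixed"])
        for r in data.get("results", [])
    ]
```

An empty list here while the downtime is still listed in Icingaweb2 would match the mismatch described above.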

Looking in the database, the DT is present in the icinga_downtimehistory table.

Edit:
I confirmed that the DT object does not exist in icinga_scheduleddowntime table either.

Edit 2:
I’m going to discuss getting the general query log turned on for our MariaDB instance; hopefully that gives us some insight into what is removing the object.


I found entries similar to these in the MariaDB logs, three times over an hour:

                11785352 Query  DELETE FROM icinga_scheduleddowntime WHERE instance_id = 1 AND session_token <> 1629306642
                11787940 Query  DELETE FROM icinga_scheduleddowntime WHERE instance_id = 1 AND session_token <> 1629308497
                11790063 Query  DELETE FROM icinga_scheduleddowntime WHERE instance_id = 1 AND session_token <> 1629310402
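Each statement deletes every `icinga_scheduleddowntime` row whose `session_token` differs from the given one, which reads like a cleanup of rows not re-inserted during the current IDO session. Assuming the token is the Unix time (in seconds) at which that session started (the values are consistent with August 2021), the tokens can be decoded and compared with reload/restart times in icinga2.log. The helper names below are hypothetical.

```python
import re
from datetime import datetime, timezone

# The DELETE statements quoted above, as they appear in the MariaDB general query log.
CLEANUP_RE = re.compile(
    r"DELETE FROM icinga_scheduleddowntime "
    r"WHERE instance_id = (?P<instance>\d+) AND session_token <> (?P<token>\d+)"
)


def cleanup_tokens(lines):
    """Extract the session_token kept by each icinga_scheduleddowntime cleanup."""
    return [int(m["token"]) for m in map(CLEANUP_RE.search, lines) if m]


def token_times(tokens):
    """Assumption: tokens are Unix timestamps (seconds) taken when the IDO session started."""
    return [datetime.fromtimestamp(t, tz=timezone.utc) for t in tokens]


def gaps_seconds(tokens):
    """Seconds between consecutive cleanups."""
    return [b - a for a, b in zip(tokens, tokens[1:])]
```

Under that assumption, the three tokens above decode to 17:10:42, 17:41:37 and 18:13:22 UTC on 2021-08-18 (12:10:42, 12:41:37 and 13:13:22 at -0500), 1855 s and 1905 s apart. If those times line up with reloads (for example Director deployments), that would support the first question below.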

Not a C++ expert at all, but it seems to line up with the following:

With that said:
1.) Why is Icinga removing these? Is it related to a reload?
2.) If it’s related to a reload (i.e., the Director pushing out a new config package), shouldn’t Icinga recreate them? As best I can tell, they get deleted and are never recreated like other objects (comments, acknowledgements, etc.).