PROBLEM notification sent during recurring downtime

It seems I can’t see the forest for the trees anymore…

I have two hosts, cla005 and cla006. Both have the same downtime applied using an apply rule:

# Claro daily restart (ISD-63675)
apply ScheduledDowntime "service-restarts" to Service {
  author = "icingaadmin"
  comment = "Scheduled downtimes for daily restarts of cla servers"

  ranges = {
    "monday" = "06:45-07:00"
    "tuesday" = "06:45-07:00"
    "wednesday" = "06:45-07:00"
    "thursday" = "06:45-07:00"
    "friday" = "06:45-07:00"
    "saturday" = "06:45-07:00"
    "sunday" = "06:45-07:00"
  }

  assign where match("cla005", host.name) || match("cla006", host.name)
}
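To make the intended window concrete, here is a minimal Python sketch that tests whether a timestamp falls inside the daily 06:45-07:00 range from the rule above. It is only an illustration for comparing notification times against the window, not how Icinga 2 evaluates `ranges` internally. The function name, the exclusive end boundary, and the sample timestamps are assumptions; timestamps are taken to be naive local times of the Icinga host.

```python
from datetime import datetime, time

# Window copied from the apply rule above; it is identical for every weekday.
WINDOW_START = time(6, 45)
WINDOW_END = time(7, 0)


def in_restart_window(ts: datetime) -> bool:
    """Return True if ts falls inside the daily 06:45-07:00 window.

    The end is treated as exclusive here. That is an assumption of this
    sketch, not a statement about Icinga's own boundary handling.
    """
    return WINDOW_START <= ts.time() < WINDOW_END


# Hypothetical timestamps for illustration (not taken from the thread).
print(in_restart_window(datetime(2019, 10, 2, 6, 50)))
print(in_restart_window(datetime(2019, 10, 2, 7, 5)))
```

Feeding it the send time of a PROBLEM notification gives a quick answer to whether that notification really fell inside the restart window.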

The downtimes are correctly applied on both hosts:

image

image

The problem? When the service is down during the downtime on cla005, Icinga2 sends a PROBLEM notification. The same service on cla006 does not send a notification.

cla005:

image

cla006:

image

The service is clearly in the downtime, yet a notification is sent, and only on one host. That confuses me even more.
The service checks are applied the same way, with the same check and notification settings, on both hosts.

Any idea what could cause this?

Hi,

The problem might be that Icinga 2 itself no longer knows about the downtime (broken _api package). 2.11 fixes a lot in this regard; you might also want to check the troubleshooting docs for the package itself.

Which version of Icinga 2 are you using?

Cheers,
Michael

2.10.5, clustered master (you know the setup ;) ).

“Icinga 2 itself doesn’t know about the downtime anymore”

That would surprise me, but I cannot rule it out either.
I see the downtime correctly both in icingaweb2 and in the object list (icinga2 object list --type ScheduledDowntime).

I haven’t tried 2.11 yet; it will first have to go through the test environment. Besides that possibility (fixed in the new 2.11), any other ideas?
I will upgrade to 2.11 next week and see if this helps.
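One thing worth separating here: a ScheduledDowntime is the configured rule, while the windows it produces are runtime objects of type Downtime. Seeing the rule in the object list does not by itself show that the core holds a Downtime covering a given moment. The sketch below filters the JSON returned by the REST API endpoint `/v1/objects/downtimes` for entries covering a timestamp. The endpoint URL, credentials, function names, and sample data are hypothetical; the attribute names `host_name`, `service_name`, `start_time`, and `end_time` are what I would expect on Downtime objects, so please verify them against your version.

```python
import json
import ssl
import urllib.request
from base64 import b64encode


def active_downtimes(results, host, service, ts):
    """Return names of Downtime objects for host/service that cover Unix time ts."""
    hits = []
    for obj in results:
        attrs = obj.get("attrs", {})
        if attrs.get("host_name") != host or attrs.get("service_name") != service:
            continue
        if attrs.get("start_time", 0) <= ts <= attrs.get("end_time", 0):
            hits.append(obj.get("name"))
    return hits


def fetch_downtimes(base_url, user, password):
    """GET /v1/objects/downtimes and return the "results" list.

    Certificate verification is disabled for brevity only; use the
    cluster CA in a real setup.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    req = urllib.request.Request(base_url + "/v1/objects/downtimes")
    token = b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    req.add_header("Accept", "application/json")
    with urllib.request.urlopen(req, context=ctx) as resp:
        return json.load(resp)["results"]


# Hypothetical sample shaped like an API response, so the filter can be
# tried without a running Icinga 2 instance.
sample = [
    {"name": "cla005!Service MyService!example-downtime",
     "attrs": {"host_name": "cla005", "service_name": "Service MyService",
               "start_time": 1000.0, "end_time": 1900.0}},
]
print(active_downtimes(sample, "cla005", "Service MyService", 1500.0))
```

Against a live master, something like `active_downtimes(fetch_downtimes("https://localhost:5665", "apiuser", "secret"), "cla005", "Service MyService", time.time())` (after `import time`) would be run during the window on both hosts to compare what the core actually holds.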

I am meanwhile on Icinga 2 2.11, but the same problem still exists; a notification is sent during the downtime:

Service Object:

Object 'cla005!Service MyService' of type 'Service':
  % declared in '/etc/icinga2/zones.d/global-templates/applyrules/windowsservices.conf', lines 4:1-4:63
  * __name = "cla005!Service MyService"
  * check_command = "nscp"
    % = modified in '/etc/icinga2/zones.d/global-templates/applyrules/windowsservices.conf', lines 7:3-7:24
  * check_interval = 60
    % = modified in '/etc/icinga2/zones.d/global-templates/templates/service-60s-templates.conf', lines 20:2-20:21
  * check_period = "24x7"
    % = modified in '/etc/icinga2/zones.d/global-templates/templates/service-base-template.conf', lines 11:2-11:22
  * check_timeout = null
  * command_endpoint = ""
  * display_name = "Service MyService"
  * enable_active_checks = true
  * enable_event_handler = true
  * enable_flapping = false
  * enable_notifications = true
    % = modified in '/etc/icinga2/zones.d/global-templates/templates/service-60s-templates.conf', lines 23:2-23:25
  * enable_passive_checks = true
  * enable_perfdata = true
  * event_command = ""
  * flapping_threshold = 0
  * flapping_threshold_high = 30
  * flapping_threshold_low = 25
  * groups = [ ]
  * host_name = "cla005"
    % = modified in '/etc/icinga2/zones.d/global-templates/applyrules/windowsservices.conf', lines 4:1-4:63
  * icon_image = ""
  * icon_image_alt = ""
  * max_check_attempts = 3
    % = modified in '/etc/icinga2/zones.d/global-templates/templates/service-60s-templates.conf', lines 22:2-22:23
  * name = "Service MyService"
    % = modified in '/etc/icinga2/zones.d/global-templates/applyrules/windowsservices.conf', lines 4:1-4:63
  * package = "_etc"
    % = modified in '/etc/icinga2/zones.d/global-templates/applyrules/windowsservices.conf', lines 4:1-4:63
  * retry_interval = 30
    % = modified in '/etc/icinga2/zones.d/global-templates/templates/service-60s-templates.conf', lines 21:2-21:21
  * source_location
    * first_column = 1
    * first_line = 4
    * last_column = 63
    * last_line = 4
    * path = "/etc/icinga2/zones.d/global-templates/applyrules/windowsservices.conf"
  * templates = [ "Service MyService", "service-60s-normal", "service-base" ]
    % = modified in '/etc/icinga2/zones.d/global-templates/applyrules/windowsservices.conf', lines 4:1-4:63
    % = modified in '/etc/icinga2/zones.d/global-templates/templates/service-60s-templates.conf', lines 18:1-18:37
    % = modified in '/etc/icinga2/zones.d/global-templates/templates/service-base-template.conf', lines 8:1-8:31
  * type = "Service"
  * vars
    * influx_append = ""
      % = modified in '/etc/icinga2/zones.d/global-templates/templates/service-base-template.conf', lines 12:9-12:31
    * notification
      * interval = 10800
        % = modified in '/etc/icinga2/zones.d/global-templates/templates/service-60s-templates.conf', lines 25:2-25:32
      * period = "businesshours"
        % = modified in '/etc/icinga2/zones.d/global-templates/templates/service-60s-templates.conf', lines 26:2-26:43
      * type = "mail"
        % = modified in '/etc/icinga2/zones.d/global-templates/templates/service-60s-templates.conf', lines 24:2-24:32
    * nscp_params = "MyService"
      % = modified in '/etc/icinga2/zones.d/global-templates/applyrules/windowsservices.conf', lines 10:3-10:28
    * nscp_port = "1248"
      % = modified in '/etc/icinga2/zones.d/global-templates/applyrules/windowsservices.conf', lines 8:3-8:25
    * nscp_variable = "SERVICESTATE"
      % = modified in '/etc/icinga2/zones.d/global-templates/applyrules/windowsservices.conf', lines 9:3-9:37
  * volatile = false
  * zone = "master"
    % = modified in '/etc/icinga2/zones.d/global-templates/applyrules/windowsservices.conf', lines 4:1-4:63

Downtime object:

Object 'cla005!Service MyService!myservice-daily-restarts' of type 'ScheduledDowntime':
  % declared in '/etc/icinga2/zones.d/global-templates/applyrules/downtimes.conf', lines 195:1-195:57
  * __name = "cla005!Service MyService!myservice-daily-restarts"
  * author = "icingaadmin"
    % = modified in '/etc/icinga2/zones.d/global-templates/applyrules/downtimes.conf', lines 196:3-196:24
  * child_options = "DowntimeNoChildren"
  * comment = "Scheduled downtimes for daily restarts"
    % = modified in '/etc/icinga2/zones.d/global-templates/applyrules/downtimes.conf', lines 197:3-197:52
  * duration = 0
  * fixed = true
  * host_name = "cla005"
    % = modified in '/etc/icinga2/zones.d/global-templates/applyrules/downtimes.conf', lines 195:1-195:57
  * name = "myservice-daily-restarts"
  * package = "_etc"
    % = modified in '/etc/icinga2/zones.d/global-templates/applyrules/downtimes.conf', lines 195:1-195:57
  * ranges
    % = modified in '/etc/icinga2/zones.d/global-templates/applyrules/downtimes.conf', lines 199:3-207:3
    * friday = "06:45-07:00"
    * monday = "06:45-07:00"
    * saturday = "06:45-07:00"
    * sunday = "06:45-07:00"
    * thursday = "06:45-07:00"
    * tuesday = "06:45-07:00"
    * wednesday = "06:45-07:00"
  * service_name = "Service MyService"
    % = modified in '/etc/icinga2/zones.d/global-templates/applyrules/downtimes.conf', lines 195:1-195:57
  * source_location
    * first_column = 1
    * first_line = 195
    * last_column = 57
    * last_line = 195
    * path = "/etc/icinga2/zones.d/global-templates/applyrules/downtimes.conf"
  * templates = [ "myservice-daily-restarts" ]
    % = modified in '/etc/icinga2/zones.d/global-templates/applyrules/downtimes.conf', lines 195:1-195:57
  * type = "ScheduledDowntime"
  * vars = null
  * zone = "master"
    % = modified in '/etc/icinga2/zones.d/global-templates/applyrules/downtimes.conf', lines 195:1-195:57

I guess I will have to try with the debug log enabled and hopefully find something there.

A couple of days after the upgrade to 2.11, the downtime is now respected and notifications are no longer sent. However, a day after the upgrade, the notification was still sent (see above), so I am not sure why it suddenly behaves correctly now.

Maybe the downtime scheduled on the day of the update was still incorrect (or only visible in the DB but not in the core), and the ones that followed are now correct.

Yes, that’s possible @dnsmichi. Looks like one day ahead (next day) is already scheduled. I upgraded Icinga 2 on October 1st. So 2.10.x was still running on October 1st in the morning and probably already scheduled the downtime for October 2nd. This is when the notifications during the downtime were sent the last time. Scheduled downtimes starting from October 3rd were then “managed” by Icinga 2.11.x and I haven’t seen the notifications since.
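The timeline above can be sketched in a few lines of Python. This only models the hypothesis from this thread, namely that each daily window is created about one window ahead by whichever version is running at that time. The year, the exact upgrade time (noon on October 1st), and the rule "the next window is created when the previous one ends" are assumptions made for illustration, not confirmed Icinga 2 behaviour.

```python
from datetime import datetime, timedelta

# Assumptions of this sketch (not confirmed in the thread): the year, the
# exact upgrade time, and that each window is created when the previous ends.
UPGRADE = datetime(2019, 10, 1, 12, 0)


def window(day):
    """Return (start, end) of the 06:45-07:00 window on the given day."""
    start = datetime(day.year, day.month, day.day, 6, 45)
    return start, start + timedelta(minutes=15)


def created_by(day):
    """Return the version assumed to have scheduled the window on `day`."""
    _, prev_end = window(day - timedelta(days=1))
    return "2.10.x" if prev_end < UPGRADE else "2.11.x"


for d in (1, 2, 3):
    day = datetime(2019, 10, d)
    print(day.date(), created_by(day))
```

Under these assumptions the October 2nd window would still have been created by 2.10.x and the October 3rd window by 2.11.x, which matches when the notifications stopped.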