Service notification type=PROBLEM state=OK

Hi there,

Everytime we reload our Icinga2 server (after a config change), we get some service notifications with type=PROBLEM state=OK

I can’t find any informations about this particular values. Usual notifications are PROBLEM/CRITICAL (because we filter on this values).

How can this happen? What does it mean?

Thx.

Hi,

please share more concrete details about this, just as:

  • icinga2 --version
  • Specific log entries from icinga2.log for these notifications
  • The email (or other) including the full content
  • Is this reproducible every time?

Note: This is something which is not expected to happen, so the more details you provide the better we can analyse this together.

Cheers,
Michael

Hi Michael,

thx for looking into this (and sorry for the long delay answer, it took us some time to get the right logs). It’s not reproducible everytime. It happens quite rarely actually.

# icinga2 --version
icinga2 - The Icinga 2 network monitoring daemon (version: r2.9.1-1)

Copyright (c) 2012-2018 Icinga Development Team (https://www.icinga.com/)
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Application information:
  Installation root: /usr
  Sysconf directory: /etc
  Run directory: /run
  Local state directory: /var
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid

System information:
  Platform: Debian GNU/Linux
  Platform version: 7 (wheezy)
  Kernel: Linux
  Kernel version: 4.14.79-xxxx-std-ipv6-64
  Architecture: x86_64

Build information:
  Compiler: GNU 4.7.2
  Build host: 402f99151cf6

The logs:

[2019-08-08 12:18:09 +0000] debug/ServiceDbObject: service dependencies for 'server1.example.com!service1'
[2019-08-08 12:18:09 +0000] debug/ServiceDbObject: service contacts: server1.example.com!service1
[2019-08-08 12:18:09 +0000] debug/ServiceDbObject: service contactgroups: server1.example.com!service1
[2019-08-08 12:19:28 +0000] notice/NotificationComponent: Attempting to send reminder notification 'server1.example.com!service1!ldp'
[2019-08-08 12:19:28 +0000] notice/Notification: Attempting to send reminder notifications for notification object 'server1.example.com!service1!ldp'.
[2019-08-08 12:19:28 +0000] information/Notification: Sending reminder 'Problem' notification 'server1.example.com!service1!ldp' for user 'icingaadmin'
[2019-08-08 12:19:28 +0000] debug/DbEvents: add notification history for 'server1.example.com!service1'
[2019-08-08 12:19:28 +0000] debug/DbEvents: add contact notification history for service 'server1.example.com!service1' and user 'icingaadmin'.
[2019-08-08 12:19:28 +0000] information/ExternalCommandListener: Executing external command: [1565266768] PROCESS_SERVICE_CHECK_RESULT;server1.example.com;service1;0;
[2019-08-08 12:19:28 +0000] debug/DbEvents: add log entry history for 'server1.example.com!service1'
[2019-08-08 12:19:28 +0000] debug/DbEvents: add checkable check history for 'server1.example.com!service1'
[2019-08-08 12:19:28 +0000] debug/DbEvents: add state change history for 'server1.example.com!service1'
[2019-08-08 12:19:28 +0000] notice/Checkable: State Change: Checkable 'server1.example.com!service1' hard state change from UNKNOWN to OK detected.
[2019-08-08 12:19:28 +0000] information/Checkable: Checking for configured notifications for object 'server1.example.com!service1'
[2019-08-08 12:19:28 +0000] debug/Checkable: Checkable 'server1.example.com!service1' has 1 notification(s).
[2019-08-08 12:19:28 +0000] notice/Notification: Attempting to send  notifications for notification object 'server1.example.com!service1!ldp'.
[2019-08-08 12:19:28 +0000] information/Notification: Sending 'Recovery' notification 'server1.example.com!service1!ldp' for user 'icingaadmin'
[2019-08-08 12:19:28 +0000] debug/DbEvents: add notification history for 'server1.example.com!service1'
[2019-08-08 12:19:28 +0000] debug/DbEvents: add contact notification history for service 'server1.example.com!service1' and user 'icingaadmin'.
[2019-08-08 12:19:28 +0000] notice/Process: Running command '/usr/bin/logger' '--tag' 'icinga-alerts' 'host:icinga1.example.com time:1565266768 type:PROBLEM    state:OK    server:server1.example.com  service:service1': PID 7604
[2019-08-08 12:19:28 +0000] debug/DbEvents: add log entry history for 'server1.example.com!service1'
[2019-08-08 12:19:28 +0000] information/Notification: Completed sending 'Problem' notification 'server1.example.com!service1!ldp' for checkable 'server1.example.com!service1' and user 'icingaadmin'.
[2019-08-08 12:19:28 +0000] notice/Process: PID 7604 ('/usr/bin/logger' '--tag' 'icinga-alerts' 'host:icinga1.example.com time:1565266768 type:PROBLEM    state:OK    server:server1.example.com  service:service1') terminated with exit code 0
[2019-08-08 12:19:29 +0000] notice/Process: Running command '/usr/bin/logger' '--tag' 'icinga-alerts' 'host:icinga1.example.com time:1565266769 type:RECOVERY   state:OK    server:server1.example.com  service:service1': PID 9771
[2019-08-08 12:19:29 +0000] debug/DbEvents: add log entry history for 'server1.example.com!service1'
[2019-08-08 12:19:29 +0000] information/Notification: Completed sending 'Recovery' notification 'server1.example.com!service1!ldp' for checkable 'server1.example.com!service1' and user 'icingaadmin'.
[2019-08-08 12:19:29 +0000] notice/Process: PID 9771 ('/usr/bin/logger' '--tag' 'icinga-alerts' 'host:icinga1.example.com time:1565266769 type:RECOVERY   state:OK    server:server1.example.com  service:service1') terminated with exit code 0

And here are the corresponding configuration sections:

template Notification "service-notification" {
    users = ["icingaadmin"]
    period = "24x7"
    interval = 1h
}

apply Notification "ldp" to Service {
    import "service-notification"
    command = "service-notification-logger"
    assign where true
}

object NotificationCommand "service-notification-logger" {
    command = [ "/usr/bin/logger",
                "--tag", "icinga-alerts",
                "host:icinga1.example.com time:$icinga.timet$ type:$notification.type$    state:$service.state$   server:$host.display_name$  service:$service.display_name$" ]
}

Ok, thanks. That really looks odd, I have never seen this before. Since you’re using 2.9.1 I’d suggest upgrading … but beforehand, Wheezy needs a dist upgrade to at least Jessie, better Stretch and even better Buster :kissing_heart:

2.10 contains adjustments for notifications, it may be fixed there already.

That’s the plan. We already upgraded one server last week to stretch. The next should be upgraded next week. After that, we will upgrade icinga. I’ll keep this thread up-to-date to tell if it’s fixed or not.

One thing that bother me is from the log, in this order:

[2019-08-08 12:19:28 +0000] information/Notification: Sending reminder 'Problem' notification 'server1.example.com!service1!ldp' for user 'icingaadmin'
...
[2019-08-08 12:19:28 +0000] notice/Checkable: State Change: Checkable 'server1.example.com!service1' hard state change from UNKNOWN to OK detected.
...
[2019-08-08 12:19:28 +0000] notice/Process: Running command '/usr/bin/logger' '--tag' 'icinga-alerts' 'host:icinga1.example.com time:1565266768 type:PROBLEM    state:OK    server:server1.example.com  service:service1': PID 7604

Do you think it can just be an “atomicity” issue? Like

  1. I need to send a PROBLEM notification
  2. The state is changed to OK
  3. I read the state
  4. I send PROBLEM / OK

Maybe. How often is that service being checked? Or likewise, add its configuration, icinga2 object list --type Service --name .. is sufficient.

It’s a passive check. Informations are received through nsca every 5 minutes. We have about 220000 services.

# icinga2 object list --type Service --name 'server1.example.com!service1'
Object 'server1.example.com!service1' of type 'Service':
  % declared in '/etc/icinga2/conf.d/services-common.conf', lines 51:1-51:63
  * __name = "server1.example.com!service1"
  * action_url = ""
  * check_command = "dummy"
    % = modified in '/etc/icinga2/conf.d/templates.conf', lines 30:5-30:27
  * check_interval = 900
    % = modified in '/etc/icinga2/conf.d/templates.conf', lines 26:5-26:24
  * check_period = ""
  * check_timeout = null
  * command_endpoint = ""
  * display_name = "service1"
  * enable_active_checks = true
    % = modified in '/etc/icinga2/conf.d/templates.conf', lines 29:5-29:31
  * enable_event_handler = true
  * enable_flapping = false
  * enable_notifications = true
    % = modified in '/etc/icinga2/conf.d/templates.conf', lines 33:5-33:31
  * enable_passive_checks = true
  * enable_perfdata = true
  * event_command = ""
  * flapping_threshold = 0
  * flapping_threshold_high = 30
  * flapping_threshold_low = 25
  * groups = [ ]
  * host_name = "server1.example.com"
    % = modified in '/etc/icinga2/conf.d/services-common.conf', lines 51:1-51:63
  * icon_image = ""
  * icon_image_alt = ""
  * max_check_attempts = 2
    % = modified in '/etc/icinga2/conf.d/templates.conf', lines 25:5-25:26
  * name = "service1"
    % = modified in '/etc/icinga2/conf.d/services-common.conf', lines 51:1-51:63
  * notes = ""
  * notes_url = ""
  * package = "_etc"
    % = modified in '/etc/icinga2/conf.d/services-common.conf', lines 51:1-51:63
  * retry_interval = 30
    % = modified in '/etc/icinga2/conf.d/templates.conf', lines 27:5-27:24
  * source_location
    * first_column = 1
    * first_line = 51
    * last_column = 63
    * last_line = 51
    * path = "/etc/icinga2/conf.d/services-common.conf"
  * templates = [ "service1", "generic-service" ]
    % = modified in '/etc/icinga2/conf.d/services-common.conf', lines 51:1-51:63
    % = modified in '/etc/icinga2/conf.d/templates.conf', lines 24:1-24:34
  * type = "Service"
  * vars
    * dummy_state = 3
      % = modified in '/etc/icinga2/conf.d/templates.conf', lines 31:5-31:24
    * dummy_text = "No check results received."
      % = modified in '/etc/icinga2/conf.d/templates.conf', lines 32:5-32:50
  * volatile = false
  * zone = ""

Ok. Can you reproduce this then with a dummy check of yours, simulating this behaviour?