Notification for one Service on one host not working

sebo · August 21, 2019, 9:33am

Hello all,

I have two HAProxy load balancers (LB01 and LB02) running keepalived. I run passive service checks for both services, haproxy and keepalived.

On LB02 I get notification mails for both services, but for LB01 I only get the mail for keepalived.

Here is my hosts.conf:


object Host "dev-loadbalancer-01" {
  address = "x.x.x.x"
  check_command = "hostalive"

  # Plugin Checks
  vars.check_ssh = true
  vars.check_load = true
  vars.check_procs = true
  vars.check_swap = true
  vars.check_users = true
  vars.check_apt = true

  # OS
  vars.os = "Linux"

  # Passive Service Checks
  vars.check_haproxy = true
  vars.check_keepalived = true

  # Disk Checks
  vars.disks["disk /"] = {
    disk_partition = "/"
    disk_warning = "20%"
    disk_critical = "10%"
  }

  vars.metric = "hostalive"
  vars.notification["mail"] = {
    groups = [ "icingaadmins" ]
  }
}

object Host "dev-loadbalancer-02" {
  address = "x.x.x.x"
  check_command = "hostalive"

  # Plugin Checks
  vars.check_ssh = true
  vars.check_load = true
  vars.check_procs = true
  vars.check_swap = true
  vars.check_users = true
  vars.check_apt = true

  # OS
  vars.os = "Linux"

  # Passive Service Checks
  vars.check_haproxy = true
  vars.check_keepalived = true

  # Disk Checks
  vars.disks["disk /"] = {
    disk_partition = "/"
    disk_warning = "20%"
    disk_critical = "10%"
  }

  vars.metric = "hostalive"
  vars.notification["mail"] = {
    groups = [ "icingaadmins" ]
  }
}

The apply Service rule:

apply Service "haproxy" {
  check_command = "dummy"
  vars.dummy_state = 1
  vars.dummy_text = "Keine Daten"
  vars.metric = "haproxy_service"
  assign where host.vars.check_haproxy
}

My bash script:

#!/bin/bash
# Report the haproxy state to Icinga 2 as a passive check result.
# --quiet suppresses output; the exit code alone says whether the unit is active.
if systemctl is-active --quiet haproxy
then
        # haproxy is running: submit OK (exit_status 0) to each API endpoint
        curl -k -s -u icingaweb2:xxxxx \
        -H 'Accept: application/json' \
        -X POST 'https://x.x.x.x/v1/actions/process-check-result?service=dev-loadbalancer-02!haproxy' \
        -d '{ "exit_status":0, "plugin_output": "haproxy is running", "check_source": "passive" }'

        curl -k -s -u icingaweb2:xxxxx \
        -H 'Accept: application/json' \
        -X POST 'https://192.x.x.x/v1/actions/process-check-result?service=dev-loadbalancer-02!haproxy' \
        -d '{ "exit_status":0, "plugin_output": "haproxy is running", "check_source": "passive" }'
else
        # haproxy is not running: submit CRITICAL (exit_status 2) to each API endpoint
        curl -k -s -u icingaweb2:xxxxxx \
        -H 'Accept: application/json' \
        -X POST 'https://x.x.x.x/v1/actions/process-check-result?service=dev-loadbalancer-02!haproxy' \
        -d '{ "exit_status":2, "plugin_output": "haproxy is down", "check_source": "passive" }'

        curl -k -s -u icingaweb2:xxxxx \
        -H 'Accept: application/json' \
        -X POST 'https://x.x.x.x/v1/actions/process-check-result?service=dev-loadbalancer-02!haproxy' \
        -d '{ "exit_status":2, "plugin_output": "haproxy is down", "check_source": "passive" }'
fi
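
As a side note, the script above repeats the same curl call four times, and the LB01 and LB02 copies differ only in the host name inside the URL. A possible way to condense it is sketched below. The variable names (MASTERS, API_USER, API_PASS, HOST_NAME, SERVICE) and the function names are placeholders chosen for illustration; they are not part of the original script, and HOST_NAME is assumed to match the Icinga Host object name.

```shell
#!/bin/bash
# Sketch only: all names below are illustrative placeholders.
MASTERS=("x.x.x.x" "x.x.x.x")        # one entry per API endpoint
API_USER="icingaweb2"
API_PASS="xxxxx"
HOST_NAME="dev-loadbalancer-01"      # the only value that differs per load balancer
SERVICE="haproxy"

# Build the JSON body for the process-check-result action.
payload() {
        local status="$1" output="$2"
        printf '{ "exit_status": %d, "plugin_output": "%s", "check_source": "passive" }' \
                "$status" "$output"
}

# Build the action URL for one endpoint.
action_url() {
        printf 'https://%s/v1/actions/process-check-result?service=%s!%s' \
                "$1" "$HOST_NAME" "$SERVICE"
}

# Submit one check result to every endpoint in MASTERS.
submit() {
        local body master
        body="$(payload "$1" "$2")"
        for master in "${MASTERS[@]}"; do
                curl -k -s -u "$API_USER:$API_PASS" \
                        -H 'Accept: application/json' \
                        -X POST "$(action_url "$master")" \
                        -d "$body"
        done
}

# Map the systemd unit state to OK (0) or CRITICAL (2) and submit it.
report() {
        if systemctl is-active --quiet "$SERVICE"; then
                submit 0 "$SERVICE is running"
        else
                submit 2 "$SERVICE is down"
        fi
}

# To use it, add a final line that calls: report
```

With this layout a wrong host name in one of the URLs cannot go unnoticed in a single branch, because the URL is built in exactly one place.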

Here is the debug.log when I shut down haproxy.service on LB01:

tail -F /var/log/icinga2/debug.log | grep "dev-loadbalancer-01\!haproxy"
[2019-08-21 11:05:01 +0200] debug/HttpRequest: line: POST /v1/actions/process-check-result?service=dev-loadbalancer-01!haproxy HTTP/1.1, tokens: 3
[2019-08-21 11:05:01 +0200] debug/DbEvents: add log entry history for 'dev-loadbalancer-01!haproxy'
[2019-08-21 11:05:01 +0200] debug/DbEvents: add checkable check history for 'dev-loadbalancer-01!haproxy'
[2019-08-21 11:05:01 +0200] debug/DbEvents: add state change history for 'dev-loadbalancer-01!haproxy'
[2019-08-21 11:05:01 +0200] notice/Checkable: State Change: Checkable 'dev-loadbalancer-01!haproxy' soft state change from OK to CRITICAL detected.
[2019-08-21 11:05:01 +0200] debug/DbEvents: add log entry history for 'dev-loadbalancer-01!haproxy'
[2019-08-21 11:05:01 +0200] debug/DbEvents: add checkable check history for 'dev-loadbalancer-01!haproxy'
[2019-08-21 11:05:01 +0200] debug/DbEvents: add state change history for 'dev-loadbalancer-01!haproxy'
[2019-08-21 11:05:01 +0200] notice/Checkable: State Change: Checkable 'dev-loadbalancer-01!haproxy' soft state change from CRITICAL to CRITICAL detected.
[2019-08-21 11:06:01 +0200] debug/HttpRequest: line: POST /v1/actions/process-check-result?service=dev-loadbalancer-01!haproxy HTTP/1.1, tokens: 3
[2019-08-21 11:06:01 +0200] debug/DbEvents: add log entry history for 'dev-loadbalancer-01!haproxy'
[2019-08-21 11:06:01 +0200] debug/DbEvents: add checkable check history for 'dev-loadbalancer-01!haproxy'
[2019-08-21 11:06:01 +0200] debug/DbEvents: add state change history for 'dev-loadbalancer-01!haproxy'
[2019-08-21 11:06:01 +0200] notice/Checkable: State Change: Checkable 'dev-loadbalancer-01!haproxy' hard state change from CRITICAL to CRITICAL detected.
[2019-08-21 11:06:01 +0200] information/Checkable: Checking for configured notifications for object 'dev-loadbalancer-01!haproxy'
[2019-08-21 11:06:01 +0200] debug/Checkable: Checkable 'dev-loadbalancer-01!haproxy' has 1 notification(s).
[2019-08-21 11:06:02 +0200] debug/DbEvents: add checkable check history for 'dev-loadbalancer-01!haproxy'
[2019-08-21 11:07:01 +0200] debug/HttpRequest: line: POST /v1/actions/process-check-result?service=dev-loadbalancer-01!haproxy HTTP/1.1, tokens: 3

And here is the same for LB02:

[2019-08-21 11:26:01 +0200] debug/HttpRequest: line: POST /v1/actions/process-check-result?service=dev-loadbalancer-02!haproxy HTTP/1.1, tokens: 3
[2019-08-21 11:26:01 +0200] debug/DbEvents: add log entry history for 'dev-loadbalancer-02!haproxy'
[2019-08-21 11:26:01 +0200] debug/DbEvents: add checkable check history for 'dev-loadbalancer-02!haproxy'
[2019-08-21 11:26:01 +0200] debug/DbEvents: add state change history for 'dev-loadbalancer-02!haproxy'
[2019-08-21 11:26:01 +0200] notice/Checkable: State Change: Checkable 'dev-loadbalancer-02!haproxy' soft state change from OK to CRITICAL detected.
[2019-08-21 11:26:01 +0200] notice/Checkable: Skipping event handler for HA-paused checkable 'dev-loadbalancer-02!haproxy'
[2019-08-21 11:26:01 +0200] debug/DbEvents: add log entry history for 'dev-loadbalancer-02!haproxy'
[2019-08-21 11:26:01 +0200] debug/DbEvents: add checkable check history for 'dev-loadbalancer-02!haproxy'
[2019-08-21 11:26:01 +0200] debug/DbEvents: add state change history for 'dev-loadbalancer-02!haproxy'
[2019-08-21 11:26:01 +0200] notice/Checkable: State Change: Checkable 'dev-loadbalancer-02!haproxy' soft state change from CRITICAL to CRITICAL detected.
[2019-08-21 11:26:01 +0200] notice/Checkable: Skipping event handler for HA-paused checkable 'dev-loadbalancer-02!haproxy'
[2019-08-21 11:27:01 +0200] debug/HttpRequest: line: POST /v1/actions/process-check-result?service=dev-loadbalancer-02!haproxy HTTP/1.1, tokens: 3
[2019-08-21 11:27:01 +0200] debug/DbEvents: add log entry history for 'dev-loadbalancer-02!haproxy'
[2019-08-21 11:27:01 +0200] debug/DbEvents: add checkable check history for 'dev-loadbalancer-02!haproxy'
[2019-08-21 11:27:01 +0200] debug/DbEvents: add state change history for 'dev-loadbalancer-02!haproxy'
[2019-08-21 11:27:01 +0200] notice/Checkable: State Change: Checkable 'dev-loadbalancer-02!haproxy' hard state change from CRITICAL to CRITICAL detected.
[2019-08-21 11:27:01 +0200] notice/Checkable: Skipping event handler for HA-paused checkable 'dev-loadbalancer-02!haproxy'
[2019-08-21 11:27:01 +0200] information/Checkable: Checking for configured notifications for object 'dev-loadbalancer-02!haproxy'
[2019-08-21 11:27:01 +0200] debug/Checkable: Checkable 'dev-loadbalancer-02!haproxy' has 1 notification(s).
[2019-08-21 11:27:01 +0200] notice/Notification: Attempting to send  notifications for notification object 'dev-loadbalancer-02!haproxy!mail-icingaadmin'.
[2019-08-21 11:27:01 +0200] information/Notification: Sending 'Problem' notification 'dev-loadbalancer-02!haproxy!mail-icingaadmin' for user 'icingaadmin'
[2019-08-21 11:27:01 +0200] debug/DbEvents: add notification history for 'dev-loadbalancer-02!haproxy'
[2019-08-21 11:27:01 +0200] debug/DbEvents: add contact notification history for service 'dev-loadbalancer-02!haproxy' and user 'icingaadmin'.
[2019-08-21 11:27:01 +0200] debug/DbEvents: add log entry history for 'dev-loadbalancer-02!haproxy'
[2019-08-21 11:27:01 +0200] information/Notification: Completed sending 'Problem' notification 'dev-loadbalancer-02!haproxy!mail-icingaadmin' for checkable 'dev-loadbalancer-02!haproxy' and user 'icingaadmin'.
[2019-08-21 11:27:02 +0200] debug/DbEvents: add checkable check history for 'dev-loadbalancer-02!haproxy'
[2019-08-21 11:27:02 +0200] warning/PluginNotificationTask: Notification command for object 'dev-loadbalancer-02!haproxy' (PID: 19874, arguments: '/etc/icinga2/scripts/mail-service-notification.sh' '-4' '192.168.2.101' '-6' '' '-b' '' '-c' '' '-d' '2019-08-21 11:27:01 +0200' '-e' 'haproxy' '-l' 'dev-loadbalancer-02' '-n' 'dev-loadbalancer-02' '-o' 'haproxy is down' '-r' 'monitoring@stashcat.com' '-s' 'CRITICAL' '-t' 'PROBLEM' '-u' 'haproxy' '-v' 'true') terminated with exit code 1, output: GPGME: CMS protocol not available

So I can't understand why everything works for LB02, while for LB01 only the haproxy service notification mail is missing.

Can you help?

Kind regards
Sebo

dnsmichi · August 21, 2019, 9:57am

Hi,

What does the notification object/apply rule look like for both of them? Most likely a filter prevents the notification from being sent, or it is using the wrong command.

Which icinga2 --version is involved here?

Cheers,
Michael

sebo · August 21, 2019, 10:25am

icinga2 --version
icinga2 - The Icinga 2 network monitoring daemon (version: r2.10.5-1)

Copyright (c) 2012-2019 Icinga GmbH (https://icinga.com/)
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

System information:
  Platform: Debian GNU/Linux
  Platform version: 9 (stretch)
  Kernel: Linux
  Kernel version: 4.9.0-9-amd64
  Architecture: x86_64

Build information:
  Compiler: GNU 6.3.0
  Build host: cb654124b660

/etc/icinga2/conf.d/notifications.conf

/**
 * The example notification apply rules.
 *
 * Only applied if host/service objects have
 * the custom attribute `notification` defined
 * and containing `mail` as key.
 *
 * Check `hosts.conf` for an example.
 */

apply Notification "mail-icingaadmin" to Host {
  import "mail-host-notification"

  user_groups = host.vars.notification.mail.groups

  assign where host.vars.notification.mail
}

apply Notification "mail-icingaadmin" to Service {
  import "mail-service-notification"

  user_groups = host.vars.notification.mail.groups

  assign where host.vars.notification.mail
}

/etc/icinga2/conf.d/templates.conf

template Notification "mail-host-notification" {
  command = "mail-host-notification"

  states = [ Up, Down ]
  types = [ Problem, Acknowledgement, Recovery, Custom,
            FlappingStart, FlappingEnd,
            DowntimeStart, DowntimeEnd, DowntimeRemoved ]

  interval = 5m
  times.begin = 0m

  period = "24x7"

  vars += {
    // notification_icingaweb2url = "https://x.x.x.x/icingaweb2"
    // notification_from = "Icinga 2 Host Monitoring <icinga@>"
    notification_logtosyslog = true
  }

}


template Notification "mail-service-notification" {
  command = "mail-service-notification"

  states = [ OK, Warning, Critical, Unknown ]
  types = [ Problem, Acknowledgement, Recovery, Custom,
            FlappingStart, FlappingEnd,
            DowntimeStart, DowntimeEnd, DowntimeRemoved ]

  interval = 5m
  times.begin = 0m
  period = "24x7"

  vars += {
    // notification_icingaweb2url = "https://x.x.x.x/icingaweb2"
    // notification_from = "Icinga 2 Service Monitoring <icinga@>"
    notification_logtosyslog = true
  }
}

/etc/icinga2/conf.d/users.conf

/**
 * The example user 'icingaadmin' and the example
 * group 'icingaadmins'.
 */
object User "icingaadmin" {
  import "generic-user"

  display_name = "Icinga 2 Admin"
  groups = [ "icingaadmins" ]
  email = "monitoring@xxx"
  states = [ OK, Warning, Critical, Unknown, Down ]
  types = [ Problem, Recovery ]
}

object UserGroup "icingaadmins" {
  display_name = "Icinga 2 Admin Group"
}

dnsmichi · August 21, 2019, 10:41am

Is an HA cluster with two masters involved here?

sebo · August 21, 2019, 10:48am

Yes, two masters, each with its own ido-mysql.

object Endpoint "icinga2-master01" {
  host = "x.x.x.x"
}

object Endpoint "icinga2-master02" {
  host = "x.x.x.x"
}

object Zone "master" {
  endpoints = [ "icinga2-master01", "icinga2-master02" ]
}

/etc/icinga2/features-enabled/ido-mysql.conf:  enable_ha = false

dnsmichi · August 21, 2019, 11:35am

My guess is that the notification object for the first load balancer is actually executed on the other master, and that is where the notification should be logged. You can check this by querying the REST API endpoint /v1/objects/notifications on both masters and looking for paused=false; the master reporting that value is the one that actually sends the notification.
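
A minimal sketch of such a query, assuming the API listens on the default port 5665 and the API user is allowed to query notification objects. The master host names, the credentials and the function names are placeholders, so adjust them to your environment:

```shell
#!/bin/bash
# Placeholders for illustration: adjust hosts and credentials.
MASTERS=("icinga2-master01" "icinga2-master02")
API_USER="icingaweb2"
API_PASS="xxxxx"

# URL that lists all notification objects, limited to the `paused` attribute.
notifications_url() {
        printf 'https://%s:5665/v1/objects/notifications?attrs=paused' "$1"
}

# Print the raw JSON answer of every master, prefixed by the master name.
show_paused() {
        local master
        for master in "${MASTERS[@]}"; do
                echo "== $master =="
                curl -k -s -u "$API_USER:$API_PASS" \
                        -H 'Accept: application/json' \
                        "$(notifications_url "$master")"
                echo
        done
}

# Run `show_paused` and compare the paused value reported for
# dev-loadbalancer-01!haproxy!mail-icingaadmin on each master.
```

In a healthy HA setup exactly one master should report paused as false for a given notification object.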

Cheers,
Michael

sebo · September 23, 2019, 1:23pm

Hello Michi,

I just rolled out my setup again. Now both masters say:

notice/Notification Component: Reminder notification HA cluster active, this endpoint does not have the authority (paused=true). Skipping.

So it points directly to your guess, but this time neither of the masters sends notifications.

dnsmichi · September 23, 2019, 5:38pm

Really odd. This happens after it has been running for a while, right? During startup there is a cold-start phase in which the object authority is not yet updated. We've improved this behaviour in 2.11; are you using that version already?

Cheers,
Michael

sebo · September 24, 2019, 6:56am

I think it happened after the upgrade from 2.10.5-1.stretch to 2.11.0-2.stretch. I had some dpkg trouble with ido-mysql. So yes, I'm now using 2.11.

Is there a way to tell the cluster to update the authority again?

sebo · September 24, 2019, 11:40am

After searching for the problem I found it: it was my custom mail-service-notification.sh and mail-host-notification.sh, which had a code change in urlencode() from:

urlencode() {
  local LANG=C i c e=''
  for ((i=0;i<${#1};i++)); do
    c=${1:$i:1}
    [[ "$c" =~ [a-zA-Z0-9\.\~\_\-] ]] || printf -v c '%%%02X' "'$c"
    e+="$c"
  done
  echo "$e"
}

to

urlencode() {
  local LANG=C i=0 c e s="$1"
  while [ $i -lt ${#1} ]; do
    [ "$i" -eq 0 ] || s="${s#?}"
    c=${s%"${s#?}"}
    [ -z "${c#[[:alnum:].~_-]}" ] || c=$(printf '%%%02X' "'$c")
    e="${e}${c}"
    i=$((i + 1))
  done
  echo "$e"
}

Now everything works as expected.
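
A quick way to sanity-check the replacement function outside Icinga is to paste it into a small script and feed it a few strings. This is only a sketch: the sample inputs are illustrative and do not come from the real notification scripts, and a successful run shows only that the function encodes correctly in the shell used to run it.

```shell
#!/bin/bash
# The second urlencode() from the post, unchanged.
urlencode() {
  local LANG=C i=0 c e s="$1"
  while [ $i -lt ${#1} ]; do
    [ "$i" -eq 0 ] || s="${s#?}"
    c=${s%"${s#?}"}
    [ -z "${c#[[:alnum:].~_-]}" ] || c=$(printf '%%%02X' "'$c")
    e="${e}${c}"
    i=$((i + 1))
  done
  echo "$e"
}

# Illustrative inputs: letters, digits and . ~ _ - pass through,
# everything else is percent-encoded.
urlencode "haproxy is down"               # prints haproxy%20is%20down
urlencode "dev-loadbalancer-01!haproxy"   # prints dev-loadbalancer-01%21haproxy
```

Running the notification script by hand as the icinga user, with the same arguments that appear in debug.log, is a similar check for the script as a whole.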

Thanks for your help, Michi, and sorry for reopening the solved ticket.

Have a great week.