Icinga2 HA - Notifications not being sent by active endpoint

I have an HA master-master pair, and the active endpoint is refusing to send some notifications. It sends my Slack and email notifications, but not the SMS ones. It generates the following error in the debug log:

[2021-07-12 14:51:32 -0400] notice/Notification: Notification 'ca04vlgraylogweb01!check_graylog_events_mercury!sms-escalation': HA cluster active, this endpoint does not have the authority (paused=true). Skipping.

This seems odd to me, since the host the error is generated on is the active endpoint.
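To see which notification objects are affected, the debug log can be tallied by notification name. The following is a minimal sketch that assumes the log lines have exactly the format quoted above; the function name `skipped_notifications` and the sample list are only illustrative.

```python
import re
from collections import Counter

# Matches the "no authority" notice quoted above and captures the
# full notification object name (host!service!notification).
SKIP_RE = re.compile(
    r"notice/Notification: Notification '(?P<name>[^']+)': "
    r"HA cluster active, this endpoint does not have the authority \(paused=true\)"
)


def skipped_notifications(lines):
    """Count skipped notifications, keyed by the last '!' segment of the name."""
    counts = Counter()
    for line in lines:
        match = SKIP_RE.search(line)
        if match:
            counts[match.group("name").split("!")[-1]] += 1
    return counts


# Sample input: the log line from this post.
sample = [
    "[2021-07-12 14:51:32 -0400] notice/Notification: Notification "
    "'ca04vlgraylogweb01!check_graylog_events_mercury!sms-escalation': "
    "HA cluster active, this endpoint does not have the authority "
    "(paused=true). Skipping.",
]
print(skipped_notifications(sample))
```

To run it against a real log, pass an open file object instead of `sample`. The path is an assumption; with the debuglog feature it is typically a file under the log directory /var/log/icinga2.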

Note: the notification feature is enabled only on the active endpoint (host1) and not on host2, because of a related issue: Icinga2 crashes when sending notifications · Issue #8186 · Icinga/icinga2 · GitHub

Can anyone help me figure out why the active endpoint (which is also the config master) won't send SMS notifications (i.e. the escalation), even though it is the active endpoint and is the host writing to IDO?

  • Version used
icinga2 - The Icinga 2 network monitoring daemon (version: r2.12.4-1)

Copyright (c) 2012-2021 Icinga GmbH (https://icinga.com/)
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

System information:
  Platform: Ubuntu
  Platform version: 20.04.2 LTS (Focal Fossa)
  Kernel: Linux
  Kernel version: 5.8.0-1038-aws
  Architecture: x86_64

Build information:
  Compiler: GNU 9.3.0
  Build host: runner-hh8q3bz2-project-298-concurrent-0
  OpenSSL version: OpenSSL 1.1.1f  31 Mar 2020

Application information:

General paths:
  Config directory: /etc/icinga2
  Data directory: /var/lib/icinga2
  Log directory: /var/log/icinga2
  Cache directory: /var/cache/icinga2
  Spool directory: /var/spool/icinga2
  Run directory: /run/icinga2

Old paths (deprecated):
  Installation root: /usr
  Sysconf directory: /etc
  Run directory (base): /run
  Local state directory: /var

Internal paths:
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid

  • Operating System and version
Ubuntu 20.04.2 LTS (Focal Fossa)
  • Enabled features (icinga2 feature list)
    host 2
Disabled features: command compatlog debuglog elasticsearch gelf graphite icingadb livestatus notification opentsdb perfdata statusdata syslog
Enabled features: api checker ido-mysql influxdb mainlog

host 1 (config master)

Disabled features: command compatlog debuglog elasticsearch gelf graphite icingadb livestatus opentsdb perfdata statusdata syslog
Enabled features: api checker ido-mysql influxdb mainlog notification
  • Icinga Web 2 version and modules (System - About)
Name	Version
setup	2.9.0
grafana	1.4.2
monitoring	2.9.0
Icinga Web 2 Version
2.9.0
Git commit
3b0a0a78df9389c46d3a74611a8eaec2f2b3cc77
PHP Version
7.4.21
Git commit date
2021-07-12
Copyright
© 2013-2021 Icinga GmbH
  • Config validation (icinga2 daemon -C)
[2021-07-14 12:09:51 -0400] information/cli: Icinga application loader (version: r2.12.4-1)
[2021-07-14 12:09:51 -0400] information/cli: Loading configuration file(s).
[2021-07-14 12:09:52 -0400] information/ConfigItem: Committing config item(s).
[2021-07-14 12:09:52 -0400] information/ApiListener: My API identity: host1.internal
[2021-07-14 12:09:54 -0400] information/ConfigItem: Instantiated 1 NotificationComponent.
[2021-07-14 12:09:54 -0400] information/ConfigItem: Instantiated 238 Hosts.
[2021-07-14 12:09:54 -0400] information/ConfigItem: Instantiated 10 Downtimes.
[2021-07-14 12:09:54 -0400] information/ConfigItem: Instantiated 5 NotificationCommands.
[2021-07-14 12:09:54 -0400] information/ConfigItem: Instantiated 1 FileLogger.
[2021-07-14 12:09:54 -0400] information/ConfigItem: Instantiated 102 Comments.
[2021-07-14 12:09:54 -0400] information/ConfigItem: Instantiated 13054 Notifications.
[2021-07-14 12:09:54 -0400] information/ConfigItem: Instantiated 1 IcingaApplication.
[2021-07-14 12:09:54 -0400] information/ConfigItem: Instantiated 22 HostGroups.
[2021-07-14 12:09:54 -0400] information/ConfigItem: Instantiated 1 CheckerComponent.
[2021-07-14 12:09:54 -0400] information/ConfigItem: Instantiated 235 Zones.
[2021-07-14 12:09:54 -0400] information/ConfigItem: Instantiated 235 Endpoints.
[2021-07-14 12:09:54 -0400] information/ConfigItem: Instantiated 1 IdoMysqlConnection.
[2021-07-14 12:09:54 -0400] information/ConfigItem: Instantiated 3 ApiUsers.
[2021-07-14 12:09:54 -0400] information/ConfigItem: Instantiated 1 ApiListener.
[2021-07-14 12:09:54 -0400] information/ConfigItem: Instantiated 116 CheckCommands.
[2021-07-14 12:09:54 -0400] information/ConfigItem: Instantiated 1 InfluxdbWriter.
[2021-07-14 12:09:54 -0400] information/ConfigItem: Instantiated 7 TimePeriods.
[2021-07-14 12:09:54 -0400] information/ConfigItem: Instantiated 9 UserGroups.
[2021-07-14 12:09:54 -0400] information/ConfigItem: Instantiated 11 Users.
[2021-07-14 12:09:54 -0400] information/ConfigItem: Instantiated 4019 Services.
[2021-07-14 12:09:54 -0400] information/ConfigItem: Instantiated 9 ServiceGroups.
[2021-07-14 12:09:54 -0400] information/ConfigItem: Instantiated 10 ScheduledDowntimes.
[2021-07-14 12:09:55 -0400] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2021-07-14 12:09:55 -0400] information/cli: Finished validating the configuration file(s).

  • If you run multiple Icinga 2 instances, the zones.conf file (or icinga2 object list --type Endpoint and icinga2 object list --type Zone) from all affected nodes
    host 1 (config master)
/* THIS FILE IS MANAGED BY PUPPET.*/
object Endpoint "host1.internal" {
}
object Endpoint "host2.internal" {

        host = "10.4.10.41"

}
object Zone "master" {
    endpoints = [ "host1.internal","host2.internal" ]
}
object Zone "global-templates" {
        global = true
}

host2

/* THIS FILE IS MANAGED BY PUPPET.*/
object Endpoint "host1.internal" {
}
object Endpoint "host2.internal" {

}
object Zone "master" {
    endpoints = [ "host1.internal","host2.internal" ]
}
object Zone "global-templates" {
        global = true
}

Hello @jason.agility!

Disabling the notification feature on one node requires putting enable_ha = false into the { } in /etc/icinga2/features-available/notification.conf on the other node.

Best,
AK

notification.conf in /etc/icinga2/features-enabled now contains the following (and I tried both a service reload and a restart):

/**
 * The notification component is responsible for sending notifications.
 */

object NotificationComponent "notification" {
    enable_ha = false
}

but I still get

[2021-07-14 13:48:05 -0400] notice/Notification: Notification 'hosta.internal!check_puppet_agent!sms-escalation': HA cluster active, this endpoint does not have the authority (paused=true). Skipping.
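One way to check what the endpoint itself thinks is to read the `paused` runtime attribute of the notification objects over the Icinga 2 REST API. The sketch below is untested against this setup and rests on several assumptions: the API listens on the default port 5665, the ApiUser (credentials here are placeholders) is allowed to query Notification objects, and certificate verification is switched off only for a quick local check.

```python
import base64
import json
import ssl
import urllib.parse
import urllib.request


def build_url(base, attrs):
    """Build an object query URL for notifications, limited to the given attributes."""
    query = urllib.parse.urlencode([("attrs", attr) for attr in attrs])
    return f"{base}/v1/objects/notifications?{query}"


def fetch_paused(base, user, password):
    """Return {notification_name: paused} as reported by the queried endpoint."""
    request = urllib.request.Request(
        build_url(base, ["paused"]),
        headers={"Accept": "application/json"},
    )
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")

    # Assumption: self-signed API certificate, so verification is disabled here.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    with urllib.request.urlopen(request, context=context) as response:
        results = json.load(response)["results"]
    return {item["name"]: item["attrs"]["paused"] for item in results}


# Hypothetical usage (placeholder credentials):
# paused = fetch_paused("https://localhost:5665", "apiuser", "secret")
# print({name: p for name, p in paused.items() if name.endswith("!sms-escalation")})
```

If the SMS notification objects report `paused` as true on host1, this endpoint does not consider itself responsible for them, which would match the log line above.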

The documentation suggests that the feature “notification” must be enabled on all nodes.
https://icinga.com/docs/icinga-2/latest/doc/06-distributed-monitoring/#high-availability-with-notifications

I would enable it on the second node, except that enabling it causes the icinga2 service on the second node to crash on startup. That is why I opened this separate topic: to see whether notifications can work properly with the feature disabled on the second node until the bug can be fixed or debugged further.

@Al2Klimov, can you confirm that the notification feature can be disabled on one node in an HA pair (along with enable_ha = false on the one node where the feature is enabled)?

I tried enabling the notification feature on both hosts, with enable_ha = false set on both. The second node crashed on startup. So it seems I have no way of enabling it on node2, and enable_ha = false doesn’t seem to work on node1.
I must be doing something wrong, but it isn’t clear what.

Colleagues, do you consider this behaviour surprising, too?

@jbrost @nhilverling @htriem

Well, the Object Types page of the Icinga 2 documentation says this about enable_ha:

Disabling this currently only affects reminder notifications. Defaults to “true”.

Sounds like this is some limitation at the moment?
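To make the consequence concrete, here is a toy model of the behaviour as described in this thread. It is only a reading of the docs sentence and the log message, not Icinga's actual code; the function name `would_send` and its parameters are made up for illustration.

```python
def would_send(paused, enable_ha, reminder):
    """Toy model: does this endpoint send the notification?

    paused    -- the endpoint lacks authority for this notification object
    enable_ha -- value of enable_ha in the NotificationComponent
    reminder  -- True for reminder notifications, False for all others
    """
    if not paused:
        # The endpoint has authority for this object, so it sends.
        return True
    # Paused: per the docs quote, enable_ha = false only lifts the skip
    # for reminder notifications.
    return reminder and not enable_ha


# A non-reminder notification on a paused object is skipped either way.
print(would_send(paused=True, enable_ha=False, reminder=False))
print(would_send(paused=True, enable_ha=True, reminder=False))
# Only reminders get through with enable_ha = false.
print(would_send(paused=True, enable_ha=False, reminder=True))
```

Under this model, setting enable_ha = false does not change the outcome for a non-reminder notification on a paused object, which matches what was observed above.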

In case it is relevant, I am testing the notification the following way:


Which, if I understand the tooltip correctly, should force the notification regardless of rules (time, etc.).

Shall we change that?

So it sounds like adding enable_ha = false won’t provide me with a temporary workaround, at least not for all notifications. That means I need to fix the root issue with the icinga2 service crashing on the secondary node (GitHub issue #8186, linked in the original post).

Well, I think the current behavior doesn’t look ideal. The fact that someone felt the need to state this explicitly in the docs also supports that. So yes, changing this sounds desirable, but there is probably a reason why it was implemented this way (for example, doing it another way might require some redesign of the component; I don’t know).