I have an Icinga 2 HA setup with two masters.
On one of my masters (node2), the notification feature is currently disabled. I discovered that this was preventing the other master (node1) from sending SMS notifications; node1 logged the following instead:
[2021-07-12 14:51:32 -0400] notice/Notification: Notification 'ca04vlgraylogweb01!check_events!sms-escalation': HA cluster active, this endpoint does not have the authority (paused=true). Skipping.
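To see which notifications are affected, the skip message can be pulled out of a log file with a small script. This is only a sketch: the regular expression is inferred from the single sample line above, and the function names `parse_skip_line` and `skipped_notifications` are made up for illustration.

```python
import re
from datetime import datetime

# Regex for the skip message quoted above. The layout is inferred from that
# one sample line, so treat it as an assumption rather than a stable format.
SKIP_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] notice/Notification: "
    r"Notification '(?P<name>[^']+)': HA cluster active, "
    r"this endpoint does not have the authority "
    r"\(paused=(?P<paused>true|false)\)\. Skipping\.$"
)


def parse_skip_line(line):
    """Return timestamp, notification name and paused flag, or None."""
    match = SKIP_RE.match(line.strip())
    if match is None:
        return None
    return {
        "time": datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S %z"),
        "name": match.group("name"),
        "paused": match.group("paused") == "true",
    }


def skipped_notifications(lines):
    """Collect the distinct notification names skipped while paused."""
    names = set()
    for line in lines:
        parsed = parse_skip_line(line)
        if parsed is not None and parsed["paused"]:
            names.add(parsed["name"])
    return sorted(names)
```

Passing the lines of a log file that contains notice-level messages to `skipped_notifications` would give the list of notification objects that this endpoint declined to send.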
In my attempt to fix this, after much debugging (including temporarily enabling debuglog), I enabled the notification feature on node2. At this point the icinga2 service started crashing repeatedly.
It seemed to crash just as it finished syncing with node1 and started attempting to send notifications.
Does anyone have any insight into why the crash is occurring?
Crash log below:
icinga2: /usr/include/boost/smart_ptr/intrusive_ptr.hpp:199: T* boost::intrusive_ptr<T>::operator->() const [with T = icinga::Dictionary]: Assertion `px != 0' failed.
Caught SIGABRT.
Current time: 2021-07-12 15:07:49 -0400
icinga2: /usr/include/boost/smart_ptr/intrusive_ptr.hpp:199: T* boost::intrusive_ptr<T>::operator->() const [with T = icinga::Dictionary]: Assertion `px != 0' failed.
Caught SIGABRT.
Current time: 2021-07-12 15:12:15 -0400
icinga2: /usr/include/boost/smart_ptr/intrusive_ptr.hpp:199: T* boost::intrusive_ptr<T>::operator->() const [with T = icinga::Dictionary]: Assertion `px != 0' failed.
Caught SIGABRT.
Current time: 2021-07-12 15:19:37 -0400
icinga2: /usr/include/boost/smart_ptr/intrusive_ptr.hpp:199: T* boost::intrusive_ptr<T>::operator->() const [with T = icinga::Dictionary]: Assertion `px != 0' failed.
Caught SIGABRT.
Current time: 2021-07-12 15:25:09 -0400
icinga2: /usr/include/boost/smart_ptr/intrusive_ptr.hpp:199: T* boost::intrusive_ptr<T>::operator->() const [with T = icinga::Dictionary]: Assertion `px != 0' failed.
Caught SIGABRT.
Current time: 2021-07-12 15:29:31 -0400
icinga2: /usr/include/boost/smart_ptr/intrusive_ptr.hpp:199: T* boost::intrusive_ptr<T>::operator->() const [with T = icinga::Dictionary]: Assertion `px != 0' failed.
Caught SIGABRT.
Current time: 2021-07-12 15:34:18 -0400
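The gaps between the crashes in the log above can be computed from the "Current time" lines. A minimal sketch, assuming the timestamp format shown in the crash log; `crash_intervals` is a hypothetical helper name:

```python
from datetime import datetime

# "Current time" values copied from the crash log above.
CRASH_TIMES = [
    "2021-07-12 15:07:49 -0400",
    "2021-07-12 15:12:15 -0400",
    "2021-07-12 15:19:37 -0400",
    "2021-07-12 15:25:09 -0400",
    "2021-07-12 15:29:31 -0400",
    "2021-07-12 15:34:18 -0400",
]


def crash_intervals(stamps):
    """Return the seconds elapsed between consecutive crash timestamps."""
    times = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S %z") for s in stamps]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    for seconds in crash_intervals(CRASH_TIMES):
        minutes, rest = divmod(int(seconds), 60)
        print(f"{minutes}m {rest}s")
```

For these six crashes the intervals come out between roughly four and a half and seven and a half minutes, so the service runs for several minutes after each restart before it hits the assertion again.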
node2:
Version used: r2.12.4-1
Operating System and version: Ubuntu 20.04.2 LTS (Focal Fossa)
Enabled features: api checker ido-mysql influxdb mainlog
Icinga Web 2 version and modules (System - About):
|Module|Version|
|---|---|
|setup|2.9.0|
|grafana|1.4.2|
|monitoring|2.9.0|
Config validation: valid
If you run multiple Icinga 2 instances, the `zones.conf` file (or `icinga2 object list --type Endpoint` and `icinga2 object list --type Zone`) from all affected nodes:
node1:
/* THIS FILE IS MANAGED BY PUPPET.*/
object Endpoint "ca02vlnms01.agilitypr.internal" {
}
object Endpoint "ca04vlnms01.agilitypr.internal" {
    host = "10.4.10.41"
}
object Zone "master" {
    endpoints = [ "ca02vlnms01.agilitypr.internal", "ca04vlnms01.agilitypr.internal" ]
}
object Zone "global-templates" {
    global = true
}
node2:
object Endpoint "ca02vlnms01.agilitypr.internal" {
}
object Endpoint "ca04vlnms01.agilitypr.internal" {
}
object Zone "master" {
    endpoints = [ "ca02vlnms01.agilitypr.internal", "ca04vlnms01.agilitypr.internal" ]
}
object Zone "global-templates" {
    global = true
}
I have added additional details in a related ticket I found:
opened 06:24AM - 21 Aug 20 UTC
needs-feedback
## Describe the bug
Yesterday, after a config reload, our icinga2 master kept crashing with the same stack trace.
At first it crashed after 20-30 Seconds, after a while directly after validating the configuration. This went on for an hour.
```
(0) libc.so.6: gsignal (+0xcf) [0x7f0597314fff]
(1) libc.so.6: abort (+0x16a) [0x7f059731642a]
(2) libc.so.6: <unknown function> (+0x2be67) [0x7f059730de67]
(3) libc.so.6: <unknown function> (+0x2bf12) [0x7f059730df12]
(4) icinga2: icinga::NotificationComponent::NotificationTimerHandler() (+0x159b) [0x561d3dec965b]
(5) icinga2: <unknown function> (+0x338271) [0x561d3db5b271]
(6) icinga2: icinga::Timer::Call() (+0x2d) [0x561d3daf94bd]
(7) icinga2: <unknown function> (+0x341e0d) [0x561d3db64e0d]
(8) icinga2: boost::asio::detail::executor_op<boost::asio::detail::work_dispatcher<bool icinga::ThreadPool::Post<std::function<void ()> >(std::function<void ()>, icinga::SchedulerPolicy)::{lambda()#1}>, std::allocator<void>, boost::asio::detail::scheduler_operation>::do_complete(void*, std::allocator<void>*, boost::system::error_code const&, unsigned long) (+0xe3) [0x561d3e0018b3]
(9) icinga2: <unknown function> (+0x27d5f6) [0x561d3daa05f6]
(10) icinga2: <unknown function> (+0x7dbae5) [0x561d3dffeae5]
(11) icinga2: boost_asio_detail_posix_thread_function (+0xf) [0x561d3da93aff]
(12) libpthread.so.0: <unknown function> (+0x74a4) [0x7f0598a8d4a4]
(13) libc.so.6: clone (+0x3f) [0x7f05973cad0f]
```
After debugging a little, we figured that it is related to notifications. We disabled all notifications and icinga2 was able to start again without crashing. Then we were able to reactivate all notifications, as it seems that there was a buggy notification queued that caused the crash. Our theory is, that disabling all notifications cleared the buggy icinga2 state.
## To Reproduce
We have no idea what caused this bug, except the fact that it seems to be related to notifications.
## Expected behavior
No crashes.
## Your Environment
Application version: r2.12.0-1
System information:
Platform: Debian GNU/Linux
Platform version: 9 (stretch)
Kernel: Linux
Kernel version: 4.9.0-13-amd64
Architecture: x86_64
Disabled features: compatlog debuglog elasticsearch gelf graphite icingadb livestatus opentsdb perfdata statusdata
Enabled features: api checker command ido-mysql influxdb mainlog notification syslog
Icinga Web 2: 2.8.2
Config validation:
```
[2020-08-21 08:18:46 +0200] information/cli: Icinga application loader (version: r2.12.0-1)
[2020-08-21 08:18:46 +0200] information/cli: Loading configuration file(s).
[2020-08-21 08:18:46 +0200] information/ConfigItem: Committing config item(s).
[2020-08-21 08:18:46 +0200] information/ApiListener: My API identity: monty1.uni-paderborn.de
[2020-08-21 08:18:53 +0200] warning/ApplyRule: Apply rule 'notify_host_imt_srv_mail' (in /etc/icinga2/zones.d/master/notifications.conf: 369:1-369:53) for type 'Notification' does not match anywhere!
[2020-08-21 08:18:53 +0200] warning/ApplyRule: Apply rule 'notify_service_imt_srv_mail' (in /etc/icinga2/zones.d/master/notifications.conf: 378:1-378:59) for type 'Notification' does not match anywhere!
[2020-08-21 08:18:53 +0200] warning/ApplyRule: Apply rule 'mysql_backup_check_exists' (in /etc/icinga2/zones.d/master/services/mysql_cluster.conf: 19:1-19:41) for type 'Service' does not match anywhere!
[2020-08-21 08:18:53 +0200] warning/ApplyRule: Apply rule 'check_http_content' (in /etc/icinga2/zones.d/master/services/solr.conf: 1:0-1:33) for type 'Service' does not match anywhere!
[2020-08-21 08:18:53 +0200] warning/ApplyRule: Apply rule 'check_ipmi' (in /etc/icinga2/zones.d/master/services/sun.conf: 1:0-1:25) for type 'Service' does not match anywhere!
[2020-08-21 08:18:53 +0200] warning/ApplyRule: Apply rule 'check_windows-failover_sp' (in /etc/icinga2/zones.d/master/services/windows.conf: 13:1-13:41) for type 'Service' does not match anywhere!
[2020-08-21 08:18:53 +0200] information/ConfigItem: Instantiated 1 NotificationComponent.
[2020-08-21 08:18:53 +0200] information/ConfigItem: Instantiated 1 SyslogLogger.
[2020-08-21 08:18:53 +0200] information/ConfigItem: Instantiated 904 Hosts.
[2020-08-21 08:18:53 +0200] information/ConfigItem: Instantiated 455 Downtimes.
[2020-08-21 08:18:53 +0200] information/ConfigItem: Instantiated 7 NotificationCommands.
[2020-08-21 08:18:53 +0200] information/ConfigItem: Instantiated 1 FileLogger.
[2020-08-21 08:18:53 +0200] information/ConfigItem: Instantiated 253 Comments.
[2020-08-21 08:18:53 +0200] information/ConfigItem: Instantiated 23810 Notifications.
[2020-08-21 08:18:53 +0200] information/ConfigItem: Instantiated 1 IcingaApplication.
[2020-08-21 08:18:53 +0200] information/ConfigItem: Instantiated 140 HostGroups.
[2020-08-21 08:18:53 +0200] information/ConfigItem: Instantiated 1 EventCommand.
[2020-08-21 08:18:53 +0200] information/ConfigItem: Instantiated 6774 Dependencies.
[2020-08-21 08:18:53 +0200] information/ConfigItem: Instantiated 1 CheckerComponent.
[2020-08-21 08:18:53 +0200] information/ConfigItem: Instantiated 463 Zones.
[2020-08-21 08:18:53 +0200] information/ConfigItem: Instantiated 463 Endpoints.
[2020-08-21 08:18:53 +0200] information/ConfigItem: Instantiated 1 ExternalCommandListener.
[2020-08-21 08:18:53 +0200] information/ConfigItem: Instantiated 1 IdoMysqlConnection.
[2020-08-21 08:18:53 +0200] information/ConfigItem: Instantiated 498 ApiUsers.
[2020-08-21 08:18:53 +0200] information/ConfigItem: Instantiated 1 ApiListener.
[2020-08-21 08:18:53 +0200] information/ConfigItem: Instantiated 176 CheckCommands.
[2020-08-21 08:18:53 +0200] information/ConfigItem: Instantiated 1 InfluxdbWriter.
[2020-08-21 08:18:53 +0200] information/ConfigItem: Instantiated 7 TimePeriods.
[2020-08-21 08:18:53 +0200] information/ConfigItem: Instantiated 10 UserGroups.
[2020-08-21 08:18:53 +0200] information/ConfigItem: Instantiated 31 Users.
[2020-08-21 08:18:53 +0200] information/ConfigItem: Instantiated 10515 Services.
[2020-08-21 08:18:53 +0200] information/ConfigItem: Instantiated 42 ServiceGroups.
[2020-08-21 08:18:53 +0200] information/ConfigItem: Instantiated 181 ScheduledDowntimes.
[2020-08-21 08:18:53 +0200] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2020-08-21 08:18:53 +0200] information/cli: Finished validating the configuration file(s).
```
## Additional context
We are running a single master node without HA.
Does anyone know how to clear notifications, as the other user indicated they did in the ticket?
This would at least let me enable notifications without the service crashing.
Both masters will stay up and running in the following state, but as soon as I enable notifications the secondary (not config master) icinga2 service crashes. No core dump, just a crash log.