Hi, fellow Icinga users. I have an issue where my Icinga doesn’t send notifications reliably. After roughly 48 hours, Icinga stops sending notifications.
As the SQL query below shows, the last notification was sent on 2022-12-09. The interval is set to 1d, so we have clearly missed a few notifications here.
mysql> select hoststatus_id,last_notification,next_notification from icinga_hoststatus WHERE hoststatus_id = 95;
+---------------+---------------------+---------------------+
| hoststatus_id | last_notification   | next_notification   |
+---------------+---------------------+---------------------+
|            95 | 2022-12-09 16:36:48 | 2022-12-13 16:36:54 |
+---------------+---------------------+---------------------+
1 row in set (0.00 sec)
A service restart fixes the issue for a few days, and some of the missed notifications are sent afterwards.
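To make the gap concrete, here is a small Python sketch (not part of Icinga) that lists the reminders I would have expected for hoststatus_id 95, given the last_notification value above and the 1d interval. The function name `expected_reminders` is made up for illustration, and the `now` value is an assumption: I only know the query ran before the stored next_notification of 2022-12-13 16:36:54.

```python
from datetime import datetime, timedelta

def expected_reminders(last_notification, interval, now):
    """Return the reminder times that fall after last_notification and up to now."""
    due = []
    t = last_notification + interval
    while t <= now:
        due.append(t)
        t += interval
    return due

# Value taken from the icinga_hoststatus row above.
last = datetime(2022, 12, 9, 16, 36, 48)
# Assumption: the query was run at some point on 2022-12-13 before 16:36.
now = datetime(2022, 12, 13, 0, 0, 0)

missed = expected_reminders(last, timedelta(days=1), now)
for t in missed:
    print(t)
```

Under that assumption the sketch lists reminders for 2022-12-10, 2022-12-11 and 2022-12-12, none of which show up as last_notification in the database.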
######################configs#################
/* NOTIFICATION TEMPLATES */
template Notification "generic-notification-template" {
  interval = 1d
  states = [ Warning, Critical, Unknown ]
  types = [ Problem, Acknowledgement, Recovery, Custom, FlappingStart,
            FlappingEnd, DowntimeStart, DowntimeEnd, DowntimeRemoved ]
  period = "24x7"
  interval = 24h
}

template Notification "host-notification-template" {
  import "generic-notification-template"

  states = [ Up, Down ]
  types = [ Problem, Recovery, Custom ]
}

template Notification "service-notification-template" {
  import "generic-notification-template"

  states = [ OK, Warning, Critical ]
  types = [ Problem, Recovery ]
}
#########################################
apply Notification "sms-icingaadmin" to Service {
  import "service-notification-template"
  command = "sms-service-notification"
  user_groups = service.vars.notification.sms.groups
  assign where service.vars.notification.sms && service.vars.non_default_notification != true && typeof(service.vars.notification.sms) == Dictionary
}

apply Notification "sms-icingaadmin" to Host {
  import "host-notification-template"
  command = "sms-host-notification"
  user_groups = host.vars.notification.sms.groups
  assign where host.vars.notification.sms && typeof(host.vars.notification.sms) == Dictionary
}
########################################
/* SMS NOTIFICATION FOR HOST */
object NotificationCommand "sms-host-notification" {
  import "plugin-notification-command"

  command = [ SysconfDir + "/icinga2/scripts/sms-notification.sh" ]
  arguments = {
    "-P" = {
      value = "$user.pager$"
      skip_key = true
      order = 0
    }
    "-M" = {
      value = "$host.display_name$ is $host.state$"
      skip_key = true
      order = 1
    }
  }
}
/* SMS NOTIFICATION FOR SERVICE */
object NotificationCommand "sms-service-notification" {
  import "plugin-notification-command"

  command = [ SysconfDir + "/icinga2/scripts/sms-notification.sh" ]
  arguments = {
    "-P" = {
      value = "$user.pager$"
      skip_key = true
      order = 0
    }
    "-M" = {
      value = "$host.display_name$ \ $service.name$ is $service.state$"
      skip_key = true
      order = 1
    }
  }
}
This is what the database looks like around the time when Icinga should send a notification.
mysql> select host_id,check_source,last_notification,next_notification from icinga_hoststatus WHERE host_id = 14;
ERROR 1054 (42S22): Unknown column 'host_id' in 'field list'
mysql> select hoststatus_id,last_notification,next_notification from icinga_hoststatus WHERE hoststatus_id = 14;
+---------------+---------------------+---------------------+
| hoststatus_id | last_notification   | next_notification   |
+---------------+---------------------+---------------------+
|            14 | 2022-12-09 14:43:33 | 2022-12-13 14:43:44 |
+---------------+---------------------+---------------------+
1 row in set (0.00 sec)

mysql> select now();
+---------------------+
| now()               |
+---------------------+
| 2022-12-13 14:48:18 |
+---------------------+
1 row in set (0.00 sec)

mysql> select hoststatus_id,last_notification,next_notification from icinga_hoststatus WHERE hoststatus_id = 14;select now();
+---------------+---------------------+---------------------+
| hoststatus_id | last_notification   | next_notification   |
+---------------+---------------------+---------------------+
|            14 | 2022-12-09 14:43:33 | 2022-12-14 14:43:45 |
+---------------+---------------------+---------------------+
1 row in set (0.00 sec)

+---------------------+
| now()               |
+---------------------+
| 2022-12-13 14:48:29 |
+---------------------+
1 row in set (0.00 sec)
My guess, since I managed to check the DB right when it was supposed to send a notification, is that something runs for about 5 minutes, then fails and updates the next notification time.
So is some specific process doing that? A colleague skewed the clock (+1 year and then back to the present day), which might have messed something up on the system.
I’m restarting the service and enabling the debug log so that I can observe the issue further.
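For reference, the "about 5 minutes" estimate comes from the timestamps in the queries above. A minimal Python sketch of the arithmetic, using only values copied from the MySQL output (the variable names are my own):

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# next_notification as stored before it was moved forward.
scheduled = datetime.strptime("2022-12-13 14:43:44", FMT)
# now() when next_notification still showed the old value.
still_old = datetime.strptime("2022-12-13 14:48:18", FMT)
# now() when next_notification had already moved to 2022-12-14.
already_new = datetime.strptime("2022-12-13 14:48:29", FMT)

# The update happened somewhere between these two delays after the scheduled time.
lower = still_old - scheduled
upper = already_new - scheduled
print(lower, upper)
```

So next_notification was moved to the following day between 4 min 34 s and 4 min 45 s after the scheduled time, while last_notification stayed at 2022-12-09.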
I investigated the issue further. As mentioned, the problem disappears for a few days after a restart. Below are some snippets of the debug log. On 14 December the notifications worked just fine, and after that, once again, nothing.
[2022-12-14 16:37:03 +0200] notice/ApiListener: Relaying 'event::NotificationSentUser' message
[2022-12-14 16:37:03 +0200] notice/Process: Running command '/etc/icinga2/scripts/sms-notification.sh' '+123123123123' 'HOST_NOT_RESPONDING is DOWN': PID 2457875
[2022-12-14 16:37:03 +0200] information/Notification: Completed sending 'Problem' notification 'HOST_NOT_RESPONDING!sms-icingaadmin' for checkable 'HOST_NOT_RESPONDING' and user 'jacky' using command 'sms-host-notification'.
[2022-12-14 16:37:03 +0200] notice/ApiListener: Relaying 'event::NotificationSentUser' message

[2022-12-15 16:37:06 +0200] notice/NotificationComponent: Attempting to send reminder notification 'HOST_NOT_RESPONDING!sms-icingaadmin'.
[2022-12-15 16:37:06 +0200] notice/Notification: Attempting to send reminder notifications of type 'Problem' for notification object 'HOST_NOT_RESPONDING!sms-icingaadmin'.
[2022-12-15 16:37:06 +0200] notice/Notification: Not sending reminder notifications for notification object 'HOST_NOT_RESPONDING!sms-icingaadmin': not in timeperiod '24x7'
[2022-12-15 16:37:06 +0200] notice/NotificationComponent: Attempting to send reminder notification 'HOST_NOT_RESPONDING!sms-json-host'.
Now I believe I’ve spotted the issue. The “debug/TimePeriod: Adding segment” message appears on service restart, and the added segments cover today and tomorrow.
Checking the previous days’ debug logs, I noticed that this does not happen again after the restart.
If nothing adds new segments to the timeperiod 24x7, it makes sense that the notifications stop coming and that the logged reason is “not in timeperiod ‘24x7’”.
/var/log/icinga2/debug.log:[2022-12-16 15:23:33 +0200] debug/LegacyTimePeriod: Legacy timeperiod update returned 2 segments.
/var/log/icinga2/debug.log:[2022-12-16 15:23:33 +0200] debug/TimePeriod: Removing segment 'Fri Dec 16 15:23:33 2022' <-> 'Sat Dec 17 15:23:33 2022' from TimePeriod '24x7'
/var/log/icinga2/debug.log:[2022-12-16 15:23:33 +0200] debug/TimePeriod: Adding segment 'Fri Dec 16 00:00:00 2022' <-> 'Sat Dec 17 00:00:00 2022' to TimePeriod '24x7'
/var/log/icinga2/debug.log:[2022-12-16 15:23:33 +0200] debug/TimePeriod: Adding segment 'Sat Dec 17 00:00:00 2022' <-> 'Sun Dec 18 00:00:00 2022' to TimePeriod '24x7'
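To illustrate why missing segments would produce the “not in timeperiod” message, here is a simplified Python model of my theory. It is not Icinga's actual implementation: the helper `is_inside` is hypothetical, and treating each segment as a half-open interval is my assumption. The two segments are the ones from the “Adding segment” lines above; the dates of the two checks are picked for illustration, with the time of day taken from the reminder attempts in the debug log.

```python
from datetime import datetime

# Segments copied from the "Adding segment" debug log lines for TimePeriod '24x7'.
segments = [
    (datetime(2022, 12, 16, 0, 0, 0), datetime(2022, 12, 17, 0, 0, 0)),
    (datetime(2022, 12, 17, 0, 0, 0), datetime(2022, 12, 18, 0, 0, 0)),
]

def is_inside(segments, when):
    """Hypothetical helper: True if `when` falls inside any (begin, end) segment."""
    return any(begin <= when < end for begin, end in segments)

# A reminder on the day of the restart is covered by a segment.
print(is_inside(segments, datetime(2022, 12, 16, 16, 37, 6)))
# Two days later, with no new segments added, nothing covers the reminder.
print(is_inside(segments, datetime(2022, 12, 18, 16, 37, 6)))
```

In this model the first check succeeds and the second fails, which would match notifications working for about two days after a restart and then stopping.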
I will check the debug logs again on Monday or Tuesday and post whether I can find those “debug/TimePeriod: Adding segment” messages from the weekend. Since I didn’t find them in the logs from the last few days, I don’t expect to find them in the weekend logs either.
If someone has seen this kind of behaviour before, help would be much appreciated. I would like to know whether the clock skewing messed something up (other installations, where no clock skewing was done, have been just fine) or whether this is a bug.
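For that check, a small Python sketch along these lines should do. It counts the “Adding segment” messages per day. The function name `segments_added_per_day` is made up, the regular expression assumes the timestamp format seen in my debug log, and the sample lines are copied from the excerpts in this post.

```python
import re
from collections import Counter

# Matches "[YYYY-MM-DD ...] debug/TimePeriod: Adding segment" and captures the date.
PATTERN = re.compile(
    r"\[(\d{4}-\d{2}-\d{2}) [^\]]*\] debug/TimePeriod: Adding segment"
)

def segments_added_per_day(lines):
    """Count 'Adding segment' debug messages per calendar day."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

sample = [
    "[2022-12-15 16:37:06 +0200] notice/Notification: Not sending reminder notifications for notification object 'HOST_NOT_RESPONDING!sms-icingaadmin': not in timeperiod '24x7'",
    "[2022-12-16 15:23:33 +0200] debug/TimePeriod: Adding segment 'Fri Dec 16 00:00:00 2022' <-> 'Sat Dec 17 00:00:00 2022' to TimePeriod '24x7'",
    "[2022-12-16 15:23:33 +0200] debug/TimePeriod: Adding segment 'Sat Dec 17 00:00:00 2022' <-> 'Sun Dec 18 00:00:00 2022' to TimePeriod '24x7'",
]

counts = segments_added_per_day(sample)
print(dict(counts))
```

Against the real log, the same function can be fed an open file handle for /var/log/icinga2/debug.log instead of the sample list. Days that are missing from the result would be days on which no segments were added.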
Icinga 2 version used: 2.13.2-1
OS: AlmaLinux 8.6