Event is only run once, even if it does not have the expected effect

JavierVilarroig · December 27, 2023, 1:37pm

Hi.

I have created an event to be run in case a service goes to ERROR status.

This event usually solves the issue, and the service goes back to OK status.

In some corner cases, the event does not fix the issue and the service does not move back to OK status.

Running the event's command a second time fixes the issue. But Icinga2 only runs the event once per transition to ERROR. That forces a manual intervention by the sysadmin (me, in this case) to fix the issue.

Is there a way to have the event run more than once, as the notifications can be sent more than once?

Thanks!!!

Version used (icinga2 --version)

root:~# icinga2 --version
icinga2 - The Icinga 2 network monitoring daemon (version: r2.12.3-1)

Copyright (c) 2012-2023 Icinga GmbH (https://icinga.com/)
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

System information:
  Platform: Debian GNU/Linux
  Platform version: 11 (bullseye)
  Kernel: Linux
  Kernel version: 6.5.11-7-pve
  Architecture: x86_64

Build information:
  Compiler: GNU 10.2.1
  Build host: x86-ubc-01
  OpenSSL version: OpenSSL 1.1.1n  15 Mar 2022

Application information:

General paths:
  Config directory: /etc/icinga2
  Data directory: /var/lib/icinga2
  Log directory: /var/log/icinga2
  Cache directory: /var/cache/icinga2
  Spool directory: /var/spool/icinga2
  Run directory: /run/icinga2

Old paths (deprecated):
  Installation root: /usr
  Sysconf directory: /etc
  Run directory (base): /run
  Local state directory: /var

Internal paths:
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid

Operating System and version

Debian 11

Enabled features (icinga2 feature list)

root:/usr/lib/nagios/plugins# icinga2 feature list
Disabled features: command compatlog debuglog elasticsearch gelf graphite icingadb influxdb livestatus opentsdb perfdata statusdata syslog
Enabled features: api checker mainlog notification

Icinga Web 2 version and modules (System - About)

Icinga Web 2 Version
    2.8.2
PHP Version
    7.4.33
Copyright
    © 2013-2023 The Icinga Project 

Loaded modules
Name 	Version
director 	1.9.0
incubator 	0.12.1
ipl 	v0.5.0
monitoring 	2.8.2
reactbundle 	0.9.0

Config validation (icinga2 daemon -C)

root@vm61:~# icinga2 daemon -C
[2023-12-27 14:35:59 +0100] information/cli: Icinga application loader (version: r2.12.3-1)
[2023-12-27 14:35:59 +0100] information/cli: Loading configuration file(s).
[2023-12-27 14:35:59 +0100] information/ConfigItem: Committing config item(s).
[2023-12-27 14:35:59 +0100] information/ApiListener: My API identity: vm61.unidata.msf.org
[2023-12-27 14:35:59 +0100] warning/ApplyRule: Apply rule 'pve_node_is_dead_for_vm' (in /var/lib/icinga2/api/packages/director/0aab7bfb-c87a-4035-8afc-1ef5e311b663/zones.d/director-global/dependency_apply.conf: 1:0-1:49) for type 'Dependency' does not match anywhere!
[2023-12-27 14:35:59 +0100] warning/ApplyRule: Apply rule 'Nigthly DB load from Live' (in /var/lib/icinga2/api/packages/director/0aab7bfb-c87a-4035-8afc-1ef5e311b663/zones.d/master/scheduled_downtime_apply.conf: 1:0-1:61) for type 'ScheduledDowntime' does not match anywhere!
[2023-12-27 14:35:59 +0100] information/ConfigItem: Instantiated 1 NotificationComponent.
[2023-12-27 14:35:59 +0100] information/ConfigItem: Instantiated 1 IdoPgsqlConnection.
[2023-12-27 14:35:59 +0100] information/ConfigItem: Instantiated 1 CheckerComponent.
[2023-12-27 14:35:59 +0100] information/ConfigItem: Instantiated 4 Users.
[2023-12-27 14:35:59 +0100] information/ConfigItem: Instantiated 3 UserGroups.
[2023-12-27 14:35:59 +0100] information/ConfigItem: Instantiated 2 TimePeriods.
[2023-12-27 14:35:59 +0100] information/ConfigItem: Instantiated 9 Zones.
[2023-12-27 14:35:59 +0100] information/ConfigItem: Instantiated 218 Services.
[2023-12-27 14:35:59 +0100] information/ConfigItem: Instantiated 1 IcingaApplication.
[2023-12-27 14:35:59 +0100] information/ConfigItem: Instantiated 35 Hosts.
[2023-12-27 14:35:59 +0100] information/ConfigItem: Instantiated 2 EventCommands.
[2023-12-27 14:35:59 +0100] information/ConfigItem: Instantiated 2 NotificationCommands.
[2023-12-27 14:35:59 +0100] information/ConfigItem: Instantiated 459 Notifications.
[2023-12-27 14:35:59 +0100] information/ConfigItem: Instantiated 7 Endpoints.
[2023-12-27 14:35:59 +0100] information/ConfigItem: Instantiated 54 Downtimes.
[2023-12-27 14:35:59 +0100] information/ConfigItem: Instantiated 28 Dependencies.
[2023-12-27 14:35:59 +0100] information/ConfigItem: Instantiated 1 Comment.
[2023-12-27 14:35:59 +0100] information/ConfigItem: Instantiated 1 FileLogger.
[2023-12-27 14:35:59 +0100] information/ConfigItem: Instantiated 3 ApiUsers.
[2023-12-27 14:35:59 +0100] information/ConfigItem: Instantiated 279 CheckCommands.
[2023-12-27 14:35:59 +0100] information/ConfigItem: Instantiated 1 InfluxdbWriter.
[2023-12-27 14:35:59 +0100] information/ConfigItem: Instantiated 1 ApiListener.
[2023-12-27 14:35:59 +0100] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2023-12-27 14:35:59 +0100] information/cli: Finished validating the configuration file(s).

log1c · December 28, 2023, 7:34am

Hello.

The docs state that

Unlike notifications, event commands for hosts/services are called on every check execution if one of these conditions matches:

The host/service is in a soft state

The host/service state changes into a hard state

The host/service state recovers from a soft or hard state to OK/Up

So the easiest way would be to add multiple conditions to your script:

if [ "$servicestate" = "CRITICAL" ] && [ "$servicestatetype" = "SOFT" ] && [ "$serviceattempt" -eq 3 ]

so that it tries to fix the problem while the service is still in a soft state (here on the third check attempt, for example)
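A minimal sketch of such a handler script, assuming the EventCommand passes the service state, state type and check attempt as three positional arguments (for example from the `$service.state$`, `$service.state_type$` and `$service.check_attempt$` runtime macros). The `run_fix` function is a hypothetical placeholder for the real repair command, and skipping the first soft attempt is only one possible policy:

```shell
#!/bin/sh
# Sketch of an event handler that retries the fix on several check attempts.
# Assumption: called as  handler.sh <state> <state_type> <attempt>

# Return 0 (true) when the fix should run for this invocation.
should_run_fix() {
    state=$1
    state_type=$2
    attempt=$3
    [ "$state" = "CRITICAL" ] || return 1
    case "$state_type" in
        SOFT) [ "$attempt" -ge 2 ] ;;  # skip the first attempt, retry on later ones
        HARD) return 0 ;;              # last try when the state becomes hard
        *)    return 1 ;;
    esac
}

# Hypothetical placeholder: replace with the real repair command.
run_fix() {
    echo "running fix"
}

main() {
    if [ "$#" -eq 3 ] && should_run_fix "$1" "$2" "$3"; then
        run_fix
    fi
}

main "$@"
```

With this policy the fix is attempted on every soft-state check from the second attempt on, and once more when the service reaches the hard state, so the number of retries is bounded by the service's maximum check attempts.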

JavierVilarroig · December 28, 2023, 9:57am

Hi @log1c

Thanks for your answer.

I see your point. But I don’t think it covers my need.

It is not just about running the same event multiple times, but about re-running the event if the service stays in an error condition for more than X minutes.

The events can be costly, and it may not be possible to run more than one instance at the same time.

Still, thanks again

log1c · December 28, 2023, 10:33am

I understand.
You could “abuse” notifications for this, as they can be configured to run not before/after a certain amount of time (times.begin/times.end):
https://icinga.com/docs/icinga-2/latest/doc/03-monitoring-basics/#notification-escalations

As notifications are also “just” scripts that are executed, you can put whatever script behind a notification.
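As a rough sketch of what such a script could look like: a wrapper that runs the fix only for problem notifications and refuses to start a second instance while one is still running. It assumes the NotificationCommand exports the notification type in a `NOTIFICATIONTYPE` environment variable with the value `PROBLEM`; the lock path and `run_fix` are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch: fix script called from a NotificationCommand.
# Assumptions: NOTIFICATIONTYPE is exported by the command definition,
# LOCK_DIR is a path writable by the Icinga user, run_fix is a placeholder.

LOCK_DIR=${LOCK_DIR:-/tmp/icinga_fix_myservice.lock}

# Hypothetical placeholder: replace with the real repair command.
run_fix() {
    echo "fix executed"
}

# Run the fix unless another instance holds the lock.
# mkdir is atomic: it fails if the directory already exists.
run_fix_exclusive() {
    if ! mkdir "$LOCK_DIR" 2>/dev/null; then
        echo "fix already running, skipping"
        return 0
    fi
    run_fix
    status=$?
    rmdir "$LOCK_DIR"   # release the lock
    return "$status"
}

# Only act on problem notifications, not on recoveries or acknowledgements.
if [ "${NOTIFICATIONTYPE:-}" = "PROBLEM" ]; then
    run_fix_exclusive
fi
```

Note that if the script is killed while the fix is running, the lock directory stays behind and has to be removed by hand; a production version would probably want a `trap` or a stale-lock check.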

JavierVilarroig · December 28, 2023, 11:27am

Yes, you have a point about using notifications!

I will try that path. I already have an escalation mechanism built into my configuration. I’m sure I can make it fit.

It may not be very elegant, but it should work.

Thanks again for your help.

Anyway, it would be nice to have events that can be repeated periodically as part of the Icinga2 configuration. I will propose it.

jeanm · January 3, 2024, 8:19am

Hello,

I would define a repeated fix script (a crontab job, for instance) that runs only if a basic condition is met (some file not being empty, for instance).

Then the Icinga event would simply enable the fix script when it detects a transition to the Critical state, and disable the fix script when it detects the transition to the OK state. (enable = fill the file with the datetime; disable = empty the file; file named “icinga_event_xxx_critical_datetime.txt” for instance).

If check info is needed, you can of course elaborate on this solution and include some structured information in the file, to be used by the fix script.
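The idea could be sketched roughly as follows, with one script serving both as the Icinga event handler and as the cron job. The flag file location, the argument convention (`CRITICAL`, `OK`, `cron`) and `run_fix` are assumptions made for illustration only.

```shell
#!/bin/sh
# Sketch of the flag-file approach.
# Assumptions: FLAG_FILE path, the argument values and run_fix are hypothetical.

FLAG_FILE=${FLAG_FILE:-/var/tmp/icinga_event_xxx_critical_datetime.txt}

# enable = fill the file with the current datetime
enable_fix() {
    date '+%Y-%m-%d %H:%M:%S' > "$FLAG_FILE"
}

# disable = empty the file
disable_fix() {
    : > "$FLAG_FILE"
}

# The fix is enabled while the file exists and is not empty.
fix_enabled() {
    [ -s "$FLAG_FILE" ]
}

# Hypothetical placeholder: replace with the real repair command.
run_fix() {
    echo "fix executed"
}

case "${1:-}" in
    CRITICAL) enable_fix ;;                          # called by the Icinga event
    OK)       disable_fix ;;                         # called by the Icinga event
    cron)     if fix_enabled; then run_fix; fi ;;    # called periodically by cron
esac
```

The cron interval then decides how often the fix is retried while the service stays Critical, independently of how often Icinga calls the event.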

My two cents,

Jean

JavierVilarroig · March 28, 2025, 12:46pm

Yes. That can also be a possibility.