Do you - and how do you use eventhandlers?

theFeu · January 3, 2022, 2:07pm

In our Ask Me Anything series, @nilmerg answered a question about Icinga’s event handlers, where he stated that, from his perspective, there isn’t really a known use for them.

@MarcusCaepio mentioned that their experience differs and suggested just asking you all for your experiences.

I would love to hear from all of you, how you use the eventhandlers - or if you don’t.

log1c · January 3, 2022, 2:37pm

I’ll start (after not having thought about eventcommands for some time now ^^)

Currently there are no event commands/handlers configured on the systems I have set up or attend to.
I do not find them useless, but no customer has asked for some automatic action to be taken when something “breaks” until now.
Though, after talking to colleagues, there could be a need for them in the future, and they could then be implemented.
If I understand correctly, the AMA video is referring to the visual representation of configured eventhandlers/eventcommands in the web interface, correct?
I would put that under “nice to have”, so that e.g. ops could see the configured event options and the command.

steaksauce · January 3, 2022, 2:43pm

Since before I joined my current employer, we have been using monit (watchdog daemon) for “event handling” on some things. Not sure why they didn’t use event handlers in Icinga, but as we move forward into Alma8 it looks like we are shifting away from monit, so I will likely be using them more.

rsx · January 3, 2022, 2:45pm

I’m using one event command to write messages in Windows event log.

We’ve started a new project where we’ll heavily use event handlers to automate tasks, e.g. run cleanup scripts, restart (dependent) services, trigger application jobs etc.

nexo1960 · January 3, 2022, 5:49pm

Back in the Icinga 1 days, we used event handlers to put a short downtime on services when a host recovered, to avoid additional notifications. For Icinga 2 I am still hoping that the following pull request gets merged: Introduce a recovery_time attribute for checkables. by efuss · Pull Request #8323 · Icinga/icinga2 · GitHub.

Other than that, I see some use cases like:

Restart of services
Automatic enlargement of hard disks for VMs

But these examples represent only a very small part of the failures.
In general, the failures are too diverse to be handled by event handlers.
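The recovery downtime described above could, on Icinga 2, be approximated by an event handler that calls the REST API. The following is only a sketch under assumptions: it builds the request for the `schedule-downtime` action without sending it, the function name, the default duration and the `eventhandler` author are hypothetical, and the parameters should be checked against the API documentation of your Icinga 2 version.

```python
import time


def recovery_downtime_request(host_name, duration_s=600, now=None, author="eventhandler"):
    """Build (path, JSON body) for a short fixed downtime on a host and its services.

    Nothing is sent here; the caller would POST the body to the Icinga 2 API.
    """
    start = int(now if now is not None else time.time())
    body = {
        "type": "Host",
        # filter_vars keeps the host name out of the filter expression itself
        "filter": "host.name == target",
        "filter_vars": {"target": host_name},
        # assumption: all_services is available (Icinga 2 2.11 or later)
        "all_services": True,
        "author": author,
        "comment": "Short downtime after host recovery",
        "start_time": start,
        "end_time": start + duration_s,
        "fixed": True,
    }
    return "/v1/actions/schedule-downtime", body
```

An event handler attached to the host would call this only when the host state changes back to UP and then POST the body with an API user that has permission for the action.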

leeclemens · January 3, 2022, 7:54pm

I use one to restart Icinga2 in order to kill zombies that were started via sudo and cannot be killed normally.

github.com/Icinga/icinga2

Zombie CheckCommand processes

opened 05:50PM - 23 Aug 21 UTC

closed 12:53AM - 10 Nov 21 UTC

leeclemens

bug area/checks

## Describe the bug

A number of defunct/zombie processes accumulate. There are 10-30 a day (check @ 5 min intervals) and do not seem to be related to master reloads (as previous bugs/forums reference).

## To Reproduce

I have only seen this issue with plugins where the `CheckCommand`'s `command` includes `sudo`.

1. Define a `CheckCommand` similar to:

```
object CheckCommand "ceph_mgr" {
  import "plugin-check-command"
  command = [ "/usr/bin/sudo", PluginContribDir + "/check_ceph_mgr" ]
}
```

2. Wait for it to run, seems to happen 10-30 times a day
3. `ps -efH`

```
icinga 3521095 1 0 Aug10 ? 00:03:44 /usr/lib64/icinga2/sbin/icinga2 --no-stack-rlimit daemon --close-stdio -e
icinga 929272 3521095 0 Aug16 ? 00:05:57 /usr/lib64/icinga2/sbin/icinga2 --no-stack-rlimit daemon --close-stdio -
icinga 929281 929272 0 Aug16 ? 00:00:25 /usr/lib64/icinga2/sbin/icinga2 --no-stack-rlimit daemon --close-stdio
root 1124248 929281 0 Aug16 ? 00:00:00 [sudo] <defunct>
root 1124412 929281 0 Aug16 ? 00:00:00 [sudo] <defunct>
root 1124927 929281 0 Aug16 ? 00:00:00 [sudo] <defunct>
root 1124998 929281 0 Aug16 ? 00:00:00 [sudo] <defunct>
root 1125732 929281 0 Aug16 ? 00:00:00 [sudo] <defunct>
root 1125752 929281 0 Aug16 ? 00:00:00 [sudo] <defunct>
root 1126253 929281 0 Aug16 ? 00:00:00 [sudo] <defunct>
root 1126370 929281 0 Aug16 ? 00:00:00 [sudo] <defunct>
root 1126395 929281 0 Aug16 ? 00:00:00 [sudo] <defunct>
root 1126877 929281 0 Aug16 ? 00:00:00 [sudo] <defunct>
root 1127118 929281 0 Aug16 ? 00:00:00 [sudo] <defunct>
```

## Expected behavior

Icinga2 reaps child `CheckCommand`/plugin processes

## Screenshots

N/A

## Your Environment

CentOS 7.9
Icinga2 2.13.0-1

<details>
<summary>* Version used (`icinga2 --version`):</summary>

```
# icinga2 --version
icinga2 - The Icinga 2 network monitoring daemon (version: 2.13.0-1)

Copyright (c) 2012-2021 Icinga GmbH (https://icinga.com/)
License GPLv2+: GNU GPL version 2 or later <https://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

System information:
  Platform: CentOS Linux
  Platform version: 7 (Core)
  Kernel: Linux
  Kernel version: 3.10.0-1160.36.2.el7.x86_64
  Architecture: x86_64

Build information:
  Compiler: GNU 4.8.5
  Build host: runner-hh8q3bz2-project-322-concurrent-0
  OpenSSL version: OpenSSL 1.0.2k-fips 26 Jan 2017

Application information:

General paths:
  Config directory: /etc/icinga2
  Data directory: /var/lib/icinga2
  Log directory: /var/log/icinga2
  Cache directory: /var/cache/icinga2
  Spool directory: /var/spool/icinga2
  Run directory: /run/icinga2

Old paths (deprecated):
  Installation root: /usr
  Sysconf directory: /etc
  Run directory (base): /run
  Local state directory: /var

Internal paths:
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid
```

</details>

<details>
<summary>* Operating System and version:</summary>

```
# uname -a
Linux localhost 3.10.0-1160.36.2.el7.x86_64 #1 SMP Wed Jul 21 11:57:15 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
```

```
# cat /etc/centos-release
CentOS Linux release 7.9.2009 (Core)
```

```
# cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL=https://www.centos.org/
BUG_REPORT_URL=https://bugs.centos.org/
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
```

</details>

<details>
<summary>* Enabled features (`icinga2 feature list`):</summary>

```
# icinga2 feature list
Disabled features: compatlog debuglog elasticsearch gelf icingadb influxdb influxdb2 livestatus opentsdb perfdata statusdata syslog
Enabled features: api checker command graphite ido-mysql mainlog notification
```

</details>

* Icinga Web 2 version and modules (System - About): N/A

<details>
<summary>* Config validation (`icinga2 daemon -C`):</summary>

```
# icinga2 daemon -C
[2021-08-23 13:18:25 -0400] information/cli: Icinga application loader (version: 2.13.0-1)
[2021-08-23 13:18:25 -0400] information/cli: Loading configuration file(s).
[2021-08-23 13:18:26 -0400] information/ConfigItem: Committing config item(s).
[2021-08-23 13:18:26 -0400] information/ApiListener: My API identity: wsc-salt01.cyber-center.com
[2021-08-23 13:18:26 -0400] information/ConfigItem: Instantiated 1 GraphiteWriter.
[2021-08-23 13:18:26 -0400] information/ConfigItem: Instantiated 1 NotificationComponent.
[2021-08-23 13:18:26 -0400] information/ConfigItem: Instantiated 1 IdoMysqlConnection.
[2021-08-23 13:18:26 -0400] information/ConfigItem: Instantiated 1 ExternalCommandListener.
[2021-08-23 13:18:26 -0400] information/ConfigItem: Instantiated 1 CheckerComponent.
[2021-08-23 13:18:26 -0400] information/ConfigItem: Instantiated 488 Services.
[2021-08-23 13:18:26 -0400] information/ConfigItem: Instantiated 44 Zones.
[2021-08-23 13:18:26 -0400] information/ConfigItem: Instantiated 6 HostGroups.
[2021-08-23 13:18:26 -0400] information/ConfigItem: Instantiated 1 IcingaApplication.
[2021-08-23 13:18:26 -0400] information/ConfigItem: Instantiated 43 Hosts.
[2021-08-23 13:18:26 -0400] information/ConfigItem: Instantiated 42 Endpoints.
[2021-08-23 13:18:26 -0400] information/ConfigItem: Instantiated 1 FileLogger.
[2021-08-23 13:18:26 -0400] information/ConfigItem: Instantiated 3 ApiUsers.
[2021-08-23 13:18:26 -0400] information/ConfigItem: Instantiated 266 CheckCommands.
[2021-08-23 13:18:26 -0400] information/ConfigItem: Instantiated 1 ApiListener.
[2021-08-23 13:18:26 -0400] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2021-08-23 13:18:26 -0400] information/cli: Finished validating the configuration file(s).
```

</details>

* If you run multiple Icinga 2 instances, the `zones.conf` file (or `icinga2 object list --type Endpoint` and `icinga2 object list --type Zone`) from all affected nodes. N/A

## Additional context

Add any other context about the problem here.

There seems to be consensus that the alternative solution is not such a great idea (primarily for security reasons):

github.com/Icinga/icinga2

kill the process group with sudo rights

opened 05:13PM - 16 Apr 21 UTC

closed 09:01AM - 13 Oct 21 UTC

ggzengel

needs feedback

## Describe the bug

Icinga2 can't kill checks which are called with sudo. After IPMI didn't respond the memory was eaten up and the whole system was frozen.

```
[2021-04-16 15:45:51 +0000] warning/Process: Killing process group 17700 ('sudo' '/usr/lib/nagios/plugins/check_ipmi_sensor' '-H' 'localhost' '-L' 'user' '-P' 'pass' '-U' 'icinga2' '-v') after timeout of 60 seconds
[2021-04-16 15:45:51 +0000] warning/Process: Couldn't kill the process group 17700 ('sudo' '/usr/lib/nagios/plugins/check_ipmi_sensor' '-H' 'localhost' '-L' 'user' '-P' 'pass' '-U' 'icinga2' '-v'): [errno 1] Operation not permitted
```

I have not seen or thought of a third solution.

MarcusCaepio · January 4, 2022, 7:22am

Thanks @theFeu for the thread.
At my previous employer, event handlers were used extensively.
Currently I am using event handlers to restart services. E.g. in the past I had problems with the sfcbd-watchdog service on ESXi while using check_esxi_hardware. The service had to be restarted to get the check running again, which could easily be done with a handler.
More cases could be mapped in the future, such as restarting Docker containers or other services.
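A restart handler like this usually boils down to a small decision on the state values Icinga passes in. The following is a minimal sketch, not the handler actually used here: it assumes the event command passes `$service.state$`, `$service.state_type$` and `$service.check_attempt$` as arguments, and the soft-attempt threshold and the `systemctl` command are hypothetical placeholders (on ESXi the restart command would look different).

```python
def should_restart(state, state_type, attempt, soft_attempt=3):
    """Decide whether the handler should restart the monitored service.

    Restart once late in the SOFT phase (to try to fix things before a
    notification goes out) and again when the problem becomes HARD.
    """
    if state != "CRITICAL":
        return False
    if state_type == "HARD":
        return True
    return state_type == "SOFT" and attempt == soft_attempt


def restart_command(unit):
    """Return the command the handler would run; systemctl is an assumption."""
    return ["systemctl", "restart", unit]
```

Keeping the decision separate from the actual restart makes it possible to test the logic without touching a real service.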

As event handler notifications do not exist yet, I have to send them from within the handler myself (to an MS Teams channel). My wishes here would be:

See in Icingaweb2 whether an EventHandler is configured. I know this is hard to solve, as only the EventCommands, but not the EventHandlers themselves, are shown in the icinga2 object list.
See (in Icingaweb2) / get / configure notifications for a triggered EventHandler the same way I configure all the other notifications.

These would be nice-to-have features, of course. I guess your focus is currently on IcingaDB etc. But imho event handlers are neither ancient nor rarely used, and it seems like this thread is proving it so far. I could imagine they are not used so much because the configuration is quite tough compared to the rest of the Icinga configuration.

Tqnsls · January 10, 2022, 1:54pm

Hi guys,

What @MarcusCaepio wrote is what I thought, too. We use them a lot in our environment and (until now) I cannot think of an alternative way to solve some of our “problems” without eventhandlers.

I commented beneath the video from Icinga / nilmerg when it was released. But somehow my comment was deleted.

So I’ll try to recreate it from memory.

For our team and many of our customers eventhandlers are very important.
Unfortunately, some customers use quite stupid and unstable software that cannot run without being continuously restarted when certain errors occur. In those cases we use some self-written eventhandlers.
For example, when a specific error occurs or a port is down (we monitor all of these), an eventhandler kicks the server itself or the software and restarts it.
We also use them to acknowledge a host or a service automatically based on certain service states or outputs.

We migrated most of them from icinga1 to icinga2 so it might have grown historically.
But at least until now we do not know of an alternative way to solve these problems.
Maybe you could enlighten us in the dark
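The automatic acknowledgement mentioned above can be pictured roughly as follows. This is only a sketch under assumptions, not the handlers described in this post: it matches the plugin output against a pattern and, if it matches, builds the request for the Icinga 2 API action `acknowledge-problem` without sending it. The function name, the pattern and the `eventhandler` author are hypothetical.

```python
import re


def ack_request(host, service, state, output, pattern, author="eventhandler"):
    """Return (path, JSON body) for an acknowledgement, or None if none applies."""
    # Only problem states whose output matches the pattern are acknowledged.
    if state == "OK" or not re.search(pattern, output):
        return None
    body = {
        "type": "Service",
        "filter": "host.name == h && service.name == s",
        "filter_vars": {"h": host, "s": service},
        "author": author,
        "comment": f"Auto-acknowledged: output matched {pattern!r}",
        # assumption: no notification should be sent for automatic acks
        "notify": False,
    }
    return "/v1/actions/acknowledge-problem", body
```

The handler would POST the returned body to the API whenever the function does not return `None`.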

JavierVilarroig · June 2, 2023, 12:26pm

I love events

We use them to run some PostgreSQL tasks when needed. Specifically:

Run vacuum and vacuum analyze on DBs when a certain time has passed without any vacuum.
Reindex specific indexes when they become bloated.
Kill transactions idle for more than a week. (I’m allowed to do it)
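To give an idea of what such handlers might end up running, here is a rough sketch that maps the three tasks above to SQL statements. The task names, the function names and the exact statements are assumptions for illustration and not the actual setup; in particular, "idle transactions" is interpreted here as sessions in state `idle in transaction` in `pg_stat_activity`.

```python
def quote_ident(name):
    """Quote an SQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def maintenance_sql(task, target=None, max_idle_days=7):
    """Return the SQL statement a handler could run for a given task."""
    if task == "vacuum":
        return "VACUUM (ANALYZE);"
    if task == "reindex":
        if not target:
            raise ValueError("reindex needs an index name")
        return f"REINDEX INDEX {quote_ident(target)};"
    if task == "kill_idle":
        days = int(max_idle_days)  # int() keeps arbitrary text out of the SQL
        return (
            "SELECT pg_terminate_backend(pid) FROM pg_stat_activity "
            "WHERE state = 'idle in transaction' "
            f"AND state_change < now() - interval '{days} days' "
            "AND pid <> pg_backend_pid();"
        )
    raise ValueError(f"unknown task: {task}")
```

The event handler would pick the task based on which service changed state and pass the statement to `psql` or a database driver.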

There are other situations where I plan to continue implementing events, just lacking the time to work on them.

Cheers

MarcusCaepio · July 12, 2023, 3:48pm

But still today, event handlers are executed on every node in an Icinga cluster, if I remember correctly. So in a cluster with several zones, where event handlers should only be executed on specific satellites, they cannot be used without problems…

log1c · July 13, 2023, 5:24am

Afaik the eventhandler is executed on the node which ran the check command for the check. Could be the satellite or the agent host, or even the master.

MarcusCaepio · July 13, 2023, 5:40am

You are right. I forgot that it was fixed. I had this in mind: [dev.icinga.com #10208] Eventhandler trigger on all endpoints in high available zone · Issue #3431 · Icinga/icinga2 · GitHub