Event_command firing from both master and satellite

So I’ve found what looks like a bug. My apologies if this is discussed somewhere else or if it’s already planned for 2.11 (I wasn’t able to find anything about it).

I’m aware that check commands and event commands are intended to run from the same node. I have no issues whatsoever with event commands running on a specified endpoint (client nodes). This problem pertains to remote checks. In this case, I have an HTTP check and an event command that uses SSH to log in to the affected client.

During initial testing, the event command looked like it was running twice. I double-checked my zone configs to make sure the satellites knew about each other, and everything looked fine. So I wrote a pair of test commands, one for checks and one for events, that just announce in Slack which node they’re firing from. I was surprised to see this:

Icinga APP [1:24 PM]
lasicinga2: Hi there! I’m a useless check command to make sure that Icinga doesn’t have hierarchy problems.
icinga-master2: Hi there! I’m a useless event handler to make sure that Icinga doesn’t have hierarchy problems.
lasicinga2: Hi there! I’m a useless event handler to make sure that Icinga doesn’t have hierarchy problems.

So the check command fires only from my satellite, but the event handler fires from both the satellite and the master.

I did some troubleshooting with a nice guy on your sister site: https://monitoring-portal.org/t/event-handler-firing-from-both-satellite-and-master/6120

Here are the command and service objects (the endpoint is intentionally not defined):

object CheckCommand "test-relationship" {
  import "plugin-check-command"
  command = [ PluginDir + "/test-relationship.py" ]
}

object EventCommand "test-event" {
  import "plugin-event-command"
  command = [ PluginDir + "/test-event.py" ]
}

apply Service "test relationship" {
  import "generic-service"
  check_command = "test-relationship"
  event_command = "test-event"
  assign where host.name == "testhost" // in las satellite zone
}
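The test scripts themselves aren't shown. As a rough idea of what `test-event.py` might look like, here is a minimal, hypothetical sketch that only reports which node it runs on. The `SLACK_WEBHOOK_URL` environment variable is an assumption (a Slack incoming webhook taking a JSON `{"text": ...}` payload); when it is unset, the script just prints the message.

```python
#!/usr/bin/env python3
"""Hypothetical sketch of test-event.py: announce which node runs the event handler."""
import json
import os
import socket
import urllib.request


def build_message(node: str) -> str:
    # Same wording as the Slack messages above, prefixed with the node name.
    return (f"{node}: Hi there! I'm a useless event handler to make sure "
            "that Icinga doesn't have hierarchy problems.")


def announce(text: str) -> None:
    # SLACK_WEBHOOK_URL is an assumed variable, not something Icinga provides.
    url = os.environ.get("SLACK_WEBHOOK_URL")
    if not url:
        print(text)
        return
    payload = json.dumps({"text": text}).encode("utf-8")
    req = urllib.request.Request(url, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=10):
        pass  # the response body is not needed


if __name__ == "__main__":
    announce(build_message(socket.gethostname()))
```

Because the hostname is taken at runtime, the same script deployed to every node makes duplicate executions show up as one message per node.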

For now, I’m working around it by having my script exit if it’s running in the wrong subnet. I might be totally wrong about something, but I’m just trying to help out.
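A minimal sketch of that subnet guard in Python, assuming the handler should only act when the node's primary IPv4 address lies inside the satellite zone's subnet. `ALLOWED_SUBNET` is a hypothetical placeholder, and the address detection (connecting a UDP socket, which sends no packets) is just one best-effort way to find the source address the kernel would use.

```python
import ipaddress
import socket

# Hypothetical: the subnet of the satellite zone that should run the handler.
ALLOWED_SUBNET = "10.20.0.0/16"


def in_subnet(address: str, subnet: str) -> bool:
    """Return True if address lies inside the CIDR subnet."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(subnet, strict=False)


def local_address(probe: str = "192.0.2.1") -> str:
    """Best-effort guess of this node's primary IPv4 address.

    connect() on a UDP socket transmits nothing; it only makes the kernel
    choose a source address for the route towards `probe` (a TEST-NET-1 address).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect((probe, 9))
        return sock.getsockname()[0]


def should_run(subnet: str = ALLOWED_SUBNET) -> bool:
    try:
        addr = local_address()
    except OSError:
        return False  # no usable route: play it safe and do nothing
    return in_subnet(addr, subnet)


def handle_event() -> None:
    # Placeholder for the real work (e.g. the SSH login to the client).
    pass


if __name__ == "__main__":
    if should_run():
        handle_event()
    # Otherwise this is the wrong node (for example the master): exit quietly.
```

Checking the hostname instead of the subnet would work the same way; the subnet variant just avoids listing every satellite by name.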

Thanks!

Hi,

what does the compiled config object from `icinga2 object list` look like?

Cheers,
Michael

Interesting, you seem to be right: even simple checks have their event handler executed on both the satellite and the master.

Experiment

/usr/lib/nagios/plugins/eventhandler-log-local

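#!/bin/bash
# Test event handler: log a timestamped line with all arguments it was called with.

# Default the log location to $TEMP (or /tmp) unless LOGFILE is already set.
: "${TEMP:=/tmp}"
: "${LOGFILE:=${TEMP}/eventhandler.log}"

# Print the line and append it to the log file.
echo "[$(date --rfc-3339=ns)] Eventhandler executed: $*" | tee -a "$LOGFILE"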

Adding a command via Director:

object EventCommand "eventhandler-log-local" {
    import "plugin-event-command"
    command = [ PluginDir + "/eventhandler-log-local" ]
    arguments += {
        "--host" = "$host.name$"
        "--service" = "$service.name$"
    }
}

Some monitoring:

template Service "random" {
    import "default service"

    check_command = "random"
    enable_notifications = false
    event_command = "eventhandler-log-local"
    command_endpoint = host_name
}

apply Service "random-test" {
    import "random"

    assign where match("*", host.name)

    import DirectorOverrideTemplate
}

Logs

satellite

[2019-04-23 09:39:13.047209617+02:00] Eventhandler executed: --host wifi-livingroom --service random-test

master

[2019-04-23 09:39:13.058314770+02:00] Eventhandler executed: --host wifi-livingroom --service random-test
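To confirm that two such entries belong to the same state change, the logs from both nodes can be paired by arguments and timestamp. A small sketch, assuming the line format written by `eventhandler-log-local` above; the one-second window is an arbitrary choice.

```python
import re
from datetime import datetime, timedelta

# Matches lines written by eventhandler-log-local, e.g.
# [2019-04-23 09:39:13.047209617+02:00] Eventhandler executed: --host h --service s
LINE_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.(\d+)([+-]\d{2}:\d{2})\] "
    r"Eventhandler executed: (.*)$"
)


def parse_line(line):
    """Return (timestamp, arguments) for a log line, or None if it doesn't match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    base, frac, tz, args = m.groups()
    # datetime has microsecond resolution, so drop the extra nanosecond digits.
    ts = datetime.fromisoformat(f"{base}.{frac[:6].ljust(6, '0')}{tz}")
    return ts, args


def find_duplicates(lines_a, lines_b, window=timedelta(seconds=1)):
    """Pair entries from two nodes' logs that have identical arguments and
    were written within `window` of each other."""
    entries_a = [e for e in map(parse_line, lines_a) if e]
    entries_b = [e for e in map(parse_line, lines_b) if e]
    pairs = []
    for ts_a, args_a in entries_a:
        for ts_b, args_b in entries_b:
            if args_a == args_b and abs(ts_a - ts_b) <= window:
                pairs.append((args_a, abs(ts_a - ts_b)))
    return pairs
```

Fed the satellite and master lines above, it reports a single pair about 11 ms apart.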

Conclusion

It seems like event handlers are also executed on the master when it receives results from a satellite.

I currently have no agent behind a satellite, but I’d expect similar results.


Additional note: I have 2 masters, but only one of them executes the event handler for check results received from the satellite.

Command endpoint execution seems to work correctly for single agents:

[2019-04-23 09:53:20.360590157+02:00] Eventhandler executed: --host serenia.lazyfrosch.de --service random-test

Remote checks between 2 masters currently seem to cause major issues with the core :sob:. It is running the event handler on both masters non-stop:

[2019-04-23 09:55:03.970510790+02:00] Eventhandler executed: --host ampere.lazyfrosch.de --service random-test
[2019-04-23 09:55:03.997902095+02:00] Eventhandler executed: --host ampere.lazyfrosch.de --service random-test
[2019-04-23 09:55:04.022410796+02:00] Eventhandler executed: --host ampere.lazyfrosch.de --service random-test
[2019-04-23 09:55:04.028392317+02:00] Eventhandler executed: --host ampere.lazyfrosch.de --service random-test
[2019-04-23 09:55:04.032409829+02:00] Eventhandler executed: --host ampere.lazyfrosch.de --service random-test
[2019-04-23 09:55:04.035604240+02:00] Eventhandler executed: --host ampere.lazyfrosch.de --service random-test
[2019-04-23 09:55:04.040632195+02:00] Eventhandler executed: --host ampere.lazyfrosch.de --service random-test
[2019-04-23 09:55:04.045234176+02:00] Eventhandler executed: --host ampere.lazyfrosch.de --service random-test
[2019-04-23 09:55:04.056932354+02:00] Eventhandler executed: --host ampere.lazyfrosch.de --service random-test
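To put "non-stop" into numbers, a short sketch that computes how many runs a log excerpt contains and the mean gap between consecutive runs. It assumes the same `eventhandler-log-local` line format and truncates timestamps to microseconds.

```python
import re
from datetime import datetime, timedelta

# Leading "[YYYY-MM-DD HH:MM:SS.fraction+HH:MM]" written by `date --rfc-3339=ns`.
TS_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.(\d+)([+-]\d{2}:\d{2})\]")


def timestamps(lines):
    """Extract timestamps (truncated to microseconds) from log lines."""
    result = []
    for line in lines:
        m = TS_RE.match(line.strip())
        if m:
            base, frac, tz = m.groups()
            result.append(datetime.fromisoformat(f"{base}.{frac[:6].ljust(6, '0')}{tz}"))
    return result


def burst_stats(lines):
    """Return (number of runs, total span, mean gap between consecutive runs)."""
    ts = sorted(timestamps(lines))
    if len(ts) < 2:
        return len(ts), timedelta(0), None
    span = ts[-1] - ts[0]
    return len(ts), span, span / (len(ts) - 1)
```

For the nine lines above this gives 9 runs within about 86 ms, i.e. roughly one execution every 10.8 ms.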

Looks like Markus has been able to reproduce it, but in case you still want my object details, I’m pasting them below. It’s a similar but not identical setup: I have 2 masters, and then 2 satellites per data center. The check command fires on only one of the satellites, and the event command fires on only one satellite and one master. It isn’t bleeding into other satellite zones.

Object '[hostname]!test relationship' of type 'Service':
  % declared in '/etc/icinga2/zones.d/global-templates/testing.conf', lines 11:1-11:33
  * __name = "[hostname]!test relationship"
  * action_url = ""
  * check_command = "test-relationship"
    % = modified in '/etc/icinga2/zones.d/global-templates/testing.conf', lines 13:3-13:37
  * check_interval = 300
    % = modified in '/etc/icinga2/zones.d/global-templates/templates.conf', lines 78:3-78:21
  * check_period = ""
  * check_timeout = null
  * command_endpoint = ""
  * display_name = "test relationship"
  * enable_active_checks = true
  * enable_event_handler = true
  * enable_flapping = false
  * enable_notifications = true
  * enable_passive_checks = true
  * enable_perfdata = true
  * event_command = "test-event"
    % = modified in '/etc/icinga2/zones.d/global-templates/testing.conf', lines 14:3-14:30
  * flapping_threshold = 0
  * flapping_threshold_high = 30
  * flapping_threshold_low = 25
  * groups = [ ]
  * host_name = "[hostname]"
    % = modified in '/etc/icinga2/zones.d/global-templates/testing.conf', lines 11:1-11:33
  * icon_image = ""
  * icon_image_alt = ""
  * max_check_attempts = 5
    % = modified in '/etc/icinga2/zones.d/global-templates/templates.conf', lines 77:3-77:24
  * name = "test relationship"
    % = modified in '/etc/icinga2/zones.d/global-templates/testing.conf', lines 11:1-11:33
  * notes = ""
  * notes_url = ""
  * package = "_etc"
    % = modified in '/etc/icinga2/zones.d/global-templates/testing.conf', lines 11:1-11:33
  * retry_interval = 30
    % = modified in '/etc/icinga2/zones.d/global-templates/templates.conf', lines 79:3-79:22
  * source_location
    * first_column = 1
    * first_line = 11
    * last_column = 33
    * last_line = 11
    * path = "/etc/icinga2/zones.d/global-templates/testing.conf"
  * templates = [ "test relationship", "generic-service" ]
    % = modified in '/etc/icinga2/zones.d/global-templates/testing.conf', lines 11:1-11:33
    % = modified in '/etc/icinga2/zones.d/global-templates/templates.conf', lines 76:1-76:34
  * type = "Service"
  * vars
    * notification
      * [internal notification group]
        * mail = true
          % = modified in '/etc/icinga2/zones.d/global-templates/templates.conf', lines 81:5-81:37
  * volatile = false
  * zone = "lassatellite"
    % = modified in '/etc/icinga2/zones.d/global-templates/testing.conf', lines 11:1-11 

Please collect all the details and open a GitHub issue. I don’t have time at the moment to look into this topic; I’m working through many PR reviews for 2.11.

I’ll mark this as resolved since the bug report is in. Thanks for all you guys do, and I hope you don’t lose too much sleep over the 2.11 release.

Hi

Sorry for the late reply on this subject. I have the same problem on my side.
Events are sent from the master (OK) and are also attempted from the satellite, where I get an error: output: execvpe(/usr/bin/icinga_event_handler) failed: No such file or director…
Is there any way to disable this on the satellite?
Or to force events to be sent only from the master?

Thanks

Hey @enigma619!

Would you mind opening a new topic with your issue, if it is not resolved by the steps in this one?
You should also share more of your setup details up front, so others can help you better!

Thank you :slight_smile:
Feu