Icinga2 check threshold scopes

Hi,

I’m quite new to Icinga, so please forgive me if I have overlooked something.
I set up a combination of two masters running FreeBSD 12.2 with
icinga2-2.12.3

I’m currently also writing my own Ansible roles to get my setup up and running (ansible_managemynetwork/roles/ansible_icinga2 at main · imp1sh/ansible_managemynetwork · GitHub).

One concept I’m not able to grasp is how to handle threshold scopes for my checks. This is a generated check I use on all of my agent-based hosts:

apply Service "procs" {
  import "generic-service"
  check_command = "procs"
  assign where host.vars.agent_endpoint
  command_endpoint = host.vars.agent_endpoint
}

This is an example host:

object Host "bsdrx1.mydomain.net" {
  address6 = "2a00:1111:2222:10::15"
  vars.agent_endpoint = "bsdrx1.mydomain.net"
  vars.os = "FreeBSD"
  import "generic-host"
}

So my hosts use the default thresholds, which I tried to override for several hosts like this:

object Host "mailrx1.mydomain.net" {
  address6 = "2a00:1111:2222:2::4"
  vars.agent_endpoint = "mailrx1.mydomain.net"
  vars.os = "Linux"
  vars.procs_warning = "500"
  vars.procs_critical = "700"
  import "generic-host"
}

I already found out that vars set at host level are unfortunately not being used. According to some Google results, it seems I would have to modify the service object itself to use the host var if it is set:

// placed inside the apply Service "procs" block
if (host.vars.procs_warning) {
  vars.procs_warning = host.vars.procs_warning
}

What I don’t understand is that I would apparently have to add this kind of snippet to every single check to get some kind of scope abstraction. I would have imagined this to be a built-in feature.

Am I misunderstanding the Icinga 2 concept?

Hi @pebrille

it works for me when I do not set any vars in the service object. Example:

object Host "dummy-host" {
  check_command = "dummy"
  vars.procs_warning = "333"
  vars.procs_critical = "444"
}
apply Service "dummy-procs" {
  check_command = "procs"
  assign where host.name == "dummy-host"
}

Result:
[screenshot]

As soon as I set the var in the service object, the host vars are no longer applied.

apply Service "dummy-procs" {
  check_command = "procs"
  vars.procs_warning = "666"
  assign where host.name == "dummy-host"
}

Result:
[screenshot]
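
This would be consistent with the macro lookup order as I understand it from the documentation: a custom variable such as `$procs_warning$` is looked up on the service first, then on the host, and only then on the CheckCommand, which is where the defaults live. Below is a simplified sketch of how such a command is put together. It is not the actual ITL source; the name `procs-sketch` is hypothetical and most arguments are left out.

```icinga2
// Hypothetical, simplified stand-in for the ITL "procs" CheckCommand.
object CheckCommand "procs-sketch" {
  command = [ PluginDir + "/check_procs" ]

  arguments = {
    // Resolved at check time: service vars, then host vars, then the vars below.
    "-w" = "$procs_warning$"
    "-c" = "$procs_critical$"
  }

  // Defaults, used only when neither the service nor the host sets these vars.
  vars.procs_warning = 250
  vars.procs_critical = 400
}
```

With that lookup order, setting `vars.procs_warning` on the host should be enough, and no extra code is needed in the service. It would also explain the second example: a value set on the service wins over the host value.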

I found out that my threshold is being accepted somehow…
The web GUI shows the correct threshold, but after a minute or so it switches to 250/400 warning/critical without any user interaction. The service keeps alternating between my settings and the default settings. I have no clue what’s going on.

Could someone point me in the right direction? The threshold keeps changing and I cannot figure out why. It switches during runtime without the service being restarted.
Is there a debug log or something similar that could help?

Hi @ritzgu

This will enable the debug log:

icinga2 feature enable debuglog
systemctl restart icinga2

You can enable it on the host which runs the plugin and watch the log.

tail -f /var/log/icinga2/debug.log | grep 'check_procs'

Blind guess: your masters are configured with different thresholds, and the thresholds differ depending on which master schedules the check. If that’s true, the check source should change accordingly.

That’s not it. On both systems the host definition is identical (rolled out via Ansible).

object Host "5900x.mydomain.de" {
  address6 = "2a00:fe0:1234:44ff::f"
  vars.procs_warning = "1200"
  vars.procs_critical = "1400"
  vars.dig_server = "93.221.123.3"
  vars.agent_endpoint = "5900x.mydomain.de"
  vars.os = "Linux"
  vars.wload1 = "25"
  vars.wload5 = "20"
  vars.wload15 = "15"
  vars.cload1 = "50"
  vars.cload5 = "30"
  vars.cload15 = "20"
  import "generic-host"
}

Your service definition is also identical?

Yes, also centrally controlled by ansible.

apply Service "procs" {
  import "generic-service"
  check_command = "procs"
  assign where host.vars.agent_endpoint
  command_endpoint = host.vars.agent_endpoint
}

Edit:
I switched off one master, and now the checks are coming from the same master, yet they still alternate.

Edit 2:
I enabled the debug log on the monitored machine 5900x.mydomain.de, and it seems there are multiple checks being executed:

[2021-05-06 14:16:50 +0200] debug/CheckerComponent: Scheduling info for checkable '5900x.mydomain.de!procs' (2021-05-06 14:16:50 +0200): Object '5900x.mydomain.de!procs', Next Check: 2021-05-06 14:16:50 +0200(1.6203e+09).
[2021-05-06 14:16:50 +0200] debug/CheckerComponent: Executing check for '5900x.mydomain.de!procs'
[2021-05-06 14:16:50 +0200] debug/Checkable: Update checkable '5900x.mydomain.de!procs' with check interval '60' from last check time at 2021-05-06 14:15:50 +0200 (1.6203e+09) to next check time at 2021-05-06 14:17:50 +0200(1.6203e+09).
[2021-05-06 14:16:50 +0200] notice/Process: Running command '/usr/lib/nagios/plugins//check_procs' '-c' '400' '-w' '250': PID 12194
[2021-05-06 14:16:50 +0200] debug/CheckerComponent: Check finished for object '5900x.mydomain.de!procs'
[2021-05-06 14:16:50 +0200] notice/Process: PID 12194 ('/usr/lib/nagios/plugins//check_procs' '-c' '400' '-w' '250') terminated with exit code 2
[2021-05-06 14:16:50 +0200] debug/Checkable: Update checkable '5900x.mydomain.de!procs' with check interval '60' from last check time at 2021-05-06 14:16:50 +0200 (1.6203e+09) to next check time at 2021-05-06 14:17:50 +0200(1.6203e+09).
[2021-05-06 14:16:50 +0200] debug/DbEvents: add checkable check history for '5900x.mydomain.de!procs'
[2021-05-06 14:17:19 +0200] notice/Process: Running command '/usr/lib/nagios/plugins//check_procs' '-c' '1400' '-w' '1200': PID 15635
[2021-05-06 14:17:19 +0200] notice/Process: PID 15635 ('/usr/lib/nagios/plugins//check_procs' '-c' '1400' '-w' '1200') terminated with exit code 0

Can you run icinga2 object list for one example and check whether any unwanted configuration is involved?

There are more service checks, which I don’t include here, but apart from those, these are the excerpts related to the 5900x host:

Object '5900x.mydomain.de' of type 'Zone':
  % declared in '/usr/local/etc/icinga2/zones.d/masterjochen/agents.conf', lines 93:1-93:29
  * __name = "5900x.mydomain.de"
  * endpoints = [ "5900x.mydomain.de" ]
    % = modified in '/usr/local/etc/icinga2/zones.d/masterjochen/agents.conf', lines 94:3-94:35
  * global = false
  * name = "5900x.mydomain.de"
  * package = "_etc"
  * parent = "masterjochen"
    % = modified in '/usr/local/etc/icinga2/zones.d/masterjochen/agents.conf', lines 95:3-95:25
  * source_location
    * first_column = 1
    * first_line = 93
    * last_column = 29
    * last_line = 93
    * path = "/usr/local/etc/icinga2/zones.d/masterjochen/agents.conf"
  * templates = [ "5900x.mydomain.de" ]
    % = modified in '/usr/local/etc/icinga2/zones.d/masterjochen/agents.conf', lines 93:1-93:29
  * type = "Zone"
  * zone = "masterjochen"

Object '5900x.mydomain.de' of type 'Endpoint':
  % declared in '/usr/local/etc/icinga2/zones.d/masterjochen/agents.conf', lines 90:1-90:33
  * __name = "5900x.mydomain.de"
  * host = "5900x.mydomain.de"
    % = modified in '/usr/local/etc/icinga2/zones.d/masterjochen/agents.conf', lines 91:3-91:26
  * log_duration = 86400
  * name = "5900x.mydomain.de"
  * package = "_etc"
  * port = "5665"
  * source_location
    * first_column = 1
    * first_line = 90
    * last_column = 33
    * last_line = 90
    * path = "/usr/local/etc/icinga2/zones.d/masterjochen/agents.conf"
  * templates = [ "5900x.mydomain.de" ]
    % = modified in '/usr/local/etc/icinga2/zones.d/masterjochen/agents.conf', lines 90:1-90:33
  * type = "Endpoint"
  * zone = "masterjochen"


Object '5900x.mydomain.de' of type 'Host':
  % declared in '/usr/local/etc/icinga2/zones.d/masterjochen/hosts.conf', lines 111:1-111:29
  * __name = "5900x.mydomain.de"
  * action_url = ""
  * address = ""
  * address6 = "2a00:3ff:1:4ff::f"
    % = modified in '/usr/local/etc/icinga2/zones.d/masterjochen/hosts.conf', lines 112:3-112:31
  * check_command = "hostalive"
    % = modified in '/usr/local/etc/icinga2/zones.d/masterjochen/templates.conf', lines 6:3-6:29
  * check_interval = 60
    % = modified in '/usr/local/etc/icinga2/zones.d/masterjochen/templates.conf', lines 4:3-4:21
  * check_period = ""
  * check_timeout = null
  * command_endpoint = ""
  * display_name = "5900x.mydomain.de"
  * enable_active_checks = true
  * enable_event_handler = true
  * enable_flapping = false
  * enable_notifications = true
  * enable_passive_checks = true
  * enable_perfdata = true
  * event_command = ""
  * flapping_threshold = 0
  * flapping_threshold_high = 30
  * flapping_threshold_low = 25
  * groups = [ ]
  * icon_image = ""
  * icon_image_alt = ""
  * max_check_attempts = 3
    % = modified in '/usr/local/etc/icinga2/zones.d/masterjochen/templates.conf', lines 3:3-3:24
  * name = "5900x.mydomain.de"
  * notes = ""
  * notes_url = ""
  * package = "_etc"
  * retry_interval = 30
    % = modified in '/usr/local/etc/icinga2/zones.d/masterjochen/templates.conf', lines 5:3-5:22
  * source_location
    * first_column = 1
    * first_line = 111
    * last_column = 29
    * last_line = 111
    * path = "/usr/local/etc/icinga2/zones.d/masterjochen/hosts.conf"
  * templates = [ "5900x.mydomain.de", "generic-host" ]
    % = modified in '/usr/local/etc/icinga2/zones.d/masterjochen/hosts.conf', lines 111:1-111:29
    % = modified in '/usr/local/etc/icinga2/zones.d/masterjochen/templates.conf', lines 2:1-2:28
  * type = "Host"
  * vars
    * agent_endpoint = "5900x.mydomain.de"
      % = modified in '/usr/local/etc/icinga2/zones.d/masterjochen/hosts.conf', lines 116:3-116:41
    * cload1 = "50"
      % = modified in '/usr/local/etc/icinga2/zones.d/masterjochen/hosts.conf', lines 121:3-121:20
    * cload15 = "20"
      % = modified in '/usr/local/etc/icinga2/zones.d/masterjochen/hosts.conf', lines 123:3-123:21
    * cload5 = "30"
      % = modified in '/usr/local/etc/icinga2/zones.d/masterjochen/hosts.conf', lines 122:3-122:20
    * dig_server = "93.123.321.3"
      % = modified in '/usr/local/etc/icinga2/zones.d/masterjochen/hosts.conf', lines 115:3-115:34
    * os = "Linux"
      % = modified in '/usr/local/etc/icinga2/zones.d/masterjochen/hosts.conf', lines 117:3-117:19
    * procs_critical = "1400"
      % = modified in '/usr/local/etc/icinga2/zones.d/masterjochen/hosts.conf', lines 114:3-114:30
    * procs_warning = "1200"
      % = modified in '/usr/local/etc/icinga2/zones.d/masterjochen/hosts.conf', lines 113:3-113:29
    * wload1 = "25"
      % = modified in '/usr/local/etc/icinga2/zones.d/masterjochen/hosts.conf', lines 118:3-118:20
    * wload15 = "15"
      % = modified in '/usr/local/etc/icinga2/zones.d/masterjochen/hosts.conf', lines 120:3-120:21
    * wload5 = "20"
      % = modified in '/usr/local/etc/icinga2/zones.d/masterjochen/hosts.conf', lines 119:3-119:20
  * volatile = false
  * zone = "masterjochen"
  
Object '5900x.mydomain.de!procs' of type 'Service':
  % declared in '/usr/local/etc/icinga2/zones.d/masterjochen/templates.conf', lines 42:1-42:21
  * __name = "5900x.mydomain.de!procs"
  * action_url = ""
  * check_command = "procs"
    % = modified in '/usr/local/etc/icinga2/zones.d/masterjochen/templates.conf', lines 44:3-44:25
  * check_interval = 60
    % = modified in '/usr/local/etc/icinga2/zones.d/masterjochen/templates.conf', lines 11:3-11:21
  * check_period = ""
  * check_timeout = null
  * command_endpoint = "5900x.mydomain.de"
    % = modified in '/usr/local/etc/icinga2/zones.d/masterjochen/templates.conf', lines 46:3-46:45
  * display_name = "procs"
  * enable_active_checks = true
  * enable_event_handler = true
  * enable_flapping = false
  * enable_notifications = true
  * enable_passive_checks = true
  * enable_perfdata = true
  * event_command = ""
  * flapping_threshold = 0
  * flapping_threshold_high = 30
  * flapping_threshold_low = 25
  * groups = [ ]
  * host_name = "5900x.mydomain.de"
    % = modified in '/usr/local/etc/icinga2/zones.d/masterjochen/templates.conf', lines 42:1-42:21
  * icon_image = ""
  * icon_image_alt = ""
  * max_check_attempts = 5
    % = modified in '/usr/local/etc/icinga2/zones.d/masterjochen/templates.conf', lines 10:3-10:24
  * name = "procs"
    % = modified in '/usr/local/etc/icinga2/zones.d/masterjochen/templates.conf', lines 42:1-42:21
  * notes = ""
  * notes_url = ""
  * package = "_etc"
    % = modified in '/usr/local/etc/icinga2/zones.d/masterjochen/templates.conf', lines 42:1-42:21
  * retry_interval = 30
    % = modified in '/usr/local/etc/icinga2/zones.d/masterjochen/templates.conf', lines 12:3-12:22
  * source_location
    * first_column = 1
    * first_line = 42
    * last_column = 21
    * last_line = 42
    * path = "/usr/local/etc/icinga2/zones.d/masterjochen/templates.conf"
  * templates = [ "procs", "generic-service" ]
    % = modified in '/usr/local/etc/icinga2/zones.d/masterjochen/templates.conf', lines 42:1-42:21
    % = modified in '/usr/local/etc/icinga2/zones.d/masterjochen/templates.conf', lines 9:1-9:34
  * type = "Service"
  * vars = null
  * volatile = false
  * zone = "masterjochen"
    % = modified in '/usr/local/etc/icinga2/zones.d/masterjochen/templates.conf', lines 42:1-42:21

Well, I think that was it.
On the target system there was still

include_recursive "conf.d"

which is why there was also a local check. Damn me for this stupid error.
Thank you for the help Roland.

You’re welcome. And don’t worry about it; sometimes another set of eyes can be helpful.
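
For anyone running into the same symptom, here is a sketch of the corresponding change on the agent. It assumes the stock icinga2.conf layout, where the example objects in conf.d are pulled in by a single include line; the file location differs between platforms and is not shown here.

```icinga2
// icinga2.conf on the agent (sketch, assuming the default layout).
// The include is commented out so the agent no longer schedules its own
// local checks from conf.d next to the ones sent via command_endpoint.
// include_recursive "conf.d"
```

After a change like this, the configuration can be validated with `icinga2 daemon -C` before restarting the service. Other objects defined under conf.d on that machine stop being loaded as well, so it is worth checking whether anything there is still needed.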