Check_procs alert if service is not running

Hello,

I’m sure this is very simple… but I can’t seem to find what I’m looking for. I really have 2 problems.

  1. I can’t seem to figure out how vars.procs_warning and vars.procs_critical actually work under the hood. My intention is actually binary… if the service is running, we should be good “OK”… and if the service is not running, Icinga2 should alert “CRITICAL”. I’ve been messing around trying to figure out the right formula for a day and a half and finally decided it’s just best to ask :slight_smile:

  2. When I run the check directly on the target running as the nagios user (or as the root user), I get an accurate result… but when I run the test via Icingaweb2 I get a 0 back no matter what. It’s almost like the check isn’t even running on the remote host maybe?

Here are what I think are the most relevant configurations:

template Host "generic-host" {
  max_check_attempts = 3
  check_interval = 1m
  retry_interval = 30s
  
  check_command = "hostalive"

  vars.notification["mail"] = {
    groups = [ "icingaadmins" ]
  }
}

template Service "generic-service" {
  max_check_attempts = 5
  check_interval = 1m
  retry_interval = 30s
}

object Host "elastic01.fqdn.domain.net" {
  import "generic-host"
  address = "10.0.0.5"
  vars.os = "Ubuntu"
  vars.os_type = "Linux" 

  vars.client_endpoint = name
}

apply Service "kibana" {
  import "generic-service"
  
  check_command = "procs"
  vars.procs_warning = "1:1"
  vars.procs_critical = "0:2"
  vars.procs_argument = "/usr/share/kibana"
  host_name = "elastic01.fqdn.domain.net"  
 
  assign where host.name == "elastic01.fqdn.domain.net"
}

Which seems to yield the following check command line:

nagios@elastic01:/root$ '/usr/lib/nagios/plugins/check_procs' '-a' '/usr/share/kibana' '-c' '0:2' '-w' '1:1'
PROCS OK: 1 process with args '/usr/share/kibana' | procs=1;1:1;0:2;0;

However, when Icinga2 runs it… it always returns 0, which SHOULD be a CRITICAL not a WARNING.

Full disclosure: I posted the same question on monitoring-portal.org, which seems like a very similar site… I’m not 100% sure what the relationship between this site and that site is, but I just wanted to make sure my bases are covered.

Hi & welcome to the icinga community.

Try this:

vars.procs_critical = "1:"

Assuming the host to be checked is set up as an agent, you also need this in your service definition:

command_endpoint = host.name
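
For illustration, the two suggestions combined might look like the sketch below. It reuses the names from the original post; `host.vars.client_endpoint` is an assumption based on the `vars.client_endpoint = name` line in the Host object, and the rule only validates once an Endpoint object with that name exists. The warning threshold is deliberately left unset here.

```
apply Service "kibana" {
  import "generic-service"

  check_command = "procs"

  // Run the plugin on the agent rather than on the master.
  // Requires an Endpoint object whose name matches this value.
  command_endpoint = host.vars.client_endpoint

  // Plugin range syntax: "1:" means "1 or more is fine";
  // a count of 0 falls outside the range and raises CRITICAL.
  vars.procs_critical = "1:"
  vars.procs_argument = "/usr/share/kibana"

  assign where host.vars.client_endpoint
}
```

With the original values, `-c 0:2` only alerts when the count is outside 0..2 and `-w 1:1` alerts whenever the count is not exactly 1, so zero processes yields WARNING rather than CRITICAL.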

Hey, thanks for the response Roland… if I add

command_endpoint = host.name to the apply Service rule, Icinga2 fails to start.

The critical threshold does seem to work, though :+1:

Most probably your zone config is wrong. But without any details, or at least the error message, it’s not possible (at least for me) to give any advice.

Hey Roland, I get that… and again… thanks for the help. It took me a minute to figure out which logs are valuable and what not…

Here are the results of /var/log/icinga2/icinga2.log with debug on…

Jan 03 02:16:31 management01 icinga2[21185]: [2020-01-03 02:16:31 +0000] debug/HostGroup: Assigning membership for group 'linux-servers' to host 'docker01.fqdn.domain.net'
Jan 03 02:16:31 management01 icinga2[21185]: [2020-01-03 02:16:31 +0000] debug/HostGroup: Assigning membership for group 'windows-servers' to host 'dc01.fqdn.domain.net'
Jan 03 02:16:31 management01 icinga2[21185]: [2020-01-03 02:16:31 +0000] debug/HostGroup: Assigning membership for group 'linux-servers' to host 'management01.fqdn.domain.net'
Jan 03 02:16:31 management01 icinga2[21185]: [2020-01-03 02:16:31 +0000] debug/HostGroup: Assigning membership for group 'linux-servers' to host 'ns01.fqdn.domain.net'

Jan 03 02:16:31 management01 icinga2[21185]: [2020-01-03 02:16:31 +0000] critical/config: Error: Validation failed for object 'elastic01.fqdn.domain.net!kibana' of type 'Service'
Jan 03 02:16:31 management01 icinga2[21185]: Location: in /etc/icinga2/conf.d/elastic_services.conf: 4:3-4:30
Jan 03 02:16:31 management01 icinga2[21185]: /etc/icinga2/conf.d/elastic_services.conf(2):   import "generic-service"
Jan 03 02:16:31 management01 icinga2[21185]: /etc/icinga2/conf.d/elastic_services.conf(3):
Jan 03 02:16:31 management01 icinga2[21185]: /etc/icinga2/conf.d/elastic_services.conf(4):   command_endpoint = host.name
Jan 03 02:16:31 management01 icinga2[21185]:                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Jan 03 02:16:31 management01 icinga2[21185]: /etc/icinga2/conf.d/elastic_services.conf(5):
Jan 03 02:16:31 management01 icinga2[21185]: /etc/icinga2/conf.d/elastic_services.conf(6):   check_command = "procs"
Jan 03 02:16:31 management01 icinga2[21185]: [2020-01-03 02:16:31 +0000] critical/config: 1 error
Jan 03 02:16:31 management01 icinga2[21185]: [2020-01-03 02:16:31 +0000] notice/WorkQueue: Stopped WorkQueue threads for 'IdoMysqlConnection, ido-mysql'
Jan 03 02:16:31 management01 icinga2[21185]: [2020-01-03 02:16:31 +0000] notice/WorkQueue: Stopped WorkQueue threads for 'DaemonUtility::LoadConfigFiles'
Jan 03 02:16:31 management01 icinga2[21185]: [2020-01-03 02:16:31 +0000] critical/cli: Config validation failed. Re-run with 'icinga2 daemon -C' after fixing the config.
Jan 03 02:16:31 management01 icinga2[21185]: [2020-01-03 02:16:31 +0000] notice/cli: Worker process couldn't load its config
Jan 03 02:16:31 management01 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=icinga2 comm="systemd" exe="/lib/systemd/systemd" hostname=? addr=? termina
Jan 03 02:16:31 management01 systemd[1]: icinga2.service: Main process exited, code=exited, status=1/FAILURE
Jan 03 02:16:31 management01 systemd[1]: icinga2.service: Failed with result 'exit-code'.
Jan 03 02:16:31 management01 systemd[1]: Failed to start Icinga host/service/network monitoring system.
-- Subject: Unit icinga2.service has failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- Unit icinga2.service has failed.
-- 
-- The result is RESULT.

And here are the results of icinga2 daemon -C

[2020-01-03 02:18:17 +0000] information/cli: Icinga application loader (version: r2.11.2-1)
[2020-01-03 02:18:17 +0000] information/cli: Loading configuration file(s).
[2020-01-03 02:18:17 +0000] information/ConfigItem: Committing config item(s).
[2020-01-03 02:18:17 +0000] information/ApiListener: My API identity: management01.fqdn.domain.net
[2020-01-03 02:18:17 +0000] critical/config: Error: Validation failed for object 'elastic01.fqdn.domain.net!kibana' of type 'Service'; Attribute 'command_endpoint': Object 'elastic01.fqdn.domain.net' of type 'Endpoint' does not exist.
Location: in /etc/icinga2/conf.d/elastic_services.conf: 4:3-4:30
/etc/icinga2/conf.d/elastic_services.conf(2):   import "generic-service"
/etc/icinga2/conf.d/elastic_services.conf(3):   
/etc/icinga2/conf.d/elastic_services.conf(4):   command_endpoint = host.name
                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
/etc/icinga2/conf.d/elastic_services.conf(5): 
/etc/icinga2/conf.d/elastic_services.conf(6):   check_command = "procs"

[2020-01-03 02:18:17 +0000] critical/config: 1 error
[2020-01-03 02:18:17 +0000] critical/cli: Config validation failed. Re-run with 'icinga2 daemon -C' after fixing the config.

Let me know if there is something else you might need… I’m learning :slight_smile:

As I already expected, your zone config is wrong. To help with this, we would need more info about your setup, e.g. master, satellite(s), agents and their zones.conf. You don’t use the Director, right?

I have director installed, but I don’t use it as most tutorials available seem to be based on config files. My setup is currently 1 Icinga2 master and a few Linux boxes with the agent on it. I used this tutorial to get Icinga2 and Icingaweb2 installed --> https://www.howtoforge.com/how-to-install-icinga-2-monitoring-on-ubuntu-1804/

Below seem to be the relevant configs and files related to zones.

root@management01:/etc/icinga2# cat constants.conf 

const PluginDir = "/usr/lib/nagios/plugins"
const ManubulonPluginDir = "/usr/lib/nagios/plugins"
const PluginContribDir = "/usr/lib/nagios/plugins"
const NodeName = "management01.fqdn.domain.net"
const ZoneName = "master"
const TicketSalt = "xxxxxxxxxxxxxxxxxxxxxxxxxxxx"


root@management01:/etc/icinga2# cat icinga2.conf 

include "constants.conf"
include "zones.conf"
include <itl>
include <plugins>
include <plugins-contrib>
include <manubulon>
include <windows-plugins>
include <nscp>
include "features-enabled/*.conf"
include_recursive "conf.d"

root@management01:/etc/icinga2# cat zones.conf 

/*
 * Generated by Icinga 2 node setup commands
 * on 2019-12-27 04:16:47 +0000
 */

object Endpoint "management01.fqdn.domain.net" {
}

object Zone "master" {
	endpoints = [ "management01.fqdn.domain.net" ]
}

object Zone "global-templates" {
	global = true
}

object Zone "director-global" {
	global = true
}

root@management01:/etc/icinga2/zones.d# ls -alhr
total 12K
-rw-r--r-- 1 root   root    119 Oct 24 08:38 README
drwxr-x--- 8 nagios nagios 4.0K Jan  3 15:57 ..
drwxr-x--- 2 nagios nagios 4.0K Jan  3 02:09 .

As you can see, I haven’t set up or changed anything specific to zones… I think I’m trying to implement the “top down” configuration, but as far as I can tell I don’t really need segmented zones or satellite nodes. I have a small home lab (practicing skills for a corporate implementation at some point). I was trying to have all checks sourced from the master node (ping, http, etc…), with the exception of checks on running services, which would obviously need to run on the agent. Now that I look, it actually seems that all my checks are returning the same value… which leads me to believe I’ve definitely skipped a step somewhere. Any help is appreciated… thanks

Checks like check_procs need to be executed locally, that is, on your agent. Every agent needs its own zone (with master as its parent), and this is missing from your setup.
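
A minimal sketch of what that could look like in zones.conf on the master, assuming the agent's endpoint and zone are both named after its FQDN and that the master opens the connection to the agent at the address from the Host object. The agent needs a matching zones.conf of its own (normally generated by the node setup commands), which is not shown here.

```
object Endpoint "elastic01.fqdn.domain.net" {
  // Assumption: the master connects to the agent; omit `host`
  // if the agent connects to the master instead.
  host = "10.0.0.5"
}

object Zone "elastic01.fqdn.domain.net" {
  endpoints = [ "elastic01.fqdn.domain.net" ]
  parent = "master"
}
```

With an Endpoint of that name defined, the "Object 'elastic01.fqdn.domain.net' of type 'Endpoint' does not exist" validation error should go away; re-running icinga2 daemon -C before restarting is a quick way to confirm.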