Service checks / remote monitoring of servers from Icinga clients

Dear Community,

I am trying to execute remote checks from Icinga clients.
The scripts run either snmpwalk or snmpget, or make curl API calls to remote servers where I am not able to install the Icinga agent or the NRPE agent on the monitored destination.
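
For illustration, one of these probes is wrapped in a CheckCommand along these lines (the command name, script name and custom variables here are only placeholders, not our real definitions):

object CheckCommand "remote_snmp_probe" {
  command = [ PluginDir + "/remote_snmp_probe.sh" ]   // wrapper script around snmpget/snmpwalk
  arguments = {
    "-H" = "$address$"               // target device the script queries
    "-C" = "$snmp_community$"        // placeholder custom variable
    "-o" = "$snmp_oid$"              // placeholder custom variable
  }
}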

I am looking for the proper way to configure and implement this, if it is even possible.

Currently the host objects are defined at the master level.
The apply-service configuration lives in global_templates and is distributed to the satellites and child clients through the master API.

Hence the checks are still being executed by the master server and not by the satellite or the Icinga client.

Is there a way to force the checks to be executed only by the Icinga satellite or Icinga client?

In the host object definition on the master I use:

vars.client_endpoint = "icinga-client-hostname"
zone = "satellite-parent-object"

In the global_templates I use the following in the apply service rules:

command_endpoint = host.vars.client_endpoint
assign where host.address && host.zone == "satellite-parent-object" && host.vars.client_endpoint && "HOSTGROUP" in host.groups

The point is that I need the check_command to be executed only from the Icinga client or from the Icinga satellite; either would do the job.

best regards

Peter

Hello and welcome!

TLDR; I think the solution to your problem is to define the hosts and/or services themselves in the Zone’s config file (ie zones.d/Zone1/services.conf), but you can read my thought process below.

I would start by checking out the distributed monitoring docs to ensure you’re set up correctly there.

I would especially check out this section of the docs regarding assigning services to a specific Zone. At a cursory glance it looks like you have done this, so something else might be missing here. Some things to check are:

  • Is the new config loaded?
  • Are the satellites set up properly?
  • Is the value of host.vars.client_endpoint actually the value you think it is? (On this note, you could and probably should put hosts in a satellite zone as opposed to the master to avoid confusion; a sketch follows this list.)
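
For example, a host pinned to a satellite zone would live under that zone’s directory on the config master, roughly like this (all names here are placeholders, adjust to your environment):

// /etc/icinga2/zones.d/Zone1/hosts.conf on the config master
object Host "host1.example.com" {
  address = "host1.example.com"
  check_command = "ping4"
  vars.client_endpoint = "agent1.example.com"   // endpoint that should execute the checks
  // no zone attribute needed: placement under zones.d/Zone1 puts the host in Zone1
}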

If you actually need different service checks for a host to live in different Zones, you could add the vars.client_endpoint to the service definition (a sketch follows the quote below), since as it stands it looks like you assign it to the master:

Currently the host objects are defined on the master level.
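
Something along these lines, as a rough sketch; here I set the endpoint directly on the service, but the same idea works with a custom var (all names are placeholders):

apply Service "app-api-check" {
  import "generic-service"
  check_command = "http"                      // placeholder check
  command_endpoint = "agent1.example.com"     // run just this service from a specific endpoint
  assign where "app-servers" in host.groups   // placeholder filter
}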

Side note:
We use the Director module for Icinga Web to pull in hosts from different sources: config management DBs, CSV files, etc. Using the sync rules we are able to assign check execution to specific Zones based on location (i.e., which datacenter is closest to the customer). You may want to use the Director in the future depending on the size of your environment and the services you provide, but then again, I used to work for an MSP that would edit Naemon config files with vim/nano every time we got a new client or device.

@steaksauce Thank you very much for the welcome :slight_smile:

I appreciate your prompt reaction to this thread. (Maybe the problem is between the chair and the keyboard, but I want to make sure.)

Now let me explain in more detail.

Maybe it is worth mentioning that we are running a pretty old Icinga version and should upgrade ASAP, of course (we are well aware of that). And yes, Director sounds good too.
We are running Icinga server version 2.4.2 :frowning:

We are monitoring around 15,000 Linux servers with host checks and around 800,000 applied service checks across all of them, using the NRPE agent and Icinga distributed checks. That has all been running fine for several years.

But here I am having a problem forcing Icinga to execute the check only from the source endpoint object of my choice, defined on the master in the master's zones.conf.

Obviously the service checks do work from the master server, but that is exactly what I want to avoid.

I want the check to be executed from a particular endpoint object defined in zones.conf, taking the load away from the master server and utilizing the unused resources of the satellites or the clients; that is the goal.

So I did as you proposed and moved the services.conf back to the master.
It looks like I must be missing something.
See my setup below.

Here is the zones.conf of the master01fqdn server

/etc/icinga2/zones.conf
########################################

// Endpoints
object Endpoint "master01fqdn" {
  host = "master01fqdn"
}

object Endpoint "satellite01fqdn" {
  host = "satellite01fqdn"
}

object Endpoint "client01fqdn" {
  host = "client01fqdn"
  log_duration = 0
}

object Endpoint "client02fqdn" {
  host = "client02fqdn"
  log_duration = 0
}

// Zones
object Zone "master01" {
  endpoints = [ "master01fqdn" ]
}

object Zone "satellite01" {
  endpoints = [ "satellite01fqdn" ]
  parent = "master01"
}

object Zone "satellite01-client" {
  endpoints = [ "client01fqdn", "client02fqdn" ]
  parent = "satellite01"
}

// Global settings
object Zone "global-templates" {
  global = true
}

#################################################

Here is the host object definition located on the master server in
/etc/icinga2/zones.d/master01/hosts.conf

object Host "someserver01fqdn" {
  display_name = "someserver01fqdn"
  address = "someserver01fqdn"
  check_command = "ping4"
  vars.domain = "com"
  vars.slevel = "production"
  vars.status = "production"
  groups = [ "slevel_production", "status_production" ]
  vars.client_endpoint = "satellite01fqdn"
  zone = "satellite01"
}

#####################################################

Here is the apply service check definition located on the master in
/etc/icinga2/zones.d/master01/services.conf

template Service "service_template_TEST_SERVICE" {
  import "generic-ec-service"
  check_command = "TEST_COMMAND"
  enable_notifications = false
  max_check_attempts = 9
  check_interval = 5m
  retry_interval = 2m
  enable_perfdata = false
}

apply Service "TEST_SERVICE" {
  import "service_template_TEST_SERVICE"
  command_endpoint = host.vars.client_endpoint
  assign where host.address && host.zone == "satellite01" && host.command_endpoint == host.vars.client_endpoint && "TEST_hostgroup" in host.groups
}

I did not post the hostgroup config, but someserver01fqdn matches it and is a member of that hostgroup.

The actual error I am getting is: terminated with exit code 128, output: execvpe
No such file or directory

Which is obvious, since the script is intentionally not present on the master server; I do not want the master to execute it, but rather the satellite, or to balance it between the clients in the satellite zone.

I hope this explains the challenge I am trying to solve.

best regards

Peter


Don’t thank me for a speedy response (and don’t always count on one), I just happen to look periodically throughout the day most days.

Worth noting we prefer markdown formatting per this guide.

Side note, I inherited a large instance (~25k+ hosts/services) that was several versions behind. We opted to do a side-by-side migration to avoid any upgrade issues.

So I did as you proposed and moved the services.conf back to the master.
It looks like I must be missing something.
See my setup below.

I didn’t say to move it back to the master, or at least I didn’t mean to; I thought it was already on the master:

TLDR; I think the solution to your problem is to define the hosts and/or services themselves in the Zone’s config file (ie zones.d/Zone1/services.conf ), but you can read my thought process below.

Move it back to the satellite zones; any errors there? It seems like vars.client_endpoint = "satellite01fqdn" is correct. Since it looks like the services are meant to be defined in the satellite (as expected), I would try commenting out the command_endpoint = host.vars.client_endpoint line; I think the default behavior is to execute in the zone the service gets defined in anyway. Roughly like the sketch below.
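
A sketch, reusing your object names (simplified assign condition, so adjust as needed):

// /etc/icinga2/zones.d/satellite01/services.conf on the config master
apply Service "TEST_SERVICE" {
  import "service_template_TEST_SERVICE"   // the template must also be visible to the satellite, e.g. via global-templates
  // command_endpoint = host.vars.client_endpoint   // try without this first
  assign where host.zone == "satellite01" && host.vars.client_endpoint && "TEST_hostgroup" in host.groups
}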


Leaving this part in for someone who finds this post looking for terminated with exit code 128 error, but it doesn’t pertain to you in this case:

The actual error I am getting is: terminated with exit code 128, output: execvpe
No such file or directory

Is this the error that you received when everything was defined in the satellite zone(s)?

IIRC, that’s related to the plugins/executed commands not existing on the satellite agents/nodes. Even if it’s not, it’s a good idea to have all of those synced up (we use Puppet to deploy the /usr/lib64/nagios/plugins directory: just place a new plugin on the Puppet master and let it do its thing; any config management tool or even an rsync/scp script would work).

Which check_command is Icinga 2 trying to execute when you see this error? Is it the nrpe command? I assume that all of your NRPE configs/scripts are synced on the monitoring targets, since this works on the master.


My apologies for not using the proper formatting.
Will improve that in the future.

Anyway, I have figured out the trick now.

  1. All the command object definitions have to be distributed at the global-templates level (to all the master's known children declared in zones.conf: satellites and clients), as you already mentioned,
    and the scripts / probes have to be present on the satellites and clients (all children).
    The command object must be known to everybody.
    Nothing else. (Of course the scripts must have the proper permissions set.)
    On the children the synced config ends up under icinga/var/lib_icinga2/api/zones/global-templates/_etc/commands.conf
    (see the command sketch after this list).

  2. All the other object definitions, like host objects, host groups, host checks, service template objects, service apply rules, and service group objects, are now defined at the master level only and not distributed down to the children.
    No host object or service object configuration is distributed at the global-templates level (to children like the satellite or the Icinga client).
    Everything is defined only on the master itself, e.g. in
    /etc/icinga2/zones.d/master01/services.conf
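
As a sketch of point 1 (the master-side path assumes the default layout, and the script name and arguments are placeholders for our real ones):

// /etc/icinga2/zones.d/global-templates/commands.conf on the master
object CheckCommand "TEST_COMMAND" {
  command = [ PluginDir + "/test_probe.sh" ]   // the script itself must exist on every endpoint that may run the check
  arguments = {
    "-H" = "$address$"
  }
}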

All the remote execution is now handled directly through the master server API; the script is actually executed by whichever endpoint the command_endpoint directive points to, which matches the assign rules defined at the master level (for the host object, to get the host checks working, and in the service template or service apply rules for the service checks), and the load is absorbed wherever I assign the command_endpoint among the children known to the master. A reduced sketch follows.
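
And a reduced sketch of point 2 plus the command_endpoint routing, reusing the object names from my earlier post (simplified assign condition):

// /etc/icinga2/zones.d/master01/services.conf, defined only on the master
apply Service "TEST_SERVICE" {
  import "service_template_TEST_SERVICE"
  command_endpoint = host.vars.client_endpoint   // the satellite or client that actually runs the script
  assign where host.zone == "satellite01" && host.vars.client_endpoint && "TEST_hostgroup" in host.groups
}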

If needed, I can share some templates and examples.

Thank you