Passive checks in a zone with 2 satellites

Hi!
I have an environment with many zones, each with 2 satellites.
I have some passive checks that work well when data is sent on a regular basis from all satellites (every 5 min).

But there is something that looks strange to me:
when a passive check result is sent from only one satellite, Icinga still considers the service overdue and puts it into an UNKNOWN soft state.
The result is that the service alternates between OK and UNKNOWN.

Example of a service with passive checks:

object Service "foo" {
    host_name = "bar"
    check_command = "dummy"
    max_check_attempts = 2
    check_interval = 10m
    retry_interval = 5m
    check_timeout = 1m
    enable_active_checks = true
    enable_passive_checks = true
    vars.dummy_state = 3
}

Result:
(screenshot of the service alternating between OK and UNKNOWN)

Is this normal behaviour?

I’m interested in what you’re trying to do here. dummy_state = 3 will cause an UNKNOWN whenever the active check runs.

Are the satellites aware of each other? They’ll both run all the checks if not. They should have each other’s IP addresses in their respective zones.conf but not their own.

This is a passive check service; the active check is only there as a freshness check.
This helped me to understand it: https://somoit.net/icinga/icinga2-understanding-checks-notification-types

The dummy_state 3 is only applied when the check_interval is overdue. In this case it seems the freshness timer is satellite-dependent rather than calculated once for the service, and that is my problem.

Here I can’t run the check on both satellites, because the script reports a value about the satellite it runs on.
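
For reference, the freshness pattern in play looks like this (a sketch of the service above with the optional vars.dummy_text added; the message text is only illustrative):

object Service "foo" {
    host_name = "bar"
    check_command = "dummy"
    check_interval = 10m            // a passive result is expected within this window
    enable_active_checks = true     // the dummy check only fires when no result arrives in time
    enable_passive_checks = true
    vars.dummy_state = 3            // report UNKNOWN when the passive result is overdue
    vars.dummy_text = "No passive check result received for 10 minutes"
}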

Hi,

Are these satellites connected to each other? Which version of Icinga is used here?

Cheers,
Michael

Hi Michael, thank you for joining us :slight_smile:
Yes, the satellites are on the same LAN, running version r2.11.2-1 on Ubuntu 16.04.

Ok, then please share the zones.conf to get an idea about the hierarchy.

Pretty simple:

object Zone "DC1" {
    parent = "master"
    endpoints = [ "sat1.lan", "sat2.lan" ]
}

Hm, and how are these check results fed into Icinga? Maybe you can share the script or sample calls.

To keep it simple: every 5 minutes I run a Python script on the satellite itself. It does its work and, when it finishes successfully, sends an API request to localhost port 5665 with this function:

import requests

def postIcingaServiceResult(status, hostname, servicename, output, perfs, user, password, checkHostName):
    # Submit a passive check result via the Icinga 2 API
    headers = {'Accept': 'application/json',
               'Content-type': 'application/json'}
    url = 'https://localhost:5665/v1/actions/process-check-result?service=%s!%s' % (
        hostname, servicename)
    payload = {
        'exit_status': status,
        'plugin_output': output,
        'performance_data': perfs,
        'check_source': checkHostName
    }
    r = requests.post(url=url, json=payload, auth=(
        user, password), verify=False, headers=headers)
    if r.status_code == 404:
        print('not found service=%s!%s' % (hostname, servicename))
This function is used by many passive checks and it works well.

In my context this is executed from one satellite only, with these parameters:

status: 0 (OK)
hostname: the satellite’s name in Icinga
servicename: the service I want to update
output: a string that says “script ok”
perfs: some perf data related to the script
user: Icinga API user
password: Icinga API password
checkHostName: the satellite’s name in Icinga
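
For illustration, a call would then look like this (all values are placeholders based on the list above):

postIcingaServiceResult(
    status=0,                     # OK
    hostname='sat1.lan',          # the satellite's host object in Icinga
    servicename='foo',            # the passive service to update
    output='script ok',
    perfs=['runtime=12s'],        # perf data from the script
    user='api-user',
    password='secret',
    checkHostName='sat1.lan'
)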

I don’t understand why the master does not merge the satellites’ results to determine the service state.

There’s a little more to your zone that can cause duplicate checks, mainly the endpoints. Can we see those? For example, satellite 1 would look like this:

object Endpoint "sat1.lan" {
}

object Endpoint "sat2.lan" {
       host = "10.xx.xx.xx"
}

Similarly, the endpoint objects for satellite 2 would have a host entry for satellite 1, but not itself. They need this in order to coordinate with each other. This gets really scary when they fire event handlers at the same time.
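
So on satellite 2, the mirror image would be something like this (the address is a placeholder):

object Endpoint "sat2.lan" {
}

object Endpoint "sat1.lan" {
       host = "10.xx.xx.xx"
}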

Hi Blake, here is my endpoints config (DNS resolution works fine).

On the master:

object Endpoint "sat1.lan" {
  host = "sat1.lan"
}
object Endpoint "sat2.lan" {
  host = "sat2.lan"
}

Nothing fancy here :confused:

On sat1:

object Endpoint "master.lan" {
        host = "master.lan"
        port = "5665"
}

object Zone "master" {
        endpoints = [ "master.lan" ]
}

object Endpoint "sat1.lan" {
}

object Zone "DC1" {
        endpoints = [ "sat1.lan" ]
        parent = "master"
}

On sat2:

object Endpoint "master.lan" {
        host = "master.lan";
        port = "5665";
}

object Zone "master" {
        endpoints = [ "master.lan" ];
}

object Endpoint NodeName {
}

object Zone "DC1" {
        endpoints = [ NodeName ];
        parent = "master";
}

Is Zone DC1 configured correctly in your opinion? Should it reference both satellites?

Just to be clear: when you said “There’s a little more to your zone that can cause duplicate checks”, do you assume that I’m running passive checks?

As said, the two satellites need to know about each other, so you should add an endpoint object for sat1 in the zones.conf of sat2 and vice versa.

Example from a satellite belonging to a zone with another satellite:

/*
 * Generated by Icinga 2 node setup commands
 * on 2018-03-13 16:46:05 +0100
 */

object Endpoint "master1" {
}

object Endpoint "master2" {
}

object Zone "master" {
        endpoints = [ "master1", "master2" ]
}

object Zone "global-templates" {
        global = true
}

object Zone "director-global" {
        global = true
}
object Endpoint "sat2" {
         host = "sat2" << should be in sat1 zones.conf and vice versa in sat2 zones.conf
}
object Endpoint "sat1" {
}
object Zone "satellite" {
        endpoints = [ "sat1", "sat2" ]
        parent = "master"
}

Side note: would you look at that. I just found out that my own zones.conf is missing the host = "sat2" entry… xD


Hi!
I’m Alesk’s coworker.

One thing to add: we have used the Director to configure the zones, and I’ve just seen that this is not recommended.

We get the same behaviour with this content in every satellite’s zones.conf file:

object Endpoint "master.lan" {
        host = "master.lan"
        port = "5665"
}

object Zone "master" {
        endpoints = [ "master.lan" ]
}

object Endpoint "sat1.lan" {
        host = "sat1.lan"
}

object Endpoint "sat2.lan" {
        host = "sat2.lan"
}

object Zone "DC1" {
        endpoints = [ "sat1.lan","sat2.lan" ]
        parent = "master"
}

object Zone "global-templates" {
        global = true
}

object Zone "director-global" {
        global = true
}

Is this a valid configuration for a master/satellite environment?

So on sat1.lan you can remove the host line for itself but leave sat2’s, and vice versa; this way they’re not trying to connect to themselves but will still connect to each other. You’ll then need to restart each satellite. On sat1.lan that would look like the sketch below.
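
A sketch of the corrected DC1 part of sat1.lan’s zones.conf (sat2.lan’s file would be the mirror image, with the host entry on sat1.lan instead):

object Endpoint "sat1.lan" {
        // no host entry: a node never connects to itself
}

object Endpoint "sat2.lan" {
        host = "sat2.lan"
}

object Zone "DC1" {
        endpoints = [ "sat1.lan", "sat2.lan" ]
        parent = "master"
}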

In fact, I was wrong in my last post …

It seems to work with the config in post #14.
I’ve done what you asked, but can you explain a little (I’m curious)?

Thank you !

In a top-down config, the zones.conf file on a server typically doesn’t have a host address in the endpoint object for itself. Examples of how the zones.conf file is written on different server types are provided here: https://icinga.com/docs/icinga2/latest/doc/06-distributed-monitoring/#three-levels-with-masters-satellites-and-agents

@blakehartshorn Thank you for all of this !

We have updated every zones.conf following these best practices; check results are now merged correctly and no flapping is visible anymore on the passive checks.

Issue solved :slight_smile:
