Satellite checks pending

Hi there,

Forgive me, as I'm relatively new to Icinga and my understanding is very basic. I have a multi-master configuration that works correctly with local or public-facing devices. I now want to add a satellite behind a firewall/NAT to monitor some remote devices.

I have added the satellite zone to the master, and the satellite seems to be pulling its config correctly; however, the satellite does not appear to know about the checks.

I've definitely made a configuration error somewhere, but I can't seem to find where.

In Icinga Web I see the hostalive checks stuck in PENDING, even though reachable is marked as 'yes'. The check source is blank too.

On the satellite I can see it is not aware of any pending checks:
[2019-07-05 11:53:26 +1000] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 0; Checks/s: 0

In /etc/icinga2/zones.conf I have created the satellite zone:

object Endpoint "satellite" {
}

object Zone "satelliteZone" {
    endpoints = [ "satellite" ]
    parent = "master"
}

In /etc/icinga2/zones.d/ on the master, I have a folder for the satellite zone which contains two folders for the satellite server itself and a Windows host in that network.

Contents of /etc/icinga2/zones.d/satelliteZone/satellite/hosts.conf:

object Host "satellite" {
    import "generic-host"
    address = "192.180.1.69"
    check_command = "hostalive"
    command_endpoint = "satellite"
}
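As a side note, since this file lives under zones.d/satelliteZone, my understanding from the docs is that the satellite zone already schedules this check itself, so the command_endpoint line may be redundant. A minimal sketch without it (same host, if I'm reading the docs right):

```
object Host "satellite" {
    import "generic-host"
    address = "192.180.1.69"
    check_command = "hostalive"
    // no command_endpoint: placing the file in zones.d/satelliteZone
    // already makes the satellite zone responsible for running this check
}
```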

The Windows host has the same problem; I assume it's the same cause. I can also see in the satellite logs that the Windows agent is sending its heartbeat.

On the master, I see in the logs:
[2019-07-05 11:56:05 +1000] debug/ApiListener: Not connecting to Endpoint 'satellite' because the host/port attributes are missing

But I am seeing:
[2019-07-05 11:56:01 +1000] notice/JsonRpcConnection: Received 'event::Heartbeat' message from 'satellite'

Sorry for the wall of text… I’ve tried to include as much information as possible.

I forgot to add: unfortunately we're not using Director. Once we get this working, we're going to migrate all of our config to it.

Hi, could you show us the /etc/icinga2/zones.conf on the master?
Is there a firewall in between? Did you allow the connection (IP and port)? The same applies to a local software firewall (e.g. firewalld).
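For example, you could check basic reachability from the satellite to a master on the default Icinga 2 API port 5665 (the hostname below is a placeholder for your master's address):

```
# from the satellite: test whether the master's API port is reachable
nc -zv master1.example.com 5665
```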

Here’s the master zones.conf:

object Endpoint "master1" {
    host = "10.0.1.1"
}

object Endpoint "master2" {
    host = "10.0.2.1"
}

object Zone "master-cluster" {
    endpoints = [ "master1", "master2" ]
}

object Endpoint "satellite" {
}

object Zone "satelliteZone" {
    endpoints = [ "satellite" ]
    parent = "master-cluster"
}

There is a firewall between the master and the satellite, so we’re trying to get the satellite to connect outbound to the masters (which have public IPs).

I can see the satellite connecting to the master in the debug.log:

[2019-07-05 15:25:33 +1000] notice/JsonRpcConnection: Received 'log::SetLogPosition' message from 'satellite'
[2019-07-05 15:25:38 +1000] notice/ApiListener: Setting log position for identity 'satellite': 2019/07/05 10:14:21
[2019-07-05 15:25:38 +1000] notice/JsonRpcConnection: Received 'log::SetLogPosition' message from 'satellite'
[2019-07-05 15:25:39 +1000] notice/JsonRpcConnection: Received 'event::Heartbeat' message from 'satellite'

So it doesn't appear to be a network issue; everything seems to be connected. From what I understand, the master will reuse the existing TCP connection, so we don't need bi-directional connectivity. I may be wrong on this, though (hoping I'm not, as our design relies on the satellites connecting to the masters rather than the other way around; we have a lot of devices behind firewalls that we don't manage).
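For completeness, this is roughly how I understand the satellite's own /etc/icinga2/zones.conf should look for that outbound-only design: the masters carry host attributes there, so the satellite establishes the connection through the firewall (addresses below are placeholders, not our real public IPs):

```
// satellite-side zones.conf sketch (placeholder addresses)
object Endpoint "master1" {
    host = "203.0.113.1"   // master's public IP; satellite dials out
}

object Endpoint "master2" {
    host = "203.0.113.2"
}

object Zone "master-cluster" {
    endpoints = [ "master1", "master2" ]
}

object Endpoint "satellite" {
    // no host attribute needed for the local endpoint
}

object Zone "satelliteZone" {
    endpoints = [ "satellite" ]
    parent = "master-cluster"
}
```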

Yay! Fixed :slight_smile: It was a config issue in the global zone all along.

This can be marked as closed!

EDIT: to add some context, we had some broken config sitting in the global zone. It applied correctly on the clients connecting directly to the masters, but when our satellite did its first config sync, it couldn't apply a certain line and errored out.

We never saw the error because it happened as soon as the service started and got lost in the rest of the logs. I expected the service to stop after receiving the bad config, but it kept running and appeared to be communicating with the masters, so we thought it was an issue with where the checks were being executed.

icinga2 daemon -C (config validation) pointed out the issue to us, and it was obvious once we saw it.
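In case it helps anyone finding this later, this is how we ran the validation on the satellite (it uses the default config path):

```
# parse and validate the configuration without starting the daemon;
# on a config error it exits non-zero and prints the offending file and line
icinga2 daemon -C
```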
