Stuck with "PENDING" agents and "no Endpoint object found for identity"

This is my very first post here, so let me say hi to everyone and thank you for your daily effort on this great product!
Back to business…

I have been struggling with a frustrating issue on my new pre-production setup for 3 days now. As you will see, my master runs the Icinga2 RC1 package, so I'm not expecting anyone to dig beyond a quick check for obvious configuration mistakes.

I have 1 master “passive”, 1 satellite connecting to the master and 2 agents connecting to the satellite.
I already have a fully functional Icinga2 deployment, but here there is no Director and everything is set up through Puppet, so there isn't much GUI help (nor do I want it there).
Services on the master and the satellite work well, but those on the agents are stuck in PENDING state in Icinga Web (I already tried forcing a manual re-check).

I see the following messages on the satellite:
New client connection for identity 'agent-01.local' from [X.X.X.X]:35910 (no Endpoint object found for identity)
New client connection for identity 'agent-02.local' from [X.X.X.X]:35910 (no Endpoint object found for identity)

I really cannot understand what is wrong there. I've been looking for similar issues, but I cannot find much in common with my case. The certificates are OK, as confirmed by the following messages during the connection:
Received certificate request for CN '......' signed by our CA
The certificate for CN '......' is valid and uptodate. Skipping automated renewal.

Changes done on the master node seem to “propagate” but something is still wrong.
Any hint would be really appreciated.

Thanks.

NODES:
master: mon-01.local (Fedora Server 32, package: icinga2-2.12.0.rc1.48)
satellite: mon-02.local (Debian 10.4, package: icinga2-2.10.3-2)
agent1: agent-01.local (Debian 10.4, package: icinga2-2.10.3-2)
agent2: agent-02.local (Debian 10.3, package: icinga2-2.10.3-2)

ZONES:
master
sat-01

[root@mon-01.local]# cat /etc/icinga2/zones.conf

object Endpoint "mon-01.local" {
  // host = "mon-01.local" // already tried this
}

object Endpoint "mon-02.local" {
  // host = "mon-02.local" // already tried this
}

object Zone "global-templates" {
  global = true
}

object Zone "master" {
  endpoints = [ "mon-01.local", ]
}

object Zone "sat-01" {
  endpoints = [ "mon-02.local" ]
  parent = "master"
}

[root@mon-02.local]# cat /etc/icinga2/zones.conf

object Endpoint "mon-01.local" {
  host = "mon-01.local"
}

object Endpoint "mon-02.local" {
  host = "mon-02.local"
}

object Zone "global-templates" {
  global = true
}

object Zone "master" {
  endpoints = [ "mon-01.local", ]
}

object Zone "sat-01" {
  endpoints = [ "mon-02.local", ]
  parent = "master"
}

[root@agent-01.local]# cat /etc/icinga2/zones.conf

object Endpoint "agent-01.local" {
}

object Endpoint "mon-02.local" {
  host = "mon-02.local"
}

object Zone "agent-01.local" {
  endpoints = [ "agent-01.local", ]
  parent = "sat-01"
}

object Zone "global-templates" {
  global = true
}

object Zone "sat-01" {
  endpoints = [ "mon-02.local", ]
}

[root@agent-02.local]# cat /etc/icinga2/zones.conf

object Endpoint "mon-02.local" {
  host = "mon-02.local"
}

object Endpoint "agent-02.local" {
}

object Zone "agent-02.local" {
  endpoints = [ "agent-02.local", ]
  parent = "sat-01"
}

object Zone "global-templates" {
  global = true
}

object Zone "sat-01" {
  endpoints = [ "mon-02.local", ]
}

[root@mon-01.local]# cat /etc/icinga2/zones.d/master/hosts.conf

object Host "mon-01.local" {
  address = "X.X.X.X"
  groups = [ "linux-nodes", ]
  display_name = "mon-01.local"
  check_command = "hostalive"
}

[root@mon-01.local]# cat /etc/icinga2/zones.d/master/hostgroups.conf

object HostGroup "linux-nodes" {
  display_name = "Linux Servers"
  groups = [ "linux-nodes", ]
  assign where host.vars.os == "linux"
}

[root@mon-01.local]# cat /etc/icinga2/zones.d/master/services.conf

apply Service "check_ssh" {
  import "generic-service"

  check_command = "ssh"
  assign where (host.address || host.address6) && host.vars.os == "Linux"
}

apply Service for (disk_name => config in host.vars.disks) {
  import "generic-service"

  check_command = "disk"
  command_endpoint = host.name
  vars += config
  assign where host.vars.os == "Linux"
  ignore where host.vars.noagent
}

apply Service "load" {
  import "generic-service"

  check_command = "load"
  assign where host.name == NodeName
}

[root@mon-01.local]# cat /etc/icinga2/zones.d/master/api-users.conf

object ApiUser "admin" {
  password = "xxxxxxxxxxxx"
  permissions = [ "status/query", "actions/*", "objects/modify/*", "objects/query/*", ]
  // permissions = [ "*" ] // already tried this
}

[root@mon-01.local]# cat /etc/icinga2/zones.d/sat-01/hosts.conf

object Host "mon-02.local" {
  address = "X.X.X.X"
  groups = [ "linux-nodes", ]
  display_name = "mon-02.local"
  check_command = "hostalive"
  vars.client_endpoint = name
}

object Host "agent-01.local" {
  address = "X.X.X.X"
  groups = [ "linux-nodes", ]
  display_name = "agent-01.local"
  check_command = "hostalive"
  vars.client_endpoint = name
}

object Zone "agent-01.local" {
  endpoints = [ "agent-01.local" ]
  parent = "sat-01"
}

object Endpoint "agent-01.local" {
  log_duration = 0 // Disable the replay log for command endpoint agents
}

object Host "agent-02.local" {
  address = "X.X.X.X"
  groups = [ "linux-nodes", ]
  display_name = "agent-02.local"
  check_command = "hostalive"
  vars.client_endpoint = name
}

object Zone "agent-02.local" {
  endpoints = [ "agent-02.local" ]
  parent = "sat-01"
}

object Endpoint "agent-02.local" {
  log_duration = 0 // Disable the replay log for command endpoint agents
}

[root@mon-01.local]# cat /etc/icinga2/zones.d/sat-01/services.conf

apply Service "sat-check_ssh" {
  import "generic-service"
  check_command = "ssh"
  assign where host.zone == "sat-01"
}

apply Service "sat-icinga" {
  import "generic-service"

  check_command = "icinga"
  assign where host.zone == "sat-01"
}

apply Service for (disk_name => config in host.vars.disks) {
  import "generic-service"

  check_command = "disk"
  command_endpoint = host.name
  vars += config
  assign where host.zone == "sat-01"
  ignore where host.vars.noagent
}

apply Service "sat-load" {
  import "generic-service"

  check_command = "load"
  assign where host.zone == "sat-01"
}

As of 2.11, you need to configure every Zone and Endpoint object in zones.conf only, not in zones.d.
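Applied to the files above, a minimal sketch: /etc/icinga2/zones.d/sat-01/hosts.conf on the master would keep only the Host objects, and the Zone and Endpoint objects for the agents (including the `log_duration = 0` setting) would move to zones.conf. The attribute values are copied from the posted file; only the split between the two files is the suggested change.

```
// /etc/icinga2/zones.d/sat-01/hosts.conf on mon-01.local (sketch):
// Host objects only. The "agent-01.local"/"agent-02.local" Zone and
// Endpoint objects are no longer declared here; they go to zones.conf.
object Host "agent-01.local" {
  address = "X.X.X.X"
  groups = [ "linux-nodes", ]
  display_name = "agent-01.local"
  check_command = "hostalive"
  vars.client_endpoint = name
}

// ... "mon-02.local" and "agent-02.local" Host objects stay here unchanged
```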


Hi Roland, and thanks for the reply.
I've removed the zones and endpoints from hosts.conf and retried, but that didn't fix it.
I also tried moving that part (zones and endpoints) into the master's zones.conf, with the same result and the same messages on the satellite.

There are also the following messages, which I forgot to include in my initial post:

[2020-06-18 14:21:01 +0000] warning/ApiListener: No data received on new API connection for identity ‘agent-01.local’. Ensure that the remote endpoints are properly configured in a cluster setup.
Context:
(0) Handling new API client connection
[2020-06-18 14:21:11 +0000] information/ApiListener: New client connection for identity ‘agent-01.local’ from [X.X.X.X]:56006 (no Endpoint object found for identity)

I don't get much insight from these messages, and I cannot see any errors anywhere.
I'm stuck. :(


Every node in the chain to an agent needs that agent's Zone and Endpoint definitions, e.g. the Zone and Endpoint objects for agent-01.local must be in the zones.conf of mon-01.local, mon-02.local and agent-01.local.
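As a sketch for this setup, the block below would go, unchanged, into /etc/icinga2/zones.conf on mon-01.local, mon-02.local and agent-01.local; an equivalent block for agent-02.local would go on mon-01.local, mon-02.local and agent-02.local. Leaving out the `host` attribute assumes the agents open the connection to the satellite (as in the posted agent configs), not the other way around.

```
// Agent objects for agent-01.local (sketch).
// Identical in /etc/icinga2/zones.conf on mon-01.local, mon-02.local and agent-01.local.
object Endpoint "agent-01.local" {
  // No "host" attribute: assumes the agent connects to the satellite.
  log_duration = 0 // Disable the replay log for command endpoint agents
}

object Zone "agent-01.local" {
  endpoints = [ "agent-01.local" ]
  parent = "sat-01"
}
```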


Thanks a lot! I had missed the concept that every zone and endpoint definition for an agent must be present along its chain up to the master. That fixed all the previously posted messages.
Unfortunately, the services are still permanently in PENDING state and, just like before, I have no idea why.

It may be worth mentioning that moving services.conf to global-templates on the master makes the satellite log:
Ignoring config update for zone 'global-templates' because we have an authoritative version of the zone's config.
To my understanding, this shouldn't happen, because the directory /var/lib/icinga2/api/zones on the satellite (just like on the agents) is empty, hence there is no .authoritative file anywhere below the master.


There was a bug which was fixed in 2.11. Therefore, I'd recommend updating (at least) your satellite to 2.11.4.


I upgraded to 2.11.4-1 (buster).
The output changed a little, but that didn't fix the issue; the services are still in PENDING state.

(MASTER)

[2020-06-19 10:21:09 +0000] information/ApiListener: Started new listener on ‘[0.0.0.0]:5665’
[2020-06-19 10:21:09 +0000] information/DbConnection: ‘ido-pgsql’ started.
[2020-06-19 10:21:09 +0000] information/NotificationComponent: ‘notification’ started.
[2020-06-19 10:21:09 +0000] information/CheckerComponent: ‘checker’ started.
[2020-06-19 10:21:09 +0000] information/ConfigItem: Activated all objects.
[2020-06-19 10:21:09 +0000] information/IdoPgsqlConnection: ‘ido-pgsql’ resumed.
[2020-06-19 10:21:09 +0000] information/DbConnection: Resuming IDO connection: ido-pgsql
[2020-06-19 10:21:09 +0000] information/IdoPgsqlConnection: PGSQL IDO instance id: 1 (schema version: ‘1.14.3’)
[2020-06-19 10:21:09 +0000] information/IdoPgsqlConnection: Finished reconnecting to ‘ido-pgsql’ database ‘icinga’ in 0.118227 second(s).
[2020-06-19 10:21:19 +0000] information/ApiListener: New client connection for identity ‘mon-02.local’ from [X.X.X.X]:52030
[2020-06-19 10:21:19 +0000] information/ApiListener: Sending config updates for endpoint ‘mon-02.local’ in zone ‘sat-01’.
[2020-06-19 10:21:19 +0000] information/ApiListener: Syncing configuration files for global zone ‘global-templates’ to endpoint ‘mon-02.local’.
[2020-06-19 10:21:19 +0000] information/ApiListener: Syncing configuration files for zone ‘sat-01’ to endpoint ‘mon-02.local’.
[2020-06-19 10:21:19 +0000] information/ApiListener: Finished sending config file updates for endpoint ‘mon-02.local’ in zone ‘sat-01’.
[2020-06-19 10:21:19 +0000] information/ApiListener: Syncing runtime objects to endpoint ‘mon-02.local’.
[2020-06-19 10:21:19 +0000] information/ApiListener: Finished syncing runtime objects to endpoint ‘mon-02.local’.
[2020-06-19 10:21:19 +0000] information/ApiListener: Finished sending runtime config updates for endpoint ‘mon-02.local’ in zone ‘sat-01’.
[2020-06-19 10:21:19 +0000] information/ApiListener: Sending replay log for endpoint ‘mon-02.local’ in zone ‘sat-01’.
[2020-06-19 10:21:19 +0000] information/ApiListener: Replayed 24 messages.
[2020-06-19 10:21:19 +0000] information/ApiListener: Finished sending replay log for endpoint ‘mon-02.local’ in zone ‘sat-01’.
[2020-06-19 10:21:19 +0000] information/ApiListener: Finished syncing endpoint ‘mon-02.local’ in zone ‘sat-01’.
[2020-06-19 10:21:19 +0000] information/JsonRpcConnection: Received certificate request for CN ‘mon-02.local’ signed by our CA.
[2020-06-19 10:21:19 +0000] information/JsonRpcConnection: The certificate for CN ‘mon-02.local’ is valid and uptodate. Skipping automated renewal.
[2020-06-19 10:21:19 +0000] information/WorkQueue: #5 (ApiListener, RelayQueue) items: 0, rate: 0.55/s (33/min 33/5min 33/15min);
[2020-06-19 10:21:19 +0000] information/WorkQueue: #6 (ApiListener, SyncQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-06-19 10:21:19 +0000] information/WorkQueue: #7 (IdoPgsqlConnection, ido-pgsql) items: 0, rate: 0.25/s (15/min 15/5min 15/15min);
[2020-06-19 10:21:39 +0000] information/WorkQueue: #7 (IdoPgsqlConnection, ido-pgsql) items: 7, rate: 0.85/s (51/min 51/5min 51/15min); empty in 9 seconds
[2020-06-19 10:21:49 +0000] information/WorkQueue: #7 (IdoPgsqlConnection, ido-pgsql) items: 1, rate: 1.16667/s (70/min 70/5min 70/15min);
[2020-06-19 10:22:09 +0000] information/WorkQueue: #7 (IdoPgsqlConnection, ido-pgsql) items: 6, rate: 1.76667/s (106/min 109/5min 109/15min); empty in 10 seconds
[2020-06-19 10:22:59 +0000] information/WorkQueue: #7 (IdoPgsqlConnection, ido-pgsql) items: 6, rate: 1.78333/s (107/min 202/5min 202/15min); empty in 9 seconds
[2020-06-19 10:23:09 +0000] information/WorkQueue: #7 (IdoPgsqlConnection, ido-pgsql) items: 6, rate: 1.75/s (105/min 221/5min 221/15min); empty in infinite time, your task handler isn’t able to keep up
[2020-06-19 10:23:49 +0000] information/WorkQueue: #7 (IdoPgsqlConnection, ido-pgsql) items: 5, rate: 1.75/s (105/min 297/5min 297/15min);
[2020-06-19 10:24:09 +0000] information/WorkQueue: #7 (IdoPgsqlConnection, ido-pgsql) items: 6, rate: 1.78333/s (107/min 335/5min 335/15min);
[2020-06-19 10:24:39 +0000] information/WorkQueue: #7 (IdoPgsqlConnection, ido-pgsql) items: 5, rate: 1.75/s (105/min 392/5min 392/15min);
[2020-06-19 10:24:59 +0000] information/WorkQueue: #7 (IdoPgsqlConnection, ido-pgsql) items: 6, rate: 1.75/s (105/min 426/5min 426/15min); empty in 9 seconds
[2020-06-19 10:25:39 +0000] information/WorkQueue: #7 (IdoPgsqlConnection, ido-pgsql) items: 6, rate: 1.75/s (105/min 504/5min 504/15min); empty in 9 seconds
[2020-06-19 10:25:49 +0000] information/WorkQueue: #7 (IdoPgsqlConnection, ido-pgsql) items: 6, rate: 1.75/s (105/min 521/5min 521/15min); empty in infinite time, your task handler isn’t able to keep up
[2020-06-19 10:25:59 +0000] information/WorkQueue: #7 (IdoPgsqlConnection, ido-pgsql) items: 6, rate: 1.75/s (105/min 538/5min 538/15min); empty in infinite time, your task handler isn’t able to keep up
[2020-06-19 10:26:09 +0000] information/WorkQueue: #7 (IdoPgsqlConnection, ido-pgsql) items: 6, rate: 1.75/s (105/min 556/5min 559/15min); empty in infinite time, your task handler isn’t able to keep up
[2020-06-19 10:26:09 +0000] information/ConfigObject: Dumping program state to file ‘/var/lib/icinga2/icinga2.state’
[2020-06-19 10:26:29 +0000] information/WorkQueue: #5 (ApiListener, RelayQueue) items: 0, rate: 0.25/s (15/min 81/5min 117/15min);
[2020-06-19 10:26:29 +0000] information/WorkQueue: #6 (ApiListener, SyncQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-06-19 10:26:29 +0000] information/WorkQueue: #7 (IdoPgsqlConnection, ido-pgsql) items: 5, rate: 1.75/s (105/min 557/5min 599/15min);
[2020-06-19 10:29:29 +0000] information/WorkQueue: #7 (IdoPgsqlConnection, ido-pgsql) items: 5, rate: 1.73333/s (104/min 557/5min 937/15min); empty in 10 seconds
[2020-06-19 10:29:49 +0000] information/WorkQueue: #7 (IdoPgsqlConnection, ido-pgsql) items: 5, rate: 1.73333/s (104/min 555/5min 971/15min); empty in 9 seconds
[2020-06-19 10:29:59 +0000] information/WorkQueue: #7 (IdoPgsqlConnection, ido-pgsql) items: 5, rate: 1.75/s (105/min 559/5min 992/15min); empty in infinite time, your task handler isn’t able to keep up
[2020-06-19 10:30:09 +0000] information/WorkQueue: #7 (IdoPgsqlConnection, ido-pgsql) items: 6, rate: 1.75/s (105/min 557/5min 1011/15min);
[2020-06-19 10:30:59 +0000] information/WorkQueue: #7 (IdoPgsqlConnection, ido-pgsql) items: 6, rate: 1.73333/s (104/min 557/5min 1104/15min); empty in 9 seconds
[2020-06-19 10:31:09 +0000] information/ConfigObject: Dumping program state to file ‘/var/lib/icinga2/icinga2.state’
[2020-06-19 10:31:29 +0000] information/WorkQueue: #7 (IdoPgsqlConnection, ido-pgsql) items: 6, rate: 1.73333/s (104/min 555/5min 1161/15min);
[2020-06-19 10:31:38 +0000] information/WorkQueue: #5 (ApiListener, RelayQueue) items: 0, rate: 0.266667/s (16/min 79/5min 196/15min);
[2020-06-19 10:31:38 +0000] information/WorkQueue: #6 (ApiListener, SyncQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-06-19 10:31:39 +0000] information/WorkQueue: #7 (IdoPgsqlConnection, ido-pgsql) items: 0, rate: 1.73333/s (104/min 555/5min 1178/15min);
[2020-06-19 10:31:59 +0000] information/WorkQueue: #7 (IdoPgsqlConnection, ido-pgsql) items: 6, rate: 1.76667/s (106/min 557/5min 1218/15min); empty in 9 seconds
[2020-06-19 10:33:09 +0000] information/WorkQueue: #7 (IdoPgsqlConnection, ido-pgsql) items: 6, rate: 1.73333/s (104/min 558/5min 1351/15min); empty in 9 seconds
[2020-06-19 10:34:29 +0000] information/WorkQueue: #7 (IdoPgsqlConnection, ido-pgsql) items: 6, rate: 1.73333/s (104/min 554/5min 1499/15min);
[2020-06-19 10:35:19 +0000] information/WorkQueue: #7 (IdoPgsqlConnection, ido-pgsql) items: 5, rate: 1.73333/s (104/min 555/5min 1592/15min); empty in 9 seconds
[2020-06-19 10:35:29 +0000] information/WorkQueue: #7 (IdoPgsqlConnection, ido-pgsql) items: 6, rate: 1.7/s (102/min 552/5min 1609/15min); empty in 59 seconds
[2020-06-19 10:36:09 +0000] information/ConfigObject: Dumping program state to file ‘/var/lib/icinga2/icinga2.state’
[2020-06-19 10:36:39 +0000] information/WorkQueue: #7 (IdoPgsqlConnection, ido-pgsql) items: 6, rate: 1.76667/s (106/min 556/5min 1685/15min);

(SATELLITE)

[2020-06-19 10:21:19 +0000] information/ApiListener: Reconnecting to endpoint ‘mon-01.local’ via host ‘X.X.X.X’ and port ‘5665’
[2020-06-19 10:21:19 +0000] information/ApiListener: New client connection for identity ‘mon-01.local’ to [X.X.X.X]:5665
[2020-06-19 10:21:19 +0000] information/ApiListener: Requesting new certificate for this Icinga instance from endpoint ‘mon-01.local’.
[2020-06-19 10:21:19 +0000] information/ApiListener: Sending config updates for endpoint ‘mon-01.local’ in zone ‘master’.
[2020-06-19 10:21:19 +0000] information/ApiListener: Finished sending config file updates for endpoint ‘mon-01.local’ in zone ‘master’.
[2020-06-19 10:21:19 +0000] information/ApiListener: Syncing runtime objects to endpoint ‘mon-01.local’.
[2020-06-19 10:21:19 +0000] information/ApiListener: Finished syncing runtime objects to endpoint ‘mon-01.local’.
[2020-06-19 10:21:19 +0000] information/ApiListener: Finished sending runtime config updates for endpoint ‘mon-01.local’ in zone ‘master’.
[2020-06-19 10:21:19 +0000] information/ApiListener: Sending replay log for endpoint ‘mon-01.local’ in zone ‘master’.
[2020-06-19 10:21:19 +0000] information/ApiListener: Finished sending replay log for endpoint ‘mon-01.local’ in zone ‘master’.
[2020-06-19 10:21:19 +0000] information/ApiListener: Finished syncing endpoint ‘mon-01.local’ in zone ‘master’.
[2020-06-19 10:21:19 +0000] information/ApiListener: Finished reconnecting to endpoint ‘mon-01.local’ via host ‘X.X.X.X’ and port ‘5665’
[2020-06-19 10:21:19 +0000] information/ApiListener: Applying config update from endpoint ‘mon-01.local’ of zone ‘master’.
[2020-06-19 10:21:19 +0000] information/ApiListener: Ignoring config update from endpoint ‘mon-01.local’ for zone ‘global-templates’ because we have an authoritative version of the zone’s config.
[2020-06-19 10:21:19 +0000] information/ApiListener: Ignoring config update from endpoint ‘mon-01.local’ for zone ‘sat-prague-01’ because we have an authoritative version of the zone’s config.
[2020-06-19 10:21:19 +0000] information/ApiListener: Received configuration updates (0) from endpoint ‘mon-01.local’ do not qualify for production, not triggering reload.

(AGENT1)

[2020-06-19 10:20:12 +0000] information/ApiListener: Started new listener on ‘[0.0.0.0]:5665’
[2020-06-19 10:20:12 +0000] information/ConfigItem: Activated all objects.
[2020-06-19 10:20:12 +0000] information/ApiListener: Reconnecting to endpoint ‘mon-02.local’ via host ‘X.X.X.X’ and port ‘5665’
[2020-06-19 10:20:12 +0000] information/ApiListener: New client connection for identity ‘mon-02.local’ to [X.X.X.X]:5665
[2020-06-19 10:20:12 +0000] information/ApiListener: Requesting new certificate for this Icinga instance from endpoint ‘mon-02.local’.
[2020-06-19 10:20:12 +0000] information/ApiListener: Sending config updates for endpoint ‘mon-02.local’ in zone ‘sat-01’.
[2020-06-19 10:20:12 +0000] information/ApiListener: Finished sending config file updates for endpoint ‘mon-02.local’ in zone ‘sat-01’.
[2020-06-19 10:20:12 +0000] information/ApiListener: Syncing runtime objects to endpoint ‘mon-02.local’.
[2020-06-19 10:20:12 +0000] information/ApiListener: Finished syncing runtime objects to endpoint ‘mon-02.local’.
[2020-06-19 10:20:12 +0000] information/ApiListener: Finished sending runtime config updates for endpoint ‘mon-02.local’ in zone ‘sat-01’.
[2020-06-19 10:20:12 +0000] information/ApiListener: Sending replay log for endpoint ‘mon-02.local’ in zone ‘sat-01’.
[2020-06-19 10:20:12 +0000] information/ApiListener: Finished sending replay log for endpoint ‘mon-02.local’ in zone ‘sat-01’.
[2020-06-19 10:20:12 +0000] information/ApiListener: Finished syncing endpoint ‘mon-02.local’ in zone ‘sat-01’.
[2020-06-19 10:20:12 +0000] information/ApiListener: Finished reconnecting to endpoint ‘mon-02.local’ via host ‘X.X.X.X’ and port ‘5665’
[2020-06-19 10:20:12 +0000] information/ApiListener: Applying config update from endpoint ‘mon-02.local’ of zone ‘sat-01’.
[2020-06-19 10:20:12 +0000] information/ApiListener: Received configuration updates (0) from endpoint ‘mon-02.local’ do not qualify for production, not triggering reload.
[2020-06-19 10:20:21 +0000] information/WorkQueue: #5 (ApiListener, RelayQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-06-19 10:20:21 +0000] information/WorkQueue: #6 (ApiListener, SyncQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-06-19 10:25:12 +0000] information/ConfigObject: Dumping program state to file ‘/var/lib/icinga2/icinga2.state’
[2020-06-19 10:25:31 +0000] information/WorkQueue: #5 (ApiListener, RelayQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-06-19 10:25:31 +0000] information/WorkQueue: #6 (ApiListener, SyncQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-06-19 10:30:12 +0000] information/ConfigObject: Dumping program state to file ‘/var/lib/icinga2/icinga2.state’

(AGENT2)

[2020-06-19 10:20:24 +0000] information/ApiListener: Started new listener on ‘[0.0.0.0]:5665’
[2020-06-19 10:20:24 +0000] information/ApiListener: Reconnecting to endpoint ‘mon-02.local’ via host ‘X.X.X.X’ and port ‘5665’
[2020-06-19 10:20:24 +0000] information/ConfigItem: Activated all objects.
[2020-06-19 10:20:24 +0000] information/ApiListener: New client connection for identity ‘mon-02.local’ to [X.X.X.X]:5665
[2020-06-19 10:20:24 +0000] information/ApiListener: Requesting new certificate for this Icinga instance from endpoint ‘mon-02.local’.
[2020-06-19 10:20:24 +0000] information/ApiListener: Sending config updates for endpoint ‘mon-02.local’ in zone ‘sat-01’.
[2020-06-19 10:20:24 +0000] information/ApiListener: Finished sending config file updates for endpoint ‘mon-02.local’ in zone ‘sat-01’.
[2020-06-19 10:20:24 +0000] information/ApiListener: Syncing runtime objects to endpoint ‘mon-02.local’.
[2020-06-19 10:20:24 +0000] information/ApiListener: Finished syncing runtime objects to endpoint ‘mon-02.local’.
[2020-06-19 10:20:24 +0000] information/ApiListener: Finished sending runtime config updates for endpoint ‘mon-02.local’ in zone ‘sat-01’.
[2020-06-19 10:20:24 +0000] information/ApiListener: Sending replay log for endpoint ‘mon-02.local’ in zone ‘sat-01’.
[2020-06-19 10:20:24 +0000] information/ApiListener: Finished sending replay log for endpoint ‘mon-02.local’ in zone ‘sat-01’.
[2020-06-19 10:20:24 +0000] information/ApiListener: Finished syncing endpoint ‘mon-02.local’ in zone ‘sat-01’.
[2020-06-19 10:20:24 +0000] information/ApiListener: Finished reconnecting to endpoint ‘mon-02.local’ via host ‘X.X.X.X’ and port ‘5665’
[2020-06-19 10:20:24 +0000] information/ApiListener: Applying config update from endpoint ‘mon-02.local’ of zone ‘sat-01’.
[2020-06-19 10:20:24 +0000] information/ApiListener: Received configuration updates (0) from endpoint ‘mon-02.local’ do not qualify for production, not triggering reload.
[2020-06-19 10:20:34 +0000] information/WorkQueue: #5 (ApiListener, RelayQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-06-19 10:20:34 +0000] information/WorkQueue: #6 (ApiListener, SyncQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-06-19 10:25:24 +0000] information/ConfigObject: Dumping program state to file ‘/var/lib/icinga2/icinga2.state’
[2020-06-19 10:25:44 +0000] information/WorkQueue: #5 (ApiListener, RelayQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-06-19 10:25:44 +0000] information/WorkQueue: #6 (ApiListener, SyncQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-06-19 10:30:24 +0000] information/ConfigObject: Dumping program state to file ‘/var/lib/icinga2/icinga2.state’
[2020-06-19 10:30:53 +0000] information/WorkQueue: #5 (ApiListener, RelayQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-06-19 10:30:53 +0000] information/WorkQueue: #6 (ApiListener, SyncQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);

Below is a screenshot of the grid view, including everything, to give you a better idea of the situation.
The services whose names are prefixed with "sat-" are applied to the satellite and the agents only (just mentioning it for the sake of correctness).
[Screenshot: Screenshot_2020-06-19_12-46-14 (grid view)]


This is your problem:

Please check for and delete any files named .authoritative under /var/lib/icinga2/api on your satellite, then restart Icinga on the satellite.

If this does not help, try these commands on your satellite:

systemctl stop icinga2
rm -rf /var/lib/icinga2/api/{packages,zones,zones-stage}/*
systemctl start icinga2

This will also happen if you have anything in your satellite's /etc/icinga2/zones.d/global-templates directory. That directory should be deleted if you want to receive updates from your master.
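A minimal sketch of the intended layout, assuming the shared templates (such as the `generic-service` template imported by every service above) should be synced from the master: they live only on mon-01.local under /etc/icinga2/zones.d/global-templates/, while each node keeps just the `global-templates` Zone object in its zones.conf. The file name templates.conf is hypothetical, and the template body mirrors the stock `generic-service` from conf.d/templates.conf.

```
// mon-01.local only: /etc/icinga2/zones.d/global-templates/templates.conf
// (hypothetical file name). The satellite must not have its own
// /etc/icinga2/zones.d/global-templates directory.
template Service "generic-service" {
  max_check_attempts = 5
  check_interval = 1m
  retry_interval = 30s
}

// Every node (master, satellite, agents) only declares the zone in zones.conf:
// object Zone "global-templates" { global = true }
```

If the master still includes conf.d/templates.conf with the same template, that copy would most likely have to be removed to avoid a duplicate definition.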


Thank you, Roland and Noah, for your time.
I've done what both of you suggested.

I cleaned up with
rm -rf /var/lib/icinga2/api/{packages,zones,zones-stage}/*
and restarted: nothing changed.

I removed the master and sat-01 directories from the agents' zones.d, which had been recreated by Puppet due to an erroneous custom setting of mine (they weren't there on the satellite), and got this very encouraging output:

[2020-06-19 11:55:05 +0000] information/ApiListener: Reconnecting to endpoint ‘mon-02.local’ via host ‘X.X.X.X’ and port ‘5665’
[2020-06-19 11:55:05 +0000] information/ApiListener: New client connection for identity ‘mon-02.local’ to [X.X.X.X]:5665
[2020-06-19 11:55:05 +0000] information/ApiListener: Requesting new certificate for this Icinga instance from endpoint ‘mon-02.local’.
[2020-06-19 11:55:05 +0000] information/ApiListener: Sending config updates for endpoint ‘mon-02.local’ in zone ‘sat-01’.
[2020-06-19 11:55:05 +0000] information/ApiListener: Finished sending config file updates for endpoint ‘mon-02.local’ in zone ‘sat-01’.
[2020-06-19 11:55:05 +0000] information/ApiListener: Syncing runtime objects to endpoint ‘mon-02.local’.
[2020-06-19 11:55:05 +0000] information/ApiListener: Finished syncing runtime objects to endpoint ‘mon-02.local’.
[2020-06-19 11:55:05 +0000] information/ApiListener: Finished sending runtime config updates for endpoint ‘mon-02.local’ in zone ‘sat-01’.
[2020-06-19 11:55:05 +0000] information/ApiListener: Sending replay log for endpoint ‘mon-02.local’ in zone ‘sat-01’.
[2020-06-19 11:55:05 +0000] information/ApiListener: Finished sending replay log for endpoint ‘mon-02.local’ in zone ‘sat-01’.
[2020-06-19 11:55:05 +0000] information/ApiListener: Finished syncing endpoint ‘mon-02.local’ in zone ‘sat-01’.
[2020-06-19 11:55:05 +0000] information/ApiListener: Finished reconnecting to endpoint ‘mon-02.local’ via host ‘X.X.X.X’ and port ‘5665’
[2020-06-19 11:55:05 +0000] information/ApiListener: Applying config update from endpoint ‘mon-02.local’ of zone ‘sat-01’.
[2020-06-19 11:55:05 +0000] information/ApiListener: Received configuration for zone ‘global-templates’ from endpoint ‘mon-02.local’. Comparing the timestamp and checksums.
[2020-06-19 11:55:05 +0000] information/ApiListener: Applying configuration file update for path ‘/var/lib/icinga2/api/zones-stage/global-templates’ (0 Bytes).
[2020-06-19 11:55:05 +0000] information/ApiListener: Received configuration for zone ‘sat-01’ from endpoint ‘mon-02.local’. Comparing the timestamp and checksums.
[2020-06-19 11:55:05 +0000] information/ApiListener: Applying configuration file update for path ‘/var/lib/icinga2/api/zones-stage/sat-01’ (0 Bytes).
[2020-06-19 11:55:05 +0000] information/ApiListener: Received configuration updates (2) from endpoint ‘mon-02.local’ do not qualify for production, not triggering reload.

Unfortunately, the services are still in PENDING state (I've restarted every Icinga instance on every node).
There is no snapshot newer than RC1.48 available for the master, so I'm going to "downgrade" it to the release version and check whether I can rule out a bug in the master node.

The current directory situation is as follows (no .authoritative files anywhere):
(SATELLITE)

root@mon-02.local:# tree -a /var/lib/icinga2/api/
|-- log
|   |-- 1592489858
|   | (plenty of them ....)
|   |-- 1592563160
|   `-- current
|-- packages
|   `-- _api
|       |-- active-stage
|       |-- active.conf
|       |-- ef362e4e-84r2-39id-23gf-38dh34ks9927
|       |   |-- conf.d
|       |   |-- include.conf
|       |   `-- zones.d
|       `-- include.conf
|-- repository
|-- zones
`-- zones-stage

9 directories, 54 files

(AGENT1)

root@agent-01.local:# tree -a /var/lib/icinga2/api/
/var/lib/icinga2/api/
|-- log
|   |-- 1592484283
|   | (plenty of them ....)
|   |-- 1592567701
|   `-- current   
|-- packages
|   `-- _api
|       |-- 54855795-5fc1-41a5-acfd-4da9012cf221
|       |   |-- conf.d
|       |   |-- include.conf
|       |   `-- zones.d
|       |-- active-stage
|       |-- active.conf
|       `-- include.conf
|-- repository
|-- zones
|   |-- global-templates
|   `-- sat-01
`-- zones-stage   
    |-- global-templates
    |   `-- .timestamp
    `-- sat-01
        `-- .timestamp

13 directories, 200 files

(AGENT2)

root@agent-02.local:# tree -a /var/lib/icinga2/api/
/var/lib/icinga2/api/
|-- log
|   |-- 1592482438
|   | (plenty of them ....)
|   |-- 1592567363
|   `-- current
|-- packages
|   `-- _api
|       |-- 8d90s8f0-8shd-734j-10so-9s879dkf2udl
|       |   |-- conf.d
|       |   |   `-- downtimes
|       |   |-- include.conf
|       |   `-- zones.d
|       |-- active-stage
|       |-- active.conf
|       `-- include.conf
|-- repository
|-- zones
|   |-- global-templates
|   `-- sat-01
`-- zones-stage
    |-- global-templates
    |   `-- .timestamp
    `-- sat-01
        `-- .timestamp

14 directories, 176 files

Did this log message on your satellite go away?

[2020-06-19 10:21:19 +0000] information/ApiListener: Ignoring config update from endpoint ‘mon-01.local’ for zone ‘global-templates’ because we have an authoritative version of the zone’s config.

I'm asking because you didn't mention removing the /etc/icinga2/zones.d/global-templates directory on your satellite, which should resolve this issue.

Yes, I did; I just forgot to mention it (sorry). I removed global-templates along with the other two.
And yes, that message went away, which is why I thought it looked very promising.

In the meantime, I've downgraded the master node to icinga2-2.11.4-1, which brings it in line with the rest of the nodes.
Unfortunately, nothing changed.

FOR THE RECORD (before I forget):
icinga2-2.12.0.rc1.42 and icinga2-2.12.0.rc1.48 (Fedora) are seriously buggy.
CPU and memory go to 100% on every stop/reload of the service for up to a few minutes, leaving the node mostly unresponsive (it must be killed with -9 to avoid a long wait). The release version is OK.


How are those services created on the agents? Could you share that config? It seems like the agents don't even know about the services. You can verify that with the following command on one of the agents:

icinga2 object list --type service --name sat-*

Also make sure that the agents know about the hosts agent-01.local/agent-02.local, because a service can only be created on the agent if the agent knows about the service's host:

icinga2 object list --type host --name agent-*

The command returns nothing! :(

The configuration file for sat-* (on the master) is
/etc/icinga2/zones.d/sat-01/services.conf

# This file is managed by Puppet. DO NOT EDIT.

apply Service "sat-check_ssh" {
  import "generic-service"

  check_command = "ssh"
  assign where host.zone == "sat-01"
}

apply Service "sat-icinga" {
  import "generic-service"

  check_command = "icinga"
  assign where host.zone == "sat-01"
}

apply Service for (disk_name => config in host.vars.disks) {
  import "generic-service"

  check_command = "disk"
  command_endpoint = host.name
  vars += config
  assign where host.zone == "sat-01"
  ignore where host.vars.noagent
}

apply Service "sat-load" {
  import "generic-service"

  check_command = "load"
  assign where host.zone == "sat-01"
}

apply Service "sat-procs" {
  import "generic-service"

  check_command = "procs"
  assign where host.zone == "sat-01"
}

apply Service "sat-swap" {
  import "generic-service"

  check_command = "swap"
  assign where host.zone == "sat-01"
}

apply Service "sat-users" {
  import "generic-service"

  check_command = "users"
  assign where host.zone == "sat-01"
}

Running this on the master for agent-01, I get the following:

# icinga2 object list --type host --name agent-01.local
Object 'agent-01.local' of type 'Host':
  % declared in '/etc/icinga2/zones.d/sat-01/hosts.conf', lines 3:1-3:47
  * __name = "agent-01.local"
  * action_url = ""
  * address = "X.X.X.X"
    % = modified in '/etc/icinga2/zones.d/sat-01/hosts.conf', lines 4:3-4:25
  * address6 = ""
  * check_command = "hostalive"
    % = modified in '/etc/icinga2/zones.d/sat-01/hosts.conf', lines 7:3-7:29
  * check_interval = 300
  * check_period = ""
  * check_timeout = null
  * command_endpoint = ""
  * display_name = "agent-01.local"
    % = modified in '/etc/icinga2/zones.d/sat-01/hosts.conf', lines 6:3-6:52
  * enable_active_checks = true
  * enable_event_handler = true
  * enable_flapping = false
  * enable_notifications = true
  * enable_passive_checks = true
  * enable_perfdata = true
  * event_command = ""
  * flapping_threshold = 0
  * flapping_threshold_high = 30
  * flapping_threshold_low = 25
  * groups = [ "linux-nodes" ]
    % = modified in '/etc/icinga2/zones.d/sat-01/hosts.conf', lines 5:3-5:29
  * icon_image = ""
  * icon_image_alt = ""
  * max_check_attempts = 3
  * name = "agent-01.local"
  * notes = ""
  * notes_url = ""
  * package = "_etc"
  * retry_interval = 60
  * source_location
    * first_column = 1
    * first_line = 3
    * last_column = 47
    * last_line = 3
    * path = "/etc/icinga2/zones.d/sat-01/hosts.conf"
  * templates = [ "agent-01.local" ]
    % = modified in '/etc/icinga2/zones.d/sat-01/hosts.conf', lines 3:1-3:47
  * type = "Host"
  * vars = null
  * volatile = false
  * zone = "sat-01"

For agent-02 the command returns similar output. The same command returns no output at all on the satellite or on the agents themselves.


Okay, there are two issues with the way you configured the hosts and services.

  1. Your objects are configured in zones.d/sat-01, which results in this config only being synced to sat-01. Your agents never get those config files.

  2. assign where host.zone == "sat-01" will never match hosts in your agent zones, so the services will never be created.

How to fix those issues?

  1. Move your hosts to /etc/icinga2/zones.d/agent-01.local/hosts.conf and /etc/icinga2/zones.d/agent-02.local/hosts.conf and your service apply rules to /etc/icinga2/zones.d/global-templates/services.conf. That way all the hosts end up where they should and the apply rules will be synced to every endpoint.

  2. Change the assign statement of your service apply rules to something like assign where host.vars.client_endpoint, so that it matches both your agents and the satellite.
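A minimal sketch of suggestion 2, assuming the rule lives in zones.d/global-templates/services.conf on the master and that every agent and satellite host object sets vars.client_endpoint = name (as in the host examples later in this thread). The service name "agent-load" is hypothetical, chosen so it does not clash with an existing "load" rule:

```
// zones.d/global-templates/services.conf on the master (sketch)
apply Service "agent-load" {   // hypothetical service name
  import "generic-service"

  check_command = "load"

  // Matches every host that defines vars.client_endpoint
  // (the agents and the satellite); hosts without it get no service.
  assign where host.vars.client_endpoint
}
```

Because global-templates is synced to every endpoint, each agent should evaluate this rule against the host objects it receives for its own zone, so the service is created and checked locally on that agent.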


Does that mean that for every agent (and satellite) I deploy, I need a directory in the master’s zones.d named after its FQDN, containing an exact copy of the hosts.conf file?


That doesn’t need to be an exact copy. The zones.d/agent-01.local/hosts.conf only needs hosts that should be checked in that zone.

In your case:

zones.d/agent-01.local/hosts.conf:

object Host "agent-01.local" {
  address = "X.X.X.X"
  groups = [ "linux-nodes", ]
  display_name = "agent-01.local"
  check_command = "hostalive"
  vars.client_endpoint = name
}

zones.d/agent-02.local/hosts.conf:

object Host "agent-02.local" {
  address = "X.X.X.X"
  groups = [ "linux-nodes", ]
  display_name = "agent-02.local"
  check_command = "hostalive"
  vars.client_endpoint = name
}

Note: Your hostalive checks will now be executed on the agents, which might not be very useful. You can fix that by adding zone = "master" to the hosts. That way they are still synced to the agents, so local services can be applied to them, but the hostalive checks are executed on the master.

I would recommend reading through the distributed monitoring chapter in the Icinga docs. It usually clears up some misconceptions about this very complex and sometimes confusing topic.


Well, that’s why I’m here :) I went through the chapter, but apparently quite a lot of it didn’t sink in.
As I understand it, in my case the master’s zones.conf should contain:

object Zone "sat-01" {
  endpoints = [ "mon-02.local", "agent-02.local", "agent-01.local", ]
  parent = "master"
}

which (as far as I understand) means that zone sat-01 is a “child” of the master, and that this zone contains the nodes (endpoints) listed in the line above.
Therefore I would expect all the related hosts.conf files to end up in the directory
/etc/icinga2/zones.d/sat-01/
for example:
/etc/icinga2/zones.d/sat-01/mon-02.local.conf
/etc/icinga2/zones.d/sat-01/agent-01.local.conf
/etc/icinga2/zones.d/sat-01/agent-02.local.conf
as they all belong to the sat-01 zone.

You suggested creating a directory in /etc/icinga2/zones.d/ (on the master, I assumed) named after the node itself, which Icinga would treat as another zone.
I find this confusing, but I tried it anyway and Icinga complained with:
[2020-06-19 14:38:29 +0000] warning/config: Ignoring directory '/etc/icinga2/zones.d/agent-01.local' for unknown zone 'agent-01.local'.

What is wrong with my assumptions?

P.S. I really appreciate your help; without you guys I would be stuck forever.
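The "unknown zone" warning quoted above means that no Zone object named agent-01.local exists, and a zones.d directory is only used when a Zone object with the same name is defined (normally in zones.conf). In the layout the distributed monitoring docs describe, each agent gets its own zone, named after its endpoint, with the satellite zone as its parent. A minimal sketch of the master’s zones.conf under that layout, assuming the endpoint names from this thread and, since the satellite and agents connect upward, no host attributes on the child endpoints:

```
// /etc/icinga2/zones.conf on the master (sketch)
object Endpoint "mon-01.local" {
  // that's us
}

object Zone "master" {
  endpoints = [ "mon-01.local" ]
}

object Endpoint "mon-02.local" {
  // no host attribute: the satellite connects to the master
}

object Zone "sat-01" {
  endpoints = [ "mon-02.local" ]
  parent = "master"
}

// one zone per agent, named after its endpoint
object Endpoint "agent-01.local" {
}

object Zone "agent-01.local" {
  endpoints = [ "agent-01.local" ]
  parent = "sat-01"
}

object Endpoint "agent-02.local" {
}

object Zone "agent-02.local" {
  endpoints = [ "agent-02.local" ]
  parent = "sat-01"
}

object Zone "global-templates" {
  global = true
}
```

The satellite’s zones.conf would need the same agent zones so it can pass their config on, and each agent needs its own zone plus the zones above it; the distributed monitoring chapter shows the exact per-node files.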


OK, I think you meant to create
/etc/icinga2/zones.d/sat-01/agent-01.local.conf
/etc/icinga2/zones.d/sat-01/agent-02.local.conf
directly on the agents and not on the master.


I’ve pretty much resolved it!

ON THE MASTER

  • I’ve copied the necessary files into global-templates as described in the guide (it was complaining when I moved services.conf in there):
    cp -vp /etc/icinga2/conf.d/{commands,groups,notifications,services,templates,timeperiods,users}.conf /etc/icinga2/zones.d/global-templates/

  • moved the previously discussed sat-* services into /etc/icinga2/zones.d/sat-01/services.conf

  • moved the standard services into /etc/icinga2/zones.d/global-templates/services.conf

  • grouped the hosts in zone sat-01 into /etc/icinga2/zones.d/sat-01/hosts.conf, which contains:

object Host "agent-01.local" {
  address = "X.X.X.X"
  display_name = "agent-01.local"
  check_command = "hostalive"
}

object Host "mon-02.local" {
  address = "X.X.X.X"
  display_name = "mon-02.local"
  check_command = "hostalive"
}

object Host "agent-02.local" {
  address = "X.X.X.X"
  display_name = "agent-02.local"
  check_command = "hostalive"
}
  • kept /etc/icinga2/zones.d/master/hosts.conf pretty much the same (renamed from FQHN.conf, but that should be irrelevant, I think):
object Host "mon-01.local" {
  address = "X.X.X.X"
  display_name = "mon-01.local"
  check_command = "hostalive"
}

Now zones.d looks like this:

/etc/icinga2/zones.d/
├── global-templates
│   ├── commands.conf
│   ├── groups.conf
│   ├── notifications.conf
│   ├── services.conf
│   ├── templates.conf
│   ├── timeperiods.conf
│   └── users.conf
├── master
│   ├── api-users.conf
│   ├── hostgroups.conf
│   ├── hosts.conf
│   └── my-icinga2.pp
├── README
└── sat-prague-01
    ├── hosts.conf
    └── services.conf

3 directories, 14 files
  • restarted the master

ON THE SATELLITE & THE 2 AGENTS
(there is nothing in /etc/icinga2/zones.d)

  1. systemctl stop icinga2
  2. rm -rvf /var/lib/icinga2/api/{packages,zones,zones-stage}/*
  3. systemctl start icinga2

It copied all the new configs; within seconds everything appeared in the Icinga Web grid view, and after a few minutes everything went green!

I still received the following messages in the agents’ logs during the restart:

[2020-06-19 16:50:35 +0000] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/sat-01/_etc/hosts.conf' for zone 'sat-01'.
[2020-06-19 16:50:35 +0000] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/sat-01/_etc/services.conf' for zone 'sat-01'.
[2020-06-19 16:50:35 +0000] information/ApiListener: Applying configuration file update for path '/var/lib/icinga2/api/zones-stage/sat-01' (1656 Bytes).
[2020-06-19 16:50:35 +0000] information/ApiListener: Received configuration updates (2) from endpoint 'mon-02.local' do not qualify for production, not triggering reload.

I’m not sure why I received these, but they may be related to these slightly earlier messages:

[2020-06-19 16:50:35 +0000] information/ApiListener: Received configuration for zone 'sat-01' from endpoint 'mon-02.local'. Comparing the timestamp and checksums.
[2020-06-19 16:50:35 +0000] information/ApiListener: Our production configuration is more recent than the received configuration update. Ignoring configuration file update for path '/var/lib/icinga2/api/zones-stage/sat-01'. Current timestamp '2020-06-19 16:50:24 +0000' (1592585424.828333) >= received timestamp '2020-06-19 16:50:24 +0000' (1592585424.828333).

I’m sure there is a lot to be tuned but at least I have a running configuration to improve.
Thanks A LOT for your support and your crucial insights.
