Agent service stuck in pending

Hi all,

I installed Icinga 2 for distributed monitoring in December 2019. Everything worked fine until 6/16/2020. Since then, all new agent services are stuck in the pending state. The master node “icinga-master” and the satellite node “icinga-satellite” still work fine, and I can add new monitoring services to them. But all new services assigned to the agent hosts are stuck in pending.
I checked /var/log/icinga2/icinga2.log on agent1. It shows that agent1 receives new config files from icinga-satellite, but the timestamp of the received config seems to be stuck at 2020-06-16, so Icinga does not update the old config files on agent1. Here is the relevant part of icinga2.log from agent1:

[2020-07-22 07:32:32 +0000] information/ApiListener: Received configuration for zone ‘director-global’ from endpoint ‘icinga-satellite’. Comparing the timestamp and checksums.
[2020-07-22 07:32:32 +0000] information/ApiListener: Our production configuration is more recent than the received configuration update. Ignoring configuration file update for path ‘/var/lib/icinga2/api/zones-stage/director-global’. Current timestamp ‘2020-06-16 00:45:24 +0000’ (1592268324.776023) >= received timestamp ‘2020-06-16 00:45:24 +0000’ (1592268324.776023).
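The second log line spells out the rule the agent applies: if the current timestamp is greater than or equal to the received one, the update is ignored. Below is a minimal sketch of that comparison, using the two values from the log. The helper name `decide_sync` is hypothetical and not part of Icinga 2; the log tail is copied from above with plain ASCII quotes.

```shell
#!/bin/sh
# Sketch only: decide_sync is a hypothetical helper, not part of Icinga 2.
# It mirrors the ">=" comparison quoted in the log message.
decide_sync() {
    # $1 = timestamp of the config already in production (epoch seconds)
    # $2 = timestamp of the config just received (epoch seconds)
    awk -v cur="$1" -v rcv="$2" 'BEGIN {
        if (cur + 0 >= rcv + 0) print "ignore"; else print "apply"
    }'
}

# Tail of the log message from agent1, with plain ASCII quotes.
line="Current timestamp '2020-06-16 00:45:24 +0000' (1592268324.776023) >= received timestamp '2020-06-16 00:45:24 +0000' (1592268324.776023)."

# Extract the epoch values in parentheses: current first, received second.
epochs=$(printf '%s\n' "$line" | grep -oE '\([0-9]+\.[0-9]+\)' | tr -d '()')
cur=$(printf '%s\n' "$epochs" | sed -n 1p)
rcv=$(printf '%s\n' "$epochs" | sed -n 2p)

decide_sync "$cur" "$rcv"
```

Both values are identical here, so the sketch prints `ignore`. Note that the received timestamp is itself still 2020-06-16, which suggests (but does not prove) that the satellite is offering the old config rather than the agent rejecting a newer one.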

Can anyone give some suggestions?

Kind regards,
Jimmy

Which version of Icinga are you using?

And try:
systemctl stop icinga2
rm -rf /var/lib/icinga2/api/{packages,zones,zones-stage}/

Devin, thanks so much for your reply. Where should I run the two commands (systemctl stop and rm -rf)? On all the agent nodes, or on the master and satellite nodes?

Here is my Icinga version:

root@icinga-master1:~# icinga2 --version
icinga2 - The Icinga 2 network monitoring daemon (version: r2.11.2-1)

Copyright © 2012-2020 Icinga GmbH (https://icinga.com/)
License GPLv2+: GNU GPL version 2 or later http://gnu.org/licenses/gpl2.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

System information:
Platform: Ubuntu
Platform version: 18.04.4 LTS (Bionic Beaver)
Kernel: Linux
Kernel version: 4.15.0-109-generic
Architecture: x86_64

Build information:
Compiler: GNU 8.3.0
Build host: runner-LTrJQZ9N-project-298-concurrent-0

Application information:

General paths:
Config directory: /etc/icinga2
Data directory: /var/lib/icinga2
Log directory: /var/log/icinga2
Cache directory: /var/cache/icinga2
Spool directory: /var/spool/icinga2
Run directory: /run/icinga2

Old paths (deprecated):
Installation root: /usr
Sysconf directory: /etc
Run directory (base): /run
Local state directory: /var

Internal paths:
Package data directory: /usr/share/icinga2
State path: /var/lib/icinga2/icinga2.state
Modified attributes path: /var/lib/icinga2/modified-attributes.conf
Objects path: /var/cache/icinga2/icinga2.debug
Vars path: /var/cache/icinga2/icinga2.vars
PID path: /run/icinga2/icinga2.pid

Please share the output of icinga2 --version on the satellite and agent,
and the zones.conf of the master, satellite, and agent.

Also check whether hostalive is working for this agent.
What happens if you force a re-check?
Find out the check source for the agents.

It is recommended to use the same version of Icinga on all nodes (latest is 2.11.4). Upgrade the master first, then the satellites, then the agents.

What did you change on this date? Did you upgrade to v2.11? If so, most probably your config is no longer valid: since v2.11, zone and endpoint objects are allowed to be defined in zones.conf only.
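To check for that situation, one option is to search the config tree for Zone or Endpoint definitions in any file other than zones.conf. This is only a sketch: the helper name `find_stray_zone_objects` is hypothetical, and the default directory /etc/icinga2 is taken from the "Config directory" line of the version output above.

```shell
#!/bin/sh
# Sketch only: find_stray_zone_objects is a hypothetical helper.
# It lists *.conf files other than zones.conf that contain an
# "object Zone" or "object Endpoint" definition.
find_stray_zone_objects() {
    conf_dir="${1:-/etc/icinga2}"   # assumed default config directory
    grep -rlE --include='*.conf' \
        '^[[:space:]]*object[[:space:]]+(Zone|Endpoint)[[:space:]]' \
        "$conf_dir" 2>/dev/null | grep -v '/zones\.conf$' || true
}
```

Running `find_stray_zone_objects` with no argument on a node prints one file path per line. No output means no such definitions were found under that directory.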

Thanks again. icinga2 version for satellite:
root@icinga-satellite:~# icinga2 --version
icinga2 - The Icinga 2 network monitoring daemon (version: r2.11.2-1)

Copyright © 2012-2020 Icinga GmbH
License GPLv2+: GNU GPL version 2 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

System information:
Platform: Ubuntu
Platform version: 18.04.4 LTS (Bionic Beaver)
Kernel: Linux
Kernel version: 4.15.0-109-generic
Architecture: x86_64

Build information:
Compiler: GNU 8.3.0
Build host: runner-LTrJQZ9N-project-298-concurrent-0

Application information:

General paths:
Config directory: /etc/icinga2
Data directory: /var/lib/icinga2
Log directory: /var/log/icinga2
Cache directory: /var/cache/icinga2
Spool directory: /var/spool/icinga2
Run directory: /run/icinga2

Old paths (deprecated):
Installation root: /usr
Sysconf directory: /etc
Run directory (base): /run
Local state directory: /var

Internal paths:
Package data directory: /usr/share/icinga2
State path: /var/lib/icinga2/icinga2.state
Modified attributes path: /var/lib/icinga2/modified-attributes.conf
Objects path: /var/cache/icinga2/icinga2.debug
Vars path: /var/cache/icinga2/icinga2.vars
PID path: /run/icinga2/icinga2.pid

icinga2 version for agent1:
root@agent1:~# icinga2 --version
icinga2 - The Icinga 2 network monitoring daemon (version: r2.11.2-1)

Copyright © 2012-2020 Icinga GmbH
License GPLv2+: GNU GPL version 2 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

System information:
Platform: Ubuntu
Platform version: 16.04.6 LTS (Xenial Xerus)
Kernel: Linux
Kernel version: 4.4.0-174-generic
Architecture: x86_64

Build information:
Compiler: GNU 5.4.0
Build host: runner-LTrJQZ9N-project-298-concurrent-0

Application information:

General paths:
Config directory: /etc/icinga2
Data directory: /var/lib/icinga2
Log directory: /var/log/icinga2
Cache directory: /var/cache/icinga2
Spool directory: /var/spool/icinga2
Run directory: /run/icinga2

Old paths (deprecated):
Installation root: /usr
Sysconf directory: /etc
Run directory (base): /run
Local state directory: /var

Internal paths:
Package data directory: /usr/share/icinga2
State path: /var/lib/icinga2/icinga2.state
Modified attributes path: /var/lib/icinga2/modified-attributes.conf
Objects path: /var/cache/icinga2/icinga2.debug
Vars path: /var/cache/icinga2/icinga2.vars
PID path: /run/icinga2/icinga2.pid

zones.conf for master:
root@icinga-master1:~# cat /etc/icinga2/zones.conf

object Endpoint "icinga-master.***" {
	host = "icinga-master.***"
	port = "5665"
}

object Zone "master" {
	endpoints = [ "icinga-master.***" ]
}

object Endpoint "icinga-satellite.***" {
	host = "icinga-satellite.***"
	port = "5665"
}

object Zone "satellite" {
	endpoints = [ "icinga-satellite.***" ]
	parent = "master"
}

object Zone "global-templates" {
	global = true
}

object Zone "director-global" {
	global = true
}

zones.conf for satellite:

object Endpoint "icinga-master.***" {
	host = "icinga-master.***"
	port = "5665"
}

object Zone "master" {
	endpoints = [ "icinga-master.***" ]
}

object Endpoint "icinga-satellite.***" {
	host = "icinga-satellite.***"
	port = "5665"
}

object Zone "satellite" {
	endpoints = [ "icinga-satellite.***" ]
	parent = "master"
}

object Zone "global-templates" {
	global = true
}

object Zone "director-global" {
	global = true
}

zones.conf for agent1:

object Endpoint "icinga-master.***" {
	host = "10.**.***.**"
}

object Endpoint "icinga-satellite.***" {
    host = "10.**.***.**"
}

object Zone "master" {
	endpoints = [ "icinga-master.***" ]
}

object Zone "satellite" {
    endpoints = [ "icinga-satellite.***" ]
    parent = "master"
}

object Endpoint "agent1" {
}

object Zone "agent1" {
	endpoints = [ "agent1" ]
	parent = "satellite"
}

object Zone "global-templates" {
	global = true
}

object Zone "director-global" {
	global = true
}

Did you add your agents to satellite’s zones.conf?

Thanks for your help. No, I did not add any agents to the satellite’s zones.conf. I used Icinga Director to add new services to the agent. It worked well before, but since 6/16 new services are no longer applied on the agents. I did not change anything on the system on 6/16.

A forced re-check does not help. The new service still shows pending, since the agent did not receive the config for the new service. The agent only runs the old services that existed on 6/16.

If you configure your agents with Director, you don’t need to add their zone and endpoint objects to zones.conf manually. Director does this automatically, which means you just need to add a host.

Thanks again for your kind help. Right now both the master and the satellite still accept new services. Only the agents do not pick up the new services. The log file in my first post shows that the agent is receiving new config from the satellite, but it does not apply it because of the timestamp.
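One way to narrow this down might be to compare the stored config timestamp per zone on the satellite and on the agent. The sketch below assumes that each synced zone directory under /var/lib/icinga2/api/zones contains a `.timestamp` file holding the epoch value, which is worth verifying on your own installation. The helper name `show_zone_timestamps` is hypothetical.

```shell
#!/bin/sh
# Sketch only: show_zone_timestamps is a hypothetical helper.
# Assumption: each zone directory holds a ".timestamp" file with the
# epoch value that the config sync compares.
show_zone_timestamps() {
    base="${1:-/var/lib/icinga2/api/zones}"
    for ts in "$base"/*/.timestamp; do
        [ -f "$ts" ] || continue            # skip if nothing matched
        zone=$(basename "$(dirname "$ts")") # directory name = zone name
        printf '%s %s\n' "$zone" "$(cat "$ts")"
    done
}
```

Run `show_zone_timestamps` on both nodes and compare the lines for director-global. If the satellite also reports the 2020-06-16 value, the stale config would originate further up the chain rather than on the agent.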

There were some bug fixes in 2.11.3 and 2.11.4. Although I haven’t checked whether you are affected, I’d recommend updating all nodes to 2.11.4. This helps especially if it turns out you have found a new bug.

Hi.

@rsx Since this problem (with the zones.conf) occurs quite often, would you mind creating a tutorial about it?

Greetings.

That’s funny, yesterday I had the same idea. Unfortunately, I don’t have enough time to do this in the coming weeks. Also, I can’t remember whether v2.12 will change this again (which would make a tutorial obsolete).

Hi Roland @rsx ,

Thanks so much for your kind help. I have upgraded icinga2 to version r2.11.4-1 on the master, satellite, and agents. I also tried removing the synced config files on the satellite and agents:
systemctl stop icinga2
rm -rf /var/lib/icinga2/api/{packages,zones,zones-stage}/
systemctl start icinga2

The issue is still there. All new services added to the agent are stuck in the pending state. My monitoring system has been stuck on this issue for three weeks, and I don’t know what to try next. Any ideas?

The operating system on the master and satellite is Ubuntu 18.04.
The agent runs Ubuntu 16.04.

The icinga2 monitoring worked well before 6/16/2020.

Did I get it right? Your agents are working fine, as are all existing service checks. Only newly added checks are stuck in pending? Did you check the logs? Did you place the new services in a different zone?

@rsx Thanks again for your kind reply. Yes, the agents are working fine with all existing service checks.

Yes, only newly added checks on the agents are stuck in pending.
I can still add new services to the master and satellite hosts. It seems only the agents do not accept the new config from the satellite.
I add the new services through Icinga Director.

Which log file should I check, icinga2.log or debug.log?

icinga2.log at one “failing” agent should be enough.

Thanks @rsx. I upgraded the failing agent to r2.11.4-1 tonight. Below is the icinga2.log after the upgrade. Do I need to try "rm -rf /var/lib/icinga2/api/{packages,zones,zones-stage}/" again?

[2020-07-30 06:25:01 +0000] information/Application: Received USR1 signal, reopening application logs.
[2020-07-30 06:28:14 +0000] information/ConfigObject: Dumping program state to file ‘/var/lib/icinga2/icinga2.state’
[2020-07-30 06:28:53 +0000] information/WorkQueue: #5 (ApiListener, RelayQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-07-30 06:28:53 +0000] information/WorkQueue: #6 (ApiListener, SyncQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-07-30 06:32:18 +0000] information/Application: Received request to shut down.
[2020-07-30 06:32:18 +0000] information/Application: Shutting down…
[2020-07-30 06:32:18 +0000] information/CheckerComponent: ‘checker’ stopped.
[2020-07-30 06:32:18 +0000] information/ApiListener: ‘api’ stopped.
[2020-07-30 06:32:20 +0000] information/FileLogger: ‘main-log’ started.
[2020-07-30 06:32:20 +0000] information/FileLogger: ‘debug-file’ started.
[2020-07-30 06:32:20 +0000] information/ApiListener: ‘api’ started.
[2020-07-30 06:32:20 +0000] information/ApiListener: Started new listener on ‘[0.0.0.0]:5665’
[2020-07-30 06:32:20 +0000] information/CheckerComponent: ‘checker’ started.
[2020-07-30 06:32:20 +0000] information/ApiListener: Reconnecting to endpoint ‘icinga-satellite.’ via host '10...’ and port ‘5665’
[2020-07-30 06:32:20 +0000] information/ConfigItem: Activated all objects.
[2020-07-30 06:32:20 +0000] information/ApiListener: New client connection for identity 'icinga-satellite.’ to [10...]:5665
[2020-07-30 06:32:20 +0000] information/ApiListener: Requesting new certificate for this Icinga instance from endpoint ‘icinga-satellite.’.
[2020-07-30 06:32:20 +0000] information/ApiListener: Sending config updates for endpoint 'icinga-satellite.’ in zone ‘satellite’.
[2020-07-30 06:32:20 +0000] information/ApiListener: Finished sending config file updates for endpoint ‘icinga-satellite.’ in zone ‘satellite’.
[2020-07-30 06:32:20 +0000] information/ApiListener: Syncing runtime objects to endpoint 'icinga-satellite.’.
[2020-07-30 06:32:20 +0000] information/ApiListener: Finished syncing runtime objects to endpoint ‘icinga-satellite.’.
[2020-07-30 06:32:20 +0000] information/ApiListener: Finished sending runtime config updates for endpoint 'icinga-satellite.’ in zone ‘satellite’.
[2020-07-30 06:32:20 +0000] information/ApiListener: Sending replay log for endpoint ‘icinga-satellite.’ in zone ‘satellite’.
[2020-07-30 06:32:20 +0000] information/ApiListener: Finished sending replay log for endpoint 'icinga-satellite.’ in zone ‘satellite’.
[2020-07-30 06:32:20 +0000] information/ApiListener: Finished syncing endpoint ‘icinga-satellite.’ in zone ‘satellite’.
[2020-07-30 06:32:20 +0000] information/ApiListener: Finished reconnecting to endpoint 'icinga-satellite.’ via host ‘10...’ and port ‘5665’
[2020-07-30 06:32:30 +0000] information/WorkQueue: #5 (ApiListener, RelayQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-07-30 06:32:30 +0000] information/WorkQueue: #6 (ApiListener, SyncQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-07-30 06:33:23 +0000] information/Application: Received request to shut down.
[2020-07-30 06:33:24 +0000] information/Application: Shutting down…
[2020-07-30 06:33:24 +0000] information/CheckerComponent: ‘checker’ stopped.
[2020-07-30 06:33:24 +0000] information/ApiListener: ‘api’ stopped.
[2020-07-30 06:33:24 +0000] information/FileLogger: ‘main-log’ started.
[2020-07-30 06:33:24 +0000] information/FileLogger: ‘debug-file’ started.
[2020-07-30 06:33:24 +0000] information/ApiListener: ‘api’ started.
[2020-07-30 06:33:24 +0000] information/ApiListener: Started new listener on ‘[0.0.0.0]:5665’
[2020-07-30 06:33:24 +0000] information/ApiListener: Reconnecting to endpoint 'icinga-satellite.’ via host '10...’ and port ‘5665’
[2020-07-30 06:33:24 +0000] information/CheckerComponent: ‘checker’ started.
[2020-07-30 06:33:24 +0000] information/ConfigItem: Activated all objects.
[2020-07-30 06:33:24 +0000] information/ApiListener: New client connection for identity 'icinga-satellite.’ to [10...]:5665
[2020-07-30 06:33:24 +0000] information/ApiListener: Requesting new certificate for this Icinga instance from endpoint 'icinga-satellite.’.
[2020-07-30 06:33:24 +0000] information/ApiListener: Sending config updates for endpoint 'icinga-satellite.’ in zone ‘satellite’.
[2020-07-30 06:33:24 +0000] information/ApiListener: Finished sending config file updates for endpoint 'icinga-satellite.’ in zone ‘satellite’.
[2020-07-30 06:33:24 +0000] information/ApiListener: Syncing runtime objects to endpoint 'icinga-satellite.’.
[2020-07-30 06:33:24 +0000] information/ApiListener: Finished syncing runtime objects to endpoint 'icinga-satellite.’.
[2020-07-30 06:33:24 +0000] information/ApiListener: Finished sending runtime config updates for endpoint 'icinga-satellite.’ in zone ‘satellite’.
[2020-07-30 06:33:24 +0000] information/ApiListener: Sending replay log for endpoint 'icinga-satellite.’ in zone ‘satellite’.
[2020-07-30 06:33:24 +0000] information/ApiListener: Finished sending replay log for endpoint 'icinga-satellite.’ in zone ‘satellite’.
[2020-07-30 06:33:24 +0000] information/ApiListener: Finished syncing endpoint 'icinga-satellite.’ in zone ‘satellite’.
[2020-07-30 06:33:24 +0000] information/ApiListener: Finished reconnecting to endpoint 'icinga-satellite.’ via host '10...’ and port ‘5665’
[2020-07-30 06:33:34 +0000] information/WorkQueue: #5 (ApiListener, RelayQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-07-30 06:33:34 +0000] information/WorkQueue: #6 (ApiListener, SyncQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-07-30 06:38:24 +0000] information/ConfigObject: Dumping program state to file ‘/var/lib/icinga2/icinga2.state’
[2020-07-30 06:38:44 +0000] information/WorkQueue: #5 (ApiListener, RelayQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-07-30 06:38:44 +0000] information/WorkQueue: #6 (ApiListener, SyncQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-07-30 06:43:24 +0000] information/ConfigObject: Dumping program state to file ‘/var/lib/icinga2/icinga2.state’
[2020-07-30 06:43:53 +0000] information/WorkQueue: #5 (ApiListener, RelayQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-07-30 06:43:53 +0000] information/WorkQueue: #6 (ApiListener, SyncQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-07-30 06:48:24 +0000] information/ConfigObject: Dumping program state to file ‘/var/lib/icinga2/icinga2.state’
[2020-07-30 06:49:03 +0000] information/WorkQueue: #5 (ApiListener, RelayQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-07-30 06:49:03 +0000] information/WorkQueue: #6 (ApiListener, SyncQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-07-30 06:53:24 +0000] information/ConfigObject: Dumping program state to file ‘/var/lib/icinga2/icinga2.state’
[2020-07-30 06:54:13 +0000] information/WorkQueue: #5 (ApiListener, RelayQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-07-30 06:54:13 +0000] information/WorkQueue: #6 (ApiListener, SyncQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-07-30 06:58:24 +0000] information/ConfigObject: Dumping program state to file ‘/var/lib/icinga2/icinga2.state’
[2020-07-30 06:59:23 +0000] information/WorkQueue: #6 (ApiListener, SyncQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-07-30 06:59:23 +0000] information/WorkQueue: #5 (ApiListener, RelayQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-07-30 07:03:24 +0000] information/ConfigObject: Dumping program state to file ‘/var/lib/icinga2/icinga2.state’
[2020-07-30 07:04:33 +0000] information/WorkQueue: #6 (ApiListener, SyncQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-07-30 07:04:33 +0000] information/WorkQueue: #5 (ApiListener, RelayQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-07-30 07:08:24 +0000] information/ConfigObject: Dumping program state to file ‘/var/lib/icinga2/icinga2.state’
[2020-07-30 07:09:43 +0000] information/WorkQueue: #6 (ApiListener, SyncQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-07-30 07:09:43 +0000] information/WorkQueue: #5 (ApiListener, RelayQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-07-30 07:13:24 +0000] information/ConfigObject: Dumping program state to file ‘/var/lib/icinga2/icinga2.state’
[2020-07-30 07:14:53 +0000] information/WorkQueue: #6 (ApiListener, SyncQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-07-30 07:14:53 +0000] information/WorkQueue: #5 (ApiListener, RelayQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-07-30 07:18:24 +0000] information/ConfigObject: Dumping program state to file ‘/var/lib/icinga2/icinga2.state’