Icinga2 Agent won't sync director zone

Hi,
at the moment we have an Icinga 2 master-satellite-agent setup which has been running fine so far. We use Icinga Web 2 and the Director to configure and deploy commands and checks to all agents. We have 3 zones with 2 masters and 4 satellites: the masters are in the master zone, 2 satellites are in the production zone and 2 satellites are in the test zone. Most of our hosts, including the masters and satellites, run Debian buster. The agent with the problem runs a CentOS-like system (Oracle Linux 8.3).
Icinga master information:
icinga version:

root@icinga-master-1:~# icinga2 --version
icinga2 - The Icinga 2 network monitoring daemon (version: r2.13.2-1)

Copyright (c) 2012-2022 Icinga GmbH (https://icinga.com/)
License GPLv2+: GNU GPL version 2 or later <https://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

System information:
  Platform: Debian GNU/Linux
  Platform version: 10 (buster)
  Kernel: Linux
  Kernel version: 4.19.0-18-amd64
  Architecture: x86_64

Build information:
  Compiler: GNU 8.3.0
  Build host: runner-hh8q3bz2-project-298-concurrent-0
  OpenSSL version: OpenSSL 1.1.1d  10 Sep 2019

Application information:

General paths:
  Config directory: /etc/icinga2
  Data directory: /var/lib/icinga2
  Log directory: /var/log/icinga2
  Cache directory: /var/cache/icinga2
  Spool directory: /var/spool/icinga2
  Run directory: /run/icinga2

Old paths (deprecated):
  Installation root: /usr
  Sysconf directory: /etc
  Run directory (base): /run
  Local state directory: /var

Internal paths:
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid

Icinga Web 2 and Director information:
Icinga Web 2 Version: 2.9.5
Git Commit: 053971c99dc1a4510beb64a888ea695cc14032dc
PHP-Version: 7.3.31-1~deb10u1
Git commit date: 2021-11-18

Loaded libraries

Name Version
icinga/icinga-php-library 0.7.0
icinga/icinga-php-thirdparty 0.10.0

Loaded modules

Name Version
bayerisch 1.0.0
businessprocess 2.3.1
cube 1.1.0
director 1.8.0
doc 2.9.5
fraenkisch 1.0.0
grafana 1.3.6
idoreports 0.9.1
incubator 0.6.0
ipl v0.5.0
jira 1.1.0
monitoring 2.9.5
oesterreichisch 1.0.0
pdfexport 0.9.1
reactbundle 0.9.0
reporting 0.9.2
unicorn 1.0.2
x509 1.0.0

icinga satellite version:

root@icinga-satellite-test-1:~# icinga2 --version
icinga2 - The Icinga 2 network monitoring daemon (version: r2.13.2-1)

Copyright (c) 2012-2022 Icinga GmbH (https://icinga.com/)
License GPLv2+: GNU GPL version 2 or later <https://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

System information:
  Platform: Debian GNU/Linux
  Platform version: 10 (buster)
  Kernel: Linux
  Kernel version: 4.19.0-18-amd64
  Architecture: x86_64

Build information:
  Compiler: GNU 8.3.0
  Build host: runner-hh8q3bz2-project-298-concurrent-0
  OpenSSL version: OpenSSL 1.1.1d  10 Sep 2019

Application information:

General paths:
  Config directory: /etc/icinga2
  Data directory: /var/lib/icinga2
  Log directory: /var/log/icinga2
  Cache directory: /var/cache/icinga2
  Spool directory: /var/spool/icinga2
  Run directory: /run/icinga2

Old paths (deprecated):
  Installation root: /usr
  Sysconf directory: /etc
  Run directory (base): /run
  Local state directory: /var

Internal paths:
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid

One agent from the test zone was reinstalled a few weeks ago. Since the reinstallation the director-global zone won’t sync to it, while other agents work without any problems. I compared the agent config to that of another agent and cannot find any issues. I have to censor the names in this post, so I will call the “broken” agent “agent-problem”.
Information for agent-problem:
icinga version:

$ icinga2 --version
icinga2 - The Icinga 2 network monitoring daemon (version: 2.13.2-1)
Copyright (c) 2012-2022 Icinga GmbH (https://icinga.com/)
License GPLv2+: GNU GPL version 2 or later <https://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
System information:
  Platform: Oracle Linux Server
  Platform version: 8.3
  Kernel: Linux
  Kernel version: 4.18.0-240.22.1.el8_3.x86_64
  Architecture: x86_64
Build information:
  Compiler: GNU 8.4.1
  Build host: runner-hh8q3bz2-project-322-concurrent-0
  OpenSSL version: OpenSSL 1.1.1g FIPS  21 Apr 2020
Application information:
General paths:
  Config directory: /etc/icinga2
  Data directory: /var/lib/icinga2
  Log directory: /var/log/icinga2
  Cache directory: /var/cache/icinga2
  Spool directory: /var/spool/icinga2
  Run directory: /run/icinga2
Old paths (deprecated):
  Installation root: /usr
  Sysconf directory: /etc
  Run directory (base): /run
  Local state directory: /var
Internal paths:
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid

os version:

$ cat /etc/os-release 
NAME="Oracle Linux Server"
VERSION="8.3"
ID="ol"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="8.3"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Oracle Linux Server 8.3"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:oracle:linux:8:3:server"
HOME_URL="https://linux.oracle.com/"
BUG_REPORT_URL="https://bugzilla.oracle.com/"
ORACLE_BUGZILLA_PRODUCT="Oracle Linux 8"
ORACLE_BUGZILLA_PRODUCT_VERSION=8.3
ORACLE_SUPPORT_PRODUCT="Oracle Linux"
ORACLE_SUPPORT_PRODUCT_VERSION=8.3

/etc/icinga2/zones.conf:

object Endpoint NodeName {
}

object Endpoint "icinga-satellite-test-2" {
host = "icinga-satellite-test-2"
}

object Endpoint "icinga-satellite-test-1" {
host = "icinga-satellite-test-1"
}

object Zone "agent-problem" {
endpoints = [ "agent-problem", ]
parent = "test-satellite"
}

object Zone "director-global" {
global = true
}

object Zone "global-templates" {
global = true
}

object Zone "test-satellite" {
endpoints = [ "icinga-satellite-test-1", "icinga-satellite-test-2", ]
}

Communication and registration between agent-problem and the satellites in the test-satellite zone work properly.

At first we only noticed that checks were stuck in the UNKNOWN state with the output

Check command ‘check_load’ does not exist.
We forced a resynchronisation of the zone by deleting /var/lib/icinga2/api/zones and /var/lib/icinga2/api/zones-stage and restarting the service. After that we got the following error in zones-stage:

[2022-02-01 09:31:18 +0100] information/cli: Icinga application loader (version: 2.13.1-1)
[2022-02-01 09:31:18 +0100] information/cli: Loading configuration file(s).
[2022-02-01 09:31:18 +0100] information/ConfigItem: Committing config item(s).
[2022-02-01 09:31:18 +0100] information/ApiListener: My API identity: << agent-problem >>
[2022-02-01 09:31:18 +0100] critical/config: Error: Array iterator requires value to be an array.
Location: in /var/lib/icinga2/api/zones-stage//director-global/director/service_apply.conf: 681:1-681:61
/var/lib/icinga2/api/zones-stage//director-global/director/service_apply.conf(679): }
/var/lib/icinga2/api/zones-stage//director-global/director/service_apply.conf(680): 
/var/lib/icinga2/api/zones-stage//director-global/director/service_apply.conf(681): apply Service "SMART-Status " for (config in host.vars.disks) {
                                                                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
/var/lib/icinga2/api/zones-stage//director-global/director/service_apply.conf(682):     import "service-agent-template"
/var/lib/icinga2/api/zones-stage//director-global/director/service_apply.conf(683):     import "service-5min"

Context:

	(0) Evaluating 'apply' rule (in /var/lib/icinga2/api/zones-stage//director-global/director/service_apply.conf: 681:1-681:61)
	(1) Evaluating 'apply' rules for host '<< agent-problem >>'
[2022-02-01 09:31:18 +0100] critical/config: 1 error
[2022-02-01 09:31:18 +0100] critical/cli: Config validation failed. Re-run with 'icinga2 daemon -C' after fixing the config.

I have no explanation for this error, and validation on the master and on all other agents does not fail. I wanted to see whether the sync would work if I removed all those checks; without them the following error appears:

[2022-02-07 12:17:27 +0100] critical/config: Error: Validation failed for object '<< agent-problem >>' of type 'Service'; Attribute 'command_endpoint': Checkable with command endpoint requires a zone. Please check the troubleshooting documentation.
Location: in /var/lib/icinga2/api/zones-stage//director-global/director/service_apply.conf: 13:1-13:19
/var/lib/icinga2/api/zones-stage//director-global/director/service_apply.conf(11): }
/var/lib/icinga2/api/zones-stage//director-global/director/service_apply.conf(12):
/var/lib/icinga2/api/zones-stage//director-global/director/service_apply.conf(13): apply Service "RAM" {
^^^^^^^^^^^^^^^^^^^
/var/lib/icinga2/api/zones-stage//director-global/director/service_apply.conf(14): import "service-agent-template"
/var/lib/icinga2/api/zones-stage//director-global/director/service_apply.conf(15): import "service-5min"
[2022-02-07 12:17:27 +0100] critical/config: 1 error 

After those errors we deleted the agent from the Director and tried to add it again, but that did not help at all. We checked the configuration and compared it to working agents, but we did not see any differences or errors.

Right now we do not have any clues as to what the source of the problem is or how to reproduce it on another, working agent.
Does anybody have an idea what we could look for or what the error could be?
If you need further information or have other questions, I will try to answer as well as I can.
Thanks in advance for your time and help.

Blind guess: Icinga Agent is not enabled for this host object in the director.

We use templates to add hosts, so this should not be possible.
This is the Director config:

zones.d/test-satellite/hosts.conf
object Host "agent-problem" {
    import "agent-test-template"
    import "host-centos-vars"

    display_name = "agent-problem"
    address = "<< ip_address >>"
    vars.ntp_address = "<< our ntp server for the ntp check >>"
}

zones.d/test-satellite/agent_endpoints.conf
object Endpoint "agent-problem" {
    log_duration = 0s
}

zones.d/test-satellite/agent_zones.conf
object Zone "agent-problem" {
    parent = "test-satellite"
    endpoints = [ "agent-problem" ]
}

and this is how it looks resolved:

zones.d/staspt-satellite/hosts.conf
object Host "agent-problem" {
    display_name = "agent-problem"
    address = "<< ip_address >>"
    check_command = "hostalive"
    max_check_attempts = "2"
    check_interval = 30s
    retry_interval = 15s
    enable_active_checks = true
    enable_perfdata = true
    zone = "test-satellite"
    vars.crit_mem = "10"
    vars.is_agent = true
    vars.is_hardware = false
    vars.is_puppet_agent = false
    vars.mail_notification = true
    vars.ntp_address = "<< our ntp >>"
    vars.os = "CentOS"
    vars.os_version = 8
    vars.procs_critical = "1500"
    vars.procs_warning = "1000"
    vars.warn_mem = "20"
}

zones.d/test-satellite/agent_endpoints.conf
object Endpoint "agent-problem" {
    log_duration = 0s
}

zones.d/test-satellite/agent_zones.conf
object Zone "agent-problem" {
    parent = "test-satellite"
    endpoints = [ "agent-problem" ]
}

The agent configuration as visualized in Icinga Director (screenshot not preserved).


The agent is also configured to accept commands and config over the API:

$ cat /etc/icinga2/features-enabled/api.conf

object ApiListener "api" {
  accept_commands = true
  accept_config = true
}

You should also delete /var/lib/icinga2/api/packages.

Hi,
we tried your solution but the error is still the same:

$ systemctl stop icinga2
$ rm -rf api/*
$ systemctl start icinga2
$ cat api/zones-stage/startup.log 
[2022-02-11 07:40:46 +0100] information/cli: Icinga application loader (version: 2.13.2-1)
[2022-02-11 07:40:46 +0100] information/cli: Loading configuration file(s).
[2022-02-11 07:40:46 +0100] information/ConfigItem: Committing config item(s).
[2022-02-11 07:40:46 +0100] information/ApiListener: My API identity: << agent-problem >>
[2022-02-11 07:40:47 +0100] critical/config: Error: Array iterator requires value to be an array.
Location: in /var/lib/icinga2/api/zones-stage//director-global/director/service_apply.conf: 2646:1-2646:61
/var/lib/icinga2/api/zones-stage//director-global/director/service_apply.conf(2644): 
/var/lib/icinga2/api/zones-stage//director-global/director/service_apply.conf(2645): */
/var/lib/icinga2/api/zones-stage//director-global/director/service_apply.conf(2646): apply Service "SMART-Status " for (config in host.vars.disks) {
                                                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
/var/lib/icinga2/api/zones-stage//director-global/director/service_apply.conf(2647):     import "service-agent-template"
/var/lib/icinga2/api/zones-stage//director-global/director/service_apply.conf(2648):     import "service-5min"
Context:
    (0) Evaluating 'apply' rule (in /var/lib/icinga2/api/zones-stage//director-global/director/service_apply.conf: 2646:1-2646:61)
    (1) Evaluating 'apply' rules for host '<< agent-problem >>'
[2022-02-11 07:40:47 +0100] critical/config: 1 error
[2022-02-11 07:40:47 +0100] critical/cli: Config validation failed. Re-run with 'icinga2 daemon -C' after fixing the config.

The error in the log did not change at all.

It looks like your host.vars.disks isn’t an array.
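
One way to check this on the agent itself is to ask Icinga 2 what the host object looks like locally. A minimal sketch, assuming the agent's own config validates so that the object cache exists; show_host_disks is a hypothetical helper name, while icinga2 object list with --type and --name is the stock CLI command:

```shell
# Print the "disks" custom variable of a host as the local node sees it.
# The object list output typically also names the file and line that set
# each attribute, which helps to spot definitions from unexpected places.
show_host_disks() {
    host_name="$1"
    icinga2 object list --type Host --name "$host_name" | grep -A 3 'disks'
}
```

If disks shows up with named keys rather than as a plain list, it is a dictionary, and an apply rule of the form for (config in host.vars.disks) cannot iterate over it.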

Yes, the error message says that the value is not an array, but …
this is the variable in the Director config

This is one of the services with that name (we have different alerting and so on for different servers, so we have to split it into multiple services, but the assignment is almost the same):

apply Service "SMART-Status " for (config in host.vars.disks) {
    check_command = "check_smart_status"
    max_check_attempts = "2"
    check_interval = 5m
    retry_interval = 30s
    enable_notifications = true
    enable_active_checks = true
    enable_perfdata = true
    assign where host.vars.disks && match("<< some chosen fqdn >>", host.name)
    command_endpoint = host_name
    vars.check_disk = "/dev/sda"
    vars.config = config

    import DirectorOverrideTemplate
}

As you can see, we already filter on whether the host has the array. At the moment the host does not have the array because we deleted it to see if that would help, but it does not. The error does not change even though the host should no longer match the service apply rule. Only if we delete all those services does the error change, as mentioned in the initial post. The check also works on other hosts, so I would say the check and its configuration should be fine.

I’m not sure if I get everything correctly. E.g. you have listed zones.d/staspt-satellite/hosts.conf and I’d assume staspt-satellite is not correct (and maybe a mistake from redaction).

Second, you have object Zone "agent-problem" ... in your zones.conf, but this is not necessary when using the director.

In your host.conf example I don’t see vars.disks.

BTW: There is no need for assign where host.vars.disks since apply for will not create services when the array is empty.

I’m not sure if I get everything correctly. E.g. you have listed zones.d/staspt-satellite/hosts.conf and I’d assume staspt-satellite is not correct (and maybe a mistake from redaction).

I tried to mask the zone names before, but I wanted to make sure the screenshot and the information are as clean as possible. The agent-problem host is connected to the staspt satellites, which are in the staspt-satellite zone. I think the configuration makes sense in that respect.

Second, you have object Zone "agent-problem" ... in your zones.conf, but this is not necessary when using the director.

OK, I didn’t know that. We took the configuration from the official Icinga 2 documentation (Distributed Monitoring - Icinga 2). So far it works fine for us; maybe we will change this in the future.

In your host.conf example I don’t see vars.disks.

Yes, I removed the array so that the host doesn’t get the check. I had hoped this would fix the problem or at least change the error, but the error only changes if I remove the apply rules completely.

So I found my problem today. The default configuration in /etc/icinga2/conf.d also defines a variable called disks, so the apply rule was triggered for the host by the local config rather than by anything set in the Director.
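
For anyone hitting the same error: the stock /etc/icinga2/conf.d/hosts.conf defines vars.disks as a dictionary on the local host object (NodeName), while an apply rule of the form for (config in host.vars.disks) expects an array, which would explain the "Array iterator requires value to be an array" message. A rough sketch of how to track this down and switch off conf.d on an agent; the helper names find_local_var and disable_confd are made up for illustration, and the paths are the package defaults shown earlier in the thread:

```shell
# List every line under a config directory that sets a given host custom
# variable. Defaults to the packaged conf.d directory.
find_local_var() {
    var_name="$1"
    conf_dir="${2:-/etc/icinga2/conf.d}"
    grep -rn --include='*.conf' "vars\.${var_name}" "$conf_dir"
}

# Comment out the conf.d include in the main config file so that the
# sample objects are no longer loaded. sed keeps a backup as <file>.bak.
disable_confd() {
    conf_file="${1:-/etc/icinga2/icinga2.conf}"
    sed -i.bak 's|^[[:space:]]*include_recursive "conf.d"|// &|' "$conf_file"
}
```

Typical use would be find_local_var disks to confirm where the variable comes from, then disable_confd, followed by icinga2 daemon -C and a restart of the service. Note that conf.d may also hold objects that are still needed (API users, for example), so those would have to be moved elsewhere first.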
