Handling of unknown Endpoints

Hi,

I am setting up an Icinga2 instance in a mixed Windows/Linux environment, but I am quite new to monitoring. The master server and all Linux clients run Ubuntu 18.04.2 with r2.10.5-1; all Windows clients run W7 x64 with v2.10.5. Most things, including config sync, work fine. However, the Windows clients show some strange behavior that I am not sure how to handle.

zones.conf on the master:

object Endpoint "DBA" {
        host = "IP-1"
}
object Endpoint "DB-M" {
        host = "IP-3"
}
object Endpoint "DB-2" {
        host = "IP-20"
}
object Endpoint "DB-3" {
        host = "IP-30"
}
object Endpoint "L2-Z3" {
        host = "IP-28"
}
object Endpoint "L2-Z4" {
        host = "IP-29"
}
object Endpoint "L3-Z3" {
        host = "IP-38"
}
object Endpoint "L3-Z4" {
        host = "IP-39"
}

object Zone "master" {
        endpoints = [ "DBA" ]
}
object Zone "DB-M" {
        endpoints = [ "DB-M" ]
        parent = "master"
}
object Zone "DB-2" {
        endpoints = [ "DB-2" ]
        parent = "master"
}
object Zone "DB-3" {
        endpoints = [ "DB-3" ]
        parent = "master"
}
object Zone "L2-Z3" {
        endpoints = [ "L2-Z3" ]
        parent = "master"
}
object Zone "L2-Z4" {
        endpoints = [ "L2-Z4" ]
        parent = "master"
}
object Zone "L3-Z3" {
        endpoints = [ "L3-Z3" ]
        parent = "master"
}
object Zone "L3-Z4" {
        endpoints = [ "L3-Z4" ]
        parent = "master"
}

object Zone "global-commands" {
        global = true
}
object Zone "Linux-commands" {
        global = true
}
object Zone "windows-commands" {
        global = true
}
object Zone "director-global" {
        global = true
}

On the clients, zones.conf contains only the relevant parts.
Linux clients (in this case DB-2):

object Endpoint "DBA" {
        host = "IP-1"
}
object Endpoint "DB-2" {
        host = "IP-20"
}
object Zone "master" {
        endpoints = [ "DBA" ]
}
object Zone "DB-2" {
        endpoints = [ "DB-2" ]
        parent = "master"
}
object Zone "global-commands" {
        global = true
}
object Zone "director-global" {
        global = true
}
object Zone "Linux-commands" {
        global = true
}

Windows-Clients (in this case L3-Z4):

object Endpoint "DBA" {
	host = "IP-1"
}
object Endpoint "L3-Z4" {
	host = "IP-39"
}
object Zone "master" {
	endpoints = [ "DBA" ]
}
object Zone "L3-Z4" {
	endpoints = [ "L3-Z4" ]
	parent = "master"
}

object Zone "global-commands" {
	global = true
}
object Zone "director-global" {
        global = true
}
object Zone "windows-commands" {
	global = true
}

The hosts are defined in the global-commands Zone /etc/icinga2/zones.d/global-commands:
(the file contains all 8 host objects, I just boiled it down a bit)

object Host "DBA" {
        import "generic-host"
        address = "IP-1"
        vars.client_endpoint = name
        vars.os = "Linux"
        zone = "master"
        vars.disks["Disk Usage"] = {
                disk_partitions = "/"
        }

}
object Host "DB-2" {
        import "generic-host"
        address = "IP-20"
        vars.client_endpoint = name
        vars.os = "Linux"
        vars.isdb = true
        check_command = "hostalive"
        zone = "master"
        vars.disks["Disk Usage"] = {
                disk_partitions = ["/", "/other/mountpoint"]
        }
}
object Host "L3-Z3" {
        import "generic-host"
        address = "IP-38"
        vars.client_endpoint = name
        vars.os = "Windows"
        check_command = "hostalive"
        zone = "master"
}
object Host "L3-Z4" {
        import "generic-host"
        address = "IP-39"
        vars.client_endpoint = name
        vars.os = "Windows"
        check_command = "hostalive"
        zone = "master"
}

There are several Linux-based checks in the linux-commands directory, all of which run fine. For example, /etc/icinga2/zones.d/linux-commands/cpu.conf:

apply Service "CPU Load" {
  import "generic-service"
  check_command = "load"
  command_endpoint = host.vars.client_endpoint
  assign where host.vars.os == "Linux"
}

Then there are Windows-based checks, for example /etc/icinga2/zones.d/windows-commands/disk.conf:

apply Service "Disk C" {
  check_command = "nscp-local-disk"
  command_endpoint = host.vars.client_endpoint
  vars.nscp_disk_showall = true
  assign where host.vars.client_endpoint && host.vars.os == "Windows"
}

Sure enough, on the Linux client everything works fine:

$ icinga2 daemon -C
[2019-06-19 08:18:50 +0000] information/cli: Icinga application loader (version: r2.10.5-1)
[2019-06-19 08:18:50 +0000] information/cli: Loading configuration file(s).
[2019-06-19 08:18:50 +0000] information/ConfigItem: Committing config item(s).
[2019-06-19 08:18:50 +0000] information/ApiListener: My API identity: DB-2
[2019-06-19 08:18:50 +0000] warning/ApplyRule: Apply rule 'mail-icingaadmin' (in /var/lib/icinga2/api/zones/global-commands/_etc/notifications.conf: 23:1-23:48) for type 'Notification' does not match anywhere!
[2019-06-19 08:18:50 +0000] information/ConfigItem: Instantiated 1 IcingaApplication.
[2019-06-19 08:18:50 +0000] information/ConfigItem: Instantiated 8 Hosts.
[2019-06-19 08:18:50 +0000] information/ConfigItem: Instantiated 1 FileLogger.
[2019-06-19 08:18:50 +0000] information/ConfigItem: Instantiated 2 NotificationCommands.
[2019-06-19 08:18:50 +0000] information/ConfigItem: Instantiated 8 Notifications.
[2019-06-19 08:18:50 +0000] information/ConfigItem: Instantiated 2 HostGroups.
[2019-06-19 08:18:50 +0000] information/ConfigItem: Instantiated 1 ApiListener.
[2019-06-19 08:18:50 +0000] information/ConfigItem: Instantiated 1 CheckerComponent.
[2019-06-19 08:18:50 +0000] information/ConfigItem: Instantiated 5 Zones.
[2019-06-19 08:18:50 +0000] information/ConfigItem: Instantiated 2 Endpoints.
[2019-06-19 08:18:50 +0000] information/ConfigItem: Instantiated 1 User.
[2019-06-19 08:18:50 +0000] information/ConfigItem: Instantiated 216 CheckCommands.
[2019-06-19 08:18:50 +0000] information/ConfigItem: Instantiated 1 UserGroup.
[2019-06-19 08:18:50 +0000] information/ConfigItem: Instantiated 7 ServiceGroups.
[2019-06-19 08:18:50 +0000] information/ConfigItem: Instantiated 3 TimePeriods.
[2019-06-19 08:18:50 +0000] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2019-06-19 08:18:50 +0000] information/cli: Finished validating the configuration file(s).

On the Windows client, however, there are strange errors (shortened a bit; there are errors for all services):

[2019-06-19 10:19:47 +0200] information/cli: Icinga application loader (version: v2.10.5)
[2019-06-19 10:19:47 +0200] information/cli: Loading configuration file(s).
[2019-06-19 10:19:47 +0200] information/ConfigItem: Committing config item(s).
[2019-06-19 10:19:47 +0200] information/ApiListener: My API identity: L3-Z3

[2019-06-19 10:19:48 +0200] critical/config: Error: Validation failed for object 'L2-Z3!Disk C' of type 'Service'; Attribute 'command_endpoint': Object 'L2-Z3' of type 'Endpoint' does not exist.
Location: in C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/Disk.conf: 3:3-3:46
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/Disk.conf(1): apply Service "Disk C" {
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/Disk.conf(2):   check_command = "nscp-local-disk"
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/Disk.conf(3):   command_endpoint = host.vars.client_endpoint
                                                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/Disk.conf(4):   vars.nscp_disk_showall = true
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/Disk.conf(5):   assign where host.vars.client_endpoint && host.vars.os == "Windows"


[2019-06-19 10:19:48 +0200] critical/config: Error: Validation failed for object 'L2-Z3!Memory Usage' of type 'Service'; Attribute 'command_endpoint': Object 'L2-Z3' of type 'Endpoint' does not exist.
Location: in C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/memory.conf: 3:3-3:46
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/memory.conf(1): apply Service "Memory Usage" {
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/memory.conf(2):   check_command = "nscp-local-memory"
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/memory.conf(3):   command_endpoint = host.vars.client_endpoint
                                                                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/memory.conf(4):   vars.nscp_memory_showall = true
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/memory.conf(5):   assign where host.vars.client_endpoint && host.vars.os == "Windows"


[2019-06-19 10:19:48 +0200] critical/config: Error: Validation failed for object 'L3-Z4!Disk C' of type 'Service'; Attribute 'command_endpoint': Object 'L3-Z4' of type 'Endpoint' does not exist.
Location: in C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/Disk.conf: 3:3-3:46
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/Disk.conf(1): apply Service "Disk C" {
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/Disk.conf(2):   check_command = "nscp-local-disk"
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/Disk.conf(3):   command_endpoint = host.vars.client_endpoint
                                                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/Disk.conf(4):   vars.nscp_disk_showall = true
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/Disk.conf(5):   assign where host.vars.client_endpoint && host.vars.os == "Windows"


[2019-06-19 10:19:48 +0200] critical/config: Error: Validation failed for object 'L3-Z4!Memory Usage' of type 'Service'; Attribute 'command_endpoint': Object 'L3-Z4' of type 'Endpoint' does not exist.
Location: in C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/memory.conf: 3:3-3:46
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/memory.conf(1): apply Service "Memory Usage" {
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/memory.conf(2):   check_command = "nscp-local-memory"
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/memory.conf(3):   command_endpoint = host.vars.client_endpoint
                                                                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/memory.conf(4):   vars.nscp_memory_showall = true
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/memory.conf(5):   assign where host.vars.client_endpoint && host.vars.os == "Windows"

[2019-06-19 10:19:48 +0200] critical/config: 12 errors

I know why the errors are there: the endpoints are not defined in the clients' zones.conf files. Some questions:

  1. Why do I get those errors only on Windows? The same setup runs on the Linux clients without errors.
  2. If I clear the synced zones, the Windows service starts fine and reports data to the master (as it did after the initial setup). However, if I restart the Windows service, it immediately stops again without any warning or log entry. If I run the daemon from the command line, I can see the errors though.
  3. What would be the recommended setup here? What would happen if I just copied the full zones.conf from the master to the clients? In my understanding, all satellites would then trigger all checks on all the other servers. Am I misunderstanding something?

Thanks in advance
Manuel

Erm, that doesn’t work: the master will already stop with a critical error when a host is put into a global zone. Please show the full output of icinga2 object list --type Host --name DBA so we can see where this really is defined on the master.

The Windows client is called L3-Z3, but for some reason a host object called L3-Z2 is synced to it; the service apply rule matches that host and reads its host.vars.client_endpoint attribute.

Why is that the case?

Furthermore, since you’re using command endpoints as the execution bridge, you don’t need to fully sync the apply rules and host objects to the clients. Instead, move them to the satellite/master zone above, and only let the checks be pushed via command endpoint.
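
A minimal sketch of that layout, assuming the file names hosts.conf and services.conf (both hypothetical) under /etc/icinga2/zones.d/master/ on the master. Objects in that directory belong to the master zone and are not synced down to the client zones; only the check execution is sent to the endpoint named in command_endpoint. The host name, address placeholder and custom vars are taken from the config posted above.

```
// /etc/icinga2/zones.d/master/hosts.conf (file name is an assumption)
object Host "L3-Z3" {
  import "generic-host"
  address = "IP-38"
  check_command = "hostalive"
  vars.client_endpoint = name   // must match the Endpoint object name in zones.conf
  vars.os = "Windows"
}

// /etc/icinga2/zones.d/master/services.conf (file name is an assumption)
apply Service "Disk C" {
  check_command = "nscp-local-disk"
  // scheduled on the master, executed on the client endpoint
  command_endpoint = host.vars.client_endpoint
  vars.nscp_disk_showall = true
  assign where host.vars.client_endpoint && host.vars.os == "Windows"
}
```

With this layout the client never has to validate a command_endpoint that points to an Endpoint object it does not know, because it no longer receives the host objects or apply rules.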

Cheers,
Michael

Well, it is working :slight_smile:
Here is the output of the command:

root@master:~$ icinga2 object list --type Host --name DBA
Object 'DBA' of type 'Host':
  % declared in '/etc/icinga2/zones.d/global-commands/hosts.conf', lines 1:0-1:16
  * __name = "DBA"
  * action_url = ""
  * address = "IP-1"
    % = modified in '/etc/icinga2/zones.d/global-commands/hosts.conf', lines 3:2-3:24
  * address6 = ""
  * check_command = "hostalive"
    % = modified in '/etc/icinga2/zones.d/global-commands/templates.conf', lines 19:3-19:29
  * check_interval = 60
    % = modified in '/etc/icinga2/zones.d/global-commands/templates.conf', lines 16:3-16:21
  * check_period = ""
  * check_timeout = null
  * command_endpoint = ""
  * display_name = "DBA"
  * enable_active_checks = true
  * enable_event_handler = true
  * enable_flapping = false
  * enable_notifications = true
  * enable_passive_checks = true
  * enable_perfdata = true
  * event_command = ""
  * flapping_threshold = 0
  * flapping_threshold_high = 30
  * flapping_threshold_low = 25
  * groups = [ ]
  * icon_image = ""
  * icon_image_alt = ""
  * max_check_attempts = 3
    % = modified in '/etc/icinga2/zones.d/global-commands/templates.conf', lines 15:3-15:24
  * name = "DBA"
  * notes = ""
  * notes_url = ""
  * package = "_etc"
  * retry_interval = 30
    % = modified in '/etc/icinga2/zones.d/global-commands/templates.conf', lines 17:3-17:22
  * source_location
    * first_column = 0
    * first_line = 1
    * last_column = 16
    * last_line = 1
    * path = "/etc/icinga2/zones.d/global-commands/hosts.conf"
  * templates = [ "DBA", "generic-host" ]
    % = modified in '/etc/icinga2/zones.d/global-commands/hosts.conf', lines 1:0-1:16
    % = modified in '/etc/icinga2/zones.d/global-commands/templates.conf', lines 14:1-14:28
  * type = "Host"
  * vars
    * client_endpoint = "DBA"
      % = modified in '/etc/icinga2/zones.d/global-commands/hosts.conf', lines 4:2-4:28
    * disks
      * Disk Usage
        % = modified in '/etc/icinga2/zones.d/global-commands/hosts.conf', lines 7:2-9:2
        * disk_partitions = "/"
    * notification
      * mail
        % = modified in '/etc/icinga2/zones.d/global-commands/templates.conf', lines 20:3-22:3
        * groups = [ "icingaadmins" ]
    * os = "Linux"
      % = modified in '/etc/icinga2/zones.d/global-commands/hosts.conf', lines 5:2-5:18
  * volatile = false
  * zone = "master"
    % = modified in '/etc/icinga2/zones.d/global-commands/hosts.conf', lines 6:2-6:16

I’m not sure where you read that; there is no reference to L3-Z2 anywhere, either in this post or in the configuration :thinking:

OK, I have checked the documentation again and see your point. I started by playing around with the Linux satellites and then just applied the same configuration to the Windows clients; that’s why the config is a bit messed up.

I think the final layout should be as follows:
  1. Sync CheckCommands in a global zone to satellites and clients.
  2. Windows: define hosts and run services in the master zone with command_endpoint set. This will schedule the checks on the master and execute them on the client.
  3. Linux: sync everything, and then? How do I define the behaviour of scheduling and executing locally?

Thanks,
Manuel

In my tests it is not, so I am not sure what’s going on here. Anyhow, even if you manage to put a host into a global zone, this technically won’t work - all endpoints will attempt to run the checks for these objects, resulting in mixed-up check results on the master and flapping states. This is discouraged and bad practice.

Still wondering how you could trick the config compiler though.
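
For comparison, a global zone is meant for objects that every node needs but that are not checked themselves, such as templates and CheckCommand definitions. A sketch, assuming a hypothetical command name my-disk-check, a hypothetical plugin file check_my_disk and a hypothetical file name commands.conf; the generic-host values are the ones visible in the object list output above:

```
// /etc/icinga2/zones.d/global-commands/templates.conf
template Host "generic-host" {
  max_check_attempts = 3
  check_interval = 1m
  retry_interval = 30s
  check_command = "hostalive"
}

// /etc/icinga2/zones.d/global-commands/commands.conf (file name is an assumption)
object CheckCommand "my-disk-check" {           // hypothetical command name
  command = [ PluginDir + "/check_my_disk" ]    // hypothetical plugin
  arguments = {
    "-w" = "$my_disk_warn$"
    "-c" = "$my_disk_crit$"
  }
}
```

Host objects would then live in the zone that is supposed to schedule their checks, not in the global zone.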

Mixed numbers, always hard in examples. It is L2-Z3 vs L3-Z3.

[2019-06-19 10:19:47 +0200] information/ApiListener: My API identity: L3-Z3

[2019-06-19 10:19:48 +0200] critical/config: Error: Validation failed for object 'L2-Z3!Disk C' of type 'Service'; Attribute 'command_endpoint': Object 'L2-Z3' of type 'Endpoint' does not exist.

Treat Windows and Linux clients the same - just the plugins and check commands may differ, but the command_endpoint setting is generally the same. This lets you model more advanced configurations such as …

apply Service "disk" {
  if (host.vars.os_type == "windows") {
    check_command = "disk-windows"
    vars.disk_win_warn = "..." //specific thresholds
  } else {
    // Linux
    check_command = "disk"
    vars.disk_warn = "..." // specific thresholds
  }

  command_endpoint = host.vars.client_endpoint

  assign where host.vars.client_endpoint
}

The above is not a must, and it is only possible with static config files and the DSL. If you’re using the Director, go for service sets and apply rules there, with assign where expressions that include e.g. the os_type custom var from the host object/template.
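
Adapted to the host objects posted earlier in this thread, which set vars.os to "Linux" or "Windows" rather than vars.os_type, the same idea might look like the sketch below. It reuses the nscp-local-disk and disk check commands already mentioned here; the threshold values are placeholders, not recommendations.

```
apply Service "disk" {
  if (host.vars.os == "Windows") {
    check_command = "nscp-local-disk"
    vars.nscp_disk_showall = true
  } else {
    // Linux
    check_command = "disk"
    vars.disk_wfree = "20%"   // placeholder warning threshold
    vars.disk_cfree = "10%"   // placeholder critical threshold
  }

  command_endpoint = host.vars.client_endpoint

  assign where host.vars.client_endpoint
}
```

Placed in the master zone, this single rule would cover both operating systems, and the clients only need the matching CheckCommand definitions.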

Cheers,
Michael