Hi,
I am setting up an Icinga 2 instance in a mixed Windows/Linux environment, but I am fairly new to monitoring. The master server and all Linux clients run Ubuntu 18.04.2 with Icinga 2 r2.10.5-1; all Windows clients run Windows 7 x64 with v2.10.5. Most things, including the config sync, work fine. However, the Windows clients show a strange behavior that I am not sure how to handle.
zones.conf on the master:
object Endpoint "DBA" {
  host = "IP-1"
}
object Endpoint "DB-M" {
  host = "IP-3"
}
object Endpoint "DB-2" {
  host = "IP-20"
}
object Endpoint "DB-3" {
  host = "IP-30"
}
object Endpoint "L2-Z3" {
  host = "IP-28"
}
object Endpoint "L2-Z4" {
  host = "IP-29"
}
object Endpoint "L3-Z3" {
  host = "IP-38"
}
object Endpoint "L3-Z4" {
  host = "IP-39"
}
object Zone "master" {
  endpoints = [ "DBA" ]
}
object Zone "DB-M" {
  endpoints = [ "DB-M" ]
  parent = "master"
}
object Zone "DB-2" {
  endpoints = [ "DB-2" ]
  parent = "master"
}
object Zone "DB-3" {
  endpoints = [ "DB-3" ]
  parent = "master"
}
object Zone "L2-Z3" {
  endpoints = [ "L2-Z3" ]
  parent = "master"
}
object Zone "L2-Z4" {
  endpoints = [ "L2-Z4" ]
  parent = "master"
}
object Zone "L3-Z3" {
  endpoints = [ "L3-Z3" ]
  parent = "master"
}
object Zone "L3-Z4" {
  endpoints = [ "L3-Z4" ]
  parent = "master"
}
object Zone "global-commands" {
  global = true
}
object Zone "Linux-commands" {
  global = true
}
object Zone "windows-commands" {
  global = true
}
object Zone "director-global" {
  global = true
}
On the clients, zones.conf only contains the relevant parts.
Linux clients (in this case DB-2):
object Endpoint "DBA" {
  host = "IP-1"
}
object Endpoint "DB-2" {
  host = "IP-20"
}
object Zone "master" {
  endpoints = [ "DBA" ]
}
object Zone "DB-2" {
  endpoints = [ "DB-2" ]
  parent = "master"
}
object Zone "global-commands" {
  global = true
}
object Zone "director-global" {
  global = true
}
object Zone "Linux-commands" {
  global = true
}
Windows clients (in this case L3-Z4):
object Endpoint "DBA" {
  host = "IP-1"
}
object Endpoint "L3-Z4" {
  host = "IP-39"
}
object Zone "master" {
  endpoints = [ "DBA" ]
}
object Zone "L3-Z4" {
  endpoints = [ "L3-Z4" ]
  parent = "master"
}
object Zone "global-commands" {
  global = true
}
object Zone "director-global" {
  global = true
}
object Zone "windows-commands" {
  global = true
}
The hosts are defined in the global-commands zone, /etc/icinga2/zones.d/global-commands
(the file contains all 8 host objects; I have trimmed it down here):
object Host "DBA" {
  import "generic-host"
  address = "IP-1"
  vars.client_endpoint = name
  vars.os = "Linux"
  zone = "master"
  vars.disks["Disk Usage"] = {
    disk_partitions = "/"
  }
}
object Host "DB-2" {
  import "generic-host"
  address = "IP-20"
  vars.client_endpoint = name
  vars.os = "Linux"
  vars.isdb = true
  check_command = "hostalive"
  zone = "master"
  vars.disks["Disk Usage"] = {
    disk_partitions = ["/", "/other/mountpoint"]
  }
}
object Host "L3-Z3" {
  import "generic-host"
  address = "IP-38"
  vars.client_endpoint = name
  vars.os = "Windows"
  check_command = "hostalive"
  zone = "master"
}
object Host "L3-Z4" {
  import "generic-host"
  address = "IP-39"
  vars.client_endpoint = name
  vars.os = "Windows"
  check_command = "hostalive"
  zone = "master"
}
There are several Linux-based checks in the linux-commands directory, all of which run fine. For example, /etc/icinga2/zones.d/linux-commands/cpu.conf:
apply Service "CPU Load" {
  import "generic-service"
  check_command = "load"
  command_endpoint = host.vars.client_endpoint
  assign where host.vars.os == "Linux"
}
Then there are Windows-based checks, for example /etc/icinga2/zones.d/windows-commands/disk.conf:
apply Service "Disk C" {
  check_command = "nscp-local-disk"
  command_endpoint = host.vars.client_endpoint
  vars.nscp_disk_showall = true
  assign where host.vars.client_endpoint && host.vars.os == "Windows"
}
Sure enough, on the Linux client the configuration validates without errors:
$ icinga2 daemon -C
[2019-06-19 08:18:50 +0000] information/cli: Icinga application loader (version: r2.10.5-1)
[2019-06-19 08:18:50 +0000] information/cli: Loading configuration file(s).
[2019-06-19 08:18:50 +0000] information/ConfigItem: Committing config item(s).
[2019-06-19 08:18:50 +0000] information/ApiListener: My API identity: DB-2
[2019-06-19 08:18:50 +0000] warning/ApplyRule: Apply rule 'mail-icingaadmin' (in /var/lib/icinga2/api/zones/global-commands/_etc/notifications.conf: 23:1-23:48) for type 'Notification' does not match anywhere!
[2019-06-19 08:18:50 +0000] information/ConfigItem: Instantiated 1 IcingaApplication.
[2019-06-19 08:18:50 +0000] information/ConfigItem: Instantiated 8 Hosts.
[2019-06-19 08:18:50 +0000] information/ConfigItem: Instantiated 1 FileLogger.
[2019-06-19 08:18:50 +0000] information/ConfigItem: Instantiated 2 NotificationCommands.
[2019-06-19 08:18:50 +0000] information/ConfigItem: Instantiated 8 Notifications.
[2019-06-19 08:18:50 +0000] information/ConfigItem: Instantiated 2 HostGroups.
[2019-06-19 08:18:50 +0000] information/ConfigItem: Instantiated 1 ApiListener.
[2019-06-19 08:18:50 +0000] information/ConfigItem: Instantiated 1 CheckerComponent.
[2019-06-19 08:18:50 +0000] information/ConfigItem: Instantiated 5 Zones.
[2019-06-19 08:18:50 +0000] information/ConfigItem: Instantiated 2 Endpoints.
[2019-06-19 08:18:50 +0000] information/ConfigItem: Instantiated 1 User.
[2019-06-19 08:18:50 +0000] information/ConfigItem: Instantiated 216 CheckCommands.
[2019-06-19 08:18:50 +0000] information/ConfigItem: Instantiated 1 UserGroup.
[2019-06-19 08:18:50 +0000] information/ConfigItem: Instantiated 7 ServiceGroups.
[2019-06-19 08:18:50 +0000] information/ConfigItem: Instantiated 3 TimePeriods.
[2019-06-19 08:18:50 +0000] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2019-06-19 08:18:50 +0000] information/cli: Finished validating the configuration file(s).
On the Windows client, however, validation fails with strange errors (output shortened; the same error appears for every service):
[2019-06-19 10:19:47 +0200] information/cli: Icinga application loader (version: v2.10.5)
[2019-06-19 10:19:47 +0200] information/cli: Loading configuration file(s).
[2019-06-19 10:19:47 +0200] information/ConfigItem: Committing config item(s).
[2019-06-19 10:19:47 +0200] information/ApiListener: My API identity: L3-Z3
[2019-06-19 10:19:48 +0200] critical/config: Error: Validation failed for object 'L2-Z3!Disk C' of type 'Service'; Attribute 'command_endpoint': Object 'L2-Z3' of type 'Endpoint' does not exist.
Location: in C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/Disk.conf: 3:3-3:46
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/Disk.conf(1): apply Service "Disk C" {
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/Disk.conf(2): check_command = "nscp-local-disk"
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/Disk.conf(3): command_endpoint = host.vars.client_endpoint
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/Disk.conf(4): vars.nscp_disk_showall = true
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/Disk.conf(5): assign where host.vars.client_endpoint && host.vars.os == "Windows"
[2019-06-19 10:19:48 +0200] critical/config: Error: Validation failed for object 'L2-Z3!Memory Usage' of type 'Service'; Attribute 'command_endpoint': Object 'L2-Z3' of type 'Endpoint' does not exist.
Location: in C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/memory.conf: 3:3-3:46
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/memory.conf(1): apply Service "Memory Usage" {
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/memory.conf(2): check_command = "nscp-local-memory"
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/memory.conf(3): command_endpoint = host.vars.client_endpoint
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/memory.conf(4): vars.nscp_memory_showall = true
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/memory.conf(5): assign where host.vars.client_endpoint && host.vars.os == "Windows"
[2019-06-19 10:19:48 +0200] critical/config: Error: Validation failed for object 'L3-Z4!Disk C' of type 'Service'; Attribute 'command_endpoint': Object 'L3-Z4' of type 'Endpoint' does not exist.
Location: in C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/Disk.conf: 3:3-3:46
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/Disk.conf(1): apply Service "Disk C" {
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/Disk.conf(2): check_command = "nscp-local-disk"
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/Disk.conf(3): command_endpoint = host.vars.client_endpoint
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/Disk.conf(4): vars.nscp_disk_showall = true
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/Disk.conf(5): assign where host.vars.client_endpoint && host.vars.os == "Windows"
[2019-06-19 10:19:48 +0200] critical/config: Error: Validation failed for object 'L3-Z4!Memory Usage' of type 'Service'; Attribute 'command_endpoint': Object 'L3-Z4' of type 'Endpoint' does not exist.
Location: in C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/memory.conf: 3:3-3:46
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/memory.conf(1): apply Service "Memory Usage" {
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/memory.conf(2): check_command = "nscp-local-memory"
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/memory.conf(3): command_endpoint = host.vars.client_endpoint
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/memory.conf(4): vars.nscp_memory_showall = true
C:\ProgramData\icinga2\var\lib\icinga2\api\zones/windows-commands/_etc/memory.conf(5): assign where host.vars.client_endpoint && host.vars.os == "Windows"
[2019-06-19 10:19:48 +0200] critical/config: 12 errors
I know why the errors occur: the endpoints are not defined in the client's zones.conf file. Some questions:
- Why do I get these errors only on Windows? The same setup runs on the Linux clients without errors.
- If I clear the synced zones, the Windows service starts fine and reports data to the master (as it did after the initial setup). However, if I restart the Windows service, it immediately stops again without any warning or log entry. The errors only show up when I run the daemon from the command line.
- What would be the recommended setup here? What would happen if I just copied the full zones.conf from the master to the clients? As I understand it, every satellite would then trigger all checks on all the other servers. Am I misunderstanding something?
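For comparison, the top-down command endpoint layout described in the Icinga 2 distributed monitoring documentation keeps Host objects and Service apply rules in the master zone directory and puts only shared definitions into global zones. The following is a minimal sketch of how that might look for one Windows host, assuming that layout fits this setup; the file names hosts.conf and services.conf are hypothetical, and the zone attribute is dropped on the assumption that the directory under zones.d already determines the zone.

```
// /etc/icinga2/zones.d/master/hosts.conf (hypothetical file name)
// Objects in the master zone directory are not synced down to the clients.
object Host "L3-Z4" {
  import "generic-host"
  address = "IP-39"
  check_command = "hostalive"
  vars.client_endpoint = name
  vars.os = "Windows"
}

// /etc/icinga2/zones.d/master/services.conf (hypothetical file name)
// The apply rule is evaluated on the master, whose zones.conf defines
// every Endpoint object referenced by command_endpoint.
apply Service "Disk C" {
  check_command = "nscp-local-disk"
  command_endpoint = host.vars.client_endpoint
  vars.nscp_disk_showall = true
  assign where host.vars.client_endpoint && host.vars.os == "Windows"
}

// /etc/icinga2/zones.d/windows-commands/
// Would then hold only what the clients need locally, such as custom
// CheckCommand definitions and templates.
```

With such a layout a client would never validate services for other hosts, so its zones.conf could stay limited to the master and its own endpoint.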
Thanks in advance
Manuel