Director - Some services in a different zone than the host

When using Icinga Director, I have a very small subset of services that are being deployed to a different zone than their host. This breaks config loading in the zone where the service is deployed, because the host does not exist in that zone’s zones-stage config:

critical/config: Error: Validation failed for object 'some_redacted_host!SERVICE_NAME' of type 'Service'; Attribute 'host_name': Object 'some_redacted_host' of type 'Host' does not exist.
Location: in /var/lib/icinga2/api/zones-stage//Zone-VPN/director/services.conf: 49629:5-49629:42
/var/lib/icinga2/api/zones-stage//Zone-VPN/director/services.conf(49627):
/var/lib/icinga2/api/zones-stage//Zone-VPN/director/services.conf(49628): object Service "SERVICE_NAME" {
/var/lib/icinga2/api/zones-stage//Zone-VPN/director/services.conf(49629):     host_name = "some_redacted_host"
                                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
/var/lib/icinga2/api/zones-stage//Zone-VPN/director/services.conf(49630):     import "Service_Template_5min"
/var/lib/icinga2/api/zones-stage//Zone-VPN/director/services.conf(49631):     import "BACKBONE"

[2020-05-14 09:42:05 -0500] critical/config: 48 errors

This occurs 48 times in the output above, for a small subset of hosts. There appears to be no commonality between the hosts or the services, other than that they all import “BACKBONE”. Note, however, that not all services importing “BACKBONE” are affected.
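To see which hosts and services are behind the errors, the log can be grouped by host. The following is a minimal sketch, assuming the validation errors all have the same shape as the message quoted above; the function names are hypothetical and not part of Icinga.

```python
import re
from collections import Counter

# Matches the "host does not exist" validation error quoted above.
# Assumption: every affected line follows this exact wording.
MISSING_HOST = re.compile(
    r"Validation failed for object '(?P<host>[^'!]+)!(?P<service>[^']+)' "
    r"of type 'Service'; Attribute 'host_name': "
    r"Object '[^']+' of type 'Host' does not exist"
)


def missing_host_services(log_text):
    """Return a (host, service) pair for each matching validation error."""
    return [
        (match.group("host"), match.group("service"))
        for match in MISSING_HOST.finditer(log_text)
    ]


def count_by_host(pairs):
    """Count how many failing services each host has."""
    return Counter(host for host, _service in pairs)
```

Feeding it the captured output of the failed config validation gives a per-host count, which makes it easier to look for something the affected hosts have in common.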

During the Director import process, we assign the zone based on a field from a SQL query, and we assign it to the host, not the service.

The above services are trying to go to Zone-VPN, but the hosts and other services under the hosts live in Zone-2.

With the above information about the services, I found that the “Cluster Zone” property is NOT set on:

  • check command
  • command template
  • service templates

I found the solution just as I was finishing this post, so I decided to post the problem along with my solution in case anyone else runs into it:

While looking, I thought that the given services were deployed by a service apply rule, but it turns out that this particular service is imported via an automation job in Director. One of the sync properties sets the zone that the service should exist in. It used to be that these were all in the same zone, but we split them out due to the large number of SNMP queries we were performing. After that split (about 2 months ago), the services were never moved to match. I removed the Zone property from the service’s sync rule, and everything seems fine.
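A quick way to spot this kind of drift is to compare each service’s explicit zone against its host’s zone. This is a rough sketch over plain Python data, assuming you have already exported host and service zones from your import source; the data layout and the function name are hypothetical, and nothing here calls Director itself.

```python
def find_zone_mismatches(host_zones, services):
    """Return services whose explicit zone differs from their host's zone.

    host_zones: dict mapping host name -> zone name (hypothetical layout).
    services:   list of dicts with "host", "name" and an optional "zone".
    """
    mismatches = []
    for service in services:
        service_zone = service.get("zone")
        if service_zone is None:
            # No explicit zone on the service, so nothing to compare.
            continue
        host_zone = host_zones.get(service["host"])
        if host_zone != service_zone:
            mismatches.append(
                (service["host"], service["name"], service_zone, host_zone)
            )
    return mismatches


# Example using the zone names from this post.
hosts = {"some_redacted_host": "Zone-2"}
services = [
    {"host": "some_redacted_host", "name": "SERVICE_NAME", "zone": "Zone-VPN"},
    {"host": "some_redacted_host", "name": "OTHER_SERVICE"},
]
for host, name, service_zone, host_zone in find_zone_mismatches(hosts, services):
    print(f"{host}!{name}: service in {service_zone}, host in {host_zone}")
```

Services without an explicit zone are skipped, which mirrors the fix described above: once the Zone property is no longer set by the sync rule, there is nothing left to disagree with the host.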

What led me to discover this 2 months after the fact wasn’t that the services were missing from monitoring: an end user pointed out to me that a small subset of services were stuck in the pending state. During troubleshooting I found that all of the seemingly random services were in the Zone-VPN zone, the very zone that couldn’t load the zones-stage config. Only services/hosts created in that zone after I made the change are affected, which is why it was only a small subset of services.
