Cluster-zone check: Zone '...' is not connected. Log lag: less than 1 millisecond

I’m trying to implement the cluster zone health check as described in the documentation.
I have 4 zones including the master zone: 3 up and 1 down. However, all zones are marked as “not connected”, even the master zone.
The error message is in the topic title.

I tried raising the warn/crit levels, without success.
The results from the agents arrive correctly from the zones, but the cluster-zone check seems to fail.
How can I debug this?
Also, is there a way to debug the Icinga 2 built-in checks in the ITL?

What does your service object look like?

apply Service "Cluster Zone Health" {
    import "template-cluster-zone-check"

    assign where host.vars.agent_endpoint
    vars.cluster_lag_critical = 60000
    vars.cluster_lag_warning = 30000
    vars.cluster_zone = "$$"

    import DirectorOverrideTemplate
}


This is wrong:

and you can remove it since the ITL definition points to $$ by default as described here.

my Zone names are not equal to
Also, the checks seem to be applied correctly, as the correct zone names are queried.
Update: I removed this variable, but nothing changed.

You should read this chapter, especially the parts about conventions, e.g.: "Agent nodes also have their own unique zone. By convention you must use the FQDN for the zone name."
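
To illustrate that convention, here is a minimal sketch. It assumes a hypothetical agent with the FQDN agent1.example.com, a parent zone named master, and a placeholder address; none of these come from this thread. The Endpoint, Zone and Host objects all share the same name, so the cluster-zone check can take the zone name from the host name.

```icinga2
// zones.conf on the master (all names and addresses are hypothetical)
object Endpoint "agent1.example.com" {
  host = "192.0.2.10"  // placeholder address of the agent
}

object Zone "agent1.example.com" {
  endpoints = [ "agent1.example.com" ]
  parent = "master"    // assumed name of the parent zone
}

// The host object name equals the agent's zone name (the FQDN)
object Host "agent1.example.com" {
  check_command = "hostalive"
  address = "192.0.2.10"
  vars.agent_endpoint = name
}

// Zone health check that relies on host name == zone name
apply Service "agent-health" {
  check_command = "cluster-zone"
  vars.cluster_zone = host.name
  assign where host.vars.agent_endpoint
}
```

If the host object name differs from the agent's zone name, the check would query a zone that does not exist under that name, so it is worth comparing the two.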

OK, thanks, but it does not seem to resolve my issue.

The zone names are valid and they provide data. It says the lag is under 1 ms, so why does it show as “down”?

I’m wondering what you mean by “down”, since a service check does not have this state available.

Sorry, not down but critical.

Do you have any idea how to debug the ITL commands from the shell?
Maybe that can help with debugging this issue.

You could use the icinga2 console as described here.
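
A minimal sketch of such a console session follows. It assumes the API listens on localhost:5665 and that an ApiUser named root with the password icinga exists; both are placeholders, as is the agent name agent1.example.com. The expressions only read runtime attributes, so they should be safe to run on the master.

```icinga2
// Started from the shell with (credentials and URL are placeholders):
//   icinga2 console --connect 'https://root:icinga@localhost:5665/'

// Is the agent's endpoint currently connected to this node?
get_object(Endpoint, "agent1.example.com").connected

// Which endpoints belong to the zone the check is asking about?
get_object(Zone, "agent1.example.com").endpoints

// Plugin output of the last run of the service from this thread
get_service("agent1.example.com", "Cluster Zone Health").last_check_result.output
```

If get_object returns null for the zone, the name passed to the check does not match any Zone object known to that node.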