On the satellites, it checks whether the satellite is connected to the parent zone (which consists of 2 nodes). This works exactly as I expect: if only 1 parent node is unavailable, the check still says “OK”.
Because this would not let me notice a disconnected zone from the parent side, I also check in the opposite direction:
On the parent zone: for each child zone, I check whether the child zone is connected. The check runs on one of the parent nodes, chosen by the Icinga 2 scheduler.
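A check of this kind could look roughly like the sketch below. The host name and zone name are placeholders and not taken from the actual setup; `cluster-zone` and its `cluster_zone` custom variable come from the Icinga 2 template library.

```
// Sketch only: "master1.example.com" and "satellite" are hypothetical names.
object Service "zone-satellite-connected" {
  // Host object living in the parent zone; either parent endpoint
  // may end up executing the check.
  host_name = "master1.example.com"

  check_command = "cluster-zone"

  // Name of the child zone whose connection state is checked.
  vars.cluster_zone = "satellite"
}
```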
Problem: The cluster-zone check says “CRITICAL” when it runs on the one node the satellite is currently not connected to. How can I influence this behaviour? I would expect a non-OK state only if the zone isn’t connected at all.
I see a check-command property “ha_mode 0” within Director, but I don’t know how to set it (and don’t know whether this is the right value).
It looks like a bool, so 0 or 1, or true or false. Maybe experiment with it on the CLI first, and then model it in Director to apply it to the hosts or services.
I could be missing something here, but when an endpoint has a parent zone with 2 endpoints, the parent zone definition on the child needs to list both parent endpoints, so that the child connects to both parents.
This is one reason we built the clustergraph module, to find issues like this, where a child has been misconfigured to only have 1 parent when there are two endpoints in the parent zone.
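As an illustration of a child that knows about both parents, a `zones.conf` on the satellite might look like the following sketch. All endpoint names, zone names, and addresses here are made up for the example; the point is only that the parent zone lists two endpoints and both have an `Endpoint` object.

```
// Sketch only: names and addresses are hypothetical.

// Both parent endpoints must be defined on the child ...
object Endpoint "master1.example.com" {
  host = "192.0.2.11"
}

object Endpoint "master2.example.com" {
  host = "192.0.2.12"
}

// ... and both must be listed in the parent zone.
object Zone "master" {
  endpoints = [ "master1.example.com", "master2.example.com" ]
}

// The satellite itself and its own zone, pointing at the parent zone.
object Endpoint "satellite1.example.com" {
}

object Zone "satellite" {
  endpoints = [ "satellite1.example.com" ]
  parent = "master"
}
```

If only one of the two parent endpoints appears in the `master` zone on the child, the child has no way to connect to the second parent, which would match the symptom described above.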