Change Output/State Result of Built-in Commands

XilityWorks · May 5, 2023, 7:42pm

I am using Director to deploy configuration from my master host down to an aggregation host, which then handles all inbound connections from ~22 satellites. In this top-down setup, config replicates from the master node to the aggregation host, then to each of our zones and any sub-zones within them. I believe I may have found a bug in Director, or perhaps just an unfortunate side effect of how Director deploys config.

My environment

The Problem

Director does not validate config at every level of a top-down distributed infrastructure. As a result, Director deploys configuration that initially looks valid but falls apart at later stages of the deployment.

How to Recreate

I believe any bad zone-based config will trigger this, but our organization ran into it by mistake when we moved a host out of an incorrect zone without also moving its parent:

1. Set Host A as a host dependency of Host B.
2. Move Host B to a different zone than Host A.
3. Deploy config.
4. Config validates on the master node, then deploys downward.
5. The lowest satellite in the chain before the zones of Host A and Host B diverge records the following log entry and halts the deployment:

[2023-01-31 13:56:11 -0600] critical/ApiListener: Config validation failed for staged cluster config sync in ‘/var/lib/icinga2/api/zones-stage/’. Aborting. Logs: ‘/var/lib/icinga2/api//zones-stage-startup-last-failed.log’

Impact

Satellites and agents below this divergence never receive deployment updates. The only ways to notice that this has happened are to check the logs on the host (/var/log/icinga2/icinga2.log) or to use the built-in “icinga” command. However, the “icinga” command reports failed deployments only as WARNING, with seemingly no option to change this to CRITICAL or any other state.

Attempted Workarounds

I have tried the “negate” command that ships with Monitoring-Plugins and is included in the ITL. It requires the command being “negated” to be referenced by absolute path, which means it only works for third-party plugins: the built-in “icinga” command has no path to reference. A feature request that I believe would solve my issue has been open and unassigned since November: https://github.com/Icinga/icinga2/issues/9475
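To illustrate why the absolute path matters: a negate-style wrapper works by launching the wrapped plugin as a separate process and rewriting its exit code, so it can only wrap something that exists as an executable on disk. The built-in “icinga” check runs inside the Icinga 2 daemon, so there is nothing to launch. Below is a minimal, hypothetical Python sketch of that remapping idea (the script name `remap.py` and the WARNING-to-CRITICAL policy are assumptions for illustration; the real `negate` exposes similar remapping through options such as `--warning=STATUS`):

```python
import subprocess
import sys

# Monitoring Plugins exit-code convention.
STATES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def remap_state(code: int, mapping: dict[int, int]) -> int:
    """Return the remapped exit code; codes not in the mapping pass through unchanged."""
    return mapping.get(code, code)


def run_and_remap(argv: list[str], mapping: dict[int, int]) -> tuple[int, str]:
    """Run an external plugin (argv[0] is its path) and remap its exit code."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return remap_state(proc.returncode, mapping), proc.stdout


def main() -> int:
    if len(sys.argv) < 2:
        print("UNKNOWN - usage: remap.py /absolute/path/to/plugin [args...]")
        return STATES["UNKNOWN"]
    # Hypothetical policy: escalate WARNING to CRITICAL, pass everything else through.
    mapping = {STATES["WARNING"]: STATES["CRITICAL"]}
    try:
        code, output = run_and_remap(sys.argv[1:], mapping)
    except OSError as exc:
        # Nothing executable at that path (e.g. a built-in check has no file to run).
        print(f"UNKNOWN - cannot execute {sys.argv[1]}: {exc}")
        return STATES["UNKNOWN"]
    print(output, end="")
    return code


if __name__ == "__main__":
    sys.exit(main())
```

Pass the wrapped plugin's absolute path as the first argument; the wrapper prints the plugin's output and exits with the remapped state. This is exactly the step that cannot be applied to a check that never runs as its own process.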

I also tried the “logfiles” command. However, it is a third-party plugin that ships with neither Monitoring-Plugins nor Icinga2. Deploying third-party plugins is something we’re still working on internally, so I’d like to avoid adding to the list of third-party plugins we need to deploy.
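For reference, a log-based check of this kind essentially boils down to scanning icinga2.log for the critical/ApiListener message and exiting with the CRITICAL code (2). The following is a minimal, hypothetical Python sketch, not the “logfiles” plugin itself; it assumes the log path and message text quoted in this post:

```python
import re
import sys

# Assumptions: log path from the post; pattern from the critical/ApiListener line quoted above.
LOG_PATH = "/var/log/icinga2/icinga2.log"
FAILED_SYNC = re.compile(
    r"critical/ApiListener: Config validation failed for staged cluster config sync"
)

# Monitoring Plugins exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def check_lines(lines) -> tuple[int, str]:
    """Report CRITICAL if any line records a failed staged config sync."""
    last_failure = None
    for line in lines:
        if FAILED_SYNC.search(line):
            last_failure = line.strip()  # keep the most recent occurrence
    if last_failure is None:
        return OK, "OK - no failed staged config sync in log"
    return CRITICAL, f"CRITICAL - {last_failure}"


def main() -> int:
    try:
        with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
            code, text = check_lines(log)
    except OSError as exc:
        code, text = UNKNOWN, f"UNKNOWN - cannot read {LOG_PATH}: {exc}"
    print(text)
    return code


if __name__ == "__main__":
    sys.exit(main())
```

This naive whole-file scan keeps reporting an old failure until the log rotates; log-file plugins typically remember their read offset between runs to avoid that, which is part of why a maintained plugin is preferable to a hand-rolled one.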

Preferred Solution

Obviously, I would prefer that people just don’t make bad config changes to my environment. However, accidents happen, and implementing just-in-time (JIT) config validation for every leg of the deployment seems like a big feature request. So when someone does make a mistake, I would like to be notified with a “CRITICAL” problem state: allow me to configure the “icinga” command to report a failed configuration deployment as “CRITICAL” instead of “WARNING” (ideally, the returned state would be left to user configuration).

I’ll leave this open for a few days before opening a feature request if no one has any ideas. Thank you for your time!