That was my understanding too - I was under the impression there were cluster/config sync bugs in ~2.10 which implied the config master should be the one to receive the transport commands, but I could be wrong on that.
Perhaps the transport is unrelated, though: since upgrading I have been having other timing-related issues with my core, and I am starting to think that is the root of it all.
Our Icinga Web is containerized, so it only uses the API and no local command pipe. I think I can rule out a bug or any weird behaviour here; it was purely down to the secondary master not being configured in the monitoring module…
This is what was happening before I added both masters as transports, assuming the following setup:
- Only the config master (primary) is configured in the monitoring module
- The secondary master is the current active endpoint
Then:

1. Ack any alert.
2. Icinga Web sends the ack to the primary (as it is the only configured endpoint).
3. The secondary master does not receive the ack, since the primary is not writing to the IDO while the secondary is the active endpoint, but Icinga Web still displays the alert as un-ack’d.
4. When attempting to re-ack, Icinga Web just repeats steps 2-3.
After adding the secondary master as a transport, the ack did get written to the IDO, which is why Icinga Web then displayed it properly.
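For reference, a minimal sketch of what the working transport config looks like in the monitoring module, i.e. `/etc/icingaweb2/modules/monitoring/commandtransports.ini` with both masters listed (hostnames and credentials are placeholders):

```ini
; Icinga Web 2 tries transports in the order they are defined, so with both
; masters listed a command can still reach a working endpoint if one is down.
; Hostnames and credentials below are placeholders.
[master1]
transport = "api"
host = "master1.example.com"
port = 5665
username = "icingaweb2"
password = "secret"

[master2]
transport = "api"
host = "master2.example.com"
port = 5665
username = "icingaweb2"
password = "secret"
```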
I haven’t tried stopping one master after sending an ack since we’re running into the other issues I mentioned, so we’ve removed the secondary master entirely until I can work out whether this is a config issue or a bug in 2.12.2…
However, I am also curious what happens if Icinga Web sends the API request to the config master/primary while it is not the active endpoint - or whether it sends to both and assumes one will deal with it correctly.
May I ask whether you or anyone else from the Icinga team has an explanation for this “issue” of having only one versus both master servers in the command transports?
The behavior we are seeing somewhat contradicts the documentation.
Maybe we can get the correct way to configure this, a way to test it better in our setups, and a way to determine whether this is a bug or normal behavior. If it is a bug, we can deliver logs etc.
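One way to test it, under the assumption that Icinga Web’s ack is just a regular API action: send an acknowledge-problem request directly to each master in turn and check whether it shows up on the other master and in the IDO. Hostnames, credentials, and the host/service names below are placeholders:

```bash
# Ack a service problem directly on one master (all names are placeholders),
# then query the other master or the IDO to see whether the ack propagated.
curl -k -s -u icingaweb2:secret \
  -H 'Accept: application/json' \
  -X POST 'https://master1.example.com:5665/v1/actions/acknowledge-problem' \
  -d '{
        "type": "Service",
        "filter": "host.name==\"app01\" && service.name==\"http\"",
        "author": "tester",
        "comment": "transport test",
        "notify": false
      }'
```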
I guess the monitored object was acked in Icinga 2, but that fact got lost from (or never made it into) the IDO DB, presumably because the IDO-writing master’s API didn’t accept_config or accept_commands from the endpoint that acked the checkable.
My config master has only accept_commands = true, while the second master has both set to true.
AFAIR, setting accept_config to true on the config master is not recommended (or doesn’t even work?!).
The ack was done while the config master was the active endpoint.
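To illustrate the setup just described (this mirrors my config, not an official recommendation), the ApiListener bits in features-enabled/api.conf look roughly like this on the config master:

```
// features-enabled/api.conf on the config master:
// it accepts commands, but its own zones.d config stays authoritative.
object ApiListener "api" {
  accept_commands = true
  accept_config = false
}
```

and on the secondary master:

```
// features-enabled/api.conf on the secondary master:
// it accepts both the synced config and relayed commands.
object ApiListener "api" {
  accept_commands = true
  accept_config = true
}
```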
As having both masters in the command transports seems to be the way to go, I will try to remember that in future setups.
Actually I haven’t checked that in some time.
Also not sure why this is stuck in my head.
Maybe from this line in the log, I don’t know:
information/ApiListener: Ignoring config update from endpoint 'ma01' for zone 'azure' because we have an authoritative version of the zone's config.
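For context, my understanding of why that message is normal on a config master (a sketch; only the ‘azure’ zone name is taken from the quoted log line, the file names are made up): config kept under /etc/icinga2/zones.d/ is the authoritative copy and gets synced out to the other endpoints, so incoming cluster updates for those zones are deliberately ignored:

```
/etc/icinga2/zones.d/          # on the config master: authoritative zone config
├── azure/                     # the zone from the quoted log line
│   └── hosts.conf             # hypothetical example file
└── global-templates/
    └── templates.conf         # hypothetical example file
```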
Then from now on I will put this in the features-enabled/api.conf of my clusters as well.
Thanks for the heads up.
Things not syncing properly since 2.12.2 in an HA setup sound suspiciously like the following bug, which was introduced there and is fixed in the already released 2.12.3: https://github.com/Icinga/icinga2/issues/8533
If this is the case, you should be able to find a log message similar to the one in the issue.