Acknowledgements not syncing between masters

Interesting.
Though this does not really make sense to me, as only the first working command transport will/should be used, at least that’s how I understand the documentation:

Icinga Web 2 will try one transport after another to send a command until the command is successfully sent.

But I just tried it myself after having added the second master as a command transport as well.
And it really works.
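For reference, and assuming the standard monitoring module setup, the transport configuration now looks roughly like this (hostnames and credentials are placeholders) in /etc/icingaweb2/modules/monitoring/commandtransports.ini:

[master1]
transport = "api"
host = "master1.example.com"
port = 5665
username = "icingaweb2"
password = "secret"

[master2]
transport = "api"
host = "master2.example.com"
port = 5665
username = "icingaweb2"
password = "secret"

With both sections present, Icinga Web 2 tries them one after another until a command is sent successfully, exactly as the quoted documentation describes.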

I would really like to know why or where my mistake in understanding lies.

That was my understanding too - I was under the impression there were cluster/config sync bugs in ~2.10 which implied the config master should be the one to receive the transport commands, but I could be wrong on that.

Perhaps the transport is unrelated, though, as I am also having other timing-related issues with my core since upgrading, and I am starting to think this is the root of it all.

Do I understand you correctly: you configured both the API and the local command pipe (cmd-file)? And after that it works?
My understanding was that we should rather use the API (only?) instead.

Our Icinga Web is containerized, so we only use the API and no local command pipe - I can rule this out as being a bug or any weird behaviour; I think it was purely due to the lack of a secondary master being configured in the monitoring module…

This is what was happening prior to me adding both masters as transports - assuming the following:

  • Only config master (primary) configured in the monitoring module
  • Secondary master current active endpoint
  1. Ack any alert
  2. Icinga Web sends the ack to the primary (as it is the only configured transport)
  3. The secondary master does not receive the ack, and the primary is not writing to the IDO because the secondary is the active endpoint, so Icinga Web still displays it as un-ack’d.
  4. When attempting to re-ack, Icinga Web repeats step 3.

After adding the secondary master as a transport, the ack apparently got written to the IDO, which is why Icinga Web then displayed it properly.
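To take Icinga Web out of the picture, the ack should boil down to an actions call against the Icinga 2 REST API, roughly like this (host, credentials and filter are placeholders):

# sketch only: acknowledge a single service problem via the API
curl -k -s -u icingaweb2:secret -H 'Accept: application/json' \
  -X POST 'https://master1.example.com:5665/v1/actions/acknowledge-problem' \
  -d '{ "type": "Service", "filter": "host.name==\"web01\" && service.name==\"http\"", "author": "icingaweb2", "comment": "test ack", "notify": true }'

Sending that once against the primary and once against the secondary, and then checking the IDO, would show which master actually gets the ack persisted.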

I haven’t tried stopping one master after sending an ack since we’re running into the other issues I mentioned, so we’ve removed the secondary master entirely until I can work out whether this is a config issue or a bug in 2.12.2…

However I am also curious to know what happens if Icinga Web sends the API request to the config master/primary while it is not the active endpoint - or maybe it sends to both and assumes one will deal with it correctly.


@theFeu
May I ask whether you or anyone else from the Icinga team has an explanation for this “issue” of having only one or both master servers in the command transports?
The behaviour observed here somewhat contradicts the documentation.


Good idea @log1c.

Maybe we will get the correct way to configure this and also a way to test it better in our setups, as well as a way to identify whether this is a bug or normal behaviour. If it is a bug, we can deliver logs etc.

Hi there!
Sorry for the late reply, I was holed up for my holiday :man_technologist:
I will ask around :slight_smile:

I guess the monitored object was acked in Icinga 2, but this fact got lost from (or was missing entirely in) the IDO DB - presumably because the IDO-writing master’s API didn’t accept_config nor accept_commands from the one which acked the checkable.

Thanks for the reply @Al2Klimov (and thanks for asking around @theFeu).

My config master has only accept_commands = true, the second master has both set to true.
Afair setting accept_config to true on the config master is not recommended (or doesn’t even work?!).
Ack was done while the config master was the active endpoint.

As having both masters in the command transports seems to be the way to go, I will try to remember that in future setups.

Also thanks from my side @Al2Klimov.

In our setup we have both masters configured with

accept_config = true
accept_commands = true

Every master has its own DB.
But it also happens here sometimes.
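For completeness, the two masters form a standard HA pair, i.e. both endpoints sit in one master zone in zones.conf (names here are placeholders):

object Endpoint "master1.example.com" { }
object Endpoint "master2.example.com" { }

object Zone "master" {
  // two endpoints in the same zone: checks are balanced between them and state is replicated
  endpoints = [ "master1.example.com", "master2.example.com" ]
}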

Is IDO HA explicitly disabled?

Does either the documentation or one of us devs say so?

yes it is set to false on both nodes
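i.e. roughly like this in features-enabled/ido-mysql.conf on each node, with every master pointing at its own local database (credentials are placeholders):

object IdoMysqlConnection "ido-mysql" {
  host = "localhost"
  user = "icinga"
  password = "secret"
  database = "icinga"

  // each master writes to its own local IDO, so HA failover for the feature is disabled
  enable_ha = false
}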

Actually I haven’t checked that in some time.
Also not sure why this is stuck in my head.
Maybe it’s from this line in the log, I don’t know:
information/ApiListener: Ignoring config update from endpoint 'ma01' for zone 'azure' because we have an authoritative version of the zone's config.

Then I will from now on put this in the features-enabled/api.conf of clusters as well :slight_smile:
Thanks for the heads up.
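For anyone else reading along, that is just the two attributes on the ApiListener object - a minimal sketch:

object ApiListener "api" {
  // allow this endpoint to receive config updates and commands from the other master
  accept_config = true
  accept_commands = true
}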

Here is a semi-related issue where this was recommended by a former Icinga team member.


That workaround shouldn’t be needed anymore as the bug has been fixed.

Do you know when (version/commit) it was fixed? I had that issue in 2.11.2.

Have you tried v2.11.8?

I have not. Getting ready to go to 2.12.3 shortly, though.

Things not syncing properly since 2.12.2 in an HA setup sound suspiciously like they could be related to the following bug that was introduced there and is fixed in the already released 2.12.3: https://github.com/Icinga/icinga2/issues/8533. If this is the case, you should be able to find a log message similar to the one in the issue.