Second master in HA intermittently disconnected from (random) child zones

Hi,

I’m trying to set up HA for the master zone in my environment. As soon as I start the new master node, nearly half of my child zones report this error:

Icinga 2 Cluster Problem: 1 endpoints are not connected.
(icinga-1master.com)

A tcpdump capture taken on the new node and opened in Wireshark shows a TLS warning:

Expert Info (Warning/Protocol): Ignored Unknown Record

As a consequence, check latency builds up in the affected zones and they end up disconnected. This prevents monitoring and forces me to roll back the secondary master.

Following this external document, is there a way I can configure TLS parameters on the secondary master? The document lists these Wireshark settings:

Make sure “Allow subdissectors to reassemble TCP streams” is enabled in the TCP protocol preferences
Make sure “Reassemble TLS records spanning multiple TCP segments” is enabled in the TLS protocol preferences
Make sure “Reassemble TLS Application Data spanning multiple TLS records” is enabled in the TLS protocol preferences

My Infrastructure:
current master (no connection issues):
root@monitoring-icinga-1.master.com ~# uname -a
Linux monitoring-icinga-1.master.com 3.10.0-1160.6.1.el7.x86_64 #1 SMP Tue Nov 17 13:59:11 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
root@monitoring-icinga-1.master.com ~# rpm -qa | grep icinga2
icinga2-2.11.6-1.el7.icinga.x86_64
icinga2-bin-2.11.6-1.el7.icinga.x86_64
icinga2-ido-mysql-2.11.6-1.el7.icinga.x86_64
icinga2-common-2.11.6-1.el7.icinga.x86_64

new master (connection issues)
root@icinga-1.master.com ~# uname -a
Linux icinga-1.master.com 3.10.0-1160.6.1.el7.x86_64 #1 SMP Tue Nov 17 13:59:11 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
root@icinga-1.master.com ~# rpm -qa | grep icinga
icinga2-2.11.6-1.el7.icinga.x86_64
icinga2-common-2.11.6-1.el7.icinga.x86_64
icinga2-ido-mysql-2.11.6-1.el7.icinga.x86_64
icinga2-bin-2.11.6-1.el7.icinga.x86_64

Infrastructure configuration details:
root@monitoring-icinga-1.master.com ~# icinga2 feature list
Disabled features: command compatlog debuglog elasticsearch gelf graphite livestatus opentsdb perfdata statusdata syslog
Enabled features: api checker ido-mysql influxdb mainlog notification
root@monitoring-icinga-1.master.com ~# icinga2 daemon --validate
[2020-12-03 06:24:01 -0500] information/cli: Icinga application loader (version: 2.11.6-1)
[2020-12-03 06:24:01 -0500] information/cli: Loading configuration file(s).
[2020-12-03 06:24:02 -0500] information/ConfigItem: Committing config item(s).
[2020-12-03 06:24:02 -0500] information/ApiListener: My API identity: monitoring-icinga.master.com
[2020-12-03 06:24:12 -0500] information/WorkQueue: #4 (DaemonUtility::LoadConfigFiles) items: 8, rate: 12.1333/s (728/min 728/5min 728/15min); empty in 18599 days, 11 hours, 24 minutes and 12 seconds
[2020-12-03 06:24:12 -0500] information/WorkQueue: #5 (InfluxdbWriter, influxdb) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-12-03 06:24:12 -0500] information/WorkQueue: #6 (ApiListener, RelayQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-12-03 06:24:12 -0500] information/WorkQueue: #7 (ApiListener, SyncQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-12-03 06:25:23 -0500] information/ConfigItem: Instantiated 1 InfluxdbWriter.
[2020-12-03 06:25:23 -0500] information/ConfigItem: Instantiated 112 ScheduledDowntimes.
[2020-12-03 06:25:23 -0500] information/ConfigItem: Instantiated 301 HostGroups.
[2020-12-03 06:25:23 -0500] information/ConfigItem: Instantiated 1 EventCommand.
[2020-12-03 06:25:23 -0500] information/ConfigItem: Instantiated 1 FileLogger.
[2020-12-03 06:25:23 -0500] information/ConfigItem: Instantiated 3 NotificationCommands.
[2020-12-03 06:25:23 -0500] information/ConfigItem: Instantiated 1 NotificationComponent.
[2020-12-03 06:25:23 -0500] information/ConfigItem: Instantiated 97230 Notifications.
[2020-12-03 06:25:23 -0500] information/ConfigItem: Instantiated 1 IcingaApplication.
[2020-12-03 06:25:23 -0500] information/ConfigItem: Instantiated 6165 Hosts.
[2020-12-03 06:25:23 -0500] information/ConfigItem: Instantiated 329 Downtimes.
[2020-12-03 06:25:23 -0500] information/ConfigItem: Instantiated 1 ApiListener.
[2020-12-03 06:25:23 -0500] information/ConfigItem: Instantiated 112763 Dependencies.
[2020-12-03 06:25:23 -0500] information/ConfigItem: Instantiated 3468 Comments.
[2020-12-03 06:25:23 -0500] information/ConfigItem: Instantiated 1 CheckerComponent.
[2020-12-03 06:25:23 -0500] information/ConfigItem: Instantiated 61 Zones.
[2020-12-03 06:25:23 -0500] information/ConfigItem: Instantiated 75 Endpoints.
[2020-12-03 06:25:23 -0500] information/ConfigItem: Instantiated 20 ApiUsers.
[2020-12-03 06:25:23 -0500] information/ConfigItem: Instantiated 1 IdoMysqlConnection.
[2020-12-03 06:25:23 -0500] information/ConfigItem: Instantiated 425 CheckCommands.
[2020-12-03 06:25:23 -0500] information/ConfigItem: Instantiated 6 TimePeriods.
[2020-12-03 06:25:23 -0500] information/ConfigItem: Instantiated 2 Users.
[2020-12-03 06:25:23 -0500] information/ConfigItem: Instantiated 88085 Services.
[2020-12-03 06:25:23 -0500] information/ConfigItem: Instantiated 51 ServiceGroups.
[2020-12-03 06:25:23 -0500] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2020-12-03 06:25:23 -0500] information/cli: Finished validating the configuration file(s).
You have new mail in /var/spool/mail/root

/etc/icinga2 is mirrored on both nodes, with these differences:
- constants.conf: each node has a different NodeName.
- zones.d/: on the node without issues, all child zone configuration files are present. On the node with issues, the folder is empty, so that the old node stays authoritative for the zones.

Error logs on the new master:

[2020-12-02 05:27:42 -0500] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity 'monitoring-icinga-satellite-1.xxx'
Error: Connection reset by peer
 
 
        (0) icinga2: icinga::JsonRpc::SendRawMessage(std::shared_ptr<icinga::AsioTlsStream> const&, icinga::String const&, boost::asio::basic_yield_context<boost::asio::executor_binder<void (*)(), boost::asio::executor> >) (+0x8a) [0x94fa9a]
        (1) icinga2: icinga::JsonRpcConnection::WriteOutgoingMessages(boost::asio::basic_yield_context<boost::asio::executor_binder<void (*)(), boost::asio::executor> >) (+0x231) [0xac5a41]
        (2) /usr/lib64/icinga2/sbin/icinga2() [0xac5e6a]
        (3) /usr/lib64/icinga2/sbin/icinga2() [0xac6348]
        (4) libboost_context.so.1.69.0: make_fcontext (+0x2f) [0x7fea294b518f]

Is there any advice on how to troubleshoot this?
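
One way to narrow this down is to tally the warning shown above per endpoint identity, which shows whether the connection resets cluster on particular satellites or network paths. The following is a minimal Python sketch and not part of Icinga 2: the function name `count_send_errors` is hypothetical, and the pattern assumes the log line format seen in the excerpt above.

```python
import re
from collections import Counter

# Matches the warning line that precedes each "Connection reset by peer" trace.
# Assumption: the wording is the same as in the log excerpt above.
SEND_ERROR = re.compile(
    r"warning/JsonRpcConnection: Error while sending JSON-RPC message "
    r"for identity '([^']+)'"
)


def count_send_errors(lines):
    """Count JSON-RPC send errors per endpoint identity.

    `lines` is any iterable of log lines, for example an open file.
    """
    counts = Counter()
    for line in lines:
        match = SEND_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

It could be fed the main log, for example `count_send_errors(open("/var/log/icinga2/icinga2.log"))`, assuming the default mainlog path. The result is a `Counter`, so `.most_common(10)` lists the most affected endpoints first.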

According to our network experts, my master dropped the TLS packets flagged as “Ignored Unknown Record” because a WAN optimization device was tampering with them.
The Icinga traffic now bypasses that device, and the zones are fully connected without issues.
Marking as solved.
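
For anyone checking a similar fix, a small probe run from the new master can show whether a TLS handshake with a satellite completes once the traffic bypasses the device. This is a sketch under assumptions: the helper name `tls_handshake_ok` is hypothetical, 5665 is Icinga 2's default API port, and certificate verification is deliberately disabled because only the handshake itself is being tested.

```python
import socket
import ssl


def tls_handshake_ok(host, port=5665, timeout=5.0):
    """Try a TLS handshake and report the outcome.

    Returns (True, negotiated_protocol) on success,
    or (False, error_text) if the connection or handshake fails.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Only the handshake is of interest here, so the peer certificate
    # is not validated. Do not reuse this context for real traffic.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                return True, tls.version()
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        return False, str(exc)
```

Calling `tls_handshake_ok("monitoring-icinga-satellite-1.xxx")` for each satellite, before and after the bypass, would show whether handshakes that previously failed or were reset now succeed. A device that corrupts records mid-stream may still let the initial handshake through, so a successful probe alone does not prove the path is clean.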

I recommend that you question the vendor of that device regarding what
“optimization” means.

Antony.