Agents crashed after updating clients to version r2.13.0-1 with the server on r2.12.3-1

Hi,

Thanks to everybody for reading. We have run into an issue after upgrading some Ubuntu clients to the latest version available in the repository. The agents that were updated to r2.13.0-1 have crashed: their service status shows failed and they have stopped working.

The server runs Ubuntu Server 16.04 LTS with ESM enabled. The latest version available in the server's repository is r2.12.3-1.

Is this a compatibility issue?

In the clients' logs we can see this:

root@unifi:/etc/icinga2# service icinga2 status
● icinga2.service - Icinga host/service/network monitoring system
   Loaded: loaded (/lib/systemd/system/icinga2.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/icinga2.service.d
           └─limits.conf
   Active: failed (Result: exit-code) since Tue 2021-08-03 22:14:52 CEST; 9min ago
  Process: 17247 ExecStartPre=/usr/lib/icinga2/prepare-dirs /etc/default/icinga2 (code=exited, status=0/SUCCESS)
  Process: 17262 ExecStart=/usr/sbin/icinga2 daemon --close-stdio -e ${ICINGA2_ERROR_LOG} (code=exited, status=1/FAILURE)
 Main PID: 17262 (code=exited, status=1/FAILURE)
   Status: "Activating config objects..."

ago 03 22:14:52 unifi icinga2[17284]: [2021-08-03 22:14:52 +0200] information/ConfigItem: Instantiated 1 IcingaApplication.
ago 03 22:14:52 unifi icinga2[17284]: [2021-08-03 22:14:52 +0200] information/ConfigItem: Instantiated 1 ApiListener.
ago 03 22:14:52 unifi icinga2[17284]: [2021-08-03 22:14:52 +0200] information/ConfigItem: Instantiated 4 Zones.
ago 03 22:14:52 unifi icinga2[17284]: [2021-08-03 22:14:52 +0200] information/ConfigItem: Instantiated 2 Endpoints.
ago 03 22:14:52 unifi icinga2[17284]: [2021-08-03 22:14:52 +0200] information/ConfigItem: Instantiated 244 CheckCommands.
ago 03 22:14:52 unifi icinga2[17284]: [2021-08-03 22:14:52 +0200] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
ago 03 22:14:52 unifi icinga2[17262]: [2021-08-03 22:14:52 +0200] information/cli: Closing console log.
ago 03 22:14:52 unifi systemd[1]: Started Icinga host/service/network monitoring system.
ago 03 22:14:52 unifi systemd[1]: icinga2.service: Main process exited, code=exited, status=1/FAILURE
ago 03 22:14:52 unifi systemd[1]: icinga2.service: Failed with result 'exit-code'.
root@unifi:/etc/icinga2#

And if we run the daemon in the foreground:

root@unifi:/etc/icinga2# icinga2 daemon
[2021-08-03 22:24:35 +0200] information/cli: Icinga application loader (version: r2.13.0-1)
[2021-08-03 22:24:35 +0200] information/cli: Loading configuration file(s).
[2021-08-03 22:24:36 +0200] information/ConfigItem: Committing config item(s).
[2021-08-03 22:24:36 +0200] information/ApiListener: My API identity: unifi
[2021-08-03 22:24:36 +0200] information/ConfigItem: Instantiated 1 IcingaApplication.
[2021-08-03 22:24:36 +0200] information/ConfigItem: Instantiated 1 ApiListener.
[2021-08-03 22:24:36 +0200] information/ConfigItem: Instantiated 4 Zones.
[2021-08-03 22:24:36 +0200] information/ConfigItem: Instantiated 2 Endpoints.
[2021-08-03 22:24:36 +0200] information/ConfigItem: Instantiated 244 CheckCommands.
[2021-08-03 22:24:36 +0200] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2021-08-03 22:24:36 +0200] information/ConfigObject: Restoring program state from file '/var/lib/icinga2/icinga2.state'
[2021-08-03 22:24:36 +0200] information/ConfigObject: Restored 290 objects. Loaded 3 new objects without state.
[2021-08-03 22:24:36 +0200] information/ConfigItem: Triggering Start signal for config items
[2021-08-03 22:24:36 +0200] information/ApiListener: 'api' started.
[2021-08-03 22:24:36 +0200] critical/ApiListener: Cannot bind TCP socket for host '::' on port '5665': open: Address family not supported by protocol
Context:
        (0) Activating object 'api' of type 'ApiListener'

[2021-08-03 22:24:36 +0200] critical/ApiListener: Cannot add listener on host '::' for port '5665'.
Context:
        (0) Activating object 'api' of type 'ApiListener'

We have re-run the node setup and the host is correctly established. We don't know where the issue is, but every Ubuntu server that has been updated to r2.13 has crashed, while the older ones on r2.12 keep working fine.

Thanks a lot

@metainnova_es
The parent node needs to run the same or a more recent version than the child node.

Please have a look at the documentation:
https://icinga.com/docs/icinga-2/latest/doc/06-distributed-monitoring/#versions-and-upgrade

Good morning everyone. I just want to chime in that I see the same behaviour after updating to 2.13.0-1.focal (but, as you can see, on Ubuntu 20.04).

systemctl status icinga2.service

● icinga2.service - Icinga host/service/network monitoring system
     Loaded: loaded (/lib/systemd/system/icinga2.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/icinga2.service.d
             └─limits.conf
     Active: failed (Result: exit-code) since Wed 2021-08-04 07:25:26 CEST; 21min ago
    Process: 16589 ExecStartPre=/usr/lib/icinga2/prepare-dirs /etc/default/icinga2 (code=exited, status=0/SUCCESS)
    Process: 16594 ExecStart=/usr/sbin/icinga2 daemon --close-stdio -e ${ICINGA2_ERROR_LOG} (code=exited, status=1/FAILURE)
   Main PID: 16594 (code=exited, status=1/FAILURE)
     Status: "Activating config objects..."

Aug 04 07:25:26 icinga icinga2[16623]: [2021-08-04 07:25:26 +0200] information/ConfigItem: Instantiated 1 User.
Aug 04 07:25:26 icinga icinga2[16623]: [2021-08-04 07:25:26 +0200] information/ConfigItem: Instantiated 3 TimePeriods.
Aug 04 07:25:26 icinga icinga2[16623]: [2021-08-04 07:25:26 +0200] information/ConfigItem: Instantiated 3 ServiceGroups.
Aug 04 07:25:26 icinga icinga2[16623]: [2021-08-04 07:25:26 +0200] information/ConfigItem: Instantiated 5 ScheduledDowntimes.
Aug 04 07:25:26 icinga icinga2[16623]: [2021-08-04 07:25:26 +0200] information/ConfigItem: Instantiated 213 Services.
Aug 04 07:25:26 icinga icinga2[16623]: [2021-08-04 07:25:26 +0200] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
Aug 04 07:25:26 icinga icinga2[16594]: [2021-08-04 07:25:26 +0200] information/cli: Closing console log.
Aug 04 07:25:26 icinga systemd[1]: Started Icinga host/service/network monitoring system.
Aug 04 07:25:26 icinga systemd[1]: icinga2.service: Main process exited, code=exited, status=1/FAILURE
Aug 04 07:25:26 icinga systemd[1]: icinga2.service: Failed with result 'exit-code'.

icinga2 daemon

[...]
[2021-08-04 07:39:58 +0200] critical/ApiListener: Cannot bind TCP socket for host '::' on port '5665': open: Address family not supported by protocol
Context:
        (0) Activating object 'api' of type 'ApiListener'

[2021-08-04 07:39:58 +0200] critical/ApiListener: Cannot add listener on host '::' for port '5665'.
Context:
        (0) Activating object 'api' of type 'ApiListener'

Any help would be greatly appreciated.

Edit: @nexo1960 This is the behaviour on the master and not on one of the nodes. At least for me.

@reiner.hormann
Please share your api.conf - usually located under /etc/icinga2

Here is my api.conf

/**
 * The API listener is used for distributed monitoring setups.
 */
object ApiListener "api" {

  ticket_salt = TicketSalt
}

I’m scratching my head about

Cannot bind TCP socket for host '::'

and

Address family not supported by protocol

Could it be a problem that I do not have IPv6 on that VM?

Yes, try setting bind_host to 0.0.0.0:
https://icinga.com/docs/icinga-2/latest/doc/09-object-types/#apilistener
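For example, the ApiListener object would then look like this (a minimal sketch based on the api.conf shared above; keep any other attributes your listener already has):

```
/**
 * Minimal sketch: force the API listener onto IPv4 only.
 */
object ApiListener "api" {
  // Pre-2.13 default; avoids the new IPv6 default of "::"
  bind_host = "0.0.0.0"

  ticket_salt = TicketSalt
}
```

After editing the file, validate and restart with `icinga2 daemon -C` and `systemctl restart icinga2`.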


@metainnova_es
Depending on your setup you may want to disable the API feature on your agents. But please first check who is initiating the connection to/from the parent node.

That did the trick, thank you very much!

For later reference: I added bind_host = "0.0.0.0" to api.conf and restarted icinga2.service. After that, the API is bound only to my IPv4 address and no longer tries to bind to IPv6. This is needed since I do not use IPv6 on those machines.

After that I edited api.conf on my other nodes the same way, since they had the same problem.


Hi @nexo1960 and @reiner.hormann

Indeed, adding the line bind_host = "0.0.0.0" to api.conf solves the issue. We have reviewed the network interfaces of the different clients and operating systems, and the machines on version 2.13 with the IPv6 protocol disabled are the only ones that stop responding and give the fatal error. The fix is to add that line to api.conf on the clients. Is there any reason for the failure?

We restored a backup of one of the VMs that had failed, enabled IPv6 (it had been disabled through GRUB with ipv6.disable=1), and after enabling it we updated again; now there is no issue at all. That proves that the new version 2.13 needs IPv6 enabled on Linux agents, or otherwise needs bind_host set to 0.0.0.0 in api.conf to get the icinga2 agent running again.

Another question, @nexo1960: is there any plan to maintain the xenial repository and update it to version 2.13, or has development of that repo been discontinued? In our case we have servers with ESM enabled, so we can receive updates for three to five more years. That is why the server is on 2.12.5 and the clients are on 2.13. It's not possible to update the server to a recent version without upgrading the operating system to 18.04 or 20.04, because the xenial repository doesn't publish that version.

Thanks again.

@metainnova_es
As far as I can tell, xenial has been dropped.

You may be able to run 2.12 on the master and 2.13 on the clients, but keep in mind that this is neither tested nor supported by the devs.

@theFeu
could you help here? Is IPv6 now a requirement for Icinga 2.13?

Update: there is also a GitHub issue open regarding this problem.

Did all of you explicitly fully disable IPv6 support yourself? Or is there a way to end up with this configuration by default (maybe some cloud provider images with strange defaults)?

There’s no hard requirement on IPv6, as you can get back to the old behavior by setting bind_host = "0.0.0.0". Version 2.13 changed the default from 0.0.0.0 to :: in order to additionally use IPv6 by default (:: also implicitly listens on IPv4), but doing so requires the operating system to at least know that IPv6 exists (there’s no need for any kind of IPv6 connectivity).
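A quick way to check which situation a machine is in (a sketch, not Icinga-specific): when the kernel is booted with ipv6.disable=1, the /proc/sys/net/ipv6 directory is absent entirely, which is exactly the case where binding to '::' fails with "Address family not supported by protocol".

```shell
#!/bin/sh
# Sketch: detect whether the running kernel has an IPv6 stack at all.
# With ipv6.disable=1 on the kernel command line, /proc/sys/net/ipv6
# does not exist, and binds to '::' fail with EAFNOSUPPORT.
if [ -d /proc/sys/net/ipv6 ]; then
    echo "IPv6 stack present: the 2.13 default bind_host '::' should work"
else
    echo "IPv6 stack disabled: set bind_host = \"0.0.0.0\" in api.conf"
fi
```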

Hi,

Indeed, on the clients where the update caused this issue, IPv6 support had been disabled completely via "GRUB_CMDLINE_LINUX_DEFAULT" with the option "ipv6.disable=1" in the GRUB file, so the kernel doesn't start anything related to the IPv6 protocol.

Thanks a lot.

By you or your :cloud: provider?

Hi @Al2Klimov,

We have disabled IPv6 on a lot of servers ourselves.

Not an Icinga-specific comment, but I advise against this.

If you do not want machines to communicate using IPv6, simply don’t give them
routable IPv6 addresses, but disabling the IPv6 stack entirely can cause
problems with more software than just Icinga.

Out of interest, why do you not want to use IPv6 anyway?

Antony.
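As a gentler alternative to the GRUB method (a sketch; the filename below is hypothetical, and this is my reading of the distinction rather than anything from the Icinga docs): the per-interface sysctls disable IPv6 addressing without removing the kernel's IPv6 stack, so daemons that bind to '::' can still start.

```
# /etc/sysctl.d/99-disable-ipv6-addrs.conf  (hypothetical filename)
# Disables IPv6 addresses on interfaces while keeping the kernel's
# IPv6 stack present, so creating and binding AF_INET6 sockets still
# succeeds (unlike booting with ipv6.disable=1).
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
```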


Hi @Pooh,

It’s simply a decision about system resources and usability. We don’t use anything over IPv6 and our network is not configured for it, so we preferred to disable it.

We will enable it again if it’s necessary for icinga2.

Thanks a lot to everybody.

Even more interesting. Did just the IPv6 subsystem itself actually waste that many system resources?

Hi @Al2Klimov

No, the IPv6 subsystem itself barely takes any system resources, but we prefer to disable what we are not going to use. Apart from that, on the monitored VMware hypervisors, VMware Tools shows a lot of IPv6 addresses in the cluster console that are simply unnecessary to display, because apart from loopback they have no use if the local network is not set up for IPv6.

I can confirm that adding bind_host = "0.0.0.0" fixed the problem for me, and I too have fully deactivated IPv6 using:

GRUB_CMDLINE_LINUX_DEFAULT="ipv6.disable=1"

in /etc/default/grub.