Icinga2 API not replying randomly

Hey Icinga community,

I’d like to share with you a problem about the Icinga2 API, after upgrading from 2.10 to 2.12.

I can easily reproduce it with the following command:
for ((i=1;i<=100;i++)); do curl -k -v "https://icinga.com:5665/v1";sleep 1; done

It generally fail after 10-20 call. Sometimes more, sometimes less. The curl command stucks on “* Trying a.b.c.d…”
I’ve checked the number of connection on the ApiListener with:

curl -s -u 'user:password' https://icinga.com:5665/v1/status/ApiListener?pretty=1' | jq '.results[0].status.api.http'
{
  "clients": 452
}

As my infrastructure is loadbalanced on 2 Master servers, you can multiply this number x2.

Some more infos:

# icinga2 --version

icinga2 - The Icinga 2 network monitoring daemon (version: r2.12.3-1)

Copyright (c) 2012-2021 Icinga GmbH (https://icinga.com/)
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

System information:
  Platform: Debian GNU/Linux
  Platform version: 9 (stretch)
  Kernel: Linux
  Kernel version: 4.9.0-14-amd64
  Architecture: x86_64

Build information:
  Compiler: GNU 6.3.0
  Build host: runner-hh8q3bz2-project-298-concurrent-0
  OpenSSL version: OpenSSL 1.1.0l  10 Sep 2019

Application information:

General paths:
  Config directory: /etc/icinga2
  Data directory: /var/lib/icinga2
  Log directory: /var/log/icinga2
  Cache directory: /var/cache/icinga2
  Spool directory: /var/spool/icinga2
  Run directory: /run/icinga2

Old paths (deprecated):
  Installation root: /usr
  Sysconf directory: /etc
  Run directory (base): /run
  Local state directory: /var

Internal paths:
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid


icinga2 daemon -C

[2021-01-14 22:35:00 +0100] information/cli: Icinga application loader (version: r2.12.3-1)
[2021-01-14 22:35:00 +0100] information/cli: Loading configuration file(s).
[2021-01-14 22:35:01 +0100] information/ConfigItem: Committing config item(s).
[2021-01-14 22:35:01 +0100] information/ApiListener: My API identity: icinga-master1.localdomain
[2021-01-14 22:35:11 +0100] information/WorkQueue: #4 (DaemonUtility::LoadConfigFiles) items: 0, rate:  9/s (540/min 540/5min 540/15min);
[2021-01-14 22:35:11 +0100] information/WorkQueue: #6 (ApiListener, RelayQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2021-01-14 22:35:11 +0100] information/WorkQueue: #7 (ApiListener, SyncQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2021-01-14 22:35:25 +0100] information/ConfigItem: Instantiated 1 NotificationComponent.
[2021-01-14 22:35:25 +0100] information/ConfigItem: Instantiated 2121 Hosts.
[2021-01-14 22:35:25 +0100] information/ConfigItem: Instantiated 169 Downtimes.
[2021-01-14 22:35:25 +0100] information/ConfigItem: Instantiated 8 NotificationCommands.
[2021-01-14 22:35:25 +0100] information/ConfigItem: Instantiated 1 FileLogger.
[2021-01-14 22:35:25 +0100] information/ConfigItem: Instantiated 4862 Comments.
[2021-01-14 22:35:25 +0100] information/ConfigItem: Instantiated 20527 Notifications.
[2021-01-14 22:35:25 +0100] information/ConfigItem: Instantiated 1 IcingaApplication.
[2021-01-14 22:35:25 +0100] information/ConfigItem: Instantiated 74 HostGroups.
[2021-01-14 22:35:25 +0100] information/ConfigItem: Instantiated 3 EventCommands.
[2021-01-14 22:35:25 +0100] information/ConfigItem: Instantiated 575 Dependencies.
[2021-01-14 22:35:25 +0100] information/ConfigItem: Instantiated 1 CheckerComponent.
[2021-01-14 22:35:25 +0100] information/ConfigItem: Instantiated 13 Zones.
[2021-01-14 22:35:25 +0100] information/ConfigItem: Instantiated 13 Endpoints.
[2021-01-14 22:35:25 +0100] information/ConfigItem: Instantiated 1 ExternalCommandListener.
[2021-01-14 22:35:25 +0100] information/ConfigItem: Instantiated 1 IdoMysqlConnection.
[2021-01-14 22:35:25 +0100] information/ConfigItem: Instantiated 12 ApiUsers.
[2021-01-14 22:35:25 +0100] information/ConfigItem: Instantiated 1 ApiListener.
[2021-01-14 22:35:25 +0100] information/ConfigItem: Instantiated 299 CheckCommands.
[2021-01-14 22:35:25 +0100] information/ConfigItem: Instantiated 1 LivestatusListener.
[2021-01-14 22:35:25 +0100] information/ConfigItem: Instantiated 15 TimePeriods.
[2021-01-14 22:35:25 +0100] information/ConfigItem: Instantiated 1 UserGroup.
[2021-01-14 22:35:25 +0100] information/ConfigItem: Instantiated 5 Users.
[2021-01-14 22:35:25 +0100] information/ConfigItem: Instantiated 37816 Services.
[2021-01-14 22:35:25 +0100] information/ConfigItem: Instantiated 6 ServiceGroups.
[2021-01-14 22:35:26 +0100] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2021-01-14 22:35:26 +0100] information/cli: Finished validating the configuration file(s).

The API comes with a very basic configuration:

# cat /etc/icinga2/features-enabled/api.conf
object ApiListener "api" {
  accept_config = true
  accept_commands = true
}

I really don’t understand why the API stop to responds sometimes. Any idea?

Thanks!
Kevin

More details:

To contact the Icinga2 API, I’m using the following perl lib: https://github.com/stockholmuniversity/SU-API-Icinga2-perl

It seems each ‘login’ generates 2 call, a 401 unauthorized then a 200 authorized. This is due to the realm usage (here: https://github.com/stockholmuniversity/SU-API-Icinga2-perl/blob/master/lib/Monitoring/Icinga2/Client/REST.pm#L207)

So I’ve modified locally the lib to force the authorization header. A way less unauthorized calls :slight_smile:

Let’s see if random timeouts are still present now…

EDIT: timeout still present.

Important detail: it seems I can reproduce the issue only with the systems behind a one common IP. My orchestrators/scripts/… are all sharing the same public IP.
From Icinga2 API point of view, one IP = one client? Is Icinga2 API could block somehow my IP randomly?

Hi,

you could try this Perl library: https://metacpan.org/pod/Monitoring::Icinga2::Client::rest - works fine for us.

Hi @stevie-sy,
Monitoring::Icinga2::Client::REST is https://github.com/stockholmuniversity/SU-API-Icinga2-perl
EDIT: if you didn’t change the code, you should take a look at icinga2 logs. You’ll see some unauthorized call

oh thanks! Didn’t noticed this yet :roll_eyes:

Ok finally I manage to reproduce the issue easily.
I just need to request the API with some parallel calls:
GET on /v1 (every second)
GET on /v1/objects/hosts/example.com (every second)
POST on /v1/actions/process-check-result (every 10 seconds)

After ~3 minutes, the first timeout occurs.

Intersting facts: around 330 CLOSE_WAIT connections on each master servers.

Here the last news:
I’ve discovered the random timeout are not so random. I can’t connect to the Icinga API every 5 minutes.
From iotop command, I can see a brief but huge disk write activity from icinga2 daemon.
As my master servers are under VMware stack, I’ve checked the datastore performance. Every 5 minutes, a write latency spike occurs.

Master servers have both the following feature enabled:

~# icinga2 feature list
Disabled features: compatlog debuglog elasticsearch gelf graphite icingadb influxdb opentsdb perfdata statusdata syslog
Enabled features: api checker command ido-mysql livestatus mainlog notification

No cleanup routine configured in ido-mysql.
Any idea from what the write activity come from?