Icinga2 randomly crashes upon startup

Hi all,

UPDATE:
I've found that this problem only appears when the VM has more than 1 vCPU.
With 1 vCPU, every version works as expected.
With 2 or more vCPUs, it crashes frequently.

We have an Icinga 2 instance running on Ubuntu Linux (a VM on a Proxmox host).

We have 75+ hosts with 1000+ services.

Our problem is that Icinga 2 randomly crashes on startup.
We have tried to:
- upgrade the OS from 16.04 -> 18.04 -> 19.04
- upgrade Icinga 2 from 2.8 -> 2.9 -> 2.10 -> 2.11
- build a new machine from scratch (18.04) with a fresh Icinga 2 install
Nothing changed; we still get random errors. Sometimes it manages to start up, and then it runs correctly, but the more hosts we add, the bigger the chance of failure.

With 0 hosts it starts up correctly, with ~10 hosts it usually works, but with more hosts it usually fails.

So again: sometimes it starts up even with the full host list, so I assume the configuration is fine. It never complains about config errors; when it fails, it only shows low-level system errors.

I'd appreciate your help, because this is driving me crazy. 🙂
Here are some of the random error messages; they differ almost every time:

[2019-10-03 18:31:41 +0200] information/cli: Icinga application loader (version: r2.11.0-1)
[2019-10-03 18:31:41 +0200] information/cli: Loading configuration file(s).
[2019-10-03 18:31:41 +0200] information/ConfigItem: Committing config item(s).
corrupted size vs. prev_size while consolidating
/builds/packaging/deb-icinga2/build/icinga2/lib/base/json.cpp:185: assertion failed: !"Invalid variant type."

[2019-10-03 18:31:40 +0200] information/cli: Icinga application loader (version: r2.11.0-1)
[2019-10-03 18:31:40 +0200] information/cli: Loading configuration file(s).
[2019-10-03 18:31:40 +0200] information/ConfigItem: Committing config item(s).
double free or corruption (out)
Caught SIGABRT.

[2019-10-03 18:31:39 +0200] information/cli: Icinga application loader (version: r2.11.0-1)
[2019-10-03 18:31:39 +0200] information/cli: Loading configuration file(s).
[2019-10-03 18:31:39 +0200] information/ConfigItem: Committing config item(s).
double free or corruption (out)
corrupted size vs. prev_size

[2019-10-03 18:31:38 +0200] information/cli: Icinga application loader (version: r2.11.0-1)
[2019-10-03 18:31:38 +0200] information/cli: Loading configuration file(s).
[2019-10-03 18:31:38 +0200] information/ConfigItem: Committing config item(s).
double free or corruption (out)
Caught SIGABRT.
Current time: 2019-10-03 18:31:38 +0200

[2019-10-03 18:31:38 +0200] critical/Application: Icinga 2 has terminated unexpectedly. Additional information can be found in '/var/log/icinga2/crash/report.1570120298.494978'

free(): invalid pointer

Content of a crash log:

Application version: r2.11.0-1

System information:
Platform: Ubuntu
Platform version: 19.04 (Disco Dingo)
Kernel: Linux
Kernel version: 5.0.0-29-generic
Architecture: x86_64

Build information:
Compiler: GNU 8.3.0
Build host: runner-LTrJQZ9N-project-298-concurrent-0

Application information:

General paths:
Config directory: /etc/icinga2
Data directory: /var/lib/icinga2
Log directory: /var/log/icinga2
Cache directory: /var/cache/icinga2
Spool directory: /var/spool/icinga2
Run directory: /run/icinga2

Old paths (deprecated):
Installation root: /usr
Sysconf directory: /etc
Run directory (base): /run
Local state directory: /var

Internal paths:
Package data directory: /usr/share/icinga2
State path: /var/lib/icinga2/icinga2.state
Modified attributes path: /var/lib/icinga2/modified-attributes.conf
Objects path: /var/cache/icinga2/icinga2.debug
Vars path: /var/cache/icinga2/icinga2.vars
PID path: /run/icinga2/icinga2.pid
Stacktrace:

    (0) libc.so.6: gsignal (+0xc7) [0x7f147fbeaed7]
    (1) libc.so.6: abort (+0x121) [0x7f147fbcc535]
    (2) libc.so.6: <unknown function> (+0x8c726) [0x7f147fc33726]
    (3) libc.so.6: <unknown function> (+0x9359a) [0x7f147fc3a59a]
    (4) libc.so.6: <unknown function> (+0x953dc) [0x7f147fc3c3dc]
    (5) icinga2: <unknown function> (+0x67b16d) [0x55c8ce8a516d]
    (6) icinga2: icinga::ApplyRule::AddRule(icinga::String const&, icinga::String const&, icinga::String const&, std::shared_ptr<icinga::Expression> const&, std::shared_ptr<icinga::Expression> const&, icinga::String const&, icinga::String const&, icinga::String const&, std::shared_ptr<icinga::Expression> const&, bool, icinga::DebugInfo const&, boost::intrusive_ptr<icinga::Dictionary> const&) (+0x78f) [0x55c8ce8b0e7f]
    (7) icinga2: icinga::ApplyExpression::DoEvaluate(icinga::ScriptFrame&, icinga::DebugHint*) const (+0x15f) [0x55c8ce89ee1f]
    (8) icinga2: icinga::Expression::Evaluate(icinga::ScriptFrame&, icinga::DebugHint*) const (+0x53) [0x55c8ce8a9c73]
    (9) icinga2: icinga::DictExpression::DoEvaluate(icinga::ScriptFrame&, icinga::DebugHint*) const (+0xd1) [0x55c8ce8acd01]
    (10) icinga2: <unknown function> (+0x750382) [0x55c8ce97a382]
    (11) icinga2: icinga::Expression::Evaluate(icinga::ScriptFrame&, icinga::DebugHint*) const (+0x53) [0x55c8ce8a9c73]
    (12) icinga2: icinga::DictExpression::DoEvaluate(icinga::ScriptFrame&, icinga::DebugHint*) const (+0xd1) [0x55c8ce8acd01]
    (13) icinga2: icinga::Expression::Evaluate(icinga::ScriptFrame&, icinga::DebugHint*) const (+0x53) [0x55c8ce8a9c73]
    (14) icinga2: icinga::DictExpression::DoEvaluate(icinga::ScriptFrame&, icinga::DebugHint*) const (+0xd1) [0x55c8ce8acd01]
    (15) icinga2: icinga::Expression::Evaluate(icinga::ScriptFrame&, icinga::DebugHint*) const (+0x53) [0x55c8ce8a9c73]

Additional information:
It definitely crashes while loading the configuration files; it also crashes with -C (config validation only):

root@:~# icinga2 daemon -C
[2019-10-03 18:57:16 +0200] information/cli: Icinga application loader (version: r2.11.0-1)
[2019-10-03 18:57:16 +0200] information/cli: Loading configuration file(s).
[2019-10-03 18:57:16 +0200] information/ConfigItem: Committing config item(s).
corrupted size vs. prev_size while consolidating
Caught SIGABRT.
Current time: 2019-10-03 18:57:16 +0200

[2019-10-03 18:57:16 +0200] critical/Application: Icinga 2 has terminated unexpectedly. Additional information can be found in '/var/log/icinga2/crash/report.1570121836.987138'

free(): double free detected in tcache 2
malloc_consolidate(): invalid chunk size
Aborted (core dumped)
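Since the crash reproduces with icinga2 daemon -C alone and seems tied to the vCPU count, one way to put numbers on "randomly" is to run the validation many times, once normally and once pinned to a single CPU with taskset (from util-linux). The helper below is a hypothetical sketch, not part of Icinga 2: it simply runs any command N times and prints how many runs exited with a non-zero status.

```shell
#!/bin/sh
# Hypothetical helper (not part of Icinga 2): run a command N times and
# print how many of those runs exited with a non-zero status.
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard output; only the exit status matters here.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}
```

For example, comparing count_failures 20 icinga2 daemon -C with count_failures 20 taskset -c 0 icinga2 daemon -C on the same VM would show whether pinning to one CPU makes the crashes go away, which would suggest a concurrency problem rather than a configuration error. Each crashing run may also leave a new report under /var/log/icinga2/crash.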

Hi and welcome to the community!

I haven't used Proxmox, but I do know that adding vCPUs to an already installed OS sometimes results in crashes. My suggestion is to create a new box from scratch with 2 or 4 vCPUs, transfer the config there, and test.

Cheers,
George

Hi,

The errors sound really odd. Can you upload the full configuration somewhere (obfuscated, of course)?
Also, as @gkoutsog suggested, please try this in a default Linux VM as well.

Last but not least, if this is reproducible, please install the debug packages and re-post the crash log.

Cheers,
Michael
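Once the debug symbols are installed (the exact package name depends on the distribution and Icinga 2 version and is not given in this thread, so check the Icinga 2 troubleshooting docs), the crash log to re-post is the newest file in the crash directory shown in the logs above, /var/log/icinga2/crash. A minimal sketch to print its path, assuming reports keep the report.<timestamp> naming seen in this thread; latest_crash_report is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the path of the newest crash report in a directory.
# Assumes reports are named report.<timestamp>, as in the logs in this thread.
latest_crash_report() {
    dir=${1:-/var/log/icinga2/crash}
    # ls -t sorts by modification time, newest first; errors (no match) are hidden.
    newest=$(ls -t "$dir"/report.* 2>/dev/null | head -n 1)
    if [ -z "$newest" ]; then
        echo "no crash reports in $dir" >&2
        return 1
    fi
    echo "$newest"
}
```

Running it right after a failed start (for example cat "$(latest_crash_report)") gives the report belonging to that crash, rather than an older one from a previous attempt.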

Thanks for your comments.
I've tried building a new VM from scratch with 4 vCPUs from the start; same result: double free or corruption (fasttop)

icinga2.service - Icinga host/service/network monitoring system
Loaded: loaded (/lib/systemd/system/icinga2.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/icinga2.service.d
└─limits.conf
Active: failed (Result: exit-code) since Mon 2019-10-14 13:09:52 UTC; 2s ago
Process: 13863 ExecStart=/usr/sbin/icinga2 daemon --close-stdio -e ${ICINGA2_ERROR_LOG} (code=exited, status=1/FAILURE)
Process: 13847 ExecStartPre=/usr/lib/icinga2/prepare-dirs /etc/default/icinga2 (code=exited, status=0/SUCCESS)
Main PID: 13863 (code=exited, status=1/FAILURE)

okt 14 13:09:51 monit2test icinga2[13863]: [2019-10-14 13:09:51 +0000] information/cli: Icinga application loader (version: r2.11.0-1)
okt 14 13:09:51 monit2test icinga2[13863]: [2019-10-14 13:09:51 +0000] information/cli: Loading configuration file(s).
okt 14 13:09:51 monit2test icinga2[13863]: [2019-10-14 13:09:51 +0000] information/ConfigItem: Committing config item(s).
okt 14 13:09:51 monit2test icinga2[13863]: malloc_consolidate(): invalid chunk size
okt 14 13:09:51 monit2test icinga2[13863]: double free or corruption (fasttop)
okt 14 13:09:51 monit2test icinga2[13863]: Caught SIGABRT.corrupted double-linked list
okt 14 13:09:52 monit2test systemd[1]: icinga2.service: Main process exited, code=exited, status=1/FAILURE
okt 14 13:09:52 monit2test icinga2[13863]: Current time: 2019-10-14 13:09:51 +0000
okt 14 13:09:52 monit2test systemd[1]: icinga2.service: Failed with result 'exit-code'.
okt 14 13:09:52 monit2test systemd[1]: Failed to start Icinga host/service/network monitoring system.

Hi,

Can you please share the full configuration with me? I'd like to understand the apply rules and the stack trace by reproducing this myself.

If the configuration is not located only in /etc/icinga2, please also include the contents of /var/lib/icinga2/api.

Thanks,
Michael
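A minimal sketch for collecting both locations mentioned above into one archive, assuming GNU tar and GNU sed as shipped with Ubuntu. bundle_config is a hypothetical helper, and its password masking is a deliberately naive step: it only rewrites quoted values assigned to names ending in "password" (for example the password attribute of an ApiUser object, or a custom var like vars.db_password), so secrets stored under other names will be missed and the archive still needs a manual review before uploading.

```shell
#!/bin/sh
# Hypothetical helper: copy config directories into a .tar.gz archive,
# masking quoted values assigned to names that end in "password".
# Assumes GNU sed (for -i without a suffix argument) and GNU tar.
bundle_config() {
    out=$1
    shift
    work=$(mktemp -d) || return 1
    for src in "$@"; do
        # Skip directories that do not exist (e.g. no /var/lib/icinga2/api).
        [ -d "$src" ] || continue
        dest=$work/${src#/}
        mkdir -p "$dest"
        # Copy the contents, including hidden files, leaving the originals untouched.
        cp -R "$src/." "$dest/"
    done
    # Rewrite  password = "secret"  as  password = "***"  in every copied file.
    find "$work" -type f -exec sed -E -i 's/(password[[:space:]]*=[[:space:]]*)"[^"]*"/\1"***"/g' {} +
    tar -czf "$out" -C "$work" .
    rm -rf "$work"
}
```

Usage, run as root on the Icinga 2 host: bundle_config icinga2-config.tar.gz /etc/icinga2 /var/lib/icinga2/api. Masking a copy instead of the originals keeps the running configuration unchanged.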

Yes, we are working on it.
I will upload it somewhere.

Please use the Nextcloud share link below. Only I have access to this file share.