Hi, I have been running into ‘random’ out-of-memory (OOM) kills of the icinga2 core, e.g.:
[Sun Nov 6 23:50:01 2022] Killed process 1491 (icinga2), UID 997, total-vm:202536kB, anon-rss:23584kB, file-rss:0kB, shmem-rss:0kB
[Wed Nov 16 03:02:44 2022] Killed process 27648 (icinga2), UID 996, total-vm:14297888kB, anon-rss:13853948kB, file-rss:0kB, shmem-rss:0kB
I cannot find anything relevant or unusual in the logs that would help isolate the root cause of the failure (or decide on a fix). The main log does not show any meaningful errors, and because the failures occur at ‘random’ (they are not tied to configuration updates or reloads), keeping the debug log enabled in the production environment is not an option.
With that in mind, is there a way to enable OpenTelemetry observability in the icinga2 core, or anything similar, to capture traces that would help troubleshoot the root cause the next time it happens?
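As a stop-gap until something like that exists, plain sampling from /proc may be enough to see whether memory grows steadily or spikes right before a kill. The sketch below is not OpenTelemetry and nothing in it comes from Icinga 2: oom_summary and sample_rss are hypothetical helper names, and it assumes a Linux /proc filesystem and the kernel "Killed process ... anon-rss:...kB" message format quoted above.

```shell
# Hypothetical helpers (not part of Icinga 2); a sketch in POSIX sh that
# assumes Linux /proc and the kernel "Killed process" message format.

# oom_summary: read kernel log lines on stdin (e.g. from dmesg -T) and print
# one "pid=<PID> anon_rss_mib=<MiB>" line per "Killed process" event.
oom_summary() {
  awk '/Killed process/ {
    # The PID follows the fixed prefix "Killed process " (15 characters).
    if (!match($0, /Killed process [0-9]+/)) next
    pid = substr($0, RSTART + 15, RLENGTH - 15)
    # anon-rss is logged in kB: drop "anon-rss:" (9 chars) and "kB" (2 chars).
    if (!match($0, /anon-rss:[0-9]+kB/)) next
    kb = substr($0, RSTART + 9, RLENGTH - 11)
    printf "pid=%s anon_rss_mib=%.1f\n", pid, kb / 1024
  }'
}

# sample_rss: every $3 seconds (default 60), append "<epoch seconds> <VmRSS kB>"
# for PID $1 to file $2; stop once the process is gone or has no VmRSS
# (e.g. it has exited and is a zombie).
sample_rss() {
  pid=$1 out=$2 interval=${3:-60}
  while [ -r "/proc/$pid/status" ]; do
    rss=$(awk '/^VmRSS:/ {print $2}' "/proc/$pid/status" 2>/dev/null) || break
    [ -n "$rss" ] || break
    printf '%s %s\n' "$(date +%s)" "$rss" >> "$out"
    sleep "$interval"
  done
}
```

Fed the two kill lines above, oom_summary prints pid=1491 anon_rss_mib=23.0 and pid=27648 anon_rss_mib=13529.2, i.e. roughly 23 MiB versus about 13.2 GiB of anonymous memory on a host with 15G of RAM and no swap. A background sampler could be started with sample_rss "$(cat /run/icinga2/icinga2.pid)" /var/tmp/icinga2-rss.log 60 & (the log path is only a hypothetical choice).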
- Version used (icinga2 --version)
icinga2 --version
icinga2 - The Icinga 2 network monitoring daemon (version: 2.13.2-1)

Copyright (c) 2012-2022 Icinga GmbH (https://icinga.com/)
License GPLv2+: GNU GPL version 2 or later https://gnu.org/licenses/gpl2.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

System information:
  Platform: CentOS Linux
  Platform version: 7 (Core)
  Kernel: Linux
  Kernel version: 3.10.0-1160.76.1.el7.x86_64
  Architecture: x86_64

Build information:
  Compiler: GNU 4.8.5
  Build host: runner-hh8q3bz2-project-322-concurrent-0
  OpenSSL version: OpenSSL 1.0.2k-fips 26 Jan 2017

Application information:

General paths:
  Config directory: /etc/icinga2
  Data directory: /var/lib/icinga2
  Log directory: /var/log/icinga2
  Cache directory: /var/cache/icinga2
  Spool directory: /var/spool/icinga2
  Run directory: /run/icinga2

Old paths (deprecated):
  Installation root: /usr
  Sysconf directory: /etc
  Run directory (base): /run
  Local state directory: /var

Internal paths:
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid
- Operating System and version
# cat /etc/centos-release
CentOS Linux release 7.9.2009 (Core)
- Enabled features (icinga2 feature list)
# icinga2 feature list
Disabled features: command compatlog debuglog elasticsearch gelf graphite icingadb influxdb2 livestatus opentsdb perfdata statusdata syslog
Enabled features: api checker ido-mysql influxdb mainlog notification
- Icinga Web 2 version and modules (System - About) - NA
- Config validation (icinga2 daemon -C)
icinga2 daemon -C
[2022-11-21 07:49:25 -0500] information/cli: Icinga application loader (version: 2.13.2-1)
[2022-11-21 07:49:25 -0500] information/cli: Loading configuration file(s).
[2022-11-21 07:49:26 -0500] information/ConfigItem: Committing config item(s).
[2022-11-21 07:49:26 -0500] information/ApiListener: My API identity: icinga-1.corp-apps.com
[2022-11-21 07:49:36 -0500] information/WorkQueue: #5 (InfluxdbWriter, influxdb) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2022-11-21 07:49:36 -0500] information/WorkQueue: #4 (DaemonUtility::LoadConfigFiles) items: 8, rate: 12.9333/s (776/min 776/5min 776/15min); empty in 19317 days, 12 hours, 49 minutes and 36 seconds
[2022-11-21 07:49:36 -0500] information/WorkQueue: #7 (ApiListener, RelayQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2022-11-21 07:49:36 -0500] information/WorkQueue: #8 (ApiListener, SyncQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2022-11-21 07:49:46 -0500] information/WorkQueue: #4 (DaemonUtility::LoadConfigFiles) items: 8, rate: 12.9333/s (776/min 776/5min 776/15min); empty in infinite time, your task handler isn't able to keep up
(Several ApplyRule warnings of the following type suppressed here:)
[2022-11-21 07:51:38 -0500] warning/ApplyRule:... for type 'Service' does not match anywhere!
[2022-11-21 07:51:38 -0500] information/ConfigItem: Instantiated 1 InfluxdbWriter.
[2022-11-21 07:51:38 -0500] information/ConfigItem: Instantiated 1 NotificationComponent.
[2022-11-21 07:51:38 -0500] information/ConfigItem: Instantiated 1 IdoMysqlConnection.
[2022-11-21 07:51:38 -0500] information/ConfigItem: Instantiated 1 CheckerComponent.
[2022-11-21 07:51:38 -0500] information/ConfigItem: Instantiated 2 Users.
[2022-11-21 07:51:38 -0500] information/ConfigItem: Instantiated 51 ServiceGroups.
[2022-11-21 07:51:38 -0500] information/ConfigItem: Instantiated 6 TimePeriods.
[2022-11-21 07:51:38 -0500] information/ConfigItem: Instantiated 103055 Services.
[2022-11-21 07:51:38 -0500] information/ConfigItem: Instantiated 53 Zones.
[2022-11-21 07:51:38 -0500] information/ConfigItem: Instantiated 192 ScheduledDowntimes.
[2022-11-21 07:51:38 -0500] information/ConfigItem: Instantiated 3 NotificationCommands.
[2022-11-21 07:51:38 -0500] information/ConfigItem: Instantiated 369 HostGroups.
[2022-11-21 07:51:38 -0500] information/ConfigItem: Instantiated 144616 Notifications.
[2022-11-21 07:51:38 -0500] information/ConfigItem: Instantiated 1427 Downtimes.
[2022-11-21 07:51:38 -0500] information/ConfigItem: Instantiated 138679 Dependencies.
[2022-11-21 07:51:38 -0500] information/ConfigItem: Instantiated 1 IcingaApplication.
[2022-11-21 07:51:38 -0500] information/ConfigItem: Instantiated 7115 Hosts.
[2022-11-21 07:51:38 -0500] information/ConfigItem: Instantiated 1 EventCommand.
[2022-11-21 07:51:38 -0500] information/ConfigItem: Instantiated 67 Endpoints.
[2022-11-21 07:51:38 -0500] information/ConfigItem: Instantiated 7788 Comments.
[2022-11-21 07:51:38 -0500] information/ConfigItem: Instantiated 1 FileLogger.
[2022-11-21 07:51:38 -0500] information/ConfigItem: Instantiated 21 ApiUsers.
[2022-11-21 07:51:38 -0500] information/ConfigItem: Instantiated 464 CheckCommands.
[2022-11-21 07:51:38 -0500] information/ConfigItem: Instantiated 1 ApiListener.
[2022-11-21 07:51:39 -0500] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2022-11-21 07:51:39 -0500] information/cli: Finished validating the configuration file(s).
- If you run multiple Icinga 2 instances, the zones.conf file (or icinga2 object list --type Endpoint and icinga2 object list --type Zone) from all affected nodes
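This is an HA master cluster. The zones.conf file and object lists can be provided on request; the counts of endpoints, zones, etc. are visible in the config validation output above.
Additional info: jemalloc is enabled (preloaded via LD_PRELOAD in /etc/sysconfig/icinga2, see below).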
# rpm -qa | egrep "ici|jem"
icinga2-2.13.2-1.el7.icinga.x86_64
icinga2-common-2.13.2-1.el7.icinga.x86_64
icinga2-bin-2.13.2-1.el7.icinga.x86_64
icinga2-ido-mysql-2.13.2-1.el7.icinga.x86_64
jemalloc-3.6.0-1.el7.x86_64
# cat /etc/sysconfig/icinga2
#Mananged by puppet
#This is the default environment Icinga 2 runs with.
#Make your changes here.
#DAEMON=/usr/sbin/icinga2
#ICINGA2_CONFIG_FILE=/etc/icinga2/icinga2.conf
#ICINGA2_INIT_RUN_DIR=/run/icinga2
#ICINGA2_PID_FILE=/run/icinga2/icinga2.pid
#ICINGA2_LOG_DIR=/var/log/icinga2
#ICINGA2_ERROR_LOG=/var/log/icinga2/error.log
#ICINGA2_STARTUP_LOG=/var/log/icinga2/startup.log
#ICINGA2_LOG=/var/log/icinga2/icinga2.log
#ICINGA2_CACHE_DIR=/var/cache/icinga2
#ICINGA2_USER=icinga
#ICINGA2_GROUP=icinga
#ICINGA2_COMMAND_GROUP=icingacmd
LD_PRELOAD=/usr/lib64/libjemalloc.so.1
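To confirm that this LD_PRELOAD line actually takes effect in the running daemon, one option is to look for the library in the process memory map. A minimal sketch, assuming Linux /proc and read access to the process (e.g. running as root); process_maps_contain is a hypothetical helper name, not an Icinga 2 or jemalloc tool.

```shell
# Hypothetical helper (not part of Icinga 2): check whether the memory map of
# PID $1 contains the fixed string $2 (e.g. a shared-library file name).
# Returns 0 if found, 1 if not found, 2 if /proc/<PID>/maps is unreadable.
process_maps_contain() {
  maps="/proc/$1/maps"
  [ -r "$maps" ] || return 2
  # -F: treat $2 as a plain string; --: allow strings starting with "-".
  grep -qF -- "$2" "$maps"
}
```

For example: for p in $(pgrep -x icinga2); do process_maps_contain "$p" libjemalloc.so.1 && echo "$p: jemalloc loaded"; done. Checking every process named icinga2 avoids depending on which PID /run/icinga2/icinga2.pid records.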
Hardware specs
free -h
total used free shared buff/cache available
Mem: 15G 2.6G 8.4G 784M 4.6G 11G
Swap: 0B 0B 0B
grep proc /proc/cpuinfo | tail
processor : 0
processor : 1
processor : 2
processor : 3
processor : 4
processor : 5
processor : 6
processor : 7