Icinga2 not writing to database - "icinga is currently not up and running"

jason.agility · July 6, 2021, 7:50pm

I have a very strange problem.

I am setting up a new icinga2 node, and it was working; then a bunch of agents connected and the service seemed to crash.
Now icingaweb shows “icinga is currently not up and running” and the icinga2 service is not writing to the database.

root@icinga2:/etc/icinga2/features-available# ll
-rw-r--r-- 1 nagios nagios  270 Jul  6 15:01 ido-mysql.conf

/**
 * The db_ido_mysql library implements IDO functionality
 * for MySQL.
 */

library "db_ido_mysql"

object IdoMysqlConnection "idomysql" {
  user = "icinga2",
  password = "redacted",
  host = "database.internal",
  database = "icinga2"
}

IDO is not writing to the database:

select status_update_time from icinga_programstatus;
Empty set (0.00 sec)
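
As far as I understand, Icinga Web 2 judges whether Icinga is running by how fresh this timestamp is, so an empty result alone would be enough to produce the "not up and running" message. A rough Python sketch of that kind of freshness check; the function name and the 60-second threshold are assumptions for illustration, not taken from the Icinga Web 2 source:

```python
from datetime import datetime, timedelta

def is_backend_running(status_update_time, now, max_age=timedelta(seconds=60)):
    """Return True if the last status update is recent enough.

    status_update_time is None when icinga_programstatus has no row.
    max_age is an assumed threshold, chosen here only for illustration.
    """
    if status_update_time is None:
        return False
    return now - status_update_time <= max_age

now = datetime(2021, 7, 6, 15, 50, 0)
print(is_backend_running(None, now))                               # empty table
print(is_backend_running(datetime(2021, 7, 6, 15, 49, 30), now))   # 30 s old
print(is_backend_running(datetime(2021, 7, 6, 15, 40, 0), now))    # 10 min old
```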

I tried dropping the icinga2 database, recreating it, and then repopulating it from the schema

/usr/share/icinga2-ido-mysql/schema/mysql.sql

and then restarting the service, but no luck.
The icinga2 service is up and running.

As I was writing this ticket, 20 min after I rebuilt the DB, it suddenly came back to life.
Any help or ideas would be appreciated, since I am worried it may happen again.

Version used (icinga2 --version) r2.12.4-1
Operating System and version : Ubuntu 20.04.2 LTS (Focal Fossa)
Enabled features (icinga2 feature list) api checker debuglog ido-mysql influxdb mainlog notification
Icinga Web 2 version and modules (System - About)
|setup|2.8.2|
|grafana|1.4.2|
|monitoring|2.8.2|
Config validation (icinga2 daemon -C)
valid
If you run multiple Icinga 2 instances, the zones.conf file (or icinga2 object list --type Endpoint and icinga2 object list --type Zone) from all affected nodes

jason.agility · July 6, 2021, 8:36pm

It seems like the root cause is that every time an agent connects to the master it causes huge CPU spikes:

45768 nagios    20   0 1882904 231008  15860 S 396.3   2.8  66:44.35 /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2 --no-stack-rlimit daemon --close-stdio -e +

That's a CPU% of 396.3,

and when that happens, icinga2 seems to stop updating the database.

Pooh · July 6, 2021, 8:50pm

I would start with a close look in /var/log/icinga2 at icinga2.log,
icinga2.err and startup.log

Antony.

jason.agility · July 6, 2021, 8:55pm

The only enlightening things in the icinga2.log file are the hosts that cannot connect (because they need their zones and certs updated, something that is in progress)
and the fact that the DB reconnection / resuming seems quite slow.

[2021-07-06 15:54:56 -0400] information/DbConnection: 'idomysql' started.
[2021-07-06 16:00:24 -0400] information/IdoMysqlConnection: 'idomysql' resumed.
[2021-07-06 16:00:24 -0400] information/DbConnection: Resuming IDO connection: idomysql
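
To put a number on "quite slow", the gap between the 'started' and 'resumed' lines can be computed from the timestamps. A minimal Python sketch, assuming the log timestamp format shown above; the helper names are hypothetical:

```python
from datetime import datetime

# The two relevant lines, copied from the log excerpt above.
LOG_LINES = [
    "[2021-07-06 15:54:56 -0400] information/DbConnection: 'idomysql' started.",
    "[2021-07-06 16:00:24 -0400] information/IdoMysqlConnection: 'idomysql' resumed.",
]

def parse_timestamp(line):
    """Extract the bracketed timestamp at the start of an icinga2.log line."""
    stamp = line[1:line.index("]")]
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S %z")

def seconds_between(first, second):
    """Seconds elapsed between two log lines."""
    return (parse_timestamp(second) - parse_timestamp(first)).total_seconds()

delay = seconds_between(LOG_LINES[0], LOG_LINES[1])
print(f"IDO resume took {delay:.0f} s")  # prints "IDO resume took 328 s"
```

That is about five and a half minutes between the connection starting and resuming.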

jason.agility · July 7, 2021, 7:49pm

After more testing and messing with things, it seems that icinga2 eventually catches up and is happy. But every time I reload the service it gets stuck reloading, and if I don't notice right away and restart the service to fix the issue, it takes 45 min for icinga to catch back up and for the CPU load to return to normal levels.

theFeu · July 9, 2021, 6:57am

It might help if you share your configuration with us. If you have a lot of hosts and a lot of apply rules that reference the host objects, like assign where host.name == "foo", the reload time could be bloated because of that.

jason.agility · July 9, 2021, 11:13am

I do have a largish number of services (3748), and most are created via apply rules (that match against multiple hosts [assign where host.os == linux])

icinga2 daemon --validate
[2021-07-09 07:10:42 -0400] information/cli: Icinga application loader (version: r2.12.4-1)
[2021-07-09 07:10:42 -0400] information/cli: Loading configuration file(s).
[2021-07-09 07:10:42 -0400] information/ConfigItem: Committing config item(s).
[2021-07-09 07:10:42 -0400] information/ApiListener: My API identity: host.internal
[2021-07-09 07:10:45 -0400] warning/ApplyRule: Apply rule 'ping6' (in /etc/icinga2/zones.d/master/linux-services.conf: 28:1-28:21) for type 'Service' does not match anywhere!
[2021-07-09 07:10:45 -0400] warning/ApplyRule: Apply rule '' (in /etc/icinga2/zones.d/master/linux-services.conf: 57:1-57:53) for type 'Service' does not match anywhere!
[2021-07-09 07:10:45 -0400] warning/ApplyRule: Apply rule 'icinga' (in /etc/icinga2/zones.d/master/linux-services.conf: 66:1-66:22) for type 'Service' does not match anywhere!
[2021-07-09 07:10:45 -0400] warning/ApplyRule: Apply rule 'load' (in /etc/icinga2/zones.d/master/linux-services.conf: 74:1-74:20) for type 'Service' does not match anywhere!
[2021-07-09 07:10:45 -0400] warning/ApplyRule: Apply rule 'procs' (in /etc/icinga2/zones.d/master/linux-services.conf: 85:1-85:21) for type 'Service' does not match anywhere!
[2021-07-09 07:10:45 -0400] warning/ApplyRule: Apply rule 'swap' (in /etc/icinga2/zones.d/master/linux-services.conf: 92:1-92:20) for type 'Service' does not match anywhere!
[2021-07-09 07:10:45 -0400] warning/ApplyRule: Apply rule 'users' (in /etc/icinga2/zones.d/master/linux-services.conf: 99:1-99:21) for type 'Service' does not match anywhere!
[2021-07-09 07:10:45 -0400] warning/ApplyRule: Apply rule 'puppet-agent' (in /etc/icinga2/zones.d/master/linux-services.conf: 106:1-106:28) for type 'Service' does not match anywhere!
[2021-07-09 07:10:45 -0400] warning/ApplyRule: Apply rule 'update' (in /etc/icinga2/zones.d/master/windows-services.conf: 157:1-157:22) for type 'Service' does not match anywhere!
[2021-07-09 07:10:45 -0400] warning/ApplyRule: Apply rule 'SQL Server txn log' (in /etc/icinga2/zones.d/master/windows-services.conf: 208:1-208:34) for type 'Service' does not match anywhere!
[2021-07-09 07:10:45 -0400] information/ConfigItem: Instantiated 1 NotificationComponent.
[2021-07-09 07:10:45 -0400] information/ConfigItem: Instantiated 235 Hosts.
[2021-07-09 07:10:45 -0400] information/ConfigItem: Instantiated 11 Downtimes.
[2021-07-09 07:10:45 -0400] information/ConfigItem: Instantiated 5 NotificationCommands.
[2021-07-09 07:10:45 -0400] information/ConfigItem: Instantiated 1 FileLogger.
[2021-07-09 07:10:45 -0400] information/ConfigItem: Instantiated 164 Comments.
[2021-07-09 07:10:45 -0400] information/ConfigItem: Instantiated 12461 Notifications.
[2021-07-09 07:10:45 -0400] information/ConfigItem: Instantiated 1 IcingaApplication.
[2021-07-09 07:10:45 -0400] information/ConfigItem: Instantiated 22 HostGroups.
[2021-07-09 07:10:45 -0400] information/ConfigItem: Instantiated 1 CheckerComponent.
[2021-07-09 07:10:45 -0400] information/ConfigItem: Instantiated 239 Zones.
[2021-07-09 07:10:45 -0400] information/ConfigItem: Instantiated 239 Endpoints.
[2021-07-09 07:10:45 -0400] information/ConfigItem: Instantiated 1 IdoMysqlConnection.
[2021-07-09 07:10:45 -0400] information/ConfigItem: Instantiated 3 ApiUsers.
[2021-07-09 07:10:45 -0400] information/ConfigItem: Instantiated 1 ApiListener.
[2021-07-09 07:10:45 -0400] information/ConfigItem: Instantiated 116 CheckCommands.
[2021-07-09 07:10:45 -0400] information/ConfigItem: Instantiated 1 InfluxdbWriter.
[2021-07-09 07:10:45 -0400] information/ConfigItem: Instantiated 7 TimePeriods.
[2021-07-09 07:10:45 -0400] information/ConfigItem: Instantiated 9 UserGroups.
[2021-07-09 07:10:45 -0400] information/ConfigItem: Instantiated 11 Users.
[2021-07-09 07:10:45 -0400] information/ConfigItem: Instantiated 3748 Services.
[2021-07-09 07:10:45 -0400] information/ConfigItem: Instantiated 9 ServiceGroups.
[2021-07-09 07:10:45 -0400] information/ConfigItem: Instantiated 10 ScheduledDowntimes.
[2021-07-09 07:10:46 -0400] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2021-07-09 07:10:46 -0400] information/cli: Finished validating the configuration file(s).
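
For a quick sense of scale, the "Instantiated" lines can be tallied. A small Python sketch that parses a few of the lines above (the sample is a subset of the output, and the function name is hypothetical):

```python
import re

# A subset of the validation output above.
VALIDATE_OUTPUT = """\
[2021-07-09 07:10:45 -0400] information/ConfigItem: Instantiated 235 Hosts.
[2021-07-09 07:10:45 -0400] information/ConfigItem: Instantiated 12461 Notifications.
[2021-07-09 07:10:45 -0400] information/ConfigItem: Instantiated 239 Zones.
[2021-07-09 07:10:45 -0400] information/ConfigItem: Instantiated 239 Endpoints.
[2021-07-09 07:10:45 -0400] information/ConfigItem: Instantiated 3748 Services.
"""

PATTERN = re.compile(r"Instantiated (\d+) (\w+)\.")

def count_objects(text):
    """Map each object type to the number of instantiated objects."""
    return {name: int(number) for number, name in PATTERN.findall(text)}

counts = count_objects(VALIDATE_OUTPUT)
services_per_host = counts["Services"] / counts["Hosts"]
notifications_per_service = counts["Notifications"] / counts["Services"]

print(f"services per host: {services_per_host:.1f}")
print(f"notifications per service: {notifications_per_service:.1f}")
```

So that is roughly 16 services per host and a bit over 3 notification objects per service.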