Backend icinga is not running

vyvuvivo · October 18, 2021, 9:37am

Hi Team,
I have set up a master zone with two nodes: master01 and master02.
I frequently (about once a week) get a “backend icinga not running” error after deploying a configuration from Icinga Director.
When this error occurs, the icinga2 service on the active endpoint (master02) is stuck in the reloading state. I have to restart the service manually to get it working properly again.

I tried turning off master02 to debug, but the error still occurs on master01.
I have read similar topics about “Backend icinga not running”, but none of them offers a solution for this case.
Can someone help me, please?

Environment

Version used (icinga2 --version): v2.13.1-1
Operating System and version: CentOS 7
Enabled features (icinga2 feature list): api checker ido-mysql influxdb mainlog notification
Icinga Web 2 version and modules (System - About): icingaweb2 v2.8.2, module: director, fileshipper, grafana, incubator, ipl, monitoring, reactbundle
Config validation (icinga2 daemon -C): OK

Al2Klimov · October 18, 2021, 1:35pm

Hmm… “Shutting down old instance…” What do the logs say?

vyvuvivo · October 19, 2021, 3:09am

Thanks for your reply!
When this error occurs, no startup entries are logged and the icinga2 service hangs in the reloading state:

[2021-09-14 14:58:51 +0700] information/Application: Received request to shut down.
[2021-09-14 14:58:52 +0700] information/Application: Shutting down...
[2021-09-14 14:58:52 +0700] information/CheckerComponent: 'checker' stopped.
[2021-09-14 14:58:52 +0700] information/NotificationComponent: 'notification' stopped.
[2021-09-14 14:58:52 +0700] information/DbConnection: Pausing IDO connection: ido-mysql

During a normal reload:

[2021-09-14 19:48:10 +0700] information/Application: Shutting down...
[2021-09-14 19:48:10 +0700] information/CheckerComponent: 'checker' stopped.
[2021-09-14 19:48:10 +0700] information/NotificationComponent: 'notification' stopped.
[2021-09-14 19:48:10 +0700] information/DbConnection: Pausing IDO connection: ido-mysql
[2021-09-14 19:48:10 +0700] information/IdoMysqlConnection: Disconnected from 'ido-mysql' database 'icinga2'.
[2021-09-14 19:48:10 +0700] information/IdoMysqlConnection: 'ido-mysql' paused.
[2021-09-14 19:48:10 +0700] information/DbConnection: 'ido-mysql' stopped.
[2021-09-14 19:48:10 +0700] information/InfluxdbWriter: 'influxdb' paused.
[2021-09-14 19:48:10 +0700] information/ApiListener: 'api' stopped.
[2021-09-14 19:48:26 +0700] information/FileLogger: 'main-log' started.
[2021-09-14 19:48:29 +0700] information/ApiListener: 'api' started.
[2021-09-14 19:48:29 +0700] information/ApiListener: Started new listener on '[::]:5665'
[2021-09-14 19:48:29 +0700] information/ApiListener: Reconnecting to endpoint 'master01' via host 'xxx.xxx.xxx.xxx' and port '5665'
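The difference between the two excerpts above is that the stalled reload never reaches the "'main-log' started" entry. As a rough sketch, a small script could scan the main log for shutdowns that are not followed by a start within some time window. The function names and the 5-minute threshold are assumptions made for illustration, and the line pattern is inferred only from the excerpts in this thread:

```python
import re
from datetime import datetime, timedelta

# Matches lines such as:
# [2021-09-14 14:58:52 +0700] information/Application: Shutting down...
LINE_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [+-]\d{4})\] (\w+)/(\w+): (.*)$"
)


def parse(line):
    """Return (timestamp, facility, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S %z")
    return ts, m.group(3), m.group(4)


def find_stalled_reloads(lines, max_gap=timedelta(minutes=5)):
    """Return timestamps of 'Shutting down...' entries that were not followed
    by a FileLogger 'started' entry within max_gap (hypothetical threshold)."""
    stalled = []
    pending = None  # timestamp of a shutdown still waiting for a start
    for line in lines:
        parsed = parse(line)
        if parsed is None:
            continue
        ts, facility, msg = parsed
        if facility == "Application" and msg.startswith("Shutting down"):
            pending = ts
        elif facility == "FileLogger" and "started" in msg:
            if pending is not None and ts - pending > max_gap:
                stalled.append(pending)
            pending = None
    if pending is not None:
        # The log ends after a shutdown with no start recorded at all.
        stalled.append(pending)
    return stalled
```

Run against the first excerpt, this would flag the shutdown at 14:58:52; against the second, it would flag nothing, since the logger starts again 16 seconds later.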

Al2Klimov · October 19, 2021, 8:58am

Hmm… and if you increase the log level to notice?

vyvuvivo · October 19, 2021, 11:49am

I will increase the log level to notice to get more information. Does the file size increase quickly?
The error happens randomly, so I can’t leave the debug log enabled (it gets too large).
Thanks!

Al2Klimov · October 19, 2021, 1:09pm

Well… define quickly.

steaksauce · October 19, 2021, 2:23pm

“Quickly” is going to depend on your constraints and on how many events are generated (which usually depends on how large your environment is). You could and should set up your logs to rotate. Start with daily if you think it’s going to be big, and then you can work your way down from there if needed.

vyvuvivo · October 19, 2021, 3:06pm

I tried changing the severity to notice, but the log file gets too large. In just about 15 minutes, it grew to 2GB.

My environment has about 26k services and 3.5k hosts.
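To put that growth rate in perspective when choosing a rotation interval, here is a back-of-the-envelope projection. It assumes the log keeps growing at the observed 2 GB per 15 minutes, which may not hold over a full day, and the function name is made up for this sketch:

```python
def projected_growth_gb(observed_gb, observed_minutes, horizon_hours):
    """Linearly extrapolate log growth from one observation.

    Assumes a steady rate, which is only a rough approximation."""
    return observed_gb * (horizon_hours * 60) / observed_minutes


# Observation from this thread: about 2 GB in about 15 minutes.
per_hour = projected_growth_gb(2, 15, 1)
per_day = projected_growth_gb(2, 15, 24)
print(f"about {per_hour:.0f} GB per hour, about {per_day:.0f} GB per day")
```

At that rate, daily rotation alone would need close to 200 GB of free space per day, which is why a dedicated log disk or more frequent rotation comes up in the replies.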

steaksauce · October 19, 2021, 5:46pm

We’re the same size – sounds like you need a log disk like us

Al2Klimov · October 20, 2021, 8:33am

Can you rotate it more frequently?

drapiti · October 21, 2021, 5:58pm

Hi guys, we also ran into this issue not long ago. We found that there were some problems with the configuration, so it was not able to sync correctly. To start, try checking the configuration with icinga2 daemon -C. We ended up clearing the staging directory and everything went back to normal. In our case, one of the problems was a user group that was not set in a global zone.

vyvuvivo · October 22, 2021, 12:16pm

Yep. I always check the configuration with “icinga2 daemon -C” first. There are some warning events, as below.

Do you mean deleting the /var/lib/icinga2/api/zones-stage directory and restarting the service to resync?

I will enable the debug log the next time the error occurs. I hope that gives more information.

Many thanks!

drapiti · October 22, 2021, 4:19pm

Yes, we emptied it and restarted on the masters and satellites after correcting the config error.
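For anyone wanting to script the "empty the staging directory" step, here is a cautious sketch. The helper name is hypothetical, it defaults to a dry run that only lists what would be removed, and the path is the one mentioned earlier in this thread. Restarting the service afterwards (for example with systemctl restart icinga2, assuming a systemd host such as CentOS 7) is left as a manual step:

```python
import shutil
from pathlib import Path


def clear_stage_dir(stage_dir, dry_run=True):
    """Empty stage_dir but keep the directory itself.

    Returns the list of entries that were (or, in a dry run, would be) removed."""
    stage = Path(stage_dir)
    removed = []
    if not stage.is_dir():
        return removed
    for entry in sorted(stage.iterdir()):
        removed.append(entry)
        if dry_run:
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove a sub-directory and its contents
        else:
            entry.unlink()  # remove a regular file or a symlink
    return removed


# Example (dry run, nothing is deleted):
# for path in clear_stage_dir("/var/lib/icinga2/api/zones-stage"):
#     print("would remove", path)
```

Fix the underlying config error first, as described above; otherwise the next sync may simply reproduce the same broken state.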