Backend icinga is not running

Hi Team,
I have set up a master zone with 2 nodes: master01 and master02.
I frequently (about once a week) get a “backend icinga not running” error after deploying a configuration from Icinga Director.
When this error occurs, the icinga2 service on the active endpoint (master02) is stuck in the reloading state. I have to manually restart the service to get it working properly again.

I tried turning off master02 to debug, but the problem still occurs on master01.
I have read similar topics about “Backend icinga not running”, but none of them offers a solution for this case.
Can someone help me, please?

Environment

  • Version used (icinga2 --version): v2.13.1-1
  • Operating System and version: CentOS 7
  • Enabled features (icinga2 feature list): api checker ido-mysql influxdb mainlog notification
  • Icinga Web 2 version and modules (System - About): icingaweb2 v2.8.2, module: director, fileshipper, grafana, incubator, ipl, monitoring, reactbundle
  • Config validation (icinga2 daemon -C): OK

Hmm… “Shutting down old instance…” What do the logs say?

Thanks for your reply!
When this error occurs, no startup lines are logged and the icinga2 service stays suspended (reloading state):

[2021-09-14 14:58:51 +0700] information/Application: Received request to shut down.
[2021-09-14 14:58:52 +0700] information/Application: Shutting down...
[2021-09-14 14:58:52 +0700] information/CheckerComponent: 'checker' stopped.
[2021-09-14 14:58:52 +0700] information/NotificationComponent: 'notification' stopped.
[2021-09-14 14:58:52 +0700] information/DbConnection: Pausing IDO connection: ido-mysql

During a normal reload:

[2021-09-14 19:48:10 +0700] information/Application: Shutting down...
[2021-09-14 19:48:10 +0700] information/CheckerComponent: 'checker' stopped.
[2021-09-14 19:48:10 +0700] information/NotificationComponent: 'notification' stopped.
[2021-09-14 19:48:10 +0700] information/DbConnection: Pausing IDO connection: ido-mysql
[2021-09-14 19:48:10 +0700] information/IdoMysqlConnection: Disconnected from 'ido-mysql' database 'icinga2'.
[2021-09-14 19:48:10 +0700] information/IdoMysqlConnection: 'ido-mysql' paused.
[2021-09-14 19:48:10 +0700] information/DbConnection: 'ido-mysql' stopped.
[2021-09-14 19:48:10 +0700] information/InfluxdbWriter: 'influxdb' paused.
[2021-09-14 19:48:10 +0700] information/ApiListener: 'api' stopped.
[2021-09-14 19:48:26 +0700] information/FileLogger: 'main-log' started.
[2021-09-14 19:48:29 +0700] information/ApiListener: 'api' started.
[2021-09-14 19:48:29 +0700] information/ApiListener: Started new listener on '[::]:5665'
[2021-09-14 19:48:29 +0700] information/ApiListener: Reconnecting to endpoint 'master01' via host 'xxx.xxx.xxx.xxx' and port '5665'
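
One way to spot this pattern without reading the log by hand is to scan for a shutdown that is never followed by a fresh start. The Python sketch below assumes the log line format shown in the two excerpts above and treats the line `'api' started.` as the sign that the new instance came up. That marker choice and the names `parse_line` and `find_stalled_reloads` are illustrative assumptions, not part of Icinga 2.

```python
import re
from datetime import datetime

# Matches lines such as:
# [2021-09-14 14:58:52 +0700] information/Application: Shutting down...
LINE_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [+-]\d{4})\] (\w+)/(\w+): (.*)$"
)


def parse_line(line):
    """Return (timestamp, component, message), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    timestamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S %z")
    return timestamp, match.group(3), match.group(4)


def find_stalled_reloads(lines):
    """Return timestamps of 'Shutting down...' lines with no later "'api' started."

    A shutdown counts as stalled if the next shutdown, or the end of the
    input, is reached before the API listener reports that it started.
    (Assumption: "'api' started." marks a successful restart.)
    """
    stalled = []
    pending = None  # timestamp of a shutdown still waiting for a start
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        timestamp, component, message = parsed
        if component == "Application" and message == "Shutting down...":
            if pending is not None:
                stalled.append(pending)
            pending = timestamp
        elif component == "ApiListener" and message == "'api' started.":
            pending = None
    if pending is not None:
        stalled.append(pending)
    return stalled
```

Feeding it the lines of the main log (for example via `open(path)`) returns one timestamp per reload that never completed. A shutdown at the very end of the file is also reported, so a reload that is still in progress would show up as well.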

Hmm… and if you increase the log level to notice?

I will increase the log level to notice for more information. Does the file size grow quickly?
The error happens randomly, so I can’t leave the debug log enabled (it gets too large).
Thanks!

Well… define quickly.

“Quickly” is going to depend on your constraints and on how many events are generated (usually a function of how large your environment is). You could and should set up your logs to rotate. Start with daily if you think it’s going to be big, and then you can work your way down from there if needed.

I tried changing the severity to notice, but the log file gets too large: in just about 15 minutes it grew to 2 GB.
My environment has about 26k services and 3.5k hosts.
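
For a sense of scale, the growth reported here (about 2 GB in 15 minutes) can be extrapolated with a little arithmetic. The Python sketch below assumes the rate stays constant, which is a simplification. The function names and the 1 GB per-file limit are hypothetical values chosen only for illustration.

```python
def growth_per_hour_gb(size_gb, minutes):
    """Extrapolate an observed growth (size_gb within `minutes`) to one hour."""
    return size_gb * 60 / minutes


def rotation_interval_minutes(size_gb, minutes, max_file_gb):
    """Minutes between rotations so that each file stays at or below max_file_gb."""
    return max_file_gb * minutes / size_gb


hourly = growth_per_hour_gb(2, 15)   # 8.0 GB per hour
daily = hourly * 24                  # 192.0 GB per day

# Hypothetical limit: keep every rotated file at or below 1 GB.
interval = rotation_interval_minutes(2, 15, 1)  # 7.5 minutes

print(f"{hourly} GB/hour, {daily} GB/day, rotate every {interval} min")
```

At this rate a daily rotation would produce files of roughly 192 GB, so the log volume needs either a much shorter rotation interval or a dedicated disk.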

We’re the same size – sounds like you need a log disk like us :wink:

Can you rotate it more frequently?

Hi guys, we ran into this issue not long ago as well. We found that there were some problems with the configuration, so it could not sync correctly. To start, try checking the configuration with icinga2 daemon -C. We ended up clearing the staging directory, and everything went back to normal. In our case, one of the problems was a user group that was not set in a global zone.

Yep. I always check the configuration with “icinga2 daemon -C” first. There are some warning events, as below.

Do you mean deleting the /var/lib/icinga2/api/zones-stage directory and restarting the service to resync?

I will enable the debug log the next time the error occurs. I hope that gives more information.

Many thanks!

Yes, we emptied it and restarted on the masters and satellites after correcting the config error.
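
The cleanup described in this thread can be sketched in Python as follows. It assumes the staging path named above (/var/lib/icinga2/api/zones-stage), a systemd-managed service called icinga2, and root privileges. The helper names `empty_directory` and `clear_stage_and_restart` are made up for this example. Fix the configuration error first, otherwise the same broken config will simply be staged again.

```python
import shutil
import subprocess
from pathlib import Path


def empty_directory(path):
    """Delete everything inside `path` but keep the directory itself.

    Returns the sorted names of the removed top-level entries.
    """
    removed = []
    for entry in Path(path).iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)   # real subdirectory: remove recursively
        else:
            entry.unlink()         # regular file or symlink
        removed.append(entry.name)
    return sorted(removed)


def clear_stage_and_restart(stage_dir="/var/lib/icinga2/api/zones-stage"):
    """Empty the staging directory, then restart Icinga 2 (needs root).

    Assumption: the service is managed by systemd under the name "icinga2".
    """
    removed = empty_directory(stage_dir)
    subprocess.run(["systemctl", "restart", "icinga2"], check=True)
    return removed
```

Calling `clear_stage_and_restart()` on each master and satellite would mirror the steps reported here. Nothing runs on import, so the destructive part only happens when the function is called explicitly.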