Icinga 2 service not restarting

Giulia_Baldusso · January 15, 2020, 10:04am

Hello,

Over the last few days I had a problem with Icinga and its notifications, so I tried to restart the service (it was active) with systemctl restart icinga2.service. It didn’t restart and is now unavailable.

Afterwards I noticed that a process (syslog-ng) was using 160% CPU, so I changed a conf file and restarted that service… now it uses 32% CPU and everything apparently looks fine.

So I thought: “now Icinga 2 shouldn’t have any problem restarting, let’s try to get it running”

I used systemctl restart icinga2.service but Icinga won’t come up!

systemctl status icinga2.service says that:

icinga2.service: Main process exited, code=killed, status=6/ABRT
Failed to start Icinga host/service/network monitoring system.
icinga2.service: Unit entered failed state.
icinga2.service: Failed with result ‘signal’.

It also says that I can find more information under /var/log/icinga2/crash, but all the files there are empty!

What can I do?

All the other services are running… Docker, Apache2, MySQL…

What can I do?

The docker containers are running…

Giulia

Pooh · January 15, 2020, 10:29am

Hello,

Over the last few days I had a problem

What was the problem?

with Icinga and its notifications, so I tried to restart the service (it was
active) with systemctl restart icinga2.service. It didn’t restart and is now
unavailable.

Anything useful in /var/log/icinga2/icinga2.log or
/var/log/icinga2/startup.log ?

Afterwards I noticed that a process (syslog-ng) was using 160% CPU, so I
changed a conf file

Which one?

What did you change?

and restarted that service… now it uses 32% CPU and everything apparently
looks fine.

So, to be clear, syslog-ng is using 32% of your CPU?

That seems like a great deal to me.

So I thought: “now Icinga 2 shouldn’t have any problem restarting, let’s try
to get it running”

I used systemctl restart icinga2.service but Icinga won’t come up!

Anything useful in /var/log/icinga2/icinga2.log or
/var/log/icinga2/startup.log ?

systemctl status icinga2.service says that:

icinga2.service: Main process exited, code=killed, status=6/ABRT
Failed to start Icinga host/service/network monitoring system.
icinga2.service: Unit entered failed state.
icinga2.service: Failed with result ‘signal’.

It also says that I can find more information under /var/log/icinga2/crash, but
all the files there are empty!

What can I do?

Troubleshooting - Icinga 2 may help.

Antony.

rsx · January 15, 2020, 10:31am

You might have an error in your config files. To check, run

icinga2 daemon -C
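
A minimal sketch of how that check could gate a restart, assuming a POSIX shell. `safe_restart` and the `ICINGA2_BIN` / `SYSTEMCTL_BIN` override variables are hypothetical names used only for this example; `icinga2 daemon -C` and `systemctl restart icinga2.service` are the commands already mentioned in this thread.

```shell
#!/bin/sh
# Sketch: restart Icinga 2 only if the configuration validates.
# ICINGA2_BIN and SYSTEMCTL_BIN are hypothetical overrides (useful for dry runs);
# by default the real icinga2 and systemctl binaries are used.
safe_restart() {
    icinga2_bin="${ICINGA2_BIN:-icinga2}"
    systemctl_bin="${SYSTEMCTL_BIN:-systemctl}"

    # "daemon -C" validates the configuration and exits without starting the daemon.
    if "$icinga2_bin" daemon -C; then
        "$systemctl_bin" restart icinga2.service
    else
        echo "config validation failed, not restarting" >&2
        return 1
    fi
}

# Usage (as root): safe_restart
```

A clean validation only rules out configuration errors, not problems in the environment the daemon runs in.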

Giulia_Baldusso · January 15, 2020, 11:51am

The problem was that notifications randomly stopped being sent to my e-mail…

In /var/log, icinga2.log doesn’t exist… same for startup.log.

In the /var/log/icinga2 directory I found only old logs, and the “crash” directory has only empty files inside.

I found these files:

in /var/log/error there is:

icingasrv systemd[1]: failed to start Icinga host/service/network monitoring service
icingasrv syslog-ng[16589]: I/O error occurred while writing; fd=‘56’, error=‘Broken pipe (32)’

Sorry if I’m not reporting everything, but I’m inside a sort of matryoshka of RDP sessions, VMware, etc., so I’m copying the logs by hand.

in /var/log/syslog there is:

icingasrv systemd[1]: icinga2.service: Failed with result ‘signal’.

And there is also an error that sounds like:

icingasrv icinga2[5616]: error: Function call ‘std::ifstream::open’ for file /NOW/IT/REFERS/TO/A/LONG/PATH failed with error code 2, ‘No such file or directory’

(0) Compiling configuration file /NOW/IT/REFERS/TO/A/LONG/PATH

For syslog-ng, I changed the line #ForwardToSyslog=yes to ForwardToSyslog=no in /etc/systemd/journald.conf…
Now it uses 32% of CPU, which is better than 160%…

I have already read https://icinga.com/docs/icinga2/latest/doc/15-troubleshooting/…

Giulia_Baldusso · January 15, 2020, 11:53am

The output doesn’t seem to contain errors…

dnsmichi · January 15, 2020, 12:30pm

Hi,

sounds odd with the syslog-ng failing so hard on the file descriptor level. Can you add some more specs of your system, like the output of

icinga2 --version
icinga2 daemon -C
How many CPU, RAM, etc. is available, e.g. htop

Cheers,
Michael

Giulia_Baldusso · January 15, 2020, 1:18pm

icinga2 --version

System information:
Platform: Debian GNU/Linux
Platform version: 9 (stretch)
Kernel: Linux
Kernel version: 4.9.0-6-amd64
Architecture: x86_64

Build information:
Compiler: GNU 6.3.0
Build host: 486e413fb159

Application information:

General paths:
Config directory: /etc/icinga2
Data directory: /var/lib/icinga2
Log directory: /var/log/icinga2
Cache directory: /var/cache/icinga2
Spool directory: /var/spool/icinga2
Run directory: /run/icinga2

Old paths (deprecated):
Installation root: /usr
Sysconf directory: /etc
Run directory (base): /run
Local state directory: /var

Internal paths:

Package data directory: /usr/share/icinga2
State path: /var/lib/icinga2/icinga2.state
Modified attributes path: /var/lib/icinga2/modified-attributes.conf
Objects path: /var/cache/icinga2/icinga2.debug
Vars path: /var/cache/icinga2/icinga2.vars
PID path: /run/icinga2/icinga2.pid

icinga2 daemon -C

information/cli: Icinga application loader (version: r2.10.2-1)
information/cli: Loading configuration file(s).
information/ConfigItem: Committing config item(s).
information/ApiListener: My API identity: icingasrv.local
information/ConfigItem: Instantiated 1 ScheduledDowntime.
information/ConfigItem: Instantiated 497 Services.
information/ConfigItem: Instantiated 1 IcingaApplication.
information/ConfigItem: Instantiated 486 Hosts.
information/ConfigItem: Instantiated 1 FileLogger.
information/ConfigItem: Instantiated 2 NotificationCommands.
information/ConfigItem: Instantiated 13 Notifications.
information/ConfigItem: Instantiated 1 NotificationComponent.
information/ConfigItem: Instantiated 2 HostGroups.
information/ConfigItem: Instantiated 1 ApiListener.
information/ConfigItem: Instantiated 1 GraphiteWriter.
information/ConfigItem: Instantiated 1 PerfdataWriter.
information/ConfigItem: Instantiated 1 CheckerComponent.
information/ConfigItem: Instantiated 3 Zones.
information/ConfigItem: Instantiated 1 Endpoint.
information/ConfigItem: Instantiated 1 ApiUser.
information/ConfigItem: Instantiated 1 User.
information/ConfigItem: Instantiated 1 IdoMysqlConnection.
information/ConfigItem: Instantiated 215 CheckCommands.
information/ConfigItem: Instantiated 1 UserGroup.
information/ConfigItem: Instantiated 3 ServiceGroups.
information/ConfigItem: Instantiated 3 TimePeriods.
information/ScriptGlobal: Dumping variables to file ‘/var/cache/icinga2/icinga2.vars’
information/cli: Finished validating the configuration file(s).

I cannot install htop because I don’t have free space in /var/cache/apt/archives…

The processes using the most CPU are:

syslog-ng 26,3%
top 10,5%
jbd2/dm-0-8 5,3%
mysqld 5,3%

The others are at 0%.

The host is a VMware virtual machine with:

Consumed host CPU: 1,6 GHz
Consumed host memory: 3,94 GB
Active guest memory: 491 MB

Storage provisioned: 120 GB
Storage uncommitted: 1,03 KB
Storage not-shared: 64,11 GB
Storage used: 64,11 GB

Giulia_Baldusso · January 15, 2020, 1:39pm

In the VMware graph, the consumed host memory seems to be near 100%.

Giulia_Baldusso · January 15, 2020, 1:44pm

with the free -m command:

mem total: 3955
mem used: 2093
mem free: 327
mem shared: 70
mem buff/cache: 1535
mem available: 1527

swap total: 2047
swap used: 157
swap free: 1890

gkoutsog · January 15, 2020, 1:45pm

Your issues seem to be I/O related. You could try and run sar -b and see if the numbers are too big.

Also worth checking VMware logs for issues as it could be affected by underlying storage.

Giulia_Baldusso · January 15, 2020, 2:53pm

The memory situation reported by VMware is not good…

gkoutsog · January 15, 2020, 2:58pm

You could try to reboot the VM but there is a good risk that it will refuse to start again.

dnsmichi · January 15, 2020, 3:03pm

Meaning to say, /var is full?
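
That question can be answered with `df`. Below is a small sketch, assuming a POSIX shell with `df -P` and `awk` available; `usage_pct`, `check_disk` and the 90% default threshold are made-up names and values for illustration only.

```shell
#!/bin/sh
# usage_pct PATH: print the usage of the filesystem holding PATH as a bare number.
# "df -P" forces the one-line-per-filesystem POSIX format, so field 5 of the
# second line is the capacity column (e.g. "100%").
usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# check_disk PATH [LIMIT]: return 1 and print a warning when usage >= LIMIT percent.
check_disk() {
    path="$1"
    limit="${2:-90}"   # assumed default threshold, not an Icinga setting
    pct="$(usage_pct "$path")"
    if [ "$pct" -ge "$limit" ]; then
        echo "WARNING: $path is ${pct}% full"
        return 1
    fi
    echo "OK: $path is ${pct}% full"
}

# Usage: check_disk /var 90
```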

Giulia_Baldusso · January 15, 2020, 3:54pm

The output of “df -h /var” says that the size is 44G and 42G is used… available is 0 and usage is 100%.

dnsmichi · January 15, 2020, 4:00pm

Holy moly, you should take immediate action here. Whenever daemons cannot write their logs to /var/log that’s really bad for their runtime, likewise /var/lib/mysql won’t be able to store things in the database.

Try

du -h --max-depth=1

to identify big directories. Highly likely logs are huge and need to be purged.

Cheers,
Michael
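
For a sorted view of the same information, here is a hedged sketch assuming GNU `du` (for `--max-depth`); `biggest_dirs` is a hypothetical helper name. It prints sizes in KiB, largest first, and stays on one filesystem.

```shell
#!/bin/sh
# biggest_dirs DIR [N]: show the N largest entries among DIR and its
# first-level subdirectories (default N = 10).
#   -x  do not cross into other mounted filesystems
#   -k  report sizes in 1024-byte blocks so "sort -rn" can order them
biggest_dirs() {
    du -x -k --max-depth=1 "$1" 2>/dev/null | sort -rn | head -n "${2:-10}"
}

# Usage: biggest_dirs /var 5
```

The first line of the output is the total for the directory itself; the lines after it are the subdirectories worth looking into first.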

Giulia_Baldusso · January 15, 2020, 4:07pm

12K ./www
20G ./spool
5,2G ./log
7,5G ./lib
5,0M ./backups
4,0K ./opt
44K ./mail
4,0K ./local
1,4G ./cache
508K ./tmp
34G .

Could this be the issue with the icinga2 service?

dnsmichi · January 15, 2020, 7:53pm

Dive deeper into the directories and figure out why log for instance is 5.2 GB in size. Try deleting non-required files and free up space. Oh, and spool has 20GB of data … did you by chance enable the perfdata feature but not use PNP for metrics?

This is related to any service on your host, all of them suffer right now and the CPU load is increasing because of that.

In terms of partitions I strongly recommend creating separate ones for

/var/log
/var/lib/mysql

to separate them from normal operations.

Cheers,
Michael
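
To see what could be cleaned up in the spool directory, a cautious first step is to list old files without deleting anything. This sketch assumes a POSIX `find`; `old_files` is a hypothetical helper, and `/var/spool/icinga2/perfdata` is only an assumed location based on the spool directory shown earlier in the thread, so check where the 20 GB actually sits first.

```shell
#!/bin/sh
# old_files DIR DAYS: print regular files under DIR that were last modified
# more than DAYS days ago. Nothing is deleted; this only lists candidates.
#   -xdev  stay on the filesystem DIR lives on
old_files() {
    find "$1" -xdev -type f -mtime +"$2" -print
}

# Usage (path is an assumption, adjust to what du reports):
#   old_files /var/spool/icinga2/perfdata 7
```

Whether the listed files are safe to remove depends on whether anything still consumes them, so review the list before purging.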

Giulia_Baldusso · January 16, 2020, 9:01am

Thank youuu, now icinga2.service has restarted and the e-mails are being forwarded again!!!

Thank you so much!!!