Satellite connected servers go to Unknown state (Too many open files) after the upgrade from 2.10.5 to 2.11

Hi guys,

After upgrading every server to the latest version of Icinga (from 2.10.5 to 2.11), we’re getting this error: “Exception occurred while checking ‘server_name!ssh’: Error: Function call ‘pipe2’ failed with error code 24, ‘Too many open files’”, but only on the servers connected to the satellite.
Our configuration is:
1 master (in x location) with some clients that have the master as the parent
1 satellite (in another location) with some clients connected to it.

As mentioned, this error appears 2-3 times per day, and only on the servers connected to the satellite. One thing I’ve noticed: if I restart the icinga2 process on the satellite, everything goes back to normal.
Besides the services going to the Unknown state, the host also appears as “Down” in Icinga.

I’ve tried increasing the limits in the limits.conf file, but that didn’t help. I’ve also uncommented this line in limits.conf: “# May also cause problems, uncomment if you have any
#LimitNPROC=62883”, without any improvement; the behavior is still the same.

Do you have any thoughts on what I can do to avoid this error?
LE: I’m using Ubuntu 18.04 on both (master + satellite).

Thanks in advance for your help!

Hi,

Unfortunately, LimitNPROC is the wrong parameter: it only controls the number of processes that can be spawned. It is the directive you need when fork() fails with errors. In this case, you’re hitting a file descriptor limit.
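
To make the difference between the two limits concrete, here is a small Python sketch (not part of Icinga) that prints the process limit and the open-files limit side by side, along with the meaning of error code 24. It reports the limits of the Python process you run it from, which are not necessarily those of the icinga2 daemon, so treat it as an illustration only.

```python
import errno
import os
import resource

# Error code 24 from the check output is EMFILE.
emfile_text = os.strerror(errno.EMFILE)
print(f"errno {errno.EMFILE}: {emfile_text}")

# RLIMIT_NPROC caps how many processes the user may have (what LimitNPROC sets).
nproc_soft, nproc_hard = resource.getrlimit(resource.RLIMIT_NPROC)

# RLIMIT_NOFILE caps how many file descriptors a process may hold open.
nofile_soft, nofile_hard = resource.getrlimit(resource.RLIMIT_NOFILE)

print(f"processes  (RLIMIT_NPROC):  soft={nproc_soft} hard={nproc_hard}")
print(f"open files (RLIMIT_NOFILE): soft={nofile_soft} hard={nofile_hard}")
```

Raising the first value does nothing for the second, which is why uncommenting LimitNPROC did not change the behavior.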

A file descriptor on Linux/Unix can be multiple things - a file handle, a socket or even a pipe. The communication between the process spawn helper and check execution processes uses a pipe in the background.
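
As a rough illustration of why pipes count against this limit, the following Python sketch lowers the soft open-files limit of its own process, opens pipes until the kernel refuses, and then restores the limit. The function name and the value 64 are made up for the demo; it only imitates the failure and is not how Icinga itself is implemented.

```python
import errno
import os
import resource


def pipes_until_emfile(soft_limit=64):
    """Open pipes under a lowered soft limit; return (pipes_opened, errno)."""
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        soft_limit = min(soft_limit, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))
    fds = []
    try:
        while True:
            # Each pipe consumes two descriptors: a read end and a write end.
            read_end, write_end = os.pipe()
            fds.extend((read_end, write_end))
    except OSError as exc:
        err = exc.errno
    finally:
        for fd in fds:
            os.close(fd)
        # Put the original soft limit back.
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))
    return len(fds) // 2, err


if __name__ == "__main__":
    count, err = pipes_until_emfile()
    print(f"opened {count} pipes before failing with: {os.strerror(err)}")
```

Because every pipe takes two descriptors, a process that runs many checks in parallel can reach the limit much sooner than the raw number suggests.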

We’ve seen this before during a performance analysis. To raise the limits, you need to do the following: https://github.com/Icinga/icinga2/issues/7425#issuecomment-535481664
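
After changing the limits and restarting, it is worth checking that the running daemon actually picked up the new value. A minimal sketch, assuming Linux with /proc available: it reads the “Max open files” row from /proc/<pid>/limits. The helper names are my own, and you have to supply the PID of the icinga2 process yourself.

```python
from pathlib import Path


def max_open_files(limits_text):
    """Return (soft, hard) from the 'Max open files' row of a limits file."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: "Max open files  <soft>  <hard>  files"
            soft, hard = line.split()[3:5]
            return soft, hard
    raise ValueError("no 'Max open files' row found")


def read_limits(pid):
    """Read the open-files limits of a running process (Linux only)."""
    return max_open_files(Path(f"/proc/{pid}/limits").read_text())


if __name__ == "__main__":
    import os
    # Demo on this script's own PID; replace with the icinga2 PID.
    print(read_limits(os.getpid()))
```

The values are returned as strings because the file may also contain “unlimited”.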

Cheers,
Michael