Troubleshooting Icinga agents

dnsmichi · January 18, 2019, 8:11am

Resources

Check the troubleshooting docs first to see whether they already provide an answer.

FAQ

General

Duplicate check results and strange thresholds

This applies to agents that execute checks via command_endpoint. The Icinga daemon running on the agent acts as an “execution bridge” and does not need any local host/service objects.

The problem lies in the default configuration of every instance: conf.d is included and provides a host object (NodeName, FQDN) and some apply rules for services. In addition, the checker feature is enabled after a fresh package setup.

With this configuration, Icinga on the agent starts executing local checks on its own and sends these check results back to the master.

At the same time, the master instance schedules check execution via the command bridge and applies different thresholds.

This leads to the situation where you’ll have two check sources:

The agent running a check itself
The master executing a check via command_endpoint

The latter is what you want, and therefore we recommend disabling the checker feature and the conf.d inclusion on agents. Since 2.9, conf.d is automatically disabled by the node wizard/setup CLI commands when run in agent mode.
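
As a rough sketch of what this looks like on an agent: the checker feature can be turned off with `icinga2 feature disable checker`, and the conf.d inclusion is the `include_recursive "conf.d"` line in icinga2.conf. The excerpt below assumes a Linux package install with the file at /etc/icinga2/icinga2.conf. Paths differ on Windows, and your file may contain other includes that should stay untouched.

```
// /etc/icinga2/icinga2.conf on the agent
// (path is an assumption for Linux packages)

// Commented out so the agent no longer loads the local example host
// and the service apply rules shipped in conf.d.
// include_recursive "conf.d"
```

Running `icinga2 daemon -C` validates the configuration. Restart the Icinga 2 service afterwards so both changes take effect.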

Check here for more details.

Windows

Windows blocks Icinga 2 with ephemeral port range

At some point, the agent suddenly can no longer connect to the parent node, and the logs contain the following:

critical/TcpSocket: Invalid socket: 10055, "An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full."

This is a low-level OS error outside of Icinga itself.

What solved the problem was a registry entry on the Windows host that raises the MaxUserPort value.

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters 

Value Name: MaxUserPort
Value Type: DWORD
Value Data: 65534

More details in this blogpost by @twidhalm