Master can't connect to agent

My problem is still the one from "Setup agent node without connecting to the master",
but I was informed that I might not get help there because it is marked as solved.

I will copy the relevant parts here:

"The situation is this: the Icinga master is in my network and doesn't have a public IP. My network is connected to the host network via VPN. The same setup exists for multiple other networks and works flawlessly with Linux servers.
The thing is: the VPN is one-way, and the host network has no route to my network.

I got it working somehow for one host after Kev kindly helped me, but afaik I tried the same approach for two more hosts and it doesn't work the same way. The certificates work and are correct imo, but the master simply can't establish a connection."

"Yes, the hosts are pingable and are already working in Icinga; it's just the services that are unknown, because Icinga can't connect to the agents.

…It is now 3 hours later. I nuked the agents and installed everything Icinga from scratch.
One of them is now reachable and working as intended, and the other still has the same problem, even though I installed them identically.

The debug log on the master is really not helpful at all, because it just complains that it can't connect."

Debug log on the agent:

[2021-10-13 15:45:27 +0200] information/FileLogger: 'debug-file' started.
[2021-10-13 15:45:27 +0200] information/FileLogger: 'main-log' started.
[2021-10-13 15:45:27 +0200] information/ApiListener: 'api' started.
[2021-10-13 15:45:27 +0200] information/ApiListener: Started new listener on '[::]:5665'
[2021-10-13 15:45:27 +0200] debug/ApiListener: Not connecting to Endpoint 'xxx' because that's us.
[2021-10-13 15:45:27 +0200] debug/ApiListener: Not connecting to Endpoint 'xx' because the host/port attributes are missing.
[2021-10-13 15:45:27 +0200] notice/ApiListener: Current zone master: xxx
[2021-10-13 15:45:27 +0200] notice/ApiListener: Connected endpoints: 
[2021-10-13 15:45:27 +0200] information/ConfigItem: Activated all objects.
[2021-10-13 15:45:27 +0200] notice/ApiListener: Updating object authority for objects at endpoint 'xxx'.
[2021-10-13 15:45:27 +0200] debug/IcingaApplication: In IcingaApplication::Main()
[2021-10-13 15:45:37 +0200] information/WorkQueue: #4 (ApiListener, RelayQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2021-10-13 15:45:37 +0200] information/WorkQueue: #5 (ApiListener, SyncQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2021-10-13 15:45:37 +0200] debug/ApiListener: Not connecting to Endpoint 'xxx' because that's us.
[2021-10-13 15:45:37 +0200] notice/ApiListener: Updating object authority for objects at endpoint 'xxx'.
[2021-10-13 15:45:37 +0200] debug/ApiListener: Not connecting to Endpoint 'xx' because the host/port attributes are missing.
[2021-10-13 15:45:37 +0200] notice/ApiListener: Current zone master: xxx
[2021-10-13 15:45:37 +0200] notice/ApiListener: Connected endpoints: 
[2021-10-13 15:45:47 +0200] notice/ApiListener: Updating object authority for objects at endpoint 'xxx'.

“The firewall doesn’t seem to be the issue, the port is open.”

“Oh ok. Like I said, time and date are the same on both systems.
The API is also enabled.”

I’d recommend letting the agents initiate the connections, since then you don’t need to open an additional port on every client machine. To do so, you need to configure the host and port attributes for the parent endpoint in your agent’s zones.conf.

The zones.conf is empty; I think that is because the Director kickstart script writes its configuration somewhere else.
This is what I found in icinga2.conf, which seems to have the necessary info as far as I can tell.

/**
 * Icinga 2 Config - Proposed by Icinga 2 PowerShell Module
 */

/* Define our includes to run the agent properly. */
include "constants.conf"
include <itl>
include <plugins>
include <nscp>
include <windows-plugins>

/* Required for Icinga 2.8.0 and above */
const NodeName = "dc1.xxx.local"

/* Define our block required to enable or disable Icinga 2 debug log
 * Enable or disable it by using the PowerShell Module with
 * argument -IcingaEnableDebugLog or by switching
 * PowerShellIcinga2EnableDebug to true or false manually.
 * true: Debug log is active
 * false: Debug log is deactivated
 * IMPORTANT: ";" after true or false has to remain to allow the
 *            PowerShell Module to switch this feature on or off.
 */
const PowerShellIcinga2EnableDebug = false;
const PowerShellIcinga2EnableLog = true;

if (PowerShellIcinga2EnableDebug) {
  object FileLogger "debug-file" {
    severity = "debug"
    path = LocalStateDir + "/log/icinga2/debug.log"
  }
}

/* Try to define a constant for our NSClient++ installation
 * IMPORTANT: If the NSClient++ is installed newly to the system, the
 * Icinga Service has to be restarted in order to set this variable
 * correctly. If the NSClient++ is installed over the PowerShell Module,
 * the Icinga 2 Service is restarted automaticly.
 */
if (!globals.contains("NscpPath")) {
  NscpPath = dirname(msi_get_component_path("{5C45463A-4AE9-4325-96DB-6E239C034F93}"))
}

/* Enable our default main logger feature to write log output. */
if (PowerShellIcinga2EnableLog) {
  object FileLogger "main-log" {
    severity = "information"
    path = LocalStateDir + "/log/icinga2/icinga2.log"
  }
}

/* All informations required to correctly connect to our parent Icinga 2 nodes. */
object Endpoint "dc1.xxx.local" {}
object Endpoint "monitoringxxx" {

}

/* Define the zone and its containing endpoints we should communicate with. */
object Zone "master" {
  endpoints = [ "monitoringxxx" ]
}

/* All of our global zones, check commands and other configuration are synced into.
 * Director global zone must be defined in case the Icinga Director is beeing used.
 * Default value for this is "director-global".
 * All additional zones can be configured with -GlobalZones argument.
 * IMPORTANT: If -GlobalZones argument is used, the Icinga Director global zones has
 *            to be defined as well within the argument array.
 */
object Zone "director-global" {
 global = true
}
object Zone "global-templates" {
 global = true
}

/* Define a zone for our current agent and set our parent zone for proper communication. */
object Zone "dc1.xxx.local" {
  parent = "master"
  endpoints = [ "dc1.xxx.local" ]
}

/* Configure all settings we require for our API listener to properly work.
 * This will include the certificates, if we accept configurations which
 * can be changed with argument -AcceptConfig and the bind informations.
 * The bind_port can be modified with argument -AgentListenPort.
 */
object ApiListener "api" {
  accept_commands = true
  accept_config = true
  bind_host = "::"
  bind_port = 5665
}

Okay, the agent was installed by the old PowerShell module. In this case you need to modify the agent’s icinga2.conf manually:

object Endpoint "monitoringxxx" {
   /* IP address of the master as seen from the agent, or its FQDN */
   host = "x.x.x.x"
   port = "5665"
}

Or update the Director’s settings for the master endpoint by adding the host and port information, and rerun the PowerShell module on every agent.

I just tried your solution of modifying the conf manually. Sadly, nothing changed.
How would that even help, though, considering the agent has no way of reaching the master due to the one-way connection?

Did you check icinga2.log?

You can test the connection manually, e.g. using PowerShell:

Test-NetConnection -ComputerName x.x.x.x -Port 5665

icinga2.log just keeps repeating:

[2021-11-25 14:52:43 +0100] information/ConfigObject: Dumping program state to file 'C:\ProgramData\icinga2\var\lib\icinga2/icinga2.state'
[2021-11-25 14:57:04 +0100] information/WorkQueue: #4 (ApiListener, RelayQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2021-11-25 14:57:04 +0100] information/WorkQueue: #5 (ApiListener, SyncQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2021-11-25 14:57:43 +0100] information/ConfigObject: Dumping program state to file 'C:\ProgramData\icinga2\var\lib\icinga2/icinga2.state'

Like I said, the agents can't connect to the master themselves, but I tried anyway from a different host in the same network that has a working agent, and it can't connect with “Test-NetConnection -ComputerName x.x.x.x -Port 5665” either.

No, your agent does not even try to connect. Otherwise you would see this in the logs. That means your config is not ok.
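Whether an agent ever attempted an outbound connection can also be checked mechanically by scanning its log for the relevant ApiListener messages. The following is a minimal Python sketch, not part of Icinga: the function name `summarize_connection_attempts` and the result keys are made up for illustration, and the patterns are modeled only on the log lines quoted in this thread, so other Icinga 2 versions may word these messages differently.

```python
import re

# Patterns modeled on ApiListener lines quoted in this thread (assumption:
# the exact wording may differ between Icinga 2 versions).
ATTEMPT = re.compile(
    r"ApiListener: Reconnecting to endpoint '([^']+)' via host '([^']+)' and port '(\d+)'"
)
FAILURE = re.compile(
    r"ApiListener: Cannot connect to host '([^']+)' on port '(\d+)': (.*)"
)
MISSING = re.compile(
    r"ApiListener: Not connecting to Endpoint '([^']+)' "
    r"because the host/port attributes are missing"
)


def summarize_connection_attempts(lines):
    """Collect attempted endpoints, failed connects, and endpoints without host/port."""
    summary = {"attempted": set(), "failed": [], "missing_host_port": set()}
    for line in lines:
        match = ATTEMPT.search(line)
        if match:
            summary["attempted"].add(match.group(1))
            continue
        match = FAILURE.search(line)
        if match:
            # (host, port, error text as written by the operating system)
            summary["failed"].append(
                (match.group(1), int(match.group(2)), match.group(3).strip())
            )
            continue
        match = MISSING.search(line)
        if match:
            summary["missing_host_port"].add(match.group(1))
    return summary
```

Passing an open log file (for example the agent's icinga2.log or debug.log) as `lines` works, since a file object yields its lines. An empty `attempted` set together with a non-empty `missing_host_port` set would match the situation described here: the agent has no host/port for the master endpoint and therefore never tries to connect.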

I'm an idiot. I forgot the closing " after my master's name.
Now it at least tries to connect.

[2021-11-29 13:08:38 +0100] information/FileLogger: 'main-log' started.
[2021-11-29 13:08:38 +0100] information/ApiListener: 'api' started.
[2021-11-29 13:08:38 +0100] information/ApiListener: Started new listener on '[::]:5665'
[2021-11-29 13:08:38 +0100] information/ConfigItem: Activated all objects.
[2021-11-29 13:08:38 +0100] information/ApiListener: Reconnecting to endpoint 'monitoring.xxx.xxx.de' via host 'monitoring.xxx.xxx.de' and port '5665'
[2021-11-29 13:08:38 +0100] critical/ApiListener: Cannot connect to host 'monitoring.xxx.xxx.de' on port '5665': Der angegebene Host ist unbekannt

The German at the end says “the specified host is unknown”.

Instead of hostname.domainname, I have now entered the IP the master seems to have in their network (I got that from a working agent's conf), and now it says this:

[2021-11-29 13:17:20 +0100] information/ConfigItem: Activated all objects.
[2021-11-29 13:17:20 +0100] information/ApiListener: Reconnecting to endpoint 'monitoring.xxx.xxx.de' via host '10.1.1.54' and port '5665'
[2021-11-29 13:17:21 +0100] critical/ApiListener: Cannot connect to host '10.1.1.54' on port '5665': Es konnte keine Verbindung hergestellt werden, da der Zielcomputer die Verbindung verweigerte
[2021-11-29 13:17:30 +0100] information/WorkQueue: #4 (ApiListener, RelayQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2021-11-29 13:17:30 +0100] information/WorkQueue: #5 (ApiListener, SyncQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2021-11-29 13:17:30 +0100] information/ApiListener: Reconnecting to endpoint 'monitoring.xxx.xxx.de' via host '10.1.1.54' and port '5665'
[2021-11-29 13:17:31 +0100] critical/ApiListener: Cannot connect to host '10.1.1.54' on port '5665': Es konnte keine Verbindung hergestellt werden, da der Zielcomputer die Verbindung verweigerte

The German part translates to "No connection could be established because the target computer refused the connection."

That’s why I recommended to test with Test-NetConnection. It looks like you have a network issue e.g. firewall blocking or routing malfunction.
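The two agent errors quoted above are different failure classes: "host is unknown" is a name resolution failure, while "connection refused" generally means something answered and rejected the connection (nothing listening on that port at that address, or a firewall actively rejecting). A timeout, by contrast, usually points to dropped packets or a missing route. As a rough cross-check next to Test-NetConnection, here is a minimal Python sketch that tells these cases apart. The function name `probe_tcp` and the result strings are made up for illustration, and it assumes Python is available on the machine you test from.

```python
import socket


def probe_tcp(host, port, timeout=3.0):
    """Attempt one TCP connection and classify the outcome as a short string."""
    try:
        # create_connection resolves the name first, then connects.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.gaierror:
        # Name could not be resolved (compare: "host is unknown").
        return "name_not_resolved"
    except ConnectionRefusedError:
        # The connection was actively rejected (compare: "refused the connection").
        return "refused"
    except socket.timeout:
        # No answer within the timeout, e.g. packets dropped or no route back.
        return "timeout"
    except OSError:
        # Anything else, e.g. network unreachable.
        return "other_error"
```

For example, `probe_tcp("10.1.1.54", 5665)` run on the agent would probe the address used in the log above. This only classifies the symptom; it does not show which hop or firewall is responsible.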

WARNING: TCP connect to 10.1.1.54:5665 failed

ComputerName : 10.1.1.54
RemoteAddress : 10.1.1.54
RemotePort : 5665
InterfaceAlias : Ethernet
SourceAddress : 192.168.50.155
PingSucceeded : False
PingReplyDetails (RTT) : 0 ms
TcpTestSucceeded : False

The thing is, that result is the same on the working server in the same network.
I'm sorry, I'm fairly new at all this, but how could the network be the issue if it works for other hosts in the same network?

I’m sorry, I’ve no idea about your network.

Hi,

you could try commands like tracert (= traceroute on Linux) to see at which hop you run into problems. Then you should see what @rsx means: maybe a firewall or router blocks the TCP packets on the way to the destination.