Set up an agent node without it connecting to the master

Hi Icinga Community,

I’m extending an existing monitoring environment with some Windows Server 2019 machines.
For security reasons, one of them is not allowed to open a connection from the agent to the master.
So using the Icinga Wizard is not possible, as it can’t fetch a certificate from the master.

I found a similar article from two years ago, but it doesn’t work the same way for me:

##############################

  • OS master: Debian GNU/Linux, platform version 10
  • Version master: The Icinga 2 network monitoring daemon (version: r2.12.3-1)
  • OS agent: Windows Server 2019
  • Version agent: Icinga2-v2.12.4
  • Enabled features on agent: api mainlog notification
  • Disabled features on agent: checker debuglog elasticsearch gelf graphite influxdb opentsdb perfdata

I run a single master with several agents connected. The Icinga Wizard works fine for all the other agents, but setting up this one without the Wizard fails.

The agent host has been added to the monitoring and its ping check succeeds, but the master can’t fetch any remote check results from the agent.

The Icinga 2 service is running on the agent, but a telnet from the master to the agent on port 5665 fails with "connection refused".

Running netstat -a on the agent shows that port 5665 is not open (it isn’t listed).
##############################

What I’ve done:

  1. I installed Icinga on the Windows Server, then created and signed the certificate on the master with the following two commands:
     icinga2 pki new-cert --cn server.foo.bar --key server.foo.bar.key --csr server.foo.bar.csr
     icinga2 pki sign-csr --csr server.foo.bar.csr --cert server.foo.bar.crt
  2. Then I copied the certificate and the key into /var/lib/icinga2/certs on the agent node.
  3. Then I copied the ca.crt from /var/lib/icinga2/ca (on the master) into the same folder on the agent.
  4. I enabled the api feature with C:\Program Files\ICINGA2\sbin> .\icinga2.exe feature enable api

The log files on the Icinga master and agent don’t give me much information.
Log entries from the master:

[2021-07-09 12:57:04 +0200] information/ApiListener: Reconnecting to endpoint 'ms2019p-epdhcp.domain.com' via host 'ms2019p-epdhcp.domain.com' and port '5665'
[2021-07-09 12:57:04 +0200] critical/ApiListener: Cannot connect to host 'ms2019p-epdhcp.domain.com' on port '5665': Connection refused
[2021-07-09 12:57:14 +0200] information/ApiListener: Reconnecting to endpoint 'ms2019p-epdhcp.domain.com' via host 'ms2019p-epdhcp.domain.com' and port '5665'
[2021-07-09 12:57:14 +0200] critical/ApiListener: Cannot connect to host 'ms2019p-epdhcp.domain.com' on port '5665': Connection refused
[2021-07-09 12:57:24 +0200] information/ApiListener: Reconnecting to endpoint 'ms2019p-epdhcp.domain.com' via host 'ms2019p-epdhcp.domain.com' and port '5665'
[2021-07-09 12:57:24 +0200] critical/ApiListener: Cannot connect to host 'ms2019p-epdhcp.domain.com' on port '5665': Connection refused
[2021-07-09 12:57:34 +0200] information/ApiListener: Reconnecting to endpoint 'ms2019p-epdhcp.domain.com' via host 'ms2019p-epdhcp.domain.com' and port '5665'
[2021-07-09 12:57:34 +0200] critical/ApiListener: Cannot connect to host 'ms2019p-epdhcp.domain.com' on port '5665': Connection refused
[2021-07-09 12:57:44 +0200] information/ApiListener: Reconnecting to endpoint 'ms2019p-epdhcp.domain.com' via host 'ms2019p-epdhcp.domain.com' and port '5665'
[2021-07-09 12:57:44 +0200] critical/ApiListener: Cannot connect to host 'ms2019p-epdhcp.domain.com' on port '5665': Connection refused


Log entries from the agent (these appear daily):

[2021-07-08 02:00:03 +0200] information/Checkable: Checkable 'MS2019P-EPDHCP!load' has 1 notification(s). Checking filters for type 'DowntimeStart', sends will be logged.
[2021-07-08 02:00:03 +0200] information/Notification: Sending 'DowntimeStart' notification 'MS2019P-EPDHCP!load!mail-icingaadmin' for user 'icingaadmin'
[2021-07-08 02:00:03 +0200] information/Downtime: Triggering downtime 'MS2019P-EPDHCP!load!6b853396-81b9-4c95-b64c-d5d1576a3996' for checkable 'MS2019P-EPDHCP!load'.
[2021-07-08 02:00:03 +0200] warning/PluginUtility: Error: Non-optional macro 'service.output' used in argument '-o' is missing.


[2021-07-08 02:00:03 +0200] warning/PluginNotificationTask: Notification command for object 'MS2019P-EPDHCP!load' (PID: 4294967295, arguments: '') terminated with exit code 3, output: Error: Non-optional macro 'service.output' used in argument '-o' is missing.


[2021-07-08 02:00:03 +0200] information/Notification: Completed sending 'DowntimeStart' notification 'MS2019P-EPDHCP!load!mail-icingaadmin' for checkable 'MS2019P-EPDHCP!load' and user 'icingaadmin' using command 'mail-service-notification'.
[2021-07-08 02:00:23 +0200] information/ConfigObjectUtility: Created and activated object 'MS2019P-EPDHCP!load!551240ba-0bad-4460-933e-4f72440fae0f' of type 'Downtime'.
[2021-07-08 02:00:23 +0200] information/Downtime: Added downtime 'MS2019P-EPDHCP!load!551240ba-0bad-4460-933e-4f72440fae0f' between '2021-07-09 02:00:00' and '2021-07-09 03:00:00', author: 'icingaadmin', fixed
[2021-07-08 02:04:18 +0200] information/ConfigObject: Dumping program state to file 'C:\ProgramData\icinga2\var\lib\icinga2/icinga2.state'
[2021-07-08 02:09:18 +0200] information/ConfigObject: Dumping program state to file 'C:\ProgramData\icinga2\var\lib\icinga2/icinga2.state'
[2021-07-08 02:14:18 +0200] information/ConfigObject: Dumping program state to file 'C:\ProgramData\icinga2\var\lib\icinga2/icinga2.state'
[2021-07-08 02:19:18 +0200] information/ConfigObject: Dumping program state to file 'C:\ProgramData\icinga2\var\lib\icinga2/icinga2.state'
[2021-07-08 02:24:18 +0200] information/ConfigObject: Dumping program state to file 'C:\ProgramData\icinga2\var\lib\icinga2/icinga2.state'
[2021-07-08 02:29:18 +0200] information/ConfigObject: Dumping program state to file 'C:\ProgramData\icinga2\var\lib\icinga2/icinga2.state'
[2021-07-08 02:34:18 +0200] information/ConfigObject: Dumping program state to file 'C:\ProgramData\icinga2\var\lib\icinga2/icinga2.state'
[2021-07-08 02:39:18 +0200] information/ConfigObject: Dumping program state to file 'C:\ProgramData\icinga2\var\lib\icinga2/icinga2.state'
[2021-07-08 02:44:18 +0200] information/ConfigObject: Dumping program state to file 'C:\ProgramData\icinga2\var\lib\icinga2/icinga2.state'
[2021-07-08 02:49:18 +0200] information/ConfigObject: Dumping program state to file 'C:\ProgramData\icinga2\var\lib\icinga2/icinga2.state'
[2021-07-08 02:54:18 +0200] information/ConfigObject: Dumping program state to file 'C:\ProgramData\icinga2\var\lib\icinga2/icinga2.state'
[2021-07-08 02:59:18 +0200] information/ConfigObject: Dumping program state to file 'C:\ProgramData\icinga2\var\lib\icinga2/icinga2.state'
[2021-07-08 03:00:24 +0200] information/Checkable: Checkable 'MS2019P-EPDHCP!load' has 1 notification(s). Checking filters for type 'DowntimeEnd', sends will be logged.
[2021-07-08 03:00:24 +0200] information/Notification: Sending 'DowntimeEnd' notification 'MS2019P-EPDHCP!load!mail-icingaadmin' for user 'icingaadmin'
[2021-07-08 03:00:24 +0200] warning/PluginUtility: Error: Non-optional macro 'service.output' used in argument '-o' is missing.


[2021-07-08 03:00:24 +0200] information/ConfigObjectUtility: Deleted object 'MS2019P-EPDHCP!load!6b853396-81b9-4c95-b64c-d5d1576a3996' of type 'Downtime'.
[2021-07-08 03:00:24 +0200] warning/PluginNotificationTask: Notification command for object 'MS2019P-EPDHCP!load' (PID: 4294967295, arguments: '') terminated with exit code 3, output: Error: Non-optional macro 'service.output' used in argument '-o' is missing.


[2021-07-08 03:00:24 +0200] information/Downtime: Removed downtime 'MS2019P-EPDHCP!load!6b853396-81b9-4c95-b64c-d5d1576a3996' from checkable 'MS2019P-EPDHCP!load' (Reason: expired at 2021-07-08 03:00:00 +0200).
[2021-07-08 03:00:24 +0200] information/Notification: Completed sending 'DowntimeEnd' notification 'MS2019P-EPDHCP!load!mail-icingaadmin' for checkable 'MS2019P-EPDHCP!load' and user 'icingaadmin' using command 'mail-service-notification'.

I’m fairly new to Icinga and don’t fully understand all the magic behind it yet. It seems to me that the Icinga service is running but not listening on port 5665?

I would really appreciate it if someone could point me in the right direction.

Thank you very much and kind regards
Kevin


Hi @visablehamburg, may I ask you to run the following command on the agent and share the output here.

C:\Program Files\ICINGA2\sbin> .\icinga2.exe daemon -C

That way you can also check which port the service is listening on.

Hi @visablehamburg ,
as @yhabteab wrote, have a look at the output of

C:\Program Files\ICINGA2\sbin> .\icinga2.exe daemon -C

Are the certificates accessible and in the right place?
Did you try to configure the agent manually?
The Wizard won’t work, because it tries to connect to the master!

Check all these things and it should work!

I successfully connected the master to the host.

That’s great, I think I hadn’t really understood how the agent works.
C:\Program Files\ICINGA2\sbin> .\icinga2.exe daemon -C gave me a valuable hint.

The certificate name was written in lower case, but Icinga was looking for a certificate name in capitals. So I changed that and got a step further.
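For context, a minimal sketch of the relevant line in the agent's constants.conf (on Windows typically C:\ProgramData\icinga2\etc\icinga2\constants.conf). As far as I know, since Icinga 2.8 the ApiListener looks for `<NodeName>.crt` and `<NodeName>.key` in the certs directory, and NodeName defaults to the host's FQDN when it is not set, which may explain the upper-case name Icinga was looking for. `server.foo.bar` is only the placeholder CN from the commands above; the agent's own Endpoint and Zone in zones.conf are assumed to use the same name.

```icinga2
/* Agent: constants.conf (sketch). NodeName must match the certificate CN
 * and the file names in C:\ProgramData\icinga2\var\lib\icinga2\certs,
 * including upper/lower case:
 *   server.foo.bar.crt, server.foo.bar.key, ca.crt
 */
const NodeName = "server.foo.bar"
```

If NodeName is left unset, checking the exact case of the Windows FQDN against the certificate file names is a quick way to spot the mismatch described above.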

The next error message was that zones.conf on the agent wasn’t set up, so I wrote a proper zones.conf:


object Endpoint "master.domain.de" {
        host = "master.domain.de"
        port = "5665"
}

object Zone "master" {
  endpoints = [ "master.domain.de" ]
}

object Endpoint "agent.domain.de" {
}

object Zone "agent.domain.de" {
   endpoints = [ "agent.domain.de" ]
   parent = "master"
}
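The post doesn't show the master side. Assuming the agent objects are kept in the master's zones.conf rather than generated by the Director, a sketch of the counterpart could look like the following. The `host` attribute is what makes the master open the connection to the agent (the master log above shows it was already trying to do so). The master's own Endpoint and its Zone "master" are assumed to be defined already.

```icinga2
/* Master: /etc/icinga2/zones.conf (sketch, names from the agent config above) */

/* "host" is set, so the master initiates the connection to the agent. */
object Endpoint "agent.domain.de" {
  host = "agent.domain.de"
  port = "5665"
}

/* Same zone name and parent as in the agent's zones.conf. */
object Zone "agent.domain.de" {
  endpoints = [ "agent.domain.de" ]
  parent = "master"
}
```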

The last thing was that the agent didn’t accept commands from the master, which can be configured in the api.conf file:

object ApiListener "api" {
  accept_config = true
  accept_commands = true
}

Thanks again for the great help guys!
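The `accept_commands = true` setting above matters because the master runs checks on an agent through the `command_endpoint` attribute. A minimal sketch on the master, following the convention from the Icinga 2 distributed monitoring docs that the host name equals the agent's endpoint name; the host name is the placeholder from the zones.conf above, the objects are assumed to live in the master zone, and `disk-windows` is assumed to be available from the Icinga Template Library's Windows plugin commands.

```icinga2
/* Master: sketch of a host whose service checks run on the agent. */
object Host "agent.domain.de" {
  check_command = "hostalive"
  address = "agent.domain.de"
  vars.agent_endpoint = name   // convention: host name == endpoint name
}

/* The master sends the check to the agent's endpoint; the agent only
 * executes it if accept_commands = true is set in its api.conf. */
apply Service "disk" {
  check_command = "disk-windows"
  command_endpoint = host.vars.agent_endpoint
  assign where host.vars.agent_endpoint
}
```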

Could you maybe elaborate a bit more on those two steps?
I have exactly the same problem, but I don’t really understand how you proceeded here.
Isn’t your agent node on a Windows server? Where is /var/lib/icinga2/certs there?

That’s correct, the agent node is on a Windows server. When installing, Icinga creates this folder path, but not at the root of the drive; it sits under C:\ProgramData\icinga2 (see the path below).

So let me try again:

  1. Install Icinga on your Windows agent.
  2. Create and sign the certificate on the master using the following commands:
     icinga2 pki new-cert --cn server.foo.bar --key server.foo.bar.key --csr server.foo.bar.csr
     icinga2 pki sign-csr --csr server.foo.bar.csr --cert server.foo.bar.crt
  3. On the master (in my case Debian 10), go to /var/lib/icinga2/certs.
  4. Copy the created certificate and key onto your Windows agent, to the following path:
     C:\ProgramData\icinga2\var\lib\icinga2\certs
  5. Also copy the “ca.crt” from the master into the same folder on the Windows agent (I’m not sure whether that’s necessary, but I did it, and it works).

Finally, make sure you don’t run into the same problem I had: the certificate name was written in lower case, but Icinga was looking for a certificate name in capitals. So check the upper/lower case of the names.


It’s working now, thank you so much!

Sadly, I have more issues.
I was super happy that I got my first host working, but now I’ve tried two more that are in the same network, and the master just doesn’t connect to them.

The situation is this: the Icinga master is in my network and doesn’t have a public IP. My network is connected to the host network via VPN. The same setup exists for multiple other networks and works flawlessly with Linux servers.
The thing is: the VPN is one-way, and the host network has no route to my network.

I got it working somehow for one host, after Kev kindly helped me, but as far as I know I tried the same approach for two more hosts and it doesn’t work the same way. The certificates work and are correct in my opinion, but the master simply can’t establish a connection.

Does anyone have an idea, or know where I could look to find the mistake?

Thanks in advance


If the traffic is initiated from the master (i.e., the master talks to the host(s) in question), it should be fine, as in most cases the default network behavior is to allow replies on established connections (think of you talking to a web server that could not normally open a connection into your network).

What do the zones.conf files look like on the master vs. the agent? You may want to check out the second example under Endpoint Connection Direction.

Thanks for the fast reply.
Yes, the communication should work; it even does for one host in the same network. I think the biggest issue with this setup is probably the certificates, which I managed to solve with Kev’s help.

I didn’t do anything with zones.conf and honestly wouldn’t really know where to start. I created my setup with the Director and installed the agent on the host with the Director-generated PowerShell kickstart script.

Do you maybe have something a little more concrete on what I should look for in zones.conf?

Again, you might want to check out the second example under Endpoint Connection Direction – you just want to make sure that the direction is the same every time you add an agent (read the whole paragraph for clarification).

When I first set up our new cluster, I had some nasty things going on by choosing “both” directions, and couldn’t get things to work until I tidied up my zones.conf.
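A minimal sketch of the "master connects to agent" direction, which fits a one-way VPN like the one described above. A node only tries to connect to an endpoint whose local Endpoint object has a `host` attribute, so `host` goes on the master's Endpoint for the agent and is left out of the agent's Endpoint for the master. `master.example.com`, `agent.example.com` and `192.0.2.10` are hypothetical names and a documentation address.

```icinga2
/* On the master: the agent's Endpoint has "host", so the master connects. */
object Endpoint "agent.example.com" {
  host = "192.0.2.10"   // hypothetical agent address
}

/* On the agent: the master's Endpoint has no "host", so the agent never
 * tries to connect and only accepts the connection the master opens. */
object Endpoint "master.example.com" {
}
```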

Also, what do the debug logs on both the master and the agent say? That’s probably a better place to start anyway (lack of coffee this morning).


Don’t mention it and don’t count on it :stuck_out_tongue: I just try to check out the community most mornings.

OK. I’m not sure I fully understand, though.
My agent’s zones.conf looks like this:

object Endpoint "mymastersname" {
}

object Zone "master" {
	endpoints = [ "mymastersname" ]
}

object Endpoint "myagentsname" {
}

object Zone "myagentsname" {
	endpoints = [ "myagentsname" ]
	parent = "master"
}

object Zone "global-templates" {
	global = true
}

object Zone "director-global" {
	global = true
}

My master doesn’t have any hosts in its zones.conf:

object Endpoint "mymastersname" {
}

object Zone "master" {
        endpoints = [ "mymastersname" ]
}

object Zone "global-templates" {
        global = true
}

object Zone "director-global" {
        global = true
}


So if I understand this correctly, I just have to add host = “myagentsname” to the master’s zones.conf and it should try to connect?
Why is my master’s zones.conf empty, though? If I understand the documentation correctly, nobody should be trying to connect to anyone.

I also checked the debug logs, and the agent doesn’t seem to want to write one, even though the feature is enabled.

The master, of course, writes five novels per minute, but regarding the new hosts it’s surprisingly useless:

Timeout while reconnecting to endpoint 'myagentsname' via host 'myagentsip' and port '5665', cancelling attempt

Does anyone have any ideas what I could do next?

Hi @why,

Where do you have the zone and endpoint definitions of the agent on the master? They’re not in the master’s zones.conf, which you posted above.


Hi @yhabteab thanks for answering.

I configured everything with the Director, so my zones.conf seems to be /var/lib/icinga2/api/zones/master/director/agent_zones.conf, is that correct? It doesn’t have as many zones as I have hosts, but it might contain all the zones of hosts with agents.

If it’s the correct config, it’s fairly unspectacular, and it doesn’t contain IPs for any of the hosts.
It looks like this:

object Zone "host1" {
    parent = "master"
    endpoints = [ "host1" ]
}

object Zone "host2" {
    parent = "master"
    endpoints = [ "host 2" ]
}

I see, right. Then there must also be another file in /var/lib/icinga2/api/zones/master/director/ that contains the Endpoint definitions. What does its content look like?

@yhabteab
Right. That would be in /var/lib/icinga2/api/zones/master/director/agent_endpoints.conf

object Endpoint "host 1" {
    host = "192.168.x.x"
    log_duration = 0s
}

object Endpoint "host 2" {
    host = "192.168.x.xx"
    log_duration = 0s
}

object Endpoint "host 3" {
    host = "192.168.x.xxx"
    log_duration = 0s
}

Seems to be fine. Is it even possible to ping the agent from the master? If it is, please check the agent’s certificate. The logs from both endpoints would also help a lot: not just a single line, but all relevant log entries.

Yes, the hosts are pingable and already working in Icinga; it’s just the services that are unknown, because Icinga can’t connect to the agents.

…It is now three hours later. I nuked the agents and reinstalled everything Icinga-related from scratch.
One of them is now reachable and working as intended, and the other still has the same problem, even though I installed them identically.

The debug log on the master is really not helpful at all, because it just complains that it can’t connect.

On Windows I can’t seem to get a debug log. I enable the feature, but it just doesn’t create a file.
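For reference, the debuglog feature enabled with `icinga2.exe feature enable debuglog` is, to the best of my knowledge, the following FileLogger from features-available/debuglog.conf (compare with the local copy). Enabling a feature only adds it to features-enabled, so the Icinga 2 service has to be restarted before the file is written; on Windows, LogDir usually resolves to C:\ProgramData\icinga2\var\log\icinga2.

```icinga2
/* features-available/debuglog.conf (default content, sketch) */
object FileLogger "debug-file" {
  severity = "debug"
  path = LogDir + "/debug.log"   // e.g. C:\ProgramData\icinga2\var\log\icinga2\debug.log
}
```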