Create hosts in the director without creating a zone and endpoint manually first

Hi,

These are unfortunately my last days working with Icinga2 and the Director, so I want to clean up the environment and configuration before I hand it over to my colleagues, and get as much out of the Director as possible. I aim for a clean, easy-to-maintain setup. :slight_smile:

Details about our setup:

  • icingaweb2 2.7.0
  • icinga2 2.10.5
  • director 1.6.2
  • satellites have to establish the connection to the masters; masters can’t connect to the satellites directly due to firewalls in between

We have 2 central master instances in the “master zone” and for most of our projects a zone with 2 satellites.

So, I followed the hint in the WebUI that you are probably doing something wrong if you create the zones and endpoints manually. I also read various links (e.g. https://icinga.com/docs/director/latest/doc/24-Working-with-agents/ & Icinga Web in distributed environments) and the Director 101, and googled a bit.

First question based on the documentation:

  • Is it correct that the masters must be able to connect to the satellites in the scenario where I don’t want to create the zones and endpoints manually? In other words, would it not be sufficient for the satellites to connect to the masters?

The part that works:

  • create the host with Icinga2 Agent: Yes, Establish Connection: No, Accepts Config: Yes
  • verify in the preview that the director will deploy the new host, endpoint and zone config
  • apply & deploy the new config
  • verify with icinga2 object list --type zone that the zone was created:
Object 'scde-clickhouse-grafana-ds-01' of type 'Zone':
  % declared in '/var/lib/icinga2/api/packages/director/38800fd2-b564-4f76-9e0f-ceb7602377b2/zones.d/do-ffm-icinga-master/agent_zones.conf', lines 1:0-1:42
  * __name = "scde-clickhouse-grafana-ds-01"
  * endpoints = [ "scde-clickhouse-grafana-ds-01" ]
    % = modified in '/var/lib/icinga2/api/packages/director/38800fd2-b564-4f76-9e0f-ceb7602377b2/zones.d/do-ffm-icinga-master/agent_zones.conf', lines 3:5-3:51
  * global = false
  * name = "scde-clickhouse-grafana-ds-01"
  * package = "director"
  * parent = "do-ffm-icinga-master"
    % = modified in '/var/lib/icinga2/api/packages/director/38800fd2-b564-4f76-9e0f-ceb7602377b2/zones.d/do-ffm-icinga-master/agent_zones.conf', lines 2:5-2:35
  * source_location
    * first_column = 0
    * first_line = 1
    * last_column = 42
    * last_line = 1
    * path = "/var/lib/icinga2/api/packages/director/38800fd2-b564-4f76-9e0f-ceb7602377b2/zones.d/do-ffm-icinga-master/agent_zones.conf"
  * templates = [ "scde-clickhouse-grafana-ds-01" ]
    % = modified in '/var/lib/icinga2/api/packages/director/38800fd2-b564-4f76-9e0f-ceb7602377b2/zones.d/do-ffm-icinga-master/agent_zones.conf', lines 1:0-1:42
  * type = "Zone"
  * zone = "do-ffm-icinga-master"

So, the host ends up in icinga2, in icingaweb2 and in the icinga2 object list output. The services (with command_endpoint = host.name) are applied correctly.

I run into issues when I want to assign anything to the new zone and endpoint, for example when I want to add a 2nd node to the satellite zone. I cannot choose the new zone from the Cluster Zone drop-down, and the zone is not listed under Icinga Infrastructure -> Zones. The same applies to the endpoint.

So what am I missing or doing wrong here? I’m really confused now.

Thanks in advance!

No.

It doesn’t matter whether the master or the satellite initiates the connection, so it’s fine to have only the satellites configured to connect (the same applies to clients/agents).

Zones and endpoints should not be defined within the Director but in conf files. For best practice, take a look at this blog entry if you understand German.
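A minimal sketch of why one direction is enough, assuming bash: in Icinga 2 the node whose Endpoint object for a peer carries a `host` attribute actively connects to that peer, while a node whose Endpoint object lacks `host` only accepts incoming connections. The helper `render_endpoint` is hypothetical and just prints an Endpoint object in Icinga 2 config syntax; the satellite name is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an Icinga 2 Endpoint object.
# With a host argument, the node reading this config actively connects to
# the endpoint; without it, the node only waits for incoming connections.
render_endpoint() {
  local name="$1" host="${2:-}" port="${3:-5665}"
  printf 'object Endpoint "%s" {\n' "$name"
  if [ -n "$host" ]; then
    printf '  host = "%s"\n' "$host"
    printf '  port = "%s"\n' "$port"
  fi
  printf '}\n'
}

# On a satellite: include the master's address, so the satellite connects.
#   render_endpoint do-ffm-icinga-01 x.x.x.165
# On the masters: declare the satellite without host, so they just wait.
#   render_endpoint satellite1.example.com
```

With a firewall that only allows satellite-to-master traffic, this is exactly the layout described above: `host` on the satellite side only.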

Hi Roland,

thanks for your reply.

This is clear to me for services, for example. But how do I add a 2nd host to an existing zone if I want to have 2 satellites in the same zone? That’s actually the point that confuses me the most and where I feel lost. Or do I just have to…

  1. create host satellite1.example.com (-> director will create the zone and endpoint)
  2. create host satellite2.example.com (-> director will create the zone and endpoint, too)
  3. create the local zones.conf on both nodes that contains the zone with both endpoints?

In this case the director & master node will think that the endpoint satellite2.example.com is an endpoint in zone satellite2.example.com while the node itself thinks that the endpoint satellite2.example.com is one of two endpoints of the satellite1 zone. This sounds wrong to me and I see potential for issues with the config sync.

Sorry, feels like I am hopelessly confused now. :confounded:

Ah, OK, you’re trying to add your satellites as hosts; that’s not necessary. Just (guessing you ran icinga2 node wizard already) modify your zones.conf on all machines (as described in that blog entry) and restart all instances.
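A sketch of a zones.conf that both satellites of one zone could share, assuming the master names and addresses from this thread. The satellite names `satellite1.example.com`/`satellite2.example.com`, their addresses and the zone name `project-satellites` are hypothetical placeholders. Icinga 2 never connects to its own endpoint, so the same file works on both satellites. The masters would declare the same satellite Zone and Endpoint objects, but without `host` on the satellite endpoints, since they cannot reach them. The function writes to a path given as its argument so the result can be reviewed before it replaces /etc/icinga2/zones.conf.

```shell
#!/usr/bin/env bash
# Sketch: write a zones.conf shared by both satellites of one zone.
# Satellite names, addresses and the zone name are hypothetical placeholders.
write_satellite_zones_conf() {
  local target="$1"
  cat > "$target" <<'EOF'
object Endpoint "do-ffm-icinga-01" {
  host = "x.x.x.165"
  port = "5665"
}

object Endpoint "do-ffm-icinga-02" {
  host = "x.x.x.188"
  port = "5665"
}

object Zone "do-ffm-icinga-master" {
  endpoints = [ "do-ffm-icinga-01", "do-ffm-icinga-02" ]
}

// Both satellites are endpoints of the SAME zone; a node skips its own entry.
object Endpoint "satellite1.example.com" {
  host = "192.0.2.11"
}

object Endpoint "satellite2.example.com" {
  host = "192.0.2.12"
}

object Zone "project-satellites" {
  endpoints = [ "satellite1.example.com", "satellite2.example.com" ]
  parent = "do-ffm-icinga-master"
}

object Zone "global-templates" {
  global = true
}

object Zone "director-global" {
  global = true
}
EOF
}

# Usage: write to a scratch file, review it, then copy it into place.
#   write_satellite_zones_conf /tmp/zones.conf.new
```

Because the zone is defined once with both endpoints, the masters and both satellites agree on zone membership, which avoids the mismatch described above where each satellite would end up in its own Director-generated zone.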

Is it really that easy? I will try it and let you know if it works as expected. Many thanks!

It is. After that, you should run the kickstart wizard to inform the Director about these zones and endpoints.
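A minimal sketch of that sequence on a master, assuming bash and a systemd-managed `icinga2` service. `icinga2 daemon -C` validates the configuration and `icingacli director kickstart run` re-runs the Director kickstart from the CLI; the latter expects the API connection settings in /etc/icingaweb2/modules/director/kickstart.ini (otherwise re-run the kickstart wizard in the web UI). The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch, run on a master after editing zones.conf: validate the
# configuration, restart Icinga 2 and let the Director re-import the
# zones and endpoints through its kickstart.
refresh_director_infrastructure() {
  # Check the configuration first; abort instead of restarting into an error.
  icinga2 daemon -C || return 1
  # Assumption: Icinga 2 runs as the systemd unit "icinga2".
  systemctl restart icinga2 || return 1
  # Re-run the Director kickstart (reads kickstart.ini for API credentials).
  icingacli director kickstart run
}
```

Validating before the restart keeps a typo in zones.conf from taking the master down.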

Many thanks! Now everything makes sense and works as expected!

Hmm… something is still not working when I try to add a new client. The node wizard does not succeed and the master log reports:

[2019-08-09 12:44:42 +0000] information/ApiListener: New client connection for identity 'ldue-clickhouse-grafana-ds-01' from [x.x.x.x]:43120 (certificate validation failed: code 18: self signed certificate)
[2019-08-09 12:44:52 +0000] warning/ApiListener: No data received on new API connection for identity 'ldue-clickhouse-grafana-ds-01'. Ensure that the remote endpoints are properly configured in a cluster setup.

First, I created the host in the Icinga2 director. The rendered configuration:

zones.d/do-ffm-icinga-master/hosts.conf

object Host "ldue-clickhouse-grafana-ds-01" {
    import "linux-host"

    address = "x.x.x.x"
    vars.customer = "SSC"
}

zones.d/do-ffm-icinga-master/agent_endpoints.conf

object Endpoint "ldue-clickhouse-grafana-ds-01" {
    log_duration = 0s
}

zones.d/do-ffm-icinga-master/agent_zones.conf

object Zone "ldue-clickhouse-grafana-ds-01" {
    parent = "do-ffm-icinga-master"
    endpoints = [ "ldue-clickhouse-grafana-ds-01" ]
}

zones.conf on the client:

object Endpoint "do-ffm-icinga-01" {
    host = "x.x.x.165"
    port = "5665"
}

object Endpoint "do-ffm-icinga-02" {
    host = "x.x.x.188"
    port = "5665"
}

object Zone "do-ffm-icinga-master" {
    endpoints = ["do-ffm-icinga-01", "do-ffm-icinga-02"]
}

object Endpoint "ldue-clickhouse-grafana-ds-01" {
    host = "x.x.x.x"
    port = "5665"
}

object Zone "ldue-clickhouse-grafana-ds-01" {
    endpoints = ["ldue-clickhouse-grafana-ds-01"]
    parent = "do-ffm-icinga-master"
}

object Zone "global-templates" {
    global = true
}

object Zone "director-global" {
    global = true
}

Next step is to set up the PKI (the PKI on the master is fine and we have more than 70 hosts connected).
So I run on the client:

icinga2 pki new-cert \
  --cn ldue-clickhouse-grafana-ds-01 \
  --key /var/lib/icinga2/certs/ldue-clickhouse-grafana-ds-01.key \
  --csr /var/lib/icinga2/certs/ldue-clickhouse-grafana-ds-01.csr \
  --cert /var/lib/icinga2/certs/ldue-clickhouse-grafana-ds-01.crt

information/base: Writing private key to '/var/lib/icinga2/certs/ldue-clickhouse-grafana-ds-01.key'.
information/base: Writing X509 certificate to '/var/lib/icinga2/certs/ldue-clickhouse-grafana-ds-01.crt'.
information/base: Writing certificate signing request to '/var/lib/icinga2/certs/ldue-clickhouse-grafana-ds-01.csr'.
icinga2 pki save-cert --host x.x.x.165 --key /var/lib/icinga2/certs/ldue-clickhouse-grafana-ds-01.key --cert /var/lib/icinga2/certs/ldue-clickhouse-grafana-ds-01.crt --trustedcert /var/lib/icinga2/certs/do-ffm-icinga-01.crt
information/cli: Retrieving X.509 certificate for 'x.x.x.165:5665'.

 Subject:     CN = do-ffm-icinga-01
 Issuer:      CN = Icinga CA
 Valid From:  Dec 28 01:10:51 2018 GMT
 Valid Until: Dec 24 01:10:51 2033 GMT
 Fingerprint: 99 D8 5E 49 ED 02 47 82 06 12 9F A9 AD 66 C6 ED F3 96 F9 8E 

***
*** You have to ensure that this certificate actually matches the parent
*** instance's certificate in order to avoid man-in-the-middle attacks.
***

information/pki: Writing certificate to file '/var/lib/icinga2/certs/do-ffm-icinga-01.crt'.
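The warning above asks to verify the fetched certificate out of band. A minimal sketch, assuming bash 4+ and OpenSSL on both machines: print the SHA-1 fingerprint of the master's certificate on the master and of the saved copy on the client, normalize both (Icinga prints space-separated bytes, OpenSSL prints colon-separated bytes) and compare. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Normalize a certificate fingerprint to uppercase hex without separators.
# Accepts "99 D8 5E ..." (icinga2 pki save-cert output) as well as
# "SHA1 Fingerprint=99:D8:5E:..." (openssl x509 -fingerprint output).
normalize_fp() {
  local fp="${1#*=}"   # drop an optional "SHA1 Fingerprint=" prefix
  fp="${fp//:/}"       # remove colons
  fp="${fp// /}"       # remove spaces
  printf '%s\n' "${fp^^}"   # uppercase (bash 4+)
}

# Print the normalized SHA-1 fingerprint of a PEM certificate file.
cert_fp() {
  normalize_fp "$(openssl x509 -noout -fingerprint -sha1 -in "$1")"
}

# On the master:  cert_fp /var/lib/icinga2/certs/do-ffm-icinga-01.crt
# On the client:  cert_fp /var/lib/icinga2/certs/do-ffm-icinga-01.crt  (saved copy)
# Both must print the same value before the certificate is trusted.
```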

Next step is the node setup (the non-interactive CLI counterpart of the node wizard):

icinga2 node setup \
  --cn ldue-clickhouse-grafana-ds-01 \
  --endpoint do-ffm-icinga-01,x.x.x.165,5665 \
  --zone ldue-clickhouse-grafana-ds-01 \
  --parent_zone do-ffm-icinga-master \
  --parent_host x.x.x.165 \
  --trustedcert /var/lib/icinga2/certs/do-ffm-icinga-01.crt \
  --accept-commands \
  --disable-confd \
  --accept-config \
  --ticket c9934524a18744f73c64d41abb19d31e78d3a27f

information/cli: Requesting certificate with ticket 'c9934524a18744f73c64d41abb19d31e78d3a27f'.
information/cli: Verifying parent host connection information: host 'x.x.x.165', port '5665'.
information/cli: Using the following CN (defaults to FQDN): 'ldue-clickhouse-grafana-ds-01'.
information/cli: Backup file '/var/lib/icinga2/certs//ldue-clickhouse-grafana-ds-01.key.orig' already exists. Skipping backup.
information/cli: Backup file '/var/lib/icinga2/certs//ldue-clickhouse-grafana-ds-01.crt.orig' already exists. Skipping backup.
information/base: Writing private key to '/var/lib/icinga2/certs//ldue-clickhouse-grafana-ds-01.key'.
information/base: Writing X509 certificate to '/var/lib/icinga2/certs//ldue-clickhouse-grafana-ds-01.crt'.
information/cli: Verifying trusted certificate file '/var/lib/icinga2/certs/do-ffm-icinga-01.crt'.
information/cli: Requesting a signed certificate from the parent Icinga node.
critical/cli: Could not fetch valid response. Please check the master log.
critical/cli: Failed to fetch signed certificate from parent Icinga node 'x.x.x.165, 5665'. Please try again.

The log on the master shows just two events:

[2019-08-09 13:14:43 +0000] information/ApiListener: New client connection for identity 'ldue-clickhouse-grafana-ds-01' from [x.x.x.x]:43154 (certificate validation failed: code 18: self signed certificate)
[2019-08-09 13:14:53 +0000] warning/ApiListener: No data received on new API connection for identity 'ldue-clickhouse-grafana-ds-01'. Ensure that the remote endpoints are properly configured in a cluster setup.
Context:
	(0) Handling new API client connection

So I came across this thread: CSR auto-signing fails silently if no ticket_salt is set in the ApiListener feature configuration, and I did not have a TicketSalt defined either. So I created one with openssl rand -base64 30 and added it to constants.conf. I did the same for the 2nd master node.
The api-feature is enabled and the configuration looks like this:

/**
 * The API listener is used for distributed monitoring setups.
 */

object ApiListener "api" {
  accept_config = true
  accept_commands = true

  ticket_salt = TicketSalt
}

The constants.conf of that node:

/**
 * This file defines global constants which can be used in
 * the other configuration files.
 */

/* The directory which contains the plugins from the Monitoring Plugins project. */
const PluginDir = "/usr/lib/nagios/plugins"

/* The directory which contains the Manubulon plugins.
 * Check the documentation, chapter "SNMP Manubulon Plugin Check Commands", for details.
 */
const ManubulonPluginDir = "/usr/lib/nagios/plugins"

/* The directory which you use to store additional plugins which ITL provides user contributed command definitions for.
 * Check the documentation, chapter "Plugins Contribution", for details.
 */
const PluginContribDir = "/usr/lib/nagios/plugins"

/* Our local instance name. By default this is the server's hostname as returned by `hostname --fqdn`.
 * This should be the common name from the API certificate.
 */
const NodeName = "do-ffm-icinga-01"

/* Our local zone name. */
const ZoneName = "do-ffm-icinga-master"

/* Secret key for remote node tickets */
const TicketSalt = "2xMm1JgsEQcUjJ/nve5xWPPA47zXIIXKU6boyK4n"
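A minimal sketch of the salt step from this post, assuming bash and the default constants.conf path; the function name is hypothetical. It only appends a TicketSalt if none is defined yet, so running it twice cannot create a second, conflicting constant. The `icinga2 pki ticket --cn` command in the usage comment generates, on the master, the ticket for a given CN from that salt.

```shell
#!/usr/bin/env bash
# Sketch: add a TicketSalt constant to constants.conf only if none exists,
# so running it twice does not create a second (conflicting) definition.
ensure_ticket_salt() {
  local file="${1:-/etc/icinga2/constants.conf}" salt
  if grep -q '^const TicketSalt' "$file"; then
    return 0
  fi
  # Same generator as used in this thread; fall back to /dev/urandom.
  salt=$(openssl rand -base64 30 2>/dev/null || head -c 30 /dev/urandom | base64)
  printf '\n/* Secret key for remote node tickets */\nconst TicketSalt = "%s"\n' "$salt" >> "$file"
}

# After restarting Icinga 2 on the master, a ticket for the client's CN can be
# generated there (it is derived from the CN and TicketSalt):
#   icinga2 pki ticket --cn ldue-clickhouse-grafana-ds-01
```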

Any hint appreciated again. I don’t get why this topic confuses me that much. :confused:

Ok, this is getting really weird… I literally checked everything again in the last 2 hours, and I was able to send the CSR successfully a few times, but only randomly!

Example:
I started a loop out of desperation. The loop:

while true; do sleep 5; icinga2 node setup --cn ldue-clickhouse-grafana-ds-01 --endpoint do-ffm-icinga-01,x.x.x.165,5665 --zone ldue-clickhouse-grafana-ds-01 --parent_zone do-ffm-icinga-master --parent_host x.x.x.165 --trustedcert /var/lib/icinga2/certs/do-ffm-icinga-01.crt --disable-confd --accept-commands --accept-config ; done

So I basically run the node setup every 5 seconds. The 27th execution of the node setup was successful. The 26 before that and the 9 after that failed again.
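A sketch of a less desperate version of that loop, assuming bash: it stops at the first success and gives up after a fixed number of attempts, instead of re-running `node setup` forever (and overwriting a working setup with a failing run). The function name `retry` is hypothetical. This only works around an intermittent failure; it does not explain it.

```shell
#!/usr/bin/env bash
# Sketch: retry a command until it succeeds, at most MAX times, sleeping
# DELAY seconds between attempts. Unlike `while true`, the loop stops
# after the first success and gives up eventually.
retry() {
  local max="$1" delay="$2" attempt
  shift 2
  for (( attempt = 1; attempt <= max; attempt++ )); do
    if "$@"; then
      echo "succeeded on attempt $attempt" >&2
      return 0
    fi
    (( attempt < max )) && sleep "$delay"
  done
  echo "gave up after $max attempts" >&2
  return 1
}

# Usage with the command from the loop above:
#   retry 30 5 icinga2 node setup --cn ldue-clickhouse-grafana-ds-01 \
#     --endpoint do-ffm-icinga-01,x.x.x.165,5665 \
#     --zone ldue-clickhouse-grafana-ds-01 --parent_zone do-ffm-icinga-master \
#     --parent_host x.x.x.165 \
#     --trustedcert /var/lib/icinga2/certs/do-ffm-icinga-01.crt \
#     --disable-confd --accept-commands --accept-config
```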

The error messages, so that you don’t have to read the whole thread again:

# client 
information/cli: Requesting certificate without a ticket.
information/cli: Verifying parent host connection information: host '104.248.241.165', port '5665'.
information/cli: Using the following CN (defaults to FQDN): 'ldue-clickhouse-grafana-ds-01'.
information/cli: Backup file '/var/lib/icinga2/certs//ldue-clickhouse-grafana-ds-01.key.orig' already exists. Skipping backup.
information/cli: Backup file '/var/lib/icinga2/certs//ldue-clickhouse-grafana-ds-01.crt.orig' already exists. Skipping backup.
information/base: Writing private key to '/var/lib/icinga2/certs//ldue-clickhouse-grafana-ds-01.key'.
information/base: Writing X509 certificate to '/var/lib/icinga2/certs//ldue-clickhouse-grafana-ds-01.crt'.
information/cli: Verifying trusted certificate file '/var/lib/icinga2/certs/do-ffm-icinga-01.crt'.
information/cli: Requesting a signed certificate from the parent Icinga node.
critical/cli: Could not fetch valid response. Please check the master log.
critical/cli: Failed to fetch signed certificate from parent Icinga node '104.248.241.165, 5665'. Please try again.
# master
[2019-08-11 21:41:05 +0000] information/ApiListener: New client connection for identity 'ldue-clickhouse-grafana-ds-01' from [142.93.96.49]:57076 (certificate validation failed: code 18: self signed certificate)
[2019-08-11 21:41:15 +0000] warning/ApiListener: No data received on new API connection for identity 'ldue-clickhouse-grafana-ds-01'. Ensure that the remote endpoints are properly configured in a cluster setup.
Context:
        (0) Handling new API client connection

The same behavior is reproducible by just running the icinga2 pki request command.
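To put a number on "randomly", a small sketch, assuming bash: run the failing command N times and report how many runs succeeded. The function name is hypothetical. The `icinga2 pki request` flags in the usage comment follow the Icinga 2 documentation for manual CSR requests as far as I know (check `icinga2 pki request --help` on your version); the ticket is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch: run a command N times and report how many attempts succeeded,
# to quantify an intermittent failure instead of eyeballing a loop.
success_rate() {
  local runs="$1" ok=0 i
  shift
  for (( i = 1; i <= runs; i++ )); do
    if "$@" >/dev/null 2>&1; then
      ok=$((ok + 1))
    fi
  done
  echo "$ok/$runs"
}

# Example with the CSR request alone (ticket is a placeholder):
#   success_rate 20 icinga2 pki request --host x.x.x.165 --port 5665 \
#     --ticket <ticket> \
#     --key /var/lib/icinga2/certs/ldue-clickhouse-grafana-ds-01.key \
#     --cert /var/lib/icinga2/certs/ldue-clickhouse-grafana-ds-01.crt \
#     --trustedcert /var/lib/icinga2/certs/do-ffm-icinga-01.crt \
#     --ca /var/lib/icinga2/certs/ca.crt
```

Comparing the ratio before and after a master upgrade gives a clearer signal than a single successful run.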

So I’m starting to think that this might be related to https://github.com/Icinga/icinga2/issues/6981. @dnsmichi, @mcktr, what do you think? :slight_smile:

I will try the same procedure in another test environment early next week. But any input in the meantime is highly appreciated.

Please try with the 2.11 RC1 on the master, this seems to be related to a lock in the network stack in 2.10.x.
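A quick way to check whether a master already runs 2.11 or later, as a sketch assuming bash and GNU `sort -V`. The function name is hypothetical, and the exact `icinga2 --version` banner format is an assumption, so the usage comment only extracts the first x.y.z number from it.

```shell
#!/usr/bin/env bash
# Sketch: succeed if version A >= version B, using GNU sort -V
# (the smaller version sorts first).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Extract the first x.y.z number from the version banner (format assumed):
#   v=$(icinga2 --version | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1)
#   version_ge "$v" 2.11.0 || echo "master runs $v, older than 2.11"
```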

Ok, not reproducible with 2.11 and my initial questions got answered. Thank you, guys!

Thanks, added to the release feedback list :slight_smile:
