Icinga2 director configuration in HA Master Setup

I am trying to set up two Icinga 2 masters in HA mode.
So far:

  1. I have copied all files under /var/lib/icinga2/ca from the primary master to the secondary master.

  2. I have run the Icinga 2 setup wizard in master mode on both masters to generate certificates.

  3. I have kept the same password in /etc/icinga2/conf.d/api-users.conf on both masters.

  4. In constants.conf I have set const TicketSalt to the same value on the primary and the secondary.

  5. My zones.conf files look like this.
    For Primary

     object Endpoint "ncvdl09.us.corp.net" {
             //Local Server
     }
    
     object Endpoint "ncvdl10.us.corp.net" {
             host = "192.168.1.154"
     }
    
     object Zone "master" {
             endpoints = [ "ncvdl09.us.corp.net", "ncvdl10.us.corp.net" ]
     }
    
     object Zone "global-templates" {
             global = true
     }
    
     object Zone "director-global" {
             global = true
     }
    

For Secondary

object Endpoint "ncvdl10.us.corp.net" {
       // Local Server
}

object Endpoint "ncvdl09.us.corp.net" {
       // Remote Master
}

object Zone "master" {
        endpoints = [ "ncvdl10.us.corp.net", "ncvdl09.us.corp.net" ]
}

object Zone "global-templates" {
        global = true
}

object Zone "director-global" {
        global = true
}

My architecture looks like below

I will have a series of questions as I try to deploy Icinga 2.

Questions:

  1. I installed Director on Master1, which is also the Icinga Web 2 server. While configuring it, what should the endpoint name be? Should I enter the Master1 FQDN? Will that work for Master2 as well? I am a little confused about this concept. Please help me understand it.

  2. While installing the satellite pair in each region, should I also copy the files in /var/lib/icinga2/ca to the satellite pair and replace the ones there?

  3. Given the zones.conf above, can you show with a sample what zones.conf on a satellite should look like? Should I also add the satellite Endpoint objects to zones.conf on the masters?

  4. For agent installation, who issues the PKI ticket: the master or the satellite? I am a little confused about this one.

  5. Is there an agent for AIX?

More questions to come as I move forward.

  1. Use master1, and only that one. If master1 fails, no deployments happen, but the secondary master continues collecting check data and updating the backends. Either restore master1 or, if it is gone for good, change the Director deployment endpoint.
  2. No, never copy the private CA key to any instance other than the primary master. Doing so creates a security risk: an attacker who gains access to your trust store can sign certificates and mount man-in-the-middle attacks even with TLS. To make certificate handling easier, the CLI tools support automated signing during setup. This is inspired by Puppet, if you are familiar with that.
  3. It depends on the connection direction: should the satellites connect to the masters, or vice versa? If both ways work, I'd suggest that the masters set the host attribute for the satellite endpoints and thus initiate the connection. The satellites' configuration then does not need the host attribute for the masters. This also needs some "try it out" configuration to fully understand it.
  4. The ticket needs to be generated on the master holding the Icinga CA key pair and the TicketSalt. This instance then automatically signs the certificate request that carries the ticket sent by the agent. Since 2.8, satellites can forward requests from the agents, so you don't need a direct master-agent connection here. If you prefer not to pre-generate the ticket on the master, you can also send a request without a ticket, but then you must manually approve and sign it on the primary master; see the details about "on-demand signing" in the docs.
  5. You can build Icinga 2 on AIX with the help of the development guidelines, or keep using an alternative method. We generally recommend the SSH-based approach; NRPE is discouraged for security reasons.
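
To illustrate point 3, here is a minimal sketch of the zones.conf pieces involved, assuming the masters initiate the connection. The satellite hostnames, IP addresses, and the zone name "US_Satellite" are borrowed from the satellite configuration posted further down in this thread; treat them as placeholders for your own environment.

```
// --- On each master: add this to the existing zones.conf ---
// The host attribute is set, so the master connects to the satellite.
object Endpoint "nacvdl11.us.corp.net" {
        host = "192.168.1.156"
}

object Endpoint "nacvdl12.us.corp.net" {
        host = "192.168.1.193"
}

object Zone "US_Satellite" {
        endpoints = [ "nacvdl11.us.corp.net", "nacvdl12.us.corp.net" ]
        parent = "master"
}

// --- On each satellite: master endpoints without a host attribute ---
// The satellite then waits for the masters to connect instead of dialing out.
object Endpoint "nacvdl09.us.corp.net" {
}

object Endpoint "nacvdl10.us.corp.net" {
}

object Zone "master" {
        endpoints = [ "nacvdl09.us.corp.net", "nacvdl10.us.corp.net" ]
}
```

The satellite also needs its own Endpoint objects and the "US_Satellite" zone with parent = "master", plus the same global zones as the masters.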

Cheers,
Michael

Thank you for the details. I completed the Director setup.

Regarding point 2: to put two satellites in the same zone, should I copy /var/lib/icinga2/ca from the primary satellite to the secondary satellite?
I am currently getting the error below.

On the secondary satellite:

[2019-10-11 09:14:53 -0700] information/ApiListener: New client connection for identity 'nacvl11.us.corp.net' to [192.168.1.156]:5665 (certificate validation failed: code 18: self signed certificate)

EDIT: Sorry, I do not see /var/lib/icinga2/ca on the satellite server. How do I overcome the error above?

My satellite zones.conf files look like this.

Primary satellite:

/*
 * Generated by Icinga 2 node setup commands
 * on 2019-10-11 08:31:43 -0700
 */

object Endpoint "nacvdl09.us.corp.net" {
        host = "192.168.1.151"
        port = "5665"
}

object Endpoint "nacvdl10.us.corp.net" {
        host = "192.168.1.154"
        port = "5665"
}

object Zone "master" {
        endpoints = [ "nacvdl09.us.corp.net", "nacvdl10.us.corp.net" ]
}

object Endpoint "nacvdl11.us.corp.net" {
}

object Endpoint "nacvdl12.us.corp.net" {
        host = "192.168.1.193"
}

object Zone "US_Satellite" {
        endpoints = [ "nacvdl11.us.corp.net", "nacvdl12.us.corp.net" ]
        parent = "master"
}

object Zone "global-templates" {
        global = true
}

object Zone "director-global" {
        global = true
}

On the secondary satellite, zones.conf looks like this:

/*
 * Generated by Icinga 2 node setup commands
 * on 2019-10-11 08:48:08 -0700
 */

object Endpoint "nacvdl09.us.corp.net" {
        host = "192.168.1.151"
        port = "5665"
}

object Endpoint "nacvdl10.us.corp.net" {
        host = "192.168.1.154"
        port = "5665"
}

object Zone "master" {
        endpoints = [ "nacvdl09.us.corp.net", "nacvdl10.us.corp.net" ]
}

object Endpoint "nacvdl12.us.corp.net" {
}

object Endpoint "nacvdl11.us.corp.net" {
        host = "192.168.1.156"
}

object Zone "US_Satellite" {
        endpoints = [ "nacvdl12.us.corp.net", "nacvdl11.us.corp.net" ]
        parent = "master"
}

object Zone "global-templates" {
        global = true
}

object Zone "director-global" {
        global = true
}

EDIT
OK, I am stuck and need help.
I am not able to set up the two satellites with the zones.conf files above.
The log below is from the secondary satellite nacvdl12.us.corp.net (/var/log/icinga2/icinga2.log):

[2019-10-11 23:13:41 -0700] information/ApiListener: Syncing runtime objects to endpoint 'nacvdl11.us.corp.net'.
[2019-10-11 23:13:41 -0700] information/ApiListener: Finished syncing runtime objects to endpoint 'nacvdl11.us.corp.net'.
[2019-10-11 23:13:41 -0700] information/ApiListener: Finished sending runtime config updates for endpoint 'nacvdl11.us.corp.net' in zone 'US_Satellite'.
[2019-10-11 23:13:51 -0700] information/ApiListener: New client connection for identity 'nacvdl11.us.corp.net' from [192.168.1.156]:50634
[2019-10-11 23:13:51 -0700] information/ApiListener: Sending config updates for endpoint 'nacvdl11.us.corp.net' in zone 'US_Satellite'.
[2019-10-11 23:13:51 -0700] information/ApiListener: Finished sending config file updates for endpoint 'nacvdl11.us.corp.net' in zone 'US_Satellite'.
[2019-10-11 23:13:51 -0700] information/ApiListener: Syncing runtime objects to endpoint 'nacvdl11.us.corp.net'.
[2019-10-11 23:13:51 -0700] information/ApiListener: Finished syncing runtime objects to endpoint 'nacvdl11.us.corp.net'.
[2019-10-11 23:13:51 -0700] information/ApiListener: Finished sending runtime config updates for endpoint 'nacvdl11.us.corp.net' in zone 'US_Satellite'.

But the primary satellite does not seem able to register the secondary satellite:

[2019-10-11 23:18:30 -0700] information/ApiListener: Reconnecting to endpoint 'naacvdl12.us.corp.net' via host '192.168.1.193' and port '5665'
[2019-10-11 23:18:31 -0700] warning/ApiListener: Certificate validation failed for endpoint 'naacvdl12.us.corp.net': code 18: self signed certificate
[2019-10-11 23:18:31 -0700] information/ApiListener: New client connection for identity 'naacvdl12.us.corp.net' to [192.168.1.193]:5665 (certificate validation failed: code 18: self signed certificate)
[2019-10-11 23:18:31 -0700] information/ApiListener: Finished reconnecting to endpoint 'naacvdl12.us.corp.net' via host '192.168.1.193' and port '5665'
[2019-10-11 23:18:40 -0700] information/ApiListener: Reconnecting to endpoint 'naacvdl12.us.corp.net' via host '192.168.1.193' and port '5665'
[2019-10-11 23:18:41 -0700] warning/ApiListener: Certificate validation failed for endpoint 'naacvdl12.us.corp.net': code 18: self signed certificate
[2019-10-11 23:18:41 -0700] information/ApiListener: New client connection for identity 'naacvdl12.us.corp.net' to [192.168.1.193]:5665 (certificate validation failed: code 18: self signed certificate)
[2019-10-11 23:18:41 -0700] information/ApiListener: Finished reconnecting to endpoint 'naacvdl12.us.corp.net' via host '192.168.1.193' and port '5665'
[2019-10-11 23:18:50 -0700] information/ApiListener: Reconnecting to endpoint 'naacvdl12.us.corp.net' via host '192.168.1.193' and port '5665'
[2019-10-11 23:18:51 -0700] warning/ApiListener: Certificate validation failed for endpoint 'naacvdl12.us.corp.net': code 18: self signed certificate
[2019-10-11 23:18:51 -0700] information/ApiListener: New client connection for identity 'naacvdl12.us.corp.net' to [192.168.1.193]:5665 (certificate validation failed: code 18: self signed certificate)
[2019-10-11 23:18:51 -0700] information/ApiListener: Finished reconnecting to endpoint 'naacvdl12.us.corp.net' via host '192.168.1.193' and port '5665'

No, as said, never do that. Instead, rely on CSR auto-signing or on-demand signing and the CLI setup commands.

The /var/lib/icinga2/ca directory must exist only on your primary master. All other nodes keep a copy of ca.crt in their /var/lib/icinga2/certs directory when set up via the CLI commands and setup wizards.

Fix this before continuing.

Cheers,
Michael

Oh!! I think I have made a mess of the satellite configuration.
Because /var/lib/icinga2/ca was not there, I ran

icinga2 api setup

That created the /var/lib/icinga2/ca directory with certificate files in it:

# ls -ltr
total 8
-rw------- 1 icinga icinga 3243 Oct 11 10:34 ca.key
-rw-r----- 1 icinga icinga 1720 Oct 11 10:34 ca.crt

I have no clue how to fix that. Should I stop the API on both satellites? Please help.

EDIT
This is what I did on the satellite:

  1. Stopped the API service
  2. Removed (after backup) the directory /var/lib/icinga2/ca
  3. Removed (after backup) all the files under /var/lib/icinga2/certs/
  4. Ran icinga2 node wizard (note: I am doing on-demand CSR signing using icinga2 ca sign)
  5. The API got enabled
  6. Checked that the directory /var/lib/icinga2/ca is still not there
  7. /var/lib/icinga2/certs/ has a newly created .key, .crt, and ca.crt, plus a .orig file
  8. Because node wizard changed zones.conf, manually edited it to add the Endpoint object for satellite 2 and to add satellite 2 to the Zone object "US_Satellite"

I repeated the steps above on satellite 2 after finishing satellite 1. The only difference is in step 8, where on satellite 2 I added satellite 1 as the Endpoint and to the Zone object "US_Satellite".

I am still getting the warning in tail -f /var/log/icinga2/icinga2.log

On the primary satellite, I get this for the secondary satellite:

[2019-10-14 17:44:08 -0700] information/ApiListener: New client connection for identity 'naacvdl12.us.corp.net' from [192.168.1.193]:38590 (certificate validation failed: code 18: self signed certificate)
[2019-10-14 17:44:17 -0700] information/ApiListener: Reconnecting to endpoint 'naacvdl12.us.corp.net' via host '192.168.1.193' and port '5665'
[2019-10-14 17:44:17 -0700] warning/ApiListener: Certificate validation failed for endpoint 'naacvdl12.us.corp.net': code 18: self signed certificate
[2019-10-14 17:44:17 -0700] information/ApiListener: New client connection for identity 'naacvdl12.us.corp.net' to [192.168.1.193]:5665 (certificate validation failed: code 18: self signed certificate)
[2019-10-14 17:44:17 -0700] information/ApiListener: Finished reconnecting to endpoint 'naacvdl12.us.corp.net' via host '192.168.1.193' and port '5665'
[2019-10-14 17:44:18 -0700] information/ApiListener: New client connection for identity 'naacvdl12.us.corp.net' from [192.168.1.193]:38592 (certificate validation failed: code 18: self signed certificate)
[2019-10-14 17:44:27 -0700] information/ApiListener: Reconnecting to endpoint 'naacvdl12.us.corp.net' via host '192.168.1.193' and port '5665'
[2019-10-14 17:44:27 -0700] warning/ApiListener: Certificate validation failed for endpoint 'naacvdl12.us.corp.net': code 18: self signed certificate
[2019-10-14 17:44:27 -0700] information/ApiListener: New client connection for identity 'naacvdl12.us.corp.net' to [192.168.1.193]:5665 (certificate validation failed: code 18: self signed certificate)
[2019-10-14 17:44:27 -0700] information/ApiListener: Finished reconnecting to endpoint 'naacvdl12.us.corp.net' via host '192.168.1.193' and port '5665'
[2019-10-14 17:44:28 -0700] information/ApiListener: New client connection for identity 'naacvdl12.us.corp.net' from [192.168.1.193]:38594 (certificate validation failed: code 18: self signed certificate)

On the secondary satellite, it looks OK:

[2019-10-14 18:03:27 -0700] information/ApiListener: Syncing runtime objects to endpoint 'naacvdl11.us.corp.net'.
[2019-10-14 18:03:27 -0700] information/ApiListener: Finished syncing runtime objects to endpoint 'naacvdl11.us.corp.net'.
[2019-10-14 18:03:27 -0700] information/ApiListener: Finished sending runtime config updates for endpoint 'naacvdl11.us.corp.net' in zone 'US_Satellite'.
[2019-10-14 18:03:37 -0700] information/ApiListener: New client connection for identity 'naacvdl11.us.corp.net' from [192.168.1.156]:60184
[2019-10-14 18:03:37 -0700] information/ApiListener: Sending config updates for endpoint 'naacvdl11.us.corp.net' in zone 'US_Satellite'.
[2019-10-14 18:03:37 -0700] information/ApiListener: Finished sending config file updates for endpoint 'naacvdl11.us.corp.net' in zone 'US_Satellite'.
[2019-10-14 18:03:37 -0700] information/ApiListener: Syncing runtime objects to endpoint 'naacvdl11.us.corp.net'.
[2019-10-14 18:03:37 -0700] information/ApiListener: Finished syncing runtime objects to endpoint 'naacvdl11.us.corp.net'.
[2019-10-14 18:03:37 -0700] information/ApiListener: Finished sending runtime config updates for endpoint 'naacvdl11.us.corp.net' in zone 'US_Satellite'.
[2019-10-14 18:03:47 -0700] information/ApiListener: New client connection for identity 'naacvdl11.us.corp.net' from [192.168.1.156]:60186
[2019-10-14 18:03:47 -0700] information/ApiListener: Sending config updates for endpoint 'naacvdl11.us.corp.net' in zone 'US_Satellite'.
[2019-10-14 18:03:47 -0700] information/ApiListener: Finished sending config file updates for endpoint 'naacvdl11.us.corp.net' in zone 'US_Satellite'.
[2019-10-14 18:03:47 -0700] information/ApiListener: Syncing runtime objects to endpoint 'naacvdl11.us.corp.net'.
[2019-10-14 18:03:47 -0700] information/ApiListener: Finished syncing runtime objects to endpoint 'naacvdl11.us.corp.net'.
[2019-10-14 18:03:47 -0700] information/ApiListener: Finished sending runtime config updates for endpoint 'naacvdl11.us.corp.net' in zone 'US_Satellite'.
[2019-10-14 18:03:57 -0700] information/ApiListener: New client connection for identity 'naacvdl11.us.corp.net' from [192.168.1.156]:60188
[2019-10-14 18:03:57 -0700] information/ApiListener: Sending config updates for endpoint 'naacvdl11.us.corp.net' in zone 'US_Satellite'.
[2019-10-14 18:03:57 -0700] information/ApiListener: Finished sending config file updates for endpoint 'naacvdl11.us.corp.net' in zone 'US_Satellite'.
[2019-10-14 18:03:57 -0700] information/ApiListener: Syncing runtime objects to endpoint 'naacvdl11.us.corp.net'.
[2019-10-14 18:03:57 -0700] information/ApiListener: Finished syncing runtime objects to endpoint 'naacvdl11.us.corp.net'.
[2019-10-14 18:03:57 -0700] information/ApiListener: Finished sending runtime config updates for endpoint 'naacvdl11.us.corp.net' in zone 'US_Satellite'.

Why am I getting the warning on the primary satellite?

As troubleshooting steps:

  1. Copied /var/lib/icinga2/certs/ca.crt from the primary master, secondary master, primary satellite, and secondary satellite.
    Ran diff across all the copies and found that they are identical.
  2. Checked the certificate of the satellite using openssl verify -verbose -CAfile /var/lib/icinga2/certs/ca.crt /var/lib/icinga2/certs/naacvdl12.us.corp.net.crt . It returned OK.

Hello

After a few restarts of Icinga 2 across all masters and satellites, it started working, with no errors in any of the logs. Now I am trying to onboard agents using Director.

Two things in Director are confusing me:

  1. Host Group
  2. Host Template

It looks like you can create a Host Template and assign a Host Group to it. Then, in the Host Template, you can assign the services you want to monitor.

So I created the following:

  1. A Host Group called "Linux_Production"
  2. A Host Template called "US_Linux_Agent"
  3. Two services under the Host Template: US_LZ_CPU and US_LZ_Disk

US_LZ_CPU is executing fine, but US_LZ_Disk seems to be stuck in the pending state. I am not sure why.
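
For orientation, here is a rough sketch of how a host group, a host template, and a service attached to that template relate in plain Icinga 2 configuration. As far as I know, Director renders services attached to a host template as apply rules that match on host.templates, but the actual rendered output may differ. The check commands ("hostalive" and "disk" from the ITL) and the command_endpoint setting are assumptions for illustration, not taken from the Director deployment described above.

```
// The host group is only a grouping label; it carries no checks itself.
object HostGroup "Linux_Production" {
}

// The host template holds shared host settings and the group membership.
template Host "US_Linux_Agent" {
        check_command = "hostalive"     // assumption: ITL host check
        groups = [ "Linux_Production" ]
}

// A service attached to the template becomes an apply rule that matches
// every host importing "US_Linux_Agent".
apply Service "US_LZ_Disk" {
        check_command = "disk"          // assumption: ITL "disk" CheckCommand
        command_endpoint = host.name    // assumption: run the check on the agent
        assign where "US_Linux_Agent" in host.templates
}
```

A service stays in the pending state until its first check result arrives.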