Error with icinga2-agent-kickstart.bash

Running icinga2-agent-kickstart.bash on a client gives me some errors:

root@myclient:~# ./icinga2-agent-kickstart.bash
INFO: This should be a Debian system
check: icinga2 installed - OK: 2.10.5-1
INFO: Using old SSL directory: /etc/icinga2/pki
information/base: Writing private key to '/etc/icinga2/pki/myclient.lan.key'.
information/base: Writing X509 certificate to '/etc/icinga2/pki/myclient.lan.crt'.
information/base: Writing certificate signing request to '/etc/icinga2/pki/myclient.lan.csr'.
information/cli: Retrieving X.509 certificate for 'master.lan:5665'.

 Subject:     CN = master.lan
 Issuer:      CN = Icinga CA
 Valid From:  Jul 15 11:28:06 2019 GMT
 Valid Until: Jul 11 11:28:06 2034 GMT
 Fingerprint: 3F 6F 77 2E AD C0 DC 72 52 67 1C 22 F9 23 E9 6C 39 30 34 D0

***
*** You have to ensure that this certificate actually matches the parent
*** instance's certificate in order to avoid man-in-the-middle attacks.
***

information/pki: Writing certificate to file '/etc/icinga2/pki/trusted-master.crt'.
critical/cli: Could not fetch valid response. Please check the master log.
./icinga2-agent-kickstart.bash: line 186: Could not retrieve final certificate from host master.lan: command not found
Writing config to /etc/icinga2/icinga2.conf
Writing config to /etc/icinga2/zones.conf
Writing config to /etc/icinga2/features-available/api.conf
warning/cli: Feature 'api' already enabled.
[2019-07-18 16:59:32 +0200] information/cli: Icinga application loader (version: r2.10.5-1)
[2019-07-18 16:59:32 +0200] information/cli: Loading configuration file(s).
[2019-07-18 16:59:32 +0200] information/ConfigItem: Committing config item(s).
[2019-07-18 16:59:32 +0200] warning/ApiListener: Attribute 'key_path' for object 'api' of type 'ApiListener' is deprecated and should not be used.
[2019-07-18 16:59:32 +0200] warning/ApiListener: Attribute 'ca_path' for object 'api' of type 'ApiListener' is deprecated and should not be used.
[2019-07-18 16:59:32 +0200] warning/ApiListener: Attribute 'cert_path' for object 'api' of type 'ApiListener' is deprecated and should not be used.
[2019-07-18 16:59:32 +0200] warning/ApiListener: Copying '/etc/icinga2/pki/myclient.lan.crt' certificate file to '/var/lib/icinga2/certs//myclient.lan.crt'
[2019-07-18 16:59:32 +0200] warning/ApiListener: Copying '/etc/icinga2/pki/myclient.lan.key' certificate file to '/var/lib/icinga2/certs//myclient.lan.key'
[2019-07-18 16:59:32 +0200] warning/ApiListener: Please read the upgrading documentation for v2.8: https://icinga.com/docs/icinga2/latest/doc/16-upgrading-icinga-2/
[2019-07-18 16:59:32 +0200] information/ApiListener: My API identity: myclient.lan
[2019-07-18 16:59:32 +0200] critical/SSL: Error loading and verifying locations in ca key file '/var/lib/icinga2/certs//ca.crt': 33558530, "error:02001002:system library:fopen:No such file or directory"
[2019-07-18 16:59:32 +0200] critical/config: Error: Cannot make SSL context for cert path: '/var/lib/icinga2/certs//myclient.lan.crt' key path: '/var/lib/icinga2/certs//myclient.lan.key' ca path: '/var/lib/icinga2/certs//ca.crt'.
Location: in /etc/icinga2/features-enabled/api.conf: 3:1-3:24
/etc/icinga2/features-enabled/api.conf(1): /** Icinga 2 Config - proposed by Icinga Director */
/etc/icinga2/features-enabled/api.conf(2):
/etc/icinga2/features-enabled/api.conf(3): object ApiListener "api" {
                                           ^^^^^^^^^^^^^^^^^^^^^^^^
/etc/icinga2/features-enabled/api.conf(4):   cert_path = SysconfDir + "/icinga2/pki/myclient.lan.crt"
/etc/icinga2/features-enabled/api.conf(5):   key_path = SysconfDir + "/icinga2/pki/myclient.lan.key"

[2019-07-18 16:59:32 +0200] critical/config: 1 error
Please restart icinga2:
  service icinga2 restart

icinga2.log on the master:
information/ApiListener: New client connection for identity 'myhost.lan' from [10.0.0.5]:54462 (certificate validation failed: code 18: self signed certificate)

The first steps via the API seem to work, but “pki request …” is failing!
./icinga2-agent-kickstart.bash: line 186: Could not retrieve final certificate from host master.lan: command not found

btw. what “command not found” here means?

Master:
Director version: 1.6.2
Icinga Web 2 version and modules: 2.6.3
Icinga 2 version: r2.10.5-1
Operating System and version: Ubuntu 18.04
Webserver, PHP versions: apache 2.4.29, php 7.2

Client:
Ubuntu 18.04
Icinga 2 version: r2.10.5-1

Seems the script doesn’t stop on errors. Could not retrieve final certificate is a log message from the script when it detects an error code. Unfortunately the CLI output of pki request is hidden.

Try running pki request manually, with extracted arguments from the script. Then look into the master’s log after the connection attempted. The self signed certificate warning is typically normal since you don’t have a trusted certificate at this point - this is where a pki request is all about.

Cheers,
Michael

“icinga pki …” output on client:

critical/cli: Could not fetch valid response. Please check the master log.

icinga2.log on master:

[2019-07-18 17:43:24 +0200] information/ApiListener: New client connection for identity 'myclient.lan' from [10.0.0.5]:55634 (certificate validation failed: code 18: self signed certificate)
[2019-07-18 17:43:34 +0200] warning/ApiListener: No data received on new API connection for identity 'myclient.lan'. Ensure that the remote endpoints are properly configured in a cluster setup.
    Context:
            (0) Handling new API client connection

Nothing more? Hmmm could indicate a problem with the connection. What happens after you retry the command a few times?

Already tried about 5 times, always the same result.

But yeah, it’s weird because the previous client had the same problem and after a few tries it worked.

The participating hosts are VMs on a Xen host and all in the same subnet.
Maybe thats an Xen networking issue. Any ideas how to debug further?

I’m not sure if it is a network issue, or a problem within the core itself. With 2.10 and before, we had seen hanging connections - this is a topic for the coming 2.11 with a rewritten network stack.

One thing I would try - test the pki request from a different machine. It might also be worthwhile to use node wizard instead to see if that changes the behaviour. This CLI command (or node setup) does all the things with certificate signing, save-cert and pki request done manually in the script.

Also, do a tcpdump and check if there are data transmissions after the initial TLS handshake. It may be the case that pki request doesn’t send any data, or doesn’t receive any from the master.

How’s the performance of your master instance, is it overloaded with other JSON-RPC messages, or other indicators which would explain non-processed requests? There’s details in the “monitoring icinga 2” section in the docs, as well as performance metrics available in the icinga check. Last but not least, /v1/status/ApiListener provides insights as well.

Cheers,
Michael

“icinga2 node wizard” output on client:

...
Add more master/satellite endpoints? [y/N]:
Parent certificate information:

 Subject:     CN = master.lan
 Issuer:      CN = Icinga CA
 Valid From:  Jul 15 11:28:06 2019 GMT
 Valid Until: Jul 11 11:28:06 2034 GMT
 Fingerprint: 3F 6F 77 2E AD C0 DC 72 52 67 1C 22 F9 23 E9 6C 39 30 34 D0

Is this information correct? [y/N]: y

Please specify the request ticket generated on your Icinga 2 master (optional).
 (Hint: # icinga2 pki ticket --cn 'myclient.lan'): 6hcad4882c691d18b32c452c3c2d613fde975kd9
critical/cli: Could not fetch valid response. Please check the master log.
critical/cli: Failed to fetch signed certificate from master '10.0.0.30, 5665'. Please try again.

icinga2.log on master:

[2019-07-19 09:47:25 +0200] information/ApiListener: New client connection for identity 'myclient.lan' from [10.0.0.5]:49308 (certificate validation failed: code 18: self signed certificate)
[2019-07-19 09:47:26 +0200] information/WorkQueue: #7 (IdoMysqlConnection, ido-mysql) items: 1, rate: 3.53333/s (212/min 1098/5min 3306/15min);
[2019-07-19 09:47:35 +0200] warning/ApiListener: No data received on new API connection for identity 'myclient.lan'. Ensure that the remote endpoints are properly configured in a cluster setup.
Context:
        (0) Handling new API client connection

The network flow taken on the master:

1   0.000000  client → master  TCP 74 49320 → 5665 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=2861551345 TSecr=0 WS=128
2   0.000093  master → client  TCP 74 5665 → 49320 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=2636918417 TSecr=2861551345 WS=128
3   0.000436  client → master  TCP 66 49320 → 5665 [ACK] Seq=1 Ack=1 Win=29312 Len=0 TSval=2861551346 TSecr=2636918417
4   0.003013  client → master  TLSv1 379 Client Hello
5   0.003054  master → client  TCP 66 5665 → 49320 [ACK] Seq=1 Ack=314 Win=30080 Len=0 TSval=2636918420 TSecr=2861551348
6   0.017012  master → client  TLSv1.3 3483 Server Hello, Change Cipher Spec, Application Data, Application Data, Application Data, Application Data, Application Data
7   0.017293  client → master  TCP 66 49320 → 5665 [ACK] Seq=314 Ack=1449 Win=32128 Len=0 TSval=2861551363 TSecr=2636918434
8   0.017407  client → master  TCP 66 49320 → 5665 [ACK] Seq=314 Ack=3418 Win=36096 Len=0 TSval=2861551363 TSecr=2636918434
9   0.019467  client → master  TLSv1.3 176 Change Cipher Spec, Application Data, Application Data
10   0.019705  master → client  TLSv1.3 321 Application Data
11   0.060882  client → master  TCP 66 49320 → 5665 [ACK] Seq=424 Ack=3673 Win=39040 Len=0 TSval=2861551406 TSecr=2636918437
12   0.060938  master → client  TLSv1.3 321 Application Data
13   0.061258  client → master  TCP 66 49320 → 5665 [ACK] Seq=424 Ack=3928 Win=41856 Len=0 TSval=2861551407 TSecr=2636918478
14  10.070643  master → client  TLSv1.3 90 Application Data
15  10.070783  master → client  TCP 66 5665 → 49320 [FIN, ACK] Seq=3952 Ack=424 Win=30080 Len=0 TSval=2636928488 TSecr=2861551407
16  10.072579  client → master  TCP 66 49320 → 5665 [ACK] Seq=424 Ack=3952 Win=41856 Len=0 TSval=2861561417 TSecr=2636928487
17  10.072623  client → master  TLSv1.3 90 Application Data
18  10.072671  master → client  TCP 54 5665 → 49320 [RST] Seq=3953 Win=0 Len=0
19  10.072699  client → master  TCP 66 49320 → 5665 [FIN, ACK] Seq=448 Ack=3953 Win=41856 Len=0 TSval=2861561417 TSecr=2636928488
20  10.072717  master → client  TCP 54 5665 → 49320 [RST] Seq=3953 Win=0 Len=0
21  12.283906  client → master  TCP 74 49326 → 5665 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=2861563629 TSecr=0 WS=128
22  12.283979  master → client  TCP 74 5665 → 49326 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=2636930701 TSecr=2861563629 WS=128
23  12.284334  client → master  TCP 66 49326 → 5665 [ACK] Seq=1 Ack=1 Win=29312 Len=0 TSval=2861563629 TSecr=2636930701
24  12.286155  client → master  TLSv1 379 Client Hello
25  12.286193  master → client  TCP 66 5665 → 49326 [ACK] Seq=1 Ack=314 Win=30080 Len=0 TSval=2636930703 TSecr=2861563631
26  12.300316  master → client  TLSv1.3 3483 Server Hello, Change Cipher Spec, Application Data,  Application Data, Application Data, Application Data, Application Data
27  12.300735  client → master  TCP 66 49326 → 5665 [ACK] Seq=314 Ack=1449 Win=32128 Len=0 TSval=2861563646 TSecr=2636930717
28  12.300759  client → master  TCP 66 49326 → 5665 [ACK] Seq=314 Ack=3418 Win=36096 Len=0 TSval=2861563646 TSecr=2636930717
29  12.317449  client → master  TLSv1.3 2016 Change Cipher Spec, Application Data, Application Data, Application Data
30  12.317482  master → client  TCP 66 5665 → 49326 [ACK] Seq=3418 Ack=2264 Win=34048 Len=0 TSval=2636930734 TSecr=2861563662
31  12.318693  master → client  TLSv1.3 1617 Application Data
32  12.318960  master → client  TCP 1514 [TCP segment of a reassembled PDU]
33  12.318996  client → master  TCP 66 49326 → 5665 [ACK] Seq=2264 Ack=4969 Win=39168 Len=0 TSval=2861563664 TSecr=2636930736
34  12.319044  master → client  TLSv1.3 169 Application Data
35  12.319319  client → master  TCP 66 49326 → 5665 [ACK] Seq=2264 Ack=6520 Win=42112 Len=0 TSval=2861563664 TSecr=2636930736
36  22.369929  master → client  TLSv1.3 90 Application Data
37  22.370057  master → client  TCP 66 5665 → 49326 [FIN, ACK] Seq=6544 Ack=2264 Win=34048 Len=0 TSval=2636940787 TSecr=2861563664
38  22.370590  client → master  TLSv1.3 90 Application Data
39  22.370664  master → client  TCP 54 5665 → 49326 [RST] Seq=6545 Win=0 Len=0

Apparently no certificates are exchanged. Maybe thats normal behavoir for that icinga api scenario?
But some “Application Data” is flowing.

Could there be a problem with the salt?

My icinga master has 6 host objects with some tests.
A agent is running only on one client (i am trying to get the second one to work ;).
icing check output:

|api_num_conn_endpoints|1.00|
|---|---|
|num_services_flapping|0.00|
|current_concurrent_checks|1.00|
|remote_check_queue|0.00|
|min_latency|0.00|
|max_latency|0.00|
|avg_latency|0.00|
|min_execution_time|0.00|
|max_execution_time|10.02|
|avg_execution_time|0.45|
|num_services_ok|19.00|
|num_services_warning|0.00|
|num_services_critical|1.00|
|num_services_unknown|8.00|
|num_services_pending|0.00|
|num_services_unreachable|0.00|
|num_services_in_downtime|0.00|
|active_service_checks_15min|418.00|
|num_services_acknowledged|0.00|
|uptime|10,058.01|
|num_hosts_up|6.00|
|num_hosts_down|0.00|
|num_hosts_pending|0.00|
|num_hosts_unreachable|0.00|
|num_hosts_flapping|0.00|
|num_hosts_in_downtime|0.00|
|num_hosts_acknowledged|0.00|
|last_messages_sent|1,563,530,548.56|
|last_messages_received|1,563,530,548.58|
|sum_messages_sent_per_second|0.17|
|sum_messages_received_per_second|0.17|
|sum_bytes_sent_per_second|28.13|
|passive_service_checks_15min|0.00|
|passive_service_checks_5min|0.00|
|api_num_endpoints|4.00|
|idomysqlconnection_ido-mysql_queries_5mins|986.00|
|api_num_http_clients|0.00|
|api_num_json_rpc_anonymous_clients|0.00|
|api_num_json_rpc_relay_queue_item_rate|1.52|
|api_num_json_rpc_relay_queue_items|0.00|
|api_num_json_rpc_sync_queue_item_rate|0.00|
|api_num_json_rpc_sync_queue_items|0.00|
|api_num_json_rpc_work_queue_count|2.00|
|api_num_json_rpc_work_queue_item_rate|0.08|
|api_num_json_rpc_work_queue_items|0.00|
|api_num_not_conn_endpoints|3.00|
|checkercomponent_checker_idle|33.00|
|checkercomponent_checker_pending|1.00|
|idomysqlconnection_ido-mysql_queries_rate|3.30|
|idomysqlconnection_ido-mysql_queries_1min|198.00|
|idomysqlconnection_ido-mysql_queries_15mins|2,957.00|
|active_service_checks_5min|139.00|
|idomysqlconnection_ido-mysql_query_queue_items|0.00|
|idomysqlconnection_ido-mysql_query_queue_item_rate|3.67|
|active_host_checks|0.03|
|passive_host_checks|0.00|
|active_host_checks_1min|2.00|
|passive_host_checks_1min|0.00|
|active_host_checks_5min|6.00|
|passive_host_checks_5min|0.00|
|active_host_checks_15min|18.00|
|passive_host_checks_15min|0.00|
|active_service_checks|0.45|
|passive_service_checks|0.00|
|active_service_checks_1min|27.00|
|passive_service_checks_1min|0.00|
|sum_bytes_received_per_second|54.35|

Tcpdump looks fine, TLS 1.3 is interesting but shouldn’t harm here. The stats also don’t look as if there is a problem. One of the things you could try is to generate a full gdb backtrace from the running process once such oki request was fired. Maybe there’s a deadlock into play here.

Should i write a bug report on github icinga repo?

Anyway, whats an alternative method to get the agent running?

Before doing so, a test with the snapshot packages on the master might be worthwhile. Or you’ll wait until next week when 2.11 RC is out. If that backtrace points to locks (which I can analyse), those problems have already been addressed with 2.11, #7041 to be exact.

here are the backtraces:

https://pastebin.com/eSeUhZh7
https://pastebin.com/sS6MXw3n

What is the Path for the Icinga master CA certificate on the master?
ca.crt and ca.key is in /var/lib/icinga2/ca/
and in /var/lib/icinga2/certs/ is ca.crt too?

The CA key pair is stored in /var/lib/icinga2/ca and should be kept safe and secure. All client setups copy the public CA certificate as well as the client certificate and private key into /var/lib/icinga2/certs, including the master when run via icinga2 node wizard/setup.

The debug traces don’t unveil anything which would need proper attention. I’d suggest testing the 2.11 RC1 released yesterday to see whether the problems originate from the network stack.

Cheers,
Michael