I need some explanation about CSR auto-signing.
I'm running a 3-level cluster (master / satellite / agent).
For the record, I previously had a 2-level setup (master / agent) and didn't have this problem.
I use an Ansible playbook to install and register my agents: after installing the Icinga 2 binaries (Windows or Linux, whatever), my Ansible server makes an API call to the master to get a ticket and then runs the "node setup" command with this ticket.
In the 2-level scenario: no problem.
Like I said, I'm now in a 3-level scenario, so I need to reconfigure my agents to talk to the satellites instead of the masters.
So I modified my Ansible playbook to configure the agent to talk to the satellite BUT still request the ticket from the master.
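Stripped of the Ansible layer, the two key steps boil down to roughly the following sketch (hostnames, credentials, zone names and paths are placeholders, not my actual values):

```
# 1) Ask the master's API for a ticket for the agent's CN
#    (master1.example.org and root:icinga are placeholder host/credentials)
curl -k -s -S -u root:icinga \
  -H 'Accept: application/json' \
  -X POST 'https://master1.example.org:5665/v1/actions/generate-ticket' \
  -d '{ "cn": "agent1.example.org" }'

# 2) On the agent: fetch and trust the satellite's certificate,
#    then register against the satellite using the ticket from step 1
icinga2 pki save-cert \
  --host satellite1.example.org \
  --trustedcert /var/lib/icinga2/certs/trusted-parent.crt

icinga2 node setup \
  --ticket '<ticket from step 1>' \
  --cn agent1.example.org \
  --zone agent1.example.org \
  --endpoint satellite1.example.org,satellite1.example.org,5665 \
  --parent_zone satellite \
  --parent_host satellite1.example.org \
  --trustedcert /var/lib/icinga2/certs/trusted-parent.crt \
  --accept-commands --accept-config --disable-confd

systemctl restart icinga2
```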
Here are the tasks in execution order:
During those steps, I followed the logs on the satellite node and saw several lines like the ones below:
```
information/ApiListener: Reconnecting to endpoint 'agent_fqdn' via host '10.32.6.2' and port '5665'
[2024-12-19 15:21:34 +0100] warning/ApiListener: Certificate validation failed for endpoint 'agent_fqdn': code 18: self-signed certificate
[2024-12-19 15:21:34 +0100] information/ApiListener: New client connection for identity 'agent_fqdn' to [10.32.6.2]:5665 (certificate validation failed: code 18: self-signed certificate)
[2024-12-19 15:21:49 +0100] critical/ApiListener: Timeout while reconnecting to endpoint 'agent_fqdn' via host '10.32.6.2' and port '5665', cancelling attempt
[2024-12-19 15:21:49 +0100] information/ApiListener: Finished reconnecting to endpoint 'agent_fqdn' via host '10.32.6.2' and port '5665'
```
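For reference, the certificate the agent actually presents on port 5665 can be inspected from the satellite (IP taken from the log above):

```
# from the satellite: show the subject/issuer of the certificate the agent presents
openssl s_client -connect 10.32.6.2:5665 -showcerts </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -issuer
```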
So from what I understand, despite the ticket being provided, auto-signing doesn't seem to work…
If I check on the master, there is no CSR waiting for approval.
If I re-run the process, auto-signing still doesn't work, but this time the CSR is waiting on the master…
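For reference, pending certificate requests can be listed on both the satellite and the master with:

```
# list pending certificate requests on this node
icinga2 ca list

# include requests that were already signed or removed
icinga2 ca list --all
```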
Here is an extract of the procedure that works for me (procedure for Windows servers with a Linux master):
Conventions:
SERVER = the server on which the agent is being configured
MASTER = the server running the Icinga Certificate Authority
FQDN = the fully-qualified DNS name of the server on which the agent is being configured
Open an SSH session on MASTER, switch user to root, and copy the following file from MASTER to SERVER, to directory C:\ProgramData\icinga2\var\lib\icinga2\certs:
/var/lib/icinga2/certs/ca.crt
On SERVER, open a PowerShell prompt running as Administrator, change directory to C:\ProgramData\icinga2\var\lib\icinga2\certs, and run the following commands:
Thanks for your answer and sorry for the late reply.
This is approximately what I'm doing on Linux, but without generating the CSR and manually copying files. I need a fully automated process, but sadly mine isn't working 100% of the time…
This is what I use; it works, but not 100% of the time, and I don't understand why.
Just now I had a new server: I installed the agent with Ansible, but there is no CSR waiting for approval on the satellite / master…
If I re-run it several times, it ends up working…
One more precision: my agent requests its certificate from its parent, which is a satellite.
When it works, I can see the CSR on the master; I sign it, restart the service on the agent, and everything is OK.
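Those manual steps correspond to something like this (the fingerprint is a placeholder):

```
# on the master: sign the pending request by its fingerprint
icinga2 ca sign '<fingerprint from icinga2 ca list>'

# on the agent: restart so the signed certificate is picked up
systemctl restart icinga2
```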
Before I started integrating the satellites, I don't remember having this kind of problem… maybe that's the issue?
I don't think it's a problem with their role; sometimes it works and sometimes it doesn't, and I have no explanation.
If I try to run the command manually, the behavior is the same: no request for a new certificate on the satellite / master.
Here, after 3 tries, the agent asked for a certificate and I can see the CSR on the master…
Each time, it's the same playbook that I run…
In a configuration with 2 satellites and 2 masters, it's the master that signs the CSR, but what happens afterwards?
Is it copied to the satellite, which then copies it to the agent? Or does the master itself send it?
I hope it's the satellite, because I don't want to open ports between the master and the agents (that's one of the main reasons for setting up satellites).
I just read the following, which could be your problem:
## 2.14.5 (2025-02-06)

This release fixes a regression introduced in 2.14.4 that caused the `icinga2 node setup`,
`icinga2 node wizard`, and `icinga2 pki request` commands to fail if a certificate was
requested from a node that has to forward the request to another node for signing.
Additionally, it fixes a small bug in the performance data normalization and includes
various documentation improvements.
I tested the new release, but the bug is still there…
3 server enrollments:
First: more than 5 tries
Second: on the first try
Third: 3 tries
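(Just to rule out a version mismatch between the levels, the running version can be checked on each node:)

```
# run on the master, satellites and agents
icinga2 --version
```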
I enabled the debuglog feature and I can see some errors when the agent asks for its certificate.
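For reference, enabling it on a Linux node looks like this; the output then goes to /var/log/icinga2/debug.log:

```
# enable the debug log feature and restart the daemon
icinga2 feature enable debuglog
systemctl restart icinga2
```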
On the agent:
```
[2025-02-18 09:54:31 +0100] debug/JsonRpcConnection: Error while reading JSON-RPC message for identity 'v003lin-zee13.ut.unitel.lan': Error: End of file
Stacktrace:
0# __cxa_throw in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
1# 0x00005AE3823B6241 in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
2# icinga::NetString::ReadStringFromStream(boost::intrusive_ptr<icinga::Shared<icinga::AsioTlsStream> > const&, boost::asio::basic_yield_context<boost::asio::executor_binder<void (*)(), boost::asio::executor> >, long) in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
3# icinga::JsonRpcConnection::HandleIncomingMessages(boost::asio::basic_yield_context<boost::asio::executor_binder<void (*)(), boost::asio::executor> >) in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
4# 0x00005AE3826540FF in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
5# 0x00005AE382691035 in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
6# make_fcontext in /lib/x86_64-linux-gnu/libboost_context.so.1.74.0
[2025-02-18 09:54:31 +0100] warning/JsonRpcConnection: API client disconnected for identity 'v003lin-zee13.ut.unitel.lan'
[2025-02-18 09:54:32 +0100] notice/JsonRpcConnection: Received 'icinga::Hello' message from identity 'v004lin-zee14.ut.unitel.lan'.
[2025-02-18 09:54:32 +0100] debug/JsonRpcConnection: Processed JSON-RPC 'icinga::Hello' message for identity 'v004lin-zee14.ut.unitel.lan' (took total 0ms).
[2025-02-18 09:54:32 +0100] notice/JsonRpcConnection: Error while reading JSON-RPC message for identity 'v004lin-zee14.ut.unitel.lan': Error: End of file
Stacktrace:
0# __cxa_throw in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
1# 0x00005AE3823B6241 in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
2# icinga::NetString::ReadStringFromStream(boost::intrusive_ptr<icinga::Shared<icinga::AsioTlsStream> > const&, boost::asio::basic_yield_context<boost::asio::executor_binder<void (*)(), boost::asio::executor> >, long) in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
3# icinga::JsonRpcConnection::HandleIncomingMessages(boost::asio::basic_yield_context<boost::asio::executor_binder<void (*)(), boost::asio::executor> >) in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
4# 0x00005AE3826540FF in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
5# 0x00005AE382691035 in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
6# make_fcontext in /lib/x86_64-linux-gnu/libboost_context.so.1.74.0
[2025-02-18 09:54:32 +0100] notice/JsonRpcConnection: Disconnecting API client for identity 'v004lin-zee14.ut.unitel.lan'
```
On the satellite:
```
[2025-02-18 10:49:42 +0100] notice/JsonRpcConnection: Error while reading JSON-RPC message for identity 's007pin-aakon03.usb.unitel.lan': Error: stream truncated [asio.ssl.stream:1]
[2025-02-18 10:49:43 +0100] debug/JsonRpcConnection: Error while reading JSON-RPC message for identity 's007pin-aakon03.usb.unitel.lan': Error: Operation canceled [system:125 at /usr/include/boost/asio/detail/reactive_socket_recv_op.hpp:134 in function 'do_complete']
```
The timestamps don't match because the excerpts were taken from two different captures, during two different agent enrollments.
To make the logs easier to read:
v004lin-zee14.ut.unitel.lan and v003lin-zee13.ut.unitel.lan are the satellites
s007pin-aakon03.usb.unitel.lan is the agent
I looked at the GitHub issues but didn't find anything since the last release…
Does s007pin-aakon03.usb.unitel.lan have connectivity to port 5665 of both v004lin-zee14.ut.unitel.lan and v003lin-zee13.ut.unitel.lan?
Or is it like in our setup where the connectivity is unidirectional from satellite to agent? In which case you may want to try the method I gave in my first reply, because it only assumes the following connectivity:
from Ansible to master, satellite and agent (SSH)
bidirectional between master and satellite (port 5665)
from satellite to agent (port 5665)
Note that the process I described can easily be automated using Ansible.
Best regards,
Jean
PS: I am not sure the --key and --cert options should be used in this command:
Yes, the connections between agent <=> satellite and satellite <=> master are open on TCP/5665 in both directions.
In the end, maybe that will be the solution if the internal mechanism doesn't work…
For the "icinga2 pki save-cert" command, it seems you're right: the "--key" and "--cert" options are not listed in the command help… it's weird that they're accepted…
I will try without those options and see if that resolves my problem.
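For reference, a call without those options would look like this (satellite name and output path are placeholders):

```
# fetch the parent (satellite) certificate and store it as the trusted certificate,
# without the --key/--cert options (not listed in the current command help)
icinga2 pki save-cert \
  --host satellite1.example.org --port 5665 \
  --trustedcert /var/lib/icinga2/certs/trusted-parent.crt
```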