Question about CSR Auto-signing

Hello,

I need some explanation about CSR auto-signing.
I’m running a 3-level cluster (master / satellite / agent).
For the record, until now I was running a 2-level setup (master / agent) and I didn’t have this problem.
I use an Ansible playbook to install and register my agents: after installing the Icinga 2 binaries (Windows or Linux, whichever), my Ansible server makes an API call to the master to get a ticket and then runs the “node setup” command with this ticket.
In the 2-level scenario: no problem.
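
For readers unfamiliar with the ticket step: the Icinga 2 REST API has a `generate-ticket` action that returns a ticket for a given CN. The following is a minimal Python sketch of that call, not the playbook used in this thread. The function names, host name, API user and password are placeholders (assumptions), and the response parsing assumes the response shape shown in the Icinga 2 API documentation.

```python
import base64
import json
import urllib.request


def build_ticket_request(master_host, api_user, api_password, agent_cn, port=5665):
    """Build (but do not send) the generate-ticket request for one agent CN."""
    url = f"https://{master_host}:{port}/v1/actions/generate-ticket"
    body = json.dumps({"cn": agent_cn}).encode("utf-8")
    # HTTP basic auth for an ApiUser allowed to call actions/generate-ticket
    raw = f"{api_user}:{api_password}".encode("utf-8")
    credentials = base64.b64encode(raw).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
    )


def extract_ticket(response_text):
    """Return the ticket from the response body (assumed documented shape)."""
    results = json.loads(response_text)["results"]
    return results[0]["ticket"]
```

To send the request, pass it to `urllib.request.urlopen()` together with an `ssl.SSLContext` that trusts the Icinga CA, then feed the decoded body to `extract_ticket()`.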

As I said, I’m now in a 3-level scenario, so I need to reconfigure my agents to talk to the satellites instead of the masters.
So I modified my Ansible playbook to configure the agent to talk to the satellite BUT still ask the master for the ticket.
Here are the tasks in execution order:

  • API call to get ticket with CN=[agent_fqdn]
  • Generate a self-signed certificate with:
    icinga2 pki new-cert
    --cn [agent_fqdn]
    --key /var/lib/icinga2/certs/[agent_fqdn].key
    --cert /var/lib/icinga2/certs/[agent_fqdn].crt
  • Request the parent’s certificate (from the parent satellite):
    icinga2 pki save-cert
    --host [satellite_fqdn]
    --port 5665
    --key /var/lib/icinga2/certs/[agent_fqdn].key
    --cert /var/lib/icinga2/certs/[agent_fqdn].crt
    --trustedcert /var/lib/icinga2/certs/master.crt
  • Node setup
    icinga2 node setup
    --zone [agent_fqdn]
    --endpoint [satellite_fqdn],[satellite_fqdn],5665
    --endpoint [satellite2_fqdn],[satellite_fqdn],5665
    --parent_host [satellite_fqdn],5665
    --parent_zone [satellite_zone]
    --cn agent_fqdn
    --accept-config
    --accept-commands
    --disable-confd
    --trustedcert /var/lib/icinga2/certs/master.crt
    --ticket [ticket_from_master_requested_before]
  • Restart icinga2 service
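
The CLI steps above can be sketched as a small Python helper that builds the three `icinga2` invocations as argument lists. This is an illustration under stated assumptions, not the poster's playbook: the function names are hypothetical, only one satellite endpoint is passed, `master.crt` is the file name used in the post, and `save-cert` is called without `--key`/`--cert` because those options are questioned later in this thread.

```python
import subprocess


def enrollment_commands(agent_fqdn, satellite_fqdn, satellite_zone, ticket,
                        cert_dir="/var/lib/icinga2/certs", port="5665"):
    """Return the three icinga2 CLI invocations as argument lists."""
    key = f"{cert_dir}/{agent_fqdn}.key"
    crt = f"{cert_dir}/{agent_fqdn}.crt"
    trusted = f"{cert_dir}/master.crt"  # file name as used in the post

    # 1. Self-signed certificate for the agent
    new_cert = ["icinga2", "pki", "new-cert",
                "--cn", agent_fqdn, "--key", key, "--cert", crt]
    # 2. Fetch the parent's certificate and store it as the trusted cert
    save_cert = ["icinga2", "pki", "save-cert",
                 "--host", satellite_fqdn, "--port", port,
                 "--trustedcert", trusted]
    # 3. Node setup against the parent satellite, using the master's ticket
    node_setup = ["icinga2", "node", "setup",
                  "--zone", agent_fqdn,
                  "--endpoint", f"{satellite_fqdn},{satellite_fqdn},{port}",
                  "--parent_host", f"{satellite_fqdn},{port}",
                  "--parent_zone", satellite_zone,
                  "--cn", agent_fqdn,
                  "--accept-config", "--accept-commands", "--disable-confd",
                  "--trustedcert", trusted,
                  "--ticket", ticket]
    return [new_cert, save_cert, node_setup]


def run_all(commands):
    """Run each command in order and stop at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Building argument lists instead of one shell string avoids quoting problems and makes it easy to log exactly what was run on each attempt.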

During these steps, I follow the logs on the satellite node and I can see several lines like the ones below:

information/ApiListener: Reconnecting to endpoint 'agent_fqdn' via host '10.32.6.2' and port '5665'
[2024-12-19 15:21:34 +0100] warning/ApiListener: Certificate validation failed for endpoint 'agent_fqdn': code 18: self-signed certificate
[2024-12-19 15:21:34 +0100] information/ApiListener: New client connection for identity 'agent_fqdn' to [10.32.6.2]:5665 (certificate validation failed: code 18: self-signed certificate)
[2024-12-19 15:21:49 +0100] critical/ApiListener: Timeout while reconnecting to endpoint 'agent_fqdn' via host '10.32.6.2' and port '5665', cancelling attempt
[2024-12-19 15:21:49 +0100] information/ApiListener: Finished reconnecting to endpoint 'agent_fqdn' via host '10.32.6.2' and port '5665'

So from what I understand, despite the ticket provided, auto-signing does not seem to work…
If I check on the master, there is no CSR waiting for approval.
If I rerun the process, auto-signing still doesn’t work, but then I do have the CSR waiting on the master…

I’m pretty sure I’m misunderstanding something in this process, despite reading the docs here several times:
https://icinga.com/docs/icinga-2/latest/doc/06-distributed-monitoring/#csr-auto-signing

If anyone has an explanation for me, I would appreciate it.

Thank you in advance

Here is an extract of the procedure that works for me (procedure for Windows servers with Linux master):

Conventions:

SERVER = the server on which the agent is being configured
MASTER = the server running the Icinga Certificate Authority
FQDN = the fully-qualified DNS name of the server on which the agent is being configured

  • Open an SSH session on MASTER, switch user to root, and copy the following file from MASTER to SERVER, to directory C:\ProgramData\icinga2\var\lib\icinga2\certs:

/var/lib/icinga2/certs/ca.crt

  • On SERVER, open a PowerShell prompt run as Admin, change directory to C:\ProgramData\icinga2\var\lib\icinga2\certs and run the following commands:
    • $FQDN = [System.Net.Dns]::Resolve($null).HostName
    • $CERT_DIR = "C:\ProgramData\icinga2\var\lib\icinga2\certs"
    • cd "C:\Program Files\ICINGA2\sbin"
    • .\icinga2.exe pki new-cert --cn $FQDN --key $CERT_DIR\$FQDN.key --csr $CERT_DIR\$FQDN.csr
    • The output should be:

information/base: Writing private key to 'FQDN.key'.
information/base: Writing certificate signing request to 'FQDN.csr'.

  • On MASTER in the SSH session, change directory to any work directory (/tmp for instance), and:
    • Copy there the FQDN.csr file just created on SERVER
    • Run the following command:

icinga2 pki sign-csr --csr ./FQDN.csr --cert ./FQDN.crt

  • Copy FQDN.crt (just created) from MASTER to SERVER, to directory C:\ProgramData\icinga2\var\lib\icinga2\certs
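
Once the manual procedure above is done, the certificate directory on SERVER should hold `ca.crt`, `FQDN.key` and `FQDN.crt`. The following Python sketch checks for those three files before the service is restarted. The function name is hypothetical and the check only tests that the files exist, not that the certificate is valid.

```python
from pathlib import Path


def missing_cert_files(cert_dir, fqdn):
    """Return the names of expected certificate files that are absent from cert_dir."""
    expected = ["ca.crt", f"{fqdn}.key", f"{fqdn}.crt"]
    return [name for name in expected if not (Path(cert_dir) / name).is_file()]
```

For example, `missing_cert_files(r"C:\ProgramData\icinga2\var\lib\icinga2\certs", fqdn)` returns an empty list when all three files are in place.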

Thanks for your answer and sorry for the late reply.
This is approximately what I’m doing on Linux, but without generating a CSR and manually copying files. I need a fully automated process, but sadly mine is not 100% working…

Maybe use or borrow from the Linuxfabrik lfops role on GitHub: lfops/roles/icinga2_agent/tasks/main.yml at 3e91ec650db47071c48af7a1ba48dd247de79934 (Linuxfabrik/lfops).

This is what I use. It works, but not 100% of the time, and I don’t understand why.
Just now, I had a new server and installed the agent with Ansible, but there is no CSR waiting for approval on the satellite / master…
If I rerun it several times, it ends up working…

One more detail: my agent requests its certificate from its parent, which is a satellite.
When it works, I can see the CSR on the master, I sign it, restart the service on the agent, and it’s OK.
Before I started to integrate satellites, I don’t remember having this kind of problem… maybe that is the cause?

If you have problems with the Linuxfabrik roles, you can open issues on GitHub.

I don’t think it’s a problem with their role. Sometimes it works and sometimes it doesn’t, and I don’t have an explanation.
If I run the commands manually, I get the same behavior: no request for a new certificate on the satellite / master.
Here, after 3 tries, the agent asked for a certificate and I could see the CSR on the master…
Each time I run the same playbook…

Is there no way to disable the certificate check?

The task copying the CA certificate from the master to the new agent should take care of the certificate check.

Working after 3 tries sounds like a Wireshark session is in order.

In a configuration with 2 satellites and 2 masters, the master signs the CSR, but what happens after that?
Does it send the certificate to the satellite, which passes it on to the agent? Or does the master deliver it itself?
I hope it’s the satellite, because I don’t want to open a port between master and agent (that’s one of the reasons for setting up satellites).

I just read the following, which could be your problem:


+## 2.14.5 (2025-02-06)
+
+This release fixes a regression introduced in 2.14.4 that caused the `icinga2 node setup`,
+`icinga2 node wizard`, and `icinga2 pki request` commands to fail if a certificate was
+requested from a node that has to forward the request to another node for signing.
+Additionally, it fixes a small bug in the performance data normalization and includes
+various documentation improvements.
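
To check quickly whether a node runs the release named in the changelog entry above, a version string can be compared against 2.14.4. The Python sketch below is an assumption-laden helper: the function names are hypothetical, the input is expected to be just the version token (for example `r2.14.4-1`, an assumed format), and it only covers what the quoted changelog states, namely that the regression was introduced in 2.14.4 and fixed in 2.14.5.

```python
import re


def parse_version(text):
    """Extract (major, minor, patch) from a token such as 'r2.14.4-1' or 'v2.14.5'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def has_forwarding_regression(version_text):
    """True only for 2.14.4, the release the changelog names as introducing the regression."""
    return parse_version(version_text) == (2, 14, 4)
```

The check has to be run for every node in the signing path, since the changelog describes a failure when a request is sent to a node that forwards it to another node.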

Ah yes, it looks like my issue, you’re right!
I’ll test it tomorrow!

After reading it, it seems to be exactly the problem I have.
Waiting for the new release… Thank you for the pointer.


I tested the new release but the bug is still there…
Enrollment of 3 servers:

  • First: more than 5 tries
  • Second: on the first try
  • Third: 3 tries

I enabled the debuglog feature and I can see some errors when the agent asks for a certificate:

On the agent:

[2025-02-18 09:54:31 +0100] debug/JsonRpcConnection: Error while reading JSON-RPC message for identity 'v003lin-zee13.ut.unitel.lan': Error: End of file

Stacktrace:
0# __cxa_throw in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
1# 0x00005AE3823B6241 in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
2# icinga::NetString::ReadStringFromStream(boost::intrusive_ptr<icinga::Shared<icinga::AsioTlsStream> > const&, boost::asio::basic_yield_context<boost::asio::executor_binder<void (*)(), boost::asio::executor> >, long) in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
3# icinga::JsonRpcConnection::HandleIncomingMessages(boost::asio::basic_yield_context<boost::asio::executor_binder<void (*)(), boost::asio::executor> >) in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
4# 0x00005AE3826540FF in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
5# 0x00005AE382691035 in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
6# make_fcontext in /lib/x86_64-linux-gnu/libboost_context.so.1.74.0
[2025-02-18 09:54:31 +0100] warning/JsonRpcConnection: API client disconnected for identity 'v003lin-zee13.ut.unitel.lan'
[2025-02-18 09:54:32 +0100] notice/JsonRpcConnection: Received 'icinga::Hello' message from identity 'v004lin-zee14.ut.unitel.lan'.
[2025-02-18 09:54:32 +0100] debug/JsonRpcConnection: Processed JSON-RPC 'icinga::Hello' message for identity 'v004lin-zee14.ut.unitel.lan' (took total 0ms).
[2025-02-18 09:54:32 +0100] notice/JsonRpcConnection: Error while reading JSON-RPC message for identity 'v004lin-zee14.ut.unitel.lan': Error: End of file

Stacktrace:
0# __cxa_throw in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
1# 0x00005AE3823B6241 in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
2# icinga::NetString::ReadStringFromStream(boost::intrusive_ptr<icinga::Shared<icinga::AsioTlsStream> > const&, boost::asio::basic_yield_context<boost::asio::executor_binder<void (*)(), boost::asio::executor> >, long) in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
3# icinga::JsonRpcConnection::HandleIncomingMessages(boost::asio::basic_yield_context<boost::asio::executor_binder<void (*)(), boost::asio::executor> >) in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
4# 0x00005AE3826540FF in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
5# 0x00005AE382691035 in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
6# make_fcontext in /lib/x86_64-linux-gnu/libboost_context.so.1.74.0
[2025-02-18 09:54:32 +0100] notice/JsonRpcConnection: Disconnecting API client for identity 'v004lin-zee14.ut.unitel.lan'

On the satellite:

[2025-02-18 10:49:42 +0100] notice/JsonRpcConnection: Error while reading JSON-RPC message for identity 's007pin-aakon03.usb.unitel.lan': Error: stream truncated [asio.ssl.stream:1]
[2025-02-18 10:49:43 +0100] debug/JsonRpcConnection: Error while reading JSON-RPC message for identity 's007pin-aakon03.usb.unitel.lan': Error: Operation canceled [system:125 at /usr/include/boost/asio/detail/reactive_socket_recv_op.hpp:134 in function 'do_complete']

The timestamps differ because the extracts come from 2 different runs, for 2 different agent enrollments.

To help read the logs above:
v004lin-zee14.ut.unitel.lan and v003lin-zee13.ut.unitel.lan are the satellites
s007pin-aakon03.usb.unitel.lan is the agent
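
When comparing debug logs from several nodes, it can help to group the JSON-RPC read errors by peer identity. The Python sketch below does that for lines in the format shown in the extracts above. The regular expression is derived from those extracts only, and the function name is hypothetical.

```python
import re
from collections import defaultdict

# Matches lines such as:
# [2025-02-18 09:54:31 +0100] debug/JsonRpcConnection: Error while reading
# JSON-RPC message for identity 'host': Error: End of file
_ERROR_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<severity>\w+)/JsonRpcConnection: "
    r"Error while reading JSON-RPC message for identity '(?P<identity>[^']+)': "
    r"Error: (?P<error>.*)$"
)


def read_errors_by_identity(lines):
    """Map each peer identity to a list of (timestamp, error text) tuples."""
    errors = defaultdict(list)
    for line in lines:
        match = _ERROR_RE.match(line.strip())
        if match:
            errors[match["identity"]].append((match["ts"], match["error"]))
    return dict(errors)
```

Feeding it the agent and satellite logs from the same enrollment attempt makes it easier to line up which side closed the connection first.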

I looked through the GitHub issues but I didn’t find anything since the last release…

In your place, I would now escalate from this forum post to a GitHub issue.

Hi,

Does s007pin-aakon03.usb.unitel.lan have connectivity to port 5665 of both v004lin-zee14.ut.unitel.lan and v003lin-zee13.ut.unitel.lan?

Or is it like in our setup where the connectivity is unidirectional from satellite to agent? In which case you may want to try the method I gave in my first reply, because it only assumes the following connectivity:

  • from ansible to master, satellite and agent (ssh)
  • bidirectional between master and satellite (port 5665)
  • from satellite to agent (port 5665)
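
The connectivity assumptions listed above can be checked with a plain TCP connect test. The Python sketch below uses a hypothetical helper name and only verifies that the port accepts a connection. It says nothing about TLS or certificates.

```python
import socket


def can_connect(host, port=5665, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures
        return False
```

Run it from the node that initiates the connection (for example from the agent towards each satellite) because firewalls may allow only one direction.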

Note that the process I described can easily be automated using Ansible.

Best regards,

Jean

PS: I am not sure the --key and --cert options should be used in this command:

icinga2 pki save-cert
--host [satellite_fqdn]
--port 5665
--key /var/lib/icinga2/certs/[agent_fqdn].key
--cert /var/lib/icinga2/certs/[agent_fqdn].crt
--trustedcert /var/lib/icinga2/certs/master.crt

Hi,

Yes, the network between agent <=> satellite and satellite <=> master is open on TCP/5665 in both directions.

In the end, maybe that will be the solution if the internal mechanisms don’t work…

For the “icinga2 pki save-cert” command it seems you’re right: the “key” and “cert” options are not listed in the command help… it’s weird that they are accepted…
I will try without those options to see if it resolves my problem.

For the record, I opened an issue on GitHub: “Error while reading JSON-RPC during agent enrolment” (Issue #10351, Icinga/icinga2).
