After Upgrade Monitoring Module Command Transport not Valid

Hello Icinga Community,
I hope you are all virus free and feeling well. Has anyone experienced a problem with the monitoring module Command Transport backend not working after an upgrade to Icinga2 version 2.11.3?

I am in the middle of upgrading my production Icinga2 installation and have run into a problem: validating the Command Transport fails with the error “Failed to successfully validate the configuration: Couldn’t connect to the Icinga 2 API: NSS: client certificate not found (nickname not specified)”. See picture below.

My staging environment did not show this problem during the upgrade.

When I run the cURL command from the command line, I get the same error.

Has anyone experienced this problem, or can someone please help with troubleshooting it? I have reviewed many cURL troubleshooting topics on the web but have not found a solution yet.

I have an Icinga2 HA environment:
Master 1 = Icinga2 (2.11.3) Icingaweb2 (2.7.3) OS (Red Hat 7.7)
Master 2 = Icinga2 (2.10.5) Icingaweb2 (2.7.1) OS (Red Hat 7.7)
1500+ hosts
10000+ services

Thanks in advance for your feedback. :slight_smile:

Hi,

here are some ideas from my side:

Hello Stevie Sy,
Thanks for providing some help. The Icinga application was working great on version 2.10.5 before the upgrade. The only change was upgrading to 2.11.3. My test Icinga setup does not have this problem, but it is a much smaller setup. Answers to your questions are below.

/var/lib/icinga2/certs (Master_01 Server)

-rwxr-x---. 1 icinga icinga 1720 Feb 22  2018 ca.crt
-rw-r--r--. 1 root   root   1720 Feb 22  2018 ca.crt.orig
-rwxr-x---. 1 icinga icinga   40 Feb 22  2018 ticket
-rwxr-x---. 1 icinga icinga 1814 Feb 13  2018 Master_01_hostname.crt
-rwxr-x---. 1 icinga icinga 1704 Feb 13  2018 Master_01_hostname.csr
-rwxr-x---. 1 icinga icinga 3243 Feb 13  2018 Master_01_hostname.key

/var/lib/icinga2/ca (Master_01 Server)

-rw-r--r--. 1 icinga icinga 1720 Feb 13  2018 ca.crt
-rw-------. 1 icinga icinga 3243 Feb 13  2018 ca.key

I have reviewed the Certificate Troubleshooting section and I have verified they are the correct certificates.

SELinux is set to “Permissive” mode. I am not a Red Hat Linux expert, so I don’t know if this would cause any problems. From my understanding, in Permissive mode the SELinux policies are not enforced. Any feedback on this would be awesome.

All API users return errors when I run a basic cURL connect command from the command line.

/etc/icinga2/conf.my/api-users.conf (see below why using conf.my directory)

object ApiUser "icingaweb2" {
  password = "my_password"
  permissions = [ "status/query", "actions/*", "objects/modify/*", "objects/query/*" ]
}

No, Icinga did not load the default configuration after the upgrade. I have commented out the conf.d include in icinga2.conf, created a new directory called “conf.my”, and included that instead. I did this because after every Icinga upgrade I had to re-edit the files in the conf.d directory that the upgrade had changed.

/etc/icinga2/icinga2.conf

/**
 * Although in theory you could define all your objects in this file
 * the preferred way is to create separate directories and files in the conf.d
 * directory. Each of these files must have the file extension ".conf".
 */
//include_recursive "conf.d"
include_recursive "conf.my

OK, and what do the logs say when your icingaweb2 API user tries to connect to the Icinga core?

Double quotes at the end of the line are missing, but I’d assume it’s just a copy & paste mistake:

include_recursive "conf.my

What about trying (at least for a test) with full permissions:

permissions = [ "*" ]

Hi,

I would first test on the master whether the configuration is loaded correctly by running icinga2 object list --type apiuser and checking that Icinga knows the API user. Don’t look into anything else until Icinga knows the user.

Regards,
Carsten

Hello Carsten,
Thanks for the reply. Below are the results of the command.

[root@Master_01_hostname icinga]# icinga2 object list --type apiuser
Object 'icingaweb2' of type 'ApiUser':
  % declared in '/etc/icinga2/conf.my/api-users.conf', lines 13:1-13:27
  * __name = "icingaweb2"
  * client_cn = ""
  * name = "icingaweb2"
  * package = "_etc"
  * password_hash = ""
  * permissions = [ "status/query", "actions/*", "objects/modify/*", "objects/query/*" ]
    % = modified in '/etc/icinga2/conf.my/api-users.conf', lines 15:3-15:86
  * source_location
    * first_column = 1
    * first_line = 13
    * last_column = 27
    * last_line = 13
    * path = "/etc/icinga2/conf.my/api-users.conf"
  * templates = [ "icingaweb2" ]
    % = modified in '/etc/icinga2/conf.my/api-users.conf', lines 13:1-13:27
  * type = "ApiUser"
  * zone = ""

Hello Roland,
Thanks for your reply. Yes, the missing closing double quote is a copy & paste mistake. I have tried to connect with cURL using the root user at the command line, but I still get the same result: “curl: (52) NSS: client certificate not found (nickname not specified)”

[root@Master_01_hostname icinga]# curl -k -u root:my_root_password 'https://localhost:5665/v1' -v
* About to connect() to localhost port 5665 (#0)
*   Trying ::1...
* Connection refused
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 5665 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* NSS: client certificate not found (nickname not specified)
* SSL connection using TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
* Server certificate:
*       subject: CN=Master_01_hostname
*       start date: Feb 13 20:52:44 2018 GMT
*       expire date: Feb 09 20:52:44 2033 GMT
*       common name: Master_01_hostname
*       issuer: CN=Icinga CA
* Server auth using Basic with user 'root'
> GET /v1 HTTP/1.1
> Authorization: Basic XXXXXXXXXXXXXXXXXXXXXXXXXXX
> User-Agent: curl/7.29.0
> Host: localhost:5665
> Accept: */*
>
* Empty reply from server
* Connection #0 to host localhost left intact
curl: (52) NSS: client certificate not found (nickname not specified)
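
One note on reading this output, offered as a rough sketch rather than a diagnosis: the number in parentheses is curl's exit code, and 52 is documented as an empty reply from the server. In the verbose trace the NSS line is logged before the TLS connection is reported as established, so it may be a leftover message rather than the actual cause. The helper below is hypothetical (the name `explain_curl_exit` is not part of curl) and covers only a handful of exit codes from the curl manual page.

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate a few curl exit codes into plain text.
# Codes and meanings follow the EXIT CODES section of the curl man page.
explain_curl_exit() {
  case "$1" in
    0)  echo "success" ;;
    6)  echo "could not resolve host" ;;
    7)  echo "failed to connect to host" ;;
    28) echo "operation timed out" ;;
    35) echo "TLS/SSL handshake failed" ;;
    52) echo "empty reply from server" ;;
    60) echo "server certificate could not be verified" ;;
    *)  echo "see the EXIT CODES section of 'man curl'" ;;
  esac
}

# Usage (not run here): capture the exit code of the failing request.
#   curl -k -s -o /dev/null -u root:my_root_password 'https://localhost:5665/v1'
#   explain_curl_exit "$?"
```

If the code is 52, the TLS handshake has completed and the Icinga 2 API closed the connection without answering, which would point at the server side rather than at a missing client certificate.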

I think the problem is the OS. Do you use RHEL?

I found this thread, maybe it helps: https://github.com/curl/curl/issues/540

If nothing helps, open an issue with all the details needed.

Hello Stevie Sy,
Thanks for your reply. I enabled the debuglog feature on the Icinga2 core and ran tail on debug.log. I grepped the output for “icingaweb2” and got no matches. I also grepped for “HttpServerConnection”, looking for any HTTP request to the Icinga2 core, and nothing appeared.

Here are the icingaweb2 logs (debug mode). You can see the commands here.

2020-04-03T19:31:50-04:00 - ERROR - Icinga\Module\Monitoring\Exception\CommandTransportException in /usr/share/icingaweb2/modules/monitoring/library/Monitoring/Command/Transport/CommandTransport.php:137 with message: icinga2: NSS: client certificate not found (nickname not specified).
#0 /usr/share/icingaweb2/modules/monitoring/application/forms/Command/Object/CheckNowCommandForm.php(76): Icinga\Module\Monitoring\Command\Transport\CommandTransport->send(Object(Icinga\Module\Monitoring\Command\Object\ScheduleServiceCheckCommand))
#1 /usr/share/php/Icinga/Web/Form.php(1171): Icinga\Module\Monitoring\Forms\Command\Object\CheckNowCommandForm->onSuccess()
#2 /usr/share/icingaweb2/modules/monitoring/library/Monitoring/Web/Controller/MonitoredObjectController.php(282): Icinga\Web\Form->handleRequest()
#3 /usr/share/icingaweb2/modules/monitoring/library/Monitoring/Web/Controller/MonitoredObjectController.php(60): Icinga\Module\Monitoring\Web\Controller\MonitoredObjectController->setupQuickActionForms()
#4 /usr/share/icingaweb2/modules/monitoring/application/controllers/ServiceController.php(72): Icinga\Module\Monitoring\Web\Controller\MonitoredObjectController->showAction()
#5 /usr/share/icingaweb2/library/vendor/Zend/Controller/Action.php(507): Icinga\Module\Monitoring\Controllers\ServiceController->showAction()
#6 /usr/share/php/Icinga/Web/Controller/Dispatcher.php(76): Zend_Controller_Action->dispatch(String)
#7 /usr/share/icingaweb2/library/vendor/Zend/Controller/Front.php(937): Icinga\Web\Controller\Dispatcher->dispatch(Object(Icinga\Web\Request), Object(Icinga\Web\Response))
#8 /usr/share/php/Icinga/Application/Web.php(300): Zend_Controller_Front->dispatch(Object(Icinga\Web\Request), Object(Icinga\Web\Response))
#9 /usr/share/php/Icinga/Application/webrouter.php(99): Icinga\Application\Web->dispatch()
#10 /usr/share/icingaweb2/public/index.php(4): require_once(String)
#11 {main}
2020-04-03T19:33:30-04:00 - DEBUG - Sending Icinga command "actions/reschedule-check" to the API "localhost:5665"
2020-04-03T19:33:30-04:00 - DEBUG - Executing curl -s -X POST -H 'Accept: application/json' -k -u 'icingaweb2':'my_password' -d '{"next_check":1585956810,"force":true,"service":"Server_hostname!Version_Icinga_2_Client"}' 'https://localhost:5665/v1/actions/reschedule-check'
2020-04-03T19:34:00-04:00 - ERROR - Icinga\Module\Monitoring\Exception\CurlException in /usr/share/icingaweb2/modules/monitoring/library/Monitoring/Web/Rest/RestRequest.php:291 with message: NSS: client certificate not found (nickname not specified)

The crazy thing is that it started working yesterday. I noticed that the directory /etc/icinga2/conf.my was owned by root. I changed the ownership to user=icinga and group=icinga, restarted icinga2, and everything started working. I thought all was fixed! I let it sit overnight to confirm, and BAM: when I checked this morning, the curl connection error was happening again. I did not change anything overnight. Could this be a bug?

My current configuration is Master 1 = ver 2.11.3, Master 2 = ver 2.10.5. The documentation says all clients need to be upgraded to 2.11.x. I am working on upgrading all the clients, but with 600+ Windows clients it takes some time. All Windows clients are upgraded except for 50+ servers. Could this be causing the problem? It is crazy how everything was working yesterday. When I ended my day yesterday, the only active node was Master 1 (2.11.3); I had disabled Master 2 to test and confirm all was working. I checked a few hours later and all was still working. Why did it fail overnight?

Where is the bang-your-head-against-the-wall emoji?

Thanks @stevie-sy & @rsx & @anon66228339 in advance for your feedback
Alex

What happens if you stop master2? Is it working as expected?
Maybe it’s an idea to upgrade your master and satellite servers first and your agents after that. We did so and had no problems. Maybe others have had a different experience.

For me it’s interesting that you don’t find anything strange in the logs, not even an authentication error for the icingaweb2 user.
Or it’s really a problem with curl on RHEL, as @anon66228339 found.

Hello All,
I wanted to provide a new update on this issue. I believe this is a bug in the software, but I wanted to see if anyone has any other suggestions before I submit it as a bug. Here is what I have done since my last update.

  1. Restored Master1 to version 2.10.5 from a VMware backup.
  2. After the restore, I tested Master1 and the monitoring module command transport validated without a problem. I also tested curl manually on the command line with all my icinga2 API users and the curl command worked correctly.
  3. Stopped the icinga2 service on Master1 and waited for the fail-over to complete. Master2 is now the active node for all hosts and services.
  4. Ran the update command on Master1 to update to version 2.11.3 (yum update icinga2).
  5. Removed the local config sync files on Master1:
    ran command on Master1 ( rm -rf /var/lib/icinga2/api/zones/* )
    ran command on Master1 ( rm -rf /var/lib/icinga2/api/zones-stage/* )
  6. Started the icinga2 service on Master1. Master2 is still the active node, but checks and notifications are balanced between both nodes.
  7. Tested the monitoring module command transport on Master1 and it validated without a problem. I also tested curl manually on the command line with all my icinga2 API users and the curl command worked correctly.
  8. Stopped the icinga2 service on Master2 and waited for the fail-over. Master1 is now the active node for all hosts and services.
  9. Tested the monitoring module command transport on Master1 and it validated without a problem. I also tested curl manually on the command line with all my icinga2 API users and the curl command worked correctly.
  10. Master1 was now working correctly on version 2.11.3.
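
The Master1 part of the steps above (3 to 6) could be scripted roughly as follows. This is only a sketch: the `run` wrapper and the `DRY_RUN` variable are made up for illustration, and `systemctl` is an assumption about how the service is stopped and started on Red Hat 7. By default it only prints the commands and changes nothing.

```shell
#!/usr/bin/env bash
# Sketch of steps 3-6 on Master1. DRY_RUN=1 (the default) only prints commands.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper: print the command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

upgrade_master1() {
  run systemctl stop icinga2                     # step 3: stop Master1 (then wait for fail-over)
  run yum update icinga2                         # step 4: upgrade the package
  run rm -rf /var/lib/icinga2/api/zones/*        # step 5: remove synced zone config
  run rm -rf /var/lib/icinga2/api/zones-stage/*  # step 5: remove staged zone config
  run systemctl start icinga2                    # step 6: start Master1 again
}

upgrade_master1
```

The script does not wait for the fail-over between steps 3 and 4, and steps 7 to 9 (validating the command transport and testing curl) still have to be done by hand.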

I completed the above steps around 4/10/2020, 16:00 EST. I checked the status a couple of hours later and all was still working!!

The next morning, I checked the status of the Icinga server and Master1 was again having problems validating the command transport in the monitoring module. Over 4k of the services are now marked as “Unknown”.

I started the icinga2 service on Master2 (2.10.5) and stopped the icinga2 service on Master1 (2.11.3). The Master2 node is now the active node. All services in an “Unknown” state slowly started to recover.

I reviewed icinga2.log on Master1, and around 04-10-2020, 20:08 EST, I start seeing ‘warning/JsonRpcConnection’ messages for many hosts. This is the time when all the services started reporting ‘Unknown’.

 [2020-04-10 20:08:31 -0400] warning/JsonRpcConnection: API client disconnected for identity 'hostname'

So after around 4 hours of Master1 working on version 2.11.3, ‘warning/JsonRpcConnection’ messages start appearing in icinga2.log, and this is when the services started going into an ‘Unknown’ state.

Has anyone experienced this problem?

I have attached the icinga2.log file from Master1 for anyone to review. I have filtered the log down to messages from one client (hostname = icinga2_client01, version = 2.11.2) that was having problems. I have also posted the logs from that client. The client is in a different time zone than Master1; please add 6 hours to the Master1 timestamps to find matching log messages.
Master1 = 04-10-2020, 20:10
Client01 = 04-11-2020, 02:10
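
To line up the two logs, a timestamp from Master1 can be converted to the client's local time. A minimal sketch, assuming GNU `date` and assuming the client sits at UTC+2 (derived from the +6 hours relative to the -0400 offset shown in the Master1 log); `to_client_time` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Convert a Master1 log timestamp (UTC-4) to the client's local time.
# Assumption: the client is at UTC+2, i.e. 6 hours ahead of Master1.
# In a POSIX TZ string the sign is inverted, so "XXX-2" means UTC+2.
to_client_time() {
  TZ='XXX-2' date -d "$1" '+%Y-%m-%d %H:%M'
}

to_client_time '2020-04-10 20:10:00 -0400'   # prints 2020-04-11 02:10
```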

Thanks in advance for your help.
Alex

Master1_logs_with_icinga2_client01_only.log (27.3 KB)
icinga2_client01.log (482.2 KB)

Icinga2 2.11+ now enforces TLS 1.2; maybe that is your problem. You should start your master/client with the debug log enabled and check whether this is the case.

Regards,
Carsten

Hello @anon66228339,
I started my Master1 in debug mode. The log grows very quickly, so I only ran debug mode for 15 minutes; in that time the log reached almost 2 GB. I searched for TLS and other keywords but did not find any TLS version listed in the debug log. Are you able to find the TLS version in your debug logs?

I did follow the Troubleshooting for TLS handshake and can confirm TLS version 1.2 is available.

The weird thing is that when I run the OpenSSL command below against Master01 (2.11.3), the connection never closes. The command prompt just sits there; I waited for over 5 minutes and never got a close message. If I run the same command against Master02 (2.10.5), the connection closes after 10 seconds.

[root@icingaclient icinga]# openssl s_client -connect master01:5665
CONNECTED(00000003)
depth=1 CN = Icinga CA
verify error:num=19:self signed certificate in certificate chain
---
Certificate chain
 0 s:/CN=master01_hostname
   i:/CN=Icinga CA
 1 s:/CN=Icinga CA
   i:/CN=Icinga CA
---
Server certificate
-----BEGIN CERTIFICATE-----
MIIFDTCCAvWgAwIBAgIVAM4XMj6qDmGOCXYxy8S63GeVqwZHMA0GCSqGSIb3DQEB
.....
GW6jg3SG2a5rtC6p5SrJOC73IWlELPAAPB/dyJKynm6vYj9I5EUdO2QNy4WtW8zR
8w==
-----END CERTIFICATE-----
subject=/CN=master01_hostname
issuer=/CN=Icinga CA
---
Acceptable client certificate CA names
/CN=Icinga CA
Client Certificate Types: RSA sign, DSA sign, ECDSA sign
Requested Signature Algorithms: RSA+SHA512:DSA+SHA512:ECDSA+SHA512:RSA+SHA384:DSA+SHA384:ECDSA+SHA384:RSA+SHA256:DSA+SHA256:ECDSA+SHA256:RSA+SHA224:DSA+SHA224:ECDSA+SHA224:RSA+SHA1:DSA+SHA1:ECDSA+SHA1
Shared Requested Signature Algorithms: RSA+SHA512:DSA+SHA512:ECDSA+SHA512:RSA+SHA384:DSA+SHA384:ECDSA+SHA384:RSA+SHA256:DSA+SHA256:ECDSA+SHA256:RSA+SHA224:DSA+SHA224:ECDSA+SHA224:RSA+SHA1:DSA+SHA1:ECDSA+SHA1
Peer signing digest: SHA512
Server Temp Key: ECDH, P-256, 256 bits
---
SSL handshake has read 3511 bytes and written 427 bytes
---
New, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES256-GCM-SHA384
Server public key is 4096 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : ECDHE-RSA-AES256-GCM-SHA384
    Session-ID: B9FC46...XXXXXXXXXXXXXXXXXXXXXXXXXXX...2764C080216598
    Session-ID-ctx:
    Master-Key: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    Key-Arg   : None
    Krb5 Principal: None
    PSK identity: None
    PSK identity hint: None
    TLS session ticket lifetime hint: 7200 (seconds)
    TLS session ticket:
    0000 - 0d b7 9d cc 47 ff b8 51-6e 21 40 33 16 49 d9 1a   ....G..Qn!@3.I..
    ......
    0090 - c1 9f 93 8f 71 cb 7d 04-10 08 e6 2a 5f de ce 8b   ....q.}....*_...

    Start Time: 1586892781
    Timeout   : 300 (sec)
    Verify return code: 19 (self signed certificate in certificate chain)
---

Thanks in advance for your help.

Alex

Maybe your problem is curl or the curl libs. Is this a self-compiled version of curl/curl libs? Can you show us the output of curl -V, please?

Please also try to create a new API user with (for testing only) a simple password like test/test1234 and test it with your curl. I have read many reports of this “false” error message appearing when a wrong password was used.

I believe this is the latest version of cURL. I applied all the latest Red Hat security patches last month. Here is the updateinfo list; I don’t see anything listed for cURL. icinga2_patchlist.txt (25.6 KB)

[root@Master1 icinga]# curl -V
curl 7.29.0 (x86_64-redhat-linux-gnu) libcurl/7.29.0 NSS/3.44 zlib/1.2.7 libidn/1.28 libssh2/1.8.0
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smtp smtps telnet tftp
Features: AsynchDNS GSS-Negotiate IDN IPv6 Largefile NTLM NTLM_WB SSL libz unix-sockets

I created a new API user, with no luck. The weird thing is that cURL commands worked right after the upgrade to 2.11.3; I tested it soon after the upgrade. Then a couple of hours later I noticed 4000+ services in an “Unknown” state. I tried to run a cURL command manually at that time, and they have not worked ever since. Why did it work soon after the upgrade but not now?

Object 'test_user' of type 'ApiUser':
  % declared in '/etc/icinga2/conf.my/api-users.conf', lines 12:1-12:26
  * __name = "test_user"
  * client_cn = ""
  * name = "test_user"
  * package = "_etc"
  * password_hash = ""
  * permissions = [ "*" ]
    % = modified in '/etc/icinga2/conf.my/api-users.conf', lines 16:3-16:23
  * source_location
    * first_column = 1
    * first_line = 12
    * last_column = 26
    * last_line = 12
    * path = "/etc/icinga2/conf.my/api-users.conf"
  * templates = [ "test_user" ]
    % = modified in '/etc/icinga2/conf.my/api-users.conf', lines 12:1-12:26
  * type = "ApiUser"
  * zone = ""
[icinga@master1 ~]$ curl -k -u test_user:test/test1234 'https://localhost:5665/v1' -v
* About to connect() to localhost port 5665 (#0)
*   Trying ::1...
* Connection refused
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 5665 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* NSS: client certificate not found (nickname not specified)
* SSL connection using TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
* Server certificate:
*       subject: CN=master1_hostname
*       start date: Feb 13 20:52:44 2018 GMT
*       expire date: Feb 09 20:52:44 2033 GMT
*       common name: master1_hostname
*       issuer: CN=Icinga CA
* Server auth using Basic with user 'test_user'
> GET /v1 HTTP/1.1
> Authorization: Basic xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> User-Agent: curl/7.29.0
> Host: localhost:5665
> Accept: */*
>
* Empty reply from server
* Connection #0 to host localhost left intact
curl: (52) NSS: client certificate not found (nickname not specified)

I have opened a GitHub issue for this problem.
