We are in the process of migrating our Icinga infrastructure from Ubuntu to RHEL 8.
We have two HA masters, six satellites and one Graphite server. The database is an AWS RDS MySQL instance (db.r6g.xlarge).
We use Icinga to monitor about 17k servers and 120k services.
After migrating everything to the new RHEL environment, Icinga Web seems to be stuck, even though checks and notifications are working as expected.
The Tactical Overview is not updated at all.
I have attached a screenshot.
Icinga Web is installed on one of the masters.
The IDO configuration is identical on both masters.
Yes, I have migrated the database to another host using a snapshot of the original database.
Below are some Icinga Web 2 log events:
* Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 5665 (#0)
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
* subject: CN=master-rhel8.icinga-stage.__REDACTED__
* start date: Jul 17 14:01:39 2024 GMT
* expire date: Aug 18 14:01:39 2025 GMT
* issuer: CN=Icinga CA
* SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
* Server auth using Basic with user 'root'
> POST /v1/actions/reschedule-check HTTP/1.1
Host: localhost:5665
Authorization: Basic __REDACTED__
Content-Type: application/json
Accept: application/json
Content-Length: 108
* upload completely sent off: 108 out of 108 bytes
< HTTP/1.1 200 OK
< Server: Icinga/r2.14.2-1
< Content-Type: application/json
< Content-Length: 140
<
* Connection #0 to host localhost left intact
2024-08-16T12:08:20+03:00 - DEBUG - Sending Icinga command "actions/reschedule-check" to the API "localhost:5665"
2024-08-16T12:08:20+03:00 - DEBUG - Executing curl -s -X POST -H 'Accept: application/json' -k -u 'root':'__REDACTED__' -d '{"next_check":1723799300,"force":true,"service":"HOST!CHECK"}' 'https://localhost:5665/v1/actions/reschedule-check'
2024-08-16T12:08:21+03:00 - DEBUG - * Hostname localhost was found in DNS cache
* Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 5665 (#0)
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
* subject: CN=master-rhel8.icinga-stage.__REDACTED__
* start date: Jul 17 14:01:39 2024 GMT
* expire date: Aug 18 14:01:39 2025 GMT
* issuer: CN=Icinga CA
* SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
* Server auth using Basic with user 'root'
> POST /v1/actions/reschedule-check HTTP/1.1
Host: localhost:5665
Authorization: Basic __REDACTED__
Content-Type: application/json
Accept: application/json
Content-Length: 113
* upload completely sent off: 113 out of 113 bytes
< HTTP/1.1 200 OK
< Server: Icinga/r2.14.2-1
< Content-Type: application/json
< Content-Length: 145
<
* Connection #0 to host localhost left intact
Thanks for your reply, and please excuse the delay on my side; I was on holiday.
First, I took the liberty of editing your log to redact some information, such as FQDNs and the root password. As this post has been public for ten days now, you might want to rotate those passwords, even if it is just a staging system.
Back to the topic: unfortunately, I could not find anything conclusive in the information you provided. Could you please check the following points?
In Icinga Web, navigate to Settings > System/Health and check the status reported for Icinga.
In Icinga Web, navigate to Settings > Configuration/Modules and select monitoring. There, go to the Backends tab, inspect each monitoring backend and note its Resource.
Afterwards, go to Settings > Configuration/Application, switch to the Resources tab and select the Resources you just noted. Please validate them using the Validate button and check that they are identical to the database settings in your Icinga 2 IDO configuration.
I used the plural in case there are multiple Resources, but most probably there is only one.
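If you want to compare the Resource with the IDO settings outside the UI, here is a minimal sketch in Python. It assumes the Resource lives in an Icinga Web resources.ini (by default under /etc/icingaweb2/) with double-quoted values, and that you copy host, port, database and user from the IdoMysqlConnection object in your IDO configuration into a dict by hand. The section name `icinga_ido`, the example values and the helper names are hypothetical.

```python
import configparser

# Hypothetical mapping from Icinga Web resources.ini keys to the
# attribute names of an Icinga 2 IdoMysqlConnection object.
KEY_MAP = {
    "host": "host",
    "port": "port",
    "dbname": "database",
    "username": "user",
}


def load_resource(ini_text: str, section: str) -> dict:
    """Parse one resource section, stripping the double quotes around values."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return {key: value.strip('"') for key, value in parser[section].items()}


def diff_resource(resource: dict, ido: dict) -> dict:
    """Return {resource_key: (web_value, ido_value)} for every mismatch.

    A key missing on either side shows up as None and is reported too.
    """
    mismatches = {}
    for web_key, ido_key in KEY_MAP.items():
        web_value = resource.get(web_key)
        ido_value = ido.get(ido_key)
        if str(web_value) != str(ido_value):
            mismatches[web_key] = (web_value, ido_value)
    return mismatches


if __name__ == "__main__":
    # Hypothetical example data; in practice read the real file, e.g.
    # open("/etc/icingaweb2/resources.ini").read()
    resources_ini = """
[icinga_ido]
type = "db"
db = "mysql"
host = "old-db.example.com"
port = "3306"
dbname = "icinga"
username = "icinga"
"""
    # Values copied by hand from the IdoMysqlConnection object (hypothetical).
    ido = {"host": "new-db.example.com", "port": 3306, "database": "icinga", "user": "icinga"}
    print(diff_resource(load_resource(resources_ini, "icinga_ido"), ido))
    # prints: {'host': ('old-db.example.com', 'new-db.example.com')}
```

After restoring a database from a snapshot onto a new host, a leftover `host` value is the kind of mismatch this would surface.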
On your system, I would kindly ask you again to look at the Icinga Web log file (see the Configuration chapter of the Icinga Web documentation; it might be /usr/share/icingaweb2/log/icingaweb2.log) and grep for WARN or ERROR log messages. Alternatively, you can grep for SQL-related messages.
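Plain grep loses context when an entry spans several lines, as the curl output in your excerpt does. The following is a minimal Python sketch that groups continuation lines with the entry before them and then filters by level or keyword. It assumes the `timestamp - LEVEL - message` header format visible in your log; the function names are hypothetical, and the level match is by prefix so it catches both WARN and WARNING.

```python
import re
import sys

# Entry header format as seen in the log excerpt in this thread:
#   2024-08-16T12:08:20+03:00 - DEBUG - <message>
ENTRY = re.compile(r"^(\S+) - ([A-Z]+) - (.*)$")


def parse_entries(lines):
    """Group raw log lines into (timestamp, level, text) entries.

    Lines that do not start a new entry (such as curl's verbose output)
    are appended to the preceding entry; lines before the first entry are dropped.
    """
    entries = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = ENTRY.match(line)
        if match:
            entries.append(list(match.groups()))
        elif entries:
            entries[-1][2] += "\n" + line
    return [tuple(entry) for entry in entries]


def filter_entries(entries, levels=("WARN", "ERROR"), keyword=None):
    """Keep entries whose level starts with one of `levels` (None = any level)
    and, if `keyword` is given, whose text contains it (case-insensitive)."""
    return [
        (timestamp, level, text)
        for timestamp, level, text in entries
        if (levels is None or level.startswith(tuple(levels)))
        and (keyword is None or keyword.lower() in text.lower())
    ]


if __name__ == "__main__":
    # Path taken from the post; pass a different one as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/usr/share/icingaweb2/log/icingaweb2.log"
    with open(path, encoding="utf-8", errors="replace") as handle:
        entries = parse_entries(handle)
    # Warnings and errors, first line of each entry only:
    for timestamp, level, text in filter_entries(entries):
        print(timestamp, level, text.partition("\n")[0])
    # SQL-related entries of any level:
    for timestamp, level, text in filter_entries(entries, levels=None, keyword="SQL"):
        print(timestamp, level, text.partition("\n")[0])
```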
However, maybe Icinga Web isn’t the culprit here. Could you please open a SQL console on the IDO database and check whether the icinga_statehistory table contains recent entries?
select * from icinga_statehistory order by state_time desc limit 10;
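The same check can be scripted so that it reports how old the newest state change is. This is a minimal sketch in Python that takes any DB-API (PEP 249) cursor; with a MySQL driver you would pass a cursor connected to the IDO database, while the demo below uses an in-memory SQLite table as a stand-in. The 15-minute threshold and the helper names are hypothetical, and both timestamps must be in the same time zone for the comparison to make sense.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical threshold; pick one that matches how often state changes
# normally occur in your environment.
MAX_AGE = timedelta(minutes=15)


def newest_state_change(cursor):
    """Return the newest state_time in icinga_statehistory, or None if it is empty."""
    cursor.execute("SELECT MAX(state_time) FROM icinga_statehistory")
    (value,) = cursor.fetchone()
    if isinstance(value, str):
        # MySQL drivers typically return a datetime; SQLite (used below) returns text.
        value = datetime.fromisoformat(value)
    return value


def is_stale(newest, now, max_age=MAX_AGE):
    """True if there is no history at all or the newest entry is older than max_age.

    `newest` and `now` must be naive datetimes in the same time zone.
    """
    return newest is None or now - newest > max_age


if __name__ == "__main__":
    # Demo against an in-memory SQLite table standing in for the IDO database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE icinga_statehistory (state_time TEXT)")
    conn.execute("INSERT INTO icinga_statehistory VALUES ('2024-08-16 09:00:00')")
    newest = newest_state_change(conn.cursor())
    print(newest, is_stale(newest, now=datetime(2024, 8, 16, 12, 0)))
    # prints: 2024-08-16 09:00:00 True
```

Note that state history only records state changes, so an old newest entry is a hint that the IDO stopped writing, not proof of it.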
Otherwise, please check the Icinga 2 log files for warning or critical messages; Icinga 2 uses these severity names rather than WARN or ERROR.
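To get a quick overview of which Icinga 2 components are complaining (for example IdoMysqlConnection), here is a minimal Python sketch that counts warning and critical messages per facility. It assumes the main log line format `[timestamp] severity/Facility: message` and the package default path /var/log/icinga2/icinga2.log; both are assumptions to adjust for your installation, and the function name is hypothetical.

```python
import re
import sys
from collections import Counter

# Icinga 2 main log line, e.g.
#   [2024-08-16 12:08:20 +0300] warning/IdoMysqlConnection: <message>
LINE = re.compile(r"^\[([^\]]+)\] (\w+)/(\w+): (.*)$")


def count_problems(lines, severities=("warning", "critical")):
    """Count messages per (severity, facility) for the given severities."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line.rstrip("\n"))
        if match and match.group(2) in severities:
            counts[(match.group(2), match.group(3))] += 1
    return counts


if __name__ == "__main__":
    # Assumed default path on package installs; pass another one as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/icinga2/icinga2.log"
    with open(path, encoding="utf-8", errors="replace") as handle:
        for (severity, facility), n in count_problems(handle).most_common():
            print(f"{n:6d} {severity}/{facility}")
```

A cluster of IdoMysqlConnection entries would point at the database connection rather than at Icinga Web.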