I’ve been using Icinga2 with little to no issues for a long while now. We have the agent automatically deploying and everything has been working perfectly for quite some time.
Today, a Windows server that was previously working with the agent is all of a sudden reporting that it has no connection to Icinga2.
I tried my usual fixes that have always worked in the past, for example uninstalling the agent and letting it reinstall. This didn’t resolve the issue. I also disabled the firewall on both the Windows server and the Icinga2 server for troubleshooting purposes. This did not help either.
I see no errors or warnings in the deployments through Director, and I can’t think of anything that has changed recently except for cutting Icinga Web 2 over to https (this works fine).
All other Windows agents continue to connect and monitor services as expected.
In such cases I always start by looking into the logs, especially those of the affected agent and its parent(s). My typical errors are a hostname/certificate mismatch, a missing host IP for the parent, an unsigned certificate…
Thanks, I see nothing in the icinga log file of interest on the Windows host. What is odd to me is that all other Windows hosts are working fine and the agent deployment is controlled/cookie-cutter.
This could be a certificate issue but I’m not really sure where to look. I ran icinga2 ca list on the Icinga2 server and it returned nothing. Nothing is listed with --all either.
Certificate issues are reported in the logs. If nothing is reported, then typically neither of the partners is trying to connect (typically due to missing “contact” config). Here you’ll find some troubleshooting tips.
So it seems like my problem is that the Icinga2 agent on the Windows server will not listen on 5665.
I tried uninstalling it and reinstalling it. The service is running. But no matter what I try I never see it listening on 5665. Other Windows servers that are working show the 5665 port and a connection to the Icinga2 server as expected.
Where should I look for why the icinga service wouldn’t be listening on 5665?
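A quick way to confirm this from either side is a plain TCP connect test against the port. This is only a minimal sketch: the function name is made up for illustration, and it tells you whether something accepts connections on 5665, not why the agent isn't listening.

```python
import socket


def is_port_open(host: str, port: int = 5665, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and attempts a TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures
        return False


# Example (hypothetical host name): is_port_open("winserver01.example.internal")
```

Run on the Windows host itself against 127.0.0.1, a False result would suggest the listener never started rather than a firewall in between.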
It seems like there is a problem with getting the conf file, and maybe that is why it isn’t listening. Any idea where to look from here? I’m wondering if this has to do with the API and our recent change to SSL/https instead of http.
As you use the Director: Do you deploy the agent with the Director-given host template token and the code snippet?
As there is an “Error 401 Unauthorized”:
Did you, maybe, change the API user/password for the Director? Or change the agent key/token for the host template?
You also write “change to https”: Is your code snippet maybe still with http instead of https?
A tip for reinstalling the agent:
uninstall the agent
then delete C:\ProgramData\icinga2
so the system is completely clean of the previous Icinga2 install.
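If you want to script the cleanup step, something along these lines could work. It is a sketch only: the default path is an assumption based on the directory named above, the function name is made up, and it should only be run after the agent has been uninstalled and its service is stopped.

```python
import shutil
from pathlib import Path

# Assumed data directory of the Windows agent; adjust if your install differs.
DEFAULT_DATA_DIR = Path(r"C:\ProgramData\icinga2")


def remove_leftover_data(data_dir: Path = DEFAULT_DATA_DIR) -> bool:
    """Delete the leftover directory tree; return True if anything was removed."""
    if not data_dir.exists():
        return False
    # Removes config, certificates and logs left behind by the old install
    shutil.rmtree(data_dir)
    return True
```

The return value makes it easy to log whether the host actually had leftovers from the previous install.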
Are you using a self-signed certificate which is not stored in the local certificate store of that Windows machine? If so you need to add -IgnoreSSLErrors (assuming you use icinga2-powershell-module).
Have you tried starting Icinga from the command line with icinga2.exe daemon to see what’s happening?
How is your connection configured? Is the master/satellite connecting to the agent?
In my environments, in most cases it’s the network team that kills my connections, or the virus scanner/app security has got some “hardening”. In rare cases I ****** up Icinga and do not know that I have done it.
The interesting thing is that I never had this problem before, and there have been no changes to anything other than modifying icinga2 to use https instead of http and the addition of the VMware module for vSphere, which required a few modules to be updated to their latest versions.
If I curl the API URL with https from a Linux box, it detects a self-signed certificate (our internal CA) and complains/fails, but if I pass -k to ignore the detection it proceeds with no issue. I tried adding the root and intermediate certificates to the Linux box I was curling from, but it still complained and I had to ignore it to proceed.
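For comparison, here is roughly the same check in Python, once verifying against an explicit CA bundle and once with verification switched off (the equivalent of curl -k). It is a sketch under assumptions: the function names, the URL and the CA file path in the usage comment are placeholders, not anything from my actual setup.

```python
import ssl
import urllib.request
from typing import Optional


def build_tls_context(ca_file: Optional[str] = None, insecure: bool = False) -> ssl.SSLContext:
    """Build a TLS context that trusts ca_file, or skips verification entirely."""
    # With ca_file=None the system trust store is used
    ctx = ssl.create_default_context(cafile=ca_file)
    if insecure:
        # Same effect as curl -k: no hostname check, no chain validation
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


def fetch_status(url: str, ctx: ssl.SSLContext, timeout: float = 5.0) -> int:
    """Return the HTTP status code; certificate problems raise urllib.error.URLError."""
    with urllib.request.urlopen(url, context=ctx, timeout=timeout) as resp:
        return resp.status


# Hypothetical usage:
# ctx = build_tls_context(ca_file="/path/to/internal-ca-chain.pem")
# fetch_status("https://monitoring.example.internal/", ctx)
```

If the request only succeeds with insecure=True, the client is not trusting the chain it is being served, which matches what curl is reporting. Note that urlopen raises HTTPError for responses such as 401 instead of returning the status.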
I’m going to look at this right now. Anything that would simplify the Windows agent process would be great. As of now I am using the PowerShell module with Director and deploying via SCCM. It has worked flawlessly since I initially configured it quite some time ago.
In this case it seems that the problem is communicating with the Icinga2 server and getting the config for the Windows agent over the API. I have ruled out firewalls, networking issues etc…