I am almost at my wits' end. I just finished installing an icinga2 setup with 2 masters in HA and 2 Satellites in HA, and I am hopeful that my configuration is correct. Details of the setup are in the linked Community Question.
For a couple of days I have been trying to build up the services on the 2 Linux boxes I have. I am deploying the services using Director. What I noticed was a random “pending state” on services the first time they were deployed on the Host Template. But if I selected 1 server, it deployed fine. After that, if I deleted the services and redeployed them on the Host Template, it worked fine as well. Yesterday I was playing around with the check_logfile plugin, and because the /var/log/messages file could not be read by the icinga user, I gave 640 permission to the messages file. It is probably not related, but that is the only change I made, apart from deleting the services and deploying them again several times, first on a host, then on a host group, etc.
After some time I saw that all the services were running late. Soon after, I realized that none of the services were being reported in the web2 portal any more.
I enabled the debug log and found that the agent on the end server is running the service checks.
The Graphite Browser also stopped reporting data trends. Once I restarted the database (postgres), it looked like it started working again for some time. But as soon as I restarted the Satellite, it stopped again.
I have no clue what caused this or how to fix it. Any guidance on what to look at would be very helpful.
EDIT: I disabled ido-pgsql and stopped the secondary master, and it looks good. Now we are able to get the service checks done. Something is not correct in my secondary master configuration. I also stopped 1 Satellite, waited, and then stopped the other Satellite. It looks like everything is OK in terms of the Satellites. The secondary master appears to be the problem, but I don’t know why.