I am almost at my wits' end. I just finished installing an Icinga 2 setup with 2 masters in HA and 2 satellites in HA, and I am hopeful that my configuration is correct. Details of the setup are in the linked Community Question.
For a couple of days I have been trying to build up the services on the 2 Linux boxes I have. I am deploying the services using Director. What I noticed was a random “pending” state of services on the first deployment to a Host Template. But if I selected 1 server, it deployed fine. After that, if I deleted and redeployed on the Host Template, it worked fine as well. Yesterday I was playing around with the check_logfile plugin, and because the /var/log/messages file could not be read by the icinga user, I gave 640 permission to the messages file. It is probably not related, but that is the only change I made apart from deleting the service and deploying it again several times, first on a host, then on a host group, etc.
After some time I saw that all the services were running late. Soon after, I realized that no services were being reported any more in the web2 portal.
I enabled the debug log and found that the agent on the end server is running the service checks.
The Graphite browser also stopped showing data trends. Once I restarted the database (Postgres), it looked like it started working again for some time. But as soon as I restarted the Satellite, it stopped again.
I have no clue what caused this or how to fix it. Any guidance on what to look at would be very helpful.
EDIT: I disabled ido-pgsql and stopped the secondary master, and it looks good. Now we are able to get the service checks done, so something is not correct in my secondary master configuration. I also stopped 1 Satellite, waited, and then stopped the other Satellite. Everything looks OK in terms of the Satellites, but the Secondary Master appears to be the problem, and I don’t know why.
But on the secondary master these folders are not present. Also, /etc/icinga2/zones.d is empty on both servers.
I tried stopping the Primary Master and found that IcingaWeb2 just went blank, with no nodes or services at all. I am guessing the Primary Master is not syncing services to the secondary.
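For context, config sync from one master to the other depends on the receiving node's ApiListener accepting configuration. A minimal sketch of what that typically looks like on the secondary master, assuming the packaged default file location and the default object name "api" (neither is shown in this thread):

```
// /etc/icinga2/features-available/api.conf on the secondary master
// (path and object name are package defaults, assumed here)
object ApiListener "api" {
  // Allow this node to receive synced configuration from the config master
  accept_config = true
  // Allow this node to accept remote commands from other cluster nodes
  accept_commands = true
}
```

With this in place, synced zone directories normally show up under /var/lib/icinga2/api/zones on the secondary after a reload, while /etc/icinga2/zones.d stays empty there. An empty zones.d is also expected on the primary when everything is deployed through Director.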
For me personally, it is really hard to follow a long text with many details. Try to cut it down to the steps you took, and illustrate them with configuration snippets and log output.
Also, if you have used a docs URL or source, link it here to allow everyone to learn what you’ve tried already.
First things first, please share the zone hierarchy from zones.conf so we can get a better picture.
Second to that, please add the output of icinga2 --version for all involved nodes.
Sorry, it was progressive: I was trying things myself in the background while asking for help, hence so much text.
Current problem: when the Primary Master goes down, Icingaweb2 goes blank, as if the Secondary Master is not able to continue the monitoring, though I can see in the agent log that the checks are still happening.
I have deployed all services using Director.
zones.conf -> Primary Master
object Endpoint "ncvdl09.us.corp.net" {
  // Local Server
}
object Endpoint "ncvdl10.us.corp.net" {
  host = "192.168.1.154"
}
object Zone "master" {
  endpoints = [ "ncvdl09.us.corp.net", "ncvdl10.us.corp.net" ]
}
object Zone "global-templates" {
  global = true
}
object Zone "director-global" {
  global = true
}
zones.conf -> Secondary Master
object Endpoint "ncvdl10.us.corp.net" {
  // Local Server
}
object Endpoint "ncvdl09.us.corp.net" {
  // Remote Server Primary Master
  host = "53.242.35.151"
}
object Zone "master" {
  endpoints = [ "ncvdl10.us.corp.net", "ncvdl09.us.corp.net" ]
}
object Zone "global-templates" {
  global = true
}
object Zone "director-global" {
  global = true
}
icinga2 --version
icinga2 - The Icinga 2 network monitoring daemon (version: 2.11.0-1)
Copyright (c) 2012-2019 Icinga GmbH (https://icinga.com/)
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
System information:
Platform: Red Hat Enterprise Linux Server
Platform version: 7.6 (Maipo)
Kernel: Linux
Kernel version: 3.10.0-957.5.1.el7.x86_64
Architecture: x86_64
Build information:
Compiler: GNU 4.8.5
Build host: runner-LTrJQZ9N-project-322-concurrent-0
Application information:
General paths:
Config directory: /etc/icinga2
Data directory: /var/lib/icinga2
Log directory: /var/log/icinga2
Cache directory: /var/cache/icinga2
Spool directory: /var/spool/icinga2
Run directory: /run/icinga2
Old paths (deprecated):
Installation root: /usr
Sysconf directory: /etc
Run directory (base): /run
Local state directory: /var
Internal paths:
Package data directory: /usr/share/icinga2
State path: /var/lib/icinga2/icinga2.state
Modified attributes path: /var/lib/icinga2/modified-attributes.conf
Objects path: /var/cache/icinga2/icinga2.debug
Vars path: /var/cache/icinga2/icinga2.vars
PID path: /run/icinga2/icinga2.pid
When using the Director, are you managing the Satellite zone inside the Infrastructure tab? We generally recommend building the zone trust relationship for the cluster config sync outside of the Director, in zones.conf, to prevent chicken-and-egg problems such as zone-in-zone inception.
That being said, move the satellite zone definitions outside of the Director into zones.conf. That’s also described here.
Then I went to Director -> Activity Log -> Infrastructure -> Kickstart Wizard, added the Icinga primary host / port / API username & password, and imported it. Is that something I was not supposed to do?
Sure, that’s the default way of getting things done, fetching the external master zone Icinga knows about. Still, your zones.conf does not include any reference to the child zone called satellite_US … where’s that defined?
object Endpoint "ncvdl10.us.corp.net" {
  // Local Server
}
object Endpoint "ncvdl09.us.corp.net" {
  host = "192.168.1.151"
}
object Zone "master" {
  endpoints = [ "ncvdl10.us.corp.net", "ncvdl09.us.corp.net" ]
}
object Endpoint "ncvdl12.us.corp.net" {
  host = "192.168.1.193"
}
object Endpoint "ncvdl11.us.corp.net" {
  host = "192.168.1.156"
}
object Zone "US_Satellite" {
  endpoints = [ "ncvdl12.us.corp.net", "ncvdl11.us.corp.net" ]
  parent = "master"
}
object Zone "global-templates" {
  global = true
}
object Zone "director-global" {
  global = true
}
I stopped the Primary Master, and web2 went blank with no service checks visible. I can see in the agent debug log that it is running the service checks.
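Assuming Icinga Web 2 reads its data from the IDO database here (the thread mentions ido-pgsql), one thing that may be worth checking is whether the ido-pgsql feature is enabled on both masters and points to the same database. A minimal sketch follows; the credentials, database name, and database host are placeholders and do not come from this thread:

```
// /etc/icinga2/features-available/ido-pgsql.conf, same content on both masters
// (all values below are hypothetical placeholders)
object IdoPgsqlConnection "ido-pgsql" {
  user = "icinga"
  password = "CHANGEME"
  host = "db.example.com"   // must be reachable from both masters
  database = "icinga"
  // With HA enabled (the default), only one master writes to the
  // database at a time and the other takes over if it goes away.
  enable_ha = true
}
```

If the feature is enabled on only one master, nothing keeps the database current once that master is stopped, which could explain an empty or stale web view even while agents keep running checks.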