Hi Icinga Community,
I wonder if someone can help with a strange issue I am experiencing. Whenever I deploy new hosts, services or any other configuration changes using the Director, the Icinga cluster reports for a few minutes that none of the endpoints are connected. This happens only when deploying changes; at all other times the cluster endpoints are fine and checks are executed and their results reported back.
I am using the top-down approach with an HA master zone (two masters) and a number of satellites.
Icinga Version: 2.11.4-1
Icinga Web 2 Version: 2.8.1
Git commit: 233bd29e4104125b4e5ef631e8c16dde33dadd9a
PHP Version: 7.3.11
Director Version: 1.7.2
Primary master zones.conf:

```
// Primary master
object Endpoint "enf-sdmon-mstr01.ourdomain.net" {
  // This server
}

// Secondary master
object Endpoint "awsir-sdmon-mstr02.ourdomain.net" {
  // Actively connect to the secondary master
  host = "awsir-sdmon-mstr02.ourdomain.net"
}

// Satellite Endpoints
object Endpoint "enf-emea01-sat01.ourdomain.net" {
  // Actively connect to satellite
  host = "enf-emea01-sat01.ourdomain.net"
}

object Endpoint "enf-emea02-sat01.ourdomain.net" {
  // Actively connect to satellite
  host = "enf-emea02-sat01.ourdomain.net"
}

// MASTER ZONE
object Zone "master" {
  endpoints = [ "enf-sdmon-mstr01.ourdomain.net", "awsir-sdmon-mstr02.ourdomain.net" ]
}

// ENF EMEA01 Satellite
object Zone "enf-emea01-satellite" {
  endpoints = [ "enf-emea01-sat01.ourdomain.net" ]
  parent = "master"
}

// ENF EMEA02 Satellite
object Zone "enf-emea02-satellite" {
  endpoints = [ "enf-emea02-sat01.ourdomain.net" ]
  parent = "master"
}

// Globals
object Zone "global-templates" {
  global = true
}

object Zone "director-global" {
  global = true
}
```
Secondary master zones.conf:

```
// Primary master
object Endpoint "enf-sdmon-mstr01.ourdomain.net" {
  // First master already connects to us
}

// Secondary master
object Endpoint "awsir-sdmon-mstr02.ourdomain.net" {
  // This server
}

// Satellite Endpoints
object Endpoint "enf-emea01-sat01.ourdomain.net" {
  // Actively connect to satellite
  host = "enf-emea01-sat01.ourdomain.net"
}

object Endpoint "enf-emea02-sat01.ourdomain.net" {
  // Actively connect to satellite
  host = "enf-emea02-sat01.ourdomain.net"
}

// MASTER ZONE
object Zone "master" {
  endpoints = [ "enf-sdmon-mstr01.ourdomain.net", "awsir-sdmon-mstr02.ourdomain.net" ]
}

// ENF EMEA01 Satellite Zone
object Zone "enf-emea01-satellite" {
  endpoints = [ "enf-emea01-sat01.ourdomain.net" ]
  parent = "master"
}

// ENF EMEA02 Satellite Zone
object Zone "enf-emea02-satellite" {
  endpoints = [ "enf-emea02-sat01.ourdomain.net" ]
  parent = "master"
}

// Globals
object Zone "global-templates" {
  global = true
}

object Zone "director-global" {
  global = true
}
```
And the zones.conf of one of the agents:

```
object Endpoint "enf-emea02-sat01.ourdomain.net" {
}

object Zone "enf-emea02-satellite" {
  endpoints = [ "enf-emea02-sat01.ourdomain.net" ]
}

object Endpoint "enf-emea02-cassdb11.ourdomain.net" {
  log_duration = 0
}

object Zone "enf-emea02-cassdb11.ourdomain.net" {
  endpoints = [ "enf-emea02-cassdb11.ourdomain.net" ]
  parent = "enf-emea02-satellite"
}

object Zone "global-templates" {
  global = true
}

object Zone "director-global" {
  global = true
}
```
All nodes have the API feature enabled, and all of them except the primary master accept config and accept commands.
As I said at the start of this post, we only see this problem right after a deployment.
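For reference, this is roughly what that API setup looks like on the secondary master, the satellites and the agents. It is a sketch under assumptions: the actual api.conf files are not included in this post, and the file location (features-enabled/api.conf) is the usual default rather than something confirmed here. On the primary master, both accept_* attributes would be left at their default of false.

```
// Hypothetical sketch of features-enabled/api.conf on the secondary master,
// satellites and agents (the real files were not posted)
object ApiListener "api" {
  accept_config = true    // accept zone configuration synced from other cluster nodes
  accept_commands = true  // accept remote commands from other cluster nodes
}
```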
Hope someone can point me in the right direction.
Thanks
Peter.