I am wondering if anyone has a better way to monitor connectivity to satellites - we would like to use this to monitor overall connectivity to a customer’s site.
For example, if a master (or both masters) loses connectivity to a satellite, this could notify us that the site is down.
Currently we’re using the ‘cluster’ check on our masters, which lists connected/disconnected satellites, but splitting the status of each satellite into its own service would suit our use case better.
Sorry, was only able to test this now! This seems to do the job!
I am wondering if anyone would be able to shed some light on how this might be achievable in an automated fashion.
We want this check to apply to every zone on both of our masters - this way we’re monitoring connectivity from both DCs to all of our customers.
This is our current config:
apply Service "Connectivity Test" {
    import "Client Connectivity"
    assign where "Icinga Master" in host.templates
    import DirectorOverrideTemplate
}
template Service "Client Connectivity" {
    check_command = "cluster-zone"
    vars.cluster_zone = "satellite.example.com.au"
}
We’re using Director, and it’s easy enough to add this check individually for each zone (modifying vars.cluster_zone as we go).
However, I am wondering if something like the below is possible?
Essentially, we need to get a list of child zones of the master, and apply this check for each zone:
// childZones is a placeholder for the list of child zones we would like to retrieve
apply Service "connectivity" for (zone in childZones) {
    import DirectorOverrideTemplate
    display_name = zone + " connectivity"
    check_command = "cluster-zone"
    vars.cluster_zone = zone
    assign where "Icinga Master" in host.templates
}
I am not sure if it’s possible to achieve this in Director, and I’m not sure how we’d even achieve it with the plain DSL.
Is it possible to retrieve a list of child zones? Or am I going about this in a stupid way?
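For reference, one way the "apply for" idea above could be written in plain DSL is to keep the list of child zones in a custom variable on each master host. This is only a sketch: the variable name child_zones, the host name master1.example.com.au and the second zone name are made up for illustration, and the list has to be maintained by hand (or through a Director array field).

```
// Hypothetical master host; "child_zones" is a custom variable, not a built-in attribute.
object Host "master1.example.com.au" {
    import "Icinga Master"
    vars.child_zones = [ "satellite.example.com.au", "satellite2.example.com.au" ]
}

// Creates one service per entry in host.vars.child_zones.
// The object name becomes the prefix plus the zone name.
apply Service "connectivity " for (zone in host.vars.child_zones) {
    check_command = "cluster-zone"
    display_name = zone + " connectivity"
    vars.cluster_zone = zone
    assign where "Icinga Master" in host.templates
}
```

The DSL also has get_objects(Zone), which might be usable to build the list automatically, but whether all zones are reliably visible at the time apply rules are evaluated is something that would need testing.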
As an update, we discussed this internally and ended up defining a Host template in Director with cluster-zone as the host check.
We opted to create these via the API, and the host that is created gets the hostname of the customer’s zone, and a friendly name of the actual customer name.
These hosts sit in our cluster zone.
We are then planning on parenting these to the actual satellite zone, so that if the master sees the ‘client check’ go critical (disconnected), the satellite zone is marked as unreachable.
It’s a bit of a strange setup, but I think it works for our needs.
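Roughly, the setup described above could look like this in DSL. It is a sketch under a few assumptions: the master zone is called "master", every satellite zone has a matching connectivity host with the same name, and the display name and dependency name are made up.

```
// Connectivity host that lives in the master zone and is named after the customer's zone.
object Host "satellite.example.com.au" {
    display_name = "Example Customer"   // hypothetical friendly name
    check_command = "cluster-zone"
    vars.cluster_zone = "satellite.example.com.au"
}

// Every host in a satellite zone depends on the connectivity host
// that carries the same name as its zone.
apply Dependency "customer-site-reachable" to Host {
    parent_host_name = host.zone
    disable_notifications = true
    // Assumption: "master" is the name of the master zone.
    assign where host.zone != "" && host.zone != "master"
}
```

With a rule like this, config validation would fail for any matched host whose zone has no connectivity host of the same name, so the assign rule may need tightening if there are other zones in play (global zones, for instance).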
We also use the cluster-zone check in place of the “hostalive” check for every host that has an agent or is a satellite. If we need the ping check, we add it separately.
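As a sketch, that approach might look something like the following. It assumes each agent or satellite zone is named after its host, and the template name and the needs_ping variable are made up for illustration.

```
// Host template that uses cluster-zone instead of hostalive.
template Host "agent-or-satellite-host" {
    check_command = "cluster-zone"
    // Assumption: the zone name is identical to the host name.
    vars.cluster_zone = "$host.name$"
}

// Ping added as a separate service only where it is wanted.
apply Service "ping4" {
    check_command = "ping4"
    // "needs_ping" is a hypothetical custom variable.
    assign where host.address && host.vars.needs_ping
}
```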