Monitoring satellite connectivity

I am wondering if anyone has a better way to monitor connectivity to satellites; we would like to use this to monitor overall connectivity to a customer’s site.

For example, if one master (or both) loses connectivity to a satellite, this could notify us that the site is down.

Currently we’re using the ‘cluster’ check on our masters, which lists connected and disconnected satellites, but for our use case it would be better to have each satellite’s status as a separate service.

Our monitoring architecture looks like this:

Master 1, Master 2 <------- Satellite <------ Hosts

Satellites talk up to the masters (all connectivity is upstream).

Any thoughts would be much appreciated.

Just try cluster-zone instead of (or in addition to) cluster.
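A minimal sketch of what that could look like: unlike `cluster`, which reports on all zones in one result, `cluster-zone` checks a single zone, so each satellite gets its own service, state and notifications. The host name `master1.example.com.au` and the second zone `satellite2.example.com.au` are hypothetical placeholders.

```
// One cluster-zone service per satellite zone, attached to the master's host object.
// "master1.example.com.au" and "satellite2.example.com.au" are hypothetical names.
object Service "cluster-zone-satellite" {
    host_name = "master1.example.com.au"
    check_command = "cluster-zone"
    // Zone whose connection state is checked
    vars.cluster_zone = "satellite.example.com.au"
}

object Service "cluster-zone-satellite2" {
    host_name = "master1.example.com.au"
    check_command = "cluster-zone"
    vars.cluster_zone = "satellite2.example.com.au"
}
```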


Ah, I must have missed that when reading the documentation.

Will test this tomorrow morning and report my findings.

Sorry, I was only able to test this now. It seems to do the job! :)

I am wondering if anyone can shed some light on how to automate this.

We want this check applied on both of our masters for every zone, so that we monitor connectivity from both DCs to all of our customers.

This is our current config:

apply Service "Connectivity Test" {
    import "Client Connectivity"
    assign where "Icinga Master" in host.templates
    import DirectorOverrideTemplate
}

template Service "Client Connectivity" {
    check_command = "cluster-zone"
    vars.cluster_zone = "satellite.example.com.au"
}

We’re using Director, and it’s easy enough to add this check individually for each zone (modifying vars.cluster_zone as we go).

However, is something like the following possible?

Essentially, we need to get a list of the master’s child zones and apply this check for each one:

apply Service "connectivity" for (zone in childZones) {
    import DirectorOverrideTemplate
    display_name = zone + " connectivity"
    check_command = "cluster-zone"
    vars.cluster_zone = zone
    assign where "Icinga Master" in host.templates
}

I’m not sure whether this is possible in Director, and I’m not even sure how we’d do it with the DSL.

Is it possible to retrieve a list of child zones? Or am I going about this the wrong way?
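One way this might be done in plain DSL, offered as an untested sketch: `get_objects(Zone)` returns all Zone objects, and filtering on their `parent` attribute yields the child zones, which an `apply for` rule can then iterate over. The zone name `master` is an assumption (use whatever your masters’ zone is called in zones.conf). We have not confirmed that every zone is already registered when apply rules are evaluated on all Icinga 2 versions, so validate with `icinga2 daemon -C` before deploying. Whether Director can render an expression like this is a separate question.

```
// Sketch: one cluster-zone service per child zone of the masters' zone.
// ASSUMPTION: the masters' zone is named "master"; adjust to your zones.conf.
// Service names become "connectivity-" + <zone name>.
apply Service "connectivity-" for (child_zone in get_objects(Zone).filter((z) => z.parent == "master").map((z) => z.name)) {
    display_name = child_zone + " connectivity"
    check_command = "cluster-zone"
    // Check this specific child zone rather than the default ($host.name$)
    vars.cluster_zone = child_zone

    assign where "Icinga Master" in host.templates
}
```

The loop variable is called `child_zone` rather than `zone` to avoid any confusion with the object’s own `zone` attribute.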

You could try something like this:

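apply Service "zone" {
    display_name = "Zone"
    // cluster-zone checks vars.cluster_zone, which defaults to the host name
    check_command = "cluster-zone"

    // Only hosts that also have an Endpoint of the same name (agents, satellites)
    assign where get_object("Endpoint", host.name)
    // Skip the node that is running the check itself
    ignore where host.name == NodeName
}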

As an update, we discussed this internally and ended up defining a Host template in Director with cluster-zone as the host check.

We create these hosts via the API: each one is named after the customer’s zone and gets the customer’s actual name as its display name.

These hosts sit in our cluster zone.
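In plain DSL, the hosts described above correspond roughly to the following sketch. The template name `customer-connectivity` and the display name are made up for illustration; the key point is that the host name equals the zone name, which the ITL `cluster-zone` command uses as its default `vars.cluster_zone` (`$host.name$`).

```
// Hypothetical template name; in Director this would be a Host template.
template Host "customer-connectivity" {
    // cluster-zone defaults vars.cluster_zone to the host name, so a host
    // named after a customer's zone checks that zone's connection state.
    check_command = "cluster-zone"
}

// One host per customer zone; the host name must match the zone name.
object Host "satellite.example.com.au" {
    import "customer-connectivity"
    display_name = "Example Customer"  // hypothetical customer name
}
```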

We then plan to make these hosts the parents of the hosts in the corresponding satellite zone, so that if the master sees the ‘client check’ go critical (disconnected), everything in that satellite zone is marked as unreachable.
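A possible way to express that parenting as a Dependency apply rule, sketched under assumptions: the masters’ zone is called `master`, and every satellite zone has a connectivity host with exactly the zone’s name (otherwise config validation fails because the parent host does not exist). The rule is intended to live in the master’s configuration; we have not tested how it interacts with config sync to the satellites.

```
// Sketch: every host in a satellite zone depends on the connectivity host
// that is named after that zone. ASSUMPTION: masters' zone is "master".
apply Dependency "customer-connectivity" to Host {
    // The connectivity host shares its name with the child's zone
    parent_host_name = host.zone
    // Suppress the children's notifications while the parent is DOWN
    disable_notifications = true

    assign where host.zone != "" && host.zone != "master"
    // Guard against a host depending on itself
    ignore where host.name == host.zone
}
```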

It’s a bit of a strange setup, but I think it works for our needs.

We also use cluster-zone instead of “hostalive” as the host check for every host that runs an agent or is a satellite. If we need a ping check, we add it as a separate service.
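A minimal sketch of that pattern. The template name `agent-host` and the custom variable `vars.ping_wanted` are hypothetical, and the sketch assumes each agent’s zone is named after its host, which is the usual convention for agents but should be checked against your zones.conf.

```
// Hypothetical template for hosts that run an agent or are satellites:
// the host is UP while its zone is connected, instead of answering ping.
template Host "agent-host" {
    check_command = "cluster-zone"
}

// Optional ping check, added as a separate service only where wanted.
// vars.ping_wanted is a made-up custom variable for illustration.
apply Service "ping4" {
    check_command = "ping4"
    assign where host.address && host.vars.ping_wanted
}
```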
