Master -> Satellite -> Agents and remote checks

Hi,

I followed the doc “06-distributed-monitoring/#three-levels-with-masters-satellites-and-agents” and, by manually editing the configs, managed to connect an agent that sits “behind” a satellite. (The agent is on a LAN, and the satellite is the only host on this LAN that has an internet connection.)

The agent shows as alive in icingaweb2, and when I create a new service and put it into the zones.d/satellite/ folder, it shows up.
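For readers following along, the three-level layout from that doc section could be sketched roughly as below. This is only an illustration under assumptions: the endpoint names `master1.example.com`, `satellite1.example.com` and `agent1.example.com`, the file name `agents.conf` and the address are hypothetical placeholders; only the zone names `master` and `satellite` come from this thread.

```icinga2
// zones.conf on the master (sketch; all endpoint names are placeholders)
object Endpoint "master1.example.com" { }

// No "host" attribute here: which side opens the connection depends on
// which node can reach the other one in your network.
object Endpoint "satellite1.example.com" { }

object Zone "master" {
  endpoints = [ "master1.example.com" ]
}

object Zone "satellite" {
  endpoints = [ "satellite1.example.com" ]
  parent = "master"
}

// zones.d/satellite/agents.conf on the master (hypothetical file name),
// so the satellite learns about the agent through the config sync
object Endpoint "agent1.example.com" {
  host = "192.0.2.10"   // placeholder; the satellite connects to the agent
  log_duration = 0      // no replay log needed for command endpoint agents
}

object Zone "agent1.example.com" {
  endpoints = [ "agent1.example.com" ]
  parent = "satellite"  // the agent's parent is the satellite, not the master
}
```

The point of the sketch is the `parent` chain: agent zone, then satellite zone, then master zone.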

But I’m still a bit confused on how it should work.

I have a lot of services defined in /zones.d/master/services.conf. What would be best practice to get those working against (or on) the new agent that is behind the satellite?

For example, adding this to a service that is located in the master zone gives me an ‘error’:

apply Service "check_mem" {
  import "slowcheck-service"
  display_name = "Check memory"
  check_command = "check_mem"
  vars.warning = 70
  vars.critical = 90
  command_endpoint = host.vars.client_endpoint
  assign where host.zone == "satellite" || host.vars.client_endpoint
}

[image: unknown]

Also:

apply Service "cluster-health" {
  check_command = "cluster-zone"
  display_name = "cluster-health-" + host.name
  /* This follows the convention that the client zone name is the FQDN which is the same as the host object name. */
  vars.cluster_zone = host.name
  vars.grafana_graph_disable = true
  assign where host.zone == "satellite" || host.vars.client_endpoint
}

[image: agent_not_connected]

Is my Master → Satellite → Agents setup not correct? That would make sense going by the ‘errors’ :wink: but then why is it (partially) working?

My conf setups:

( The /zones.d/satellite/zones.conf shown is located on the master host. )

Any tips on how to troubleshoot this are appreciated!

A master zone is not global (which is correct), therefore, your service definitions are not synchronized to satellites and agents.

I don’t think that’s completely right (or I don’t get it :wink:), as I see, for example, this one being synced and executed correctly:

// Ping Check
apply Service "External Pings" {
  check_command = "ping4"
  assign where host.address
}

[image: externalping]

Could you explain why that service is synchronized to the agent? Any suggestions as to what I’m doing wrong when trying to get the services above running on the agent behind the satellite?

You need to store the service definitions within a global zone e.g. global-templates. So /zones.d/master/services.conf will not work but /zones.d/global-templates/services.conf will do.
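As a rough sketch of that suggestion, the memory check from the first post could live in the global zone as shown below. It assumes the global zone is declared in zones.conf on every node, and that the `slowcheck-service` template and the `check_mem` CheckCommand are also defined inside a global zone so that the satellite and the agent know them; the thread does not show where they are defined.

```icinga2
// zones.conf on every node (master, satellite, agent)
object Zone "global-templates" {
  global = true   // contents are synced to all nodes that declare this zone
}

// zones.d/global-templates/services.conf on the master
apply Service "check_mem" {
  import "slowcheck-service"   // assumed to be defined in a global zone as well
  display_name = "Check memory"
  check_command = "check_mem"  // assumed to be defined in a global zone as well
  vars.warning = 70
  vars.critical = 90
  // run the plugin on the agent named in the host's custom variable
  command_endpoint = host.vars.client_endpoint
  assign where host.vars.client_endpoint
}
```

Apply rules, templates and commands can go into a global zone this way; the host objects themselves stay in their own zone directory, for example zones.d/satellite/.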

Okay, I moved this service (see below) to /zones.d/global-templates/services.conf and restarted the icinga2 service.

apply Service "cluster-health" {
  check_command = "cluster-zone"
  display_name = "cluster-health-" + host.name
  vars.cluster_zone = host.name
  vars.grafana_graph_disable = true
  assign where host.vars.client_endpoint
}

The endpoints appear as they did earlier, but the new agent (behind the satellite) doesn’t show up in icingaweb2.

Also, placing cluster.conf in the global zone isn’t allowed:

[2019-09-25 18:53:00 +0100] critical/config: Error: Service 'icinga!cluster' cannot be put into global zone 'global-templates'.

This service is also still showing fewer connected endpoints than expected (there should be 6):
[image: cluster_only_5_instead_of_6]

[image: uitleg (explanation)]

Is it normal that a cluster-health check doesn’t see the endpoint that is “behind” a satellite?

The satellite is connected both to my master (named icinga) and to the agent (tmkms), as the debug log shows:

Debug log:

[2019-09-27 18:26:52 +0200] notice/ApiListener: Connected endpoints: TMKMS (1) and icinga (1)

I found out that I had set “vars.agent_endpoint = name” instead of “vars.client_endpoint = name”, which is the variable I use to assign services. Fixing this resolved some of the issues I had earlier.

But there are still some unresolved questions:
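To illustrate the naming mismatch, here is a minimal sketch of an agent host object that the apply rules in this thread would match. The host name, file name and address are hypothetical placeholders; the only detail taken from the thread is that the rules test and use `host.vars.client_endpoint`.

```icinga2
// zones.d/satellite/hosts.conf on the master (hypothetical file name)
object Host "agent1.example.com" {
  check_command = "hostalive"
  address = "192.0.2.10"        // placeholder address

  // The apply rules use "assign where host.vars.client_endpoint" and
  // "command_endpoint = host.vars.client_endpoint", so the variable must
  // carry exactly this name. With vars.agent_endpoint the rules do not match.
  vars.client_endpoint = name   // "name" is this host object's name
}
```

Because `command_endpoint` takes this value, the host object name has to equal the agent's Endpoint object name, including upper and lower case.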

  1. Why does the cluster check only show the master endpoint, and why does it not count the agent (behind the satellite)? Or how can I make it count/check this agent too?

  2. I have a lot of services in my master zone and some in the satellite zone. Strangely enough, Icinga is also trying to run those master services against my agent. I thought only global services or zone-specific services were executed there (as explained by @rsx). What am I doing wrong?
    This is what it shows. Those services exist on my master in the master zone:
    [image: unkown_from_master_zone]

  3. I found out that “icinga2 daemon -C” didn’t give me errors on the satellite, but the startup log gave me the errors I needed earlier ( /var/lib/icinga2/api/zones-stage ). Still learning :wink:
    Shouldn’t icinga2 somehow check this before it starts on the satellite, or at least give some sort of feedback? Or are there other ways to find those errors when deploying services to a satellite?

Findings:

  1. I don’t know the answer, but I created two cluster checks: one for the master and one for the satellite.

  2. I still don’t know.

  3. I think I know how it works… but I guess deploying services on the master could result in errors on a satellite without really knowing that in advance.
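The workaround from finding 1 might look something like the sketch below. It rests on an assumption that is not confirmed in this thread: that the built-in `cluster` check only counts endpoints the executing node is itself connected to, which would fit the 5-of-6 observation above. The host name `icinga` comes from the error message earlier in the thread; `satellite1.example.com` and the file names are hypothetical, and the satellite's host object is assumed to be defined in zones.d/satellite/ as well.

```icinga2
// zones.d/master/cluster.conf: checked by the master itself
object Service "cluster" {
  host_name = "icinga"
  check_command = "cluster"
}

// zones.d/satellite/cluster.conf: synced to and checked by the satellite,
// which is the node that actually holds the connection to the agent
object Service "cluster" {
  host_name = "satellite1.example.com"   // placeholder satellite host object
  check_command = "cluster"
}
```

Both are plain Service objects in non-global zones, which avoids the "cannot be put into global zone" error quoted earlier.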