How to tell the Director to use a specific host for expensive checks?

I have a bunch of expensive checks that run directly on the master nodes.
As I'm experiencing some performance bottlenecks, I would like to delegate the execution to a different host that has the agent and the check plugins installed.

So far I've played around with zones and endpoints, but I'm going crazy: the Director already creates a zone object for the agent, but it isn't usable as a parent for the "ping only" hosts that I need to run the checks against.

I guess in the end, the question is: how to tell the director that an agent is now a satellite?

When you put its host object into its own zone, aka top-down config sync.

What about creating those service objects with Run on agent?

Run on agent isn't an option, as the hosts are switches and I only ping them directly from Icinga. The expensive checks I want to move away from the master zone check the aggregate of alerts and the health of the switches in LibreNMS.

I tried to put the hosts without an agent into the zone of the host with an agent (the satellite-to-be), but failed, as the Director only offered me the two master nodes.

I also tried to define zones and endpoints to make it a satellite - even via direct manipulation of the Director DB - but failed.

Sometimes I could do what I wanted in the Director GUI, but I never got such configs to deploy; I created a lot of configs that failed deployment.
I even removed the switches and their template from the Director so I could set the command_endpoint to the would-be satellite host, but this also didn't work.

So, in essence: how do I turn a normal host like this into a satellite with the Director?

object Host "" {
    import "tpl-host-linux"
    display_name = "would-be-satellite"
    address = ""
}

object Endpoint "" {
    host = ""
    log_duration = 0s
}

object Zone "" {
    parent = "master"
    endpoints = [ "" ]
}
Does somebody use the Director with satellites, and how does this work?

The director does not allow top down config sync.

By "host object" I meant the machine that shall run your checks. This machine has an agent installed, and if you define services with "run on agent", they are scheduled from its parent but executed on that agent.

That would be very sad, as it would mean we will soon have outgrown the Director. Also, why would it allow the creation of zones and endpoints?

This would work, but then I would have 700 services on one host instead of 350 hosts (switches) with 2 services each, and a disconnect between the host and service objects of the switches.

I’ve absolutely no idea.

Yes, we did (and had huge problems after some time), and we switched to defining satellite zones and endpoints in the masters' zones.conf files.

I second that. The Director even has a (cryptic) warning about that feature.

If you want your agent to behave like a satellite, you need to add it to the master's zones.conf. After that you can use it as a zone inside the Director (after a kickstart).

And then it's basic Icinga behavior: every check without a command_endpoint/"run on agent" setting will be executed on the satellite, for each host you put into the satellite's zone.
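For reference, a minimal zones.conf fragment on the master might look like this (the endpoint name and address are placeholders, not from this thread):

```
// zones.conf on the master (hypothetical names)
object Endpoint "satellite1.example.com" {
  host = "192.0.2.10"   // address the master connects to
}

object Zone "satellite-zone" {
  parent = "master"
  endpoints = [ "satellite1.example.com" ]
}
```

After an Icinga reload and a Director kickstart, "satellite-zone" should then be selectable as a cluster zone for hosts in the Director.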


I’ve missed this question:

Yes, one of our customers has been running such a distributed environment smoothly since 2018. For adding satellites you need to manually add them to zones.conf on the master and run the kickstart wizard after Icinga's reload.


Ok, now I understand your scenario. I'd recommend adding a satellite (or even two, for HA and more compute power) and moving those 350 host objects into the satellite zone.


I got it working!

  1. Edited zones.conf on the masters and on the satellite-to-be.
  2. Had to disable the satellite-to-be in the Director.
  3. Deployed the config.
  4. Re-ran the Director kickstart.
  5. Assigned the hosts to the new zone.
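The satellite side of step 1 could look roughly like this - all names and addresses here are illustrative, not taken from the thread:

```
// zones.conf on the satellite-to-be (hypothetical names)
object Endpoint "master1.example.com" {
  host = "192.0.2.1"    // the satellite connects up to the master
}

object Zone "master" {
  endpoints = [ "master1.example.com" ]
}

object Endpoint "satellite1.example.com" {
  // this node itself
}

object Zone "satellite-zone" {
  parent = "master"
  endpoints = [ "satellite1.example.com" ]
}
```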

Then the debugging started :wink:
I had to watch /var/lib/icinga2/api/zones-stage-startup-last-failed.log on the new satellite for errors and fix them one after the other.
This required moving one template at a time to a global zone - or the other way, into the master zone - to reduce dependencies (there were apply rules), then deploying and checking the log again for errors. I also had to clone some service templates, as I didn't want to deploy a token to every host with an agent installed.
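Moving shared templates into a global zone means defining that zone in zones.conf on every node, so its config gets synced everywhere; a minimal sketch (zone name is an example):

```
// zones.conf on masters and satellites alike
object Zone "global-templates" {
  global = true   // config in this zone is synced to all nodes
}
```

Note that the Director kickstart already provides its own global zone ("director-global"), which can serve the same purpose for Director-managed templates.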