Agent host/service config setup at scale

We are running 2.11 in a distributed cluster with 2 HA masters and 6 satellite zones (each with 2 servers), and we’re looking into setting up agents to run the checks locally themselves. From the documentation, it appears the agent host and service objects get added under the master zone.

The question I have is about scale: if we have 5000 agent hosts, won’t that impact the masters’ performance even if there are only 3 services per agent, especially considering each existing satellite zone already has 2000 hosts and 17000 services (we already have plans to scale horizontally to more satellite zones)? The number of endpoints the master reports in the cluster would also include all the agents. The other option we’ve been trying is to have each agent be a zone unto itself, but then the endpoints need to be defined on both masters in zones.conf (based on the docs), which then becomes quite large.

Just looking for some advice on where agent host and service objects should live and how best to deploy at large scale.

I have a similar-sized setup and the masters are doing okay. We follow the general practice of one config master. It pushes to the satellites, and those push to the agents, which keeps the load from being too high, but you definitely want to ramp up the memory and CPU cores on your masters if you have a lot of remote checks (I’ve got Python scripts parsing JSON every 90 seconds). I have 8 cores and 12 GB of RAM on each of my masters, with 7 satellite zones, 3,000 hosts, and close to 60,000 service checks. Just keep an eye on your resources as you scale out and implement more complicated checks, and the best practices in the documentation should serve you fine.

@blakehartshorn - do you have all the agent hosts/endpoints defined under zones.d/master, as well as the services for them? Or do they each get their own zone under zones.d/agentA, zones.d/agentB… or go in zones.conf on both masters? That’s what’s frustrating me at the moment - where they should reside.

It’s a little bit of everywhere:

https://icinga.com/docs/icinga2/latest/doc/06-distributed-monitoring/#three-levels-with-masters-satellites-and-agents

master1, zones.d with subdirectories for each zone populated with zone/endpoint/host objects for all agents and satellites.

master2, zones.conf with information about itself and its sibling.

satellites, zones.conf with each other, + the parent endpoints and zone for master.

agents, zones.conf with their own zone/endpoint with the master or satellite as parent.
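
To make the agent piece concrete, a minimal zones.conf on an agent parented to a satellite zone might look roughly like this (all hostnames and zone names here are placeholders, swap in your own):

// zones.conf on the agent; connect out to both satellite endpoints
object Endpoint "sat1.example.com" {
  host = "sat1.example.com"
}

object Endpoint "sat2.example.com" {
  host = "sat2.example.com"
}

object Zone "satellite-zone1" {
  endpoints = [ "sat1.example.com", "sat2.example.com" ]
}

// the agent is its own zone, with the satellite zone as parent
object Endpoint "agent1.example.com" {
}

object Zone "agent1.example.com" {
  endpoints = [ "agent1.example.com" ]
  parent = "satellite-zone1"
}

// global zones must be declared on every node that should receive them
object Zone "global-templates" {
  global = true
}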

I’d follow that document very closely if you’re setting this up for the first time, and either script out zones.conf or use a configuration management tool to roll it out to all the nodes; the configuration for all services goes on the master. I recommend using the global-templates zone folder for all your services that go everywhere.
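
As a sketch of the global-templates approach, one apply rule there can target agents via a host var and run the check on the agent with command_endpoint - the agent_endpoint var name is just the convention from the docs, use whatever you like:

// zones.d/global-templates/services.conf on the config master
apply Service "load" {
  check_command = "load"
  // execute on the agent itself rather than the satellite
  command_endpoint = host.vars.agent_endpoint
  // only applies to hosts that define the var
  assign where host.vars.agent_endpoint
}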

Thanks for the tip about putting services in global-templates - will try that next.

master2, zones.conf with information about itself and its sibling. = have it
satellites, zones.conf with each other, + the parent endpoints and zone for master. = have it
agents, zones.conf with their own zone/endpoint with the master as parent. = have it

So I think I’m close. The issue I’m trying to understand is on the primary master - are you saying that each agent is its own zone under zones.d:

zones.d/
   |_agent1/agent1.host.conf (with host+endpoint and parent = master)
   |_agent2/agent2.host.conf
   ...
   |_agentN/agentN.host.conf

Do all the agent endpoints need to be defined somewhere else too?

The documentation has all the agents’ host/endpoint and service objects under the master zone, which we have done, but I’m hesitant about that from a performance/administration view, especially at scale (a simple ‘ls’ would be terrible).

Thanks for the help.

Hi Pete,

The agents don’t go into their own zone folders; they go into the folders of their parent zone. Here’s what my zones.d on the configuration master looks like (I don’t think this is information that will get me in trouble lol):

drwx------ 2 icinga icinga  4096 Feb  4 13:46 aussatellite
drwx------ 2 icinga icinga  4096 Feb  6 16:16 bossatellite
drwx------ 2 icinga icinga  4096 Feb 20 16:17 gersatellite
drwx------ 4 icinga icinga  4096 Feb 25 10:36 global-templates
drwx------ 2 icinga icinga 12288 Mar  5 14:56 lassatellite
drwx------ 4 icinga icinga  4096 Mar  4 14:49 master
-rw-r----- 1 icinga icinga   133 May 23  2019 README
drwx------ 2 icinga icinga  4096 Mar  4 14:49 sngsatellite
drwx------ 2 icinga icinga  4096 Feb 20 16:47 uksatellite

So I have a bunch of host and zone conf files in the appropriate folders there, defining the agents for each data center; they’re named after their server type strictly for my own organization (a flat file would work). You do need host/zone/endpoint objects for anything running the agent. Note that the other DCs aren’t nested inside of master; that’s normal. There’s a folder per master or satellite zone, named after the relevant zone.
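
For example, one of those per-agent files might look roughly like this - names, parent zone, and address here are illustrative, not my actual config:

// zones.d/lassatellite/agent1.conf on the config master
object Endpoint "agent1.example.com" {
}

object Zone "agent1.example.com" {
  endpoints = [ "agent1.example.com" ]
  parent = "lassatellite"
}

object Host "agent1.example.com" {
  check_command = "hostalive"
  address = "192.0.2.10"
  // lets apply rules schedule checks on this agent
  vars.agent_endpoint = name
}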

Right, we have something similar for the satellites:

zones.d/
    |_master/
            |_satellite-zone1.conf (2 endpoints)
            |_satellite-zone2.conf (2 endpoints)
    |_satellite-zone1/
            |_satellite-zone1.conf (2 sat endpoints, 2 hosts & master parent)
            |_hostA.conf (host, its own endpoint & satellite parent)
            |_hostB.conf
            |_services.conf (services run by this satellite)
    |_satellite-zone2/
            |_<same as satellite-zone1>
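
where the master-level file is just the satellite zone and its endpoints, something like (names illustrative):

// zones.d/master/satellite-zone1.conf
object Endpoint "sat1.example.com" {
  host = "sat1.example.com"
}

object Endpoint "sat2.example.com" {
  host = "sat2.example.com"
}

object Zone "satellite-zone1" {
  endpoints = [ "sat1.example.com", "sat2.example.com" ]
  parent = "master"
}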

The only way I’ve been able to get local agents working thus far is to add them under the master zone, which at scale is making me nervous:

|_master/
      |_satellite-zone1.conf (endpoints)
      |_satellite-zone2.conf (endpoints)
      |_agent1.conf (endpoint, host, parent master)
      |_agent200.conf (endpoint, host, parent master)
      |_agent.services.conf (services that run on agents)
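
where each agentN.conf carries the endpoint, zone, and host together, e.g. (names illustrative):

// zones.d/master/agent1.conf
object Endpoint "agent1.example.com" {
}

object Zone "agent1.example.com" {
  endpoints = [ "agent1.example.com" ]
  parent = "master"
}

object Host "agent1.example.com" {
  check_command = "hostalive"
  address = "192.0.2.10"
  vars.agent_endpoint = name
}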

If that’s the only way, then I’ll have to figure out increasing resources on the masters.