Documentation for Distributed Monitoring with Director?

Great!

Right, so technically, for me an agent is the node that connects to the satellite (agent).

Yes, this is confusing, as the terms agent and agent mean two different things and they are both right, since there is also an agent on the master node :stuck_out_tongue: but we just call it the master.

So I like to keep it simple

  • master
  • satellite (optional)
  • agent (the server you want to monitor)

Anyhow, that still does not solve your initial question, I suppose!
Back to post 1:

Yes, the master receives the results, after it tells the satellite the configuration. The satellite will then ask the agent to do the check :slight_smile:

zone files, configuration and logs mostly…

By putting them in the proper folder on the master, in /zones.d/zone_name, the configuration goes to the satellite and from there to the agent. The agent needs to allow the satellite's incoming connection in its configuration for this to work.
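
A minimal sketch of what "allowing the satellite" could look like on the agent side, assuming the standard file locations /etc/icinga2/zones.conf and /etc/icinga2/features-enabled/api.conf. The names agent1.example.org, satellite1.example.org and the zone name "satellite" are hypothetical placeholders, not taken from this thread:

```
// /etc/icinga2/zones.conf on the agent (all names are placeholders)

// The agent itself
object Endpoint "agent1.example.org" {
}

// The satellite; no "host" attribute here, because in this sketch
// the satellite opens the connection to the agent
object Endpoint "satellite1.example.org" {
}

object Zone "satellite" {
  endpoints = [ "satellite1.example.org" ]
}

// The agent's own zone, with the satellite zone as its parent
object Zone "agent1.example.org" {
  endpoints = [ "agent1.example.org" ]
  parent = "satellite"
}

// /etc/icinga2/features-enabled/api.conf on the agent
object ApiListener "api" {
  accept_config = true    // accept configuration synced from the parent zone
  accept_commands = true  // accept check commands sent by the parent zone
}
```

Whether the agent or the satellite initiates the connection depends on your firewall rules; the side that connects is the one that needs the "host" attribute on the other endpoint.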

Hope it helps :slight_smile:

This is excellent information. I really appreciate you sticking with me on this and helping clear things up. I am starting to gain traction on all of this.

I have gone through the configuration both on the command line and through director and spotted a few mistakes I made. I have corrected them but I am still unable to get a successful ping/ssh/http/winrm check on a host that only the satellite should be able to ping/ssh/http/winrm. I have confirmed that I can ping the host in question from the satellite too. If I can get this demo host working then I can take everything I have learned and start applying it everywhere else.

I did everything through Director, but I have been using the preview tab to see where it saves the files and then inspecting them on the command line to see what is going on and to learn what it does.

Here is what my dashboard is showing:

Here is what my config is showing:

root@AWS-NOC-02:/var/lib/icinga2/api/zones/WDC/director# cat host_templates.conf
template Host "WDC-basic-windows" {
    check_command = "hostalive"
    command_endpoint = "WDC-NOCSat-01"
}

template Host "WDC-switches" {
    check_command = "hostalive"
    command_endpoint = "WDC-NOCSat-01"
}

root@AWS-NOC-02:/var/lib/icinga2/api/zones/WDC/director# cat hosts.conf
object Host "WDC-NOCSat-01" {
    import "WDC-NOCSat-Template"

    display_name = "WDC Icinga Satellite"
    address = "192.100.34.190"
}

object Host "WDC333SW01" {
    import "WDC-switches"

    display_name = "Meraki Switch Test"
    address = "192.100.16.89"
}

object Host "WDC-BOUNCE-01" {
    import "WDC-basic-windows"

    display_name = "WDC-BOUNCE-01"
    address = "192.100.34.61"
}

root@AWS-NOC-02:/var/lib/icinga2/api/zones/WDC/director# cat servicesets.conf
/**
 * Service Set: basic-windows-services
 * on host WDC-basic-windows
 */

apply Service "check-ping" {
    import "check-ping"


    assign where "WDC-basic-windows" in host.templates
    zone = "WDC"

    import DirectorOverrideTemplate
}

apply Service "check-winrm " {
    import "check-winrm "


    assign where "WDC-basic-windows" in host.templates
    zone = "WDC"

    import DirectorOverrideTemplate
}

So it would appear that these two test agents are being checked by the correct satellite, which is good news. Assuming that my hosts are set up correctly, does that mean I set up my services wrong?

cat /var/lib/icinga2/api/zones/director-global/director/service_templates.conf

template Service "check-ssh" {
    check_command = "ssh"
    command_endpoint = host_name
}

template Service "check-http" {
    check_command = "http"
    command_endpoint = host_name
}

template Service "check-ping" {
    check_command = "hostalive"
    command_endpoint = null
}

template Service "check-winrm " {
    check_command = "tcp"
    vars.tcp_port = "5985"
}

Other than check-ping being null I am not seeing anything that stands out?

cat /var/lib/icinga2/api/zones/director-global/director/servicesets.conf
/**
 * Service Set: basic-linux-services
 *
 * HTTP/PING/SSH
 */

/**
 * Service Set: basic-linux-services
 * on host basic-linux
 */

apply Service "check-http" {
    import "check-http"


    assign where "basic-linux" in host.templates

    import DirectorOverrideTemplate
}

apply Service "check-ssh" {
    import "check-ssh"


    assign where "basic-linux" in host.templates

    import DirectorOverrideTemplate
}

apply Service "check-ping" {
    import "check-ping"


    assign where "basic-linux" in host.templates

    import DirectorOverrideTemplate
}

/**
 * Service Set: basic-windows-services
 *
 * ping and winrm
 */

/**
 * Service Set: basic-ping
 *
 * ping check
 */

Hopefully I am troubleshooting this correctly. It's starting to all come together.

I was following this guide while doing this

Other than check-ping being null I am not seeing anything that stands out?

Right, so if you look at your screenshot, you will see ping being “not connected”,
which is not typically the error message you get from a ping service. This indicates it is trying to connect to the Icinga process on the agent WDC-BOUNCE-01, which then tries to ping itself, and that is in 100% of cases not useful :slight_smile:

You want the first piece of icinga infrastructure above the agent to ping the agent. (satellite or master)

So it being null in your config is definitely a problem here (probably the same for winrm/ssh; it is not very useful if the agent tries to check itself!).

It's kinda like having a doctor as a parent, googling your symptoms yourself, and saying “I am fine” :smiley:

Your switch seems to have the right ping source?

Anyway, hope it helps you a bit further. Don't forget to lurk in the OG documentation:

Good Afternoon William,

I hope you had a wonderful and safe Thanksgiving!

So your response brings us back full circle to my original desire of doing monitoring without having to go around and install the icinga agent on all of my hosts. I thought that the Satellites in the zones of the agents would be able to do the basic ssh/ping/http/winrm checks?

Hi, no Thanksgiving here in the EU, but thanks for asking. Yourself?

I love a good circle :slight_smile:

Right, so you could let your satellite do all the heavy lifting. Whether it's a good thing to do?
That depends.

In any case, you are looking for the command_endpoint setting. There is one in your host template that you can set to your satellite, which should give you the behavior you want.

Quiet and peaceful :slightly_smiling_face:

So I am glad that you mentioned command_endpoint as I told my host template to use the Satellite but was unsure if it was the correct thing to do. Now I know it was which is good.

object Host "WDC-BOUNCE-01" {
    import "wdc-basic-windows"

    display_name = "WDC-BOUNCE-01"
    address = "10.10.10.10"
}

cat /var/lib/icinga2/api/zones/AWS-NOC-02/director/endpoints.conf
object Endpoint "WDC-NOCSat-01" {
    host = "10.10.10.190" 
}

I went to the Satellite and confirmed that the password in /etc/icinga2/conf.d/api-users.conf was correct

/**
 * The APIUser objects are used for authentication against the API.
 */
object ApiUser "WDC" {
  password = "SuperSecretPassword123!"
  // client_cn = ""

  permissions = [ "*" ]
}

I then took that password and verified on master that the API user had that password.

Not sure where else to look to be honest.

Hi There,

So I have done some digging in my configuration :slight_smile: (getting this to work for myself was quite something, I do remember).

We offload our “hostalive” to our satellite. We do that by NOT setting command_endpoint for this particular service; otherwise we set it to

command_endpoint = host.vars.agent_endpoint

in our service template
which points to:

vars.agent_endpoint = name

in our host config
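
Put together, the pattern could look roughly like the sketch below. This is only an illustration for hosts that actually run the Icinga 2 agent; the host name agent1.example.org, the address, and the choice of the ssh check are hypothetical, and an Endpoint object with the same name as the host must also exist for it to validate:

```
// Host that runs the Icinga 2 agent (name and address are placeholders)
object Host "agent1.example.org" {
  check_command = "hostalive"
  address = "192.0.2.10"

  // "name" resolves to the host object's own name,
  // so the endpoint is expected to be called the same as the host
  vars.agent_endpoint = name
}

// Executed on the agent, because command_endpoint points at it
apply Service "ssh" {
  check_command = "ssh"
  command_endpoint = host.vars.agent_endpoint

  // Only hosts that define an agent endpoint get this service
  assign where host.vars.agent_endpoint
}
```

Hosts without vars.agent_endpoint are skipped by the assign rule, which keeps agentless devices such as switches out of agent-executed checks.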

So in order to get the check to execute on your satellite for a service you need to either

For our satellite host we had to override it to the master.
In the service template you would set:

vars.command_endpoint = "master"

Now this is all without the director :tada:

So I spun up my virtualbox for you and this is probably what you want to set to “no”

which generates:

template Service "test-s" {
    command_endpoint = null
}

(leaving it empty)
That should now trigger the check from the host above the agent (a satellite in your case)

Hope it helps, It did for me :smiley:

Good Morning!

I hope you had an excellent weekend.

So I tried to set this manually but after restarting the icinga service using systemctl restart icinga, director changed the file back. I guess this is to be expected and actually a good thing in a way. I poked around the WebUI but could not find any way to manually change this.

I am assuming that you are referring to /etc/icinga2/conf.d/hosts.conf. If so, I checked in there and I do not have vars.agent_endpoint = name set in that file. I will go ahead and add it.

object Host NodeName {
  /* Import the default host template defined in `templates.conf`. */
  import "generic-host"

  /* Specify the address attributes for checks e.g. `ssh` or `http`. */
  address = "127.0.0.1"
  address6 = "::1"

  /* Set custom attribute `os` for hostgroup assignment in `groups.conf`. */
  vars.os = "Linux"

  /* Define http vhost attributes for service apply rules in `services.conf`. */
  vars.http_vhosts["http"] = {
    http_uri = "/"
  }
  /* Uncomment if you've sucessfully installed Icinga Web 2. */
  //vars.http_vhosts["Icinga Web 2"] = {
  //  http_uri = "/icingaweb2"
  //}

  /* Define disks and attributes for service apply rules in `services.conf`. */
  vars.disks["disk"] = {
    /* No parameters. */
  }
  vars.disks["disk /"] = {
    disk_partitions = "/"
  }

  /* Define notification mail attributes for notification apply rules in `notifications.conf`. */
  vars.notification["mail"] = {
    /* The UserGroup `icingaadmins` is defined in `users.conf`. */
    groups = [ "icingaadmins" ]
  }

  /* For use with our service template */
  vars.agent_endpoint = name

}

(This change persisted through a restart, probably because it is not in the “director” folder, i.e. Director doesn't control it.)

Did you mean hosts config on your master and if not which configuration are you referring to?

root@AWS-NOC-02:~# locate hosts.conf
/etc/icinga2/conf.d/hosts.conf
/var/lib/icinga2/api/packages/director/AWS-NOC-02-1606330592-0/zones.d/WDC/hosts.conf
/var/lib/icinga2/api/zones/WDC/director/hosts.conf

/var/lib/icinga2/api/zones/WDC/director/hosts.conf is the file that contains the configuration for the satellite

object Host "WDC-NOCSat-01" {
    import "WDC-NOCSat-Template"

    display_name = "WDC Icinga Satellite"
    address = "10.10.10.20"
}

Are you suggesting I change this to be

object Host "WDC-NOCSat-01" {
    import "WDC-NOCSat-Template"

    display_name = "WDC Icinga Satellite"
    address = "10.10.10.20"

    vars.command_endpoint = "master"
}

Although as I type this I am realizing that what I said above about director changing the file back when Icinga is restarted will probably also happen here. So no manual changes can be made to the files which is a bummer.




Regardless, after making the suggested change of setting run on agent to no on the service templates my service_templates.conf now looks like this

template Service "check-ssh" {
    check_command = "ssh"
    command_endpoint = null
}

template Service "check-http" {
    check_command = "http"
    command_endpoint = null
}

template Service "check-ping" {
    check_command = "hostalive"
    command_endpoint = null
}

template Service "check-winrm " {
    check_command = "tcp"
    command_endpoint = null
    vars.tcp_port = "5985"
}

However it is still showing the master as the one doing the checks

I feel like we are really close to figuring this out. Each post gets me further and teaches me something new so again, thank you for all your help. I created a simulated multi-site setup on my homelab using firewall rules and vlans so that I can work through all of this from scratch and document the whole process. If you are alright with it, I plan on giving a special mention / shoutout to you at the bottom of each part of the guide so that if anyone ever reads my guide, they will know the true source of my knowledge :wink: .

Hi @hewithaname,

this:

template Service "check-ssh" {
    check_command = "ssh"
    command_endpoint = null
}

Is great stuff. If you change the template to yes, this will change for the specific service of your choosing. It should then monitor your service on the specific host, and that should be the end result you are looking for?

That is the end result yes. So I changed the run on agent in check-ssh template to yes, I was able to deploy the configuration just fine.

zones.d/director-global/service_templates.conf
template Service "check-ssh" {
    check_command = "ssh"
    command_endpoint = host_name
}
information/cli: Icinga application loader (version: r2.8.1-1)
information/cli: Loading configuration file(s).
information/ConfigItem: Committing config item(s).
information/ApiListener: My API identity: AWS-NOC-02
warning/ApplyRule: Apply rule 'check-http' 
  (in [stage]/zones.d/director-global/servicesets.conf: 12:1-12:26) for type 'Service' does not match anywhere!
warning/ApplyRule: Apply rule 'check-ssh' 
  (in [stage]/zones.d/director-global/servicesets.conf: 21:1-21:25) for type 'Service' does not match anywhere!
warning/ApplyRule: Apply rule 'check-ping' 
  (in [stage]/zones.d/director-global/servicesets.conf: 30:1-30:26) for type 'Service' does not match anywhere!
information/ConfigItem: Instantiated 1 ApiListener.
information/ConfigItem: Instantiated 4 Zones.
information/ConfigItem: Instantiated 2 Endpoints.
information/ConfigItem: Instantiated 2 ApiUsers.
information/ConfigItem: Instantiated 1 FileLogger.
information/ConfigItem: Instantiated 13 Notifications.
information/ConfigItem: Instantiated 2 NotificationCommands.
information/ConfigItem: Instantiated 209 CheckCommands.
information/ConfigItem: Instantiated 1 Downtime.
information/ConfigItem: Instantiated 2 HostGroups.
information/ConfigItem: Instantiated 1 IcingaApplication.
information/ConfigItem: Instantiated 4 Hosts.
information/ConfigItem: Instantiated 2 Comments.
information/ConfigItem: Instantiated 1 UserGroup.
information/ConfigItem: Instantiated 1 User.
information/ConfigItem: Instantiated 3 TimePeriods.
information/ConfigItem: Instantiated 17 Services.
information/ConfigItem: Instantiated 3 ServiceGroups.
information/ConfigItem: Instantiated 1 ScheduledDowntime.
information/ConfigItem: Instantiated 1 CheckerComponent.
information/ConfigItem: Instantiated 1 ExternalCommandListener.
information/ConfigItem: Instantiated 1 IdoMysqlConnection.
information/ConfigItem: Instantiated 1 NotificationComponent.
information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
information/cli: Finished validating the configuration file(s).

I applied the same change to the check-http service template and that had the same result as above.

However, when I made the same changes to check-ping and check-winrm it failed with the following error:

information/cli: Icinga application loader (version: r2.8.1-1)
information/cli: Loading configuration file(s).
information/ConfigItem: Committing config item(s).
information/ApiListener: My API identity: AWS-NOC-02
critical/config: Error: Validation failed for object 'WDC-BOUNCE-01!check-ping' of type 'Service'; Attribute 'command_endpoint': Object 'WDC-BOUNCE-01' of type 'Endpoint' does not exist.
Location: in [stage]/zones.d/director-global/service_templates.conf: 13:5-13:32
[stage]/zones.d/director-global/service_templates.conf(11): template Service "check-ping" {
[stage]/zones.d/director-global/service_templates.conf(12):     check_command = "hostalive"
[stage]/zones.d/director-global/service_templates.conf(13):     command_endpoint = host_name
                                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[stage]/zones.d/director-global/service_templates.conf(14): }
[stage]/zones.d/director-global/service_templates.conf(15): 

critical/config: 1 error

It is really weird that it is working for two of them but not the other two? I wonder if it is because those two checks are not being used since WDC-Bounce-01 is a Windows Server. Let me add a Linux server and see what happens.

Yup!

Now I am getting the same error for the SSH and HTTP service checks that I was for the winrm and ping.

information/cli: Icinga application loader (version: r2.8.1-1)
information/cli: Loading configuration file(s).
information/ConfigItem: Committing config item(s).
information/ApiListener: My API identity: AWS-NOC-02
critical/config: Error: Validation failed for object 'WDC-UTIL-01!check-http' of type 'Service'; Attribute 'command_endpoint': Object 'WDC-UTIL-01' of type 'Endpoint' does not exist.
Location: in [stage]/zones.d/director-global/service_templates.conf: 8:5-8:32
[stage]/zones.d/director-global/service_templates.conf(6): template Service "check-http" {
[stage]/zones.d/director-global/service_templates.conf(7):     check_command = "http"
[stage]/zones.d/director-global/service_templates.conf(8):     command_endpoint = host_name
                                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[stage]/zones.d/director-global/service_templates.conf(9): }
[stage]/zones.d/director-global/service_templates.conf(10): 

critical/config: Error: Validation failed for object 'WDC-UTIL-01!check-ssh' of type 'Service'; Attribute 'command_endpoint': Object 'WDC-UTIL-01' of type 'Endpoint' does not exist.
Location: in [stage]/zones.d/director-global/service_templates.conf: 3:5-3:32
[stage]/zones.d/director-global/service_templates.conf(1): template Service "check-ssh" {
[stage]/zones.d/director-global/service_templates.conf(2):     check_command = "ssh"
[stage]/zones.d/director-global/service_templates.conf(3):     command_endpoint = host_name
                                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[stage]/zones.d/director-global/service_templates.conf(4): }
[stage]/zones.d/director-global/service_templates.conf(5): 

critical/config: 2 errors

At first your config works fine because the service is not applied anywhere :slight_smile:
(ping, ssh and http)

So when you apply the checks to hosts, you now have the problem that your host is missing something like this :

object Endpoint "{{ inventory_hostname }}" {
  host = "{{ inventory_hostname }}"
  log_duration = 0
}

I would look in your host templates perhaps to see if you can configure this.
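
For a concrete picture of what the validator is asking for, here is a hedged sketch of the objects that would satisfy command_endpoint = host_name for WDC-UTIL-01. It assumes the Icinga 2 agent is really installed on that host and that the satellite zone is called "WDC"; if the host is meant to stay agentless, these objects should not be created at all:

```
// Only meaningful if WDC-UTIL-01 runs the Icinga 2 agent
object Endpoint "WDC-UTIL-01" {
  host = "10.10.10.30"   // address taken from the host object; adjust as needed
}

// One zone per agent, attached below the satellite zone
object Zone "WDC-UTIL-01" {
  endpoints = [ "WDC-UTIL-01" ]
  parent = "WDC"
}
```

Without such an Endpoint object, any service whose command_endpoint resolves to the host name fails validation with "Object ... of type 'Endpoint' does not exist", which matches the errors above.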

(I am thinking out loud at this point. Trying to enumerate through everything and check each little thing in an effort to debug this :slight_smile: )
Endpoints are in Director -> infrastructure

This is what mine looks like.

zones.d/AWS-NOC-02/endpoints.conf
object Endpoint "WDC-NOCSat-01" {
    host = "10.10.10.20"
}

When I try to inspect it, I get this error. Maybe this is the core issue?


It is saying connection timed out and yet it is reachable according to the dashboard…interesting

root@AWS-NOC-02:/var/lib/icinga2/api/zones/AWS-NOC-02/director# ls
endpoints.conf  zones.conf
root@AWS-NOC-02:/var/lib/icinga2/api/zones/AWS-NOC-02/director# cat zones.conf
object Zone "WDC" {
    parent = "AWS-NOC-02"
    endpoints = [ "WDC-NOCSat-01" ]
}

root@AWS-NOC-02:/var/lib/icinga2/api/zones/AWS-NOC-02/director# cat endpoints.conf
object Endpoint "WDC-NOCSat-01" {
    host = "10.10.10.20"
}

Alright I just noticed something interesting. If I go to the dashboard and click on the host then it will show its check source as being the Satellite.


But if I click on the service (check-http) then it shows its check source as the master.

That's really weird.

Let’s take a step back and look at all the configuration made at this point. I have only ever used Director and nothing else, so all the conf files should be in the /var/lib/icinga2/api/zones directory.

root@AWS-NOC-02:/var/lib/icinga2/api/zones# ls AWS-NOC-02/director/
endpoints.conf  hosts.conf  zones.conf
root@AWS-NOC-02:/var/lib/icinga2/api/zones# cat AWS-NOC-02/director/endpoints.conf
object Endpoint "WDC-NOCSat-01" {
    host = "10.10.10.20"
}

root@AWS-NOC-02:/var/lib/icinga2/api/zones# cat AWS-NOC-02/director/hosts.conf
object Host "WDC-UTIL-01" {
    import "basic-linux"

    display_name = "WDC Linux Server"
    address = "10.10.10.30"
}

root@AWS-NOC-02:/var/lib/icinga2/api/zones# cat AWS-NOC-02/director/zones.conf
object Zone "WDC" {
    parent = "AWS-NOC-02"
    endpoints = [ "WDC-NOCSat-01" ]
}

root@AWS-NOC-02:/var/lib/icinga2/api/zones# ls WDC/director/
hosts.conf
root@AWS-NOC-02:/var/lib/icinga2/api/zones# cat WDC/director/hosts.conf
object Host "WDC-NOCSat-01" {
    import "WDC-NOCSat-Template"

    display_name = "WDC Icinga Satellite"
    address = "10.10.10.20"
}

object Host "WDC112SW01" {
    import "wdc-switches"

    display_name = "Meraki Switch Test"
    address = "10.10.40.80"
}

object Host "WDC-BOUNCE-01" {
    import "wdc-basic-windows"

    display_name = "WDC-BOUNCE-01"
    address = "10.10.10.50"
}

root@AWS-NOC-02:/var/lib/icinga2/api/zones# ls director-global/director/
001-director-basics.conf  endpoint_templates.conf  host_templates.conf  service_templates.conf  servicesets.conf
root@AWS-NOC-02:/var/lib/icinga2/api/zones# cat director-global/director/001-director-basics.conf

const DirectorStageDir = dirname(dirname(current_filename))

globals.directorWarnedOnceForThresholds = false;
globals.directorWarnOnceForThresholds = function() {
    if (globals.directorWarnedOnceForThresholds == false) {
        globals.directorWarnedOnceForThresholds = true
        log(LogWarning, "config", "Director: flapping_threshold_high/low is not supported in this Icinga 2 version!")
    }
}

const DirectorOverrideTemplate = "host var overrides (Director)"
if (! globals.contains(DirectorOverrideTemplate)) {
  const DirectorOverrideVars = "_override_servicevars"

  globals.directorWarnedOnceForServiceWithoutHost = false;
  globals.directorWarnOnceForServiceWithoutHost = function() {
    if (globals.directorWarnedOnceForServiceWithoutHost == false) {
      globals.directorWarnedOnceForServiceWithoutHost = true
      log(
        LogWarning,
        "config",
        "Director: Custom Variable Overrides will not work in this Icinga 2 version. See Director issue #1579"
      )
    }
  }

  template Service DirectorOverrideTemplate {
    /**
     * Seems that host is missing when used in a service object, works fine for
     * apply rules
     */
    if (! host) {
      var host = get_host(host_name)
    }
    if (! host) {
      globals.directorWarnOnceForServiceWithoutHost()
    }

    if (vars) {
      vars += host.vars[DirectorOverrideVars][name]
    } else {
      vars = host.vars[DirectorOverrideVars][name]
    }
  }
}
root@AWS-NOC-02:/var/lib/icinga2/api/zones# cat director-global/director/endpoint_templates.conf
template Endpoint "WDC Endpoint Tester" {
    host = "10.10.10.20"
    port = "5665"
}

root@AWS-NOC-02:/var/lib/icinga2/api/zones# cat director-global/director/host_templates.conf
template Host "basic-linux" {
    check_command = "hostalive"
    command_endpoint = "WDC-NOCSat-01"
}

template Host "WDC-NOCSat-Template" {
    check_command = "hostalive"
    command_endpoint = "WDC-NOCSat-01"
    notes = "Dummy template for the WDC Icinga Satellite"
}

template Host "wdc-basic-windows" {
    check_command = "hostalive"
    command_endpoint = "WDC-NOCSat-01"
}

template Host "wdc-switches" {
    check_command = "hostalive"
    command_endpoint = "WDC-NOCSat-01"
}

root@AWS-NOC-02:/var/lib/icinga2/api/zones# cat director-global/director/service_templates.conf
template Service "check-ssh" {
    check_command = "ssh"
    command_endpoint = null
}

template Service "check-http" {
    check_command = "http"
    command_endpoint = null
}

template Service "check-ping" {
    check_command = "hostalive"
    command_endpoint = null
}

template Service "check-winrm " {
    check_command = "tcp"
    command_endpoint = null
    vars.tcp_port = "5985"
}

root@AWS-NOC-02:/var/lib/icinga2/api/zones# cat director-global/director/servicesets.conf
/**
 * Service Set: basic-linux-services
 *
 * HTTP/PING/SSH
 */

/**
 * Service Set: basic-linux-services
 * on host basic-linux
 */

apply Service "check-http" {
    import "check-http"


    assign where "basic-linux" in host.templates

    import DirectorOverrideTemplate
}

apply Service "check-ssh" {
    import "check-ssh"


    assign where "basic-linux" in host.templates

    import DirectorOverrideTemplate
}

apply Service "check-ping" {
    import "check-ping"


    assign where "basic-linux" in host.templates

    import DirectorOverrideTemplate
}

/**
 * Service Set: basic-windows-services
 *
 * ping and winrm
 */

/**
 * Service Set: basic-ping
 *
 * ping check
 */

/**
 * Service Set: basic-windows-services
 * on host wdc-basic-windows
 */

apply Service "check-ping" {
    import "check-ping"


    assign where "wdc-basic-windows" in host.templates

    import DirectorOverrideTemplate
}

apply Service "check-winrm " {
    import "check-winrm "


    assign where "wdc-basic-windows" in host.templates

    import DirectorOverrideTemplate
}

Looking through this, I am not seeing any issues. Are you?

Hi @hewithaname,

So I drew it all out in a notepad because configs can be quite something to dissect.

I noticed:

  • In the host templates you overwrote the command endpoint. Ideally it would just be:
template Host "basic-linux" {
    check_command = "hostalive"
}
  • Each host does not have its own zone defined like so:
object Zone "WDC-UTIL-01" {
  endpoints = [ "WDC-UTIL-01" ]
  parent = "WDC" // parent must name a Zone; the satellite zone in this setup is "WDC"
}
  • all services have null defined.
template Service "check-ssh" {
    check_command = "ssh"
    command_endpoint = null
}

Ideally they look something like this, UNLESS the service needs to run somewhere else:

template Service "generic-service" {
  max_check_attempts = 5
  check_interval = 5m
  retry_interval = 30s
  command_endpoint = host.vars.agent_endpoint
  // this points to  "object Host" -> "vars.agent_endpoint = name"
}
  • Your satellite's hostalive check pings itself:
template Host "WDC-NOCSat-Template" {
    check_command = "hostalive"
    command_endpoint = "WDC-NOCSat-01"
    notes = "Dummy template for the WDC Icinga Satellite"
}

Ideally:

template Host "WDC-NOCSat-Template" {
    check_command = "hostalive"
    command_endpoint = "AWS-NOC-02"
    // IF this is indeed your master?
    notes = "Dummy template for the WDC Icinga Satellite"
}
  • Your check-ping is defined twice and then applied twice, for Windows and Linux:
apply Service "check-ping" {
    import "check-ping"
    assign where "basic-linux" in host.templates
}

and

apply Service "check-ping" {
    import "check-ping"
    assign where "wdc-basic-windows" in host.templates
}

Now this might be a Director thing, but ideally you keep assigns to a minimum, like this:

apply Service "check-ping" {
    import "check-ping"
    assign where host.address
}

This reduces the load on your master when reloading the config.
So there is still a lot to fix in your config, but that is alright. Hope it helps you fix a few things.
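
For hosts that will never run an agent (the original goal in this thread), a possible alternative is to place the objects in the satellite's zone directory and leave command_endpoint out entirely, so that an endpoint of that zone schedules and executes the checks. A rough sketch, assuming the zone is named "WDC" and using hand-written file names (Director generates its own):

```
// zones.d/WDC/hosts.conf on the master, synced down to the WDC zone
object Host "WDC-UTIL-01" {
  check_command = "hostalive"
  address = "10.10.10.30"
  // no command_endpoint: the check runs on an endpoint of the WDC zone
}

// zones.d/WDC/services.conf
apply Service "check-ssh" {
  check_command = "ssh"
  // no command_endpoint here either, so the satellite connects
  // to host.address over the network instead of asking an agent
  assign where host.address
}
```

With this layout no per-host Endpoint or Zone objects are needed, since only machines running Icinga 2 take part in the zone hierarchy.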

Good Evening William!

Thank you so much for looking through that. I ended up making a lot of changes to help my sanity. I ended up reinstalling the Satellite due to it just being completely out of whack. There is so much that I have learned since I first installed it and this allowed me to check each step to ensure everything was correct. This also fixed the issue of the Master and Satellite not running the same version of Icinga2.

Master = AWS-NOC-02
Satellite = WDC-NOCSat-02

This is what my Dashboard currently looks like.

  • Ensured DNS was properly configured

  • Deleted all references to the previous Satellite

  • Deleted the check-ping service template since the check_command hostalive already does a ping check

  • Ensured all firewall, acl, ufw, security groups were properly configured

  • Followed the documentation to set up the new Satellite

  • Used curl -k -s -u WDC:$PASS https://WDC-NOCSat-02.domain.corp:5665/v1 on Master to confirm that it could properly talk to the Satellite over the API.

    • <html><head><title>Icinga 2</title></head><h1>Hello from Icinga 2 (Version: r2.8.1-1)!</h1><p>You are authenticated as <b>WDC</b>. Your user has the following permissions:</p> <ul><li>*</li></ul><p>More information about API requests is available in the <a href="https://docs.icinga.com/icinga2/latest" target="_blank">documentation</a>.</p></html>
  • Fixed all the host templates so that they only have check_command = "hostalive"

  • Added max_check_attempts, check_interval, and retry_interval into my service templates

  • Fixed my Satellite host configuration so that its command_endpoint was now Master


I understand that each host should have a parent. This does make sense. I just didn’t realize that they would need their own Zone? If I have 100 hosts, then each of them will need their own Zone?


I am not seeing any way to get command_endpoint = host.vars.agent_endpoint into the configuration with Director UI. I can’t change it manually because Director will just change it back.




I feel that showing you the configuration files was very helpful, so I'll leave them here at the bottom again for you.

root@AWS-NOC-02:/var/lib/icinga2/api/zones# ls
AWS-NOC-02  WDC  director-global
root@AWS-NOC-02:/var/lib/icinga2/api/zones# cd director-global/director/
root@AWS-NOC-02:/var/lib/icinga2/api/zones/director-global/director# ls
001-director-basics.conf  endpoint_templates.conf  host_templates.conf  service_templates.conf  servicesets.conf
root@AWS-NOC-02:/var/lib/icinga2/api/zones/director-global/director# cat host_templates.conf
template Host "wdc-satellites" {
    check_command = "hostalive"
    command_endpoint = "AWS-NOC-02"
}

root@AWS-NOC-02:/var/lib/icinga2/api/zones/director-global/director# cat service_templates.conf
template Service "check-ssh" {
    check_command = "ssh"
    max_check_attempts = "3"
    check_interval = 10s
    retry_interval = 30s
}

template Service "check-http" {
    check_command = "http"
    max_check_attempts = "3"
    check_interval = 10s
    retry_interval = 30s
}

template Service "check-winrm " {
    check_command = "tcp"
    max_check_attempts = "3"
    check_interval = 10s
    retry_interval = 30s
    vars.tcp_port = "5985"
}

root@AWS-NOC-02:/var/lib/icinga2/api/zones/director-global/director# cat servicesets.conf
/**
 * Service Set: basic-linux-services
 *
 * HTTP/PING/SSH
 */

/**
 * Service Set: basic-windows-services
 *
 * ping and winrm
 */

root@AWS-NOC-02:/var/lib/icinga2/api/zones/director-global/director# cd ../../WDC/director/
root@AWS-NOC-02:/var/lib/icinga2/api/zones/WDC/director# ls
host_templates.conf  hosts.conf  servicesets.conf
root@AWS-NOC-02:/var/lib/icinga2/api/zones/WDC/director# cat host_templates.conf
template Host "wdc-basic-linux" {
    check_command = "hostalive"
}

template Host "wdc-basic-windows" {
    check_command = "hostalive"
}

template Host "wdc-meraki-switches" {
    check_command = "hostalive"
}

root@AWS-NOC-02:/var/lib/icinga2/api/zones/WDC/director# cat hosts.conf
object Host "WDC112SW01" {
    import "wdc-meraki-switches"

    display_name = "WDC112SW01"
    address = "10.10.40.80"
}

object Host "WDC-BOUNCE-01" {
    import "wdc-basic-windows"

    display_name = "WDC-BOUNCE-01"
    address = "10.10.10.50"
}

object Host "WDC-UTIL-01" {
    import "wdc-basic-linux"

    display_name = "WDC Linux Server"
    address = "10.10.10.30"
}

root@AWS-NOC-02:/var/lib/icinga2/api/zones/WDC/director# cat servicesets.conf
/**
 * Service Set: basic-linux-services
 * on host wdc-basic-linux
 */

apply Service "check-http" {
    import "check-http"


    assign where "wdc-basic-linux" in host.templates
    zone = "WDC"

    import DirectorOverrideTemplate
}

apply Service "check-ssh" {
    import "check-ssh"


    assign where "wdc-basic-linux" in host.templates
    zone = "WDC"

    import DirectorOverrideTemplate
}

/**
 * Service Set: basic-windows-services
 * on host wdc-basic-windows
 */

apply Service "check-winrm " {
    import "check-winrm "


    assign where "wdc-basic-windows" in host.templates
    zone = "WDC"

    import DirectorOverrideTemplate
}

root@AWS-NOC-02:/var/lib/icinga2/api/zones/WDC/director# cd ../../AWS-NOC-02/director/
root@AWS-NOC-02:/var/lib/icinga2/api/zones/AWS-NOC-02/director# ls
endpoints.conf  hosts.conf  zones.conf
root@AWS-NOC-02:/var/lib/icinga2/api/zones/AWS-NOC-02/director# cat endpoints.conf
object Endpoint "WDC-NOCSat-02" {
    host = "10.10.10.20"
    port = "5665"
    log_duration = 1d
}

root@AWS-NOC-02:/var/lib/icinga2/api/zones/AWS-NOC-02/director# cat hosts.conf
object Host "WDC-NOCSat-02" {
    import "wdc-satellites"

    display_name = "WDC-NOCSat-02"
    address = "10.10.10.20"
}

root@AWS-NOC-02:/var/lib/icinga2/api/zones/AWS-NOC-02/director# cat zones.conf
object Zone "WDC" {
    parent = "AWS-NOC-02"
    endpoints = [ "WDC-NOCSat-02" ]
}



I hope the changes I made cleaned up my configuration. I felt like it was a complete mess and that just makes troubleshooting difficult.

Thank You!
-Adam

Hi,

I will check it out tomorrow morning :slight_smile: In the meantime, could you specify where you would like more help? That would help me enormously. Sort of a morning puzzle to wake up the brain, haha.

I see 3 problems in your first screenshot. Are those the problems you are trying to solve at this point in time?

Thanks

No rush at all (I am about to head home for the day)! I appreciate all the time you have spent helping.

My current goal is the same one I started with. It's been a fun puzzle :slight_smile: .

I am just looking to set up distributed monitoring. The Master communicates with Satellites, which communicate with Agents. Currently the setup is just a simple POC. A Master server, a Satellite server, a Windows agent, a Linux agent, and a Meraki switch are the only devices involved here. All three of those agents should only ever be checked by the Satellite. The Satellite reports back to the Master, and the Master displays the dashboard. The checks being performed are just ping (hostalive), ssh (check-ssh), http (check-http), and winrm (check-winrm).
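For reference, here is a rough sketch of what the Satellite's own /etc/icinga2/zones.conf and API feature could look like for this Master/Satellite layout. The endpoint and zone names are taken from the config above. Leaving `host` off the Master endpoint (so that only the Master opens the connection) and the `director-global` entry are assumptions about this setup, not something confirmed in the thread.

```
// Sketch only: /etc/icinga2/zones.conf on the Satellite (WDC-NOCSat-02)

object Endpoint "AWS-NOC-02" {
    // No "host" attribute here: assumes the Master connects to the Satellite
}

object Zone "AWS-NOC-02" {
    endpoints = [ "AWS-NOC-02" ]
}

object Endpoint "WDC-NOCSat-02" {
}

object Zone "WDC" {
    endpoints = [ "WDC-NOCSat-02" ]
    parent = "AWS-NOC-02"    // the Satellite zone trusts its parent, the Master zone
}

// Global zone used by Director for templates and service sets
object Zone "director-global" {
    global = true
}

// Sketch only: /etc/icinga2/features-enabled/api.conf on the Satellite
object ApiListener "api" {
    accept_config = true      // allow the Master to sync zones.d/WDC down
    accept_commands = true
}
```

If the Satellite does not accept config, the objects deployed into the WDC zone never reach it, so that is worth ruling out first.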

Have a wonderful day!
-Adam

Yes, the reason:

The Icinga 2 hierarchy consists of so-called zone objects. Zones depend on a parent-child relationship in order to trust each other.
found here: https://icinga.com/docs/icinga2/latest/doc/06-distributed-monitoring/#zones

I am not seeing any way to get command_endpoint = host.vars.agent_endpoint into the configuration with Director UI. I can’t change it manually because Director will just change it back.

Ah yes, true, that might be a Director limitation. It wants to set it to host.name, which would work too. That is what you should see when you set run on agent = yes.

I do not see any weird stuff in your new config at first glance. It is a way better setup :slight_smile:

When selecting a command_endpoint I don’t have the ability to manually enter in a string like that unfortunately. It feels weird for Director to have that limitation.

However, I can of course select run on agent = yes for all of the templates. This gets rid of the command_endpoint option.

However, this also forces me to select a value for establish connection and accepts config. I am selecting no here because these agents do not have Icinga installed. This goes back to the goal of not having to install Icinga on every host I want to monitor. I just need the satellites to do ping/http/winrm/ssh checks.
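As a point of comparison, an agentless host checked by the Satellite would look roughly like the sketch below in plain Icinga 2 config. This assumes the objects live in the WDC zone directory on the Master; the exact-name `assign` rule is only for illustration. The key point is that neither object sets `command_endpoint`, so the check is executed by the endpoint of the zone the object belongs to, which here is the Satellite.

```
// Sketch only: would live under zones.d/WDC/ on the Master

object Host "WDC-UTIL-01" {
    check_command = "hostalive"
    address = "10.10.10.30"
    // no command_endpoint: the WDC zone's endpoint (the Satellite) runs the check
}

apply Service "check-ssh" {
    check_command = "ssh"
    // no command_endpoint here either, nothing is installed on the target host
    assign where host.name == "WDC-UTIL-01"
}
```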

Now for the host template that is applied to the satellite host, if I tell it that the icinga agent is installed (which it is since it is a satellite), I cannot tell it that its command endpoint is master.

I can however add multiple templates to a host so let me try that. I will use the above template to tell the satellite that it is in the WDC zone and its command endpoint is AWS-NOC-02 (master). Then I will use this 2nd template to tell icinga that the satellite has an agent installed and accept configuration.


Let’s see if this will deploy. Nope.

information/cli: Icinga application loader (version: r2.8.1-1)
information/cli: Loading configuration file(s).
information/ConfigItem: Committing config item(s).
information/ApiListener: My API identity: AWS-NOC-02
critical/config: Error: Validation failed for object 'WDC-NOCSat-02' of type 'Host'; Attribute 'command_endpoint': Command endpoint must be in zone 'WDC' or in a direct child zone thereof.
Location: in [stage]/zones.d/WDC/hosts.conf: 22:1-22:27
[stage]/zones.d/WDC/hosts.conf(20): }
[stage]/zones.d/WDC/hosts.conf(21): 
[stage]/zones.d/WDC/hosts.conf(22): object Host "WDC-NOCSat-02" {
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[stage]/zones.d/WDC/hosts.conf(23):     import "wdc-satellites-endpoint"
[stage]/zones.d/WDC/hosts.conf(24):     import "wdc-satellites-agent"

critical/config: 1 error

So the command_endpoint, which in this case is AWS-NOC-02, needs to be in the WDC zone. But there is no way it can be in that zone because it is the master and in its own zone. If we look at the endpoints we can see that it is not in the WDC Zone.

Let's go ahead and revert the template back to what it used to be and then redeploy so that we can see the other changes.

Unfortunately, despite these changes, the dashboard appears to be the same.

I wonder what I am missing. I will have to continue to tinker with this.

Hi @hewithaname

this:

critical/config: Error: Validation failed for object 'WDC-NOCSat-02' of type 'Host'; Attribute 'command_endpoint': Command endpoint must be in zone 'WDC' or in a direct child zone thereof.
Location: in [stage]/zones.d/WDC/hosts.conf: 22:1-22:27
[stage]/zones.d/WDC/hosts.conf(20): }
[stage]/zones.d/WDC/hosts.conf(21): 
[stage]/zones.d/WDC/hosts.conf(22): object Host "WDC-NOCSat-02" {

is a sneaky problem. I ran into it just last week :slight_smile:
No idea how to fix it in Director, but this is what I configure:

object Host "l" {
  import "satelite-host"
  // Host Details
  icon_image = "tux.png"
  display_name = ""
  address = ""

  // Assign variables
  vars.os = "Linux"
  vars.type = "Satelite"
  vars.env = "PROD"
  vars.agent_endpoint = name
  vars.purpose = "Icinga_Satelite"
  vars.zone = "AW-US"
  zone = "master"
}

where:
zone = "master"
solves that sticky problem
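Translated to the names used in this thread, the same idea might look like the sketch below. It assumes the Satellite's host object lives in the Master zone (AWS-NOC-02), and the "satellite-disk" service is purely a hypothetical example. Because WDC is a direct child zone of AWS-NOC-02, pointing `command_endpoint` at the Satellite passes the validation that failed above.

```
// Sketch only: would live under zones.d/AWS-NOC-02/ on the Master

object Host "WDC-NOCSat-02" {
    import "wdc-satellites"

    address = "10.10.10.20"
    vars.agent_endpoint = name    // endpoint name equals the host name
}

// Hypothetical service that runs locally on the Satellite
apply Service "satellite-disk" {
    check_command = "disk"
    command_endpoint = host.vars.agent_endpoint

    assign where host.vars.agent_endpoint
}
```

The reverse (host object in zone WDC with the Master as command endpoint) is what the validator rejects, since the Master is in the parent zone rather than in WDC or a child of it.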


Very interesting! That’s good information to have. I think the next best steps are to do some manual configuration of the conf files to establish a proper setup between the master and satellites and then use director kickstart to import them as objects. That would bypass a lot of the issues we are seeing here.