How do I monitor the primary, and send alerts if the primary goes down, from a Satellite?

I have a single Primary server with several Satellite servers. I read in the documentation that the best practice for an HA master configuration is to use a single database and connect both masters to the same IP (a VIP, if the database is a cluster behind a load balancer). Unless I’m missing something, that’s not actually HA, because the database/VIP is still a single point of failure.

For this reason, I’d like to stick with my current model of a single Primary and multiple Satellites. That said, I’d like to monitor a few things from one of my secondary nodes and have that secondary send alerts if those things go down. Specifically, I want a Satellite server to monitor the Primary node, as well as the underlying physical hypervisors that the Primary resides on.

The Primary is in one data center, and I’m putting a secondary into each of our other data centers across the globe.

I’m still fairly new to Icinga, but I believe my Master/Satellite setup is configured in Top Down Config Sync mode: when I make changes to the config on the Master, I don’t have to make any changes or restart anything on the Secondary for them to take effect.

I realize that the documentation encourages using a services.conf file in each zone and assigning services with something like assign where host.vars.agent_endpoint. That said, I’ve decided to define the service checks for each host (endpoint) in its own conf file, so that we can manage each host much more easily with Ansible.

I have an Ansible playbook that sets up monitoring for a specific endpoint. It can be run like so, and it places the configuration for the new host into the correct zone directory on the Master server:

#  ansible-playbook add_manual_host.yml -e "hostname=fqdn ip_address=172.16.x.x zone=europe http=false vnc=false linux=true" -K
#	Possible Zones:
#		- americas
#		- europe
#		- asia
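For illustration, here is a hypothetical per-host file the playbook might generate; the file path, host name, and custom variables are assumptions based on the playbook parameters above, not my actual output:

// /etc/icinga2/zones.d/europe/fqdn.conf (hypothetical playbook output)
object Host "fqdn" {
  check_command = "hostalive"
  address = "172.16.x.x"
  vars.agent_endpoint = name   // lets zone-wide apply rules match this host
  vars.linux = true            // mirrors the playbook's linux=true flag
}

object Service "ssh" {
  host_name = "fqdn"
  check_command = "ssh"
}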

How can I accomplish monitoring the Primary (and sending notifications) from a secondary server? On my Primary, I have multiple zones (one per data center) defined in /etc/icinga2/zones.conf, and the corresponding zone directories set up in /etc/icinga2/zones.d/.

Below are a few other details about my environment (stdout from the Primary). Thanks in advance!

root@icinga:/# icinga2 --version
icinga2 - The Icinga 2 network monitoring daemon (version: r2.13.2-1)

Copyright (c) 2012-2022 Icinga GmbH (https://icinga.com/)
License GPLv2+: GNU GPL version 2 or later <https://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

System information:
  Platform: Ubuntu
  Platform version: 20.04.3 LTS (Focal Fossa)
  Kernel: Linux
  Kernel version: 5.4.0-99-generic
  Architecture: x86_64

Build information:
  Compiler: GNU 9.3.0
  Build host: runner-hh8q3bz2-project-298-concurrent-0
  OpenSSL version: OpenSSL 1.1.1f  31 Mar 2020

Application information:

General paths:
  Config directory: /etc/icinga2
  Data directory: /var/lib/icinga2
  Log directory: /var/log/icinga2
  Cache directory: /var/cache/icinga2
  Spool directory: /var/spool/icinga2
  Run directory: /run/icinga2

Old paths (deprecated):
  Installation root: /usr
  Sysconf directory: /etc
  Run directory (base): /run
  Local state directory: /var

Internal paths:
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid
root@icinga:/# icinga2 feature list
Disabled features: command compatlog debuglog elasticsearch gelf graphite icingadb influxdb influxdb2 livestatus opentsdb perfdata statusdata syslog
Enabled features: api checker ido-pgsql mainlog notification

We do this by simply using the cluster-zone check, defined at the satellites, for their parent zone. The notification objects are assigned on the satellites only, with a rule like assign where host.zone == ZoneName.
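As a sketch (the satellite host and zone names are placeholders, and mail-service-notification / icingaadmin are the stock objects shipped in conf.d):

// Defined at the satellite: checks the connection to the parent (master) zone.
apply Service "master-zone-health" {
  check_command = "cluster-zone"
  vars.cluster_zone = "master"                   // name of the parent zone
  assign where host.name == "my-satellite-fqdn"  // the satellite's own host object
}

// Notifications for everything in the satellite zone are sent by the satellite.
apply Notification "notify-from-satellite" to Service {
  import "mail-service-notification"
  users = [ "icingaadmin" ]
  assign where host.zone == "my-satellite-zone"
}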

For the underlying hypervisors, it would be easy in the case of vSphere, since you only need to define the service checks at your satellites accordingly.
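A minimal sketch, assuming the satellite should at least ping the hypervisors’ management interfaces (the host name and address are placeholders; a dedicated vSphere plugin would replace hostalive for deeper API checks):

// Placed in the satellite's zone directory on the master,
// so the satellite executes the check.
object Host "hypervisor-01" {
  check_command = "hostalive"
  address = "172.16.x.x"   // hypervisor management address (placeholder)
}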

Thank you for your response. Unfortunately, I must still be missing something.

Reading through the documentation about how to use cluster-zone, it appears that’s talking about a master-master setup, whereas I have a single master / multiple satellites.

Nevertheless, I tried doing the following:

cd /etc/icinga2/zones.d/custom/
root@icinga:/etc/icinga2/zones.d/custom# cat health.conf 
apply Service "agent-health" {
  check_command = "cluster-zone"

  display_name = "agent-health-" + host.name

  // This follows the convention that the agent zone name is the FQDN which is the same as the host object name.
  vars.cluster_zone = host.name

  // Create this health check for agent hosts in the satellite zone
  assign where host.zone == "custom" && host.vars.agent_endpoint
}

apply Dependency "agent-health-check" to Service {
  parent_service_name = "agent-health"

  states = [ OK ] // Fail if the parent service state switches to NOT-OK
  disable_notifications = true

  assign where host.zone == "custom" && host.vars.agent_endpoint // Automatically assigns all agent endpoint checks as child services on the matched host
  ignore where service.name == "agent-health" // Avoid a self reference from child to parent
}

… and then I went into the master zone and created a health.conf file:

root@icinga:/etc/icinga2/zones.d/master# cat health.conf 
apply Service "satellite-zone-health" {
  check_command = "cluster-zone"
  check_interval = 30s
  retry_interval = 10s

  vars.cluster_zone = "custom"

  assign where match("icinga-master.fqdn.com", host.name)
}

… and I then added this Service object to the master host config file:

object Service "cluster" {
  host_name = "icinga-master"
  check_command = "cluster"
  check_interval = 5s
  retry_interval = 1s
}

This doesn’t appear to be working!
When I run icinga2 daemon -C, I see two warnings:

[2022-02-09 18:16:01 +0000] warning/ApplyRule: Apply rule 'agent-health-check' (in /etc/icinga2/zones.d/custom/health.conf: 13:1-13:48) for type 'Dependency' does not match anywhere!
[2022-02-09 18:16:01 +0000] warning/ApplyRule: Apply rule 'agent-health' (in /etc/icinga2/zones.d/custom/health.conf: 1:0-1:27) for type 'Service' does not match anywhere!

So then I switched gears and did what I thought was the correct thing for my situation. With the configuration below in place, icinga2 daemon -C didn’t produce any errors, but when I log into the icingaweb2 interface, I see a message that Zone master-host-BACKUP_MONITOR doesn’t exist.

On the master, I navigated to /etc/icinga2/zones.d/my-satellite, and I created the following file:

master-host.conf:

object Host "master-host-BACKUP_MONITOR" {
        check_command = "hostalive"
        address = "172.16.x.x"
}

object Service "Master Health" {
	check_command = "cluster-zone"
	host_name  = "master-host-BACKUP_MONITOR"
}

I should note that my zones have custom names (except for master), not the FQDN of the Icinga satellite server “controlling” each zone.

What am I still missing in either of these attempts?

Try this simple example:

apply Service "icinga_parent" {
   display_name = "Icinga Parent Zone"
   check_command = "cluster-zone"

   vars.cluster_zone = host.vars.icinga_parent

   assign where host.vars.icinga_parent
}
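For this rule to match, the host object you create for the master needs that custom variable set, e.g. (a hypothetical host definition, placed in the satellite’s zone so the satellite runs the check):

object Host "master-host-BACKUP_MONITOR" {
  check_command = "hostalive"
  address = "172.16.x.x"
  vars.icinga_parent = "master"   // the zone whose connectivity this check verifies
}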

It appears that I have monitoring of the Master working properly on the Satellite now. Thank you for your help. Unfortunately, the Satellite is not sending notifications, even when I “force” a test notification from the icingaweb2 interface.

Here’s what I see in the icinga2.log file on the Satellite when I tried to send a test notification from the Secondary:

[2022-02-10 11:26:12 +0000] information/HttpServerConnection: Request: POST /v1/actions/send-custom-notification (from [::ffff:127.0.0.1]:47096), user: icingaweb2, agent: , status: OK).

When I run this same test, on the same object, from the Primary, the notification gets sent correctly.

Notification is an enabled feature on the Satellite:

root@icinga:/var/log/icinga2# icinga2 feature list
Disabled features: command compatlog debuglog elasticsearch gelf graphite icingadb influxdb influxdb2 livestatus opentsdb perfdata statusdata syslog
Enabled features: api checker ido-pgsql mainlog notification

I have reviewed Delegate notifications master/satellite - #5 by rsx, and I set up the following config file in my Satellite zone:

root@icinga:/etc/icinga2/zones.d/custom# cat satellite-notify.conf 
apply Notification "mail-from-satellite" to Service {
  import "mail-service-notification"
  user_groups = host.vars.notification.mail.groups
  users = host.vars.notification.mail.users
  assign where host.name == "master-host-BACKUP_MONITOR" && host.zone == "custom"
}

But that’s not working either.

So I’m not sure what I’m missing here.

Have you checked whether the notification object exists at your satellite, e.g. with icinga2 object list -n mail-from-satellite?

Oh, that’s interesting. Good catch!

From the Master, I see two objects when I run that command.
From the Satellite, I don’t see any objects.

So how do I get the Master to push these to the Satellite?

Best practice is to have global zones for these kinds of objects. In this case, a global zone, e.g. notifications, that exists on the master and all satellites would be enough.
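A minimal sketch (the zone name notifications is just an example):

// zones.conf, identical on the master and on every satellite
object Zone "notifications" {
  global = true
}

On the master, the notification rules and anything they import (templates, users) then go into /etc/icinga2/zones.d/notifications/, and the cluster config sync replicates them to every node that declares the zone.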

So it sounds like having this in /etc/icinga2/features-enabled/api.conf:

object ApiListener "api" {
  accept_config = true
  accept_commands = true
}

… and having this in zones.conf:

object Zone "global-templates" {
        global = true
}

object Zone "director-global" {
        global = true
}

… isn’t enough.

I’m just now realizing that, with the way things are structured, I should probably have a folder named “global-templates” inside zones.d on the Master server, and put my config in there.

Thank you!

This is resolved (and I have now edited this post 2-3 different times).

Briefly, AFTER doing everything below (see “more details”), I had the brilliant idea to actually check the error logs. I found this snippet:

[2022-02-10 19:18:39 +0000] critical/ApiListener: Config validation failed for staged cluster config sync in '/var/lib/icinga2/api/zones-stage/'. Aborting. Logs: '/var/lib/icinga2/api/zones-stage//startup.log'

So then, I checked /var/lib/icinga2/api/zones-stage//startup.log, which led me to discover that the objects I was trying to create were trying to import things that didn’t exist.

My solution: In short, I had to move all of the relevant config files from /etc/icinga2/conf.d/ into /etc/icinga2/zones.d/global-templates/.
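For example, along these lines (a sketch assuming the default conf.d layout, where mail-service-notification lives in templates.conf and the notified users in users.conf):

# On the master: move the templates/users that the notification objects import
# into the global zone, so the config sync distributes them to the satellites.
mv /etc/icinga2/conf.d/templates.conf /etc/icinga2/zones.d/global-templates/
mv /etc/icinga2/conf.d/users.conf /etc/icinga2/zones.d/global-templates/
icinga2 daemon -C && systemctl restart icinga2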

More details, and how I went about troubleshooting:
So now I have these two files in global-templates:

root@icinga:/etc/icinga2/zones.d/global-templates# ls
health.conf  satellite-notify.conf

Contents of each file are below. After making these changes, I’m getting the same results.
The command icinga2 object list -n mail-from-satellite shows 2 objects on the master, and none on the Satellite, so I’m still missing something. I feel like I’m close though.

root@icinga:/etc/icinga2/zones.d/global-templates# cat health.conf 
apply Service "agent-health" {
  check_command = "cluster-zone"
  display_name = "agent-health-" + host.name
  vars.cluster_zone = host.name
  assign where host.zone == "my-satellite-zone" && host.vars.agent_endpoint
}

apply Dependency "agent-health-check" to Service {
  parent_service_name = "agent-health"
  states = [ OK ] 
  disable_notifications = true
  assign where host.zone == "my-satellite-zone" && host.vars.agent_endpoint // Automatically assigns all agent endpoint checks as child services on the matched host
  ignore where service.name == "agent-health" // Avoid a self reference from child to parent
}

apply Service "satellite-zone-health" {
  check_command = "cluster-zone"
  check_interval = 30s
  retry_interval = 10s

  vars.cluster_zone = "master"

  assign where match("master-fqdn", host.name)
}

apply Service "satellite-master-health" {
  check_command = "cluster-zone"
  check_interval = 30s
  retry_interval = 10s
  vars.cluster_zone = "my-satellite-zone"
  assign where match("master-host-BACKUP_MONITOR", host.name)
}
root@icinga:/etc/icinga2/zones.d/global-templates# cat satellite-notify.conf 
apply Notification "mail-from-satellite" to Service {
  import "mail-service-notification"
  user_groups = host.vars.notification.mail.groups
  users = host.vars.notification.mail.users
  assign where host.name == "master-host-BACKUP_MONITOR" && host.zone == "my-satellite-zone"
}

Digging into this further: after I did the above, I renamed the “global-templates” folder on the Master to notifications (still in /etc/icinga2/zones.d/) and edited zones.conf accordingly to reference the new notifications zone. I then edited the zones.conf on the Satellite server to match:

object Zone "notifications" {
global = true
}

The same two files that I mentioned above exist in this new notifications zone.
I restarted icinga2 on both the master & the Satellite, and here is what I observed:

On the Master, I see the files in /var/lib/icinga2/api/zones/notifications/_etc/
On the Satellite, I see the files in /var/lib/icinga2/api/zones-stage/notifications/_etc/

But icinga2 object list -t Notification on the Satellite still produces no results, so I’m clearly still missing something.
