Director Background Service in a Two-Server Cluster - Can it cause problems when running twice?

ChrissK · May 1, 2022, 12:02pm

Hi there,

I’m currently building an Icinga2 / IcingaWeb2 cluster.

For IcingaWeb2 I have a two-server cluster (behind a load balancer) with shared volumes (/etc/icingaweb2 & /usr/share/icingaweb2/) to ensure both nodes have the same versions, sources, modules and configuration.

The web interface is NOT running on the same hosts as Icinga2 - so the config master of my Icinga2 setup is somewhere else (just for info in case anyone asks).

Both nodes share the same database (icingaweb2 & director) which works fine so far.

At the moment one Director background service is running on each host.
Both are recognized by the GUI (both nodes show both background services).

So far, so good - but I wondered whether this could cause any trouble once I define import sources / syncs / jobs so that Director imports stuff automatically.

I didn’t find “anything” in the docs regarding this so I asked myself:
(1) Is there an internal mechanism that ensures these kinds of jobs are never executed at the same time by both daemons?
=> Since there is a database table for running daemons, theoretically they could know there is a second one
=> If not - is there anything else that ensures things won’t get messy when running two daemons (with the same database)?

(2) Is there a way to configure the daemon to place its PID file in a specified location?
=> “icingacli [director [daemon]] --help” indicates there’s no such option (since there is no info about it)
=> If it were possible it would make things easy - dropping the PID file into the shared volume, so only one node runs the daemon
=> Otherwise I could do it “manually” within the service file and ensure it runs only once - but I would prefer the daemon to do it by itself
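
The "manual" variant from (2) could be sketched as a small wrapper that takes an exclusive lock on a file in the shared volume before starting the daemon. This is only an illustration, not something Director provides: the function name `try_acquire` and the lock path are hypothetical, and whether `flock` locks are honoured across both nodes depends on the shared filesystem, which would need to be verified first.

```python
import fcntl
import os


def try_acquire(lock_path):
    """Try to take an exclusive, non-blocking lock on lock_path.

    Returns the open file object while the lock is held (keep it open for
    the lifetime of the daemon), or None if another holder already has it.
    """
    handle = open(lock_path, "a+")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    # We own the lock: record our PID so an admin can see who holds it.
    handle.seek(0)
    handle.truncate()
    handle.write(str(os.getpid()))
    handle.flush()
    return handle
```

A wrapper would call `try_acquire()` with a path on the shared volume and only start `icingacli director daemon run` when it gets a handle back; otherwise it would sleep and retry. The lock is released automatically when the process holding the file exits.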

(3) Could running the daemon only once (one node without it) cause any other side effects?
=> The docs indicate the daemon is needed for exactly the things I’m trying to run only once (+ Housekeeping)
=> I guess Housekeeping means DB housekeeping!? So I don’t mind if only one instance does it
=> So I would think it shouldn’t be a problem to have one node without a locally running daemon
=> Is there anything I’m missing in my thoughts about running it on only one cluster node?

(4) If all of this is just a bad idea - I could:
=> Install IcingaWeb2 with the Director module on a third node (or even directly on my Icinga2 config master)
=> Proxy /icingaweb2/director to the single running Director instance
=> Only one daemon is required since the Director GUI only exists once
=> Not tested at all yet - so no idea whether this runs well (or at all)
=> Not a fan of my own thought …

Above I wrote >I didn’t find “anything” in the docs regarding this< - this is (as you probably know) not quite true.

Of course I found some lines like “Jobs are currently not executed in parallel” - but I guess this is meant as “one daemon will not execute two different jobs at the same time”.
Maybe it’s as simple as that and it also ensures two daemons will not execute the same job at the same time - but I don’t think that is part of the meaning.

So is there anyone with experience setting up Director in a cluster?
Anyone with deeper knowledge of how the daemon handles these things?

I would appreciate any constructive thoughts about my thinking / plans for this setup.

Thanks in advance,
Chris

PedroMSantosD · February 10, 2023, 3:56pm

Hi, not sure if this will help you, as I have the same doubts myself, but:

My architecture is:
IcingaWeb-1; running icinga-director service, connected to configuration master and backend PG database
IcingaWeb-2; running icinga-director service, connected to configuration master and backend PG database

By configuration master I mean one of the two nodes in the ‘master’ zone of the Icinga2 cluster: the one holding the contents under /etc/icinga2/zones.d, where the ‘code’ definition for my cluster lives.

When both services are running, systemctl status reports one of them as connected and the other as not connected to the backend database due to an existing lock, so I guess the Director developers designed it for hot standby, with no race condition between Director instances.

Note that this is a dev environment and I’m just starting to play with it.
systemd status:

Node 1: notice db connected

root@monitoring-icinga-web ~# systemctl status icinga-director
● icinga-director.service - Icinga Director - Monitoring Configuration
   Loaded: loaded (/etc/systemd/system/icinga-director.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2023-02-10 00:01:23 EST; 10h ago
     Docs: https://icinga.com/docs/director/latest/
 Main PID: 22706 (icingacli)
   Status: "running, db: connected"
   CGroup: /system.slice/icinga-director.service
           └─22706 icinga::director: running, db: connected

Feb 10 00:01:23 monitoring-icinga-web-1..internal systemd[1]: Stopped Icinga Director - Monitoring ...n.
Feb 10 00:01:23 monitoring-icinga-web-1..internal systemd[1]: Starting Icinga Director - Monitoring.....
Feb 10 00:01:23 monitoring-icinga-web-1..internal systemd[1]: Started Icinga Director - Monitoring ...n.
Hint: Some lines were ellipsized, use -l to show in full.

Node 2: notice db locked:

root@monitoring-icinga-web-2. ~# systemctl status icinga-director
● icinga-director.service - Icinga Director - Monitoring Configuration
   Loaded: loaded (/etc/systemd/system/icinga-director.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2023-02-09 19:29:42 EST; 15h ago
     Docs: https://icinga.com/docs/director/latest/
 Main PID: 1164 (icingacli)
   Status: "running, db: locked by other instance"
   CGroup: /system.slice/icinga-director.service
           └─1164 icinga::director: running, db: locked by other instance

Feb 09 19:29:42 monitoring-icinga-web-2.internal systemd[1]: Stopped Icinga Director - Monitoring ...n.
Feb 09 19:29:42 monitoring-icinga-web-2.internal systemd[1]: Starting Icinga Director - Monitoring.....
Feb 09 19:29:42 monitoring-icinga-web-2.internal systemd[1]: Started Icinga Director - Monitoring ...n.
Feb 09 19:29:42 monitoring-icinga-web-2.internal icingadirector[1164]: RuntimeException in /usr/sha...y
Hint: Some lines were ellipsized, use -l to show in full.

with the last log line in full:

Feb 09 19:29:42 monitoring-icinga-web-2.internal icingadirector[1164]: RuntimeException in /usr/share/icingaweb2/modules/director/library/Director/Daemon/DaemonDb.php:161 with message: DB is locked by a running daemon instance, will retry

If node 1 is stopped (or restarted), they switch roles:

root@monitoring-icinga-web-2.internal ~# systemctl status icinga-director -l
● icinga-director.service - Icinga Director - Monitoring Configuration
   Loaded: loaded (/etc/systemd/system/icinga-director.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2023-02-09 19:29:42 EST; 15h ago
     Docs: https://icinga.com/docs/director/latest/
 Main PID: 1164 (icingacli)
   Status: "running, db: connected"
   CGroup: /system.slice/icinga-director.service
           └─1164 icinga::director: running, db: connected
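
If you want to check from a script which node is currently the active one, the `Status:` line shown above can be parsed. This is a rough sketch that assumes the status text keeps the format seen in the output above ("running, db: ..."); the function names are made up for illustration.

```python
import re

# Matches e.g.:   Status: "running, db: connected"
STATUS_RE = re.compile(
    r'^\s*Status:\s*"(?P<state>[^,"]+),\s*db:\s*(?P<db>[^"]+)"\s*$',
    re.MULTILINE,
)


def director_db_state(systemctl_output):
    """Return the text after 'db:' in the Status line, or None if absent."""
    match = STATUS_RE.search(systemctl_output)
    return match.group("db").strip() if match else None


def is_active(systemctl_output):
    """True if this node's daemon reports that it holds the DB connection."""
    return director_db_state(systemctl_output) == "connected"
```

Feeding it the text printed by `systemctl status icinga-director` on each node should yield "connected" on the active node and "locked by other instance" on the standby.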