Zero Downtime Upgrades?

Googling anything with “downtime” in it gets me a lot of info about setting downtimes in Icinga, which is not what I want.

If I’m using a distributed Icinga2 setup, am I correct in assuming I will still incur Icinga2 downtime at least when upgrading the masters?

Assuming downtime is unavoidable, can I run two parallel masters and have satellites/clients talk to both? Or would I have to duplicate the entire Icinga2 architecture, down to two agents on each client, etc.?

Thanks!

Just upgrade one master after the other. If the changelog doesn’t mention a hard “versions must match” requirement (e.g. breaking cluster messages) or a database schema upgrade, this is the way to go. Otherwise, plan for an upgrade downtime window.
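
A minimal sketch of a rolling upgrade, assuming Debian packages and systemd (host names and API credentials are just examples):

# On master 1: stop Icinga 2 so master 2 takes over alone.
systemctl stop icinga2
apt-get update && apt-get install icinga2   # pulls the new package version
icinga2 daemon -C                           # validate the config first
systemctl start icinga2

# Verify the cluster is healthy again before touching master 2,
# e.g. via the REST API from either master:
curl -k -s -u root:icinga 'https://localhost:5665/v1/status/ApiListener'

# Then repeat the same steps on master 2.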

Satellite zones should always list both master endpoints, so shutting down one side won’t harm the cluster integrity.
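
For example, a satellite’s zones.conf would list both masters, so it reconnects to whichever one is still up (names and addresses are placeholders):

cat > /etc/icinga2/zones.conf <<'EOF'
object Endpoint "monitoring-01" { host = "192.0.2.101" }
object Endpoint "monitoring-02" { host = "192.0.2.102" }

object Zone "master" {
  endpoints = [ "monitoring-01", "monitoring-02" ]
}

object Endpoint "satellite-01" { }

object Zone "satellite" {
  endpoints = [ "satellite-01" ]
  parent = "master"
}
EOF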

Cheers,
Michael

We have an HA cluster with 2 masters on r2.10.5-1:

object Endpoint "monitoring-01" {
  //host = "10.155.0.8" // that's us
}
object Endpoint "monitoring-02" {
  host = "10.155.0.9" // connect to m-02
}
object Zone "master" {
  endpoints = [ "monitoring-01", "monitoring-02" ]
}

I also need to upgrade the OS (Debian 9 to 10), because missing libs block the upgrade to v2.11 (and I don’t like dist-upgrades, they are potentially messy). I’ve already prepared a new machine on Debian 10 with r2.11.3-1.
My question is: can I add this new machine to the cluster as a second master, then stop the first one and promote the new machine to first master (deleting the configs from /var/lib/icinga2/api/zones/master/_etc/company and checking them out again from git to /etc/icinga2/zones.d/master/company), and then add another clone of it as the secondary master? I’ve sketched the steps below.
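
Roughly the steps I have in mind (a sketch only, not yet tested; the git URL is a placeholder for our internal repo):

# 1. Join the new Debian 10 machine (monitoring-03) to the "master" zone
#    as an additional endpoint, then stop the old first master:
systemctl stop icinga2    # on monitoring-01

# 2. On monitoring-03: replace the config that was synced from the old
#    config master with a fresh checkout from git.
rm -rf /var/lib/icinga2/api/zones/master/_etc/company
git clone git@git.example.com:icinga/company.git /etc/icinga2/zones.d/master/company
icinga2 daemon -C && systemctl restart icinga2    # validate, then restart

# 3. Clone monitoring-03 and join the clone as the secondary master.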
My main issue is the existing data, like acknowledgements and downtimes, since we use an internal tracker where Icinga creates tickets through notifications, which in turn acknowledge the problem in Icinga (so it doesn’t send more notifications for it, which would create another ticket or add comments to existing ones). If I just stop both old masters and start the new ones, I would lose that data and all existing tickets would be created again, confusing our servicedesk and the customers 🙂
Another option I thought of is to migrate the database (and status files?) and then apply the necessary patches, e.g. as sketched below.
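
Something like this, assuming the IDO MySQL backend (database name as in the default setup; whether the state file and the _api package are enough to carry over acks/downtimes is exactly what I’m unsure about):

# On the old master: dump the IDO database with Icinga 2 stopped
# (add -u/-p credentials as needed).
systemctl stop icinga2
mysqldump icinga2 > /tmp/icinga2-ido.sql

# Copy the runtime state too (this is where runtime-created objects
# like acknowledgements and downtimes end up, as far as I can tell):
scp /var/lib/icinga2/icinga2.state monitoring-03:/var/lib/icinga2/
scp -r /var/lib/icinga2/api/packages/_api monitoring-03:/var/lib/icinga2/api/packages/

# On the new master: import the dump, apply the schema upgrade shipped
# with the 2.11 package, then start Icinga 2.
mysql icinga2 < /tmp/icinga2-ido.sql
mysql icinga2 < /usr/share/icinga2-ido-mysql/schema/upgrade/2.11.0.sql
systemctl start icinga2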
What would you suggest is the best method here and the one with the least downtime?

Thanks,
Hugo.

@dnsmichi
Can you please suggest how best to proceed here? It’s a pretty important topic for us, because of the whole tracker-ticket integration in our system (not to mention preserving the whole history, to be able to offer SLA reports on customer demand).

root@monitoring-01:~$ icinga2 --version
icinga2 - The Icinga 2 network monitoring daemon (version: r2.10.5-1)

root@monitoring-03:~$ icinga2 --version
icinga2 - The Icinga 2 network monitoring daemon (version: r2.11.4-1)

Should I join another m4 to the m1/m2 cluster on r2.10.5-1, remove m1/m2 from the cluster, upgrade m4 to r2.11.4-1, and then join m3? I still think the simplest method might be a DB export from m1 and import into m3, then joining another m4 and stopping m1/m2… I’ve put a quick sanity check for that route below.
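
For the export/import route, a quick check first (a sketch; the database name is the package default):

# On m1: which IDO schema version is currently deployed?
mysql icinga2 -e "SELECT version FROM icinga_dbversion;"

# On m3: list the upgrade scripts shipped with the 2.11 package; every
# script newer than the version above must be applied after the import.
ls /usr/share/icinga2-ido-mysql/schema/upgrade/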

Hello @netphantm :slight_smile:

I’m afraid that Michi retired from this forum a few months back, so I would suggest that you open a new topic for your inquiry instead :slight_smile:

Have a nice day,
Feu