I have a setup with an Icinga Master monitoring several groups of Slaves (no Satellites involved).
The Slaves are grouped into Dev(elopment), Test and Live.
When my Dev Slaves were upgraded from Icinga 2.10 to 2.11, some (not all) of their checks stopped running correctly. I downgraded them again and held back the upgrade on the Master and on the Test and Live Slaves.
I think the correct way to perform this upgrade would be to upgrade the Master first, but I'm nervous about doing that because it is the Live Icinga server. So I'm wondering: is there a way to run a Test Icinga Master alongside the Live Master, with both talking to the same back-end Slaves and receiving all their monitoring updates? Then I could upgrade the Test Master, check that everything works, and upgrade the Live Master (which other people watch, not just me) only once the new version looks stable.
Basically, I want a controlled roll-out of Icinga 2.11 that doesn't break my currently working 2.10 Master and doesn't lose any monitoring history if something goes wrong.
What have other people done in this situation, in production environments where downtime or breakage of the monitoring system itself is unacceptable?
Ideas and/or case studies are welcome.