Is there a way for the master endpoint to take over monitoring in the satellite zone if the satellite is not reachable?
Currently the checks just stay in “Pending” if the satellite is not working.
Thanks for your help!
Thanks for your answer.
I am already aware that I can create several satellites for each zone to get high availability.
But this is not really a solution for our environment.
We have 1 master system (hardware) and 7 satellite systems around the globe (virtualized). If we had to double the number of satellites, the maintenance effort would increase a lot. Furthermore, I don't see any benefit in creating 2 satellites in one location: if there is no power at that location or the virtualization cluster has problems, neither satellite will work. That's why an option to “fall back to the master” would be the best solution for us.
That is understandable. An extra server does bring costs and maintenance (even if it is automated). I am afraid I have not seen an option to make the master a fallback.
But what I have found is that you can define dependencies in Icinga, so if your satellite goes down (and possibly your agents with it), you would only get alerts for the satellite and not for the rest of the hosts in that zone.
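To illustrate what such a dependency does, here is a minimal sketch in a simplified model, not Icinga's actual implementation or configuration syntax: each host may have a parent (here the satellite it is checked through), and a host only notifies if it is down while every parent up the chain is still up. The names `Host`, `parent_chain_up` and `should_notify`, and the example host names, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Host:
    name: str
    is_up: bool
    # Hypothetical link to the host this one depends on (e.g. its satellite).
    parent: Optional["Host"] = None


def parent_chain_up(host: Host) -> bool:
    """Return True if every parent up the dependency chain is up."""
    current = host.parent
    while current is not None:
        if not current.is_up:
            return False
        current = current.parent
    return True


def should_notify(host: Host) -> bool:
    """Notify only for hosts that are down while all of their parents are up."""
    return not host.is_up and parent_chain_up(host)


# Example zone: the satellite is down, so its children count as unreachable.
satellite = Host("satellite-eu", is_up=False)
switch = Host("switch-eu-01", is_up=False, parent=satellite)
ups = Host("ups-eu-01", is_up=False, parent=satellite)

for h in (satellite, switch, ups):
    print(h.name, should_notify(h))
```

Only the satellite produces an alert here; the switch and UPS are suppressed because their parent is down, which is also why this approach gives no insight into their real state during the outage.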
Unfortunately, dependencies wouldn't solve our problem, and they aren't actually needed: if the satellite is not reachable, the checks just stay “OK” and never turn critical or warning, because the check source is not reachable.
But we want notifications and historical data about the local infrastructure, even if the satellite isn't working. When there are problems in a satellite zone, we never know where they come from, because all checks just stay in “OK” state as soon as the satellite is down. We don't know which switches are down or which UPSs have power. A fallback to the master would make this possible.
For this, Icinga should use its replay log function (unless you explicitly turn it off): as long as the satellite is down, the agents should still collect all the check results and send them once the satellite is up again. That makes your request possible, but only after the outage, not during it.
You can technically run multiple Icinga processes on the master node if you really want the fallback, but by doing this you effectively make the master a satellite for each zone as well. This can cause a lot of unwanted network traffic and latency, and you could end up with 8 Icinga processes (7 zones plus 1 master), so there is a lot of overhead and maintenance there too.
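The replay idea can be pictured with a small sketch. This is a simplified model, assuming results are buffered in order while the upstream connection is down and flushed once it returns; it is not Icinga's code, and `ReplayBuffer`, `submit` and `set_connected` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List, Tuple

# (timestamp, check name, state)
CheckResult = Tuple[float, str, str]


@dataclass
class ReplayBuffer:
    # Hypothetical callback that delivers a result to the parent endpoint.
    send: Callable[[CheckResult], None]
    connected: bool = True
    pending: Deque[CheckResult] = field(default_factory=deque)

    def submit(self, result: CheckResult) -> None:
        """Send immediately if connected, otherwise keep it for later replay."""
        if self.connected:
            self.send(result)
        else:
            self.pending.append(result)

    def set_connected(self, connected: bool) -> None:
        """On reconnect, replay buffered results in their original order."""
        self.connected = connected
        if connected:
            while self.pending:
                self.send(self.pending.popleft())


received: List[CheckResult] = []
buf = ReplayBuffer(send=received.append)

buf.submit((1.0, "ping", "OK"))
buf.set_connected(False)              # satellite/parent goes down
buf.submit((2.0, "ping", "CRITICAL"))
buf.submit((3.0, "ping", "OK"))
print(len(received))                  # only the first result arrived so far
buf.set_connected(True)               # connection restored: history is replayed
print(received)
```

The upstream side gets the full history, including the CRITICAL result, but only after reconnecting; nothing arrives during the outage, which matches the limitation described above.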
Unfortunately, none of these workarounds would really help us in our big environment.
I'm worried about running several Icinga2 processes on the master and having to maintain them.
One main goal of IT monitoring is to help during big outages, and right now that doesn't work with a master/satellite setup without bad workarounds.
My only hope is that Icinga will support the option that the master can be used as fallback in the future.