I’m looking for ideas on how to implement the following scenario. We have a distributed monitoring setup as described under “Top Down Config Sync” in the current Icinga2 documentation. We’ll get a new node placed out on the internet (outside of our data center). The goal is to run a small subset of our checks twice:
- To let this outside node send out notifications in case the master is down
- To check the internal and external view (i.e., we need ping latency results as seen from inside our network as well as from the internet, both visible in Icingaweb)
We checked some approaches, none of which is really satisfying. I hope some of you have ideas.
Let the outside node run standalone: Doesn’t really fit our requirements, as then the check results of the outside node are not visible in Icingaweb (and we don’t want to set up a dedicated Icingaweb instance for the outside node). Also, notifications get sent out twice if master and outside node are not able to coordinate.
High availability master: Not applicable, as this outside node should only do a small subset of checks and not all of them. As the node is placed outside our datacenter, it only has limited access to the internal network anyway.
Using command_endpoint: Not applicable, as the outside node should start sending notifications as soon as the master is down (thus the outside node needs its own scheduler).
Connect this outside node via top down config sync: This would be possible, as we can allow bidirectional communication between the outside node and the master and thus could attach the outside node like any inside node.
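To make the last approach concrete, here is a rough sketch of how the zone layout could look in zones.conf on the master. All endpoint names, zone names and the address are placeholders I made up for illustration, not our real setup:

```
// zones.conf on the master (all names and the address are placeholders)

object Endpoint "master1.example.org" {
}

object Zone "master" {
  endpoints = [ "master1.example.org" ]
}

object Endpoint "outside1.example.org" {
  // Setting "host" makes the master open the connection actively.
  // 203.0.113.10 is a documentation address, not a real node.
  host = "203.0.113.10"
}

object Zone "outside" {
  endpoints = [ "outside1.example.org" ]
  // The outside node becomes a child of the master zone.
  parent = "master"
}
```

With a layout like this, objects placed in zones.d/outside/ on the master would be synced to the outside node, the same way as for the inside nodes.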
I tried some approaches with top down config sync, which were all not really satisfying:
- Distribute the Host and Service objects that should be checked twice from the master to the outside node: Doesn’t work, as the master then complains about the Host being redefined in zones.d. But I actually need it twice, as otherwise I can’t monitor twice.
- Manually enter Host and Service objects in conf.d on the outside node: Would work, but has two disadvantages:
** Notifications are sent twice (outside node should not send out notifications if the master is available to do it)
** The outside node also submits the status of the Host object to the master. Seems like there is no way to have this filtered out.
Why do I want the status of the Host object not to be submitted? Probably the best solution would be if I could define (as an example) two services, ping4_internal and ping4_external, on a Host. The outside node should only submit the ping4_external service check to the master, nothing else. This would make it possible to clearly distinguish between the inside and the outside result. It would also allow distinct notification rules (i.e., notify for inside and outside with different “times”). However, when the outside node and the master don’t agree on whether the host is up or not, the host starts flapping on the master. Thus it would be much easier if the outside node could submit only Service results.
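As a sketch of the object layout I have in mind: the fragment below defines both services on the same Host and gives the external one its own notification delay. Everything named here is an assumption for illustration (the custom variable dual_view, the endpoint outside1.example.org, the user icingaadmin, and the mail-service-notification command from the sample configuration). Note that this variant relies on command_endpoint, so the master still does the scheduling and it does not cover the “master is down” case. It only shows the shape of the result I would like to see in Icingaweb.

```
// Sketch only: all names below are hypothetical.

apply Service "ping4_internal" {
  check_command = "ping4"
  check_interval = 1m
  // No command_endpoint: executed from inside the network.
  assign where host.vars.dual_view == true
}

apply Service "ping4_external" {
  check_command = "ping4"
  check_interval = 1m
  // Executed on the outside node, but still scheduled by the master.
  command_endpoint = "outside1.example.org"
  assign where host.vars.dual_view == true
}

apply Notification "mail-ping4-external" to Service {
  command = "mail-service-notification"  // assumed to exist
  users = [ "icingaadmin" ]              // assumed user
  // Separate delay for the external view only.
  times.begin = 15m
  assign where service.name == "ping4_external"
}
```

What is missing is a way to get the same layout while the outside node keeps its own scheduler and reports only the ping4_external result.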
Let’s start this discussion.