Seeking Help With Configuring One-Way Connectivity

I work for an MSP and we use Icinga2 internally to monitor customer devices, including all workstations/notebooks.

About 100 notebooks are configured with one-way connectivity. The intention is that supported devices can communicate back to a WAN IP from anywhere.

I have configured the host check to be the built-in cluster-zone command. The hosts will almost never be pingable, so a traditional ping host check wouldn’t work. All services are dependent on “Agent Health”, where the command is cluster. I have enabled Active and Passive Checks, Notifications, and Event Handler. While devices are online, this works quite well.

The problems start when a device goes to sleep or comes back online. We immediately receive a notification for every service on the host, right before they all recover. Sometimes the problem notifications and the recoveries arrive at the same time. This is creating alarm fatigue among our techs.

Can anyone explain why this happens and how I can implement this without these notifications?

I’d really like to know if the “not connected” messages are from the notebook agent or from the satellite/parent. I need the statistics, and it must produce notifications for things like 100% full disks while users are working from home. I don’t care if we’re not getting results because the device is offline.

Thanks for any help!

Hi @XilityWorks ,
Hm, that is probably going to be a tricky one. Icinga 2 is pretty much built for a “server” setup, meaning 24/7 connectivity, so this is a bit unconventional.

First things first, you did configure Dependencies, did I read that right? Meaning a dependency on all Services if the connection to the Icinga Infrastructure fails.
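
For reference, in plain-text configuration a dependency of that kind could look roughly like the sketch below. The service name “Agent Health” is taken from your description; the dependency name and the `host.vars.agent_endpoint` custom variable in the assign rule are placeholders I am assuming, so adjust them to whatever your hosts actually use.

```
apply Dependency "agent-health-check" to Service {
  // Parent is the "Agent Health" service on the same host as the child
  parent_service_name = "Agent Health"

  // The dependency fails as soon as the parent leaves the OK state
  states = [ OK ]

  // While it fails, suppress notifications for the child services
  disable_notifications = true

  // Assumed custom variable marking agent hosts
  assign where host.vars.agent_endpoint
  // Avoid a self reference from child to parent
  ignore where service.name == "Agent Health"
}
```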

How are you configuring this setup? Using the Icinga Director or plain text configuration?

Hi Wil,

I am not sure if the alerts seen on the console are the issue, or if the notifications (emails or other) are the issue, or both.

First, I would configure a short check interval for the host, and a slightly longer interval for the dependent services, so that the host state changes before the services are rechecked.
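
As a rough sketch of what I mean, assuming plain-text configuration: the template names and the concrete values below are only illustrative, not a recommendation for your environment.

```
// Hypothetical template for the notebook hosts
template Host "notebook-host" {
  check_command = "cluster-zone"
  check_interval = 1m      // check the connection often
  retry_interval = 30s     // recheck quickly while in a soft state
  max_check_attempts = 3
}

// Hypothetical template for the services on those hosts
template Service "notebook-service" {
  check_interval = 5m      // slower than the host check
  retry_interval = 1m
  max_check_attempts = 3
}
```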

For the alerts on the console, I would set up a filter that excludes alerts for services on hosts that are down, as well as alerts that are not yet in a hard state.

For the notifications, I would set up a delay so that a notification is only sent for checks that are confirmed to be failing.
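
One way to get such a delay in plain-text configuration is the `times.begin` attribute of a Notification object. A minimal sketch follows; the notification name, the 15 minute value, and the assign rule are assumptions on my side, and the command and user names are the ones from the sample configuration, so replace them with your own.

```
apply Notification "mail-delayed" to Service {
  command = "mail-service-notification"
  users = [ "icingaadmin" ]

  // Only notify if the problem still exists 15 minutes after it started
  times.begin = 15m

  // Assumed custom variable marking agent hosts
  assign where host.vars.agent_endpoint
}
```

With this in place, a service that goes bad and recovers within the delay window should not produce a problem notification at all.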

My two cents,

Jean