Why can't I get a dependency working, despite various attempts?

Porcupine · October 7, 2021, 6:31am

I’ve been having difficulties creating a dependency, and I can’t help but think there’s something wrong with my general way of going about it. Here’s a bit of backstory:

I’m trying to monitor some optics on DWDM units. Each path ends up being 4x optics (RX/TX on each), across two devices (location1/location2). I ended up splitting each monitored channel into its own make-believe host. That let me combine the two actual hosts in one place (for path visibility/clarity), and it also lets me grant visibility to impacted parties while locking them to viewing only their own channels via Icinga Web 2.

Alerting is where it’s all falling apart, though. It’s normal for various elements in a channel to go down when either endpoint gets unplugged. We don’t want alerts going out when a user takes their endpoint down for maintenance or the like, so we want to silence various alerts if either endpoint goes down. After a slew of trial and error, I’ve stripped it down to a single, dead-basic dependency (which I can see being evaluated and acted on when running icinga2 in debug mode), yet alerts go out no matter what. The config looks something like this:

object HostGroup "dwdm1-ch28" {
  display_name = "DWDM1 - Channel 28"
}

object Host "dwdm1-ch28" {
  import "template-network-host"
  display_name = "Network: DWDM1-Channel 28"
  address = "127.0.0.1"
  groups = [ "dwdm1-ch28" ]
}

apply Dependency "dwdm1-tor1-ch28" to Service {
  parent_service_name = "tor1.2-RX"
  states = [ OK ]
  disable_notifications = true
  assign where service.name == "tor1.1-RX"
}

apply Notification "mail-service-notification-dwdm1-ch28" to Service {
  import "mail-service-notification"
  user_groups = [ "dwdm1-channel28-admins" ]
  interval = 5m
  assign where host.name == "dwdm1-ch28"
  ignore where host.name == "dwdm1-ch28" && ( service.name == "tor1.2-RX" || service.name == "tor2.2-RX" )
}

[service checks configured below, all work as expected]

In the above example, there are 8 services with the following states (entirely static, since it’s a test channel at the moment):

tor1.1-RX [Critical]
tor1.1-TX [Ok]
tor1.2-RX [Critical]
tor1.2-TX [Ok]
tor2.1-RX [Critical]
tor2.1-TX [Ok]
tor2.2-RX [Ok]
tor2.2-TX [Ok]

tor1.2-RX and tor2.2-RX are the services we want everything else to depend on: if either is down, anything else can be down and that would potentially be normal. As you’ll notice, I’m only testing a single dependency against a single service. I’ve dumbed it down as far as I could, assuming I’d made mistakes, just trying to get ANY result to work back from, to no avail.
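For reference, the full intent described above (rather than the stripped-down test) could be sketched roughly as follows. This is only a sketch under assumptions: the two Dependency object names are made up for illustration, and whether several dependencies on one service must all pass or only one of them depends on the Icinga 2 version in use, so it needs checking against the running release.

```
// Sketch only: the Dependency names below are hypothetical.
// Silence notifications for the other services on the channel host
// whenever tor1.2-RX is not OK.
apply Dependency "dwdm1-ch28-on-tor1.2-RX" to Service {
  parent_service_name = "tor1.2-RX"   // parent host defaults to the child's host
  states = [ OK ]                     // dependency holds only while the parent is OK
  disable_notifications = true
  assign where host.name == "dwdm1-ch28"
  // Keep the two parent services out of the rule so they never depend on themselves or each other.
  ignore where service.name == "tor1.2-RX" || service.name == "tor2.2-RX"
}

// Same idea for the second endpoint.
apply Dependency "dwdm1-ch28-on-tor2.2-RX" to Service {
  parent_service_name = "tor2.2-RX"
  states = [ OK ]
  disable_notifications = true
  assign where host.name == "dwdm1-ch28"
  ignore where service.name == "tor1.2-RX" || service.name == "tor2.2-RX"
}
```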

My understanding is that the dependency should flip the “reachability” of the “tor1.1-RX” service to unreachable, and notifications for that service should cease once it’s unreachable. When watching that service via Icinga Web 2, I see the reachability flag flip perpetually between green/reachable and red/unreachable about twice a minute, but I have no idea why it’s getting set back to reachable. The dependency’s success/fail status (depending on my testing parameters) is constant in the debug log.

Is this broken because the service dependency has the same host for parent and child? Is there some flaw in the logic? I’ve tried dozens of combinations at this point, but the only constant is that alerts continue no matter what, regardless of any dependency status/etc.

What am I doing wrong? Anyone?

Porcupine · October 8, 2021, 9:37pm

For anyone curious, I’ve figured it out:

I hadn’t seen that Icinga 2 changed how service dependencies are handled earlier this year: any passing dependency now overrides all failing ones, so a service counts as reachable if any single dependency attached to it passes.

In this case there was a conflicting [general network] dependency that another administrator had accidentally applied to services as well. They were trying to make things foolproof by applying the dependency to both hosts and services, not knowing that services are already implicitly tied to their hosts. That’s why I kept seeing the reachability flag flip between red and green a few times a minute, as the two sets of dependencies acted on the same services.
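A minimal sketch of how such a conflict can arise, for anyone hitting the same thing. The `general-network` rule, the `core-router` parent host, and the `network_device` custom variable are hypothetical stand-ins, not the actual config from this setup. Only the second rule comes from the original post.

```
// Hypothetical catch-all rule (all names in it are made up for illustration).
// Because it is applied "to Service", it attaches a second dependency
// to every matching service, including tor1.1-RX.
apply Dependency "general-network" to Service {
  parent_host_name = "core-router"                // hypothetical parent host
  disable_notifications = true
  assign where host.vars.network_device == true   // hypothetical custom variable
}

// The channel-specific rule from the original post.
apply Dependency "dwdm1-tor1-ch28" to Service {
  parent_service_name = "tor1.2-RX"
  states = [ OK ]
  disable_notifications = true
  assign where service.name == "tor1.1-RX"
}
```

With both rules matching tor1.1-RX, the catch-all keeps passing while its parent host is up, so under the "any passing dependency wins" behavior described above it can mark the service reachable even though the channel-specific dependency fails. Restricting the catch-all rule to hosts (`apply Dependency ... to Host`) would be one way to avoid the overlap, assuming services only need the host-level relationship.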