Dependencies notifying immediately after recovery

This question is complementary to my unanswered question at Notifications despite Dependencies, where my configs and system details can be found.

In cases where dependencies do work, I immediately get notifications from hosts behind the parent when the router recovers. Ideally those hosts should either be re-polled immediately, or at least wait until the next polling cycle completes.

What is the default behavior supposed to be?
What should I put in the dependency or notification config to get the desired behavior?

Hi,

if I understood you correctly, the problem is this: when your router goes CRITICAL (first notification), all the child hosts are “handled” by the dependency (no notification), but when the router comes back online (status change to OK), you are notified for all the child hosts?

You can disable the checks with disable_checks = true; this prevents the child hosts from being checked as long as the dependency is active.
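A minimal sketch of such a dependency (host names and the assign filter are placeholders, not taken from this thread):

```
apply Dependency "router-to-children" to Host {
    parent_host_name = "router"       // hypothetical parent host
    disable_checks = true             // children are not checked while the dependency is active
    disable_notifications = true      // and no notifications are sent for them
    states = [ Up ]                   // dependency is satisfied only while the parent is Up
    assign where host.vars.behind_router == true
}
```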

For a more detailed look we need an example; please add the configuration including all templates, so we can see the check_interval and retry_interval values.

Best regards
Michael


Michael,

Thank you for your response.

As you can see in the configs I linked at Notifications Despite Dependencies, disable_checks = true is set in the config, and shows up in the dependency object. The check_intervals and retry_intervals are there as well.

Any suggestions are welcome.

I’ll hook into this, as I had the same problem just now (and sometimes in the past)


Log Master 01 (filtered by host name)
[2020-02-19 09:10:22 +0100] information/Checkable: Checkable 'abc-0527-LR01' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2020-02-19 09:36:24 +0100] information/Checkable: Checkable 'abc-0527-LR01' has 1 notification(s). Checking filters for type 'Recovery', sends will be logged.
[2020-02-19 09:36:26 +0100] information/Checkable: Checkable 'abc-0527-FW01' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2020-02-19 09:36:26 +0100] information/Notification: Sending 'Problem' notification 'abc-0527-FW01!xy-0527-Hosts' for user 'xxx'
[2020-02-19 09:36:26 +0100] information/Notification: Completed sending 'Problem' notification 'abc-0527-FW01!xy-0527-Hosts' for checkable 'abc-0527-FW01' and user 'xxx' using command 'mail-host-notification-prio1'.
[2020-02-19 09:36:27 +0100] information/Notification: Sending reminder 'Problem' notification 'abc-0527-LR01!Ping_ISP_public_IP!xy-0527-Public-IP_ISP' for user 'xxx'
[2020-02-19 09:36:27 +0100] information/Notification: Completed sending 'Problem' notification 'abc-0527-LR01!Ping_ISP_public_IP!xy-0527-Public-IP_ISP' for checkable 'abc-0527-LR01!Ping_ISP_public_IP' and user 'xxx' using command 'mail-service-notification-prio2'.
[2020-02-19 09:36:27 +0100] information/Notification: Sending reminder 'Problem' notification 'abc-0527-SW-MS225!xy-0527-Hosts' for user 'xxx'
[2020-02-19 09:36:27 +0100] information/Notification: Completed sending 'Problem' notification 'abc-0527-SW-MS225!xy-0527-Hosts' for checkable 'abc-0527-SW-MS225' and user 'xxx' using command 'mail-host-notification-prio1'.
[2020-02-19 09:36:46 +0100] information/Checkable: Checkable 'abc-0527-SW-MS225' has 1 notification(s). Checking filters for type 'Recovery', sends will be logged.
[2020-02-19 09:37:44 +0100] information/Checkable: Checkable 'abc-0527-LR01!Ping_ISP_public_IP' has 1 notification(s). Checking filters for type 'Recovery', sends will be logged.
[2020-02-19 09:41:13 +0100] information/Checkable: Checkable 'abc-0527-FW01' has 1 notification(s). Checking filters for type 'Recovery', sends will be logged.
Log Master 02 (filtered by host name)
[2020-02-19 09:10:22 +0100] information/Checkable: Checkable 'abc-0527-LR01' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2020-02-19 09:25:26 +0100] information/Notification: Sending reminder 'Problem' notification 'abc-0527-LR01!xy-0527-Hosts' for user 'xxx'
[2020-02-19 09:25:26 +0100] information/Notification: Completed sending 'Problem' notification 'abc-0527-LR01!xy-0527-Hosts' for checkable 'abc-0527-LR01' and user 'xxx' using command 'mail-host-notification-prio1'.
[2020-02-19 09:36:24 +0100] information/Checkable: Checkable 'abc-0527-LR01' has 1 notification(s). Checking filters for type 'Recovery', sends will be logged.
[2020-02-19 09:36:26 +0100] information/Checkable: Checkable 'abc-0527-FW01' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2020-02-19 09:36:46 +0100] information/Checkable: Checkable 'abc-0527-SW-MS225' has 1 notification(s). Checking filters for type 'Recovery', sends will be logged.
[2020-02-19 09:37:44 +0100] information/Checkable: Checkable 'abc-0527-LR01!Ping_ISP_public_IP' has 1 notification(s). Checking filters for type 'Recovery', sends will be logged.
[2020-02-19 09:41:13 +0100] information/Checkable: Checkable 'abc-0527-FW01' has 1 notification(s). Checking filters for type 'Recovery', sends will be logged.

Dependency-Configuration:

apply Dependency "abc-LR01_zu_abc-0527-Hosts" to Host {
    parent_host_name = "abc-0527-LR01"
    disable_checks = true
    disable_notifications = true
    ignore_soft_states = false
    period = "24x7"
    assign where match("abc-0527*", host.name) && host.name != "abc-0527-LR01"
    states = [ Up ]
}

Notification apply rules (for hosts and service):

apply Notification "xy-0527-Hosts" to Host {
    times = {
        begin = 15m
    }
    command = "mail-host-notification-prio1"
    interval = 0s
    period = "24x7"
    assign where match("abc-0527*", host.name) && host.name != "abc-0527-AP"
    states = [ Down, Up ]
    types = [ Custom, Problem, Recovery ]
    users = [ "xxx" ]
}
apply Notification "xy-0527-Public-IP_ISP" to Service {
    times = {
        begin = 15m
    }
    command = "mail-service-notification-prio2"
    interval = 0s
    period = "24x7"
    assign where service.name == "Ping_ISP_public_IP" && host.name == "abc-0527-LR01"
    states = [ Critical, OK, Unknown, Warning ]
    types = [ Custom, Problem, Recovery ]
    users = [ "xxx" ]
}

The user only gets the CRITICAL/DOWN notifications.

I have the feeling that times = { begin = 15m } is the “bad boy” here.

Setup:
2 Masters, 2 Satellites (doing the checks)
CentOS7
Icinga v2.11.0 (trying to schedule an update this week to get to 2.11.2)

@ken Do you delay your notifications as well?

Another thought:
Host template:

template Host "_iamatemplate_" {
    check_command = "icmp"
    max_check_attempts = 3
    check_interval = 5m
    retry_interval = 1m
...

Would it help to raise the retry_interval of the parent host, e.g. to a multiple of the check_interval of the child hosts? That was a suggestion the last time I inquired about this. I can’t really see why that would help, and it would delay notifications even further.
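For illustration, the suggestion as stated would look roughly like this (the values are hypothetical, not from this thread):

```
// Suggestion: parent retry_interval set to a multiple of the child check_interval,
// so the parent's soft/hard state transitions span several child check cycles.
template Host "parent-router-template" {
    check_command = "icmp"
    max_check_attempts = 3
    check_interval = 5m
    retry_interval = 10m    // 2x the child check_interval below
}

template Host "child-host-template" {
    check_command = "icmp"
    max_check_attempts = 3
    check_interval = 5m
    retry_interval = 1m
}
```

Whether this actually helps on recovery is unclear; it mainly stretches out how long the parent stays in a SOFT state before reaching HARD Down.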

@log1c I added times.begin on the router notification, which helped with my other issue (Notifications despite Dependencies) but did not fully cure it, and it did not affect this issue one way or the other.

So you didn’t have the times = { begin = something } in your notification config before?

What I was aiming at is that exactly this setting might be causing this behavior. But since you did not have this setting prior to my post, I think it can be ruled out as the culprit.

Yes, same behavior with and without times.begin

I added

disable_checks = true

to Dependency "router-internal" and let it run overnight. Out of 6 events, instead of 2–5 notifications on each router recovery, I got 0 on the first 5 and 1 on the 6th. So I’ll call that fixed.

Unfortunately, Icinga is still an order of magnitude noisier than our older monitoring solution, because basic dependencies still do not stop alerting while a parent is down. Unless I see some progress on Notifications despite Dependencies, I will need to find another solution.