Hostalive returns CRITICAL, yet reachable shows "yes"

dbodky · October 9, 2019, 7:57am

Hello everybody,

I’m having trouble understanding how a host can (apparently) be both DOWN (hence the hostalive check returning CRITICAL) and reachable at the same time. What can I conclude from this combination, and where might configuration or connectivity errors between my master and the host lie?

Thanks for any answers
Daniel

aflatto · October 9, 2019, 8:07am

Hello Daniel

Can you share the host object configuration and the service definition for the hostalive check?
Is the setup a distributed one, or are you executing all the checks from the master?

dbodky · October 9, 2019, 8:19am

Hi Assaf,

thanks for helping me out here. The setup is partly distributed, but all checks for this specific machine are executed on the master node. The host object looks as follows:

  object Host "[SRS]Router" {
    import "generic-host"
    address = "xx.xx.xx.xxx"
    vars.standort = "city"
    vars.os = ""
    vars.notification["mail"] = {
      users = [ "admins" ]
    }
    groups = [ "iaas" ]
  }

and these are the CheckCommand objects defining the hostalive checks:

  template CheckCommand "hostalive-common" {
    vars.ping_wrta = 3000.0
    vars.ping_wpl = 80

    vars.ping_crta = 5000.0
    vars.ping_cpl = 100
  }

  object CheckCommand "hostalive" {
    import "ping"
    import "hostalive-common"
  }

  object CheckCommand "hostalive4" {
    import "ping4"
    import "hostalive-common"
  }

I hope this is helpful.

aflatto · October 9, 2019, 8:29am

Hello Daniel

This might be unrelated to Icinga, but if the device is a router, is it possible that it is blocking ICMP while you can still reach it via telnet/SSH or the other management ports (SNMP) used by the vendor?
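
If ICMP does turn out to be blocked, one option is to base the host check on a TCP port instead of ping. A minimal sketch, assuming the router accepts connections on port 22 (an assumption, adjust to whatever management port is actually open) and using the "tcp" CheckCommand from the Icinga Template Library:

  object Host "[SRS]Router" {
    import "generic-host"
    address = "xx.xx.xx.xxx"

    // Overrides the check command inherited from the template;
    // port 22 is an assumed example, not taken from the real setup.
    check_command = "tcp"
    vars.tcp_port = 22
  }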

dbodky · October 9, 2019, 8:34am

That might actually be it. It is so trivial that I never thought of it; I will look into it in a moment. In the meantime, do you happen to know why the host is marked as “reachable” anyway? How does Icinga determine whether a host is reachable if ICMP is rejected?

Regards
Daniel

log1c · October 10, 2019, 7:31am

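The “reachable” status in the web interface is part of the dependency calculation.
It reflects the state of the host’s parents, not the result of the host’s own check, so a host without any dependencies is always shown as reachable.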

Example:
If you configure a router as the parent of a server, the “reachable” status of the server changes to “no” in the web interface when the router goes DOWN.
Check: https://icinga.com/docs/icinga2/latest/doc/03-monitoring-basics/#dependencies

Dependencies are used to limit notifications for child hosts (e.g. behind a central router/switch), so you don’t get bombarded with messages if the central component is DOWN. You will then only get the notification for the parent host.
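
A minimal sketch of such a dependency, using the router from this thread as the parent and a hypothetical child host "srs-server" (the child name and the object name are made up for illustration):

  object Dependency "srs-server-behind-router" {
    parent_host_name = "[SRS]Router"
    child_host_name = "srs-server"

    // The child counts as reachable only while the parent is Up.
    states = [ Up ]

    // Keep checking the child, but suppress its notifications
    // while the parent is not Up.
    disable_checks = false
    disable_notifications = true
  }

With this in place, "srs-server" should show reachable “no” as soon as the router is in a hard DOWN state.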

dbodky · October 10, 2019, 10:59am

Thank you very much, I did not know this until now and it clarifies so much and opens new possibilities.