Alternative host check_command examples?

connrs · January 19, 2022, 9:33am

Hi all,

I’ve recently had some issues with ping checks not being suitable as a means to verify if a host is up on Azure.

Ping was working (oddly) but the system was clearly offline and the services on the host were all aware that the agent was offline.

Is there a check that is recommended as an alternative to hostalive/ping? It was pretty clear that Icinga knew that the agent wasn’t responding - is there an agent ping?

I appreciate that I’m asking for advice here without giving you any details. Imagine that all the nodes are on the same subnet and on the same rack!

Thanks,
connrs.

Pooh · January 19, 2022, 9:54am

> Hi all,
>
> I’ve recently had some issues with ping checks not being suitable as a
> means to verify if a host is up on Azure.

My experience is that Azure blocks ICMP in the default firewall rules, and you
can’t change that. I think it’s bad, but that’s what Azure does.

> Is there a check that is recommended as an alternative to
> hostalive/ping? It was pretty clear that Icinga knew that the agent wasn’t
> responding - is there an agent ping?

Yes: cluster-zone, from the Icinga Template Library (Icinga 2):

https://icinga.com/docs/icinga-2/latest/doc/06-distributed-monitoring/#cluster-zone-with-masters-and-agents

Alternatively you could use the SSH check, assuming your monitored machine
listens for SSH connections.
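
As a rough illustration of the SSH option, a host object could use the "ssh" CheckCommand from the Icinga Template Library as its check_command. This is only a sketch: the host name is reused from later in this thread, and the address and port are assumptions chosen for the example.

```
object Host "client1.example.com" {
  // Hypothetical address from the documentation range; replace with the real one.
  address = "192.0.2.10"

  // Host is considered reachable when an SSH server answers on this address.
  check_command = "ssh"

  // Assumed port; only needed if sshd does not listen on the default.
  vars.ssh_port = 22
}
```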

Antony.

connrs · January 19, 2022, 11:47am

Thanks @Pooh. I really should just get cluster-zone into my monitoring anyway because it looks invaluable. In addition to running it as a service, are you saying it works as check_command on a host object?

E.g.

object Host "client1.example.com" {
  check_command = "cluster-zone"
}

Pooh · January 19, 2022, 12:22pm

Any service check can be used as a check_command on a host object.

A service check will return value 0 if all is okay, or an integer greater than
zero for Warning (1), Critical (2), or Unknown (3).

The check_command for a Host object maps that result onto just two states,
therefore any Service check can be used: in Icinga 2, an OK or Warning result
means “Host is Up” and a Critical or Unknown result means “Host is Down”.

You might just need to be a bit careful with alert thresholds to make sure
“Down” really does mean Down, rather than “a bit unhappy” (although this
applies to the standard ping check as well of course).
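
To make the point about thresholds concrete, here is a sketch of a host using the "ping4" CheckCommand from the Icinga Template Library with explicit thresholds. The address and all threshold values are illustrative assumptions, not recommendations; they would need tuning to the network in question.

```
object Host "client1.example.com" {
  // Hypothetical address from the documentation range.
  address = "192.0.2.10"
  check_command = "ping4"

  // Illustrative values only: loosen or tighten to match what "Down" should mean.
  vars.ping_wrta = 200    // warning round-trip time in milliseconds
  vars.ping_wpl = 20      // warning packet loss in percent
  vars.ping_crta = 1000   // critical round-trip time in milliseconds
  vars.ping_cpl = 100     // critical packet loss in percent
}
```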

Antony.

connrs · January 19, 2022, 12:40pm

I feel a little dim for not realising this in the first place! I’ve been using Icinga for years and that didn’t click for me at all. Thanks

Edit: I want to point out that while @Pooh’s first reply is the solution, the second reply contains an insight that I was missing when reading the docs.

Tqnsls · January 19, 2022, 1:32pm

We just switched from hostalive / check_ping to check_icmp (from nagios-plugins-icmp) because it is much faster: check_ping needs about 1 second, often more, whereas check_icmp mostly takes less than 0.5 seconds.
In huge environments this is an immense advantage.
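
For anyone wanting to try the same switch, the Icinga Template Library ships an "icmp" CheckCommand that wraps check_icmp. A minimal sketch, assuming the plugin is installed in the configured plugin directory and using a made-up address:

```
object Host "client1.example.com" {
  // Hypothetical address from the documentation range.
  address = "192.0.2.10"

  // Uses check_icmp instead of check_ping (hostalive / ping4).
  check_command = "icmp"
}
```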