Make host problem messages in Icinga2 custom for Hostgroups

lobr · July 22, 2021, 10:23am

Hello Icinga community,

I am looking for some help with an Icinga2 monitoring issue:

The issue:
We are monitoring a huge number of floating IP machines, which by their nature show some packet loss and high ping times. Since I don’t want hundreds of VMs to be reported as down in Icinga2 Web, it would be helpful if specific hosts only went to the critical state after a set number of checks.

The goal:
Checks on a specific host group (floating IP hosts) should stay in the warning state for 3 check attempts, even if there is packet loss or high ping. After more than 3 check attempts where packet loss is > 20% and ping is > 500 ms, those hosts should go to the critical state.

Is it possible to customize the severity of a host check in Icinga2, and how could this be implemented?
Thank you in advance!

dgoetz · July 22, 2021, 11:03am

Yes, you can use custom variables to set different thresholds.

So I would add another host template which imports the normal one but is only used for the floating IP hosts, and which sets the custom variables of hostalive as needed.

If you also have an additional ping check assigned to the host, ensure that the custom variables for it are also set explicitly, as it will otherwise inherit them from the host object.
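
A minimal sketch of what such a setup could look like. The names "generic-host", "generic-service", "floating-ip-host" and the host group "floating-ip-hosts" are assumptions, and all threshold values are placeholders to adjust:

```
// Hypothetical template for the floating IP hosts; it imports the
// normal host template (assumed here to be called "generic-host").
template Host "floating-ip-host" {
  import "generic-host"

  check_command = "hostalive"

  // Placeholder thresholds for hostalive (RTA in ms, packet loss in %).
  vars.ping_wrta = 200
  vars.ping_wpl = 10
  vars.ping_crta = 500
  vars.ping_cpl = 20
}

// An additional ping service would otherwise pick up the host's
// ping_* variables, so its own thresholds are set explicitly here.
apply Service "ping4" {
  import "generic-service"

  check_command = "ping4"

  vars.ping_wrta = 100
  vars.ping_wpl = 5
  vars.ping_crta = 200
  vars.ping_cpl = 15

  // Assumed host group name.
  assign where "floating-ip-hosts" in host.groups
}
```

Hosts with floating IPs would then import "floating-ip-host" instead of the normal template.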

Pooh · July 22, 2021, 11:12am

Have a look at the definition of the hostalive check:

https://icinga.com/docs/icinga-2/latest/doc/10-icinga-template-library/#hostalive

and also the parameters for host checking:

https://icinga.com/docs/icinga-2/latest/doc/09-object-types/#host

The settings you want to adjust are:

max_check_attempts - The number of times a host is re-checked before changing
into a hard state. Defaults to 3.

ping_wrta - The RTA warning threshold in milliseconds. Defaults to 3000.
ping_wpl - The packet loss warning threshold in %. Defaults to 80.
ping_crta - The RTA critical threshold in milliseconds. Defaults to 5000.
ping_cpl - The packet loss critical threshold in %. Defaults to 100.

So, for the values you asked about, you would set:

max_check_attempts=3
ping_wrta=whatever RTA value you want to be warned about
ping_wpl=whatever packet loss %age you want to be warned about
ping_crta=500
ping_cpl=20

Note that these checks are combined, meaning that if either exceeds the
warning threshold, you will get a warning; if either exceeds the critical
threshold, you will get a critical alert.

You cannot specify “alert only if both RTA is high and packet loss is high”.
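
As a rough sketch, the values above could be placed in a host template like the one below. The template name and the two warning thresholds are placeholders, not recommendations; only max_check_attempts, ping_crta and ping_cpl are taken from the values discussed here.

```
// Hypothetical template name; import your usual host template as needed.
template Host "floating-ip-host" {
  check_command = "hostalive"

  // Number of check attempts before the host enters a hard state.
  max_check_attempts = 3

  // Placeholder warning thresholds; pick whatever you want to be warned about.
  vars.ping_wrta = 250   // RTA warning threshold in ms
  vars.ping_wpl = 10     // packet loss warning threshold in %

  // Critical thresholds from the question.
  vars.ping_crta = 500   // RTA critical threshold in ms
  vars.ping_cpl = 20     // packet loss critical threshold in %
}
```

With these settings the check goes critical as soon as either the RTA or the packet loss critical threshold is exceeded, not only when both are.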

I hope that helps,

Antony.

lobr · July 26, 2021, 12:55pm

Thank you @dgoetz and @Pooh that helped me a lot!

One last thing:
How is the packet loss calculated?
The packet loss percentages that occur most often are 16% and 28%.
How many packets are sent by each ping?
Is there any variable to customize the number of packets sent by each ping?

dgoetz · July 26, 2021, 1:11pm

Yes, this is also configurable via the custom variable ping_packets, which should default to 5. (Which makes me wonder about the 16% and 28%.)

lobr · July 26, 2021, 1:35pm

Okay, I set the variable vars.ping_packets = 5 inside the host template, so I was expecting packet loss of 20%, 40%, 60%, …
But it is still 16%, 28%, 37%, …