Minimal Available Members in Service Groups

Hi all,

Checking HA services is sometimes troublesome in Icinga, since Nagios-style monitoring is built around a straight host->service dependency. I’m looking for a way to monitor the set of servers behind a service, but to alert only if fewer than a minimum number of servers are available.

E.g.: you have an HA service that is served by 10 servers. It’s important to have at least 3 servers available.

servers_avail < 3 => CRIT
servers_avail < 6 => WARN
servers_avail >= 6 => OK

Of course, you can just monitor the service itself and the hosts separately. But I don’t want to be alerted for a single server, and I do want to be alerted before all servers are down.

As far as I can see, the only approach is based on multi-layer checks:
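For illustration, such a layered check could look roughly like the sketch below. It is only a sketch under assumptions: the host name "ha-frontend", the member host names, the service name "http-api" and the variable member_hosts are all placeholders, and it relies on the "dummy" check command from the ITL together with the get_service() and macro() functions. The thresholds are the ones from the example above.

```icinga2
// Hypothetical aggregate service; all object names are placeholders.
object Service "http-api-availability" {
  host_name = "ha-frontend"
  check_command = "dummy"
  check_interval = 30s
  retry_interval = 10s

  // Assumed list of hosts that serve the HA service.
  vars.member_hosts = [ "web01", "web02", "web03" ]

  // Evaluated on every run of this check, not when a member changes state.
  vars.dummy_state = {{
    var ok_count = 0
    var member_hosts = macro("$member_hosts$")

    for (node in member_hosts) {
      var svc = get_service(node, "http-api")
      // Count only members that exist and are currently OK (state 0).
      if (svc && svc.state == 0) {
        ok_count += 1
      }
    }

    if (ok_count < 3) {
      return 2 // CRITICAL
    }
    if (ok_count < 6) {
      return 1 // WARNING
    }
    return 0 // OK
  }}
}
```

Note that this aggregate only updates on its own check_interval, so it lags behind the member services by up to one interval.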

What I’m missing is a way to do this without yet another check on top. I’d like to see this inside Icinga 2 itself, so that if 5 servers go down, Icinga sets the error condition immediately and not only after re-running some BPM checks.

I’ve seen the Redundancy Groups feature, but as far as I understand, it only covers dependencies.

There are already ServiceGroups and HostGroups, which group individual services/hosts. Personally, I’ve never used them, since they cannot be nested and do not offer any further functionality.

Apart from that, wouldn’t it be nice to have some availability checks included there, so that you could specify the minimum number of members in state OK?

Something like:

object ServiceGroup "prod-http-api" {
  display_name = "HTTP Service API Production"
  assign where host.vars.environment == "prod" and service.check_command == "http-api"
  // Proposed attributes; these do not exist in Icinga 2 today
  available_crit = 3
  available_warn = 5
  available_ok = 6
}

This way, the ServiceGroup would emit a warning whenever too few members are in state OK.

Kind Regards,

Here you’ll find a suitable blog post and here a discussion about it.

Thank you. The article you’ve linked is indeed the same one I had already referenced. But as mentioned, this is just another layer of services derived from other services.

I wonder whether it is possible to remove the layering, especially to get rid of the delay between runs.

As far as I understand, the existing ServiceGroup objects could serve this need, but that would require an enhancement of the Icinga 2 core.