Hi all,
Checking HA services is sometimes troublesome in Icinga, since it inherits the strict host -> service dependency from Nagios. I'm looking for a solution that monitors the set of servers behind the service but only alerts if fewer than the minimum number of servers are available.
E.g.: You have an HA service that is served by 10 servers. It’s important to have at least 3 servers available:
servers_avail < 3 => CRIT
servers_avail < 6 => WARN
servers_avail >= 6 => OK
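To make the intended thresholds concrete, here is a minimal Python sketch of the mapping. The function name `cluster_state` and its parameters are made up for illustration and are not part of any Icinga 2 API. It treats exactly 6 available servers as OK and uses the usual plugin exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL).

```python
# Hypothetical helper, not an Icinga 2 API: maps the number of available
# servers to a plugin-style state using the thresholds from this post.
OK, WARNING, CRITICAL = 0, 1, 2  # conventional Nagios/Icinga plugin exit codes


def cluster_state(servers_avail: int, warn_below: int = 6, crit_below: int = 3) -> int:
    """Return CRITICAL below crit_below, WARNING below warn_below, else OK."""
    if servers_avail < crit_below:
        return CRITICAL
    if servers_avail < warn_below:
        return WARNING
    return OK


if __name__ == "__main__":
    # Print the resulting state for a few server counts.
    for n in (10, 6, 5, 3, 2, 0):
        print(n, cluster_state(n))
```

The CRITICAL threshold is checked first so that a count below 3 is never reported as a mere WARNING.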
Of course, you can just monitor the service itself and the hosts separately. But I don’t want to be alerted for a single failed server, and I do want to be alerted before all servers are down.
As far as I can see, the only approaches are based on multi-layer checks:
- https://github.com/Icinga/icingaweb2-module-businessprocess
- https://icinga.com/blog/2021/04/29/calculating-a-state-over-mutliple-services/
What I’m missing is a way to do this without an additional check. I’d like to see this inside Icinga 2 itself, so that if 5 servers go down, Icinga sets the error condition immediately and not only after re-running some BPM checks.
I’ve seen the Redundancy Groups feature, but as far as I understand, it only covers dependencies.
There are already ServiceGroups and HostGroups, which group individual services/hosts. Personally, I’ve never used them, since they cannot be nested and offer no further functionality.
Apart from that, wouldn’t it be nice to have availability checks included there, so that you could specify the minimum number of members in state OK?
Something like:
object ServiceGroup "prod-http-api" {
  display_name = "HTTP Service API Production"
  assign where host.vars.environment == "prod" and service.check_command == "http-api"
  available_crit = 3
  available_warn = 5
  available_ok = 6
}
That way, the ServiceGroup would emit a warning whenever too few members are in state OK.
Kind Regards,
Bene