Host state UP/DOWN based on “host classes” defined by the percentage or number of service states
I checked the troubleshooting documentation first but wasn’t able to find a way forward for my case.
Hi all,
How can we control the threshold that puts a host into the DOWN state (or back into UP) based on how many of its services report an OK, WARNING or CRITICAL state?
I am looking for ideas, guidance and suggestions. We have a medium-sized setup and are wondering how to improve our notifications. We receive too many of them, and important messages sometimes get overlooked and slip through our fingers because less important ones flood our Mattermost channels. So we are looking for some kind of “host classes” that prioritize some hosts over others when it comes to the host state and, through it, the notifications.
Example:
- ultra-high prio: DNS service hosts, network devices and storage servers
- high prio: AD controllers and HA service hosts (pairs of two) in general
- medium prio: HA service hosts with more than two hosts per HA group
- low prio: e.g. reachability checks (we use host objects holding lists of endpoints to group them, and run the checks as services across each group of endpoints)
So the idea was to put a host into DOWN (and notify) if:
- ultra-high prio: number of services in WARNING or CRITICAL > 1
- high prio: number of services in CRITICAL > 1
- medium prio: percentage or number of services in WARNING or CRITICAL > 10% or > 3, respectively
- low prio: percentage of services in WARNING or CRITICAL > 50%
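To make the rules above concrete, here is a minimal Python sketch that models them, assuming the usual Icinga 2 state codes (services: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN; host check exit codes: 0 = UP, 2 = DOWN). The function name `host_state` and the class labels are hypothetical, UNKNOWN services are simply not counted, and the medium rule is read as "more than 10% or more than 3 services". Logic like this could back a custom host check plugin, or be ported to an Icinga 2 `dummy` check whose `vars.dummy_state` is a lambda iterating over `get_services()`; that wiring is not shown here.

```python
from typing import Sequence

# Icinga 2 service states as reported in check results
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
# Host check exit codes: Icinga 2 maps 0 (and 1) to UP, 2 (and 3) to DOWN
HOST_UP, HOST_DOWN = 0, 2


def host_state(prio: str, service_states: Sequence[int]) -> int:
    """Return HOST_UP or HOST_DOWN for a host in the given (hypothetical) priority class."""
    total = len(service_states)
    # Assumption: UNKNOWN services count towards the total but not as problems
    warn_or_crit = sum(1 for s in service_states if s in (WARNING, CRITICAL))
    critical = sum(1 for s in service_states if s == CRITICAL)
    # Percentage of WARNING/CRITICAL services; 0.0 for a host without services
    warn_or_crit_pct = 100.0 * warn_or_crit / total if total else 0.0

    if prio == "ultra-high":
        down = warn_or_crit > 1
    elif prio == "high":
        down = critical > 1
    elif prio == "medium":
        # Assumption: "> 10%/3" means more than 10% OR more than 3 services
        down = warn_or_crit_pct > 10 or warn_or_crit > 3
    elif prio == "low":
        down = warn_or_crit_pct > 50
    else:
        raise ValueError(f"unknown priority class: {prio!r}")
    return HOST_DOWN if down else HOST_UP


if __name__ == "__main__":
    # Two services in WARNING/CRITICAL on an ultra-high host -> DOWN (exit code 2)
    print(host_state("ultra-high", [OK, WARNING, CRITICAL]))
```

For example, `host_state("low", [OK, OK, WARNING, CRITICAL])` stays UP because exactly 50% is not more than 50%, while the same services on a medium host would go DOWN.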
Any ideas, or links to examples or guidance?
################################
Give as much information as you can, e.g.
- Version used (icinga2 --version)
- Operating System and version
icinga2 - The Icinga 2 network monitoring daemon (version: 2.13.2-1)
System information:
Platform: CentOS Linux
Platform version: 7 (Core)
Kernel: Linux
Kernel version: 3.10.0-1160.53.1.el7.x86_64
Architecture: x86_64
Build information:
Compiler: GNU 4.8.5
Build host: runner-hh8q3bz2-project-322-concurrent-0
OpenSSL version: OpenSSL 1.0.2k-fips 26 Jan 2017
- Enabled features (icinga2 feature list)
icinga2 feature list
Disabled features: command compatlog debuglog elasticsearch gelf graphite icingadb influxdb2 livestatus opentsdb perfdata statusdata syslog
Enabled features: api checker ido-pgsql influxdb mainlog notification
- Config validation (icinga2 daemon -C)
icinga2 daemon -C
[2023-03-22 11:16:52 +0100] information/cli: Icinga application loader (version: 2.13.2-1)
[2023-03-22 11:16:52 +0100] information/cli: Loading configuration file(s).
[2023-03-22 11:16:52 +0100] information/ConfigItem: Committing config item(s).
[2023-03-22 11:16:52 +0100] information/ApiListener: My API identity: #######.########.####.####.###
[2023-03-22 11:16:53 +0100] warning/ApplyRule: Apply rule 'ncpa_ctx_drive - ' (in /etc/icinga2/zones.d/global-templates/services_CTX.conf: 32:1-32:69) for type 'Service' does not match anywhere!
[2023-03-22 11:16:53 +0100] warning/ApplyRule: Apply rule 'ncpa_network_ctx_win_recv_ ' (in /etc/icinga2/zones.d/global-templates/services_CTX.conf: 75:1-75:71) for type 'Service' does not match anywhere!
[2023-03-22 11:16:53 +0100] warning/ApplyRule: Apply rule 'tcp-' (in /etc/icinga2/zones.d/global-templates/services/services.conf: 8:1-8:55) for type 'Service' does not match anywhere!
[2023-03-22 11:16:53 +0100] warning/ApplyRule: Apply rule 'ncpa_mnt_used_gb' (in /etc/icinga2/zones.d/global-templates/services/services.conf: 92:1-92:32) for type 'Service' does not match anywhere!
[2023-03-22 11:16:53 +0100] warning/ApplyRule: Apply rule 'ncpa_mem_used_swap' (in /etc/icinga2/zones.d/global-templates/services/services.conf: 168:1-168:34) for type 'Service' does not match anywhere!
[2023-03-22 11:16:53 +0100] warning/ApplyRule: Apply rule 'hlm-status' (in /etc/icinga2/zones.d/global-templates/services/services.conf: 405:1-405:26) for type 'Service' does not match anywhere!
[2023-03-22 11:16:53 +0100] information/ConfigItem: Instantiated 1 InfluxdbWriter.
[2023-03-22 11:16:53 +0100] information/ConfigItem: Instantiated 1 NotificationComponent.
[2023-03-22 11:16:53 +0100] information/ConfigItem: Instantiated 1 IdoPgsqlConnection.
[2023-03-22 11:16:53 +0100] information/ConfigItem: Instantiated 1 CheckerComponent.
[2023-03-22 11:16:53 +0100] information/ConfigItem: Instantiated 2 Users.
[2023-03-22 11:16:53 +0100] information/ConfigItem: Instantiated 1 UserGroup.
[2023-03-22 11:16:53 +0100] information/ConfigItem: Instantiated 3 ServiceGroups.
[2023-03-22 11:16:53 +0100] information/ConfigItem: Instantiated 3 TimePeriods.
[2023-03-22 11:16:53 +0100] information/ConfigItem: Instantiated 7310 Services.
[2023-03-22 11:16:53 +0100] information/ConfigItem: Instantiated 9 Zones.
[2023-03-22 11:16:53 +0100] information/ConfigItem: Instantiated 4 NotificationCommands.
[2023-03-22 11:16:53 +0100] information/ConfigItem: Instantiated 143 HostGroups.
[2023-03-22 11:16:53 +0100] information/ConfigItem: Instantiated 8303 Notifications.
[2023-03-22 11:16:53 +0100] information/ConfigItem: Instantiated 1 IcingaApplication.
[2023-03-22 11:16:53 +0100] information/ConfigItem: Instantiated 982 Hosts.
[2023-03-22 11:16:53 +0100] information/ConfigItem: Instantiated 8 Endpoints.
[2023-03-22 11:16:53 +0100] information/ConfigItem: Instantiated 20 Comments.
[2023-03-22 11:16:53 +0100] information/ConfigItem: Instantiated 1 FileLogger.
[2023-03-22 11:16:53 +0100] information/ConfigItem: Instantiated 2 ApiUsers.
[2023-03-22 11:16:53 +0100] information/ConfigItem: Instantiated 293 CheckCommands.
[2023-03-22 11:16:53 +0100] information/ConfigItem: Instantiated 1 ApiListener.
[2023-03-22 11:16:54 +0100] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2023-03-22 11:16:54 +0100] information/cli: Finished validating the configuration file(s).