How do I suppress UNKNOWN service notifications for specific teams of a couple of customers?

Software:
icinga2 - The Icinga 2 network monitoring daemon (version: 2.11.8)
Director plugin: 1.6.1

System information:
Platform: CentOS Linux
Platform version: 7 (Core)
Kernel: Linux
Kernel version: 3.10.0-1160.15.2.el7.x86_64
Architecture: x86_64

Build information:
Compiler: GNU 4.8.5
Build host: runner-hh8q3bz2-project-322-concurrent-0

Question: How do I suppress notifications for services in the UNKNOWN state?

I support infrastructure monitoring for a couple of customers. I have a requirement to disable UNKNOWN notifications for a few customers and for specific teams.

The following is my sample scenario and requirement.
Customer - A : LINUX, VMware, Network - Need to allow UNKNOWN for these teams only
The support team for Customer - A needs to receive all alerts.

Customer - B : LINUX, VMware, Network - Need to allow UNKNOWN for these teams only
SAP, ORACLE-DB, WINDOWS do not require UNKNOWN, so it should not be allowed for these teams
The support team for Customer - B needs to receive all alerts.

I have created separate templates for each:
For example:

Support team template:
HOST =>
apply Notification "Linux_Host_Disable_Unknown" to Host {
import "default-host-notification"

assign where host.vars.os == "linux"
users = [ "linux.support" ]

}

Services =>
apply Notification "Linux_Services_Disable_Unknown" to Service {
import "default-service-notification"

assign where host.vars.os == "linux"
users = [ "linux.support" ]

}

Customer Team Template :
HOST =>
apply Notification "CUST_Linux_Host_Disable_Unknown" to Host {
import "default-host-notification"

assign where host.name == "host1" || match("host2*", host.name) || host.name == "host3" || host.name == "host4" || host.name == "host5"
states = [ Down, Up ]
types = [ Custom, Problem, Recovery ]
users = [ "cust.linux.support" ]

}

Service =>

apply Notification "CUST_Linux_Services_Disable_Unknown" to Service {
import "default-service-notification"

assign where host.name == "host1" || match("host2*", host.name) || host.name == "host3" || host.name == "host4" || host.name == "host5"
states = [ Critical, OK, Warning ]
types = [ Custom, Problem, Recovery ]
users = [ "cust.linux.support" ]

}

Our customer TAC does not act on any unknown (or warning) for ALL customers, so our approach is a little different.

In your description, it looks like you’re on the right path, but I would ensure you have all of the following:

  • Service Group via apply rule that is for disabled unknowns
  • Host Group via apply rule that is for disabled unknowns
  • Notification Apply rule that disables Unknown alerts for hosts/services (you may use a notification template as you have above to import for this rule)

Unfortunately, we use the Director, since we automate the import, group assignment, and service assignment for ~25k services (never mind the host count), so I don’t have a good .conf file example for doing the assignments.

Basically, you want to first create a notification template that does not include the unknown state. From there, create both a service group and a host group, and assign the relevant hosts/services to those groups with the method of your choosing. Custom variables are the way to go; manual assignment works if your environment is tiny, but for anything small or larger it’s time to use custom variables.
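
As a rough .conf sketch of those two steps (not taken from our Director setup): the template name service-notification-no-unknown and the custom variable vars.disable_unknown are made up for illustration, while default-service-notification and the group names come from this thread.

```
// Service notification template that leaves UNKNOWN out of the state filter.
// "default-service-notification" is the template from the question.
template Notification "service-notification-no-unknown" {
  import "default-service-notification"
  states = [ OK, Warning, Critical ]
  types = [ Problem, Recovery, Custom ]
}

// Group membership driven by a hypothetical custom variable set on the host.
object HostGroup "linux_host_disable_unknown" {
  display_name = "Linux hosts without UNKNOWN notifications"
  assign where host.vars.os == "linux" && host.vars.disable_unknown == true
}

object ServiceGroup "linux_services_disable_unknown" {
  display_name = "Linux services without UNKNOWN notifications"
  assign where host.vars.os == "linux" && host.vars.disable_unknown == true
}
```

Host notifications only filter on Up and Down, so the UNKNOWN filter matters on the service side; the host group is mostly there so hosts can be selected the same way as services.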

After this, create your notification apply rule for excluding unknowns, import the template into the rule, and assign where the group name is linux_host_disable_unknown (for hosts) or linux_services_disable_unknown (for services). You will have a second notification apply rule that applies to either:

  • hosts/services NOT in the disable group
  • hosts/services that are in an ENABLE group
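
A minimal sketch of that pair of apply rules for services, assuming the service group is named as above and that a notification template without the unknown state already exists. The template name service-notification-no-unknown and both rule names are placeholders; the user and default-service-notification are copied from the question and will differ per team.

```
// Rule 1: services in the disable group get notifications without UNKNOWN.
apply Notification "linux-services-no-unknown" to Service {
  import "service-notification-no-unknown" // placeholder template name
  users = [ "cust.linux.support" ]
  assign where "linux_services_disable_unknown" in service.groups
}

// Rule 2: all other Linux services keep the full set of states.
apply Notification "linux-services-all-states" to Service {
  import "default-service-notification"
  users = [ "cust.linux.support" ]
  assign where host.vars.os == "linux" && !("linux_services_disable_unknown" in service.groups)
}
```

Running icinga2 daemon -C validates the config, and icinga2 object list --type Notification shows which notification objects were actually generated, which is a quick way to do the sanity check.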

Just looking at your config, it looks like you’re most of the way there, or possibly all of the way there. I would just do a sanity check on everything.

assign where host.name == "host1" || match("host2*", host.name) || host.name == "host3" || host.name == "host4" || host.name == "host5"
states = [ Down, Up ]
types = [ Custom, Problem, Recovery ]
users = [ "cust.linux.support" ]

Using the service/host groups means you don’t have to filter via the hostname. You can simply choose your own path to get them into the group (i.e., using a custom variable, or just manually entering the name into the host/service groups).
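
For illustration, the two routes could look like this in host definitions. The host names, addresses, and vars.disable_unknown are hypothetical, and the host group is assumed to exist already (with an assign rule on that variable for the first route).

```
// Route 1: set a custom variable and let the group's assign rule pick it up.
object Host "example-linux-01" {
  check_command = "hostalive"
  address = "192.0.2.10"
  vars.os = "linux"
  vars.disable_unknown = true // hypothetical custom variable
}

// Route 2: add the host to the group by hand.
object Host "example-linux-02" {
  check_command = "hostalive"
  address = "192.0.2.11"
  vars.os = "linux"
  groups = [ "linux_host_disable_unknown" ]
}
```

The first route scales better because new hosts only need the variable; the second is fine for a handful of machines.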

After all of this is sorted out, you can take it a step further in Icingaweb by introducing a custom menu option in /etc/icingaweb2/navigation/menu.ini that filters out all of these as well.

You can do something that’s like “show all unhandled customer problems” that filters out unknowns for the service group, but keeps the unknowns for the others. Here is a slightly modified example that we use live in our environment:

# THERE COULD BE TYPOS -- this is meant for you to modify to your needs

# First we have to create the top level menu option
[CTAC]
name = "CTAC"
users = "yes" # any icingaweb users
groups = "also_yes" # any icingaweb groups, note: not notification groups
type = "menu-item"
icon = "error.png"
priority = "1" # This places the menu option at the very top of the right-hand pane

# Next we can add in the link that filters out what we don't want
# This particular one is only for services, but can be modified to include hosts, or you can use another menu option for hosts
[CTAC-Unhandled Services Not Unknown]
name = "Unhandled Critical Services"
users = "yes" # any icingaweb users
groups = "also_yes" # any icingaweb groups, note: not notification groups
type = "menu-item"
target = "_main" # this opens the link up in the same tab
url = "monitoring/list/services?service_in_downtime=False&service_acknowledged=0&(service_state=2)&modifyFilter=0&limit=100&sort=host_display_name&service_hard_state=2&((hostgroup=Linux_host_disable_unknown)|(hostgroup=Linux_host_enable_unknown))"
owner = "root"
Xicon = "error.png"
priority = "1" # First option
parent = "CTAC" # Assigns this sub option to the CTAC option defined above


# Last, we can add in the link that filters out what we don't want. Unfortunately, I don't know a good way offhand, but you can probably do something along the lines of "include hosts/services in the enable group | services that are in critical/warning".
# This particular one is only for services, but can be modified to include hosts, or you can use another menu option for hosts
[CTAC-Unhandled Services]
name = "Unhandled Critical Services"
users = "yes" # any icingaweb users
groups = "also_yes" # any icingaweb groups, note: not notification groups
type = "menu-item"
target = "_main" # this opens the link up in the same tab
url = "monitoring/list/services?service_in_downtime=False&service_acknowledged=0&(service_state=2|service_state=3)&modifyFilter=0&limit=100&sort=host_display_name&service_hard_state=2&((hostgroup=Linux_host_disable_unknown)|(hostgroup=Linux_host_enable_unknown))"
owner = "root"
Xicon = "error.png"
priority = "2" # Second option
parent = "CTAC" # Assigns this sub option to the CTAC option defined above

@steaksauce
I appreciate your time and effort on the detailed explanation.
Let me go through the steps once again along with your supporting points, and I will update you on the result.