Mass Override Service Variables in Director

I’m running a distributed monitoring setup with the following modules:

Background:
Until recently, hosts were either flagged for after-hours notifications or not. My organization now wants to enable after-hours notifications for individual services in addition to hosts. Almost all services are applied to hosts via service sets. For example, I have a “Linux” service set that applies to objects in a “Linux” hostgroup. These amount to ~1000 services.

The Problem:
My existing configuration does not have a custom variable/field for this on any service template for the notification apply rules to match against. We also use service sets wherever we can to automatically add baseline services to hosts when they are added to our inventory. Hosts already have this variable because of how we presently send after-hours notifications. In Nagios, we defined timeperiods directly in the service definition. Unless I’ve misunderstood something, Icinga 2 does the opposite and wants to attach notification configuration exclusively through apply rules.

I was wondering what the shortest path is from where I am now:
notifying for hosts/services based on a host variable, meaning we notify for all services on an “after-hours” host.

To: a configuration where I can easily enable/disable after-hours notifications on a service-by-service basis.

Potential Option One:
My theory, before being able to do any testing, is that I can add the field to a root service template like Generic-Service. I have configured this template with the boilerplate settings for our normal checks, and it is imported into each service template. I can then override the value as needed on a per-service basis (most services do not require after-hours notifications). However, I believe that if I do this, I would have to individually override every single service on every single host we already have deployed (see: a hellacious amount of work). I’m hoping I can use something like:
icingacli director service set * --vars.afterhours_notifications 'true'

If not, I’m hoping there is another way to use the Director/Icinga APIs to mass-change this service variable on the services applied through service sets. The biggest hitch with this option is going back and updating existing services: it requires a retroactive change on ~1000 services.
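For reference, Option One would roughly correspond to the Icinga 2 configuration below. This is only a sketch: the variable name afterhours_notifications is taken from the command above, while the notification template name and the "24x7" time period are assumptions, and Director would render its own version of these objects.

```icinga2
# Root service template: the default that every importing service inherits
template Service "Generic-Service" {
        vars.afterhours_notifications = false
}

# Notification apply rule that only matches services whose flag ends up true
apply Notification "service-after-hours" to Service {
        import "generic-service-notification"   # assumed notification template (command, users)
        period = "24x7"                          # assumed TimePeriod object

        assign where service.vars.afterhours_notifications == true
}
```

With a default of false in the root template, every service that should notify after hours needs the value flipped somewhere: in a more specific template, in the service set, or as a per-host override.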

Potential Option Two:
I duplicate the Generic-Service template, naming it “Generic-Service (After-Hours)”, and then fork the service templates so that one copy imports the after-hours template and the other imports the standard business-hours one. I really hate this option as it will surely break existing overrides due to services applying differently/with new names.

If there are better paths forward, I’m all ears. I’m trying to avoid sinking several hours into clicking around a web interface changing a boolean from false to true on ~1000 services.

Thank you for your time, I understand this is a bit of an essay. I’m hoping I’m just overcomplicating my problem.

I think you have to differentiate between the check period (set via service templates) and the notification period (set via a notification template or notification apply rule). In addition, individual users can also have their own notification period.

Depending on your needs you can run your service checks only during business hours or check all services 24x7. User notifications are independent, so you can set up 24x7 checks, but notifications are sent out only during business hours.

In your case, one option is to keep service checks at 24x7 and have the notification apply rules use a custom host variable to choose between different service/host notification periods.
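As a rough sketch of that idea, two apply rules could share one notification template and differ only in period and assign condition. The host variable name, the template name, and the "8x5" and "24x7" time periods are assumptions here, not taken from your setup.

```icinga2
# Business-hours notifications for services on hosts without the flag
apply Notification "service-business-hours" to Service {
        import "generic-service-notification"   # assumed notification template
        period = "8x5"                           # assumed TimePeriod object

        assign where host.vars.afterhours_notifications != true
}

# Around-the-clock notifications for services on flagged hosts
apply Notification "service-after-hours" to Service {
        import "generic-service-notification"   # assumed notification template
        period = "24x7"                          # assumed TimePeriod object

        assign where host.vars.afterhours_notifications == true
}
```

Checks are unaffected by either rule, so performance data keeps being collected 24x7.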

If I haven’t misunderstood you, you would only have to edit the service templates, not every single service in each set.

I’d say we currently use your option one:
  • generic template
  • specific service template
  • service in service set

hosts also have these variables

and then we have these two apply rules configured in the Icinga DSL, as the Director can’t do some of this.

apply Notification "notification-jira-service" to Service {
        import "jira-service-notification-template"

        if (host.vars.ticketsystem == "x") {
                users = [ "x-ticketsystem" ]
        } else {
                users = [ "y-ticketsystem" ]
        }


        # Set the notification delay if one is configured on the service
        if (service.vars.notification_delay) {
                times.begin = service.vars.notification_delay
        }


        # 60-minute delay outside core working hours
        if ( number(DateTime().format("%H")) < 8 || number(DateTime().format("%H")) > 18 ) {
                times.begin = 3600
        }
        ########################################
        # Notification periods
        ########################################

        # 24x7
        # Notify 24/7
        if (service.vars.notification_period == "24x7") {
                period = "24x7"

                assign where service.vars.notification_period == "24x7" && service.vars.notification_contact == ""

        # 8x5
        # Notify 8x5
        } else if (service.vars.notification_period == "8x5") {
                period = "8x5"

                assign where service.vars.notification_period == "8x5" && service.vars.notification_contact == ""

        # Default
        # Fallback if none of the options above is set
        } else {
                period = "24x7"

                assign where true && service.vars.notification_contact == ""
        }


        ########################################
        # Ticket priorities
        ########################################
        # Impact (vars.auswirkung)
        # high: 14903
        # medium: 14904
        # low: 14905
        #
        # Urgency (vars.dringlichkeit)
        # high: 14906
        # medium: 14907
        # low: 14908
        var jira_ticket_priority = get_ticket_priority(host, service)
        if (host.vars.ticketsystem == "..."){
                if (jira_ticket_priority == "1") {
                        vars.auswirkung = "10000"
                        vars.dringlichkeit = "10003"
                } else if (jira_ticket_priority == "3") {
                        vars.auswirkung = "10002"
                        vars.dringlichkeit = "10005"
                } else {
                        vars.auswirkung = "10001"
                        vars.dringlichkeit = "10004"
                }
        } else {
                if (jira_ticket_priority == "1") {
                        vars.auswirkung = "14903"
                        vars.dringlichkeit = "14906"
                } else if (jira_ticket_priority == "3") {
                        vars.auswirkung = "14905"
                        vars.dringlichkeit = "14908"
                } else {
                        vars.auswirkung = "14904"
                        vars.dringlichkeit = "14907"
                }
        }

        ########################################
        # Additional information for the ticket description
        ########################################
        ...



        ########################################
        # Apply and ignore rules
        ########################################
        # Disable all notifications for this host (both host and service notifications)
        ignore where host.vars.notification_alerting == false

        # Disable notifications for this service
        ignore where service.vars.notification_period == "none"

        # Disable notifications for hosts starting with Q<number> (test systems)
        ignore where regex("^q[0-9].*", host.name)

        # Disable notifications for terminal servers
        #ignore where regex("^[\\w\\d]+-(d|p)[\\d]+-ts[\\d]+",host.name)
        ignore where regex("^[\\w\\d]+-d[\\d]+-ts[\\d]+",host.name)

        # Disable notifications for hosts tagged "noNotification::true"
        ignore where host.vars.tags.noNotification == "True"  || host.vars.tags.noNotification == "true"

        # Disable notifications for customer checks
        ignore where regex("^customer[-_].*", service.name)
}

apply Notification "notification-jira-host" to Host {
        import "jira-host-notification-template"

        if (host.vars.ticketsystem == "x") {
                users = [ "x-ticketsystem" ]
        } else {
                users = [ "y-ticketsystem" ]
        }

        # Set the notification delay if one is configured on the host
        if (host.vars.notification_delay) {
                times.begin = host.vars.notification_delay
        }

        # 60-minute delay outside core working hours
        if ( number(DateTime().format("%H")) < 8 || number(DateTime().format("%H")) > 18 ) {
                times.begin = 3600
        }

        ########################################
        # Notification periods
        ########################################

        # 24x7
        # Notify 24/7
        if (host.vars.notification_period == "24x7") {
                period = "24x7"

                assign where host.vars.notification_period == "24x7" && host.vars.notification_contact == ""

        # 8x5
        # Notify 8x5
        } else if (host.vars.notification_period == "8x5") {
                period = "8x5"

                assign where host.vars.notification_period == "8x5" && host.vars.notification_contact == ""

        # Default
        # Fallback if none of the options above is set
        } else {
                period = "24x7"

                assign where true && host.vars.notification_contact == ""
        }

        ########################################
        # Ticket priorities
        ########################################
        # Impact (vars.auswirkung)
        # high: 14903
        # medium: 14904
        # low: 14905
        #
        # Urgency (vars.dringlichkeit)
        # high: 14906
        # medium: 14907
        # low: 14908
        var jira_ticket_priority = get_ticket_priority(host, null)

        if (host.vars.ticketsystem == "..."){
                if (jira_ticket_priority == "1") {
                        vars.auswirkung = "10000"
                        vars.dringlichkeit = "10003"
                } else if (jira_ticket_priority == "3") {
                        vars.auswirkung = "10002"
                        vars.dringlichkeit = "10005"
                } else {
                        vars.auswirkung = "10001"
                        vars.dringlichkeit = "10004"
                }

        } else {
                if (jira_ticket_priority == "1") {
                        vars.auswirkung = "14903"
                        vars.dringlichkeit = "14906"
                } else if (jira_ticket_priority == "3") {
                        vars.auswirkung = "14905"
                        vars.dringlichkeit = "14908"
                } else {
                        vars.auswirkung = "14904"
                        vars.dringlichkeit = "14907"
                }
        }


        ########################################
        # Additional information for the ticket description
        ########################################
        ...

        ########################################
        # Apply and ignore rules
        ########################################
        # Disable all notifications for this host (both host and service notifications)
        ignore where host.vars.notification_alerting == false

        # Disable the host-alive notification for this host
        ignore where host.vars.notification_period == "none"

        # Disable notifications for hosts starting with Q<number> (test systems)
        ignore where regex("^q[0-9].*", host.name)

        # Disable notifications for terminal servers
        #ignore where regex("^[\\w\\d]+-(d|p)[\\d]+-ts[\\d]+",host.name)
        ignore where regex("^[\\w\\d]+-d[\\d]+-ts[\\d]+",host.name)

        # Disable notifications for hosts tagged "noNotification::true"
        ignore where host.vars.tags.noNotification == "True" || host.vars.tags.noNotification == "true"
}

hope this helps :slight_smile:

This is what I have today. I’m not looking to change check period. We graph performance data so checking 24/7 is mandatory. We have a rotating on-call list that uses the notification configuration items. I’m asking how to get from configuration A where I do not have a variable on services to match against to configuration B where I do, with the least amount of clicking through the webgui and setting a variable to true/false.

What you’ve outlined is what I aim to be using. However, I’m trying to move to that configuration with the least amount of work. Right now I have a Generic-Service template that is inherited by just about every service at some level.

Because every service template has this same root template, I would need to fork these templates and maintain “after-hours generic-service” and “business-hours generic-service” templates. A vast majority of services in our inventory do not need to produce 24x7 notifications (~2300 services). I feel like this leaves me with:

  1. Create the new generic service template and maintain a list of after hours and business hours templates, each with their corresponding service.var.after-hours_notifications variable set. No overriding required but I would need to update imports on affected devices (may require redefining service overrides).

  2. Add the after-hours notifications field to the existing generic-service template and set it to false. Afterwards, I would override it on any services for which I need after-hours notifications (~1000).

To reiterate, my problem is less one of “how do I do this?” and more “How do I do this without spending hours adjusting templates/variables?”

Really, I’m looking for a reliable way of setting service overrides en masse. The only real downside to option 2 is the time spent overriding a service variable across my after-hours hosts. If this could be performed in a matter of seconds instead of hours, that would be ideal. Let me know if I’m overcomplicating this.
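One way the number of overrides might be reduced, sketched here under assumptions rather than tested: leave the service variable unset by default (instead of defaulting it to false) and let the apply rule fall back to the existing host variable whenever a service has no explicit value. The variable, template, and time period names below are assumptions.

```icinga2
apply Notification "service-after-hours" to Service {
        import "generic-service-notification"   # assumed notification template
        period = "24x7"                          # assumed TimePeriod object

        # An explicit service-level value of true always matches
        assign where service.vars.afterhours_notifications == true

        # No service-level value set: fall back to the host flag
        assign where service.vars.afterhours_notifications == null && host.vars.afterhours_notifications == true
}
```

Multiple assign where expressions are combined with OR, so a service matches if either condition holds. Under this scheme, hosts where all or none of the services should notify after hours keep working from the host variable alone, and overrides are only needed on services that deviate from their host.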

Not sure :wink:

Just to make sure I understand your structure correctly:
Do you have:

  • hosts with checks where some of those checks should notify 24x7 while others should not.
  • Or is it the hosts where the distinction is made, so host A and all its services notify 24x7 while host B is only 9x5?

If it is the first one:
Is it always the same services that should notify 24x7, so that you could add the “after hours notify” variable to the service template and set it to “yes”?
This would be easily doable. You could either adjust all the necessary templates via the Director UI with multi-edit (SHIFT/CTRL+Click) or via the CLI with icingacli director service set <service template> --json '{ "vars.after_hours": true }'

Director CLI modify: CLI - Icinga Director
Director CLI Boolean variables: CLI - Icinga Director

On the host level it would be similar: add the variable to the host template and then modify it for the various hosts. This could also be done via the Import&Sync if that is where your hosts come from.

If you determine the “after-hours” notification status for each host and each service individually I currently see no simple way. As far as I can see, the CLI only works for service templates, not the actual service objects on a host.

We currently define host/service notifications as in method #2. We’re working on implementing #1.
Unfortunately, it is not the same services across all hosts. We would have sets of Linux agents where after-hours notifications would be configured for:

  • 22/22 services
  • 11/22 services
  • 0/22 services

I think these issues are steering me in another direction. I believe if I have a subset of devices to reconfigure to use or not use after-hours notifications, I’m going to have this same problem of needing to update a service override. I think I need to re-evaluate before going forward.