Time based apply for Scheduled Downtime

breml · June 3, 2019, 8:01am

In our setup we would like to add new systems to Icinga2 during the rollout phase, each with a defined date on which monitoring of that host should start. Because the end date of the pre production phase is different for each host, I would prefer not to configure a separate scheduled downtime for each host.
My idea is to have one general downtime that is applied to a host until the end date of its pre production phase. Currently I have the following solution in place:

object Host "downtimetest-until-20190604" {
        import "generic-host"
        display_name = "downtimetest 2019-06-04"
        address = "localhost"
        vars.pre_production_downtime_until = DateTime(2019, 06, 04)
}

apply ScheduledDowntime "pre-production-downtime" to Host {
  author = "icingaadmin"
  comment = "Scheduled downtime for pre production stage"

  ranges = {
    "monday"    = "00:00-24:00"
    "tuesday"   = "00:00-24:00"
    "wednesday" = "00:00-24:00"
    "thursday"  = "00:00-24:00"
    "friday"    = "00:00-24:00"
    "saturday"  = "00:00-24:00"
    "sunday"    = "00:00-24:00"
  }

  // Check if pre_production_downtime_until is set and if it is still valid (later than now)
  assign where host.vars.pre_production_downtime_until && DateTime(get_time()) < host.vars.pre_production_downtime_until
}

This works, but I need to reload the configuration on a regular basis, because the assign where expressions are only evaluated when the configuration is loaded. Is there a way to have dynamic parts in the configuration that are evaluated on demand?

Is there a better way to achieve the goal?

dnsmichi · June 3, 2019, 9:44am

Hi,

the first question to be answered - should the checks run against these “not in production yet” hosts and services?

Create a TimePeriod object which only allows ranges after June 4, 2019 (2019-06-04).

If checks should be denied, use check_period on the host/service object. Hide that inside a template, and remove it later (also works inside the Director).
If checks are allowed and only notifications should be suppressed, use the period attribute on the notification object.
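
For the notification-only case, a minimal sketch could look like the following. It is an illustration under assumptions, not a tested setup: the TimePeriod name, the far-future end date, the custom variable vars.notification_period, and the notification name are all hypothetical, and "mail-host-notification" is assumed to be the template from the sample configuration. Older versions may additionally need import "legacy-timeperiod" inside the TimePeriod object.

```
// Hypothetical period that only matches from the go-live date onwards.
// The end date is an arbitrary far-future value chosen for illustration.
object TimePeriod "in-production-from-20190604" {
  display_name = "In production from 2019-06-04"
  ranges = {
    "2019-06-04 - 2029-12-31" = "00:00-24:00"
  }
}

object Host "downtimetest-until-20190604" {
  import "generic-host"
  address = "localhost"
  // Hypothetical custom variable naming the period to use for notifications
  vars.notification_period = "in-production-from-20190604"
}

apply Notification "mail-host" to Host {
  // Assumes the template from the sample configuration exists
  import "mail-host-notification"
  users = [ "icingaadmin" ]

  // Restrict notifications only for hosts that define a period;
  // all other hosts keep the default (no restriction).
  if (host.vars.notification_period) {
    period = host.vars.notification_period
  }

  assign where host.address
}
```

Checks keep running and their results stay visible; only notifications outside the period are suppressed. The TimePeriod object and the custom variable still have to be removed by hand once the host is in production.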

If you’re looking into downtimes, better use runtime created downtimes via the REST API. A small script running in a cronjob can always check whether a host/service already is in a downtime (downtime_depth) and if not, schedule one. That way you do not depend on reloading the configuration.

Programmatic examples can be found in the API docs.

Cheers,
Michael

breml · June 3, 2019, 10:10am

Hi @dnsmichi

Thanks for your reply.

In our case we want to run the checks against these hosts in order to see the results in the Icinga web UI, but we want to suppress the notifications. The reason is that bringing these hosts online requires a joint effort of different parties (infrastructure, network, etc.), but the setup of the monitoring should be independent of the progress of those parties.

I am looking for a solution that does not need any (manual) change or cleanup of the configuration once the pre production period is over. This is my problem with a TimePeriod per case: I need to clean them up later, whereas the scheduled downtimes are managed by Icinga itself.

I will think about the idea to use a small script with the REST API.

Thanks again.

Regards,
Lucas

dnsmichi · June 3, 2019, 11:26am

Hi,

here’s a little helper thingy I discovered and documented lately; this should help with a simple bash script.

Cheers,
Michael