Full site maintenance / downtime

another question about downtime :wink:

How do you manage a full “site” maintenance in real life? Do you set a downtime one by one on each host?

What do you mean with “site” maintenance?

### Windows PatchDay ###

apply ScheduledDowntime "windows-patch-day" to Host {
  author = "icingaadmin"
  comment = "PatchDay"

  ranges = {
    tuesday   = "00:01-04:00"
    wednesday = "00:01-04:00"
    thursday  = "00:01-04:00"
    friday    = "00:01-04:00"
  }

  assign where host.vars.os == "windows"
  ignore where match("*omni-esx*", host.name)
}

apply ScheduledDowntime "windows-patch-day" to Service {
  author = "icingaadmin"
  comment = "PatchDay"

  ranges = {
    tuesday   = "00:01-04:00"
    wednesday = "00:01-04:00"
    thursday  = "00:01-04:00"
    friday    = "00:01-04:00"
  }

  assign where host.vars.os == "windows"
  ignore where match("*omni-esx*", host.name)
}

For a full-site downtime that occurs only once, I would define a scheduled downtime that triggers a single time and apply it to all hosts and services:

apply ScheduledDowntime "rip datacenter" to Host {
  author = "bofh"
  comment = "the big shutdown"
  ranges = { "2038-01-19" = "00:00-23:59" }
  assign where host.name
}
apply ScheduledDowntime "rip datacenter" to Service {
  author = "bofh"
  comment = "the big shutdown"
  ranges = { "2038-01-19" = "00:00-23:59" }
  assign where host.name
}

If you don’t know when the maintenance will end, you can disable the notifications via icingacli.

I’d simply stop icinga2.service.

I use the API to quickly downtime hostgroups (hey I think I just found a feature request). Are we talking like a multi-datacenter environment where you’re working specifically on one?
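A rough sketch of what such an API call could look like, using the schedule-downtime action of the Icinga 2 REST API. The server URL, the hostgroup name and the helper function itself are made up for illustration; the all_services flag assumes a reasonably recent Icinga 2 release (on older versions the services need a second request with type "Service"). The snippet only builds the request; sending it with urllib.request.urlopen would additionally need ApiUser credentials and a TLS context matching your setup.

```python
import json
import time
import urllib.request


def build_hostgroup_downtime_request(base_url, hostgroup, author, comment,
                                     duration_s, now=None):
    """Build (but do not send) a schedule-downtime request for one hostgroup.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    start = int(now if now is not None else time.time())
    payload = {
        "type": "Host",
        # The filter matches every host that is a member of the hostgroup.
        # filter_vars keeps the group name out of the filter expression.
        "filter": "group in host.groups",
        "filter_vars": {"group": hostgroup},
        "author": author,
        "comment": comment,
        "start_time": start,
        "end_time": start + duration_s,
        # Assumption: the server supports all_services, which also
        # downtimes the services of every matched host.
        "all_services": True,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/actions/schedule-downtime",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Example: a two-hour downtime for a hypothetical hostgroup "site-paris".
request = build_hostgroup_downtime_request(
    "https://icinga.example:5665", "site-paris",
    "icingaadmin", "site maintenance", 2 * 3600,
)
```

Swapping the filter for something like host.zone == zone (again with filter_vars) would target everything behind one satellite instead of a hostgroup.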

If it’s a small infrastructure that you can keep your eye on, you can toggle off notifications globally until you’re done so it doesn’t blow up your email/pager, but I definitely discourage that with large infrastructure so you’re not missing anything important.

Yeah, we are talking about a multi-“datacenter” environment.
We monitor 11 remote “sites”, so stopping the icinga2 service as @rsx suggested is not possible, and turning off notifications globally is not possible either :wink:

I think we will look into using an apply ScheduledDowntime rule.

Something to be aware of: loading a one-time downtime object in Icinga and reloading is totally fine, but if you’ve got something like a scheduled weekly maintenance and think to set it on a recurring schedule in there, it can go kaboom when you have a ton of stuff.

Robert Sturm’s example is safe, as long as the attribute is spelled author rather than autor.

Just some ideas I have used in the past.

  • A non-active check on the satellite, plus a dependency for all hosts/services in that zone which disables notifications. Then you can manually disable notifications for that zone by setting the service to critical.

  • Use a script/webpage which sets a (scheduled) downtime for all hosts/services in a zone via the API.

  • Use scheduled downtimes.

  • Remove the zone temporarily from both masters (comment it out only).


@anon66228339

I really like the idea of the non-active check. As we manage ONE satellite PER physical site, it would be nice to have a passive check called “Site Maintenance”, for example.

Can you elaborate a little more on the dependency/notification part for this case?

I haven’t played with it yet, but the details are here:
https://icinga.com/docs/icinga2/latest/doc/09-object-types/#dependency

It’s basically the same thing as when notifications for all services are suppressed because their host object is down. I keep meaning to do this to associate my VMs with their host servers, but I’d need a clean way to keep track of it.

It’s simple: create a virtual host in Icinga per zone with ‘dummy’ as host check, and then apply two dependencies that make all hosts/services in that zone depend on the virtual host (let’s name it “thebutton-zonename”).

apply Dependency "disablenotifications" to Service {
  parent_host_name = "thebutton-zonename"
  disable_notifications = true

  assign where host.name != "thebutton-zonename" && host.zone == "zonename"
}

apply Dependency "disablenotifications" to Host {
  parent_host_name = "thebutton-zonename"
  disable_notifications = true

  assign where host.name != "thebutton-zonename" && host.zone == "zonename"
}

The virtual host could look like:

object Host "thebutton-zonename" {
  import "generic-host"

  check_command = "dummy"
  check_interval = 0
  retry_interval = 0
  max_check_attempts = 1
  enable_active_checks = false
  enable_passive_checks = true
  enable_perfdata = false

  // the dummy check command reads its output text from a custom variable
  vars.dummy_text = "Notifications for " + zone + " are enabled."

  address = "127.0.0.1"
}

If you now manually set the host to DOWN, no host/service in that zone should send a notification anymore.
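Flipping the button can also be scripted. Below is a minimal sketch that submits a passive result for the virtual host through the process-check-result action of the Icinga 2 REST API. The server URL and the helper function are assumptions for illustration, and the request is only built here; sending it with urllib.request.urlopen needs ApiUser credentials and a TLS context for your installation.

```python
import json
import urllib.request


def build_button_request(base_url, button_host, maintenance):
    """Build (but do not send) a passive check result for the button host.

    Hypothetical helper: maintenance=True reports the host as DOWN, which
    makes the dependencies suppress notifications; False reports UP again.
    """
    state = "disabled" if maintenance else "enabled"
    payload = {
        "type": "Host",
        "filter": "host.name == name",
        "filter_vars": {"name": button_host},
        # For host objects the API uses 0 = UP and 1 = DOWN.
        "exit_status": 1 if maintenance else 0,
        "plugin_output": "Notifications for this zone are " + state + ".",
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/actions/process-check-result",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Example: start maintenance for the zone behind "thebutton-zonename".
request = build_button_request(
    "https://icinga.example:5665", "thebutton-zonename", maintenance=True,
)
```

Calling the same helper with maintenance=False at the end of the maintenance window reports the host as UP again, so notifications resume.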
