another question about downtime
How do you manage a full “site” maintenance in real life? Do you set a downtime on each host, one by one?
What do you mean by “site” maintenance?
### Windows PatchDay ###
apply ScheduledDowntime "windows-patch-day" to Host {
  author = "icingaadmin"
  comment = "PatchDay"
  ranges = {
    tuesday = "00:01-04:00"
    wednesday = "00:01-04:00"
    thursday = "00:01-04:00"
    friday = "00:01-04:00"
  }
  assign where host.vars.os == "windows"
  ignore where match("*omni-esx*", host.name)
}
apply ScheduledDowntime "windows-patch-day" to Service {
  author = "icingaadmin"
  comment = "PatchDay"
  ranges = {
    tuesday = "00:01-04:00"
    wednesday = "00:01-04:00"
    thursday = "00:01-04:00"
    friday = "00:01-04:00"
  }
  assign where host.vars.os == "windows"
  ignore where match("*omni-esx*", host.name)
}
For a full-site downtime that occurs only once, I would create a scheduled downtime that triggers a single time and applies to all hosts and services:
apply ScheduledDowntime "rip datacenter" to Host {
  autor = "bofh"
  comment = "the big shutdown"
  ranges = { "2038-01-19" = "00:00-23:59" }
  assign where host.name
}
apply ScheduledDowntime "rip datacenter" to Service {
  autor = "bofh"
  comment = "the big shutdown"
  ranges = { "2038-01-19" = "00:00-23:59" }
  assign where host.name
}
If you don’t know when the maintenance will end, you can disable notifications via icingacli.
I’d simply stop icinga2.service.
I use the API to quickly downtime hostgroups (hey I think I just found a feature request). Are we talking like a multi-datacenter environment where you’re working specifically on one?
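For anyone wanting to try the API route, here is a rough sketch of what such a call could look like, not the poster's actual script. It only builds the request for the `schedule-downtime` action. The function name, the base URL, the credentials and the hostgroup name are all made up for illustration, and `all_services` assumes a reasonably recent Icinga 2 (2.11 or later, as far as I know).

```python
import base64
import json
import urllib.request


def build_downtime_request(base_url, user, password, hostgroup,
                           start_ts, duration_s, author, comment):
    """Build (but do not send) a schedule-downtime request for one hostgroup."""
    payload = {
        "type": "Host",
        # filter_vars keeps the group name out of the filter expression itself
        "filter": "group in host.groups",
        "filter_vars": {"group": hostgroup},
        # assumption: Icinga 2.11+; also downtimes the services of matched hosts
        "all_services": True,
        "author": author,
        "comment": comment,
        "start_time": start_ts,
        "end_time": start_ts + duration_s,
        "fixed": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/actions/schedule-downtime",
        data=json.dumps(payload).encode(),
        method="POST",
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
    )
```

Sending it would be a matter of passing the request to `urllib.request.urlopen` together with an SSL context that trusts your Icinga CA. Swapping the filter for something like `host.zone == zone` should give a per-site variant.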
If it’s a small infrastructure that you can keep your eye on, you can toggle off notifications globally until you’re done so it doesn’t blow up your email/pager, but I definitely discourage that with large infrastructure so you’re not missing anything important.
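If you would rather script the global toggle than click it, something along these lines might work. It is only a sketch: it assumes the global switch is the `enable_notifications` attribute of the `IcingaApplication` object named `app` and that it can be changed through the objects endpoint of the API. The function name and base URL are hypothetical, and authentication is left out to keep it short.

```python
import json
import urllib.request


def build_notification_toggle(base_url, enabled):
    """Build (but do not send) a request that flips the global notification switch."""
    return urllib.request.Request(
        # assumption: the IcingaApplication object is called "app"
        url=base_url.rstrip("/") + "/v1/objects/icingaapplications/app",
        data=json.dumps({"attrs": {"enable_notifications": enabled}}).encode(),
        method="POST",
        headers={"Accept": "application/json"},
    )
```

Call it with `False` before the maintenance and with `True` afterwards. Forgetting the second call is exactly the risk described above.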
Yeah, we are talking about a multi-“datacenter” environment.
We monitor 11 remote “sites”, so stopping the icinga2 service as @rsx suggested is not possible, and neither is turning off notifications globally.
I think we will look into using an apply ScheduledDowntime rule.
Something to be aware of: loading a one-time downtime object into Icinga and reloading is totally fine, but if you’ve got something like a weekly scheduled maintenance and think about setting it up as a recurring schedule in there, it can go kaboom when you have a ton of stuff.
Robert Sturm’s example is safe apart from the author typo.
Just some ideas I have used in the past.
A non-active check on the satellite, plus a dependency for all hosts/services in that zone that disables notifications. You can then manually disable notifications for that zone by setting the service to critical.
Use a script/webpage which sets a (scheduled) downtime for all hosts/services in a zone via the API.
Use scheduled downtimes.
Remove the zone temporarily from both masters (just comment it out).
I really like the idea of the non-active check… Since we manage ONE satellite PER physical site, it would be nice to have a passive check called “Site Maintenance”, for example…
Can you elaborate a little more on the dependency/notification setup for this case?
I haven’t played with it yet, but the details are here:
https://icinga.com/docs/icinga2/latest/doc/09-object-types/#dependency
It’s basically the same thing as when all services are downtimed when a host object goes down. I keep meaning to do this to associate my VMs with their host servers, but I’d need a clean way to keep track of it.
It’s simple: create a virtual host in Icinga per zone with ‘dummy’ as the host check, and then apply two dependencies, one for all hosts and one for all services in that zone, with the virtual host as the parent (let’s name it “thebutton-zonename”).
apply Dependency "disablenotifications" to Service {
  parent_host_name = "thebutton-zonename"
  disable_notifications = true
  assign where host.name != "thebutton-zonename" && host.zone == "zonename"
}
apply Dependency "disablenotifications" to Host {
  parent_host_name = "thebutton-zonename"
  disable_notifications = true
  assign where host.name != "thebutton-zonename" && host.zone == "zonename"
}
The virtual host could look like:
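object Host "thebutton-zonename" {
  import "generic-host"
  check_command = "dummy"
  // Host objects use check_interval (not interval). If validation rejects
  // a zero interval, drop these two lines; active checks are disabled anyway.
  check_interval = 0
  retry_interval = 0
  max_check_attempts = 1
  enable_active_checks = false
  enable_passive_checks = true
  enable_perfdata = false
  // dummy_text is a custom variable read by the "dummy" check command
  vars.dummy_text = "Notifications for " + zone + " are enabled."
  address = "127.0.0.1"
}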
If you now manually set the host to DOWN, no host or service in that zone should send notifications anymore.
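Flipping the button can also be scripted instead of done by hand in the web interface. A minimal sketch, assuming the virtual host follows the “thebutton-zonename” naming from above and that you submit a passive result through the `process-check-result` API action. The function name and base URL are made up, and authentication is omitted.

```python
import json
import urllib.request

HOST_UP, HOST_DOWN = 0, 1  # exit_status values the API uses for host checks


def build_button_request(base_url, zone, maintenance):
    """Build (but do not send) a passive check result for the zone's button host."""
    button = "thebutton-" + zone  # naming convention taken from the post above
    state = "disabled" if maintenance else "enabled"
    payload = {
        "type": "Host",
        "filter": "host.name == button",
        "filter_vars": {"button": button},
        # DOWN makes the dependency fail, which suppresses notifications
        "exit_status": HOST_DOWN if maintenance else HOST_UP,
        "plugin_output": "Notifications for " + zone + " are " + state + ".",
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/actions/process-check-result",
        data=json.dumps(payload).encode(),
        method="POST",
        headers={"Accept": "application/json"},
    )
```

With `max_check_attempts = 1` on the button host, a single DOWN result should be enough to reach a hard state. Sending the same request with `maintenance=False` turns notifications back on.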