Is anyone doing something so that new hosts/services don't alert while there are still problems to work out with new additions? Today, I push out new checks via the Director, then go find the host/service and quickly put in scheduled downtime before any false alerts happen.
I suppose a log file could be monitored with swatch to trigger a script that does this via the API, but I bet there is a better way to do this.
What do others do about new hosts/services to prevent false alerts? Different host/service groups or contacts?
I made a script to set automatic downtime via an event handler. It's not only for new hosts, but you can give it a try. Once a host is working, just remove the event handler from the host object.
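The script itself isn't shown here, but the general shape of such an event handler is a small script that calls the Icinga 2 API's `/v1/actions/schedule-downtime` endpoint. Below is a minimal sketch of that idea, not the author's actual script; the URL, credentials, and the one-hour bake-in window are all assumptions you'd adjust:

```python
#!/usr/bin/env python3
"""Sketch of an event handler that schedules downtime via the Icinga 2 API."""
import base64
import json
import ssl
import time
import urllib.request

API_URL = "https://localhost:5665/v1/actions/schedule-downtime"  # assumed endpoint
BAKE_IN_HOURS = 1  # how long new hosts get to settle; adjust to taste


def downtime_payload(host_name: str, hours: int = BAKE_IN_HOURS) -> dict:
    """Build the JSON body for /v1/actions/schedule-downtime."""
    now = int(time.time())
    return {
        "type": "Host",
        "filter": "host.name == name",
        "filter_vars": {"name": host_name},
        "start_time": now,
        "end_time": now + hours * 3600,
        "fixed": True,
        "all_services": True,  # cover the host's services as well
        "author": "eventhandler",
        "comment": "automatic bake-in downtime",
    }


def schedule_downtime(host_name: str) -> int:
    """POST the downtime request; credentials below are placeholders."""
    auth = base64.b64encode(b"apiuser:apipass").decode()
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(downtime_payload(host_name)).encode(),
        headers={
            "Accept": "application/json",
            "Authorization": "Basic " + auth,
        },
        method="POST",
    )
    # Icinga's API usually runs with a self-signed cert; relax verification
    # here for the sketch, or point a default context at your CA instead.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(req, context=ctx) as resp:
        return resp.status


# Usage from an Icinga EventCommand, with the host name passed as an
# argument (e.g. "$host.name$"):
#   schedule_downtime(sys.argv[1])
```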
The script looks great. Nice work. I'm a little confused about how it would be used with an event handler like this, though. Maybe I don't have enough experience with event handlers, but I think what you're suggesting is to create all new hosts with this event handler and then, once the host goes green after a "bake-in" period, remove the event handler. Unless I'm missing something, some key pieces are still missing before this could be fully automated.
I was thinking about some kind of deploy-hook feature that could allow a script to launch. The script could check the deploy details from the database and apply a specified fixed downtime to the new hosts and services.
I guess I could poll the director db for deployments but that’s not very efficient. I have had a lot of success using swatch to execute scripts based on log files so maybe that would be my approach for now and use Python with the icinga2api.client module which makes using the API very simple.
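For the log-watching route, the swatch side can stay small: watch the log for a line indicating a deployment or a newly created object, and hand the matched line to a script that calls the API. A sketch of the `.swatchrc`, where the regex is a placeholder since the exact log line depends on your setup:

```
# .swatchrc sketch: the pattern is a placeholder, match your own log format.
# $0 expands to the entire matched log line.
watchfor /deployed configuration|object "[^"]+" created/
    exec "/usr/local/bin/schedule-new-host-downtime.sh $0"
```

The called script would then extract the host name from the line and schedule the downtime via the API.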
Still seems like this would be a common feature that others would find useful to set downtime on new hosts/services to give time to fix any networking/firewall/nrpe/nsclient++/SNMP/certificate/etc issues before alerting.
Then set the ignoreDowntime field to true when creating the object (or on the template, if using the Director). Once you want the downtime to actually be taken seriously, set it to false.
I would like this to be 100% automated, as I don't manually add/remove hosts in the Director. We have various Ansible playbooks and import/syncs in the Director that are managed by other teams outside of the Director.
Would we change the default template to set host.vars.ignoreDowntime = true, then use the API later to set the value to false once the host is "ready"?
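If ignoreDowntime here is a plain custom variable rather than a built-in attribute (which is how I read the suggestion), one way to wire it up is an apply rule in the core config, so that flipping the variable turns the blanket downtime on and off per host. A sketch, with all names assumed:

```
// Sketch: vars.ignoreDowntime is a custom variable we define ourselves.
// While it is true, the host (and its services, via a matching Service
// rule) sits in a standing downtime; set it to false to lift it.
apply ScheduledDowntime "bake-in" to Host {
  author  = "director"
  comment = "bake-in period for newly deployed hosts"
  fixed   = true
  ranges  = { "monday - sunday" = "00:00-24:00" }
  assign where host.vars.ignoreDowntime == true
}
```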
I just found this thread because I had the same issue.
We are also using the Director to synchronize everything from a CMDB solution.
this is what I ended up with now:
The host import runs first and creates the imported data for all hosts; the source is named "xx cmdb_prod hosts xx".
I created another import source with the following query:
```sql
SELECT
    imported_row.object_name AS host,
    UNIX_TIMESTAMP() AS `begin`,
    CONCAT(
        '{"',
        DATE_FORMAT(NOW(), '%Y-%m-%d'),
        '":"',
        DATE_FORMAT(NOW(), '%H:%i'),
        '-',
        DATE_FORMAT(NOW() + INTERVAL 1 HOUR, '%H:%i'),
        '"}'
    ) AS ranges
FROM imported_rowset_row
INNER JOIN imported_row
    ON imported_rowset_row.row_checksum = imported_row.checksum
LEFT JOIN icingadb.host
    ON imported_row.object_name = icingadb.host.name
WHERE rowset_checksum = (
    SELECT rowset_checksum
    FROM import_source
    INNER JOIN import_run
        ON import_source.id = import_run.source_id
    WHERE import_source.source_name LIKE '%cmdb_prod hosts%'
    ORDER BY start_time DESC
    LIMIT 1
)
AND icingadb.host.id IS NULL
```
(This query uses the Director database as the source together with the Icinga DB, which in most cases run on the same MySQL server, I guess…)
I added a modifier to decode the ranges column as a JSON string.
This gives me all hosts that are currently imported but not yet deployed to Icinga, so only the new hosts where the initial downtime should be created.
One sync rule creates all the host objects.
A second sync rule creates the scheduled downtimes.
The settings are: merge, delete.

Properties:

| source | destination |
| --- | --- |
| icinga director | author |
| ${ranges_decoded} | ranges |
| auto generated on new hosts | comment |
| apply | object_type |
| Host | apply_to |
| host.name = "${host}" | assign_filter |
| 1 | fixed |
| 1 | with_services |
This way the downtime gets created in the initial sync and is then deleted on the next sync, once the host object exists and the host no longer appears in the import source. The runtime downtime itself still exists until it is finished.
I tested it in a few scenarios and it works quite well for me.
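For reference, with the properties above the Director should end up rendering something roughly like this for each new host (the host name and date range below are made-up examples; the ranges value comes from the decoded JSON column):

```
apply ScheduledDowntime "auto-downtime" to Host {
  author  = "icinga director"
  comment = "auto generated on new hosts"
  fixed   = true
  ranges  = { "2024-05-06" = "12:00-13:00" }  // example of the decoded ranges
  assign where host.name == "new-host-01"
}
```

The with_services = 1 setting additionally covers the host's services for the same window.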