Is anyone doing something so that new hosts/services don't alert while there are still problems to work out with new additions? Today, I push out new checks via the Director, then go find the host/service and quickly put in scheduled downtime before any false alerts happen.
I suppose a log file could be monitored with swatch to trigger a script that does this via the API, but I bet there is a better way to do this.
What do others do about new hosts/services to prevent false alerts? Different host/service groups or contacts?
I made a script to set automatic downtime via an event handler. It's not only for new hosts, but you can give it a try. Once a host is working, just remove the event handler from the host object.
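The script itself isn't shown here, but the general shape of such an event handler is a small script that calls the Icinga 2 API's `/v1/actions/schedule-downtime` endpoint. Below is a minimal sketch of that idea, not the author's actual script; the URL, credentials, and the one-hour bake-in window are all assumptions you'd adjust:

```python
#!/usr/bin/env python3
"""Sketch of an event handler that schedules downtime via the Icinga 2 API."""
import base64
import json
import ssl
import time
import urllib.request

API_URL = "https://localhost:5665/v1/actions/schedule-downtime"  # assumed endpoint
BAKE_IN_HOURS = 1  # how long new hosts get to settle; adjust to taste


def downtime_payload(host_name: str, hours: int = BAKE_IN_HOURS) -> dict:
    """Build the JSON body for /v1/actions/schedule-downtime."""
    now = int(time.time())
    return {
        "type": "Host",
        "filter": "host.name == name",
        "filter_vars": {"name": host_name},
        "start_time": now,
        "end_time": now + hours * 3600,
        "fixed": True,
        "all_services": True,  # cover the host's services as well
        "author": "eventhandler",
        "comment": "automatic bake-in downtime",
    }


def schedule_downtime(host_name: str) -> int:
    """POST the downtime request; credentials below are placeholders."""
    auth = base64.b64encode(b"apiuser:apipass").decode()
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(downtime_payload(host_name)).encode(),
        headers={
            "Accept": "application/json",
            "Authorization": "Basic " + auth,
        },
        method="POST",
    )
    # Icinga's API usually runs with a self-signed cert; relax verification
    # here for the sketch, or point a default context at your CA instead.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(req, context=ctx) as resp:
        return resp.status


# Usage from an Icinga EventCommand, with the host name passed as an
# argument (e.g. "$host.name$"):
#   schedule_downtime(sys.argv[1])
```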
The script looks great. Nice work. I'm a little confused about how it would be used with an event handler like this, though. Maybe I don't have enough experience with event handlers, but I think what you're suggesting is to create all new hosts with this event handler and then, once the host goes green after a "bake-in" period, remove the event handler. Unless I'm missing something, some key pieces are still missing before this could be fully automated.
I was thinking about some kind of deploy-hook feature that could allow a script to launch. The script could check the deploy details from the database and apply a specified fixed downtime to the new hosts and services.
I guess I could poll the director db for deployments but that’s not very efficient. I have had a lot of success using swatch to execute scripts based on log files so maybe that would be my approach for now and use Python with the icinga2api.client module which makes using the API very simple.
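For the log-watching route, the swatch side can stay small: watch the log for a line indicating a deployment or a newly created object, and hand the matched line to a script that calls the API. A sketch of the `.swatchrc`, where the regex is a placeholder since the exact log line depends on your setup:

```
# .swatchrc sketch: the pattern is a placeholder, match your own log format.
# $0 expands to the entire matched log line.
watchfor /deployed configuration|object "[^"]+" created/
    exec "/usr/local/bin/schedule-new-host-downtime.sh $0"
```

The called script would then extract the host name from the line and schedule the downtime via the API.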
Still seems like this would be a common feature that others would find useful to set downtime on new hosts/services to give time to fix any networking/firewall/nrpe/nsclient++/SNMP/certificate/etc issues before alerting.
Then set the ignoreDowntime field to true when creating the object (or on the template, if using the Director). Once you want the downtime to actually be taken seriously, set it to false.
I would like this to be 100% automated, as I don't manually add/remove hosts in the Director. We have various Ansible playbooks and import/syncs in the Director that are managed by other teams outside of the Director.
Would we change the default template to set host.vars.ignoreDowntime = true, then use the API later to set the value to false once the host is "ready"?
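If ignoreDowntime here is a plain custom variable rather than a built-in attribute (which is how I read the suggestion), one way to wire it up is an apply rule in the core config, so that flipping the variable turns the blanket downtime on and off per host. A sketch, with all names assumed:

```
// Sketch: vars.ignoreDowntime is a custom variable we define ourselves.
// While it is true, the host (and its services, via a matching Service
// rule) sits in a standing downtime; set it to false to lift it.
apply ScheduledDowntime "bake-in" to Host {
  author  = "director"
  comment = "bake-in period for newly deployed hosts"
  fixed   = true
  ranges  = { "monday - sunday" = "00:00-24:00" }
  assign where host.vars.ignoreDowntime == true
}
```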
I just found this thread because I had the same issue.
We are also using the Director to synchronize everything from a CMDB solution.
this is what I ended up with now:
The host import runs first and creates the imported data for all hosts; the source is named "xx cmdb_prod hosts xx".
I created another import source with the following query:
```sql
SELECT
    imported_row.object_name AS host,
    UNIX_TIMESTAMP() AS `begin`,
    CONCAT(
        '{"',
        DATE_FORMAT(NOW(), '%Y-%m-%d'),
        '":"',
        DATE_FORMAT(NOW(), '%H:%i'),
        '-',
        DATE_FORMAT(NOW() + INTERVAL 1 HOUR, '%H:%i'),
        '"}'
    ) AS ranges
FROM imported_rowset_row
INNER JOIN imported_row
    ON imported_rowset_row.row_checksum = imported_row.checksum
LEFT JOIN icingadb.host
    ON imported_row.object_name = icingadb.host.name
WHERE rowset_checksum = (
    SELECT rowset_checksum
    FROM import_source
    INNER JOIN import_run
        ON import_source.id = import_run.source_id
    WHERE import_source.source_name LIKE '%cmdb_prod hosts%'
    ORDER BY start_time DESC
    LIMIT 1
)
AND icingadb.host.id IS NULL
```
(This query uses the Director database as the source together with the Icinga DB, which in most cases run on the same MySQL server, I guess…)
I added a modifier to decode the ranges column as a JSON string.
This gives me all hosts that are currently imported but not yet deployed to Icinga, so only the new hosts where the initial downtime should be created.
One sync rule creates all the host objects.
A second sync rule creates the scheduled downtimes.
The settings are: merge, delete.

Properties:

| source | destination |
| --- | --- |
| icinga director | author |
| ${ranges_decoded} | ranges |
| auto generated on new hosts | comment |
| apply | object_type |
| Host | apply_to |
| host.name = "${host}" | assign_filter |
| 1 | fixed |
| 1 | with_services |
This way the downtime gets created in the initial sync and is then deleted on the next sync, once the host object exists and the host no longer appears in the import source. The runtime downtime itself still exists until it is finished.
I tested it in a few scenarios and it works quite well for me.
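For reference, with the properties above the Director should end up rendering something roughly like this for each new host (the host name and date range below are made-up examples; the ranges value comes from the decoded JSON column):

```
apply ScheduledDowntime "auto-downtime" to Host {
  author  = "icinga director"
  comment = "auto generated on new hosts"
  fixed   = true
  ranges  = { "2024-05-06" = "12:00-13:00" }  // example of the decoded ranges
  assign where host.name == "new-host-01"
}
```

The with_services = 1 setting additionally covers the host's services for the same window.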