Cleanup script for InfluxDB

mj84 · February 26, 2020, 4:11pm

Hi folks,

I just wanted to share a small script I developed today.
As you guys surely can imagine, monitored systems come and go, and those that leave usually don’t return.
But the data of these systems is still being kept in various places, despite not being useful to anyone.

Since the set of systems we monitor changes quite often, I looked for a way to clean up at least some of the data left behind by systems that no longer exist. That is why I developed this script, which cleans orphaned hosts out of my InfluxDB.

I was able to reclaim several GB using that script in our environment (~2000 hosts, ~10000 services).

This script does the following:

collect list of hostnames present in Icinga2
collect the measurements in Icinga2’s InfluxDB
collect the hostnames within a measurement
check whether each hostname is still present in Icinga2; if not, drop ALL series for that host
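
The steps above can be sketched roughly as follows. This is only an illustration, not the script itself: it assumes the host lists have already been collected (for example via the Icinga2 API and `SHOW TAG VALUES` in InfluxDB), and it assumes the host tag is called `hostname`, which may differ in your InfluxdbWriter configuration. The function names are made up for this example. It only builds the `DROP SERIES` statements and does not execute anything.

```python
def quote_ident(name: str) -> str:
    """Double-quote an InfluxQL identifier (measurement or tag key)."""
    return '"' + name.replace("\\", "\\\\").replace('"', '\\"') + '"'


def quote_str(value: str) -> str:
    """Single-quote an InfluxQL string literal (tag value)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def orphaned_hosts(icinga_hosts, influx_hosts):
    """Hosts that have data in InfluxDB but are no longer known to Icinga2."""
    return sorted(set(influx_hosts) - set(icinga_hosts))


def drop_statements(icinga_hosts, hosts_by_measurement, tag_key="hostname"):
    """Build one DROP SERIES statement per (measurement, orphaned host).

    hosts_by_measurement maps a measurement name to the hostnames found in it.
    tag_key is an assumption; adjust it to the tag your setup actually uses.
    """
    statements = []
    for measurement in sorted(hosts_by_measurement):
        for host in orphaned_hosts(icinga_hosts, hosts_by_measurement[measurement]):
            statements.append(
                f"DROP SERIES FROM {quote_ident(measurement)} "
                f"WHERE {quote_ident(tag_key)} = {quote_str(host)}"
            )
    return statements


if __name__ == "__main__":
    # Hypothetical example data, not taken from a real environment.
    known = {"web01", "db01"}
    found = {"ping4": ["web01", "old01"], "load": ["db01", "old01", "old02"]}
    for stmt in drop_statements(known, found):
        print(stmt)
```

Printing the statements first and reviewing them before feeding them to InfluxDB is a cheap safety net, since dropped series cannot be restored without a backup.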

Obviously run this at your own risk and create backups if your InfluxDB contains critical data!
Also, please note that this script causes a lot of IO, depending on how many series have to be dropped.

Please see the following GitHub repo for this script:

I might implement the following additional functionality:

cleanup orphaned services where the hosts still exist in Icinga2
provide some sort of retention mechanism, where only series whose last datapoint is more than X days in the past are dropped
switch to InfluxDB-Python instead of calling the influx binary to improve speed
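
The retention idea could look something like the sketch below. It is purely hypothetical and not part of the script: it assumes the timestamp of the last datapoint per host has already been looked up somehow (for example with a `last()` query per series), and the function name and parameters are invented for illustration.

```python
from datetime import datetime, timedelta, timezone


def stale_hosts(last_seen, max_age_days, now=None):
    """Return hosts whose last datapoint is older than max_age_days.

    last_seen maps a hostname to the timezone-aware datetime of its
    most recent datapoint. now can be passed in to make testing easier.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return sorted(host for host, ts in last_seen.items() if ts < cutoff)


if __name__ == "__main__":
    # Hypothetical example data.
    now = datetime(2020, 2, 26, tzinfo=timezone.utc)
    seen = {
        "old01": datetime(2019, 11, 1, tzinfo=timezone.utc),
        "tmp07": datetime(2020, 2, 20, tzinfo=timezone.utc),
    }
    print(stale_hosts(seen, max_age_days=30, now=now))
```

Only hosts returned by such a filter would then be candidates for dropping, so a system that disappeared a few days ago keeps its data for a while.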

Maybe this script will be useful for someone else.

Regards,
Markus

gkoutsog · February 27, 2020, 9:27am

Seems nice and I intend to try it soon.
But maybe place it in a GitHub repo or something similar, so that the latest version can be fetched from a central place.

Cheers,
George

mj84 · February 27, 2020, 12:25pm

Good point, I just created a new GitHub repo for this script: