Retention policies and continuous queries made simple

influxdb

(Janina Tritschler) #1

Author: @Carsten
Original: https://monitoring-portal.org/t/retention-policies-and-continuous-queries-made-simple/1792


This is a small howto about downsampling data for InfluxDB.

Use this howto on your own risk!
And as always -> create a backup first before you change anything!

Setup

InfluxDB name: icinga2
default retention policy: rp_1_year for new databases, or autogen for existing.

Terms

continuous query (CQ) An query that runs automatically and periodically within a database, normaly every 30 minutes if you didnt configure other times in your influxdb.ini

retention policy (RP) The part of InfluxDB’s data structure that describes for how long InfluxDB keeps data (duration).

Actual situation for most InfluxDB setups used with Icinga2

Most users setup InfluxDB like in this blog post, which works perfect.

Your Icinga2 server will send all the metrics (and meta data) to your InfluxDB, InfluxDB store all data inside the “DEFAULT” retention policy and keep it forever (or until you have enough and delete it). That is how InfluxDB works, you store data in a retention policy!

But what if you don’t want to store all the data indefinitely? You will need a retention policy (or more) and if you also want to downsample your data older then x hours|days|weeks|years you also need continuous queries :slight_smile: Lets create a setup that will downsample all data older then 52 weeks (1 hour to one entry) and downsample again after 104 weeks (1 day to one entry) deleting data older then 3 years.

Explanation:
Icinga2 will send performance data to InfluxDB -> InfluxDB will store all data inside the default retention policy.

The first continuous queries will pull data, that is older then 52 weeks, every hour from the default retention policy and write it into the retention policy “rp_2_years”.

The second continuous queries “cq_after_2_years” will pull data, that is older then 104 weeks, every day from the retention policy “rp_2_years” and write it into the retention policy “rp_3_years”.

The default retention policy will delete all data from the default policy that is older then 52 weeks and one day.
The retention policy “rp_2_years” will delete all data from the retention policy “rp_2_years” that is older then 104 weeks and 2 days.
The retention policy “rp_3_years” will delete all data from the retention policy “rp_3_years” that is older then 156 weeks.

Prepare a new InfluxDB database

  1. Log into influx cli
  2. Create database with non default retention policy named “one_year”
CREATE DATABASE "icinga2" WITH DURATION 52w1d REPLICATION 1 SHARD DURATION 168h NAME "rp_1_year"
  1. Change to your newly created database
use icinga2
  1. Create second retention policy “rp_2_years” and “rp_3_years”
CREATE RETENTION POLICY "rp_2_years" ON "icinga2" DURATION 104w2d REPLICATION 1
CREATE RETENTION POLICY "rp_3_years" ON "icinga2" DURATION 156w REPLICATION 1
  1. Check if all retention policies are there
> SHOW RETENTION POLICIES
name       duration   shardGroupDuration replicaN default
----       --------   ------------------ -------- -------
rp_1_year  8760h0m0s  168h0m0s           1        true
rp_2_years 17520h0m0s 168h0m0s           1        false
rp_3_years 26208h0m0s 168h0m0s           1        false
  1. Create the first continuous query that will downsample all metric data (meta data will not be taken) older 52 weeks and downsample 1 hour of data into 1 entry writing it into the retention policy named rp_2_years
CREATE CONTINUOUS QUERY "cq_after_1_year" ON "icinga2" BEGIN SELECT mean(value) AS value,mean(crit) AS crit,mean(warn) AS warn INTO "icinga2"."rp_2_years".:MEASUREMENT FROM "icinga2"."rp_1_year"./.*/ WHERE time < now() -52w GROUP BY time(1h),* END
  1. For the next continuous query we want to use data from the retention policy “rp_2_years”
CREATE CONTINUOUS QUERY "cq_after_2_year" ON "icinga2" BEGIN SELECT mean(value) AS value,mean(crit) AS crit,mean(warn) AS warn INTO "icinga2"."rp_3_years".:MEASUREMENT FROM "icinga2"."rp_2_years"./.*/ WHERE time < now() -104w GROUP BY time(1d),* END
  1. Check if all continuous are set like we wanted it.
> SHOW CONTINUOUS QUERIES
name: _internal
name query
---- -----

name: icinga2
name            query
----            -----
cq_after_1_year CREATE CONTINUOUS QUERY cq_after_1_year ON icinga2 BEGIN SELECT mean(value) AS value, mean(crit) AS crit, mean(warn) AS warn INTO icinga2.rp_2_years.:MEASUREMENT FROM icinga2.rp_1_year./.*/ WHERE time < now() - 52w GROUP BY time(1h), * END
cq_after_2_year CREATE CONTINUOUS QUERY cq_after_2_year ON icinga2 BEGIN SELECT mean(value) AS value, mean(crit) AS crit, mean(warn) AS warn INTO icinga2.rp_3_years.:MEASUREMENT FROM icinga2.rp_2_years./.*/ WHERE time < now() - 104w GROUP BY time(1d), * END

For existing databases with a retention policy “autogen”

Hint

Its not possible to rename existing retention policies, but we can alter them.
Do not drop the default retention policy, you will loose all your data!

  1. Log into influx cli
  2. Change to your existing database
use icinga2
  1. Create second retention policy “rp_2_years” and “rp_3_years”
CREATE RETENTION POLICY "rp_2_years" ON "icinga2" DURATION 104w2d REPLICATION 1
CREATE RETENTION POLICY "rp_3_years" ON "icinga2" DURATION 156w REPLICATION 1
  1. Check if all retention policies are there
> SHOW RETENTION POLICIES
name       duration   shardGroupDuration replicaN default
----       --------   ------------------ -------- -------
autogen    0s         168h0m0s           1        true
rp_2_years 17520h0m0s 168h0m0s           1        false
rp_3_years 26208h0m0s 168h0m0s           1        false
  1. Create the first continuous query that will downsample all metric data (meta data will not be taken) older 52 weeks and downsample 1 hour of data into 1 entry writing it into the retention policy named rp_2_years
CREATE CONTINUOUS QUERY "cq_after_1_year" ON "icinga2" BEGIN SELECT mean(value) AS value,mean(crit) AS crit,mean(warn) AS warn INTO "icinga2"."rp_2_years".:MEASUREMENT FROM "icinga2"."autogen"./.*/ WHERE time < now() -52w GROUP BY time(1h),* END
  1. For the next continuous query we want to use data from the retention policy “rp_2_years”
CREATE CONTINUOUS QUERY "cq_after_2_year" ON "icinga2" BEGIN SELECT mean(value) AS value,mean(crit) AS crit,mean(warn) AS warn INTO "icinga2"."rp_3_years".:MEASUREMENT FROM "icinga2"."rp_2_years"./.*/ WHERE time < now() -104w GROUP BY time(1d),* END
  1. Check if all continuous are set like we wanted it.
> SHOW CONTINUOUS QUERIES
name: _internal
name query
---- -----

name: icinga2
name            query
----            -----
cq_after_1_year CREATE CONTINUOUS QUERY cq_after_1_year ON icinga2 BEGIN SELECT mean(value) AS value, mean(crit) AS crit, mean(warn) AS warn INTO icinga2.rp_2_years.:MEASUREMENT FROM icinga2.autogen./.*/ WHERE time < now() - 52w GROUP BY time(1h), * END
cq_after_2_year CREATE CONTINUOUS QUERY cq_after_2_year ON icinga2 BEGIN SELECT mean(value) AS value, mean(crit) AS crit, mean(warn) AS warn INTO icinga2.rp_3_years.:MEASUREMENT FROM icinga2.rp_2_years./.*/ WHERE time < now() - 104w GROUP BY time(1d), * END
  1. Wait for at least 2 days, because second CQ runs only every day once, or more days depending on your amount of data, before you alter the default retention policy to delete all data older 52w and 1 day
ALTER RETENTION POLICY "autogen" ON "icinga2" DURATION 52w1d REPLICATION 1 SHARD DURATION 168h DEFAULT
  1. Check if the “autogen” policy has changed
> show retention policies
name       duration   shardGroupDuration replicaN default
----       --------   ------------------ -------- -------
autogen    8760h0m0s  168h0m0s           1        true
rp_2_years 17520h0m0s 168h0m0s           1        false
rp_3_years 26208h0m0s 168h0m0s           1        false

If you still have questions, just ask here or open a new topic.

Cheers,
Carsten