Retention policies and continuous queries made simple

Author: @anon66228339
Original: https://monitoring-portal.org/t/retention-policies-and-continuous-queries-made-simple/1792


This is a small howto about downsampling data for InfluxDB.

Use this howto on your own risk!
And as always → create a backup first before you change anything!

Setup

InfluxDB name: icinga2
default retention policy: rp_1_year for new databases, or autogen for existing.

Terms

continuous query (CQ) An query that runs automatically and periodically within a database, normaly every 30 minutes if you didnt configure other times in your influxdb.ini

retention policy (RP) The part of InfluxDB’s data structure that describes for how long InfluxDB keeps data (duration).

Actual situation for most InfluxDB setups used with Icinga2

Most users setup InfluxDB like in this blog post, which works perfect.

Your Icinga2 server will send all the metrics (and meta data) to your InfluxDB, InfluxDB store all data inside the “DEFAULT” retention policy and keep it forever (or until you have enough and delete it). That is how InfluxDB works, you store data in a retention policy!

But what if you don’t want to store all the data indefinitely? You will need a retention policy (or more) and if you also want to downsample your data older then x hours|days|weeks|years you also need continuous queries :slight_smile: Lets create a setup that will downsample all data older then 52 weeks (1 hour to one entry) and downsample again after 104 weeks (1 day to one entry) deleting data older then 3 years.

Explanation:
Icinga2 will send performance data to InfluxDB → InfluxDB will store all data inside the default retention policy.

The first continuous queries will pull data, that is older then 52 weeks, every hour from the default retention policy and write it into the retention policy “rp_2_years”.

The second continuous queries “cq_after_2_years” will pull data, that is older then 104 weeks, every day from the retention policy “rp_2_years” and write it into the retention policy “rp_3_years”.

The default retention policy will delete all data from the default policy that is older then 52 weeks and one day.
The retention policy “rp_2_years” will delete all data from the retention policy “rp_2_years” that is older then 104 weeks and 2 days.
The retention policy “rp_3_years” will delete all data from the retention policy “rp_3_years” that is older then 156 weeks.

Prepare a new InfluxDB database

  1. Log into influx cli
  2. Create database with non default retention policy named “one_year”
CREATE DATABASE "icinga2" WITH DURATION 52w1d REPLICATION 1 SHARD DURATION 168h NAME "rp_1_year"
  1. Change to your newly created database
use icinga2
  1. Create second retention policy “rp_2_years” and “rp_3_years”
CREATE RETENTION POLICY "rp_2_years" ON "icinga2" DURATION 104w2d REPLICATION 1
CREATE RETENTION POLICY "rp_3_years" ON "icinga2" DURATION 156w REPLICATION 1
  1. Check if all retention policies are there
> SHOW RETENTION POLICIES
name       duration   shardGroupDuration replicaN default
----       --------   ------------------ -------- -------
rp_1_year  8760h0m0s  168h0m0s           1        true
rp_2_years 17520h0m0s 168h0m0s           1        false
rp_3_years 26208h0m0s 168h0m0s           1        false
  1. Create the first continuous query that will downsample all metric data (meta data will not be taken) older 52 weeks and downsample 1 hour of data into 1 entry writing it into the retention policy named rp_2_years
CREATE CONTINUOUS QUERY "cq_after_1_year" ON "icinga2" BEGIN SELECT mean(value) AS value,mean(crit) AS crit,mean(warn) AS warn INTO "icinga2"."rp_2_years".:MEASUREMENT FROM "icinga2"."rp_1_year"./.*/ WHERE time < now() -52w GROUP BY time(1h),* END
  1. For the next continuous query we want to use data from the retention policy “rp_2_years”
CREATE CONTINUOUS QUERY "cq_after_2_year" ON "icinga2" BEGIN SELECT mean(value) AS value,mean(crit) AS crit,mean(warn) AS warn INTO "icinga2"."rp_3_years".:MEASUREMENT FROM "icinga2"."rp_2_years"./.*/ WHERE time < now() -104w GROUP BY time(1d),* END
  1. Check if all continuous are set like we wanted it.
> SHOW CONTINUOUS QUERIES
name: _internal
name query
---- -----

name: icinga2
name            query
----            -----
cq_after_1_year CREATE CONTINUOUS QUERY cq_after_1_year ON icinga2 BEGIN SELECT mean(value) AS value, mean(crit) AS crit, mean(warn) AS warn INTO icinga2.rp_2_years.:MEASUREMENT FROM icinga2.rp_1_year./.*/ WHERE time < now() - 52w GROUP BY time(1h), * END
cq_after_2_year CREATE CONTINUOUS QUERY cq_after_2_year ON icinga2 BEGIN SELECT mean(value) AS value, mean(crit) AS crit, mean(warn) AS warn INTO icinga2.rp_3_years.:MEASUREMENT FROM icinga2.rp_2_years./.*/ WHERE time < now() - 104w GROUP BY time(1d), * END

For existing databases with a retention policy “autogen”

Hint

Its not possible to rename existing retention policies, but we can alter them.
Do not drop the default retention policy, you will loose all your data!

  1. Log into influx cli
  2. Change to your existing database
use icinga2
  1. Create second retention policy “rp_2_years” and “rp_3_years”
CREATE RETENTION POLICY "rp_2_years" ON "icinga2" DURATION 104w2d REPLICATION 1
CREATE RETENTION POLICY "rp_3_years" ON "icinga2" DURATION 156w REPLICATION 1
  1. Check if all retention policies are there
> SHOW RETENTION POLICIES
name       duration   shardGroupDuration replicaN default
----       --------   ------------------ -------- -------
autogen    0s         168h0m0s           1        true
rp_2_years 17520h0m0s 168h0m0s           1        false
rp_3_years 26208h0m0s 168h0m0s           1        false
  1. Create the first continuous query that will downsample all metric data (meta data will not be taken) older 52 weeks and downsample 1 hour of data into 1 entry writing it into the retention policy named rp_2_years
CREATE CONTINUOUS QUERY "cq_after_1_year" ON "icinga2" BEGIN SELECT mean(value) AS value,mean(crit) AS crit,mean(warn) AS warn INTO "icinga2"."rp_2_years".:MEASUREMENT FROM "icinga2"."autogen"./.*/ WHERE time < now() -52w GROUP BY time(1h),* END
  1. For the next continuous query we want to use data from the retention policy “rp_2_years”
CREATE CONTINUOUS QUERY "cq_after_2_year" ON "icinga2" BEGIN SELECT mean(value) AS value,mean(crit) AS crit,mean(warn) AS warn INTO "icinga2"."rp_3_years".:MEASUREMENT FROM "icinga2"."rp_2_years"./.*/ WHERE time < now() -104w GROUP BY time(1d),* END
  1. Check if all continuous are set like we wanted it.
> SHOW CONTINUOUS QUERIES
name: _internal
name query
---- -----

name: icinga2
name            query
----            -----
cq_after_1_year CREATE CONTINUOUS QUERY cq_after_1_year ON icinga2 BEGIN SELECT mean(value) AS value, mean(crit) AS crit, mean(warn) AS warn INTO icinga2.rp_2_years.:MEASUREMENT FROM icinga2.autogen./.*/ WHERE time < now() - 52w GROUP BY time(1h), * END
cq_after_2_year CREATE CONTINUOUS QUERY cq_after_2_year ON icinga2 BEGIN SELECT mean(value) AS value, mean(crit) AS crit, mean(warn) AS warn INTO icinga2.rp_3_years.:MEASUREMENT FROM icinga2.rp_2_years./.*/ WHERE time < now() - 104w GROUP BY time(1d), * END
  1. Wait for at least 2 days, because second CQ runs only every day once, or more days depending on your amount of data, before you alter the default retention policy to delete all data older 52w and 1 day
ALTER RETENTION POLICY "autogen" ON "icinga2" DURATION 52w1d REPLICATION 1 SHARD DURATION 168h DEFAULT
  1. Check if the “autogen” policy has changed
> show retention policies
name       duration   shardGroupDuration replicaN default
----       --------   ------------------ -------- -------
autogen    8760h0m0s  168h0m0s           1        true
rp_2_years 17520h0m0s 168h0m0s           1        false
rp_3_years 26208h0m0s 168h0m0s           1        false

If you still have questions, just ask here or open a new topic.

Cheers,
Carsten

10 Likes

This is just awesome! I was preparing something very similar.

like:- create a Retention policies for delete data older than one hour but it’s delete also newer data from measurement, please help-out me.

CREATE DATABASE “deleteOldData” WITH DURATION 1h REPLICATION 1 SHARD DURATION 1h NAME “rp_1_hour”

Thanks in advance.

That’s a really great post! Lot of effort made it into it. Hope it will be helpful for many people! :slight_smile:

@Abhishe12155967 Influx stores the data in shards and Influx will always delete the whole shard if it applies a retention policy. So the RP duration should be longer than the shard duration. The minimum duration of a shard is 1 hour. So the RP duration has to be longer than that. I did not try it with a configuration like 1 hour and 10 minutes but you are safe if you set the RP duration to 2hrs or more.

1 Like

@winem Thank you for quick support. I will update my Retention policy.

Hi there, thanks alot for this useful post!

I implemented the retention policies and continuous queries on my rather large InfluxDB (100 GiB of data, reaching back until July 2018) and … nothing happened.

# Removed my previous text as it was misleading and simply not correct :)

@jtritschler I discovered something you might want to include in your original post as the current text might cause people to lose data when altering their default RP.

It turns out, that CQs don’t support a WHERE clause regarding the time interval.
The time interval covered by the CQ is always defined by the CQ’s GROUP BY time() clause (for basic CQs), OR by the RESAMPLE EVERY ... FOR clause in the advanced syntax:

Schedule and coverage
Continuous queries operate on real-time data. They use the local server’s timestamp, the GROUP BY time() interval, and InfluxDB database’s preset time boundaries to determine when to execute and what time range to cover in the query.

https://docs.influxdata.com/influxdb/v1.8/query_language/continuous_queries/#backfilling-results-for-older-data

Someone asked a similar question in the InfluxDB community on how to downsample older data:
https://community.influxdata.com/t/continuous-query-on-the-full-database/9379

It turns out that you need to manually downsample older data if you don’t want to lose it when setting the DURATION on the default retention policy autogen to 52w1d!

This can be done using 2 simple SELECT queries:

Downsample data older than one year but newer than 2 years:

SELECT mean(value) AS value,mean(crit) AS crit,mean(warn) AS warn INTO "icinga2"."rp_2_years".:MEASUREMENT FROM "icinga2"."rp_1_year"./.*/ WHERE time < now() -52w AND time > now() -104w GROUP BY time(1h),*

Downsample data older than 2 years but newer than 3 years:

SELECT mean(value) AS value,mean(crit) AS crit,mean(warn) AS warn INTO "icinga2"."rp_3_years".:MEASUREMENT FROM "icinga2"."autogen"./.*/ WHERE time < now() -104w AND time > now() -156w GROUP BY time(1d),*

Hope that helps anyone in a similar situation where you need to downsample data on an existing InfluxDB with just the autogen retention policy.

EDIT
Whenever you see the error message ERR: no data received, this probably means your InfluxDB host ran out of memory and the InfluxDB process got killed.
I had to split up my select queries for backfilling data in two-week intervals on a machine with 8 GiB RAM whose only purpose is to run InfluxDB for my Icinga2 installation.

Regards,
Markus

1 Like

@mj84 is this already added to the original post right?

As far as I can tell, no.
The problem is that continuous queries in InfluxDBv1 don’t support where clauses but the original post in this thread uses them.
From my experience it is definitely necessary to manually downsample your data to other retention policies. Otherwise if you change the „autogen“ retention policy, all data older than X will be discarded.

EDIT
On a new setup it would be preferable to use InfluxDBv2 anyways, which uses a completely different language

1 Like