Problem with Graphite retention

Hello, I have a problem with the whisper files for my Icinga2 metrics. Data collection began in September 2019, and now, in February 2020, data older than 6 months has been deleted from the whisper files: with whisper-dump I only see data from 10/02/2020 onwards. My carbon configuration is as follows:

[carbon]
pattern = ^carbon\.
retentions = 60:90d

[default_1min_for_1day]
pattern = .*
retentions = 60s:1d

[default]
pattern = .*
retentions = 12s:4h, 2m:3d, 5m:8d, 13m:32d, 1h:1y
[root@FSSRVMON01 services]#

Could you help me identify possible causes of this data loss?

Thanks
Angel

Hi @abarra,

my guess is that you initially ran graphite with a different config. Is that correct?

Here is what I think happened:

  1. You started Graphite with configA
  2. Icinga2 started to send data to the Graphite instance
  3. Graphite received the data and created the whisper files according to the default retention from configA
  4. You updated the configuration, including the default retention, which resulted in configB
  5. You did not use whisper-resize to apply the new retention policy to the already existing files

You can confirm this by checking whether services added after the configuration change have whisper files with the new default retention.

The issue is that a change to the carbon configuration does not apply to already existing data (whisper files): the retention, aggregation method, etc. are stored inside each file when carbon creates it. Updating the retention of an existing file requires whisper-resize.
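To see which retention an existing file actually carries, you can read its header. The whisper-info script shipped with whisper does this for you; the sketch below does the same with only the standard library, assuming the big-endian header layout used by Graphite's whisper.py (16 bytes of metadata followed by 12 bytes per archive). It is meant for inspection only, not as a replacement for the whisper tools.

```python
import struct

# Header layout assumed from Graphite's whisper.py (big-endian):
#   metadata:    aggregationType, maxRetention, xFilesFactor, archiveCount -> "!2LfL"
#   per archive: offset, secondsPerPoint, points                           -> "!3L"
METADATA_FORMAT = "!2LfL"
ARCHIVE_INFO_FORMAT = "!3L"
METADATA_SIZE = struct.calcsize(METADATA_FORMAT)          # 16 bytes
ARCHIVE_INFO_SIZE = struct.calcsize(ARCHIVE_INFO_FORMAT)  # 12 bytes


def read_whisper_header(path):
    """Return the retention settings stored in a whisper (.wsp) file's header."""
    with open(path, "rb") as f:
        agg_type, max_retention, xff, archive_count = struct.unpack(
            METADATA_FORMAT, f.read(METADATA_SIZE))
        archives = []
        for _ in range(archive_count):
            offset, seconds_per_point, points = struct.unpack(
                ARCHIVE_INFO_FORMAT, f.read(ARCHIVE_INFO_SIZE))
            archives.append({
                "offset": offset,
                "secondsPerPoint": seconds_per_point,
                "points": points,
                # seconds of history this archive can hold
                "retention": seconds_per_point * points,
            })
    return {
        "aggregationType": agg_type,
        "maxRetention": max_retention,
        "xFilesFactor": xff,
        "archives": archives,
    }
```

For example, `read_whisper_header("/var/lib/carbon/whisper/icinga2/example/value.wsp")` (a hypothetical path) on a file created with `60s:1d` would report a maxRetention of 86400 and a single archive of 1440 points, no matter what storage-schemas.conf says today.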

https://github.com/graphite-project/carbonate lists some very helpful additional tools to work with graphite.

Cheers,
Marcel


Hi Marcel,

Thanks for your help. We installed Icinga2 + Graphite several months ago and data collection worked normally until 09/02/2020. We did not make any changes to the carbon storage-schemas.conf file. When inspecting the .wsp files, they show a retention of 86400.

After investigating what was happening, we changed storage-schemas.conf to use a new retention and then ran whisper-resize on the existing files to apply it. We are concerned that the problem will happen again.
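One way to check that whisper-resize really reached every file is to scan the whisper directory and list any .wsp file whose stored maxRetention is still below what the new schema should give. This is a minimal sketch assuming the header layout of Graphite's whisper.py; the directory you pass in is an assumption (typically the LOCAL_DATA_DIR from carbon.conf, or its icinga2 subdirectory).

```python
import os
import struct

# First 16 bytes of a whisper file, as assumed from Graphite's whisper.py:
# aggregationType, maxRetention, xFilesFactor, archiveCount (big-endian).
METADATA_FORMAT = "!2LfL"
METADATA_SIZE = struct.calcsize(METADATA_FORMAT)


def find_short_retention(root, min_retention_seconds):
    """Yield (path, maxRetention) for every .wsp file under root whose
    stored maxRetention is below min_retention_seconds."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".wsp"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                header = f.read(METADATA_SIZE)
            if len(header) < METADATA_SIZE:
                continue  # empty or truncated file; skip it
            _agg, max_retention, _xff, _count = struct.unpack(METADATA_FORMAT, header)
            if max_retention < min_retention_seconds:
                yield path, max_retention
```

With the new `[default]` retention the longest archive covers 8 years, so `find_short_retention("/var/lib/carbon/whisper/icinga2", 8 * 365 * 86400)` (hypothetical path) should yield nothing once every file has been resized. Pointing it at the whole whisper directory would also list carbon's own metrics, which intentionally keep only 90 days.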

This is the new configuration

[root@FSSRVMON01 etc]# more /etc/carbon/storage-schemas.conf

# Schema definitions for Whisper files. Entries are scanned in order,
# and first match wins. This file is scanned for changes every 60 seconds.
#
#  [name]
#  pattern = regex
#  retentions = timePerPoint:timeToStore, timePerPoint:timeToStore, ...
# Carbon's internal metrics. This entry should match what is specified in
# CARBON_METRIC_PREFIX and CARBON_METRIC_INTERVAL settings

[carbon]
pattern = ^carbon\.
retentions = 60:90d
[default]
pattern = .*
retentions = 60s:1d,5m:8w,30m:1y,360m:8y
[default_1min_for_1day]
pattern = .*
retentions = 60s:1d

We use Grafana to display the data.

This is the original configuration:

[root@FSSRVMON01 etc]# more /etc/carbon/storage-schemas.conf.bkp

# Schema definitions for Whisper files. Entries are scanned in order,
# and first match wins. This file is scanned for changes every 60 seconds.
#
#  [name]
#  pattern = regex
#  retentions = timePerPoint:timeToStore, timePerPoint:timeToStore, ...
# Carbon's internal metrics. This entry should match what is specified in
# CARBON_METRIC_PREFIX and CARBON_METRIC_INTERVAL settings
[carbon]
pattern = ^carbon\.
retentions = 60:90d
[default_1min_for_1day]
pattern = .*
retentions = 60s:1d

[default]
pattern = .*
retentions = 12s:4h, 2m:3d, 5m:8d, 13m:32d, 1h:1y

[root@FSSRVMON01 etc]#

Regards
Angel Barra

Hi, please use the code boxes when pasting configurations. Otherwise it’s a bit hard to read.

Two things come to mind.
First, the retention of 86400 seconds that you see in the .wsp files matches this part of your configuration:

[default_1min_for_1day]
pattern = .*
retentions = 60s:1d

So these files should not contain any data older than 1 day.
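The arithmetic behind the 86400: a `precision:duration` pair becomes duration / precision points, and the archive holds precision × points seconds, so `60s:1d` is 1440 points of 60 seconds. Below is a small parsing sketch modelled on the unit rules in Graphite's whisper.py (a bare precision is seconds, a bare duration is a point count, `m` means minutes, a year is 365 days); it deliberately skips whisper's additional validity checks between archives.

```python
import re

# Unit names and multipliers as used by whisper; a unit string matches
# the first name it is a prefix of ("m" and "min" both mean minutes).
UNIT_SECONDS = {
    "seconds": 1,
    "minutes": 60,
    "hours": 3600,
    "days": 86400,
    "weeks": 7 * 86400,
    "years": 365 * 86400,
}


def _to_seconds(token):
    match = re.fullmatch(r"(\d+)([a-z]*)", token)
    if match is None:
        raise ValueError(f"bad retention token: {token!r}")
    value, unit = int(match.group(1)), match.group(2)
    if not unit:
        return value  # bare number: seconds
    for name, seconds in UNIT_SECONDS.items():
        if name.startswith(unit):
            return value * seconds
    raise ValueError(f"unknown unit: {unit!r}")


def parse_retention(definition):
    """Parse one 'precision:duration' pair into (secondsPerPoint, points)."""
    precision_str, duration_str = (s.strip() for s in definition.strip().split(":", 1))
    precision = _to_seconds(precision_str)
    if duration_str.isdigit():
        points = int(duration_str)  # bare number: already a point count
    else:
        points = _to_seconds(duration_str) // precision
    return precision, points


def parse_retentions(value):
    """Parse a full 'retentions =' value into a list of (secondsPerPoint, points)."""
    return [parse_retention(part) for part in value.split(",")]


def max_retention_seconds(value):
    """Longest span of history any archive in the definition can hold."""
    return max(precision * points for precision, points in parse_retentions(value))
```

`max_retention_seconds("60s:1d")` gives 86400, the value reported by the files, while the new `[default]` line `60s:1d,5m:8w,30m:1y,360m:8y` gives 252288000 seconds (8 years).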

Second, the [default_1min_for_1day] section in your new storage-schemas.conf will never be applied, because only the first section whose pattern matches the metric name is used. Since [default] (pattern .*) comes before it, [default] will match every metric that [carbon] does not.
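The first-match rule can be illustrated with a minimal sketch. It assumes, as carbon does, that sections are tried in file order and that `pattern` is tested with a regex search against the metric name; Python's configparser is only a stand-in here for carbon's own schema loader, and the metric name used in the example is hypothetical.

```python
import configparser
import re


def pick_schema(conf_text, metric):
    """Return (section, retentions) of the first section whose pattern matches
    the metric name, trying sections in file order; None if nothing matches."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    for section in parser.sections():  # sections() preserves file order
        pattern = parser.get(section, "pattern", fallback=None)
        if pattern is not None and re.search(pattern, metric):
            return section, parser.get(section, "retentions")
    return None
```

Fed the original file, a metric such as `icinga2.fssrvmon01.services.ping4.ping.perfdata.rta.value` lands in `[default_1min_for_1day]` (`60s:1d`), because that section came before `[default]`; this is consistent with the 86400-second retention found in the files. With the new file the same metric lands in `[default]`.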

Did you make these changes just today, or already before the data older than 6 months was dropped?