Graphite won't graph more than 2 days of data

I am not able to graph more than 2 days of data. I have read previous posts on the same topic and double-checked my configuration.

I believe my system is configured properly.

/etc/carbon/storage-schemas.conf contains:
[icinga2_internals]
pattern = ^icinga2\..*\.(max_check_attempts|reachable|current_attempt|execution_time|latency|state|state_type)
retentions = 5m:7d

[icinga2_default]
pattern = ^icinga2\.
retentions = 1m:2d,5m:10d,30m:90d,360m:4y

[carbon]
pattern = ^carbon\.
retentions = 60:90d

[default_1min_for_1day]
pattern = .*
retentions = 60s:1d

#[default]
#pattern = .*
#retentions = 12s:4h, 2m:3d, 5m:8d, 13m:32d, 1h:1y

/etc/carbon/storage-aggregation.conf contains:
[min]
pattern = \.min$
xFilesFactor = 0.1
aggregationMethod = min

[max]
pattern = \.max$
xFilesFactor = 0.1
aggregationMethod = max

[sum]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = sum

[default_average]
pattern = .*
xFilesFactor = 0.5
aggregationMethod = average

If I look at a data file
/var/lib/carbon/whisper/icinga2/HOSTNAME/services/ping4/ping4/perfdata/rta/value.wsp
it looks like it has the correct settings but I can’t get more than 2 days of data graphed.

whisper-info value.wsp
maxRetention: 126144000
xFilesFactor: 0.5
aggregationMethod: average
fileSize: 191104

Archive 0
retention: 172800
secondsPerPoint: 60
points: 2880
size: 34560
offset: 64

Archive 1
retention: 864000
secondsPerPoint: 300
points: 2880
size: 34560
offset: 34624

Archive 2
retention: 7776000
secondsPerPoint: 1800
points: 4320
size: 51840
offset: 69184

Archive 3
retention: 126144000
secondsPerPoint: 21600
points: 5840
size: 70080
offset: 121024
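The whisper-info figures above can be cross-checked against the retention string by hand. The following is only a minimal sketch, not whisper's own code: it assumes the usual whisper on-disk layout (a 16-byte metadata header, 12 bytes of archive info per archive, 12 bytes per point), a 365-day year, and durations that always carry a unit suffix. The names `to_seconds` and `layout` are made up for this illustration.

```python
# Hypothetical helper names; the byte sizes are the assumed whisper layout.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}

METADATA_SIZE = 16      # aggregation type, max retention, xFilesFactor, archive count
ARCHIVE_INFO_SIZE = 12  # offset, secondsPerPoint, points
POINT_SIZE = 12         # 4-byte timestamp + 8-byte value


def to_seconds(text):
    """Convert '60', '60s', '5m', '4y' to seconds (a bare number means seconds)."""
    if text[-1].isdigit():
        return int(text)
    return int(text[:-1]) * UNIT_SECONDS[text[-1]]


def layout(retentions):
    """Return (archives, file_size) for a string like '1m:2d,5m:10d'."""
    defs = [d.strip().split(":") for d in retentions.split(",")]
    offset = METADATA_SIZE + ARCHIVE_INFO_SIZE * len(defs)
    archives = []
    for precision, duration in defs:
        step = to_seconds(precision)
        points = to_seconds(duration) // step
        archives.append({
            "secondsPerPoint": step,
            "points": points,
            "retention": step * points,
            "size": points * POINT_SIZE,
            "offset": offset,
        })
        offset += points * POINT_SIZE
    return archives, offset


archives, file_size = layout("1m:2d,5m:10d,30m:90d,360m:4y")
for number, archive in enumerate(archives):
    print("Archive", number, archive)
print("fileSize:", file_size)
```

If the printed values agree with whisper-info, the file was created from the [icinga2_default] schema, so the schema match itself is not the problem.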

Following up on my own post: whisper-dump only shows data for Archive 0. No data is present in Archives 1, 2 and 3, even though it has been more than a week since I removed /var/lib/carbon/whisper/icinga2 and restarted the carbon-cache service.

I have a cron job that removes files older than 1 day from /var/spool/icinga2/perfdata.
Could this be the problem?

Hm, I have the exact same config in the storage schemas, but have only the following inside the storage aggregation config:

[default]
pattern = .*
xFilesFactor = 0
aggregationMethod = average

If I understand your config correctly, you aggregate all .min and .max files with an xFilesFactor of 0.1, the .count files with 0, and all other files with 0.5.
As my folders typically hold

  • crit.wsp
  • min.wsp
  • value.wsp
  • warn.wsp

I suspect that those are all aggregated by the default part of the config, and maybe that’s why the values disappear after two days.

But tbh this is some wild guessing from my side :sweat_smile: and I am happy my graphite setups currently work :wink:

Only thing I can tell you for certain:

The files in /var/spool/icinga2/perfdata are not needed for Graphite: it receives the perfdata directly from the Icinga 2 core over a TCP connection that is configured for the graphite feature.
You can safely disable the perfdata feature if you don’t use it for anything else (it is deprecated anyway).

Thanks for your comments. I have changed my storage-aggregation.conf to just this

[default_average]
pattern = .*
xFilesFactor = 0
aggregationMethod = average

I also restarted carbon-cache, disabled the perfdata feature and ran the following on my whisper files

find /var/lib/carbon/whisper/icinga2 -name '*.wsp' -exec whisper-resize '{}' --xFilesFactor=0 1m:2d 5m:10d 30m:90d 360m:4y \;

I ran this yesterday and now I have no graph data, even for an hour.

I will remove the contents of /var/lib/carbon/whisper/icinga2 and restart carbon-cache again.

Can you also provide your check_interval? It determines how often a value is sent to Graphite, and therefore how well the values line up with the retention slots and whether the xFilesFactor is met during aggregation.

ping4 and ssh service have a check_interval of 30m
generic-host and generic-service have a check_interval of 10m

The example whisper file I provided was for …/ping4/ping4/perfdata/rta/value.wsp

I do see the same problem for services that import generic-service.

I see that while I restarted carbon-cache, I did not restart carbon-aggregator.

root      16100      1  0 Feb08 ?        00:00:23 /usr/bin/python2 -s /usr/bin/carbon-aggregator --config=/etc/carbon/carbon.conf --pidfile=/var/run/carbon-aggregator.pid --logdir=/var/log/carbon/ start
carbon    39446      1  0 Feb15 ?        00:06:31 /usr/bin/python2 -s /usr/bin/carbon-cache --config=/etc/carbon/carbon.conf --pidfile=/var/run/carbon-cache.pid --logdir=/var/log/carbon/ start

I will make sure to restart everything – carbon-cache, carbon-aggregator and icinga2

I see error messages in the system log when I restart carbon-aggregator and carbon-cache

Feb 16 04:48:12 icinga carbon-aggregator: Unhandled Error
Feb 16 04:48:12 icinga carbon-aggregator: Traceback (most recent call last):
Feb 16 04:48:12 icinga carbon-aggregator: File "/usr/lib64/python2.7/site-packages/twisted/python/usage.py", line 255, in parseOptions
Feb 16 04:48:12 icinga carbon-aggregator: self._dispatch[optMangled](optMangled, arg)
Feb 16 04:48:12 icinga carbon-aggregator: File "/usr/lib64/python2.7/site-packages/twisted/python/usage.py", line 411, in <lambda>
Feb 16 04:48:12 icinga carbon-aggregator: fn = lambda name, value, m=method: m(value)
Feb 16 04:48:12 icinga carbon-aggregator: File "/usr/lib64/python2.7/site-packages/twisted/application/app.py", line 539, in opt_reactor
Feb 16 04:48:12 icinga carbon-aggregator: installReactor(shortName)
Feb 16 04:48:12 icinga carbon-aggregator: File "/usr/lib64/python2.7/site-packages/twisted/application/reactors.py", line 80, in installReactor
Feb 16 04:48:12 icinga carbon-aggregator: for installer in getReactorTypes():
Feb 16 04:48:12 icinga carbon-aggregator: --- <exception caught here> ---
Feb 16 04:48:12 icinga carbon-aggregator: File "/usr/lib64/python2.7/site-packages/twisted/plugin.py", line 217, in getPlugins
Feb 16 04:48:12 icinga carbon-aggregator: adapted = interface(plugin, None)
Feb 16 04:48:12 icinga carbon-aggregator: File "/usr/lib64/python2.7/site-packages/zope/interface/interface.py", line 468, in _call_conform
Feb 16 04:48:12 icinga carbon-aggregator: return conform(self)
Feb 16 04:48:12 icinga carbon-aggregator: File "/usr/lib64/python2.7/site-packages/twisted/plugin.py", line 71, in __conform__
Feb 16 04:48:12 icinga carbon-aggregator: return self.load()
Feb 16 04:48:12 icinga carbon-aggregator: File "/usr/lib64/python2.7/site-packages/twisted/plugin.py", line 66, in load
Feb 16 04:48:12 icinga carbon-aggregator: return namedAny(self.dropin.moduleName + '.' + self.name)
Feb 16 04:48:12 icinga carbon-aggregator: File "/usr/lib64/python2.7/site-packages/twisted/python/reflect.py", line 319, in namedAny
Feb 16 04:48:12 icinga carbon-aggregator: obj = getattr(obj, n)
Feb 16 04:48:12 icinga carbon-aggregator: exceptions.AttributeError: 'module' object has no attribute 'glade'

Feb 16 04:48:18 icinga carbon-cache: Unhandled Error
Feb 16 04:48:18 icinga carbon-cache: Traceback (most recent call last):
Feb 16 04:48:18 icinga carbon-cache: File "/usr/lib64/python2.7/site-packages/twisted/python/usage.py", line 255, in parseOptions
Feb 16 04:48:18 icinga carbon-cache: self._dispatch[optMangled](optMangled, arg)
Feb 16 04:48:18 icinga carbon-cache: File "/usr/lib64/python2.7/site-packages/twisted/python/usage.py", line 411, in <lambda>
Feb 16 04:48:18 icinga carbon-cache: fn = lambda name, value, m=method: m(value)
Feb 16 04:48:18 icinga carbon-cache: File "/usr/lib64/python2.7/site-packages/twisted/application/app.py", line 539, in opt_reactor
Feb 16 04:48:18 icinga carbon-cache: installReactor(shortName)
Feb 16 04:48:18 icinga carbon-cache: File "/usr/lib64/python2.7/site-packages/twisted/application/reactors.py", line 80, in installReactor
Feb 16 04:48:18 icinga carbon-cache: for installer in getReactorTypes():
Feb 16 04:48:18 icinga carbon-cache: --- <exception caught here> ---
Feb 16 04:48:18 icinga carbon-cache: File "/usr/lib64/python2.7/site-packages/twisted/plugin.py", line 217, in getPlugins
Feb 16 04:48:18 icinga carbon-cache: adapted = interface(plugin, None)
Feb 16 04:48:18 icinga carbon-cache: File "/usr/lib64/python2.7/site-packages/zope/interface/interface.py", line 468, in _call_conform
Feb 16 04:48:18 icinga carbon-cache: return conform(self)
Feb 16 04:48:18 icinga carbon-cache: File "/usr/lib64/python2.7/site-packages/twisted/plugin.py", line 71, in __conform__
Feb 16 04:48:18 icinga carbon-cache: return self.load()
Feb 16 04:48:18 icinga carbon-cache: File "/usr/lib64/python2.7/site-packages/twisted/plugin.py", line 66, in load
Feb 16 04:48:18 icinga carbon-cache: return namedAny(self.dropin.moduleName + '.' + self.name)
Feb 16 04:48:18 icinga carbon-cache: File "/usr/lib64/python2.7/site-packages/twisted/python/reflect.py", line 319, in namedAny
Feb 16 04:48:18 icinga carbon-cache: obj = getattr(obj, n)
Feb 16 04:48:18 icinga carbon-cache: exceptions.AttributeError: 'module' object has no attribute 'glade'

I now see data for all archives using whisper-dump on a value.wsp file.

I am not sure if I should put storage-aggregation.conf back the way it was.

Use this:

[min]
pattern = \.min$
xFilesFactor = 0.1
aggregationMethod = min

[max]
pattern = \.max$
xFilesFactor = 0.1
aggregationMethod = max

[sum]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = sum

[default_average]
pattern = .*
xFilesFactor = 0.5
aggregationMethod = average

or this

[default]
pattern = .*
xFilesFactor = 0
aggregationMethod = average

Ok, so the problem is the following.

A check_interval of 10m or 30m only creates one datapoint per interval, but with a retention of 1m:2d,5m:10d,30m:90d,360m:4y the first archive expects a datapoint every minute, so with a 30m interval 29 of every 30 slots stay empty. When whisper aggregates into the second archive it applies the xFilesFactor, which is normally 0.5, so at least 50% of the slots must be filled: in this case 3 of the 5 one-minute slots behind each 5m point. With only 1 or none filled it can never aggregate, so all data is lost once it ages out of the first archive after 2 days.
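The arithmetic can be sketched in a few lines. This is only an illustration under stated assumptions, not whisper's implementation: it assumes a coarse slot is written when the fraction of filled fine slots behind it is at least xFilesFactor, and it only looks at the best case for a single pair of archives. The function names are made up.

```python
def best_case_fill(check_interval, fine_step, coarse_step):
    """Largest fraction of fine slots behind one coarse slot that can hold a value.

    All arguments are in seconds.
    """
    slots = coarse_step // fine_step
    # One check result fills at most one fine slot; checking faster than
    # fine_step still fills each slot only once.
    spacing = max(check_interval, fine_step)
    filled = max(1, coarse_step // spacing)
    return min(filled, slots) / slots


def propagates(check_interval, fine_step, coarse_step, xff):
    """Assumed rule: aggregate only if the filled fraction reaches xFilesFactor."""
    return best_case_fill(check_interval, fine_step, coarse_step) >= xff


# 10m checks written into a 1m archive, aggregated into a 5m archive:
print(best_case_fill(600, 60, 300))     # at most 1 of 5 slots
print(propagates(600, 60, 300, 0.5))    # default xFilesFactor
print(propagates(600, 60, 300, 0.1))
# 10m checks written into a 10m archive, aggregated into a 30m archive:
print(propagates(600, 600, 1800, 0.5))
```

The first pair never reaches 0.5, while a first archive whose step matches the check interval fills every slot, which is the reasoning behind either lowering xFilesFactor or changing the retentions.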

When using this interval I recommend adjusting the default and internals for icinga2 and adding another schema for ping4 and ssh.

[icinga2_internals]
pattern = ^icinga2\..*\.(max_check_attempts|reachable|current_attempt|execution_time|latency|state|state_type)
retentions = 10m:7d

[icinga2_30m]
pattern = ^icinga2\..*\.(ssh|ping4)\..*
retentions = 30m:90d,360m:4y

[icinga2_default]
pattern = ^icinga2\.
retentions = 10m:10d,30m:90d,360m:4y

In the other configuration (storage-aggregation.conf) I would then keep the defaults, as there will now be enough values for aggregation to work as expected.
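The order of the sections matters because, as far as I know, carbon uses the first section whose pattern matches the metric name. A minimal sketch of that lookup for the three sections suggested above follows; it assumes an unanchored regex search, and the metric names are hypothetical examples modelled on the whisper path from the original post.

```python
import re

# Sections in file order, patterns copied from the suggested storage-schemas.conf.
SCHEMAS = [
    ("icinga2_internals",
     r"^icinga2\..*\.(max_check_attempts|reachable|current_attempt"
     r"|execution_time|latency|state|state_type)"),
    ("icinga2_30m", r"^icinga2\..*\.(ssh|ping4)\..*"),
    ("icinga2_default", r"^icinga2\."),
]


def first_match(metric):
    """Return the name of the first section whose pattern matches, or None."""
    for name, pattern in SCHEMAS:
        if re.search(pattern, metric):
            return name
    return None


# Hypothetical metric names for illustration only.
for metric in (
    "icinga2.HOSTNAME.services.ping4.ping4.perfdata.rta.value",
    "icinga2.HOSTNAME.services.ping4.ping4.metadata.state",
    "icinga2.HOSTNAME.services.disk.disk.perfdata.root.value",
):
    print(metric, "->", first_match(metric))
```

If [icinga2_30m] were placed after [icinga2_default], the ping4 and ssh metrics would never reach it, which is why the new section has to sit above the default one.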

I made the change to storage-schemas.conf that you suggested.
I restarted carbon-aggregator, carbon-cache and icinga2.
Do I need to run whisper-resize?

[icinga2_internals]
pattern = ^icinga2\..*\.(max_check_attempts|reachable|current_attempt|execution_time|latency|state|state_type)
retentions = 5m:7d

[icinga2_30m]
pattern = ^icinga2\..*\.(ssh|ping4)\..*
retentions = 30m:90d,360m:4y

[icinga2_default]
pattern = ^icinga2\.
retentions = 1m:2d,5m:10d,30m:90d,360m:4y

I put back the original storage-aggregation.conf, but set xFilesFactor = 0.1 instead of 0.5 for [default_average].

[min]
pattern = \.min$
xFilesFactor = 0.1
aggregationMethod = min

[max]
pattern = \.max$
xFilesFactor = 0.1
aggregationMethod = max

[sum]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = sum

[default_average]
pattern = .*
xFilesFactor = 0.1
aggregationMethod = average

Yes, a resize is always required: the files are created with a fixed layout based on the configuration at creation time, and they are not changed afterwards unless you run whisper-resize manually.

And if this is your current configuration, you only added the section for the 30m interval but did not change the other ones.

I resized the affected files for ping4 and ssh.

The issue is solved.

Here’s a summary of the actions:

  1. Add a new section to storage-schemas.conf for intervals of 30m, before [icinga2_default]:
[icinga2_30m]
pattern = ^icinga2\..*\.(ssh|ping4)\..*
retentions = 30m:90d,360m:4y
  2. Adjust storage-aggregation.conf by changing xFilesFactor to 0.1 from 0 for [default_average]:
[default_average]
pattern = .*
xFilesFactor = 0.1
aggregationMethod = average
  3. Restart carbon-aggregator, carbon-cache and icinga2
  4. Run whisper-resize on the affected files:
find . -regex '.*/\(ssh\|ping4\)/.*\.wsp' -exec whisper-resize '{}' --xFilesFactor=0.1 30m:90d 360m:4y \;
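Before resizing anything it can help to see exactly which files the pattern picks up. The sketch below only builds the command lines and runs nothing. It reuses the whisper-resize arguments from the find command above; the function name and the example paths are hypothetical.

```python
import re
import shlex

# Same selection as the find -regex above: any .wsp file below an ssh or ping4 directory.
SERVICE_FILE = re.compile(r".*/(ssh|ping4)/.*\.wsp$")


def resize_commands(paths, xff, retentions):
    """Return the whisper-resize command line for every matching path (dry run)."""
    commands = []
    for path in paths:
        if SERVICE_FILE.match(path):
            args = ["whisper-resize", path, f"--xFilesFactor={xff}", *retentions]
            commands.append(" ".join(shlex.quote(arg) for arg in args))
    return commands


# Hypothetical paths for illustration only.
example_paths = [
    "./HOSTNAME/services/ping4/ping4/perfdata/rta/value.wsp",
    "./HOSTNAME/services/disk/disk/perfdata/root/value.wsp",
]
for command in resize_commands(example_paths, "0.1", ["30m:90d", "360m:4y"]):
    print(command)
```

Reviewing the printed list first makes it easier to back up just those files before the real resize.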

I had some additional problems. The last set of changes caused me to lose ping data. No further data was being aggregated.

  1. I removed the block for [icinga2_30m] in storage-schemas.conf
  2. I changed xFilesFactor = 0 for [default_average] in storage-aggregation.conf
  3. I removed the whisper files for ssh and ping4
  4. I restarted carbon-aggregator, carbon-cache and icinga2

I will also note that this is the second time my data files were corrupted after running whisper-resize.
No data was recorded in the whisper files after the resize.
I am not sure whether the problem was that I didn’t restart the services after the resize.