Graphite Troubleshooting

(Michael Friedrich) #1

View Metrics

Watch incoming metrics

Example:

watch "/opt/whisper/bin/whisper-fetch.py /opt/graphite/storage/whisper/icinga2/HOST/services/check_hiveserver2_HOST_port_10000/check_hiveserver2_HOST_port_10000/perfdata/query_time/value.wsp --pretty | tail"

Data is not shown older than …

@twidhalm explains it:

Carbon uses the first match when deciding which retention policy to apply. It starts at the top of your list, and the retention times of the first pattern that matches are used. The icinga2_internals pattern doesn’t look like valid regex to me, so most of your metrics will use icinga2_default.
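A rough way to picture the first-match rule is the sketch below. It is not carbon's code: match_schema is a hypothetical helper, the section names icinga2_internals and icinga2_default come from the thread, and the patterns and retention strings are made-up placeholders.

```shell
#!/bin/sh
# Ordered list of "name pattern retentions", top to bottom as in
# storage-schemas.conf. Patterns and retentions are placeholders (assumptions).
SCHEMAS='icinga2_internals ^icinga2\..*\.state$ 5m:7d
icinga2_default ^icinga2\. 1m:1d,5m:7d,30m:30d,1h:1y,1d:4y
default .* 1m:1d'

# Print the first section whose pattern matches the metric name, then stop.
match_schema() {
    metric=$1
    printf '%s\n' "$SCHEMAS" | while read -r name pattern retentions; do
        if printf '%s\n' "$metric" | grep -Eq -- "$pattern"; then
            printf '%s %s\n' "$name" "$retentions"
            break
        fi
    done
}

match_schema icinga2.host1.services.ping4.ping4.perfdata.rta.value
```

Because the lookup stops at the first hit, a catch-all pattern placed above a more specific one would shadow it, and a pattern that never matches simply lets metrics fall through to the next section.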

Since carbon aggregates older entries, each timeframe should be a multiple of the previous one. What you have is 1 minute, 1 minute, 5 minutes, 5 minutes, all with different retention times. What you want is something like:

1m:1d,5m:7d,30m:30d,1h:1y,1d:4y

So the first part of each tuple shows the timeframe of the aggregation, and the second shows how long to keep it. If you use one timeframe (e.g. 1m) multiple times with different retention times, carbon doesn’t know what to do.

And always use multiples of the predecessors.

Every aggregation timeframe has to be a multiple of every one of the timeframes you used before.
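A small sketch of that rule as a shell check, assuming retention strings of the form used in this thread (a number plus one of s, m, h, d, w, y). The helpers to_seconds and check_multiples are hypothetical names, not whisper tools, and the retention lengths in the second call are placeholders.

```shell
#!/bin/sh
# Convert "5m", "1d", "60s" or a bare number of seconds to seconds.
to_seconds() {
    value=${1%[smhdwy]}
    unit=${1#"$value"}
    case $unit in
        ''|s) echo "$value" ;;
        m) echo $((value * 60)) ;;
        h) echo $((value * 3600)) ;;
        d) echo $((value * 86400)) ;;
        w) echo $((value * 604800)) ;;
        y) echo $((value * 31536000)) ;;
    esac
}

# Check that every timeframe is a larger multiple of all earlier ones.
check_multiples() {
    seen=""
    for archive in $(printf '%s' "$1" | tr ',' ' '); do
        step=$(to_seconds "${archive%%:*}")
        for earlier in $seen; do
            if [ $((step % earlier)) -ne 0 ] || [ "$step" -le "$earlier" ]; then
                echo "bad: ${step}s is not a larger multiple of ${earlier}s"
                return 1
            fi
        done
        seen="$seen $step"
    done
    echo "ok"
}

check_multiples "1m:1d,5m:7d,30m:30d,1h:1y,1d:4y"
# A repeated timeframe is rejected (retention lengths here are placeholders).
check_multiples "1m:1d,1m:7d,5m:30d,5m:1y" || true
```

The check only looks at the first part of each tuple; it does not verify that the retention lengths themselves make sense.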

10m is 10*1m and 2*5m, so this is OK. You just have to decide how long you want to keep these retentions. I’d go for not too many timeframes. Remember, these are only aggregation levels. Theoretically you could use 1m:4y (please don’t do this, you will end up with unreasonably huge files) and zoom in and out just as you wish.
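To put a number on "unreasonably huge", here is a rough size estimate per whisper file. It assumes the usual whisper layout of a 16-byte header, 12 bytes of metadata per archive and 12 bytes per datapoint, and that both parts of each tuple carry a time unit. The helper names to_seconds and whisper_size are hypothetical.

```shell
#!/bin/sh
# Convert "5m", "1d", "60s" or a bare number of seconds to seconds.
to_seconds() {
    value=${1%[smhdwy]}
    unit=${1#"$value"}
    case $unit in
        ''|s) echo "$value" ;;
        m) echo $((value * 60)) ;;
        h) echo $((value * 3600)) ;;
        d) echo $((value * 86400)) ;;
        w) echo $((value * 604800)) ;;
        y) echo $((value * 31536000)) ;;
    esac
}

# Estimate the size in bytes of one .wsp file for a retention string.
whisper_size() {
    archives=0
    points=0
    for archive in $(printf '%s' "$1" | tr ',' ' '); do
        step=$(to_seconds "${archive%%:*}")   # seconds per datapoint
        keep=$(to_seconds "${archive##*:}")   # seconds of history kept
        points=$((points + keep / step))
        archives=$((archives + 1))
    done
    echo $((16 + 12 * archives + 12 * points))
}

whisper_size "1m:4y"                            # prints 25228828 (about 24 MiB)
whisper_size "1m:1d,5m:7d,30m:30d,1h:1y,1d:4y"  # prints 181468 (about 177 KiB)
```

That is per metric, so the single-level schema costs roughly 140 times more disk for every value.wsp, warn.wsp, crit.wsp and so on.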

The first part of a 1m:1d tuple states the aggregation level. Let me explain with a part from my example above:

1m:1d,5m:7d

Means:

  • One datapoint every minute for 1 day
  • After this day, 5 of these datapoints are combined into one aggregated value (by default it’s the mean: all summed up and divided by 5).
  • These aggregations will be saved for 7 days
  • After these 7 days, the already aggregated values will be aggregated again
  • Once data is older than the retention of the last aggregation level, it is deleted from the database
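The mean aggregation in the second bullet can be sketched like this. It only illustrates the arithmetic, it is not how whisper is implemented; aggregate_mean is a hypothetical helper and the sample values are made up.

```shell
#!/bin/sh
# Consolidate consecutive groups of N datapoints into their mean.
# A trailing incomplete group is dropped in this sketch.
aggregate_mean() {
    n=$1
    shift
    printf '%s\n' "$@" | awk -v n="$n" '
        { sum += $1; count++ }
        count == n { print sum / n; sum = 0; count = 0 }'
}

# Ten 1-minute datapoints become two 5-minute datapoints: 30 and 6.
aggregate_mean 5 10 20 30 40 50 2 4 6 8 10
```

With max as the aggregation method you would keep the largest value of each group instead, which is why peaks survive aggregation with max but get flattened with the mean.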

So the aggregation timeframe means, you can not zoom in deeper than the aggregation timeframe. You can always zoom out. Using an aggregation timeframe of 1y would only be reasonable if you wanted to keep your data for, say, 100 years.

Remember that every time you change the storage schema, this only affects newly created whisper files. You will have to transform older files if you want them to use the new schema as well. This might not be lossless. In most cases it’s just easier to delete the whisper files already created.

Change Aggregation

Example script

# Example 1: resize all whisper files of one host to a new retention schema, without keeping backup copies.
sudo -u graphite find /var/lib/graphite/whisper/icinga2/$YOURHOSTNAMEHERE/services/ -type f -name '*.wsp' -exec python /etc/graphite/whisper-resize.py {} --nobackup 60s:14d 120s:90d 15360s:360d \;

# Example 2: stop carbon-cache, set the aggregation method per metric type, resize, then start carbon-cache again.
systemctl stop graphite_carbon_cache.service
cd path/to/whisperfolder
# Arguments: <file> <aggregation method> <xFilesFactor>
find ./ -type f -name 'value.wsp' -exec whisper-set-aggregation-method.py {} average 0 \;
find ./ -type f -name 'warn.wsp' -exec whisper-set-aggregation-method.py {} average 0 \;
find ./ -type f -name 'crit.wsp' -exec whisper-set-aggregation-method.py {} average 0 \;
find ./ -type f -name 'max.wsp' -exec whisper-set-aggregation-method.py {} max 0 \;
# Apply the new retention schema to every whisper file.
find ./ -type f -name '*.wsp' -exec whisper-resize.py --nobackup {} 1m:2w 5m:8w 30m:1y 360m:8y \;
systemctl start graphite_carbon_cache.service

General Problems

graphite2 package on RHEL/CentOS

The graphite2 package is not the Graphite metrics project (it is a font rendering library). Instead, you’ll need to install carbon-cache, whisper and graphite-web as packages, when available.

graphite2.x86_64 1.3.10-1.el7_3 @base
