Thank you for your help, guidance, and feedback regarding this topic.
I believe I have everything working as expected (or probably 95% anyway).
I know a couple of you mentioned that you will be writing tutorials for this. I too plan on adding what I did to the How-To forum this weekend so others can hopefully benefit from it.
A bit late to the game (and I haven’t really read the thread), but I’ll just post what works for me when installing Graphite on an Ubuntu 18.04 machine.
GRAPHITE INSTALLATION UBUNTU 18.04 LTS (for python2)
##################### https://github.com/graphite-project/graphite-web/issues/1425#issuecomment-194905689
apt install graphite-web graphite-carbon libapache2-mod-wsgi
vi /etc/graphite/local_settings.py
CHANGE SECRET_KEY, TIME_ZONE
vi /etc/default/graphite-carbon
CARBON_CACHE_ENABLED=true
graphite-manage migrate
PYTHONPATH=/opt/graphite/webapp django-admin migrate --settings=graphite.settings --run-syncdb
sudo chown _graphite: /var/lib/graphite/graphite.db
echo Listen 8000 >> /etc/apache2/ports.conf
cp /usr/share/graphite-web/apache2-graphite.conf /etc/apache2/sites-available/graphite.conf
sed -i.org -e "/VirtualHost/s/:80/:8000/" /etc/apache2/sites-available/graphite.conf
a2ensite graphite
systemctl restart apache2
vi /etc/carbon/storage-schemas.conf
[icinga2_metadata]
pattern = ^icinga2\..*\.metadata\.
retentions = 5m:10d,30m:90d,360m:4y
[icinga2_perfdata]
pattern = ^icinga2\..*\.perfdata\.
retentions = 5m:10d,30m:90d,360m:4y
chown -R _graphite:_graphite /var/log/graphite/
systemctl stop apache2.service
systemctl start apache2.service
icinga2 feature enable graphite
vi /etc/icinga2/features-enabled/graphite.conf
enable_send_thresholds = true
systemctl restart icinga2.service
cd /usr/share/icingaweb2/modules/
git clone https://github.com/Icinga/icingaweb2-module-graphite.git graphite
### In case you use the ICMP command for host check, add this template and an "obscured graphite check command variable"
vi /etc/icingaweb2/modules/graphite/templates/icmp-host.ini
[icmp-rt.graph]
check_command = "icmp-host"
[icmp-rt.metrics_filters]
rtmin.value = "$host_name_template$.perfdata.rtmin.value"
rta.value = "$host_name_template$.perfdata.rta.value"
rtmax.value = "$host_name_template$.perfdata.rtmax.value"
[icmp-rt.urlparams]
areaAlpha = "0.5"
areaMode = "all"
lineWidth = "2"
min = "0"
yUnitSystem = "none"
[icmp-rt.functions]
rtmin.value = "alias(color(scale($metric$, 1000), '#44bb77'), 'Min. round trip time (ms)')"
rta.value = "alias(color(scale($metric$, 1000), '#ffaa44'), 'Avg. round trip time (ms)')"
rtmax.value = "alias(color(scale($metric$, 1000), '#ff5566'), 'Max. round trip time (ms)')"
[icmp-pl.graph]
check_command = "icmp-host"
[icmp-pl.metrics_filters]
pl.value = "$host_name_template$.perfdata.pl.value"
[icmp-pl.urlparams]
areaAlpha = "0.5"
areaMode = "all"
lineWidth = "2"
min = "0"
yUnitSystem = "none"
[icmp-pl.functions]
pl.value = "alias(color($metric$, '#1a7dd7'), 'Packet loss (%)')"
######
Change the character set of the IDO resource to "latin1"!
######
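To check that Carbon is actually accepting data after the steps above, a quick smoke test can help. This is only a sketch: it assumes carbon-cache listens for the plaintext protocol on its default port 2003 on localhost, and `test.graphite.smoke` is a made-up metric name, not something Icinga 2 writes.

```python
import socket


def format_metric(path, value, timestamp):
    """Build one line of Carbon's plaintext protocol: '<path> <value> <timestamp>'."""
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(line, host="127.0.0.1", port=2003, timeout=5.0):
    """Send one plaintext line to Carbon.

    Assumption: host and port are the defaults; change them if your
    carbon.conf says otherwise.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(line.encode("ascii"))
```

Calling `send_metric(format_metric("test.graphite.smoke", 1, time.time()))` (with `import time`) should make the metric show up in the Graphite web tree on port 8000 shortly afterwards. If it does not, the carbon log under /var/log/graphite/ would be the first place I would look.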
You will need to modify your graphite.conf file to match your system's mod_wsgi configuration, which you can print by running the command:
mod_wsgi-express module-config
I know this is a bit late to the game, because it looks like you already found a workaround, but I wanted to pass along what I found. I have received a lot of help from others here, so I wanted to give something back.
If you have a different working method, please share it with us. You’re welcome to take as long as you need; the goal here is to present a working solution.
These gaps mostly happen because of a retention mismatch in Graphite’s storage-schemas.conf. Graphite expects one metric point per time period, and that period is defined by the retention in the storage schema. If your metric points do not match the defined retention, Graphite stores null by default for every slot in which no point arrived (the “too late/early” points).
Small Example:
If you tell Graphite to expect an Icinga 2 metric point every minute, your schema will look similar to this:
retentions = 1m:2d,5m:10d,30m:90d,360m:4y
But if your check is scheduled every five minutes, each five-minute window contains four null values and one perfdata value. These null values are your gaps. So you have to change the retention policy to this:
retentions = 5m:10d,30m:90d,360m:4y
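The effect is easy to see with a small simulation. This is a simplified model, not how whisper is implemented: it just places one placeholder value per check into fixed-size storage slots and leaves the rest as `None`. The function name and the 15-minute window are chosen only for illustration.

```python
def fill_slots(step_s, check_interval_s, window_s):
    """Return one entry per storage slot: a value if a check result landed in it, else None."""
    slots = [None] * (window_s // step_s)
    for t in range(0, window_s, check_interval_s):
        slots[t // step_s] = 1.0  # placeholder perfdata value
    return slots


# 15 minutes of data from a check that runs every 5 minutes
one_minute = fill_slots(step_s=60, check_interval_s=300, window_s=900)
five_minute = fill_slots(step_s=300, check_interval_s=300, window_s=900)

print(one_minute)   # 15 slots, only 3 of them filled
print(five_minute)  # 3 slots, all filled
```

With a 1m step, 12 of the 15 slots stay empty, and those are the gaps in the graph. With a 5m step every slot gets a value.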
If I change 1m to 5m, does that solve the gap issue because it would be getting a metric every 5 minutes instead of 1 minute?
How does the retention break down?
Does the above mean: get a metric every 1 minute and keep it for 2 days, get a metric every 5 minutes and keep it for 10 days, get a metric every 30 minutes and keep it for 90 days, and get a metric every 360 minutes and keep it for 4 years?
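Roughly yes, with one nuance: each `step:duration` pair is a storage resolution rather than a polling rate. Graphite stores one point per step and keeps that archive for the given duration, and the coarser archives are filled by aggregating the finer ones. The sketch below breaks a retention string into seconds per point and points per archive. The retention string is my reconstruction from the wording of the question, and the helper names are made up for illustration.

```python
# Seconds per unit; a year is counted as 365 days here.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def to_seconds(token):
    """Convert a token such as '5m' or '10d' to seconds ('m' means minutes)."""
    return int(token[:-1]) * UNIT_SECONDS[token[-1]]


def breakdown(retentions):
    """Return (seconds_per_point, seconds_kept, number_of_points) per archive."""
    result = []
    for archive in retentions.split(","):
        step, keep = archive.strip().split(":")
        step_s, keep_s = to_seconds(step), to_seconds(keep)
        result.append((step_s, keep_s, keep_s // step_s))
    return result


for step_s, keep_s, points in breakdown("1m:2d,5m:10d,30m:90d,360m:4y"):
    print(f"one point every {step_s}s, kept for {keep_s // 86400} days, {points} points")
```

So `360m` is one point every 6 hours, kept for 4 years, which comes to 5840 points in that archive.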
Most of my service checks are 300 second intervals. I’m trying to understand and determine the best default setting to use in order to review short term data and long term data for trending analysis etc…
Don’t go tweaking metric retention for this; it’s totally unnecessary. The default graph view in Icinga’s graphite module leaves a gap wherever a value is null, which is more or less visible depending on the timeframe you have it showing. It’s helpful when you need it to be, and Graphite averages things over time anyway. I’m not at a computer atm, but I know there are other settings for how it handles null values. Your output is also a moving target; retry intervals are 30s by default. I use similar retentions to yours.
For aesthetics, look at some of the inis in the templates folder for the graphite module in icingaweb.
I’ve got a 1m retention, but that’s because I do have a lot of checks that run every 1-2 minutes, as well as the retries in critical scenarios to see if it recovers quickly or not.
Actually, let’s talk about storage-aggregation.conf for a second.
xFilesFactor is a confusing setting, from the docs:
"xFilesFactor should be a floating point number between 0 and 1, and specifies what fraction of the previous retention level’s slots must have non-null values in order to aggregate to a non-null value. The default is 0.5."
I just tell that thing I don’t care how empty it looks, just average my stuff.
[default_average]
pattern = .*
xFilesFactor = 0
aggregationMethod = average
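To make the xFilesFactor rule from the docs a bit more concrete, here is a small model of it. It is a sketch of the rule as quoted above, not whisper's actual code, and the function name and sample value are made up. It takes the slots of the finer archive that fall into one coarser slot and decides whether they roll up to a value or to null.

```python
def aggregate(slots, x_files_factor, method="average"):
    """Roll up finer-resolution slots into one coarser value, or None.

    Returns None when the fraction of non-null slots is below x_files_factor,
    or when there is nothing to aggregate at all.
    """
    known = [v for v in slots if v is not None]
    if not known:
        return None
    if len(known) / len(slots) < x_files_factor:
        return None
    if method == "average":
        return sum(known) / len(known)
    raise ValueError(f"unsupported method in this sketch: {method}")


# A 5-minute check stored in 1m slots: one value, four nulls.
slots = [0.42, None, None, None, None]

print(aggregate(slots, 0.5))  # only 20% of slots are known, so this rolls up to None
print(aggregate(slots, 0))    # any known value is enough, so this averages to 0.42
```

That is the situation with 5-minute checks sitting in a 1m archive: with the default 0.5 the coarser archives would come out empty, while 0 just averages whatever is there.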