A distro agnostic guide to Graphite with venv (examples: RHEL8, Debian10, Ubuntu18)

With a lot of outdated documents on how to get Graphite working properly, I figured it would be good to provide something up to date and relevant for the latest batch of enterprise Linux operating systems, but also using virtual environments to make it usable on anything with recent versions of Python.

This article is targeted at the new standardization of python3.6, and we’re using py-venv without the (quite awesome) Pipenv wrapper, because in the case of carbon/graphite, it’ll actually waste more time.

For this demonstration, we will be using the following:

  • Whisper backend
  • 1 Carbon relay
  • 2 Carbon cache daemons
  • Graphite-web
  • Gunicorn
  • Nginx
  • Icinga2

If you’re completely new to Graphite, it’s like this: Carbon is the actual time series database, Graphite is a web portal that sits in front of it. Whisper is a storage file format in which Carbon houses metrics. There are other options for a Carbon backend, but this is the one that’s part of the project.

I’m choosing to do this with a carbon-relay to get your feet slightly damp with Carbon clustering. A carbon-relay just routes incoming data to various cache daemons which handle the actual work. You can add as many carbon-cache daemons as you like, and it’s not unusual to use one per server core on dedicated Graphite servers. You may wish to explore more advanced setups with aggregation in the future if you find yourself housing 32TB of graph data.

This was tested to completion on a minimal installation of RHEL8. Debian 10 and Ubuntu 18.04LTS were tested in containers up through the installation process. Debian 10 was tested June 8th, 2019, this was after the code freeze in March, so it should stay relevant.

Additionally, firewall management is not described here.


Getting distro dependencies

Although the instructions here will be applicable to anything with systemd and python3.6+ going forward, you’ll need to have the necessary things for the toolset to build. If you’re on something esoteric, I assume you can handle this step.

RHEL8

Example from minimum install:

# dnf install python36 python36-devel gcc gcc-c++ make libffi-devel

Debian 10 and Ubuntu 18.04

Debian provides Python 3.7 and Ubuntu provides Python 3.6. Both are fine for this demonstration.

# apt update
# apt install python3 python3-dev python3-pip python3-venv gcc g++ make libffi-dev libcairo2-dev

Installing Graphite

Let’s install our virtual environment.

# cd /opt
# python3 -m venv graphite

And an unprivileged user to run this.

# useradd -d /opt/graphite -s /bin/bash -M -r carbon
# chown -R carbon:carbon /opt/graphite

Now let’s get into our environment. Using activate as your bash source will set up the necessary environment variables for your contained instance of python.

# su - carbon
$ source bin/activate

We’re going to use pip to install everything. Graphite/Carbon have some set expectations on where everything should be installed, so we’ll be specifying that explicitly.

(graphite) $ pip install --upgrade pip wheel setuptools gunicorn whisper
(graphite) $ pip install carbon --install-option="--prefix=/opt/graphite" --install-option="--install-lib=/opt/graphite/lib"
(graphite) $ pip install graphite-web --install-option="--prefix=/opt/graphite" --install-option="--install-lib=/opt/graphite/webapp"

Configuring Carbon

carbon.conf

$ cd /opt/graphite/conf
$ cp carbon.conf.example carbon.conf
$ yourfaveditor carbon.conf

It’s good to read through this entire config file. You’ll learn a lot about how Carbon works. We need to cover a few pitfalls though that relate to a clustered setup.

Let’s make sure we’re using the carbon user here:

USER = carbon

We are going to REMOVE the following lines. This is because these are aimed at a single carbon-cache daemon running and not multiple. This will be repurposed elsewhere.

LINE_RECEIVER_PORT = 2003
PICKLE_RECEIVER_PORT = 2004
UDP_RECEIVER_PORT = 2003
CACHE_QUERY_PORT = 7002

This is on by default, but I want to point out a rookie mistake I made: do not put this under [relay]. If you do, the cache on the relay will back up while the carbon-cache daemons start dropping points. This results in your carbon-relay brickwalling at 100% CPU and metrics being lost.

USE_FLOW_CONTROL = True

Safety first! Change this to true.

WHISPER_LOCK_WRITES = True

There is a URL here for Graphite so that it and Carbon can make friends. Later on when we prepare the local_settings.py for Django, you might add a uri to the end. In this example, I have used /graphite for my server, and the default Gunicorn port of 8000 is mentioned.

GRAPHITE_URL = http://127.0.0.1:8000/graphite

Okay, so there’s our default carbon-cache config. Next is to create specific configs for each carbon-cache daemon. Try this:

[cache:a]
LINE_RECEIVER_PORT = 2003
PICKLE_RECEIVER_PORT = 2004
CACHE_QUERY_PORT = 7002

[cache:b]
LINE_RECEIVER_PORT = 2103
PICKLE_RECEIVER_PORT = 2104
CACHE_QUERY_PORT = 7102

This is why we removed those lines earlier.

Now it’s time to configure our relay. You may have noted the LINE and PICKLE protocols mentioned earlier. These are two ways to ship data to Carbon. Icinga uses LINE, which is plain text and less efficient. In this case, our relay is going to receive on LINE, but route to the cache daemons using PICKLE.

The “consistent-hashing” value below tells Carbon to equally distribute responsibility across the cache daemons. This is for a simple Graphite setup where we just have one database and need more resources for writing to it. For sharded or redundant Graphite servers across multiple nodes, there is plenty of that in Graphite’s official documentation, and if you’re reading this, you are so not ready.

Let’s setup our relay section at the end like this. You can adjust to your liking as you learn more about Carbon.

[relay]
LINE_RECEIVER_INTERFACE = 0.0.0.0 # you can use 127.0.0.1 if Icinga is on the same server
LINE_RECEIVER_PORT = 2013
RELAY_METHOD = consistent-hashing
REPLICATION_FACTOR = 1
DYNAMIC_ROUTER = True
DESTINATIONS = 127.0.0.1:2004:a,127.0.0.1:2104:b
DESTINATION_PROTOCOL = pickle
DESTINATION_TRANSPORT = none 
USER = carbon
QUEUE_LOW_WATERMARK_PCT = 0.8
TIME_TO_DEFER_SENDING = 0.0001

If you suddenly have carbon-relay hitting 100% CPU: Earlier we talked about having USE_FLOW_CONTROL on for all daemons. Still, it may outgrow you. I have my carbon-relay set to a MAX_QUEUE_SIZE of 5,000,000. Plan to increase these values as needed. If you’re running out of resources, you simply have too many metrics for your hardware to handle. With 50,000 service checks, I’m running 4 cache daemons, 1 carbon-relay, and the IDO database in postgresql on the same 6 core server. It’s at this exact moment using 2GB of memory out of 16 I’ve allotted it. You should be okay unless your infrastructure is huge.

MAX_QUEUE_SIZE = 100000
MAX_DATAPOINTS_PER_MESSAGE = 5000

If you want to correct me on how to handle this, please do so in the comments.

Anyway, onto our schemas.

storage-schemas.conf

PAY ATTENTION FOR A MINUTE

Something that you’ll find a million forum hits for if you google it is “I’m only seeing x days of graph data!” When setting up the following config file, FIRST MATCH HITS. I made this mistake when I was new, too. Don’t feel bad.

It’s time to setup our storage aggregation for Icinga. Copy the following example file to get started.

$ cp storage-schemas.conf.example storage-schemas.conf
$ emacs storage-schemas.conf

Let’s add our regex patterns for Icinga time series. The format for retentions should be relatively obvious.

[icinga-hosts]
pattern = ^icinga2\..*\.host\.
retentions = 5m:4w

[icinga-services]
pattern = ^icinga2\..*\.services\.
retentions = 5m:12w,15m:26w,30m:1y

[carbon]
pattern = ^carbon\.
retentions = 60:90d

[default]
pattern = .*
retentions = 60s:1d,5m:30d,1h:3y

So here I’ve said all host checks (typically just pings) should house 5 minute time intervals for 4 weeks. Services should hold 5 minute intervals for up to 12 weeks, 15 minutes for up to 26 weeks, and 30 minutes for up to a year. If you have a very large infrastructure, expect a terabyte to disappear as soon as you turn this on. If you left WHISPER_FALLOCATE_CREATE set to true, it’s going to claim all the space it needs within the hour.

storage-aggregation.conf

This config file is to specify how we want it to shove our data backward. Here’s an example from mine:

[default_average]
pattern = .*
xFilesFactor = 0
aggregationMethod = average

Let’s break this down, bottom to top:

aggregationMethod can be average, sum, min, max, or last. In my case, I have it averaging out the data points as they move from 5m to 15m.

xFilesFactor describes what to do with null values. This is a floating point number from 0 to 1. The default is 0.5, so if one time period has less than 50% of the time points defined, that becomes null as it moves backward. This often isn’t ideal. In my case, I have a lot of 1m intervals set for checks that only run every 5 minutes. I like this because when it starts doing the retries in soft state, I get a more detailed view of what’s going wrong. If I used 0.8, my graph would just disappear the second it aggregates. By setting it to 0, I’m saying that I don’t care how few of the time points are used, if there are any, do something with it. 0.1 is another good option.

pattern is a regex just like in storage-schemas.conf. If you have some checks you’d rather take a different approach to xFilesFactor or aggregationMethod for, go ahead and create more blocks above this one.


Configuring Graphite

Graphite is a Django app. As such, there are standard ways of configuring your database and authentication methods. For the sake of the tutorial, we are going to blindly accept the defaults, and I’ll defer you to resources for moving out of staging after you have a working setup.

/opt/graphite/webapp/graphite/local_settings.py

Let’s dig into our Django config. We’ll cover a few settings worth overriding the defaults at this stage.

Make up some garbage for your secret key for hashing. The longer the better.

SECRET_KEY = 'imalittleteapotshortandstout'

In our example carbon.conf earlier, we specified this uri. This will be addressed in our nginx.conf as well. We’re leaving off that trailing / though.

URL_PREFIX = '/graphite'

This by default isn’t expecting multiple cache daemons. add as many as you created.

CARBONLINK_HOSTS = ["127.0.0.1:7002:a", "127.0.0.1:7102:b"]
CARBONLINK_TIMEOUT = 1.0
CARBONLINK_RETRY_DELAY = 15 # Seconds to blacklist a failed remote server
# The replication factor to use with consistent hashing.
# This should usually match the value configured in Carbon.
REPLICATION_FACTOR = 1

This should be enough to get you started here. Definitely adjust to your needs once Graphite is working. Let’s get our config for gunicorn ready.

(graphite) $ cp /opt/graphite/conf/graphite.wsgi.example /opt/graphite/webapp/graphite.wsgi

You shouldn’t even need to mess with this. Let’s move on with our lives. One last step is to run Django’s database migration feature.

(graphite) $ PYTHONPATH=/opt/graphite/webapp django-admin.py migrate --settings=graphite.settings --run-syncdb

If you have selinux enabled, you’ll need to do this as root for later:

# setsebool -P httpd_can_network_connect 1

Systemd configuration

If you’re one of the last hold outs on systemd, I’m going to assume your skills at writing init scripts are wicked good and you don’t need my help.

From your virtualenv, type env on the command line to get details about the environment. We’ll use this in our units.

Enter /etc/systemd/system as root and let’s create these files.

carbon-cache@.service

[Unit]
Description=carbon-cache instance %i (graphite)

[Service]
Environment=PATH=/opt/graphite/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin
Environment=VIRTUAL_ENV=/opt/graphite
User=carbon
Group=carbon
ExecStartPre=/bin/rm -f /opt/graphite/storage/carbon-cache-%i.pid
ExecStart=/opt/graphite/bin/carbon-cache.py --instance=%i start --pidfile=/opt/graphite/storage/carbon-cache-%i.pid
Type=forking
PIDFile=/opt/graphite/storage/carbon-cache-%i.pid
LimitNOFILE=128000

[Install]
WantedBy=multi-user.target

If you’re new to systemd units, the @ method is for passing an instance id to a service so that you do not need redundant init options in cases such as this.

carbon-relay.service

[Unit]
Description=Graphite Carbon Relay
After=network.target

[Service]
Environment=/opt/graphite/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin
Environment=VIRTUAL_ENV=/opt/graphite
Type=forking
StandardOutput=syslog
StandardError=syslog
ExecStart=/opt/graphite/bin/carbon-relay.py --config=/opt/graphite/conf/carbon.conf --pidfile=/var/run/carbon-relay.pid start
ExecReload=/bin/kill -USR1 $MAINPID
PIDFile=/var/run/carbon-relay.pid
Restart=always

[Install]
WantedBy=multi-user.target

graphite.service

[Unit]
Description = Graphite

[Service]
Environment=PATH=/opt/graphite/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin
Environment=PYTHONPATH=/opt/graphite/webapp:/opt/graphite
Environment=VIRTUAL_ENV=/opt/graphite
User = carbon
Group = carbon
WorkingDirectory = /opt/graphite/webapp
PIDFile = /var/run/graphite/graphite.pid
ExecStart = /opt/graphite/bin/gunicorn --bind 0.0.0.0:8000 graphite.wsgi
ExecReload = /bin/kill -s HUP $MAINPID
ExecStop = /bin/kill -s TERM $MAINPID

[Install]
WantedBy = multi-user.target

Okay, we’re good to start enabling things.

# systemctl daemon-reload
# systemctl enable carbon-cache@a carbon-cache@b carbon-relay graphite
# systemctl start carbon-cache@a carbon-cache@b carbon-relay graphite

If you made more than 2 carbon cache daemons, include them as well.


Configuring Icinga

If you’re reading this document for another platform, skip to their manual at this point. Carbon is likely already making metrics for itself, but we want to feed it something useful.

We’re not going to be pointing Icinga at Graphite, but rather at Carbon. That line connection we made on the carbon-relay is going to disburse this to the daemons.

Let’s get things ready to go. On the server running Icinga:

# icinga2 feature enable graphite
# emacs /etc/icinga2/features-enabled/graphite.conf

We’re going to add an entry like so. Replace the address either with your graphite server or with 127.0.0.1 if Icinga is running on the same server.

object GraphiteWriter "graphite-demo" {
  host = "192.168.122.184"
  port = 2013
}

Make sure to use the line port from your carbon relay, not a pickle port and not one of the cache daemons. Icinga feeds over line, but the relay will use pickle to communicate to the cache daemons.

# icinga2 daemon -C
# systemctl reload icinga2

Now let’s verify that our whisper files are generating correctly. Navigate into /opt/graphite/storage/whisper/icinga2 and drill down until you find a .wsp file for a given metric. We can use the whisper-info.py script to view the file’s metadata.

# /opt/graphite/bin/whisper-info.py ./value.wsp

aggregationMethod: average
maxRetention: 31536000
xFilesFactor: 0.0
fileSize: 710260

Archive 0
offset: 52
secondsPerPoint: 300
points: 24192
retention: 7257600
size: 290304

Archive 1
offset: 290356
secondsPerPoint: 900
points: 17472
retention: 15724800
size: 209664

Archive 2
offset: 500020
secondsPerPoint: 1800
points: 17520
retention: 31536000
size: 210240

Here we have retentions of 5, 15, and 30 minutes, just like in my storage-schemas.conf, as well as an xFilesFactor of 0, like the storage-aggregation.conf shown earlier. These files have created correctly.

If you ever find yourself needing to change these values, simply changing the config and reloading carbon will not solve the problem. See whisper-resize.py for more information.

If it seems like some metrics are not populating, check to see if carbon-relay is using 100% cpu and return to that section of the tutorial.


Configuring Icingaweb

You don’t actually need nginx even setup at this point. Icinga will get what it needs directly from Gunicorn.

# cd /usr/share/icingaweb2/modules
# git clone https://github.com/Icinga/icingaweb2-module-graphite.git
# mv icingaweb2-module-graphite graphite

In icingaweb2, navigate to configuration > modules > graphite and click the reload icon to enable.

Enter the URL and the credentials if necessary (we didn’t go over auth methods in this guide). In this case, we’re pointing it directly to the IP address and port 8000, which is what Gunicorn is serving on.

de9cc1f2

You should now see graphs in Icinga, like this one showing what happens when I run portupgrade on my webserver:

1f4b30e0

Nginx configuration

Well, you probably want to actually be able to use Graphite in its original intended capacity. Or not. Some people install the API-only version and use Grafana. You should look into Grafana, it’s pretty awesome.

RHEL8:

# dnf install nginx

Ubuntu/Debian:

# apt install nginx

You’re going to want to add the following to any server {} block that is relevant. This isn’t meant to be a tutorial on nginx, so if it thus far isn’t something you’re using, just shove this in the default server block in /etc/nginx/nginx.conf

        location /graphite/ {
           proxy_pass http://127.0.0.1:8000/graphite/;
           allow 10.5.0.0/16; # example
           deny all; # example
        }

        location /graphite/static/ {
          alias /opt/graphite/webapp/content/;
        }

A few things to adjust accordingly, set “proxy_pass” to the gunicorn url, and make sure to have the trailing / as this is how Django “views” are standardized. The allow/deny settings I provided are an example and not required. In this example, I’m saying allow this particular IP subnet at my company, and deny everyone else.

Now go to http://yourdomain/graphite/ and make sure you can view all elements.


Moving out of your test environment

So we really didn’t bother to secure anything here. I didn’t even give you advice on how to manage your firewall (because I don’t know if and what one you’re using). Depending on what in your network is exposed where, you may have some holes to close before you start running Graphite in production. Here are your jumping off points:

Authentication

These settings are Django specific and defined in local_settings.py. Graphite’s own documentation doesn’t have much to say. Check here:
Settings | Django documentation | Django

Database backend

Same goes for using a database other than sqlite.
Databases | Django documentation | Django

Encryption between Carbon and Icinga

Note the following from the example config:

# This defines the wire transport, either none or ssl.
# If SSL is used any TCP connection will be upgraded to TLS1.  The system's
# trust authority will be used unless DESTINATION_SSL_CA is specified in
# which case an alternative certificate authority chain will be used for
# verifying the remote certificate.
# To use SSL you'll need the cryptography, service_identity, and twisted >= 14
# DESTINATION_TRANSPORT = none
# DESTINATION_SSL_CA=/path/to/private-ca.crt

Using SSL certificates with Gunicorn

Just add these flags to your gunicorn command:
--certfile /path/to/cert.pem --keyfile /path/to/keyfile.key

Using SSL certificates with Nginx

From Nginx documentation: Configuring HTTPS servers

7 Likes

This is my first how-to here. I want to write more of these. Feel free to hit me in the comments with any corrections, concerns, or requests for more thorough explanations in certain topics.

Also I’ll try to find time to figure out how to make it not treat my root examples as comments because ugh. I tested my markdown in Boostnote.

3 Likes

6 posts were split to a new topic: Wrong Graphite Web URL

Thanks for the guide.

I recently made the experience that Graphite API is a great replacement for Graphite Web, as it removed all the Web UI and Django requirements.

Resources:

https://graphite-api.readthedocs.io/en/latest/

Unfortunately no activity since 2017

Hello Blake!
Great tutorial, thank you!
But I dare to point to a little mistake:

This seems to be a reference to SELinux, not Systemd, doesn’t it?

Typo discovery ftw. Will fix. Thanks.

Hi Blake,

Firstly, thanks for the guide, I used this to setup graphite with my icinga2.

Just thought I’d comment, I’m new to linux found the guide pretty tricky in some parts where steps were left out or not explained thoroughly (understandable as the target audience was probably people familiar with gnu/linux)

e.g. advising to create .conf from examples sometimes but not others
entering and existing virtual environment
editing carbon.conf was a bit confusing

and for anyone else’s benefit, i had a error when running PYTHONPATH=/opt/graphite/webapp django-admin.py migrate --settings=graphite.settings --run-syncdb

turned out I needed to run sudo apt-get install libcairo2-dev first on my ubuntu 18.04.

but overall very thankful for the guide, great work Blake.

libcairo? Weird. I’ll spin up a VM at some point this weekend and update accordingly if I catch anything else missing.

I’ll look into the the flow a bit. Thanks for the feedback. I can’t make carbon.conf not confusing though, that thing is a monstrosity lol

Just went through your how to, so thanks at first! Very helpful and it works.

At first I tried to omit the “multiple” relay config and leave those settings on default, but that didn’t work.
So I just configured everything as in the how to, but removed the second relay.
Got it working then.

This was a bit confusing at first, as it goes into the same config file carbon.conf, but you mention carbon-cache config. So I spent some minutes looking for some carbon-cache.conf :smiley:

Also when trying to start the daemons I got an error:

Jul 11 10:57:11 blub carbon-relay.py[44875]: In carbon.conf, DESTINATION_PROTOCOL must be one of pickle, line. Invalid value: 'pickle
Jul 11 10:57:11 blub carbon-relay.py[44875]: DESTINATION_TRANSPORT = none'

Problem was a single space in front of DESTIONATION_PORT = none.
Maybe add some hint to not having space in front of the key-value pairs (for dummies like me ;))

This wasn’t needed for me on my Ubuntu 18.04 machine

Yea, I’m not sure.

I was just getting

File “/opt/graphite/webapp/cairocffi/init.py”, line 36, in dlopen
raise OSError(“dlopen() failed to load a library: %s” % ’ / '.join(names))
OSError: dlopen() failed to load a library: cairo / cairo-2 / cairo-gobject-2 / cairo.so.2

After trying to run PYTHONPATH=/opt/graphite/webapp django-admin.py migrate --settings=graphite.settings --run-syncdb

but after I did apt-get install libcairo2-dev it worked.

So again not sure, very new to linux.

Likely the cairo devel package pulls in the actual library required by graphite web. Cairo in this regard is used to render the graph images. Could be a problem with pip dependencies.

@blakehartshorn maybe it is a good idea to add the check for dependencies into the installation details, like described here?

Chances are I just messed this up when consolidating my notes for Debian and Ubuntu as they’re pretty redundant. I’ll throw that in and if I find time this weekend do a dry run of all this on an Ubuntu VM.

Okay, this also affected Buster :man_facepalming:

Thanks for pointing out my errors. I got through install testing deb/ubuntu and finished the guide on RHEL. Since the carbon/carbonffi Python modules built without protest in pip, I figured they were fine.