I’ve used Grafana for several years. Ever since I first used it, I have wanted to sit down and write a server that would provide metrics to it through the Simple JSON datasource plugin. I’m happy to announce that I have finally gotten around to doing just that. The tool is called PyGraf and will be open sourced in the very near future!
In the meantime, I’d like to share my findings as there’s not a lot of documentation on the topic.
The Simple JSON docs tell us that our backend must support the following endpoints:
/ should return 200 ok. Used for “Test connection” on the datasource config page.
/search used by the find metric options on the query tab in panels.
/query should return metrics based on input.
/annotations should return annotations.
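The four endpoints above can be sketched with nothing more than Python's standard library. This is a minimal illustration under stated assumptions, not PyGraf's implementation: the metric names, the `handle` and `serve` helpers, the port and the exact response shapes (a list of names for /search, `[value, timestamp in ms]` pairs for /query) are assumptions based on my reading of the plugin docs.

```python
import json
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical metric names, used purely for illustration.
METRICS = ["cpu_load", "memory_used"]


def handle(path, body):
    """Map a request path and decoded JSON body to (status, payload)."""
    now_ms = int(time.time() * 1000)
    if path == "/":
        # Health check used by "Test connection".
        return 200, {"status": "ok"}
    if path == "/search":
        # Names offered in the panel's metric dropdown.
        return 200, METRICS
    if path == "/query":
        targets = [t.get("target") for t in body.get("targets", [])]
        # Assumed shape: each series holds [value, timestamp in ms] pairs.
        return 200, [
            {"target": name, "datapoints": [[1.0, now_ms]]}
            for name in targets
            if name in METRICS
        ]
    if path == "/annotations":
        return 200, [
            {
                "annotation": body.get("annotation", {}),
                "time": now_ms,
                "title": "Example event",
            }
        ]
    return 404, {"error": "unknown endpoint"}


class Handler(BaseHTTPRequestHandler):
    def _respond(self, body):
        status, payload = handle(self.path, body)
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self):
        self._respond({})

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length) if length else b""
        self._respond(json.loads(raw) if raw else {})


def serve(port=8000):
    """Block forever, serving the endpoints on the given port."""
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

Calling `serve()` starts the server. Keeping the routing in a plain `handle` function means each endpoint can be exercised without opening a socket.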
That’s great to know, but how do we go about testing them? Let’s run through that now.
As you’ve probably guessed from my Docker posts, I’m a huge fan of containerisation. Therefore, instead of installing Prometheus on a host, let’s spin it up in a container. As described on the Prometheus website, we can accomplish this by issuing a single command:
docker run -p 9090:9090 -v /tmp/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus
Let’s break down each component of this command to make sure we fully understand what it is doing:
docker run: Spin up a container
-p 9090:9090: Bind a port on our Docker host to a container port. This enables devices outside of the Docker host to reach the container on port 9090
-v /tmp/prometheus.yml:/etc/prometheus/prometheus.yml: Binds the /tmp/prometheus.yml file stored on the Docker host to /etc/prometheus/prometheus.yml inside of the container
prom/prometheus: The image that we want to use, given as <user_account>/<image_name>
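To make the breakdown concrete, here is a small sketch that assembles the same command from its parts. The `docker_run_command` helper is hypothetical and exists only for illustration; it builds the argument list and does not run Docker.

```python
import shlex


def docker_run_command(image, ports=(), volumes=()):
    """Build the argument list for `docker run` from its components."""
    args = ["docker", "run"]
    for host_port, container_port in ports:
        # -p <host port>:<container port>
        args += ["-p", f"{host_port}:{container_port}"]
    for host_path, container_path in volumes:
        # -v <path on the Docker host>:<path inside the container>
        args += ["-v", f"{host_path}:{container_path}"]
    # The image name always comes last.
    args.append(image)
    return args


command = docker_run_command(
    "prom/prometheus",
    ports=[(9090, 9090)],
    volumes=[("/tmp/prometheus.yml", "/etc/prometheus/prometheus.yml")],
)
print(shlex.join(command))
```

The resulting list could be passed to `subprocess.run(command)` on a host where Docker is installed.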
If you’ve used Grafana, or even heard of it, chances are you’ve also heard of InfluxDB and Prometheus. As I haven’t touched on the latter yet, I figured now is a good time to start. In case you haven’t heard of some, or all, of these applications, let’s start off with a quick description of what they can do for us.
Note: You might also want to have a read of the My Monitoring Journey: Cacti, Graphite, Grafana & Chronograf post too.
Grafana is a frontend web app that is used to create beautiful dashboards. It does this by retrieving metrics which are stored on backend database servers such as InfluxDB, Prometheus, MySQL, PostgreSQL and Graphite (to name just a few). It then uses these metrics to create graphs which are displayed on the aforementioned dashboards.
I first mentioned Telegraf in the My Monitoring Journey: Cacti, Graphite, Grafana & Chronograf post and then covered its installation and setup in the Installing & Setting up InfluxDB, Telegraf & Grafana post. Let’s now delve a little deeper, shall we?
The good news is that there’s a lot less to Telegraf’s configuration than there is to InfluxDB’s, so you’ll likely find this post easier to follow than the Getting to know InfluxDB article.
What is it?
Before diving into configurations, it would be best to first cover off what Telegraf actually is. To quote the Telegraf GitHub page:
Telegraf is an agent written in Go for collecting, processing, aggregating, and writing metrics.
Design goals are to have a minimal memory footprint with a plugin system so that developers in the community can easily add support for collecting metrics from well known services (like Hadoop, Postgres, or Redis) and third party APIs (like Mailchimp, AWS CloudWatch, or Google Analytics).
I’ve demonstrated a few InfluxDB commands in my Getting to know InfluxDB and InfluxDB: Retention Policies & Shard Groups posts, but thought it would be a good idea to write a post completely dedicated to useful CLI commands, so here it is.
This command is self explanatory. It lists all of your InfluxDB databases:
> SHOW DATABASES
Enters a database so that subsequent commands will be run against it:
> USE telegraf
Using database telegraf
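The same queries can also be sent from a script through InfluxDB 1.x's HTTP /query endpoint. The sketch below only builds the request URL; the `influx_query_url` helper and the localhost:8086 address are assumptions for illustration. As far as I'm aware, USE is a convenience of the CLI rather than a query, so over HTTP the database is selected with the `db` parameter instead.

```python
from urllib.parse import urlencode


def influx_query_url(query, db=None, base_url="http://localhost:8086"):
    """Build a URL for InfluxDB 1.x's /query endpoint."""
    params = {}
    if db is not None:
        # Plays the role of `USE <db>` in the CLI.
        params["db"] = db
    params["q"] = query
    return f"{base_url}/query?{urlencode(params)}"


# Equivalent of `SHOW DATABASES` in the CLI.
print(influx_query_url("SHOW DATABASES"))

# Equivalent of `USE telegraf` followed by `SHOW MEASUREMENTS`.
print(influx_query_url("SHOW MEASUREMENTS", db="telegraf"))
```

Fetching either URL, for example with `urllib.request.urlopen`, requires a running InfluxDB instance, plus credentials if authentication is enabled.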
Note: This is a continuation of the Getting to know InfluxDB post. If you haven’t read it yet, I suggest you do before reading this post.
I found InfluxDB’s documentation around Retention Policies (RP) and Shard Groups quite unclear in parts and am therefore writing this post to assist others who find themselves feeling the same way.
What is a Retention Policy?
As the documentation says:
The part of InfluxDB’s data structure that describes for how long InfluxDB keeps data (duration), how many copies of those data are stored in the cluster (replication factor), and the time range covered by shard groups (shard group duration). RPs are unique per database and along with the measurement and tag set define a series.
When you create a database, InfluxDB automatically creates a retention policy called autogen with an infinite duration, a replication factor set to one, and a shard group duration set to seven days. See Database Management for retention policy management.
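The three settings named in the quote (duration, replication factor and shard group duration) map directly onto InfluxQL's CREATE RETENTION POLICY statement. The sketch below builds such a statement as a string; the `create_retention_policy` helper and the one_year policy name are hypothetical and used only for illustration.

```python
def create_retention_policy(name, database, duration, replication=1,
                            shard_duration=None, default=False):
    """Build an InfluxQL CREATE RETENTION POLICY statement."""
    statement = (
        f'CREATE RETENTION POLICY "{name}" ON "{database}" '
        f"DURATION {duration} REPLICATION {replication}"
    )
    if shard_duration:
        # Optional: the time range covered by each shard group.
        statement += f" SHARD DURATION {shard_duration}"
    if default:
        # Optional: make this the database's default policy.
        statement += " DEFAULT"
    return statement


# Keep data for 52 weeks, one copy, in one-week shard groups.
print(create_retention_policy("one_year", "telegraf", "52w", shard_duration="1w"))
```

The printed statement can be pasted into the influx CLI. Leaving out `shard_duration` lets InfluxDB choose a shard group duration itself.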
I touched on InfluxDB in the My Monitoring Journey: Cacti, Graphite, Grafana & Chronograf post and then covered its installation and setup in the Installing & Setting up InfluxDB, Telegraf & Grafana post. Now it’s time to look at how the database actually works and commands we can use to integrate it.
In the latter post we saw that Telegraf had created a telegraf database in InfluxDB. Let’s now jump into InfluxDB and take a look at this database:
$ influx --username influx --password influx_pass
To view a list of all of the databases, issue the following show databases command:
> show databases
I mentioned these tools in the My Monitoring Journey: Cacti, Graphite, Grafana & Chronograf post and thought now would be a good time to cover their installation and setup. Let’s get started.
Installing InfluxDB & Telegraf
Instructions on how to install all of the TICK stack components can be found here. As I’m running Ubuntu, I’ll need to run these commands:
sudo dpkg -i influxdb_1.2.4_amd64.deb
sudo dpkg -i telegraf_1.3.1-1_amd64.deb
Note that after running the last command, the Telegraf service automatically starts:
will@ubuntu:/tmp$ sudo dpkg -i telegraf_1.3.1-1_amd64.deb
Selecting previously unselected package telegraf.
(Reading database ... 177117 files and directories currently installed.)
Preparing to unpack telegraf_1.3.1-1_amd64.deb ...
Unpacking telegraf (1.3.1-1) ...
Setting up telegraf (1.3.1-1) ...
Created symlink from /etc/systemd/system/multi-user.target.wants/telegraf.service to /lib/systemd/system/telegraf.service.
I remember using Cacti at my first job over a decade ago. I’ve revisited it more than a few times since, but it hasn’t been my go-to monitoring tool for quite some time for a number of reasons, such as:
- It’s not visually appealing when compared to Grafana and Chronograf.
- It’s difficult to set up.
- It’s difficult to maintain.