Getting to know InfluxDB

I touched on InfluxDB in the My Monitoring Journey: Cacti, Graphite, Grafana & Chronograf post and then covered its installation and setup in the Installing & Setting up InfluxDB, Telegraf & Grafana post. Now it’s time to look at how the database actually works and the commands we can use to interact with it.

InfluxDB Structure

In the second of those posts we saw that Telegraf had created a telegraf database in InfluxDB. Let’s now jump into InfluxDB and take a look at this database:

To view a list of all of the databases, issue the show databases command:

As the name suggests, _internal is an internal InfluxDB database, so let’s look at what’s happening in the telegraf database instead. To do this we must first execute the use telegraf command:

Now that we’re inside the telegraf database, let’s dig a little deeper using the show measurements command:

As the InfluxDB documentation explains, InfluxDB measurements can be thought of as SQL tables. As the measurement names above suggest, each one contains information which pertains to a specific entity (e.g. cpu contains CPU utilisation data).

Each record stored inside a measurement is known as a point. Points are made up of the following:

  • time: Timestamp that represents when the data was recorded.
  • fields: Contain the actual measurement data, e.g. 5% CPU utilisation. Each point must contain one or more fields.
  • tags: Metadata about the data being recorded, e.g. the hostname of the device whose CPU is being monitored. Each point can contain zero or more tags.

(Note that both the fields and tags can be thought of as columns in the database table. We’ll see why in a moment.)
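To make the anatomy of a point a little more concrete, here is a minimal Python sketch. The Point class and all of the sample values are hypothetical and made up for illustration; this is a toy model of the concepts above, not InfluxDB’s internal representation or its client API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Union

# Field values can be floats, integers, strings or booleans; tag values are strings.
FieldValue = Union[float, int, str, bool]


@dataclass(frozen=True)
class Point:
    """Toy model of an InfluxDB point (hypothetical, for illustration only)."""
    measurement: str               # e.g. "cpu"
    time: datetime                 # when the data was recorded
    fields: Dict[str, FieldValue]  # the measured data; at least one is required
    tags: Dict[str, str] = field(default_factory=dict)  # optional metadata

    def __post_init__(self) -> None:
        # Mirror the rule that each point must contain one or more fields.
        if not self.fields:
            raise ValueError("a point must contain at least one field")


# Made-up sample point; the values are not taken from a real telegraf database.
p = Point(
    measurement="cpu",
    time=datetime(2017, 1, 1, tzinfo=timezone.utc),
    fields={"usage_idle": 95.0, "usage_system": 1.5},
    tags={"cpu": "cpu0", "host": "ubuntu"},
)
print(p.measurement, sorted(p.fields), sorted(p.tags))
```

Leaving tags out is allowed in this model, whereas passing an empty fields dictionary raises an error, which matches the "one or more fields, zero or more tags" rule.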

Let’s now see what fields and tags are in the telegraf database.

Field Keys:

Tag Keys:
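At the influx prompt, both lists would typically come from the InfluxQL statements SHOW FIELD KEYS and SHOW TAG KEYS. As a rough illustration of what those statements report, the Python sketch below collects the distinct keys from a handful of made-up points. The sample data and the collect_keys helper are assumptions for illustration only; they are not output from a real telegraf database.

```python
from typing import Dict, List, Set, Tuple

# Made-up sample points: (measurement, tags, fields). Not real telegraf output.
points: List[Tuple[str, Dict[str, str], Dict[str, float]]] = [
    ("cpu", {"cpu": "cpu0", "host": "ubuntu"}, {"usage_idle": 95.0, "usage_user": 3.0}),
    ("cpu", {"cpu": "cpu1", "host": "ubuntu"}, {"usage_idle": 90.0, "usage_system": 2.0}),
    ("mem", {"host": "ubuntu"}, {"used_percent": 41.5}),
]


def collect_keys(measurement: str) -> Tuple[Set[str], Set[str]]:
    """Return (field keys, tag keys) seen on the points of one measurement."""
    field_keys: Set[str] = set()
    tag_keys: Set[str] = set()
    for name, tags, fields in points:
        if name == measurement:
            field_keys.update(fields)  # updating with a dict adds its keys
            tag_keys.update(tags)
    return field_keys, tag_keys


field_keys, tag_keys = collect_keys("cpu")
print("field keys:", sorted(field_keys))
print("tag keys:", sorted(tag_keys))
```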

InfluxDB Queries

Let’s now see how fields and tags work together. Note that for the remainder of this post we’ll focus on the cpu measurement:

With any luck this output should show you the difference between fields and tags. Given that the field columns have varying data (e.g. usage_idle and usage_system), we know that they’re storing data. On the other hand, we can see that the tag columns (e.g. cpu and host) have static information (e.g. cpu0, cpu1, ubuntu).

To really hit the point home, let’s look at an example. Let’s say we want to see only the ubuntu host’s cpu-total information. We’ll need to utilise those two tags in order to see the fields (data), like so:

Because field values change from one point to the next, they can’t be used to reliably pick out a particular host or CPU, which is why we filter on the static tags instead.
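In InfluxQL, a filter like this would typically be written as a WHERE clause on the two tags, along the lines of SELECT * FROM "cpu" WHERE "cpu" = 'cpu-total' AND "host" = 'ubuntu'. The Python sketch below mimics that filter over a few made-up rows so you can see the idea without a running database. The rows, the debian host and the where_tags helper are all hypothetical.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[str, float]]

# Made-up rows from the cpu measurement: cpu and host are tags, usage_* are fields.
rows: List[Row] = [
    {"cpu": "cpu0", "host": "ubuntu", "usage_idle": 95.0, "usage_user": 3.0},
    {"cpu": "cpu1", "host": "ubuntu", "usage_idle": 90.0, "usage_user": 6.0},
    {"cpu": "cpu-total", "host": "ubuntu", "usage_idle": 92.5, "usage_user": 4.5},
    {"cpu": "cpu-total", "host": "debian", "usage_idle": 80.0, "usage_user": 12.0},
]


def where_tags(data: List[Row], **wanted: str) -> List[Row]:
    """Keep only the rows whose tag values match every keyword argument."""
    return [r for r in data if all(r.get(k) == v for k, v in wanted.items())]


selected = where_tags(rows, cpu="cpu-total", host="ubuntu")
for row in selected:
    print(row)
```

Only the tag values are compared; the field values simply come along for the ride on whichever rows match.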

Now that you know the difference between the two, let’s filter out everything except the usage_user field:

Note that I also included the cpu and host tags. I did this because without them we wouldn’t know which CPU and host the usage_user data pertained to.

One other thing to note is that you must always specify at least one field in a query, while tags are optional. This is because fields contain data and tags are metadata. The former can therefore provide value when used in isolation, whereas the latter cannot.
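The InfluxQL documentation describes a SELECT clause that names only tags as returning an empty result rather than an error. Here is a minimal sketch of that behaviour in Python, using made-up rows and a hypothetical select helper; it models the rule, not InfluxDB’s query engine.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[str, float]]

FIELD_KEYS = {"usage_user", "usage_idle"}  # data columns (assumed for this sketch)

# Made-up rows from the cpu measurement; cpu and host are the tag columns.
rows: List[Row] = [
    {"cpu": "cpu0", "host": "ubuntu", "usage_user": 3.0, "usage_idle": 95.0},
    {"cpu": "cpu1", "host": "ubuntu", "usage_user": 6.0, "usage_idle": 90.0},
]


def select(columns: List[str], data: List[Row]) -> List[Row]:
    """Project the requested columns; with no field key requested, return nothing."""
    if not any(c in FIELD_KEYS for c in columns):
        return []  # tags on their own yield an empty result in this model
    return [{c: r[c] for c in columns} for r in data]


with_field = select(["usage_user", "cpu", "host"], rows)
tags_only = select(["cpu", "host"], rows)
print(with_field)
print(tags_only)
```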

Keys & Values

Now that you’ve got an understanding of fields and tags, it’s time to point out the difference between field keys and field values, and between tag keys and tag values.

If you were to think of the above output as a spreadsheet, the top row consists of field keys and tag keys. Every other “cell” contains a field value or a tag value.

For example, in the output above usage_user is a field key while cpu and host are tag keys. Further to this, every value in those columns is a field value or a tag value respectively.

Field Set

A collection of field key and field value pairs on a point is known as a field set.

Tag Set

A collection of tag key and tag value pairs on a point is known as a tag set.
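Both kinds of set can be pictured as small dictionaries of key and value pairs. The Python sketch below splits one made-up row into the two sets; the row, the TAG_KEYS tuple and the split_sets helper are hypothetical and exist only for illustration.

```python
from typing import Dict, Tuple, Union

Value = Union[str, float]

TAG_KEYS = ("cpu", "host")  # assumed tag keys of the cpu measurement

# One made-up row, as it might appear in query output.
row: Dict[str, Value] = {
    "cpu": "cpu-total",
    "host": "ubuntu",
    "usage_idle": 92.5,
    "usage_user": 4.5,
}


def split_sets(r: Dict[str, Value]) -> Tuple[Dict[str, Value], Dict[str, Value]]:
    """Split a row into (field set, tag set), each a dict of key-value pairs."""
    tag_set = {k: v for k, v in r.items() if k in TAG_KEYS}
    field_set = {k: v for k, v in r.items() if k not in TAG_KEYS}
    return field_set, tag_set


field_set, tag_set = split_sets(row)
print("field set:", field_set)
print("tag set:", tag_set)
```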

Series

As per the InfluxDB documentation, a Series is a “collection of data in InfluxDB’s data structure that share a measurement, tag set, and retention policy.”
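One way to picture that definition is to group points by the three things they share. In the Python sketch below, points with the same retention policy, measurement and tag set land in the same series. The sample points are made up, the series_key helper is hypothetical, and "autogen" is assumed as the retention policy name purely for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Made-up points: (retention policy, measurement, tag set, usage_idle value).
points = [
    ("autogen", "cpu", {"cpu": "cpu0", "host": "ubuntu"}, 95.0),
    ("autogen", "cpu", {"cpu": "cpu0", "host": "ubuntu"}, 94.0),
    ("autogen", "cpu", {"cpu": "cpu1", "host": "ubuntu"}, 90.0),
    ("autogen", "cpu", {"cpu": "cpu-total", "host": "ubuntu"}, 92.5),
]

SeriesKey = Tuple[str, str, Tuple[Tuple[str, str], ...]]


def series_key(rp: str, measurement: str, tags: Dict[str, str]) -> SeriesKey:
    """Build a hashable key; sorting the tags makes their order irrelevant."""
    return (rp, measurement, tuple(sorted(tags.items())))


# Group the field values by series.
series: Dict[SeriesKey, List[float]] = defaultdict(list)
for rp, measurement, tags, value in points:
    series[series_key(rp, measurement, tags)].append(value)

for key, values in series.items():
    print(key, values)
```

The first two points share every part of the key, so they end up in one series, while a different cpu tag value is enough to start a new one.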

Indexing

While on the topic of tags and fields, something you should keep at the front of your mind is that the former are indexed while the latter are not. As a result, queries that filter on tags perform much better than those that filter on fields. As described in the InfluxDB documentation, it can be worthwhile redesigning your schema to ensure that your queries run optimally.
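To get a feel for why an index matters, here is a toy Python model. It is only a sketch under simplified assumptions: InfluxDB’s real index is considerably more sophisticated, and the rows and the by_tag and by_field helpers are made up. The idea is simply that an indexed lookup jumps straight to the matching rows, while a non-indexed lookup has to inspect every row.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Made-up rows; cpu and host are tags, usage_idle is a field.
rows: List[Dict[str, object]] = [
    {"cpu": "cpu0", "host": "ubuntu", "usage_idle": 95.0},
    {"cpu": "cpu1", "host": "ubuntu", "usage_idle": 90.0},
    {"cpu": "cpu-total", "host": "ubuntu", "usage_idle": 92.5},
]
TAG_KEYS = ("cpu", "host")

# Index built once: (tag key, tag value) -> positions of the matching rows.
tag_index: Dict[Tuple[str, str], Set[int]] = defaultdict(set)
for i, row in enumerate(rows):
    for key in TAG_KEYS:
        tag_index[(key, str(row[key]))].add(i)


def by_tag(key: str, value: str) -> List[int]:
    """Tag lookup: a single dictionary access, no row is inspected."""
    return sorted(tag_index.get((key, value), set()))


def by_field(key: str, value: float) -> List[int]:
    """Field lookup: every row has to be scanned and compared."""
    return [i for i, row in enumerate(rows) if row.get(key) == value]


print(by_tag("cpu", "cpu-total"))
print(by_field("usage_idle", 92.5))
```

Both calls find the same row here, but the cost of by_field grows with the number of rows, whereas by_tag stays a single lookup.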

Retention & Shard Groups

I was going to include Retention Policies and Shard Groups in this post, but I’ve got so much to say about them that I ended up putting them in their own post.

Glossary

If you see a term you’re unfamiliar with, it’s highly likely you will find its definition in the Glossary.

Schema Design

General Recommendations, Encouraged Schema Design, Discouraged Schema Design, and Shard Group Duration Management are all covered in the Schema Design section of the InfluxDB documentation, which is well worth a read before you start using InfluxDB in a production environment.

Knowledge Base

See my Knowledge Base for more information.

As always, if you have any questions or have a topic that you would like me to discuss, please feel free to post a comment at the bottom of this blog entry, e-mail me at will@oznetnerd.com, or drop me a message on Twitter (@OzNetNerd).

Note: This website is my personal blog. The opinions expressed in this blog are my own and not those of my employer.
