
If there’s one thing that characterizes the Information Age that we find ourselves in today, it is streams of data. However, without proper ways to aggregate and transform this data into information, it’ll either vanish into the ether or become binary blobs gathering virtual dust on a storage device somewhere. Dealing with these streams of data is thus essential, whether it’s in business (e.g. stock markets), IT (e.g. services status), weather forecasting, or simply keeping tracking of the climate and status of devices inside a domicile.
The first step of aggregating data seems simple, but rather than just writing it to a storage device until it runs out of space like a poorly managed system log, the goal here isn’t merely to record, but also to make it searchable. After all, for information transformation we need to be able to efficiently search and annotate this data, which requires keeping track of context and using data structures that lend themselves to this.
For such data aggregation and subsequent visualization of information on flashy dashboards that people like to flaunt, there are a few mainstream options, with among ‘smart home’ users options like InfluxDB and Grafana often popping up, but these are far from the only options, and depending on the environment there are much more relevant solutions.
Don’t Call It Data Hoarding
Although the pretty graphs and other visualizations get most of the attention, the hard part comes with managing the incoming data streams and making sure that the potentially gigabytes of data that come in every day (or more, if you work at CERN), are filed away in a way that makes retrieval as easy as possible. At its core this means some kind of database system, where the data can be transformed into information by stuffing it into the appropriate table cells or whatever equivalent is used.
For things like sensor data where the data format tends to be rather simple (timestamp and value), a time series database (TSD) can be an efficient option as the full feature set of e.g. a full-fat SQL database like MySQL/MariaDB or PostgreSQL is unneeded. There are also a lot of open source options out there, making TSD users spoiled for choice. For example:
- InfluxDB – Partially open source, with version 3 being less of a successor and more of its own ‘edge data collector’ thing. Somewhat controversial due to the company’s strong commercial focus.
- Apache Kudu – Column-based database optimized for multidimensional OLAP workloads. Part of the Apache Hadoop distributed computing ecosystem.
- Prometheus – Developed at SoundCloud to support metrics monitoring. Also written in Go like InfluxDB v1 and v2.
- RRDTool – An all-in-one package that provides a circular buffer TSD that also does graphing and has a number of bindings for various programming languages.
- Graphite – Similar to RRDTool, but uses a Django web-based application to render graphs.
- TimescaleDB – Extends PostgreSQL and thus supports all typical SQL queries like any other relational database. The extensions focus on TSD functionality and related optimizations.
The internal implementations of these databases differ, with InfluxDB’s storage engine splitting the data up in so-called shards, which can be non-compacted ‘hot’ shards, or compacted ‘cold’ shards. The main purpose of this is to reduce the disk space required, with four compaction levels (including delta compression) used while still retaining easy access to specific time series using a time series index. The shard retention time can be optionally set within the database (‘bucket’) to automatically delete older shards.
A circular buffer as used by RRDTool dodges much of this storage problem by simply limiting how much data can be stored. If you do not care about historical data, or are happy to have another application do this long-term storage, then such a more simple TSD can be a lightweight alternative.
Pretty Graphs

While some of the TSDs come with their own graphing system, others rely on third-party solutions. The purpose of this graphing step is to take the raw data in the TSD and put them into a graph, a table or some other kind of visualization. When multiple of such visualizations are displayed concurrently and continuously, it’s called a ‘dashboard’, which is what software like Grafana allows you to create.
As an example of such a system, there is the Building Management and Control (BMaC) project that I created a few years ago. In addition to being able to control things like the airconditioning, the data from multiple sensors constantly get written into an InfluxDB bucket, which in the office test environment included such essentials like the number of cups of regular coffee and espresso consumed at the Jura coffee makers with their TOP-tronics brains, since this could be read out of their Flash memory.
With this visualization dashboard it’s easy to keep track of room temperature, air quality (CO2) and when to refill the beans in the coffee machines. Transforming raw data into such a dashboard is of course just one way to interpret raw data, with generating one-off graphs for e.g. inclusion in reports being another one. Which type of transformation is the right one thus depends on your needs.
In a more dynamic environment like system monitoring, you would likely prefer something like Nagios. This features clients that run on the systems being monitored and submit status and event reports, with a heavy focus on detecting problems within e.g. a server farm as soon as possible.
Complications
Everyone who has ever done anything with software knows that the glossy marketing flyers omit a lot of the factual reality. So too with TSDs and data visualization software. During the years of using Grafana and InfluxDB mostly in the context of the BMaC project, one of the most annoying things was the installation, which for Grafana means either downloading a package or using their special repository. Meanwhile for InfluxDB you will use their special repository no matter what, while on Windows you get the raw binaries and get to set things up by hand from there.
Another annoyance with InfluxDB comes in the form of its lack of MQTT support, with only its HTTP line protocol and its SQL-dialect available as ways to insert new time series data. For BMaC I had to write a special MQTT-to-HTTP bridge to perform the translation here. Having a TSD that directly supports the data protocol and format would be a real bonus, if it is available for your use case.
Overall, running a TSD with a dashboard can be very shiny, but it can be a serious time commitment to set up and maintain. For dashboards you’re also basically limited to Grafana with all its quirks, as the project it was forked from (Kibana) only supports ElasticSearch as data source, while Grafana supports multiple TSDs and even plain SQL databases like MariaDB and PostgreSQL.
It’s also possible to create a (free) online account with Grafana to gain access to a Prometheus TSD and Grafana dashboard, but this comes with the usual privacy concerns and the need to be online 24/7. Ultimately the key is to have a clear idea beforehand of what the problem is that you’re trying to solve with a TSD and a graphing solution or dashboard.
This articles is written by : Nermeen Nabil Khear Abdelmalak
All rights reserved to : USAGOLDMIES . www.usagoldmines.com
You can Enjoy surfing our website categories and read more content in many fields you may like .
Why USAGoldMines ?
USAGoldMines is a comprehensive website offering the latest in financial, crypto, and technical news. With specialized sections for each category, it provides readers with up-to-date market insights, investment trends, and technological advancements, making it a valuable resource for investors and enthusiasts in the fast-paced financial world.