Specification | Carbon DB

Problem Description

We have more and more devices that are able to measure the environment around us and the resources we use in our daily lives. Those devices often manage data differently, which makes it difficult to combine data from multiple devices in order to understand our use of resources. Once the data is stored, access to it needs to support the typical usage patterns that are useful in dealing with environmental and energy data. It should also be possible to specify access rights and privacy rules for data access.

Data Storage

Carbon DB is at its core an implementation of a timeseries database that is able to deal with different types of readings:

Discrete readings: Data sampled from a sensor, such as a temperature reading. Each data point is independent of the data points before and after it, and downsampling is done by calculating the average over time.
Cumulative readings: Data that represents a cumulative measure, such as energy or resource usage. Data points are tied to a period of time and are usually sampled at regular intervals; downsampling is done by calculating the sum over time.
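The difference between the two reading types comes down to the aggregation applied when downsampling. The following is a minimal sketch, not the Carbon DB implementation: it assumes readings are (timestamp, value) pairs with timezone-aware datetimes, buckets aligned on the Unix epoch in UTC, and hypothetical names (`ReadingType`, `downsample`). The discrete case uses a plain arithmetic mean, which equals a time-weighted average only when samples are evenly spaced.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from enum import Enum

# Buckets are aligned on the Unix epoch in UTC (an assumption of this sketch).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


class ReadingType(Enum):
    DISCRETE = "discrete"      # e.g. temperature: averaged when downsampling
    CUMULATIVE = "cumulative"  # e.g. energy usage: summed when downsampling


def downsample(points, period, reading_type):
    """Aggregate (timestamp, value) pairs into consecutive buckets of `period`.

    Timestamps must be timezone-aware datetimes.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        index = (ts - EPOCH) // period  # timedelta // timedelta -> int
        buckets[EPOCH + index * period].append(value)

    result = []
    for start in sorted(buckets):
        values = buckets[start]
        if reading_type is ReadingType.DISCRETE:
            # Plain mean: matches a time-weighted average for evenly spaced samples.
            result.append((start, sum(values) / len(values)))
        else:
            # Cumulative readings: each value covers its own interval, so add them.
            result.append((start, sum(values)))
    return result
```

For example, eight 15-minute temperature samples of 20.0 to 27.0 downsample to hourly averages of 21.5 and 25.5, while eight 15-minute energy readings of 0.5 kWh each become two hourly totals of 2.0 kWh.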

Usage Patterns

At the individual timeseries level, Carbon DB will support the following operations:

Time slices: Filters a set of records based on a given time period.
Down-sampling: Aggregates data to a lower frequency than what is stored in the database.
Time zones: Applies time zone information to provide readings in a local time zone, including when down-sampling to a period of a day or longer that spans a day on which the clocks change (for example, for daylight saving time).
Derivation: Calculates a derived timeseries by applying a conversion factor to a timeseries or combining several timeseries together. This includes conversion from one unit to another.
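The time slice, time zone and derivation operations can be illustrated on plain (timestamp, value) lists. This is a sketch under stated assumptions, not Carbon DB's API: the function names are hypothetical, time slices are taken as half-open intervals, and `daily_totals` only handles cumulative readings. Any `tzinfo` works; a `zoneinfo.ZoneInfo` instance supplies real daylight saving rules, and a factor passed to `derive` could be a unit conversion or a purely illustrative emission factor.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone, tzinfo


def time_slice(points, start, end):
    """Time slice: keep (timestamp, value) pairs with start <= timestamp < end."""
    return [(ts, v) for ts, v in points if start <= ts < end]


def daily_totals(points, tz: tzinfo):
    """Down-sample cumulative readings to local calendar days in time zone `tz`.

    Grouping on the local date, rather than on fixed 24-hour UTC buckets, lets a
    day that contains a clock change cover 23 or 25 hours when `tz` observes
    daylight saving time (e.g. a zoneinfo.ZoneInfo instance).
    """
    totals = defaultdict(float)
    for ts, value in points:
        totals[ts.astimezone(tz).date()] += value
    return dict(sorted(totals.items()))


def derive(points, factor):
    """Derivation: scale a series by a conversion factor (unit or emission factor)."""
    return [(ts, v * factor) for ts, v in points]


def combine(*series):
    """Derivation: add series point by point, keeping only shared timestamps."""
    maps = [dict(s) for s in series]
    shared = set.intersection(*(set(m) for m in maps))
    return [(ts, sum(m[ts] for m in maps)) for ts in sorted(shared)]
```

For example, a reading at 23:30 UTC on 1 March is counted on 2 March when `tz` is a fixed UTC+01:00 offset; with `zoneinfo.ZoneInfo("Europe/London")`, the local day of the spring clock change collects only 23 hours of readings.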

Energy and carbon data is often retrieved for multiple metrics at a time for the same time slice and sampling frequency. Carbon DB will support this usage pattern through the following operations:

Group queries: Manipulates a group of timeseries or meters that share specific characteristics.
Streaming: Streams result sets back in order to handle large collections without running out of resources on the client or server side.
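Group queries and streaming fit together naturally: select the series that share some characteristics, then return their points lazily in bounded batches. The sketch below assumes hypothetical tag metadata on each series, integer Unix timestamps and points already sorted per series; `heapq.merge` consumes its inputs lazily, so only one batch is materialised at a time.

```python
import heapq
from dataclasses import dataclass, field
from itertools import islice
from typing import Dict, Iterator, List, Tuple


@dataclass
class Series:
    name: str
    tags: Dict[str, str]  # hypothetical metadata, e.g. {"site": "A"}
    points: List[Tuple[int, float]] = field(default_factory=list)  # sorted (unix_ts, value)


def select_group(catalogue, **criteria):
    """Group query: every series whose tags match all key=value criteria."""
    return [s for s in catalogue
            if all(s.tags.get(k) == v for k, v in criteria.items())]


def _rows(series):
    # A helper function (not a nested generator expression) binds `series`
    # per call, so each row is labelled with the right series name.
    return ((ts, series.name, value) for ts, value in series.points)


def stream_rows(group, batch_size=1000) -> Iterator[List[Tuple[int, str, float]]]:
    """Streaming: lazily merge the group's points by timestamp, one batch at a time."""
    merged = heapq.merge(*(_rows(s) for s in group))
    while True:
        batch = list(islice(merged, batch_size))
        if not batch:
            return
        yield batch
```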

Data Security and Privacy

Carbon DB will make use of OAuth2, OpenID Connect and related protocols to control access to data, over multiple dimensions:

Timeseries: Which timeseries or groups of timeseries access is granted to.
Time: Over which time period access is granted. For example, if the occupancy of a building changes, the former occupier may have access to the data up to the change date and the present occupier from that same date onwards.
Granularity: The minimum sampling period for which access is granted. As sensor and energy data can sometimes expose behaviour patterns, the occupier of a building may be granted access at the finest granularity, while other users may be limited to a granularity coarse enough to blur those patterns.
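The three access dimensions can be checked together once a token has been resolved into a grant. This sketch does not model OAuth2 or OpenID Connect themselves; the `Grant` structure and `authorise` function are hypothetical. Time windows are treated as half-open so that a former and a present occupier can share the change date without overlapping, and out-of-window requests are clipped rather than rejected, which is a design choice of the sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Grant:
    """Hypothetical grant combining the three access dimensions."""
    series_ids: FrozenSet[str]    # which timeseries
    valid_from: datetime          # start of the time window (inclusive)
    valid_to: Optional[datetime]  # end of the time window (exclusive); None = open-ended
    min_period: timedelta         # finest sampling period allowed


def authorise(grant, series_id, start, end, period):
    """Return the (start, end) slice the caller may read, or raise PermissionError."""
    if series_id not in grant.series_ids:
        raise PermissionError(f"no access to {series_id}")
    if period < grant.min_period:
        raise PermissionError(f"period {period} is finer than allowed {grant.min_period}")
    # Clip the requested slice to the granted time window.
    clipped_start = max(start, grant.valid_from)
    clipped_end = end if grant.valid_to is None else min(end, grant.valid_to)
    if clipped_start >= clipped_end:
        raise PermissionError("requested period is outside the granted window")
    return clipped_start, clipped_end
```

For example, a former occupier whose grant ends on the change date and who asks for data running past that date receives a slice ending at the change date, while a user limited to daily granularity who asks for hourly data is refused.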

Components

Carbon DB will be formed of several components that each support a set of API endpoints.

Carbon DB Components

Access Control

All requests to Carbon DB are filtered through access control. Permissions are managed externally.

Timeseries

This component will store individual timeseries data and related metadata. It will be responsible for handling time slices, down-sampling and time zone conversion. The timeseries endpoints are designed to update and read data from a single timeseries.

Meters

Meters combine multiple related timeseries that measure different aspects of the same metering point. For example, an electricity meter may provide current, voltage and energy consumption. Meters will be responsible for managing groups of timeseries, including explicitly stored, virtual and derived timeseries. Meter endpoints are designed to read data from multiple timeseries, and even multiple meters, in a single request, returning a coherent data slice with all series aligned on timestamps.
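Aligning a meter's series on timestamps is essentially an outer join, after which derived columns can be computed row by row. The sketch below is illustrative only: `align` and `add_derived` are hypothetical names, missing values are represented as `None`, and the derived example (apparent power in volt-amperes as current times voltage for a single-phase supply) is chosen just to show the mechanism.

```python
from typing import Dict, List, Tuple

Points = List[Tuple[int, float]]  # (unix_ts, value)


def align(series: Dict[str, Points]):
    """Outer-join several series on timestamp; missing values become None."""
    names = sorted(series)
    lookup = {n: dict(series[n]) for n in names}
    timestamps = sorted(set().union(*(d.keys() for d in lookup.values())))
    rows = [(ts, tuple(lookup[n].get(ts) for n in names)) for ts in timestamps]
    return names, rows


def add_derived(names, rows, new_name, fn, inputs):
    """Append a derived column computed from existing columns.

    The derived value is None whenever any of its inputs is missing.
    """
    idx = [names.index(i) for i in inputs]
    out = []
    for ts, values in rows:
        args = [values[i] for i in idx]
        derived = None if any(a is None for a in args) else fn(*args)
        out.append((ts, values + (derived,)))
    return names + [new_name], out
```

Propagating `None` rather than interpolating keeps gaps visible to the caller, which matters when a feed has failed for part of the requested slice.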

Feeds

Each meter gets its underlying data from a data feed. Feeds will be responsible for converting data from an external format and updating the corresponding timeseries. The feed endpoints are designed to allow updates of a large number of meters at the same time. Feeds will also be responsible for annotating the relevant meters with incident information should any of them fail to provide data.
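A feed run can be thought of as two outputs: per-meter updates converted from the external format, and incident annotations for meters that were expected but sent nothing. The sketch assumes a hypothetical CSV export with `meter_id`, `timestamp` (ISO 8601 with an explicit offset) and `value` columns; the real external formats and incident model are not defined by this specification.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def ingest_feed(raw_csv, expected_meters):
    """Convert a hypothetical CSV export into per-meter updates plus incident notes.

    Assumed columns: meter_id, timestamp (ISO 8601), value.
    """
    updates = defaultdict(list)
    for row in csv.DictReader(io.StringIO(raw_csv)):
        ts = datetime.fromisoformat(row["timestamp"])
        updates[row["meter_id"]].append((ts, float(row["value"])))
    # Annotate every expected meter that did not appear in this feed run.
    incidents = {
        meter: "no data received in this feed run"
        for meter in expected_meters
        if meter not in updates
    }
    return dict(updates), incidents
```

Returning updates grouped by meter lets the caller write each meter's timeseries in one batch, which suits the goal of updating a large number of meters at the same time.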