Specification

Problem Description

A growing number of devices can measure the environment around us and the resources we use in our daily lives. These devices often store and format data differently, which makes it difficult to combine data from several of them to understand our use of resources. Once the data is stored, access to it needs to support the usage patterns typical of environmental and energy data. It should also be possible to specify access rights and privacy rules for that data.

Data Storage

At its core, Carbon DB is a timeseries database that handles two types of readings:

Discrete readings
Data sampled from a sensor, such as a temperature reading. Each data point is independent of the points before and after it; downsampling is done by calculating the average over time.
Cumulative readings
Data that represents a cumulative measure, such as energy or resource usage. Each data point is tied to a period of time, and points are usually sampled at regular intervals; downsampling is done by calculating the sum over time.
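The difference between the two reading types can be sketched as follows. This is a minimal illustration, not Carbon DB's implementation: the `ReadingType` enum and `downsample` function are hypothetical names, timestamps are assumed to be naive UTC `datetime` values, and buckets are assumed to be aligned on multiples of the period since the Unix epoch.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from enum import Enum


class ReadingType(Enum):
    # Hypothetical enum: discrete readings average, cumulative readings sum.
    DISCRETE = "discrete"      # e.g. temperature
    CUMULATIVE = "cumulative"  # e.g. energy used per interval


def downsample(points, period, reading_type):
    """Aggregate (timestamp, value) pairs into buckets of length `period`.

    Assumes naive UTC timestamps; buckets start on multiples of `period`
    counted from the Unix epoch.
    """
    epoch = datetime(1970, 1, 1)
    buckets = defaultdict(list)
    for ts, value in points:
        index = (ts - epoch) // period  # timedelta // timedelta -> int
        buckets[epoch + index * period].append(value)

    result = []
    for start in sorted(buckets):
        values = buckets[start]
        if reading_type is ReadingType.DISCRETE:
            # Independent samples: the mean represents the whole bucket.
            result.append((start, sum(values) / len(values)))
        else:
            # Each point covers its own interval: totals add up.
            result.append((start, sum(values)))
    return result
```

For half-hourly temperatures of 20, 21, 22 and 23 °C, hourly downsampling gives 20.5 and 22.5 °C; for half-hourly energy readings of 500, 700, 400 and 600 Wh, it gives 1200 and 1000 Wh.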

Usage Patterns

At the individual timeseries level, Carbon DB will support the following operations:

Time slices
Filters a set of records based on a given time period.
Down-sampling
Aggregates data to a lower frequency than what is stored in the database.
Time zones
Applies time zone information to provide readings in local time, including when down-sampling to periods of a day or longer that span a daylight saving time change, when a local day lasts 23 or 25 hours.
Derivation
Calculates a derived timeseries by applying a conversion factor to a timeseries or by combining several timeseries. This includes converting from one unit to another.
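Three of these operations can be sketched as below, assuming timezone-aware timestamps stored in UTC and Python's standard `zoneinfo` module for time zone rules. The function names are hypothetical. Grouping on the local calendar date is what handles the daylight saving case: in `Europe/London`, 31 March 2024 lasts only 23 hours, so the daily total of hourly 1 kWh readings for that day is 23 kWh rather than 24.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def time_slice(points, start, end):
    """Keep (timestamp, value) pairs with start <= timestamp < end."""
    return [(ts, v) for ts, v in points if start <= ts < end]


def daily_totals(points, tz):
    """Sum cumulative readings per local calendar day in time zone `tz`.

    Timestamps must be timezone-aware. Grouping on the local date means a
    daylight saving change day naturally covers 23 or 25 hours.
    """
    totals = defaultdict(float)
    for ts, value in points:
        totals[ts.astimezone(tz).date()] += value
    return dict(sorted(totals.items()))


def derive(points, factor):
    """Derive a new series by multiplying every value by a conversion factor."""
    return [(ts, value * factor) for ts, value in points]
```

As a usage example, `derive(points, 0.2)` would convert kWh readings to kgCO2e under an assumed emission factor of 0.2 kgCO2e/kWh; real factors vary by grid and period.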

Energy and carbon data is often retrieved for multiple metrics at a time for the same time slice and sampling frequency. Carbon DB will support this usage pattern through the following operations:

Group queries
Manipulates a group of timeseries or meters that share specific characteristics.
Streaming
Streams result sets back to the client so that large collections can be handled without running out of resources on either the client or the server side.
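A minimal sketch of both operations, assuming timeseries carry a flat dictionary of metadata tags: `select_group` picks every series whose tags match the given criteria, and `stream_in_batches` uses a generator to hand rows back in bounded chunks so that neither side has to hold the full result. Both names and the tag scheme are hypothetical; the actual transport used for streaming is not specified here.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple


def select_group(metadata: Dict[str, Dict[str, str]], **criteria: str) -> List[str]:
    """Return the ids of timeseries whose metadata matches every criterion."""
    return sorted(
        series_id
        for series_id, tags in metadata.items()
        if all(tags.get(key) == value for key, value in criteria.items())
    )


def stream_in_batches(rows: Iterable[Tuple], batch_size: int) -> Iterator[List[Tuple]]:
    """Yield rows in lists of at most `batch_size`, consuming the input lazily."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

Because `stream_in_batches` only pulls `batch_size` rows at a time from its input, it can sit on top of a database cursor or another generator without materialising the whole result set.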

Data Security and Privacy

Carbon DB will make use of OAuth2, OpenID Connect and related protocols to control access to data, over multiple dimensions:

Timeseries
Which timeseries or groups of timeseries access is granted to.
Time
The time period over which access is granted. For example, if the occupancy of a building changes, the former occupier may have access to the data up to the change date, and the new occupier from that same date onwards.
Granularity
The minimum sampling period at which access is granted. Because sensor and energy data can expose behaviour patterns, the occupier of a building may be granted access at the finest granularity, while other users may be limited to a granularity coarse enough to blur those patterns.
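The three dimensions can be combined into a single check, sketched below. How a grant would be carried in OAuth2 or OpenID Connect tokens is not specified here, so the `Grant` record and `authorise` function are hypothetical. Clamping the requested window to the granted one, rather than rejecting the whole request, is an assumed design choice; it lets a former occupier query across the change date and simply receive data up to it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Grant:
    """Hypothetical access grant covering timeseries, time and granularity."""
    series_ids: FrozenSet[str]
    start: datetime
    end: Optional[datetime]  # None means the grant is open-ended
    min_period: timedelta    # finest sampling period the holder may request


def authorise(grant: Grant, series_id: str, start: datetime, end: datetime,
              period: timedelta) -> Optional[Tuple[datetime, datetime]]:
    """Return the request window clamped to the grant, or None if denied."""
    if series_id not in grant.series_ids:
        return None  # timeseries dimension
    if period < grant.min_period:
        return None  # granularity dimension: requested samples are too fine
    clamped_start = max(start, grant.start)  # time dimension
    clamped_end = end if grant.end is None else min(end, grant.end)
    if clamped_start >= clamped_end:
        return None
    return clamped_start, clamped_end
```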

Components

Carbon DB will be formed of several components that each support a set of API endpoints.

Figure: Carbon DB Components (Access Control, Feeds, Meters, Timeseries)

Access Control

All requests to Carbon DB are filtered through access control. Permissions are managed externally.

Figure 1. Access Control

Timeseries

This component will store individual timeseries data and related metadata. It will be responsible for handling time slices, down-sampling and time zone conversion. The timeseries endpoints are designed to update and read data from a single timeseries.

Figure 2. Timeseries

Meters

Meters combine multiple related timeseries that measure different aspects of the same metering point. For example, an electricity meter may provide current, voltage and energy consumption. Meters will be responsible for managing groups of timeseries, including explicitly stored timeseries as well as virtual and derived ones. Meter endpoints are designed to read data from multiple timeseries, and even multiple meters, in a single request, returning a coherent data slice with all series aligned on timestamps.
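Timestamp alignment and a derived series can be sketched as follows, using the electricity meter example. The `align` and `add_product` functions are hypothetical, timestamps are assumed to be integer epoch seconds, and a series with no point at a given timestamp contributes `None`. The derived column is apparent power (voltage × current, in volt-amperes); real power would additionally need the power factor.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Row = Tuple[int, List[Optional[float]]]


def align(series: Dict[str, Sequence[Tuple[int, float]]]) -> Tuple[List[str], List[Row]]:
    """Join several (timestamp, value) series into rows keyed on timestamp."""
    names = sorted(series)
    lookup = {name: dict(points) for name, points in series.items()}
    timestamps = sorted({ts for points in series.values() for ts, _ in points})
    rows = [(ts, [lookup[name].get(ts) for name in names]) for ts in timestamps]
    return names, rows


def add_product(names: List[str], rows: List[Row], a: str, b: str,
                new_name: str) -> Tuple[List[str], List[Row]]:
    """Append a derived column equal to column `a` times column `b`.

    The derived value is None wherever either input is missing.
    """
    i, j = names.index(a), names.index(b)
    new_rows = []
    for ts, values in rows:
        x, y = values[i], values[j]
        product = x * y if x is not None and y is not None else None
        new_rows.append((ts, values + [product]))
    return names + [new_name], new_rows
```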

Figure 3. Meters

Feeds

Each meter gets its underlying data from a data feed. Feeds will be responsible for converting data from an external format and updating the corresponding timeseries. The feed endpoints are designed to allow updates of a large number of meters at the same time. Feeds will also be responsible for annotating the relevant meters with incident information should any of them fail to provide data.
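A feed can be sketched as a function that converts one batch of external data into per-meter updates and flags expected meters that sent nothing. The CSV layout (`meter_id,timestamp,value` with ISO 8601 timestamps), the `ingest_feed` name and the incident message are all hypothetical; a real feed would also need to validate units and reject malformed rows.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def ingest_feed(csv_text, expected_meters):
    """Convert a hypothetical CSV feed batch into timeseries updates.

    Returns (updates, incidents): updates maps meter_id to a list of
    (datetime, float) points; incidents maps each expected meter that
    provided no data in this batch to an annotation string.
    """
    updates = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        ts = datetime.fromisoformat(row["timestamp"])
        updates[row["meter_id"]].append((ts, float(row["value"])))
    incidents = {
        meter_id: "no data received in feed batch"
        for meter_id in sorted(expected_meters)
        if meter_id not in updates
    }
    return dict(updates), incidents
```

Processing a whole batch in one call matches the intent that a single feed update touches many meters at once, and it gives the feed a natural point at which to detect missing meters.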

Figure 4. Feeds