Tracking Electricity Data with InfluxDB

Why Electricity Data Matters ⚡

Beyond energy-saving mechanisms alone, progress in carbon-aware software is necessary to reduce carbon emissions and mitigate climate change.

Carbon-aware software takes the grid carbon intensity (GCI) of an energy grid into account.

Because GCI varies over time and differs between regions, optimizing for energy efficiency does not necessarily yield the most carbon-efficient solution [1]. Consequently, carbon-aware software has to consider the regional GCI to reduce Scope 2 emissions.
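
To make this concrete: Scope 2 (operational) emissions are simply consumed energy multiplied by GCI, so the same workload can emit very different amounts depending on where and when it runs. A minimal sketch (the zone names and GCI values below are purely illustrative, not real data):

```python
def scope2_emissions_g(energy_kwh: float, gci_g_per_kwh: float) -> float:
    """Operational (Scope 2) emissions in gCO2eq: consumed energy times GCI."""
    return energy_kwh * gci_g_per_kwh


# Hypothetical lifecycle GCI values in gCO2eq/kWh, for illustration only.
gci_by_zone = {"low-carbon-zone": 30.0, "fossil-heavy-zone": 700.0}

job_energy_kwh = 10.0  # the same job, using the same energy, in both zones
for zone, gci in gci_by_zone.items():
    print(f"{zone}: {scope2_emissions_g(job_energy_kwh, gci):,.0f} gCO2eq")
```

Same energy use, over 20x difference in emissions; this is the gap carbon-aware scheduling tries to exploit.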

I used this in my master's thesis to build and simulate Squirrel, a carbon-aware job scheduler for Slurm. 🐿️

Getting Started 🌱

To get started, we'll set up a local InfluxDB using Docker Compose.

  1. Create a docker-compose.yaml file.
  2. In this file, insert the following:
    version: '3'
    
    services:
      influxdb:
        image: influxdb:2
        container_name: influxdb
        hostname: influxdb
        volumes:
          - type: volume
            source: influxdb2-data
            target: /var/lib/influxdb2
          - type: volume
            source: influxdb2-config
            target: /etc/influxdb2
        ports:
          - "8086:8086"
        networks:
          - naturegrid
    
    volumes:
      influxdb2-data:
      influxdb2-config:
    
    networks:
      naturegrid:
        driver: bridge
  3. Execute docker compose up -d
  4. In your browser, go to localhost:8086

You are asked to set up your InfluxDB instance: name your user, choose a password, name the organization, and create a bucket. You can choose any values here.

InfluxDB Setup Screenshot

Persist historical grid carbon intensity data 📜

We will use InfluxDB to store lifecycle grid carbon intensity (GCI) data from Electricity Maps.

Data portal downloads

Electricity Maps also provides hourly data from previous years for free. Their data portal contains data for all available energy zones.

Once you have downloaded the data you need, you can ingest it into InfluxDB, e.g. with Python (adjust names and paths as needed):

"""Ingest data from Electricity Maps Data Portal CSV."""
from pathlib import Path

import pandas as pd
from influxdb_client import InfluxDBClient


def _transform_datestr(datestr: str):
    x_dt = datetime.fromisoformat(datestr)  # Date is in UTC
    return x_dt.isoformat(timespec="microseconds")


path_to_csv = Path("path/to/downloaded.csv")
emaps_df = pd.read_csv(path_to_csv)
relevant_data = emaps_df[
    ["Zone Id", "Carbon Intensity gCO₂eq/kWh (LCA)", "Datetime (UTC)"]
]
df_mapped = relevant_data.rename(
    columns={
        "Zone Id": "zone",
        "Carbon Intensity gCO₂eq/kWh (LCA)": "gci",
        "Datetime (UTC)": "time",
    }
)
influx_opt = {
    "bucket": "bucket name, e.g. squirrel",
    "measurement": "measurement name, e.g. electricity_maps",
    "field": "field name, e.g. carbonIntensity",
    "tags": {"zone": "zone of data, e.g. DE", "emissionFactorType": "lifecycle"},
}
zone_ids = df_mapped["zone"].unique()
assert len(zone_ids) == 1
influx_opt.get("tags").update({"zone": zone_ids[0]})
df_mapped.loc[:, "time"] = df_mapped["time"].apply(_transform_datestr)
client = InfluxDBClient(...) # Setup client with token etc.
with client.write_api() as writer:
    for _, row in data.iterrows():
        record = {
            "measurement": options["measurement"],
            "fields": {options["field"]: row[value_column]},
            "tags": options["tags"],
            "time": row[time_column],
        }
        writer.write(bucket=options["bucket"], record=record)
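
Once the historical data is in the bucket, you can read it back to sanity-check the import. The helper below just assembles a Flux query string matching the measurement, field, and tag names used above (the bucket name is whatever you chose during setup); pass it to the Python client's query API:

```python
def build_gci_query(bucket: str, zone: str, start: str = "-30d",
                    measurement: str = "electricity_maps",
                    field: str = "carbonIntensity") -> str:
    """Build a Flux query for the GCI series written by the ingestion script."""
    return (
        f'from(bucket: "{bucket}")\n'
        f'  |> range(start: {start})\n'
        f'  |> filter(fn: (r) => r._measurement == "{measurement}")\n'
        f'  |> filter(fn: (r) => r._field == "{field}")\n'
        f'  |> filter(fn: (r) => r.zone == "{zone}")'
    )


query = build_gci_query(bucket="squirrel", zone="DE")
# With a running instance and configured client:
# df = client.query_api().query_data_frame(query)
```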

The last 24 hours

To get current data, we will use Telegraf to fetch hourly lifecycle grid carbon intensity data from Electricity Maps and store it in InfluxDB. Feel free to check this official blog post for running InfluxDB and Telegraf using Docker.

Please set up a free API token for Electricity Maps.

You can set up multiple Telegraf instances for multiple energy zones. We use the following docker-compose.yaml (replace the placeholder values accordingly):

version: '3'

services:
  influxdb:
    image: influxdb:2
    container_name: influxdb
    hostname: influxdb
    volumes:
      - type: volume
        source: influxdb2-data
        target: /var/lib/influxdb2
      - type: volume
        source: influxdb2-config
        target: /etc/influxdb2
    ports:
      - "8086:8086"
    networks:
      - naturegrid
  telegraf:
    image: telegraf:latest
    container_name: telegraf
    depends_on:
      - influxdb
    volumes:
      # Mount for telegraf config
      - ./telegraf.conf:/etc/telegraf/telegraf.conf:ro
    environment:
      - INFLUX_URL=http://influxdb:8086
      - INFLUX_TOKEN=influx-api-token
      - INFLUX_ORG=your-org
      - INFLUX_BUCKET=your-bucket
      - EMAPS_TOKEN=electricity-maps-api-token
      - EMAPS_URL=https://api.electricitymap.org/v3/carbon-intensity/history?zone=DE
    networks:
      - naturegrid

volumes:
  influxdb2-data:
  influxdb2-config:

networks:
  naturegrid:
    driver: bridge

Here is a template for the Telegraf configuration file telegraf.conf. Telegraf queries the Electricity Maps API at the agent's collection interval, which is set to 10 seconds in this template; that is convenient for testing, but since the API serves hourly values you may want a longer interval in production. The results are written into your bucket as "electricity_maps" measurements.

# Configuration for telegraf agent
[agent]
  ## Default data collection interval for all inputs
  interval = "10s"
  ## Rounds collection interval to 'interval'
  ## ie, if interval="10s" then always collect on :00, :10, :20, etc.
  round_interval = true

  ## Telegraf will send metrics to outputs in batches of at most
  ## metric_batch_size metrics.
  ## This controls the size of writes that Telegraf sends to output plugins.
  metric_batch_size = 100

  ## Maximum number of unwritten metrics per output.  Increasing this value
  ## allows for longer periods of output downtime without dropping metrics at the
  ## cost of higher maximum memory usage.
  metric_buffer_limit = 100

  ## Collection jitter is used to jitter the collection by a random amount.
  ## Each plugin will sleep for a random time within jitter before collecting.
  ## This can be used to avoid many plugins querying things like sysfs at the
  ## same time, which can have a measurable effect on the system.
  collection_jitter = "2s"

  ## Default flushing interval for all outputs. Maximum flush_interval will be
  ## flush_interval + flush_jitter
  flush_interval = "30s"
  ## Jitter the flush interval by a random amount. This is primarily to avoid
  ## large write spikes for users running a large number of telegraf instances.
  ## ie, a jitter of 5s and interval 10s means flushes will happen every 10-15s
  flush_jitter = "1s"

  ## By default or when set to "0s", precision will be set to the same
  ## timestamp order as the collection interval, with the maximum being 1s.
  ##   ie, when interval = "10s", precision will be "1s"
  ##       when interval = "250ms", precision will be "1ms"
  ## Precision will NOT be used for service inputs. It is up to each individual
  ## service input to set the timestamp at the appropriate precision.
  ## Valid time units are "ns", "us" (or "µs"), "ms", "s".
  precision = "1s"

  ## Log at debug level.
  # debug = false
  ## Log only error level messages.
  quiet = false

  ## Name of the file to be logged to when using the "file" logtarget.  If set to
  ## the empty string then logs are written to stderr.
  logfile = ""

  ## The logfile will be rotated after the time interval specified.  When set
  ## to 0 no time based rotation is performed.  Logs are rotated only when
  ## written to, if there is no log activity rotation may be delayed.
  # logfile_rotation_interval = "0d"

  ## The logfile will be rotated when it becomes larger than the specified
  ## size.  When set to 0 no size based rotation is performed.
  # logfile_rotation_max_size = "0MB"

  ## Maximum number of rotated archives to keep, any older logs are deleted.
  ## If set to -1, no archives are removed.
  # logfile_rotation_max_archives = 5

  ## Pick a timezone to use when logging or type 'local' for local time.
  ## Example: America/Chicago
  # log_with_timezone = ""

  ## Override default hostname, if empty use os.Hostname()
  hostname = "telegraf"
  ## If set to true, do not set the "host" tag in the telegraf agent.
  omit_hostname = false

[[outputs.influxdb_v2]]
  ## The URLs of the InfluxDB cluster nodes.
  ##
  ## Multiple URLs can be specified for a single cluster, only ONE of the
  ## urls will be written to each interval.
  ##   ex: urls = ["https://us-west-2-1.aws.cloud2.influxdata.com"]
  urls = ["${INFLUX_URL}"]

  ## Token for authentication.
  token = "${INFLUX_TOKEN}"

  ## Organization is the name of the organization you wish to write to; must exist.
  organization = "${INFLUX_ORG}"

  ## Destination bucket to write into.
  bucket = "${INFLUX_BUCKET}"

  ## The value of this tag will be used to determine the bucket.  If this
  ## tag is not set the 'bucket' option is used as the default.
  # bucket_tag = ""

  ## If true, the bucket tag will not be added to the metric.
  # exclude_bucket_tag = false

  ## Timeout for HTTP messages.
  # timeout = "5s"

  ## Additional HTTP headers
  # http_headers = {"X-Special-Header" = "Special-Value"}

  ## HTTP Proxy override, if unset values the standard proxy environment
  ## variables are consulted to determine which proxy, if any, should be used.
  # http_proxy = "http://corporate.proxy:3128"

  ## HTTP User-Agent
  # user_agent = "telegraf"

  ## Content-Encoding for write request body, can be set to "gzip" to
  ## compress body or "identity" to apply no encoding.
  # content_encoding = "gzip"

  ## Enable or disable uint support for writing uints influxdb 2.0.
  # influx_uint_support = false

  ## Optional TLS Config for use on HTTP connections.
  # tls_ca = "/etc/telegraf/ca.pem"
  # tls_cert = "/etc/telegraf/cert.pem"
  # tls_key = "/etc/telegraf/key.pem"
  ## Use TLS but skip chain & host verification
  # insecure_skip_verify = false

# Read formatted metrics from one or more HTTP endpoints
[[inputs.http]]
  ## One or more URLs from which to read formatted metrics
  urls = ["${EMAPS_URL}"]
  headers = {"auth-token" = "${EMAPS_TOKEN}"}
  data_format = "json"
  name_override = "electricity_maps"
  tagexclude = ["url", "host"]
  json_query = "history"
  json_name_key = "carbonIntensity"
  tag_keys = ["zone", "emissionFactorType"]
  json_time_key = "datetime"
  json_time_format = "2006-01-02T15:04:05Z07:00"
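
If you want to check what Telegraf's JSON parser will make of the API response without running the full stack, the following sketch mimics the parsing options above (json_query, tag_keys, json_time_key, name_override). The sample payload is illustrative, shaped after the fields the configuration extracts; it is not a captured API response:

```python
# Illustrative payload: same fields the telegraf.conf above extracts.
sample = {
    "history": [
        {"zone": "DE", "datetime": "2024-01-01T00:00:00.000Z",
         "carbonIntensity": 412, "emissionFactorType": "lifecycle"},
        {"zone": "DE", "datetime": "2024-01-01T01:00:00.000Z",
         "carbonIntensity": 398, "emissionFactorType": "lifecycle"},
    ],
}


def to_records(payload: dict) -> list[dict]:
    # json_query = "history": iterate over that array; tag_keys and
    # json_time_key mirror the options in telegraf.conf above.
    return [
        {
            "measurement": "electricity_maps",  # name_override
            "fields": {"carbonIntensity": entry["carbonIntensity"]},
            "tags": {"zone": entry["zone"],
                     "emissionFactorType": entry["emissionFactorType"]},
            "time": entry["datetime"],
        }
        for entry in payload["history"]
    ]
```

Each resulting record has the same shape as the ones written manually in the historical-data script, so both ingestion paths land in a consistent schema.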

Next steps 🔮

Congratulations, you now have a working InfluxDB setup that collects data from Electricity Maps. You can use the time-series data for forecasting, analysis, simulations, or just your own curiosity. Just make sure to leave the containers running so data keeps being collected 😉
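
As a small taste of the forecasting use case: GCI typically has a strong daily pattern, so even a naive seasonal baseline (predict each hour with the value from 24 hours earlier) is a reasonable starting point before reaching for real models. A sketch on a plain list of hourly values (the synthetic history below is illustrative):

```python
def seasonal_naive_forecast(hourly_values: list[float], horizon: int = 24,
                            season: int = 24) -> list[float]:
    """Forecast the next `horizon` hours by repeating the last full season."""
    if len(hourly_values) < season:
        raise ValueError("need at least one full season of history")
    last_season = hourly_values[-season:]
    return [last_season[h % season] for h in range(horizon)]


# Synthetic history: 3 days of hourly GCI with a higher daytime plateau.
history = [300.0 + 50.0 * (8 <= h % 24 <= 18) for h in range(72)]
forecast = seasonal_naive_forecast(history)
```

Such a baseline is also a useful yardstick: if a fancier model cannot beat it, the extra complexity is not paying off.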

References

  [1] Walid A. Hanafy et al. "The War of the Efficiencies: Understanding the Tension between Carbon and Energy Optimization". In: Proceedings of the 2nd Workshop on Sustainable Computer Systems (HotCarbon '23). New York, NY, USA: Association for Computing Machinery, Aug. 2023, pp. 1-7. ISBN: 9798400702426. DOI: 10.1145/3604930.3605709. https://dl.acm.org/doi/10.1145/3604930.3605709