A Look into Data Processing in Short-Term Power Trading
15 Nov 2022
Mario Nakhle, Product Management
All trading decisions in short-term power markets are data-driven.
Energy market participants have therefore been increasing their efforts to establish data processing pipelines in order to facilitate sourcing, collection, preparation, input, analysis and storage of data points.
Let’s zoom in on the six crucial stages of data processing.
Stages of Data Processing
Market participants predominantly obtain large amounts of data from a mix of third-party providers.
The unavoidable result is a potpourri of heterogeneous data points that need to be transformed in a series of data processing stages so that insights for intelligent trading decisions can be derived.
Stage 1: Data Sourcing
The data processing cycle starts with the selection and validation of data sources to fit trading requirements. Usually, one looks at the following characteristics:
Granularity — i.e. the data’s level of detail. For example, the weather may be forecasted in 60-minute, 30-minute or 15-minute time windows.
Data update frequency — the forecast value for a given 30-minute time window may be updated every hour or only every 6 hours, for example.
Publication time — e.g. end of day, delayed or real-time data publication.
Availability — i.e. server availability to access data points at any given time.
Trustworthiness — there are multiple root sources in the market, which most data providers tend to aggregate in their datasets: TSO publications or data from major power exchanges and weather models, for example.
Popularity — since power spot markets are often behavior-driven, it is crucial to understand which data sources are used by other market participants.
Integration time — data exchange methods (e.g. FTP servers, push/pull APIs) and the conformance checking process can affect how long integration takes.
Historical data timeframes — i.e. mechanisms to confirm that historical data is available for specific time periods.
Documentation — well-maintained and thorough documentation reduces integration time and supports long-term reliability.
Data quality issues — e.g. duplicates, missing data points and unstandardized data formats.
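Several of these characteristics can be checked on a sample delivered by a candidate provider. The following is a minimal sketch, assuming the sample is a list of interval start times; the function name `profile_timestamps` is hypothetical, and treating the most common step as the intended granularity is an assumption rather than a rule.

```python
from collections import Counter
from datetime import datetime, timedelta


def profile_timestamps(timestamps):
    """Summarize granularity, duplicates and gaps for interval start times."""
    counts = Counter(timestamps)
    duplicates = sorted(ts for ts, n in counts.items() if n > 1)
    unique = sorted(counts)
    if len(unique) < 2:
        return {"granularity": None, "duplicates": duplicates, "missing": []}

    # Assumption: the most frequent step between points is the intended granularity.
    steps = [later - earlier for earlier, later in zip(unique, unique[1:])]
    granularity = Counter(steps).most_common(1)[0][0]

    # Walk every gap and list the interval starts that should have been there.
    missing = []
    for earlier, later in zip(unique, unique[1:]):
        expected = earlier + granularity
        while expected < later:
            missing.append(expected)
            expected += granularity

    return {"granularity": granularity, "duplicates": duplicates, "missing": missing}
```

A report like this only covers granularity, duplicates and missing points; availability, publication time and popularity still have to be assessed separately.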
Stage 2: Data Collection
Once data providers have been selected and integrated, all available historical and live data points have to be fed into the market participants’ existing systems. This collection process often involves the development of applications to connect to the data providers’ APIs (if available) and stream incoming data.
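As an illustration of a pull-based collector, here is a minimal sketch. It assumes the provider client is wrapped in a `fetch` callable that returns a list of dictionaries; the field names `series`, `timestamp` and `published_at` are hypothetical and do not belong to any specific provider API.

```python
def collect(fetch, seen, sink):
    """Pull one page of data points and forward only those not seen before.

    fetch: callable returning an iterable of dicts (hypothetical provider wrapper)
    seen:  set of keys already forwarded, kept by the caller between polls
    sink:  callable that receives each new data point
    """
    new_points = 0
    for point in fetch():
        # A forecast for the same time window can be republished, so the
        # publication time is part of the key.
        key = (point["series"], point["timestamp"], point["published_at"])
        if key in seen:
            continue
        seen.add(key)
        sink(point)
        new_points += 1
    return new_points
```

Calling `collect` repeatedly on a schedule gives a simple polling loop; a push API would instead call `sink` directly from its callback.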
Stage 3: Data Preparation
Since the accuracy of data outputs is reliant on the quality of incoming data, the third data processing stage comprises cleaning up received data points by applying the following common techniques:
removal of corrupt/irrelevant data and outliers
standardization of naming conventions
interpolation of missing data
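The three techniques above can be sketched in plain Python. This is one possible approach under stated assumptions: outliers are detected with a median absolute deviation rule and a threshold of 5, gaps are filled by linear interpolation, and the alias table passed to `standardize_area` is hypothetical. Other rules may suit a given dataset better.

```python
from statistics import median


def standardize_area(name, aliases):
    """Map a provider-specific area name to one agreed spelling."""
    key = name.strip().upper()
    return aliases.get(key, key)


def remove_outliers(values, threshold=5.0):
    """Replace values far from the median with None (MAD rule, assumed threshold)."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)
    mid = median(present)
    mad = median(abs(v - mid) for v in present)
    if mad == 0:
        return list(values)
    return [
        None if v is not None and abs(v - mid) / mad > threshold else v
        for v in values
    ]


def interpolate_linear(values):
    """Fill interior None gaps linearly; leading and trailing gaps stay None."""
    result = list(values)
    known = [i for i, v in enumerate(result) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            fraction = (i - left) / span
            result[i] = result[left] + fraction * (result[right] - result[left])
    return result
```

Running `remove_outliers` before `interpolate_linear` means that a removed outlier is treated like any other missing point and filled from its neighbors.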
Stage 4: Data Input
At this stage, data is input into corresponding data processing applications, primarily databases or message queues. Typically, one would use databases for historical data analysis or batch processing (mostly relevant for auction markets) and message queues with event-driven architectures for online processing (mostly relevant for continuous markets).
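To make the two input paths concrete, here is a minimal sketch using only the Python standard library: SQLite stands in for the historical database and `queue.Queue` stands in for a message queue. The table layout and function names are assumptions for illustration; a production setup would typically use a dedicated database and message broker.

```python
import queue
import sqlite3


def store_batch(conn, rows):
    """Write (area, delivery_start, price) rows for later batch analysis."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prices ("
        "area TEXT, delivery_start TEXT, price REAL, "
        "PRIMARY KEY (area, delivery_start))"
    )
    # A later row for the same area and delivery start replaces the earlier one.
    conn.executemany("INSERT OR REPLACE INTO prices VALUES (?, ?, ?)", rows)
    conn.commit()


def drain(events, handler):
    """Pass every queued event to the handler and return how many were handled.

    A long-running consumer would block on events.get() instead of stopping
    when the queue is empty.
    """
    handled = 0
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return handled
        handler(event)
        handled += 1
```

The database path favors completeness and repeatable queries, while the queue path favors reacting to each update as soon as it arrives.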
Stage 5: Data Analysis
During the fifth stage, multiple data manipulation techniques — sorting, summarization, aggregation, transformation, normalization — are used to process the collected data points, based on which trading decisions will be derived.
Batch processing applications process data points in batches, each time a pre-specified amount of data has been collected. In contrast, online processing applications run autonomously and react to every data update in real time.
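The difference between the two modes can be shown with a mean computed both ways. This is a minimal sketch with hypothetical class names: `BatchMean` only produces a result once a full batch has been collected, whereas `RunningMean` updates its result on every incoming value.

```python
class BatchMean:
    """Batch processing: emit one mean per `batch_size` collected values."""

    def __init__(self, batch_size):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self.batch_size = batch_size
        self._buffer = []
        self.results = []

    def add(self, value):
        self._buffer.append(value)
        if len(self._buffer) == self.batch_size:
            self.results.append(sum(self._buffer) / self.batch_size)
            self._buffer.clear()


class RunningMean:
    """Online processing: update the mean on every single value."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def add(self, value):
        self.count += 1
        # Incremental update, so no history needs to be kept in memory.
        self.mean += (value - self.mean) / self.count
        return self.mean
```

With a batch size of 2 and five incoming values, `BatchMean` has produced two results and still holds one unprocessed value, while `RunningMean` has produced five.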
While data processing in short-term power trading mostly follows generic data processing principles, it still comes with several industry-specific challenges:
Short trading windows in continuous markets — resulting in short time series that have to be aggregated
Large amounts of data for processing and storage (for example, order book updates, continuously updated weather forecasts, average prices and volumes)
Changing regulations — causing disruptions in historical data
European market coupling — increasing the amount of data needed for a specific region
Single extreme events — skewing the data points (for example, a power outage that distorts historical market data)
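As an example of aggregating the short time series mentioned above, the sketch below condenses individual trades into one volume-weighted average price per product. The choice of a volume-weighted average and the `(product_id, price, volume)` tuple layout are assumptions for illustration, not a prescribed method.

```python
from collections import defaultdict


def vwap_by_product(trades):
    """Aggregate (product_id, price, volume) trades into one VWAP per product."""
    notional = defaultdict(float)
    volume = defaultdict(float)
    for product_id, price, quantity in trades:
        notional[product_id] += price * quantity
        volume[product_id] += quantity
    # Products without traded volume are skipped to avoid dividing by zero.
    return {p: notional[p] / volume[p] for p in volume if volume[p] > 0}
```

Because the result is a single figure per product, series from products with different trading window lengths become comparable.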
Stage 6: Data Storage
Once the output is available, it needs to be stored: historical data storage and live caching are the common options to accommodate different use cases.
Regardless of how the data is stored, the unique identifier of traded products (i.e. the combination of a market area, delivery start and delivery end) needs to be applied when saving data points. Consistency in area naming, product size nomenclature and granularity therefore makes data processing pipelines more scalable across products and markets.
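Such an identifier can be modeled as a small immutable type. The following is a minimal sketch; the class name `ProductKey`, the requirement for timezone-aware timestamps and the string layout are assumptions, and any consistent convention would serve the same purpose.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ProductKey:
    """Identifies a traded product by market area, delivery start and end."""

    area: str
    delivery_start: datetime
    delivery_end: datetime

    def __post_init__(self):
        if self.delivery_start.tzinfo is None or self.delivery_end.tzinfo is None:
            raise ValueError("delivery times must be timezone-aware")
        if self.delivery_end <= self.delivery_start:
            raise ValueError("delivery_end must be after delivery_start")

    @property
    def duration(self):
        """Product size, e.g. 15, 30 or 60 minutes."""
        return self.delivery_end - self.delivery_start

    def as_string(self):
        """Render the key in UTC so all systems store the same text."""
        start = self.delivery_start.astimezone(timezone.utc)
        end = self.delivery_end.astimezone(timezone.utc)
        return f"{self.area}|{start:%Y-%m-%dT%H:%MZ}|{end:%Y-%m-%dT%H:%MZ}"
```

Because the dataclass is frozen, instances are hashable and can be used directly as dictionary keys in a live cache, while `as_string` provides a stable key for historical storage.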