Azure Event Hubs Capture

Efficiently and automatically capture Event Hubs data into Azure Storage.

Introduction

Azure Event Hubs Capture is a built-in feature that enables you to automatically and incrementally capture the output of your Event Hubs stream into a specified Azure Storage account (either Azure Blob Storage or Azure Data Lake Storage Gen2). This capability allows you to archive your event data for longer-term storage, replayability, batch analytics, and compliance purposes without requiring custom code.

Capture solves the common problem of wanting to retain and process streaming event data in a batch-friendly format. Instead of building and maintaining complex processing pipelines to offload data, Event Hubs Capture handles this seamlessly in the background.

How It Works

When Event Hubs Capture is enabled for an event hub, the service continuously reads events from each partition and groups them into batches. Each batch is then written to your configured storage account as a file that batch-processing tools can read directly.
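The batching described above can be pictured with a small toy model. This is only a sketch under stated assumptions, not the service's implementation: it assumes a batch is closed when either a time window or a size window is reached, and the function name batch_events and the limits used are hypothetical.

```python
def batch_events(events, max_seconds, max_bytes):
    """Group (timestamp_seconds, payload_bytes) pairs into batches.

    Toy model: a batch is closed when the next event arrives after the
    time window has elapsed, or when adding it would exceed the size window.
    """
    batches = []
    current = []
    current_bytes = 0
    window_start = None
    for ts, payload in events:
        if window_start is None:
            window_start = ts
        window_elapsed = ts - window_start >= max_seconds
        too_large = current_bytes + len(payload) > max_bytes
        if current and (window_elapsed or too_large):
            # Close the current batch and start a new window at this event.
            batches.append(current)
            current, current_bytes, window_start = [], 0, ts
        current.append((ts, payload))
        current_bytes += len(payload)
    if current:
        batches.append(current)  # flush the final, partially filled batch
    return batches


# Hypothetical events: three small payloads close together, one much later.
events = [(0, b"aaaa"), (1, b"bbbb"), (2, b"cccc"), (10, b"dd")]
batches = batch_events(events, max_seconds=5, max_bytes=8)
print([len(b) for b in batches])
```

In this run the third event closes the first batch because of the size limit, and the fourth closes the second batch because of the time limit.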

File Format (Apache Avro)

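Event Hubs Capture writes data in Apache Avro, a compact binary format that stores its schema alongside the data. Each Avro file contains events from a single partition within a defined time window. The schema includes metadata about each event, such as offset, sequence number, and enqueued timestamp, along with the event body itself.

An illustrative, simplified schema is shown below. The schema embedded in actual Capture files uses different field names and types (for example, Body, Offset, SequenceNumber, and EnqueuedTimeUtc), so inspect a captured file before relying on specific fields: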


{
  "type": "record",
  "name": "EventData",
  "fields": [
    {"name": "body", "type": ["null", "bytes"]},
    {"name": "properties", "type": {"type": "map", "values": "string"}},
    {"name": "systemProperties", "type": {"type": "map", "values": "string"}},
    {"name": "offset", "type": "long"},
    {"name": "sequenceNumber", "type": "long"},
    {"name": "partitionKey", "type": ["null", "string"]},
    {"name": "enqueuedTime", "type": "long"}
  ]
}
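As a rough illustration of how the simplified schema above can be used, the following sketch loads it with Python's standard json module and checks a sample record against it. It is a plain dictionary sanity check, not real Avro validation; the helper names and the sample record are hypothetical, and treating nullable fields as optional is a simplifying assumption.

```python
import json

# The simplified schema from the text above (illustrative only).
SIMPLIFIED_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "EventData",
  "fields": [
    {"name": "body", "type": ["null", "bytes"]},
    {"name": "properties", "type": {"type": "map", "values": "string"}},
    {"name": "systemProperties", "type": {"type": "map", "values": "string"}},
    {"name": "offset", "type": "long"},
    {"name": "sequenceNumber", "type": "long"},
    {"name": "partitionKey", "type": ["null", "string"]},
    {"name": "enqueuedTime", "type": "long"}
  ]
}
""")


def is_nullable(field):
    """A field is nullable when its type is a union that includes "null"."""
    field_type = field["type"]
    return isinstance(field_type, list) and "null" in field_type


def missing_non_nullable_fields(record, schema):
    """Return the names of non-nullable schema fields absent from the record."""
    return [
        f["name"]
        for f in schema["fields"]
        if not is_nullable(f) and f["name"] not in record
    ]


# Hypothetical decoded event; partitionKey is omitted because it is nullable.
sample = {
    "body": b'{"temperature": 21.5}',
    "properties": {},
    "systemProperties": {},
    "offset": 0,
    "sequenceNumber": 0,
    "enqueuedTime": 1700000000000,
}

print(missing_non_nullable_fields(sample, SIMPLIFIED_SCHEMA))
```

To read real capture files, an Avro library is needed; this sketch only shows how the schema's field list can drive a quick structural check.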
            

Key Features

Benefits

Supported Storage

Event Hubs Capture supports Azure Blob Storage and Azure Data Lake Storage Gen2 as destinations.

When configuring Capture, you identify the destination storage account, specify the target container, and choose how Event Hubs authenticates to the account (for example, with a managed identity).

Configuration

Event Hubs Capture can be enabled and configured directly from the Azure portal or programmatically using Azure SDKs, ARM templates, or Bicep.

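Key configuration parameters include the time window and size window that control how often a capture file is written (whichever limit is reached first closes the file), the destination storage account and container, and the naming format for captured files.

When configuring via the Azure portal, navigate to your Event Hubs namespace, select the event hub you want to enable Capture for, and open its "Capture" setting.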
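For programmatic configuration, the settings can be expressed as a capture description fragment of the kind used in ARM templates. The sketch below builds such a fragment as a Python dictionary and prints it as JSON. The property names follow the ARM event hub resource as best recalled and should be checked against the current template reference; the function name, the placeholder resource ID, the container name, and the window values are assumptions made for illustration.

```python
import json


def capture_description(storage_account_resource_id, container,
                        interval_seconds=300, size_limit_bytes=314572800):
    """Build an ARM-style capture description (property names are assumptions
    to verify against the current template reference)."""
    return {
        "enabled": True,
        "encoding": "Avro",
        # Time window and size window: whichever is reached first closes a file.
        "intervalInSeconds": interval_seconds,
        "sizeLimitInBytes": size_limit_bytes,
        "destination": {
            "name": "EventHubArchive.AzureBlockBlob",
            "properties": {
                "storageAccountResourceId": storage_account_resource_id,
                "blobContainer": container,
            },
        },
    }


# Placeholder values; substitute your own subscription, group, and account.
description = capture_description(
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.Storage/storageAccounts/<storage-account>",
    "capture-container",
)
print(json.dumps(description, indent=2))
```

Keeping the fragment as data makes it easy to embed in a template or pass to a deployment tool, but the exact shape expected by each tool may differ.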

Use Cases

Limitations

While powerful, Event Hubs Capture has some considerations: