Event Hubs Capture

Azure Event Hubs Capture is a built-in feature that automatically and incrementally batches the data in your Event Hubs, saving it to a Microsoft Azure Storage account or Azure Data Lake Storage Gen2 account of your choice.

Capture offloads the task of archiving event data from Event Hubs. It is fully managed and continuously archives the data written to your event hubs, which makes it a good fit when you need to keep all incoming events for downstream processing, historical analysis, or compliance.

Key Features of Event Hubs Capture

How Event Hubs Capture Works

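When you enable Event Hubs Capture on an event hub, you specify the destination (a Storage account and Blob container, or a Data Lake Storage Gen2 file system), a capture time interval, and a capture size limit.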

[Figure: Conceptual flow of Event Hubs Capture]

Event Hubs Capture operates by:

  1. Monitoring incoming events in your event hubs.
  2. Aggregating events into batches, closing each batch when either the configured time interval or the configured size limit is reached, whichever comes first.
  3. Writing each batch as an Avro file to the specified Azure Storage or Data Lake Storage Gen2 account.
  4. Organizing the files in a hierarchical structure within the storage account, typically including year, month, day, hour, and minute, for easy querying and management.
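
The batching and naming steps above can be sketched in a few lines of Python. This is only an illustrative model, not the service's implementation: the function names (`split_into_windows`, `capture_blob_name`) are hypothetical, the "first limit reached closes the batch" rule is simplified, and the name format is assumed to match the documented default (`{Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}`). Check your own Capture settings for the format actually in use.

```python
from datetime import datetime, timezone

# Assumed default Capture naming convention; yours may be customized.
DEFAULT_NAME_FORMAT = (
    "{Namespace}/{EventHub}/{PartitionId}/"
    "{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}"
)


def split_into_windows(events, max_seconds, max_bytes):
    """Group (arrival_seconds, payload) pairs into capture-style batches.

    A batch is closed as soon as either the time limit or the size limit
    would be exceeded, whichever happens first (simplified model).
    """
    windows, current, start, size = [], [], None, 0
    for arrival, payload in events:
        if start is None:
            start = arrival
        too_old = arrival - start >= max_seconds
        too_big = size + len(payload) > max_bytes
        if current and (too_old or too_big):
            windows.append(current)
            current, start, size = [], arrival, 0
        current.append(payload)
        size += len(payload)
    if current:
        windows.append(current)
    return windows


def capture_blob_name(namespace, event_hub, partition_id, window_start,
                      name_format=DEFAULT_NAME_FORMAT):
    """Build the blob path for one batch from the naming convention."""
    parts = {
        "Namespace": namespace,
        "EventHub": event_hub,
        "PartitionId": str(partition_id),
        "Year": f"{window_start:%Y}",
        "Month": f"{window_start:%m}",
        "Day": f"{window_start:%d}",
        "Hour": f"{window_start:%H}",
        "Minute": f"{window_start:%M}",
        "Second": f"{window_start:%S}",
    }
    return name_format.format(**parts) + ".avro"


if __name__ == "__main__":
    events = [(0, b"a" * 40), (10, b"b" * 40), (20, b"c" * 40), (400, b"d" * 10)]
    for batch in split_into_windows(events, max_seconds=300, max_bytes=100):
        print(len(batch), "event(s) in batch")
    start = datetime(2024, 3, 5, 14, 7, 9, tzinfo=timezone.utc)
    print(capture_blob_name("my-namespace", "my-hub", 0, start))
```

Because the path embeds the partition ID and the window start time, downstream jobs can select a time range or a single partition by prefix rather than scanning the whole container.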

Enabling Event Hubs Capture

You can enable Event Hubs Capture through the Azure portal, Azure CLI, PowerShell, or SDKs.

Azure Portal Steps:

  1. Navigate to your Event Hubs namespace in the Azure portal.
  2. In the left-hand menu, under "Settings," select "Create capture settings".
  3. Enable the Capture toggle.
  4. Select or create your Storage account and Blob container (or Data Lake Storage Gen2 file system).
  5. Configure the Capture interval (e.g., 300 seconds) and Capture size (e.g., 100 MB).
  6. Click "Save".
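
For scripted deployments, the same settings can be expressed as a capture description on the event hub resource. The sketch below only builds and validates that payload as a plain Python dictionary; it does not call Azure. The helper name `build_capture_description` is hypothetical, and the property names and limits (60 to 900 seconds, 10 to 500 MB) reflect the ARM resource schema as understood at the time of writing, so verify them against the current template reference before relying on them.

```python
def build_capture_description(storage_account_resource_id, blob_container,
                              interval_seconds=300, size_limit_mb=100):
    """Return a captureDescription-style dict for an event hub resource.

    Property names are assumed to follow the ARM schema for
    Microsoft.EventHub/namespaces/eventhubs; confirm before use.
    """
    if not 60 <= interval_seconds <= 900:
        raise ValueError("capture interval must be between 60 and 900 seconds")
    if not 10 <= size_limit_mb <= 500:
        raise ValueError("capture size must be between 10 and 500 MB")
    return {
        "enabled": True,
        "encoding": "Avro",
        "intervalInSeconds": interval_seconds,
        "sizeLimitInBytes": size_limit_mb * 1024 * 1024,
        "destination": {
            # Destination name used for Blob Storage targets (assumed).
            "name": "EventHubArchive.AzureBlockBlob",
            "properties": {
                "storageAccountResourceId": storage_account_resource_id,
                "blobContainer": blob_container,
                "archiveNameFormat": (
                    "{Namespace}/{EventHub}/{PartitionId}/"
                    "{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}"
                ),
            },
        },
    }


if __name__ == "__main__":
    import json

    # Placeholder resource ID and container name, for illustration only.
    description = build_capture_description(
        "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/"
        "Microsoft.Storage/storageAccounts/<account>",
        "capture-container",
    )
    print(json.dumps(description, indent=2))
```

Validating the interval and size before submitting the configuration surfaces out-of-range values locally instead of as a failed deployment.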

Important: Event Hubs Capture begins archiving events only *after* the capture setting is enabled. It does not capture historical data that arrived before the feature was turned on.

Use Cases for Event Hubs Capture

Data Format: Avro

Event Hubs Capture saves data in the Apache Avro format. Avro is a row-based data serialization system that provides rich data structures and a compact, fast binary format. Each captured file can contain many events. In addition to the event body, each record carries metadata about the event: its offset, sequence number, enqueued timestamp, and properties.
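
As a rough illustration of working with these records in Python, the sketch below post-processes records that have already been decoded from a captured file. It assumes the field names `SequenceNumber`, `Offset`, `EnqueuedTimeUtc`, `Properties`, and `Body`, and that bodies are often (but not always) UTF-8 JSON. The helper names are hypothetical, and `read_capture_file` relies on the third-party `fastavro` package, which is one option among several Avro readers.

```python
import json


def summarize_capture_records(records):
    """Yield simplified dicts from decoded Capture records.

    `records` is any iterable of dicts using the assumed Capture field
    names. Bodies that are not valid UTF-8 JSON are kept as raw bytes.
    """
    for record in records:
        body = record.get("Body") or b""
        try:
            payload = json.loads(body.decode("utf-8"))
        except (UnicodeDecodeError, json.JSONDecodeError):
            payload = body
        yield {
            "sequence_number": record.get("SequenceNumber"),
            "offset": record.get("Offset"),
            "enqueued_time_utc": record.get("EnqueuedTimeUtc"),
            "properties": record.get("Properties") or {},
            "body": payload,
        }


def read_capture_file(path):
    """Read one downloaded .avro file (requires `pip install fastavro`)."""
    import fastavro  # imported lazily so the helper above has no dependency

    with open(path, "rb") as fo:
        yield from summarize_capture_records(fastavro.reader(fo))


if __name__ == "__main__":
    # Hand-written sample records standing in for a real captured file.
    sample = [
        {"SequenceNumber": 7, "Offset": "4096",
         "EnqueuedTimeUtc": "<timestamp>", "Properties": {"source": "sensor-1"},
         "Body": b'{"temp": 21.5}'},
    ]
    for event in summarize_capture_records(sample):
        print(event["sequence_number"], event["body"])
```

Keeping the decoding step separate from file access makes the same logic usable whether the records come from a local file, a blob download stream, or a test fixture.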

You can use various tools and libraries to read Avro files, including:

Considerations

With Event Hubs Capture, you can feed your real-time event streams into batch processing and analytics services without building and operating your own archiving pipeline.