Azure Event Hubs Documentation

Advanced Retention Policies in Azure Event Hubs

Optimizing data lifecycle management for your event streams.

Understanding Data Retention

Azure Event Hubs retains event data for a configured period, allowing consumers to process messages at their own pace or reprocess data if necessary. The retention period is crucial for compliance, debugging, and replay scenarios. Event Hubs offers two primary mechanisms for managing data retention:

  • Time-based retention: Events are discarded after a specified duration.
  • Size-based retention: Events are discarded once the total size of data in an Event Hub exceeds a specified limit. This is available in the Standard and Premium tiers.

Understanding these policies is vital for cost management, performance optimization, and meeting regulatory requirements.

Default Retention Period

By default, Azure Event Hubs retains data for 24 hours. You can change this value when you create a namespace or an individual Event Hub, or update it later.

Configuring Time-Based Retention

Time-based retention is the most common method. You can configure it at two levels:

  1. At the Namespace Level: Setting a default retention period for all Event Hubs within a namespace.
  2. At the Event Hub Level: Overriding the namespace default for a specific Event Hub.

Using the Azure Portal

To configure time-based retention through the Azure portal:

  1. Navigate to your Event Hubs namespace in the Azure portal.
  2. In the left-hand menu, under Settings, select Event Hubs.
  3. Click on the specific Event Hub you want to configure or create a new one.
  4. Under the Properties tab, you will find the Message Retention setting. Set the value in hours, up to a maximum of 1 day for Basic, 7 days for Standard, or 90 days for Premium and Dedicated.
  5. To set a default for the namespace, go back to the namespace overview page, and under Settings, select General. Look for Default message retention.
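Before submitting a value, it can help to check it against the limit for your tier. The following is a minimal sketch, not part of any Azure SDK: the `MAX_RETENTION_HOURS` table and the `check_retention` function are hypothetical names, and the per-tier limits are assumptions that should be verified against the current Event Hubs quotas page.

```python
# Assumed per-tier maximum retention, in hours. Verify these values
# against the current Event Hubs quotas documentation before relying on them.
MAX_RETENTION_HOURS = {
    "basic": 24,           # 1 day
    "standard": 7 * 24,    # 7 days
    "premium": 90 * 24,    # 90 days
    "dedicated": 90 * 24,  # 90 days
}


def check_retention(tier: str, hours: int) -> int:
    """Return hours unchanged if it is valid for the tier, else raise ValueError."""
    limit = MAX_RETENTION_HOURS[tier.lower()]
    if not 1 <= hours <= limit:
        raise ValueError(
            f"{tier}: retention must be between 1 and {limit} hours, got {hours}"
        )
    return hours


print(check_retention("standard", 72))  # prints 72
```

A check like this fails fast on the client side, which is usually easier to diagnose than a rejected request.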

Using Azure CLI

Example command to update retention for a specific Event Hub. The value is in hours, so 7 means seven hours; use 168 for seven days:

az eventhubs eventhub update --resource-group myResourceGroup --namespace-name myNamespace --name myEventHub --retention-time 7

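Example command to set the default retention for a namespace (confirm the parameter name with az eventhubs namespace update --help, because available parameters vary by CLI version):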

az eventhubs namespace update --resource-group myResourceGroup --name myNamespace --default-message-retention 7

Configuring Size-Based Retention (Standard & Premium Tiers)

Size-based retention caps the total storage used by an Event Hub at a defined limit, regardless of how long the events have been stored. This is useful in high-throughput scenarios, where a time limit alone can let stored data grow large. This feature is only available in the Standard and Premium tiers.

How it Works

When both time-based and size-based retention are configured, Event Hubs discards events as soon as either the time limit or the size limit is reached. Whichever limit is hit first determines when data is removed.
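The interaction of the two limits can be illustrated with a toy model of a single partition. This is only a conceptual sketch, not how the service is implemented: the `Event` class, the `apply_retention` function, and all numbers are made up for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Event:
    enqueued_at: float  # seconds since an arbitrary starting point
    size_bytes: int


def apply_retention(events: deque, now: float, max_age_s: float, max_bytes: int) -> deque:
    """Drop the oldest events until both the age and the size limit are satisfied."""
    total = sum(e.size_bytes for e in events)
    # The oldest event sits at the left end of the deque.
    while events and (now - events[0].enqueued_at > max_age_s or total > max_bytes):
        total -= events.popleft().size_bytes
    return events


log = deque([Event(0, 400), Event(50, 400), Event(90, 400)])
apply_retention(log, now=100, max_age_s=60, max_bytes=500)
print(list(log))  # only the event enqueued at t=90 remains
```

In this run the first event is dropped because it is older than 60 seconds, and the second is dropped because the remaining 800 bytes still exceed the 500-byte limit, even though that event is within the time limit.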

Using the Azure Portal

To configure size-based retention:

  1. Navigate to your Event Hubs namespace and then to the specific Event Hub.
  2. In the Event Hub's Properties, you will find the Partition Retention (GB) setting (this is the size-based retention). Enter the desired maximum size in Gigabytes.
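To choose a starting value for the size limit, one option is to estimate how much data accumulates over the time window you want to keep. The sketch below is back-of-the-envelope arithmetic under stated assumptions: a steady ingress rate, no per-event overhead, and events spread evenly across partitions. The function name `estimated_storage_gb` and its parameters are hypothetical.

```python
def estimated_storage_gb(
    ingress_mb_per_s: float, retention_hours: float, partitions: int = 1
) -> float:
    """Estimate the data (in GB, 1 GB = 1024 MB) held per partition for a window.

    Assumes a constant ingress rate and an even spread across partitions.
    """
    total_mb = ingress_mb_per_s * retention_hours * 3600
    return total_mb / 1024 / partitions


print(estimated_storage_gb(1.0, 24))                # 84.375
print(estimated_storage_gb(1.0, 24, partitions=4))  # 21.09375
```

A size limit set well below such an estimate means the size limit, not the time limit, will usually decide when events are removed.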

Using Azure CLI

Example command to update size-based retention for a specific Event Hub, in GB (confirm the parameter name with az eventhubs eventhub update --help for your CLI version):

az eventhubs eventhub update --resource-group myResourceGroup --namespace-name myNamespace --name myEventHub --retention-in-gb 100

Important Considerations for Retention

  • Maximum Retention: The Basic tier has a maximum time-based retention of 1 day and the Standard tier a maximum of 7 days. Premium and Dedicated tiers support up to 90 days.
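  • Data Loss: Events that pass the retention limit are deleted permanently, whether or not a consumer has read them. Set retention longer than the longest consumer outage you need to tolerate.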
  • Cost: Longer retention periods can increase storage costs.
  • Consumer Lag: Ensure your consumers can keep up with the data production rate. If consumers lag significantly, you might need to increase retention or scale your consumers.
  • Archiving: For long-term data archival, consider integrating Event Hubs with Azure Blob Storage or Azure Data Lake Storage using Event Hubs Capture.
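The consumer lag consideration above can be made concrete with a rough estimate of how long a lagging consumer has before unread events start to expire. This is a simplified model that assumes steady production and consumption rates and time-based retention only; `seconds_until_loss` and its parameters are hypothetical names, not part of any SDK.

```python
import math


def seconds_until_loss(
    produce_rate: float, consume_rate: float, backlog: float, retention_s: float
) -> float:
    """Estimate the seconds until unread events begin to expire.

    Rates are in events per second and backlog is the number of unread events.
    With steady production, the oldest unread event is backlog / produce_rate
    seconds old, so loss starts once the backlog reaches produce_rate * retention_s.
    """
    if produce_rate <= 0:
        raise ValueError("produce_rate must be positive")
    max_backlog = produce_rate * retention_s
    if backlog >= max_backlog:
        return 0.0  # the oldest unread events are already expiring
    if consume_rate >= produce_rate:
        return math.inf  # the backlog is not growing
    return (max_backlog - backlog) / (produce_rate - consume_rate)


# 1000 events/s produced, 800 events/s consumed, 24-hour retention.
print(seconds_until_loss(1000, 800, 0, 24 * 3600) / 3600)  # 120.0 hours
```

If the estimate is shorter than the time you need to recover or scale out consumers, that is a signal to increase retention or add consumer capacity.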

Retention and Event Hubs Capture

Event Hubs Capture is a built-in feature that automatically and continuously archives event data to an Azure Blob Storage account or Azure Data Lake Storage Gen2. This is ideal for use cases requiring long-term storage, batch analytics, and compliance.

When Capture is enabled, it works independently of message retention settings. Events are captured to your storage account and are still removed from Event Hubs according to the configured message retention policy. This gives you both a short-term hot path for immediate processing and long-term cold storage for historical analysis.

To configure Event Hubs Capture, navigate to your Event Hubs namespace, select Features, and then Event Hubs Capture.

Troubleshooting Retention Issues

If you encounter issues with data not being retained as expected:

  • Verify Configuration: Double-check the retention settings at both the namespace and Event Hub levels.
  • Check Tier Limitations: Ensure you are within the retention limits for your chosen Event Hubs tier.
  • Monitor Storage: For size-based retention, monitor the actual storage usage of your Event Hub.
  • Event Order: Events within a partition are ordered, and retention always removes the oldest events first. If events appear to be missing, check whether they were the oldest events in their partition when a retention limit was reached.