Retention Policy
Understanding and configuring the retention policy for your Azure Event Hubs is crucial for managing data storage costs, meeting compliance requirements, and ensuring that your applications have access to the historical data they need. Two factors govern how much event data you keep: the time-based retention period configured on each Event Hub, and the storage limits of your Event Hubs tier.
Time-Based Retention
Event Hubs retains events for a configurable period of time. Once this period expires, the events are automatically deleted. The retention period is the primary control over the event data lifecycle.
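To make the expiry rule concrete, here is a minimal, idealized model of time-based retention: an event stays readable while its age is less than the retention period. The helper names is_retained and retention_cutoff are hypothetical, and the model treats the retention boundary as exact; it does not describe when the service physically removes data.

```python
from datetime import datetime, timedelta, timezone


def is_retained(enqueued_at: datetime, now: datetime, retention: timedelta) -> bool:
    """Idealized check: True while the event's age is strictly less than the retention period."""
    return now - enqueued_at < retention


def retention_cutoff(now: datetime, retention: timedelta) -> datetime:
    """Events enqueued after this instant are still inside the retention window at `now`."""
    return now - retention


# Usage: a 24-hour retention period evaluated at noon UTC on 2 January 2024.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
retention = timedelta(hours=24)
print(is_retained(datetime(2024, 1, 2, 0, 0, tzinfo=timezone.utc), now, retention))  # True (12 hours old)
print(is_retained(datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc), now, retention))  # False (30 hours old)
print(retention_cutoff(now, retention))  # 2024-01-01 12:00:00+00:00
```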
- Default Retention: The default retention period for Event Hubs is 24 hours.
- Configurable Range: The retention period is specified in hours (older tools and API versions use whole days), with a minimum of 1 hour. The maximum depends on the tier: 1 day for Basic, 7 days for Standard, and 90 days for Premium and Dedicated.
- Configuration: Retention is configured on each Event Hub individually. There is no namespace-level retention setting that applies to all Event Hubs in a namespace.
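Before applying a retention value, you can check it against the tier limits. This minimal sketch assumes the published maximums at the time of writing (Basic 1 day, Standard 7 days, Premium and Dedicated 90 days) and a 1-hour minimum; the MAX_RETENTION_HOURS table and the is_valid_retention and days_to_hours helpers are hypothetical, so confirm current limits in the Azure documentation.

```python
# Assumed maximum retention per tier, in hours (published limits at the time of
# writing; confirm against current Azure documentation).
MAX_RETENTION_HOURS = {
    "Basic": 1 * 24,
    "Standard": 7 * 24,
    "Premium": 90 * 24,
    "Dedicated": 90 * 24,
}
MIN_RETENTION_HOURS = 1  # assumed minimum: one whole hour


def is_valid_retention(tier: str, hours: int) -> bool:
    """Return True if `hours` is a whole number within the assumed range for `tier`."""
    if tier not in MAX_RETENTION_HOURS:
        raise ValueError(f"unknown tier: {tier!r}")
    if isinstance(hours, bool) or not isinstance(hours, int):
        return False  # retention is set in whole hours
    return MIN_RETENTION_HOURS <= hours <= MAX_RETENTION_HOURS[tier]


def days_to_hours(days: int) -> int:
    """Convert a retention period in days (older tools) to hours (newer tools)."""
    return days * 24


print(is_valid_retention("Standard", days_to_hours(7)))  # True: 168 hours is the Standard maximum
print(is_valid_retention("Basic", 48))                   # False: exceeds the assumed 24-hour Basic limit
```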
How to Configure Time-Based Retention
You can set the retention policy using the Azure portal, Azure CLI, PowerShell, or SDKs.
Azure Portal:
- Navigate to your Event Hubs namespace in the Azure portal.
- In the left menu, under "Entities", select "Event Hubs".
- Select the specific Event Hub you wish to configure.
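- Open the Event Hub's retention setting. In the current portal this is on the "Configuration" page as "Retention time (hrs)"; older portal versions show a "Message Retention" option instead.
- Enter the desired retention period (in hours in the current portal, in days in older versions).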
- Click "Save".
Azure CLI:
To update the retention period for an Event Hub:
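az eventhubs eventhub update --resource-group myResourceGroup --namespace-name myNamespace --name myEventHub --message-retention 7
This command sets the retention to 7 days (168 hours), which requires the Standard tier or higher. Replace myResourceGroup, myNamespace, and myEventHub with your own names. Recent Azure CLI versions specify the period in hours with --retention-time-in-hours instead (for example, --retention-time-in-hours 168); run az eventhubs eventhub update --help to confirm which parameter your version supports.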
Size and Storage Limits
Event Hubs does not provide a per-Event Hub setting that caps stored data by size. Deletion is driven by the retention period rather than by how much data is stored, so the volume an Event Hub holds at any time depends on its ingress rate and its retention period.
- Storage Limits: The amount of storage included with a namespace varies by tier and, on some tiers, by provisioned capacity (such as processing units on Premium or capacity units on Dedicated). Consult the official Azure documentation for current quotas.
- Behavior: Events are removed when their retention period expires, not when a size threshold is reached. Accordingly, az eventhubs eventhub update has no size-based retention parameter.
- Controlling Volume: To store less data, shorten the retention period. To keep more history, move to a tier that supports longer retention, or archive events with Event Hubs Capture.
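How much data an Event Hub holds grows with both its ingress rate and its retention period. The following sketch estimates that volume so you can compare it against the storage limits of your tier. The function name estimated_retained_gb is hypothetical, and the model is deliberately rough: it ignores compression, per-event overhead, and how Azure actually meters storage.

```python
def estimated_retained_gb(events_per_second: float, avg_event_bytes: float, retention_hours: float) -> float:
    """Rough volume held once the Event Hub has run for a full retention period.

    Multiplies the aggregate ingress rate (all partitions combined) by the
    retention window and converts bytes to decimal gigabytes. Ignores
    compression, per-event overhead, and how Azure meters storage.
    """
    seconds = retention_hours * 3600
    return events_per_second * avg_event_bytes * seconds / 1e9


# 1,000 events per second of about 1,000 bytes each:
print(estimated_retained_gb(1_000, 1_000, 24))   # 86.4 (GB with 24-hour retention)
print(estimated_retained_gb(1_000, 1_000, 168))  # 604.8 (GB with 7-day retention)
```

Because the estimate scales linearly with retention, moving from 24 hours to 7 days multiplies the retained volume by seven.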
Important Considerations
- Data Loss: Events are deleted permanently once their retention period expires, so make sure your retention policy aligns with your data durability and recovery needs. A retention period set too short by mistake can lead to irreversible data loss.
- Cost: Longer retention periods keep more data in storage and can increase costs, for example when they require a higher tier. Carefully balance your requirements with budget constraints.
- Compliance: Be aware of any industry-specific or regulatory compliance requirements that dictate how long you must retain event data.
- Tier Limits: Retention and storage limits vary across Event Hubs pricing tiers (Basic, Standard, Premium, Dedicated). Consult the official Azure documentation for current limits.
Choosing the Right Policy
The best retention policy depends on your specific use case:
- Real-time Analytics: Short retention periods (e.g., 1-24 hours) might suffice if you only need to process data as it arrives.
- Auditing and Compliance: Longer retention periods (e.g., several days, or several weeks on the Premium and Dedicated tiers), potentially combined with archival to Azure Blob Storage or Azure Data Lake Storage, are necessary if you need to retain data for auditing or regulatory compliance.
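- Cost Optimization: If you are concerned about storage costs, monitor usage and keep retention periods no longer than you need. Consider the Event Hubs Capture feature, which writes events to Azure Blob Storage or Azure Data Lake Storage, for cost-effective long-term storage (Capture is not available in the Basic tier).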
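The trade-off above can be expressed as a simple rule: keep retention as short as the required replay window allows, and when that window exceeds the tier maximum, retain the maximum and serve older history from an archive such as Event Hubs Capture. This sketch assumes the same tier maximums published at the time of writing; plan_retention and RetentionPlan are hypothetical names, not part of any Azure SDK.

```python
from dataclasses import dataclass

# Assumed maximum retention per tier, in hours (confirm current limits).
MAX_RETENTION_HOURS = {
    "Basic": 24,
    "Standard": 7 * 24,
    "Premium": 90 * 24,
    "Dedicated": 90 * 24,
}


@dataclass(frozen=True)
class RetentionPlan:
    retention_hours: int  # value to configure on the Event Hub
    needs_archive: bool   # True if history beyond retention must come from an archive


def plan_retention(tier: str, required_history_hours: int) -> RetentionPlan:
    """Keep retention as short as the replay window allows; archive what exceeds the tier maximum."""
    maximum = MAX_RETENTION_HOURS[tier]  # raises KeyError for an unknown tier
    if required_history_hours <= maximum:
        # Never go below the assumed 1-hour minimum.
        return RetentionPlan(max(1, required_history_hours), needs_archive=False)
    return RetentionPlan(maximum, needs_archive=True)


print(plan_retention("Standard", 12))       # RetentionPlan(retention_hours=12, needs_archive=False)
print(plan_retention("Standard", 30 * 24))  # RetentionPlan(retention_hours=168, needs_archive=True)
print(plan_retention("Premium", 30 * 24))   # RetentionPlan(retention_hours=720, needs_archive=False)
```

On the Basic tier, which does not support Capture, needs_archive=True would mean moving to a higher tier or archiving events another way.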