Message Retention in Azure Event Hubs

Message retention is a configuration setting in Azure Event Hubs that determines how long events are stored in an Event Hub before they are automatically deleted. The setting affects storage costs, compliance with data policies, and how much time consumers have to process incoming events.

How Message Retention Works

When you create an Event Hub, you can specify a retention period for the messages it will store. This period is typically measured in days. Event Hubs automatically deletes old messages based on the configured retention period. Consumers can read events from any point within the retention window, provided they track their offset correctly.

Important: Message retention is configured at the Event Hub level, not at the partition level. All partitions within an Event Hub share the same retention policy.

Configurable Retention Periods

The maximum message retention period you can configure depends on the Event Hubs tier you are using:

You can adjust the retention period for an existing Event Hub through the Azure portal, Azure CLI, or Azure SDKs.

Factors Influencing Retention Period Choice

When deciding on an appropriate message retention period, consider the following:

Tip: For scenarios requiring long-term archiving or compliance, consider using Azure Event Hubs Capture. Event Hubs Capture automatically archives events from Event Hubs to an Azure Blob Storage account or Azure Data Lake Storage Gen2 for as long as you need.

Viewing and Modifying Message Retention

Azure Portal

  1. Navigate to your Event Hubs namespace in the Azure portal.
  2. Select the specific Event Hub you want to configure.
  3. Under "Settings", click on "Configuration".
  4. You will see the "Message retention (days)" setting where you can adjust the value.
  5. Click "Apply" to save your changes.

Azure CLI Example

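To set message retention to 3 days (72 hours) for an Event Hub named myEventHub in a namespace named myNamespace. Recent Azure CLI versions express retention in hours through --retention-time-in-hours, while older versions used --message-retention with a value in days, so run az eventhubs eventhub update --help to confirm which flag your version accepts:

az eventhubs eventhub update --resource-group myResourceGroup --namespace-name myNamespace --name myEventHub --retention-time-in-hours 72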

Calculating Storage Usage

Understanding your storage consumption helps in estimating costs. The storage used by messages is driven by the average message size, the rate at which messages arrive, and the retention period.

A rough estimate of storage consumed can be calculated as:

Storage = (Average Message Size) * (Ingress Rate in Messages per Second) * (Retention Period in Seconds)
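The rough estimate above can be sketched as a small helper, assuming the throughput term is expressed in messages per second and that the ingress rate stays constant over the whole retention window. The function name and parameters are illustrative and not part of any Azure SDK.

```python
def estimate_storage_bytes(avg_message_bytes, messages_per_second, retention_days):
    """Rough estimate of the bytes retained by one Event Hub.

    Assumes a steady ingress rate and ignores per-event overhead,
    so treat the result as an order-of-magnitude figure.
    """
    if min(avg_message_bytes, messages_per_second, retention_days) < 0:
        raise ValueError("inputs must be non-negative")
    retention_seconds = retention_days * 24 * 60 * 60
    return avg_message_bytes * messages_per_second * retention_seconds


# Example: 1,000-byte events arriving at 1,000 events per second, kept for 3 days.
estimate = estimate_storage_bytes(1_000, 1_000, 3)
print(f"{estimate / 1e9:.1f} GB")  # prints "259.2 GB"
```

Real workloads are rarely steady, so it may be worth running the estimate with both the average and the peak ingress rate to get a range.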

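Note that the storage included for retention depends on the capacity you purchase, not on the number of partitions. According to the published Event Hubs quotas, the Standard tier includes 84 GB per throughput unit, Premium 1 TB per processing unit, and Dedicated 10 TB per capacity unit. In the Standard tier, storage beyond the included amount is billed as overage. Confirm the current figures for your tier in the quotas and pricing documentation before relying on them.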

Message Expiration

Messages are considered "expired" and eligible for deletion once they have been stored for longer than the configured retention period. Event Hubs continuously cleans up expired messages.

Deletion is not instantaneous. There may be a short delay between a message reaching its retention limit and its actual removal from storage, so a message slightly past the limit can still be readable for a while. Consumers should not depend on either behavior: do not assume an expired message has already disappeared, and do not plan to process messages so close to the retention deadline that a small delay on the consumer side would cause them to be missed.
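As a minimal sketch of the expiration rule above, the following helpers compute how long an event has left before it becomes eligible for deletion and flag events that are too close to the deadline. The names time_until_expiry and is_close_to_expiry and the safety margin are hypothetical. The event's enqueued time is assumed to be available as a timezone-aware UTC datetime, and the retention period must be supplied by the caller.

```python
from datetime import datetime, timedelta, timezone


def time_until_expiry(enqueued_time, retention, now=None):
    """Return the time left before an event exceeds the retention period.

    A negative result means the event is already eligible for deletion.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return (enqueued_time + retention) - now


def is_close_to_expiry(enqueued_time, retention, safety_margin, now=None):
    """True if the event expires within the given safety margin."""
    return time_until_expiry(enqueued_time, retention, now) <= safety_margin


# Example: an event enqueued two days ago with a 3-day retention period.
now = datetime(2024, 1, 10, tzinfo=timezone.utc)
enqueued = datetime(2024, 1, 8, tzinfo=timezone.utc)
print(time_until_expiry(enqueued, timedelta(days=3), now))  # prints "1 day, 0:00:00"
```

A consumer could log a warning or raise an alert when is_close_to_expiry returns True for the events it is currently reading, since that indicates it is lagging near the edge of the retention window.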

Consumer Logic Consideration:

When consumers read messages, they keep track of their position using an offset. If a consumer stops processing for a period longer than the retention policy, it may no longer be able to read older messages that have been deleted.
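One way to handle this on restart is to compare the age of the last checkpoint with the retention period before resuming. The sketch below is an assumption about how such a check might look, not an Event Hubs SDK feature: ResumeDecision and choose_resume_point are hypothetical names, and the enqueued time of the last checkpointed event is assumed to be stored alongside the offset.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple, Optional


class ResumeDecision(NamedTuple):
    position: str       # "checkpoint" or "earliest"
    possible_gap: bool  # True if events after the checkpoint may be gone


def choose_resume_point(checkpoint_enqueued_time: Optional[datetime],
                        retention: timedelta,
                        now: Optional[datetime] = None) -> ResumeDecision:
    """Decide where a restarting consumer should begin reading."""
    if now is None:
        now = datetime.now(timezone.utc)
    if checkpoint_enqueued_time is None:
        # No checkpoint yet: start from the oldest retained event.
        return ResumeDecision("earliest", False)
    if now - checkpoint_enqueued_time > retention:
        # The checkpointed event is older than the retention window, so
        # events that followed it may already have been deleted.
        return ResumeDecision("earliest", True)
    return ResumeDecision("checkpoint", False)


# Example: the consumer was offline for 5 days with a 3-day retention period.
now = datetime(2024, 1, 10, tzinfo=timezone.utc)
last = datetime(2024, 1, 5, tzinfo=timezone.utc)
print(choose_resume_point(last, timedelta(days=3), now))
```

When possible_gap is True, the consumer can still resume from the oldest retained event, but downstream systems may need to be told that some events were never processed.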