Partition Keys
Partition keys are a fundamental concept in Azure Event Hubs that influence how events are distributed across partitions. When you send events to an Event Hub, you can optionally specify a partition key. This key is then used by Event Hubs to determine which partition the event should be written to.
Why Use Partition Keys?
The primary reason for using partition keys is to preserve the order of events within a specific stream or logical entity. Events that share the same partition key are written to the same partition, where they are stored in the order they arrived and are read by consumers in that same order.
This is crucial for scenarios where the order of operations matters, such as:
- Processing sensor readings from a specific device.
- Handling transactions for a particular user account.
- Maintaining the sequence of commands for a single client.
How Partition Keys Work
When an event is sent to an Event Hub:
- If a partition key is provided, Event Hubs uses a hash function on the key's value.
- The resulting hash is used to select a specific partition to which the event will be appended.
- If no partition key is provided (and no partition ID is specified), Event Hubs assigns events to partitions in a round-robin fashion, distributing the load across the available partitions.
The internal partitioning mechanism ensures that identical partition keys map to the same partition for as long as the partition count of the Event Hub stays the same. This deterministic behavior is the basis for ordered processing.
Important Note: While partition keys ensure order within a partition, they do not guarantee order across different partitions. Events with different partition keys can be interleaved.
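The "hash the key, then pick a partition" idea described above can be illustrated with a small sketch. This is not the algorithm Event Hubs uses: the service's hash function is internal, so FNV-1a stands in for it here, and the names PartitionSketch, Fnv1a, and SelectPartition are hypothetical.

```csharp
using System;
using System.Text;

// Illustrative only: FNV-1a is a stand-in for the service's internal hash.
public static class PartitionSketch
{
    // 32-bit FNV-1a hash over the UTF-8 bytes of the key.
    public static uint Fnv1a(string key)
    {
        const uint offsetBasis = 2166136261;
        const uint prime = 16777619;
        uint hash = offsetBasis;
        foreach (byte b in Encoding.UTF8.GetBytes(key))
        {
            hash ^= b;
            hash = unchecked(hash * prime);
        }
        return hash;
    }

    // Maps a key to a partition index in the range [0, partitionCount).
    public static int SelectPartition(string key, int partitionCount)
    {
        if (partitionCount <= 0)
            throw new ArgumentOutOfRangeException(nameof(partitionCount));
        return (int)(Fnv1a(key) % (uint)partitionCount);
    }
}
```

Calling PartitionSketch.SelectPartition("device-123", 4) repeatedly always yields the same index, which mirrors the deterministic routing described above. In this sketch the result also depends on partitionCount, so the same key can land on a different index if the count changes.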
Choosing a Good Partition Key
The effectiveness of partition keys heavily relies on the choice of key. A good partition key should:
- Ensure Ordering: The key should represent an entity for which ordered processing is required.
- Distribute Load Evenly: If you have many entities requiring ordering, choose keys that are diverse enough to distribute events across multiple partitions. A single, highly active entity could overload a single partition.
- Be Stable: The partition key for an entity should not change over time.
Common examples of partition keys include:
- Device ID
- User ID
- Session ID
- Transaction ID
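One rough way to check the "Distribute Load Evenly" criterion is to take a sample of candidate key values from your own traffic and see how much of it a single key carries. The sketch below assumes such a sample is available as a sequence of strings; KeySkew and HottestKeyShare are hypothetical names, not part of any SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeySkew
{
    // Returns the fraction of events carried by the most frequent key
    // (0.0 for an empty sample).
    public static double HottestKeyShare(IEnumerable<string> keys)
    {
        List<int> counts = keys.GroupBy(k => k).Select(g => g.Count()).ToList();
        if (counts.Count == 0) return 0.0;
        return (double)counts.Max() / counts.Sum();
    }
}
```

A value close to 1.0 means one entity dominates the sample, so its partition would receive most of the traffic regardless of how many partitions exist.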
Example: Sending Events with Partition Keys (Conceptual)
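When publishing events using an SDK, you typically specify the partition key in the send options rather than on the event itself. In the Azure.Messaging.EventHubs library for .NET, EventData.PartitionKey is read-only and is populated on received events, so the key is passed through SendEventOptions.
// Conceptual example in C#
using Azure.Messaging.EventHubs;
using Azure.Messaging.EventHubs.Producer;
using System.Text;
using System.Threading.Tasks;
// ...
// Placeholders: replace with your own connection string and Event Hub name.
string connectionString = "YOUR_EVENTHUBS_CONNECTION_STRING";
string eventHubName = "YOUR_EVENTHUB_NAME";
await using var producerClient = new EventHubProducerClient(connectionString, eventHubName);
var eventData = new EventData(Encoding.UTF8.GetBytes("{\"message\": \"Temperature reading\"}"));
// The partition key applies to every event in this send call.
var sendOptions = new SendEventOptions { PartitionKey = "device-123" };
await producerClient.SendAsync(new[] { eventData }, sendOptions);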
In this example, all events with the partition key device-123 will be routed to the same partition, ensuring their order of arrival within that partition is preserved.
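When several events for the same entity are sent together, the key can instead be set once on a batch. The following is a minimal sketch that assumes the Azure.Messaging.EventHubs package and an already constructed EventHubProducerClient; BatchSendSketch and SendReadingsAsync are hypothetical names, and the error handling is deliberately simplistic.

```csharp
using System;
using System.Text;
using System.Threading.Tasks;
using Azure.Messaging.EventHubs;
using Azure.Messaging.EventHubs.Producer;

public static class BatchSendSketch
{
    // Sends all readings for one device in a single batch that shares one partition key.
    public static async Task SendReadingsAsync(
        EventHubProducerClient producer, string deviceId, string[] readings)
    {
        // Every event added to this batch is published with the same partition key.
        var options = new CreateBatchOptions { PartitionKey = deviceId };
        using EventDataBatch batch = await producer.CreateBatchAsync(options);

        foreach (string reading in readings)
        {
            // TryAdd returns false when the event would exceed the batch size limit.
            if (!batch.TryAdd(new EventData(Encoding.UTF8.GetBytes(reading))))
                throw new InvalidOperationException("Event does not fit in the batch.");
        }

        await producer.SendAsync(batch);
    }
}
```

A production sender would more likely send the full batch and start a new one when TryAdd returns false, rather than throwing.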
Considerations
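- Partition Count: The number of partitions configured on your Event Hub (not the namespace) limits the potential parallelism. Because each partition is typically read by a single active consumer per consumer group, the partition count caps how many consumers can usefully process events in parallel, even with perfectly distributed keys.
- Fan-out: To process multiple partitions concurrently, run several consumers within one consumer group, each reading different partitions. Use separate consumer groups when independent applications each need their own view of the stream. The partition key primarily affects the producer's routing and the ordering guarantee.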
Understanding and effectively utilizing partition keys is key to building robust and scalable streaming applications with Azure Event Hubs.