Azure Event Hubs Developer's Guide: Partition Keys

What are Partition Keys?

When you send events to an Azure Event Hub, each event can optionally be associated with a partition key. This key is a string value that Event Hubs hashes to determine which partition receives the event. The core principle is that all events with the same partition key are sent to the same partition within an Event Hub, for as long as the hub's partition count stays the same.

This behavior is crucial for several reasons:

Ordering Guarantees: Within a single partition, events are guaranteed to be stored and delivered in the order they were received by Event Hubs. By ensuring events that need to be processed in order share the same partition key, you can leverage this guarantee.
Partitioning and Throughput: Event Hubs are partitioned to enable high throughput. Sending events to different partitions can distribute the load across the service. Partition keys allow you to control this distribution.

How Event Hubs Uses Partition Keys

When an event is sent to an Event Hub:

If a partition key is provided, Event Hubs calculates a hash of the partition key.
This hash value is then used to determine the target partition. Conceptually, the hash modulo the number of partitions in the hub gives the partition ID; the exact hash function is an internal detail of the service, so client code should not try to predict partition IDs.
If no partition key is provided, Event Hubs assigns the event to a partition itself, typically using a round-robin approach. Keyless events are therefore distributed evenly, but related events may land on different partitions, so there is no ordering guarantee between them.

The formula is generally:

partitionId = hash(partitionKey) % numberOfPartitions
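
The mapping can be illustrated with a small simulation. This is a sketch only: `hashKey` is a stand-in 31-multiplier string hash, not the function Event Hubs uses internally, and `partitionFor` is a hypothetical helper name. It shows only that a fixed key and a fixed partition count always produce the same partition ID.

```typescript
// Stand-in hash for illustration; NOT the hash Event Hubs uses internally.
function hashKey(key: string): number {
  let h = 0;
  for (let i = 0; i < key.length; i++) {
    // Keep the running value in the unsigned 32-bit range.
    h = (h * 31 + key.charCodeAt(i)) >>> 0;
  }
  return h;
}

// partitionId = hash(partitionKey) % numberOfPartitions
function partitionFor(partitionKey: string, numberOfPartitions: number): number {
  if (numberOfPartitions < 1 || Math.floor(numberOfPartitions) !== numberOfPartitions) {
    throw new Error("numberOfPartitions must be a positive integer");
  }
  return hashKey(partitionKey) % numberOfPartitions;
}

const firstLookup = partitionFor("device-123", 4);
const secondLookup = partitionFor("device-123", 4);
console.log(firstLookup === secondLookup); // true: same key, same partition count
```

In this sketch, changing `numberOfPartitions` generally changes the result for a given key, which is why a stable partition count matters for key-based ordering.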

Choosing an Effective Partition Key

The choice of partition key significantly impacts your application's performance, ordering, and scalability. Here are some guidelines:

1. Preserve Ordering

If your application logic requires that a sequence of related events be processed in the order they occurred, you must use a partition key that groups these related events together. Common examples are a device ID, a user ID, or a session ID.

Example: If you have sensor readings from multiple devices, using the deviceId as the partition key ensures all readings from a single device go to the same partition, preserving the order in which Event Hubs received that device's data.

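// Example of sending an event with a partition key (JavaScript, @azure/event-hubs).
import { EventHubProducerClient } from "@azure/event-hubs";
const producerClient = new EventHubProducerClient(connectionString, eventHubName); // placeholders: supply your own values
await producerClient.sendBatch([{ body: "Sensor reading data" }], { partitionKey: "device-123" });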
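
To see why a shared key preserves per-device order, the following sketch simulates a hub as in-memory arrays, one per partition. It does not call the Event Hubs SDK. `hashKey`, `route`, and the `Reading` type are hypothetical names, and the hash is a stand-in for the service's internal one.

```typescript
interface Reading {
  deviceId: string;
  seq: number;
}

// Stand-in hash for illustration; NOT the hash Event Hubs uses internally.
function hashKey(key: string): number {
  let h = 0;
  for (let i = 0; i < key.length; i++) {
    h = (h * 31 + key.charCodeAt(i)) >>> 0;
  }
  return h;
}

// Append each reading to the partition chosen by its deviceId, in arrival order.
function route(readings: Reading[], partitionCount: number): Reading[][] {
  const partitions: Reading[][] = [];
  for (let i = 0; i < partitionCount; i++) {
    partitions.push([]);
  }
  for (let i = 0; i < readings.length; i++) {
    const r = readings[i];
    partitions[hashKey(r.deviceId) % partitionCount].push(r);
  }
  return partitions;
}

// Interleaved readings from two devices, listed in arrival order.
const arrivals: Reading[] = [
  { deviceId: "device-123", seq: 1 },
  { deviceId: "device-456", seq: 1 },
  { deviceId: "device-123", seq: 2 },
  { deviceId: "device-456", seq: 2 },
  { deviceId: "device-123", seq: 3 },
];

const routed = route(arrivals, 4);
// Sequence numbers for device-123 as a reader of its partition would see them.
const seenSeqs = routed[hashKey("device-123") % 4]
  .filter((r) => r.deviceId === "device-123")
  .map((r) => r.seq);
console.log(seenSeqs.join(",")); // 1,2,3
```

Other devices may share the same partition, so a consumer still filters or groups by deviceId; the key only guarantees that one device's readings are never split across partitions.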

2. Distribute Load Evenly

A good partition key should distribute events as evenly as possible across all available partitions. If a single key value is too popular, it can lead to a "hot partition," where one partition receives a disproportionately large share of the traffic and becomes a bottleneck for both producers and consumers. Likewise, if the key has too few distinct values, events pile up on a handful of partitions while the rest sit idle.

Avoid: Keys like "true", "false", or a single constant value.
Prefer: Keys that have a high cardinality (many unique values) and are reasonably well-distributed.

Example: A unique transactionId is often a good choice for financial transactions when no ordering between transactions is required, as each transaction is distinct and unlikely to create a hot partition.
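
The effect of cardinality can be checked with a quick simulation. The sketch below is illustrative only: `hashKey` is a stand-in hash (not the one Event Hubs uses), and `countPerPartition` and `partitionsUsed` are hypothetical helpers. It compares a boolean-style key with a transaction-ID-style key over 8 partitions.

```typescript
// Stand-in hash for illustration; NOT the hash Event Hubs uses internally.
function hashKey(key: string): number {
  let h = 0;
  for (let i = 0; i < key.length; i++) {
    h = (h * 31 + key.charCodeAt(i)) >>> 0;
  }
  return h;
}

// Count how many events land on each partition for a list of keys.
function countPerPartition(keys: string[], partitionCount: number): number[] {
  const counts: number[] = [];
  for (let i = 0; i < partitionCount; i++) {
    counts.push(0);
  }
  for (let i = 0; i < keys.length; i++) {
    counts[hashKey(keys[i]) % partitionCount] += 1;
  }
  return counts;
}

// Number of partitions that received at least one event.
function partitionsUsed(counts: number[]): number {
  return counts.filter((c) => c > 0).length;
}

const lowCardinality: string[] = [];
const highCardinality: string[] = [];
for (let i = 0; i < 1000; i++) {
  lowCardinality.push(i % 2 === 0 ? "true" : "false");
  // Zero-padded IDs: txn-0000 ... txn-0999
  highCardinality.push("txn-" + ("000" + i).slice(-4));
}

console.log(partitionsUsed(countPerPartition(lowCardinality, 8))); // 2
console.log(partitionsUsed(countPerPartition(highCardinality, 8))); // 8
```

With only two distinct key values, at most two partitions can ever receive traffic, no matter how many partitions the hub has.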

3. Consider Consumers

Think about how your consumers will read from Event Hubs. If a consumer is designed to process events for a specific entity (e.g., a user, a device), then using that entity's identifier as the partition key naturally aligns producers and consumers: every event for a given entity arrives on one partition, so the consumer reading that partition can keep per-entity state locally without coordinating with readers of other partitions.
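
A minimal sketch of that alignment, using in-memory tallies rather than the Event Hubs SDK: each simulated partition keeps its own per-user counter, and a given user only ever shows up in one of them. `hashKey`, `tallyPerPartition`, and `UserAction` are hypothetical names, and the hash is a stand-in for the service's internal one.

```typescript
interface UserAction {
  userId: string;
  action: string;
}

// Stand-in hash for illustration; NOT the hash Event Hubs uses internally.
function hashKey(key: string): number {
  let h = 0;
  for (let i = 0; i < key.length; i++) {
    h = (h * 31 + key.charCodeAt(i)) >>> 0;
  }
  return h;
}

// One tally per partition, as if each partition had its own consumer with local state.
function tallyPerPartition(
  actions: UserAction[],
  partitionCount: number
): Array<Record<string, number>> {
  const tallies: Array<Record<string, number>> = [];
  for (let i = 0; i < partitionCount; i++) {
    tallies.push({});
  }
  for (let i = 0; i < actions.length; i++) {
    const a = actions[i];
    const local = tallies[hashKey(a.userId) % partitionCount];
    local[a.userId] = (local[a.userId] || 0) + 1;
  }
  return tallies;
}

const userActions: UserAction[] = [
  { userId: "alice", action: "login" },
  { userId: "bob", action: "login" },
  { userId: "alice", action: "view" },
  { userId: "alice", action: "logout" },
];

const partitionTallies = tallyPerPartition(userActions, 4);
// Number of partition-local tallies that contain any state for alice.
const holdersOfAlice = partitionTallies.filter((t) => t["alice"] !== undefined).length;
console.log(holdersOfAlice); // 1
```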

4. Key Length

The partition key is a string. Client libraries and the service may cap its length, so check the limits documented for your SDK version rather than assuming keys can be arbitrarily long. Short keys are also cheaper to transmit with every event and easier to read in logs.

Examples of Partition Keys

Scenario	Good Partition Key Example	Why it's Good	Potential Pitfalls
IoT Device Telemetry	`deviceId`	Ensures all data from one device stays together for ordering and easier per-device processing. High cardinality.	If a single `deviceId` generates an extremely high volume of events, it could create a hot partition.
User Activity Tracking	`userId`	Guarantees all actions by a user are in order within a partition.	Popular users might create hot partitions.
Order Processing	`orderId`	All events related to a single order (e.g., placed, paid, shipped) are partitioned together for ordered processing.	Few: individual orders are unlikely to be a bottleneck, but ordering across different orders is not preserved.
System Events	A combination, e.g., `serverName + processId`	Ensures events from a specific process on a specific server are grouped.	If one server/process is far more active, it could still be an issue.
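
When combining fields into one key, as in the last row, a separator keeps different field pairs from collapsing into the same string. The helper below is a hypothetical sketch; the name `buildPartitionKey` and the `|` separator are assumptions, not part of any Event Hubs API.

```typescript
// Join the fields with a separator so ("web1", 23) and ("web12", 3) stay distinct.
// Plain concatenation would turn both into "web123".
function buildPartitionKey(serverName: string, processId: number): string {
  return serverName + "|" + String(processId);
}

console.log(buildPartitionKey("web1", 23)); // web1|23
console.log(buildPartitionKey("web12", 3)); // web12|3
```

The separator only helps if it cannot appear inside the fields themselves, so pick a character that is not valid in your server names.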

When Not to Use a Partition Key

In some scenarios, you might not need to specify a partition key:

No Ordering Requirement: If the relative order of events doesn't matter, and you simply want to distribute events across partitions for scalability, letting Event Hubs handle the partitioning (by not specifying a key) is appropriate.
Maximizing Producer Throughput: In scenarios where you want to push as many events as possible as quickly as possible and don't have strict ordering requirements, omitting the partition key allows Event Hubs to optimize the distribution.

When no partition key is provided, Event Hubs chooses the partition itself (typically round-robin). This spreads load evenly, at the cost that related events may land on different partitions.
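
Round-robin assignment can be pictured with the following sketch. It is a simplification for illustration, not a description of the service's actual implementation; `createRoundRobin` is a hypothetical helper.

```typescript
// Returns a function that hands out partition IDs 0, 1, ..., n-1, 0, 1, ... in turn.
function createRoundRobin(partitionCount: number): () => number {
  let nextIndex = 0;
  return () => {
    const id = nextIndex;
    nextIndex = (nextIndex + 1) % partitionCount;
    return id;
  };
}

const assignNext = createRoundRobin(4);
const perPartition = [0, 0, 0, 0];
for (let i = 0; i < 8; i++) {
  perPartition[assignNext()] += 1;
}
console.log(perPartition.join(",")); // 2,2,2,2
```

Consecutive events go to different partitions here, which is exactly why keyless sends give even load but no ordering between related events.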

Best Practices Summary

Identify Ordering Requirements: Determine which sets of events *must* be processed in order.
Choose High Cardinality Keys: For distribution, use keys with many unique values.
Monitor Partitions: Keep an eye on partition load to identify and address hot partitions.
Test Your Keys: Experiment with different key strategies to see how they affect your application's performance and reliability.
Document Your Choices: Clearly document why specific partition keys were chosen for different event types.
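
As a starting point for monitoring, a hot partition can be flagged by comparing each partition's event count with the mean. The sketch below assumes you already collect per-partition counts from your own metrics; `findHotPartitions` and the factor of 2 are hypothetical choices, not values recommended by Event Hubs.

```typescript
// Return the indexes of partitions whose count exceeds `factor` times the mean.
function findHotPartitions(eventCounts: number[], factor: number): number[] {
  let total = 0;
  for (let i = 0; i < eventCounts.length; i++) {
    total += eventCounts[i];
  }
  const mean = total / eventCounts.length;
  const hot: number[] = [];
  for (let i = 0; i < eventCounts.length; i++) {
    if (eventCounts[i] > factor * mean) {
      hot.push(i);
    }
  }
  return hot;
}

// Partition 3 carries far more traffic than the others.
console.log(findHotPartitions([100, 110, 95, 900], 2).join(",")); // 3
```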
