Event Hubs Partitions
Partitions are a fundamental concept in Azure Event Hubs that enable high throughput, parallel processing, and ordered delivery of events within a logical stream.
What is a Partition?
An Event Hub is divided into one or more partitions. Each partition is an ordered, immutable sequence of events, and new events are appended to the end in the order they are received. Event Hubs preserves this order within a partition, so consumers read a partition's events in the same order in which they were stored. No ordering is guaranteed across partitions.
Key Characteristics of Partitions:
- Ordered Delivery: Within a partition, events are guaranteed to be ordered. This is crucial for many scenarios where the sequence of events matters.
- Parallel Processing: By distributing events across multiple partitions, Event Hubs lets consumers read events in parallel, which significantly increases throughput. Within a consumer group, each consumer instance can own a subset of the partitions, while every consumer group reads all partitions independently.
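- Scalability: The number of partitions sets the upper bound on how many consumers can read from an Event Hub in parallel. Ingress and egress capacity itself is governed by the throughput or processing units allocated to the namespace, so the partition count should be chosen together with that capacity.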
- Partition Key: When publishing events, you can specify a partition key. Event Hubs hashes this key to determine which partition an event is sent to, so all events with the same partition key are routed to the same partition. This is essential for maintaining event order for related events (e.g., all events for a specific user or device). If no partition key is specified, Event Hubs assigns events to partitions in a round-robin fashion.
- Offset: Each event within a partition has an offset that marks its position in that partition, alongside a sequence number that increases with every event appended. Offsets are unique within a partition but not across the entire Event Hub. Consumers use offsets to track their progress in reading events.
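The append-only nature of a partition and the way a consumer tracks its position can be sketched with a toy in-memory model. This is not Event Hubs SDK code: the `Append` and `ReadAfter` helpers are hypothetical, and the list index only stands in for the position the service assigns to each event.

```csharp
using System;
using System.Collections.Generic;

// Toy model of a single partition: an append-only list of event bodies.
// The list index stands in for the position the service tracks per event.
var partition = new List<string>();

// Appends an event to the end and returns its position in this partition.
long Append(string body)
{
    partition.Add(body);
    return partition.Count - 1;
}

// Returns every event stored after the given checkpoint (-1 means "from the start").
List<string> ReadAfter(long checkpoint)
{
    var start = (int)(checkpoint + 1);
    return partition.GetRange(start, partition.Count - start);
}

Append("door opened");
Append("door closed");
long checkpoint = 1;          // the consumer has processed positions 0 and 1
Append("alarm armed");

// Resuming from the checkpoint yields only the event appended afterwards.
foreach (var body in ReadAfter(checkpoint))
{
    Console.WriteLine(body);  // prints "alarm armed"
}
```

Because events are never modified or reordered once appended, remembering a single position is enough for a consumer to resume where it left off.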
Partitioning Strategy
Choosing the right number of partitions and using partition keys effectively are critical for optimal performance and cost management.
Number of Partitions:
The number of partitions determines the maximum degree of parallelism consumers can achieve when reading from the Event Hub.
- Low Number of Partitions: Suitable for scenarios with lower throughput requirements or when strong ordering across all events is paramount and parallelism is less of a concern.
- High Number of Partitions: Ideal for high-throughput scenarios where parallel processing is key. Consider the number of consumer instances you expect to run. A common guideline is to have at least as many partitions as the maximum number of consumer instances that will read from the hub in parallel.
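To see why the guideline above matters, the following sketch spreads partitions over consumer instances with a simple modulo rule. The `Assign` helper is hypothetical and only illustrates the arithmetic; real load balancing in Event Hubs clients is more involved.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: hands out partition IDs to consumer instances in turn.
List<int>[] Assign(int partitionCount, int consumerCount)
{
    var owned = new List<int>[consumerCount];
    for (var c = 0; c < consumerCount; c++)
    {
        owned[c] = new List<int>();
    }
    for (var p = 0; p < partitionCount; p++)
    {
        owned[p % consumerCount].Add(p);
    }
    return owned;
}

// 4 partitions shared by 6 consumer instances: the last two receive nothing.
var assignment = Assign(4, 6);
for (var i = 0; i < assignment.Length; i++)
{
    Console.WriteLine($"consumer {i}: [{string.Join(", ", assignment[i])}]");
}
```

With fewer partitions than consumer instances, the extra instances sit idle, which is why the partition count should be at least the planned number of parallel readers.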
Partition Key Usage:
Use partition keys when you need to ensure that related events are processed in order.
- Device Telemetry: Use the device ID as the partition key to ensure all telemetry from a specific device is processed sequentially.
- User Actions: Use a user ID to group all actions performed by a user.
- Session IDs: Group events belonging to a particular session.
If you don't specify a partition key, Event Hubs distributes events evenly across all partitions using a round-robin mechanism. This maximizes throughput but does not guarantee ordering for related events.
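The two routing behaviors described in this section can be contrasted in a short simulation. This is a sketch under stated assumptions: the FNV-1a hash is only a stand-in (the hash function Event Hubs uses internally is not part of this document), and `PartitionForKey` and `NextRoundRobin` are hypothetical helpers, not SDK calls.

```csharp
using System;
using System.Text;

const int PartitionCount = 4;
var nextPartition = 0;

// Deterministic FNV-1a hash, used here only as a stand-in for the service's own hash.
uint Fnv1a(string text)
{
    var hash = 2166136261u;
    foreach (var b in Encoding.UTF8.GetBytes(text))
    {
        hash ^= b;
        hash *= 16777619u;
    }
    return hash;
}

// Keyed routing: the same key always maps to the same partition.
int PartitionForKey(string partitionKey)
{
    return (int)(Fnv1a(partitionKey) % PartitionCount);
}

// Keyless routing: each call takes the next partition in turn.
int NextRoundRobin()
{
    var chosen = nextPartition;
    nextPartition = (nextPartition + 1) % PartitionCount;
    return chosen;
}

Console.WriteLine(PartitionForKey("device-42") == PartitionForKey("device-42")); // True
Console.WriteLine($"{NextRoundRobin()} {NextRoundRobin()} {NextRoundRobin()}");  // 0 1 2
```

Keyed routing trades even distribution for per-key ordering: a few very active keys can make one partition much busier than the others.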
Important Note:
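In the Basic and Standard tiers, the number of partitions for an Event Hub is set at creation time and cannot be changed later. If you need more partitions, you must create a new Event Hub with the desired configuration. The Premium and Dedicated tiers allow the partition count to be increased after creation, but never decreased, and adding partitions changes how partition keys map to partitions.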
Partitions and Consumer Groups
A consumer group is a separate view of the entire Event Hub, with its own state and its own read position in each partition. This allows multiple consumer groups (for example, one per downstream application) to read the same event stream independently and at their own pace, without interfering with each other. Within a consumer group, it is recommended that each partition has only one active reader at a time, so the consumer instances in a group divide the partitions among themselves.
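The independence of consumer groups can be illustrated with a small in-memory model in which each group keeps its own position for one shared partition. The group names and the `Read` helper are hypothetical and exist only for this sketch.

```csharp
using System;
using System.Collections.Generic;

// One partition's events, shared by every consumer group.
var partition = new List<string> { "e0", "e1", "e2", "e3" };

// Each consumer group keeps its own read position for the partition.
var positions = new Dictionary<string, int>
{
    ["analytics"] = 0,
    ["archiver"] = 0,
};

// Reads up to `count` events for a group and advances only that group's position.
List<string> Read(string group, int count)
{
    var start = positions[group];
    var take = Math.Min(count, partition.Count - start);
    positions[group] = start + take;
    return partition.GetRange(start, take);
}

Read("analytics", 3);                 // analytics moves ahead to position 3
var archived = Read("archiver", 1);   // archiver still starts at the beginning

Console.WriteLine(string.Join(",", archived));   // e0
Console.WriteLine(positions["analytics"]);       // 3
```

Reading in one group never moves another group's position, so a slow consumer group does not hold back a fast one.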
Example: Publishing with a Partition Key
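// Example using the Azure SDK for .NET (Azure.Messaging.EventHubs package)
using System.Text;
using Azure.Messaging.EventHubs;
using Azure.Messaging.EventHubs.Producer;
await using var producer = new EventHubProducerClient("YOUR_EVENTHUB_CONNECTION_STRING");
var eventData = new EventData(Encoding.UTF8.GetBytes("{\"message\": \"Sensor reading\"}"));
// EventData.PartitionKey is read-only, so the key is supplied through the send options
var sendOptions = new SendEventOptions { PartitionKey = "sensor-123" }; // Events with this key go to the same partition
await producer.SendAsync(new[] { eventData }, sendOptions);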
By setting the PartitionKey, you ensure that all events with the key "sensor-123" will be directed to the same partition, preserving their order.