Partitions
Partitions are the fundamental building blocks of an Azure Event Hubs instance. Each partition is an ordered sequence of events, and new events are appended to the end of that sequence. Because each partition acts as an independent log, events can be published and consumed in parallel.
Key Characteristics of Partitions
- Ordered Sequence: Events within a single partition are always stored and delivered in the order they were received by the Event Hubs service. This ensures predictable processing for related events.
- Independent Log: Each partition operates as an independent stream. Events sent to different partitions are processed independently, enabling horizontal scalability and high throughput.
- Fixed Number: The number of partitions is set when the event hub is created. In the Basic and Standard tiers it cannot be changed afterwards; the Premium and Dedicated tiers allow it to be increased later, but not decreased. This number determines the maximum parallelism for consumers.
- Partition Key: When publishing events, you can optionally specify a partition key. Event Hubs hashes this key to determine which partition the event is written to. Events with the same partition key are guaranteed to go to the same partition.
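The partition key behaviour can be illustrated with a minimal sketch. It assumes a generic hash (SHA-256) taken modulo the partition count; the hash that the Event Hubs service actually uses is internal and is not reproduced here. The function name pick_partition, the key names, and the partition count of 4 are all hypothetical.

```python
import hashlib


def pick_partition(partition_key: str, partition_count: int) -> int:
    """Map a partition key to a partition index.

    Illustrative only: SHA-256 stands in for the service's internal hash.
    """
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then wrap to the partition range.
    return int.from_bytes(digest[:8], "big") % partition_count


PARTITION_COUNT = 4  # hypothetical partition count
keys = ["device-1", "device-2", "device-1", "device-3", "device-1"]

# Every occurrence of the same key resolves to the same partition index.
assignments = [(key, pick_partition(key, PARTITION_COUNT)) for key in keys]
for key, partition in assignments:
    print(f"{key} -> partition {partition}")
```

Because the mapping depends only on the key and the partition count, all "device-1" events land in one partition and keep their relative order there.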
How Partitions Enable Scalability
The partitioning strategy is crucial for Event Hubs' ability to handle large volumes of data. Here's how it works:
- Parallel Publishing: Multiple producers can publish events concurrently to different partitions, maximizing ingress throughput.
- Parallel Consuming: Multiple consumers within a consumer group can read from different partitions simultaneously. This allows consumer applications to scale out and process events faster.
- Load Distribution: By distributing events across partitions, Event Hubs balances the load on the system.
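The points above can be sketched with an in-memory stand-in for an event hub. This is a simulation, not the Event Hubs SDK: the partitions are plain Python lists, keyless events are spread round-robin as a simplifying assumption, and each reader is a thread that owns exactly one partition. The names publish and consume are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import count

PARTITION_COUNT = 4  # hypothetical partition count
partitions = [[] for _ in range(PARTITION_COUNT)]  # one append-only list per partition
next_slot = count()


def publish(body):
    """Append an event to the next partition in round-robin order."""
    index = next(next_slot) % PARTITION_COUNT
    partitions[index].append(body)
    return index


# Publish eight events; each partition receives two of them.
for n in range(8):
    publish(n)


def consume(partition_id):
    """Read one partition from start to end, preserving its order."""
    return partition_id, list(partitions[partition_id])


# One reader per partition, all running concurrently.
with ThreadPoolExecutor(max_workers=PARTITION_COUNT) as pool:
    results = dict(pool.map(consume, range(PARTITION_COUNT)))

for partition_id, events in sorted(results.items()):
    print(partition_id, events)
```

Each reader sees its own partition in publish order (partition 0 holds events 0 and 4, for example), while nothing is implied about the relative order of events held in different partitions.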
Illustrative Diagram
[Figure: Event Hubs partitioning. Events carrying partition keys Key A through Key D are each routed to a partition; events with the same partition key are routed to the same partition.]
Choosing the Number of Partitions
The number of partitions is a critical configuration parameter. Consider the following when deciding:
- Throughput Requirements: Higher throughput generally requires more partitions.
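- Consumer Scalability: The maximum number of consumers that can read in parallel is limited by the number of partitions. The recommended pattern is one active reader per partition within a consumer group, so running more consumer instances than partitions does not increase throughput.
- Partition Key Strategy: A well-designed partition key strategy spreads events evenly across partitions. Keys with many distinct values and similar traffic work well; a small number of dominant keys concentrates load on a few "hot" partitions.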
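The link between partition count and consumer parallelism can be illustrated with a small assignment sketch. It assumes a simple round-robin distribution of partitions over consumer instances; real clients balance ownership dynamically, and the function name assign_partitions and the consumer names are hypothetical.

```python
def assign_partitions(partition_count, consumers):
    """Distribute partition ids over consumer names in round-robin order."""
    assignment = {name: [] for name in consumers}
    for partition_id in range(partition_count):
        owner = consumers[partition_id % len(consumers)]
        assignment[owner].append(partition_id)
    return assignment


# Fewer consumers than partitions: each consumer reads several partitions.
few = assign_partitions(4, ["c1", "c2"])
print(few)

# More consumers than partitions: the extra consumers receive nothing.
many = assign_partitions(4, ["c1", "c2", "c3", "c4", "c5", "c6"])
print(many)
idle = [name for name, owned in many.items() if not owned]
print("idle consumers:", idle)
```

With four partitions and six consumers, two consumers end up with no partition to read, which is why the partition count acts as the ceiling on useful consumer instances in a consumer group.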
Azure Event Hubs offers different pricing tiers with varying maximum partition limits. Always refer to the latest Azure documentation for specific limits and recommendations.
Understanding Partition Offset
Within each partition, every event has an offset that marks its position in the partition's event stream, alongside a monotonically increasing sequence number. Consumers use the offset to track their progress through the partition: as a consumer processes events, it records the offset of the last event it handled, a step known as checkpointing. If the consumer disconnects and reconnects, it can resume reading from the last recorded offset.
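The resume behaviour can be sketched as follows. This is a simplified model, not the Event Hubs SDK: a list index stands in for the offset, and a plain dictionary stands in for a durable checkpoint store. The names checkpoints and read_batch are hypothetical.

```python
# Maps partition id -> position of the next unread event.
# A real system would keep this in durable storage so it survives restarts.
checkpoints = {}


def read_batch(partition_id, events, max_events):
    """Read up to max_events from the saved position, then advance the checkpoint."""
    start = checkpoints.get(partition_id, 0)
    batch = events[start:start + max_events]
    checkpoints[partition_id] = start + len(batch)
    return batch


partition_0 = ["a", "b", "c", "d", "e"]

first = read_batch(0, partition_0, 2)
print(first)

# Simulated disconnect and reconnect: only the checkpoint store is carried over.
second = read_batch(0, partition_0, 2)
print(second)
```

The second call picks up at "c" rather than starting again from "a", because the position was recorded after the first batch.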
Managing Partitions
The number of partitions is configured per event hub, not per namespace. You set it when you create the event hub, and whether it can be changed later depends on the chosen tier. You can manage event hubs and their partition counts through the Azure portal, Azure CLI, Azure PowerShell, or the management SDKs.
Important Note: Events within a partition are ordered, but there is no ordering guarantee across different partitions. If a set of related events must be processed in order, publish them with the same partition key so that they all land in the same partition.