Azure Event Hubs Developer Guide

Understanding Azure Event Hubs Partitions

Partitions are a fundamental concept in Azure Event Hubs. A partition is an ordered sequence of events held in an Event Hub. An Event Hub is divided into one or more partitions, which allows events to be processed in parallel and throughput to scale horizontally. Each partition is a distinct, append-only sequence: new events are added to the end, and events already in the partition are never modified.

Why Partitions?

Partitioning Strategy

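When you create an Event Hub, you specify the number of partitions. This number can be anywhere between 1 and the maximum allowed by your Event Hubs tier (32 on the Basic and Standard tiers, with higher limits on Premium and Dedicated); it does not have to be a power of two. Choosing the right number of partitions is crucial for performance and scalability, because the partition count caps how many consumers in a consumer group can read in parallel.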

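Important: On the Basic and Standard tiers, the number of partitions for an Event Hub cannot be changed after creation. If you need a higher partition count on those tiers, you must create a new Event Hub with the desired partition count and migrate your data. The Premium and Dedicated tiers allow the partition count to be increased after creation, but never decreased. Increasing the count changes how partition keys map to partitions, so plan for it if you depend on per-key ordering.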

Partition Key

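When publishing events, you can optionally specify a partition key. If a partition key is specified, Event Hubs hashes the key to select a partition, so all events that share the same key are written to the same partition. If no partition key (and no explicit partition ID) is specified, Event Hubs distributes events across the available partitions.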

The choice of partition key is critical for achieving balanced load distribution and maintaining ordering when required.

Consider the following when choosing a partition key:

Example: Event Delivery with Partition Key

Let's say you have an Event Hub with 4 partitions and you send events with the following partition keys:

Event Hubs will ensure that:

The order of arrival for events with the same partition key will be preserved within their designated partition.
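
The routing behaviour described above can be illustrated with a small, self-contained sketch. It assumes the 4-partition hub from the example and uses a SHA-256 hash modulo the partition count as a stand-in for the service's own hashing, which is internal to Event Hubs and is not reproduced here. The `assign_partition` and `route` helpers and the sample events are hypothetical names introduced only for illustration.

```python
import hashlib
from collections import defaultdict

PARTITION_COUNT = 4  # matches the 4-partition example above


def assign_partition(partition_key: str, partition_count: int = PARTITION_COUNT) -> int:
    """Map a key to a partition index with a stable hash.

    Illustrative only: this is NOT the hash function Event Hubs uses.
    """
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def route(events):
    """Append each (key, payload) pair to the partition chosen for its key."""
    partitions = defaultdict(list)
    for key, payload in events:
        partitions[assign_partition(key)].append((key, payload))
    return partitions


# Hypothetical sample events: (partition key, payload), in arrival order.
events = [
    ("User123", "login"),
    ("DeviceXYZ", "status=ok"),
    ("User123", "view-page"),
    ("User123", "logout"),
]

for partition_id, items in sorted(route(events).items()):
    print(partition_id, items)
```

Because the hash is deterministic, every "User123" event lands in one partition, in arrival order. Events with different keys may or may not share a partition, and no ordering is implied between them.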

Event Hubs and Consumer Groups

Partitions work in conjunction with consumer groups. Each consumer group reads from all partitions of an Event Hub independently, keeping its own position in each partition. Within a single consumer group, the recommended pattern is to have only one active reader on a given partition at a time. You scale processing by adding instances within a consumer group, each reading different partitions in parallel; separate consumer groups are for giving different applications their own independent view of the same event stream.

If you have an Event Hub with 16 partitions, a single consumer group can, in theory, support up to 16 parallel consumer instances, each dedicated to reading from one partition.
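
To make the 16-partition limit concrete, here is a minimal sketch that spreads partition IDs across consumer instances. It is a static round-robin split written for illustration, under the assumption of one reader per partition; it is not how the Event Hubs SDK balances partition ownership at runtime, and `assign_partitions` is a hypothetical helper.

```python
from typing import Dict, List


def assign_partitions(partition_count: int, consumer_count: int) -> Dict[int, List[int]]:
    """Spread partition IDs round-robin over consumer instances (static sketch)."""
    if consumer_count < 1:
        raise ValueError("consumer_count must be at least 1")
    assignment: Dict[int, List[int]] = {c: [] for c in range(consumer_count)}
    for partition_id in range(partition_count):
        assignment[partition_id % consumer_count].append(partition_id)
    return assignment


# Compare a few instance counts against a 16-partition Event Hub.
for consumers in (4, 16, 20):
    plan = assign_partitions(16, consumers)
    idle = sum(1 for owned in plan.values() if not owned)
    print(f"{consumers} consumers: {consumers - idle} active, {idle} idle")
```

With 4 instances, each one owns 4 partitions; with 16, each owns exactly one; with 20, four instances own nothing and sit idle, which is why adding instances beyond the partition count does not add parallelism.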

Performance Tip: To maximize throughput and parallelism, ensure the number of partitions in your Event Hub is sufficient for your peak load and that your consumer applications are designed to read from multiple partitions concurrently.

Maximum Throughput

The total throughput of an Event Hub is the sum of the throughput across all its partitions. The maximum throughput is also constrained by the Event Hubs pricing tier. Higher tiers offer higher per-partition and total throughput limits.
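
The relationship between per-partition throughput and the tier limit can be sketched as a simple calculation. The figures below are placeholders chosen for the arithmetic, not actual Event Hubs limits, and `effective_throughput` is a hypothetical helper; check the current quota documentation for real numbers.

```python
from typing import Sequence


def effective_throughput(per_partition_mb_s: Sequence[float], tier_cap_mb_s: float) -> float:
    """Sum the per-partition throughput, then apply the tier-wide ceiling."""
    return min(sum(per_partition_mb_s), tier_cap_mb_s)


# Placeholder numbers: 8 partitions each sustaining 1.0 MB/s.
partitions = [1.0] * 8

print(effective_throughput(partitions, tier_cap_mb_s=4.0))   # tier cap is the bottleneck
print(effective_throughput(partitions, tier_cap_mb_s=20.0))  # partitions are the bottleneck
```

In the first call the tier ceiling limits the result, so adding partitions would not help; in the second the partitions themselves are the limit, so a higher tier alone would not help.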

Example: Publishing Events with a Partition Key

import asyncio
import os

from azure.eventhub import EventData
from azure.eventhub.aio import EventHubProducerClient

# Fail fast with a KeyError if the connection string is not configured.
EVENT_HUB_CONNECTION_STR = os.environ["EVENT_HUB_CONNECTION_STR"]
EVENT_HUB_NAME = "my-event-hub"


async def send_events():
    producer = EventHubProducerClient.from_connection_string(
        EVENT_HUB_CONNECTION_STR, eventhub_name=EVENT_HUB_NAME
    )
    async with producer:
        # The partition key is set on the batch, not in the event's
        # application properties; every event in a batch shares that key.
        user_batch = await producer.create_batch(partition_key="User123")
        user_batch.add(EventData("User login event data"))

        device_batch = await producer.create_batch(partition_key="DeviceXYZ")
        device_batch.add(EventData("Device status update data"))

        await producer.send_batch(user_batch)
        await producer.send_batch(device_batch)
        print("Events sent with partition keys.")


if __name__ == "__main__":
    asyncio.run(send_events())

Understanding and correctly utilizing partitions is key to building scalable and performant event-driven applications with Azure Event Hubs.