Core Concepts of Azure Event Hubs

Event Hubs Overview

Azure Event Hubs is a big data streaming platform and event-ingestion service. It can be used for real-time analytics, continuous data processing, and command-and-control scenarios. Event Hubs handles millions of events per second, allowing you to build applications that react to events as they happen.

The key concepts of Event Hubs are described in the sections that follow.

Producer-Consumer Model

Event Hubs operates on a classic producer-consumer model:

[Diagram: the producer-consumer model for Event Hubs.]

Event Hub

An event hub is the central entity in Event Hubs. It acts as a highly scalable publish-subscribe message broker and holds an ordered collection of event data. You can think of it as an append-only log of events rather than a database table: events are appended in arrival order and read sequentially, never updated in place. Data sent to an event hub is stored for a configurable retention period before being deleted or archived.

When you create an Event Hubs namespace, you can then create one or more event hubs within that namespace. Each event hub has its own configuration, including retention period and partitioning strategy.
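The "append-only log with retention" behavior can be illustrated with a small conceptual model. This is a sketch only, not the Azure SDK; the class name, method names, and explicit `prune` step are illustrative (the service expires old events automatically).

```python
import time

class EventHubModel:
    """Toy model of an event hub: an append-only log with time-based
    retention. A conceptual sketch, not the Azure SDK."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._log = []  # (enqueued_time, body) pairs, in arrival order

    def send(self, body, now=None):
        # Events are only ever appended, never updated in place.
        self._log.append((now if now is not None else time.time(), body))

    def prune(self, now=None):
        # The service expires events past the retention period on its own;
        # here we do it explicitly for illustration.
        now = now if now is not None else time.time()
        cutoff = now - self.retention_seconds
        self._log = [(t, b) for (t, b) in self._log if t >= cutoff]

    def events(self):
        return [b for (_, b) in self._log]
```

For example, with a one-hour retention window, an event sent at time 0 is gone after a prune at time 4100, while an event sent at time 4000 survives.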

Partitions

Partitions are the fundamental unit of ordering and data storage within an event hub. An event hub is divided into one or more partitions. Data sent to an event hub is distributed across these partitions.

The number of partitions is chosen at the time of event hub creation and affects both scalability and cost. In the premium and dedicated tiers you can increase the partition count later, but you can never decrease it; in the basic and standard tiers the count is fixed once the event hub is created.
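One useful property of partitioning is that events sharing a partition key always land in the same partition, which preserves their relative order. Azure uses its own internal hash for this; the sketch below just demonstrates the stable-mapping property with an ordinary hash, and the function name is hypothetical.

```python
import hashlib

def partition_for_key(partition_key: str, partition_count: int) -> int:
    """Map a partition key to a partition index. Not Azure's actual hash;
    this only illustrates that the same key always maps to the same
    partition, so per-key ordering is preserved."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count
```

Sending all events for `"device-42"` with that partition key guarantees they are appended to a single partition in order.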

Consumer Groups

A consumer group is a view of the event hub that allows multiple independent applications or services to read the same stream without interfering with each other. Readers in each consumer group track their own position (offset) in each partition, so one application's progress never affects another's.

Tip: A common pattern is to have one consumer group for each application that needs to process the event stream.
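The independence of consumer groups can be sketched as follows. This is a conceptual model, not the Azure SDK: `PartitionedLog` and `ConsumerGroup` are hypothetical names, and real consumers checkpoint offsets to external storage rather than keeping them in memory.

```python
class PartitionedLog:
    """Minimal stand-in for an event hub's partitioned storage."""
    def __init__(self, partition_count: int):
        self.partitions = [[] for _ in range(partition_count)]

    def append(self, partition: int, event):
        self.partitions[partition].append(event)

class ConsumerGroup:
    """Each group holds its own offset per partition, so two groups can
    read the same log at different speeds without interfering."""
    def __init__(self, log: PartitionedLog):
        self.log = log
        self.offsets = [0] * len(log.partitions)

    def read(self, partition: int, max_events: int = 10):
        start = self.offsets[partition]
        events = self.log.partitions[partition][start:start + max_events]
        self.offsets[partition] += len(events)  # advance only this group's offset
        return events
```

Two groups reading the same partition each start from offset 0 and advance independently, which is exactly why one consumer group per application is the recommended pattern.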

Events

An event is the fundamental unit of data processed by Event Hubs. An event is a record of something that has happened, together with its associated data. The maximum event size is 1 MB in the standard and higher tiers (256 KB in the basic tier), and a batch of events is subject to the same size limit as a single event.

An event typically consists of a body (the payload), optional user-defined properties, and system properties set by the service, such as the offset, sequence number, and enqueued time.

When an event is sent to Event Hubs, it is assigned an offset within a specific partition, which acts as its unique identifier within that partition.
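The event structure and per-partition offset assignment can be modeled as below. This is a simplified sketch, not the SDK's `EventData` type: real offsets are byte positions within the partition's log, which we simplify to sequential indexes here.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Event:
    body: bytes
    properties: dict = field(default_factory=dict)  # user-defined properties
    # System properties, stamped by the service on enqueue:
    offset: Optional[int] = None
    sequence_number: Optional[int] = None
    enqueued_time: Optional[float] = None

class Partition:
    """Sketch: the broker stamps each event with per-partition system
    properties as it is appended. Real offsets are byte positions; we
    use the sequence number as the offset for simplicity."""
    def __init__(self):
        self._events = []

    def enqueue(self, event: Event) -> Event:
        event.sequence_number = len(self._events)
        event.offset = event.sequence_number  # simplification, see docstring
        event.enqueued_time = time.time()
        self._events.append(event)
        return event
```

Within one partition, offsets are strictly increasing, which is what makes an offset a unique identifier for an event in that partition.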

Throughput Units (TUs)

Throughput Units (TUs) are the primary mechanism for provisioning capacity and managing throughput for Event Hubs. A TU is a unit of aggregated throughput: each TU allows up to 1 MB per second (or 1,000 events per second) of ingress (incoming) traffic and up to 2 MB per second of egress (outgoing) traffic.

When you provision TUs for an event hub, you are allocating a certain amount of ingress and egress capacity. The number of partitions you choose also influences how this capacity is utilized. For example, if you have 10 partitions and 4 TUs, the total ingress capacity is 4 MB/sec or 4000 events/sec, and this capacity is distributed across the partitions.
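The arithmetic behind the example above is straightforward; the helper below simply scales the published per-TU limits (1 MB/s or 1,000 events/s ingress, 2 MB/s egress), and its name and return shape are illustrative.

```python
def capacity(throughput_units: int) -> dict:
    """Aggregate capacity from the per-TU limits:
    1 MB/s or 1,000 events/s ingress, and 2 MB/s egress, per TU."""
    return {
        "ingress_mb_per_sec": 1 * throughput_units,
        "ingress_events_per_sec": 1000 * throughput_units,
        "egress_mb_per_sec": 2 * throughput_units,
    }
```

With 4 TUs this gives 4 MB/sec (or 4,000 events/sec) of ingress, matching the example, regardless of how many partitions share that capacity.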

Note: For very high-scale scenarios, Event Hubs offers auto-inflate, which automatically increases the number of TUs as load grows (up to a maximum you configure), and dedicated clusters for predictable, high-demand workloads.

Event Hubs Capture

Event Hubs Capture is a built-in feature that automatically and continuously streams event data from an event hub to an Azure Storage account (Blob Storage or Data Lake Storage Gen2).

Key features of Event Hubs Capture: it requires no additional code to run, scales automatically with your throughput units, writes data in the Apache Avro format, and lets you control how often data is written with configurable time and size windows.

This feature is ideal for scenarios where you need to retain historical event data for compliance, auditing, or batch analytics.
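Capture's windowing policy (flush a batch when either the time window or the size window is reached, whichever comes first) can be sketched as below. This is a conceptual model only; the class name is hypothetical and the window bounds used in the example are illustrative, not Azure defaults.

```python
class CaptureWindow:
    """Sketch of Capture's flush policy: a batch is written to storage
    when either the configured time window or size window is reached,
    whichever comes first."""

    def __init__(self, max_seconds: float, max_bytes: int):
        self.max_seconds = max_seconds
        self.max_bytes = max_bytes
        self.start_time = 0.0
        self.buffered_bytes = 0

    def add(self, event_size: int, now: float) -> bool:
        """Buffer one event; return True if the batch should be flushed."""
        self.buffered_bytes += event_size
        return (now - self.start_time >= self.max_seconds
                or self.buffered_bytes >= self.max_bytes)

    def flush(self, now: float):
        """Start a new window after the batch is written to storage."""
        self.start_time = now
        self.buffered_bytes = 0
```

A quiet stream is flushed when the time window elapses, while a busy stream is flushed as soon as the size window fills, so captured files stay both timely and bounded in size.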