Core Concepts of Azure Event Hubs
Azure Event Hubs is a highly scalable data streaming platform and event ingestion service that can help you build real-time data pipelines and streaming applications. Understanding its core concepts is crucial for effective utilization.
Event Hubs Namespace
An Event Hubs namespace is a management container for one or more Event Hubs and their associated resources. Each namespace has a globally unique DNS name of the form <namespace>.servicebus.windows.net (in the public Azure cloud), which serves as the network endpoint for the Event Hubs inside it.
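To make the DNS name concrete, here is a minimal sketch that derives a namespace's host name. It assumes the public Azure cloud suffix servicebus.windows.net; the helper name namespaceEndpoint and the sample namespace are hypothetical, and the validation is a loose sanity check rather than the service's full naming rules.

```javascript
// Hypothetical helper: build the fully qualified domain name of a namespace.
// Assumes the public Azure cloud DNS suffix (other clouds use different suffixes).
function namespaceEndpoint(namespaceName) {
  // Loose check only: letters, digits, and hyphens, starting with a letter.
  if (!/^[A-Za-z][A-Za-z0-9-]*[A-Za-z0-9]$/.test(namespaceName)) {
    throw new Error(`Invalid namespace name: ${namespaceName}`);
  }
  return `${namespaceName.toLowerCase()}.servicebus.windows.net`;
}

console.log(namespaceEndpoint("contoso-telemetry"));
// contoso-telemetry.servicebus.windows.net
```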
Event Hub
An Event Hub is the primary entity within an Event Hubs namespace. It's a managed stream of events. Data sent to an Event Hub is typically modeled as a feed of events. An Event Hub can have one or more partitions.
Partition
A partition is an ordered sequence of events within an Event Hub. New events are appended to the end of a partition, and Event Hubs preserves their order within that partition but not across partitions. Consumers can read each partition independently, which is what makes partitions the key to scalability: the event stream can be processed in parallel, one reader per partition.
- Events are sent to a specific partition either by using a partition key or by round-robin load balancing if no partition key is specified.
- Partition IDs are integers starting from 0.
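The routing rules above can be sketched in a few lines. This is an illustrative model, not the service's implementation: the hash below is a simple FNV-1a style function chosen for the example, whereas Event Hubs uses its own internal hash, so real partition assignments will differ. The names hashKey and createRouter are hypothetical.

```javascript
// Illustrative string hash (FNV-1a style, 32-bit). Not the hash Event Hubs uses.
function hashKey(key) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep the value an unsigned 32-bit int
  }
  return hash;
}

// Returns a function that picks a partition index for each event.
function createRouter(partitionCount) {
  let next = 0; // round-robin cursor for events sent without a key
  return function route(partitionKey) {
    if (partitionKey !== undefined) {
      return hashKey(partitionKey) % partitionCount; // same key, same partition
    }
    const id = next;
    next = (next + 1) % partitionCount;
    return id;
  };
}

const route = createRouter(4);
console.log(route("user-123") === route("user-123")); // true
console.log([route(), route(), route(), route(), route()]); // [ 0, 1, 2, 3, 0 ]
```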
Producer
A producer is an application or service that sends events to an Event Hub. Producers can send events to a specific partition or let Event Hubs decide the partition through load balancing or a partition key.
Consumer
A consumer is an application or service that reads events from an Event Hub. Consumers read events from one or more partitions. To process events in parallel, you can have multiple consumer instances reading from different partitions.
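As a rough illustration of parallel consumption, the sketch below statically spreads partitions across consumer instances so that each partition has exactly one reader. The helper assignPartitions and the worker names are hypothetical; in practice the Azure SDK processor clients balance partition ownership dynamically, so this only shows the idea.

```javascript
// Hypothetical helper: distribute partition IDs across consumer instances
// in turn, so the load is as even as possible.
function assignPartitions(partitionIds, consumerNames) {
  const assignment = new Map(consumerNames.map((name) => [name, []]));
  partitionIds.forEach((id, index) => {
    const owner = consumerNames[index % consumerNames.length];
    assignment.get(owner).push(id);
  });
  return assignment;
}

const plan = assignPartitions(["0", "1", "2", "3"], ["worker-a", "worker-b"]);
console.log(plan.get("worker-a")); // [ '0', '2' ]
console.log(plan.get("worker-b")); // [ '1', '3' ]
```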
Consumer Group
A consumer group is a logical view of the entire Event Hub. Each consumer group allows an independent read from the event stream. This enables multiple applications (or different parts of the same application) to read from the same Event Hub without interfering with each other.
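The independence of consumer groups can be modeled with one read position per group over the same stored events. This is a minimal in-memory sketch: $Default is the name of the built-in consumer group, while the analytics group, the partitionLog array, and the readNext helper are hypothetical.

```javascript
// In-memory model of a single partition.
const partitionLog = ["e0", "e1", "e2", "e3"];

// Each consumer group keeps its own read position over the same events.
const positions = { "$Default": 0, analytics: 0 };

function readNext(group, maxEvents) {
  const start = positions[group];
  const batch = partitionLog.slice(start, start + maxEvents);
  positions[group] = start + batch.length; // advances this group only
  return batch;
}

console.log(readNext("$Default", 3)); // [ 'e0', 'e1', 'e2' ]
console.log(readNext("analytics", 1)); // [ 'e0' ]
console.log(readNext("$Default", 3)); // [ 'e3' ]
```

Reading through one group never moves the other group's position, which is why two applications can consume the same Event Hub at different speeds.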
Offset
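An offset identifies the position of an event within a partition and acts as a cursor into the stream. Consumers track their progress by recording (checkpointing) the offset of the last event they have successfully processed, so they can resume from that point after a restart. Event Hubs retains events for a configurable period: the default is 1 day, with a maximum of 7 days in the Standard tier and up to 90 days in the Premium and Dedicated tiers. After the retention period expires, events are discarded, whether or not they have been read.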
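The interaction between checkpointed offsets and retention can be sketched with a small in-memory model. Everything here is a simplifying assumption: offsets are shown as small consecutive integers, timestamps are plain millisecond counts, and the helpers retained and resumeFrom are hypothetical.

```javascript
const HOUR = 60 * 60 * 1000;

// Hypothetical partition contents: offset, enqueue time, and payload.
const events = [
  { offset: 0, enqueuedAt: 0 * HOUR, body: "a" },
  { offset: 1, enqueuedAt: 10 * HOUR, body: "b" },
  { offset: 2, enqueuedAt: 30 * HOUR, body: "c" },
];

// Keep only events that are still inside the retention period.
function retained(log, now, retentionMs) {
  return log.filter((e) => now - e.enqueuedAt <= retentionMs);
}

// Resume reading after the last checkpointed offset.
function resumeFrom(log, checkpointOffset) {
  return log.filter((e) => e.offset > checkpointOffset);
}

const now = 32 * HOUR;
const available = retained(events, now, 24 * HOUR); // offset 0 has expired
console.log(resumeFrom(available, 1).map((e) => e.body)); // [ 'c' ]
console.log(resumeFrom(available, -1).map((e) => e.body)); // [ 'b', 'c' ]
```

The second call shows a consumer with no useful checkpoint: it can only start from the oldest event that is still retained.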
Partition Key
A partition key is a string value that the producer attaches to an event to control which partition it is written to. Event Hubs hashes the key to select a partition, so events with the same partition key are written to the same partition, in the order they arrive. This is useful when you need to maintain ordering for a specific entity, such as all events for a particular user or device.
Example: Using Partition Key
If you send events with the partition key user-123, all those events will go to the same partition. This ensures that processing for user-123 happens in order.
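// Sketch using the @azure/event-hubs JavaScript SDK.
// Assumes `producer` is an EventHubProducerClient and `event` is the payload to send.
await producer.sendBatch([{ body: event }], { partitionKey: 'user-123' });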
Throughput Units (TUs)
Throughput Units (TUs) are the provisioned capacity units for Event Hubs. One TU provides up to 1 MB per second of ingress (or 1,000 events per second, whichever comes first) and up to 2 MB per second of egress (or 4,096 events per second). In the Standard tier, you can scale the number of TUs up or down to match your application's throughput needs. The Premium and Dedicated tiers use different capacity units (processing units and capacity units, respectively).
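A quick way to size a namespace is to compute how many TUs each dimension of the load requires and take the largest. The sketch below assumes the per-TU limits of 1 MB/s ingress and 2 MB/s egress, plus the per-second event limits of 1,000 (ingress) and 4,096 (egress) from the Azure documentation; the helper requiredThroughputUnits and the sample workload are hypothetical.

```javascript
// Per-TU limits assumed for the Standard tier.
const PER_TU = { ingressMB: 1, ingressEvents: 1000, egressMB: 2, egressEvents: 4096 };

// Hypothetical helper: smallest TU count that covers every dimension of the load.
// All inputs are per-second rates.
function requiredThroughputUnits({ ingressMB, ingressEvents, egressMB, egressEvents }) {
  const needed = Math.max(
    ingressMB / PER_TU.ingressMB,
    ingressEvents / PER_TU.ingressEvents,
    egressMB / PER_TU.egressMB,
    egressEvents / PER_TU.egressEvents
  );
  return Math.max(1, Math.ceil(needed)); // at least one TU is always provisioned
}

console.log(
  requiredThroughputUnits({ ingressMB: 3.5, ingressEvents: 2000, egressMB: 9, egressEvents: 4000 })
); // 5
```

In this example egress is the bottleneck: 9 MB/s at 2 MB/s per TU needs 4.5 TUs, which rounds up to 5.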
Capture
Event Hubs Capture is a feature that enables you to automatically capture the event stream from an Event Hub and save it to an Azure Storage account (Blob Storage or Data Lake Storage Gen2). Captured events are stored in Apache Avro format. This is ideal for archival, batch processing, or further analysis with tools like Azure Databricks or Azure Synapse Analytics.
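When reading captured data back, it helps to know where the Avro files land. The sketch below builds a blob name under the assumption that Capture uses its default naming convention of {Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}; the format is configurable, so verify it against your own Capture settings. The helper captureBlobName and the sample names are hypothetical.

```javascript
// Left-pad a number with zeros to a fixed width.
function pad(value, width) {
  return String(value).padStart(width, "0");
}

// Hypothetical helper: blob name for a capture window, assuming the default
// naming convention and UTC timestamps.
function captureBlobName(namespace, eventHub, partitionId, windowStart) {
  const d = windowStart;
  return [
    namespace,
    eventHub,
    partitionId,
    d.getUTCFullYear(),
    pad(d.getUTCMonth() + 1, 2), // getUTCMonth() is zero-based
    pad(d.getUTCDate(), 2),
    pad(d.getUTCHours(), 2),
    pad(d.getUTCMinutes(), 2),
    pad(d.getUTCSeconds(), 2),
  ].join("/") + ".avro";
}

const windowStart = new Date(Date.UTC(2017, 11, 8, 3, 3, 17));
console.log(captureBlobName("mynamespace", "myeventhub", "0", windowStart));
// mynamespace/myeventhub/0/2017/12/08/03/03/17.avro
```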