Core Concepts of Azure Event Hubs
Azure Event Hubs is a highly scalable data streaming platform and event ingestion service that can receive and process millions of events per second. Understanding its core concepts is essential to using the service effectively.
Event Hub
An event hub is the central entity for event ingestion: a partitioned stream of events, analogous to a Kafka topic. Each event hub has a name, and producers send events to that specific hub. Within an event hub, data is organized into partitions; you can send events to a specific partition or let Event Hubs choose one based on a partition key.
Producer
A producer is any application or service that sends event data to an Event Hub. Producers can send events to a specific partition or allow Event Hubs to choose a partition for them. They typically use SDKs provided by Azure for various programming languages.
Consumer
A consumer is any application or service that reads event data from an Event Hub. Consumers typically read events in an ordered fashion within a partition. To manage reading from partitions, consumers often use consumer groups.
Partition
Partitions are the fundamental unit of parallelism in Event Hubs. They provide ordered, immutable sequences of events. The number of partitions is set when an Event Hub is created and cannot be changed later. Event Hubs distribute incoming events across these partitions. Consumers can read from multiple partitions concurrently to achieve higher throughput. Events within a partition are guaranteed to be ordered.
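The partition model above can be sketched with a tiny in-memory analogue. This is an illustration only, not how the service is implemented: events land in a fixed set of ordered logs, and readers consume each log independently.

```javascript
// Minimal in-memory sketch of a partitioned event log (illustration only).
class PartitionedLog {
  constructor(partitionCount) {
    // Partition count is fixed at creation, mirroring Event Hubs.
    this.partitions = Array.from({ length: partitionCount }, () => []);
    this.next = 0; // round-robin counter for key-less sends
  }
  append(event) {
    // Without a partition key, distribute events across partitions.
    const id = this.next++ % this.partitions.length;
    this.partitions[id].push(event);
    return id;
  }
  read(partitionId) {
    // Events within a partition come back in the order they were appended.
    return [...this.partitions[partitionId]];
  }
}

const log = new PartitionedLog(4);
["a", "b", "c", "d", "e"].forEach((e) => log.append(e));
console.log(log.read(0)); // partition 0 received "a" then "e", in order
```

Reading partitions 0 through 3 concurrently recovers all five events, which is how consumers scale out their throughput.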
Consumer Group
A consumer group allows multiple applications or multiple instances of the same application to read from an Event Hub independently. Each consumer group maintains its own offset within each partition. This means that different consumer groups can process the same events without interfering with each other. For example, one consumer group might be for real-time analytics, while another might be for archiving.
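The independence of consumer groups comes down to each group keeping its own offset per partition. A minimal sketch (illustrative, not the SDK's API):

```javascript
// Illustration: two consumer groups read the same partition independently.
const partition = ["e0", "e1", "e2", "e3"]; // one partition's ordered events

// Each consumer group keeps its own offset (position) in the partition.
const offsets = { analytics: 0, archiver: 0 };

function poll(group, count) {
  const start = offsets[group];
  const events = partition.slice(start, start + count);
  offsets[group] += events.length; // advance only this group's offset
  return events;
}

poll("analytics", 3);             // analytics is now at offset 3
console.log(poll("archiver", 2)); // archiver still starts at 0: [ 'e0', 'e1' ]
```

Because neither group mutates the partition itself, the real-time analytics reader and the archiver never interfere with each other.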
Offset
An offset is a value that identifies the position of an event within a partition. It is assigned by Event Hubs and acts as a bookmark for consumers: consumers track the last offset they have processed for each partition within their consumer group, and when a consumer restarts it can resume reading from the last known offset.
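The restart behavior can be sketched as follows. This is a conceptual simulation, assuming a persisted checkpoint survives the crash; the real service stores checkpoints externally (for example, in Azure Blob Storage when using the SDK's checkpoint store).

```javascript
// Illustration: resuming from a checkpointed offset after a restart.
const partition = ["e0", "e1", "e2", "e3", "e4"];
let checkpoint = null; // last processed offset, persisted per partition per group

function processFrom(startOffset) {
  const processed = [];
  for (let offset = startOffset; offset < partition.length; offset++) {
    processed.push(partition[offset]);
    checkpoint = offset; // record progress, as a real checkpoint store would
    if (processed.length === 3) break; // simulate a crash after three events
  }
  return processed;
}

processFrom(0);                           // processes e0..e2, checkpoint = 2
console.log(processFrom(checkpoint + 1)); // resumes at e3: [ 'e3', 'e4' ]
```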
Partition Key
When a producer sends an event, it can optionally include a partition key. If a partition key is provided, Event Hubs uses a hash of the key to determine which partition the event belongs to. This ensures that all events with the same partition key are sent to the same partition. This is useful for scenarios where event order within a specific logical group (e.g., all events from a single device) is important.
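The hash-based routing can be sketched with a simple hash function. This is not the service's actual hashing algorithm, only an illustration of the property that matters: the same key always lands on the same partition.

```javascript
// Illustration: hashing a partition key to pick a partition
// (NOT the service's actual hash algorithm).
function partitionFor(key, partitionCount) {
  let hash = 0;
  for (const ch of key) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // simple 32-bit rolling hash
  }
  return hash % partitionCount;
}

// The same key always maps to the same partition, preserving per-key order.
console.log(partitionFor("device-42", 4) === partitionFor("device-42", 4)); // true
```

With the @azure/event-hubs SDK, the key is supplied via the partitionKey option when creating a batch, e.g. producer.createBatch({ partitionKey: "device-42" }).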
Capture
Event Hubs Capture is a built-in feature that automatically and incrementally captures the streaming data in an event hub and writes it to an Azure Blob Storage or Azure Data Lake Storage account. This is ideal for archival, batch processing, or reprocessing scenarios.
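Captured data is written as Avro files using a configurable naming convention; the default format documented for the service is:

```text
{Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}
```

This layout makes it straightforward for downstream batch jobs to locate data by partition and time window.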
Throughput Units (TUs) / Processing Units (PUs)
Event Hubs capacity is measured in Throughput Units (TUs) in the Standard tier and Processing Units (PUs) in the Premium tier. TUs provide a predictable level of throughput for ingress and egress; PUs offer a similar concept with bundled compute and memory resources for enhanced, more isolated performance. You scale an Event Hubs namespace by increasing the number of TUs or PUs.
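As a rough sizing sketch, using the published Standard-tier figures (one TU covers up to 1 MB/s or 1,000 events/s of ingress, whichever limit is reached first), the TU count can be estimated like this:

```javascript
// Rough TU estimate for the Standard tier (published limits: 1 TU covers
// 1 MB/s or 1,000 events/s of ingress, whichever is reached first).
function estimateTUs(ingressMBps, ingressEventsPerSec) {
  const byBytes = Math.ceil(ingressMBps / 1);
  const byCount = Math.ceil(ingressEventsPerSec / 1000);
  return Math.max(byBytes, byCount, 1);
}

console.log(estimateTUs(3.5, 2500)); // 4 — the byte rate is the binding constraint
```

Egress limits (higher per TU than ingress) should be checked separately against the current service quotas.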
Schema Registry
Azure Schema Registry, a feature hosted within an Event Hubs namespace, is often used in conjunction with Event Hubs for managing and validating event schemas. This helps ensure data consistency and interoperability between producers and consumers.
// Example of sending an event (conceptual)
const { EventHubProducerClient } = require("@azure/event-hubs");

async function sendEvent(connectionString, eventHubName, message) {
  const producer = new EventHubProducerClient(connectionString, eventHubName);
  try {
    const batch = await producer.createBatch();
    // tryAdd returns false if the event would exceed the batch size limit
    if (!batch.tryAdd({ body: message })) {
      throw new Error("Event is too large to fit in a batch");
    }
    await producer.sendBatch(batch);
  } finally {
    await producer.close();
  }
}
// Example of receiving events (conceptual)
const { EventHubConsumerClient } = require("@azure/event-hubs");

async function receiveEvents(connectionString, eventHubName, consumerGroupName) {
  const consumerClient = new EventHubConsumerClient(
    consumerGroupName,
    connectionString,
    eventHubName
  );
  const subscription = consumerClient.subscribe({
    async processEvents(events, context) {
      for (const event of events) {
        console.log(`Received event: ${JSON.stringify(event.body)}`);
      }
    },
    async processError(err, context) {
      console.error(`Error occurred: ${err.message}`);
    }
  });
  // To stop receiving, call subscription.close() and then consumerClient.close()
}