Azure Event Hubs

Developer's Guide: Core Concepts

Understanding Core Concepts

Azure Event Hubs is a highly scalable data streaming platform and event ingestion service. It can receive and process millions of events per second. Understanding its core concepts is crucial for designing and building efficient event-driven applications.

Event Hubs Namespace

An Event Hubs namespace is a logical container for Event Hubs. It provides a unique FQDN (fully qualified domain name) and is the administrative boundary for managing access policies, geo-disaster recovery configurations, and other settings.

  • Think of a namespace as a "project" or "application" scope for your event streams.
  • All Event Hubs within a namespace share the namespace's FQDN; an individual Event Hub is addressed by its name beneath that FQDN.
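
The FQDN format can be sketched in a couple of lines. The namespace name below is hypothetical; the `servicebus.windows.net` suffix is the standard one for the Azure public cloud (sovereign clouds use different suffixes):

```python
# Hypothetical namespace name; "servicebus.windows.net" is the
# standard FQDN suffix in the Azure public cloud.
namespace = "my-namespace"
fqdn = f"{namespace}.servicebus.windows.net"

# Individual Event Hubs are addressed by name under the namespace FQDN.
user_stream_path = f"{fqdn}/user-activity-stream"

print(fqdn)  # my-namespace.servicebus.windows.net
```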

Event Hub

An Event Hub is the entity within a namespace that actually stores the event data. It's the core component that producers write events to and consumers read events from.

  • An Event Hub is configured with a number of partitions, which determine its parallelism and throughput capacity.
  • You can think of an Event Hub as a specific data stream (e.g., "user-activity-stream", "iot-device-telemetry").

Partitions

Partitions are ordered sequences of events within an Event Hub. They are the unit of parallelism in Event Hubs.

  • When you create an Event Hub, you specify the number of partitions. In the premium and dedicated tiers the partition count can be increased later, but it can never be decreased.
  • Events sent to an Event Hub are distributed across its partitions.
  • Producers can target a specific partition by ID, influence placement with a partition key, or let Event Hubs distribute events automatically.
  • Consumers can read from one or more partitions in parallel.
  • The number of partitions directly impacts the maximum throughput and concurrency of your Event Hub.

Partitioning Strategy:

  • Partition Key: If you send an event with a partition key, all events with the same key will be directed to the same partition. This is useful for maintaining event order for a specific entity (e.g., all events for user ID '123' go to the same partition).
  • Round-Robin: If no partition key is specified, Event Hubs distributes events in a round-robin fashion across available partitions.
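
Both strategies can be illustrated with a small simulation. The hash function below is a stand-in, not the service's actual algorithm (that is an internal detail of Event Hubs); what matters is that any stable hash sends the same key to the same partition:

```python
from itertools import cycle

NUM_PARTITIONS = 4

def partition_for_key(partition_key: str) -> int:
    # Stand-in for the service-side hash; any stable hash
    # demonstrates the key-to-partition affinity.
    return sum(partition_key.encode()) % NUM_PARTITIONS

# Keyed events: the same key always lands on the same partition,
# preserving per-key ordering (e.g., all events for user '123').
assert partition_for_key("user-123") == partition_for_key("user-123")

# Unkeyed events: distributed round-robin across the partitions.
round_robin = cycle(range(NUM_PARTITIONS))
placements = [next(round_robin) for _ in range(6)]
print(placements)  # [0, 1, 2, 3, 0, 1]
```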

Consumer Groups

A consumer group is an abstraction that allows multiple independent applications or services to read from an Event Hub without interfering with each other. Each consumer group maintains its own offset within each partition.

  • The default consumer group is named $Default.
  • You can create custom consumer groups for different applications (e.g., "real-time-dashboard-cg", "batch-processing-cg").
  • Each consumer group tracks its own progress, so one application reading from a consumer group won't affect another application reading from a different consumer group.

Key Benefits:

  • Multiple applications can consume the same event stream concurrently.
  • Independent consumption: One consumer's failure or restart doesn't impact others.
  • Allows for different processing logic for the same data.
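
The independence of consumer groups comes down to each group holding its own offset into the same log. A minimal sketch of that bookkeeping (the group names beyond $Default are hypothetical):

```python
# A partition as an append-only log; each consumer group keeps its
# own offset, so readers never interfere with each other.
partition_log = [f"event-{i}" for i in range(5)]

offsets = {"$Default": 0, "real-time-dashboard-cg": 0}

def read_next(group: str):
    pos = offsets[group]
    if pos >= len(partition_log):
        return None  # caught up with the stream
    offsets[group] = pos + 1
    return partition_log[pos]

# The dashboard group races ahead...
for _ in range(4):
    read_next("real-time-dashboard-cg")

# ...without moving $Default's position at all.
print(offsets)  # {'$Default': 0, 'real-time-dashboard-cg': 4}
```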

Events

An event is a lightweight record containing information. In Event Hubs, event bodies are typically JSON- or Avro-serialized payloads, but the service treats the body as opaque bytes and accepts any format.

  • Events consist of a body, properties (key-value pairs), and system properties (like offset, sequence number, timestamp).
  • When events are sent to an Event Hub, they are appended to a partition.

Producers and Consumers

Producers: Applications or services that send event data to an Event Hub.

Consumers: Applications or services that read event data from an Event Hub, typically through a specific consumer group.

Event Hubs supports various SDKs for .NET, Java, Python, Go, and JavaScript, along with AMQP and Kafka protocols, enabling diverse producer and consumer implementations.
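
The two roles share nothing but the hub itself, which is what decouples them. A minimal in-memory sketch of that relationship (real code would use one of the SDKs above against a live namespace):

```python
import queue

# A single partition modeled as a thread-safe queue; the producer and
# consumer interact only through the hub, never with each other.
hub_partition: "queue.Queue[bytes]" = queue.Queue()

def produce(body: bytes) -> None:
    """Producer role: append an event to the stream."""
    hub_partition.put(body)

def consume() -> bytes:
    """Consumer role: read the next event from the stream."""
    return hub_partition.get()

produce(b"temperature=21.5")
print(consume())  # b'temperature=21.5'
```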

Throughput and Scaling

Event Hubs is designed for high throughput. You can scale your Event Hubs capacity by:

  • Increasing the number of partitions: More partitions allow more producers and consumers to work in parallel.
  • Adjusting Throughput Units (TUs) or Processing Units (PUs): For standard and premium tiers, respectively. TUs/PUs represent a pre-configured unit of dedicated messaging throughput.

Tip: Plan your partition count up front, based on expected throughput and the number of parallel consumers you anticipate; effective concurrency within a consumer group is capped by the partition count, and it's generally easier to add consumers to existing partitions than to increase partitions and rebalance.
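
Capacity planning with TUs is simple arithmetic. The quota assumed below is the commonly documented standard-tier ingress limit (one TU admits up to 1 MB/s or 1,000 events/s, whichever is reached first); check current Azure quotas before relying on these numbers:

```python
import math

# Assumed standard-tier quota: one TU = up to 1 MB/s OR 1,000 events/s
# of ingress, whichever limit is hit first.
TU_INGRESS_MBPS = 1.0
TU_INGRESS_EVENTS_PER_SEC = 1_000

def required_tus(mb_per_sec: float, events_per_sec: int) -> int:
    by_bytes = math.ceil(mb_per_sec / TU_INGRESS_MBPS)
    by_count = math.ceil(events_per_sec / TU_INGRESS_EVENTS_PER_SEC)
    return max(by_bytes, by_count)

# 2.5 MB/s of small events: the event count, not the byte rate, dominates.
print(required_tus(2.5, 5_000))  # 5
```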

Data Retention

Event Hubs retains event data for a configurable period: from as little as one hour up to 7 days in the standard tier, and up to 90 days in the premium and dedicated tiers. Once the retention period expires, the data is automatically discarded.

  • This ensures that older data is automatically cleaned up.
  • Consumers must process data within the retention window.
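
The retention check itself is just a comparison against the event's enqueued time. A sketch assuming a 7-day window:

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=7)  # assumed 7-day retention window

def within_retention(enqueued_time: datetime, now: datetime) -> bool:
    # An event is readable only while it is younger than the window.
    return now - enqueued_time <= RETENTION

now = datetime(2024, 1, 10, tzinfo=timezone.utc)
fresh = datetime(2024, 1, 8, tzinfo=timezone.utc)   # 2 days old
stale = datetime(2024, 1, 1, tzinfo=timezone.utc)   # 9 days old

print(within_retention(fresh, now), within_retention(stale, now))  # True False
```

A consumer that falls more than the retention window behind permanently loses the expired events, which is why consumer lag should be monitored.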