Message Schema and Data Formats

Understanding the message schema is crucial for sending and receiving data effectively with Azure Event Hubs. Event Hubs is a highly scalable data streaming platform that can ingest and process millions of events per second, process event streams in real time, and store streams of event data. Before an event is sent to an event hub, the producer serializes it into bytes. Event Hubs stores those bytes and later delivers them unchanged to consumers, which deserialize them.

Event Hubs itself doesn't enforce an application-level message schema. It treats the event body as an opaque sequence of bytes. For interoperability and ease of use, however, producers and consumers usually agree on a common, widely adopted serialization format.
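Because the body is just bytes, the producer and consumer need a shared encoding. The following minimal sketch uses only the Python standard library and assumes UTF-8 encoded JSON. `serialize_event` and `deserialize_event` are hypothetical helper names. With the azure-eventhub Python SDK, the resulting bytes would typically be passed as the body of an `EventData` object.

```python
import json
from typing import Any

# Event Hubs stores and delivers the event body as raw bytes, so the producer
# and consumer must agree on how those bytes are encoded. Assumption: UTF-8 JSON.


def serialize_event(event: dict[str, Any]) -> bytes:
    """Producer side: turn an application object into body bytes."""
    # separators=(",", ":") drops optional whitespace to keep the body compact
    return json.dumps(event, separators=(",", ":")).encode("utf-8")


def deserialize_event(body: bytes) -> dict[str, Any]:
    """Consumer side: turn body bytes back into an application object."""
    return json.loads(body.decode("utf-8"))


if __name__ == "__main__":
    original = {"deviceId": "device-007", "temperature": 35.5}
    body = serialize_event(original)
    print(body)  # b'{"deviceId":"device-007","temperature":35.5}'
    assert deserialize_event(body) == original
```

Both sides must use the same encoding. A consumer that assumed a different character set or format would misread every event, and Event Hubs would not detect the mismatch.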

Common Serialization Formats

The choice of serialization format affects performance, message size, and ease of use across programming languages and systems. Widely used formats include JSON, Apache Avro, and Protocol Buffers (Protobuf).
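Schema-based binary formats such as Avro and Protobuf produce smaller bodies partly because they do not repeat field names in every event. The sketch below illustrates that size trade-off without third-party packages. It compares JSON with a hand-rolled fixed layout built with Python's `struct` module. This layout is not Avro or Protobuf. It only stands in for them to show why a binary encoding with a shared schema tends to be smaller and less self-describing. The field names and layout are hypothetical.

```python
import json
import struct

# One reading, loosely modeled on the IoT example in this section (hypothetical fields).
reading = {"deviceNumber": 7, "sequenceNumber": 1024, "temperature": 35.5}

# Text encoding: field names travel with every event, so the body is self-describing.
json_body = json.dumps(reading, separators=(",", ":")).encode("utf-8")

# Hypothetical fixed binary layout agreed on out of band by producer and consumer:
# little-endian unsigned 16-bit device number, unsigned 32-bit sequence number,
# and a 64-bit float temperature. "<" disables padding: 2 + 4 + 8 = 14 bytes.
LAYOUT = struct.Struct("<HId")
binary_body = LAYOUT.pack(
    reading["deviceNumber"], reading["sequenceNumber"], reading["temperature"]
)

print("JSON bytes:", len(json_body), "binary bytes:", len(binary_body))

# Decoding the binary body is only possible with the same layout definition.
device, seq, temp = LAYOUT.unpack(binary_body)
assert (device, seq, temp) == (7, 1024, 35.5)
```

The binary body is much smaller, but a consumer cannot interpret it without the layout. This is why binary formats are usually paired with a schema that both sides share.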

Event Body Structure

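While Event Hubs doesn't mandate an application schema, a well-structured event body makes downstream processing easier. A typical event body contains a unique event identifier, a timestamp, the identity of the event's source, an event type, and the event-specific payload.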

Example JSON Message Schema

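This example shows a simple JSON structure for an IoT device event. The sequenceNumber field in this body is defined by the application. It is separate from the sequence number that Event Hubs assigns to each event within a partition.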


{
  "eventId": "a1b2c3d4-e5f6-7890-1234-567890abcdef",
  "timestamp": "2023-10-27T10:30:00Z",
  "deviceId": "device-007",
  "eventType": "temperature-alert",
  "payload": {
    "temperature": 35.5,
    "unit": "Celsius",
    "location": "Living Room"
  },
  "sequenceNumber": 1024
}
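A producer might assemble a body shaped like the example above as follows. This is a sketch under stated assumptions. `build_temperature_alert` is a hypothetical helper. It assumes a random UUID for eventId and an ISO 8601 UTC timestamp with a trailing "Z", matching the format shown in the example.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Any


def build_temperature_alert(
    device_id: str, temperature: float, location: str, sequence_number: int
) -> dict[str, Any]:
    """Hypothetical helper: build an event body shaped like the JSON example."""
    return {
        "eventId": str(uuid.uuid4()),  # random UUID as a unique event identifier
        # UTC timestamp formatted like 2023-10-27T10:30:00Z
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "deviceId": device_id,
        "eventType": "temperature-alert",
        "payload": {"temperature": temperature, "unit": "Celsius", "location": location},
        # Application-level counter kept by the device, not the Event Hubs sequence number
        "sequenceNumber": sequence_number,
    }


event = build_temperature_alert("device-007", 35.5, "Living Room", 1024)
body = json.dumps(event).encode("utf-8")  # bytes to send as the event body
```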

Event Properties

In addition to the event body, Event Hubs exposes a set of system properties and lets applications attach custom properties to each event. Both are key-value pairs that applications can use for routing, filtering, or adding context without modifying the event body.

Common Event Properties:

| Property Name | Type | Description |
|---|---|---|
| enqueuedTimeUtc | DateTime | The UTC date and time when the event was enqueued by Event Hubs. |
| offset | Long | The offset of the event within its partition. |
| sequenceNumber | Long | The sequence number of the event within its partition. |
| partitionKey | String | The partition key used for routing the event. |
| correlationId | String | Application-defined identifier for correlating events. |
| userProperties | Dictionary<String, Object> | Custom key-value pairs defined by the application. |
Tip: Using userProperties is a flexible way to add metadata like source application, region, or critical flags without altering your primary data payload. This can simplify filtering and routing logic.
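To illustrate routing on metadata without touching the body, the sketch below dispatches events using custom properties only. `ReceivedEvent`, `route`, and the property names `sourceApp` and `region` are hypothetical stand-ins rather than SDK types. In the azure-eventhub Python SDK, custom properties are set through the `properties` dictionary of `EventData`, but the exact shape of received properties depends on the SDK and is not modeled here.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

Handler = Callable[[bytes], str]


@dataclass
class ReceivedEvent:
    """Hypothetical stand-in for a received event: raw body plus custom properties."""
    body: bytes
    properties: dict[str, Any] = field(default_factory=dict)


def route(event: ReceivedEvent, handlers: dict[str, Handler], default: Handler) -> str:
    # Choose a handler from metadata alone. The body is passed through untouched
    # and is only deserialized by the handler that actually needs it.
    source = event.properties.get("sourceApp")
    return handlers.get(source, default)(event.body)


def thermostat_handler(body: bytes) -> str:
    return f"thermostat handler got {len(body)} bytes"


def unrouted_handler(body: bytes) -> str:
    return "unrouted"


handlers = {"thermostat": thermostat_handler}
evt = ReceivedEvent(
    body=b'{"temperature":35.5}',
    properties={"sourceApp": "thermostat", "region": "westeurope"},
)
print(route(evt, handlers, unrouted_handler))  # thermostat handler got 20 bytes
```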

Schema Evolution

For formats like Avro and Protobuf, schema evolution is a key feature. It lets you modify your data schema over time without breaking applications that still process events written with older versions of the schema. Event Hubs is a transport layer and does not inspect event bodies, so events written with different schema versions can coexist in the same event hub. Consumers are responsible for reading each event with a compatible schema.
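Avro and Protobuf formalize schema-resolution rules. Two such rules are filling in defaults for fields that an older writer did not send and ignoring fields that the reader does not know. The sketch below mimics just those two rules by hand for JSON bodies, so the idea is visible without extra libraries. `READER_SCHEMA_V2` and `read_payload` are hypothetical, and real Avro or Protobuf resolution involves more rules than this.

```python
import json
from typing import Any

REQUIRED = object()  # marker for fields that have no default

# Hypothetical reader schema for version 2 of the payload: field name -> default.
# "unit" was added in v2, so it carries a default that lets v1 events still be read.
READER_SCHEMA_V2: dict[str, Any] = {
    "temperature": REQUIRED,
    "location": REQUIRED,
    "unit": "Celsius",
}


def read_payload(body: bytes) -> dict[str, Any]:
    raw = json.loads(body)
    result: dict[str, Any] = {}
    for name, default in READER_SCHEMA_V2.items():
        if name in raw:
            result[name] = raw[name]
        elif default is REQUIRED:
            raise ValueError(f"missing required field {name!r}")
        else:
            result[name] = default  # older writer: fall back to the default
    # Fields in raw that the reader schema does not list (from newer writers) are ignored.
    return result


v1_event = b'{"temperature": 35.5, "location": "Living Room"}'
v3_event = b'{"temperature": 95.9, "location": "Porch", "unit": "Fahrenheit", "humidity": 40}'
print(read_payload(v1_event))  # {'temperature': 35.5, 'location': 'Living Room', 'unit': 'Celsius'}
print(read_payload(v3_event))  # "humidity" is dropped
```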

Best Practices

Warning: Event Hubs does not perform schema validation on the event body. It's the responsibility of your producer and consumer applications to ensure data integrity and format compliance.
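Because validation is left to the applications, a producer can check each body before sending it, and a consumer can run the same check after deserializing. The sketch below is a minimal hand-written validator. `REQUIRED_FIELDS`, `validate_event`, and `encode_if_valid` are hypothetical names, and the required fields are assumed from the JSON example in this section. A production system might use a schema library or a schema registry instead.

```python
import json
from typing import Any

# Hypothetical required top-level fields and their expected Python types,
# based on the JSON example earlier in this section.
REQUIRED_FIELDS: dict[str, type] = {
    "eventId": str,
    "timestamp": str,
    "deviceId": str,
    "eventType": str,
    "payload": dict,
}


def validate_event(event: Any) -> list[str]:
    """Return a list of problems; an empty list means the event passed these checks."""
    if not isinstance(event, dict):
        return ["event body must be a JSON object"]
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in event:
            errors.append(f"missing field {name!r}")
        elif not isinstance(event[name], expected):
            errors.append(f"field {name!r} should be {expected.__name__}")
    return errors


def encode_if_valid(event: dict[str, Any]) -> bytes:
    # Producer side: refuse to send bodies that consumers could not process.
    errors = validate_event(event)
    if errors:
        raise ValueError("; ".join(errors))
    return json.dumps(event).encode("utf-8")
```

On the consumer side, `validate_event(json.loads(body))` can flag malformed events so they are set aside for inspection rather than failing the whole batch.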

By carefully designing your message schema and leveraging Event Hubs' features, you can build robust and scalable event-driven applications.