Using Azure Event Hubs with Stream Analytics

Note: This tutorial guides you through setting up a basic pipeline to ingest data from Azure Event Hubs and process it using Azure Stream Analytics. Ensure you have an Azure subscription and the necessary permissions.

Introduction

Azure Event Hubs is a highly scalable data streaming platform and event ingestion service. Azure Stream Analytics is a real-time analytics service that lets you run SQL-like queries over one or more streaming inputs. Combining the two services allows you to ingest vast amounts of data and process it in real time for immediate insights and actions.

Prerequisites

  • An Azure subscription with permission to create resources.
  • The Azure CLI, if you prefer the command-line steps over the portal.
  • Python 3 and pip, for the producer script in Step 7.

Step 1: Create an Azure Event Hubs Namespace and Event Hub

First, we need an Event Hub to send our data to. You can do this via the Azure Portal or Azure CLI.

Using Azure Portal:

  1. Navigate to the Azure Portal.
  2. Click "Create a resource".
  3. Search for "Event Hubs" and select it.
  4. Click "Create".
  5. Select your Subscription, Resource Group, and provide a Namespace name (globally unique).
  6. Choose a Location and a Pricing tier (e.g., Standard).
  7. Click "Review + create", then "Create".
  8. Once the namespace is deployed, navigate to it and click "Event Hubs".
  9. Click "+ Event Hub".
  10. Provide an Event Hub name (e.g., myeventhub) and click "Create".

Using Azure CLI:

Replace placeholders with your desired values.


az group create --name myResourceGroup --location eastus
az eventhubs namespace create --name myEventHubNamespace --resource-group myResourceGroup --location eastus
az eventhubs eventhub create --name myEventHub --namespace-name myEventHubNamespace --resource-group myResourceGroup
            

Step 2: Create a Stream Analytics Job

Now, let's set up the Azure Stream Analytics job that will read from our Event Hub.

Using Azure Portal:

  1. In the Azure Portal, click "Create a resource".
  2. Search for "Stream Analytics job" and select it.
  3. Click "Create".
  4. Provide a Job name, Subscription, Resource Group, and Location.
  5. For "Hosting environment", choose "Cloud".
  6. Click "Review + create", then "Create".

Step 3: Configure Event Hubs as an Input for Stream Analytics

Connect your Stream Analytics job to the Event Hub you created.

  1. Navigate to your Stream Analytics job.
  2. Under "Job topology", click "Inputs".
  3. Click "+ Add stream input" and select "Event Hub".
  4. Configure the input:
    • Input alias: Give it a meaningful name (e.g., EventHubInput).
    • Event Hub namespace: Select the namespace you created.
    • Event hub name: Select the specific Event Hub name.
    • Event serialization format: Choose "JSON" (or Avro/CSV as appropriate).
    • Authentication mode: Choose "Connection string" or "Managed Identity"; Managed Identity is recommended for production. For this tutorial, we'll use "Connection string".
    • Connection string: You can get this from your Event Hub namespace's "Shared access policies". Ensure the policy has "Listen" permissions.
  5. Click "Save".

Step 4: Configure an Output for Stream Analytics

For this tutorial, we'll output the processed data to Azure Blob Storage.

  1. In your Stream Analytics job, click "Outputs".
  2. Click "+ Add" and select "Blob Storage".
  3. Configure the output:
    • Output alias: Give it a name (e.g., BlobOutput).
    • Sink: Select "Blob Storage".
    • Storage account: Select your storage account.
    • Storage container: Select or create a container.
    • Authentication mode: Use "Connection string" or "Managed Identity".
    • Serialization format: Choose "JSON".
    • Path pattern: Use a pattern like {date:yyyy/MM/dd}/{time:HH-mm-ss} to organize your output files (a sketch of the resulting folder layout follows these steps).
    • Partition key: Leave blank for now.
  4. Click "Save".
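
The path pattern is purely a naming convention for the blobs the job writes. As an illustration only (Stream Analytics does not run this code), the following Python snippet prints the folder prefix that {date:yyyy/MM/dd}/{time:HH-mm-ss} would produce for the current UTC time:


from datetime import datetime, timezone

# Illustration only: the blob folder prefix produced by the
# {date:yyyy/MM/dd}/{time:HH-mm-ss} path pattern for the current UTC time.
now = datetime.now(timezone.utc)
prefix = f"{now:%Y/%m/%d}/{now:%H-%M-%S}"
print(prefix) # e.g. 2024/05/01/12-30-00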

Step 5: Write a Stream Analytics Query

Define the real-time transformation logic. This simple query selects all fields from the input; a note on the resulting output records follows the steps below.

  1. In your Stream Analytics job, click "Query".
  2. Replace the default query with the following:
    
    SELECT
        *
    INTO
        BlobOutput
    FROM
        EventHubInput
                        
  3. Click "Save query".
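
With this pass-through query, every field of each incoming event is written to the output. Stream Analytics also appends metadata columns (EventProcessedUtcTime, PartitionId, EventEnqueuedUtcTime) to events read from Event Hubs, so SELECT * includes them. Assuming the producer from Step 7, a single output record would look roughly like this hypothetical example, shown here as a Python dict:


# Hypothetical shape of one output record; all values are illustrative.
record = {
    "deviceId": "device-0",
    "temperature": 20.0,
    "humidity": 50.0,
    "timestamp": 1714567800,
    "EventProcessedUtcTime": "2024-05-01T12:30:00.0000000Z", # appended by Stream Analytics
    "PartitionId": 0,                                        # appended by Stream Analytics
    "EventEnqueuedUtcTime": "2024-05-01T12:29:59.8120000Z"   # appended by Stream Analytics
}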

Step 6: Start the Stream Analytics Job

Once the input, output, and query are configured, start the job.

  1. In your Stream Analytics job, click "Start" on the overview page.
  2. Choose "Now" for the "Job output start time".
  3. Click "Start".

Step 7: Send Data to Event Hubs

You'll need a producer application to send data to your Event Hub. Here's a simple example using Python and version 5 of the azure-eventhub SDK.


from azure.eventhub import EventHubProducerClient, EventData
import json
import time

EVENT_HUB_CONNECTION_STR = "YOUR_EVENT_HUB_CONNECTION_STRING" # Get from Event Hubs Shared Access Policies (RootManageSharedAccessKey or a custom policy with "Send")
EVENT_HUB_NAME = "myeventhub" # Replace with your Event Hub name

def send_event(producer, data):
    # The v5 SDK sends events in batches; wrap the JSON payload in a single-event batch.
    batch = producer.create_batch()
    batch.add(EventData(json.dumps(data)))
    producer.send_batch(batch)
    print(f"Sent: {data}")

if __name__ == "__main__":
    producer = EventHubProducerClient.from_connection_string(
        EVENT_HUB_CONNECTION_STR, eventhub_name=EVENT_HUB_NAME
    )

    try:
        for i in range(10):
            message_payload = {
                "deviceId": f"device-{i % 3}",
                "temperature": 20.0 + (i % 5),
                "humidity": 50.0 + (i % 10),
                "timestamp": int(time.time())
            }
            send_event(producer, message_payload)
            time.sleep(1)

    finally:
        producer.close()
        print("Producer closed.")
            

Install the SDK: pip install azure-eventhub

Remember to replace YOUR_EVENT_HUB_CONNECTION_STRING with your actual connection string.
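
If the job produces no output, first confirm that events are actually reaching the Event Hub, independently of Stream Analytics. Here is a minimal consumer sketch, again assuming version 5 of the azure-eventhub SDK and a connection string whose policy has "Listen" permission:


from azure.eventhub import EventHubConsumerClient

EVENT_HUB_CONNECTION_STR = "YOUR_EVENT_HUB_CONNECTION_STRING" # Policy needs "Listen" permission
EVENT_HUB_NAME = "myeventhub"

def on_event(partition_context, event):
    # Print each event so you can confirm data is flowing into the hub.
    print(f"Partition {partition_context.partition_id}: {event.body_as_str()}")

consumer = EventHubConsumerClient.from_connection_string(
    EVENT_HUB_CONNECTION_STR,
    consumer_group="$Default",
    eventhub_name=EVENT_HUB_NAME,
)

with consumer:
    # starting_position="-1" reads from the beginning of the stream; stop with Ctrl+C.
    consumer.receive(on_event=on_event, starting_position="-1")


Run it alongside the producer; each event should print within a few seconds of being sent.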

Verification

After sending data, navigate to your configured Blob Storage container. You should see files appearing with the processed data according to your path pattern.
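
You can also check from code rather than the portal. Here is a minimal sketch using the azure-storage-blob SDK (pip install azure-storage-blob); the connection string and container name are placeholders for the storage account and container you chose in Step 4:


from azure.storage.blob import BlobServiceClient

STORAGE_CONNECTION_STR = "YOUR_STORAGE_CONNECTION_STRING" # From the storage account's "Access keys" blade
CONTAINER_NAME = "your-container" # The container selected in Step 4

service = BlobServiceClient.from_connection_string(STORAGE_CONNECTION_STR)
container = service.get_container_client(CONTAINER_NAME)

# Blob names follow the path pattern from Step 4, e.g. 2024/05/01/12-30-00/...
for blob in container.list_blobs():
    print(blob.name, blob.size)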

Next Steps

Explore more complex Stream Analytics queries, different input/output configurations (like Azure SQL Database, Power BI), and error handling strategies.