Azure Stream Analytics: Input & Output Configuration

This document details how to configure inputs and outputs for Azure Stream Analytics (ASA) jobs. ASA allows you to ingest real-time data streams from various sources, process them with a SQL-like query language, and send the results to multiple destinations.

Understanding Inputs

Inputs are the data sources for your Stream Analytics job. ASA supports several types of inputs:

Event Hubs

Azure Event Hubs is a highly scalable data streaming platform and event ingestion service. It's commonly used as a source for ASA when dealing with large volumes of telemetry or log data.

  • Configuration: Requires the Event Hubs namespace, the event hub name, and a shared access policy name and key (or a managed identity).
  • Datasource Type: Job definitions can reference the event hub via the legacy Microsoft.ServiceBus/EventHub datasource type (shown below) or the newer Microsoft.EventHub/EventHub type, which is recommended for new jobs.
  • Consumer Group: Crucial for managing parallel processing. Each ASA job input should ideally use a dedicated consumer group to avoid conflicts with other applications reading from the same Event Hub.

Example Input Configuration (JSON Snippet)

{
    "properties": {
        "type": "Stream",
        "datasource": {
            "type": "Microsoft.ServiceBus/EventHub",
            "properties": {
                "serviceBusNamespace": "your-eventhub-namespace",
                "eventHubName": "your-eventhub-name",
                "sharedAccessPolicyName": "your-policy-name",
                "sharedAccessPolicyKey": "your-policy-key",
                "consumerGroupName": "your-consumer-group",
                "authenticationMode": "ConnectionString"
            }
        },
        "serialization": {
            "type": "Json",
            "properties": {
                "encoding": "UTF8"
            }
        },
        "compression": {
            "type": "None"
        }
    }
}

IoT Hub

Azure IoT Hub provides a secure, bidirectional communication channel between IoT devices and the cloud. It's a common choice for ingesting device-generated data.

Configuration is similar to Event Hubs: you specify the IoT Hub name, the built-in Event Hubs-compatible endpoint (typically messages/events), a shared access policy, and a consumer group.

Example Input Configuration (JSON Snippet)

{
    "properties": {
        "type": "Stream",
        "datasource": {
            "type": "Microsoft.Devices/IotHubs",
            "properties": {
                "iotHubNamespace": "your-iot-hub-name",
                "endpoint": "messages/events",
                "sharedAccessPolicyName": "iothubowner",
                "sharedAccessPolicyKey": "your-iot-hub-key",
                "consumerGroupName": "$Default",
                "authenticationMode": "ConnectionString"
            }
        },
        "serialization": {
            "type": "Json",
            "properties": {
                "encoding": "UTF8"
            }
        },
        "compression": {
            "type": "None"
        }
    }
}

Blob Storage

Azure Blob Storage can serve as a stream input for processing historical or slowly arriving data, or as reference data for enriching a streaming job. It is not suited to high-volume, low-latency live streaming, but it is useful for historical analysis or for joining static data to a stream.

  • Configuration: Requires the storage account name, container name, and authentication (access key or managed identity).
  • Path Pattern: Define where blobs sit within the container using the {date}, {time}, and {partition} tokens (e.g., logs/{date}/{time}); wildcard characters are not supported. See the example below.
  • Date Format: Specify the format of the {date} token in the path pattern (e.g., yyyy/MM/dd).
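
For illustration, a minimal blob stream input shaped like the Event Hubs and IoT Hub examples above might look as follows; all names and keys are placeholders, and a reference-data input would use "type": "Reference" instead of "Stream".

Example Input Configuration (JSON Snippet)

{
    "properties": {
        "type": "Stream",
        "datasource": {
            "type": "Microsoft.Storage/Blob",
            "properties": {
                "storageAccounts": [
                    {
                        "accountName": "your-storage-account",
                        "accountKey": "your-account-key"
                    }
                ],
                "container": "your-container",
                "pathPattern": "logs/{date}/{time}",
                "dateFormat": "yyyy/MM/dd",
                "timeFormat": "HH"
            }
        },
        "serialization": {
            "type": "Json",
            "properties": {
                "encoding": "UTF8"
            }
        }
    }
}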

Understanding Outputs

Outputs are the destinations where your processed data from ASA will be sent. ASA supports a variety of output sinks:

Blob Storage

Store processed data in Azure Blob Storage for archival, further analysis, or integration with other systems.

  • Configuration: Storage account name, container name, and authentication.
  • Path Pattern: Define how output blobs are organized using the {date} and {time} tokens (e.g., processed/{date}/{time}); see the example below.
  • Date Format: Similar to input blob storage, specify date formats in the path.
  • Serialization Format: Choose the format for the output data (e.g., JSON, CSV, Avro).
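
As a sketch, with placeholder account details and line-separated JSON serialization:

Example Output Configuration (JSON Snippet)

{
    "properties": {
        "datasource": {
            "type": "Microsoft.Storage/Blob",
            "properties": {
                "storageAccounts": [
                    {
                        "accountName": "your-storage-account",
                        "accountKey": "your-account-key"
                    }
                ],
                "container": "your-output-container",
                "pathPattern": "processed/{date}/{time}",
                "dateFormat": "yyyy/MM/dd",
                "timeFormat": "HH"
            }
        },
        "serialization": {
            "type": "Json",
            "properties": {
                "encoding": "UTF8",
                "format": "LineSeparated"
            }
        }
    }
}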

SQL Database

Write processed data directly into an Azure SQL Database table for relational data storage and querying.

  • Configuration: Server name, database name, table name, and SQL authentication or managed identity.
  • Write Behavior: Writes are append-only inserts; ASA does not update or upsert existing rows in SQL outputs.
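
A minimal sketch using SQL authentication; the server, database, table, and credential values are placeholders, and a managed identity could be used instead:

Example Output Configuration (JSON Snippet)

{
    "properties": {
        "datasource": {
            "type": "Microsoft.Sql/Server/Database",
            "properties": {
                "server": "your-sql-server",
                "database": "your-database",
                "table": "your-table",
                "user": "your-sql-user",
                "password": "your-sql-password"
            }
        }
    }
}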

Azure Cosmos DB

Send processed data to Azure Cosmos DB, a globally distributed, multi-model NoSQL database service.

  • Configuration: Account endpoint, database name, collection name, and key (or managed identity).
  • Document ID: Optionally specify a field from your output to use as the document ID; documents arriving with an existing ID are upserted rather than duplicated.
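
A hedged sketch of the datasource shape (note the legacy Microsoft.Storage/DocumentDB type string); the account, database, and container names are placeholders, and documentId assumes a deviceId field exists in the query output:

Example Output Configuration (JSON Snippet)

{
    "properties": {
        "datasource": {
            "type": "Microsoft.Storage/DocumentDB",
            "properties": {
                "accountId": "your-cosmosdb-account",
                "accountKey": "your-account-key",
                "database": "your-database",
                "collectionNamePattern": "your-container",
                "documentId": "deviceId"
            }
        }
    }
}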

Event Hubs

Send processed data to another Event Hub for further real-time processing by other applications or services.

  • Configuration: Event Hubs namespace, name, and authentication.
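
For example, an Event Hubs output reuses the same datasource type as the input; partitionKey is optional and assumes a deviceId field in the query output:

Example Output Configuration (JSON Snippet)

{
    "properties": {
        "datasource": {
            "type": "Microsoft.ServiceBus/EventHub",
            "properties": {
                "serviceBusNamespace": "your-output-namespace",
                "eventHubName": "your-output-eventhub",
                "sharedAccessPolicyName": "your-policy-name",
                "sharedAccessPolicyKey": "your-policy-key",
                "partitionKey": "deviceId"
            }
        },
        "serialization": {
            "type": "Json",
            "properties": {
                "encoding": "UTF8",
                "format": "LineSeparated"
            }
        }
    }
}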

Power BI

Create real-time dashboards and reports by sending processed data directly to a Power BI dataset.

  • Configuration: Requires authentication and workspace/dataset selection. ASA handles the dataset creation and schema definition.
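
Power BI outputs are normally authorized interactively in the Azure portal, so the token-related values below cannot be meaningfully hard-coded; this sketch only shows the overall shape of the datasource, with every value a placeholder:

Example Output Configuration (JSON Snippet)

{
    "properties": {
        "datasource": {
            "type": "PowerBI",
            "properties": {
                "dataset": "your-dataset-name",
                "table": "your-table-name",
                "groupId": "your-workspace-guid",
                "groupName": "your-workspace-name",
                "refreshToken": "obtained-via-portal-authorization",
                "tokenUserDisplayName": "Your Name",
                "tokenUserPrincipalName": "you@example.com"
            }
        }
    }
}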

Proper configuration of inputs and outputs is fundamental to building effective Azure Stream Analytics solutions. Always consider the data volume, latency requirements, and downstream consumption needs when choosing and configuring your sources and sinks.