Azure Stream Analytics

Documentation - Writing Queries

Writing Queries in Azure Stream Analytics

Azure Stream Analytics uses a SQL-like query language to transform and process streaming data. This document provides an overview of how to write effective queries.

Core Concepts

Basic Query Structure

A Stream Analytics query typically selects data from one or more input streams, transforms it, and directs it to one or more output sinks. The fundamental structure is similar to standard SQL:

SELECT
    column1,
    column2,
    AggregateFunction(column3) AS alias_name
INTO
    output_alias
FROM
    input_alias TIMESTAMP BY event_timestamp_column
WHERE
    condition
GROUP BY
    column1, TumblingWindow(minute, 5);

Key Clauses and Functions

TIMESTAMP BY

This clause is crucial for defining how events are ordered in time. You specify a column that contains the event's timestamp. Stream Analytics uses this to manage event ordering and apply time-based operations.

SELECT *
INTO OutputAlias
FROM InputStream TIMESTAMP BY EventProcessedUtcTime;

TumblingWindow

Divides the stream into distinct, non-overlapping time intervals. Useful for aggregations over fixed periods.

SELECT
    System.Timestamp AS WindowEnd,
    COUNT(*) AS EventCount
INTO AggregatedOutput
FROM InputStream
GROUP BY TumblingWindow(minute, 1); -- 1-minute windows

HoppingWindow

Divides the stream into time intervals that overlap. The hop size determines how often new windows start.

SELECT
    System.Timestamp AS WindowEnd,
    AVG(Temperature) AS AvgTemperature
INTO HoppingAggregatedOutput
FROM SensorData
GROUP BY HoppingWindow(minute, 5, 1); -- 5-minute windows, hopping every 1 minute

SlidingWindow

Creates windows that slide forward with the stream, including events within a specified duration. Useful for detecting patterns that persist over time.

SELECT
    DeviceId,
    COUNT(*) AS MessageCount
INTO SlidingWindowOutput
FROM InputData
GROUP BY DeviceId, SlidingWindow(second, 60); -- 60-second sliding window

SessionWindow

Groups events into sessions based on inactivity. A session ends when there's a gap longer than the specified inactivity timeout.

SELECT
    UserId,
    COUNT(*) AS ActionsInSession
INTO SessionOutput
FROM UserActivity
GROUP BY UserId, SessionWindow(minute, 30); -- 30-minute inactivity timeout

JOIN Operations

You can join streams together. For time-series data, temporal joins are common, especially between a reference data input and a stream input.

SELECT
    s.DeviceId,
    s.Temperature,
    r.Location
INTO EnrichedOutput
FROM
    SensorStream s
JOIN
    ReferenceData r ON s.DeviceId = r.DeviceId;

LAG Function

Accesses data from a previous event within a specified time window or partition. Useful for comparing consecutive events.

SELECT
    DeviceId,
    CurrentReading,
    PreviousReading
INTO ComparisonOutput
FROM (
    SELECT
        DeviceId,
        Reading AS CurrentReading,
        LAG(Reading, 1, 0) OVER ( PARTITION BY DeviceId ORDER BY System.Timestamp ) AS PreviousReading
    FROM SensorData
)
WHERE CurrentReading > PreviousReading;

Tips for Writing Efficient Queries

This is a foundational guide. For more advanced scenarios, explore functions for string manipulation, date/time operations, complex event processing, and machine learning model integration.