Azure Stream Analytics Query Language
The Azure Stream Analytics Query Language (SAQL) is a SQL-like language used to process data streams in real-time. It allows you to filter, aggregate, and transform streaming data with low latency.
Core Concepts
SELECT Statement
The `SELECT` statement is used to retrieve data from input streams or query results. It supports standard SQL clauses like `FROM`, `WHERE`, `GROUP BY`, and `ORDER BY`.
SELECT
System.Timestamp AS WindowEnd,
DeviceId,
AVG(Temperature) AS AverageTemperature
FROM
IoTHub
WHERE
IsActive = 1
GROUP BY
DeviceId,
TumblingWindow(minute, 5)
Input Streams
Input streams represent the data sources your ASA job reads from. These can be Azure IoT Hub, Azure Event Hubs, or Azure Blob Storage.
Output Data
Output data is what your ASA job writes to after processing. Common destinations include Azure SQL Database, Azure Cosmos DB, Power BI, and Event Hubs.
Windowing Functions
Windowing functions are crucial for performing time-based aggregations on unbounded streaming data.
Tumbling Window
Tumbling windows divide the stream into discrete, non-overlapping time segments.
TumblingWindow(size, [field_or_time])
Example: `TumblingWindow(minute, 5)` creates 5-minute windows.
Hopping Window
Hopping windows allow for overlapping windows, which is useful for analyzing data over sliding periods.
HoppingWindow( HopLength, WindowSize [,field_or_time] )
Example: `HoppingWindow(minute, 5, 10)` creates 10-minute windows that advance every 5 minutes.
Sliding Window
Sliding windows are a special case of hopping windows where the hop length is equal to the window size.
SlidingWindow(size [,field_or_time])
Session Window
Session windows group events that occur within a defined period of activity, separated by inactivity.
SessionWindow(session_gap [,field_or_time])
Example: `SessionWindow(minutes, 10)` groups events into sessions that are separated by at least 10 minutes of inactivity.
Functions
SAQL provides a rich set of built-in functions for data manipulation and analysis.
Aggregate Functions
AVG(): Calculates the average of a value.COUNT(): Counts the number of records.MAX(): Finds the maximum value.MIN(): Finds the minimum value.SUM(): Calculates the sum of values.
Built-in Functions
SAQL includes functions for:
- String manipulation (e.g.,
LEN(),SUBSTRING()) - Date and Time operations (e.g.,
DATEDIFF(),DATEADD()) - Mathematical operations (e.g.,
ABS(),ROUND()) - Geospatial operations (e.g.,
ST_DISTANCE()) - Complex types (e.g., JSON parsing with
GetRecordPropertyValue())
Joins
You can join multiple streams or a stream with reference data.
Stream-to-Stream Joins
Join events from two different streams, typically within a time window.
SELECT
input1.DeviceId,
input1.Value AS SensorValue,
input2.Status
FROM
InputStream1 AS input1
JOIN
InputStream2 AS input2
ON
input1.DeviceId = input2.DeviceId
AND DATEDIFF(minute, input1, input2) BETWEEN 0 AND 5
Stream-to-Reference Data Joins
Join streaming data with static reference data to enrich events.
SELECT
s.DeviceId,
s.Temperature,
r.Location
FROM
StreamingData AS s
JOIN
ReferenceData AS r ON s.DeviceId = r.DeviceId
Common Patterns
- Filtering: Use the
WHEREclause to select relevant events. - Aggregation: Combine events within windows using
GROUP BYand aggregate functions. - Transformation: Modify event structure and data types using
SELECTlist expressions. - Anomaly Detection: Implement custom logic or use built-in functions for detecting unusual patterns.