Windowing in Azure Stream Analytics
Azure Stream Analytics (ASA) uses the concept of windowing to perform computations over time-series data. Because data streams are continuous, you often need to aggregate or analyze data within specific time boundaries. Windowing allows you to define these boundaries for operations like aggregations (SUM, COUNT, AVG), temporal joins, and detecting patterns over time.
Types of Windows
ASA supports several types of windows, each suited for different analytical scenarios:
1. Tumbling Windows
Tumbling windows are fixed-size, non-overlapping windows. Each event belongs to exactly one tumbling window. They are useful for discrete aggregations over fixed intervals.
- Size: Defined by a specific duration (e.g., 5 minutes, 1 hour).
- Behavior: Windows are contiguous. Each new window begins exactly where the previous one ends, so no interval is skipped or counted twice.
SELECT
    System.Timestamp AS WindowEnd,
    COUNT(*) AS EventCount
FROM
    YourInputAlias
GROUP BY
    TumblingWindow(minute, 5)
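To make the assignment rule concrete, the following Python sketch models which tumbling window each event falls into. It is an illustration rather than ASA's implementation: timestamps are plain integer seconds, windows are assumed to be aligned to time zero, and each window is treated as covering the interval (start, end], so an event exactly on a boundary counts toward the window that ends there. The function name tumbling_window_end and the sample timestamps are hypothetical.

```python
from collections import Counter


def tumbling_window_end(ts: int, size: int) -> int:
    """Return the end of the (start, end] window of width `size` containing `ts`."""
    # Ceiling division: round ts up to the next multiple of size.
    return -(-ts // size) * size


# Hypothetical event timestamps, in seconds since time zero.
events = [10, 120, 300, 301, 650]

# Count events per 5-minute (300-second) window, keyed by window end.
counts = Counter(tumbling_window_end(ts, 300) for ts in events)
print(dict(counts))
```

With these inputs, three events land in the window ending at 300, one in the window ending at 600, and one in the window ending at 900. Every event is counted exactly once.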
2. Hopping Windows
Hopping windows are fixed-size windows that can overlap. Each window "hops" forward by a fixed interval. When that interval is shorter than the window size, consecutive windows overlap and an event is counted in every window that covers it.
- Size: The duration of the window.
- Hop size (skip): The interval by which the window hops forward. The hop size must be less than or equal to the window size. When the two are equal, the hopping window behaves like a tumbling window.
SELECT
    System.Timestamp AS WindowEnd,
    AVG(SensorValue) AS AverageReading
FROM
    YourInputAlias
GROUP BY
    HoppingWindow(minute, 10, 5) -- 10-minute window, hopping by 5 minutes
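The overlap can be sketched in Python by listing every window that contains a given event. This is a simplified model, not ASA's implementation: timestamps are integer seconds, window ends are assumed to fall on multiples of the hop (the skip interval) counted from time zero, and each window covers (end - size, end]. The function name hopping_window_ends is hypothetical.

```python
def hopping_window_ends(ts: int, size: int, hop: int) -> list[int]:
    """Return the end times of all (end - size, end] windows that contain `ts`."""
    # The first window end at or after ts, rounded up to a multiple of hop.
    first_end = -(-ts // hop) * hop
    # A window still contains ts as long as its end is earlier than ts + size.
    return list(range(first_end, ts + size, hop))


# 10-minute window (600 s) hopping every 5 minutes (300 s).
print(hopping_window_ends(420, 600, 300))

# With hop equal to size, each event falls into exactly one window.
print(hopping_window_ends(420, 300, 300))
```

An event at second 420 belongs to the windows ending at 600 and 900 in the first call, and only to the window ending at 600 in the second, which matches tumbling behavior.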
3. Sliding Windows
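Sliding windows are fixed-size windows that are evaluated whenever the window's content changes, that is, when an event enters or leaves it, rather than on a fixed schedule. ASA produces output only at those points, so every sliding window contains at least one event. An event can belong to multiple sliding windows if it falls within their respective time spans. This is ideal for looking at activity over a rolling period.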
- Size: The duration of the window.
- Duplicates: Events can appear in multiple windows.
SELECT
    System.Timestamp AS WindowEnd,
    SUM(Amount) AS TotalSales
FROM
    SalesStream
GROUP BY
    SlidingWindow(hour, 1) -- 1-hour sliding window
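As a rough illustration of the rolling behavior, the Python sketch below computes a trailing total each time an event arrives. It is deliberately simplified: timestamps are integer seconds, the window is treated as (end - size, end], and only event arrivals are evaluated, not the moments when events age out of the window. The function name sliding_sums and the sample data are hypothetical.

```python
def sliding_sums(events: list[tuple[int, int]], size: int) -> list[tuple[int, int]]:
    """For each event arrival, sum the amounts of events in (arrival - size, arrival].

    `events` is a list of (timestamp, amount) pairs sorted by timestamp.
    """
    results = []
    for end, _ in events:
        # Include every event newer than end - size, up to and including end.
        total = sum(amount for ts, amount in events if end - size < ts <= end)
        results.append((end, total))
    return results


# Hypothetical sales as (seconds, amount), evaluated over a 1-hour (3600 s) window.
sales = [(0, 10), (1800, 20), (3600, 5), (4000, 7)]
print(sliding_sums(sales, 3600))
```

Note how the sale at second 0 contributes to the totals at 0 and 1800 but has dropped out by 3600, because the window no longer reaches back that far.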
4. Session Windows
Session windows group events based on user activity or idle time. A session is defined by a period of activity followed by a period of inactivity. If no events arrive within a specified timeout, the current session ends.
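- Timeout: The maximum inactivity duration before a session ends.
- Maximum duration: The upper limit on how long a single session window can grow, passed as the third argument to SessionWindow.
- Events: All events within an active session are grouped together.
SELECT
    System.Timestamp AS WindowEnd,
    COUNT(*) AS SessionActivityCount
FROM
    ClickStream
GROUP BY
    SessionWindow(minute, 30, 120) -- Session ends after 30 minutes of inactivity; the 120-minute maximum duration is an illustrative value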
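The grouping rule can be sketched in Python as splitting a stream of timestamps wherever the gap between consecutive events exceeds the timeout. This models only the inactivity timeout, not any limit on total session length, and it assumes that a gap exactly equal to the timeout still extends the session. The function name split_sessions and the click times are hypothetical.

```python
def split_sessions(timestamps: list[int], timeout: int) -> list[list[int]]:
    """Group timestamps into sessions separated by gaps longer than `timeout`."""
    sessions: list[list[int]] = []
    for ts in sorted(timestamps):
        # Extend the current session if the gap since its last event is small enough.
        if sessions and ts - sessions[-1][-1] <= timeout:
            sessions[-1].append(ts)
        else:
            # Otherwise the previous session has timed out; start a new one.
            sessions.append([ts])
    return sessions


# Hypothetical click times in seconds, with a 30-minute (1800 s) timeout.
clicks = [0, 600, 1500, 4000, 4100]
print(split_sessions(clicks, 1800))
```

The 2500-second gap between the clicks at 1500 and 4000 exceeds the timeout, so the clicks split into two sessions of three and two events.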
Key Concepts
- Event Enqueue Time: The time an event is added to the input stream. ASA uses this by default for windowing unless TIMESTAMP BY is used.
- Timestamp By: Allows you to specify a custom timestamp column from your event data (e.g., application timestamp) to be used for windowing. This is crucial for accurate temporal analysis.
- Window End Time: In aggregation queries, System.Timestamp within the SELECT statement typically refers to the end time of the window.
- Watermarks: ASA uses watermarks to handle out-of-order events. A watermark marks the point in event time up to which ASA assumes all events have arrived. This lets ASA close windows and emit results instead of waiting indefinitely for late events.
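The watermark idea can be illustrated with a small Python model in which the watermark trails the largest event time seen so far by a fixed tolerance, and any event older than the current watermark is flagged as late. This is a simplified sketch under stated assumptions: ASA's actual watermark handling involves more than this (for example, its late-arrival and out-of-order policies), none of which is modeled here. The function name track_watermark and the tolerance value are hypothetical.

```python
def track_watermark(event_times: list[int], tolerance: int) -> list[tuple[int, int, bool]]:
    """Return (event_time, watermark_after_event, was_late) for each event in arrival order."""
    max_seen = None
    watermark = None
    report = []
    for ts in event_times:
        # An event is late if it is older than the watermark before it arrived.
        late = watermark is not None and ts < watermark
        max_seen = ts if max_seen is None else max(max_seen, ts)
        # The watermark trails the newest event time by the tolerance.
        watermark = max_seen - tolerance
        report.append((ts, watermark, late))
    return report


# Hypothetical event times in arrival order, with a 10-second tolerance.
for row in track_watermark([100, 105, 103, 120, 90], 10):
    print(row)
```

The event at time 103 arrives out of order but is still within the tolerance, while the event at time 90 arrives after the watermark has moved to 110 and is flagged as late.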
💡 Tip: Choosing the Right Window
The choice of window depends on your business logic. Tumbling windows suit simple fixed-interval aggregates, hopping windows suit overlapping analysis, sliding windows suit rolling metrics, and session windows suit tracking user engagement.
Advanced Windowing Scenarios
You can combine windowing with other ASA features such as:
- TIMESTAMP BY: Precisely control which timestamp is used for windowing.
- WITHIN GROUP (ORDER BY timestamp): For ranking and ordering events within a window.
- Temporal Joins: Joining two streams based on overlapping time windows.
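The idea behind a temporal join can be sketched in Python as pairing events from two streams that share a key and occur within a bounded time range of each other. This is a conceptual model only, not ASA query syntax: events are (key, timestamp) tuples with integer seconds, and a right-hand event matches when it occurs no earlier than the left-hand event and at most max_gap seconds after it. The function name temporal_join and the sample streams are hypothetical.

```python
def temporal_join(
    left: list[tuple[str, int]],
    right: list[tuple[str, int]],
    max_gap: int,
) -> list[tuple[str, int, int]]:
    """Pair left and right events with equal keys where 0 <= right_ts - left_ts <= max_gap."""
    return [
        (key, left_ts, right_ts)
        for key, left_ts in left
        for right_key, right_ts in right
        if right_key == key and 0 <= right_ts - left_ts <= max_gap
    ]


# Hypothetical streams: the right-hand event must follow within 60 seconds.
entries = [("a", 0), ("b", 50)]
exits = [("a", 30), ("a", 200), ("b", 40)]
print(temporal_join(entries, exits, 60))
```

Only the pair for key "a" at times 0 and 30 qualifies. The second "a" event is too late, and the "b" event on the right precedes its counterpart on the left.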