Window Functions in SQL
Window functions perform calculations across a set of table rows that are somehow related to the current row. This set of rows is called a "window". Window functions are similar to aggregate functions, but they do not cause rows to be collapsed into a single output row. Instead, they return a value for each row from the underlying query.
Window functions can be used in the SELECT list and in the ORDER BY clause.
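As a minimal sketch of the less common ORDER BY placement, the query below sorts employees by their department's total payroll without adding that total to the output. The sample rows and the inline employees table are hypothetical, and the VALUES-based setup assumes PostgreSQL or SQLite 3.25+; other engines may need a different way to build the sample data.

```sql
-- Hypothetical sample data; replace the CTE with a real table in practice.
WITH employees(employee_name, department, salary) AS (
    VALUES
        ('Ana', 'Sales', 50000),
        ('Ben', 'Sales', 60000),
        ('Cy',  'Eng',   90000),
        ('Dee', 'Eng',   80000)
)
SELECT employee_name, department, salary
FROM employees
-- The window function appears only in ORDER BY, not in the SELECT list.
ORDER BY SUM(salary) OVER (PARTITION BY department) DESC, salary DESC;
-- Row order: Cy, Dee (Eng total 170000), then Ben, Ana (Sales total 110000)
```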
Syntax
The general syntax for a window function is:
window_function ( expression )
OVER (
    [ partition_by_clause ]
    [ order_by_clause ]
    [ frame_clause ]
)
- window_function: The function to be applied to the window (e.g., ROW_NUMBER, RANK, LAG, SUM, AVG).
- expression: The expression on which the function operates. This is typically a column name for aggregate window functions.
- OVER: This keyword signifies that the function is a window function.
- partition_by_clause: This clause divides the rows of the query into partitions (groups). The window function is applied independently to each partition. If omitted, the entire result set is treated as a single partition.
- order_by_clause: This clause specifies the logical order of rows within each partition. This is crucial for functions that depend on order, like ranking or lead/lag functions.
- frame_clause: This optional clause defines the subset of rows within the partition to be considered for the current row. It specifies the "frame" of rows relative to the current row.
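To illustrate the effect of the partition clause, the following sketch computes the same SUM twice: once with an empty OVER () (the whole result set is one partition) and once with PARTITION BY. The employees rows are hypothetical sample data, and the VALUES-based CTE assumes PostgreSQL or SQLite 3.25+.

```sql
-- Hypothetical sample data for illustration only.
WITH employees(employee_name, department, salary) AS (
    VALUES
        ('Ana', 'Sales', 50000),
        ('Ben', 'Sales', 60000),
        ('Cy',  'Eng',   90000),
        ('Dee', 'Eng',   80000)
)
SELECT
    employee_name,
    department,
    salary,
    SUM(salary) OVER ()                        AS company_total,     -- one partition: all rows
    SUM(salary) OVER (PARTITION BY department) AS department_total  -- one partition per department
FROM employees
ORDER BY department, employee_name;
-- company_total is 280000 on every row;
-- department_total is 170000 for Eng rows and 110000 for Sales rows.
```

Every input row is still present in the output, which is the difference from a GROUP BY query.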
Types of Window Functions
1. Ranking Window Functions
These functions assign a rank to each row within its partition based on the ordering specified in the ORDER BY clause.
- ROW_NUMBER(): Assigns a unique, sequential integer to each row within its partition.
- RANK(): Assigns a rank to each row. Rows with the same value in the ordering columns receive the same rank, and the next rank is skipped (e.g., 1, 2, 2, 4).
- DENSE_RANK(): Assigns a rank to each row. Rows with the same value receive the same rank, and the next rank is consecutive (e.g., 1, 2, 2, 3).
- NTILE(n): Divides the rows within each partition into a specified number of groups (buckets) and assigns a bucket number to each row.
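The difference between these four functions is easiest to see side by side on data that contains a tie. The sketch below uses a hypothetical scores table (PostgreSQL or SQLite 3.25+ syntax assumed). A tiebreaker column is added to the ordering for ROW_NUMBER and NTILE so that their results are deterministic.

```sql
-- Hypothetical sample data with a tie on score = 85.
WITH scores(player, score) AS (
    VALUES ('Ann', 90), ('Bob', 85), ('Cat', 85), ('Dan', 70)
)
SELECT
    player,
    score,
    ROW_NUMBER() OVER (ORDER BY score DESC, player) AS row_num,
    RANK()       OVER (ORDER BY score DESC)         AS rnk,
    DENSE_RANK() OVER (ORDER BY score DESC)         AS dense_rnk,
    NTILE(2)     OVER (ORDER BY score DESC, player) AS half
FROM scores
ORDER BY score DESC, player;
-- player | score | row_num | rnk | dense_rnk | half
-- Ann    |    90 |       1 |   1 |         1 |    1
-- Bob    |    85 |       2 |   2 |         2 |    1
-- Cat    |    85 |       3 |   2 |         2 |    2
-- Dan    |    70 |       4 |   4 |         3 |    2
```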
Example: Using RANK()
Find the rank of each employee within their department based on salary.
SELECT
    employee_name,
    department,
    salary,
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
FROM employees;
2. Aggregate Window Functions
These are standard aggregate functions (like SUM, AVG, COUNT, MIN, MAX) used as window functions. They compute an aggregate value for each row based on the window defined for that row.
Example: Using SUM()
Calculate the cumulative sum of sales for each product category over time.
SELECT
    sale_date,
    product_category,
    sale_amount,
    SUM(sale_amount) OVER (PARTITION BY product_category ORDER BY sale_date) AS cumulative_sales
FROM sales_data;
3. Value Window Functions (Navigation Functions)
These functions access data from other rows in the same result set without the need for self-joins. They are useful for comparing values between rows.
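- LAG(expression, offset, default_value): Returns the value of the expression from the row that is offset rows before the current row in the same partition. offset defaults to 1; default_value (NULL if omitted) is returned when no such row exists.
- LEAD(expression, offset, default_value): Returns the value of the expression from the row that is offset rows after the current row in the same partition, with the same defaults as LAG.
- FIRST_VALUE(expression): Returns the value of the expression for the first row in the window frame.
- LAST_VALUE(expression): Returns the value of the expression for the last row in the window frame.
- NTH_VALUE(expression, n): Returns the value of the expression for the nth row in the window frame, counting from 1.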
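A short sketch of these functions on a hypothetical daily_sales table (PostgreSQL or SQLite 3.25+ syntax assumed). It also shows a common surprise: because FIRST_VALUE and LAST_VALUE operate on the window frame rather than the whole partition, LAST_VALUE with an ORDER BY and no explicit frame returns the current row's value, not the partition's last value.

```sql
-- Hypothetical sample data; ISO-formatted date strings sort chronologically.
WITH daily_sales(sale_date, amount) AS (
    VALUES ('2024-01-01', 100), ('2024-01-02', 150), ('2024-01-03', 120)
)
SELECT
    sale_date,
    amount,
    LAG(amount) OVER (ORDER BY sale_date)          AS previous_amount,
    amount - LAG(amount) OVER (ORDER BY sale_date) AS amount_change,
    FIRST_VALUE(amount) OVER (ORDER BY sale_date)  AS first_amount,
    -- Default frame ends at the current row, so this just echoes amount.
    LAST_VALUE(amount) OVER (ORDER BY sale_date)   AS last_default_frame,
    -- An explicit full-partition frame gives the true last value.
    LAST_VALUE(amount) OVER (
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS last_full_frame
FROM daily_sales
ORDER BY sale_date;
-- sale_date  | amount | previous_amount | amount_change | first_amount | last_default_frame | last_full_frame
-- 2024-01-01 |    100 | NULL            | NULL          |          100 |                100 |             120
-- 2024-01-02 |    150 | 100             | 50            |          100 |                150 |             120
-- 2024-01-03 |    120 | 150             | -30           |          100 |                120 |             120
```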
Example: Using LEAD()
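Find the salary gap between each employee and the next-highest-paid employee (the one ranked immediately below them) in the same department.
SELECT
    employee_name,
    department,
    salary,
    -- Salary of the employee ranked immediately below; NULL for the lowest-paid employee.
    LEAD(salary) OVER (PARTITION BY department ORDER BY salary DESC) AS next_highest_salary,
    -- A NULL default (rather than 0) keeps the last row's gap from equalling the full salary.
    salary - LEAD(salary) OVER (PARTITION BY department ORDER BY salary DESC) AS salary_gap
FROM employees;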
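The Frame Clause
The frame clause (frame_clause in the syntax above) specifies which rows within a partition are included in the "frame" for the current row. It refines the window beyond just the partition and order. FRAME itself is not a SQL keyword; the clause begins with a frame unit such as ROWS or RANGE.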
Common frame units include:
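- ROWS BETWEEN ... AND ...: Boundaries are counted in physical rows relative to the current row.
- RANGE BETWEEN ... AND ...: Boundaries are defined by the ORDER BY value, so rows that tie on that value (peers) are treated as a unit.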
Common frame boundaries:
- UNBOUNDED PRECEDING: From the beginning of the partition.
- n PRECEDING: n rows before the current row.
- CURRENT ROW: The current row.
- n FOLLOWING: n rows after the current row.
- UNBOUNDED FOLLOWING: To the end of the partition.
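The boundaries can be combined freely as long as the start does not come after the end. The sketch below, on a hypothetical readings table (PostgreSQL or SQLite 3.25+ syntax assumed), shows a frame centered on the current row and a frame running from the current row to the end of the partition.

```sql
-- Hypothetical sample data.
WITH readings(step_no, val) AS (
    VALUES (1, 10), (2, 20), (3, 30), (4, 40)
)
SELECT
    step_no,
    val,
    -- Previous row, current row, and next row (fewer at the partition edges).
    SUM(val) OVER (ORDER BY step_no ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS neighbour_sum,
    -- Current row through the end of the partition.
    SUM(val) OVER (ORDER BY step_no ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS remaining_sum
FROM readings
ORDER BY step_no;
-- step_no | val | neighbour_sum | remaining_sum
--       1 |  10 |            30 |           100
--       2 |  20 |            60 |            90
--       3 |  30 |            90 |            70
--       4 |  40 |            70 |            40
```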
If the frame_clause is omitted, the default behavior depends on whether an ORDER BY clause is present in the OVER clause:
- If ORDER BY is present: The default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Because the unit is RANGE, rows that tie with the current row on the ORDER BY value are included as well.
- If ORDER BY is absent: The default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING (the entire partition).
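The default RANGE frame matters for running totals when the ordering column contains duplicates. In this sketch (hypothetical sales_data rows, PostgreSQL or SQLite 3.25+ syntax assumed), two sales share a date: the default frame gives both rows the same total, while an explicit ROWS frame accumulates row by row. A tiebreaker column is added to the ROWS version so its order is deterministic.

```sql
-- Hypothetical sample data with two sales on 2024-01-02.
WITH sales_data(sale_date, product_category, sale_amount) AS (
    VALUES
        ('2024-01-01', 'Books', 10),
        ('2024-01-02', 'Books', 20),
        ('2024-01-02', 'Books', 30),
        ('2024-01-03', 'Books', 40)
)
SELECT
    sale_date,
    sale_amount,
    -- Default frame: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (includes peers).
    SUM(sale_amount) OVER (
        PARTITION BY product_category ORDER BY sale_date
    ) AS running_range,
    -- Explicit ROWS frame: stops exactly at the current row.
    SUM(sale_amount) OVER (
        PARTITION BY product_category ORDER BY sale_date, sale_amount
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_rows
FROM sales_data
ORDER BY sale_date, sale_amount;
-- sale_date  | sale_amount | running_range | running_rows
-- 2024-01-01 |          10 |            10 |           10
-- 2024-01-02 |          20 |            60 |           30
-- 2024-01-02 |          30 |            60 |           60
-- 2024-01-03 |          40 |           100 |          100
```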
The frame clause is key to precisely controlling the scope of calculations for window functions, especially when dealing with cumulative sums or moving averages.
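As one possible illustration, a trailing three-row moving average can be written with a ROWS frame. The daily_sales rows are hypothetical, and the syntax assumes PostgreSQL or SQLite 3.25+. Note that the first two rows average over fewer than three values because the frame is cut off at the start of the partition.

```sql
-- Hypothetical sample data, one row per day.
WITH daily_sales(sale_date, amount) AS (
    VALUES
        ('2024-01-01', 10),
        ('2024-01-02', 20),
        ('2024-01-03', 30),
        ('2024-01-04', 40)
)
SELECT
    sale_date,
    amount,
    -- Current row plus the two rows before it.
    AVG(amount) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS moving_avg_3
FROM daily_sales
ORDER BY sale_date;
-- moving_avg_3 is 10, 15, 20, 30 (numeric formatting varies by engine).
```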
Common Use Cases
- Calculating running totals or cumulative sums.
- Finding rankings within groups (e.g., top N performers per department).
- Comparing a row's value to previous or subsequent rows (e.g., year-over-year growth).
- Identifying gaps or overlaps in sequences.
- Calculating moving averages or rolling sums.
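The "top N per department" case can be sketched as follows. Window functions are not allowed directly in a WHERE clause, so the row number is computed in an inner query (here a CTE named ranked, a hypothetical name) and filtered in the outer query. Sample data is hypothetical; PostgreSQL or SQLite 3.25+ syntax is assumed.

```sql
-- Hypothetical sample data.
WITH employees(employee_name, department, salary) AS (
    VALUES
        ('Ana', 'Sales', 50000),
        ('Ben', 'Sales', 60000),
        ('Eve', 'Sales', 55000),
        ('Cy',  'Eng',   90000),
        ('Dee', 'Eng',   80000),
        ('Fay', 'Eng',   85000)
),
ranked AS (
    SELECT
        employee_name,
        department,
        salary,
        ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
    FROM employees
)
SELECT employee_name, department, salary
FROM ranked
WHERE rn <= 2          -- keep the two highest-paid employees per department
ORDER BY department, salary DESC;
-- Cy  | Eng   | 90000
-- Fay | Eng   | 85000
-- Ben | Sales | 60000
-- Eve | Sales | 55000
```

Using RANK() instead of ROW_NUMBER() would keep all employees tied at the cutoff, which may return more than N rows per department.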