SQL Window Functions
Window functions perform calculations across a set of table rows that are somehow related to the current row. This set of rows is called a "window". A window function also has an OVER
clause that specifies how to partition the rows into groups (partitions) and how to order the rows within each partition.
Unlike aggregate functions that collapse rows into a single output row, window functions return a value for each row based on the window defined. This makes them incredibly powerful for tasks like ranking, calculating running totals, and finding moving averages.
Common Window Functions
-
ROW_NUMBER()
Assigns a unique sequential integer to each row within its partition, starting at 1. The order of rows within the partition is determined by the
ORDER BY
clause in theOVER
clause.ROW_NUMBER() OVER (ORDER BY column_name)
-
RANK()
Assigns a rank to each row within its partition. Rows with the same value in the ordering columns receive the same rank. The next rank is skipped. For example, if two rows are ranked 2, the next rank is 4.
RANK() OVER (ORDER BY column_name)
-
DENSE_RANK()
Assigns a rank to each row within its partition. Rows with the same value in the ordering columns receive the same rank. Unlike
RANK()
, the next rank is not skipped. For example, if two rows are ranked 2, the next rank is 3.DENSE_RANK() OVER (ORDER BY column_name)
-
LAG()
Accesses data from a previous row in the same result set without the use of self-joins. It returns the value from the row that precedes the current row within its partition based on the specified order.
LAG(column_name, offset, default_value) OVER (ORDER BY column_name)
-
LEAD()
Accesses data from a subsequent row in the same result set without the use of self-joins. It returns the value from the row that follows the current row within its partition based on the specified order.
LEAD(column_name, offset, default_value) OVER (ORDER BY column_name)
-
SUM() OVER()
Calculates the sum of a numeric expression for all rows within the window. When used with
PARTITION BY
, it calculates the sum for each partition.SUM(column_name) OVER (PARTITION BY partition_column ORDER BY order_column ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
-
AVG() OVER()
Calculates the average of a numeric expression for all rows within the window.
AVG(column_name) OVER (PARTITION BY partition_column ORDER BY order_column)
-
COUNT() OVER()
Counts the number of rows within the window.
COUNT(*) OVER (PARTITION BY partition_column)
-
FIRST_VALUE() OVER()
Returns the value of the first row in an ordered partition.
FIRST_VALUE(column_name) OVER (PARTITION BY partition_column ORDER BY order_column)
-
LAST_VALUE() OVER()
Returns the value of the last row in an ordered partition. Note: The default frame for
LAST_VALUE
isRANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
. To get the true last value of the partition, you often need to specifyROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
.LAST_VALUE(column_name) OVER (PARTITION BY partition_column ORDER BY order_column ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
Example: Calculating Running Totals
Let's consider a table of sales data:
SaleID | Product | SaleAmount | SaleDate |
---|---|---|---|
1 | Laptop | 1200.00 | 2023-01-15 |
2 | Keyboard | 75.00 | 2023-01-16 |
3 | Mouse | 25.00 | 2023-01-17 |
4 | Monitor | 300.00 | 2023-01-18 |
5 | Webcam | 50.00 | 2023-01-19 |
To calculate a running total of sales amounts, we can use the SUM()
window function:
SELECT SaleID, Product, SaleAmount, SaleDate, SUM(SaleAmount) OVER (ORDER BY SaleDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningTotal FROM Sales ORDER BY SaleDate;
This would produce the following result:
SaleID | Product | SaleAmount | SaleDate | RunningTotal |
---|---|---|---|---|
1 | Laptop | 1200.00 | 2023-01-15 | 1200.00 |
2 | Keyboard | 75.00 | 2023-01-16 | 1275.00 |
3 | Mouse | 25.00 | 2023-01-17 | 1300.00 |
4 | Monitor | 300.00 | 2023-01-18 | 1600.00 |
5 | Webcam | 50.00 | 2023-01-19 | 1650.00 |
Example: Partitioning for Departmental Averages
Consider an employee table with departments:
EmployeeID | Name | Department | Salary |
---|---|---|---|
101 | Alice | Sales | 60000 |
102 | Bob | Sales | 65000 |
103 | Charlie | IT | 70000 |
104 | David | IT | 75000 |
105 | Eve | Sales | 58000 |
To calculate the average salary within each department:
SELECT EmployeeID, Name, Department, Salary, AVG(Salary) OVER (PARTITION BY Department) AS DepartmentAverageSalary FROM Employees ORDER BY Department, Salary DESC;
This query would yield:
EmployeeID | Name | Department | Salary | DepartmentAverageSalary |
---|---|---|---|---|
102 | Bob | Sales | 65000 | 61000.00 |
101 | Alice | Sales | 60000 | 61000.00 |
105 | Eve | Sales | 58000 | 61000.00 |
104 | David | IT | 75000 | 72500.00 |
103 | Charlie | IT | 70000 | 72500.00 |