SQL Window Functions

Window functions perform calculations across a set of table rows that are somehow related to the current row. This set of rows is called a "window". A window function is similar to an aggregate function, but it returns a value for each row, whereas an aggregate function returns a single value for a group of rows.

Introduction to Window Functions

Window functions allow you to perform complex analytical queries that were difficult or impossible with standard SQL aggregate functions. They are particularly useful for tasks like:

  • Calculating running totals.
  • Finding the rank of a row within a partition.
  • Determining moving averages.
  • Comparing a row's value to the average value within its partition.

Syntax

The basic syntax for a window function involves the OVER clause:

window_function(expression) OVER ([PARTITION BY partition_expression, ... ] [ORDER BY sort_expression [ASC | DESC], ... ])

OVER Clause Components:

  • PARTITION BY: Divides the rows into partitions to which the window function is applied. Similar to GROUP BY but does not collapse rows.
  • ORDER BY: Specifies the order of rows within each partition. This is crucial for functions like ROW_NUMBER, RANK, LAG, and LEAD.
  • Frame Clause (optional): Defines a subset of rows within the partition to be considered for the current row (e.g., ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW for running totals).

Common Window Functions

1. Ranking Functions

  • ROW_NUMBER(): Assigns a unique sequential integer to each row within its partition.
  • RANK(): Assigns a rank to each row within its partition. Rows with the same value receive the same rank, and the next rank is skipped.
  • DENSE_RANK(): Assigns a rank to each row within its partition. Rows with the same value receive the same rank, and the next rank is not skipped.
  • NTILE(n): Divides rows within a partition into a specified number of groups (n) and assigns a group number to each row.

Example: Ranking Sales by Employee

Assume a Sales table with columns EmployeeID, SaleDate, and Amount.

SELECT
    EmployeeID,
    SaleDate,
    Amount,
    RANK() OVER (PARTITION BY EmployeeID ORDER BY Amount DESC) as SaleRank
FROM
    Sales;

This query ranks each sale for each employee in descending order of amount. The highest sale for an employee gets rank 1.

2. Value Functions

  • LAG(expression, offset, default): Accesses data from a previous row in the same result set without the use of a subquery.
  • LEAD(expression, offset, default): Accesses data from a subsequent row in the same result set without the use of a subquery.
  • FIRST_VALUE(expression): Returns the value of the specified expression for the first row in the window frame.
  • LAST_VALUE(expression): Returns the value of the specified expression for the last row in the window frame.
  • NTH_VALUE(expression, n): Returns the value of the specified expression for the nth row in the window frame.

Example: Comparing Sales to Previous Sale

SELECT
    EmployeeID,
    SaleDate,
    Amount,
    LAG(Amount, 1, 0) OVER (PARTITION BY EmployeeID ORDER BY SaleDate) as PreviousSaleAmount
FROM
    Sales;

This query shows the amount of the previous sale for each employee. If there is no previous sale, it defaults to 0.

3. Aggregate Functions (as Window Functions)

Standard aggregate functions like SUM, AVG, COUNT, MIN, and MAX can also be used as window functions by adding the OVER clause.

Example: Running Total of Sales

SELECT
    EmployeeID,
    SaleDate,
    Amount,
    SUM(Amount) OVER (PARTITION BY EmployeeID ORDER BY SaleDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as RunningTotal
FROM
    Sales;

This query calculates the cumulative sum of sales for each employee, ordered by sale date. ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is the default frame for ordered partitions, but it's good practice to be explicit.

When to Use Window Functions

Use window functions when you need to perform calculations that involve multiple rows related to the current row, without collapsing the rows like a standard GROUP BY clause would. They are invaluable for time-series analysis, performance reporting, and ranking scenarios.

Considerations

  • Performance: While powerful, complex window functions can impact query performance. Ensure proper indexing on columns used in PARTITION BY and ORDER BY clauses.
  • Syntax Variations: Different SQL dialects (e.g., SQL Server, PostgreSQL, MySQL 8+, Oracle) may have slight variations in the supported window functions or syntax.

For more advanced usage and specific SQL dialect implementations, please refer to the detailed documentation for your database system.