MSDN Documentation

SQL Grouping and Aggregation

This document covers essential concepts of grouping and aggregation in SQL Server. These techniques are fundamental for summarizing and analyzing data within your databases.

Introduction to Aggregation

Aggregation involves performing a calculation on a set of rows and returning a single value. SQL Server provides several built-in aggregate functions, including COUNT, SUM, AVG, MIN, and MAX.

Example Usage of Aggregate Functions

Consider a table named Sales with columns like ProductID, Quantity, and Price.

SELECT
    COUNT(*) AS TotalOrders,
    SUM(Quantity * Price) AS TotalRevenue,
    AVG(Quantity) AS AverageQuantityPerOrder,
    MIN(Price) AS MinimumPrice,
    MAX(Price) AS MaximumPrice
FROM
    Sales;
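
One detail worth checking in a query like this: in SQL Server, AVG over an integer column returns an integer, so any fractional part is truncated. The sketch below illustrates the effect. It assumes Quantity is an int column, and the table variable @Sales and its rows are made up purely for illustration.

```sql
-- Hypothetical sample data; the column types are assumptions, not a real schema.
DECLARE @Sales TABLE (
    ProductID int,
    Quantity  int,
    Price     decimal(10, 2)
);

INSERT INTO @Sales (ProductID, Quantity, Price)
VALUES (1, 1, 10.00),
       (1, 2, 10.00),
       (2, 4, 25.00);

SELECT
    AVG(Quantity)       AS IntegerAverage,  -- int in, int out: 7 / 3 is truncated to 2
    AVG(Quantity * 1.0) AS DecimalAverage   -- multiplying by 1.0 converts each value to a decimal first
FROM
    @Sales;
```

If a fractional average is what you want, converting the column before averaging (as in the second expression) is one way to get it.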

The GROUP BY Clause

The GROUP BY clause collects rows that share the same values in the specified columns into groups. It is most often used together with aggregate functions to perform a calculation on each group.

When you use GROUP BY, every column referenced in the SELECT list must either appear in the GROUP BY clause or be enclosed in an aggregate function.
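
As an illustration of this rule, the sketch below contrasts a query that SQL Server rejects with a corrected version. The table variable @Sales and its rows are hypothetical stand-ins for the Sales table. The commented-out statement fails with error 8120 because Price is neither grouped nor aggregated.

```sql
-- Hypothetical sample data standing in for the Sales table.
DECLARE @Sales TABLE (ProductID int, Quantity int, Price decimal(10, 2));

INSERT INTO @Sales (ProductID, Quantity, Price)
VALUES (1, 2, 10.00),
       (1, 1, 12.00),
       (2, 4, 25.00);

-- Invalid: Price is not in the GROUP BY clause and is not inside an aggregate,
-- so SQL Server raises error 8120 if this statement is uncommented.
-- SELECT ProductID, Price, SUM(Quantity) AS TotalQuantity
-- FROM @Sales
-- GROUP BY ProductID;

-- Valid: every column that is not grouped is wrapped in an aggregate function.
SELECT
    ProductID,
    MAX(Price)    AS HighestPrice,
    SUM(Quantity) AS TotalQuantity
FROM
    @Sales
GROUP BY
    ProductID;
```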

Example: Grouping Sales by Product

To find the total revenue for each product:

SELECT
    ProductID,
    SUM(Quantity * Price) AS TotalRevenuePerProduct
FROM
    Sales
GROUP BY
    ProductID
ORDER BY
    ProductID;

This query will return a result set where each row represents a unique ProductID and its corresponding total revenue.

The HAVING Clause

The HAVING clause is used to filter groups based on a specified condition. It is similar to the WHERE clause, but WHERE filters individual rows before grouping, while HAVING filters groups after aggregation.

Example: Products with Revenue Above a Threshold

To find products whose total revenue exceeds $10,000:

SELECT
    ProductID,
    SUM(Quantity * Price) AS TotalRevenuePerProduct
FROM
    Sales
GROUP BY
    ProductID
HAVING
    SUM(Quantity * Price) > 10000
ORDER BY
    TotalRevenuePerProduct DESC;

Tip: WHERE vs. HAVING

Remember: WHERE filters rows before grouping, while HAVING filters groups after grouping.
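
Both clauses can appear in the same query. The following sketch uses a hypothetical table variable with made-up rows: WHERE first discards low-priced rows, and HAVING then keeps only the groups whose remaining revenue exceeds 100.

```sql
-- Hypothetical sample data; values are chosen only to make the filtering visible.
DECLARE @Sales TABLE (ProductID int, Quantity int, Price decimal(10, 2));

INSERT INTO @Sales (ProductID, Quantity, Price)
VALUES (1,  5, 30.00),   -- revenue 150
       (1,  2,  4.00),   -- removed by WHERE (Price below 5)
       (2,  3, 20.00),   -- revenue 60
       (3, 10, 12.00);   -- revenue 120

SELECT
    ProductID,
    SUM(Quantity * Price) AS TotalRevenuePerProduct
FROM
    @Sales
WHERE
    Price >= 5                     -- row filter, applied before grouping
GROUP BY
    ProductID
HAVING
    SUM(Quantity * Price) > 100    -- group filter, applied after aggregation
ORDER BY
    TotalRevenuePerProduct DESC;
-- Returns product 1 (150.00) and product 3 (120.00);
-- product 2 totals 60.00 and is removed by HAVING.
```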

Advanced Aggregation Techniques

ROLLUP and CUBE

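ROLLUP and CUBE are extensions to the GROUP BY clause that generate subtotals and grand totals in addition to the regular grouped rows. ROLLUP produces subtotals along the hierarchy implied by the column order, while CUBE produces them for every combination of the listed columns. Both are particularly useful for summary reports.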

Example: Using ROLLUP

Assuming a Sales table with Region and ProductID columns:

SELECT
    Region,
    ProductID,
    SUM(Quantity) AS TotalQuantitySold
FROM
    Sales
GROUP BY
    ROLLUP (Region, ProductID)
ORDER BY
    Region, ProductID;

This will show the total quantity sold for each product within each region, the total quantity sold for each region (regardless of product), and the grand total quantity sold.
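
For comparison, a CUBE over the same two columns might look like the sketch below. The table variable and its rows are hypothetical. Where ROLLUP produces subtotals only along the Region-then-ProductID hierarchy, CUBE also returns per-product totals across all regions.

```sql
-- Hypothetical sample data with the Region and ProductID columns assumed above.
DECLARE @Sales TABLE (Region varchar(20), ProductID int, Quantity int);

INSERT INTO @Sales (Region, ProductID, Quantity)
VALUES ('East', 1, 10),
       ('East', 2, 5),
       ('West', 1, 7);

SELECT
    Region,
    ProductID,
    SUM(Quantity) AS TotalQuantitySold
FROM
    @Sales
GROUP BY
    CUBE (Region, ProductID)
ORDER BY
    Region, ProductID;
-- Eight rows for this data: 3 region/product rows, 2 region subtotals,
-- 2 product subtotals (product 1 = 17, product 2 = 5), and 1 grand total (22).
-- ROLLUP (Region, ProductID) would omit the 2 product subtotal rows.
```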

Note on NULLs

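When using ROLLUP or CUBE, a NULL in a grouping column of the result set marks a subtotal or grand total row. For example, with ROLLUP (Region, ProductID), a row with a specific Region and a NULL ProductID is the total for that region across all products, and a row with NULL in both columns is the grand total. A row with a NULL Region and a specific ProductID, which represents the total for that ProductID across all regions, appears only with CUBE.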
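
Because a NULL can also be genuine data, it may help to label subtotal rows explicitly. The sketch below uses the built-in GROUPING function, which returns 1 when the column has been aggregated away in that row and 0 otherwise. The table variable, its rows, and the 'All regions' label are made up for illustration.

```sql
-- Hypothetical sample data, including one row whose Region is genuinely NULL.
DECLARE @Sales TABLE (Region varchar(20), ProductID int, Quantity int);

INSERT INTO @Sales (Region, ProductID, Quantity)
VALUES ('East', 1, 10),
       ('East', 2, 5),
       (NULL,   1, 7);   -- region not recorded in the source data

SELECT
    -- Replace the NULL only on rows where Region was aggregated away.
    CASE WHEN GROUPING(Region) = 1 THEN 'All regions' ELSE Region END AS Region,
    ProductID,
    GROUPING(ProductID) AS IsProductSubtotal,  -- 1 on subtotal and grand total rows
    SUM(Quantity)       AS TotalQuantitySold
FROM
    @Sales
GROUP BY
    ROLLUP (Region, ProductID);
```

With this approach the row for the unrecorded region keeps its NULL, while only the grand total row is labeled 'All regions'.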

Common Aggregation Scenarios

Summary

Grouping and aggregation are powerful tools in SQL Server for data analysis and reporting. By mastering aggregate functions, the GROUP BY clause, and the HAVING clause, you can derive meaningful insights from your data. Understanding ROLLUP and CUBE further enhances your ability to create comprehensive summary reports.