SQL Grouping Tutorial: Aggregate Functions & GROUP BY

Understanding GROUP BY in SQL

The GROUP BY clause in SQL is a powerful tool that allows you to group rows that have the same values in one or more columns into a summary row. This is often used in conjunction with aggregate functions (like COUNT, MAX, MIN, SUM, AVG) to perform calculations on each group.

What is GROUP BY?

Imagine you have a table of sales data, and you want to know the total sales for each product category. Instead of getting a single grand total, GROUP BY lets you segment your data and apply aggregate functions to each segment.

It collapses multiple rows into a single row based on the distinct values in the specified column(s).

Key Aggregate Functions

  • COUNT(column): Returns the number of rows or non-NULL values in a column.
  • SUM(column): Calculates the sum of values in a numeric column.
  • AVG(column): Computes the average of values in a numeric column.
  • MIN(column): Finds the minimum value in a column.
  • MAX(column): Finds the maximum value in a column.

Basic Syntax

The general syntax for using GROUP BY is:


SELECT column1, aggregate_function(column2)
FROM table_name
WHERE condition
GROUP BY column1
ORDER BY column1;
        

Important Note: Any column in the SELECT list that is not an aggregate function MUST be included in the GROUP BY clause.

Example Scenario

Let's consider a simple Orders table:

OrderID CustomerID OrderDate Amount Region
101C0012023-01-15150.00North
102C0022023-01-16220.50South
103C0012023-01-1775.20North
104C0032023-01-18310.00East
105C0022023-01-19180.75South
106C0012023-01-2095.00North
107C0042023-01-21450.00West
108C0032023-01-22110.25East

Example 1: Total Amount per Region

To find the total sales amount for each region, we can use SUM() with GROUP BY Region:


SELECT Region, SUM(Amount) AS TotalSales
FROM Orders
GROUP BY Region
ORDER BY Region;
        

Result:

Region TotalSales
East420.25
North320.20
South401.25
West450.00

Example 2: Number of Orders per Customer

To count how many orders each customer has placed:


SELECT CustomerID, COUNT(OrderID) AS NumberOfOrders
FROM Orders
GROUP BY CustomerID
ORDER BY CustomerID;
        

Result:

CustomerID NumberOfOrders
C0013
C0022
C0032
C0041

The HAVING Clause

While the WHERE clause filters rows *before* they are grouped, the HAVING clause filters groups *after* the grouping has occurred based on the results of aggregate functions.

Example 3: Regions with Total Sales Greater Than 400

We want to see only those regions where the total sales exceed 400:


SELECT Region, SUM(Amount) AS TotalSales
FROM Orders
GROUP BY Region
HAVING SUM(Amount) > 400
ORDER BY Region;
        

Result:

Region TotalSales
East420.25
South401.25
West450.00

Mastering GROUP BY and aggregate functions is fundamental to data analysis in SQL. It empowers you to summarize and derive insights from your data efficiently.