SQL Server Subqueries Tutorial - MSDN Documentation

SQL Server Subqueries

Subqueries, also known as inner queries or nested queries, are a fundamental concept in SQL Server that allow you to write more complex and powerful queries. A subquery is a query nested inside another SQL query. The outer query then uses the results returned by the subquery.

What are Subqueries?

A subquery is a SELECT statement that is embedded within another SQL statement. The outer statement can be a SELECT, INSERT, UPDATE, or DELETE statement, and it can also be another subquery.

Subqueries can be used to:

Perform comparisons with a derived value.
Test for the existence of rows.
Determine whether a set of values exists.

Types of Subqueries

Subqueries can be classified in several ways, but the most common distinction is between scalar, row, table, and correlated subqueries.

1. Scalar Subqueries

A scalar subquery returns a single value (one row and one column). It can be used anywhere an expression is allowed.

Example: Finding employees with salary greater than the average salary.


SELECT
    EmployeeName,
    Salary
FROM
    Employees
WHERE
    Salary > (SELECT AVG(Salary) FROM Employees);

2. Row Subqueries

A row subquery returns a single row with one or more columns. These are typically used with comparison operators like =, <>, IN, NOT IN, ANY, ALL, and EXISTS.

Example: Finding employees in the same department as 'John Doe'.


SELECT
    EmployeeName,
    DepartmentID
FROM
    Employees
WHERE
    DepartmentID = (SELECT DepartmentID FROM Employees WHERE EmployeeName = 'John Doe');

Note: This assumes 'John Doe' is unique. If not, a scalar subquery might not be appropriate, and you might need IN.

3. Table Subqueries

A table subquery (also called a derived table) returns multiple rows and multiple columns. These are commonly used in the FROM clause of a SELECT statement.

Example: Finding departments with more than 10 employees.


SELECT
    DepartmentName,
    EmployeeCount
FROM
    (SELECT DepartmentID, COUNT(*) AS EmployeeCount FROM Employees GROUP BY DepartmentID) AS DeptCounts
JOIN
    Departments ON DeptCounts.DepartmentID = Departments.DepartmentID
WHERE
    DeptCounts.EmployeeCount > 10;

4. Correlated Subqueries

A correlated subquery is a subquery that references one or more columns from the outer query. This means the subquery is evaluated once for each row processed by the outer query. They can be less efficient than non-correlated subqueries.

Example: Finding employees whose salary is higher than the average salary of their own department.


SELECT
    e1.EmployeeName,
    e1.Salary,
    e1.DepartmentID
FROM
    Employees e1
WHERE
    e1.Salary > (SELECT AVG(e2.Salary)
                  FROM Employees e2
                  WHERE e2.DepartmentID = e1.DepartmentID);

Using Subqueries with `IN` and `EXISTS`

The IN operator checks if a value is present in a list of values returned by a subquery. The EXISTS operator checks if the subquery returns any rows.

`IN` Operator

Example: Finding customers who have placed an order.


SELECT
    CustomerID,
    CustomerName
FROM
    Customers
WHERE
    CustomerID IN (SELECT CustomerID FROM Orders);

`EXISTS` Operator

Example: Finding departments that have at least one employee.


SELECT
    DepartmentName
FROM
    Departments d
WHERE
    EXISTS (SELECT 1 FROM Employees e WHERE e.DepartmentID = d.DepartmentID);

Best Practices

Readability: Use indentation and formatting to make subqueries easy to read.
Performance: Correlated subqueries can be slow. Consider alternatives like JOINs or CTEs (Common Table Expressions) when possible.
EXISTS vs. IN: For large datasets, EXISTS is often more performant than IN, especially when the subquery returns many rows.
Alias: Always use aliases for derived tables (subqueries in the FROM clause) and for tables in correlated subqueries to improve clarity.