SQL Subqueries Explained

What are Subqueries?

A subquery, also known as an inner query or nested query, is a query embedded within another SQL query. The outer query can be a SELECT, INSERT, UPDATE, or DELETE statement. Subqueries are powerful tools that allow you to break down complex problems into smaller, more manageable parts.

They are commonly used when you need to:

  • Filter data based on the results of another query.
  • Retrieve data that meets specific criteria determined by another query.
  • Perform calculations or aggregations that influence the main query.

Types of Subqueries

Subqueries can be classified based on their return values:

  • Scalar Subquery: Returns a single value (one row, one column).
  • Row Subquery: Returns a single row with multiple columns.
  • Column Subquery: Returns a single column with multiple rows.
  • Table Subquery: Returns multiple rows and multiple columns.

They can also be categorized by their relationship to the outer query:

  • Non-correlated Subquery: Executed once, and its result is used by the outer query.
  • Correlated Subquery: Executed repeatedly for each row processed by the outer query.

Subqueries in the WHERE Clause

This is one of the most common uses of subqueries. You can use them with comparison operators (=, !=, <, >, <=, >=), IN, NOT IN, EXISTS, and NOT EXISTS.

Example: Finding Employees Earning More Than the Average Salary

Let's say we have an employees table. We want to find all employees whose salary is greater than the average salary of all employees.


SELECT
    employee_name,
    salary
FROM
    employees
WHERE
    salary > (SELECT AVG(salary) FROM employees);
                    

In this example, the inner query (SELECT AVG(salary) FROM employees) calculates the average salary, and the outer query uses this single value to filter the results.

Example: Finding Products in a Specific Category

Suppose we have products and categories tables. We want to find all products belonging to the 'Electronics' category.


SELECT
    product_name,
    price
FROM
    products
WHERE
    category_id IN (SELECT category_id FROM categories WHERE category_name = 'Electronics');
                    

Here, the subquery returns a list of category_ids for 'Electronics', and the outer query selects products matching any of these IDs.

Subqueries in the FROM Clause (Derived Tables)

When a subquery is placed in the FROM clause, it's often referred to as a derived table or inline view. The result set of the subquery acts as a temporary table that the outer query can operate on.

Example: Calculating Average Order Value per Customer

Consider orders and customers tables. We want to find the average order value for each customer.


SELECT
    c.customer_name,
    avg_orders.average_order_value
FROM
    customers c
JOIN
    (SELECT
         customer_id,
         AVG(order_total) AS average_order_value
     FROM
         orders
     GROUP BY
         customer_id) AS avg_orders
ON
    c.customer_id = avg_orders.customer_id;
                    

The subquery aliased as avg_orders computes the average order value per customer, which is then joined with the customers table.

Subqueries in the SELECT Clause (Scalar Subqueries)

A scalar subquery can be used in the SELECT list to retrieve a single value for each row returned by the main query. This is useful for adding calculated or related information.

Example: Displaying Employee Name and Their Department's Average Salary


SELECT
    e.employee_name,
    e.salary,
    (SELECT AVG(salary) FROM employees WHERE department_id = e.department_id) AS department_average_salary
FROM
    employees e;
                    

For each employee e, the subquery calculates the average salary for their specific department.

Correlated vs. Non-correlated Subqueries

Non-correlated: The inner query can be executed independently of the outer query. Its result is determined once and then used by the outer query. The example of finding employees earning more than the average salary is non-correlated.

Correlated: The inner query references columns from the outer query. It depends on the outer query's current row and is executed for each row processed by the outer query. This can sometimes be less efficient.

Example: Finding Employees in the Same Department with Higher Salary (Correlated)


SELECT
    e1.employee_name,
    e1.salary
FROM
    employees e1
WHERE
    e1.salary > (SELECT AVG(e2.salary)
                   FROM employees e2
                   WHERE e2.department_id = e1.department_id);
                    

Here, for each employee e1, the subquery calculates the average salary specifically for e1's department (e2.department_id = e1.department_id).

Subqueries with EXISTS and NOT EXISTS

These operators check for the existence of rows returned by the subquery. They are often used for performance optimization, especially when dealing with large tables.

Example: Finding Customers Who Have Placed an Order


SELECT
    c.customer_name
FROM
    customers c
WHERE
    EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id);
                    

The subquery (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id) returns at least one row if the customer has an order. EXISTS is true if the subquery returns any rows.

Best Practices and Considerations

  • Readability: Use clear aliases for subqueries (derived tables). Indent your SQL code properly.
  • Performance: Correlated subqueries can be slow. Consider rewriting them using JOINs or temporary tables if performance is an issue. Use EXPLAIN to analyze query execution plans.
  • Scalar Subqueries: Ensure scalar subqueries return at most one row and one column, otherwise, you'll get an error.
  • Alternatives: Sometimes, JOINs or Common Table Expressions (CTEs) can provide a more readable and performant alternative to subqueries.