SQL Subqueries Explained
What are Subqueries?
A subquery, also known as an inner query or nested query, is a query embedded within another SQL query. The outer query can be a SELECT
, INSERT
, UPDATE
, or DELETE
statement. Subqueries are powerful tools that allow you to break down complex problems into smaller, more manageable parts.
They are commonly used when you need to:
- Filter data based on the results of another query.
- Retrieve data that meets specific criteria determined by another query.
- Perform calculations or aggregations that influence the main query.
Types of Subqueries
Subqueries can be classified based on their return values:
- Scalar Subquery: Returns a single value (one row, one column).
- Row Subquery: Returns a single row with multiple columns.
- Column Subquery: Returns a single column with multiple rows.
- Table Subquery: Returns multiple rows and multiple columns.
They can also be categorized by their relationship to the outer query:
- Non-correlated Subquery: Executed once, and its result is used by the outer query.
- Correlated Subquery: Executed repeatedly for each row processed by the outer query.
Subqueries in the WHERE Clause
This is one of the most common uses of subqueries. You can use them with comparison operators (=
, !=
, <
, >
, <=
, >=
), IN
, NOT IN
, EXISTS
, and NOT EXISTS
.
Example: Finding Employees Earning More Than the Average Salary
Let's say we have an employees
table. We want to find all employees whose salary is greater than the average salary of all employees.
SELECT
employee_name,
salary
FROM
employees
WHERE
salary > (SELECT AVG(salary) FROM employees);
In this example, the inner query (SELECT AVG(salary) FROM employees)
calculates the average salary, and the outer query uses this single value to filter the results.
Example: Finding Products in a Specific Category
Suppose we have products
and categories
tables. We want to find all products belonging to the 'Electronics' category.
SELECT
product_name,
price
FROM
products
WHERE
category_id IN (SELECT category_id FROM categories WHERE category_name = 'Electronics');
Here, the subquery returns a list of category_id
s for 'Electronics', and the outer query selects products matching any of these IDs.
Subqueries in the FROM Clause (Derived Tables)
When a subquery is placed in the FROM
clause, it's often referred to as a derived table or inline view. The result set of the subquery acts as a temporary table that the outer query can operate on.
Example: Calculating Average Order Value per Customer
Consider orders
and customers
tables. We want to find the average order value for each customer.
SELECT
c.customer_name,
avg_orders.average_order_value
FROM
customers c
JOIN
(SELECT
customer_id,
AVG(order_total) AS average_order_value
FROM
orders
GROUP BY
customer_id) AS avg_orders
ON
c.customer_id = avg_orders.customer_id;
The subquery aliased as avg_orders
computes the average order value per customer, which is then joined with the customers
table.
Subqueries in the SELECT Clause (Scalar Subqueries)
A scalar subquery can be used in the SELECT
list to retrieve a single value for each row returned by the main query. This is useful for adding calculated or related information.
Example: Displaying Employee Name and Their Department's Average Salary
SELECT
e.employee_name,
e.salary,
(SELECT AVG(salary) FROM employees WHERE department_id = e.department_id) AS department_average_salary
FROM
employees e;
For each employee e
, the subquery calculates the average salary for their specific department.
Correlated vs. Non-correlated Subqueries
Non-correlated: The inner query can be executed independently of the outer query. Its result is determined once and then used by the outer query. The example of finding employees earning more than the average salary is non-correlated.
Correlated: The inner query references columns from the outer query. It depends on the outer query's current row and is executed for each row processed by the outer query. This can sometimes be less efficient.
Example: Finding Employees in the Same Department with Higher Salary (Correlated)
SELECT
e1.employee_name,
e1.salary
FROM
employees e1
WHERE
e1.salary > (SELECT AVG(e2.salary)
FROM employees e2
WHERE e2.department_id = e1.department_id);
Here, for each employee e1
, the subquery calculates the average salary specifically for e1
's department (e2.department_id = e1.department_id
).
Subqueries with EXISTS and NOT EXISTS
These operators check for the existence of rows returned by the subquery. They are often used for performance optimization, especially when dealing with large tables.
Example: Finding Customers Who Have Placed an Order
SELECT
c.customer_name
FROM
customers c
WHERE
EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id);
The subquery (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id)
returns at least one row if the customer has an order. EXISTS
is true if the subquery returns any rows.
Best Practices and Considerations
- Readability: Use clear aliases for subqueries (derived tables). Indent your SQL code properly.
- Performance: Correlated subqueries can be slow. Consider rewriting them using JOINs or temporary tables if performance is an issue. Use
EXPLAIN
to analyze query execution plans. - Scalar Subqueries: Ensure scalar subqueries return at most one row and one column, otherwise, you'll get an error.
- Alternatives: Sometimes, JOINs or Common Table Expressions (CTEs) can provide a more readable and performant alternative to subqueries.