SQL Subqueries
Subqueries, also known as inner queries or nested queries, are SQL queries embedded within another SQL query. They are a powerful tool for performing complex data retrieval and manipulation tasks that cannot be easily achieved with a single, simple query.
What is a Subquery?
A subquery is a `SELECT` statement that is nested inside another SQL statement such as `SELECT`, `INSERT`, `UPDATE`, or `DELETE`. Subqueries can also be used within other subqueries.
Types of Subqueries
Subqueries can be categorized based on the number of rows and columns they return and how they are used in the outer query:
- Scalar Subquery: Returns a single value (one row, one column).
- Row Subquery: Returns a single row with multiple columns.
- Column Subquery: Returns a single column with multiple rows.
- Table Subquery: Returns multiple rows and multiple columns.
Where Can Subqueries Be Used?
Subqueries can be used in various clauses of a SQL statement:
- `WHERE` Clause: To filter rows based on the result of the subquery.
- `HAVING` Clause: To filter groups based on the result of the subquery.
- `SELECT` Clause: To retrieve a single value as a column.
- `FROM` Clause: To treat the result of the subquery as a temporary table (derived table).
- `INSERT`, `UPDATE`, `DELETE` Statements: To specify values or rows to be affected.
Examples of Subqueries
1. Subquery in the `WHERE` Clause (Scalar Subquery)
Find all employees whose salary is greater than the average salary.
SELECT employee_name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
2. Subquery in the `WHERE` Clause (Column Subquery with `IN`)
Find all customers who have placed an order.
SELECT customer_name
FROM customers
WHERE customer_id IN (SELECT DISTINCT customer_id FROM orders);
3. Subquery in the `WHERE` Clause (Column Subquery with `EXISTS`)
Find all customers who have never placed an order.
SELECT customer_name
FROM customers c
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id);
4. Subquery in the `SELECT` Clause (Scalar Subquery)
Show each product name along with the total number of orders it has been included in.
SELECT
product_name,
(SELECT COUNT(*) FROM order_items oi WHERE oi.product_id = p.product_id) AS total_orders
FROM
products p;
5. Subquery in the `FROM` Clause (Derived Table)
Find the average order total for each customer.
SELECT
c.customer_name,
AVG(order_totals.total_amount) AS average_order_total
FROM
customers c
JOIN
(SELECT customer_id, SUM(quantity * price) AS total_amount
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
GROUP BY customer_id) AS order_totals
ON c.customer_id = order_totals.customer_id
GROUP BY c.customer_name;
Correlated vs. Non-Correlated Subqueries
- Non-Correlated Subquery: The subquery can be executed independently of the outer query. It runs only once.
- Correlated Subquery: The subquery depends on the values from the outer query. It is executed repeatedly, once for each row in the outer query.
Performance Considerations
While subqueries are powerful, their performance can be a concern, especially correlated subqueries or when dealing with large datasets. Always analyze the execution plan and consider:
- Using `JOIN` operations instead of subqueries where possible.
- Optimizing the subquery itself.
- Ensuring appropriate indexes are in place.
- Using `EXISTS` or `NOT EXISTS` when simply checking for the presence or absence of rows, as they can be more efficient than `IN` or `NOT IN` with large datasets.