What are Subqueries?
A subquery, also known as an inner query or nested query, is a query that is embedded within another SQL query. Subqueries are used to perform operations that require multiple steps to complete. They can be used in various parts of a SQL statement, including the SELECT list, FROM clause, WHERE clause, and even within other subqueries.
Subqueries are powerful tools for:
- Filtering data based on results from another query.
- Performing calculations or aggregations that depend on other data.
- Simplifying complex query logic.
Types of Subqueries
Subqueries can be categorized based on the number of rows and columns they return, and their relationship with the outer query:
- Scalar Subqueries: Return a single value (one row, one column).
- Row Subqueries: Return a single row with multiple columns.
- Column Subqueries: Return a single column with multiple rows.
- Table Subqueries: Return multiple rows and multiple columns (treated as a derived table).
- Correlated Subqueries: The inner query depends on the outer query for its execution.
- Non-Correlated Subqueries: The inner query can be executed independently of the outer query.
Scalar Subqueries
A scalar subquery returns exactly one row and one column. It's often used in the SELECT list or WHERE clause where a single value is expected.
Example:
SELECTCustomerName, (SELECTAVG(OrderTotal)FROMOrdersWHERECustomerID = c.CustomerID )ASAverageOrderTotalFROMCustomers c;
Row and Column Subqueries
These subqueries return multiple columns or multiple rows, and are typically used with comparison operators like IN, ANY, or ALL.
Example (Column Subquery with IN):
SELECTProductNameFROMProductsWHEREProductIDIN(SELECTProductIDFROMOrderDetailsWHEREQuantity > 10 );
Correlated Subqueries
A correlated subquery is one where the inner query references columns from the outer query. The inner query is executed once for each row processed by the outer query.
Example:
SELECTE.EmployeeName, E.SalaryFROMEmployees EWHEREE.Salary > (SELECTAVG(Salary)FROMEmployeesWHEREDepartmentID = E.DepartmentID-- Correlated condition);
Subqueries in WHERE Clause
This is one of the most common uses. Subqueries in the WHERE clause are used to filter records based on a condition evaluated by the subquery.
Example: Find all customers who have placed at least one order.
SELECTCustomerNameFROMCustomersWHERECustomerIDIN(SELECTDISTINCT CustomerIDFROMOrders );
Subqueries in SELECT Clause
Scalar subqueries can be used in the SELECT list to retrieve calculated or related single values for each row returned by the main query.
Example: Show each product's name and the total quantity ordered for that product.
SELECTp.ProductName, (SELECTSUM(od.Quantity)FROMOrderDetails odWHEREod.ProductID = p.ProductID )ASTotalQuantitySoldFROMProducts p;
Subqueries in FROM Clause (Derived Tables)
A subquery in the FROM clause is treated as a temporary table, also known as a derived table or inline view. This derived table can then be queried like a regular table.
Example: Find the average order total per customer, but only for customers who have placed more than 5 orders.
SELECTCustomerID, AVG(OrderTotal)ASAverageOrderTotalFROMOrdersGROUPBYCustomerIDHAVINGCOUNT(*) > 5UNIONSELECT'N/A'ASCustomerID,NULLASAverageOrderTotalWHERENOT EXISTS(SELECT1FROMOrdersGROUPBYCustomerIDHAVINGCOUNT(*) > 5 );
Subqueries with Operators
Subqueries are often used with operators like:
IN/NOT IN: Checks if a value exists in a list returned by the subquery.=/<>: Used with scalar subqueries.>,<,>=,<=: Used with scalar subqueries or withANY/ALL.EXISTS/NOT EXISTS: Checks for the existence of rows returned by the subquery.ANY/SOME: Compares a value to each value in a list (e.g., > ANY means greater than at least one value).ALL: Compares a value to all values in a list (e.g., > ALL means greater than all values).
Example (EXISTS): Find customers who have NOT placed an order.
SELECTCustomerNameFROMCustomers cWHERENOT EXISTS(SELECT1FROMOrders oWHEREo.CustomerID = c.CustomerID );
Performance Considerations
While powerful, subqueries can sometimes impact performance. Here are some tips:
- Correlated Subqueries: Be mindful of correlated subqueries as they can be executed many times. Often, they can be rewritten as joins or derived tables for better performance.
INvs. `EXISTS`: For large result sets from the subquery, `EXISTS` is often more efficient than `IN` because `EXISTS` stops checking as soon as it finds a match, whereas `IN` might have to build a temporary list.- Indexing: Ensure that columns used in the join conditions (implicitly or explicitly) within your subqueries are properly indexed.
- `SELECT *` in Subqueries: Avoid selecting all columns (`SELECT *`) in a subquery if you only need one or a few. This reduces the amount of data processed.
- Use appropriate subquery types: Choose the subquery type that best fits the problem. A scalar subquery is more efficient than a row or table subquery if only a single value is needed.