SQL Database Engine Indexes

Indexes are special lookup tables that the database search engine can use to speed up data retrieval operations on a table. An index is a data structure (most commonly a B-tree) associated with a table or a view. The index stores a copy of one or more columns of the table or view in a sorted order, along with pointers to the original rows. This allows the database engine to quickly locate rows that match specific criteria without scanning the entire table.

Types of Indexes

Clustered Indexes

A clustered index defines the physical order of data in the table. Because of this, a table can only have one clustered index. The leaf nodes of a clustered index contain the actual data pages of the table. This makes data retrieval very efficient for range queries or when searching by the clustered index key.

Key Characteristics:

Defines physical storage order.
Only one per table.
Leaf nodes contain data.

Nonclustered Indexes

A nonclustered index is a separate structure from the data rows. It contains index key values and pointers (row locators) to the data rows. A table can have multiple nonclustered indexes. They are useful for speeding up searches on columns that are frequently queried but are not the primary means of organizing the table's data.

Key Characteristics:

Separate from data rows.
Multiple per table allowed.
Leaf nodes contain pointers to data.

Index Creation and Management

Indexes can be created using the CREATE INDEX statement. It's crucial to design indexes strategically to balance performance gains with the overhead of maintaining them.

Creating a Clustered Index

CREATE CLUSTERED INDEX IX_Customer_CustomerID
ON Customers (CustomerID);

Creating a Nonclustered Index

CREATE NONCLUSTERED INDEX IX_Order_OrderDate
ON Orders (OrderDate);

Important Considerations:

Choose columns for indexes that are frequently used in WHERE clauses, JOIN conditions, or ORDER BY clauses.
Avoid indexing columns with very low cardinality (few distinct values).
Consider composite indexes for queries filtering on multiple columns.
Regularly review and maintain indexes to remove fragmentation and unused indexes.

Index Maintenance

Over time, data modifications (inserts, updates, deletes) can lead to index fragmentation. Fragmentation can degrade query performance. SQL Server provides tools and commands to manage index fragmentation.

Reorganizing and Rebuilding Indexes

Reorganize: Merges pages with fewer than 50% free space and defragments page order. It's an online operation for most editions.
Rebuild: Deallocates unused pages, marks pages for deallocation, and reorganizes all pages into a highly efficient, sorted order. It can be an online operation depending on the SQL Server edition and index type.

You can use the ALTER INDEX statement for these operations:

-- Reorganize a nonclustered index
ALTER INDEX IX_Order_OrderDate ON Orders REORGANIZE;

-- Rebuild a clustered index
ALTER INDEX PK_Product_ProductID ON Products REBUILD;

Performance Tip: Use dynamic management views (DMVs) like sys.dm_db_index_physical_stats to identify fragmented indexes and determine the appropriate action (reorganize or rebuild).

Covering Indexes

A covering index is a nonclustered index that includes all the columns required to satisfy a query directly from the index. This avoids the need for the database engine to access the base table, significantly improving performance.

-- Query that can be satisfied by a covering index
SELECT CustomerID, FirstName, LastName
FROM Customers
WHERE PostalCode = '90210';

-- Creating a covering index for the above query
CREATE NONCLUSTERED INDEX IX_Customer_PostalCode_Names
ON Customers (PostalCode)
INCLUDE (FirstName, LastName);

By understanding and effectively utilizing indexes, you can dramatically improve the performance and scalability of your SQL Server database applications.