Clustered Indexes

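A clustered index defines the order in which the data rows of a table are stored. Because the rows themselves can be stored in only one order, a table can have only one clustered index. The ordering is logical: the rows within each page, and the pages of the leaf level, are kept in clustered index key order, but the pages are not guaranteed to be contiguous on disk.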

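When a clustered index is created, the data rows themselves are stored in the leaf level of the index, sorted by the clustered index key. SQL Server uses the clustered index key to locate each row, and every nonclustered index on the table stores that key as its pointer back to the row. If the key is not unique, SQL Server adds a hidden uniqueifier to duplicate rows so that each row can still be identified. Because rows are stored in key order, lookups on a single key value and range queries over consecutive key values are very efficient: once the first matching row is found, the remaining rows are read in order from the same or adjacent pages.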
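As a minimal sketch of a range query on the clustered key, the batch below uses a hypothetical table dbo.DemoSales (not part of the original text) and assumes SQL Server 2016 or later for DROP TABLE IF EXISTS. The ORDER BY clause is still needed to guarantee the order of the result set, even though the rows are stored in key order.

```sql
-- Hypothetical demo table; assumes SQL Server 2016+ (DROP TABLE IF EXISTS).
DROP TABLE IF EXISTS dbo.DemoSales;

CREATE TABLE dbo.DemoSales
(
    SaleID INT            NOT NULL CONSTRAINT PK_DemoSales PRIMARY KEY CLUSTERED,
    Amount DECIMAL(10, 2) NOT NULL
);

INSERT INTO dbo.DemoSales (SaleID, Amount)
VALUES (1, 10.00), (2, 20.00), (3, 30.00), (4, 40.00), (5, 50.00);

-- Range predicate on the clustered key: the engine can seek to SaleID = 2 and
-- read the following rows in key order from the leaf level. No separate lookup
-- is needed, because the leaf rows are the data rows.
SELECT SaleID, Amount
FROM dbo.DemoSales
WHERE SaleID BETWEEN 2 AND 4
ORDER BY SaleID;
```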

How Clustered Indexes Work

A clustered index is implemented as a B-tree. The root and intermediate levels consist of index pages: each index row holds a key value and a pointer to a page in the next level down. The leaf level consists of the data pages themselves, sorted by the clustered index key, and each leaf page is linked to the previous and next leaf pages so that range scans can read the leaf level in key order.

When a query searches on the clustered index key, SQL Server can navigate the B-tree instead of scanning the whole table. It starts at the root page, compares the search value with the key values on that page to choose which pointer to follow, repeats this at each intermediate level, and arrives at the leaf page that contains the row. A single-row lookup therefore reads one page per level of the index, and the number of levels stays small even for large tables because each index page holds many keys.
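To see the levels that such a lookup walks through, the sketch below fills a hypothetical table dbo.DemoCustomers (an assumption, not from the original text) and asks sys.dm_db_index_physical_stats for one row per B-tree level. It assumes SQL Server 2016 or later; index_id 1 always refers to the clustered index. The final query is the kind of equality search you would expect to appear as a Clustered Index Seek in the execution plan.

```sql
-- Hypothetical demo table; assumes SQL Server 2016+ (DROP TABLE IF EXISTS).
DROP TABLE IF EXISTS dbo.DemoCustomers;

CREATE TABLE dbo.DemoCustomers
(
    CustomerID   INT           NOT NULL,
    CustomerName NVARCHAR(100) NOT NULL,
    CONSTRAINT PK_DemoCustomers PRIMARY KEY CLUSTERED (CustomerID)
);

-- Load 100,000 rows so the B-tree needs more than one level.
INSERT INTO dbo.DemoCustomers (CustomerID, CustomerName)
SELECT TOP (100000)
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
       N'Customer'
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;

-- One row per level: index_level 0 is the leaf (data pages),
-- the highest index_level is the root page.
SELECT index_depth, index_level, page_count, record_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.DemoCustomers'), 1, NULL, 'DETAILED')
ORDER BY index_level DESC;

-- Equality search on the key: root -> intermediate level(s) -> leaf page.
SELECT CustomerID, CustomerName
FROM dbo.DemoCustomers
WHERE CustomerID = 42;
```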

Creating Clustered Indexes

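You can create a clustered index with the CREATE CLUSTERED INDEX statement. The statement fails if the table already has a clustered index, including one created by a PRIMARY KEY constraint. A clustered index is typically created on the primary key of a table, because primary key values are always unique and are frequently used to retrieve and join rows.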

CREATE CLUSTERED INDEX IX_Customers_CustomerID
ON Customers (CustomerID);

When you define a PRIMARY KEY constraint, SQL Server by default enforces it with a unique clustered index on the primary key column(s). It creates a unique nonclustered index instead if you specify PRIMARY KEY NONCLUSTERED or if the table already has a clustered index.
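A minimal sketch of both behaviors, using two hypothetical tables, dbo.DemoOrdersA and dbo.DemoOrdersB (not from the original text), and assuming SQL Server 2016 or later. The first relies on the default; the second opts out so that the clustered index can go on a different column. The sys.indexes query shows which index ended up clustered.

```sql
-- Hypothetical demo tables; assumes SQL Server 2016+ (DROP TABLE IF EXISTS).
DROP TABLE IF EXISTS dbo.DemoOrdersA;
DROP TABLE IF EXISTS dbo.DemoOrdersB;

-- Default: the PRIMARY KEY constraint creates a unique clustered index.
CREATE TABLE dbo.DemoOrdersA
(
    OrderID   INT  NOT NULL CONSTRAINT PK_DemoOrdersA PRIMARY KEY,
    OrderDate DATE NOT NULL
);

-- Opting out: the primary key is enforced by a unique nonclustered index,
-- which leaves the clustered index free for another column.
CREATE TABLE dbo.DemoOrdersB
(
    OrderID   INT  NOT NULL CONSTRAINT PK_DemoOrdersB PRIMARY KEY NONCLUSTERED,
    OrderDate DATE NOT NULL
);

CREATE CLUSTERED INDEX IX_DemoOrdersB_OrderDate
ON dbo.DemoOrdersB (OrderDate);

SELECT OBJECT_NAME(object_id) AS table_name,
       name, index_id, type_desc, is_unique, is_primary_key
FROM sys.indexes
WHERE object_id IN (OBJECT_ID(N'dbo.DemoOrdersA'), OBJECT_ID(N'dbo.DemoOrdersB'))
ORDER BY table_name, index_id;
```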

Clustered Index Key Choices

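The choice of the clustered index key has a large effect on performance. A good key is:

Unique: Each row can be identified by its key alone, so SQL Server does not need to add a uniqueifier.
Narrow: The clustered key is stored in every nonclustered index as the row locator, so a small key (for example, a 4-byte int) keeps the clustered index and all nonclustered indexes smaller and lets more rows fit in cache.
Static: Changing a key value moves the row to a new position and requires updating every nonclustered index that references it, so keys that rarely or never change avoid that overhead.
Ever-increasing: When each new key is larger than all existing keys (as with an identity column), new rows are appended at the end of the leaf level instead of being inserted into the middle of full pages, which minimizes page splits.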
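A minimal sketch of a table whose clustered key meets all four criteria, assuming a hypothetical invoice table dbo.DemoInvoices (not from the original text) and SQL Server 2016 or later. The wider business key is kept unique with a separate nonclustered constraint rather than being used as the clustered key; that split is one reasonable design, not the only one.

```sql
-- Hypothetical demo table; assumes SQL Server 2016+ (DROP TABLE IF EXISTS).
DROP TABLE IF EXISTS dbo.DemoInvoices;

CREATE TABLE dbo.DemoInvoices
(
    -- Narrow (4 bytes), unique, never updated, and ever-increasing.
    InvoiceID     INT IDENTITY(1, 1) NOT NULL,
    -- Business key: wider, so it is not used as the clustered key.
    InvoiceNumber NVARCHAR(50)       NOT NULL,
    CustomerID    INT                NOT NULL,
    Amount        DECIMAL(12, 2)     NOT NULL,
    CONSTRAINT PK_DemoInvoices PRIMARY KEY CLUSTERED (InvoiceID),
    CONSTRAINT UQ_DemoInvoices_InvoiceNumber UNIQUE NONCLUSTERED (InvoiceNumber)
);

-- Each new row receives the next identity value, so it is added after the
-- last row in the leaf level rather than in the middle of a full page.
INSERT INTO dbo.DemoInvoices (InvoiceNumber, CustomerID, Amount)
VALUES (N'INV-0001', 10, 125.00),
       (N'INV-0002', 11, 80.50),
       (N'INV-0003', 10, 42.00);

SELECT InvoiceID, InvoiceNumber, Amount
FROM dbo.DemoInvoices
ORDER BY InvoiceID;
```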

Unique Clustered Index

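A unique clustered index guarantees that the values in the clustered index key columns are unique for all rows in the table. A PRIMARY KEY constraint creates this kind of index by default. You can also create one directly with CREATE UNIQUE CLUSTERED INDEX, provided the table does not already have a clustered index.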

Example:

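-- Assumes CustomerID values are unique and Customers has no clustered index yet.
-- This creates a unique index only; it does not create a PRIMARY KEY constraint.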
CREATE UNIQUE CLUSTERED INDEX PK_Customers
ON Customers (CustomerID);
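A minimal sketch of the uniqueness guarantee in action, assuming a hypothetical table dbo.DemoCustomersUQ (not from the original text) and SQL Server 2016 or later. The table starts as a heap, receives a unique clustered index, and then rejects a second row with the same key; error 2601 is the error SQL Server raises for a duplicate key in a unique index.

```sql
-- Hypothetical demo table; assumes SQL Server 2016+ (DROP TABLE IF EXISTS).
DROP TABLE IF EXISTS dbo.DemoCustomersUQ;

-- Created without a primary key, so the table starts as a heap.
CREATE TABLE dbo.DemoCustomersUQ
(
    CustomerID INT           NOT NULL,
    Name       NVARCHAR(100) NOT NULL
);

CREATE UNIQUE CLUSTERED INDEX UX_DemoCustomersUQ_CustomerID
ON dbo.DemoCustomersUQ (CustomerID);

INSERT INTO dbo.DemoCustomersUQ (CustomerID, Name) VALUES (1, N'Contoso');

BEGIN TRY
    -- Duplicate key value: rejected by the unique clustered index.
    INSERT INTO dbo.DemoCustomersUQ (CustomerID, Name) VALUES (1, N'Fabrikam');
END TRY
BEGIN CATCH
    -- Expected: error 2601 (cannot insert duplicate key row ... with unique index).
    SELECT ERROR_NUMBER() AS error_number, ERROR_MESSAGE() AS error_message;
END CATCH;
```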

Nonunique Clustered Index

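A nonunique clustered index allows duplicate values in the clustered index key columns. Because SQL Server still needs to identify each row, it adds a hidden 4-byte uniqueifier to each row whose key value duplicates that of an existing row. The uniqueifier is used internally and cannot be seen or queried.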

Example:

CREATE CLUSTERED INDEX IX_Orders_OrderDate
ON Orders (OrderDate); -- OrderDate might have duplicates
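A minimal sketch showing that a nonunique clustered index accepts duplicate keys, assuming a hypothetical table dbo.DemoOrdersByDate (not from the original text) and SQL Server 2016 or later. The uniqueifier itself does not appear in query results; the sys.indexes query only confirms that the clustered index is not unique.

```sql
-- Hypothetical demo table; assumes SQL Server 2016+ (DROP TABLE IF EXISTS).
DROP TABLE IF EXISTS dbo.DemoOrdersByDate;

CREATE TABLE dbo.DemoOrdersByDate
(
    OrderID   INT  NOT NULL,
    OrderDate DATE NOT NULL
);

CREATE CLUSTERED INDEX IX_DemoOrdersByDate_OrderDate
ON dbo.DemoOrdersByDate (OrderDate);

-- Two orders share a date; both rows are accepted. SQL Server tells them
-- apart internally with a hidden uniqueifier on the duplicate row.
INSERT INTO dbo.DemoOrdersByDate (OrderID, OrderDate)
VALUES (1, '2024-01-15'),
       (2, '2024-01-15'),
       (3, '2024-01-16');

SELECT name, index_id, type_desc, is_unique
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.DemoOrdersByDate');
```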

Clustered vs. Nonclustered Indexes

The fundamental difference lies in how data is stored:

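Clustered Index: Determines the storage order of the data rows. The leaf level contains the data rows themselves.
Nonclustered Index: A structure separate from the data. Its leaf level contains the index key values plus a row locator for each row: the clustered index key if the table has a clustered index, or a row identifier (RID) if the table is a heap.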

A table can have only one clustered index but multiple nonclustered indexes.
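A minimal sketch of the one-clustered, many-nonclustered rule, assuming a hypothetical table dbo.DemoProducts (not from the original text) and SQL Server 2016 or later. Two nonclustered indexes are added without trouble, while an attempt to add a second clustered index is expected to fail until the existing one is dropped.

```sql
-- Hypothetical demo table; assumes SQL Server 2016+ (DROP TABLE IF EXISTS).
DROP TABLE IF EXISTS dbo.DemoProducts;

CREATE TABLE dbo.DemoProducts
(
    ProductID INT            NOT NULL CONSTRAINT PK_DemoProducts PRIMARY KEY CLUSTERED,
    SKU       VARCHAR(20)    NOT NULL,
    Name      NVARCHAR(100)  NOT NULL,
    Price     DECIMAL(10, 2) NOT NULL
);

-- Nonclustered indexes: each leaf row holds the index key plus the
-- clustered key (ProductID) as its row locator.
CREATE NONCLUSTERED INDEX IX_DemoProducts_SKU  ON dbo.DemoProducts (SKU);
CREATE NONCLUSTERED INDEX IX_DemoProducts_Name ON dbo.DemoProducts (Name);

BEGIN TRY
    -- Expected to fail: the table already has a clustered index (PK_DemoProducts).
    CREATE CLUSTERED INDEX IX_DemoProducts_Price ON dbo.DemoProducts (Price);
END TRY
BEGIN CATCH
    SELECT ERROR_NUMBER() AS error_number, ERROR_MESSAGE() AS error_message;
END CATCH;

SELECT name, index_id, type_desc
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.DemoProducts')
ORDER BY index_id;
```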

Considerations

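Insert/Update Performance: Inserting rows whose keys fall in the middle of the existing key range can cause page splits, and updating a clustered key value moves the row and updates every nonclustered index on the table.
Query Performance: A well-chosen clustered index key can significantly improve queries that filter on a range of key values or sort by the key, because matching rows are read in key order without extra lookups.
Primary Key: It's generally recommended to use the primary key as the clustered index key, although this is not required: the primary key can be enforced by a nonclustered index when another column makes a better clustered key.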
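The page-split cost can be observed as logical fragmentation. The sketch below is an illustration under stated assumptions: two hypothetical tables (not from the original text), one keyed on random NEWID() values and one on an identity column, are loaded row by row, and avg_fragmentation_in_percent from sys.dm_db_index_physical_stats is compared. It assumes SQL Server 2016 or later; exact figures depend on the server, but the random key would be expected to show far higher fragmentation.

```sql
-- Hypothetical demo tables; assumes SQL Server 2016+ (DROP TABLE IF EXISTS).
SET NOCOUNT ON;
DROP TABLE IF EXISTS dbo.DemoRandomKey;
DROP TABLE IF EXISTS dbo.DemoSequentialKey;

-- Random key: new rows land between existing keys, splitting full pages.
CREATE TABLE dbo.DemoRandomKey
(
    ID      UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID() PRIMARY KEY CLUSTERED,
    Payload CHAR(200)        NOT NULL DEFAULT 'x'
);

-- Ever-increasing key: new rows are appended at the end of the leaf level.
CREATE TABLE dbo.DemoSequentialKey
(
    ID      INT IDENTITY(1, 1) NOT NULL PRIMARY KEY CLUSTERED,
    Payload CHAR(200)          NOT NULL DEFAULT 'x'
);

-- Insert one row at a time to mimic an OLTP workload.
DECLARE @i INT = 0;
BEGIN TRANSACTION;
WHILE @i < 10000
BEGIN
    INSERT INTO dbo.DemoRandomKey DEFAULT VALUES;
    INSERT INTO dbo.DemoSequentialKey DEFAULT VALUES;
    SET @i += 1;
END;
COMMIT TRANSACTION;

-- index_id 1 = the clustered index of each table.
SELECT t.name AS table_name, s.avg_fragmentation_in_percent, s.page_count
FROM sys.tables AS t
CROSS APPLY sys.dm_db_index_physical_stats(DB_ID(), t.object_id, 1, NULL, 'LIMITED') AS s
WHERE t.name IN (N'DemoRandomKey', N'DemoSequentialKey');
```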

Best Practices

Choose a narrow, unique, and static clustered index key.
An identity column is often an excellent choice for a clustered index key due to its ever-increasing nature.
Avoid using wide keys (e.g., large text fields) or frequently updated keys as the clustered index key.
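If no primary key is defined, the table is a heap unless you create a clustered index explicitly. In that case, consider clustering on the column(s) most often used in WHERE clauses (especially range searches) or JOIN conditions, while still keeping the key narrow and static.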
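A minimal sketch of finding heaps and giving one a clustered index, assuming a hypothetical log table dbo.DemoEventLog (not from the original text) that is usually queried by time range, and SQL Server 2016 or later. In sys.indexes, index_id 0 marks a heap and index_id 1 a clustered index. Appending EventID to the key is one way, among others, to make the key unique and avoid the uniqueifier.

```sql
-- Hypothetical demo table; assumes SQL Server 2016+ (DROP TABLE IF EXISTS).
DROP TABLE IF EXISTS dbo.DemoEventLog;

-- No PRIMARY KEY and no clustered index, so this table starts as a heap.
CREATE TABLE dbo.DemoEventLog
(
    EventID  BIGINT IDENTITY(1, 1) NOT NULL,
    LoggedAt DATETIME2(3)          NOT NULL,
    Message  NVARCHAR(400)         NOT NULL
);

-- List user tables in the current database that are heaps.
SELECT s.name AS schema_name, t.name AS table_name
FROM sys.tables AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
JOIN sys.indexes AS i ON i.object_id = t.object_id
WHERE i.index_id = 0   -- 0 = heap, 1 = clustered index
ORDER BY s.name, t.name;

-- Queries filter on time ranges, so cluster on LoggedAt; EventID is
-- appended to make the key unique.
CREATE UNIQUE CLUSTERED INDEX UX_DemoEventLog_LoggedAt_EventID
ON dbo.DemoEventLog (LoggedAt, EventID);
```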
