Indexing in Azure Cosmos DB
Efficiently querying your data is crucial for any database. Azure Cosmos DB offers a robust and automatic indexing mechanism that significantly enhances query performance. This document explains how Cosmos DB indexing works, how to configure it, and best practices.
Automatic Indexing
By default, Azure Cosmos DB automatically indexes all data within your containers. As soon as an item is created or updated, its data is indexed. This means you don't need to explicitly create indexes beforehand as you might in traditional relational databases.
The indexing policy defines which parts of your documents are indexed and how. In the default consistent indexing mode, the index is updated synchronously as items are created, updated, or deleted, so query results stay in step with writes.
Indexing Policy
You can customize the indexing policy for each container. The policy consists of:
- Indexing Mode: Controls whether and when the index is updated.
- Included and Excluded Paths: Select which property paths are indexed.
- Composite Indexes: Used for queries that filter or sort on multiple properties (e.g., WHERE c.category = 'Electronics' AND c.price < 100).
- Spatial Indexes: Used for geospatial queries (e.g., finding points within a specific radius).
- Range Indexes: Every included path gets a range index by default, which serves equality filters, range filters, and ORDER BY on a single property. ORDER BY on multiple properties requires a composite index.
Here's an example of the default indexing policy, expressed as JSON:
{
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
        {
            "path": "/*"
        }
    ],
    "excludedPaths": [
        {
            "path": "/\"_etag\"/?"
        }
    ]
}
Indexing Modes
Cosmos DB supports the following indexing modes:
- Consistent: Indexes are updated synchronously with data operations. Query results then reflect writes according to the consistency level configured for the account, at the cost of a slight overhead on write operations. This is the default and recommended mode for most scenarios.
- None: Indexing is turned off for the container. This suits containers used purely as a key-value store, or bulk loads where the mode is switched back to consistent afterwards.
- Lazy: Indexes are updated asynchronously after data operations, at a lower priority than writes. Queries might return stale or incomplete results while the index catches up. Lazy indexing is a legacy mode that Microsoft discourages and that is generally not available for new containers, so prefer consistent mode unless you have a specific exemption.
Configuring Indexing
You can modify the indexing policy through the Azure portal, Azure CLI, PowerShell, or the Cosmos DB SDKs.
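Whichever tool you use, the policy itself is the same JSON document. The following is a minimal Python sketch, using only the standard library, that builds a policy as a dictionary and writes it to a file. The file name indexing-policy.json is a hypothetical choice. As an assumption to verify against the current documentation: the Azure CLI command az cosmosdb sql container create accepts such a file through its --idx parameter, and the Python SDK's create_container method accepts the same dictionary through its indexing_policy keyword argument.

```python
import json
import tempfile
from pathlib import Path

# A policy that indexes every path and skips only the system _etag property.
policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": "/\"_etag\"/?"}],
}


def write_policy(policy: dict, directory: str) -> Path:
    """Serialize the policy to indexing-policy.json (hypothetical file name)."""
    target = Path(directory) / "indexing-policy.json"
    target.write_text(json.dumps(policy, indent=2), encoding="utf-8")
    return target


# Write to a temporary directory and read the file back to confirm
# that the JSON on disk matches the dictionary in memory.
with tempfile.TemporaryDirectory() as tmp:
    policy_file = write_policy(policy, tmp)
    round_tripped = json.loads(policy_file.read_text(encoding="utf-8"))
```

Keeping the policy in a file makes it easy to review and version alongside application code.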
When to Customize Indexing
- Performance Optimization: If specific query patterns are slow or expensive, you might need to add composite indexes, for example for queries that filter or sort on several properties.
- Reducing Index Size: If your documents are large and you don't need to query certain paths, you can exclude them from indexing to reduce storage costs and potentially improve write performance.
- Geospatial Queries: To run geospatial queries efficiently, store locations as GeoJSON and add a spatial index on those paths.
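The exclusion and geospatial cases above can be sketched as small helpers that return a modified copy of a policy. This is an illustration only: the property names rawPayload and location are hypothetical, and the helper functions are not part of any SDK.

```python
import copy


def exclude_paths(policy: dict, *paths: str) -> dict:
    """Return a copy of the policy with extra excluded paths, skipping duplicates."""
    updated = copy.deepcopy(policy)
    excluded = updated.setdefault("excludedPaths", [])
    existing = {entry["path"] for entry in excluded}
    for path in paths:
        if path not in existing:
            excluded.append({"path": path})
            existing.add(path)
    return updated


def add_spatial_index(policy: dict, path: str, types=("Point",)) -> dict:
    """Return a copy of the policy with a spatial index on the given path."""
    updated = copy.deepcopy(policy)
    updated.setdefault("spatialIndexes", []).append(
        {"path": path, "types": list(types)}
    )
    return updated


base = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [],
}

# Skip a large, never-queried subtree and index GeoJSON stored under /location.
tuned = exclude_paths(base, "/rawPayload/*")
tuned = add_spatial_index(tuned, "/location/*", types=("Point", "Polygon"))
```

Because each helper works on a deep copy, the base policy stays unchanged and can be reused for other containers.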
Best Practices
- Index only what you need: Avoid indexing entire documents (/*) if you only query specific fields. Be more selective with your includedPaths.
- Use composite indexes for compound queries: For queries with multiple filters (e.g., WHERE A = 1 AND B = 2), a composite index on (A, B) is generally more efficient than separate indexes.
- Plan for ORDER BY: The default range index already serves ORDER BY on a single property. Sorting on multiple properties requires a composite index whose paths and sort orders match the ORDER BY clause.
- Exclude unnecessary paths: Fields like _ts (timestamp) or internal metadata can be excluded if you never filter or sort on them.
- Monitor RU consumption: Indexing contributes to the Request Unit (RU) cost of every write. Optimize your indexing policy to reduce unnecessary RU usage.
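The "index only what you need" advice can be sketched as an opt-in policy: exclude the root path and include only the properties your queries use. The helper name selective_policy and the property names are hypothetical, and the sketch assumes the listed properties hold scalar values.

```python
import json


def selective_policy(queried_properties):
    """Build an opt-in policy that indexes only the listed scalar properties."""
    return {
        "indexingMode": "consistent",
        "automatic": True,
        # "/?" targets the scalar value stored at the property.
        "includedPaths": [{"path": f"/{name}/?"} for name in queried_properties],
        # Excluding the root is what makes the included paths selective.
        "excludedPaths": [{"path": "/*"}],
    }


policy = selective_policy(["category", "price"])
print(json.dumps(policy, indent=2))
```

An opt-in policy keeps write costs predictable as documents gain new fields, but any new query on an unlisted property will need a policy update first.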
Example: Composite Index for Range Queries
Suppose you have documents with category and price fields, and you frequently query for items within a specific category and price range.
Query:
SELECT * FROM c WHERE c.category = 'Electronics' AND c.price < 500
A composite index on [/category ASC, /price ASC] would be highly beneficial for this query.
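In the policy JSON, a composite index is a list of path and order pairs nested inside compositeIndexes. The sketch below builds that structure in Python; the helper name composite_index is hypothetical. It places the equality-filtered property (category) before the range-filtered one (price), which is the ordering generally recommended for queries that mix equality and range filters.

```python
def composite_index(*parts):
    """Build one composite index from (path, order) tuples.

    order must be "ascending" or "descending".
    """
    allowed = {"ascending", "descending"}
    index = []
    for path, order in parts:
        if order not in allowed:
            raise ValueError(f"unsupported order for {path}: {order!r}")
        index.append({"path": path, "order": order})
    return index


policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [],
    # Each entry in compositeIndexes is itself a list of path/order pairs.
    "compositeIndexes": [
        composite_index(("/category", "ascending"), ("/price", "ascending")),
    ],
}
```

Note that composite index paths name the property directly (/category), without the /? or /* suffix used in included and excluded paths.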
Conclusion
Azure Cosmos DB's automatic indexing simplifies database management. By understanding the indexing policy and applying best practices, you can further optimize query performance and ensure your applications are highly responsive.