MSDN Azure Documentation

Azure Cosmos DB Indexing: A Comprehensive Guide

This guide provides an in-depth look at indexing in Azure Cosmos DB, a critical component for optimizing query performance and managing throughput.

Introduction to Indexing

Azure Cosmos DB is a globally distributed, multi-model database service that supports various APIs. To achieve low latency and high throughput for your applications, effective indexing is paramount. Cosmos DB automatically indexes data as it's ingested, providing a powerful query engine that eliminates the need for explicit index management in many scenarios.

Understanding the Indexing Policy

The indexing policy in Cosmos DB defines how a container's data is indexed. It's a JSON document that specifies the indexing mode, which paths within your documents are included in or excluded from the index, and, for index kinds such as spatial, the data types to be indexed. Every container has its own indexing policy: new containers start with a default policy, which you can replace with a custom one.

Here's an example of a default indexing policy:

{ "indexingMode": "consistent", "automatic": true, "includedPaths": [ { "path": "/*" } ], "excludedPaths": [ { "path": "/\"_etag\"/?" } ] }
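The escaping of the _etag path is easy to get wrong by hand, so it can help to build the policy as a data structure and let a JSON library serialize it. The following is a minimal sketch using only the Python standard library; the variable names are illustrative, and the `/"_etag"/?` form of the excluded path is assumed to be what the service expects.

```python
import json

# Default policy from this section: index every path except the
# system _etag property. The double quotes around _etag are part of
# the path itself, so the JSON serializer has to escape them.
default_policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": '/"_etag"/?'}],
}

policy_json = json.dumps(default_policy, indent=2)
print(policy_json)

# Round-trip to confirm the serialized policy is valid JSON.
assert json.loads(policy_json) == default_policy
```

Serializing from a dict avoids hand-written escape sequences, which is where the quoted _etag path usually goes wrong.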

Indexing Modes: Consistent vs. Lazy

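Cosmos DB offers two primary indexing modes: consistent, in which the index is updated synchronously as part of each write, and lazy, in which index updates are applied asynchronously after the write.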

Consistent Mode Details

When using consistent indexing, every document write (create, update, delete) triggers an index update. This is the most common mode and provides the best query experience without needing to worry about stale data.

Lazy Mode Details

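Lazy indexing can help when you have a high volume of writes and can tolerate queries that briefly return stale or incomplete results. Because index updates are applied after the write rather than as part of it, queries are only eventually consistent with the data; the index is typically updated within a few seconds of the data operation.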
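In the policy document, the mode is a single field, so switching modes means changing indexingMode and leaving the rest of the policy alone. A minimal sketch follows; the helper name with_indexing_mode is hypothetical, and it only accepts the two modes discussed in this guide.

```python
from copy import deepcopy

# Modes discussed in this guide; this sketch rejects anything else.
KNOWN_MODES = ("consistent", "lazy")


def with_indexing_mode(policy: dict, mode: str) -> dict:
    """Return a copy of `policy` with its indexingMode replaced."""
    if mode not in KNOWN_MODES:
        raise ValueError(f"unknown indexing mode: {mode!r}")
    updated = deepcopy(policy)  # leave the caller's policy untouched
    updated["indexingMode"] = mode
    return updated


base_policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": '/"_etag"/?'}],
}

lazy_policy = with_indexing_mode(base_policy, "lazy")
print(lazy_policy["indexingMode"])
```

Returning a copy keeps the original policy available for comparison or rollback.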

Indexing Paths: Includes vs. Excludes

The includedPaths and excludedPaths arrays within the indexing policy control which parts of your documents are indexed.

Example of excluding a specific path (the /? suffix targets the scalar value stored at that path):

{ "indexingMode": "consistent", "automatic": true, "includedPaths": [ { "path": "/*" } ], "excludedPaths": [ { "path": "/description/?" } ] }

Composite Indexes

Composite indexes cover two or more properties of a document in a single index. They are essential for queries that sort on multiple properties in one ORDER BY clause, and they make queries that filter on multiple properties more efficient.

Composite indexes are defined in the indexing policy under compositeIndexes, an array in which each composite index is itself an array of paths. The order of paths in a composite index matters. For example, an index on /category and /price would be defined as:

"compositeIndexes": [ [ { "path": "/category", "order": "ascending" }, { "path": "/price", "order": "descending" } ] ]
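Because path order matters, it can be useful to check whether a given ORDER BY clause lines up with a composite index before deploying a policy. The sketch below assumes the documented matching rule as understood here: the ORDER BY properties must appear in the same sequence as the index paths, with sort directions either all matching or all reversed. The helper can_serve_order_by is hypothetical and checks only that rule.

```python
composite_index = [
    {"path": "/category", "order": "ascending"},
    {"path": "/price", "order": "descending"},
]

# A full policy carries composite indexes as an array of arrays.
policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": '/"_etag"/?'}],
    "compositeIndexes": [composite_index],
}


def _flip(order: str) -> str:
    return "descending" if order == "ascending" else "ascending"


def can_serve_order_by(index: list, order_by: list) -> bool:
    """order_by is a list of (path, order) tuples in clause order."""
    same = [(entry["path"], entry["order"]) for entry in index]
    opposite = [(path, _flip(order)) for path, order in same]
    return list(order_by) in (same, opposite)


# ORDER BY c.category ASC, c.price DESC
print(can_serve_order_by(
    composite_index,
    [("/category", "ascending"), ("/price", "descending")],
))
```

A clause with the two properties swapped, or with only one direction flipped, would need its own composite index under this rule.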

Spatial Indexes

For geospatial queries, Azure Cosmos DB supports spatial data types and indexing. By indexing a property that contains GeoJSON data, you can perform efficient spatial queries like finding points within a given polygon or calculating distances.

To enable spatial indexing, add an entry to the spatialIndexes section of the indexing policy that names the path holding your GeoJSON data and the geometry types (such as Point or Polygon) to index.
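As an illustration, the sketch below shows a policy with a spatial index alongside a document holding a GeoJSON point, plus a small check that the document's geometry type is one the policy indexes. The /location property, the sample coordinates, and the helper geometry_is_covered are assumptions made for this example.

```python
spatial_policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": '/"_etag"/?'}],
    # Index GeoJSON stored under /location for these geometry types.
    "spatialIndexes": [
        {"path": "/location/*", "types": ["Point", "Polygon"]},
    ],
}

# GeoJSON positions are ordered [longitude, latitude].
store = {
    "id": "store-1",
    "location": {"type": "Point", "coordinates": [-122.12, 47.66]},
}


def geometry_is_covered(policy: dict, item: dict, prop: str) -> bool:
    """Check that the item's geometry type is listed for its path."""
    geometry_type = item[prop]["type"]
    for entry in policy.get("spatialIndexes", []):
        if entry["path"] == f"/{prop}/*" and geometry_type in entry["types"]:
            return True
    return False


print(geometry_is_covered(spatial_policy, store, "location"))
```

A check like this catches the case where documents start carrying a geometry type, such as LineString, that the policy does not list.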

Performance Considerations

Indexing significantly impacts query performance and the Request Units (RUs) consumed by your operations: a query that can use the index generally costs fewer RUs than one that cannot. However, every indexed path also adds overhead to write operations and consumes storage.

Best Practices

Conclusion

Effective indexing in Azure Cosmos DB is a balance between query performance, write latency, and storage costs. By understanding the indexing policy, modes, and path definitions, you can tune your Cosmos DB solution to meet the demands of your application. Regularly review your indexing strategy as your application evolves to ensure optimal performance and cost-efficiency.


Last updated: October 26, 2023