Understanding Indexing in Azure Cosmos DB

Tip: Effective indexing is crucial for optimizing query performance and managing costs in Azure Cosmos DB.

Introduction to Indexing

Azure Cosmos DB automatically indexes all data written to your containers. This automatic indexing keeps your queries fast without requiring you to manage indexes or schemas manually. By default, the indexer processes every item and indexes every property it contains.

However, understanding how indexing works and how to influence it can significantly improve your application's performance and cost-efficiency. Azure Cosmos DB offers various indexing policies that allow you to tailor the indexing process to your specific query patterns.

Automatic Indexing vs. Manual Control

By default, Azure Cosmos DB uses an automatic indexing policy that indexes every path in your documents for both equality and range queries. While convenient, this can raise the request unit (RU) charge of write operations if you have very large documents or deeply nested structures containing many properties that you never query.
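For reference, the default policy can be written out explicitly. The sketch below expresses it as a Python dictionary and prints the equivalent JSON; the constant name DEFAULT_INDEXING_POLICY is only an illustrative label, and the values reflect the documented default at the time of writing, so treat them as a starting point rather than a guarantee.

```python
import json

# Illustrative name; the contents mirror the documented default policy:
# every path is indexed, and only the system _etag property is excluded.
DEFAULT_INDEXING_POLICY = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": '/"_etag"/?'}],
}

# Serialize to the JSON form used in indexing policy definitions.
print(json.dumps(DEFAULT_INDEXING_POLICY, indent=2))
```

Writing the default out this way gives you a baseline to edit when you start excluding paths or adding composite indexes.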

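You can gain more control by defining a custom indexing policy. This allows you to include or exclude specific paths, choose the kinds of indexes that are built, add composite indexes, and set the indexing mode. The sections below cover each option.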

Indexing Policies Explained

1. Including and Excluding Paths

You can explicitly include or exclude paths from the indexing process. This is particularly useful for large documents or when you know certain fields will never be queried.

Example: Excluding a path

{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    { "path": "/*" }
  ],
  "excludedPaths": [
    { "path": "/sensitiveData/*" }
  ]
}

In this example, "automatic": true tells Azure Cosmos DB to index items as they are written, and the included root path /* covers every property. The excluded path /sensitiveData/* then removes everything under sensitiveData from the index.
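The inverse pattern is also common: exclude the root path and opt in only the properties you query. The sketch below builds such a policy as a Python dictionary. The helper name opt_in_policy and the sample paths /category and /price are hypothetical, chosen to match the query used later in this article. It relies on the rule that when included and excluded paths conflict, the more precise path takes precedence.

```python
import json

def opt_in_policy(queried_paths):
    """Build a policy that indexes only the listed scalar property paths.

    Hypothetical helper for illustration; the paths are assumptions.
    """
    return {
        "indexingMode": "consistent",
        "automatic": True,
        # "/?" targets the scalar value stored at each path.
        "includedPaths": [{"path": f"{p}/?"} for p in queried_paths],
        # Excluding the root drops every path not explicitly included above.
        "excludedPaths": [{"path": "/*"}],
    }

policy = opt_in_policy(["/category", "/price"])
print(json.dumps(policy, indent=2))
```

With the azure-cosmos Python SDK, a dictionary like this can be passed as the indexing_policy argument when creating a container. An opt-in policy keeps write costs low, but any query that filters on a property outside the included paths can no longer use the index.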

2. Index Kinds

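Azure Cosmos DB supports several kinds of indexes, including range indexes for equality filters, range filters, and ORDER BY on scalar values; spatial indexes for geospatial queries on GeoJSON data; and composite indexes for queries that filter or sort on multiple properties.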

3. Composite Indexes

Composite indexes can significantly speed up queries that filter on multiple properties in the `WHERE` clause or sort on multiple properties in the `ORDER BY` clause.

Consider a query like: SELECT * FROM c WHERE c.category = "electronics" AND c.price > 100 ORDER BY c.price ASC.

An appropriate composite index would be on (category, price). The order matters:

{ "path": "/category", "order": "ascending" }, { "path": "/price", "order": "ascending" }

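The path order should follow the shape of the query: properties used in equality filters (here category) come first, followed by the property used in the range filter or sort (here price). A composite index on (price, category) would not serve this query as well. For the sort itself to use the (category, price) index, list the filtered property first in the ORDER BY clause too, for example ORDER BY c.category ASC, c.price ASC.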
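As a sketch of where a composite index sits in a full policy, the Python snippet below builds one on (category, price) and places it in the compositeIndexes array. The helper composite_index is a hypothetical convenience, not part of any SDK. It rejects /? and /* suffixes because composite paths are written without a wildcard and always refer to the scalar value at the path.

```python
import json

def composite_index(*columns):
    """Build one composite index from (path, order) pairs.

    Hypothetical helper for illustration only.
    """
    for path, order in columns:
        if order not in ("ascending", "descending"):
            raise ValueError(f"unknown sort order: {order!r}")
        if path.endswith("/?") or path.endswith("/*"):
            raise ValueError(f"composite paths take no wildcard suffix: {path!r}")
    return [{"path": path, "order": order} for path, order in columns]

policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [],
    # Each entry in compositeIndexes is itself an ordered list of paths.
    "compositeIndexes": [
        composite_index(("/category", "ascending"), ("/price", "ascending")),
    ],
}

print(json.dumps(policy, indent=2))
```

Because compositeIndexes is a list of lists, you can add further composite indexes for other query shapes without touching the first one.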

Indexing Modes

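Azure Cosmos DB offers two indexing modes: consistent, in which the index is updated synchronously as items are created, updated, or deleted, and none, in which indexing is disabled on the container.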
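The two modes, consistent and none, map to the indexingMode property of the policy. The sketch below returns a minimal policy for each; the function name indexing_policy_for is hypothetical, and setting "automatic" to False alongside the none mode is an assumption made here for explicitness rather than a documented requirement.

```python
def indexing_policy_for(mode):
    """Return a minimal indexing policy for the given mode.

    Hypothetical helper for illustration only.
    """
    if mode == "consistent":
        # The index is kept in sync with every write.
        return {
            "indexingMode": "consistent",
            "automatic": True,
            "includedPaths": [{"path": "/*"}],
            "excludedPaths": [],
        }
    if mode == "none":
        # Indexing is disabled. "automatic": False is an assumption,
        # set explicitly so the policy does not ask for automatic indexing.
        return {"indexingMode": "none", "automatic": False}
    raise ValueError(f"unsupported indexing mode: {mode!r}")

for mode in ("consistent", "none"):
    print(mode, indexing_policy_for(mode))
```

The none mode is typically reserved for containers used as a pure key-value store or for bulk loads, where the index is switched back to consistent afterwards.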

Optimizing Indexing

Important: Always analyze your application's query patterns before creating a custom indexing policy. Indexing paths that you never query increases storage costs and the RU charge of every write.

Best Practices:

Next Steps

Now that you understand indexing strategies, you can explore how to optimize performance further with specific tuning techniques.

Continue to the Performance Tuning tutorial.