Azure Cosmos DB SQL API Indexing - Optimize Your Queries

The Power of Indexing in Cosmos DB

Azure Cosmos DB's SQL API relies on its indexing mechanism to accelerate query execution. By default, Cosmos DB automatically indexes every item in a container with a policy that supports a broad range of queries. Understanding and tuning that indexing can still substantially lower query latency, increase throughput, and reduce the Request Units (RUs) your operations consume.

Effective indexing is crucial for any application that relies on efficient data retrieval. It lets the query engine answer filters and sorts from the index instead of scanning every item in the container.

Understanding the Default Indexing Policy

Cosmos DB provides a default indexing policy that indexes every property of every JSON document. This is convenient for development and simple scenarios, but for performance-critical applications, a tailored policy is often necessary.

The default policy:

Uses the consistent indexing mode with automatic indexing enabled.
Includes all paths (/*) and excludes only the system property path /"_etag"/?.
Uses range indexes for scalar values such as strings and numbers.
Does not create composite or spatial indexes; you must add those to the policy explicitly.
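For reference, a default policy of this kind corresponds roughly to the following JSON, which is what the indexing policy editor in the Azure portal typically shows for a newly created container. Treat it as a sketch; the portal may display additional fields.

```json
{
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
        { "path": "/*" }
    ],
    "excludedPaths": [
        { "path": "/\"_etag\"/?" }
    ]
}
```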

You can view and modify your container's indexing policy via the Azure portal or programmatically.

Key Indexing Concepts

1. Indexing Paths

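Define which parts of your documents are indexed. You can include or exclude specific paths to reduce index size and improve write performance. A path ending in /? matches the scalar value of that property, and a path ending in /* matches everything beneath it.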

Including paths: Specifically tell Cosmos DB to index certain properties.
Excluding paths: Tell Cosmos DB *not* to index certain properties (useful for large, infrequently queried arrays or blob-like fields).

Example: Indexing only the 'category' and 'productId' fields:


{
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
        { "path": "/category/?" },
        { "path": "/productId/?" }
    ],
    "excludedPaths": [
        { "path": "/*" }
    ]
}

2. Indexing Modes

Determines when the index is updated relative to data operations.

consistent (Default): The index is always up-to-date with your data. This is the most common mode.
lazy: The index is updated asynchronously, at low priority, when the engine is not busy with other work, so query results can be inconsistent or incomplete. New containers cannot select lazy mode, and it is not recommended for any container you query.
none: Indexing is disabled. This suits a container used as a pure key-value store, read by id and partition key, or a temporary setting during bulk loads, after which you switch back to consistent.
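As a minimal sketch, a policy that disables indexing, whether for a key-value container or for the duration of a bulk load, can be as small as the following. With indexing disabled, plan on point reads by id and partition key rather than filtered queries.

```json
{
    "indexingMode": "none"
}
```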

3. Composite Indexes

Used for queries that filter or order by multiple properties. A composite index is defined on two or more paths.

Example: Optimizing queries that filter on both category and price, or that sort by category ascending and then price descending (the index also serves the exact reverse sort).


{
    "compositeIndexes": [
        [
            { "path": "/category", "order": "ascending" },
            { "path": "/price", "order": "descending" }
        ]
    ]
}

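Tip: The order of paths in a composite index matters. For a multi-property ORDER BY, the paths must appear in the same sequence as in the ORDER BY clause. For filters on several properties, place the equality-filtered properties first and the range-filtered property last.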
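To illustrate the filter rule, a query such as SELECT * FROM c WHERE c.category = 'electronics' AND c.price > 100 filters category by equality and price by range, so a composite index for it lists category first and price last. This sketch reuses the category and price properties from the examples in this article:

```json
{
    "compositeIndexes": [
        [
            { "path": "/category", "order": "ascending" },
            { "path": "/price", "order": "ascending" }
        ]
    ]
}
```

The same index also serves ORDER BY c.category ASC, c.price ASC and its exact reverse, ORDER BY c.category DESC, c.price DESC.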

4. Spatial Indexes

Enable efficient querying of geospatial data (e.g., finding points within a radius). Cosmos DB supports GeoJSON format for spatial data.

Example: Indexing a 'location' GeoJSON property.


{
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
        { "path": "/*" }
    ],
    "spatialIndexes": [
        {
            "path": "/location/*",
            "types": ["Point"]
        }
    ]
}

5. Range Indexes

Range indexes are the default index type for scalar values such as numbers and strings. They serve equality filters, range filters (for example, c.price > 100), and ORDER BY on a single property.

Strategies for Effective Indexing

1. Analyze Your Query Patterns

This is the most important step. Identify which properties appear most often in your WHERE clauses, ORDER BY clauses, and JOIN expressions. Use the query statistics in the Azure portal's Data Explorer, diagnostic logs, or your application's own logs to find slow or RU-heavy queries.

2. Use Composite Indexes Judiciously

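Composite indexes add storage and increase the RU cost of every write, so create them only for common multi-property query patterns and avoid redundant ones. For example, an index on category ascending, price descending already serves the reverse sort (category descending, price ascending), so a second index for that order is unnecessary.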

3. Exclude Unnecessary Paths

If you have large arrays, metadata fields, or large text blobs that are rarely queried, exclude them from indexing. This significantly reduces index size and write costs.

Info: Excluding a path does not mean it won't be stored; it simply means it won't be indexed for fast lookups.
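The example under Indexing Paths takes the opt-in approach. The opposite, opt-out approach keeps the root path included and excludes only the heavy fields. In this sketch, description (a long text field) and reviews (a large array) are hypothetical property names standing in for whatever bulky fields your items carry:

```json
{
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
        { "path": "/*" }
    ],
    "excludedPaths": [
        { "path": "/description/?" },
        { "path": "/reviews/*" },
        { "path": "/\"_etag\"/?" }
    ]
}
```

With opt-out, properties you add to your items later are indexed automatically; with opt-in, new properties stay unindexed until you update the policy.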

4. Consider Indexing Mode

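For bulk loads into a container that is not being queried at the same time, setting indexingMode to none during the load and back to consistent afterward can reduce write costs; queries that depend on the index should wait until the resulting index transformation completes. Lazy mode is not available for new containers and can return stale or incomplete results, so it is rarely the right choice.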

5. Monitor Index Size and Performance

Regularly check your container's index size and query performance metrics in the Azure portal. A larger index increases storage and the RU charge of every write.

Example: Optimizing a Product Catalog Query

Imagine a product catalog where you frequently query products by category and sort them by price.

Scenario: Frequent Queries

Query to find products in the 'electronics' category, ordered by price descending:


SELECT *
FROM c
WHERE c.category = 'electronics'
ORDER BY c.price DESC

Recommended Indexing Policy

To optimize this, you'd add a composite index:


{
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
        { "path": "/*" }
    ],
    "excludedPaths": [],
    "compositeIndexes": [
        [
            { "path": "/category", "order": "ascending" },
            { "path": "/price", "order": "descending" }
        ]
    ]
}

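This composite index lets Cosmos DB satisfy both the equality filter on category and the descending sort on price from the index. The documented guidance for combining a filter with ORDER BY is to add the filtered property to the ORDER BY clause so the query matches the composite index: ORDER BY c.category ASC, c.price DESC. Because every matching item has the same category, this does not change the result order.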

Getting Started

Optimizing indexing is an ongoing process. Start by understanding your application's data access patterns and iteratively refine your indexing policy. The Azure portal provides excellent tools for monitoring and managing your indexing policies.

Dive deeper into the official Azure Cosmos DB documentation for advanced indexing techniques and best practices.
