Cosmos DB Indexing

Azure Cosmos DB automatically and transparently indexes all your data. Each container's indexing policy defines how that data is indexed, letting you balance query performance against write cost and storage. Cosmos DB supports several index types, including range, spatial, and composite indexes, to suit different query patterns.

Automatic Indexing

By default, Cosmos DB indexes every property of every item with a range index, which supports equality and range filters as well as ORDER BY on strings and numbers. This means that most queries perform well without any explicit configuration. In the default consistent indexing mode, the index is updated synchronously as items are created, updated, or deleted, so queries see the latest writes.
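Conceptually, the index treats each item as a tree and records every leaf value under its property path. The minimal Python sketch below walks an item the same way and lists each leaf with its path. The helper name `index_paths` is hypothetical and not part of any Azure SDK, and the sketch assumes array elements are addressed by their numeric position.

```python
from typing import Any


def index_paths(item: Any, prefix: str = "") -> dict[str, Any]:
    """Map each leaf (scalar) value in a JSON-like item to its path, e.g. "/address/city"."""
    paths: dict[str, Any] = {}
    if isinstance(item, dict):
        for key, value in item.items():
            # Each object property adds one path segment.
            paths.update(index_paths(value, f"{prefix}/{key}"))
    elif isinstance(item, list):
        for position, value in enumerate(item):
            # In this sketch, array elements are addressed by their position.
            paths.update(index_paths(value, f"{prefix}/{position}"))
    else:
        # Strings, numbers, booleans and null are the leaves that get indexed.
        paths[prefix or "/"] = item
    return paths


item = {"id": "1", "address": {"city": "Seattle"}, "tags": ["a", "b"]}
for path, value in index_paths(item).items():
    print(path, value)
```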

Indexing Policy

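You can customize the indexing behavior by defining an indexing policy for each container. This policy allows you to specify:

  • the indexing mode (consistent or lazy)
  • which property paths are included in or excluded from the index
  • composite indexes for queries that sort or filter on several properties
  • spatial indexes for geospatial queries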

Default Indexing Policy

The default indexing policy for Cosmos DB is:

{
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
        {
            "path": "/*"
        }
    ],
    "excludedPaths": [
        {
            "path": "/\"_etag\"/?"
        }
    ]
}
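When an included path and an excluded path both match a property, the more specific path takes precedence; that is how the default policy indexes everything under /* except the _etag system property. The plain-Python sketch below models this precedence rule. The names `is_indexed` and `_specificity` are hypothetical, property paths are written in policy syntax (ending in /? for a scalar), and resolving a tie in favor of exclusion is an assumption of this sketch, not documented service behavior.

```python
from typing import Any, Optional, Tuple


def _segments(path: str) -> list[str]:
    # "/address/city/?" -> ["address", "city", "?"]
    return path.strip("/").split("/")


def _matches(pattern: str, path: str) -> bool:
    """Does a policy pattern such as "/*" or "/tags/[]/?" cover a property path?"""
    pat, segs = _segments(pattern), _segments(path)
    for i, p in enumerate(pat):
        if p == "*":
            return True  # "*" covers this node and everything beneath it
        if i >= len(segs):
            return False
        if p == "[]":
            if not segs[i].isdigit():  # "[]" stands for any array position
                return False
        elif p != segs[i]:
            return False
    return len(pat) == len(segs)


def _specificity(pattern: str) -> Tuple[int, bool]:
    # More literal segments = more specific; "/?" (one scalar) beats "/*" (subtree).
    segs = _segments(pattern)
    return (sum(s not in ("*", "?") for s in segs), segs[-1] == "?")


def is_indexed(policy: dict[str, Any], path: str) -> bool:
    best: Optional[Tuple[int, bool]] = None
    included = False
    for key, is_include in (("includedPaths", True), ("excludedPaths", False)):
        for entry in policy.get(key, []):
            pattern = entry["path"]
            if not _matches(pattern, path):
                continue
            spec = _specificity(pattern)
            # The most specific match wins; on a tie, exclusion wins (assumption).
            if best is None or spec > best or (spec == best and not is_include):
                best, included = spec, is_include
    return included


DEFAULT_POLICY = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": '/"_etag"/?'}],
}

print(is_indexed(DEFAULT_POLICY, "/address/city/?"))  # True
print(is_indexed(DEFAULT_POLICY, '/"_etag"/?'))       # False
```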

Configuring Indexing Policy

You can modify the indexing policy for a container through the Azure portal, Azure CLI, PowerShell, or SDKs. Here's an example of an indexing policy that excludes two paths and defines a composite index:

Custom Indexing Policy Example

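{
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
        {"path": "/*"},
        {"path": "/myArray/*"}
    ],
    "excludedPaths": [
        {"path": "/sensitiveData/*"},
        {"path": "/nonQueryableField/?"}
    ],
    "compositeIndexes": [
        [
            {"path": "/category", "order": "ascending"},
            {"path": "/price", "order": "descending"}
        ]
    ]
}

In this example:

  • /sensitiveData/* (the sensitiveData property and everything nested under it) and /nonQueryableField/? (the scalar value of nonQueryableField) are excluded from indexing.
  • A composite index is defined on category (ascending) and price (descending). Queries that sort on both properties, such as ORDER BY c.category ASC, c.price DESC or the fully reversed ORDER BY c.category DESC, c.price ASC, need a composite index like this one.
  • /myArray/* includes every element of myArray. Because /* already includes every path, this entry is redundant and only illustrates the syntax. Older policies also set "kind": "range" and "precision": -1 on included paths; range indexing at maximum precision is now the default, so those settings can be omitted.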
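The matching rule for composite indexes and multi-property ORDER BY is mechanical enough to check in code: the ORDER BY must list the same properties in the same sequence, with directions that either all match the index or are all reversed. The helper below is a hypothetical sketch of just that rule; it deliberately ignores other cases, such as filters combined with ORDER BY, that the service may also serve from a composite index.

```python
from typing import Any


def _flip(direction: str) -> str:
    return "descending" if direction == "ascending" else "ascending"


def composite_serves(index: list[dict[str, Any]], order_by: list[tuple[str, str]]) -> bool:
    """Can this composite index serve an ORDER BY given as [(path, direction), ...]?"""
    if [e["path"] for e in index] != [path for path, _ in order_by]:
        return False  # must be the same properties in the same sequence
    index_dirs = [e.get("order", "ascending") for e in index]  # "order" defaults to ascending
    query_dirs = [direction for _, direction in order_by]
    # Directions must match exactly or be reversed on every property.
    return query_dirs == index_dirs or query_dirs == [_flip(d) for d in index_dirs]


category_price = [
    {"path": "/category", "order": "ascending"},
    {"path": "/price", "order": "descending"},
]
print(composite_serves(category_price, [("/category", "ascending"), ("/price", "descending")]))  # True
print(composite_serves(category_price, [("/category", "descending"), ("/price", "ascending")]))  # True
print(composite_serves(category_price, [("/category", "ascending"), ("/price", "ascending")]))   # False
```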

Indexing Mode: Consistent vs. Lazy

Consistent: This is the default and recommended mode. The index is updated synchronously as part of each create, update, or delete, so query results always reflect the latest writes, at the cost of slightly more work on each write.

Lazy: In lazy indexing mode, the index is updated asynchronously, at a lower priority, when the engine is not busy with other work. This takes indexing off the write path, which can help bulk ingestion, but queries may return stale or incomplete results until indexing catches up. Lazy mode is not recommended for containers you plan to query, and new containers cannot select it.

Querying Geospatial Data

Cosmos DB supports geospatial queries using the GeoJSON format. To enable efficient spatial queries, ensure your indexing policy includes spatial indexes. For example, to index a location property:

{
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
        {"path": "/*"}
    ],
    "spatialIndexes": [
        {"path": "/location/*", "types": ["Point", "Polygon"]}
    ]
}

You can then use spatial functions such as ST_DISTANCE (which returns the distance in meters) and ST_WITHIN in the WHERE clause of your queries.
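A typical proximity query passes the reference point as a GeoJSON parameter instead of splicing it into the SQL text. The sketch below only builds the query text and parameter list. The function name `nearby_query`, the `c.location` property, and the 5 km radius are illustrative assumptions; the `[{"name": ..., "value": ...}]` parameter shape is the one accepted by `ContainerProxy.query_items` in the azure-cosmos Python SDK.

```python
from typing import Any


def nearby_query(lon: float, lat: float, radius_m: float) -> tuple[str, list[dict[str, Any]]]:
    """Build a parameterized query for items whose `location` is within radius_m meters."""
    if not (-180 <= lon <= 180 and -90 <= lat <= 90):
        raise ValueError("longitude must be in [-180, 180] and latitude in [-90, 90]")
    # GeoJSON puts longitude first: [longitude, latitude].
    center = {"type": "Point", "coordinates": [lon, lat]}
    query = (
        "SELECT c.id, c.location FROM c "
        "WHERE ST_DISTANCE(c.location, @center) < @radius"  # ST_DISTANCE returns meters
    )
    parameters = [
        {"name": "@center", "value": center},
        {"name": "@radius", "value": radius_m},
    ]
    return query, parameters


query, parameters = nearby_query(-122.33, 47.61, 5000)
# With an azure-cosmos ContainerProxy named `container` (not created here):
# items = container.query_items(query=query, parameters=parameters,
#                               enable_cross_partition_query=True)
```

Keeping the point in a parameter avoids quoting problems and lets the same query text be reused for different locations.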

Performance Tip: Plan your indexing strategy carefully. Indexing every property increases index storage and the request-unit (RU) cost of each write. Index only the properties your queries filter or sort on, and use excludedPaths to keep large or never-queried properties out of the index.
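One common way to apply this advice is an opt-in policy: exclude the root path /* and include only the paths your queries use. The Python helper below builds such a policy as a plain dictionary. Its name `opt_in_policy` and the example paths are hypothetical, and it assumes each path you pass already ends in /? (a single scalar) or /* (a whole subtree).

```python
from typing import Any


def opt_in_policy(queried_paths: list[str]) -> dict[str, Any]:
    """Index only the listed paths; everything else is excluded via the root path."""
    for path in queried_paths:
        if not path.startswith("/") or not path.endswith(("/?", "/*")):
            raise ValueError(f"expected '/.../?' or '/.../*', got {path!r}")
    return {
        "indexingMode": "consistent",
        "automatic": True,
        # dict.fromkeys drops duplicates while keeping the caller's order.
        "includedPaths": [{"path": p} for p in dict.fromkeys(queried_paths)],
        "excludedPaths": [{"path": "/*"}],
    }


policy = opt_in_policy(["/category/?", "/price/?", "/tags/*"])
```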

Managing Indexing

The indexing policy belongs to a container, not to the account or database. You can view and edit it in the Azure portal: in Data Explorer, select the container and open its Scale & Settings page (labeled Settings in some portal versions). For programmatic management, use the Azure SDKs for your preferred language.
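As one example of programmatic management, the sketch below uses the azure-cosmos Python SDK to read a container's current policy, add an excluded path, and write the policy back. `DatabaseProxy.get_container_client`, `ContainerProxy.read`, and `DatabaseProxy.replace_container` are SDK methods; the helper names `add_excluded_path` and `exclude_path_on_container` are hypothetical. The wrapper assumes an existing `DatabaseProxy`, a container with a single (non-hierarchical) partition-key path, and the azure-cosmos package installed.

```python
import copy
from typing import Any


def add_excluded_path(policy: dict[str, Any], path: str) -> dict[str, Any]:
    """Return a copy of an indexing policy with `path` added to excludedPaths."""
    if not path.endswith(("/?", "/*")):
        raise ValueError("index paths end with '/?' (scalar) or '/*' (subtree)")
    updated = copy.deepcopy(policy)  # never mutate the policy read from the service
    excluded = updated.setdefault("excludedPaths", [])
    if all(entry.get("path") != path for entry in excluded):
        excluded.append({"path": path})
    return updated


def exclude_path_on_container(database: Any, container_id: str, path: str) -> None:
    """Read-modify-write of a container's indexing policy (requires azure-cosmos)."""
    # Imported here so add_excluded_path can be used without the SDK installed.
    from azure.cosmos import PartitionKey

    container = database.get_container_client(container_id)
    props = container.read()  # container properties, including "indexingPolicy"
    new_policy = add_excluded_path(props["indexingPolicy"], path)
    database.replace_container(
        container,
        # Assumes a single partition key path, e.g. ["/category"].
        partition_key=PartitionKey(path=props["partitionKey"]["paths"][0]),
        indexing_policy=new_policy,
    )
```

Splitting the pure policy edit from the SDK call keeps the change easy to review and test before it touches a live container.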

Further Reading: