Scaling Azure Cosmos DB

Azure Cosmos DB is a globally distributed, multi-model database service that enables you to harness the benefits of global distribution, elastic scalability, and low latency. Scaling is a core feature, allowing your application to handle varying loads by adjusting provisioned throughput and storage.

Understanding Throughput in Cosmos DB

Throughput in Azure Cosmos DB is provisioned in Request Units per second (RU/s). A Request Unit (RU) is a normalized measure of the compute, memory, and IOPS required to perform a database operation; for example, a point read of a 1-KB item costs 1 RU. You can provision throughput at the container (or collection) level or at the database level. There are two primary ways to manage throughput:

  • Manual Throughput: You explicitly set the number of RUs for your containers or databases.
  • Autoscale Throughput: Cosmos DB automatically scales your throughput up and down based on your application's workload, within a defined maximum RU limit. This is often the most cost-effective and efficient way to manage throughput for unpredictable workloads.
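
Before scaling, it helps to know what a container is currently provisioned with. The sketch below reads a container's throughput settings; it assumes the resource names used in the examples later in this article, and the output shows either a fixed RU/s value (manual) or an autoscale maximum, depending on how the container is configured.

# Example: Inspecting a container's current throughput settings using Azure CLI
az cosmosdb sql container throughput show \
    --resource-group MyResourceGroup \
    --account-name MyCosmosDBAccount \
    --database-name MyDatabase \
    --name MyContainer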

Provisioning Throughput

Throughput can be provisioned in two scopes:

  • Container Level: The most common approach. Throughput is dedicated to a specific container.
  • Database Level: Shared throughput for all containers within a database. This is useful when you have many containers with infrequent requests.

Note: Autoscale throughput is available across the Cosmos DB APIs, including the SQL (Core) API and the API for MongoDB.
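
As a sketch of the database-level scope, the following creates a database whose throughput is shared by every container created inside it (containers can still be given their own dedicated throughput at creation time). The resource names match the examples used later in this article.

# Example: Creating a database with 400 RU/s of shared throughput using Azure CLI
az cosmosdb sql database create \
    --resource-group MyResourceGroup \
    --account-name MyCosmosDBAccount \
    --name MyDatabase \
    --throughput 400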

Elastic Scale: Throughput and Storage

Cosmos DB offers independent and elastic scaling for both throughput and storage. As your data volume grows, your storage scales automatically. As your request load increases, you can scale your throughput.

Storage Scaling

Storage scales automatically to accommodate your data. There are limits on the maximum storage per partition, which is why effective partitioning is crucial for large datasets.

Throughput Scaling Options

You can scale throughput at any time through the Azure portal, Azure CLI, PowerShell, or SDKs.

Manual Throughput:


# Example: Scaling a container to 1000 RU/s of manual throughput using Azure CLI
az cosmosdb sql container throughput update \
    --resource-group MyResourceGroup \
    --account-name MyCosmosDBAccount \
    --database-name MyDatabase \
    --name MyContainer \
    --throughput 1000

Autoscale Throughput:

When configuring autoscale, you specify a maximum RU/s. Cosmos DB will scale throughput between 10% of the maximum and the maximum RU/s. For example, if you set the maximum to 4000 RU/s, it will scale between 400 and 4000 RU/s.


# Example: Setting an autoscale maximum of 4000 RU/s using Azure CLI
# (applies to a container that is already configured for autoscale)
az cosmosdb sql container throughput update \
    --resource-group MyResourceGroup \
    --account-name MyCosmosDBAccount \
    --database-name MyDatabase \
    --name MyContainer \
    --max-throughput 4000
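
If the container currently uses manual throughput, it must first be migrated to autoscale (the reverse migration is also possible). A minimal sketch, assuming the same resource names as above:

# Example: Migrating a container from manual to autoscale throughput using Azure CLI
az cosmosdb sql container throughput migrate \
    --resource-group MyResourceGroup \
    --account-name MyCosmosDBAccount \
    --database-name MyDatabase \
    --name MyContainer \
    --throughput-type autoscale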

Partitioning for Scale

Partitioning is fundamental to achieving high scalability and performance in Cosmos DB. A logical partition is a group of documents that share the same partition key value. A physical partition can store multiple logical partitions. Choosing an effective partition key is critical:

  • Cardinality: A partition key with a high number of distinct values is generally better.
  • Distribution: A partition key that distributes requests evenly across logical partitions prevents "hot partitions."
  • Query Patterns: Design your partition key to align with your most frequent query patterns.
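
Because the partition key is defined when a container is created, these criteria need to be weighed up front. A minimal sketch of creating a container partitioned on a hypothetical /userId property:

# Example: Creating a container with /userId as the partition key using Azure CLI
# (/userId is a hypothetical property chosen for illustration)
az cosmosdb sql container create \
    --resource-group MyResourceGroup \
    --account-name MyCosmosDBAccount \
    --database-name MyDatabase \
    --name MyContainer \
    --partition-key-path "/userId" \
    --throughput 400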

Partition Key Limits

Each logical partition has a storage limit (currently 20 GB), and its throughput is bounded by the physical partition that hosts it. If a logical partition exceeds its throughput limit, requests against it might be throttled. If it reaches its storage limit, further writes to that partition key fail, and fixing the design typically means migrating the data to a new container with a better partition key.

Tip: For datasets that grow very large, consider using a synthetic partition key or a composite partition key to ensure better distribution and avoid hitting partition limits.

Global Distribution and Scaling

Azure Cosmos DB supports both single-region writes and multi-region writes (formerly known as multi-master). You can add or remove regions from your Cosmos DB account at any time. This allows you to:

  • Improve Latency: Place data closer to your users in different geographical regions.
  • High Availability: Ensure your application remains available even if an entire region experiences an outage.
  • Seamless Scaling: Add regions to scale your application's global reach and throughput capacity.

Adding and Removing Regions

You can manage regions via the Azure portal or programmatically.


# Example: Adding a region to an existing Cosmos DB account using Azure CLI
# (pass the complete set of regions the account should serve)
az cosmosdb update \
    --name MyCosmosDBAccount \
    --resource-group MyResourceGroup \
    --locations regionName=eastus failoverPriority=0 isZoneRedundant=False \
    --locations regionName=westus failoverPriority=1 isZoneRedundant=False
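
To allow every region to accept writes (the multi-region writes model mentioned above), the capability can be enabled on the account. A minimal sketch, assuming the same account:

# Example: Enabling multi-region writes on an existing account using Azure CLI
az cosmosdb update \
    --name MyCosmosDBAccount \
    --resource-group MyResourceGroup \
    --enable-multiple-write-locations true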

Monitoring Scalability

Regularly monitor your Cosmos DB account's performance metrics, including Request Units consumed, throttled requests, latency, and storage usage. This helps you identify potential bottlenecks and adjust your scaling strategy proactively.

  • Request Units (RUs): Monitor total Request Unit consumption (for example, the `Total Request Units` and `Normalized RU Consumption` metrics) to understand your throughput needs.
  • Throttled Requests: High numbers of throttled requests (HTTP status code 429) indicate that your provisioned throughput is insufficient.
  • Latency: Monitor server-side read and write latency to ensure your application is meeting performance requirements.

Azure Monitor provides comprehensive tools for observing these metrics.
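
These metrics can also be retrieved from the command line. The sketch below looks up the account's resource ID and then queries Azure Monitor for recent Request Unit consumption; it assumes the `TotalRequestUnits` metric name and the account used in the earlier examples.

# Example: Retrieving recent Request Unit consumption using Azure CLI
COSMOS_ID=$(az cosmosdb show \
    --name MyCosmosDBAccount \
    --resource-group MyResourceGroup \
    --query id --output tsv)

az monitor metrics list \
    --resource "$COSMOS_ID" \
    --metric TotalRequestUnits \
    --interval PT5M \
    --aggregation Total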