Azure Cosmos DB Scalability Reference

Understanding Scalability in Azure Cosmos DB

Azure Cosmos DB is a globally distributed, multi-model database service that enables you to elastically and independently scale throughput and storage across any number of geographic regions. This document provides a detailed reference on how to effectively scale your Azure Cosmos DB deployments.

Request Units (RUs) and Throughput Provisioning

Azure Cosmos DB uses Request Units (RUs) as a logical representation of database throughput. An RU normalizes the CPU, IOPS, and memory resources required to execute database operations. As a reference point, a point read of a 1 KB item by its ID and partition key costs 1 RU. You can provision throughput either manually or with autoscale.

Manual Throughput Provisioning

In manual mode, you specify the exact number of RUs per second (RU/s) to provision for a container or database. This suits predictable, steady workloads. If consumption exceeds the provisioned RU/s, Azure Cosmos DB rate-limits the excess requests and returns HTTP 429 (Too Many Requests) along with a suggested retry delay.


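// Example: Provisioning 400 RU/s for a new container (REST API sketch)
// Throughput is passed in a request header, not in the body.
// The partition key path /tenantId is an assumed example value.
POST /dbs/mydatabase/colls
x-ms-offer-throughput: 400

{
    "id": "mycontainer",
    "partitionKey": { "paths": ["/tenantId"], "kind": "Hash" }
}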
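
When a request is rate-limited, the client is expected to wait and retry. The official SDKs already do this for a limited number of attempts, so the following is only a minimal sketch of the idea for callers that need their own handling. RateLimitedError and call_with_retry are hypothetical names introduced for illustration; they are not part of any Azure SDK.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitedError(Exception):
    """Hypothetical stand-in for an HTTP 429 response from the service."""

    def __init__(self, retry_after_ms: int) -> None:
        super().__init__(f"rate limited, retry after {retry_after_ms} ms")
        self.retry_after_ms = retry_after_ms


def call_with_retry(
    operation: Callable[[], T],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, waiting for the suggested delay after each 429."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RateLimitedError as err:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the error to the caller.
            sleep(err.retry_after_ms / 1000.0)
    raise ValueError("max_attempts must be at least 1")
```

The sleep parameter is injectable so the retry logic can be exercised in tests without real delays. Frequent retries are a signal to raise the provisioned RU/s or reduce the request rate rather than something to rely on.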

Autoscale Throughput Provisioning

Autoscale allows Azure Cosmos DB to automatically scale your throughput (RU/s) based on workload demand, up to a maximum that you specify. It is well suited to variable or unpredictable workloads, because you get enough throughput for peaks without provisioning for the peak at all times.

You define the autoscale maximum in increments of 1,000 RU/s. The system then scales your throughput within the range of max_throughput / 10 to max_throughput. For example, a maximum of 4,000 RU/s scales between 400 and 4,000 RU/s.


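// Example: Provisioning autoscale throughput up to 4000 RU/s (REST API sketch)
// The autoscale maximum is passed in a request header, not in the body.
// The partition key path /tenantId is an assumed example value.
POST /dbs/mydatabase/colls
x-ms-cosmos-offer-autopilot-settings: {"maxThroughput": 4000}

{
    "id": "mycontainer",
    "partitionKey": { "paths": ["/tenantId"], "kind": "Hash" }
}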
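
The scaling range described above can be worked through with a small sketch. The function names are hypothetical, and the model is deliberately simplified: it only clamps a demand figure into the max_throughput / 10 to max_throughput range and says nothing about how quickly the service reacts.

```python
from typing import Tuple


def autoscale_range(max_throughput: int) -> Tuple[int, int]:
    """Return the (minimum, maximum) RU/s for an autoscale maximum."""
    if max_throughput <= 0:
        raise ValueError("max_throughput must be positive")
    return max_throughput // 10, max_throughput


def scaled_throughput(demand_rus: float, max_throughput: int) -> float:
    """Clamp a workload's demand into the autoscale range.

    Demand below the floor still reserves the floor; demand above the
    maximum is capped, and the excess would be rate-limited.
    """
    low, high = autoscale_range(max_throughput)
    return max(low, min(demand_rus, high))
```

With a maximum of 4000 RU/s, autoscale_range(4000) gives a floor of 400 RU/s, so an idle container still reserves 400 RU/s, while a burst demanding 9000 RU/s is capped at 4000 RU/s.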

Data Partitioning and Scalability

Azure Cosmos DB partitions your data horizontally. Items that share the same partition key value form a logical partition, and each physical partition hosts one or more logical partitions. Effective partitioning is crucial for achieving near-linear scalability.

Partition Key Selection

The choice of partition key is paramount. A good partition key:

Has a high cardinality (many distinct values).
Distributes requests evenly across partitions.
Avoids "hot partitions" (partitions that receive a disproportionate amount of traffic).

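Consider attributes that appear in the filters of your most frequent queries, so that those queries can be served from a single partition. For example, in a multi-tenant application, tenantId is often a good choice, provided no single tenant dominates traffic or storage.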
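
To see why cardinality matters, the sketch below hashes candidate key values into a fixed number of buckets and reports how far the busiest bucket is above the average. It is an illustration only: SHA-256 modulo a bucket count is an assumed stand-in for the service's internal hash partitioning, and the tenant and plan values are made up.

```python
import hashlib
from collections import Counter
from typing import Iterable


def bucket_for(key: str, buckets: int) -> int:
    """Map a partition key value to a bucket with a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def skew(keys: Iterable[str], buckets: int = 8) -> float:
    """Ratio of the busiest bucket to the average bucket (1.0 is even)."""
    counts = Counter(bucket_for(key, buckets) for key in keys)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("keys must not be empty")
    return max(counts.values()) / (total / buckets)


# One item per tenant: many distinct key values.
high_cardinality = [f"tenant-{i}" for i in range(10_000)]
# Same number of items, but only three distinct key values.
low_cardinality = [("free", "pro", "enterprise")[i % 3] for i in range(10_000)]

print(f"high cardinality skew: {skew(high_cardinality):.2f}")
print(f"low cardinality skew:  {skew(low_cardinality):.2f}")
```

The three-value key can occupy at most three of the eight buckets, so its busiest bucket carries well over twice the average load, while the per-tenant key stays close to even.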

Understanding Logical vs. Physical Partitions

You interact with logical partitions through your application. Azure Cosmos DB manages the mapping of logical partitions to physical partitions to ensure performance and scalability. The number of physical partitions scales automatically to accommodate storage and throughput needs.
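
A rough way to reason about how many physical partitions a container needs is to take the larger of the storage-driven and throughput-driven counts. The sketch below assumes the documented per-physical-partition limits of 50 GB of storage and 10,000 RU/s; the function names are hypothetical, and the service decides the actual count, which can be higher than this estimate.

```python
import math

# Assumed per-physical-partition limits; check current service limits.
MAX_STORAGE_GB_PER_PHYSICAL_PARTITION = 50
MAX_RUS_PER_PHYSICAL_PARTITION = 10_000


def estimate_physical_partitions(storage_gb: float, provisioned_rus: int) -> int:
    """Lower-bound estimate of the physical partitions a container needs."""
    by_storage = math.ceil(storage_gb / MAX_STORAGE_GB_PER_PHYSICAL_PARTITION)
    by_throughput = math.ceil(provisioned_rus / MAX_RUS_PER_PHYSICAL_PARTITION)
    return max(1, by_storage, by_throughput)


def rus_per_physical_partition(provisioned_rus: int, partitions: int) -> float:
    """Provisioned throughput is divided evenly across physical partitions."""
    return provisioned_rus / partitions
```

For example, 120 GB of data at 4000 RU/s needs at least three physical partitions for storage alone, which leaves each one with roughly 1333 RU/s. This is why a hot partition key can hit rate limits even when the container as a whole is under its provisioned throughput.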

Partition Key Range Splitting

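When a physical partition approaches its storage limit, or when you raise provisioned throughput beyond what the existing physical partitions can serve, Azure Cosmos DB automatically splits the physical partition and redistributes its logical partitions across the new ones. This process is transparent to your application. A logical partition itself is never split: it is capped at 20 GB of storage, so an oversized or hot partition key value has to be addressed by choosing a different partition key.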

Global Distribution and Scalability

Azure Cosmos DB is designed for global scale. You can distribute your data across any number of Azure regions, providing low latency access for users worldwide and high availability.

Adding and Removing Regions

You can add or remove read-write or read-only regions to your Cosmos DB account through the Azure portal, Azure CLI, or SDKs.


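# Example: Adding East US using Azure CLI. --locations must list every region; West US as the existing write region is an assumption.
az cosmosdb update --name mycosmosdb --resource-group my-rg \
    --locations regionName=westus failoverPriority=0 isZoneRedundant=False \
    --locations regionName=eastus failoverPriority=1 isZoneRedundant=False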

Replication

Data is automatically replicated across all configured regions. You can choose between multiple write regions (active-active) or a single write region with multiple read-only regions (active-passive).

Automatic Storage Scaling

Storage in Azure Cosmos DB scales automatically as you add data. There is no need to pre-provision storage capacity. As your data grows, Azure Cosmos DB automatically adds more physical partitions to accommodate the data.

Monitoring Scalability Metrics

Monitoring key metrics is essential for managing your Cosmos DB scalability effectively:

Consumed RU/s: Track your actual RU usage against provisioned throughput.
Storage Usage: Monitor how much storage your data is consuming.
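Throttled Requests: Count requests rejected with HTTP 429 to identify whether your application is hitting RU limits.
Partition Utilization: Compare normalized RU consumption across partition key ranges to spot potential hot partitions.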

Use Azure Monitor and Cosmos DB diagnostic logs to gain insights into your database's performance and scalability.
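
As an illustration of how these metrics combine into an alert decision, the sketch below summarizes a window of samples. The Sample shape, the function names, and the 90% utilization and 1% throttling thresholds are all assumptions chosen for the example; in practice the values would come from Azure Monitor and the thresholds from your own service objectives.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Sample:
    """One monitoring interval (hypothetical shape, not an Azure type)."""

    consumed_rus: float
    total_requests: int
    throttled_requests: int


def summarize(samples: Sequence[Sample], provisioned_rus: float) -> Tuple[float, float]:
    """Return (peak utilization, throttle rate) over the window."""
    if not samples:
        raise ValueError("samples must not be empty")
    peak_utilization = max(s.consumed_rus for s in samples) / provisioned_rus
    total = sum(s.total_requests for s in samples)
    throttled = sum(s.throttled_requests for s in samples)
    throttle_rate = throttled / total if total else 0.0
    return peak_utilization, throttle_rate


def should_alert(
    samples: Sequence[Sample],
    provisioned_rus: float,
    utilization_threshold: float = 0.9,
    throttle_threshold: float = 0.01,
) -> bool:
    """Alert when peak utilization or throttle rate crosses its threshold."""
    utilization, throttle_rate = summarize(samples, provisioned_rus)
    return utilization >= utilization_threshold or throttle_rate >= throttle_threshold
```

Using the peak rather than the average utilization is a deliberate choice here: short bursts are what trigger rate limiting, and an average over the window can hide them.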

Scalability Best Practices

Choose Partition Keys Wisely: This is the single most important factor for scalability.
Provision Adequate Throughput: Monitor and adjust RU/s based on workload. Consider autoscale for variable loads.
Distribute Globally: Place your data closer to your users for low latency and high availability.
Batch Operations: For write-heavy workloads, batching multiple operations into a single request can improve efficiency.
Optimize Queries: Ensure your queries are efficient and leverage indexing effectively.
Monitor and Alert: Set up alerts for high RU consumption, throttled requests, and storage growth.
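
For the batching practice above, one common preparation step is to group items by partition key value and split each group into bounded chunks, since a transactional batch targets a single logical partition and is limited to 100 operations. The sketch below only plans the batches; plan_batches is a hypothetical helper, and submitting each batch through an SDK is left out.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# Assumed transactional batch limit; check current service limits.
MAX_OPERATIONS_PER_BATCH = 100


def plan_batches(
    items: List[dict],
    partition_key: str,
    batch_size: int = MAX_OPERATIONS_PER_BATCH,
) -> Iterator[Tuple[str, List[dict]]]:
    """Yield (partition key value, items) pairs, one pair per batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for item in items:
        grouped[item[partition_key]].append(item)
    for key_value, group in grouped.items():
        # Split each partition's items into chunks of at most batch_size.
        for start in range(0, len(group), batch_size):
            yield key_value, group[start:start + batch_size]
```

For example, 250 items for one tenant and 30 for another produce four batches: three for the first tenant (100, 100, and 50 items) and one for the second.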