Azure Cosmos DB Partitioning - Azure Documentation

This tutorial guides you through the concepts and best practices of partitioning in Azure Cosmos DB, a globally distributed, multi-model database service. Effective partitioning is crucial for achieving high performance, scalability, and availability.

Understanding Partitioning in Azure Cosmos DB

Azure Cosmos DB uses partitioning to distribute data across multiple logical and physical partitions. This distribution enables the service to scale horizontally, handle massive amounts of data, and sustain high throughput.

Key Concepts

Partition Key: A property that determines which logical partition an item belongs to. The choice of a good partition key is vital for performance.
Logical Partitions: Collections of items that share the same partition key value.
Physical Partitions: The underlying storage units that host logical partitions. Azure Cosmos DB manages the mapping between logical and physical partitions.
Partition Throughput: The throughput you provision, measured in Request Units per second (RU/s), is divided evenly across the container's physical partitions.
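The relationship between these concepts can be illustrated with a small sketch. It is a simplified model, not the service's implementation: SHA-256 modulo the partition count stands in for the internal hash and range assignment that Azure Cosmos DB manages itself, and the function names are hypothetical.

```python
import hashlib


def physical_partition_for(partition_key_value: str, physical_partition_count: int) -> int:
    """Map a partition key value to a physical partition index by hashing.

    Assumption: SHA-256 modulo the partition count is only an illustrative
    stand-in for the hash and range mapping the service manages internally.
    """
    digest = hashlib.sha256(partition_key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % physical_partition_count


def throughput_per_physical_partition(provisioned_rus: float, physical_partition_count: int) -> float:
    """Provisioned RU/s split evenly across the physical partitions."""
    return provisioned_rus / physical_partition_count


# Every item with the same partition key value lands on the same partition.
first = physical_partition_for("userABCDE", 5)
second = physical_partition_for("userABCDE", 5)
print(first == second)

# With 10,000 RU/s and 5 physical partitions, each partition gets an equal share.
print(throughput_per_physical_partition(10_000, 5))
```

The second function shows why a single busy key matters: requests for one key value can only draw on the share of throughput held by the partition that hosts it.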

Choosing an Effective Partition Key

The selection of a partition key significantly impacts your application's performance and scalability. An ideal partition key should:

Have a high cardinality (many distinct values).
Spread both storage and request volume evenly across its values to avoid hot partitions.
Appear as a filter in your most frequent queries, so that those queries can be served from a single partition.
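Before committing to a key, it can help to measure the first two criteria on a sample of your data. The sketch below is one way to do that; the sample documents and the `status` property are made up for illustration.

```python
from collections import Counter


def key_distribution(items, key):
    """Report the cardinality and the largest single-value share for a candidate key."""
    counts = Counter(item[key] for item in items)
    total = sum(counts.values())
    return {
        "cardinality": len(counts),                     # number of distinct values
        "largest_share": max(counts.values()) / total,  # fraction held by the busiest value
    }


# Hypothetical sample documents.
sample_orders = [
    {"userId": "u1", "status": "shipped"},
    {"userId": "u2", "status": "shipped"},
    {"userId": "u3", "status": "shipped"},
    {"userId": "u4", "status": "pending"},
]

print(key_distribution(sample_orders, "userId"))  # 4 distinct values, largest share 0.25
print(key_distribution(sample_orders, "status"))  # 2 distinct values, largest share 0.75
```

A low cardinality or a large share for one value suggests the candidate would concentrate data in a few logical partitions.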

Common Partitioning Strategies

Here are some common strategies:

User ID: If your application is user-centric, using a user ID as the partition key keeps each user's data together in one logical partition.
Tenant ID: Similar to User ID, but for isolating data across different organizations or tenants.
Geographical Region: For geo-distributed applications, partitioning by region can ensure data locality.
Date/Time Buckets: For time-series data, partitioning by date (e.g., day, week) can be effective, but be mindful of hot partitions if not managed carefully.
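A key made of the date alone sends every write for the current day to one logical partition. One common way to soften this is to combine the time bucket with another property. The sketch below assumes a hypothetical `deviceId` property and a daily bucket; both are illustrative choices, not requirements.

```python
from datetime import datetime, timezone


def day_bucket(timestamp: datetime) -> str:
    """Format a timezone-aware timestamp as a UTC day bucket, e.g. '2023-10-27'."""
    return timestamp.astimezone(timezone.utc).strftime("%Y-%m-%d")


def time_series_partition_key(device_id: str, timestamp: datetime) -> str:
    """Combine a (hypothetical) device id with the day bucket into one key value."""
    return f"{device_id}_{day_bucket(timestamp)}"


reading_time = datetime(2023, 10, 27, 10, 0, 0, tzinfo=timezone.utc)
print(time_series_partition_key("device42", reading_time))  # device42_2023-10-27
```

Each device then writes to its own logical partition per day, so the current day's traffic is spread across as many key values as there are active devices.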

Figure: Logical and physical partitions in Azure Cosmos DB

Handling Hot Partitions

A hot partition occurs when a disproportionate share of traffic or data is directed to a single physical partition. Requests that exceed that partition's share of throughput are rate-limited (HTTP 429 responses), which degrades performance. Strategies to mitigate hot partitions include:

Choosing a different partition key: Re-evaluating your data model and selecting a key with better distribution.
Using a composite partition key: Combine two or more properties into a single key value (a synthetic key) to increase cardinality. If your key is already composite, consider adding another property.
Scaling up throughput: Temporarily increase RU/s for the container to absorb spikes, but this is not a long-term solution for fundamental partitioning issues.

Important: You cannot change a container's partition key after the container is created. To use a different partition key, you must create a new container and migrate your data to it.
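One way to raise the cardinality of a key, in the spirit of the mitigation list above, is to append a computed suffix to the key value. The following is a minimal sketch under stated assumptions: the function names, the suffix count of 10, and the use of SHA-256 are all illustrative choices.

```python
import hashlib


def suffixed_partition_key(base_value: str, item_id: str, suffix_count: int = 10) -> str:
    """Append a deterministic suffix derived from the item id to the base key value.

    Assumption: hashing the item id keeps the suffix reproducible, so the full
    key can be recomputed later from the base value and the id.
    """
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    suffix = int.from_bytes(digest[:4], "big") % suffix_count
    return f"{base_value}-{suffix}"


def all_suffixed_keys(base_value: str, suffix_count: int = 10) -> list:
    """List every key value a reader must query to see all items for a base value."""
    return [f"{base_value}-{n}" for n in range(suffix_count)]


key = suffixed_partition_key("2023-10-27", "order12345")
print(key)
print(all_suffixed_keys("2023-10-27"))
```

The trade-off is on the read side: writes for one base value are spread over several logical partitions, but a query for everything under that base value has to cover all of the suffixed keys.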

Partitioning Best Practices

Select a partition key with high cardinality.
Avoid partition keys that have sequential values or are highly correlated with time.
Ensure even distribution of read and write operations across partitions.
Monitor partition usage and throughput using Azure Monitor.
Consider using the `id` property as a partition key for very high-throughput scenarios where each item is unique and accessed directly.
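Once you have per-partition consumption figures from your monitoring, a simple skew check can flag candidates for a closer look. The sketch below assumes you have already exported those figures into a dictionary; the partition ids, the numbers, and the threshold of twice the mean are hypothetical.

```python
def find_hot_partitions(consumed_rus_by_partition: dict, threshold: float = 2.0) -> list:
    """Return ids of partitions whose consumption exceeds `threshold` times the mean."""
    if not consumed_rus_by_partition:
        return []
    mean = sum(consumed_rus_by_partition.values()) / len(consumed_rus_by_partition)
    return sorted(
        partition_id
        for partition_id, consumed in consumed_rus_by_partition.items()
        if consumed > threshold * mean
    )


# Hypothetical RU consumption per physical partition over the same time window.
usage = {"0": 1200, "1": 1100, "2": 9800, "3": 900}
print(find_hot_partitions(usage))  # ['2']
```

Here the mean is 3,250 RU, so only partition "2" exceeds the 6,500 RU cutoff.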

Tutorial: Implementing Partitioning

Let's walk through a practical example of creating a container with a specific partition key using the Azure portal.

Step 1: Navigate to Azure Cosmos DB

Go to the Azure portal and select your Azure Cosmos DB account.

Step 2: Create a New Container

In your Cosmos DB account, navigate to your desired database and click on "Add Container".

Step 3: Configure Partition Key

In the "Add Container" pane:

Enter a Container ID (e.g., products).
Select a Partition Key. For example, if you are storing product information, you might choose /categoryId or /brandId if these properties are well-distributed.
Configure throughput (Manual or Autoscale).
Click "OK".
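The choices made in the portal correspond to a small container definition. The sketch below builds that definition as a plain dictionary in the shape used for container resources (an `id` plus a `partitionKey` with `paths` and `kind`). The helper name is hypothetical, and the code only constructs the definition; it does not call the service.

```python
import json


def container_definition(container_id: str, partition_key_path: str) -> dict:
    """Build a container definition with a single hash partition key path."""
    # Partition key paths are written with a leading slash, e.g. "/categoryId".
    if not partition_key_path.startswith("/"):
        raise ValueError("partition key path must start with '/'")
    return {
        "id": container_id,
        "partitionKey": {"paths": [partition_key_path], "kind": "Hash"},
    }


definition = container_definition("products", "/categoryId")
print(json.dumps(definition, indent=2))
```

If you create containers from code rather than the portal, the SDKs take the same two pieces of information: the container ID and the partition key path.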

Example: Partitioning by User ID in an e-commerce scenario

Consider an e-commerce application where each user has their own set of orders. Partitioning by /userId ensures that all data for a specific user is stored within a single logical partition, making user-specific queries very efficient.

Here's a sample JSON document:


{
    "id": "order12345",
    "userId": "userABCDE",
    "orderDate": "2023-10-27T10:00:00Z",
    "totalAmount": 150.75,
    "items": [...]
}

In this example, the partition key path is /userId and the partition key value for this item is "userABCDE". All orders belonging to "userABCDE" share that value and therefore reside in the same logical partition.
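To make the effect on queries concrete, the sketch below models logical partitions as an in-memory dictionary keyed by `userId`. It is an illustration only: the second and third orders and the user "userFGHIJ" are invented, and a real query runs inside the service rather than over Python lists.

```python
from collections import defaultdict

# The first order mirrors the sample document; the other two are hypothetical.
orders = [
    {"id": "order12345", "userId": "userABCDE", "totalAmount": 150.75},
    {"id": "order12346", "userId": "userABCDE", "totalAmount": 20.00},
    {"id": "order20001", "userId": "userFGHIJ", "totalAmount": 99.99},
]


def group_by_partition_key(items, key):
    """Group items into logical partitions by their partition key value."""
    partitions = defaultdict(list)
    for item in items:
        partitions[item[key]].append(item)
    return dict(partitions)


logical_partitions = group_by_partition_key(orders, "userId")


def orders_for_user(user_id):
    """Filter on the partition key: only one logical partition is read."""
    return logical_partitions.get(user_id, [])


def orders_over(amount):
    """No partition key in the filter: every logical partition must be scanned."""
    return [
        order
        for partition in logical_partitions.values()
        for order in partition
        if order["totalAmount"] > amount
    ]


print([o["id"] for o in orders_for_user("userABCDE")])  # ['order12345', 'order12346']
print([o["id"] for o in orders_over(50)])               # ['order12345', 'order20001']
```

The first lookup touches a single partition, while the second has to visit all of them, which is why a key that matches your most common filter keeps queries cheap.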

Advanced Partitioning Topics

Partitioning in different APIs: Learn how partitioning applies to SQL (Core) API, MongoDB API, Cassandra API, Gremlin API, and Table API.
Serverless Partitioning: How partitioning works with serverless offerings.
Partition Key Migration: Strategies and considerations for changing your partition key post-creation (requires data migration).

For more in-depth information, please refer to the official Azure Cosmos DB documentation on partitioning.