Azure Cosmos DB: NoSQL Partition Keys

Mastering Scalability and Performance in Your Data

Introduction to Partition Keys

Partition keys are fundamental to achieving horizontal scalability and high performance in Azure Cosmos DB's API for NoSQL (formerly the Core (SQL) API). A partition key is a property within your JSON documents that Cosmos DB uses to distribute data across multiple logical and physical partitions. Choosing the right partition key is one of the most critical decisions you'll make when designing your Cosmos DB solution.

Key Goal: Distribute requests and data evenly across partitions to maximize throughput and minimize latency.

A well-chosen partition key ensures:

  • Scalability: Your application can handle increasing amounts of data and traffic by automatically scaling out across more physical partitions.
  • Performance: Queries are routed efficiently, and data access is fast because related data often resides on the same partition.
  • Cost-Effectiveness: Prevents hot partitions that can lead to throttling and increased Request Unit (RU) consumption.

How Partition Keys Work

Cosmos DB uses a hash-based partitioning algorithm. When you choose a partition key, Cosmos DB hashes the value of that key to determine which logical partition the document belongs to. Each logical partition is mapped to exactly one physical partition, and a single physical partition hosts one or more logical partitions. Cosmos DB automatically manages the distribution of data and traffic across these physical partitions. A minimal code sketch follows the list below.

  • Partition Key Value: A property (or a combination of properties) within your document.
  • Logical Partition: A logical grouping of documents that share the same partition key value.
  • Physical Partition: The actual storage and compute unit where data is stored. A physical partition currently supports up to 50 GB of storage and 10,000 RU/s of throughput, and multiple logical partitions can reside on a single physical partition within those limits.
  • Request Routing: When you send a query, Cosmos DB uses the partition key value in your query to route the request to the relevant logical/physical partition(s).
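
To make this concrete, here is a minimal sketch using the Python SDK (azure-cosmos). The account endpoint, key, container name, and the /userId path are placeholders rather than values from this article; the point is that the partition key path is fixed at container creation, and that a point read supplying both the id and the partition key value is routed to a single logical partition.

```python
from azure.cosmos import CosmosClient, PartitionKey

# Placeholder endpoint and key, for illustration only.
client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<primary-key>")
database = client.create_database_if_not_exists(id="appdb")

# The partition key path is declared once, when the container is created.
container = database.create_container_if_not_exists(
    id="profiles",
    partition_key=PartitionKey(path="/userId"),
    offer_throughput=400,
)

# Every document must carry the partition key property.
container.create_item(body={"id": "doc-1", "userId": "user-42", "displayName": "Ada"})

# A point read supplies the id and the partition key value, so the request
# is routed directly to the one logical partition that owns "user-42".
doc = container.read_item(item="doc-1", partition_key="user-42")
```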

Request Unit (RU) Consumption

The performance and cost of operations are measured in Request Units (RUs). Every operation (read, write, query) consumes RUs. Even distribution of RUs across partitions is crucial. A "hot partition" occurs when a disproportionate share of traffic hits a single partition, leading to rate limiting (HTTP 429 responses) and wasted throughput even while other partitions sit idle.
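
As a quick way to observe RU costs, the sketch below continues the container from the previous snippet and assumes the v4 Python SDK, which surfaces the last operation's response headers on the client connection; the RU cost of a request is reported in the x-ms-request-charge header.

```python
# Continuing with `container` from the earlier sketch.
container.create_item(body={"id": "doc-2", "userId": "user-7"})

# The RU cost of the last request is reported in the x-ms-request-charge header.
charge = container.client_connection.last_response_headers["x-ms-request-charge"]
print(f"Write consumed {charge} RUs")
```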

Best Practices for Choosing a Partition Key

Selecting an effective partition key is an art and a science. Consider these best practices:

  • High Cardinality: The partition key value should have a wide range of distinct values. This helps distribute data and requests across many logical partitions. For example, a unique `userId` is better than a `status` like "Active" or "Inactive".
  • Even Data Distribution: Aim for a partition key that distributes your data and request load as evenly as possible. Avoid keys that tend to have many documents with the same few values.
  • Query Patterns: Choose a key that is frequently used in your queries' `WHERE` clauses (filters). This allows Cosmos DB to route queries directly to the relevant partitions instead of fanning out to all of them (see the query sketch after this list).
  • Avoid Time-Based Keys (Generally): While seemingly intuitive, keys like `date` or `timestamp` often create hot partitions, because most writes target the current time value and therefore land on the same partition. If you need to query by time, consider a composite key or a different approach.
  • Synthetic Partition Keys: If you don't have a natural property that meets the criteria, you can create a synthetic key by combining multiple properties or using a GUID.
  • Logical Partition Size Limit: All documents that share a single partition key value (one logical partition) can store at most 20 GB of data, so avoid keys whose values accumulate unbounded data.
  • Physical Partition Limits: There is no limit on the number of logical partitions in a container, but each physical partition is capped in storage and throughput (see above). Choose a key that spreads data widely so Cosmos DB can split the load across many physical partitions.
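
To illustrate the query-pattern point above, the sketch below (continuing the hypothetical /userId container from earlier) contrasts a query that filters on the partition key, which Cosmos DB can route to a single partition, with a fan-out query that must consult every partition and which the SDK makes you opt into explicitly. The `isActive` property is invented for the example.

```python
# Targeted query: the partition key filter confines the request to one logical partition.
user_docs = list(container.query_items(
    query="SELECT * FROM c WHERE c.userId = @uid",
    parameters=[{"name": "@uid", "value": "user-42"}],
    partition_key="user-42",
))

# Fan-out query: no partition key filter, so every partition is consulted.
# The SDK requires an explicit opt-in because this is more expensive.
active_docs = list(container.query_items(
    query="SELECT * FROM c WHERE c.isActive = true",
    enable_cross_partition_query=True,
))
```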

Common Partition Key Examples

E-commerce: Orders

Scenario: A large online store with millions of orders.

Good Partition Key: orderId or customerId.

Reasoning: Both have high cardinality. customerId co-locates all of a customer's orders, so "all orders for this customer" is a single-partition query (sketched below); orderId spreads load even further but forces that query to fan out across partitions.

Bad Partition Key: orderStatus (e.g., "Pending", "Shipped"). This would lead to hot partitions for common statuses.
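
A short sketch of the customerId choice (hypothetical container and values, continuing the earlier Python snippets): all orders for a customer land in one logical partition, so listing them is a targeted query, and reading a single order requires supplying the customerId alongside the id.

```python
# Hypothetical orders container keyed on /customerId.
orders = database.create_container_if_not_exists(
    id="orders",
    partition_key=PartitionKey(path="/customerId"),
)
orders.create_item(body={"id": "order-1001", "customerId": "cust-7", "total": 59.90})

# All of a customer's orders share one logical partition: a targeted, cheap query.
customer_orders = list(orders.query_items(
    query="SELECT * FROM c WHERE c.customerId = @cid",
    parameters=[{"name": "@cid", "value": "cust-7"}],
    partition_key="cust-7",
))

# Reading one order by id still needs the partition key value.
one_order = orders.read_item(item="order-1001", partition_key="cust-7")
```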

IoT: Device Telemetry

Scenario: Millions of IoT devices sending telemetry data.

Good Partition Key: deviceId.

Reasoning: Each device has a unique ID, ensuring data and queries for a specific device are localized. Excellent for querying latest telemetry for a device.

Bad Partition Key: timestamp (without further breakdown). This would create hot partitions based on ingestion time.
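
A sketch of the deviceId pattern (again with invented names, continuing the earlier snippets): because the filter is on the partition key, the "latest telemetry for this device" query stays inside one logical partition.

```python
# Hypothetical telemetry container keyed on /deviceId.
telemetry = database.create_container_if_not_exists(
    id="telemetry",
    partition_key=PartitionKey(path="/deviceId"),
)
telemetry.create_item(body={"id": "evt-1", "deviceId": "device-123", "ts": 1700000000, "tempC": 21.5})

# Latest reading for one device: the deviceId filter keeps the query on a single
# logical partition, and ORDER BY uses the default range index on "ts".
latest = list(telemetry.query_items(
    query="SELECT TOP 1 * FROM c WHERE c.deviceId = @d ORDER BY c.ts DESC",
    parameters=[{"name": "@d", "value": "device-123"}],
    partition_key="device-123",
))
```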

User Profiles: Social Media

Scenario: Storing user profiles and their associated data.

Good Partition Key: userId.

Reasoning: Unique identifier for each user, ensuring all their data is co-located. Ideal for fetching a user's profile and related information.

Bad Partition Key: country. If a few countries dominate user population, this creates hot partitions.
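
One common convention for profile data (a sketch, not a requirement, reusing the earlier /userId container) is to use the userId as both the document id and the partition key value, so fetching a profile is a single point read.

```python
# userId doubles as the document id, making the profile a cheap point read.
profile = {"id": "user-42", "userId": "user-42", "displayName": "Ada", "country": "NZ"}
container.upsert_item(body=profile)

me = container.read_item(item="user-42", partition_key="user-42")
```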

Important Considerations

  • Immutable Partition Key: Once a document is created, its partition key value cannot be changed. If you need to change it, you must create a new document with the new value and delete the original.
  • Partition Key Choice is Permanent: You cannot change the partition key for an existing container. If you realize your initial choice was poor, you'll need to create a new container with the correct partition key and migrate your data.
  • Hierarchical (Composite) Partition Keys: You can combine up to three properties into a hierarchical partition key (e.g., /tenantId, /userId, /sessionId). This helps when a single property isn't selective or evenly distributed enough. The order of the levels matters: queries that supply a prefix of the hierarchy (for example, only tenantId) can still be routed efficiently (see the sketch after this list).
  • Synthetic Partition Keys: When no single property provides good distribution, create a synthetic key by concatenating values into a dedicated property. For example, combine `state` and `city` into a single property such as "WA-Seattle" and use that property as the partition key. For a very popular `tenantId`, you might append a `subTenantId` or a bounded random suffix to spread its load (also sketched after this list).
  • Dimension vs. Transactional Partitioning: For analytics workloads, you might partition by a dimension (e.g., `productId`) to group related items. For transactional workloads, partitioning by a high-cardinality identifier (e.g., `orderId`, `userId`) is usually better.
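
The two techniques above can be sketched as follows, continuing the earlier Python snippets. Note the assumptions: hierarchical partition keys require a recent azure-cosmos SDK version (the list-of-paths plus kind="MultiHash" form), and every name here is hypothetical.

```python
from azure.cosmos import PartitionKey

# Hierarchical partition key: up to three levels; queries that supply a prefix
# (for example, only tenantId) can still be routed efficiently.
tenants = database.create_container_if_not_exists(
    id="tenant-data",
    partition_key=PartitionKey(path=["/tenantId", "/userId"], kind="MultiHash"),
)
tenants.create_item(body={"id": "t1-u1-doc1", "tenantId": "tenant-1", "userId": "user-1"})
doc = tenants.read_item(item="t1-u1-doc1", partition_key=["tenant-1", "user-1"])

# Synthetic partition key: concatenate properties into a dedicated property
# and declare that property as the partition key path.
places = database.create_container_if_not_exists(
    id="places",
    partition_key=PartitionKey(path="/stateCity"),
)
places.create_item(body={"id": "p1", "state": "WA", "city": "Seattle", "stateCity": "WA-Seattle"})
```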