Containers in Azure Cosmos DB
This document provides a comprehensive guide to understanding and managing containers within Azure Cosmos DB.
Introduction to Containers
A container is the unit of scalability for both provisioned throughput and storage in Azure Cosmos DB. Containers are schema-agnostic: items in the same container don't need to share a structure. Besides items (JSON documents), a container can hold stored procedures, triggers, and user-defined functions.
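Each container is identified by a name that is unique within its database. Containers are partitioned by a partition key: a property path in each item (for example, `/categoryId`) whose value determines the logical partition the item is stored in. Items with the same partition key value share a logical partition, and Cosmos DB spreads logical partitions across physical partitions. The choice of partition key is critical for performance and scalability.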
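To make the grouping concrete, here is a minimal Python sketch, assuming items are plain dictionaries and the partition key path uses the `/property/subproperty` form. It models only the logical grouping (items that share a partition key value land in the same logical partition), not hashing or physical placement. The helper names `resolve_partition_key` and `group_into_logical_partitions` are hypothetical and are not part of any Cosmos DB SDK.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def resolve_partition_key(item: Dict[str, Any], path: str) -> Any:
    """Follow a path such as "/categoryId" or "/address/city" into a dict.

    Returns None when the property is missing; how the real service treats
    items without the key is not modeled here.
    """
    value: Any = item
    for segment in path.strip("/").split("/"):
        if not isinstance(value, dict) or segment not in value:
            return None
        value = value[segment]
    return value


def group_into_logical_partitions(
    items: Iterable[Dict[str, Any]], path: str
) -> Dict[Any, List[str]]:
    """Map each partition key value to the ids of the items that share it."""
    partitions: Dict[Any, List[str]] = defaultdict(list)
    for item in items:
        partitions[resolve_partition_key(item, path)].append(item["id"])
    return dict(partitions)


items = [
    {"id": "1", "categoryId": "bikes", "name": "Road bike"},
    {"id": "2", "categoryId": "helmets", "name": "Sport helmet"},
    {"id": "3", "categoryId": "bikes", "name": "Mountain bike"},
]
print(group_into_logical_partitions(items, "/categoryId"))
# prints {'bikes': ['1', '3'], 'helmets': ['2']}
```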
Creating Containers
You can create containers using various methods, including the Azure portal, Azure CLI, Azure PowerShell, or the Azure Cosmos DB SDKs for different programming languages.
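When creating a container, you specify:
- The database ID the container belongs to.
- The container ID (name), which must be unique within that database.
- The partition key path (for example, `/categoryId`).
- The indexing policy (optional; if omitted, a default policy that indexes every property is applied).
- The provisioned throughput in request units per second (RU/s). This is optional: a container in a database with shared throughput can draw on the database's RU/s, and serverless accounts don't use provisioned throughput at all.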
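The creation settings can be captured as a small data model with a few sanity checks before calling any tool. The Python sketch below is hypothetical (`ContainerSpec` and `validate` are illustrative names, not SDK types), and the throughput rules it checks are assumptions for dedicated manual throughput (a 400 RU/s minimum in 100 RU/s increments); verify them against the current service limits.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ContainerSpec:
    """Hypothetical model of the settings chosen when creating a container."""

    database_id: str
    container_id: str
    partition_key_path: str
    indexing_policy: Optional[dict] = None  # None: the default policy applies
    throughput: Optional[int] = None  # None: no dedicated manual RU/s


def validate(spec: ContainerSpec) -> List[str]:
    """Return a list of problems; an empty list means every check passed.

    These are simplified, illustrative checks, not the service's full rules.
    """
    problems: List[str] = []
    if not spec.database_id or not spec.container_id:
        problems.append("database ID and container ID are required")
    if not spec.partition_key_path.startswith("/"):
        problems.append("partition key path must start with '/'")
    if spec.throughput is not None:
        # Assumed limits for dedicated manual throughput.
        if spec.throughput < 400:
            problems.append("manual throughput must be at least 400 RU/s")
        if spec.throughput % 100 != 0:
            problems.append("manual throughput must be a multiple of 100 RU/s")
    return problems


print(validate(ContainerSpec("MyDatabase", "MyContainer", "/categoryId", throughput=400)))
# prints []
print(validate(ContainerSpec("MyDatabase", "MyContainer", "categoryId", throughput=450)))
```

The second call reports two problems: the path lacks a leading `/`, and 450 is not a multiple of 100.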
Here's an example of creating a container for the API for NoSQL using the Azure CLI, with 400 RU/s of dedicated manual throughput:
az cosmosdb sql container create \
--resource-group MyResourceGroup \
--account-name MyCosmosDBAccount \
--database-name MyDatabase \
--name MyContainer \
--partition-key-path "/categoryId" \
--throughput 400
Partitioning Strategies
Effective partitioning is crucial for distributing your data and requests evenly across logical partitions. This ensures optimal performance, scalability, and predictable costs.
A partition key is a property from your document that Cosmos DB uses to determine which logical partition the document should be stored in. To achieve effective partitioning, consider these strategies:
- Cardinality: Choose a partition key with a high cardinality (many unique values) to distribute data widely.
- Uniform distribution: Aim for a partition key that distributes read and write operations evenly across logical partitions. Avoid "hot" partitions that receive a disproportionate amount of traffic.
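- Hierarchical partition keys: Cosmos DB supports hierarchical partition keys (subpartitioning) with up to three levels, such as `/tenantId`, `/userId`, `/sessionId`. They provide finer-grained distribution when a single key would produce hot or oversized logical partitions.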
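Cardinality and uniform distribution can be checked on sample data before committing to a key. The Python sketch below (hypothetical names, sample data invented for illustration) counts distinct values of a candidate key and the share of records that hit the most common value. Passing several property names treats the tuple of values as a hierarchical key, which is a simplification of how subpartitioning actually works.

```python
from collections import Counter
from typing import Dict, Iterable, List, NamedTuple


class KeyStats(NamedTuple):
    distinct_values: int  # cardinality of the candidate key
    hottest_share: float  # fraction of records that hit the most common value


def key_stats(records: Iterable[Dict], keys: List[str]) -> KeyStats:
    """Measure cardinality and skew for a candidate partition key.

    `records` can be stored items (storage skew) or a log of the items each
    request touched (traffic skew).
    """
    counts = Counter(tuple(record.get(k) for k in keys) for record in records)
    total = sum(counts.values())
    if total == 0:
        return KeyStats(0, 0.0)
    return KeyStats(len(counts), max(counts.values()) / total)


# Hypothetical request log in which most traffic comes from one country.
requests = [{"country": "US", "userId": f"u{i}"} for i in range(8)]
requests += [{"country": "NO", "userId": "u8"}, {"country": "JP", "userId": "u9"}]

print(key_stats(requests, ["country"]))
# prints KeyStats(distinct_values=3, hottest_share=0.8)
print(key_stats(requests, ["country", "userId"]))
# prints KeyStats(distinct_values=10, hottest_share=0.1)
```

A low `distinct_values` or a high `hottest_share` signals the hot-partition risk described above.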
Indexing Policies
By default, Azure Cosmos DB indexes every property of every item in a container, and the indexing policy defines how items are indexed. The default policy includes all paths, so any property can be filtered or sorted on efficiently, at the cost of extra RU charges on writes and extra index storage.
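To show what "every property" covers, the Python sketch below lists the paths a document exposes, written in the `/property/?` and `/array/[]/?` notation that indexing policies use. It is a hypothetical helper for understanding only, not how the service builds its index; system properties and the quoting rules for special characters in property names are not modeled.

```python
from typing import Any, Iterator


def index_paths(value: Any, prefix: str = "") -> Iterator[str]:
    """Yield indexing-policy-style paths for every scalar value in a document.

    Object properties become "/name", array elements become "/[]", and each
    scalar path ends in "/?".
    """
    if isinstance(value, dict):
        for name, child in value.items():
            yield from index_paths(child, f"{prefix}/{name}")
    elif isinstance(value, list):
        for child in value:
            yield from index_paths(child, f"{prefix}/[]")
    else:
        yield f"{prefix}/?"


item = {
    "id": "1",
    "categoryId": "bikes",
    "tags": ["red", "sale"],
    "dimensions": {"weightKg": 9.5},
}
for path in sorted(set(index_paths(item))):
    print(path)
# prints /categoryId/?, /dimensions/weightKg/?, /id/?, /tags/[]/? (one per line)
```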
You can customize the indexing policy to optimize for specific query patterns:
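- Included and excluded paths: Specify which property paths to index or skip, for example to exclude large properties you never filter or sort on.
- Index kinds: `range` indexes (the default) serve equality, range, and ORDER BY queries; `spatial` indexes serve geospatial queries on GeoJSON data; `composite` indexes serve queries that filter or sort on multiple properties.
- Indexing mode: `consistent` updates the index synchronously with each write; `none` disables indexing for the container.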
Here's a custom indexing policy that indexes every path except those under `/nonIndexedContent`:
{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    {
      "path": "/*"
    }
  ],
  "excludedPaths": [
    {
      "path": "/nonIndexedContent/*"
    }
  ]
}
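When included and excluded paths overlap, as `/*` and `/nonIndexedContent/*` do in the policy above, Cosmos DB documents that the more precise path takes precedence. The Python sketch below approximates that rule by treating the number of path segments as precision. The function names are hypothetical, and the tie-breaking choice (exclusion wins on equal precision) is an assumption of this sketch, not service behavior.

```python
from typing import List, Optional


def matches(pattern: str, path: str) -> bool:
    """True if a policy pattern covers a scalar path such as "/a/b/?"."""
    if pattern.endswith("/*"):
        root = pattern[:-2]  # "/*" -> "", "/a/*" -> "/a"
        return path.startswith(root + "/")
    return pattern == path  # exact scalar patterns such as "/a/?"


def most_precise(patterns: List[str], path: str) -> Optional[str]:
    """Pick the matching pattern with the most segments (rough precision)."""
    hits = [p for p in patterns if matches(p, path)]
    return max(hits, key=lambda p: p.count("/"), default=None)


def is_indexed(path: str, included: List[str], excluded: List[str]) -> bool:
    inc = most_precise(included, path)
    exc = most_precise(excluded, path)
    if exc is None:
        return inc is not None
    if inc is None:
        return False
    # Assumption of this sketch: on equal precision, the exclusion wins.
    return inc.count("/") > exc.count("/")


included = ["/*"]
excluded = ["/nonIndexedContent/*"]
print(is_indexed("/categoryId/?", included, excluded))  # prints True
print(is_indexed("/nonIndexedContent/notes/?", included, excluded))  # prints False
```

Adding a more precise included path such as `/nonIndexedContent/summary/?` would re-enable indexing for just that property.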
Throughput Provisioning
Throughput in Azure Cosmos DB is measured in request units per second (RU/s). You can provision throughput on a container (dedicated to that container) or on a database (shared by the containers in that database).
- Manual throughput: You set a fixed RU/s value, which you can change later.
- Autoscale throughput: You set a maximum RU/s, and Cosmos DB scales throughput between 10% of that maximum and the maximum based on workload demand.
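The autoscale range and its hourly billing can be modeled in a few lines. The Python sketch below assumes demand samples expressed in RU/s, uses the 10%-to-maximum range, and bills the highest RU/s scaled to within the hour; it ignores how quickly scaling reacts and how throughput is split across physical partitions. All function names are hypothetical.

```python
from typing import List, Tuple


def autoscale_range(max_rus: int) -> Tuple[int, int]:
    """Autoscale scales between 10% of the configured maximum and the maximum."""
    return max_rus // 10, max_rus


def scaled_rus(max_rus: int, demand_rus: float) -> float:
    """RU/s this sketch assumes autoscale provides for a given demand."""
    low, high = autoscale_range(max_rus)
    return min(max(demand_rus, low), high)  # demand above `high` is throttled


def billed_rus_for_hour(max_rus: int, demand_samples: List[float]) -> float:
    """Bill the highest RU/s the system scaled to within the hour."""
    low, _ = autoscale_range(max_rus)
    return max((scaled_rus(max_rus, d) for d in demand_samples), default=low)


print(autoscale_range(4000))  # prints (400, 4000)
print(billed_rus_for_hour(4000, [120, 900, 2500]))  # prints 2500
print(billed_rus_for_hour(4000, [50, 80]))  # prints 400
print(billed_rus_for_hour(4000, [1500, 6000]))  # prints 4000
```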
For accounts with provisioned throughput, cost depends mainly on the provisioned RU/s (for autoscale, the highest RU/s reached in each hour) and the storage consumed. Monitoring RU/s consumption helps you right-size throughput and control cost.
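One simple way to act on monitoring data is to compare per-second RU demand with the provisioned rate. The Python sketch below assumes you already have those per-second figures (how you collect them is out of scope) and that requests above the provisioned rate are rate limited with HTTP status 429. It ignores that provisioned throughput is divided across physical partitions, so a hot partition can be throttled even when total utilization looks low. The names are hypothetical.

```python
from typing import List, NamedTuple


class Utilization(NamedTuple):
    average: float  # mean demand as a fraction of provisioned RU/s
    peak: float  # highest one-second demand as a fraction of provisioned RU/s
    seconds_over: int  # seconds whose demand exceeded provisioned RU/s


def summarize(demand_per_second: List[float], provisioned_rus: float) -> Utilization:
    """Summarize per-second RU demand against provisioned throughput."""
    if not demand_per_second:
        return Utilization(0.0, 0.0, 0)
    average = sum(demand_per_second) / len(demand_per_second) / provisioned_rus
    peak = max(demand_per_second) / provisioned_rus
    over = sum(1 for rus in demand_per_second if rus > provisioned_rus)
    return Utilization(average, peak, over)


print(summarize([200, 400, 600, 200], provisioned_rus=400))
# prints Utilization(average=0.875, peak=1.5, seconds_over=1)
```

A low average with occasional peaks above 1.0 is the pattern where autoscale may fit better than a fixed manual value.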
Common Operations
Key operations you can perform on containers include:
- Create Container: As shown above.
- Read Container: Retrieve properties of an existing container.
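- Update Container: Modify the indexing policy, throughput, or default time to live (TTL). The partition key path can't be changed in place; changing it requires a new container and a data migration.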
- Delete Container: Remove a container and all its data.
- List Containers: Retrieve a list of all containers within a database.
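The TTL setting combines a container-level default (`defaultTimeToLive`) with an optional per-item `ttl` property. The Python sketch below encodes the resolution rules as described in the Cosmos DB time-to-live documentation (verify against the current docs): if the container default is absent, TTL is off and item values are ignored; `-1` means "on, but no expiry unless the item sets one"; a positive value is seconds since the item's last modification. `effective_ttl` is a hypothetical helper; the service performs expiry itself.

```python
from typing import Optional


def effective_ttl(container_default: Optional[int], item_ttl: Optional[int]) -> Optional[int]:
    """Seconds after the item's last write at which it expires, or None if never.

    container_default: None = TTL off, -1 = on without a default, n > 0 = n seconds.
    item_ttl: None = not set on the item, -1 = never expire, n > 0 = n seconds.
    """
    if container_default is None:
        return None  # TTL disabled on the container: item values are ignored
    if item_ttl is not None:
        return None if item_ttl == -1 else item_ttl  # item setting overrides
    return None if container_default == -1 else container_default


print(effective_ttl(None, 30))  # prints None
print(effective_ttl(-1, None))  # prints None
print(effective_ttl(-1, 2000))  # prints 2000
print(effective_ttl(1000, None))  # prints 1000
print(effective_ttl(1000, -1))  # prints None
```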
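Example: Reading Container Properties (Azure Cosmos DB .NET SDK v3)
using System;
using Microsoft.Azure.Cosmos;
// Placeholder connection string; replace it with your account's.
using CosmosClient client = new CosmosClient("<your-connection-string>");
Container container = client.GetContainer("MyDatabase", "MyContainer");
// ReadContainerAsync returns a ContainerResponse; its Resource holds the properties.
ContainerResponse response = await container.ReadContainerAsync();
ContainerProperties containerProperties = response.Resource;
Console.WriteLine($"Container ID: {containerProperties.Id}");
Console.WriteLine($"Partition Key Path: {containerProperties.PartitionKeyPath}");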
Understanding and effectively managing containers is key to building scalable and performant applications on Azure Cosmos DB.