Azure Cosmos DB Storage Overview
Azure Cosmos DB is a globally distributed, multi-model database service. This document provides an overview of how data is stored and managed within Azure Cosmos DB, focusing on storage characteristics, performance tiers, and cost considerations.
Key Storage Concepts
Azure Cosmos DB is schema-agnostic: in the API for NoSQL, data is stored as JSON items with no predefined schema. The item is the atomic unit of data. Items are organized within containers (similar to tables or collections), and containers are in turn grouped into databases.
Containers and Items
Each container is a set of items. When you create a container, you specify a partition key, which is the path to a property in each JSON item (for example, /categoryId). Items that share the same partition key value form a logical partition, and Azure Cosmos DB maps logical partitions onto the physical partitions that store them. Proper partitioning is crucial for achieving high throughput and scalability.
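As a rough illustration of how a partition key path selects a value from each item, the following Python sketch resolves a path such as /categoryId against plain dictionaries and groups items by the resulting value. The function name, the sample items, and their property names are hypothetical, and the code only models the idea; it does not call Azure Cosmos DB.

```python
from typing import Any, Optional


def partition_key_value(item: dict, path: str) -> Optional[Any]:
    """Return the value found at a partition key path such as "/categoryId".

    Nested paths like "/address/city" are walked one segment at a time.
    Returns None when the property is missing.
    """
    current: Any = item
    for segment in path.strip("/").split("/"):
        if not isinstance(current, dict) or segment not in current:
            return None
        current = current[segment]
    return current


# Hypothetical items; the property names are made up for illustration.
items = [
    {"id": "1", "categoryId": "bikes", "name": "Road bike"},
    {"id": "2", "categoryId": "helmets", "name": "Kids helmet"},
    {"id": "3", "categoryId": "bikes", "name": "Mountain bike"},
]

# Group item ids by partition key value: one group per distinct value.
logical_partitions: dict = {}
for item in items:
    key = partition_key_value(item, "/categoryId")
    logical_partitions.setdefault(key, []).append(item["id"])

print(logical_partitions)  # {'bikes': ['1', '3'], 'helmets': ['2']}
```

Items with the same categoryId end up in the same group, which mirrors how items with the same partition key value are kept together.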
Partitioning
Partitioning is how Azure Cosmos DB scales a container horizontally. The service hashes the partition key value of each item to distribute your data across a set of logical partitions, which it places on physical partitions. For optimal performance and to avoid hot partitions, choose a partition key with high cardinality that spreads both requests and stored data evenly.
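The effect of cardinality on distribution can be sketched with a small simulation. The Python below hashes key values into a fixed number of buckets and counts how many items land in each. It is an illustration only: MD5 and the bucket count of 4 are stand-ins chosen for the sketch, not the hash function or partition layout that Azure Cosmos DB uses internally, and the key values are invented.

```python
import hashlib
from collections import Counter


def bucket_for(key: str, bucket_count: int) -> int:
    """Map a key to a bucket with a stable hash (MD5 is a stand-in here)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % bucket_count


def distribution(keys, bucket_count: int) -> list:
    """Count how many keys fall into each bucket."""
    counts = Counter(bucket_for(k, bucket_count) for k in keys)
    return [counts.get(b, 0) for b in range(bucket_count)]


BUCKETS = 4  # hypothetical number of partitions for the simulation

# High cardinality: 10,000 distinct values, e.g. a user id.
high_cardinality = [f"user-{i}" for i in range(10_000)]

# Low cardinality: only two distinct values, 90% of them "active".
low_cardinality = ["active" if i % 10 else "inactive" for i in range(10_000)]

print("high cardinality:", distribution(high_cardinality, BUCKETS))
print("low cardinality: ", distribution(low_cardinality, BUCKETS))
```

With many distinct values the counts come out close to even, while the two-value key can occupy at most two buckets and puts at least 90% of the items in one of them, which is the hot-partition pattern described above.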
Storage Performance Tiers
Azure Cosmos DB meets varying workload requirements through throughput modes rather than storage tiers. Throughput is measured in Request Units per second (RU/s), a normalized measure of the CPU, memory, and IOPS that operations consume. Storage is not provisioned up front: it grows automatically as you add data, and you are billed for the storage you consume. Provisioned throughput is available in two modes:
- Standard (manual) throughput: You set a fixed RU/s value and change it yourself when the workload changes.
- Autoscale throughput: You set a maximum RU/s value, and throughput scales automatically with usage between 10% of that maximum and the maximum.
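Example: a container definition with a partition key
The query language of the API for NoSQL (SQL API) has no statement for creating containers. Containers are created through the Azure portal, an SDK, the Azure CLI, or the REST API. The JSON below is a container definition as sent in a REST API request body; the container id "c" is a placeholder.
{
  "id": "c",
  "partitionKey": {
    "paths": ["/categoryId"],
    "kind": "Hash"
  }
}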
Cost Management
The cost of Azure Cosmos DB is driven by two meters: the throughput you provision (RU/s, billed per hour) and the amount of data you store (GB, billed per month). Storage itself is relatively inexpensive, so the provisioned RU/s usually dominates the bill. Understanding your data access patterns and optimizing your partition key strategy lets you provision fewer RU/s for the same workload and so manage costs effectively.
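The relationship between throughput, storage, and the bill can be sketched as simple arithmetic. The Python below assumes standard provisioned throughput in a single region, a 730-hour month, and hourly billing per 100 RU/s. The prices passed in the example call are placeholders chosen for illustration, not published rates; take real values from the current pricing page for your region.

```python
HOURS_PER_MONTH = 730  # assumption: average hours in a billing month


def estimate_monthly_cost(
    provisioned_ru_per_sec: float,
    stored_gb: float,
    price_per_100_ru_hour: float,
    price_per_gb_month: float,
) -> dict:
    """Estimate a monthly bill from throughput and storage.

    Both prices are caller-supplied assumptions, not values from any API.
    """
    throughput_cost = (
        (provisioned_ru_per_sec / 100) * price_per_100_ru_hour * HOURS_PER_MONTH
    )
    storage_cost = stored_gb * price_per_gb_month
    return {
        "throughput": round(throughput_cost, 2),
        "storage": round(storage_cost, 2),
        "total": round(throughput_cost + storage_cost, 2),
    }


# Placeholder prices, for illustration only.
estimate = estimate_monthly_cost(
    provisioned_ru_per_sec=1000,
    stored_gb=50,
    price_per_100_ru_hour=0.008,
    price_per_gb_month=0.25,
)
print(estimate)
```

With these placeholder inputs the throughput share is several times the storage share, which is why right-sizing RU/s tends to matter more than trimming stored data.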
Storage Limits
A container with a partition key has no fixed maximum storage capacity, and its capacity does not depend on the provisioned RU/s. The limits apply per partition instead: a logical partition (all items that share one partition key value) can currently hold up to 20 GB, and a physical partition can currently hold up to 50 GB and serve up to 10,000 RU/s. Azure Cosmos DB automatically manages the number of physical partitions, splitting them as data and throughput grow, to accommodate your data size and throughput requirements.
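A back-of-the-envelope estimate of how many physical partitions a container needs can be written as follows. This is a sketch under stated assumptions: the defaults of 50 GB and 10,000 RU/s per physical partition are assumed per-partition limits that should be checked against the current service quotas, and the function name is hypothetical. The actual partition count is decided by the service and can be higher.

```python
import math


def estimate_physical_partitions(
    stored_gb: float,
    provisioned_ru_per_sec: float,
    max_gb_per_partition: float = 50.0,    # assumed per-partition storage limit
    max_ru_per_partition: float = 10_000,  # assumed per-partition throughput limit
) -> int:
    """Lower-bound estimate: enough partitions for both storage and throughput."""
    by_storage = math.ceil(stored_gb / max_gb_per_partition)
    by_throughput = math.ceil(provisioned_ru_per_sec / max_ru_per_partition)
    # A container always has at least one physical partition.
    return max(1, by_storage, by_throughput)


print(estimate_physical_partitions(stored_gb=10, provisioned_ru_per_sec=1_000))   # 1
print(estimate_physical_partitions(stored_gb=120, provisioned_ru_per_sec=1_000))  # 3
print(estimate_physical_partitions(stored_gb=10, provisioned_ru_per_sec=25_000))  # 3
```

Whichever of storage or throughput demands more partitions sets the estimate, so a small but very busy container can need as many partitions as a large, quiet one.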