Azure Cosmos DB Storage Overview

Updated: October 26, 2023 Read time: 5 minutes Contributors: Microsoft

Azure Cosmos DB is a globally distributed, multi-model database service. This document provides an overview of how data is stored and managed within Azure Cosmos DB, focusing on storage characteristics, throughput and capacity modes, and cost considerations.

Key Storage Concepts

Azure Cosmos DB is schema-agnostic: in the API for NoSQL, data is stored as JSON documents, and no schema has to be defined up front. Other APIs (such as MongoDB, Cassandra, Gremlin, and Table) expose their own data models. The primary storage unit is an item, the atomic unit of data. Items are organized within containers (similar to tables or collections), and containers are in turn grouped into databases.

Containers and Items

Each container is a set of items. When you create a container, you specify a partition key: a property path within each JSON item, such as /categoryId. Items that share the same partition key value belong to the same logical partition, and Azure Cosmos DB places logical partitions on physical partitions. Proper partitioning is crucial for achieving high throughput and scalability.
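The grouping step can be illustrated with a small in-memory sketch. It assumes the API for NoSQL convention that a partition key is a path such as "/categoryId" and that an item's id must be unique within its logical partition; the function names are hypothetical, and the sketch simply requires every item to contain the partition key property, which is a simplification.

```python
import json
from collections import defaultdict
from typing import Any


def partition_key_value(item: dict[str, Any], path: str) -> Any:
    """Resolve a partition key path such as "/categoryId" or "/address/city" against an item."""
    value: Any = item
    for segment in path.strip("/").split("/"):
        value = value[segment]  # simplification: raises KeyError if the property is missing
    return value


def group_into_logical_partitions(items, path):
    """Group items by partition key value; each group models one logical partition."""
    partitions = defaultdict(dict)
    for item in items:
        pk = partition_key_value(item, path)
        # An id only has to be unique inside its own logical partition.
        if item["id"] in partitions[pk]:
            raise ValueError(f"duplicate id {item['id']!r} in logical partition {pk!r}")
        partitions[pk][item["id"]] = item
    return dict(partitions)


docs = [json.loads(s) for s in (
    '{"id": "1", "categoryId": "bikes", "name": "Road bike"}',
    '{"id": "2", "categoryId": "bikes", "name": "Gravel bike"}',
    '{"id": "1", "categoryId": "helmets", "name": "Trail helmet"}',
)]
logical = group_into_logical_partitions(docs, "/categoryId")
print({pk: sorted(ids) for pk, ids in logical.items()})  # {'bikes': ['1', '2'], 'helmets': ['1']}
```

Note that two items can share id "1" here because they live in different logical partitions.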

Note: Azure Cosmos DB automatically manages the physical partitions. You don't need to provision or manage them yourself.

Partitioning

Partitioning is how Azure Cosmos DB scales a container horizontally. It uses the partition key you define to distribute your data across a set of logical and physical partitions. For optimal performance and to avoid hot partitions, choose a partition key with high cardinality that spreads both requests and stored data evenly across its values.
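One way to compare candidate partition keys before committing to one is to count how a sample of requests would spread across key values. The sketch below is a simple heuristic, not an Azure Cosmos DB metric; the request samples and the function name are hypothetical.

```python
from collections import Counter


def key_distribution_report(request_keys):
    """Summarize how requests spread across partition key values.

    request_keys: one partition key value per observed request (a hypothetical sample).
    Returns (number of distinct key values, share of requests hitting the busiest value).
    """
    counts = Counter(request_keys)
    if not counts:
        raise ValueError("need at least one request")
    busiest_share = max(counts.values()) / sum(counts.values())
    return len(counts), busiest_share


# Candidate A: low-cardinality key where one value dominates.
by_country = ["US"] * 80 + ["DE"] * 10 + ["JP"] * 10
# Candidate B: high-cardinality key with evenly spread requests.
by_user = [f"user-{i % 50}" for i in range(100)]

print(key_distribution_report(by_country))  # (3, 0.8)
print(key_distribution_report(by_user))     # (50, 0.02)
```

A key where one value receives 80% of the traffic is a likely hot partition; the per-user key spreads the same load over many values.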

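Throughput and Capacity Modes

Azure Cosmos DB offers capacity modes to meet varying workload requirements: provisioned throughput (manual or autoscale) and serverless. Throughput is measured in Request Units (RUs), a normalized measure of the CPU, memory, and I/O an operation consumes; reading a single 1 KB item by its ID and partition key costs 1 RU. In provisioned mode you reserve throughput in RU/s, and requests that exceed it are rate-limited. Storage, by contrast, is elastic and billed separately: it grows with your data and is not sized by the RU/s you provision.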
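Estimating how much provisioned throughput a workload needs comes down to multiplying each operation's rate by its RU charge. In this sketch only the 1 RU charge for a 1 KB point read comes from the service's definition of an RU; the write and query charges are placeholder assumptions, and the 100 RU/s rounding increment is an assumed parameter. Measure real charges for your own items before sizing a container.

```python
import math


def required_ru_per_second(workload):
    """Sum (operations per second x RU charge per operation) over every operation type.

    workload maps an operation name to (operations per second, RU charge per operation).
    """
    return sum(rate * charge for rate, charge in workload.values())


def round_up_to_increment(ru_per_second, increment=100):
    """Round a throughput figure up to the next multiple of `increment` (assumed 100 RU/s)."""
    return math.ceil(ru_per_second / increment) * increment


workload = {
    "point_read_1kb": (500, 1.0),  # 1 KB point read: 1 RU
    "write": (50, 5.0),            # assumed charge, for illustration only
    "query": (20, 10.0),           # assumed charge, for illustration only
}
need = required_ru_per_second(workload)
print(need, round_up_to_increment(need))  # 950.0 1000
```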

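# Example: creating a container with /categoryId as its partition key (Azure CLI, API for NoSQL)
az cosmosdb sql container create --resource-group "$RESOURCE_GROUP" --account-name "$ACCOUNT_NAME" --database-name "$DATABASE_NAME" --name "$CONTAINER_NAME" --partition-key-path "/categoryId"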

Cost Management

Your Azure Cosmos DB bill is driven primarily by throughput (the RU/s you provision, or the RUs you consume in serverless mode) and by the amount of data stored, which is billed per GB. While storage itself is relatively inexpensive, the provisioned RU/s directly affects your bill. Understanding your data access patterns and choosing a partition key that spreads load evenly can help manage costs, because a hot partition can force you to provision more RU/s than your average load needs.
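The two cost drivers can be combined in a rough monthly estimate for provisioned (manual) throughput, which is billed per 100 RU/s per hour, plus storage billed per GB per month. This is a sketch only: the prices below are arbitrary placeholders, not Azure rates, the 730-hour month is an approximation, and serverless, autoscale, and multi-region billing are not modeled.

```python
HOURS_PER_MONTH = 730  # common approximation of hours in a month


def estimate_monthly_cost(provisioned_ru_s, stored_gb, price_per_100_ru_hour, price_per_gb_month):
    """Return (throughput cost, storage cost) for one month of manual provisioned throughput.

    Prices are inputs; take real values from the Azure pricing page for your region.
    """
    throughput_cost = provisioned_ru_s / 100 * price_per_100_ru_hour * HOURS_PER_MONTH
    storage_cost = stored_gb * price_per_gb_month
    return throughput_cost, storage_cost


# Placeholder prices in arbitrary currency units, NOT real Azure rates.
base = estimate_monthly_cost(1000, 10, price_per_100_ru_hour=0.01, price_per_gb_month=0.25)
doubled = estimate_monthly_cost(2000, 10, price_per_100_ru_hour=0.01, price_per_gb_month=0.25)
print(base, doubled)
```

Doubling the provisioned RU/s doubles the throughput term while the storage term stays the same, which is why throughput tuning usually matters more than data volume.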

Storage Limits

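Storage is not capped by the throughput you provision: a container with a partition key can keep growing, and Azure Cosmos DB adds physical partitions as your data size and throughput requirements grow. The limits that matter are per partition. All items that share a partition key value form one logical partition, which can hold up to 20 GB; each physical partition can store up to 50 GB and serve up to 10,000 RU/s. A logical partition always lives on a single physical partition, so a partition key whose individual values accumulate too much data will eventually reach the 20 GB limit. Note that the minimum RU/s you can set on a container rises with the amount of data it stores.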
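The documented quotas (20 GB per logical partition; 50 GB and 10,000 RU/s per physical partition) can be turned into two quick checks. This is a sketch under those assumptions: the per-key sizes are hypothetical inputs, and the physical partition count is only a lower bound implied by the limits, since the service decides the actual layout.

```python
import math

LOGICAL_PARTITION_MAX_GB = 20
PHYSICAL_PARTITION_MAX_GB = 50
PHYSICAL_PARTITION_MAX_RU_S = 10_000


def oversized_logical_partitions(sizes_gb):
    """Return partition key values whose logical partition exceeds the 20 GB limit."""
    return sorted(pk for pk, gb in sizes_gb.items() if gb > LOGICAL_PARTITION_MAX_GB)


def min_physical_partitions(total_gb, provisioned_ru_s):
    """Lower bound on physical partitions implied by the per-partition storage and RU/s limits."""
    by_storage = math.ceil(total_gb / PHYSICAL_PARTITION_MAX_GB)
    by_throughput = math.ceil(provisioned_ru_s / PHYSICAL_PARTITION_MAX_RU_S)
    return max(1, by_storage, by_throughput)


sizes = {"tenant-a": 4.5, "tenant-b": 21.0, "tenant-c": 0.3}
print(oversized_logical_partitions(sizes))   # ['tenant-b']
print(min_physical_partitions(120, 15_000))  # 3
```

In the second call, storage (120 GB across 50 GB partitions) rather than throughput sets the bound.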

Tip: Monitor your storage usage and throughput consumption in the Azure portal to identify potential cost savings and performance optimizations.

Further Reading