Azure Cosmos DB is a globally distributed, multi-model database service for rapidly developing and scaling modern applications. It offers a flexible data model, guaranteed low latency, high availability, and elastic scalability. This document provides an overview of the core concepts that underpin Azure Cosmos DB.
Key takeaway: Azure Cosmos DB is designed for applications that require global distribution, high throughput, and low latency.
An Azure Cosmos DB account is the top-level resource. It represents a globally distributed database instance. You can configure your account for single-region or multi-region writes and reads.
A database is a logical grouping of containers. It acts as a namespace for your data within an Azure Cosmos DB account.
A container is the fundamental unit of scalability for both throughput and storage. It is schema-agnostic, so items in the same container do not have to share a structure. Depending on the API, a container can hold JSON documents, key-value pairs, or property graphs. By default, every item in a container is indexed automatically.
An item is the basic unit of data within a container. In a NoSQL document database, an item is an entity, represented as a JSON document. In a graph database, an item can be a vertex or an edge.
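The hierarchy of account, database, container, and item described above can be pictured with a small in-memory model. This is only a sketch in plain Python, not the Azure SDK: the class names `Account`, `Database`, and `Container`, their methods, and the sample identifiers are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """Hypothetical stand-in for a container: schema-agnostic, keyed by partition key."""
    id: str
    partition_key_path: str  # e.g. "/userId"
    # Maps (partition key value, item id) -> item. An item id only has to be
    # unique within its logical partition, so both parts form the key.
    items: dict = field(default_factory=dict)

    def upsert(self, item: dict) -> None:
        key_property = self.partition_key_path.lstrip("/")
        if "id" not in item or key_property not in item:
            raise ValueError(f"item needs 'id' and '{key_property}'")
        self.items[(item[key_property], item["id"])] = item


@dataclass
class Database:
    """Hypothetical stand-in for a database: a namespace for containers."""
    id: str
    containers: dict = field(default_factory=dict)

    def create_container(self, container_id: str, partition_key_path: str) -> Container:
        container = Container(container_id, partition_key_path)
        self.containers[container_id] = container
        return container


@dataclass
class Account:
    """Hypothetical stand-in for the top-level account resource."""
    name: str
    databases: dict = field(default_factory=dict)

    def create_database(self, database_id: str) -> Database:
        database = Database(database_id)
        self.databases[database_id] = database
        return database


account = Account("example-account")
orders = account.create_database("shop").create_container("orders", "/userId")
orders.upsert({"id": "doc1", "userId": "user123", "totalAmount": 99.99})
# A second item with a different shape is accepted: the container is schema-agnostic.
orders.upsert({"id": "doc2", "userId": "user456", "note": "gift wrap"})
print(len(orders.items))
```

The model leaves out everything the real service does (distribution, indexing, throughput); it is meant only to make the nesting of the four resource types concrete.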
To achieve horizontal scalability, containers are partitioned. Items are grouped into logical partitions, which the service maps onto physical partitions. Each logical partition holds the items that share the same partition key value.
A partition key is a property within your items that determines which logical partition the item belongs to. Choosing an effective partition key is crucial for performance and scalability. It should have a high cardinality (many unique values) and distribute requests evenly.
In an e-commerce application, you might use userId or orderId as a partition key.
{
  "id": "doc1",
  "userId": "user123",
  "orderDate": "2024-07-26T10:00:00Z",
  "totalAmount": 99.99
}
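To see why cardinality matters, the sketch below hashes partition key values onto a fixed number of physical partitions and counts how many keys land on each. It is an illustration under stated assumptions, not the service's actual algorithm: the SHA-256 hash, the modulo mapping, the partition count of 4, and the `tier` style values in the low-cardinality case are all hypothetical.

```python
import hashlib
from collections import Counter

PHYSICAL_PARTITIONS = 4  # assumed count, for illustration only


def physical_partition_for(partition_key_value: str,
                           partitions: int = PHYSICAL_PARTITIONS) -> int:
    """Map a partition key value to a partition index using a stable hash."""
    digest = hashlib.sha256(partition_key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def spread(partition_key_values) -> Counter:
    """Count how many items fall on each physical partition."""
    return Counter(physical_partition_for(v) for v in partition_key_values)


# High cardinality: one distinct key value per user, like userId.
high_cardinality = [f"user{i}" for i in range(1000)]
# Low cardinality: only two distinct key values across all items.
low_cardinality = ["premium" if i % 10 == 0 else "standard" for i in range(1000)]

print("userId-like key:", dict(spread(high_cardinality)))
print("two-value key:  ", dict(spread(low_cardinality)))
```

With only two distinct key values, the items can occupy at most two partitions no matter how many partitions exist, and the 900 `standard` items all land on the same one. Many distinct values give the hash something to spread.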
Throughput in Azure Cosmos DB is measured in Request Units per second (RU/s). You can provision throughput at the container or database level. Provisioned throughput ensures predictable performance and availability.
A Request Unit (RU) is a normalized measure of the computational resources (CPU, memory, IOPS, etc.) required to perform a database operation. Simple reads and writes consume fewer RUs than complex queries. For example, a point read of a 1 KB item by its ID and partition key costs 1 RU.
Tip: Understanding RU consumption is key to cost optimization and performance tuning.
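As a rough illustration of how RU costs translate into a throughput figure, the sketch below multiplies an assumed per-operation RU cost by an assumed request rate and sums the results. The `Operation` type, the workload mix, and every cost and rate in it are hypothetical placeholders; in practice you would substitute the request charges you measure for your own operations.

```python
from dataclasses import dataclass


@dataclass
class Operation:
    name: str
    ru_cost: float      # assumed RUs consumed by one operation
    per_second: float   # assumed number of operations per second


def required_throughput(operations) -> float:
    """Sum RU cost x rate over all operations to get the RU/s the workload needs."""
    return sum(op.ru_cost * op.per_second for op in operations)


# Placeholder workload; replace the numbers with measured values.
workload = [
    Operation("point read (1 KB)", ru_cost=1.0, per_second=500),
    Operation("write (1 KB)", ru_cost=5.0, per_second=100),
    Operation("filtered query", ru_cost=25.0, per_second=10),
]

total = required_throughput(workload)
print(f"Estimated throughput: {total:.0f} RU/s")  # prints "Estimated throughput: 1250 RU/s"
```

Even though queries make up a small share of the requests here, they account for a fifth of the estimate, which is why measuring the cost of each operation type is worthwhile before provisioning.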
Azure Cosmos DB offers five consistency levels that trade off consistency, availability, and latency. From strongest to weakest, they are strong, bounded staleness, session, consistent prefix, and eventual.
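The ordering of the consistency levels can be sketched as an ordered enumeration. The example assumes a rule in which an individual request may relax the account's default level but never strengthen it; the `ConsistencyLevel` enum and the `effective_level` helper are hypothetical names written for this illustration, not part of any SDK.

```python
from enum import IntEnum
from typing import Optional


class ConsistencyLevel(IntEnum):
    # Ordered from weakest (lowest latency) to strongest guarantee.
    EVENTUAL = 1
    CONSISTENT_PREFIX = 2
    SESSION = 3
    BOUNDED_STALENESS = 4
    STRONG = 5


def effective_level(account_default: ConsistencyLevel,
                    requested: Optional[ConsistencyLevel] = None) -> ConsistencyLevel:
    """Return the level a request runs at, allowing only relaxation of the default."""
    if requested is None:
        return account_default
    if requested > account_default:
        raise ValueError(
            f"{requested.name} is stronger than the account default {account_default.name}"
        )
    return requested


default = ConsistencyLevel.SESSION
print(effective_level(default).name)
print(effective_level(default, ConsistencyLevel.EVENTUAL).name)
```

Asking for `STRONG` against a `SESSION` default raises `ValueError` in this sketch, which models the idea that a stronger guarantee has to be chosen for the account as a whole rather than for a single request.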
Azure Cosmos DB supports a variety of data models, including document, key-value, graph, and column-family.
The schema-agnostic nature of containers allows for flexible and evolving data structures.
Azure Cosmos DB supports multiple APIs, allowing you to use your existing skills and tools. These include the APIs for NoSQL, MongoDB, Apache Cassandra, Apache Gremlin, and Table.
Azure Cosmos DB is a powerful and versatile database service for modern cloud-native applications. By understanding its core concepts like accounts, databases, containers, items, partitions, RUs, and consistency levels, you can effectively design, build, and scale your applications globally.