Azure Cosmos DB Concepts: Overview

Introduction

Azure Cosmos DB is a globally distributed, multi-model database service for rapidly developing and scaling modern applications. It offers a flexible data model, low latency, high availability, and elastic scalability. This document gives an overview of the core concepts behind Azure Cosmos DB.

Key takeaway: Azure Cosmos DB is designed for applications that require global distribution, high throughput, and low latency.

Key Concepts

Accounts

An Azure Cosmos DB account is the top-level resource. It represents a globally distributed database instance. You can configure an account to accept writes in a single region or in multiple regions, and to serve reads from any region it is replicated to.

Databases

A database is a logical container for one or more containers. It acts as a namespace for your data within an Azure Cosmos DB account. Stored procedures, by contrast, are scoped to an individual container rather than to the database.

Containers

A container is the fundamental unit of scalability for both throughput and storage. It is schema-agnostic: the items it holds do not need to share a structure. Depending on the API, a container can store JSON documents, key-value pairs, or graph data. By default, every item in a container is automatically indexed.

Items

An item is the basic unit of data within a container. In a NoSQL document database, an item is an entity, represented as a JSON document. In a graph database, an item can be a vertex or an edge.
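
The nesting of account, database, container, and item described so far can be pictured with a few plain data classes. This is only a mental model, not the SDK's object model; every class and name below (`Account`, `Database`, `Container`, "contoso", "shop", "orders") is a hypothetical chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """Schema-agnostic collection of items with a declared partition key path."""
    name: str
    partition_key_path: str                      # e.g. "/userId"
    items: list[dict] = field(default_factory=list)


@dataclass
class Database:
    """Namespace that groups containers."""
    name: str
    containers: dict[str, Container] = field(default_factory=dict)


@dataclass
class Account:
    """Top-level resource: holds databases and the regions it is replicated to."""
    name: str
    regions: list[str]
    databases: dict[str, Database] = field(default_factory=dict)


# Hypothetical names throughout; this only mirrors the nesting described above.
orders = Container(name="orders", partition_key_path="/userId")
orders.items.append({"id": "doc1", "userId": "user123", "totalAmount": 99.99})

shop = Database(name="shop", containers={"orders": orders})
account = Account(
    name="contoso",
    regions=["West US", "North Europe"],
    databases={"shop": shop},
)

# Walk the hierarchy from the account down to a single item.
first_item = account.databases["shop"].containers["orders"].items[0]
print(first_item["id"])
```

Items are plain dictionaries here because a container does not enforce a schema; two items in the same container may carry different properties.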

Partitions

To scale horizontally, containers are partitioned. Items are grouped into logical partitions, and each logical partition holds the items that share the same partition key value. Logical partitions are in turn distributed across physical partitions, which the service manages for you.
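
As a rough illustration of what "items that share the same partition key value" means, the sketch below groups a handful of items into logical partitions in plain Python. The helper `group_into_logical_partitions` and the sample items are hypothetical; the service performs this grouping internally.

```python
from collections import defaultdict


def group_into_logical_partitions(items, partition_key):
    """Group items by the value of their partition key property."""
    partitions = defaultdict(list)
    for item in items:
        partitions[item[partition_key]].append(item)
    return dict(partitions)


items = [
    {"id": "doc1", "userId": "user123"},
    {"id": "doc2", "userId": "user456"},
    {"id": "doc3", "userId": "user123"},
]

logical = group_into_logical_partitions(items, "userId")

print(sorted(logical))                               # ['user123', 'user456']
print([item["id"] for item in logical["user123"]])   # ['doc1', 'doc3']
```

Three items produce two logical partitions because only two distinct `userId` values occur.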

Partition Key

A partition key is a property of your items whose value determines which logical partition each item belongs to. Choosing an effective partition key is crucial for performance and scalability: the key should have high cardinality (many unique values) and spread requests evenly across partitions.

Example:

In an e-commerce application, you might use userId or orderId as a partition key.


{
    "id": "doc1",
    "userId": "user123",
    "orderDate": "2024-07-26T10:00:00Z",
    "totalAmount": 99.99
}
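
To see why cardinality matters, the following sketch hashes partition key values onto a fixed number of physical partitions and counts how many partitions end up holding data. It makes several assumptions: the real hash function and partition count are internal to the service, MD5 is used here only as a deterministic stand-in, and the low-cardinality `status` property is hypothetical (it does not appear in the item above).

```python
import hashlib
from collections import Counter


def physical_partition(key_value: str, partition_count: int) -> int:
    """Map a partition key value to a partition index (stand-in hash, not the service's)."""
    digest = hashlib.md5(key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def spread(key_values, partition_count):
    """Count how many items land on each physical partition."""
    return Counter(physical_partition(v, partition_count) for v in key_values)


PARTITIONS = 8  # assumed number of physical partitions
STATUSES = ["pending", "shipped", "delivered"]

# High cardinality: 1000 distinct userId values.
by_user = spread((f"user{i}" for i in range(1000)), PARTITIONS)

# Low cardinality: 1000 items, but only three distinct status values.
by_status = spread((STATUSES[i % 3] for i in range(1000)), PARTITIONS)

print("partitions used with userId:", len(by_user))
print("partitions used with status:", len(by_status))  # never more than 3
```

With only three distinct values, the low-cardinality key can never occupy more than three partitions, however many are available, so both storage and request load concentrate there.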

Throughput

Throughput in Azure Cosmos DB is provisioned in Request Units per second (RU/s). You can provision throughput at the container level or at the database level, where it is shared among the database's containers. Provisioned throughput gives you predictable performance.
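
When requests consume more than the provisioned rate, the service rate-limits the excess (HTTP status 429). The sketch below models that with a simple per-second budget. This is a deliberate simplification rather than the service's actual accounting, and `ProvisionedThroughput` and `try_consume` are hypothetical names.

```python
class ProvisionedThroughput:
    """Toy per-second RU budget; not the service's real rate-limiting algorithm."""

    def __init__(self, rus_per_second: float):
        self.rus_per_second = rus_per_second
        self.current_second = None
        self.consumed = 0.0

    def try_consume(self, now: float, charge: float) -> bool:
        """Return True if the request fits in this second's budget, else False (a 429)."""
        second = int(now)
        if second != self.current_second:
            # A new one-second window starts with a fresh budget.
            self.current_second = second
            self.consumed = 0.0
        if self.consumed + charge > self.rus_per_second:
            return False
        self.consumed += charge
        return True


budget = ProvisionedThroughput(rus_per_second=400)

# Fifty requests of 10 RU each arrive within the same second.
results = [budget.try_consume(now=0.5, charge=10) for _ in range(50)]
accepted = sum(results)
print(accepted, "accepted,", len(results) - accepted, "rate-limited")

# In the next second the budget is available again.
accepted_next_second = budget.try_consume(now=1.2, charge=10)
print(accepted_next_second)
```

Here 400 RU/s covers forty 10-RU requests per second; the remaining ten in that second would need to be retried.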

Request Units (RUs)

A Request Unit (RU) is a normalized measure of the system resources (CPU, memory, and IOPS) required to perform a database operation. As a baseline, a point read of a 1 KB item by its ID and partition key costs 1 RU; writes and complex queries cost more.
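
A back-of-the-envelope estimate of the throughput a workload needs multiplies each operation's rate by its RU cost and sums the results. In the sketch below, only the 1 RU point read reflects a documented baseline; the write and query costs, the workload figures, and the function name `required_rus_per_second` are illustrative assumptions. In practice, you would measure the request charge that the service reports with each response.

```python
def required_rus_per_second(workload, ru_cost):
    """Sum of (operations per second x RU cost per operation) over all operation types."""
    return sum(ops_per_second * ru_cost[op] for op, ops_per_second in workload.items())


# Assumed RU cost per operation; real costs depend on item size, indexing, and query shape.
RU_COST = {"point_read": 1.0, "write": 5.5, "query": 10.0}

# Assumed workload, in operations per second.
WORKLOAD = {"point_read": 500, "write": 100, "query": 20}

total = required_rus_per_second(WORKLOAD, RU_COST)
print(total)  # 500*1.0 + 100*5.5 + 20*10.0 = 1250.0
```

An estimate like this is a starting point for provisioning; measured charges from a representative workload should replace the assumed costs.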

Tip: Understanding RU consumption is key to cost optimization and performance tuning.

Consistency Levels

Azure Cosmos DB offers five distinct consistency levels, which trade off consistency against availability and latency. From strongest to weakest, they are strong, bounded staleness, session, consistent prefix, and eventual.
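
One way to keep the five levels straight is to write them down as an ordered enum, strongest last. The `ConsistencyLevel` enum and the helper below are hypothetical names for illustration, not the SDK's types, and the numeric values only encode relative strength.

```python
from enum import IntEnum


class ConsistencyLevel(IntEnum):
    """The five levels, numbered by relative strength (higher = stronger)."""
    EVENTUAL = 1
    CONSISTENT_PREFIX = 2
    SESSION = 3
    BOUNDED_STALENESS = 4
    STRONG = 5


def strongest_first():
    """Return the levels ordered from strongest to weakest."""
    return sorted(ConsistencyLevel, reverse=True)


names = [level.name for level in strongest_first()]
print(names)
# ['STRONG', 'BOUNDED_STALENESS', 'SESSION', 'CONSISTENT_PREFIX', 'EVENTUAL']

# Stronger levels give stricter guarantees, generally at the cost of latency or availability.
print(ConsistencyLevel.SESSION < ConsistencyLevel.STRONG)  # True
```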

Data Modeling

Azure Cosmos DB supports a variety of data models, including document, key-value, graph, and column-family.

The schema-agnostic nature of containers allows for flexible and evolving data structures.

API Support

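Azure Cosmos DB supports multiple APIs, so you can use your existing skills and tools. These include the API for NoSQL (formerly the SQL API) and the APIs for MongoDB, Apache Cassandra, Apache Gremlin, and Table.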

Conclusion

Azure Cosmos DB is a powerful and versatile database service for modern cloud-native applications. By understanding its core concepts like accounts, databases, containers, items, partitions, RUs, and consistency levels, you can effectively design, build, and scale your applications globally.
