Azure Cosmos DB Reference

Overview

Azure Cosmos DB is a fully managed, globally distributed, multi-model NoSQL database service. It lets you create and query document, key-value, graph, and column-family databases, all of which benefit from global distribution and horizontal scaling. The service offers low latency, high availability, and elastic scalability, each backed by an SLA.

This reference guide provides detailed information about the APIs, SDKs, concepts, and best practices for working with Azure Cosmos DB.

Architecture

Cosmos DB's architecture is designed for horizontal scalability and global distribution. It uses a write-optimized, log-structured storage engine for efficient writes and reads. Data is replicated across the regions associated with the account, which keeps it highly available and close to users worldwide.

Key components include:

  • Global Distribution: Replicate data to any Azure region.
  • Multi-Model Support: Supports Document, Key-Value, Graph, and Column-Family data models.
  • Elastic Scalability: Scale throughput and storage independently.
  • Guaranteed SLAs: Provides industry-leading Service Level Agreements for availability, throughput, latency, and consistency.

Key Concepts

Accounts

An Azure Cosmos DB account is the top-level resource and the unit of global distribution: the regions you add to an account apply to all of the data it holds. Each account can contain multiple databases.

Databases

A database in Cosmos DB is a logical namespace for a set of containers. It's a unit of management and billing.

Containers

A container is the fundamental unit of scalability and throughput in Cosmos DB. It's where your data is stored. Containers can store a schema-agnostic set of items (documents, key-value pairs, graphs, etc.). You can create a container with a specific partition key to distribute your data efficiently.

Examples of container types include:

  • Collection: For document data (e.g., using the Core (SQL) API).
  • Table: For key-value data (e.g., using the Table API).
  • Graph: For graph data (e.g., using the Gremlin API).
  • Table: For column-family data (e.g., using the Cassandra API).

Items

An item is the basic unit of data storage within a container. The format of an item depends on the API being used. For the Core (SQL) API, an item is typically a JSON document.

Example JSON document:

{
    "id": "example-item-123",
    "productName": "Azure Cosmos DB T-Shirt",
    "category": "Apparel",
    "price": 19.99,
    "tags": ["azure", "cosmosdb", "apparel"]
}

Partitions

To achieve horizontal scaling, Cosmos DB partitions the data in a container. A logical partition is the set of items that share the same partition key value, and Cosmos DB places one or more logical partitions on each physical partition. A well-chosen partition key is crucial for performance and scalability.

The partition key is a property within your items that Cosmos DB uses to distribute data across multiple physical partitions.
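The effect of the partition key choice can be illustrated with a small simulation. This is only a sketch: the hashing that Cosmos DB applies internally is not reproduced here, so SHA-256 with a modulo stands in for it, and the item shape, the `userId` property, and the partition count of 4 are assumptions made for the example.

```python
import hashlib
from collections import Counter


def assign_partition(partition_key_value: str, partition_count: int) -> int:
    """Map a partition key value to a partition index with a stable hash.

    Stand-in for the service's internal hashing (an assumption).
    """
    digest = hashlib.sha256(partition_key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def partition_load(items, key_property: str, partition_count: int) -> Counter:
    """Count how many items land on each partition index."""
    counts = Counter()
    for item in items:
        index = assign_partition(str(item[key_property]), partition_count)
        counts[index] += 1
    return counts


# Hypothetical data set: 1000 items, 90% of them in one category.
items = [
    {
        "id": str(i),
        "category": "Apparel" if i % 10 else "Books",
        "userId": f"user-{i}",
    }
    for i in range(1000)
]

# Low-cardinality key: at most two partitions ever receive data.
by_category = partition_load(items, "category", 4)
# High-cardinality key: items spread across all partitions.
by_user = partition_load(items, "userId", 4)

print("by category:", dict(by_category))
print("by userId:  ", dict(by_user))
```

With only two distinct `category` values, most of the data piles up on one partition, whereas the high-cardinality `userId` spreads items far more evenly.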

Request Units (RUs)

Request Units (RUs) are the measure of throughput in Azure Cosmos DB. They abstract the underlying database resources like CPU, memory, and IOPS consumed by database operations. All operations (reads, writes, queries) consume RUs, and you provision a certain amount of RUs per second (RU/s) for your containers.

A point read (a lookup by ID and partition key) of a 1 KB item consumes 1 RU, while writes and complex queries consume more.
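The arithmetic behind a throughput estimate can be sketched as follows. The per-operation costs are assumptions for illustration (about 1 RU for a small read, about 5 RUs for a small write); real costs depend on item size, indexing, and query shape, and should be measured. The 400 RU/s floor, the 100 RU/s step, and the 20% headroom are likewise assumptions of this sketch.

```python
import math

# Assumed per-operation costs in RUs; measure real workloads instead.
POINT_READ_RU = 1.0
WRITE_RU = 5.0

# Assumed provisioning granularity for manually provisioned throughput.
MIN_RU_PER_SEC = 400
RU_STEP = 100


def required_throughput(
    reads_per_sec: float,
    writes_per_sec: float,
    queries_per_sec: float = 0.0,
    avg_query_ru: float = 0.0,
    headroom: float = 0.2,
) -> int:
    """Estimate the RU/s to provision for a steady workload."""
    raw = (
        reads_per_sec * POINT_READ_RU
        + writes_per_sec * WRITE_RU
        + queries_per_sec * avg_query_ru
    )
    # Leave spare capacity for bursts, then round up to the next step.
    padded = raw * (1.0 + headroom)
    stepped = math.ceil(padded / RU_STEP) * RU_STEP
    return max(MIN_RU_PER_SEC, int(stepped))


# Hypothetical workload: 500 reads/s, 100 writes/s, 20 queries/s at 10 RUs each.
print(required_throughput(500, 100, queries_per_sec=20, avg_query_ru=10))
```

An estimate like this is only a starting point; the RU charge that the service reports for each request is the reliable source for tuning.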

API Reference

REST API

Azure Cosmos DB provides a comprehensive RESTful API for interacting with your data. You can use this API directly or through the various SDKs.

The REST API allows you to perform operations such as:

  • Creating, reading, updating, and deleting databases and containers (accounts themselves are managed through Azure Resource Manager).
  • Performing CRUD (Create, Read, Update, Delete) operations on items.
  • Executing queries against your data.
  • Managing indexing policies and throughput.

Common Endpoints:

GET /dbs - List databases

POST /dbs/{db_id}/colls - Create a new container

GET /dbs/{db_id}/colls/{coll_id}/docs - List items in a container

POST /dbs/{db_id}/colls/{coll_id}/docs - Create a new item
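Calling these endpoints directly requires a signed Authorization header. The sketch below shows how a primary-key token might be built with the Python standard library only. The helper names are hypothetical, the key is a placeholder rather than a real credential, and the `x-ms-version` value is an assumption that should be checked against the current REST documentation.

```python
import base64
import hashlib
import hmac
import urllib.parse
from email.utils import formatdate


def build_auth_token(verb, resource_type, resource_link, date, master_key):
    """Build a URL-encoded primary-key authorization token.

    The signed payload is the lowercased verb, resource type, and date,
    plus the resource link as given, each followed by a newline.
    """
    key = base64.b64decode(master_key)
    payload = (
        f"{verb.lower()}\n{resource_type.lower()}\n"
        f"{resource_link}\n{date.lower()}\n\n"
    )
    digest = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return urllib.parse.quote(f"type=master&ver=1.0&sig={signature}", safe="")


def build_headers(verb, resource_type, resource_link, master_key):
    """Assemble the headers for one request (hypothetical helper)."""
    date = formatdate(usegmt=True)  # RFC 1123 date in GMT
    return {
        "Authorization": build_auth_token(
            verb, resource_type, resource_link, date, master_key
        ),
        "x-ms-date": date,
        "x-ms-version": "2018-12-31",  # assumed API version
    }


# Placeholder key for illustration only; never hard-code a real key.
fake_key = base64.b64encode(b"not-a-real-key").decode("ascii")

# POST /dbs/{db_id}/colls/{coll_id}/docs: the resource type is "docs" and
# the resource link is the parent container.
headers = build_headers("POST", "docs", "dbs/my-db/colls/my-coll", fake_key)
print(headers["Authorization"][:40] + "...")
```

For `GET /dbs`, the resource type is `dbs` and the resource link is an empty string. In most applications the SDKs listed below handle this signing for you.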

SDK Reference

Azure Cosmos DB offers SDKs for various popular programming languages to simplify development:

Language   Package Name / Repository   Links
.NET       Microsoft.Azure.Cosmos      NuGet | GitHub
Java       com.azure:azure-cosmos      Maven | GitHub
Node.js    @azure/cosmos               NPM | GitHub
Python     azure-cosmos                PyPI | GitHub
Go         sdk/data/azcosmos           Go Module | GitHub

Best Practices

To maximize performance, scalability, and cost-effectiveness, follow these best practices:

  • Choose the right partition key: Select a high-cardinality property that evenly distributes requests and storage across partitions. Avoid hot partitions.
  • Request Unit (RU) optimization: Monitor RU consumption and adjust provisioned throughput. Use autoscale where appropriate. Design efficient queries to minimize RU usage.
  • Indexing: Understand the indexing policy and tailor it to your workload. Remove unnecessary paths from indexing to save on RU costs.
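  • Connection management: Create a single client instance and reuse it for the lifetime of the application, so that connections are pooled rather than reopened for each request.
  • Consistency levels: Choose the consistency level (strong, bounded staleness, session, consistent prefix, or eventual) that best suits your application's needs for data freshness and performance.
  • Batch operations: To group several small operations on the same logical partition, consider transactional batch or stored procedures. Both execute as a single transaction scoped to one partition key value.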
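As an illustration of the indexing advice above, the sketch below builds an indexing policy that indexes everything except a list of excluded paths. The helper name and the excluded property names (`description`, `rawPayload`) are hypothetical, and the policy is only constructed and printed here, not applied to a container.

```python
import json


def build_indexing_policy(excluded_paths):
    """Return an opt-out indexing policy: index all paths except the given ones.

    Paths are expected to end in "/?" (a single scalar value) or "/*"
    (everything beneath the path).
    """
    for path in excluded_paths:
        if not path.startswith("/") or not path.endswith(("/?", "/*")):
            raise ValueError(f"unexpected indexing path: {path!r}")
    return {
        "indexingMode": "consistent",
        "automatic": True,
        "includedPaths": [{"path": "/*"}],
        "excludedPaths": [{"path": path} for path in excluded_paths],
    }


# Hypothetical properties that are stored but never filtered or sorted on.
policy = build_indexing_policy(["/description/?", "/rawPayload/*"])
print(json.dumps(policy, indent=2))
```

Excluding paths that queries never touch reduces the RU cost of writes, at the price of more expensive queries if those paths are filtered on later.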