Azure Cosmos DB: Comprehensive Documentation

Introduction to Azure Cosmos DB

Azure Cosmos DB is a globally distributed, multi-model database service for building modern cloud applications. It offers comprehensive SLAs covering availability, latency, throughput, and consistency, with availability of up to 99.999% for accounts that span multiple regions.

Cosmos DB supports various data models and APIs, making it a versatile choice for a wide range of applications, from web and mobile to IoT and gaming. Its key features include:

Getting Started with Cosmos DB

1. Create an Azure Cosmos DB Account

You can create an account via the Azure portal, Azure CLI, PowerShell, or Azure Resource Manager (ARM) templates. An account is the top-level resource for Cosmos DB.

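# Example using Azure CLI (eastus and westus are placeholder regions)
az cosmosdb create --name mycosmosdbaccount --resource-group myResourceGroup \
    --locations regionName=eastus failoverPriority=0 \
    --locations regionName=westus failoverPriority=1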

2. Create a Database

Within your Cosmos DB account, you can create one or more databases. Databases logically group containers.

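// Example using the Node.js SDK (@azure/cosmos)
const { CosmosClient } = require("@azure/cosmos");
const client = new CosmosClient("YOUR_COSMOS_DB_CONNECTION_STRING");
// In a CommonJS module, await is only valid inside an async function
async function main() {
    await client.databases.create({ id: "myDatabase" });
}
main().catch(console.error);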

3. Create a Container

Containers are the fundamental units of data storage and throughput in Cosmos DB. They can hold a collection of items (documents, rows, nodes, etc.). You'll need to specify a partition key for your container.

A well-chosen partition key is crucial for performance and scalability.
# Example using the Python SDK (azure-cosmos)
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient.from_connection_string("YOUR_COSMOS_DB_CONNECTION_STRING")
database = client.create_database_if_not_exists(id="myDatabase")
container = database.create_container_if_not_exists(
    id="myContainer",
    partition_key=PartitionKey(path="/myPartitionKey"),
)

Data Modeling in Cosmos DB

Cosmos DB is schema-agnostic. You can store data in JSON, or other formats depending on the API you use. Effective data modeling is key to leveraging Cosmos DB's capabilities.

Partitioning Strategies

Partitioning divides your data into smaller, manageable chunks based on a partition key. This enables horizontal scaling of storage and throughput.
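
The effect of the partition key choice can be illustrated with a small simulation. This is only a sketch: MD5 stands in for the service's internal hash function, the four buckets stand in for physical partitions, and the sample items and the property names userId and status are hypothetical.

```python
import hashlib
from collections import Counter


def assign_partition(partition_key_value: str, partition_count: int) -> int:
    """Map a partition key value to a bucket with a stable hash.

    MD5 is only a stand-in; Cosmos DB uses its own internal hash function.
    """
    digest = hashlib.md5(partition_key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def distribution(items, key_path: str, partition_count: int) -> Counter:
    """Count how many items land in each bucket for a given partition key path."""
    prop = key_path.lstrip("/")
    return Counter(
        assign_partition(str(item[prop]), partition_count) for item in items
    )


# Hypothetical sample data: 10,000 orders, each with a unique user
# but only two possible status values.
orders = [
    {"userId": f"user-{i}", "status": "open" if i % 2 else "closed"}
    for i in range(10_000)
]

by_user = distribution(orders, "/userId", partition_count=4)
by_status = distribution(orders, "/status", partition_count=4)

print("userId:", sorted(by_user.items()))
print("status:", sorted(by_status.items()))
```

Under these assumptions the /userId counts should come out roughly even, while /status can occupy at most two of the four buckets, which is the kind of skew a low-cardinality key tends to produce.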

Indexing Policies

By default, Cosmos DB automatically indexes all data written to a container. You can customize the indexing policy to suit specific query patterns by including or excluding property paths and by setting the indexing mode (consistent or none; a legacy lazy mode also exists but is not recommended for new containers).

// Example indexing policy for SQL API
{
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
        { "path": "/*" }
    ],
    "excludedPaths": [
        { "path": "/content/sensitiveData/*" }
    ]
}
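
To see how included and excluded paths interact, the policy above can be evaluated with a simplified matcher. This is a sketch rather than the service's actual algorithm: it assumes that the longest matching pattern wins and handles only a trailing /* wildcard. The function names are hypothetical.

```python
def _matches(pattern: str, path: str) -> bool:
    """True if `path` falls under `pattern`; only a trailing /* is handled."""
    if pattern.endswith("/*"):
        prefix = pattern[:-1]  # keep the slash, e.g. "/content/sensitiveData/"
        return path.startswith(prefix)
    return path == pattern


def is_indexed(policy: dict, path: str) -> bool:
    """Decide whether a property path is indexed (longest matching pattern wins)."""
    if policy.get("indexingMode") == "none":
        return False
    best_len, indexed = -1, False
    for included, key in ((True, "includedPaths"), (False, "excludedPaths")):
        for entry in policy.get(key, []):
            pattern = entry["path"]
            if _matches(pattern, path) and len(pattern) > best_len:
                best_len, indexed = len(pattern), included
    return indexed


policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": "/content/sensitiveData/*"}],
}

for candidate in ("/title", "/content/body", "/content/sensitiveData/ssn"):
    print(candidate, is_indexed(policy, candidate))
```

Excluding paths that are never queried reduces the index size and the RU cost of writes, at the price of less efficient queries on those paths.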

Supported APIs

Cosmos DB offers multiple APIs, allowing you to use the programming model you're most comfortable with.

SQL (Core) API

The native API for Cosmos DB, now named API for NoSQL. It stores JSON documents and offers rich query capabilities using familiar SQL syntax.
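
As an illustration of the query syntax, the sketch below builds a parameterized query specification: the SQL text plus a list of named parameters. The property names category, price, and name and the helper build_query are hypothetical. The commented call at the end shows roughly how the pieces would be handed to the azure-cosmos Python SDK, assuming a container object already exists.

```python
import json


def build_query(category: str, min_price: float) -> dict:
    """Return a parameterized query spec: SQL text plus named parameters."""
    return {
        "query": (
            "SELECT c.id, c.name FROM c "
            "WHERE c.category = @category AND c.price >= @minPrice"
        ),
        "parameters": [
            {"name": "@category", "value": category},
            {"name": "@minPrice", "value": min_price},
        ],
    }


spec = build_query("books", 10.0)
print(json.dumps(spec, indent=2))

# With the azure-cosmos SDK, the same pieces would be passed roughly as:
# container.query_items(
#     query=spec["query"],
#     parameters=spec["parameters"],
#     enable_cross_partition_query=True,
# )
```

Passing values as parameters rather than concatenating them into the SQL string avoids injection problems and keeps the query text stable across calls.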

MongoDB API

Compatible with the MongoDB wire protocol. Allows you to use existing MongoDB drivers and tools with Cosmos DB.

Cassandra API

High-throughput, low-latency access using the Cassandra Query Language (CQL).

Gremlin API

For graph data, using the Apache TinkerPop Gremlin query language.

Table API

A key/value store compatible with Azure Table Storage.

Performance and Scalability

Throughput (Request Units per second - RU/s)

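Throughput is provisioned in Request Units per second (RU/s). A Request Unit (RU) is a normalized measure of the CPU, memory, and I/O that a database operation consumes; for example, a point read of a 1 KB item costs 1 RU. You can provision throughput at the container or database level.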
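
A rough sizing calculation can make the RU/s figure concrete. The sketch below multiplies each operation's rate by its RU charge, adds headroom, and rounds up to a provisioning increment. The write and query charges, the 20% headroom, and the 100 RU/s increment are assumptions for illustration; real charges should be measured from the request charge returned by the service.

```python
import math
from dataclasses import dataclass


@dataclass
class Operation:
    name: str
    per_second: float  # expected request rate
    ru_cost: float     # RU charge per request (measure this for a real workload)


def required_rus(operations, headroom: float = 0.2, increment: int = 100) -> int:
    """Sum rate * cost over all operations, add headroom, round up to the increment."""
    raw = sum(op.per_second * op.ru_cost for op in operations)
    padded = raw * (1 + headroom)
    return int(math.ceil(padded / increment) * increment)


workload = [
    Operation("point read (1 KB)", per_second=500, ru_cost=1.0),
    Operation("write (1 KB)", per_second=100, ru_cost=5.5),  # assumed charge
    Operation("query", per_second=20, ru_cost=12.0),         # assumed charge
]

print(required_rus(workload))
```

Here the raw demand is 500 + 550 + 240 = 1,290 RU/s, so with 20% headroom the estimate rounds up to 1,600 RU/s.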

Autoscale provisioned throughput automatically scales RU/s with your workload, between 10% of the maximum you configure and that maximum.

Consistency Levels

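Cosmos DB provides five well-defined consistency levels, offering tunable trade-offs between consistency, availability, and latency. From strongest to weakest they are: strong, bounded staleness, session (the default), consistent prefix, and eventual.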
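
The ordering of the levels can be modeled with a small enum. The sketch below numbers the five levels (strong, bounded staleness, session, consistent prefix, eventual) from strongest to weakest and encodes one rule as an assumption: a request may keep or relax the account's default level but may not strengthen it. The names ConsistencyLevel and can_override are hypothetical and are not SDK types.

```python
from enum import IntEnum


class ConsistencyLevel(IntEnum):
    """The five levels, numbered from strongest (0) to weakest (4)."""
    STRONG = 0
    BOUNDED_STALENESS = 1
    SESSION = 2
    CONSISTENT_PREFIX = 3
    EVENTUAL = 4


def can_override(account_default: ConsistencyLevel,
                 requested: ConsistencyLevel) -> bool:
    """Assumed rule: a request may keep or relax the default, not strengthen it."""
    return requested >= account_default


default = ConsistencyLevel.SESSION
for level in ConsistencyLevel:
    print(level.name, can_override(default, level))
```

Weaker levels generally give lower latency and higher availability, so relaxing the level for individual read-heavy requests is a common way to tune the trade-off.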

Scaling Options

Cosmos DB offers both manual and autoscale provisioned throughput. For storage, partitions scale automatically as your data grows.

Feature                Description
---------------------  ------------------------------------------------------------
Manual Throughput      Fixed RU/s provisioned for a container or database.
Autoscale Throughput   Automatically scales RU/s up and down based on workload, up to a configured maximum.
Storage Scaling        Automatic horizontal scaling of physical partitions.
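
The difference between manual and autoscale throughput can be sketched numerically. The example assumes that autoscale moves between 10% of the configured maximum and the maximum, and that each hour is charged at the highest RU/s reached in that hour. The hourly peak figures and function names are hypothetical.

```python
def autoscale_range(max_rus: int) -> tuple:
    """Autoscale moves between 10% of the configured maximum and the maximum."""
    return max_rus // 10, max_rus


def billed_rus(max_rus: int, hourly_peaks) -> list:
    """Clamp each hour's peak demand into the autoscale range."""
    low, high = autoscale_range(max_rus)
    return [min(max(peak, low), high) for peak in hourly_peaks]


# Hypothetical peak demand (RU/s) for three consecutive hours.
peaks = [100, 1500, 9000]

autoscale = billed_rus(4000, peaks)
manual = [4000] * len(peaks)  # manual throughput stays fixed regardless of demand

print("autoscale:", autoscale)
print("manual:   ", manual)
```

In the quiet first hour autoscale drops to its 400 RU/s floor, while in the third hour demand above the 4,000 RU/s maximum would be rate limited under either model.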

Security Features

Cosmos DB offers robust security features:

Monitoring and Diagnostics

Monitor your Cosmos DB resources with Azure Monitor metrics and diagnostic logs. Set up alerts for performance and availability issues, such as sustained rate limiting (HTTP 429 responses) or elevated latency.
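
One signal worth alerting on is the share of requests that are rate limited, which Cosmos DB reports with HTTP status 429. The sketch below computes that share from a list of status codes; the 5% threshold and the function names are assumptions for illustration, and in practice the figures would come from Azure Monitor rather than an in-memory list.

```python
def throttled_ratio(status_codes) -> float:
    """Fraction of requests that were rate limited (HTTP 429)."""
    codes = list(status_codes)
    if not codes:
        return 0.0
    return sum(1 for code in codes if code == 429) / len(codes)


def should_alert(status_codes, threshold: float = 0.05) -> bool:
    """Flag the window when the throttled share exceeds the (assumed) threshold."""
    return throttled_ratio(status_codes) > threshold


# Hypothetical sample window of response status codes.
window = [200, 200, 429, 201]
print(throttled_ratio(window), should_alert(window))
```

A small share of 429 responses is usually absorbed by SDK retries, so the threshold is best tuned against observed latency rather than set to zero.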

SDKs and Tools

Azure Cosmos DB provides official SDKs for popular programming languages, including .NET, Java, Python, and JavaScript (Node.js), along with tools like Data Explorer in the Azure portal and the Cosmos DB Data Migration Tool.
