Azure Cosmos DB: Comprehensive Documentation
Introduction to Azure Cosmos DB
Azure Cosmos DB is a globally distributed, multi-model database service. It offers comprehensive SLAs covering availability, latency, throughput, and consistency, with an availability guarantee of up to 99.999% for accounts that span multiple regions.
Cosmos DB supports various data models and APIs, making it a versatile choice for a wide range of applications, from web and mobile to IoT and gaming. Its key features include:
- Global distribution
- Multi-model capabilities
- Guaranteed low latency
- Elastic scalability
- Comprehensive SLAs
Getting Started with Cosmos DB
1. Create an Azure Cosmos DB Account
You can create an account via the Azure portal, Azure CLI, PowerShell, or Azure Resource Manager (ARM) templates. An account is the top-level resource for Cosmos DB.
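# Example using Azure CLI (each --locations entry takes regionName and failoverPriority keys)
az cosmosdb create --name mycosmosdbaccount --resource-group myResourceGroup \
  --locations regionName=region1 failoverPriority=0 \
  --locations regionName=region2 failoverPriority=1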
See detailed account creation guide
2. Create a Database
Within your Cosmos DB account, you can create one or more databases. Databases logically group containers.
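// Example using Node.js SDK
const { CosmosClient } = require("@azure/cosmos");
const client = new CosmosClient("YOUR_COSMOS_DB_CONNECTION_STRING");
// Top-level await is unavailable in CommonJS, so wrap the call in an async function
async function main() {
  await client.databases.create({ id: "myDatabase" });
}
main().catch(console.error);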
Learn more about database creation
3. Create a Container
Containers are the fundamental units of data storage and throughput in Cosmos DB. They can hold a collection of items (documents, rows, nodes, etc.). You'll need to specify a partition key for your container.
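# Example using Python SDK (azure-cosmos package)
from azure.cosmos import CosmosClient, PartitionKey

cosmos_client = CosmosClient.from_connection_string("YOUR_COSMOS_DB_CONNECTION_STRING")
# The *_if_not_exists variants return the existing resource instead of raising a conflict error
database = cosmos_client.create_database_if_not_exists(id="myDatabase")
container = database.create_container_if_not_exists(
    id="myContainer",
    partition_key=PartitionKey(path="/myPartitionKey"),
)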
Container and Partition Key best practices
Data Modeling in Cosmos DB
Cosmos DB is schema-agnostic. Data is stored as JSON documents or in other formats, depending on the API you use. Effective data modeling is key to leveraging Cosmos DB's capabilities.
Partitioning Strategies
Partitioning divides your data into smaller, manageable chunks based on a partition key. This enables horizontal scaling of storage and throughput.
- Logical Partitioning: Items that share the same partition key value within a container form one logical partition.
- Physical Partitioning: Cosmos DB maps logical partitions onto physical partitions and manages their distribution across storage nodes.
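The relationship between logical and physical partitions can be illustrated with a small sketch. The hash function, the modulo placement, and the names `logical_partition`, `physical_partition`, and `distribute` are assumptions made for illustration; they are not the algorithm Cosmos DB uses internally, which manages placement for you.

```python
import hashlib
from collections import defaultdict


def logical_partition(key_value: str) -> int:
    """Hash a partition key value to a stable integer (illustrative hash only)."""
    digest = hashlib.sha256(key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def physical_partition(key_value: str, physical_count: int) -> int:
    """Map a logical partition onto one of `physical_count` physical partitions."""
    return logical_partition(key_value) % physical_count


def distribute(items, key_property: str, physical_count: int) -> dict:
    """Group item ids by the physical partition their key value maps to."""
    placement = defaultdict(list)
    for item in items:
        target = physical_partition(item[key_property], physical_count)
        placement[target].append(item["id"])
    return dict(placement)


# Hypothetical items partitioned on a "userId" property
items = [
    {"id": "1", "userId": "alice"},
    {"id": "2", "userId": "bob"},
    {"id": "3", "userId": "alice"},
]
placement = distribute(items, "userId", physical_count=4)
```

Items "1" and "3" share a partition key value, so they always land together. This is why a key with many distinct, evenly accessed values tends to spread storage and throughput more evenly.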
Indexing Policies
Cosmos DB automatically indexes all data written to a container. You can customize the indexing policy to optimize for specific query patterns by including or excluding property paths and by setting the indexing mode (consistent or none; lazy is a legacy mode that is not recommended for new containers).
// Example indexing policy for SQL API
{
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [
{ "path": "/*" }
],
"excludedPaths": [
{ "path": "/content/sensitiveData/*" }
]
}
Customizing indexing policies
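To make the interplay of included and excluded paths concrete, the following sketch checks whether a property path would be indexed under the example policy above. It assumes that the most specific matching path wins and handles only the `/*` wildcard; the helper `is_indexed` is hypothetical and simplifies the service's actual path matching.

```python
policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": "/content/sensitiveData/*"}],
}


def _prefix(pattern: str) -> str:
    # "/content/sensitiveData/*" -> "/content/sensitiveData/", "/*" -> "/"
    return pattern[:-1] if pattern.endswith("*") else pattern


def is_indexed(policy: dict, property_path: str) -> bool:
    """Return True if the longest matching path in the policy is an included path."""
    if policy.get("indexingMode") == "none":
        return False
    best_len, indexed = -1, False
    for included, key in ((True, "includedPaths"), (False, "excludedPaths")):
        for entry in policy.get(key, []):
            prefix = _prefix(entry["path"])
            if property_path.startswith(prefix) and len(prefix) > best_len:
                best_len, indexed = len(prefix), included
    return indexed


title_indexed = is_indexed(policy, "/title")
secret_indexed = is_indexed(policy, "/content/sensitiveData/ssn")
```

Excluding paths that are never queried reduces the work done on each write, at the cost of slower queries if those paths are filtered on later.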
Supported APIs
Cosmos DB offers multiple APIs, allowing you to use the programming model you're most comfortable with.
SQL (Core) API
The native API for Cosmos DB, offering rich query capabilities using familiar SQL syntax. Supports JSON documents.
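As a minimal sketch of what such a query looks like from client code, the snippet below builds a parameterized query. The property names `category` and `price` and the helper `build_query` are hypothetical; the commented call shows how the pieces could be passed to the azure-cosmos Python SDK's `query_items` method.

```python
def build_query(category: str, min_price: float) -> dict:
    """Build a parameterized query so values are never concatenated into the SQL text."""
    return {
        "query": (
            "SELECT c.id, c.name FROM c "
            "WHERE c.category = @category AND c.price >= @minPrice"
        ),
        "parameters": [
            {"name": "@category", "value": category},
            {"name": "@minPrice", "value": min_price},
        ],
    }


spec = build_query("books", 10.0)

# With a container client from the azure-cosmos package, this could be run as:
# items = container.query_items(query=spec["query"], parameters=spec["parameters"])
```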
SQL API Reference
MongoDB API
Compatible with the MongoDB wire protocol. Allows you to use existing MongoDB drivers and tools with Cosmos DB.
MongoDB API Guide
Cassandra API
High-throughput, low-latency access using the Cassandra Query Language (CQL).
Cassandra API Overview
Gremlin API
For graph data, using the Apache TinkerPop Gremlin query language.
Gremlin API Documentation
Table API
A key/value store compatible with Azure Table Storage.
Table API Details
Performance and Scalability
Throughput (Request Units per second - RU/s)
Throughput is provisioned in Request Units per second (RU/s). A Request Unit is a normalized measure of the resources an operation consumes; a point read of a 1 KB item costs 1 RU. You can provision throughput at the container or database level.
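A rough capacity estimate can be sketched as simple arithmetic. The function `estimate_rus` is hypothetical, and the default write cost and headroom are placeholder assumptions; real per-operation costs depend on item size, indexing, and query shape, and should be measured from the request charge the service reports.

```python
import math


def estimate_rus(
    reads_per_sec: float,
    writes_per_sec: float,
    read_cost_ru: float = 1.0,   # a 1 KB point read costs 1 RU
    write_cost_ru: float = 5.0,  # assumed placeholder; measure your own writes
    headroom: float = 0.25,      # assumed safety margin for bursts
) -> int:
    """Estimate RU/s to provision, rounded up to the next multiple of 100."""
    raw = reads_per_sec * read_cost_ru + writes_per_sec * write_cost_ru
    return math.ceil(raw * (1 + headroom) / 100) * 100


# 500 point reads/s and 100 writes/s under the assumptions above
needed = estimate_rus(reads_per_sec=500, writes_per_sec=100)
```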
Consistency Levels
Cosmos DB provides five well-defined consistency levels, offering tunable trade-offs between consistency, availability, and latency:
- Strong
- Bounded Staleness
- Session
- Consistent Prefix
- Eventual
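The ordering of the five levels can be modeled as a small enumeration. This is an illustrative sketch, not SDK code: the `Consistency` enum and the `can_override` helper are hypothetical names, and the helper encodes the rule that an individual request may keep or relax the account's default consistency but not strengthen it.

```python
from enum import IntEnum


class Consistency(IntEnum):
    """Consistency levels ordered from weakest (1) to strongest (5)."""
    EVENTUAL = 1
    CONSISTENT_PREFIX = 2
    SESSION = 3
    BOUNDED_STALENESS = 4
    STRONG = 5


def can_override(account_default: Consistency, requested: Consistency) -> bool:
    """A request may keep or weaken the account default, never strengthen it."""
    return requested <= account_default


relax_ok = can_override(Consistency.SESSION, Consistency.EVENTUAL)
strengthen_ok = can_override(Consistency.SESSION, Consistency.STRONG)
```

Weaker levels generally give lower latency and higher availability, while stronger levels give stricter guarantees about the order and freshness of reads.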
Scaling Options
Cosmos DB offers both manual and autoscale provisioned throughput. For storage, partitions scale automatically as your data grows.
Feature | Description |
---|---|
Manual Throughput | Fixed RU/s provisioned for a container or database. |
Autoscale Throughput | Automatically scales RU/s up and down based on workload, up to a configured maximum. |
Storage Scaling | Automatic horizontal scaling of physical partitions. |
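The autoscale row can be made concrete with a short sketch. It assumes the documented autoscale behavior of scaling between 10% of the configured maximum and the maximum itself; the function names `autoscale_range` and `scaled_throughput` are hypothetical.

```python
def autoscale_range(max_rus: int) -> tuple:
    """Return the (minimum, maximum) RU/s that autoscale moves between."""
    return max_rus // 10, max_rus


def scaled_throughput(max_rus: int, demand_rus: int) -> int:
    """Clamp the workload's demand into the autoscale range."""
    low, high = autoscale_range(max_rus)
    return min(max(demand_rus, low), high)


low, high = autoscale_range(4000)
idle = scaled_throughput(4000, demand_rus=100)
busy = scaled_throughput(4000, demand_rus=2500)
overloaded = scaled_throughput(4000, demand_rus=9000)
```

Demand above the configured maximum is not served by scaling further; under this model such requests would be rate limited, which is the signal to raise the maximum.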
Security Features
Cosmos DB offers robust security features:
- Microsoft Entra ID (formerly Azure Active Directory) integration
- Role-Based Access Control (RBAC)
- Primary/Secondary keys and resource tokens
- Network firewall and private endpoint support
- Data encryption at rest and in transit
Monitoring and Diagnostics
Monitor your Cosmos DB resources using Azure Monitor, logs, and metrics. Set up alerts for performance and availability issues.
Monitoring Cosmos DB with Azure Monitor
SDKs and Tools
Azure Cosmos DB provides official SDKs for popular programming languages, along with tools like the Cosmos DB Data Explorer and the Cosmos DB Data Migration Tool.
- SDKs: .NET, Java, Node.js, Python, Go, Spring Data
- Tools: Azure Portal, Azure CLI, Azure PowerShell, Cosmos DB Data Explorer, VS Code Extension