Introduction to Azure Cosmos DB

Azure Cosmos DB is Microsoft's globally distributed, multi-model database service. It lets you elastically and independently scale throughput and storage across any number of geographic regions. It also provides fast read and write access to data across the globe, with single-digit-millisecond read and write latencies at the 99th percentile, backed by a Service Level Agreement (SLA) of up to 99.999% availability.

Cosmos DB supports popular NoSQL APIs and query languages:

  • Core (SQL) API: For JSON and document data.
  • MongoDB API: For MongoDB applications.
  • Cassandra API: For Apache Cassandra applications.
  • Table API: For Azure Table storage applications.
  • Gremlin API: For graph data.

This flexibility means you can use the best API for your application's needs, without having to learn a new database or re-architect your existing application.

Getting Started

To get started with Azure Cosmos DB, follow these steps:

  1. Create an Azure Account: If you don't have one, sign up for a free Azure account.
  2. Create a Cosmos DB Account: Navigate to the Azure portal and create a new Azure Cosmos DB account.
  3. Choose an API: Select the API that best suits your application requirements.
  4. Create a Database and Container: Within your Cosmos DB account, create a database and then a container to store your data.
  5. Add Data: Use the Azure portal, SDKs, or one of the compatible APIs to insert your first items.

Pro Tip: For a quick hands-on experience, try the Azure Cosmos DB interactive tutorial in the Azure portal, or clone one of the sample applications from our GitHub repository.

Key Concepts

Databases, Containers, and Items

Cosmos DB has a hierarchical data model. A Cosmos DB account contains one or more databases. Each database can contain multiple containers. Containers are the core data-access and throughput-provisioning units. The data within a container is distributed across partitions according to the container's partition key. Items are the base-level abstractions of data stored in a container. For the Core (SQL) API, items are JSON documents. For other APIs, items can be rows, nodes, edges, or key-value pairs.

Request Units (RUs)

Request Units (RUs) are a normalized measure of the resources—CPU, memory, and IOPS—required to perform database operations. Cosmos DB provisions throughput in terms of RUs per second (RU/s). You can manually provision RUs or use autoscale. Understanding RUs is crucial for cost management and performance tuning.
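
As a rough sketch of the sizing arithmetic, the following Python estimates how many RU/s to provision for an expected request mix. The per-operation RU charges are inputs you would measure from your own workload. The figures below are illustrative assumptions, not published costs, and `estimate_ru_per_second` is a hypothetical helper rather than part of any SDK.

```python
import math


def estimate_ru_per_second(ops_per_second, ru_per_op, headroom=0.2):
    """Estimate the RU/s to provision for a mix of operations.

    ops_per_second and ru_per_op are dicts keyed by operation name.
    headroom adds a safety margin on top of the steady-state estimate.
    """
    steady = sum(ops_per_second[op] * ru_per_op[op] for op in ops_per_second)
    # Round up to the next multiple of 100 RU/s.
    return math.ceil(steady * (1 + headroom) / 100) * 100


# Assumed workload: requests per second and measured RU charge per request.
ops = {"point_read": 500, "write": 100, "query": 20}
charge = {"point_read": 1.0, "write": 6.0, "query": 15.0}

required = estimate_ru_per_second(ops, charge)
print(required)  # 1700
```

The steady-state load here is 500 + 600 + 300 = 1400 RU/s, and the 20% headroom brings the rounded figure to 1700 RU/s. With autoscale you would treat a number like this as the maximum rather than a fixed setting.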

Partitioning

Cosmos DB uses horizontal partitioning to scale databases. A partition key is a property that determines which partition an item belongs to. Choosing an effective partition key is essential for scalability, performance, and uniform data distribution.
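
To illustrate why the choice of partition key affects data distribution, here is a minimal sketch that hashes a key value to one of a fixed number of partitions and counts where items land. The SHA-256-modulo mapping, the partition count of 4, and the `userId` and `country` properties are all assumptions made for the example; the service's actual hashing scheme and partition management are different.

```python
import hashlib
from collections import Counter


def partition_for(key_value, partition_count):
    """Map a partition key value to a partition index by hashing it."""
    digest = hashlib.sha256(str(key_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def distribution(items, key_property, partition_count):
    """Count how many items land on each partition for a given key property."""
    return Counter(
        partition_for(item[key_property], partition_count) for item in items
    )


# Hypothetical data set: 1000 users, 90% of them in one country.
items = [
    {"id": str(i), "userId": f"user-{i}", "country": "US" if i % 10 else "CA"}
    for i in range(1000)
]

by_user = distribution(items, "userId", 4)      # high-cardinality key
by_country = distribution(items, "country", 4)  # low-cardinality key

print("userId: ", sorted(by_user.values()))
print("country:", sorted(by_country.values()))
```

A key with many distinct values (`userId`) spreads items across all partitions, while a key with only two values (`country`) can occupy at most two partitions and concentrates most of the data, and therefore most of the load, on one of them.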

Consistency Models

Cosmos DB offers five well-defined consistency levels, allowing you to balance consistency, availability, and latency:

  • Strong: All reads are guaranteed to return the most recent committed write.
  • Bounded Staleness: Reads are guaranteed to lag behind writes by no more than K versions of an item or a time interval T.
  • Session: Within a single client session, reads are guaranteed to reflect that session's own writes (read-your-writes).
  • Consistent Prefix: Reads might return older versions of data, but never out of order.
  • Eventual: Reads might return stale data, with no guarantee on ordering.
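
The difference between the strong, session, and eventual levels can be sketched with a toy model: one primary log plus a single replica that applies writes in order but lags behind. This is a deliberately simplified illustration, not how the service is implemented; `ToyStore` and its methods are hypothetical, and the integer returned by `write` only stands in for the idea of a session token.

```python
class ToyStore:
    """Toy model: a primary write log plus one replica that lags behind it."""

    def __init__(self):
        self.log = []        # committed writes, in commit order
        self.replicated = 0  # number of log entries the replica has applied

    def write(self, value):
        self.log.append(value)
        return len(self.log)  # stand-in for a session token

    def replicate(self, count=1):
        # The replica applies entries in order, so it always holds a prefix.
        self.replicated = min(len(self.log), self.replicated + count)

    def read_strong(self):
        # Always reflects the most recent committed write.
        return self.log[-1] if self.log else None

    def read_eventual(self):
        # Served from the replica, so the result may be stale.
        return self.log[self.replicated - 1] if self.replicated else None

    def read_session(self, token):
        # Use the replica only if it has caught up to this session's last
        # write; otherwise fall back to the primary.
        if self.replicated >= token:
            return self.read_eventual()
        return self.read_strong()


store = ToyStore()
store.write("v1")
store.replicate()
token = store.write("v2")  # the replica has not applied "v2" yet

print(store.read_strong())        # v2
print(store.read_eventual())      # v1
print(store.read_session(token))  # v2
```

The session read returns "v2" because the caller's own write has not reached the replica yet, whereas a client without that token could still see "v1" under eventual consistency.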

APIs

Cosmos DB is a multi-model database. It supports multiple data models and communication protocols through its API implementations. The primary API is the Core (SQL) API, which provides a powerful SQL query interface for JSON documents. Other supported APIs include MongoDB, Cassandra, Table, and Gremlin, allowing you to leverage your existing skills and codebases.

Data Modeling

Effective data modeling in Cosmos DB depends heavily on the chosen API and the access patterns of your application. For the Core (SQL) API, consider denormalization to optimize read performance. Understand how your partition key choice impacts query performance and scalability.
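
As a small illustration of denormalization, the sketch below takes customer and order records that a relational design would keep separate and embeds the orders in the customer document, so that one read returns everything. The record shapes and the `embed_orders` helper are hypothetical, chosen only for the example.

```python
import json

# Normalized records, as a relational design might store them.
customer = {"id": "c1", "name": "Ada", "city": "London"}
orders = [
    {"id": "o1", "customerId": "c1", "total": 25.0},
    {"id": "o2", "customerId": "c1", "total": 40.0},
    {"id": "o3", "customerId": "c2", "total": 10.0},
]


def embed_orders(customer, orders):
    """Build one denormalized document: the customer with its orders embedded."""
    own = [
        {"id": o["id"], "total": o["total"]}
        for o in orders
        if o["customerId"] == customer["id"]
    ]
    # Return a new dict so the original customer record is left unchanged.
    return {**customer, "orders": own}


document = embed_orders(customer, orders)
print(json.dumps(document, indent=2))
```

Embedding tends to suit related data that is read together and stays small in number. For relationships that can grow without bound, keeping separate items that reference each other is usually the safer choice.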

Security

Cosmos DB offers robust security features, including role-based access control (RBAC), network isolation with VNet integration and private endpoints, and automatic data encryption at rest and in transit.

Performance

Optimize performance by selecting an appropriate partition key, provisioning adequate throughput (RU/s), utilizing indexing policies, and designing your queries efficiently. Consider the trade-offs between consistency levels and performance.
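
An indexing policy is expressed as a JSON document attached to a container. The sketch below builds one that indexes everything except a few paths, which can reduce write cost for properties you never filter or sort on. The `build_indexing_policy` helper and the `description` and `rawPayload` property names are hypothetical; check the policy fields against the current documentation before applying one.

```python
import json


def build_indexing_policy(excluded_paths):
    """Return a policy that indexes all paths except the ones listed."""
    return {
        "indexingMode": "consistent",
        "automatic": True,
        "includedPaths": [{"path": "/*"}],
        "excludedPaths": [{"path": p} for p in excluded_paths],
    }


# "/description/?" targets a single scalar value; "/rawPayload/*" targets
# the whole subtree under rawPayload.
policy = build_indexing_policy(["/description/?", "/rawPayload/*"])
print(json.dumps(policy, indent=2))
```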

Cost Management

Costs in Cosmos DB are primarily determined by provisioned throughput (RU/s) and consumed storage. Monitor your RU consumption and storage usage, leverage autoscale throughput where appropriate, and consider reserved capacity for predictable workloads.
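
The two cost drivers can be combined in a back-of-the-envelope estimate. In the sketch below, `monthly_cost` is a hypothetical helper, and the unit prices passed to it are placeholders rather than real Azure prices; substitute the current rates for your region and account type.

```python
def monthly_cost(provisioned_ru_s, storage_gb,
                 price_per_100_ru_hour, price_per_gb_month, hours=730):
    """Rough monthly estimate: provisioned throughput plus consumed storage."""
    throughput = (provisioned_ru_s / 100) * price_per_100_ru_hour * hours
    storage = storage_gb * price_per_gb_month
    return round(throughput + storage, 2)


# Placeholder prices for illustration only.
estimate = monthly_cost(
    provisioned_ru_s=1000,
    storage_gb=50,
    price_per_100_ru_hour=0.01,
    price_per_gb_month=0.30,
)
print(estimate)
```

With these placeholder rates, throughput accounts for most of the total, which is why right-sizing RU/s is usually the first place to look when reducing cost.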

Monitoring and Diagnostics

Use Azure Monitor and Azure diagnostics to track performance metrics, analyze logs, and set up alerts for your Cosmos DB accounts. This helps in identifying bottlenecks and ensuring the availability and performance of your applications.

Explore the tutorials and SDKs sections for practical guidance.