Azure Cosmos DB Concepts

Azure Cosmos DB is a globally distributed, multi-model database service that lets you create and query NoSQL, relational, and graph databases with minimal development effort. It is designed for massive scalability, high availability, and low latency.

Introduction to Azure Cosmos DB

Cosmos DB exposes its data models through several APIs: SQL (DocumentDB), MongoDB, Cassandra, Gremlin (Graph), and Table. This flexibility lets developers use the API that best suits their application while still benefiting from the core features of Cosmos DB.

Data Model

Azure Cosmos DB supports several data models: document, key-value, column-family, and graph.

APIs

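Cosmos DB provides multiple APIs to interact with your data: the SQL (DocumentDB) and MongoDB APIs for document data, the Cassandra API for column-family data, the Gremlin API for graph data, and the Table API for key-value data.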

Accounts, Databases, Containers, and Items

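The core hierarchical structure in Azure Cosmos DB is account, database, container, item: an account contains one or more databases, a database contains containers, and a container stores items.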
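As a rough illustration of how these levels nest, the sketch below builds the resource path of an item from its parent database and container, following the `dbs/.../colls/.../docs/...` layout used in REST resource paths. The account itself corresponds to the service endpoint rather than a path segment. The function name and the sample identifiers are hypothetical.

```python
def item_resource_path(database_id: str, container_id: str, item_id: str) -> str:
    """Build the path of an item beneath its database and container."""
    return f"dbs/{database_id}/colls/{container_id}/docs/{item_id}"


# Hypothetical identifiers, used only to show the nesting.
print(item_resource_path("store", "orders", "order-1001"))
```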

Partitioning

To achieve massive scalability, Azure Cosmos DB partitions data horizontally. The items in each container are divided into logical partitions according to a partition key that you select when you create the container. The partition key is a property within each item's JSON document, and its value determines which partition the item is stored in.

Choosing an effective partition key is essential for distributing your data and requests uniformly across partitions.
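The effect of key choice on distribution can be sketched with a small simulation. This is only an illustration: it uses MD5 as a stand-in hash (the service's real hash function and partition count are not modeled), and the item properties `userId` and `country` are hypothetical.

```python
import hashlib
from collections import Counter


def partition_for(key_value: str, partition_count: int) -> int:
    """Map a partition key value to a partition index (MD5 is a stand-in hash)."""
    digest = hashlib.md5(key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def distribution(items, key_property: str, partition_count: int = 4) -> Counter:
    """Count how many items land in each partition for a given key property."""
    return Counter(
        partition_for(str(item[key_property]), partition_count) for item in items
    )


# 10,000 hypothetical items: 500 distinct users, but only two countries.
items = [
    {"id": str(i), "userId": f"user-{i % 500}", "country": "US" if i % 10 else "CA"}
    for i in range(10_000)
]

by_user = distribution(items, "userId")      # high-cardinality key
by_country = distribution(items, "country")  # low-cardinality key

print("items per partition by userId: ", sorted(by_user.values()))
print("items per partition by country:", sorted(by_country.values()))
```

With the high-cardinality key the items spread over all partitions, while the two-valued key can occupy at most two partitions and concentrates most items in one of them.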

Indexing

By default, Azure Cosmos DB automatically indexes every property of every item stored in a container. The indexing policy, which is configurable, dictates how this indexing is performed. The default policy applies range indexes to all properties, providing broad query support; composite indexes are not part of the default and must be declared explicitly.


{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    {
      "path": "/*",
      "indexes": [
        {
          "kind": "Range",
          "dataTypes": ["String", "Number", "Boolean", "Null"],
          "precision": -1
        },
        {
          "kind": "Spatial",
          "dataTypes": ["Point", "Polygon", "LineString"]
        }
      ]
    }
  ],
  "excludedPaths": []
}
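Composite indexes, which serve queries that sort or filter on several properties at once, are declared in a separate `compositeIndexes` section of the policy rather than as an index kind on a path. The following is a minimal sketch that assembles such a policy as a plain dictionary and prints it as JSON; the helper name and the `/category` and `/price` paths are hypothetical.

```python
import json


def build_indexing_policy(composite_paths):
    """Return an indexing policy with one composite index.

    composite_paths is a list of (path, order) tuples, where order is
    "ascending" or "descending".
    """
    return {
        "indexingMode": "consistent",
        "automatic": True,
        # Index every property path by default.
        "includedPaths": [{"path": "/*"}],
        # The system _etag property is commonly excluded.
        "excludedPaths": [{"path": "/\"_etag\"/?"}],
        # Each inner list describes one composite index.
        "compositeIndexes": [
            [{"path": path, "order": order} for path, order in composite_paths]
        ],
    }


policy = build_indexing_policy([("/category", "ascending"), ("/price", "descending")])
print(json.dumps(policy, indent=2))
```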
            

Consistency Models

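Azure Cosmos DB offers five well-defined consistency levels, allowing you to balance consistency, availability, and latency. From strongest to weakest, they are strong, bounded staleness, session, consistent prefix, and eventual.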
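The levels form an ordering, and an individual request can typically relax the account's default level but not strengthen it. A minimal sketch of that rule follows; the helper name is hypothetical and the level identifiers are spelled as assumed here, not taken from a specific SDK.

```python
# Ordered from strongest to weakest guarantee.
LEVELS = ["Strong", "BoundedStaleness", "Session", "ConsistentPrefix", "Eventual"]


def can_use_for_request(account_default: str, requested: str) -> bool:
    """Return True if `requested` is the same as or weaker than the account default."""
    return LEVELS.index(requested) >= LEVELS.index(account_default)


print(can_use_for_request("Session", "Eventual"))
print(can_use_for_request("Session", "Strong"))
```

The first call asks for a weaker level than the default and is allowed; the second asks for a stronger level and is rejected.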

Throughput

Throughput in Azure Cosmos DB is measured in Request Units (RUs) per second. You can provision throughput in two ways:

Each operation (e.g., reading an item, querying) consumes a certain number of RUs, depending on the operation type, item size, and indexing. Understanding RU consumption is key to cost management and performance tuning.
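A back-of-the-envelope estimate of required throughput can be sketched as below. The per-operation RU costs are placeholder assumptions for illustration, not published figures; real costs depend on item size, indexing, and query shape, and should be measured from the request charge that the service reports. Rounding up to a multiple of 100 RU/s is likewise an assumption about the provisioning granularity.

```python
import math


def required_rus_per_second(workload: dict, ru_cost: dict) -> float:
    """Sum (operations per second x RU cost per operation) over the workload."""
    return sum(ops * ru_cost[op] for op, ops in workload.items())


# Assumed RU cost per operation (placeholders; measure your own workload).
ASSUMED_RU_COST = {"point_read": 1.0, "write": 6.0, "query": 12.0}

# Hypothetical steady-state workload, in operations per second.
workload = {"point_read": 500, "write": 100, "query": 20}

needed = required_rus_per_second(workload, ASSUMED_RU_COST)
provisioned = math.ceil(needed / 100) * 100  # round up to the next 100 RU/s

print(f"estimated need: {needed:.0f} RU/s, provision: {provisioned} RU/s")
```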

Regions and Replication

Azure Cosmos DB is a globally distributed service. You can replicate your data across multiple Azure regions to provide high availability and low-latency access for users worldwide. You choose which regions your account is deployed in, and you can enable multi-region writes (previously called multi-master writes) for active-active global distribution.

Global distribution and multi-region writes are key differentiators for building highly available and responsive applications.
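One way an application can take advantage of multiple regions is to keep a priority-ordered list of preferred regions and fall back when the first choice is unavailable. The sketch below illustrates that idea only; it is not the routing logic of any SDK, and the function name and region lists are hypothetical.

```python
def pick_read_region(preferred_regions, available_regions):
    """Return the first preferred region that is currently available."""
    for region in preferred_regions:
        if region in available_regions:
            return region
    raise RuntimeError("no preferred region is currently available")


# Hypothetical scenario: the first-choice region is unavailable.
preferred = ["West Europe", "East US", "Southeast Asia"]
available = {"East US", "Southeast Asia"}

print(pick_read_region(preferred, available))
```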