Microsoft Azure Documentation

Azure Cosmos DB Concepts

Azure Cosmos DB is a globally distributed, multi-model database service that lets you build highly responsive, mission-critical applications at scale. This section covers the core concepts that underpin Azure Cosmos DB.

Accounts, Databases, and Containers

Azure Cosmos DB organizes data into accounts, databases, and containers. An account is the top-level resource, containing one or more databases. A database is a logical namespace for containers and their data. A container is the fundamental unit of scalability and throughput in Azure Cosmos DB. It comprises a set of items, and it's where you store your data.
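To make the hierarchy concrete, the sketch below builds resource links in the dbs/{database}/colls/{container}/docs/{item} form used by the Azure Cosmos DB REST API. The account itself is identified by its endpoint (for example, https://<account>.documents.azure.com). The helper functions are hypothetical and only show how each level nests inside the one above it.

```javascript
// Hypothetical helpers showing how database, container, and item nest.
// The path segments (dbs, colls, docs) follow the REST API addressing scheme;
// the account is addressed separately by its endpoint URL.
function databaseLink(databaseId) {
  // Escape ids so they are safe to use as URL path segments.
  return `dbs/${encodeURIComponent(databaseId)}`;
}

function containerLink(databaseId, containerId) {
  return `${databaseLink(databaseId)}/colls/${encodeURIComponent(containerId)}`;
}

function itemLink(databaseId, containerId, itemId) {
  return `${containerLink(databaseId, containerId)}/docs/${encodeURIComponent(itemId)}`;
}

console.log(itemLink("retail", "orders", "order-1001"));
// prints dbs/retail/colls/orders/docs/order-1001
```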

Data in a container is typically structured as JSON documents, but Azure Cosmos DB also supports other data models like key-value, graph, and column-family through its various APIs.

Items

An item is the basic unit of data stored in a container. When you use the API for NoSQL (formerly called the SQL or Core API, and the default API), items are JSON documents that can contain any number of properties, including nested objects and arrays.

Each item has a user-supplied id property that must be unique within its logical partition. The combination of the id and the partition key value uniquely identifies an item within a container.
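A minimal in-memory sketch (not the Azure Cosmos DB SDK) of how the id and the partition key value together identify an item. The InMemoryContainer class and the sample order items are hypothetical; the service itself reports a duplicate id within one logical partition as an HTTP 409 Conflict.

```javascript
// Hypothetical in-memory model: items are keyed by (partition key value, id),
// so the same id may appear in two different logical partitions.
class InMemoryContainer {
  constructor(partitionKeyPath) {
    // e.g. "/customerId"; only single-level paths are handled in this sketch
    this.pkProperty = partitionKeyPath.slice(1);
    this.items = new Map();
  }

  key(pkValue, id) {
    return JSON.stringify([pkValue, id]);
  }

  create(item) {
    const k = this.key(item[this.pkProperty], item.id);
    if (this.items.has(k)) {
      // The real service answers with HTTP 409 Conflict in this case.
      throw new Error(`Conflict: id "${item.id}" already exists in this logical partition`);
    }
    this.items.set(k, item);
  }

  read(id, pkValue) {
    return this.items.get(this.key(pkValue, id));
  }
}

const orders = new InMemoryContainer("/customerId");
orders.create({ id: "order-1", customerId: "contoso-123", total: 42.5, lines: [{ sku: "A1", qty: 2 }] });
// Same id, different logical partition: allowed.
orders.create({ id: "order-1", customerId: "fabrikam-7", total: 10 });
console.log(orders.read("order-1", "contoso-123").total); // 42.5
```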

Partition Keys

Containers are partitioned so that data and throughput can be spread across multiple physical partitions. A partition key is a property path in your items. All items that share the same partition key value form a logical partition, and Azure Cosmos DB hashes the partition key value to decide which physical partition stores each logical partition. Selecting an appropriate partition key is crucial for building scalable, high-performing applications.

The partition key value is part of the logical key of an item. For example, in a container with the partition key /customerId, every item whose customerId is "contoso-123" belongs to the same logical partition and is therefore stored on the same physical partition.
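The routing described above can be sketched as hashing the partition key value and mapping the hash onto ranges owned by physical partitions. This is a simplified illustration under stated assumptions: the service uses its own hash function and manages partition ranges itself, so fnv1a32 and the equal-width ranges here are stand-ins chosen only because they are short and deterministic.

```javascript
// Stand-in hash (32-bit FNV-1a over UTF-16 code units); NOT the hash the
// service uses, only a deterministic example.
function fnv1a32(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Map a partition key value to one of N physical partitions by splitting
// the 32-bit hash space into N equal, contiguous ranges (an assumption).
function physicalPartitionFor(partitionKeyValue, physicalPartitionCount) {
  const hash = fnv1a32(String(partitionKeyValue));
  const rangeSize = 2 ** 32 / physicalPartitionCount;
  return Math.floor(hash / rangeSize);
}

// Every item with the same partition key value lands on the same partition.
const p1 = physicalPartitionFor("contoso-123", 4);
const p2 = physicalPartitionFor("contoso-123", 4);
console.log(p1 === p2); // true
```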

Request Units (RUs)

Throughput in Azure Cosmos DB is measured in Request Units (RUs), and provisioned throughput is expressed in RUs per second (RU/s). An RU is a normalized measure of the CPU, IOPS, and memory an operation requires; for example, a point read of a 1-KB item by its id and partition key costs 1 RU. Different operations (reading an item, writing an item, querying, and so on) consume different numbers of RUs depending on factors such as item size, the number of indexed properties, query complexity, and the consistency level.
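Because charges add up per operation, a rough throughput estimate is a weighted sum of operation rates. In this sketch only the 1-RU cost of a 1-KB point read is a documented figure; the write and query charges are hypothetical placeholders you would replace with measured values (each response reports its actual charge in the x-ms-request-charge header).

```javascript
// Back-of-the-envelope RU/s estimate for a hypothetical workload.
const workload = [
  { op: "point read (1 KB)", perSecond: 500, ruPerOp: 1 },  // documented baseline
  { op: "write (1 KB)",      perSecond: 100, ruPerOp: 5 },  // hypothetical charge
  { op: "query",             perSecond: 20,  ruPerOp: 10 }, // hypothetical charge
];

// Sum of (operations per second x RUs per operation).
function requiredRuPerSecond(ops) {
  return ops.reduce((total, o) => total + o.perSecond * o.ruPerOp, 0);
}

console.log(requiredRuPerSecond(workload)); // 500 + 500 + 200 = 1200
```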

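You can provision throughput on a per-container basis (dedicated to that container) or on a per-database basis (shared by the containers in that database). With autoscale throughput, you set a maximum RU/s and Azure Cosmos DB automatically scales the provisioned throughput between 10% of that maximum and the maximum, based on your workload.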
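A minimal model of autoscale: throughput varies between 10% of a configured maximum (Tmax) and Tmax. The clamping rule that follows demand is a simplification for illustration, not the service's actual scaling algorithm; requests that would need more than Tmax are rate limited rather than scaled further.

```javascript
// Autoscale range for a configured maximum Tmax: [Tmax / 10, Tmax].
function autoscaleRange(maxRuPerSecond) {
  return { min: maxRuPerSecond / 10, max: maxRuPerSecond };
}

// Simplified demand-following rule: never below 10% of Tmax, never above Tmax.
function scaledThroughput(maxRuPerSecond, demandRuPerSecond) {
  const { min, max } = autoscaleRange(maxRuPerSecond);
  return Math.min(max, Math.max(min, demandRuPerSecond));
}

console.log(scaledThroughput(4000, 100));  // 400 (floor of the range)
console.log(scaledThroughput(4000, 2500)); // 2500 (follows demand)
console.log(scaledThroughput(4000, 9000)); // 4000 (capped; excess is rate limited)
```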

Indexing

Azure Cosmos DB automatically indexes every item in your containers. By default, the indexing policy includes every property path (/*) with range indexes, so you don't need to define a schema or manage indexes up front. You can customize the indexing policy to include or exclude specific paths and to set the indexing mode: consistent (the default) or none. The older lazy mode is deprecated.
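For reference, this is the default indexing policy of a new container written as a JavaScript object (the same JSON shape the portal and SDKs use), followed by a customized copy. The excluded /rawPayload/* path is hypothetical; excluding paths you never filter or sort on reduces the RU cost of writes.

```javascript
// Default policy: index every path, excluding only the system _etag property.
const defaultIndexingPolicy = {
  indexingMode: "consistent",
  automatic: true,
  includedPaths: [{ path: "/*" }],
  excludedPaths: [{ path: '/"_etag"/?' }],
};

// Customized copy: still index everything except a large, never-queried
// subtree (hypothetical property name /rawPayload).
const customIndexingPolicy = {
  ...defaultIndexingPolicy,
  excludedPaths: [{ path: '/"_etag"/?' }, { path: "/rawPayload/*" }],
};

console.log(JSON.stringify(customIndexingPolicy, null, 2));
```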

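For optimal query performance, especially with large datasets, add composite indexes for queries that filter or sort on multiple properties (an ORDER BY on two or more properties requires one), add spatial indexes for geospatial queries, and exclude paths you never query to reduce the RU cost of writes, rather than relying solely on the default indexing.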
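A sketch of an indexing policy that adds a composite index and a spatial index to the default behavior. The property paths (/lastName, /createdAt, /location) are hypothetical. The composite index serves an ORDER BY on both properties in the listed directions, or in exactly the reversed directions.

```javascript
const indexingPolicy = {
  indexingMode: "consistent",
  includedPaths: [{ path: "/*" }],
  excludedPaths: [{ path: '/"_etag"/?' }],
  // Serves queries such as:
  //   SELECT * FROM c ORDER BY c.lastName ASC, c.createdAt DESC
  compositeIndexes: [
    [
      { path: "/lastName", order: "ascending" },
      { path: "/createdAt", order: "descending" },
    ],
  ],
  // Supports geospatial queries (for example ST_DISTANCE) on GeoJSON points.
  spatialIndexes: [{ path: "/location/*", types: ["Point"] }],
};

console.log(indexingPolicy.compositeIndexes[0].map((p) => `${p.path} ${p.order}`).join(", "));
// prints /lastName ascending, /createdAt descending
```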

Consistency Levels

Azure Cosmos DB offers five well-defined consistency levels that you can choose from, balancing consistency, availability, and latency: Strong, Bounded Staleness, Session, Consistent Prefix, and Eventual.

The default consistency level is Session. Understanding these levels is key to designing applications that meet your specific requirements for data freshness and availability.
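The five levels form an ordered scale, and a client can relax the account's default level for an individual request but cannot strengthen it. The sketch encodes that rule; canUseForRequest is a hypothetical helper, while the level names match the string values of the ConsistencyLevel enum in the JavaScript SDK (@azure/cosmos).

```javascript
// Consistency levels ordered from strongest to weakest.
const CONSISTENCY_LEVELS = ["Strong", "BoundedStaleness", "Session", "ConsistentPrefix", "Eventual"];

// Hypothetical check: a per-request level must be equal to or weaker than
// the account default.
function canUseForRequest(accountDefault, requested) {
  const d = CONSISTENCY_LEVELS.indexOf(accountDefault);
  const r = CONSISTENCY_LEVELS.indexOf(requested);
  if (d === -1 || r === -1) throw new Error("Unknown consistency level");
  return r >= d;
}

console.log(canUseForRequest("Session", "Eventual")); // true
console.log(canUseForRequest("Session", "Strong"));   // false
```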


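// Example: Reading the account's default consistency level with @azure/cosmos
// (run as an ES module; COSMOS_ENDPOINT and COSMOS_KEY are assumed environment variables)
import { CosmosClient } from "@azure/cosmos";
const client = new CosmosClient({ endpoint: process.env.COSMOS_ENDPOINT, key: process.env.COSMOS_KEY });
const { resource: account } = await client.getDatabaseAccount();
console.log(`Current consistency level: ${account.consistencyPolicy}`);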

Throughput Partitioning

Azure Cosmos DB spreads data and throughput across multiple physical partitions. Each physical partition has its own storage and an even share of the provisioned throughput. The partition key determines which physical partition an item resides in, and each operation consumes RUs from the physical partition that holds the relevant data.
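A rough sizing sketch, assuming the documented per-physical-partition limits of 10,000 RU/s and 50 GB of storage (verify against current service limits) and an even split of provisioned throughput. The result is only a lower bound; the service decides the actual partition layout.

```javascript
// Assumed per-physical-partition limits.
const MAX_RU_PER_PARTITION = 10000;
const MAX_GB_PER_PARTITION = 50;

// Lower bound on physical partitions needed for a given throughput and storage.
function minPhysicalPartitions(provisionedRuPerSecond, storageGb) {
  return Math.max(
    1,
    Math.ceil(provisionedRuPerSecond / MAX_RU_PER_PARTITION),
    Math.ceil(storageGb / MAX_GB_PER_PARTITION),
  );
}

// Provisioned RU/s divided evenly among physical partitions.
function ruPerPartition(provisionedRuPerSecond, partitions) {
  return provisionedRuPerSecond / partitions;
}

const partitions = minPhysicalPartitions(30000, 120); // max(3, 3) = 3
console.log(partitions, ruPerPartition(30000, partitions)); // 3 10000
```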

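For optimal performance, choose a partition key that distributes requests and storage evenly across partitions. A key with few distinct values, or a single value that receives a disproportionate share of traffic, creates a "hot partition" that is rate limited (HTTP 429) even when the container's total provisioned throughput is not exhausted.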
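A hypothetical skew check: given observed RU/s per partition key value, it flags any logical partition whose load alone exceeds the budget of one physical partition. Such a key stays hot no matter how many partitions exist, because a logical partition is never split across physical partitions. The sample numbers assume 10,000 RU/s spread over four physical partitions (2,500 RU/s each).

```javascript
// Keys whose own load exceeds what one physical partition can serve.
function findHotKeys(ruByKey, ruPerPhysicalPartition) {
  return Object.entries(ruByKey)
    .filter(([, ru]) => ru > ruPerPhysicalPartition)
    .map(([key]) => key);
}

// Ratio of the busiest key's load to the average load (1.0 = perfectly even).
function skewRatio(ruByKey) {
  const values = Object.values(ruByKey);
  const avg = values.reduce((a, b) => a + b, 0) / values.length;
  return Math.max(...values) / avg;
}

// Hypothetical observed RU/s per tenant id (the partition key value).
const observed = { "tenant-a": 9000, "tenant-b": 300, "tenant-c": 250, "tenant-d": 450 };
console.log(findHotKeys(observed, 2500)); // [ 'tenant-a' ]
console.log(skewRatio(observed));         // 3.6
```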

APIs

Azure Cosmos DB supports multiple APIs, allowing you to use the data model and programming model that best suits your application needs:

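  • Core (SQL) API, now called the API for NoSQL: The default and most common API, supporting JSON documents and a SQL-like query language.
  • MongoDB API: Provides wire-protocol compatibility with MongoDB drivers and applications.
  • Cassandra API: Supports the Apache Cassandra wide-column data model and the Cassandra Query Language (CQL).
  • Gremlin API: Enables working with graph data using the Apache TinkerPop Gremlin traversal language.
  • Table API: Compatible with Azure Table storage applications.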