Core Concepts of Azure Cosmos DB

Azure Cosmos DB is a globally distributed, multi-model database service for storing and querying NoSQL and relational data. It targets enterprise applications that need low latency, high availability, and elastic scalability, each of which is backed by a service-level agreement.

Data Model

Azure Cosmos DB supports multiple data models, including document, key-value, graph, and column-family. The default and most common model is the document model, where data is stored as JSON documents. Each document is an atomic unit of data and can have a complex nested structure.

  • Documents: Self-contained JSON objects, each representing an entity.
  • Containers (called Collections in older documentation): Logical groupings of items. Container is the current Cosmos DB term.
  • Items: The fundamental unit of data within a Container. In the document model, an item is a JSON document.
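
As a minimal sketch of what an item can look like, the snippet below builds a nested JSON document in Python and round-trips it through the standard json module. The order entity, its property names, and the choice of customerId as a partition key are assumptions made for illustration; only the id property is something the API for NoSQL itself expects on every item.

```python
import json

# A hypothetical "order" item. Every item carries an "id"; "customerId" is an
# assumed partition key property chosen for this sketch.
order = {
    "id": "order-1001",
    "customerId": "cust-42",
    "status": "shipped",
    "shippingAddress": {"city": "Seattle", "country": "US"},
    "lines": [
        {"sku": "A-100", "quantity": 2, "unitPrice": 9.5},
        {"sku": "B-200", "quantity": 1, "unitPrice": 20.0},
    ],
}

# Items are stored as JSON, so the structure must be JSON-serializable.
payload = json.dumps(order, indent=2)

# Nested objects and arrays survive the round trip and stay addressable.
restored = json.loads(payload)
order_total = sum(line["quantity"] * line["unitPrice"] for line in restored["lines"])
print(payload)
print("order total:", order_total)
```

Keeping the order lines inside the order document, rather than in a separate container, is one common way to model data that is read and written together.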

Partitioning

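To achieve scalability and high performance, Azure Cosmos DB uses a technique called partitioning. Items are grouped into logical partitions by their partition key value, and the service maps those logical partitions onto physical partitions. Physical partitions are added as data volume and throughput grow, so each one acts as an independent unit of scaling.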

  • Partition Key: A property within your documents that Azure Cosmos DB uses to determine which partition an item belongs to. Choosing an effective partition key is crucial for performance and scalability.
  • Logical Partition: The set of all items that share the same partition key value. A single logical partition can hold up to 20 GB of data.
  • Physical Partition: The unit of storage and compute, managed by the service, that hosts one or more logical partitions.

Key takeaway: A well-chosen partition key distributes your data and request load evenly across physical partitions, preventing hot spots and maximizing throughput.
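
The effect of partition key choice can be illustrated with a small simulation. The sketch below hashes partition key values into a fixed number of buckets and counts the items per bucket. It is not how Cosmos DB assigns partitions internally (the service uses its own hash function and manages physical partitions itself); the function names, the SHA-256 hash, and the bucket count of four are all assumptions for this example.

```python
import hashlib
from collections import Counter


def assign_partition(partition_key_value: str, partition_count: int) -> int:
    """Map a partition key value to a bucket by hashing it (illustrative only)."""
    digest = hashlib.sha256(partition_key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def distribution(key_values, partition_count=4):
    """Count how many items land in each simulated partition."""
    counts = Counter(assign_partition(v, partition_count) for v in key_values)
    return [counts.get(p, 0) for p in range(partition_count)]


# High-cardinality key (for example a user id): 10,000 distinct values.
by_user = distribution(f"user-{i}" for i in range(10_000))

# Low-cardinality key (for example a country code with one dominant value).
by_country = distribution(["US"] * 9_000 + ["DE"] * 1_000)

print("items per partition, keyed by user id:", by_user)
print("items per partition, keyed by country:", by_country)
```

With the user id key, items spread roughly evenly across the buckets. With the country key, at least 9,000 of the 10,000 items share one bucket, which is the hot spot the takeaway above warns about.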

Request Units (RUs)

Azure Cosmos DB expresses the cost of every database operation in Request Units (RUs). An RU is a normalized measure that combines the database resources required to execute a request, including CPU, memory, and I/O. Every operation performed against your Cosmos DB database consumes a certain number of RUs, and throughput is provisioned and billed in those units.

  • Provisioned Throughput: You can provision throughput in terms of RUs per second (RU/s) at the Container or Database level.
  • Autoscale: A mode where you set a maximum RU/s and Cosmos DB automatically scales throughput between 10% of that maximum and the maximum, based on your workload.
  • Serverless: A consumption-based pricing model where you pay for actual usage (RUs consumed) rather than provisioned throughput.

A point read of a 1 KB item by its ID and partition key costs 1 RU, while a query with a filter and sort can consume significantly more. The RU charge of each operation is returned with its response, and you can monitor aggregate RU consumption in the Azure portal.
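
A rough throughput estimate follows from multiplying each operation's rate by its RU charge and summing the results. The sketch below does that arithmetic. The per-operation charges for writes and queries, the workload numbers, the 20% headroom, and the rounding to a multiple of 100 RU/s are assumptions for illustration; real charges depend on item size, indexing, and query shape, and should be measured.

```python
import math


def required_rus_per_second(workload, ru_cost):
    """Sum (operations per second x RU charge per operation) over operation types."""
    return sum(rate * ru_cost[op] for op, rate in workload.items())


# Assumed per-operation charges in RUs. Measure the real charges for your data.
ru_cost = {"point_read": 1.0, "write": 5.0, "query": 10.0}

# Hypothetical peak workload in operations per second.
workload = {"point_read": 500, "write": 100, "query": 20}

needed = required_rus_per_second(workload, ru_cost)  # 500 + 500 + 200

# Add 20% headroom (an assumption), then round up to a multiple of 100 RU/s.
headroom = 1.2
provisioned = math.ceil(needed * headroom / 100) * 100

print("estimated peak RU/s:", needed)
print("RU/s to provision:", provisioned)
```

Here the estimate is 1,200 RU/s at peak and 1,500 RU/s once headroom and rounding are applied. If the workload is spiky rather than steady, the same numbers can help judge whether autoscale or serverless is a better fit than fixed provisioning.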

Consistency Levels

Azure Cosmos DB offers five well-defined consistency levels that let you trade off consistency against availability, latency, and throughput based on your application's needs. From strongest to weakest, they are:

  • Strong: Reads always return the most recent committed write. Offers the highest consistency at the cost of higher latency.
  • Bounded Staleness: Reads may lag behind writes by at most 'k' versions of an item or a time interval 't', whichever is reached first.
  • Session: Within a single client session, reads are guaranteed to see that session's own writes and never move backward. This is the default and offers a good balance.
  • Consistent Prefix: Reads never see out-of-order writes; they return some prefix of all writes, with no gaps.
  • Eventual: Reads might return stale or out-of-order data, but replicas converge once writes stop. Offers the lowest latency and highest availability.

Choosing the right consistency level is critical for application design. Most applications benefit from the default Session consistency.
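
The difference between Session and Eventual consistency can be shown with a toy model. The sketch below is not Cosmos DB's replication protocol: it assumes one up-to-date primary, one lagging secondary, and a session token that is simply the sequence number of the session's last write. All class and method names are hypothetical.

```python
class Replica:
    """A toy replica that tracks the last log sequence number (LSN) it applied."""

    def __init__(self):
        self.data = {}
        self.lsn = 0

    def apply(self, lsn, key, value):
        self.data[key] = value
        self.lsn = lsn


class ToyStore:
    """One primary plus one secondary that lags until replicate() is called."""

    def __init__(self):
        self.primary = Replica()
        self.secondary = Replica()
        self.pending = []  # writes not yet applied to the secondary

    def write(self, key, value):
        lsn = self.primary.lsn + 1
        self.primary.apply(lsn, key, value)
        self.pending.append((lsn, key, value))
        return lsn  # acts as the session token

    def replicate(self):
        for lsn, key, value in self.pending:
            self.secondary.apply(lsn, key, value)
        self.pending.clear()

    def read_eventual(self, key):
        # The secondary answers with whatever it has, which may be stale.
        return self.secondary.data.get(key)

    def read_session(self, key, session_token):
        # Use the secondary only if it has caught up to the session's last
        # write; otherwise fall back to a replica that has.
        caught_up = self.secondary.lsn >= session_token
        replica = self.secondary if caught_up else self.primary
        return replica.data.get(key)


store = ToyStore()
token = store.write("cart", ["book"])

stale = store.read_eventual("cart")        # secondary has not replicated yet
fresh = store.read_session("cart", token)  # session read sees its own write
store.replicate()
converged = store.read_eventual("cart")    # replicas have converged

print(stale, fresh, converged)
```

Before replication, the eventual read misses the write while the session read returns it. After replication both agree, which is the convergence that Eventual consistency promises.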

Global Distribution

Azure Cosmos DB is designed for global distribution. You can replicate your data to any of the Azure regions where the service is available, and add or remove regions at any time. This capability lets you build highly available, resilient applications that keep data close to your users worldwide.

  • Multi-region Writes: Enable writes to be performed in multiple regions simultaneously, offering low write latency globally.
  • Automatic Failover: When service-managed failover is enabled, Azure Cosmos DB fails over to another available region during a regional outage, following the failover priority configured for the account.
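
The idea of priority-ordered failover can be sketched in a few lines. The function below is a hypothetical illustration, not part of any Azure SDK: it assumes the account has an ordered list of regions and picks the first one that is not currently marked unavailable.

```python
def pick_region(failover_priority, unavailable):
    """Return the first region, in priority order, that is still available."""
    for region in failover_priority:
        if region not in unavailable:
            return region
    raise RuntimeError("no region in the priority list is available")


# Hypothetical priority list for an account replicated to three regions.
priority = ["West US", "East US", "North Europe"]

normal = pick_region(priority, unavailable=set())
during_outage = pick_region(priority, unavailable={"West US"})

print("normal operation:", normal)
print("during a West US outage:", during_outage)
```

In normal operation the first region serves requests. When it is marked unavailable, the next region in the list takes over, and traffic returns to the first region once it is available again.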

APIs

Azure Cosmos DB supports multiple APIs, allowing you to use your existing development tools and frameworks. Key APIs include:

  • Core (SQL) API (now named API for NoSQL): The primary API for JSON document access and querying using a SQL-like query language.
  • MongoDB API: Compatible with MongoDB applications.
  • Cassandra API: Compatible with Apache Cassandra.
  • Gremlin API: For graph database workloads.
  • Table API: Compatible with Azure Table storage.
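
To give a feel for the SQL-like query language, the sketch below builds a parameterized query as a query string plus a list of name/value pairs. The container alias c, the property names, and the build_order_query helper are assumptions for illustration; they presume items shaped like orders with a nested shippingAddress and a numeric total.

```python
def build_order_query(city: str, min_total: float):
    """Return a parameterized query and its parameter list (hypothetical helper)."""
    query = (
        "SELECT c.id, c.total FROM c "
        "WHERE c.shippingAddress.city = @city AND c.total >= @minTotal"
    )
    # Parameters are passed separately from the query text, which avoids
    # string concatenation of user input into the query.
    parameters = [
        {"name": "@city", "value": city},
        {"name": "@minTotal", "value": min_total},
    ]
    return query, parameters


query, parameters = build_order_query("Seattle", 25.0)
print(query)
print(parameters)
```

With the azure-cosmos Python SDK, a pair like this would typically be passed to a container's query_items method as its query and parameters arguments. That call needs a live account and is left out so the sketch stays runnable on its own.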

Key Terminology Summary

Term                    Description
----------------------  -----------------------------------------------------------
Account                 The top-level resource in Azure Cosmos DB.
Database                A logical namespace within an account.
Container (Collection)  A logical grouping of items, with a partition key defined.
Item (Document)         A fundamental unit of data, typically a JSON object.
Partition Key           A property used to distribute data across partitions.
Request Unit (RU)       A measure of database throughput.
Region                  A physical geographical location where your data is replicated.

Understanding these core concepts is fundamental to effectively designing, developing, and managing applications with Azure Cosmos DB.