Querying Data in Azure Cosmos DB

Introduction to Querying

Azure Cosmos DB supports several query interfaces. Its native query language is a SQL-like dialect for querying JSON documents, exposed through the SQL API. The MongoDB, Cassandra, Gremlin, and Table APIs each use their own query language instead.

This tutorial focuses on using the SQL API for querying, which is the most common method. We'll explore basic and advanced querying techniques, including filtering, sorting, projection, aggregation, and working with nested data.

Note: Azure Cosmos DB is a multi-model database. The query language depends on the API you choose (e.g., SQL API, MongoDB API, Gremlin API). This tutorial primarily covers the SQL API.

Basic Querying with SQL API

The basic structure of a SQL query in Azure Cosmos DB is similar to traditional SQL:

SELECT <properties> FROM <container_alias> WHERE <condition>

Selecting All Fields

To retrieve all documents from a container (equivalent to a table in relational databases), you can use the `*` wildcard:

SELECT * FROM c

Here, c is an alias for the container.

Selecting Specific Fields

To retrieve only specific properties from your documents, list them after SELECT:

SELECT c.name, c.age FROM c

Filtering Data (WHERE Clause)

The WHERE clause allows you to filter documents based on specified conditions. You can use various operators:

  • Comparison operators: =, !=, >, <, >=, <=
  • Logical operators: AND, OR, NOT
  • IN operator: Checks if a value exists in a list.
  • LIKE operator: Performs pattern matching (supports `_` for a single character and `%` for zero or more characters).

Example: Find users older than 30.

SELECT * FROM c WHERE c.age > 30

Example: Find users named "Alice" or "Bob".

SELECT * FROM c WHERE c.name = "Alice" OR c.name = "Bob"

Example: Find users whose name starts with "J".

SELECT * FROM c WHERE STARTSWITH(c.name, "J")
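To make the LIKE wildcard rules concrete, here is a minimal Python sketch that mimics them locally. It assumes `%` matches any run of characters (including none) and `_` matches exactly one character. The helper names `like_to_regex` and `like` are hypothetical, escape sequences and bracket expressions are not handled, and nothing here talks to Cosmos DB.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a simple LIKE pattern into a regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # any run of characters, including none
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """Return True when the whole value matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


names = ["John", "Jane", "Alice", "Jo"]
# Same idea as: SELECT * FROM c WHERE c.name LIKE "J%"
print([n for n in names if like(n, "J%")])
# "J" followed by exactly one more character
print([n for n in names if like(n, "J_")])
```

Under these rules, the pattern "J%" selects the same names as STARTSWITH(c.name, "J") in the example above.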

Sorting Data (ORDER BY Clause)

Use the ORDER BY clause to sort the query results. You can sort in ascending (ASC, default) or descending (DESC) order.

SELECT * FROM c ORDER BY c.age DESC

You can also sort by multiple properties. This requires a composite index covering those properties in the container's indexing policy:

SELECT * FROM c ORDER BY c.city ASC, c.name DESC
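The ordering that a multi-property ORDER BY describes can be sketched in plain Python. The sample data and the function name `order_by_city_asc_name_desc` are made up for illustration; the sketch only mirrors the expected ordering and says nothing about how Cosmos DB executes the sort.

```python
users = [
    {"name": "Alice", "city": "Boston"},
    {"name": "Bob", "city": "Austin"},
    {"name": "Carol", "city": "Boston"},
    {"name": "Dave", "city": "Austin"},
]


def order_by_city_asc_name_desc(items):
    """Mirror ORDER BY c.city ASC, c.name DESC on a list of dicts."""
    # Python's sort is stable, so sort by the secondary key first...
    by_name_desc = sorted(items, key=lambda u: u["name"], reverse=True)
    # ...then by the primary key; ties on city keep the name ordering.
    return sorted(by_name_desc, key=lambda u: u["city"])


ordered = order_by_city_asc_name_desc(users)
print([(u["city"], u["name"]) for u in ordered])
```

Cities come out in ascending order, and within each city the names run in descending order.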

Projection and Aliasing

Projection is the process of selecting specific fields. You can rename fields using aliases for better readability.

SELECT c.name AS UserName, c.city AS UserCity FROM c

Working with Nested Data

Documents in the SQL API are JSON, so they can contain nested objects and arrays. You can access nested properties using dot notation.

Example: Accessing a property within a nested object.

SELECT c.address.street, c.address.city FROM c

Accessing Array Elements

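To query within arrays, use JOIN with the IN keyword. The join is a self-join: it pairs each document with every element of its own array, so you can filter on the properties of individual elements.

Example: Find users who have an address in a specific city, where each user stores an array of addresses.

SELECT c.id, c.name, address.city FROM c JOIN address IN c.addresses WHERE address.city = "New York"

Note that SELECT * is not allowed once the FROM clause declares more than one alias, so the query lists the properties it needs explicitly.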
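The pairing that JOIN ... IN performs can be pictured as a nested loop over each document and its own array. The sketch below is an emulation in plain Python under that assumption, with made-up sample documents and a hypothetical helper name `join_addresses`. It is not how you would run the query against Cosmos DB.

```python
docs = [
    {"id": "1", "name": "Alice",
     "addresses": [{"city": "New York"}, {"city": "Boston"}]},
    {"id": "2", "name": "Bob", "addresses": [{"city": "Seattle"}]},
    {"id": "3", "name": "Carol", "addresses": []},
]


def join_addresses(items, city):
    """Yield one (document, address) pair per matching array element."""
    for c in items:                             # FROM c
        for address in c.get("addresses", []):  # JOIN address IN c.addresses
            if address.get("city") == city:     # WHERE address.city = <city>
                yield c, address


matches = [(c["id"], a["city"]) for c, a in join_addresses(docs, "New York")]
print(matches)
```

Because the result has one row per matching array element, a document with two matching addresses would appear twice, and a document with an empty array contributes no rows.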

Checking Array Membership (ARRAY_CONTAINS)

The ARRAY_CONTAINS function checks if an array contains a specific value.

SELECT * FROM c WHERE ARRAY_CONTAINS(c.tags, "azure")

Aggregation Queries

Azure Cosmos DB supports several aggregation functions for summarizing data, such as COUNT, SUM, AVG, MIN, and MAX.

Example: Count the total number of users.

SELECT COUNT(c) AS totalUsers FROM c

Example: Calculate the average age of users.

SELECT AVG(c.age) AS averageAge FROM c

Grouping Data (GROUP BY Clause)

Use the GROUP BY clause to group results based on one or more properties and then apply aggregation functions.

SELECT c.city, COUNT(c) AS numberOfUsers FROM c GROUP BY c.city
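As a rough picture of what this grouping computes, the following Python sketch counts users per city with `collections.Counter`. The sample data and the function name `count_users_by_city` are illustrative, and the sketch assumes every document has a `city` property.

```python
from collections import Counter

users = [
    {"name": "Alice", "city": "Boston"},
    {"name": "Bob", "city": "Austin"},
    {"name": "Carol", "city": "Boston"},
]


def count_users_by_city(items):
    """Mirror SELECT c.city, COUNT(c) AS numberOfUsers ... GROUP BY c.city."""
    # Assumes each item has a "city" key; one output row per distinct city.
    counts = Counter(u["city"] for u in items)
    return [{"city": city, "numberOfUsers": n} for city, n in counts.items()]


print(count_users_by_city(users))
```

Each output row corresponds to one group, carrying the grouping property and the aggregate computed over that group.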

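TOP and OFFSET LIMIT

Use TOP to limit the number of results. To skip a number of results and then return the next batch, use the OFFSET LIMIT clause, which is useful for pagination. Cosmos DB SQL has no SKIP keyword.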

Example: Get the top 10 oldest users.

SELECT TOP 10 * FROM c ORDER BY c.age DESC

Example: Get the next 10 users after the first 10 (for pagination).

SELECT * FROM c ORDER BY c.name ASC OFFSET 10 LIMIT 10
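For page-by-page access, the offset is simply the page number times the page size. Here is a minimal Python sketch that builds the query text for a zero-based page. The function name `page_query` is hypothetical, and the sketch only produces a string; it does not execute anything.

```python
def page_query(page: int, page_size: int = 10) -> str:
    """Build the OFFSET LIMIT query text for a zero-based page number."""
    # Coerce to int so that only plain integers end up in the query text.
    page = int(page)
    page_size = int(page_size)
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    offset = page * page_size
    return (
        "SELECT * FROM c ORDER BY c.name ASC "
        f"OFFSET {offset} LIMIT {page_size}"
    )


print(page_query(0))  # first page
print(page_query(1))  # second page, matching the example above
```

Skipped items still have to be read by the query engine, so the cost of a query tends to grow with the offset. For deep paging, the continuation tokens returned by the SDKs are usually a better fit.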

Parameterized Queries

Parameterized queries are essential for security and performance. They prevent SQL injection and allow the query plan to be reused.

SELECT * FROM c WHERE c.name = @userName AND c.age > @minAge

When executing this query, you would provide values for @userName and @minAge separately.

Tip: Always use parameterized queries when user input is involved in your queries.
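The sketch below shows one way to keep the query text and the values separate in Python. The helper names `build_query_spec` and `run_query` are hypothetical. The `container` argument is assumed to be a container client from the `azure-cosmos` Python SDK whose `query_items` method accepts `query`, `parameters`, and `enable_cross_partition_query` keyword arguments. Treat that call shape as an assumption and check it against the SDK version you use.

```python
def build_query_spec(query: str, **params):
    """Pair query text with its parameters as name/value entries."""
    return {
        "query": query,
        "parameters": [
            {"name": f"@{name}", "value": value}
            for name, value in params.items()
        ],
    }


def run_query(container, spec):
    """Run a spec against `container` (assumed azure-cosmos container client)."""
    return list(
        container.query_items(
            query=spec["query"],
            parameters=spec["parameters"],
            enable_cross_partition_query=True,
        )
    )


spec = build_query_spec(
    "SELECT * FROM c WHERE c.name = @userName AND c.age > @minAge",
    userName="Alice",
    minAge=30,
)
print(spec)
```

Because the values travel as data rather than as part of the query text, a value containing quotes or SQL keywords is compared literally and cannot change the structure of the query.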

Cross-Partition Queries

A query that filters on the partition key can be routed to the single partition that holds the matching items, which is the most efficient case. A query that does not specify the partition key has to fan out across partitions; this is called a cross-partition query. Cross-partition queries typically have higher latency and consume more request units (RUs).

To optimize, always include the partition key in your WHERE clause whenever possible.

-- Example with partition key (assuming 'category' is the partition key)
SELECT * FROM c WHERE c.category = "electronics" AND c.price > 100
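On the client side, the same idea can be expressed by passing the partition key value with the request when it is known, and opting in to a cross-partition query only when it is not. The sketch below assumes, as in the example above, that `category` is the partition key. The helper names `query_options` and `find_expensive` are hypothetical. The `container` argument is assumed to be an `azure-cosmos` Python SDK container client whose `query_items` method accepts `partition_key` and `enable_cross_partition_query` keyword arguments.

```python
def query_options(partition_key_value=None):
    """Choose keyword arguments for a single- or cross-partition query."""
    if partition_key_value is not None:
        # Known partition key: the query can target one partition.
        return {"partition_key": partition_key_value}
    # Unknown partition key: explicitly allow fanning out to all partitions.
    return {"enable_cross_partition_query": True}


def find_expensive(container, category=None, min_price=100):
    """Find items above a price, scoped to one category when it is given."""
    query = "SELECT * FROM c WHERE c.price > @minPrice"
    parameters = [{"name": "@minPrice", "value": min_price}]
    if category is not None:
        query += " AND c.category = @category"
        parameters.append({"name": "@category", "value": category})
    return list(
        container.query_items(
            query=query,
            parameters=parameters,
            **query_options(category),
        )
    )


print(query_options("electronics"))
print(query_options())
```

Keeping the choice in one helper makes cross-partition queries an explicit decision rather than an accident of a missing filter.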

Further Learning

Explore the official Azure Cosmos DB documentation for advanced querying features, including:

  • User-Defined Functions (UDFs)
  • Stored Procedures
  • Triggers
  • Working with geospatial data
  • Using different APIs (MongoDB, Gremlin, Cassandra)

Check out the Azure Cosmos DB SQL query documentation for detailed examples and syntax.