Querying Data in Azure Cosmos DB
Introduction to Querying
Azure Cosmos DB offers a flexible querying experience, but the query language depends on the API you use. The API for NoSQL (formerly called the SQL API) uses a SQL-like language over JSON documents. The MongoDB, Cassandra, Gremlin, and Table APIs each use the query model of their own ecosystem.
This tutorial focuses on using the SQL API for querying, which is the most common method. We'll explore basic and advanced querying techniques, including filtering, sorting, projection, aggregation, and working with nested data.
Basic Querying with SQL API
The basic structure of a SQL query in Azure Cosmos DB is similar to traditional SQL:
SELECT <fields> FROM <container_alias> WHERE <condition>
Selecting All Fields
To retrieve all documents from a container (equivalent to a table in relational databases), you can use the `*` wildcard:
SELECT * FROM c
Here, `c` is an alias for the container.
Selecting Specific Fields
To retrieve only specific properties from your documents, list them after `SELECT`:
SELECT c.name, c.age FROM c
Filtering Data (WHERE Clause)
The `WHERE` clause filters documents based on specified conditions. You can use the following operators:
- Comparison operators: `=`, `!=`, `>`, `<`, `>=`, `<=`
- Logical operators: `AND`, `OR`, `NOT`
- `IN` operator: checks whether a value matches any value in a list.
- `LIKE` operator: performs pattern matching (`_` matches exactly one character and `%` matches any sequence of zero or more characters).
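To make the `LIKE` wildcards concrete, here is a small Python sketch that translates a `LIKE` pattern into a regular expression and tests a string against it. It is an illustration only: `like_to_regex` and `like` are hypothetical helper names, and the sketch handles just `%` and `_`, not any other wildcard forms or escape handling the service may support.

```python
import re


def like_to_regex(pattern):
    """Translate a LIKE pattern that uses only % and _ into a compiled regex.

    % matches any run of characters (including none); _ matches exactly one.
    Every other character is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # treat regex metacharacters literally
    # DOTALL lets the wildcards match newlines too.
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    # fullmatch anchors the pattern to the whole string, as LIKE does.
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Jane", "J%"))   # True
print(like("Jim", "J_m"))   # True
print(like("John", "J_m"))  # False
```

Escaping every non-wildcard character keeps a literal `.` in the pattern from behaving like a regex wildcard.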
Example: Find users older than 30.
SELECT * FROM c WHERE c.age > 30
Example: Find users named "Alice" or "Bob".
SELECT * FROM c WHERE c.name = "Alice" OR c.name = "Bob"
Example: Find users whose name starts with "J".
SELECT * FROM c WHERE STARTSWITH(c.name, "J")
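As a mental model, the three queries above act as filters over a list of JSON documents. The Python sketch below mimics them over made-up in-memory dicts; it does not talk to Cosmos DB. It reflects one detail of the service: comparing values of different types (for example a string `age` with a number), or comparing a missing property, evaluates to undefined rather than true, so those documents are not returned by `c.age > 30`.

```python
from numbers import Number

users = [
    {"id": "1", "name": "Alice", "age": 34},
    {"id": "2", "name": "Bob", "age": 28},
    {"id": "3", "name": "Jane", "age": "41"},  # age stored as a string
    {"id": "4", "name": "Joe"},                # no age property at all
]


def is_number(value):
    # bool is a subclass of int in Python, but JSON booleans are not numbers.
    return isinstance(value, Number) and not isinstance(value, bool)


# SELECT * FROM c WHERE c.age > 30
# Missing or non-numeric ages never satisfy the comparison.
older = [c for c in users if is_number(c.get("age")) and c["age"] > 30]

# SELECT * FROM c WHERE c.name = "Alice" OR c.name = "Bob"
alice_or_bob = [c for c in users if c.get("name") in ("Alice", "Bob")]

# SELECT * FROM c WHERE STARTSWITH(c.name, "J")
j_names = [c for c in users
           if isinstance(c.get("name"), str) and c["name"].startswith("J")]

print([c["id"] for c in older])         # ['1']
print([c["id"] for c in alice_or_bob])  # ['1', '2']
print([c["id"] for c in j_names])       # ['3', '4']
```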
Sorting Data (ORDER BY Clause)
Use the `ORDER BY` clause to sort the query results. You can sort in ascending (`ASC`, the default) or descending (`DESC`) order.
SELECT * FROM c ORDER BY c.age DESC
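You can also sort by multiple properties. An `ORDER BY` on two or more properties requires a matching composite index in the container's indexing policy: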
SELECT * FROM c ORDER BY c.city ASC, c.name DESC
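A multi-key sort with mixed directions can be reasoned about as a sequence of stable sorts, applied from the last key to the first. The Python sketch below reproduces the ordering of `ORDER BY c.city ASC, c.name DESC` over made-up in-memory documents. It assumes every document has both properties and that all their values are strings, so Cosmos DB's rules for ordering values of different types do not come into play.

```python
users = [
    {"name": "Ana", "city": "Seattle"},
    {"name": "Ben", "city": "Austin"},
    {"name": "Cy", "city": "Seattle"},
    {"name": "Dee", "city": "Austin"},
]

# Python's sort is stable: sorting by the secondary key (name, descending)
# first and then by the primary key (city, ascending) keeps the name order
# intact inside each city.
ordered = sorted(users, key=lambda c: c["name"], reverse=True)
ordered = sorted(ordered, key=lambda c: c["city"])

print([(c["city"], c["name"]) for c in ordered])
# [('Austin', 'Dee'), ('Austin', 'Ben'), ('Seattle', 'Cy'), ('Seattle', 'Ana')]
```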
Projection and Aliasing
Projection is the process of selecting specific fields. You can rename fields using aliases for better readability.
SELECT c.name AS UserName, c.city AS UserCity FROM c
Working with Nested Data
Items in Azure Cosmos DB are JSON documents, which can contain nested objects and arrays. You can access nested properties using dot notation.
Example: Accessing a property within a nested object.
SELECT c.address.street, c.address.city FROM c
Accessing Array Elements
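To query within arrays, you can reference an element by position (for example, `c.addresses[0].city`) or use `JOIN ... IN` to iterate over every element of an array. In Azure Cosmos DB, `JOIN` is a self-join within each document, not a join across documents or containers.
Example: Find users who have at least one address in New York. `SELECT VALUE c` returns each matching document itself; `SELECT *` is not allowed here because the `FROM` clause declares two aliases (`c` and `address`). A user with several New York addresses is returned once per matching address.
SELECT VALUE c FROM c JOIN address IN c.addresses WHERE address.city = "New York"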
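Because this kind of `JOIN` iterates over the elements of an array inside each document, a document is produced once per matching element. The Python sketch below, over made-up documents, shows that fan-out and contrasts it with a per-document existence check that returns each document at most once (the behaviour of an `EXISTS` subquery). It is an in-memory illustration, not the query engine.

```python
users = [
    {"id": "1", "addresses": [{"city": "New York"}, {"city": "New York"}]},
    {"id": "2", "addresses": [{"city": "Boston"}, {"city": "New York"}]},
    {"id": "3", "addresses": [{"city": "Boston"}]},
    {"id": "4"},  # no addresses array: contributes no rows to the join
]

# JOIN address IN c.addresses WHERE address.city = "New York"
# One output row per (document, matching array element) pair.
joined = [c
          for c in users
          for address in c.get("addresses", [])
          if address.get("city") == "New York"]

# Existence check: each document appears at most once.
deduped = [c for c in users
           if any(a.get("city") == "New York" for a in c.get("addresses", []))]

print([c["id"] for c in joined])   # ['1', '1', '2']
print([c["id"] for c in deduped])  # ['1', '2']
```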
Checking Array Membership (ARRAY_CONTAINS)
The `ARRAY_CONTAINS` function checks whether an array contains a specific value.
SELECT * FROM c WHERE ARRAY_CONTAINS(c.tags, "azure")
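`ARRAY_CONTAINS` also accepts an optional third Boolean argument. When it is `true`, an object in the array counts as a match if it contains all the properties of the search object (a partial match), as in `ARRAY_CONTAINS(c.addresses, {"city": "Seattle"}, true)`. The Python sketch below approximates both modes over in-memory lists. `array_contains` is a hypothetical name, the partial check compares only top-level properties, and it relies on Python equality, so edge cases such as `1 == True` differ from Cosmos DB.

```python
def array_contains(array, value, partial=False):
    """Approximate ARRAY_CONTAINS(array, value[, partial]) for in-memory data."""
    if not isinstance(array, list):
        return False
    for element in array:
        if partial and isinstance(element, dict) and isinstance(value, dict):
            # Partial match: every key/value pair of `value` must be in `element`.
            if all(k in element and element[k] == v for k, v in value.items()):
                return True
        elif element == value:
            # Full match: the element must equal the search value exactly.
            return True
    return False


doc = {
    "tags": ["azure", "cosmos"],
    "addresses": [{"city": "Seattle", "zip": "98101"}],
}
print(array_contains(doc["tags"], "azure"))                                 # True
print(array_contains(doc["addresses"], {"city": "Seattle"}))                # False
print(array_contains(doc["addresses"], {"city": "Seattle"}, partial=True))  # True
```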
Aggregation Queries
Azure Cosmos DB supports several aggregate functions for summarizing data, such as `COUNT`, `SUM`, `AVG`, `MIN`, and `MAX`.
Example: Count the total number of users.
SELECT COUNT(c) AS totalUsers FROM c
Example: Calculate the average age of users.
SELECT AVG(c.age) AS averageAge FROM c
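To see what the two queries above compute, here is a Python sketch over made-up documents. `COUNT(c)` counts every document, because `c` is always defined. For `AVG`, the sketch follows our reading of the Cosmos DB documentation: documents where the property is undefined are skipped, and a string, Boolean, or null value makes the whole result undefined (modelled as `None`). Treat that last rule as an assumption to verify against the current docs.

```python
from numbers import Number

users = [
    {"id": "1", "age": 30},
    {"id": "2", "age": 40},
    {"id": "3"},  # age is undefined for this document
]

# SELECT COUNT(c) AS totalUsers FROM c
total = {"totalUsers": len(users)}


def average_age(docs):
    """Approximate SELECT AVG(c.age): skip documents where age is missing."""
    ages = [c["age"] for c in docs if "age" in c]
    # Assumption: any non-numeric age (string, bool, or JSON null, which
    # arrives in Python as None) makes AVG undefined, modelled as None.
    if not ages or any(isinstance(a, bool) or not isinstance(a, Number)
                       for a in ages):
        return None
    return sum(ages) / len(ages)


print(total)                               # {'totalUsers': 3}
print({"averageAge": average_age(users)})  # {'averageAge': 35.0}
```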
Grouping Data (GROUP BY Clause)
Use the `GROUP BY` clause to group results by one or more properties and then apply aggregate functions to each group.
SELECT c.city, COUNT(c) AS numberOfUsers FROM c GROUP BY c.city
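The grouping query above can be pictured as counting documents per distinct `city` value. This Python sketch does that with `collections.Counter` over made-up in-memory documents, assuming every document has a `city`. The group order in the output reflects Python's insertion order only; don't rely on any particular group order from the service.

```python
from collections import Counter

users = [
    {"name": "Ana", "city": "Seattle"},
    {"name": "Ben", "city": "Austin"},
    {"name": "Cy", "city": "Seattle"},
]

# SELECT c.city, COUNT(c) AS numberOfUsers FROM c GROUP BY c.city
counts = Counter(c["city"] for c in users)
rows = [{"city": city, "numberOfUsers": n} for city, n in counts.items()]

print(rows)
# [{'city': 'Seattle', 'numberOfUsers': 2}, {'city': 'Austin', 'numberOfUsers': 1}]
```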
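TOP and OFFSET LIMIT
Use `TOP` to limit the number of results, and `OFFSET ... LIMIT` to skip a number of results and return the next batch, which is useful for pagination. (Azure Cosmos DB has no `SKIP` keyword.) The request-unit cost of an `OFFSET LIMIT` query rises as the offset grows, so for deep pagination the SDKs' continuation tokens are usually cheaper.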
Example: Get the top 10 oldest users.
SELECT TOP 10 * FROM c ORDER BY c.age DESC
Example: Get the next 10 users after the first 10 (for pagination).
SELECT * FROM c ORDER BY c.name ASC OFFSET 10 LIMIT 10
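The two pagination forms can be modelled as "sort, then slice". The Python sketch below does this over made-up in-memory documents; `top` and `offset_limit` are hypothetical helper names. It also makes visible that the skipped documents are still produced before being discarded, which is why large offsets get more expensive in the service.

```python
users = [
    {"name": "Ana", "age": 51},
    {"name": "Ben", "age": 19},
    {"name": "Cy", "age": 33},
    {"name": "Dee", "age": 47},
    {"name": "Eve", "age": 25},
]


def top(docs, n, key, descending=False):
    """Model of SELECT TOP n ... ORDER BY <key>: sort, then keep the first n."""
    return sorted(docs, key=key, reverse=descending)[:n]


def offset_limit(docs, offset, limit, key):
    """Model of ... ORDER BY <key> ASC OFFSET <offset> LIMIT <limit>."""
    # The first `offset` sorted documents are computed and then thrown away.
    return sorted(docs, key=key)[offset:offset + limit]


oldest = top(users, 2, key=lambda c: c["age"], descending=True)
page = offset_limit(users, offset=2, limit=2, key=lambda c: c["name"])

print([c["name"] for c in oldest])  # ['Ana', 'Dee']
print([c["name"] for c in page])    # ['Cy', 'Dee']
```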
Parameterized Queries
Parameterized queries are essential for security and performance. They prevent SQL injection and allow the query plan to be reused.
SELECT * FROM c WHERE c.name = @userName AND c.age > @minAge
When executing this query, you supply values for `@userName` and `@minAge` separately from the query text.
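With the Python SDK (`azure-cosmos`), parameters are passed to `ContainerProxy.query_items` as a list of `{"name": ..., "value": ...}` dictionaries, separate from the query string. The sketch below wraps that call in a hypothetical helper, `find_users`. It takes an already-constructed container client, so it holds no connection details, and it sets `enable_cross_partition_query=True` because this filter does not pin a partition key. Check the SDK version you use for the exact keyword arguments.

```python
def find_users(container, user_name, min_age):
    """Run the parameterized query against an azure-cosmos ContainerProxy.

    `container` is assumed to come from
    CosmosClient(...).get_database_client(...).get_container_client(...).
    """
    query = "SELECT * FROM c WHERE c.name = @userName AND c.age > @minAge"
    parameters = [
        {"name": "@userName", "value": user_name},
        {"name": "@minAge", "value": min_age},
    ]
    # The values travel separately from the query text, so user input is
    # never spliced into the SQL string.
    items = container.query_items(
        query=query,
        parameters=parameters,
        enable_cross_partition_query=True,
    )
    return list(items)


# Usage (requires `pip install azure-cosmos` and a real account; the endpoint,
# key, database, and container names below are placeholders):
# from azure.cosmos import CosmosClient
# client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
# container = client.get_database_client("<database>").get_container_client("<container>")
# print(find_users(container, "Alice", 30))
```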
Cross-Partition Queries
A query that filters on a single partition key value can be routed to the one partition that holds that value. If a query does not specify the partition key, it must fan out to every physical partition; this is called a cross-partition query. Cross-partition queries typically consume more request units (RUs) and have higher latency.
To keep queries efficient, include an equality filter on the partition key in your `WHERE` clause whenever possible.
-- Example with partition key (assuming 'category' is the partition key)
SELECT * FROM c WHERE c.category = "electronics" AND c.price > 100
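In the Python SDK you can also pass the partition key value to `query_items` through its `partition_key` argument, which scopes the query to that logical partition instead of fanning out. A sketch under the same assumption as the example above, that `category` is the partition key; `find_expensive_in_category` is a hypothetical helper and `container` is an existing `ContainerProxy`.

```python
def find_expensive_in_category(container, category, min_price):
    """Query one logical partition (assumes /category is the partition key)."""
    query = "SELECT * FROM c WHERE c.category = @category AND c.price > @minPrice"
    items = container.query_items(
        query=query,
        parameters=[
            {"name": "@category", "value": category},
            {"name": "@minPrice", "value": min_price},
        ],
        # Routes the query to the single logical partition for this category,
        # so no cross-partition fan-out is needed.
        partition_key=category,
    )
    return list(items)
```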
Further Learning
Explore the official Azure Cosmos DB documentation for advanced querying features, including:
- User-Defined Functions (UDFs)
- Stored Procedures
- Triggers
- Working with geospatial data
- Using different APIs (MongoDB, Gremlin, Cassandra)
Check out the Azure Cosmos DB SQL query documentation for detailed examples and syntax.