Introduction to Performance
Azure Cosmos DB is a globally distributed, multi-model database service that enables you to rapidly develop and scale high-performance applications. Achieving optimal performance is crucial for user experience, cost-efficiency, and application reliability. This guide provides key tips and best practices.
Indexing Strategies
Cosmos DB automatically indexes data, but understanding and optimizing this process can significantly boost query performance.
Automatic Indexing Policy
By default, Cosmos DB indexes every property of every document. If your queries touch only a subset of fields, consider customizing the indexing policy so that only the fields you query, filter, or sort on are indexed. This lowers the RU cost of writes and reduces index storage.
Example: Exclude large or infrequently used fields.
{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    { "path": "/*" }
  ],
  "excludedPaths": [
    { "path": "/largeField/*" },
    { "path": "/metadata/internal/*" }
  ]
}
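The policy above opts specific paths out of the index. The inverse approach, opting in only the queried paths, can be sketched in Python as a plain dictionary. The helper name `build_opt_in_policy` and the property names are hypothetical, and the sketch assumes the `includedPaths`/`excludedPaths` keys and the `/?` suffix for the scalar value at a path. It only assembles the policy document and does not call any Cosmos DB API.

```python
import json


def build_opt_in_policy(queried_properties):
    """Build an indexing policy that indexes only the given top-level properties.

    Hypothetical helper: it assembles the policy document and nothing else.
    """
    return {
        "indexingMode": "consistent",
        "automatic": True,
        # "/?" targets the scalar value stored at each property path.
        "includedPaths": [{"path": f"/{name}/?"} for name in queried_properties],
        # Everything that is not explicitly included stays out of the index.
        "excludedPaths": [{"path": "/*"}],
    }


policy = build_opt_in_policy(["status", "timestamp"])
print(json.dumps(policy, indent=2))
```

An opt-in policy keeps write costs low but means any new query field must be added to the policy before queries on it can use the index.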
Composite Indexes
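When queries filter on several properties, or filter on one property and sort by another, composite indexes can substantially reduce RU consumption by letting Cosmos DB serve the query from a single index. Queries with an ORDER BY clause on two or more properties require a matching composite index.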
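Example: A composite index on status (ascending) and timestamp (descending), declared under compositeIndexes in the indexing policy.
{
  "compositeIndexes": [
    [
      { "path": "/status", "order": "ascending" },
      { "path": "/timestamp", "order": "descending" }
    ]
  ]
}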
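Whether a multi-property ORDER BY can use a composite index on /status (ascending) and /timestamp (descending) depends on the path sequence and the sort directions. The Python sketch below encodes the matching rule as understood here: the paths must appear in the same sequence, and the directions must either all match the index or all be reversed. The function name `supports_order_by` is hypothetical, and the check is a simplification rather than the query engine's actual logic.

```python
def supports_order_by(composite_index, order_by):
    """Return True if the ORDER BY clause can use the composite index.

    Assumed rule: same paths in the same sequence, and directions that
    either all match the index definition or are all reversed.
    """
    if len(composite_index) != len(order_by):
        return False
    index_paths = [entry["path"] for entry in composite_index]
    query_paths = [path for path, _ in order_by]
    if index_paths != query_paths:
        return False
    matches = [
        entry["order"] == direction
        for entry, (_, direction) in zip(composite_index, order_by)
    ]
    return all(matches) or not any(matches)


index = [
    {"path": "/status", "order": "ascending"},
    {"path": "/timestamp", "order": "descending"},
]
print(supports_order_by(index, [("/status", "ascending"), ("/timestamp", "descending")]))
print(supports_order_by(index, [("/status", "ascending"), ("/timestamp", "ascending")]))
```

If a query needs a mixed direction that the index does not cover, a second composite index with that direction is the usual remedy.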
Range Indexes for Numerical/Date Data
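Ensure that numerical and date/time fields used in range queries (e.g., >, <) are covered by an included path in the indexing policy rather than excluded by mistake. Store date/time values in a sortable format, such as ISO 8601 strings, so that range comparisons behave as expected.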
Effective Partitioning
A well-chosen partition key is fundamental to distributing your data and request load evenly across partitions, preventing hot partitions.
Choose a High-Cardinality Partition Key
Select a partition key with a large number of distinct values. This ensures that data is spread across many logical partitions, leading to better scalability and request distribution.
Good choices: User IDs, Session IDs, Device IDs.
Poor choices: Boolean flags, Status fields with few unique values.
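The effect of cardinality on distribution can be illustrated with a small simulation. The sketch below hashes partition key values into a fixed number of buckets. MD5 and the bucket count of 10 are stand-ins chosen for illustration, not the hash function or partition layout that Cosmos DB uses, and `bucket_counts` is a hypothetical helper.

```python
import hashlib
from collections import Counter


def bucket_counts(partition_key_values, bucket_count=10):
    """Count how many items land in each bucket for the given key values."""
    counts = Counter()
    for value in partition_key_values:
        digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
        counts[int(digest, 16) % bucket_count] += 1
    return counts


# High cardinality: 10,000 distinct user IDs.
user_id_counts = bucket_counts(f"user-{i}" for i in range(10_000))
# Low cardinality: a boolean flag repeated 10,000 times.
flag_counts = bucket_counts(i % 2 == 0 for i in range(10_000))

print("buckets used by user IDs:", len(user_id_counts))
print("buckets used by boolean flag:", len(flag_counts))
```

With only two distinct values, the boolean key can never occupy more than two buckets, no matter how many buckets are available.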
Avoid Hot Partitions
Monitor your partition usage. If a single partition is consistently consuming a disproportionate amount of Request Units (RUs) or storing significantly more data, your partition key strategy may need adjustment.
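A simple way to act on such monitoring data is to compare each partition's RU consumption with an even share. The following Python sketch assumes you already have per-partition RU totals from your monitoring source. The function name `find_hot_partitions`, the sample numbers, and the threshold of twice the even share are all hypothetical choices.

```python
def find_hot_partitions(ru_by_partition, threshold=2.0):
    """Return partition IDs whose RU consumption exceeds threshold x the even share."""
    total = sum(ru_by_partition.values())
    if total == 0:
        return []
    even_share = total / len(ru_by_partition)
    return sorted(
        partition_id
        for partition_id, consumed in ru_by_partition.items()
        if consumed > threshold * even_share
    )


sample = {"p0": 1_200, "p1": 900, "p2": 9_500, "p3": 1_100}
print(find_hot_partitions(sample))
```

The same comparison works for stored bytes per partition; only the input dictionary changes.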
Logical Partition Size Limits
Be aware of the 20 GB limit per logical partition, that is, for all items that share one partition key value. Design your partition key so that no single key value accumulates that much data.
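A rough estimate of how long one partition key value can keep growing helps validate a design early. The Python sketch below divides the 20 GB limit by a daily ingest rate. The item size and write rate are made-up inputs, the limit is read as 20 GiB, and index overhead is ignored, so treat the result as an optimistic upper bound.

```python
LOGICAL_PARTITION_LIMIT_BYTES = 20 * 1024**3  # the 20 GB limit, read as GiB


def days_until_partition_full(avg_item_bytes, items_per_day):
    """Estimate days until one logical partition reaches the size limit."""
    bytes_per_day = avg_item_bytes * items_per_day
    return LOGICAL_PARTITION_LIMIT_BYTES / bytes_per_day


# Hypothetical workload: 2 KiB items, 10,000 new items per day for one key value.
days = days_until_partition_full(avg_item_bytes=2_048, items_per_day=10_000)
print(f"{days:.0f} days")
```

If the estimate is shorter than the data's retention period, a more granular key (for example, one that combines an ID with a time bucket) is worth considering.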
Throughput Management (RUs)
Request Units (RUs) are the measure of throughput in Cosmos DB. Efficiently managing RUs impacts both performance and cost.
Autoscale vs. Manual Throughput
Autoscale is ideal for unpredictable workloads: it scales RU/s automatically between 10% of the configured maximum and the maximum, based on usage. Manual throughput is suitable for predictable, steady workloads where you can provision precisely.
Consider autoscale for development and testing environments, and potentially for production if your traffic patterns are highly variable.
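The trade-off can be made concrete with a small calculation. The Python sketch below compares billed RU/s-hours for a fixed provision against autoscale, assuming autoscale bills each hour for the highest RU/s it scaled to, with a floor of 10% of the maximum. The `rate_multiplier` of 1.5 is an assumed price ratio between autoscale and manual throughput, so check current pricing before relying on it. The function names and the sample workload are hypothetical.

```python
def manual_ru_hours(provisioned_ru, hours):
    """RU/s-hours billed when a fixed throughput is provisioned."""
    return provisioned_ru * hours


def autoscale_ru_hours(max_ru, hourly_peak_ru, rate_multiplier=1.5):
    """RU/s-hours billed under autoscale, weighted by an assumed price multiplier."""
    floor_ru = max_ru / 10  # autoscale does not scale below 10% of the maximum
    billed = sum(min(max_ru, max(floor_ru, peak)) for peak in hourly_peak_ru)
    return billed * rate_multiplier


# Hypothetical workload: three hours with very different peaks.
hourly_peaks = [100, 500, 1000]
manual = manual_ru_hours(1000, len(hourly_peaks))
autoscale = autoscale_ru_hours(1000, hourly_peaks)
print(manual, autoscale)
```

Under these assumptions autoscale wins when the workload spends much of its time well below the peak, and manual throughput wins when usage stays close to the provisioned value.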
Choose Between Database-Level and Container-Level Throughput
When several containers have similar, modest needs and can share capacity, provision throughput at the database level so that the containers draw from a common pool. For containers with distinct performance requirements, provision dedicated throughput at the container level so that other containers cannot consume it.
Optimize Operations for RU Efficiency
Understand the RU cost of different operations. Point reads and writes are generally cheaper than complex queries. Design your application to use the most cost-effective operations possible.
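Tip: For bulk operations within a single logical partition, stored procedures run the work on the server and reduce network round trips. A stored procedure executes against one partition key value, so it does not help with operations that span partitions.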
Batching Operations
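When performing multiple inserts or updates, group them into a transactional batch or a stored procedure rather than issuing individual requests. Both operate on a single partition key value, so group items by partition key first. This avoids one network round trip per item.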
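Before sending anything, the items have to be grouped so that each batch targets one partition key value. The Python sketch below does only that planning step and does not call any SDK. The limit of 100 operations per batch is an assumption based on the transactional batch limit as understood here, and `plan_batches` and the `userId` field are hypothetical names.

```python
from collections import defaultdict

MAX_OPERATIONS_PER_BATCH = 100  # assumed per-batch operation limit


def plan_batches(items, partition_key_field, max_size=MAX_OPERATIONS_PER_BATCH):
    """Group items by partition key, then split each group into batches."""
    grouped = defaultdict(list)
    for item in items:
        grouped[item[partition_key_field]].append(item)

    batches = []
    for key, group in grouped.items():
        for start in range(0, len(group), max_size):
            batches.append((key, group[start:start + max_size]))
    return batches


items = [{"id": str(i), "userId": f"user-{i % 2}"} for i in range(250)]
batches = plan_batches(items, "userId")
print([(key, len(batch)) for key, batch in batches])
```

Each resulting tuple can then be handed to whichever batching mechanism the application uses, one call per tuple.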
Query Optimization
Write efficient queries that leverage indexes and minimize unnecessary data retrieval.
Use System Functions Wisely
Functions like LOWER(), UPPER(), or mathematical functions applied to indexed fields can prevent index usage. If possible, store data in the desired case or format.
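One way to store data in the desired case is to write a normalized copy of the field alongside the original. The Python sketch below adds a lowercase email to a document before it is written. The field name `emailLower` and the helper `with_normalized_email` are hypothetical, and the query string only illustrates how the filter could then avoid LOWER().

```python
def with_normalized_email(document):
    """Return a copy of the document with a lowercase copy of its email."""
    normalized = dict(document)
    normalized["emailLower"] = document["email"].strip().lower()
    return normalized


doc = with_normalized_email({"id": "1", "email": " Ada@Example.COM "})

# The filter compares the stored value directly instead of calling LOWER(c.email).
query = "SELECT c.id FROM c WHERE c.emailLower = @email"
print(doc["emailLower"])
```

The lookup value passed as @email needs the same normalization, otherwise the comparison will miss matching documents.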
Avoid `SELECT *`
Only project the fields you need. Selecting all fields in a large document increases network traffic and processing overhead.
Example:
SELECT c.id, c.name, c.email FROM c WHERE c.isActive = true
Leverage the Azure Cosmos DB Emulator
Test your queries against the Cosmos DB Emulator. It provides a local development environment to debug and optimize queries without incurring cloud costs.
Consider Stored Procedures and User-Defined Functions (UDFs)
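For complex logic or operations that must run transactionally on the server, stored procedures can improve performance by reducing network round trips. UDFs extend the query language with custom logic, but a UDF in a filter generally cannot use the index, so apply UDFs only after other filters have narrowed the result set.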
Connection Management
Efficiently managing connections to Cosmos DB can prevent performance bottlenecks.
Use SDKs and Keep Connections Warm
Use the official Azure Cosmos DB SDKs. These SDKs implement efficient connection pooling and retry logic. Avoid creating new client instances for every operation; reuse a single client instance throughout your application's lifecycle.
Example (conceptual):
using Microsoft.Azure.Cosmos;

// Initialize once and share the instance (for example, as a singleton)
CosmosClient client = new CosmosClient("YOUR_COSMOS_DB_CONNECTION_STRING");

// Reuse the client for all operations
Container container = client.GetContainer("your_database", "your_container");
// ... perform operations using container ...
Tune Request Options
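In the .NET SDK, tune client options such as MaxRetryAttemptsOnRateLimitedRequests and MaxRetryWaitTimeOnRateLimitedRequests to handle throttling (HTTP 429 responses) gracefully. Other SDKs expose equivalent retry settings under different names.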
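The two settings express two budgets: a maximum number of retries and a maximum cumulative wait. The Python sketch below shows how such budgets interact. It is a simplified illustration, not the SDK's implementation: `RateLimitedError` is a hypothetical stand-in for a 429 response that carries a retry-after hint, and the defaults of 9 attempts and 30 seconds are illustrative values.

```python
import time


class RateLimitedError(Exception):
    """Hypothetical stand-in for an HTTP 429 response from the service."""

    def __init__(self, retry_after_seconds):
        super().__init__("request rate too large")
        self.retry_after_seconds = retry_after_seconds


def call_with_retries(operation, max_attempts=9, max_wait_seconds=30.0, sleep=time.sleep):
    """Run `operation`, retrying on rate limiting within both budgets."""
    waited = 0.0
    for attempt in range(max_attempts + 1):
        try:
            return operation()
        except RateLimitedError as error:
            out_of_attempts = attempt == max_attempts
            over_budget = waited + error.retry_after_seconds > max_wait_seconds
            if out_of_attempts or over_budget:
                raise
            sleep(error.retry_after_seconds)
            waited += error.retry_after_seconds


attempts = []


def flaky_operation():
    """Fail with a rate-limit error twice, then succeed."""
    attempts.append(1)
    if len(attempts) < 3:
        raise RateLimitedError(retry_after_seconds=0.01)
    return "ok"


result = call_with_retries(flaky_operation, sleep=lambda seconds: None)
print(result, len(attempts))
```

Raising either budget makes the client more patient under throttling at the cost of longer worst-case latency. Persistent throttling usually points to insufficient provisioned throughput or a hot partition, not to retry settings.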
Consider Gateway vs. Direct Mode
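The .NET and Java SDKs default to Direct mode (TCP), which offers lower latency because requests go straight to the backend replicas. Gateway mode (HTTPS) sends requests through a gateway endpoint and may be preferable in network environments with strict firewall rules or limited socket connections. Some SDKs, such as those for Python and Node.js, support only Gateway mode.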