Monitoring Azure Cosmos DB

Effective monitoring is essential for understanding the health, performance, and usage patterns of your Azure Cosmos DB accounts. This document describes the tools and metrics available for monitoring your Cosmos DB resources.

Key Monitoring Tools

Azure Cosmos DB integrates with several Azure monitoring services:

  • Azure Monitor: The primary monitoring solution for Azure resources. It collects and analyzes telemetry data, allowing you to visualize performance, detect anomalies, and set up alerts.
  • Azure Log Analytics: A tool within Azure Monitor for querying and analyzing collected logs and performance data with Kusto Query Language (KQL).
  • Azure Advisor: Offers recommendations to optimize performance, security, and cost for your Azure resources.

Metrics for Monitoring

Azure Cosmos DB exposes a rich set of metrics. Some of the most important ones include:

Throughput Metrics

  • RU Consumption: Request Units consumed by your operations. Essential for capacity planning and cost management.
  • Provisioned RU/s: The total Request Units provisioned for your container or database.
  • Max RU/s: The maximum RU/s consumed during a given interval.
  • Throttled Requests: The number of requests rejected with HTTP status code 429 because they exceeded the provisioned RU/s.
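
To make the relationship between these metrics concrete, the following Python sketch computes RU utilization and a throttle rate from a few samples. The `ThroughputSample` type, its field names, and the sample values are hypothetical and exist only for illustration; in practice the numbers would come from Azure Monitor.

```python
from dataclasses import dataclass


@dataclass
class ThroughputSample:
    """One hypothetical per-second observation for a container."""
    consumed_rus: float      # RUs consumed in this second
    provisioned_rus: float   # RU/s provisioned at that time
    total_requests: int      # requests issued in this second
    throttled_requests: int  # requests rejected with HTTP 429


def utilization_percent(sample: ThroughputSample) -> float:
    """Consumed RUs as a percentage of the provisioned RU/s."""
    return 100.0 * sample.consumed_rus / sample.provisioned_rus


def throttle_rate(samples: list[ThroughputSample]) -> float:
    """Fraction of all requests that were throttled across the samples."""
    total = sum(s.total_requests for s in samples)
    throttled = sum(s.throttled_requests for s in samples)
    return throttled / total if total else 0.0


# Made-up sample data: the second sample saturates the provisioned RU/s.
samples = [
    ThroughputSample(300.0, 400.0, 50, 0),
    ThroughputSample(400.0, 400.0, 80, 20),
]
peak = max(utilization_percent(s) for s in samples)
print(f"peak utilization: {peak:.0f}%, throttle rate: {throttle_rate(samples):.2%}")
```

Sustained utilization near 100% together with a non-zero throttle rate is the usual signal that provisioned throughput is too low for the workload.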

Latency Metrics

  • Average Latency: The average time taken for requests to complete.
  • Max Latency: The maximum latency observed for requests.
  • Read Latency: Latency specifically for read operations.
  • Write Latency: Latency specifically for write operations.
  • System Latency: The time the Azure Cosmos DB service itself takes to process a request, excluding network time between the client and the service.
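
As a rough illustration of how the average and maximum can diverge, and why it helps to separate reads from writes, the sketch below summarizes a few latency samples per operation kind. The sample values and the `summarize_latency` helper are hypothetical, not part of any Azure SDK.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (operation_kind, latency_ms) samples; one slow read is included.
latency_samples = [
    ("read", 4.0),
    ("read", 6.0),
    ("write", 9.0),
    ("write", 11.0),
    ("read", 50.0),
]


def summarize_latency(samples):
    """Return the average and maximum latency per operation kind."""
    by_kind = defaultdict(list)
    for kind, latency_ms in samples:
        by_kind[kind].append(latency_ms)
    return {
        kind: {"avg_ms": mean(values), "max_ms": max(values)}
        for kind, values in by_kind.items()
    }


for kind, stats in summarize_latency(latency_samples).items():
    print(kind, stats)
```

A single slow request barely moves the average over a large sample but shows up immediately in the maximum, so it is worth charting both.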

Storage Metrics

  • Document Count: The number of documents stored in your container.
  • Total Storage: The total amount of storage consumed by your container.

Availability Metrics

  • Availability: The percentage of time your Cosmos DB endpoint is available.

Using Azure Monitor Dashboards

You can create custom dashboards in Azure Monitor to visualize key metrics for your Cosmos DB account. Here’s how:

  1. Navigate to the Azure portal and select your Cosmos DB account.
  2. In the left-hand menu, under "Monitoring," click on "Metrics."
  3. Choose the metrics you want to visualize (e.g., RU Consumption, Latency).
  4. Select the desired time range and aggregation (e.g., Average, Maximum).
  5. Click "Pin to dashboard" to add the chart to a new or existing dashboard.

Tip

Create separate dashboards for different aspects of monitoring, such as performance, cost, and health.

Setting Up Alerts

Proactive alerting helps you respond quickly to issues. Configure alerts in Azure Monitor based on specific metric thresholds:

  1. Navigate to the Azure portal and select your Cosmos DB account.
  2. In the left-hand menu, under "Monitoring," click on "Alerts."
  3. Click "+ Create alert rule."
  4. Define the scope (your Cosmos DB resource).
  5. Configure the condition (e.g., "Throttled Requests" is greater than 0 over 5 minutes).
  6. Specify the action group (e.g., send an email, trigger a webhook).
  7. Give your alert a name and description.
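
The logic behind a condition such as "Throttled Requests is greater than 0 over 5 minutes" can be sketched in a few lines of Python. Azure Monitor evaluates alert rules itself; this is only an illustration of a threshold over a time window, assuming a "Total" aggregation. The `should_fire` function and the data points are hypothetical.

```python
from datetime import datetime, timedelta


def should_fire(points, now, window=timedelta(minutes=5), threshold=0):
    """Return True when the total over the look-back window exceeds the threshold.

    points: iterable of (timestamp, throttled_request_count) pairs.
    """
    window_start = now - window
    total = sum(count for ts, count in points if window_start < ts <= now)
    return total > threshold


now = datetime(2024, 1, 1, 12, 0)
points = [
    (now - timedelta(minutes=7), 3),  # outside the 5-minute window, ignored
    (now - timedelta(minutes=2), 0),  # inside the window, no throttling
]
print(should_fire(points, now))  # only the in-window point counts

points.append((now - timedelta(minutes=1), 2))  # throttling inside the window
print(should_fire(points, now))
```

A threshold of 0 is strict: any throttled request in the window triggers the alert. Raising the threshold or widening the window trades sensitivity for fewer notifications.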

Common alert scenarios include:

  • High RU consumption or throttling
  • Increased latency
  • Low availability
  • Significant increases in storage usage

Querying Logs with Log Analytics

To gain deeper insights, you can export Cosmos DB diagnostic logs to Log Analytics. This allows you to run complex queries for troubleshooting and analysis.

Enabling Diagnostic Settings

  1. Navigate to your Cosmos DB account in the Azure portal.
  2. Under "Monitoring," select "Diagnostic settings."
  3. Click "+ Add diagnostic setting."
  4. Select the log categories and metrics you want to collect, then choose "Send to Log Analytics workspace."
  5. Select your Log Analytics workspace and click "Save."
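
The same configuration can also be expressed declaratively. The sketch below builds a request body in the general shape used by Azure Monitor diagnostic settings (`properties.workspaceId`, `properties.logs`, `properties.metrics`). Treat it as an assumption-laden illustration: the `build_diagnostic_setting` helper is hypothetical, the workspace ID is a placeholder, and the category names (`DataPlaneRequests`, `Requests`) should be checked against the categories your account lists in the portal.

```python
import json


def build_diagnostic_setting(workspace_resource_id, log_categories, metric_categories):
    """Build a diagnostic-setting body that routes the given categories to a workspace."""
    return {
        "properties": {
            "workspaceId": workspace_resource_id,
            "logs": [{"category": c, "enabled": True} for c in log_categories],
            "metrics": [{"category": c, "enabled": True} for c in metric_categories],
        }
    }


body = build_diagnostic_setting(
    # Placeholder resource ID; substitute your own subscription, group, and workspace.
    workspace_resource_id=(
        "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
        "/providers/Microsoft.OperationalInsights/workspaces/<workspace-name>"
    ),
    log_categories=["DataPlaneRequests"],  # assumed category name, verify in the portal
    metric_categories=["Requests"],        # assumed category name, verify in the portal
)
print(json.dumps(body, indent=2))
```

Keeping the setting in a template or script makes it easier to apply the same categories consistently across several accounts.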

Example Log Analytics Queries

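Here are some example KQL queries against the AzureDiagnostics table. If your diagnostic setting uses resource-specific tables instead, the same data lands in tables such as CDBDataPlaneRequests, and the column names differ.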


// Get all throttled requests in the last 24 hours
AzureDiagnostics
| where TimeGenerated > ago(24h)
| where ResourceProvider == "MICROSOFT.DOCUMENTDB" and Category == "DataRequests"
| where statusCode_s == "429" // 429 is the status code for throttling
| project TimeGenerated, Resource, OperationName, statusCode_s, duration_s, requestCharge_s
| order by TimeGenerated desc

// Average request latency (ms) per operation type in the last hour
// OperationName (Read, Create, Upsert, ...) separates reads from writes
AzureDiagnostics
| where TimeGenerated > ago(1h)
| where ResourceProvider == "MICROSOFT.DOCUMENTDB" and Category == "DataRequests"
| summarize avgLatencyMs = avg(todouble(duration_s)) by OperationName
| order by avgLatencyMs desc

Important

Ensure you understand the cost implications of sending logs to Log Analytics and querying large datasets.
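
A back-of-the-envelope estimate helps size that cost before enabling request logging. The sketch below assumes one log record per request and that billing is driven mainly by ingested volume. The record size, request rate, and price per GB are hypothetical inputs, so substitute the figures from your own workload and the current pricing page.

```python
def daily_ingestion_gb(requests_per_second: float, bytes_per_record: float) -> float:
    """Estimate daily log volume in GB, assuming one log record per request."""
    seconds_per_day = 86_400
    return requests_per_second * bytes_per_record * seconds_per_day / 1e9


def monthly_ingestion_cost(gb_per_day: float, price_per_gb: float, days: int = 30) -> float:
    """Estimate the monthly ingestion cost for a given daily volume and per-GB price."""
    return gb_per_day * price_per_gb * days


# Hypothetical workload: 500 requests/s and roughly 1 KB per log record.
gb_per_day = daily_ingestion_gb(requests_per_second=500, bytes_per_record=1_000)
# Hypothetical price per GB; look up the real rate for your region.
cost = monthly_ingestion_cost(gb_per_day, price_per_gb=2.0)
print(f"{gb_per_day:.1f} GB/day, about {cost:.0f} per month at the assumed price")
```

If the estimate is too high, consider enabling only the log categories you actually query.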

Azure Advisor Recommendations

Azure Advisor analyzes your resource configuration and usage telemetry to provide recommendations. For Cosmos DB, Advisor can offer insights into:

  • Performance: Recommending adjustments to RU/s based on observed usage.
  • Cost: Identifying underutilized resources that could be scaled down.
  • High Availability: Suggesting geo-replication configurations.

Review these recommendations regularly in the Azure portal to optimize your Cosmos DB deployment.

Monitoring Best Practices

  • Define SLOs: Establish Service Level Objectives for key metrics like latency and availability.
  • Set up Alerts: Configure alerts for critical thresholds to enable proactive response.
  • Regularly Review Dashboards: Use Azure Monitor dashboards to get a quick overview of your database's health.
  • Analyze Trends: Look for patterns in metrics over time to understand performance and capacity needs.
  • Use Log Analytics for Deep Dives: Leverage Log Analytics for detailed troubleshooting when issues arise.
  • Monitor Across Regions: If your account spans multiple regions, and especially if it uses multi-region writes, monitor latency and availability in each region separately.
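
Several of these practices combine naturally: define SLO targets, then check each region against them. The following sketch is one minimal way to do that. The `slo_report` helper, the region statistics, and the targets (99.9% availability, 10 ms average latency) are hypothetical examples, not recommended values.

```python
def slo_report(region_stats, availability_slo=99.9, latency_slo_ms=10.0):
    """Return, per region, the list of SLOs that are currently breached.

    region_stats: {region: {"availability_pct": float, "avg_latency_ms": float}}
    """
    report = {}
    for region, stats in region_stats.items():
        breaches = []
        if stats["availability_pct"] < availability_slo:
            breaches.append("availability")
        if stats["avg_latency_ms"] > latency_slo_ms:
            breaches.append("latency")
        report[region] = breaches
    return report


# Made-up per-region figures for illustration.
region_stats = {
    "West US": {"availability_pct": 99.99, "avg_latency_ms": 6.2},
    "North Europe": {"availability_pct": 99.95, "avg_latency_ms": 14.8},
    "Southeast Asia": {"availability_pct": 99.80, "avg_latency_ms": 8.1},
}

for region, breaches in slo_report(region_stats).items():
    status = ", ".join(breaches) if breaches else "within SLO"
    print(f"{region}: {status}")
```

Evaluating regions individually avoids a healthy region masking a degraded one in an account-wide average.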