Monitoring Azure Cosmos DB

Effective monitoring is crucial for understanding the performance, availability, and health of your Azure Cosmos DB resources. This guide covers the key metrics, logs, and tools you can use to monitor your Cosmos DB instances.

Key Monitoring Areas

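Azure Cosmos DB provides comprehensive monitoring capabilities across several key areas: performance and usage metrics, diagnostic logs, and alerts through Azure Monitor, and platform health events through Azure Service Health.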

Azure Monitor for Cosmos DB

Azure Monitor is the primary service for collecting and analyzing telemetry from your Azure resources. Azure Cosmos DB sends platform metrics to Azure Monitor automatically, and you can route its diagnostic logs there by creating a diagnostic setting.

Metrics

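Azure Cosmos DB exposes a rich set of metrics that provide insights into your database's performance and health. Some of the most important metrics include:

- Total Request Units: the request units (RUs) consumed, which you can split by database, container, and operation type.
- Normalized RU Consumption: the highest utilization of provisioned throughput across partition key ranges, from 0 to 100%. Sustained values near 100% mean requests are likely to be throttled.
- Total Requests: the number of requests, which you can split by status code. HTTP 429 responses indicate throttling (rate limiting).
- Server-side latency: the time the service takes to process a request.
- Service Availability: the percentage of requests that completed successfully.
- Data Usage and Document Count: the storage consumed and the number of items stored.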

You can access these metrics directly in the Azure portal under the "Monitoring" section of your Cosmos DB account. You can create custom dashboards, set alerts, and analyze trends over time.
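Beyond the portal, platform metrics can also be queried with KQL once they are exported to a Log Analytics workspace. The sketch below assumes a diagnostic setting that sends the "Requests" metric category to the workspace (an assumption about your configuration); it lists which Cosmos DB metrics have arrived and when each was last seen. The one-day window is an arbitrary choice.

```kusto
// Lists Cosmos DB platform metrics exported to Log Analytics in the last day.
// Assumption: a diagnostic setting routes the "Requests" metric category to this workspace.
AzureMetrics
| where ResourceProvider == "MICROSOFT.DOCUMENTDB"
| where TimeGenerated > ago(1d)
| summarize Samples = count(), LastSeen = max(TimeGenerated) by MetricName, UnitName
| order by MetricName asc
```

An empty result usually means the metrics are not being exported yet, even though they remain visible in the portal.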

Resource Utilization Example (Metrics Chart)

Below is a visual representation of how Request Units per second (RU/s) might look:

[Figure: Simulated RU/s consumption over time]
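To draw a chart like this from real data, one option is to turn the exported TotalRequestUnits metric into an average RU/s per minute. This is a minimal sketch that assumes the metric reaches the AzureMetrics table at a one-minute grain or finer; it aggregates across every Cosmos DB account in the workspace, so add a filter on Resource if you need a single account.

```kusto
// Average RU/s per minute, derived from the exported TotalRequestUnits metric.
// Assumption: AzureMetrics receives this metric at a 1-minute (or finer) grain.
AzureMetrics
| where ResourceProvider == "MICROSOFT.DOCUMENTDB" and MetricName == "TotalRequestUnits"
| where TimeGenerated > ago(1d)
// Total holds the RUs consumed in each sample; sum them per minute.
| summarize TotalRU = sum(Total) by bin(TimeGenerated, 1m)
// Divide by 60 seconds to get the average rate within each minute.
| extend RUPerSecond = TotalRU / 60.0
| project TimeGenerated, RUPerSecond
| render timechart
```

Averaging over a minute smooths out short bursts, so the Normalized RU Consumption metric is still the better signal for throttling risk.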

Logs

Azure Cosmos DB generates diagnostic logs that can be sent to various destinations, including Log Analytics, Storage Accounts, and Event Hubs. These logs are invaluable for detailed troubleshooting.

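Types of Diagnostic Logs:

- DataPlaneRequests: data-plane operations such as reads, writes, and queries.
- QueryRuntimeStatistics: query text (obfuscated by default) and execution details.
- PartitionKeyStatistics: storage statistics for the largest logical partition keys.
- PartitionKeyRUConsumption: RU/s consumption aggregated per logical partition key.
- ControlPlaneRequests: management operations such as creating an account, adding or removing a region, or changing throughput settings.
- API-specific categories such as MongoRequests, CassandraRequests, GremlinRequests, and TableApiRequests for the corresponding APIs.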

To enable diagnostic logs, navigate to your Cosmos DB account in the Azure portal, go to "Diagnostic settings," and configure where you want to send the logs.
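Records can take several minutes to appear after a diagnostic setting is saved. A simple check, sketched here under the assumption that the setting uses the Azure diagnostics collection mode (which writes to the AzureDiagnostics table), is to count recent records per log category:

```kusto
// Shows which Cosmos DB log categories have produced records in the last hour.
// Assumption: the diagnostic setting uses the AzureDiagnostics table.
// In resource-specific mode, the data lands in CDB* tables (for example CDBDataPlaneRequests) instead.
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DOCUMENTDB"
| where TimeGenerated > ago(1h)
| summarize Records = count(), LastRecord = max(TimeGenerated) by Category
| order by Records desc
```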

Log Analytics Queries (Kusto Query Language - KQL)

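Using Azure Log Analytics, you can query your Cosmos DB logs to gain deeper insights. The queries below assume the diagnostic setting uses the Azure diagnostics collection mode, which writes all categories to the AzureDiagnostics table; in resource-specific mode, query tables such as CDBDataPlaneRequests instead. Here are some example KQL queries: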

```kusto
// Total data-plane requests per hour
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DOCUMENTDB" and Category == "DataPlaneRequests"
| summarize count() by bin(TimeGenerated, 1h)
| render timechart

// Latency distribution for point reads, grouped into 5 ms buckets
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DOCUMENTDB" and Category == "DataPlaneRequests"
| where OperationName == "Read" and requestResourceType_s == "Document"
| extend LatencyMs = todouble(duration_s)
| summarize count() by LatencyBucketMs = bin(LatencyMs, 5)
| order by LatencyBucketMs asc
| render columnchart

// Count of failed requests (HTTP status 400 or higher) by status code
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DOCUMENTDB" and Category == "DataPlaneRequests"
| where toint(statusCode_s) >= 400
| summarize count() by statusCode_s
| order by count_ desc
```
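Using the same table, the following sketch ranks operations by total request charge, which helps identify the containers and operation types that drive cost. It assumes the Azure diagnostics collection mode and the DataPlaneRequests category; the seven-day window and the top-10 cutoff are arbitrary choices.

```kusto
// Top 10 operation/container combinations by total RU charge over the last 7 days.
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DOCUMENTDB" and Category == "DataPlaneRequests"
| where TimeGenerated > ago(7d)
// requestCharge_s is stored as a string, so convert it before summing.
| extend RequestCharge = todouble(requestCharge_s)
| summarize TotalRU = sum(RequestCharge), Requests = count() by OperationName, databaseName_s, collectionName_s
| top 10 by TotalRU desc
```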

Alerts

Set up alerts in Azure Monitor to notify you when specific conditions are met. This proactive approach helps you respond quickly to potential issues.

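Common Alerting Scenarios:

- Normalized RU Consumption stays near 100% for several minutes, signaling a risk of throttling.
- The number of throttled requests (HTTP status 429) rises above a threshold.
- Service Availability drops below your target.
- Server-side latency exceeds your usual baseline.

Configure alert rules to send notifications through action groups, which can deliver email, SMS, push, or voice notifications and can also call webhooks or automation such as Azure Functions or Logic Apps.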
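Alerts can also be driven by log queries. As a sketch of a log search alert rule, the query below returns one row for each container and five-minute window in which throttled (HTTP 429) requests exceeded a threshold, so the rule would fire whenever the query returns rows. The threshold of 100 is a hypothetical value, and the query assumes the Azure diagnostics collection mode.

```kusto
// Candidate query for a log search alert on throttling (HTTP 429).
// ThrottleThreshold is a hypothetical value; tune it for your workload.
let ThrottleThreshold = 100;
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DOCUMENTDB" and Category == "DataPlaneRequests"
| where toint(statusCode_s) == 429
| summarize ThrottledRequests = count() by databaseName_s, collectionName_s, bin(TimeGenerated, 5m)
| where ThrottledRequests > ThrottleThreshold
```

A low rate of 429 responses is normal for a workload running close to its provisioned throughput, which is why the sketch alerts on a count per window rather than on any single throttled request.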

Azure Service Health

Azure Service Health provides information about Azure service incidents and planned maintenance that may affect your Cosmos DB resources. It's essential to monitor this for any service-impacting events.
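If the subscription's activity log is also exported to the same Log Analytics workspace (an assumption; this requires its own diagnostic setting on the activity log), Service Health events can be reviewed alongside your Cosmos DB logs. The sketch below lists recent events from the AzureActivity table:

```kusto
// Recent Service Health events from the subscription activity log.
// Assumption: the activity log is exported to this workspace (AzureActivity table).
AzureActivity
| where CategoryValue == "ServiceHealth"
| where TimeGenerated > ago(30d)
| project TimeGenerated, OperationNameValue, ActivityStatusValue, Properties
| order by TimeGenerated desc
```

These events are subscription-wide rather than specific to Cosmos DB, so check the Properties column for the affected services and regions.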

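Tools and Integrations

- Azure Cosmos DB insights in Azure Monitor: prebuilt views of throughput, requests, storage, availability, and latency across your accounts.
- Azure Monitor Workbooks and dashboards: custom visualizations that combine metrics and log queries.
- Log Analytics: ad hoc and saved KQL queries over diagnostic logs.
- Event Hubs: streaming of diagnostic logs to external analytics or SIEM tools.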

Best Practice: Regularly review your monitoring data and adjust your provisioning and configuration to optimize performance and cost.
Important: Ensure you have appropriate permissions to access and configure monitoring settings for your Azure Cosmos DB resources.