Monitoring Azure Cosmos DB
Effective monitoring is crucial for understanding the performance, availability, and health of your Azure Cosmos DB resources. This guide covers the key metrics, logs, and tools you can use to monitor your Cosmos DB instances.
Key Monitoring Areas
Azure Cosmos DB provides comprehensive monitoring capabilities across several key areas:
- Performance Metrics: Track throughput, latency, and request rates.
- Availability: Monitor uptime and error rates.
- Resource Utilization: Observe storage usage and RU consumption.
- Costs: Understand your spending related to Cosmos DB operations.
Azure Monitor for Cosmos DB
Azure Monitor is the primary service for collecting and analyzing telemetry from your Azure resources. It offers integrated monitoring for Azure Cosmos DB.
Metrics
Azure Cosmos DB exposes a rich set of metrics that provide insights into your database's performance and health. Some of the most important metrics include:
- Total Request Units: The total Request Units (RUs) consumed by all operations.
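- Request Units by database or container: Total Request Units filtered or split by the DatabaseName and CollectionName dimensions, showing which databases or containers (formerly called collections) consume the most RUs.
- Data Usage: The amount of storage consumed by your data.
- Total Requests: The number of requests served. Viewed over a time grain, it shows your request rate.
- Server Side Latency: The time the service takes to process a request, excluding network time between your client and the service.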
- Max/Avg Document Size: Useful for understanding data patterns.
- Successful/Failed Requests: Count of successful and failed operations.
You can access these metrics directly in the Azure portal under the "Monitoring" section of your Cosmos DB account. You can create custom dashboards, set alerts, and analyze trends over time.
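Total Request Units is reported as a sum over the chart's time grain. To compare it with provisioned throughput, convert it to RU/s first. The sketch below assumes you have exported per-minute Total Request Units values and that the container has a fixed 400 RU/s of manual throughput. The `RuSample` type and the sample numbers are hypothetical, and no Azure SDK is involved.

```python
from dataclasses import dataclass


@dataclass
class RuSample:
    """One exported metric data point (hypothetical shape, not an Azure SDK type)."""
    label: str               # time-grain label, e.g. "10:00"
    total_ru: float          # Total Request Units summed over the grain
    grain_seconds: int = 60  # length of the time grain in seconds


def ru_per_second(sample: RuSample) -> float:
    """Average RU/s over the grain: summed RUs divided by the grain length."""
    return sample.total_ru / sample.grain_seconds


def utilization_pct(sample: RuSample, provisioned_ru_s: float) -> float:
    """Average RU/s as a percentage of provisioned throughput."""
    return 100.0 * ru_per_second(sample) / provisioned_ru_s


if __name__ == "__main__":
    PROVISIONED_RU_S = 400  # assumed manual throughput for this example
    samples = [
        RuSample("10:00", 12_000),  # 12,000 RU in 60 s -> 200 RU/s
        RuSample("10:01", 21_600),  # -> 360 RU/s
        RuSample("10:02", 24_000),  # -> 400 RU/s, fully consumed
    ]
    for s in samples:
        print(f"{s.label}: {ru_per_second(s):.0f} RU/s, "
              f"{utilization_pct(s, PROVISIONED_RU_S):.0f}% of provisioned")
```

This is an average across the whole container. The portal's Normalized RU Consumption metric reports the highest utilization across partition key ranges. A single hot partition can therefore reach 100% while an average like this one still looks comfortable.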
Logs
Azure Cosmos DB generates diagnostic logs that can be sent to various destinations, including Log Analytics, Storage Accounts, and Event Hubs. These logs are invaluable for detailed troubleshooting.
Types of Diagnostic Logs:
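- DataPlaneRequests: Records of data operations performed against your Cosmos DB account (e.g., reads, writes, queries), including status code, duration, and request charge.
- QueryRuntimeStatistics and PartitionKeyRUConsumption: Performance-related detail, such as the queries that were executed and the RUs consumed per partition key.
- ControlPlaneRequests: Management operations on the account, such as creating resources, changing throughput, or modifying account settings, which are useful for auditing.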
To enable diagnostic logs, navigate to your Cosmos DB account in the Azure portal, open "Diagnostic settings," select the log categories to collect, and choose where to send them.
Log Analytics Queries (Kusto Query Language - KQL)
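Using Azure Log Analytics, you can query your Cosmos DB logs to gain deeper insights. The following example KQL queries assume your diagnostic setting writes to the AzureDiagnostics table (the "Azure diagnostics" collection mode). If you chose resource-specific mode, query tables such as CDBDataPlaneRequests instead.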
// Total data-plane requests per hour
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DOCUMENTDB" and Category == "DataPlaneRequests"
| summarize count() by bin(TimeGenerated, 1h)
| render timechart
// Latency distribution for point-read operations, in 10 ms buckets
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DOCUMENTDB" and Category == "DataPlaneRequests" and OperationName == "Read"
| extend LatencyMs = todouble(duration_s)
| summarize RequestCount = count() by LatencyBucketMs = bin(LatencyMs, 10)
| order by LatencyBucketMs asc
| render columnchart
// Count of failed requests (HTTP 4xx/5xx) by status code
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DOCUMENTDB" and Category == "DataPlaneRequests"
| where toint(statusCode_s) >= 400
| summarize FailedCount = count() by statusCode_s
| order by FailedCount desc
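If you export log records for offline analysis, you will often want to reduce a latency distribution to percentiles. Failures are also easier to act on when throttling (HTTP 429, request rate too large) is counted separately. The sketch below assumes the records have already been flattened into Python dicts with the hypothetical keys `operation`, `latency_ms`, and `status_code`. Real exported logs use different field names, so map them first. It computes nearest-rank percentiles.

```python
import math
from collections import Counter


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of values <= it."""
    if not values:
        raise ValueError("percentile of an empty list")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(records: list[dict]) -> dict:
    """Summarize hypothetical flattened log records of the form
    {"operation": "Read", "latency_ms": 4.2, "status_code": 200}."""
    read_latencies = [r["latency_ms"] for r in records if r["operation"] == "Read"]
    failures = Counter(r["status_code"] for r in records if r["status_code"] >= 400)
    return {
        "read_p50_ms": percentile(read_latencies, 50) if read_latencies else None,
        "read_p99_ms": percentile(read_latencies, 99) if read_latencies else None,
        "failures_by_status": failures.most_common(),  # most frequent first
        "throttled": failures[429],  # HTTP 429: request rate too large
    }


if __name__ == "__main__":
    records = [
        {"operation": "Read", "latency_ms": 3.0, "status_code": 200},
        {"operation": "Read", "latency_ms": 5.0, "status_code": 200},
        {"operation": "Read", "latency_ms": 4.0, "status_code": 404},
        {"operation": "Read", "latency_ms": 40.0, "status_code": 200},
        {"operation": "Query", "latency_ms": 12.0, "status_code": 429},
        {"operation": "Create", "latency_ms": 8.0, "status_code": 429},
    ]
    print(summarize(records))
```

A 404 on a point read often means the requested item does not exist, so check whether it is expected before treating it as an error.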
Alerts
Set up alerts in Azure Monitor to notify you when specific conditions are met. This proactive approach helps you respond quickly to potential issues.
Common Alerting Scenarios:
- High RU consumption approaching provisioned throughput (for example, on the Normalized RU Consumption metric).
- Increased request latency.
- A high number of failed or rate-limited (HTTP 429) requests.
- Significant changes in data usage.
Configure alert rules to notify you through action groups. An action group can send email or SMS messages or trigger other actions, such as webhooks.
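Azure Monitor evaluates alert rules on the service side, so you do not write this logic yourself. A small model can still help you choose a threshold and an aggregation window. The sketch below is a simplified, hypothetical model of a static-threshold rule. It averages the last `window` samples (for example, per-minute Normalized RU Consumption percentages) and compares the result with a threshold. The function names and sample values are illustrative assumptions, not Azure Monitor settings.

```python
from statistics import mean


def evaluate_alert(values: list[float], threshold: float, window: int) -> bool:
    """Return True if the average of the last `window` samples exceeds `threshold`.

    A simplified model of a static-threshold rule with Average aggregation.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    recent = values[-window:]
    if len(recent) < window:
        return False  # not enough data yet to fill the window
    return mean(recent) > threshold


def firing_points(values: list[float], threshold: float, window: int) -> list[int]:
    """Indices at which the rule would fire if evaluated after each new sample."""
    return [i for i in range(len(values))
            if evaluate_alert(values[: i + 1], threshold, window)]


if __name__ == "__main__":
    normalized_ru_pct = [55, 70, 85, 92, 96]  # hypothetical per-minute samples
    print(firing_points(normalized_ru_pct, threshold=80, window=3))
    print(firing_points(normalized_ru_pct, threshold=80, window=5))
```

With a 3-sample window, the rule fires at indices 3 and 4 (averages of about 82.3 and 91). With a 5-sample window, the same climb averages 79.6 and never crosses 80. Longer windows suppress brief spikes but react more slowly to sustained pressure.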
Azure Service Health
Azure Service Health provides information about Azure service incidents and planned maintenance that may affect your Cosmos DB resources. It's essential to monitor this for any service-impacting events.
Tools and Integrations
- Azure Portal: The central hub for viewing metrics, logs, and configuring alerts.
- Azure Monitor Agent: For collecting logs and metrics from VMs and other resources.
- Log Analytics Workspace: For storing and querying diagnostic logs.
- Application Insights: To correlate application performance with Cosmos DB performance.
- Azure CLI / PowerShell: For programmatic monitoring and automation.