Metrics for Application Services
This document provides an in-depth look at the metrics available for monitoring your Azure App Services. Understanding these metrics is crucial for diagnosing performance issues, capacity planning, and ensuring the health and availability of your applications.
Key Metrics Categories
Metrics are broadly categorized to help you quickly identify areas of concern:
- Performance Metrics: These measure the responsiveness and efficiency of your application, such as request duration, CPU time, and memory usage.
- Resource Utilization: These metrics indicate how your App Service is using its allocated resources, including CPU, memory, disk, and network bandwidth.
- Request Metrics: These focus on the incoming traffic and responses, such as the number of requests, HTTP status codes, and request failures.
- Health Metrics: These provide insights into the overall health and availability of your application, including uptime and response times from a user's perspective.
Commonly Used Metrics
Here are some of the most frequently monitored metrics:
1. HTTP Server Errors
Description: The number of HTTP requests that resulted in a server-side error (HTTP status codes 5xx). A spike in this metric often indicates an issue within your application code or backend services.
How to interpret: Monitor this value closely. Any increase warrants immediate investigation. Correlate with other metrics like CPU usage, memory, and application logs.
Example:
HTTP Server Errors: 5 (Last Hour)
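A raw error count is easier to judge when expressed as a share of total traffic. The following is a minimal sketch, not part of any Azure SDK: the function name `server_error_rate` is hypothetical, and the request total of 2,000 is an assumed figure used only for illustration.

```python
def server_error_rate(server_errors: int, total_requests: int) -> float:
    """Return 5xx errors as a percentage of all requests (0.0 if no traffic)."""
    if total_requests <= 0:
        return 0.0
    return 100.0 * server_errors / total_requests


# 5 server errors in the last hour (from the example above) against an
# assumed 2,000 total requests in the same window.
rate = server_error_rate(5, 2000)
print(f"{rate:.2f}% of requests failed with a 5xx status")
```

Five errors may be negligible at high traffic and alarming at low traffic, which is why correlating with the Requests metric is useful.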
2. HTTP 500 Errors
Description: A specific subset of server errors, focusing on the ubiquitous 500 Internal Server Error. This is a direct indicator of an unexpected condition encountered by the server that prevented it from fulfilling the request.
How to interpret: This is a critical metric. Investigate application logs and recent code deployments when this metric rises.
3. CPU Time
Description: The total amount of processor time consumed by your app, which Azure Monitor reports in seconds. This metric helps you understand the computational load on your application.
How to interpret: Consistently high CPU usage might indicate inefficient code, a need for more powerful instances, or a sudden surge in traffic. Look for patterns and correlate with request volume.
Example Chart Data:
[
{"timestamp": "2023-10-27T10:00:00Z", "value": 150},
{"timestamp": "2023-10-27T10:05:00Z", "value": 165},
{"timestamp": "2023-10-27T10:10:00Z", "value": 180},
{"timestamp": "2023-10-27T10:15:00Z", "value": 170}
]
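As a rough illustration of how such chart data might be summarized, the sketch below loads the four sample points shown above and reports their average, peak, and the time span they cover. The variable names are illustrative, and the data is the sample data only, not real measurements.

```python
import json
from datetime import datetime

# Sample points copied from the chart data above.
raw = """[
  {"timestamp": "2023-10-27T10:00:00Z", "value": 150},
  {"timestamp": "2023-10-27T10:05:00Z", "value": 165},
  {"timestamp": "2023-10-27T10:10:00Z", "value": 180},
  {"timestamp": "2023-10-27T10:15:00Z", "value": 170}
]"""


def parse_ts(ts: str) -> datetime:
    # Swap the trailing "Z" for an explicit offset so that
    # datetime.fromisoformat also accepts it on Python versions before 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


points = json.loads(raw)
values = [p["value"] for p in points]

average = sum(values) / len(values)
peak = max(points, key=lambda p: p["value"])
span_minutes = (
    parse_ts(points[-1]["timestamp"]) - parse_ts(points[0]["timestamp"])
).total_seconds() / 60

print(f"average={average:.2f}, peak={peak['value']} at {peak['timestamp']}, "
      f"span={span_minutes:.0f} min")
```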
4. Memory Working Set
Description: The amount of physical memory (in MB) currently used by your App Service. This metric is vital for detecting memory leaks or excessive memory consumption.
How to interpret: A constantly increasing memory working set over time, without a corresponding increase in workload, suggests a memory leak. High memory usage can lead to performance degradation and application crashes.
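One simple way to check for the steady upward trend described above is to fit a straight line to recent samples and look at its slope. This is a minimal sketch under stated assumptions: the helper `trend_slope` is hypothetical, the readings are invented for illustration, and the samples are assumed to be evenly spaced and taken under a roughly constant workload.

```python
def trend_slope(samples: list[float]) -> float:
    """Least-squares slope of evenly spaced samples, in units per interval."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


# Hypothetical working-set readings in MB, one per hour, under steady load.
working_set_mb = [512, 530, 547, 566, 583, 601]
slope = trend_slope(working_set_mb)
print(f"working set grows about {slope:.1f} MB per hour")
```

A positive slope on its own is not proof of a leak. It only becomes suspicious when request volume over the same period is flat.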
5. Requests
Description: The total number of HTTP requests received by your App Service. This is a fundamental metric for understanding traffic patterns and load.
How to interpret: Monitor this to understand peak hours, identify trends, and correlate with other metrics to assess resource utilization under load.
6. Response Time
Description: The average time it takes for your App Service to respond to an HTTP request, which Azure Monitor reports in seconds. This is a key indicator of user experience.
How to interpret: Increasing response times suggest performance bottlenecks. Investigate CPU, memory, disk I/O, and external dependencies when response times degrade.
Accessing and Visualizing Metrics
You can access and visualize these metrics through several interfaces:
- Azure Portal: Navigate to your App Service resource and select the "Metrics" blade.
- Azure CLI: Use commands like az monitor metrics list to retrieve metric data programmatically.
- Azure Monitor REST API: For custom integrations and advanced automation.
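For scripted access, the Azure CLI command mentioned above can be assembled and run from Python. The sketch below only builds and prints the command. It assumes the Azure CLI is installed and signed in, that `CpuTime` is the metric ID you want (the available IDs can be checked with `az monitor metrics list-definitions`), and the resource ID is a placeholder to replace with your own. The helper name `build_metrics_command` is hypothetical.

```python
import shlex


def build_metrics_command(resource_id: str, metric: str,
                          interval: str = "PT5M",
                          aggregation: str = "Total") -> list[str]:
    """Build an `az monitor metrics list` invocation as an argument list."""
    return [
        "az", "monitor", "metrics", "list",
        "--resource", resource_id,
        "--metric", metric,
        "--interval", interval,        # ISO 8601 duration, e.g. PT5M = 5 minutes
        "--aggregation", aggregation,
        "--output", "json",
    ]


# Placeholder resource ID: substitute your subscription, resource group, and app.
resource_id = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.Web/sites/<app-name>"
)
command = build_metrics_command(resource_id, "CpuTime")
print(shlex.join(command))

# To actually run it (requires the Azure CLI):
# import subprocess
# result = subprocess.run(command, capture_output=True, text=True, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with the angle brackets and slashes in the resource ID.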
Configuring Alerts
It's highly recommended to configure alerts based on critical metrics. This allows you to be notified proactively when performance deviates from acceptable levels. For example, you can set up an alert for:
- HTTP Server Errors exceeding a certain threshold.
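- CPU utilization consistently above 80%. CPU Time itself is a duration, so use a percentage-based metric such as the App Service plan's CPU Percentage.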
- Response Time exceeding a defined SLA.
Refer to the Alerting Documentation for detailed steps on configuring alerts.
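An alert on a value being "consistently" above a threshold implies a sustained breach rather than a single spike. Azure Monitor evaluates alert conditions itself. The sketch below only illustrates the underlying idea, using a hypothetical helper `breaches_consistently` and invented readings.

```python
def breaches_consistently(samples: list[float], threshold: float,
                          min_consecutive: int) -> bool:
    """True if at least `min_consecutive` samples in a row exceed `threshold`."""
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False


# Hypothetical CPU utilization readings in percent, one per 5 minutes.
cpu_percent = [72, 85, 88, 91, 79, 83]
print(breaches_consistently(cpu_percent, threshold=80, min_consecutive=3))  # True
print(breaches_consistently(cpu_percent, threshold=80, min_consecutive=4))  # False
```

Requiring several consecutive breaches trades a slower notification for fewer false alarms from short spikes.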