## Overview
Monitoring is a core capability of Azure Machine Learning that lets you track model performance, resource utilization, and data drift in near real time. It integrates with Azure Monitor and Log Analytics to provide a unified view of your ML workloads.
## Metrics
Azure ML automatically emits a set of built‑in metrics. You can also publish custom metrics (see the sketch after the table).
| Metric | Description |
|---|---|
| ml.compute.cpu_percent | CPU utilization of the compute target. |
| ml.compute.memory_percent | Memory utilization of the compute target, as a percentage. |
| ml.model.inference_latency | Latency of model inference calls. |
| ml.model.accuracy | Real‑time model accuracy (if logged). |
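Custom metrics are typically logged from your training or scoring code with MLflow, which Azure ML uses for tracking. A minimal sketch; the metric name and value are illustrative:

```python
import mlflow

# Inside an Azure ML job, the MLflow tracking URI is preconfigured;
# log_metric attaches the value to the active run.
with mlflow.start_run():
    mlflow.log_metric("ml.model.accuracy", 0.93)  # illustrative value
```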
## Alerts
Configure alert rules in Azure Monitor to be notified when metrics breach thresholds you define.
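Alert rules can also be created programmatically. A minimal sketch, assuming the azure-mgmt-monitor package and reusing the latency metric from the table above; the workspace resource ID, threshold, and rule name are placeholders:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient
from azure.mgmt.monitor.models import (
    MetricAlertResource,
    MetricAlertSingleResourceMultipleMetricCriteria,
    MetricCriteria,
)

monitor_client = MonitorManagementClient(DefaultAzureCredential(), subscription_id)

alert = MetricAlertResource(
    location="global",  # metric alert rules always use the "global" location
    description="Notify when average inference latency breaches the threshold.",
    severity=2,
    enabled=True,
    scopes=[workspace_resource_id],  # placeholder: ARM resource ID of the workspace
    evaluation_frequency="PT1M",     # evaluate every minute
    window_size="PT5M",              # over a five-minute window
    criteria=MetricAlertSingleResourceMultipleMetricCriteria(
        all_of=[
            MetricCriteria(
                name="high-latency",
                metric_name="ml.model.inference_latency",
                operator="GreaterThan",
                threshold=500,       # placeholder threshold
                time_aggregation="Average",
            )
        ]
    ),
)
monitor_client.metric_alerts.create_or_update(resource_group, "high-latency-alert", alert)
```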
## Log Streaming
Stream logs from training runs and deployments, and route them to a Log Analytics workspace through the workspace's diagnostic settings.
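As a sketch, the v2 SDK can stream job logs and fetch deployment logs directly; the run, endpoint, and deployment names below are placeholders:

```python
# Stream a training job's logs to stdout while (or after) it runs.
ml_client.jobs.stream("my-run-id")

# Fetch the most recent console log lines from an online deployment.
print(ml_client.online_deployments.get_logs(
    name="blue", endpoint_name="my-endpoint", lines=50))
```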
## API Reference
Use the Azure ML Python SDK to retrieve monitoring data programmatically. In the v2 SDK, metrics logged to a run are retrieved through MLflow tracking:
```python
import mlflow
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient

ml_client = MLClient(DefaultAzureCredential(), subscription_id, resource_group, workspace_name)

# Point MLflow at the workspace's tracking server (requires the azureml-mlflow package).
mlflow.set_tracking_uri(ml_client.workspaces.get(ml_client.workspace_name).mlflow_tracking_uri)

# Fetch every metric logged to the run.
for name, value in mlflow.get_run("my-run-id").data.metrics.items():
    print(f"{name}: {value}")
```
## Sample Code
Deploy a model with monitoring enabled. Below is a minimal sketch using a managed online endpoint, where `app_insights_enabled` turns on request and latency telemetry; the endpoint, deployment, and path names are placeholders.
```python
from azure.ai.ml.entities import (
    CodeConfiguration,
    ManagedOnlineDeployment,
    ManagedOnlineEndpoint,
    Model,
)

# The endpoint provides the stable scoring URI; deployments sit behind it.
endpoint = ManagedOnlineEndpoint(name="my-endpoint", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="my-endpoint",
    model=Model(name="my-model", path="./model"),
    code_configuration=CodeConfiguration(code="./src", scoring_script="score.py"),
    environment="AzureML-sklearn-0.24-ubuntu18.04-py37-cpu:1",
    instance_type="Standard_DS3_v2",
    instance_count=1,
    app_insights_enabled=True,  # emit request, latency, and console telemetry
)
ml_client.online_deployments.begin_create_or_update(deployment).result()
```
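Once the deployment is live, sending a request generates the telemetry surfaced in Application Insights. A quick usage sketch; the request file is a placeholder:

```python
response = ml_client.online_endpoints.invoke(
    endpoint_name="my-endpoint",
    deployment_name="blue",
    request_file="./sample-request.json",  # placeholder request payload
)
print(response)
```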
## Best Practices
- Enable built‑in metrics from the start of your experiment.
- Set up alerting for critical thresholds such as latency or CPU usage.
- Ingest custom business KPIs to correlate model performance with business outcomes.
- Leverage Log Analytics workspaces to retain logs for compliance; retained logs can also be queried programmatically, as sketched after this list.
- Periodically review drift metrics and retrain models as needed.
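A sketch of querying retained logs, assuming the azure-monitor-query package; the workspace GUID and the KQL query are placeholders:

```python
from datetime import timedelta
from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

logs_client = LogsQueryClient(DefaultAzureCredential())

# Run a KQL query against the Log Analytics workspace over the last day.
result = logs_client.query_workspace(
    workspace_id="<log-analytics-workspace-guid>",
    query="AmlOnlineEndpointConsoleLog | take 10",  # illustrative table and query
    timespan=timedelta(days=1),
)
for table in result.tables:
    for row in table.rows:
        print(row)
```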