Azure Machine Learning Monitoring

Overview

Monitoring is a core capability of Azure Machine Learning that enables you to track model performance, resource utilization, and data drift in real time. It integrates with Azure Monitor and Azure Log Analytics to provide a unified view of your ML workloads.

Metrics

Azure ML automatically emits a set of built‑in metrics. You can also publish custom metrics.

Metric                        Description
ml.compute.cpu_percent        CPU utilization of the compute target.
ml.compute.memory_percent     Memory consumption of the compute target.
ml.model.inference_latency    Latency of model inference calls.
ml.model.accuracy             Real-time model accuracy (if logged).
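Custom metrics are typically emitted from the training or scoring script itself; in the v2 SDK this goes through the MLflow tracking API. The sketch below is illustrative: compute_accuracy and publish_accuracy are hypothetical helpers, not part of the SDK, and the mlflow.log_metric call only runs inside an active run.

```python
# Sketch: computing a custom metric and publishing it from inside a run.
# Azure ML surfaces metrics logged through the MLflow API; outside a run
# context the helper simply returns the computed value.
def compute_accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def publish_accuracy(y_true, y_pred):
    acc = compute_accuracy(y_true, y_pred)
    try:
        import mlflow
        if mlflow.active_run() is not None:
            mlflow.log_metric("accuracy", acc)  # shows up as ml.model.accuracy-style custom metric
    except ImportError:
        pass  # mlflow not installed; nothing to publish
    return acc

print(publish_accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```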

Alerts

Configure alerts to be notified when metrics breach thresholds.
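Alert rules themselves are defined in Azure Monitor, not in application code, but the evaluation they perform can be sketched locally. The helper below is purely illustrative of the threshold logic; the function name and the 250 ms example threshold are assumptions, not Azure defaults.

```python
# Sketch of the check an alert rule performs: fire when the recent
# average of a metric crosses a configured threshold.
def breaches_threshold(samples, threshold, greater_is_bad=True):
    """Return True if the mean of the samples crosses the threshold."""
    mean = sum(samples) / len(samples)
    return mean > threshold if greater_is_bad else mean < threshold

# Example: alert when average inference latency exceeds 250 ms.
latencies_ms = [180, 240, 310, 290]
print(breaches_threshold(latencies_ms, threshold=250))  # True
```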

Log Streaming

Stream logs from training runs and deployments directly to Azure Log Analytics.
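Structured log lines are much easier to query once they reach Log Analytics. The sketch below shows one way to split a raw line into fields before ingestion; the "timestamp LEVEL message" format is an assumption for illustration, since actual Azure ML log formats vary by component.

```python
# Sketch: structuring a raw log line before it lands in Log Analytics.
# The pattern below assumes a "timestamp LEVEL message" layout.
import re

LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<level>INFO|WARNING|ERROR)\s+(?P<message>.*)$"
)

def parse_log_line(line):
    """Split a log line into a dict of fields, or None if it doesn't match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

record = parse_log_line("2024-05-01T12:00:00Z ERROR Scoring request timed out")
print(record["level"])    # ERROR
print(record["message"])  # Scoring request timed out
```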

API Reference

Use the Azure ML Python SDK to retrieve monitoring data programmatically.

from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient

# Fill in with your own workspace details.
subscription_id = "<subscription-id>"
resource_group = "<resource-group>"
workspace_name = "<workspace-name>"

ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace_name
)

# Retrieve the metrics recorded for a specific run.
metrics = ml_client.monitoring.list_metrics(run_id="my-run-id")
for m in metrics:
    print(f"{m.name}: {m.value}")

Sample Code

Register a model and submit a scoring job with monitoring enabled.

from azure.ai.ml import command, Output
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import Model

# Register the model asset so the scoring job can reference it.
model = Model(name="my-model", path="./model")
ml_client.models.create_or_update(model)

job = command(
    display_name="score-job",
    command="python score.py",
    compute="cpu-cluster",
    environment="AzureML-sklearn-0.24-ubuntu18.04-py37-cpu:1",
    outputs={"model": Output(type=AssetTypes.CUSTOM_MODEL)},
    enable_monitoring=True,
)
ml_client.jobs.create_or_update(job)

Best Practices

  • Enable built‑in metrics from the start of your experiment.
  • Set up alerting for critical thresholds such as latency or CPU usage.
  • Ingest custom business KPIs to correlate model performance with business outcomes.
  • Leverage Log Analytics workspaces to retain logs for compliance.
  • Periodically review drift metrics and retrain models as needed.
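The last point above can be made concrete with one common drift measure, the Population Stability Index (PSI). This is a generic sketch over pre-binned distributions; the binning, the helper name, and the ~0.2 review threshold are common conventions, not Azure ML defaults.

```python
# Sketch: Population Stability Index (PSI) between a baseline (training)
# distribution and a recent production distribution, given per-bin fractions.
import math

def population_stability_index(expected, actual, eps=1e-6):
    """PSI between two distributions expressed as per-bin fractions."""
    psi = 0.0
    for e, a in zip(expected, actual):
        e = max(e, eps)  # clamp to avoid log(0)
        a = max(a, eps)
        psi += (a - e) * math.log(a / e)
    return psi

baseline = [0.25, 0.25, 0.25, 0.25]    # training-time feature distribution
production = [0.40, 0.30, 0.20, 0.10]  # recent production distribution

psi = population_stability_index(baseline, production)
print(f"PSI = {psi:.3f}")  # values above ~0.2 often prompt a retraining review
```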