Azure AI Machine Learning Documentation

Monitoring Azure AI Machine Learning

This section provides detailed information on how to monitor your Azure AI Machine Learning resources and deployed models. Effective monitoring is crucial for understanding performance, detecting issues, and ensuring the reliability of your machine learning solutions.

Key Monitoring Areas

Monitoring Data

Understand how to collect and analyze data related to your model's predictions and inputs. This includes data drift detection and monitoring for data quality issues.
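Data quality checks typically compare each production column against expectations derived from the training data. The sketch below computes three common per-column checks (missing-value rate, type error rate, and out-of-range rate) over records represented as Python dictionaries. The record shape, the `schema` and `bounds` arguments, and the function names are assumptions for illustration and are not part of the Azure AI Machine Learning SDK.

```python
import math


def _is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def data_quality_report(rows, schema, bounds):
    """Per-column missing-value, type error, and out-of-range rates.

    rows:   list of dicts, one per production record (hypothetical shape)
    schema: column name -> expected type, or a tuple of accepted types
    bounds: column name -> (min, max) range observed in the training data
    """
    if not rows:
        raise ValueError("rows must not be empty")
    n = len(rows)
    report = {}
    for column, expected_type in schema.items():
        # A missing key is treated the same as an explicit None
        present = [row.get(column) for row in rows if not _is_missing(row.get(column))]
        type_errors = sum(1 for v in present if not isinstance(v, expected_type))
        out_of_range = 0
        if column in bounds:
            low, high = bounds[column]
            # Only values of the expected type can be range-checked
            out_of_range = sum(
                1 for v in present if isinstance(v, expected_type) and not low <= v <= high
            )
        report[column] = {
            "missing_rate": (n - len(present)) / n,
            "type_error_rate": type_errors / n,
            "out_of_range_rate": out_of_range / n,
        }
    return report


# Usage with made-up records: one missing age, one non-numeric age, one implausible age
rows = [
    {"age": 34, "country": "US"},
    {"age": None, "country": "DE"},
    {"age": "n/a", "country": "FR"},
    {"age": 150},  # "country" key absent
]
report = data_quality_report(rows, {"age": (int, float), "country": str}, {"age": (0, 120)})
print(report)
```

Expressing each check as a rate rather than a raw count keeps the values comparable across monitoring windows of different sizes, which makes threshold-based alerting simpler.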

Data Drift Detection

Azure AI Machine Learning provides capabilities to detect drift in your data. Data drift occurs when the statistical properties of the incoming production data change over time relative to the training (baseline) data. Because a model learns patterns from its training distribution, drift can degrade prediction quality even though the model itself has not changed.
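To make the idea concrete, the following sketch computes one common drift metric, the Jensen-Shannon distance, between a baseline sample and a production sample of a single numeric feature. It is plain Python for illustration only and is not how Azure AI Machine Learning computes drift internally; the equal-width binning, the bin count of 10, and the function names are assumptions of this sketch.

```python
import math


def equal_width_edges(values, n_bins):
    """Bin edges spanning the baseline range in n_bins equal-width bins."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins or 1.0  # fall back to width 1 if all values are equal
    return [low + i * width for i in range(n_bins + 1)]


def bin_proportions(values, edges):
    """Fraction of values per bin; values outside the edges go to the first or last bin."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = 0
        while idx < len(counts) - 1 and v >= edges[idx + 1]:
            idx += 1
        counts[idx] += 1
    return [c / len(values) for c in counts]


def jensen_shannon_distance(p, q):
    """Jensen-Shannon distance with base-2 logs: 0 for identical, 1 for disjoint distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Zero-probability terms contribute nothing; b > 0 wherever a > 0 since b is the mixture
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    # Guard against a tiny negative value caused by floating-point rounding
    return math.sqrt(max(0.0, 0.5 * kl(p, m) + 0.5 * kl(q, m)))


def drift_score(baseline, production, n_bins=10):
    """Bin both samples on the baseline's range and compare the histograms."""
    edges = equal_width_edges(baseline, n_bins)
    return jensen_shannon_distance(
        bin_proportions(baseline, edges), bin_proportions(production, edges)
    )


# Usage with made-up data: production values shifted upward by 0.5
baseline = [i / 100 for i in range(100)]
shifted = [v + 0.5 for v in baseline]
print(drift_score(baseline, baseline), drift_score(baseline, shifted))
```

A monitor would compare such a score against a threshold you choose; the right threshold depends on the feature and on how sensitive the model is to that feature.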


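# Example of setting up data drift monitoring with the Python SDK v2 (azure-ai-ml).
# Model monitoring needs a recent azure-ai-ml release; parameter names have changed
# between releases, so check them against the SDK reference for your installed version.
from azure.identity import DefaultAzureCredential
from azure.ai.ml import Input, MLClient
from azure.ai.ml.constants import MonitorDatasetContext
from azure.ai.ml.entities import (
    DataDriftSignal, MonitorDefinition, MonitorSchedule, ProductionData,
    RecurrenceTrigger, ReferenceData, ServerlessSparkCompute,
)

# Connect to the workspace (replace the placeholders with your own values)
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

# Define the training (baseline) and production data inputs
training_data = ReferenceData(
    input_data=Input(type="uri_folder", path="azureml://datastores/workspaceblobstore/paths/training_data"),
    data_context=MonitorDatasetContext.TRAINING,
    data_column_names={"target_column": "target"},
)
production_data = ProductionData(
    input_data=Input(type="uri_folder", path="azureml://datastores/workspaceblobstore/paths/production_data"),
    data_context=MonitorDatasetContext.MODEL_INPUTS,
)

# Configure the data drift signal; default drift metrics and thresholds apply when none are given
data_drift_signal = DataDriftSignal(
    reference_data=training_data,
    production_data=production_data,
)

# Monitoring jobs run on serverless Spark compute
monitor_definition = MonitorDefinition(
    compute=ServerlessSparkCompute(instance_type="standard_e4s_v3", runtime_version="3.4"),
    monitoring_signals={"data_drift": data_drift_signal},
)

# Create the monitor as a schedule that runs once a day
data_drift_monitor = MonitorSchedule(
    name="my-data-drift-monitor",
    display_name="MyDataDriftMonitor",
    description="Monitor for data drift in production data",
    trigger=RecurrenceTrigger(frequency="day", interval=1),  # Check daily
    create_monitor=monitor_definition,
)

ml_client.schedules.begin_create_or_update(data_drift_monitor).result()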

Model Performance Monitoring

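Track the accuracy, precision, recall, and other relevant metrics of your deployed models over time to identify performance degradation. Computing these metrics requires ground-truth labels for production predictions, which often become available only some time after the predictions are made.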
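Once ground-truth labels are joined to logged predictions (for example, by a request ID), the metrics can be recomputed per time window and compared with a baseline. The following sketch assumes binary labels already aligned in two lists; the `degraded` helper and its tolerance of 0.05 are hypothetical choices, not Azure AI Machine Learning defaults.

```python
from dataclasses import dataclass


@dataclass
class ClassificationMetrics:
    accuracy: float
    precision: float
    recall: float


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall for one window of labeled predictions."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need equally long, non-empty label lists")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return ClassificationMetrics(
        accuracy=correct / len(pairs),
        precision=tp / (tp + fp) if tp + fp else 0.0,
        recall=tp / (tp + fn) if tp + fn else 0.0,
    )


def degraded(current, baseline, tolerance=0.05):
    """Names of metrics that dropped more than `tolerance` below the baseline."""
    return [
        name
        for name in ("accuracy", "precision", "recall")
        if getattr(current, name) < getattr(baseline, name) - tolerance
    ]


# Usage with made-up labels for one monitoring window
current = classification_metrics([1, 0, 1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 0, 1, 1, 0])
baseline = ClassificationMetrics(accuracy=0.9, precision=0.9, recall=0.76)
print(current, degraded(current, baseline))
```

Comparing against a tolerance rather than the exact baseline value avoids flagging the small window-to-window fluctuations that any metric computed on a sample will show.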

Metrics Collection

When deploying a model to an endpoint, you can enable the collection of inference logs and metrics. These metrics are often sent to Azure Application Insights for detailed analysis.

Common inference metrics include:

  • Inference Latency: Time taken to process an inference request.
  • Inference Throughput: Number of inference requests processed per unit of time.
  • Error Rate: Percentage of requests that resulted in an error.
  • Prediction Distribution: Distribution of predicted labels or values.
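As a rough illustration of how these inference metrics relate to raw request logs, the following sketch aggregates one window of request records into 95th-percentile latency, throughput, error rate, and prediction distribution. The `RequestRecord` shape, the choice of the 95th percentile, and treating HTTP status codes of 400 and above as errors are assumptions made for this example; in practice, Azure Application Insights or Azure Monitor typically computes such aggregates for you.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestRecord:
    """Hypothetical per-request log entry; real logs follow their own schema."""
    latency_ms: float
    status_code: int
    prediction: Optional[str]  # None when the request failed


def summarize_requests(records, window_seconds):
    """Aggregate one time window into latency, throughput, error rate, and prediction counts."""
    if not records or window_seconds <= 0:
        raise ValueError("need at least one record and a positive window length")
    latencies = sorted(r.latency_ms for r in records)
    # Nearest-rank 95th percentile: smallest value with at least 95% of samples at or below it
    p95 = latencies[math.ceil(0.95 * len(latencies)) - 1]
    errors = sum(1 for r in records if r.status_code >= 400)
    return {
        "p95_latency_ms": p95,
        "throughput_per_s": len(records) / window_seconds,
        "error_rate_pct": 100 * errors / len(records),
        "prediction_distribution": Counter(
            r.prediction for r in records if r.prediction is not None
        ),
    }


# Usage with made-up records for a 5-second window: nine successes and one failure
records = [
    RequestRecord(latency_ms=10.0 * (i + 1), status_code=200, prediction="cat" if i < 6 else "dog")
    for i in range(9)
]
records.append(RequestRecord(latency_ms=100.0, status_code=500, prediction=None))
summary = summarize_requests(records, window_seconds=5)
print(summary)
```

A high percentile is used instead of the mean because a few slow requests can hurt users badly while barely moving the average.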

Infrastructure and Resource Metrics

Monitor the underlying infrastructure supporting your Azure AI Machine Learning workloads. This includes compute usage, network traffic, and storage utilization.

Azure Monitor Integration

Azure AI Machine Learning integrates with Azure Monitor, allowing you to view and analyze metrics related to your workspace, compute clusters, and endpoints. Key metrics include:

  • CPU and Memory Utilization
  • Network In/Out
  • Disk IOPS
  • Queue Lengths (for batch endpoints)
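Utilization metrics are usually most informative when you look for sustained pressure rather than single spikes. The following sketch assumes a hypothetical list of per-minute CPU utilization readings for one compute node and reports every run of consecutive readings above a threshold. The threshold of 85% and the minimum run length of 3 are illustrative values, not Azure defaults.

```python
def sustained_breaches(samples, threshold, min_consecutive):
    """Return (start_index, length) for each run of at least `min_consecutive` samples above `threshold`."""
    runs = []
    start = None
    for i, value in enumerate(samples):
        if value > threshold:
            if start is None:
                start = i  # a new run above the threshold begins
        elif start is not None:
            if i - start >= min_consecutive:
                runs.append((start, i - start))
            start = None
    # Close a run that lasts until the final sample
    if start is not None and len(samples) - start >= min_consecutive:
        runs.append((start, len(samples) - start))
    return runs


# Usage with made-up per-minute CPU utilization (%) for one node
cpu_percent = [40, 92, 95, 50, 88, 90, 91, 93, 60, 97]
print(sustained_breaches(cpu_percent, threshold=85, min_consecutive=3))
```

The short two-minute spike and the single final reading are ignored; only the four-minute run starting at index 4 is reported.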

Logging and Auditing

Understand how to access and interpret logs generated by your Azure AI Machine Learning services. This is essential for debugging and troubleshooting.

Inference Logs

Logs from deployed endpoints capture details about incoming requests, model predictions, and any errors encountered during inference. These can be configured to be sent to Azure Application Insights or Log Analytics.
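The exact log schema depends on how your deployment writes logs, so the following sketch assumes a hypothetical JSON-lines format with `request_id`, `status`, and optional `error` fields. It groups failed requests by error message and counts lines that could not be parsed, which is often a useful first step when triaging inference errors.

```python
import json
from collections import Counter


def triage_inference_logs(lines):
    """Group failed requests by error message.

    Assumes a hypothetical JSON-lines format: one object per request with
    "request_id", "status" ("success" or "error"), and an optional "error" message.
    Returns (error counts, failed request IDs, number of unparseable lines).
    """
    error_counts = Counter()
    failed_ids = []
    malformed = 0
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            malformed += 1
            continue
        if not isinstance(entry, dict):
            malformed += 1  # valid JSON, but not a log object
            continue
        if entry.get("status") == "error":
            error_counts[entry.get("error", "unknown error")] += 1
            failed_ids.append(entry.get("request_id"))
    return error_counts, failed_ids, malformed


# Usage with made-up log lines
log_lines = [
    '{"request_id": "r1", "status": "success"}',
    '{"request_id": "r2", "status": "error", "error": "ValueError: bad input"}',
    "not json",
    '{"request_id": "r3", "status": "error", "error": "ValueError: bad input"}',
    '{"request_id": "r4", "status": "error"}',
]
counts, failed, bad_lines = triage_inference_logs(log_lines)
print(counts.most_common(), failed, bad_lines)
```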

Audit Logs

Azure Activity Logs provide a record of operations performed on your Azure resources, including your Azure AI Machine Learning workspace. This helps you track who performed which operation, when, and on which resource.

Alerting and Notifications

Set up alerts so that you are notified proactively when critical issues occur or when metrics cross thresholds that you define. This enables timely intervention.

Tip: Configure alerts for high error rates, significant data drift detection, or exceeding resource utilization limits to ensure your ML solutions remain operational and performant.

Creating Alerts

Alerts can be configured through Azure Monitor based on specific metrics, log queries, or activity log events. You can define rules to trigger notifications via email, SMS, or webhooks.
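A metric alert rule essentially aggregates recent values of a metric over a window and compares the result with a threshold. The following sketch models that evaluation in plain Python and builds a JSON body that could be posted to a webhook. The `MetricAlertRule` class, its fields, and the payload format are hypothetical and do not reproduce the Azure Monitor alert rule or notification schema.

```python
import json
import operator
from dataclasses import dataclass
from statistics import mean

# Aggregations and comparison operators supported by this sketch
AGGREGATIONS = {"average": mean, "maximum": max, "minimum": min, "total": sum}
OPERATORS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


@dataclass
class MetricAlertRule:
    """Hypothetical metric alert rule: aggregate the last `window_size` samples and compare."""
    name: str
    aggregation: str   # key of AGGREGATIONS
    comparison: str    # key of OPERATORS
    threshold: float
    window_size: int   # number of most recent samples to aggregate

    def __post_init__(self):
        if self.window_size < 1:
            raise ValueError("window_size must be at least 1")

    def evaluate(self, samples):
        """Return (fired, aggregated value); (False, None) if there is no data."""
        window = samples[-self.window_size:]
        if not window:
            return False, None
        value = AGGREGATIONS[self.aggregation](window)
        return OPERATORS[self.comparison](value, self.threshold), value


def webhook_payload(rule, value):
    """JSON body for a hypothetical webhook receiver (not the Azure Monitor alert schema)."""
    return json.dumps({
        "rule": rule.name,
        "condition": f"{rule.aggregation} {rule.comparison} {rule.threshold}",
        "value": value,
    })


# Usage: alert when the average error rate (%) over the last 3 samples exceeds 5
rule = MetricAlertRule("high-error-rate", "average", ">", 5.0, window_size=3)
fired, value = rule.evaluate([1.0, 2.0, 6.0, 7.0, 8.0])
if fired:
    print(webhook_payload(rule, value))
```

Aggregating over a window before comparing, rather than alerting on each raw sample, trades a little detection latency for far fewer noisy notifications.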