How to deploy Azure Machine Learning models

Introduction

Deploying your trained machine learning models to production is a critical step in bringing your AI solutions to life. Azure Machine Learning provides a comprehensive set of tools and services to help you deploy models as web services, integrate them with applications, and manage their lifecycle.

This guide will walk you through the common deployment scenarios and best practices for deploying models using Azure Machine Learning.

Deployment Targets

Azure Machine Learning supports deployment to various targets, allowing you to choose the best fit for your application's needs:

  • Azure Container Instances (ACI): Ideal for development, testing, and low-scale production. Provides a quick and easy way to deploy models without managing underlying infrastructure.
  • Azure Kubernetes Service (AKS): Recommended for production workloads requiring high scalability, availability, and management of complex deployments.
  • Azure Machine Learning Managed Endpoints: A fully managed inference service for real-time and batch scoring. Simplifies deployment and management.

Common Deployment Scenarios

Deploying to Azure Container Instances (ACI)

ACI is a great starting point for deploying your models. It's serverless and easy to set up.

Step 1: Create an Inference Configuration

This involves defining how your model will be served. You'll need a scoring script (e.g., score.py) and an environment.

# score.py
import json
import os

import joblib
import numpy as np

def init():
    global model
    # Azure ML mounts registered model files under AZUREML_MODEL_DIR
    model_dir = os.getenv('AZUREML_MODEL_DIR', '.')
    model_path = os.path.join(model_dir, 'model.pkl')  # Replace with your model file name
    model = joblib.load(model_path)

def run(raw_data):
    try:
        data = np.array(json.loads(raw_data)['data'])
        # Make a prediction
        result = model.predict(data)
        # Return any JSON-serializable object
        return {"result": result.tolist()}
    except Exception as e:
        return {"error": str(e)}

Step 2: Create a Deployment Configuration

This specifies the compute resources needed for your deployment.

from azureml.core.webservice import AciWebservice
from azureml.core.model import InferenceConfig, Model

# Assumes 'workspace' is your Workspace, 'model' is your registered Model
# object, and 'myenv' is an Environment with the scoring dependencies
inference_config = InferenceConfig(entry_script="score.py", environment=myenv)
aci_config = AciWebservice.deploy_configuration(cpu_cores=1,
                                                memory_gb=1,
                                                description='Deploy my model to ACI')

service = Model.deploy(workspace=workspace,
                       name='my-aci-service',
                       models=[model],
                       inference_config=inference_config,
                       deployment_config=aci_config,
                       overwrite=True)
service.wait_for_deployment(show_output=True)
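
Once the deployment succeeds, you can smoke-test the service over HTTP. A minimal sketch, assuming the model expects four numeric features (the payload values here are placeholders):

import json
import requests

input_data = json.dumps({"data": [[1.0, 2.0, 3.0, 4.0]]})
headers = {"Content-Type": "application/json"}

# 'service' is the AciWebservice returned by Model.deploy above
response = requests.post(service.scoring_uri, data=input_data, headers=headers)
print(response.json())

You can also call service.run(input_data) from the SDK to test without going over HTTP.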

For more complex dependency setups, consider building your Azure ML environment from a custom Dockerfile.
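
For reference, the myenv object used in these examples can be created from a conda specification file. A minimal sketch, assuming a hypothetical conda.yml that lists your scoring dependencies (scikit-learn, numpy, joblib):

from azureml.core import Environment

# 'conda.yml' is a hypothetical conda spec file naming your dependencies
myenv = Environment.from_conda_specification(name='my-inference-env',
                                             file_path='conda.yml')

# For full image control, point the environment at your own Dockerfile instead
# myenv.docker.base_dockerfile = './Dockerfile'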

Deploying to Azure Kubernetes Service (AKS)

AKS provides a robust platform for scalable and highly available deployments.

Step 1: Create an AKS Cluster

You can create an AKS cluster via the Azure portal, Azure CLI, or SDK.

from azureml.core.compute import ComputeTarget, AksCompute

# Define the cluster provisioning configuration
prov_config = AksCompute.provisioning_configuration(location='eastus',
                                                    agent_count=3,
                                                    vm_size='Standard_DS3_v2')

# Create the cluster and wait for provisioning to finish
aks_target = ComputeTarget.create(workspace=workspace,
                                  name='my-aks-cluster',
                                  provisioning_configuration=prov_config)
aks_target.wait_for_completion(show_output=True)

Step 2: Deploy to AKS

Similar to ACI, you define an inference configuration and a deployment configuration, but this time targeting your AKS cluster.

from azureml.core.webservice import AksWebservice
from azureml.core.model import InferenceConfig, Model

inference_config = InferenceConfig(entry_script="score.py", environment=myenv)
aks_config = AksWebservice.deploy_configuration(cpu_cores=1,
                                                memory_gb=1,
                                                autoscale_enabled=True,
                                                autoscale_min_replicas=1,
                                                autoscale_max_replicas=3)

service = Model.deploy(workspace=workspace,
                       name='my-aks-service',
                       models=[model],  # Your registered model object
                       inference_config=inference_config,
                       deployment_config=aks_config,
                       deployment_target=aks_target,  # The AKS cluster created above
                       overwrite=True)
service.wait_for_deployment(show_output=True)

Ensure your AKS cluster has enough resources and is properly configured for your model's demands.
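
AKS web services have key-based authentication enabled by default, so requests must carry an authorization header. A minimal sketch of an authenticated call (payload values are placeholders):

import json
import requests

# Retrieve the primary authentication key for the AKS service
key, _ = service.get_keys()
headers = {"Content-Type": "application/json",
           "Authorization": f"Bearer {key}"}

input_data = json.dumps({"data": [[1.0, 2.0, 3.0, 4.0]]})
response = requests.post(service.scoring_uri, data=input_data, headers=headers)
print(response.json())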

Azure Machine Learning Managed Endpoints

Managed endpoints offer a simplified and scalable way to deploy models, abstracting away much of the infrastructure management. Unlike the ACI and AKS examples above, which use the classic azureml-core SDK, managed endpoints are created with the v2 Python SDK (azure-ai-ml).

Real-time Endpoints

For low-latency, high-throughput inference.

Step 1: Create an Online Endpoint

Define the endpoint configuration.

from azure.ai.ml import MLClient
from azure.ai.ml.entities import ManagedOnlineEndpoint
from azure.identity import DefaultAzureCredential

# Connect to the workspace with the v2 SDK
ml_client = MLClient(DefaultAzureCredential(),
                     subscription_id='<subscription-id>',
                     resource_group_name='<resource-group>',
                     workspace_name='<workspace-name>')

# Create an online endpoint with key-based authentication
endpoint = ManagedOnlineEndpoint(name='my-realtime-endpoint',
                                 description='Real-time inference endpoint',
                                 auth_mode='key')
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

Step 2: Create a Deployment

Deploy your model to the endpoint.

from azure.ai.ml.entities import ManagedOnlineDeployment, CodeConfiguration

deployment = ManagedOnlineDeployment(name='my-deployment',
                                     endpoint_name='my-realtime-endpoint',
                                     model=model,  # Your registered model
                                     environment=myenv,  # Your environment
                                     code_configuration=CodeConfiguration(
                                         code='.',  # Folder containing score.py
                                         scoring_script='score.py'),
                                     instance_type='Standard_DS2_v2',
                                     instance_count=1)
ml_client.online_deployments.begin_create_or_update(deployment).result()
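
After the deployment is created, route traffic to it and invoke the endpoint. A minimal sketch, assuming a hypothetical sample-request.json containing a payload such as {"data": [[1.0, 2.0, 3.0, 4.0]]}:

# Send all traffic to the new deployment
endpoint.traffic = {'my-deployment': 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Score the hypothetical sample request file
result = ml_client.online_endpoints.invoke(endpoint_name='my-realtime-endpoint',
                                           request_file='sample-request.json')
print(result)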

Batch Endpoints

For scoring large datasets offline.

Deployment for batch endpoints follows a similar pattern: you create a batch endpoint and then define a batch deployment on it, as the sketch below illustrates.
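
A minimal sketch with the v2 SDK, assuming the same ml_client as above, an MLflow-registered model (non-MLflow models additionally need a scoring script and environment), and a hypothetical existing compute cluster named 'cpu-cluster':

from azure.ai.ml.entities import BatchEndpoint, BatchDeployment

# Create the batch endpoint
batch_endpoint = BatchEndpoint(name='my-batch-endpoint',
                               description='Batch scoring endpoint')
ml_client.batch_endpoints.begin_create_or_update(batch_endpoint).result()

# Deploy the model; 'cpu-cluster' is a hypothetical compute cluster
batch_deployment = BatchDeployment(name='my-batch-deployment',
                                   endpoint_name='my-batch-endpoint',
                                   model=model,
                                   compute='cpu-cluster',
                                   instance_count=2,
                                   mini_batch_size=10)
ml_client.batch_deployments.begin_create_or_update(batch_deployment).result()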

Managed endpoints simplify scaling, monitoring, and updating your deployed models.

Monitoring and Management

Once deployed, it's crucial to monitor your models for performance, drift, and errors.

  • Azure Monitor: Collects metrics and logs for your deployed services.
  • Application Insights: Provides detailed insights into application performance and usage.
  • Model performance tracking: Implement logging within your scoring script to track predictions and potential issues; see the sketch after this list.
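
A minimal logging sketch for the v1 scoring script shown earlier; output written to stdout or via the logging module surfaces in Application Insights when it is enabled on the service:

import json
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger('scoring')

def run(raw_data):
    start = time.time()
    data = json.loads(raw_data)['data']
    result = model.predict(data)  # 'model' is loaded in init() as shown earlier
    # Record request size and latency so errors and drift can be analyzed later
    logger.info('rows=%d latency_ms=%.1f', len(data), (time.time() - start) * 1000)
    return {"result": result.tolist()}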

Conclusion

Azure Machine Learning offers flexible and powerful options for deploying your models. Whether you choose ACI for simplicity, AKS for robust production, or Managed Endpoints for ease of use, you can effectively bring your AI solutions to production environments.