Deploying Your Machine Learning Models with Azure ML
Deploying a machine learning model is the crucial step that makes your trained model available for use in real-world applications. Azure Machine Learning provides a robust and flexible platform for deploying models to various targets, enabling real-time inference, batch scoring, and integration with existing systems.
Key Deployment Concepts
- Models: The serialized artifact of your trained machine learning model (e.g., a Python pickle file, ONNX file).
- Environments: Define the dependencies (libraries, packages, runtime) required for your model to run.
- Inference Script: A Python script that loads your model and defines how to handle incoming requests (e.g., pre-processing input, running prediction, post-processing output).
- Scoring URI: The endpoint where your deployed model can be accessed for inference.
- Deployment Targets: The infrastructure where your model will be hosted, such as Azure Kubernetes Service (AKS), Azure Container Instances (ACI), or managed endpoints.
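These concepts map directly onto SDK objects. As a quick orientation, here is a minimal sketch (assuming a workspace config.json is available locally) of how registered models and environments can be listed with the v1 Python SDK:
from azureml.core import Environment, Model, Workspace

ws = Workspace.from_config()

# Registered models are versioned artifacts tracked by the workspace.
for m in Model.list(ws):
    print(m.name, m.version)

# Environments capture the packages and runtime a model needs to run.
for env_name in list(Environment.list(ws))[:5]:
    print(env_name)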
Deployment Targets
Azure ML supports several deployment targets, each suited for different scenarios:
Azure Container Instances (ACI)
Ideal for development, testing, and low-scale production. Offers quick deployment and management without complex infrastructure setup.
Azure Kubernetes Service (AKS)
Designed for scalable, highly available, and robust production workloads. Provides fine-grained control over deployments, scaling, and load balancing.
Managed Endpoints (Online & Batch)
Streamlined deployment experience for real-time (online) and batch scoring. Azure ML handles much of the infrastructure management, allowing you to focus on the model.
Steps to Deploy a Model
1. Register Your Model
Before deployment, your trained model needs to be registered with Azure ML. This allows you to version and manage your models.
from azureml.core import Workspace, Model
ws = Workspace.from_config()
model = Model.register(
    workspace=ws,
    model_path='outputs/my_model.pkl',
    model_name='my-first-model',
    tags={'area': 'classification', 'type': 'logistic-regression'},
    description='My first deployed classification model'
)
print(f"Model registered: {model.name}, version {model.version}")
2. Create an Inference Configuration
Define the environment (the packages and runtime your model needs) and the entry (scoring) script that handles requests. With the v1 SDK, the environment is typically described by a conda dependencies file and is combined with the entry script in an InferenceConfig (see step 4).
conda_env.yml:
name: azureml_deploy_env
dependencies:
  - python=3.8
  - pip
  - pip:
    - azureml-defaults
    - scikit-learn==0.24.2
    - pandas
    - inference-schema
    - mlflow
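This conda file is turned into an Azure ML Environment that the inference configuration will use in step 4. A minimal sketch with the v1 SDK (pinning the base Docker image is optional):
from azureml.core import Environment

# Build the environment from the conda dependencies file above.
env = Environment.from_conda_specification(
    name='azureml_deploy_env',
    file_path='conda_env.yml'
)

# Optionally pin the base Docker image used for inference.
env.docker.base_image = 'mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04'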
scripts/score.py:
import json
import joblib
import numpy as np
import os
def init():
    # This function is called when the container is started.
    # Load the model from the registered path.
    global model
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'my_model.pkl')
    model = joblib.load(model_path)

def run(raw_data):
    # This function is called for every inference request.
    # Convert the input data to a numpy array.
    data = json.loads(raw_data)['data']
    input_data = np.array(data)
    # Make a prediction.
    prediction = model.predict(input_data)
    # Return the prediction as JSON.
    return json.dumps({"prediction": prediction.tolist()})
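Before deploying, it can help to smoke-test the entry script locally. A hedged sketch, assuming my_model.pkl sits in a local outputs/ folder (inside the deployed container, Azure ML sets AZUREML_MODEL_DIR for you):
import json
import os
import sys

# Point AZUREML_MODEL_DIR at the local folder that holds my_model.pkl.
os.environ['AZUREML_MODEL_DIR'] = 'outputs'

# Import the entry script and exercise init()/run() directly.
sys.path.append('scripts')
import score

score.init()
sample = json.dumps({'data': [[1, 2, 3, 4]]})
print(score.run(sample))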
3. Create a Deployment Configuration
Specify the compute target and other deployment settings.
For ACI:
from azureml.core.webservice import AciWebservice
aci_config = AciWebservice.deploy_configuration(
    cpu_cores=1,
    memory_gb=1,
    description='Deploy to ACI for testing'
)
For AKS:
from azureml.core.webservice import AksWebservice
from azureml.core.compute import AksCompute

# Assumes you have an AKS cluster attached to the workspace as 'my-aks-cluster'
compute_target = AksCompute(ws, 'my-aks-cluster')

aks_config = AksWebservice.deploy_configuration(
    description='Deploy to AKS for production',
    cpu_cores=2,
    memory_gb=4,
    enable_app_insights=True
)
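If no AKS cluster named 'my-aks-cluster' is attached to the workspace yet, one can be provisioned first. A minimal sketch using default provisioning settings (cluster creation can take a while):
from azureml.core.compute import AksCompute, ComputeTarget

# Provision a new AKS cluster with default settings and attach it to the workspace.
prov_config = AksCompute.provisioning_configuration()
aks_target = ComputeTarget.create(ws, 'my-aks-cluster', prov_config)
aks_target.wait_for_completion(show_output=True)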
4. Deploy the Service
Use the registered model, inference config, and deployment config to create the web service.
from azureml.core.model import InferenceConfig

# Load your registered model
model = ws.models['my-first-model']

# Create the inference configuration, reusing the environment from step 2.
# AZUREML_MODEL_DIR is set automatically inside the deployed container, so it
# does not need to be configured here.
inference_config = InferenceConfig(
    entry_script='score.py',
    source_directory='./scripts',
    environment=env
)

# Deploy the model (e.g., to ACI)
service_name = 'my-aci-service'
service = Model.deploy(
    workspace=ws,
    name=service_name,
    models=[model],
    inference_config=inference_config,
    deployment_config=aci_config,  # or aks_config
    overwrite=True
)
# For AKS, also pass deployment_target=compute_target to Model.deploy.
service.wait_for_deployment(show_output=True)
print(f"Service deployed at: {service.scoring_uri}")
Testing Your Deployed Service
Once deployed, you can send test requests to the scoring URI.
import requests
import json
uri = service.scoring_uri
headers = {'Content-Type': 'application/json'}
test_data = {"data": [[1, 2, 3, 4]]}
response = requests.post(uri, data=json.dumps(test_data), headers=headers)
if response.status_code == 200:
    print(f"Prediction: {response.json()}")
else:
    print(f"Error: {response.status_code} - {response.text}")
Managed Endpoints for Simplicity
For simpler deployment workflows, especially for real-time inference, consider using Azure ML Managed Endpoints. They abstract away much of the underlying infrastructure, offering a more streamlined path to production.
- Online Endpoints: For real-time, low-latency predictions.
- Batch Endpoints: For scoring large datasets asynchronously.
Managed endpoints can be deployed using the Azure ML SDK or CLI, offering a declarative approach to defining your deployment configuration.
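As an illustration only, here is a hedged sketch of a managed online endpoint using the v2 Python SDK (the azure-ai-ml package); the endpoint name, instance type, and model version below are assumptions, and the environment and scoring script reuse the files from the earlier steps:
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    CodeConfiguration,
    Environment,
    ManagedOnlineDeployment,
    ManagedOnlineEndpoint,
)
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())

# The endpoint is the stable, named URI that clients call.
endpoint = ManagedOnlineEndpoint(name='my-online-endpoint', auth_mode='key')
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# The deployment hosts a specific model version behind that endpoint.
deployment = ManagedOnlineDeployment(
    name='blue',
    endpoint_name='my-online-endpoint',
    model='azureml:my-first-model:1',
    environment=Environment(
        image='mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04',
        conda_file='conda_env.yml',
    ),
    code_configuration=CodeConfiguration(code='./scripts', scoring_script='score.py'),
    instance_type='Standard_DS3_v2',
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()

# Route all traffic to the new deployment.
endpoint.traffic = {'blue': 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()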
For more detailed information and advanced scenarios, please refer to the official Azure ML documentation.