Deploying Your Machine Learning Models with Azure ML
Deploying a machine learning model is the crucial step that makes your trained model available for use in real-world applications. Azure Machine Learning provides a robust and flexible platform for deploying models to various targets, enabling real-time inference, batch scoring, and integration with existing systems.
Key Deployment Concepts
- Models: The serialized artifact of your trained machine learning model (e.g., a Python pickle file, ONNX file).
- Environments: Define the dependencies (libraries, packages, runtime) required for your model to run.
- Inference Script: A Python script that loads your model and defines how to handle incoming requests (e.g., pre-processing input, running prediction, post-processing output).
- Scoring URI: The endpoint where your deployed model can be accessed for inference.
- Deployment Targets: The infrastructure where your model will be hosted, such as Azure Kubernetes Service (AKS), Azure Container Instances (ACI), or managed endpoints.
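These concepts map directly onto SDK objects. As a quick orientation, here is a minimal sketch (assuming a workspace config.json is available locally) of how registered models and environments can be listed with the v1 Python SDK:
from azureml.core import Environment, Model, Workspace

ws = Workspace.from_config()

# Registered models are versioned artifacts tracked by the workspace.
for m in Model.list(ws):
    print(m.name, m.version)

# Environments capture the packages and runtime a model needs to run.
for env_name in list(Environment.list(ws))[:5]:
    print(env_name)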
Deployment Targets
Azure ML supports several deployment targets, each suited for different scenarios:
Azure Container Instances (ACI)
Ideal for development, testing, and low-scale production. Offers quick deployment and management without complex infrastructure setup.
Azure Kubernetes Service (AKS)
Designed for scalable, highly available, and robust production workloads. Provides fine-grained control over deployments, scaling, and load balancing.
Managed Endpoints (Online & Batch)
Streamlined deployment experience for real-time (online) and batch scoring. Azure ML handles much of the infrastructure management, allowing you to focus on the model.
Steps to Deploy a Model
1. Register Your Model
Before deployment, your trained model needs to be registered with Azure ML. This allows you to version and manage your models.
from azureml.core import Workspace, Model
ws = Workspace.from_config()
model = Model.register(
    workspace=ws,
    model_path='outputs/my_model.pkl',
    model_name='my-first-model',
    tags={'area': 'classification', 'type': 'logistic-regression'},
    description='My first deployed classification model'
)
print(f"Model registered: {model.name}, version {model.version}")
2. Create an Inference Configuration
Define the environment (the packages and runtime your model needs) and the entry (scoring) script that handles requests. With the v1 SDK, the environment is typically described by a conda dependencies file and is combined with the entry script in an InferenceConfig (see step 4).
conda_env.yml:
name: azureml_deploy_env
dependencies:
  - python=3.8
  - pip
  - pip:
    - azureml-defaults
    - scikit-learn==0.24.2
    - pandas
    - inference-schema
    - mlflow
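This conda file is turned into an Azure ML Environment that the inference configuration will use in step 4. A minimal sketch with the v1 SDK (pinning the base Docker image is optional):
from azureml.core import Environment

# Build the environment from the conda dependencies file above.
env = Environment.from_conda_specification(
    name='azureml_deploy_env',
    file_path='conda_env.yml'
)

# Optionally pin the base Docker image used for inference.
env.docker.base_image = 'mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04'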
scripts/score.py:
import json
import joblib
import numpy as np
import os
def init():
    # This function is called when the container is started.
    # Load the model from the registered path.
    global model
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'my_model.pkl')
    model = joblib.load(model_path)

def run(raw_data):
    # This function is called for every inference request.
    # Convert the input data to a numpy array.
    data = json.loads(raw_data)['data']
    input_data = np.array(data)
    # Make a prediction.
    prediction = model.predict(input_data)
    # Return the prediction as JSON.
    return json.dumps({"prediction": prediction.tolist()})
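Before deploying, it can help to smoke-test the entry script locally. A hedged sketch, assuming my_model.pkl sits in a local outputs/ folder (inside the deployed container, Azure ML sets AZUREML_MODEL_DIR for you):
import json
import os
import sys

# Point AZUREML_MODEL_DIR at the local folder that holds my_model.pkl.
os.environ['AZUREML_MODEL_DIR'] = 'outputs'

# Import the entry script and exercise init()/run() directly.
sys.path.append('scripts')
import score

score.init()
sample = json.dumps({'data': [[1, 2, 3, 4]]})
print(score.run(sample))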
3. Create a Deployment Configuration
Specify the compute target and other deployment settings.
For ACI:
from azureml.core.webservice import AciWebservice
aci_config = AciWebservice.deploy_configuration(
    cpu_cores=1,
    memory_gb=1,
    description='Deploy to ACI for testing'
)
For AKS:
from azureml.core.webservice import AksWebservice
from azureml.core.compute import AksCompute

# Assumes you have an AKS cluster attached to the workspace as 'my-aks-cluster'
compute_target = AksCompute(ws, 'my-aks-cluster')

aks_config = AksWebservice.deploy_configuration(
    description='Deploy to AKS for production',
    cpu_cores=2,
    memory_gb=4,
    enable_app_insights=True
)
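If no AKS cluster named 'my-aks-cluster' is attached to the workspace yet, one can be provisioned first. A minimal sketch using default provisioning settings (cluster creation can take a while):
from azureml.core.compute import AksCompute, ComputeTarget

# Provision a new AKS cluster with default settings and attach it to the workspace.
prov_config = AksCompute.provisioning_configuration()
aks_target = ComputeTarget.create(ws, 'my-aks-cluster', prov_config)
aks_target.wait_for_completion(show_output=True)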
4. Deploy the Service
Use the registered model, inference config, and deployment config to create the web service.
from azureml.core.model import InferenceConfig

# Load your registered model
model = ws.models['my-first-model']

# Create the inference configuration, reusing the environment from step 2.
# AZUREML_MODEL_DIR is set automatically inside the deployed container, so it
# does not need to be configured here.
inference_config = InferenceConfig(
    entry_script='score.py',
    source_directory='./scripts',
    environment=env
)

# Deploy the model (e.g., to ACI)
service_name = 'my-aci-service'
service = Model.deploy(
    workspace=ws,
    name=service_name,
    models=[model],
    inference_config=inference_config,
    deployment_config=aci_config,  # or aks_config
    overwrite=True
)
# For AKS, also pass deployment_target=compute_target to Model.deploy.
service.wait_for_deployment(show_output=True)
print(f"Service deployed at: {service.scoring_uri}")
Testing Your Deployed Service
Once deployed, you can send test requests to the scoring URI.
import requests
import json
uri = service.scoring_uri
headers = {'Content-Type': 'application/json'}
test_data = {"data": [[1, 2, 3, 4]]}
response = requests.post(uri, data=json.dumps(test_data), headers=headers)
if response.status_code == 200:
    print(f"Prediction: {response.json()}")
else:
    print(f"Error: {response.status_code} - {response.text}")
Managed Endpoints for Simplicity
For simpler deployment workflows, especially for real-time inference, consider using Azure ML Managed Endpoints. They abstract away much of the underlying infrastructure, offering a more streamlined path to production.
- Online Endpoints: For real-time, low-latency predictions.
- Batch Endpoints: For scoring large datasets asynchronously.
Managed endpoints can be deployed using the Azure ML SDK or CLI, offering a declarative approach to defining your deployment configuration.
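As an illustration only, here is a hedged sketch of a managed online endpoint using the v2 Python SDK (the azure-ai-ml package); the endpoint name, instance type, and model version below are assumptions, and the environment and scoring script reuse the files from the earlier steps:
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    CodeConfiguration,
    Environment,
    ManagedOnlineDeployment,
    ManagedOnlineEndpoint,
)
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())

# The endpoint is the stable, named URI that clients call.
endpoint = ManagedOnlineEndpoint(name='my-online-endpoint', auth_mode='key')
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# The deployment hosts a specific model version behind that endpoint.
deployment = ManagedOnlineDeployment(
    name='blue',
    endpoint_name='my-online-endpoint',
    model='azureml:my-first-model:1',
    environment=Environment(
        image='mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04',
        conda_file='conda_env.yml',
    ),
    code_configuration=CodeConfiguration(code='./scripts', scoring_script='score.py'),
    instance_type='Standard_DS3_v2',
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()

# Route all traffic to the new deployment.
endpoint.traffic = {'blue': 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()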
For more detailed information and advanced scenarios, please refer to the official Azure ML documentation.