How to Deploy a Model to Azure Machine Learning
This guide provides a step-by-step approach to deploying your trained machine learning models as web services in Azure Machine Learning. This allows you to make your models accessible for real-time predictions or batch scoring.
Prerequisites
- An Azure subscription.
- An Azure Machine Learning workspace.
- A trained machine learning model (e.g., scikit-learn, TensorFlow, PyTorch).
- The model's dependencies documented in a requirements.txt file (see the example after this list).
- An inference script (score.py) that loads the model and handles prediction requests.
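For reference, a minimal requirements.txt for the scikit-learn example used throughout this guide might look like the following (package versions are illustrative):

# requirements.txt example
azureml-defaults
numpy==1.20.3
scikit-learn==0.24.2
joblib==1.0.1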
Understanding Deployment Components
Deploying a model involves several key components:
- Model: The serialized model file (e.g., .pkl, .h5).
- Scoring Script (score.py): A Python script that defines two functions:
  - init(): Loads the model and any necessary data. This function is called once when the service starts.
  - run(raw_data): Takes raw input data, preprocesses it, makes a prediction using the loaded model, and returns the prediction.
- Environment: Specifies the Python packages and other dependencies required by your model and scoring script. This is typically defined in a conda.yaml or requirements.txt file.
- Deployment Target: Where you deploy your model. Common targets include:
- Azure Kubernetes Service (AKS): For scalable, production-grade deployments.
- Azure Container Instances (ACI): For development, testing, or low-scale deployments.
Deployment Steps
Step 1: Register Your Model
Before deployment, your model needs to be registered in your Azure Machine Learning workspace. This allows for versioning and easy access.
# Example using Azure ML SDK
from azureml.core import Workspace, Model

ws = Workspace.from_config()
model = Model.register(workspace=ws,
                       model_path='path/to/your/model',
                       model_name='my-ml-model',
                       tags={'area': 'classification', 'type': 'xgboost'})
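Once registration completes, you can confirm it and inspect versions by name; a small sketch reusing the my-ml-model name from above:

# Verify the registration
registered = Model(workspace=ws, name='my-ml-model')  # latest version by default
print(registered.name, registered.version)

# List every registered version (useful for rollbacks)
for m in Model.list(workspace=ws, name='my-ml-model'):
    print(m.name, m.version, m.tags)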
Step 2: Create a Scoring Script
The scoring script is the heart of your deployed service: it loads the model and handles every prediction request.
# score.py
import json
import numpy as np
import joblib  # or tensorflow, torch, etc.
from azureml.core.model import Model

def init():
    # This function is called once when the container is started
    global model
    # Resolve the path of the registered model inside the container
    model_path = Model.get_model_path('my-ml-model')  # Must match the registered model name
    model = joblib.load(model_path)

def run(raw_data):
    # This function is called for every prediction request
    try:
        data = json.loads(raw_data)['data']
        # Assuming data is a list of lists or similar
        input_data = np.array(data)
        # Preprocess here if necessary (e.g., apply a scaler that was
        # saved alongside the model)
        predictions = model.predict(input_data)
        # Return the predictions in JSON format
        return json.dumps({"predictions": predictions.tolist()})
    except Exception as e:
        return json.dumps({"error": str(e)})
Note: Ensure your model loading and prediction logic matches your model's framework and requirements.
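Before packaging everything up, it can save a failed deployment to smoke-test the scoring script locally. A minimal sketch, assuming score.py is importable from the working directory and the model is resolvable locally (outside a deployment, Model.get_model_path looks for an azureml-models/&lt;name&gt;/&lt;version&gt;/ folder next to the script):

# Hypothetical local smoke test for score.py
import json
import score

score.init()  # loads the model once, as the service does at startup
sample = json.dumps({"data": [[1.0, 2.0, 3.0, 4.0]]})
print(score.run(sample))  # expect a JSON string with a "predictions" key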
Step 3: Define the Environment
Create a conda.yaml or use a requirements.txt file to specify dependencies.
# conda.yaml example
name: azureml_env
channels:
  - defaults
dependencies:
  - python=3.8
  - pip
  - pip:
    - azureml-defaults
    - numpy==1.20.3
    - pandas==1.3.3
    - scikit-learn==0.24.2
    - xgboost==1.5.0  # Or your specific ML library
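To make the environment reusable across deployments, you can build it from the specification above and register it in the workspace. A minimal sketch; the name my_custom_env is what the Step 4 code looks up:

# Build an Azure ML environment from the conda specification above
from azureml.core import Environment

env = Environment.from_conda_specification(name="my_custom_env", file_path="conda.yaml")

# Register it so deployments can reference it by name
env.register(workspace=ws)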
Step 4: Configure the Deployment
Specify the deployment target (ACI or AKS) and other configuration settings.
# Example for ACI deployment
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice

# Load the registered model
model_name = 'my-ml-model'
model_version = 1  # Or omit the version to get the latest
model_to_deploy = Model(workspace=ws, name=model_name, version=model_version)

# Define the inference configuration, referencing the environment
# registered in Step 3
inference_config = InferenceConfig(entry_script="score.py",
                                   environment=ws.environments["my_custom_env"])

# Define the deployment configuration
deployment_config = AciWebservice.deploy_configuration(cpu_cores=1,
                                                       memory_gb=1,
                                                       description='Real-time classification service')

# Create the service
aci_service_name = 'my-aci-service'
aci_service = Model.deploy(workspace=ws,
                           name=aci_service_name,
                           models=[model_to_deploy],
                           inference_config=inference_config,
                           deployment_config=deployment_config,
                           overwrite=True)
aci_service.wait_for_deployment(show_output=True)
print(f"ACI service deployed: {aci_service.scoring_uri}")
For AKS deployment, you would use AksWebservice.deploy_configuration and pass the cluster as the deployment_target to Model.deploy, as sketched below.
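A hedged sketch of that AKS path, assuming a cluster already attached to the workspace (the cluster and service names here are illustrative):

# Example for AKS deployment
from azureml.core.compute import AksCompute
from azureml.core.webservice import AksWebservice

# Look up an existing AKS compute target in the workspace
aks_target = AksCompute(workspace=ws, name='my-aks-cluster')

# Production-grade configuration with autoscaling
aks_config = AksWebservice.deploy_configuration(cpu_cores=1,
                                                memory_gb=2,
                                                autoscale_enabled=True,
                                                autoscale_min_replicas=1,
                                                autoscale_max_replicas=4)

aks_service = Model.deploy(workspace=ws,
                           name='my-aks-service',
                           models=[model_to_deploy],
                           inference_config=inference_config,
                           deployment_config=aks_config,
                           deployment_target=aks_target)
aks_service.wait_for_deployment(show_output=True)
print(f"AKS service deployed: {aks_service.scoring_uri}")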
Step 5: Test Your Deployed Service
Once deployed, you can send requests to the scoring URI to get predictions.
# Example testing with ACI
import requests
import json
scoring_uri = aci_service.scoring_uri
headers = {'Content-Type': 'application/json'}
# Sample data to send for prediction
sample_input = {"data": [[1.0, 2.0, 3.0, 4.0]]} # Adjust based on your model's input
response = requests.post(scoring_uri, data=json.dumps(sample_input), headers=headers)
print(f"Status Code: {response.status_code}")
print(f"Response: {response.json()}")
Best Practices
- Containerization: Azure ML uses Docker containers for deployment, ensuring your environment is consistent.
- Versioning: Register multiple versions of your model to manage updates and rollbacks.
- Monitoring: Implement logging and monitoring for your deployed services to track performance and detect issues (see the sketch after this list).
- Security: Use authentication and authorization mechanisms for your web services.
- Scalability: For high-traffic scenarios, consider deploying to Azure Kubernetes Service (AKS).
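For the monitoring point above, one lightweight option is Application Insights, which Azure ML can enable on an existing service; a minimal sketch, assuming the aci_service handle from Step 4:

# Enable Application Insights to collect request-level logs and metrics
aci_service.update(enable_app_insights=True)

# Pull recent container logs for quick troubleshooting
print(aci_service.get_logs())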