Azure Machine Learning

Overview

Deploying a trained model to Azure Machine Learning enables you to expose it as a real‑time endpoint or a batch scoring pipeline. This guide walks you through the supported deployment options, best practices, and how to monitor your service once it’s live.

Prerequisites

  • An Azure subscription with access to an Azure Machine Learning workspace.
  • Azure CLI 2.30+ with the ml extension (CLI v2) installed.
  • Python 3.8‑3.11 with the azureml-core and azureml-mlflow packages.
  • Trained model artifact (e.g., .pkl, .onnx, or .tar.gz).
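
Both deployment paths below reference a conda specification file. A minimal environment.yml might look like the following (package choices are illustrative; azureml-defaults supplies the inference server for SDK v1 deployments):

name: my-env
channels:
  - conda-forge
dependencies:
  - python=3.9
  - scikit-learn
  - pip
  - pip:
      - azureml-defaults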

Deploy with Azure ML SDK

The following example registers a trained model and deploys it to Azure Container Instances (ACI) using the v1 Python SDK (azureml-core):
from azureml.core import Workspace, Model, Environment, InferenceConfig
from azureml.core.webservice import AciWebservice

# Connect to the workspace described by config.json in the current directory
ws = Workspace.from_config()

# Register the trained artifact as a versioned model in the workspace
model = Model.register(workspace=ws,
                       model_path="outputs/model.pkl",
                       model_name="my-model")

# Build the inference environment from the conda specification
env = Environment.from_conda_specification(
    name="my-env",
    file_path="environment.yml"
)

# Pair the entry script with the environment
inference_config = InferenceConfig(entry_script="score.py", environment=env)

# ACI compute: 2 cores, 4 GB RAM, key-based auth enabled
deployment_config = AciWebservice.deploy_configuration(cpu_cores=2,
                                                       memory_gb=4,
                                                       auth_enabled=True)

service = Model.deploy(workspace=ws,
                       name="my-model-service",
                       models=[model],
                       inference_config=inference_config,
                       deployment_config=deployment_config)
service.wait_for_deployment(show_output=True)

print(f"Scoring URI: {service.scoring_uri}")
print(f"Swagger URI: {service.swagger_uri}")
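
The entry script referenced by InferenceConfig must define two functions: init(), which loads the model once per container, and run(), which handles each request. A minimal sketch of score.py, assuming a scikit-learn model serialized with joblib and a JSON payload of the form {"data": [[...]]} (both are conventions of this example, not requirements):

import json
import os

import joblib
import numpy as np

model = None

def init():
    # Azure ML mounts the registered model files under AZUREML_MODEL_DIR
    global model
    model_path = os.path.join(os.environ["AZUREML_MODEL_DIR"], "model.pkl")
    model = joblib.load(model_path)

def run(raw_data):
    # raw_data is the raw JSON request body; the "data" key is this example's convention
    data = np.array(json.loads(raw_data)["data"])
    return model.predict(data).tolist()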
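
Once the deployment succeeds, you can exercise the endpoint over HTTP. A sketch using requests, assuming key-based auth (auth_enabled=True above) and the payload convention from the score.py sketch:

import json

import requests

# get_keys() returns the service's primary and secondary auth keys
primary_key, _ = service.get_keys()

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {primary_key}",
}
payload = json.dumps({"data": [[0.1, 0.2, 0.3, 0.4]]})  # feature vector is illustrative

response = requests.post(service.scoring_uri, data=payload, headers=headers)
print(response.json())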

Deploy with Azure CLI

Run the following commands from your terminal. These use the Azure CLI v2 ml extension; install it with az extension add -n ml if needed.

# Register the model (CLI v2 uses "model create")
az ml model create --name my-model --version 1 --path model.pkl

# Create an environment from the conda file on a curated base image
az ml environment create --name my-env --version 1 \
    --conda-file environment.yml \
    --image mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04

# Create a managed online endpoint with token-based auth
az ml online-endpoint create --name my-model-endpoint --auth-mode aml_token

# Deploy the model; the instance type, instance count, and scoring
# script are defined in deployment.yml (see below)
az ml online-deployment create --file deployment.yml \
    --endpoint-name my-model-endpoint --all-traffic
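
The deployment.yml above follows the managed online deployment schema. A minimal sketch (the scoring script, instance size, and versions are assumptions consistent with the rest of this guide):

$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: default
endpoint_name: my-model-endpoint
model: azureml:my-model:1
environment: azureml:my-env:1
code_configuration:
  code: .
  scoring_script: score.py
instance_type: Standard_DS2_v2
instance_count: 1

Once the deployment is live, you can smoke-test it without hand-crafting an HTTP request:

az ml online-endpoint invoke --name my-model-endpoint --request-file sample-request.json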

Monitoring & Scaling

Azure Machine Learning provides built-in metrics and autoscaling. Use the portal or CLI to configure alerts.

  • Metrics: requests per minute, request latency, CPU and memory utilization.
  • Autoscale: based on CPU or request-count thresholds.
  • Logging: enable Application Insights for detailed request logs; container logs are also available from the CLI (see below).
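
For quick troubleshooting, you can pull a deployment's container logs directly from the CLI:

az ml online-deployment get-logs --endpoint-name my-model-endpoint --name default --lines 100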

Autoscaling for managed online endpoints is configured through Azure Monitor rather than through az ml flags. Example CLI to enable autoscaling (the deployment's ARM resource ID is a placeholder):

# Create an autoscale profile targeting the online deployment
az monitor autoscale create \
    --name my-model-autoscale \
    --resource <deployment-resource-id> \
    --min-count 2 --max-count 10 --count 2

# Add a rule: scale out by one instance when average CPU exceeds 70% over 5 minutes
az monitor autoscale rule create \
    --autoscale-name my-model-autoscale \
    --condition "CpuUtilizationPercentage > 70 avg 5m" \
    --scale out 1

Sample Code

A complete example project, including score.py, environment.yml, and a Jupyter notebook, is available as a downloadable ZIP.

FAQ

What is the difference between ACI and AKS deployments?

ACI (Azure Container Instances) is ideal for low‑traffic or testing scenarios. AKS (Azure Kubernetes Service) provides production‑grade scaling, high availability, and advanced networking. Managed online endpoints, used in the CLI example above, are the newer v2 option and are recommended for most new deployments.

How do I secure my endpoint?

Use token‑based authentication (--auth-mode aml_token) or key-based authentication (--auth-mode key), and optionally restrict access with Azure Private Link. HTTPS is enabled by default; configure firewall rules as needed.

Can I deploy multiple models to a single endpoint?

Yes. Create separate deployments within the same online endpoint and route a request to the desired model by setting the azureml-model-deployment header to the deployment's name, as shown below; endpoint traffic rules control the default split.
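
For example, to route a request to a specific deployment, set the azureml-model-deployment header (the URI, token, and payload below are placeholders):

curl --request POST "$SCORING_URI" \
    --header "Authorization: Bearer $TOKEN" \
    --header "Content-Type: application/json" \
    --header "azureml-model-deployment: default" \
    --data '{"data": [[0.1, 0.2, 0.3, 0.4]]}'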