Deploy Machine Learning Models
Deploying a trained model to an Azure Machine Learning endpoint enables you to expose it as a RESTful service for real‑time inference. This guide walks through three common deployment methods: Azure portal, Azure CLI, and the Azure ML Python SDK.
Prerequisites
- Azure subscription with Machine Learning workspace created.
- Trained model registered in the workspace.
- Azure CLI (az) version 2.30+ installed.
- Python 3.8+ and the azure-ai-ml package installed for the SDK method.
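You can verify the CLI and install the Python packages up front; the sketch below assumes the SDK v2 package names:

# Verify the Azure CLI version (should report 2.30 or later)
az version

# Install the v2 Python SDK and the authentication library
pip install azure-ai-ml azure-identity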
Deploy via Azure portal
1. Navigate to your Azure Machine Learning workspace.
2. Select Models → choose the model you want to deploy → click Deploy.
3. In the Deployments blade, select Real‑time endpoint and configure:
   - Compute type: Azure Container Instance or Azure Kubernetes Service
   - Instance size (e.g., Standard_DS2_v2)
   - Scaling rules (optional)
4. Click Deploy. The endpoint will be provisioned in a few minutes.
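Once the endpoint reports a healthy state, you can smoke‑test it with a plain REST call. The sketch below uses placeholder values; the scoring URI, token, and payload shape depend on your endpoint and scoring script:

# Placeholders: substitute your endpoint's scoring URI and auth token
curl -X POST "https://my-ml-endpoint.eastus.inference.ml.azure.com/score" \
  -H "Authorization: Bearer $AML_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"data": [[1.0, 2.0, 3.0]]}'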
Deploy via Azure CLI
Use the az ml extension (CLI v2) to define the endpoint and deployment in YAML files and submit them.
# Install the ML extension if it is not already present
az extension add -n ml

# Create an endpoint config file (endpoint.yml); quote EOF so $schema is not expanded
cat > endpoint.yml <<'EOF'
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: my-ml-endpoint
auth_mode: aml_token
EOF

# Create the endpoint
az ml online-endpoint create --file endpoint.yml \
  --resource-group my-rg --workspace-name my-ws

# Create a deployment config file (deploy.yml) referencing the registered model
cat > deploy.yml <<'EOF'
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: my-ml-deployment
endpoint_name: my-ml-endpoint
model: azureml:my-model:1
instance_type: Standard_DS2_v2
instance_count: 1
EOF

# Deploy the model
az ml online-deployment create --file deploy.yml \
  --resource-group my-rg --workspace-name my-ws
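Once both commands finish, send a quick test request from the CLI; sample.json is a hypothetical request file shaped for your scoring script:

# Send a test request to the endpoint
az ml online-endpoint invoke --name my-ml-endpoint \
  --request-file sample.json \
  --resource-group my-rg --workspace-name my-ws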
Deploy via Python SDK
Below is a minimal example using the Azure ML Python SDK (v2) to register a model, create a managed online endpoint, and deploy the model to it.
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
    Model,
)
from azure.identity import DefaultAzureCredential

# Initialize the client (authenticates via Azure CLI, environment
# variables, or managed identity)
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="YOUR_SUBSCRIPTION_ID",
    resource_group_name="YOUR_RESOURCE_GROUP",
    workspace_name="YOUR_WORKSPACE_NAME",
)

# Register the model (if not already registered)
model = ml_client.models.create_or_update(
    Model(
        path="path/to/model.pkl",
        name="my-model",
        type="custom_model",
    )
)

# Create the endpoint; begin_create_or_update returns a poller,
# and .result() blocks until provisioning finishes
endpoint = ManagedOnlineEndpoint(
    name="my-ml-endpoint",
    auth_mode="aml_token",
)
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Create the deployment. Note: a custom model also needs an
# environment and a code_configuration (scoring script); an MLflow
# model can be deployed without them.
deployment = ManagedOnlineDeployment(
    name="my-deployment",
    endpoint_name=endpoint.name,
    model=model,
    instance_type="Standard_DS2_v2",
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()

# Route all traffic to the new deployment
endpoint.traffic = {"my-deployment": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

print(f"Endpoint URL: {ml_client.online_endpoints.get(name=endpoint.name).scoring_uri}")
Monitoring & Scaling
Azure ML provides built-in monitoring for latency, request count, and CPU/memory usage.
- Navigate to your endpoint → Metrics to view real‑time graphs.
- Enable auto‑scale on AKS deployments via the deployment's scale settings in the YAML or SDK; a minimal sketch follows this list.
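A minimal sketch of target‑utilization autoscaling for a Kubernetes online deployment with SDK v2; KubernetesOnlineDeployment and TargetUtilizationScaleSettings come from azure.ai.ml.entities, and the instance bounds and utilization target here are illustrative assumptions:

from azure.ai.ml.entities import (
    KubernetesOnlineDeployment,
    TargetUtilizationScaleSettings,
)

# Scale between 1 and 5 instances, targeting ~70% utilization
deployment = KubernetesOnlineDeployment(
    name="my-deployment",
    endpoint_name="my-ml-endpoint",
    model=model,  # a registered Model, as in the SDK example above
    scale_settings=TargetUtilizationScaleSettings(
        min_instances=1,
        max_instances=5,
        target_utilization_percentage=70,
    ),
)
ml_client.online_deployments.begin_create_or_update(deployment).result()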
FAQs
Can I deploy multiple models to the same endpoint?
Yes. Use traffic routing to split requests between deployments on the same endpoint, for example:
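A minimal sketch, assuming two deployments named blue and green already exist on the endpoint:

# Send 90% of requests to "blue" and 10% to "green"
endpoint.traffic = {"blue": 90, "green": 10}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()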
How do I secure the endpoint?
Set auth_mode to key, aml_token, or aad_token and pass the corresponding key or token in the Authorization header of client requests.
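For key‑based authentication, the credentials can be fetched from the CLI; the resource names below are placeholders:

# Retrieve the endpoint's auth keys or token
az ml online-endpoint get-credentials --name my-ml-endpoint \
  --resource-group my-rg --workspace-name my-ws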
What is the pricing model?
Pricing is based on the compute type (ACI or AKS) and the number of instances. See the pricing page for details.