Overview
Deploying a trained model to Azure Machine Learning enables you to expose it as a real‑time endpoint or a batch scoring pipeline. This guide walks you through the supported deployment options, best practices, and how to monitor your service once it’s live.
Prerequisites
- An Azure subscription with access to an Azure Machine Learning workspace.
- Azure CLI 2.30+ and the ml extension installed.
- Python 3.8-3.11 with the azureml-core and azureml-mlflow packages.
- A trained model artifact (e.g., .pkl, .onnx, or .tar.gz).
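For reference, the CLI extension and the Python packages listed above can be installed like this:

# Add the ml extension to the Azure CLI
az extension add -n ml

# Install the SDK packages
pip install azureml-core azureml-mlflow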
Deploy with Azure ML SDK
from azureml.core import Workspace, Environment
from azureml.core.model import Model, InferenceConfig
from azureml.core.webservice import AciWebservice

# Connect to the workspace described by config.json
ws = Workspace.from_config()

# Register the trained artifact in the workspace model registry
model = Model.register(workspace=ws,
                       model_path="outputs/model.pkl",
                       model_name="my-model")

# Build the runtime environment from a conda specification
env = Environment.from_conda_specification(name="my-env",
                                           file_path="environment.yml")

inference_config = InferenceConfig(entry_script="score.py", environment=env)

# ACI target: 2 cores, 4 GB RAM, key-based auth enabled
deployment_config = AciWebservice.deploy_configuration(cpu_cores=2,
                                                       memory_gb=4,
                                                       auth_enabled=True)

service = Model.deploy(workspace=ws,
                       name="my-model-service",
                       models=[model],
                       inference_config=inference_config,
                       deployment_config=deployment_config)
service.wait_for_deployment(show_output=True)

print(f"Scoring URI: {service.scoring_uri}")
print(f"Swagger URI: {service.swagger_uri}")
The same workflow can be run cell by cell in a Jupyter notebook:

# Cell 1: Load workspace
ws = Workspace.from_config()
# Cell 2: Register model
model = Model.register(ws, model_path="model.pkl", model_name="my-model")
# Cell 3: Define env & inference
env = Environment.from_conda_specification("ml-env", "environment.yml")
inf_cfg = InferenceConfig(entry_script="score.py", environment=env)
# Cell 4: Deploy to ACI
aci_cfg = AciWebservice.deploy_configuration(cpu_cores=2, memory_gb=4)
svc = Model.deploy(ws, "my-model-svc", [model], inf_cfg, aci_cfg)
svc.wait_for_deployment(show_output=True)
svc.scoring_uri  # last expression in the cell, so the URI is displayed as output
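A final cell can smoke-test the service through the SDK instead of hand-building the REST call; as above, the payload shape is an assumption that must match score.py:

# Cell 5: Smoke-test the deployed service
import json
print(svc.run(json.dumps({"data": [[0.1, 0.2, 0.3, 0.4]]})))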
Deploy with Azure CLI
Run the following commands from your terminal. Make sure the ml extension for the Azure CLI is installed (see Prerequisites).
# Register the model (CLI v2)
az ml model create --name my-model --version 1 --path model.pkl

# Register an environment from a conda specification
az ml environment create --name my-env --version 1 \
    --conda-file environment.yml \
    --image mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04

# Create a managed online endpoint and a deployment behind it
az ml online-endpoint create --name my-model-endpoint --auth-mode aml_token
az ml online-deployment create --name default \
    --endpoint-name my-model-endpoint \
    --file deployment.yml --all-traffic
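The deployment settings (model, environment, entry script, instance size and count) live in deployment.yml. A minimal sketch; the instance type and count are illustrative:

# deployment.yml -- managed online deployment (values are illustrative)
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: default
model: azureml:my-model:1
environment: azureml:my-env:1
code_configuration:
  code: .
  scoring_script: score.py
instance_type: Standard_DS2_v2
instance_count: 1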
Monitoring & Scaling
Azure Machine Learning provides built-in metrics and autoscaling. Use the portal or CLI to configure alerts.
- Metrics: Requests, Latency, CPU, Memory.
- Autoscale: based on CPU or Requests thresholds.
- Logging: enable Application Insights for detailed request logs (see the log-retrieval example after this list).
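Container logs for a deployment can also be pulled directly from the CLI, which is useful when debugging a failing entry script:

# Tail the last 100 lines of the scoring container's log
az ml online-deployment get-logs --name default \
    --endpoint-name my-model-endpoint --lines 100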
Example CLI to enable autoscaling. Managed online endpoints scale through Azure Monitor autoscale rules attached to the deployment's ARM resource; in this sketch, the resource group my-rg and the CpuUtilizationPercentage metric name are placeholders to verify for your workspace:

# Look up the deployment's ARM resource ID
DEPLOYMENT_ID=$(az ml online-deployment show --name default \
    --endpoint-name my-model-endpoint --query id -o tsv)

# Attach an autoscale profile: 2-10 instances
az monitor autoscale create --name my-model-autoscale \
    --resource-group my-rg \
    --resource "$DEPLOYMENT_ID" \
    --min-count 2 --max-count 10 --count 2

# Scale out by one instance when average CPU exceeds 70%
az monitor autoscale rule create --autoscale-name my-model-autoscale \
    --resource-group my-rg \
    --condition "CpuUtilizationPercentage > 70 avg 5m" \
    --scale out 1
Sample Code
Download a complete example project that includes score.py, environment.yml, and a Jupyter notebook.
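Until you grab the project, here is a minimal sketch of the two supporting files. The score.py follows the init()/run() entry-script contract used by InferenceConfig; the model filename and input format are assumptions:

# score.py -- entry script loaded by the inference server
import json
import os

import joblib  # assumes a scikit-learn model serialized with joblib/pickle


def init():
    # Called once at container startup; load the registered model.
    # AZUREML_MODEL_DIR points at the folder the model was mounted into.
    global model
    model_path = os.path.join(os.environ["AZUREML_MODEL_DIR"], "model.pkl")
    model = joblib.load(model_path)


def run(raw_data):
    # Called once per request; raw_data is the request body as a string.
    data = json.loads(raw_data)["data"]
    predictions = model.predict(data)
    return predictions.tolist()

And a matching environment.yml; azureml-defaults supplies the serving stack the container needs, while the remaining dependencies are assumptions tied to the sketch above:

# environment.yml -- conda specification for the scoring environment
name: my-env
channels:
  - conda-forge
dependencies:
  - python=3.9
  - scikit-learn
  - pip
  - pip:
      - azureml-defaults
      - joblib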
FAQ
What is the difference between ACI and AKS deployments?
ACI (Azure Container Instances) is ideal for low‑traffic or testing scenarios. AKS (Azure Kubernetes Service) provides production‑grade scaling, high availability, and advanced networking.
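Switching the SDK example from ACI to AKS only changes the deployment configuration and target. A sketch, assuming an AKS cluster named my-aks-cluster is already attached to the workspace:

from azureml.core.compute import AksCompute
from azureml.core.webservice import AksWebservice

# Reference the attached AKS cluster (name is an assumption)
aks_target = AksCompute(ws, "my-aks-cluster")

# Production-style config with autoscaling enabled
aks_config = AksWebservice.deploy_configuration(cpu_cores=2, memory_gb=4,
                                                autoscale_enabled=True)

service = Model.deploy(workspace=ws,
                       name="my-model-service-aks",
                       models=[model],
                       inference_config=inference_config,
                       deployment_config=aks_config,
                       deployment_target=aks_target)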
How do I secure my endpoint?
Use key authentication (--auth-mode key, the default) or Azure ML token authentication (--auth-mode aml_token), and optionally restrict access with Azure Private Link. HTTPS is enabled by default; configure firewall rules as needed.
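Credentials for either mode can be retrieved from the CLI:

# Returns keys (key mode) or an access token (aml_token mode)
az ml online-endpoint get-credentials --name my-model-endpoint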
Can I deploy multiple models to a single endpoint?
Yes. Create separate deployments under the same online endpoint and route a request to a specific model by setting the azureml-model-deployment header to the deployment name (the endpoint URL stays the same), or by passing --deployment-name to az ml online-endpoint invoke, as shown below.
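For example, to invoke the default deployment directly (sample.json is a placeholder request body):

az ml online-endpoint invoke --name my-model-endpoint \
    --deployment-name default --request-file sample.json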