# Endpoints
Azure Machine Learning endpoints provide a managed way to expose your trained models as REST APIs. Choose the right endpoint type based on latency, scaling, and usage patterns.
## Endpoint Types
| Type | Use case | Typical latency | Scaling |
|---|---|---|---|
| Managed online endpoint | Low‑latency, synchronous inferencing for web or mobile apps, hosted on Azure‑managed compute | Milliseconds to seconds, depending on model and instance type | Autoscaling of provisioned instances |
| Kubernetes online endpoint | Online inferencing on an AKS or Azure Arc‑enabled cluster that you manage | Milliseconds to seconds | Cluster‑based scaling you configure |
| Batch endpoint | Large‑scale asynchronous scoring | Minutes to hours | Job‑based compute clusters |
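The trade-offs in the table can be sketched as a small decision helper. This is illustrative only; the function name and parameters are assumptions for this sketch, not part of any Azure ML API:

```python
def pick_endpoint_type(synchronous: bool, own_cluster: bool = False) -> str:
    """Illustrative mapping from workload shape to Azure ML endpoint type.

    Rule of thumb: synchronous, low-latency traffic goes to an online
    endpoint (managed, unless you need to run on your own Kubernetes
    cluster); large asynchronous scoring jobs go to a batch endpoint.
    """
    if not synchronous:
        return "batch endpoint"
    return "kubernetes online endpoint" if own_cluster else "managed online endpoint"
```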
## Create a Real‑time Endpoint
```bash
az ml online-endpoint create \
  --name my-ml-endpoint \
  --resource-group my-rg \
  --workspace-name my-workspace
```
## Deploy a Model
The v2 CLI reads deployment settings (model, instance type, instance count) from a YAML specification file:

```bash
az ml online-deployment create \
  --name my-deployment \
  --endpoint-name my-ml-endpoint \
  --file deployment.yml \
  --all-traffic
```

with `deployment.yml` along the lines of:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: my-deployment
endpoint_name: my-ml-endpoint
model: azureml:my-model:1
instance_type: Standard_DS3_v2
instance_count: 2
```
## Invoke the Endpoint
```python
import os

import requests

# For key auth, retrieve the key with:
#   az ml online-endpoint get-credentials --name my-ml-endpoint
url = "https://my-ml-endpoint.eastus2.inference.ml.azure.com/score"
token = os.environ["ENDPOINT_KEY"]

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {token}",  # endpoint key or Microsoft Entra token
}
payload = {"data": [[5.1, 3.5, 1.4, 0.2]]}
response = requests.post(url, headers=headers, json=payload, timeout=30)
response.raise_for_status()
print(response.json())
```
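The request construction can be factored into a small helper so the same headers and body shape are reused across callers. `build_score_request` is a hypothetical name for this sketch; the `Bearer` scheme itself is what Azure ML online endpoints expect for both keys and tokens:

```python
import json


def build_score_request(key: str, rows: list[list[float]]) -> tuple[dict, bytes]:
    """Build (headers, body) for an Azure ML online endpoint score call.

    `key` is the endpoint key or an access token; both are sent in the
    same `Authorization: Bearer ...` header.
    """
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key}",
    }
    body = json.dumps({"data": rows}).encode("utf-8")
    return headers, body
```

Usage: `requests.post(url, headers=headers, data=body, timeout=30)`.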
## Monitoring & Logging
Enable Application Insights to capture request/response logs, latency metrics, and custom dimensions. It is enabled per deployment, not per endpoint:

```bash
az ml online-deployment update \
  --name my-deployment \
  --endpoint-name my-ml-endpoint \
  --set app_insights_enabled=true
```

Endpoint-level properties such as tags are updated separately:

```bash
az ml online-endpoint update \
  --name my-ml-endpoint \
  --set tags.environment=prod
```