Azure Machine Learning Documentation

Endpoints

Azure Machine Learning endpoints provide a managed way to expose your trained models as REST APIs. Choose the right endpoint type based on latency, scaling, and usage patterns.

Endpoint Types

| Type | Use case | Typical latency | Scaling |
| --- | --- | --- | --- |
| Real-time (online) endpoint | Low-latency inferencing for web or mobile apps | Milliseconds | Autoscales on cluster-based compute |
| Batch endpoint | Large-scale asynchronous scoring | Minutes to hours | Job-based compute clusters |
| Managed online endpoint | Real-time inferencing on Azure-managed compute, with no cluster to operate | Milliseconds | Autoscales by metric or schedule |

Create a Real‑time Endpoint

az ml online-endpoint create \
    --name my-ml-endpoint \
    --resource-group my-rg \
    --workspace-name my-workspace
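
Before deploying, you can confirm the endpoint provisioned successfully. A quick check along these lines should work (same names as above; the provisioning_state field assumes the default CLI v2 output shape):

az ml online-endpoint show \
    --name my-ml-endpoint \
    --resource-group my-rg \
    --workspace-name my-workspace \
    --query provisioning_state -o tsv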

Deploy a Model

az ml online-deployment create \
    --name my-deployment \
    --endpoint-name my-ml-endpoint \
    --file deployment.yml \
    --all-traffic
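
The --file argument points to a deployment YAML rather than inline flags. A minimal deployment.yml matching the values used elsewhere in this article might look roughly like this (field names follow the managed online deployment schema; adjust the model reference and sizing for your workload):

$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: my-deployment
endpoint_name: my-ml-endpoint
model: azureml:my-model:1
instance_type: Standard_DS3_v2
instance_count: 2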

Invoke the Endpoint

import os
import requests

# Scoring URI of the endpoint; replace the region and endpoint name with your own.
url = "https://my-ml-endpoint.eastus2.inference.ml.azure.com/score"

# Endpoint key, read from an environment variable instead of a hard-coded token.
# ENDPOINT_KEY is a placeholder; see the command below for one way to set it.
api_key = os.environ["ENDPOINT_KEY"]

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}",
}
payload = {"data": [[5.1, 3.5, 1.4, 0.2]]}

response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()
print(response.json())
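
With key-based authentication, the bearer value is one of the endpoint's keys. One way to fetch it and export it for the script above (ENDPOINT_KEY is simply the placeholder variable the snippet reads):

export ENDPOINT_KEY=$(az ml online-endpoint get-credentials \
    --name my-ml-endpoint \
    --resource-group my-rg \
    --workspace-name my-workspace \
    --query primaryKey -o tsv)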

Monitoring & Logging

Enable Application Insights on a deployment to capture request/response logs, latency metrics, and custom dimensions. Application Insights is configured per deployment rather than on the endpoint itself, either with app_insights_enabled in the deployment YAML or by updating an existing deployment:

az ml online-deployment update \
    --name my-deployment \
    --endpoint-name my-ml-endpoint \
    --set app_insights_enabled=true

# Optionally tag the endpoint (for example to mark the environment it serves)
az ml online-endpoint update \
    --name my-ml-endpoint \
    --set tags.environment=prod
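
To pull recent logs from a deployment's inference container directly from the CLI, a command along these lines should work:

az ml online-deployment get-logs \
    --name my-deployment \
    --endpoint-name my-ml-endpoint \
    --resource-group my-rg \
    --workspace-name my-workspace \
    --lines 100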
