# Endpoints
Azure Machine Learning endpoints provide a managed way to expose your trained models as REST APIs. Choose the right endpoint type based on latency, scaling, and usage patterns.
## Endpoint Types
| Type | Use case | Typical latency | Scaling |
|---|---|---|---|
| Managed online endpoint | Low‑latency, synchronous inferencing for web or mobile apps, hosted on Azure‑managed compute | Milliseconds to seconds, depending on model and instance type | Autoscaling of provisioned instances |
| Kubernetes online endpoint | Online inferencing on an AKS or Azure Arc‑enabled cluster that you manage | Milliseconds to seconds | Cluster‑based scaling you configure |
| Batch endpoint | Large‑scale asynchronous scoring | Minutes to hours | Job‑based compute clusters |
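The trade-offs in the table can be sketched as a small decision helper. This is illustrative only; the function name and parameters are assumptions for this sketch, not part of any Azure ML API:

```python
def pick_endpoint_type(synchronous: bool, own_cluster: bool = False) -> str:
    """Illustrative mapping from workload shape to Azure ML endpoint type.

    Rule of thumb: synchronous, low-latency traffic goes to an online
    endpoint (managed, unless you need to run on your own Kubernetes
    cluster); large asynchronous scoring jobs go to a batch endpoint.
    """
    if not synchronous:
        return "batch endpoint"
    return "kubernetes online endpoint" if own_cluster else "managed online endpoint"
```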
## Create a Real‑time Endpoint
```bash
az ml online-endpoint create \
  --name my-ml-endpoint \
  --resource-group my-rg \
  --workspace-name my-workspace
```
## Deploy a Model
The v2 CLI reads deployment settings (model, instance type, instance count) from a YAML specification file:

```bash
az ml online-deployment create \
  --name my-deployment \
  --endpoint-name my-ml-endpoint \
  --file deployment.yml \
  --all-traffic
```

with `deployment.yml` along the lines of:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: my-deployment
endpoint_name: my-ml-endpoint
model: azureml:my-model:1
instance_type: Standard_DS3_v2
instance_count: 2
```
## Invoke the Endpoint
```python
import os

import requests

# For key auth, retrieve the key with:
#   az ml online-endpoint get-credentials --name my-ml-endpoint
url = "https://my-ml-endpoint.eastus2.inference.ml.azure.com/score"
token = os.environ["ENDPOINT_KEY"]

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {token}",  # endpoint key or Microsoft Entra token
}
payload = {"data": [[5.1, 3.5, 1.4, 0.2]]}
response = requests.post(url, headers=headers, json=payload, timeout=30)
response.raise_for_status()
print(response.json())
```
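The request construction can be factored into a small helper so the same headers and body shape are reused across callers. `build_score_request` is a hypothetical name for this sketch; the `Bearer` scheme itself is what Azure ML online endpoints expect for both keys and tokens:

```python
import json


def build_score_request(key: str, rows: list[list[float]]) -> tuple[dict, bytes]:
    """Build (headers, body) for an Azure ML online endpoint score call.

    `key` is the endpoint key or an access token; both are sent in the
    same `Authorization: Bearer ...` header.
    """
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key}",
    }
    body = json.dumps({"data": rows}).encode("utf-8")
    return headers, body
```

Usage: `requests.post(url, headers=headers, data=body, timeout=30)`.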
## Monitoring & Logging
Enable Application Insights to capture request/response logs, latency metrics, and custom dimensions. It is enabled per deployment, not per endpoint:

```bash
az ml online-deployment update \
  --name my-deployment \
  --endpoint-name my-ml-endpoint \
  --set app_insights_enabled=true
```

Endpoint-level properties such as tags are updated separately:

```bash
az ml online-endpoint update \
  --name my-ml-endpoint \
  --set tags.environment=prod
```