Managed Online Endpoints in Azure Machine Learning
Managed online endpoints provide a fully managed experience for deploying and hosting real-time inference models in Azure Machine Learning. They simplify the process of making your machine learning models available for predictions with low latency and high throughput.
What are Managed Online Endpoints?
Managed online endpoints are REST APIs that allow your applications to send data to your deployed model and receive predictions in real-time. Azure Machine Learning handles the underlying infrastructure, including scaling, patching, and load balancing, so you can focus on your model and application.
Key Benefits:
- Fully Managed Infrastructure: No need to manage virtual machines or Kubernetes clusters for your online deployments.
- High Availability and Scalability: Automatically scales to handle varying workloads and ensures your endpoints are always available.
- Low Latency: Optimized for real-time inference, providing fast response times.
- Secure: Integrates with Microsoft Entra ID (formerly Azure Active Directory) and network security features such as private endpoints.
- Cost-Effective: Pay only for the resources consumed by your deployments.
How to Create and Deploy to Managed Online Endpoints
1. Register Your Model (and Any Associated Data)
Ensure your model and any associated assets (e.g., preprocessing pipelines, lookup tables) are registered in your Azure Machine Learning workspace as model or data assets.
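If you work with the Azure CLI (v2), a model registration can be described in a small YAML spec and registered with `az ml model create -f model.yml`. The file below is a minimal sketch; the name and path are placeholders you would replace with your own:

```yaml
# model.yml -- hypothetical registration spec; name and path are placeholders
$schema: https://azuremlschemas.azureedge.net/latest/model.schema.json
name: my_model
path: ./my_model.pkl
```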
2. Define the Scoring Script
Create a Python script (e.g., score.py) that defines how to load your model and how to perform inference. This script typically includes two functions:
- init(): Called once when the deployment starts. Use this to load your model into memory.
- run(raw_data): Called for each inference request. It takes the raw input data, preprocesses it, performs inference using the loaded model, and returns the predictions.
import os
import json
import pickle

def init():
    global model
    # AZUREML_MODEL_DIR points to the folder where the registered model is mounted
    model_path = os.path.join(os.getenv("AZUREML_MODEL_DIR"), "my_model.pkl")
    # Load the model into memory once, when the deployment starts
    with open(model_path, "rb") as file:
        model = pickle.load(file)

def run(raw_data):
    # Deserialize the incoming JSON payload
    data = json.loads(raw_data)["data"]
    # Perform inference with the loaded model
    predictions = model.predict(data)
    # Return the predictions as JSON
    return json.dumps({"predictions": predictions.tolist()})
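Because the scoring script is plain Python, you can sanity-check its logic locally before deploying. The sketch below is an illustration, not part of the deployment: it substitutes a stub model (whose predict() simply sums each row) for the pickled one, and exercises the same JSON-in/JSON-out contract that run() implements:

```python
import json

class StubModel:
    """Stand-in for the pickled model; predict() returns each row's sum."""
    def predict(self, rows):
        return [sum(r) for r in rows]

model = StubModel()

def run(raw_data):
    # Same contract as the deployed scoring script: JSON in, JSON out
    data = json.loads(raw_data)["data"]
    predictions = model.predict(data)
    return json.dumps({"predictions": predictions})

payload = json.dumps({"data": [[1, 2, 3, 4], [5, 6, 7, 8]]})
print(run(payload))  # {"predictions": [10, 26]}
```

If this local round trip works, the same run() body should behave identically once the real model is loaded by init() in the deployment.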
3. Create an Environment
Define the Conda environment required for your scoring script, including any necessary Python packages and their versions.
name: azureml_deployment
dependencies:
  - python=3.8
  - pip
  - pip:
      - azureml-defaults
      - scikit-learn
      - pandas
4. Create an Online Endpoint
You can create a managed online endpoint using the Azure portal, Azure CLI, or the Azure Machine Learning SDK.
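With the Azure CLI (v2), for example, the endpoint can be described in a small YAML file and created with `az ml online-endpoint create -f endpoint.yml`. The spec below is a minimal sketch using the endpoint name from the SDK example later in this article:

```yaml
# endpoint.yml -- minimal managed online endpoint spec
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: my-managed-endpoint
auth_mode: key
```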
5. Deploy a Model to the Endpoint
Deploy your registered model to the created endpoint. This involves specifying the deployment name, the model to deploy, the scoring script, and the environment.
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
    Environment,
    CodeConfiguration,
)
from azure.identity import DefaultAzureCredential

# Authenticate and get a handle to the workspace
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="",
    resource_group_name="",
    workspace_name="",
)

# Define and create the online endpoint
endpoint_name = "my-managed-endpoint"
endpoint = ManagedOnlineEndpoint(
    name=endpoint_name,
    description="My online endpoint for real-time inference",
    auth_mode="key",
)
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Define the online deployment
deployment_name = "my-blue-deployment"
env = Environment(
    conda_file="conda.yaml",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
)
deployment = ManagedOnlineDeployment(
    name=deployment_name,
    endpoint_name=endpoint_name,
    model="azureml:my_model:1",  # Replace with your registered model and version
    environment=env,
    code_configuration=CodeConfiguration(code=".", scoring_script="score.py"),
    instance_type="Standard_DS2_v2",  # Example instance type
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()
Testing Your Managed Online Endpoint
Once deployed, you can test your endpoint by sending POST requests with your data. You can use tools like cURL, Postman, or the Azure Machine Learning SDK.
curl -X POST "YOUR_ENDPOINT_URL/score" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"data": [[1, 2, 3, 4], [5, 6, 7, 8]]}'
Monitoring and Management
Azure Machine Learning provides tools to monitor the performance of your managed online endpoints, including request rates, latency, error rates, and resource utilization. You can also manage deployments, scale them, and update them with new model versions.
Managed online endpoints are a powerful feature for delivering real-time AI services, making it easier than ever to integrate machine learning models into your applications.