Azure Machine Learning Endpoints
Endpoints in Azure Machine Learning are the gateway to your trained machine learning models, allowing them to be consumed by applications and services. They expose a REST API for real-time inference and also support batch scoring scenarios. This section covers the different types of endpoints, how to create and manage them, and best practices for deployment.
What are Azure ML Endpoints?
An endpoint acts as a managed web service that hosts your machine learning model. When you deploy a model to an endpoint, Azure ML provisions the necessary compute resources and configures a scalable, secure API for accessing your model.
Types of Endpoints
Azure Machine Learning supports two primary types of endpoints:
- Online Endpoints: Designed for low-latency, real-time inference. These endpoints are ideal for scenarios where you need immediate predictions, such as powering a web application or an IoT device. They offer features like autoscaling and high availability.
- Batch Endpoints: Optimized for scoring large volumes of data asynchronously. Batch endpoints are suitable for scenarios where you can afford to wait for predictions, like processing daily sales data or analyzing large image datasets.
Creating and Deploying Endpoints
You can create and deploy endpoints using various tools:
- Azure Machine Learning Studio: A user-friendly web interface for managing your ML resources, including endpoints.
- Azure CLI (ml extension): A command-line interface for scripting and automating ML tasks.
- Python SDK: Programmatically interact with Azure ML services using Python.
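For example, a minimal Python SDK (v2) sketch for connecting to a workspace; the subscription, resource group, and workspace names below are placeholders:

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Authenticate and connect to the workspace (placeholder identifiers).
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)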
Example: Deploying a model to an Online Endpoint (Conceptual)
The process typically involves packaging your model, creating an inference script, defining the environment, and then deploying it to an endpoint.
Key Concepts:
- Model: Your trained machine learning artifact (e.g., a scikit-learn model, a TensorFlow graph, a PyTorch model).
- Inference Script: A script (e.g., score.py) that defines how to load your model and make predictions.
- Environment: Specifies the dependencies (libraries, runtime) required for your model to run.
- Compute Target: The compute resource where your endpoint runs; for managed online endpoints, Azure ML provisions and manages this for you.
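To make the inference script concrete, here is a minimal sketch of a score.py following the init()/run() contract that online endpoints expect; it assumes a scikit-learn model serialized as model.pkl (the filename is illustrative):

import json
import os

import joblib

model = None

def init():
    # Runs once when the deployment starts: load the model into memory.
    # AZUREML_MODEL_DIR points to the registered model's files.
    global model
    model_path = os.path.join(os.environ["AZUREML_MODEL_DIR"], "model.pkl")
    model = joblib.load(model_path)

def run(raw_data):
    # Runs per request: parse the JSON payload and return predictions.
    data = json.loads(raw_data)["data"]
    predictions = model.predict(data)
    return predictions.tolist()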
Example YAML for Online Endpoint Deployment:
With the Azure CLI (v2), the endpoint and its deployment are defined in separate YAML files. First the endpoint (for example, endpoint.yml):

$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: my-online-endpoint
description: A sample online endpoint for model inference
auth_mode: key

Then the deployment (for example, deployment.yml), which references the endpoint, the registered model, the inference script, and the environment (the asset names here are illustrative):

$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: production
endpoint_name: my-online-endpoint
model: azureml:my-model-name:1
code_configuration:
  code: ./src
  scoring_script: score.py
environment: azureml:my-environment:1
instance_type: Standard_DS3_v2
instance_count: 1
request_settings:
  max_concurrent_requests_per_instance: 1
liveness_probe:
  initial_delay: 30
  period: 10
readiness_probe:
  initial_delay: 30
  period: 10
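The same endpoint and deployment can also be created programmatically. Below is a sketch using the Python SDK (v2) and the ml_client from earlier; it mirrors the hypothetical asset names above, and for a non-MLflow model you would also supply code_configuration and environment as in the YAML:

from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment

# Create the endpoint first: a stable name plus an auth mode.
endpoint = ManagedOnlineEndpoint(name="my-online-endpoint", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Then create the deployment that serves the registered model.
deployment = ManagedOnlineDeployment(
    name="production",
    endpoint_name="my-online-endpoint",
    model="azureml:my-model-name:1",
    instance_type="Standard_DS3_v2",
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()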
Managing Endpoints
Once deployed, you can monitor the performance, scale your endpoints up or down, update the deployed model, and manage access keys through the Azure portal or the Azure CLI.
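For instance, with the Python SDK (v2) you can fetch the endpoint's keys, route traffic to a deployment, and send a test request; request.json is a placeholder payload file:

# Retrieve the endpoint's authentication keys.
keys = ml_client.online_endpoints.get_keys(name="my-online-endpoint")

# Route all traffic to the "production" deployment.
endpoint = ml_client.online_endpoints.get(name="my-online-endpoint")
endpoint.traffic = {"production": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Send a test scoring request.
response = ml_client.online_endpoints.invoke(
    endpoint_name="my-online-endpoint",
    request_file="request.json",
)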
Best Practices
- Choose the right endpoint type: Select online endpoints for real-time needs and batch endpoints for large-scale asynchronous processing.
- Optimize your model: Ensure your model is efficient for inference to reduce latency and costs.
- Implement health probes: Use liveness and readiness probes to ensure your endpoint is healthy and available.
- Monitor performance: Regularly track metrics like latency, throughput, and error rates to identify and resolve issues.
- Secure your endpoints: Utilize authentication and authorization mechanisms to protect your deployed models.
Learn More
Explore the official Azure Machine Learning documentation for in-depth guides, tutorials, and API references related to endpoints.
Deploy models with online endpoints in Azure Machine Learning
Deploy models with batch endpoints in Azure Machine Learning