How to Consume Azure Machine Learning Models

This guide provides detailed instructions on how to consume and interact with machine learning models deployed in Azure Machine Learning. We will cover various methods for sending data to your deployed models and receiving predictions.

Introduction

Once your machine learning model is trained and deployed, the next crucial step is to integrate it into your applications or workflows. Azure Machine Learning offers flexible ways to consume your deployed models, enabling real-time inference or batch scoring.

Methods for Consuming Models

You can consume your Azure ML models through several primary mechanisms:

  • REST Endpoints: The most common method, allowing any application that can make HTTP requests to interact with your model.
  • SDKs: Use the Azure Machine Learning SDKs (most commonly the Python SDK) for programmatic access and deeper integration.
  • Batch Endpoints: For scoring large datasets asynchronously.

1. Consuming via REST Endpoints

Azure Machine Learning deploys models as web services that expose REST APIs. This approach is language-agnostic: any client that can issue HTTPS requests can call the model.

Prerequisites

  • A deployed model in Azure Machine Learning.
  • The scoring URI and authentication key for your endpoint.

Steps

Get Endpoint Details

Navigate to your deployed service in Azure Machine Learning studio. Under Endpoints, select your deployment and open its Consume tab, where you will find the scoring URI (REST endpoint) and the primary and secondary keys.

Example:
Scoring URI: https://your-endpoint-name.your-region.azureml.ms/score
Primary Key: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
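
If you prefer to retrieve these details programmatically, the sketch below uses the azureml-core (v1) Python SDK described in section 2. The service name is a placeholder, and get_keys assumes key authentication is enabled on the deployment.

from azureml.core import Workspace
from azureml.core.webservice import Webservice

# Assumes a config.json for your workspace is available locally
ws = Workspace.from_config()
service = Webservice(workspace=ws, name="your-deployed-service-name")

print(service.scoring_uri)   # the REST scoring URI
print(service.get_keys())    # (primary, secondary) keys, when key auth is enabled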

Prepare Input Data

Your input data must match the input schema expected by your model's scoring script; for most deployments this is JSON. For example, if your model expects a batch of feature vectors:


{
  "data": [
    [1.0, 2.5, 3.1],
    [4.2, 5.0, 6.7]
  ]
}
                    

The exact structure depends on your model's input schema.

Send Inference Request

Use an HTTP client (like curl, Postman, or programming language libraries) to send a POST request to the scoring URI.


curl -X POST \
  https://your-endpoint-name.your-region.azureml.ms/score \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_PRIMARY_KEY' \
  -d '{
        "data": [
          [1.0, 2.5, 3.1],
          [4.2, 5.0, 6.7]
        ]
      }'
                    

Receive and Process Predictions

The response contains the predictions in JSON format; the exact structure depends on what your model's scoring script returns.


{
  "predictions": [0, 1]
}
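
From application code, the same call can be made with an ordinary HTTP client. The following is a minimal sketch using Python's requests library; the scoring URI, key, and payload are the placeholders from the examples above and must be replaced with your endpoint's actual values.

import requests

# Placeholder endpoint details; substitute your own scoring URI and key
scoring_uri = "https://your-endpoint-name.your-region.azureml.ms/score"
primary_key = "YOUR_PRIMARY_KEY"

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {primary_key}",
}
payload = {"data": [[1.0, 2.5, 3.1], [4.2, 5.0, 6.7]]}

response = requests.post(scoring_uri, headers=headers, json=payload)
response.raise_for_status()            # surface HTTP errors early
predictions = response.json()          # e.g. {"predictions": [0, 1]}
print(predictions)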
                    

Authentication

For key-authenticated endpoints, include an Authorization: Bearer <key> header with your primary or secondary key, as shown in the curl example above. For token-based authentication, first acquire a short-lived access token and pass it in the same header.
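
To keep keys out of source code, one common pattern is to read them from an environment variable at startup (the variable name below is only an example) or to fetch them from Azure Key Vault.

import os

# AZUREML_ENDPOINT_KEY is an illustrative name; set it in your environment
primary_key = os.environ["AZUREML_ENDPOINT_KEY"]

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {primary_key}",
}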

2. Consuming via Azure Machine Learning SDK (Python)

The Python SDK provides a more integrated experience for interacting with deployed models.

Prerequisites

  • The Azure ML Python SDK (v1) installed (pip install azureml-core).
  • An authenticated Workspace object.
  • The name of your deployed web service.

Steps

Get the Deployed Service Object


from azureml.core import Workspace
from azureml.core.webservice import Webservice

# Load the workspace (assumes a config.json is present or its path is specified)
ws = Workspace.from_config()

# Attach to the existing deployed service by name
service_name = 'your-deployed-service-name'
service = Webservice(workspace=ws, name=service_name)

Prepare Input Data

Build the request payload as a Python dictionary that matches your model's input schema. Note that the v1 SDK does not serialize it for you; convert it to a JSON string (for example with json.dumps) when you invoke the service.


input_data = {"data": [[1.0, 2.5, 3.1], [4.2, 5.0, 6.7]]}
                    

Invoke the Service


import json

# For real-time inference: serialize the payload to JSON and call the service
result = service.run(input_data=json.dumps(input_data))
print(result)

Batch Scoring with SDK

For batch scoring, you would typically use a batch endpoint (covered in the next section) or build an Azure ML pipeline that runs the model over a full dataset, rather than looping over a real-time service.

3. Consuming via Batch Endpoints

Batch endpoints are designed for scoring large volumes of data asynchronously. They are ideal for scenarios where you don't need immediate, real-time predictions.

Key Concepts

  • Input Data: Typically stored in Azure Blob Storage or ADLS Gen2.
  • Output: Predictions are written to a specified output location.
  • Asynchronous: You submit a job and poll for its completion.

Steps (Conceptual)

  1. Deploy your model to a Batch Endpoint.
  2. Prepare your input data and upload it to a supported storage location.
  3. Submit a batch inference job using the Azure ML SDK or CLI, specifying input and output locations.
  4. Monitor the job status until completion.
  5. Retrieve predictions from the output location.

Refer to the specific Azure ML documentation for Batch Endpoints for detailed command-line or SDK examples.
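
For orientation only, a batch invocation with the v2 Python SDK (azure-ai-ml) might look roughly like the sketch below; the endpoint name, datastore path, and authentication setup are assumptions to adapt to your workspace.

from azure.ai.ml import MLClient, Input
from azure.identity import DefaultAzureCredential

# Assumes a config.json for the workspace and an existing batch endpoint
ml_client = MLClient.from_config(credential=DefaultAzureCredential())

# Submit a scoring job over a folder of input data in a datastore
job = ml_client.batch_endpoints.invoke(
    endpoint_name="your-batch-endpoint",
    input=Input(type="uri_folder", path="azureml://datastores/workspaceblobstore/paths/input-data/"),
)

# Stream the job logs until completion, then collect predictions from the output location
ml_client.jobs.stream(job.name)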

Best Practices

  • Error Handling: Implement robust error handling in your consuming application to gracefully manage API failures or unexpected responses; a minimal retry sketch follows this list.
  • Authentication: Securely manage your authentication keys. Consider using Azure Key Vault.
  • Data Validation: Validate input data before sending it to the model to prevent errors and ensure consistency.
  • Monitoring: Monitor your deployed endpoints for performance and usage patterns.
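
As a sketch of the error-handling recommendation above (the retry policy and names are illustrative, and production code would usually distinguish transient failures such as timeouts or 429/5xx responses from client errors), a thin wrapper around the scoring call might look like this:

import time
import requests

def score_with_retry(scoring_uri, headers, payload, retries=3, backoff_seconds=2.0):
    """Call a scoring endpoint, retrying failed requests with linear backoff."""
    for attempt in range(1, retries + 1):
        try:
            response = requests.post(scoring_uri, headers=headers, json=payload, timeout=30)
            response.raise_for_status()
            return response.json()
        except (requests.ConnectionError, requests.Timeout, requests.HTTPError):
            if attempt == retries:
                raise  # give up after the final attempt
            time.sleep(backoff_seconds * attempt)  # back off before retrying

Calling score_with_retry(scoring_uri, headers, {"data": [[1.0, 2.5, 3.1]]}) returns the parsed JSON predictions, or raises the last error once the retries are exhausted.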