
Deploy a model to an Azure Machine Learning endpoint

This quickstart walks you through deploying a machine learning model to a real-time endpoint using Azure Machine Learning, so that you can serve predictions on new data as requests arrive.

Prerequisites

To follow this guide you need:

  • An Azure subscription and an Azure Machine Learning workspace.
  • The Azure CLI with the ml extension installed (az extension add --name ml).
  • A model registered in your workspace. This guide assumes a scikit-learn model registered as my-sklearn-model.

Step 1: Create an inference script

An inference script (score.py) defines how to load your model and make predictions. It must contain two functions:

  • init(): Called once when the deployment starts. Use it to load the model into memory.
  • run(raw_data): Called for each scoring request. It receives the request body and returns the prediction.

import os
import json
import joblib
import numpy as np

def init():
    global model
    # AZUREML_MODEL_DIR points to the folder where the registered model
    # is mounted inside a managed online deployment. The file name below
    # is an illustrative assumption; adjust it to match your model artifact.
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'model.joblib')
    model = joblib.load(model_path)

def run(raw_data):
    try:
        # Expects a JSON body of the form {"data": [[...], ...]}
        data = np.array(json.loads(raw_data)['data'])
        result = model.predict(data)
        return json.dumps({'result': result.tolist()})
    except Exception as e:
        return json.dumps({'error': str(e)})
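
Before deploying, you can smoke-test the script locally by calling init() and run() directly. The sketch below assumes the dependencies from Step 2 are installed locally and that a model file named model.joblib sits in a local models/ folder (both names are illustrative):

# test_score.py -- minimal local smoke test for the inference script
import json
import os

# Point AZUREML_MODEL_DIR at a local folder containing model.joblib
# (a hypothetical layout, for local testing only).
os.environ['AZUREML_MODEL_DIR'] = 'models'

import score  # the inference script above, saved as score.py

score.init()
print(score.run(json.dumps({'data': [[1, 2, 3, 4]]})))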

Step 2: Define the environment

Specify the dependencies your inference script needs using a Conda environment file (conda.yaml).


name: azureml_deploy
channels:
  - conda-forge
dependencies:
  - python=3.8
  - pip
  - pip:
    - azureml-defaults
    - scikit-learn
    - joblib
    - numpy

Step 3: Create the deployment configuration

Define the endpoint and the compute that will serve it. For real-time inference you typically use a managed online endpoint, which consists of two resources: the endpoint itself (a stable scoring URI with authentication) and one or more deployments (the model, scoring code, environment, and compute behind that URI).

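With the Azure CLI ml extension (v2), the endpoint and the deployment are each described by a small YAML file. The following is a minimal sketch; the file names endpoint.yaml and deployment.yaml, the deployment name blue, and the base image are illustrative choices:


# endpoint.yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: my-model-endpoint
auth_mode: key

# deployment.yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue
endpoint_name: my-model-endpoint
model: azureml:my-sklearn-model:1
code_configuration:
  code: .
  scoring_script: score.py
environment:
  conda_file: conda.yaml
  image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest
instance_type: Standard_DS2_v2
instance_count: 1
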
You can then create both resources with the Azure CLI. Create the endpoint first, then create the deployment and route all traffic to it:


az ml online-endpoint create --file endpoint.yaml \
                             --resource-group my-resource-group \
                             --workspace-name my-workspace

az ml online-deployment create --file deployment.yaml \
                               --all-traffic \
                               --resource-group my-resource-group \
                               --workspace-name my-workspace


These commands create a managed online endpoint and deploy your registered model behind it. Replace azureml:my-sklearn-model:1 in deployment.yaml with the name and version of your registered model, and adjust the resource group and workspace names accordingly.
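
If you prefer Python over the CLI, the same two resources can be created with the azure-ai-ml SDK (v2). This is a sketch under the same assumptions as the YAML above; the subscription, resource group, and workspace values are placeholders:

# deploy.py -- sketch of the equivalent deployment via the azure-ai-ml SDK
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    CodeConfiguration,
    Environment,
    ManagedOnlineDeployment,
    ManagedOnlineEndpoint,
)
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id='<subscription-id>',
    resource_group_name='my-resource-group',
    workspace_name='my-workspace',
)

# Create the endpoint (stable scoring URI with key auth)
endpoint = ManagedOnlineEndpoint(name='my-model-endpoint', auth_mode='key')
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Create the deployment behind the endpoint
deployment = ManagedOnlineDeployment(
    name='blue',
    endpoint_name='my-model-endpoint',
    model='azureml:my-sklearn-model:1',
    code_configuration=CodeConfiguration(code='.', scoring_script='score.py'),
    environment=Environment(
        conda_file='conda.yaml',
        image='mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest',
    ),
    instance_type='Standard_DS2_v2',
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()

# Route all traffic to the new deployment
endpoint.traffic = {'blue': 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

Both routes produce the same endpoint; the CLI suits scripts and CI pipelines, while the SDK fits notebook-driven workflows.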

Step 4: Test the endpoint

Once the deployment is complete, you can test the endpoint by sending a POST request with sample data.

First, retrieve the scoring URI and key:


az ml online-endpoint show --name my-model-endpoint --query scoring_uri -o tsv
az ml online-endpoint get-credentials --name my-model-endpoint --query primaryKey -o tsv

Then, use a tool like curl to send a request. Note that the scoring URI returned by the CLI already includes the /score path, so it is used as-is:


SCORING_URI=$(az ml online-endpoint show --name my-model-endpoint --query scoring_uri -o tsv)
PRIMARY_KEY=$(az ml online-endpoint get-credentials --name my-model-endpoint --query primaryKey -o tsv)

curl -X POST \
     -H "Authorization: Bearer $PRIMARY_KEY" \
     -H "Content-Type: application/json" \
     -d '{"data": [[1, 2, 3, 4]]}' \
     "$SCORING_URI"

The response body is whatever your run() function returns; in this example, a JSON string containing the model's predictions.
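
You can also call the endpoint from Python. The sketch below assumes the requests package is installed and that the placeholder URI and key are filled in from the CLI commands above:

# invoke_endpoint.py -- sketch of calling the endpoint from Python
import json

import requests

scoring_uri = '<scoring-uri>'   # from: az ml online-endpoint show ...
primary_key = '<primary-key>'   # from: az ml online-endpoint get-credentials ...

headers = {
    'Authorization': f'Bearer {primary_key}',
    'Content-Type': 'application/json',
}
payload = {'data': [[1, 2, 3, 4]]}

response = requests.post(scoring_uri, headers=headers, data=json.dumps(payload))
response.raise_for_status()
print(response.text)

Alternatively, az ml online-endpoint invoke --name my-model-endpoint --request-file sample-request.json sends a test request without you handling authentication yourself; sample-request.json here is a hypothetical file containing the JSON payload.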

Next Steps