Train a Model with Azure Machine Learning

This guide provides step-by-step instructions on how to train a machine learning model using Azure Machine Learning, covering common scenarios and best practices.

1. Set up Your Environment

Before you can train a model, ensure you have the necessary tools and environment configured. This typically involves:

An Azure subscription.
An Azure Machine Learning workspace.
The Azure Machine Learning SDK for Python installed (the examples in this guide use the v1 SDK, the azureml-core package).
A compute target (e.g., Compute Instance, Compute Cluster).
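
One way to confirm the SDK prerequisite is to look up the installed package version. The sketch below assumes the v1 package name azureml-core, which is what the later examples import from; the helper name installed_version is purely illustrative.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Assumption: the v1 SDK is distributed on PyPI as "azureml-core"
sdk_version = installed_version("azureml-core")
print("azureml-core:", sdk_version or "not installed (try: pip install azureml-core)")
```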

For detailed setup instructions, refer to the "Create an Azure Machine Learning Workspace" guide.
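
The submission code later in this guide calls Workspace.from_config(), which reads the workspace details from a config.json file. As a minimal sketch, the helper below writes such a file with the three keys the SDK looks for. The function name write_workspace_config and all values passed to it are hypothetical placeholders; you can also download the file from the workspace page in the Azure portal.

```python
import json
from pathlib import Path


def write_workspace_config(directory, subscription_id, resource_group, workspace_name):
    """Write a config.json that Workspace.from_config() can read.

    The function name and arguments are illustrative; the three JSON keys
    are the ones the SDK expects.
    """
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    path = Path(directory) / "config.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2))
    return path
```

Calling write_workspace_config(".", "<subscription-id>", "<resource-group>", "<workspace-name>") from the project folder places the file where Workspace.from_config() searches by default.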

2. Prepare Your Data

Data is the foundation of any machine learning model. Ensure your data is clean, properly formatted, and accessible by your Azure Machine Learning workspace.

You can upload your data directly to the workspace or mount it from Azure Blob Storage, Azure Data Lake Storage, or other supported data sources.

Tip: Register your data as an Azure Machine Learning dataset (called a data asset in newer SDK versions) so that each training run references a named, versioned copy and stays reproducible.
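
Before registering a file, it can help to check that it has the columns the training script expects. The following is a minimal sketch using only the standard library; the column names are the ones used by the example training script in this guide, and the helper name missing_columns is hypothetical.

```python
import csv

# Columns read by the example training script in this guide
EXPECTED_COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]


def missing_columns(csv_path, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from the CSV header."""
    with open(csv_path, newline="") as f:
        # Read only the first row; an empty file yields an empty header
        header = next(csv.reader(f), [])
    present = {name.strip() for name in header}
    return [name for name in expected if name not in present]
```

An empty list means every expected column is present; anything else names the columns to fix before uploading.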

3. Choose a Training Method

Azure Machine Learning supports various ways to train models:

Script Training: Train using your custom Python scripts.
Automated ML: Let Azure ML automatically find the best model and hyperparameters for your data.
Designer: Use a drag-and-drop interface to build and train models visually.

This guide focuses on script training.

4. Write Your Training Script

Your training script should perform the following:

Load data.
Preprocess data.
Define your model architecture.
Train the model.
Log metrics and save the model.

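Here's a simplified example of a Python training script using scikit-learn. Save it as train_script.py, the file name that the job submission code in the next section expects: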


import os
import argparse
import pickle  # Used below to serialize the trained model
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from azureml.core import Run, Dataset
import pandas as pd

# Parse arguments
parser = argparse.ArgumentParser()
parser.add_argument('--data-path', type=str, help='Path to the training data')
parser.add_argument('--reg-rate', type=float, default=0.01, help='Regularization rate')
args = parser.parse_args()

# Get current run
run = Run.get_context()

# Load data
# Assume data is registered as a dataset with name 'my-iris-dataset'
# Or passed via input data binding
try:
    dataset = Dataset.get_by_name(run.experiment.workspace, 'my-iris-dataset')
    input_data = dataset.to_pandas_dataframe()
except Exception:
    # Fallback if dataset not registered or if path is provided directly
    input_data = pd.read_csv(args.data_path)

# Prepare data
X = input_data[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']]
y = input_data['species']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
reg_rate = args.reg_rate
print(f'Regularization rate is set to: {reg_rate}')

model = LogisticRegression(C=1.0/reg_rate, solver='liblinear')
model.fit(X_train, y_train)

# Evaluate model
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')

# Log metrics
run.log('accuracy', accuracy)
run.log('regularization_rate', reg_rate)

# Save model
# Files written to the outputs folder are uploaded to the run record
# automatically, so no explicit upload call is needed.
os.makedirs('outputs', exist_ok=True)
model_path = os.path.join('outputs', 'sklearn_model.pkl')
with open(model_path, 'wb') as f:
    pickle.dump(model, f)

print(f'Model saved to: {model_path}')
run.complete()
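
The script above converts the regularization rate into scikit-learn's C parameter, which is the inverse of the regularization strength: a larger rate gives a smaller C and therefore stronger regularization. The small sketch below isolates that conversion and adds a guard against non-positive rates, which would otherwise cause a division by zero or a negative C. The function name inverse_strength is hypothetical.

```python
def inverse_strength(reg_rate):
    """Convert a regularization rate into scikit-learn's C parameter (C = 1 / reg_rate)."""
    if reg_rate <= 0:
        raise ValueError("reg_rate must be positive")
    return 1.0 / reg_rate


# Show how a few candidate rates map to C
for rate in (0.01, 0.05, 1.0):
    print(f"reg_rate={rate} -> C={inverse_strength(rate)}")
```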

5. Submit a Training Job

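You submit your training script as a job to a compute target in Azure Machine Learning. This involves creating a ScriptRunConfig object, which bundles the script, its arguments, the compute target, and the environment, and submitting it to an experiment. The older Estimator classes are deprecated in favor of ScriptRunConfig.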

Here's a Python example for submitting a job using the Azure ML SDK:


from azureml.core import Workspace, Experiment, Environment
from azureml.core.compute import ComputeTarget
from azureml.core.script_run_config import ScriptRunConfig

# Load workspace configuration
ws = Workspace.from_config()

# Get compute target
compute_name = "your-compute-cluster-name" # Replace with your compute target name
compute_target = ComputeTarget(workspace=ws, name=compute_name)

# Define environment
# Use a curated environment or create your own
curated_env_name = "AzureML-sklearn-0.24-ubuntu18.04-py37-cpu"
try:
    env = Environment.get(workspace=ws, name=curated_env_name)
except Exception:
    print(f"Curated environment '{curated_env_name}' not found. Consider creating a custom environment.")
    # Example of creating a custom environment (uncomment and modify if needed)
    # env = Environment(name="my-custom-env")
    # env.docker.base_image = "mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:latest"
    # env.python.conda_dependencies.add_pip_package("scikit-learn")
    # env.python.conda_dependencies.add_pip_package("azureml-defaults")
    # env.register(workspace=ws)
    # Without an environment, env is undefined and the job cannot be configured.
    # Remove this line if you enable the custom environment above.
    raise

# Define data input
# Assuming your data is registered as a file dataset 'my-iris-dataset'.
# as_mount() is available on file datasets; when the dataset was created from
# a single CSV file, the mount point is the path to that file.
dataset_name = 'my-iris-dataset'
input_dataset = ws.datasets[dataset_name].as_mount()

# Create a ScriptRunConfig
src = ScriptRunConfig(
    source_directory='.',  # Directory containing your training script
    script='train_script.py',  # Your training script name
    # Pass the mounted dataset directly; it resolves to a path on the compute target
    arguments=['--data-path', input_dataset, '--reg-rate', '0.05'],
    compute_target=compute_target,
    environment=env
)

# Submit the experiment
experiment_name = 'iris-training-experiment'
experiment = Experiment(workspace=ws, name=experiment_name)
run = experiment.submit(src)

print(f"Submitted training job: {run.id}")
print(f"View run in Azure ML Studio: {run.get_portal_url()}")

# Optional: Wait for the run to complete
# run.wait_for_completion(show_output=True)

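Pro Tip: For distributed training, pass a distributed_job_config to ScriptRunConfig (for example, MpiConfiguration or PyTorchConfiguration from azureml.core.runconfig), request more than one node, and make sure your training script initializes the matching framework backend.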
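
The arguments list passed to ScriptRunConfig is a flat sequence of flags and values that the training script's argparse parser reads. When trying several hyperparameter values, it can be convenient to generate that list from a dictionary. This is a minimal sketch; the helper name build_arguments is hypothetical, and it handles only plain values, so a dataset input such as the one used above would still be appended to the list separately.

```python
def build_arguments(params):
    """Flatten a dict of hyperparameters into an argparse-style argument list.

    Keys written with underscores become --dashed-flags, and values are
    converted to strings.
    """
    arguments = []
    for name, value in params.items():
        arguments.append("--" + name.replace("_", "-"))
        arguments.append(str(value))
    return arguments


# One argument list per regularization rate to try
sweep = [build_arguments({"reg_rate": rate}) for rate in (0.01, 0.05, 0.1)]
```

Each entry in sweep can then be used as the arguments value of a separate ScriptRunConfig, which gives one job per regularization rate.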

6. Monitor and Review Runs

Once a job is submitted, you can monitor it in Azure Machine Learning studio, where you can view its logs, metrics, hyperparameters, and saved model artifacts.

Key things to check:

Metrics: Accuracy, loss, precision, recall, etc.
Logs: Standard output and error from your script.
Model Artifacts: The saved model file (e.g., sklearn_model.pkl).
Hyperparameters: The parameters used for training.
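
The same information is available from code. As a rough sketch, the helper below gathers the items listed above from a run object such as the one returned by experiment.submit(). It relies on three methods of the v1 Run class (get_status, get_metrics, and get_file_names); the helper name summarize_run is hypothetical.

```python
def summarize_run(run):
    """Collect status, logged metrics, and output files from a run object.

    Expects an object with the SDK v1 Run methods get_status(),
    get_metrics(), and get_file_names().
    """
    files = run.get_file_names()
    return {
        "status": run.get_status(),
        "metrics": run.get_metrics(),
        # Keep only files under outputs/, where the script saved the model
        "model_files": [name for name in files if name.startswith("outputs/")],
    }
```

For the example script in this guide, the metrics entry would be expected to contain the accuracy and regularization_rate values logged with run.log.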

7. Register and Deploy Your Model

After successful training, register your model in the Azure Machine Learning model registry. This allows you to manage and deploy your models effectively.
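
Registration can be done directly from the completed run. The sketch below wraps the v1 Run.register_model method and copies the run's logged metrics into the model's tags so the two stay linked. The helper name register_trained_model and the model name in the usage note are hypothetical; the default path matches the file the example training script saves.

```python
def register_trained_model(run, model_name, model_path="outputs/sklearn_model.pkl"):
    """Register a file from a completed run's outputs as a named model.

    Calls Run.register_model() from the SDK v1 and stores the run's logged
    metrics as string tags on the registered model.
    """
    metrics = run.get_metrics()
    tags = {name: str(value) for name, value in metrics.items()}
    return run.register_model(model_name=model_name, model_path=model_path, tags=tags)
```

For example, register_trained_model(run, "iris-classifier") would register the pickled model under that name after run.wait_for_completion() has returned.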

The next step is to deploy the registered model, either as a web service for real-time inference or as a batch endpoint for offline processing. Refer to the "How to Deploy a Model" guide for details.