Introduction
This guide provides a step-by-step walkthrough on how to train a machine learning model using Azure Machine Learning. Azure ML is a cloud-based service that allows you to build, train, and deploy machine learning models faster, with more collaboration, and at scale.
We will cover the essential components and steps, from setting up your workspace to evaluating your trained model.
Prerequisites
- An Azure subscription.
- An Azure Machine Learning workspace. If you don't have one, you can create it via the Azure portal or the Azure CLI.
- Basic understanding of machine learning concepts.
- Python 3.8 or later installed (if using the Python SDK).
Key Steps to Train a Model
Step 1: Set Up Your Environment
Before you begin, ensure you have your Azure ML workspace ready. You can interact with your workspace using:
- Azure ML Studio: A web-based UI for managing your ML projects.
- Azure ML SDK for Python: A Python library for programmatic access.
- Azure CLI ml extension: A command-line interface for managing Azure ML resources.
For this guide, we'll assume you're using the Python SDK v2 (the azure-ai-ml package).
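If you haven't installed the SDK yet, both the SDK and the credential library used in the code below are available from PyPI:

pip install azure-ai-ml azure-identity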
Step 2: Connect to Your Workspace
Initialize the MLClient object to connect to your Azure ML workspace.
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Authenticate and create MLClient
try:
    credential = DefaultAzureCredential()
    # Replace with your actual subscription ID, resource group, and workspace name
    ml_client = MLClient(
        credential=credential,
        subscription_id="YOUR_SUBSCRIPTION_ID",
        resource_group_name="YOUR_RESOURCE_GROUP",
        workspace_name="YOUR_WORKSPACE_NAME",
    )
    print("Connected to Azure ML workspace successfully.")
except Exception as e:
    print(f"Error connecting to workspace: {e}")
Step 3: Prepare Your Data
Machine learning training requires data. You can upload your data to Azure ML datastores or create data assets.
For example, to create a data asset from a local CSV file:
from azure.ai.ml.entities import Data

my_data = Data(
    path="./local/path/to/your/data.csv",
    type="uri_file",
    description="My training dataset",
    name="my-training-data",
    version="1",
)

ml_client.data.create_or_update(my_data)
print(f"Data asset '{my_data.name}' created/updated.")
Step 4: Define Your Training Script
Create a Python script that contains your model training logic. The script should accept arguments for the path to the input data and for the directory where the trained model will be saved.
Example train.py:
import argparse
import os

import joblib  # or pickle
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--data_path", type=str, help="Path to the input CSV file")
    parser.add_argument("--model_output_path", type=str, help="Directory to save the trained model")
    return parser.parse_args()


def train_model(data_path, model_output_path):
    # Load data (data_path points directly at the CSV file passed in by the job)
    df = pd.read_csv(data_path)
    X = df[["feature1", "feature2"]]  # replace with your feature columns
    y = df["target"]                  # replace with your label column

    # Split data
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    # Train a simple model
    model = LogisticRegression()
    model.fit(X_train, y_train)
    print(f"Test accuracy: {model.score(X_test, y_test):.3f}")

    # Save the model to the output directory provided by the job
    os.makedirs(model_output_path, exist_ok=True)
    joblib.dump(model, os.path.join(model_output_path, "model.pkl"))
    print("Model trained and saved.")


if __name__ == "__main__":
    args = parse_args()
    train_model(args.data_path, args.model_output_path)
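You can sanity-check the script locally before submitting it to Azure ML. The paths here are illustrative, assuming train.py is saved under ./src to match the job definition in the next step:

python ./src/train.py --data_path ./local/path/to/your/data.csv --model_output_path ./outputs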
Step 5: Create an Azure ML Job
Define a training job in Azure ML. This specifies the command to run, the environment, and the data inputs.
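The job runs on an Azure ML compute target, referenced below as YOUR_COMPUTE_CLUSTER_NAME. If you don't have a compute cluster yet, here is a minimal sketch for creating a small autoscaling CPU cluster; the cluster name and VM size are illustrative choices, not prescribed by this guide:

from azure.ai.ml.entities import AmlCompute

# Create (or update) a small autoscaling CPU cluster; adjust name and size as needed
cpu_cluster = AmlCompute(
    name="cpu-cluster",
    size="Standard_DS3_v2",
    min_instances=0,
    max_instances=2,
)
ml_client.compute.begin_create_or_update(cpu_cluster).result()

With a compute target available, define and submit the command job: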
from azure.ai.ml import command, Input, Output

# Define the command job
job = command(
    code="./src",  # Local path to the folder containing train.py
    command="python train.py --data_path ${{inputs.training_data}} --model_output_path ${{outputs.model_output}}",
    inputs={
        # Reference the registered data asset (a uri_file, matching Step 3)
        "training_data": Input(type="uri_file", path="azureml:my-training-data:1")
    },
    outputs={
        "model_output": Output(type="uri_folder")
    },
    environment="azureml://registries/azureml/environments/sklearn-1.0/versions/1",  # Example environment; use one available to your workspace
    compute="YOUR_COMPUTE_CLUSTER_NAME",  # Name of your Azure ML compute cluster
    display_name="model-training-job",
    experiment_name="my-ml-experiment",
)
# Submit the job
returned_job = ml_client.jobs.create_or_update(job)
print(f"Job submitted: {returned_job.studio_url}")
Step 6: Monitor and Evaluate
You can monitor the progress of your job in Azure ML Studio. Once the job completes, you can retrieve the trained model artifacts and evaluate its performance.
To download the named model output once the job has completed:

# Download the job's "model_output" output to a local folder
ml_client.jobs.download(
    name=returned_job.name,
    output_name="model_output",
    download_path="./downloaded_model",
)
print("Model output downloaded to ./downloaded_model")

# Related CLI commands (after the job completes):
# az ml job stream --name <JOB_NAME>      # stream job logs
# az ml job download --name <JOB_NAME> --all   # download all job outputs
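After downloading, you can load the model with joblib and evaluate it on held-out data. The exact subfolder layout under the download path can vary, so the snippet below searches for model.pkl; the feature and label column names are the same illustrative ones used in train.py:

import glob

import joblib
import pandas as pd

# Locate the downloaded model.pkl regardless of the exact subfolder layout
model_file = glob.glob("./downloaded_model/**/model.pkl", recursive=True)[0]
model = joblib.load(model_file)

# Score on a local evaluation set (illustrative path and column names)
eval_df = pd.read_csv("./local/path/to/your/data.csv")
print(model.score(eval_df[["feature1", "feature2"]], eval_df["target"]))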
Next Steps
- Model Registration: Register your trained model in Azure ML for versioning and easier deployment (a minimal sketch follows this list).
- Model Deployment: Deploy your model as a web service (online endpoint) or batch endpoint.
- Experiment Tracking: Use Azure ML's experiment tracking to log metrics, parameters, and artifacts for better reproducibility.
- Automated ML: Explore Azure ML's automated ML capabilities for a faster path to model creation.
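As a starting point for model registration, here is a minimal sketch that registers the downloaded model folder as a custom model asset; the asset name and local path are illustrative:

from azure.ai.ml.entities import Model
from azure.ai.ml.constants import AssetTypes

# Register the downloaded model folder as a versioned model asset
registered_model = ml_client.models.create_or_update(
    Model(
        path="./downloaded_model",      # folder containing model.pkl from the download step
        name="my-logistic-regression",  # illustrative model name
        type=AssetTypes.CUSTOM_MODEL,
        description="Model trained by the command job in this guide",
    )
)
print(f"Registered {registered_model.name}, version {registered_model.version}")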