Azure ML Training: Getting Started with Your First Model

This article provides a step-by-step guide to training your first machine learning model with Azure Machine Learning (Azure ML). We'll cover the essential components and walk through a simple classification task.

1. Prerequisites

An Azure subscription. If you don't have one, sign up for a free trial.
Azure CLI installed and configured, with the ml extension added (az extension add -n ml).
Python 3.6 or later.

2. Setting Up Your Azure ML Workspace

An Azure ML workspace is the foundational resource for all your Azure ML activities. It provides a centralized place to manage your data, experiments, models, and other assets.

Create an Azure ML Workspace:


az ml workspace create -n myworkspace -g myresourcegroup --location eastus

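Replace myworkspace and myresourcegroup with your desired names. The resource group must already exist; if it doesn't, create it first with az group create -n myresourcegroup -l eastus.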

Configure your local environment so that later commands target this workspace by default:


az configure --defaults group=myresourcegroup workspace=myworkspace

3. Preparing Your Data

For this tutorial, we'll use the Red Wine Quality dataset from the UCI Machine Learning Repository, a semicolon-delimited CSV file available online. You can also upload your own CSV files.

You can register datasets in your workspace, which allows for versioning and easier access.

Create a data asset definition (e.g., src/dataset.yml):


$schema: https://azuremlschemas.azureedge.net/latest/data.schema.json
name: wine-quality-red
version: 1
type: uri_file
path: https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv
description: Red Wine Quality Dataset

Then register it in the workspace:


az ml data create -f src/dataset.yml
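The wine quality file uses semicolons rather than commas as its delimiter, which is easy to miss when loading it. As a minimal sketch of that layout, the snippet below parses two illustrative rows with Python's standard csv module and separates the quality label from the features. The sample values, the shortened three-column header, and the helper name split_features_and_labels are assumptions made for this illustration.

```python
import csv
import io

# Two illustrative rows in the same layout as winequality-red.csv.
# The header is shortened to three columns and the values are made up.
SAMPLE = '''"fixed acidity";"alcohol";"quality"
7.4;9.4;5
7.8;9.8;6
'''


def split_features_and_labels(text, label="quality"):
    """Parse semicolon-delimited CSV text into feature rows and labels."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    features, labels = [], []
    for row in reader:
        # Remove the label column, then convert the remaining values to floats
        labels.append(int(row.pop(label)))
        features.append({name: float(value) for name, value in row.items()})
    return features, labels


features, labels = split_features_and_labels(SAMPLE)
print(features[0], labels)
```

Reading the same text with the default comma delimiter would produce a single column per row, which is why the delimiter has to be stated explicitly.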

4. Creating a Training Script

This Python script will perform the actual model training.

Create a file named train.py in a src directory:


import argparse
import os

import joblib
import matplotlib
matplotlib.use('Agg')  # Headless backend: compute nodes have no display
import matplotlib.pyplot as plt
import mlflow
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def plot_accuracy(acc):
    """Return a single-bar chart of the accuracy score."""
    fig = plt.figure()
    plt.bar(['Accuracy'], [acc], color='tab:blue')
    plt.ylabel('Score')
    plt.title('Model Accuracy')
    plt.ylim(0, 1)
    return fig


# The job passes the location of the input data on the command line
parser = argparse.ArgumentParser()
parser.add_argument('--wine_data', required=True, help='Path to winequality-red.csv')
args = parser.parse_args()

# Load the dataset (the file is semicolon-delimited)
data = pd.read_csv(args.wine_data, sep=';')

# Prepare data
X = data.drop('quality', axis=1)
y = data['quality']

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = LogisticRegression(solver='liblinear')
model.fit(X_train, y_train)

# Evaluate model
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"Model Accuracy: {accuracy}")

# Log the metric and the chart (MLflow tracking is preconfigured inside an Azure ML job)
mlflow.log_metric('accuracy', accuracy)
mlflow.log_figure(plot_accuracy(accuracy), 'accuracy_bar.png')

# Save the model; Azure ML uploads everything written to ./outputs
os.makedirs('outputs', exist_ok=True)
joblib.dump(model, 'outputs/wine_quality_model.pkl')

print("Model training complete and model saved.")

Note: For simplicity, plot_accuracy draws a minimal single-bar chart. In a real scenario, you'd log richer diagnostics, such as a confusion matrix.
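Accuracy is the fraction of test rows the model labels correctly, so it is worth comparing against what a trivial model would score. The sketch below reimplements the calculation in plain Python and adds a majority-class baseline. The label values are hypothetical, and the helper names accuracy and majority_baseline are introduced here for illustration; they are not part of scikit-learn.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


def majority_baseline(y_train, y_test):
    """Accuracy obtained by always predicting the most common training label."""
    most_common_label = Counter(y_train).most_common(1)[0][0]
    return accuracy(y_test, [most_common_label] * len(y_test))


# Hypothetical quality labels, for illustration only
y_train = [5, 5, 6, 5, 6, 7, 5]
y_test = [5, 6, 5, 7]
y_pred = [5, 6, 6, 7]

print(accuracy(y_test, y_pred))            # 0.75
print(majority_baseline(y_train, y_test))  # 0.5
```

If the accuracy your job logs is close to the baseline, the model has learned little beyond the most frequent quality score.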

5. Submitting a Training Job

Now, we'll create a configuration to run our training script on Azure ML compute.

Create an environment definition (e.g., src/environment.yml):


name: sklearn-env
dependencies:
  - python=3.8
  - pip
  - scikit-learn
  - pandas
  - matplotlib
  - pip:
    - azureml-sdk[cli]
    - azureml-mlflow

Create a compute target if you don't have one:


az ml compute create -n cpu-cluster --type AmlCompute --size Standard_DS1_v2 --min-instances 0 --max-instances 1

This command creates a compute cluster that scales between zero and one node, so the node is released when no job is running.

Submit the training job:


az ml job create --file src/job.yml

Where src/job.yml contains:


$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
code: . # Resolved relative to this file, so this is the src directory
command: >-
  python train.py --wine_data ${{inputs.wine_data}}
environment: # Built from the conda file above on an Azure ML base image; substitute another image if you prefer
  image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest
  conda_file: environment.yml
compute: azureml:cpu-cluster # Replace with your compute target name
inputs:
  wine_data:
    type: uri_file
    path: azureml:wine-quality-red:1 # Replace with your data asset name and version
experiment_name: azureml-training-demo
display_name: first-model-training
resources:
  instance_count: 1 # Single-node training
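A command job can reference a declared input inside its command string with the ${{inputs.<name>}} syntax, which Azure ML replaces at run time with the location of that input on the compute node. The sketch below imitates that substitution to show what the training script ends up receiving. It is an illustration only, not Azure ML's implementation; the function name render_command and the mount path are hypothetical.

```python
import re

# Matches ${{inputs.<name>}} with optional spaces inside the braces
PLACEHOLDER = re.compile(r"\$\{\{\s*inputs\.(\w+)\s*\}\}")


def render_command(command, inputs):
    """Replace each ${{inputs.<name>}} placeholder with its resolved value."""
    def substitute(match):
        name = match.group(1)
        if name not in inputs:
            raise KeyError(f"command references undefined input '{name}'")
        return inputs[name]
    return PLACEHOLDER.sub(substitute, command)


# Hypothetical mount path; the real path is chosen by Azure ML at run time
resolved = render_command(
    "python train.py --wine_data ${{inputs.wine_data}}",
    {"wine_data": "/mnt/inputs/wine_data/winequality-red.csv"},
)
print(resolved)
```

Because the script only ever sees a resolved path, the same train.py can be run locally by passing a path to a downloaded copy of the CSV file.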

6. Monitoring and Retrieving Results

After submitting the job, you can monitor its progress in the Azure ML Studio. Once completed, you can retrieve the logged metrics and the trained model.

In Azure ML Studio: Navigate to Jobs (labeled Experiments in older versions of the Studio) and select your experiment name. You'll see your job listed, and you can click on it to view logs, metrics, and output files.

You can also use the Azure CLI to get job details, replacing <job-name> with the name returned by az ml job create:


az ml job show -n <job-name>
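The job details are printed as JSON, so they are easy to inspect from a script. As a rough sketch, the helper below reads the status field from that output and reports whether the job has reached a final state. The sample JSON is a trimmed, hypothetical example, and the set of terminal states is an assumption you should check against what your CLI version reports.

```python
import json

# Assumed terminal states; verify against the values your CLI version reports
TERMINAL_STATES = {"Completed", "Failed", "Canceled"}


def job_state(show_output):
    """Return (status, finished) from the JSON text printed by the job details command."""
    job = json.loads(show_output)
    status = job.get("status", "Unknown")
    return status, status in TERMINAL_STATES


# Trimmed, hypothetical example of the command's JSON output
sample = '{"name": "example_job", "display_name": "first-model-training", "status": "Running"}'
print(job_state(sample))  # ('Running', False)
```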

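To download the model, first register the file the job saved, then download the registered model. Replace <job-name> with the name of your job:


az ml model create --name wine_quality_model --version 1 --type custom_model --path azureml://jobs/<job-name>/outputs/artifacts/paths/outputs/wine_quality_model.pkl --workspace-name myworkspace --resource-group myresourcegroup
az ml model download --name wine_quality_model --version 1 --download-path ./models --workspace-name myworkspace --resource-group myresourcegroup

This will download the wine_quality_model.pkl file to a location under your local ./models directory.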
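Downloaded artifacts may be nested inside subfolders of the target directory, so it can be convenient to search for the model file rather than hard-code its path. The following is a minimal sketch under that assumption; the helper name find_model_file and the folder layout in the demonstration are hypothetical.

```python
from pathlib import Path
import tempfile


def find_model_file(download_dir, filename="wine_quality_model.pkl"):
    """Return the first file named `filename` anywhere under download_dir."""
    matches = sorted(Path(download_dir).rglob(filename))
    if not matches:
        raise FileNotFoundError(f"{filename} not found under {download_dir}")
    return matches[0]


# Demonstration with a throwaway directory; the nested layout is hypothetical
with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / "wine_quality_model" / "outputs"
    nested.mkdir(parents=True)
    (nested / "wine_quality_model.pkl").write_bytes(b"placeholder")
    found = find_model_file(tmp)
    print(found.relative_to(tmp))
```

Once located, a model saved with joblib can typically be loaded with joblib.load(path), provided a compatible version of scikit-learn is installed locally.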

Conclusion

Congratulations! You have successfully trained your first machine learning model using Azure Machine Learning and retrieved its results. This is just the beginning; Azure ML offers a rich set of tools for data preparation, feature engineering, hyperparameter tuning, and model deployment.

Explore more Azure ML tutorials to deepen your understanding.