Azure Machine Learning SDK Reference
Introduction to the Azure Machine Learning SDK
The Azure Machine Learning SDK for Python lets you manage and orchestrate machine learning workflows on Azure from code. It provides classes and functions for working with Azure Machine Learning resources across the whole lifecycle: data preparation, model training, deployment, and monitoring.
With the SDK, developers and data scientists can build, train, and deploy machine learning models at scale, from single training scripts to complex deep learning models. The examples in this reference use the azureml.core package.
Key Components
Workspaces
A workspace is the top-level resource for Azure Machine Learning. It provides a centralized place to work with all the artifacts you create when you use Azure Machine Learning. This includes notebooks, compute instances, experiments, models, and datastores.
from azureml.core import Workspace
ws = Workspace.from_config()
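Workspace.from_config() needs a configuration file to find the workspace. The sketch below shows the shape of that file, assuming the usual config.json layout with the keys subscription_id, resource_group, and workspace_name, placed in a .azureml folder. All values are hypothetical placeholders, and the helper name write_workspace_config is invented for illustration.

```python
import json
import tempfile
from pathlib import Path

# Placeholder values (hypothetical): replace with your own workspace details.
workspace_config = {
    "subscription_id": "00000000-0000-0000-0000-000000000000",
    "resource_group": "my-resource-group",
    "workspace_name": "my-workspace",
}


def write_workspace_config(directory):
    """Write config.json under <directory>/.azureml and return its path."""
    config_dir = Path(directory) / ".azureml"
    config_dir.mkdir(parents=True, exist_ok=True)
    config_path = config_dir / "config.json"
    config_path.write_text(json.dumps(workspace_config, indent=2))
    return config_path


# A temporary directory is used here so the sketch leaves the project untouched;
# in practice the file would sit in (or above) the directory you run from.
config_path = write_workspace_config(tempfile.mkdtemp())
print(config_path.read_text())
```

The same file can usually be downloaded from the workspace page in the Azure portal, which avoids typing the identifiers by hand.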
Compute Resources
The SDK allows you to create and manage various compute targets for training and inference, including:
- Compute Instances: Cloud-based workstations for development.
- Compute Clusters: Scalable clusters for distributed training.
- Inference Clusters: Managed Kubernetes clusters for deploying models.
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
# Define the compute target configuration
compute_name = "my-aml-compute"
vm_size = "STANDARD_DS11_V2"
min_nodes = 0
max_nodes = 4
try:
    # Reuse the compute target if it already exists in the workspace
    compute_target = ComputeTarget(workspace=ws, name=compute_name)
    print(f"Found existing compute target: {compute_name}")
except ComputeTargetException:
    # Otherwise provision a new cluster with the settings above
    print(f"Creating new compute target: {compute_name}")
    compute_config = AmlCompute.provisioning_configuration(vm_size=vm_size,
                                                           min_nodes=min_nodes,
                                                           max_nodes=max_nodes)
    compute_target = ComputeTarget.create(ws, compute_name, compute_config)
# Block until the compute target is ready
compute_target.wait_for_completion(show_output=True)
Experiments and Jobs
Experiments are containers for your runs. A run represents a single execution of your training script. The SDK simplifies submitting and tracking these runs.
from azureml.core import Experiment, ScriptRunConfig
# Configure the run
src = ScriptRunConfig(source_directory='.', script='train.py', compute_target=compute_target)
# Create an experiment and submit the run
experiment = Experiment(workspace=ws, name='my-training-experiment')
run = experiment.submit(src)
run.wait_for_completion(show_output=True)
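The run configuration above points at a train.py script that the source does not show. The following is a minimal sketch of what such a script could look like, assuming a toy least-squares model in place of real training code. The function names fit_line and main and the data are hypothetical. The script writes its model to outputs/model.pkl, on the assumption that files saved under the outputs folder are uploaded with the run.

```python
# train.py (hypothetical contents; the original script is not shown)
import os
import pickle


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def main(output_dir="outputs"):
    """Fit the toy model and pickle it into output_dir; return the file path."""
    # Toy data standing in for a real training set.
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [1.0, 3.0, 5.0, 7.0]
    model = fit_line(xs, ys)

    os.makedirs(output_dir, exist_ok=True)
    model_path = os.path.join(output_dir, "model.pkl")
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    return model_path


if __name__ == "__main__":
    print(f"Saved model to {main()}")
```

Keeping the output path as a parameter makes the script easy to run locally before submitting it to a compute target.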
Models and Deployments
Register your trained models in the Azure ML model registry and deploy them as real-time endpoints or batch endpoints.
from azureml.core.model import Model
# Register the model (property values are passed as strings)
model = Model.register(model_path='outputs/model.pkl',  # Relative path to the model file
                       model_name='my-sklearn-model',
                       tags={'area': 'regression', 'type': 'sklearn'},
                       properties={'accuracy': '0.95'},
                       workspace=ws)
print(f"Registered model: {model.name}, version: {model.version}")
Data Management
Connect to your data sources through datastores, and register datasets so that the data you train on is versioned and tracked.
from azureml.core import Datastore, Dataset
# Get a reference to the default datastore
datastore = ws.get_default_datastore()
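# Create a tabular dataset from delimited (CSV) files in the datastore, then register it.
# The lines are commented out because the path is a placeholder; replace it with your own.
# dataset = Dataset.Tabular.from_delimited_files(path=(datastore, 'path/to/your/data/*.csv'))
# dataset = dataset.register(workspace=ws, name='my-training-data', create_new_version=True)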