Compute Targets in Azure Machine Learning

Understand and manage the computational resources used to train and deploy your machine learning models.

What are Compute Targets?

In Azure Machine Learning, a compute target is a specific Azure resource where you run your training scripts or host your model deployments. It can be a local machine, a virtual machine (VM), a Docker container, or specialized hardware like GPUs. Choosing the right compute target is crucial for optimizing performance, cost, and scalability of your machine learning workflows.

Types of Compute Targets

Azure Machine Learning offers a variety of compute targets to suit different needs:

Compute Instance

A fully managed cloud-based workstation for development. Ideal for experimentation and interactive training.

  • Pre-configured with ML tools
  • Access via SSH or VS Code Remote
  • Scalable CPU and GPU options

Compute Cluster

A managed cluster of VMs that can scale up or down automatically. Perfect for batch training jobs.

  • Auto-scaling capabilities
  • Cost-effective for large-scale training
  • Supports various VM sizes and GPU types

Inference Cluster (AKS)

Azure Kubernetes Service (AKS) for deploying models for real-time inference at scale.

  • High availability and scalability
  • Managed Kubernetes environment
  • Suitable for production web services

Attached Compute

Use existing compute resources, such as Azure VMs or Azure Databricks clusters, instead of provisioning new ones.

  • Leverage existing infrastructure
  • Requires manual management
  • Flexibility for hybrid scenarios

Creating and Managing Compute Targets

You can create and manage compute targets through the Azure Machine Learning studio, Azure CLI, or Python SDK.

Using the Azure Machine Learning Studio

The studio provides a user-friendly interface to provision and configure compute resources.

  1. Navigate to your Azure Machine Learning workspace.
  2. Go to the 'Compute' section in the left-hand navigation.
  3. Click '+ New' to create a new compute resource.
  4. Select the type of compute and configure its properties.

Using the Azure CLI

The Azure CLI offers a command-line interface for managing compute targets.

az ml compute create -f compute.yml

Where compute.yml is a YAML configuration file defining your compute target's properties.
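As a minimal sketch, a compute.yml for an auto-scaling compute cluster might look like the following. The name, VM size, and scaling limits are illustrative assumptions; adjust them for your workspace.

```yaml
# Illustrative compute.yml for an auto-scaling compute cluster (AmlCompute).
# Name, size, and limits are assumptions; adjust for your workspace.
$schema: https://azuremlschemas.azureedge.net/latest/amlCompute.schema.json
name: cpu-cluster-01
type: amlcompute
size: Standard_DS3_v2
min_instances: 0
max_instances: 4
idle_time_before_scale_down: 120
```

With min_instances set to 0, the cluster releases all nodes when idle, so you pay only while jobs are running.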

Using the Python SDK

The Python SDK allows programmatic creation and management within your scripts.


from azure.ai.ml import MLClient
from azure.ai.ml.entities import ComputeInstance

# Assume ml_client is an authenticated MLClient for your workspace
# ml_client = MLClient(...)

# Define a compute instance that shuts down after 30 idle minutes
compute_name = "ci-dev-01"
compute_instance = ComputeInstance(
    name=compute_name,
    size="STANDARD_DS3_V2",
    idle_time_before_shutdown_minutes=30,
)

try:
    # begin_create_or_update returns a poller; .result() blocks until done
    ml_client.compute.begin_create_or_update(compute_instance).result()
    print(f"Compute instance '{compute_name}' created successfully.")
except Exception as e:
    print(f"Error creating compute instance: {e}")

Best Practices

  • Choose the Right Compute: Select compute targets based on your workload (training vs. inference), scale, and budget.
  • Utilize Auto-Scaling: For compute clusters, leverage auto-scaling to optimize costs by adjusting the number of nodes based on demand.
  • Monitor Usage: Regularly monitor your compute resource utilization and costs.
  • Shut Down Idle Resources: Configure idle shutdown for compute instances and clusters to save costs when not in use.
  • Security: Ensure proper network security and access controls for your compute targets.

Common Scenarios

  • Interactive development and debugging: Compute Instance. Provides a dedicated, cloud-based dev environment with easy access to tools.
  • Large-scale model training: Compute Cluster. Scales automatically to handle demanding training jobs and offers GPU support.
  • Deploying models for real-time predictions: Inference Cluster (AKS). Optimized for high-throughput, low-latency inference and production readiness.
  • Leveraging existing on-premises resources: Attached Compute (e.g., Azure Arc-enabled Kubernetes). Integrates with your existing infrastructure for hybrid deployments.
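The mapping above can be sketched as a small lookup helper. Note that choose_compute and the scenario keys are hypothetical, not part of any Azure SDK:

```python
# Hypothetical helper mirroring the scenario recommendations above.
SCENARIO_TO_COMPUTE = {
    "interactive-dev": "Compute Instance",
    "batch-training": "Compute Cluster",
    "realtime-inference": "Inference Cluster (AKS)",
    "existing-infra": "Attached Compute",
}

def choose_compute(scenario: str) -> str:
    """Return the recommended compute target for a known scenario."""
    try:
        return SCENARIO_TO_COMPUTE[scenario]
    except KeyError:
        raise ValueError(f"Unknown scenario: {scenario!r}")

print(choose_compute("batch-training"))  # -> Compute Cluster
```

In practice the decision also weighs budget, data locality, and security requirements, so treat this as a starting heuristic rather than a rule.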

By choosing and managing Azure Machine Learning compute targets effectively, you can build, train, and deploy machine learning models efficiently and at scale.