Manage Compute for Azure Machine Learning

This guide provides comprehensive instructions on managing compute resources for your Azure Machine Learning workloads. Learn how to create, configure, and scale various compute targets to optimize your machine learning workflows.

Supported Compute Targets

Azure Machine Learning offers a variety of compute targets suitable for different stages of your machine learning lifecycle:

- Compute instances: fully managed, single-node workstations for development and experimentation.
- Compute clusters: auto-scaling, multi-node clusters for training and batch inference.
- Inference clusters (managed Kubernetes): scalable, highly available compute for production deployments.

Creating a Compute Instance

Follow these steps to create a new compute instance:

1. Navigate to Compute in Azure ML Studio. In your Azure Machine Learning workspace, go to the 'Compute' section in the left-hand navigation pane.

2. Select the 'Compute instances' tab. Click on the 'Compute instances' tab and then click '+ New'.

3. Configure the compute instance details. Choose a virtual machine size, region, and name for your compute instance. You can also configure advanced settings such as SSH access.

4. Create the instance. Click 'Create' to provision your compute instance. Provisioning may take a few minutes.
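The studio steps above can also be performed from the Azure CLI. A minimal sketch, assuming the `ml` CLI extension is installed and `my-instance`, `my-resource-group`, and `my-workspace` are placeholder names:

```shell
# Create a single-node compute instance for development work.
az ml compute create \
  --name my-instance \
  --type ComputeInstance \
  --size Standard_DS3_v2 \
  --resource-group my-resource-group \
  --workspace-name my-workspace
```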

Tip: For cost efficiency, consider using Spot instances for compute clusters, especially for training jobs that can tolerate interruptions.
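As a sketch of that tip, a cluster can be provisioned on low-priority (Spot) VMs with the `--tier` flag; the cluster name, node counts, and resource names below are placeholders:

```shell
# Low-priority nodes cost less but can be preempted,
# so reserve them for interruption-tolerant training jobs.
az ml compute create \
  --name spot-cluster \
  --type AmlCompute \
  --size Standard_DS3_v2 \
  --min-instances 0 \
  --max-instances 4 \
  --tier low_priority \
  --resource-group my-resource-group \
  --workspace-name my-workspace
```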

Creating a Compute Cluster

Compute clusters provide scalable resources for training and batch inference.

az ml compute create --resource-group my-resource-group --workspace-name my-workspace --name cpu-cluster --type AmlCompute --size Standard_DS3_v2 --min-instances 0 --max-instances 10

Key Parameters for Compute Clusters:

- name: the name of the compute cluster within the workspace.
- type: the compute type; use AmlCompute for a managed compute cluster.
- size: the virtual machine size of each node (for example, Standard_DS3_v2 for CPU workloads).
- minimum node count: set this to 0 so the cluster scales to zero and stops incurring charges when idle.
- maximum node count: the upper limit the cluster can scale out to under load.

Managing Existing Compute

You can manage your compute resources through the Azure portal, Azure CLI, or the Python SDK.
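For example, with the Azure CLI you can enumerate and inspect compute targets; the resource group and workspace names below are placeholders:

```shell
# List all compute targets in the workspace.
az ml compute list --resource-group my-resource-group --workspace-name my-workspace --output table

# Show the full configuration of a single compute target.
az ml compute show --name cpu-cluster --resource-group my-resource-group --workspace-name my-workspace
```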

Scaling Compute Clusters

To scale a compute cluster, update its minimum and maximum node counts:

az ml compute update --resource-group my-resource-group --workspace-name my-workspace --name cpu-cluster --max-instances 20

Deleting Compute Resources

To delete a compute instance or cluster:

az ml compute delete --resource-group my-resource-group --workspace-name my-workspace --name my-compute-name --yes

Best Practices

- Set the minimum node count of compute clusters to 0 so they scale to zero when idle and you only pay for active nodes.
- Use Spot (low-priority) instances for training jobs that can tolerate interruptions.
- Delete compute instances and clusters you no longer need to avoid unnecessary charges.

Tip: For production deployments, consider using Inference Clusters (managed Kubernetes) for high availability and scalability.
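If you route production traffic through Kubernetes, an existing AKS or Azure Arc-enabled cluster can be attached to the workspace as a compute target; the compute name and resource ID below are placeholders:

```shell
# Attach an existing Kubernetes cluster to the workspace for inference workloads.
az ml compute attach \
  --type Kubernetes \
  --name k8s-compute \
  --resource-id <KUBERNETES-CLUSTER-RESOURCE-ID> \
  --resource-group my-resource-group \
  --workspace-name my-workspace
```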