Get Started with Azure Batch
Azure Batch is a managed cloud service that enables you to efficiently run large-scale parallel and high-performance computing (HPC) applications in the cloud. It handles the provisioning, management, and scheduling of compute resources, allowing you to focus on your code.
What is Azure Batch?
Azure Batch simplifies the process of running batch workloads. Key features include:
- Automatic Scaling: Automatically scales compute resources up or down based on job demand.
- Job Scheduling: Manages job execution and dependencies.
- Resource Management: Provisions and manages virtual machines (compute nodes) in pools.
- Application Packaging: Distributes your applications and their dependencies to compute nodes.
- Monitoring: Provides tools to monitor job progress and resource utilization.
Core Concepts
Understanding these concepts is crucial for using Azure Batch effectively:
- Pools: Collections of compute nodes (virtual machines) that run your application workloads.
- Nodes: The virtual machines within a pool.
- Jobs: A logical collection of tasks that represent a unit of work to be run on compute nodes.
- Tasks: The individual units of work that are executed on compute nodes.
- Applications: Packages that contain your application executables and necessary files.
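The relationships between these concepts can be sketched with a few plain Python dataclasses. This is only an illustrative model, not the Batch SDK: the class and field names (`Pool`, `Job`, `Task`, `node_count`, and so on) are assumptions chosen to mirror the list above.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """One unit of work: a command line that runs on a single compute node."""
    id: str
    command_line: str


@dataclass
class Job:
    """A logical collection of tasks, tied to the pool that runs them."""
    id: str
    pool_id: str
    tasks: list[Task] = field(default_factory=list)


@dataclass
class Pool:
    """A collection of compute nodes (virtual machines) of one VM size."""
    id: str
    vm_size: str
    node_count: int


# Hypothetical identifiers, used for illustration only.
pool = Pool(id="my-batch-pool", vm_size="Standard_D2s_v3", node_count=4)
job = Job(id="my-batch-job", pool_id=pool.id)
job.tasks.append(Task(id="task-1", command_line="python my_script.py"))
```

A job refers to its pool by ID rather than holding it directly, which is why one pool can serve several jobs.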
Steps to Get Started
1. Create an Azure Batch Account
You need an Azure Batch account to manage your Batch workloads. You can create one through the Azure portal.
Go to the Azure portal, search for "Batch accounts", and click "Create".
2. Create a Pool of Compute Nodes
Define the size and type of virtual machines you want to use for your computations. You can choose from various VM sizes and operating systems.
In your Batch account, navigate to "Pools" and click "Add". Configure your pool with:
- VM size: e.g., Standard_D2s_v3
- OS: e.g., Ubuntu Server 20.04 LTS
- Scale: Start with a fixed number of nodes or enable auto-scaling.
Example pool configuration (simplified):
{
  "id": "my-batch-pool",
  "vmSize": "Standard_D2s_v3",
  "targetDedicatedNodes": 4,
  "enableAutoScale": false,
  "virtualMachineConfiguration": {
    "imageReference": {
      "publisher": "Canonical",
      "offer": "0001-com-ubuntu-server-focal",
      "sku": "20_04-lts-gen2"
    },
    "nodeAgentSkuId": "batch.node.ubuntu 20.04"
  }
}
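If you prefer to generate the pool definition rather than write it by hand, the sketch below builds the same structure in Python and serializes it to JSON. The helper name `build_pool_config` is hypothetical. The node agent SKU ID is written as `batch.node.ubuntu 20.04`, which is the form we believe the service uses for this image; check it against the supported images listed for your Batch account.

```python
import json


def build_pool_config(pool_id, vm_size, node_count,
                      node_agent_sku_id="batch.node.ubuntu 20.04"):
    """Return a fixed-size pool definition as a plain dictionary."""
    return {
        "id": pool_id,
        "vmSize": vm_size,
        "targetDedicatedNodes": node_count,
        "enableAutoScale": False,
        "virtualMachineConfiguration": {
            "imageReference": {
                "publisher": "Canonical",
                "offer": "0001-com-ubuntu-server-focal",
                "sku": "20_04-lts-gen2",
            },
            # Assumed value: verify against your account's supported images.
            "nodeAgentSkuId": node_agent_sku_id,
        },
    }


config = build_pool_config("my-batch-pool", "Standard_D2s_v3", 4)
pool_json = json.dumps(config, indent=2)
print(pool_json)
```

The resulting text can be saved to a file and passed to `az batch pool create --json-file`.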
3. Upload Your Application
Package your application code and dependencies. Batch application packages are uploaded as zip archives and deployed to the compute nodes when a pool or task references them.
You can also use container images for your workloads.
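For the zip-archive route, a minimal packaging sketch using only the Python standard library might look like the following. The function name `package_application` and the directory layout are assumptions; the sketch only creates the archive and does not upload it.

```python
import zipfile
from pathlib import Path


def package_application(source_dir, archive_path):
    """Zip every file under source_dir, storing paths relative to it."""
    source = Path(source_dir)
    archive_path = Path(archive_path)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            # Skip directories, and skip the archive itself in case it was
            # placed inside the source directory.
            if path.is_file() and path.resolve() != archive_path.resolve():
                archive.write(path, path.relative_to(source).as_posix())
    return archive_path
```

Calling `package_application("app", "app.zip")` on a hypothetical `app` directory produces an archive whose entries sit at the top level, so the script path on the node does not gain an extra `app/` prefix.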
4. Create a Job and Tasks
A job groups tasks together. Tasks are the actual commands or programs that run on the compute nodes.
Example task execution:
# Command to run a Python script
python /mnt/batch/tasks/startup/mydummy/my_script.py --input-data /mnt/batch/tasks/fsshare/input/data.txt
You can submit jobs and tasks using:
- Azure Batch SDKs (Python, .NET, Java, Node.js)
- Azure CLI
- Azure PowerShell
Example using Azure CLI:
az batch job create --id my-batch-job --pool-id my-batch-pool
az batch task create --job-id my-batch-job --json-file task-template.json
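A task definition file such as task-template.json can be generated from a list of inputs, which is a common pattern when one script processes many files in parallel. The sketch below assumes the `id` and `commandLine` fields of the Batch REST task body; the helper `build_task`, the script name, and the input paths are hypothetical.

```python
import json
import shlex


def build_task(task_id, script_path, input_path):
    """Return one task definition that runs the script on one input file."""
    # shlex.join quotes arguments that contain spaces or shell metacharacters.
    command = shlex.join(["python", script_path, "--input-data", input_path])
    return {"id": task_id, "commandLine": command}


# Hypothetical input files; one task is created per file.
inputs = ["input/data-0.txt", "input/data-1.txt", "input/data-2.txt"]
tasks = [
    build_task(f"task-{index}", "my_script.py", path)
    for index, path in enumerate(inputs)
]
print(json.dumps(tasks, indent=2))
```

Batch does not run a task's command line under a shell by default, so wrap it in an explicit shell invocation such as `/bin/sh -c "..."` if you rely on environment variable expansion or pipes. Whether your CLI version accepts an array of tasks in a single file is worth checking before relying on it.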
5. Monitor and Manage
Use the Azure portal or Batch APIs to monitor job progress, view task output, and check the status of your compute nodes.
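When monitoring from code, a simple approach is to poll task states until every task reports `completed`. The sketch below is deliberately independent of any SDK: `get_task_states` is a hypothetical callable you would supply (for example, a thin wrapper around a Batch API call) that returns a dictionary mapping task IDs to state strings.

```python
import time


def wait_for_tasks(get_task_states, poll_seconds=5.0, timeout_seconds=600.0,
                   sleep=time.sleep):
    """Poll until all tasks are 'completed'; raise TimeoutError otherwise."""
    waited = 0.0
    while True:
        states = get_task_states()
        if all(state == "completed" for state in states.values()):
            return states
        if waited >= timeout_seconds:
            pending = sorted(k for k, v in states.items() if v != "completed")
            raise TimeoutError(f"Tasks still pending: {pending}")
        sleep(poll_seconds)
        waited += poll_seconds
```

Note that `completed` only means a task has finished running. A task that failed also ends in this state, so inspect each task's exit code or output before treating the job as successful.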