Introduction
Azure Kubernetes Service (AKS) provides a managed Kubernetes environment that simplifies deploying, managing, and scaling machine learning (ML) workloads. This tutorial will guide you through the process of setting up AKS for your ML projects, covering everything from initial cluster creation to deploying and managing your ML services.
Why AKS for ML Workloads?
- Scalability: Easily scale your training and inference endpoints up or down based on demand.
- Resource Management: Efficiently manage compute resources (CPU, GPU) for demanding ML tasks.
- Portability: Standardized containerized environments ensure your ML models run consistently across different stages.
- Orchestration: Kubernetes handles deployment, scaling, and networking complexities.
- Integration: Seamless integration with other Azure services like Azure Container Registry, Azure Machine Learning, and Azure Storage.
Prerequisites
- An Azure account with an active subscription.
- Azure CLI installed and configured.
- kubectl installed.
- Docker installed (for local image building).
Step 1: Create an Azure Kubernetes Service (AKS) Cluster
First, we'll create a new AKS cluster. You can customize the node count, VM size, and other settings as per your requirements. For ML workloads, consider using VMs with GPU support if needed.
az group create --name MyMLResourceGroup --location eastus

az aks create \
--resource-group MyMLResourceGroup \
--name MyMLAKSCluster \
--node-count 3 \
--enable-addons monitoring \
--enable-managed-identity \
--generate-ssh-keys \
--node-vm-size Standard_DS3_v2
These commands create a resource group and then an AKS cluster named MyMLAKSCluster with three Standard_DS3_v2 nodes in the eastus region. The monitoring add-on is enabled for observing cluster performance.
After the cluster is created, connect kubectl to your AKS cluster:
az aks get-credentials --resource-group MyMLResourceGroup --name MyMLAKSCluster
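To confirm that kubectl is now pointing at the cluster, list its nodes:

kubectl get nodes

You should see the three Standard_DS3_v2 nodes in the Ready state.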
Step 2: Containerize Your ML Model
Your ML model needs to be packaged into a Docker container. This typically involves a web server (like Flask or FastAPI) to expose your model as an API endpoint.
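To make this concrete, here is a minimal sketch of the app.py the Dockerfile below expects. It assumes a scikit-learn-style model pickled to model.pkl and a JSON request body with a features array; both names are illustrative, not part of any fixed API.

# app.py - minimal Flask wrapper around a pickled model (illustrative sketch)
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Load the trained model once at startup; "model.pkl" is an assumed filename.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON body such as {"features": [[1.0, 2.0, 3.0]]}
    payload = request.get_json(force=True)
    prediction = model.predict(payload["features"])
    return jsonify({"prediction": prediction.tolist()})

if __name__ == "__main__":
    # Listen on port 80 to match EXPOSE and the containerPort used later
    app.run(host="0.0.0.0", port=80)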
Dockerfile for a simple Flask ML API:
# Use an official Python runtime as a parent image
FROM python:3.9-slim-buster

# Set the working directory in the container
WORKDIR /app

# Copy the requirements file into the container at /app
COPY requirements.txt .

# Install any needed packages specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Copy the current directory contents into the container at /app
COPY . .

# Make port 80 available to the world outside this container
EXPOSE 80

# Run app.py when the container launches
CMD ["python", "app.py"]
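The COPY requirements.txt step assumes a requirements file next to the Dockerfile. For the sketch above it could be as small as the following; in practice, pin the exact versions your model was trained and serialized with:

flask
scikit-learn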
Build the Docker image and push it to a container registry like Azure Container Registry (ACR):
# Login to ACR (replace with your ACR name)
az acr login --name myacrmlregistry

# Build the Docker image
docker build -t myacrmlregistry.azurecr.io/ml-api:v1 .

# Push the image to ACR
docker push myacrmlregistry.azurecr.io/ml-api:v1
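One detail worth noting: the cluster also needs permission to pull images from the registry. Since the cluster was created with a managed identity, one way to grant pull access is to attach the ACR to the cluster:

az aks update \
--resource-group MyMLResourceGroup \
--name MyMLAKSCluster \
--attach-acr myacrmlregistry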
Step 3: Define Kubernetes Deployments and Services
Create Kubernetes manifest files (YAML) to define how your containerized ML application should be deployed and exposed.
Deployment Manifest (deployment.yaml)
This defines the desired state for your application pods.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-api-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ml-api
  template:
    metadata:
      labels:
        app: ml-api
    spec:
      containers:
      - name: ml-api-container
        image: myacrmlregistry.azurecr.io/ml-api:v1
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "200m"
            memory: "512Mi"
          limits:
            cpu: "500m"
            memory: "1Gi"
Service Manifest (service.yaml)
This defines how to access your application, typically via a LoadBalancer for external access.
apiVersion: v1
kind: Service
metadata:
  name: ml-api-service
spec:
  selector:
    app: ml-api
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: LoadBalancer
Step 4: Apply Manifests to AKS
Use kubectl to deploy your application to the AKS cluster.
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
You can check the status of your deployment and service:
kubectl get deployments
kubectl get pods
kubectl get services
Once the service is provisioned and has an external IP address, you can start sending requests to your ML API.
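For example, assuming the /predict endpoint from the app.py sketch earlier, you could test the service with curl, substituting the EXTERNAL-IP shown by kubectl get services:

curl -X POST http://<EXTERNAL-IP>/predict \
-H "Content-Type: application/json" \
-d '{"features": [[1.0, 2.0, 3.0]]}'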
Step 5: Advanced ML Workload Management
For more complex ML scenarios, consider integrating with Azure Machine Learning for:
- Managed Training: Use AKS as a compute target for training large models with Azure ML (a sketch of attaching a cluster follows this list).
- Model Deployment: Deploy trained models as managed endpoints directly from Azure ML to AKS.
- MLOps: Implement CI/CD pipelines for model retraining and deployment.
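As a rough sketch, attaching an existing AKS cluster to an Azure ML workspace uses the Azure CLI ml extension. The workspace name, compute name, and resource ID below are placeholders, the cluster must first have the Azure ML Kubernetes extension installed, and the exact flags may vary by extension version:

az ml compute attach \
--resource-group MyMLResourceGroup \
--workspace-name my-ml-workspace \
--type Kubernetes \
--name aks-ml-compute \
--resource-id /subscriptions/<subscription-id>/resourceGroups/MyMLResourceGroup/providers/Microsoft.ContainerService/managedClusters/MyMLAKSCluster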
GPU Support
To utilize GPUs for deep learning workloads, ensure your AKS node pools are configured with GPU-enabled VM sizes. You may also need to install the NVIDIA device plugin for Kubernetes.
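For example, adding a dedicated GPU node pool to the cluster created earlier might look like this; Standard_NC6s_v3 is one GPU-enabled size, and availability varies by region and quota:

az aks nodepool add \
--resource-group MyMLResourceGroup \
--cluster-name MyMLAKSCluster \
--name gpupool \
--node-count 1 \
--node-vm-size Standard_NC6s_v3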
Scaling ML Inference
Use Kubernetes Horizontal Pod Autoscaler (HPA) to automatically scale your ML inference pods based on metrics like CPU or custom metrics (e.g., requests per second).
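For example, a CPU-based HPA for the deployment above could look like the following; the 70% target and replica bounds are illustrative, and CPU-based scaling works here because the deployment declares CPU requests:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ml-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ml-api-deployment
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

Apply it with kubectl apply -f hpa.yaml and observe its behavior with kubectl get hpa.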
Conclusion
Azure Kubernetes Service offers a robust and scalable platform for deploying and managing your machine learning workloads. By containerizing your models and leveraging Kubernetes orchestration, you can achieve efficient resource utilization, simplified deployment, and seamless scaling for both training and inference.