Scaling Applications on Azure Kubernetes Service (AKS)
This tutorial guides you through the process of scaling your applications deployed on Azure Kubernetes Service (AKS) to handle varying workloads. We will cover both manual and automatic scaling strategies.
Prerequisites
- An Azure account with an active subscription.
- An AKS cluster deployed and configured.
- kubectl installed and configured to connect to your AKS cluster.
- A sample application deployed on your AKS cluster (e.g., a web server).
Why Scale Applications?
Applications need to scale to remain available and performant under varying load. Scaling out (adding replicas or nodes) handles increased demand, while scaling in (removing them) saves costs during periods of low activity. Kubernetes provides powerful tools to manage this automatically.
Manual Scaling
Manual scaling involves explicitly changing the number of pod replicas for a deployment or stateful set.
Scaling Deployments
You can scale a deployment using the kubectl scale command. This command directly updates the replicas field in your deployment's manifest.
Check current replica count
First, verify the current number of replicas for your application's deployment:
kubectl get deployment
Scale the deployment
To scale a deployment named my-web-app (the sample application used throughout this tutorial) to, for example, 5 replicas:
kubectl scale deployment my-web-app --replicas=5
Kubernetes will then ensure that 5 pods are running for this deployment. You can verify this with the get deployment command again.
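If you keep your manifests in source control, the same change can be made declaratively by editing the replicas field and re-applying the manifest. A minimal sketch, assuming the deployment from this tutorial is stored in a file named my-web-app.yaml (the filename is an assumption for illustration):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-web-app
spec:
  replicas: 5   # changed from 1; Kubernetes reconciles the deployment to 5 pods
```

Apply the change with kubectl apply -f my-web-app.yaml. The declarative approach keeps the manifest as the single source of truth, so the replica count is not silently reverted the next time the file is applied.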
Automatic Scaling
Automatic scaling is the preferred method for managing application capacity as it responds dynamically to real-time metrics.
Horizontal Pod Autoscaler (HPA)
The Horizontal Pod Autoscaler (HPA) automatically scales the number of pods in a deployment or replica set based on observed CPU utilization or other metrics.
Configuring HPA
You can create an HPA resource using kubectl autoscale or by defining a YAML manifest.
Define Resource Requests
Ensure your deployment includes CPU and memory requests. If not, you'll need to update your deployment manifest. For example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-web-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-web-app
  template:
    metadata:
      labels:
        app: my-web-app
    spec:
      containers:
      - name: web-container
        image: nginx
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "100m"      # 100 millicores
            memory: "128Mi"  # 128 mebibytes
          limits:
            cpu: "200m"
            memory: "256Mi"
Create HPA
Use kubectl autoscale to create an HPA that scales your deployment based on CPU utilization, setting a target utilization of 50% and minimum/maximum replica counts:
kubectl autoscale deployment my-web-app --cpu-percent=50 --min=1 --max=10
This command configures the HPA to scale the deployment between 1 and 10 replicas, targeting 50% CPU utilization. The HPA controller will periodically check the average CPU usage across all pods and adjust the replica count accordingly.
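The same HPA can also be expressed as a YAML manifest, which is convenient for version control. A sketch equivalent to the kubectl autoscale command above, using the autoscaling/v2 API:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-web-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-web-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization      # average utilization relative to the pods' CPU requests
        averageUtilization: 50
```

Apply it with kubectl apply -f. Note that the utilization target is computed against the CPU requests defined in the deployment, which is why defining resource requests is a prerequisite.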
Verify HPA
Check the status of your HPA:
kubectl get hpa
You can also describe the HPA for more details:
kubectl describe hpa
Cluster Autoscaler
While HPA scales your application *pods*, the Kubernetes Cluster Autoscaler scales the *nodes* in your cluster. If your pods cannot be scheduled due to insufficient node resources, the Cluster Autoscaler will add new nodes. Conversely, it will remove underutilized nodes to save costs.
Enabling Cluster Autoscaler
The Cluster Autoscaler is typically configured when you create your AKS cluster or can be enabled on an existing cluster. It works by integrating with Azure VM Scale Sets.
Ensure your node pools are configured for autoscaling. You can manage this through the Azure portal or using Azure CLI.
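As a sketch, enabling the cluster autoscaler on an existing node pool with Azure CLI might look like the following. The resource group, cluster, and node pool names are placeholders; substitute your own:

```shell
# Enable the cluster autoscaler on an existing node pool,
# letting AKS scale the pool between 1 and 5 nodes.
az aks nodepool update \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name nodepool1 \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 5
```

The min/max counts bound how far the autoscaler can grow or shrink the node pool; choose them to balance headroom for pod scheduling against cost.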
Best Practices for Scaling
- Define Resource Requests and Limits: Essential for both HPA and efficient resource scheduling.
- Monitor Performance: Regularly check your application and cluster performance metrics.
- Set Appropriate Min/Max Replicas: Balance cost-efficiency with availability.
- Test Your Scaling Configuration: Simulate load to ensure your autoscalers respond as expected.
- Consider Pod Disruption Budgets (PDBs): To ensure a minimum number of your pods are always available during voluntary disruptions like node upgrades.
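As an example of the last point, a PodDisruptionBudget that keeps at least one pod of the sample application running during voluntary disruptions could look like this (a sketch reusing the app: my-web-app label from the deployment above):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-web-app-pdb
spec:
  minAvailable: 1          # never evict below one running pod
  selector:
    matchLabels:
      app: my-web-app      # must match the deployment's pod labels
```

With this PDB in place, operations such as node drains during upgrades will wait rather than evict the last remaining my-web-app pod.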
By implementing these scaling strategies, you can ensure your applications on AKS are resilient, performant, and cost-effective.