Azure Documentation

Scaling Azure Kubernetes Service (AKS) clusters

Azure Kubernetes Service (AKS) enables you to deploy, manage, and scale containerized applications using Kubernetes without the operational overhead of maintaining Kubernetes infrastructure. This tutorial covers how to scale your AKS clusters to handle varying workloads.

Understanding Scaling in AKS

Scaling in AKS can refer to two primary concepts:

  • Scaling Pods: Adjusting the number of pod replicas to meet application demand.
  • Scaling Nodes: Adjusting the number of virtual machines (nodes) in your cluster's node pool to accommodate more pods.

Scaling Pods Automatically with the Horizontal Pod Autoscaler (HPA)

The Horizontal Pod Autoscaler (HPA) automatically adjusts the number of pods in a deployment or replica set based on observed metrics like CPU utilization or memory usage.

Prerequisites:

  • An existing Azure Kubernetes Service (AKS) cluster.
  • kubectl installed and configured to connect to your AKS cluster.
  • Metrics Server running in your AKS cluster (deployed by default in AKS).
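Before creating an HPA, it's worth confirming that the Metrics Server is healthy; if kubectl top returns node metrics, the HPA will be able to read pod CPU usage. These commands require a connection to a live cluster:

```shell
# Check that the Metrics Server deployment exists in kube-system
kubectl get deployment metrics-server -n kube-system

# Verify that metrics are actually being collected
kubectl top nodes
```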

Creating a Deployment:

First, let's create a sample application deployment. Save the following as azure-vote-front.yaml:


apiVersion: apps/v1
kind: Deployment
metadata:
  name: azure-vote-front
spec:
  replicas: 1
  selector:
    matchLabels:
      app: azure-vote-front
  template:
    metadata:
      labels:
        app: azure-vote-front
    spec:
      containers:
      - name: azure-vote-front
        image: mcr.microsoft.com/azuredocs/azure-vote-front:v1
        ports:
        - containerPort: 80
        env:
        - name: STORAGE_ACCOUNT_NAME
          value: "azurevotecfg"
        resources:
          requests:
            cpu: "100m" # Request 100 millicores of CPU
          limits:
            cpu: "200m" # Limit to 200 millicores of CPU

Apply the deployment:

kubectl apply -f azure-vote-front.yaml

Creating a Horizontal Pod Autoscaler:

Now, create an HPA resource that targets the azure-vote-front deployment. This HPA will scale the deployment based on CPU utilization, targeting an average of 50% of the requested CPU.

Save the following as azure-vote-hpa.yaml:


apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: azure-vote-front-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: azure-vote-front
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 50
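
On clusters running a recent Kubernetes version, the same autoscaler can also be expressed with the autoscaling/v2 API, which additionally supports memory and custom metrics. A functionally equivalent v2 manifest (a sketch of the same 50% CPU target) looks like this:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: azure-vote-front-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: azure-vote-front
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```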

Apply the HPA:

kubectl apply -f azure-vote-hpa.yaml
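
Alternatively, kubectl can create the same HPA imperatively, without a manifest file:

```shell
# Create an HPA targeting 50% CPU, scaling between 1 and 5 replicas
kubectl autoscale deployment azure-vote-front --cpu-percent=50 --min=1 --max=5
```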

Testing the HPA:

You can simulate load on your application to test the HPA. First, expose your deployment:

kubectl expose deployment azure-vote-front --type LoadBalancer --port 80 --target-port 80

Wait for the load balancer to provision and get its external IP address:

kubectl get service azure-vote-front

Once you have the external IP, you can generate traffic. A simple way is to use kubectl run to create a temporary pod that sends requests:


kubectl run load-generator --image=busybox --rm -it --restart=Never -- /bin/sh
/ # while true; do wget -qO- http://<YOUR_EXTERNAL_IP>/; done

While generating load, monitor the HPA status:

kubectl get hpa azure-vote-front-hpa --watch

You should see the TARGETS value (observed CPU utilization versus the 50% target) increase and, once it stays above the target, the REPLICAS count scale up toward the configured maxReplicas.

Scaling Nodes with the Cluster Autoscaler

The Cluster Autoscaler automatically adjusts the number of nodes in your cluster's node pools. It adds nodes when pods cannot be scheduled due to resource constraints and removes nodes when they are underutilized.

Enabling the Cluster Autoscaler:

You can enable the Cluster Autoscaler when creating an AKS cluster or by updating an existing cluster.

During AKS Cluster Creation:

Use the Azure CLI with the --enable-cluster-autoscaler flag:

az aks create \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --node-count 1 \
    --enable-cluster-autoscaler \
    --min-count 1 \
    --max-count 3 \
    --node-vm-size Standard_DS2_v2 \
    --location eastus

Updating an Existing Node Pool:

To enable it on an existing node pool, specify the minimum and maximum node counts in the same command:

az aks nodepool update \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name agentpool \
    --enable-cluster-autoscaler \
    --min-count 1 \
    --max-count 5

Configuring Node Pool Limits:

When enabling the autoscaler, you must define the --min-count and --max-count for the node pool. These limits define the bounds for how many nodes the autoscaler can provision.
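
If the limits need to change later, the node pool can be updated in place with the --update-cluster-autoscaler flag:

```shell
az aks nodepool update \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name agentpool \
    --update-cluster-autoscaler \
    --min-count 1 \
    --max-count 10
```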

The Cluster Autoscaler works by evaluating pending pods that cannot be scheduled. If a pod cannot be scheduled due to insufficient resources, the autoscaler checks if it can add a node (within the specified max count) to accommodate it.
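
To see whether the autoscaler has work to do, you can list pods stuck in the Pending state; a Pending pod whose events report insufficient CPU or memory is what triggers a scale-up:

```shell
# List pods the scheduler could not place
kubectl get pods --all-namespaces --field-selector=status.phase=Pending

# Inspect why a specific pod is unschedulable (replace <pod-name>)
kubectl describe pod <pod-name>
```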

Manual Scaling

While autoscaling is recommended for dynamic workloads, you can also manually scale your deployments and node pools.

Manual Pod Scaling:

Scale a deployment to a specific number of replicas:

kubectl scale deployment azure-vote-front --replicas=5

Manual Node Scaling:

Scale a specific node pool to a desired number of nodes. Note that manual scaling is not permitted while the Cluster Autoscaler is enabled on that node pool; disable the autoscaler first, then scale.

az aks nodepool scale \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name agentpool \
    --node-count 3

Manually scaling nodes can be disruptive if not planned carefully, especially in production environments.
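
If the Cluster Autoscaler is enabled on the pool, disable it before scaling manually (and re-enable it afterwards if desired):

```shell
az aks nodepool update \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name agentpool \
    --disable-cluster-autoscaler
```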

Best Practices for Scaling AKS

  • Monitor Resource Utilization: Regularly monitor CPU and memory usage of your pods and nodes to set appropriate HPA and Cluster Autoscaler thresholds.
  • Set Realistic Limits: Define sensible minReplicas and maxReplicas for your HPA and --min-count and --max-count for your node pools to balance performance and cost.
  • Choose Appropriate VM Sizes: Select VM sizes for your nodes that can accommodate your application's resource requirements.
  • Use Pod Anti-Affinity: Distribute replicas of your applications across different nodes for high availability.
  • Consider Horizontal Scaling for Applications: Design your applications to be stateless and scalable horizontally.
  • Understand Node Taints and Tolerations: Use these to control pod scheduling and ensure critical workloads land on specific nodes.

By effectively leveraging the Horizontal Pod Autoscaler and the Cluster Autoscaler, you can ensure your AKS clusters are resilient, performant, and cost-efficient, adapting seamlessly to your application's evolving demands.