Scaling Applications on Azure Kubernetes Service (AKS)

This tutorial guides you through the process of scaling your applications deployed on Azure Kubernetes Service (AKS) to handle varying workloads. We will cover both manual and automatic scaling strategies.

Prerequisites:
  • An Azure account with an active subscription.
  • An AKS cluster deployed and configured.
  • kubectl installed and configured to connect to your AKS cluster.
  • A sample application deployed on your AKS cluster (e.g., a web server).

Why Scale Applications?

Applications need to scale to ensure they remain available and performant under different load conditions. Scaling out (adding pod replicas or nodes) handles increased demand, while scaling in (removing them) saves costs during periods of low activity. Kubernetes provides powerful tools to manage this automatically.

Manual Scaling

Manual scaling involves explicitly changing the number of pod replicas for a deployment or stateful set.

Scaling Deployments

You can scale a deployment using the kubectl scale command, which updates the replicas field of the Deployment object in the cluster (your local manifest file is not modified).

1. Check current replica count

First, verify the current number of replicas for your application's deployment:

kubectl get deployment <deployment-name>
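
Typical output looks roughly like this (the deployment name and values are illustrative):

NAME         READY   UP-TO-DATE   AVAILABLE   AGE
my-web-app   2/2     2            2           3d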

2. Scale the deployment

To scale the deployment to, for example, 5 replicas:

kubectl scale deployment <deployment-name> --replicas=5

Kubernetes will then ensure that 5 pods are running for this deployment. You can verify this with kubectl get deployment again.
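
You can also scale declaratively by setting the replicas field in your deployment manifest and re-applying it. A minimal sketch, assuming the manifest is saved as deployment.yaml:

# deployment.yaml (excerpt): set the desired replica count declaratively
spec:
  replicas: 5

kubectl apply -f deployment.yaml

The declarative approach keeps the manifest as the source of truth, which matters if you manage deployments through version control or CI/CD.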

Automatic Scaling

Automatic scaling is the preferred method for managing application capacity as it responds dynamically to real-time metrics.

Horizontal Pod Autoscaler (HPA)

The Horizontal Pod Autoscaler (HPA) automatically scales the number of pods in a deployment or replica set based on observed CPU utilization or other metrics.

Understanding HPA Metrics

HPA can scale based on various metrics, including:

  • CPU Utilization: Actual CPU usage as a percentage of the CPU requested by each container.
  • Memory Utilization: Actual memory usage as a percentage of the memory requested by each container.
  • Custom Metrics: Metrics exposed from inside the cluster by your application or other Kubernetes objects, such as requests per second.
  • External Metrics: Metrics from sources outside the cluster, such as Azure Monitor or a message queue (see the sketch after this list).
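
As an illustration of a non-resource metric, the excerpt below sketches the metrics section of an HPA spec targeting an external metric. It assumes a metrics adapter (such as KEDA or the Azure Monitor adapter) is installed in the cluster, and the metric name and target value are hypothetical:

metrics:
- type: External
  external:
    metric:
      name: queue_messages_ready  # hypothetical metric exposed by an adapter
    target:
      type: AverageValue
      averageValue: "30"          # scale so each pod handles ~30 messages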

For resource-based scaling to work, the containers in your pods must have CPU and memory requests defined in their pod specifications, because utilization is calculated as a percentage of those requests.

Configuring HPA

You can create an HPA resource using kubectl autoscale or by defining a YAML manifest.

1. Define Resource Requests

Ensure your deployment includes CPU and memory requests. If not, you'll need to update your deployment manifest. For example:


apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-web-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-web-app
  template:
    metadata:
      labels:
        app: my-web-app
    spec:
      containers:
      - name: web-container
        image: nginx
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "100m"  # 100 millicores
            memory: "128Mi" # 128 Mebibytes
          limits:
            cpu: "200m"
            memory: "256Mi"
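
Apply the updated manifest so the new resource requests take effect (the file name deployment.yaml is an assumption):

kubectl apply -f deployment.yaml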

2. Create HPA

Use kubectl autoscale to create an HPA that scales your deployment based on CPU utilization. The command takes a target utilization and minimum/maximum replica counts:

kubectl autoscale deployment <deployment-name> --cpu-percent=50 --min=1 --max=10

This command configures the HPA to scale the deployment between 1 and 10 replicas, targeting 50% CPU utilization. The HPA controller will periodically check the average CPU usage across all pods and adjust the replica count accordingly.
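
The imperative command above is roughly equivalent to the following manifest (a sketch using the stable autoscaling/v2 API, assuming the deployment is named my-web-app):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-web-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-web-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50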

3. Verify HPA

Check the status of your HPA:

kubectl get hpa
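
The output looks roughly like this (names and values are illustrative; kubectl autoscale names the HPA after the deployment):

NAME         REFERENCE               TARGETS        MINPODS   MAXPODS   REPLICAS   AGE
my-web-app   Deployment/my-web-app   cpu: 32%/50%   1         10        3          2m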

You can also describe the HPA for more details:

kubectl describe hpa <hpa-name>

Scaling Based on Memory

Scaling on memory utilization is supported natively as a Resource metric through the autoscaling/v2 API, so no additional components are required. Scaling on custom application metrics, by contrast, typically requires installing a metrics adapter such as the Prometheus adapter for Kubernetes.

A basic memory-based HPA manifest looks like this:


apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-web-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70 # Target 70% memory utilization
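
An HPA can also list multiple metrics; the controller computes a desired replica count for each metric and scales to the highest. A sketch of a metrics section combining the CPU and memory targets shown above:

metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 50
- type: Resource
  resource:
    name: memory
    target:
      type: Utilization
      averageUtilization: 70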

Cluster Autoscaler

While HPA scales your application *pods*, the Kubernetes Cluster Autoscaler scales the *nodes* in your cluster. If your pods cannot be scheduled due to insufficient node resources, the Cluster Autoscaler will add new nodes. Conversely, it will remove underutilized nodes to save costs.

Enabling Cluster Autoscaler

The Cluster Autoscaler is typically configured when you create your AKS cluster, or it can be enabled on an existing cluster. It integrates with Azure Virtual Machine Scale Sets to add and remove nodes.

Ensure your node pools are configured for autoscaling. You can manage this through the Azure portal or using Azure CLI.
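
For example, to enable the autoscaler on an existing node pool with Azure CLI (the resource group, cluster, and node pool names below are placeholders):

az aks nodepool update \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name nodepool1 \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 5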

Note: HPA and Cluster Autoscaler work together. HPA ensures your application has enough pods, and Cluster Autoscaler ensures your cluster has enough nodes to run those pods.

Best Practices for Scaling

  • Always define CPU and memory requests for your containers; the HPA calculates utilization relative to those requests.
  • Set realistic minimum and maximum replica counts on your HPAs to bound both capacity and cost.
  • Use the HPA and the Cluster Autoscaler together so that pod-level and node-level capacity scale in tandem.
  • Prefer automatic scaling for production workloads, and reserve kubectl scale for one-off adjustments and testing.

By implementing these scaling strategies, you can ensure your applications on AKS are resilient, performant, and cost-effective.