Monitoring Azure Kubernetes Service (AKS)
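Effective monitoring is essential for maintaining the performance, reliability, and security of your AKS clusters. The sections below cover three complementary approaches: Azure Monitor for Containers, Prometheus with Grafana, and centralized logging with alerts.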
Azure Monitor for Containers
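Azure Monitor for Containers (now called Container insights) provides out-of-the-box metrics, logs, and dashboards for AKS. It is turned on through the monitoring add-on, either when the cluster is created or later on an existing cluster.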
# Enable Azure Monitor for a new AKS cluster
az aks create \
--resource-group MyResourceGroup \
--name MyAKSCluster \
--enable-addons monitoring \
--generate-ssh-keys
# Enable on an existing cluster
az aks enable-addons \
--resource-group MyResourceGroup \
--name MyAKSCluster \
--addons monitoring
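If you don't name a workspace, the monitoring add-on uses a default Log Analytics workspace and creates one if none exists. To send data to a workspace you already manage, pass --workspace-resource-id. The sketch below builds that ID from placeholder variables. SUBSCRIPTION_ID, WORKSPACE_RG and WORKSPACE_NAME are hypothetical names, so replace them with your own values. By default (DRY_RUN=1) the script only prints the az command. You can also get the ID directly with az monitor log-analytics workspace show --query id -o tsv.

```shell
#!/usr/bin/env bash
# Sketch: enable the monitoring add-on against an existing Log Analytics
# workspace. All names below are hypothetical placeholders.
set -euo pipefail

SUBSCRIPTION_ID="${SUBSCRIPTION_ID:-00000000-0000-0000-0000-000000000000}"
WORKSPACE_RG="${WORKSPACE_RG:-MyMonitoringRG}"
WORKSPACE_NAME="${WORKSPACE_NAME:-MyWorkspace}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print commands; set to 0 to execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Compose a Log Analytics workspace resource ID from its parts.
workspace_id() {
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.OperationalInsights/workspaces/%s' \
    "$1" "$2" "$3"
}

run az aks enable-addons \
  --resource-group MyResourceGroup \
  --name MyAKSCluster \
  --addons monitoring \
  --workspace-resource-id "$(workspace_id "$SUBSCRIPTION_ID" "$WORKSPACE_RG" "$WORKSPACE_NAME")"
```

The same --workspace-resource-id flag is also accepted by az aks create.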
Prometheus & Grafana
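For more granular, Prometheus-native metrics, deploy the kube-prometheus-stack Helm chart. It bundles Prometheus, Alertmanager, Grafana, and the Prometheus Operator.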
# Add Helm repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# Install kube-prometheus-stack
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
--namespace monitoring --create-namespace \
--set grafana.sidecar.dashboards.enabled=true
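After the install, a quick way to reach Grafana is a port-forward. The sketch below assumes the release name kube-prometheus-stack and the namespace monitoring used above. It also assumes the chart's naming convention <release>-grafana for the Grafana Service and Secret, which you can confirm with kubectl get svc -n monitoring. With DRY_RUN=1 (the default) it only prints the kubectl commands.

```shell
#!/usr/bin/env bash
# Sketch: reach the Grafana UI deployed by kube-prometheus-stack.
# Assumes release "kube-prometheus-stack" in namespace "monitoring"; the
# Service/Secret names follow the chart's "<release>-grafana" convention.
set -euo pipefail

RELEASE="${RELEASE:-kube-prometheus-stack}"
NAMESPACE="${NAMESPACE:-monitoring}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print commands; set to 0 to execute them

run() {
  if [ "$DRY_RUN" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

grafana_name() { printf '%s-grafana' "$RELEASE"; }

# 1. Confirm the stack's pods are running.
run kubectl get pods --namespace "$NAMESPACE"

# 2. Read the Grafana admin password from its Secret (base64-encoded;
#    pipe it through `base64 -d` when running for real).
run kubectl get secret "$(grafana_name)" --namespace "$NAMESPACE" \
  -o jsonpath='{.data.admin-password}'

# 3. Forward local port 3000 to the Grafana Service (port 80), then browse
#    to http://localhost:3000. This command blocks until interrupted.
run kubectl port-forward "svc/$(grafana_name)" 3000:80 --namespace "$NAMESPACE"
```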
Logging, Alerts & Workbooks
- Log Analytics Workspace – centralize container logs.
- Azure Alerts – create metric alerts for CPU, memory, node health.
- Workbooks – build custom visualizations for troubleshooting.
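Once container data lands in Log Analytics, you can query it from the CLI with az monitor log-analytics query. Its --workspace flag expects the workspace's customer ID (a GUID), not its resource ID. You can look the GUID up with az monitor log-analytics workspace show --resource-group <rg> --workspace-name <name> --query customerId -o tsv. The sketch below is built on three assumptions:

- The KubePodInventory table populated by the monitoring add-on.
- Its ContainerRestartCount, Namespace and Name columns, which you should check against your workspace schema.
- A hypothetical WORKSPACE_GUID variable.

```shell
#!/usr/bin/env bash
# Sketch: list pods with restarting containers from Container insights data.
set -euo pipefail

WORKSPACE_GUID="${WORKSPACE_GUID:-00000000-0000-0000-0000-000000000000}"  # hypothetical customer ID
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print commands; set to 0 to execute them

run() {
  if [ "$DRY_RUN" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Pods whose containers report a non-zero (cumulative) restart count in
# inventory records from the last hour, highest first.
QUERY='KubePodInventory
| where TimeGenerated > ago(1h)
| summarize Restarts = max(ContainerRestartCount) by Namespace, Name
| where Restarts > 0
| order by Restarts desc'

run az monitor log-analytics query \
  --workspace "$WORKSPACE_GUID" \
  --analytics-query "$QUERY" \
  --output table
```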
Example: Create a CPU usage alert
az monitor metrics alert create \
--name HighCpuAlert \
--resource-group MyResourceGroup \
--scopes /subscriptions//resourceGroups/MyResourceGroup/providers/Microsoft.ContainerService/managedClusters/MyAKSCluster \
--condition "avg node_cpu_usage_percentage > 80" \
--description "Node CPU usage over 80%" \
--action /subscriptions//resourceGroups/MyResourceGroup/providers/microsoft.insights/actionGroups/MyActionGroup
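Full resource IDs are easy to mistype. A sketch that looks them up with az aks show and az monitor action-group show, and then creates one alert each for node CPU and node memory:

- The metric names node_cpu_usage_percentage and node_memory_working_set_percentage are AKS platform metrics. Check what your cluster exposes with az monitor metrics list-definitions --resource <cluster-id>.
- The alert names and 80% thresholds are hypothetical.
- With DRY_RUN=1 (the default), ID lookups return placeholders and the az commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch: create node CPU and memory alerts without hard-coding resource IDs.
# Resource names are this guide's placeholders; alert names are hypothetical.
set -euo pipefail

RG="${RG:-MyResourceGroup}"
CLUSTER="${CLUSTER:-MyAKSCluster}"
ACTION_GROUP="${ACTION_GROUP:-MyActionGroup}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print commands; set to 0 to execute them

run() {
  if [ "$DRY_RUN" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Return a resource's ID via an `az ... show` command; in dry-run mode return
# a readable placeholder instead of calling Azure.
resolve_id() {
  local label="$1"; shift
  if [ "$DRY_RUN" = "1" ]; then printf '<%s-id>' "$label"; else "$@" --query id -o tsv; fi
}

AKS_ID="$(resolve_id aks az aks show --resource-group "$RG" --name "$CLUSTER")"
AG_ID="$(resolve_id action-group az monitor action-group show --resource-group "$RG" --name "$ACTION_GROUP")"

# One alert per line: alert name, AKS platform metric, threshold (percent).
create_alerts() {
  while read -r alert metric threshold; do
    # </dev/null keeps az from consuming the loop's input lines
    run az monitor metrics alert create \
      --name "$alert" \
      --resource-group "$RG" \
      --scopes "$AKS_ID" \
      --condition "avg $metric > $threshold" \
      --description "$metric above $threshold%" \
      --action "$AG_ID" </dev/null
  done <<'EOF'
HighNodeCpu node_cpu_usage_percentage 80
HighNodeMemory node_memory_working_set_percentage 80
EOF
}

create_alerts
```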
Best Practices
- Enable Azure Monitor for Containers on all production clusters.
- Expose custom application metrics to Prometheus by deploying exporters or instrumenting applications with a Prometheus client library.
- Use Grafana dashboards for visualizing SLA‑related KPIs.
- Set up automated alerts for critical thresholds (CPU, memory, pod restarts).
- Centralize logs in a Log Analytics Workspace for correlation with metrics.
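For pod restarts, kube-prometheus-stack installs the Prometheus Operator, so alert rules can be declared as PrometheusRule resources. The sketch below relies on several assumptions:

- The release name and namespace from the Helm install above.
- The chart's default rule selector, which is assumed to pick up rules labeled release: <release-name>. Verify this against your chart version's values.
- The kube-state-metrics restart counter kube_pod_container_status_restarts_total.

The rule name, alert name and threshold are hypothetical. The script writes the manifest to a temporary file and, with DRY_RUN=1, only prints the kubectl apply command.

```shell
#!/usr/bin/env bash
# Sketch: a pod-restart alert rule for the Prometheus Operator installed by
# kube-prometheus-stack. Rule/alert names and the threshold are hypothetical.
set -euo pipefail

RELEASE="${RELEASE:-kube-prometheus-stack}"
NAMESPACE="${NAMESPACE:-monitoring}"
RULE_FILE="${RULE_FILE:-${TMPDIR:-/tmp}/pod-restarts-rule.yaml}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the kubectl command; 0 = apply it

run() {
  if [ "$DRY_RUN" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Unquoted heredoc: ${RELEASE}/${NAMESPACE} expand, while \$labels stays a
# literal $labels for Prometheus' alert templating.
cat > "$RULE_FILE" <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pod-restarts
  namespace: ${NAMESPACE}
  labels:
    release: ${RELEASE}   # assumed to match the chart's default rule selector
spec:
  groups:
    - name: pod-restarts
      rules:
        - alert: PodRestartingFrequently
          # counter exported by kube-state-metrics
          expr: increase(kube_pod_container_status_restarts_total[15m]) > 3
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "{{ \$labels.namespace }}/{{ \$labels.pod }} restarted more than 3 times within 15 minutes"
EOF

run kubectl apply -f "$RULE_FILE"
```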