Monitor Azure Kubernetes Service (AKS)

This tutorial demonstrates how to monitor your Azure Kubernetes Service (AKS) cluster and workloads for performance, health, and potential issues. Effective monitoring is crucial for maintaining a stable and efficient Kubernetes environment.

Key Monitoring Tools and Concepts

AKS integrates with several Azure services and Kubernetes-native tools to provide comprehensive monitoring capabilities:

Azure Monitor: A unified observability service that collects, analyzes, and acts on telemetry from your cloud and on-premises environments.
Container Insights: A feature of Azure Monitor that monitors the performance of your AKS clusters. It collects CPU, memory, network, and disk metrics from nodes and containers, and it can analyze and alert on that data.
Kubernetes Metrics Server: A scalable, efficient source of container resource metrics for Kubernetes. It supplies the CPU and memory figures used by kubectl top and the Horizontal Pod Autoscaler.
Prometheus & Grafana: Popular open-source tools for monitoring and visualization, often deployed within Kubernetes clusters.

Setting Up Container Insights

Container Insights offers a rich, interactive experience for monitoring your AKS clusters. Follow these steps to enable it:

Navigate to your AKS cluster resource in the Azure portal.
In the left-hand menu, under "Monitoring," select "Insights."
If Container Insights is not enabled, you will see an option to enable it. Click "Enable."
You may need to select a Log Analytics workspace to store the collected data. Create a new one or select an existing workspace.
Confirm the settings and click "Create."

Enabling Container Insights deploys a monitoring agent as a DaemonSet, so one agent pod runs on each cluster node to collect logs and metrics. Expect additional cost, which depends mainly on how much data is ingested into the Log Analytics workspace and how long that data is retained.
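
The portal steps above also have a command-line counterpart. The following is a minimal sketch, assuming the Azure CLI and kubectl are installed and logged in. The resource group and cluster names are hypothetical placeholders, and the DaemonSet name ama-logs is an assumption that matches recent agent versions (older clusters may show omsagent instead).

```shell
# Hypothetical resource names; substitute your own or export them beforehand.
RESOURCE_GROUP="${RESOURCE_GROUP:-myResourceGroup}"
CLUSTER_NAME="${CLUSTER_NAME:-myAKSCluster}"

# Enable the monitoring add-on (Container Insights) on an existing cluster.
# No workspace is passed here, so the add-on falls back to a default
# Log Analytics workspace; pass --workspace-resource-id to choose one.
enable_container_insights() {
  az aks enable-addons \
    --addons monitoring \
    --resource-group "$RESOURCE_GROUP" \
    --name "$CLUSTER_NAME"
}

# Check that the agent DaemonSet was deployed.
# Assumption: the DaemonSet is named "ama-logs" in kube-system.
check_agent() {
  kubectl get daemonset ama-logs --namespace kube-system
}
```

After loading these definitions into a shell session, run enable_container_insights followed by check_agent. The DaemonSet should report one desired pod per node.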

Exploring Container Insights Dashboards

Once Container Insights is enabled, you can open detailed dashboards directly from your AKS cluster's "Insights" page.

Cluster Overview

The main dashboard provides a high-level view of your cluster's health, including:

CPU and Memory utilization across the cluster.
Node count and status.
Pod status (Running, Pending, Failed).
Network traffic and disk I/O.
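
For a quick cross-check of the same overview from a terminal, a sketch along these lines may help. It assumes kubectl is configured for the cluster and that Metrics Server is running (AKS deploys it by default). The function names are illustrative only.

```shell
# Count pods by the STATUS column across all namespaces.
# With --all-namespaces the columns are:
#   NAMESPACE NAME READY STATUS RESTARTS AGE
# STATUS is kubectl's display status (for example CrashLoopBackOff),
# which can be more specific than the pod phase shown in the portal.
pod_status_summary() {
  kubectl get pods --all-namespaces --no-headers |
    awk '{ count[$4]++ } END { for (s in count) print s, count[s] }' |
    sort
}

# Node readiness, then CPU and memory usage per node.
# "kubectl top" needs Metrics Server to be available.
node_summary() {
  kubectl get nodes
  kubectl top nodes
}
```

Calling pod_status_summary prints one line per status with its pod count, which makes a sudden rise in Pending or failing pods easy to spot.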

Controller and Pod Views

Drill down into specific Deployments, ReplicaSets, or Pods to monitor their individual performance metrics. You can see resource usage, restarts, and events for each component.
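
The same drill-down can be approximated with kubectl. This is a sketch rather than a replacement for the portal views. The restart threshold is a hypothetical default, and the column positions assume the standard output of kubectl get pods with --all-namespaces.

```shell
# Hypothetical threshold; tune it for your workloads.
RESTART_THRESHOLD="${RESTART_THRESHOLD:-5}"

# List pods whose total restart count is at or above the threshold.
# Column 5 is RESTARTS; a trailing "(5m ago)" lands in later columns.
pods_restarting() {
  kubectl get pods --all-namespaces --no-headers |
    awk -v t="$RESTART_THRESHOLD" '$5 + 0 >= t { print $1 "/" $2, $5 }'
}

# Show events (oldest first) and per-container usage for one pod.
# Usage: inspect_pod <namespace> <pod-name>
inspect_pod() {
  ns="$1"
  pod="$2"
  kubectl get events --namespace "$ns" \
    --field-selector "involvedObject.name=$pod" \
    --sort-by=.lastTimestamp
  kubectl top pod "$pod" --namespace "$ns" --containers
}
```

A typical flow is to run pods_restarting first, then pass one of the reported namespace and pod pairs to inspect_pod.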

Configuring Alerts

Proactive alerting is essential for responding to issues before they impact your users. Azure Monitor allows you to create alert rules based on specific metrics or log queries.

Go to "Monitor" in the Azure portal and select "Alerts."
Click "Create" and then "Alert rule."
Scope: Select your AKS cluster.
Condition: Define the signal (e.g., "CPU usage percentage"), operator, threshold, and evaluation frequency.
Actions: Configure action groups to notify administrators via email, SMS, or trigger other automated responses.
Details: Provide a name, description, and severity for the alert.

Consider setting up alerts for common issues such as high CPU/memory usage, node unavailability, high restart counts for pods, or API server latency.
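
An alert rule like the one described above can also be created from the Azure CLI. The sketch below assumes an action group already exists. All names, the 80% threshold, and the time windows are hypothetical values chosen for illustration, and node_cpu_usage_percentage is used as the node CPU platform metric for AKS.

```shell
# Hypothetical names; substitute your own.
RESOURCE_GROUP="${RESOURCE_GROUP:-myResourceGroup}"
CLUSTER_NAME="${CLUSTER_NAME:-myAKSCluster}"
ACTION_GROUP="${ACTION_GROUP:-myActionGroup}"   # must already exist

# Create a metric alert that fires when average node CPU stays above 80%.
create_cpu_alert() {
  # Scope: look up the resource ID of the AKS cluster.
  cluster_id="$(az aks show \
    --resource-group "$RESOURCE_GROUP" \
    --name "$CLUSTER_NAME" \
    --query id --output tsv)"

  # Condition, evaluation settings, action, and details in one command.
  az monitor metrics alert create \
    --name "aks-node-cpu-high" \
    --resource-group "$RESOURCE_GROUP" \
    --scopes "$cluster_id" \
    --condition "avg node_cpu_usage_percentage > 80" \
    --window-size 5m \
    --evaluation-frequency 1m \
    --severity 2 \
    --action "$ACTION_GROUP" \
    --description "Average node CPU above 80% over 5 minutes"
}
```

The flags map onto the portal steps: --scopes is the Scope, --condition with the two time settings is the Condition, --action is the Actions step, and the name, severity, and description are the Details.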

Using Kubernetes-Native Tools

Prometheus and Grafana

For more advanced and customizable monitoring, you can deploy Prometheus and Grafana directly within your AKS cluster.

Prometheus: A powerful time-series database and monitoring system.
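Grafana: A popular data visualization and dashboarding tool with built-in support for Prometheus as a data source.

You can install both with Helm charts or custom Kubernetes manifests. A widely used community chart is kube-prometheus-stack, which bundles the Prometheus Operator, Prometheus, Alertmanager, Grafana, and common exporters such as node-exporter and kube-state-metrics.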

Example Helm installation:

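# Register the community chart repository and refresh the local chart index
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# Install the stack as a release named "prometheus" in a new "monitoring" namespace
helm install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring --create-namespace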
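
Once the chart is installed, a short check like the following can confirm that the stack is running and give local access to Grafana. It is a sketch that assumes the release name and namespace from the install command above, and that the chart keeps its default naming: a Service and Secret called <release>-grafana, with the Service listening on port 80.

```shell
NAMESPACE="${NAMESPACE:-monitoring}"
RELEASE="${RELEASE:-prometheus}"

# List the pods created by the chart; all should reach Running.
check_stack() {
  kubectl get pods --namespace "$NAMESPACE"
}

# Print the Grafana admin password.
# Assumption: the chart stores it in the "<release>-grafana" Secret
# under the "admin-password" key.
grafana_password() {
  kubectl get secret "${RELEASE}-grafana" --namespace "$NAMESPACE" \
    -o jsonpath='{.data.admin-password}' | base64 --decode
}

# Forward local port 3000 to the Grafana Service (assumed port 80).
# Blocks until interrupted; then browse to http://localhost:3000.
open_grafana() {
  kubectl port-forward --namespace "$NAMESPACE" "svc/${RELEASE}-grafana" 3000:80
}
```

Port forwarding keeps Grafana off the public internet, which is usually preferable for a first look before deciding how to expose it.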

Best Practices for AKS Monitoring

Enable Container Insights: For out-of-the-box monitoring and alerting capabilities.
Define Clear SLOs/SLAs: Understand what performance metrics are critical for your applications.
Set Up Meaningful Alerts: Focus on actionable alerts that require intervention.
Regularly Review Dashboards: Stay informed about the overall health and performance of your cluster.
Monitor Resource Requests and Limits: Set requests and limits so that each pod gets the resources it needs without starving other workloads on the same node.
Log Aggregation: Centralize your application and cluster logs for easier troubleshooting. Azure Monitor's Log Analytics is well suited to this.
Trace Distributed Applications: For microservices, consider distributed tracing tools like Application Insights or Jaeger.
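
As one way to act on the requests and limits practice, the sketch below lists pods that appear to have no CPU or memory requests or limits. It relies on kubectl printing <none> for missing fields in custom columns. The function and variable names are illustrative, and the check is approximate: in a multi-container pod, a value set on any one container hides the gap on the others.

```shell
# Custom columns: namespace, name, then requests and limits for CPU and memory.
RESOURCE_COLUMNS="NAMESPACE:.metadata.namespace,NAME:.metadata.name"
RESOURCE_COLUMNS="$RESOURCE_COLUMNS,CPU_REQ:.spec.containers[*].resources.requests.cpu"
RESOURCE_COLUMNS="$RESOURCE_COLUMNS,MEM_REQ:.spec.containers[*].resources.requests.memory"
RESOURCE_COLUMNS="$RESOURCE_COLUMNS,CPU_LIM:.spec.containers[*].resources.limits.cpu"
RESOURCE_COLUMNS="$RESOURCE_COLUMNS,MEM_LIM:.spec.containers[*].resources.limits.memory"

# Print namespace/name for every pod with at least one missing value.
pods_missing_resources() {
  kubectl get pods --all-namespaces --no-headers \
    -o "custom-columns=$RESOURCE_COLUMNS" |
    awk '/<none>/ { print $1 "/" $2 }'
}
```

System pods in kube-system often show up in this list, so it may be worth filtering the output to your own namespaces before acting on it.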

By implementing a robust monitoring strategy, you can ensure your AKS deployments are reliable, performant, and cost-effective.