Introduction

Effective monitoring and logging are critical for understanding the behavior of your Azure Kubernetes Service (AKS) cluster, troubleshooting issues, and optimizing performance. Azure provides powerful tools and services to help you collect, analyze, and visualize data from your AKS environment.

This guide covers the key components and strategies for implementing robust monitoring and logging for your AKS clusters.

Azure Monitor Integration

Azure Monitor is the foundational monitoring service in Azure. It collects and analyzes telemetry from your cloud and on-premises environments. For AKS, Azure Monitor offers:

  • Container Insights: A comprehensive solution for collecting, analyzing, and acting on telemetry from your container workloads.
  • Diagnostic Settings: Send AKS resource logs and platform metrics to destinations such as Log Analytics workspaces, storage accounts, and Event Hubs.
  • Alerting: Set up alerts based on metrics and log queries to proactively notify you of potential issues.

Container Insights

Container Insights is the recommended solution for monitoring AKS. It provides deep visibility into the performance and health of your cluster, nodes, and the applications running in AKS. It collects data through a containerized version of the Azure Monitor Agent that runs on the cluster nodes.

Collecting Logs

Container Insights can collect:

  • Container Logs: Standard output and error streams from your containers.
  • Node Logs: System logs from the AKS nodes.
  • Kubernetes Events: Events generated by the Kubernetes control plane and workloads.

These logs are typically sent to a Log Analytics workspace for analysis.
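
Because container logs are read from standard output and standard error, the application only needs to write there. The following is a minimal sketch of a Python service that emits one JSON object per line to stdout, which tends to be easier to filter once the lines reach a workspace. The field names (level, logger, message) and the service name orders-api are illustrative assumptions, not a schema that Container Insights requires.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Keep the traceback inside the same JSON line.
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name: str, stream=None) -> logging.Logger:
    """Return a logger that writes JSON lines to stdout (or a given stream)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger("orders-api")  # hypothetical service name
    log.info("order accepted")
```

Writing to stdout rather than to a file inside the container keeps the application independent of whichever agent collects the logs.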

Collecting Metrics

Key performance metrics are collected, including:

  • CPU and memory utilization for nodes and pods.
  • Network traffic.
  • Disk I/O.
  • Kubernetes API server performance.

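Platform metrics are stored in Azure Monitor Metrics, which supports time-series analysis and visualization. Container Insights also writes performance data to your Log Analytics workspace (for example, the Perf and InsightsMetrics tables), where you can query it alongside logs.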
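
CPU figures for nodes and pods are commonly reported in nanocores (for example, counters named cpuUsageNanoCores and cpuCapacityNanoCores), so raw values usually need converting before they are readable. The helpers below are a small sketch of that arithmetic. The function names are hypothetical, and the counter names should be checked against the data your workspace actually contains.

```python
NANOCORES_PER_MILLICORE = 1_000_000


def utilization_percent(usage: float, capacity: float) -> float:
    """Return usage as a percentage of capacity (both in the same unit)."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * usage / capacity


def nanocores_to_millicores(nanocores: float) -> float:
    """Convert nanocores to the millicore unit used in Kubernetes resource requests."""
    return nanocores / NANOCORES_PER_MILLICORE


if __name__ == "__main__":
    # A node using 0.5 cores out of 2 cores.
    print(utilization_percent(500_000_000, 2_000_000_000))
    print(nanocores_to_millicores(500_000_000))
```

The same percentage helper works for memory, as long as usage and capacity are both expressed in bytes.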

Visualizing Data

Container Insights offers pre-built workbooks and dashboards in the Azure portal:

  • Cluster Overview: Provides a high-level view of cluster health, node status, and resource utilization.
  • Controller & Pod Performance: Detailed performance metrics for deployments, replica sets, and individual pods.
  • Node Performance: Insights into the resource usage and health of your worker nodes.
  • Live Data: Real-time streaming of logs and events.

You can also create custom dashboards and workbooks using Kusto Query Language (KQL) queries against your Log Analytics data.
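
As an illustration of the kind of KQL a custom workbook might run, the sketch below builds a query that returns recent stderr lines for one namespace. It assumes the ContainerLogV2 table and its PodNamespace, PodName, ContainerName, LogSource, and LogMessage columns. Your workspace may use a different schema, so treat these names as assumptions to verify. The function name is hypothetical.

```python
import re

# Kubernetes namespace names are RFC 1123 labels: lowercase alphanumerics and '-'.
_NAMESPACE_RE = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def stderr_lines_query(namespace: str, hours: int = 1, limit: int = 100) -> str:
    """Build a KQL query for recent stderr output in one namespace."""
    if len(namespace) > 63 or not _NAMESPACE_RE.fullmatch(namespace):
        raise ValueError(f"not a valid namespace name: {namespace!r}")
    if hours <= 0 or limit <= 0:
        raise ValueError("hours and limit must be positive")
    return "\n".join([
        "ContainerLogV2",  # assumed table name
        f"| where TimeGenerated > ago({hours}h)",
        f'| where PodNamespace == "{namespace}"',
        '| where LogSource == "stderr"',
        "| project TimeGenerated, PodName, ContainerName, LogMessage",
        "| order by TimeGenerated desc",
        f"| take {limit}",
    ])


if __name__ == "__main__":
    print(stderr_lines_query("payments", hours=6, limit=50))
```

Validating the namespace before interpolating it keeps arbitrary text out of the query string.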

AKS Diagnostic Settings

You can configure diagnostic settings for your AKS cluster to export specific logs and metrics to various destinations:

  • Log Analytics Workspace: For centralized logging and analysis.
  • Storage Account: For long-term archiving of logs and metrics.
  • Event Hubs: For streaming logs to third-party SIEM or analytics tools.

Common log categories to enable include kube-audit-admin, kube-controller-manager, kube-scheduler, and cluster-autoscaler.

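Note: Audit logs can generate a significant volume of data. The kube-audit-admin category excludes get and list audit events, so it is usually much smaller than the full kube-audit category. Configure retention policies carefully to manage costs.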
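
A diagnostic setting is essentially a destination plus a list of enabled categories. The sketch below assembles a request body in the shape used by the Azure Monitor diagnostic settings resource, as far as that shape is understood here: a properties object with workspaceId, logs, and metrics. The function name and the placeholder workspace ID are assumptions, and the body should be checked against the current API reference before you submit it.

```python
import json

# Categories named in this guide; adjust to your own requirements.
DEFAULT_CATEGORIES = (
    "kube-audit-admin",
    "kube-controller-manager",
    "kube-scheduler",
    "cluster-autoscaler",
)


def diagnostic_setting_body(workspace_id: str,
                            categories=DEFAULT_CATEGORIES,
                            include_metrics: bool = True) -> dict:
    """Build a diagnostic setting body that targets a Log Analytics workspace."""
    if not categories:
        raise ValueError("at least one log category is required")
    properties = {
        "workspaceId": workspace_id,
        "logs": [{"category": c, "enabled": True} for c in categories],
    }
    if include_metrics:
        properties["metrics"] = [{"category": "AllMetrics", "enabled": True}]
    return {"properties": properties}


if __name__ == "__main__":
    # Placeholder resource ID; substitute your own values.
    workspace = (
        "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
        "/providers/Microsoft.OperationalInsights/workspaces/<workspace-name>"
    )
    print(json.dumps(diagnostic_setting_body(workspace), indent=2))
```

Keeping the category list explicit makes it easy to review which logs you are paying to ingest.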

Log Analytics Workspace

A Log Analytics workspace is a central repository for log data collected from various sources, including your AKS cluster. You query it with Kusto Query Language (KQL) to search, analyze, and visualize your log data.

With Log Analytics, you can:

  • Run complex queries to troubleshoot issues.
  • Create custom alerts based on log data.
  • Export data for further analysis or compliance.
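
When query results are exported for further analysis, they typically arrive as tables of column definitions plus positional rows. The sketch below converts that shape into a list of dictionaries. It assumes a response with a top-level tables list whose entries have name, columns, and rows, and a default table called PrimaryResult. The function name and the sample data are made up for illustration.

```python
from typing import Any, Dict, List


def rows_as_dicts(response: Dict[str, Any],
                  table_name: str = "PrimaryResult") -> List[Dict[str, Any]]:
    """Turn one table of a query response into a list of {column: value} dicts."""
    for table in response.get("tables", []):
        if table.get("name") == table_name:
            names = [column["name"] for column in table["columns"]]
            return [dict(zip(names, row)) for row in table["rows"]]
    return []  # table not present in the response


if __name__ == "__main__":
    # Made-up sample response, not real cluster data.
    sample = {
        "tables": [{
            "name": "PrimaryResult",
            "columns": [{"name": "PodName", "type": "string"},
                        {"name": "Restarts", "type": "long"}],
            "rows": [["web-1", 3], ["web-2", 0]],
        }]
    }
    for row in rows_as_dicts(sample):
        print(row)
```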

Fluentd and OMS Agent

For older AKS clusters or specific logging requirements, you might use:

  • Fluentd: A popular open-source data collector that can be deployed as a DaemonSet in AKS to collect container logs and forward them to various backends.
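  • OMS Agent for Linux (also known as the Log Analytics agent): The predecessor to the Azure Monitor Agent. Older clusters used it to collect logs and metrics, and it has been deprecated in favor of the Azure Monitor Agent.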

Container Insights now primarily uses the Azure Monitor Agent, which offers improved performance and integration.

Custom Logging Solutions

While Container Insights offers a comprehensive managed solution, you can also integrate custom logging:

  • Sidecar Containers: Deploy a logging agent as a sidecar container in your pods to collect application-specific logs.
  • Dedicated Logging Clusters: For very large-scale deployments, consider sending data from AKS to a dedicated logging cluster built on Elasticsearch, Fluentd, and Kibana (EFK stack) or on Promtail, Loki, and Grafana (PLG stack).
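
The sidecar pattern usually comes down to two containers sharing a volume: the application writes log files, and the agent reads them. The sketch below generates such a Pod manifest as a Python dictionary that can be dumped to JSON, a format kubectl accepts alongside YAML. The function name, image names, and log directory are hypothetical placeholders, and a real agent would also need its own configuration.

```python
import json


def pod_with_log_sidecar(name: str, app_image: str, sidecar_image: str,
                         log_dir: str = "/var/log/app") -> dict:
    """Build a Pod manifest whose app and logging sidecar share an emptyDir volume."""
    volume_name = "app-logs"
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # emptyDir lives as long as the pod and is visible to both containers.
            "volumes": [{"name": volume_name, "emptyDir": {}}],
            "containers": [
                {
                    "name": "app",
                    "image": app_image,
                    "volumeMounts": [{"name": volume_name, "mountPath": log_dir}],
                },
                {
                    "name": "log-agent",
                    "image": sidecar_image,
                    # The agent only reads the files the app writes.
                    "volumeMounts": [{"name": volume_name, "mountPath": log_dir,
                                      "readOnly": True}],
                },
            ],
        },
    }


if __name__ == "__main__":
    manifest = pod_with_log_sidecar(
        "orders", "example.com/orders:1.0", "example.com/log-agent:1.0"
    )
    print(json.dumps(manifest, indent=2))
```

A sidecar adds a container to every pod, so it is usually reserved for applications that cannot log to stdout.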

Best Practices

  • Enable Container Insights: Start with the managed Container Insights solution for a quick and effective monitoring setup.
  • Centralize Logs: Use a dedicated Log Analytics workspace for your AKS cluster logs.
  • Configure Audit Logs: Enable audit logs for security and compliance, but be mindful of data volume.
  • Set Up Alerts: Define meaningful alerts for critical metrics (e.g., high CPU/memory, pod failures, node unavailability).
  • Regularly Review Dashboards: Familiarize yourself with the pre-built dashboards and customize them as needed.
  • Understand KQL: Learn Kusto Query Language for advanced log analysis and troubleshooting.
  • Manage Costs: Monitor your Log Analytics data ingestion and retention costs.
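
To make the cost practice concrete, the sketch below estimates a steady-state monthly bill from daily ingestion and retention. It is a simplified model under stated assumptions: a flat per-GB ingestion price, a flat per-GB-per-month retention price for data kept beyond an included period, and no commitment tiers or free allowances. All prices are inputs you supply. The numbers in the usage example are placeholders, not Azure prices.

```python
def estimate_monthly_cost(daily_ingest_gb: float,
                          ingest_price_per_gb: float,
                          retention_days: int,
                          included_retention_days: int,
                          retention_price_per_gb_month: float,
                          days_in_month: int = 30) -> dict:
    """Estimate steady-state monthly ingestion and retention cost."""
    values = (daily_ingest_gb, ingest_price_per_gb, retention_days,
              included_retention_days, retention_price_per_gb_month, days_in_month)
    if min(values) < 0:
        raise ValueError("inputs must be non-negative")
    ingestion = daily_ingest_gb * days_in_month * ingest_price_per_gb
    # Only data older than the included period is billed for retention.
    billable_days = max(0, retention_days - included_retention_days)
    billable_stored_gb = daily_ingest_gb * billable_days
    retention = billable_stored_gb * retention_price_per_gb_month
    return {"ingestion": ingestion, "retention": retention,
            "total": ingestion + retention}


if __name__ == "__main__":
    # Placeholder prices for illustration only.
    print(estimate_monthly_cost(
        daily_ingest_gb=2.0,
        ingest_price_per_gb=2.5,
        retention_days=90,
        included_retention_days=30,
        retention_price_per_gb_month=0.5,
    ))
```

Running the estimate per log category shows quickly whether audit logs dominate the bill.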