Monitoring Azure Kubernetes Service (AKS)

Effective monitoring is crucial for the health, performance, and security of your Azure Kubernetes Service (AKS) clusters.

Key Monitoring Components in AKS

Azure Kubernetes Service integrates deeply with Azure Monitor, providing a comprehensive solution for collecting, analyzing, and acting on telemetry from your Kubernetes cluster and workloads. This includes metrics, logs, and traces.

  • Container Insights: A feature-rich solution for monitoring the performance of container workloads in AKS. It collects metrics and logs from your cluster nodes and containers and uses them to support performance analysis, fault detection, and root cause analysis.
  • Azure Monitor Metrics: Provides time-series data about resource performance. For AKS, this includes metrics for cluster nodes, pods, and container CPU/memory usage.
  • Azure Monitor Logs (Log Analytics): Collects and analyzes log data from various sources. For AKS, this includes Kubernetes control plane logs (routed through diagnostic settings), node logs, and application logs.
  • Azure Activity Log: Records subscription-level events in Azure, such as who created, scaled, or upgraded a cluster and when. This helps with auditing cluster-level operations.

Getting Started with Container Insights

Container Insights is the recommended way to monitor your AKS clusters. It provides out-of-the-box dashboards and alerts.

Tip: Enable Container Insights when you create an AKS cluster, or enable it later on an existing cluster.

Enabling Container Insights

  1. Navigate to your AKS cluster resource in the Azure portal.
  2. Under the "Monitoring" section, select "Insights".
  3. If not enabled, click "Enable" and follow the prompts to configure a new Log Analytics workspace or select an existing one.
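The portal steps above also have an Azure CLI equivalent, "az aks enable-addons --addons monitoring". The Python sketch below only assembles that command and, if asked, runs it; the helper names and the cluster and resource group names are hypothetical, and a real run assumes the Azure CLI is installed and signed in. When the workspace resource ID is omitted, the CLI falls back to a default Log Analytics workspace instead of one you choose.

```python
import shlex
import subprocess


def build_enable_monitoring_cmd(cluster_name, resource_group, workspace_resource_id=None):
    """Build the argv list for enabling the monitoring add-on with the Azure CLI."""
    cmd = [
        "az", "aks", "enable-addons",
        "--addons", "monitoring",
        "--name", cluster_name,
        "--resource-group", resource_group,
    ]
    # Without this flag the CLI falls back to a default Log Analytics workspace.
    if workspace_resource_id:
        cmd += ["--workspace-resource-id", workspace_resource_id]
    return cmd


def enable_monitoring(cluster_name, resource_group, workspace_resource_id=None, dry_run=True):
    """Print the command (dry run) or execute it via the Azure CLI."""
    cmd = build_enable_monitoring_cmd(cluster_name, resource_group, workspace_resource_id)
    if dry_run:
        print(shlex.join(cmd))
        return None
    # Assumes the Azure CLI is installed and you are signed in with `az login`.
    return subprocess.run(cmd, check=True, capture_output=True, text=True)


# Hypothetical names; replace them with your own cluster and resource group.
enable_monitoring("myAKSCluster", "myResourceGroup")
# prints: az aks enable-addons --addons monitoring --name myAKSCluster --resource-group myResourceGroup
```

Keeping the dry run as the default lets you review the exact command before it changes anything in your subscription.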

Exploring the Monitoring Dashboard

Once enabled, you can access a rich set of visualizations:

  • Cluster Overview: High-level health, node counts, CPU/Memory utilization.
  • Controller/Pod Performance: Detailed metrics for Deployments, StatefulSets, and individual pods.
  • Node Performance: Resource usage and health of your cluster nodes.
  • Workload Performance: Application-specific metrics if configured.
  • Live Data: View live logs and events as they happen.

Customizing Monitoring and Alerting

Beyond the default dashboards, you can create custom alerts and workbooks for specific needs.

Configuring Alerts

Set up alerts based on metrics or log queries to be notified of potential issues:

  1. Go to "Monitor" in the Azure portal, then "Alerts".
  2. Click "+ Create" > "Alert rule".
  3. Select the scope (your AKS cluster or Log Analytics workspace).
  4. Define the condition based on a metric (e.g., high CPU usage on nodes) or a log search query (e.g., frequent error logs from a specific application).
  5. Select or create an action group that defines who is notified and how (email, SMS, webhook), or which automated actions run.
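To make a metric-based condition concrete, here is a simplified Python model of how a static-threshold metric alert decides whether to fire: aggregate the samples in the evaluation window (an average here), then compare the result with the threshold. This is an illustration under stated assumptions, not Azure Monitor's implementation; the function name and CPU values are hypothetical, while the operator names mirror the values used in metric alert rule definitions.

```python
import operator
from statistics import mean

# Comparison operators named after the values used in metric alert rule definitions.
OPERATORS = {
    "GreaterThan": operator.gt,
    "GreaterThanOrEqual": operator.ge,
    "LessThan": operator.lt,
    "LessThanOrEqual": operator.le,
    "Equals": operator.eq,
}


def should_fire(samples, threshold, window_size, op="GreaterThan"):
    """Simplified static-threshold check (hypothetical helper).

    Averages the most recent `window_size` samples and compares the result
    with `threshold` using the named operator. Returns False when there is
    not yet enough data to fill the window.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if len(samples) < window_size:
        return False
    aggregated = mean(samples[-window_size:])
    return OPERATORS[op](aggregated, threshold)


# Hypothetical node CPU percentages, one sample per minute.
cpu_percent = [40, 55, 70, 85, 90, 95]
print(should_fire(cpu_percent, threshold=80, window_size=3))  # average 90 -> True
print(should_fire(cpu_percent, threshold=80, window_size=5))  # average 79 -> False
```

The second call shows why the window size matters: a longer window smooths out a short spike, which reduces noisy alerts but delays detection.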

Creating Custom Workbooks

Azure Workbooks allow you to create interactive reports and dashboards by combining text, metrics, logs, and parameters.

  • Access "Workbooks" from the Azure Monitor menu.
  • Create a new workbook or edit an existing one.
  • Add queries to fetch data from your Log Analytics workspace.
  • Visualize data using charts, grids, and other components.
  • Use parameters to make your workbooks interactive.
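Workbook queries refer to parameters by name in curly braces, for example {Namespace}, and the workbook substitutes the selected value before running the query. If you prototype such queries in code first, a helper like the hypothetical one below can fill the placeholders and escape string values. The query assumes the Container Insights KubePodInventory table with Namespace, Name, and ContainerRestartCount columns; verify these against your own workspace schema.

```python
import re

# Hypothetical query; {Namespace} plays the role of a workbook parameter.
# Table and column names assume the Container Insights KubePodInventory schema.
POD_RESTARTS_QUERY = """KubePodInventory
| where TimeGenerated > ago(1h)
| where Namespace == {Namespace}
| summarize Restarts = max(ContainerRestartCount) by Name
| order by Restarts desc"""


def kql_string(value):
    """Quote a Python string as a double-quoted KQL string literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def render_query(template, params):
    """Replace each {Name} placeholder with the escaped value of params[Name].

    Raises KeyError if the template uses a parameter that was not supplied.
    """
    return re.sub(r"\{(\w+)\}", lambda m: kql_string(params[m.group(1)]), template)


print(render_query(POD_RESTARTS_QUERY, {"Namespace": "kube-system"}))
```

Escaping the value, rather than pasting it in raw, keeps a stray quote in user input from changing the meaning of the query.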

Security Note: Be deliberate about what you collect and store in Log Analytics. Sanitize sensitive information such as secrets, tokens, and personal data before it is logged, or avoid collecting it at all.
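One way to sanitize data before it reaches Log Analytics is to redact it inside the application, before it is written to stdout/stderr where Container Insights picks it up. The Python sketch below uses a standard logging.Filter; the regular expressions are illustrative assumptions and will not catch every secret format, so tune them for what your workloads actually emit.

```python
import logging
import re

# Illustrative patterns only; extend them for the secret formats your apps emit.
REDACTION_PATTERNS = [
    # Bearer tokens in Authorization headers
    (re.compile(r"(?i)(authorization:\s*bearer\s+)[A-Za-z0-9._~+/=-]+"), r"\1[REDACTED]"),
    # key=value or key: value pairs with sensitive-looking keys
    (re.compile(r"(?i)\b(password|passwd|pwd|secret|api[_-]?key)(\s*[=:]\s*)\S+"), r"\1\2[REDACTED]"),
    # Email addresses
    (re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"), "[EMAIL]"),
]


def redact(text):
    """Apply every redaction pattern to text and return the result."""
    for pattern, replacement in REDACTION_PATTERNS:
        text = pattern.sub(replacement, text)
    return text


class RedactingFilter(logging.Filter):
    """Logging filter that redacts the fully formatted message of each record."""

    def filter(self, record):
        record.msg = redact(record.getMessage())
        record.args = ()  # arguments are already merged into msg
        return True


logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
for handler in logging.getLogger().handlers:
    handler.addFilter(RedactingFilter())

logging.getLogger("my-app").info("login for %s with password=%s", "alice@example.com", "hunter2")
# Written to stderr as: INFO login for [EMAIL] with password=[REDACTED]
```

Redacting the formatted message (rather than only the format string) matters because secrets usually arrive through the logging arguments.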

Log Collection Strategies

Effective log collection is vital for debugging and auditing. AKS supports various methods:

  • Container Insights: Automatically collects stdout/stderr from containers, along with Kubernetes events and node and pod inventory data.
  • Azure Monitor Agent (AMA): The recommended agent for collecting logs and metrics, and the agent Container Insights itself uses. On AKS it runs as a DaemonSet (ama-logs) and is configured through data collection rules (DCRs) to collect specific data (e.g., application logs from files).
  • Fluent Bit/Fluentd: Open-source log processors that can be deployed as a DaemonSet to collect logs and forward them to various destinations, including Azure Log Analytics.

Example: Collecting Application Logs with AMA

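You can configure an Azure Monitor Agent data collection rule (DCR) to collect specific application log files from your nodes. A DCR for text logs declares a custom stream, lists the files to read, names a destination workspace, and connects the two with a data flow. The stream and table name (Custom-MyAppLogs_CL) and the destination name (myWorkspace) below are placeholders; the custom table must already exist in the workspace.

Excerpt from an AMA data collection rule (DCR):

{
  "properties": {
    "streamDeclarations": {
      "Custom-MyAppLogs_CL": {
        "columns": [
          { "name": "TimeGenerated", "type": "datetime" },
          { "name": "RawData", "type": "string" }
        ]
      }
    },
    "dataSources": {
      "logFiles": [
        {
          "name": "app-logs",
          "streams": ["Custom-MyAppLogs_CL"],
          "filePatterns": ["/var/log/my-app/*.log"],
          "format": "text",
          "settings": {
            "text": { "recordStartTimestampFormat": "ISO 8601" }
          }
        }
      ]
    },
    "destinations": {
      "logAnalytics": [
        {
          "name": "myWorkspace",
          "workspaceResourceId": "/subscriptions/.../resourceGroups/.../providers/Microsoft.OperationalInsights/workspaces/..."
        }
      ]
    },
    "dataFlows": [
      {
        "streams": ["Custom-MyAppLogs_CL"],
        "destinations": ["myWorkspace"],
        "transformKql": "source",
        "outputStream": "Custom-MyAppLogs_CL"
      }
    ]
  }
}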

Best Practices for AKS Monitoring

  • Establish Baselines: Understand normal performance patterns to easily identify anomalies.
  • Monitor Key Metrics: Focus on CPU, memory, network I/O, disk I/O, pod restarts, and application-specific metrics.
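  • Centralize Logging: Send your AKS logs to a single Log Analytics workspace where practical, so you can query across clusters in one place; split workspaces only when access control or data-residency requirements demand it.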
  • Implement Health Checks: Use liveness and readiness probes for your applications.
  • Set Up Meaningful Alerts: Avoid alert fatigue by focusing on actionable alerts.
  • Regularly Review Dashboards: Proactively check the health and performance of your cluster.
  • Leverage Kubernetes Events: Monitor Kubernetes events for scheduling issues, image pulls, and other cluster-level activities.
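As a starting point for establishing baselines and setting meaningful alerts, the sketch below flags samples that deviate from a historical mean by more than k standard deviations (a simple z-score rule). The function names, the default k = 3, and the sample data are assumptions for illustration. For production alerts, Azure Monitor metric alerts also offer dynamic thresholds, which learn a baseline from historical data for you.

```python
from statistics import mean, stdev


def build_baseline(history):
    """Return (mean, sample standard deviation) of historical metric values."""
    if len(history) < 2:
        raise ValueError("need at least two samples to estimate a baseline")
    return mean(history), stdev(history)


def find_anomalies(history, recent, k=3.0):
    """Return (index, value) pairs for recent values that lie more than
    k standard deviations away from the historical mean."""
    mu, sigma = build_baseline(history)
    anomalies = []
    for i, value in enumerate(recent):
        if sigma == 0:
            # Flat history: any change from the constant value is unusual.
            is_anomaly = value != mu
        else:
            is_anomaly = abs(value - mu) > k * sigma
        if is_anomaly:
            anomalies.append((i, value))
    return anomalies


# Hypothetical node CPU percentages: historical samples, then recent readings.
history = [40, 42, 38, 41, 39, 40]
recent = [41, 45, 70, 39]
print(find_anomalies(history, recent))  # [(1, 45), (2, 70)]
```

Raising k trades sensitivity for fewer alerts, which is one practical lever against alert fatigue.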