Troubleshooting Azure Virtual Machines

Virtual Machine Connectivity Issues

This section covers common problems related to connecting to your Azure Virtual Machines, including RDP, SSH, and network access.

Common Causes

Connectivity problems can arise from several sources:

  • Incorrect Network Security Group (NSG) rules.
  • Firewall configurations within the VM's OS.
  • Network routing or subnet misconfigurations.
  • VM status (e.g., stopped, deallocated).
  • Public IP address or DNS resolution issues.

Troubleshooting Steps

  1. Verify VM Status: Ensure the VM is running and not deallocated in the Azure portal.
  2. Check Network Security Groups (NSGs): Review the NSG rules associated with both the VM's network interface and its subnet. Inbound traffic on the required port (e.g., RDP 3389, SSH 22) must be allowed from your IP address or range by every NSG in the path.
  3. Inspect OS Firewall: If you can reach the VM by another method, such as the Serial Console, log in and check its internal firewall settings. Ensure the necessary ports are open.
  4. Validate Public IP and DNS: Confirm the VM has a public IP address assigned and that DNS records are resolving correctly.
  5. Use Azure Network Watcher: Use tools such as Connection Troubleshoot and IP Flow Verify to diagnose connectivity path issues.
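
When reviewing NSG rules in step 2, it helps to remember that rules are processed in priority order, lowest number first, and processing stops at the first match. The following is a simplified model of that behavior for a single NSG, not Azure's implementation. The Rule class, the rule names, and the addresses are hypothetical, and real rules also match on protocol, destination, service tags, and port ranges.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int          # lower number = evaluated first
    source: str            # CIDR prefix, or "*" for any source
    port: Optional[int]    # destination port, or None for any port
    allow: bool


def evaluate_inbound(rules, source_ip, dest_port):
    """Return (allowed, rule_name) for the first matching rule by priority."""
    for rule in sorted(rules, key=lambda r: r.priority):
        source_ok = rule.source == "*" or ip_address(source_ip) in ip_network(rule.source)
        port_ok = rule.port is None or rule.port == dest_port
        if source_ok and port_ok:
            return rule.allow, rule.name
    # No rule matched at all; this model treats that as denied.
    return False, "no-match"


# Hypothetical rule set: one custom allow rule plus a stand-in for the
# default deny-all inbound rule at priority 65500.
rules = [
    Rule("allow-ssh-office", 300, "203.0.113.0/24", 22, True),
    Rule("deny-all-inbound", 65500, "*", None, False),
]

print(evaluate_inbound(rules, "203.0.113.10", 22))  # (True, 'allow-ssh-office')
print(evaluate_inbound(rules, "198.51.100.7", 22))  # (False, 'deny-all-inbound')
```

The second call shows the usual failure mode: the client's address falls outside the allowed source range, so the request falls through to the deny rule.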

Specific Scenarios

If you can't RDP, consider the following:

  • Ensure port 3389 is open in NSGs and the Windows Firewall.
  • Check if the Remote Desktop service is running on the VM.
  • Verify that the port is reachable using Test-NetConnection with -Port 3389 (PowerShell) from another VM in the same VNet.
  • Use the VM Boot Diagnostics and Serial Console to access the OS and check logs or services.
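
Where PowerShell is not available on the test machine, a plain TCP connection attempt gives a similar reachability signal. The sketch below uses only the Python standard library. The function name tcp_port_open is an illustrative choice, and a successful connection shows only that the port accepts TCP connections, not that the Remote Desktop service behind it is healthy.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, calling tcp_port_open("10.0.0.4", 3389) from another VM in the same VNet, where 10.0.0.4 is a placeholder for the target VM's private IP, returns False if an NSG or the OS firewall blocks the port.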

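You can also try redeploying the VM, which moves it to a new Azure host. Data on the temporary disk is lost during a redeploy, and dynamic IP addresses may be updated.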

For SSH issues:

  • Confirm port 22 is permitted by NSGs and the Linux firewall (iptables, firewalld).
  • Check the SSH daemon (sshd) status on the VM.
  • Ensure your SSH keys are correctly configured on both the client and server.
  • Use the VM Boot Diagnostics and Serial Console to examine SSH logs (e.g., /var/log/auth.log on Debian or Ubuntu, /var/log/secure on RHEL-based distributions).
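
One client-side key problem is easy to check locally: the OpenSSH client refuses to use a private key file that group or other users can access. The sketch below tests for that condition on a POSIX client. The function name is illustrative, and the check is not meaningful on Windows, where file modes work differently.

```python
import os
import stat


def key_permissions_too_open(path):
    """Return True if group or others have any access to the file at path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # 0o077 masks the group and other permission bits.
    return bool(mode & 0o077)
```

If the function returns True for your private key, restricting the file to its owner (for example with chmod 600) typically resolves the client-side refusal.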

Performance Bottlenecks

Diagnosing and resolving performance issues like slow response times or high resource utilization.

Identifying Bottlenecks

  1. Monitor Azure Metrics: Use Azure Monitor to track CPU utilization, memory usage, disk I/O (IOPS, throughput), and network traffic for your VM.
  2. Analyze OS Performance Counters: Inside the VM, use tools like Task Manager (Windows) or top/htop (Linux) to identify processes consuming excessive resources.
  3. Review Disk Performance: Check disk queue length and latency. Sustained high values often indicate that the disk has reached its I/O limits. Consider upgrading to a faster disk type (e.g., Premium SSDs, Ultra Disks) or provisioning more IOPS or throughput. Also check the VM size's own disk limits, which cap performance regardless of disk type.
  4. Assess Network Throughput: Monitor inbound and outbound data transfer. If the VM is reaching its bandwidth limit, consider a VM size with higher network bandwidth or review the VNet architecture.
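
A single spike in a metric rarely signals a bottleneck; sustained readings near a limit do. The sketch below shows one way to make that distinction once metric samples have been exported. The function name, the 90% threshold, the 50% fraction, and all sample values are assumptions chosen for illustration, not Azure defaults.

```python
def sustained_saturation(samples, limit, threshold=0.9, min_fraction=0.5):
    """Return True if at least min_fraction of samples reach threshold * limit."""
    if not samples:
        return False
    hot = sum(1 for s in samples if s >= threshold * limit)
    return hot / len(samples) >= min_fraction


# Hypothetical samples: disk IOPS against a 5000 IOPS limit,
# and CPU percentage against 100.
disk_iops = [4800, 4950, 5000, 4990, 3100, 5000]
cpu_percent = [35, 42, 38, 95, 40, 37]

print(sustained_saturation(disk_iops, limit=5000))   # True
print(sustained_saturation(cpu_percent, limit=100))  # False
```

Here the disk sits near its limit in five of six samples, which points at storage, while the one CPU spike is ignored.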

Common Solutions

Boot and Operating System Errors

Resolving issues that prevent your VM from booting correctly or cause OS-level failures.

Using Boot Diagnostics

Azure's Boot Diagnostics feature captures the VM's serial log output and screenshots of its console, which makes it crucial for diagnosing startup failures.

Common Boot Issues and Solutions

A VM that hangs or fails during startup often has a corrupted system file or a driver issue. Use the Serial Console to access the OS and review boot logs, or attempt to revert recent changes (e.g., driver updates, application installations).

A VM can fail to find a bootable operating system if the OS disk is corrupted or detached. You may need to attach the OS disk to another VM to inspect or repair it, or recreate the VM from a snapshot or backup.

Stop errors (Windows) and kernel panics (Linux) indicate critical OS failures. Examine the dump files (Windows) or kernel logs (Linux) using the Serial Console to identify the cause (e.g., faulty driver, hardware incompatibility).

Disk and Storage Problems

Troubleshooting issues related to Azure Managed Disks, including performance, errors, and data corruption.

Disk Performance Degradation

If your VM disks are slow:

Disk Errors and Unresponsiveness

If disks become unresponsive or report errors:

Managed Disks vs. Unmanaged Disks

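Managed Disks are the recommended approach. Azure manages the underlying storage for you, which simplifies management and improves availability compared to unmanaged disks, where you maintain the storage accounts that hold the VHD files yourself.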

Resource Management and Quotas

Understanding and resolving issues related to Azure resource quotas and VM deployment limits.

Common Quota Issues

Requesting Quota Increases

If you hit a quota limit, you can request an increase through the Azure portal:

  1. Navigate to "Subscriptions" in the Azure portal.
  2. Select your subscription.
  3. Under "Settings", choose "Usage + quotas".
  4. Find the relevant quota, click "Request increase", and fill out the form.
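
Before filling out the form, it can help to work out how large the shortfall is. The arithmetic below is a minimal sketch for a vCPU quota. The function name and all numbers are hypothetical, and because vCPU quotas apply per region both in total and per VM family, the same check may need to be run against each limit.

```python
def vcpus_short(current_usage, limit, vm_count, vcpus_per_vm):
    """Return how many vCPUs the request exceeds the quota by (0 if it fits)."""
    requested = vm_count * vcpus_per_vm
    return max(0, current_usage + requested - limit)


# Hypothetical case: 88 of 100 vCPUs in use, deploying 4 VMs with 4 vCPUs each.
print(vcpus_short(88, 100, vm_count=4, vcpus_per_vm=4))  # 4
```

A result of 4 means the new limit must be at least 104 for this deployment to succeed.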

Resource Naming and Tagging

Consistent naming conventions and tagging are vital for managing resources, especially in large deployments. This helps in identifying and troubleshooting specific VMs or groups of VMs.
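
A convention is only useful if it is checked. The sketch below audits an exported inventory against a naming pattern and a set of required tags. The pattern vm-<environment>-<workload>-<two digits>, the required tag names, and the inventory format are all hypothetical examples, not Azure requirements.

```python
import re

# Hypothetical convention: vm-<env>-<workload>-<nn>, e.g. vm-prod-web-01
NAME_PATTERN = re.compile(r"^vm-(dev|test|prod)-[a-z0-9]+-\d{2}$")
# Hypothetical tagging policy
REQUIRED_TAGS = {"owner", "environment"}


def audit(resources):
    """Return {name: [problems]} for resources that break the convention."""
    findings = {}
    for res in resources:
        problems = []
        if not NAME_PATTERN.match(res["name"]):
            problems.append("name does not match convention")
        missing = REQUIRED_TAGS - set(res.get("tags", {}))
        if missing:
            problems.append("missing tags: " + ", ".join(sorted(missing)))
        if problems:
            findings[res["name"]] = problems
    return findings


inventory = [
    {"name": "vm-prod-web-01", "tags": {"owner": "team-a", "environment": "prod"}},
    {"name": "WebServer2", "tags": {"owner": "team-b"}},
]
print(audit(inventory))
```

Only WebServer2 is reported, with both a naming problem and a missing environment tag.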

Advanced Networking Errors

Deep dives into complex networking scenarios affecting VM connectivity and communication.

Virtual Network Peering Issues

If VMs in different VNets cannot communicate:

Load Balancer and Application Gateway Problems

When VMs behind a load balancer or application gateway are inaccessible:

DNS Resolution Failures

If VMs cannot resolve internal or external hostnames: