MSDN Documentation

Azure Networking Troubleshooting Guide

Introduction

Azure networking provides scalable infrastructure for cloud applications, but as in any complex system, problems can arise. This guide describes a systematic approach to identifying, diagnosing, and resolving common Azure networking problems.

Effective troubleshooting requires a systematic approach, a good understanding of Azure networking concepts, and familiarity with the available tools.

Common Issues

Connectivity Problems

Connectivity problems are the most frequent issues. They range from complete connection failures to intermittent packet loss.

  • Virtual Machine (VM) to VM connectivity: Ensuring that VMs in the same virtual network (VNet) or in different VNets can communicate.
  • On-premises to Azure connectivity: Troubleshooting VPN gateways, ExpressRoute connections, and hybrid network configurations.
  • Internet connectivity: Diagnosing issues with outbound or inbound traffic to/from the public internet.
  • Service endpoints and private endpoints: Verifying access to Azure PaaS services.

Performance Bottlenecks

Slow network performance can impact application responsiveness and user experience.

  • Throughput limitations: Identifying if network bandwidth is the limiting factor.
  • Latency: Diagnosing high latency between resources or to/from on-premises.
  • Network interface (NIC) limitations: Checking for saturated NICs or incorrect configurations.
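
Throughput and latency interact: a single TCP flow can have at most one window of data in flight per round trip, so high latency can cap throughput even when bandwidth is available. The following is a minimal sketch of that arithmetic. The function name and the sample window and round-trip values are illustrative assumptions, not measurements from any Azure resource.

```python
def tcp_throughput_ceiling_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on single-flow TCP throughput in Mbit/s.

    At most one window of data can be unacknowledged per round trip,
    so the ceiling is window size divided by round-trip time.
    """
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    bits_per_round_trip = window_bytes * 8
    return bits_per_round_trip / (rtt_ms / 1000) / 1_000_000


if __name__ == "__main__":
    # Hypothetical values: a 64 KiB window at several round-trip times.
    for rtt_ms in (1, 20, 80):
        ceiling = tcp_throughput_ceiling_mbps(65536, rtt_ms)
        print(f"RTT {rtt_ms:>2} ms: ceiling is about {ceiling:.1f} Mbit/s")
```

If measured throughput sits close to this ceiling, latency or window size is the more likely constraint than link bandwidth.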

Security Misconfigurations

Incorrectly configured network security groups (NSGs), firewalls, or routing can inadvertently block legitimate traffic or expose resources.

  • NSG rules blocking traffic: Overly restrictive inbound or outbound rules.
  • Azure Firewall issues: Policy misconfigurations or rules preventing desired connections.
  • NSG flow log analysis: Identifying denied connections or suspicious traffic patterns.
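
NSG rules are evaluated in priority order, lowest number first, and evaluation stops at the first rule that matches. A broad deny rule with a low priority number can therefore shadow a later allow rule. The sketch below models that behavior in a simplified form. The Rule class, the rule names, and the addresses are hypothetical, and the model deliberately ignores service tags, port ranges, and destination address prefixes.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int   # lower number is evaluated first
    source: str     # CIDR prefix, or "*" for any source
    dest_port: str  # single port, or "*" for any port
    protocol: str   # "Tcp", "Udp", or "*" for any protocol
    access: str     # "Allow" or "Deny"


def evaluate(rules, src_ip, dest_port, protocol):
    """Return the first matching rule in priority order, or None."""
    for rule in sorted(rules, key=lambda r: r.priority):
        src_ok = rule.source == "*" or ip_address(src_ip) in ip_network(rule.source)
        port_ok = rule.dest_port == "*" or int(rule.dest_port) == dest_port
        proto_ok = rule.protocol == "*" or rule.protocol.lower() == protocol.lower()
        if src_ok and port_ok and proto_ok:
            return rule
    return None


# Hypothetical inbound rule set for illustration only.
RULES = [
    Rule("deny-all-inbound", 4096, "*", "*", "*", "Deny"),
    Rule("allow-https-from-vnet", 100, "10.0.0.0/16", "443", "Tcp", "Allow"),
]

if __name__ == "__main__":
    for src in ("10.0.1.5", "203.0.113.9"):
        hit = evaluate(RULES, src, 443, "Tcp")
        print(f"{src} -> {hit.access} by {hit.name} (priority {hit.priority})")
```

Walking a failing flow through the rule list this way shows which rule decides the outcome, which is the same question IP Flow Verify answers against the live configuration.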

Routing Issues

Incorrect routing tables can cause traffic to be sent to the wrong destination or dropped.

  • User Defined Routes (UDRs): Misconfigured UDRs that unintentionally force traffic through a firewall or drop it (next hop type None).
  • Gateway routing: Problems with traffic flow through VPN or ExpressRoute gateways.
  • BGP peering issues: For hybrid connections, problems with BGP route advertisement and reception.
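
Route selection follows longest prefix match: the most specific route that contains the destination wins. When prefixes tie, user-defined routes are preferred over BGP routes, which are preferred over system routes. The sketch below models only that rule. The route table, the source labels ("user", "bgp", "system"), and the next hop strings are illustrative assumptions, and the documented exceptions to this ordering are not modeled.

```python
from ipaddress import ip_address, ip_network

# Tie-break order for routes with the same prefix length (lower wins).
SOURCE_PREFERENCE = {"user": 0, "bgp": 1, "system": 2}


def select_route(routes, destination):
    """Pick the route used for destination, or None if nothing matches."""
    dest = ip_address(destination)
    candidates = [r for r in routes if dest in ip_network(r["prefix"])]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda r: (
            -ip_network(r["prefix"]).prefixlen,  # longest prefix first
            SOURCE_PREFERENCE[r["source"]],      # then preferred source
        ),
    )


# Hypothetical effective routes for one subnet.
ROUTES = [
    {"prefix": "0.0.0.0/0", "source": "system", "next_hop": "Internet"},
    {"prefix": "10.0.0.0/16", "source": "system", "next_hop": "VnetLocal"},
    {"prefix": "0.0.0.0/0", "source": "user", "next_hop": "VirtualAppliance"},
    {"prefix": "10.0.2.0/24", "source": "user", "next_hop": "None"},
]

if __name__ == "__main__":
    for dest in ("10.0.1.7", "10.0.2.7", "8.8.8.8"):
        route = select_route(ROUTES, dest)
        print(f"{dest} -> {route['next_hop']} via {route['prefix']} ({route['source']})")
```

In this example table, the /24 user route with next hop None silently drops traffic to 10.0.2.0/24 even though the VNet system route covers the same addresses, which is the kind of misconfiguration described above.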

Troubleshooting Tools

Azure Network Watcher

Network Watcher is a powerful suite of tools for monitoring and troubleshooting Azure network resources. Key features include:

  • Connection Troubleshoot: Tests connectivity between two endpoints.
  • IP Flow Verify: Checks if traffic is allowed or denied by NSGs.
  • Next Hop: Determines the next hop for traffic from a VM.
  • Packet Capture: Captures network packets for deeper analysis.
  • Topology: Visualizes network topology.
  • Connection Monitor: Monitors network connectivity to and from Azure resources.

tcpdump & Wireshark

For in-depth packet analysis, these tools are invaluable.

You can use Network Watcher's Packet Capture feature to collect packet data from VMs and then download these captures for analysis with Wireshark.

Example capture command (Linux):

sudo tcpdump -i eth0 -w capture.pcap host 10.0.0.4 and port 80
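
Before opening a capture in Wireshark, it can help to confirm that the file is intact and actually contains packets. The sketch below reads the classic pcap layout (a 24-byte global header followed by 16-byte per-packet headers), which is what tcpdump -w normally writes. It is a sanity check under that assumption, not a replacement for Wireshark: it does not handle the pcapng format or nanosecond-resolution captures, and the function name is hypothetical.

```python
import struct


def summarize_pcap(data: bytes) -> dict:
    """Count packets in a classic (microsecond) pcap byte string."""
    if len(data) < 24:
        raise ValueError("too short to contain a pcap global header")
    magic = struct.unpack("<I", data[:4])[0]
    if magic == 0xA1B2C3D4:
        endian = "<"  # file written in little-endian byte order
    elif magic == 0xD4C3B2A1:
        endian = ">"  # file written in big-endian byte order
    else:
        raise ValueError("not a classic microsecond pcap file")
    # Global header: magic, version major/minor, timezone offset,
    # timestamp accuracy, snapshot length, link-layer type.
    _, major, minor, _, _, snaplen, linktype = struct.unpack(
        endian + "IHHiIII", data[:24]
    )
    offset, packets, captured = 24, 0, 0
    while offset + 16 <= len(data):
        # Per-packet header: seconds, microseconds, captured length,
        # original length on the wire.
        _, _, incl_len, _ = struct.unpack(endian + "IIII", data[offset:offset + 16])
        offset += 16 + incl_len
        if offset > len(data):
            break  # final packet is truncated; do not count it
        packets += 1
        captured += incl_len
    return {
        "version": (major, minor),
        "snaplen": snaplen,
        "linktype": linktype,
        "packets": packets,
        "captured_bytes": captured,
    }
```

For the capture written by the command above, this could be called as summarize_pcap(open("capture.pcap", "rb").read()). A packet count of zero usually means the filter expression or interface name did not match any traffic.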

Azure Monitor

Azure Monitor provides insights into your Azure resources. For networking, it's useful for:

  • Collecting metrics on network throughput, latency, and packet drops.
  • Analyzing Network Security Group Flow Logs to understand traffic patterns.
  • Setting up alerts for abnormal network behavior.
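
Flow log records can also be summarized offline to see which destinations are being denied most often. The sketch below assumes the comma-separated flow tuple layout used by version 2 NSG flow logs (timestamp, source IP, destination IP, source port, destination port, protocol, direction, decision, followed by flow state and counters); verify the field order against the current flow log schema before relying on it. The sample tuples and function names are hypothetical.

```python
from collections import Counter

# Leading fields of a flow tuple, in the assumed order.
FIELDS = (
    "timestamp", "src_ip", "dst_ip", "src_port", "dst_port",
    "protocol",   # "T" for TCP, "U" for UDP
    "direction",  # "I" for inbound, "O" for outbound
    "decision",   # "A" for allowed, "D" for denied
)


def parse_tuple(line: str) -> dict:
    """Split one flow tuple into named fields; extra fields are ignored."""
    parts = line.strip().split(",")
    if len(parts) < len(FIELDS):
        raise ValueError(f"expected at least {len(FIELDS)} fields: {line!r}")
    return dict(zip(FIELDS, parts))


def denied_by_destination(tuples) -> Counter:
    """Count denied flows per (destination IP, destination port)."""
    counts = Counter()
    for line in tuples:
        flow = parse_tuple(line)
        if flow["decision"] == "D":
            counts[(flow["dst_ip"], flow["dst_port"])] += 1
    return counts


# Hypothetical flow tuples for illustration only.
SAMPLE = [
    "1542110377,10.0.0.4,10.0.1.5,44931,443,T,O,A,B,,,,",
    "1542110402,203.0.113.9,10.0.0.4,51000,3389,T,I,D,B,,,,",
    "1542110437,203.0.113.9,10.0.0.4,51001,3389,T,I,D,B,,,,",
    "1542110440,198.51.100.7,10.0.0.4,40000,22,T,I,D,B,,,,",
]

if __name__ == "__main__":
    for (ip, port), count in denied_by_destination(SAMPLE).most_common():
        print(f"{count} denied flow(s) to {ip}:{port}")
```

A destination and port that appear repeatedly in the denied counts is a good starting point for reviewing the NSG rule that covers it.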

Azure CLI & PowerShell

These command-line tools are essential for querying resource configurations and performing basic network tests.

Example (Azure CLI) to check NSG rules:

az network nsg rule list --resource-group myResourceGroup --nsg-name myNsg --output table

Example (Azure PowerShell) to inspect a VM's network interface:

Get-AzNetworkInterface -ResourceGroupName "myResourceGroup" -Name "myNic"

Step-by-Step Troubleshooting Guide

  1. Define the Problem: Clearly understand what is not working, who is affected, and when the issue started.
  2. Gather Information: Collect relevant details such as source/destination IPs, ports, protocols, error messages, and timestamps.
  3. Check Basic Connectivity:
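    • Can the source VM ping the destination IP? A failed ping is not conclusive on its own, because ICMP may be blocked by an NSG or by the guest operating system's firewall.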
    • Can you SSH/RDP to the destination?
  4. Utilize Network Watcher:
    • Use Connection Troubleshoot to test basic reachability.
    • Use IP Flow Verify to check NSG rules.
    • Use Next Hop to understand routing.
  5. Review Network Security Groups (NSGs): Verify inbound and outbound rules on both source and destination NSGs.
  6. Examine Routing Tables: Check for any User Defined Routes (UDRs) that might be misdirecting traffic.
  7. Inspect Firewalls: If Azure Firewall or Network Virtual Appliances (NVAs) are in use, check their logs and policies.
  8. Analyze Network Watcher Packet Capture: If necessary, capture traffic and analyze with Wireshark.
  9. Check Azure Monitor: Review metrics and logs for any anomalies.
  10. Verify DNS Resolution: Ensure DNS is correctly resolving names to IPs.
  11. Consider PaaS Services: If connecting to Azure services, check their specific networking configurations (e.g., service endpoints, private endpoints).
  12. Test from Different Points: Try testing from an on-premises machine, another VM, or a different subnet to isolate the issue.
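
The basic connectivity and DNS checks in steps 3 and 10 can be scripted so that the same test runs identically from each vantage point in step 12. The following is a minimal sketch using only the Python standard library. The function names are hypothetical. Because it tests a TCP handshake rather than ICMP, it still gives an answer where ping is blocked.

```python
import socket
import time


def resolve(hostname: str) -> list:
    """Return the sorted IP addresses the local resolver gives for hostname."""
    try:
        infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []  # name did not resolve from this machine
    return sorted({info[4][0] for info in infos})


def tcp_check(host: str, port: int, timeout: float = 3.0):
    """Attempt a TCP handshake.

    Returns (True, seconds_taken) on success or (False, error_text) on
    failure. A timeout usually points to traffic being dropped (NSG,
    firewall, or routing); a refusal means the host was reached but
    nothing is listening on that port.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError as exc:
        return False, str(exc)
```

For example, tcp_check("10.0.0.4", 80) tests the host and port used in the tcpdump example earlier in this guide. Running resolve() on the same name from two different subnets and comparing the results can show whether the machines are using different DNS servers or zones.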

Advanced Topics

  • Network Virtual Appliances (NVAs): Troubleshooting third-party firewall, load balancer, or other network virtual appliances deployed in Azure.
  • ExpressRoute & VPN Gateway Performance: Deep dives into BGP, MTU, and throughput optimization.
  • Azure Load Balancer & Application Gateway: Diagnosing issues with load balancing rules, health probes, and session affinity.
  • Service Fabric & AKS Networking: Specific networking considerations for containerized and microservice applications.

Resources

Community Forums and Blogs:

Engage with the Azure community on Microsoft Q&A and relevant tech blogs for real-world scenarios and solutions.