Troubleshoot Azure ExpressRoute Connectivity

This document provides guidance on troubleshooting common connectivity issues with Azure ExpressRoute. ExpressRoute provides private connections between your on-premises network and Microsoft Azure. When connectivity issues arise, a systematic approach helps you isolate the fault to a specific layer: the physical link, the circuit, BGP peering, routing, or traffic filtering.

Important: Before diving into troubleshooting, ensure you have reviewed the ExpressRoute overview and understand the basic architecture.

Common Issues and Solutions

1. No Connectivity / Intermittent Connectivity

This is one of the most common issues. It can manifest as a complete lack of traffic flow or sporadic interruptions.

Possible Causes:

  • Incorrect IP Addressing or Subnetting: Ensure that your on-premises network and Azure virtual networks are configured with non-overlapping IP address spaces.
  • BGP Peering Issues: BGP (Border Gateway Protocol) carries all routing information over ExpressRoute. Verify the BGP configuration on your edge routers and on the Azure side.
  • Access Control Lists (ACLs) or Network Security Groups (NSGs): Firewalls or ACLs on-premises, or NSGs in Azure, might be blocking traffic.
  • Physical Layer Issues: Check the status of your ExpressRoute circuit and the physical connectivity with your connectivity provider.
  • Route Advertisement Problems: Ensure that routes from your on-premises network are being advertised correctly to Azure and vice-versa.
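
The address-space check in the first cause above can be automated. The following is a minimal sketch using Python's standard ipaddress module. The prefixes are hypothetical placeholders rather than values from a real deployment, and find_overlaps is an illustrative helper name.

```python
import ipaddress
from itertools import product


def find_overlaps(on_prem_prefixes, vnet_prefixes):
    """Return (on_prem, vnet) pairs of prefixes whose address ranges overlap."""
    on_prem = [ipaddress.ip_network(p) for p in on_prem_prefixes]
    vnets = [ipaddress.ip_network(p) for p in vnet_prefixes]
    return [
        (str(a), str(b))
        for a, b in product(on_prem, vnets)
        if a.version == b.version and a.overlaps(b)
    ]


# Hypothetical address spaces, for illustration only.
on_prem_prefixes = ["10.1.0.0/16", "192.168.0.0/24"]
vnet_prefixes = ["10.1.128.0/20", "10.2.0.0/16"]

for on_prem, vnet in find_overlaps(on_prem_prefixes, vnet_prefixes):
    print(f"Overlap: on-premises {on_prem} <-> VNet {vnet}")
```

With the placeholder values, the script reports that on-premises 10.1.0.0/16 overlaps VNet 10.1.128.0/20. An empty result means the two sets of prefixes do not overlap.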

Troubleshooting Steps:

  1. Verify IP Address Spaces: Use tools like ipconfig (Windows) or ip addr / ifconfig (Linux) on your on-premises servers, and compare the results with the address space configured on your Azure VNets.
  2. Check BGP Status:

    On your edge router, check the BGP status and look for established peering sessions with Microsoft's edge routers. In the Azure portal, open the ExpressRoute circuit, select the relevant peering, and view its route table summary, which lists each BGP neighbor and its state.

    # Cisco IOS-style example; the exact command varies by vendor
    show ip bgp summary
  3. Inspect ACLs and NSGs: Review all firewall rules and NSGs applied to your subnets and virtual machines that might be impacting traffic flow.
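  4. Check ExpressRoute Circuit Status: In the Azure portal, check the status of your ExpressRoute circuit. The circuit status should be "Enabled" and the provider status should be "Provisioned".
  5. Monitor Route Tables: Use Azure CLI or PowerShell to inspect the effective routes for your VMs and the routes the circuit has learned over its peering. Check the routing tables on your on-premises routers separately to confirm that routes are being learned and advertised in both directions.
    # Effective routes on a VM's network interface (MyNic is a placeholder name)
    az network nic show-effective-route-table --resource-group MyResourceGroup --name MyNic
    # Routes learned by the circuit over private peering (MyCircuit is a placeholder name)
    az network express-route list-route-tables --resource-group MyResourceGroup --name MyCircuit --peering-name AzurePrivatePeering --path primary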
  6. Contact Connectivity Provider: If physical layer issues are suspected, engage your connectivity provider to check the status of the physical link.
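
When reading a route table in step 5, the route that applies to a destination is the one with the longest matching prefix. The sketch below illustrates that lookup with Python's standard ipaddress module. The route entries are hypothetical sample data, best_route is an illustrative helper name, and the sketch ignores how ties between routes with the same prefix are resolved.

```python
import ipaddress


def best_route(destination, routes):
    """Return the (prefix, next_hop_type) whose prefix is the longest match
    for the destination address, or None if no route covers it."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in routes
        if dest in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    network, hop = max(matches, key=lambda m: m[0].prefixlen)
    return str(network), hop


# Hypothetical effective routes for a VM (prefix, next hop type).
routes = [
    ("10.0.0.0/16", "VnetLocal"),
    ("192.168.0.0/16", "VirtualNetworkGateway"),  # learned from on-premises
    ("0.0.0.0/0", "Internet"),
]

print(best_route("192.168.10.5", routes))  # ('192.168.0.0/16', 'VirtualNetworkGateway')
print(best_route("172.16.1.1", routes))    # ('0.0.0.0/0', 'Internet')
```

In the second lookup, no on-premises prefix covers 172.16.1.1, so the traffic would follow the default route instead of the ExpressRoute path. A result like this usually points to a missing route advertisement.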

2. High Latency

High latency can impact application performance. It's often related to the physical path of the traffic or suboptimal routing.

Possible Causes:

  • Geographic Distance: The physical distance between your on-premises location and the Azure region.
  • Congestion: Network congestion within your on-premises network, your provider's network, or Microsoft's network.
  • Suboptimal Path Selection: BGP might be selecting a less optimal path.
  • Performance Bottlenecks: Issues with your network hardware or configurations.
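
Geographic distance sets a floor on latency that no configuration change can remove. As a rough sketch, the minimum round-trip time can be estimated from the fiber path length, assuming light travels through fiber at about 200,000 km/s. The distance and measured RTT below are hypothetical, and real fiber routes are typically longer than the straight-line distance.

```python
# Light travels through optical fiber at roughly two-thirds of its vacuum
# speed, about 200,000 km/s, which is 200 km per millisecond.
FIBER_KM_PER_MS = 200.0


def min_rtt_ms(path_km):
    """Lower bound on round-trip time over a fiber path of the given length."""
    return 2 * path_km / FIBER_KM_PER_MS


def excess_latency_ms(measured_rtt_ms, path_km):
    """Portion of a measured RTT not explained by propagation delay alone."""
    return measured_rtt_ms - min_rtt_ms(path_km)


# Hypothetical example: a 4,000 km fiber path with a measured RTT of 65 ms.
print(f"Propagation floor: {min_rtt_ms(4000):.1f} ms")               # 40.0 ms
print(f"Unexplained latency: {excess_latency_ms(65, 4000):.1f} ms")  # 25.0 ms
```

Only the unexplained portion is worth investigating as congestion, suboptimal routing, or a hardware bottleneck.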

Troubleshooting Steps:

  1. Perform Traceroutes: Run traceroutes from your on-premises machines to Azure VMs and vice-versa to identify where latency is introduced.
    # Linux/macOS example (use tracert on Windows); 10.0.0.4 is a placeholder for an Azure VM's private IP
    traceroute 10.0.0.4
  2. Check ExpressRoute Bandwidth: Ensure your ExpressRoute circuit is adequately sized for your traffic needs.
  3. Review BGP Attributes: Analyze BGP attributes such as AS_PATH and LOCAL_PREF on your edge routers to understand why a particular path was selected.
  4. Consult Connectivity Provider: Discuss potential network congestion or routing inefficiencies with your provider.
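
To make step 3 concrete, the sketch below models only the two attributes named there: a higher LOCAL_PREF wins, and if LOCAL_PREF is equal the shorter AS_PATH wins. Real BGP implementations apply further tie-breakers (for example vendor-specific weight, origin, and MED), so treat this as a simplified illustration. The Route class, link names, and ASNs are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Route:
    name: str            # label for the link or peer (hypothetical)
    local_pref: int      # LOCAL_PREF: higher is preferred
    as_path: List[int]   # AS_PATH: shorter is preferred


def preferred(routes):
    """Pick the route with the highest LOCAL_PREF, then the shortest AS_PATH."""
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path)))


# Two hypothetical routes to the same prefix. ASNs 64496 and 64497 are
# reserved for documentation use.
link_a = Route("link-a", local_pref=100, as_path=[64496])
link_b = Route("link-b", local_pref=100, as_path=[64497, 64497, 64497])

print(preferred([link_a, link_b]).name)  # link-a: equal LOCAL_PREF, shorter AS_PATH

# Raising LOCAL_PREF on link-b overrides its longer AS_PATH.
link_b.local_pref = 200
print(preferred([link_a, link_b]).name)  # link-b
```

If traffic takes the longer physical path, comparing these two attributes for the competing routes is usually the quickest way to see why.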

3. Bandwidth Underutilization

This issue occurs when your application does not achieve the expected throughput over a provisioned ExpressRoute circuit.

Possible Causes:

  • Application-Level Limitations: The application itself might be the bottleneck.
  • TCP Window Size: Incorrect TCP window sizing can limit throughput.
  • Packet Loss: Even small amounts of packet loss can significantly degrade TCP performance.
  • Under-provisioning: The circuit might be provisioned with insufficient bandwidth.
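
The TCP window and packet loss causes above can be quantified with two well-known estimates. A single TCP connection cannot send more than one window per round trip, and under random loss its steady-state rate is roughly bounded by the Mathis et al. approximation, (MSS / RTT) * (1.22 / sqrt(p)). The sketch below applies both. The function names and input values are hypothetical, and the results are rough per-connection ceilings, not predictions.

```python
import math


def window_limited_mbps(window_bytes, rtt_ms):
    """Ceiling for one TCP connection: one window of data per round trip."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1e6


def loss_limited_mbps(mss_bytes, rtt_ms, loss_rate):
    """Mathis et al. approximation: rate <= (MSS / RTT) * (1.22 / sqrt(p))."""
    return (mss_bytes * 8 / (rtt_ms / 1000)) * (1.22 / math.sqrt(loss_rate)) / 1e6


def required_window_bytes(bandwidth_mbps, rtt_ms):
    """Bandwidth-delay product: window needed to fill the link at this RTT."""
    return bandwidth_mbps * 1e6 * (rtt_ms / 1000) / 8


# Hypothetical path: 50 ms RTT, 64 KiB window without scaling, 0.01% loss.
print(f"{window_limited_mbps(65535, 50):.1f} Mbps")       # 10.5 Mbps
print(f"{loss_limited_mbps(1460, 50, 0.0001):.1f} Mbps")  # 28.5 Mbps
print(f"{required_window_bytes(1000, 50):,.0f} bytes")    # 6,250,000 bytes
```

In this example a single connection tops out near 10.5 Mbps regardless of circuit size, which is why a throughput test with one stream can look poor on a healthy circuit.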

Troubleshooting Steps:

  1. Test with iPerf: Use tools like iperf3 to test raw network throughput between your on-premises and Azure endpoints.
  2. Check TCP Settings: Ensure that TCP auto-tuning is enabled and appropriately configured on both ends.
  3. Monitor Packet Loss: Use ping and other network monitoring tools to detect packet loss.
  4. Review Circuit Speed: Confirm the provisioned speed of your ExpressRoute circuit.

Warning: Before making significant changes to your network configuration, especially routing or firewall rules, ensure you have a rollback plan and perform changes during scheduled maintenance windows.

Tools for Troubleshooting

  • Azure Network Watcher: Provides tools like connection troubleshoot, IP flow verify, next hop, and packet capture for diagnosing network issues within Azure.
  • Azure CLI / Azure PowerShell: For querying ExpressRoute circuit status, BGP peering, and route tables.
  • Connectivity Provider Tools: Your provider may offer specific diagnostic tools or reports.
  • Standard Network Utilities: ping, traceroute (tracert on Windows), netstat, tcpdump, Wireshark.

Escalation

If you have followed these steps and are still experiencing issues, you may need to engage Microsoft Support. Ensure you have gathered all relevant diagnostic information, including:

  • ExpressRoute circuit ID
  • Connectivity provider details
  • Specific symptoms (e.g., no connectivity, high latency)
  • Timestamps for when the issue started and for each subsequent occurrence
  • Results of your troubleshooting steps
  • Relevant configuration snippets (sanitized for sensitive information)