Introduction to Troubleshooting Virtual WAN
Azure Virtual WAN provides a global network architecture that allows you to connect multiple Azure regions and on-premises sites. Troubleshooting issues within this complex environment requires a systematic approach. This document outlines common problems and their solutions.
Common Connectivity Issues
Connectivity problems are the most frequent challenges encountered. They can stem from misconfigurations at various points in the network path.
1. No Connectivity Between Hub and Spoke Networks
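- Verify that a hub virtual network connection exists for each spoke virtual network and that it reports a healthy state. In Virtual WAN this connection takes the place of manually configured VNet peering between hub and spoke.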
- Ensure that network security groups (NSGs) associated with subnets in both hub and spoke VNets allow traffic.
- Check the effective routes on the VM's network interface in the spoke VNet to confirm that routes to other spokes or on-premises are present and correct.
2. No Connectivity to On-Premises Sites
- Confirm the VPN device configuration at the on-premises location matches the Virtual WAN VPN site configuration (IPSec parameters, pre-shared keys, etc.).
- Check the status of the VPN tunnel in the Azure portal. If it's down, review logs on both Azure and the on-premises VPN device.
- Ensure that the on-premises firewall allows traffic to and from the Azure Virtual WAN VPN endpoint IP addresses.
- Verify that the on-premises network has routes pointing to the Azure Virtual WAN hub's address space.
VPN Gateway Troubleshooting
Site-to-Site VPN Connectivity
If your S2S VPN tunnel isn't establishing or is intermittently dropping:
- Tunnel Status: Check the tunnel status in the Azure portal. Look for error messages or codes.
- IPSec/IKE Parameters: Ensure that the IPSec and IKE phase 1 and phase 2 parameters (encryption, integrity, DH group, lifetime) are identical on both Azure and your on-premises VPN device.
- Pre-Shared Key: Double-check that the pre-shared key (PSK) is correct on both ends.
- NAT Traversal: If your on-premises VPN device is behind a NAT, ensure NAT-T is enabled and configured correctly.
- Public IP Addresses: Verify that the correct public IP addresses for your on-premises VPN device and the Azure VPN gateway are configured.
Example of common mismatches:
# Phase 1 (IKE) Mismatches
# Azure: AES256, SHA256, DHGroup14
# On-Prem: AES128, SHA1, DHGroup2
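A comparison like the one above can be automated once both proposals are written down. The following is a minimal Python sketch, not an Azure API call: the dictionary keys (`encryption`, `integrity`, `dh_group`) and the function name `diff_proposals` are hypothetical, and the values are the ones from the example above, entered by hand.

```python
def diff_proposals(azure: dict, on_prem: dict) -> dict:
    """Return {parameter: (azure_value, on_prem_value)} for every mismatch."""
    keys = sorted(set(azure) | set(on_prem))
    return {
        key: (azure.get(key), on_prem.get(key))
        for key in keys
        if azure.get(key) != on_prem.get(key)
    }


# Values copied by hand from each side's configuration (hypothetical key names).
azure_phase1 = {"encryption": "AES256", "integrity": "SHA256", "dh_group": "DHGroup14"}
on_prem_phase1 = {"encryption": "AES128", "integrity": "SHA1", "dh_group": "DHGroup2"}

mismatches = diff_proposals(azure_phase1, on_prem_phase1)
for name, (azure_value, on_prem_value) in mismatches.items():
    print(f"{name}: Azure={azure_value}, on-prem={on_prem_value}")
```

A parameter that is missing on one side shows up as `None`, which helps catch lifetimes or DH groups that were left at a vendor default.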
Point-to-Site (P2S) VPN Connectivity
For P2S issues:
- Client Configuration: Ensure the P2S VPN client profile is downloaded correctly and installed on user machines.
- Certificates: Verify that the root certificate and the client certificate (if using certificate-based authentication) are correctly installed on the client machine.
- Azure Portal Status: Check the P2S connection status for clients in the Azure portal.
- Firewall Rules: Confirm that local firewalls on client machines are not blocking the VPN connection.
VPN Performance Issues
- Bandwidth: Ensure the VPN gateway SKU is appropriately sized for your throughput requirements.
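- MTU: Mismatched MTU values can cause fragmentation, dropped packets, and performance degradation. If large transfers stall while small ones succeed, consider clamping the TCP MSS on the on-premises device so that encapsulated packets fit within the path MTU.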
- Network Path: Investigate latency and packet loss on the underlying internet path.
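The arithmetic behind MSS clamping can be sketched in Python as follows. This is an illustration under stated assumptions: the 80-byte tunnel overhead is a placeholder, because real IPsec overhead depends on the cipher, integrity algorithm, and whether NAT-T is in use. The function name `clamped_mss` is hypothetical. Microsoft's guidance for VPN devices has suggested clamping at 1350 bytes, but check the current documentation for your gateway and device.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
TCP_HEADER_BYTES = 20   # TCP header without options


def clamped_mss(path_mtu: int, tunnel_overhead: int) -> int:
    """Largest TCP payload that still fits in one packet after encapsulation."""
    mss = path_mtu - tunnel_overhead - IPV4_HEADER_BYTES - TCP_HEADER_BYTES
    if mss <= 0:
        raise ValueError("tunnel overhead leaves no room for TCP payload")
    return mss


# Assumed values: 1500-byte internet path, 80 bytes of tunnel overhead.
print(clamped_mss(path_mtu=1500, tunnel_overhead=80))  # 1380
```

Choosing a value somewhat below the computed result leaves headroom for paths whose MTU is smaller than expected.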
ExpressRoute Troubleshooting
General ExpressRoute Problems
- Link Status: Check the ExpressRoute circuit in the Azure portal. The circuit status should be 'Enabled' and the provider status should be 'Provisioned'.
- Provider Connectivity: Confirm that your connectivity provider has established the physical connection and that the peering status with Azure is active.
- VLAN/Encap: Ensure the correct VLAN and encapsulation type (e.g., QinQ) are configured with your provider.
ExpressRoute Peering Problems
- Azure Private Peering: Verify that the VNet and ExpressRoute circuit are in the same location and that the BGP AS numbers and IP addresses are correctly configured.
- Microsoft Peering: Ensure that the correct Microsoft peering IP prefixes are advertised and that you are attempting to reach public Azure services.
Routing Issues
Routing is critical for traffic flow between connected networks.
Route Propagation
- Effective Routes: Use the "Effective routes" feature on a VM's NIC to diagnose routing. Check for routes to destinations you expect to reach.
- Connection Propagation: In the Virtual WAN hub, ensure that routes from connected VNets and VPN/ExpressRoute connections are propagating correctly to the hub's routing tables.
- VNet Route Propagation: Ensure that route propagation is enabled for connected VNets in the hub settings.
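When reading an effective routes table, the route that applies to a destination is the one with the longest matching prefix. The Python sketch below shows that selection for a small, hypothetical route list. The prefixes are invented for illustration, and the function `select_route` is not part of any Azure SDK. It does not model how Azure breaks ties between routes with the same prefix length.

```python
import ipaddress


def select_route(routes, destination):
    """Return the route whose prefix is the longest match, or None."""
    dest = ipaddress.ip_address(destination)
    matches = [r for r in routes if dest in ipaddress.ip_network(r["prefix"])]
    if not matches:
        return None
    return max(matches, key=lambda r: ipaddress.ip_network(r["prefix"]).prefixlen)


# Hypothetical effective routes, as they might be copied from a NIC.
routes = [
    {"prefix": "10.1.0.0/16", "next_hop_type": "VnetLocal"},
    {"prefix": "10.0.0.0/8", "next_hop_type": "VirtualNetworkGateway"},
    {"prefix": "0.0.0.0/0", "next_hop_type": "Internet"},
]

for target in ("10.1.2.3", "10.9.9.9", "8.8.8.8"):
    route = select_route(routes, target)
    print(target, "->", route["prefix"], route["next_hop_type"])
```

If the selected route for an on-premises or spoke destination is the default route, the more specific route you expected has probably not been propagated.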
Route Table Analysis
- Hub Routing Tables: Examine the routing tables within the Virtual WAN hub. Understand which routes are present and where they are directing traffic.
- Azure Route Server: If using Azure Route Server for routing between VNets and NVAs, verify its configuration and its BGP peering with your NVAs.
BGP Issues
- BGP Peering Status: Check the BGP status in the Azure portal for VPN or ExpressRoute connections. Ensure peers are established.
- ASN Mismatches: Verify that the Autonomous System Numbers (ASNs) are correctly configured on both ends of the BGP session.
- IP Address Configuration: Ensure that the BGP peering IP addresses are correctly assigned and routable.
- Route Advertisements: Confirm that the correct prefixes are being advertised from your on-premises networks or spokes.
Example BGP troubleshooting command (on-prem device):
show ip bgp summary
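ASN and peer IP mismatches are easier to spot when both sides' settings are laid out next to each other: each side's remote values must mirror the other side's local values. The Python sketch below checks that. The `BgpSide` class, the function name, and the sample addresses and on-premises ASN are all hypothetical, and the values would be entered by hand from each side's configuration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BgpSide:
    local_asn: int
    local_ip: str
    remote_asn: int
    remote_ip: str


def find_bgp_mismatches(azure: BgpSide, on_prem: BgpSide) -> list:
    """List every setting where one side's view disagrees with the other's."""
    problems = []
    if azure.remote_asn != on_prem.local_asn:
        problems.append(f"Azure expects on-premises ASN {azure.remote_asn}, "
                        f"but the device uses {on_prem.local_asn}")
    if on_prem.remote_asn != azure.local_asn:
        problems.append(f"on-premises expects Azure ASN {on_prem.remote_asn}, "
                        f"but Azure uses {azure.local_asn}")
    if azure.remote_ip != on_prem.local_ip:
        problems.append("Azure's peer IP does not match the device's BGP IP")
    if on_prem.remote_ip != azure.local_ip:
        problems.append("the device's peer IP does not match Azure's BGP IP")
    return problems


# Hypothetical values for illustration only.
azure = BgpSide(local_asn=65515, local_ip="10.0.0.12",
                remote_asn=65010, remote_ip="192.168.1.1")
on_prem = BgpSide(local_asn=65010, local_ip="192.168.1.1",
                  remote_asn=65515, remote_ip="10.0.0.12")

print(find_bgp_mismatches(azure, on_prem))  # []
print(find_bgp_mismatches(azure, replace(on_prem, remote_asn=65514)))
```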
Firewall and Security Issues
Azure Firewall Policy
If traffic is being blocked unexpectedly by Azure Firewall deployed in the hub:
- Network Rules: Review network rules to ensure they permit the required traffic (protocol, port, source/destination IP).
- Application Rules: Check application rules for FQDNs and protocols if you are inspecting application-layer traffic.
- DNAT Rules: Verify DNAT rules if you are redirecting inbound traffic to internal servers.
- Threat Intelligence: Temporarily disable threat intelligence filtering to see if it resolves the issue, then re-enable with specific exceptions if needed.
Network Security Group (NSG) Rules
- Inbound/Outbound Rules: Carefully examine NSG rules applied to subnets in your hub and spoke VNets. Ensure they allow necessary communication and deny unwanted traffic.
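- Priority: NSG rules are evaluated in priority order, lowest number first, and evaluation stops at the first rule that matches. A rule with a lower priority number therefore takes precedence over a rule with a higher number.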
- Deny Rules: Pay close attention to explicit deny rules.
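First-match evaluation by priority can be sketched in a few lines of Python. This is a deliberately simplified model, not the NSG engine: it considers only inbound traffic, a source CIDR, and a single destination port, and the `Rule` class and rule names are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int             # lower number is evaluated first
    access: str               # "Allow" or "Deny"
    source: str               # source CIDR
    dest_port: Optional[int]  # None means any port


def evaluate(rules, source_ip: str, dest_port: int) -> Optional[Rule]:
    """Return the first rule, in priority order, that matches the traffic."""
    src = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        in_source = src in ipaddress.ip_network(rule.source)
        port_ok = rule.dest_port is None or rule.dest_port == dest_port
        if in_source and port_ok:
            return rule
    return None


rules = [
    Rule("allow-ssh-from-onprem", 200, "Allow", "192.168.0.0/16", 22),
    Rule("deny-branch-subnet", 100, "Deny", "192.168.10.0/24", None),
    Rule("deny-all-inbound", 4096, "Deny", "0.0.0.0/0", None),
]

for ip, port in (("192.168.10.5", 22), ("192.168.20.5", 22), ("192.168.20.5", 443)):
    hit = evaluate(rules, ip, port)
    print(ip, port, "->", hit.name, hit.access)
```

In this example the deny rule at priority 100 blocks SSH from 192.168.10.5 even though the allow rule at priority 200 also matches, which is the pattern to look for when an explicit deny unexpectedly wins.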
Leveraging Monitoring Tools
Azure provides several tools to help diagnose issues:
- Azure Network Watcher: Use Connection Troubleshoot, IP Flow Verify, Packet Capture, and Next Hop features to analyze connectivity and routing.
- Azure Monitor: Collect and analyze logs from Virtual WAN, VPN Gateways, and Azure Firewall. Set up alerts for critical events.
- Diagnostics Settings: Configure diagnostics settings on Virtual WAN resources to send logs to Log Analytics, Storage Accounts, or Event Hubs for deeper analysis.
- VPN Diagnostics (for VPN Gateway): Access specific diagnostic tools for VPN Gateway issues within the Azure portal.
Common Error Codes and Meanings
Understanding common error codes can speed up your investigation:
- ERR_IPSEC_TUNNEL_IS_DOWN: Indicates a problem with the IPSec tunnel establishment. Check crypto parameters, PSK, and network connectivity.
- ERR_CONNECTION_TIMED_OUT: Often points to network reachability issues or firewall blocking.
- ERR_BGP_PEER_DOWN: BGP session has dropped. Investigate routing, ASN, and IP address configuration for BGP peers.
- ERR_PACKET_LOSS: High packet loss on the network path. Investigate the underlying network infrastructure.