Introduction to Azure Networking Troubleshooting
This guide provides steps and tools to help you diagnose and resolve common networking issues within your Azure environment. Effective troubleshooting relies on a systematic approach, understanding the Azure networking components, and utilizing the right diagnostic tools.
We'll cover a range of scenarios, from basic connectivity problems to complex routing and load balancing challenges.
Connection Issues
Problems connecting to or from Azure resources are frequent. Here's how to approach them:
Common Causes:
- Incorrect Network Security Group (NSG) or Application Security Group (ASG) rules.
- Misconfigured Azure Firewall rules or user-defined routes (UDRs).
- Virtual Network (VNet) peering or gateway configuration errors.
- Issues with VPN or ExpressRoute connections.
- Resource availability or health status.
Diagnostic Tools:
- Azure Network Watcher: Provides tools like Connection Troubleshoot, IP Flow Verify, and Network Security Group (NSG) Flow Logs.
- VM Boot Diagnostics: For issues with VM connectivity upon startup.
- Packet Capture: To capture and inspect network traffic to and from a VM.
- Azure CLI/PowerShell: For scripting checks and retrieving resource configurations.
Troubleshooting Steps:
- Verify NSG/ASG Rules: Use IP Flow Verify to check if traffic is allowed or denied by NSGs.
- Check Route Tables: Examine user-defined routes (UDRs) to ensure traffic is being routed as expected.
- Test Connectivity: Use Network Watcher's Connection Troubleshoot from source to destination.
- Examine Firewall Logs: If using Azure Firewall, check its logs for blocked traffic.
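Connection Troubleshoot runs from Network Watcher, but when you are logged in to the source VM, a raw TCP handshake test gives a quick first signal for a single port. The minimal sketch below uses only the Python standard library. `check_tcp` is a hypothetical helper, not part of any Azure tool, and it assumes the destination service listens on TCP.

```python
import socket
import time
from typing import Optional, Tuple


def check_tcp(host: str, port: int, timeout: float = 3.0) -> Tuple[bool, Optional[float]]:
    """Try a TCP handshake; return (reachable, connect time in ms or None)."""
    start = time.perf_counter()
    try:
        # create_connection resolves the host and completes the TCP three-way handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True, (time.perf_counter() - start) * 1000
    except OSError:
        # Refused connections, timeouts, and name-resolution failures are all OSError subclasses.
        return False, None
```

For example, `check_tcp("10.0.1.4", 443)` returns `(True, <milliseconds>)` when the handshake completes. A failure does not say which component dropped the traffic (NSG, firewall, route, or the service itself), so follow up with IP Flow Verify or Next Hop.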
Performance Problems
Slow network performance can impact application responsiveness. Investigate these areas:
Common Causes:
- Network latency between regions or on-premises.
- Bandwidth limitations.
- Suboptimal VNet peering configurations.
- Inefficient routing.
- Network congestion.
Diagnostic Tools:
- Network Watcher: Connection Monitor: For measuring network performance over time.
- Azure Monitor: Metrics: Track network interface metrics like Bytes received/sent and Packets received/sent.
- Performance Test Tools: Tools like iperf or NDT can be run from VMs.
Troubleshooting Steps:
- Measure Latency: Use `ping` or `traceroute` between endpoints.
- Analyze Bandwidth: Check VM network egress/ingress limits and VNet throughput.
- Optimize Routing: Ensure efficient routes are in place, especially for traffic traversing ExpressRoute or VPN.
- Check for Packet Loss: Use Connection Monitor or repeated pings to identify loss.
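When ICMP is blocked by an NSG or firewall, `ping` results can be misleading, and timing TCP handshakes to an open port is a common alternative. The sketch below uses hypothetical helpers `sample_connect_times` and `summarize` to collect repeated connect times and report loss percentage, min/avg/max, and jitter. A handshake that fails within the timeout counts as lost; because TCP retransmits SYN packets internally, this is a coarse proxy, not a per-packet loss measurement.

```python
import socket
import statistics
import time
from typing import List, Optional


def sample_connect_times(host: str, port: int, count: int = 10,
                         timeout: float = 2.0, interval: float = 0.5) -> List[Optional[float]]:
    """Return TCP connect times in ms; None marks an attempt that failed or timed out."""
    samples: List[Optional[float]] = []
    for _ in range(count):
        start = time.perf_counter()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                samples.append((time.perf_counter() - start) * 1000)
        except OSError:
            samples.append(None)
        time.sleep(interval)  # space out attempts so bursts do not skew the results
    return samples


def summarize(samples: List[Optional[float]]) -> dict:
    """Compute loss percentage and min/avg/max/jitter over the successful samples."""
    ok = [s for s in samples if s is not None]
    loss_pct = 100.0 * (len(samples) - len(ok)) / len(samples) if samples else 0.0
    if not ok:
        return {"loss_pct": loss_pct, "min": None, "avg": None, "max": None, "jitter": None}
    return {
        "loss_pct": loss_pct,
        "min": min(ok),
        "avg": statistics.mean(ok),
        "max": max(ok),
        # Population standard deviation of connect times as a simple jitter estimate.
        "jitter": statistics.pstdev(ok),
    }
```

Running `summarize(sample_connect_times("10.0.1.4", 443))` from both ends of a suspect path and comparing the results helps separate a consistently slow route from intermittent loss.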
Firewall and Network Security Group (NSG) Issues
NSGs and Azure Firewall are critical for network security. Incorrect configurations are a common source of connectivity problems.
Key Concepts:
- NSGs: Applied at the NIC or subnet level to filter traffic based on source/destination IP, port, and protocol.
- Azure Firewall: A managed cloud-based network security service that protects your Azure virtual network resources.
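- Order of Operations: Inbound traffic is evaluated by the subnet NSG and then the NIC NSG; outbound traffic is evaluated by the NIC NSG and then the subnet NSG. Azure Firewall filters traffic only when routing (typically a UDR) sends it through the firewall, as a separate hop: outbound traffic must first be allowed by the source's NSGs, and inbound traffic that passes the firewall must still be allowed by the destination's NSGs.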
Common Scenarios & Solutions:
- Blocked Inbound Traffic: Ensure NSG rules allow traffic on the required port and protocol from the correct source.
- Blocked Outbound Traffic: Check Azure Firewall rules and NSG outbound rules.
- Access to Internet Denied: Verify that outbound connectivity (for example, a NAT gateway or default outbound access) is configured correctly.
- Misconfigured Service Tags: Ensure you are using appropriate service tags (e.g., `Internet`, `AzureCloud`) in your rules.
Example NSG Rule (Allowing HTTPS):
Name: AllowHTTPS
Priority: 300
Source: Any
Source port ranges: *
Destination: Any
Destination port ranges: 443
Protocol: TCP
Action: Allow
Troubleshooting Steps:
- Use IP Flow Verify to determine if traffic is blocked by an NSG.
- Enable NSG Flow Logs to analyze traffic patterns and identify denied connections.
- Review Azure Firewall logs for any denied connections.
- Double-check the priority and direction of your rules.
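IP Flow Verify reports which rule allowed or denied a flow. To reason about the same logic offline, the simplified sketch below models NSG evaluation: rules for the traffic direction are checked in ascending priority order, the first match decides, and inbound traffic passes the subnet NSG before the NIC NSG (outbound is the reverse), with both required to allow it. The `Rule` class and both functions are hypothetical. The model handles only CIDR or `Any` addresses (no service tags or ASGs) and includes only the default deny rules, whereas real NSGs also have default allow rules for VNet and Azure Load Balancer traffic.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Rule:
    name: str
    priority: int      # lower number is evaluated first
    direction: str     # "Inbound" or "Outbound"
    source: str        # "Any" or a CIDR such as "10.0.0.0/16"
    destination: str   # "Any" or a CIDR
    dest_ports: str    # "*", a single port "443", or a range "8000-8080"
    protocol: str      # "Any", "TCP", or "UDP"
    action: str        # "Allow" or "Deny"


# Simplified: real NSGs also have default allow rules (VNet, AzureLoadBalancer, Internet outbound).
DEFAULT_DENY = {
    "Inbound": Rule("DenyAllInBound", 65500, "Inbound", "Any", "Any", "*", "Any", "Deny"),
    "Outbound": Rule("DenyAllOutBound", 65500, "Outbound", "Any", "Any", "*", "Any", "Deny"),
}


def _ip_match(spec: str, ip: str) -> bool:
    return spec in ("Any", "*") or ipaddress.ip_address(ip) in ipaddress.ip_network(spec)


def _port_match(spec: str, port: int) -> bool:
    if spec == "*":
        return True
    low, _, high = spec.partition("-")
    return int(low) <= port <= int(high or low)


def evaluate_nsg(rules: List[Rule], direction: str, src: str, dst: str,
                 port: int, protocol: str) -> Rule:
    """Return the rule that decides the flow: first match in ascending priority order."""
    for rule in sorted((r for r in rules if r.direction == direction), key=lambda r: r.priority):
        if (_ip_match(rule.source, src) and _ip_match(rule.destination, dst)
                and _port_match(rule.dest_ports, port) and rule.protocol in ("Any", protocol)):
            return rule
    return DEFAULT_DENY[direction]


def evaluate_path(subnet_nsg: Optional[List[Rule]], nic_nsg: Optional[List[Rule]],
                  direction: str, src: str, dst: str, port: int,
                  protocol: str) -> Tuple[str, Optional[str]]:
    """Inbound: subnet NSG then NIC NSG. Outbound: NIC NSG then subnet NSG. Both must allow."""
    order = [subnet_nsg, nic_nsg] if direction == "Inbound" else [nic_nsg, subnet_nsg]
    deciding: Optional[str] = None
    for nsg in order:
        if nsg is None:  # no NSG associated at this level, so nothing is filtered here
            continue
        rule = evaluate_nsg(nsg, direction, src, dst, port, protocol)
        if rule.action == "Deny":
            return "Deny", rule.name
        deciding = rule.name
    return "Allow", deciding
```

With the AllowHTTPS example rule above, `evaluate_nsg` returns the `AllowHTTPS` rule for TCP 443 and the default `DenyAllInBound` rule for TCP 80. Adding a deny rule with a lower priority number that covers port 443 overrides AllowHTTPS, which is the kind of priority mistake this step is meant to catch. Passing `None` for a level models a subnet or NIC with no NSG associated.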
DNS Resolution Issues
Problems resolving hostnames to IP addresses can prevent access to services.
Common Causes:
- Incorrect Azure DNS configuration.
- Issues with on-premises DNS servers forwarding to Azure.
- Firewall blocking DNS traffic (UDP/TCP port 53).
- Stale or corrupted DNS cache on the client or server.
Diagnostic Tools:
- `nslookup` / `dig` commands: Test DNS resolution from VMs.
- Azure DNS Analytics: In Azure Monitor, provides insights into DNS queries.
- Network Watcher: Packet Capture: To see DNS traffic.
Troubleshooting Steps:
- From a VM, run `nslookup example.com` to check resolution.
- Verify that VMs are configured to use the correct DNS servers (Azure-provided or custom).
- Ensure that DNS forwarding rules are correctly set up for hybrid environments.
- Check if DNS traffic is allowed by NSGs and firewalls.
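The resolution and DNS server checks can also be scripted from a Linux VM. The sketch below resolves names through the OS resolver via `socket.getaddrinfo` (so it honors the VM's configured DNS servers and hosts file) and parses `nameserver` lines from `/etc/resolv.conf`. `resolve` and `nameservers` are hypothetical helpers; `168.63.129.16` is the Azure-provided DNS virtual IP.

```python
import socket
from typing import List

AZURE_DNS = "168.63.129.16"  # Azure-provided DNS virtual IP


def resolve(hostname: str) -> List[str]:
    """Resolve a hostname with the OS resolver; return unique IP addresses ([] on failure)."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each result's sockaddr tuple starts with the IP address (IPv4 or IPv6).
    return sorted({info[4][0] for info in infos})


def nameservers(resolv_conf: str) -> List[str]:
    """Extract the servers listed on 'nameserver' lines of /etc/resolv.conf content."""
    servers = []
    for line in resolv_conf.splitlines():
        parts = line.split()
        # Comment lines start with '#' or ';', so they never match here.
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers
```

For example, `nameservers(open("/etc/resolv.conf").read())` lists the configured servers, and checking for `AZURE_DNS` shows whether Azure-provided DNS is in use. On distributions that use systemd-resolved, this file often shows only the local stub `127.0.0.53`; run `resolvectl status` to see the upstream servers.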
Routing Errors
Incorrect routes can send traffic to the wrong destinations or blackhole it.
Key Concepts:
- Route Tables: Associate with subnets to define custom routes.
- System Routes: Default routes provided by Azure.
- BGP Routes: For dynamic routing with VPN Gateways and ExpressRoute.
Diagnostic Tools:
- Network Watcher: Next Hop: Determines the next hop for traffic from a VM.
- Network Watcher: IP Flow Verify: Checks whether NSG rules allow or deny a specific flow, which helps separate security filtering from routing problems.
- Azure CLI/PowerShell: To view route tables and effective routes.
Troubleshooting Steps:
- Use Next Hop from the source VM to see where traffic is directed.
- Examine the effective routes on the VM's network interface.
- Verify that user-defined routes (UDRs) are correctly configured and associated with the correct subnets.
- If using BGP, check route propagation and peer status.
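Next Hop and the effective routes view apply Azure's route selection for you. As a simplified model of that selection, the sketch below picks the route with the longest matching prefix and, when prefixes tie, prefers user-defined routes over BGP routes over system routes. The `Route` class and `next_hop` function are hypothetical; the `source` values mirror those shown in effective route output (`User`, `VirtualNetworkGateway`, `Default`). It is a sketch, not a complete reimplementation of Azure routing.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional

# Lower value wins when two routes have the same prefix length.
SOURCE_PREFERENCE = {"User": 0, "VirtualNetworkGateway": 1, "Default": 2}


@dataclass
class Route:
    prefix: str                        # e.g. "10.0.0.0/16"
    next_hop_type: str                 # e.g. "VnetLocal", "Internet", "VirtualAppliance", "None"
    source: str                        # "User", "VirtualNetworkGateway" (BGP), or "Default" (system)
    next_hop_ip: Optional[str] = None  # set for VirtualAppliance routes


def next_hop(routes: List[Route], destination: str) -> Optional[Route]:
    """Return the route selected for destination, or None if no route matches."""
    dest = ipaddress.ip_address(destination)
    matches = [r for r in routes if dest in ipaddress.ip_network(r.prefix)]
    if not matches:
        return None
    # Longest prefix first; on a tie, the more preferred route source.
    return max(matches, key=lambda r: (ipaddress.ip_network(r.prefix).prefixlen,
                                       -SOURCE_PREFERENCE[r.source]))
```

Effective routes can be exported with `az network nic show-effective-route-table` and fed into a model like this. A route whose next hop type is `None` drops matching traffic, which is one way traffic ends up blackholed.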
Load Balancer Issues
Problems with Azure Load Balancer or Application Gateway can lead to unavailable applications.
Common Causes:
- Unhealthy backend pool members.
- Incorrect load balancing rules or probes.
- NSG/Firewall blocking health probe traffic.
- Configuration errors in the load balancer itself.
Diagnostic Tools:
- Azure Load Balancer Metrics: Track health probe status, backend health, etc.
- Azure Monitor: Application Gateway Metrics and Logs.
- `tcpping` utility: Can be used from VMs to test port accessibility.
Troubleshooting Steps:
- Check the health status of backend pool members in the load balancer or Application Gateway configuration.
- Verify that health probe endpoints are accessible and responding correctly.
- Ensure NSG rules allow health probe traffic from 168.63.129.16 (the `AzureLoadBalancer` service tag) to the backend servers on the probe port.
- Test direct connectivity to a backend VM to rule out other issues.
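Azure Load Balancer HTTP and HTTPS probes treat only an HTTP 200 response as healthy, so a probe path that returns a redirect or an error marks the instance down. The sketch below performs the same status check from another VM in the VNet; `probe_http` is a hypothetical helper built on the standard library's `http.client`.

```python
import http.client


def probe_http(host: str, port: int, path: str = "/", timeout: float = 5.0) -> bool:
    """Return True only if GET path answers with HTTP 200, like an HTTP health probe."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        # Connection failures, timeouts, and malformed responses all count as unhealthy.
        return False
    finally:
        conn.close()
```

A passing result here does not prove the real probe succeeds: probes originate from 168.63.129.16, so an NSG that blocks that source can still fail them even when this check passes.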
VPN and Gateway Issues
Problems with Azure VPN Gateway connections or ExpressRoute circuits can break communication between Azure and on-premises networks. Investigate these areas:
Common Causes:
- Incorrect VPN tunnel configuration (PSK, IKE versions).
- Overlapping IP address spaces between Azure VNets and on-premises networks.
- On-premises firewall blocking VPN traffic.
- BGP peering issues.
- Circuit capacity or performance issues.
Diagnostic Tools:
- Azure VPN Gateway diagnostic logs (for example, tunnel and IKE logs).
- Network Watcher: VPN Troubleshoot.
- Azure Monitor: Gateway metrics.
- On-premises network device logs.
Troubleshooting Steps:
- Check the connection status of your VPN or ExpressRoute circuit.
- Verify that shared keys or certificates match on both ends.
- Ensure that the correct IP address spaces are advertised and received.
- If using BGP, check the BGP peering status and learned routes.
- Consult your on-premises network team for potential firewall blocks or configuration issues.
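Two of the checks in this section, IP address conflicts (overlapping address spaces) and missing advertised prefixes, reduce to prefix arithmetic. The sketch below uses Python's `ipaddress` module; `find_overlaps` and `missing_prefixes` are hypothetical helpers, and the inputs are assumed to be IPv4 prefixes gathered by hand or exported from the gateway (for example, with `az network vnet-gateway list-learned-routes`).

```python
import ipaddress
from itertools import product
from typing import Iterable, List, Set, Tuple


def find_overlaps(azure_prefixes: Iterable[str],
                  onprem_prefixes: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (azure, on-premises) prefix pairs whose address ranges overlap."""
    azure = [ipaddress.ip_network(p) for p in azure_prefixes]
    onprem = [ipaddress.ip_network(p) for p in onprem_prefixes]
    return [(str(a), str(o)) for a, o in product(azure, onprem) if a.overlaps(o)]


def missing_prefixes(expected: Iterable[str], learned: Iterable[str]) -> Set[str]:
    """Return expected prefixes that no learned route covers (assumes IPv4 only)."""
    learned_nets = [ipaddress.ip_network(p) for p in learned]
    return {p for p in expected
            if not any(ipaddress.ip_network(p).subnet_of(n) for n in learned_nets)}
```

Any pair returned by `find_overlaps` needs readdressing or NAT before traffic can route correctly, and any prefix returned by `missing_prefixes` is a starting point for the BGP and advertisement checks above.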