Introduction to Azure Networking Troubleshooting

This guide describes steps and tools for diagnosing and resolving common networking issues in your Azure environment. Effective troubleshooting depends on a systematic approach, an understanding of the Azure networking components involved, and the right diagnostic tools.

We'll cover a range of scenarios, from basic connectivity problems to complex routing and load balancing challenges.

Connection Issues

Problems connecting to or from Azure resources are frequent. Here's how to approach them:

Common Causes:

  • Incorrect Network Security Group (NSG) or Application Security Group (ASG) rules.
  • Misconfigured Azure Firewall rules, or user-defined routes (UDRs) that send traffic to the wrong next hop.
  • Virtual Network (VNet) peering or gateway configuration errors.
  • Issues with VPN or ExpressRoute connections.
  • Resource availability or health status.

Diagnostic Tools:

  • Azure Network Watcher: Provides tools like Connection Troubleshoot, IP Flow Verify, and Network Security Group (NSG) Flow Logs.
  • VM Boot Diagnostics: For issues with VM connectivity upon startup.
  • Packet Capture: To record network traffic to and from a VM for inspection.
  • Azure CLI/PowerShell: For scripting checks and retrieving resource configurations.

Troubleshooting Steps:

  1. Verify NSG/ASG Rules: Use IP Flow Verify to check if traffic is allowed or denied by NSGs.
  2. Check Route Tables: Examine user-defined routes (UDRs) to ensure traffic is being routed as expected.
  3. Test Connectivity: Use Network Watcher's Connection Troubleshoot from source to destination.
  4. Examine Firewall Logs: If using Azure Firewall, check its logs for blocked traffic.
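
When Network Watcher is not at hand, a quick TCP check run from the source VM can complement step 3. The following is a minimal sketch, not a replacement for Connection Troubleshoot; the function name `check_tcp` and its return shape are illustrative assumptions.

```python
import socket
import time


def check_tcp(host: str, port: int, timeout: float = 3.0) -> dict:
    """Attempt one TCP connection and report whether it succeeded and how long it took."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            elapsed_ms = (time.perf_counter() - start) * 1000
            return {"reachable": True, "elapsed_ms": elapsed_ms, "error": None}
    except OSError as exc:
        # A timeout often points to a silent drop (NSG, firewall, or a route
        # that blackholes traffic). An immediate refusal usually means the
        # host was reached but nothing is listening on the port.
        elapsed_ms = (time.perf_counter() - start) * 1000
        return {"reachable": False, "elapsed_ms": elapsed_ms, "error": type(exc).__name__}
```

Calling `check_tcp("10.0.1.4", 443)` (a placeholder address) returns a dictionary whose `error` field helps distinguish a timeout from a refused connection.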

Performance Problems

Slow network performance can impact application responsiveness. Investigate these areas:

Common Causes:

  • Network latency between regions or on-premises.
  • Bandwidth limitations.
  • Suboptimal VNet peering configurations.
  • Inefficient routing.
  • Network congestion.

Diagnostic Tools:

  • Network Watcher Connection Monitor: For measuring network performance over time.
  • Azure Monitor Metrics: Track network interface metrics like Bytes received/sent and Packets received/sent.
  • Performance Test Tools: Tools like iperf or NDT can be run from VMs.

Troubleshooting Steps:

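  1. Measure Latency: Use `ping` or `traceroute` between endpoints. ICMP may be filtered along the path, so confirm any timeouts with a TCP-based test before concluding there is loss.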
  2. Analyze Bandwidth: Check VM network egress/ingress limits and VNet throughput.
  3. Optimize Routing: Ensure efficient routes are in place, especially for traffic traversing ExpressRoute or VPN.
  4. Check for Packet Loss: Use Connection Monitor or repeated pings to identify loss.
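
Raw probe output is easier to compare across runs once it is reduced to a few numbers. The sketch below summarizes a list of round-trip times, assuming you have already collected them (for example from repeated pings) and recorded each unanswered probe as `None`. The function name and the choice of population standard deviation as a jitter figure are assumptions for illustration.

```python
import statistics
from typing import Optional, Sequence


def summarize_probes(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Summarize probe results; None marks a probe that got no reply."""
    replies = [r for r in rtts_ms if r is not None]
    sent = len(rtts_ms)
    lost = sent - len(replies)
    summary = {
        "sent": sent,
        "lost": lost,
        "loss_pct": (100.0 * lost / sent) if sent else 0.0,
    }
    if replies:
        summary.update(
            min_ms=min(replies),
            avg_ms=statistics.fmean(replies),
            max_ms=max(replies),
            # Jitter here is the population standard deviation of reply times.
            jitter_ms=statistics.pstdev(replies),
        )
    return summary
```

Running the same summary before and after a routing or peering change gives a like-for-like comparison of latency and loss.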

Firewall and Network Security Group (NSG) Issues

NSGs and Azure Firewall are critical for network security. Incorrect configurations are a common source of connectivity problems.

Key Concepts:

  • NSGs: Applied at the NIC or subnet level to filter traffic based on source/destination IP, port, and protocol.
  • Azure Firewall: A managed cloud-based network security service that protects your Azure virtual network resources.
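  • Order of Operations: For inbound traffic, the subnet NSG is evaluated before the NIC NSG; for outbound traffic, the NIC NSG is evaluated before the subnet NSG. Azure Firewall is a separate hop that traffic reaches only when a route sends it there, so outbound traffic must first pass the source's NSGs before Azure Firewall rules apply.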

Common Scenarios & Solutions:

  • Blocked Inbound Traffic: Ensure NSG rules allow traffic on the required port and protocol from the correct source.
  • Blocked Outbound Traffic: Check Azure Firewall rules and NSG outbound rules.
  • Access to Internet Denied: Verify that the outbound path (for example, NAT rules or default outbound rules) is configured correctly and that no NSG outbound rule or route blocks it.
  • Misconfigured Service Tags: Ensure you are using appropriate service tags (e.g., Internet, AzureCloud) in your rules.

Example NSG Rule (Allowing HTTPS):

Name: AllowHTTPS
Priority: 300
Source: Any
Source port ranges: *
Destination: Any
Destination port ranges: 443
Protocol: TCP
Action: Allow

Troubleshooting Steps:

  1. Use IP Flow Verify to determine if traffic is blocked by an NSG.
  2. Enable NSG Flow Logs to analyze traffic patterns and identify denied connections.
  3. Review Azure Firewall logs for any denied connections.
  4. Double-check the priority and direction of your rules.
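
Step 4 matters because NSG rules are processed in priority order (lowest number first) and processing stops at the first match. The sketch below models that behavior in a simplified form. The `Rule` fields and the `evaluate` function are hypothetical, only CIDR sources and a single port range are supported, and the fallback deny stands in for the built-in DenyAllInBound rule without modeling the other default rules.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int      # lower number is evaluated first
    direction: str     # "Inbound" or "Outbound"
    protocol: str      # "TCP", "UDP", or "*"
    source: str        # CIDR prefix, or "*" for any
    dest_ports: tuple  # inclusive (low, high) port range
    action: str        # "Allow" or "Deny"


def evaluate(rules, direction, protocol, source_ip, dest_port):
    """Return (action, rule name) for the first matching rule in priority order."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.direction != direction:
            continue
        if rule.protocol not in ("*", protocol):
            continue
        if rule.source != "*":
            network = ipaddress.ip_network(rule.source)
            if ipaddress.ip_address(source_ip) not in network:
                continue
        low, high = rule.dest_ports
        if not low <= dest_port <= high:
            continue
        return rule.action, rule.name
    # Simplification: no custom rule matched, so fall back to a deny.
    return "Deny", "implicit-default"


rules = [
    Rule("AllowHTTPS", 300, "Inbound", "TCP", "*", (443, 443), "Allow"),
    Rule("DenyWeb", 200, "Inbound", "*", "*", (80, 443), "Deny"),
]
result = evaluate(rules, "Inbound", "TCP", "203.0.113.10", 443)
```

Here `result` is `("Deny", "DenyWeb")`: the broader deny at priority 200 shadows the AllowHTTPS rule at priority 300, which is the kind of conflict IP Flow Verify reports by naming the matching rule.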

DNS Resolution Issues

Problems resolving hostnames to IP addresses can prevent access to services.

Common Causes:

  • Incorrect Azure DNS configuration.
  • Issues with on-premises DNS servers forwarding to Azure.
  • Firewall blocking DNS traffic (UDP/TCP port 53).
  • Corrupted DNS cache on the client or server.

Diagnostic Tools:

  • `nslookup` / `dig` commands: Test DNS resolution from VMs.
  • Azure DNS Analytics: In Azure Monitor, provides insights into DNS queries.
  • Network Watcher Packet Capture: To see DNS traffic.

Troubleshooting Steps:

  1. From a VM, run `nslookup example.com` to check resolution.
  2. Verify that VMs are configured to use the correct DNS servers (Azure-provided or custom).
  3. Ensure that DNS forwarding rules are correctly set up for hybrid environments.
  4. Check if DNS traffic is allowed by NSGs and firewalls.
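
The first step can also be scripted when you need to check many hostnames from the same VM. This is a minimal sketch using Python's standard library; the function name `resolve` is an assumption. It goes through the operating system resolver, so unlike `nslookup` it also honors the local hosts file and any resolver cache.

```python
import socket


def resolve(hostname: str, port: int = 443) -> list:
    """Return the sorted, de-duplicated IP addresses the OS resolver gives for hostname."""
    try:
        infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        raise RuntimeError(f"resolution failed for {hostname!r}: {exc}") from exc
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

If the script and `nslookup` disagree for the same name, a stale cache or hosts-file entry on the VM is a likely suspect.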

Routing Errors

Incorrect routes can send traffic to the wrong destinations or blackhole it.

Key Concepts:

  • Route Tables: Associate with subnets to define custom routes.
  • System Routes: Default routes provided by Azure.
  • BGP Routes: For dynamic routing with VPN Gateways and ExpressRoute.

Diagnostic Tools:

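  • Network Watcher Next Hop: Determines the next hop type and IP address for traffic from a VM.
  • Network Watcher IP Flow Verify: Checks NSG rules rather than routes, so use it to rule out a security rule as the cause before investigating routing.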
  • Azure CLI/PowerShell: To view route tables and effective routes.

Troubleshooting Steps:

  1. Use Next Hop from the source VM to see where traffic is directed.
  2. Examine the effective routes on the VM's network interface.
  3. Verify that user-defined routes (UDRs) are correctly configured and associated with the correct subnets.
  4. If using BGP, check route propagation and peer status.
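
When reading effective routes, it helps to remember how Azure chooses among them: the route with the longest matching prefix wins, and when prefixes are identical, user-defined routes take precedence over BGP routes, which take precedence over system routes. The sketch below applies that rule to a hand-written route list. The dictionary keys and the source labels "User", "BGP", and "System" are illustrative assumptions, not the exact strings Azure returns.

```python
import ipaddress

# Tie-break order when prefixes are equal: user-defined, then BGP, then system.
SOURCE_RANK = {"User": 0, "BGP": 1, "System": 2}


def select_route(routes, dest_ip):
    """Pick the route for dest_ip: longest matching prefix, then source rank."""
    dest = ipaddress.ip_address(dest_ip)
    candidates = [r for r in routes if dest in ipaddress.ip_network(r["prefix"])]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda r: (
            -ipaddress.ip_network(r["prefix"]).prefixlen,  # longest prefix first
            SOURCE_RANK[r["source"]],
        ),
    )


routes = [
    {"prefix": "0.0.0.0/0", "source": "System", "next_hop": "Internet"},
    {"prefix": "0.0.0.0/0", "source": "User", "next_hop": "VirtualAppliance"},
    {"prefix": "10.0.0.0/16", "source": "System", "next_hop": "VnetLocal"},
    {"prefix": "10.0.1.0/24", "source": "User", "next_hop": "None"},
]
chosen = select_route(routes, "10.0.1.5")
```

In this example `chosen["next_hop"]` is `"None"`: the /24 user-defined route is more specific than the /16 system route, so traffic to 10.0.1.5 is dropped. An overly specific UDR like this is a common cause of blackholed traffic.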

Load Balancer Issues

Problems with Azure Load Balancer or Application Gateway can lead to unavailable applications.

Common Causes:

  • Unhealthy backend pool members.
  • Incorrect load balancing rules or probes.
  • NSG/Firewall blocking health probe traffic.
  • Configuration errors in the load balancer itself.

Diagnostic Tools:

  • Azure Load Balancer Metrics: Track health probe status, backend health, etc.
  • Azure Monitor: Application Gateway Metrics and Logs.
  • `tcpping` utility: Can be used from VMs to test port accessibility.

Troubleshooting Steps:

  1. Check the health status of backend pool members in the load balancer or Application Gateway configuration.
  2. Verify that health probe endpoints are accessible and responding correctly.
  3. Ensure NSG rules allow health probe traffic to the backend servers on the probe port. Azure Load Balancer probes originate from 168.63.129.16, which the AzureLoadBalancer service tag covers.
  4. Test direct connectivity to a backend VM to rule out other issues.
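
To check step 2 from inside the virtual network, you can imitate an HTTP health probe against a backend. The sketch below treats only an HTTP 200 response as healthy, which matches Azure Load Balancer's HTTP probe behavior; Application Gateway accepts a wider range of status codes by default, so adjust the check for that case. The function name `http_probe` and its defaults are assumptions.

```python
import http.client


def http_probe(host: str, port: int, path: str = "/", timeout: float = 5.0) -> bool:
    """Return True only if GET path answers with HTTP 200 within the timeout."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or a malformed response all count as unhealthy.
        return False
    finally:
        conn.close()
```

If this returns True from a VM in the same subnet while the load balancer still reports the backend as unhealthy, an NSG rule blocking the probe source becomes the more likely cause.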

VPN and Gateway Issues

This section covers connectivity problems with Azure VPN Gateway and ExpressRoute circuits.

Common Causes:

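  • Incorrect VPN tunnel configuration (mismatched pre-shared key (PSK) or IKE version).
  • IP address conflicts, such as overlapping address spaces between on-premises and Azure networks.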
  • On-premises firewall blocking VPN traffic.
  • BGP peering issues.
  • Circuit capacity or performance issues.

Diagnostic Tools:

  • Azure VPN Gateway Diagnostics.
  • Network Watcher: VPN Troubleshoot.
  • Azure Monitor: Gateway metrics.
  • On-premises network device logs.

Troubleshooting Steps:

  1. Check the connection status of your VPN or ExpressRoute circuit.
  2. Verify that shared keys or certificates match on both ends.
  3. Ensure that the correct IP address spaces are advertised and received.
  4. If using BGP, check the BGP peering status and learned routes.
  5. Consult your on-premises network team for potential firewall blocks or configuration issues.
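
One IP address conflict that is easy to check offline is an overlap between Azure and on-premises address spaces. The sketch below compares two lists of prefixes using Python's standard `ipaddress` module. The function name and the sample prefixes are hypothetical.

```python
import ipaddress
from itertools import product


def find_overlaps(azure_prefixes, onprem_prefixes):
    """Return (azure, on-prem) prefix pairs whose address ranges overlap."""
    overlaps = []
    for a, b in product(azure_prefixes, onprem_prefixes):
        net_a, net_b = ipaddress.ip_network(a), ipaddress.ip_network(b)
        # Only compare prefixes of the same IP version.
        if net_a.version == net_b.version and net_a.overlaps(net_b):
            overlaps.append((a, b))
    return overlaps


conflicts = find_overlaps(
    ["10.0.0.0/16", "10.1.0.0/16"],       # example VNet address spaces
    ["10.0.128.0/17", "192.168.0.0/24"],  # example on-premises prefixes
)
```

Here `conflicts` contains the single pair `("10.0.0.0/16", "10.0.128.0/17")`, showing that part of the first VNet range is also claimed on-premises.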