Troubleshooting Azure Network Watcher
Azure Network Watcher provides capabilities to monitor, diagnose, and view metrics for your Azure network resources. This guide focuses on using Network Watcher to troubleshoot common networking issues within your Azure environment.
Common Troubleshooting Scenarios
Connectivity Issues
Problems with traffic flow between virtual machines, to/from the internet, or between VNets are frequent. Network Watcher helps identify where connectivity is being blocked.
- VM to VM: Ensure Network Security Groups (NSGs) and User Defined Routes (UDRs) are correctly configured.
- VM to Internet: Check NAT rules, firewall rules, and routing.
- On-premises to Azure: Verify VPN or ExpressRoute configuration, BGP routing, and NSGs.
Performance Degradation
Slow network performance can be caused by various factors including suboptimal routing, bandwidth limitations, or network congestion. While Network Watcher doesn't directly measure bandwidth, tools like Packet Capture and Flow Logs can help identify unusual traffic patterns.
Security Group Rule Mismatches
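Incorrectly configured NSG rules are a common cause of connectivity problems. IP Flow Verify reports which rule allowed or denied a given flow, which makes it the first tool to reach for here.
- Verify inbound and outbound rules.
- Check rule priority (lower numbers are evaluated first) and remember that NSGs are stateful: return traffic for an allowed flow is permitted automatically.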
- Ensure correct protocol, ports, and IP addresses/ranges are used.
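The priority check above is easier to reason about with a small model. The following is a minimal sketch, not Azure's implementation: the `Rule` class, the `evaluate` function, and the sample rules are hypothetical, and the model matches only on source prefix, destination port, and protocol. It illustrates the one behavior that matters most when auditing rules: they are processed in ascending priority order and the first match decides the outcome.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """Simplified, hypothetical stand-in for one NSG security rule."""
    name: str
    priority: int          # lower number = evaluated earlier
    source: str            # CIDR prefix; "0.0.0.0/0" means any IPv4 source
    port: Optional[int]    # destination port; None means any port
    protocol: str          # "TCP", "UDP", or "*" for any
    access: str            # "Allow" or "Deny"


def evaluate(rules, source_ip, port, protocol):
    """Return (access, rule_name) for the first matching rule by priority."""
    addr = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        if addr not in ipaddress.ip_network(rule.source):
            continue
        if rule.port is not None and rule.port != port:
            continue
        if rule.protocol not in ("*", protocol):
            continue
        return rule.access, rule.name
    # Simplification: a real NSG ends with built-in default rules instead.
    return "Deny", "implicit-deny"


# Illustrative rules only; names, priorities, and prefixes are made up.
rules = [
    Rule("deny-all-inbound", 4000, "0.0.0.0/0", None, "*", "Deny"),
    Rule("allow-web-from-subnet", 200, "10.0.1.0/24", 80, "TCP", "Allow"),
]

print(evaluate(rules, "10.0.1.5", 80, "TCP"))  # ('Allow', 'allow-web-from-subnet')
print(evaluate(rules, "10.0.2.5", 80, "TCP"))  # ('Deny', 'deny-all-inbound')
```

Because the allow rule has the lower priority number, it wins for traffic from 10.0.1.0/24 even though the broad deny rule also matches. Swapping the two priorities would block that traffic, which is the kind of mismatch IP Flow Verify surfaces.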
Application Gateway Issues
Troubleshooting Application Gateway involves checking backend health, listener configurations, routing rules, and SSL certificates.
Load Balancer Problems
Diagnosing issues with Azure Load Balancer requires examining backend pool health, health probes, and load balancing rules.
Network Watcher Tools
Network Watcher offers the following tools for diagnosing and troubleshooting network issues:
IP Flow Verify
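Purpose: Determines whether a packet to or from a virtual machine is allowed or denied, based on the Network Security Group (NSG) rules that apply to the VM's network interface. It does not evaluate route tables; use Next Hop for routing questions.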
Use Case: Ideal for diagnosing connectivity issues caused by NSG rules. You can specify source/destination IP, port, and protocol to see which rule is blocking traffic.
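# Example: Checking inbound TCP traffic to port 80 on vm1
# 10.0.0.4 is a placeholder for the VM's private IP; 203.0.113.10 is a placeholder remote address
az network watcher test-ip-flow --resource-group MyResourceGroup --vm vm1 --direction Inbound --protocol TCP --local 10.0.0.4:80 --remote "203.0.113.10:*"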
Next Hop
Purpose: Determines the next hop for traffic originating from a virtual machine. This helps understand routing decisions.
Use Case: Useful for troubleshooting routing problems, especially when UDRs are involved, to see if traffic is being routed to the intended destination or appliance.
# Example: Finding the next hop for traffic to 8.8.8.8 from vm1
# 10.0.0.4 is a placeholder for the VM's private IP
az network watcher show-next-hop --resource-group MyResourceGroup --vm vm1 --source-ip 10.0.0.4 --dest-ip 8.8.8.8
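To interpret a Next Hop result, it helps to know how a route is chosen when several match. The sketch below is a simplified model, not Azure's implementation: the `select_route` function, the dictionary layout, and the `User`/`BGP`/`System` source labels are assumptions made for illustration. It picks the route with the longest matching prefix and, when prefixes are identical, prefers user-defined routes over BGP routes over system routes.

```python
import ipaddress

# Tie-break order for routes with the same prefix (lower rank wins).
SOURCE_RANK = {"User": 0, "BGP": 1, "System": 2}


def select_route(routes, destination_ip):
    """Pick the longest matching prefix; break ties by route source."""
    dest = ipaddress.ip_address(destination_ip)
    candidates = [
        r for r in routes
        if dest in ipaddress.ip_network(r["prefix"])
    ]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda r: (
            -ipaddress.ip_network(r["prefix"]).prefixlen,  # longest prefix first
            SOURCE_RANK[r["source"]],                      # then by source
        ),
    )


# Illustrative route table only; a UDR overrides the default internet route.
routes = [
    {"prefix": "0.0.0.0/0", "source": "System", "next_hop": "Internet"},
    {"prefix": "0.0.0.0/0", "source": "User", "next_hop": "VirtualAppliance"},
    {"prefix": "10.0.0.0/16", "source": "System", "next_hop": "VnetLocal"},
]

print(select_route(routes, "8.8.8.8")["next_hop"])   # VirtualAppliance
print(select_route(routes, "10.0.3.7")["next_hop"])  # VnetLocal
```

In this example the UDR sends internet-bound traffic to an appliance, while traffic inside 10.0.0.0/16 still uses the more specific system route. If Next Hop reports an unexpected hop type, an overlapping prefix like this is a likely cause.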
Packet Capture
Purpose: Captures network packets on a virtual machine. This provides granular insight into network traffic.
Use Case: Deep-dive analysis of unexpected traffic or application-level communication issues, or when other tools don't provide enough detail. Requires the Network Watcher Agent VM extension to be installed on the target VM.
Connection Monitor
Purpose: Monitors network connectivity between two endpoints. It can monitor connectivity to Azure resources, custom endpoints, and on-premises endpoints.
Use Case: Proactively monitors the health of connections and provides alerts for failures. Excellent for establishing baseline connectivity and detecting intermittent issues.
Flow Logs
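Purpose: Captures information about IP traffic flowing through your Azure virtual networks. Network Watcher offers two kinds: NSG flow logs, which record traffic evaluated by a network security group, and VNet flow logs, which can be enabled on a virtual network, subnet, or network interface. Application Gateway and Load Balancer expose their own diagnostics (such as access logs and metrics) through Azure Monitor rather than through flow logs.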
Use Case: Analyze traffic patterns, identify network usage, and troubleshoot network security issues. Useful for auditing and understanding traffic flow over time.
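As an illustration of that kind of analysis, the sketch below filters denied flows out of raw flow tuples. It assumes the comma-separated NSG flow log tuple layout, whose first eight fields are timestamp, source IP, destination IP, source port, destination port, protocol (`T`/`U`), direction (`I`/`O`), and decision (`A`/`D`). VNet flow logs use a different tuple layout. The `Flow` class, the function names, and the sample tuples are hypothetical, and the sample uses documentation-reserved addresses rather than real data.

```python
from dataclasses import dataclass

PROTOCOLS = {"T": "TCP", "U": "UDP"}
DIRECTIONS = {"I": "Inbound", "O": "Outbound"}
DECISIONS = {"A": "Allowed", "D": "Denied"}


@dataclass(frozen=True)
class Flow:
    timestamp: int
    source_ip: str
    dest_ip: str
    source_port: int
    dest_port: int
    protocol: str
    direction: str
    decision: str


def parse_flow_tuple(raw):
    """Parse the first eight fields of a comma-separated flow tuple."""
    parts = raw.split(",")
    if len(parts) < 8:
        raise ValueError(f"expected at least 8 fields, got {len(parts)}")
    ts, src, dst, sport, dport, proto, direction, decision = parts[:8]
    return Flow(
        int(ts), src, dst, int(sport), int(dport),
        PROTOCOLS.get(proto, proto),
        DIRECTIONS.get(direction, direction),
        DECISIONS.get(decision, decision),
    )


def denied_flows(raw_tuples):
    """Return only the flows that a rule denied."""
    return [f for f in map(parse_flow_tuple, raw_tuples) if f.decision == "Denied"]


# Made-up sample tuples for illustration.
sample = [
    "1700000000,203.0.113.7,10.0.0.4,50123,22,T,I,D",
    "1700000005,10.0.0.4,198.51.100.9,44931,443,T,O,A",
]

for flow in denied_flows(sample):
    print(flow.source_ip, "->", flow.dest_ip, flow.dest_port, flow.protocol)
# 203.0.113.7 -> 10.0.0.4 22 TCP
```

Any extra fields after the eighth are ignored here, so the same parser tolerates tuples that append flow state and byte counters.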
Network Configuration Diagnostic
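Purpose: Evaluates the NSG rules that apply to a resource for a specified flow (source, destination, protocol, and port) and reports whether that traffic is allowed or denied and which rules were evaluated. The Azure portal lists this tool as NSG diagnostics.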
Use Case: A good starting point for diagnosing a VM that has lost network connectivity or is exhibiting unexpected network behavior.
Troubleshooting Workflow
A systematic approach is key to efficient troubleshooting:
- Identify the symptom: Clearly define the problem (e.g., "VM A cannot reach VM B on port 80").
- Gather context: Understand the affected resources, their locations, and recent changes.
- Start broad, then narrow down:
  - Use IP Flow Verify for NSG rule issues.
  - Use Next Hop to understand routing.
  - Use Connection Monitor for proactive monitoring and connection health.
- Deep dive if necessary:
  - Use Packet Capture for detailed traffic analysis.
  - Use Flow Logs for historical traffic analysis and auditing.
- Check service-specific diagnostics: For Application Gateway or Load Balancer issues, use their specific diagnostic tools and backend health status.
- Review logs: Examine system logs, application logs, and any analytics from Flow Logs.
Best Practices
- Enable Network Watcher: Ensure Network Watcher is enabled for all regions where you have network resources.
- Configure Connection Monitor: Set up Connection Monitor for critical endpoints to proactively detect issues.
- Enable Flow Logs: Turn on Flow Logs for your VNets, especially for security-sensitive workloads or for auditing purposes.
- Regularly Review NSG Rules: Audit and simplify NSG rules to minimize complexity and potential misconfigurations.
- Document Network Topology: Maintain clear documentation of your VNet architecture, routing, and security policies.
- Utilize Tagging: Tag resources consistently to facilitate filtering and management within Network Watcher.