Azure Load Balancer Troubleshooting Guide

Introduction

Azure Load Balancer distributes incoming traffic across a pool of backend resources, ensuring high availability and responsiveness. This guide provides steps and best practices for troubleshooting common problems you might encounter.

Before diving into specific issues, ensure you have the following in place:

- Proper network configuration (VNet, subnets, NSGs).
- Backend health probes configured and passing.
- A frontend IP configuration that matches your requirements.
- Backend pool members that are healthy and accessible.

Common Issues and Solutions

Backend Instances Not Receiving Traffic

Troubleshooting Steps:

Check Health Probes:
- Ensure the health probe protocol (TCP, HTTP, HTTPS) and port are correctly configured.
- Verify that the backend instances are listening on the specified probe port and responding correctly.
- Check the Network Security Groups (NSGs) on the backend subnets and NICs to ensure they allow inbound traffic on the probe port from the health probe source IP, 168.63.129.16 (covered by the AzureLoadBalancer service tag).
Verify Backend Pool:
- Confirm that the backend instances are correctly added to the load balancer's backend pool.
- Ensure the IP addresses configured in the backend pool match the actual IP addresses of your instances.
Inspect Network Security Groups (NSGs):
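- On the NSG of the backend subnet or NIC, check for rules that block inbound traffic from the client source IPs or from the probe IP. Azure Load Balancer preserves the original source IP, so backend NSG rules must allow the clients' addresses rather than the load balancer's frontend IP.
- NSGs are not attached to the load balancer itself; they are evaluated at the backend subnet and NIC, so confirm that every NSG on that path allows the traffic. With a Standard SKU public load balancer, inbound traffic from the internet is blocked unless an NSG explicitly allows it.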
Review Load Balancing Rules:
- Verify the frontend IP, protocol, frontend port, and backend port settings in your load balancing rule.
- Ensure the rule is enabled and associated with the correct backend pool and health probe.
Test Connectivity Directly:
- Try to connect to a backend instance directly using its private IP address from within the VNet to rule out instance-level issues.
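
The direct connectivity test above can be scripted. The following is a minimal sketch in Python, assuming it runs on a machine inside the VNet; the function name `tcp_port_open` and the address in the usage comment are illustrative, not part of any Azure tooling.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example (hypothetical private IP and port of a backend instance):
# tcp_port_open("10.0.1.4", 80)
```

If this returns True from inside the VNet while clients still cannot connect through the load balancer, the instance itself is probably fine and the rule, probe, or NSG configuration is the more likely cause.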

Tip: Use Azure Network Watcher's IP Flow Verify to check if NSGs are blocking traffic.
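
When interpreting an allow or deny result, it helps to remember that NSG rules are evaluated in priority order (lowest number first) and that the first matching rule decides the outcome. The sketch below is a simplified model of that behavior, not Azure's implementation: it ignores protocols, port ranges, destination prefixes, and application security groups, models the AzureLoadBalancer service tag as 168.63.129.16/32, and assumes a hypothetical 10.0.0.0/16 address space for the VirtualNetwork tag. All class and function names are illustrative.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int  # lower number is evaluated first
    source: str    # CIDR prefix, or "*" for any source
    port: str      # destination port as a string, or "*" for any port
    access: str    # "Allow" or "Deny"


def _matches(rule: Rule, src_ip: str, dst_port: int) -> bool:
    src_ok = rule.source == "*" or (
        ipaddress.ip_address(src_ip) in ipaddress.ip_network(rule.source)
    )
    port_ok = rule.port == "*" or rule.port == str(dst_port)
    return src_ok and port_ok


def evaluate_inbound(rules, src_ip: str, dst_port: int):
    """Return (access, rule_name) for the first matching rule by priority."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if _matches(rule, src_ip, dst_port):
            return rule.access, rule.name
    return "Deny", "implicit"


PROBE_IP = "168.63.129.16"

# Simplified stand-ins for the default inbound rules (see assumptions above).
DEFAULT_RULES = [
    Rule("AllowVnetInBound", 65000, "10.0.0.0/16", "*", "Allow"),
    Rule("AllowAzureLoadBalancerInBound", 65001, "168.63.129.16/32", "*", "Allow"),
    Rule("DenyAllInBound", 65500, "*", "*", "Deny"),
]

# A custom catch-all deny at a lower priority number shadows the default
# allow rule for probe traffic, a common cause of failing probes.
custom = DEFAULT_RULES + [Rule("DenyAllCustom", 4000, "*", "*", "Deny")]
print(evaluate_inbound(DEFAULT_RULES, PROBE_IP, 80))
print(evaluate_inbound(custom, PROBE_IP, 80))
```

In this model the usual fix is an allow rule for the probe source with a priority number lower than the custom deny rule.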

Health Probe Failures

Troubleshooting Steps:

Check Application Status:
- Ensure the application or service running on the backend instance is healthy and responding to requests on the configured port.
Review Health Probe Configuration:
- Double-check the health probe's protocol, port, request path (for HTTP/S probes), and interval.
- If using HTTP/S probes, ensure the specified request path returns HTTP 200. Azure Load Balancer treats any other status code, including other 2xx codes and 3xx redirects, as a probe failure.
Firewall/Antivirus on Instances:
- Check the operating system's firewall or any antivirus software running on the backend instances. Ensure they are not blocking traffic on the health probe port or application port.
Instance Resource Utilization:
- Monitor CPU, memory, and network utilization on backend instances. High utilization can prevent instances from responding to probes.
Azure Load Balancer Diagnostics:
- Use the load balancer's Health Probe Status metric and diagnostic logs in Azure Monitor for detailed insight into probe failures.

Tip: For HTTP/S probes, consider using a simple endpoint that returns a static response to ensure probe functionality.
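
As an illustration of such an endpoint, here is a minimal sketch using only the Python standard library. The path `/healthz`, the port in the usage comment, and the names `HealthHandler` and `make_health_server` are assumptions chosen for this example; a real application would normally expose the endpoint through its own web framework.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

PROBE_PATH = "/healthz"  # hypothetical; must match the probe's request path


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == PROBE_PATH:
            body = b"OK"
            self.send_response(200)  # static 200 response for the probe
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        # Probes arrive frequently; skip per-request logging.
        pass


def make_health_server(port: int, host: str = "0.0.0.0") -> HTTPServer:
    """Create (but do not start) an HTTP server that answers the probe path."""
    return HTTPServer((host, port), HealthHandler)


# To run on a hypothetical probe port:
# make_health_server(8080).serve_forever()
```

A static endpoint like this confirms only that the process is reachable on the probe port. Whether the probe should also check dependencies such as a database is a design choice, since a deeper check can take every instance out of rotation at once when a shared dependency fails.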

Session Persistence Issues

Troubleshooting Steps:

Verify Load Balancing Rule:
- Check the "Session persistence" setting in your load balancing rule. "None" (the default) distributes each new flow using a five-tuple hash; set it to "Client IP" or "Client IP and protocol" if requests from the same client must reach the same backend instance.
Client IP Source:
- Understand how the client's IP address is seen by the load balancer. If traffic is coming through multiple NATs or proxies, the source IP seen by the load balancer might not be the true client IP.
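- "Client IP" hashes on the source and destination IP (two-tuple); "Client IP and protocol" also includes the protocol type (three-tuple). Choose the three-tuple mode only if it is acceptable for TCP and UDP traffic from the same client to reach different instances.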
Backend Application Handling:
- Where possible, make the backend application stateless or store session state externally, so that it can scale out without depending on session persistence.
- If your application relies on sticky sessions, ensure it doesn't have internal logic that overrides or conflicts with the load balancer's session persistence.
Test with Different Clients:
- Test from different client machines or network locations to confirm consistent behavior.

Important: Session persistence is based on the source IP address *as seen by the load balancer*. Network configurations can impact this.
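
To see why the source IP matters, it can help to model how the three distribution modes select a backend. The sketch below is illustrative only: Azure's actual hash function is not public, so SHA-256 stands in for it, and the mode names, function name, and addresses are assumptions made for this example. What it does reflect is which fields each mode includes in the hash.

```python
import hashlib


def pick_backend(backends, mode, src_ip, src_port, dst_ip, dst_port, protocol):
    """Pick a backend by hashing only the fields the given mode includes."""
    if mode == "none":                  # five-tuple hash
        key = (src_ip, src_port, dst_ip, dst_port, protocol)
    elif mode == "client_ip":           # two-tuple hash
        key = (src_ip, dst_ip)
    elif mode == "client_ip_protocol":  # three-tuple hash
        key = (src_ip, dst_ip, protocol)
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Stand-in hash; the real load balancer's hash is not documented.
    digest = hashlib.sha256(repr(key).encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


backends = ["vm-a", "vm-b", "vm-c"]  # hypothetical pool members

# With "client_ip", every connection from one source IP lands on the same
# backend regardless of source port. With "none", the source port changes
# the hash input, so connections from that client can spread across the pool.
sticky = {
    pick_backend(backends, "client_ip", "203.0.113.7", p, "20.0.0.1", 443, "tcp")
    for p in range(50000, 50200)
}
print(len(sticky))  # exactly one backend in this model
```

The same model shows the NAT caveat: many users behind one NAT address share a single hash key in "Client IP" mode, so they all land on the same instance, while one user whose requests leave through several NAT addresses can be split across instances.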

High Latency or Slow Performance

Troubleshooting Steps:

Check Backend Instance Performance:
- Monitor CPU, memory, and disk I/O on your backend instances.
- Ensure instances are adequately sized for the workload.
Review Health Probe Intervals:
- Very frequent health probes can add overhead. Ensure the probe interval is reasonable for your application's needs.
Load Balancer SKU:
- Ensure you are using the appropriate SKU (Standard vs. Basic). Standard Load Balancer supports availability zones and larger backend pools and is backed by an SLA; Basic has no SLA.
Network Path:
- Use Azure Network Watcher's connection troubleshoot feature or a TCP ping tool such as `tcpping` to measure latency from clients to the load balancer's frontend and, from inside the VNet, directly to the backend instances.
- Check for any network bottlenecks in your VNet or between your on-premises network and Azure.
Backend Application Optimization:
- Optimize your application code and database queries for better performance.

Info: Azure Load Balancer itself adds minimal latency. Most performance issues stem from backend instances or network congestion.
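
To compare the path through the frontend with a direct path to a backend instance, TCP connect times can be sampled from each vantage point. The following is a rough sketch in Python of the kind of measurement a TCP ping tool performs; the function name and the addresses in the usage comment are hypothetical, and connect time reflects network round-trip latency only, not application response time.

```python
import socket
import statistics
import time


def tcp_connect_latency_ms(host, port, attempts=5, timeout=3.0):
    """Sample TCP connect times in milliseconds; failures are counted, not timed."""
    samples, failures = [], 0
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                samples.append((time.perf_counter() - start) * 1000.0)
        except OSError:
            failures += 1
    summary = {"attempts": attempts, "failures": failures}
    if samples:
        summary.update(
            min=min(samples), avg=statistics.mean(samples), max=max(samples)
        )
    return summary


# Example (hypothetical addresses): compare the frontend with one backend.
# print(tcp_connect_latency_ms("20.0.0.1", 443))   # load balancer frontend
# print(tcp_connect_latency_ms("10.0.1.4", 443))   # backend, from inside the VNet
```

If both paths show similar connect times while full requests remain slow, the delay is more likely in the application or its dependencies than in the network path.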

Using Azure Monitor and Diagnostics

Azure Monitor provides crucial insights into your Load Balancer's health and performance. Leverage the following:

Metrics: Monitor metrics such as Data Path Availability, Health Probe Status, Byte Count, Packet Count, SYN Count, and SNAT Connection Count. Multi-dimensional metrics are available on the Standard SKU.
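Diagnostic Settings: Enable sending Load Balancer logs (e.g., FrontendDiagnosticLog, BackendDiagnosticLog, LoadBalancerProbeLog) to Log Analytics, Storage Accounts, or Event Hubs for deeper analysis. The log categories actually offered depend on the SKU, so confirm the names listed under the resource's Diagnostic settings before building queries on them.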
Log Analytics: Query logs to understand specific connection attempts, probe failures, and traffic flow patterns.

Example Log Analytics query for probe failures (adjust the category and column names to match the schema in your workspace):

AzureDiagnostics
| where ResourceProvider == "MICROSOFT.NETWORK" and Category == "LoadBalancerProbeLog"
| where ProbeStatus == "Unhealthy"
| project TimeGenerated, ResourceId, ProbeName, BackendPoolName, BackendInstanceIP, ProbeStatus, InstanceStatus
| order by TimeGenerated desc

Best Practices

Use Standard SKU: For production workloads, always use the Standard SKU of Azure Load Balancer for enhanced features and reliability.
Configure Health Probes Correctly: Accurate health probes are critical for the load balancer to make informed decisions about traffic distribution.
Implement NSGs Thoughtfully: Ensure NSGs allow necessary traffic for the load balancer and its probes while maintaining security.
Monitor Regularly: Proactive monitoring using Azure Monitor helps identify and resolve issues before they impact users.
Document Your Configuration: Keep detailed records of your Load Balancer configuration, including frontend IPs, backend pools, and rules, for easier troubleshooting.