Troubleshooting Azure Load Balancer

This section provides guidance on diagnosing and resolving common issues with Azure Load Balancer. A well-functioning load balancer is crucial for ensuring high availability and scalability of your applications.

Common Load Balancer Issues and Solutions

1. Health Probe Failures

Health probes are how the load balancer determines whether each backend instance can accept traffic. When an instance fails its probe, the load balancer stops sending new connections to it; if every instance in the pool fails, no new connections can be served.

Symptom: Instances are marked as unhealthy, or the load balancer stops sending traffic to certain instances.
Diagnosis:
- Verify that the health probe configuration (port, protocol, path) matches the application listening on the backend instances.
- Ensure that the backend instances are running and responsive on the specified probe port.
- Check Network Security Groups (NSGs) associated with the backend subnet or network interfaces. Ensure they allow inbound traffic from the probe source (168.63.129.16, covered by the AzureLoadBalancer service tag) on the probe port.
- Confirm that firewalls on the backend instances are not blocking the health probe traffic.
- Review application logs on the backend instances for any errors that might cause them to fail the probe.
Resolution:
- Correct the health probe configuration in the Azure portal.
- Adjust NSGs to permit health probe traffic.
- Configure instance-level firewalls to allow probe traffic.
- Address application errors impacting probe responses.
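
To check a backend the way an HTTP probe would, you can send a single GET to the probe port and path from another VM in the same virtual network. The following is a minimal sketch, not the load balancer's actual probe implementation; the function name `http_probe` and its parameters are illustrative. It treats only an HTTP 200 response as healthy, which matches how HTTP probes are evaluated.

```python
import http.client


def http_probe(host: str, port: int, path: str = "/", timeout: float = 5.0) -> tuple[bool, str]:
    """Send one HTTP GET, roughly as an HTTP health probe would.

    Returns (healthy, detail). Only a 200 response counts as healthy.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        status = conn.getresponse().status
    except OSError as exc:
        # Connection refused, timed out, reset, or host unreachable.
        return False, f"no response: {exc!r}"
    except http.client.HTTPException as exc:
        # Something answered, but not with a valid HTTP response.
        return False, f"bad HTTP response: {exc!r}"
    finally:
        conn.close()
    return status == 200, f"HTTP {status}"
```

For example, `http_probe("10.0.0.4", 80, "/health")` (address and path are placeholders) returns `(False, "HTTP 503")` if the application answers with a 503. A "no response" result points toward NSGs, instance firewalls, or a stopped service rather than the application logic.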

2. Connection Timeouts or Refusals

Users might experience connection timeouts or refusals when trying to access services behind the load balancer.

Symptom: Clients cannot establish a connection, or connections are dropped.
Diagnosis:
- Frontend IP and Port: Ensure the frontend IP configuration and the frontend port in the load balancing rule are correct and match what clients connect to.
- Backend Pool: Verify that backend instances are registered in the backend pool and are healthy (check health probe status).
- Load Balancing Rules: Confirm that the load balancing rule is configured with the correct frontend IP, port, backend pool, and probe.
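- NSGs: Check the NSGs on the backend subnet and network interfaces for rules that block traffic. Azure Load Balancer does not terminate connections and preserves the client's source IP, so the NSG must allow inbound traffic from the client address range (for example, the Internet service tag for a public load balancer) to the backend port. With Standard Load Balancer, inbound traffic is denied unless an NSG explicitly allows it.
- Instance Firewalls: Ensure that the firewalls on the backend virtual machines allow inbound traffic on the service port. Because the source IP is preserved, these rules need to match the clients' addresses, not an address belonging to the load balancer.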
- Application Responsiveness: Confirm that the application on the backend instances is actively listening and responding to requests on the correct port.
Resolution:
- Update load balancer configuration if necessary.
- Modify NSGs to allow required traffic flows.
- Configure instance firewalls.
- Troubleshoot application-level issues.
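
Whether a connection attempt is refused or times out is a useful first clue. The sketch below, assuming a hypothetical helper named `check_tcp`, makes one TCP connection attempt and classifies the outcome. It is a rough stand-in for a manual netcat test, not an Azure tool.

```python
import socket


def check_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt one TCP connection and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered with a reset: nothing is listening on the port,
        # or a firewall is actively rejecting the connection.
        return "refused"
    except socket.timeout:
        # No answer at all: packets are being dropped somewhere on the path.
        return "timeout"
    except OSError as exc:
        # DNS failure, unreachable network, and similar errors.
        return f"error: {exc}"
```

As a rule of thumb, "refused" against a backend usually means the application is not listening on that port, while "timeout" against the frontend more often points to an NSG or firewall dropping traffic, a missing load balancing rule, or no healthy instance in the backend pool.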

3. Uneven Traffic Distribution

Traffic might not be distributed evenly across backend instances, leading to some instances being overloaded while others are idle.

Symptom: Some backend instances show high CPU or network utilization, while others are low.
Diagnosis:
- Load Balancing Algorithm: By default, Azure Load Balancer distributes flows using a five-tuple hash (source IP, source port, destination IP, destination port, protocol). The hash is computed per flow (connection), not per request, so a small number of long-lived or high-volume connections can concentrate load on a few instances even when connection counts look even.
- Session Persistence (Sticky Sessions): If session persistence (source IP affinity, a two-tuple or three-tuple hash) is enabled, all connections from the same client IP go to the same backend instance while the backend pool is unchanged. Many clients behind one NAT or proxy address then land on a single instance.
- Health Probe Configuration: Probes report whether an instance is reachable, not how loaded it is. An overly aggressive interval or threshold can cause instances to flap in and out of rotation, which shifts new flows onto the remaining instances.
- Backend Instance Performance: Ensure all backend instances have similar capacity and performance.
Resolution:
- If uneven distribution is caused by the hashing behavior, consider Azure Application Gateway for request-level (layer 7) routing, or reduce the share of traffic that arrives from a single source IP where that is possible.
- Adjust session persistence settings if they are contributing to the issue.
- Fine-tune health probe configurations.
- Ensure backend instances are adequately provisioned.
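
The effect of the distribution mode can be illustrated with a small simulation. This is a sketch under stated assumptions: Azure's actual hash function is not public, so SHA-256 is used here purely as a stand-in, and the names `pick_backend` and `distribution` as well as the addresses are hypothetical. Only the choice of which fields feed the hash reflects the behavior described above.

```python
import hashlib
from collections import Counter


def pick_backend(flow: tuple, backends: list, affinity: str = "five_tuple") -> str:
    """Map a flow to a backend by hashing selected fields.

    flow = (src_ip, src_port, dst_ip, dst_port, protocol)
    affinity: "five_tuple", "three_tuple" (source IP, destination IP, protocol),
              or "two_tuple" (source IP, destination IP).
    """
    src_ip, src_port, dst_ip, dst_port, protocol = flow
    if affinity == "five_tuple":
        key = (src_ip, src_port, dst_ip, dst_port, protocol)
    elif affinity == "three_tuple":
        key = (src_ip, dst_ip, protocol)
    elif affinity == "two_tuple":
        key = (src_ip, dst_ip)
    else:
        raise ValueError(f"unknown affinity: {affinity}")
    # SHA-256 is an assumption for illustration, not the platform's hash.
    digest = hashlib.sha256(repr(key).encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


def distribution(flows, backends, affinity):
    """Count how many flows each backend receives."""
    return Counter(pick_backend(f, backends, affinity) for f in flows)


backends = ["vm0", "vm1", "vm2"]
# 3000 connections from one NAT'd client IP, each with a different source port.
flows = [("203.0.113.10", 10000 + i, "198.51.100.1", 443, "tcp") for i in range(3000)]

print(distribution(flows, backends, "five_tuple"))  # spread over all three backends
print(distribution(flows, backends, "two_tuple"))   # every flow on a single backend
```

With the five-tuple hash, the varying source port spreads connections from a single client IP across the pool. With source IP affinity the port is ignored, so the same 3000 connections all reach one instance.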

4. Slow Performance

Applications might exhibit slow response times when accessed through the load balancer.

Symptom: Web pages load slowly, API calls take a long time to complete.
Diagnosis:
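- Latency: Measure latency from the client to the load balancer's frontend and, separately, from a test VM in the same virtual network directly to the backend instances; the difference helps localize the delay. The address 168.63.129.16 is a platform endpoint (health probes, DHCP, DNS), so reaching it from a VM confirms platform connectivity but does not measure the load balancer data path.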
- Backend Instance Performance: Check the CPU, memory, and network utilization of the backend virtual machines.
- Application Profiling: Profile the application code on the backend instances to identify performance bottlenecks.
- Network Path: Use tools like traceroute (or tracert on Windows) to diagnose potential network congestion or high latency hops.
- Load Balancer SKU: Ensure the Load Balancer SKU meets your requirements. Standard Load Balancer supports larger backend pools, availability zones, and richer metrics than Basic, and Microsoft has announced the retirement of the Basic SKU.
Resolution:
- Optimize application code.
- Scale up or scale out backend instances.
- Choose the appropriate Load Balancer SKU.
- Address any identified network latency issues.
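
To compare latency through the frontend with latency directly to a backend, timing the TCP handshake is often enough for a first pass. The sketch below uses only the Python standard library; the helper names (`connect_latency_ms`, `percentile`, `summarize`) are hypothetical, and the nearest-rank percentile is one of several common definitions.

```python
import math
import socket
import statistics
import time


def connect_latency_ms(host: str, port: int, samples: int = 10, timeout: float = 3.0) -> list:
    """Time `samples` TCP handshakes to host:port, in milliseconds."""
    results = []
    for _ in range(samples):
        start = time.perf_counter()
        # create_connection returns once the handshake has completed.
        with socket.create_connection((host, port), timeout=timeout):
            results.append((time.perf_counter() - start) * 1000.0)
    return results


def percentile(values, pct: int):
    """Nearest-rank percentile of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms) -> dict:
    """Reduce raw samples to a few comparable numbers."""
    return {
        "min": min(latencies_ms),
        "median": statistics.median(latencies_ms),
        "p95": percentile(latencies_ms, 95),
        "max": max(latencies_ms),
    }
```

Running `summarize(connect_latency_ms(frontend_ip, 443))` from a client and the same call against a backend's private IP from inside the virtual network gives two sets of numbers to compare. If handshake times are low in both cases but responses are still slow, the delay is more likely in the application than in the network path.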

Troubleshooting Tools and Techniques

Azure Network Watcher: Utilize Network Watcher's "Connection Troubleshoot" and "IP Flow Verify" features to diagnose connectivity issues between components.
Azure Monitor: Monitor Standard Load Balancer metrics (e.g., Health Probe Status, Data Path Availability, SYN Count, SNAT Connection Count, Byte Count) for anomalies.
Log Analytics: Send Load Balancer diagnostic data to a Log Analytics workspace and query it for health events and errors. For per-flow detail, NSG flow logs or virtual network flow logs are the more useful source.
tcpdump/Wireshark: Capture network traffic on backend instances to inspect incoming requests and outgoing responses.
netcat (nc): Use netcat on backend instances to test if a port is open and listening.
Azure CLI/PowerShell: Use these tools to inspect and modify your Load Balancer configuration programmatically.

Note: Health probes originate from 168.63.129.16, a virtual public IP address used by Azure platform services. The default NSG rule AllowAzureLoadBalancerInBound permits this traffic; if you add a higher-priority deny rule, allow the AzureLoadBalancer service tag ahead of it. Do not block this address in guest firewalls either.

Tip: When diagnosing health probe issues, consider using a simple web server or a basic HTTP endpoint on your backend instances that returns a consistent success code (e.g., 200 OK) for the health probe path. This helps isolate whether the issue is with the probe itself or a more complex application-level problem.
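
A throwaway health endpoint of this kind can be sketched with the Python standard library. The path `/health`, the port 8080, and the names `HealthHandler` and `make_server` are assumptions for illustration; use whatever path and port your probe is configured with.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

HEALTH_PATH = "/health"  # hypothetical; must match the probe's configured path


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Answer 200 on the health path only, 404 everywhere else.
        if self.path == HEALTH_PATH:
            body = b"OK\n"
            self.send_response(200)
        else:
            body = b"not found\n"
            self.send_response(404)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Probes arrive every few seconds; keep them out of stderr.
        pass


def make_server(host: str = "0.0.0.0", port: int = 8080) -> ThreadingHTTPServer:
    """Build the server; call serve_forever() on the result to run it."""
    return ThreadingHTTPServer((host, port), HealthHandler)
```

Start it with `make_server().serve_forever()` and point the probe at the chosen port and path. If the probe turns healthy against this endpoint but not against the real application, the network path and probe settings are fine and the problem lies in the application's own response.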

If you continue to experience issues, consider opening a support ticket with Azure for further assistance.
