App Service Scaling
Ensure your application can handle varying loads by implementing effective scaling strategies.
Understanding Scaling in Azure App Service
Azure App Service offers two primary scaling methods:
- Vertical Scaling (Scale Up): Increasing the resources (CPU, RAM, disk space) allocated to your existing instances. This is achieved by changing the App Service Plan tier.
- Horizontal Scaling (Scale Out): Increasing the number of instances running your application. This allows your application to handle more concurrent requests.
You can choose to manually scale or configure autoscale rules to adjust the number of instances based on predefined metrics.
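Vertical scaling can also be performed from the Azure CLI. A minimal sketch, assuming a plan named MyPlan in resource group MyRG (both hypothetical names):

```shell
# Vertical scaling (scale up): move the App Service Plan to a larger tier,
# e.g. Premium v3 P1V3. Apps in the plan keep running during the change.
az appservice plan update --resource-group MyRG --name MyPlan --sku P1V3
```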
Manual Scaling
Manual scaling provides direct control over the number of instances. You can adjust the instance count at any time.
Steps for Manual Scaling:
- Navigate to your App Service in the Azure portal.
- Under "Scale out (App Service plan)", select "Manual scale".
- Choose the desired number of instances.
- Click "Save".
When to use manual scaling: predictable traffic patterns, testing, or applications with fixed resource requirements.
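The portal steps above map to a single CLI call. A sketch with hypothetical resource names (MyRG, MyPlan):

```shell
# Horizontal scaling (scale out): set a fixed instance count of 3,
# equivalent to "Manual scale" in the portal.
az appservice plan update --resource-group MyRG --name MyPlan --number-of-workers 3

# Verify the current instance count.
az appservice plan show --resource-group MyRG --name MyPlan --query "sku.capacity"
```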
Autoscale
Autoscale allows your App Service to automatically adjust the number of instances based on performance metrics or a schedule.
Key Autoscale Features:
- Metric-based scaling: Scales based on CPU usage, memory usage, disk queue length, HTTP queue length, or custom metrics.
- Schedule-based scaling: Scales instances up or down at specific times or days of the week (e.g., scaling up during peak business hours).
- Minimum and maximum instance limits: Define boundaries to control costs and ensure availability.
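Schedule-based scaling is expressed as a recurring autoscale profile. A sketch, assuming an autoscale setting named myplan-autoscale already exists (hypothetical name):

```shell
# Run 4 instances during weekday business hours; outside this window,
# the default profile's instance count applies.
az monitor autoscale profile create \
  --resource-group MyRG \
  --autoscale-name myplan-autoscale \
  --name business-hours \
  --count 4 \
  --timezone "Pacific Standard Time" \
  --start 08:00 --end 18:00 \
  --recurrence week mon tue wed thu fri
```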
Configuring Autoscale Rules:
- Navigate to your App Service plan in the Azure portal.
- Under "Scale out (App Service plan)", select "Autoscale".
- Configure the default profile (the instance count used when no other rule conditions are met).
- Add new scale-out or scale-in rules. For each rule, define:
- The metric to monitor.
- The threshold value.
- The time grain and duration (the granularity at which the metric is sampled, and how long the condition must hold before triggering).
- The statistic (e.g., average, maximum).
- The cooldown period (how long to wait before triggering another scale action).
- The number of instances to add or remove.
- Configure schedule-based scaling if needed.
- Set the minimum and maximum number of instances.
- Click "Save".
Example Autoscale Rule: Scale out by 1 instance when CPU percentage is greater than 70% for 10 minutes. Scale in by 1 instance when CPU percentage is less than 30% for 15 minutes.
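The example rule above can be created with the Azure CLI. A hedged sketch, assuming hypothetical names (MyRG, MyPlan) and the App Service plan's CpuPercentage metric:

```shell
# Create the autoscale setting with instance limits: min 1, max 5, default 2.
az monitor autoscale create \
  --resource-group MyRG \
  --resource MyPlan \
  --resource-type Microsoft.Web/serverfarms \
  --name myplan-autoscale \
  --min-count 1 --max-count 5 --count 2

# Scale out by 1 when average CPU exceeds 70% over 10 minutes,
# with a 5-minute cooldown before the next scale action.
az monitor autoscale rule create \
  --resource-group MyRG \
  --autoscale-name myplan-autoscale \
  --condition "CpuPercentage > 70 avg 10m" \
  --scale out 1 \
  --cooldown 5

# Scale in by 1 when average CPU drops below 30% over 15 minutes.
az monitor autoscale rule create \
  --resource-group MyRG \
  --autoscale-name myplan-autoscale \
  --condition "CpuPercentage < 30 avg 15m" \
  --scale in 1
```

Note the asymmetric thresholds (70% out, 30% in): a gap between them prevents "flapping", where the same load level repeatedly triggers opposite scale actions.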
Scaling Considerations
- App Service Plan Tier: Higher tiers provide more resources per instance and support a larger number of instances for scale-out.
- Instance Warm-up: When scaling out, new instances need time to start before they can serve traffic. Configure application initialization (warm-up) so requests are routed to an instance only once it is ready.
- Load Balancing: App Service automatically distributes incoming traffic across your instances. Ensure your application is stateless or handles state appropriately (for example, via a distributed cache), since consecutive requests may land on different instances.
- Monitoring: Regularly monitor your application's performance and scaling metrics to fine-tune your autoscale rules.
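To tune thresholds, inspect the plan's recent metric values from the CLI. A sketch; `<plan-resource-id>` is a placeholder for the full ARM resource ID of your App Service plan:

```shell
# List average CPU over 5-minute intervals to see how close real traffic
# comes to your scale-out and scale-in thresholds.
az monitor metrics list \
  --resource <plan-resource-id> \
  --metric CpuPercentage \
  --interval PT5M \
  --aggregation Average
```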
Code Example: Measuring Request Latency (Conceptual)
While Azure Monitor collects built-in metrics, you may want to emit custom metrics of your own. Here's a conceptual example of tracking request latency in application code using ASP.NET Core middleware:
// Example using ASP.NET Core middleware
using System.Diagnostics;

public class RequestLatencyMiddleware
{
    private readonly RequestDelegate _next;
    private readonly ILogger<RequestLatencyMiddleware> _logger;

    public RequestLatencyMiddleware(RequestDelegate next, ILogger<RequestLatencyMiddleware> logger)
    {
        _next = next;
        _logger = logger;
    }

    public async Task InvokeAsync(HttpContext context)
    {
        var stopwatch = Stopwatch.StartNew();
        await _next(context);
        stopwatch.Stop();

        // Log the latency. In a real scenario, push this to Application Insights
        // as a custom metric, where it could drive custom autoscale rules:
        // _telemetryClient.GetMetric("RequestLatency").TrackValue(stopwatch.ElapsedMilliseconds);
        _logger.LogInformation("Request {Path} took {ElapsedMs} ms",
            context.Request.Path, stopwatch.ElapsedMilliseconds);
    }
}

// In Program.cs:
// app.UseMiddleware<RequestLatencyMiddleware>();
Best Practices
- Start with autoscale rules and monitor their effectiveness.
- Set reasonable minimum and maximum instance limits to control costs and performance.
- Test your scaling configurations under load.
- Ensure your application is designed to be stateless for efficient horizontal scaling.
- Utilize Azure Monitor and Application Insights for comprehensive performance analysis.