Scaling Azure Analysis Services

This document explains how to scale an Azure Analysis Services server to meet performance and capacity demands. Scaling changes the processing power and memory allocated to the server.

Understanding Scaling in Azure Analysis Services

Azure Analysis Services offers a tiered pricing model. Each tier contains one or more plans (for example, S1 or S2) that differ in processing power, measured in Query Processing Units (QPUs), and in memory. You scale primarily by changing the plan or tier and, in the Standard tier, by adding query replicas through Query Scale-Out.

Service Tiers

Azure Analysis Services is available in three service tiers:

Developer: For evaluation, development, and testing. Limited capacity and no service level agreement.
Basic: For production workloads with smaller models, limited user concurrency, and simple refresh requirements.
Standard: The highest tier, intended for mission-critical production workloads. It offers the most capacity and performance and adds features such as Query Scale-Out.

Scaling Up and Down

You can scale your instance up or down to adapt to changing needs:

Scaling Up: Moving to a higher service tier, or to a larger plan within a tier, to handle larger data volumes, more concurrent users, or higher query performance requirements.
Scaling Down: Moving to a smaller plan within the same tier to reduce costs when demand is lower. A server can be upgraded to a higher tier but cannot be moved back to a lower tier.

How to Scale Your Instance

Scaling can be performed through the Azure portal, Azure CLI, or PowerShell.

Using the Azure Portal

Navigate to your Azure Analysis Services resource in the Azure portal.
In the left-hand menu, under "Settings," click on Scale.
Select the desired tier and plan (for example, S1 or S2).
Click Apply to save your changes.

Figure 1: Azure Analysis Services Scaling blade in the Azure Portal.

Using Azure CLI

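You can use the generic az resource update command to change the plan on the server resource. For example, to move to the S1 plan in the Standard tier:

az resource update --resource-group <YourResourceGroup> --name <YourAnalysisServicesName> --resource-type Microsoft.AnalysisServices/servers --set sku.name=S1 sku.tier=Standard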

Using PowerShell

You can use the Set-AzAnalysisServicesServer cmdlet and pass the target plan in the -Sku parameter.

Set-AzAnalysisServicesServer -ResourceGroupName <YourResourceGroup> -Name <YourAnalysisServicesName> -Sku S1

Query Scale-Out (Premium Tier)

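In the Standard tier, you can enable Query Scale-Out to improve query performance and availability. Scale-out distributes client queries across a query pool of up to seven additional query replicas (eight servers in total, including the primary). The number of replicas available depends on the region.

When enabled, client queries are load-balanced across the servers in the query pool. By default the pool includes the primary server; you can separate the primary from the pool so that processing does not compete with queries.
The primary server handles all write operations (e.g., data refreshes, model changes). After a refresh, the updated model must be synchronized to the replicas before they serve the new data.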
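
With scale-out enabled, clients keep connecting with the regular server name, while processing and management operations target the primary server through the management server name, which appends :rw to the regular name. The sketch below illustrates that split. The function names, the operation labels in WRITE_OPERATIONS, and the example region and server name are assumptions made for illustration.

```python
def server_names(region: str, server: str) -> dict:
    """Return the client and management server names for one server."""
    base = f"asazure://{region}.asazure.windows.net/{server}"
    return {"client": base, "management": f"{base}:rw"}


# Hypothetical labels for operations that must reach the primary server.
WRITE_OPERATIONS = {"refresh", "deploy", "sync"}


def name_for(operation: str, region: str, server: str) -> str:
    """Pick the server name an operation should connect to."""
    names = server_names(region, server)
    if operation in WRITE_OPERATIONS:
        return names["management"]
    return names["client"]


if __name__ == "__main__":
    print(name_for("query", "westus", "myserver"))
    print(name_for("refresh", "westus", "myserver"))
```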

Configuring Query Scale-Out

Query Scale-Out is configured in the Scale-Out section of the Analysis Services resource in the Azure portal.

Navigate to your Analysis Services resource.
Under "Settings," click Scale-out.
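Use the slider to select the number of query replicas.
Optionally, choose to separate the processing server from the querying pool.
Save the configuration to provision the replicas.
Client connections continue to use the regular server name, which now routes to the query pool. Processing and management operations use the management server name, which ends in :rw.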

Important Considerations for Scale-Out:
Query Scale-Out is only available in the Standard tier.
Client applications and reporting tools connect with the regular server name; reserve the management server name (ending in :rw) for processing and administration.
Monitor query performance to determine the optimal number of query replicas.

Monitoring and Performance Tuning

Regularly monitor your Azure Analysis Services instance's performance metrics to identify bottlenecks and adjust scaling as needed. Key metrics include:

QPU (Query Processing Unit) utilization
Memory usage
Query latency
Data refresh duration
Concurrent user connections

Azure Monitor provides comprehensive tools for tracking these metrics.
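
Monitored metrics can feed a simple scaling rule. The following is a minimal sketch under stated assumptions: the samples are already collected (for example, exported from Azure Monitor) as percentages of the plan's limits, and the 80% and 30% thresholds are placeholders to tune for your workload. The names MetricSample and recommend are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MetricSample:
    compute_percent: float  # processing (CPU/QPU) utilization, 0-100
    memory_percent: float   # memory in use as a share of the plan limit, 0-100


def recommend(samples, high: float = 80.0, low: float = 30.0) -> str:
    """Return 'scale up', 'scale down', or 'hold' for a window of samples.

    Uses average compute utilization and peak memory usage. The thresholds
    are illustrative assumptions, not service guidance.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    avg_compute = sum(s.compute_percent for s in samples) / len(samples)
    peak_memory = max(s.memory_percent for s in samples)
    if avg_compute >= high or peak_memory >= high:
        return "scale up"
    if avg_compute <= low and peak_memory <= low:
        return "scale down"
    return "hold"


if __name__ == "__main__":
    window = [MetricSample(85.0, 60.0), MetricSample(90.0, 65.0)]
    print(recommend(window))
```

Peak memory is used rather than the average because a single refresh that exhausts memory can fail even when typical usage is low.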

Choosing the Right Capacity

The optimal capacity (service tier and plan) depends on several factors:

Data volume: Larger models generally require more memory and compute.
Query complexity: Complex queries consume more CPU.
Concurrency: The number of users or applications querying the service simultaneously.
Refresh frequency and duration: Frequent or long-running refreshes impact available resources.

A common approach is to start with a plan whose memory comfortably exceeds the model size, monitor performance under real-world usage, and adjust as necessary.
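
A starting plan can be estimated from the model size. The sketch below picks the smallest plan whose memory covers the model plus headroom for refreshes and queries. It rests on assumptions: the per-plan memory figures in PLAN_MEMORY_GB are illustrative and should be checked against the current pricing page, and the default headroom factor of 2.0 is a rule of thumb rather than a documented requirement. The name suggest_plan is hypothetical.

```python
# Memory per plan in GB. Assumption: illustrative figures, verify before use.
PLAN_MEMORY_GB = {
    "B1": 10, "B2": 16,
    "S0": 10, "S1": 25, "S2": 50, "S4": 100, "S8": 200, "S9": 400,
}


def suggest_plan(model_size_gb: float, headroom: float = 2.0,
                 standard_only: bool = False):
    """Return the smallest plan with memory >= model size * headroom.

    Returns None when no listed plan is large enough. Set standard_only
    when Standard-tier features such as Query Scale-Out are needed.
    """
    required = model_size_gb * headroom
    candidates = sorted(
        (memory, plan)
        for plan, memory in PLAN_MEMORY_GB.items()
        if not standard_only or plan.startswith("S")
    )
    for memory, plan in candidates:
        if memory >= required:
            return plan
    return None


if __name__ == "__main__":
    print(suggest_plan(10))                      # 10 GB model
    print(suggest_plan(4, standard_only=True))   # 4 GB model, Standard tier
```

The result is only a starting point; concurrency and query complexity affect processing power rather than memory, so monitoring should still drive later adjustments.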