Batch Operations in Azure Table Storage
Azure Table Storage supports batch operations (also called entity group transactions) to perform multiple operations (inserts, updates, deletes) against entities in the same table and partition with a single request. This improves performance by reducing the number of network round trips.
Introduction
Azure Table Storage is a NoSQL key-value store that stores large amounts of structured, non-relational data. When dealing with multiple entities that need to be modified, performing individual operations can be inefficient. Batch operations offer a way to group these operations into a single request, optimizing network latency and throughput.
Understanding Batch Operations
A batch operation in Azure Table Storage is a single HTTP request that can contain multiple individual entity operations. These operations can be:
- Insert: Add a new entity.
- Update: Modify an existing entity.
- Delete: Remove an existing entity.
Crucially, all operations within a single batch must target entities in the same table, and all of those entities must share the same partition key.
Limits and Constraints
When using batch operations, consider the following limitations:
- A batch can contain a maximum of 100 individual operations, and the total request payload must not exceed 4 MiB.
- All operations must target entities within the same table.
- All entities involved in a batch must share the same partition key.
- An entity can appear only once within a batch; you cannot perform multiple operations on the same entity in a single batch.
- Batch operations are atomic within a partition; either all operations succeed, or none of them do. This ensures data consistency.
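These constraints can be checked client-side before a round trip to the service. The sketch below is an illustrative helper (not part of the Azure SDK, which performs its own validation) that checks a list of `(action, entity)` tuples in the same shape used by the Python SDK's `submit_transaction`:

```python
def validate_batch(operations):
    """Check a list of (action, entity) tuples against the batch constraints.

    Raises ValueError if the batch would be rejected by the service.
    Illustrative helper only -- the SDK and the service do their own checks.
    """
    # At most 100 operations per batch
    if len(operations) > 100:
        raise ValueError("A batch may contain at most 100 operations")
    # All entities must share one partition key
    partition_keys = {entity["PartitionKey"] for _, entity in operations}
    if len(partition_keys) > 1:
        raise ValueError("All entities in a batch must share one PartitionKey")
    # Each entity may appear only once (one operation per entity)
    row_keys = [entity["RowKey"] for _, entity in operations]
    if len(row_keys) != len(set(row_keys)):
        raise ValueError("An entity may appear only once per batch")
```

A batch that mixes partition keys, exceeds 100 operations, or touches the same entity twice would raise before any network call is made.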
Performing Batch Operations
The specific implementation of batch operations depends on the Azure Storage SDK you are using. Generally, you will:
- Create a list or collection of operations to perform.
- For each operation, define the type (insert, update, delete), the entity data (if applicable), and the target table.
- Group these operations into a batch request.
- Submit the batch request to the Azure Table Storage service.
The service will then execute all operations within the batch. If any operation fails, the entire batch fails and no changes are applied.
Examples
C# Example
Using the Azure SDK for .NET:
using Azure;
using Azure.Data.Tables;
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
// Assume 'connectionString' and 'tableName' are already defined
string connectionString = "YOUR_AZURE_STORAGE_CONNECTION_STRING";
string tableName = "MySampleTable";
string partitionKey = "SalesData";
var client = new TableClient(connectionString, tableName);
await client.CreateIfNotExistsAsync();
// Entities for the batch operation
var product1 = new TableEntity("SalesData", "987")
{
{ "ProductName", "Widget Pro" },
{ "Quantity", 150 },
{ "Price", 25.50 }
};
var product2 = new TableEntity("SalesData", "988")
{
{ "ProductName", "Gizmo Lite" },
{ "Quantity", 300 },
{ "Price", 10.75 }
};
var updatedProduct1 = new TableEntity("SalesData", "987") // Must have same partition and row key
{
{ "ProductName", "Widget Pro (Updated)" },
{ "Quantity", 175 }, // Updated quantity
{ "Price", 26.00 } // Updated price
};
// Create a list of operations
var batch = new List<TableTransactionAction>();
// Add insert operations
batch.Add(new TableTransactionAction(TableTransactionActionType.Add, product1));
batch.Add(new TableTransactionAction(TableTransactionActionType.Add, product2));
// Add an update operation (UpdateMerge merges with the existing entity; use UpdateReplace to overwrite it)
batch.Add(new TableTransactionAction(TableTransactionActionType.UpdateMerge, updatedProduct1));
try
{
Response<IReadOnlyList<Response>> response = await client.SubmitTransactionAsync(batch);
Console.WriteLine($"Batch operation completed successfully. {response.Value.Count} operations processed.");
}
catch (RequestFailedException ex)
{
Console.WriteLine($"Batch operation failed: {ex.Message}");
// You can inspect ex.Status and ex.ErrorCode for more details.
}
Python Example
Using the Azure SDK for Python:
from azure.data.tables import TableServiceClient, UpdateMode
from azure.core.exceptions import ResourceExistsError, HttpResponseError
# Assume 'connection_string' and 'table_name' are already defined
connection_string = "YOUR_AZURE_STORAGE_CONNECTION_STRING"
table_name = "MySampleTable"
partition_key = "SalesData"
try:
    table_service_client = TableServiceClient.from_connection_string(conn_str=connection_string)
    table_client = table_service_client.get_table_client(table_name=table_name)
    table_client.create_table()
    print(f"Table '{table_name}' created.")
except ResourceExistsError:
    print(f"Table '{table_name}' already exists.")
# Entities for the batch operation
entity1 = {
"PartitionKey": partition_key,
"RowKey": "987",
"ProductName": "Widget Pro",
"Quantity": 150,
"Price": 25.50
}
entity2 = {
"PartitionKey": partition_key,
"RowKey": "988",
"ProductName": "Gizmo Lite",
"Quantity": 300,
"Price": 10.75
}
updated_entity1 = {
"PartitionKey": partition_key,
"RowKey": "987", # Must have same partition and row key
"ProductName": "Widget Pro (Updated)",
"Quantity": 175, # Updated quantity
"Price": 26.00 # Updated price
}
# Create a list of operations
batch_operations = []
# Add create (insert) operations
batch_operations.append(("create", entity1))
batch_operations.append(("create", entity2))
# Add an update operation (merge mode preserves properties not included here)
batch_operations.append(("update", updated_entity1, {"mode": UpdateMode.MERGE}))
try:
    results = table_client.submit_transaction(batch_operations)
    print(f"Batch operation completed successfully. {len(results)} operations processed.")
    # results is a list of responses, one per operation
except HttpResponseError as e:
    print(f"Batch operation failed: {e}")
    # You can inspect e.status_code for more details.
Best Practices
- Group by Partition Key: Always ensure that all entities in a batch share the same partition key. This is a hard requirement and critical for the operation's success.
- Limit Batch Size: While the limit is 100 operations, consider whether smaller batches might be more manageable and easier to debug if issues arise.
- Error Handling: Implement robust error handling to gracefully manage failures. Since batches are atomic per partition, understand that a failure might require retrying the entire batch or a subset of operations.
- Idempotency: Design your operations to be idempotent where possible, especially if you anticipate retries.
- Monitoring: Monitor your storage account for performance and error rates, particularly for tables that frequently use batch operations.
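The first two practices above can be combined in code. The following sketch (a hypothetical helper, not part of the SDK) groups an arbitrary set of entities by partition key and splits each group into chunks of at most 100 operations, using upserts so that retrying a failed batch stays idempotent:

```python
from collections import defaultdict

MAX_BATCH_SIZE = 100  # service limit on operations per transaction

def chunk_into_batches(entities, batch_size=MAX_BATCH_SIZE):
    """Group entities by PartitionKey and return lists of at most
    `batch_size` ("upsert", entity) tuples, each list valid as one
    transaction. Illustrative helper only, not SDK code.
    """
    # Bucket entities by partition key, since a batch cannot span partitions
    groups = defaultdict(list)
    for entity in entities:
        groups[entity["PartitionKey"]].append(entity)
    # Split each partition's entities into chunks of at most batch_size
    batches = []
    for group in groups.values():
        for i in range(0, len(group), batch_size):
            batches.append([("upsert", e) for e in group[i:i + batch_size]])
    return batches

# Each batch could then be submitted as its own transaction:
# for batch in chunk_into_batches(entities):
#     table_client.submit_transaction(batch)
```

Using upsert rather than create means a batch that partially failed and was retried will not fail on entities that were already written in an earlier attempt.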