Uploading Files to Azure Blob Storage
This guide demonstrates how to upload files of various sizes to Azure Blob Storage using the Azure SDK for Python. We'll cover common scenarios and best practices.
Prerequisites
- An Azure Storage account. If you don't have one, create one in the Azure portal.
- The Azure Blob Storage SDK for Python, installed with `pip install azure-storage-blob`.
- Your storage account connection string or access key.
Understanding Blob Types
Azure Blob Storage supports three types of blobs:
- Block blobs: Optimized for storing large amounts of unstructured data like documents, media files, and application data. This is the most common type for file uploads.
- Append blobs: Optimized for append operations, such as writing to log files.
- Page blobs: Optimized for storing random access data, such as virtual machine disks.
For general file uploads, you'll primarily work with block blobs.
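Block blobs are the default for uploads, but the other types are available through the blob client. Below is a minimal sketch of writing to an append blob, the kind you'd use for logs; the container name, blob name, and log line are placeholders:

```python
import os
from azure.storage.blob import BlobServiceClient

connect_str = os.getenv("AZURE_STORAGE_CONNECTION_STRING")
blob_service_client = BlobServiceClient.from_connection_string(connect_str)

# Placeholder container and blob names for illustration
blob_client = blob_service_client.get_blob_client(
    container="mycontainer", blob="logs/app.log"
)

# An append blob must be created before blocks can be appended to it
if not blob_client.exists():
    blob_client.create_append_blob()

# Each append_block call adds data to the end of the blob
blob_client.append_block(b"2024-01-01 12:00:00 INFO application started\n")
```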
Uploading a Small File
For smaller files, a single upload operation is efficient.
```python
import os
from azure.storage.blob import BlobServiceClient

# Read the connection string from an environment variable
connect_str = os.getenv("AZURE_STORAGE_CONNECTION_STRING")
container_name = "mycontainer"
local_file_name = "my_local_file.txt"
blob_name = "my_uploaded_blob.txt"

def upload_small_file(connect_str, container_name, local_file_name, blob_name):
    try:
        # Create the BlobServiceClient object
        blob_service_client = BlobServiceClient.from_connection_string(connect_str)

        # Get a client to interact with a specific container
        container_client = blob_service_client.get_container_client(container_name)

        print(f"Uploading {local_file_name} to {blob_name}...")

        # Upload the file in binary read mode
        with open(local_file_name, "rb") as data:
            container_client.upload_blob(name=blob_name, data=data)

        print("Upload successful!")
    except Exception as e:
        print(f"An error occurred: {e}")

if __name__ == "__main__":
    # Ensure the connection string is set
    if not connect_str:
        print("Please set the AZURE_STORAGE_CONNECTION_STRING environment variable.")
    else:
        # Create a local file for uploading if it doesn't exist
        if not os.path.exists(local_file_name):
            with open(local_file_name, "w") as f:
                f.write("This is the content of the local file.\n")
                f.write("It's a relatively small file, suitable for a single upload.\n")
        upload_small_file(connect_str, container_name, local_file_name, blob_name)
```
Explanation:
- We initialize a `BlobServiceClient` using the connection string.
- We get a `ContainerClient` for the target container.
- We open the local file in binary read mode (`"rb"`).
- `container_client.upload_blob()` handles the entire upload for smaller files.
Uploading Large Files with Progress Tracking
For large files, chunked and parallel uploads improve performance and reliability. The SDK handles chunking automatically once a file exceeds the single-shot size threshold, and it can report progress through a callback during the upload.
```python
import os
from azure.storage.blob import BlobServiceClient

# Read the connection string from an environment variable
connect_str = os.getenv("AZURE_STORAGE_CONNECTION_STRING")
container_name = "mycontainer"
large_local_file_name = "my_large_file.zip"
large_blob_name = "my_large_uploaded_file.zip"

# Create a dummy large file for demonstration
if not os.path.exists(large_local_file_name):
    print(f"Creating dummy large file: {large_local_file_name}...")
    with open(large_local_file_name, "wb") as f:
        f.seek(1024 * 1024 * 50 - 1)  # Create a sparse 50 MB file
        f.write(b"\0")
    print("Dummy file created.")

def report_progress(current, total):
    # Called periodically by the SDK with the bytes uploaded so far
    if total:
        print(f"Uploaded {current} of {total} bytes ({current / total:.0%})")
    else:
        print(f"Uploaded {current} bytes")

def upload_large_file_with_progress(connect_str, container_name, local_file_name, blob_name):
    try:
        blob_service_client = BlobServiceClient.from_connection_string(connect_str)
        container_client = blob_service_client.get_container_client(container_name)

        print(f"Uploading {local_file_name} to {blob_name}...")

        # upload_blob automatically chunks large files; progress_hook
        # (azure-storage-blob 12.11.0+) reports progress during the upload
        with open(local_file_name, "rb") as data:
            container_client.upload_blob(
                name=blob_name,
                data=data,
                overwrite=True,
                max_concurrency=4,  # upload blocks in parallel
                progress_hook=report_progress,
            )

        print("Upload successful!")
    except Exception as e:
        print(f"An error occurred: {e}")

if __name__ == "__main__":
    if not connect_str:
        print("Please set the AZURE_STORAGE_CONNECTION_STRING environment variable.")
    else:
        upload_large_file_with_progress(connect_str, container_name, large_local_file_name, large_blob_name)
```
Key Points:
- The `upload_blob` method automatically splits the data into blocks once it exceeds the client's single-shot threshold, so no extra code is needed for chunking.
- The `progress_hook` keyword (available in azure-storage-blob 12.11.0 and later) accepts a callback that the SDK invokes periodically during the upload with the number of bytes transferred so far and the total size.
- `max_concurrency` controls how many blocks are uploaded in parallel.
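If you need to tune the chunking behavior itself, the thresholds are set when the client is constructed. A short sketch, with illustrative sizes:

```python
import os
from azure.storage.blob import BlobServiceClient

connect_str = os.getenv("AZURE_STORAGE_CONNECTION_STRING")

# These keyword arguments tune when and how uploads are chunked:
# - max_single_put_size: uploads at or below this size go up in one request
# - max_block_size: the size of each block in a chunked upload
blob_service_client = BlobServiceClient.from_connection_string(
    connect_str,
    max_single_put_size=4 * 1024 * 1024,  # 4 MiB single-shot threshold
    max_block_size=4 * 1024 * 1024,       # 4 MiB per block
)
```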
Uploading to a Specific Directory (Virtual Directory)
Azure Blob Storage doesn't have actual folders, but you can simulate them by including directory names in your blob names, for example `my-folder/my-document.txt`.
```python
import os
from azure.storage.blob import BlobServiceClient

# Read the connection string from an environment variable
connect_str = os.getenv("AZURE_STORAGE_CONNECTION_STRING")
container_name = "mycontainer"
local_file_name = "report.pdf"
blob_name_in_folder = "reports/archive/report.pdf"  # Simulates a folder structure

def upload_to_folder(connect_str, container_name, local_file_name, blob_name):
    try:
        blob_service_client = BlobServiceClient.from_connection_string(connect_str)
        container_client = blob_service_client.get_container_client(container_name)

        # Create a dummy file
        with open(local_file_name, "wb") as f:
            f.write(b"%PDF-1.0\n% This is a dummy PDF file.\n")

        print(f"Uploading {local_file_name} to {blob_name}...")

        with open(local_file_name, "rb") as data:
            container_client.upload_blob(name=blob_name, data=data)

        print("Upload to simulated folder successful!")
    except Exception as e:
        print(f"An error occurred: {e}")

if __name__ == "__main__":
    if not connect_str:
        print("Please set the AZURE_STORAGE_CONNECTION_STRING environment variable.")
    else:
        upload_to_folder(connect_str, container_name, local_file_name, blob_name_in_folder)
When you list blobs in the container, you'll see the blob with its full path.
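For instance, you can filter a listing by the simulated folder prefix. A short sketch, assuming the container and prefix from the example above:

```python
import os
from azure.storage.blob import BlobServiceClient

connect_str = os.getenv("AZURE_STORAGE_CONNECTION_STRING")
blob_service_client = BlobServiceClient.from_connection_string(connect_str)
container_client = blob_service_client.get_container_client("mycontainer")

# name_starts_with filters blobs by prefix, emulating a folder listing
for blob in container_client.list_blobs(name_starts_with="reports/"):
    print(blob.name)  # e.g. reports/archive/report.pdf
```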
Managing Upload Options
The `upload_blob` method accepts several optional parameters:

| Parameter | Type | Description |
|---|---|---|
| `name` | `str` | The name of the blob. Required. |
| `data` | bytes, str, or file-like object | The data to upload. Required. |
| `overwrite` | `bool` | If `True`, replaces an existing blob with the same name. Defaults to `False`. |
| `metadata` | `dict` | A dictionary of key-value pairs for blob metadata. |
| `content_settings` | `ContentSettings` object | Specifies content type, encoding, etc. |
| `encoding` | `str` | The encoding to use if `data` is a string. Defaults to `'utf-8'`. |
`overwrite=True` is useful for updating existing files. Be cautious to avoid accidental data loss.
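As an illustration, the sketch below combines several of these options, reusing the dummy `report.pdf` from the previous example; the metadata keys and values are placeholders:

```python
import os
from azure.storage.blob import BlobServiceClient, ContentSettings

connect_str = os.getenv("AZURE_STORAGE_CONNECTION_STRING")
blob_service_client = BlobServiceClient.from_connection_string(connect_str)
container_client = blob_service_client.get_container_client("mycontainer")

with open("report.pdf", "rb") as data:
    container_client.upload_blob(
        name="reports/archive/report.pdf",
        data=data,
        overwrite=True,  # replace the blob if it already exists
        metadata={"department": "finance", "year": "2024"},  # placeholder values
        content_settings=ContentSettings(content_type="application/pdf"),
    )
```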
Next Steps
Now that you know how to upload files, explore these related topics: