Uploading Files to Azure Blob Storage
This guide demonstrates how to upload files of various sizes to Azure Blob Storage using the Azure SDK for Python. We'll cover common scenarios and best practices.
Prerequisites
- An Azure Storage account. If you don't have one, create one in the Azure portal.
- The Azure Blob Storage SDK for Python, installed with `pip install azure-storage-blob`.
- Your storage account connection string or access key.
Understanding Blob Types
Azure Blob Storage supports three types of blobs:
- Block blobs: Optimized for storing large amounts of unstructured data like documents, media files, and application data. This is the most common type for file uploads.
- Append blobs: Optimized for append operations, such as writing to log files.
- Page blobs: Optimized for storing random access data, such as virtual machine disks.
For general file uploads, you'll primarily work with block blobs.
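Block blobs are the default for uploads, but the other types are available through the blob client. Below is a minimal sketch of writing to an append blob, the kind you'd use for logs; the container name, blob name, and log line are placeholders:

```python
import os
from azure.storage.blob import BlobServiceClient

connect_str = os.getenv("AZURE_STORAGE_CONNECTION_STRING")
blob_service_client = BlobServiceClient.from_connection_string(connect_str)

# Placeholder container and blob names for illustration
blob_client = blob_service_client.get_blob_client(
    container="mycontainer", blob="logs/app.log"
)

# An append blob must be created before blocks can be appended to it
if not blob_client.exists():
    blob_client.create_append_blob()

# Each append_block call adds data to the end of the blob
blob_client.append_block(b"2024-01-01 12:00:00 INFO application started\n")
```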
Uploading a Small File
For smaller files, a single upload operation is efficient.
```python
import os
from azure.storage.blob import BlobServiceClient

# Read the connection string from an environment variable
connect_str = os.getenv("AZURE_STORAGE_CONNECTION_STRING")
container_name = "mycontainer"
local_file_name = "my_local_file.txt"
blob_name = "my_uploaded_blob.txt"

def upload_small_file(connect_str, container_name, local_file_name, blob_name):
    try:
        # Create the BlobServiceClient object
        blob_service_client = BlobServiceClient.from_connection_string(connect_str)

        # Get a client to interact with a specific container
        container_client = blob_service_client.get_container_client(container_name)

        print(f"Uploading {local_file_name} to {blob_name}...")

        # Upload the file in binary read mode
        with open(local_file_name, "rb") as data:
            container_client.upload_blob(name=blob_name, data=data)

        print("Upload successful!")
    except Exception as e:
        print(f"An error occurred: {e}")

if __name__ == "__main__":
    # Ensure the connection string is set
    if not connect_str:
        print("Please set the AZURE_STORAGE_CONNECTION_STRING environment variable.")
    else:
        # Create a local file for uploading if it doesn't exist
        if not os.path.exists(local_file_name):
            with open(local_file_name, "w") as f:
                f.write("This is the content of the local file.\n")
                f.write("It's a relatively small file, suitable for a single upload.\n")
        upload_small_file(connect_str, container_name, local_file_name, blob_name)
```
Explanation:
- We initialize a `BlobServiceClient` using the connection string.
- We get a `ContainerClient` for the target container.
- We open the local file in binary read mode (`"rb"`).
- `container_client.upload_blob()` handles the entire upload for smaller files.
Uploading Large Files with Progress Tracking
For large files, chunked and parallel uploads improve performance and reliability. The SDK handles chunking automatically once a file exceeds the single-shot size threshold, and it can report progress through a callback during the upload.
```python
import os
from azure.storage.blob import BlobServiceClient

# Read the connection string from an environment variable
connect_str = os.getenv("AZURE_STORAGE_CONNECTION_STRING")
container_name = "mycontainer"
large_local_file_name = "my_large_file.zip"
large_blob_name = "my_large_uploaded_file.zip"

# Create a dummy large file for demonstration
if not os.path.exists(large_local_file_name):
    print(f"Creating dummy large file: {large_local_file_name}...")
    with open(large_local_file_name, "wb") as f:
        f.seek(1024 * 1024 * 50 - 1)  # Create a sparse 50 MB file
        f.write(b"\0")
    print("Dummy file created.")

def report_progress(current, total):
    # Called periodically by the SDK with the bytes uploaded so far
    if total:
        print(f"Uploaded {current} of {total} bytes ({current / total:.0%})")
    else:
        print(f"Uploaded {current} bytes")

def upload_large_file_with_progress(connect_str, container_name, local_file_name, blob_name):
    try:
        blob_service_client = BlobServiceClient.from_connection_string(connect_str)
        container_client = blob_service_client.get_container_client(container_name)

        print(f"Uploading {local_file_name} to {blob_name}...")

        # upload_blob automatically chunks large files; progress_hook
        # (azure-storage-blob 12.11.0+) reports progress during the upload
        with open(local_file_name, "rb") as data:
            container_client.upload_blob(
                name=blob_name,
                data=data,
                overwrite=True,
                max_concurrency=4,  # upload blocks in parallel
                progress_hook=report_progress,
            )

        print("Upload successful!")
    except Exception as e:
        print(f"An error occurred: {e}")

if __name__ == "__main__":
    if not connect_str:
        print("Please set the AZURE_STORAGE_CONNECTION_STRING environment variable.")
    else:
        upload_large_file_with_progress(connect_str, container_name, large_local_file_name, large_blob_name)
```
Key Points:
- The `upload_blob` method automatically splits the data into blocks once it exceeds the client's single-shot threshold, so no extra code is needed for chunking.
- The `progress_hook` keyword (available in azure-storage-blob 12.11.0 and later) accepts a callback that the SDK invokes periodically during the upload with the number of bytes transferred so far and the total size.
- `max_concurrency` controls how many blocks are uploaded in parallel.
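If you need to tune the chunking behavior itself, the thresholds are set when the client is constructed. A short sketch, with illustrative sizes:

```python
import os
from azure.storage.blob import BlobServiceClient

connect_str = os.getenv("AZURE_STORAGE_CONNECTION_STRING")

# These keyword arguments tune when and how uploads are chunked:
# - max_single_put_size: uploads at or below this size go up in one request
# - max_block_size: the size of each block in a chunked upload
blob_service_client = BlobServiceClient.from_connection_string(
    connect_str,
    max_single_put_size=4 * 1024 * 1024,  # 4 MiB single-shot threshold
    max_block_size=4 * 1024 * 1024,       # 4 MiB per block
)
```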
Uploading to a Specific Directory (Virtual Directory)
Azure Blob Storage doesn't have actual folders, but you can simulate them by including directory names in your blob names, for example `my-folder/my-document.txt`.
```python
import os
from azure.storage.blob import BlobServiceClient

# Read the connection string from an environment variable
connect_str = os.getenv("AZURE_STORAGE_CONNECTION_STRING")
container_name = "mycontainer"
local_file_name = "report.pdf"
blob_name_in_folder = "reports/archive/report.pdf"  # Simulates a folder structure

def upload_to_folder(connect_str, container_name, local_file_name, blob_name):
    try:
        blob_service_client = BlobServiceClient.from_connection_string(connect_str)
        container_client = blob_service_client.get_container_client(container_name)

        # Create a dummy file
        with open(local_file_name, "wb") as f:
            f.write(b"%PDF-1.0\n% This is a dummy PDF file.\n")

        print(f"Uploading {local_file_name} to {blob_name}...")

        with open(local_file_name, "rb") as data:
            container_client.upload_blob(name=blob_name, data=data)

        print("Upload to simulated folder successful!")
    except Exception as e:
        print(f"An error occurred: {e}")

if __name__ == "__main__":
    if not connect_str:
        print("Please set the AZURE_STORAGE_CONNECTION_STRING environment variable.")
    else:
        upload_to_folder(connect_str, container_name, local_file_name, blob_name_in_folder)
When you list blobs in the container, you'll see the blob with its full path.
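For instance, you can filter a listing by the simulated folder prefix. A short sketch, assuming the container and prefix from the example above:

```python
import os
from azure.storage.blob import BlobServiceClient

connect_str = os.getenv("AZURE_STORAGE_CONNECTION_STRING")
blob_service_client = BlobServiceClient.from_connection_string(connect_str)
container_client = blob_service_client.get_container_client("mycontainer")

# name_starts_with filters blobs by prefix, emulating a folder listing
for blob in container_client.list_blobs(name_starts_with="reports/"):
    print(blob.name)  # e.g. reports/archive/report.pdf
```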
Managing Upload Options
The `upload_blob` method accepts several optional parameters:

| Parameter | Type | Description |
|---|---|---|
| `name` | `str` | The name of the blob. Required. |
| `data` | bytes, str, or file-like object | The data to upload. Required. |
| `overwrite` | `bool` | If `True`, replaces an existing blob with the same name. Defaults to `False`. |
| `metadata` | `dict` | A dictionary of key-value pairs for blob metadata. |
| `content_settings` | `ContentSettings` object | Specifies content type, encoding, etc. |
| `encoding` | `str` | The encoding to use if `data` is a string. Defaults to `'utf-8'`. |
`overwrite=True` is useful for updating existing files. Be cautious to avoid accidental data loss.
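As an illustration, the sketch below combines several of these options, reusing the dummy `report.pdf` from the previous example; the metadata keys and values are placeholders:

```python
import os
from azure.storage.blob import BlobServiceClient, ContentSettings

connect_str = os.getenv("AZURE_STORAGE_CONNECTION_STRING")
blob_service_client = BlobServiceClient.from_connection_string(connect_str)
container_client = blob_service_client.get_container_client("mycontainer")

with open("report.pdf", "rb") as data:
    container_client.upload_blob(
        name="reports/archive/report.pdf",
        data=data,
        overwrite=True,  # replace the blob if it already exists
        metadata={"department": "finance", "year": "2024"},  # placeholder values
        content_settings=ContentSettings(content_type="application/pdf"),
    )
```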
Next Steps
Now that you know how to upload files, explore these related topics: