Azure Blob Storage: Working with Blobs
This document provides a comprehensive guide to working with blobs in Azure Blob Storage, covering their types, operations, and best practices.
What are Blobs?
Blobs (Binary Large Objects) are the simplest type of Azure Storage object. Blob storage is ideal for storing massive amounts of unstructured data, such as:
- Text or binary data
- Images and videos
- Application installer files
- Log files
- Backup data
Types of Blobs
Azure Blob Storage supports three types of blobs:
- Block blobs: Optimized for uploading large amounts of data to a storage account. A block blob is composed of blocks, where each block is identified by its block ID. A block blob can contain up to 50,000 blocks, for a maximum total size of approximately 190.7 TiB.
- Append blobs: Optimized for append operations such as logging data. An append blob is also composed of blocks, but unlike a block blob it does not expose its block IDs. When you modify an append blob, blocks are added only to the end of the blob; updating or deleting existing blocks is not supported (see the sketch after this list).
- Page blobs: Optimized for random read and write operations. Page blobs can be up to 8 TiB in size and are used to store IaaS virtual machine disk data.
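To make the append-only behavior concrete, here is a minimal logging sketch using the Azure SDK for Python. The connection string, container name, blob name, and log line are placeholders, not values from this document.
from azure.storage.blob import BlobServiceClient
connection_string = "YOUR_AZURE_STORAGE_CONNECTION_STRING"
blob_service_client = BlobServiceClient.from_connection_string(connection_string)
blob_client = blob_service_client.get_blob_client(container="my-container", blob="app.log")
# Create the append blob if it does not exist yet
if not blob_client.exists():
    blob_client.create_append_blob()
# Each call adds a new block at the end of the blob; existing blocks are never modified
blob_client.append_block(b"2024-01-01T00:00:00Z INFO application started\n")
Each append_block call becomes a new block at the tail of the blob, which is why append blobs suit write-once log streams rather than files that need in-place edits.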
Common Blob Operations
Here are some of the most common operations you can perform on blobs:
Uploading a Blob
Uploading data to blob storage is a fundamental operation. You can upload a block blob, append blob, or page blob. The following example demonstrates uploading a block blob using the Azure SDK for Python.
from azure.storage.blob import BlobServiceClient, ContentSettings
connection_string = "YOUR_AZURE_STORAGE_CONNECTION_STRING"
container_name = "my-container"
blob_name = "my-blob.txt"
file_path = "path/to/your/local/file.txt"
# Create the BlobServiceClient object
blob_service_client = BlobServiceClient.from_connection_string(connection_string)
# Get a client to interact with a specific blob
blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob_name)
# Upload the blob
# Upload the local file as a block blob, overwriting any existing blob with the same name
with open(file_path, "rb") as data:
    blob_client.upload_blob(data, overwrite=True, content_settings=ContentSettings(content_type="text/plain"))
print(f"Blob '{blob_name}' uploaded successfully.")
Downloading a Blob
Retrieving data from blob storage is equally straightforward. You can download the entire blob or a portion of it. The following example uses the Azure SDK for .NET.
using Azure.Storage.Blobs;
using System;
using System.IO;
string connectionString = "YOUR_AZURE_STORAGE_CONNECTION_STRING";
string containerName = "my-container";
string blobName = "my-blob.txt";
string downloadFilePath = "path/to/save/downloaded_file.txt";
// Create the BlobServiceClient object
BlobServiceClient blobServiceClient = new BlobServiceClient(connectionString);
// Get a client to interact with a specific blob via its container
BlobContainerClient containerClient = blobServiceClient.GetBlobContainerClient(containerName);
BlobClient blobClient = containerClient.GetBlobClient(blobName);
// Download the blob
using (var stream = File.OpenWrite(downloadFilePath))
{
    await blobClient.DownloadToAsync(stream);
}
Console.WriteLine($"Blob '{blobName}' downloaded successfully to '{downloadFilePath}'.");
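The text above also mentions downloading only a portion of a blob. One way to do that is a ranged download; the sketch below uses the Azure SDK for Python, and the offset and length values are purely illustrative.
from azure.storage.blob import BlobServiceClient
blob_service_client = BlobServiceClient.from_connection_string("YOUR_AZURE_STORAGE_CONNECTION_STRING")
blob_client = blob_service_client.get_blob_client(container="my-container", blob="my-blob.txt")
# Download only the first 1024 bytes of the blob
downloader = blob_client.download_blob(offset=0, length=1024)
partial_content = downloader.readall()
print(f"Downloaded {len(partial_content)} bytes")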
Listing Blobs in a Container
You can retrieve a list of all blobs within a specific container. The following example uses the Azure SDK for JavaScript.
const { BlobServiceClient } = require("@azure/storage-blob");
const connectionString = "YOUR_AZURE_STORAGE_CONNECTION_STRING";
const containerName = "my-container";
async function listBlobs() {
  const blobServiceClient = BlobServiceClient.fromConnectionString(connectionString);
  const containerClient = blobServiceClient.getContainerClient(containerName);
  console.log("Blobs in container:");
  // listBlobsFlat() returns an async iterator over every blob in the container
  for await (const blob of containerClient.listBlobsFlat()) {
    console.log("- " + blob.name);
  }
}
listBlobs().catch(err => console.error(err));
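Listing also supports filtering by name prefix, which is handy when blobs are organized under virtual folders. A minimal sketch using the Azure SDK for Python, where the "logs/" prefix and container name are assumptions for illustration:
from azure.storage.blob import BlobServiceClient
blob_service_client = BlobServiceClient.from_connection_string("YOUR_AZURE_STORAGE_CONNECTION_STRING")
container_client = blob_service_client.get_container_client("my-container")
# List only blobs whose names start with the given prefix
for blob in container_client.list_blobs(name_starts_with="logs/"):
    print(blob.name)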
Blob Properties and Metadata
Each blob has system properties (e.g., ETag, last modified time) and can also store user-defined metadata. Metadata is stored as a collection of key-value pairs.
Note: Metadata is not indexed and cannot be queried. It's intended for storing application-specific information associated with the blob.
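As a rough illustration with the Azure SDK for Python, metadata can be set on a blob and read back alongside its system properties. The key-value pairs here are made-up examples.
from azure.storage.blob import BlobServiceClient
blob_service_client = BlobServiceClient.from_connection_string("YOUR_AZURE_STORAGE_CONNECTION_STRING")
blob_client = blob_service_client.get_blob_client(container="my-container", blob="my-blob.txt")
# Attach user-defined metadata to the blob (this replaces any existing metadata)
blob_client.set_blob_metadata({"department": "finance", "project": "quarterly-report"})
# Read system properties and metadata back
properties = blob_client.get_blob_properties()
print(properties.etag, properties.last_modified)
print(properties.metadata)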
Blob Access Tiers
Azure Blob Storage offers different access tiers to optimize costs based on data access frequency:
- Hot tier: For frequently accessed data.
- Cool tier: For infrequently accessed data that is stored for at least 30 days.
- Archive tier: An offline tier for rarely accessed data that can tolerate retrieval latency of several hours.
You can set the access tier for individual blobs, and you can configure a default access tier at the storage account level.
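For instance, moving a single blob to a cooler tier with the Azure SDK for Python might look like the sketch below; the connection string and blob names are placeholders, and "Cool" is just one of the valid tier values.
from azure.storage.blob import BlobServiceClient
blob_service_client = BlobServiceClient.from_connection_string("YOUR_AZURE_STORAGE_CONNECTION_STRING")
blob_client = blob_service_client.get_blob_client(container="my-container", blob="my-blob.txt")
# Move the blob to the Cool tier; other valid values include "Hot" and "Archive"
blob_client.set_standard_blob_tier("Cool")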
Security Considerations
Securing your blobs is paramount. Consider the following:
- Use Shared Access Signatures (SAS) for time-limited, delegated access (see the sketch after this list).
- Implement Azure Active Directory authentication for robust identity management.
- Enforce encryption at rest and in transit.
- Configure network access restrictions.
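A minimal sketch of generating a read-only, time-limited SAS for a single blob with the Azure SDK for Python, assuming the storage account key is available; the account, container, blob names, and one-hour expiry are placeholders.
from datetime import datetime, timedelta, timezone
from azure.storage.blob import generate_blob_sas, BlobSasPermissions
# Generate a SAS token that grants read-only access to one blob for one hour
sas_token = generate_blob_sas(
    account_name="mystorageaccount",
    container_name="my-container",
    blob_name="my-blob.txt",
    account_key="YOUR_STORAGE_ACCOUNT_KEY",
    permission=BlobSasPermissions(read=True),
    expiry=datetime.now(timezone.utc) + timedelta(hours=1),
)
# Append the token to the blob URL to share delegated, time-limited access
sas_url = f"https://mystorageaccount.blob.core.windows.net/my-container/my-blob.txt?{sas_token}"
print(sas_url)
Because the SAS carries its own permissions and expiry, the recipient never sees the account key, and access lapses automatically when the token expires.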