# Azure Data Lake Storage Gen2 (Blob)
## What is Data Lake Storage?
Azure Data Lake Storage (ADLS) Gen2 combines the capabilities of Azure Blob Storage with a hierarchical namespace. It provides massively scalable, secure, and cost‑effective storage for analytics workloads.
- Unlimited storage capacity
- High throughput and low latency
- Fine‑grained security with POSIX ACLs
- Seamless integration with Azure analytics services
## Getting Started
Follow the steps below to create a Data Lake Storage account and upload data using the Azure CLI.
```bash
# Create a resource group
az group create --name myResourceGroup --location eastus

# Create a storage account with the hierarchical namespace enabled
# (the flag is --hns, or equivalently --enable-hierarchical-namespace)
az storage account create \
  --name mystorageaccount \
  --resource-group myResourceGroup \
  --location eastus \
  --sku Standard_LRS \
  --kind StorageV2 \
  --hns true

# Create a filesystem (container); --auth-mode login authenticates the
# data-plane call with your Azure AD identity, which needs a role such
# as Storage Blob Data Contributor on the account
az storage container create \
  --account-name mystorageaccount \
  --name myfilesystem \
  --auth-mode login

# Upload a file
az storage blob upload \
  --account-name mystorageaccount \
  --container-name myfilesystem \
  --name data/sample.csv \
  --file ./sample.csv \
  --auth-mode login
```
## Key Concepts
| Concept | Description |
|---|---|
| Filesystem | Top‑level container that holds directories and files. |
| Directory | Virtual folder that can contain files and sub‑directories. |
| File | Blob stored inside a filesystem; supports operations like append, flush, and lease. |
| Hierarchical Namespace | Enables atomic directory operations and POSIX‑like ACLs. |
| Access Tiers | Hot, Cool, and Archive tiers for cost‑optimized storage. |
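Several of these concepts are easiest to see in code. The sketch below is minimal and illustrative (the filesystem and path names such as `data/events.log`, `staging`, and `published` are assumptions, and it uses the same `azure-storage-file-datalake` and `azure-identity` packages as the sample in the next section): it appends data to a file, commits it with a flush, and then renames a directory, which is a single atomic metadata operation when the hierarchical namespace is enabled.

```python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://mystorageaccount.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
fs = service.get_file_system_client(file_system="myfilesystem")

# Append + flush: data staged with append_data becomes visible
# only after flush_data commits it at the given file length
file_client = fs.get_file_client("data/events.log")
file_client.create_file()
chunk = b"first batch of records\n"
file_client.append_data(chunk, offset=0, length=len(chunk))
file_client.flush_data(len(chunk))

# Atomic rename: with a hierarchical namespace this is one metadata
# operation, not a copy-and-delete of every blob under the directory
dir_client = fs.get_directory_client("staging")
dir_client.create_directory()
dir_client.rename_directory(new_name=f"{fs.file_system_name}/published")
```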
## Code Sample: Python SDK
Install the SDK packages first: `pip install azure-storage-file-datalake azure-identity`.

```python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Authenticate with Azure AD; the identity needs a data-plane role
# such as Storage Blob Data Contributor on the account
service = DataLakeServiceClient(
    account_url="https://mystorageaccount.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
file_system_client = service.get_file_system_client(file_system="myfilesystem")

# Create a directory
directory_client = file_system_client.get_directory_client("logs/2025")
directory_client.create_directory()

# Upload a file, overwriting it if it already exists
file_client = directory_client.get_file_client("app.log")
with open("app.log", "rb") as data:
    file_client.upload_data(data, overwrite=True)

print("Upload complete")
```
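To confirm the upload and read the file back, you can reuse the clients from the sample above; a short sketch:

```python
# List everything under the directory to confirm the upload
for path in file_system_client.get_paths(path="logs/2025"):
    print(path.name)

# Download the file contents
downloaded = file_client.download_file().readall()
print(downloaded.decode("utf-8"))
```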
## Best Practices
- Enable hierarchical namespace for new accounts.
- Use Azure Role‑Based Access Control (RBAC) combined with ACLs for fine‑grained security (see the sketch after this list).
- Leverage lifecycle management policies to automatically transition data to lower‑cost tiers.
- Store data in region‑specific accounts to reduce latency for analytics workloads.
- Monitor storage metrics and set alerts for anomalous activity.
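As an illustration of the RBAC-plus-ACLs point above, the sketch below grants a specific Azure AD group read and execute access on a directory via a POSIX‑style ACL. The directory name and the group object ID are placeholders; RBAC role assignments themselves are managed separately (for example, in the portal or via `az role assignment create`).

```python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://mystorageaccount.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
directory_client = (
    service.get_file_system_client("myfilesystem").get_directory_client("logs")
)

# Owner gets rwx, owning group r-x, others nothing, plus one named
# group entry (the object ID below is a placeholder for a real
# Azure AD group): group:<object-id>:r-x
acl = "user::rwx,group::r-x,other::---,group:00000000-0000-0000-0000-000000000000:r-x"
directory_client.set_access_control(acl=acl)
```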