Azure Data Lake Storage Gen2 (Blob)

What is Data Lake Storage?

Azure Data Lake Storage (ADLS) Gen2 combines the capabilities of Azure Blob Storage with a hierarchical namespace. It provides massively scalable, secure, and cost-effective storage for analytics workloads.

  • Unlimited storage capacity
  • High throughput and low latency
  • Fine‑grained security with POSIX ACLs
  • Seamless integration with Azure analytics services

Getting Started

Follow the steps below to create a Data Lake Storage account and upload data using the Azure CLI.

# Create a resource group
az group create --name myResourceGroup --location eastus

# Create a storage account with hierarchical namespace enabled
az storage account create \
  --name mystorageaccount \
  --resource-group myResourceGroup \
  --location eastus \
  --sku Standard_LRS \
  --kind StorageV2 \
  --enable-hierarchical-namespace true

# Create a filesystem (container)
az storage container create \
  --account-name mystorageaccount \
  --name myfilesystem \
  --auth-mode login

# Upload a file
az storage blob upload \
  --account-name mystorageaccount \
  --container-name myfilesystem \
  --name data/sample.csv \
  --file ./sample.csv \
  --auth-mode login

Key Concepts

  • Filesystem: Top-level container that holds directories and files.
  • Directory: Virtual folder that can contain files and sub-directories.
  • File: Blob stored inside a filesystem; supports operations like append, flush, and lease.
  • Hierarchical Namespace: Enables atomic directory operations and POSIX-like ACLs (see the rename sketch below).
  • Access Tiers: Hot, Cool, and Archive tiers for cost-optimized storage.
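
Because the hierarchical namespace makes directories first-class objects, renaming or moving a directory is a single atomic metadata operation instead of a per-blob copy and delete. Here is a minimal sketch using the Python SDK, assuming the account and filesystem names from the examples above and an existing logs/2025 directory:

from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://mystorageaccount.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
file_system_client = service.get_file_system_client(file_system="myfilesystem")

# Atomically rename logs/2025 to logs/2025-archived.
# new_name must be prefixed with the destination filesystem name.
directory_client = file_system_client.get_directory_client("logs/2025")
directory_client.rename_directory(new_name="myfilesystem/logs/2025-archived")

On a flat blob namespace, the same move would require copying and deleting every blob under the prefix, which is neither atomic nor constant-time.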

Code Sample: Python SDK

from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Authenticate with Azure AD; DefaultAzureCredential resolves environment,
# managed identity, or Azure CLI credentials.
service = DataLakeServiceClient(
    account_url="https://mystorageaccount.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
file_system_client = service.get_file_system_client(file_system="myfilesystem")

# Create a directory
directory_client = file_system_client.get_directory_client("logs/2025")
directory_client.create_directory()

# Upload a file
file_client = directory_client.get_file_client("app.log")
with open("app.log", "rb") as data:
    file_client.upload_data(data, overwrite=True)

print("Upload complete")
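
Reading the data back uses the same file client. This short sketch continues from the sample above and downloads the file into memory:

# Download the file and read all bytes into memory
downloaded = file_client.download_file()
contents = downloaded.readall()
print(f"Read {len(contents)} bytes from the data lake")

For large files, prefer streaming the download (for example with readinto() against a local file object) rather than loading everything into memory at once.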

Best Practices

  1. Enable hierarchical namespace for new accounts.
  2. Use Azure Role-Based Access Control (RBAC) combined with ACLs for fine-grained security; a short ACL sketch follows this list.
  3. Leverage lifecycle management policies to automatically transition data to lower‑cost tiers.
  4. Place storage accounts in the same region as the compute that consumes them to reduce latency for analytics workloads.
  5. Monitor storage metrics and set alerts for anomalous activity.
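
To illustrate item 2, the sketch below layers a POSIX ACL entry on top of an RBAC role assignment, granting a security principal read and execute access on a directory. The object ID is a placeholder GUID and the directory comes from the earlier samples; the coarse-grained RBAC role (for example, Storage Blob Data Reader) is assigned separately.

from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://mystorageaccount.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
directory_client = service.get_file_system_client("myfilesystem").get_directory_client("logs")

# Keep the standard owner/group/other entries and add read + execute
# for one Azure AD object ID (placeholder GUID below).
directory_client.set_access_control(
    acl="user::rwx,group::r-x,other::---,user:00000000-0000-0000-0000-000000000000:r-x"
)

RBAC determines what a principal can do at the account or container scope; ACLs then refine access down to individual directories and files.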

Resources