Azure Data Lake Storage Reference
This is the reference documentation for Azure Data Lake Storage, a service designed for big data analytics workloads that stores large volumes of structured, semi-structured, and unstructured data.
Introduction
Azure Data Lake Storage is a scalable, secure data lake solution built on Azure Blob Storage. It provides a robust foundation for modern data analytics and AI scenarios.
Core Concepts
- Data Lake: A centralized repository that allows you to store all your structured and unstructured data at any scale.
- Hierarchical Namespace (HNS): A feature that organizes objects into a hierarchy of directories and files, giving object storage file system semantics for efficient data management and analytics.
- Access Control Lists (ACLs): Fine-grained permissions for files and directories, ensuring secure data access.
- Performance Optimization: Designed for high throughput and low latency analytics workloads.
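To make the hierarchical namespace concept concrete, here is a toy in-memory model, not a description of how the service is implemented. It contrasts renaming a "directory" in a flat key space, where every matching key has to be rewritten, with renaming a directory node in a tree, which relinks a single entry. The function names and sample paths are illustrative assumptions, and the tree version only handles top-level directories.

```python
def rename_prefix_flat(store: dict, old: str, new: str) -> int:
    """Flat namespace: directories are only key prefixes, so each matching key is rewritten."""
    moved = [key for key in store if key.startswith(old + "/")]
    for key in moved:
        store[new + key[len(old):]] = store.pop(key)
    return len(moved)  # number of objects touched


def rename_dir_tree(root: dict, old: str, new: str) -> int:
    """Hierarchical namespace: the directory is a node, so a rename relinks one entry."""
    root[new] = root.pop(old)
    return 1  # one entry touched, regardless of how many files the directory holds


# Hypothetical sample data: the same three files in both models.
flat = {"raw/a.csv": 1, "raw/b.csv": 2, "curated/c.csv": 3}
tree = {"raw": {"a.csv": 1, "b.csv": 2}, "curated": {"c.csv": 3}}

flat_ops = rename_prefix_flat(flat, "raw", "staged")
tree_ops = rename_dir_tree(tree, "raw", "staged")
print(f"flat rename touched {flat_ops} keys, tree rename touched {tree_ops} entry")
```

In the flat model the cost grows with the number of objects under the prefix, and a failure partway through leaves the rename half done. Avoiding that is the motivation for directory-level operations on a real hierarchy.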
Azure Data Lake Storage Gen1
The first generation of Azure Data Lake Storage. Gen1 was retired on February 29, 2024; use Azure Data Lake Storage Gen2 for new deployments.
Key features included:
- Massive scalability
- Security and compliance features
- Integration with Azure analytics services
Azure Data Lake Storage Gen2
A set of capabilities dedicated to big data analytics, built on Azure Blob Storage. It combines the scalability of cloud object storage with file system semantics, including directories and files.
Key benefits of Gen2:
- Hierarchical Namespace: Treats directories as first-class objects, so operations such as renaming or deleting a directory are atomic.
- Cost-Effective Storage: Leverages Azure Blob Storage pricing.
- Optimized for Analytics: Enhanced performance for analytics frameworks like Apache Hadoop and Spark.
- POSIX-like ACLs: Enhanced security and access control.
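POSIX-like ACLs are commonly written in a short text form such as `user::rwx,group::r-x,other::---`, where each entry is `[default:]scope:qualifier:permissions`. The sketch below parses and validates that form. The `AclEntry` type and `parse_acl` function are illustrative names, not part of any Azure SDK, and the validation is simplified (for example, it does not handle the sticky bit).

```python
from __future__ import annotations

from dataclasses import dataclass

VALID_SCOPES = {"user", "group", "mask", "other"}


@dataclass(frozen=True)
class AclEntry:
    is_default: bool   # True for "default:" entries, which apply to new child items
    scope: str         # user, group, mask, or other
    qualifier: str     # ID of a named user or group; empty for the owning user/group
    permissions: str   # three characters such as "r-x"


def parse_acl(acl: str) -> list[AclEntry]:
    """Parse a short-form ACL string into entries, validating each field."""
    entries = []
    for raw in acl.split(","):
        parts = raw.strip().split(":")
        is_default = parts[0] == "default"
        if is_default:
            parts = parts[1:]
        if len(parts) != 3:
            raise ValueError(f"malformed ACL entry: {raw!r}")
        scope, qualifier, perms = parts
        if scope not in VALID_SCOPES:
            raise ValueError(f"unknown scope in {raw!r}")
        # Each position must be its own flag (r, w, x) or a dash.
        if len(perms) != 3 or any(c not in (flag, "-") for c, flag in zip(perms, "rwx")):
            raise ValueError(f"bad permissions in {raw!r}")
        entries.append(AclEntry(is_default, scope, qualifier, perms))
    return entries


for entry in parse_acl("user::rwx,group::r-x,other::---"):
    print(entry)
```

Validating the string locally before sending it to the service can surface typos earlier, although the service remains the authority on what it accepts.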
Key Features and Capabilities
- Scalability and Performance: Handles petabytes of data with high throughput.
- Security: Encryption at rest and in transit, Microsoft Entra ID (formerly Azure Active Directory) integration, Azure role-based access control (Azure RBAC), and ACLs.
- Data Tiering: Hot, cool, and archive access tiers for cost management.
- Lifecycle Management: Automate data movement between tiers.
- Replication Options: Locally-redundant storage (LRS), geo-redundant storage (GRS), and zone-redundant storage (ZRS).
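Lifecycle management is configured through a JSON policy made up of rules. As a minimal sketch, the snippet below builds one rule that moves block blobs under a prefix to the cool tier and later to the archive tier based on days since last modification. The rule name, prefix, and day thresholds are hypothetical placeholders, and `tiering_rule` is an illustrative helper, not an SDK function.

```python
import json


def tiering_rule(name: str, prefix: str, cool_after_days: int, archive_after_days: int) -> dict:
    """Build one lifecycle rule that tiers block blobs under `prefix` as they age."""
    if not 0 <= cool_after_days < archive_after_days:
        raise ValueError("cool threshold must be non-negative and below the archive threshold")
    return {
        "enabled": True,
        "name": name,
        "type": "Lifecycle",
        "definition": {
            "filters": {"blobTypes": ["blockBlob"], "prefixMatch": [prefix]},
            "actions": {
                "baseBlob": {
                    "tierToCool": {"daysAfterModificationGreaterThan": cool_after_days},
                    "tierToArchive": {"daysAfterModificationGreaterThan": archive_after_days},
                }
            },
        },
    }


# Hypothetical example: tier raw logs to cool after 30 days and archive after 180.
policy = {"rules": [tiering_rule("age-out-raw-logs", "myfilesystem/raw/logs", 30, 180)]}
print(json.dumps(policy, indent=2))
```

The printed JSON is the kind of document a storage account's management policy expects. Check the thresholds and tier availability against your account's redundancy and region before applying anything like it.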
REST API Reference
Interact with Azure Data Lake Storage programmatically using its REST APIs. These APIs allow you to perform operations on your data and storage accounts.
Common operations include:
- Account Management (Create, Delete, Get Properties)
- Container/Filesystem Operations (Create, Delete, List)
- Blob/File Operations (Upload, Download, Delete, Set Properties)
- Access Control Management (Get/Set ACLs)
View the full Azure Data Lake Storage REST API documentation
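As a rough illustration of how these operations map onto HTTP, the sketch below builds (but does not send) two requests against the `dfs` endpoint: creating a directory and setting an ACL. The account, file system, and path names are placeholders and the helper functions are illustrative. A real call also needs an `Authorization` header and an `x-ms-version` header, which are omitted here.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

DFS_SUFFIX = "dfs.core.windows.net"  # endpoint suffix for the public Azure cloud


def path_url(account: str, filesystem: str, path: str = "", **query: str) -> str:
    """Build the URL for a file system or a path inside it."""
    url = f"https://{account}.{DFS_SUFFIX}/{quote(filesystem)}"
    if path:
        url += "/" + quote(path.strip("/"))
    if query:
        url += "?" + urlencode(query)
    return url


def create_directory_request(account: str, filesystem: str, directory: str) -> Request:
    """PUT with resource=directory creates a directory."""
    return Request(path_url(account, filesystem, directory, resource="directory"), method="PUT")


def set_acl_request(account: str, filesystem: str, path: str, acl: str) -> Request:
    """PATCH with action=setAccessControl sets the ACL carried in the x-ms-acl header."""
    url = path_url(account, filesystem, path, action="setAccessControl")
    return Request(url, method="PATCH", headers={"x-ms-acl": acl})


mkdir = create_directory_request("mystorageaccount", "myfilesystem", "mydirectory")
set_acl = set_acl_request(
    "mystorageaccount", "myfilesystem", "mydirectory/myfile.txt",
    "user::rwx,group::r-x,other::---",
)
print(mkdir.get_method(), mkdir.full_url)
print(set_acl.get_method(), set_acl.full_url)
```

Building the request separately from sending it keeps the URL and header logic easy to inspect. In practice, the SDKs handle authentication, versioning, and retries for you.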
SDKs
Azure Data Lake Storage is supported by various SDKs for different programming languages, simplifying integration into your applications.
| Language | Package Name / Library | Link |
|---|---|---|
| .NET | Azure.Storage.Blobs | Azure SDK for .NET |
| Java | azure-storage-blob | Azure SDK for Java |
| Python | azure-storage-blob | Azure SDK for Python |
| Node.js | @azure/storage-blob | Azure SDK for Node.js |
| Go | github.com/Azure/azure-sdk-for-go/sdk/storage/azblob | Azure SDK for Go |
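For a sense of what SDK usage looks like, here is a minimal Python sketch. It assumes the dedicated Data Lake package `azure-storage-file-datalake` together with `azure-identity`, rather than the blob package listed in the table, and both must be installed separately. The account, file system, and path names are placeholders. The imports sit inside `connect` so that `upload_text` can be exercised without the packages present.

```python
def connect(account_name: str):
    """Return a DataLakeServiceClient authenticated with the ambient Azure identity."""
    # Imported lazily: requires azure-storage-file-datalake and azure-identity.
    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    return DataLakeServiceClient(
        account_url=f"https://{account_name}.dfs.core.windows.net",
        credential=DefaultAzureCredential(),
    )


def upload_text(file_system_client, remote_path: str, text: str) -> int:
    """Upload a string as a file, replacing any existing file; return the byte count."""
    data = text.encode("utf-8")
    file_client = file_system_client.get_file_client(remote_path)
    file_client.upload_data(data, overwrite=True)
    return len(data)
```

A typical call sequence would be `service = connect("mystorageaccount")`, then `fs = service.get_file_system_client("myfilesystem")`, then `upload_text(fs, "mydirectory/hello.txt", "hello")`. Passing the file system client in as a parameter also makes the upload logic easy to test with a stand-in object.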
Azure CLI Commands
Manage Azure Data Lake Storage using the Azure Command-Line Interface (CLI).
Examples:
az storage fs --help
az storage fs list --account-name <storage-account-name>
az storage fs directory create --name <directory-name> --account-name <storage-account-name> --file-system <filesystem-name>
az storage fs file upload --account-name <storage-account-name> --file-system <filesystem-name> --source <local-file-path> --path <remote-file-path>
View all Azure CLI commands for storage
Azure PowerShell Cmdlets
Manage Azure Data Lake Storage using Azure PowerShell.
Examples:
Get-AzStorageAccount -ResourceGroupName "myResourceGroup" -AccountName "mystorageaccount"
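New-AzStorageContainer -Name "myfilesystem" -Context $ctx
$acl = Set-AzDataLakeGen2ItemAclObject -AccessControlType user -Permission "rwx"
$acl = Set-AzDataLakeGen2ItemAclObject -AccessControlType group -Permission "r-x" -InputObject $acl
$acl = Set-AzDataLakeGen2ItemAclObject -AccessControlType other -Permission "---" -InputObject $acl
Update-AzDataLakeGen2Item -Context $ctx -FileSystem "myfilesystem" -Path "mydirectory/myfile.txt" -Acl $acl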
View all Azure PowerShell cmdlets for storage
Pricing
Azure Data Lake Storage pricing is based on Azure Blob Storage pricing, which includes costs for:
- Data stored (per GB per month)
- Transactions (read, write, etc.)
- Data egress
- Redundancy options (LRS, GRS, ZRS)
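The cost components above combine into a simple sum. The sketch below shows the arithmetic only. The rates are made-up placeholders, not real Azure prices, and the redundancy option is assumed to be reflected in the per-GB storage rate you supply. `Rates` and `monthly_cost` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    storage_per_gb_month: float   # depends on access tier and redundancy option
    per_10k_transactions: float   # simplified: one blended rate for all operation types
    egress_per_gb: float          # outbound data transfer


def monthly_cost(rates: Rates, stored_gb: float, transactions: int, egress_gb: float) -> float:
    """Sum storage, transaction, and egress charges for one month."""
    storage = stored_gb * rates.storage_per_gb_month
    operations = transactions / 10_000 * rates.per_10k_transactions
    egress = egress_gb * rates.egress_per_gb
    return storage + operations + egress


# Placeholder rates for illustration only; look up current prices before estimating.
PLACEHOLDER_RATES = Rates(storage_per_gb_month=0.02, per_10k_transactions=0.05, egress_per_gb=0.08)
estimate = monthly_cost(PLACEHOLDER_RATES, stored_gb=1_000, transactions=2_000_000, egress_gb=50)
print(f"estimated monthly cost: {estimate:.2f}")
```

Real bills separate read, write, and other operation types and vary by tier and region, so treat this as a template for the calculation and not as an estimate.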
For detailed pricing information, please refer to the official Azure pricing page.
View Azure Data Lake Storage Pricing
Code Samples and Tutorials
Explore code samples and tutorials to learn how to use Azure Data Lake Storage for various scenarios.