Azure Data Lake Storage Reference

This reference covers Azure Data Lake Storage, a service designed for big data analytics workloads. It stores large volumes of structured, semi-structured, and unstructured data.

Introduction

Azure Data Lake Storage is a scalable, secure data lake solution built on Azure Blob Storage. It provides a robust foundation for modern data analytics and AI scenarios.

Core Concepts

  • Data Lake: A centralized repository that allows you to store all your structured and unstructured data at any scale.
  • Hierarchical Namespace (HNS): A feature that organizes objects into a hierarchy of directories and files, giving object storage file system semantics and enabling efficient data management and analytics.
  • Access Control Lists (ACLs): Fine-grained permissions for files and directories, ensuring secure data access.
  • Performance Optimization: Designed for high-throughput, low-latency analytics workloads.
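
To make the hierarchical namespace concept concrete, the sketch below contrasts renaming a directory in a flat object namespace, where every key that shares the prefix has to be rewritten, with a hierarchical one, where only the directory entry moves. It is a toy in-memory model and not how Azure implements either case; the function names rename_flat and rename_hierarchical are hypothetical.

```python
def rename_flat(keys, old_prefix, new_prefix):
    """Rename a 'directory' in a flat namespace: one operation per matching key."""
    renamed = set()
    operations = 0
    for key in keys:
        if key.startswith(old_prefix + "/"):
            # Each object under the prefix must be rewritten individually.
            renamed.add(new_prefix + key[len(old_prefix):])
            operations += 1
        else:
            renamed.add(key)
    return renamed, operations


def rename_hierarchical(tree, old_name, new_name):
    """Rename a top-level directory in a nested dict: a single entry is moved."""
    tree = dict(tree)  # shallow copy so the caller's dict is left untouched
    tree[new_name] = tree.pop(old_name)
    return tree, 1


flat = {"raw/2024/a.csv", "raw/2024/b.csv", "curated/c.parquet"}
flat_result, flat_ops = rename_flat(flat, "raw", "landing")

tree = {
    "raw": {"2024": {"a.csv": None, "b.csv": None}},
    "curated": {"c.parquet": None},
}
tree_result, tree_ops = rename_hierarchical(tree, "raw", "landing")

print("flat operations:", flat_ops)
print("hierarchical operations:", tree_ops)
```

In the flat model the operation count grows with the number of objects under the prefix, which is why directory-level operations benefit from a hierarchical namespace.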

Azure Data Lake Storage Gen1

The first generation of Azure Data Lake Storage. Microsoft retired Gen1 on February 29, 2024, so use Azure Data Lake Storage Gen2 for new deployments and migrate any remaining Gen1 workloads.

Key features include:

  • Massive scalability
  • Security and compliance features
  • Integration with Azure analytics services

Azure Data Lake Storage Gen2

A set of capabilities dedicated to big data analytics, built on Azure Blob Storage. It combines the scalability of cloud object storage with file system semantics: data is organized into directories and files.

Key benefits of Gen2:

  • Hierarchical Namespace: Enables atomic directory-level operations such as rename and delete.
  • Cost-Effective Storage: Leverages Azure Blob Storage pricing.
  • Optimized for Analytics: Enhanced performance for analytics frameworks like Apache Hadoop and Spark.
  • POSIX-like ACLs: Read, write, and execute permissions on individual files and directories for users and groups.
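
As an illustration of the POSIX-like ACL format, the following sketch parses a short-form ACL string such as "user::rwx,group::r-x,other::---" into structured entries. It assumes the entry layout [default:]type:id:permissions; the AclEntry type and parse_acl function are hypothetical helpers, not part of any Azure SDK.

```python
from typing import List, NamedTuple


class AclEntry(NamedTuple):
    default: bool     # True when the entry carries the "default:" scope prefix
    kind: str         # "user", "group", "mask", or "other"
    entity_id: str    # empty string means the owning user or owning group
    permissions: str  # three characters: r or -, w or -, x or -


def parse_acl(acl: str) -> List[AclEntry]:
    """Parse a comma-separated short-form ACL string into AclEntry tuples."""
    entries = []
    for raw in acl.split(","):
        parts = raw.strip().split(":")
        default = parts[0] == "default"
        if default:
            parts = parts[1:]
        if len(parts) != 3:
            raise ValueError(f"malformed ACL entry: {raw!r}")
        kind, entity_id, perms = parts
        if kind not in {"user", "group", "mask", "other"}:
            raise ValueError(f"unknown ACL entry type: {kind!r}")
        # Each position must be its own letter (r, w, x in order) or a dash.
        if len(perms) != 3 or any(c not in (letter, "-") for c, letter in zip(perms, "rwx")):
            raise ValueError(f"invalid permissions: {perms!r}")
        entries.append(AclEntry(default, kind, entity_id, perms))
    return entries


entries = parse_acl("user::rwx,group::r-x,other::---")
for entry in entries:
    print(entry)
```

Validating the string locally before sending it to the service makes typos such as a misplaced permission letter easier to catch.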

Key Features and Capabilities

  • Scalability and Performance: Handles petabytes of data with high throughput.
  • Security: Encryption at rest and in transit, Microsoft Entra ID (formerly Azure AD) integration, Azure role-based access control (RBAC), and ACLs.
  • Data Tiering: Hot, cool, and archive access tiers for cost management.
  • Lifecycle Management: Automate data movement between tiers.
  • Replication Options: Locally-redundant storage (LRS), geo-redundant storage (GRS), and zone-redundant storage (ZRS).
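
The tiering and lifecycle items above can be sketched as a lifecycle management policy document. The example builds one rule as a Python dictionary and serializes it to JSON. The field names follow the policy layout as I understand it and should be checked against the current lifecycle management documentation; the rule name, path prefix, and day thresholds are made-up values, and tiering_rule is a hypothetical helper.

```python
import json


def tiering_rule(name, prefix, cool_after_days, archive_after_days, delete_after_days):
    """Build one lifecycle rule that moves data hot -> cool -> archive -> deleted."""
    if not cool_after_days < archive_after_days < delete_after_days:
        raise ValueError("thresholds must increase: cool < archive < delete")
    return {
        "enabled": True,
        "name": name,
        "type": "Lifecycle",
        "definition": {
            # Only block blobs whose path starts with the prefix are affected.
            "filters": {"blobTypes": ["blockBlob"], "prefixMatch": [prefix]},
            "actions": {
                "baseBlob": {
                    "tierToCool": {"daysAfterModificationGreaterThan": cool_after_days},
                    "tierToArchive": {"daysAfterModificationGreaterThan": archive_after_days},
                    "delete": {"daysAfterModificationGreaterThan": delete_after_days},
                }
            },
        },
    }


# Illustrative values only: tier after 30 days, archive after 90, delete after 365.
policy = {"rules": [tiering_rule("age-out-raw-logs", "myfilesystem/raw/logs", 30, 90, 365)]}
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

The resulting JSON is the kind of document that would be supplied when creating a management policy on the storage account.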

REST API Reference

Interact with Azure Data Lake Storage programmatically using its REST APIs. These APIs allow you to perform operations on your data and storage accounts.

Common operations include:

  • Account Management (Create, Delete, Get Properties)
  • Container/Filesystem Operations (Create, Delete, List)
  • Blob/File Operations (Upload, Download, Delete, Set Properties)
  • Access Control Management (Get/Set ACLs)
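
To show the shape of these requests, the sketch below builds the HTTP method and URL for three operations against the Data Lake (dfs) endpoint. It assumes the public cloud host suffix dfs.core.windows.net and placeholder account, filesystem, and path names; dfs_url is a hypothetical helper. Authentication and version headers, which real requests require, are omitted.

```python
from urllib.parse import quote, urlencode

DFS_SUFFIX = "dfs.core.windows.net"  # assumption: Azure public cloud


def dfs_url(account, filesystem, path="", **query):
    """Build a dfs endpoint URL for a filesystem or a path inside it."""
    url = f"https://{account}.{DFS_SUFFIX}/{quote(filesystem)}"
    if path:
        # quote() keeps "/" unescaped, so nested paths stay readable.
        url += "/" + quote(path.strip("/"))
    if query:
        url += "?" + urlencode(query)
    return url


# Create a filesystem.
create_fs = ("PUT", dfs_url("mystorageaccount", "myfilesystem", resource="filesystem"))

# List all paths in the filesystem recursively.
list_paths = ("GET", dfs_url("mystorageaccount", "myfilesystem",
                             resource="filesystem", recursive="true"))

# Read the ACL of a file (returned in response headers).
get_acl = ("HEAD", dfs_url("mystorageaccount", "myfilesystem",
                           "mydirectory/myfile.txt", action="getAccessControl"))

for method, url in (create_fs, list_paths, get_acl):
    print(method, url)
```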

View the full Azure Data Lake Storage REST API documentation

SDKs

Azure Data Lake Storage is supported by various SDKs for different programming languages, simplifying integration into your applications.

Language   Package Name / Library   Link
.NET       Azure.Storage.Blobs      Azure SDK for .NET
Java       azure-storage-blob       Azure SDK for Java
Python     azure-storage-blob       Azure SDK for Python
Node.js    @azure/storage-blob      Azure SDK for Node.js
Go         azblob                   Azure SDK for Go
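
As a rough illustration of SDK usage, the sketch below uploads a small text file from Python. It assumes the azure-storage-file-datalake and azure-identity packages, which are not listed in the table above, and a signed-in identity with write access to the account. The helper names account_url and upload_text, and all resource names, are hypothetical.

```python
def account_url(account_name: str) -> str:
    """Return the Data Lake (dfs) endpoint for an account in the public cloud."""
    return f"https://{account_name}.dfs.core.windows.net"


def upload_text(account_name, filesystem, directory, filename, text):
    """Create the directory if needed and upload text as a file inside it."""
    # Imported lazily so this module loads even when the SDK is not installed.
    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(
        account_url(account_name), credential=DefaultAzureCredential()
    )
    directory_client = service.get_file_system_client(filesystem).get_directory_client(directory)
    directory_client.create_directory()
    file_client = directory_client.get_file_client(filename)
    file_client.upload_data(text.encode("utf-8"), overwrite=True)
    return f"{directory}/{filename}"


print(account_url("mystorageaccount"))
```

A call such as upload_text("mystorageaccount", "myfilesystem", "mydirectory", "myfile.txt", "hello") would need real credentials and an existing filesystem, so it is not executed here.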

Azure CLI Commands

Manage Azure Data Lake Storage using the Azure Command-Line Interface (CLI).

Examples:

az storage fs --help
az storage fs list --account-name <storage-account-name>
az storage fs directory create --name <directory-name> --account-name <storage-account-name> --file-system <filesystem-name>
az storage fs file upload --account-name <storage-account-name> --file-system <filesystem-name> --source <local-file-path> --path <remote-file-path>

View all Azure CLI commands for storage

Azure PowerShell Cmdlets

Manage Azure Data Lake Storage using Azure PowerShell.

Examples:

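$ctx = (Get-AzStorageAccount -ResourceGroupName "myResourceGroup" -Name "mystorageaccount").Context
New-AzStorageContainer -Name "myfilesystem" -Context $ctx
$acl = Set-AzDataLakeGen2ItemAclObject -AccessControlType user -Permission rwx
$acl = Set-AzDataLakeGen2ItemAclObject -AccessControlType group -Permission r-x -InputObject $acl
$acl = Set-AzDataLakeGen2ItemAclObject -AccessControlType other -Permission "---" -InputObject $acl
Update-AzDataLakeGen2Item -Context $ctx -FileSystem "myfilesystem" -Path "mydirectory/myfile.txt" -Acl $acl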

View all Azure PowerShell cmdlets for storage

Pricing

Azure Data Lake Storage pricing is based on Azure Blob Storage pricing, which includes costs for:

  • Data stored (per GB per month)
  • Transactions (read, write, etc.)
  • Data egress
  • Redundancy options (LRS, GRS, ZRS)
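
The cost components above combine into a simple monthly estimate, sketched below. All rates are placeholder numbers chosen for easy arithmetic, not actual Azure prices, and the assumption that transactions are billed per 10,000 operations should be confirmed on the pricing page. Redundancy is modeled only implicitly, through whichever per-GB storage rate is supplied. Rates and monthly_cost are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    storage_per_gb_month: float  # depends on access tier and redundancy option
    per_10k_write_ops: float
    per_10k_read_ops: float
    egress_per_gb: float


def monthly_cost(rates, stored_gb, write_ops, read_ops, egress_gb):
    """Sum storage, transaction, and egress charges for one month."""
    storage = stored_gb * rates.storage_per_gb_month
    transactions = (
        (write_ops / 10_000) * rates.per_10k_write_ops
        + (read_ops / 10_000) * rates.per_10k_read_ops
    )
    egress = egress_gb * rates.egress_per_gb
    return round(storage + transactions + egress, 2)


# Placeholder rates for illustration only.
PLACEHOLDER_RATES = Rates(
    storage_per_gb_month=0.02,
    per_10k_write_ops=0.05,
    per_10k_read_ops=0.004,
    egress_per_gb=0.08,
)

estimate = monthly_cost(
    PLACEHOLDER_RATES,
    stored_gb=1000,
    write_ops=2_000_000,
    read_ops=10_000_000,
    egress_gb=50,
)
print(f"Estimated monthly cost: {estimate:.2f}")
```

With these placeholder inputs the breakdown is 20.00 for storage, 14.00 for transactions, and 4.00 for egress.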

For detailed pricing information, please refer to the official Azure pricing page.

Code Samples and Tutorials

Explore code samples and tutorials to learn how to use Azure Data Lake Storage for various scenarios.