Azure Data Lake Storage Gen2 SDK Reference
This document is a reference for the Azure Data Lake Storage Gen2 SDKs, which let you build data analytics solutions that interact with your data lake programmatically.
Introduction
Azure Data Lake Storage Gen2 (ADLS Gen2) is a highly scalable and secure data lake built on Azure Blob Storage. It is optimized for big data analytics workloads, offering features like hierarchical namespace, POSIX-like access control, and direct integration with Azure HDInsight, Azure Databricks, and Azure Synapse Analytics.
SDK Overview
The Azure Data Lake Storage Gen2 SDKs provide a managed, object-oriented interface to interact with ADLS Gen2 resources. They simplify common tasks such as creating file systems, uploading and downloading files, managing directories, and controlling access.
Supported Languages
- .NET
- Java
- Python
- JavaScript (Node.js and Browser)
- Go
Clients
The SDKs expose different client objects to manage various levels of ADLS Gen2 resources.
DataLakeFileSystemClient
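The DataLakeFileSystemClient operates on a single file system (container): you can create or delete it, read its properties, and obtain directory and file clients for the paths it contains. To work across the whole account, for example to list file systems, start from the account-level DataLakeServiceClient, as the code samples later in this document do.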
// Replace with your connection string
var connectionString = "DefaultEndpointsProtocol=https;AccountName=your_account_name;AccountKey=your_account_key;EndpointSuffix=core.windows.net";
var fileSystemClient = new DataLakeFileSystemClient(connectionString, "your_filesystem_name");
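To make the layout of the connection string above concrete, the following Python sketch splits it into its key/value fields and derives the account's Data Lake (`dfs`) endpoint. It uses only the standard library and is not part of any SDK: the helper names `parse_connection_string` and `dfs_endpoint` are hypothetical, and the sketch assumes the semicolon-separated `Key=Value` layout shown above.

```python
def parse_connection_string(conn_str):
    """Split 'Key=Value;Key=Value' into a dict (hypothetical helper, not an SDK call)."""
    fields = {}
    for part in conn_str.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing semicolon
        # Split on the first '=' only: account keys are base64 and may end in '='.
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"malformed segment: {part!r}")
        fields[key] = value
    return fields


def dfs_endpoint(fields):
    """Build the Data Lake endpoint URL from parsed fields (assumes the 'dfs' subdomain)."""
    protocol = fields.get("DefaultEndpointsProtocol", "https")
    suffix = fields.get("EndpointSuffix", "core.windows.net")
    return f"{protocol}://{fields['AccountName']}.dfs.{suffix}"


# Placeholder values copied from the sample above; not real credentials.
example = (
    "DefaultEndpointsProtocol=https;AccountName=your_account_name;"
    "AccountKey=your_account_key;EndpointSuffix=core.windows.net"
)
parsed = parse_connection_string(example)
print(dfs_endpoint(parsed))
```

In practice the SDK clients parse the connection string themselves; a helper like this is mainly useful for validating configuration before a client is constructed.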
DataLakeDirectoryClient
The DataLakeDirectoryClient represents a directory within a file system. It allows you to perform operations on directories and their contents.
DataLakeFileClient
The DataLakeFileClient represents a file within a file system, either at the root or inside a directory. It provides methods for uploading, downloading, appending to, and managing files.
Operations
File System Operations
- Create a file system
- Delete a file system
- List file systems
- Get file system properties
Directory Operations
- Create a directory
- Delete a directory
- List directories and files within a directory
- Get directory properties
- Set/Get directory metadata
File Operations
- Create/Upload a file
- Download a file
- Append data to a file
- Read data from a file
- Delete a file
- Get file properties
- Set/Get file metadata
- Set/Get file access control lists (ACLs)
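Appending in ADLS Gen2 works in two phases: each chunk is appended at an explicit byte offset, and the data becomes part of the file only after a flush at the final length. The Python sketch below plans those offsets for a payload of known size. It is a standard-library illustration, not SDK code; `plan_appends` and the chunk size are hypothetical choices made for the example.

```python
def plan_appends(total_size, chunk_size):
    """Return (offset, length) pairs covering total_size bytes in order.

    Hypothetical planning helper: each pair corresponds to one append call,
    and the final flush position equals total_size.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    plan = []
    offset = 0
    while offset < total_size:
        length = min(chunk_size, total_size - offset)
        plan.append((offset, length))
        offset += length  # the next chunk starts where this one ends
    return plan


payload = b"0123456789"
plan = plan_appends(len(payload), chunk_size=4)
for offset, length in plan:
    chunk = payload[offset:offset + length]
    # An SDK append call would send `chunk` at `offset` here.
    print(offset, chunk)
flush_position = len(payload)  # an SDK flush call would commit at this position
```

Keeping the offsets contiguous matters because a gap or overlap between appended chunks would leave the flushed file inconsistent with the source data.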
Access Control
ADLS Gen2 supports POSIX-like Access Control Lists (ACLs) for fine-grained permissions management.
- Set ACLs on files and directories
- Get ACLs for files and directories
- Modify ACLs
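ACLs are commonly written in a short text form such as `user::rwx,group::r-x,other::---`, where each comma-separated entry is `[default:]type:id:permissions`. The Python sketch below parses and re-serializes that form so the structure of an entry is explicit. It is a simplified illustration using only the standard library: `AclEntry`, `parse_acl`, and `format_acl` are hypothetical names rather than SDK types, the identifier is a placeholder, and the validation covers only plain `rwx` permissions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AclEntry:
    """One entry of a POSIX-style ACL (hypothetical type, not an SDK class)."""
    is_default: bool   # True for 'default:' entries, which apply to new children
    kind: str          # 'user', 'group', 'mask', or 'other'
    identifier: str    # empty string means the owning user or owning group
    permissions: str   # three characters, e.g. 'r-x'


def parse_acl(acl):
    """Parse the short text form into a list of AclEntry objects."""
    entries = []
    for raw in acl.split(","):
        parts = raw.strip().split(":")
        is_default = parts[0] == "default"
        if is_default:
            parts = parts[1:]
        if len(parts) != 3:
            raise ValueError(f"malformed ACL entry: {raw!r}")
        kind, identifier, perms = parts
        if kind not in {"user", "group", "mask", "other"}:
            raise ValueError(f"unknown ACL entry type: {kind!r}")
        # Each position may hold its letter or '-' (read, write, execute).
        valid = len(perms) == 3 and all(
            c in allowed for c, allowed in zip(perms, ("r-", "w-", "x-"))
        )
        if not valid:
            raise ValueError(f"invalid permissions: {perms!r}")
        entries.append(AclEntry(is_default, kind, identifier, perms))
    return entries


def format_acl(entries):
    """Serialize entries back to the short text form."""
    return ",".join(
        ("default:" if e.is_default else "") + f"{e.kind}:{e.identifier}:{e.permissions}"
        for e in entries
    )


# 'example-object-id' is a placeholder, not a real principal.
sample = "user::rwx,group::r-x,other::---,default:user:example-object-id:r--"
entries = parse_acl(sample)
print(format_acl(entries) == sample)
```

Modifying an ACL then amounts to reading the current entries, changing or adding the ones that matter, and writing the full list back.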
Code Samples
Explore practical examples of using the SDKs to perform common tasks.
Creating a File System and Uploading a File (.NET)
public void CreateFileSystemAndUploadFile(string connectionString, string fileSystemName, string directoryName, string fileName, BinaryData fileContent)
This method demonstrates creating a file system, a directory within it, and then uploading a file with provided content.
using System;
using Azure.Storage.Files.DataLake;

public class DataLakeExample
{
    public void CreateFileSystemAndUploadFile(string connectionString, string fileSystemName, string directoryName, string fileName, BinaryData fileContent)
    {
        // Account-level client; all other clients are derived from it
        var dataLakeServiceClient = new DataLakeServiceClient(connectionString);
        var fileSystemClient = dataLakeServiceClient.GetFileSystemClient(fileSystemName);

        // Create File System if it doesn't exist
        fileSystemClient.CreateIfNotExists();

        // Get Directory Client
        var directoryClient = fileSystemClient.GetDirectoryClient(directoryName);

        // Create Directory if it doesn't exist
        directoryClient.CreateIfNotExists();

        // Get File Client
        var fileClient = directoryClient.GetFileClient(fileName);

        // Upload File Content, replacing any existing file at this path
        using (var stream = fileContent.ToStream())
        {
            fileClient.Upload(stream, overwrite: true);
        }

        Console.WriteLine($"File '{fileName}' uploaded to '{fileSystemName}/{directoryName}/'");
    }
}
Downloading a File (.NET)
public Stream DownloadFile(string connectionString, string fileSystemName, string directoryName, string fileName)
This method retrieves a file from ADLS Gen2 as a readable stream. The caller is responsible for disposing the returned stream.
using System.IO;
using Azure.Storage.Files.DataLake;

public class DataLakeExample
{
    public Stream DownloadFile(string connectionString, string fileSystemName, string directoryName, string fileName)
    {
        var dataLakeServiceClient = new DataLakeServiceClient(connectionString);
        var fileSystemClient = dataLakeServiceClient.GetFileSystemClient(fileSystemName);
        var directoryClient = fileSystemClient.GetDirectoryClient(directoryName);
        var fileClient = directoryClient.GetFileClient(fileName);

        // Read() returns the file's metadata together with a content stream
        var downloadResult = fileClient.Read();

        // Content is already a Stream, so it is returned directly
        return downloadResult.Value.Content;
    }
}
Troubleshooting
Common issues and their resolutions:
- Authentication Errors: Ensure your connection string or credentials are correct and have the necessary permissions.
- Path Not Found: Verify that the file system, directory, and file paths are accurate.
- Rate Limiting: For high-throughput operations, consider implementing retry logic with exponential backoff.
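The retry advice above can be sketched as follows. This is a generic, standard-library Python illustration of exponential backoff with jitter, not the SDK's own retry mechanism: `TransientError`, `retry_with_backoff`, and the delay values are hypothetical, and the `sleep` parameter exists only so the example can run without real waiting.

```python
import random
import time


class TransientError(Exception):
    """Stand-in for a throttling or timeout failure (hypothetical, not an SDK type)."""


def retry_with_backoff(operation, max_attempts=5, base_delay=0.5,
                       max_delay=8.0, sleep=time.sleep):
    """Call operation(), retrying on TransientError with capped exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last failure
            # The cap doubles on each attempt: 0.5s, 1s, 2s, ... up to max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Jitter: wait a random time up to the cap so clients do not retry in lockstep.
            sleep(random.uniform(0, cap))


# Simulated operation that is throttled twice and then succeeds.
calls = {"count": 0}


def flaky_upload():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("throttled")
    return "uploaded"


waits = []  # record the delays instead of actually sleeping
result = retry_with_backoff(flaky_upload, sleep=waits.append)
print(result, len(waits))
```

The Azure SDK clients also ship with configurable retry policies, so a wrapper like this is mainly relevant for logic that spans several calls or that the built-in policy does not cover.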