Azure Data Lake Storage Gen2 .NET SDK API Reference

Introduction

This document provides the API reference for the Azure Data Lake Storage Gen2 .NET SDK. This SDK allows you to interact with your Azure Data Lake Storage Gen2 accounts programmatically using C#.

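Azure Data Lake Storage Gen2 is a set of capabilities built on Azure Blob Storage. Its hierarchical namespace organizes data into directories and subdirectories, as a file system does. This enables directory-level operations and POSIX-style access control lists, and makes it well suited to big data analytics workloads.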

Getting Started

To use the .NET SDK, you first need to install the relevant NuGet packages:


dotnet add package Azure.Storage.Files.DataLake
dotnet add package Azure.Identity
                

You can then authenticate with a Microsoft Entra ID credential from Azure.Identity (for example, DefaultAzureCredential) or with a storage account connection string.


// Example using DefaultAzureCredential
using System;
using Azure.Identity;
using Azure.Storage.Files.DataLake;

string accountName = "YOUR_STORAGE_ACCOUNT_NAME";
var credential = new DefaultAzureCredential();
// The Data Lake (dfs) endpoint of the storage account
var dataLakeUri = new Uri($"https://{accountName}.dfs.core.windows.net");
var dataLakeServiceClient = new DataLakeServiceClient(dataLakeUri, credential);

// Alternative: authenticate with a connection string instead
// string connectionString = "YOUR_CONNECTION_STRING";
// var dataLakeServiceClient = new DataLakeServiceClient(connectionString);
                
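If you start from a connection string, it can help to check which account it targets and which Data Lake (dfs) endpoint corresponds to it. The following is a minimal sketch of that check. It is not part of the SDK, because DataLakeServiceClient parses connection strings itself. The sketch assumes the usual semicolon-separated Key=Value format with an AccountName key and optional DefaultEndpointsProtocol and EndpointSuffix keys. The helper names ParseConnectionString, IsValidAccountName, and GetDfsEndpoint are hypothetical. The account-name check encodes the storage naming rule of 3 to 24 lowercase letters or digits.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not an SDK method): split a connection string into key/value pairs.
// AccountKey values are Base64 and may end in '=' padding, so split on the first '=' only.
static Dictionary<string, string> ParseConnectionString(string connectionString)
{
    var settings = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
    foreach (string part in connectionString.Split(';', StringSplitOptions.RemoveEmptyEntries))
    {
        int separator = part.IndexOf('=');
        if (separator <= 0)
        {
            throw new FormatException($"Malformed connection string segment: '{part}'");
        }
        settings[part.Substring(0, separator).Trim()] = part.Substring(separator + 1).Trim();
    }
    return settings;
}

// Storage account names: 3-24 characters, lowercase letters and digits only.
static bool IsValidAccountName(string name) =>
    name.Length >= 3 && name.Length <= 24 &&
    name.All(c => (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'));

// Hypothetical helper: derive the dfs endpoint, defaulting to https and core.windows.net.
static Uri GetDfsEndpoint(string connectionString)
{
    Dictionary<string, string> settings = ParseConnectionString(connectionString);
    if (!settings.TryGetValue("AccountName", out var accountName) || !IsValidAccountName(accountName))
    {
        throw new ArgumentException("Connection string has a missing or invalid AccountName.");
    }
    string protocol = settings.TryGetValue("DefaultEndpointsProtocol", out var p) ? p : "https";
    string suffix = settings.TryGetValue("EndpointSuffix", out var s) ? s : "core.windows.net";
    return new Uri($"{protocol}://{accountName}.dfs.{suffix}");
}

string example = "DefaultEndpointsProtocol=https;AccountName=mystorageacct;AccountKey=abc123==;EndpointSuffix=core.windows.net";
Console.WriteLine(GetDfsEndpoint(example).Host); // mystorageacct.dfs.core.windows.net
```

The endpoint produced this way has the same form as the dataLakeUri built from the account name in the example above.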

Core Concepts

DataLakeServiceClient

The primary client for interacting with the Azure Data Lake Storage Gen2 service. It provides methods for managing file systems (containers).

Namespace: Azure.Storage.Files.DataLake

Key Methods:

  • GetFileSystemClient(string fileSystemName): Returns a client for a specific file system.
  • CreateFileSystemAsync(string fileSystemName): Creates a new file system.
  • DeleteFileSystemAsync(string fileSystemName): Deletes a file system.
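File system names follow the same rules as Blob Storage container names:

  • 3 to 63 characters.
  • Only lowercase letters, digits, and hyphens.
  • Must start and end with a letter or digit.
  • No consecutive hyphens.

The service rejects a CreateFileSystemAsync call with an invalid name, so a local pre-check can produce a clearer error earlier. The sketch below assumes those rules. IsValidFileSystemName is a hypothetical helper, not an SDK method.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the SDK): check a file system name against the
// container naming rules before calling CreateFileSystemAsync.
static bool IsValidFileSystemName(string name)
{
    // Length must be 3-63 characters.
    if (name.Length < 3 || name.Length > 63)
    {
        return false;
    }
    // Lowercase letters and digits, with single hyphens only between them.
    // \z (rather than $) prevents a trailing newline from slipping through.
    return Regex.IsMatch(name, @"^[a-z0-9]+(-[a-z0-9]+)*\z");
}

foreach (string candidate in new[] { "myfilesystem", "raw-data-2024", "My-FileSystem", "a--b", "-data", "ab" })
{
    Console.WriteLine($"{candidate}: {IsValidFileSystemName(candidate)}");
}
```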

DataLakeFileClient

Represents a file within a Data Lake Storage Gen2 file system. It provides methods for file operations such as uploading, downloading, and managing metadata.

Namespace: Azure.Storage.Files.DataLake

Key Properties:

  • Name: The name of the file.
  • Uri: The URI of the file.

Key Methods:

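  • UploadAsync(Stream content, bool overwrite = false): Uploads data from a stream to the file. With overwrite left as false, the upload fails if the file already exists.
  • ReadAsync(): Downloads the file. The call returns a Response<FileDownloadInfo> whose Value.Content property holds the file content as a stream.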
  • DeleteAsync(): Deletes the file.
  • GetPropertiesAsync(): Retrieves metadata and properties of the file.
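Name and Uri are both derived from the file's path. Name is the last segment of the path. Uri combines the account's dfs endpoint, the file system name, and the path.

The sketch below reproduces that relationship locally for a file in a nested directory. The helpers BuildPathUri and GetNameFromPath are hypothetical. Escaping each segment with Uri.EscapeDataString is an assumption of this sketch and may differ in detail from the SDK's own encoding.

```csharp
using System;
using System.Linq;

// Hypothetical helper: build the URI a file client would point at. Each path segment
// is escaped separately so that '/' keeps its role as the directory separator.
static Uri BuildPathUri(Uri dfsEndpoint, string fileSystemName, string path)
{
    string escapedPath = string.Join("/", path.Split('/', StringSplitOptions.RemoveEmptyEntries)
                                              .Select(segment => Uri.EscapeDataString(segment)));
    return new Uri(dfsEndpoint, $"{fileSystemName}/{escapedPath}");
}

// Hypothetical helper: the file name is the last segment of the path.
static string GetNameFromPath(string path) =>
    path.TrimEnd('/').Split('/').Last();

var endpoint = new Uri("https://mystorageacct.dfs.core.windows.net");
string filePath = "data/raw/sales report.csv";

Console.WriteLine(GetNameFromPath(filePath)); // sales report.csv
Console.WriteLine(BuildPathUri(endpoint, "myfilesystem", filePath).AbsoluteUri);
```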

DataLakeDirectoryClient

Represents a directory within a Data Lake Storage Gen2 file system. It provides methods for directory operations such as creating, deleting, and listing contents.

Namespace: Azure.Storage.Files.DataLake

Key Properties:

  • Name: The name of the directory.
  • Uri: The URI of the directory.

Key Methods:

  • CreateSubDirectoryAsync(string path): Creates a new subdirectory under this directory.
  • GetFileClient(string fileName): Returns a client for a file within this directory.
  • GetSubDirectoryClient(string subdirectoryName): Returns a client for a subdirectory.
  • DeleteAsync(): Deletes the directory.
  • GetPathsAsync(): Lists the paths (files and subdirectories) within this directory.
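Clients obtained from a directory client address paths relative to that directory. For example, GetFileClient("mydata.csv") called on a client for data/processed refers to data/processed/mydata.csv.

The sketch below mirrors that composition with plain string handling. It also adds a helper that lists a path's ancestor directories, which in a hierarchical namespace exist as real directory entries. CombinePath and GetAncestorDirectories are hypothetical helpers, not SDK methods.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: join a directory path and a child name with a single '/'.
static string CombinePath(string directoryPath, string childName)
{
    string parent = directoryPath.Trim('/');
    string child = childName.Trim('/');
    return parent.Length == 0 ? child : $"{parent}/{child}";
}

// Hypothetical helper: return every ancestor directory of a path, outermost first.
static List<string> GetAncestorDirectories(string path)
{
    var ancestors = new List<string>();
    string[] segments = path.Trim('/').Split('/');
    // The last segment is the path itself, so stop one short of it.
    for (int i = 1; i < segments.Length; i++)
    {
        ancestors.Add(string.Join("/", segments, 0, i));
    }
    return ancestors;
}

string fullPath = CombinePath("data/processed", "mydata.csv");
Console.WriteLine(fullPath);                                            // data/processed/mydata.csv
Console.WriteLine(string.Join(", ", GetAncestorDirectories(fullPath))); // data, data/processed
```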

Key Operations

File Operations

Performing operations on files involves obtaining a DataLakeFileClient.


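using System;
using System.IO;
using Azure;
using Azure.Storage.Files.DataLake;
using Azure.Storage.Files.DataLake.Models;

// Get a file system client (dataLakeServiceClient is created as shown in Getting Started)
var fileSystemClient = dataLakeServiceClient.GetFileSystemClient("myfilesystem");

// Get a file client for a file named "mydata.csv"
DataLakeFileClient fileClient = fileSystemClient.GetFileClient("mydata.csv");

// Upload a local file, replacing the remote file if it already exists
using (FileStream stream = File.OpenRead("local_data.csv"))
{
    await fileClient.UploadAsync(stream, overwrite: true);
    Console.WriteLine("File uploaded successfully.");
}

// Download the file: ReadAsync returns Response<FileDownloadInfo>,
// whose Value.Content property is the file content as a stream
Response<FileDownloadInfo> download = await fileClient.ReadAsync();
using (Stream downloadStream = download.Value.Content)
using (FileStream outputStream = File.Create("downloaded_data.csv"))
{
    await downloadStream.CopyToAsync(outputStream);
    Console.WriteLine("File downloaded successfully.");
}

// Delete the file
await fileClient.DeleteAsync();
Console.WriteLine("File deleted.");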
                    
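UploadAsync reads the stream from its current position. A MemoryStream that has just been written to must therefore be rewound before it is passed in; otherwise nothing is uploaded.

The sketch below builds CSV content in memory and shows the rewind. So that it runs without a storage account, it uses SimulatedUploadAsync, a hypothetical stand-in for fileClient.UploadAsync that simply copies the stream into another MemoryStream.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

// Hypothetical stand-in for fileClient.UploadAsync: copies whatever the source
// stream yields from its current position, as an upload would.
static async Task<byte[]> SimulatedUploadAsync(Stream source)
{
    using var destination = new MemoryStream();
    await source.CopyToAsync(destination);
    return destination.ToArray();
}

using var content = new MemoryStream();
// leaveOpen: true keeps the MemoryStream usable after the writer is disposed (and flushed).
using (var writer = new StreamWriter(content, new UTF8Encoding(false), 1024, leaveOpen: true))
{
    await writer.WriteLineAsync("id,amount");
    await writer.WriteLineAsync("1,42.50");
}

// The stream is now positioned at its end, so an upload at this point sends no data.
byte[] withoutRewind = await SimulatedUploadAsync(content);
Console.WriteLine($"Bytes sent without rewind: {withoutRewind.Length}");

// Rewind before handing the stream to UploadAsync.
content.Position = 0;
byte[] withRewind = await SimulatedUploadAsync(content);
Console.WriteLine($"Bytes sent after rewind: {withRewind.Length}");
```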

Directory Operations

Managing directories is done through the DataLakeDirectoryClient.


using System;
using Azure.Storage.Files.DataLake;

// Get a file system client
var fileSystemClient = dataLakeServiceClient.GetFileSystemClient("myfilesystem");

// Create the directory "data/processed"
await fileSystemClient.CreateDirectoryAsync("data/processed");
Console.WriteLine("Directory 'data/processed' created.");

// Get a directory client for the created directory
DataLakeDirectoryClient directoryClient = fileSystemClient.GetDirectoryClient("data/processed");

// List the contents of the directory
await foreach (var pathItem in directoryClient.GetPathsAsync())
{
    Console.WriteLine($"Path: {pathItem.Name}, Is Directory: {pathItem.IsDirectory}");
}

// Delete the directory (for a non-empty directory, confirm the recursive-delete
// behavior of the SDK version you use)
await directoryClient.DeleteAsync();
Console.WriteLine("Directory 'data/processed' deleted.");
                    
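In a listing from GetPathsAsync, each item's Name is typically the full path from the file system root (for example data/processed/summary.csv). IsDirectory is a nullable bool.

The sketch below splits such a listing into directories and files and makes the names relative to the listed directory. To run without an account, it uses (Name, IsDirectory) tuples as stand-ins for PathItem. ToRelativeName is a hypothetical helper. Treating a null IsDirectory as a file is an assumption of this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: strip the listed directory's prefix from a full path name.
static string ToRelativeName(string directoryPath, string fullName)
{
    string prefix = directoryPath.Trim('/') + "/";
    return fullName.StartsWith(prefix, StringComparison.Ordinal)
        ? fullName.Substring(prefix.Length)
        : fullName;
}

// Tuples standing in for PathItem (Name, IsDirectory) values from GetPathsAsync.
var listing = new List<(string Name, bool? IsDirectory)>
{
    ("data/processed/2024", true),
    ("data/processed/summary.csv", false),
    ("data/processed/notes.txt", null),
};

// Only an explicit true counts as a directory; false and null are treated as files.
List<string> directories = listing.Where(p => p.IsDirectory == true)
                                  .Select(p => ToRelativeName("data/processed", p.Name))
                                  .ToList();
List<string> files = listing.Where(p => p.IsDirectory != true)
                            .Select(p => ToRelativeName("data/processed", p.Name))
                            .ToList();

Console.WriteLine($"Directories: {string.Join(", ", directories)}"); // Directories: 2024
Console.WriteLine($"Files: {string.Join(", ", files)}");             // Files: summary.csv, notes.txt
```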

Access Control (ACLs)

The SDK supports managing Access Control Lists (ACLs) for files and directories to control permissions.

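DataLakeFileClient and DataLakeDirectoryClient both expose two ACL methods:

  • GetAccessControlAsync(): Returns the owner, owning group, permissions, and ACL of a path.
  • SetAccessControlListAsync(): Sets the ACL. It fully replaces the existing ACL, so the list you pass must contain every entry you want to keep, not only the one being added.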


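using System;
using Azure.Storage.Files.DataLake.Models;

// Assuming 'fileClient' is an instance of DataLakeFileClient.
// SetAccessControlListAsync replaces the whole ACL, so the entries for the owning user,
// owning group, mask, and others are listed together with the named user entry.
// Replace YOUR_USER_OBJECT_ID with the user's Microsoft Entra object ID.
var acl = PathAccessControlExtensions.ParseAccessControlList(
    "user::rwx,user:YOUR_USER_OBJECT_ID:rwx,group::r-x,mask::rwx,other::---");

// Set the ACL
await fileClient.SetAccessControlListAsync(acl);
Console.WriteLine("ACL updated.");

// Get the ACL
var aclResult = await fileClient.GetAccessControlAsync();
Console.WriteLine("Current ACLs:");
foreach (PathAccessControlItem entry in aclResult.Value.AccessControlList)
{
    Console.WriteLine($"- {entry.AccessControlType} {entry.EntityId}: {entry.Permissions}");
}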
                    
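Each ACL entry uses the POSIX short form [default:]type:[id]:permissions:

  • type is user, group, mask, or other.
  • An empty id on a user or group entry refers to the owning user or group.
  • permissions are three characters drawn from r, w, x, or -.

The SDK already parses this format, for example through PathAccessControlExtensions.ParseAccessControlList. The sketch below is a hypothetical, simplified parser written only to make the structure of each entry visible. It maps permissions to octal-style bits (r = 4, w = 2, x = 1). ParseAclEntry and ParseAcl are not SDK methods.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified parser for one short-form ACL entry such as
// "user:alice:rwx" or "default:group::r-x". Not the SDK's implementation.
static (bool IsDefault, string Type, string Id, int Bits) ParseAclEntry(string entry)
{
    string[] parts = entry.Split(':');
    bool isDefault = parts.Length == 4 && parts[0] == "default";
    int offset = isDefault ? 1 : 0;
    if (parts.Length - offset != 3)
    {
        throw new FormatException($"Expected [default:]type:id:permissions, got '{entry}'");
    }

    string type = parts[offset];
    if (type != "user" && type != "group" && type != "mask" && type != "other")
    {
        throw new FormatException($"Unknown entry type '{type}'");
    }

    string permissions = parts[offset + 2];
    if (permissions.Length != 3)
    {
        throw new FormatException($"Permissions must be three characters, got '{permissions}'");
    }

    // Map "rwx" positions to bits: r = 4, w = 2, x = 1, '-' = 0.
    int bits = 0;
    string expected = "rwx";
    for (int i = 0; i < 3; i++)
    {
        if (permissions[i] == expected[i])
        {
            bits |= 4 >> i;
        }
        else if (permissions[i] != '-')
        {
            throw new FormatException($"Unexpected permission character '{permissions[i]}'");
        }
    }

    return (isDefault, type, parts[offset + 1], bits);
}

// A whole ACL is a comma-separated list of entries.
static List<(bool IsDefault, string Type, string Id, int Bits)> ParseAcl(string acl)
{
    var entries = new List<(bool IsDefault, string Type, string Id, int Bits)>();
    foreach (string entry in acl.Split(',', StringSplitOptions.RemoveEmptyEntries))
    {
        entries.Add(ParseAclEntry(entry.Trim()));
    }
    return entries;
}

foreach (var e in ParseAcl("user::rwx,group::r-x,other::---,user:alice:rw-"))
{
    string scope = e.IsDefault ? "default:" : "";
    Console.WriteLine($"{scope}{e.Type}:{e.Id} -> {e.Bits}");
}
```

In Data Lake Storage, the id of a named entry is normally a Microsoft Entra object ID rather than a display name such as alice.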

Code Examples

For more comprehensive examples, please refer to the official Azure SDK for .NET samples repository on GitHub.

Azure.Storage.Files.DataLake GitHub Repository

Additional documentation can be found on Microsoft Docs.