Introduction to Azure Data Lake Storage Gen2 Java SDK
This document provides a comprehensive reference for the Azure Data Lake Storage Gen2 (ADLS Gen2) Java SDK. ADLS Gen2 is a set of capabilities built on Azure Blob Storage, designed for big data analytics workloads. The Java SDK allows you to interact with your ADLS Gen2 data programmatically from your Java applications.
Getting Started
To begin using the ADLS Gen2 Java SDK, you need to include the necessary dependency in your project.
Maven Dependency
<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-storage-file-datalake</artifactId>
    <version>12.18.0</version> <!-- Check for the latest version -->
</dependency>
Gradle Dependency
implementation 'com.azure:azure-storage-file-datalake:12.18.0' // Check for the latest version
Once the dependency is added, you can start writing Java code to interact with ADLS Gen2.
Core Concepts
- Storage Account: The top-level container for all Azure Storage objects.
- File System: A namespace within a storage account. Similar to a folder at the root of a file system.
- Directory: A hierarchical structure within a file system, used for organizing files.
- File: The data object stored within a directory.
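As a quick orientation, here is a minimal sketch of how these concepts map onto the SDK's client hierarchy; the connection string, file system name, and paths below are illustrative, and obtaining a client does not by itself create the underlying resource.
import com.azure.storage.file.datalake.DataLakeDirectoryClient;
import com.azure.storage.file.datalake.DataLakeFileClient;
import com.azure.storage.file.datalake.DataLakeFileSystemClient;
import com.azure.storage.file.datalake.DataLakeServiceClient;
import com.azure.storage.file.datalake.DataLakeServiceClientBuilder;
// ... within a method ...
// Storage account level: the service client is the entry point.
DataLakeServiceClient serviceClient = new DataLakeServiceClientBuilder()
    .connectionString(System.getenv("AZURE_STORAGE_CONNECTION_STRING"))
    .buildClient();
// File system level (illustrative name).
DataLakeFileSystemClient fileSystemClient = serviceClient.getFileSystemClient("myfilesystem");
// Directory level (illustrative path).
DataLakeDirectoryClient directoryClient = fileSystemClient.getDirectoryClient("my/nested/directory");
// File level (illustrative name).
DataLakeFileClient fileClient = directoryClient.getFileClient("mydata.txt");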
Authentication
You can authenticate with ADLS Gen2 using several methods:
- Connection String: A simple way to connect using a pre-defined connection string.
- Shared Key Access: Using your storage account's access keys.
- Azure Identity: Recommended for production environments, using managed identities or service principals for secure authentication.
The SDK provides builder classes such as DataLakeServiceClientBuilder to configure these authentication methods.
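As a minimal sketch of the Azure Identity approach (this assumes the azure-identity dependency is on the classpath; the account name in the endpoint is illustrative):
import com.azure.identity.DefaultAzureCredential;
import com.azure.identity.DefaultAzureCredentialBuilder;
import com.azure.storage.file.datalake.DataLakeServiceClient;
import com.azure.storage.file.datalake.DataLakeServiceClientBuilder;
// ... within a method ...
// DefaultAzureCredential tries environment variables, managed identity, Azure CLI login, and other sources in turn.
DefaultAzureCredential credential = new DefaultAzureCredentialBuilder().build();
DataLakeServiceClient serviceClient = new DataLakeServiceClientBuilder()
    .endpoint("https://mystorageaccount.dfs.core.windows.net") // illustrative account name
    .credential(credential)
    .buildClient();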
Service Client
The primary client for interacting with ADLS Gen2 is the DataLakeServiceClient. You can obtain a file system client (DataLakeFileSystemClient) from the service client.
import com.azure.storage.file.datalake.DataLakeFileSystemClient;
import com.azure.storage.file.datalake.DataLakeServiceClient;
import com.azure.storage.file.datalake.DataLakeServiceClientBuilder;
import com.azure.storage.file.datalake.models.DataLakeStorageException;

public class AdlsClientExample {
    public static void main(String[] args) {
        final String connectionString = System.getenv("AZURE_STORAGE_CONNECTION_STRING");
        final String fileSystemName = "myfilesystem";

        if (connectionString == null || connectionString.isEmpty()) {
            System.err.println("Please set the AZURE_STORAGE_CONNECTION_STRING environment variable.");
            return;
        }

        DataLakeServiceClient serviceClient = new DataLakeServiceClientBuilder()
            .connectionString(connectionString)
            .buildClient();

        DataLakeFileSystemClient fileSystemClient = serviceClient.getFileSystemClient(fileSystemName);

        try {
            if (!fileSystemClient.exists()) {
                System.out.println("Creating file system: " + fileSystemName);
                fileSystemClient.create();
            } else {
                System.out.println("File system " + fileSystemName + " already exists.");
            }
        } catch (DataLakeStorageException e) {
            if (e.getStatusCode() == 409) {
                System.out.println("File system already exists.");
            } else {
                System.err.println("Error creating file system: " + e.getMessage());
                e.printStackTrace();
            }
        }
    }
}
File System Operations
Manage your file systems using the DataLakeFileSystemClient.
Create File System
Creates a new file system.
fileSystemClient.create();
List File Systems
Retrieves a list of all file systems in the storage account.
for (var fileSystem : serviceClient.listFileSystems()) {
    System.out.println(fileSystem.getName());
}
Delete File System
Deletes an existing file system and all its contents.
fileSystemClient.delete();
Directory Operations
Organize your data using directories. Get a DataLakeDirectoryClient from the file system client.
Create Directory
Creates a new directory within a file system.
final String directoryName = "my/nested/directory";
DataLakeDirectoryClient directoryClient = fileSystemClient.getDirectoryClient(directoryName);
directoryClient.create();
List Directories
Lists directories within a specified path.
import com.azure.storage.file.datalake.models.ListPathsOptions;
// ... within a method ...
ListPathsOptions options = new ListPathsOptions().setPath("my/nested");
for (var path : fileSystemClient.listPaths(options, null)) {
    if (path.isDirectory()) {
        System.out.println(path.getName());
    }
}
Delete Directory
Deletes a directory and all its contents. Pass recursive = true to delete non-empty directories.
import com.azure.core.util.Context;
// ... within a method ...
directoryClient.delete(); // fails if the directory is not empty
// For non-empty directories, pass recursive = true
fileSystemClient.deleteDirectoryWithResponse("my/directory", true, null, null, Context.NONE);
File Operations
Handle file uploads, downloads, and reads. Get a DataLakeFileClient from the file system client.
Upload File
Uploads a local file to ADLS Gen2.
import com.azure.storage.file.datalake.DataLakeFileClient;
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
// ... within a method ...
final String fileName = "mydata.txt";      // destination file name in ADLS Gen2
final String filePath = "upload/data.txt"; // local source path
File localFile = new File(filePath);
try (FileInputStream fileStream = new FileInputStream(localFile)) {
    DataLakeFileClient fileClient = fileSystemClient.getFileClient(fileName);
    fileClient.upload(new BufferedInputStream(fileStream), localFile.length());
    System.out.println("Successfully uploaded " + fileName);
} catch (IOException e) {
    e.printStackTrace();
}
Download File
Downloads a file from ADLS Gen2 to a local path.
import com.azure.storage.file.datalake.DataLakeFileClient;
import java.io.FileOutputStream;
import java.io.IOException;
// ... within a method ...
final String downloadPath = "download/data.txt"; // local destination path
final String remoteFileName = "mydata.txt";      // source file in ADLS Gen2
try (FileOutputStream outputStream = new FileOutputStream(downloadPath)) {
    DataLakeFileClient fileClient = fileSystemClient.getFileClient(remoteFileName);
    fileClient.read(outputStream);
    System.out.println("Successfully downloaded " + remoteFileName);
} catch (IOException e) {
    e.printStackTrace();
}
Read File Content
Reads the content of a file directly into a string or stream.
import com.azure.storage.file.datalake.DataLakeFileClient;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
// ... within a method ...
DataLakeFileClient fileClient = fileSystemClient.getFileClient("path/to/your/file.txt");
// Read as String
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
fileClient.read(outputStream);
String fileContent = new String(outputStream.toByteArray(), StandardCharsets.UTF_8);
System.out.println("File content: " + fileContent);
// Read through an InputStream
try (InputStream inputStream = fileClient.openInputStream().getInputStream()) {
    // Process the inputStream
} catch (IOException e) {
    e.printStackTrace();
}
Delete File
Deletes a file from ADLS Gen2.
fileSystemClient.deleteFile("path/to/your/file.txt");
Advanced Features
- Append and Flush: For large or incrementally written files, append data in chunks and commit it with a flush (see the sketch after this list).
- Access Control Lists (ACLs): Manage POSIX-style permissions for files and directories.
- Leasing: Obtain exclusive write access to a file or file system.
- Metadata: Attach custom metadata to files and directories.
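As a minimal sketch of the append/flush pattern on a DataLakeFileClient (the file path and content below are illustrative):
import com.azure.storage.file.datalake.DataLakeFileClient;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
// ... within a method ...
DataLakeFileClient fileClient = fileSystemClient.getFileClient("logs/app.log"); // illustrative path
// Create (or overwrite) the file, then append data at increasing offsets.
fileClient.create(true);
byte[] first = "first line\n".getBytes(StandardCharsets.UTF_8);
byte[] second = "second line\n".getBytes(StandardCharsets.UTF_8);
fileClient.append(new ByteArrayInputStream(first), 0, first.length);
fileClient.append(new ByteArrayInputStream(second), first.length, second.length);
// Appended data becomes visible only after it is flushed (committed) up to the final length.
fileClient.flush(first.length + second.length, true);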
Code Examples
Explore more detailed code examples in the official azure-sdk-for-java GitHub repository.
API Reference
For a complete list of classes, methods, and their parameters, please refer to the Microsoft Learn API documentation.