Introduction to the Azure Data Lake Storage Gen2 JavaScript SDK

The Azure Data Lake Storage Gen2 JavaScript SDK (the @azure/storage-file-datalake package) lets Node.js and browser applications work with Data Lake Storage Gen2 accounts. It wraps the Azure Storage REST API in client classes for managing file systems, directories, and files.

Key features include:

  • Authentication with Microsoft Entra ID (formerly Azure Active Directory) through @azure/identity credentials.
  • High-level abstractions for common data lake operations.
  • Support for streaming large files.
  • Comprehensive error handling and retry policies.

Installation

You can install the Azure Data Lake Storage Gen2 SDK using npm or yarn:


npm install @azure/storage-file-datalake
# or
yarn add @azure/storage-file-datalake
                

Getting Started

To get started, you need a storage account with the hierarchical namespace enabled (Data Lake Storage Gen2) and either a connection string or a credential from @azure/identity. Here's a basic example of how to create a DataLakeServiceClient:


import { DataLakeServiceClient } from "@azure/storage-file-datalake";
import { DefaultAzureCredential } from "@azure/identity";

async function main() {
    const accountName = "YOUR_STORAGE_ACCOUNT_NAME";
    const credential = new DefaultAzureCredential(); // Or another @azure/identity credential, such as ClientSecretCredential

    const serviceClient = new DataLakeServiceClient(
        `https://${accountName}.dfs.core.windows.net`,
        credential
    );

    // Constructing the client sends no request; authentication happens on the first service call.
    console.log("DataLakeServiceClient created.");

    // Now you can use serviceClient to manage file systems
    // For example:
    // const fileSystems = serviceClient.listFileSystems();
    // for await (const fs of fileSystems) {
    //     console.log(fs.name);
    // }
}

main().catch((error) => {
    console.error("The following error occurred: ", error);
    process.exit(1);
});
                

Replace YOUR_STORAGE_ACCOUNT_NAME with your actual storage account name. The DefaultAzureCredential automatically tries to authenticate using environment variables, managed identity, or other available methods.
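An unreplaced placeholder in the account name is easy to miss until the first request fails. The sketch below uses a hypothetical helper, buildDfsEndpoint (not part of the SDK), that validates the account name and builds the endpoint URL used above. It assumes the public Azure cloud, where the Data Lake endpoint suffix is dfs.core.windows.net.

```javascript
// Hypothetical helper (not part of the SDK): builds the Data Lake (DFS)
// endpoint URL for a storage account in the public Azure cloud.
function buildDfsEndpoint(accountName) {
    // Storage account names are 3-24 characters, lowercase letters and digits only.
    if (!/^[a-z0-9]{3,24}$/.test(accountName)) {
        throw new Error(`Invalid storage account name: ${accountName}`);
    }
    return `https://${accountName}.dfs.core.windows.net`;
}
```

The returned string can be passed as the first argument to the DataLakeServiceClient constructor.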

DataLakeFileSystemClient

The DataLakeFileSystemClient represents a file system within your Data Lake Storage Gen2 account. You can obtain an instance of this client from the DataLakeServiceClient.

Creating a File System Client


// Assuming serviceClient is already initialized as shown in Getting Started

const fileSystemName = "myfilesystem";
const fileSystemClient = serviceClient.getFileSystemClient(fileSystemName);
console.log(`FileSystemClient for '${fileSystemName}' created.`);
                

create(options?: FileSystemCreateOptions)

Creates a new file system in the storage account, using the name the client was obtained with. The call fails if a file system with that name already exists.

create(options?: FileSystemCreateOptions): Promise<FileSystemCreateResponse>

Parameters:

Name | Type | Description
options | FileSystemCreateOptions (optional) | Optional parameters for the operation, such as metadata.

Returns:

Promise<FileSystemCreateResponse>: A promise that resolves with information about the created file system.
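Setup scripts often need a file system to exist without failing when it is already there. The following is a minimal sketch, assuming the package's createIfNotExists method on DataLakeFileSystemClient and its succeeded flag; ensureFileSystem is a hypothetical helper name.

```javascript
// Hypothetical helper: returns a client for the file system, creating the
// file system first if it does not exist yet.
async function ensureFileSystem(serviceClient, fileSystemName) {
    const fileSystemClient = serviceClient.getFileSystemClient(fileSystemName);
    // createIfNotExists() resolves with succeeded === false when the
    // file system already exists instead of throwing.
    const response = await fileSystemClient.createIfNotExists();
    return { fileSystemClient, created: response.succeeded };
}
```

Usage: const { fileSystemClient, created } = await ensureFileSystem(serviceClient, "myfilesystem");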

delete(options?: FileSystemDeleteOptions)

Deletes the file system together with all directories and files stored in it.

delete(options?: FileSystemDeleteOptions): Promise<FileSystemDeleteResponse>

Parameters:

Name | Type | Description
options | FileSystemDeleteOptions (optional) | Optional parameters for the operation, such as access conditions.

Returns:

Promise<FileSystemDeleteResponse>: A promise that resolves when the delete request succeeds.

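listFileSystems(options?: ServiceListFileSystemsOptions)

Lists all file systems in the storage account. Because it operates on the whole account, this method is called on DataLakeServiceClient rather than on DataLakeFileSystemClient.

listFileSystems(options?: ServiceListFileSystemsOptions): PagedAsyncIterableIterator<FileSystemItem, ServiceListFileSystemsSegmentResponse>

Parameters:

Name | Type | Description
options | ServiceListFileSystemsOptions (optional) | Optional parameters for the operation, such as prefix filtering.

Returns:

PagedAsyncIterableIterator<FileSystemItem, ServiceListFileSystemsSegmentResponse>: An async iterable that yields a FileSystemItem for each file system. Iterate it with for await, or call byPage() to process one page of results at a time.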
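As an illustration of consuming the async iterable, here is a minimal sketch that collects file system names, optionally filtered by a name prefix. listFileSystemNames is a hypothetical helper; it assumes serviceClient is a DataLakeServiceClient and that the listing options accept a prefix field.

```javascript
// Hypothetical helper: collects the names of all file systems in the account,
// optionally restricted to names starting with the given prefix.
async function listFileSystemNames(serviceClient, prefix) {
    const names = [];
    const options = prefix ? { prefix } : {};
    // The iterator fetches further pages from the service as needed.
    for await (const fileSystem of serviceClient.listFileSystems(options)) {
        names.push(fileSystem.name);
    }
    return names;
}
```

For accounts with many file systems, processing items inside the loop avoids holding every name in memory.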

getProperties(options?: FileSystemGetPropertiesOptions)

Gets the properties of the file system, such as its metadata, ETag, and last-modified time.

getProperties(options?: FileSystemGetPropertiesOptions): Promise<FileSystemGetPropertiesResponse>

Parameters:

Name | Type | Description
options | FileSystemGetPropertiesOptions (optional) | Optional parameters for the operation.

Returns:

Promise<FileSystemGetPropertiesResponse>: A promise that resolves with the file system properties.
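To show how the resolved properties might be used, the sketch below reduces them to a small summary object. describeFileSystem is a hypothetical helper; it assumes the client exposes a name property and that the response carries lastModified and metadata fields.

```javascript
// Hypothetical helper: summarizes a file system's properties.
async function describeFileSystem(fileSystemClient) {
    const properties = await fileSystemClient.getProperties();
    return {
        name: fileSystemClient.name,
        lastModified: properties.lastModified,
        // metadata may be undefined when none has been set
        metadata: properties.metadata ?? {},
    };
}
```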

DataLakeDirectoryClient

The DataLakeDirectoryClient represents a directory within a file system. You can obtain an instance of this client from the DataLakeFileSystemClient.

Creating a Directory Client


// Assuming fileSystemClient is already initialized

const directoryName = "mydata/raw";
const directoryClient = fileSystemClient.getDirectoryClient(directoryName);
console.log(`DirectoryClient for '${directoryName}' created.`);
                

create(options?: DirectoryCreateOptions)

Creates the directory that the client points to.

create(options?: DirectoryCreateOptions): Promise<DirectoryCreateResponse>

Parameters:

Name | Type | Description
options | DirectoryCreateOptions (optional) | Optional parameters for the operation, such as metadata.

Returns:

Promise<DirectoryCreateResponse>: A promise that resolves with information about the created directory.
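As a sketch of passing metadata at creation time, the helper below obtains a directory client and creates the directory with caller-supplied metadata. createTaggedDirectory and the metadata keys are hypothetical; the example assumes the directory client's create method accepts a metadata option.

```javascript
// Hypothetical helper: creates a directory and attaches metadata to it.
async function createTaggedDirectory(fileSystemClient, directoryName, metadata) {
    const directoryClient = fileSystemClient.getDirectoryClient(directoryName);
    // Metadata is a flat map of string keys to string values.
    await directoryClient.create({ metadata });
    return directoryClient;
}
```

Usage: await createTaggedDirectory(fileSystemClient, "mydata/raw", { owner: "etl" });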

delete(recursive?: boolean, options?: PathDeleteOptions)

Deletes the directory. A directory that still contains files or subdirectories can only be deleted when recursive is true.

delete(recursive?: boolean, options?: PathDeleteOptions): Promise<PathDeleteResponse>

Parameters:

Name | Type | Description
recursive | boolean (optional) | Set to true to delete the directory together with everything beneath it.
options | PathDeleteOptions (optional) | Optional parameters for the operation, such as access conditions.

Returns:

Promise<PathDeleteResponse>: A promise that resolves when the directory is successfully deleted.
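A recursive delete might look like the following minimal sketch. removeDirectoryTree is a hypothetical helper; it assumes the directory client's delete method takes the recursive flag as its first argument and that failed requests reject with an error carrying an HTTP statusCode.

```javascript
// Hypothetical helper: deletes a directory and its contents.
// Returns false when the directory did not exist, true when it was deleted.
async function removeDirectoryTree(fileSystemClient, directoryName) {
    const directoryClient = fileSystemClient.getDirectoryClient(directoryName);
    try {
        await directoryClient.delete(true); // true = recursive
        return true;
    } catch (error) {
        if (error.statusCode === 404) {
            return false; // nothing to delete
        }
        throw error;
    }
}
```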

listPaths(options?: ListPathsOptions)

Lists the paths (files and subdirectories) within a directory. In this package, listPaths is called on DataLakeFileSystemClient; to list a single directory, pass its name in options.path.

listPaths(options?: ListPathsOptions): PagedAsyncIterableIterator<Path, FileSystemListPathsResponse>

Parameters:

Name | Type | Description
options | ListPathsOptions (optional) | Optional parameters for the operation, such as path (the directory to list) and recursive (whether to descend into subdirectories).

Returns:

PagedAsyncIterableIterator<Path, FileSystemListPathsResponse>: An async iterable that yields a Path object for each file or subdirectory found.
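Listing the contents of one directory can be sketched as below. listDirectory is a hypothetical helper; it assumes fileSystemClient is a DataLakeFileSystemClient whose listPaths method accepts path and recursive options and yields items with name, isDirectory, and contentLength fields.

```javascript
// Hypothetical helper: lists the entries under a directory.
async function listDirectory(fileSystemClient, directoryName, recursive = false) {
    const entries = [];
    const options = { path: directoryName, recursive };
    for await (const path of fileSystemClient.listPaths(options)) {
        entries.push({
            name: path.name,
            // isDirectory may be absent for files, so normalize it to a boolean
            isDirectory: Boolean(path.isDirectory),
            size: path.contentLength,
        });
    }
    return entries;
}
```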

DataLakeFileClient

The DataLakeFileClient represents a file within a directory or file system. You can obtain an instance of this client from the DataLakeFileSystemClient or DataLakeDirectoryClient.

Creating a File Client


// Assuming fileSystemClient is already initialized

const fileName = "mydata/raw/sales.csv";
const fileClient = fileSystemClient.getFileClient(fileName);
console.log(`FileClient for '${fileName}' created.`);
                

create(options?: FileCreateOptions)

Creates a new, empty file. If the file already exists, this operation overwrites it unless access conditions in options prevent that.

create(options?: FileCreateOptions): Promise<FileCreateResponse>

Parameters:

Name | Type | Description
options | FileCreateOptions (optional) | Optional parameters for the operation, such as HTTP headers that set the file's content type.

Returns:

Promise<FileCreateResponse>: A promise that resolves with information about the created file.
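Setting the content type at creation time might look like this minimal sketch. createCsvFile is a hypothetical helper; it assumes the create options accept a pathHttpHeaders object with a contentType field.

```javascript
// Hypothetical helper: creates an empty file whose content type is text/csv.
async function createCsvFile(fileSystemClient, fileName) {
    const fileClient = fileSystemClient.getFileClient(fileName);
    await fileClient.create({
        pathHttpHeaders: { contentType: "text/csv" },
    });
    return fileClient;
}
```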

delete(recursive?: boolean, options?: PathDeleteOptions)

Deletes the file.

delete(recursive?: boolean, options?: PathDeleteOptions): Promise<PathDeleteResponse>

Parameters:

Name | Type | Description
recursive | boolean (optional) | Only relevant to directories; omit it when deleting a file.
options | PathDeleteOptions (optional) | Optional parameters for the operation.

Returns:

Promise<PathDeleteResponse>: A promise that resolves when the file is successfully deleted.

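read(offset?: number, count?: number, options?: FileReadOptions)

Reads the content of the file, optionally restricted to a byte range. The response exposes the content as a stream instead of loading it into memory.

read(offset?: number, count?: number, options?: FileReadOptions): Promise<FileReadResponse>

Parameters:

Name | Type | Description
offset | number (optional) | Byte position at which to start reading. Defaults to 0.
count | number (optional) | Number of bytes to read. When omitted, reads to the end of the file.
options | FileReadOptions (optional) | Optional parameters for the operation, such as access conditions or a progress callback.

Returns:

Promise<FileReadResponse>: A promise that resolves with the response. In Node.js the content is available as a readable stream on readableStreamBody; in browsers it is available through contentAsBlob.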

Example of reading a file:


import { pipeline } from "stream/promises";
import { createWriteStream } from "fs";

// Assuming fileClient is initialized for a file

async function downloadFile(fileClient, localPath) {
    // read() resolves once the response headers arrive; the body is streamed.
    const response = await fileClient.read();
    const destination = createWriteStream(localPath);
    // readableStreamBody is only populated in Node.js.
    await pipeline(response.readableStreamBody, destination);
    console.log(`File downloaded to ${localPath}`);
}

// downloadFile(fileClient, "./local_sales.csv").catch(console.error);
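Reading only part of a file is useful for inspecting headers or resuming a download. The following is a minimal Node.js sketch, assuming the file client's read method accepts an offset and a byte count as its first two arguments; readRange is a hypothetical helper name.

```javascript
// Hypothetical helper (Node.js): reads `count` bytes starting at `offset`
// and returns them as a single Buffer.
async function readRange(fileClient, offset, count) {
    const response = await fileClient.read(offset, count);
    const chunks = [];
    // Node.js readable streams are async iterable.
    for await (const chunk of response.readableStreamBody) {
        chunks.push(typeof chunk === "string" ? Buffer.from(chunk) : chunk);
    }
    return Buffer.concat(chunks);
}
```

Because the result is buffered in memory, this approach suits small ranges; for large downloads, streaming to a destination is the safer choice.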
                

append(body: HttpRequestBody, offset: number, length: number, options?: FileAppendOptions)

Uploads data to be appended to the file at the given offset. Appended data is staged on the service and does not become part of the file until flush is called. Large payloads are typically split across several append calls.

append(body: HttpRequestBody, offset: number, length: number, options?: FileAppendOptions): Promise<FileAppendResponse>

Parameters:

Name | Type | Description
body | HttpRequestBody | The data to append, for example a Buffer, Uint8Array, ArrayBuffer, or string.
offset | number | Byte position in the file at which the data is written. For sequential writes, this is the total number of bytes appended so far.
length | number | Number of bytes in body.
options | FileAppendOptions (optional) | Optional parameters for the operation.

Returns:

Promise<FileAppendResponse>: A promise that resolves with the service response for the append operation.
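Splitting a large payload across several append calls can be sketched as follows. appendInChunks is a hypothetical helper and the 4 MiB default chunk size is an arbitrary illustration, not a service limit. The sketch assumes the file was just created and is empty, and that append takes (body, offset, length) while flush takes the final length.

```javascript
// Hypothetical helper: appends a Buffer (or Uint8Array) to a newly created,
// empty file in fixed-size chunks, then commits it with a single flush.
async function appendInChunks(fileClient, data, chunkSize = 4 * 1024 * 1024) {
    let offset = 0;
    while (offset < data.length) {
        // subarray() creates a view; it does not copy the bytes.
        const chunk = data.subarray(offset, offset + chunkSize);
        await fileClient.append(chunk, offset, chunk.length);
        offset += chunk.length;
    }
    // Nothing is visible in the file until the appended bytes are flushed.
    await fileClient.flush(data.length);
    return offset;
}
```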

flush(position: number, options?: FileFlushOptions)

Commits data that was previously uploaded with append. The position argument is the length of the file once all appended data has been written, so it must equal the offset of the last append plus its length.

flush(position: number, options?: FileFlushOptions): Promise<FileFlushResponse>

Parameters:

Name | Type | Description
position | number | The final length of the file, in bytes, after the appended data is committed.
options | FileFlushOptions (optional) | Optional parameters for the operation.

Returns:

Promise<FileFlushResponse>: A promise that resolves with information about the flush operation.

Example of writing to a file:


// Assuming fileClient is initialized

async function writeFileContent(fileClient, content) {
    const data = Buffer.from(content);

    // Create the file (overwrites an existing file by default)
    await fileClient.create();

    // Append the bytes at offset 0; they stay uncommitted until flushed
    await fileClient.append(data, 0, data.length);

    // Flush at the total length to commit everything appended so far
    await fileClient.flush(data.length);
    console.log("File content written successfully.");
}

// writeFileContent(fileClient, "This is the content of my file.").catch(console.error);
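For content that already fits in memory, the create, append, and flush sequence can usually be replaced by one call. The sketch below assumes the package's upload method on the file client, which in Node.js accepts a Buffer; uploadText is a hypothetical helper name.

```javascript
// Hypothetical helper: uploads a string as the full content of a file
// and returns the number of bytes sent.
async function uploadText(fileClient, content) {
    const data = Buffer.from(content, "utf8");
    // upload() is assumed to create or overwrite the file and commit the data.
    await fileClient.upload(data);
    return data.length;
}
```

The returned byte count can differ from content.length, since non-ASCII characters take more than one byte in UTF-8.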
                

getProperties(options?: PathGetPropertiesOptions)

Gets the properties of the file, such as its content length, content type, last-modified time, and metadata.

getProperties(options?: PathGetPropertiesOptions): Promise<PathGetPropertiesResponse>

Parameters:

Name | Type | Description
options | PathGetPropertiesOptions (optional) | Optional parameters for the operation.

Returns:

Promise<PathGetPropertiesResponse>: A promise that resolves with the file's properties.
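A typical use of the resolved properties is checking a file's size before downloading it. The following is a minimal sketch, assuming the response exposes contentLength, contentType, and lastModified fields; getFileSummary is a hypothetical helper name.

```javascript
// Hypothetical helper: returns a few commonly used file properties.
async function getFileSummary(fileClient) {
    const properties = await fileClient.getProperties();
    return {
        sizeInBytes: properties.contentLength,
        contentType: properties.contentType,
        lastModified: properties.lastModified,
    };
}
```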