Introduction to the Azure Data Lake Storage Gen2 JavaScript SDK
The Azure Data Lake Storage Gen2 JavaScript SDK (@azure/storage-file-datalake) lets Node.js and browser applications work with Data Lake Storage Gen2 accounts. It wraps the Azure Storage REST API and exposes clients for managing file systems, directories, and files.
Key features include:
- Token-based authentication through Microsoft Entra ID (formerly Azure Active Directory) via @azure/identity.
- High-level abstractions for common data lake operations.
- Support for streaming large files.
- Comprehensive error handling and retry policies.
Installation
You can install the Azure Data Lake Storage Gen2 SDK using npm or yarn:
npm install @azure/storage-file-datalake
# or
yarn add @azure/storage-file-datalake
Getting Started
To get started, you'll need an Azure Data Lake Storage Gen2 account and a connection string or credentials. Here's a basic example of how to create a DataLakeServiceClient:
import { DataLakeServiceClient } from "@azure/storage-file-datalake";
import { DefaultAzureCredential } from "@azure/identity";
async function main() {
  const accountName = "YOUR_STORAGE_ACCOUNT_NAME";
  // DefaultAzureCredential tries several mechanisms in turn; other credentials
  // from @azure/identity, such as ClientSecretCredential, can be used instead.
  const credential = new DefaultAzureCredential();
  const serviceClient = new DataLakeServiceClient(
    `https://${accountName}.dfs.core.windows.net`,
    credential
  );
  // Constructing the client does not contact the service yet.
  console.log("DataLakeServiceClient created.");
  // Use serviceClient to manage file systems, for example:
  // for await (const fileSystem of serviceClient.listFileSystems()) {
  //   console.log(fileSystem.name);
  // }
}
main().catch((error) => {
  console.error("The following error occurred: ", error);
  process.exit(1);
});
Replace YOUR_STORAGE_ACCOUNT_NAME with your actual storage account name. The DefaultAzureCredential automatically tries to authenticate using environment variables, managed identity, or other available methods.
DataLakeFileSystemClient
The DataLakeFileSystemClient represents a file system within your Data Lake Storage Gen2 account. You can obtain an instance of this client from the DataLakeServiceClient.
Creating a File System Client
// Assuming serviceClient is already initialized as shown in Getting Started
const fileSystemName = "myfilesystem";
const fileSystemClient = serviceClient.getFileSystemClient(fileSystemName);
console.log(`FileSystemClient for '${fileSystemName}' created.`);
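Obtaining a client does not create anything in the storage account. The following is a minimal sketch of creating the file system only when it is missing, assuming the `createIfNotExists` method available on the file system client in current releases of `@azure/storage-file-datalake`. The helper name `ensureFileSystem` is hypothetical, and `serviceClient` is passed in rather than constructed here.

```javascript
// Sketch: create the file system only if it does not exist yet.
// `serviceClient` is expected to be a DataLakeServiceClient (assumption:
// createIfNotExists() resolves with a `succeeded` flag).
async function ensureFileSystem(serviceClient, fileSystemName) {
  const fileSystemClient = serviceClient.getFileSystemClient(fileSystemName);
  const result = await fileSystemClient.createIfNotExists();
  // `created` is false when the file system was already present.
  return { fileSystemClient, created: result.succeeded };
}
```

Returning the client together with the `created` flag lets callers log or branch on first-time setup without a separate existence check.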
create(options?: FileSystemCreateOptions)
Creates the file system this client points to. The name is fixed when the client is obtained from getFileSystemClient, so it is not passed again. The call fails if a file system with that name already exists.
create(options?: FileSystemCreateOptions): Promise<FileSystemCreateResponse>
Parameters:
| Name | Type | Description |
|---|---|---|
| options | FileSystemCreateOptions (optional) | Optional parameters for the operation, such as metadata and the public access level. |
Returns:
Promise<FileSystemCreateResponse>: A promise that resolves with information about the created file system.
delete(options?: FileSystemDeleteOptions)
Deletes the file system this client points to, including everything stored in it.
delete(options?: FileSystemDeleteOptions): Promise<FileSystemDeleteResponse>
Parameters:
| Name | Type | Description |
|---|---|---|
| options | FileSystemDeleteOptions (optional) | Optional parameters for the operation, such as access conditions. |
Returns:
Promise<FileSystemDeleteResponse>: A promise that resolves when the service has accepted the delete request.
listFileSystems(options?: ServiceListFileSystemsOptions)
Lists all file systems in the storage account. This method is called on the DataLakeServiceClient, not on a DataLakeFileSystemClient.
listFileSystems(options?: ServiceListFileSystemsOptions): PagedAsyncIterableIterator<FileSystemItem, ServiceListFileSystemsSegmentResponse>
Parameters:
| Name | Type | Description |
|---|---|---|
| options | ServiceListFileSystemsOptions (optional) | Optional parameters for the operation, such as a name prefix filter. |
Returns:
PagedAsyncIterableIterator<FileSystemItem, ServiceListFileSystemsSegmentResponse>: An async iterator that yields one FileSystemItem per file system and can also be consumed page by page.
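As a minimal sketch of consuming the listing, the helper below collects file system names into an array, optionally restricted by a name prefix. The helper name `listFileSystemNames` is hypothetical; it assumes `serviceClient` is a DataLakeServiceClient whose `listFileSystems` accepts a `prefix` option and can be iterated with `for await`.

```javascript
// Sketch: gather the names of all file systems, optionally filtered by prefix.
// `serviceClient` is passed in; nothing here contacts Azure until it is called.
async function listFileSystemNames(serviceClient, prefix) {
  const options = prefix ? { prefix } : {};
  const names = [];
  for await (const fileSystem of serviceClient.listFileSystems(options)) {
    names.push(fileSystem.name);
  }
  return names;
}
```

Collecting into an array is convenient for small accounts; for accounts with many file systems, iterating lazily avoids holding every item in memory.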
getProperties(options?: FileSystemGetPropertiesOptions)
Gets the properties of the file system, such as its metadata and last-modified time.
getProperties(options?: FileSystemGetPropertiesOptions): Promise<FileSystemGetPropertiesResponse>
Parameters:
| Name | Type | Description |
|---|---|---|
| options | FileSystemGetPropertiesOptions (optional) | Optional parameters for the operation. |
Returns:
Promise<FileSystemGetPropertiesResponse>: A promise that resolves with the file system properties.
DataLakeDirectoryClient
The DataLakeDirectoryClient represents a directory within a file system. You can obtain an instance of this client from the DataLakeFileSystemClient.
Creating a Directory Client
// Assuming fileSystemClient is already initialized
const directoryName = "mydata/raw";
const directoryClient = fileSystemClient.getDirectoryClient(directoryName);
console.log(`DirectoryClient for '${directoryName}' created.`);
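Directory names such as "mydata/raw" are forward-slash paths relative to the file system root. When the segments come from different places, a small helper keeps stray slashes out of the path. This is a sketch only; `joinLakePath` and `getNestedDirectoryClient` are hypothetical names, and the only SDK call assumed is `getDirectoryClient`.

```javascript
// Sketch: join path segments with single forward slashes,
// dropping empty parts caused by leading, trailing, or doubled slashes.
function joinLakePath(...segments) {
  return segments
    .flatMap((segment) => String(segment).split("/"))
    .filter((part) => part.length > 0)
    .join("/");
}

// Obtain a directory client for a nested path built from segments.
function getNestedDirectoryClient(fileSystemClient, ...segments) {
  return fileSystemClient.getDirectoryClient(joinLakePath(...segments));
}
```

For example, `getNestedDirectoryClient(fileSystemClient, "mydata/", "/raw")` asks for the same directory as the snippet above.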
create(options?: DirectoryCreateOptions)
Creates the directory this client points to.
create(options?: DirectoryCreateOptions): Promise<DirectoryCreateResponse>
Parameters:
| Name | Type | Description |
|---|---|---|
| options | DirectoryCreateOptions (optional) | Optional parameters for the operation, such as metadata. |
Returns:
Promise<DirectoryCreateResponse>: A promise that resolves with information about the created directory.
delete(recursive?: boolean, options?: PathDeleteOptions)
Deletes the directory. Unless recursive is true, the directory must be empty.
delete(recursive?: boolean, options?: PathDeleteOptions): Promise<PathDeleteResponse>
Parameters:
| Name | Type | Description |
|---|---|---|
| recursive | boolean (optional) | When true, deletes the directory together with all files and subdirectories beneath it. |
| options | PathDeleteOptions (optional) | Optional parameters for the operation, such as access conditions. |
Returns:
Promise<PathDeleteResponse>: A promise that resolves when the directory is successfully deleted.
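A common cleanup pattern is to delete a directory tree and treat "already gone" as success. The sketch below assumes the directory client's `delete` method takes a boolean recursive flag as its first argument and that failures carry an HTTP `statusCode`, as the SDK's RestError does. The helper name `deleteDirectoryTree` is hypothetical.

```javascript
// Sketch: recursively delete a directory, tolerating a missing directory.
// Returns true if something was deleted, false if it did not exist.
async function deleteDirectoryTree(directoryClient) {
  try {
    await directoryClient.delete(true); // true = recursive
    return true;
  } catch (error) {
    // 404 means the path was not found; anything else is rethrown.
    if (error && error.statusCode === 404) {
      return false;
    }
    throw error;
  }
}
```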
listPaths(options?: ListPathsOptions)
Lists paths (files and subdirectories). Listing is performed through the DataLakeFileSystemClient; set the path option to restrict the results to a single directory.
listPaths(options?: ListPathsOptions): PagedAsyncIterableIterator<Path, FileSystemListPathsResponse>
Parameters:
| Name | Type | Description |
|---|---|---|
| options | ListPathsOptions (optional) | Optional parameters for the operation, such as path (the directory to list) and recursive (whether to descend into subdirectories). |
Returns:
PagedAsyncIterableIterator<Path, FileSystemListPathsResponse>: An async iterator that yields one Path object for each file or directory found.
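To illustrate listing the contents of one directory, the sketch below iterates `listPaths` on the file system client with the `path` and `recursive` options and keeps only files. It assumes each yielded item exposes `name` and `isDirectory`, as in current releases of `@azure/storage-file-datalake`. The helper name `listFilesUnder` is hypothetical.

```javascript
// Sketch: return the names of files (not directories) under a directory.
// `fileSystemClient` is expected to be a DataLakeFileSystemClient.
async function listFilesUnder(fileSystemClient, directoryPath, recursive = false) {
  const files = [];
  const iterator = fileSystemClient.listPaths({ path: directoryPath, recursive });
  for await (const item of iterator) {
    if (!item.isDirectory) {
      files.push(item.name);
    }
  }
  return files;
}
```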
DataLakeFileClient
The DataLakeFileClient represents a file within a directory or file system. You can obtain an instance of this client from the DataLakeFileSystemClient or DataLakeDirectoryClient.
Creating a File Client
// Assuming fileSystemClient is already initialized
const fileName = "mydata/raw/sales.csv";
const fileClient = fileSystemClient.getFileClient(fileName);
console.log(`FileClient for '${fileName}' created.`);
create(options?: FileCreateOptions)
Creates a new, empty file. If the file already exists, this operation overwrites it unless access conditions in options prevent that.
create(options?: FileCreateOptions): Promise<FileCreateResponse>
Parameters:
| Name | Type | Description |
|---|---|---|
| options | FileCreateOptions (optional) | Optional parameters for the operation, such as HTTP headers (including the content type), metadata, or permissions. |
Returns:
Promise<FileCreateResponse>: A promise that resolves with information about the created file.
delete(recursive?: boolean, options?: PathDeleteOptions)
Deletes the file. The recursive flag only matters for directories and can be omitted here.
delete(recursive?: boolean, options?: PathDeleteOptions): Promise<PathDeleteResponse>
Parameters:
| Name | Type | Description |
|---|---|---|
| recursive | boolean (optional) | Not needed when deleting a file. |
| options | PathDeleteOptions (optional) | Optional parameters for the operation. |
Returns:
Promise<PathDeleteResponse>: A promise that resolves when the file is successfully deleted.
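read(offset?: number, count?: number, options?: FileReadOptions)
Downloads the content of the file, or a byte range of it. The promise resolves with a response object, not with the stream itself: in Node.js the content is available as response.readableStreamBody, and in browsers as response.contentAsBlob.
read(offset?: number, count?: number, options?: FileReadOptions): Promise<FileReadResponse>
Parameters:
| Name | Type | Description |
|---|---|---|
| offset | number (optional) | Byte position to start reading from. Defaults to 0. |
| count | number (optional) | Number of bytes to read. Defaults to reading to the end of the file. |
| options | FileReadOptions (optional) | Optional parameters for the operation, such as access conditions or a progress callback. |
Returns:
Promise<FileReadResponse>: A promise that resolves with the response headers and a stream (Node.js) or blob (browser) of the file's content.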
Example of reading a file:
import { pipeline } from "stream/promises";
import { createWriteStream } from "fs";
// Node.js only. Assumes fileClient is a DataLakeFileClient for an existing file.
async function downloadFile(fileClient, localPath) {
  // read() is the DataLakeFileClient download method; the body arrives as a stream.
  const response = await fileClient.read();
  const destination = createWriteStream(localPath);
  // pipeline() handles backpressure and closes both streams on error.
  await pipeline(response.readableStreamBody, destination);
  console.log(`File downloaded to ${localPath}`);
}
// downloadFile(fileClient, "./local_sales.csv").catch(console.error);
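Reading a byte range rather than the whole file can be sketched as follows. The helper assumes the file client's `read(offset, count)` method and the Node.js-only `readableStreamBody` property on the response; the name `readRangeAsText` is hypothetical, and decoding as UTF-8 is an assumption about the file's encoding.

```javascript
// Sketch (Node.js): read `count` bytes starting at `offset` and decode as UTF-8.
// Omitting `count` reads to the end of the file.
async function readRangeAsText(fileClient, offset = 0, count) {
  const response = await fileClient.read(offset, count);
  const chunks = [];
  for await (const chunk of response.readableStreamBody) {
    chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
  }
  return Buffer.concat(chunks).toString("utf8");
}
```

Note that a range boundary may fall inside a multi-byte UTF-8 character, so byte ranges are safest on ASCII data or on boundaries the caller controls.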
append(body: HttpRequestBody, offset: number, length: number, options?: FileAppendOptions)
Uploads data to be appended to the file at the given offset. Appended data is staged but not visible in the file until flush is called. Large payloads are typically sent as several append calls, each with the offset at which its chunk begins.
append(body: HttpRequestBody, offset: number, length: number, options?: FileAppendOptions): Promise<FileAppendResponse>
Parameters:
| Name | Type | Description |
|---|---|---|
| body | HttpRequestBody | The data to append, for example a Buffer, ArrayBuffer, typed array, or string. |
| offset | number | The byte position in the file at which this data starts. |
| length | number | The number of bytes in body. |
| options | FileAppendOptions (optional) | Optional parameters for the operation, such as a progress callback. |
Returns:
Promise<FileAppendResponse>: A promise that resolves with information about the append operation.
flush(position: number, options?: FileFlushOptions)
Commits data that was previously appended to the file. The position is the length the file will have after the flush, that is, the offset just past the last appended byte.
flush(position: number, options?: FileFlushOptions): Promise<FileFlushResponse>
Parameters:
| Name | Type | Description |
|---|---|---|
| position | number | The final length of the file in bytes once the appended data is committed. |
| options | FileFlushOptions (optional) | Optional parameters for the operation, such as HTTP headers to set on the file. |
Returns:
Promise<FileFlushResponse>: A promise that resolves with information about the flush operation.
Example of writing to a file:
// Assumes fileClient is a DataLakeFileClient
async function writeFileContent(fileClient, content) {
  const data = Buffer.from(content);
  // Create the file (overwrites an existing file by default)
  await fileClient.create();
  // Stage the bytes at offset 0; they stay uncommitted until flushed
  await fileClient.append(data, 0, data.length);
  // Commit everything up to the final length
  await fileClient.flush(data.length);
  console.log("File content written successfully.");
}
// writeFileContent(fileClient, "This is the content of my file.").catch(console.error);
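When the data is too large for a single request, the same create, append, flush sequence can be applied chunk by chunk. The sketch below assumes the file client's `create()`, `append(body, offset, length)`, and `flush(position)` methods. The helper name `writeInChunks` and the 4 MiB default chunk size are illustrative choices, not SDK recommendations.

```javascript
// Sketch: upload a Buffer in fixed-size chunks, then commit once at the end.
// Returns the total number of bytes written.
async function writeInChunks(fileClient, data, chunkSize = 4 * 1024 * 1024) {
  await fileClient.create();
  let offset = 0;
  while (offset < data.length) {
    // subarray() creates a view, so no bytes are copied here.
    const chunk = data.subarray(offset, offset + chunkSize);
    await fileClient.append(chunk, offset, chunk.length);
    offset += chunk.length;
  }
  // The flush position is the final file length.
  await fileClient.flush(offset);
  return offset;
}
```

Chunks are sent one after another so that each offset is known before the next call; the SDK also offers higher-level upload helpers for cases where manual control is not needed.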
getProperties(options?: PathGetPropertiesOptions)
Gets the properties of the file, such as its size, content type, last-modified time, and metadata.
getProperties(options?: PathGetPropertiesOptions): Promise<PathGetPropertiesResponse>
Parameters:
| Name | Type | Description |
|---|---|---|
| options | PathGetPropertiesOptions (optional) | Optional parameters for the operation. |
Returns:
Promise<PathGetPropertiesResponse>: A promise that resolves with the file's properties.
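As a small sketch of using the result, the helper below picks out a few commonly needed fields. It assumes the file client's `getProperties()` method and the `contentLength`, `contentType`, `lastModified`, and `metadata` fields on its response. The helper name `describeFile` and the shape of the returned object are hypothetical.

```javascript
// Sketch: summarize a file's properties in a plain object.
async function describeFile(fileClient) {
  const properties = await fileClient.getProperties();
  return {
    sizeInBytes: properties.contentLength,
    contentType: properties.contentType,
    lastModified: properties.lastModified,
    // metadata may be absent when none has been set
    metadata: properties.metadata ?? {},
  };
}
```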