Azure Data Lake Storage REST API - Data Operations

This document details the REST API operations for managing data within Azure Data Lake Storage Gen2. These operations allow you to interact with your data at a file and directory level.

Introduction

Azure Data Lake Storage Gen2 combines the capabilities of Azure Blob Storage with the functionality of Azure Data Lake Storage Gen1. It provides a hierarchical namespace and is optimized for big data analytics workloads.

Authentication

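All Data Lake Storage Gen2 REST API requests must be authenticated. Supported authentication methods include:
  • Microsoft Entra ID (formerly Azure Active Directory): an OAuth 2.0 bearer token.
  • Shared Key: a signature computed with the storage account access key.
  • Shared access signature (SAS): a signed token passed in the request URL.

Bearer tokens and Shared Key signatures are passed in the Authorization header. The examples in this document use a bearer token.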

Common Headers

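Several headers are common across many Data Lake Storage Gen2 REST operations:
  • Authorization: The credentials for the request, for example Bearer <your-access-token>.
  • x-ms-version: The version of the REST API to use, for example 2019-07-07.
  • x-ms-date: The UTC date and time of the request, for example Tue, 23 Jul 2024 10:00:00 GMT.
  • x-ms-client-request-id: (Optional) A client-generated identifier, such as a GUID, used to correlate requests with server logs.
  • Content-Length: The size of the request body in bytes, for requests that send a body.
  • Content-Type: The media type of the request body, for example text/csv.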
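
As a rough illustration, the headers that appear in the example requests in this document could be assembled as follows. This is a minimal sketch: common_headers is a hypothetical helper rather than part of any Azure SDK, and it assumes bearer-token authentication and the 2019-07-07 version value used in the examples.

```python
import uuid
from email.utils import formatdate


def common_headers(access_token, api_version="2019-07-07", now=None):
    """Return the headers shared by the example requests in this document.

    Hypothetical helper: bearer-token authentication is assumed.
    `now` is an optional Unix timestamp; the current time is used if omitted.
    """
    return {
        "Authorization": f"Bearer {access_token}",
        "x-ms-version": api_version,
        # RFC 1123 date in GMT, e.g. "Tue, 23 Jul 2024 10:00:00 GMT".
        "x-ms-date": formatdate(timeval=now, usegmt=True),
        # A fresh GUID per request helps correlate client and server logs.
        "x-ms-client-request-id": str(uuid.uuid4()),
    }


for name, value in common_headers("<your-access-token>").items():
    print(f"{name}: {value}")
```

Generating a new x-ms-client-request-id for every call, rather than reusing one, keeps individual requests distinguishable when tracing a failure.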

Data Operations

Create Directory

PUT /webhdfs/v1/{filesystem}/{path}?op=MKDIRS&[auth-params]

Creates a directory in the specified filesystem. The path can span multiple directory levels, but every parent directory must already exist.

Parameters:
  • filesystem: The name of the filesystem (container).
  • path: The path of the directory to create.
  • op=MKDIRS: The operation to perform.
  • permission: (Optional) POSIX-style permissions (octal).
Example Request:
PUT /webhdfs/v1/mycontainer/data/raw?op=MKDIRS&permission=777 HTTP/1.1
Host: mydatalakestorage.dfs.core.windows.net
Authorization: Bearer <your-access-token>
x-ms-version: 2019-07-07
x-ms-date: Tue, 23 Jul 2024 10:00:00 GMT
x-ms-client-request-id: a1b2c3d4-e5f6-7890-1234-567890abcdef
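
The request line above can be built programmatically. The sketch below only constructs the URL and sends nothing. operation_url is a hypothetical helper that follows the /webhdfs/v1/{filesystem}/{path}?op=... pattern used in this document, and the account name is the one from the example Host header.

```python
from urllib.parse import quote, urlencode


def operation_url(account, filesystem, path, op, **params):
    """Build a URL in the /webhdfs/v1/{filesystem}/{path}?op=... shape.

    Hypothetical helper: the URL layout is copied from this document,
    not from an SDK.
    """
    # Percent-encode unsafe characters but keep "/" as the segment separator.
    encoded_path = quote(f"{filesystem}/{path.strip('/')}", safe="/")
    query = urlencode({"op": op, **params})
    return (
        f"https://{account}.dfs.core.windows.net"
        f"/webhdfs/v1/{encoded_path}?{query}"
    )


print(operation_url("mydatalakestorage", "mycontainer", "data/raw", "MKDIRS"))
```

Encoding the path matters once directory names contain spaces or other reserved characters; a name such as "my dir" becomes "my%20dir" in the request line.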

Create/Append File

PUT /webhdfs/v1/{filesystem}/{path}?op=CREATE&[auth-params]

Creates a new file or writes data to an existing file. If the file already exists, the data is appended to the end unless overwrite=true is specified, in which case the existing content is replaced.

Parameters:
  • filesystem: The name of the filesystem (container).
  • path: The path of the file to create or append to.
  • op=CREATE: The operation to perform.
  • overwrite: (Optional) Boolean. true replaces the file if it exists; false (the default) appends to it.
  • replication: (Optional) Replication factor.
  • blocksize: (Optional) Block size.
  • permission: (Optional) POSIX-style permissions (octal).
Request Body:

The content of the file to be written.

Example Request:
PUT /webhdfs/v1/mycontainer/data/report.csv?op=CREATE&overwrite=true&permission=644 HTTP/1.1
Host: mydatalakestorage.dfs.core.windows.net
Authorization: Bearer <your-access-token>
x-ms-version: 2019-07-07
Content-Length: 1234
Content-Type: text/csv
x-ms-client-request-id: b2c3d4e5-f6a7-8901-2345-67890abcdef1

<file content...>
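
A request like the one above could be prepared with Python's standard library. This is a sketch under assumptions: create_file_request is a hypothetical helper, the URL shape and headers are copied from the example, and the request object is only constructed, never sent.

```python
from urllib.request import Request


def create_file_request(account, filesystem, path, body, token,
                        overwrite=False, content_type="text/csv"):
    """Prepare (but do not send) a CREATE request like the example above."""
    url = (
        f"https://{account}.dfs.core.windows.net/webhdfs/v1/"
        # The query string expects lowercase true/false, not Python's True/False.
        f"{filesystem}/{path}?op=CREATE&overwrite={str(overwrite).lower()}"
    )
    headers = {
        "Authorization": f"Bearer {token}",
        "x-ms-version": "2019-07-07",
        "Content-Type": content_type,
        # Content-Length must match the body size in bytes.
        "Content-Length": str(len(body)),
    }
    return Request(url, data=body, headers=headers, method="PUT")


request = create_file_request(
    "mydatalakestorage", "mycontainer", "data/report.csv",
    b"id,name\n1,alice\n", "<your-access-token>", overwrite=True,
)
print(request.get_method(), request.full_url)
print(len(request.data), "bytes in body")
```

Passing the prepared object to urllib.request.urlopen would actually issue the call; that step is left out so the sketch runs without network access or credentials.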

Read File

GET /webhdfs/v1/{filesystem}/{path}?op=OPEN[&offset={offset}][&length={length}][&auth-params]

Reads the content of a file. You can specify an offset and length to read a specific portion of the file.

Parameters:
  • filesystem: The name of the filesystem (container).
  • path: The path of the file to read.
  • op=OPEN: The operation to perform.
  • offset: (Optional) The starting byte offset to read from. Defaults to 0.
  • length: (Optional) The number of bytes to read. Defaults to the remainder of the file, starting at offset.
Example Request:
GET /webhdfs/v1/mycontainer/data/report.csv?op=OPEN&offset=1024&length=512 HTTP/1.1
Host: mydatalakestorage.dfs.core.windows.net
Authorization: Bearer <your-access-token>
x-ms-version: 2019-07-07
x-ms-client-request-id: c3d4e5f6-a7b8-9012-3456-7890abcdef12
Response Body:

The content of the file or a portion of it.
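
Because offset and length select a byte range, a large file can be read in pieces. The sketch below computes the (offset, length) pairs needed to cover a file of known size. chunk_ranges is a hypothetical helper, and the 1234-byte size and 512-byte chunk are taken from the examples in this document.

```python
def chunk_ranges(total_size, chunk_size):
    """Yield (offset, length) pairs that cover total_size bytes in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, total_size, chunk_size):
        # The final chunk may be shorter than chunk_size.
        yield offset, min(chunk_size, total_size - offset)


# A 1234-byte file read 512 bytes at a time.
for offset, length in chunk_ranges(1234, 512):
    print(f"op=OPEN&offset={offset}&length={length}")
# op=OPEN&offset=0&length=512
# op=OPEN&offset=512&length=512
# op=OPEN&offset=1024&length=210
```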

Delete Item

DELETE /webhdfs/v1/{filesystem}/{path}?op=DELETE[&recursive={true|false}][&auth-params]

Deletes a file or a directory. Use recursive=true to delete a non-empty directory.

Parameters:
  • filesystem: The name of the filesystem (container).
  • path: The path of the file or directory to delete.
  • op=DELETE: The operation to perform.
  • recursive: (Optional) Boolean. true deletes a directory together with its contents. It can be omitted for files and empty directories, but must be true to delete a non-empty directory.
Example Request:
DELETE /webhdfs/v1/mycontainer/data/old_report.csv?op=DELETE HTTP/1.1
Host: mydatalakestorage.dfs.core.windows.net
Authorization: Bearer <your-access-token>
x-ms-version: 2019-07-07
x-ms-client-request-id: d4e5f6a7-b8c9-0123-4567-890abcdef123
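
The effect of the recursive flag can be illustrated with a small in-memory model. This sketch does not call the service. It represents a filesystem as a set of path strings and applies the rule described above; the delete function and the sample paths are assumptions made for illustration.

```python
def delete(paths, target, recursive=False):
    """Return a copy of `paths` with `target` removed.

    Local model only: a non-empty directory is rejected unless
    recursive=True, mirroring the rule described above.
    """
    target = target.rstrip("/")
    children = {p for p in paths if p.startswith(target + "/")}
    if children and not recursive:
        raise ValueError(f"{target} is not empty; pass recursive=True")
    return paths - children - {target}


tree = {"data", "data/raw", "data/report.csv", "data/old_report.csv"}
tree = delete(tree, "data/old_report.csv")    # a file: no flag needed
tree = delete(tree, "data", recursive=True)   # a non-empty directory
print(tree)  # set()
```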

List Items

GET /webhdfs/v1/{filesystem}/{path}?op=LIST[&auth-params]

Lists the files and directories within a specified directory.

Parameters:
  • filesystem: The name of the filesystem (container).
  • path: The path of the directory to list. Use / to list contents of the root.
  • op=LIST: The operation to perform.
Example Request:
GET /webhdfs/v1/mycontainer/data?op=LIST HTTP/1.1
Host: mydatalakestorage.dfs.core.windows.net
Authorization: Bearer <your-access-token>
x-ms-version: 2019-07-07
x-ms-client-request-id: e5f6a7b8-c9d0-1234-5678-90abcdef1234
Response Body:

A JSON object containing a list of file and directory entries, including name, type, size, and modification time.

{
    "Directory": {
        "name": "data",
        "modificationTime": 1627075200000,
        "accessTime": 0,
        "length": 0,
        "replication": 1,
        "permissions": "33188",
        "owner": "username",
        "group": "groupname",
        "children": [
            {
                "name": "raw",
                "type": "DIRECTORY"
            },
            {
                "name": "report.csv",
                "type": "FILE",
                "length": 1234,
                "modificationTime": 1627075500000,
                "accessTime": 0,
                "replication": 1,
                "permissions": "33188",
                "owner": "username",
                "group": "groupname"
            }
        ]
    }
}
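
A client would typically walk the children array of this response. The sketch below parses a trimmed copy of the sample and prints one line per entry. Two interpretations are assumptions rather than documented facts: that modificationTime is milliseconds since the Unix epoch, and that permissions is a decimal st_mode value (33188 equals octal 100644). describe is a hypothetical helper.

```python
import json
import stat
from datetime import datetime, timezone

# Trimmed copy of the sample response above.
listing = json.loads("""
{"Directory": {"name": "data", "children": [
    {"name": "raw", "type": "DIRECTORY"},
    {"name": "report.csv", "type": "FILE", "length": 1234,
     "modificationTime": 1627075500000, "permissions": "33188"}
]}}
""")


def describe(entry):
    """Summarize one child entry; optional fields may be absent."""
    parts = [entry["type"], entry["name"]]
    if "length" in entry:
        parts.append(f"{entry['length']} bytes")
    if "modificationTime" in entry:
        # Assumption: milliseconds since the Unix epoch.
        seconds = entry["modificationTime"] / 1000
        parts.append(datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat())
    if "permissions" in entry:
        # Assumption: decimal st_mode, so 33188 renders as -rw-r--r--.
        parts.append(stat.filemode(int(entry["permissions"])))
    return " ".join(parts)


for child in listing["Directory"]["children"]:
    print(describe(child))
```

Checking for each key before reading it reflects the sample itself, where the directory entry carries only name and type.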

Get Properties

GET /webhdfs/v1/{filesystem}/{path}?op=GETSTATUS[&auth-params]

Retrieves status information (metadata) for a file or directory.

Parameters:
  • filesystem: The name of the filesystem (container).
  • path: The path of the file or directory.
  • op=GETSTATUS: The operation to perform.
Example Request:
GET /webhdfs/v1/mycontainer/data/report.csv?op=GETSTATUS HTTP/1.1
Host: mydatalakestorage.dfs.core.windows.net
Authorization: Bearer <your-access-token>
x-ms-version: 2019-07-07
x-ms-client-request-id: f6a7b8c9-d0e1-2345-6789-0abcdef12345
Response Body:

A JSON object containing metadata such as name, type, size, modification time, owner, and permissions.

Set Properties

PUT /webhdfs/v1/{filesystem}/{path}?op=SETOWNER&[auth-params]
PUT /webhdfs/v1/{filesystem}/{path}?op=SETPERMISSION&[auth-params]

Sets the ownership or the permissions of a file or directory. Each is changed with its own operation: SETOWNER for the owner and group, SETPERMISSION for the permission bits.

SETOWNER Parameters:
  • filesystem: The name of the filesystem (container).
  • path: The path of the file or directory.
  • op=SETOWNER: The operation to perform.
  • owner: The new owner's username.
  • group: The name of the new owning group.
SETPERMISSION Parameters:
  • filesystem: The name of the filesystem (container).
  • path: The path of the file or directory.
  • op=SETPERMISSION: The operation to perform.
  • permission: POSIX-style permissions (octal).
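
Octal permission strings are easy to mistype, so it can help to validate them before building the query. The following sketch checks the value and renders it as rwx triplets. permission_query and symbolic are hypothetical helpers, and the query layout follows the SETPERMISSION parameters listed above.

```python
import stat
from urllib.parse import urlencode


def permission_query(octal):
    """Validate an octal permission string and return a SETPERMISSION query."""
    mode = int(octal, 8)  # raises ValueError for non-octal digits such as 8 or 9
    if not 0 <= mode <= 0o777:
        raise ValueError(f"permission out of range: {octal}")
    return urlencode({"op": "SETPERMISSION", "permission": f"{mode:03o}"})


def symbolic(octal):
    """Render octal permissions as rwx triplets, for example for logging."""
    # filemode() prefixes a file-type character; drop it to keep the nine bits.
    return stat.filemode(int(octal, 8))[1:]


print(permission_query("644"))  # op=SETPERMISSION&permission=644
print(symbolic("644"))          # rw-r--r--
```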

Set Access Control

PATCH /webhdfs/v1/{filesystem}/{path}?action=setAccessControl[&auth-params]

Sets the Access Control List (ACL) for a file or directory. This operation uses the REST API for Blob Storage with Data Lake Storage extensions.

Parameters:
  • filesystem: The name of the filesystem (container).
  • path: The path of the file or directory.
  • action=setAccessControl: The action to perform.
Request Body:

A JSON object containing the ACL specifications.

{
    "Acl": "user::rwx,user:alice:r-x,group::rwx,mask::rwx,other::r--"
}
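
The ACL string in this body is a comma-separated list of scope:name:permissions entries. As a sketch, it could be assembled from structured data as shown below. format_acl is a hypothetical helper, and it handles only the three-field entry form that appears in the sample.

```python
import json

SCOPES = {"user", "group", "mask", "other"}


def format_acl(entries):
    """Join (scope, name, permissions) tuples into an ACL string."""
    for scope, name, perms in entries:
        if scope not in SCOPES:
            raise ValueError(f"unknown scope: {scope}")
        # Each position must be its flag (r, w, x) or "-".
        if len(perms) != 3 or not all(
            c in (flag, "-") for c, flag in zip(perms, "rwx")
        ):
            raise ValueError(f"bad permissions: {perms}")
    return ",".join(f"{scope}:{name}:{perms}" for scope, name, perms in entries)


entries = [
    ("user", "", "rwx"),        # owning user (empty name)
    ("user", "alice", "r-x"),   # named user
    ("group", "", "rwx"),       # owning group
    ("mask", "", "rwx"),
    ("other", "", "r--"),
]
print(json.dumps({"Acl": format_acl(entries)}))
```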

Get Access Control

GET /webhdfs/v1/{filesystem}/{path}?action=getAccessControl[&auth-params]

Retrieves the Access Control List (ACL) for a file or directory.

Parameters:
  • filesystem: The name of the filesystem (container).
  • path: The path of the file or directory.
  • action=getAccessControl: The action to perform.
Response Body:

A JSON object containing the ACL specifications.

{
    "Acl": "user::rwx,user:alice:r-x,group::rwx,mask::rwx,other::r--"
}
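
To work with the returned ACL, the string can be split back into its entries. This is a minimal sketch: AclEntry and parse_acl are hypothetical names, and only the three-field scope:name:permissions form from the sample is handled.

```python
import json
from typing import NamedTuple


class AclEntry(NamedTuple):
    scope: str        # user, group, mask, or other
    name: str         # empty for the owning user/group, mask, and other
    permissions: str  # for example "r-x"


def parse_acl(acl):
    """Split an ACL string of scope:name:permissions entries."""
    entries = []
    for item in acl.split(","):
        scope, name, permissions = item.split(":")
        entries.append(AclEntry(scope, name, permissions))
    return entries


response = json.loads(
    '{"Acl": "user::rwx,user:alice:r-x,group::rwx,mask::rwx,other::r--"}'
)
entries = parse_acl(response["Acl"])
alice = next(e for e in entries if e.scope == "user" and e.name == "alice")
print(alice.permissions)  # r-x
```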

Response Codes

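Common HTTP status codes returned by Data Lake Storage Gen2 REST API operations:
  • 200 OK: The request succeeded.
  • 201 Created: The file or directory was created.
  • 400 Bad Request: The request is malformed or a parameter is invalid.
  • 401 Unauthorized: Authentication credentials are missing or invalid.
  • 403 Forbidden: The caller is authenticated but not authorized for the operation.
  • 404 Not Found: The specified filesystem or path does not exist.
  • 409 Conflict: The request conflicts with the current state of the resource.
  • 500 Internal Server Error: The service encountered an unexpected error.
  • 503 Service Unavailable: The service is temporarily unable to handle the request.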

Error Handling

When an error occurs, the API typically returns an error response body in JSON format, containing details about the error.

{
    "error": {
        "code": "FileNotFound",
        "message": "The specified file or directory was not found.\nRequestId:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx\nTime:2024-07-23T10:05:00.123Z"
    }
}
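
In the sample above, the request ID and timestamp are embedded in the message text on separate lines. A client that wants them as separate fields could split the message as sketched here. parse_error is a hypothetical helper, and it assumes the message layout matches the sample.

```python
import json


def parse_error(body):
    """Split an error body into code, summary, request ID, and time."""
    error = json.loads(body)["error"]
    # The first line is the human-readable summary; the rest are Key:Value lines.
    summary, *details = error["message"].split("\n")
    fields = dict(line.split(":", 1) for line in details if ":" in line)
    return {
        "code": error["code"],
        "summary": summary,
        "request_id": fields.get("RequestId"),
        "time": fields.get("Time"),
    }


# Raw string so that \n stays a JSON escape until json.loads decodes it.
sample = r'''{"error": {"code": "FileNotFound", "message": "The specified file or directory was not found.\nRequestId:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx\nTime:2024-07-23T10:05:00.123Z"}}'''

info = parse_error(sample)
print(info["code"], info["time"])  # FileNotFound 2024-07-23T10:05:00.123Z
```

Splitting each detail line on the first colon only keeps the timestamp intact, since the time value itself contains colons.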