Azure Data Lake Storage REST API - Data Operations
This document details the REST API operations for managing data within Azure Data Lake Storage Gen2. These operations allow you to interact with your data at a file and directory level.
Introduction
Azure Data Lake Storage Gen2 combines the capabilities of Azure Blob Storage with the functionality of Azure Data Lake Storage Gen1. It provides a hierarchical namespace and is optimized for big data analytics workloads.
Authentication
All Data Lake Storage Gen2 REST API requests must be authenticated. Supported authentication methods include:
- Microsoft Entra ID (formerly Azure Active Directory, Azure AD) service principal or managed identity.
- Shared Key authorization (generally not recommended for production scenarios).
Authentication details are typically provided via the Authorization header.
Common Headers
Several headers are common across many Data Lake Storage Gen2 REST operations:
x-ms-version
: Specifies the API version being used.
Authorization
: Contains the authentication credentials.
x-ms-client-request-id
: A GUID that uniquely identifies the client request.
Accept
: Specifies the desired representation of the response.
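As a rough illustration, the headers listed above could be assembled in Python as follows. The helper name build_common_headers, the default version string (copied from the example requests in this document), and the application/json Accept value are assumptions made for this sketch, not part of the API.

```python
import uuid


def build_common_headers(access_token, api_version="2019-07-07"):
    """Return the common request headers as a dict (hypothetical helper)."""
    return {
        # API version; the default mirrors the examples in this document.
        "x-ms-version": api_version,
        # Bearer token obtained elsewhere (token acquisition is out of scope).
        "Authorization": f"Bearer {access_token}",
        # A fresh GUID per request so the call can be correlated in logs.
        "x-ms-client-request-id": str(uuid.uuid4()),
        # Assumed response representation; adjust as needed.
        "Accept": "application/json",
    }


headers = build_common_headers("<your-access-token>")
```

Generating the request ID inside the helper keeps every call uniquely identifiable without extra work at the call site.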
Data Operations
Create Directory
Creates a directory in the specified filesystem. The path can include multiple directory levels. All parent directories must exist.
Parameters:
filesystem
: The name of the filesystem (container).
path
: The path of the directory to create.
op=MKDIRS
: The operation to perform.
perm
: (Optional) POSIX-style permissions (octal).
Example Request:
PUT /webhdfs/v1/mycontainer/data/raw?op=MKDIRS&perm=777 HTTP/1.1
Host: mydatalakestorage.dfs.core.windows.net
Authorization: Bearer <your-access-token>
x-ms-version: 2019-07-07
x-ms-date: Tue, 23 Jul 2024 10:00:00 GMT
x-ms-client-request-id: a1b2c3d4-e5f6-7890-1234-567890abcdef
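The request line above can be reproduced programmatically. The following is a minimal sketch that assumes the URL shape used in this document's examples and an https scheme; build_operation_url is a hypothetical helper name, not a library function.

```python
from urllib.parse import quote, urlencode


def build_operation_url(account, filesystem, path, op, **params):
    """Build a request URL in the style of the examples here (hypothetical helper)."""
    # Percent-encode the path but keep "/" as the segment separator.
    resource = quote(f"/webhdfs/v1/{filesystem}/{path.strip('/')}", safe="/")
    # The operation goes first, followed by any optional parameters.
    query = urlencode({"op": op, **params})
    # Assumption: requests are sent over https to the account's dfs endpoint.
    return f"https://{account}.dfs.core.windows.net{resource}?{query}"


url = build_operation_url("mydatalakestorage", "mycontainer", "data/raw", "MKDIRS", perm="777")
print(url)
```

With these inputs the path and query string match the example request line for creating data/raw.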
Create/Append File
Creates a new file or appends data to an existing file. If the file exists, data is appended to the end.
Parameters:
filesystem
: The name of the filesystem (container).
path
: The path of the file to create or append to.
op=CREATE
: The operation to perform.
overwrite
: (Optional) Boolean. true to overwrite the file if it exists; false to append (default).
replication
: (Optional) Replication factor.
blocksize
: (Optional) Block size.
permission
: (Optional) POSIX-style permissions (octal).
Request Body:
The content of the file to be written.
Example Request:
PUT /webhdfs/v1/mycontainer/data/report.csv?op=CREATE&overwrite=true&permission=644 HTTP/1.1
Host: mydatalakestorage.dfs.core.windows.net
Authorization: Bearer <your-access-token>
x-ms-version: 2019-07-07
Content-Length: 1234
Content-Type: text/csv
x-ms-client-request-id: b2c3d4e5-f6a7-8901-2345-67890abcdef1
<file content...>
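A request like the one above could be prepared with Python's standard library, as in the sketch below. It only builds the request object and does not send it. The function name build_create_request, the https scheme, and the fixed text/csv content type are assumptions for illustration.

```python
import urllib.request
from urllib.parse import urlencode


def build_create_request(account, filesystem, path, data, token, overwrite=False):
    """Prepare (but do not send) a CREATE request; names are illustrative."""
    query = urlencode({"op": "CREATE", "overwrite": str(overwrite).lower()})
    # Assumption: URL shape follows the example requests in this document.
    url = f"https://{account}.dfs.core.windows.net/webhdfs/v1/{filesystem}/{path}?{query}"
    headers = {
        "Authorization": f"Bearer {token}",
        "x-ms-version": "2019-07-07",
        "Content-Type": "text/csv",
        # Content-Length must match the number of bytes in the body.
        "Content-Length": str(len(data)),
    }
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")


req = build_create_request(
    "mydatalakestorage", "mycontainer", "data/report.csv",
    b"id,value\n1,42\n", "<your-access-token>", overwrite=True,
)
# Sending would be urllib.request.urlopen(req); it is not executed in this sketch.
```

Computing Content-Length from the body avoids the mismatch that can occur when the header is hard-coded.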
Read File
Reads the content of a file. You can specify an offset and length to read a specific portion of the file.
Parameters:
filesystem
: The name of the filesystem (container).
path
: The path of the file to read.
op=OPEN
: The operation to perform.
offset
: (Optional) The starting byte offset to read from. Defaults to 0.
length
: (Optional) The number of bytes to read. Defaults to reading the entire remaining file.
Example Request:
GET /webhdfs/v1/mycontainer/data/report.csv?op=OPEN&offset=1024&length=512 HTTP/1.1
Host: mydatalakestorage.dfs.core.windows.net
Authorization: Bearer <your-access-token>
x-ms-version: 2019-07-07
x-ms-client-request-id: c3d4e5f6-a7b8-9012-3456-7890abcdef12
Response Body:
The content of the file or a portion of it.
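The offset and length parameters make it possible to read a large file in fixed-size pieces. The sketch below only computes the (offset, length) pairs for such a read; chunk_ranges is a hypothetical helper, and the file size is assumed to be known in advance (for example, from a status call).

```python
def chunk_ranges(file_size, chunk_size):
    """Yield (offset, length) pairs that together cover file_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    offset = 0
    while offset < file_size:
        # The final chunk may be shorter than chunk_size.
        length = min(chunk_size, file_size - offset)
        yield offset, length
        offset += length


# Query strings for reading a 1234-byte file in 512-byte chunks.
for offset, length in chunk_ranges(1234, 512):
    print(f"op=OPEN&offset={offset}&length={length}")
```

Each pair maps directly onto the offset and length query parameters of one read request.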
Delete Item
Deletes a file or a directory. Use recursive=true to delete a non-empty directory.
Parameters:
filesystem
: The name of the filesystem (container).
path
: The path of the file or directory to delete.
op=DELETE
: The operation to perform.
recursive
: (Optional) Boolean. true to delete a directory and its contents; false otherwise. Required for non-empty directories.
Example Request:
DELETE /webhdfs/v1/mycontainer/data/old_report.csv?op=DELETE HTTP/1.1
Host: mydatalakestorage.dfs.core.windows.net
Authorization: Bearer <your-access-token>
x-ms-version: 2019-07-07
x-ms-client-request-id: d4e5f6a7-b8c9-0123-4567-890abcdef123
List Items
Lists the files and directories within a specified directory.
Parameters:
filesystem
: The name of the filesystem (container).
path
: The path of the directory to list. Use / to list the contents of the root.
op=LIST
: The operation to perform.
Example Request:
GET /webhdfs/v1/mycontainer/data?op=LIST HTTP/1.1
Host: mydatalakestorage.dfs.core.windows.net
Authorization: Bearer <your-access-token>
x-ms-version: 2019-07-07
x-ms-client-request-id: e5f6a7b8-c9d0-1234-5678-90abcdef1234
Response Body:
A JSON object containing a list of file and directory entries, including name, type, size, and modification time.
{
  "Directory": {
    "name": "data",
    "modificationTime": 1627075200000,
    "accessTime": 0,
    "length": 0,
    "replication": 1,
    "permissions": "33188",
    "owner": "username",
    "group": "groupname",
    "children": [
      {
        "name": "raw",
        "type": "DIRECTORY"
      },
      {
        "name": "report.csv",
        "type": "FILE",
        "length": 1234,
        "modificationTime": 1627075500000,
        "accessTime": 0,
        "replication": 1,
        "permissions": "33188",
        "owner": "username",
        "group": "groupname"
      }
    ]
  }
}
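A listing response of this shape could be split into directories and files as sketched below. The sample string is a trimmed copy of the response above, and split_children is a hypothetical helper; the field names are taken from the sample and may differ in practice.

```python
import json

# Trimmed copy of the sample listing response from this section.
sample = """
{"Directory": {"name": "data", "children": [
    {"name": "raw", "type": "DIRECTORY"},
    {"name": "report.csv", "type": "FILE", "length": 1234}
]}}
"""


def split_children(listing_json):
    """Return (directories, files) as lists of names from a listing response."""
    # An empty directory is assumed to have no "children" key or an empty list.
    children = json.loads(listing_json)["Directory"].get("children", [])
    dirs = [c["name"] for c in children if c.get("type") == "DIRECTORY"]
    files = [c["name"] for c in children if c.get("type") == "FILE"]
    return dirs, files


dirs, files = split_children(sample)
print(dirs, files)
```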
Get Properties
Retrieves status information (metadata) for a file or directory.
Parameters:
filesystem
: The name of the filesystem (container).
path
: The path of the file or directory.
op=GETSTATUS
: The operation to perform.
Example Request:
GET /webhdfs/v1/mycontainer/data/report.csv?op=GETSTATUS HTTP/1.1
Host: mydatalakestorage.dfs.core.windows.net
Authorization: Bearer <your-access-token>
x-ms-version: 2019-07-07
x-ms-client-request-id: f6a7b8c9-d0e1-2345-6789-0abcdef12345
Response Body:
A JSON object containing metadata such as name, type, size, modification time, owner, and permissions.
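The exact shape of the status response is not shown here, so the following sketch assumes it uses the same field names as the file entries in the List Items sample (name, type, length, modificationTime) and that modificationTime is expressed in milliseconds since the Unix epoch. Both points are assumptions, and summarize_status is a hypothetical helper.

```python
from datetime import datetime, timezone


def summarize_status(status):
    """Format a status dict as a one-line summary (field names assumed)."""
    # Assumption: modificationTime is milliseconds since the Unix epoch (UTC).
    modified = datetime.fromtimestamp(status["modificationTime"] / 1000, tz=timezone.utc)
    return (
        f'{status["type"]} {status["name"]} {status["length"]} bytes, '
        f"modified {modified.isoformat()}"
    )


# Values borrowed from the report.csv entry in the List Items sample.
status = {"name": "report.csv", "type": "FILE", "length": 1234,
          "modificationTime": 1627075500000}
summary = summarize_status(status)
print(summary)
```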
Set Properties
Sets ownership or permissions for a file or directory. These are often separate operations.
SETOWNER
Parameters:
filesystem
: The name of the filesystem (container).
path
: The path of the file or directory.
op=SETOWNER
: The operation to perform.
owner
: The new owner's username.
group
: The new owner's group name.
SETPERMISSION
Parameters:
filesystem
: The name of the filesystem (container).
path
: The path of the file or directory.
op=SETPERMISSION
: The operation to perform.
permission
: POSIX-style permissions (octal).
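Octal permission values can be hard to read at a glance. The sketch below validates a three-digit octal string before it is sent and renders it in the familiar rwx form; octal_to_symbolic is a hypothetical helper, and the three-digit restriction is an assumption of this example rather than a documented rule.

```python
def octal_to_symbolic(octal):
    """Convert an octal permission string such as '644' to 'rw-r--r--'."""
    # Assumption: exactly three digits (owner, group, other), no special bits.
    if len(octal) != 3 or any(ch not in "01234567" for ch in octal):
        raise ValueError(f"expected three octal digits, got {octal!r}")
    symbols = ""
    for digit in octal:
        bits = int(digit)
        symbols += "r" if bits & 4 else "-"
        symbols += "w" if bits & 2 else "-"
        symbols += "x" if bits & 1 else "-"
    return symbols


print(octal_to_symbolic("644"))
```

The sample listing in this document reports permissions as "33188"; if that value is a decimal file mode (an assumption), oct(33188) gives 0o100644, whose last three digits are the same 644 used in the Create/Append example.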
Set Access Control
Sets the Access Control List (ACL) for a file or directory. This operation uses the REST API for Blob Storage with Data Lake Storage extensions.
Parameters:
filesystem
: The name of the filesystem (container).
path
: The path of the file or directory.
action=setAccessControl
: The action to perform.
Request Body:
A JSON object containing the ACL specifications.
{
  "Acl": "user::rwx,user:alice:r-x,group::rwx,mask::rwx,other::r--"
}
Get Access Control
Retrieves the Access Control List (ACL) for a file or directory.
Parameters:
filesystem
: The name of the filesystem (container).path
: The path of the file or directory.action=getAccessControl
: The action to perform.
Response Body:
A JSON object containing the ACL specifications.
{
  "Acl": "user::rwx,user:alice:r-x,group::rwx,mask::rwx,other::r--"
}
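The ACL string is a comma-separated list of entries of the form type:qualifier:permissions, where the qualifier is empty for the owning user, owning group, mask, and other. A minimal parsing sketch follows; parse_acl is a hypothetical helper, and the handling of a leading "default" scope follows the usual POSIX ACL text form rather than anything stated in this document.

```python
def parse_acl(acl):
    """Parse a comma-separated POSIX-style ACL string into a list of dicts."""
    entries = []
    for item in acl.split(","):
        parts = item.strip().split(":")
        # A leading "default" marks a default-scope entry in POSIX ACL text.
        scope = "access"
        if parts[0] == "default":
            scope, parts = "default", parts[1:]
        if len(parts) != 3:
            raise ValueError(f"malformed ACL entry: {item!r}")
        entry_type, qualifier, perms = parts
        entries.append({
            "scope": scope,
            "type": entry_type,
            # An empty qualifier means the entry applies to the owner/group itself.
            "qualifier": qualifier or None,
            "permissions": perms,
        })
    return entries


entries = parse_acl("user::rwx,user:alice:r-x,group::rwx,mask::rwx,other::r--")
for entry in entries:
    print(entry)
```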
Response Codes
Common HTTP status codes returned by Data Lake Storage Gen2 REST API operations:
200 OK
: The request was successful.
201 Created
: The resource was successfully created.
204 No Content
: The request was successful, but there is no content to return (e.g., DELETE).
400 Bad Request
: The request was malformed or invalid.
401 Unauthorized
: Authentication failed.
403 Forbidden
: The authenticated user does not have permission to perform the operation.
404 Not Found
: The specified resource was not found.
409 Conflict
: The request conflicts with the current state of the resource (e.g., trying to create a file that already exists with overwrite=false).
500 Internal Server Error
: An unexpected error occurred on the server.
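Client code often groups these status codes into a few handling categories. The sketch below shows one possible grouping; the category names and the choice to treat all 5xx responses as retryable are assumptions of this example, not guidance from the service.

```python
def classify_status(status_code):
    """Map an HTTP status code to a coarse handling category (illustrative)."""
    if status_code in (200, 201, 204):
        return "success"
    if status_code in (401, 403):
        return "auth"          # refresh credentials or check permissions
    if status_code == 404:
        return "missing"       # the resource does not exist
    if status_code == 409:
        return "conflict"      # e.g. the file already exists
    if 500 <= status_code < 600:
        return "retry"         # assumption: server errors may succeed on retry
    return "client_error"      # other 4xx responses, such as 400


for code in (201, 403, 409, 500):
    print(code, classify_status(code))
```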
Error Handling
When an error occurs, the API typically returns an error response body in JSON format, containing details about the error.
{
  "error": {
    "code": "FileNotFound",
    "message": "The specified file or directory was not found.\nRequestId:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx\nTime:2024-07-23T10:05:00.123Z"
  }
}
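In the sample above, the message field carries the request ID and timestamp on separate lines after the human-readable text. Assuming error bodies follow that layout, the details could be pulled apart as sketched here; parse_error is a hypothetical helper.

```python
import json


def parse_error(body):
    """Extract code, message, request ID, and time from an error body."""
    error = json.loads(body)["error"]
    # Assumption: extra lines in the message have the form "Key:value".
    first_line, *extra = error["message"].split("\n")
    details = dict(line.split(":", 1) for line in extra if ":" in line)
    return {
        "code": error["code"],
        "message": first_line,
        "request_id": details.get("RequestId"),
        "time": details.get("Time"),
    }


# Same content as the sample error response in this section.
body = json.dumps({"error": {
    "code": "FileNotFound",
    "message": "The specified file or directory was not found.\n"
               "RequestId:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx\n"
               "Time:2024-07-23T10:05:00.123Z",
}})
info = parse_error(body)
print(info)
```

Logging the request ID alongside the x-ms-client-request-id sent with the call makes it easier to trace a failed operation.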