Azure Data Lake Storage Gen2
Access Control Reference
This document is a reference for understanding and managing access control in Azure Data Lake Storage Gen2.
Understanding Access Control Models
Data Lake Storage Gen2 supports two primary access control models:
- Azure Role-Based Access Control (RBAC): This model grants coarse-grained access through role assignments scoped to a storage account or container (or to a parent resource group or subscription). Built-in data roles such as Storage Blob Data Owner, Storage Blob Data Contributor, and Storage Blob Data Reader determine who can read, write, or delete data anywhere within that scope.
- Access Control Lists (ACLs): This model provides fine-grained, POSIX-like permissions at the file and directory level. Each file and directory has its own ACL that grants read, write, and execute permissions to users and groups. A new item receives its initial ACL from the parent directory's default ACL at creation time; later changes to the parent are not applied to existing children automatically.
RBAC Roles for Data Lake Storage Gen2
The following are key RBAC roles for managing data in Data Lake Storage Gen2:
| Role Name | Description | Permissions |
|---|---|---|
| Storage Blob Data Owner | Full control over blob data, including ownership. | Read, Write, Delete, Set ACLs, Set Ownership |
| Storage Blob Data Contributor | Read, Write, and Delete blob data. | Read, Write, Delete |
| Storage Blob Data Reader | Read blob data. | Read |
ACL Permissions
ACLs use a set of permissions for the owner, owning group, and others, similar to POSIX permissions:
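- Read (r): On a file, allows reading its contents. On a directory, allows listing its contents when combined with Execute.
- Write (w): On a file, allows writing or appending to it. On a directory, allows creating, deleting, or renaming child items when combined with Execute.
- Execute (x): On a directory, allows traversing it to reach child items. On a file, Execute has no meaning in Data Lake Storage Gen2.
Each entry in an ACL specifies a principal (a user, group, service principal, or managed identity) and the associated permissions. In the short form, permissions are written as three characters, with a hyphen for each permission that is not granted, for example rwx, r-x, or ---.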
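ACL entries are commonly written in a short form such as user::rwx,group::r-x,other::---. The following is a minimal Python sketch of how such a string could be parsed and validated before it is sent to the service. The function name parse_acl and the dictionary layout are hypothetical choices for illustration, not part of any Azure SDK, and the pattern is deliberately stricter than necessary in some places (for example, it does not check that "mask" and "other" entries have an empty entity).

```python
import re

# One short-form entry: [default:]type:[entity]:perms, where perms is exactly
# three characters drawn from r/w/x, with "-" for a permission not granted.
ENTRY_RE = re.compile(
    r"^(?P<default>default:)?"
    r"(?P<type>user|group|mask|other):"
    r"(?P<entity>[^:,]*):"
    r"(?P<perms>[r-][w-][x-])$"
)


def parse_acl(acl):
    """Parse a comma-separated short-form ACL string into a list of dicts."""
    entries = []
    for raw in acl.split(","):
        match = ENTRY_RE.match(raw.strip())
        if match is None:
            raise ValueError(f"invalid ACL entry: {raw!r}")
        perms = match["perms"]
        entries.append({
            "default": match["default"] is not None,
            "type": match["type"],
            # An empty entity refers to the owning user or owning group.
            "entity": match["entity"] or None,
            "read": perms[0] == "r",
            "write": perms[1] == "w",
            "execute": perms[2] == "x",
        })
    return entries
```

With this sketch, an abbreviated entry such as other::rx is rejected with a ValueError, which catches a common mistake before the string reaches a CLI or SDK call.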
Managing Access Control
Using Azure Portal
You can manage RBAC roles through the Azure portal by navigating to your storage account, selecting "Access control (IAM)", and assigning roles. ACLs can be managed for individual files and directories via the portal's "Containers" or "Data Lake Storage" views.
Using Azure CLI
The Azure CLI provides commands for managing both RBAC and ACLs:
# Assign an RBAC role
az role assignment create --role "Storage Blob Data Contributor" --assignee "user@example.com" --scope "/subscriptions/{subId}/resourceGroups/{rgName}/providers/Microsoft.Storage/storageAccounts/{accountName}/blobServices/default/containers/{containerName}"
# Set the ACL for a directory (each permission set is three characters; "-" means not granted)
az storage fs access set --acl "user::rwx,group::rwx,other::r-x" --path "my/directory" --file-system mycontainer --account-name mydatalakestorage --auth-mode login
# Set the ACL for a file
az storage fs access set --acl "user::rw-,group::r--,other::---" --path "my/file.txt" --file-system mycontainer --account-name mydatalakestorage --auth-mode login
Using Azure PowerShell
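Azure PowerShell also offers cmdlets for access control management. The ACL examples assume $ctx holds a storage context for the account, for example one returned by New-AzStorageContext: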
# Assign an RBAC role
New-AzRoleAssignment -ObjectId (Get-AzADUser -UserPrincipalName "user@example.com").Id -RoleDefinitionName "Storage Blob Data Contributor" -Scope "/subscriptions/{subId}/resourceGroups/{rgName}/providers/Microsoft.Storage/storageAccounts/{accountName}"
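# Set the ACL for a directory: build the ACL object entry by entry, then apply it
$acl = Set-AzDataLakeGen2ItemAclObject -AccessControlType user -Permission "rwx"
$acl = Set-AzDataLakeGen2ItemAclObject -AccessControlType group -Permission "rwx" -InputObject $acl
$acl = Set-AzDataLakeGen2ItemAclObject -AccessControlType other -Permission "r-x" -InputObject $acl
Update-AzDataLakeGen2Item -Context $ctx -FileSystem "mycontainer" -Path "my/directory" -Acl $acl
# Set the ACL for a file
$acl = Set-AzDataLakeGen2ItemAclObject -AccessControlType user -Permission "rw-"
$acl = Set-AzDataLakeGen2ItemAclObject -AccessControlType group -Permission "r--" -InputObject $acl
$acl = Set-AzDataLakeGen2ItemAclObject -AccessControlType other -Permission "---" -InputObject $acl
Update-AzDataLakeGen2Item -Context $ctx -FileSystem "mycontainer" -Path "my/file.txt" -Acl $acl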
Using SDKs
Programmatic management of access control is possible using Azure SDKs for various languages (Python, Java, .NET, Node.js). Refer to the respective SDK documentation for specific methods.
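As an illustration for Python, the sketch below shows one way a directory ACL might be set with the azure-storage-file-datalake and azure-identity packages. The helper names build_service_client and set_directory_acl are hypothetical, the account and container names are the placeholders used elsewhere in this document, and the sketch assumes the caller is the item's owner or holds the Storage Blob Data Owner role. Check the SDK reference for the version you have installed before relying on it.

```python
def build_service_client(account_name):
    """Create a client for the account's Data Lake (dfs) endpoint."""
    # Imported here so set_directory_acl can be exercised without the SDK installed.
    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    return DataLakeServiceClient(
        account_url=f"https://{account_name}.dfs.core.windows.net",
        credential=DefaultAzureCredential(),
    )


def set_directory_acl(service_client, file_system, path, acl):
    """Replace the ACL on one directory with the given short-form string."""
    directory = (
        service_client
        .get_file_system_client(file_system)
        .get_directory_client(path)
    )
    return directory.set_access_control(acl=acl)


# Example call (requires installed packages, credentials, and network access):
# client = build_service_client("mydatalakestorage")
# set_directory_acl(client, "mycontainer", "my/directory",
#                   "user::rwx,group::rwx,other::r-x")
```

Passing the client in as a parameter keeps the ACL logic separate from authentication, so the same helper works with any credential type the SDK supports.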
ACL Inheritance and Propagation
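When you create a new file or directory, its initial ACL is taken from the default ACL of its parent directory, if one is set. Two kinds of ACL are involved:
- Default ACLs: These exist only on directories and act as templates for newly created child items. A new child file receives the parent's default ACL as its access ACL; a new child directory receives it as both its access ACL and its own default ACL.
- Access ACLs: These control access to an object. Both files and directories have access ACLs.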
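Because changing a parent's default ACL does not affect existing children, ACLs can also be set or updated recursively across an entire directory tree, which is the practical way to change permissions on large existing datasets.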
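The way a new child item picks up its ACL from the parent's default ACL can be modeled in a few lines of Python. This is a simplified sketch, not the service's actual algorithm: it ignores the mask, the umask, and any permissions supplied explicitly when the item is created, and the function name child_acls is hypothetical.

```python
def child_acls(parent_default_acl, is_directory):
    """Return (access_acl, default_acl) for a new child item.

    parent_default_acl holds the parent's default entries in short form,
    e.g. "default:user::rwx,default:group::r-x,default:other::---".
    """
    prefix = "default:"
    entries = [e.strip() for e in parent_default_acl.split(",") if e.strip()]
    for entry in entries:
        if not entry.startswith(prefix):
            raise ValueError(f"not a default entry: {entry!r}")
    # The child's access ACL is the parent's default ACL without the prefix.
    access_acl = ",".join(e[len(prefix):] for e in entries)
    # Only directories carry a default ACL forward to their own children.
    default_acl = ",".join(entries) if is_directory else ""
    return access_acl, default_acl
```

Calling it twice with the same parent ACL, once with is_directory=True and once with False, shows the difference described above: the directory keeps a default ACL for its future children, while the file ends up with an access ACL only.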
Best Practices for Access Control
- Principle of Least Privilege: Grant users and service principals only the permissions they need to perform their tasks.
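- Use RBAC for Coarse-Grained Access: Assign RBAC roles when a principal needs the same level of access to all data in a storage account or container.
- Use ACLs for Fine-Grained Control: Employ ACLs for granular permissions on files and directories within containers. An ACL cannot restrict access that an RBAC role assignment already grants, so keep data role assignments narrow when ACLs are meant to be the deciding control.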
- Leverage Service Principals: Use service principals for applications and services that need to access Data Lake Storage Gen2.
- Regularly Audit Permissions: Periodically review and audit access control settings to ensure security.
- Understand Default ACLs: Configure default ACLs to streamline permission management for new data.
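As one small example of what a permission audit might check, the Python sketch below flags ACL entries that grant any permission to "other". The rule itself and the function name risky_entries are hypothetical; a real audit would apply whatever policy your organization defines and would read the ACL strings from the service rather than from literals.

```python
def risky_entries(acl):
    """Return the entries of a short-form ACL that grant anything to 'other'."""
    flagged = []
    for raw in acl.split(","):
        entry = raw.strip()
        parts = entry.split(":")
        # Drop the optional "default" scope prefix before inspecting the entry.
        if parts[0] == "default":
            parts = parts[1:]
        if len(parts) != 3:
            raise ValueError(f"invalid ACL entry: {entry!r}")
        kind, _entity, perms = parts
        if kind == "other" and perms != "---":
            flagged.append(entry)
    return flagged
```

An empty result means no entry in that ACL opens the item to principals outside the owner, owning group, and named entries.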