Azure Data Lake Storage Gen2

Access Control Reference

This document provides a comprehensive reference for understanding and managing access control within Azure Data Lake Storage Gen2.

Note: Azure Data Lake Storage Gen2 combines the capabilities of Azure Data Lake Storage Gen1 with Azure Blob Storage. Access control mechanisms leverage both Azure role-based access control (RBAC) and Access Control Lists (ACLs).

Understanding Access Control Models

Data Lake Storage Gen2 supports two primary access control models:

  • Azure Role-Based Access Control (RBAC): This model grants coarse-grained access at the storage account or container scope. A role assignment determines which security principals can read, write, or delete data anywhere within that scope. Common data roles include Storage Blob Data Owner, Storage Blob Data Contributor, and Storage Blob Data Reader.
  • Access Control Lists (ACLs): This model provides fine-grained, POSIX-like permissions at the file and directory level. Each file and directory carries its own ACL, and a new item takes its initial ACL from the default ACL of its parent directory. This allows granular control over read, write, and execute permissions for users and groups.

RBAC Roles for Data Lake Storage Gen2

The following are key RBAC roles for managing data in Data Lake Storage Gen2:

Role Name                     | Description                                       | Permissions
Storage Blob Data Owner       | Full control over blob data, including ownership. | Read, Write, Delete, Set ACLs, Set Ownership
Storage Blob Data Contributor | Read, Write, and Delete blob data.                | Read, Write, Delete
Storage Blob Data Reader      | Read blob data.                                   | Read
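
As a rough illustration of how the table can guide role selection, the following Python sketch picks the narrowest of the three roles that covers a set of required operations. The operation names ("read", "set_acl", and so on) are informal labels taken from the table above, not Azure action strings, and the helper name is hypothetical.

```python
# Informal model of the role table; operation names are illustrative labels only.
ROLE_PERMISSIONS = {
    "Storage Blob Data Owner": {"read", "write", "delete", "set_acl", "set_owner"},
    "Storage Blob Data Contributor": {"read", "write", "delete"},
    "Storage Blob Data Reader": {"read"},
}


def least_privileged_role(required):
    """Return the role with the fewest permissions that still covers `required`."""
    required = set(required)
    candidates = [
        (len(ops), role)
        for role, ops in ROLE_PERMISSIONS.items()
        if required <= ops
    ]
    # min() compares the permission count first, so the narrowest role wins.
    return min(candidates)[1] if candidates else None


reader = least_privileged_role({"read"})
writer = least_privileged_role({"read", "write"})
owner = least_privileged_role({"set_acl"})
```

Here `reader`, `writer`, and `owner` resolve to the Reader, Contributor, and Owner roles respectively. A request for an operation that no role in the table covers returns `None`.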

ACL Permissions

ACLs use a set of permissions for the owner, owning group, and others, similar to POSIX permissions:

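  • Read (r): On a file, allows reading its contents. On a directory, allows listing its contents when combined with Execute.
  • Write (w): On a file, allows writing or appending to it. On a directory, allows creating, deleting, or renaming child items when combined with Execute.
  • Execute (x): On a directory, allows traversing it to reach child items. On a file, it has no meaning in Data Lake Storage Gen2, because the service stores files and does not run them.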

Each entry in an ACL specifies a security principal (user, group, service principal, or managed identity) and the associated permissions.
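
To make the entry format concrete, here is a minimal Python sketch that parses a short-form ACL string such as "user::rwx,group::r-x,other::---" into entries and converts a permission triplet to its octal digit. The `AclEntry` type and both function names are hypothetical helpers for illustration, not part of any Azure SDK. The sketch assumes each permission set is written as exactly three characters, with "-" for a permission that is not granted.

```python
from typing import List, NamedTuple


class AclEntry(NamedTuple):
    scope: str      # "access" or "default"
    kind: str       # "user", "group", "mask", or "other"
    principal: str  # empty string means the owning user or owning group
    perms: str      # three characters, e.g. "r-x"


def parse_acl(acl: str) -> List[AclEntry]:
    """Parse a comma-separated short-form ACL string into entries."""
    entries = []
    for raw in acl.split(","):
        parts = raw.strip().split(":")
        scope = "access"
        if parts[0] == "default":
            scope, parts = "default", parts[1:]
        if len(parts) != 3:
            raise ValueError(f"malformed ACL entry: {raw!r}")
        kind, principal, perms = parts
        if kind not in {"user", "group", "mask", "other"}:
            raise ValueError(f"unknown entry type: {kind!r}")
        # Each position must be its flag letter or "-", in r, w, x order.
        if len(perms) != 3 or any(
            c not in (flag, "-") for c, flag in zip(perms, "rwx")
        ):
            raise ValueError(f"bad permission string: {perms!r}")
        entries.append(AclEntry(scope, kind, principal, perms))
    return entries


def to_octal(perms: str) -> int:
    """Convert a triplet such as "r-x" to its octal digit (here 5)."""
    return sum(bit for c, bit in zip(perms, (4, 2, 1)) if c != "-")


entries = parse_acl("user::rwx,group::r-x,other::---")
digits = [to_octal(e.perms) for e in entries]  # [7, 5, 0]
```

A two-character permission set such as "rx" is rejected by this parser, which mirrors the three-character convention assumed above.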

Managing Access Control

Using Azure Portal

You can manage RBAC roles through the Azure portal by navigating to your storage account, selecting "Access control (IAM)", and assigning roles. ACLs can be managed for individual files and directories via the portal's "Containers" or "Data Lake Storage" views.

Using Azure CLI

The Azure CLI provides commands for managing both RBAC and ACLs:

# Assign an RBAC role
az role assignment create --role "Storage Blob Data Contributor" --assignee "user@example.com" --scope "/subscriptions/{subId}/resourceGroups/{rgName}/providers/Microsoft.Storage/storageAccounts/{accountName}/blobServices/default/containers/{containerName}"

# Set ACLs for a directory (each permission set is three characters; "-" marks a permission that is not granted)
az storage fs access set --acl "user::rwx,group::rwx,other::r-x" --path "my/directory" --account-name mydatalakestorage --file-system mycontainer

# Set ACLs for a file
az storage fs access set --acl "user::rw-,group::r--,other::---" --path "my/file.txt" --account-name mydatalakestorage --file-system mycontainer
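
The --scope value in the role assignment command decides how far the role reaches. As a small sketch, the following Python helper (a hypothetical function, not part of the Azure CLI or SDK) builds the scope string for either a whole storage account or a single container, following the resource ID pattern used in the command above.

```python
from typing import Optional


def storage_scope(
    sub_id: str,
    rg_name: str,
    account_name: str,
    container_name: Optional[str] = None,
) -> str:
    """Build an RBAC scope for a storage account or one of its containers."""
    scope = (
        f"/subscriptions/{sub_id}/resourceGroups/{rg_name}"
        f"/providers/Microsoft.Storage/storageAccounts/{account_name}"
    )
    if container_name:
        # Narrow the scope to a single container.
        scope += f"/blobServices/default/containers/{container_name}"
    return scope


account_scope = storage_scope("sub-1", "rg-data", "mydatalakestorage")
container_scope = storage_scope("sub-1", "rg-data", "mydatalakestorage", "mycontainer")
```

The identifiers "sub-1" and "rg-data" are placeholders. Preferring the container scope where it suffices keeps role assignments narrower than an account-wide assignment.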

Using Azure PowerShell

Azure PowerShell also offers cmdlets for access control management:

# Assign an RBAC role
New-AzRoleAssignment -ObjectId (Get-AzADUser -UserPrincipalName "user@example.com").Id -RoleDefinitionName "Storage Blob Data Contributor" -Scope "/subscriptions/{subId}/resourceGroups/{rgName}/providers/Microsoft.Storage/storageAccounts/{accountName}"

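# Set ACLs for a directory (Az.Storage module). $ctx is a storage context, for example from New-AzStorageContext.
# Set-AzDataLakeGen2ItemAclObject builds the ACL in memory; Update-AzDataLakeGen2Item applies it.
$acl = Set-AzDataLakeGen2ItemAclObject -AccessControlType user -Permission "rwx"
$acl = Set-AzDataLakeGen2ItemAclObject -AccessControlType group -Permission "rwx" -InputObject $acl
$acl = Set-AzDataLakeGen2ItemAclObject -AccessControlType other -Permission "r-x" -InputObject $acl
Update-AzDataLakeGen2Item -Context $ctx -FileSystem "mycontainer" -Path "my/directory" -Acl $acl

# Set ACLs for a file (Az.Storage module)
$acl = Set-AzDataLakeGen2ItemAclObject -AccessControlType user -Permission "rw-"
$acl = Set-AzDataLakeGen2ItemAclObject -AccessControlType group -Permission "r--" -InputObject $acl
$acl = Set-AzDataLakeGen2ItemAclObject -AccessControlType other -Permission "---" -InputObject $acl
Update-AzDataLakeGen2Item -Context $ctx -FileSystem "mycontainer" -Path "my/file.txt" -Acl $acl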

Using SDKs

Programmatic management of access control is possible using Azure SDKs for various languages (Python, Java, .NET, Node.js). Refer to the respective SDK documentation for specific methods.
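
As one possible starting point, the sketch below uses the Python packages azure-storage-file-datalake and azure-identity to set and read back the ACL of a directory. It assumes both packages are installed and that the signed-in identity is allowed to change ACLs on the target. The two function names, and the account, container, and path names in the usage note, are placeholders rather than anything prescribed by the SDK.

```python
def dfs_account_url(account_name: str) -> str:
    """Return the Data Lake (dfs) endpoint URL for a storage account."""
    return f"https://{account_name}.dfs.core.windows.net"


def set_directory_acl(account_name: str, file_system: str, directory: str, acl: str):
    """Set the access ACL on a directory and return its access control properties."""
    # Imported lazily so that dfs_account_url works without the Azure packages.
    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(
        dfs_account_url(account_name),
        credential=DefaultAzureCredential(),
    )
    directory_client = service.get_file_system_client(file_system).get_directory_client(
        directory
    )
    directory_client.set_access_control(acl=acl)
    # Read the properties back to confirm what the service stored.
    return directory_client.get_access_control()


endpoint = dfs_account_url("mydatalakestorage")
```

A call might look like set_directory_acl("mydatalakestorage", "mycontainer", "my/directory", "user::rwx,group::r-x,other::---"). The other language SDKs expose comparable operations under their own naming.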

ACL Inheritance and Propagation

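When you create a new file or directory, its ACL is derived from the default ACL of its parent directory, if one is set. This copy happens once, at creation time; later changes to the parent's ACLs do not flow down to existing children. Two kinds of ACL are involved:

  • Default ACLs: These are templates associated with directories. They determine the access ACLs of child items created under that directory, and new child directories also receive them as their own default ACL. Files do not have default ACLs.
  • Access ACLs: These control access to an object. Both files and directories have access ACLs.

Because default ACLs affect only items created after they are set, existing items must be updated explicitly. ACLs can be set or modified recursively for an entire directory tree, which matters when managing large datasets.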

Tip: To simplify management, set default ACLs on parent directories before loading data, so that newly created items pick them up at creation time.
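
The following Python sketch is a simplified in-memory model of this behavior, not service code. It shows a default ACL being copied to children at creation time, and an explicit recursive update for existing items. The `Node` class, its placeholder starting ACL, and `set_acl_recursive` are hypothetical, and the model deliberately ignores the umask and mask handling that the service applies.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    is_dir: bool
    access_acl: str = "user::rwx,group::r-x,other::---"  # placeholder starting ACL
    default_acl: Optional[str] = None  # only meaningful for directories
    children: Dict[str, "Node"] = field(default_factory=dict)

    def create_child(self, name: str, is_dir: bool) -> "Node":
        child = Node(is_dir=is_dir)
        if self.default_acl is not None:
            # Copied once, at creation time.
            child.access_acl = self.default_acl
            if is_dir:
                # Child directories keep the template for their own children.
                child.default_acl = self.default_acl
        self.children[name] = child
        return child


def set_acl_recursive(node: Node, acl: str) -> None:
    """Apply an access ACL to a node and everything beneath it."""
    node.access_acl = acl
    for child in node.children.values():
        set_acl_recursive(child, acl)


root = Node(is_dir=True, default_acl="user::rwx,group::r-x,other::---")
old_file = root.create_child("old.csv", is_dir=False)

# Changing the default ACL afterwards affects only items created later.
root.default_acl = "user::rwx,group::rwx,other::---"
new_file = root.create_child("new.csv", is_dir=False)
old_before = old_file.access_acl  # still the earlier ACL

# Existing items need an explicit recursive update.
set_acl_recursive(root, "user::rwx,group::rwx,other::---")
old_after = old_file.access_acl
```

In this model `old_before` keeps the group entry `r-x` while `new_file` receives `rwx`. Only the recursive update brings the older file in line.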

Best Practices for Access Control

  • Principle of Least Privilege: Grant users and service principals only the permissions they need to perform their tasks.
  • Use RBAC for Account-Level Management: Leverage RBAC role assignments for broad, coarse-grained access at the storage account or container scope.
  • Use ACLs for Fine-Grained Control: Employ ACLs for granular permissions on files and directories within containers.
  • Leverage Service Principals: Use service principals for applications and services that need to access Data Lake Storage Gen2.
  • Regularly Audit Permissions: Periodically review and audit access control settings to ensure security.
  • Understand Default ACLs: Configure default ACLs to streamline permission management for new data.
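
As a small illustration of the auditing practice above, this Python sketch flags paths whose "other" entry grants write permission. The function name and the sample paths are hypothetical. In practice the ACL strings would come from whatever tool you use to read ACLs, and the rule itself is only one example of a check an audit might include.

```python
from typing import Dict, List


def flag_world_writable(acls: Dict[str, str]) -> List[str]:
    """Return the paths whose access or default 'other' entry includes write."""
    flagged = set()
    for path, acl in acls.items():
        for entry in acl.split(","):
            # The last three fields are type, principal, and permissions;
            # a leading "default" field, if present, is skipped.
            kind, _principal, perms = entry.strip().split(":")[-3:]
            if kind == "other" and "w" in perms:
                flagged.add(path)
    return sorted(flagged)


sample = {
    "raw/": "user::rwx,group::r-x,other::---",
    "tmp/": "user::rwx,group::rwx,other::rwx",
    "staging/": "user::rwx,group::r-x,other::---,default:other::-w-",
}
findings = flag_world_writable(sample)
```

With the sample data, `findings` contains "staging/" and "tmp/", because each grants write to "other" in either its access or its default entries.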