# Getting Started with Azure Data Lake Storage
Azure Data Lake Storage (ADLS) Gen2 combines the capabilities of a hierarchical file system with Azure Blob storage. This guide walks you through creating an ADLS Gen2 account, uploading data, and accessing it programmatically.
## Prerequisites

- An active Azure subscription
- Azure CLI installed (`az`) or access to the Azure portal
- Basic knowledge of storage concepts
## 1. Create a Storage Account with ADLS Gen2

### Option A: Azure CLI

```bash
# Create a resource group
az group create --name MyResourceGroup --location eastus

# Create a storage account with hierarchical namespace enabled
az storage account create \
  --name mystorageaccount123 \
  --resource-group MyResourceGroup \
  --location eastus \
  --sku Standard_LRS \
  --kind StorageV2 \
  --enable-hierarchical-namespace true
```

### Option B: Azure portal

1. Go to the Azure portal (portal.azure.com).
2. Click "Create a resource" → "Storage" → "Storage account".
3. Fill in the basics (subscription, resource group, name, region).
4. In the "Advanced" tab, enable "Hierarchical namespace".
5. Review and create the account.
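With the hierarchical namespace enabled, the account exposes a dedicated DFS endpoint alongside the Blob endpoint, with the hostname derived from the account name. A minimal sketch of that derivation (using the example account name from above):

```python
def dfs_endpoint(account_name: str) -> str:
    """Build the ADLS Gen2 (DFS) endpoint URL for a storage account."""
    return f"https://{account_name}.dfs.core.windows.net"

print(dfs_endpoint("mystorageaccount123"))
# https://mystorageaccount123.dfs.core.windows.net
```

The SDK clients in section 4 talk to this endpoint under the hood.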
## 2. Create a File System (Container)

```bash
# Using Azure CLI
az storage fs create \
  --account-name mystorageaccount123 \
  --name my-filesystem \
  --auth-mode login
```
## 3. Upload Data

You can upload files using Azure Storage Explorer, the Azure portal, or the CLI.

```bash
# Upload a local file to the file system
az storage fs file upload \
  --account-name mystorageaccount123 \
  --file-system my-filesystem \
  --path sample.txt \
  --source ./sample.txt \
  --auth-mode login
```
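Large uploads are typically sent in chunks, each appended at a byte offset and then committed with a single flush at the total length (the same append/flush pattern the SDK examples in section 4 use). A self-contained sketch of the offset bookkeeping, with no Azure calls and an arbitrary chunk size:

```python
def plan_chunks(total_size: int, chunk_size: int):
    """Yield (offset, length) pairs covering total_size bytes."""
    offset = 0
    while offset < total_size:
        length = min(chunk_size, total_size - offset)
        yield offset, length
        offset += length

# A 10-byte payload uploaded in 4-byte chunks:
print(list(plan_chunks(10, 4)))  # [(0, 4), (4, 4), (8, 2)]
```

The final flush position is simply the sum of all chunk lengths, i.e. the total size.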
## 4. Access Data Programmatically

Below is a quick example using the Azure SDK for Python (`azure-storage-file-datalake`):

```python
from azure.storage.filedatalake import DataLakeServiceClient

# Authenticate using a connection string
connection_string = "DefaultEndpointsProtocol=https;AccountName=mystorageaccount123;AccountKey=<key>;EndpointSuffix=core.windows.net"
service_client = DataLakeServiceClient.from_connection_string(connection_string)

# Get a file system client
filesystem_client = service_client.get_file_system_client("my-filesystem")

# Get a directory client (root) and upload a file
directory_client = filesystem_client.get_directory_client("/")
file_client = directory_client.create_file("hello.txt")

data = b"Hello Azure Data Lake!"
file_client.append_data(data=data, offset=0, length=len(data))
file_client.flush_data(len(data))
print("File uploaded successfully.")
```
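The connection string used above is a semicolon-delimited list of `key=value` pairs; a small sketch of pulling out the individual fields (the `<key>` placeholder stands in for a real account key):

```python
def parse_connection_string(cs: str) -> dict:
    """Split an Azure storage connection string into its key/value parts."""
    parts = {}
    for segment in cs.split(";"):
        if segment:
            # partition splits on the FIRST '=', so base64 keys
            # containing '=' padding are preserved intact
            key, _, value = segment.partition("=")
            parts[key] = value
    return parts

cs = "DefaultEndpointsProtocol=https;AccountName=mystorageaccount123;AccountKey=<key>;EndpointSuffix=core.windows.net"
print(parse_connection_string(cs)["AccountName"])  # mystorageaccount123
```

This is only illustrative; in the SDK, `from_connection_string` does this parsing for you.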
The equivalent in C#, using the `Azure.Storage.Files.DataLake` package:

```csharp
using Azure.Storage.Files.DataLake;
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

class Program
{
    static async Task Main()
    {
        string connectionString = "DefaultEndpointsProtocol=https;AccountName=mystorageaccount123;AccountKey=<key>;EndpointSuffix=core.windows.net";
        DataLakeServiceClient serviceClient = new DataLakeServiceClient(connectionString);
        DataLakeFileSystemClient fileSystem = serviceClient.GetFileSystemClient("my-filesystem");
        DataLakeDirectoryClient directory = fileSystem.GetDirectoryClient("/");
        DataLakeFileClient file = await directory.CreateFileAsync("hello.txt");

        byte[] data = Encoding.UTF8.GetBytes("Hello Azure Data Lake!");
        // AppendAsync takes a Stream, not a raw byte array
        using var stream = new MemoryStream(data);
        await file.AppendAsync(stream, offset: 0);
        await file.FlushAsync(data.Length);
        Console.WriteLine("File uploaded successfully.");
    }
}
```
## 5. Secure Your Data

Use Azure role-based access control (RBAC) and POSIX-like ACLs to manage permissions. The built-in data-plane roles are:

| Role | Description |
|---|---|
| Storage Blob Data Reader | Can read data but not modify it. |
| Storage Blob Data Contributor | Can read, write, and delete data. |
| Storage Blob Data Owner | Full control, including changing POSIX ACLs. |
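The POSIX-like ACLs use the familiar `rwx` notation for the owning user, owning group, and everyone else. A self-contained sketch converting a symbolic permission string to its octal form (illustrative only; the SDK and CLI accept the symbolic form directly):

```python
def rwx_to_octal(perms: str) -> str:
    """Convert a 9-character rwx string (e.g. 'rwxr-x---') to octal digits."""
    assert len(perms) == 9, "expected user/group/other triples"
    digits = []
    for i in range(0, 9, 3):
        triple = perms[i:i + 3]
        # read = 4, write = 2, execute = 1
        value = (4 if triple[0] == "r" else 0) \
              + (2 if triple[1] == "w" else 0) \
              + (1 if triple[2] == "x" else 0)
        digits.append(str(value))
    return "".join(digits)

print(rwx_to_octal("rwxr-x---"))  # 750
```

On directories, the execute bit controls traversal: a principal needs `x` on every parent directory to reach a file.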