Getting Started with Azure Data Lake Storage Gen2
Welcome to Azure Data Lake Storage Gen2! This guide will walk you through the initial steps to set up and start using your Data Lake Storage Gen2 account.
Create an Azure Storage Account
Azure Data Lake Storage Gen2 is a set of capabilities for big data analytics, built on top of Azure Blob Storage. To use it, you need an Azure Storage account with the hierarchical namespace enabled.
- Sign in to the Azure portal.
- Search for "Storage accounts" and select it.
- Click "Create".
- Choose your subscription and resource group.
- Provide a unique storage account name.
- Select a region.
- For "Performance", choose "Standard".
- For "Redundancy", select your desired option (e.g., "Geo-redundant storage (GRS)").
- Navigate to the "Advanced" tab.
- Under "Data Lake Storage Gen2", enable "Hierarchical namespace".
- Review and create the account.
Once created, you'll have a storage account with Data Lake Storage Gen2 capabilities.
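The portal steps above can also be scripted. The following is a minimal Python sketch, not an official procedure: it checks the account name against the storage account naming rule (3 to 24 characters, lowercase letters and digits only) and builds the argument list for the Azure CLI command az storage account create with the hierarchical namespace turned on. The function names, the account name, the resource group, and the region are hypothetical placeholders, and the SKU value Standard_GRS is assumed to correspond to the "Standard" performance and "Geo-redundant storage (GRS)" choices in the list.

```python
import re
from typing import List


def is_valid_account_name(name: str) -> bool:
    """Storage account names: 3-24 characters, lowercase letters and digits only."""
    return re.fullmatch(r"[a-z0-9]{3,24}", name) is not None


def build_create_account_command(
    name: str,
    resource_group: str,
    location: str,
    sku: str = "Standard_GRS",
) -> List[str]:
    """Return the argv for `az storage account create` with hierarchical namespace enabled."""
    if not is_valid_account_name(name):
        raise ValueError(f"invalid storage account name: {name!r}")
    return [
        "az", "storage", "account", "create",
        "--name", name,
        "--resource-group", resource_group,
        "--location", location,
        "--sku", sku,                               # performance tier and redundancy
        "--kind", "StorageV2",                      # general-purpose v2 account
        "--enable-hierarchical-namespace", "true",  # Data Lake Storage Gen2 capability
    ]


if __name__ == "__main__":
    # Hypothetical values; replace them with your own before running the command.
    command = build_create_account_command("mystorageacct01", "my-resource-group", "eastus")
    print(" ".join(command))
```

Printing the command instead of executing it keeps the sketch safe to run anywhere; the resulting line can be pasted into a shell where the Azure CLI is installed and signed in.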
Create a Container (Filesystem)
Within your storage account, you create containers, which are analogous to filesystems in Data Lake Storage Gen2. These containers will hold your data.
- Navigate to your newly created storage account in the Azure portal.
- In the storage account menu, select "Containers" (listed under "Data storage" in current versions of the portal).
- Click "+ Container".
- Enter a name for your container (e.g., mydatalake). Container names must be lowercase.
- Choose the public access level (e.g., "Private").
- Click "Create".
Your container is now ready to store files and directories.
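Container names have a few more constraints than "lowercase": 3 to 63 characters, only lowercase letters, digits, and hyphens, starting and ending with a letter or digit, and no consecutive hyphens. The sketch below checks a candidate name against those rules and builds the argument list for the Azure CLI command az storage fs create. It is an illustration only; the function names and the account name are hypothetical, and it assumes you authenticate with the identity from az login.

```python
import re
from typing import List


def is_valid_container_name(name: str) -> bool:
    """Check the container (file system) naming rules described above."""
    if not 3 <= len(name) <= 63:
        return False
    # Start with a letter or digit; each hyphen must be followed by a letter or digit,
    # which rules out consecutive hyphens and a trailing hyphen.
    return re.fullmatch(r"[a-z0-9](?:-?[a-z0-9])*", name) is not None


def build_create_container_command(account: str, container: str) -> List[str]:
    """Return the argv for `az storage fs create` for a validated container name."""
    if not is_valid_container_name(container):
        raise ValueError(f"invalid container name: {container!r}")
    return [
        "az", "storage", "fs", "create",
        "--name", container,
        "--account-name", account,
        "--auth-mode", "login",  # assumption: use the signed-in Azure CLI identity
    ]


if __name__ == "__main__":
    # "mystorageacct01" is a placeholder account name.
    print(" ".join(build_create_container_command("mystorageacct01", "mydatalake")))
```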
Upload Data
You can upload data to your Data Lake Storage Gen2 account using various methods. Here are a couple of common ones:
Using Azure Storage Explorer
Azure Storage Explorer is a cross-platform graphical tool that allows you to easily manage your Azure Storage resources.
- Download and install Azure Storage Explorer.
- Connect to your Azure account or storage account.
- Navigate to your Data Lake Storage Gen2 account and container.
- Drag and drop files or folders, or use the "Upload" button.
Using Azure CLI
The Azure Command-Line Interface (CLI) provides powerful scripting capabilities.
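First, make sure the Azure CLI is installed and that you are signed in by running az login.
To upload a file, one option is the az storage fs file upload command, which takes the local source file, the destination path inside the container, the container (file system) name, and the storage account name.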
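As a hedged illustration of the CLI upload, the Python sketch below builds an az storage fs file upload command and, when asked, runs it through the Azure CLI found on the PATH. The function names, account name, container name, and file paths are hypothetical, and the sketch assumes you have already signed in with az login and that your identity has permission to write to the container.

```python
import shutil
import subprocess
from typing import List


def build_upload_command(account: str, container: str, source: str, dest_path: str) -> List[str]:
    """Return the argv for uploading one local file with the Azure CLI."""
    return [
        "az", "storage", "fs", "file", "upload",
        "--account-name", account,
        "--file-system", container,  # the container acts as the file system
        "--source", source,          # local file to upload
        "--path", dest_path,         # destination path inside the container
        "--auth-mode", "login",      # assumption: reuse the identity from `az login`
    ]


def upload_file(account: str, container: str, source: str, dest_path: str) -> None:
    """Run the upload; raises if the CLI is missing or the command fails."""
    az_path = shutil.which("az")
    if az_path is None:
        raise RuntimeError("Azure CLI (az) was not found on PATH")
    command = build_upload_command(account, container, source, dest_path)
    command[0] = az_path  # use the resolved executable (matters on Windows)
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Placeholder values; this only prints the command and does not contact Azure.
    print(" ".join(build_upload_command(
        "mystorageacct01", "mydatalake", "./sales.csv", "raw/sales.csv")))
```

Calling upload_file with the same arguments performs the actual transfer; keeping the command builder separate makes it easy to inspect or log the command first.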
Accessing Your Data
Once your data is uploaded, you can access it using various tools and services:
- Azure Storage Explorer: For interactive browsing and download.
- Azure CLI: For scripting and command-line operations.
- SDKs: For programmatic access from applications (Python, .NET, Java, etc.).
- Azure Synapse Analytics, Azure Databricks, HDInsight: Big data analytics services that can directly query and process data in Data Lake Storage Gen2.
For programmatic access, you'll typically use connection strings or managed identities to authenticate.
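To make the access options more concrete, here is a minimal Python sketch under stated assumptions. It builds the two addresses that tools commonly need: the account's Data Lake (dfs) endpoint and the abfss:// URI that analytics engines such as Azure Databricks or Azure Synapse Analytics use to refer to a path. It also shows one way to read a file with the azure-storage-file-datalake and azure-identity packages, using DefaultAzureCredential, which can pick up a managed identity when one is available. The helper names, account name, and paths are hypothetical, the endpoint suffix assumes the Azure public cloud, and the two packages must be installed separately.

```python
def dfs_endpoint(account: str) -> str:
    """Data Lake Storage endpoint of a storage account (Azure public cloud assumed)."""
    return f"https://{account}.dfs.core.windows.net"


def abfss_uri(account: str, container: str, path: str = "") -> str:
    """URI form used by analytics engines to address a path in a container."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def read_file(account: str, container: str, path: str) -> bytes:
    """Download one file's contents using Microsoft Entra ID credentials."""
    # Third-party packages, imported here so the helpers above work without them.
    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(
        account_url=dfs_endpoint(account),
        credential=DefaultAzureCredential(),  # managed identity, CLI login, etc.
    )
    file_client = service.get_file_system_client(container).get_file_client(path)
    return file_client.download_file().readall()


if __name__ == "__main__":
    # Placeholder names; printing the addresses does not contact Azure.
    print(dfs_endpoint("mystorageacct01"))
    print(abfss_uri("mystorageacct01", "mydatalake", "raw/sales.csv"))
```

Credential-based access like this avoids embedding an account key or connection string in application code, at the cost of needing a role assignment on the storage account for the identity that runs it.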
Next Steps
Congratulations! You've successfully set up and started using Azure Data Lake Storage Gen2. Here are some resources for continuing your journey:
- Explore tutorials for common use cases.
- Learn about the core concepts like the hierarchical namespace and access control.
- Discover how to integrate with other Azure services for analytics.