Azure Data Lake Storage Gen2

A highly scalable, secure, and cost-effective data lake solution for big data analytics.

What is Azure Data Lake Storage Gen2?

Azure Data Lake Storage Gen2 is a set of capabilities for big data analytics, built on Azure Blob Storage, that lets you use a storage account as a data lake. A data lake is a centralized repository that stores structured and unstructured data at any scale.

Key aspects include:

Key Features and Benefits

Hierarchical Namespace

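Organizes objects into a hierarchy of directories and files. Directory-level operations such as rename and delete become single metadata operations instead of per-object rewrites, which improves performance and usability for big data analytics.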
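
To illustrate why a hierarchical namespace helps, the sketch below contrasts renaming a directory in a flat namespace, where every object key sharing the prefix has to be rewritten, with a hierarchical one, where only the directory entry is relinked. It is a toy in-memory model and not the service's implementation; the function names and sample paths are hypothetical.

```python
def rename_flat(keys, old_prefix, new_prefix):
    """Flat namespace: a 'directory' is only a key prefix, so each matching key is rewritten."""
    renamed = set()
    touched = 0
    for key in keys:
        if key.startswith(old_prefix + "/"):
            renamed.add(new_prefix + key[len(old_prefix):])
            touched += 1
        else:
            renamed.add(key)
    return renamed, touched


def rename_hierarchical(tree, old_name, new_name):
    """Hierarchical namespace: the directory is a real node, so one entry is relinked."""
    tree[new_name] = tree.pop(old_name)
    return tree, 1


flat_keys = {"raw/2024/a.csv", "raw/2024/b.csv", "raw/2025/c.csv", "curated/d.parquet"}
flat_result, flat_ops = rename_flat(flat_keys, "raw", "landing")

tree = {
    "raw": {"2024": {"a.csv": None, "b.csv": None}, "2025": {"c.csv": None}},
    "curated": {"d.parquet": None},
}
tree_result, tree_ops = rename_hierarchical(tree, "raw", "landing")

print(flat_ops, tree_ops)  # objects touched in each model
```

In the flat model the cost of the rename grows with the number of objects under the prefix; in the hierarchical model it stays constant.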

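ABFS Driver

The Azure Blob File System (ABFS) driver is a Hadoop-compatible file system driver for Data Lake Storage Gen2. It lets analytics frameworks such as Spark and Hive read and write data in the account as they would on the Hadoop Distributed File System (HDFS).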
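
Frameworks that use the ABFS driver address data with URIs of the form abfs[s]://<container>@<account>.dfs.core.windows.net/<path>. The following is a minimal sketch of building and parsing such URIs; the helper names are hypothetical, and the account and container names are placeholders.

```python
from urllib.parse import urlparse

DFS_SUFFIX = ".dfs.core.windows.net"


def build_abfs_uri(account, container, path="", secure=True):
    """Build an ABFS URI; 'abfss' is the TLS-secured scheme."""
    scheme = "abfss" if secure else "abfs"
    return f"{scheme}://{container}@{account}{DFS_SUFFIX}/{path.lstrip('/')}"


def parse_abfs_uri(uri):
    """Split an ABFS URI into account, container, and path."""
    parts = urlparse(uri)
    if parts.scheme not in ("abfs", "abfss"):
        raise ValueError(f"not an ABFS URI: {uri}")
    container, _, host = parts.netloc.partition("@")
    if not container or not host.endswith(DFS_SUFFIX):
        raise ValueError(f"unexpected authority in URI: {uri}")
    return {
        "account": host[: -len(DFS_SUFFIX)],
        "container": container,
        "path": parts.path.lstrip("/"),
        "secure": parts.scheme == "abfss",
    }


# Placeholder names, for illustration only.
uri = build_abfs_uri("mystorageacct", "raw", "sales/2024/orders.parquet")
print(uri)
print(parse_abfs_uri(uri))
```

A URI built this way can be passed wherever a framework expects a file system path, provided the cluster is configured with credentials for the account.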

Security & Governance

Fine-grained access control through Azure AD (now Microsoft Entra ID) identities, Azure role-based access control (RBAC), and POSIX-like access control lists (ACLs) on directories and files.
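
To make the idea of POSIX-like ACLs concrete, the sketch below parses a short-form ACL string such as "user::rwx,group::r-x,other::---" and checks whether a caller holds the requested permissions. It is a simplified model under stated assumptions: it ignores RBAC role assignments and superuser handling, uses plain names where the service uses directory object IDs, and treats a matching group that does not grant access as a denial, which may differ from the service's actual evaluation order. All function names are hypothetical.

```python
def parse_acl(acl_text):
    """Parse 'scope:qualifier:perms' entries, e.g. 'user::rwx,group:analysts:r-x'."""
    entries = []
    for item in acl_text.split(","):
        scope, qualifier, perms = item.strip().split(":")
        entries.append((scope, qualifier, set(perms) - {"-"}))
    return entries


def is_allowed(acl_text, wanted, user, groups, owner, owning_group):
    entries = parse_acl(acl_text)
    wanted = set(wanted)
    # The mask caps named users and all group entries; absent a mask, nothing is capped.
    mask = next((p for s, q, p in entries if s == "mask"), set("rwx"))

    # Owning user: the mask is not applied.
    if user == owner:
        perms = next((p for s, q, p in entries if s == "user" and q == ""), set())
        return wanted <= perms

    # Named user entries: the mask is applied.
    for scope, qualifier, perms in entries:
        if scope == "user" and qualifier == user:
            return wanted <= (perms & mask)

    # Owning group and named groups: any matching entry may grant access.
    matched_group = False
    for scope, qualifier, perms in entries:
        if scope != "group":
            continue
        name = owning_group if qualifier == "" else qualifier
        if name in groups:
            matched_group = True
            if wanted <= (perms & mask):
                return True
    if matched_group:
        return False  # assumption: a matching but non-granting group denies

    other = next((p for s, q, p in entries if s == "other"), set())
    return wanted <= other


acl = "user::rwx,user:alice:rw-,group::r-x,group:analysts:r--,mask::r-x,other::---"
print(is_allowed(acl, "r", "alice", [], "bob", "engineering"))
print(is_allowed(acl, "w", "alice", [], "bob", "engineering"))
```

Note how the mask entry removes write access from alice even though her own entry lists it; this capping behavior is the main difference between ACL entries and the owning user's permissions.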

Scalability & Performance

Handles massive datasets and high throughput with low latency, essential for complex analytics.

Cost Optimization

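Access tiers (Hot, Cool, Archive) reduce storage costs for data that is accessed infrequently. Cooler tiers are cheaper to store but cost more to access, and Archive data must be rehydrated before it can be read.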
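
As a rough illustration of a tiering rule, the sketch below suggests a tier from the number of days since data was last accessed. The thresholds and the function name are hypothetical and chosen only for the example; a real policy depends on the workload, pricing, and retrieval requirements.

```python
from datetime import date

# Illustrative thresholds only, not recommendations.
COOL_AFTER_DAYS = 30
ARCHIVE_AFTER_DAYS = 180


def suggest_tier(last_accessed, today):
    """Suggest an access tier from how long the data has been idle."""
    idle_days = (today - last_accessed).days
    if idle_days >= ARCHIVE_AFTER_DAYS:
        return "Archive"
    if idle_days >= COOL_AFTER_DAYS:
        return "Cool"
    return "Hot"


print(suggest_tier(date(2024, 1, 1), date(2024, 1, 10)))
print(suggest_tier(date(2024, 1, 1), date(2024, 3, 1)))
print(suggest_tier(date(2023, 1, 1), date(2024, 1, 1)))
```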

Interoperability

Works seamlessly with other Azure services like Azure Databricks, Azure Synapse Analytics, and HDInsight.

Common Use Cases

Getting Started

Learn how to create and manage your Data Lake Storage Gen2 accounts: