Azure Synapse Analytics - Documentation

What is Azure Synapse Analytics?

Azure Synapse Analytics is an enterprise analytics service that combines data integration, enterprise data warehousing, and big data analytics in a single service, so that data held in warehouses and big data systems can be analyzed from one place.

Synapse combines SQL and Apache Spark engines in a unified workspace where you can ingest, prepare, manage, and serve data for BI and machine learning.

Key Benefit

Synapse unifies disparate data sources and analytic technologies, simplifying the development and operationalization of complex data solutions.

Key Features

Unified Workspace: One interface for data ingestion, preparation, management, serving, and monitoring.
Apache Spark Integration: Seamlessly run Spark jobs for advanced analytics, data engineering, and machine learning directly within Synapse.
Azure Data Explorer Integration: Use Azure Data Explorer for real-time analytics and time-series data exploration.
Serverless SQL Pools: Query data directly from your data lake using T-SQL without provisioning infrastructure.
Dedicated SQL Pools: Provision dedicated resources for high-performance data warehousing with familiar SQL capabilities.
Data Integration: Build robust ETL/ELT pipelines using a code-free or code-based experience.
Monitoring & Security: Comprehensive tools for monitoring performance, managing security, and ensuring compliance.

Architecture Overview

Azure Synapse Analytics comprises several core components:

Synapse Workspace: The central hub for managing all your analytics assets and resources.
Apache Spark Pools: Managed Spark clusters for big data processing and machine learning.
SQL Pools:
- Dedicated SQL Pools: Provisioned resources for traditional data warehousing workloads.
- Serverless SQL Pools: On-demand query capabilities over data in Azure Data Lake Storage.
Data Explorer Pools: Optimized for real-time analytics and log data.
Data Integration Pipelines: Orchestrate data movement and transformation across various services.

These components work together to provide a flexible and powerful analytics platform.

Getting Started with Synapse

Follow these steps to begin using Azure Synapse Analytics:

Create a Synapse Workspace: Provision a new workspace in the Azure portal.
Ingest Data: Use Synapse Pipelines or other tools to bring your data into Synapse or a linked data lake.
Explore Data:
- Use Serverless SQL pools to query data directly from Azure Data Lake Storage Gen2.
- Use Spark Notebooks to perform complex data transformations and analysis.
- Load data into Dedicated SQL pools for high-performance warehousing.
Build and Deploy: Develop ETL/ELT pipelines, machine learning models, and BI dashboards.
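
As an illustration of the "Load data into Dedicated SQL pools" step, the following is a minimal T-SQL sketch for a dedicated SQL pool, not a definitive loading procedure. The table name dbo.Trips, its columns, and the distribution choice are hypothetical. The path is a placeholder, and the sketch assumes the CSV files have a header row and that the workspace managed identity has read access to the storage account.

```sql
-- Hypothetical target table; replace the columns with your file's layout.
CREATE TABLE dbo.Trips
(
    trip_id     INT            NOT NULL,
    city        VARCHAR(100)   NULL,
    fare_amount DECIMAL(10, 2) NULL
)
WITH
(
    DISTRIBUTION = ROUND_ROBIN,      -- simple default; choose per workload
    CLUSTERED COLUMNSTORE INDEX
);

-- Bulk-load the CSV files from the data lake into the table.
COPY INTO dbo.Trips
FROM 'https://yourdatalake.dfs.core.windows.net/yourcontainer/yourfolder/*.csv'
WITH (
    FILE_TYPE = 'CSV',
    FIRSTROW = 2,                                   -- skip the assumed header row
    CREDENTIAL = (IDENTITY = 'Managed Identity')    -- assumes managed identity access
);
```

Round-robin distribution is used here only because it needs no knowledge of the data; a hash-distributed table may suit large fact tables better.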

Refer to the official Azure Synapse Analytics documentation for detailed guides and tutorials.

Example: Querying data with Serverless SQL

You can query CSV files in your data lake using T-SQL:


SELECT
    TOP 100 *
FROM
    OPENROWSET(
        BULK 'https://yourdatalake.dfs.core.windows.net/yourcontainer/yourfolder/*.csv',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0'  -- needed to infer the schema when no WITH clause is given
    ) AS [result];
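
When the column layout is known, declaring it explicitly avoids relying on inferred names and types. The following is a minimal sketch under stated assumptions, not a definitive query: the column names (trip_id, city, fare_amount), their types, and their positions are hypothetical, the files are assumed to start with a header row, and the path is the same placeholder as in the query above.

```sql
SELECT TOP 100
    [result].trip_id,
    [result].city,
    [result].fare_amount
FROM
    OPENROWSET(
        BULK 'https://yourdatalake.dfs.core.windows.net/yourcontainer/yourfolder/*.csv',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0',
        FIRSTROW = 2              -- skip the assumed header row
    )
    WITH (
        -- Hypothetical columns; the trailing number is the column's position in the file.
        trip_id     INT            1,
        city        VARCHAR(100)   2,
        fare_amount DECIMAL(10, 2) 3
    ) AS [result];
```

Listing only the columns you need in the WITH clause can also reduce the amount of data the query has to read.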

Common Use Cases

Enterprise Data Warehousing: Build modern data warehouses with elastic scaling and high performance.
Big Data Analytics: Process and analyze massive datasets using Apache Spark.
Real-time Analytics: Gain insights from streaming data with Azure Data Explorer integration.
Machine Learning: Train, deploy, and manage machine learning models directly within the Synapse environment.
Data Exploration and Discovery: Quickly analyze data without complex setup using Serverless SQL pools.
Data Integration: Consolidate data from various sources for reporting and analysis.