Data Integration in Azure Synapse Analytics

Azure Synapse Analytics is an analytics service that combines data warehousing and big data analytics. It provides a single environment to ingest, prepare, manage, and serve data for business intelligence and machine learning.

Data integration is a core capability of Synapse Analytics, enabling you to connect to various data sources, transform data, and load it into your data store for analysis. This section details the tools and features available for data integration within Synapse.

Key Data Integration Features

Pipelines: Orchestrate complex data movement and transformation workflows.
Data Flows: Visual data transformation capabilities without writing code.
Connectors: Support for a wide range of data sources and destinations.
Integration Runtime: Managed compute infrastructure for data movement and transformation activities.

Getting Started with Data Integration

To begin with data integration in Synapse Analytics, you will typically:

Connect to Data Sources: Utilize the built-in connectors to establish connections to your data.
Create Pipelines: Design and build data pipelines to automate your data processes.
Use Data Flows: Leverage visual data flows to transform your data without extensive coding.
Monitor and Manage: Track the execution of your pipelines and manage your data integration activities.
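
For the last step, monitoring from a script usually means polling a run's status until it reaches a terminal state. The sketch below shows only that loop. The status strings and the `get_status` callable are assumptions made for illustration; a real status lookup (through the portal, REST API, or an SDK) would be supplied by the caller.

```python
import time
from typing import Callable

# Assumed terminal states; confirm the exact set against the service docs.
TERMINAL_STATES = {"Succeeded", "Failed", "Cancelled"}


def wait_for_run(get_status: Callable[[], str],
                 poll_seconds: float = 15.0,
                 max_polls: int = 240) -> str:
    """Poll get_status() until it returns a terminal state or polls run out."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("pipeline run did not finish within the polling budget")


if __name__ == "__main__":
    # Stand-in for a real status lookup: replays a canned sequence.
    canned = iter(["Queued", "InProgress", "InProgress", "Succeeded"])
    print(wait_for_run(lambda: next(canned), poll_seconds=0))
```

Passing the lookup in as a callable keeps the loop independent of any particular client library and makes it easy to test with canned values.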

Pipelines

A Synapse pipeline is a logical grouping of activities that together perform a task. Activities fall into three broad kinds: data movement, data transformation, and control flow. You can use pipelines to:

Copy data from one location to another.
Execute SQL scripts or stored procedures.
Run Spark notebooks or jobs.
Orchestrate complex ETL/ELT processes.
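
To make "a grouping of activities" concrete, the sketch below models a three-step pipeline as a Python dictionary and derives a valid execution order from the dependencies between activities. The dictionary loosely mirrors the general shape of a pipeline JSON definition, but it is deliberately incomplete (no datasets, linked services, or type properties), and the pipeline and activity names are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical pipeline: copy raw data, transform it in a notebook,
# then call a stored procedure. Not a complete or validated definition.
pipeline = {
    "name": "DailySalesLoad",
    "properties": {
        "activities": [
            {"name": "CopyRawSales", "type": "Copy", "dependsOn": []},
            {
                "name": "TransformInSpark",
                "type": "SynapseNotebook",
                "dependsOn": [
                    {"activity": "CopyRawSales",
                     "dependencyConditions": ["Succeeded"]}
                ],
            },
            {
                "name": "MergeIntoWarehouse",
                "type": "SqlServerStoredProcedure",
                "dependsOn": [
                    {"activity": "TransformInSpark",
                     "dependencyConditions": ["Succeeded"]}
                ],
            },
        ]
    },
}


def execution_order(definition: dict) -> list[str]:
    """Return activity names in an order that respects dependsOn."""
    graph = {
        act["name"]: {dep["activity"] for dep in act.get("dependsOn", [])}
        for act in definition["properties"]["activities"]
    }
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(execution_order(pipeline))
```

The service resolves this ordering itself at run time; computing it locally is only a way to sanity-check a definition before publishing it.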

Learn more about creating and configuring pipelines.

Data Flows

Mapping Data Flows provide a fully managed visual experience to build and manage data transformations at scale without writing code. You can graphically design your data transformation logic, and Synapse translates this into Spark jobs that run on managed Spark clusters. Key features include:

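Source transformation: Read data from a connected data source.
Derived Column transformation: Create new columns, or modify existing ones, using expressions.
Aggregate transformation: Group rows and compute aggregates such as sums and counts.
Join transformation: Combine two data streams on matching keys.
Sink transformation: Write the transformed data to a destination.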
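
Data flows are built visually, not in code, but the meaning of each transformation is easy to show in a few lines. The sketch below is a plain-Python analogy and not Synapse code: it applies a derived column, a join, and an aggregate to small in-memory row lists. All data, column names, and function names are invented for illustration.

```python
from collections import defaultdict

# Source: made-up rows standing in for two input datasets.
orders = [
    {"order_id": 1, "customer_id": "c1", "quantity": 2, "unit_price": 10.0},
    {"order_id": 2, "customer_id": "c2", "quantity": 1, "unit_price": 25.0},
    {"order_id": 3, "customer_id": "c1", "quantity": 4, "unit_price": 5.0},
]
customers = [
    {"customer_id": "c1", "region": "EU"},
    {"customer_id": "c2", "region": "US"},
]


def derive_total(rows):
    """Derived Column: add a 'total' column computed from existing ones."""
    return [{**row, "total": row["quantity"] * row["unit_price"]} for row in rows]


def join_on(left, right, key):
    """Join: inner join of two row lists on a shared key."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]


def aggregate_sum(rows, group_key, value_key):
    """Aggregate: group by one column and sum another."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    return [{group_key: k, value_key: v} for k, v in sorted(totals.items())]


def run_flow():
    """Sink: here the 'destination' is simply the returned list."""
    enriched = join_on(derive_total(orders), customers, "customer_id")
    return aggregate_sum(enriched, "region", "total")


if __name__ == "__main__":
    print(run_flow())
```

In an actual data flow the same steps run as distributed Spark operations, so they scale well beyond what in-memory lists can hold.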

Explore the capabilities of Mapping Data Flows.

Integration Runtime

The Integration Runtime (IR) is the compute infrastructure used by Synapse pipelines to perform data movement and transformation activities. Synapse pipelines support two types of IR:

Azure Integration Runtime: Fully managed compute in Azure for cloud-to-cloud data movement and for running data flows.
Self-hosted Integration Runtime: Installed on-premises or in a private network to move data between private and cloud data stores.

The Azure-SSIS Integration Runtime, used to lift and shift existing SSIS packages, is available in Azure Data Factory but is not supported in Synapse pipelines.

Note: Understanding the different Integration Runtime options is crucial for efficient and secure data integration, especially when dealing with hybrid cloud scenarios.
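
For hybrid scenarios, the choice between the Azure and self-hosted runtimes largely comes down to whether cloud-hosted compute can reach both ends of a copy. The sketch below encodes that rule of thumb. It is a simplification: it ignores options such as managed virtual networks and private endpoints, and the `DataStore` class and the returned labels are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataStore:
    name: str
    publicly_reachable: bool  # False for on-premises or private-network stores


def choose_runtime(source: DataStore, sink: DataStore) -> str:
    """Pick a runtime type for a copy between two stores (simplified rule)."""
    if source.publicly_reachable and sink.publicly_reachable:
        return "Azure"       # cloud-to-cloud: managed compute can reach both
    return "SelfHosted"      # at least one end sits behind a private network


if __name__ == "__main__":
    lake = DataStore("data-lake", publicly_reachable=True)
    onprem_sql = DataStore("onprem-sql", publicly_reachable=False)
    print(choose_runtime(onprem_sql, lake))
```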

Supported Data Sources and Destinations

Synapse Analytics supports a vast array of data sources and destinations, including:

Azure Blob Storage
Azure Data Lake Storage (Gen1 and Gen2)
Azure SQL Database
Azure Cosmos DB
On-premises SQL Server
Amazon S3
And many more...
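
Each connection to one of these stores is described by a linked service definition. The sketch below builds such a definition as a Python dictionary. The connector type strings reflect the identifiers these connectors are commonly documented with, but treat them as assumptions to verify; the service names, the runtime name, and the placeholder connection string are hypothetical.

```python
# Assumed connector type identifiers; verify against the connector docs.
CONNECTOR_TYPES = {
    "Azure Blob Storage": "AzureBlobStorage",
    "Azure Data Lake Storage Gen1": "AzureDataLakeStore",
    "Azure Data Lake Storage Gen2": "AzureBlobFS",
    "Azure SQL Database": "AzureSqlDatabase",
    "Azure Cosmos DB": "CosmosDb",
    "SQL Server": "SqlServer",
    "Amazon S3": "AmazonS3",
}


def linked_service(name, store, type_properties, integration_runtime=None):
    """Build a linked service definition as a plain dictionary."""
    definition = {
        "name": name,
        "properties": {
            "type": CONNECTOR_TYPES[store],
            "typeProperties": type_properties,
        },
    }
    if integration_runtime is not None:
        # Route the connection through a named integration runtime,
        # for example a self-hosted one for an on-premises store.
        definition["properties"]["connectVia"] = {
            "referenceName": integration_runtime,
            "type": "IntegrationRuntimeReference",
        }
    return definition


if __name__ == "__main__":
    onprem = linked_service(
        "OnPremSales",
        "SQL Server",
        {"connectionString": "<placeholder>"},
        integration_runtime="MySelfHostedIR",
    )
    print(onprem)
```

Keep real secrets out of definitions like this one; the placeholder stands in for a reference to a secret store, not for a literal connection string.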

Important: Always refer to the latest official Microsoft documentation for an up-to-date list of supported connectors and their configurations.

Next Steps

Continue exploring the following topics to deepen your understanding of data integration in Azure Synapse Analytics: