Ingest data into Azure Synapse Analytics by using Azure Data Factory
This tutorial shows you how to use Azure Data Factory to copy data from an on-premises SQL Server to Azure Synapse Analytics. We'll cover setting up the linked services, datasets, and a pipeline to orchestrate the data movement.
Prerequisites
- An Azure subscription: If you don't have an Azure subscription, create a free account before you begin.
- Azure Synapse Analytics workspace: Create a workspace if you don't have one.
- On-premises SQL Server: Ensure you have access to an on-premises SQL Server instance with data.
- Self-hosted integration runtime: You'll need to install and configure this on a machine that can access your on-premises SQL Server.
Steps
- Create an Azure Data Factory instance
Navigate to the Azure portal and create a new Azure Data Factory resource. Follow the prompts to configure the name, subscription, resource group, and region.
- Configure the Self-hosted Integration Runtime
Download and install the self-hosted integration runtime on a machine that can reach your on-premises data source, then register it with your data factory using the authentication key shown in the Azure portal.
- Create Linked Services
In your Azure Data Factory, create two linked services:
- On-premises SQL Server: Configure this with the connection details for your on-premises SQL Server, including the server name, database name, authentication type, and username/password. Select your self-hosted integration runtime.
- Azure Synapse Analytics: Configure this with the connection details for your Synapse workspace, including the workspace name, SQL endpoint, database name, and authentication method.
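As a sketch, the Azure Synapse Analytics linked service might be defined as follows (the `AzureSqlDW` type targets a dedicated SQL pool; the server, database, and credentials shown are placeholders):

```json
{
  "name": "SynapseLinkedService",
  "properties": {
    "type": "AzureSqlDW",
    "typeProperties": {
      "connectionString": "Server=tcp:your_workspace.sql.azuresynapse.net,1433;Database=your_pool_name;User ID=your_username;Password=your_password;Encrypt=True;Connection Timeout=30"
    }
  }
}
```

For production use, consider storing the credentials in Azure Key Vault or using a managed identity instead of a username and password.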
Example JSON for a SQL Server linked service:
```json
{
  "name": "OnPremSqlLinkedService",
  "properties": {
    "type": "SqlServer",
    "typeProperties": {
      "connectionString": "Data Source=your_server_name;Initial Catalog=your_database_name;Integrated Security=False;User ID=your_username;Password=your_password;Encrypt=True;TrustServerCertificate=False;Connection Timeout=30"
    },
    "connectVia": {
      "referenceName": "your_self_hosted_ir_name",
      "type": "IntegrationRuntimeReference"
    }
  }
}
```
Note that an on-premises SQL Server uses the `SqlServer` linked service type (not `AzureSqlDatabase`) together with the `connectVia` reference to your self-hosted integration runtime.
- Create Datasets
Create two datasets:
- Source Dataset: Represents the data in your on-premises SQL Server. Specify the table name.
- Sink Dataset: Represents the destination table in Azure Synapse Analytics. Specify the table name.
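As an illustrative sketch, the source dataset could be defined as follows (the `SqlServerTable` type points at the SQL Server linked service; the schema and table names are placeholders):

```json
{
  "name": "SourceDataset",
  "properties": {
    "type": "SqlServerTable",
    "linkedServiceName": {
      "referenceName": "OnPremSqlLinkedService",
      "type": "LinkedServiceReference"
    },
    "typeProperties": {
      "schema": "dbo",
      "table": "your_table_name"
    }
  }
}
```

The sink dataset is analogous, typically using the `AzureSqlDWTable` type and referencing the Synapse linked service.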
- Create a Pipeline
Create a new pipeline and add a Copy Data activity.
- Configure the Source tab of the Copy Data activity to use your source dataset.
- Configure the Sink tab to use your sink dataset.
- Map the columns between your source and sink as needed.
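Put together, a minimal pipeline definition with a Copy Data activity might look like this sketch (the pipeline, activity, and dataset names are placeholders; `allowPolyBase` is an optional optimization for bulk loading into Synapse):

```json
{
  "name": "CopyOnPremToSynapsePipeline",
  "properties": {
    "activities": [
      {
        "name": "CopySqlToSynapse",
        "type": "Copy",
        "inputs": [
          { "referenceName": "SourceDataset", "type": "DatasetReference" }
        ],
        "outputs": [
          { "referenceName": "SinkDataset", "type": "DatasetReference" }
        ],
        "typeProperties": {
          "source": { "type": "SqlServerSource" },
          "sink": { "type": "SqlDWSink", "allowPolyBase": true }
        }
      }
    ]
  }
}
```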
- Debug and Publish the Pipeline
Use the pipeline's Debug option to run a test copy without publishing. Once the run succeeds, select Publish all to deploy your linked services, datasets, and pipeline.
- Trigger the Pipeline
You can trigger the pipeline manually, on a schedule, or in response to an event. This tutorial focuses on a manual trigger.
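Although this tutorial uses a manual trigger, a schedule trigger is also defined as a small JSON artifact. The following sketch runs the pipeline once a day (the trigger name, pipeline name, and start time are placeholders):

```json
{
  "name": "DailyTrigger",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Day",
        "interval": 1,
        "startTime": "2024-01-01T00:00:00Z",
        "timeZone": "UTC"
      }
    },
    "pipelines": [
      {
        "pipelineReference": {
          "referenceName": "CopyOnPremToSynapsePipeline",
          "type": "PipelineReference"
        }
      }
    ]
  }
}
```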
Next Steps
After successfully ingesting data, you can explore various analytical capabilities within Azure Synapse Analytics, such as running SQL queries, building data warehousing solutions, or leveraging Spark for big data processing.