Ingest data into Azure Synapse Analytics by using Azure Data Factory

Azure Data Factory Overview

This tutorial shows you how to use Azure Data Factory to copy data from an on-premises SQL Server to Azure Synapse Analytics. We'll cover setting up the linked services, datasets, and a pipeline to orchestrate the data movement.

Prerequisites

  • An Azure subscription: If you don't have an Azure subscription, create a free account before you begin.
  • Azure Synapse Analytics workspace: Create a workspace if you don't have one.
  • On-premises SQL Server: Ensure you have access to an on-premises SQL Server instance with data.
  • Self-hosted integration runtime: You'll need to install and configure this on a machine that can access your on-premises SQL Server.

Steps

  1. Create an Azure Data Factory instance

    Navigate to the Azure portal and create a new Azure Data Factory resource. Follow the prompts to configure the name, subscription, resource group, and region.

  2. Configure the Self-hosted Integration Runtime

    Download and install the self-hosted integration runtime on a machine that can reach your on-premises data source. Then use the Integration Runtime Configuration Manager to register it with your Data Factory by using the authentication key shown in the portal.

  3. Create Linked Services

    In your Azure Data Factory, create two linked services:

    • On-premises SQL Server: Configure this with the connection details for your on-premises SQL Server, including the server name, database name, authentication type, and username/password. Select your self-hosted integration runtime.
    • Azure Synapse Analytics: Configure this with the connection details for your Synapse workspace, including the workspace name, SQL endpoint, database name, and authentication method.

    Example JSON for a SQL Server linked service:

    {
        "name": "OnPremSqlLinkedService",
        "properties": {
            "type": "AzureSqlDatabase",
            "typeProperties": {
                "connectionString": "Data Source=your_server_name;Initial Catalog=your_database_name;Integrated Security=False;User ID=your_username;Password=your_password;Encrypt=True;TrustServerCertificate=False;Connection Timeout=30",
                "encryptedCredential": ""
            },
            "connectVia": {
                "referenceName": "your_self_hosted_ir_name",
                "type": "IntegrationRuntimeReference"
            }
        }
    }
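
    Example JSON for an Azure Synapse Analytics linked service. The `AzureSqlDW` type targets a dedicated SQL pool; the connection string values below are placeholders you replace with your own:

    ```json
    {
        "name": "SynapseLinkedService",
        "properties": {
            "type": "AzureSqlDW",
            "typeProperties": {
                "connectionString": "Server=tcp:your_workspace_name.sql.azuresynapse.net,1433;Database=your_database_name;User ID=your_username;Password=your_password;Encrypt=True;Connection Timeout=30"
            }
        }
    }
    ```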

  4. Create Datasets

    Create two datasets:

    • Source Dataset: Represents the data in your on-premises SQL Server. Specify the table name.
    • Sink Dataset: Represents the destination table in Azure Synapse Analytics. Specify the table name.
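
    As a sketch, the two datasets might look like the following. The dataset names, schema, and table names are example values; the sink dataset assumes a Synapse linked service named `SynapseLinkedService`:

    ```json
    {
        "name": "OnPremSqlDataset",
        "properties": {
            "type": "SqlServerTable",
            "linkedServiceName": {
                "referenceName": "OnPremSqlLinkedService",
                "type": "LinkedServiceReference"
            },
            "typeProperties": {
                "schema": "dbo",
                "table": "your_table_name"
            }
        }
    }
    ```

    ```json
    {
        "name": "SynapseSqlDataset",
        "properties": {
            "type": "AzureSqlDWTable",
            "linkedServiceName": {
                "referenceName": "SynapseLinkedService",
                "type": "LinkedServiceReference"
            },
            "typeProperties": {
                "schema": "dbo",
                "table": "your_destination_table_name"
            }
        }
    }
    ```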

  5. Create a Pipeline

    Create a new pipeline and add a Copy Data activity.

    • Configure the Source tab of the Copy Data activity to use your source dataset.
    • Configure the Sink tab to use your sink dataset.
    • Map the columns between your source and sink as needed.
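
    A minimal pipeline definition for these steps might look like the following. The pipeline and dataset names are example values; `SqlServerSource` and `SqlDWSink` are the source and sink types for SQL Server and Azure Synapse Analytics, respectively:

    ```json
    {
        "name": "CopyOnPremToSynapsePipeline",
        "properties": {
            "activities": [
                {
                    "name": "CopySqlToSynapse",
                    "type": "Copy",
                    "inputs": [
                        { "referenceName": "OnPremSqlDataset", "type": "DatasetReference" }
                    ],
                    "outputs": [
                        { "referenceName": "SynapseSqlDataset", "type": "DatasetReference" }
                    ],
                    "typeProperties": {
                        "source": { "type": "SqlServerSource" },
                        "sink": { "type": "SqlDWSink" }
                    }
                }
            ]
        }
    }
    ```

    For large loads into Synapse, consider enabling staged copy with PolyBase or COPY on the sink, which requires an additional staging storage linked service.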

  6. Debug and Publish the Pipeline

    Debug your pipeline to test the data flow. Once satisfied, publish all your Data Factory artifacts.

  7. Trigger the Pipeline

    You can trigger the pipeline manually, on a schedule, or by an event. This tutorial focuses on a manual trigger.
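
    If you later want a scheduled run instead, a schedule trigger sketch follows. The trigger name, start time, and pipeline name are placeholders:

    ```json
    {
        "name": "DailyTrigger",
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                "recurrence": {
                    "frequency": "Day",
                    "interval": 1,
                    "startTime": "2024-01-01T00:00:00Z",
                    "timeZone": "UTC"
                }
            },
            "pipelines": [
                {
                    "pipelineReference": {
                        "referenceName": "your_pipeline_name",
                        "type": "PipelineReference"
                    }
                }
            ]
        }
    }
    ```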

Next Steps

After successfully ingesting data, you can explore various analytical capabilities within Azure Synapse Analytics, such as running SQL queries, building data warehousing solutions, or leveraging Spark for big data processing.