Azure Synapse Analytics Pipelines

This document provides a comprehensive reference for Azure Synapse Analytics Pipelines, including activities, triggers, datasets, linked services, and best practices for building data integration and orchestration solutions within Azure Synapse Analytics.

Overview

Azure Synapse Analytics pipelines are logical groupings of activities that together perform a task. Pipelines are used to automate processes, orchestrate data movement, and transform data. They offer a powerful way to build complex data workflows in the cloud.

Key Concepts

Pipeline Activities

Activities fall into three categories: data movement activities (such as Copy Data), data transformation activities (such as Script, Stored Procedure, and Databricks Notebook), and control flow activities (such as For Each, If Condition, Wait, and Execute Pipeline).

Copy Data Activity

The Copy Data activity is used to copy data from a source data store to a sink data store. It supports a wide range of connectors and data formats.

Properties

{
  "name": "CopyDataActivity",
  "type": "Copy",
  "dependsOn": [],
  "policy": {
    "timeout": "0.00:30:00",
    "retry": 0,
    "retryIntervalInSeconds": 30,
    "secureOutput": false,
    "secureInput": false
  },
  "userProperties": [],
  "typeProperties": {
    "source": {
      "type": "BlobSource",
      "recursive": true
    },
    "sink": {
      "type": "ParquetSink",
      "writeBatchSize": 1000
    },
    "enableStaging": false,
    "translator": {
      "type": "TabularTranslator",
      "mappings": [
        {
          "source": { "name": "col1" },
          "sink": { "name": "ColumnA" }
        }
      ]
    }
  }
}

Execute SQL Script Activity

This activity executes a SQL script against a relational database, such as a dedicated or serverless SQL pool.

{
  "name": "ExecuteSQL",
  "type": "Script",
  "dependsOn": [],
  "policy": {
    "timeout": "0.01:00:00",
    "retry": 3,
    "retryIntervalInSeconds": 60,
    "secureOutput": false,
    "secureInput": false
  },
  "userProperties": [],
  "typeProperties": {
    "scripts": [
      {
        "type": "NonQuery",
        "text": "IF OBJECT_ID('dbo.MyTable') IS NULL CREATE TABLE dbo.MyTable (Id INT, Name VARCHAR(100));"
      }
    ]
  }
}

Note that T-SQL does not support CREATE TABLE IF NOT EXISTS; an OBJECT_ID check is the usual idiom.

Databricks Notebook Activity

Allows you to execute a Databricks notebook as part of your pipeline.

Tip: Ensure your Synapse workspace is properly integrated with Azure Databricks for seamless execution.
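A minimal definition might look like the following sketch; the linked service name, notebook path, and base parameter are placeholders for your own workspace:

```json
{
  "name": "RunNotebook",
  "type": "DatabricksNotebook",
  "linkedServiceName": {
    "referenceName": "AzureDatabricksLinkedService",
    "type": "LinkedServiceReference"
  },
  "typeProperties": {
    "notebookPath": "/Shared/process-data",
    "baseParameters": {
      "inputDate": "2023-10-27"
    }
  }
}
```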

Get Metadata Activity

Retrieves metadata from a data store, such as file names, sizes, and last modified dates.
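A sketch of a Get Metadata activity that requests the item name, size, and last-modified date from the SourceCSVDataset defined later in this document:

```json
{
  "name": "GetFileMetadata",
  "type": "GetMetadata",
  "typeProperties": {
    "dataset": {
      "referenceName": "SourceCSVDataset",
      "type": "DatasetReference"
    },
    "fieldList": [ "itemName", "size", "lastModified" ]
  }
}
```

Downstream activities can read the results via expressions such as @activity('GetFileMetadata').output.size.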

Delete Activity

Deletes files or folders from a data store.
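A minimal sketch, assuming the files to delete are described by the SourceCSVDataset defined later in this document:

```json
{
  "name": "DeleteStagedFiles",
  "type": "Delete",
  "typeProperties": {
    "dataset": {
      "referenceName": "SourceCSVDataset",
      "type": "DatasetReference"
    },
    "recursive": true,
    "enableLogging": false
  }
}
```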

Stored Procedure Activity

Executes a stored procedure in a data store.
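A sketch of a stored procedure activity; the linked service, procedure name, and parameter are placeholders for your own database objects:

```json
{
  "name": "RunStoredProc",
  "type": "SqlServerStoredProcedure",
  "linkedServiceName": {
    "referenceName": "SqlDatabaseLinkedService",
    "type": "LinkedServiceReference"
  },
  "typeProperties": {
    "storedProcedureName": "dbo.usp_UpdateStats",
    "storedProcedureParameters": {
      "RunDate": { "value": "2023-10-27", "type": "String" }
    }
  }
}
```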

Azure Function Activity

Executes an Azure Function as a custom activity.
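A minimal sketch, assuming a linked service named AzureFunctionLinkedService and a function named ProcessRecord (both placeholders):

```json
{
  "name": "CallFunction",
  "type": "AzureFunctionActivity",
  "linkedServiceName": {
    "referenceName": "AzureFunctionLinkedService",
    "type": "LinkedServiceReference"
  },
  "typeProperties": {
    "functionName": "ProcessRecord",
    "method": "POST",
    "body": {
      "recordId": "@pipeline().parameters.recordId"
    }
  }
}
```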

Pipeline Triggers

Schedule Trigger

Runs a pipeline at a specified time interval.

{
  "type": "ScheduleTrigger",
  "typeProperties": {
    "recurrence": {
      "frequency": "Day",
      "interval": 1,
      "startTime": "2023-10-27T08:00:00Z",
      "endTime": "2024-12-31T23:59:00Z",
      "timeZone": "UTC"
    }
  }
}

Event Trigger

Triggers a pipeline based on an event, such as a file arriving in Blob Storage.
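A sketch of a storage event trigger that fires when a .csv blob is created; the path and the subscription, resource group, and storage account in the scope are placeholders:

```json
{
  "name": "OnBlobCreated",
  "properties": {
    "type": "BlobEventsTrigger",
    "typeProperties": {
      "blobPathBeginsWith": "/source-data/blobs/",
      "blobPathEndsWith": ".csv",
      "ignoreEmptyBlobs": true,
      "events": [ "Microsoft.Storage.BlobCreated" ],
      "scope": "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>"
    }
  }
}
```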

Tumbling Window Trigger

A time-windowed trigger that processes data in discrete, non-overlapping intervals.
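A sketch of an hourly tumbling window trigger; the pipeline name and parameter names are placeholders, with the window boundaries passed in through trigger expressions:

```json
{
  "name": "HourlyWindowTrigger",
  "properties": {
    "type": "TumblingWindowTrigger",
    "typeProperties": {
      "frequency": "Hour",
      "interval": 1,
      "startTime": "2023-10-27T00:00:00Z",
      "delay": "00:05:00",
      "maxConcurrency": 4,
      "retryPolicy": {
        "count": 2,
        "intervalInSeconds": 300
      }
    },
    "pipeline": {
      "pipelineReference": {
        "referenceName": "WindowedPipeline",
        "type": "PipelineReference"
      },
      "parameters": {
        "windowStart": "@trigger().outputs.windowStartTime",
        "windowEnd": "@trigger().outputs.windowEndTime"
      }
    }
  }
}
```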

For Each Activity

Iterates over a collection of items and executes a set of activities for each item.
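A sketch of a For Each activity that fans out over an assumed array parameter named fileList; the inner Wait activity is a placeholder for the real per-item work:

```json
{
  "name": "ForEachFile",
  "type": "ForEach",
  "typeProperties": {
    "items": {
      "value": "@pipeline().parameters.fileList",
      "type": "Expression"
    },
    "isSequential": false,
    "batchCount": 10,
    "activities": [
      {
        "name": "ProcessItem",
        "type": "Wait",
        "typeProperties": { "waitTimeInSeconds": 1 }
      }
    ]
  }
}
```

Inside the loop, the current element is available as @item().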

If Condition Activity

Executes a set of activities based on a specified condition.
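A sketch of an If Condition activity testing an assumed pipeline parameter named rowCount; the branch arrays are left empty as placeholders for your own activities:

```json
{
  "name": "CheckRowCount",
  "type": "IfCondition",
  "typeProperties": {
    "expression": {
      "value": "@greater(int(pipeline().parameters.rowCount), 0)",
      "type": "Expression"
    },
    "ifTrueActivities": [],
    "ifFalseActivities": []
  }
}
```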

Wait Activity

Pauses the execution of a pipeline for a specified duration.
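For example, a five-minute pause:

```json
{
  "name": "WaitFiveMinutes",
  "type": "Wait",
  "typeProperties": {
    "waitTimeInSeconds": 300
  }
}
```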

Execute Pipeline Activity

Allows one pipeline to call another pipeline, enabling pipeline composition.
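A sketch of an Execute Pipeline activity; ChildPipeline and its runDate parameter are placeholders for your own pipelines:

```json
{
  "name": "RunChildPipeline",
  "type": "ExecutePipeline",
  "typeProperties": {
    "pipeline": {
      "referenceName": "ChildPipeline",
      "type": "PipelineReference"
    },
    "parameters": {
      "runDate": "@pipeline().parameters.runDate"
    },
    "waitOnCompletion": true
  }
}
```

Setting waitOnCompletion to true makes the parent pipeline block until the child finishes, so downstream dependencies see its result.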

Datasets

Datasets represent the data within the linked data stores. They specify the data format, location, and schema.

{
  "name": "SourceCSVDataset",
  "properties": {
    "linkedServiceName": {
      "referenceName": "AzureBlobStorageLinkedService",
      "type": "LinkedServiceReference"
    },
    "type": "DelimitedText",
    "typeProperties": {
      "location": {
        "type": "AzureBlobStorageLocation",
        "container": "source-data"
      },
      "columnDelimiter": ",",
      "firstRowAsHeader": true
    }
  }
}

Linked Services

Linked services define the connection information to external resources such as databases, file systems, and cloud services.

{
  "name": "AzureBlobStorageLinkedService",
  "properties": {
    "type": "AzureBlobStorage",
    "typeProperties": {
      "connectionString": {
        "type": "AzureKeyVaultSecretReference",
        "store": {
          "referenceName": "AzureKeyVaultLinkedService",
          "type": "LinkedServiceReference"
        },
        "secretName": "blob-connection-string"
      }
    }
  }
}

Note: Always use Azure Key Vault to store sensitive connection strings and credentials. The Key Vault linked service name and secret name above are placeholders for your own.

Best Practices