Providers

This document applies to Airflow 2.x and later. For older versions, refer to the documentation that matches your installed Airflow version.

In Airflow, Providers are the standard way to distribute and consume integrations with external systems. They package together Operators, Hooks, Sensors, Executors, and other components that allow Airflow to interact with a specific service or technology. This modular approach makes Airflow highly extensible and keeps the core Airflow project lean and focused.

What is a Provider?

A Provider is essentially a Python package that follows a specific naming convention and directory structure. It allows you to install additional functionality for Airflow without cluttering the core project. For example, if you need to interact with AWS S3, you would install the AWS provider. If you need to work with Google Cloud Storage, you'd install the Google Cloud provider.
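
For instance, once the Amazon provider is installed, its S3 hook can be imported like any other Python module. The snippet below is a minimal sketch; the bucket name, prefix, and connection ID are placeholders for your own values.

from airflow.providers.amazon.aws.hooks.s3 import S3Hook

# Uses the "aws_default" Airflow connection; substitute your own connection ID.
hook = S3Hook(aws_conn_id="aws_default")

# List object keys under a prefix in a hypothetical bucket.
keys = hook.list_keys(bucket_name="my-example-bucket", prefix="data/")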

Key Components of a Provider

Providers typically include:

- Operators that define units of work against the external system
- Hooks that manage connections and low-level communication with the service
- Sensors that wait for a condition in the external system to be met
- Transfer operators that move data between two systems
- Connection types, extra links, and, in some cases, Executors and other core extensions

Provider Naming Convention

Provider packages follow a naming convention:

apache-airflow-providers-{system}

For example:

- apache-airflow-providers-amazon for Amazon Web Services
- apache-airflow-providers-google for Google Cloud and other Google services
- apache-airflow-providers-snowflake for Snowflake

The package name also determines the import path: apache-airflow-providers-google installs its modules under the airflow.providers.google namespace.

Installing Providers

You can install providers using pip. For example, to install the Google Cloud provider:

pip install apache-airflow-providers-google

To install multiple providers at once:

pip install apache-airflow-providers-amazon apache-airflow-providers-snowflake
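
Because providers are versioned independently of Airflow core, you can also pin a provider to an exact version for reproducible environments (the version number below is only a placeholder):

pip install apache-airflow-providers-google==10.1.0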

Discovering Available Providers

The official Airflow documentation lists all community-maintained providers and their packages, and the source for each provider lives in the apache/airflow repository on GitHub.
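
You can also inspect which providers are already installed in your environment with the Airflow CLI:

airflow providers list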

Using Providers in DAGs

Once installed, you can import and use the operators, hooks, and other components from a provider in your DAGs. For instance, to use the GCSToGCSOperator from the Google provider:


from __future__ import annotations

import pendulum

from airflow.models.dag import DAG
from airflow.providers.google.cloud.transfers.gcs_to_gcs import GCSToGCSOperator

with DAG(
    dag_id="gcs_to_gcs_example",
    schedule=None,
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    catchup=False,
    tags=["gcp", "gcs"],
) as dag:
    copy_gcs_to_gcs = GCSToGCSOperator(
        task_id="copy_gcs_to_gcs_task",
        source_bucket="my-source-bucket",
        source_object="path/to/my/file.txt",
        destination_bucket="my-destination-bucket",
        destination_object="new/path/for/file.txt",
    )
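
By default, GCSToGCSOperator authenticates through the google_cloud_default Airflow connection; pass gcp_conn_id to point it at a different connection.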

Provider Development

If you need to integrate with a system for which an official provider doesn't exist, you can develop your own. The process involves creating a Python package that adheres to the provider conventions. You can find detailed guides on developing your own providers in the contributing section of the documentation.
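
At its core, a custom provider is discovered through the apache_airflow_provider entry point in the package metadata, which points at a function returning the provider's metadata dictionary. The sketch below illustrates the shape of that function; the package and module names are hypothetical.

# my_provider/__init__.py -- a hypothetical module in your provider package
def get_provider_info():
    # Airflow discovers this function via the "apache_airflow_provider"
    # entry point declared in the package metadata, for example:
    #   [project.entry-points.apache_airflow_provider]
    #   provider_info = "my_provider:get_provider_info"
    return {
        "package-name": "airflow-provider-mycompany",  # hypothetical package
        "name": "MyCompany",
        "description": "Hooks and operators for MyCompany's internal API.",
        "versions": ["1.0.0"],
    }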

Key Benefits of Providers

- Modularity: install only the integrations you need, keeping deployments lean
- Independent releases: providers are versioned and released separately from Airflow core, so integration fixes ship without waiting for a core release
- Extensibility: new systems can be supported by publishing a new provider package, without any changes to Airflow itself

Important Note:

Starting with Airflow 2.0, all integrations with external systems are managed through Providers. The old plugin system has been largely superseded by the provider mechanism for core integrations. While plugins still exist for custom extensions not covered by providers, it's recommended to use providers whenever possible.