Providers
This document is for Airflow 2.x and later. For older versions, please refer to the relevant documentation.
In Airflow, Providers are the standard way to distribute and consume integrations with external systems. They package together Operators, Hooks, Sensors, Executors, and other components that allow Airflow to interact with a specific service or technology. This modular approach makes Airflow highly extensible and keeps the core Airflow project lean and focused.
What is a Provider?
A Provider is essentially a Python package that follows a specific naming convention and directory structure. It allows you to install additional functionality for Airflow without cluttering the core project. For example, if you need to interact with AWS S3, you would install the AWS provider. If you need to work with Google Cloud Storage, you'd install the Google Cloud provider.
Key Components of a Provider
Providers typically include:
- Operators: Define actions to be performed within a task.
- Hooks: Provide interfaces to external services. They abstract the details of connecting and interacting with databases, APIs, and other systems.
- Sensors: A type of Operator that waits for a certain condition to be met.
- Executors: Define how tasks are run (though less common for specific external service providers).
- Connection Types: Define the connection parameters and forms Airflow uses to authenticate with external services.
- Type Definitions: For type hinting and static analysis.
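To make the Sensor component concrete, here is a minimal sketch using the Amazon provider's S3KeySensor to wait for an object to appear in S3. It assumes apache-airflow-providers-amazon is installed; the bucket, key, and connection id are placeholders.

from __future__ import annotations

import pendulum
from airflow.models.dag import DAG
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

with DAG(
    dag_id="s3_sensor_example",
    schedule=None,
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    catchup=False,
) as dag:
    # Block downstream tasks until the object exists in the bucket.
    wait_for_file = S3KeySensor(
        task_id="wait_for_file",
        bucket_name="my-bucket",         # placeholder bucket
        bucket_key="incoming/data.csv",  # placeholder key
        aws_conn_id="aws_default",       # connection configured in Airflow
        poke_interval=60,                # check every 60 seconds
        timeout=60 * 60,                 # give up after one hour
    )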
Provider Naming Convention
Provider packages follow a naming convention:
apache-airflow-providers-{system}
For example:
- apache-airflow-providers-amazon for AWS
- apache-airflow-providers-google for Google Cloud
- apache-airflow-providers-cncf-kubernetes for Kubernetes
Installing Providers
You can install providers using pip. For example, to install the Google Cloud provider:
pip install apache-airflow-providers-google
To install multiple providers at once:
pip install apache-airflow-providers-amazon apache-airflow-providers-snowflake
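Providers can also be installed as extras of the apache-airflow package, in which case the usual constraint-file approach still applies. The version numbers below are illustrative only; substitute your Airflow and Python versions:
pip install "apache-airflow[amazon,google]==2.9.2" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.9.2/constraints-3.11.txt"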
Discovering Available Providers
The official Airflow documentation lists all community-maintained provider packages, and the source for each provider lives in the apache/airflow repository on GitHub.
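You can also inspect what is available in a running environment with the Airflow CLI. For example, the following prints every provider package Airflow has discovered in the current installation:
airflow providers list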
Using Providers in DAGs
Once installed, you can import and use the operators, hooks, and other components from a provider in your DAGs. For instance, to use the GCSToGCSOperator from the Google provider:
from __future__ import annotations
import pendulum
from airflow.models.dag import DAG
from airflow.providers.google.cloud.transfers.gcs_to_gcs import GCSToGCSOperator
with DAG(
    dag_id="gcs_to_gcs_example",
    schedule=None,
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    catchup=False,
    tags=["gcp", "gcs"],
) as dag:
    copy_gcs_to_gcs = GCSToGCSOperator(
        task_id="copy_gcs_to_gcs_task",
        source_bucket="my-source-bucket",
        source_object="path/to/my/file.txt",
        destination_bucket="my-destination-bucket",
        destination_object="new/path/for/file.txt",
    )
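Hooks from the same provider can be used directly inside your own task code. The sketch below lists objects in a bucket with the Google provider's GCSHook inside a TaskFlow task; the bucket name, prefix, and connection id are placeholders.

from __future__ import annotations

import pendulum
from airflow.decorators import task
from airflow.models.dag import DAG
from airflow.providers.google.cloud.hooks.gcs import GCSHook

with DAG(
    dag_id="gcs_hook_example",
    schedule=None,
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    catchup=False,
    tags=["gcp", "gcs"],
) as dag:

    @task
    def list_source_objects() -> list[str]:
        # The hook reads credentials from the named Airflow connection.
        hook = GCSHook(gcp_conn_id="google_cloud_default")
        return hook.list(bucket_name="my-source-bucket", prefix="path/to/my/")

    list_source_objects()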
Provider Development
If you need to integrate with a system for which an official provider doesn't exist, you can develop your own. The process involves creating a Python package that adheres to the provider conventions. You can find detailed guides on developing your own providers in the contributing section of the documentation.
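As a rough sketch of what discovery involves, a custom provider package (all names below are hypothetical) exposes its metadata through a get_provider_info function registered under the apache_airflow_provider entry point group in its packaging metadata:

# airflow_provider_myservice/__init__.py  (hypothetical package)
#
# The package must also register this function as the "provider_info"
# entry point in the "apache_airflow_provider" group (in setup.cfg or
# pyproject.toml) so Airflow can discover the provider at startup.
def get_provider_info():
    return {
        "package-name": "airflow-provider-myservice",  # hypothetical name
        "name": "My Service",
        "description": "Operators and hooks for My Service.",
        "versions": ["1.0.0"],
    }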
Key Benefits of Providers
- Modularity: Keeps the core Airflow project clean.
- Extensibility: Easily add support for new systems.
- Maintainability: Integrations are managed and updated independently.
- Community Driven: Encourages contributions and broader ecosystem support.
Important Note:
Starting with Airflow 2.0, all integrations with external systems are managed through Providers. The old plugin system has been largely superseded by the provider mechanism for core integrations. While plugins still exist for custom extensions not covered by providers, it's recommended to use providers whenever possible.