Providers

Airflow is designed to be extensible, and Providers are the primary way to extend Airflow's capabilities. They package together the Operators, Hooks, Sensors, and other integrations that Airflow needs to interact with external systems and services.

What are Providers?

Before Airflow 2.0, integrations with external systems such as AWS, GCP, and Snowflake were often bundled directly into the core Airflow distribution or managed as separate plugins. This led to dependency-management and versioning challenges, and to a bloated core.

Providers were introduced to solve these issues. They are separate Python packages that can be installed independently of the Airflow core. This allows for better modularity, faster release cycles for integrations, and the ability for community members and organizations to contribute and maintain their own integrations.

Key Components of a Provider

A provider typically bundles several kinds of components: Operators, which define units of work in a DAG; Hooks, which wrap the low-level client for an external service; Sensors, which wait for a condition in an external system; and transfer operators, which move data between systems. Many providers also register connection types so their services can be configured through Airflow connections.

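For example, the Google provider used later on this page exposes its components under predictable module paths (the import paths shown are from recent provider releases):


from airflow.providers.google.cloud.hooks.gcs import GCSHook
from airflow.providers.google.cloud.operators.gcs import GCSCreateBucketOperator
from airflow.providers.google.cloud.sensors.gcs import GCSObjectExistenceSensor
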
Discovering and Installing Providers

The official Airflow Providers are published on PyPI and can be installed using pip. You can find a list of official providers on the installation page.

To install a provider, use pip:


pip install apache-airflow-providers-google
pip install apache-airflow-providers-amazon
pip install apache-airflow-providers-snowflake
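After installing a provider, you can confirm that Airflow has discovered it using the providers CLI; the second command prints the details of a single provider:


airflow providers list
airflow providers get apache-airflow-providers-google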

Using Providers in DAGs

Once a provider is installed, you can import and use its components directly in your DAGs. For example, to use a Google Cloud Storage operator:


from __future__ import annotations

import pendulum

from airflow.models.dag import DAG
from airflow.providers.google.cloud.operators.gcs import GCSCreateBucketOperator

with DAG(
    dag_id="gcs_example",
    schedule=None,
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    catchup=False,
    tags=["gcp", "example"],
) as dag:
    create_gcs_bucket = GCSCreateBucketOperator(
        task_id="create_gcs_bucket",
        # Placeholder values: replace with your own bucket name and project id.
        bucket_name="my-airflow-test-bucket",
        project_id="my-gcp-project-id",
    )
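Hooks from a provider can also be used directly inside your own tasks when no operator fits your use case. Below is a minimal sketch using the Google provider's GCSHook with the TaskFlow API; the connection id and bucket name are placeholders:


from __future__ import annotations

from airflow.decorators import task
from airflow.providers.google.cloud.hooks.gcs import GCSHook


@task
def list_bucket_objects() -> list[str]:
    # The hook resolves credentials from an Airflow connection;
    # "google_cloud_default" is the provider's default connection id.
    hook = GCSHook(gcp_conn_id="google_cloud_default")
    # Return the object names in the (placeholder) bucket.
    return hook.list(bucket_name="my-airflow-test-bucket")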

Provider Development

Developing your own Airflow providers allows you to encapsulate custom integrations and share them within your organization or with the wider community. The process involves defining your operators, hooks, and other components, and then packaging them into a standard Python distribution.
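At a high level, Airflow discovers third-party providers through the apache_airflow_provider entry point, which must resolve to a function returning the provider's metadata. The sketch below illustrates the idea; the package name, module paths, and connection type are hypothetical:


# my_company_provider/__init__.py
#
# Registered in pyproject.toml as:
#   [project.entry-points."apache_airflow_provider"]
#   provider_info = "my_company_provider:get_provider_info"


def get_provider_info() -> dict:
    # Metadata Airflow reads when it loads the provider package.
    return {
        "package-name": "my-company-airflow-provider",  # hypothetical package
        "name": "My Company Provider",
        "description": "Operators and hooks for My Company's internal services.",
        "versions": ["1.0.0"],
        "connection-types": [
            {
                "connection-type": "my_company",
                "hook-class-name": "my_company_provider.hooks.MyCompanyHook",
            }
        ],
    }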

For detailed information on developing providers, please refer to the Extending Airflow documentation.

Note: Always ensure you are installing providers compatible with your Airflow version. Refer to the official provider documentation for compatibility matrices and detailed usage instructions for each provider.
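One way to keep core and provider versions aligned is to install both against the constraint files published for each Airflow release; the Airflow version and Python version in this example are placeholders:


pip install "apache-airflow==2.9.3" apache-airflow-providers-google \
  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.9.3/constraints-3.10.txt"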