Executor Reference

Airflow's executor is responsible for running your tasks. Airflow supports several executors, each with its own strengths and weaknesses. Choosing the right executor is crucial for scaling your Airflow deployment and managing your workloads effectively.

Supported Executors

Here's a summary of the executors available in Apache Airflow:

| Executor | Description | Use Case | Complexity |
|---|---|---|---|
| LocalExecutor | Runs tasks locally on the same machine as the scheduler; tasks execute in parallel as separate processes. | Development, small deployments, testing | Low |
| CeleryExecutor | Distributes tasks to Celery workers; requires a message broker (e.g., RabbitMQ, Redis). | Scalable, distributed workloads | Medium |
| KubernetesExecutor | Launches each task as its own Kubernetes Pod; ideal for containerized environments. | Dynamic scaling, Kubernetes-native deployments | High |
| DaskExecutor | Distributes tasks to Dask workers, leveraging Dask's distributed computing capabilities. | Python-heavy workloads, scientific computing | Medium |
| SequentialExecutor | Runs tasks one after another with no parallelism. | Debugging, simple testing | Very Low |

LocalExecutor

The LocalExecutor is a good fit for development and smaller single-machine deployments. It spins up a new process for each task on the same machine where the Airflow scheduler is running, and parallelism comes from running several of these processes concurrently.

Configuration:

AIRFLOW__CORE__EXECUTOR = LocalExecutor

You can control how many task instances run concurrently using:

AIRFLOW__CORE__PARALLELISM = 32
AIRFLOW__CORE__MAX_ACTIVE_TASKS_PER_DAG = 16

PARALLELISM caps running task instances across the whole installation, while MAX_ACTIVE_TASKS_PER_DAG caps them per DAG.
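
These AIRFLOW__SECTION__KEY environment variables map one-to-one onto airflow.cfg entries, so the same settings could equally live in the config file. A minimal sketch of the equivalent airflow.cfg fragment:

[core]
executor = LocalExecutor
parallelism = 32
max_active_tasks_per_dag = 16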

CeleryExecutor

The CeleryExecutor leverages the Celery distributed task queue system. Airflow tasks are sent to a message broker (like RabbitMQ or Redis), and Celery workers pick up and execute these tasks.

Configuration:

AIRFLOW__CORE__EXECUTOR = CeleryExecutor

You'll need to configure the message broker URL and the result backend:

AIRFLOW__CELERY__BROKER_URL = redis://localhost:6379/1
AIRFLOW__CELERY__RESULT_BACKEND = redis://localhost:6379/1

And how many task processes each Celery worker runs concurrently:

AIRFLOW__CELERY__WORKER_CONCURRENCY = 4

Additional Celery-level options (such as task_acks_late) can be supplied through a custom Celery configuration object referenced by AIRFLOW__CELERY__CELERY_CONFIG_OPTIONS.
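
Workers can then be started on any machine that has Airflow, the broker connection, and your DAG dependencies available. A sketch using Airflow 2's Celery subcommands:

airflow celery worker
airflow celery flower

The first command starts a worker that pulls tasks from the broker; the second (optional) starts the Flower web UI for monitoring the Celery cluster.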

KubernetesExecutor

The KubernetesExecutor is ideal for environments already using Kubernetes. Each task is executed in its own isolated Kubernetes Pod. This provides excellent resource isolation and scalability.

Configuration:

AIRFLOW__CORE__EXECUTOR = KubernetesExecutor

You can specify the Kubernetes namespace, image, and other Pod configurations:

AIRFLOW__KUBERNETES__NAMESPACE = airflow
AIRFLOW__KUBERNETES__WORKER_CONTAINER_REPOSITORY = apache/airflow
AIRFLOW__KUBERNETES__WORKER_CONTAINER_TAG = latest
AIRFLOW__KUBERNETES__POD_TEMPLATE_FILE = /opt/airflow/pod-template.yaml
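
Per-task Pod settings can also be overridden from the DAG itself via executor_config. A minimal sketch, assuming the cncf.kubernetes provider and the kubernetes Python client are installed (the DAG id, task id, and resource values are illustrative, and the schedule argument is the Airflow 2.4+ spelling):

from datetime import datetime

from kubernetes.client import models as k8s

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(dag_id="pod_override_example", start_date=datetime(2024, 1, 1), schedule=None):
    BashOperator(
        task_id="resource_heavy",
        bash_command="echo 'running in a dedicated pod'",
        executor_config={
            # Ask the KubernetesExecutor to launch this task's Pod with
            # explicit resource requests instead of the defaults.
            "pod_override": k8s.V1Pod(
                spec=k8s.V1PodSpec(
                    containers=[
                        k8s.V1Container(
                            name="base",  # Airflow's main task container is named "base"
                            resources=k8s.V1ResourceRequirements(
                                requests={"cpu": "500m", "memory": "512Mi"}
                            ),
                        )
                    ]
                )
            )
        },
    )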

DaskExecutor

The DaskExecutor allows Airflow to distribute tasks across a Dask cluster. This is particularly useful for Python-heavy workloads where Dask can manage distributed computation efficiently.

Configuration:

AIRFLOW__CORE__EXECUTOR = DaskExecutor

Configure the address of your Dask scheduler:

AIRFLOW__DASK__CLUSTER_ADDRESS = tcp://dask-scheduler:8786
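
The address should point at a running Dask scheduler. A rough sketch of standing one up with the commands shipped by the distributed package (the host name is illustrative):

dask-scheduler
dask-worker tcp://dask-scheduler:8786

The scheduler listens on port 8786 by default, and each dask-worker process connects to it and executes the tasks Airflow submits.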

SequentialExecutor

The SequentialExecutor runs tasks one after another with no parallelism. It's not suitable for production, but it's useful for debugging and for understanding task dependencies without the complexities of parallel execution, and it is the only executor that can be used with a SQLite metadata database.

Configuration:

AIRFLOW__CORE__EXECUTOR = SequentialExecutor

Choosing the Right Executor

The choice of executor depends on your specific needs:

- SequentialExecutor: debugging and simple tests where parallelism isn't needed.
- LocalExecutor: development and small single-machine deployments.
- CeleryExecutor: distributed production workloads that need to scale out across many workers.
- KubernetesExecutor: containerized, Kubernetes-native deployments that benefit from per-task isolation and dynamic scaling.
- DaskExecutor: Python-heavy or scientific-computing workloads running on an existing Dask cluster.

Remember to configure the `AIRFLOW__CORE__EXECUTOR` setting in your `airflow.cfg` or via environment variables to select your desired executor.
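
To confirm which executor a running deployment has actually picked up, regardless of whether it was set in airflow.cfg or through an environment variable, a quick sketch in Python:

from airflow.configuration import conf

# Prints the executor the effective configuration resolves to.
print(conf.get("core", "executor"))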