Executor Reference
Airflow's executor is responsible for running your tasks. Airflow supports several executors, each with its own strengths and weaknesses. Choosing the right executor is crucial for scaling your Airflow deployment and managing your workloads effectively.
Supported Executors
Here's a summary of the executors available in Apache Airflow:
| Executor | Description | Use Case | Complexity |
|---|---|---|---|
| LocalExecutor | Runs tasks locally on the same machine as the scheduler. Tasks are executed in parallel using processes. | Development, small deployments, testing. | Low |
| CeleryExecutor | Distributes tasks to Celery workers. Requires a message broker (e.g., RabbitMQ, Redis). | Scalable, distributed workloads. | Medium |
| KubernetesExecutor | Launches each task as its own Kubernetes Pod. Ideal for containerized environments. | Dynamic scaling, Kubernetes-native deployments. | High |
| DaskExecutor | Distributes tasks to Dask workers. Leverages Dask's distributed computing capabilities. | Python-heavy workloads, scientific computing. | Medium |
| SequentialExecutor | Runs tasks one at a time, with no parallelism. | Debugging, simple testing. | Very Low |
LocalExecutor
The LocalExecutor is well suited to development and smaller single-machine deployments. It spins up a new process for each task on the same machine where the Airflow scheduler is running, and parallelism is achieved by running several of these processes concurrently.
Configuration:
AIRFLOW__CORE__EXECUTOR = LocalExecutor
You can control how many task instances run concurrently, both across the whole deployment and per DAG:
AIRFLOW__CORE__PARALLELISM = 32
AIRFLOW__CORE__MAX_ACTIVE_TASKS_PER_DAG = 16
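To see this parallelism in practice, a DAG with independent tasks is enough: the LocalExecutor can start them in separate processes at the same time. A minimal sketch, assuming Airflow 2.4+ style imports and DAG arguments; the DAG and task names are purely illustrative:

```python
# Hypothetical example: two independent tasks that LocalExecutor can run in parallel.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("extracting")


def transform():
    print("transforming")


with DAG(
    dag_id="local_executor_demo",   # illustrative name
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    # No dependency between these two tasks, so the LocalExecutor may run
    # them in separate processes at the same time (subject to the
    # parallelism settings above).
    PythonOperator(task_id="extract", python_callable=extract)
    PythonOperator(task_id="transform", python_callable=transform)
```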
CeleryExecutor
The CeleryExecutor leverages the Celery distributed task queue system. Airflow tasks are sent to a message broker (like RabbitMQ or Redis), and Celery workers pick up and execute these tasks.
Configuration:
AIRFLOW__CORE__EXECUTOR = CeleryExecutor
You'll need to configure your message broker URL:
AIRFLOW__CELERY__BROKER_URL = redis://localhost:6379/1
AIRFLOW__CELERY__RESULT_BACKEND = redis://localhost:6379/1
And tune the workers, for example the number of task processes each Celery worker runs:
AIRFLOW__CELERY__WORKER_CONCURRENCY = 4
Lower-level Celery settings such as task_acks_late can be supplied through a custom Celery configuration module referenced by AIRFLOW__CELERY__CELERY_CONFIG_OPTIONS.
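Beyond the broker settings, the CeleryExecutor also honors the `queue` argument on operators, which routes a task to workers subscribed to that queue. A minimal sketch, assuming Airflow 2.4+; the DAG, task, and queue names are illustrative:

```python
# Hypothetical example: routing a task to a dedicated Celery queue.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="celery_queue_demo",     # illustrative name
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    # With CeleryExecutor, `queue` controls which workers may pick up the task;
    # only workers subscribed to "heavy" (e.g. started with
    # `airflow celery worker --queues heavy`) will execute it.
    BashOperator(
        task_id="heavy_job",
        bash_command="echo 'running on the heavy queue'",
        queue="heavy",              # illustrative queue name
    )
```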
KubernetesExecutor
The KubernetesExecutor is ideal for environments already using Kubernetes. Each task is executed in its own isolated Kubernetes Pod. This provides excellent resource isolation and scalability.
Configuration:
AIRFLOW__CORE__EXECUTOR = KubernetesExecutor
You can specify the Kubernetes namespace, the worker image, and a Pod template file for further Pod configuration:
AIRFLOW__KUBERNETES__NAMESPACE = airflow
AIRFLOW__KUBERNETES__WORKER_CONTAINER_REPOSITORY = apache/airflow
AIRFLOW__KUBERNETES__WORKER_CONTAINER_TAG = latest
AIRFLOW__KUBERNETES__POD_TEMPLATE_FILE = /opt/airflow/pod-template.yaml
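Per-task Pod customizations can also be made from the DAG itself through the `executor_config` argument. A minimal sketch, assuming the `kubernetes` Python client is installed alongside Airflow 2.4+; the DAG name, task name, and resource values are illustrative:

```python
# Hypothetical example: overriding the worker Pod for a single task.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from kubernetes.client import models as k8s


def crunch():
    print("running in a Pod with a custom resource request")


with DAG(
    dag_id="k8s_executor_demo",     # illustrative name
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    PythonOperator(
        task_id="crunch",
        python_callable=crunch,
        # "pod_override" replaces parts of the generated Pod spec; the main
        # task container must be named "base" for the override to apply.
        executor_config={
            "pod_override": k8s.V1Pod(
                spec=k8s.V1PodSpec(
                    containers=[
                        k8s.V1Container(
                            name="base",
                            resources=k8s.V1ResourceRequirements(
                                requests={"cpu": "500m", "memory": "512Mi"},
                            ),
                        )
                    ]
                )
            )
        },
    )
```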
DaskExecutor
The DaskExecutor allows Airflow to distribute tasks across a Dask cluster. This is particularly useful for Python-heavy workloads where Dask can manage distributed computation efficiently.
Configuration:
AIRFLOW__CORE__EXECUTOR = DaskExecutor
Configure the address of your Dask scheduler:
AIRFLOW__DASK__CLUSTER_ADDRESS = tcp://dask-scheduler:8786
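To sanity-check that this address is reachable before pointing Airflow at it, you can connect a Dask client to it directly. A minimal sketch, assuming `dask.distributed` is installed and the (illustrative) scheduler host name above resolves from where you run it:

```python
# Hypothetical connectivity check against the Dask scheduler Airflow will use.
from dask.distributed import Client

# Same address as the cluster_address setting above (illustrative host name).
client = Client("tcp://dask-scheduler:8786")

# A trivial round-trip: submit a function and fetch its result from a worker.
print(client.submit(lambda x: x + 1, 41).result())  # expects 42

client.close()
```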
SequentialExecutor
The SequentialExecutor runs tasks strictly one after another and is the only executor that works with a SQLite metadata database. It is not suitable for production, but it is excellent for debugging and for understanding task dependencies without the complexities of parallelism.
Configuration:
AIRFLOW__CORE__EXECUTOR = SequentialExecutor
Choosing the Right Executor
The choice of executor depends on your specific needs:
- Development & Testing: LocalExecutor or SequentialExecutor.
- Scalable Workloads: CeleryExecutor, if you have a message broker and worker infrastructure.
- Kubernetes Environments: KubernetesExecutor for seamless integration.
- Python/Scientific Computing: DaskExecutor.
Remember to configure the `AIRFLOW__CORE__EXECUTOR` setting in your `airflow.cfg` or via environment variables to select your desired executor.