Pools
Pools are a mechanism for limiting how many tasks run concurrently against a shared resource. This is particularly useful when you need to throttle access to an external service that supports only a limited number of connections.
What are Pools?
In Apache Airflow, a Pool is a named resource that allows you to control the number of concurrent task executions
that can run against a specific target. Imagine you have an API that can only handle 5 concurrent requests.
You can create an Airflow Pool named my_api_pool with a maximum concurrency of 5. Any task assigned
to this pool will only run if there's an available slot within the pool's concurrency limit.
Pools help prevent overwhelming external systems, manage resource contention, and ensure fair usage among different DAGs.
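Conceptually, a pool behaves like a counting semaphore: at most N holders at a time, and everyone else waits. The following stand-alone sketch (plain Python, no Airflow required; the names and numbers are illustrative, and this is not how Airflow implements pools internally) shows the idea:

```python
import threading
import time

MAX_CONCURRENT = 5                      # analogous to a pool with 5 slots
pool = threading.BoundedSemaphore(MAX_CONCURRENT)

peak = 0                                # highest concurrency observed
active = 0
lock = threading.Lock()

def call_api(i: int) -> None:
    global peak, active
    with pool:                          # blocks until a "slot" is free
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)                # simulated API request
        with lock:
            active -= 1

threads = [threading.Thread(target=call_api, args=(i,)) for i in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(peak)  # never exceeds MAX_CONCURRENT
```

Twenty "tasks" compete for five slots, and the observed peak concurrency never exceeds the limit, which is exactly the guarantee a pool gives you.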
Creating Pools
Pools can be created and managed through the Airflow UI or programmatically.
Using the Airflow UI:
- Navigate to the Admin menu in the Airflow UI.
- Click on Pools.
- Click the + button to add a new pool.
- Fill in the required fields:
  - Pool Name: A unique identifier for your pool (e.g., my_api_pool).
  - Slots: The maximum number of concurrent tasks allowed for this pool.
  - Description: An optional description of the pool's purpose.
- Click Save.
Programmatically (via the Airflow CLI):
You can also use the Airflow CLI to create pools:
airflow pools set my_cli_pool 3 "Pool created via CLI"
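Pools can also be created in bulk with `airflow pools import`, which reads a JSON file mapping each pool name to its slots and description. A sketch of such a file (the pool names here are illustrative):

```json
{
  "my_api_pool": {
    "slots": 5,
    "description": "Limits concurrent calls to the external API"
  },
  "my_cli_pool": {
    "slots": 3,
    "description": "Pool created via CLI"
  }
}
```

Run `airflow pools import pools.json` to load it; `airflow pools export pools.json` writes the currently defined pools back out in the same format.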
Using Pools
To assign a task to a pool, you specify the pool parameter in the operator's constructor.
If a task is assigned to a pool that doesn't exist, it will not be scheduled; the scheduler logs a warning, and the task waits until a pool with that name is created.
Here's an example of a DAG using pools:
```python
from __future__ import annotations

import pendulum

from airflow.models.dag import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="pool_example_dag",
    schedule=None,
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    catchup=False,
    tags=["example", "pools"],
) as dag:
    task_no_pool = BashOperator(
        task_id="task_no_pool",
        bash_command="echo 'This task has no pool assigned.'",
    )

    task_with_pool_1 = BashOperator(
        task_id="task_with_pool_1",
        bash_command="echo 'Running task in my_api_pool.' && sleep 10",
        pool="my_api_pool",
        pool_slots=1,  # Default is 1; stated explicitly for clarity
    )

    task_with_pool_2 = BashOperator(
        task_id="task_with_pool_2",
        bash_command="echo 'Running another task in my_api_pool.' && sleep 10",
        pool="my_api_pool",
        pool_slots=1,
    )

    task_with_different_pool = BashOperator(
        task_id="task_with_different_pool",
        bash_command="echo 'Running task in another_pool.' && sleep 5",
        pool="another_pool",
        pool_slots=2,
    )

    # `>>` sets the downstream dependency: the three pooled tasks
    # run after task_no_pool completes.
    task_no_pool >> [task_with_pool_1, task_with_pool_2, task_with_different_pool]
```
In this example, if my_api_pool is configured with 2 slots,
task_with_pool_1 and task_with_pool_2 can run concurrently.
If another_pool has 2 slots, task_with_different_pool occupies both of them while it runs, since it sets pool_slots=2.
Important Note on `pool_slots`
The pool_slots parameter in an operator defaults to 1.
This means that each task instance consumes 1 slot from the pool.
If a task needs to consume more slots (e.g., it's a more resource-intensive job),
you can specify a higher value for pool_slots.
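Because pool_slots lets a single task instance consume several slots, a pool acts like a weighted semaphore rather than a simple counter. The following stand-alone sketch of that accounting is plain Python, not Airflow internals; the class and method names are invented for illustration:

```python
class SlotPool:
    """Toy model of a pool: each task consumes a weighted number of slots."""

    def __init__(self, total_slots: int) -> None:
        self.total_slots = total_slots
        self.occupied = 0

    def try_acquire(self, pool_slots: int = 1) -> bool:
        # A task instance runs only if enough free slots remain.
        if self.occupied + pool_slots <= self.total_slots:
            self.occupied += pool_slots
            return True
        return False          # otherwise the task instance stays queued

    def release(self, pool_slots: int = 1) -> None:
        self.occupied -= pool_slots


pool = SlotPool(total_slots=5)
assert pool.try_acquire()               # light task: 1 slot -> 1/5 occupied
assert pool.try_acquire(pool_slots=4)   # heavy task: 4 slots -> 5/5 occupied
assert not pool.try_acquire()           # pool is full: this task must wait
pool.release(pool_slots=4)              # heavy task finishes
assert pool.try_acquire()               # the waiting task now fits (2/5 occupied)
```

The heavy task blocks the single-slot task even though only two task instances are running, which is exactly the effect of raising pool_slots on a resource-intensive job.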
Pool Configuration
When you define a pool, you set the total number of slots available.
Each task instance that uses the pool by default consumes 1 slot.
You can override this behavior by setting the pool_slots parameter on the task instance.
| Parameter | Description | Default |
|---|---|---|
| `pool` | The name of the pool the task belongs to. | `default_pool` |
| `pool_slots` | The number of slots this task instance consumes from the pool. | 1 |
Monitoring Pools
The Airflow UI provides a dedicated section to monitor your pools:
- Navigate to the Admin menu.
- Click on Pools.
Here you can see:
- Pool names
- Total available slots
- Currently occupied slots
- The number of running tasks
- The number of queued tasks
When a task tries to acquire slots from a pool and all slots are occupied, the task instance will be queued and will wait until a slot becomes available. The UI will reflect this queueing behavior.