DAG Runs
A DAG run is a specific instance of a DAG. When you trigger a DAG, Airflow creates a DAG run for it. Each DAG run has a unique ID, a logical date, and a state (e.g., running, success, failed).
What is a DAG Run?
In Apache Airflow, a Directed Acyclic Graph (DAG) defines a set of tasks and their dependencies. A DAG Run represents a single execution of that DAG. Think of the DAG as the blueprint and the DAG Run as a specific instance of that blueprint being built.
Key Attributes of a DAG Run:
- DAG ID: The identifier of the DAG being run.
- Run ID: A unique identifier for this specific DAG Run, typically composed of the run type and the logical date (e.g., scheduled__2023-10-26T00:00:00+00:00).
- Logical Date (Execution Date): The timestamp that represents the logical time of the data the DAG run is processing. This is crucial for backfilling and scheduling.
- State: The current status of the DAG Run (e.g., queued, running, success, failed).
- Created At: Timestamp when the DAG Run was created.
- Updated At: Timestamp when the DAG Run was last updated.
- Start Date: Timestamp when the DAG Run actually started executing tasks.
- End Date: Timestamp when the DAG Run finished executing tasks.
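Most of these attributes can be read off the DagRun object that Airflow injects into every task's execution context. As a minimal sketch (the task name here is hypothetical), a task could log them like this:
from airflow.decorators import task

@task
def inspect_dag_run(**context):
    # The DagRun object for the current run is available in the task context.
    dr = context["dag_run"]
    print(f"dag_id={dr.dag_id}, run_id={dr.run_id}, state={dr.state}")
    print(f"logical_date={dr.logical_date}")
    # end_date is None while the run is still in progress
    print(f"start_date={dr.start_date}, end_date={dr.end_date}")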
Creating DAG Runs
DAG Runs are typically created in two ways:
- Scheduled Runs: Airflow's scheduler automatically creates DAG runs based on the schedule_interval defined in your DAG.
- Manual Triggers: Users can manually trigger a DAG run from the Airflow UI or via the Airflow CLI.
Automatic Scheduling
When you define a schedule for your DAG (e.g., @daily, @hourly; the parameter was called schedule_interval before Airflow 2.4), Airflow's scheduler periodically checks whether it is time to create a new DAG run. For a daily schedule, the run for a given day is created once that day's data interval has ended, i.e., shortly after midnight on the following day.
from __future__ import annotations
import pendulum
from airflow.models.dag import DAG
with DAG(
dag_id="my_scheduled_dag",
start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
schedule="@daily", # This will trigger a DAG run daily
catchup=False,
tags=["example"],
) as dag:
pass
Manual Triggers
You can trigger a DAG run manually for several reasons, such as testing or re-running a specific period.
From the UI: Navigate to the DAGs view, find your DAG, and click the "Trigger DAG" button. You can specify configuration options and a logical date.
From the CLI:
airflow dags trigger my_dag_id --conf '{"key": "value"}' --exec-date 2023-10-27T00:00:00
The --conf parameter lets you pass a JSON payload that your tasks can read at runtime. The --exec-date parameter (renamed --logical-date in newer Airflow versions) sets the logical date for this manual run.
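Manual runs can also be created programmatically through Airflow's stable REST API (Airflow 2+). This sketch assumes a webserver at localhost:8080 with basic authentication enabled; the host, credentials, and DAG id are placeholders:
import requests

# Create (trigger) a new DAG run via the stable REST API.
response = requests.post(
    "http://localhost:8080/api/v1/dags/my_dag_id/dagRuns",
    auth=("admin", "admin"),  # placeholder credentials
    json={"logical_date": "2023-10-27T00:00:00Z", "conf": {"key": "value"}},
)
print(response.json())  # includes the new run's dag_run_id and state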
Understanding the Logical Date
The logical date (often referred to as execution date in older versions) is a fundamental concept. It represents the point in time that the DAG run is *for*. For a daily DAG scheduled to run at midnight, the logical date for that run would be the start of the day (e.g., 2023-10-26 00:00:00 UTC if the run happens on 2023-10-27).
This distinction is vital:
- The logical date defines the data interval for the DAG run.
- The run date (when it actually starts) is when Airflow executes the DAG.
This separation allows for catchup and backfilling, where you can have Airflow run your DAG for past logical dates.
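Since Airflow 2.2, the data interval derived from the logical date is exposed directly in the task context (and as the data_interval_start / data_interval_end Jinja variables), so tasks rarely need to compute it themselves. A small illustrative task (the name is hypothetical):
from airflow.decorators import task

@task
def show_interval(**context):
    # For a @daily run with logical date 2023-10-26, this prints the interval
    # 2023-10-26T00:00:00+00:00 .. 2023-10-27T00:00:00+00:00.
    print(f"processing data from {context['data_interval_start']} to {context['data_interval_end']}")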
DAG Run States
DAG runs transition through various states as they are processed:
- queued: The DAG run has been created but is waiting for resources or an available worker.
- running: The DAG run is actively executing tasks.
- success: All tasks within the DAG run have completed successfully.
- failed: At least one task within the DAG run failed and could not be retried to success.
Note that scheduled, upstream_failed, and skipped are task instance states, not DAG run states: an individual task is scheduled when it is waiting to be queued, upstream_failed when a task it depends on failed, and skipped when branching or trigger rules bypass it. A DAG run itself only moves between queued, running, success, and failed.
The Airflow UI provides a visual representation of these states for each DAG run.
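The run-level states are also exposed programmatically as the DagRunState enum, which is convenient when filtering runs from scripts. A quick check confirms the four states listed above:
from airflow.utils.state import DagRunState

# DagRunState enumerates every state a DAG run can take.
print([state.value for state in DagRunState])
# ['queued', 'running', 'success', 'failed']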
Viewing DAG Runs
You can monitor your DAG runs through the Airflow UI:
- Navigate to the DAGs view.
- Click on a specific DAG name.
- The DAG runs are displayed in a timeline or grid view, showing their state, logical date, and duration.
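Outside the UI, code with access to the Airflow metadata database can query runs directly, for example with the DagRun.find helper (the DAG id below is a placeholder):
from airflow.models import DagRun
from airflow.utils.state import DagRunState

# List every failed run of a given DAG from the metadata database.
for run in DagRun.find(dag_id="my_dag_id", state=DagRunState.FAILED):
    print(run.run_id, run.logical_date, run.state)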
Note on `catchup`
The catchup=False parameter in a DAG definition prevents Airflow from automatically creating DAG runs for all past missed schedule intervals between the start_date and the current date when the DAG is first unpaused. Setting it to True (or omitting it where the catchup_by_default configuration option, which defaults to True, is in effect) causes Airflow to create a run for every missed interval.
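For example, a variant of the earlier DAG with catchup=True, first unpaused well after its start_date, would immediately create a run for every missed daily interval (the dag_id here is arbitrary):
import pendulum
from airflow.models.dag import DAG

with DAG(
    dag_id="my_catchup_dag",
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    schedule="@daily",
    catchup=True,  # create runs for all missed daily intervals since start_date
) as dag:
    pass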
Tip: Using `dag_run.conf`
When triggering a DAG manually with configuration (--conf), you can access that configuration from your tasks: in templated fields via the {{ dag_run.conf }} Jinja variable, or in Python via the task context, as in the example below. This is a convenient way to parameterize individual DAG runs.
import pendulum
from airflow.decorators import dag, task

@dag(start_date=pendulum.datetime(2023, 1, 1, tz="UTC"), schedule=None)
def my_conf_dag():
    @task
    def print_conf(**context):
        # dag_run.conf holds the JSON passed with --conf or the UI trigger form
        conf = context["dag_run"].conf
        print(f"Configuration received: {conf}")
    print_conf()

my_conf_dag()
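The same configuration is reachable from any templated field. For instance (a hypothetical task, assumed to be defined inside a DAG), a BashOperator could echo one key, falling back to a default when the run was not triggered with --conf:
from airflow.operators.bash import BashOperator

echo_conf = BashOperator(
    task_id="echo_conf",
    # dag_run.conf behaves like a dict inside Jinja templates.
    bash_command="echo {{ dag_run.conf.get('key', 'no value set') }}",
)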