DAGs (Directed Acyclic Graphs)

A Directed Acyclic Graph (DAG) is a collection of tasks with dependencies defined between them. Airflow represents every workflow as a DAG, and each DAG is defined in a Python script that Airflow parses to build it.

What is a DAG?

A DAG is the structured set of tasks that Airflow runs, organized to reflect the dependencies between them. The graph is directed because each dependency points one way: from an upstream task to a downstream task that runs after it. It is acyclic because following those dependencies can never lead back to the starting task. A task cannot depend on itself, either directly or through a chain of other tasks.
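To make "directed" and "acyclic" concrete, here is a minimal sketch that uses Python's standard-library graphlib (Python 3.9+) rather than Airflow. The toy model assumes each task ID maps to the set of task IDs it depends on. A topological sort then yields a valid run order, and a cycle makes the sort fail. Airflow performs its own cycle check and refuses to load a DAG that contains one; the dictionaries and the execution_order helper below are purely illustrative.

```python
from graphlib import CycleError, TopologicalSorter

# Toy model (not Airflow internals): task ID -> set of upstream task IDs.
valid_dag = {
    "say_hello": set(),
    "say_goodbye": {"say_hello"},  # say_goodbye runs after say_hello
}

# Not a DAG: "a" depends on "b" and "b" depends on "a".
cyclic_graph = {
    "a": {"b"},
    "b": {"a"},
}


def execution_order(graph: dict[str, set[str]]) -> list[str]:
    """Return one valid run order, or raise CycleError if the graph has a cycle."""
    return list(TopologicalSorter(graph).static_order())


print(execution_order(valid_dag))  # ['say_hello', 'say_goodbye']

try:
    execution_order(cyclic_graph)
except CycleError as exc:
    print("not a DAG:", exc.args[0])
```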

Key Components of a DAG

Defining a DAG

DAGs are defined in Python files. Here's a simple example:


```python
from __future__ import annotations

import pendulum

from airflow.models.dag import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="my_simple_dag",
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    schedule=None,
    catchup=False,
    tags=["example", "core-concepts"],
) as dag:
    # Define tasks
    task_1 = BashOperator(
        task_id="say_hello",
        bash_command="echo 'Hello, World!'",
    )

    task_2 = BashOperator(
        task_id="say_goodbye",
        bash_command="echo 'Goodbye, Airflow!'",
    )

    # Define dependencies
    task_1 >> task_2
```
            

Common DAG Arguments

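Argument | Description | Default
--- | --- | ---
dag_id | A unique identifier for the DAG. | Required
start_date | The date from which the DAG can be scheduled. For interval-based schedules, the first run covers the interval that begins at start_date and is triggered once that interval ends. | Required (can also be set through default_args)
schedule | When the DAG runs. Can be a cron expression, a preset such as "@daily", a timedelta object, or None for manually triggered runs only. | timedelta(days=1) in Airflow 2.x; None in Airflow 3
catchup | If True, Airflow schedules a DAG run for every interval between start_date and the current date that has not yet run. | True in Airflow 2.x (set by the [scheduler] catchup_by_default option); False in Airflow 3
tags | A list of tags used to organize and filter DAGs in the UI. | None (no tags)
default_args | A dictionary of default arguments applied to every task in the DAG; arguments passed directly to a task override them. | None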
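The interaction between start_date, a timedelta schedule, and catchup can be sketched with plain datetime arithmetic. This is a simplified model of Airflow 2's data-interval behavior for fixed timedelta schedules, not the scheduler's code: the function name intervals_to_schedule is hypothetical, and the model ignores cron expressions, end_date, and concurrency limits such as max_active_runs.

```python
from datetime import datetime, timedelta, timezone


def intervals_to_schedule(
    start_date: datetime,
    every: timedelta,
    now: datetime,
    catchup: bool,
) -> list[tuple[datetime, datetime]]:
    """Simplified model: which data intervals would get a DAG run by `now`.

    A run for an interval is created only after the interval has ended,
    so only intervals whose end is <= now are returned.
    """
    intervals = []
    interval_start = start_date
    while interval_start + every <= now:
        intervals.append((interval_start, interval_start + every))
        interval_start += every
    # With catchup disabled, only the most recent completed interval is kept.
    return intervals if catchup else intervals[-1:]


start = datetime(2023, 1, 1, tzinfo=timezone.utc)
now = datetime(2023, 1, 4, 12, tzinfo=timezone.utc)
daily = timedelta(days=1)

print(len(intervals_to_schedule(start, daily, now, catchup=True)))  # 3
print(intervals_to_schedule(start, daily, now, catchup=False))
```

With a daily schedule starting 2023-01-01 and the clock at noon on 2023-01-04, three intervals have fully elapsed. catchup=True schedules all three, while catchup=False schedules only the one that ends at 2023-01-04 00:00 UTC.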

Task Dependencies

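Dependencies define the execution order of tasks. Airflow supports several ways to declare them: the bitshift operators >> and << (as in task_1 >> task_2 in the example above), the set_downstream() and set_upstream() methods, and helper functions such as chain().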
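The bitshift syntax works through Python operator overloading on tasks. The sketch below uses a hypothetical, dependency-only ToyTask (not Airflow's BaseOperator) to reproduce the documented behavior: a >> b makes b downstream of a, a << b does the reverse, each operator returns its right-hand operand so expressions can be chained, and lists fan out or fan in. A list has no >> of its own, so [a, b] >> c falls back to the reflected method __rrshift__ on c.

```python
class ToyTask:
    """Hypothetical stand-in for an Airflow operator, tracking only dependencies."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.upstream: set[str] = set()
        self.downstream: set[str] = set()

    def set_downstream(self, others) -> None:
        # Accept a single task or a list of tasks.
        for other in others if isinstance(others, list) else [others]:
            self.downstream.add(other.task_id)
            other.upstream.add(self.task_id)

    def set_upstream(self, others) -> None:
        for other in others if isinstance(others, list) else [others]:
            other.set_downstream(self)

    def __rshift__(self, other):  # self >> other
        self.set_downstream(other)
        return other  # returning the right side lets "a >> b >> c" chain

    def __lshift__(self, other):  # self << other
        self.set_upstream(other)
        return other

    def __rrshift__(self, other):  # [a, b] >> self
        self.set_upstream(other)
        return self

    def __rlshift__(self, other):  # [a, b] << self
        self.set_downstream(other)
        return self


extract = ToyTask("extract")
clean = ToyTask("clean")
enrich = ToyTask("enrich")
load = ToyTask("load")

extract >> [clean, enrich] >> load  # fan out, then fan in
print(sorted(extract.downstream))  # ['clean', 'enrich']
print(sorted(load.upstream))  # ['clean', 'enrich']
```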

Tip

For complex dependency graphs, consider using the chain() helper. In Airflow 2.x, import it from airflow.models.baseoperator; the older airflow.utils.helpers.chain is deprecated.
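chain() links each of its arguments to the next one. Per Airflow's documentation, an argument can be a single task or a list of tasks; two adjacent lists must have the same length and are linked pairwise, while a task next to a list fans out or fans in. The sketch below models that behavior with a hypothetical Node class and a toy_chain function. It is not Airflow's code, and where Airflow raises its own exception for mismatched list lengths, this toy raises ValueError.

```python
class Node:
    """Hypothetical dependency-only task used to illustrate chain()."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.downstream: set[str] = set()


def toy_chain(*elements) -> None:
    """Link each element (a Node or a list of Nodes) to the next one."""
    for up, down in zip(elements, elements[1:]):
        if isinstance(up, list) and isinstance(down, list):
            if len(up) != len(down):
                raise ValueError("adjacent lists must have the same length")
            pairs = list(zip(up, down))  # pairwise: up[i] >> down[i]
        elif isinstance(up, list):
            pairs = [(u, down) for u in up]  # fan in
        elif isinstance(down, list):
            pairs = [(up, d) for d in down]  # fan out
        else:
            pairs = [(up, down)]
        for u, d in pairs:
            u.downstream.add(d.task_id)


t1, t2, t3, t4, t5, t6 = (Node(f"t{i}") for i in range(1, 7))
toy_chain(t1, [t2, t3], [t4, t5], t6)
print(sorted(t1.downstream))  # ['t2', 't3']
```

The call above is equivalent to t1 >> [t2, t3], t2 >> t4, t3 >> t5, and [t4, t5] >> t6, which is the kind of wiring that becomes tedious to write with bitshift operators alone.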

Task States

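Tasks and DAG runs move through various states during their lifecycle. Common task instance states include none (no state assigned yet), scheduled, queued, running, success, failed, skipped, upstream_failed, and up_for_retry. A DAG run is typically queued, running, success, or failed.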
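How a task's upstream states affect its own state depends on its trigger rule. The sketch below models only the default all_success rule as a standalone function: any failed or upstream_failed upstream makes the task upstream_failed, otherwise any skipped upstream makes it skipped, and the task becomes eligible to run only once every upstream has succeeded. The function name and the "scheduled" return value are illustrative simplifications; Airflow's scheduler tracks more cases, and other trigger rules (such as all_done or one_success) behave differently.

```python
from typing import Optional


def state_under_all_success(upstream_states: list[str]) -> Optional[str]:
    """Simplified model of the all_success trigger rule.

    Returns the state a waiting task would move to, or None if it keeps waiting.
    """
    if any(s in ("failed", "upstream_failed") for s in upstream_states):
        return "upstream_failed"  # a failure anywhere upstream blocks this task
    if "skipped" in upstream_states:
        return "skipped"  # skips propagate downstream under all_success
    if all(s == "success" for s in upstream_states):
        return "scheduled"  # every upstream succeeded, so the task may run
    return None  # no state yet: some upstream task is still unfinished


print(state_under_all_success(["success", "failed"]))  # upstream_failed
print(state_under_all_success(["success", "running"]))  # None
```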

The Airflow UI for DAGs

The Airflow Web UI provides a visual representation of your DAGs, their structure, and their current status. You can monitor task progress, view logs, and trigger DAG runs from the UI.

Note

DAG files are parsed at regular intervals (in Airflow 2, by the scheduler's DAG file processor by default). Make sure your DAG files are placed in the DAGs folder configured for your Airflow environment (the [core] dags_folder setting).
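Not every Python file in the DAGs folder is necessarily parsed. With the [core] dag_discovery_safe_mode option enabled (the default), Airflow skips files whose contents do not mention both "airflow" and "dag", case-insensitively. The sketch below approximates that heuristic for plain .py files; the helper names are hypothetical, and it ignores .airflowignore patterns and zipped DAG files, which Airflow also handles.

```python
import tempfile
from pathlib import Path


def might_contain_dag(path: Path) -> bool:
    """Approximation of Airflow's default safe-mode check on a file's contents."""
    content = path.read_bytes().lower()
    return b"airflow" in content and b"dag" in content


def candidate_dag_files(dags_folder: Path) -> list[str]:
    # Only .py files are considered; .airflowignore and zip files are not handled.
    return sorted(p.name for p in dags_folder.rglob("*.py") if might_contain_dag(p))


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "my_dag.py").write_text("from airflow.models.dag import DAG\n")
    (folder / "helpers.py").write_text("def add(a, b):\n    return a + b\n")
    found = candidate_dag_files(folder)

print(found)  # ['my_dag.py']
```

In practice this means a DAG file that builds its DAG indirectly (for example, through a factory imported from another module) may need to mention both words, or safe mode must be disabled, for it to be picked up.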