
Airflow Variables

Last Updated: 2023-10-27

Introduction

Airflow Variables allow you to store and retrieve arbitrary key-value pairs, making it easy to manage configuration settings, resource identifiers such as bucket names or API endpoints, or any other dynamic data that your DAGs might need. They are a fundamental component for making your Airflow deployments flexible and configurable without hardcoding sensitive or environment-specific information directly into your DAG files.

By default, Variables are stored in the Airflow metadata database. You can manage them through the Airflow UI, the `airflow variables` CLI commands, the REST API, or programmatically with the `Variable` class in Python. This separation of configuration from code is a key principle for robust data pipeline management.
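For bulk changes, the `airflow variables import` CLI command loads Variables from a JSON file. The sketch below builds such a file's contents with the standard library. It assumes the file is a single JSON object mapping keys to values (the same layout `airflow variables export` writes); `build_variables_json` and the example keys are hypothetical.

```python
import json


def build_variables_json(variables: dict) -> str:
    """Serialize a flat {key: value} mapping for `airflow variables import`.

    Hypothetical helper: it assumes the import file is one JSON object whose
    keys are Variable keys. Nested dicts stay JSON objects in the file.
    """
    for key in variables:
        if not isinstance(key, str) or not key:
            raise ValueError(f"Variable keys must be non-empty strings: {key!r}")
    return json.dumps(variables, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical keys and values, for illustration only.
    payload = build_variables_json(
        {
            "s3_bucket_name": "my-example-bucket",
            "etl_process_version": "1.4",
            "etl_config": {"batch_size": 500, "dry_run": False},
        }
    )
    print(payload)
    # Save the output as variables.json, then run:
    #   airflow variables import variables.json
```

Keeping the file under version control gives you a reviewable record of configuration changes, even though the values themselves live in Airflow.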

Variables in the UI

The Airflow UI provides a dedicated section for managing Variables. Navigate to Admin > Variables to access this interface.

Here you can:

  • View existing variables: A table lists all currently defined variables, showing each variable's key, value, description, and whether its value is encrypted.
  • Add new variables: Click the '+ Add Row' button to create a new variable. You'll need to provide a unique Key and a corresponding Value.
  • Edit variables: Click the edit icon next to a variable to modify its value.
  • Delete variables: Click the delete icon to remove a variable. Be cautious, as this action is irreversible and will affect any DAGs that rely on it.

For sensitive information like API keys or passwords, it is highly recommended to use Airflow Connections instead of Variables: Connections store credentials in dedicated fields (login, password, extra) and mask the password in the UI and task logs. For non-sensitive configuration, Variables are a good fit.

Loading Variables in a DAG

You can easily retrieve the value of a variable within your DAG code using the `Variable` class from the `airflow.models` module.


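```python
from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.operators.python import PythonOperator


def use_variables():
    # Call Variable.get() inside the task callable, not at the top level of the
    # DAG file: top-level code runs every time the scheduler parses the file,
    # so a call there would query the metadata database on every parse.
    my_api_key = Variable.get("my_api_key")
    # Falls back to "default_value" if the key does not exist.
    my_config_value = Variable.get("custom_config_setting", default_var="default_value")
    # Avoid writing credentials to task logs; report only whether one was found.
    print(f"API key loaded: {bool(my_api_key)}")
    print(f"Config: {my_config_value}")


with DAG(
    dag_id='variable_example',
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    print_variable_task = PythonOperator(
        task_id='print_variable',
        python_callable=use_variables,
    )
```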

The `Variable.get(key, default_var=..., deserialize_json=False)` method fetches the value stored under `key`. If the variable does not exist and `default_var` is provided, that default is returned; otherwise Airflow raises a `KeyError`. Passing `deserialize_json=True` parses the stored value as JSON before returning it.
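The default-or-raise behavior can be modeled with a sentinel object, which lets a function tell "no default given" apart from an explicit `default_var=None`. The sketch below is a dict-backed stand-in, not Airflow's implementation; `InMemoryVariables` is a hypothetical name.

```python
import json
from typing import Any

_NO_DEFAULT = object()  # sentinel: distinguishes "no default" from default_var=None


class InMemoryVariables:
    """Hypothetical dict-backed stand-in that mimics Variable.get semantics."""

    def __init__(self, store: dict[str, str]):
        self._store = dict(store)

    def get(self, key: str, default_var: Any = _NO_DEFAULT, deserialize_json: bool = False) -> Any:
        if key not in self._store:
            if default_var is _NO_DEFAULT:
                # Mirrors Airflow raising KeyError for a missing key with no default.
                raise KeyError(f"Variable {key} does not exist")
            return default_var
        value = self._store[key]
        return json.loads(value) if deserialize_json else value


variables = InMemoryVariables({"custom_config_setting": "prod", "limits": '{"max_rows": 1000}'})
print(variables.get("custom_config_setting"))                 # prod
print(variables.get("missing", default_var="default_value"))  # default_value
print(variables.get("missing", default_var=None))             # None
print(variables.get("limits", deserialize_json=True))         # {'max_rows': 1000}
```

Note that an explicit `default_var=None` returns `None` rather than raising, which is why a sentinel is used instead of `None` as the "no default" marker.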

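For values that should not appear in plain text, note that Airflow has no per-variable "Secret" switch in the UI. Instead, it masks a variable's value automatically when its key contains a sensitive word such as `password`, `secret`, `api_key`, or `access_token`: the value is hidden in the UI and redacted in task logs, provided `[core] hide_sensitive_var_conn_fields` is enabled (the default). You can extend the word list with `[core] sensitive_var_conn_names`. Separately, values are encrypted at rest in the metadata database when a Fernet key is configured.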
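Masking is driven by the variable's key, not its value. The sketch below is a rough model of that name check as a case-insensitive substring test. The word list is an assumed subset of Airflow's defaults (the full list varies by version), and `is_masked_key` with its `extra_names` parameter (standing in for `sensitive_var_conn_names`) are hypothetical.

```python
# Assumed subset of Airflow's default sensitive words; check your version.
DEFAULT_SENSITIVE_WORDS = frozenset(
    {"access_token", "api_key", "apikey", "authorization", "passphrase",
     "passwd", "password", "private_key", "secret"}
)


def is_masked_key(key: str, extra_names: tuple[str, ...] = ()) -> bool:
    """Return True if a variable with this key would be masked (hypothetical model).

    Matching is a case-insensitive substring test on the key name.
    """
    name = key.strip().lower()
    words = DEFAULT_SENSITIVE_WORDS | {w.strip().lower() for w in extra_names}
    return any(word in name for word in words)


for key in ["my_api_key", "DB_PASSWORD", "s3_bucket_name", "vendor_license"]:
    print(key, is_masked_key(key))
# An extra configured word ("license" here) makes a custom key masked too.
print("vendor_license", is_masked_key("vendor_license", extra_names=("license",)))
```

The practical consequence is that renaming `vendor_license` to something like `vendor_license_secret` is enough to get it masked without changing any configuration.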

Referencing Variables in Templates

Airflow's templating engine allows you to directly reference variables within operator parameters that accept Jinja templating. This is a very common and convenient way to inject configuration into your tasks.

You can access variables using the {{ var.value.your_variable_key }} syntax.


```python
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime

def print_message(**context):
    message = context['templates_dict']['dynamic_message']
    print(f"The message is: {message}")

with DAG(
    dag_id='template_variable_example',
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False
) as dag:
    templated_task = PythonOperator(
        task_id='templated_message',
        python_callable=print_message,
        templates_dict={
            'dynamic_message': 'Hello from {{ var.value.greeting_message }}!'
        }
    )
```

In this example, if you have a variable with the key greeting_message set to "Airflow User", the rendered template will become "Hello from Airflow User!".

You can also use {{ var.json.your_variable_key }} if your variable's value is a JSON string; Airflow deserializes it into a Python object (typically a dict or list), so you can index into it, e.g. {{ var.json.your_variable_key.some_field }}.
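A JSON variable lets one key hold structured settings. The sketch below shows the equivalent parsing with the standard `json` module, with the matching template expression in a comment. The key `etl_config` and its fields are hypothetical.

```python
import json

# Hypothetical value stored under the Variable key "etl_config".
raw_value = '{"target": {"bucket": "my-example-bucket", "prefix": "daily/"}, "batch_size": 500}'

# Equivalent of {{ var.json.etl_config }} or Variable.get("etl_config", deserialize_json=True).
config = json.loads(raw_value)

bucket = config["target"]["bucket"]         # template: {{ var.json.etl_config.target.bucket }}
batch_size = config.get("batch_size", 100)  # fall back if the field is absent
print(f"s3://{bucket}/{config['target']['prefix']} in batches of {batch_size}")
```

Grouping related settings this way also reduces the number of separate lookups a task performs.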

Variable Scopes

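Airflow variables are global and not tied to a specific DAG or task. When you call Variable.get('my_key'), Airflow searches for the key in this order: any configured secrets backend, then environment variables named AIRFLOW_VAR_<KEY> (for example AIRFLOW_VAR_MY_KEY), then the metadata database. The first match wins.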
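Airflow searches a configured secrets backend first, then `AIRFLOW_VAR_<KEY>` environment variables (key upper-cased), then the metadata database. The sketch below models that precedence with plain dicts so the shadowing is easy to see. `resolve_variable` and its parameters are hypothetical; real backends are classes, not dicts.

```python
import os
from typing import Mapping, Optional


def resolve_variable(
    key: str,
    secrets_backend: Mapping[str, str],
    metastore: Mapping[str, str],
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical model of the Variable lookup order; the first hit wins."""
    env = os.environ if environ is None else environ
    # 1. A configured secrets backend (modeled here as a dict).
    if key in secrets_backend:
        return secrets_backend[key]
    # 2. Environment variables: AIRFLOW_VAR_ plus the upper-cased key.
    env_name = f"AIRFLOW_VAR_{key.upper()}"
    if env_name in env:
        return env[env_name]
    # 3. The metadata database (modeled as a dict).
    if key in metastore:
        return metastore[key]
    raise KeyError(f"Variable {key} does not exist")


env = {"AIRFLOW_VAR_S3_BUCKET_NAME": "env-bucket"}
db = {"s3_bucket_name": "db-bucket", "timeout_seconds": "30"}
print(resolve_variable("s3_bucket_name", {}, db, env))   # env-bucket (env beats the database)
print(resolve_variable("timeout_seconds", {}, db, env))  # 30
```

One consequence worth knowing: editing a variable in the UI has no visible effect if an environment variable or secrets backend supplies the same key.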

There isn't a formal scoping mechanism like DAG-specific variables built into the core Variable feature itself. If you need values that are specific to a particular DAG or environment, common patterns include:

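  • Using Prefixes: Name your variables with a prefix that indicates their purpose or the DAG they belong to (e.g., my_dag_a.api_endpoint, my_dag_b.timeout_seconds). In templates, read keys that contain dots with {{ var.value.get('my_dag_a.api_endpoint') }}, because the dotted attribute syntax would treat each dot as a separate lookup.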
  • Configuration Files: Store configuration in external files (e.g., JSON, YAML) and load those files using a single variable that points to the file path or contains the file content.
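  • Environment Variables: Set an operating-system environment variable named AIRFLOW_VAR_<KEY> (for example AIRFLOW_VAR_S3_BUCKET_NAME) in each deployment. Airflow reads it as the Variable s3_bucket_name, so each environment can supply its own value without touching the database. Such variables do not appear in the UI.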
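The prefix convention is easiest to keep consistent with a small helper. The sketch below builds dotted keys and picks one DAG's settings out of a key-to-value mapping, such as the output of `airflow variables export`. `dag_var_key`, `settings_for_dag`, and the example keys are hypothetical.

```python
from typing import Mapping

SEPARATOR = "."  # matches the example keys in the list above, e.g. "my_dag_a.api_endpoint"


def dag_var_key(dag_id: str, name: str) -> str:
    """Build a prefixed Variable key for one DAG (hypothetical convention)."""
    if SEPARATOR in dag_id:
        raise ValueError(f"dag_id must not contain {SEPARATOR!r}: {dag_id}")
    return f"{dag_id}{SEPARATOR}{name}"


def settings_for_dag(dag_id: str, variables: Mapping[str, str]) -> dict[str, str]:
    """Return the variables that belong to dag_id, with the prefix stripped."""
    prefix = dag_id + SEPARATOR
    return {k[len(prefix):]: v for k, v in variables.items() if k.startswith(prefix)}


exported = {
    "my_dag_a.api_endpoint": "https://api.example.com/v1",
    "my_dag_a.retries": "3",
    "my_dag_b.timeout_seconds": "60",
    "s3_bucket_name": "shared-bucket",
}
print(dag_var_key("my_dag_a", "api_endpoint"))  # my_dag_a.api_endpoint
print(settings_for_dag("my_dag_a", exported))    # {'api_endpoint': 'https://api.example.com/v1', 'retries': '3'}
```

Rejecting a separator inside the DAG id keeps the prefix unambiguous, so one DAG's settings never leak into another's.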

Best Practices

  • Use Variables for Configuration, Not Data: Variables are best suited for settings, flags, and non-sensitive configuration parameters. Avoid storing large amounts of data or runtime information in variables.
  • Prefer Connections for Secrets: For passwords, API keys, and other credentials, use Airflow Connections, which keep credentials in dedicated fields that are masked in the UI and logs. Both Connections and Variables can also be served from an external secrets backend (like HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager).
  • Keep Keys Descriptive: Use clear and descriptive keys for your variables (e.g., s3_bucket_name, etl_process_version) to make them easily understandable.
  • Leverage Default Values: Use the default_var parameter in Variable.get() to provide sensible fallback values, making your DAGs more resilient to missing configurations.
  • Name Sensitive Variables So They Are Masked: If a variable holds something that should not be exposed in plain text (even if it is not a full credential), include a word such as secret or password in its key so that Airflow masks the value in the UI and task logs.
  • Consider JSON Variables: For structured configuration, store a JSON string as the variable's value and read it with var.json in templates or Variable.get(key, deserialize_json=True) in Python code.
  • Audit and Clean Up: Regularly review your variables to ensure they are still needed and that sensitive information is handled appropriately.
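A periodic audit can start from an export of all variables (for example via `airflow variables export`) and the text of your DAG files. The sketch below flags keys that no DAG file mentions and values large enough to suggest data rather than configuration. It is a hypothetical heuristic: `audit_variables` and the 4096-byte threshold are assumptions, and plain substring matching will report keys assembled at runtime (for example from a prefix) as unused.

```python
from typing import Iterable, Mapping


def audit_variables(
    variables: Mapping[str, str],
    dag_sources: Iterable[str],
    max_value_bytes: int = 4096,  # hypothetical threshold, tune for your deployment
) -> dict[str, list[str]]:
    """Flag unreferenced and oversized variables (hypothetical heuristic).

    A key counts as referenced if it appears verbatim in any DAG source text,
    so keys assembled at runtime are reported as unused.
    """
    sources = list(dag_sources)
    unused = sorted(k for k in variables if not any(k in src for src in sources))
    oversized = sorted(
        k for k, v in variables.items() if len(v.encode("utf-8")) > max_value_bytes
    )
    return {"unused": unused, "oversized": oversized}


report = audit_variables(
    {"s3_bucket_name": "bucket", "old_flag": "true", "big_blob": "x" * 10_000},
    ['bucket = Variable.get("s3_bucket_name")', "cmd = '{{ var.value.big_blob }}'"],
)
print(report)  # {'unused': ['old_flag'], 'oversized': ['big_blob']}
```

Treat the "unused" list as candidates for review rather than deletion, since a DAG outside the scanned files may still depend on a key.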