Last Updated: 2023-10-27
Airflow Variables allow you to store and retrieve arbitrary key-value pairs, making it easy to manage configuration settings, external resource credentials, or any other dynamic data that your DAGs might need. They are a fundamental component for making your Airflow deployments flexible and configurable without hardcoding sensitive or environment-specific information directly into your DAG files.
Variables are stored in the Airflow metadata database and can be managed through the Airflow UI or programmatically via the Airflow CLI or Python API. This separation of configuration from code is a key principle for robust data pipeline management.
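As a rough sketch of the programmatic route, the snippet below writes a JSON file that the `airflow variables import` CLI command can load. The helper name `write_variables_file` and the file name `variables.json` are assumptions made for illustration; the keys are the ones used elsewhere on this page.

```python
import json
from pathlib import Path


def write_variables_file(variables: dict, path: str) -> Path:
    """Write a key/value mapping as a JSON object, one entry per variable."""
    target = Path(path)
    target.write_text(
        json.dumps(variables, indent=2, sort_keys=True),
        encoding="utf-8",
    )
    return target


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    write_variables_file(
        {
            "greeting_message": "Airflow User",
            "custom_config_setting": "default_value",
        },
        "variables.json",
    )
```

The resulting file could then be loaded with `airflow variables import variables.json`. Single values can also be managed with `airflow variables set <key> <value>` and `airflow variables get <key>`.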
The Airflow UI provides a dedicated section for managing Variables. Navigate to Admin > Variables to access this interface.
Here you can:
- Click the '+ Add Row' button to create a new variable. You'll need to provide a unique Key and a corresponding Value.
For sensitive information like API keys or passwords, prefer Airflow Connections over Variables: Connections are designed for credentials and often integrate with secrets management backends. For non-sensitive configuration, Variables work well.
You can easily retrieve the value of a variable within your DAG code using the `Variable` class from the `airflow.models` module.
from airflow import DAG
from airflow.models import Variable
from airflow.operators.bash import BashOperator
from datetime import datetime

with DAG(
    dag_id='variable_example',
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    # Retrieve a variable's value. Calls at this level run on every DAG parse.
    my_api_key = Variable.get("my_api_key")
    # Fall back to a default when the variable is not defined.
    my_config_value = Variable.get("custom_config_setting", default_var="default_value")

    # Illustration only: avoid echoing real secrets in production DAGs.
    print_variable_task = BashOperator(
        task_id='print_variable',
        bash_command=f'echo "API Key: {my_api_key}" && echo "Config: {my_config_value}"',
    )
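The `Variable.get(key, default_var=...)` method fetches the value associated with the given key. If the variable does not exist and a `default_var` is provided, that default value is returned. If no `default_var` is passed at all, a `KeyError` is raised. Note that passing `default_var=None` explicitly counts as providing a default, so `None` is returned instead of an exception.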
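The fallback behavior can be wrapped in a small helper that is called from inside a task rather than at the top of the DAG file. This is a minimal sketch, not part of Airflow: `read_variable` and its `getter` parameter are hypothetical names, and the `getter` hook exists only so the helper can be exercised without an Airflow installation.

```python
from typing import Any, Callable, Optional

# Sentinel so that None can still be passed as a genuine default value.
_MISSING = object()


def read_variable(
    key: str,
    default: Any = _MISSING,
    getter: Optional[Callable[..., Any]] = None,
) -> Any:
    """Return the variable's value, using `default` only if one was supplied."""
    if getter is None:
        # Imported lazily so this module loads even where Airflow is absent.
        from airflow.models import Variable

        getter = Variable.get
    if default is _MISSING:
        # No default supplied: a missing key propagates as KeyError.
        return getter(key)
    return getter(key, default_var=default)
```

Inside a task, `read_variable("custom_config_setting", default="default_value")` would behave like the `Variable.get` call in the example above, while `read_variable("my_api_key")` would fail loudly if the variable has not been defined.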
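For security, keep in mind that Airflow has no per-variable "Secret" switch in the UI. Instead, it masks a variable's value in the UI and in task logs when the variable's key contains a sensitive keyword such as `password`, `secret`, `api_key`, or `token`. This behavior is controlled by the `hide_sensitive_var_conn_fields` setting, and the keyword list can be extended with `sensitive_var_conn_names`. Masking is a best-effort safeguard, so avoid printing sensitive values in the first place.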
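Airflow's built-in masking is driven by the variable's key name, so it can help to check whether a proposed key would be treated as sensitive. The sketch below approximates that check with a substring match; the keyword tuple is an illustrative subset rather than Airflow's authoritative list, and `looks_sensitive` is a hypothetical helper.

```python
from typing import Iterable

# Illustrative subset only; the authoritative defaults live in Airflow itself.
SENSITIVE_KEYWORDS = (
    "password",
    "passwd",
    "secret",
    "api_key",
    "apikey",
    "token",
    "private_key",
)


def looks_sensitive(key: str, extra_keywords: Iterable[str] = ()) -> bool:
    """Return True if the key name contains any sensitive keyword."""
    name = key.strip().lower()
    keywords = (*SENSITIVE_KEYWORDS, *(word.lower() for word in extra_keywords))
    return any(word in name for word in keywords)
```

Under these assumptions, a key such as `my_api_key` would be flagged, while `custom_config_setting` would not.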
Airflow's templating engine allows you to directly reference variables within operator parameters that accept Jinja templating. This is a very common and convenient way to inject configuration into your tasks.
You can access variables using the `{{ var.value.your_variable_key }}` syntax.
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime


def print_message(**context):
    message = context['templates_dict']['dynamic_message']
    print(f"The message is: {message}")


with DAG(
    dag_id='template_variable_example',
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    templated_task = PythonOperator(
        task_id='templated_message',
        python_callable=print_message,
        templates_dict={
            'dynamic_message': 'Hello from {{ var.value.greeting_message }}!'
        },
    )
In this example, if you have a variable with the key `greeting_message` set to "Airflow User", the rendered template will become "Hello from Airflow User!". You can also use `{{ var.json.your_variable_key }}` if your variable's value is a JSON string, which will deserialize it into a Python dictionary or list.
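To make the `var.json` behavior concrete, the sketch below mimics what a template such as `{{ var.json.etl_config.bucket }}` resolves to: the stored string is parsed as JSON and then walked key by key. The variable name `etl_config`, its contents, and the helper `resolve_json_path` are assumptions for illustration; Airflow performs the equivalent steps internally during template rendering.

```python
import json
from typing import Any


def resolve_json_path(raw_value: str, *path: Any) -> Any:
    """Parse a JSON string and follow a sequence of keys or list indexes."""
    node = json.loads(raw_value)
    for step in path:
        node = node[step]
    return node


if __name__ == "__main__":
    # Hypothetical stored value of a variable named "etl_config".
    stored = '{"bucket": "example-bucket", "tables": ["orders", "users"]}'
    print(resolve_json_path(stored, "bucket"))
    print(resolve_json_path(stored, "tables", 1))
```

In Python code, `Variable.get("etl_config", deserialize_json=True)` returns the parsed object directly, so the manual `json.loads` step is only needed when you fetch the raw string.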
Airflow variables are global and not tied to a specific DAG or task by default. When you call `Variable.get('my_key')`, Airflow looks for this key across all defined variables in the metadata database.
There isn't a formal scoping mechanism like DAG-specific variables built into the core Variable feature itself. If you need values that are specific to a particular DAG or environment, common patterns include:
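- Prefix keys with the DAG or environment name (e.g., `my_dag_a.api_endpoint`, `my_dag_b.timeout_seconds`).
- Use descriptive key names (e.g., `s3_bucket_name`, `etl_process_version`) to make them easily understandable.
- Use the `default_var` parameter in `Variable.get()` to provide sensible fallback values, making your DAGs more resilient to missing configurations.
- Group related settings into a single JSON value and read it with `var.json` or by parsing the string in your Python code.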
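The key-prefix and JSON-grouping patterns above can be sketched with two small helpers. Both `scoped_key` and `bundle_settings` are hypothetical names, and the dot separator simply follows the `my_dag_a.api_endpoint` style used in the examples; nothing in Airflow enforces it.

```python
import json
from typing import Any, Mapping


def scoped_key(dag_id: str, name: str) -> str:
    """Build a DAG-prefixed variable key such as 'my_dag_a.api_endpoint'."""
    return f"{dag_id}.{name}"


def bundle_settings(settings: Mapping[str, Any]) -> str:
    """Serialize related settings into one JSON string for a single variable."""
    return json.dumps(settings, sort_keys=True)


if __name__ == "__main__":
    key = scoped_key("my_dag_b", "settings")
    # Hypothetical settings, for illustration only.
    value = bundle_settings({"timeout_seconds": 30, "api_endpoint": "https://example.com"})
    print(key, value)
```

A value built this way could be stored under the scoped key and read back with `Variable.get(key, deserialize_json=True)`. Note that a key containing dots cannot be reached through plain attribute access in a template, so `{{ var.value.get('my_dag_a.api_endpoint') }}` is the safer form there.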