Configuration Reference
This document details the configuration options available for Apache Airflow. Configuration is managed through the airflow.cfg file, through environment variables, or a combination of the two, allowing Airflow to be adapted to a wide range of deployment needs.
Core Configuration Options
The [core] section in airflow.cfg controls the fundamental behavior of Airflow.
airflow_home
The root directory for Airflow. This is where Airflow looks for configuration files, DAGs, logs, and plugins. Defaults to ~/airflow.
[core]
airflow_home = /path/to/your/airflow/home
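In practice this value is usually supplied through the AIRFLOW_HOME environment variable rather than the config file, since Airflow needs it before it can even locate airflow.cfg. A typical shell setup:
export AIRFLOW_HOME=/path/to/your/airflow/home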
dags_folder
The directory where Airflow looks for DAGs. This path must be absolute. Defaults to $AIRFLOW_HOME/dags.
[core]
dags_folder = /path/to/your/dags
executor
Specifies the executor to use. Common options include SequentialExecutor, LocalExecutor, CeleryExecutor, and KubernetesExecutor. Defaults to SequentialExecutor.
[core]
executor = LocalExecutor
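Like any option, the executor can also be set with an environment variable, which is convenient in containerized deployments (see the configuration hierarchy section below):
export AIRFLOW__CORE__EXECUTOR=LocalExecutor
Note that executors other than SequentialExecutor require a metadata database that supports parallel access, such as PostgreSQL or MySQL, rather than the default SQLite database.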
sql_alchemy_conn
The database connection string for Airflow's metadata database, in SQLAlchemy connection string format. Defaults to a SQLite database stored in airflow_home.
[core]
sql_alchemy_conn = postgresql+psycopg2://user:password@host:port/database
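Because this string embeds credentials, it is often supplied through an environment variable instead of being committed to airflow.cfg. A sketch, assuming a local PostgreSQL database named airflow (host, user, and password are illustrative):
export AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@localhost:5432/airflow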
Webserver Configuration
The [webserver] section manages settings for the Airflow web UI.
base_url
The public URL of the Airflow webserver. This is important for features like email notifications to link back to the UI correctly.
[webserver]
base_url = http://localhost:8080
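When Airflow runs behind a reverse proxy or under TLS, base_url should reflect the address users actually reach; the hostname below is illustrative:
[webserver]
base_url = https://airflow.example.com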
port
The port on which the webserver will listen. Defaults to 8080.
[webserver]
port = 8080
workers
The number of Gunicorn worker processes the webserver runs to handle requests. Defaults to 4.
[webserver]
workers = 4
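The same value can be passed on the command line when starting the webserver; in recent Airflow versions the --workers flag overrides the config file for that invocation:
airflow webserver --workers 4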
Scheduler Configuration
The [scheduler] section configures the Airflow scheduler daemon.
dag_dir_list_interval
The interval in seconds at which the scheduler scans the dags_folder for new or updated DAG files. Defaults to 300.
[scheduler]
dag_dir_list_interval = 120
min_file_process_interval
The minimum interval in seconds before a DAG file is re-parsed. Raising this value reduces CPU usage when many DAGs are present; lowering it (as in the example below) makes changes to DAG files take effect sooner. Defaults to 30.
[scheduler]
min_file_process_interval = 5
Configuration Hierarchy and Environment Variables
Airflow supports a configuration hierarchy, allowing you to override settings from airflow.cfg with environment variables. Environment variables take precedence over airflow.cfg. The variable name follows the pattern AIRFLOW__SECTION__KEY, with double underscores separating the parts.
For example, to set base_url through an environment variable:
export AIRFLOW__WEBSERVER__BASE_URL=http://my-airflow.example.com
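To confirm which value Airflow actually resolves after the hierarchy is applied, newer Airflow releases (2.1 and later, if memory serves) include a config CLI:
airflow config get-value webserver base_url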
Logging Configuration
The [logging] section dictates how Airflow logs events.
remote_logging
Set to True to enable remote logging to services like S3, GCS, or Azure Blob Storage.
[logging]
remote_logging = False
remote_log_conn_id
The Airflow connection ID for the remote logging service.
[logging]
remote_log_conn_id = aws_default
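Putting the remote logging options together, a minimal S3 setup might look like the following; the bucket name is illustrative, and the aws_default connection must already hold valid credentials:
[logging]
remote_logging = True
remote_base_log_folder = s3://my-airflow-logs
remote_log_conn_id = aws_default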
Example airflow.cfg
Here's a snippet of a typical airflow.cfg file:
[core]
airflow_home = /usr/local/airflow
dags_folder = /usr/local/airflow/dags
executor = LocalExecutor
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@postgres/airflow
[webserver]
base_url = http://localhost:8080
port = 8080
[scheduler]
dag_dir_list_interval = 120
min_file_process_interval = 5
[logging]
remote_logging = False
log_filename_template = {{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log
base_log_folder = /var/log/airflow
remote_base_log_folder = s3://my-airflow-logs
remote_log_conn_id = aws_default
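The same settings can be expressed as environment variables following the AIRFLOW__SECTION__KEY pattern described above, which is the usual approach in Docker or Kubernetes deployments:
export AIRFLOW__CORE__EXECUTOR=LocalExecutor
export AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@postgres/airflow
export AIRFLOW__WEBSERVER__BASE_URL=http://localhost:8080
export AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL=120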