Configuration Reference

Last updated: 2023-10-27 | Stable Release

This document details the configuration options available for Apache Airflow. Configuration is managed through the airflow.cfg file, through environment variables, or a combination of both, giving you the flexibility to adapt Airflow to different deployment needs.

Core Configuration Options

The [core] section in airflow.cfg controls the fundamental behavior of Airflow.

airflow_home

The root directory for Airflow. This is where Airflow looks for configuration files, DAGs, logs, and plugins. Defaults to ~/airflow.

[core]
airflow_home = /path/to/your/airflow/home
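
Because airflow.cfg itself lives under this directory, the value is usually supplied through the AIRFLOW_HOME environment variable rather than the config file. A minimal sketch, using a hypothetical installation path:

export AIRFLOW_HOME=/opt/airflow    # hypothetical path; adjust to your installation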

dags_folder

The directory where Airflow looks for DAGs. This should be an absolute path. Defaults to $AIRFLOW_HOME/dags.

[core]
dags_folder = /path/to/your/dags

executor

Specifies the executor to use. Common options include SequentialExecutor, LocalExecutor, CeleryExecutor, and KubernetesExecutor. Defaults to SequentialExecutor.

[core]
executor = LocalExecutor
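
To confirm which executor is actually in effect, recent Airflow releases can print the resolved value from the CLI. A quick check, assuming the airflow command is on your PATH:

airflow config get-value core executor    # prints e.g. LocalExecutor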

sql_alchemy_conn

The database connection string for Airflow's metadata database, in SQLAlchemy connection string format. Defaults to a SQLite database stored in airflow_home, which is suitable only for development with the SequentialExecutor.

[core]
sql_alchemy_conn = postgresql+psycopg2://user:password@host:port/database
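
Other SQLAlchemy backends follow the same URI scheme. The lines below are hypothetical alternatives (credentials, hosts, and paths are placeholders); only one sql_alchemy_conn value is used at a time:

# MySQL via the mysqldb driver
sql_alchemy_conn = mysql+mysqldb://airflow:airflow@db.example.com:3306/airflow
# SQLite file database (supports only the SequentialExecutor)
sql_alchemy_conn = sqlite:////usr/local/airflow/airflow.db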

Webserver Configuration

The [webserver] section manages settings for the Airflow web UI.

base_url

The public URL of the Airflow webserver. This is important for features like email notifications to link back to the UI correctly.

[webserver]
base_url = http://localhost:8080
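
When the webserver sits behind a reverse proxy or is served over HTTPS, base_url should reflect the address users actually reach. A hypothetical example for a proxied deployment (the hostname is a placeholder, and enable_proxy_fix is assumed to be available in your Airflow version):

[webserver]
base_url = https://airflow.example.com
enable_proxy_fix = True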

port

The port on which the webserver will listen. Defaults to 8080.

[webserver]
port = 8080

workers

The number of Gunicorn worker processes the webserver runs to handle requests. Defaults to 4.

[webserver]
workers = 4

Scheduler Configuration

The [scheduler] section configures the Airflow scheduler daemon.

dag_dir_list_interval

The interval, in seconds, at which the scheduler scans the dags_folder for new or updated DAG files. Defaults to 300.

[scheduler]
dag_dir_list_interval = 120

min_file_process_interval

The minimum interval in seconds between re-parses of the same DAG file. This helps prevent excessive CPU usage when many DAGs are present. Defaults to 30.

[scheduler]
min_file_process_interval = 5

Configuration Hierarchy and Environment Variables

Airflow supports a configuration hierarchy, allowing you to override settings from airflow.cfg with environment variables. Environment variables take precedence over values in airflow.cfg. The variable name format is AIRFLOW__SECTION__KEY, with double underscores separating the parts and the section and key written in upper case.

Example: To set the base_url via environment variable, you would use:

export AIRFLOW__WEBSERVER__BASE_URL=http://my-airflow.example.com
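
To verify that the override took effect, recent Airflow releases can print the setting as the running process would see it; this assumes the airflow CLI runs in the same shell where the variable was exported:

airflow config get-value webserver base_url
# -> http://my-airflow.example.com (the environment variable wins over airflow.cfg)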

Logging Configuration

The [logging] section dictates how Airflow logs events.

remote_logging

Set to True to enable remote logging to services like S3, GCS, or Azure Blob Storage.

[logging]
remote_logging = False

remote_log_conn_id

The Airflow connection ID for the remote logging service.

[logging]
remote_log_conn_id = aws_default
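
Putting the logging options together, a sketch of an S3 setup might look like the following; the bucket name is a placeholder and the aws_default connection must already be defined in Airflow:

[logging]
remote_logging = True
remote_base_log_folder = s3://my-airflow-logs
remote_log_conn_id = aws_default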

Example airflow.cfg

Here's a snippet of a typical airflow.cfg file:

[core]
airflow_home = /usr/local/airflow
dags_folder = /usr/local/airflow/dags
executor = LocalExecutor
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@postgres/airflow

[webserver]
base_url = http://localhost:8080
port = 8080

[scheduler]
dag_dir_list_interval = 120
min_file_process_interval = 5

[logging]
remote_logging = False
base_log_folder = /var/log/airflow
log_filename_template = {{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log
remote_base_log_folder = s3://my-airflow-logs
remote_log_conn_id = aws_default
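
To inspect how Airflow has resolved every option from defaults, airflow.cfg, and environment variables combined, recent releases provide a listing command; this assumes the airflow CLI can see the same configuration as your deployment:

airflow config list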

Further Reading