Airflow Configuration
This document details the configuration options available for Apache Airflow. Airflow's behavior is controlled by a configuration file, which by default is named airflow.cfg. This file can be placed in several locations, with Airflow searching for it in the following order:
- The file path specified by the AIRFLOW_CONFIG environment variable.
- The current working directory (where the Airflow CLI is run).
- The home directory of the user running Airflow (~/airflow/airflow.cfg).
- The configuration directory specified by the AIRFLOW_HOME environment variable (e.g., $AIRFLOW_HOME/airflow.cfg).
- The default configuration directory installed with Airflow (typically /etc/airflow/airflow.cfg or similar).
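For example, a minimal shell sketch (the paths here are illustrative, not defaults) that pins both the home directory and the config file before invoking the CLI:

# Illustrative paths; adjust to your deployment.
export AIRFLOW_HOME=/opt/airflow
export AIRFLOW_CONFIG=/opt/airflow/airflow.cfg
airflow version   # any subsequent CLI command reads its configuration from these locations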
Many of these settings can also be overridden by environment variables, which are particularly useful in containerized environments. Airflow will look for environment variables in the format AIRFLOW__{SECTION}__{KEY}. For example, to set the parallelism option in the [core] section, you would use the environment variable AIRFLOW__CORE__PARALLELISM.
Configuration File Structure
The airflow.cfg file is organized into sections, each with a set of configuration keys. The most common sections include:
[core] Section
General core settings for Airflow.
- airflow_home: The directory where Airflow stores its configuration, logs, and DAGs. Defaults to ~/airflow.
- dags_folder: The directory where Airflow looks for DAG files. Defaults to $AIRFLOW_HOME/dags.
- executor: The executor to use. Common options include SequentialExecutor, LocalExecutor, CeleryExecutor, and KubernetesExecutor. Defaults to SequentialExecutor.
- parallelism: The maximum number of task instances that can run concurrently across all active DAGs. Defaults to 32.
- dag_concurrency: The maximum number of task instances allowed to run concurrently within a single DAG. Defaults to 16.
- max_active_runs_per_dag: The maximum number of active DAG runs allowed for a single DAG. Defaults to 16.
- max_threads: The maximum number of threads the scheduler uses when scheduling DAGs. Defaults to 2.
[webserver] Section
Settings related to the Airflow webserver.
- web_server_host: The hostname or IP address to bind the webserver to. Defaults to 0.0.0.0.
- web_server_port: The port number for the webserver. Defaults to 8080.
- secret_key: A secret key used for session management. It is highly recommended to set a strong, unique key in production.
- auth_manager: The authentication manager to use (e.g., airflow.providers.fab.auth_manager.fab_auth_manager.FabAuthManager).
[scheduler] Section
Settings for the Airflow scheduler.
- dag_dir_list_interval: The interval, in seconds, at which the scheduler scans the DAGs folder for new or updated DAG files. Defaults to 300.
- min_file_process_interval: The minimum interval, in seconds, between re-processing the same DAG file. Defaults to 30.
- num_runs_per_loop: The number of DAG runs the scheduler examines in each scheduling loop. Defaults to 10.
[database] Section
Database connection and configuration settings.
- sql_alchemy_conn: The SQLAlchemy connection string for the metadata database. Defaults to sqlite:///{AIRFLOW_HOME}/airflow.db.
[logging] Section
Logging configuration.
- base_log_folder: The directory where task logs are stored. Defaults to $AIRFLOW_HOME/logs.
- remote_logging: Whether to enable remote logging (e.g., to S3, GCS, or Azure Blob Storage). Defaults to False.
- remote_log_conn_id: The Airflow connection ID for the remote logging service.
- remote_base_log_folder: The base path within the remote storage where logs are written.
[operators] Section
Default settings for operators.
- default_executor: The default executor for operators if not otherwise specified.
Example airflow.cfg
[core]
airflow_home = /opt/airflow
dags_folder = /opt/airflow/dags
executor = LocalExecutor
parallelism = 64
dag_concurrency = 16
max_active_runs_per_dag = 16
[webserver]
web_server_host = 0.0.0.0
web_server_port = 8080
secret_key = MySuperSecretKey123
[scheduler]
dag_dir_list_interval = 60
min_file_process_interval = 30
[database]
sql_alchemy_conn = postgresql+psycopg2://user:password@host:port/dbname
[logging]
base_log_folder = /opt/airflow/logs
remote_logging = True
remote_log_conn_id = airflow_s3_logging
remote_base_log_folder = s3://my-airflow-logs
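After editing the file, it can be useful to confirm that Airflow can actually reach the configured metadata database. A quick check, assuming an Airflow 2.x CLI is installed in the current environment:

# Verifies that the database behind sql_alchemy_conn is reachable.
airflow db check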
Important Note on Security Keys
The secret_key in the [webserver] section is critical for session security. In production environments, never use the default or example key. Generate a strong, unique key and store it securely, ideally using environment variables or a secrets management system.
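One possible approach, shown as a sketch rather than a prescribed method, is to generate a random key with Python's standard secrets module and pass it to Airflow through the corresponding environment variable:

# Generate a random key and expose it via the environment,
# so it never needs to be written into airflow.cfg.
export AIRFLOW__WEBSERVER__SECRET_KEY="$(python -c 'import secrets; print(secrets.token_urlsafe(32))')"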
Environment Variable Overrides
For containerized deployments or CI/CD pipelines, using environment variables to configure Airflow is highly recommended. For example, to override the parallelism setting:
export AIRFLOW__CORE__PARALLELISM=100
This will set the parallelism to 100, overriding the value in airflow.cfg if it exists.
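To confirm which value Airflow actually resolves (config file, environment variable, or built-in default), the configuration CLI can be queried; a small sketch, assuming an Airflow 2.x installation:

# Prints the effective value of parallelism in the [core] section.
airflow config get-value core parallelism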
Common Configuration Tasks
- Changing the Executor
  To switch to the LocalExecutor, modify the [core] section:
  [core]
  executor = LocalExecutor
- Configuring Remote Logging
  To enable logging to Amazon S3, ensure you have the relevant provider installed (e.g., apache-airflow-providers-amazon) and configure the connection and logging settings:
  [logging]
  remote_logging = True
  remote_log_conn_id = airflow_s3_logging
  remote_base_log_folder = s3://your-bucket-name/airflow-logs
  You will also need to set up an Airflow connection with the ID airflow_s3_logging, providing your AWS credentials and bucket details (see the connection sketch after this list).
- Setting a Custom DAGs Folder
  If your DAGs are located in a different directory than the default:
  [core]
  dags_folder = /path/to/your/dags
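As one way to create the airflow_s3_logging connection referenced above, here is a hedged CLI sketch; the credentials and region are placeholders, and they could equally come from an instance role or a secrets backend:

# Create an AWS connection for remote logging; placeholder values only.
airflow connections add airflow_s3_logging \
    --conn-type aws \
    --conn-login "<AWS_ACCESS_KEY_ID>" \
    --conn-password "<AWS_SECRET_ACCESS_KEY>" \
    --conn-extra '{"region_name": "us-east-1"}'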
For a comprehensive list of all configuration options and their detailed descriptions, please refer to the Airflow Configuration Reference.