Apache Airflow

The platform to programmatically author, schedule, and monitor workflows.

Airflow Configuration

This document details the configuration options available for Apache Airflow. Airflow's behavior is controlled by a configuration file, named airflow.cfg by default. Airflow resolves the location of this file as follows:

  1. If the AIRFLOW_CONFIG environment variable is set, the file at that path is used.
  2. Otherwise, Airflow uses $AIRFLOW_HOME/airflow.cfg, where the AIRFLOW_HOME environment variable defaults to ~/airflow when unset.

If no configuration file exists at the resolved location, Airflow creates one with default values on first run.
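
To verify which configuration Airflow has actually resolved in a given environment, the config subcommands of the CLI can help. A quick check, assuming the airflow CLI is installed and on PATH:

# Print the full configuration as Airflow sees it (file values plus overrides)
airflow config list

# Print the effective value of a single option
airflow config get-value core dags_folder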

Many of these settings can also be overridden by environment variables, which are particularly useful in containerized environments. Airflow will look for environment variables in the format AIRFLOW__{SECTION}__{KEY}. For example, to set the parallelism option in the [core] section, you would use the environment variable AIRFLOW__CORE__PARALLELISM.
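
The same convention applies to any section and key: upper-case both names and join them with double underscores. For instance, the following overrides (with placeholder values) are typical in container deployments:

export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql+psycopg2://user:password@host:5432/dbname
export AIRFLOW__LOGGING__REMOTE_LOGGING=True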

Configuration File Structure

The airflow.cfg file is organized into sections, each with a set of configuration keys. The most common sections include:

[core] Section

General core settings for Airflow.

[webserver] Section

Settings related to the Airflow webserver.

[scheduler] Section

Settings for the Airflow scheduler.

[database] Section

Database connection and configuration settings.

[logging] Section

Logging configuration.

[operators] Section

Default settings for operators.

Example airflow.cfg

[core]
# note: the Airflow home directory is set via the AIRFLOW_HOME environment variable, not in airflow.cfg
dags_folder = /opt/airflow/dags
executor = LocalExecutor
parallelism = 64
max_active_tasks_per_dag = 16
max_active_runs_per_dag = 16

[webserver]
web_server_host = 0.0.0.0
web_server_port = 8080
secret_key = MySuperSecretKey123

[scheduler]
dag_dir_list_interval = 60
min_file_process_interval = 30

[database]
sql_alchemy_conn = postgresql+psycopg2://user:password@host:port/dbname

[logging]
base_log_folder = /opt/airflow/logs
remote_logging = True
remote_log_conn_id = airflow_s3_logging
remote_base_log_folder = s3://my-airflow-logs
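
The remote_log_conn_id value above must name an existing Airflow connection with access to the log bucket. A minimal sketch of creating one from the CLI, assuming the Amazon provider package (apache-airflow-providers-amazon) is installed and AWS credentials are supplied by the environment or an instance profile (the region is illustrative):

# Create the connection referenced by remote_log_conn_id
airflow connections add airflow_s3_logging \
    --conn-type aws \
    --conn-extra '{"region_name": "us-east-1"}'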

Important Note on Security Keys

The secret_key in the [webserver] section is used to sign session cookies and is critical for session security. In production environments, never use the default or example key. Generate a strong, unique key and store it securely, ideally via an environment variable or a secrets management system; if you run multiple webserver instances, they must all share the same key.
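
One common approach is to generate a random key once and inject it through the corresponding environment variable rather than committing it to airflow.cfg. A minimal sketch, assuming a POSIX shell with Python 3 available:

# Generate a 64-character hex key and pass it to Airflow via the env override
export AIRFLOW__WEBSERVER__SECRET_KEY="$(python3 -c 'import secrets; print(secrets.token_hex(32))')"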

Environment Variable Overrides

For containerized deployments or CI/CD pipelines, using environment variables to configure Airflow is highly recommended. For example, to override the parallelism setting:

export AIRFLOW__CORE__PARALLELISM=100

This sets parallelism to 100 for any Airflow process started in that environment, overriding whatever value is set in airflow.cfg.
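
You can confirm that the override took effect with the config CLI, which reports the effective value after environment overrides are applied:

airflow config get-value core parallelism   # prints 100 while the variable is exported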

Further Reading

For a comprehensive list of all configuration options and their detailed descriptions, refer to the Airflow Configuration Reference.