Database Administration

Apache Airflow relies on a relational database to store metadata about your DAGs, tasks, runs, connections, variables, and more. The performance and reliability of this database are critical for the overall health of your Airflow deployment.

Database Configuration

The database connection URI is configured in your Airflow configuration file (airflow.cfg) under the [core] section using the sql_alchemy_conn parameter.

[core]
sql_alchemy_conn = postgresql+psycopg2://user:password@host:port/database

Airflow supports several database backends, including PostgreSQL, MySQL, and SQLite. For production environments, PostgreSQL or MySQL are highly recommended.

Supported Databases

Note on Database Choice

When choosing a database for production, consider factors like scalability, reliability, and your team's existing expertise. PostgreSQL is generally considered a robust and feature-rich choice for Airflow.

Database Migrations

When you upgrade Airflow, you will often need to run database migrations to update the database schema to match the new Airflow version. This is done using the Airflow CLI:

airflow db upgrade

It is crucial to back up your database before running any migrations. Airflow will prompt you if it detects a version mismatch.

Performing a Database Backup

Before performing any database operations, especially upgrades, it is highly recommended to back up your Airflow metadata database. The method for backing up will depend on your chosen database system:

Database Maintenance

Regular maintenance is essential for keeping your Airflow database healthy and performant. This includes:

Performance Tip

Regularly prune old task instance data and DAG run history to keep the database size manageable. Airflow provides mechanisms for this, which are covered in the Pruning Old Data section (though not directly in this document).

Database Tuning

The optimal database configuration can vary significantly based on your workload and hardware. Here are some general areas to consider:

Caution

Database tuning can be complex and may require expert knowledge. Always test changes thoroughly in a staging environment before applying them to production. Incorrect tuning can degrade performance or even cause instability.

Troubleshooting Database Issues

Common database-related issues include:

Consult your database system's documentation for specific troubleshooting steps and performance analysis tools.

This section provides a foundational understanding of managing the Airflow metadata database. For advanced configurations and specific database optimizations, refer to the documentation of your chosen database system.