Connection Management

This document covers best practices and advanced techniques for managing connections in Apache Airflow.

What are Connections?

Connections in Airflow are a way to store and manage the details of the external systems and services that Airflow needs to interact with: databases, cloud services, APIs, and more. Storing these details in Airflow's connection store rather than in DAG files makes your DAGs more portable and secure, since sensitive information such as passwords and keys is not hardcoded.

Creating and Managing Connections

Connections can be managed through the Airflow UI or programmatically using the Airflow CLI or Python API.

Via the Airflow UI

Navigate to the "Admin" -> "Connections" tab in the Airflow UI. From here, you can:

- View all defined connections
- Add a new connection
- Edit or delete an existing connection

The "Extra" field is a JSON dictionary that allows you to store additional parameters specific to the connection type. For example, for a PostgreSQL connection, you might store SSL parameters here.
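The "Extra" value must be a JSON string, so it can be convenient to build it programmatically and paste the result into the field. A minimal sketch (the sslmode and connect_timeout keys are illustrative; check which extras your provider's hook actually reads):

```python
import json

# Illustrative extras for a PostgreSQL-style connection. Which keys are
# honored depends on the hook that consumes the connection.
extra = {
    "sslmode": "require",     # ask the driver to negotiate TLS
    "connect_timeout": 10,    # seconds before the driver gives up
}

# The UI field expects a JSON string, not a Python dict.
extra_json = json.dumps(extra)
print(extra_json)
```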

Via the Airflow CLI

The Airflow CLI provides commands to manage connections:


# List all connections
airflow connections list

# Export all connections to a file (the export includes passwords, so handle it with care)
airflow connections export connections.json

# Add a new connection
airflow connections add 'my_s3_conn' --conn-type 'aws' --conn-host '...' --conn-login '...' --conn-password '...' --conn-extra '{"region_name": "us-east-1"}'

# Get connection details
airflow connections get 'my_s3_conn'

# Delete a connection
airflow connections delete 'my_s3_conn'

Via the Airflow Python API

You can also manage connections using Python scripts:


from airflow.models.connection import Connection
from airflow.utils.session import provide_session

@provide_session
def add_new_connection(session=None):
    new_conn = Connection(
        conn_id='my_redis_conn',
        conn_type='redis',
        host='localhost',
        port=6379,
        login='user',
        password='password',
        extra='{"db": 0}'
    )
    session.add(new_conn)
    session.commit()
    print("Connection 'my_redis_conn' added.")

@provide_session
def get_connection(conn_id, session=None):
    # Querying the metadata database directly bypasses environment-variable and
    # secrets-backend lookups; in task code, prefer BaseHook.get_connection(conn_id).
    conn = session.query(Connection).filter(Connection.conn_id == conn_id).first()
    if conn:
        print(f"Connection ID: {conn.conn_id}")
        print(f"Conn Type: {conn.conn_type}")
        print(f"Host: {conn.host}")
        print(f"Extra: {conn.extra}")
    else:
        print(f"Connection '{conn_id}' not found.")

# Example usage:
# add_new_connection()
# get_connection('my_redis_conn')

Secrets Management and Security

Airflow provides several ways to securely manage sensitive credentials.

Using Environment Variables

You can define connections using environment variables prefixed with AIRFLOW_CONN_. The part of the variable name after the prefix, lower-cased, is the connection ID. For example, to create an AWS connection with the ID aws_default:


export AIRFLOW_CONN_AWS_DEFAULT='aws://user:password@host:port/schema?extra_param=value'


Airflow resolves these variables at lookup time; they do not appear in the metadata database or in the UI connection list.
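The mapping from variable name to connection ID is purely mechanical; a small sketch of it (plain string handling, no Airflow required):

```python
def conn_id_from_env(var_name: str) -> str:
    """Derive the connection ID implied by an AIRFLOW_CONN_* variable name."""
    prefix = "AIRFLOW_CONN_"
    if not var_name.startswith(prefix):
        raise ValueError(f"{var_name!r} is not an Airflow connection variable")
    # Airflow looks up AIRFLOW_CONN_<CONN_ID upper-cased>, so the suffix
    # lower-cased corresponds to the connection ID.
    return var_name[len(prefix):].lower()

print(conn_id_from_env("AIRFLOW_CONN_AWS_DEFAULT"))  # aws_default
```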

Using a Secrets Backend

For more robust secrets management, Airflow integrates with external secrets backends like HashiCorp Vault, AWS Secrets Manager, Google Secret Manager, and Azure Key Vault.

To configure a secrets backend, set the [secrets] backend option in airflow.cfg to the full import path of the backend class, and supply backend-specific settings as a JSON dictionary in [secrets] backend_kwargs.

Example: HashiCorp Vault

If you are using HashiCorp Vault, your airflow.cfg might look like this:


[secrets]
backend = airflow.providers.hashicorp.secrets.vault.VaultBackend
# Prefer a Vault auth method (e.g. AppRole) over a static token in production.
backend_kwargs = {"url": "https://vault.example.com:8200", "token": "your_vault_token_here", "mount_point": "airflow", "connections_path": "connections"}

When using a secrets backend, Airflow expects connections to be stored at a well-known path in your secrets store. For the Vault backend this is {mount_point}/{connections_path}/{conn_id}; with a mount point of airflow and a connections path of connections, a connection named my_s3_conn lives at airflow/connections/my_s3_conn.
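The mapping from connection ID to secret path can be sketched as a one-liner (the mount point and connections path defaults here are illustrative):

```python
def vault_secret_path(conn_id: str,
                      mount_point: str = "airflow",
                      connections_path: str = "connections") -> str:
    """Compose the Vault path at which the backend expects a connection."""
    return f"{mount_point}/{connections_path}/{conn_id}"

print(vault_secret_path("my_s3_conn"))  # airflow/connections/my_s3_conn
```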

Connection URI Format

Connections can also be defined using a URI string. This is particularly useful when importing/exporting connections or when using certain secrets backends.

The general format is:


conn_type://[username[:password]@]host[:port][/schema][?query_parameters]
            

Examples:

- postgres://scott:tiger@db.example.com:5432/sales
- mysql://user:password@mysql.example.com:3306/mydb
- aws://?region_name=us-east-1
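Because this is a standard URI shape, you can sanity-check a connection string with Python's standard library before handing it to Airflow; a sketch (the URI itself is illustrative):

```python
from urllib.parse import parse_qs, urlsplit

uri = "postgres://scott:tiger@db.example.com:5432/sales?sslmode=require"
parts = urlsplit(uri)

print(parts.scheme)            # conn_type -> postgres
print(parts.username)          # login -> scott
print(parts.hostname)          # host -> db.example.com
print(parts.port)              # port -> 5432
print(parts.path.lstrip("/"))  # schema -> sales
print(parse_qs(parts.query))   # extras -> {'sslmode': ['require']}
```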

Advanced Connection Management

Connection Priority

In Airflow 1.x, multiple connection rows could share a single connection ID, and Airflow would pick one of them at random at lookup time as a rudimentary form of load balancing. This behavior was removed in Airflow 2, where connection IDs must be unique. What remains is a fixed lookup order across sources: secrets backends are checked first, then environment variables, and finally the metadata database.

Connection Types

Airflow supports a wide range of built-in connection types, and you can also define custom connection types.

Common built-in types include:

- postgres, mysql, mssql, sqlite (relational databases)
- http (generic HTTP APIs)
- aws, google_cloud_platform (cloud providers)
- ssh, sftp, ftp (remote execution and file transfer)
- redis (key-value stores)

Each connection type has specific fields and expected "extra" parameters. Refer to the documentation for specific Airflow providers for details on their connection types.

Custom Connection Types

You can add custom connection types by writing a custom hook rather than by subclassing airflow.models.Connection. A hook declares conn_type and hook_name class attributes, and can implement the get_ui_field_behaviour() and get_connection_form_widgets() class methods to control how its connection form is rendered in the UI. Hooks shipped in a provider package register their connection types with Airflow automatically.

Tip

Always strive to use a secrets backend for storing sensitive credentials in production environments. Avoid hardcoding credentials or storing them directly in the Airflow database if possible.

Note

When using the aws connection type without explicit credentials, the Amazon provider falls back to boto3's default credential chain (environment variables, the shared credentials file, instance or role metadata, and so on). You can select a named profile via the profile_name key in the connection's "Extra" field; if you provide explicit access keys on the connection, they are used directly.
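Region and profile selection for the aws type therefore go through the "Extra" field. A hedged sketch of building that JSON (region_name and profile_name follow the Amazon provider's documented extras, but verify them against the provider version you run):

```python
import json

# Extras commonly honored by the Amazon provider; accepted keys have
# changed across provider releases, so verify against your version.
aws_extra = json.dumps({
    "region_name": "us-east-1",
    "profile_name": "analytics",  # a named profile from ~/.aws/credentials
})
print(aws_extra)
```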