Connections
Airflow Connections store the hostnames, ports, logins, passwords, and other parameters Airflow needs to reach external services such as databases, cloud platforms, and message queues. Instead of hardcoding credentials and host information directly into your DAGs, you define these details once as connections within Airflow. This approach improves security, maintainability, and reusability.
What is a Connection?
A connection in Airflow is a named entry that stores configuration details for a specific external resource. Each connection has a unique identifier (often called a "conn_id") and contains attributes like:
- Connection Type: Specifies the type of service the connection is for (e.g., 'Postgres', 'S3', 'HTTP', 'Docker').
- Host: The hostname or IP address of the service.
- Schema: Despite the name, this field typically holds the database or schema name (e.g., the database to use for a Postgres connection), not the URI scheme.
- Login: The username for authentication.
- Password: The password for authentication.
- Port: The port number the service is listening on.
- Extra: A JSON string for additional, service-specific parameters.
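Inside a task or hook, these attributes can be read programmatically. Below is a minimal sketch using `BaseHook.get_connection`; the connection ID `my_postgres_db` is a placeholder for a connection you have already defined:

from airflow.hooks.base import BaseHook

# Look up the connection by its conn_id.
conn = BaseHook.get_connection("my_postgres_db")

print(conn.conn_type)     # e.g. "postgres"
print(conn.host)          # hostname or IP address
print(conn.login)         # username
print(conn.port)          # port number
print(conn.extra_dejson)  # the Extra field parsed from JSON into a dict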
Managing Connections
Connections can be managed through the Airflow UI or via the command-line interface (CLI). They can also be defined as environment variables named `AIRFLOW_CONN_{CONN_ID}` whose value is a connection URI, or bulk-imported from a JSON or YAML file with `airflow connections import`.
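For example, the following shell command defines a connection as an environment variable (the connection ID, credentials, host, and port are placeholders):

export AIRFLOW_CONN_MY_POSTGRES_DB='postgresql://user:password@host:port/database'

Connections defined this way are resolved at runtime and do not appear in the UI or in `airflow connections list`.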
Via the Airflow UI
Navigate to the 'Admin' -> 'Connections' section in the Airflow UI. From there, you can:
- Add a new connection: Click the '+' button to create a new connection.
- Edit an existing connection: Click on a connection's ID to modify its details.
- Delete a connection: Select connections and click the 'Delete' button.
Note: Passwords and sensitive information should be handled with care. Consider using secrets backends for more robust security.
Via the Airflow CLI
The Airflow CLI provides commands to manage connections:
- `airflow connections add`: Adds a new connection, either from a URI (`--conn-uri`) or from individual fields (`--conn-type`, `--conn-host`, `--conn-login`, and so on). The connection ID is a positional argument.
- `airflow connections list`: Lists all existing connections.
- `airflow connections delete`: Deletes a connection by its ID.
Example CLI command to add a PostgreSQL connection from a URI (credentials, host, and port are placeholders):

airflow connections add my_postgres_db \
    --conn-uri 'postgresql://user:password@host:port/database'
Using Connections in DAGs
In your DAGs, you reference connections by their `conn_id`. Most operators and hooks in Airflow will have a `conn_id` parameter where you can specify the connection to use.
For example, using the PostgreSQL operator:
from __future__ import annotations

import pendulum

from airflow.models.dag import DAG
from airflow.providers.postgres.operators.postgres import PostgresOperator

with DAG(
    dag_id="example_postgres_connection",
    schedule=None,
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    catchup=False,
    tags=["example", "postgres"],
) as dag:
    create_table = PostgresOperator(
        task_id="create_table",
        postgres_conn_id="my_postgres_db",  # Reference to your connection ID
        sql="CREATE TABLE IF NOT EXISTS my_table (id SERIAL PRIMARY KEY, name VARCHAR(255));",
    )

    insert_data = PostgresOperator(
        task_id="insert_data",
        postgres_conn_id="my_postgres_db",
        sql="INSERT INTO my_table (name) VALUES ('Airflow User');",
    )

    create_table >> insert_data
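The same `conn_id` mechanism applies to hooks. The sketch below assumes the `my_postgres_db` connection and the `my_table` table from the example above; it uses `PostgresHook` to fetch rows directly, for instance inside a `@task`-decorated function:

from airflow.providers.postgres.hooks.postgres import PostgresHook

# The hook resolves "my_postgres_db" through Airflow's connection lookup
# (secrets backend, then environment variables, then the metadata database).
hook = PostgresHook(postgres_conn_id="my_postgres_db")
rows = hook.get_records("SELECT id, name FROM my_table;")
for row in rows:
    print(row)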
Connection URI Format
Connections can often be represented using a standard URI format; special characters in any component (such as passwords) must be URL-encoded:
scheme://login:password@host:port/database?extra_param1=value1&extra_param2=value2
For example:
- PostgreSQL: `postgresql://user:password@host:port/database`
- HTTP: `http://user:password@host:port/path`
- AWS S3: `aws://access_key:secret_key@host/bucket` (or managed by IAM roles)
The 'extra' field can be used to store JSON data for service-specific parameters that don't fit into the standard URI components. For instance, for an HTTP connection, you might store:
{"use_proxy": "True", "proxy_url": "http://proxy.example.com:8080"}
Secrets Management
For enhanced security, Airflow integrates with external secrets management systems like HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager, and Azure Key Vault. Instead of storing secrets directly in Airflow Connections, you can configure Airflow to retrieve them dynamically from these services. This is highly recommended for production environments.
Tip: When a secrets backend is configured, Airflow resolves a `conn_id` by checking the backend first, then `AIRFLOW_CONN_*` environment variables, then the metadata database, so DAG code does not need to change.
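As a sketch, configuring HashiCorp Vault as a secrets backend looks roughly like this in `airflow.cfg` (the mount point, path, and URL are illustrative, and the `apache-airflow-providers-hashicorp` package must be installed):

[secrets]
backend = airflow.providers.hashicorp.secrets.vault.VaultBackend
backend_kwargs = {"connections_path": "connections", "mount_point": "airflow", "url": "http://127.0.0.1:8200"}

With this configuration, a connection stored in Vault at `airflow/connections/my_postgres_db` is used wherever `my_postgres_db` is referenced.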
Connection Types
Airflow supports a wide range of built-in connection types, and you can define custom ones or use types provided by Airflow providers. Common types include:
| Connection Type | Description |
|---|---|
| Amazon Web Services (AWS) | For connecting to AWS services. |
| Microsoft Azure | For connecting to Azure services. |
| Google Cloud Platform (GCP) | For connecting to GCP services. |
| PostgreSQL | For PostgreSQL databases. |
| MySQL | For MySQL databases. |
| HTTP | For generic HTTP endpoints. |
| SFTP | For Secure File Transfer Protocol. |
| Kafka | For Apache Kafka. |
| RabbitMQ | For RabbitMQ message broker. |
| Docker | For interacting with Docker. |
The exact list of available connection types depends on your Airflow installation and installed providers.
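To see exactly which connection types your environment exposes, one option (a sketch using Airflow's `ProvidersManager`) is to enumerate the registered hook connection types:

from airflow.providers_manager import ProvidersManager

# ProvidersManager discovers hooks contributed by installed provider
# packages; its .hooks mapping is keyed by connection type.
manager = ProvidersManager()
for conn_type in sorted(manager.hooks):
    print(conn_type)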
Conclusion
Mastering Airflow Connections is crucial for building robust and secure data pipelines. By centralizing external service configurations, you ensure consistency, simplify management, and improve the overall reliability of your Airflow workflows.