Secrets Management
Managing sensitive information like database passwords, API keys, and encryption keys is crucial for the security of your Apache Airflow deployment. Airflow provides several mechanisms to handle secrets securely.
Important: Never hardcode secrets directly in your DAG files or Airflow configuration. Use the recommended secrets management solutions.
Core Principles
- Least Privilege: Grant only the necessary permissions for Airflow components to access secrets.
- Isolation: Separate secrets from your code and configuration files.
- Auditing: Keep track of who accesses secrets and when.
Supported Secrets Backends
Airflow supports integration with various secrets management systems. Backends are configured in your airflow.cfg file via the backend and backend_kwargs options in the [secrets] section (or the equivalent AIRFLOW__SECRETS__BACKEND and AIRFLOW__SECRETS__BACKEND_KWARGS environment variables).
Environment Variables
The simplest way to provide secrets is through environment variables. Airflow reads configuration overrides from variables of the form AIRFLOW__{SECTION}__{KEY}, and whole connections from variables named AIRFLOW_CONN_{CONN_ID}. For example, the metadata database connection string can be provided as:
export AIRFLOW__CORE__SQL_ALCHEMY_CONN='postgresql://user:password@host:port/database'
This method is convenient for local development and simple deployments but might not be ideal for large-scale production environments due to limitations in central management and rotation.
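A connection URI supplied this way must be a valid URI, and passwords often contain characters that are illegal in the userinfo field. A minimal sketch of encoding such a password with the standard library (the credentials and connection name here are made up):

```python
from urllib.parse import quote

# Hypothetical credentials; '@', '/' and '!' are not allowed raw in a
# URI userinfo field, so percent-encode the password before embedding it.
password = "p@ss/word!"
conn_uri = f"postgresql://user:{quote(password, safe='')}@db.example.com:5432/analytics"

print(conn_uri)
# Exported as AIRFLOW_CONN_MY_POSTGRES_DB='<conn_uri>', Airflow would
# expose this as the connection "my_postgres_db".
```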
HashiCorp Vault
HashiCorp Vault is a popular and robust secrets management tool. Airflow can integrate with Vault to retrieve secrets dynamically.
To use Vault, point the backend option at the Vault backend and provide the necessary connection details in backend_kwargs. This typically involves specifying the Vault URL, the authentication method (e.g., token, AppRole), and the paths under which your secrets live in Vault.
Example configuration in airflow.cfg (the token value is a placeholder; avoid committing real tokens to configuration files, and prefer AppRole or another auth method in production):
[secrets]
backend = airflow.providers.hashicorp.secrets.vault.VaultBackend
backend_kwargs = {"connections_path": "connections", "mount_point": "secret", "url": "http://127.0.0.1:8200", "auth_type": "token", "token": "your_vault_token"}
Secrets are then accessed in DAGs using Airflow's connections or directly via the secrets backend hook.
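As a sketch of how a lookup resolves (the defaults below mirror the backend's mount_point and connections_path kwargs; the real backend delegates the read to the hvac client), a connection ID maps to a KV version 2 path like this:

```python
def vault_kv2_read_path(conn_id: str,
                        mount_point: str = "secret",
                        connections_path: str = "connections") -> str:
    """Illustrative only: the KV v2 path a connection lookup resolves to."""
    # KV version 2 inserts "data" between the mount point and the key path.
    return f"{mount_point}/data/{connections_path}/{conn_id}"

print(vault_kv2_read_path("my_postgres_db"))
# secret/data/connections/my_postgres_db
```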
AWS Secrets Manager
For AWS users, Airflow can leverage AWS Secrets Manager. This allows you to store and manage your secrets within AWS.
Configure the backend option to use the AWS Secrets Manager backend. You'll need to provide AWS credentials (via environment variables, IAM roles, or configuration files) and specify the prefixes under which your connections and variables are stored; Airflow derives each secret's name from the prefix plus the connection or variable ID.
[secrets]
backend = airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend
backend_kwargs = {"connections_prefix": "airflow/connections", "variables_prefix": "airflow/variables", "region_name": "us-east-1"}
GCP Secret Manager
Similarly, for Google Cloud Platform users, Airflow can integrate with GCP Secret Manager.
Configure the backend option to use the GCP Secret Manager backend. This typically involves specifying the prefixes for connections and variables and, optionally, a service account key file; otherwise Application Default Credentials are used.
[secrets]
backend = airflow.providers.google.cloud.secrets.secret_manager.CloudSecretManagerBackend
backend_kwargs = {"connections_prefix": "airflow-connections", "variables_prefix": "airflow-variables", "project_id": "my-project"}
Azure Key Vault
Airflow also supports integration with Azure Key Vault for managing secrets in Azure environments. Authentication uses the azure-identity DefaultAzureCredential chain (environment variables, managed identity, CLI login, and so on).
[secrets]
backend = airflow.providers.microsoft.azure.secrets.key_vault.AzureKeyVaultBackend
backend_kwargs = {"connections_prefix": "airflow-connections", "vault_url": "https://my-airflow-keyvault.vault.azure.net/"}
Accessing Secrets in DAGs
Once a secrets backend is configured, you can access secrets in your DAGs using Airflow's connection mechanism or the secrets backend directly.
Using Airflow Connections
The most common approach is to store whole connections in the backend. When Airflow resolves a connection ID, it checks the configured secrets backend first, then environment variables, then the metadata database, so connections kept in the backend never need to be created in the UI or via the CLI.
For example, with the Vault configuration above, a PostgreSQL connection named my_postgres_db is read from the KV path connections/my_postgres_db under the secret mount point (secret/data/connections/my_postgres_db in KV v2); the secret's fields (a conn_uri key, or individual host, login, and password keys) become the connection, and Airflow fetches it automatically.
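Whatever the backend, the stored value is commonly an Airflow connection URI. The fields Airflow extracts from such a URI can be sketched with the standard library (the URI below uses example values only):

```python
from urllib.parse import urlparse, unquote

uri = "postgresql://user:p%40ssword@db.example.com:5432/analytics"
parsed = urlparse(uri)

conn_type = parsed.scheme                  # "postgresql"
login = parsed.username                    # "user"
password = unquote(parsed.password)        # "p@ssword" (percent-decoded)
host, port = parsed.hostname, parsed.port  # "db.example.com", 5432
schema = parsed.path.lstrip("/")           # "analytics"

print(conn_type, login, host, port, schema)
```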
Direct Access with Hooks
You can also query the secrets backend directly from your Python code using the corresponding hook.
from airflow.providers.hashicorp.hooks.vault import VaultHook

def my_task_function(**context):
    # Vault credentials come from an Airflow connection (here "vault_default").
    vault_hook = VaultHook(vault_conn_id="vault_default")
    # The path is relative to the configured mount point, not the full KV v2 path.
    secret = vault_hook.get_secret(secret_path="my_app/config")
    api_key = secret.get("api_key")
    # Avoid printing real secret values in task logs; shown for illustration only.
    print(f"API Key: {api_key}")

# ... DAG definition ...
Security Considerations
- Permissions: Ensure the Airflow service account or user has the minimum necessary permissions to access secrets in your chosen backend.
- Encryption: Secrets stored in your backend should be encrypted at rest.
- Rotation: Implement a strategy for rotating secrets periodically to enhance security.
- Auditing: Regularly review audit logs from your secrets backend to detect any suspicious activity.
Refer to the Secrets Backend Configuration section for detailed instructions on setting up each specific secrets backend.