The Airflow Webserver
The Airflow webserver is the primary user interface for interacting with your Airflow environment. It provides a rich graphical interface for monitoring DAGs, managing tasks, and configuring your Airflow deployment.
Core Functionality
- DAG Visualization: View and explore your Directed Acyclic Graphs (DAGs) in various formats, including the Graph View, Tree View, Gantt Chart, and more.
- Task Monitoring: Monitor the status of individual tasks within your DAGs, view logs, and rerun failed tasks.
- Operator Interaction: Interact with operators by clearing task states, marking tasks as success/failure, and triggering DAG runs.
- Configuration Management: Access and manage Airflow configurations, connections, variables, and pools.
- User Management: (Depending on authentication backend) Manage users and their permissions.
Configuration Options
The webserver's behavior can be customized through the Airflow configuration file (airflow.cfg) or environment variables. Key webserver configuration parameters include:
Webserver Section
| Parameter | Description | Default Value |
|---|---|---|
| web_server_port | The port on which the webserver listens. | 8080 |
| web_server_host | The host IP address to which the webserver binds. 0.0.0.0 binds to all interfaces. | 0.0.0.0 |
| secret_key | A secret key used for Flask session management. Should be kept secure. | (randomly generated on first run) |
| dag_dir_list_desc | Whether to list DAG directories in descending order. | True |
| dag_code_page_length | Number of lines to display for DAG code. | 100 |
| authenticate | Enable or disable authentication. | False |
| auth_backend | The authentication backend to use (e.g., airflow.providers.fab.auth_manager.fab_auth_manager.FabAuthManager for FAB). | airflow.security.permissions.all_perms |
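Airflow maps each configuration option to an environment variable named AIRFLOW__{SECTION}__{KEY} (note the double underscores), and an environment variable takes precedence over the value in airflow.cfg. The following is a minimal sketch of that lookup order using only the Python standard library. The helper names env_var_name and resolve_option, and the sample configuration text, are illustrative assumptions and not part of Airflow's API.

```python
import configparser
import os


def env_var_name(section: str, key: str) -> str:
    """Build the environment variable name for a config option."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def resolve_option(cfg_text, section, key, environ=None):
    """Return the env var value if set, else the value from the cfg text.

    Hypothetical helper: it only imitates the precedence rule and does
    not cover Airflow's other sources (defaults, *_cmd options, etc.).
    """
    environ = os.environ if environ is None else environ
    name = env_var_name(section, key)
    if name in environ:
        return environ[name]
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return parser.get(section, key, fallback=None)


# Sample file content, made up for this illustration.
SAMPLE_CFG = """
[webserver]
secret_key = change-me
"""

if __name__ == "__main__":
    print(env_var_name("webserver", "secret_key"))
    print(resolve_option(SAMPLE_CFG, "webserver", "secret_key", environ={}))
```

Inside an Airflow installation you would normally not write this yourself: airflow.configuration.conf.get(section, key) performs the lookup for you.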
Security Considerations
When deploying the webserver, especially in production environments, it's crucial to consider security:
- Authentication and Authorization: Always enable authentication and configure an appropriate authorization backend to control access to Airflow resources.
- HTTPS: Configure the webserver to use HTTPS to encrypt communication. This typically involves setting up a reverse proxy (like Nginx or Apache HTTP Server) in front of the Airflow webserver.
- Network Access: Restrict network access to the webserver port to only trusted IP addresses.
- secret_key: Ensure your secret_key is strong and kept confidential.
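One way to obtain a strong secret_key is to draw it from a cryptographically secure random source. The sketch below uses Python's standard secrets module. The function name generate_secret_key and the 32-byte length are assumptions chosen for illustration, not an Airflow requirement.

```python
import secrets


def generate_secret_key(num_bytes: int = 32) -> str:
    """Return a random hex string (two hex characters per byte)."""
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    # Paste the printed value into airflow.cfg or a secrets manager;
    # avoid committing it to version control.
    print(generate_secret_key())
```

The generated value can then be set as secret_key in the webserver section of airflow.cfg, or supplied through an environment variable so that it stays out of the configuration file.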
Running the Webserver
To start the Airflow webserver, use the Airflow CLI:
    airflow webserver -p 8080
This command will start the webserver on port 8080. You can access it by navigating to http://localhost:8080 in your web browser.
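To confirm from a script that the webserver is up, you can query its /health endpoint, which returns a JSON status document. The following is a minimal sketch using only the standard library. The function names health_url and fetch_health are hypothetical, the base URL assumes the default port shown above, and the exact fields in the response depend on your Airflow version.

```python
import json
import urllib.request


def health_url(base_url: str) -> str:
    """Append the /health path to the webserver base URL."""
    return base_url.rstrip("/") + "/health"


def fetch_health(base_url: str = "http://localhost:8080", timeout: float = 5.0):
    """Return the parsed health JSON, or None if the server is unreachable
    or the response is not valid JSON."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers connection errors, timeouts, and HTTP errors;
        # ValueError covers malformed JSON or undecodable bytes.
        return None


if __name__ == "__main__":
    status = fetch_health()
    print(status if status is not None else "webserver not reachable")
```

A check like this is useful as a readiness probe when the webserver runs under a process manager or container platform.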
Running in the Background
For production deployments, you'll typically want to run the webserver as a background service. This can be achieved using process management tools like systemd, supervisord, or container orchestration platforms like Kubernetes.
Key Features and Usage
DAGs View
The DAGs view is your central hub for managing and monitoring your workflows. You can:
- See a list of all available DAGs.
- Toggle DAGs on and off.
- View DAG status (running, paused, success, failed).
- Access different visualizations (Graph, Tree, Gantt, Calendar, etc.).
- Clear task states, mark tasks as success/failure, and trigger DAG runs.
Browse Menu
The "Browse" menu provides access to various Airflow components:
- Jobs: View details about running and historical jobs.
- Task Instances: Inspect individual task instances, their states, logs, and execution times.
- DAG Runs: View and manage runs of your DAGs.
- SLA Misses: Track any DAGs or tasks that missed their Service Level Agreements.
- Audit Logs: Review actions performed within Airflow (requires appropriate configuration).
Admin Menu
The "Admin" menu offers access to administrative configurations:
- Connections: Manage connections to external systems.
- Variables: Define and manage Airflow variables.
- Pools: Configure resource pools to limit concurrency.
- Configuration: View the current Airflow configuration.
- Users/Roles: (If FAB auth is enabled) Manage users and roles.
By understanding and configuring the Airflow webserver, you can effectively manage and monitor your data pipelines.