Hey everyone,
I'm looking to squeeze every bit of performance out of our Airflow setup. We're seeing significant delays in task execution, especially during peak hours. I've already done some basic DAG restructuring and we're running the executor recommended for our scale, but I feel there's more to explore.
Specifically, I'm interested in:
- Query optimization for the Airflow metadata database.
- Strategies for parallelizing tasks effectively beyond simple dependencies.
- Tips for reducing the overhead of task scheduling and monitoring.
- Best practices for configuring the Celery or Kubernetes executor for maximum throughput.
What are your go-to methods for optimizing Airflow performance?