Hey everyone!
I'm looking to establish best practices for using Airflow at Spotify, particularly around DAG organization, task idempotency, and efficient resource utilization. We run large-scale data pipelines, so reliability and scalability are critical. Any insights or shared experiences from anyone running Airflow in a similar environment would be greatly appreciated!
Specifically, I'm interested in:
- DAG folder structure for monorepos.
- Strategies for handling large numbers of tasks and DAGs.
- Recommended patterns for idempotency in custom operators.
- Best ways to monitor and debug complex pipelines.
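To make the idempotency question concrete: the pattern we've been leaning toward in custom operators is "derive the output location purely from the logical date, then fully overwrite that partition on every run," so retries and backfills converge to the same state. Here's a minimal plain-Python sketch of that idea (`write_partition` is a made-up name for illustration, not anything from the Airflow API) — is this roughly the recommended pattern, or do people prefer something else?

```python
import json
from pathlib import Path


def write_partition(base_dir: str, ds: str, rows: list) -> Path:
    """Idempotent partition write.

    The output path is derived solely from the logical date `ds`, and the
    partition is fully overwritten on every run, so rerunning the task for
    the same date always produces the same final state.
    """
    out = Path(base_dir) / f"ds={ds}" / "part-0000.json"
    out.parent.mkdir(parents=True, exist_ok=True)

    # Write to a temp file first, then rename: rename is atomic on POSIX,
    # so downstream readers never observe a half-written partition.
    tmp = out.parent / (out.name + ".tmp")
    tmp.write_text(json.dumps(rows))
    tmp.replace(out)
    return out
```

Inside an operator's `execute`, we'd call this with the run's logical date, so a retry or a backfill of the same interval simply overwrites its own partition rather than appending duplicates.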
Thanks in advance!