Content provided by The Data Flowcast. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by The Data Flowcast or its podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined at https://tr.player.fm/legal.

Overcoming Airflow Scaling Challenges at Monzo Bank with Jonathan Rainer

43:39
 

Scaling a data orchestration platform to manage thousands of tasks daily demands innovative solutions and strategic problem-solving. In this episode, we explore the complexities of scaling Airflow and the challenges of orchestrating thousands of tasks in dynamic data environments. Jonathan Rainer, Former Platform Engineer at Monzo Bank, joins us to share his journey optimizing data pipelines, overcoming UI limitations and ensuring DAG consistency in high-stakes scenarios.

Key Takeaways:

(03:11) Using Airflow to schedule computation in BigQuery.

(07:02) How DAGs with 8,000+ tasks were managed nightly.

(08:18) Ensuring accuracy in regulatory reporting for banking.

(11:35) Handling task inconsistency and DAG failures with automation.

(16:09) Building a service to resolve DAG consistency issues in Airflow.

(25:05) Challenges with scaling the Airflow UI for thousands of tasks.

(27:03) The role of upstream and downstream task management in Airflow.

(37:33) The importance of operational metrics for monitoring Airflow health.

(39:19) Balancing new tools with root cause analysis to address scaling issues.

(41:35) Why scaling solutions require both technical and leadership buy-in.

Resources Mentioned:

Jonathan Rainer - https://www.linkedin.com/in/jonathan-rainer/

Monzo Bank - https://www.linkedin.com/company/monzo-bank/

Apache Airflow - https://airflow.apache.org/

BigQuery - https://airflow.apache.org/docs/apache-airflow-providers-google/stable/operators/cloud/bigquery.html

Kubernetes - https://kubernetes.io/

Thanks for listening to “The Data Flowcast: Mastering Airflow for Data Engineering & AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

#AI #Automation #Airflow #MachineLearning
