DoK Talks #103 - Performant and Version-Aware Analytics With Spark & lakeFS on K8s // Itai Admi
https://go.dok.community/slack
https://dok.community/
ABSTRACT OF THE TALK
Spark and lakeFS are revolutionizing large-scale, version-aware data processing. Is it possible to run this architecture on Kubernetes? We’ll cover the fastest way to get this environment up and running, and the benefits you get from it. Finally, we’ll show how horizontal scaling and the lakeFS Hadoop Filesystem avoid processing bottlenecks as workloads increase.
BIO
Itai is an R&D team leader at Treeverse, the company behind open-source lakeFS. He thrives on finding creative solutions for complex problems, especially if they involve code. Previously, Itai worked at Microsoft and Ridge on data infrastructure, tooling, and performance. Itai received his B.Sc. degree in Computer Science and an MBA from Tel Aviv University.
KEY TAKE-AWAYS FROM THE TALK
- Importance of building reproducible data pipelines.
- Managing your data the same way you manage your code.