Ep11. Designing Data-Intensive Applications - Partitioning Eng Cafe podcast

4y ago 33:46

Content provided by Thomas Wang. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Thomas Wang or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined at https://tr.player.fm/legal.

In this episode we discuss our study notes on the partitioning chapter of the book Designing Data-Intensive Applications.

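🔴 This episode leans technical, and we use a lot of English for technical terms. Some listeners have told us that mixing Chinese and English makes the show harder to follow; we ask for the understanding of anyone who minds. If anything is inaccurate or out of date, corrections are welcome.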

# Show Notes

📕 Designing Data-Intensive Applications
What is partitioning?
- A partition is a division of a logical database or its constituent elements into distinct independent parts.
Main reason: scalability - the query load can be distributed across many processors.
YouTube / Vitess scaling story
- Single MySQL → Add read replicas → Writes can't keep up → Partition
How to partition?
Partitioning by Key Range (e.g., Bigtable)
- Assign a continuous range of keys to each partition
- Pro: range scan is easier, data locality
- Con: certain access patterns can lead to hot spots (e.g., keys that are timestamps)
- Con: finding split points and managing rebalancing is hard
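
To make the trade-offs above concrete, here is a minimal sketch of key-range lookup. The split points in `BOUNDARIES` and both function names are made up for illustration; real systems pick and move split points based on the data.

```python
import bisect

# Hypothetical split points. Partition i holds keys in
# [BOUNDARIES[i-1], BOUNDARIES[i]); the first and last ranges are open-ended.
BOUNDARIES = ["g", "n", "t"]  # 4 partitions: [..g), [g..n), [n..t), [t..]


def partition_for(key: str) -> int:
    """Return the index of the partition whose key range contains `key`."""
    return bisect.bisect_right(BOUNDARIES, key)


def partitions_for_range(start: str, end: str) -> range:
    """Partitions a scan over [start, end] has to touch: always a contiguous run."""
    return range(partition_for(start), partition_for(end) + 1)
```

A range scan only touches neighbouring partitions, which is the "pro" above. The hot spot shows up when keys are timestamps: every key written during the same period sorts into the same range, so a single partition takes all the writes.
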
Partitioning by Hash
- Good hash function: uniformly distribute keys
- Con: no easy range queries
Cassandra does KKV (partitioning key, sort key, value)
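
The KKV idea can be sketched as follows: hash only the partitioning key to pick a partition, and keep rows sorted by the sort key inside it. This is a toy in-memory model, not Cassandra's implementation; `NUM_PARTITIONS`, `hash_partition`, and `CompoundKeyStore` are names invented for this example.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 8  # assumed for illustration


def hash_partition(partition_key: str) -> int:
    """Map a partitioning key to a partition with a stable hash.

    Python's built-in hash() for strings is salted per process, so a
    cryptographic digest is used here to get the same answer everywhere.
    """
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


class CompoundKeyStore:
    """Toy (partitioning key, sort key, value) store."""

    def __init__(self):
        # partitions[p][partition_key] maps sort_key -> value
        self.partitions = [defaultdict(dict) for _ in range(NUM_PARTITIONS)]

    def put(self, partition_key, sort_key, value):
        p = hash_partition(partition_key)
        self.partitions[p][partition_key][sort_key] = value

    def range_scan(self, partition_key, lo, hi):
        """Range query over sort keys, served by a single partition."""
        rows = self.partitions[hash_partition(partition_key)][partition_key]
        return [(k, rows[k]) for k in sorted(rows) if lo <= k <= hi]
```

Range queries across different partitioning keys are still scattered, but a range over the sort key for one fixed partitioning key (for example, one user's updates ordered by time) stays on one partition.
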
Hot spots: 3% of Twitter's Servers Dedicated to Justin Bieber
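
One mitigation the book describes for a single very hot key is to append a small random suffix on write, so the writes spread over several keys (and therefore partitions), at the cost of reading all of them back. A rough sketch, where `HOT_KEYS`, `SALT_BUCKETS`, and the `#` separator are assumptions made for this example:

```python
import random

SALT_BUCKETS = 10            # assumed fan-out for a hot key
HOT_KEYS = {"justinbieber"}  # hypothetical set the application has to maintain


def write_key(key, rng=random):
    """Key to write under: hot keys get a random two-digit suffix."""
    if key in HOT_KEYS:
        return f"{key}#{rng.randrange(SALT_BUCKETS):02d}"
    return key


def read_keys(key):
    """Keys a reader must fetch and merge to see every write for `key`."""
    if key in HOT_KEYS:
        return [f"{key}#{i:02d}" for i in range(SALT_BUCKETS)]
    return [key]
```

The extra bookkeeping is only worth it for the handful of keys that are actually hot, which is why the sketch leaves every other key untouched.
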
Secondary indexes: Local index
- Efficient write, expensive read
- Elasticsearch
Secondary indexes: Global index
- Efficient read, expensive write
- Using Global Secondary Indexes in DynamoDB (we misspoke in the episode: DynamoDB supports 20 global secondary indexes per table)
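
The read/write trade-off between the two index styles can be sketched by counting which partitions each operation touches. Everything here (class names, the toy hash functions, `NUM_PARTITIONS`) is invented for illustration and does not mirror how Elasticsearch or DynamoDB are built.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # assumed


def doc_partition(doc_id: int) -> int:
    """Partition that stores the document itself."""
    return doc_id % NUM_PARTITIONS


def term_partition(term: str) -> int:
    """Partition that owns a term in the global index (toy stable hash)."""
    return sum(term.encode("utf-8")) % NUM_PARTITIONS


class LocalIndex:
    """Each partition indexes only its own documents."""

    def __init__(self):
        self.index = [defaultdict(set) for _ in range(NUM_PARTITIONS)]

    def write(self, doc_id, term):
        p = doc_partition(doc_id)
        self.index[p][term].add(doc_id)
        return [p]  # partitions touched: just the document's own

    def read(self, term):
        touched = list(range(NUM_PARTITIONS))  # scatter/gather over all partitions
        hits = set()
        for p in touched:
            hits |= self.index[p].get(term, set())
        return hits, touched


class GlobalIndex:
    """The index is itself partitioned by term."""

    def __init__(self):
        self.index = [defaultdict(set) for _ in range(NUM_PARTITIONS)]

    def write(self, doc_id, term):
        p = term_partition(term)
        self.index[p][term].add(doc_id)
        # The document's partition and the term's partition are both involved.
        return sorted({doc_partition(doc_id), p})

    def read(self, term):
        p = term_partition(term)
        return set(self.index[p].get(term, set())), [p]
```

With a local index a write stays on one partition and a read fans out to all of them; with a global index a read goes to one partition, while a write may touch a different partition for every indexed term.
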
Rebalancing partitions
- Move load from one node to other nodes
Fixed number of partitions
- New node steals partitions from every existing node
Notion: 480 partitions
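
A rough sketch of the "steal from every existing node" step, assuming the partition-to-node map is a plain dict. The `add_node` function and the node names are hypothetical; only the figure of 480 partitions comes from the notes above.

```python
from collections import Counter


def add_node(assignment: dict, new_node: str) -> dict:
    """Return a new partition -> node map after `new_node` joins.

    The number of partitions never changes; the new node repeatedly takes
    one partition from whichever existing node currently owns the most.
    """
    result = dict(assignment)
    nodes = set(result.values()) | {new_node}
    fair_share = len(result) // len(nodes)
    for _ in range(fair_share):
        counts = Counter(n for n in result.values() if n != new_node)
        # Most-loaded existing node; ties are broken by node name.
        donor = max(sorted(counts), key=lambda n: counts[n])
        # Move the donor's lowest-numbered partition to the new node.
        partition = min(p for p, n in result.items() if n == donor)
        result[partition] = new_node
    return result


# 480 partitions over 3 nodes, then a 4th node joins.
before = {p: f"node{p % 3}" for p in range(480)}
after = add_node(before, "node3")
```

Only the stolen partitions move; every other key stays where it was, which is the point of keeping the partition count fixed rather than hashing keys modulo the number of nodes.
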
Dynamic partitioning
- 📈: split partition into 2
- 📉: merge 2 partitions into 1
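
The split and merge rules can be sketched on top of key-range partitions. This is a simplified model: thresholds count keys rather than bytes, a merge is only attempted with the right-hand neighbour, and the class name and default thresholds are assumptions for this example.

```python
import bisect


class DynamicPartitions:
    """Key-range partitions that split when they grow and merge when they shrink."""

    def __init__(self, split_threshold=4, merge_threshold=2):
        self.boundaries = []    # boundaries[i] is the lowest key of partition i + 1
        self.partitions = [[]]  # one sorted key list per partition
        self.split_threshold = split_threshold
        self.merge_threshold = merge_threshold

    def _index(self, key):
        return bisect.bisect_right(self.boundaries, key)

    def insert(self, key):
        i = self._index(key)
        part = self.partitions[i]
        if key in part:
            return
        bisect.insort(part, key)
        if len(part) > self.split_threshold:
            # Grow: split the partition into two halves at its median key.
            mid = len(part) // 2
            self.partitions[i:i + 1] = [part[:mid], part[mid:]]
            self.boundaries.insert(i, part[mid])

    def delete(self, key):
        i = self._index(key)
        part = self.partitions[i]
        if key in part:
            part.remove(key)
        # Shrink: merge with the right-hand neighbour if both are small.
        if i + 1 < len(self.partitions):
            right = self.partitions[i + 1]
            if len(part) + len(right) <= self.merge_threshold:
                self.partitions[i:i + 2] = [part + right]
                del self.boundaries[i]
```

Because partitions follow the data, their number adapts to the total volume, at the cost of having to track split points as they change.
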
Fixed number of partitions per node
- https://www.datastax.com/blog/new-token-allocation-algorithm-cassandra-30
Operations: fully automatic (dangerous) / semi-automatic / fully manual (tedious)
Request Routing
- 3 approaches: nodes talk to each other, separate routing tier, smart client
- Separate coordination service such as ZooKeeper
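
As one illustration of the "separate routing tier" approach, the sketch below keeps a partition-to-node map and updates it when told about a change. In a real deployment that notification would come from a coordination service such as ZooKeeper; here it is just a method call, and `RoutingTier`, `on_assignment_change`, and the node names are hypothetical.

```python
import zlib


class RoutingTier:
    """Partition-aware router: maps a key to the node that owns its partition."""

    def __init__(self, num_partitions, assignment):
        self.num_partitions = num_partitions
        self.assignment = dict(assignment)  # partition -> node address

    def partition_of(self, key: str) -> int:
        # CRC32 is used only as a simple hash that is stable across processes.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions

    def on_assignment_change(self, partition, node):
        """Callback the coordination service would trigger after rebalancing."""
        self.assignment[partition] = node

    def route(self, key: str) -> str:
        return self.assignment[self.partition_of(key)]


router = RoutingTier(4, {0: "node-a", 1: "node-b", 2: "node-c", 3: "node-d"})
owner = router.route("user42")
```

A smart client would hold the same map itself, and in the "nodes talk to each other" approach any node can accept the request and forward it to the owner.
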
Notes by xg

# Contact

Website: eng.cafe
WeChat official account: Eng Cafe
Twitter: @engcafefm
YouTube: Eng Cafe
Xiaoyuzhou (小宇宙) podcast app
General-purpose podcast clients: eng.cafe/subscribe
Email: [email protected]
