Column: 星辰破

Designing DIA note 44 --

星辰破 · 简书 (Jianshu) · 2019-07-20 23:35

6.2 partitioning of key-value data

goal of partitioning -- spread the data & query load evenly across nodes
skewed -- unfair partitioning: some partitions have more data / queries than others ==> partitioning is much less effective
hot spot -- a partition with disproportionately high load
to avoid hot spots ==> the simplest approach would be to assign records to nodes randomly
disadvantage of random assignment -- when you read, you have to query all nodes in parallel, as there's no way of knowing which node a record is on
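A minimal sketch of that trade-off, assuming a toy in-memory store in which each node is a plain dict. The class name `RandomlyAssignedStore` and its methods are made up for illustration and are not from the book. Writes spread out evenly, but every read has to ask all nodes.

```python
import random


class RandomlyAssignedStore:
    """Toy store that places each record on a randomly chosen node."""

    def __init__(self, num_nodes, seed=None):
        self.nodes = [{} for _ in range(num_nodes)]
        self._rng = random.Random(seed)

    def put(self, key, value):
        # Assumption: each key is written only once. No placement
        # information is remembered anywhere.
        self._rng.choice(self.nodes)[key] = value

    def get(self, key):
        """Return (value, nodes_queried); every node has to be asked."""
        value = None
        for node in self.nodes:  # a real system would fan out in parallel
            if key in node:
                value = node[key]
        return value, len(self.nodes)


store = RandomlyAssignedStore(num_nodes=4, seed=0)
for i in range(1000):
    store.put(f"record-{i}", i)

value, nodes_queried = store.get("record-42")
print(value, nodes_queried)                    # the read touched all 4 nodes
print([len(node) for node in store.nodes])     # records per node, roughly even
```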

6.2.1 partitioning by key range

assign a continuous range of keys to each partition, like the volumes of a paper encyclopedia
if you know the range boundaries + which partition is assigned to which node ==> you can make the request directly to the appropriate node (pick the correct book)

Figure 6-2. A print encyclopedia is partitioned by key range.

key ranges may not be evenly spaced, because your data may not be evenly distributed
to distribute the data evenly, the partition boundaries need to adapt to the data -- they can be chosen manually by an administrator or automatically by the database
this strategy is used by Bigtable, its open source equivalent HBase, RethinkDB, and MongoDB before version 2.4
within each partition, we can keep keys in sorted order ==> range scans are easy, and several related records can be fetched in 1 query (e.g. key = timestamp, get all records for a month)
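The lookup and range scan described above can be sketched with Python's `bisect` module. This is an illustration under assumptions, not how any of the named databases implement it: `KeyRangeStore` is a hypothetical class, the boundaries are fixed up front, and a key equal to a boundary is placed in the partition to its right.

```python
import bisect


class KeyRangeStore:
    """Toy key-range partitioned store with sorted keys per partition."""

    def __init__(self, boundaries):
        # N boundaries split the key space into N + 1 partitions.
        self.boundaries = sorted(boundaries)
        count = len(self.boundaries) + 1
        self.sorted_keys = [[] for _ in range(count)]
        self.values = [{} for _ in range(count)]

    def partition_for(self, key):
        # Binary search over the boundaries picks the partition directly.
        return bisect.bisect_right(self.boundaries, key)

    def put(self, key, value):
        p = self.partition_for(key)
        if key not in self.values[p]:
            bisect.insort(self.sorted_keys[p], key)
        self.values[p][key] = value

    def range_scan(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        result = []
        first, last = self.partition_for(start), self.partition_for(end)
        for p in range(first, last + 1):  # only overlapping partitions
            keys = self.sorted_keys[p]
            lo = bisect.bisect_left(keys, start)
            hi = bisect.bisect_left(keys, end)
            result.extend((k, self.values[p][k]) for k in keys[lo:hi])
        return result


# Keys are ISO dates, so string order matches time order.
store = KeyRangeStore(["2019-04-01", "2019-07-01", "2019-10-01"])
for day in ["2019-03-15", "2019-06-30", "2019-07-01", "2019-07-20", "2019-08-02"]:
    store.put(day, f"reading taken on {day}")

print(store.partition_for("2019-07-20"))
july = store.range_scan("2019-07-01", "2019-08-01")
print([key for key, _ in july])
```

Because the keys inside a partition stay sorted, the scan for July only slices one partition instead of visiting every record.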

downside of key range partitioning

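certain access patterns can lead to hot spots
e.g. key = timestamp, partitions => time ranges (say 1 partition per day): all of today's writes from every sensor go to the same partition, which is overloaded while the others sit idle
to avoid the problem, use something other than the timestamp as the first element of the key => e.g. prefix each timestamp with the sensor name, so data is partitioned by sensor name first and then by time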
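A small sketch of the sensor example, comparing the two key designs. The sensor names, the boundaries, and the helper `partition_for` are all hypothetical and chosen only to make the effect visible.

```python
import bisect
from collections import Counter


def partition_for(boundaries, key):
    """Index of the key-range partition that owns `key`."""
    return bisect.bisect_right(boundaries, key)


sensors = ["alpha", "kilo", "tango"]  # made-up sensor names
timestamps = [f"2019-07-20T{hour:02d}:00" for hour in range(24)]

# Design 1: key = timestamp, one partition per day.
day_boundaries = ["2019-07-19", "2019-07-20", "2019-07-21"]
writes_by_time = Counter(
    partition_for(day_boundaries, ts) for _ in sensors for ts in timestamps
)

# Design 2: key = (sensor name, timestamp), boundaries on the sensor name.
name_boundaries = [("h",), ("p",)]
writes_by_sensor = Counter(
    partition_for(name_boundaries, (sensor, ts))
    for sensor in sensors
    for ts in timestamps
)

print(dict(writes_by_time))    # every write of the day lands on one partition
print(dict(writes_by_sensor))  # writes are spread over three partitions
```

The trade-off is that reading one time range across several sensors now takes a separate range query per sensor name.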

6.2.2 partitioning by hash of key

to avoid the risk of skew and hot spots, we could use the hash of a given key to determine the partition

suitable hash for partitioning

takes skewed data ==> makes it uniformly distributed
no need to be cryptographically strong

product	hash function
MongoDB	MD5
Cassandra	Murmur3
Voldemort	Fowler-Noll-Vo
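A minimal sketch of hash partitioning, assuming MD5 truncated to 32 bits as the hash and equal-width hash ranges as the partition scheme. The names `stable_hash` and `partition_for` are illustrative, and none of the products in the table is claimed to work exactly this way.

```python
import hashlib
from collections import Counter

HASH_BITS = 32  # assumption: only the first 32 bits of the digest are used


def stable_hash(key: str) -> int:
    """Map a key to an integer in [0, 2**32) using MD5."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def partition_for(key: str, num_partitions: int) -> int:
    """Each partition owns an equal-width range of the hash space."""
    return (stable_hash(key) * num_partitions) >> HASH_BITS


# Consecutive timestamps are adjacent as keys, yet their hashes scatter.
keys = [f"2019-07-20T{h:02d}:{m:02d}" for h in range(24) for m in range(60)]
load = Counter(partition_for(key, 4) for key in keys)
print(sorted(load.items()))  # roughly a quarter of the keys per partition
```

MD5 is used here only because it spreads keys well and gives the same result in every process, not for any security property.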

unsuitable hash for partitioning

the same key may have a different hash in different processes
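Python's built-in `hash()` on strings is one example of this problem: it is salted per process unless `PYTHONHASHSEED` is fixed. The sketch below is an illustration, not something from the original note. It starts fresh interpreters with different seeds to stand in for different processes; the helper names are hypothetical.

```python
import hashlib
import os
import subprocess
import sys


def builtin_hash_in_new_process(key: str, seed: str) -> int:
    """Run hash(key) in a fresh interpreter started with the given seed."""
    env = {**os.environ, "PYTHONHASHSEED": seed}
    completed = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", key],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(completed.stdout)


def stable_hash(key: str) -> int:
    """MD5 of the key bytes: identical in every process and on every machine."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")


def compare(key: str = "sensor-1") -> None:
    # Two seeds stand in for two independently started processes.
    print(builtin_hash_in_new_process(key, "1"))
    print(builtin_hash_in_new_process(key, "2"))  # almost certainly differs
    print(stable_hash(key))                       # same value on every run
```

Calling `compare()` shows why a per-process hash cannot be used to route a key: two nodes would disagree about which partition owns it.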

product	hash function
