今天看啥  ›  专栏  ›  星辰破

Designing DIA note 44 --

星辰破  · 简书  ·  · 2019-07-20 23:35

6.2 partitioning of key value data

  • goal of partitioning -- spread the data & query load evenly across nodes
  • skewed -- an unfair partitioning, some partitions have more data / queries than others ==> less effective
  • hot pot -- a partition with disproportionately high load
  • to avoid hot pot ==> the simplest approach would be to assign records to nodes randomly
  • disadvantage of random assign -- when you read, you have to query all nodes in parallel as there's no way knowing which node the record is on

6.2.1 partitioning by key range

  • assign a continuous range fo keys to each partition, like a paper encyclopedia
  • with range boundaries + node partition assignment ==> you can make the request directly to the appropriate node (pick the correct book)
Figure 6-2. A print encyclopedia is partitioned by key range.
  • key ranges may not be evenly spaced, because your data may not be evenly distributed
  • to distribute the data evenly, the partition boundaries need to adapt to the data -- could be chosen manually/automatically
  • this strategy is used by Bigtable ==> HBase, RethinkDB, MongoDB < 2.4
  • within each partition, we can keep keys in sorted order ==> easy range scan, can fetch several related records in 1 query (e.g. key = timestamp, get all records for a month)
downside of key range partitioning
  • certain access patterns can lead to hot spots
  • e.g. key = timestamp, partitions => time range, then 1 day data of all sensors can go to the same partition while others sit idle
  • to avoid the problem, use sth other than the timestamp as the key => e.g. partitioning by sensor name + time

6.2.2 partitioning by hash of key

  • to avoid the risk of skew and hot spots, we could use the hash of a given key to determine the partition
suitable hash for partitioning
  • take skewed data ==> make it uniformly distributed
  • no need to be cryptographically strong
product hash function
MongoDB MD5
Cassandra Murmur3
Voldemort Fowler-Noll-Vo
unsuitable hash for partitioning
  • the same key may have a different hash in different processes
product hash function



原文地址:访问原文地址
快照地址: 访问文章快照