[DDIS Part II]: Distributed system - Partition
Partitioning is a way to divide a large dataset into multiple smaller datasets, such that any record appears in exactly one of them. Different partitions can then be stored on different nodes. This is useful because:
- Scalability: request load can be split across different nodes. The more evenly the partitions are distributed, the better the load balancing, and hence the greater the scalability. Assume a shared-nothing distributed system (separate machines hold different partitions).
- Partitioning can be combined with a replication scheme for each partition, which brings lower latency and higher availability (but also all the problems we had with replication).
Replication works independently of partitioning, so this article focuses on partitioning only. For replication, refer to this article I wrote.
1 Partitioning of a key-value store
This section concerns how to split a key-value database into multiple partitions.
1.1 Partition by key range (good for range query, bad for hot spot)
Imagine a key-value store for an encyclopedia, where the key is the word. Partition 1 can hold all words from A to B, Partition 2 all words from C to E, and so on. This is advantageous because we can do range queries (e.g. find all encyclopedia words that start with 'Ab'). However, it suffers from skew and hot spot issues.
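As a rough sketch of the idea, the code below routes a key to a partition by looking it up against a sorted list of boundary keys. The boundaries, partition count, and function names are hypothetical choices for illustration, assuming keys are lowercase words starting with a-z.

```python
import bisect

# Hypothetical boundaries: partition i holds keys in [BOUNDARIES[i], BOUNDARIES[i+1]).
# Five partitions: a-b, c-e, f-l, m-s, t-z.
BOUNDARIES = ["a", "c", "f", "m", "t"]

def partition_for_key(key: str) -> int:
    """Return the index of the partition whose key range contains `key`."""
    # bisect_right finds the first boundary greater than the key;
    # the containing partition is the one just before it.
    return bisect.bisect_right(BOUNDARIES, key.lower()) - 1

def partitions_for_range(start: str, end: str) -> range:
    """A range query only needs to touch partitions overlapping [start, end]."""
    return range(partition_for_key(start), partition_for_key(end) + 1)

print(partition_for_key("apple"))              # 0
print(partition_for_key("zebra"))              # 4
print(list(partitions_for_range("ab", "ad")))  # [0] -- one partition serves the whole scan
```

Because adjacent keys live on the same partition, a scan over a prefix like 'Ab' stays local to one node, which is exactly the range-query benefit described above.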
1.1.1 Skew and hot spot
The problems with this scheme are skew and hot spots. We observe skew when, as the encyclopedia grows, one partition ends up holding the majority of the database, which undermines the advantage of partitioning: that node stores disproportionately more data, consumes more resources, and may receive more requests than the others. A remedy is rebalancing, i.e. redistributing the data across nodes. A hot spot is a set of entries that receives the majority of requests, for example entries about a famous celebrity. Simply distributing entries evenly does not realize the partitioning benefits, because requests are still concentrated on a few entries that reside on one particular node.
1.2 Partition by hash of key (bad for range query, good for hot spot)
The hash of the key determines which partition an entry belongs to. Each partition takes a certain range of hashes (p0: 0 to 100, p1: 101 to 200, ...), and since a good hash function spreads keys roughly uniformly over the hash range regardless of any pattern in the keys themselves, there's a decreased chance of any particular partition becoming a hot spot. However, we lose the range queries that key-range partitioning gave us.
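A minimal sketch of the idea, using the simplest variant of hashing a key and taking it modulo the partition count (the range-of-hashes assignment described above works the same way conceptually). The hash choice and partition count are illustrative assumptions.

```python
import hashlib

NUM_PARTITIONS = 4  # illustrative; real systems choose and rebalance this differently

def hash_partition(key: str) -> int:
    """Map a key to a partition by hashing it and reducing modulo the partition count."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

# Lexicographically adjacent keys scatter across partitions, which spreads
# load but means a key-range scan has to query every partition.
for word in ["aardvark", "abacus", "abandon", "zebra"]:
    print(word, "->", hash_partition(word))
```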
1.3 Difficulty of resolving hot spot
Neither of the two partitioning mechanisms can completely avoid the hot spot issue. An extreme case that breaks the benefits of partitioning is when only one entry is ever queried: then all requests go to a single node, even with partitioning.
A workaround is to spread the most-used entry across many nodes (e.g. store multiple entries for the same data under different keys, which land on different partitions). This requires keeping track of those entries and eventually combining their updates (replication issues, again). There's also the issue of responding to changes: for example, when the hottest entry changes, or when the number of nodes an entry must be spread across increases, what keys should be generated, and in what pattern? A sketch of the basic idea follows.
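One common shape of this workaround is to append a small random suffix to the hot key on writes, and fan out over all suffixes on reads. The suffix scheme, fan-out count, and function names below are illustrative assumptions, not a prescribed design.

```python
import random

FANOUT = 8  # hypothetical number of sub-keys to spread one hot key across

def write_key(hot_key: str) -> str:
    """On write, append a random suffix so copies of the hot entry
    land on different partitions instead of one node."""
    return f"{hot_key}#{random.randrange(FANOUT)}"

def read_keys(hot_key: str) -> list[str]:
    """On read, the application must query every sub-key and merge the
    results itself -- the extra bookkeeping mentioned above."""
    return [f"{hot_key}#{i}" for i in range(FANOUT)]

print(write_key("celebrity:123"))   # e.g. celebrity:123#5
print(read_keys("celebrity:123"))   # all 8 sub-keys have to be read and combined
```

This only pays off for a small number of genuinely hot keys, since every read of a split key becomes several reads; deciding which keys to split, and when to change FANOUT, is exactly the tracking problem noted above.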