5.3.3 Sloppy Quorums & hinted handoff
Databases with appropriately configured quorums can
- tolerate the failure of individual nodes without the need for failover ==> high availability
- tolerate individual nodes going slow ==> low latency
But quorums are not as fault-tolerant as they could be
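- a network interruption can cut off a client from a large number of DB nodes
==> fewer than w (or r) reachable nodes remain, so the client can't reach a quorum
In a large cluster (significantly more than n nodes), the client can likely still connect to some DB nodes during the network interruption, just not to the nodes it needs to assemble a quorum for a particular value. This poses a trade-off: return errors for every request that cannot reach a quorum of w or r nodes, or accept writes anyway on reachable nodes that are not among the n "home" nodes?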
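To make the failure case concrete, here is a minimal sketch of the strict quorum check. The function name `can_reach_quorum` and the node labels are hypothetical, chosen only for illustration; real databases do this inside the coordinator.

```python
def can_reach_quorum(home_nodes, reachable, needed):
    """Return True if at least `needed` of the designated home nodes are reachable."""
    return len(set(home_nodes) & set(reachable)) >= needed


home = ["A", "B", "C"]        # the n = 3 home nodes for some value
w = 2                         # write quorum

# Network interruption: of the home nodes only C is reachable,
# although the client can still talk to D and E elsewhere in the cluster.
reachable = ["C", "D", "E"]

strict_ok = can_reach_quorum(home, reachable, w)
print(strict_ok)  # False: a strict quorum write must fail here
```

With a strict quorum, D and E do not count, so the write is rejected even though three nodes are reachable.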
sloppy quorum
writes and reads still require w and r successful responses, but those may include nodes that are not among the designated n "home" nodes for a value.
hinted handoff
once the network interruption is fixed, any writes that one node temporarily accepted on behalf of another node are sent to the appropriate "home" nodes
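The two mechanisms can be sketched together in a small in-memory simulation. Everything here (`Node`, `sloppy_write`, `hinted_handoff`, the version numbers) is a hypothetical simplification, not the implementation of any particular database: a fallback node stores a hint naming the home node it is standing in for, and replays it once that node is reachable again.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    data: dict = field(default_factory=dict)    # key -> (version, value)
    hints: list = field(default_factory=list)   # (home_name, key, version, value)


def sloppy_write(key, version, value, home, fallbacks, reachable, w):
    """Write to reachable home nodes; let reachable fallbacks stand in for the rest."""
    acks = 0
    missing_homes = [h for h in home if h.name not in reachable]
    for node in home:
        if node.name in reachable:
            node.data[key] = (version, value)
            acks += 1
    for node in fallbacks:
        if not missing_homes:
            break
        if node.name in reachable:
            target = missing_homes.pop(0)
            # The hint records which home node this write really belongs to.
            node.hints.append((target.name, key, version, value))
            acks += 1
    return acks >= w


def hinted_handoff(fallbacks, nodes_by_name):
    """After the interruption, forward hinted writes to their home nodes."""
    for node in fallbacks:
        for home_name, key, version, value in node.hints:
            home_node = nodes_by_name[home_name]
            current = home_node.data.get(key)
            if current is None or current[0] < version:
                home_node.data[key] = (version, value)
        node.hints.clear()


a, b, c, d, e = (Node(x) for x in "ABCDE")
home, fallbacks = [a, b, c], [d, e]

# During the interruption only C (home) plus D and E (not home) are reachable.
ok = sloppy_write("k", 1, "v1", home, fallbacks, reachable={"C", "D", "E"}, w=2)

# Once the network is fixed, D and E hand their writes off to A and B.
hinted_handoff(fallbacks, {n.name: n for n in (a, b, c, d, e)})
```

In this sketch the write succeeds with w = 2 even though only one home node was reachable, and after the handoff all three home nodes hold the value.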
| | pros | cons |
|---|---|---|
| sloppy quorum | useful for increasing write availability | even when w + r > n, there's no guarantee of reading the latest value |
- a sloppy quorum isn't a quorum in the traditional sense; it is only an assurance of durability (the data is stored on w nodes somewhere)
- there's no guarantee of latest value until the hinted handoff has completed.
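A small worked example shows why w + r > n no longer implies overlap. It is only a sketch under assumed values (n = 3, w = 2, r = 2, replicas modeled as plain dicts, a hypothetical `read_latest` helper): the write lands on one home node and one fallback, while the reader's quorum consists of the two home nodes that missed it.

```python
def read_latest(replicas, key, r):
    """Return the value with the highest version among the first r responses."""
    responses = [rep[key] for rep in replicas if key in rep][:r]
    if len(responses) < r:
        raise RuntimeError("read quorum not reached")
    return max(responses)[1]   # (version, value) tuples compare by version first


# Home replicas A, B, C all start with version 1; D is outside the home set.
A = {"k": (1, "old")}
B = {"k": (1, "old")}
C = {"k": (1, "old")}
D = {}

# Sloppy write of version 2 with w = 2: the writer reaches only C and D.
C["k"] = (2, "new")
D["k"] = (2, "new")

# A reader on the other side of the partition reaches A and B, so r = 2 is met,
# yet neither node has seen version 2.
stale = read_latest([A, B], "k", r=2)

# Hinted handoff: D forwards the write it held on behalf of A.
A["k"] = D.pop("k")
fresh = read_latest([A, B], "k", r=2)

print(stale, fresh)  # old new
```

Here w + r = 4 > n = 3, but the write set {C, D} and the read set {A, B} do not intersect until the handoff completes.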
sloppy quorum in Dynamo implementations
| DB | default config |
|---|---|
| Riak | enabled |
| Cassandra | disabled |
| Voldemort | disabled |
5.3.3.1 multi-datacenter operation
Leaderless replication is suitable for multi-DC operation, since it's designed to tolerate
- conflicting concurrent writes
- network interruptions
- latency spikes
| DB | multi-DC support |
|---|---|
| Cassandra, Voldemort | # of replicas n includes nodes in all DCs; can configure # of replicas in each DC; the client only waits for ack from a quorum of nodes within its local DC; async cross-DC replication |
| Riak | keeps all comm between clients & DB nodes local to one DC; async cross-DC replication |
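The "wait only for a local quorum" behavior can be illustrated with a hedged sketch. The helper `write_acknowledged`, the node names, and the DC labels are all hypothetical; the point is only that acks from remote DCs do not count toward what the client waits for.

```python
def write_acknowledged(acks, replica_dc, local_dc, local_quorum):
    """True once enough replicas in the client's own DC have acknowledged."""
    local_acks = sum(1 for node in acks if replica_dc[node] == local_dc)
    return local_acks >= local_quorum


# n = 6 replicas in total, 3 configured per DC (assumed layout).
replica_dc = {
    "a1": "dc1", "a2": "dc1", "a3": "dc1",
    "b1": "dc2", "b2": "dc2", "b3": "dc2",
}

# The client in dc1 can return after 2 of its 3 local replicas ack,
# without waiting on the cross-DC link.
local_done = write_acknowledged({"a1", "a2"}, replica_dc, "dc1", 2)

# Acks from dc2 alone do not satisfy the client in dc1.
remote_only = write_acknowledged({"a1", "b1", "b2", "b3"}, replica_dc, "dc1", 2)

print(local_done, remote_only)  # True False
```

The write is still sent to every replica; the remote DC simply catches up asynchronously.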
Reference
Designing Data-Intensive Applications by Martin Kleppmann