
Designing DIA note 114 -- keep systems in sync

星辰破 · 简书 (Jianshu) · 2019-07-14 09:35

11.2 DB & stream

  • we saw that log-based MQs (message brokers) have successfully taken ideas from DBs and applied them to messaging; we can also go in reverse and apply ideas from messaging and streams to DBs
  • an event may also be a write to a DB ==> there is a fundamental connection between DBs & streams
  • a replication log is a stream of DB write events, describing the data changes that occurred

state machine replication principle

if every event represents a write to the database, and every replica processes the same events in the same order, then the replicas will all end up in the same final state.
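A minimal sketch of the principle, assuming a replica's state is just a key-value dict and every event is a hypothetical `WriteEvent(key, value)` that sets a key. Applying an event is deterministic, so replaying the same log in the same order always gives the same state; replaying it in a different order can give a different one.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class WriteEvent:
    """Hypothetical replication-log entry: set `key` to `value`."""
    key: str
    value: str


@dataclass
class Replica:
    """A deterministic state machine whose state is a key-value dict."""
    state: Dict[str, str] = field(default_factory=dict)

    def apply(self, event: WriteEvent) -> None:
        # Same state + same event always yields the same new state.
        self.state[event.key] = event.value


def replay(log: List[WriteEvent]) -> Dict[str, str]:
    """Build a fresh replica by processing the log strictly in order."""
    replica = Replica()
    for event in log:
        replica.apply(event)
    return replica.state


log = [WriteEvent("x", "A"), WriteEvent("y", "1"), WriteEvent("x", "B")]

# Two replicas fed the same events in the same order converge.
print(replay(log) == replay(list(log)))  # True

# The same events in a different order can leave a different final state.
print(replay(log)["x"], replay(list(reversed(log)))["x"])  # B A
```

The ordering requirement is the whole point: the events themselves are identical in both replays, only their order differs.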

11.2.1 Keeping Systems in Sync

  • there's no single system that can satisfy all data storage, querying, and processing needs
  • most nontrivial apps need to combine several different technologies to satisfy their requirements, each holding its own copy of the data

example

  • OLTP DB => serve user requests
  • cache => speed up common requests
  • full-text index => handle search queries
  • DW => analytics

  • as the same data appears in different places (DB, cache, search index, DW), they need to be kept in sync
  • in a DW this sync is usually done by a batch ETL process: take a full copy of the DB, transform it, and bulk-load it into the DW
  • if periodic full DB dumps are too slow, an alternative is dual writes: whenever data changes, the app code explicitly writes to each of the systems, either sequentially or concurrently
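A minimal sketch of dual writes, assuming in-memory dicts stand in for the DB, the cache and a word-to-ids search index; `DualWriteApp` and its methods are hypothetical names for illustration. The point is that the application code, not any of the stores, is responsible for pushing each change everywhere.

```python
from typing import Dict, Set


class DualWriteApp:
    """Hypothetical app that keeps three stores in sync via dual writes."""

    def __init__(self) -> None:
        self.db: Dict[str, str] = {}                 # system of record (OLTP DB)
        self.cache: Dict[str, str] = {}              # read cache
        self.search_index: Dict[str, Set[str]] = {}  # word -> ids of matching docs

    def save(self, doc_id: str, text: str) -> None:
        # Dual writes: the app itself writes the change to each system in turn.
        # Nothing makes these writes atomic, or orders them consistently with
        # writes coming from other clients.
        self.db[doc_id] = text
        self.cache[doc_id] = text
        self._reindex(doc_id, text)

    def _reindex(self, doc_id: str, text: str) -> None:
        # Drop the old postings for this doc, then add postings for the new text.
        for ids in self.search_index.values():
            ids.discard(doc_id)
        for word in text.split():
            self.search_index.setdefault(word, set()).add(doc_id)

    def search(self, word: str) -> Set[str]:
        return set(self.search_index.get(word, set()))


app = DualWriteApp()
app.save("doc1", "hello world")
app.save("doc1", "goodbye world")
print(app.search("hello"), app.search("goodbye"))  # set() {'doc1'}
```

With a single client and no failures this works; the problems below appear once writes are concurrent or one of the stores fails.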

problems with dual writes

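  • concurrency problem: race condition -- two clients concurrently update the same item X; the DB and the search index may receive their writes in different orders, so the two systems end up permanently inconsistent even though no error occurred
    ==> solution: an additional concurrency detection mechanism (e.g. version vectors) is required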
Figure 11-4. In the database, X is first set to A and then to B, while at the search index the writes arrive in the opposite order.
  • fault-tolerance problem: one of the writes may fail while the other succeeds, leaving the 2 systems inconsistent
    ==> solution: ensure both writes either succeed or fail together, i.e. atomic commit (e.g. 2PC), which is expensive to solve
  • multi-leader replication conflicts: the DB has a leader and the search index has a leader, but neither follows the other, so conflicts can occur just as in multi-leader replication
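A small simulation of the first two problems, assuming plain dicts as the DB and the search index. The interleaving from Figure 11-4 is hard-coded rather than produced by real threads, and `IndexUnavailable` / `dual_write` are hypothetical names for a failing second write.

```python
from typing import Dict, List, Tuple

Write = Tuple[str, str]  # (key, value)


def apply_in_order(writes: List[Write]) -> Dict[str, str]:
    """Apply writes to an empty key-value store in the order they arrive."""
    store: Dict[str, str] = {}
    for key, value in writes:
        store[key] = value
    return store


# 1) Race condition (Figure 11-4): client 1 sets X = A, client 2 sets X = B.
# The DB receives A then B, while the search index receives B then A.
db = apply_in_order([("X", "A"), ("X", "B")])
search_index = apply_in_order([("X", "B"), ("X", "A")])
print(db["X"], search_index["X"])  # B A  (no error raised, yet inconsistent)


# 2) Partial failure: the first write succeeds, the second one fails.
class IndexUnavailable(Exception):
    """Hypothetical error for a search index that is down."""


def dual_write(db: Dict[str, str], index: Dict[str, str],
               key: str, value: str, index_up: bool) -> None:
    db[key] = value                    # write 1: DB, succeeds
    if not index_up:
        raise IndexUnavailable(key)    # write 2: index, fails
    index[key] = value


db2: Dict[str, str] = {}
index2: Dict[str, str] = {}
try:
    dual_write(db2, index2, "X", "C", index_up=False)
except IndexUnavailable:
    pass  # the DB write is not rolled back: there is no atomic commit here
print(db2, index2)  # {'X': 'C'} {}
```

In both cases each store individually did exactly what it was told; the inconsistency comes from there being no single authority that fixes the order of writes or makes them all-or-nothing.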

Reference
Designing Data-Intensive Applications by Martin Kleppmann