Google SWE teaches systems design | EP23: Conflict-Free Replicated Data Types

Jordan has no life
27 Apr 202213:29

Summary

TLDRThis video script delves into Conflict-Free Replicated Data Types (CRDTs), essential for maintaining data consistency in multi-leader database replication systems. It explains CRDTs' role in conflict resolution, their implementation in collaborative tools like Google Drive and Figma, and their categorization into operation-based and state-based types. The script provides detailed examples, including grow-only counters and sets, and touches on sequence CRDTs, crucial for features like collaborative text editing. The importance of CRDTs in distributed key-value stores like Riak and Redis is highlighted, setting the stage for further exploration of caching and cache consistency.

Takeaways

  • 📚 The script introduces Conflict-Free Replicated Data Types (CRDTs), a technology for managing conflicts in distributed databases with multiple leaders.
  • 🔄 CRDTs allow databases to converge to a consistent state by passing operations or states between replicas, which is crucial for maintaining data integrity in distributed systems.
  • 🛠 CRDTs can implement various data structures such as counters, sets, maps, and lists, which are particularly useful for applications like collaborative text editing.
  • 💡 The script humorously suggests that CRDTs could have been useful in personal relationships for conflict resolution, highlighting their purpose in data systems.
  • 🖥️ Use cases for CRDTs include collaborative editing platforms like Google Drive, Figma, and Office 365, online chat systems, and applications with offline editing capabilities.
  • 🔍 The video will cover distributed key-value stores like Riak and Redis, which utilize CRDTs as a differentiating feature compared to other databases like Cassandra.
  • 📈 There are two main types of CRDTs: operation-based and state-based, each with its advantages depending on the size of the data and the number of operations.
  • 🔢 Operation-based CRDTs are efficient for transmitting small operations rather than large states, making them suitable for systems with infrequent updates.
  • 🗃️ State-based CRDTs involve sending the entire state from one node to another, which can be simple to reason about but may be slow for large states.
  • 🔄 The merge function for CRDTs must be commutative and idempotent to ensure that the data converges correctly after synchronization.
  • 🔑 The script provides examples of CRDTs, such as grow-only counters and sets, explaining how they operate and converge to a consistent state across replicas.

Q & A

  • What are Conflict-free Replicated Data Types (CRDTs)?

    -CRDTs are data structures that allow multiple database leaders to manage and replicate data without conflicts. They ensure that each database can converge to a consistent state after merging changes, even if the operations are out of order or duplicated.

  • Why are CRDTs important in database systems like Dynamo or Cassandra?

    -CRDTs are important because they help mitigate write conflicts in systems with multiple potential leaders, where writes can be sent to different leaders and then need to be replicated and merged. They allow for higher write throughput with eventual consistency.

  • What are the typical use cases for CRDTs?

    -CRDTs are useful in collaborative editing tools like Google Drive, Figma, and Office 365, online chat systems for maintaining message order, offline editing applications like calendar apps that need to sync changes back to the database, and distributed key-value stores such as Riak and Redis.

  • What are the two main types of CRDTs?

    -The two main types of CRDTs are operation-based and state-based. Operation-based CRDTs pass operations like increments or decrements between databases, while state-based CRDTs transmit the entire state of the data structure.

  • How do operation-based CRDTs handle the potential for non-idempotent operations?

    -Operation-based CRDTs ensure idempotency by using mechanisms such as TCP to deduplicate operations or by including an extra key that helps ensure that each operation is only applied once.

  • What is the advantage of using operation-based CRDTs when the state is large?

    -Operation-based CRDTs are advantageous when the state is large because they only transmit the operations that change the state, rather than the entire state itself, reducing the amount of data transmitted over the network.

  • How do state-based CRDTs merge changes from different nodes?

    -State-based CRDTs merge changes by sending the entire state from one node to another and then using a commutative and idempotent merge function, such as taking the element-wise maximum, to combine the states.

  • Can you explain how a grow-only counter CRDT works?

    -A grow-only counter CRDT works by having each database replica keep track of the number of increments it has received. When replicas merge, they take the element-wise maximum of their counters to ensure the counter value reflects all increments across all replicas.

  • What is the difference between a grow-only counter and a counter that supports both increments and decrements?

    -A grow-only counter only tracks increments, while a counter that supports both increments and decrements maintains two arrays: one for increments and one for decrements. The actual counter value is the sum of increments minus the sum of decrements.

  • How do CRDT sets handle the addition and removal of elements?

    -CRDT sets handle additions by keeping an array of added elements and handle removals with an array of removed elements. The set's contents are determined by taking the union of added elements and subtracting any elements in the removed set.

  • What are some of the challenges with sequence CRDTs used for collaborative editing?

    -Sequence CRDTs face challenges with maintaining order during inserts into the middle of a list, ensuring characters are not interleaved, and dealing with complex algorithms required for proper merging and synchronization.

Outlines

00:00

🔄 Introduction to CRDTs and Their Importance

This paragraph introduces Conflict-Free Replicated Data Types (CRDTs), a technology designed to handle conflicts in multi-leader database systems like Dynamo or Cassandra. CRDTs allow databases to converge to a consistent state despite conflicts, which can arise from high write throughput. The speaker humorously suggests that CRDTs could have helped in personal relationships, emphasizing their utility in conflict resolution. CRDTs can implement various data structures like counters, sets, maps, and lists, and are particularly useful in applications requiring collaborative editing, online chat systems, and offline editing sync, such as in calendar apps. The paragraph also mentions that CRDTs are a key differentiator in distributed key-value stores like Riak and Redis, setting the stage for further discussion in subsequent videos.

05:03

📊 Understanding Operation-Based and State-Based CRDTs

The speaker delves into the two main types of CRDTs: operation-based and state-based. Operation-based CRDTs involve passing operations, like increments in a counter, from one database to another, which is efficient for large states but requires ensuring idempotency to avoid duplication issues. State-based CRDTs, on the other hand, involve sending the entire state of a data structure, which can be less efficient for large states but is simpler to reason about. The paragraph explains the importance of commutativity and idempotency in merge functions for both types, and how these properties ensure that CRDTs converge to a consistent state even when operations or states are duplicated due to network issues. The speaker also discusses the suitability of each type depending on the size of the data and the frequency of operations.

10:04

📈 Examples of CRDTs: Grow-Only Counters and Sets

This paragraph provides detailed examples of CRDTs, starting with grow-only counters, which maintain an array of increments on each database replica. The process of incrementing the counter, the eventual consistency through anti-entropy processes or gossip protocols, and the convergence to a unified counter value across replicas are explained. The explanation includes how to handle queries for the counter value before and after convergence. The paragraph then extends the concept to sets, which track elements added and removed, and how the set contents are determined by the union of added elements and the exclusion of removed elements. The merge process for sets is also discussed, highlighting the use of element-wise maximum for both added and removed arrays to ensure convergence. The paragraph concludes with alternative approaches to handle the removal of elements in sets, such as using timestamps or unique tags to prevent permanent removal.

🔍 Conclusion on CRDTs and Preview of Upcoming Topics

In conclusion, the speaker emphasizes the utility of CRDTs in ensuring data convergence in multi-leader database replication systems. They are often used alongside version vectors to track dependencies and determine the concurrency of writes. The speaker also expresses the intention to explore CRDTs further in relation to Riak and other key-value stores, promising a deeper dive into topics like caching and cache consistency in upcoming videos. The paragraph wraps up with an acknowledgment of the complexity of sequence CRDTs and collaborative editing, which will be addressed in future videos, and a sign-off wishing the audience a good day.

Mindmap

Keywords

💡CRDTs

CRDTs, or Conflict-Free Replicated Data Types, are data structures that can be replicated across multiple databases and will eventually converge to a consistent state even if updates occur in parallel at different nodes. They are essential for maintaining consistency in distributed systems, particularly in databases with multiple potential leaders. In the video, CRDTs are highlighted as a key technology for managing conflicts in NoSQL databases like Dynamo-style systems and Cassandra.

💡Key-Value Stores

Key-Value Stores are a type of database that stores data as a collection of key-value pairs. They are simple and fast, making them suitable for scenarios where quick lookups and insertions are necessary. The script discusses moving into topics related to key-value stores, indicating that they are a significant part of the video's exploration of NoSQL databases.

💡Dynamo Style

Dynamo Style refers to a design pattern used in distributed database systems, inspired by the Amazon Dynamo paper. It typically involves a decentralized, highly available, and partition-tolerant system with multiple potential leaders for data replication. The script mentions Dynamo-style systems as an example where CRDTs can be particularly useful.

💡Cassandra

Cassandra is a specific implementation of a NoSQL database that follows the Dynamo-style design. It is known for its high write throughput and eventual consistency model. The script uses Cassandra as an example of a database where right conflicts can occur and where CRDTs can be used to manage these conflicts.

💡Operational Transform

Operational Transform is an algorithm used in collaborative systems to resolve conflicts in a way that allows multiple users to edit a document simultaneously. It is mentioned in the script as an alternative to CRDTs for managing conflicts in collaborative editing scenarios, with a promise of a future video dedicated to this topic.

💡Multi-Leader Replication

Multi-Leader Replication is a replication strategy where more than one node can accept write operations, leading to potential conflicts that need to be resolved. The script discusses this concept in the context of databases like Cassandra and how CRDTs can be used to manage conflicts in such setups.

💡Gossip Protocols

Gossip Protocols are a method of disseminating information in distributed systems, where each node communicates with a random selection of other nodes. The script mentions that these protocols, which can deliver messages out of order and potentially duplicate them, work well with state-based CRDTs.

💡Anti-Entropy Process

Anti-Entropy Process refers to the background process in distributed systems that helps nodes to synchronize their states, ensuring eventual consistency. In the script, it is mentioned as a process that can help CRDTs converge to a consistent state across different replicas.

💡Grow-Only Counter

A Grow-Only Counter is a specific type of CRDT that can only be incremented and not decremented. The script provides an example of a grow-only counter, explaining how each database replica keeps track of increments and eventually merges these counts to maintain consistency.

💡Sequence CRDTs

Sequence CRDTs are a type of CRDT designed to handle ordered lists, which is particularly challenging due to the need to maintain order amidst concurrent insertions. The script briefly mentions sequence CRDTs and their complexity, noting that they will be discussed in more detail in a future video.

💡Riak

Riak is a distributed key-value store database that uses CRDTs to ensure data consistency. The script mentions Riak as an example of a distributed key-value store that utilizes CRDTs, setting it apart from other databases like Cassandra.

Highlights

Introduction to Conflict-Free Replicated Data Types (CRDTs) and their importance in database systems.

CRDTs allow for convergence to a consistent state between database leaders in multi-leader replication setups.

CRDTs can implement data structures like counters, sets, maps, and lists, useful for applications like collaborative text editing.

Use cases of CRDTs include collaborative editing platforms like Google Drive, Figma, and Office 365.

Operational transform is another algorithm for conflict resolution in collaborative editing, to be discussed in a future video.

CRDTs are used in distributed key-value stores like Riak and Redis, differentiating them from databases like Cassandra.

Two main types of CRDTs: operation-based and state-based, each with their own advantages and use cases.

Operation-based CRDTs are beneficial when the state is large and few operations are performed relative to the state size.

State-based CRDTs are simpler to reason about but can be slow when transmitting large states over the network.

Gossip protocols work well with state-based CRDTs for exchanging information out of order and potentially duplicated.

Detailed explanation of a grow-only counter CRDT, demonstrating how databases converge to a consistent counter value.

Counter CRDTs that can be incremented and decremented, using arrays to track increments and decrements.

Sets CRDTs function similarly to grow-only counters, with added and removed element tracking.

Timestamps or unique tags can be used to resolve conflicts in set CRDTs where elements are added and removed.

Sequence CRDTs are complex and used for maintaining order in lists, such as in collaborative editing applications.

CRDTs are essential for ensuring data convergence in multi-leader database replication, often used with version vectors.

Upcoming discussion on key-value stores, caching, and cache consistency in subsequent videos.

Transcripts

play00:00

alrighty so we finally got through all

play00:02

the content we needed to do for the wide

play00:05

column no sql databases which means

play00:07

we're going to start moving into

play00:09

basically all the key value stores and

play00:12

the associated content with that so in

play00:14

order to get started we're going to talk

play00:16

about crdts or conflict-pre-replicated

play00:19

data types

play00:20

and i'll just go ahead and jump into

play00:22

that right now

play00:25

okay crttts what are they

play00:27

well basically in a bunch of the

play00:29

database systems that i've talked about

play00:31

so far aka dynamo style

play00:34

or you know things like cassandra where

play00:36

there's a bunch of possible leaders that

play00:37

rights can be sent to and rights are

play00:39

going to be replicated in some sort of

play00:41

topology or ordering around those

play00:43

leaders there's inevitably going to be

play00:45

right conflicts obviously this comes at

play00:48

the trade-off of higher right throughput

play00:49

which is great but it's still extra

play00:51

things that we have to deal with and

play00:53

think about in order to kind of mitigate

play00:54

those right conflicts

play00:56

as such in the past few years we've seen

play00:58

this new type of technology come around

play01:00

which is called a conflict-free

play01:02

replicated data type crdt for short and

play01:05

basically all those do

play01:07

are it's basically a piece of data that

play01:09

each database leader can keep internally

play01:12

that eventually can be sent from one

play01:14

database to the other and you know upon

play01:16

merging they can kind of converge to a

play01:18

consistent state between them

play01:20

obviously crdts are super useful and if

play01:23

i had had one in my past relationship

play01:26

maybe me and my ex would still be

play01:28

together because we could have used some

play01:29

sort of conflict resolution like that

play01:32

crdts can implement things like counters

play01:34

sets maps and lists

play01:36

and

play01:37

even for something like collaborative

play01:39

text editing they can be really useful

play01:41

especially as far as lists go

play01:43

so

play01:44

that brings us to use cases mainly so in

play01:47

terms of collaborative editing like

play01:48

google drive figma office 365 being able

play01:52

to create a list of items that comes

play01:55

from a multi-leader replication setup is

play01:58

really useful and it's not something

play02:00

that's super easy um there's a ton of

play02:03

challenges involved there and it's still

play02:04

a research topic that is very much

play02:07

ongoing

play02:08

you can also use an algorithm called

play02:10

operational transform for this type of

play02:12

thing but that is a topic for another

play02:14

video

play02:15

there's also online chat systems in

play02:17

terms of ensuring that the ordering of

play02:19

the chats is eventually going to be the

play02:21

same amongst all users

play02:24

there's also anything that involves

play02:26

offline editing like say a calendar app

play02:28

eventually you're gonna have to sync

play02:29

those changes back into the database and

play02:31

so in that sense each offline client is

play02:34

kind of treated like its own database in

play02:36

a multi-leader replication setup and

play02:38

then finally crdts are now used in a

play02:41

decent amount of distributed key value

play02:43

stores such as ryak and redis since

play02:46

these are both things that i'll be

play02:47

talking about in subsequent videos i

play02:49

wanted to first talk about crdts because

play02:51

they're very important differentiator

play02:53

between these and something like

play02:54

cassandra

play02:56

okay so there are two main types of

play02:58

crdts there's operation based and

play03:01

state-based crdts

play03:03

operation based is what it sounds like

play03:05

basically you're going to be passing the

play03:08

operation from database to database so

play03:11

basically for something like a counter

play03:13

instead of passing an entire database's

play03:15

local counter from database to database

play03:17

you'll simply pass the fact that

play03:19

there was an increment operation

play03:21

what's important about this is that even

play03:23

though those operations may be

play03:25

commutative as in it doesn't matter

play03:27

which one is executed first the order of

play03:29

them doesn't matter

play03:30

they may not be item potent which means

play03:33

that if for some reason the network goes

play03:35

ahead and duplicates the fact that a

play03:37

client tried to increment a counter one

play03:39

database node might receive the fact

play03:42

that there were two increments when the

play03:44

other database node thinks there is only

play03:46

one increment so it's important to make

play03:48

sure that these are deduplicated which

play03:50

you can do via something like tcp or

play03:52

maybe you could even just include an

play03:54

extra key

play03:55

that acts

play03:56

as a way to ensure item potency

play03:59

additionally these are really good

play04:00

compared to state-based crdts when the

play04:03

state of whatever it is that we're

play04:05

transmitting or trying to keep track of

play04:07

is very large and expensive to kind of

play04:10

transmit over the network

play04:12

even especially if there are very few

play04:14

operations relative to the size of the

play04:17

crdt itself

play04:19

again operation based crdts are probably

play04:21

the move here

play04:23

that being said these two things are

play04:25

both equivalent mathematically so it

play04:27

really kind of comes down to these

play04:28

trade-offs

play04:29

as far as state-based crdts go

play04:32

for something like say a counter for

play04:34

example like i mentioned you would be

play04:36

sending the entirety of the counter over

play04:38

from one node to another and then the

play04:40

nodes once they receive kind of those

play04:42

remote counters would uh be going ahead

play04:44

and merging in that state

play04:47

the merge function it's very important

play04:49

that it's both commutative and item

play04:51

potent so even if you accidentally

play04:52

duplicate a merge it has no effect and

play04:55

emerge will go ahead and propagate all

play04:57

the previous changes that a node has

play05:00

basically realized or seen in the past

play05:02

in this sense

play05:03

state-based crdts are pretty simple to

play05:05

reason about but like i said they can be

play05:08

kind of slow if you have a lot to

play05:09

transmit over the network

play05:11

finally since gossip protocols deliver

play05:14

messages out of order and since they are

play05:17

potentially duplicated i've mentioned

play05:19

gossip protocols in a prior video on my

play05:20

channel so feel free to look at that

play05:23

they work very well with state-based

play05:24

crdts for kind of exchanging all of this

play05:27

information

play05:28

okay so now i'm going to go through a

play05:30

few examples of crdts i'm going to go

play05:32

through the first one in a lot of depth

play05:34

and then we'll quickly go through the

play05:36

next couple to kind of see how you can

play05:38

grow those from the first example

play05:41

okay

play05:42

so this is a grow only counter which is

play05:44

basically saying we are going to have

play05:47

counters on every single database

play05:48

replica which keep track of how many

play05:51

increments they've received

play05:53

and after merging

play05:55

other information with other replicas

play05:57

imagine there's some sort of

play05:58

anti-entropy process in the background

play06:00

that kind of exchanges information from

play06:02

one node to the other what's going to

play06:04

now happen is that the information is

play06:06

going to be eventually consistent and

play06:08

converging so that every node is

play06:10

somewhat up to date

play06:12

so what's going to happen is that each

play06:13

node in the database is going to start

play06:15

off with an array of zeros this

play06:16

insinuates that all the counters are at

play06:18

zero and that

play06:20

the reason in this case why we only have

play06:22

two elements in the list is there are

play06:23

two replicas

play06:25

so now every single time that one of the

play06:27

replicas handles an increment operation

play06:29

it's going to increment its own local

play06:31

counter so that is its corresponding

play06:34

index in the list that it's holding so

play06:36

as you can see i'm uh you know

play06:38

incrementing uh replica one or basically

play06:41

replica one is handling my increment so

play06:43

it's going to increment the zeroth index

play06:45

of its list which corresponds to it

play06:48

okay client b is going to do the same

play06:50

thing but it's going to be handled by

play06:52

replica 2 and that increment is going to

play06:54

happen five times so now you can see

play06:56

that there's going to be five in that

play06:58

counter so still even though

play07:00

these two separate requests have been

play07:02

handled

play07:03

the two replicas have not synced up with

play07:05

one another yet they're still

play07:06

eventually going to converge but right

play07:08

now they have yet to converge so let's

play07:10

say i want to query the counter value

play07:12

and the request is handled by replica

play07:14

one

play07:15

it's going to return one because it's

play07:16

just going to sum up that local array if

play07:18

i had that query handled by replica 2

play07:20

then it would actually return 5.

play07:23

but eventually what's going to happen is

play07:25

that anti-entropy process which runs in

play07:27

the background or maybe it's a gossip

play07:28

protocol or something like that but

play07:30

anything that transmits the information

play07:32

is going to go ahead and send that array

play07:34

from the second replica to the first and

play07:37

what's going to happen is the first

play07:38

replica is going to say okay for every

play07:40

element in this list

play07:42

i'm going to create a new array which

play07:43

now represents my counter and to get

play07:45

that new array i'm simply going to take

play07:47

the maximum so the element-wise maximum

play07:49

of each index

play07:51

so now

play07:52

obviously now we're going to have one 5

play07:54

instead of the 1-0 that we had before

play07:57

when the changes are propagated

play07:58

propagated from replica 1 to replica 2

play08:01

we're going to call that merge function

play08:02

on replica 2 and now they're both

play08:04

consistent and have one comma 5 as their

play08:06

counter

play08:08

so what this basically means is that as

play08:10

far as each replica is aware replica 1

play08:12

has processed one increment request and

play08:14

replica 2 has processed five increment

play08:16

requests so now if i were to go ahead

play08:19

and query either of them for the total

play08:20

counter value all that's going to happen

play08:23

is we're going to sum up this array and

play08:25

i'm going to get the value six so they

play08:27

have converged if i were to continue to

play08:29

make more increments on either replica

play08:31

they would temporarily again be out of

play08:33

sync but eventually they would come back

play08:35

to the same value and agree that's kind

play08:36

of the point of a crdt

play08:39

so we've discussed the grow only counter

play08:41

but what if i wanted a counter that

play08:42

could both be incremented and

play08:44

decremented well if you remember from

play08:46

the previous diagram

play08:48

what happened was that each replica kind

play08:50

of had a list showing how many

play08:52

increments

play08:53

every single replica has processed as

play08:55

far as it's aware so we're going to now

play08:57

do the same thing but one for all the

play08:59

increments and one array for all the

play09:02

decrements and then to get the actual

play09:03

counter value from one replica you're

play09:06

going to sum up the increments array and

play09:09

then subtract the sum of the decrements

play09:10

array and as far as the merge segment uh

play09:13

basically the merge function goes when

play09:15

uh these replicas are synchronizing with

play09:17

one another

play09:18

uh you basically merge them in the same

play09:20

way as we did the increment array so the

play09:22

increment arrays we do the element wise

play09:24

max and the decrement array we do the

play09:26

element-wise max as well

play09:29

i see that i wrote min here on the slide

play09:30

but it's actually the element-wise max

play09:33

so moving on to sets it's basically the

play09:35

same as the grow only counter with the

play09:38

following adjustments

play09:40

as opposed to using an increments and

play09:42

degrees list held on each replica now

play09:45

we're keeping track of two arrays one

play09:46

for elements added and one for elements

play09:48

removed so to get the set contents

play09:51

basically we take all the elements in

play09:53

the added list and then we just don't

play09:56

count any element that appears in that

play09:58

removed set

play09:59

and then as far as merging them goes we

play10:01

basically just take the union of two

play10:03

added sets and the union of two removed

play10:06

sets

play10:07

so

play10:07

as you can see there's kind of this very

play10:09

similar process of just being able to

play10:11

have that commutative merge function

play10:13

which is idempotent because the union

play10:15

with a set is obviously item potent and

play10:17

in the case of the counters taking the

play10:20

maximum or the element-wise maximum of

play10:22

two lists is obviously going to be item

play10:24

potent

play10:25

so that's how sets would work but as you

play10:27

can see one of the issues with what i

play10:29

just expressed with sets is that once an

play10:31

element is in the remove array it can no

play10:34

longer be in that set again so how can

play10:36

we change this

play10:37

basically to mitigate this some

play10:39

variations have been created

play10:41

one is every set

play10:43

every element in the add and remove set

play10:44

has a timestamp and then you would kind

play10:46

of use the timestamp to basically say oh

play10:48

if there's an element in the added array

play10:50

but it has a more recent timestamp than

play10:52

the same element in the removed array

play10:54

then actually the fact that that element

play10:57

was removed is not valid we can say that

play10:58

that element was once again added

play11:01

additionally another way of getting

play11:03

around this because every time we use

play11:05

timestamps and distributed systems we

play11:07

run the risk of dealing with unreliable

play11:08

clocks

play11:09

is to just attach a unique tag to every

play11:12

single element in the the ad set

play11:14

and then you can go ahead and add that

play11:16

element tag tuple into the remove set as

play11:19

well

play11:20

and if an element tag combination is

play11:22

only in the ad set but not the remove

play11:24

set then you can assume that that

play11:26

element is actually still in basically

play11:29

the merged set or the result of the crdt

play11:33

as far as sequence crdts go and this is

play11:36

kind of what we might use for something

play11:37

like collaborative editing i'm not going

play11:39

to touch too much into these just

play11:40

because they're pretty complicated and

play11:42

they're actually a lot of issues as far

play11:45

as trying to get them to work well and

play11:47

multiple different possible algorithms

play11:49

for getting them to do so

play11:50

it's something that i plan on discussing

play11:52

in a future video but for now just know

play11:54

that the general kind of gist behind

play11:56

this is we're trying to get a list and

play11:59

doing so is really tough because when

play12:00

you're preserving order there are all

play12:02

these inserts into the middle of the

play12:03

list and it's really hard when you're

play12:05

trying to kind of make sure that the

play12:07

characters that you're inserting are not

play12:08

getting interleaved with the characters

play12:10

that someone else is inserting so it's a

play12:12

pretty complicated thing that i will 100

play12:14

discuss it in a future video along with

play12:16

operational transform which i believe is

play12:18

the current algorithm that google docs

play12:20

uses for collaborative text editing

play12:23

okay

play12:24

so in conclusion crdts are super useful

play12:27

data structures to ensure convergence in

play12:30

a multi-leader database replication

play12:32

schema

play12:33

they're generally used in conjunction

play12:34

with or with kind of like a similar

play12:36

design pattern to version vectors which

play12:38

if you remember from my multi-leader

play12:40

replication video is basically keeping

play12:42

track of the dependencies of every

play12:44

single write and using this to determine

play12:46

when two writes are concurrent or if

play12:48

they're not concurrent

play12:50

since this is a really differentiating

play12:53

feature of ryak versus something like

play12:55

cassandra i figured it was really

play12:56

important to kind of understand how

play12:58

crdts actually work and give a few

play13:00

examples of them before actually looking

play13:02

at kind of the ryac breakdown itself

play13:05

because i didn't want to just

play13:06

off-handedly mention crdts and then be

play13:09

like oh yeah ryak has those that

play13:10

wouldn't really be fair to you guys

play13:12

so all in all i hope this video was

play13:14

useful

play13:15

i look forward to getting back into the

play13:16

key value stores because then we're

play13:18

going to get to touch upon some more new

play13:19

things that are really important

play13:21

like caching and cache consistency and

play13:24

that's all going to come into play soon

play13:25

so yeah things are really coming

play13:26

together guys and have a good day

Rate This

5.0 / 5 (0 votes)

Related Tags
CRDTsData SynchronizationMulti-LeaderDatabasesConflict ResolutionDistributed SystemsGoogle DocsCollaborative EditingGossip ProtocolOperational Transform