Google SWE teaches systems design | EP23: Conflict-Free Replicated Data Types
Summary
TLDRThis video script delves into Conflict-Free Replicated Data Types (CRDTs), essential for maintaining data consistency in multi-leader database replication systems. It explains CRDTs' role in conflict resolution, their implementation in collaborative tools like Google Drive and Figma, and their categorization into operation-based and state-based types. The script provides detailed examples, including grow-only counters and sets, and touches on sequence CRDTs, crucial for features like collaborative text editing. The importance of CRDTs in distributed key-value stores like Riak and Redis is highlighted, setting the stage for further exploration of caching and cache consistency.
Takeaways
- 📚 The script introduces Conflict-Free Replicated Data Types (CRDTs), a technology for managing conflicts in distributed databases with multiple leaders.
- 🔄 CRDTs allow databases to converge to a consistent state by passing operations or states between replicas, which is crucial for maintaining data integrity in distributed systems.
- 🛠 CRDTs can implement various data structures such as counters, sets, maps, and lists, which are particularly useful for applications like collaborative text editing.
- 💡 The script humorously suggests that CRDTs could have been useful in personal relationships for conflict resolution, highlighting their purpose in data systems.
- 🖥️ Use cases for CRDTs include collaborative editing platforms like Google Drive, Figma, and Office 365, online chat systems, and applications with offline editing capabilities.
- 🔍 The video will cover distributed key-value stores like Riak and Redis, which utilize CRDTs as a differentiating feature compared to other databases like Cassandra.
- 📈 There are two main types of CRDTs: operation-based and state-based, each with its advantages depending on the size of the data and the number of operations.
- 🔢 Operation-based CRDTs are efficient for transmitting small operations rather than large states, making them suitable for systems with infrequent updates.
- 🗃️ State-based CRDTs involve sending the entire state from one node to another, which can be simple to reason about but may be slow for large states.
- 🔄 The merge function for CRDTs must be commutative and idempotent to ensure that the data converges correctly after synchronization.
- 🔑 The script provides examples of CRDTs, such as grow-only counters and sets, explaining how they operate and converge to a consistent state across replicas.
Q & A
What are Conflict-free Replicated Data Types (CRDTs)?
-CRDTs are data structures that allow multiple database leaders to manage and replicate data without conflicts. They ensure that each database can converge to a consistent state after merging changes, even if the operations are out of order or duplicated.
Why are CRDTs important in database systems like Dynamo or Cassandra?
-CRDTs are important because they help mitigate write conflicts in systems with multiple potential leaders, where writes can be sent to different leaders and then need to be replicated and merged. They allow for higher write throughput with eventual consistency.
What are the typical use cases for CRDTs?
-CRDTs are useful in collaborative editing tools like Google Drive, Figma, and Office 365, online chat systems for maintaining message order, offline editing applications like calendar apps that need to sync changes back to the database, and distributed key-value stores such as Riak and Redis.
What are the two main types of CRDTs?
-The two main types of CRDTs are operation-based and state-based. Operation-based CRDTs pass operations like increments or decrements between databases, while state-based CRDTs transmit the entire state of the data structure.
How do operation-based CRDTs handle the potential for non-idempotent operations?
-Operation-based CRDTs ensure idempotency by using mechanisms such as TCP to deduplicate operations or by including an extra key that helps ensure that each operation is only applied once.
What is the advantage of using operation-based CRDTs when the state is large?
-Operation-based CRDTs are advantageous when the state is large because they only transmit the operations that change the state, rather than the entire state itself, reducing the amount of data transmitted over the network.
How do state-based CRDTs merge changes from different nodes?
-State-based CRDTs merge changes by sending the entire state from one node to another and then using a commutative and idempotent merge function, such as taking the element-wise maximum, to combine the states.
Can you explain how a grow-only counter CRDT works?
-A grow-only counter CRDT works by having each database replica keep track of the number of increments it has received. When replicas merge, they take the element-wise maximum of their counters to ensure the counter value reflects all increments across all replicas.
What is the difference between a grow-only counter and a counter that supports both increments and decrements?
-A grow-only counter only tracks increments, while a counter that supports both increments and decrements maintains two arrays: one for increments and one for decrements. The actual counter value is the sum of increments minus the sum of decrements.
How do CRDT sets handle the addition and removal of elements?
-CRDT sets handle additions by keeping an array of added elements and handle removals with an array of removed elements. The set's contents are determined by taking the union of added elements and subtracting any elements in the removed set.
What are some of the challenges with sequence CRDTs used for collaborative editing?
-Sequence CRDTs face challenges with maintaining order during inserts into the middle of a list, ensuring characters are not interleaved, and dealing with complex algorithms required for proper merging and synchronization.
Outlines
🔄 Introduction to CRDTs and Their Importance
This paragraph introduces Conflict-Free Replicated Data Types (CRDTs), a technology designed to handle conflicts in multi-leader database systems like Dynamo or Cassandra. CRDTs allow databases to converge to a consistent state despite conflicts, which can arise from high write throughput. The speaker humorously suggests that CRDTs could have helped in personal relationships, emphasizing their utility in conflict resolution. CRDTs can implement various data structures like counters, sets, maps, and lists, and are particularly useful in applications requiring collaborative editing, online chat systems, and offline editing sync, such as in calendar apps. The paragraph also mentions that CRDTs are a key differentiator in distributed key-value stores like Riak and Redis, setting the stage for further discussion in subsequent videos.
📊 Understanding Operation-Based and State-Based CRDTs
The speaker delves into the two main types of CRDTs: operation-based and state-based. Operation-based CRDTs involve passing operations, like increments in a counter, from one database to another, which is efficient for large states but requires ensuring idempotency to avoid duplication issues. State-based CRDTs, on the other hand, involve sending the entire state of a data structure, which can be less efficient for large states but is simpler to reason about. The paragraph explains the importance of commutativity and idempotency in merge functions for both types, and how these properties ensure that CRDTs converge to a consistent state even when operations or states are duplicated due to network issues. The speaker also discusses the suitability of each type depending on the size of the data and the frequency of operations.
📈 Examples of CRDTs: Grow-Only Counters and Sets
This paragraph provides detailed examples of CRDTs, starting with grow-only counters, which maintain an array of increments on each database replica. The process of incrementing the counter, the eventual consistency through anti-entropy processes or gossip protocols, and the convergence to a unified counter value across replicas are explained. The explanation includes how to handle queries for the counter value before and after convergence. The paragraph then extends the concept to sets, which track elements added and removed, and how the set contents are determined by the union of added elements and the exclusion of removed elements. The merge process for sets is also discussed, highlighting the use of element-wise maximum for both added and removed arrays to ensure convergence. The paragraph concludes with alternative approaches to handle the removal of elements in sets, such as using timestamps or unique tags to prevent permanent removal.
🔍 Conclusion on CRDTs and Preview of Upcoming Topics
In conclusion, the speaker emphasizes the utility of CRDTs in ensuring data convergence in multi-leader database replication systems. They are often used alongside version vectors to track dependencies and determine the concurrency of writes. The speaker also expresses the intention to explore CRDTs further in relation to Riak and other key-value stores, promising a deeper dive into topics like caching and cache consistency in upcoming videos. The paragraph wraps up with an acknowledgment of the complexity of sequence CRDTs and collaborative editing, which will be addressed in future videos, and a sign-off wishing the audience a good day.
Mindmap
Keywords
💡CRDTs
💡Key-Value Stores
💡Dynamo Style
💡Cassandra
💡Operational Transform
💡Multi-Leader Replication
💡Gossip Protocols
💡Anti-Entropy Process
💡Grow-Only Counter
💡Sequence CRDTs
💡Riak
Highlights
Introduction to Conflict-Free Replicated Data Types (CRDTs) and their importance in database systems.
CRDTs allow for convergence to a consistent state between database leaders in multi-leader replication setups.
CRDTs can implement data structures like counters, sets, maps, and lists, useful for applications like collaborative text editing.
Use cases of CRDTs include collaborative editing platforms like Google Drive, Figma, and Office 365.
Operational transform is another algorithm for conflict resolution in collaborative editing, to be discussed in a future video.
CRDTs are used in distributed key-value stores like Riak and Redis, differentiating them from databases like Cassandra.
Two main types of CRDTs: operation-based and state-based, each with their own advantages and use cases.
Operation-based CRDTs are beneficial when the state is large and few operations are performed relative to the state size.
State-based CRDTs are simpler to reason about but can be slow when transmitting large states over the network.
Gossip protocols work well with state-based CRDTs for exchanging information out of order and potentially duplicated.
Detailed explanation of a grow-only counter CRDT, demonstrating how databases converge to a consistent counter value.
Counter CRDTs that can be incremented and decremented, using arrays to track increments and decrements.
Sets CRDTs function similarly to grow-only counters, with added and removed element tracking.
Timestamps or unique tags can be used to resolve conflicts in set CRDTs where elements are added and removed.
Sequence CRDTs are complex and used for maintaining order in lists, such as in collaborative editing applications.
CRDTs are essential for ensuring data convergence in multi-leader database replication, often used with version vectors.
Upcoming discussion on key-value stores, caching, and cache consistency in subsequent videos.
Transcripts
alrighty so we finally got through all
the content we needed to do for the wide
column no sql databases which means
we're going to start moving into
basically all the key value stores and
the associated content with that so in
order to get started we're going to talk
about crdts or conflict-pre-replicated
data types
and i'll just go ahead and jump into
that right now
okay crttts what are they
well basically in a bunch of the
database systems that i've talked about
so far aka dynamo style
or you know things like cassandra where
there's a bunch of possible leaders that
rights can be sent to and rights are
going to be replicated in some sort of
topology or ordering around those
leaders there's inevitably going to be
right conflicts obviously this comes at
the trade-off of higher right throughput
which is great but it's still extra
things that we have to deal with and
think about in order to kind of mitigate
those right conflicts
as such in the past few years we've seen
this new type of technology come around
which is called a conflict-free
replicated data type crdt for short and
basically all those do
are it's basically a piece of data that
each database leader can keep internally
that eventually can be sent from one
database to the other and you know upon
merging they can kind of converge to a
consistent state between them
obviously crdts are super useful and if
i had had one in my past relationship
maybe me and my ex would still be
together because we could have used some
sort of conflict resolution like that
crdts can implement things like counters
sets maps and lists
and
even for something like collaborative
text editing they can be really useful
especially as far as lists go
so
that brings us to use cases mainly so in
terms of collaborative editing like
google drive figma office 365 being able
to create a list of items that comes
from a multi-leader replication setup is
really useful and it's not something
that's super easy um there's a ton of
challenges involved there and it's still
a research topic that is very much
ongoing
you can also use an algorithm called
operational transform for this type of
thing but that is a topic for another
video
there's also online chat systems in
terms of ensuring that the ordering of
the chats is eventually going to be the
same amongst all users
there's also anything that involves
offline editing like say a calendar app
eventually you're gonna have to sync
those changes back into the database and
so in that sense each offline client is
kind of treated like its own database in
a multi-leader replication setup and
then finally crdts are now used in a
decent amount of distributed key value
stores such as ryak and redis since
these are both things that i'll be
talking about in subsequent videos i
wanted to first talk about crdts because
they're very important differentiator
between these and something like
cassandra
okay so there are two main types of
crdts there's operation based and
state-based crdts
operation based is what it sounds like
basically you're going to be passing the
operation from database to database so
basically for something like a counter
instead of passing an entire database's
local counter from database to database
you'll simply pass the fact that
there was an increment operation
what's important about this is that even
though those operations may be
commutative as in it doesn't matter
which one is executed first the order of
them doesn't matter
they may not be item potent which means
that if for some reason the network goes
ahead and duplicates the fact that a
client tried to increment a counter one
database node might receive the fact
that there were two increments when the
other database node thinks there is only
one increment so it's important to make
sure that these are deduplicated which
you can do via something like tcp or
maybe you could even just include an
extra key
that acts
as a way to ensure item potency
additionally these are really good
compared to state-based crdts when the
state of whatever it is that we're
transmitting or trying to keep track of
is very large and expensive to kind of
transmit over the network
even especially if there are very few
operations relative to the size of the
crdt itself
again operation based crdts are probably
the move here
that being said these two things are
both equivalent mathematically so it
really kind of comes down to these
trade-offs
as far as state-based crdts go
for something like say a counter for
example like i mentioned you would be
sending the entirety of the counter over
from one node to another and then the
nodes once they receive kind of those
remote counters would uh be going ahead
and merging in that state
the merge function it's very important
that it's both commutative and item
potent so even if you accidentally
duplicate a merge it has no effect and
emerge will go ahead and propagate all
the previous changes that a node has
basically realized or seen in the past
in this sense
state-based crdts are pretty simple to
reason about but like i said they can be
kind of slow if you have a lot to
transmit over the network
finally since gossip protocols deliver
messages out of order and since they are
potentially duplicated i've mentioned
gossip protocols in a prior video on my
channel so feel free to look at that
they work very well with state-based
crdts for kind of exchanging all of this
information
okay so now i'm going to go through a
few examples of crdts i'm going to go
through the first one in a lot of depth
and then we'll quickly go through the
next couple to kind of see how you can
grow those from the first example
okay
so this is a grow only counter which is
basically saying we are going to have
counters on every single database
replica which keep track of how many
increments they've received
and after merging
other information with other replicas
imagine there's some sort of
anti-entropy process in the background
that kind of exchanges information from
one node to the other what's going to
now happen is that the information is
going to be eventually consistent and
converging so that every node is
somewhat up to date
so what's going to happen is that each
node in the database is going to start
off with an array of zeros this
insinuates that all the counters are at
zero and that
the reason in this case why we only have
two elements in the list is there are
two replicas
so now every single time that one of the
replicas handles an increment operation
it's going to increment its own local
counter so that is its corresponding
index in the list that it's holding so
as you can see i'm uh you know
incrementing uh replica one or basically
replica one is handling my increment so
it's going to increment the zeroth index
of its list which corresponds to it
okay client b is going to do the same
thing but it's going to be handled by
replica 2 and that increment is going to
happen five times so now you can see
that there's going to be five in that
counter so still even though
these two separate requests have been
handled
the two replicas have not synced up with
one another yet they're still
eventually going to converge but right
now they have yet to converge so let's
say i want to query the counter value
and the request is handled by replica
one
it's going to return one because it's
just going to sum up that local array if
i had that query handled by replica 2
then it would actually return 5.
but eventually what's going to happen is
that anti-entropy process which runs in
the background or maybe it's a gossip
protocol or something like that but
anything that transmits the information
is going to go ahead and send that array
from the second replica to the first and
what's going to happen is the first
replica is going to say okay for every
element in this list
i'm going to create a new array which
now represents my counter and to get
that new array i'm simply going to take
the maximum so the element-wise maximum
of each index
so now
obviously now we're going to have one 5
instead of the 1-0 that we had before
when the changes are propagated
propagated from replica 1 to replica 2
we're going to call that merge function
on replica 2 and now they're both
consistent and have one comma 5 as their
counter
so what this basically means is that as
far as each replica is aware replica 1
has processed one increment request and
replica 2 has processed five increment
requests so now if i were to go ahead
and query either of them for the total
counter value all that's going to happen
is we're going to sum up this array and
i'm going to get the value six so they
have converged if i were to continue to
make more increments on either replica
they would temporarily again be out of
sync but eventually they would come back
to the same value and agree that's kind
of the point of a crdt
so we've discussed the grow only counter
but what if i wanted a counter that
could both be incremented and
decremented well if you remember from
the previous diagram
what happened was that each replica kind
of had a list showing how many
increments
every single replica has processed as
far as it's aware so we're going to now
do the same thing but one for all the
increments and one array for all the
decrements and then to get the actual
counter value from one replica you're
going to sum up the increments array and
then subtract the sum of the decrements
array and as far as the merge segment uh
basically the merge function goes when
uh these replicas are synchronizing with
one another
uh you basically merge them in the same
way as we did the increment array so the
increment arrays we do the element wise
max and the decrement array we do the
element-wise max as well
i see that i wrote min here on the slide
but it's actually the element-wise max
so moving on to sets it's basically the
same as the grow only counter with the
following adjustments
as opposed to using an increments and
degrees list held on each replica now
we're keeping track of two arrays one
for elements added and one for elements
removed so to get the set contents
basically we take all the elements in
the added list and then we just don't
count any element that appears in that
removed set
and then as far as merging them goes we
basically just take the union of two
added sets and the union of two removed
sets
so
as you can see there's kind of this very
similar process of just being able to
have that commutative merge function
which is idempotent because the union
with a set is obviously item potent and
in the case of the counters taking the
maximum or the element-wise maximum of
two lists is obviously going to be item
potent
so that's how sets would work but as you
can see one of the issues with what i
just expressed with sets is that once an
element is in the remove array it can no
longer be in that set again so how can
we change this
basically to mitigate this some
variations have been created
one is every set
every element in the add and remove set
has a timestamp and then you would kind
of use the timestamp to basically say oh
if there's an element in the added array
but it has a more recent timestamp than
the same element in the removed array
then actually the fact that that element
was removed is not valid we can say that
that element was once again added
additionally another way of getting
around this because every time we use
timestamps and distributed systems we
run the risk of dealing with unreliable
clocks
is to just attach a unique tag to every
single element in the the ad set
and then you can go ahead and add that
element tag tuple into the remove set as
well
and if an element tag combination is
only in the ad set but not the remove
set then you can assume that that
element is actually still in basically
the merged set or the result of the crdt
as far as sequence crdts go and this is
kind of what we might use for something
like collaborative editing i'm not going
to touch too much into these just
because they're pretty complicated and
they're actually a lot of issues as far
as trying to get them to work well and
multiple different possible algorithms
for getting them to do so
it's something that i plan on discussing
in a future video but for now just know
that the general kind of gist behind
this is we're trying to get a list and
doing so is really tough because when
you're preserving order there are all
these inserts into the middle of the
list and it's really hard when you're
trying to kind of make sure that the
characters that you're inserting are not
getting interleaved with the characters
that someone else is inserting so it's a
pretty complicated thing that i will 100
discuss it in a future video along with
operational transform which i believe is
the current algorithm that google docs
uses for collaborative text editing
okay
so in conclusion crdts are super useful
data structures to ensure convergence in
a multi-leader database replication
schema
they're generally used in conjunction
with or with kind of like a similar
design pattern to version vectors which
if you remember from my multi-leader
replication video is basically keeping
track of the dependencies of every
single write and using this to determine
when two writes are concurrent or if
they're not concurrent
since this is a really differentiating
feature of ryak versus something like
cassandra i figured it was really
important to kind of understand how
crdts actually work and give a few
examples of them before actually looking
at kind of the ryac breakdown itself
because i didn't want to just
off-handedly mention crdts and then be
like oh yeah ryak has those that
wouldn't really be fair to you guys
so all in all i hope this video was
useful
i look forward to getting back into the
key value stores because then we're
going to get to touch upon some more new
things that are really important
like caching and cache consistency and
that's all going to come into play soon
so yeah things are really coming
together guys and have a good day
Weitere ähnliche Videos ansehen
Choosing a Database for Systems Design: All you need to know in one video
Google SWE teaches systems design | EP26: Redis and Memcached Explained (While Drunk?)
DS201.12 Replication | Foundations of Apache Cassandra
Google SWE teaches systems design | EP20: Coordination Services
Redis Tutorial for Beginners #1 - What is Redis?
How indexes work in Distributed Databases, their trade-offs, and challenges
5.0 / 5 (0 votes)