Google SWE teaches systems design | EP20: Coordination Services

Jordan has no life
24 Apr 2022 · 11:41

Summary

TL;DR: In this video, the presenter dives into the concept of coordination services in distributed systems, emphasizing their role in maintaining shared state and cluster configuration. Key examples like ZooKeeper and etcd are mentioned, highlighting their reliance on consensus algorithms for high availability and their default of eventually consistent reads. The video also explores methods to achieve strong consistency, such as reading from the leader, using the sync command, and the emerging idea of quorum reads. The presenter wraps up by stressing the importance of coordination services in modern data storage systems and the trade-off between consistency and performance.

Takeaways

  • 🏋️ The video is about systems design, focusing on coordination services in distributed systems.
  • 📚 The speaker plans to discuss HBase, a type of database, but first needs to cover coordination services and related technologies.
  • 🌐 Coordination services are essential for maintaining shared state or information in a cluster of nodes in a distributed system.
  • 🔑 They include details like IP addresses, node partitions, and the status of nodes (alive or down), as well as distributed locks.
  • 🤹‍♂️ Examples of coordination services mentioned are ZooKeeper and etcd, which provide centralized systems for metadata about the cluster.
  • 📈 These services are designed for read-heavy workloads, with a focus on eventual consistency rather than strong consistency for efficiency.
  • 🔒 Coordination services ensure data is highly available through replication and consensus algorithms like Paxos, Raft, or Zab.
  • 🔄 Monotonic reads are supported to prevent time from appearing to move backward, ensuring reads progress forward.
  • 👀 Watches can be attached to keys or files to receive notifications if the data changes, similar to serializable snapshot isolation in databases.
  • 🔑 Three methods for achieving strong consistency in coordination services are discussed: reading from the leader, using the 'sync' operation, and exploring quorum reads.
  • 🚀 The importance of coordination services in modern data storage systems is highlighted, noting their role in maintaining system integrity and performance.

Q & A

  • What is the main purpose of a coordination service in a distributed system?

    -The main purpose of a coordination service in a distributed system is to maintain shared state or information about the configuration of the cluster, including IP addresses of nodes, partitions, and the status of nodes, as well as facilitating distributed locks and consensus for certain operations.

  • Can you name two examples of coordination services mentioned in the script?

    -Two examples of coordination services mentioned in the script are ZooKeeper and etcd.

  • What does the term 'highly available' imply for coordination services?

    -For coordination services, 'highly available' means that they are replicated key-value stores built on a consensus layer, ensuring that the system remains operational and accessible even if some of the nodes fail.

  • Why are coordination services generally designed for read-heavy workloads?

    -Coordination services are designed for read-heavy workloads because writes require consensus among nodes, which can be slow, whereas reads can be scaled linearly with the number of nodes, making them more efficient for such workloads.

  • What is the significance of monotonic reads in coordination services?

    -Monotonic reads ensure that when a client reads from one replica and then from another, the second read is not more outdated than the first, thus preventing the illusion of time moving backward and maintaining a consistent view of the system state.

  • What is a 'watch' in the context of coordination services?

    -A 'watch' in coordination services is a mechanism that allows a client to attach to a key or file, receiving notifications if the watched item changes before the client's transaction is complete, enabling the client to retry operations or make informed decisions.

  • How does the sync operation help achieve stronger consistency in coordination services?

    -The sync operation writes a command into the replicated log, allowing clients to read only from replicas that have the sync included in their log, ensuring that all subsequent reads are from a point in time after the sync was committed.

  • What is the potential issue with using quorum reads to achieve strong consistency?

    -Quorum reads can face a race condition where the leader has committed a value locally but other replicas have not yet received the commit message, leading to a situation where a quorum read might return an outdated value.

  • Why might reading from the leader be problematic for coordination services?

    -Reading from the leader can be problematic because the leader is already handling all write operations and communicating with other nodes, so additional read requests could overload the leader and slow down the system.

  • What is the trade-off when choosing to achieve strong consistency in coordination services?

    -Achieving strong consistency in coordination services comes at the cost of reduced performance, as it requires more communication and coordination among nodes, which can slow down the system.

  • How do coordination services differ from gossip protocols in terms of data storage and propagation?

    -Coordination services store data in a centralized, replicated location and use a consensus algorithm for data updates, ensuring atomicity and total ordering. In contrast, gossip protocols involve nodes randomly passing information to each other until the information is propagated throughout the system.

Outlines

00:00

😀 Introduction to Coordination Services in Distributed Systems

The speaker begins by greeting new subscribers and briefly explains the purpose of the video, which is to discuss coordination services in distributed systems. They mention their plan to cover HBase, a type of database, but first, they need to explain the prerequisite technologies. The speaker then defines a coordination service as a centralized system that maintains shared state and metadata about a cluster, including node IP addresses, partitions, and distributed locks. Examples of coordination services are ZooKeeper and etcd. The video aims to provide a general understanding of how these services work, rather than focusing on specific details of any one service.

05:02

🔒 The Role of Coordination Services in Ensuring High Availability

This paragraph delves into the specifics of coordination services, emphasizing their role in maintaining high availability through replication and consensus. The speaker explains that these services are key-value stores built on top of a consensus layer, which maintains a replicated log across all nodes. They discuss the trade-off between write performance and consistency, noting that coordination services are optimized for read-heavy workloads and do not provide strongly consistent reads by default. The concept of monotonic reads is introduced to prevent reads from appearing to move backward in time, and watches are highlighted as a way to preserve predicate validity by notifying clients when data they have read changes.

10:02

🔑 Achieving Strong Consistency in Coordination Services

The speaker explores the methods to achieve strong consistency in coordination services, which is not inherently provided due to performance considerations. Three approaches are discussed: reading from the leader, which is reliable but can overload the leader; using the 'sync' operation, which records a position in the log to ensure all subsequent reads are from up-to-date replicas; and quorum reads, which involve reading from a majority of nodes to get the most current value. The paragraph also addresses the challenges and potential race conditions associated with quorum reads, and how they might be improved with further research. The importance of coordination services in modern data storage systems is reiterated, and the video concludes with a reminder of the trade-offs between consistency and performance.

Keywords

💡Coordination Service

A coordination service is a system used in distributed computing to manage shared state and configuration information across a cluster of nodes. It is essential for maintaining information such as IP addresses of nodes, partition assignments, and node membership status. The script discusses coordination services like ZooKeeper and etcd, highlighting their role in ensuring distributed locks and consensus for various operations within a distributed system.

💡Distributed System

A distributed system is a network of computers that work together to achieve a common goal. The video script emphasizes the need for coordination services in distributed systems to manage shared state and configuration, illustrating the complexity of keeping track of multiple nodes and their states within such systems.

💡ZooKeeper

ZooKeeper is a specific example of a coordination service mentioned in the script. It is used to maintain configuration information and to provide distributed synchronization and group services. The script explains that ZooKeeper behaves almost like a file system, except that files (znodes) can have child files, and that it is built on a consensus layer.
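As a rough illustration of that file-system-like model, here is a minimal sketch using kazoo, a Python ZooKeeper client; the connection string, znode paths, and data values are made up for the example and are not taken from the video.

```python
# Sketch of ZooKeeper's hierarchical znode model using the kazoo client.
# The host address and paths below are illustrative only.
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

# Znodes form a tree: a "file" can have child "files".
zk.ensure_path("/cluster/nodes")
zk.create("/cluster/nodes/node-1", b"10.0.0.5:9092", ephemeral=True)

# Read the value back and list the children of the parent znode.
data, stat = zk.get("/cluster/nodes/node-1")
children = zk.get_children("/cluster/nodes")
print(data, stat.version, children)

zk.stop()
```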

💡Consensus Algorithm

A consensus algorithm is a process that allows multiple nodes in a distributed system to agree on a single data value or a series of operations. The script mentions that coordination services like ZooKeeper use consensus algorithms such as ZAB (ZooKeeper Atomic Broadcast) to ensure all nodes agree on the state of the system.
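To make the quorum idea concrete, below is a hypothetical, stripped-down sketch of the commit rule that Raft- and ZAB-style replicated logs share: an entry counts as committed only once a majority of the cluster has acknowledged it. It deliberately omits elections, terms/epochs, and recovery, so it is an illustration rather than an implementation of any real algorithm.

```python
# Hypothetical sketch of the quorum-commit rule shared by Raft/ZAB-style logs.
# Real algorithms also handle leader election, terms/epochs, and crash recovery.
from dataclasses import dataclass, field

@dataclass
class Leader:
    cluster_size: int
    log: list = field(default_factory=list)   # replicated log entries
    commit_index: int = -1                    # highest committed slot

    def replicate(self, entry, acks_from_followers: int) -> bool:
        """Append an entry and commit it only if a majority acknowledged it."""
        self.log.append(entry)
        slot = len(self.log) - 1
        acks = 1 + acks_from_followers        # the leader counts as one ack
        if acks * 2 > self.cluster_size:      # strict majority (quorum)
            self.commit_index = slot
            return True                       # committed
        return False                          # accepted, but not yet committed

leader = Leader(cluster_size=3)
print(leader.replicate({"x": 4}, acks_from_followers=1))  # 2/3 acks -> committed
print(leader.replicate({"y": 7}, acks_from_followers=0))  # 1/3 acks -> not committed
```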

💡High Availability

High availability in the context of the video refers to systems that are designed to be operational and accessible at all times. The script explains that coordination services are highly available, meaning they are replicated key-value stores that ensure continuous operation even if some nodes fail.

💡Read Heavy Workloads

Read heavy workloads are situations where the majority of operations are reads rather than writes. The script points out that coordination services are optimized for read heavy workloads, which is why they may not provide strong consistency out of the box, as it would be inefficient for such use cases.

💡Eventual Consistency

Eventual consistency is a consistency model used in distributed systems where the system guarantees that if no new updates are made to a given data item, eventually all accesses will return the last updated value. The script explains that coordination services typically deal with eventual consistency rather than strong consistency to optimize for read operations.

💡Monotonic Reads

Monotonic reads ensure that a client receives newer or equal data with each successive read, avoiding the perception of time moving backward. The script explains that coordination services achieve this through the replicated log: a client remembers the last log position it has observed and only accepts reads from replicas whose logs have reached at least that position.
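Here is a hypothetical sketch of that client-side rule; the Replica and MonotonicClient classes are invented for illustration (ZooKeeper realizes the same idea with zxids carried on each request), not an API from the video.

```python
# Hypothetical sketch: client-side monotonic reads via a last-seen log position.

class Replica:
    def __init__(self, log):
        self.log = log                        # list of (key, value) entries

    def last_index(self):
        return len(self.log) - 1

    def read(self, key):
        for k, v in reversed(self.log):
            if k == key:
                return v
        return None

class MonotonicClient:
    def __init__(self):
        self.last_seen = -1                   # highest log slot observed so far

    def read(self, replica, key):
        # Refuse replicas whose log is older than what we've already seen,
        # so successive reads never appear to go backwards in time.
        if replica.last_index() < self.last_seen:
            raise RuntimeError("replica is behind; pick another or retry")
        self.last_seen = replica.last_index()
        return replica.read(key)

fresh = Replica([("x", 3), ("x", 4)])
stale = Replica([("x", 3)])
client = MonotonicClient()
print(client.read(fresh, "x"))                # 4; last_seen becomes 1
try:
    client.read(stale, "x")                   # would look like time reversal
except RuntimeError as err:
    print(err)
```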

💡Predicate Validity

Predicate validity refers to the concept of ensuring that the conditions under which a read or write operation was made are still valid when the operation is completed. The script explains that coordination services allow attaching a 'watch' to a key, which notifies the client if the watched item changes before the transaction is complete, thus maintaining predicate validity.
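A minimal sketch of that pattern with the kazoo client follows; the znode path, the stored value, and the doctor-on-call scenario are illustrative, and only the one-shot get-with-watch call reflects real ZooKeeper behavior.

```python
# Sketch of a one-shot ZooKeeper watch using kazoo; path and handling are illustrative.
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()
zk.ensure_path("/shift/doctors-on-call")
zk.set("/shift/doctors-on-call", b"2")

def on_change(event):
    # Fires once if the watched znode changes before we finish our "transaction",
    # telling us the predicate we read may no longer be valid.
    print("predicate may be stale:", event.path, event.type)

# Read the predicate and attach the watch in the same call.
value, stat = zk.get("/shift/doctors-on-call", watch=on_change)
if int(value) >= 2:
    # ...proceed, knowing we'll be notified if the count changes underneath us.
    pass

zk.stop()
```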

💡Strong Consistency

Strong consistency is a consistency model where a read operation will always return the most recent write. The script discusses the trade-offs involved in achieving strong consistency, noting that it can be achieved through specific mechanisms like reading from the leader or using the sync operation, but at the cost of reduced performance.
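Below is a small sketch of the "sync then read" option, assuming the client library exposes ZooKeeper's sync operation (kazoo's KazooClient.sync is one such method); the hosts and paths are invented for the example.

```python
# Sketch of the "sync then read" pattern for a strongly consistent read,
# assuming the client exposes ZooKeeper's sync (e.g. kazoo's KazooClient.sync).
from kazoo.client import KazooClient

zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")   # illustrative hosts
zk.start()

path = "/cluster/config/leader-address"                # illustrative znode

# sync() places a marker in the replicated log; once it returns, subsequent
# reads through this session reflect at least everything up to that marker.
zk.sync(path)
value, stat = zk.get(path)
print(value, stat.mzxid)

zk.stop()
```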

💡Quorum Reads

Quorum reads are a method of reading from a majority of nodes in a distributed system to obtain an up-to-date value. The script explores the concept of quorum reads as a potential way to achieve strong consistency without overloading the leader, although it also points out the challenges and race conditions that can occur with this approach.
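The sketch below is a hypothetical illustration of a quorum read and of the race the script describes: the two followers being queried form a majority, yet both still hold the stale value that the leader has already committed past locally. All class and method names are invented.

```python
# Hypothetical quorum-read sketch: ask a majority of replicas for a versioned
# value and keep the newest one. The replica/version layout is invented here.

def quorum_read(replicas, key):
    quorum = len(replicas) // 2 + 1
    answers = []
    for r in replicas[:quorum]:                # contact any majority of nodes
        answers.append(r.read_versioned(key))  # -> (log_index, value)
    # Highest log index wins among the nodes we asked.
    return max(answers, key=lambda a: a[0])

class FakeReplica:
    def __init__(self, committed):
        self.committed = committed             # {key: (log_index, value)}
    def read_versioned(self, key):
        return self.committed[key]

# Race from the video: the leader has committed x = 4 locally, but the two
# followers we query have only committed x = 3, so the quorum read still
# returns the stale value even though it touched a majority of the cluster.
follower_a = FakeReplica({"x": (7, 3)})
follower_b = FakeReplica({"x": (7, 3)})
print(quorum_read([follower_a, follower_b], "x"))   # (7, 3): stale, x = 4 only on the leader
```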

Highlights

Introduction to the concept of coordination services in distributed systems.

Explanation of the need for shared state or information in a cluster of nodes.

Mention of technologies like ZooKeeper and etcd as examples of coordination services.

Discussion on the importance of coordination services for read-heavy workloads.

Clarification on the difference between strong consistency and eventual consistency in coordination services.

Introduction of the term 'monotonic reads' and its significance in coordination services.

Explanation of 'predicate validity' and how coordination services ensure it.

Description of how to attach a 'watch' to a key in coordination services for change notifications.

Three methods to achieve strong consistency in coordination services: reading from the leader, using sync, and quorum reads.

Analysis of the limitations and potential issues with reading from the leader for strong consistency.

Explanation of the 'sync' operation and its role in ensuring strong consistency.

Introduction to the concept of quorum reads and their potential for improving read performance.

Discussion on the challenges and race conditions associated with quorum reads.

Conclusion on the importance of coordination services in modern data storage systems.

Highlight of the trade-off between strong consistency and performance in coordination services.

Emphasis on the role of coordination services in various database technologies and their practical applications.

Transcripts

00:00

Hey everyone, wow, welcome to all the new subscribers yet again. Hope you guys are looking forward to learning some more crap about systems design. Today I'm gonna kind of rush this video through, since I'm in a little bit of a rush to get some stuff done, but I just got back from the gym feeling kind of smelly, so it's a perfect time in my perfect computer science mindset and mentality to make another video. I know I've kind of been working up to talking about more technologies and stuff like that. I think what I'm going to do is probably HBase next, which is another type of database, but in order to talk about that I have to talk about one or two different types of technology, and then we'll probably go and get to that HBase video. So enjoy that, let's get into this.

00:41

Okay, so a coordination service — what is it? Basically, in any type of distributed system that has a cluster of nodes, there's some need to have some sort of shared state or information about the configuration of the cluster. That includes things like the IP addresses of nodes in the cluster, which partitions are on each node, the nodes that are actually still members (so which are alive and which might be down), and things like distributed locks, which require some consensus or coordination so that one node can grab them and others can't. So obviously there's this kind of need for a centralized system where you're keeping information, or metadata, about the cluster. To solve this problem, things called coordination services have come up, and if you've ever heard of things like ZooKeeper or etcd, these are examples of them. Basically, I'm going to talk about those in this video. I'm not going to go into super specific ZooKeeper-specific or etcd-specific details — I might do that in the future — but for now I'm just going to try and give a sense of how coordination services in general work, and then we'll see in the future how those work inside of other systems. Generally speaking, by the way, these are meant for read-heavy workloads, so we'll see why that's relevant in a bit.

01:49

Okay, so what is a coordination service? Well, I kind of touched upon this in my Raft video, but the point is they're just highly available, and by highly available all that really means is that they're replicated key-value stores — or in the case of ZooKeeper it's almost like a file system, but files can have child files — built on top of some sort of consensus layer, which really just means that they share a replicated log by virtue of having to use a consensus mechanism like Paxos, Raft, or, in ZooKeeper's case, something called ZAB. Obviously writes are going to be pretty slow, because they all have to go through the leader and they all have to touch at least a quorum of nodes in order to actually successfully write.

02:29

In terms of reads in coordination services, you might think that the fact that we're using a consensus algorithm means we're going to have strong consistency. Actually, that's not the case, because like I said, these are for read-heavy workloads — probably ten to one is what it says in the ZooKeeper documentation. So what that means is that if you're reading ten to one, generally strong consistency isn't going to be efficient enough; you probably need to deal with eventual consistency. And then if you need to make that even more consistent, such that you get strong consistency, ZooKeeper will allow you to do something like that. But generally speaking, what do coordination services actually tell you about their reads? Because you can read from any replica, and there's no guarantee that the data is going to be up to date.

03:11

Well, first of all, reads are monotonic — there are no non-monotonic reads. Monotonic reads is a term that I introduced probably fifteen videos ago, but if you recall, a non-monotonic read is basically when a client makes a read from one replica, then reads from another replica, and the second one is more outdated than the first, and as a result it looks like time is going backwards. Obviously we can't be having any of that; we want a client to see reads actually going forwards in time. And you can avoid this by using the replicated log. Basically, the fact that each server has a replicated log where the IDs are the same for each slot in the log means that when a client makes a read or a write to or from some replica, it gets back the last ID that replica has seen — so it knows the length of the log on that replica — and from then on it's only going to accept reads as valid if the log on the replica it reads from is at least as up to date as that last ID it saw.

04:06

Okay, additionally, these coordination services can ensure predicate validity. A predicate is also another term I've used in the past: a predicate is basically just information that you might read from a database before making some other read or write. So say, for example, I'm a doctor and I want to go off my shift, but I can only do so if there's another doctor currently in the room. I'm going to read my database table to see if there are other doctors in the room, and if there are, I can leave. But what if that predicate's no longer valid? It's important to be able to make sure that things are actually still the case that you thought they were before making some sort of change to a system. So in coordination services in particular, you can attach something called a watch to any key that you read — or any file or file name — and what that does is the replica keeps track of that watch, and if the file has actually been changed before that transaction is finished up, the client is going to receive some sort of notification, probably through some stream of events, that the file was changed. That way it can either retry that read/write operation or just keep that in mind. That's actually a similar approach to serializable snapshot isolation, which, if you remember, is where the database keeps track of the dependencies of every single read and write, and if one of those dependencies becomes outdated it has to cancel the write and do it again — an optimistic concurrency control type of thing.

05:35

Okay, so what if we do need strong consistency? Because a lot of the time you really don't want, for example — say we're talking about the IP addresses of the nodes — maybe it's unacceptable to be sending write requests to the wrong IP address and just getting no response. Perhaps we want strong consistency, to ensure that one client can read another client's writes pretty much instantly. Well, how can we do this? There are three ways in these coordination services, and I will talk about all three of them. The first and most simple is just reading from the leader. Obviously that's always doable, but it's got a couple of issues. For starters, the leader is already pretty bogged down by the fact that all the write operations are going through it, and it has to handle communicating with all of the other nodes in the cluster — remember that for every single write you need to hit a majority of nodes, so the leader is sending out tons of network data — and the fact that now you're going to be hitting it with reads too is a lot to handle; it's really going to slow things down. Another option is to use an operation called sync, which I'll touch upon in the next slide. And then there's something that's more of an upcoming research area, but we'll talk about it a little bit: the concept of quorum reads.

06:44

Okay, so what is sync? A client node will basically go ahead and write the command sync into the replicated log. Sync is not any sort of key-value write, but it does take a position in the replicated log. What does that do? As I said, every single time you write, the leader responds back to you saying, hey, you just had a successful write, and it also responds with the ID of what you just wrote in the log — so you know the position of your write in the log. So once I have the position of that sync, then — recall that only monotonic reads are allowed — any replica that I'm going to read from must have that sync included in its log. It's basically saying: every time you write a sync, I will now only be reading from this point onwards. So there's that; that's one way of doing things.

07:36

And then the other way is quorum reads. In both of the previous scenarios, keep in mind we're putting a ton of load on the leader. If you're reading from the leader, well, obviously you're putting load on the leader, and if you're using sync, it means you're doing another write, which means you're putting even more load on the leader, because the leader has to communicate with all these other replicas. But what if we just wanted to be able to get up-to-date reads from the replicas alone? Well, this might actually be possible. The first thing to recall is that all of these replicated consensus algorithms require a quorum of nodes in order to accept and eventually commit a write. So what that means is that if I read from a quorum of nodes, I should be able to get an up-to-date value for basically any key. However, that's not necessarily true, so let's look at the following race condition.

08:22

Imagine I have three nodes. The leader is the first one, and I'm going to be reading from the other two, and the other two comprise a quorum because it's two out of the three nodes. The way these consensus algorithms work, basically — let's look at the operation x = 4. All of the replicas have accepted x = 4, so the leader knows it can go ahead and commit x = 4. What it first does is commit x = 4 locally, and then it goes ahead and says, okay guys, I'm now going to tell you to commit it. However, this creates a race condition: there's a point in time where the leader has committed x = 4, but the other two replicas have yet to receive that network call, and as a result they have not committed x = 4. So now if I read from those other two nodes, which do technically comprise a majority, even though eventually they will have x = 4, at the moment it still looks like x = 3 is the most up-to-date value — even though if we read the leader we would see that's not the case. One way we can kind of rectify this is by looking at those write-ahead logs and saying, oh, I see there's another value here in the write-ahead log, and x = 3 is probably going to be overwritten, so we can retry our quorum read. But there's no guarantee that retrying the quorum read is going to get us the most up-to-date value, simply because maybe the leader crashed after it committed locally and wasn't able to send the commits out to everyone else, or maybe there's a network partition, so the leader can no longer reach these other nodes. So there's a variety of issues with quorum reads not being exactly perfect, but some research has shown that they can take a significant amount of load off the leader and in certain situations greatly increase read performance. So it is something to consider in the back of your head — maybe research here will improve a little bit and these quorum reads will get even better.

10:20

Okay, in conclusion, coordination services are a really important sub-component of a ton of modern-day data storage systems. Unlike gossip protocols, which I introduced in a previous video and which basically have nodes randomly pass information from one to another until it's propagated through the system, coordination services are a way of storing the data in a centralized location, which is obviously replicated for fault tolerance and uses a consensus algorithm to ensure things like atomicity and a total ordering of the writes, so that things never get completely out of whack. Even though coordination services don't handle strong consistency right out of the box — because that would be too big of a performance hit for reads, and you want to be able to scale reads linearly with the number of nodes in the system — there are ways of achieving strongly consistent reads that are built into the system, like that sync command; you can always just read from the leader; and again, keep in mind that quorum reads might eventually be a viable thing that a lot of people end up using. However, another thing to note about strong consistency — and this is always the case — is that achieving strong consistency comes at the cost of significantly reduced performance. Okay, so I hope this makes sense for coordination services. Coordination services are something that we see used in literally a ton of other database technologies, and I think it's important that we introduce them now so you can see where they come up as useful later. All right, have a good one guys.


Related Tags
Distributed Systems · Coordination Services · ZooKeeper · Consensus Algorithms · High Availability · Read-Heavy Workloads · Eventual Consistency · Strong Consistency · Replicated Log · Systems Design