2: Instagram + Twitter + Facebook + Reddit | Systems Design Interview Questions With Ex-Google SWE

Jordan has no life
25 Nov 2023 | 43:49

Summary

TLDR: This video script outlines a comprehensive guide to building a social media platform supporting services like Instagram, Twitter, Facebook, and Reddit. It covers essential features including newsfeeds, user following, and Reddit-style nested comments. The speaker discusses system design considerations, database choices, and the use of technologies like Cassandra, Flink, and Kafka to ensure scalability, performance, and consistency. The script also delves into optimizing read operations, managing large volumes of data, and handling popular posts from verified users.

Takeaways

  • 😀 The video covers building a quad-combo service for Instagram, Twitter, Facebook, and Reddit, focusing on similar features like Newsfeeds and Reddit-style nested comments.
  • 🕵️‍♂️ The plan includes supporting a Newsfeed, user following/followers, and configurable privacy types for posts, with an emphasis on optimizing for read operations due to the nature of social media usage patterns.
  • 📈 Capacity estimates are provided, assuming roughly 100 characters (about 100 bytes) of text plus another 100 bytes of metadata per post or comment, with storage requirements worked out for a billion posts per day and up to a million comments per post.
  • 🔄 The use of Change Data Capture (CDC) is proposed to maintain follower relationships and avoid partial failure scenarios, ensuring data consistency without the need for two-phase commits.
  • 💡 Derived data and stream processing frameworks like Kafka and Flink are recommended for keeping data in sync and ensuring no messages are lost, even in the event of a failure.
  • 🛠️ Cassandra is suggested as the database of choice for the user followers table due to its high write throughput and its ability to resolve write conflicts naturally by merging data.
  • 🔑 The importance of proper partitioning and sorting in databases is highlighted to ensure fast query performance, especially for operations like deleting a follower or loading a user's Newsfeed.
  • 📱 For Newsfeed optimization, the video discusses the concept of caching every user's Newsfeed on powerful servers to provide a fast reading experience, even considering the asynchronous nature of data updates.
  • 🔍 A hybrid approach is considered for handling popular posts from verified users with many followers, using a combination of direct database reads and caching strategies to manage the load.
  • 🗣️ The script touches on the implementation of security levels in posts, suggesting storing security permissions within the followers table and allowing Flink to manage these permissions when delivering posts to caches.
  • 🌐 Finally, the video addresses the complexity of implementing nested comments, proposing a depth-first search index similar to a geohash for efficient range queries and good disk locality.

Q & A

  • What is the main focus of the video?

    -The video focuses on building a system that supports features for Instagram, Twitter, Facebook, and Reddit, including newsfeed and Reddit-style nested comments.

  • What are the key features planned for the system?

    -The key features include a newsfeed, support for Reddit-style nested comments, quickly loading who a user is following and who follows them, getting all posts for a given user, low latency newsfeed, configurable privacy types, and optimizing for read operations.

  • Why is optimizing for reads important in the context of a social media site?

    -Optimizing for reads is important because the majority of user interactions on social media sites involve reading or 'lurking' rather than posting, making read operations more frequent.

  • What is the estimated storage requirement for a single post and how does it scale up to yearly storage for a billion posts per day?

    -A single post is estimated to be around 200 bytes, including metadata. With a billion posts per day, this could lead to approximately 73 terabytes of storage per year.

  • How does the system plan to handle the follower and following relationships in a distributed database setting?

    -The system plans to use a change data capture (CDC) method with a single source of truth table and stream processing frameworks like Kafka and Flink to ensure data consistency and avoid partial failure scenarios.

  • What database is suggested for handling the user follower table and why?

    -Cassandra is suggested due to its high write throughput, leaderless replication, and the use of LSM trees, which allow for fast ingestion and buffering in memory.

  • How does the system handle the issue of popular users with millions of followers?

    -For popular users, a hybrid approach is used where posts are read from the Post DB directly, and a caching layer for popular posts is introduced to handle the high volume of followers efficiently.

  • What is the proposed method for implementing configurable privacy levels for posts?

    -The implementation involves storing additional information in the followers table to indicate the security level of the relationship, which is then used by the Flink consumer to filter posts accordingly (see the sketch at the end of this Q&A section).

  • What challenges arise when considering the replication of comments in a social media system?

    -Challenges include maintaining causal dependencies and ensuring that the state of the replicas makes sense, avoiding situations where a comment's child exists on a replica but not its parent.

  • How does the video script address the problem of reading nested comments efficiently?

    -The script suggests using a depth-first search index, similar to a geohash, which allows for range queries to efficiently retrieve entire branches of comments.

  • What is the overall architecture of the system presented in the video?

    -The system architecture includes services for user management, follower relationships, post management, and comments, with databases like MySQL, Cassandra, and potentially a graph database or a depth-first search index for comments, all interconnected through Flink nodes for stream processing.
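
For the configurable privacy question above, here is a hedged sketch of how the stream consumer could use the security level stored on each follower row to decide which feed caches receive a post. The level names, event shapes, and cache client are assumptions for illustration, not the video's code.

```python
ALL, CLOSE_FRIENDS = "all", "close_friends"

def fan_out_with_privacy(post: dict, follower_rows: list[dict], feed_caches) -> None:
    """follower_rows look like {"follower_id": 2, "level": "all" or "close_friends"}."""
    for row in follower_rows:
        # A close-friends-only post is skipped for followers at the plain "all" level.
        if post["visibility"] == CLOSE_FRIENDS and row["level"] != CLOSE_FRIENDS:
            continue
        feed_caches.append(row["follower_id"], post)  # hypothetical feed-cache client
```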

Outlines

00:00

📺 Introduction to Building Social Media Services

The speaker introduces a video tutorial on constructing four similar social media services: Instagram, Twitter, Facebook, and Reddit. The goal is to build these services within an hour, covering features like Newsfeed and Reddit-style nested comments. The speaker also mentions personal anecdotes, such as needing a haircut and having eaten a lot, indicating a casual and humorous tone. The importance of optimizing for read operations due to the read-heavy nature of social media use is emphasized, along with initial capacity estimates for posts and comments storage.
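
For reference, the capacity numbers work out as follows; this is a quick sketch using the video's own assumptions of roughly 200 bytes per post (text plus metadata), a billion posts per day, and up to a million comments per post.

```python
# Back-of-the-envelope math for the video's capacity assumptions.
POST_BYTES = 200                       # ~100 bytes of text + ~100 bytes of metadata
POSTS_PER_DAY = 1_000_000_000
COMMENTS_PER_POST = 1_000_000          # upper bound for a very popular thread

bytes_per_day = POST_BYTES * POSTS_PER_DAY         # 200 GB/day
bytes_per_year = bytes_per_day * 365               # ~73 TB/year
bytes_per_thread = COMMENTS_PER_POST * POST_BYTES  # ~200 MB of comments per post

print(f"{bytes_per_day / 1e9:.0f} GB/day, {bytes_per_year / 1e12:.1f} TB/year, "
      f"{bytes_per_thread / 1e6:.0f} MB of comments on a maxed-out thread")
```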

05:02

🔍 Database Design for Efficient Follower and Following Operations

This paragraph delves into the challenges of database design for efficiently querying followers and followings. The speaker discusses the limitations of traditional indexing and the benefits of using change data capture (CDC) to maintain consistency without resorting to two-phase commits. The use of stream processing frameworks like Kafka and Flink is proposed to ensure no data loss and to update derived data. The choice of Cassandra as the database is justified due to its high write throughput and the use of leaderless replication and LSM trees.
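
The CDC pipeline described here can be reduced to a small sketch: a stream consumer (Kafka feeding a Flink-style job, both abstracted away below) reads follow events from the source-of-truth user followers table and idempotently updates the derived "user following" view. The event shape and names are assumptions for illustration, not code from the video.

```python
# Applying CDC events to the derived view. Duplicate deliveries (e.g. replays
# after a checkpoint restore) are harmless because the updates are idempotent.
def apply_follow_event(event: dict, user_following: dict[int, set[int]]) -> None:
    """event = {"user_id": X, "follower_id": Y} means Y now follows X."""
    user, follower = event["user_id"], event["follower_id"]
    # The derived view is keyed the other way around: who does `follower` follow?
    user_following.setdefault(follower, set()).add(user)

def apply_unfollow_event(event: dict, user_following: dict[int, set[int]]) -> None:
    user, follower = event["user_id"], event["follower_id"]
    user_following.get(follower, set()).discard(user)
```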

10:04

🛠 Optimizing Newsfeed Generation for Social Media

The speaker outlines the process of generating a Newsfeed, discussing the naive approach of aggregating posts from a sharded database and the more optimal method of using Flink to manage user-following relationships and post deliveries. The importance of caching entire Newsfeeds in memory for quick access is highlighted, along with the potential use of multiple replicas to distribute the load. The challenges of delivering posts from popular users with millions of followers are also touched upon.
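
A minimal sketch of that fan-out step, assuming the consumer already holds an in-memory map of followers (kept fresh by CDC) and that feed-cache shards are chosen by hashing the follower ID. All names and the cache client are illustrative, not the video's implementation.

```python
from hashlib import md5

def cache_shard(follower_id: int, num_shards: int) -> int:
    # Stable hash so a given user's newsfeed always lives on the same cache shard.
    return int(md5(str(follower_id).encode()).hexdigest(), 16) % num_shards

def fan_out_post(post: dict, followers_of: dict[int, set[int]], feed_caches: list) -> None:
    # Deliver the new post to every follower's cached newsfeed.
    for follower_id in followers_of.get(post["author_id"], set()):
        shard = feed_caches[cache_shard(follower_id, len(feed_caches))]
        shard.prepend(follower_id, post)  # hypothetical cache client, newest posts first
```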

15:04

🔄 Hybrid Approach for Newsfeed Caching

The paragraph introduces a hybrid approach to handle Newsfeed caching, especially for popular users who may have a high volume of followers. The speaker suggests using change data capture to update both the Post DB and a popular post cache asynchronously. The process involves Flink consumers partitioned by user ID to manage data efficiently, ensuring that updates to posts and security permissions are propagated correctly.
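
Put together, the hybrid read path might look like the sketch below: regular posts come from the reader's precomputed feed cache, posts from verified accounts they follow come from the popular-post cache, and the two lists are merged by timestamp. The cache clients and field names are assumptions.

```python
import heapq

def read_newsfeed(user_id: int, feed_cache, popular_cache, verified_following) -> list:
    normal = feed_cache.get(user_id)                     # pre-fanned-out posts, newest first
    popular = [
        post
        for celeb_id in verified_following.get(user_id)  # verified users this reader follows
        for post in popular_cache.get(celeb_id)          # that celebrity's recent posts
    ]
    popular.sort(key=lambda p: p["timestamp"], reverse=True)
    # Both inputs are now sorted newest-first, so one merge keeps the feed ordered.
    return list(heapq.merge(normal, popular, key=lambda p: p["timestamp"], reverse=True))
```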

20:06

🗨️ Designing for Nested Comments in Social Media

This section focuses on the complexities of designing a system to handle nested comments, like those found on Reddit. The speaker discusses the trade-offs between breadth-first and depth-first search approaches for loading comments and the challenges of using non-native graph databases. The limitations of binary search in databases and the advantages of native graph databases with pointers for faster depth-first search are explained.
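
The breadth-first option the video calls easy really is just an index on the parent comment: fetch one level of replies at a time as the user expands them. A hedged sketch with assumed table and column names, written against a generic DB-API cursor:

```python
def load_replies(cursor, post_id: int, parent_id: int, limit: int = 20):
    # With an index on (post_id, parent_id) this is a cheap lookup, but each
    # additional nesting level costs another round trip to the database.
    cursor.execute(
        "SELECT * FROM comments WHERE post_id = %s AND parent_id = %s "
        "ORDER BY created_at LIMIT %s",
        (post_id, parent_id, limit),
    )
    return cursor.fetchall()
```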

25:07

📚 Implementing Depth-First Search Index for Comments

The speaker proposes a depth-first search index for comments, inspired by geohashing, to improve the performance of loading nested comments. The method involves creating a comment index based on the full path of comments, allowing for efficient range queries to retrieve entire branches of comments. The use of single-leader replication for the comment database is justified to maintain causal dependencies and ensure up-to-date comment data.
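
A compact illustration of that index: each comment's key is its parent's key plus its own ID, so an entire branch occupies a contiguous key range and "load more" becomes a range query. A sorted in-memory structure (the third-party sortedcontainers package) stands in for the real comment store, and the keys and values are made up.

```python
from sortedcontainers import SortedDict

comments = SortedDict({
    "a":     "comment a",        # top-level comment
    "a.b":   "comment a.b",      # reply to a
    "a.b.c": "comment a.b.c",    # reply to a.b
    "a.d":   "comment a.d",      # another reply to a
})

def load_branch(prefix: str) -> list[str]:
    # Every descendant's key starts with `prefix`, so the whole branch sits
    # between `prefix` and `prefix` + a high sentinel character in sorted order.
    return [comments[key] for key in comments.irange(prefix, prefix + "\xff")]

print(load_branch("a.b"))  # ['comment a.b', 'comment a.b.c'] -- the comment and its subtree
```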

30:09

🌐 System Architecture for Social Media Services

The final paragraph presents a comprehensive system architecture diagram for the social media services discussed in the video. It includes services for user management, follower relationships, post management, and comments, each with their respective databases and data flow managed by Flink nodes. The architecture aims to balance consistency, speed, and scalability, with a focus on efficient data processing and caching strategies.

Keywords

💡Quad combo video

A 'quad combo video' refers to a single video that combines content for four different services or topics. In the context of the video, it is used to describe the format where the creator intends to build and explain four different services within one video due to their similarities. This approach is taken to optimize time and provide comprehensive coverage in a single sitting.

💡Newsfeed

A 'Newsfeed' is a continuously updating list of personal messages, posts, or other content from a user's social network. In the video, the creator discusses the necessity to support a Newsfeed for the social media services being discussed, emphasizing the challenge of making it not only functional but also quick to load, which is a critical feature for user experience.

💡Nested comments

'Nested comments' are comments that are replies to other comments, creating a hierarchical structure similar to a tree. The video script discusses supporting Reddit-style nested comments, where comments can be infinitely nested, forming a tree-like structure. This feature is important for social media platforms as it facilitates deeper discussions and interactions among users.

💡Change data capture (CDC)

Change data capture is a method of tracking changes made at the database level so that this information can be used for various purposes, such as updating derived data. In the video, the creator opts for CDC to maintain consistency across different databases and to avoid partial failure scenarios, which is crucial for features like follower relationships and news feed updates.

💡Stream processing

Stream processing refers to the analysis and processing of data in real-time as it flows in a stream or sequence. The video script mentions using stream processing frameworks like Flink to ensure that data changes are captured and processed in real-time, which is essential for features that require up-to-date information, such as news feeds and comment threads.

💡Cassandra

Cassandra is a highly scalable, distributed NoSQL database designed to handle large amounts of data across many commodity servers. In the context of the video, the creator suggests using Cassandra for certain databases due to its high write throughput and leaderless replication, making it suitable for social media platforms with high data volume and velocity.

💡Partitioning and sorting

Partitioning and sorting are database management techniques used to organize data for efficient querying and storage. The video discusses using partitioning on user ID and sorting on follower ID to optimize the performance of follower and following relationships, ensuring that related data is stored together and can be quickly accessed.
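
As a concrete illustration of that layout, the two follower tables could be declared roughly as follows with the Python Cassandra driver; the keyspace, table, and column names are assumptions rather than details taken from the video.

```python
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("social")

# Source of truth: who follows user X. The partition key groups all of a user's
# followers on one node; the clustering key keeps them sorted so removing a
# single follower is a quick point lookup.
session.execute("""
    CREATE TABLE IF NOT EXISTS user_followers (
        user_id     bigint,
        follower_id bigint,
        PRIMARY KEY ((user_id), follower_id)
    )""")

# Derived table kept in sync via CDC + stream processing: who user X follows.
session.execute("""
    CREATE TABLE IF NOT EXISTS user_following (
        user_id      bigint,
        following_id bigint,
        PRIMARY KEY ((user_id), following_id)
    )""")
```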

💡Load balancing

Load balancing is the distribution of workload across multiple systems or components to ensure no single system is overwhelmed and to improve response times. In the video, the creator talks about load balancing the feed caches to ensure that the system can handle a large number of requests, especially for popular posts with potentially millions of viewers.

💡Hybrid approach

A 'hybrid approach' combines different methods or technologies to achieve an optimal solution. The video script discusses using a hybrid approach for handling popular posts from verified users with many followers, suggesting a combination of direct database reads and cached data to improve performance and reduce latency.

💡Graph database

A graph database is a type of database that stores data in a graph structure with nodes, edges, and properties to represent complex relationships. The video mentions considering a graph database for managing nested comments to efficiently perform depth-first searches and maintain causal dependencies in the comment threads.

💡Depth-first search index

A depth-first search index is a method used to traverse or search tree or graph structures in a depth-first manner. The video script describes building a depth-first search index for nested comments to optimize the retrieval of comment branches, which involves using a path-like indexing system to perform range queries and efficiently load more comments.

Highlights

Introduction of a quad-combo video covering the development of four similar services within an hour.

Building services for Instagram, Twitter, Facebook, and Reddit with shared features like Newsfeed and Reddit-style nested comments.

Supporting quick loading of followers and followings, and fetching posts for a given user with an efficient solution.

Challenges of creating a low-latency Newsfeed and the importance of optimizing for reads over writes in social media platforms.

Requirement of supporting configurable privacy types for posts as requested by users.

Estimation of storage capacity for a billion posts per day and the implications for database design.

The use of change data capture (CDC) to maintain follower relationships and avoid partial failure scenarios.

Discussion on the choice of database for follower relationships, leaning towards Cassandra for its write throughput.

Explanation of partitioning and sorting keys in Cassandra for optimizing user follower table queries.

Innovative approach to Newsfeed generation using in-memory caches for each user's Newsfeed.

Use of Apache Flink for processing streams of data and updating Newsfeed caches.

Addressing the challenge of popular users with millions of followers and the introduction of a hybrid Newsfeed approach.

Implementation of a depth-first search index for nested comments to optimize read efficiency.

Comparison between native and non-native graph databases for handling nested comments.

Design of a single-leader replication system for comments to maintain causal dependencies.

Final system design overview connecting all services and databases for a comprehensive understanding.

Call to action for feedback and critique on the presented system design.

Transcripts

play00:00

hello everybody and welcome back to the

play00:02

channel today we'll be doing a quad

play00:05

combo video of four different Services

play00:07

which we're all going to build in one

play00:08

video cuz they're all pretty damn

play00:10

similar so hopefully we can get through

play00:12

it within an hour because I have to get

play00:14

a haircut after this and even besides

play00:16

that I've done myself the absolute favor

play00:18

of eating a ton of chicken wings and

play00:19

protein shakes today and at some point I

play00:22

may need a 15 to 20 minute break to go

play00:24

spill out my guts anyways let's go ahead

play00:26

and jump into this thing cuz I need to

play00:28

get started all right so let's go ahead

play00:30

and dive on into it so today we are

play00:32

going to be doing Instagram Twitter

play00:34

Facebook and Reddit well that's

play00:36

obviously a lot so let's actually talk

play00:38

about the features that we plan on

play00:39

supporting I assume all of you use at

play00:41

least one of these things so you'll

play00:43

probably understand what I'm talking

play00:44

about when I say we're going to try and

play00:46

support a

play00:47

Newsfeed and in addition to that we're

play00:49

also going to be supporting Reddit style

play00:51

nested comments where the comments

play00:54

themselves basically form a tree you can

play00:56

have some top comments and then you can

play00:58

also have load more buttons that are

play00:59

going going to quickly fetch basically

play01:01

the next branch of comments below that

play01:03

so hopefully that makes sense let's go

play01:06

ahead and move on to some

play01:08

requirements so we've got a few

play01:09

different objectives in a problem like

play01:11

this oh boy my throat is already

play01:13

starting to get sore but I'm going to

play01:14

push through it so the first is that of

play01:17

course we always want to be able to

play01:18

quickly load who we're following and who

play01:20

follows us those are just two common

play01:22

features of all of these applications

play01:24

additionally we want to be able to get

play01:25

all the posts for a given user I say

play01:28

this because when you see our eventual

play01:29

Sol solution this isn't necessarily

play01:31

something that comes for free with that

play01:32

we need to be able to support this as

play01:34

well eventually as well we want to be

play01:36

able to support a low latency Newsfeed

play01:39

making a Newsfeed is easy making it

play01:41

quick is hard there are a lot of talks

play01:42

from Twitter Engineers to prove this

play01:44

number four is that we want to be able

play01:46

to support configurable privacy types

play01:48

this isn't that hard but someone did ask

play01:50

for uh for this in the comments of the

play01:52

last version of this video so I figured

play01:54

screw it why not throw this one in there

play01:56

and then they also asked for Reddit

play01:58

style comments where they can be

play01:59

infinitely nested now if you're thinking

play02:01

about the use patterns of something like

play02:03

a social media site hopefully it makes

play02:05

sense that 90% of the time that you're

play02:06

on there you're probably just lurking in

play02:08

reading stuff and you're posting pretty

play02:10

infrequently so as a result the main

play02:12

thing we want to keep in mind here is

play02:13

that we're going to be aiming to

play02:14

optimize reads as opposed to writes that

play02:16

is going to be very

play02:18

important okay so let's start thinking

play02:20

about some capacity estimates let me

play02:22

zoom in a little bit here so the first

play02:25

thing is that I'm going to assume there

play02:26

are 100 characters per post get that one

play02:28

from Twitter even though it's 140 let's

play02:30

do 100 for making the math a little bit

play02:32

easier round 100 bytes because you know

play02:35

a character is basically a byte and then

play02:37

additionally let's estimate that you

play02:39

know other metadata like user ID uh post

play02:41

time stamp stuff like that adds another

play02:43

100 bytes so maybe we can assume that

play02:45

there's 200 bytes in a single post

play02:48

additionally if we have a billion posts

play02:50

per day which is actually pretty

play02:51

realistic for some of these sites you

play02:53

could be looking at uh 73 terabytes per

play02:57

year of storage that's a lot it's

play02:59

actually not a ton when you're a

play03:01

platform like this and you make a ton of

play03:02

money through ads 73 terabytes a year is

play03:04

pretty little but we are going to need

play03:05

more storage than that we'll discuss

play03:06

that later additionally let's assume

play03:09

that the average user has around 100

play03:11

followers I personally have no followers

play03:13

I like to just tweet into the abyss and

play03:15

that there are some verified users with

play03:17

Millions for example myself all the

play03:19

women in the world follow me next we

play03:21

also have comments I'm going to limit

play03:23

those to 100 characters for Simplicity

play03:25

sake and again that means they're

play03:27

probably around 200 bytes in total store

play03:30

that and uh also when we start to think

play03:32

about comments and making them

play03:34

infinitely nested it's important to have

play03:36

a sense of you know how much data

play03:38

comments for a post actually take I'm

play03:40

going to assume there can be up to a

play03:41

million comments per post and that means

play03:43

up to 200 megabytes of storage required

play03:45

because a million time 200

play03:48

bytes okay so the first thing that we're

play03:50

going to talk about is starting to fetch

play03:52

our follower SL following so the issue

play03:55

here is that we want both of these

play03:57

operations to be very fast right we have

play03:59

this one over here where we get the

play04:01

followers for a specific user and then

play04:03

we also want to get all of the users

play04:05

that a specific user is following and we

play04:07

want both of those to return quickly now

play04:09

the issue with this is that if you were

play04:11

to choose a database table that is

play04:14

indexed in a manner such that it is

play04:16

either by the user ID for all of their

play04:18

followers or the user ID for all of

play04:20

their followings that is going to be

play04:22

very slow for the other type the reason

play04:25

why is that these queries are going to

play04:26

be distributed so if we look over here

play04:29

on the right side of the screen let's

play04:31

imagine that we used this type of table

play04:33

over here right where we've got one user

play04:36

and then a follower ID and that

play04:38

represents a following relationship and

play04:40

then we go ahead and index on that user

play04:43

ID so you can see the user ID field is

play04:45

sorted that's why fours are at the

play04:47

bottom six is at the top and then let's

play04:49

say we go ahead and partition also on

play04:51

that user ID because they're going to be

play04:52

a ton of follower following

play04:53

relationships we're not going to be able

play04:54

to store them all on one database table

play04:57

and so that's going to be really good

play04:58

when we want to find all the followers

play04:59

of one particular user however it's

play05:01

going to break down in a distributed

play05:03

setting when we want to find all of the

play05:04

users that a user follows so let's say

play05:07

we wanted to find all the users that

play05:09

user one follows so that would be this

play05:12

guy over here as you can see because we

play05:14

are partitioning on basically the person

play05:17

that has a following getting all of a

play05:20

person's uh users that they follow is

play05:22

going to be very challenging because

play05:23

we'd have to do a distributed query we

play05:25

wouldn't have any indexing within those

play05:27

nodes we would have to do a linear time

play05:29

sort or rather a linear time scan to go

play05:32

through every single row on here and

play05:34

every single row on here and then we'd

play05:36

have to aggregate them somewhere on some

play05:38

other server and then we would have to

play05:39

return that back to the user and that

play05:41

probably just isn't feasible so what

play05:44

I've opted to ultimately do instead is

play05:46

use an actual derived data change data

play05:49

capture type of method the reason I opt

play05:52

for something like change data capture

play05:54

here with one source of truth table is

play05:56

the fact that it helps us avoid partial

play05:58

failure scenarios because keep in mind

play06:00

that if you're a client and you're

play06:01

writing to two different databases at

play06:03

once one this is one database here's the

play06:06

other database you know one of these

play06:08

writes could fail one of them could

play06:09

succeed and really the only way to

play06:11

guarantee that doesn't happen is going

play06:13

to be something like two-phase commit

play06:15

and that's going to be really slow so if

play06:17

we really want to ensure consistency

play06:19

another thing that we could do is use

play06:21

something like change data capture and

play06:23

then use one of our coveted stream

play06:24

processing Frameworks to ensure that

play06:26

none of those writes get lost you can

play06:28

actually use one of the tables as shown

play06:31

over here and you would use CDC from it

play06:34

to basically go into something like

play06:36

Kafka the reason Kafka is good is

play06:37

because it's replicated so it's fault

play06:39

tolerant and it is also basically a

play06:41

persistent log so we know that if a

play06:43

message uh doesn't get processed in the

play06:45

moment we can always go back and process

play06:47

it later and then we've got something

play06:48

like Flink which if you recall is going

play06:50

to checkpoint State

play06:52

occasionally and this is going to make

play06:54

sure that we're never not processing a

play06:57

single message and then we can basically

play06:59

also take it and update our other table

play07:01

and this is going to be our derived

play07:04

data so you may notice that I've chosen

play07:07

to have the user followers table be the

play07:09

source of truth that means for user X

play07:13

who

play07:15

follows X and if you recall that schema

play07:18

looks something like we've got our user

play07:20

over here and then all the people that

play07:22

follow them on the right and we index by

play07:25

that user ID and we can partition on

play07:27

that as well so then of course we've got

play07:29

Flink listening to all of those

play07:30

different partitions and then uploading

play07:33

another table keeping it in check now

play07:35

you may think to yourself well it is

play07:36

possible that Flink uh in a failure

play07:38

scenario might upload duplicate messages

play07:41

to the user following table that's

play07:42

really not a huge deal because at the

play07:44

end of the day a duplicate upload can

play07:46

just be duped right like if I have 4 and

play07:49

22 and then I have 4 and 22 again I

play07:51

could just say well if it's already in

play07:53

there just don't add it not a huge

play07:55

deal okay so now that we have kind of an

play07:58

idea of how we're actually going going

play07:59

to maintain our follower relationships

play08:01

within databases what type of actual

play08:04

database should we be using to do so so

play08:06

of course we do want to have good read

play08:09

speeds and good write speeds for these

play08:10

tables especially for this guy over here

play08:13

because it doesn't have any buffering

play08:15

before the rights get there it's

play08:17

basically just taking all the writes as

play08:18

they come in so it's important that we

play08:20

also do have some fast ingestion here so

play08:23

in my opinion I think something like

play08:24

Cassandra would be really good Cassandra

play08:26

is very good in terms of its write

play08:27

throughput for a couple of reasons one

play08:29

of which is that it uses leaderless

play08:30

replication so writes can go to any

play08:32

replica another is that it uses LSM

play08:34

trees so writes are first buffered in

play08:36

memory but the point here is that like I

play08:38

mentioned we don't really have conflicts

play08:40

or write conflicts to worry about in the

play08:43

relationship of kind of a user and their

play08:45

follower because at the end of the day

play08:47

you're just merging them all together

play08:48

right like if I say that a user has one

play08:51

follower on one replica and then I say

play08:53

that a user has a follower B on another

play08:55

replica the the kind of combined state

play08:58

of those two is just just great now this

play08:59

user has two followers so again write

play09:04

conflicts not an

play09:06

issue and this was unlike our tiny URL

play09:09

video where they were certainly an issue

play09:11

so again this is good for the reasons I

play09:12

mentioned over here it's going to be

play09:14

fast and basically what should our

play09:16

partition and sortkey be if we are going

play09:18

to have a database like this especially

play09:20

for our user follower table because

play09:22

that's where we really care about the

play09:24

latency so my thinking here is well the

play09:27

obvious thing to partition on would be

play09:28

the user right because we're already

play09:30

indexing on there we want all of the

play09:32

user's followers to be on a single

play09:34

database node or a single database

play09:36

partition because that is going to make

play09:37

the queries a lot faster we don't have

play09:39

to do any you know aggregations after we

play09:41

hit multiple partitions and in addition

play09:44

to that by actually sorting on the

play09:46

follower following ID this just means

play09:49

that if we ever need to delete a row say

play09:51

a follower stops following a person we

play09:53

can go hit that partition quickly find

play09:56

their follower and then go delete that

play09:58

row

play09:59

so you know the example would be here's

play10:01

partition one here's partition two you

play10:04

may notice that uh obviously we're not

play10:06

using just range based sorting or else

play10:09

uh number six would be on this partition

play10:12

and number 18 would be on the other

play10:13

partition we're using a hash range

play10:15

sorting or or rather a hash range

play10:17

partitioning because ideally that is

play10:19

going to load balance our tables a

play10:21

little bit better keep the partitions

play10:23

more balanced consistent hashing yada

play10:26

yada yada you guys get it by now if you

play10:28

don't I recommend watching the tiny URL

play10:30

video and the consistent hashing video

play10:32

for some more details there so we've

play10:34

spoken about how we actually want to go

play10:36

ahead and make our follower and

play10:38

following tables how we're using derived

play10:40

data and change data capture to keep

play10:42

those up and it's very important that we

play10:44

actually have those in check because

play10:46

they are crucial for actually

play10:48

maintaining a Newsfeed the reason why is

play10:50

that for any given user we need to know

play10:52

who follows them if I post it's very

play10:55

important that basically uh all of my

play10:57

followers are going to see my my post

play10:59

and so we need to be able to do that and

play11:01

then of course once we actually go and

play11:04

take a person and basically all of who

play11:06

they follow then you have to generate

play11:08

all the posts for them so kind of the

play11:10

naive way of doing this is you've got a

play11:13

client right this is the reader of

play11:17

Newsfeed the first thing they could do

play11:19

is potentially hit the user following

play11:21

table now to clarify that's just who X

play11:26

follows so that's going to respond with

play11:29

some people and then they're going to go

play11:30

and reach out to the post DB which is

play11:33

probably going to be sharded on user ID

play11:35

because that is the most rational way of

play11:36

doing things and then it's going to

play11:38

aggregate them all on one server blah

play11:41

blah blah blah

play11:42

blah and return it back to the user and

play11:45

the reason I said this was naive is

play11:47

because all of those distributed queries

play11:49

and then the aggregation step is

play11:51

probably just going to be too slow

play11:53

you're basically bottlenecked by

play11:54

whatever the slowest query could

play11:55

possibly be and that is a bad thing and

play11:58

hence this this is why I call this the

play12:00

naive way of building the news feed in

play12:02

reality what we would like to do is put

play12:03

in a little bit more work on the write

play12:05

path so that we can speed up our read

play12:07

path a little bit more so let's actually

play12:09

start to talk about that here's what I

play12:11

would call the more optimal Newsfeed so

play12:14

the first thing is that we should

play12:16

recognize that if we really want to make

play12:18

a news feed as fast as possible we would

play12:20

somehow have to index all of the tweets

play12:23

by which news feed they belong to and

play12:25

the issue with that is that you can't

play12:27

obviously do that because it belongs to

play12:28

many different news feeds in fact we

play12:30

even said it belongs to around a 100 of

play12:32

them because the average user has around

play12:34

100 followers so the issue is that you

play12:38

know because there are so many different

play12:39

places to put this tweet we would have

play12:41

to end up storing a ton of data well how

play12:43

much data actually we estimated there

play12:45

were around a billion tweets a day and

play12:47

there were around 200 bytes a tweet and

play12:49

what if we did actually store 100 copies

play12:51

of every single tweet and put it in a

play12:53

different index well we could and then

play12:55

we would actually only have 20 terabytes

play12:57

of tweets per day which if if you think

play12:59

about it is really not that bad for a

play13:00

company like Twitter I mean they have

play13:02

literally millions of dollars of servers

play13:04

20 terabytes is nothing for them and

play13:06

especially if we want to make this

play13:07

really fast and throw it in memory let's

play13:10

say we had 256 GB super beefy hosts

play13:13

where they actually have that much RAM

play13:14

in them that's only around 80 maybe we

play13:16

can round that up to 100 in memory

play13:18

caches and so effectively what we can

play13:20

actually do is go ahead and cache every

play13:23

single user's Newsfeed and we can do it

play13:25

on one of these super beefy servers

play13:27

maybe we've got a couple of replicas of

play13:29

them so maybe instead of 80 it's more

play13:31

like 200 but that's okay again these are

play13:33

massive sites they make lots of money

play13:36

200 massive servers is not going to make

play13:38

or break things for them so let's

play13:40

actually start looking at the newsfeed

play13:42

diagram or at least the initial sketch

play13:44

of it again this is going to be a lot to

play13:46

ingest so no worries if it takes you a

play13:48

little bit to uh put this all down you

play13:50

can always pause the video and go back

play13:52

for a second so let's say that we've got

play13:54

client six over here the first thing

play13:56

they're going to do is make a post

play13:58

that's going to hit our post database

play14:00

note that again I'm using change data

play14:02

capture here because it keeps everything

play14:04

consistent and we don't have to worry

play14:05

about two-phase commit that is then

play14:07

going to be ingested into a Kafka queue

play14:09

for the same reasons that I mentioned

play14:11

before this gives us fault tolerance and

play14:13

it gives us replayability which is very

play14:15

important for this guy here Flink so in

play14:18

this case as opposed to just ingesting

play14:20

one stream of data this Flink consumer

play14:22

is going to be getting data from two

play14:23

sources the first is going to be the

play14:26

user followers table which if you recall

play14:28

from before is going to be that source

play14:29

of Truth which means that whenever a

play14:31

following is done it goes right into

play14:33

this table and so what's going to now

play14:35

happen is that this user or rather this

play14:37

Flink consumer can cache who follows user

play14:41

six pardon me as I accidentally erase

play14:43

when I mean to write but the gist is

play14:46

that as you can see Flink is going to

play14:48

look at all of those user follower

play14:50

relationships and it can actually Shard

play14:52

itself on that user ID of this table so

play14:55

keep in mind this table is user ID

play14:58

follower

play15:00

ID and the good thing about that is that

play15:02

these guys can be sharded the same way

play15:04

because keep in mind post DB is also

play15:06

sharded by user ID and so now Flink or a

play15:10

particular Flink consumer only has to

play15:11

hold a subset of this data and so it can

play15:13

keep it all in memory and so let's see

play15:16

now Flink says okay well we know that uh

play15:18

this post has to be delivered to user

play15:20

one user 10 user 22 and 44 and then it's

play15:23

going to go ahead and write to the

play15:24

appropriate feed caches we can load

play15:26

balance and partition these feed caches

play15:28

such that they each have certain ranges

play15:30

of users that they represent and then

play15:32

the Flink consumer can figure out where

play15:34

it needs to send it it'll probably have

play15:35

to reach out to zookeeper hit a load

play15:37

balancer and then boom there you go now

play15:40

we're in some caches so let's quickly

play15:42

look at a couple of our notes to clarify

play15:44

things basically Flink can uh go ahead

play15:47

and keep the state of all the followers

play15:49

as I mentioned and the reason that this

play15:51

is so important that it keeps this state

play15:53

is because it is a major optimization on

play15:55

having to make a network call to the

play15:57

database every single time saying well

play15:59

who does user 6 actually have as a

play16:01

following that would be very slow and in

play16:03

addition to that it would be a ton of

play16:05

repeat computation additionally as new

play16:09

followers come in we're actually

play16:10

streaming that data up to Flink via

play16:12

change data capture so it is up to date

play16:14

we don't have to worry about that and

play16:17

then again we talked about sharding

play16:18

we're basically sharding on the poster

play16:20

ID which is the user ID here and the

play16:23

user ID which the post DB is also

play16:25

sharded on so this Flink consumer should

play16:27

actually be able to handle all of

play16:29

this data because we can partition it in

play16:31

a Smart Way such that there's not enough

play16:33

to overload it Okay so we've spoken

play16:36

about that and then Additionally you may

play16:38

be thinking to yourself well this works

play16:39

great for new tweets but what if we want

play16:41

to edit a tweet what if we want to

play16:43

potentially update security permission

play16:45

changes on it what if that tweet all of

play16:47

a sudden gets new followers well keep in

play16:49

mind that all of this data is actually

play16:51

flowing through into this Flink consumer

play16:53

and then it can do the necessary logic

play16:55

to send those updates to the required

play16:57

caches now of course keep in mind this

play16:59

is pretty expensive right objectively it

play17:01

is now taking a tweet a very long time

play17:04

to get into one of these Newsfeed caches

play17:06

but at the same time it doesn't really

play17:08

matter if a tweet takes a while to get

play17:10

there because all that matters is the

play17:11

user sees their tweet hit the post DB

play17:14

and then they think they're good and

play17:15

then maybe 5 minutes later it gets into

play17:17

everyone's caches but they don't have to

play17:19

know that it's all asynchronous and at

play17:20

the end of the day this is going to make

play17:22

everyone else's reading experience a lot

play17:24

better and they're going to keep coming

play17:25

back to our site so let's actually go

play17:28

ahead and talk about our post database

play17:30

and schema because I kind of brushed

play17:32

over that we haven't really mentioned it

play17:33

very much yet and it's important that we

play17:35

do keep this fast because in the case of

play17:38

the user followers table it's actually

play17:40

the same thing if writes are going

play17:41

directly to this guy it needs to be able

play17:43

to ingest them quickly or else it will

play17:45

become a bottleneck in our system so for

play17:47

the same reasoning as before I again

play17:49

wanted to use Cassandra but this time I

play17:52

wanted to partition it in a slightly

play17:53

different way where I would Partition by

play17:56

the user ID so that you keep all posts

play17:58

from the same user on a given node and

play18:00

then within a given partition we want to

play18:02

use the sort key as the time stamp or

play18:05

rather the time stamp as the sort key

play18:06

and this means that when we run a query

play18:09

let's say we've got 69 and then what's

play18:11

today's day we've got

play18:13

1120

play18:15

2023 now when I load all of these posts

play18:17

they're already going to be in presorted

play18:19

order and that's going to keep our query

play18:21

very fast for when we say get

play18:24

posts of a specific user and again

play18:29

that's what I mentioned over here it is

play18:31

going to make sure that basically

play18:33

partitioning for a given user is nice

play18:35

and fast and as a result of that that

play18:37

endpoint will be

play18:40

reasonable Okay so we've gone over our

play18:42

actual table schema for the Post DB

play18:44

which ensures that if we do want to load

play18:45

posts for a given user that should

play18:47

hopefully be decently fast it should all

play18:49

be on the same partition it should

play18:50

already be presorted by timestamp which

play18:52

is great however we do come across one

play18:55

big problem which is going to be popular

play18:57

users so I did mention that some

play18:59

users are verified and they're actually

play19:01

going to have millions of followers so

play19:04

the problem with this is that our

play19:06

previous step of basically uploading all

play19:08

of our posts using change data capture

play19:10

and then basically using a bunch of

play19:11

different Newsfeed caches as our sync is

play19:14

going to fail the reason being that when

play19:16

you have millions of followers that post

play19:18

has to get delivered to many many

play19:20

different places and that is going to be

play19:21

heinously slow so what could we

play19:24

potentially do instead what we're going

play19:26

to try to do is use a hybrid approach so

play19:30

in this case let's imagine that we've

play19:32

got our Newsfeed reader so our bad post

play19:35

would exclusively go to the Post DB and

play19:37

aggregate our ideal approach is that we

play19:40

exclusively read from the feed cache but

play19:43

it's possible that what we could do

play19:45

instead is say only for the verified

play19:47

users we want to read from the post DB

play19:50

and for the non-verified users basically

play19:53

we can read from our feed cache and then

play19:54

we'll aggregate those ideally it'll

play19:56

result in a lot fewer partitions of the

play19:59

database that we'll have to hit but even

play20:01

still we are still going to hit a bunch

play20:03

of the partitions of that database and

play20:05

that could still be too slow so how

play20:07

could we make this a little bit better

play20:09

well what we could do is introduce a

play20:12

potential caching layer for our popular

play20:14

posts so the nice thing about this is

play20:17

that when it comes to popular posts or

play20:19

rather posts from popular users we

play20:21

actually know in advance that they're

play20:23

going to be popular if someone with a

play20:25

million followers is going to make a

play20:26

tweet we know it's probably going to get

play20:27

at least you know 10,000 100,000 views

play20:30

within the first few minutes and as a

play20:32

result when that person makes a post we

play20:34

can actually pre-load a cache so that

play20:36

everyone else can load it so what this

play20:38

is going to look like is actually going

play20:40

to be very very similar to our previous

play20:43

setup to load all of our Newsfeed caches

play20:45

for example here's me as I mentioned all

play20:47

women on Twitter follow me they love me

play20:50

so the first thing that we do is put it

play20:52

in our post DB this is going to be

play20:53

common amongst all tweets the next thing

play20:56

that it's going to do is go over CDC I

play20:59

forgot to write out the Kafka queue here but

play21:00

hopefully that makes sense into Flink

play21:03

and then similarly what's going to

play21:04

happen is that we've got our users table

play21:06

over here which contains for a given

play21:08

user ID whether that user is verified or

play21:11

not and so again over change data

play21:13

capture this can go ahead and hit Flink

play21:16

and we can again Shard this by the user

play21:18

ID right so again

play21:20

Shard by user ID and so this way when a

play21:25

post from Jordan hits Flink Jord 's

play21:28

verified status is also going to be in

play21:30

Flink and so as a result Flink can say

play21:33

oh Jordan is in fact verified all the

play21:35

ladies love him let's go and put one of

play21:37

his posts in the popular post caches

play21:40

over here and so by using this style we

play21:42

accomplish a couple of things the first

play21:44

is that one uh we don't have to use a

play21:47

traditional like write-through based

play21:48

caching approach where we would have to

play21:50

either use two-phase commit to ensure

play21:52

cache consistency or just in general we

play21:55

would have to have a partial failure

play21:56

scenario where our cache might get

play21:58

uploaded but our database wouldn't or

play22:00

our database you know the write goes

play22:02

through and the cache doesn't so this

play22:03

makes sure that everything is consistent

play22:05

keeps everything asynchronous and it

play22:07

also allows us to see whether or not a

play22:09

given user is verified when they

play22:11

actually make a tweet and again the nice

play22:14

thing about this is that post edits will

play22:15

also go back through here back into

play22:17

flank and then we can go ahead and

play22:19

upload our popular post cache assuming

play22:22

that we shard these caches by user ID we

play22:25

know exactly where a given popular

play22:27

user's post is going to live and that

play22:29

makes our life super easy so what does

play22:31

things actually look like again well now

play22:34

the gist is that um if we want to use

play22:37

our hybrid solution we would basically

play22:40

also read from the

play22:42

cache okay let's figure out where I'm at

play22:46

cuz now even I'm starting to lose track

play22:48

so kind of the next thing that I want to

play22:51

be touching upon here is the following

play22:53

concept we have mentioned that we want

play22:56

to read all of our popular posts from a

play22:59

popular post cache however a challenging

play23:02

part of this is that for a given reader

play23:04

right let's say this is the reader it's

play23:06

going to the news feed

play23:08

service

play23:10

feed

play23:12

service and then it's going to you know

play23:14

popular posts popular and Newsfeed cache

play23:20

news

play23:21

feed the only way that we actually know

play23:24

what to read from the popular posts is

play23:27

by figuring out who this guy follows

play23:29

that is

play23:30

verified and that in it of itself could

play23:33

be a little bit of a tough query because

play23:35

even though we store all of our follower

play23:37

following relationships already and

play23:39

that's decently fast in order to

play23:42

actually get a sense of whether or not a

play23:44

particular person that I follow is

play23:47

verified I would have to join that on

play23:48

the user table and that could of course

play23:50

become slow so how can we go ahead and

play23:53

do this well again we can use derived

play23:56

data you might start to uh figure out a

play23:58

pattern here which is that I do love me

play24:00

some stream processing and I love using

play24:03

derived data because derived data allows

play24:05

us to pre-compute data in the most

play24:07

optimal format so that we can make

play24:08

things as fast as possible now how could

play24:11

we actually do this well yet again we

play24:13

could use another Flink node we could

play24:15

have our users table which already

play24:17

exists sharded by users ID we could have

play24:20

our user following table which if you

play24:22

recall is already a piece of derived

play24:24

data so we're actually now using change

play24:26

data capture on derive data and then

play24:29

what we do is by merging these two

play24:31

things into two different streams that

play24:33

go into the same Flink node and

play24:35

partitioning properly we can actually

play24:37

tell for a given user not only who they

play24:39

follow but if they're verified so from

play24:41

the user following table I can say for

play24:43

example I know user 10 follows user 3

play24:46

and user 22 from this table over here I

play24:50

can say well I know user 3 is verified

play24:53

and you might say to yourself ah shoot

play24:54

well it seems that the user's table

play24:56

actually has to put verified

play24:59

users on every single Flink

play25:02

consumer every

play25:04

consumer my argument to you here would

play25:07

be that there aren't that many verified

play25:08

users I can't imagine there would be

play25:10

more than like 10,000 of them and so a

play25:12

10,000 person set is really not a big

play25:14

deal um but the gist is that as a result

play25:17

of having this small verified table in

play25:20

memory on every single node we can

play25:22

quickly tell oh you know what hey user

play25:25

10 is actually now following user 3 that

play25:27

person is verified let's go ahead and

play25:29

upload to the cache over here and now we

play25:32

can see that user 10 is following

play25:34

verified user 3 and we can use this as a

play25:38

cache to figure

play25:42

out my

play25:47

verified um following so for example if

play25:50

I'm following Mr Beast if I'm following

play25:52

Donald Trump if I'm following Obama it

play25:54

would all be in this verified cache cool

play25:58

so let's quickly touch upon security

play26:00

levels in posts because uh this was a

play26:03

specific request for this video and I

play26:04

would like to honor that so let's say

play26:07

that a user can actually specify whether

play26:09

a post is for all of their followers or

play26:11

let's say a close friend followers we'll

play26:13

keep it simple and say there's only two

play26:15

configurable levels right now however it

play26:17

is going to be the case that uh you know

play26:19

this kind of scales out to three or four

play26:21

or five levels of potential security

play26:23

anyways so the easiest way to implement

play26:25

this at least in my opinion is just by

play26:27

actually going ahead and putting all of

play26:29

this information in the followers table

play26:31

so we already have our user followers

play26:33

table which defines the relationship uh

play26:35

that these guys are having with one

play26:37

another and so for example uh we've got

play26:40

user one follower two so that means that

play26:42

two follows user one and you can see

play26:45

that their security level is all same

play26:48

goes for here user 3 follows user one

play26:51

their relationship is close friend so

play26:53

recall that in Flink oh boy nice voice

play26:56

crack there

play26:58

in Flink we actually have access to this

play27:00

data so it would say something like uh

play27:03

one has two which is a uh sorry an all

play27:10

follower two all and then three close

play27:15

friend so when a post comes

play27:18

in post of close

play27:23

friend it can say ah you know what I'm

play27:25

actually only going for this guy here

play27:27

and not for all and so as a result of

play27:30

that uh we can actually continue to uh

play27:34

use our existing posting pipeline uh by

play27:36

just storing this data as well within

play27:38

our Flink consumer now it is unfortunate

play27:41

that uh basically you know changes to

play27:43

the specific close friend level of a

play27:45

follower or following will take a while

play27:47

to propagate through our pipeline same

play27:49

goes for a post uh but they do all

play27:52

eventually go through to this Flink

play27:53

consumer who can then upload the caches

play27:55

or update the caches accordingly

play27:58

and so that is going to make life a

play28:00

little bit easier it's expensive it's

play28:02

asynchronous so you know if there's a

play28:04

post out there that you already made and

play28:05

you're like oh shoot that wasn't meant

play28:07

for everyone this is meant for close

play28:08

friends you know you changing that level

play28:10

is not going to make that happen

play28:11

instantly and that is a trade-off here

play28:13

but it is worth

play28:15

noting okay so this is the part of the

play28:17

video that I definitely want to focus on

play28:19

a little bit because I think uh it makes

play28:21

this video unique I haven't really seen

play28:23

too many others on the internet that

play28:25

focus on nested comments All Too Much so

play28:27

let's talk about them basically we want

play28:29

to be able to optimize to read nested

play28:32

comments and the question is well how

play28:34

can we actually go ahead and partition

play28:35

those let's start with kind of the easy

play28:37

part of this setup well keep in mind

play28:39

that when we did our capacity estimates

play28:41

for this video we said that probably per

play28:44

thread there's around 200 megabytes of

play28:46

comment data and the good thing about

play28:48

that is 200 megabytes is actually very

play28:50

little and what that means is that we

play28:51

can actually keep all of the comments

play28:53

even on our most popular threads on a

play28:55

single node which is huge because then

play28:57

it means we don't have to do cross

play28:58

partition queries all that we have to do

play29:01

is Shard by our post ID and we should be

play29:03

good there as far as partitioning goes

The next question is: what about replication? Replication is a little more interesting, because comments can have causal dependencies on one another, and things wouldn't make sense in certain scenarios. Say we have a multi-leader database, as you can see right here: here's leader one, here's leader two. One user makes a comment on leader one over on the left, and then a second user reads it and responds to it on leader two. Now the issue is that the state leader two is in doesn't make sense, because it holds a comment that is the child of a comment that doesn't exist there. That would obviously be problematic: maybe things will be fine once they sync up, but anyone who reads from that replica in the meantime is going to look at it and think, "I have no idea what's happening; this doesn't make sense." For this reason I would opt for single-leader replication here. Could you maybe get away with quorum consistency in a multi-leader setup? Maybe, but at the end of the day some of the replicas still might not make sense, and that would be a problem.

Okay, so let's think about our nested comments a little more abstractly. Nested comments form a tree, right? This could be comment one, this could be the child of comment one, this could be the child of the child of comment one, and here's another child of that one, so we have this whole tree right here. Of course, depending on which site you're on, these comments are going to load in a different order. Some sites basically load one level at a time and show you the next comments you could click into; others, like Reddit specifically, will load a branch at a time, where clicking "load more" pulls in, say, these two right here. That second style is more of a depth-first search, while the first way I showed is more of a breadth-first search, or at least it shows you everything at a particular level so that you can click into it.

Now, I personally think the breadth-first approach is pretty easy to implement, because you're basically just saying this comment can point here and this one can point there, and every time we land on a comment we run a query in our database for all of the comments that have a given parent ID, something like `WHERE parent_id = X`. Then you just index on that parent ID, and that makes it nice and fast.
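
A minimal sketch of that "children of a given comment" query, using SQLite purely for illustration; the table and column names are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (
        id        INTEGER PRIMARY KEY,
        post_id   INTEGER NOT NULL,
        parent_id INTEGER,            -- NULL for top-level comments
        body      TEXT NOT NULL
    );
    -- The index that makes "children of X" fast.
    CREATE INDEX idx_comments_parent ON comments (post_id, parent_id);
    INSERT INTO comments VALUES
        (1, 42, NULL, 'top level'),
        (2, 42, 1,    'first reply'),
        (3, 42, 1,    'second reply'),
        (4, 42, 2,    'nested reply');
""")

def load_children(post_id, parent_id):
    """One level of the tree: all direct replies to a given comment."""
    rows = conn.execute(
        "SELECT id, body FROM comments WHERE post_id = ? AND parent_id = ?",
        (post_id, parent_id),
    )
    return rows.fetchall()

print(load_children(42, 1))   # -> [(2, 'first reply'), (3, 'second reply')]
```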

Personally, though, I want to make this video a little bit harder and make everyone think a little, so I'm going to approach it from the other perspective, where we're trying to fetch a depth-first, single branch of comments at a time, because that's a bit harder to do quickly. So let's go ahead and think about it. One approach we could possibly take is a graph database.

So how would this work? Well, if you haven't heard much about graph databases before, I have spoken about them on this channel, but I'll give a quick recap. There are basically two different types: native graph databases and non-native graph databases. Let's talk about non-native first, which I've illustrated over here on the left. A non-native graph database, especially in the scenario of comments, would look something like this: every single node has an ID, a parent ID, and then whatever data it contains. The issue, as I've already hinted, is that to do a depth-first search you would say, "find me all of the nodes with parent ID one," get those two back, and then continue the depth-first search as you please. The problem with that is this: when you have an index on a particular field, the time complexity of finding an element by that field's value is O(log n), because we're effectively binary searching the index. Binary search gets slower as the table gets bigger, so even if the branch of comments you're looking for stays the same size, the query still gets slower as the table grows. That in particular is why non-native graph databases fall short here.

On the other hand, native graph databases are quite a bit faster, because they actually use pointers on disk. Obviously I've drawn out a tree right here, but the gist is that a node can store a pointer to another location on disk, and traversal literally jumps between those locations; to do a depth-first search I would just follow the pointers all the way down and we're good to go. Nonetheless, native graph databases are still not that fast, the reason being that jumping around on disk is slow. The first thing I covered in my systems design 2.0 series is what a disk looks like: you've got a spinning platter and a head that seeks around it, and jumping from place to place to place means the head has to find each location. Because these are mechanical parts, it's not like RAM; it's just really slow to do that. So this is potentially fine for graph-type data models, but if we can avoid representing our data as a graph, we can potentially do better.

So what we're going to try to do here is build a depth-first search index; I'm not going to talk about breadth-first search much, because I don't think it's that hard to do. Let's see how we could build something like that. Before I get too far into this one, I'd like to thank Systems Design Fight Club for inspiring me a little bit here; I've taken a solution and adapted it a little.

The general idea is that the depth-first search index we're going to build is very similar to a geohash. When we look at a particular comment, say that every child of a given node gets a letter: A if it's the first child, B if it's the second, and if we had a third we could call it C and put another node there. Every time you go one level deeper, that lettering sequence restarts. Then, in our actual index, the ID of each comment is its full path: this node right here is AA because it is the child of A and it itself has the letter A. So what does this actually buy us? Well, say we want to fetch the entire contents of this branch. What is that branch, really? It's everything that sorts at or after AA and before AB, so the range query looks something like AA to AB, and if you look at our comment index you can see that AA, AAA, and AAB all sort right before AB, so we can stop right there. As a result, we can perform a nice, clean range query. Even though the time complexity is still O(log n) plus however many entries we pull after that, this is potentially quite a bit faster because it gives us good disk locality, as opposed to jumping around all over the disk. All we have to do is generate this comment index by appending the comment letters along the chain, and when someone clicks "load more" it should be super easy to make that range query against our database. Keep in mind that I wanted this to use single-leader replication, and something like MySQL I think would work just fine for a database like this.
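
Here is a small sketch of that path-style index (usually called a materialized path); the helper names and the in-memory dict are mine, but the range-query trick is the one just described:

```python
import string

def child_path(parent_path, child_index):
    """Append one letter per level: first child 'A', second 'B', and so on."""
    return parent_path + string.ascii_uppercase[child_index]

print(child_path("A", 0))   # -> 'AA'

# A tiny comment index keyed by path; in production this would be the primary
# key (or an indexed column) of the comments table, e.g. in MySQL.
comment_index = {
    "A":   "top comment",
    "AA":  "reply to A",
    "AAA": "reply to AA",
    "AAB": "second reply to AA",
    "AB":  "second reply to A",
    "B":   "another top comment",
}

def load_branch(path):
    """Everything under `path` sorts contiguously: the range [path, path + '[').
    '[' is the ASCII character right after 'Z', so it upper-bounds any deeper path."""
    lo, hi = path, path + "["
    return sorted(k for k in comment_index if lo <= k < hi)

print(load_branch("A"))    # -> ['A', 'AA', 'AAA', 'AAB', 'AB']
print(load_branch("AA"))   # -> ['AA', 'AAA', 'AAB']
```

Equivalently, for the branch rooted at AA the upper bound is simply the next sibling key AB, which is the AA-to-AB range above; in MySQL this becomes `WHERE path >= ? AND path < ?` on an indexed column, i.e. one contiguous index scan with good disk locality. One caveat with single letters is that each node is capped at 26 children, so a real implementation would likely use fixed-width or multi-character segments per level.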

Okay, so as you can see, guys, we've now gotten to the point where we have an absolute behemoth of a diagram to go through, so I'm going to try to do it nice and slowly. Right here we've got our poster: this is the person who is literally going to write a tweet or a comment, and there are many different things that happen when they do. The first thing I'll start with, at the top, is our user service. I drew one box, but ideally all of these services should be horizontally scaled out, with a load balancer in front of them; I just didn't have enough space to draw it. So we've got our user service and a user database, and I elected to use MySQL there. The reason is that I don't think many changes are being made to profiles, so I'm not too worried about write throughput; I'm more worried about the consistency of user changes, and a transactional, single-leader database like MySQL is perfectly good for that. Single-leader replication, I think, is very reasonable here.

The next thing I'll cover is our follower service. The follower service, or at least the follower DB, is going to be in Cassandra, as we mentioned before. I wanted the user-followers DB, which for a given user stores who follows them, to be our source of truth, the reason being that we can then stream those changes right into our Flink node for eventual delivery of posts. I also noted that we should be sharding on user ID here; if you recall, our schema is literally user ID, follower ID, security permission. Hopefully that makes some sense.
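
For what it's worth, here is a sketch of what that follower table could look like in CQL, issued through the DataStax Python driver; it assumes a running cluster and an existing `social` keyspace, and every name beyond the three columns just mentioned is illustrative:

```python
# Sketch of the follower source-of-truth table. Partitioning on user_id keeps
# "who follows user X" (plus their security permission) on a single partition.
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("social")

session.execute("""
    CREATE TABLE IF NOT EXISTS user_followers (
        user_id             bigint,   -- the user being followed (partition key)
        follower_id         bigint,   -- someone who follows them
        security_permission text,     -- 'all', 'close_friend', ...
        PRIMARY KEY ((user_id), follower_id)
    )
""")

# One row per (followed user, follower) pair.
session.execute(
    "INSERT INTO user_followers (user_id, follower_id, security_permission) "
    "VALUES (%s, %s, %s)",
    (1, 3, "close_friend"),
)
```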

The next part is the post service. Like I mentioned, this one needs to be able to ingest writes very quickly, since posts happen all the time, so I thought Cassandra was the right choice here as well; again, this is something you can shard on user ID.
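
In the same spirit, here is one plausible shape for the post table; the video only specifies sharding on user ID, so the time-ordered clustering column is my assumption to make "recent posts for a given user" a cheap read:

```python
# One plausible post table (illustrative names, same assumptions as the sketch above).
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("social")

session.execute("""
    CREATE TABLE IF NOT EXISTS posts_by_user (
        user_id    bigint,      -- author; also the partition key
        created_at timeuuid,    -- clustering column, newest first
        body       text,
        security   text,        -- 'all' or 'close_friend'
        PRIMARY KEY ((user_id), created_at)
    ) WITH CLUSTERING ORDER BY (created_at DESC)
""")

# "All posts for a given user" is then a single-partition, already-sorted read.
rows = session.execute(
    "SELECT created_at, body FROM posts_by_user WHERE user_id = %s LIMIT 50",
    (1,),
)
```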

The last part of this is the comment database, which, as we touched on just a moment ago, I wanted to use single-leader replication, the reason being that we have causal dependencies between comments and as a result I want writes to be as up to date as possible. Using something like MySQL, I think, will be pretty reasonable. I totally understand if you think the write ingestion isn't going to be fast enough there; if you can think of a single-leader database that uses LSM trees, maybe that would be better; it could be something like HBase, who knows.

play39:05

the thing is that we've got all of these

play39:07

change data captures and this is where

play39:08

we start reaching the middle of our

play39:11

setup because we have all of our Flink

play39:13

nodes so we've got two Flink nodes the

play39:16

first is what I would call like the

play39:18

following Flink

play39:19

node because this is going to help us

play39:22

generate all of our derived data in

play39:24

order to actually make faster queries

play39:27

for things like loading a user's

play39:28

following or uh figuring out who they

play39:31

follow that is verified so if you recall

play39:34

we've got one user service telling us

play39:36

who's

play39:37

verified D do and also who we

play39:43

follow and we can output that into the

play39:46

user verified following cache this is

play39:48

going to make that query as quickly as

play39:50

possible when we actually have to go

play39:51

ahead and load up our Newsfeed

Additionally, we also want a user-following DB, because I want to be able to quickly see how many people I'm following and who they are, so we're going to build that over here as well. I wanted to do that in Cassandra, because I think Cassandra is actually quite quick for single-partition reads, especially if you have a good schema where you make sure to partition everything by user ID; that would look something like user ID and then following ID, the person they're following. Cool, so we have our user verified-following cache; put that in Redis. It's not going to be that much data, so I think keeping it in memory is pretty fair, and you can replicate it as much as you want and shard it out, probably by user ID. I think that's all pretty reasonable.

The other type of Flink node that we have over here is our posts Flink node. This is going to take in new posts via change data capture, it's going to take in users so it can see who's verified (so that verified users' posts in particular can go to the popular post cache), and it's finally going to take in the follower table changes, so that it can say, "hey, user six has users one, two, and three following them, so deliver this post to the corresponding caches," or "user ten is verified, so deliver that post to the popular post cache."
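
Pulling those inputs together, here is a rough sketch of the routing decision this posts Flink node makes; again this is plain Python standing in for Flink, with stubbed-out cache writes, and the security check is the same idea as the earlier fan-out sketch:

```python
# Rough routing logic for the posts Flink node (illustrative only, not Flink APIs).

verified_users = {10}                                              # from user-service CDC
followers_by_user = {6: {1: "all", 2: "all", 3: "close_friend"}}   # from follower-table CDC

def post_visible_to(post_security, relationship):
    return post_security == "all" or relationship == "close_friend"

def write_to_popular_post_cache(author_id, post):
    print(f"popular-post cache <- ({author_id}, {post})")

def write_to_newsfeed_cache(follower_id, post):
    print(f"newsfeed cache of {follower_id} <- {post}")

def on_new_post(author_id, post, post_security="all"):
    if author_id in verified_users:
        # Verified authors' posts are pulled at read time from the popular post cache.
        write_to_popular_post_cache(author_id, post)
    else:
        # Everyone else's posts are pushed into each eligible follower's newsfeed cache.
        for follower_id, relationship in followers_by_user.get(author_id, {}).items():
            if post_visible_to(post_security, relationship):
                write_to_newsfeed_cache(follower_id, post)

on_new_post(6, "hello from user 6")    # fans out to followers 1, 2, and 3
on_new_post(10, "hello from user 10")  # lands in the popular post cache
```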

Now, the last piece of the puzzle is obviously the reader: someone reading their newsfeed. As we mentioned, reading your newsfeed is pretty simple. The first thing you do is hit the feed service. Then you also reach out to the user verified-following cache, because we need to know which verified users we want to load posts for, so that we can hit the proper shards of the popular post cache over here. Once we do that, we get all of the results we need back, and the same goes for our own newsfeed cache, which is just read from one of its replicas or partitions. Then everything gets aggregated on the feed service; you can merge by timestamp, and since it's not going to be more than about a hundred posts, that shouldn't be too hard to do. Then you return it right to the user.
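
A condensed sketch of that read path as the feed service might implement it; the dicts here stand in for Redis and the newsfeed cache, all keys and shapes are illustrative, and the 100-post cap mirrors the number just mentioned:

```python
import heapq

# Stand-ins for the real caches; entries are (timestamp, post) pairs.
verified_following_cache = {7: [10, 11]}          # reader -> verified users they follow
popular_post_cache = {10: [(1700000300, "verified post A")],
                      11: [(1700000100, "verified post B")]}
newsfeed_cache = {7: [(1700000200, "friend post"), (1700000050, "older friend post")]}

def load_newsfeed(reader_id, limit=100):
    # 1) Which verified users does this reader follow?
    verified = verified_following_cache.get(reader_id, [])

    # 2) Pull their recent posts from the popular post cache shards,
    #    plus the reader's own precomputed newsfeed cache entries.
    candidates = []
    for vid in verified:
        candidates.extend(popular_post_cache.get(vid, []))
    candidates.extend(newsfeed_cache.get(reader_id, []))

    # 3) Merge by timestamp (newest first) and cap at roughly 100 posts.
    return heapq.nlargest(limit, candidates, key=lambda entry: entry[0])

print(load_newsfeed(7))
```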

Similarly, there are a few other queries that the user is going to run. One is "who am I following?", which comes from over here; there's also "who follows me?", which comes from over here; there's "what are my user stats, what does my profile look like?", which comes from up top in the user database; there's "what are the comments I'm trying to fetch for this post?", which comes from the comments DB; and finally, of course, there's "what have I posted before, or what have other users posted before?", which would come from our post DB.

Anyway guys, I know this was a very long, in-depth, massive video, and frankly I may have even gone through it faster than I needed to. I don't mean to brush past any parts, and I absolutely want to keep this one as clear as humanly possible, but my main intention was to try to leave no stone unturned. There are definitely a lot of practical considerations in a design like this, and I think it's very easy to make a video that skips over the in-depth details of a lot of them. I could certainly see a lot of people not loving this design just due to the amount that I abuse change data capture, but I also think it's a very good way to ensure that all of your data is actually in sync without having to use two-phase commit or introduce a bunch of partial failure scenarios. As always, though, my solutions aren't perfect; I'm making these up myself rather than just going to other channels and copying them, so please do critique me in the comments section, ask questions, ask anything you want, and I'm happy to defend myself. Of course, you can always get me, and you're probably going to be right. Anyway guys, I hope you enjoyed this video, and I will see you in the next one.


Related Tags
Social Media, Performance Optimization, Scalability, Database Design, Cassandra, Flink, MySQL, Stream Processing, Systems Design, Comment System