2: Instagram + Twitter + Facebook + Reddit | Systems Design Interview Questions With Ex-Google SWE
Summary
TL;DR: This video script outlines a comprehensive guide to building a social media platform supporting services like Instagram, Twitter, Facebook, and Reddit. It covers essential features including newsfeeds, user following, and Reddit-style nested comments. The speaker discusses system design considerations, database choices, and the use of technologies like Cassandra, Flink, and Kafka to ensure scalability, performance, and consistency. The script also delves into optimizing read operations, managing large volumes of data, and handling popular posts from verified users.
Takeaways
- 😀 The video covers building a quad-combo service for Instagram, Twitter, Facebook, and Reddit, focusing on similar features like Newsfeeds and Reddit-style nested comments.
- 🕵️♂️ The plan includes supporting a Newsfeed, user following/followers, and configurable privacy types for posts, with an emphasis on optimizing for read operations due to the nature of social media usage patterns.
- 📈 Capacity estimates are provided, assuming roughly 100 characters (about 100 bytes) per post or comment plus another 100 bytes of metadata, with storage requirements calculated for a billion posts per day and up to a million comments per post.
- 🔄 The use of Change Data Capture (CDC) is proposed to maintain follower relationships and avoid partial failure scenarios, ensuring data consistency without the need for two-phase commits.
- 💡 Derived data and stream processing frameworks like Kafka and Flink are recommended for keeping data in sync and ensuring no messages are lost, even in the event of a failure.
- 🛠️ Cassandra is suggested as the database of choice for the user followers table due to its high write throughput and the ability to handle write conflicts naturally by merging data.
- 🔑 The importance of proper partitioning and sorting in databases is highlighted to ensure fast query performance, especially for operations like deleting a follower or loading a user's Newsfeed.
- 📱 For Newsfeed optimization, the video discusses the concept of caching every user's Newsfeed on powerful servers to provide a fast reading experience, even considering the asynchronous nature of data updates.
- 🔍 A hybrid approach is considered for handling popular posts from verified users with many followers, using a combination of direct database reads and caching strategies to manage the load.
- 🗣️ The script touches on the implementation of security levels in posts, suggesting storing security permissions within the followers table and allowing Flink to manage these permissions when delivering posts to caches.
- 🌐 Finally, the video addresses the complexity of implementing nested comments, proposing a depth-first search index similar to a geohash for efficient range queries and good disk locality.
Q & A
What is the main focus of the video?
-The video focuses on building a system that supports features for Instagram, Twitter, Facebook, and Reddit, including newsfeed and Reddit-style nested comments.
What are the key features planned for the system?
-The key features include a newsfeed, support for Reddit-style nested comments, quickly loading who a user is following and who follows them, getting all posts for a given user, low latency newsfeed, configurable privacy types, and optimizing for read operations.
Why is optimizing for reads important in the context of a social media site?
-Optimizing for reads is important because the majority of user interactions on social media sites involve reading or 'lurking' rather than posting, making read operations more frequent.
What is the estimated storage requirement for a single post and how does it scale up to yearly storage for a billion posts per day?
-A single post is estimated to be around 200 bytes, including metadata. With a billion posts per day, this could lead to approximately 73 terabytes of storage per year.
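The arithmetic behind that estimate can be checked directly; this is a quick sketch using the video's round numbers (200 bytes per post, one billion posts per day):

```python
# Back-of-the-envelope storage estimate using the video's round numbers.
BYTES_PER_POST = 200            # ~100 bytes of text + ~100 bytes of metadata
POSTS_PER_DAY = 1_000_000_000   # one billion posts per day

bytes_per_day = BYTES_PER_POST * POSTS_PER_DAY   # 200 GB/day
bytes_per_year = bytes_per_day * 365             # 73 TB/year

print(bytes_per_year / 10**12)  # 73.0 (terabytes)
```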
How does the system plan to handle the follower and following relationships in a distributed database setting?
-The system plans to use a change data capture (CDC) method with a single source of truth table and stream processing frameworks like Kafka and Flink to ensure data consistency and avoid partial failure scenarios.
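One reason duplicate delivery is tolerable in this design: if the consumer applies CDC events idempotently, a replay after a Flink checkpoint recovery is harmless. A minimal sketch (names like `apply_follow` and `user_following` are illustrative, not from the video):

```python
from collections import defaultdict

# Derived user_following table: follower_id -> set of users they follow.
user_following = defaultdict(set)

def apply_follow(event):
    """Apply one CDC event from the user_followers source-of-truth table."""
    # A set-insert is idempotent, so replayed duplicates are no-ops.
    user_following[event["follower_id"]].add(event["user_id"])

apply_follow({"user_id": 4, "follower_id": 22})
apply_follow({"user_id": 4, "follower_id": 22})  # replayed duplicate after recovery
print(user_following[22])  # {4} -- the duplicate changed nothing
```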
What database is suggested for handling the user follower table and why?
-Cassandra is suggested due to its high write throughput, leaderless replication, and the use of LSM trees, which allow for fast ingestion and buffering in memory.
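A toy in-memory model of that layout may help: in real Cassandra, `user_id` would be the partition key and `follower_id` the clustering key, so a user's followers live on one partition in sorted order. This sketch only mimics that shape:

```python
import bisect

class FollowersTable:
    """Illustrative model: partitioned by user_id, rows sorted by follower_id."""

    def __init__(self, num_partitions=2):
        self.partitions = [dict() for _ in range(num_partitions)]

    def _partition(self, user_id):
        return self.partitions[hash(user_id) % len(self.partitions)]

    def follow(self, user_id, follower_id):
        rows = self._partition(user_id).setdefault(user_id, [])
        i = bisect.bisect_left(rows, follower_id)
        if i == len(rows) or rows[i] != follower_id:
            rows.insert(i, follower_id)       # keep clustering (sorted) order

    def unfollow(self, user_id, follower_id):
        rows = self._partition(user_id).get(user_id, [])
        i = bisect.bisect_left(rows, follower_id)
        if i < len(rows) and rows[i] == follower_id:
            rows.pop(i)                       # binary-search lookup, then delete

    def followers(self, user_id):
        # Single-partition read; already sorted, no aggregation step needed.
        return list(self._partition(user_id).get(user_id, []))
```

Because a delete first binary-searches within one partition, "X unfollows Y" never requires a scatter-gather query across nodes.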
How does the system handle the issue of popular users with millions of followers?
-For popular users, a hybrid approach is used where posts are read from the Post DB directly, and a caching layer for popular posts is introduced to handle the high volume of followers efficiently.
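The read-time merge in that hybrid approach can be sketched as follows (field names like `ts` are hypothetical; both inputs are lists of post records):

```python
def read_newsfeed(cached_posts, verified_posts, limit=10):
    """Merge the precomputed feed cache with posts fetched at read time
    from verified users (via the post DB or a popular-post cache)."""
    merged = sorted(cached_posts + verified_posts,
                    key=lambda p: p["ts"], reverse=True)
    return merged[:limit]

feed = read_newsfeed(
    [{"ts": 3, "text": "friend's post"}],
    [{"ts": 5, "text": "celebrity post"}, {"ts": 1, "text": "older celebrity post"}],
)
print([p["ts"] for p in feed])  # [5, 3, 1]
```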
What is the proposed method for implementing configurable privacy levels for posts?
-The implementation involves storing additional information in the followers table to indicate the security level of the relationship, which is then used by the Flink consumer to filter posts accordingly.
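A sketch of that filtering step inside the fan-out consumer; the level names and their ordering here are assumptions for illustration, not from the video:

```python
# Higher number = closer relationship. A post is delivered only to
# followers whose stored relationship level meets the post's visibility.
LEVELS = {"public": 0, "followers": 1, "close_friends": 2}

def eligible_followers(post_visibility, followers):
    """followers: dict of follower_id -> relationship level string."""
    required = LEVELS[post_visibility]
    return sorted(f for f, lvl in followers.items() if LEVELS[lvl] >= required)

followers = {1: "followers", 10: "close_friends", 22: "followers"}
print(eligible_followers("close_friends", followers))  # [10]
```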
What challenges arise when considering the replication of comments in a social media system?
-Challenges include maintaining causal dependencies and ensuring that the state of the replicas makes sense, avoiding situations where a comment's child exists on a replica but not its parent.
How does the video script address the problem of reading nested comments efficiently?
-The script suggests using a depth-first search index, similar to a geohash, which allows for range queries to efficiently retrieve entire branches of comments.
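The idea can be sketched with fixed-width path segments as keys: lexicographic key order then equals depth-first order, so an entire branch is one contiguous range. The four-digit segment width is an arbitrary choice for illustration:

```python
# Each comment's key concatenates fixed-width child indexes along its
# path from the root, geohash-style. Sorting keys lexicographically
# lays each subtree out contiguously, giving good disk locality.
comments = {
    "0001":         "top comment A",
    "00010001":     "reply to A",
    "000100010001": "reply to that reply",
    "0002":         "top comment B",
}

def load_branch(index, key):
    # In a real store this would be a single range scan over the prefix.
    return [text for k, text in sorted(index.items()) if k.startswith(key)]

print(load_branch(comments, "0001"))
# ['top comment A', 'reply to A', 'reply to that reply']
```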
What is the overall architecture of the system presented in the video?
-The system architecture includes services for user management, follower relationships, post management, and comments, with databases like MySQL, Cassandra, and potentially a graph database or a depth-first search index for comments, all interconnected through Flink nodes for stream processing.
Outlines
📺 Introduction to Building Social Media Services
The speaker introduces a video tutorial on constructing four similar social media services: Instagram, Twitter, Facebook, and Reddit. The goal is to build these services within an hour, covering features like Newsfeed and Reddit-style nested comments. The speaker also mentions personal anecdotes, such as needing a haircut and having eaten a lot, indicating a casual and humorous tone. The importance of optimizing for read operations due to the read-heavy nature of social media use is emphasized, along with initial capacity estimates for posts and comments storage.
🔍 Database Design for Efficient Follower and Following Operations
This paragraph delves into the challenges of database design for efficiently querying followers and followings. The speaker discusses the limitations of traditional indexing and the benefits of using change data capture (CDC) to maintain consistency without resorting to two-phase commits. The use of stream processing frameworks like Kafka and Flink is proposed to ensure no data loss and to update derived data. The choice of Cassandra as the database is justified due to its high write throughput and the use of leaderless replication and LSM trees.
🛠 Optimizing Newsfeed Generation for Social Media
The speaker outlines the process of generating a Newsfeed, discussing the naive approach of aggregating posts from a sharded database and the more optimal method of using Flink to manage user-following relationships and post deliveries. The importance of caching entire Newsfeeds in memory for quick access is highlighted, along with the potential use of multiple replicas to distribute the load. The challenges of delivering posts from popular users with millions of followers are also touched upon.
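The sizing argument behind caching every user's newsfeed can be reproduced with the video's figures (host count is approximate and ignores replicas):

```python
FANOUT = 100                    # average followers per user -> copies per tweet
TWEETS_PER_DAY = 1_000_000_000
BYTES_PER_TWEET = 200
HOST_RAM = 256 * 10**9          # one "super beefy" 256 GB cache host

total_cached = FANOUT * TWEETS_PER_DAY * BYTES_PER_TWEET  # 20 TB/day of copies
hosts = total_cached / HOST_RAM                           # ~78, round up to ~80-100

print(total_cached / 10**12, round(hosts))
```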
🔄 Hybrid Approach for Newsfeed Caching
The paragraph introduces a hybrid approach to handle Newsfeed caching, especially for popular users who may have a high volume of followers. The speaker suggests using change data capture to update both the Post DB and a popular post cache asynchronously. The process involves Flink consumers partitioned by user ID to manage data efficiently, ensuring that updates to posts and security permissions are propagated correctly.
🗨️ Designing for Nested Comments in Social Media
This section focuses on the complexities of designing a system to handle nested comments, like those found on Reddit. The speaker discusses the trade-offs between breadth-first and depth-first search approaches for loading comments and the challenges of using non-native graph databases. The limitations of binary search in databases and the advantages of native graph databases with pointers for faster depth-first search are explained.
📚 Implementing Depth-First Search Index for Comments
The speaker proposes a depth-first search index for comments, inspired by geohashing, to improve the performance of loading nested comments. The method involves creating a comment index based on the full path of comments, allowing for efficient range queries to retrieve entire branches of comments. The use of single-leader replication for the comment database is justified to maintain causal dependencies and ensure up-to-date comment data.
🌐 System Architecture for Social Media Services
The final paragraph presents a comprehensive system architecture diagram for the social media services discussed in the video. It includes services for user management, follower relationships, post management, and comments, each with their respective databases and data flow managed by Flink nodes. The architecture aims to balance consistency, speed, and scalability, with a focus on efficient data processing and caching strategies.
Keywords
💡Quad combo video
💡Newsfeed
💡Nested comments
💡Change data capture (CDC)
💡Stream processing
💡Cassandra
💡Partitioning and sorting
💡Load balancing
💡Hybrid approach
💡Graph database
💡Depth-first search index
Highlights
Introduction of a quad-combo video covering the development of four similar services within an hour.
Building services for Instagram, Twitter, Facebook, and Reddit with shared features like Newsfeed and Reddit-style nested comments.
Supporting quick loading of followers and followings, and fetching posts for a given user with an efficient solution.
Challenges of creating a low-latency Newsfeed and the importance of optimizing for reads over writes in social media platforms.
Requirement of supporting configurable privacy types for posts as requested by users.
Estimation of storage capacity for a billion posts per day and the implications for database design.
The use of change data capture (CDC) to maintain follower relationships and avoid partial failure scenarios.
Discussion on the choice of database for follower relationships, leaning towards Cassandra for its write throughput.
Explanation of partitioning and sorting keys in Cassandra for optimizing user follower table queries.
Innovative approach to Newsfeed generation using in-memory caches for each user's Newsfeed.
Use of Apache Flink for processing streams of data and updating Newsfeed caches.
Addressing the challenge of popular users with millions of followers and the introduction of a hybrid Newsfeed approach.
Implementation of a depth-first search index for nested comments to optimize read efficiency.
Comparison between native and non-native graph databases for handling nested comments.
Design of a single-leader replication system for comments to maintain causal dependencies.
Final system design overview connecting all services and databases for a comprehensive understanding.
Call to action for feedback and critique on the presented system design.
Transcripts
hello everybody and welcome back to the
channel today we'll be doing a quad
combo video of four different Services
which we're all going to build in one
video cuz they're all pretty damn
similar so hopefully we can get through
it within an hour because I have to get
a haircut after this and even besides
that I've done myself the absolute favor
of eating a ton of chicken wings and
protein shakes today and at some point I
may need a 15 to 20 minute break to go
spill out my guts anyways let's go ahead
and jump into this thing cuz I need to
get started all right so let's go ahead
and dive on into it so today we are
going to be doing Instagram Twitter
Facebook and Reddit well that's
obviously a lot so let's actually talk
about the features that we plan on
supporting I assume all of you use at
least one of these things so you'll
probably understand what I'm talking
about when I say we're going to try and
support a
Newsfeed and in addition to that we're
also going to be supporting Reddit style
nested comments where the comments
themselves basically form a tree you can
have some top comments and then you can
also have load more buttons that are
going to quickly fetch basically
the next branch of comments below that
so hopefully that makes sense let's go
ahead and move on to some
requirements so we've got a few
different objectives in a problem like
this oh boy my throat is already
starting to get sore but I'm going to
push through it so the first is that of
course we always want to be able to
quickly load who we're following and who
follows us those are just two common
features of all of these applications
additionally we want to be able to get
all the posts for a given user I say
this because when you see our eventual
solution this isn't necessarily
something that comes for free with that
we need to be able to support this as
well eventually as well we want to be
able to support a low latency Newsfeed
making a Newsfeed is easy making it
quick is hard there are a lot of talks
from Twitter Engineers to prove this
number four is that we want to be able
to support configurable privacy types
this isn't that hard but someone did ask
for uh for this in the comments of the
last version of this video so I figured
screw it why not throw this one in there
and then they also asked for Reddit
style comments where they can be
infinitely nested now if you're thinking
about the use patterns of something like
a social media site hopefully it makes
sense that 90% of the time that you're
on there you're probably just lurking in
reading stuff and you're posting pretty
infrequently so as a result the main
thing we want to keep in mind here is
that we're going to be aiming to
optimize reads as opposed to writes that
is going to be very
important okay so let's start thinking
about some capacity estimates let me
zoom in a little bit here so the first
thing is that I'm going to assume there
are 100 characters per post get that one
from Twitter even though it's 140 let's
do 100 for making the math a little bit
easier round 100 bytes because you know
a character is basically a byte and then
additionally let's estimate that you
know other metadata like user ID uh post
time stamp stuff like that adds another
100 bytes so maybe we can assume that
there's 200 bytes in a single post
additionally if we have a billion posts
per day which is actually pretty
realistic for some of these sites you
could be looking at uh 73 terabytes per
year of storage that's a lot it's
actually not a ton when you're a
platform like this and you make a ton of
money through ads 73 terabytes a year is
pretty little but we are going to need
more storage than that we'll discuss
that later additionally let's assume
that the average user has around 100
followers I personally have no followers
I like to just tweet into the abyss and
that there are some verified users with
Millions for example myself all the
women in the world follow me next we
also have comments I'm going to limit
those to 100 characters for Simplicity
sake and again that means they're
probably around 200 bytes in total to store
that and uh also when we start to think
about comments and making them
infinitely nested it's important to have
a sense of you know how much data
comments for a post actually take I'm
going to assume there can be up to a
million comments per post and that means
up to 200 megabytes of storage required
because a million times 200
bytes okay so the first thing that we're
going to talk about is starting to fetch
our follower/following so the issue
here is that we want both of these
operations to be very fast right we have
this one over here where we get the
followers for a specific user and then
we also want to get all of the users
that a specific user is following and we
want both of those to return quickly now
the issue with this is that if you were
to choose a database table that is
indexed in a manner such that it is
either by the user ID for all of their
followers or the user ID for all of
their followings that is going to be
very slow for the other type the reason
why is that these queries are going to
be distributed so if we look over here
on the right side of the screen let's
imagine that we used this type of table
over here right where we've got one user
and then a follower ID and that
represents a following relationship and
then we go ahead and index on that user
ID so you can see the user ID field is
sorted that's why fours are at the
bottom six is at the top and then let's
say we go ahead and partition also on
that user ID because they're going to be
a ton of follower following
relationships we're not going to be able
to store them all on one database table
and so that's going to be really good
when we want to find all the followers
of one particular user however it's
going to break down in a distributed
setting when we want to find all of the
users that a user follows so let's say
we wanted to find all the users that
user one follows so that would be this
guy over here as you can see because we
are partitioning on basically the person
that has a following getting all of a
person's uh users that they follow is
going to be very challenging because
we'd have to do a distributed query we
wouldn't have any indexing within those
nodes we would have to do a linear time
sort or rather a linear time scan to go
through every single row on here and
every single row on here and then we'd
have to aggregate them somewhere on some
other server and then we would have to
return that back to the user and that
probably just isn't feasible so what
I've opted to ultimately do instead is
use an actual derived data change data
capture type of method the reason I opt
for something like change data capture
here with one source of truth table is
the fact that it helps us avoid partial
failure scenarios because keep in mind
that if you're a client and you're
writing to two different databases at
once one this is one database here's the
other database you know one of these
writes could fail one of them could
succeed and really the only way to
guarantee that doesn't happen is going
to be something like two-phase commit
and that's going to be really slow so if
we really want to ensure consistency
another thing that we could do is use
something like change data capture and
then use one of our coveted stream
processing Frameworks to ensure that
none of those writes get lost you can
actually use one of the tables as shown
over here and you would use CDC from it
to basically go into something like
Kafka the reason Kafka is good is
because it's replicated so it's fault
tolerant and it is also basically a
persistent log so we know that if a
message uh doesn't get processed in the
moment we can always go back and process
it later and then we've got something
like Flink which if you recall is going
to checkpoint State
occasionally and this is going to make
sure that we're never not processing a
single message and then we can basically
Al take it and update our other table
and this is going to be our derived
data so you may notice that I've chosen
to have the user followers table be the
source of truth that means for user X
who
follows X and if you recall that schema
looks something like we've got our user
over here and then all the people that
follow them on the right and we index by
that user ID and we can partition on
that as well so then of course we've got
Flink listening to all of those
different partitions and then uploading
another table keeping it in check now
you may think to yourself well it is
possible that Flink uh in a failure
scenario might upload duplicate messages
to the user following table that's
really not a huge deal because at the
end of the day a duplicate upload can
just be deduped right like if I have 4 and
22 and then I have 4 and 22 again I
could just say well if it's already in
there just don't add it not a huge
deal okay so now that we have kind of an
idea of how we're actually going
to maintain our follower relationships
within databases what type of actual
database should we be using to do so so
of course we do want to have good read
speeds and good write speeds for these
tables especially for this guy over here
because it doesn't have any buffering
before the writes get there it's
basically just taking all the writes as
they come in so it's important that we
also do have some fast ingestion here so
in my opinion I think something like
Cassandra would be really good Cassandra
is very good in terms of its write
throughput for a couple of reasons one
of which is that it uses leaderless
replication so writes can go to any
replica another is that it uses LSM
trees so writes are first buffered in
memory but the point here is that like I
mentioned we don't really have conflicts
or write conflicts to worry about in the
relationship of kind of a user and their
follower because at the end of the day
you're just merging them all together
right like if I say that a user has one
follower on one replica and then I say
that a user has a follower B on another
replica the kind of combined state
of those two is just great now this
user has two followers so again write
conflicts not an
issue and this was unlike our tiny URL
video where they were certainly an issue
so again this is good for the reasons I
mentioned over here it's going to be
fast and basically what should our
partition and sort key be if we are going
to have a database like this especially
for our user follower table because
that's where we really care about the
latency so my thinking here is well the
obvious thing to partition on would be
the user right because we're already
indexing on there we want all of the
user's followers to be on a single
database node or a single database
partition because that is going to make
the queries a lot faster we don't have
to do any you know aggregations after we
hit multiple partitions and in addition
to that by actually sorting on the
follower following ID this just means
that if we ever need to delete a row say
a follower stops following a person we
can go hit that partition quickly find
their follower and then go delete that
row
so you know the example would be here's
partition one here's partition two you
may notice that uh obviously we're not
using just range based sorting or else
uh number six would be on this partition
and number 18 would be on the other
partition we're using a hash range
sorting or rather a hash range
partitioning because ideally that is
going to load balance our tables a
little bit better keep the partitions
more balanced consistent hashing yada
yada yada you guys get it by now if you
don't I recommend watching the tiny URL
video and the consistent hashing video
for some more details there so we've
spoken about how we actually want to go
ahead and make our follower and
following tables how we're using derived
data and change data capture to keep
those up and it's very important that we
actually have those in check because
they are crucial for actually
maintaining a Newsfeed the reason why is
that for any given user we need to know
who follows them if I post it's very
important that basically uh all of my
followers are going to see my my post
and so we need to be able to do that and
then of course once we actually go and
take a person and basically all of who
they follow then you have to generate
all the posts for them so kind of the
naive way of doing this is you've got a
client right this is the reader of
Newsfeed the first thing they could do
is potentially hit the user following
table now to clarify that's just who X
follows so that's going to respond with
some people and then they're going to go
and reach out to the post DB which is
probably going to be sharded on user ID
because that is the most rational way of
doing things and then it's going to
aggregate them all on one server blah
blah blah blah
blah and return it back to the user and
the reason I said this was naive is
because all of those distributed queries
and then the aggregation step is
probably just going to be too slow
you're basically bottlenecked by
whatever the slowest query could
possibly be and that is a bad thing and
hence this this is why I call this the
naive way of building the news feed in
reality what we would like to do is put
in a little bit more work on the right
path so that we can speed up our read
path a little bit more so let's actually
start to talk about that here's what I
would call the more optimal Newsfeed so
the first thing is that we should
recognize that if we really want to make
a news feed as fast as possible we would
somehow have to index all of the tweets
by which news feed they belong to and
the issue with that is that you can't
obviously do that because it belongs to
many different news feeds in fact we
even said it belongs to around a 100 of
them because the average user has around
100 followers so the issue is that you
know because there are so many different
places to put this tweet we would have
to end up storing a ton of data well how
much data actually we estimated there
were around a billion tweets a day and
there were around 200 bytes a tweet and
what if we did actually store 100 copies
of every single tweet and put it in a
different index well we could and then
we would actually only have 20 terabytes
of tweets per day which if if you think
about it is really not that bad for a
company like Twitter I mean they have
literally millions of dollars of servers
20 terabytes is nothing for them and
especially if we want to make this
really fast and throw it in memory let's
say we had 256 GB super beefy hosts
where they actually have that much RAM
in them that's only around 80 maybe we
can round that up to 100 in memory
caches and so effectively what we can
actually do is go ahead and cache every
single user's Newsfeed and we can do it
on one of these super beefy servers
maybe we've got a couple of replicas of
them so maybe instead of 80 it's more
like 200 but that's okay again these are
massive sites they make lots of money
200 massive servers is not going to make
or break things for them so let's
actually start looking at the newsfeed
diagram or at least the initial sketch
of it again this is going to be a lot to
ingest so no worries if it takes you a
little bit to uh put this all down you
can always pause the video and go back
for a second so let's say that we've got
client six over here the first thing
they're going to do is make a post
that's going to hit our post database
note that again I'm using change data
capture here because it keeps everything
consistent and we don't have to worry
about two-phase commit that is then
going to be ingested into a Kafka queue
for the same reasons that I mentioned
before this gives us fault tolerance and
it gives us replayability which is very
important for this guy here Flink so in
this case as opposed to just ingesting
one stream of data this Flink consumer
is going to be getting data from two
sources the first is going to be the
user followers table which if you recall
from before is going to be that source
of Truth which means that whenever a
following is done it goes right into
this table and so what's going to now
happen is that this user or rather this
Flink consumer can cache who follows user
six pardon me as I accidentally erase
when I mean to write but the gist is
that as you can see Flink is going to
look at all of those user follower
relationships and it can actually Shard
itself on that user ID of this table so
keep in mind this table is user ID
follower
ID and the good thing about that is that
these guys can be sharded the same way
because keep in mind post DB is also
sharded by user ID and so now Flink or a
particular Flink consumer only has to
hold a subset of this data and so it can
keep it all in memory and so let's see
now Flink says okay well we know that uh
this post has to be delivered to user
one user 10 user 22 and 44 and then it's
going to go ahead and write to the
appropriate feed caches we can load
balance and partition these feed caches
such that they each have certain ranges
of users that they represent and then
the Flink consumer can figure out where
it needs to send it it'll probably have
to reach out to zookeeper hit a load
balancer and then boom there you go now
we're in some caches so let's quickly
look at a couple of our notes to clarify
things basically Flink can uh go ahead
and keep the state of all the followers
as I mentioned and the reason that this
is so important that it keeps this state
is because it is a major optimization on
having to make a network call to the
database every single time saying well
who does user 6 actually have as a
following that would be very slow and in
addition to that it would be a ton of
repeat computation additionally as new
followers come in we're actually
streaming that data up to Flink via
change data capture so it is up to date
we don't have to worry about that and
then again we talked about sharding
we're basically sharding on the poster
ID which is the user ID here and the
user ID which the post DB is also
sharded on so this Flink consumer should
actually be able to handle all of
this data because we can partition it in
a Smart Way such that there's not enough
to overload it Okay so we've spoken
about that and then Additionally you may
be thinking to yourself well this works
great for new tweets but what if we want
to edit a tweet what if we want to
potentially update security permission
changes on it what if that tweet all of
a sudden gets new followers well keep in
mind that all of this data is actually
flowing through into this Flink consumer
and then it can do the necessary logic
to send those updates to the required
caches now of course keep in mind this
is pretty expensive right objectively it
is now taking a tweet a very long time
to get into one of these Newsfeed caches
but at the same time it doesn't really
matter if a tweet takes a while to get
there because all that matters is the
user sees their tweet hit the post DB
and then they think they're good and
then maybe 5 minutes later it gets into
everyone's caches but they don't have to
know that it's all asynchronous and at
the end of the day this is going to make
everyone else's reading experience a lot
better and they're going to keep coming
back to our site so let's actually go
ahead and talk about our post database
and schema because I kind of brushed
over that we haven't really mentioned it
very much yet and it's important that we
do keep this fast because in the case of
the user followers table it's actually
the same thing if writes are going
directly to this guy it needs to be able
to ingest them quickly or else it will
become a bottleneck in our system so for
the same reasoning as before I again
wanted to use Cassandra but this time I
wanted to partition it in a slightly
different way where I would Partition by
the user ID so that you keep all posts
from the same user on a given node and
then within a given partition we want to
use the sort key as the time stamp or
rather the time stamp as the sort key
and this means that when we run a query
let's say we've got 69 and then what's
today's day we've got
11/20/2023 now when I load all of these posts
they're already going to be in presorted
order and that's going to keep our query
very fast for when we say get
posts of a specific user and again
that's what I mentioned over here it is
going to make sure that basically
partitioning for a given user is nice
and fast and as a result of that that
endpoint will be
reasonable Okay so we've gone over our
actual table schema for the Post DB
which ensures that if we do want to load
posts for a given user that should
hopefully be decently fast it should all
be on the same partition it should
already be presorted by timestamp which
is great however we do come across one
big problem which is going to be popular
users so I did did mention that some
users are verified and they're actually
going to have millions of followers so
the problem with this is that our
previous step of basically uploading all
of our posts using change data capture
and then basically using a bunch of
different Newsfeed caches as our sync is
going to fail the reason being that when
you have millions of followers that post
has to get delivered to many many
different places and that is going to be
heinously slow so what could we
potentially do instead? What we're going to try to do is use a hybrid approach. In this case, let's imagine that we've got our newsfeed reader. Our bad approach would be to exclusively go to the post DB and aggregate; our ideal approach is that we exclusively read from the feed path. But what we could do instead is say: only for the verified users do we read from the post DB, and for the non-verified users we read from our feed cache, and then we aggregate those. Ideally this will result in a lot fewer partitions of the database that we'll have to hit; but even still, we are going to hit a bunch of the partitions of that database, and that could still be too slow. So how could we make this a little bit better?

Well, what we could do is introduce a caching layer for our popular posts. The nice thing is that when it comes to popular posts, or rather posts from popular users, we actually know in advance that they're going to be popular. If someone with a million followers makes a tweet, we know it's probably going to get at least 10,000 to 100,000 views within the first few minutes, so when that person makes a post, we can pre-load a cache so that everyone else can load from it. What this is going to look like is actually very, very similar to our previous setup to load all of our newsfeed caches.
For example, here's me; as I mentioned, all women on Twitter follow me, they love me. The first thing that we do is put the post in our post DB; this is going to be common amongst all tweets. The next thing is that it goes over CDC (I forgot to write out the Kafka queue here, but hopefully that makes sense) into Flink. Similarly, we've got our users table over here, which contains, for a given user ID, whether that user is verified or not, and again, over change data capture, this can go ahead and hit Flink. We can again shard this by the user ID, so that when a post from Jordan hits Flink, Jordan's verified status is also going to be in Flink, and as a result Flink can say: oh, Jordan is in fact verified, all the ladies love him, let's go and put one of his posts in the popular-post caches over here.

By using this style we accomplish a couple of things. The first is that we don't have to use a traditional write-through caching approach, where we would either use two-phase commit to ensure cache consistency, or accept a partial-failure scenario where our cache gets updated but our database doesn't, or the write goes through to the database and not the cache. This makes sure that everything is consistent and keeps everything asynchronous. It also allows us to see whether or not a given user is verified when they actually make a tweet. And again, the nice thing about this is that post edits will also go back through here, back into Flink, and then we can update our popular-post cache. Assuming that we shard these caches by user ID, we know exactly where a given popular user's posts are going to live, and that makes our life super easy. So what do things actually look like again? Well, now the gist is that if we want to use our hybrid solution, we would basically also read from the cache.

Okay, let's figure out where I'm at, because now even I'm starting to lose track.
So the next thing that I want to touch upon here is the following concept. We have mentioned that we want to read all of our popular posts from a popular-post cache. However, a challenging part of this is that for a given reader, who goes to the newsfeed service and then to the popular-post cache and the newsfeed cache, the only way that we actually know what to read from the popular posts is by figuring out who this guy follows that is verified. That in and of itself could be a bit of a tough query, because even though we already store all of our follower/following relationships, and that's decently fast, in order to actually get a sense of whether a particular person that I follow is verified, I would have to join that on the user table, and that could of course become slow.

So how can we do this? Well, again, we can use derived data. You might start to figure out a pattern here, which is that I do love me some stream processing, and I love using derived data, because derived data allows us to pre-compute data in the most optimal format so that we can make things as fast as possible. How could we actually do this? Yet again, we could use another Flink node. We could have our users table, which already exists, sharded by user ID, and we could have our user-following table, which, if you recall, is already a piece of derived data; so we're actually now using change data capture on derived data. By merging these two things into two different streams that go into the same Flink node, and partitioning properly, we can tell, for a given user, not only who they follow but whether those people are verified. From the user-following table I can say, for example: I know user 10 follows user 3 and user 22. From the users table over here I can say: well, I know user 3 is verified.
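Here's a toy sketch of that Flink-style join (plain Python, not the real Flink API; names are illustrative). Both CDC streams are keyed by the followee's user ID, so one consumer holds the verified flags and the follow edges for the same users, and can emit only the verified followees into the cache:

```python
# Toy stand-in for a keyed two-stream join in a Flink consumer.
def build_verified_following(user_events, follow_events):
    verified = set()   # state built from the users-table CDC stream
    cache = {}         # follower_id -> set of verified followees

    for user_id, is_verified in user_events:
        if is_verified:
            verified.add(user_id)

    for follower_id, followee_id in follow_events:
        # only verified followees land in the derived cache
        if followee_id in verified:
            cache.setdefault(follower_id, set()).add(followee_id)
    return cache

# user 3 is verified, user 22 is not; user 10 follows both
cache = build_verified_following(
    user_events=[(3, True), (22, False)],
    follow_events=[(10, 3), (10, 22)],
)
print(cache)  # {10: {3}}
```

A real Flink job would interleave the two streams and keep `verified` in keyed or broadcast state, but the resulting derived data is the same.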
And you might say to yourself: ah, shoot, it seems that the users table actually has to put verified users on every single Flink consumer. My argument to you here would be that there aren't that many verified users; I can't imagine there would be more than like 10,000 of them, and a 10,000-person set is really not a big deal. The gist is that, as a result of having this small verified set in memory on every single node, we can quickly tell: oh, you know what, user 10 is actually now following user 3, that person is verified, let's go ahead and upload to the cache over here. Now we can see that user 10 is following verified user 3, and we can use this as a cache to figure out my verified following. For example, if I'm following MrBeast, Donald Trump, and Obama, they would all be in this verified cache. Cool.
So let's quickly touch upon security levels on posts, because this was a specific request for this video and I would like to honor that. Let's say that a user can specify whether a post is for all of their followers or only, let's say, close-friend followers. We'll keep it simple and say there are only two configurable levels right now; however, this kind of scales out to three or four or five levels of potential security anyway. The easiest way to implement this, at least in my opinion, is just by putting all of this information in the followers table. We already have our user-followers table, which defines the relationship these users have with one another. For example, we've got user 1, follower 2, meaning 2 follows user 1, and you can see that their security level is "all". Same goes for here: user 3 follows user 1, and their relationship is "close friend". Recall that in Flink (oh boy, nice voice crack there) we actually have access to this data, so it would say something like: user 1 has follower 2 at level "all", and follower 3 at level "close friend". So when a post comes in marked "close friend", Flink can say: ah, you know what, this one is actually only going to that follower, not to everyone. As a result, we can continue to use our existing posting pipeline just by storing this data as well within our Flink consumer. Now, it is unfortunate that changes to the specific close-friend level of a follower or following will take a while to propagate through our pipeline, and the same goes for a post, but they do all eventually go through to this Flink consumer, which can then update the caches accordingly.
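The fanout filter inside that Flink consumer could be sketched like this (a minimal illustration; the function name and the level strings "all" / "close_friend" are my own labels, not anything from the actual design):

```python
# Decide which followers a post fans out to, based on its security level.
def deliver_to(post_level, followers):
    # followers: list of (follower_id, relationship_level) rows,
    # mirroring the followers table streamed in via CDC
    if post_level == "all":
        return [fid for fid, level in followers]
    # a close-friends post only fans out to close friends
    return [fid for fid, level in followers if level == "close_friend"]

# user 1's followers: user 2 at level "all", user 3 at "close friend"
followers_of_user1 = [(2, "all"), (3, "close_friend")]
print(deliver_to("close_friend", followers_of_user1))  # [3]
print(deliver_to("all", followers_of_user1))           # [2, 3]
```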
And so that is going to make life a little bit easier. It's expensive and it's asynchronous, so if there's a post out there that you already made, and you're like, "oh shoot, that wasn't meant for everyone, this was meant for close friends," changing that level is not going to take effect instantly. That is a trade-off here, but it is worth noting.

Okay, so this is the part of the
video that I definitely want to focus on a little bit, because I think it makes this video unique; I haven't really seen too many others on the internet that focus on nested comments all that much. So let's talk about them. Basically, we want to optimize for reading nested comments, and the question is: how can we actually partition those? Let's start with the easy part of this setup. Keep in mind that when we did our capacity estimates for this video, we said that per thread there's probably around 200 megabytes of comment data. The good thing is that 200 megabytes is actually very little, which means we can keep all of the comments, even on our most popular threads, on a single node. That is huge, because it means we don't have to do cross-partition queries: all we have to do is shard by our post ID, and we should be good there as far as partitioning goes.
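As a quick sanity check on that 200 MB figure, here's the back-of-the-envelope arithmetic. Both inputs are assumptions in the spirit of the capacity estimates from earlier in the video (roughly a million comments on a worst-case thread, around 200 bytes per comment):

```python
# Back-of-the-envelope: does a worst-case comment thread fit on one node?
comments_per_hot_thread = 1_000_000   # assumed upper bound for a viral post
bytes_per_comment = 200               # assumed text + metadata per comment

total_bytes = comments_per_hot_thread * bytes_per_comment
print(total_bytes // 10**6, "MB")  # 200 MB: easily a single partition
```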
The next question is: what about replication? Replication is a little more interesting, because we can actually have causal dependencies between comments, and things wouldn't make sense in certain scenarios. Let's say we have a multi-leader database, as you can see right here: here's leader one, here's leader two. Say one guy makes a comment on leader one, over on the left, and then a second guy reads it and responds to it on leader two. Now the issue is that the state leader two is in doesn't make sense: it has a comment that is a child of a comment that doesn't exist. That would obviously be problematic. Maybe when they sync up, things will be okay, but anyone who reads from this replica in the meantime is going to take a look at that and be like: I have no idea what's happening, this doesn't make sense. For this reason, I think I would opt for single-leader replication here. Could you maybe get away with quorum consistency in a multi-leader setup? Maybe, but at the end of the day some of the replicas still might not make sense, and that would be a problem.

Okay, so let's actually think
about our nested comments a little more abstractly. If we have nested comments, that's going to be a tree, right? This could be comment one, this could be the kid of comment one, this could be the kid of the kid of comment one, and here's another kid of that one; so we have this whole tree right here. Of course, depending on what site you're on, these comments are going to load in a different order. Some sites load a level at a time and show you the next comments that you could click on; others, like Reddit specifically, will load a branch at a time, where when you click "load more" it loads, say, these two right here, and that's more of a depth-first search. The other way that I just described is a bit more of a breadth-first search, or at least it shows you everything at a particular level so that you can click into it.

Now, I personally think the breadth-first search approach is pretty easy to implement, because you're basically just saying: this comment points here, this one points here, and every time we land on a comment, we do a query in our database for everything with a given parent ID, something like "WHERE parent_id = X", and then you just index on that parent ID, and that makes it nice and fast. I want to make this video a little bit hard and make everyone think a little, so I'm going to choose to think about this from the other perspective, where we're trying to get a depth-first-search-style single branch of comments at a time, because that's a little bit harder to do in a fast way. So let's actually go ahead and think about
this. One approach that we could possibly take is a graph database. How would this work? Well, if you haven't heard too much about graph databases in the past (I have spoken about them on this channel), I'll give a quick recap. There are basically two different types of graph databases: one is called a native graph database, and the other is a non-native graph database. Let's talk about non-native first, which I've illustrated over here on the left. A non-native graph database, especially in the scenario of comments, would look something like this: every single node has an ID, it's got a parent ID, and then it's got all of the things that it contains as its data.
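That non-native layout is just a flat table with an index on `parent_id`. Here's a minimal sketch (illustrative Python; I'm using a sorted list plus binary search to play the role of the index), which makes the upcoming complexity argument concrete: each child lookup is a binary search over the whole table, so it's O(log n) in the total table size, not the branch size:

```python
import bisect

# Flat adjacency table of (parent_id, comment_id, text) rows,
# kept sorted by parent_id (0 stands in for "no parent" / root).
rows = [
    (0, 1, "root comment"),
    (1, 2, "first reply"),
    (1, 3, "second reply"),
    (2, 4, "reply to the first reply"),
]
parent_ids = [r[0] for r in rows]  # the "index" on parent_id

def children(parent_id):
    # binary search over the ENTIRE table: O(log n) per hop,
    # which gets slower as the table grows
    lo = bisect.bisect_left(parent_ids, parent_id)
    hi = bisect.bisect_right(parent_ids, parent_id)
    return [r[1] for r in rows[lo:hi]]

print(children(1))  # [2, 3]
```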
The issue with this, as I've already kind of mentioned, shows up when you want to do a depth-first search. You would say: okay, I've got my ID over here (sorry, I can't really draw on the left side of the screen), now find me all of the nodes with parent ID 1; sorry, I indexed this guy incorrectly, but the point is you would say "find me all IDs with parent ID 1," and then you would get these two, and you could continue to depth-first search as you please. The problem is this: when you have an index on a particular field, the time complexity of finding an element by that field's value is O(log n), because we're actually binary searching the table. And the problem with binary searching a table is that it gets slower as the table gets bigger. So even if the branch of comments that you're looking for stays the same size, the query still gets slower as the table grows, and that in particular is why non-native graph databases are bad.

On the other hand, native graph databases are quite a bit faster, because they actually use pointers on disk. Obviously I've drawn out a tree right here, but the gist is that you can put a pointer to another location on disk and literally jump around; to do a depth-first search, I would just follow those pointers all the way down, and then we're good to go.
Nonetheless, native graph databases are actually not that fast either, the reason being that jumping around on disk is slow. The first thing I covered in my systems design 2.0 series is that a disk looks like this, right? You've got a little wheel, and you've got something that points around the wheel, and to jump from place to place to place means you actually have to seek to that location. Because these are mechanical parts, this is not like RAM; it's just really slow to do that. So this is potentially fine for truly graph-shaped data models, but if we can avoid representing our data as a graph, we can potentially do better. What we're going to try to do here is build a depth-first search index. (I'm just not going to talk about breadth-first search much, because I don't think it's that hard to do.) So let's see how we could do something like that. Before I actually
get too into this one, I'd like to thank System Design Fight Club for inspiring me a little bit here; I've taken a solution and adapted it a bit. The general idea is that the depth-first search index we're going to build is very similar to a geohash. If you think about it, we can give every child of a given node a letter: 'a' if it's the first child, 'b' if it's the second child (if we had a third one we could call it 'c' and put another node here), and every time you go down to the children of a particular node, you restart that sequence. Then, in our actual index, the ID of a comment is going to be its full path. So this guy right here is "aa", because it is the child of 'a' and it itself has the letter 'a', and that's why it's "aa". So what does this actually
buy us? Well, let's say that we wanted to get the entire contents of this branch right here. What is this branch, really? It's everything at or after "a" and before "ab". So the range query would look something like "a" to "ab", and if you look at our actual comment index right here, you can see that "a", "aa", and "aaa" all sort right before "ab", so we can just stop right there, and as a result we can perform a nice, clean range query. Even though the time complexity is still going to be O(log n), plus however many entries we have to pull after that, this is potentially going to be quite a bit faster, because it enables good disk locality, as opposed to jumping around every single place on disk.
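Here's a small in-memory sketch of that path-encoded index and its range query (the data and helper names are illustrative, and `next_key` assumes the last letter isn't 'z'; a real implementation would handle carrying). Loading a whole branch becomes one contiguous scan over keys that are already adjacent on disk:

```python
import bisect

# Stand-in for the path-encoded comment index, kept in sorted key
# order, exactly as a B-tree index would store it.
index = sorted([
    ("a",   "top-level comment"),
    ("aa",  "first child of a"),
    ("aaa", "child of aa"),
    ("ab",  "second child of a"),
    ("b",   "another top-level comment"),
])
keys = [k for k, _ in index]

def next_key(path):
    # smallest key sorting after every descendant of `path`:
    # bump the last letter, e.g. "aa" -> "ab"
    return path[:-1] + chr(ord(path[-1]) + 1)

def load_branch(path):
    # one contiguous range scan over [path, next_key(path))
    lo = bisect.bisect_left(keys, path)
    hi = bisect.bisect_left(keys, next_key(path))
    return [text for _, text in index[lo:hi]]

print(load_branch("aa"))  # ['first child of a', 'child of aa']
print(load_branch("a"))   # the whole subtree under "a"
```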
All we have to do is generate this comment index by appending the names of all of the comments in the chain, and then when we click "load more" it should be super easy to make that range query and hit our database. Keep in mind that I wanted this to be single-leader replication, and something like MySQL, I think, would work just fine for a database like this.

Okay, so as you can see, guys, we've
now gotten to the point where we have an absolute behemoth of a diagram to go through, so I'm going to attempt to do it nice and slowly. Let's say right here we've got our poster; this is the guy who is literally going to write a tweet or a comment, and there are going to be many different things that happen when they do. The first thing, starting at the top, is our user service. I drew one box, but ideally all of these services should be horizontally scaled out, with a load balancer in front of them; I just literally didn't have enough space to draw it. So keep in mind that we've got our user service, and we've got a user database, for which I elected to use MySQL. The reasoning is that I don't think that many changes are being made to profiles, so I'm not too worried about write throughput; I'm more worried about the actual consistency of user changes, and having a transactional single-leader database like MySQL is perfectly good for that. Single-leader replication, I think, is very reasonable here. The next thing that I'll cover is
our follower service. The follower service, as we mentioned before, is going to be backed by Cassandra, or at least the follower DB is. I wanted the user-followers DB (basically, for a given user, who follows them) to be our source of truth, the reason being that we can then stream those changes right into our Flink node for the eventual delivery of posts. I also noted that we should be sharding on user ID here; if you recall, our schema is literally going to be user ID, follower ID, security permission. Hopefully that makes some sense. The next part is the post service. Like I mentioned, this guy needs to be able to ingest writes very quickly, since posts happen all the time, so I thought Cassandra was the right choice for this one as well, and again this is something you can shard on user ID. The last part of this is the comment
database, for which, as we touched on a moment ago, I wanted to use single-leader replication, the reason being that we have causal dependencies in comments, and as a result I want writes to be as up to date as possible. Using something like MySQL, I think, will be pretty reasonable. I totally understand if you think the write ingestion won't be fast enough there; if you can think of a single-leader database that uses LSM trees, maybe that would be better. Could be something like HBase, who knows.

So now we've got all of these change data capture streams, and this is where we reach the middle of our setup, because we have all of our Flink nodes. We've got two Flink nodes. The first is what I would call the "following" Flink node, because it helps us generate all of our derived data in order to make faster queries for things like loading a user's following, or figuring out who they follow that is verified. So, if you recall, we've got the user service telling us who's verified, and likewise who we follow, and we can output that into the user-verified-following cache. This is going to make that query as quick as possible when we actually have to load up our newsfeed.
Additionally, we also want a user-following DB, because I want to be able to quickly see: hey, how many people am I actually following, and who are they? We're going to do that over here, and I wanted it in Cassandra, because again, for single-partition reads Cassandra is actually quite quick, especially if you have a good database schema where you make sure to partition everything by the user ID. That would look something like: user ID, and then following ID, which is the person they're following.

Cool. So we have our user-verified-following cache; put that guy in Redis. It's not going to be that much data, so I think keeping it in memory is pretty fair; you can replicate it as much as you want, and you can shard it out, probably by user ID. I think that's all pretty reasonable. The other type of Flink node that we have over here is going to be our posts Flink node. This is going to take in new posts over here from change data capture; it's going to take in users, to see who's verified, so that verified posts in particular can go to the popular-post cache; and it is also, finally, going to take in the follower-table changes, so that we can say: hey, user 6 has users 1, 2, and 3 following them, so deliver those posts to the corresponding caches. It could also then say: user 10 is verified, so deliver that post to the
popular-post cache. Now, the last piece of the puzzle is obviously going to be the reader: someone reading their newsfeed. As we mentioned, reading your newsfeed is pretty simple. The first thing you do is hit the feed service. Then you also reach out to the user-verified-following cache, because we need to know which verified users we want to load posts for; that way we can hit the proper shards of the popular-post cache over here. Once we do that, we can get all of the results we need back. Same thing goes for our particular newsfeed cache; it's just going to come from one of those replicas, one of those partitions, and then everything can get aggregated over here on the feed service. You can aggregate them by timestamp; it's not going to be more than like 100 posts, so it shouldn't be too hard to do that, and then you return it right to the user.

Similarly, there are a few other queries that the user is going to run. One is: hey, who am I following? That comes from over here. There's also: hey, who follows me? That comes from over here. There's also: what are my user stats, what does my profile look like? That comes from up top, in the user database. There's also: what are the comments that I'm trying to fetch for this post? That comes from the comments DB. And then, of course, finally: what have I posted before, or what have other users posted before? That would come from our post DB.
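To recap the read path, the feed service's merge step can be sketched as a simple timestamp merge of the two sources (illustrative Python; `merge_feed` and the sample data are my own, and both inputs are assumed to arrive already sorted newest-first, which is exactly what the presorted caches give us):

```python
import heapq

# Merge the precomputed newsfeed-cache entries with popular posts
# from verified followees, newest first, capped at ~100 posts.
def merge_feed(feed_cache, popular_posts, limit=100):
    # heapq.merge does a streaming merge of already-sorted inputs
    merged = heapq.merge(feed_cache, popular_posts,
                         key=lambda post: post[0], reverse=True)
    return list(merged)[:limit]

feed_cache = [(40, "friend post"), (10, "older friend post")]
popular = [(30, "MrBeast post")]
print(merge_feed(feed_cache, popular))
```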
Anyways, guys, I know this was a very, very long, in-depth, massive video, and frankly I may have even gone through it faster than I needed to. I don't mean to brush through any parts, and I absolutely want to keep this one as clear as humanly possible, but my main intention was to try and leave no stone unturned. There are definitely a lot of practical considerations when it comes to a design like this, and I think it's very easy to make a video that skips over the in-depth details of a lot of them. I could certainly see a lot of people not loving this design just due to the amount that I abuse change data capture, but I also do think it's a very good way to ensure that all of your data is actually in sync without having to use two-phase commit or introduce a bunch of partial-failure scenarios. As always, though, my solutions aren't perfect; I'm making these up myself, I'm not just going to other channels and copying them. So please do go ahead and critique me in the comments section, ask questions, ask anything you want, and I'm happy to defend myself; of course, you can always get me, and you're probably going to be right. Anyways, guys, I hope you enjoyed this video, and I will see you in the next one.