1: TinyURL + PasteBin | Systems Design Interview Questions With Ex-Google SWE

Jordan has no life
18 Nov 202338:31

Summary

TLDRThis video delves into system design for URL shortening services like TinyURL and paste bins, tackling challenges in generating unique short URLs, handling large-scale data, and incorporating analytics for tracking clicks. The presenter explores various technical solutions, including hashing for URL generation, single-leader replication, partitioning for scalability, and caching to optimize read speeds. The discussion also covers strategies for managing hot links, stream processing for accurate click analytics, and considerations for handling expired links and large pastes, suggesting the use of object stores and CDNs for efficiency.

Takeaways

  • 😀 The video discusses designing a system for URL shortening services like TinyURL and a paste bin service, focusing on generating unique short URLs and handling large-scale data.
  • 🔄 The presenter accidentally recorded 20 minutes without sound, highlighting the importance of checking technical setup before long recordings.
  • 🔗 The core functionality involves creating short links from long URLs and storing pastes with short access links, emphasizing the need for a unique and distributed approach to avoid collisions.
  • 📈 The system design considers analytics, specifically tracking the number of clicks per link, which introduces challenges in ensuring data accuracy and performance at scale.
  • 🚀 The design aims to handle an extremely high scale, with a hypothetical trillion URLs and varying sizes of data, from kilobytes to gigabytes, requiring partitioning and distributed storage.
  • ⚖ The system optimizes for more reads than writes, a common pattern in URL shortening services, which influences the choice of database replication and caching strategies.
  • 🔑 Generating short URLs involves using a hashing function with elements like long URL, user ID, and timestamp to ensure an even distribution and handle collisions through probing.
  • đŸš« The video rules out multi-leader or leaderless replication and write-back caching due to potential conflicts and inconsistencies in link generation.
  • 📚 The choice of database is influenced by the need for single-leader replication, partitioning, and the use of B-tree indexes for efficient reading, leaning towards a traditional SQL database.
  • đŸ”„ To handle hot links with high traffic, the system employs caching strategies, with considerations for cache invalidation and the use of write-around caching to avoid conflicts.
  • ♻ The system uses stream processing with tools like Kafka for handling analytics data, ensuring durability and fault tolerance, and avoiding race conditions in click counting.

Q & A

  • What is the primary purpose of the systems design interview questions discussed in the video?

    -The primary purpose is to delve into the design of systems like TinyURL and ppin, focusing on generating short links and handling analytics such as click counts, while considering performance and scalability.

  • Why is generating a unique short URL considered challenging?

    -Generating a unique short URL is challenging because it must be unique for each link, which can slow down the service as the number of URLs increases, leading to potential collisions and the need for efficient handling mechanisms.

  • What is the significance of using a hashing function in the context of generating short URLs?

    -A hashing function is used to distribute the short URLs evenly across the system, reducing the likelihood of collisions and ensuring a more uniform distribution of link generations.

  • Why might using a monotonically increasing sequence number for short links be a bad idea?

    -Using a monotonically increasing sequence number could lead to performance bottlenecks because it would require locking on that number for every single request, thus reducing concurrency and slowing down the link generation process.

  • What are some performance considerations when designing a system to handle a trillion URLs with varying click rates?

    -Performance considerations include ensuring accurate click count analytics, data storage for potentially petabyte-scale data, and optimizing for a higher number of reads than writes due to usage patterns.

  • How can partitioning help in improving the performance of URL generation and click analytics?

    -Partitioning can improve performance by distributing the load across multiple systems, reducing the chance of hotspots and allowing for more efficient data management and retrieval.

  • What is the role of caching in the context of a URL shortener service?

    -Caching can significantly speed up read operations by storing frequently accessed data, such as popular short URLs and their redirections, in a faster-access storage system, reducing the need to query the database repeatedly.

  • Why might multi-leader or leaderless replication not be suitable for a URL shortener service?

    -Multi-leader or leaderless replication could lead to conflicts where multiple users generate the same short URL at the same time, resulting in incorrect link associations and a poor user experience.

  • What is the proposed solution for handling click analytics to ensure accuracy without performance degradation?

    -The proposed solution involves using stream processing, such as Kafka, to handle click events and then process them in mini-batches using a system like Spark Streaming, ensuring accurate and efficient analytics.

  • How can a write-around cache help in managing the writes to the database for URL click analytics?

    -A write-around cache allows writes to be first made to the database and then propagated to the cache, ensuring data consistency and preventing the cache from serving stale data while also reducing the load on the database due to write operations.

  • What are some considerations for handling large pastes in a paste bin service similar to TinyURL?

    -Handling large pastes requires considering storage solutions like object stores (e.g., Amazon S3) instead of traditional databases, and using CDNs for efficient delivery of large, static files to users.

Outlines

00:00

😅 Systems Design Interview: Tiny URL and PPIN

The speaker begins by expressing frustration over a recording mishap but dives into discussing systems design interview questions, focusing on URL shortening services like Tiny URL and PPIN. They explain the concept of generating unique short links and the additional complexity of integrating analytics for tracking clicks. The importance of ensuring the accuracy of click counts is highlighted, along with the performance considerations at scale, such as supporting trillions of URLs and petabytes of data. The speaker emphasizes the need for a robust partitioning strategy and the predominance of read operations over writes.

05:00

🔄 Link Generation and Handling Hash Collisions

This paragraph delves into the technicalities of generating unique short URLs using a hashing function combined with the long URL, user ID, and creation timestamp. The discussion covers the number of characters needed in the hash result, given the possible combinations with alphanumeric characters. The speaker then addresses potential hash collisions, explaining the use of probing to find an available slot in the hash table, as opposed to chaining or database-linked lists, which are not feasible in this context.

10:01

đŸš« Challenges with Replication and Caching for URL Shortening

The speaker explores the challenges of using replication and caching to enhance write operations in a URL shortening service. They explain why multi-leader or leaderless replication is not suitable due to the potential for conflicts when multiple users generate the same short link simultaneously. The paragraph also discusses the limitations of write-back caching due to the inability to ensure immediate validity of the link, thus ruling out certain databases and caching strategies.

15:02

🔑 Database Schema and Handling Concurrent Writes

The paragraph discusses the database schema for a URL shortening service, including fields for the short URL, actual URL, user ID, creation time, expiration time, and click count. It addresses the issue of concurrent writes to the same short URL by users and how databases can handle this using predicate locks or by materializing conflicts. The speaker also touches on the efficiency of using indexes and the potential use of stored procedures to reduce network calls.

20:03

🛠 Choosing the Right Database Engine and Indexing Strategy

The speaker compares different database engines and indexing strategies for the URL shortening service. They discuss the trade-offs between LSM trees and B-trees, ultimately favoring B-trees for their read efficiency. The choice of a single-leader replication database is justified, and the speaker expresses a preference for MySQL due to the simplicity of the data model, despite the potential for NoSQL solutions.

25:03

🔍 Optimizing Read Speeds and Handling Hot Links

This paragraph focuses on optimizing read speeds, which is a priority due to the high number of reads compared to writes. The speaker discusses the use of replication and partitioning to distribute load and improve read performance. They also introduce the concept of hot links, which receive a high volume of traffic, and propose caching as a solution to handle these efficiently, suggesting a write-around cache strategy with an LRU eviction policy.

30:03

📈 Stream Processing for Accurate Analytics

The speaker proposes using stream processing to accurately track URL clicks and avoid race conditions in the analytics process. They suggest using a message broker like Kafka for durability and fault tolerance, and discuss different methods of writing data to the cache. The paragraph also explores options for processing the stream of events, such as using Spark Streaming with mini-batching to update the click count in the database without the need for locking.

35:06

🔒 Ensuring Exactly-Once Semantics in Stream Processing

This paragraph addresses the challenge of ensuring exactly-once processing semantics in stream processing systems. The speaker discusses the potential for events to be processed more than once due to network issues or system failures. They propose solutions such as using an idempotency key or partitioning the Kafka queues and Spark Streaming consumers by short URL to avoid race conditions and ensure accurate click counts.

đŸ—‘ïž Handling Expired Links and Paste Bin Considerations

The speaker discusses the handling of expired links through a simple batch job that clears out links past their expiration time. They also differentiate the handling of Paste Bin, which involves large files that cannot be stored as fields in a database. The paragraph suggests using an object store like Amazon S3 for large pastes and leveraging a CDN for quick delivery, advocating for a write-through cache strategy in this case.

🌐 Comprehensive System Design for URL Shortening and Paste Bin

The final paragraph brings together all the components discussed in the previous paragraphs to present a comprehensive system design. The speaker outlines the processes for writers and readers, the use of load balancers, URL assigning services, databases, caches, Kafka for event streaming, and Spark Streaming for analytics. They also highlight the role of ZooKeeper as a coordination service for managing metadata and ensuring consistency across the system.

Mindmap

Keywords

💡TinyURL

TinyURL is a URL shortening service that provides a shorter alias for a longer URL. In the context of the video, it represents the main subject of discussion, with the speaker exploring the system design considerations for creating such a service. The script mentions 'tinyurl.com/SLABC' as an example of a short link created for a longer URL like 'google.com'.

💡Pastebin

Pastebin refers to a type of website where users can store text, usually code snippets or logs, for a set period. The script compares the design of a TinyURL service with a 'pastebin' service, noting the challenges of handling large text pastes that could be in gigabytes, contrasting with the kilobytes of a typical TinyURL.

💡URL Shortening

URL shortening is the process of converting a long URL into a shorter one. The video script discusses the technical aspects of generating unique short URLs and the challenges associated with ensuring their uniqueness and performance at scale. It is central to the theme of the video as it is the primary function of the services being discussed.

💡Analytics

In the video, analytics refers to the tracking of data such as the number of clicks per link. The script mentions adding analytics to the URL shortening service to make the problem more complex and in-depth, highlighting the need for accuracy in tracking these metrics.

💡Hashing Function

A hashing function is a mathematical process that converts an input (or 'message') into a fixed-size string of characters, which is typically used for ensuring data integrity. In the script, the speaker discusses using a hashing function to generate short link keys evenly across the system, incorporating elements like the long URL, user ID, and timestamp.

💡Partitioning

Partitioning, in the context of the video, refers to the process of dividing a database into parts to distribute the load and improve performance. The script talks about partitioning as a method to speed up writes and manage a large scale of data, such as a trillion URLs.

💡Replication

Replication in the video script refers to the strategy of duplicating data or database objects across multiple servers to enhance fault tolerance and read throughput. The speaker discusses different types of replication, such as multi-leader and single-leader replication, in the context of URL data management.

💡Caching

Caching is the process of storing data in a temporary storage area (cache) to improve performance. The video script explores the use of caching to speed up read operations for the URL shortening service, considering the high read-to-write ratio.

💡Consistent Hashing

Consistent hashing is a technique used to minimize the redistribution of keys when the number of slots (or servers) changes. The script mentions consistent hashing as a preferred method for partitioning in the URL shortening service to ensure efficient distribution and minimal reorganization when the system scales.

💡Stream Processing

Stream processing refers to the computational method of processing data in a continuous, real-time flow. The speaker in the video suggests using stream processing to handle the aggregation of click events for the URLs, which is a more scalable approach than updating counts directly in the database.

💡Write-Through Cache

A write-through cache is a type of cache where data is written to the cache and the underlying storage simultaneously. The script discusses the write-through cache as a potential method for updating the URL data but concludes that it might not be feasible due to the potential for data inconsistencies and conflicts.

💡Write-Around Cache

Write-around cache is a caching strategy where write operations are directed to the underlying storage instead of the cache. The video script suggests using a write-around cache for the URL shortening service to avoid conflicts and ensure that the data is written directly to the database.

💡Item Potency

Item potency is a property of a system that ensures that even if an operation is repeated, the result remains the same. The script discusses item potency keys to guarantee that updates to the click count in the database are not duplicated, which is crucial for maintaining accurate analytics.

💡Zookeeper

Zookeeper is a coordination service used for managing and coordinating distributed systems. In the script, Zookeeper is mentioned as a way to keep track of partitioning information and metadata for the various components of the URL shortening service, ensuring consistency and coordination.

Highlights

Introduction to systems design interview questions focusing on URL shortening services like TinyURL and ppin.

The challenge of generating unique short URLs and the potential impact on service performance.

Incorporating analytics to track the number of clicks per link and ensuring accuracy.

Performance considerations at scale, with a hypothetical trillion URLs scenario.

The importance of read-write ratio in system design, optimizing for more reads than writes.

Link generation strategies using hashing functions to distribute links evenly.

Handling hash collisions in URL generation with probing methods.

The pitfalls of using multi-leader or leaderless replication for URL assignment.

The role of caching in speeding up writes and its limitations in the context of URL shortening services.

Partitioning as a method to increase write throughput and its advantages.

The use of consistent hashing to minimize key redistribution during cluster size changes.

Database schema design for storing URLs, user IDs, creation times, and click counts.

Techniques to handle concurrent URL creation to avoid conflicts in a distributed system.

The choice between LSM tree and B-tree indexes for the database based on read and write priorities.

Optimizing read speeds through replication, partitioning, and handling hot links with caching.

Approaches to updating click analytics without causing race conditions in a high-traffic scenario.

Utilizing stream processing for handling click events and the role of systems like Kafka.

Ensuring exactly-once processing in stream consumers to prevent incorrect click counts.

The implementation of expiration logic for links and the use of batch jobs.

Distinguishing the design for a paste bin service, including handling large files with object stores and CDNs.

The overall system architecture integrating writers, readers, databases, caches, and stream processing.

Transcripts

play00:00

hello everyone and welcome back to the

play00:01

channel today after much long waited

play00:05

time we are finally going to be getting

play00:07

back into our systems design interview

play00:10

questions starting off of course with

play00:12

tiny URL and ppin now I've just had the

play00:15

absolute pleasure of going 20 minutes

play00:17

into this video and then realizing that

play00:19

my microphone wasn't turned on and now

play00:21

I'm on absolute tilt so if any of you

play00:24

think that uh you know you're not

play00:26

looking forward to watching this

play00:27

probably hourong video just know that

play00:29

I'm looking forward to recording it and

play00:31

editing it a whole lot less anyways

play00:34

let's go ahead and get started before I

play00:36

freak out so today we're going to be

play00:38

talking about tiny URL and ppin and so

play00:41

the gist of these two is pretty simple

play00:43

basically we're going to be generating

play00:45

short links so in the case of tiny URL

play00:47

we might have something like

play00:50

google.com and someone wants to make a

play00:52

short link for it called tinyurl.com

play00:56

SL

play00:58

ABC same goes for for past

play01:01

bin If instead of entering my text here

play01:04

I was to write Jordan is

play01:07

sexy which we all know to be the case of

play01:10

course our paste would be pointed to by

play01:13

that short link so let's actually go

play01:15

through some formal problem

play01:17

requirements so basically what we're

play01:19

going to talk about today is a couple of

play01:21

important things the first thing is

play01:22

generating this unique short URL now the

play01:25

she the unique short URL more or less

play01:28

has to be unique obviously like I've

play01:30

just said but this is easier said than

play01:32

done and it's of course going to make

play01:34

our service slower so the other thing is

play01:37

that in addition to that URL I'm

play01:39

actually going to make this problem a

play01:41

little bit more complicated and

play01:42

hopefully a little bit more in-depth by

play01:44

adding some amount of analytics in this

play01:46

case the number of clicks per link now

play01:48

you might think that this is super easy

play01:50

as a matter of fact it is not and of

play01:53

course the reason it's not easy is

play01:55

because we have to be able to ensure

play01:56

that this number is actually accurate

play01:58

while it doesn't have to be there the

play01:59

second that click occurs eventually we

play02:01

do need to know the right

play02:03

answer so let's think about some

play02:05

performance considerations as well uh

play02:08

disclaimer I doubt that if I were to

play02:10

build a real tiny URL site there would

play02:12

be a trillion URLs that I have to

play02:14

support nonetheless this systems design

play02:16

problem wouldn't be very much fun if

play02:18

there weren't so as a result I'm going

play02:20

to name some obscene amount of scale and

play02:22

then we're going to try and figure it

play02:24

all out so let's imagine that for our

play02:25

median URL we've got 10,000 clicks the

play02:28

most popular URL is in the mid millions

play02:30

of them so that's going to be quite a

play02:31

bit and make things tougher like I

play02:33

mentioned there's going to be a trillion

play02:34

overall and then if we're talking about

play02:36

past bin certain past can actually be in

play02:38

the gigabytes and others are going to be

play02:41

in kilobytes probably the vast majority

play02:43

of those so let's think that in that

play02:46

case if we're just going to do some back

play02:47

of the envelope math over here we've got

play02:49

1 trillion short URLs times 1 kilobytes

play02:53

worth of pastes that's going to be equal

play02:55

to a petabyte and that is certainly more

play02:57

data than we can store on one individual

play02:59

system we're going to have to be doing

play03:01

some partitioning at some point down the

play03:03

line another thing to note is just based

play03:05

on the actual use patterns of people who

play03:07

use tiny URL a lot of people are going

play03:10

to be clicking the link a lot more than

play03:12

they are generating links so as a result

play03:14

of that we're going to have many more

play03:15

reads than writes and that's probably

play03:17

the type of case that we want to be

play03:19

optimizing on so let's go ahead and

play03:22

actually start by talking about link

play03:24

generation because that is going to be

play03:25

the most critical part of this problem

play03:27

that we really need to understand so is

play03:30

possible if we really wanted to to

play03:32

basically just have some monotonically

play03:33

increasing sequence number for all of

play03:35

our short links but I think that's a

play03:37

pretty bad idea because then you

play03:38

basically need to lock on that number

play03:41

every single time for every single

play03:42

request that everyone's making so you

play03:44

can actually generate that short number

play03:46

instead what we should probably be doing

play03:48

is generating those links out evenly

play03:50

across our system and the way that we

play03:51

can do this is using a hashing function

play03:54

I think just to continue to make sure

play03:56

that we get very evenly distributed

play03:57

links we should probably put in things

play03:59

like like in the tiny URL case with the

play04:01

long URL a user ID and the create Tim

play04:03

stamp so we've got one example of that

play04:05

over here and as you can see it's going

play04:07

to Output some sort of string for us to

play04:11

use as our tiny URL key and so the

play04:14

question we should now be asking

play04:15

ourselves is how many characters do we

play04:16

actually need in our hash result well

play04:19

assuming that we're using 0 through 9

play04:21

and A to Z that gives us 10 + 26 = 36

play04:26

possible choices per character and so as

play04:29

a result that means per slot we have

play04:31

here we've got 36 * 36 * 36 * so on

play04:35

which is 36 to the N combinations where

play04:37

n is how many characters we're going to

play04:38

generate and so as a result if n is

play04:40

equal to 8 I looked this up on Google

play04:42

because I can't do this in my head we

play04:44

have around 2 trillion combinations and

play04:46

considering I said we need about a

play04:48

trillion links that should be good for

play04:50

us so the question now is what do we

play04:52

actually want to do on hash collisions

play04:53

cuz when you only have two trillion

play04:55

buckets and one trillion things you're

play04:57

probably going to have a decent amount

play04:58

of those well in the case of a hash

play05:00

collision at least on a single normal

play05:03

computer in our actual computer science

play05:05

programs we basically have two options

play05:07

one is that you can do something known

play05:10

as chaining which is creating a linked

play05:11

list out of each hash key bucket and the

play05:14

other is probing which is saying oh if

play05:16

this bucket is already taken why don't

play05:18

we try and put our value over here now

play05:21

in a database we can't really use a

play05:23

linked list so probably the only

play05:24

feasible thing is going to be probing so

play05:27

for example if we had you know the hash

play05:29

32 fc1 ca8 imagine this was like base 36

play05:34

or something then the next hash to click

play05:37

is going to be 32 fc1 ca9 and so on we

play05:41

can basically keep trying until we find

play05:43

an available one hopefully that makes

play05:45

sense so now the first thing I wanted to

play05:48

talk about for this video is writing

play05:50

those URLs actually assigning them and

play05:53

then as a result what type of

play05:54

replication that we need to use within

play05:56

our databases so of course even though

play05:58

we care more so about reads than writes

play06:00

we are still doing a ton of writs I

play06:02

mentioned there are going to be a

play06:03

trillion URLs and as a result of that we

play06:05

want to maximize that throughput so the

play06:08

first question that I wanted to ask is

play06:11

can we do so with replication can we use

play06:13

a multi-leader or leaderless replication

play06:15

schema because these are two ways of

play06:17

actually making sure you can speed up

play06:18

wrs it gives you the ability to write to

play06:21

many replicas and as a result you can

play06:22

increase your throughput and the answer

play06:24

that I think is going to be the case

play06:26

here is no sadly you cannot so let's

play06:29

come up with example here's me on the

play06:31

right Jordan and here's a businessman on

play06:32

the

play06:34

left we're both submitting URLs for

play06:37

Generation to the database at the same

play06:39

time and his is leading to my

play06:41

presentation.com mine is leading to one

play06:43

of my favorite sites personally and then

play06:46

at the same time you'll notice that

play06:47

we've got the same short link so as a

play06:49

result we have a conflict and so when it

play06:52

comes to things like multi-leader or

play06:53

leaderless replication it's kind of

play06:55

arbitrary a lot of the time how you

play06:57

actually end up resolving which right is

play06:59

going to win so let's imagine that we

play07:00

use last right wins lww and my time

play07:03

stamp happens to be a little bit higher

play07:05

than his mainly just out of random luck

play07:08

because keep in mind we can't even trust

play07:09

distributed timestamps So This Server

play07:11

happens to have timestamp uh X and this

play07:14

one's got time stamp X plus one so

play07:16

Jordan's right wins and then all of a

play07:18

sudden this guy is reading Jordan's Link

play07:21

and he's saying WTF because he thought

play07:24

this was a business

play07:25

presentation so in theory you know if

play07:28

you wanted to play advocate here you

play07:30

could go and say that well okay maybe a

play07:33

few seconds later we could have told the

play07:35

businessman hey your right is no longer

play07:37

valid but at least in my opinion this is

play07:39

not going to be good here because most

play07:41

people generate their link they copy

play07:42

paste it they leave the site and then

play07:44

they paste it elsewhere so most of the

play07:46

time I feel that you probably want to be

play07:48

giving them the right link the second

play07:49

they click that button so of course this

play07:52

is going to rule out all sorts of

play07:53

databases that actually use leaderless

play07:55

replication and that's going to include

play07:57

any of the Dynamo style ones so Cass

play07:59

Andra Sila Rak Etc we want to be using

play08:02

single leiter

play08:04

replication okay next we are going to be

play08:06

talking about caching because caching is

play08:08

actually another way that you can speed

play08:10

up your rights assuming you're doing it

play08:12

properly by using a right back cache and

play08:15

so for example we've got these two guys

play08:17

over here which are basically in memory

play08:18

databases they're you know reddest

play08:21

databases in particular and the gist is

play08:23

that you can first write to them and

play08:25

then eventually some point down the line

play08:27

you can flush those out but what we

play08:29

encounter here is basically the same

play08:31

problem that we encountered above with

play08:33

multileader or leaderless replication

play08:35

which is that no one really knows the

play08:36

second they make the right whether their

play08:38

link is valid or not and as a result if

play08:40

they send it around in the meantime it

play08:42

could be someone else's link and that

play08:44

would be a problem so again same issue

play08:46

as before we can't really use right back

play08:48

hasing to speed up our rights that's

play08:51

unfortunate what about partitioning

play08:52

that's another way you can speed up your

play08:54

rights the more partitions that you have

play08:56

the more load you can put on each one

play08:57

and you can spread things out a little

play08:59

bit

play08:59

well in my opinion this is a perfectly

play09:01

valid way of speeding up our rights and

play09:03

we should totally be doing it so we can

play09:05

actually just go ahead and take all of

play09:06

our short URLs and partition on those so

play09:09

we've got all the ones through a through

play09:11

D over here e through h on the next one

play09:14

and so on and so on and so you might

play09:16

notice that I'm actually partitioning by

play09:17

short URL range as opposed to Hash range

play09:20

my argument for that is that the short

play09:22

URL is itself already a hash so things

play09:24

should be pretty evenly distributed we

play09:26

shouldn't have to worry about load too

play09:28

much also recall that I basically said

play09:31

whenever you want a short URL X and you

play09:34

can't get it you basically have to try x

play09:36

+ one and so as a result x + one would

play09:40

be on the same partition as X so it

play09:42

means that you don't have to go to

play09:43

another partition randomly to make that

play09:45

right and it should in theory increase

play09:48

latency so another thing that is

play09:51

important to know with partitioning in

play09:52

general is that we probably should be

play09:53

using some sort of consistent hashing

play09:57

reasoning being that with consistent

play09:58

hashing hashing as opposed to hashing

play10:00

where you basically do mod n where n is

play10:02

the number of partitions with your short

play10:05

key that basically it means that

play10:07

whenever the cluster size changes fewer

play10:10

uh fewer keys are going to have to be

play10:11

redistributed so let's imagine this

play10:14

Arrow right here represents everything

play10:16

on this node over here and when I add a

play10:19

new node to the cluster now the only

play10:22

thing that's going to be moving is this

play10:24

range over here as well as just uh as

play10:26

opposed to just a variety of scattered

play10:28

keys throughout our hash range and so

play10:31

that is going to help us quite a bit

play10:34

okay so let's actually talk about our

play10:35

database schema and what things look

play10:37

like on a single node so you can see

play10:40

I've got an index right here or not an

play10:42

index but an actual table which has our

play10:44

10 URL and ideally we want this to be

play10:46

unique like I mentioned we've got an

play10:48

actual URL a user ID a create time an

play10:51

expire time we'll talk about how to

play10:52

expire things towards the end of this

play10:53

video and also a number of clicks which

play10:56

again we'll talk about towards the end

play10:57

of this video so let's let's imagine

play10:59

we've got two users user one and user

play11:01

two the way that we've organized our

play11:03

schema we actually have a little bit of

play11:04

a problem which is that in theory they

play11:07

could be both adding the same row with

play11:09

the same key for their short URL at the

play11:12

same time and our database wouldn't be

play11:13

able to do anything about it why well

play11:15

normally you would want to be locking on

play11:17

this key but in our case this row didn't

play11:20

actually exist yet before they added

play11:22

them and so we don't actually have

play11:23

anything to lock on so how would a

play11:25

database internally actually go ahead

play11:27

and do something like this well there's

play11:29

two possible solutions one of which is

play11:32

called predicate locks predicate locks

play11:33

are pretty simple they're basically just

play11:35

locks on rows that don't exist yet and

play11:37

you specify a query on that row so in

play11:40

this case you know as you can see we

play11:42

would be taking everything from the URLs

play11:43

table where the short URL key is the

play11:46

thing that the conflict was on and so

play11:48

what this sty looks like uh in turn is

play11:51

if user one says create the link but

play11:53

also give me the lock on short uh short

play11:55

URL X User two can do the same thing but

play11:59

the database is now going to come back

play12:01

and tell him to kick rocks because that

play12:02

is now taken then he's going to have to

play12:04

go back and create link X+1 and then

play12:07

finally the database can say okay so a

play12:10

couple things to note here first off is

play12:12

that predicate queries can potentially

play12:14

be expensive why because they have to go

play12:15

through the whole database and actually

play12:17

find all the rows that potentially apply

play12:19

so something that could make this a

play12:21

little bit faster is actually using an

play12:22

index on this short URL field why

play12:26

because an index basically means that

play12:27

this guy is going to be sorted

play12:31

internally and as a result of sorting by

play12:34

tiny URL now all of our queries are

play12:37

going to be o of log on that field

play12:39

because we can binary search there

play12:40

instead of having to do a linear scan

play12:42

which would be o of n additionally note

play12:45

that right over

play12:46

here we've got two sets of network calls

play12:49

made by the guy who basically lost the

play12:51

raise condition and now has to try to

play12:54

get link X+ one what we could do instead

play12:57

is use some sort of stored procedure

play12:58

Advanced database function where it's

play13:00

like if x is

play13:03

taken grab x + one or something like

play13:06

that and that way we could maybe do all

play13:08

of that writing with just one network

play13:10

call perhaps it would speed things up a

play13:12

little

play13:12

bit okay another way that we can also

play13:15

handle this in addition to predicate

play13:16

locks is by actually materializing

play13:18

conflicts so as opposed to you know

play13:21

locking on rows that don't exist yet

play13:23

what if we actually just grab the lock

play13:24

on rows that do exist well how can they

play13:26

exist we basically just write every

play13:29

single possible key to the database now

play13:31

you may think to yourself oh this has

play13:33

got to be a ton of Rights we have a

play13:34

trillion possible short URLs or actually

play13:36

two trillion because there are that many

play13:39

combinations well keep in mind that uh

play13:42

basically there's one bite per character

play13:44

there's eight characters per short URL

play13:46

and that actually only comes out to 16

play13:47

terabytes which is really not that much

play13:49

in the grand scheme of things you'll

play13:51

probably still have to partition but

play13:52

again not that much data and so now when

play13:55

user one and user two go ahead and try

play13:58

to both right to the same key user one

play14:01

can go ahead and grab the lock user two

play14:03

is going to lose the database is going

play14:05

to say try again and then he can try

play14:08

with a different

play14:10

row okay now let's talk about potential

play14:13

database engines because of course this

play14:14

is something we want to be thinking

play14:16

about as well the first thing to note is

play14:18

that for a problem like this we actually

play14:20

don't really ever need to do range

play14:22

queries maybe with the exception of a

play14:24

predicate lock so a hash index could be

play14:27

fast but at the end of the day we are

play14:31

storing about a petabyte of data and you

play14:34

know your interviewer might not let you

play14:36

get away with that one they might just

play14:37

say that's going to be too expensive

play14:38

choose an actual on disk index to use

play14:41

and so if we are limited to on disk

play14:43

indexes that gives us two choices first

play14:45

off the LSM tree second off the B tree

play14:49

now the trade-offs for these are in my

play14:51

opinion somewhat simple the LSM tree is

play14:54

going to be a little bit worse for reads

play14:55

because when you read you typically have

play14:56

to read from multiple SS tables and

play14:59

possibly the LSM tree as well and when

play15:01

you write you're basically just writing

play15:03

over to the LSM tree which is in memory

play15:05

so that should be relatively quick

play15:07

you'll flush it later on the other hand

play15:09

for a b tree you're basically writing

play15:11

straight to disk which is a little bit

play15:13

less efficient but at the end of the day

play15:15

when you're reading you only have to

play15:17

Traverse the tree one time there's no

play15:19

concept of having to check multiple

play15:21

different files and so a b tree is

play15:23

literally just going right through

play15:25

finding the piece of data that you want

play15:27

it's all sorted and then you're good to

play15:29

go so in this case because we are

play15:31

prioritizing reads over wrs generally

play15:33

here I think I would like to opt for the

play15:35

B tree oops accidentally did that but

play15:38

I'm open to hearing what anyone else has

play15:40

to say there so let's actually go ahead

play15:43

and choose a database because now we

play15:44

have some good parameters with which to

play15:46

filter our choices down we've already

play15:48

said we want single leader replication

play15:49

we've already planned on doing some

play15:51

amount of partitioning we want a

play15:53

database that uses a b tree index and

play15:55

personally for me considering the

play15:57

Simplicity of all of it I think that

play15:59

would just make it my sequel uh for me I

play16:02

could see why someone might say mongod

play16:03

DB if they're particularly inclined to

play16:05

no SQL but it's not like our data

play16:07

structures that we're using here in our

play16:08

data model are particularly complex I

play16:11

don't really see the argument for a

play16:12

document data model so I personally

play16:14

would lean towards my

play16:16

SQL okay so let's start talking about

play16:18

read speeds because that's the thing

play16:20

that we really want to be optimizing

play16:21

here like I mentioned there are a ton

play16:23

more reads than there are wres so so far

play16:26

we've already discussed the fact that

play16:27

we're going to be using replic

play16:28

replication not only for fault tolerance

play16:30

but also to speed up reads because

play16:32

replication is going to allow us to read

play16:34

from our follower replicas in addition

play16:36

to our leader and also multiple

play16:39

partitions which also means that for

play16:40

every partition we're going to have less

play16:42

load as a result of the fact that there

play16:44

are now more places to actually read

play16:45

from which is going to be really great

play16:48

now it is worth noting that in this case

play16:50

of single leader

play16:52

replication because we are going to be

play16:54

using asynchronous consistency or

play16:56

eventual consistency would probably be

play16:58

be the better word to use it is possible

play17:00

that a client over here could read from

play17:03

a

play17:04

follower get some stale data where maybe

play17:06

it thinks there's no actual URL to

play17:08

redirect you to when in reality there

play17:11

has been over here but it just hasn't

play17:13

been replicated yet um I do think that

play17:15

in this case you could maybe add some

play17:17

application logic to go check the leader

play17:19

replica but at the same time you should

play17:21

probably be careful with that you never

play17:23

know how many times they're just like

play17:24

Bots that are constantly spamming all of

play17:26

your followers to actually look for you

play17:28

know redirects or anything like that you

play17:30

might end up putting a lot of load on

play17:32

your leader as a

play17:33

result okay so maximizing read speeds

play17:37

we've spoken about replication we've

play17:38

spoken about partitioning but next let's

play17:41

speak about hot links because I did

play17:43

mention that certain links are going to

play17:44

have a ton of traffic they're going to

play17:46

have millions of clicks and as a result

play17:48

we need to find a way to actually deal

play17:50

with those it would be great if not

play17:52

every single one of those requests uh

play17:54

for a particular hotlink result had to

play17:57

go to the database especially because

play17:58

they're all returning the same thing and

play18:01

so what would be a good thing to do here

play18:02

well we should probably introduce some

play18:04

amount of caching so again the caching

play18:06

layer the thing that's actually nice

play18:08

about it is first of all we can scale

play18:10

this thing independently if we have you

play18:12

know a ton of hot links maybe we need

play18:15

more caches additionally we can also

play18:17

partition the caches by the short URL in

play18:20

the same way that we would our databases

play18:22

so that more or less all of the requests

play18:25

for a particular hot link can go to the

play18:27

same cach or same set of caches and that

play18:30

way we don't have to basically recompute

play18:32

that value multiple different times on

play18:33

many caches so as you can see all these

play18:35

guys want ABC right here the best thing

play18:38

to do is to all have those requests go

play18:40

to the top cache and then maybe this guy

play18:43

would be I don't know like H through Z

play18:45

requests or something like

play18:47

that okay so we've already spoken about

play18:49

the fact that we want caching but how

play18:51

are we actually going to make sure that

play18:53

the data that we want gets in the cache

play18:56

well when talking about caching or CDN

play18:58

or anything like that in general there

play18:59

are two concepts that we have to

play19:01

consider we can either push the data to

play19:03

the cache basically in advance when it's

play19:05

created or we can pull it in there so in

play19:08

my opinion I don't think pushing is

play19:10

really going to work here because at the

play19:12

end of the day we don't know which links

play19:13

are going to be popular beforehand

play19:15

there's no way on our servers to say

play19:17

like ooh I can tell this link is going

play19:18

to be super hot let's put it in the

play19:20

cache beforehand so we can warm it up no

play19:22

that's probably not going to happen so

play19:24

at the end of the day we're going to

play19:25

have to be pulling that data in there

play19:27

somehow so there are basically three

play19:29

methods of writing data to our cache

play19:31

that we can consider the first one we've

play19:33

already spoken about which is the right

play19:35

back cache we've already said we can't

play19:37

really do this because it's going to

play19:38

lead to data inconsistencies and right

play19:41

conflicts which is not going to be

play19:42

feasible for us the second one which is

play19:45

a possibility is going to be the right

play19:47

through cache so in the right through

play19:49

cache which you can see over here the

play19:51

gist is basically that in addition to

play19:53

writing to the database at a given time

play19:54

you also write to your cache now if you

play19:57

need to you can use two pH is to commit

play19:58

to make sure that they always stay in

play20:00

line or you can just do best efforts uh

play20:02

but the gist is you can do that

play20:04

personally I don't think it's necessary

play20:05

for us because it's going to slow down

play20:07

our right speeds a lot and the vast

play20:09

majority of these links we really don't

play20:11

even want in our cash in the first place

play20:12

so it's not that useful to do a write

play20:14

through in my opinion we should probably

play20:16

just do a write around cache which is

play20:19

basically where you just go and write to

play20:20

the database as per usual and then as

play20:23

people will eventually read from the

play20:25

cache the database will send its results

play20:27

for first to the cach and then back to

play20:29

the user so as you can see that's what's

play20:32

happening down over here and of course

play20:35

you know we are obviously going to run

play20:37

out of room as our cash gets filled up

play20:39

with all of these short links and of

play20:41

course we want to keep the most popular

play20:43

results because that's how we get the

play20:44

most us usage out of our cash the way

play20:47

that I personally would do this is

play20:48

pretty standard which is just least

play20:49

recently used eviction whatever entry in

play20:52

the cache has least you know been least

play20:54

recently used whenever we're performing

play20:55

a read from it get rid of that one

play20:57

popular it with a new piece of data

play21:00

hopefully simple enough Okay so we've

play21:02

spoken about some reads we've spoken

play21:04

about some wrs we seem to have a pretty

play21:06

fast solution in terms of our

play21:08

replication our partitioning our caching

play21:10

but now let's actually go and talk about

play21:13

a potential solution for our analytics

play21:16

so if you recall when I showed us the

play21:17

database schema we do have a column for

play21:20

clicks and in theory what we could do is

play21:23

just go ahead and update that right you

play21:25

know we've got let's say 100 for the

play21:27

short link a bc1 2 3 we could have the

play21:29

guy on the left increment it by one when

play21:31

he makes the click a guy on the right

play21:32

increment it by one when he makes the

play21:34

click however what many of you might

play21:37

already be anticipating is that without

play21:38

any sort of locking this is going to be

play21:41

a race condition why because this guy

play21:43

might read 100 first this guy might

play21:46

read 100 first and then they're both

play21:48

going to say set it to 100+ 1 so now

play21:51

they're going to write 101 and this

play21:53

guy's going to write 101 and then this

play21:55

gets set to 101 instead of 102 too and

play21:58

keep in mind that for very popular links

play22:00

this is a real possibility if you've got

play22:02

hundreds of thousands of people clicking

play22:04

it every single minute you're going to

play22:05

have a lot of these conflicts and you

play22:06

would need to implement locking and when

play22:08

you're implementing either locking or

play22:10

Atomic operations for something that's

play22:11

popular enough the database might not be

play22:14

able to handle that so keep that in mind

play22:17

it's probably too slow using this sort

play22:19

of naive

play22:20

implementation so what's a potentially

play22:22

better way that we can do this well what

play22:25

if we were to use stream processing so

play22:29

basically my idea is that we dump the

play22:31

data somewhere where we don't need to

play22:32

grab a lock in order to dump it there

play22:35

and then we can go ahead and aggregate

play22:36

it later so in theory you know we could

play22:39

dump it to a database but the question

play22:42

is do we need to dump this to a database

play22:44

a database might be slower than just

play22:45

dumping it to something like a you know

play22:48

inmemory message broker or a log base

play22:51

message broker because at the end of the

play22:53

day that's basically just either

play22:54

something in memory or a write ahead log

play22:56

that you're writing to so again I feel

play22:59

like we should rule out the database

play23:00

it's just going to be slower to write to

play23:02

and additionally we've also got our

play23:04

in-memory message broker which is good

play23:07

however I also did say that I want to

play23:09

ensure that the analytics results that

play23:11

we get are actually correct now the

play23:12

issue with an inmemory message broker is

play23:15

that it's in memory which means it's

play23:16

probably less fault tolerant barring it

play23:18

using a right ahead log and as a result

play23:21

even though it's super fast it's not

play23:22

going to be as durable and so what I

play23:24

would prefer to do is meet these things

play23:26

in the middle use a l based message

play23:28

broker where everything is kept on disk

play23:30

we've got offsets for every single entry

play23:33

and essentially what we're doing is

play23:35

writing to a right ahe head log and as a

play23:36

result of that it is going to be durable

play23:38

now an example of such a solution would

play23:40

be Kafka if you want to use AWS maybe

play23:43

AWS Kinesis but uh let's try and keep

play23:45

things open source here so I'm going to

play23:46

say

play23:47

Kafka okay so now let's actually talk

play23:50

about the consumer of these events how

play23:53

is it that once we have the events

play23:54

placed in some sort of you know queue

play23:57

that that we can go ahead and process

play23:59

them what technology do we want to use

play24:01

to do so well we've got a few options

play24:03

one of which is that you know we could

play24:04

have something dump to htfs or you know

play24:08

S3 or anything like that and then

play24:10

eventually run a batch job on it in my

play24:12

opinion uh this would probably give us

play24:14

analytics to infrequently but it really

play24:16

depends on your users right if they're

play24:18

content to have analytics once per day

play24:21

then you could do that but I feel like

play24:22

most people would like a little bit of a

play24:24

shorter granularity than that maybe

play24:25

every couple minutes maybe even every

play24:27

few seconds

play24:28

so personally I'm not a fan of the batch

play24:31

processing solution additionally another

play24:33

thing that we could do is use something

play24:35

like aachi flank which is more of a

play24:37

real-time solution so in this case we

play24:40

would process every single event that we

play24:42

receive from Kafka individually now

play24:44

personally I don't also think this is

play24:46

necessary because at the end of the day

play24:47

if we're processing every single event

play24:49

individually this also means that every

play24:52

single event is going to lead to a right

play24:53

to the database and there's not really

play24:55

any reason to put that much load on it

play24:57

now to be fair you could write custom

play24:59

flank code that basically just goes and

play25:00

says you know maybe every 10 that you'll

play25:02

upload to the database but in my

play25:05

personal opinion the best option here is

play25:07

probably just going to be something like

play25:08

spark streaming where Spark streaming

play25:11

natively supports something like mini

play25:13

batching which is configurable and we

play25:15

could just say something like hey give

play25:16

me mini batches of 100 clicks and so as

play25:19

a result every single 100 clicks you

play25:21

would go ahead and write to the sync in

play25:24

that type of

play25:26

interval so the very nice thing about

play25:28

The very nice thing about these stream consumer frameworks like Spark Streaming and Flink is that they actually ensure correctness, at least within the stream processing system. They basically guarantee that between every event that gets to Kafka and the eventual state of our stream consumer, all of those events are only going to be processed once. However, spoiler: that guarantee breaks down whenever you have some sort of external system like a database, so let's go ahead and talk about that.

The question is: are our events actually going to be processed exactly once if our sink operation is just "do plus 100"? Maybe not. Things would be a little different if we actually aggregated the total count in our stream consumer and only occasionally updated the database, but I think it makes sense to just take every mini-batched event and upload it right to the database. So again, my question is: do things actually get run exactly once, and is it possible for race conditions to give us an incorrect click count? In my opinion, yes. Like I mentioned, events are only processed exactly once internally, meaning within the stream processing system itself. Here's an example of something that can happen: from Spark Streaming we tell the database "hey, add 100 clicks," and right before the database can respond with an acknowledgement, even though the write went through on the database, the connection gets cut off and Spark Streaming goes down. When it comes back up, it doesn't see that database acknowledgement, it doesn't realize the event was processed, and it processes it again, saying "plus 100" one more time. That is bad, because now we have an incorrect count of clicks.

So we've got a couple of options. The first is two-phase commit: we would have some coordinator node which asks both participants "are you ready to commit this?" and, assuming they both say yes, tells them "okay, go ahead and commit it." That is notoriously slow, and ideally we would like to avoid it. A better option might be something like an idempotency key, where in addition to storing the total clicks count (say there are 1,500 clicks), the database also stores the last key it saw, say key a1245z. So let's say this first write carried key a1245z. When the next write comes along, because it is actually the same event from Spark Streaming, it would also have key a1245z, and the database would say "wait a second, these two are equal, don't process this one." That is one way of guaranteeing that our pushes to the database are in fact idempotent.
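
Here's a minimal sketch of that idempotency-key check as a single conditional UPDATE, assuming an illustrative `urls` table with `clicks` and `last_idempotency_key` columns (not the exact schema from the video):

```python
import mysql.connector  # assumed DB driver; any DB-API client works the same way

def apply_click_delta(conn, short_url: str, delta: int, idempotency_key: str) -> None:
    """Add `delta` clicks to a row only if this idempotency key hasn't been seen yet."""
    cursor = conn.cursor()
    cursor.execute(
        """
        UPDATE urls
        SET clicks = clicks + %s,
            last_idempotency_key = %s
        WHERE short_url = %s
          AND (last_idempotency_key IS NULL OR last_idempotency_key != %s)
        """,
        (delta, idempotency_key, short_url, idempotency_key),
    )
    conn.commit()  # a retried write with the same key matches zero rows, so it's a no-op

# The key must be stable across retries; with Spark Streaming, the micro-batch id
# works well, e.g. idempotency_key=f"clicks-batch-{batch_id}".
```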

Again, another way, which in retrospect might be easier, is to just keep track of the total count in Spark Streaming, because then we don't have to worry about external numbers being published. But that breaks if we potentially have multiple Spark Streaming consumers, so I personally think an idempotency key makes a lot of sense.

The issue with the idempotency key is that if we had, say, 10 Spark Streaming consumers all publishing to the same row, we would need to store 10 idempotency keys, one for every single publisher, which is inconvenient. If all of a sudden there are 100 publishers, now we need to store 100 extra keys, and that's not great. How can we get around this? Well, we could just make it so that we only have one publisher per row. What I'm saying is that we would only have one instance of Spark Streaming per short URL; let's say short URL ABC only ever has its clicks published by one particular consumer. The way we can do that is to partition our Kafka queues, and of course our Spark Streaming consumers, by the short URL. This ensures that only one consumer is publishing clicks for a specific row at a time, which is really good for two reasons. The first, as I mentioned, is that there are fewer idempotency keys to store per row. Additionally, if we had two Spark Streaming instances publishing to the database at the same time, they would have to grab locks on that row ABC, because otherwise we would run into the same race condition as before, and we want to avoid grabbing locks whenever possible.
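
To make "one publisher per row" concrete, here's a small sketch with the kafka-python client: keying each click event by its short URL means Kafka's default partitioner hashes the key, so all clicks for a given short URL land on the same partition and are consumed by a single stream consumer instance. The topic name and broker address are placeholders.

```python
import json
from kafka import KafkaProducer  # kafka-python client

producer = KafkaProducer(
    bootstrap_servers=["kafka:9092"],
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def record_click(short_url: str) -> None:
    # The message key determines the partition, so every click for one short URL
    # is handled by exactly one Spark Streaming consumer instance.
    producer.send("clicks", key=short_url, value={"short_url": short_url})
```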

Okay, hopefully that much at least has made sense. Let's quickly talk about expired links, because this is definitely worth a discussion. I mentioned in my data model that when you create the TinyURL link, you should have some field for being able to expire it, so let's just use a batch job for that. We probably don't need to run anything like Spark here; it's not that much data, and a nightly batch job would do it. Additionally, you don't really need much locking either: at the end of the day you're only grabbing a lock on the particular row currently being processed, just so that no one keeps overwriting it while you clear out the data. The most expensive batch jobs are the ones that have to grab locks on everything because they're doing some sort of aggregation, and in this case we're really not. I think a simple cron job would get it done, and the pseudocode we'd be running is basically: if the time of the batch job is greater than the expiration time of the row, clear it out. It should be simple enough.
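
A rough sketch of that nightly job, assuming a `urls` table with an `expire_time` column (the schema and connection details are illustrative):

```python
from datetime import datetime, timezone
import mysql.connector  # assumed DB driver, as before

def purge_expired_links() -> int:
    """Delete every row whose expiration time has already passed."""
    conn = mysql.connector.connect(host="db", database="tinyurl", user="batch", password="...")
    cursor = conn.cursor()
    cursor.execute(
        "DELETE FROM urls WHERE expire_time IS NOT NULL AND expire_time <= %s",
        (datetime.now(timezone.utc),),
    )
    deleted = cursor.rowcount
    conn.commit()
    conn.close()
    return deleted

# Scheduled via cron, e.g.: 0 3 * * * python purge_expired_links.py
```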

Okay, let's finally talk about PasteBin, because so far we've been talking as if this was all TinyURL. The reason I group PasteBin into this kind of video is that they're two very similar problems, but the one exception with PasteBin is that you can have super large pastes; like I mentioned, some of them could be multiple gigabytes in size, and we are going to support that. As a result, we probably cannot store them as a field in our database. I did look it up, and I think Postgres supports fields up to 4 GB for long text, so if we imagine a paste that's 10 gigabytes, it's not going to fit in there.

So we have a few options. One is storing the pastes in HDFS, which we probably shouldn't do because it's generally more expensive, especially considering that we don't need the data locality of being able to run batch jobs where our pastes are stored; there are no batch jobs to actually run on them, we're literally just storing them and then returning them. In my opinion, an object store like Amazon S3 makes a whole lot more sense: likely preferable, cheaper, no batch jobs, and so on.

The other important thing to note is that because these files are so large, we want to be able to deliver them to the user relatively quickly, so some sort of caching would be of great use to us. If you've watched my CDN video, you'd know that CDNs are great for static content, which our pastes are; you can't modify a paste after you've made it. A CDN is basically a geographically distributed cache which is going to enable us to deliver those static files particularly quickly. We could again use some type of LRU policy to make sure that only the most popular pastes are in there, and as a result things should hopefully go well.

So what we would want to do, at least in my opinion, is perform all of these writes in series, because if we wanted all three writes, to the CDN, to S3, and to the database, to go through atomically, we would have to use some sort of two-phase commit, which again can get very expensive and annoying. Personally, what I would do from the client is first write to the CDN; when that goes through, write to S3; and assuming that still goes through, then write to the database. The reason I don't write to the database first, then S3, then the CDN, is that if the database write goes through and then the write to S3 fails, it's going to seem like a paste exists when the data for it actually doesn't. So it's more important that we get the data uploaded to our object store and our CDN first, and only after that do we go and put things in the database.
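
Here's a sketch of that write ordering from the upload path: CDN first, then S3, then the database, aborting if any earlier step fails. The `cdn_client.put` helper is hypothetical since the real call depends on the CDN provider; the S3 upload uses boto3, and the metadata write is a plain SQL insert with illustrative table and column names.

```python
import boto3

s3 = boto3.client("s3")

def create_paste(short_url: str, paste_bytes: bytes, db_conn, cdn_client) -> None:
    # 1. Push the paste into the CDN first (hypothetical provider-specific call).
    cdn_client.put(path=f"/pastes/{short_url}", body=paste_bytes)

    # 2. Then store the canonical copy in the object store.
    s3.put_object(Bucket="pastebin-pastes", Key=short_url, Body=paste_bytes)

    # 3. Only once the data exists do we record the metadata row, so a link
    #    never points at a paste whose bytes were never uploaded.
    cursor = db_conn.cursor()
    cursor.execute(
        "INSERT INTO pastes (short_url, s3_key) VALUES (%s, %s)",
        (short_url, short_url),
    )
    db_conn.commit()
```

For multi-gigabyte pastes you would realistically use a multipart upload (for example boto3's `upload_fileobj`), but the ordering is the point of the sketch.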

Now, you may note that I'm actually writing to the CDN up front, before anything has been read. I think this is maybe one of the few cases where write-through actually makes a lot of sense, because if we used write-around we would have to take a cache miss, and that cache miss is going to be gigantically expensive for a 10 GB file. It would be greatly beneficial for us if the paste was already in the CDN, considering how few writes there are relative to reads. That being said, I'm open to debate on this one.

Okay, take a deep breath guys, we're almost done. We've finally made it to the last diagram, where I've tried to put every single thing together, so let's go ahead and do that. Here's our writer, someone who's generating a link, and here's our reader, someone who's going to be reading a link. The writer does the following: assuming this is for TinyURL and not PasteBin, you're only writing over to the servers, but if it is for PasteBin, like I mentioned, you'll probably be hitting S3 over here, and you're also probably going to be hitting your CDN, which will eventually geographically distribute that content as needed.

Anyways, we've got a load balancer in front of our horizontally scaled URL assigning servers. Note that I have the URL assigning service as something completely separate from the URL reading service over there on the right; in reality they would probably live on the same box, and I mainly split them for clarity of the diagram, though I do think it's fair enough to make them their own microservices considering that there are so many more reads than writes. Once we hit our assigning service, we have to hit a database and actually write our data. As you can see, I have those partitioned by the range of the short URLs, a through l and m through z, and of course we've also got single-leader replication. This is our URL table, which ideally should be stored in MySQL.

Now, like I mentioned, to speed up our reads we also want some sort of distributed, partitioned, replicated cache, which we can just use Redis instances for. As you can see, this is similarly partitioned by short URL so that we can deduplicate the amount of data we're storing and use the same caches for the same types of data. Note that if, for example, there's a key starting with "a" that gets a ton of traffic and a key starting with "b" that gets a ton of traffic, one cache might not be enough, so in those cases we may have to add some replicas of a particular cache instance.
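
The diagram's range partitioning can be made concrete with a tiny routing helper. This is only a sketch under the assumption that both the MySQL partitions and the Redis cache shards split on the first character of the short URL (a–l vs. m–z, as in the diagram); the host names and the hard-coded map are placeholders, since in the real design this mapping lives in ZooKeeper.

```python
# Placeholder shard maps keyed by short-URL range; in the real system this
# mapping would be read from ZooKeeper rather than hard-coded.
DB_SHARDS = {"a-l": "mysql-shard-0:3306", "m-z": "mysql-shard-1:3306"}
CACHE_SHARDS = {"a-l": "redis-shard-0:6379", "m-z": "redis-shard-1:6379"}

def shard_for(short_url: str) -> str:
    # Digits and other characters fall into the first range in this toy version.
    first = short_url[0].lower()
    return "a-l" if first <= "l" else "m-z"

def route(short_url: str) -> tuple[str, str]:
    """Return the (database host, cache host) responsible for this short URL."""
    key = shard_for(short_url)
    return DB_SHARDS[key], CACHE_SHARDS[key]
```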

Okay, so hopefully we've touched on that a little bit. Now let's start to touch on reads. We've got our reader over here, and the reader, at least in the TinyURL case, is going to go to the URL reading service. Ideally it returns something from the cache, or, if it's not in the cache, we take a cache miss that goes over to the database, and the database then loads the appropriate cache entry.
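
That read path is a classic read-through pattern; here's a minimal sketch with the redis-py client, assuming an illustrative `urls` table and `url:<short_url>` cache keys.

```python
import redis
import mysql.connector  # assumed DB driver, as before

cache = redis.Redis(host="redis-shard-0", port=6379)

def resolve(short_url: str, db_conn) -> str | None:
    # 1. Try the cache first.
    cached = cache.get(f"url:{short_url}")
    if cached is not None:
        return cached.decode("utf-8")

    # 2. Cache miss: fall back to the database...
    cursor = db_conn.cursor()
    cursor.execute("SELECT long_url FROM urls WHERE short_url = %s", (short_url,))
    row = cursor.fetchone()
    if row is None:
        return None

    # 3. ...and populate the cache so the next read is fast.
    cache.set(f"url:{short_url}", row[0], ex=3600)  # TTL keeps stale entries bounded
    return row[0]
```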

Then, as a result of clicking that link, we upload our click data, the fact that we performed a click, over to Kafka, which is over here. As you can see, we've got multiple partitions to support the fact that we want just one Spark Streaming instance publishing per row, and then we've got our Spark Streaming layer over here, which ideally should be sharded the same way our Kafka queues are. Eventually, say every 10 seconds, or more likely every 10 messages since this is mini-batching, Spark Streaming notices the new clicks (with 10 messages in the mini-batch that's 10 more clicks), goes over to MySQL, and does a single database upload, which is a lot less expensive than having to grab a ton of locks.

Now, of course, the last component of this whole system is: where are we keeping all of this partitioning info? How do we know which load balancer is pointing where, or what each database's URL is? That's what we have ZooKeeper for, which is really connected to everything. ZooKeeper is great: it's strongly consistent, it's a coordination service, and it's effectively a nice key-value store for all of the metadata we're going to need about the various pieces of our system.
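
As a sketch of what that coordination data might look like, here's the kazoo ZooKeeper client storing and reading a shard-to-host mapping; the znode paths and values are placeholders, not anything prescribed in the video.

```python
from kazoo.client import KazooClient

zk = KazooClient(hosts="zookeeper:2181")
zk.start()

# Register which database host owns the a-l short-URL range.
zk.ensure_path("/tinyurl/db-shards")
zk.create("/tinyurl/db-shards/a-l", b"mysql-shard-0:3306", makepath=True)

# Any server can look up (and watch) the current routing metadata.
data, stat = zk.get("/tinyurl/db-shards/a-l")
print(data.decode("utf-8"))  # -> mysql-shard-0:3306

zk.stop()
```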

Well guys, give yourselves a pat on the back; I certainly need one. I know this was a pretty long video, but hopefully the content was actually useful, because I now need to go cry a little bit. Anyways, have a great rest of the night, and I will see you in the next one.

Related Tags: Systems Design, URL Shortening, Analytics, Interview Prep, TinyURL, PPin, Link Generation, Cache Optimization, Distributed Systems, Data Consistency, Stream Processing