Design Youtube - System Design Interview

NeetCode
6 Dec 2022 · 26:04

Summary

TLDR: The video script discusses designing a high-level architecture for a YouTube-like application, focusing on the core functionalities of video uploading and viewing. It highlights the complexity of implementing these features at YouTube's scale, emphasizing the importance of reliability, availability, and minimizing latency. The speaker describes a potential infrastructure involving load balancers, application servers, object storage, and NoSQL databases for metadata. The video also touches on the challenges of video encoding, the use of a CDN for optimized video delivery, and the trade-offs between different database systems, sharing YouTube's historical approach to scaling MySQL with the introduction of Vitess.

Takeaways

  • 🎯 The core functionalities of YouTube include video uploading and viewing, which require a scalable and reliable architecture.
  • 🔄 Dealing with the scale of YouTube involves handling 50 million uploads per day and billions of video views, necessitating a robust infrastructure.
  • 🛡️ Reliability is crucial; videos must be stored without risk of corruption or deletion, leveraging object storage solutions like AWS S3 or Google Cloud Storage.
  • 🌐 Availability is favored over consistency, meaning it's better to serve slightly stale data than to risk the service being unavailable.
  • 🚀 Video encoding is an asynchronous task that requires a large number of workers to handle the daily upload volume efficiently.
  • 💡 Using a CDN (Content Delivery Network) ensures videos are streamed quickly and geographically close to viewers, improving latency.
  • 📚 Metadata and user information are stored in a NoSQL database, such as MongoDB, to allow for fast reads and flexible data storage.
  • 🔄 Denormalization in NoSQL can improve performance by avoiding joins, but updates to user information may require propagating changes across multiple documents.
  • 🚦 Rate limiting may be necessary to prevent abuse of the system, such as uploading an excessive number of videos.
  • 🔍 Additional services for recommendations and search would likely be built on top of the core metadata storage, incorporating user history and preferences.
  • 🛠️ YouTube's initial use of MySQL and the development of Vitess show that with the right engineering solutions, even relational databases can scale to meet massive demands.

Q & A

  • What are the core functionalities of YouTube that the design proposal focuses on?

    -The design proposal focuses on two main functionalities: uploading videos from a user's perspective and watching videos from a user's perspective.

  • What is the estimated scale of daily uploads for YouTube?

    -The estimated scale of daily uploads for YouTube is 50 million videos per day.

  • How does YouTube handle the reliability of video storage?

    -YouTube uses object storage, such as AWS S3 or Google Cloud Storage, which handles replication and ensures that videos are reliably stored and not subject to deletion or corruption.
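
A minimal sketch of what storing a raw upload in object storage can look like, assuming an AWS S3 setup via boto3; the bucket and key names are hypothetical:

```python
# Minimal sketch: storing a raw upload in S3 (bucket/key names hypothetical).
# S3 replicates objects internally, so durability is handled by the store
# rather than by our application code.
import boto3

s3 = boto3.client("s3")

def store_raw_video(local_path: str, video_id: str) -> str:
    key = f"raw/{video_id}.mp4"
    s3.upload_file(local_path, "example-raw-videos", key)  # hypothetical bucket
    return key  # keep this reference in the video's metadata document
```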

  • What is the read-to-write ratio for YouTube users?

    -For every one user uploading a video, there are a hundred users watching videos. With a billion daily active users each watching five videos per day, that works out to five billion videos watched per day, which at a 100-to-1 read-to-write ratio means roughly 50 million uploads per day.
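
The arithmetic behind those numbers, using the figures stated in the video (one billion daily active users, five views per user per day, a 100:1 read-to-write ratio):

```python
# Back-of-envelope math from the video's stated figures.
daily_active_users = 1_000_000_000
views_per_user_per_day = 5
read_write_ratio = 100                       # 100 views per 1 upload

views_per_day = daily_active_users * views_per_user_per_day  # 5 billion
uploads_per_day = views_per_day // read_write_ratio          # 50 million
print(f"{views_per_day:,} views/day, {uploads_per_day:,} uploads/day")
```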

  • What does YouTube prioritize in terms of data management: availability or consistency?

    -YouTube prioritizes availability over consistency. It is more important for the platform to respond correctly and quickly to user requests, even if it means occasionally serving slightly outdated data.

  • How does YouTube address the latency issue for video playback?

    -YouTube addresses latency by using a Content Delivery Network (CDN) to distribute video content geographically close to end users and by streaming videos in small chunks to start playback quickly, even before the entire video is loaded.

  • What type of database did YouTube initially use for storing video metadata and user information?

    -YouTube initially used a relational database, specifically MySQL, for storing video metadata and user information.

  • How did YouTube scale their MySQL database to handle a large amount of read traffic?

    -YouTube scaled their MySQL database by adding read-only replicas and implementing sharding. They also developed an engine called Vitess to decouple the application layer from the database layer, handling sharding and request routing logic.
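
To illustrate the kind of routing logic that ends up in application code before a layer like Vitess absorbs it, here is a hypothetical sketch; the modulo shard key and the per-shard connection list are assumptions, not YouTube's actual scheme:

```python
# Hypothetical sketch of application-level shard routing (the logic that
# Vitess later moved into a middle layer between app and database).
NUM_SHARDS = 16

def shard_for(user_id: int) -> int:
    # Simple modulo shard key; YouTube's real shard key isn't public.
    return user_id % NUM_SHARDS

def fetch_user_videos(user_id: int, connections: list):
    # connections: one DB-API connection per shard (hypothetical setup).
    conn = connections[shard_for(user_id)]
    cur = conn.cursor()
    cur.execute("SELECT * FROM videos WHERE user_id = %s", (user_id,))
    return cur.fetchall()
```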

  • What is the role of a message queue in the video uploading process on YouTube?

    -The message queue is used to manage the video encoding process, which is an asynchronous task. Videos are added to the queue and then sent to encoding services, which can handle the encoding in parallel.
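
A minimal in-process sketch of the queue-plus-workers pattern, using Python's standard library; a real deployment would use a distributed queue, and `encode` here is a stand-in stub:

```python
# Minimal in-process sketch of the upload -> queue -> encoder flow.
import queue
import threading
import time

encode_queue: "queue.Queue[str]" = queue.Queue()

def encode(video_id: str) -> None:
    time.sleep(1)  # stand-in for minutes of real encoding work

def encoding_worker() -> None:
    while True:
        video_id = encode_queue.get()     # blocks until a job arrives
        encode(video_id)
        encode_queue.task_done()

def on_upload(video_id: str) -> None:
    encode_queue.put(video_id)            # the upload path returns immediately

for _ in range(8):                        # many workers encode in parallel
    threading.Thread(target=encoding_worker, daemon=True).start()
```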

  • Why is denormalization acceptable in the context of YouTube's NoSQL database design?

    -Denormalization is acceptable because it improves performance by eliminating the need for joins. It allows for duplicate information to be stored, which speeds up read operations, as seen with user profile pictures being stored with each video document.
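
A sketch of what such denormalized documents might look like, shown as plain Python dicts with hypothetical field names:

```python
# MongoDB-style documents (field names are hypothetical). The uploader's
# name and avatar are duplicated into each video document so that rendering
# a video page takes one read and no join.
user_doc = {
    "_id": "user123",
    "name": "neetcode",
    "avatar_url": "s3://avatars/user123.png",
}

video_doc = {
    "_id": "vid456",
    "title": "Design YouTube",
    "file_ref": "s3://encoded/vid456/",
    "uploader": {                       # denormalized copy of user fields
        "user_id": "user123",
        "name": "neetcode",
        "avatar_url": "s3://avatars/user123.png",
    },
}
# If the avatar changes, every video_doc for user123 is updated
# asynchronously; serving a stale picture for a while is acceptable.
```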

  • What protocol does YouTube use for video streaming and why?

    -YouTube uses HTTP requests built on top of TCP for video streaming. TCP is favored for its reliability, ensuring that the entire video is received without any missing gaps, which is important for delivering a smooth viewing experience.
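
A sketch of fetching one chunk of a stored video over HTTP, using a standard `Range` header with the `requests` library; the CDN URL is hypothetical:

```python
# Fetching a ~1 MiB chunk of a video over HTTP (which runs on top of TCP).
# Servers that support range requests reply with 206 Partial Content.
import requests

resp = requests.get(
    "https://cdn.example.com/encoded/vid456/720p.mp4",  # hypothetical URL
    headers={"Range": "bytes=0-1048575"},               # first 1 MiB only
    timeout=10,
)
assert resp.status_code == 206        # Partial Content
chunk = resp.content                  # hand this to the player's buffer
```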

Outlines

00:00

🎨 High-Level Architecture of YouTube-Style Application

This paragraph introduces the concept of designing a YouTube-style application, highlighting the differences between YouTube and other video platforms like Netflix. It emphasizes the core functionalities of YouTube, such as video uploading and viewing, and acknowledges the complexity behind these features. The speaker also mentions additional features like video search, recommendations, commenting, and analytics, but notes that these cannot be fully explored in a short interview. The focus is on the functional requirements of uploading and watching videos, with a brief mention of non-functional requirements like reliability, scalability, and minimizing latency.

05:01

🔧 Reliability and Availability in Video Storage

This paragraph delves into the non-functional requirements of the YouTube-style application, particularly focusing on reliability and availability. The speaker discusses the importance of ensuring videos are not corrupted or deleted, and the challenge of handling thousands of concurrent viewers. The design must account for a large scale, with assumptions of a billion daily active users and a high volume of video uploads and views. The speaker also introduces the concept of favoring availability over consistency, using the example of video recommendations and the potential for temporary stale data.

10:01

🚀 High-Level Design and Uploading Videos

The speaker begins to outline a high-level design for the YouTube-style application, starting with the user journey of uploading a video. The paragraph discusses the infrastructure needed to handle the massive scale of video uploads, suggesting the use of a load balancer and application servers. It also covers the storage of raw video files in object storage, like AWS S3 or Google Cloud Storage, and the importance of metadata associated with each video. The speaker chooses a NoSQL database, such as MongoDB, for its flexibility and ability to denormalize data, which is beneficial for read-heavy systems like YouTube.

15:02

🎥 Video Encoding and its Asynchronous Nature

This paragraph focuses on the video encoding process, which is an essential and time-consuming part of handling user-uploaded videos. The speaker explains that video encoding is an asynchronous task that requires adding videos to a queue for processing by multiple servers. The paragraph also touches on the need for horizontal scaling of encoding workers to handle the high volume of daily uploads. The speaker uses a hypothetical calculation to illustrate the number of workers needed and emphasizes the importance of having more workers than the number of videos uploaded per second.
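
The worked numbers behind that calculation, as stated in the video (which rounds a day to roughly 100,000 seconds):

```python
# Capacity estimate from the video; a day is rounded to ~100,000 seconds
# (the exact value is 86,400, which gives a similar result).
uploads_per_day = 50_000_000
seconds_per_day = 100_000
encode_seconds_per_video = 60        # assumed average of one minute per video

uploads_per_second = uploads_per_day / seconds_per_day            # ~500
workers_needed = uploads_per_second * encode_seconds_per_video    # ~30,000
print(int(workers_needed))
```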

20:03

🍿 Optimizing Video Viewing Experience

The speaker discusses the optimization of the video viewing experience, starting with an example of how YouTube loads and buffers video chunks for smooth playback. The paragraph explains the technique of loading video segments separately rather than the entire video, which reduces latency and allows for immediate playback. It also covers the use of HTTP requests for video streaming and the separation of audio and video content. The speaker highlights the importance of client-side code for managing memory usage during video playback and touches on the choice between UDP and TCP protocols for video streaming.
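
A hypothetical sketch of the client-side idea described here: keep only a bounded window of recent chunks in memory so that long videos never exhaust RAM:

```python
# Hypothetical client-side buffering: hold only a window of recent chunks
# so a 10-hour video never fills memory.
from collections import deque

BUFFER_CHUNKS = 30                    # e.g. ~30 MiB if chunks are ~1 MiB
playback_buffer: deque = deque(maxlen=BUFFER_CHUNKS)

def on_chunk_received(chunk: bytes) -> None:
    playback_buffer.append(chunk)     # the oldest chunk is evicted automatically
```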

25:04

🛠️ YouTube's Database Evolution and Use of Vitess

In this paragraph, the speaker provides insights into YouTube's database evolution, noting that YouTube initially used MySQL, a relational database management system, and not NoSQL as might be expected for a system of its scale. The speaker explains how YouTube implemented read-only replicas and sharding to scale MySQL, which led to complex application server code. Eventually, YouTube developed Vitess, an engine to decouple the application layer from the database layer, allowing for easier scaling. Vitess has since been open-sourced and is used by other companies for scaling MySQL. The speaker suggests that while NoSQL might seem like an obvious choice, YouTube's ingenuity in scaling MySQL demonstrates that limitations can be overcome with resourcefulness.

Keywords

💡System Design

System Design refers to the process of defining the architecture or structure of a system to achieve specific goals. In the context of the video, it involves creating a high-level architecture for a YouTube-like application, focusing on scalability, reliability, and efficient handling of user interactions such as video uploads and viewing.

💡Video Uploading

Video Uploading is the process of transferring video files from a user's device to a server or cloud storage. In the video, it is a core functionality that the designed system must support, with considerations for handling large files, ensuring reliability, and managing the potential for millions of uploads per day.

💡Video Encoding

Video Encoding is the process of converting raw video files into a format that is optimized for streaming or playback on various devices. The video emphasizes the importance of encoding for reducing file sizes and ensuring that videos can be efficiently delivered to viewers.
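
One common way to implement this step is to shell out to ffmpeg; the sketch below uses standard H.264/AAC options, but the exact settings are an assumption, not YouTube's real pipeline:

```python
# Re-encode a raw upload as H.264/AAC by shelling out to ffmpeg.
# The quality settings are illustrative assumptions.
import subprocess

def encode_to_h264(src: str, dst: str) -> None:
    subprocess.run(
        ["ffmpeg", "-i", src,
         "-c:v", "libx264",   # H.264 video codec
         "-crf", "28",        # constant rate factor: smaller file, lower quality
         "-c:a", "aac",       # AAC audio codec
         dst],
        check=True,
    )
```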

💡Load Balancer

A Load Balancer is a device or software that distributes incoming network traffic across multiple servers to ensure no single server bears too much demand, improving reliability and performance. In the video, a load balancer is suggested as a necessary component to handle the massive scale of user requests and video uploads.

💡Object Storage

Object Storage is a cloud storage service that is optimized for storing and retrieving unstructured data such as multimedia files like videos. It is characterized by its ability to handle large files and provide high durability and availability. In the video, object storage is proposed for storing both the raw and encoded video files to ensure reliability and easy access.

💡Metadata

Metadata refers to data that provides information about other data. In the context of the video, it includes details such as video titles, descriptions, user information, and tags that are associated with each video file. Managing metadata is crucial for search, recommendation systems, and user experience.

💡Message Queue

A Message Queue is a middleware solution that allows different parts of a system to communicate asynchronously by sending and receiving messages. In the video, a message queue is used to manage the video encoding process, allowing for scalability and decoupling of services.

💡Content Delivery Network (CDN)

A Content Delivery Network (CDN) is a distributed network of servers that deliver content, especially videos, to users from the server closest to them. This improves latency and load times by reducing the distance data has to travel. In the video, a CDN is suggested to ensure that videos are delivered quickly and efficiently to end-users around the world.

💡Cache

Cache is a storage layer, often in memory, that holds data temporarily to enable faster access than from the original source. In the context of the video, caching is used to store frequently accessed video metadata to improve response times and reduce load on databases.
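
A tiny LRU cache sketch for hot video metadata; a production system would more likely use something like Redis or Memcached, but the eviction idea is the same:

```python
# Minimal LRU cache for hot video metadata.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None                      # miss: caller falls back to the DB
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used
```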

💡Asynchronous Task

An Asynchronous Task is a task that does not block the main execution thread and can be executed in the background, allowing the system to continue processing other tasks. In the video, video encoding is described as an asynchronous task, meaning that once a video is uploaded, it can be encoded in the background without affecting the user's experience.

💡Scalability

Scalability refers to the ability of a system to handle increased load or demand by adding resources or components as needed. The video emphasizes the importance of designing a system that can scale horizontally to accommodate the massive user base and data throughput of a YouTube-like application.

Highlights

Designing a YouTube-like application involves addressing the unique challenges of handling user uploads and video streaming at a massive scale.

YouTube differs from other video platforms like Netflix in that it allows users to upload videos and offers free access to a vast library of content.

The core functionalities of YouTube include video uploading and viewing, but also encompass complex systems for recommendations, comments, analytics, and advertising.

Reliability is a critical non-functional requirement, ensuring videos are not corrupted or lost, even when handling a large number of daily uploads.

Availability is prioritized over consistency, meaning users should always be able to access YouTube, even if it means occasionally viewing slightly outdated information.

Latency minimization is essential for a smooth user experience, with videos starting to play as soon as possible after a user clicks on them.

A load balancer is necessary to distribute the massive traffic and video uploads across multiple application servers, ensuring no single point of failure.

Object storage, like AWS S3 or Google Cloud Storage, is used for storing the raw video files due to its efficiency with large files and built-in replication features.

NoSQL databases, such as MongoDB, are chosen for storing video metadata and user information due to their flexibility and performance advantages for read-heavy systems.

Denormalization in NoSQL databases allows for duplicate information to improve read performance, even though it may require additional writes when data changes.

Video encoding is an asynchronous process that requires a message queue and multiple workers to handle the large volume of uploads efficiently.

A CDN (Content Delivery Network) is crucial for distributing encoded videos geographically to reduce latency and improve the viewing experience for users worldwide.

Video streaming, as opposed to downloading, involves sending small chunks of video to the user, allowing for playback to begin without the entire file needing to be loaded.

YouTube initially used MySQL, a relational database, and later developed Vitess to handle sharding and scaling, showcasing innovation in overcoming the limitations of traditional database systems.

The design of YouTube's system involves a balance between read and write operations, with a focus on optimizing for the more frequent and critical read operations.

For video streaming, the HTTP protocol is used, leveraging the reliability of TCP to ensure video chunks are delivered without loss or corruption.

The evolution of YouTube's backend architecture, from MySQL to Vitess, demonstrates the resourcefulness and adaptability required in addressing scaling challenges.

Transcripts

00:00

Let's design the high-level architecture of a YouTube-style application. By the way, this video is taken from my system design interview course, which you can check out on neetcode.io. Now let's design YouTube.

First, let's go over the background, even though I'm sure you're familiar with how YouTube and other types of video sites like Netflix work. Compared to Netflix, YouTube is a bit different in that users can actually upload videos, and it's free as well; pretty much anyone can upload videos. And of course, if we can upload videos, we can also choose to watch videos as users. When it comes to YouTube, this is actually the core functionality, though that doesn't mean it's simple to implement; there's a ton of complexity in reaching the scale that YouTube does with even just these two features. But this is not all that YouTube is capable of doing; there's a lot of data. You can obviously search for videos, you can have videos recommended, and designing that recommendation system could be its own design problem, but even then it could not be fully described and designed in a 45-minute interview. Users can of course comment and interact with videos by liking or disliking them, there's probably a ton of analytics that goes on with reporting views, I'm sure there's bot prevention with comments (even though there's been a ton of bots in the comments lately), and advertising; the list could go on and on. The point I'm trying to make is that with an ambiguous design proposal like this, there are many different directions we could go in, and of course we can't explore all of them.

So, moving on to the functional requirements, let's say that the main features we want to focus on are uploading videos from a user's perspective, and then watching videos from a user's perspective. These are the two main functional requirements we want to focus on. If we have time at the end, maybe we can explore how to extend our design to handle additional functionality, but these are the main things we want to focus on.

When it comes to non-functional requirements, the first one that comes to mind is reliability. With videos, you would never want to run into an issue where somebody uploads a video and then that video is somehow corrupted or deleted. We definitely don't expect that when we're storing something on YouTube; even though it's free, we wouldn't want a video to just disappear. So we really need the videos to be extremely reliable, at least in terms of storage.

Talking about the scale we're going to be handling: even a single video can have potentially thousands of concurrent viewers, so that's what we have to keep in mind, and of course we're going to have a ton of users. Let's assume that we're designing YouTube to handle a billion daily active users, which is about accurate, I think. Now, when it comes to these users, let's say that each user is watching five videos per day, but the upload ratio is going to be a hundred users watching videos for every one user uploading a video; this is the ratio of reads versus writes for videos. So if each user is watching five videos per day, we have five billion videos being watched per day. If the ratio is a hundred to one, then one percent of five billion is the number of videos uploaded per day, which is 50 million videos uploaded per day: a massive amount of throughput. Now, the good thing is that among these 50 million videos, most of them probably aren't going to be getting a ton of views. If I had to guess, I bet the top five percent of videos account for something like 90% of the views, but this is just off the top of my head. I think we can design this in a way that assumes most videos will not be getting views, though they do still have to be stored and we can't let them get deleted. In most cases, doing a bunch of complex math isn't super important; it's about coming to the right conclusions, which we kind of are.

We also have to keep in mind that when it comes to availability, we definitely want to favor availability over consistency. What do I mean by that? Well, every time you go on YouTube, refresh the YouTube home page, and want to see a bunch of videos on your home page, every time you make that request you should get a correct response: you should get an HTTP 200 response and things should load, and it's okay if we have to sacrifice consistency to achieve that. What do I mean by that? Well, what if you're in your subscription feed and somebody you're subscribed to uploaded a new video one second ago, and you just refreshed your home page? You see a bunch of videos in your subscription feed, but none of them appear to be the one that was just uploaded a second ago. Hypothetically this could happen if we have multiple storage systems, and the storage system you happened to be reading from when you refreshed the page did not have the most up-to-date data; it did not have the new video, but another one did. Eventually that video will be replicated to the other storage, but it just takes a few seconds, so you're getting stale data. Our data storage is not favoring consistency; it's favoring availability.

05:01

The worst thing that would happen in this case is that you'd most likely have to wait a little bit longer: maybe when a new video is uploaded you have to wait five seconds before you can actually see it, or in the worst case something like ten seconds. But is that really that big of a deal? I think it would be a lot worse if you refreshed the page and it didn't return anything to you at all. And lastly, we obviously want to minimize latency as much as possible. When you click to watch a video, ideally it should start playing immediately, even if the entire video isn't loaded, and if we have a good internet connection we shouldn't have to experience any buffering or waiting for the video to load.

Now let's start with the high-level design. I'm going to start with the user journey of uploading a video, because uploading is probably going to be more complicated than actually watching a video, and this will give us a better sense of the infrastructure involved in our design. Since we're dealing with such a massive scale, 50 million uploads per day, we probably can't handle that with a single server, so we would most likely have a load balancer sitting in front of a bunch of application servers so that we can scale horizontally. This is a pretty generic thing; for now, let's just assume that how we do this doesn't really matter, whether the user hits this application server or that one. So I'm going to simplify our design and just draw it as the user making an upload request to the application server, even though under the hood we know it's of course going to need to be load balanced.

Now, even the act of uploading a video is not as simple as it might sound. What happens if there's a short internet connection breakage, even for just a second, while we're uploading a file that's over a gigabyte and we're already halfway through? Would we have to restart, or could we pick up where we left off? Let's assume that this is not a direction we want to go in, and just say that once the video is uploaded, it's going to be stored in some object storage; let's say this is where we store the raw files that the user uploads. The reason we're using an object store is that it's a lot better for storing media and large files like videos; we probably don't need to store those in, for example, a relational database. Also, object storage, something like AWS S3 or Google Cloud Storage, handles replication for us, so we can safely assume that if we store something in an object store, we don't have to worry about it being deleted. That's generally how cloud file storage works; things like Google Drive are actually built on top of object storage. So at a high level we can safely assume that we have our reliability covered.

Now, storing the videos here is fine, but what about the actual metadata associated with every video? Going over what the API for uploading would look like: it would obviously have a title, a description, and the actual video content itself, maybe something like an MP4; that's what's actually stored in object storage. There could be a bunch of other things we store, like tags, but knowing every single field you'd store with a video isn't the important part. Most importantly, we also want to associate every single video with a user, because remember: every time you go on YouTube and watch a video, underneath it there is usually the profile picture and the username of the person who uploaded it. This isn't like Netflix, where you just have shows; on YouTube, people, the content creators, are actually creating the videos. So every time we want to show a video to a user, we're going to have to join that video with the user information: the video metadata of the video itself, plus the person who created it. Long story short, every time we upload a video we're also going to store metadata associated with that video, and we're also going to store user information in this database. I'm choosing a NoSQL database because we're going to have so many videos uploaded and we're probably going to need to read this metadata very frequently. In this database we can store a reference to the video file in the object store, and that should be fine.

Now let's say that for the NoSQL database we're using something like MongoDB, which, if you don't recall, doesn't store things in tables and rows like a SQL database; it stores things in a JSON-like format. In MongoDB terms, we can have a collection of documents, and a document is pretty similar to a JSON object; it's very flexible. So let's say one collection is videos: every video document will have all the information about a single video that we need. We also have another collection for users and all their information. You might be thinking: if a user wants to watch a certain video, don't we then have to perform a join with the user? Well, not necessarily. With NoSQL databases like MongoDB, we can have our data denormalized (that's the correct term). Normalized, in SQL, basically means you don't store duplicate data: you have separate tables, and if you want to aggregate or combine information, you join those tables. In MongoDB you don't have to do that; we can actually store duplicate information. So in every video document we would actually store the relevant user information. We know that when a user goes on YouTube and wants to watch a video, they see the profile picture of the uploader; that's the example I'm going to talk about right now. That profile picture is probably also stored in object storage somewhere, so there will be a reference to it in the user document, but we'll also store it in every video document of that creator's videos. So we'll have duplicate references to it, but that's okay in NoSQL, because at the very least it does improve performance: we don't have to perform joins.

10:01

Now, the question is: what happens if a user actually updates their profile picture? Yes, we'd have to update the user document, but then we'd also have to update every single video document where that person created a video, and maybe they have a hundred videos, or maybe a thousand; we'd have to update all of those documents. In this case that's okay, because first of all, they're probably not going to be updating their profile picture very frequently; uploading a video is probably more frequent, and watching a video even more so, so that's what we're favoring here: reads over writes. But also, if they update their profile picture, we can update all of those video documents asynchronously; we don't have to do it immediately. Is it the end of the world if somebody sees an old profile picture for this user for a few minutes, or maybe even an hour? Probably not. These are some details we could discuss, but they're probably not high level, so let's continue with the rest of our design.

Now, when it comes to videos, encoding is actually a big part of it. As users upload raw video files to YouTube, YouTube does a lot of video encoding and compression to get the size of those videos down, and encoding a video is not something that can happen in one second; this is definitely an asynchronous task. It can typically take on the order of minutes to encode files, and for really large files (I think YouTube will even allow you to upload a 24-hour video file) it can probably take hours to do the encoding, which is the reason we are using a message queue for this. There's a lot of domain knowledge needed to understand video encoding, and that's not what we want to dive into, so let's keep it high level: as raw video files are uploaded, we're going to store them, but we're also going to add them to a queue so they can be sent to another service that handles the encoding. It's probably not going to be a single server doing that; we're probably going to have a ton of servers. After the videos are encoded, they're stored in object storage, because they're still videos and we still want to make sure they're reliably stored and replicated. Videos are immutable, so we don't really need a Hadoop file system or anything like that; object storage is probably good enough. We're not going to be updating a video; we'll update the metadata associated with it, but with the video itself we're either going to upload it or maybe delete it, and that's pretty much it; you're not going to be editing the video.

Now, this is how a video can be uploaded, but what about actually watching a video? We want the reads to be as fast as possible, and the latency to be as low as possible, so anywhere we can add caching is going to be really helpful. We know users aren't going to be reading raw video files; they're going to be reading encoded video files, and we probably want those distributed around the world. To have the videos stored as close as possible to end users, we can have a CDN, which does exactly that: it distributes static files geographically. So when a user wants to watch a video, the video file itself is loaded via the CDN, which pulls from the object storage, while the user fetches the actual metadata associated with a video from the database. To speed that up, because we know a small number of videos are going to be getting most of the views, we can add a cache in front of our database, and that cache is of course going to be an in-memory cache; that's the whole point of speeding things up, because disk is slower than memory. But this cache probably can't store every video we need, so we'll have to have some way to kick videos out; most likely newer videos are going to be getting more views, so we can probably have something like an LRU cache implemented here.

Now, finally, let's actually start digging into some of the details, and the first thing I want to talk about is the encoding part. More specifically, we said we could have 50 million videos uploaded per day, so my question is: how many workers do we need here, assuming they can encode the videos in parallel? This is a pretty easy service to scale horizontally, at least at a high level; I'm not saying video encoding is an easy topic to understand, but assume that at a high level one worker can encode one video at a time. So if one person uploads multiple videos, or ten people are uploading videos at the same time, the videos will be added to the queue and then reach the encoding service before they're actually encoded and written to storage; the point is that multiple videos can be encoded at the same time, with no dependency between them.

15:02

So if we have 50 million uploads per day, and assuming that every video takes one minute to encode (which is probably too low; it would probably take longer on average, but let's say these workers have really good resources and most uploaded videos are pretty short), in terms of capacity planning, how many workers would we need here? Well, with 50 million uploads per day, approximating a day as 100,000 seconds, we can divide 50 million by a hundred thousand, and we get roughly 500 videos uploaded per second. The first thing on your mind might be: can we just have 500 workers? No, that's pretty naive, because remember, we said it takes one minute to encode each video, on average. If we only had 500 workers, then in the first second we'd have 500 videos uploaded, and each of those workers would be encoding a single video. One more second goes by and we have 500 more videos uploaded, but every worker is busy, so we add those 500 to the queue; another second goes by and we add 500 more, and this keeps happening until one minute has passed, those first 500 are finally done and stored, and the workers can take 500 more videos. By this point our queue would be backlogged pretty hard; at this rate we would never get through the backlog. So we need more than 500 workers. If you do the math, there are 60 seconds in a minute, so multiply 500 by 60 and you get 30,000 workers, and this is roughly the answer I personally would be looking for. Now, with video encoding it's probably pretty hard to get an accurate estimate, and I'm not sure whether one worker can actually handle multiple videos at once; maybe that's the case. But the important thing I would be looking for, if I asked you this, is that we definitely need more than 500 workers; we need more than the number of videos uploaded per second, that's for sure.

Now, another interesting aspect of this problem is actually watching a video. Let's talk about some details of how this can be optimized, and the best way to do so is by looking at an example. Right now I'm on YouTube, on my channel specifically. I'm going to open up the dev tools, and we're going to focus on the network tab; I'm also going to filter on XHRs, and I'm going to click one of these shorter videos; you'll see why in just a second. The first thing you see here is how much of the video has buffered: you can see this portion of the video has buffered. When we watch a YouTube video, we don't need to wait for the entire video to download before we watch it; since we're presumably starting at the beginning, we only need the beginning to be loaded. But watch what happens when I click over here: if I skip to this part of the video, it just loaded a little bit. Now I'm going to skip over here; watch what happens: see, it immediately buffers. So that's what we want to do; we don't need the entire video to be loaded. It's true that some people might skip around; they might skip to this part, which seems to be popular, and this part of the video has not loaded; only this part has. So what's going to happen when I click here? Well, that part got loaded. What's actually happening here: if we scroll down in the requests, the most recent request is a request to load that portion of the video. We're not using a special streaming protocol to do this; we're actually making HTTP requests to load chunks of the video. I'm going to expand this here: you can see a request was made, and the response looks like gibberish to us, because it's actually that portion of the video. Going back to the headers, when you scroll down to the response headers, you can see that this actually was not the video: the content type is audio. So I'm going to hit the second one over here, and scrolling down to the headers and looking at the content type, we see that this one was the video. So it looks like the audio is being fetched separately from the video; this one, when we look at the response, is probably the video, and before, we were probably looking at the audio, not that it looks any different to us. I'm going to refresh and do it one more time so we can see: pausing this, a portion of the video has loaded here. Scrolling down to the requests, we can see the video playback requests are the ones actually loading the video itself, and as I click here, a new chunk of video is loaded. Scrolling all the way down to see that one, there are multiple requests, and some of them are larger than others, but the point is that one megabyte of data is easier to transfer than the entire video, which might be something like 20 or 30 megabytes. This is the technique for lowering latency: loading a video via smaller chunks.

Now, while rendering and loading videos is also a domain-knowledge-heavy topic, I still think it's worth mentioning, because the technique we just went over, small chunks of video, is a pretty simple concept to understand at a high level: we don't need to send the entire video to the user before they can start watching it; we can just send them small chunks of the portion they're actually watching.

20:03

Another relevant question would be: what protocol should we use for sending videos? By the way, what we just talked about is called video streaming, not necessarily live streaming, because we know the video is already stored; it's not a live feed. But the video is being streamed, meaning it's being sent in small chunks, as opposed to when you actually download a video, which is not streaming: that's taking the entire file that's stored, sending it to your computer, and storing it. With video streaming, I believe those small chunks are actually stored in your computer's memory, which is also why you would not want the entire video taking up all of your memory. So most likely there is some client-side code handling that and freeing memory, because it's pretty easy to write client-side JavaScript that will take up all your memory and crash your browser. That's something that, as a front-end developer, you might want to keep in mind: if we were watching a 10-hour-long video, which definitely exists on YouTube, we would not need the entire video to be buffered in memory; we could just skip around the video.

But going back to which protocol we might want to use: since we want latency to be as low as possible, at a high level there are two protocols, UDP and TCP, and you might favor UDP for video streaming. That's probably the better choice for live streaming, because as, say, a sports game is going on, if you miss one second of it, you don't want to go back to that one second; you want to keep up with the most up-to-date information, to see what's happening in real time. That's what you'd want if you're live streaming something or watching a live stream, and that's what UDP favors. But with an actual video, we know the video is stored somewhere and we want to watch the entire thing. If you're watching a movie or something on YouTube, you don't want to miss two seconds of it, because that might be the actual plot point. So TCP is favored for reliability: it will ensure that we get the entire video with no missing gaps. Sure, it might take longer, but as long as we send it in small chunks it should be okay, and that's exactly what we saw happening with YouTube: it was sending HTTP requests, which are built on top of TCP. So I think that's another important question in the context of YouTube compared to a lot of other system design problems.

There are also a lot of other things we could explore, especially when it comes to uploading videos. Keeping things at a high level, we'd probably want to rate limit uploads; we don't want somebody to be able to upload an infinite number of videos. That could be implemented in the load balancer itself, which we kind of omitted from this design, but we know it does exist. Also, when it comes to recommendations for YouTube videos, or even searching, we'd probably want other auxiliary services that read from our metadata, and we'd probably want to store a history of what types of videos a person watches and likes, so we can build some recommendations for them. Searching videos could be its own topic; that's kind of like designing Google search, because there's a lot of indexing you can do. You'd probably want to incorporate recommendations with searching as well, and you'd also want the metadata: the description, the title, how many views a video has, which videos are most relevant for which search strings; we could have some autocomplete with that. Those would most likely be built on top of, or separately from, this core functionality.

Now, one last thing I wanted to cover, because I think it's always interesting to understand how this type of service was actually built. What YouTube actually did was not use a NoSQL database; they used MySQL, which is a relational database management system. You might be wondering why they didn't use NoSQL, and I definitely don't know the details. One guess I have is that YouTube was first created in the early 2000s, maybe 2004 or 2005, and MongoDB did not exist at that point, and they probably didn't need to handle the same scale that they do right now. But as time went on, they found that they did need to scale their database. I think what they first did was add read-only replicas, because this is of course a read-heavy system, so reading is going to be more common than uploading new videos. But even then they ran into issues, and next they tried sharding. So they sharded their MySQL database, and they ended up with a lot of complex code in their application servers to properly route user requests to the correct shard; I'm not exactly sure which shard key they used, but that's what they did. Then, eventually, the long-term solution they found was building a new engine called Vitess (I'm actually not sure how it's pronounced). This was created at YouTube, and it basically decouples the application layer from the database layer: the application layer should not have to know how the database is sharded. So Vitess was added as a middle layer between the application servers and the database, at least at a high level, and that is where all the logic for sharding and routing requests correctly lives. This is how they were able to take even a relational database like MySQL and scale it up.

25:04

Now, maybe if they could go back in time they would have started with a NoSQL database in the first place, or maybe some other type of database, but they did find a way to get MySQL to work. Vitess was later open-sourced, and it's a very popular project that's still being used; it's very modern and very powerful, and it's being used by newer companies like PlanetScale, which take MySQL, add Vitess to it, and sell that as a product, of course adding more functionality. This kind of shows you that when you reach problems in distributed systems, it can breed a lot of ingenuity and resourcefulness, and you can overcome a lot of limitations that we would otherwise look at and say, "oh, MySQL? If we're dealing with a lot of read scale and eventual consistency is fine, we can just use a NoSQL database." But they found a way to make MySQL work. If you found that interesting, you can read a brief history of YouTube, MySQL, and Vitess in the Vitess docs, and probably other places on the internet.
