How NETFLIX onboards new content: Video Processing at scale 🎥

Gaurav Sen
30 Aug 2019 · 10:44

Summary

TLDR: This video script delves into Netflix's innovative content onboarding process, addressing both legal and engineering challenges. It explains the necessity of video encoding in various formats and resolutions to cater to different internet speeds, the concept of codecs, and the importance of efficient data storage. The script also highlights Netflix's smart approach to video chunking for better user experience and their use of Amazon S3 for data storage. Furthermore, it discusses the revolutionary Open Connect technology that significantly improves streaming speed and user experience by reducing load on both Netflix and ISPs, ultimately showcasing Netflix's prowess in system design and scalability.

Takeaways

  • 😀 Netflix faces both legal and engineering challenges when onboarding new content.
  • 📚 They need to store videos in multiple formats like MP4 and AVI to accommodate different internet speeds.
  • 🔄 Codecs are used for compressing video, with lossy compression reducing file size at the cost of some data loss.
  • 📱 Netflix adjusts video resolution based on the device being used, such as lower for mobiles and higher for TVs.
  • 🔢 The number of video versions processed is determined by the product of formats (F) and resolutions (R).
  • 💡 Netflix engineers break videos into chunks to process them more efficiently across multiple processors.
  • 🎬 Video chunks are processed not by time but by scenes to avoid disrupting the viewing experience.
  • 📊 Netflix's recommendation algorithm adapts to whether a movie is 'sparse' or 'dense', optimizing data delivery accordingly.
  • 🗃️ Amazon S3 is used by Netflix for storing video content due to its cost-effectiveness for static data storage.
  • 🌐 Netflix's Open Connect boxes placed with ISPs cache content locally to reduce latency and improve user experience.
  • 🔄 These boxes are updated during off-peak hours to ensure they have the latest content for viewers.
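
The F × R count mentioned in the takeaways can be sketched in a few lines of Python. The format and resolution lists are illustrative assumptions (the video only names MP4 and AVI as example formats), not Netflix's actual encoding ladder.

```python
from itertools import product

# Illustrative values only; not Netflix's real list of formats or resolutions.
formats = ["mp4", "avi"]
resolutions = ["480p", "720p", "1080p"]

# Every (format, resolution) pair is one version of the video to produce.
versions = list(product(formats, resolutions))

print(len(versions))  # F * R = 2 * 3 = 6
```

Adding one more resolution adds F new versions, which is why the processing cost grows quickly as the lists get longer.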

Q & A

  • What are the main challenges NETFLIX faces when uploading new content?

    -The main challenges include legal challenges, engineering challenges, and the need to store content in different formats and resolutions to accommodate varying internet speeds and devices.

  • Why does NETFLIX use different video formats and resolutions?

    -Different video formats and resolutions are used to provide the best possible viewing experience for users with different internet speeds and devices. High-quality formats are available for fast internet connections, while lower quality formats are used for slower connections.

  • What is a codec and why is it important in video processing?

    -A codec is a method for compressing video files to reduce their size. This is important to ensure videos can be stored and streamed efficiently without taking up excessive storage space or bandwidth.

  • How does NETFLIX handle the processing of large video files?

    -NETFLIX breaks large video files into smaller chunks and processes these chunks individually in different formats and resolutions. This approach helps in managing the workload efficiently and reduces the risk of processing failures.

  • What innovative method does NETFLIX use to improve user experience while watching videos?

    -NETFLIX improves user experience by breaking videos into 4-second long chunks called shots, and then collates these shots into scenes. This method allows for seamless playback and better handling of user interactions with the video.

  • What is the difference between a sparse movie and a dense movie in NETFLIX's recommendation algorithm?

    -A sparse movie is one where users frequently jump to different points, while a dense movie is watched continuously. For sparse movies, NETFLIX only sends the requested data, whereas for dense movies, it proactively fetches and preloads future scenes to ensure seamless playback.

  • Where does NETFLIX store its video content and why?

    -NETFLIX stores its video content on Amazon S3, which is a cost-effective storage solution for static data. This allows NETFLIX to store large amounts of data without frequent updates, making it cheaper than using a traditional database.

  • How does NETFLIX improve the delivery of its content to users around the world?

    -NETFLIX improves content delivery by using local caches, known as Open Connect boxes, which store popular content close to users. This reduces the load on central servers and improves the speed and reliability of video streaming.

  • What role do Internet Service Providers (ISPs) play in NETFLIX's content delivery strategy?

    -ISPs host Open Connect boxes provided by NETFLIX, which store frequently accessed content locally. This reduces the distance data needs to travel, saving bandwidth and improving user experience by providing faster access to videos.

  • How does NETFLIX update its content on Open Connect boxes?

    -NETFLIX updates content on Open Connect boxes by sending the latest movie chunks from its central servers during low-traffic times, such as 4 am. This ensures that the boxes are populated with the newest content, providing users with up-to-date options.

Outlines

00:00

🎥 Netflix's Content Onboarding and Technical Challenges

This paragraph delves into the complexities of adding new content to Netflix's platform. It covers the need for video encoding in various formats and resolutions to accommodate different internet speeds and user devices. The speaker explains the concept of codecs and lossy compression to manage file sizes while maintaining quality. The paragraph also touches on the technical innovation of breaking videos into chunks for efficient processing and the importance of scene-based chunking to enhance user experience during playback. The discussion highlights Netflix's approach to overcoming engineering challenges associated with data storage and processing.

05:01

🌐 Netflix's Data Storage and Content Delivery Strategies

The second paragraph focuses on Netflix's strategies for data storage and content delivery. It explains the use of Amazon S3 for storing static video content due to its cost-effectiveness compared to databases. The paragraph also discusses Netflix's innovative approach to improving user experience by using Open Connect boxes, which act as localized caches for video content, reducing latency and bandwidth usage. The speaker describes how these boxes are strategically placed with internet service providers to serve content quickly and efficiently, based on regional demand and preferences. The summary also touches on the predictive algorithms Netflix uses to determine whether to stream content sparsely or densely, depending on user viewing patterns.

10:03

🚀 Netflix's Innovative Video Processing and Serving Solutions

The final paragraph wraps up the discussion by emphasizing the innovative methods Netflix employs for video processing and serving, which allow it to operate at scale. It highlights the significant impact of Open Connect boxes in handling a majority of user requests, thereby improving user satisfaction and system efficiency. The speaker also hints at the potential for future videos on system design in the real world and encourages viewers to engage with the content by leaving comments, liking, and subscribing for updates.

Keywords

💡Onboarding

Onboarding in the context of the video refers to the process of adding new content, such as TV series or movies, to the Netflix platform. It involves overcoming various legal and engineering challenges to ensure smooth integration. The script discusses how Netflix handles the technicalities of onboarding, such as video encoding and format compatibility, to cater to different internet speeds and device capabilities.

💡Codecs

Codecs are compression algorithms used to reduce the file size of video content while maintaining quality. The term is central to the video's theme as it explains how Netflix stores videos in various formats to accommodate different internet speeds. For instance, the script mentions that a high-quality video requires a larger file size due to less data loss, whereas a lower-quality video has a smaller file size due to lossy compression.

💡Resolutions

Resolutions are the dimensions of a digital image or video, affecting its clarity and detail. The video script explains that Netflix adjusts content to various resolutions to suit different devices, such as mobile phones, TVs, and laptops. The concept is integral to understanding how Netflix optimizes video quality for diverse viewing experiences.

💡Lossy Compression

Lossy compression is a method of reducing file size by removing some data, which can result in a loss of quality. The script uses this term to describe how video files are made smaller for streaming, particularly for lower-quality formats where some detail is sacrificed to decrease file size.
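
The trade-off can be illustrated with a toy example. This is not a video codec: it simply throws away the low 4 bits of every byte (the lossy step) and then uses Python's general-purpose `zlib` compressor to show that the reduced data packs into fewer bytes. The random input and the 4-bit cut are assumptions chosen for illustration.

```python
import random
import zlib

# Deterministic pseudo-random "pixel" data, standing in for raw video bytes.
rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(4096))

# Lossy step: keep only the top 4 bits of every byte (detail is discarded).
quantized = bytes(b & 0xF0 for b in original)

original_size = len(zlib.compress(original))
quantized_size = len(zlib.compress(quantized))

# The discarded detail cannot be recovered; this is the largest per-byte error.
max_error = max(a - b for a, b in zip(original, quantized))

print(original_size, quantized_size, max_error)
```

The quantized data compresses to a smaller size, at the cost of an error of up to 15 per byte that can never be undone.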

💡Encoding

Encoding in the video script refers to the process of converting video into a format suitable for streaming. It is a technical process that involves choosing the right codec and resolution to balance video quality and file size, which is crucial for the efficient delivery of content on the Netflix platform.

💡Chunks

In the context of the video, chunks are portions of a video file that Netflix breaks down for processing. The script explains that Netflix uses this method to handle video encoding more effectively, allowing for parallel processing of different formats and resolutions, which enhances the streaming experience.
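
A minimal sketch of the "one resolution, one format, one chunk is one task" idea follows. The `encode` function is a hypothetical stand-in that only builds an output name; a real pipeline would call an actual encoder and would likely distribute tasks across machines rather than threads.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def encode(chunk: str, fmt: str, resolution: str) -> str:
    """Hypothetical stand-in for a real encoder; returns the output name."""
    return f"{chunk}.{resolution}.{fmt}"


def encode_all(chunks, formats, resolutions, workers=4):
    # One independent task per (chunk, format, resolution) triple.
    tasks = list(product(chunks, formats, resolutions))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the tasks.
        return list(pool.map(lambda task: encode(*task), tasks))


outputs = encode_all(["A", "B"], ["mp4", "avi"], ["480p", "1080p"])
print(len(outputs))  # 2 chunks * 2 formats * 2 resolutions = 8
```

Because each task is small and independent, a failed task can be retried on its own instead of re-encoding the whole video.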

💡Scenes

Scenes, as mentioned in the script, are a collection of video chunks that represent a continuous sequence of actions in a movie. Netflix breaks down videos into scenes rather than fixed time intervals to improve user experience by ensuring seamless playback without interruptions.
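
Grouping 4-second shots into scenes can be sketched as below. The scene start times are assumed to be known already and aligned to shot boundaries; how they are detected is not covered in the video, so `scene_starts` is a hypothetical input.

```python
from bisect import bisect_right

SHOT_SECONDS = 4  # shot length stated in the video


def group_shots_into_scenes(duration: int, scene_starts: list[int]) -> list[list[int]]:
    """Return each scene as a list of shot start times, in seconds.

    scene_starts must be sorted and begin with 0.
    """
    scenes = [[] for _ in scene_starts]
    for shot_start in range(0, duration, SHOT_SECONDS):
        # Index of the last scene that starts at or before this shot.
        scene_index = bisect_right(scene_starts, shot_start) - 1
        scenes[scene_index].append(shot_start)
    return scenes


scenes = group_shots_into_scenes(24, [0, 8, 20])
print(scenes)  # [[0, 4], [8, 12, 16], [20]]
```

Fetching a whole inner list at once corresponds to delivering a scene as one block, so playback does not stall in the middle of the action.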

💡Sparse and Dense Movies

The script introduces the concepts of sparse and dense movies to describe user viewing patterns. A sparse movie is one where viewers might jump to different points in the video, while a dense movie is watched continuously. Netflix's recommendation algorithm adjusts the data delivery strategy based on these patterns to optimize the user experience.
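
One way to picture the sparse versus dense decision is the heuristic below. It is an illustration only: the 80% threshold, the three-chunk lookahead, and both function names are assumptions, and the video does not describe how Netflix actually makes this prediction.

```python
def classify_viewing(positions: list[int], chunk_seconds: int = 4) -> str:
    """Label a session 'dense' if most steps move on to the next chunk."""
    steps = list(zip(positions, positions[1:]))
    if not steps:
        return "dense"  # nothing to judge yet; assume linear playback
    linear = sum(1 for a, b in steps if b - a == chunk_seconds)
    # Assumed threshold: at least 80% of steps are linear playback.
    return "dense" if linear / len(steps) >= 0.8 else "sparse"


def chunks_to_send(label: str, requested: int, lookahead: int = 3,
                   chunk_seconds: int = 4) -> list[int]:
    """Sparse: send only what was asked for. Dense: also prefetch ahead."""
    if label == "sparse":
        return [requested]
    return [requested + i * chunk_seconds for i in range(lookahead + 1)]


print(classify_viewing([0, 4, 8, 12, 16]))  # dense
print(classify_viewing([0, 300, 40, 900]))  # sparse
print(chunks_to_send("dense", 16))          # [16, 20, 24, 28]
print(chunks_to_send("sparse", 300))        # [300]
```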

💡Amazon S3

Amazon S3 is a cloud storage service mentioned in the script as the place where Netflix stores its video content. It is a cost-effective solution for storing large amounts of static data, such as video files, which is cheaper than using a database that requires frequent updates.

💡Open Connect

Open Connect is Netflix's innovative solution for improving streaming speed and reducing bandwidth usage. The script describes it as a caching system that places content closer to users by storing it on local servers provided by ISPs. This approach significantly enhances the user experience by reducing latency and load times.
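
The caching behaviour can be modelled with a toy class. `EdgeCache`, its method names, and the content keys are hypothetical; a real Open Connect appliance is far more involved. The sketch only shows the two ideas from the script: serve locally when the content is present, and fill the box in bulk during off-peak hours.

```python
class EdgeCache:
    """Toy model of a local cache sitting in front of a distant origin."""

    def __init__(self, origin: dict[str, bytes]):
        self.origin = origin  # stands in for the central servers
        self.store: dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def fill(self, keys):
        """Off-peak bulk copy of selected content from the origin."""
        for key in keys:
            self.store[key] = self.origin[key]

    def get(self, key: str) -> bytes:
        if key in self.store:
            self.hits += 1
            return self.store[key]  # served locally, no long round trip
        self.misses += 1
        return self.origin[key]  # fall back to the origin server


origin = {
    "sacred-games/ep1/chunk-0": b"...",
    "other-show/ep1/chunk-0": b"...",
}
cache = EdgeCache(origin)
cache.fill(["sacred-games/ep1/chunk-0"])  # e.g. pushed during the night
cache.get("sacred-games/ep1/chunk-0")     # hit
cache.get("other-show/ep1/chunk-0")       # miss, goes to origin
print(cache.hits, cache.misses)  # 1 1
```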

💡ISPs

ISPs, or Internet Service Providers, are companies that provide internet access to consumers. In the context of the video, ISPs play a crucial role in delivering Netflix content by hosting Open Connect boxes, which store and serve popular content locally, thus speeding up access and reducing the load on both Netflix and ISP servers.

Highlights

NETFLIX onboards new content by overcoming legal and engineering challenges.

Different video formats like MP4 and AVI are necessary to accommodate various internet connection speeds.

Codecs are used to compress video, balancing quality and file size through lossy compression.

NETFLIX adjusts video resolutions for different devices, from mobile phones to TVs.

The number of video formats and resolutions creates a multitude of processing combinations.

When NETFLIX engineers find a better way of storing data (for example, one that shrinks a 6GB encode to 1GB), older movies are re-encoded with the new process.

NETFLIX breaks original videos into chunks for efficient processing and encoding.

Video chunks are processed in various resolutions and formats, streamlining the encoding process.

Breaking video into chunks based on scenes rather than time improves user experience during playback.

NETFLIX's video suggestion algorithm adapts to user behavior, categorizing movies as sparse or dense.

Amazon S3 is used by NETFLIX for storing static video content due to its cost-effectiveness.

NETFLIX's innovative solution to internet service provider limitations is the Open Connect box.

Open Connect boxes cache content locally, reducing latency and bandwidth usage.

ISPs benefit from Open Connect boxes, with around 90% of NETFLIX traffic being handled by them.

NETFLIX updates Open Connect boxes during off-peak hours to ensure the latest content is available.

NETFLIX's video processing and serving innovations enable efficient scaling of their service.

The video provides insights into real-world system designing and the practical applications of NETFLIX's methods.

Transcripts

play00:03

Sometimes it feels like I am a god.

play00:09

Hi everyone, today we'll be talking about how NETFLIX onboards new content

play00:11

onto their platform

play00:14

So if you have a TV series or a movie and you want to get it uploaded

play00:17

on NETFLIX, apart from the legal challenges there's also

play00:20

engineering challenges that NETFLIX solves. I'll be keeping this video as

play00:23

simple as possible so that the maximum number of viewers can

play00:26

understand what's going on. But there will be some technical details

play00:28

when it comes to video encoding and other

play00:31

technical processes. Firstly what kind of challenges will we

play00:34

face when we are uploading new content ?

play00:37

Well, we need to store it in different formats

play00:40

You might know about MP4, AVI and other formats.

play00:43

The reason they have this is because different people

play00:45

have different internet connection speeds. So if you have a really good

play00:48

internet connection speed and you can deal with a really

play00:51

difficult format for example a detailed one where the

play00:55

data loss is minimum and you want to see like maximum video quality

play00:58

And then you'll have something like medium quality and low quality too.

play01:02

So, all of these are nothing but codecs.

play01:05

A codec is a way in which you compress video.

play01:08

So originally, like this video right now,

play01:11

It's going to be taking a lot of detail, but when I edit this video

play01:14

I'll make sure that the size of the file is not huge.

play01:17

I'll try to keep it within 1GB. So that is one

play01:20

type of codec. If I reduce the quality more then the

play01:23

size of the file reduces because it is lossy compression. I'm losing some

play01:26

data to keep the file size smaller.

play01:29

And the second thing that NETFLIX does is

play01:31

play with different resolutions.

play01:35

If you are watching on a cell phone, then the

play01:38

resolution that you need is much lesser than the resolution you need on your

play01:44

"What's it called"

play01:48

TV or even on your laptop. In this way you're seeing

play01:51

that a single video has multiple formats and multiple resolutions

play01:55

and each of these formats and resolutions are creating tuples like they are creating pairs.

play01:59

You have high quality 720p

play02:01

The number of formats lets call that F

play02:03

into (multiplied by) the number of resolutions R

play02:06

are the number of videos that you'll end up processing.

play02:10

If the engineers at Netflix come up with a much better technique

play02:13

of storing data. Let's say you had high quality

play02:15

requiring you 6GB

play02:16

Now it's just requiring you 1GB.

play02:18

Then you take the older movies that you had encoded

play02:20

which are 6GB big. You run them through the new process

play02:23

and it becomes 1 GB. But the thing is this process

play02:26

is going to take some time. So you don't want to give all this

play02:29

responsibility to a single computer because it's going to take time

play02:32

and it has a chance of failing. (what if the computer shuts down?)

play02:36

So what Netflix does is really interesting and very smart.

play02:40

It takes the original video and breaks it into

play02:43

chunks.

play02:44

Now what you can do with each of these chunks is to

play02:47

run them through different resolutions and different formats.

play02:50

At the end of it, you will have this chunk

play02:53

lets say chunk A

play02:55

.mp4 So that's

play02:57

a format

play02:59

In resolution 1020. Then you will have

play03:01

A in avi may be 480

play03:04

and so on and so forth. Effectively you have taken a really big video

play03:07

and broken it into small parts, so that you can deal with it effectively

play03:10

per processor

play03:13

One resolution, one format, one chunk

play03:16

That's one task.

play03:18

The story of processing these chunks is pretty interesting.

play03:21

Initially what used to happen is, you would have this video file and you would

play03:24

break it into chunks of 3 minutes each. So that's equal size

play03:27

it looks good because every processor is doing the equal amount of work

play03:30

and you can actually quantify it. But the thing is

play03:32

imagine an action movie and at the 3rd minute

play03:36

the two cars (the villain's car is just about to overtake the hero's)

play03:39

and then you have a new chunk.

play03:42

If that's the case and someone makes an API call for this chunk

play03:46

it's going to take time. Like initially you are watching this video

play03:49

you come to this point, you get an API call

play03:52

and there is a lag.

play03:53

The user experience is bad because you wanted to see that seamlessly.

play03:56

What they ended up doing is

play03:58

breaking the chunks not based on time stamps but

play04:01

based on scenes. So you can make this

play04:04

instead of 3 minute thing, you can make it much more fine grained

play04:07

4 secs each. It's called a shot

play04:09

one shot 4 seconds and you can

play04:12

collate shots, put them all together to create a scene

play04:16

So that's the car scene you can think about. Instead of

play04:19

having it arbitrarily stop at 3 minutes

play04:23

you collate them into scenes

play04:27

and each scene has a lot of chunks. 4 second long chunks.

play04:31

Right. Now if a person is watching a video

play04:34

and they click on some point. The video suggestion algorithm will take

play04:37

this as one scene. And the user experience will be much better

play04:40

because you get the entire block fetched together.

play04:43

In fact this algorithm is much more complicated. What happens is

play04:46

netflix sees the entire movie

play04:49

and treats it like

play04:52

a set of chunks. If you arbitrarily go to points

play04:56

then netflix assumes that this movie is

play04:58

a sparse movie, in the sense that

play05:00

you go one point and you see a scene

play05:02

and then you head to next point and then you see a scene and so on and so forth

play05:05

So its recommendation algorithm, its prediction algorithm

play05:08

is going to say that this is a sparse movie

play05:11

or sparsely seen movie

play05:13

and what we should be doing is not trying to be too smart

play05:16

not trying to be sending a lot of data, instead just send the data

play05:19

that the user has asked for because they are probably clicking on

play05:23

different points in that buffer that you get. On the other hand if it's a very engaging movie

play05:26

lets say, I don't know whats an engaging movie but

play05:29

something that is dense movie meaning that people are watching it for a

play05:32

continuous period of time and you can easily say that

play05:35

you know linearly that this part is going to be picked up next.

play05:39

Then this is called a dense movie.

play05:42

Instead of sending just the part that you have asked for

play05:45

it predictively, proactively fetches the

play05:48

future parts, gets it onto your computer

play05:50

and shows it to you.

play05:52

If you are wondering where Netflix stores all this data, then it's

play05:56

like Google Drive, called Amazon S3

play05:59

Something that nearly all the engineers know.

play06:02

This is where people store their static data

play06:04

meaning that you don't change that data, you can go and store stuff.

play06:07

It's extremely cheap compared to a database

play06:10

because a database has updates and gives you

play06:13

other guarantees also. So Amazon S3 is what

play06:16

netflix uses to store that video content.

play06:19

The most interesting thing about netflix is that

play06:22

they were able to bring up an innovative solution to

play06:25

something that was there in the internet space for ages.

play06:28

You know about internet service providers. If you go on your browser right now

play06:31

and type facebook.com. What's going to happen is that

play06:35

you will talk to your internet service provider. They have a list of

play06:38

addresses. They map that to IP addresses.

play06:41

So if you type facebook.com, it's mapped to an IP address:

play06:44

they have a table over here, which maps it. And this

play06:47

IP address is, you can assume it to be

play06:50

physical place. It's actually a computer somewhere on the internet

play06:53

which is giving you Facebook. So you are literally talking to Facebook

play06:55

when you say facebook.com. So that's,

play06:58

let's say, over here.

play07:00

Very similarly when you say Netflix, it is an IP address.

play07:03

It's going to be taking you to a computer which gives you Netflix

play07:06

or is Netflix basically. So you can

play07:09

actually, end up chatting with it maybe.

play07:11

But Netflix exists somewhere

play07:15

and every time you ask your internet service provider to talk to

play07:18

netflix, it goes and talks to that computer

play07:21

and then returns you the response.

play07:24

These servers are usually in the U.S which means they are geographically concentrated.

play07:27

In a place like India which is really far

play07:30

it's going to take a lot of time to send a signal and then receive it

play07:33

especially if it's video because there is a lot of data which is

play07:36

going to be coming in and it's going to be slow.

play07:39

So to improve on user experience, one of the principal things

play07:42

you do as an engineer is to

play07:44

cache information.

play07:46

which means you pre-compute and store it in some place.

play07:50

Let's say sacred games comes out in India

play07:53

You want to watch that, you put it in a cache.

play07:56

Now Netflix extended the concept

play07:59

and applied it to ISPs.

play08:01

So what the ISP does is that whenever it gets a

play08:03

request from India, let's say

play08:06

and it's a movie which is from Bollywood,

play08:08

they won't go and hit the Netflix U.S server just like that.

play08:12

They are going to be asking a cache

play08:15

which has been placed by Netflix. This is called an

play08:17

Open Connect

play08:19

box.

play08:20

In this box, you are going to have a ton of movies. You can assume

play08:23

this to be something like a hard drive and

play08:25

if you find the movie here, that's well and good

play08:28

you just return it quickly. So that's a lot of bandwidth which was saved

play08:31

hitting the Netflix server, that's a lot of time which was saved

play08:34

that's much better user experience and also

play08:37

this is localized. So for India you can keep separate movies

play08:40

for Britain you can have different movies, for U.S you can have different movies.

play08:43

This is a brilliant concept

play08:45

because what you have done is reduced the

play08:48

load on not just you but also the ISPs.

play08:52

So they really want to have these boxes. Every time you hit

play08:55

Netflix and get a really quick response, you end up assuming that

play08:57

your ISP guy is a really nice guy. It's gone up to such an extent that

play09:00

around 90% of Netflix traffic

play09:03

is taken care of by these ISP boxes that they provide.

play09:07

They are called open connect and this technology is

play09:10

revolutionary not so much

play09:13

who knows, but YouTube is also doing this. I think YouTube

play09:16

Red boxes come up with ISPs, again saving a lot of bandwidth for them

play09:19

and really improving user experience in a lot of places.

play09:36

in that case what you can do is, around

play09:39

4 am in the night is a good time: the load on boxes is minimum.

play09:42

So you can have a lot of write operations

play09:45

being sent in from the U.S server, so it will suggest you

play09:48

what to copy. 1) You register

play09:51

your movie on netflix, 2) netflix processes them

play09:53

the same way that we talked about.

play09:56

3) After it has been brought down to chunks

play09:59

4) It sends them to your ISP or maybe

play10:02

it can directly send it over here

play10:05

and populate this box with these new movie chunks.

play10:10

That way this box has the latest content and the users are happy.

play10:14

So it's the innovative methods on the video processing and the video serving side

play10:17

which keep Netflix running at scale. If you think about

play10:20

90% of your requests are being taken care of by this box.

play10:23

So that is a superb gain and

play10:26

it's a really innovative solution. We will be having a lot more videos like this

play10:30

which is system designing in the real world. This is the interesting bit and of course

play10:33

if you have any doubts or suggestions, you can leave them in the comments below.

play10:36

If you like the video then make sure to hit the like button

play10:39

and if you want notifications for further videos like this, hit the subscribe button

play10:42

I'll see you next time :)

Related Tags
Netflix, Content Onboarding, Video Encoding, Compression, Resolutions, Codecs, Streaming, User Experience, Data Storage, Open Connect, ISP Caching