How NETFLIX onboards new content: Video Processing at scale 🎥
Summary
TLDR: This video script delves into Netflix's innovative content onboarding process, addressing both legal and engineering challenges. It explains the necessity of video encoding in various formats and resolutions to cater to different internet speeds, the concept of codecs, and the importance of efficient data storage. The script also highlights Netflix's smart approach to video chunking for better user experience and their use of Amazon S3 for data storage. Furthermore, it discusses the revolutionary Open Connect technology that significantly improves streaming speed and user experience by reducing load on both Netflix and ISPs, ultimately showcasing Netflix's prowess in system design and scalability.
Takeaways
- 😀 Netflix faces both legal and engineering challenges when onboarding new content.
- 📚 They need to store videos in multiple formats like MP4 and AVI to accommodate different internet speeds.
- 🔄 Codecs are used for compressing video, with lossy compression reducing file size at the cost of some data loss.
- 📱 Netflix adjusts video resolution based on the device being used, such as lower for mobiles and higher for TVs.
- 🔢 The number of video versions processed is determined by the product of formats (F) and resolutions (R).
- 💡 Netflix engineers break videos into chunks to process them more efficiently across multiple processors.
- 🎬 Video chunks are processed not by time but by scenes to avoid disrupting the viewing experience.
- 📊 Netflix's prediction algorithm classifies a movie as 'sparse' or 'dense' based on viewing patterns and adjusts how much data it fetches ahead accordingly.
- 🗃️ Amazon S3 is used by Netflix for storing video content due to its cost-effectiveness for static data storage.
- 🌐 Netflix's Open Connect boxes placed with ISPs cache content locally to reduce latency and improve user experience.
- 🔄 These boxes are updated during off-peak hours to ensure they have the latest content for viewers.
Q & A
What are the main challenges NETFLIX faces when uploading new content?
-The main challenges include legal challenges, engineering challenges, and the need to store content in different formats and resolutions to accommodate varying internet speeds and devices.
Why does NETFLIX use different video formats and resolutions?
-Different video formats and resolutions are used to provide the best possible viewing experience for users with different internet speeds and devices. High-quality formats are available for fast internet connections, while lower quality formats are used for slower connections.
What is a codec and why is it important in video processing?
-A codec is a method for compressing video files to reduce their size. This is important to ensure videos can be stored and streamed efficiently without taking up excessive storage space or bandwidth.
How does NETFLIX handle the processing of large video files?
-NETFLIX breaks large video files into smaller chunks and processes these chunks individually in different formats and resolutions. This approach helps in managing the workload efficiently and reduces the risk of processing failures.
What innovative method does NETFLIX use to improve user experience while watching videos?
-NETFLIX improves user experience by breaking videos into 4-second long chunks called shots, and then collates these shots into scenes. This method allows for seamless playback and better handling of user interactions with the video.
What is the difference between a sparse movie and a dense movie in NETFLIX's recommendation algorithm?
-A sparse movie is one where users frequently jump to different points, while a dense movie is watched continuously. For sparse movies, NETFLIX only sends the requested data, whereas for dense movies, it proactively fetches and preloads future scenes to ensure seamless playback.
Where does NETFLIX store its video content and why?
-NETFLIX stores its video content on Amazon S3, which is a cost-effective storage solution for static data. This allows NETFLIX to store large amounts of data without frequent updates, making it cheaper than using a traditional database.
How does NETFLIX improve the delivery of its content to users around the world?
-NETFLIX improves content delivery by using local caches, known as Open Connect boxes, which store popular content close to users. This reduces the load on central servers and improves the speed and reliability of video streaming.
What role do Internet Service Providers (ISPs) play in NETFLIX's content delivery strategy?
-ISPs host Open Connect boxes provided by NETFLIX, which store frequently accessed content locally. This reduces the distance data needs to travel, saving bandwidth and improving user experience by providing faster access to videos.
How does NETFLIX update its content on Open Connect boxes?
-NETFLIX updates content on Open Connect boxes by sending the latest movie chunks from its central servers during low-traffic times, such as 4 am. This ensures that the boxes are populated with the newest content, providing users with up-to-date options.
Outlines
🎥 Netflix's Content Onboarding and Technical Challenges
This paragraph delves into the complexities of adding new content to Netflix's platform. It covers the need for video encoding in various formats and resolutions to accommodate different internet speeds and user devices. The speaker explains the concept of codecs and lossy compression to manage file sizes while maintaining quality. The paragraph also touches on the technical innovation of breaking videos into chunks for efficient processing and the importance of scene-based chunking to enhance user experience during playback. The discussion highlights Netflix's approach to overcoming engineering challenges associated with data storage and processing.
🌐 Netflix's Data Storage and Content Delivery Strategies
The second paragraph focuses on Netflix's strategies for data storage and content delivery. It explains the use of Amazon S3 for storing static video content due to its cost-effectiveness compared to databases. The paragraph also discusses Netflix's innovative approach to improving user experience by using Open Connect boxes, which act as localized caches for video content, reducing latency and bandwidth usage. The speaker describes how these boxes are strategically placed with internet service providers to serve content quickly and efficiently, based on regional demand and preferences. The summary also touches on the predictive algorithms Netflix uses to determine whether to stream content sparsely or densely, depending on user viewing patterns.
🚀 Netflix's Innovative Video Processing and Serving Solutions
The final paragraph wraps up the discussion by emphasizing the innovative methods Netflix employs for video processing and serving, which allow it to operate at scale. It highlights the significant impact of Open Connect boxes in handling a majority of user requests, thereby improving user satisfaction and system efficiency. The speaker also hints at the potential for future videos on system design in the real world and encourages viewers to engage with the content by leaving comments, liking, and subscribing for updates.
Keywords
💡Onboarding
💡Codecs
💡Resolutions
💡Lossy Compression
💡Encoding
💡Chunks
💡Scenes
💡Sparse and Dense Movies
💡Amazon S3
💡Open Connect
💡ISPs
Highlights
NETFLIX onboards new content by overcoming legal and engineering challenges.
Different video formats like MP4 and AVI are necessary to accommodate various internet connection speeds.
Codecs are used to compress video, balancing quality and file size through lossy compression.
NETFLIX adjusts video resolutions for different devices, from mobile phones to TVs.
The number of video formats and resolutions creates a multitude of processing combinations.
When NETFLIX engineers devise a better storage technique (for example, one that shrinks a 6GB file to 1GB), older titles are re-encoded with the new process.
NETFLIX breaks original videos into chunks for efficient processing and encoding.
Video chunks are processed in various resolutions and formats, streamlining the encoding process.
Breaking video into chunks based on scenes rather than time improves user experience during playback.
NETFLIX's video suggestion algorithm adapts to user behavior, categorizing movies as sparse or dense.
Amazon S3 is used by NETFLIX for storing static video content due to its cost-effectiveness.
NETFLIX's innovative solution to internet service provider limitations is the Open Connect box.
Open Connect boxes cache content locally, reducing latency and bandwidth usage.
ISPs benefit from Open Connect boxes, with around 90% of NETFLIX traffic being handled by them.
NETFLIX updates Open Connect boxes during off-peak hours to ensure the latest content is available.
NETFLIX's video processing and serving innovations enable efficient scaling of their service.
The video provides insights into real-world system designing and the practical applications of NETFLIX's methods.
Transcripts
Sometimes it feels like I am a god.
Hi everyone, today we'll be talking about how NETFLIX
onboards new content onto their platform.
So if you have a TV series or a movie and you want to get it uploaded
on NETFLIX, apart from the legal challenges there are also
engineering challenges that NETFLIX solves. I'll be keeping this video as
simple as possible so that the maximum number of viewers can
understand what's going on. But there will be some technical details
when it comes to video encoding and other
technical processes. Firstly, what kind of challenges will we
face when we are uploading new content?
Well, we need to store it in different formats.
You might know about MP4, AVI and other formats.
The reason they have these is that different people
have different internet connection speeds. If you have a really good
internet connection, you can deal with a really
heavy format, for example a detailed one where the
data loss is minimal and you get maximum video quality.
And then you'll have something like medium quality and low quality too.
So, all of these are nothing but codecs.
A codec is a way in which you compress video.
So originally, like this video right now,
It's going to be taking a lot of detail, but when I edit this video
I'll make sure that the size of the file is not huge.
I'll try to keep it within 1GB. So that is one
type of codec. If I reduce the quality more then the
size of the file reduces because it is lossy compression. I'm losing some
data to keep the file size smaller.
And the second thing that NETFLIX does is
play with different resolutions.
If you are watching on a cell phone, then the
resolution that you need is much lower than the resolution you need on your
TV or even on your laptop. In this way you're seeing
that a single video has multiple formats and multiple resolutions,
and each format and resolution combine into a tuple, that is, a pair.
For example, you have (high quality, 720p).
The number of formats, let's call that F,
multiplied by the number of resolutions, R,
is the number of videos that you'll end up processing.
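To make the F × R count concrete, here is a minimal sketch. The format and resolution names are hypothetical example values, not a list taken from the video or from Netflix.

```python
from itertools import product

# Hypothetical example values; the video does not give Netflix's actual lists.
formats = ["high", "medium", "low"]
resolutions = ["480p", "720p", "1080p"]

def encoding_variants(formats, resolutions):
    """Return every (format, resolution) pair to be produced for one video."""
    return list(product(formats, resolutions))

variants = encoding_variants(formats, resolutions)
print(len(variants))  # F * R = 3 * 3 = 9
```

Adding one more resolution adds F new videos to process, which is why the count grows quickly.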
Suppose the engineers at Netflix come up with a much better technique
for storing data. Let's say high quality used to
require 6GB
and now it requires just 1GB.
Then you take the older movies that you had encoded,
which are 6GB each. You run them through the new process
and they become 1GB. But the thing is, this process
is going to take some time. So you don't want to give all this
responsibility to a single computer, because it's going to take time
and it has a chance of failing. (What if the computer shuts down?)
So what netflix does is really interesting and very smart.
It takes the original video and breaks it into
chunks.
Now what you can do with each of these chunks is
run it through different resolutions and different formats.
At the end of it, you will have this chunk,
let's say chunk A,
as .mp4. So that's
a format,
in resolution 1080. Then you will have
A in AVI, maybe at 480,
and so on and so forth. Effectively you have taken a really big video
and broken it into small parts, so that you can deal with it effectively
per processor.
One resolution, one format, one chunk:
that's one task.
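The idea of one task per (chunk, format, resolution) can be sketched as below. This is only an illustration under assumptions: `encode` is a hypothetical placeholder that returns a file name instead of calling a real encoder, and a thread pool stands in for a fleet of machines.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def build_tasks(chunk_ids, formats, resolutions):
    """One task per (chunk, format, resolution) triple."""
    return list(product(chunk_ids, formats, resolutions))

def encode(task):
    """Hypothetical placeholder for an encoder call; returns only the output name."""
    chunk_id, fmt, resolution = task
    return f"{chunk_id}_{resolution}.{fmt}"

def run_all(tasks, workers=4):
    # Tasks are independent, so a failure costs one small chunk, not the whole movie.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(encode, tasks))

tasks = build_tasks(["A", "B"], ["mp4", "avi"], ["480", "1080"])
outputs = run_all(tasks)
print(len(tasks))  # 2 chunks * 2 formats * 2 resolutions = 8
```

Because each task is small and self-contained, a failed task can simply be handed to another worker.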
The story of processing these chunks is pretty interesting.
Initially what used to happen is, you would have this video file and you would
break it into chunks of 3 minutes each. So that's equal size;
it looks good because every processor is doing an equal amount of work
and you can actually quantify it. But the thing is,
imagine an action movie, and at the 3rd minute
the villain's car is just about to overtake the hero's,
and then you have a new chunk.
If that's the case and someone makes an API call for this chunk,
it's going to take time. Initially you are watching this video,
you come to this point, an API call is made,
and there is a lag.
The user experience is bad because you wanted to see that seamlessly.
What they ended up doing is
breaking the chunks not based on timestamps but
based on scenes. So instead of
a 3-minute thing, you can make it much more fine-grained:
4 seconds each. It's called a shot,
one shot is 4 seconds, and you can
collate shots, put them all together, to create a scene.
So that's the car scene you can think about. Instead of
having it arbitrarily stop at 3 minutes,
you collate them into scenes,
and each scene has a lot of chunks, 4-second-long chunks.
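A rough sketch of collating 4-second shots into scenes follows. It assumes the scene start times are already known (how scenes are detected is not covered in the video), that `scene_starts` is sorted, and that it begins at 0. All names here are hypothetical.

```python
from bisect import bisect_right

def split_into_shots(duration_s, shot_len_s=4):
    """Return (start, end) pairs covering the video in fixed-length shots."""
    shots = []
    start = 0
    while start < duration_s:
        end = min(start + shot_len_s, duration_s)
        shots.append((start, end))
        start = end
    return shots

def group_into_scenes(shots, scene_starts):
    """Put each shot into the last scene that starts at or before the shot."""
    scenes = [[] for _ in scene_starts]
    for shot in shots:
        idx = bisect_right(scene_starts, shot[0]) - 1
        scenes[idx].append(shot)
    return scenes

shots = split_into_shots(20)
scenes = group_into_scenes(shots, scene_starts=[0, 8])
print(scenes[0])  # [(0, 4), (4, 8)]
```

Fetching a whole scene then means fetching every shot in one of these groups together.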
Right. Now if a person is watching a video
and they click on some point, the video suggestion algorithm will take
this as one scene. And the user experience will be much better
because you get the entire block fetched together.
In fact this algorithm is much more complicated. What happens is
Netflix looks at the entire movie
and treats it like
a set of chunks. If you arbitrarily go to points,
then Netflix assumes that this movie is
a sparse movie, in the sense that
you go to one point and you see a scene,
and then you head to the next point and you see a scene, and so on and so forth.
So its recommendation algorithm, its prediction algorithm,
is going to say that this is a sparse movie,
or sparsely seen movie,
and what we should be doing is not trying to be too smart
not trying to be sending a lot of data, instead just send the data
that the user has asked for because they are probably clicking on
different points in that buffer that you get. On the other hand, if it's a very engaging movie,
let's say, I don't know what's an engaging movie, but
something that is a dense movie, meaning that people are watching it for a
continuous period of time and you can easily say,
you know, linearly, that this part is going to be picked up next,
then this is called a dense movie.
Instead of sending just the part that you have asked for,
it predictively, proactively fetches the
future parts, gets them onto your computer
and shows them to you.
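The sparse versus dense decision could look something like the sketch below. The 0.8 threshold, the lookahead of 3 chunks, and the use of a single request history as a stand-in for a movie's viewing pattern are all assumptions made for illustration; the video does not describe the actual algorithm.

```python
def is_dense(requests, threshold=0.8):
    """Dense if most requests ask for the chunk right after the previous one."""
    if len(requests) < 2:
        return False
    pairs = zip(requests, requests[1:])
    sequential = sum(1 for prev, cur in pairs if cur == prev + 1)
    return sequential / (len(requests) - 1) >= threshold

def chunks_to_send(requested, history, lookahead=3):
    """Send only the requested chunk for sparse viewing, extra chunks for dense."""
    if is_dense(history):
        return list(range(requested, requested + lookahead + 1))
    return [requested]

print(chunks_to_send(5, [1, 2, 3, 4]))     # dense: [5, 6, 7, 8]
print(chunks_to_send(40, [1, 17, 3, 90]))  # sparse: [40]
```

The trade-off is bandwidth against smoothness: prefetching wastes data when the viewer jumps away, but avoids a lag when they keep watching.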
If you are wondering where Netflix stores all this data, it's in
something like Google Drive, called Amazon S3,
something that nearly all engineers know.
This is where people store their static data,
meaning that you don't change that data; you just go and store stuff.
It's extremely cheap compared to a database
because a database has updates and gives you
other guarantees also. So Amazon S3 is what
Netflix uses to store that video content.
The most interesting thing about Netflix is that
they were able to bring up an innovative solution to
something that was there in the internet space for ages.
You know about internet service providers. If you go on your browser right now
and type facebook.com, what's going to happen is that
you will talk to your internet service provider. They have a list of
addresses. They map those to IP addresses.
So if you type facebook.com, it's mapped to an IP address;
they have a table over here which maps it. And this
IP address, you can assume it to be a
physical place. It's actually a computer somewhere on the internet
which is giving you Facebook. So you are literally talking to Facebook
when you say facebook.com. So that's,
let's say, over here.
Very similarly when you say Netflix, it is an IP address.
It's going to be taking you to a computer which gives you Netflix
or is Netflix, basically. So you can
actually end up chatting with it, maybe.
But Netflix exists somewhere,
and every time you ask your internet service provider to talk to
Netflix, it goes and talks to that computer
and then returns you the response.
These servers are usually in the U.S., which means they are geographically concentrated.
From a place like India, which is really far away,
it's going to take a lot of time to send a signal and then receive it,
especially if it's video, because there is a lot of data which is
going to be coming in, and it's going to be slow.
So to improve the user experience, one of the principal things
you do as an engineer is to
cache information,
which means you pre-compute and store it in some place.
Let's say Sacred Games comes out in India.
You want to watch that; you put it in a cache.
Now Netflix extended the concept
and applied it to ISPs.
So what the ISP does is that whenever it gets a
request from India, let's say,
and it's a movie which is from Bollywood,
it won't go and hit the Netflix U.S. server just like that.
It is going to be asking a cache
which has been placed by Netflix. This is called an
Open Connect
box.
In this box, you are going to have a ton of movies. You can assume
this to be something like a hard drive, and
if you find the movie here, that's well and good:
you just return it quickly. So that's a lot of bandwidth which was saved by not
hitting the Netflix server, that's a lot of time which was saved,
that's a much better user experience, and also
this is localized. So for India you can keep separate movies,
for Britain you can have different movies, for the U.S. you can have different movies.
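The lookup path described here can be sketched as a toy cache sitting in front of a distant origin. `EdgeCache` is a hypothetical class, not an Open Connect API. Fetching from the origin on a miss is an assumption of this sketch, since the video mainly describes boxes being filled ahead of time.

```python
class EdgeCache:
    """Toy stand-in for a local box in front of a far-away central server."""

    def __init__(self, origin):
        self.origin = origin  # dict simulating the central server's catalogue
        self.store = {}       # content held locally in the box
        self.hits = 0
        self.misses = 0

    def get(self, title):
        if title in self.store:
            self.hits += 1    # served locally: no long-distance trip
            return self.store[title]
        self.misses += 1
        data = self.origin[title]  # the slow trip to the central server
        self.store[title] = data   # keep a copy for the next viewer
        return data

origin = {"sacred-games": b"chunk-bytes"}
box = EdgeCache(origin)
box.get("sacred-games")  # first request goes to the origin
box.get("sacred-games")  # second request is served from the box
print(box.hits, box.misses)  # 1 1
```

A separate box per region, each with its own `store`, is what makes the content localized.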
This is a brilliant concept
because what you have done is reduce the
load on not just yourself but also the ISPs.
So they really want to have these boxes. Every time you hit
Netflix and get a really quick response, you end up assuming that
your ISP guy is a really nice guy. It's gone up to such an extent that
around 90% of Netflix traffic
is taken care of by these ISP boxes that they provide.
They are called Open Connect, and this technology is
revolutionary. Well, maybe not so much,
who knows, but YouTube is also doing this. I think YouTube
red boxes come up with ISPs, again saving a lot of bandwidth for them
and really improving user experience in a lot of places.
To update these boxes, what you can do is this: around
4 am is a good time, since the load on the boxes is minimal.
So you can have a lot of write operations
being sent in from the U.S. server, and it will suggest
what to copy. 1) You register
your movie on Netflix. 2) Netflix processes it
the same way that we talked about.
3) It is brought down to chunks.
4) Netflix sends them to your ISP, or maybe
it can directly send them over here
and populate this box with these new movie chunks.
That way this box has the latest content and the users are happy.
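As a small sketch of the off-peak update, the functions below pick the quietest hour from a load table and list the chunks a box is still missing. The load figures and chunk names are made up for illustration, and both function names are hypothetical.

```python
def quietest_hour(load_by_hour):
    """Return the hour whose observed load is lowest."""
    return min(load_by_hour, key=load_by_hour.get)

def plan_fill(new_chunks, box_contents):
    """List the chunks the box does not hold yet, keeping the original order."""
    return [chunk for chunk in new_chunks if chunk not in box_contents]

# Made-up load figures: hour of day -> percentage of peak traffic.
load = {4: 10, 12: 60, 20: 95}
print(quietest_hour(load))                                     # 4
print(plan_fill(["a_480", "a_1080", "b_480"], {"a_1080"}))     # ['a_480', 'b_480']
```

Sending only the missing chunks keeps the nightly write volume down to what actually changed.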
So it's the innovative methods on the video processing and the video serving side
which keep Netflix running at scale. If you think about it,
90% of your requests are being taken care of by this box.
So that is a superb gain and
it's a really innovative solution. We will be having a lot more videos like this,
which is system design in the real world. This is the interesting bit, and of course
if you have any doubts or suggestions, you can leave them in the comments below.
If you like the video then make sure to hit the like button,
and if you want notifications for further videos like this, hit the subscribe button.
I'll see you next time :)