NERFs (No, not that kind) - Computerphile
Summary
TLDR: This video explores Neural Radiance Fields (NeRF), a cutting-edge AI technique for generating 3D reconstructions from 2D images. The host, with the help of a PhD student, explains how NeRF uses RGB images to create detailed 3D scenes, contrasting it with traditional methods like point clouds and meshes. They discuss the process of training a neural network to 'see' and render 3D objects, emphasizing the importance of multiple viewpoints for accurate reconstruction. The video also touches on the limitations, such as the need for a large dataset and the challenges of capturing dynamic scenes, while highlighting NeRF's potential in revolutionizing 3D rendering and asset creation.
Takeaways
- 🌐 The script introduces Neural Radiance Fields (NeRF), a technology for generating new views of scenes from a series of RGB images.
- 🎓 Lewis, a PhD student who trains NeRFs as part of his research, explains the technique to the audience.
- 🌳 NeRF reconstructs 3D scenes with high detail from simple RGB images, unlike traditional 3D reconstruction methods like point clouds or meshes.
- 🖼️ Rendering with NeRF involves shooting rays into the environment and querying a neural network for color and density at various points along the ray.
- 🔍 NeRF's 3D representation allows it to understand the environment, unlike other generative models like diffusion which lack a 3D context.
- 📸 For effective training, NeRF requires a substantial number of images, ideally 250 or more, to achieve a good reconstruction.
- 📹 The script discusses the challenges of capturing a dynamic scene like a Christmas tree with NeRF, including issues like motion blur and overfitting.
- 🚫 Changes in the scene or photobombing can lead to artifacts like 'floaters' where the model struggles to place certain pixels accurately.
- 📈 Despite its limitations, NeRF can be used to quickly generate 3D representations from images, which can then be converted into meshes for use in games or other applications.
- 🔮 The future of 3D rendering may involve technologies like NeRF, but emerging rivals such as Gaussian splatting also show promise.
Q & A
What are Neural Radiance Fields (NeRF)?
-Neural Radiance Fields, or NeRF, is a method used in AI literature for generating new views of scenes from a series of RGB images. It reconstructs 3D scenes using a neural network, unlike traditional 3D reconstruction methods such as point clouds, voxel grids, or meshes.
How does NeRF differ from traditional 3D reconstruction methods?
-Traditional 3D reconstruction methods like point clouds, voxel grids, and meshes are discrete and can become complicated with many points or faces. NeRF, on the other hand, reconstructs scenes in high-quality detail using only RGB images, encoding the 3D scene within the parameters and weights of a neural network.
How does the rendering process in NeRF work?
-In NeRF, rendering involves shooting rays into the environment and querying a series of points along these rays. The neural network then provides color and density for each point, which is different from traditional rendering methods that might use rasterization or ray tracing.
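To make that per-ray procedure concrete, here is a minimal NumPy sketch of the sampling-and-compositing loop described above. The `query_network` callable is a stand-in for the trained NeRF network (it is not an API mentioned in the video), and the uniform sampling and alpha compositing follow the standard volume-rendering quadrature rather than any particular implementation.

```python
import numpy as np

def render_ray(query_network, origin, direction, near=0.1, far=6.0, n_samples=64):
    """Render one pixel by querying a NeRF-style network at points along a ray.

    `query_network(points, view_dirs)` is assumed to return (rgb, sigma):
    per-point colour in [0, 1] and non-negative volume density.
    """
    # Sample depths between the near and far planes (stratified in real NeRF;
    # uniform here for simplicity).
    t_vals = np.linspace(near, far, n_samples)
    points = origin[None, :] + t_vals[:, None] * direction[None, :]
    view_dirs = np.broadcast_to(direction, points.shape)

    rgb, sigma = query_network(points, view_dirs)        # (N, 3), (N,)

    # Volume rendering: turn densities into per-sample alphas, then accumulate
    # colour front-to-back, weighted by how much light still gets through.
    deltas = np.diff(t_vals, append=t_vals[-1] + (far - near) / n_samples)
    alpha = 1.0 - np.exp(-sigma * deltas)
    transmittance = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1] + 1e-10]))
    weights = alpha * transmittance
    return (weights[:, None] * rgb).sum(axis=0)           # final pixel colour
```

Real implementations batch thousands of rays at once and add stratified plus hierarchical sampling, but the structure is the same: many queries per ray, composited into one pixel.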
What is the significance of density in the context of NeRF?
-Density in NeRF indicates whether a point along a ray sits inside an object or in empty space. A density of zero corresponds to empty space, while a high density (one, in the video's simplified example) means the point is inside an object; these densities determine how much each sample contributes to the rendered pixel.
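For reference, the standard volume-rendering quadrature (from the original NeRF paper, not spelled out in the video) shows how the per-sample density controls each sample's contribution to the pixel colour:

```latex
\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \left(1 - e^{-\sigma_i \delta_i}\right) \mathbf{c}_i,
\qquad
T_i = \exp\Big(-\sum_{j<i} \sigma_j \delta_j\Big)
```

Here δ_i is the spacing between adjacent samples: a sample with σ_i = 0 contributes nothing (empty space), while a large σ_i makes the sample nearly opaque and occludes everything behind it along the ray.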
Why is capturing multiple images from different angles important for NeRF?
-Capturing multiple images from various angles is essential for NeRF to properly reconstruct a 3D scene. It allows the neural network to understand the environment from different perspectives, which is necessary for accurate rendering when the camera viewpoint changes.
What are the downsides of using NeRF for 3D reconstruction?
-One downside of NeRF is the need for a large number of images for good reconstruction quality. Additionally, the method can struggle with scenes where objects change position or are not fully captured, leading to artifacts or noise in the rendered image.
How many images are typically needed for a good NeRF reconstruction?
-For a good reconstruction, at least 250 images are recommended. However, the quality can vary depending on the technique used and the complexity of the scene.
What is the role of the camera positions in NeRF?
-Camera positions are crucial in NeRF as they provide the spatial context for the images captured. They help the neural network understand where each image was taken from, which is necessary for reconstructing the 3D scene accurately.
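As a hedged illustration of why the poses matter, the sketch below turns a pixel coordinate into a world-space ray using a pinhole camera model and a camera-to-world pose of the kind NeRF pipelines typically recover with structure-from-motion tools such as COLMAP. The function name and conventions are illustrative, not from the video.

```python
import numpy as np

def pixel_to_ray(u, v, focal, cx, cy, cam_to_world):
    """Turn a pixel coordinate into a world-space ray, given a camera pose.

    `cam_to_world` is a 4x4 camera-to-world matrix (the kind of pose that
    structure-from-motion tools such as COLMAP estimate for each image).
    """
    # Direction in camera coordinates (pinhole model, camera looking down -z).
    dir_cam = np.array([(u - cx) / focal, -(v - cy) / focal, -1.0])
    # Rotate into world space and take the camera centre as the ray origin.
    dir_world = cam_to_world[:3, :3] @ dir_cam
    origin = cam_to_world[:3, 3]
    return origin, dir_world / np.linalg.norm(dir_world)
```

Every training pixel becomes one such ray, which is why an image without a known pose tells the network nothing about where in 3D its colours belong.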
Can NeRF be used to create 3D assets for games or other applications?
-Yes, NeRF can be used to extract the 3D volume of objects, which can then be converted into meshes. This can speed up the creation of 3D assets for use in games or other applications, as it requires fewer manual design and modeling steps.
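One common way to do this, sketched below, is to evaluate the trained density field on a regular 3D grid and run marching cubes over it. The `query_density` helper, grid bounds, and iso-level are illustrative assumptions, not values from the video.

```python
import numpy as np
from skimage import measure

def density_grid_to_mesh(query_density, resolution=128, bound=1.5, threshold=10.0):
    """Extract a triangle mesh from a trained density field via marching cubes.

    `query_density(points)` stands in for evaluating the NeRF's density output
    on a batch of 3D points; `threshold` is the iso-level treated as the
    object surface (a tunable choice).
    """
    xs = np.linspace(-bound, bound, resolution)
    grid = np.stack(np.meshgrid(xs, xs, xs, indexing="ij"), axis=-1)
    sigma = query_density(grid.reshape(-1, 3)).reshape(resolution, resolution, resolution)
    verts, faces, normals, _ = measure.marching_cubes(sigma, level=threshold)
    # Marching cubes returns vertices in grid-index coordinates; rescale to world space.
    verts = verts / (resolution - 1) * (2 * bound) - bound
    return verts, faces
```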
What is the current state of NeRF in terms of real-time rendering?
-NeRF is not suitable for real-time rendering due to its computational intensity. It can take several seconds to render a single image, making it more suitable for pre-rendered scenes rather than real-time applications.
Are there any alternatives to NeRF for 3D reconstruction?
-Yes, there are alternatives such as Gaussian splatting, a newer method that also produces impressive 3D reconstruction results and is worth exploring for potential advantages over NeRF.
Outlines
🌐 Introduction to Neural Radiance Fields
The script introduces Neural Radiance Fields (NeRFs), a technology that generates new views of scenes from RGB images using a neural network. Unlike traditional 3D reconstruction methods like point clouds, voxel grids, or meshes, NeRFs offer a continuous representation of a scene. The discussion involves a PhD student, Lewis, who explains that NeRFs use a series of RGB images to reconstruct a 3D scene, which is then used to render images from novel viewpoints. The process is akin to ray tracing but uses a neural network to determine the color and density at various points along a ray, encoding the 3D scene within the network's parameters.
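The core object this outline describes is a small neural network that maps a 3D position and viewing direction to a colour and a density. Below is a deliberately stripped-down PyTorch sketch of that mapping; real NeRFs add positional encoding of the inputs and a deeper network, so treat this as an illustration of the interface rather than a faithful implementation.

```python
import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    """A minimal sketch of the NeRF idea: an MLP whose weights encode the scene."""

    def __init__(self, hidden=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.sigma_head = nn.Linear(hidden, 1)          # volume density, >= 0
        self.rgb_head = nn.Sequential(                  # colour depends on view direction
            nn.Linear(hidden + 3, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),
        )

    def forward(self, xyz, view_dir):
        h = self.body(xyz)
        sigma = torch.relu(self.sigma_head(h)).squeeze(-1)
        rgb = self.rgb_head(torch.cat([h, view_dir], dim=-1))
        return rgb, sigma
```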
🎥 Practical Application and Limitations of NeRFs
This section delves into the practical application of NeRFs, discussing the process of capturing images and training the neural network to reconstruct a 3D scene. It highlights the importance of multiple viewpoints for accurate reconstruction and the challenges of capturing a complete scene with limited camera movement. The script also touches on the downsides of NeRFs, such as the need for a substantial number of images for good reconstruction and the potential for artifacts when scenes change or are not fully captured. The conversation suggests that while NeRFs may not be as efficient as traditional rendering, they offer a powerful tool for 3D scene capture and reconstruction from real-world images.
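Training, as described above, amounts to repeatedly rendering rays that correspond to pixels of the captured photos and nudging the network so the rendered colours match them: deliberate overfitting to one scene. Below is a minimal sketch of one such step, assuming a `render_rays` helper that performs the sampling and compositing outlined earlier (both names are illustrative, not from the video).

```python
import torch

def training_step(model, render_rays, optimizer, ray_origins, ray_dirs, target_rgb):
    """One NeRF optimisation step: fit the network to a batch of captured pixels.

    `render_rays(model, origins, dirs)` is assumed to do the per-ray sampling
    and volume rendering and return predicted pixel colours of shape (B, 3).
    """
    optimizer.zero_grad()
    pred_rgb = render_rays(model, ray_origins, ray_dirs)
    loss = torch.mean((pred_rgb - target_rgb) ** 2)   # photometric MSE against the photos
    loss.backward()
    optimizer.step()
    return loss.item()
```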
🖼️ Exploring NeRFs with a Christmas Tree Example
The script uses a Christmas tree as an example to demonstrate the capabilities and limitations of NeRFs. It discusses the process of capturing images with a phone, the challenges of dealing with motion blur and incomplete scene capture, and the subsequent training of the NeRF. The outcome is a 3D reconstruction that looks good from viewpoints near the captured images but degrades in quality when viewed from outside that range. The discussion points out the need for a large dataset of images for high-quality reconstruction and the potential of NeRFs for creating 3D assets for games and other applications. It also mentions rival technologies such as Gaussian splatting and the possible future of 3D rendering.
Keywords
💡Neural Radiance Fields (NeRF)
💡3D Reconstruction
💡RGB Images
💡Ray Tracing
💡Overfitting
💡Data Set
💡Rendering
💡Density
💡Camera Position
💡Real-time Rendering
Highlights
Introduction to Neural Radiance Fields (NeRF), a method for generating new views of scenes from RGB images.
NeRF's ability to reconstruct 3D scenes from simple RGB images, offering an alternative to traditional 3D reconstruction methods like point clouds and meshes.
Explanation of how NeRF works by training a neural network to reconstruct scenes in 3D from a series of RGB images.
The innovative aspect of NeRF that allows for rendering by querying a neural network for color and density at points along a ray.
Comparison of NeRF's rendering process to ray tracing, with a focus on the neural network's role in determining color and density.
The importance of capturing multiple images from different angles for effective NeRF training and reconstruction.
Discussion on the limitations of NeRF when there are changes or motion in the captured images, leading to artifacts like 'floaters'.
The practical challenge of capturing a sufficient dataset for NeRF, with a recommendation of at least 250 images for good reconstruction.
The trade-off between the quality of NeRF reconstruction and the density and diversity of the captured images.
Demonstration of how NeRF can be used to extract 3D information and potentially convert it into meshes for use in games or other 3D applications.
The current state of NeRF technology, including its limitations and the potential for future improvements.
Comparison of NeRF with other emerging techniques like Gaussian splatting, suggesting a competitive landscape in the field of 3D rendering.
The potential of NeRF and similar technologies to revolutionize 3D rendering and their implications for the industry.
Practical demonstration of capturing a scene with a phone and the subsequent NeRF training process.
Analysis of the quality of the NeRF reconstruction, highlighting areas of strength and weakness based on the captured data.
Discussion on the practical applications of NeRF, including its current capabilities and the potential for future development.
Transcripts
yeah it's a different kind of NeRF we're
looking at today so these are neural
Radiance Fields right this is this is
something that's been happening you know
been going around the AI literature for
a while a couple of years at least um
really impressive ways of generating new
views of
scenes I'm familiar with how Nerfs work
but I don't work with Nerfs every day so
I brought in my tame PhD student here
Lewis hello yeah and Lewis actually you
know trains Nerfs he's working with
Nerfs as part of his PhD and so you know
you're going to explain to us how it
works I hope so yeah what they do is
they take a series of uh RGB images and
from those RGB images of a scene they're
able to reconstruct it in 3D using a
neural network um traditional methods of
3D reconstruction are things like point
clouds voxel grids meshes things like
that but they're discrete and for some
scenarios that's all you need but for
real life situations for example this
room it can get very very complicated
many many points many many faces on your
mesh
this neural radiance field is is able to
reconstruct it in very very high quality
detail just only using simple RGB images
that you can capture on your phone so I
think the interesting thing about this
for me is that we're actually not doing
rendering in in a way you might expect
so normally what you would do with
rendering is you'd have some meshes but
you then you know you you render them as
pixels so you rasterize the image or you
do ray tracing or something like this
with a Nerf what we're actually doing is
essentially a bit like ray tracing but
we're actually using a neural network to
say okay this Ray is going to be this
color and this Ray is going to be this
color and so we basically building our
3D scene into the kind of inside of a
neural network so the parameters and
weights of the neural network are
actually encoding our Christmas tree or
our car or whatever or our room or
whatever else it is we're we're we're uh
looking at I'm just going to draw here a
trained Nerf for now so imagine we're
looking at this from a side angle so we
have something like this and and for the
sake of this video as you will see later
I'm going to draw you a nice Christmas
tree um I hope you're better at drawing
than me uh not a
chance that's actually a lot better than
me okay and you know I'll just shade it
in green yeah there you go so and let's
imagine that we have a camera up here
now when I say camera an image has
already been taken of of this Christmas
tree in real life and what we're doing
is simulating this this camera this
camera in real life we're an image that
looks like that makes sense looking at
it from an above sort of view so how do
we render this right so what we do we
start shooting Rays into the environment
like this this is what the neural
network does so unlike things like
diffusion which basically generate noise
isn't it the stable diffusion and yeah
yeah stuff like frogs on stilts and
stuff like that yeah th this doesn't
generate an image given a position in 3D
and a and a view Direction it will give
you a color and a density at that point
in the environment that's important
that's what is different from things
like diffusion is that it has a 3D
representation it understands the
environment so now we shot this Ray
through what we need to do is query a
series of points along this Ray and ask
the neural network okay I'm at this
point I'm looking at it from this
direction what should the color be what
should the density be so let's start off
by doing this so we're going to do this
point
here right what is the color and what is
the density we go to neural network and
it comes back and says the color is
white doesn't matter because the density
is zero why because we're in empty space
right there's nothing there's nothing
there yeah it's like air you can't
render nothing so doesn't matter let's
continue do it again nothing again
nothing again nothing again
nothing oops go thank you wow here we go
we've created a point here and it's come
back with a density of one AKA it's
inside an object and it's come out of a
color of green makes sense because we're
now entering this Christmas tree and
what we're going to do is do it again
here
here here here and you've queried all
these points along the ray the neural
network come back with nothing nothing
Well white white white but with let's
say density 0 0
green density one green density one
green density one 0 0 perhaps the thing
that when you first learn about this is
hard to get your head around is normally
in ray tracing what you would do is you
would fire a ray into your scene usually
based on your position of your pixels
and say basically query what color is it
and it might bounce around and do
lighting or something like this but
essentially it will come back and say
yeah it's red and you paint that pixel
red in this case we're actually doing
multiple samples per Ray and we're
saying what's here what's here what's
here all the way along and what you'll
do is sometimes you'll just shoot off
and there'll be nothing there
sometimes you'll hit an object you'll
intersect an object and for some time
you'll see different colors and so your
rendering process is going to be about
sending out a lot of these rays and
finding out where in 3D everything is
right as opposed to just sending out a
ray and going oh it's red yeah now that
might seem really inefficient but
actually this is the only way it trains
right because if you trained a new
network and just said yeah this is red
this is red this is green this is red
then it will work very nicely at just
drawing that particular image yeah but
you can't then move the camera over here
and say okay what does it look like from
above right where we haven't necessarily
got any images so the idea would be that
you train this with with sort of another
camera my camera's worse than yours uh
and another camera so you maybe have
three cameras but now we can draw this
one and we can draw this one and this
one because we can shoot Rays out and we
can do thing what's important here is
that when you shoot the Rays out
here they are intersecting let's say
like that that's sort of bit iffy
because it goes through there but that
how you're able to finalize where that
object is in 3D space because these
points intersect on the Rays that you
shoot which is why with Nerf one of the
downsides maybe is that you need quite a
few images for it to properly get a good
reconstruction if you only have one
image what will happen is it will look
pretty great from anywhere near that
image but if you move elsewhere and it
will it will degenerate pretty quickly
um so and another thing that's
interesting about this is normally in
any kind of AI or machine learning what
you're trying to do is generalize your
approach to some other data set so you
say I want to train on this data set but
ultimately I don't really care how well
it works on that data set cuz I already
know all the answers what I care about
is how it works on this data set and
this data set but in Nerf we're actually
learning this exact scene it's not going
to reconstruct a different kind of tree
cuz that's not in the training set we
only care about producing images of this
tree from different directions yeah
overfitting to the max pretty much they
tell you never to overfit but in this
case overfit as much as as you can this
is overfitting by design right so I
figure we could just Trot off down the
corridor and take some pictures and have
a go now Christmas tree is actually a
really hard example of pines and
everything yeah I mean that's not an
easy thing right and and and it's a good
example in a way because we'll see some
of the problems as well as the benefits
of Nerf but also it is worth knowing
that this is something that most
reconstruction techniques are going to
really struggle on right so this is a
hard problem but it's also fun yes well
you know when you say you've got to take
quite a few pictures how many is quite a
few for something like a if you want a
good reconstruction at least 250 okay
yeah and and those are the good images
there are to be fair because Nerf is really
popular there's lots of research going
on and there are many many techniques
that are trying to reduce the number
they don't all work very well right so
you know your mileage may vary if you
want really good reconstruction then a
lot of images is what you're going to
need and obviously if I'm obviously I'm
a videographer so video any good video
is good because video is a lot of I mean
aside from maybe motion blur and things
like that if you've got good quality
frames that's just more and more shots
so what have we got we've got a nice
looking Christmas tree I didn't decorate
this one and this is a very complicated
scene right because we've got bushes at
the side with huge numbers of leaves
we've got sofas we've got whatever all
this stuff is hanging off the the
trellises it's a big big place as well
one thing about normal Nerf that you see
in the literature is that they run on
fairly constrained scenes like you often
you've used a robot or some other
capture rig to capture very nice
concentric circles of images all equally
spaced everything's very constrained
we're just going to kind of wave a
around and see what happens it's more
fun so all right so Lewis will catch us
some videos and then we'll uh I'll yeah
I'll stand over here and not be in the
shot what happens if you get things
changing in these images it's not good
that that's where you'll get a lot of I
think they they call them floaters where
it can't figure out where to put certain
pixels in the scene cuz they're
different throughout and then it would
just create noise so you could be
photobombed and that would really cause a
problem exactly you get kind of a
ghostly Mike
and then fading back out again that's
exactly it yeah so we've got the video
now the next step is we need to get the
camera positions and then we just got
train the Nerf and uh that should
take about an hour I've trained up this
Nerf and currently we're we're viewing
it for this I'm using Nerfstudio which
is what a lot of people are using now
cuz it's very user-friendly and it's very
very good and this is the sort of thing
that it looks like so you can see all of
the images and where they are in 3D
space and that was me going around that
Christmas tree it takes a while for the
for it the quality to increase because
what this does it renders it very quick
because you need to get an understanding
of the environment but Nerfs are slow
right for real-time rendering so it
takes about 5 seconds for it to render a
good image every time but if I move it
around here and let me just get rid of
the cameras here so you can just see the
thing give it a few seconds and it
should there we go now I want to talk
talk about what's good and what's bad
about this there are Nerf data sets out
there that are fantastic and you'll be
able to get really good high quality
reconstructions from them this is is not
so good because I took it on my phone
and there's lots of things like motion
blur different sort of issues I didn't
capture the whole scene but things like
the Christmas tree that looks pretty
good to me you know you can see the
baubles on it and high quality you're not
this tall right with the greatest
respect if you come down yeah yeah will
it be better if you're closer to where
the original cameras were ex it should
be oh hang on hang on it's a bit finicky
mind you there we go so this is a player
yeah it's a real time renderer and it
looks good from this perspective because
this was where the images were taken
from cuz that's around my height where I
was taking it from and this is probably
very close to there you go it's very
close to where these images were taken
which is why you can see when it loads
you can see the background somewhat you
can see the tree it makes sense the
Christmas tree looks good as soon as you
move out suddenly it looks pretty bad
right and that is one of the things with
Nerf is that it's very good where the
images can see the scene there is going
to be a lot of loss of quality when you
go outside where we were capturing the
images from would you call this a data
set or a picture or an image or a scene
what would you call it I would call this
a data set data set so how big is a data
set like that so this is around 300
images so Nerf data sets have a series
of images and then uh a JSON file with where
all the camera positions are that's
that's it but this is about 300 but
they're all relatively if you see here
they're all pretty close together right
if you want a really good capture really
good scene you want them spaced far
apart capturing the entire thing I'm not
Mr Fantastic with long arms I can only
capture things around here right which
is why when you go and look let's say
over here in the background
you'll see that the
background here especially on the
ceiling it's really not very good
because not a lot of data was captured in
these images of the background so that's
why it's not as good you see here this
is just noise and the reason why that is
just noise cuz I didn't actually capture
any of that floor when I was capturing
the video I just forgot which is why
when when you look at it from here it
looks awful but that's not really the
fault of the Nerf per se it's more of a
fault of me perhaps it's worth thinking
about what we would actually use this
for because ultimately people might be
looking at this and going well it's not
as good as a 3D render but but we
actually only had to capture 300 images
we didn't have to artistically design
the tree in 3D we didn't have to paint
all the all the meshes we didn't have to
do any of that um and you know I
couldn't do that anyway because you've
seen my drawing abilities so but the
other thing is that we also as well as a
color we can also extract where the
objects are which means that you can
convert a lot of these objects into a
mesh so you can use this multi view
reconstruction to essentially obtain 3D
Volume so you could then speed up your
creation of an actual 3D asset that you
could use in a game or something like
this is this the future then of of 3D
rendering is this how it's all going to
work this month possibly yeah actually
there's a new rival to Nerf called
Gaussian splatting which is also providing
incredibly impressive results so maybe
that's video number
two this network is maybe slightly
better when it has a text estimate of the
noise so you actually put in two images
of dystopian abandoned futuristic city with
overgrown plants right and then I just
put them in a for loop and just produce
200 of them so I can pick the nice ones