Watching Neural Networks Learn
Summary
TLDR: This video explores the concept of function approximation in neural networks, emphasizing their role as universal function approximators. It delves into how neural networks learn by adjusting weights and biases to fit data points, using examples like curve fitting and image recognition. The video also discusses the challenges of high-dimensional problems and the 'curse of dimensionality,' comparing neural networks to other mathematical tools like Taylor and Fourier series for function approximation. It concludes by highlighting the potential of these methods for real-world applications and poses a challenge to the audience to improve the approximation of the Mandelbrot set.
Takeaways
- 🧠 Neural networks are universal function approximators, capable of learning and modeling complex relationships in data.
- 🌐 Functions are fundamental to describing the world, with everything from sound to light being represented through mathematical functions.
- 📊 The goal of artificial intelligence is to create programs that can understand, model, and predict the world, often through self-learning functions.
- 🔍 Neural networks learn by adjusting weights and biases to minimize error between predicted and actual outputs, a process known as backpropagation.
- 📉 Activation functions like ReLU and leaky ReLU play a crucial role in shaping the output of neurons within neural networks.
- 🔢 Neural networks can handle both low-dimensional and high-dimensional data, although the complexity of learning increases with dimensionality.
- 📊 The Fourier series and Taylor series are mathematical tools that can be used to approximate functions and can be integrated into neural networks to improve performance.
- 🌀 The Mandelbrot set demonstrates the infinite complexity that can be captured even within low-dimensional functions, challenging neural networks to approximate it accurately.
- 📈 Normalizing data and reducing the learning rate are practical techniques for optimizing neural network training and improving approximation quality.
- 🚀 Despite the theoretical ability of neural networks to learn any function, practical limitations and the curse of dimensionality can affect their effectiveness in higher dimensions.
Q & A
Why are functions important in describing the world?
-Functions are important because they describe the world by representing relationships between numbers. Everything can be fundamentally described with numbers and the relationships between them, which we call functions. This allows us to understand, model, and predict the world around us.
What is the goal of artificial intelligence in the context of function approximation?
-The goal of artificial intelligence in function approximation is to create programs that can understand, model, and predict the world, or even have them write themselves. This involves building their own functions that can fit data points and accurately predict outputs for inputs not in the data set.
How does a neural network function as a universal function approximator?
-A neural network functions as a universal function approximator by adjusting its weights and biases through a training process to minimize error. It can fit any data set by bending its output to match the given inputs and outputs, effectively constructing any function.
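A minimal sketch of this training loop, using PyTorch as one possible implementation (the video does not name its framework, and the layer sizes, learning rate, and target curve here are illustrative assumptions):

```python
import torch
import torch.nn as nn

# Toy data set: noisy samples from an unknown target function (here, sin(x) as a stand-in).
x = torch.linspace(-3.14, 3.14, 200).unsqueeze(1)   # inputs, shape (200, 1)
y = torch.sin(x) + 0.05 * torch.randn_like(x)        # outputs, shape (200, 1)

# A small fully connected feedforward network (sizes are illustrative).
model = nn.Sequential(
    nn.Linear(1, 64), nn.LeakyReLU(),
    nn.Linear(64, 64), nn.LeakyReLU(),
    nn.Linear(64, 1),
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                # loss = mean squared error between prediction and truth

for step in range(5000):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)       # how far the predicted outputs are from the true outputs
    loss.backward()                   # backpropagation computes gradients of the loss
    optimizer.step()                  # adjust weights and biases to reduce the loss
```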
What is the role of the activation function in a neural network?
-The activation function in a neural network defines the mathematical shape of a neuron. It determines how the neuron responds to different inputs by introducing non-linearity into the network, allowing it to learn and represent complex functions.
Why is backpropagation a crucial algorithm for training neural networks?
-Backpropagation is crucial for training neural networks because it efficiently computes the gradient of the loss function with respect to the weights, allowing the network to update its weights in a way that minimizes the loss, thus improving its predictions over time.
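A deliberately tiny, hand-rolled illustration of one gradient step on a single weight (not the video's implementation; real networks compute these gradients for every weight at once via backpropagation):

```python
# One data point (x, y_true), one weight w, loss = (w*x - y_true)**2.
x, y_true = 2.0, 3.0
w = 0.5
learning_rate = 0.1

for step in range(3):
    y_pred = w * x
    loss = (y_pred - y_true) ** 2
    grad = 2 * (y_pred - y_true) * x   # d(loss)/dw via the chain rule
    w -= learning_rate * grad          # gradient descent step: move w to reduce the loss
    print(step, round(w, 4), round(loss, 4))   # w approaches 1.5, loss shrinks
```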
How does normalizing inputs improve the performance of a neural network?
-Normalizing inputs improves the performance of a neural network by scaling the values to a range that is easier for the network to deal with, such as -1 to 1. This makes the optimization process more stable and efficient, as the inputs are smaller and centered at zero.
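A sketch of that shift-and-scale normalization (the input range used here is an assumption for illustration):

```python
import numpy as np

def normalize(values, lo, hi):
    """Linearly map values from [lo, hi] into [-1, 1] (a shift and a scale)."""
    return 2.0 * (values - lo) / (hi - lo) - 1.0

rows = np.arange(0, 1400)             # raw row indices; the image size is an assumption
print(normalize(rows, 0, 1399)[:3])   # values near -1.0, centered on zero across the range
```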
What is the curse of dimensionality and how does it affect function approximation?
-The curse of dimensionality refers to the phenomenon where the volume of the input space increases so fast that the available data becomes sparse. This makes function approximation and machine learning tasks computationally impractical or impossible for higher-dimensional problems, as the number of computations needed grows exponentially with the dimensionality of the inputs.
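A quick illustration of the exponential growth: even a coarse grid of 10 sample points per axis becomes astronomically large as dimensionality rises.

```python
# 10 sample points per axis needs 10**d grid points in d dimensions.
for d in (1, 2, 3, 10):
    print(d, 10 ** d)     # 10, 100, 1000, 10_000_000_000
# For a 784-dimensional input (a 28x28 image, as later in the video), that would be 10**784.
```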
How do Fourier features enhance the performance of a neural network in function approximation?
-Fourier features enhance the performance of a neural network by providing additional mathematical building blocks in the form of sine and cosine terms. These terms allow the network to approximate functions more effectively, especially in low-dimensional problems, by capturing the wave-like nature of the data.
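A minimal sketch of computing 1D Fourier features for a scalar input (the order count and function name are illustrative, not the video's code):

```python
import numpy as np

def fourier_features(x, orders):
    """Return [sin(1x), ..., sin(Nx), cos(1x), ..., cos(Nx)] for x normalized to [-pi, pi]."""
    n = np.arange(1, orders + 1)
    return np.concatenate([np.sin(n * x), np.cos(n * x)])

extra = fourier_features(0.5, orders=8)   # 16 extra inputs fed to the network alongside x
print(extra.shape)                        # (16,)
```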
What is the difference between a Taylor series and a Fourier series in the context of function approximation?
-In the context of function approximation, a Taylor series is an infinite sum of polynomial functions that approximate a function around a specific point, while a Fourier series is an infinite sum of sine and cosine functions that approximate a function within a given range of points. Both can be used to enhance neural networks by providing additional input features.
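In symbols, truncated at order N (consistent with the description above):

```latex
% Taylor series, around x = 0, truncated at order N:
f(x) \approx \sum_{n=0}^{N} c_n x^{n}

% Fourier series, over one period (e.g. x \in [-\pi, \pi]), truncated at order N:
f(x) \approx a_0 + \sum_{n=1}^{N} \left( a_n \cos(n x) + b_n \sin(n x) \right)
```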
Why might using Fourier features in high-dimensional problems lead to overfitting?
-Using Fourier features in high-dimensional problems might lead to overfitting because the network can become too tailored to the specific data points, capturing noise and irregularities rather than the underlying function. This happens when the network has too many parameters relative to the amount of data, leading to a poor generalization to new, unseen data.
Outlines
🧠 Understanding Neural Networks as Function Approximators
The paragraph introduces the concept of neural networks as universal function approximators, emphasizing their ability to learn and model complex relationships in data. It explains that functions are fundamental to describing the world, and neural networks aim to approximate these functions based on given data points. The video's purpose is to explore neural networks learning in complex spaces, their limitations, and alternative machine learning methods. The speaker admits to being a programmer with a dislike for math but acknowledges its importance. The process of function approximation in neural networks is described, where the goal is to find a function that fits a given dataset and can predict outputs for new inputs. The concept of 'curve fitting' is introduced, and the video shows an actual neural network learning to fit a curve to data points.
📈 Deeper Dive into Neural Network Function Approximation
This paragraph delves deeper into how neural networks learn through the training process, using backpropagation to adjust weights and minimize error. It discusses higher-dimensional problems, such as image recognition, where inputs and outputs are vectors representing pixel values. The paragraph explains the use of activation functions like ReLU and leaky ReLU, and how they contribute to the network's learning process. The speaker also touches on the importance of normalizing inputs and using appropriate activation functions in the output layer to ensure the network's predictions fall within the desired range. The paragraph concludes with a demonstration of how neural networks can learn complex shapes, such as parametric surfaces, despite the challenges involved.
🌀 Tackling Complex Shapes with Neural Networks
The speaker attempts to use a neural network to approximate complex shapes, such as a spiral shell surface, and acknowledges the challenges in getting the network to accurately model such intricate forms. The paragraph also introduces the Mandelbrot set as an example of an infinitely complex fractal that is difficult for neural networks to approximate due to its detailed and intricate nature. Despite the network's efforts, it struggles to capture all the details of the Mandelbrot set, highlighting the limitations of neural networks in approximating certain types of functions. The speaker then explores alternative mathematical tools like the Taylor series for function approximation, which is likened to a single-layer neural network.
🔍 Exploring Fourier Series for Function Approximation
The paragraph discusses the use of the Fourier series as a tool for function approximation, contrasting it with the Taylor series. The Fourier series is shown to be effective for approximating functions within a given range, particularly when dealing with periodic data. The speaker demonstrates how computing additional Fourier features and feeding them into a neural network can significantly improve the network's performance in learning and approximating functions. The success of this method is illustrated through its application to image data, where the Fourier-enhanced network outperforms a standard neural network. The paragraph also addresses the 'curse of dimensionality,' noting that while neural networks handle high-dimensional data well, other methods like the Fourier series can become computationally impractical.
🌐 Applying Fourier Features to Real-World Data
The speaker applies the concept of Fourier features to real-world data, specifically the MNIST dataset of handwritten digits. Despite the initial success of Fourier features in lower-dimensional problems, the paragraph reveals that their effectiveness diminishes with higher-dimensional data, such as full images. The use of Fourier features leads to overfitting, where the network performs well on training data but fails to generalize to new data. The speaker concludes by emphasizing that no single method is universally best for all tasks and that the exploration of function approximation methods, even in low-dimensional toy problems, can provide valuable insights for more complex, real-world applications.
🚀 Concluding Thoughts and a Challenge
In the final paragraph, the speaker wraps up the discussion by reiterating the importance of function approximation in machine learning and the versatility of neural networks. They issue a challenge to the viewers to improve upon the Mandelbrot set approximation, emphasizing that there is always room for discovery and innovation. The speaker expresses optimism that the exploration of these methods could lead to better solutions in the field of machine learning, encouraging a collaborative and curious approach to problem-solving.
Keywords
💡Neural Networks
💡Universal Function Approximators
💡Backpropagation
💡Activation Functions
💡Loss Function
💡Feature Vectors
💡Fully Connected Feedforward Network
💡Curse of Dimensionality
💡Fourier Series
💡Mandelbrot Set
Highlights
Neural networks are universal function approximators, capable of learning almost anything.
Functions are crucial as they describe the world, including sound and light.
The goal of AI is to create programs that can understand, model, and predict the world.
Neural networks build their own functions through function approximation.
Neural networks learn by adjusting weights to minimize error between predicted and true outputs.
The architecture used in the video is a fully connected feed-forward network.
Neurons in a network learn individual features of the overall function.
The training process involves backpropagation, which is not explained in detail in the video.
Higher-dimensional problems, such as image recognition, are approached by treating each pixel as an output.
Normalization and activation functions like leaky ReLU can improve neural network performance.
The video explores the limitations of neural networks when approximating complex shapes like spiral surfaces.
The Mandelbrot set is used to demonstrate the challenge of approximating infinitely complex fractals.
Alternative function approximation methods like Taylor and Fourier series are discussed.
Fourier features, derived from the Fourier series, can significantly improve neural network performance on certain tasks.
The curse of dimensionality affects many function approximation methods, but neural networks handle it well.
The video concludes with a challenge to approximate the Mandelbrot set using a universal function approximator.
Transcripts
you are currently watching a neural
network learn
about a year ago I made a video about
how neural networks can learn almost
anything and this is because they are
Universal function approximators why is
that so important well you might as well
ask why functions are important they are
important because functions
describe
the world
everything is described by functions
that's right functions describe the
sound of my voice on your eardrum
function the light that's kind of
hitting your eyeballs right now function
different classes in mathematics
different areas in mathematics study
different kinds of functions high school
math studies second degree one variable
polynomials calculus studies smooth one
variable functions and it goes on and on
functions describe the world
yes correct thanks Thomas he gets a
little excited but he's right the world
can fundamentally be described with
numbers and relationships between
numbers we call those relationships
functions and with functions we can
understand model and predict the world
around us
the goal of artificial intelligence is
to write programs that can also
understand model and predict the world
or rather have them write themselves so
they must be able to build their own
functions that is the point of function
approximation and that is what neural
networks do they are function building
machines in this video I want to expand
on the ideas of my previous video by
watching actual neural networks learn
strange shapes in strange spaces here we
will encounter some very difficult
challenges discover the limitations of
neural networks and explore other
methods for machine learning and
Mathematics to approach this open
problem
now I am a programmer not a
mathematician and to be honest I kind of
hate math I've always found it difficult
and intimidating but that's a bad
attitude because math is unavoidably
useful and occasionally beautiful I'll
do my best to keep things simple and
accurate for an audience like me but
know that I'm gonna have to brush over a
lot of things and I'm gonna be pretty
informal
I recommend you watch my previous video
but to summarize functions are input
output machines they take an input set
of numbers and output a corresponding
set of numbers and the function defines
the relationship between those numbers
the particular problem that neural
networks solve is when we don't know
definition of the function that we're
trying to approximate instead we have a
sample of data points from that function
inputs and outputs this is our data set
we must approximate a function that fits
these data points and allows us to
accurately predict outputs given inputs
that are not in our data set this
process is also called Curve fitting and
you can see why
now this is not some handcrafted
animation it is an actual neural network
attempting to fit the curve to the data
and it does so by sort of bending the
line into shape this process is
generalizable such that it can fit the
curve to any data set and thus construct
any function this makes it a universal
function approximator
the network itself is also a function
and should approximate some unknown
Target function the particular neural
architecture we're dealing with in this
video is called a fully connected feed
forward Network its inputs and outputs
are sometimes called features and
predictions and they take the form of
vectors arrays of numbers
the overall function is made up of lots
of simple functions called neurons that
take many inputs but only produce one
output each input is multiplied by its
own weight and added up along with one
extra weight called a bias
let's rewrite this weighted sum with
some linear algebra we can put our
inputs into a vector with an extra one
for the bias and our weights into
another vector and then take what is
called the dot product let's just make
up some example values
to take the dot product we multiply each
input by each weight and then add them
all up
finally this dot product is then passed
to a very simple activation function in
this case a relu which here returns zero
we could use a different activation
function but a relu looks like this the
activation function defines the neuron's
mathematical shape while the weights
shift and squeeze and stretch that shape
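The made-up example values from the video aren't reproduced in the transcript, but the whole computation for one neuron is just this dot product followed by the activation; a minimal sketch with placeholder numbers:

```python
import numpy as np

inputs  = np.array([0.5, -1.2, 1.0])   # two example inputs plus an extra 1 for the bias
weights = np.array([0.4,  0.6, -0.1])  # two weights plus the bias weight (placeholder values)
weighted_sum = np.dot(inputs, weights)      # multiply each input by its weight, then add up
output = np.maximum(0.0, weighted_sum)      # ReLU activation: max(0, x)
print(weighted_sum, output)                 # -0.62  0.0  (the ReLU returns zero here)
```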
we feed the original inputs of our
Network to a layer of neurons each with
their own learned weights and each with
their own output value we stack these
outputs together into a vector and then
feed the output Vector as inputs to the
next layer and the next and the next
until we get the final output of the
network
each neuron is responsible for learning
its own little piece or feature of the
overall function and by combining many
neurons we can build an Ever more
intricate function with an infinite
number of neurons we can provably build
any function
the values of the weights or parameters
are discovered through the training
process we give the network inputs from
our data set and ask it to predict the
correct outputs over and over and over
the goal is to minimize the Network's
error or loss which is some measurement
of difference between the predicted
outputs and the true outputs
over time the network should do better
and better as loss goes down the
algorithm for this is called back
propagation and I am again not going to
explain it in this video I'll make a
video on it eventually I promise it's a
pretty magical algorithm
however this is a baby problem what
about functions with more than just one
input or output that is to say higher
dimensional problems
the dimensionality of a vector is
defined by the number of numbers in that
Vector for a higher dimensional problem
let's try to learn an image the input
Vector is the row column coordinates of
a pixel and the output Vector is the
value of the pixel itself in mathspeak
we would say that this function maps
from R2 to R1 our data set is all of the
pixels in an image let's use this
unhappy man as an example a pixel value
of 0 is black and one is white although
I'm going to use different color schemes
because it's pretty
as we train we take snapshots of the
Learned function as the approximation
improves that's what you're seeing now
and that's what you saw at the beginning
of this video
but to clarify this image is not a
single output from the network rather
every individual pixel is a single
output we are looking at the entire
function all at once and we can do this
because it is very low dimensional
you'll also notice that the learning
seems to slow down it's not changing as
abruptly as it was at the beginning this
is because we periodically reduce the
learning rate a parameter that controls
how much our training algorithm Alters
the current function this allows it to
progressively refine details
now just because our neural network
should theoretically be able to learn
any function there are things we can do
to practically improve the approximation
and optimize the learning process for
instance one thing I'm doing here is
normalizing the row column inputs which
means I'm moving the values from a range
of 0 1400 to the range of negative one
one I do this with a simple linear
transformation that shifts and scales
the values the negative 1 1 range is
easier for the network to deal with
because it's smaller and centered at
zero
another trick is that I'm not using a
relu as my activation function but
rather something called a leaky relu a
leaky relu can output negative values
while still being non-linear and has
been shown to generally improve
performance so I'm using a leaky relu
in all of my layers except for the last
one because the final output is a pixel
value it needs to be between 0 and 1. to
enforce this in the final layer we can
use a sigmoid activation function which
squishes its inputs between 0 and 1.
except there is a different squishing
function called tanh that squishes its
inputs between negative one and one I
can then normalize those outputs into
the final range of 0 1. why go through
the trouble well tanh just tends to
work better than sigmoid
intuitively this is because tanh is
centered at zero and plays much nicer
with back propagation but ultimately the
reasoning doesn't matter as much as the
results both networks here are
theoretically Universal function
approximators but practically one works
much better than the other this can be
measured empirically by calculating and
comparing the error rates of both
Networks I think of this as the science
of math where we must test our ideas and
validate them with evidence rather than
providing formal proofs it'd be great if
we could do both but that is not always
possible and it is often much easier to
just try and see what happens and that's
my kind of math
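A small sketch of that output-layer trick, assuming the rescaling is a plain shift-and-scale (the video does not show its exact code):

```python
import torch

def pixel_output(pre_activation):
    """tanh squashes to [-1, 1]; a shift and scale then maps that into the pixel range [0, 1]."""
    return (torch.tanh(pre_activation) + 1.0) / 2.0

print(pixel_output(torch.tensor([-5.0, 0.0, 5.0])))   # ~[0.0001, 0.5000, 0.9999]
```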
let's make it harder here we have a
function that takes two inputs UV and
produces three outputs x y z it's a
parametric surface function and we'll
use the equation for a sphere we can
learn it the same way as before take a
random sample of points across the
surface of the sphere and ask our
Network to approximate it now this is
clearly a very silly way to make a
sphere but the network is trying its
best to sort of wrap the surface around
the sphere to fit the data points
I hope this also gives you a better view
of what a parametric surface is it takes
a flat 2D sheet and contorts it in 3D
space According to some function
now this does okay though it never quite
closes up around the poles
for a real challenge let's try this
beautiful spiral shell surface I got the
equation for this from this wonderful
little website that lets you play with
all kinds of shell surfaces see what I
mean when I say that functions describe
the world
anyway let's sample some points across
the spiral surface and start learning
[Music]
[Laughter]
[Music]
well it's working but clearly we're
having some trouble here I'm using a
fairly big neural network but this is a
complicated shape and it seems to be
getting a little bit confused we'll come
back to this one
we can also make the problem harder not
by increasing dimensionality but by
increasing the complexity of the
function itself
let's use the mandelbrot set an
infinitely complex fractal
we can simply Define a mandelbrot
function as taking two real valued
inputs and producing one output the same
dimensionality as the images we learned
earlier
I have defined my mandelbrot function to
Output a value between 0 and 1 where 1
is in the mandelbrot set and anything
less than one is not under the hood it's
iteratively operating on complex numbers
and I added some stuff to Output smooth
values between 0 and 1 but I'm not going
to explain it much more than that after
all a neural network doesn't know the
function definition either and it
shouldn't matter it should be able to
approximate it all the same
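The narration skips the details, but a standard smooth escape-time formulation looks roughly like the sketch below; the video doesn't show its exact function, so the specifics here (iteration cap, smoothing term) are assumptions:

```python
import math

def mandelbrot_value(x, y, max_iter=100):
    """Return 1.0 for points in the Mandelbrot set, smoothly less than 1.0 outside it."""
    c = complex(x, y)
    z = 0j
    for i in range(max_iter):
        z = z * z + c                     # the iterative rule on complex numbers
        if abs(z) > 2.0:                  # escaped: the point is not in the set
            smooth = i + 1 - math.log(math.log(abs(z))) / math.log(2)
            return smooth / max_iter      # smooth value strictly below 1.0
    return 1.0                            # never escaped within max_iter: treat as inside

print(mandelbrot_value(0.0, 0.0), mandelbrot_value(1.0, 1.0))   # 1.0 and a small value
```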
the data set here is randomized points
drawn uniformly from this range now this
has actually been a pet project of mine
for some time and I've made several
videos trying this exact experiment over
the years
I hope you can see why it's interesting
despite being so low dimensional the
mandelbrot function is infinitely
complex literally made with complex
numbers and is uniquely difficult to
approximate you can just keep fitting
and fitting and fitting the function and
you will always come up short
now you could do this with any fractal I
just use the mandelbrot set because it's
so well known
so after training for a while we've made
some progress but clearly we're still
missing an infinite amount of detail
I've gotten this to look better in the
past but I'm not going to waste any more
time training this network there are
better ways of doing this
are there different methods for
approximating functions besides neural
networks yes many actually there are
always many ways to solve the same
problem though some ways are better than
others another mathematical tool we can
use is called the Taylor series
this is an infinite sum of a sequence of
polynomial functions X Plus x squared
plus X cubed plus x to the fourth up to
x to the n n is the order of the series
each of these terms are multiplied by
their own value called a coefficient
each coefficient controls how much that
individual term affects the overall
function
given some Target function by choosing
the right coefficients we can
approximate that Target function around
a specific point in this case Zero the
approximation gets better the more terms
we add where an infinite sum of terms is
exactly equivalent to the Target
function
if we know the target function we can
actually derive the exact coefficients
using a general formula to calculate
each coefficient for each term but of
course in our particular problem we
don't know the function we only have a
sample of data points so how do we find
the coefficients
well do you see anything familiar in
this weighted sum of terms we can put
all of the X to the N terms into an
inputs vector and put all of the
coefficients into a weights vector and
then take the dot product a weighted sum
the Taylor series is effectively a
single layer neural network but one
where we compute a bunch of additional
inputs x squared x cubed and so on we'll
call these additional inputs Taylor
features we can then learn the
coefficients or weights with back
propagation of course we can only
compute a finite number of these the
partial Taylor series up to some order
but the higher the order the better it
should do let's use this simple Taylor
Network to learn this function using
eight orders of the Taylor series here's
our data set and here's the
approximation
[Music]
that's not great polynomials are pretty
touchy as their values can explode very
quickly so I think back propagation has
a tough time finding the right
coefficients but we can do better rather
than using a single layer Network let's
just give these Taylor features to a
full multi-layered Network let's give it
a shot
[Music]
it's a bit wonky but this performs much
better this trick of computing
additional features to feed to the
network is a well-known and commonly
used one intuitively it's like giving
the network different kinds of
mathematical building blocks to build a
more diverse complex function
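As an illustration of the idea, a minimal sketch of computing such Taylor features as extra network inputs (the order count is arbitrary here, and the wiring into the network follows the same pattern as any other input):

```python
import numpy as np

def taylor_features(x, orders):
    """Return [x, x**2, ..., x**orders] as extra 'building block' inputs for the network."""
    return np.array([x ** n for n in range(1, orders + 1)])

print(taylor_features(0.5, orders=8))   # 8 extra inputs computed from a single x
```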
let's try this on an image data set
[Music]
well that's pretty good it's learning
but it doesn't seem to work any better
than just using a good old-fashioned
neural network the Taylor series is made
to approximate a function around a
single given point while we want to
approximate within a given range of
points a better tool for this is the
Fourier series
the Fourier series acts very much like
the Taylor series but is an infinite sum
of Sines and cosines each order n of the
series is made up of sine N X plus
cosine n x
each sine and cosine is multiplied by
its own coefficient again controlling
how much that term affects the overall
function
n these inner multiplier values control
the frequency of each wave function the
higher the frequency the more Hills the
curve has
by combining weighted waves of different
frequencies we can approximate a
function within the range of 2 pi one
full period
again if we know the function we can
compute the weights and even if we don't
we could use something called the
Discrete Fourier transform which is
really cool but we're not dealing with
it in this video
I hope you see where I'm going with this
let's just jump ahead and do what we did
before compute a bunch of terms of the
Fourier series and feed them to a
multi-layer network as additional inputs
Fourier features
note that we have twice as many Fourier
features as Taylor features since we
have a sine and cosine
let's try it on this data set
this works pretty well it's a little
wavy but not too shabby note that for
this to work we need to normalize our
inputs between negative pi and positive
Pi one full period
let's try this on an image
that looks strange at first almost like
static coming into Focus but it works
and it works really well
if we compare it to networks of the same
size trained for the same amount of time
we can see the Fourier Network learns
much better and faster than the network
without Fourier features or the one with
Taylor features just look at the level
of detail in those curly locks you can
hardly tell the difference from the real
image
now I've glossed over a very important
detail the example Fourier series I gave
had one input this function has two
inputs to handle this properly we have
to use the two-dimensional Fourier
series one that takes an input of X and
Y what we do with that extra y
here are the terms for the 2D Fourier
series up to two orders we are now
multiplying the X and Y terms together
and end up with sine x cosine y sine X
sine y cosine x cosine Y and cosine X
sine y every combination of sine and
cosine and Y and X
not only that we also have every
combination of frequencies that inner
multiplier so sine 2x times cosine 1y
and so on and so forth here's up to
three orders now four
that is a lot of terms we have to
calculate this many terms per order and
this number grows very quickly as we
increase the order much faster than it
would for the 1D series and this is just
for a baby 2D input for a 3D 4D 5D input
forget it the number of computations
needed for higher dimensional Fourier
series explodes as we increase the
dimensionality of our inputs we have
encountered the curse of dimensionality
lots of methods of function
approximation and machine learning
breakdown as dimensionality grows these
methods might work well on low
dimensional problems but they become
computationally impractical or
impossible for higher dimensional
problems
neural networks by contrast handle the
dimensionality problem very well
comparatively it is Trivial to add
additional dimensions
but we don't need to use the 2D Fourier
series we can just treat each input as
its own independent variable and compute
1D Fourier features for each input this
is less theoretically sound but much
more practical to compute it's still a
lot of additional features but it's
manageable and it's worth it it
drastically improves performance that's
what I've been using to get these image
approximations
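A sketch of that per-dimension shortcut, computing 1D Fourier features independently for each input instead of the full multi-dimensional series (the function name and exact feature layout are assumptions, not the video's code):

```python
import numpy as np

def fourier_features_per_dim(inputs, orders):
    """inputs: array of shape (d,), normalized to [-pi, pi].
    Returns the raw inputs plus sin(n*x_i) and cos(n*x_i) for each dimension i and order n."""
    n = np.arange(1, orders + 1)                 # shape (orders,)
    angles = np.outer(inputs, n)                 # shape (d, orders)
    waves = np.concatenate([np.sin(angles).ravel(), np.cos(angles).ravel()])
    return np.concatenate([inputs, waves])       # d + 2*d*orders features in total

row_col = np.array([0.3, -1.1])                  # e.g. the normalized (row, col) of a pixel
print(fourier_features_per_dim(row_col, orders=8).shape)   # (34,) for 2 inputs and 8 orders
```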
it really shouldn't be surprising that
Fourier features help so much here since
the Fourier series and transform is used
to compress images it's how the jpeg
compression algorithm Works turns out
lots of things can be represented as
combinations of waves
so let's apply it to our mandelbrot data
set again it looks a little weird but it
is definitely capturing more detail than
the previous attempt
well that's fun to watch but let's
evaluate
for comparison here is the real
mandelbrot set
actually no this is not the real
mandelbrot set it is an approximation
from our Fourier Network
now you might be able to tell if you're
on a 4k monitor especially when I zoom
in
this network was given 256 orders of the
Fourier series which means 1024 extra
Fourier features being fed to the
network and the network itself is pretty
damn big
when we really zoom in it becomes very
obvious that this is not the real deal
it is still missing an infinite amount
of detail
[Music]
nonetheless I am blown away by the
quality of the Fourier Network's
approximation Fourier features are of
course not my idea they come from this
paper that was suggested by a Reddit
commenter who I think actually may have
been a co-author I'm still missing
details from this adding Fourier
features was one of if not the most
effective improvements to the
approximation I've applied and it was
really surprising
to return to the tricky spiral shell
surface we can see that our Fourier
network does way better than our
previous attempt although the target
function is literally defined with Sines
and cosines so of course it will do well
so if Fourier features help so much why
don't we use them more often they hardly
ever show up in real world neural
networks to State the obvious all of the
approximations in this video so far are
completely useless we know the functions
and the images we don't need a massive
neural network to approximate them
but I hope that you can see that we're
not studying the functions we're
studying the methods of approximation
because these toy problems are so low
dimensional we can visualize them and
hopefully gain insights that will carry
over into higher dimensional problems so
let's bring it back to Earth with a real
problem that uses real data
this is the mnist data set images of
hand-drawn numbers and their labels
our input is an entire image flattened
out into a vector and our output is a
vector of 10 values representing a label
as to which number 0 through 9 is in the
image
there is some unknown function that
describes the relationship between an
image and its label and that's what
we're trying to discover
even for tiny 28 by 28 black and white
images that is a 784 dimensional input
that is a lot and this is still a very
simple problem for real world problems
we must address the curse of
dimensionality our method must be able
to handle huge dimensional inputs and
outputs we also can't visualize the
entire approximation all at once as
before any idea what a 700 dimensional
space looks like
but a normal neural network can handle
this problem just fine it's pretty
trivial we can evaluate it by measuring
the accuracy of its predictions on
images from the data set that it did not
see during training we'll call this
evaluation accuracy and a small network
does pretty well what if we use Fourier
features on this problem say up to eight
orders
well it does do a little better but
we're adding a lot of additional
features for only eight orders we're
Computing a total of 13,328 input features which is a lot more
than 784 and it's only two percent more
accurate when we use 32 orders of the
Fourier series it actually seems to harm
performance up to 64 orders and it's
downright ruinous
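For reference, the 13,328 figure is consistent with the per-dimension Fourier-feature scheme described earlier: 784 raw pixels plus a sine and a cosine per pixel per order. A quick check, with the same formula extended to the higher order counts mentioned:

```python
pixels = 28 * 28                             # 784 inputs for one MNIST image
for orders in (8, 32, 64):
    total = pixels + 2 * pixels * orders     # raw pixels + sin & cos per pixel per order
    print(orders, total)                     # 8 -> 13328, 32 -> 50960, 64 -> 101136
```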
this may be due to something called
overfitting where our approximation
learns the data really well too well but
fails to learn the underlying function
usually this is a product of not having
enough data but our Fourier Network
seems to be especially prone to this
this seems consistent with the
conclusions of the paper I mentioned
earlier and ultimately our Fourier
Network seems to be very good for low
dimensional problems but not very good
for high dimensional problems no single
architecture model or method is the best
fit for all tasks indeed there are all
kinds of problems that require different
approaches than the ones discussed here
now I'd be surprised if the Fourier
series didn't have more to teach us
about machine learning but this is where
I'll leave it I hope this video has
helped you appreciate what function
approximation is and why it's useful and
maybe sparked your imagination with some
alternative perspectives neural networks
are a kind of mathematical clay that can
be molded into arbitrary shapes for
arbitrary purposes
I want to finish by opening up the
mandelbrot approximation problem as a
fun challenge for anyone who's
interested how precisely and deeply can
you approximate the mandelbrot set given
only a random sample of points there are
probably a million things that could be
done to improve on my approximation and
the internet is much smarter than I am
the Only Rule is that your solution must
still be a universal function
approximator meaning it could still
learn any other data set of any
dimensionality
now this is just for fun but potentially
solutions to this toy problem could have
uses in the real world there is no
reason to think that we found the best
way of doing this and there may be far
better Solutions waiting to be
discovered
thanks for watching