Week 5 -- Capsule 1 -- From linear classification to Neural Networks
Summary
TLDR: This script introduces neural networks by starting from linear classification and showing how to move to a neural network in order to handle data that is not linearly separable. It explains how to combine several linear classifiers to obtain a non-linear decision. The script uses a graphical example with neurons to illustrate how neural networks work, how they are made up of an input layer, hidden layers, and an output layer, and how they can be used for various supervised learning tasks, highlighting their key role in the deep learning revolution.
Takeaways
- 🧠 Neural networks extend linear classification models to handle data that is not linearly separable.
- 🔍 Linear classification uses linear regression with a threshold to separate classes, but it breaks down on non-linearly separable data.
- 🤖 To handle non-linear data, several linear classifiers can be combined to obtain a non-linear decision.
- 🔗 Each linear classifier is a model with weights and a bias, and their combination forms the basis of a neural network.
- 📈 Neural networks can be visualized graphically as nodes (or neurons) that receive signals and perform computations.
- 💡 Each neuron computes a dot product between its weights and its inputs, followed by a non-linear activation function, to produce an output.
- 🌐 Neural networks are organized in layers: an input layer, one or more hidden layers, and an output layer.
- 🔑 The number of layers and the size of each layer are hyperparameters that determine the model's capacity to learn complex data.
- 📊 Neural networks are flexible and can be adapted to many supervised learning tasks, such as regression, classification, and density estimation.
- ⏳ The rise of neural networks is at the heart of the deep learning revolution, with tremendous progress over the last ten years in the ability to solve complex problems.
Q & A
What is the typical problem addressed in the script?
-The script addresses a binary linear classification problem, where the goal is to separate data into two distinct classes, represented by green and blue points in a two-dimensional space.
How is the linear regression model defined in the script?
-The linear regression model is defined as a dot product between w and x, plus a bias term w0, which serves as the basis for a decision boundary.
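A minimal sketch of this classifier in Python (the function name and the threshold-at-zero decision rule are my own illustrative choices, not taken from the video):

```python
import numpy as np

def linear_classifier(x, w, w0):
    # Linear regression part: dot product between w and x, plus the bias w0.
    score = np.dot(w, x) + w0
    # Decision rule on top: threshold the score to choose a class.
    return "green" if score > 0 else "blue"
```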
What limitation of the linear regression model is mentioned in the script?
-The limitation is that it can only reach zero error when the data is linearly separable; it fails on data that is not linearly separable.
What solution is proposed for handling data that is not linearly separable?
-The script proposes neural networks, which combine the decisions of several linear classifiers to obtain a non-linear decision boundary.
How are the two linear classifiers combined to form a neural network?
-The two linear classifiers are combined by using their respective outputs as inputs to a third model, itself a linear classifier, which makes the final decision.
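A minimal sketch of this two-step combination, assuming 0/1 outputs from each classifier (all names and the threshold convention are mine; the video does not fix specific values):

```python
import numpy as np

def classify(x, w, w0):
    # One linear classifier: dot product plus bias, thresholded at zero.
    return 1.0 if np.dot(w, x) + w0 > 0 else 0.0

def combined(x, w, w0, wp, wp0, wpp, wpp0):
    # Step 1: evaluate both linear classifiers on the same input x.
    outputs = np.array([classify(x, w, w0), classify(x, wp, wp0)])
    # Step 2: a third linear classifier takes those outputs as its inputs.
    return classify(outputs, wpp, wpp0)
```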
What are the key elements of a neuron in a neural network?
-The key elements of a neuron include its weighted connections, the weighted sum of its inputs followed by a non-linear activation function, and the ability to compute an output from its inputs.
Why are neural networks considered flexible models?
-Neural networks are considered flexible because neurons, layers, and even different types of connections can be added to build more powerful models capable of modeling various types of data.
What kinds of supervised tasks are neural networks good at?
-Neural networks are particularly effective at supervised tasks such as regression, classification, and density estimation.
What is the link between neural networks and the 'deep learning revolution'?
-Neural networks are at the heart of the 'deep learning revolution' because they are the models behind deep learning techniques, which have enabled tremendous progress in artificial intelligence.
What is the role of layers in a neural network?
-In a neural network, the layers represent successive stages of processing. There is typically an input layer, one or more hidden layers, and an output layer, each containing some number of neurons.
Outlines
🤖 Introduction to neural networks
Paragraph 1 introduces neural networks by starting from linear classification. It explains that if the data is linearly separable, a linear regression model, a dot product between w and x plus a bias term w0, can be used to create a decision boundary. However, if the data is not linearly separable, a more powerful model is needed. The idea is to combine several linear classifiers to obtain a non-linear decision. The example given shows how two linear classifiers can be combined to classify data that is not linearly separable.
🔄 Combining linear models for a non-linear decision
Paragraph 2 goes deeper into how two linear models can be combined to produce a non-linear decision. If the two models agree on the class of a given point, that class is chosen as the final prediction. If they disagree, a combination model makes the final decision. This combination model is itself a linear classifier that takes the outputs of the first two models as its inputs. The author illustrates this with an example in which the outputs of the first two models are combined to predict the correct class.
🧠 A graphical view of neural networks
Paragraph 3 introduces the notion of a neural network as a graphical representation of the computations performed by the linear classifiers. Each connection between neurons is associated with a weight, and each neuron computes a dot product between its weights and its inputs, followed by a non-linear activation function. The paragraph explains that neural networks are inspired by the computational mechanisms of the brain and that nodes, or neurons, are the basic units of these networks. It also notes that neural networks are very flexible models that can be used for a variety of supervised learning tasks.
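In symbols, the per-neuron computation this paragraph describes is commonly written as follows (with $\sigma$ the non-linear activation function; this is a standard formulation, not a formula quoted from the video):

$$y = \sigma\Big(\sum_i w_i x_i + w_0\Big) = \sigma(w^\top x + w_0)$$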
🌐 General properties of neural networks
Paragraph 4 describes the general properties of neural networks, emphasizing their flexibility and their ability to be highly non-linear. It mentions that they are used for tasks such as regression, classification, and density estimation, and that they can also be used for unsupervised learning. The paragraph concludes by highlighting the importance of neural networks in the deep learning revolution and the progress made in machine learning over the last ten years.
Keywords
💡Neural networks
💡Linear classification
💡Linear separability
💡Linear regression model
💡Bias
💡Hidden layer
💡Node
💡Activation function
💡Hyperparameters
💡Deep learning
Highlights
Introduction to neural networks starting from linear classification.
Explanation of linear classification and its limitations with non-linearly separable data.
The concept of combining multiple linear classifiers to achieve non-linear decision boundaries.
Graphical representation of a simple neural network with two linear classifiers.
Formalization of combining linear classifiers through a third model, f double prime.
The role of decision rules in classifying data points based on the output of linear models.
Algorithm for model combination involving evaluation and output combination steps.
Graphical view of a neural network with nodes representing computations and connections representing signals.
Definition and function of neurons in a neural network.
Different types of neural networks and the concept of feedforward neural networks.
Importance of the number of hidden layers and their size as hyperparameters.
The forward pass in neural network computation and prediction.
Neural networks' flexibility and their ability to model non-linear relationships.
Applications of neural networks in supervised learning tasks.
The role of neural networks in the deep learning revolution.
Historical context of neural networks' popularity in machine learning.
The cyclical nature of machine learning trends and the current dominance of deep learning techniques.
Transcripts
In this capsule we're going to introduce neural networks. The way we're going to do this is to start from linear classification and then see how we can extend that to obtain a neural network. So recall the linear classification problem. At the bottom, what I'm showing is this prototypical problem which we've talked about a couple of times already: we have data in two dimensions, and our problem is to classify two classes. It's a binary classification problem: the green class and the blue class.
Recall that we learned that one thing we could do is to start from a linear regression model, that is, a dot product between w and x plus a bias term w0, and add a decision boundary on top of it. We saw that, geometrically speaking, the decision boundary is something that divides our space into two regions: above the decision boundary we say that the point is going to be green, and below the decision boundary our model would predict that a point belongs to the blue class. So that sort of model can obtain zero error if the data is linearly separable, but what about the case when it's not?
Let's take a simple example where we have four points in two dimensions. We see that these points are not linearly separable; that is, you cannot find a linear decision boundary that would perfectly classify this data set. So what does that mean? For one, it means that we're going to need a more powerful model to be able to do this. The intuition we're going to follow to extend linear classification to a neural network is the idea that we can combine the decisions of several linear classifiers. So what do I mean by that? Let's look at this.
In this particular case, what I have is two linear classifiers: one parametrized by w and w0, and a second parametrized by w prime and w0 prime. Intuitively, whatever falls above the top line should belong to the green class, whatever falls below the bottom line, the bottom decision boundary, should also belong to the green class, and whatever falls in the middle of the two should belong to the blue class. So there might be a way to combine these two to obtain, effectively, a non-linear decision. How can we do this a little more formally?
Let's call the model characterized by w and w0 f. Our model f will predict that points are from the green class if they are above its decision boundary, which is this one here, and it will predict that points are from the blue class if they are under the decision boundary, in this region here. You see, obviously, that this model on its own would misclassify this particular point. We can do the same thing for model number two, f prime. The decision boundary of f prime is this one, and we see that things above the decision boundary of f prime will be classified as blue, and things below the boundary of f prime will be classified as green. So again, this model would make an error: it would misclassify this particular point here.
Now let's see how we can combine these two models. There are the first two cases, and then a third case which is perhaps the more interesting one. In the first case, we have a point for which the model f predicts the green class, which means the point is above this line right here. Then we know that, regardless of what f prime says, the point is going to belong to the green class. In this particular case, f prime would say that it belongs to the blue class, but regardless of that, if the point is above the decision boundary of the model f, we predict the green class. It's similar for the second case: if f prime of x says the point belongs to the green class, it really means that the point is below this decision boundary right here, and if it is below that boundary, then the value of f of x doesn't really matter; we know that the point should belong to the green class. Now, the third case is the most interesting one. This is the case where f of x says the point is below its decision boundary, so it's here, and f prime of x says the point is above its decision boundary, so the point is really in this middle zone right here. In this case, if both say that it's blue, then effectively the correct class to predict is blue. So in this particular case we've combined the output of both models to come up with the decision.
So if we were to try to write down an algorithm for this model combination, it would really have two steps. In the first step, we evaluate each model on the data: f uses the data x, and f prime also uses the data x. In the second step, we combine the outputs of both models. In a sense, it's as if we had a third model that used as input not the x's, but rather the outputs of f and f prime, their predictions. Mathematically, of course, we can try to formalize this, and we can imagine that this third model is also written as a linear classifier. In particular, we call it f double prime and we parametrize it using w double prime and w0 double prime. What's important to note is that the inputs here are the outputs of the models from the first step: a dot product is taken between w double prime and the outputs f of x and f prime of x, and then we add a bias w0 double prime. Then we do exactly what we did before: we add a decision rule on top of what looks like a linear regression model. Here I wrote it as a threshold, so above some value the prediction is, say, green, and below that value it's blue, but what I mean here is really just a decision rule, just like the ones we used for f of x and f prime of x.
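Written out, the construction just described amounts to one nested expression (using $\sigma$ for the threshold decision rule; splitting $w''$ into two components is my notation):

$$f''(x) = \sigma\big(w''_1\, f(x) + w''_2\, f'(x) + w''_0\big), \quad \text{where } f(x) = \sigma(w^\top x + w_0) \text{ and } f'(x) = \sigma(w'^\top x + w'_0).$$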
So what did we learn here? We learned that we can effectively combine two linear models to obtain a non-linear model, a model which would correctly classify this non-linearly separable data. So how do we get from there to neural nets? Well, this is effectively your first example of a neural net. But let's try to get a little bit more intuition, and in particular what I'll do here is introduce a graphical view of things.
What we had on our previous slide is a datum, an instance, in two dimensions, x1 and x2. This datum is passed to f of x; this is why I have a little arrow here that goes from x1 to f of x, and another from x2 to f of x. You can think of these little arrows as signals: I'm sending the information about the value of x1 to f of x, and I'm sending the value of x2 to f of x as well. And of course I'm doing the same thing for f prime of x; we said that both have access to the data. Then this little circle here, which I call a node, and which we'll later call a neuron, is where computation happens. In particular, the computation that happens here is a dot product between my parameters w, the parameters of f, and my x's. Once I have this dot product, I pass it through my decision rule. Once I've done that for f and for f prime of x, I can take these outputs and use them as input to my third model, my combination model f double prime of x. The third and final step is that f double prime of x makes a prediction: the model, with its current parameters w, w prime and w double prime, says whether this particular datum x1, x2 belongs to the blue class or to the green class. So what I've done here is exactly what I did on the previous slide, except that I'm now using this sort of graphical notation.
And this, in fact, as I said on the previous slide, is a neural network. We call it a neural network because it's biologically inspired: it resembles, a little bit, the sort of computations that happen in brains. Because of this connection to computations in the brain, we call these nodes, these circles, neurons. And if we had a neural network with a single node, we could call it a perceptron.
So we can try, of course, to make some of the things I've said a little more formal. Here is exactly the picture you had on the previous slide. Each arrow denotes what we call a connection, or a signal, that is associated with a weight; it means that each connection is associated with a particular scalar w. Then each node computes the weighted sum of its inputs followed by a non-linear activation. Mathematically speaking, imagine I have a node with some indexed weights w: what I'm saying is that I take the inputs to the node and compute a dot product with this w, a weighted sum, and then I pass it through a non-linear activation function. This is what, until now, I've been calling a decision rule or a threshold; here I denote it by the symbol sigma, and we'll see exactly what it is in a couple of slides.
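A minimal sketch of this per-node computation (the step activation below is one stand-in for the still-unspecified sigma, and the function names are mine):

```python
import numpy as np

def step(z):
    # A threshold decision rule: one possible choice for the activation sigma.
    return 1.0 if z > 0 else 0.0

def neuron(x, w, w0, activation=step):
    # Weighted sum of the inputs plus a bias, passed through a
    # non-linear activation: the basic computation of a single node.
    return activation(np.dot(w, x) + w0)
```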
What's important is that there are different types of neural networks. What I'm showing here is a feedforward neural network. In a feedforward neural network, information always moves forward, which means that the connections go from data to output, or in this case from left to right. Each connection must go from left to right, so there are no connections within a layer (for example, there are no connections going from this neuron to the neuron below it), and there are no connections going backward (for example, no connections that go from the output layer back to a neuron in a previous layer).
Now, you'll note that I've used the word layer. This particular neural net has three layers. It has an input layer, which is really, effectively, a data layer, and the size of the input layer is given by the number of inputs; here our data lives in two dimensions, so we have two inputs. In addition, what we often do is add one more input, which is why I say that the size of the input layer is the number of inputs, or the number of independent variables, plus one. This extra input, which is always fixed to one, is just a common way to add a bias: effectively, the weights on its connections here are w0 and w0 prime.
Then, after the input layer, what we have is a hidden layer; this is what we have here. Our hidden layer takes the signal that comes in through its connections, does its computations, and then sends its output either to another hidden layer or to an output layer. Here there is a single hidden layer, but the number of hidden layers is a hyperparameter, and the size of each hidden layer, that is, how many neurons a hidden layer has (here, this hidden layer has two neurons), is also a hyperparameter. Intuitively speaking, the more hidden layers you have, and the bigger each hidden layer is, the more capacity your model has. And so one way to regularize a neural network is to limit the number of hidden layers and limit the size of each hidden layer.
The third layer is the output layer. It takes the information from the last hidden layer and transforms it, so there's another dot product followed by a decision rule, that sigma symbol I used earlier, and then I obtain an output. The size of the output layer, the number of output neurons, could be more than one. In the case we've looked at so far, we only need to predict a single value, whether something is green or blue, and that requires a single neuron; but of course, if we wanted to output a vector of values, we could have multiple output neurons here.
So, to make things very clear: we can compute a prediction with the neural network by doing what we call the forward pass. That is, we input data, here x1 and x2, to our neural net; then we do a layer-wise computation, where at each layer we calculate the output of each neuron in that layer. Once we've done that for a layer, we pass its output to the next layer; here, the output layer's neuron does its computation, and finally we obtain a prediction.
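A minimal sketch of this forward pass for the two-input, two-hidden-neuron, one-output network discussed in the capsule (the weight values below are illustrative placeholders I chose so that points between two vertical lines come out blue and points outside come out green, in the spirit of the earlier example; none of these numbers come from the video):

```python
import numpy as np

def step(z):
    # Threshold activation applied elementwise.
    return np.where(z > 0, 1.0, 0.0)

def forward(x, W_hidden, b_hidden, w_out, b_out):
    # Layer-wise computation: each layer's outputs feed the next layer.
    h = step(W_hidden @ x + b_hidden)       # hidden layer: f(x) and f'(x)
    return step(np.dot(w_out, h) + b_out)   # output layer: f''

# Hypothetical parameters: hidden units fire when x1 > 1 or x1 < -1.
W_hidden = np.array([[1.0, 0.0],
                     [-1.0, 0.0]])
b_hidden = np.array([-1.0, -1.0])
w_out = np.array([1.0, 1.0])   # green (1) if either hidden unit fires
b_out = -0.5

print(forward(np.array([2.0, 0.0]), W_hidden, b_hidden, w_out, b_out))  # 1.0 (green)
print(forward(np.array([0.0, 0.0]), W_hidden, b_hidden, w_out, b_out))  # 0.0 (blue)
```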
So what we've introduced in this capsule are neural networks. Here are some of their more general properties. They are a very flexible model class: as you may have intuited from the last couple of slides, I could add neurons, I could add layers, I could even add different types of connections, to obtain ever more powerful models, or to obtain models that can model specific types of data. In general, although each neuron is a linear model, their combination, as we showed earlier, gives something that is non-linear, and so neural networks can be highly non-linear models. They are good for many supervised learning tasks, such as regression, classification and density estimation, and as we'll see in a couple of weeks, we can also use them for other tasks, such as unsupervised learning tasks. Finally, last but certainly not least, they are the models behind deep learning, or, as it's sometimes called, the deep learning revolution.
So over the last ten years or so, much of the progress behind machine learning, many of the new AI-type tasks that machine learning is able to solve, comes from the fact that we can now train very large neural networks to do some very amazing things, tasks that seem to really require artificial intelligence. Having said that, I also want to briefly touch upon a particular historical aspect. As I said, one of the reasons behind all the excitement in the field of machine learning and AI is exactly neural networks; we'll define exactly, in a future capsule, what a deep neural network is and what deep learning effectively is. So much of the excitement in the AI world these days comes from deep learning, which means that if you take a machine learning class now, almost every machine learning class will involve some deep learning techniques, some neural nets, some discussion of neural networks. This might not have been the case had you taken a class, say, 15 to 20 years ago; back then, one of the most popular models, both in terms of how well it worked and in terms of our theoretical understanding, was support vector machines, which I briefly discussed a couple of capsules ago. This also means that we have to take into account these sorts of cycles, and it also means that we don't exactly know what will make up a machine learning class in 15 to 20 years. But at least for now, these deep learning techniques are performing beyond our wildest imagination.