Week 5 -- Capsule 1 -- From linear classification to Neural Networks

MATH-ML1
17 Sept 2020, 19:28

Summary

TLDR: This capsule introduces neural networks by starting from linear classification and showing how to move to a neural network in order to handle data that is not linearly separable. It explains how to combine several linear classifiers to obtain a non-linear decision. Using a graphical view with neurons, it illustrates how neural networks work, how they are organized into an input layer, hidden layers, and an output layer, and how they can be used for many supervised learning tasks, highlighting their key role in the deep learning revolution.

Takeaways

  • 🧠 Neural networks extend linear classification models to handle data that is not linearly separable.
  • 🔍 Linear classification puts a threshold on top of a linear regression model to separate two classes, but it breaks down on data that is not linearly separable (see the sketch after this list).
  • 🤖 To handle non-linear data, several linear classifiers can be combined to produce a non-linear decision.
  • 🔗 Each linear classifier is a model with weights and a bias term, and their combination forms the basis of a neural network.
  • 📈 Neural networks can be drawn as graphs of nodes (or neurons) that receive signals and perform computations.
  • 💡 Each neuron computes a dot product between its weights and its inputs, followed by a non-linear activation function that produces its output.
  • 🌐 Neural networks are organized in layers: an input layer, one or more hidden layers, and an output layer.
  • 🔑 The number of layers and the size of each layer are hyperparameters that determine the model's capacity to learn complex data.
  • 📊 Neural networks are flexible and can be adapted to many supervised learning tasks, such as regression, classification, and density estimation.
  • ⏳ The rise of neural networks is at the heart of the deep learning revolution, with considerable progress over the last ten years in the ability to solve complex problems.
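
As a minimal sketch of the thresholded linear classifier described above (a hand-written illustration in Python; the names and numbers are ours, not from the capsule):

    import numpy as np

    def linear_classify(x, w, w0):
        # Linear regression score w.x + w0, thresholded into two classes:
        # 'green' above the decision boundary, 'blue' below it.
        return "green" if np.dot(w, x) + w0 > 0 else "blue"

    # Example: one 2D point against an arbitrary decision boundary.
    print(linear_classify(np.array([1.0, 2.0]), np.array([0.5, -1.0]), 0.25))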

Q & A

  • What is the prototypical problem addressed in this capsule?

    - A binary linear classification problem: the goal is to classify data points into two distinct classes, shown as green and blue points in a two-dimensional space.

  • How is the linear regression model defined?

    - As a dot product between w and x plus a bias term w0, which serves as the basis for building a decision boundary.

  • What limitation of the linear regression model is mentioned?

    - It can only reach zero error when the data is linearly separable; it fails on data that is not.

  • What solution is proposed for data that is not linearly separable?

    - Use neural networks, which combine the decisions of several linear classifiers to obtain a non-linear decision boundary.

  • How are the two linear classifiers combined to form a neural network?

    - Their outputs are used as inputs to a third model, itself a linear classifier, which makes the final decision.

  • What are the key elements of a neuron in a neural network?

    - The connections with their associated weights, the weighted sum of the inputs followed by a non-linear activation function, and the ability to compute an output from the inputs.

  • Why are neural networks considered flexible models?

    - Because neurons, layers, and even different types of connections can be added to build more powerful models capable of modeling many kinds of data.

  • Which supervised tasks are neural networks good at?

    - Neural networks are particularly effective at supervised tasks such as regression, classification, and density estimation.

  • What is the link between neural networks and the 'deep learning revolution'?

    - Neural networks are at the heart of the 'deep learning revolution' because they are the models behind deep learning techniques, which have driven considerable progress in artificial intelligence.

  • What do layers mean in a neural network?

    - Layers are the successive stages of processing: there is typically an input layer, one or more hidden layers, and an output layer, each containing some number of neurons.

Outlines

00:00

🤖 Introduction to neural networks

Paragraph 1 introduces neural networks starting from linear classification. If the data is linearly separable, a linear regression model, a dot product between w and x plus a bias term w0, can be thresholded to create a decision boundary. If the data is not linearly separable, a more powerful model is needed. The idea is to combine several linear classifiers to obtain a non-linear decision. The example shows how two linear classifiers can be combined to classify data that is not linearly separable.

05:01

🔄 Combining linear models for a non-linear decision

Paragraph 2 details how two linear models can be combined to create a non-linear decision. If the two models agree on a point's class, that class is the final prediction. When they disagree, a combination model makes the final decision. This combination model is itself a linear classifier whose inputs are the outputs of the first two models, as in the sketch below.
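
A minimal sketch of this two-step combination (illustrative Python; the weights are chosen by hand to reproduce the 'green outside the band, blue inside' example, not taken from the capsule):

    import numpy as np

    def linear_unit(x, w, w0):
        # One linear classifier: output 1 if w.x + w0 > 0, else 0.
        return 1.0 if np.dot(w, x) + w0 > 0 else 0.0

    def combined_classifier(x):
        # Step 1: evaluate each linear model on the raw input x.
        h1 = linear_unit(x, np.array([0.0, 1.0]), -1.0)   # f: fires above the top line (x2 > 1)
        h2 = linear_unit(x, np.array([0.0, -1.0]), -1.0)  # f': fires below the bottom line (x2 < -1)
        # Step 2: a third linear classifier takes [h1, h2] as input (here an OR).
        return "green" if linear_unit(np.array([h1, h2]), np.array([1.0, 1.0]), -0.5) else "blue"

    for p in ([0.0, 2.0], [0.0, -2.0], [0.0, 0.0]):
        print(p, "->", combined_classifier(np.array(p)))   # green, green, blue

The final decision is non-linear in x even though every unit is linear, which is exactly the point of the construction.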

10:01

🧠 A graphical view of neural networks

Paragraph 3 introduces the neural network as a graphical representation of the computations performed by the linear classifiers. Each connection between neurons carries a weight, and each neuron computes a dot product between its weights and its inputs, followed by a non-linear activation function, as in the sketch below. Neural networks are inspired by the computational mechanisms of the brain, and the nodes, or neurons, are their basic units. They are very flexible models that can be used for many supervised learning tasks.
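
A single neuron from this graphical view might be sketched as follows (the sigmoid is one common choice for the activation; the capsule leaves the exact function unspecified at this point):

    import numpy as np

    def neuron(inputs, weights, bias):
        # Weighted sum of the incoming signals, then a non-linear activation.
        z = np.dot(weights, inputs) + bias
        return 1.0 / (1.0 + np.exp(-z))   # sigmoid squashes z into (0, 1)

    print(neuron(np.array([0.5, -1.5]), np.array([2.0, 1.0]), 0.1))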

15:02

🌐 General properties of neural networks

Paragraph 4 describes the general properties of neural networks, underlining their flexibility and their ability to be highly non-linear. They are used for tasks such as regression, classification, and density estimation, and can also be used for unsupervised learning. The paragraph concludes by stressing the importance of neural networks in the deep learning revolution and the progress made in machine learning over the last ten years.

Keywords

💡Neural networks

Neural networks are machine learning models inspired by the workings of the human brain. In this capsule they are presented as an extension of linear classification models to non-linear problems. Neural networks consist of several interconnected layers of neurons capable of learning complex relationships in the data.

💡Linear classification

Linear classification is a supervised classification method in which the data is separated by a linear decision boundary. It serves here as the starting point for explaining how neural networks can handle data that is not linearly separable.

💡Linear separability

Linear separability refers to whether classes of data can be separated by a linear boundary. The capsule notes that data may or may not be linearly separable, which influences the choice of model.

💡Linear regression model

The linear regression model predicts the value of a target variable from independent variables. Here it is the basis of the linear classifier: a dot product between the weights and the data inputs is used to make predictions.

💡Bias

The bias is the offset term in a linear regression model. It is added to the weighted sum of the inputs and adjusts the position of the decision boundary.

💡Hidden layer

A hidden layer is an intermediate layer in a neural network that is neither the input nor the output. Here, the hidden layer combines the decisions of several linear classifiers to create a non-linear decision boundary.

💡Node

In the context of neural networks, a node is a neuron that performs computations on the inputs it receives. The capsule uses 'node' for the processing units that receive signals, perform computations, and send outputs.

💡Activation function

An activation function is a non-linear function used to introduce non-linearity into neural networks. It is applied after the dot product, as a crucial step that determines the neuron's output.

💡Hyperparameters

Hyperparameters are settings fixed before training rather than learned from the data. Here, the number of hidden layers and the size of each layer are hyperparameters that determine the model's capacity.

💡Deep learning

Deep learning is the branch of artificial intelligence that uses deep neural networks, that is, networks with many layers of neurons. The capsule presents deep learning as the source of recent progress in artificial intelligence.

Highlights

Introduction to neural networks starting from linear classification.

Explanation of linear classification and its limitations with non-linearly separable data.

The concept of combining multiple linear classifiers to achieve non-linear decision boundaries.

Graphical representation of a simple neural network with two linear classifiers.

Formalization of combining linear classifiers through a third model, f double prime.

The role of decision rules in classifying data points based on the output of linear models.

Algorithm for model combination involving evaluation and output combination steps.

Graphical view of a neural network with nodes representing computations and connections representing signals.

Definition and function of neurons in a neural network.

Different types of neural networks and the concept of feedforward neural networks.

Importance of the number of hidden layers and their size as hyperparameters.

The forward pass: how a neural network computes a prediction.

Neural networks' flexibility and their ability to model non-linear relationships.

Applications of neural networks in supervised learning tasks.

The role of neural networks in the deep learning revolution.

Historical context of neural networks' popularity in machine learning.

The cyclical nature of machine learning trends and the current dominance of deep learning techniques.

Transcripts

00:00

In this capsule we're going to introduce neural networks. The way we're going to do this is to start from linear classification and then see how we can extend that to obtain a neural network. So recall the linear classification problem. At the bottom, what I'm showing is the prototypical problem we've talked about a couple of times already: we have data in two dimensions, and our problem is to classify two classes. It's a binary classification problem: the green class and the blue class. Recall that one thing we could do is to start from a linear regression model, that is, a dot product between w and x plus a bias term w0, and add a decision boundary on top of it. We saw that, geometrically speaking, the decision boundary divides our space into two regions: above the decision boundary we say that a point is going to be green, and below the decision boundary our model would predict that a point belongs to the blue class.

01:05

So that sort of model can obtain zero error if the data is linearly separable, but what about the case when it's not? Let's take a simple example where we have four points in two dimensions. These points are not linearly separable: you cannot find a linear decision boundary that would perfectly classify this data set. What does that mean? For one, it means that we're going to need a more powerful model. And the intuition we're going to follow to extend linear classification to neural networks is the idea that we can combine the decisions of several linear classifiers.

01:52

So what do I mean by that? In this particular case what I have is two linear classifiers, one parametrized by w and w0 and the second characterized by w' and w0'. Intuitively, whatever falls above the top line should belong to the green class, whatever falls below the bottom decision boundary should also belong to the green class, and whatever falls in the middle of the two should belong to the blue class. So there might be a way to combine these two to obtain, effectively, a non-linear decision.

02:41

How can we do this a little more formally? Let's call the model that's characterized by w and w0 f. Our model f will predict that points are from the green class if they are above its decision boundary, and that they are from the blue class if they are under it. You see that this model on its own would misclassify this particular point. We can do the same thing for our model number two, f'. The decision boundary of f' is this one: things above the decision boundary of f' will be classified as blue, and things below the boundary of f' will be classified as green. So again, this model would make an error: it would misclassify this particular point.

03:46

Now let's see how we can combine these two models. There are the first two cases, and then a third case which is perhaps the more interesting one. In the first case, we have a point for which the model f predicts the green class, which means that the point is above f's decision boundary. Then we know that, regardless of what f' says, we will predict the green class; in this particular case f' would say the point belongs to the blue class, but regardless of that, if it's above the decision boundary of the model f, we predict green. This is similar for the second case: if f'(x) says the point belongs to the green class, it really means that the point is below the decision boundary of f', and then the value of f(x) doesn't really matter; we know the point should belong to the green class. The third case is the most interesting one. This is the case where f(x) says the point is below its decision boundary and f'(x) says it is above its own, so the point is really in this middle zone. In this case, if both say that it's blue, then effectively the correct class to predict is blue. In this particular case we've combined the output of both models to come up with the decision.

05:52

If we were to write down an algorithm for this model combination, it would really have two steps. In the first step, we evaluate each model given the data: f uses the data x, and f' also uses the data x. In the second step, we combine the outputs of both models. In a sense, it's as if we had a third model that used as input not the x's but rather the outputs, the predictions, of f and f'. Mathematically, of course, we can formalize this, and we can imagine that this third model is also written as a linear classifier. We call it f'' and parametrize it using w'' and w0''. What's important to note is that the inputs here are the outputs of the models from the first step: a dot product is taken between w'' and the outputs f(x) and f'(x), then we add the bias w0'', and then we do exactly what we did before: we add a decision rule on top of what looks like a linear regression model. Here I wrote it as a threshold, above a value it's green and below it's blue, but what I mean is really just a decision rule, just like the ones we used for f(x) and f'(x).
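
Assembled from the definitions above, the combined model can be written as one formula (a sketch; the sign threshold is just one possible decision rule):

    f''(x) = \operatorname{sign}\!\left( w''_1 \, f(x) + w''_2 \, f'(x) + w''_0 \right)

Note that f'' is linear in its inputs f(x) and f'(x), but non-linear in x, because f and f' are themselves thresholded.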

07:38

So what did we learn here? We learned that we can effectively combine two linear models to obtain a non-linear model, a model which correctly classifies this non-linearly separable data. So how do we get from there to neural nets? Well, this is effectively your first example of a neural net. But let's try to get a little more intuition, and in particular I'll introduce a graphical view of things. What we had on the previous slide is a datum, an instance, in two dimensions, x1 and x2. This datum is passed to f(x): this is why I have a little arrow that goes from x1 to f(x) and from x2 to f(x). You can think of these little arrows as signals: I'm sending the information about the value of x1 to f(x), and I'm sending the value of x2 as well. And of course I'm doing the same thing for f'(x); we said that both have access to the data. Then this little circle, which I can call a node and which we'll later call a neuron, is where computation happens. In particular, the computation that happens here is a dot product between my parameters, the parameters w of f, and my x's. Once I have this dot product, I pass it through my decision rule. Once I've done that for f and f', I can take these outputs and use them as input to my third model, my combination model f''(x). The third and final step is that f''(x) makes a prediction: it says whether the model, with the current parameters w, w', and w'', believes that this particular datum (x1, x2) belongs to the blue class or to the green class.

10:09

So what I've done here is exactly what I did on the previous slide, except that I'm now using this graphical notation. And this, in fact, is a neural network. We call it a neural network because it's biologically inspired: it resembles, a little bit, the sort of computations that happen in brains. And because of this connection to computations in the brain, we call these nodes, these circles, neurons; a neural network with a single node could be called a perceptron.

10:51

We can of course try to make some of the things I've said a little more formal. Here is exactly the picture you had on the previous slide. Each arrow denotes what we call a connection, or a signal, that is associated with a weight: each connection is associated with a particular scalar w. Then each node computes the weighted sum of its inputs, followed by a non-linear activation. Mathematically speaking, imagine I have a node with weights w: I take the input to the node, I compute a dot product with this w, a weighted sum, and then I pass it through a non-linear activation function. This is what until now I've been calling a decision rule or a threshold; here I denote it by the symbol sigma, and we'll see exactly what that is in a couple of slides.
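
As a hedged preview of what sigma can be (the capsule defines it precisely later), two common choices are the hard threshold used so far and a smooth sigmoid:

    import numpy as np

    def step(z):
        # The decision rule / threshold used up to now.
        return 1.0 if z > 0 else 0.0

    def sigmoid(z):
        # A smooth, differentiable alternative that is common in practice.
        return 1.0 / (1.0 + np.exp(-z))

    print(step(0.3), sigmoid(0.3))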

11:58

What's important is that there are different types of neural networks. What I'm showing here is a feedforward neural network. In a feedforward neural network, information always goes forward: the connections go from data to output, in this case from left to right. There are no connections within a layer, so, for example, no connections going from this neuron to the neuron below it; nor are there connections going backward, for example from the output layer back to a neuron in a previous layer.

12:53

Now, you'll note that I've used the word layer. This particular neural net has three layers. It has an input layer, which is really a data layer, and the size of the input layer is given by the number of inputs you have. Here our data lives in two dimensions, so we have two inputs. In addition, what we often do is add a third input, which is why I say the size of the input layer is the number of inputs, the number of independent variables, plus one. This extra input, always fixed to one, just adds a bias: what we have here is w0' and this is w0. That's a common way to add a bias term.

13:48

Then, after the input layer, what we have is a hidden layer. Our hidden layer picks up the signal that comes in through its connections, does its computations, and then sends its output either to another hidden layer or to an output layer. The number of hidden layers here is one, but the number of hidden layers is a hyperparameter, and the size of each hidden layer, that is, how many neurons a hidden layer has (here, two), is also a hyperparameter. Intuitively speaking, the more hidden layers you have and the bigger each hidden layer is, the more capacity your model has. So one way to regularize a neural network is to limit the number of hidden layers and to limit the size of each hidden layer.

14:45

The third layer is the output layer. It takes the information from the last hidden layer and transforms it: there is another dot product followed by a decision rule, that sigma symbol I used earlier, and then I obtain an output. The size of the output layer, the number of output neurons, could be more than one. In the case we've looked at so far, we only need to predict a single value, whether something is green or blue, and that requires a single neuron; but of course, if we wanted to output a vector of values, we could have multiple output neurons.

15:28

So, to make things very clear, we can compute a prediction in the neural network by doing what we call the forward pass. That is, we input the data, here x1 and x2, to our neural net; then we do a layer-by-layer computation, where at each layer we calculate the output of each neuron in that layer. Once we've done that for a layer, we pass its output to the next layer; here, the output layer's neuron does its computation, and finally we obtain a prediction.
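
A minimal sketch of this forward pass for a small feedforward net (layer sizes and weights are illustrative; the bias is handled with the 'input fixed to one' trick described above):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward_pass(x, layer_weights):
        # Layer-by-layer computation: each weight matrix has one extra
        # column that multiplies a constant input of 1 (the bias).
        h = np.asarray(x, dtype=float)
        for W in layer_weights:
            h = sigmoid(W @ np.append(h, 1.0))
        return h

    rng = np.random.default_rng(0)
    # Two inputs, one hidden layer with two neurons, one output neuron,
    # as in the capsule; both counts are hyperparameters.
    layer_weights = [rng.normal(size=(2, 3)), rng.normal(size=(1, 3))]
    print(forward_pass([0.5, -1.0], layer_weights))   # e.g. output > 0.5 -> green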

16:18

So, for the neural networks we've introduced in this capsule, here are some of their more general properties. They are a very flexible model class: as you may have intuited from the last couple of slides, I could add neurons, I could add layers, I could even add different types of connections, to obtain ever more powerful models, or models that can handle specific types of data. In general, although each neuron is a linear model, their combination, as we showed earlier, gives something that is non-linear; so neural networks can be highly non-linear models. They are good for many supervised learning tasks, such as regression, classification, and density estimation, and as we'll see in a couple of weeks, we can also use them for unsupervised learning tasks.

17:20

Finally, last but certainly not least, they are the models behind deep learning, or as it's sometimes called, the deep learning revolution. Over the last ten years or so, much of the progress in machine learning, many of the new AI-type tasks that machine learning is able to solve, comes from the fact that we can now train very large neural networks to do amazing things, to do tasks that seem to really require artificial intelligence.

18:03

Having said that, I also want to briefly touch upon a particular historical aspect. As I said, one of the reasons behind all the excitement in the field of machine learning and AI is exactly neural networks; we'll define in a future capsule what a deep neural network is and what deep learning effectively is. So much of the excitement in the AI world these days comes from deep learning, which means that if you take a machine learning class now, almost every machine learning class will involve some deep learning techniques, some discussion of neural networks. This might not have been the case had you taken a class, say, 15 to 20 years ago: back then, one of the most popular models, both in terms of how well it worked and in terms of our theoretical understanding, was the support vector machine, which I briefly discussed a couple of capsules ago. This also means that we have to take these sorts of cycles into account, and that we don't exactly know what will make up a machine learning class in 15 to 20 years. But at least for now, these deep learning techniques are performing beyond our wildest imagination.

Rate This

5.0 / 5 (0 votes)

相关标签
Réseau NeuronalClassification LinéaireApprentissage AutomatiqueModèles Non LinéairesMachine LearningDeep LearningSéparabilité LinéaireModèles PrédictifsIntelligence ArtificielleModèles de Données
您是否需要英文摘要?