Week 3 -- Capsule 1 -- Nearest Neighbor

MATH-ML1
3 Sept 2020, 19:04

Summary

TLDR: This script introduces supervised machine learning, in particular the 'nearest neighbor' model. The focus is on linear models, mainly for classification, although their extension to regression and density estimation is mentioned. The models are split into non-probabilistic models (such as 'nearest neighbor' and 'support vector machines') and probabilistic models (such as 'naive Bayes'). The nearest neighbor approach is explained, including how to select the number of neighbors (the hyperparameter k) using a validation set to optimize performance. The advantages and challenges of nearest neighbor are discussed, notably the need to store the whole dataset and the question of robustness in the presence of outliers.

Takeaways

  • 📚 This week introduces supervised learning and the supervised learning model called 'nearest neighbor'.
  • 🔍 We will explore different supervised learning models, mostly linear ones, with a focus on classification, although these models can be extended to regression or density estimation.
  • 🏫 Two types of models will be covered: non-probabilistic models (such as nearest neighbor and support vector machines) and probabilistic models (notably naive Bayes).
  • 🏠 In supervised learning we have a set of features and a target, called 'y', which here is a class (for example, how quickly a house will sell).
  • 🔍 The nearest neighbor (NN) model is a simple yet powerful non-parametric model with no training phase: at test time it uses the training instances and predicts according to the neighbors of the test instance.
  • 🔢 NN extends to 'k nearest neighbors' (k-NN), where a new instance is classified by the majority vote of its 'k' closest neighbors.
  • 🔧 The choice of the hyperparameter 'k' is crucial and can be optimized with a validation set by keeping the value that gives the best performance (see the sketch after this list).
  • 📈 A larger 'k' gives a smoother decision boundary, while a smaller 'k' is less robust to noise and outliers.
  • 💾 k-NN is a non-parametric approach that requires storing the whole training set, and searching for the nearest neighbors can be expensive.
  • 🔄 Libraries such as scikit-learn provide faster nearest-neighbor searches, sometimes using approximations or preprocessing the data into structures that speed up retrieval.
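
A minimal sketch of this workflow with scikit-learn's KNeighborsClassifier (the toy data and k = 3 are illustrative assumptions, not values from the lecture):

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    # Toy 2-D training data: two features per house, labels 0/2
    # (how quickly the house sold). Values are made up for illustration.
    X_train = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
                        [6.0, 9.0], [1.2, 0.8], [5.5, 8.5]])
    y_train = np.array([0, 0, 2, 2, 0, 2])

    # k-NN with k = 3 neighbors and Euclidean distance (the default metric).
    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(X_train, y_train)      # "fit" only stores the data: there is no training phase

    # Predict the class of a new, unseen instance by majority vote of its 3 neighbors.
    x_test = np.array([[1.1, 1.5]])
    print(knn.predict(x_test))     # -> array([0])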

Q & A

  • What is the first supervised learning model introduced in the script?

    - The first supervised learning model introduced is the 'nearest neighbor' model.

  • What two types of supervised learning models are covered in the script?

    - The two types are non-probabilistic models, including nearest neighbors and support vector machines, and probabilistic models, in particular naive Bayes models.

  • Why is the 'nearest neighbor' model chosen as the first model in the script?

    - It is chosen first because it is conceptually very simple yet quite a powerful model.

  • How does the 'nearest neighbor' model work for classification?

    - It classifies an instance according to its single closest neighbor for '1 nearest neighbor', or according to the majority vote of its 'k' closest neighbors for 'k nearest neighbors'.

  • What is the difference between '1 nearest neighbor' and 'k nearest neighbors'?

    - '1 nearest neighbor' classifies an instance according to its single closest neighbor, while 'k nearest neighbors' classifies an instance according to the majority vote of its 'k' closest neighbors.

  • How does the choice of 'k' affect the performance of the nearest neighbor model?

    - 'k' is a hyperparameter that strongly affects performance. Too small a 'k' is not very robust to noise and outliers, while too large a 'k' over-smooths the decision boundary; in the extreme k = n, the model always predicts the majority class of the training set.

  • What are the challenges associated with using the nearest neighbor model?

    - The challenges include having to store the whole dataset, the potentially expensive search for nearest neighbors, and degraded performance on high-dimensional data due to the 'curse of dimensionality'.

  • How can the nearest neighbor model be extended beyond classification?

    - It can be extended to regression problems, for example by taking the average of the nearest neighbors' values to predict a continuous target.

  • What theoretical guarantee does the script give for the nearest neighbor model?

    - If the training set tends to infinity and 'k' equals one, the test error is bounded by twice the optimal error.

  • How does the script suggest choosing the best 'k' for the nearest neighbor model?

    - The script suggests using a validation set: try different values of 'k' and keep the one that performs best on that validation set (a minimal sketch follows).
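
A minimal sketch of that selection procedure, assuming a scikit-learn-style setup (the candidate values of k, the split ratio, and the placeholder data are illustrative assumptions):

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    # Assumed: X, y are the full labeled dataset (features and classes).
    X, y = np.random.rand(200, 2), np.random.randint(0, 2, 200)   # placeholder data

    # Hold out a validation set (20% here, an arbitrary illustrative choice).
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

    best_k, best_acc = None, -1.0
    for k in [1, 3, 5, 7, 9, 15]:               # candidate values of the hyperparameter k
        model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
        acc = model.score(X_val, y_val)         # accuracy on the validation set
        if acc > best_acc:
            best_k, best_acc = k, acc

    print(f"selected k = {best_k} (validation accuracy {best_acc:.2f})")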

Outlines

00:00

📚 Introduction to Supervised Machine Learning

In this paragraph, the instructor gives an overview of the week devoted to supervised machine learning, and in particular introduces the nearest neighbor model. He mentions that different models will be explored, mostly linear ones, with a focus on classification, although these models can be extended to regression or density estimation. Two types of models will be covered: non-probabilistic models (such as nearest neighbor and support vector machines) and probabilistic models (such as naive Bayes).

05:02

🔍 The Nearest Neighbor Model

This paragraph explains the nearest neighbor (NN) model, a simple yet powerful non-parametric model. No training phase is needed: at test time, the training instances are used to make predictions according to the neighbors of the test instance. The example illustrates how to classify a point based on its proximity to other points of the same class. The idea is then extended to the k nearest neighbors (k-NN) model, where a new instance is classified by the majority vote of its k closest neighbors.

10:04

🔄 Impact of the Parameter K on Classification

This paragraph discusses how the parameter k in k-NN affects classification. A larger k gives a smoother decision boundary, while k=1 is less robust to outliers. The instructor suggests choosing k based on performance measured on a validation set. It is also mentioned that using all neighbors would call for weighting each neighbor inversely proportionally to its distance, so that the closest neighbors count the most (a sketch of this option follows).
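
As a hedged illustration, scikit-learn's KNeighborsClassifier accepts weights='distance', which weights each neighbor's vote by the inverse of its distance (the toy data below is made up):

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    # Made-up 2-D data with one "noisy" point labeled 1 inside the class-0 region.
    X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
                  [1.0, 1.0], [1.2, 0.9], [0.15, 0.2]])
    y = np.array([0, 0, 0, 1, 1, 1])     # the last point is the outlier

    # weights='distance' makes each neighbor's vote proportional to 1/distance,
    # so closer neighbors dominate; weights='uniform' is a plain majority vote.
    clf = KNeighborsClassifier(n_neighbors=5, weights="distance").fit(X, y)
    print(clf.predict([[0.1, 0.1]]))     # -> array([0]): the closest neighbors dominate the vote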

15:04

📉 Properties and Limitations of the Nearest Neighbor Model

The properties of the nearest neighbor model are covered, including its non-parametric nature and the need to store the whole training set. The instructor mentions that searching for the nearest neighbors can be time-consuming, especially for large datasets. The possibility of using distance functions other than the Euclidean distance for non-continuous data is also discussed, as are limitations such as degraded performance in very high-dimensional spaces due to the curse of dimensionality.
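
A hedged sketch of those options in scikit-learn (the parameter values and placeholder data are assumptions, not from the lecture): KNeighborsClassifier lets you swap the distance metric and pre-build a tree structure so neighbor search is faster than a brute-force scan.

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    X = np.random.rand(1000, 3)                  # placeholder continuous data
    y = np.random.randint(0, 2, 1000)

    # Manhattan (L1) distance instead of the default Euclidean metric, and a
    # KD-tree built once at fit time so each neighbor query avoids scanning
    # every training point.
    clf = KNeighborsClassifier(n_neighbors=5, metric="manhattan", algorithm="kd_tree")
    clf.fit(X, y)
    print(clf.predict(X[:3]))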

Keywords

💡Supervised learning

Supervised learning is a machine learning setting in which an algorithm learns from labeled data. In the script it is the main theme of the week: the instructor introduces several supervised learning models, notably the 'nearest neighbor' model. It is used to predict a category or a value from a set of features.

💡Nearest neighbor model

The nearest neighbor model is a simple yet powerful non-parametric learning algorithm. As explained in the script, it has no training phase and uses a distance function to predict the class of a new instance based on its proximity to the training instances.

💡Linear models

Linear models are supervised learning methods that look for a linear relationship between the input features and the output. In the script they are mentioned as the main family of models that will be studied, with the focus placed on classification.

💡Classification

Classification is a supervised learning task in which a category is predicted for new instances from labeled data. In the script, the example given is predicting how quickly a house will sell based on its features.

💡Non-probabilistic models

Non-probabilistic models, such as the nearest neighbor model, do not model the probability of a class. They are mentioned in the script as one of the types of supervised learning models that will be covered, with support vector machines as another example.

💡Probabilistic models

Probabilistic models, such as naive Bayes, model the probability of a class given the features. They are mentioned in the script as the second main type of supervised learning model that will be introduced.

💡Hyperparameters

Hyperparameters are parameters that must be set before learning and are not learned from the data. In the script, 'k' in the nearest neighbor model is a hyperparameter that has to be tuned to optimize the model's performance.

💡Validation set

A validation set is a dataset used to tune a model's hyperparameters. In the script, it is mentioned as the way to select the best value of 'k' for the nearest neighbor model.

💡Euclidean distance

The Euclidean distance is a measure of the distance between two points in a multi-dimensional space. In the script it is used to find the nearest neighbor, although other distance functions can be used for other types of data.

💡k nearest neighbors (k-NN)

k-NN is the extension of the nearest neighbor model in which, instead of a single neighbor, the majority vote among the 'k' closest neighbors is used. In the script, it is discussed as the way to classify new instances according to the vote of their 'k' nearest neighbors.

Highlights

Introduction to supervised learning and the nearest neighbor model.

Focus on linear models for classification with potential extension to regression or density estimation.

Discussion of non-probabilistic models such as nearest neighbor and support vector machines.

Introduction of probabilistic models, specifically naive Bayes.

Supervised learning involves learning a function from features to a target classification.

The nearest neighbor approach as a non-parametric model without a training phase.

Explanation of how the nearest neighbor model predicts based on the closest training instance.

Concept of extending the nearest neighbor approach to k-nearest neighbors for classification.

Importance of selecting the right value for k, the number of neighbors, using a validation set.

Demonstration of how k-nearest neighbors can be more robust than 1-nearest neighbor.

Visualization of decision boundaries with different values of k using scikit-learn.

Properties of nearest neighbors, including its non-parametric nature and the requirement to store the entire training set.

Discussion on the computational expense of searching for nearest neighbors and ways to optimize it.

Adaptability of nearest neighbors to different types of data and distance functions.

Theoretical guarantee that the 1-nearest-neighbor test error is bounded by twice the optimal error for very large training sets.

Challenges of using nearest neighbors in high-dimensional spaces due to the curse of dimensionality.

Summary of nearest neighbors as a simple yet powerful model for both classification and regression tasks.

Transcripts

00:01

In this capsule we're going to introduce the outline of what we're going to do this week, which is supervised learning. In addition, I'll also introduce the first supervised learning model of this semester, called the nearest neighbor model.

00:15

So we're going to introduce different models for supervised learning. The models are mostly going to be linear models, and we're going to focus on classification, but as you'll see, many of these models can actually be extended to regression or even density estimation. We're going to focus on two types of models: one which you may already be more familiar with, the non-probabilistic models, including things like nearest neighbor and support vector machines; the second is probabilistic models, in particular we're going to introduce the naive Bayes model to talk about lots of the probabilistic modeling concepts.

00:54

Okay, so recall that in supervised learning you have a set of characteristics, or features, and you have a target, which we called y. Here we're in a classification case, and so our y are not real valued: it's not the price of a house, it's actually whether or not your house will sell quickly. There are three possibilities: either your house will not sell quickly, your house will sell semi-quickly, or your house will sell very quickly, so zero, one and two. The task is that we want to learn a function, or a model, f that goes from these characteristics to a particular classification, zero, one or two. Of course, what we want to do with this function, once we've learned it using training data, is to apply it to test data, that is, to predict the sellability of new houses that we've never seen before. So this is a quick recall of the supervised learning setting, and this week we're mostly going to focus on the actual models, that is, these functions f that allow us to predict the sellability of a house given its characteristics.

02:24

The first model we're going to introduce is the nearest neighbor approach, or NN. I chose it first because it's conceptually very simple, yet it's quite a powerful model. The nearest neighbor approach belongs to the class of non-parametric models. In short, and this may make more sense in a couple of weeks, there is no training phase for the nearest neighbor model. What you do at test time is that you actually remember the training instances and you predict according to the neighbors of the test instance.

03:04

Okay, so let's have a look at what that means. Here I have data that belongs to either the green class or the blue class, so I have a binary classification problem. In addition, each datum lives in two dimensions, so it has an x1 component and an x2 component. Now, if I were to ask which class this point corresponds to, it's likely that our human intuition would tell us that it belongs to the blue class, probably because it's very close to other blue instances and further from the green instances, or the green data. Now imagine that I show you a point that's closer to the middle of what you perceive as being the boundary between these two classes: how would you classify this point? One thing you could do is look at its closest neighbor, the point it is closest to. This is effectively what nearest neighbor does. So we call this point x_j, and we say that we're going to look at the closest point to x_j; this closest point here I've called x_i'. Then we predict the class of x_j to be the same as the class of its nearest neighbor x_i', so in this case we would predict that x_j belongs to the green class. This is called one nearest neighbor: we simply classify an instance according to its first nearest neighbor. Mathematically, you can think of it as follows: first you find the index of the point, which I call i', that is the closest to your query point x_j. So you look over all points and you keep the index of the one that is the closest; it's this one here in the picture. Then you say that the class of your test instance x_j, so its class y_j, is simply going to be equal to the class of its closest point, so here it's y_i', and it's the green class.

05:28

Okay, now perhaps intuition says that looking at one neighbor is a good idea, but that you should look at more neighbors. We can actually extend this idea of one nearest neighbor classification to k nearest neighbor classification. In this case a new instance x_j, for example, will be classified according to the majority vote of its k nearest neighbors. So let's look at this example: imagine that we assume that k is equal to five. In general k is a hyperparameter that you will need to set, and of course the performance of your algorithm can strongly depend on this hyperparameter. So here we say that we're looking at the five closest neighbors; I've identified them as being these five points. We know that three of them are blue and two of them are green, so if we take a majority vote it's three against two in favor of the blue class. So according to k-NN with k equal to five, so 5-NN, this point x_j should belong to the blue class. Notice that the decision, or rather the prediction, of 5-NN would be different from the prediction of 1-NN in this particular case. Also, here on the left-hand side I've written down, using a more formal mathematical formula, what I've just said in words.
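
As a rough sketch (not code from the lecture), the 1-NN and k-NN prediction rules just described can be written in a few lines of NumPy; the toy arrays are illustrative assumptions:

    import numpy as np

    def knn_predict(X_train, y_train, x_query, k=1):
        """Predict the class of x_query by majority vote of its k nearest neighbors.

        k=1 reproduces the 1-NN rule: i' = argmin_i ||x_query - x_i||, prediction y_i'.
        """
        dists = np.linalg.norm(X_train - x_query, axis=1)  # Euclidean distance to every training point
        nearest = np.argsort(dists)[:k]                    # indices of the k closest points
        votes = np.bincount(y_train[nearest])              # count class labels among the neighbors
        return np.argmax(votes)                            # majority class

    # Toy data: class 0 ("green") and class 1 ("blue"), made up for illustration.
    X_train = np.array([[0.0, 0.0], [0.2, 0.4], [0.3, 0.1],
                        [2.0, 2.0], [2.2, 1.8], [1.9, 2.3]])
    y_train = np.array([0, 0, 0, 1, 1, 1])

    x_query = np.array([1.0, 1.1])
    print(knn_predict(X_train, y_train, x_query, k=1))  # the single closest neighbor decides
    print(knn_predict(X_train, y_train, x_query, k=5))  # majority vote among 5 neighbors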

06:59

Okay, so perhaps now you say that this sounds reasonable: if I have a new point, I'm going to look at its neighbors and base my classification on that. But really, how many neighbors should I be looking at? How do I set k in practice? In practice, what you could do is pick a validation set, which is something I talked about in the capsules from last week, and then evaluate the performance of different values of k and simply pick the k that yields the best performance on your validation set. Alternatively, you could say something like "I'm going to use all neighbors". If you use all neighbors, then you would probably want to weight the neighbors by how close they are to your new instance. So in this particular case, maybe I'm looking at all these neighbors, but of course the closest neighbors should intuitively play the biggest role, and so their weight is going to be the largest. You could think that the weight could be inversely proportional to the distance: the points that are the closest, with the smallest distance, have the higher weight.

08:27

Okay, so now I want to build some of your intuitions about what happens with k in different cases. Here's the first example: I still have my dataset of green points and blue points. In addition, you'll notice that I've colored the background of these plots: when the background is green, it means that if a new instance appears in that part of the space it would be labeled as green, and of course if a new instance appears in a place where the background is blue, then it would be classified, or predicted, as being from the blue class. What I'm comparing on the left and on the right is a nearest neighbor approach using the three nearest neighbors, so k equal to three, versus a nearest neighbor approach using only a single nearest neighbor, so k equal to one. One thing I want you to notice is that the decision boundary is much smoother when k is higher. Here you can see that the decision boundary at certain places takes very sharp angles. It's unclear, just looking at this, which one you should pick, but at least intuitively, given the data and the geometry of the data we have access to, there's no clear reason why there should be such a sharp angle at this point and at this point. So, at least intuitively speaking, it seems like having a higher number of neighbors may make more sense.

10:13

Before I move on to the next slide, let me just mention that I obtained this plot using the scikit-learn library. scikit-learn is a Python toolbox for machine learning; it's great, we'll be using it fairly thoroughly in the class, and we'll actually spend a full week, in a couple of weeks, exploring the various possibilities and the various models that exist in scikit-learn.
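
Background-colored plots like the ones described here can be reproduced with a small meshgrid sketch such as the one below (not the instructor's code; the toy data and grid resolution are arbitrary assumptions):

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.neighbors import KNeighborsClassifier

    # Toy two-class, 2-D data (illustrative only).
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal([0, 0], 0.5, (30, 2)), rng.normal([2, 2], 0.5, (30, 2))])
    y = np.array([0] * 30 + [1] * 30)

    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    for ax, k in zip(axes, [3, 1]):                  # compare k = 3 (left) and k = 1 (right)
        clf = KNeighborsClassifier(n_neighbors=k).fit(X, y)

        # Color the background by the class predicted at every point of a fine grid.
        xx, yy = np.meshgrid(np.linspace(-2, 4, 200), np.linspace(-2, 4, 200))
        zz = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
        ax.contourf(xx, yy, zz, alpha=0.3)
        ax.scatter(X[:, 0], X[:, 1], c=y)
        ax.set_title(f"k = {k}")
    plt.show()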

10:40

Okay, here's another example, and now we're going to look at the robustness of the classifier. In this particular case I reuse my dataset from the previous slide, but I also added what I would call a noisy instance, or an outlier. What I compare again is, on the left, k equal to 3 and, on the right, k equal to 1. It seems like k equal to 1 is a lot less robust to this particular outlier: in particular, there is now a large region here, which I would say clearly belongs to the green class, that would actually be classified as blue. We see that the effect is much smaller for the case when k is equal to three: this zone actually stays green and there's just a small zone here that would become blue. And again, I think it would be difficult to explain exactly why that should be.

11:37

So, based on these two slides, one of the interesting conclusions you can draw is that, in general, larger k's may be helpful. But of course it's not that larger is always better: in fact, if k were equal to n, then we would always predict the same thing, and in this case, since there are more green points than blue points, we would always predict the green class. So again, k is a hyperparameter which you should set using a validation set. We know that k equal to one doesn't always work quite as well as you might think, and we also know that k equal to n will have issues, and so in practice you're going to want to try different k's and keep the one that does the best, again, on this validation set.

12:35

Okay, so a few properties of nearest neighbors. First, as we said before, it's a non-parametric approach, and as we've made very clear over the last couple of slides, it requires storing the whole dataset. That means that if your dataset is too big, for example it does not fit on a single computer, then using nearest neighbor will be very difficult. Further, searching for the nearest neighbor of a new datum can be expensive: in particular, a naive search means that you're going to look at the distance between the new instance and all of your other instances, and so the order of that operation is O(n), where n is the size of your training data. If you have a large test set, then that search can become very expensive. What scikit-learn and other libraries allow you to do is to have faster searches, and some of these searches are approximate. In general, the intuition behind some of these faster searches is that you pay a fixed cost at the beginning to pre-process your training data into a particular structure that allows much faster retrieval of nearest neighbors. So that's one thing, and that's an option that exists in scikit-learn.
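
A hedged sketch of that pre-processing idea using scikit-learn's KDTree, one of the tree structures the library exposes (the data is random placeholder data):

    import numpy as np
    from sklearn.neighbors import KDTree

    X_train = np.random.rand(10_000, 2)       # placeholder training data

    # Fixed up-front cost: build a k-d tree over the training points once.
    tree = KDTree(X_train)

    # Each query is then much cheaper than a brute-force O(n) scan.
    X_test = np.random.rand(5, 2)
    dist, ind = tree.query(X_test, k=3)       # distances to and indices of the 3 nearest neighbors
    print(ind)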

13:59

The second thing is that in my examples so far I've imagined that my data was continuous, but nearest neighbor can also be used with non-continuous data. In particular, when you're using continuous data you're probably going to want to use the Euclidean distance function, and when you're using other types of data, what you would want to do is change the distance function, the thing that allows you to calculate the distance between different neighbors, to a non-Euclidean distance. And again, scikit-learn and other libraries have various standard options for doing that.

14:40

Okay, a few more properties. First, we know that for nearest neighbor, if your training set tends to infinity and k is equal to one, then your test error will be bounded by twice the optimal error. That can be reassuring, in the sense that you know that for very large datasets your test error is going to be close to the optimal error that a machine learning model could obtain. Now, of course, it's interesting and worthwhile to know and to remember, but it does not tell us very much about what happens when n does not tend to infinity, and especially when n is small. If you have a small dataset then this of course does not apply, and it's not even clear that you could derive much intuition for small datasets by looking at this particular theoretical guarantee.
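
Stated as a formula, this is the classical asymptotic 1-NN bound usually attributed to Cover and Hart; the notation below is an assumption, not taken from the slides:

    % As the training set size n tends to infinity, the expected 1-NN test error
    % is at most twice the optimal (Bayes) error.
    \lim_{n \to \infty} \operatorname{err}_{1\text{-NN}} \;\le\; 2 \, \operatorname{err}^{*}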

15:48

The second thing I'd like to mention is that nearest neighbors may not perform very well with high-dimensional inputs, due to the curse of dimensionality. The intuition here is that in very high-dimensional spaces you will need lots and lots of data; if you don't have lots of data, then the neighbors of a new instance may actually be very far from it, and that of course can be an issue, because the intuition behind nearest neighbors is that we look at a local neighborhood, since we imagine that locally things are similar. Now, if we need to look at houses that, in my high-dimensional space, are very far away, then you can imagine that the performance of nearest neighbor would degrade tremendously. This argument relies on the fact that you have uniform data, and so, again, this is interesting, but in practice you may still want to try it out even with high-dimensional data and see how well nearest neighbor works on your particular dataset.

17:06

Okay, so, nearest neighbor summary. Again, nearest neighbors is a non-parametric approach: it does not require fitting parameters using training data; instead it uses the training data directly to predict the classes of test instances, or test data. At least in its vanilla formulation, the only hyperparameter that you have to set is the number of neighbors, and again you can set that using a validation set. Now you may say, "yes, but in scikit-learn there are lots of other parameters that I can set, such as which distance function I'm going to use", and that's absolutely true. So I think in practice you should look at these various parameters, and again you can set them all using a validation set, but at least conceptually speaking, the only hyperparameter we really have to set is the number of neighbors.

18:08

Finally, I'll mention something I noted in one of the initial slides: here I'm in a classification setting, and so I introduced nearest neighbor for classification, but nearest neighbors can be extended to regression problems and even to density estimation. For regression problems, we could imagine that, again, you would look for the nearest neighbors of a particular test datum, but then, instead of doing a majority vote, you would, for example, take the average over the nearest neighbors. So, for example, if you're trying to predict the price of houses with the nearest neighbor approach, you would look at your test house, find its closest neighbors according to the particular value of k you chose, and then simply take the average price of its neighboring houses; that would give you a prediction for the price of this new test house.
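
A minimal sketch of that regression variant with scikit-learn's KNeighborsRegressor (the feature values and k = 3 are illustrative assumptions):

    import numpy as np
    from sklearn.neighbors import KNeighborsRegressor

    # Made-up house features (e.g. surface area, number of rooms) and prices.
    X_train = np.array([[50, 2], [60, 3], [80, 3], [100, 4], [120, 5]])
    y_train = np.array([150_000, 180_000, 240_000, 310_000, 380_000])

    # Prediction = average price of the k = 3 nearest houses (still no training phase).
    reg = KNeighborsRegressor(n_neighbors=3)
    reg.fit(X_train, y_train)
    print(reg.predict([[90, 4]]))   # average of the 3 closest houses' prices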

