Naive Bayes classifier: A friendly approach
Summary
TLDR: In this video, Luis Serrano explains the Naive Bayes classifier, a fundamental concept in probability and machine learning. He uses the example of building a spam detector to illustrate how Bayes' theorem is applied. The video covers calculating the probability of spam emails based on keywords like 'buy' and 'cheap'. It also discusses the 'naive' assumption of independence between features, which simplifies calculations. Luis provides a detailed walkthrough of how to apply Bayes' theorem and the naive assumption to estimate probabilities when not all data points are available.
Takeaways
- 📝 Bayes' Theorem is a fundamental concept in probability and machine learning, used to calculate the probability of an event based on prior knowledge of conditions.
- 📝 Naive Bayes is an extension of Bayes' Theorem that simplifies calculations by assuming that features are independent of each other within each class (spam or not spam), even when they might not be.
- 📝 The video uses the example of a spam detector to explain how Naive Bayes can be applied to classify emails into spam or not spam based on the presence of certain words.
- 📝 The script demonstrates how to calculate the probability of an email being spam if it contains specific words, like 'buy' and 'cheap', using Bayes' Theorem.
- 📝 It explains the concept of conditional probability and how it is used in the context of Naive Bayes to determine the likelihood of spam based on email content.
- 📝 The video highlights the importance of making naive assumptions about independence between features to simplify the calculations and make the model more manageable.
- 📝 The script shows how to handle situations where data is sparse or certain combinations of features do not appear in the training set.
- 📝 It emphasizes that even with the naive assumption of independence, Naive Bayes classifiers can perform well in practice for many classification tasks.
- 📝 The video concludes by summarizing the process of filling out a probability table and using it to calculate the likelihood of an email being spam based on multiple features.
- 📝 It challenges viewers to work through the math behind Naive Bayes and emphasizes that every probability in it comes from dividing one count by another.
Q & A
What is the Naive Bayes classifier?
-The Naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem with strong (naive) independence assumptions between the features.
What is Bayes' theorem?
-Bayes' theorem is a fundamental principle in probability theory that describes the probability of an event based on prior knowledge of conditions that might be related to the event.
How does the Naive Bayes classifier work for spam detection?
-For spam detection, the Naive Bayes classifier works by calculating the probability of an email being spam based on the presence of certain keywords or features that are indicative of spam.
What is the significance of the word 'buy' in the context of the spam detector example?
-In the spam detector example, the word 'buy' is chosen as a feature that is likely to appear more frequently in spam emails compared to non-spam emails.
How is the probability of an email being spam calculated if it contains the word 'buy'?
-The probability is calculated by dividing the number of spam emails containing the word 'buy' by the total number of emails containing the word 'buy'.
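The count-based calculation in this answer can be sketched in a few lines of Python. This is a minimal illustration, not code from the video: the counts (20 spam and 5 non-spam emails containing 'buy', 15 spam and 10 non-spam containing 'cheap') are the video's, and `spam_given_word` is a hypothetical helper name. `fractions.Fraction` keeps the ratios exact.

```python
from fractions import Fraction


def spam_given_word(spam_with_word: int, ham_with_word: int) -> Fraction:
    """P(spam | word): spam emails containing the word divided by all emails containing it."""
    return Fraction(spam_with_word, spam_with_word + ham_with_word)


# Counts from the video's 100-email example (25 spam, 75 non-spam)
p_buy = spam_given_word(20, 5)     # 20 / (20 + 5)
p_cheap = spam_given_word(15, 10)  # 15 / (15 + 10)

print(p_buy, float(p_buy))      # 4/5 0.8
print(p_cheap, float(p_cheap))  # 3/5 0.6
```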
What is the role of the word 'cheap' in the spam detection example?
-Similar to 'buy', 'cheap' is another feature that might be more common in spam emails, and its presence is used to calculate the likelihood of an email being spam.
What happens when you apply Naive Bayes to multiple features, like both 'buy' and 'cheap'?
-When applying Naive Bayes to multiple features, you calculate the combined probability of an email being spam given the presence of all those features, assuming independence between them.
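A minimal sketch of the multi-feature calculation, following the procedure the video walks through: for each class, multiply the class size by each word's within-class frequency to estimate how many emails contain all the words, then normalize. The counts are the video's; `naive_bayes_spam_prob`, `CLASS_TOTALS`, and `WORD_COUNTS` are hypothetical names chosen for illustration.

```python
from fractions import Fraction

# Counts from the video's example: 100 emails, 25 spam and 75 ham (non-spam)
CLASS_TOTALS = {"spam": 25, "ham": 75}
# Number of emails in each class that contain each word
WORD_COUNTS = {
    "buy": {"spam": 20, "ham": 5},
    "cheap": {"spam": 15, "ham": 10},
    "work": {"spam": 5, "ham": 30},
}


def naive_bayes_spam_prob(words: list[str]) -> Fraction:
    """P(spam | all `words` appear), assuming the words are independent within each class."""
    estimates = {}
    for label, total in CLASS_TOTALS.items():
        # Estimated number of emails in this class containing every word:
        # total * P(word1 | class) * P(word2 | class) * ...
        estimate = Fraction(total)
        for word in words:
            estimate *= Fraction(WORD_COUNTS[word][label], total)
        estimates[label] = estimate
    # Normalize: the spam share of all (estimated) emails containing the words
    return estimates["spam"] / (estimates["spam"] + estimates["ham"])


p = naive_bayes_spam_prob(["buy", "cheap"])
print(p, f"{float(p):.3%}")  # 18/19 94.737%
```

With a single word the estimate reduces to the plain count ratio (4/5 for 'buy'); adding 'work' gives 9/10, the 90% the video reaches.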
Why is the assumption of independence between features considered 'naive'?
-The assumption of independence is considered 'naive' because in reality, features are often not independent. However, this simplification allows for easier calculations and can still yield good results.
How does the Naive Bayes classifier handle situations where certain combinations of features have not been observed in the training data?
-The classifier uses the assumption of feature independence to estimate probabilities for unseen combinations, allowing it to make predictions even with limited data.
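The independence estimate behind this answer is a single product. The numbers below are the video's (a separate 100-email set where 5 emails contain 'buy' and 10 contain 'cheap', plus the 75 non-spam and 25 spam emails of the main example); `expected_joint_count` is a hypothetical helper, not something from the video.

```python
from fractions import Fraction


def expected_joint_count(total: int, count_a: int, count_b: int) -> Fraction:
    """Expected number of emails containing both words if the words are independent:
    total * (count_a / total) * (count_b / total)."""
    return total * Fraction(count_a, total) * Fraction(count_b, total)


# 100 emails, 5 with "buy", 10 with "cheap", no overlap observed
print(expected_joint_count(100, 5, 10))  # 1/2, i.e. 0.5% of 100 emails

# The 75 non-spam emails: 5 with "buy", 10 with "cheap", none with both
print(expected_joint_count(75, 5, 10))   # 2/3
# The 25 spam emails: 20 with "buy", 15 with "cheap"
print(expected_joint_count(25, 20, 15))  # 12
```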
What is the importance of the dataset size when using the Naive Bayes classifier?
-A larger dataset can provide more accurate probabilities for the features, but the Naive Bayes classifier can still perform well with smaller datasets due to its simplicity and the assumption of feature independence.
Can the Naive Bayes classifier be improved by considering feature dependencies?
-Yes, the classifier can potentially be improved by using more sophisticated models that capture feature dependencies, but this comes at the cost of increased complexity and computational requirements.
Outlines
📊 Introduction to Naive Bayes Classifier
Luis Serrano introduces the concept of the Naive Bayes classifier, explaining its importance in probability and machine learning. He clarifies that Bayes' theorem is about calculating the probability of an event given some prior knowledge, and Naive Bayes extends this idea by making simplifying assumptions to handle complex scenarios. The example of building a spam detector is used to illustrate the concept, where the presence of certain words in emails is correlated with them being spam or not. The video uses the word 'buy' to demonstrate how to calculate the probability of an email being spam based on its content.
🔍 Handling Overlapping Features in Naive Bayes
The script delves into the challenge of handling multiple features, such as the words 'buy' and 'cheap', in a Naive Bayes classifier. It points out that in the small dataset no non-spam email contains both words, so counting alone would label any email containing both as 100% spam, which is far too confident. The proposed solution is to assume these features are independent and estimate how often they co-occur from their individual probabilities, despite the lack of direct evidence in the data.
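A short sketch of the problem and the fix, using the video's counts: the raw ratio over emails that contain both words is pushed to 100% by a single zero, while the naive estimate gives a finite count for both classes. The variable names are illustrative only.

```python
from fractions import Fraction

# Emails observed to contain BOTH "buy" and "cheap" in the video's dataset
observed_spam_both, observed_ham_both = 12, 0

raw = Fraction(observed_spam_both, observed_spam_both + observed_ham_both)
print(raw)  # 1, i.e. 100% spam, driven entirely by the zero count for ham

# Naive estimate per class: class size * P(buy | class) * P(cheap | class)
est_spam_both = 25 * Fraction(20, 25) * Fraction(15, 25)  # 12
est_ham_both = 75 * Fraction(5, 75) * Fraction(10, 75)    # 2/3
naive = est_spam_both / (est_spam_both + est_ham_both)
print(naive, f"{float(naive):.3%}")  # 18/19 94.737%
```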
📉 Applying Naive Assumptions to Improve Calculations
Luis explains how the Naive Bayes classifier uses the assumption of independence between features to simplify calculations. By assuming that the presence of one word does not affect the presence of another, the video demonstrates how to estimate the probability of an email being spam based on multiple keywords. It shows how to calculate this probability by multiplying the probabilities of individual words appearing in spam emails and compares it to the same calculation for non-spam emails.
📈 Expanding Naive Bayes to More Features
The script extends the discussion to include more features, such as the word 'work', in the Naive Bayes classifier. It shows how to incorporate additional features into the model by making the same naive independence assumption and calculating the combined probability of multiple words appearing in an email. The video explains how some features can increase the likelihood of an email being spam, while others can decrease it, and how the Naive Bayes classifier combines these to make a prediction.
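One way to see why some words raise the spam probability and others lower it is to compare each word's frequency in spam with its frequency in non-spam. The ratio P(word | spam) / P(word | ham) is our framing, not a quantity the video names; the counts are the video's and `likelihood_ratio` is a hypothetical helper.

```python
from fractions import Fraction

SPAM_TOTAL, HAM_TOTAL = 25, 75
# word -> (spam emails containing it, ham emails containing it), from the video
COUNTS = {"buy": (20, 5), "cheap": (15, 10), "work": (5, 30)}


def likelihood_ratio(word: str) -> Fraction:
    """P(word | spam) / P(word | ham); above 1 favors spam, below 1 favors ham."""
    in_spam, in_ham = COUNTS[word]
    return Fraction(in_spam, SPAM_TOTAL) / Fraction(in_ham, HAM_TOTAL)


for word in COUNTS:
    ratio = likelihood_ratio(word)
    direction = "toward spam" if ratio > 1 else "toward ham"
    print(f"{word}: ratio {ratio!s} ({direction})")
# buy: ratio 12 (toward spam)
# cheap: ratio 9/2 (toward spam)
# work: ratio 1/2 (toward ham)
```

In odds form the same calculation becomes prior odds times ratios: (25/75) × 12 × 9/2 = 18, i.e. a spam probability of 18/19 ≈ 94.737% for 'buy' and 'cheap'; multiplying by 1/2 for 'work' gives odds 9, i.e. 9/10 = 90%. This is an equivalent rearrangement, not how the video presents it.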
📝 Wrapping Up Naive Bayes Explanation
In the final section, Luis summarizes the process of using Naive Bayes for spam detection, emphasizing the simplicity of calculating probabilities by dividing one number by another. He invites viewers to engage with the content by subscribing, liking, sharing, and commenting with questions or suggestions for future videos. The video concludes with a reminder to follow Luis on Twitter for more mathematical insights.
Keywords
💡Naive Bayes Classifier
💡Bayes' Theorem
💡Spam Detector
💡Conditional Probability
💡Feature
💡Independence Assumption
💡Dataset
💡Probability
💡Ham
Highlights
Naive Bayes classifier is based on Bayes' theorem and is useful in machine learning for tasks like spam detection.
Bayes' theorem is about calculating the probability of an event given some information about another event.
Naive Bayes simplifies calculations by making assumptions about the independence of events.
An example of building a spam detector using email data is provided.
The word 'buy' is studied for its correlation with spam emails.
It's found that 80% of emails containing 'buy' are spam.
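The word 'cheap' is also studied: 60% of the emails containing it are spam.
Counting only the emails that contain both 'buy' and 'cheap' gives a 100% spam probability, because none of the non-spam emails in the small dataset contain both words; the video treats this as unreliably strong.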
The naive assumption is made that words 'buy' and 'cheap' are independent.
The independence assumption allows for easier calculations even with limited data.
The concept of 'ham' is introduced as a term for non-spam emails.
The video explains how to fill out a probability table using Bayes' theorem.
The importance of normalization in calculating final probabilities is discussed.
Naive Bayes can handle many features by assuming independence between them.
The video concludes by emphasizing that Naive Bayes combines multiple features into a model for spam detection.
The presenter invites viewers to engage by subscribing, liking, sharing, and commenting for more content.
Transcripts
I am Luis Serrano and this video is
about the naive Bayes classifier now
Bayes' theorem is one of the most important
things in probability and it's very
useful in machine learning you may have
seen it as a complicated formula
regarding some ratios of probabilities I
like to see this a little further and I
like to think of it as what is the
probability of something happening given
that we know some information that
something else happened and then naive
Bayes is an extension of this which
basically says ok once I have too many
events and I don't know how to handle
them are there any naive assumptions
that I can make on them to make the math
work easier and so this is what we're
gonna see today so let's start with an
example let's say we want to build a
spam detector because we are tired of
seeing a lot of spam email in our inbox
and we want to sort it properly so how
do we build it we build it with previous
data let's say our previous data is a set
of a hundred emails and when we look at
them carefully there are 25 of them that
are spam and 75 are not spam
so what we're gonna do is we're gonna
try to pick properties of the emails
that we think may correlate with them
being spam or not spam so let's pick one
let's say we're gonna study the
appearance of the word buy so we think
that emails that contain the word buy
are more likely to be spam than not spam
so let's study that let's see how many
emails that are spam have the word buy and
it turns out there's 20 of them and let's
see how many emails that are not spam
have the word buy on them
so there's five so let's forget about
all the others and just look at the emails
with the word buy and here's a quiz the quiz says
if an email contains the word buy then
what is the probability that this email
is spam given the data that we have and
the options are 40% 60% 80% and a
hundred percent so feel free to pause
the video and think about it yourself
given the data that we have what is the
probability that if an email contains
the word buy then it is spam this is a conditional
probability so I'll tell you the answer
the answer is if we look at the emails
that contain the word buy well there's
20 spam and five that are not
so that makes an 80/20 split and so from
this data we can see that of the
emails that contain the word buy 80% of them
are spam so the probability we conclude
again just from this data that the
probability is gonna be 80 percent that
it's spam if it contains the word buy
therefore we associate the condition
containing the word buy with the
probability 80 percent and that is
exactly what Bayes' theorem is you may
have seen it in a different way as
a formula but this is really what
it is so just for fun let's do it for a
different property for a different word
let's say that we think that the word
cheap may also be a good way to tell if
an email is spam so let's study this
word we count how many times the word
cheap appears in spam emails it's
in 15 of them and among the non-spam emails
ten of them have the word cheap so we
forget about the rest and quiz again if
an email contains the word cheap what's the
probability it's spam 40 60 80 or 100 again
feel free to pause the video I'll tell
you the answer the answer is 60% because
if you look at the split there are 15
spam and 10 non-spam among the ones that
contain the word cheap so that's a 60/40
split and therefore the solution is 60%
so we applied Bayes' theorem for two words
and obtained 80 and 60 now here's where
things get complicated what if we want
to apply it for both words at the same
time so we want to see what's the
probability of an email being spam if it
contains both the word buy
and the word cheap well we can do the
same thing right we can count how many
emails contain the word by and then look
at how many contain the word cheap and
then actually look at the overlap and so
there's actually 12 emails that contain
the words buy and cheap so that's some
good data and then let's look among the
non-spam emails let's say that there are these
five that contain the word buy and these
ten contain the word cheap so actually
there's none that contain both words but
that's okay we're gonna do the same
thing as before we have 12 spam emails
and zero non-spam emails that contain
the words buy and cheap so easiest quiz in the
world if an email contains the words
buy and cheap what's the
probability it's spam forty sixty eighty or a
hundred and this should be easy right
because there are twelve spam emails that
contain both words and zero non-spam emails that
contain both words and this is a 100% 0%
split so the answer is 100% and we are
done
right well maybe you're being skeptical
like me right it seems like that's a
little too much like any classifier that
tells you something 100% is too strong
so where does the problem lie well the
problem lies here that we had 12 emails
that contained both words buy and cheap
and that's not bad but here we had 0
emails so among the non-spam emails there
are zero emails that contain the words
buy and cheap and so that's just
unfortunate among our data we don't find
the two words but it's possible that
these two words could appear right so we
can't restrict ourselves to not have a
classifier with the words buy and cheap
just because in our small dataset the
words don't appear so what could we do
well one solution could be just maybe
collect more data like go through a lot
more emails until we find the words buy
and cheap and then do Bayes' theorem on
those but what if we just can't what if
we can't collect more data and we have
to make do with the data that we have so
let's think we have this situation what
would you do if you have the situation
and you have to sort of imagine how many
emails would contain the words buy and
cheap so what we're gonna do is try to
guess the number try to come up with a
sensible amount of emails that would
contain the words buy and cheap even if
we found none so let's look at a
slightly different dataset let's say we have a
hundred emails so this is a different
set than the first one we have a hundred
emails and let's say that five contain
the word buy and let's say that ten
contain the word cheap and they don't
overlap however what do you think would
be a sensible number of emails that
would contain the words buy and cheap so
let's think 5 out of 100 is 5% so 5% of
the emails contain the word buy and 10
out of 100 is 10 percent so 10 percent of
the emails contain the word cheap so
in an ideal world where everything behaved
nicely how many emails would contain the
words buy and cheap well what is
ten percent of five percent it's zero
point five percent so why don't we just
assume that 0.5% of the emails contain
the words buy and cheap so we can sort of
imagine that there is half an email that
contains the words buy and cheap and since
all we're doing is math it doesn't
really matter that there's half an email
this will work out on our formulas what
we did is an assumption we assumed that
the words buy and cheap are independent
they may not be right it could be that
containing the word buy makes it easier
to contain the word cheap because you're
talking about a product they say buy
cheap something or it could be the
opposite that if one appears that sort
of forces the other one not to
appear or makes it less likely to appear so
it's quite a strong assumption
as a matter of fact many people would
say that's a naive assumption assuming
that two variables are independent when
they may not be is very naive however
that's what our algorithm is based on
because it turns out that if we make
these assumptions things still work well
and it makes our math much much easier
because now we don't have to collect
thousands of emails we can collect these
100 and from the number of appearances of buy
and the number of appearances of cheap we can
sort of cook up the number of appearances
of buy and cheap together so let's do that let's
go back to our data we had 25 spam
emails and 20 of them had the word buy
and that's four-fifths and 15 of them
had the word cheap that's 3/5 so we can
imagine that the product of this is 12
divided by 25 so we could assume that on
average 12 emails here out of 25 would
contain the words buy and cheap so in
order to find the actual number we
multiply by 25 and we get that 12 emails
have the words buy and cheap so that was
kind of lucky that we actually did find
12 we're not gonna be that lucky in the
other case but we can still do it right
so we have 75 emails five of them have
buy that's 1/15
of them then ten of them have the word
cheap that's 2/15 of them and the
product of these two fractions again
assuming they're independent is two
divided by 225 so that's the fraction of
emails that contain the words buy and
cheap so to find the actual number we
multiply it by 75 and we get 2/3 so in
here we have 2/3 of an email contains
the words buy and cheap and that's fair
let's work with that so we go back to
our data and on the left we have 12
emails that contain the word buy and
cheap and on the right we have 2/3 of an
email that contain the word buy and
cheap and we can do math with these ones
right because now the quiz says if an
email contains the words buy and cheap
what is the probability that is spam so
let's do some math what is the split
among 12 and 2/3 well let's take the
spam ones that's 12 and let's take the
total number of emails that contain buy
and cheap and that's 12 plus 2/3 because
there's 12 spam and 2/3 that are not
spam so we can find the ratio between
these and by the way if you've seen the
formula for Bayes' theorem there's a
ratio and it's precisely this one so
what do we do with this fraction well
it's 36 over 38 which is 18 over 19 in
lowest terms or about 94.737
percent because the split is 94.737
and 5.263 therefore our final
answer is that the words buy and cheap
give us a probability of 94.737 percent of being
spam that means if we have an email with
both of those words it's 94.737 percent likely
to be spam and that is precisely the
naive Bayes classifier so the naive Bayes
classifier basically is a combination
of Bayes' theorem and the naive assumption
that two events are going to be
independent when they may not be but
that naive assumption makes the math
much much easier so let's do a little
summary what we're really doing is we're
gonna fill out this table and some
places of the table we can't really fill
in with data so we'll fill them in using
other places in the table so let's look
at spam and non-spam
emails we looked at the total was 25
spam emails and 75 non-spam emails in
our dataset now in the next row we're
gonna count how many of them have the
word buy so 20 of the 25 have the word buy
that's 4/5 and then five of the 75
that are not spam have the word buy so
that's 1/15 because it's five divided by
75 now we're gonna fill in the next row
so out of the spam emails 15 of them
contain the word cheap that's
3/5 because it's fifteen divided by twenty
five and ten of the 75 non-spam emails contain
the word cheap and so that's 2/15
because it's 10 divided by 75 now we
would love to fill in the last row with
data for the words buy and cheap
but unfortunately this dataset is not big enough
to actually handle an event that is
so sparse like the words buy and cheap
appearing and you can imagine if there
were more words it would be even harder
so we have to cook up this row from the
previous ones so what we're gonna do is
the naive assumption that the words buy
and cheap are independent so that one
doesn't imply or push the other one to
appear or stop it from appearing and if
we make this assumption then we're gonna
say that the product of these two is the
probability of the word buy and cheap
appearing so that's 12 divided by 25 the
product of 4/5 and 3/5 so that's gonna
be our probability and now if this is
the probability of buy and cheap
appearing how many emails contain buy
and cheap all we have to do is multiply it
by the total number which is 25 so 25
times 12 over 25 is 12 so we conclude
that 12 emails should contain the words
buy and cheap even if there are 12 or 14
or 10 or none logically if we have that
assumption there should be 12 now let's
look at the other two boxes well again
we make the assumption that the words
buy and cheap are independent of each
other so the product of these 2 which is
2 divided by 225 is gonna be the
probability of the words buy and cheap
appearing in an email that is not spam so
now how many emails that are not spam
contain the words buy and cheap well
it's the product of the probability times the total
number so how much is 2 over 225 times
75 that's actually two-thirds so we
have twelve spam emails and two-thirds
of an email that is not spam that
contain the words buy and cheap so now we
have to normalize right we have to see
what is the split what percentage
are spam among the total ones and the
total is twelve plus two-thirds
that's all of our emails that
contain the words buy and cheap so we
divide twelve the spam ones by
the total which is twelve plus
two-thirds and we get 36 over 38 which
is 94.737 percent now
notice that naive Bayes extends and the
idea is that this extends to many many
more properties right because the point
is if we have 50 properties and we can't
check when they all appear at the same
time we can check when one appears and
then multiply things right so let's add
an extra row to this table let's say we
looked at the word work and we're
wondering if the word work helps us in
our classifier so let's study how much
it appears let's say that it appears
five times in our spam emails and 30
times in our non-spam emails so it
doesn't look like it's gonna help us
that much it looks almost like it's a
word that's more correlated to not spam
but let's just study it so this 5 out of
25 is 1/5 so therefore one fifth of the
spam emails contain the word work and 6/15
of the non-spam emails contain
the word work because 30 divided by 75
is 6 over 15
so again naive assumption that the words
buy cheap and work are all independent
therefore the probability that the three
of them appear in an email is the
product of these three numbers which is
12 divided by 125 and again if we want
to estimate the number of emails that
are spam that contain those three words
we multiply the probability times the
total and we get twelve divided by five
so roughly twelve divided by five which
is a little over two emails will be spam
and contain the words buy cheap and work
and now let's do it over here we assume
again that the three words are
independent of each other we take the
product of the probabilities and that's
gonna be the probability that the words
buy cheap and work all appear in an
email at the same time when the email is
not spam
so in order to find the total number of
emails that are not spam that contain the
words buy cheap and work we multiply the
probability that they appear times the
total number of emails and we get that
4/15 of an email is not spam and
contains the words buy cheap and work because
75 times 12 divided by 3375
is 4/15 so in summary out
of the emails that contain the words buy
cheap and work 12/5 are spam and
4/15 are ham so how many are spam
divided by the total well we take twelve
over 5 the number of spam emails divided by
the total which is 12 over 5 plus four
divided by 15 and that is gonna be 36
over 40 or 9/10 in lowest terms which is 90% so
that's how we combine the three words
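Written as a single expression, the three-word calculation uses the same fractions read off the table (S = spam; 4/5, 3/5, 1/5 are the spam frequencies of buy, cheap, and work, and 1/15, 2/15, 6/15 the non-spam ones):

```latex
P(S \mid \text{buy}, \text{cheap}, \text{work})
  = \frac{25 \cdot \frac{4}{5} \cdot \frac{3}{5} \cdot \frac{1}{5}}
         {25 \cdot \frac{4}{5} \cdot \frac{3}{5} \cdot \frac{1}{5}
          + 75 \cdot \frac{1}{15} \cdot \frac{2}{15} \cdot \frac{6}{15}}
  = \frac{12/5}{12/5 + 4/15}
  = \frac{36/15}{40/15}
  = \frac{9}{10}
```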
now notice that 90% is less than 94.737% because
the word work actually decreases the
probability that an email is spam
because as you can see work appears a
lot more in non-spam emails so it does make
sense because it's not a word that one
would correlate with spam so some of
these properties may increase
probability and some of them would
decrease it but the fact is naive Bayes
helps us combine a bunch of different
features into creating a model that
calculates the probability that
something is spam and these features get
combined in a nice way because we don't
have to wait until we find an email with
all these features we can actually cook
up probabilities without having emails
that satisfy all of them so if you
like formulas this is really what
happens in the background this
is the formula of Bayes' theorem the
letter S stands for being spam the
letter H stands for ham which is
actually what people call emails that are
not spam they call them ham and the red
letter B stands for buy so the probability of
S given B when you see that vertical bar
that is a conditional probability so
what the left side says is the probability of
spam given that the word buy appears and that's a
ratio because most probabilities
are ratios and then on the top we have
the probability of B given S so out
of the spam emails how many of them
contain the word buy that was 20 out of
25 and then P of S is the probability that an
email is spam regardless of any words that
it contains that's 25/100 because if we
remember there were 25 spam emails out of
100 total and in the bottom goes
the total so that's the same
thing 20/25 times 25/100 plus
the ham ones so we have what's the
probability of the word buy appearing if
the email is ham that's 5/75
because out of 75 ham emails five of
them have the word buy and the
probability of an email being ham well 75
over 100 so if you do that whole formula
you get 80% but the interesting thing is
if you look at what we did it was
exactly that and then what happens with
naive Bayes is that we make this
assumption that the probability of the
word buy and the word cheap appearing is
the product of the probabilities of the
word buy appearing and the word cheap
appearing again this doesn't have to
hold the words buy and cheap may be
either correlated or inversely
correlated maybe one implies the other
one maybe one stops the other from
appearing but we're gonna assume naively
that the product of the probabilities
is the probability of both appearing which is
saying this that the probability of some
event B intersect event C is the
product of the probabilities of B and C
appearing again this is a naive assumption
but we're gonna make it because it
makes our math easier and the full
formula for naive Bayes this is for
two events but you can generalize this
for many more events the probability of
spam given that the words buy and cheap
appear is given by that formula and if we look at
all the probabilities here we say
probability of spam if it contains the
words buy and cheap well it's a ratio on
the top we know these probabilities it's
20 out of 25 for the probability of buy given
that it's spam the probability of cheap given
that it's spam is 15 over 25 if you remember
correctly 15 of the spam emails contain
the word cheap and then again 25 over
100 for the probability that an email is
spam in the bottom we have the same
thing plus 5 over 75 the probability
that a ham email contains the word buy
10 over 75 the probability that a ham email
contains the word cheap and then the
probability that an email is ham which
is 75 over 100 you do this math and you
get 94.737 percent
but I challenge you if it doesn't look
super clear look at this slide and go
back to what we did with naive Bayes and
convince yourself this is exactly what
we did
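For reference, the two formulas described above, written in LaTeX with the video's notation (S = spam, H = ham, B = the word buy) and the video's numbers. The letter C for the word cheap is our label, since the transcript only names it in words; the naive step is replacing P(B ∩ C | S) with P(B | S) P(C | S), and likewise for H.

```latex
\begin{aligned}
P(S \mid B) &= \frac{P(B \mid S)\,P(S)}{P(B \mid S)\,P(S) + P(B \mid H)\,P(H)}
  = \frac{\frac{20}{25}\cdot\frac{25}{100}}
         {\frac{20}{25}\cdot\frac{25}{100} + \frac{5}{75}\cdot\frac{75}{100}}
  = \frac{20}{20 + 5} = \frac{4}{5} \\[6pt]
P(S \mid B, C) &= \frac{P(B \mid S)\,P(C \mid S)\,P(S)}
  {P(B \mid S)\,P(C \mid S)\,P(S) + P(B \mid H)\,P(C \mid H)\,P(H)} \\
  &= \frac{\frac{20}{25}\cdot\frac{15}{25}\cdot\frac{25}{100}}
  {\frac{20}{25}\cdot\frac{15}{25}\cdot\frac{25}{100}
   + \frac{5}{75}\cdot\frac{10}{75}\cdot\frac{75}{100}}
  = \frac{12}{12 + \frac{2}{3}} = \frac{18}{19} \approx 0.94737
\end{aligned}
```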
so in the end this whole video was
nothing more than calculating
probabilities by dividing one thing by
another so thank you very much that's it
for naive Bayes as usual if you liked
it please subscribe for more videos
coming up yeah please hit like share it
with your friends and feel free to
comment to ask any questions or any
suggestions for this or any other videos
you'd like to see and my twitter handle
is louis likes math so thank you very
much for your attention and see you in
the next video