Clustering: K-means and Hierarchical
Summary
TL;DR: In this video, Luis Serrano explains two popular clustering algorithms: k-means and hierarchical clustering, both used in unsupervised learning to group data. Through a marketing example involving customer segmentation, he demonstrates how clustering works by grouping individuals based on their age and engagement with a page. The k-means algorithm is illustrated using a pizza parlor analogy, where locations are optimized to serve customers. Additionally, the hierarchical method is introduced to create clusters based on proximity. Luis highlights clustering's applications in marketing, biology, and social networks.
Takeaways
- 😀 Clustering is a type of unsupervised learning that involves grouping data points based on similarities.
- 📊 Two main clustering algorithms covered are k-means clustering and hierarchical clustering.
- 📈 In a marketing application, clustering can be used for customer segmentation to design different strategies for distinct groups.
- 🧑‍💻 K-means clustering works by starting with random points, assigning data to the nearest center, and adjusting until clusters are optimized.
- 🍕 K-means clustering can be visualized through a pizza parlor analogy, where the goal is to place stores closest to their customer groups.
- 📉 The 'elbow method' helps decide the optimal number of clusters by evaluating the diameter of groups and identifying the best point for clustering.
- 🏙 Hierarchical clustering builds clusters by joining the two closest points or groups iteratively, and a dendrogram is used to visualize these connections.
- ✂️ Decisions in hierarchical clustering, like where to cut the dendrogram, are partially manual but informed by data.
- 🌐 Clustering has broad applications in marketing, genetics, social networks, and recommendation systems.
- 🧬 Social networks use clustering to group similar users and recommend content or connections based on demographic and behavioral similarities.
Q & A
What is clustering, and how is it defined in this video?
-Clustering is a type of unsupervised learning that consists of grouping data into distinct clusters. The algorithm identifies groups based on similarities in the data, even when the data appears to be scattered.
What are the two clustering algorithms discussed in this video?
-The two clustering algorithms discussed in this video are k-means clustering and hierarchical clustering.
What application of clustering is mentioned at the beginning of the video?
-The video mentions customer segmentation for marketing as an application of clustering. The goal is to create three marketing strategies by dividing potential customers into well-defined groups based on their age and engagement levels.
How does the k-means clustering algorithm work?
-In k-means clustering, the computer starts by placing 'k' random points, assigns data points to the closest cluster, and moves each cluster center to the average location of the points assigned to it. This process repeats until clusters stabilize.
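The loop described in this answer can be sketched in a few lines of NumPy. This is an illustrative from-scratch sketch, not the video's code; the function name, seed, and data points are invented for the example:

```python
import numpy as np

def k_means(points, k, iters=20, seed=0):
    """Minimal k-means sketch: place k centers at random data points,
    assign every point to its nearest center, move each center to the
    mean of its assigned points, and repeat."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # distance from every point to every center, then pick the nearest
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each center to the average of its assigned points
        # (keep the old center if a cluster happens to end up empty)
        centers = np.array([points[labels == c].mean(axis=0)
                            if np.any(labels == c) else centers[c]
                            for c in range(k)])
    return labels, centers

# age vs. engagement, loosely modeled on the video's eight customers
data = np.array([[20, 2], [22, 3], [21, 4],
                 [38, 6], [42, 7], [40, 6],
                 [50, 1], [52, 0]], dtype=float)
labels, centers = k_means(data, k=3)
```

With well-separated data like this, the centers typically stabilize within a few iterations.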
What is the elbow method in k-means clustering?
-The elbow method is used to determine the optimal number of clusters by running the algorithm for different numbers of clusters and plotting the diameter (largest distance between two points in the same cluster). The 'elbow' in the plot indicates the best number of clusters.
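As a rough sketch of the scoring step, here is one way to compute the video's "diameter" measure for a few candidate clusterings. The points and labelings below are hypothetical; in practice the labels for each k would come from running k-means:

```python
import numpy as np
from itertools import combinations

def diameter(cluster):
    """Largest pairwise distance within one cluster of 2-D points."""
    if len(cluster) < 2:
        return 0.0
    return max(np.linalg.norm(np.array(a) - np.array(b))
               for a, b in combinations(cluster, 2))

def worst_diameter(points, labels):
    """The video's score for a clustering: the diameter of its widest cluster."""
    return max(diameter([p for p, l in zip(points, labels) if l == c])
               for c in set(labels))

points = [(20, 2), (22, 3), (21, 4), (38, 6), (42, 7), (40, 6), (50, 1), (52, 0)]
# hypothetical labelings for k = 1, 2, 3
labelings = {1: [0] * 8,
             2: [0, 0, 0, 1, 1, 1, 1, 1],
             3: [0, 0, 0, 1, 1, 1, 2, 2]}
scores = {k: worst_diameter(points, labels) for k, labels in labelings.items()}
```

Plotting k against scores[k] and looking for the bend in the curve gives the elbow.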
Why is hierarchical clustering different from k-means clustering?
-Hierarchical clustering is different because it creates a hierarchy of clusters by repeatedly merging the closest data points or clusters, while k-means starts with a fixed number of clusters and adjusts their positions iteratively.
What is a dendrogram in hierarchical clustering?
-A dendrogram is a tree-like diagram that represents the hierarchy of clusters in hierarchical clustering. It helps visualize how data points are grouped together based on their proximity.
How do you determine the number of clusters in hierarchical clustering?
-In hierarchical clustering, the number of clusters can be determined by cutting the dendrogram at a certain height, based on how close the points are or by specifying how many clusters you want.
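For illustration, SciPy's hierarchical-clustering utilities support both ways of cutting. This sketch assumes single linkage, which matches the "merge the closest points or groups" description, and uses invented age/engagement data:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# age vs. engagement points, loosely modeled on the video's example
X = np.array([[20, 2], [22, 3], [21, 4],
              [38, 6], [42, 7], [40, 6],
              [50, 1], [52, 0]], dtype=float)

# single linkage merges the two closest points/groups at each step
Z = linkage(X, method="single")

# "cut the dendrogram" either at a distance threshold...
by_distance = fcluster(Z, t=12.0, criterion="distance")
# ...or by asking for a fixed number of clusters
by_count = fcluster(Z, t=3, criterion="maxclust")
```

Calling scipy.cluster.hierarchy.dendrogram(Z) would draw the corresponding dendrogram.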
What real-world applications of clustering are mentioned in the video?
-The video mentions applications of clustering in genetics, evolutionary biology, recommendation systems (e.g., video suggestions), and social networks, where clustering is used to group users based on behavior or demographics.
How does clustering help in social networks?
-In social networks, clustering groups users with similar behaviors or demographics, which can help suggest friends, target advertisements, or recommend content that might be relevant to specific groups of users.
Outlines
📊 Introduction to Clustering and its Marketing Application
Luis Serrano introduces clustering, an unsupervised learning method used for grouping data. He explains that the video will cover two key clustering algorithms: K-means clustering and hierarchical clustering. The example scenario is marketing, specifically customer segmentation for an app, where the goal is to develop three marketing strategies based on two factors: customer age (demographic) and engagement (behavioral). By plotting this data, Luis shows how distinct groups naturally emerge, leading to well-defined marketing strategies.
🍕 K-Means Clustering Explained with Pizza Parlor Example
Luis explains K-means clustering through a pizza parlor analogy. The algorithm works by randomly placing three initial 'pizza parlors' (centers) and assigning customers to their closest parlor. The algorithm then adjusts the parlors' locations based on the average location of their customers, repeating this process until the best positions for the parlors are found. The video emphasizes how the algorithm mimics human intuition, gradually improving through logical steps until the clusters stabilize.
📉 The Elbow Method: Determining the Optimal Number of Clusters
Luis introduces the elbow method, which helps determine the ideal number of clusters for K-means. By plotting cluster diameters against the number of clusters, the method identifies the 'elbow' point where adding more clusters no longer significantly improves the clustering. This graph-based technique balances compactness against simplicity, and because the graph is always two-dimensional, a human can easily spot the elbow even when the underlying data is high-dimensional.
🌳 Hierarchical Clustering Explained: From Simple to Complex Groups
Luis shifts to hierarchical clustering, explaining how it groups the closest points first and gradually merges them into larger clusters. The process starts with pairing the closest data points and progresses by joining the nearest groups. To visualize this, he introduces the concept of a dendrogram, a tree-like structure that helps in determining how many clusters to form based on distances between groups. Cutting the dendrogram at a specific distance yields the desired number of clusters.
📊 Dendrograms and Deciding the Number of Clusters
Expanding on hierarchical clustering, Luis explains how to use a dendrogram to decide where to 'cut' and determine the number of clusters. The process involves observing the distance between points or groups and using this information to form clusters at various levels of distance. The visual structure of the dendrogram makes it easy to decide where to stop merging clusters, offering both flexibility and insight, even with high-dimensional data.
🧬 Applications of Clustering in Biology and Social Networks
Luis concludes by highlighting real-world applications of clustering, particularly in genetics and social networks. In genetics, clustering helps in understanding evolutionary relationships by grouping species based on their genome. In social networks, clustering is used to group users with similar demographics or behavior, facilitating personalized content recommendations. He relates this back to how platforms like YouTube might recommend videos, illustrating the practical importance of clustering.
Keywords
💡Clustering
💡K-Means Clustering
💡Hierarchical Clustering
💡Unsupervised Learning
💡Customer Segmentation
💡Elbow Method
💡Centroid
💡Dendrogram
💡Distance Formula
💡Behavioral Data
Highlights
Introduction to clustering as a type of unsupervised learning, focusing on grouping data based on similarity.
Application of clustering in marketing, particularly for customer segmentation to define three marketing strategies.
Key features used for customer segmentation: age (demographic) and engagement (behavioral) in days per week.
Visualizing data on a 2D plot (age vs engagement) makes it easier to see the three natural customer groups.
Introduction to k-means clustering and its analogy to placing pizza parlors in a city to serve clients based on their location.
Step-by-step explanation of how the k-means algorithm works, starting with random points and refining cluster centers iteratively.
Explanation of how computers determine the closest cluster for each data point using distance formulas.
Introduction of the 'elbow method' for determining the optimal number of clusters by analyzing the trade-off between number of clusters and data compactness.
Explanation of calculating the diameter of a cluster to measure the effectiveness of clustering solutions.
Challenges computers face in determining the right number of clusters and how the elbow method helps automate this decision.
Introduction to hierarchical clustering as an alternative method for grouping data, building clusters based on closest data points.
Demonstration of creating a dendrogram to visualize the merging process in hierarchical clustering.
Discussion of how to make decisions in hierarchical clustering, such as where to cut the dendrogram to form distinct groups.
Applications of clustering in real-world scenarios like genetics, evolutionary biology, and social networks.
Clustering methods are widely used in recommendation systems and social networks to group users with similar behavior for targeted content delivery.
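The two primitive operations the highlights mention (finding the closest center with the distance formula, and recentering by averaging coordinates) can be sketched as follows; this is illustrative code, not from the video:

```python
import math

def distance(p, q):
    """Euclidean distance (the Pythagorean theorem) between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def nearest_center(point, centers):
    """Index of the center closest to a point."""
    return min(range(len(centers)), key=lambda i: distance(point, centers[i]))

def centroid(points):
    """Center of a group of points: average each coordinate."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

# e.g. a customer aged 21 with engagement 4, and three candidate centers
print(nearest_center((21, 4), [(20, 3), (40, 6), (51, 0)]))  # → 0
print(centroid([(20, 2), (22, 3), (21, 4)]))  # → (21.0, 3.0)
```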
Transcripts
hello i'm luis serrano and this video is
about clustering we're gonna learn two
very important algorithms k-means
clustering and hierarchical clustering
clustering is a type of unsupervised
learning and it basically consists of
grouping data so if your data looks like
it's all over the place the algorithm
will say okay you got a group here
you've got a group here a group here etc
so let's take a look so let's start with
an application the application is gonna
be in marketing in particular and
customer segmentation and the situation
is the following we have an app and we
want to market this app we've looked at
our budget and we can actually make
three marketing strategies so that's our
goal to make three marketing strategies
so the idea is to look at their
potential customer base and to split it
into three well-defined groups when we
look at the customer base we realize
that we have two types of information we
have their age in years and we have
their engagement with a certain page in
number of days per week so one of the
columns is demographic age and the other
one is behavioral which is engagement
with the page and the engagement on the
page can be a number from 0 to 7 since
it's in days per week
so we look at the potential customer
base and this is it there's 8 people
with their age and their engagement so
by looking at this list of people what
groups can you come up with let's take a
look feel free to pause the video and
think about it for a minute so just by
eyeballing I can think that for example
these two people are similar they have
similar ages and similar engagements
maybe I could put those in the same
group I don't know maybe these two are
similar as well we can take a while we
we can actually write them down and
maybe come up with groups but there's
gotta be something easier or at least
something mechanical the computer can do
automatically so one of the first things
to do with data is to plot it so let's
let's plot it in some way let's plot it
like this so in the horizontal axis we
put the age and in the vertical axis we
put the engagement and now it looks more
clear right there are three groups here
is one
here is another one and here is the
other one so that's our three marketing
strategies the first one is for people
around the age of 20 who have a low
engagement with the page two three and
four days a week
then strategy two for people that are
around their late 30s and early 40s and
high engagement with the page and then
the last one is for people that are
around their 50s and very low engagement
with the page and that is pretty much it
for clustering it's basically if our data
looks like it's a bunch of points like
this then a clustering algorithm will
say hey you know what I don't know much
about your data but I can tell you that
it's kind of split into these groups so
what we learn in this video is how to do
these clustering how does the computer
identify these groups because for a
human in this small case it's easy but
for a computer it's not and in particular if
you have many many many points and and
many columns or many dimensions it's not
easy so in this video I'm gonna show you
two important methods the first one is
called k-means clustering and the second
one is called hierarchical clustering so
let's start with k-means clustering and
the question is how does the computer
look at these points all over the place
and figure out that they are forming
these groups so when I try to imagine
points in the plane I just imagine
places in a city and trying to put pizza
parlors so let's say that we are the
owners of this pizza place and we want
to put three pizza parlors in this city
and what we want to do is we want to put
them in such a way that we serve our
clientele in the best possible way so we
look at our clientele and it looks like
this this is where they live so what we
want to do is locate three pizza parlors
in the best possible places that will
serve our clientele so if you take a look
at it you can come up with three places
right it seems like we should have a red
one that serves the red point some blue
one that serves the blue points and a
yellow one that serves the yellow points
however for humans it's easy but a
computer has a harder time so what the
computer is gonna do is like in most
things in machine learning start at a
random spot
and start getting better and better so
how to start well first it locates three
random points and puts three pizza
parlors there and now what we're gonna
do is a series of slightly obvious
logical statements that when put
together will get us to a better place
so the first logical statement is it
seems like if we have the pizza parlors
in these places everyone should go to
the closest one to them that makes sense
right so we're gonna plot all the people
that go to the red to the blue and to
the yellow pizza parlor basically you go
to the one that is the closest so here's
another logical statement if all the red
people go to the red pizza parlor it
wouldn't make sense to put it in the
center of all those houses right and the
same thing with the blue and with the
yellow basically you move the pizza
parlor to the center of the houses that
it's serving so the yellow one will
serve these houses over here the blue
one is serving these houses over here
and the red one is serving these houses
over here so we move each one of them to
the center of the houses that they're
serving and now let's apply the first
logical statement again we have three
pizza parlors and everyone's gonna go to
the one that is closest to them so some
things change right because let's take a
look at these three blue points well now
they're closer to the yellow pizza
parlor so these people move and now
they're gonna go to the yellow pizza
parlor
what about these two red points over
here well now they're closer to the blue
pizza parlor so they're gonna start
going to the blue pizza parlor now so
let's go back to the second logical
statement which is that the best
location for a pizza parlor is the
center of the houses that it serves so
we move every pizza parlor to the center
of the houses that it serves and again
let's go back to the first logical
statement which is every person goes to
the closest pizza parlor so if you look
at these points over here they are red
but now they're much closer to the blue
pizza parlor so they move to the blue
pizza parlor now
and you can see that we're getting
better and better right because now when
we apply the other statement which is
every pizza parlor should be at the
center of the houses that it serves then
now we move everything to the center of
the houses that it serves and we're
done so that's pretty simple right and a
computer can do it because a computer
can find the center of a bunch of points
by just averaging the coordinates and
can also determine if a point is closer
to one center than to the other one
because it simply just applies the
Pythagorean theorem or the distance
formula and can compare numbers these
are decisions that a computer
can make very easily so we managed to
think like a computer and not like a
human which is basically the main idea
in machine learning so this is the
k-means clustering algorithm now you may
be noticing that we took one decision
that seemed to be taken by a human and
not by a computer right we decided that
there were three clusters but as we said
that's hard for a computer to decide
even though a human can see it and so
here's a question how do we know how
many clusters to pick and for this we
have a few methods but I'm gonna show
you what's called the elbow method so
the elbow method basically says try a
bunch of numbers and then be a little
smart on how to pick the best one
so let's try with one cluster we can do
this algorithm with only one cluster and
we're probably gonna get something like
this every house goes to the same pizza
parlor then we can run it with two
clusters and you can start seeing that
this algorithm actually depends on where
the original point starts sometimes it
works sometimes it doesn't sometimes it
gives you different answers so let's say
we try two clusters and we got this then
we try three clusters and we got the
solution that we got then we try with
four clusters and let's say we got this
with five clusters and we got this and
with six clusters and we got this
so by eyeballing this we can see that
the best solution is with three clusters
but again we need to teach the computer
how to find the three clusters we need
to think like a computer so we can't
rationalize things we have to do things
like measuring distances comparing
numbers averaging coordinates etc so
with those tools how do we find that 3
is the
best well what we need is a measure of
how good is one clustering and maybe the
following measure will make sense
basically we're gonna do is we're gonna
think of the diameter of a clustering
and the diameter is simply gonna be the
largest possible distance between two
points of the same color that basically
tells us how big each group is in a
rough way so let's look at the first one
cluster solution the longest possible
distance between two points of the same
color is this one those two red points
are the farthest apart so that distance
is in a way telling us how good
that clustering is let's do it with two
clusters so the longest distance let's
say it's this distance over here that
tells us how good the clustering is with
two clusters now let's do it with three
clusters let's say that the longest
distance is this one over here again
with four clusters the longest distance is
this one with five clusters the longest
distance is this one and with six
clusters is this one now I just eyeball
these distances so if you think there's
another one you may be correct but
conceptually what we're trying to do is
to define the next method which is
the elbow method so what we're gonna do
is we take all these distances and we
graph them in the following way on the
horizontal axis we're gonna put the
number of clusters so one two three four
five and six and on the vertical axis
we're gonna graph the diameter so we get
the following points and now what we do
is we join these points and now this is
somewhere where a human can intervene a
human can look at this graph and say
okay you know what I want the elbow to
be here there are also some automatic
methods to do this but at some point in
in the machine learning algorithm is
good to actually have a consistency
check because you may have an idea of
how many clusters you want or you may
have an idea of how many clusters you
would like to have or a maximum or a
minimum so anyway in some way or another
we figure out that the number should be three
another thing that's important is that
this elbow method is very easy for a human
if our data has many many columns
we're looking at points in very high
dimensions
however the elbow method the graph is
always gonna be two-dimensional so
that's it that's how we decided that
three clusters are the best and that is
the k-means clustering algorithm in a
nutshell okay so now let's go to our
second method which is hierarchical
clustering and we're gonna do a similar
problem except now with this data set
we're gonna find a clustering and
let's see how many groups we can find so
another way to do it is the following
let's think about this let's think of
the two closest points it wouldn't make
sense to say that these two points that
are the closest would belong to the same
group maybe yes maybe no but it's a
sensical thing to ask right so let's go
on that statement let's say these are
the two closest points so these two are
gonna be part of the same group now what
are the next two closest points let's
say it is two so these two belong in a
group and we're gonna keep going in this
direction the two closest points are
these ones so these two belong to the
same group the two closest points are
this ones so now what do we do well we
just join the two groups so now it
becomes a group of three the two closest
points are these two so they join like
this the two closest points after that
are these two so we join the two groups
the group of two and the group of three
into a group of five and then the next
pair of points that are the closest are these
two so we're gonna join them but let's say
that's just too big so we have maybe a
measure of how much is too far so we
stop here and that's it that's
hierarchical clustering it's pretty
simple right now again there seem to be
a human decision here right why did we
decide on that being the distance or for
example why did we decide on two being
the number of clusters so we can make
this decision but let's actually look at
an educated way to make this decision so
let's answer this question how do we
decide the distance or the number of
clusters so a way to do it is by
building something called a dendrogram
so what we're gonna do is the following
we're gonna put our points in a row over
here one up to eight and then in the
vertical axis we're gonna graph the
distance I'm going to show you how let's
pick the closest two points which are
four and five so we join four and five
and we join them over here and this is
not up to scale but the height of that
little curved line between four and five
let's say is the distance between four
and five so we join these two and then we
go to the next two which is one and two so
we're going to join one and two here and
we're gonna join them in the dendrogram
there right and again assume that that
height of that little curved line is the
distance between one and two now we join
the next pair which is six and seven so
we join six and seven and again the
height is the distance we keep going six
and eight
so now we're gonna join six and eight
how do we join them well we join them
like this the group of six seven and the
group of eight and the next group is
three and four five so they get joined
like this and now the next group is
gonna be two and three so we join the
group containing two which is one and two
with the group containing three in the
dendrogram and notice that the dendrogram goes up
because these distances increase so
every time we make a new joint it's
higher than the previous one the next
one that we joined are three and six so
we end up joining these two trees like
that and so that's it we have a lot of
information about this set in this
dendrogram and now how do we decide
where to cut well let's say we cut over
here at a certain distance and that
gives us two clusters which are this one
one two three four and five and this one
which is six seven and eight so notice
that we made the decision on cutting
based on how much a distance is too far
away or how many clusters do we want to
obtain let's say we want to obtain four
clusters so we cut at this distance
over here which gives us four clusters
the cluster formed by one and two the
one formed by three by itself the one
formed by four and five and the one
formed by six seven and eight so again
these decisions are taken by a human but
think about it again
let's say we have billions of points and
let's say that they live in a thousand
dimensional space it doesn't matter the
dendrogram is still a two dimensional
figure and we can easily make decisions
on it so again a combination of a
computer algorithm and some smart human
decisions is what gives us the best
clustering and that's it
that's hierarchical clustering in a
nutshell clustering has some very
interesting applications let me mention
some of them things like genetics or
evolutionary biology the genome carries
a lot of information about a species and
if you manage to cluster them you get to
understand a lot about species and how
they evolved into what they are right
now
other things recommender systems use
a lot of clustering for example the way
you may have got this video recommended
was using several methods that include
clustering users grouping them into
similar groups so maybe somebody very
similar to you watched this video and
that's why you got it recommended and
that brings us to social networks which
is another place where clustering is
used a lot in a very similar example
to the one we did social networks use
these methods to group users into
certain similar groups based on
demographics based on behavior and then
be able to target information to them
that they want to see or suggest your
friends that are similar to you et
cetera so that's all for now thank you
very much for your attention as usual if
you would like to see some more of this
content please subscribe or hit like
feel free to share with your friends and
feel free to throw in a comment and tell
me what you think of the video or if you
have any suggestions for other videos
you'd like to see and my twitter handle
is Luis likes math if you'd like to
tweet at me so thanks again and see you
in the next video