MINI-LESSON 5: Correlation, the intuition. Doesn't mean what people usually think it means.
Summary
TL;DR: This lecture delves into the complexities and misconceptions surrounding the concept of correlation. It clarifies that correlation does not necessarily reflect the dependence between variables, using examples to illustrate how it can be misleading, especially in non-linear relationships. The speaker emphasizes the limitations of correlation, such as its inability to account for noise and its non-additive nature. The lecture advocates for the use of mutual information as a more powerful metric for capturing true associations between variables, providing a visual representation to better understand the information content conveyed by different correlation values.
Takeaways
- Correlation is often misunderstood and does not reflect the true dependence between variables.
- The formula for correlation is the expectation of the product of the deviations from the mean of x and y, normalized by the square root of the product of their variances.
- Correlation can be misleading: it can be zero even when there is a strong relationship between variables, as in the example where y equals x or -x depending on the sign of x.
- Mutual information is a more powerful metric that can capture the true dependence between variables, unlike correlation.
- Correlation works best in simple linear models where the relationship between variables is direct and proportional.
- Correlation is not additive; the local correlations in different quadrants of a graph do not sum to the overall correlation.
- Correlation is not suitable for the complex, non-linear relationships commonly found in real life.
- The perception of correlation values can be deceptive; for example, a correlation of 0.25 is much closer to zero than to 0.5 in terms of actual information content.
- The lecture emphasizes the importance of visual representation, such as graphs, in understanding the implications of correlation and other statistical measures.
- For Gaussian distributions there is a direct formula mapping correlation to mutual information; for non-Gaussian distributions it must be computed case by case.
- Mutual information, based on entropy and logarithms, is a more informative measure, connected to the Kelly criterion used in betting and decision-making.
Q & A
What does the speaker claim about the general understanding of correlation?
-The speaker claims that few people truly understand correlation; many who think they do actually do not know what it means.
What is the formula for calculating correlation mentioned in the script?
-The formula for calculating correlation is the expectation of (x - mean of x) * (y - mean of y) divided by the square root of the variance of x times the variance of y.
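That formula can be checked numerically. A minimal sketch in Python (the sample data, seed, and the `pearson_correlation` helper are illustrative choices, not from the lecture):

```python
import numpy as np

def pearson_correlation(x, y):
    # E[(x - E[x]) * (y - E[y])] / sqrt(Var(x) * Var(y))
    dx = x - x.mean()
    dy = y - y.mean()
    return (dx * dy).mean() / np.sqrt((dx ** 2).mean() * (dy ** 2).mean())

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 2.0 * x + rng.normal(size=100_000)  # linear relation plus unit noise

# theoretical rho here is 2 / sqrt(2^2 + 1) = 2 / sqrt(5), about 0.894
print(pearson_correlation(x, y))
```

The helper agrees with `np.corrcoef(x, y)[0, 1]`, which computes the same normalized covariance.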
Why does the speaker say that correlation does not reflect dependence between variables?
-The speaker argues that correlation does not reflect dependence because it only measures the linear relationship between variables, and does not account for non-linear relationships or noise in the data.
What is the example given to illustrate that correlation does not reflect dependence?
-The example given is a situation where y equals x below zero and y equals -x above zero, i.e. y = -|x|, an inverted V. Despite a clear non-linear relationship and total dependence, the correlation (rho) between x and y is zero.
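This zero-correlation example can be reproduced by simulation (sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=100_000)   # symmetric around zero
y = np.where(x < 0, x, -x)     # y = x below zero, y = -x above: y = -|x|

rho = np.corrcoef(x, y)[0, 1]
print(rho)  # near zero: the two halves cancel, yet y is fully determined by x
```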
What is the alternative metric suggested to better understand dependence between variables?
-The alternative metric suggested is mutual information, which is said to provide a more powerful and accurate measure of dependence.
Why does the speaker say that correlation is not additive?
-The speaker explains that correlation is not additive because the local correlations in different quadrants do not sum up to the overall correlation. This is demonstrated with an example where the overall correlation is 0.5, but the local correlations in each quadrant are different.
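The quadrant example can be sketched with a quick simulation; the specific ~0.26 figure is the lecture's, and the exact values depend on the sample:

```python
import numpy as np

rng = np.random.default_rng(7)
cov = [[1.0, 0.5],
       [0.5, 1.0]]             # bivariate Gaussian with rho = 0.5
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=1_000_000).T

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

pos = (x > 0) & (y > 0)        # restrict to the positive quadrant
print(corr(x, y))              # overall: near 0.5
print(corr(x[pos], y[pos]))    # local: much lower than 0.5
```

The local correlation inside a quadrant is well below the global 0.5, which is the non-additivity the lecture points at.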
What is the speaker's opinion on the use of correlation in IQ studies?
-The speaker believes that IQ studies are generally either complete or partial nonsense when they rely on correlation, as correlation is not a reliable measure for non-linear relationships.
What is the correlation triangle mentioned by the speaker, and what does it represent?
-The correlation triangle is a concept used by option traders to understand the relationships between different assets. It captures the fact that correlation is not transitive: knowing the correlation between A and B and between B and C does not pin down the correlation between A and C.
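One way to see this non-transitivity is to ask which third correlations are even feasible given the other two: a 3x3 correlation matrix must be positive semidefinite. A sketch (the `is_valid_corr` helper and the 0.8 figures are illustrative):

```python
import numpy as np

def is_valid_corr(rho_ab, rho_bc, rho_ac):
    # a 3x3 correlation matrix is feasible iff it is positive semidefinite
    m = np.array([[1.0,    rho_ab, rho_ac],
                  [rho_ab, 1.0,    rho_bc],
                  [rho_ac, rho_bc, 1.0]])
    return np.linalg.eigvalsh(m).min() >= -1e-12

# with rho(A,B) = rho(B,C) = 0.8, the feasible range for rho(A,C)
# is 0.64 +/- 0.36, i.e. anywhere from 0.28 up to 1.0:
print(is_valid_corr(0.8, 0.8, 0.29))   # True: inside the feasible range
print(is_valid_corr(0.8, 0.8, 0.27))   # False: below the feasible range
```

So two 0.8 correlations leave rho(A,C) anywhere between 0.28 and 1.0, exactly the "can be point eight, doesn't mean it's going to be point eight" point.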
Why does the speaker suggest that visual representation can help in understanding correlation better?
-The speaker suggests that visual representation, such as graphs, can help because people often understand the implications of data better when they can see it visually, rather than just being given numerical metrics.
How does the speaker describe the relationship between mutual information and the Kelly criterion?
-The speaker describes mutual information as an entropy-based measure that is closely connected to the Kelly criterion because it is based on logs, which indicate how much one would be willing to bet on an outcome based on the information available.
What is the formula for mutual information for a Gaussian distribution mentioned in the script?
-For a Gaussian distribution, the mutual information is minus one half times the log of one minus the correlation squared, i.e. -1/2 log(1 - rho^2).
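Under that formula, the information content of different correlation values can be computed directly (log base 2 here, so the result is in bits; the base is a choice, not specified in the lecture):

```python
import numpy as np

def gaussian_mi(rho):
    # mutual information in bits for jointly Gaussian variables:
    # I(X; Y) = -1/2 * log2(1 - rho^2)
    return -0.5 * np.log2(1.0 - rho ** 2)

for r in (0.25, 0.5, 0.75):
    print(r, gaussian_mi(r))
# the growth is non-linear: rho = 0.25 carries far less than
# half the information of rho = 0.5
```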
Outlines
Understanding Correlation and Its Limitations
This paragraph discusses the concept of correlation, emphasizing its common misunderstandings and limitations. The speaker clarifies that correlation does not reflect the dependence between variables, contrary to popular belief. The standard formula for calculating correlation is explained, highlighting how it can be misleading due to its reliance on variance and the mean. The paragraph also points out that correlation fails to capture the true relationship in certain scenarios, such as when variables have a non-linear relationship, as demonstrated with the example of y being equal to x and -x in different ranges. The concept of mutual information is introduced as a more powerful metric for capturing the true nature of dependence between variables.
The Misconceptions and Realities of Correlation
The speaker delves into the misconceptions surrounding correlation, explaining that it is not additive and does not necessarily reflect real-world relationships, especially in the presence of non-linearities. The paragraph addresses the issue of subadditivity in correlation, where the sum of local correlations does not equal the overall correlation. It also critiques the use of correlation in IQ studies and the flawed assumption that correlation can be extended through intermediaries (A correlates with B, and B with C, does not imply A correlates with C). The speaker emphasizes the importance of visual representation in understanding correlation and introduces the concept of mutual information as a more informative measure of association.
Visualizing Correlation and Mutual Information
In this paragraph, the focus shifts to visualizing the concept of correlation and mutual information. The speaker uses a Monte Carlo simulation to demonstrate how correlation can be zero even when there is a clear relationship between variables, and how increasing the correlation coefficient does not necessarily convey more information. The paragraph illustrates the difference between Pearson correlation and mutual information, showing that the latter provides a more accurate representation of the information content in a relationship. The speaker concludes by advocating for the use of mutual information over Pearson correlation, especially in non-Gaussian scenarios, and emphasizes the importance of understanding the visual representation of data.
Mindmap
Keywords
Correlation
Non-linearity
Expectation
Variance
Mutual Information
Dependence
Standard Deviation
Additivity
IQ Studies
Monte Carlo
Multivariate Gaussian
Highlights
Correlation and non-linearity are often misunderstood and can be dangerous concepts.
Correlation does not reflect the dependence between variables, contrary to common belief.
The formula for correlation involves the expectation of the product of deviations from the mean of x and y, divided by the product of their standard deviations.
Correlation can be misrepresentative, similar to standard deviation, due to giving different weights to associations.
An example is given where correlation is zero, but the dependence between variables is significant.
Mutual information is introduced as a more powerful metric than correlation for assessing dependence.
Correlation works well in simple linear models but fails in more complex, non-linear scenarios.
The concept of Pearson correlation is explained, and its limitations are discussed.
Correlation is not additive, as demonstrated with an example of local correlations in quadrants.
The transcript discusses the misuse of correlation in finance, particularly with options and market relationships.
IQ studies are criticized for their reliance on correlation, which may not provide meaningful information.
The transcript explains the fallacy of assuming correlation is transitive: A correlating with B and B correlating with C does not imply that A correlates with C.
The perception of correlation values is often incorrect, with small values like 0.25 being closer to zero than to 0.5.
Visual representation of correlation can be more informative than numerical values alone.
Mutual information is highlighted as a metric that can effectively capture non-linear relationships.
The transcript concludes with a warning against trusting correlation without visual or graphical analysis.
Transcripts
friends
welcome back we're going to talk today
about correlation
today we talk about correlation —
nobody really understands it [spoken in Italian]
nobody really knows what it is i mean or
not nobody, few people know
correlation
and the non-linearity, it's very difficult,
it's a very dangerous concept, a lot of
people think they understand it
and they don't know that they don't
know what it means
so let's start with uh
with very simply you know what it means
i have x
x1 xn and y
y1 yn okay as usual
when i tell you what is the correlation
between x and y
does it reflect the dependence no it
does not reflect the dependence between
x and y
the way the correlation is computed
which is pretty much like the
expectation
of x minus the mean of x
y minus mean of y over
square root of variance of x
variance of y okay that's a
the measure for correlation and that
measure
as we saw with standard deviation —
if you want
to really
review it, look at the lecture on
standard deviation — it's the same concept
we're multiplying things
as you see we're multiplying — say the
normalized
x by y, so basically it's the normalized,
the normalized covariance of x and y — when you
multiply things
visibly you give a bigger weight to when
associations
are large and
smaller weight to the smaller ones
so same problem as we have with standard
deviation misrepresented in some cases
now you would think that correlation
you know reflects dependence i'm going
to show you
a situation where
below zero y
equals minus x — and above 0 y equals —
sorry, y equals x below and minus x here
okay and this is from x with no noise
100 percent: you know x, you will know y
okay guess what
what is the correlation rho x y equals
zero
what's the dependence between these two
huge
okay so actually when we use a
more powerful metric called
mutual information
applied to this, we see it's infinite
but as certain as
things can be so
this is the first problem: the
correlation you know doesn't
reflect what you think it does
reflect, dependence
where does correlation work
in very simple models very simple
well you have y equals
a, an intercept, plus
b times x plus some noise okay
this number here the beta
is constant so if you draw if you
normalize
you know then you draw
this okay
x y this b
will be the correlation okay that's
basically it
so why would b be the correlation, and
is it very good? tell me — doesn't it
miss how much noise there is?
it does describe how much noise,
because we've normalized
normalized means use
x minus x bar over
sigma x and the same for y in other
words you
you you're scaling by the standard
deviation after reducing by the mean
so mean zero standard deviation of one
so because as this angle
becomes bigger okay
the the the noise the ratio of
information to noise
increases and as this one becomes
smaller because the noise stays the same
that's pretty much the intuition so
correlation works when you have a linear
relationship and we're talking about
pearson correlation the other ones
actually are worse
okay
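The claim above — that after normalizing both variables the slope b is the correlation — can be verified numerically (the intercept, slope, sample size, and seed below are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200_000)
y = 3.0 + 2.0 * x + rng.normal(size=200_000)   # y = a + b*x + noise

# normalize: subtract the mean, divide by the standard deviation
zx = (x - x.mean()) / x.std()
zy = (y - y.mean()) / y.std()

slope = np.polyfit(zx, zy, 1)[0]   # OLS slope on the normalized data
rho = np.corrcoef(x, y)[0, 1]
print(slope, rho)                  # the two coincide
```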
another problem with correlation
is that it's not additive okay
let's take x y
and generate noise with a correlation of
0.5
rho x y equals 0.5
what local correlation will you get in
every quadrant
say x positive y positive we're
normalizing
okay this is about
point two six, 0.26
this is about point sixteen,
0.16
so if you add them up whether you
probabilize you know you give the
probability of being each quadrant
thirty eight percent i think 38
and then one minus okay uh half of
the half of one minus uh the sum
whatever you get probabilities of each
quadrant, you weigh them
you realize correlation is subadditive
in that case
so in other words you have correlation
of 0.5
overall, you'd get 0.26 in the
positive quadrant and
0.26 in the negative one — so
it's weird because the metric is not
designed
to be used outside a very simple narrow
model, and doesn't generalize to things
in real life that have non-linearities
as we saw with that
that payoff incidentally that payoff
like that inverted v
is payoff of a short straddle so we in
finance know
that you cannot have correlation between
options,
option p&l, and the market because
they're based on second-order terms
and a positive v, this is a long
straddle — the same thing, it's not related
to the market
but it depends 100 on the market so
uh
you can't do really partial correlation
and and of course
you understand now why iq studies
are generally either
complete bs
or partial bs you know
completely, in the sense that
you can't use them for any information, or
partially, as sometimes you can use them in
the negative domain
because as you see
let's say i have iq
an iq metric that works this is
performance
this is iq and doesn't work in a
positive domain
you have non-linearities if you have
non-linearity, the rho is uninformative
there are a lot — they make, incidentally, a
lot of mistakes in the
iq literature; i'll do that in a more complicated,
more technical lecture, this is
meant to be non-technical but let me
explain another thing that they make
they use proxy for correlation that a
correlates to b
and b correlates to c it doesn't mean
that a correlates to c
and if it does, it's not the same
you know if this one can be point eight and this one can
be point eight, it doesn't mean that one is gonna
be point eight
so because you have cross terms this we
learn
as option traders when we start doing
correlation trades
correlation triangles see my book
dynamic hedging
now quickly
i'm going to show you the problem with
correlation
that in the mind of people okay so
it has a correlation
point two five zero
point five you have the feeling that
a correlation of 0.25 is halfway between
a correlation of 0 and one of 0.5
it's not of course it's going to be
a correlation of 0.25 is much closer to
0
which just tells you that a correlation of
0.1, 0.15, even 0.25 has no information in it
okay it looks good but it doesn't have
any information and i'm going to show
you how visually
you can capture this point
next
okay so now let's look at
what a correlation means, what a numerical
value implies as far as
the real information — and you
can see information with the naked eye
as a bunch of researchers figured out
that economists couldn't interpret their
own numbers
but when you show them graphs they can
interpret the implication of the graph
i mean that should be normal
because people understand with their eyes
things a lot better than
when you're given metrics uh that they
you know
they don't even know how how these
metrics were derived
so i'm showing here x as a random
variable
y the same random variable before we
generate
by monte carlo let me do it in front of
you
a uh you know joint
xy you know points so you put in
two-dimensional space
and look at here correlation is zero
pretty much
things are spread you have more
concentration in the middle because it's
a gaussian
and incidentally forgot to mention
multivariate gaussian rho zero
so uh you go to a correlation of half
not much better than zero not much
better
so if i show them to you separately you
probably wouldn't wouldn't
be able to tell which one's which point
eight
not much different from one half okay so
maybe you need to square if we square
it doesn't get much better it doesn't
get better at all look
look at the distance between three quarters
and one — and certainly you can't have a
correlation of one
in a multivariate gaussian, that blows
up — so you
you know approximate one:
three-quarters is a lot closer to zero
than it is to one. one and
only one measure can fix the problem:
mutual information, and
to go back to my roots as a trader
it really tells you — mutual information
is an entropy-based measure
very very connected to the
kelly criterion because it's based on
logs
which tells you how much you would be
willing to bet
on y knowing x — how many dollars
so if you have certainty of course you
can bet all you can
if you have
some degree of uncertainty you bet less
okay and and so on and and pretty much
how much you bet
is distance and that works beautifully
for genetics
much better than correlation for a
gaussian there's a map one to one
that metric is called mutual information
and you can see here
the mutual information is in green how
it rises
in how mutual information rises so in
other words you see the non-linearity so
0.25 okay is
a lot closer to 0 than it is to 0.5
and
so on
and you can see it with a green line
uh i cheated to eliminate infinities so
you know you can play with it but just
just as a concept gives you a notion of
relative
information content that you're getting
via the
metric the corresponding
correlation so
for a gaussian, as i said, there's a simple
formula:
minus one half log of one minus rho
squared
but for non-gaussian
you have to recompute it one by one
so i think uh i'm done with this lecture
on correlation
remember one thing that uh correlation
doesn't mean anything unless you've seen
a graph
do not trust social scientists dealing
with correlation — only trust people who
have derived it
or derived some metric to describe an
association
and the other metrics — Kendall and all
these
derivative metrics
from the Pearson original one,
the original cross-moment correlation —
actually give you worse
information so thank you and have a good
day