MINI-LESSON 5: Correlation, the intuition. Doesn't mean what people usually think it means.

N N Taleb's Probability Moocs
4 May 2021, 14:16

Summary

TL;DR: This lecture delves into the complexities and misconceptions surrounding the concept of correlation. It clarifies that correlation does not necessarily reflect the dependence between variables, using examples to illustrate how it can be misleading, especially in non-linear relationships. The speaker emphasizes the limitations of correlation, such as its inability to account for noise and its non-additive nature. The lecture advocates for the use of mutual information as a more powerful metric for capturing true associations between variables, providing a visual representation to better understand the information content conveyed by different correlation values.

Takeaways

  • 🧐 Correlation is often misunderstood and does not reflect the true dependence between variables.
  • 📊 The formula for calculating correlation involves the expectation of the product of the deviations from the mean of x and y, normalized by the square root of the product of their variances.
  • 📉 Correlation can be misleading, as it can be zero even when there is a strong relationship between variables, as shown in the example where y equals x or -x depending on the sign of x.
  • 📈 Mutual information is a more powerful metric that can capture the true dependence between variables, unlike correlation.
  • 🔍 Correlation works best in simple linear models where the relationship between variables is direct and proportional.
  • 📊 Correlation is not additive; the local correlations in different quadrants of a graph may not sum up to the overall correlation.
  • 🚫 Correlation is not suitable for complex, non-linear relationships commonly found in real-life scenarios.
  • 🤔 The perception of correlation values can be deceptive; for example, a correlation of 0.25 is much closer to zero than to 0.5 in terms of actual information content.
  • 📚 The transcript emphasizes the importance of visual representation, such as graphs, in understanding the implications of correlation and other statistical measures.
  • 📈 For Gaussian distributions, there is a direct formula to calculate mutual information from correlation, but for non-Gaussian distributions, it must be computed individually.
  • 💡 The speaker suggests that mutual information, based on entropy and logarithms, is a more informative measure connected to the Kelly criterion, which is used in betting and decision-making.

Q & A

  • What does the speaker claim about the general understanding of correlation?

    -The speaker claims that few people truly understand what correlation is; many who think they do actually do not know what it means.

  • What is the formula for calculating correlation mentioned in the script?

    -The formula for calculating correlation is the expectation of (x minus the mean of x) times (y minus the mean of y), divided by the square root of the product of the variance of x and the variance of y.
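
As a quick illustration of that definition, here is a minimal sketch in Python (not from the lecture; it assumes NumPy and an arbitrary linear-plus-noise data set):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: a simple linear relationship plus noise
# (the slope and noise level here are arbitrary assumptions).
x = rng.normal(size=10_000)
y = 0.7 * x + rng.normal(size=10_000)

# Correlation as defined in the lecture:
# E[(x - E[x]) * (y - E[y])] / sqrt(Var(x) * Var(y))
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
rho = cov_xy / np.sqrt(x.var() * y.var())

print(rho)                      # hand-rolled estimate
print(np.corrcoef(x, y)[0, 1])  # NumPy's built-in value, which should agree
```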

  • Why does the speaker say that correlation does not reflect dependence between variables?

    -The speaker argues that correlation does not reflect dependence because it only measures the linear relationship between variables, and does not account for non-linear relationships or noise in the data.

  • What is the example given to illustrate that correlation does not reflect dependence?

    -The example given is a situation where y equals x below zero and y equals -x above zero (an inverted V with no noise). Even though y is completely determined by x, the correlation (rho) between x and y is zero.
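
A small Monte Carlo check of that example, assuming a symmetric distribution for x (an illustrative sketch, not the speaker's code):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100_000)     # any distribution symmetric around zero works

# y = x below zero and y = -x above zero: an inverted V with no noise.
y = np.where(x < 0, x, -x)       # equivalently, y = -abs(x)

print(np.corrcoef(x, y)[0, 1])   # ~0: Pearson correlation sees nothing
# Yet y is a deterministic function of x: knowing x gives y exactly,
# which is why mutual information for this pair is unbounded.
```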

  • What is the alternative metric suggested to better understand dependence between variables?

    -The alternative metric suggested is mutual information, which is said to provide a more powerful and accurate measure of dependence.

  • Why does the speaker say that correlation is not additive?

    -The speaker explains that correlation is not additive because the probability-weighted local correlations in the quadrants do not recombine to the overall correlation. This is demonstrated with an example where the overall correlation is 0.5, but the local correlations measured within individual quadrants are much lower (the speaker quotes values around 0.26 and 0.16).
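
A rough simulation of the quadrant effect, assuming a bivariate Gaussian with an overall correlation of 0.5 (the exact conditional values depend on the setup; this only illustrates that the quadrant correlation drops well below 0.5):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# Bivariate Gaussian with overall correlation 0.5.
cov = [[1.0, 0.5], [0.5, 1.0]]
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

print(np.corrcoef(x, y)[0, 1])              # ~0.5 overall

# Restrict to the positive-positive quadrant and re-measure.
mask = (x > 0) & (y > 0)
print(np.corrcoef(x[mask], y[mask])[0, 1])  # well below 0.5
```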

  • What is the speaker's opinion on the use of correlation in IQ studies?

    -The speaker believes that IQ studies are generally either complete or partial nonsense when they rely on correlation, as correlation is not a reliable measure for non-linear relationships.

  • What is the correlation triangle mentioned by the speaker, and what does it represent?

    -The correlation triangle is a concept used by option traders when trading correlation between assets. It captures the idea that A being correlated with B and B being correlated with C does not pin down, and need not even imply, a comparable correlation between A and C, because of cross terms.
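
One way to see this failure of transitivity is a simple construction (an illustrative sketch, not from the lecture): build A and C from a common factor B plus independent noise, so that each correlates with B at about 0.8 while the A-C correlation comes out lower.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
rho = 0.8

b = rng.normal(size=n)
# Build a and c from the common factor b plus independent noise terms,
# so that corr(a, b) and corr(b, c) are both about 0.8.
a = rho * b + np.sqrt(1 - rho**2) * rng.normal(size=n)
c = rho * b + np.sqrt(1 - rho**2) * rng.normal(size=n)

print(np.corrcoef(a, b)[0, 1])  # ~0.8
print(np.corrcoef(b, c)[0, 1])  # ~0.8
print(np.corrcoef(a, c)[0, 1])  # ~0.64 in this construction, not 0.8;
                                # with other cross terms it can be lower still
```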

  • Why does the speaker suggest that visual representation can help in understanding correlation better?

    -The speaker suggests that visual representation, such as graphs, can help because people often understand the implications of data better when they can see it visually, rather than just being given numerical metrics.

  • How does the speaker describe the relationship between mutual information and the Kelly criterion?

    -The speaker describes mutual information as an entropy-based measure that is closely connected to the Kelly criterion because it is based on logs, which indicate how much one would be willing to bet on an outcome based on the information available.

  • What is the formula for mutual information for a Gaussian distribution mentioned in the script?

    -For a (bivariate) Gaussian distribution, mutual information follows directly from the correlation: it equals minus one half times the log of (1 minus the correlation squared), i.e. I = -1/2 * log(1 - rho^2).
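
A short sketch of that closed form, I(X;Y) = -1/2 * log(1 - rho^2) for jointly Gaussian variables, showing how little information small correlations carry:

```python
import numpy as np

def gaussian_mutual_information(rho):
    """Mutual information (in nats) of two jointly Gaussian variables
    with correlation rho: I = -1/2 * log(1 - rho**2)."""
    return -0.5 * np.log(1.0 - rho**2)

for rho in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(f"rho = {rho}:  I = {gaussian_mutual_information(rho):.4f} nats")
# A correlation of 0.25 carries far less than half the information of 0.5,
# and the curve only takes off as rho approaches 1 (where I diverges).
```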

Outlines

00:00

📊 Understanding Correlation and Its Limitations

This paragraph discusses the concept of correlation, emphasizing its common misunderstandings and limitations. The speaker clarifies that correlation does not reflect the dependence between variables, contrary to popular belief. The standard formula for calculating correlation is explained, highlighting how it can be misleading due to its reliance on variance and the mean. The paragraph also points out that correlation fails to capture the true relationship in certain scenarios, such as when variables have a non-linear relationship, as demonstrated with the example of y being equal to x and -x in different ranges. The concept of mutual information is introduced as a more powerful metric for capturing the true nature of dependence between variables.

05:01

🔍 The Misconceptions and Realities of Correlation

The speaker delves into the misconceptions surrounding correlation, explaining that it is not additive and does not necessarily reflect real-world relationships, especially in the presence of non-linearities. The paragraph addresses the issue of subadditivity in correlation, where the sum of local correlations does not equal the overall correlation. It also critiques the use of correlation in IQ studies and the flawed assumption that correlation can be extended through intermediaries (A correlates with B, and B with C, does not imply A correlates with C). The speaker emphasizes the importance of visual representation in understanding correlation and introduces the concept of mutual information as a more informative measure of association.

10:02

📈 Visualizing Correlation and Mutual Information

In this paragraph, the focus shifts to visualizing the concept of correlation and mutual information. The speaker uses a Monte Carlo simulation of a multivariate Gaussian to show that scatter plots with correlations of 0, 0.5, and even 0.8 are hard to tell apart by eye, and that increasing the correlation coefficient does not convey proportionally more information. The paragraph illustrates the difference between Pearson correlation and mutual information, showing that the latter provides a more accurate representation of the information content in a relationship. The speaker concludes by advocating for the use of mutual information over Pearson correlation, especially in non-Gaussian scenarios, and emphasizes the importance of understanding the visual representation of data.
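
A sketch of that kind of Monte Carlo comparison (assuming NumPy and Matplotlib; the figure layout is illustrative, not the speaker's own):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
rhos = [0.0, 0.5, 0.8]

fig, axes = plt.subplots(1, len(rhos), figsize=(12, 4), sharex=True, sharey=True)
for ax, rho in zip(axes, rhos):
    cov = [[1.0, rho], [rho, 1.0]]
    pts = rng.multivariate_normal([0.0, 0.0], cov, size=2_000)
    ax.scatter(pts[:, 0], pts[:, 1], s=4, alpha=0.4)
    mi = -0.5 * np.log(1.0 - rho**2)          # Gaussian mutual information, in nats
    ax.set_title(f"rho = {rho}, MI = {mi:.2f}")

plt.tight_layout()
plt.show()
```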

Keywords

💡Correlation

Correlation is a statistical measure that expresses the extent to which two variables are linearly related. In the video, the speaker explains that while many people think they understand correlation, it is often misunderstood. The script uses the formula for Pearson correlation to illustrate its computation and points out that it does not necessarily reflect the true dependence between variables, as shown with the example where the correlation is zero but the dependence is high.

💡Non-linearity

Non-linearity refers to the property of a mathematical relationship or function where a change in the input does not result in a proportional change in the output. The video emphasizes that correlation, especially Pearson correlation, is not well-suited for understanding non-linear relationships, which are common in real-world scenarios. The speaker uses the concept of non-linearity to critique the limitations of correlation as a measure of dependence.

💡Expectation

In statistics, expectation is another term for the mean or average of a random variable. The script mentions the computation of correlation as being similar to the expectation of the product of the deviations of two variables from their respective means. This is a fundamental concept in understanding how correlation quantifies the strength and direction of a linear relationship.

💡Variance

Variance is a measure of the dispersion or spread of a set of data points. In the context of the video, variance appears under the square root in the denominator of the correlation formula, normalizing the covariance between two variables. The speaker points out that, because products of deviations enter the computation, correlation, like standard deviation, gives greater weight to large co-movements, which may misrepresent the relationship in some cases.

💡Mutual Information

Mutual information is a statistical measure used to quantify the amount of information obtained about one random variable through observing another random variable. The video contrasts mutual information with correlation, suggesting that it provides a more accurate representation of the dependence between variables, especially in cases of non-linear relationships.
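
Since the closed-form link to correlation only holds in the Gaussian case, mutual information must otherwise be estimated from the data itself. Below is a crude histogram-based (plug-in) estimator, offered as an assumption-laden sketch rather than a recommended method (bin counts and bias corrections matter in practice):

```python
import numpy as np

def mutual_information_hist(x, y, bins=30):
    """Crude plug-in estimate of I(X;Y) in nats from a 2-D histogram."""
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = counts / counts.sum()                 # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)         # marginal of x (column vector)
    py = pxy.sum(axis=0, keepdims=True)         # marginal of y (row vector)
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))

# Example: the inverted-V relationship, which has zero Pearson correlation.
rng = np.random.default_rng(5)
x = rng.normal(size=50_000)
y = -np.abs(x)
print(np.corrcoef(x, y)[0, 1])         # ~0
print(mutual_information_hist(x, y))   # clearly positive (capped by the binning)
```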

💡Dependence

Dependence in statistics refers to the relationship between two variables where changes in one variable are associated with changes in the other. The video script argues that correlation does not always reflect the true dependence between variables, and it uses the example of a situation where variables have zero correlation but high dependence.

💡Standard Deviation

Standard deviation is a measure that is used to quantify the amount of variation or dispersion of a set of values. In the script, standard deviation is related to the concept of correlation as it is part of the normalization process in the computation of correlation, helping to adjust for the scale of the variables.

💡Additivity

Additivity in the context of the video refers to whether correlations measured on parts of the data, such as the quadrants of a scatter plot, recombine (after weighting by their probabilities) into the overall correlation. The speaker explains that correlation is not additive in this sense; it is subadditive in his example, which is a limitation of the measure.

💡IQ Studies

IQ Studies in the video are mentioned as an example of where correlation can be misused or misinterpreted. The speaker criticizes such studies for relying on correlation as a measure without considering the non-linearities and complexities of intelligence and performance.

💡Monte Carlo

Monte Carlo is a method of computational simulation that relies on repeated random sampling to obtain numerical results. In the script, the speaker uses Monte Carlo simulation to generate random variables and visually demonstrate the concept of correlation, showing how the distribution of points can affect the perceived correlation.

💡Multivariate Gaussian

A multivariate Gaussian, or multivariate normal distribution, is a generalization of the one-dimensional normal distribution to higher dimensions. The video samples from a multivariate Gaussian at several correlation values to show that low and moderate correlations are visually almost indistinguishable from zero correlation, underscoring how little information such values carry.

Highlights

Correlation and non-linearity are often misunderstood and can be dangerous concepts.

Correlation does not reflect the dependence between variables, contrary to common belief.

The formula for correlation involves the expectation of the product of deviations from the mean of x and y, divided by the product of their standard deviations.

Correlation can be misrepresentative, similar to standard deviation, because multiplying deviations gives large associations a disproportionately big weight and small ones a small weight.

An example is given where correlation is zero, but the dependence between variables is significant.

Mutual information is introduced as a more powerful metric than correlation for assessing dependence.

Correlation works well in simple linear models but fails in more complex, non-linear scenarios.
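
One way to see the "simple linear model" point: after standardizing both variables (mean zero, standard deviation one), the least-squares slope equals the Pearson correlation. A minimal sketch, assuming NumPy and an arbitrary slope and noise level:

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=20_000)
y = 1.0 + 0.6 * x + rng.normal(scale=0.8, size=20_000)   # y = a + b*x + noise

# Standardize both variables: subtract the mean, divide by the standard deviation.
zx = (x - x.mean()) / x.std()
zy = (y - y.mean()) / y.std()

slope = np.polyfit(zx, zy, 1)[0]   # least-squares slope on standardized data
print(slope)                       # equals ...
print(np.corrcoef(x, y)[0, 1])     # ... the Pearson correlation of x and y
```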

The concept of Pearson correlation is explained, and its limitations are discussed.

Correlation is not additive, as demonstrated with an example of local correlations in quadrants.

The transcript discusses the misuse of correlation in finance, particularly with options and market relationships.

IQ studies are criticized for their reliance on correlation, which may not provide meaningful information.

The transcript explains the fallacy of assuming transitive correlation: A correlating with B and B correlating with C does not imply that A correlates with C, nor that it does so to the same degree.

The perception of correlation values is often incorrect, with small values like 0.25 being closer to zero than to 0.5.

Visual representation of correlation can be more informative than numerical values alone.

Mutual information is highlighted as a metric that can effectively capture non-linear relationships.

The transcript concludes with a warning against trusting correlation without visual or graphical analysis.

Transcripts

[00:02] Friends, welcome back. We're going to talk today about correlation. [In Italian:] Today we are talking about correlation; nobody really understands it. Nobody really knows what it is. I mean, not nobody: few people know what correlation, and the non-linearity, really mean. It is a very difficult, very dangerous concept; a lot of people think they understand it, and they don't know that they don't know what it means.

[00:40] So let's start very simply with what it means. I have x: x1 through xn, and y: y1 through yn, as usual. When I tell you what the correlation between x and y is, does it reflect the dependence? No, it does not reflect the dependence between x and y. The way the correlation is computed is pretty much the expectation of (x minus the mean of x) times (y minus the mean of y), over the square root of the variance of x times the variance of y. That's the measure for correlation.

[01:35] And that measure, as we saw with standard deviation (if you want to redo a lecture, look at the one on standard deviation; it's the same concept), comes from multiplying things. As you see, we're multiplying, say, normalized x by normalized y, so basically it's a normalized covariance of x and y. When you multiply things, visibly you give a bigger weight to the cases where the associations are large and a smaller weight to the smaller ones. So, same problem as we have with standard deviation: misrepresentative in some cases.

[02:18] Now, you would think that correlation reflects dependence. I'm going to show you a situation where, below zero, y equals x, and above zero, y equals minus x, and this is x with no noise: 100 percent, if you know x you will know y. Okay, guess what: what is the correlation? Rho of x and y equals zero. What's the dependence between these two? Huge. So actually, when we use a more powerful metric called mutual information and apply it to this, we see it's infinite, as certain as things can be. So this is the first problem: correlation doesn't reflect what you think it does; it doesn't reflect dependence.

[03:28] Where does correlation work? In very simple models. Very simple: say you have y equals a, an intercept, plus b times x, plus some noise. This number here, the beta, is constant. So if you normalize and then you draw this, x against y, this b will be the correlation. That's basically it. So why would b be the correlation, and is it any good? Tell me what it doesn't give you: does it describe how much noise there is? It does, because we've normalized. Normalized means you use (x minus x-bar) over sigma of x, and the same for y; in other words, you're scaling by the standard deviation after subtracting the mean, so you get mean zero and standard deviation of one. So as this angle becomes bigger, the ratio of information to noise increases, and as it becomes smaller the noise stays the same; that's pretty much the intuition. So correlation works when you have a linear relationship. And we're talking about Pearson correlation; the other ones are actually worse.

[05:16] Another problem with correlation is that it's not additive. Let's take x and y and generate noise with a correlation of 0.5: rho of x and y equals 0.5. What local correlation will you get in every quadrant, say x positive and y positive? We're normalizing. This one is about 0.26, 0.26; this one is about 0.16, 0.16. So if you add them up, or rather if you probabilize them, you give the probability of being in each quadrant (thirty-eight percent, I think, and then half of one minus the sum), whatever; you get the probabilities of each quadrant, you weight them, and you realize correlation is subadditive in that case. In other words, you have a correlation of 50 percent overall, and you'd get 0.26 in the positive quadrant and 0.26 in the negative one. So it's weird, because the metric is not designed to be used outside a very simple, narrow model, and it does not generalize to things in real life that have non-linearities, as we saw with that payoff.

[06:46] Incidentally, that payoff, the inverted V, is the payoff of a short straddle. So we in finance know that you cannot have a correlation between options, an option P&L, and the market, because they are based on second-order terms. And a positive V, which is a long straddle, is the same thing: it's not correlated to the market, but it depends 100 percent on the market. So you can't really do partial correlation.

[07:16] And of course you understand now why IQ studies are generally either complete BS or partial BS: complete BS in the sense that you can't use them for any information, or partial in the sense that sometimes you can use them in the negative domain. Because, as you see, let's say I have an IQ metric that works; this axis is performance, this one is IQ, and it doesn't work in the positive domain. You have non-linearities, and if you have non-linearity, the rho is uninformative. They make, incidentally, a lot of mistakes in the IQ business; I'll do that in a more complicated, more technical lecture, since this one is meant to be non-technical. But let me explain another thing that they do: they use a proxy for correlation. That A correlates to B and B correlates to C doesn't mean that A correlates to C, and if it does, it's not the same: this one can be 0.8 and that one can be 0.8, and it doesn't mean the third is going to be 0.8, because you have cross terms. This we learn as option traders when we start doing correlation trades, correlation triangles; see my book Dynamic Hedging.

[08:44] Now, quickly, I'm going to show you the problem with correlation as it sits in the mind of people. Say you have correlations of 0.25, 0, and 0.5. You have the feeling that a correlation of 0.25 is half of the distance between a correlation of 0 and a correlation of 0.5. It's not, of course. A correlation of 0.25 is much closer to 0, which just tells you that a correlation of 0.1, 0.15, even 0.25, has no information in it. It looks good, but it doesn't have any information, and I'm going to show you how you can capture this point visually.

[09:36] Next. Okay, so now let's look at what a correlation means, what a numerical value implies as far as the real information, and you can see information with the naked eye. A bunch of researchers figured out that economists couldn't interpret their own numbers, but when you show them graphs they can interpret the implication of the graph. I mean, economists should be normal: people understand things a lot better with their eyes than when they're given metrics and they don't even know how these metrics were derived.

[10:16] So I'm showing here x as a random variable and y as the same kind of random variable, and we generate, by Monte Carlo (let me do it in front of you), joint x, y points, so you put them in two-dimensional space. And look here: the correlation is zero, pretty much. Things are spread out; you have more concentration in the middle because it's a Gaussian, and incidentally, I forgot to mention, a multivariate Gaussian with rho of zero. So you go to a correlation of a half: not much better than zero, not much better. If I showed them to you separately, you probably wouldn't be able to tell which one is which. Point eight: not much different from one half. Okay, so maybe you need to square it. If we square, it doesn't get much better; it doesn't get better at all. Look at zero, and at the distance between three quarters and one. And certainly you can't have a correlation of one in a multivariate Gaussian, that blows up, so you approximate one. Three quarters is a lot closer to zero than it is to one.

[11:32] And only one measure can fix the problem: mutual information. To go back to my roots as a trader, mutual information is an entropy-based measure, very, very connected to the Kelly criterion, because it's based on logs, and it tells you how much you would be willing to bet on y knowing x, how many dollars. So if you have certainty, of course, you can bet all you can; if you have some degree of uncertainty, you bet less, and so on. Pretty much, how much you bet is a distance, and that works beautifully for genetics, much better than correlation.

[12:21] For a Gaussian there's a one-to-one map. That metric is called mutual information, and you can see here the mutual information in green and how it rises; in other words, you see the non-linearity. So 0.5 is a lot closer to 0 than it is to 1, 0.25 is a lot closer to 0 than it is to 0.5, and so on, and you can see it with the green line. I cheated to eliminate the infinities, so you can play with it, but just as a concept it gives you a notion of the relative information content that you're getting via the corresponding correlation. So for a Gaussian, as I said, it's a simple formula: minus one half times the log of one minus rho squared. But for non-Gaussian distributions you have to recompute it one by one.

[13:34] So I think I'm done with this lecture on correlation. Remember one thing: correlation doesn't mean anything unless you've seen a graph. Do not trust social scientists dealing with correlation; only trust people who have derived it, or derived some metric to describe an association. And the other metrics, Kendall's and all these metrics derived from the original Pearson cross-moment correlation, actually give you worse information. So thank you, and have a good day.
