MINI LESSON 7: P-Values and P-Value Hacking: a simplified lecture.

N N Taleb's Probability Moocs
22 May 2021 | 09:25

Summary

TL;DR: In this lecture, the speaker expresses skepticism about the p-value, a statistical concept often used in hypothesis testing. They argue that p-values lack a solid probabilistic basis and are frequently misunderstood. The speaker emphasizes that metrics are often stochastic and can be influenced by survivorship bias, leading to skewed results. They explain that the p-value is itself stochastic, so an observed p-value may sit far from its true value. The lecture warns against the misuse of p-values and suggests considering alternative statistical methods, particularly in fields like psychology where sample sizes are often small.

Takeaways

  • The speaker is skeptical of the p-value, arguing that it lacks a solid probabilistic basis despite its widespread use.
  • Metrics are often stochastic variables: they vary from sample to sample, especially when the sample size is small.
  • Stochastic variables invite survivorship bias: only the most successful outcomes are observed, which skews the picture of the variable's distribution.
  • The p-value is the probability, under the null hypothesis, of observing a statistic as extreme as, or more extreme than, the one observed.
  • The p-value is itself stochastic and its distribution is skewed: most observed values fall below its true mean.
  • The problem is compounded because a p-value is reported without the sample size 'n', which is absorbed into the test statistic.
  • The speaker warns against 'gaming the metric': running several experiments and keeping only the most favorable (smallest) p-value gives misleading results.
  • The best p-value across repeated experiments sits well below the true p-value, so repetition makes a result look more significant than it is.
  • Further discussion can be found in chapter 19 of the speaker's book 'Statistical Consequences of Fat Tails'.
  • The speaker advises caution when using p-values and suggests considering alternative methods for statistical analysis.
  • The preference for p-values among psychologists is critiqued: sample sizes in psychology tend to be small, whereas in medicine they tend to be large.
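
The skew described in these takeaways can be checked with a small simulation. The sketch below is an illustration under stated assumptions, not the speaker's own derivation: it assumes normally distributed data with a known standard deviation of 1, a one-sided z-test against a mean of zero, and hypothetical values for the true mean (0.5) and the sample size (10). All function names are made up for this example.

```python
import math
import random
import statistics


def one_sided_p(sample, sigma=1.0):
    """One-sided p-value of a z-test against a mean of zero (sigma assumed known)."""
    n = len(sample)
    z = statistics.fmean(sample) / (sigma / math.sqrt(n))
    # Survival function of the standard normal: P(Z > z)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def simulate_p_values(true_mean=0.5, n=10, experiments=5000, seed=0):
    """Repeat the same experiment many times and collect the p-values."""
    rng = random.Random(seed)
    return [
        one_sided_p([rng.gauss(true_mean, 1.0) for _ in range(n)])
        for _ in range(experiments)
    ]


p_values = simulate_p_values()
mean_p = statistics.fmean(p_values)
median_p = statistics.median(p_values)
share_below_mean = sum(p < mean_p for p in p_values) / len(p_values)

print(f"mean p-value:        {mean_p:.3f}")
print(f"median p-value:      {median_p:.3f}")
print(f"share below the mean: {share_below_mean:.1%}")
```

Under these assumptions the median p-value comes out below the mean, and well over half of the simulated experiments report a p-value smaller than the average one.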

Q & A

  • What is the speaker's opinion on p-values?

    -The speaker is not in favor of p-values, considering them to be a concept without a strong probabilistic basis and not very solid, despite their widespread use.

  • What are the two central points the speaker wants to recap from the session on correlation and metrification?

    -The first point is that metrics are often stochastic variables that converge to something by the law of large numbers but can vary in sample, especially with small sample sizes. The second point is the presence of survivorship bias in stochastic variables, where one may only see the upper bound of outcomes.

  • What does the speaker mean by 'hacking of variables'?

    -The speaker refers to the manipulation of stochastic variables to achieve the upper bound of results, which can be misleading as the distribution of the maximum is different from the distribution of the variable itself.

  • What is the p-value and how is it calculated in the context of the speaker's explanation?

    -The p-value is the probability of observing a statistic as extreme as, or more extreme than, the observed value, assuming the null hypothesis is true. In the lecture it is the probability of exceeding the observed z-score, where z is the sample mean minus the hypothesized value (zero in the example), divided by the standard deviation over the square root of n.
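
Written out, the statistic and p-value described in this answer look as follows. The symbols are chosen for this sketch: \mu_0 is the hypothesized mean (zero in the lecture), \sigma the standard deviation of the observations, and \Phi the standard normal CDF. Using the normal as the reference distribution is an assumption made here for simplicity.

```latex
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}},
\qquad
p = \Pr\left(Z \ge z \mid H_0\right) = 1 - \Phi(z)
```

The sample size n enters only through z, which is why a reported p-value carries no separate indication of how many observations stand behind it.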

  • Why does the speaker believe p-values are problematic?

    -The speaker argues that p-values are problematic because they are stochastic and are reported without the sample size (n), which is absorbed into the statistic. Readers therefore cannot judge how much the value would vary from one sample to the next.

  • What is the issue with p-values being stochastic according to the speaker?

    -The issue is that p-values themselves can vary and are skewed, with most values falling below their true mean, which can lead to incorrect conclusions about the significance of the results.

  • What is the 'survival function' mentioned by the speaker?

    -The survival function is the probability that a random variable is greater than a certain value. In the context of p-values, it refers to the probability of exceeding a certain threshold under the assumption of a null hypothesis.

  • Why does the speaker suggest that the p-value should be considerably smaller than 0.01?

    -The speaker suggests this because a result whose true p-value is much higher (e.g., 0.11) can readily produce an observed p-value of 0.01 after a few repeated experiments, so a reported 0.01 does not mean what it appears to mean. To resist gaming, the threshold would have to be more than an order of magnitude smaller.

  • What does the speaker imply about the use of p-values in psychology?

    -The speaker implies that psychologists may prefer p-values because they allow for smaller sample sizes, which might be easier to manage in a college campus setting but can lead to flawed quantitative analysis.

  • What advice does the speaker give regarding the use of p-values?

    -The speaker advises to be cautious with p-values and to consider alternative methods of analysis to avoid the pitfalls associated with their stochastic nature and potential for misinterpretation.

Outlines

00:00

Critique of P-Value Reliability

The speaker expresses dissatisfaction with the concept of p-values, arguing that they lack a solid probabilistic foundation despite widespread use. They emphasize that metrics are often stochastic variables, meaning they converge to a value by the law of large numbers but can vary with different samples, especially when the sample size is small. The speaker also discusses the issue of survivorship bias, where only the most successful outcomes are observed, leading to a skewed perception of events. They highlight that p-values are stochastic and that the distribution of the maximum values is different from the variable itself, which is often misunderstood. The p-value is explained as the probability of observing a statistic as extreme as, or more extreme than, the one observed, given that the null hypothesis is true. The speaker points out that the scaling by the square root of n in p-value calculations does not eliminate the problem of stochasticity.

05:02

The Problem with P-Values and Experimentation

This section covers the problems associated with p-values, noting that they are inherently stochastic and heavily skewed, with most observed values falling below their true mean. The speaker's example: if the true p-value is 0.11, about 25% of observed p-values fall below 0.01 and about 53% fall below 0.05. Things get worse when the metric is gamed. An experimenter can repeat the experiment and keep the best result, and with four or five attempts a p-value of 0.01 is likely even though the true value is 0.11. The speaker argues that this discrepancy makes the p-value a poor metric for statistical significance, and that an ungameable threshold would have to be far smaller than the ones in use. They conclude by cautioning against reliance on p-values and suggesting alternative methods, pointing out that psychologists seem to favor p-values, possibly because sample sizes in their studies are small.
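
The figures in this outline can be approximated in closed form. The sketch below is a simplified model, not the derivation in the speaker's book: it assumes a one-sided z-test in which the statistic is normal with unit variance and a mean shift chosen so that the average p-value is 0.11. The function names and the normal model are assumptions made for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # reference distribution under the null hypothesis


def shift_for_expected_p(expected_p):
    """Mean shift d of the z statistic for which the average p-value equals expected_p.

    With z ~ N(d, 1) and p = 1 - Phi(z), the expectation is E[p] = Phi(-d / sqrt(2)).
    """
    return -math.sqrt(2.0) * STD_NORMAL.inv_cdf(expected_p)


def prob_p_below(alpha, shift):
    """P(p < alpha) in one experiment: p < alpha exactly when z > Phi^{-1}(1 - alpha)."""
    critical_z = STD_NORMAL.inv_cdf(1.0 - alpha)
    return STD_NORMAL.cdf(shift - critical_z)


def prob_best_of_m_below(alpha, shift, m):
    """P(the smallest of m independent p-values is below alpha)."""
    return 1.0 - (1.0 - prob_p_below(alpha, shift)) ** m


shift = shift_for_expected_p(0.11)

for alpha in (0.05, 0.01):
    print(f"P(p < {alpha}) in one experiment: {prob_p_below(alpha, shift):.3f}")

for m in (1, 2, 4, 5):
    print(f"P(best of {m} < 0.01): {prob_best_of_m_below(0.01, shift, m):.3f}")
```

In this simplified model a little over half of single experiments land below 0.05 and a bit over a quarter land below 0.01, which is in the neighbourhood of the 53% and 25% quoted in the lecture. Keeping the best of five attempts pushes the chance of reporting a p-value under 0.01 to well above three in four.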

Keywords

p-value

The p-value is the probability, computed under the null hypothesis, of obtaining a result at least as extreme as the one observed. In the video, the speaker expresses skepticism about the p-value, arguing that it lacks a strong probabilistic basis and is often misunderstood or misused. The script mentions that the p-value is stochastic, meaning it can vary with different samples, and criticizes the fact that it is reported without the sample size (n).

stochastic variables

Stochastic variables, also known as random variables, are variables that can take on different values in different trials or experiments. In the context of the video, the speaker uses the term to discuss how metrics often rely on these variables, which can lead to variability in results. The script emphasizes that this variability can be problematic, especially when sample sizes are small.

law of large numbers

The law of large numbers is a fundamental statistical principle that states as the number of trials increases, the average of the results converges to the expected value. The video script uses this concept to discuss how stochastic variables behave over many observations, suggesting that they tend to become more predictable as the sample size grows.
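
A short simulation makes the convergence concrete. This is a sketch with made-up parameters (standard normal data, 500 repetitions per sample size, a fixed seed), and the function name is hypothetical. It measures how much the sample mean still varies from one sample to the next at different sample sizes.

```python
import random
import statistics


def spread_of_sample_mean(n, trials=500, seed=1):
    """Standard deviation of the sample mean across repeated samples of size n."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(trials)
    ]
    return statistics.pstdev(means)


spreads = {n: spread_of_sample_mean(n) for n in (10, 100, 1000)}
for n, spread in spreads.items():
    print(f"n = {n:>4}: spread of the sample mean = {spread:.3f}")
```

The spread shrinks roughly like one over the square root of n, so at small n the same metric can give visibly different answers from one sample to the next.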

survivorship bias

Survivorship bias is the logical error of focusing on the survivors of a process and overlooking those who did not survive. In the video, the speaker uses the concept to illustrate how stochastic variables can be manipulated or misinterpreted, particularly in the context of observing only successful outcomes, such as profitable traders, and ignoring the failures.

correlation

Correlation is a statistical measure that expresses the extent to which two variables are linearly related. The script discusses the use of correlation in the context of stochastic variables, where small sample sizes can lead to unreliable measures of correlation, and the speaker criticizes the reliance on p-values to determine statistical significance in these cases.

statistical significance

A result is called statistically significant when its p-value falls below a chosen threshold (such as 0.05), meaning the result would be unlikely if the null hypothesis were true. In the video, the speaker discusses how the p-value is used to decide whether a result is statistically significant, but criticizes this approach because p-values are stochastic and are reported without the sample size.

z-score

The z-score is a measure of how many standard deviations an element is from the mean. In the script, the speaker uses the z-score as an example of a statistic that is used in hypothesis testing, where the p-value is calculated based on the probability of observing a z-score as extreme as, or more extreme than, the one calculated from the sample data.

hacking of variables

In the context of the video, 'hacking of variables' refers to the manipulation or exploitation of stochastic variables to achieve a desired outcome, such as a significant p-value. The speaker warns against this practice, as it can lead to misleading results and a misinterpretation of the data.

maximum distribution

The maximum distribution refers to the distribution of the maximum values obtained from a set of observations. The video script discusses how the distribution of the maximum can be different from the distribution of the variable itself, leading to a skewed understanding of the data when only maximum values are considered.
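
For independent draws from the same distribution with CDF F, the maximum of m draws has CDF F(x)^m, a standard result on order statistics. The sketch below applies it to a standard normal variable; the threshold of 1.645 and the function name are choices made for this example.

```python
from statistics import NormalDist


def prob_max_exceeds(threshold, m, dist=NormalDist()):
    """P(max of m independent draws > threshold) = 1 - F(threshold) ** m."""
    return 1.0 - dist.cdf(threshold) ** m


# 1.645 is exceeded by a single standard normal draw about 5% of the time
for m in (1, 5, 20):
    print(f"m = {m:>2}: P(max > 1.645) = {prob_max_exceeds(1.645, m):.3f}")
```

A level that a single draw rarely reaches becomes a likely outcome once only the best of several draws is reported, which is why the maximum must not be read as if it were a typical observation.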

fat tails

Fat tails in a probability distribution indicate that extreme values are more likely than would be the case under a normal distribution. The term comes up through the title of the speaker's book 'Statistical Consequences of Fat Tails', whose chapter 19 contains the paper on p-values discussed in the lecture.

experimentation

Experimentation in the video refers to the process of conducting tests or studies to gather data. The speaker criticizes the practice of running multiple experiments and keeping only the best (smallest) p-value, which can lead to a false sense of statistical significance due to the stochastic nature of p-values.

Highlights

The speaker expresses dissatisfaction with the concept of p-value, considering it not solid and lacking a strong probabilistic basis despite its widespread use.

Metrics are discussed as stochastic variables: they converge to a value by the law of large numbers (though sometimes they do not), but can vary from sample to sample, especially with small sample sizes.

The concept of survivorship bias is introduced, where only the successful outcomes are observed, leading to skewed perceptions.

The speaker warns of the potential for 'hacking' stochastic variables by repeatedly trying to achieve a desired outcome, which can distort results.

The distribution of the maximum value in a stochastic process is different from the distribution of the variable itself.

A detailed explanation of what p-value is, including its calculation and the assumptions behind it.

The p-value is revealed to be stochastic, with a distribution that often has most values below its true mean.

The probability of a p-value being less than certain thresholds (e.g., 0.01 or 0.05) is discussed, highlighting the skewness of the distribution.

The speaker criticizes the use of p-values in experiments, suggesting that they can be easily manipulated to achieve desired results.

The distribution of the best (smallest) p-value across multiple experiments is discussed, and how it can lead to misleading conclusions.

A recommendation against relying solely on p-values, suggesting that alternative statistical methods should be considered.

The speaker notes that p-values are particularly favored in psychology due to the ease of use with smaller sample sizes.

A critique of the quantitative approach in psychology, suggesting that the reliance on p-values is flawed.

The speaker emphasizes the need for caution when using p-values and the importance of understanding their stochastic nature.

The transcript concludes with a general warning against the misuse of p-values and a call for more rigorous statistical practices.

The speaker provides a chapter reference for further reading on the topic, indicating a comprehensive discussion in their book.

Transcripts

play00:01

friends hello again

play00:04

today i finally talk about p-value

play00:07

and to be honest i'm not crazy about

play00:10

uh this lecture so if you don't know

play00:12

what p-value means

play00:14

forget about it don't just don't don't

play00:17

follow

play00:18

because it's a concept that is not very

play00:21

solid

play00:22

it doesn't have any strong probabilistic

play00:25

basis

play00:26

and yet is used uh abundantly

play00:29

let's recap from uh the session on

play00:32

correlation

play00:33

hacking uh and and

play00:36

and the metrification of things let's

play00:39

recap

play00:40

and focus on the two central points

play00:51

metrics are often stochastic variables

play00:54

in other words the random variables

play00:55

that by the law of large numbers

play00:57

converge to something

play00:59

sometimes they don't

play01:02

but they're not deterministic

play01:05

so in sample you may get a different

play01:08

result

play01:10

each time you try if particularly if

play01:13

your

play01:14

n observation is not very large that's

play01:17

the first point

play01:19

the second point is whenever something

play01:21

is stochastic

play01:23

you have what i

play01:26

call in most of the books uh this

play01:30

cemetery of silent evidence in other

play01:32

words there are a lot of things you

play01:33

don't see

play01:35

the survivorship bias if you will

play01:38

we saw it in fooled by randomness when

play01:40

you have a collection of traders

play01:43

you only see the upper bound people

play01:45

making money

play01:46

likewise as we saw in the black swan

play01:49

how you tend to see events that were

play01:52

recorded

play01:53

and have a bias and again this is the

play01:56

upper bound

play01:57

so the second point to remember is that

play02:00

you have hacking of these variables

play02:03

because someone could keep trying

play02:05

when something is stochastic to get the

play02:08

upper bound

play02:09

and the upper bound on average

play02:12

the distribution of the maximum is

play02:14

different from the distribution of the

play02:15

variable

play02:16

itself so people do not know

play02:21

that p-value is stochastic or at least

play02:25

from what i've seen since i had to write

play02:27

a paper on it

play02:29

the paper is buried in here chapter 19

play02:32

statistical consequences of fat tails

play02:35

and you can find it on arxiv because

play02:37

everything i write

play02:38

must be available for free for those who

play02:42

like you know

play02:45

electronic versions so

play02:49

what is the problem when we use

play02:52

correlation

play02:53

say i have x and y

play02:57

two vectors and n observations and

play03:00

n is small you know you're gonna get a

play03:02

stochastic number

play03:04

when n becomes large compresses you

play03:07

reveal your n so people can know what

play03:10

you know whether your

play03:11

correlation is statistically significant

play03:13

or not

play03:14

so when you use correlation okay you

play03:17

provide people with an n

play03:19

the problem with the p-value is that it

play03:21

is not

play03:23

provided with an n because supposedly

play03:26

the

play03:26

n comes out of it hence the problem

play03:29

we're going to see now

play03:30

let me start from the beginning what is

play03:32

p value

play03:34

i have observations x 1 to x n

play03:37

and let's say i want to see if the mean

play03:40

the mean x bar if the mean

play03:43

is uh different

play03:46

from say zero i'm gonna simplify

play03:49

okay you can use something different

play03:53

than zero let me simplify

play03:54

so the statistic is going to be x bar

play03:58

minus zero over sigma the

play04:01

standard deviation of the observation

play04:03

square root of n

play04:06

okay so

play04:11

that metric let's call it the z okay

play04:15

the z is going to be the variable under

play04:19

consideration

play04:20

and the p value is probability

play04:23

okay that

play04:26

z higher than z

play04:29

in other words probability of exceeding

play04:31

z

play04:33

now you have the illusion that square

play04:36

root of n

play04:37

is a scaling here that takes out the

play04:40

subtleties of the problem but in fact it

play04:42

does not

play04:43

i'm going to see what the result is

play04:46

going to be like

play04:47

in fact what we're considering is the

play04:49

distribution

play04:51

assuming the mean of zero okay

play04:55

of z higher than that number

play04:58

and that is the p-value okay

play05:02

so to re-simplify

play05:06

i'm assuming that that i'm taking

play05:10

the exceedance probability or so-called

play05:13

survival

play05:14

uh survivor survival function

play05:17

the probability exceeding a certain

play05:18

number using the distribution as the

play05:21

mean was zero

play05:22

and that number can be 0.05 0.06

play05:26

0.1 uh

play05:30

0 0 1 etc now

play05:33

what is the problem it turns out

play05:38

that the probability p value

play05:42

probability of exceeding z is itself

play05:45

stochastic

play05:50

so what do i mean by stochastic this is

play05:53

this distribution

play05:55

and it's going to look like this

play05:58

this is this true mean

play06:02

mean value and it's going to have

play06:05

observations

play06:05

below the mean in fact most observations

play06:08

are gonna be below the mean

play06:11

so and to give you actually the the

play06:14

difficulties of this and you can get it

play06:17

in the book it's page

play06:18

339 chapter

play06:25

chapter 19 okay in my book or download

play06:28

it from arxiv

play06:30

let's say the mean is about

play06:33

0.11 the true p-value is 0.11 if

play06:35

you do it a million times

play06:36

on the same ensemble of things you get

play06:38

point eleven

play06:40

the probability the p-value probability

play06:43

that a p-value

play06:45

less than 0.01

play06:48

is 25 percent and we've got a p-value

play06:51

less than

play06:52

0.05 is 53 percent

play06:58

so part one just discussed

play07:02

p-value is stochastic it tends to be so

play07:05

skewed that most value will be below the

play07:07

true mean and if the true mean is 0.11

play07:11

you're going to have more than half the

play07:12

observation below 0.05

play07:15

starting with that things get worse how

play07:18

how do people game a metric

play07:22

okay you run experiments

play07:28

experiment one and you take the maximum

play07:33

the maximum okay

play07:36

so the distribution of the maximum is

play07:39

gonna be lower

play07:40

you see so odds are that if you do four

play07:42

or five times

play07:44

you're going to get a p-value of 0.01

play07:46

with something

play07:49

with a true p-value point 11.

play07:52

now you may tell me point 11 is better

play07:54

than random i say yes

play07:56

but your 0.01 is not 0.01

play08:00

which is why for it to not be gamed

play08:04

the p-value should be considerably

play08:06

smaller

play08:08

than .01 so we're off by more than an order of

play08:12

magnitude

play08:13

so let me repeat

play08:17

p-value is a metric that metric

play08:20

is not thought as being stochastic but

play08:24

it is

play08:24

stochastic and the fact that the n

play08:28

number of observation comes out

play08:30

of the equation is uh

play08:34

you know what makes it a bad bad metric

play08:37

in fact not only it is stochastic but

play08:39

it's very stochastic

play08:41

as you can see most of the observations

play08:43

fall below its true mean

play08:45

so beware when you do p-values try to do

play08:48

something else

play08:49

and it looks like the only people you

play08:51

know who like

play08:53

p-value are psychologists because

play08:56

they avoid having to do homework by

play08:58

having a larger uh

play09:00

uh n in in in medicine n's tend to be

play09:03

large

play09:04

uh and in psychology tend to be small

play09:06

usually college people on campus first

play09:08

of all i think

play09:10

psychology has a bigger problem p-value

play09:11

is pretty much the quantitative approach

play09:14

to it is severely flaws flawed so

play09:17

thank you for listening to me and have a

play09:19

nice whatever it's

play09:20

so i think it's the weekend here it's a

play09:22

great great day thanks

play09:24

have a good day thanks bye


Related Tags
P-Value, Statistics, Research, Correlation, Stochastic, Significance, Survivorship Bias, Law of Large Numbers, Random Variables, Statistical Flaw, Critical Analysis