StatQuest: One or Two Tailed P-Values

StatQuest with Josh Starmer
24 Apr 2017 · 07:05

Summary

TLDR: This StatQuest, brought to you by the Genetics Department at UNC Chapel Hill, covers one-tailed vs. two-tailed tests in statistics. Using a small cancer treatment trial as an example, the video shows a one-tailed test giving a p-value of 0.03, which is below the usual 0.05 cutoff, while a two-tailed test on the same data gives 0.06, which is not. It stresses that the test type must be chosen before the experiment, because switching after seeing the data raises the false positive rate. The video concludes that when you have a choice, you should use a two-tailed test so that you learn both sides of the story.

Takeaways

  • 📊 One-tailed tests focus on whether one treatment is better than another, while two-tailed tests evaluate if there is any difference, either better or worse.
  • 🔬 In clinical trials, it's crucial to decide on the type of test and p-value threshold before conducting the experiment to avoid bias.
  • 📉 A one-tailed test may produce a smaller p-value because it doesn't account for the possibility of the new treatment being worse.
  • ⚖️ Good statistical practice involves using a two-tailed test to get a complete picture of the treatment's effectiveness.
  • 🛑 Waiting to choose the type of test until after seeing the data can lead to p-hacking and false positives.
  • 🔍 A false positive occurs when a test indicates a significant result when there is none, which can be influenced by the type of test used.
  • 📈 Switching to a one-tailed test after seeing favorable data raised the rate of reported false positives from 5% to about 8% in the video's simulation.
  • 💡 Always aim to understand both sides of the data story, not just the side that seems favorable.
  • 🔎 Some statistical tests do not offer a choice between one-tailed and two-tailed, but when given a choice, the two-tailed test is generally preferable.
  • 🎉 The end message is clear: for accurate and unbiased results, always use a two-tailed test when you have the option.

Q & A

  • What is the primary topic discussed in the transcript?

    -The primary topic discussed is the difference between one-tailed and two-tailed tests in statistical analysis and when to use each type.

  • Why might someone choose to use a one-tailed test?

    -A one-tailed test is used when the hypothesis specifically predicts that one treatment or condition will have a better outcome than another.

  • What p-value did the one-tailed test yield in the example given?

    -The one-tailed test yielded a p-value of 0.03.

  • What p-value did the two-tailed test yield in the example given?

    -The two-tailed test yielded a p-value of 0.06.

  • Why is the two-tailed p-value generally preferred over the one-tailed p-value?

    -The two-tailed p-value is preferred because it tests whether the new treatment is better, worse, or not significantly different from the standard treatment, providing a more comprehensive analysis.

  • What is the risk associated with deciding the type of test after seeing the data?

    -Deciding the type of test after seeing the data can lead to 'p-hacking,' increasing the probability of reporting false positives.

  • What is a false positive in the context of statistical testing?

    -A false positive occurs when the test indicates a significant effect when there is none. With a 0.05 threshold, this is expected to happen about 5% of the time when there is no real difference.

  • How did the false positive rate change when switching to a one-tailed test after seeing favorable data?

    -The false positive rate increased from 5% to 8% when switching to a one-tailed test after seeing favorable data.

  • What lesson does the transcript emphasize regarding the choice of statistical tests?

    -The transcript emphasizes the importance of deciding which statistical test to use before conducting the experiment to avoid p-hacking and ensure accurate results.

  • What is the recommended practice when you have a choice between a one-tailed and a two-tailed test?

    -The recommended practice is to always choose a two-tailed test to fully understand both sides of the data and avoid biased results.

Outlines

00:00

🧬 Introduction to One vs. Two-Tailed Tests

The video, presented by the University of North Carolina at Chapel Hill's genetics department, introduces the concept of one-tailed and two-tailed tests. It discusses a clinical trial for a new cancer treatment and compares the effectiveness of the new treatment against the standard treatment using both tests. The one-tailed test, which hypothesizes the new treatment is better, yields a significant p-value of 0.03. In contrast, the two-tailed test, which checks if the new treatment is better, worse, or not different, gives a non-significant p-value of 0.06. The speaker emphasizes the importance of choosing the right test beforehand to avoid biased results.

05:03

📊 The Importance of Pre-Determined Statistical Tests

The video then explains the consequences of choosing the type of test after seeing the data. It describes a simulation in which pairs of samples are drawn from the same standard normal distribution, so any significant result is a false positive. With a two-tailed test chosen in advance, close to 500 of 10,000 tests (5%) come out as false positives, which is what a 0.05 threshold predicts. Switching to a one-tailed test whenever the data look favorable raises that to close to 800, or about 8%. This practice, known as 'p-hacking,' undermines the integrity of the results. The video concludes that one should always use a two-tailed test when given the choice, because it tells both sides of the story.
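The simulation described in this part of the video can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's code: the video says "t-test" without naming a variant, so the sketch assumes a pooled equal-variance t-test, which for two samples of three values has 4 degrees of freedom and a closed-form CDF. The function names (`t_cdf_df4`, `pooled_t`, `simulate`) and the random seed are hypothetical.

```python
import math
import random


def t_cdf_df4(t):
    """CDF of Student's t distribution with 4 degrees of freedom (closed form)."""
    scaled = t / math.sqrt(1.0 + t * t / 4.0)
    ratio = t * t / (1.0 + t * t / 4.0)
    return 0.5 + 0.375 * scaled * (1.0 - ratio / 12.0)


def pooled_t(a, b):
    """Equal-variance two-sample t statistic for two samples of size 3 (assumed variant)."""
    mean_a, mean_b = sum(a) / 3, sum(b) / 3
    ss = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)
    pooled_var = ss / 4  # degrees of freedom: 3 + 3 - 2
    return (mean_a - mean_b) / math.sqrt(pooled_var * (1 / 3 + 1 / 3))


def simulate(n_trials=10_000, alpha=0.05, seed=0):
    """Return (pre-chosen two-tailed rate, switch-after-peeking rate) of false positives."""
    rng = random.Random(seed)
    honest = hacked = 0
    for _ in range(n_trials):
        # Both samples come from the same standard normal distribution,
        # so every "significant" result is a false positive.
        a = [rng.gauss(0, 1) for _ in range(3)]
        b = [rng.gauss(0, 1) for _ in range(3)]
        t = pooled_t(a, b)
        p_two = 2.0 * (1.0 - t_cdf_df4(abs(t)))
        p_one = t_cdf_df4(t)  # one-tailed: sample 1 lower than sample 2
        # The rule from the video: switch to one-tailed when two or more
        # values in sample 1 are below every value in sample 2.
        looks_good = sum(x < min(b) for x in a) >= 2
        p_reported = p_one if looks_good else p_two
        honest += p_two < alpha
        hacked += p_reported < alpha
    return honest / n_trials, hacked / n_trials


honest_rate, hacked_rate = simulate()
print(f"two-tailed chosen in advance: {honest_rate:.3f}")
print(f"one-tailed when it looks good: {hacked_rate:.3f}")
```

The first rate should land near 0.05 and the second should be noticeably higher. The exact figure depends on the t-test variant and the random draws, so it need not match the video's 8% exactly.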

Keywords

💡One-tailed test

A one-tailed test, also known as a one-sided test, is a statistical hypothesis test in which the region of rejection is on only one side of the sampling distribution. In the video, it is used to test if a new cancer treatment is better than the standard treatment, giving a p-value of 0.03, which is below the 0.05 cutoff and therefore counts as significant under that test.

💡Two-tailed test

A two-tailed test, or two-sided test, is a statistical hypothesis test in which the region of rejection lies on both sides of the sampling distribution. It tests for the possibility of an effect in either direction, whether the new treatment is better or worse than the standard. The video explains that a two-tailed test gives a p-value of 0.06, which is not significant, emphasizing the importance of testing both directions for accurate results.

💡P-value

The p-value is the probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed. A lower p-value indicates stronger evidence against the null hypothesis. In the video, p-values of 0.03 and 0.06 are compared, demonstrating the impact of choosing between one-tailed and two-tailed tests on the interpretation of statistical significance.
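The 0.03 and 0.06 figures fit a general pattern: for a symmetric test statistic, the two-tailed p-value is twice the one-tailed p-value whenever the effect lies in the hypothesized direction. The sketch below illustrates this with a standard normal (z) statistic, which is an assumption made for simplicity. The video does not say which test produced its trial p-values, and the convention that a positive z means "new treatment better" is also hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def one_tailed_p(z):
    """P(Z >= z): tests only whether the new treatment is better."""
    return 1.0 - std_normal.cdf(z)


def two_tailed_p(z):
    """P(|Z| >= |z|): tests for a difference in either direction."""
    return 2.0 * (1.0 - std_normal.cdf(abs(z)))


# Pick the z value whose one-tailed p-value is 0.03, then mirror it.
z_better = std_normal.inv_cdf(0.97)
z_worse = -z_better

print("better:", round(one_tailed_p(z_better), 3), round(two_tailed_p(z_better), 3))
print("worse: ", round(one_tailed_p(z_worse), 3), round(two_tailed_p(z_worse), 3))
```

When the effect is mirrored so that the new treatment looks equally much worse, the two-tailed p-value stays the same while the one-tailed p-value becomes large. That is the sense in which a one-tailed test cannot tell "worse" apart from "not significantly different".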

💡False positive

A false positive occurs when a test incorrectly indicates the presence of a condition (such as a significant effect) when it is not actually present. The video shows that switching from a two-tailed to a one-tailed test after seeing favorable data can increase the rate of false positives from 5% to 8%, highlighting the importance of pre-defining the type of test before conducting experiments.

💡Clinical trial

A clinical trial is a research study conducted with human participants to evaluate the effects of medical, surgical, or behavioral interventions. In the video, a small clinical trial with six patients is used to compare a new cancer treatment with the standard treatment, illustrating how statistical tests are applied in medical research.

💡Significance level

The significance level, often denoted as alpha (α), is the threshold used to determine whether a p-value indicates a statistically significant result. Commonly set at 0.05, it represents a 5% risk of concluding that a difference exists when there is no actual difference. The video uses 0.05 as the cutoff for significance in its examples.
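The video's histogram of 10,000 p-values, with about 500 tests in each bin, follows from this threshold. When there is no real difference and the test statistic is continuous, p-values are uniformly distributed between 0 and 1, so each bin of width 0.05 is expected to hold 5% of the tests. The sketch below draws the p-values directly from a uniform distribution rather than running t-tests, which is a simplifying assumption; the function name and seed are hypothetical.

```python
import random


def p_value_histogram(n_tests=10_000, n_bins=20, seed=1):
    """Count uniformly distributed p-values in equal-width bins over [0, 1)."""
    rng = random.Random(seed)
    counts = [0] * n_bins
    for _ in range(n_tests):
        p = rng.random()  # stands in for a p-value when the null hypothesis is true
        index = min(int(p * n_bins), n_bins - 1)
        counts[index] += 1
    return counts


counts = p_value_histogram()
expected_per_bin = 10_000 * 0.05  # 5 percent of 10,000 equals 500

print("expected per bin:", expected_per_bin)
print("false positives (p < 0.05):", counts[0])
```

The first bin holds the tests with p-values below 0.05, which are the false positives. Its count should be close to, but not exactly, 500.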

💡Normal distribution

The normal distribution is a bell-shaped curve that represents the distribution of many types of data; most values cluster around a central mean. The video uses a standard normal distribution to explain how data points and p-values are derived and the concept of false positives in statistical testing.

💡Statistical hypothesis

A statistical hypothesis is a specific, testable prediction about the relationship between two or more variables. In the video, hypotheses about the effectiveness of a new cancer treatment versus the standard treatment are tested using one-tailed and two-tailed tests to determine if there is a significant difference.

💡Statistical significance

Statistical significance indicates that the observed results are unlikely to have occurred by chance alone, according to a predefined significance level (e.g., 0.05). The video discusses how different p-values (0.03 and 0.06) from one-tailed and two-tailed tests affect the conclusion about the effectiveness of a new cancer treatment.

💡P-hacking

P-hacking refers to manipulating data analysis until statistically significant results are obtained, often by trying multiple statistical tests or selectively reporting outcomes. The video warns against p-hacking by emphasizing the importance of deciding on the type of test and significance level before conducting an experiment to avoid biased results.

Highlights

Introduction to the concept of one-tailed versus two-tailed tests.

Illustration using a hypothetical cancer treatment scenario to explain the tests.

Explanation of one-tailed (one-sided) tests providing a p-value of 0.03.

Explanation of two-tailed (two-sided) tests giving a p-value of 0.06.

Clarification that one-tailed tests only determine if the new treatment is better, not worse.

Two-tailed tests assess if the treatment is better, worse, or not significantly different.

Importance of using two-tailed tests to understand if a new treatment could be worse.

Discussion on the dangers of 'p-hacking' and choosing tests after seeing the data.

Explanation of false positives and their occurrence rate of about 5% in two-tailed tests.

Analysis showing that switching to one-tailed tests increases false positives to 8%.

Visual representation of false positives using histograms for two-tailed and one-tailed tests.

Emphasis on the need to predefine tests before conducting experiments to avoid bias.

Conclusion that for most cases, especially where outcomes can be better or worse, a two-tailed test is recommended.

Note that not all statistical tests offer a choice between one-tailed and two-tailed, and in such cases, the default should be used.

Final advice to always select two-tailed tests when the option is available to get a comprehensive understanding of the data.

Transcripts

play00:00

Stat quest stat quest stat quest stat

play00:08

quest hello and welcome to stat quest

play00:12

stat quest is brought to you by the

play00:14

friendly folks in the genetics

play00:16

department at the University of North

play00:18

Carolina at Chapel Hill today we're

play00:21

going to be talking about one versus

play00:23

two-tailed tests people frequently ask

play00:26

me which one they should use so I'm

play00:29

gonna settle the matter once and for all

play00:30

right here with this stat quest imagine

play00:34

you've got a new cancer treatment you

play00:37

hope that people do better with your new

play00:39

treatment than the standard treatment

play00:41

you do a small clinical trial on six

play00:44

patients and here's your data the red

play00:48

dots represent people that took your new

play00:52

treatment and the black dots represent

play00:55

people that took the standard treatment

play00:57

the values range from better to worse

play01:02

the data suggests that people who use

play01:05

your new treatment do better than people

play01:07

on the standard treatment however there

play01:11

is a little bit of ambiguity in the

play01:13

results so you run the stats a

play01:17

one-tailed or one-sided test gives you a

play01:21

p-value of 0.03 awesome 0.03 is smaller

play01:29

than that pesky 0.05 cutoff that we

play01:32

usually use to determine significance

play01:36

a two-tailed or two-sided test gives you

play01:40

a p-value of 0.06

play01:44

dang not so awesome

play01:48

which p-value should you use the

play01:52

one-tailed p-value tests the hypothesis

play01:55

that your treatment is better than the

play01:58

standard treatment great that's what we

play02:01

wanted right it's certainly tempting but

play02:06

let's not jump to conclusions before

play02:08

learning about the two-tailed p-value

play02:11

the two-tailed p-value tests whether the

play02:15

new treatment is better worse or not

play02:18

significantly different

play02:20

the one-tailed p-value is smaller

play02:23

because it doesn't distinguish between

play02:25

worse and not significantly different

play02:28

since we'd want to know if our new

play02:31

treatment was worse than the standard

play02:33

treatment we should use the two-tailed

play02:36

p-value but wait doesn't the data being

play02:41

skewed towards the new method being

play02:43

better suggest we don't need to test if

play02:46

it is worse no good statistical practice

play02:51

means we need to decide what tests and

play02:55

what p-value we want to use before we do

play02:58

the experiment

play03:00

otherwise we're p-hacking this

play03:03

increases the probability that we will

play03:06

report bogus results let's see why this

play03:10

is I started with a standard normal

play03:13

distribution the x-axis represents

play03:17

measurements from small to large the

play03:21

y-axis represents the probability that

play03:23

I'll get certain measurements most of

play03:27

the time I should get measurements in

play03:29

the middle but every now and then I'll

play03:32

get a really small measurement or a

play03:34

really big one then I took a sample from

play03:37

this distribution that means a computer

play03:41

picked three numbers that had a high

play03:43

likelihood of being from the center of

play03:46

the distribution but every now and then

play03:49

one of them might be really small or

play03:50

really big I then took another sample

play03:54

from the exact same distribution in most

play03:59

cases a two-tailed t-test on these two

play04:02

samples should give me a p-value greater

play04:05

than 0.05 this is because most of the

play04:09

time the samples will overlap but every

play04:13

now and then the samples will not

play04:15

overlap and the t-test will give me a

play04:18

p-value less than 0.05 this is called a

play04:23

false positive

play04:24

it happens 5% of the time

play04:28

I did 10,000 two-tailed t-tests on data

play04:32

like this

play04:35

5 percent of 10,000 equals 500 so I was

play04:40

expecting 500 false positives here's a

play04:44

histogram of the p-values the blue line

play04:49

shows that each bin contains about 500

play04:52

tests these are the false positives the

play04:57

tests with p-values less than 0.05 we

play05:02

pretty much got what we expected there

play05:05

were close to 500 false positives then I

play05:09

changed things to mimic switching to a

play05:12

one-tailed test when things looked good

play05:15

if sample number one had two or more

play05:19

values that were less than all of the

play05:21

values in sample number two then I used

play05:24

a one-tailed t-test since these two

play05:28

values are less than all of the values

play05:31

in sample number two I used a one-tailed

play05:34

t-test on this data set here's a

play05:38

histogram of the new p-values the blue

play05:42

line shows the expected number of

play05:44

p-values per bin there are now close to

play05:48

800 false positives the chance of

play05:52

reporting a false positive went from 5

play05:55

percent to 8 percent even though we're

play05:58

using 0.05 as the threshold for

play06:01

significance thus you can't wait till

play06:04

you see the data to decide you want to

play06:07

use a one-tailed p-value so let's take a

play06:10

step back to before we did the

play06:12

experiment

play06:15

what do we want to learn from it with a

play06:18

cancer treatment it's obvious we must

play06:21

learn if it improves things or makes

play06:23

them worse

play06:25

but really it's the same for all data

play06:28

that has the option for a one- or

play06:30

two-tailed p-value we always want to

play06:34

know both sides of the story not just

play06:37

one so when you have a choice always go

play06:42

with a two-tailed p-value note not all

play06:45

statistical tests have a choice in that

play06:48

case don't worry about it

play06:51

hooray we've made it to the end we now

play06:55

know that when we have a choice

play06:57

we should always select a two-tailed

play06:59

test tune in next time for another

play07:02

exciting stat quest
