StatQuest: One or Two Tailed P-Values
Summary
TLDR: In this StatQuest, brought to you by the Genetics Department at UNC Chapel Hill, the focus is on one-tailed vs. two-tailed tests in statistics. Using a cancer treatment trial as an example, the video shows a one-tailed test giving a p-value of 0.03 (significant at the usual 0.05 threshold) and a two-tailed test giving 0.06 (not significant). It emphasizes choosing the type of test before running the experiment to avoid false positives, and concludes that a two-tailed test is generally preferable because it gives a complete picture of the data.
Takeaways
- 📊 One-tailed tests focus on whether one treatment is better than another, while two-tailed tests evaluate if there is any difference, either better or worse.
- 🔬 In clinical trials, it's crucial to decide on the type of test and p-value threshold before conducting the experiment to avoid bias.
- 📉 A one-tailed test can produce a smaller p-value (0.03 versus 0.06 in the example) because it does not distinguish between the new treatment being worse and there being no significant difference.
- ⚖️ Good statistical practice involves using a two-tailed test to get a complete picture of the treatment's effectiveness.
- 🛑 Waiting to choose the type of test until after seeing the data can lead to p-hacking and false positives.
- 🔍 A false positive occurs when a test indicates a significant result even though no real effect exists; how often this happens depends on whether the type of test was chosen before or after looking at the data.
- 📈 In the video's simulation, switching to a one-tailed test after seeing favorable-looking data raised the false positive rate from 5% to 8%, even with 0.05 as the significance threshold.
- 💡 Always aim to understand both sides of the data story, not just the side that seems favorable.
- 🔎 Some statistical tests do not offer a choice between one-tailed and two-tailed, but when given a choice, the two-tailed test is generally preferable.
- 🎉 The end message is clear: for accurate and unbiased results, always use a two-tailed test when you have the option.
Q & A
What is the primary topic discussed in the transcript?
-The primary topic discussed is the difference between one-tailed and two-tailed tests in statistical analysis and when to use each type.
Why might someone choose to use a one-tailed test?
-A one-tailed test is used when the hypothesis specifically predicts that one treatment or condition will have a better outcome than another.
What p-value did the one-tailed test yield in the example given?
-The one-tailed test yielded a p-value of 0.03.
What p-value did the two-tailed test yield in the example given?
-The two-tailed test yielded a p-value of 0.06.
Why is the two-tailed p-value generally preferred over the one-tailed p-value?
-The two-tailed p-value is preferred because it tests whether the new treatment is better, worse, or not significantly different from the standard treatment, providing a more comprehensive analysis.
What is the risk associated with deciding the type of test after seeing the data?
-Deciding the type of test after seeing the data can lead to 'p-hacking,' increasing the probability of reporting false positives.
What is a false positive in the context of statistical testing?
-A false positive occurs when the test indicates a significant effect when there is none. With a 0.05 threshold and a two-tailed test chosen in advance, this is expected to happen 5% of the time when no real difference exists.
How did the false positive rate change when switching to a one-tailed test after seeing favorable data?
-In the video's simulation of 10,000 tests, the false positive rate increased from 5% to 8% when the analysis switched to a one-tailed test after seeing favorable data.
What lesson does the transcript emphasize regarding the choice of statistical tests?
-The transcript emphasizes the importance of deciding which statistical test to use before conducting the experiment to avoid p-hacking and ensure accurate results.
What is the recommended practice when you have a choice between a one-tailed and a two-tailed test?
-The recommended practice is to always choose a two-tailed test to fully understand both sides of the data and avoid biased results.
Outlines
🧬 Introduction to One vs. Two-Tailed Tests
The video, presented by the University of North Carolina at Chapel Hill's genetics department, introduces the concept of one-tailed and two-tailed tests. It discusses a clinical trial for a new cancer treatment and compares the effectiveness of the new treatment against the standard treatment using both tests. The one-tailed test, which hypothesizes the new treatment is better, yields a significant p-value of 0.03. In contrast, the two-tailed test, which checks if the new treatment is better, worse, or not different, gives a non-significant p-value of 0.06. The speaker emphasizes the importance of choosing the right test beforehand to avoid biased results.
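A minimal sketch of how the two p-values relate, assuming a test statistic that follows a standard normal distribution under the null hypothesis (a z-test). The video does not say which test produced 0.03 and 0.06, so the statistic value 1.88 below is hypothetical, chosen only because it lands near those numbers, and `z_test_p_value` with its `alternative` argument is an illustrative name, not any library's API. When the statistic points in the predicted direction, the one-tailed p-value is exactly half of the two-tailed one; when it points the other way, the one-tailed test cannot flag the treatment as worse.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_test_p_value(z: float, alternative: str = "two-sided") -> float:
    """p-value for a z statistic (hypothetical helper).

    "greater":   one-tailed, H1 says the new treatment is better (large z).
    "two-sided": two-tailed, H1 says the treatment differs in either direction.
    The choice of `alternative` should be fixed before looking at the data.
    """
    if alternative == "greater":
        return 1.0 - STD_NORMAL.cdf(z)
    if alternative == "two-sided":
        return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))
    raise ValueError(f"unsupported alternative: {alternative!r}")


# Hypothetical statistic pointing toward "new treatment is better".
z = 1.88
print(round(z_test_p_value(z, "greater"), 3))    # 0.03 (one-tailed)
print(round(z_test_p_value(z, "two-sided"), 3))  # 0.06 (two-tailed, twice as large)

# If the new treatment were actually worse, the one-tailed test cannot flag it.
print(round(z_test_p_value(-1.88, "greater"), 3))  # 0.97
```

Fixing `alternative` before the data arrive is the point: picking it after seeing the sign of `z` is the p-hacking the video warns about.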
📊 The Importance of Pre-Determined Statistical Tests
The video continues by explaining the consequences of deciding on the type of test after seeing the data. In a simulation that repeatedly draws two samples from the same standard normal distribution, a two-tailed test chosen in advance produces false positives about 5% of the time, as the 0.05 threshold intends, while switching to a one-tailed test whenever the results look favorable raises the false positive rate to 8%. This practice, known as 'p-hacking,' undermines the integrity of the results. The video concludes that for accurate and unbiased statistical analysis, one should always use a two-tailed test when given the choice, as it provides a complete view of the data.
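The experiment described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the video's actual code: it assumes Student's pooled two-sample t-test with three values per sample (so 4 degrees of freedom), uses the closed-form CDF of the t-distribution with 4 degrees of freedom to stay within the standard library, and the names `t_cdf_df4`, `pooled_t_stat`, and `simulate` are hypothetical. The switching rule follows the one described in the transcript: if two or more values in sample 1 fall below every value in sample 2, a one-tailed test that sample 1 is smaller replaces the two-tailed test.

```python
import math
import random
import statistics


def t_cdf_df4(t: float) -> float:
    """CDF of Student's t-distribution with 4 degrees of freedom (closed form)."""
    u = 1.0 + t * t / 4.0
    return 0.5 + 0.375 * (t / math.sqrt(u)) * (1.0 - t * t / (12.0 * u))


def pooled_t_stat(a: list[float], b: list[float]) -> float:
    """Student's pooled two-sample t statistic; assumes len(a) == len(b) == 3 (df = 4)."""
    n = len(a)
    # With equal sample sizes the pooled variance is the average of the two variances.
    pooled_var = (statistics.variance(a) + statistics.variance(b)) / 2.0
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(pooled_var * 2.0 / n)


def simulate(n_tests: int = 10_000, alpha: float = 0.05, seed: int = 1) -> tuple[float, float]:
    """Return (false positive rate of a pre-chosen two-tailed test,
    false positive rate when switching to one-tailed after peeking)."""
    rng = random.Random(seed)
    fp_two_tailed = 0
    fp_switched = 0
    for _ in range(n_tests):
        # Both samples come from the same standard normal, so every
        # "significant" result is a false positive.
        a = [rng.gauss(0.0, 1.0) for _ in range(3)]
        b = [rng.gauss(0.0, 1.0) for _ in range(3)]
        t = pooled_t_stat(a, b)
        p_two = 2.0 * (1.0 - t_cdf_df4(abs(t)))
        if p_two < alpha:
            fp_two_tailed += 1
        # Peeking rule: if 2+ values of sample 1 sit below every value of
        # sample 2, use a one-tailed (lower-tail) test of "sample 1 is smaller".
        if sum(x < min(b) for x in a) >= 2:
            p_used = t_cdf_df4(t)
        else:
            p_used = p_two
        if p_used < alpha:
            fp_switched += 1
    return fp_two_tailed / n_tests, fp_switched / n_tests


two_tailed_rate, switched_rate = simulate()
print(f"two-tailed chosen up front: {two_tailed_rate:.1%}")
print(f"switched after peeking:     {switched_rate:.1%}")
```

Both rates are computed on the same simulated data, so any gap between them comes from the switching rule alone. The exact inflated rate depends on the test and on the switching rule, so the number this sketch prints may not match the video's 8%; the point is that it lands above the 5% that the 0.05 threshold promises.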
Keywords
💡One-tailed test
💡Two-tailed test
💡P-value
💡False positive
💡Clinical trial
💡Significance level
💡Normal distribution
💡Statistical hypothesis
💡Statistical significance
💡P-hacking
Highlights
Introduction to the concept of one-tailed versus two-tailed tests.
Illustration using a hypothetical cancer treatment scenario to explain the tests.
Explanation of one-tailed (one-sided) tests providing a p-value of 0.03.
Explanation of two-tailed (two-sided) tests giving a p-value of 0.06.
Clarification that one-tailed tests only determine if the new treatment is better, not worse.
Two-tailed tests assess if the treatment is better, worse, or not significantly different.
Importance of using two-tailed tests to understand if a new treatment could be worse.
Discussion on the dangers of 'p-hacking' and choosing tests after seeing the data.
Explanation of false positives and their occurrence rate of about 5% in two-tailed tests.
Simulation showing that switching to one-tailed tests after seeing favorable-looking data increases false positives to 8%.
Visual representation of false positives using histograms for two-tailed and one-tailed tests.
Emphasis on the need to predefine tests before conducting experiments to avoid bias.
Conclusion that for most cases, especially where outcomes can be better or worse, a two-tailed test is recommended.
Note that not all statistical tests offer a choice between one-tailed and two-tailed, and in such cases, the default should be used.
Final advice to always select two-tailed tests when the option is available to get a comprehensive understanding of the data.
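The p-value histograms mentioned in the highlights can be approximated with a short simulation. This is a minimal sketch, assuming a two-sample z-test with the standard deviation known to be 1 (swapped in for the video's t-test; because both samples come from a standard normal, the z-test's null distribution is exact), and `null_p_values` and `histogram` are hypothetical names. When there is no real difference, p-values are uniformly distributed between 0 and 1, so each of 20 bins of width 0.05 should hold about 10,000 / 20 = 500 tests, and the first bin (p < 0.05) holds the false positives.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def null_p_values(n_tests: int = 10_000, n: int = 3, seed: int = 7) -> list[float]:
    """Two-tailed z-test p-values for pairs of samples drawn from the same N(0, 1)."""
    rng = random.Random(seed)
    se = (2.0 / n) ** 0.5  # standard error of a difference in means when sigma = 1 is known
    p_values = []
    for _ in range(n_tests):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (fmean(a) - fmean(b)) / se
        p_values.append(2.0 * (1.0 - STD_NORMAL.cdf(abs(z))))
    return p_values


def histogram(values: list[float], n_bins: int = 20) -> list[int]:
    """Count values in equal-width bins over [0, 1]."""
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v * n_bins), n_bins - 1)] += 1  # p == 1.0 goes in the last bin
    return counts


p_values = null_p_values()
counts = histogram(p_values)
print("false positives (p < 0.05):", counts[0])
print("bin counts:", counts)
```

"Close to 500" has a precise meaning here: each bin count is binomial with standard deviation √(10,000 × 0.05 × 0.95) ≈ 21.8, so counts within roughly 45 of 500 are unremarkable.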
Transcripts
StatQuest, StatQuest, StatQuest, StatQuest! Hello, and welcome to StatQuest. StatQuest is brought to you by the friendly folks in the Genetics Department at the University of North Carolina at Chapel Hill. Today we're going to be talking about one- versus two-tailed tests. People frequently ask me which one they should use, so I'm gonna settle the matter once and for all, right here with this StatQuest.
Imagine you've got a new cancer treatment. You hope that people do better with your new treatment than the standard treatment. You do a small clinical trial on six patients, and here's your data. The red dots represent people that took your new treatment, and the black dots represent people that took the standard treatment. The values range from better to worse. The data suggests that people who use your new treatment do better than people on the standard treatment; however, there is a little bit of ambiguity in the results, so you run the stats. A one-tailed, or one-sided, test gives you a p-value of 0.03. Awesome! 0.03 is smaller than that pesky 0.05 cutoff that we usually use to determine significance. A two-tailed, or two-sided, test gives you a p-value of 0.06. Dang, not so awesome.
Which p-value should you use? The one-tailed p-value tests the hypothesis that your treatment is better than the standard treatment. Great, that's what we wanted, right? It's certainly tempting, but let's not jump to conclusions before learning about the two-tailed p-value. The two-tailed p-value tests whether the new treatment is better, worse, or not significantly different. The one-tailed p-value is smaller because it doesn't distinguish between worse and not significantly different. Since we'd want to know if our new treatment was worse than the standard treatment, we should use the two-tailed p-value.
But wait, doesn't the data being skewed towards the new method being better suggest we don't need to test if it is worse? No. Good statistical practice means we need to decide what tests and what p-value we want to use before we do the experiment.
Otherwise, we're p-hacking. This increases the probability that we will report bogus results. Let's see why this is.
I started with a standard normal distribution. The x-axis represents measurements from small to large. The y-axis represents the probability that I'll get certain measurements. Most of the time I should get measurements in the middle, but every now and then I'll get a really small measurement or a really big one. Then I took a sample from this distribution. That means a computer picked three numbers that had a high likelihood of being from the center of the distribution, but every now and then one of them might be really small or really big. I then took another sample from the exact same distribution. In most cases, a two-tailed t-test on these two samples should give me a p-value greater than 0.05. This is because most of the time the samples will overlap, but every now and then the samples will not overlap, and the t-test will give me a p-value less than 0.05. This is called a false positive. It happens 5% of the time.
I did 10,000 two-tailed t-tests on data like this. 5% of 10,000 equals 500, so I was expecting 500 false positives. Here's a histogram of the p-values. The blue line shows that each bin contains about 500 tests. These are the false positives, the tests with p-values less than 0.05. We pretty much got what we expected: there were close to 500 false positives.
Then I changed things to mimic switching to a one-tailed test when things looked good. If sample number one had two or more values that were less than all of the values in sample number two, then I used a one-tailed t-test. Since these two values are less than all of the values in sample number two, I used a one-tailed t-test on this data set. Here's a histogram of the new p-values. The blue line shows the expected number of p-values per bin. There are now close to 800 false positives. The chance of reporting a false positive went from 5 percent to 8 percent, even though we're using 0.05 as the threshold for significance. Thus, you can't wait till you see the data to decide you want to use a one-tailed p-value.
So let's take a step back to before we did the experiment.
What do we want to learn from it? With a cancer treatment, it's obvious: we must learn if it improves things or makes them worse. But really, it's the same for all data that has the option for a one- or two-tailed p-value. We always want to know both sides of the story, not just one. So when you have a choice, always go with a two-tailed p-value. Note: not all statistical tests have a choice. In that case, don't worry about it.
Hooray! We've made it to the end. We now know that when we have a choice, we should always select a two-tailed test. Tune in next time for another exciting StatQuest!