T-Tests: A Matched Pair Made in Heaven: Crash Course Statistics #27
Summary
TLDR: In this Crash Course Statistics episode, Adriene Hill explores the concept of test statistics through a coffee tasting experiment. Comparing two local coffee shops, she uses a two-sample t-test to determine if there's a significant difference in scores. Initially, no significant difference is found, but after considering individual preferences, a paired t-test reveals a preference for one shop. The lesson emphasizes the importance of controlling for variability in experiments and the power of statistical tests to reveal patterns amidst noise.
Takeaways
- 📊 The script introduces the concept of test statistics and how they can be generalized for different situations without memorizing a new formula for each.
- 🔍 It uses a coffee tasting experiment to illustrate the process of hypothesis testing, comparing two coffee shops, Caf-fiend and The Blend Den.
- 📈 The experiment involves a two-sample t-test, which is also known as an independent or unpaired t-test, to determine if there's a significant difference in the mean scores of coffee ratings.
- 🎯 The null hypothesis is that there is no difference between the two coffee shops, while the alternative hypothesis suggests that one is better than the other.
- 📉 The standard error is used as a measure of average variation in the two-sample t-test, taking into account the sample variance of both groups.
- 📋 The script explains how to calculate the critical t-value and p-value to determine statistical significance, using an alpha level of 0.05.
- 🔎 It discusses the limitations of the initial experiment, such as not accounting for individual preferences for coffee, which can introduce variability.
- 🔄 The concept of a paired t-test is introduced as a solution to control for individual differences in coffee preference by comparing scores from the same individuals.
- 📝 The paired t-test involves creating difference scores for each person, which allows for the comparison of the two coffees while controlling for individual preferences.
- 🔑 The script emphasizes the importance of reducing variability in experiments to better identify true differences and patterns.
- 📚 It concludes by highlighting that statistical tests help in understanding and comparing observed data with expected outcomes, a fundamental principle in statistics.
Q & A
What is the main topic discussed in this Crash Course Statistics episode?
-The main topic discussed in this episode is the application of the general formula for creating test statistics in various situations, with a focus on how to use statistical tests to determine if there is a significant difference between two groups, exemplified by comparing coffee scores from two shops.
Why is it important not to have to memorize a new formula for every statistical test?
-It's important not to memorize a new formula for every statistical test because it would be inefficient and overwhelming. The general formula for test statistics allows for adaptability across different situations, making the process of statistical analysis more streamlined and manageable.
What is the null hypothesis in the coffee tasting experiment described in the script?
-The null hypothesis in the coffee tasting experiment is that there is no difference between the two coffee shops, meaning the mean scores for coffee from Caf-fiend and The Blend Den are the same.
What statistical test is used to analyze the difference in coffee scores between Caf-fiend and The Blend Den?
-A two-sample t-test, also known as an independent or unpaired t-test, is used to analyze the difference in coffee scores between Caf-fiend and The Blend Den.
How is the standard error calculated for a two-sample t-test?
-The standard error for a two-sample t-test is calculated from the sample variances of the two groups and their sizes (n1 and n2). Each group's standard deviation is squared to get its variance, each variance is divided by its group size, the two results are added, and the square root of that sum is the standard error.
What are the two methods mentioned to determine if there is a statistically significant difference between the two coffee shops?
-The two methods mentioned are calculating the critical t-value and comparing it with the t-statistic, and calculating the p-value from the t-statistic and comparing it with the chosen alpha level.
Why is an alpha level of 0.05 commonly used in statistical tests?
-An alpha level of 0.05 is commonly used because it means that in the long run, only 5% of tests done on groups with no real difference will incorrectly reject the null hypothesis, providing a balance between avoiding false positives and detecting true effects.
What is the significance of the p-value in the context of this coffee experiment?
-The p-value in the context of the coffee experiment represents the probability of observing a difference in sample means as extreme as the one observed, assuming the null hypothesis is true. A p-value greater than the alpha level indicates a failure to reject the null hypothesis, suggesting no significant difference between the coffee shops.
Why is a paired t-test considered a better approach for the coffee experiment after considering individual preferences?
-A paired t-test is considered better because it accounts for individual preferences by comparing the scores of the same individuals for both coffees, thus controlling for variability due to personal coffee affinity and making it easier to detect true differences between the coffees.
What does the t-statistic of -3.212 in the paired t-test indicate about the difference in coffee scores?
-The t-statistic of -3.212 in the paired t-test indicates that the observed mean difference in scores is 3.212 standard errors away from the expected mean difference under the null hypothesis, suggesting a statistically significant difference between the coffees from Caf-fiend and The Blend Den.
How does the script illustrate the concept that 'absence of evidence is not evidence of absence'?
-The script illustrates the concept by showing that even though the initial experiment did not find a significant difference between the coffee shops, it doesn't mean there is no difference. It could be due to variability in the sample or other factors not accounted for, hence the need for a more controlled experiment like the paired t-test.
Outlines
📊 Introduction to Test Statistics and Coffee Experiment
In this segment, Adriene Hill introduces the concept of test statistics and their application in various scenarios, emphasizing the adaptability of the general formula. She uses the example of a coffee taste test between two popular coffee shops, Caf-fiend and The Blend Den, to explain the process. A random sample of 16 people is randomly split into two groups of eight, and each group tastes coffee from one shop without knowing the source, resulting in mean scores of 7.6 for Caf-fiend and 7.9 for The Blend Den. Adriene highlights the need to define null and alternative hypotheses before conducting a two-sample t-test to determine if the observed difference is statistically significant. The null hypothesis posits no difference between the shops, while the alternative suggests one is better. The t-test formula is introduced, and the observed difference, standard error, and critical t-values are discussed, leading to the conclusion that the sample data show no significant difference between the two coffee shops.
🔍 Deep Dive into P-Values and Paired T-Test
This paragraph delves into the interpretation of p-values and the concept of a paired t-test. Adriene explains that a p-value of 0.6684 means that, if there were no true difference between the coffee shops, a difference in sample means of 0.3 or greater would still be expected 66.84% of the time. This leads to the decision not to reject the null hypothesis based on the initial experiment. However, recognizing potential variability due to individual coffee preferences, Adriene suggests a paired t-test as an improved method. In this approach, each participant in a new sample of 16 rates both coffees, effectively controlling for individual preferences. The mean difference score and standard error of the difference scores are calculated, resulting in a t-statistic of -3.212. This value, being far from the mean in a t-distribution, corresponds to a very small p-value (0.00582), leading to the rejection of the null hypothesis and the conclusion that there is a statistically significant difference in preference for The Blend Den's coffee over Caf-fiend's.
🧐 The Essence of Statistical Testing and Experiment Design
In the final paragraph, Adriene Hill wraps up the discussion by emphasizing the core of statistical testing: comparing observed data to expected outcomes under the null hypothesis. She stresses that the various formulas encountered in statistics are fundamentally about this comparison, aiming to reduce variability to identify true patterns. The paragraph concludes with a call to action, encouraging viewers to use the tools of statistical testing to design experiments and answer intriguing questions, even if it involves the occasional over-caffeination of friends in the process.
Keywords
💡Test Statistics
💡Null Hypothesis
💡Alternative Hypothesis
💡Standard Error
💡Two-Sample T-Test
💡Critical T-Value
💡P-Value
💡Alpha Level
💡Matched Pairs T-Test
💡Variability
💡Statistical Significance
Highlights
Introduction to the logic of test statistics and the general formula for creating them.
Explanation of how to adapt the general formula for various situations without memorizing new formulas for each.
The importance of defining null and alternative hypotheses in statistical testing.
Description of a practical experiment to determine the best coffee shop in town using statistical methods.
Use of a two-sample t-test, also known as an independent or unpaired t-test, to compare coffee scores.
Calculation of the standard error for two groups and its role in the t-test formula.
Decision-making process using critical t-values and the concept of a two-tailed test.
Interpretation of the t-statistic and its comparison to critical values to determine statistical significance.
Introduction to the concept of p-values and their use in hypothesis testing.
The choice of alpha level in statistical testing and its implications.
Discussion on the limitations of random assignment and the introduction of variability in experiments.
Proposal of a paired t-test as a solution to account for individual differences in coffee preferences.
Explanation of how to calculate difference scores and use them in a matched pairs t-test.
Interpretation of t-statistics in terms of standard errors and their significance in hypothesis testing.
Rejection of the null hypothesis based on a small p-value, indicating a significant difference between coffee shops.
Practical application of statistical tests to reduce variability and identify patterns in data.
Final thoughts on the unifying nature of statistical formulas and their role in comparing observed and expected outcomes.
Transcripts
Hi, I’m Adriene Hill, and welcome back to Crash Course Statistics.
In the last episode we dove into the logic surrounding test statistics and talked about
a general formula that allows us to create them for lots of different situations.
There are so many questions we might want to answer, and it would be rough if we had
to memorize a new formula for EVERY Single One.
And sometimes Statistics is taught in a way that makes it seem like there’s a different
formula you need to know if you want to test whether your bus is late more often than the
average bus in your town.
Or if burns treated with aloe heal faster than those that are left alone.
But! Huzzah!
We can adapt the general formula...in all sorts of situations.
INTRO
Let’s say that you just moved to a new place, and you’re looking for the BEST coffee in town.
Since you’ve been watching Crash Course Statistics, you decide to do a little impromptu experiment.
Word on the street is there are two really popular coffee places near you, Caf-fiend
and The Blend Den.
So one Sunday after brunch, you grab a random sample of 16 of your new friends, and randomly
give half of them an unmarked cup with coffee from Caf-fiend, and the other half an unmarked
cup with coffee from The Blend Den.
You made sure to get the same roast--dark--to keep things as even as possible.
After delicate sniffs and sips of coffee in a process known as “cupping”, the tallies are in.
On a scale of 1 to 10, Caf-fiend got a mean score of 7.6 and The Blend Den got a mean
score of 7.9.
So we observe a difference between the coffee scores.
Coffee from Caf-fiend scored 0.3 points lower than Coffee from The Blend Den.
So coffee from The Blend Den is better?
Right?
Done and done.
Nope not yet.
Maybe it’s just random chance.
So first we need to define our null.
There’s no difference between the two coffee shops.
And then our alternative hypothesis, that there is a difference.
One is better than the other.
In this case, we’re interested in whether the mean scores for coffee are different between
Caf-fiend and The Blend Den.
With a little algebra, we can see that this is the same thing as asking whether the difference
between the two means is not zero.
Now that we have our hypotheses, we can do a t-test.
Specifically, we’ll do a two sample t-test, also called an independent or unpaired t-test.
The formula for a two sample t-test follows our general test statistic formula:
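The on-screen formula is not captured in this transcript. Reconstructed from the description that follows (observed difference, minus the difference expected under the null, divided by the standard error), it would look roughly like this. The symbols are notation assumed here: $\bar{x}_1, \bar{x}_2$ for the group means, $s_1, s_2$ for the sample standard deviations, and $n_1, n_2$ for the group sizes.

```latex
t = \frac{\text{observed difference} - \text{expected difference}}{\text{standard error}}
  = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
```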
The difference we observed is 0.3.
If the null hypothesis were true and there’s no difference between the coffee shops, we’d
expect a difference of 0.
So the numerator of our t-test is 0.3.
For this kind of t-test, our measure of average variation is the standard error.
For two groups, the standard error is calculated a bit differently since we have to account
for the sample variance of two groups.
Here, we’re squaring the standard deviation to get the variance and n1 and n2 are the
sizes of the two groups--both are 8 here.
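As a minimal sketch of the calculation described above, the function below squares each group's standard deviation to get its variance, combines the two into a standard error, and divides the observed difference by it. The function name `two_sample_t` and the scores are hypothetical, invented for illustration, since the episode's raw scorecards are not in the transcript. The degrees of freedom `n1 + n2 - 2` are an assumption that is consistent with the cutoffs of about ±2.145 quoted for two groups of 8.

```python
import math
import statistics


def two_sample_t(group1, group2, expected_diff=0.0):
    """Return (t, standard_error, df) for an unpaired two-sample t-test."""
    n1, n2 = len(group1), len(group2)
    # Sample variance is the squared sample standard deviation.
    var1 = statistics.variance(group1)
    var2 = statistics.variance(group2)
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    observed_diff = statistics.mean(group1) - statistics.mean(group2)
    t = (observed_diff - expected_diff) / standard_error
    df = n1 + n2 - 2  # assumption: simple df rule, fine for equal group sizes
    return t, standard_error, df


# Hypothetical scores on the 1-to-10 scale (NOT the episode's data).
blend_den = [8, 9, 7, 8, 8, 9, 7, 8]
caf_fiend = [8, 7, 9, 3, 8, 9, 7, 8]

t, se, df = two_sample_t(blend_den, caf_fiend)
print(f"t = {t:.3f}, SE = {se:.3f}, df = {df}")
```

Note how a single very low score in one group inflates that group's variance, which makes the standard error larger and the t-statistic smaller.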
Now that we have our t-value, we can figure out if there’s a statistically significant
difference between the two coffee shops and there are two ways to do this.
We can calculate the critical t-value and if our t-statistic is MORE EXTREME than the critical
value (farther from zero) we reject the null hypothesis.
Or we can calculate the p-value from our t-statistic and we can reject the null hypothesis if the
p-value is SMALLER than our chosen alpha level.
To do either of these things, we’ll need to choose our alpha level.
Again, our alpha is arbitrary.
But usually people will use 0.05 since that means that in the long run, only 5% of tests
done on groups with no real difference will incorrectly reject the null.
So, we’ll conform :) and use an alpha of 0.05 here.
To calculate our critical t-value we need to find the t-values which correspond to the
top 5% most extreme values in our t-distribution.
Usually a computer or a calculator will do this for you, so we won’t go into the formula,
but here are the cutoffs:
The cutoffs for our specific problem are about -2.145 and 2.145.
We have two cutoffs because we’re doing a two tailed test.
We want to reject the null if coffee from Caf-fiend is better or if coffee from The
Blend Den is better.
We can already tell that we should fail to reject the null.
That there’s no clear difference between the quality of the coffee.
Our t-statistic of about 0.44 isn’t close to -2.145 OR 2.145.
The critical value and p-value approach will give you identical results, so we don’t
really need to do both.
But for the sake of showing we get the same outcome…our calculated p-value is 0.6684.
We reject the null if the p-value is smaller than alpha, so again we fail to reject since
0.6684 is WAY bigger than 0.05.
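The two decision rules can be written as a pair of one-line checks. This is only a sketch: the function names are hypothetical, and the numbers plugged in are the ones quoted in the episode for the two-sample coffee test.

```python
def reject_by_critical_value(t_stat, critical_value):
    """Two-tailed rule: reject when t is farther from zero than the cutoff."""
    return abs(t_stat) > critical_value


def reject_by_p_value(p_value, alpha=0.05):
    """Reject when the p-value is smaller than the chosen alpha."""
    return p_value < alpha


# Values quoted in the episode for the two-sample test.
t_stat, critical_value, p_value = 0.44, 2.145, 0.6684

print(reject_by_critical_value(t_stat, critical_value))  # False
print(reject_by_p_value(p_value))                        # False
```

Both checks agree, as expected, so the null hypothesis is not rejected.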
One thing that’s nice about the p-value approach, and the reason we’ll mainly rely
on it throughout the rest of these examples, is that p-values are easier for us non-computers
to interpret.
A p-value of 0.6684 means that if there were NO difference in scores between coffee from
Caf-fiend and coffee from The Blend Den, we’d still expect to see a difference in our sample
means that’s 0.3 or greater pretty often...
66.84% of the time.
Since our observed difference of 0.3 or greater is pretty common under the null hypothesis,
we haven’t found evidence that it’s a bad fit.
That’s why we failed to reject it.
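The episode leaves the p-value calculation to a computer. One way to sketch it with only the Python standard library is to integrate the t-distribution's density numerically and take the area in both tails. The function names and the choice of Simpson's rule are assumptions made for this illustration; statistical software would normally do this for you.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat, df, steps=2000):
    """Two-tailed p-value: probability of a t at least as far from 0 as t_stat."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    # Simpson's rule for the area between 0 and |t| (steps must be even).
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3
    # The density is symmetric, so both tails together are 1 - 2 * central_half.
    return 1 - 2 * central_half


print(round(two_sided_p(0.44, 14), 4))    # two-sample test, df = 8 + 8 - 2
print(round(two_sided_p(-3.212, 15), 5))  # paired test, df = 16 - 1
```

Because the t-statistics quoted in the episode are rounded, the printed values should land close to, but not exactly on, the reported p-values of 0.6684 and 0.00582.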
So right now we don’t have any evidence that one coffee shop is better than the other.
But remember, absence of evidence is not evidence of absence.
And while our coffee excursion and experiment were well designed, we can probably improve it.
If you look at the scores that your friends gave the coffees, you’ll see that there’s
one person who tried coffee from Caf-fiend and really hated it.
After looking through your scorecards, you realize it’s Alex, who has mentioned in
the past that she just doesn’t love coffee.
Which gets you thinking.
Even though you randomly assigned your friends to get either coffee from Caf-fiend or coffee
from The Blend Den, that design didn’t account for the fact that some people just like coffee
more than others.
Alex might give the best coffee in the world a measly 6 point rating just because...coffee’s
not really her thing.
Whereas your always caffeinated friend Cameron would probably give that day old coffee in
the breakroom a score of 7 just because he loves coffee.
So in addition to any true difference in scores between coffee from Caf-fiend and coffee from
The Blend Den, our sample means are also affected by how much the people in each group like coffee.
You randomly assigned your friends to groups, so you don’t expect that there’s some
systematic difference between the average coffee enjoyment of the groups.
But random assignment adds variation, which can make it harder to see a true difference
between the coffee scores.
One solution to this issue is a paired t-test.
You could try to pair up your friends based on how much they like coffee and then randomly
assign one to coffee from Caf-fiend and the other to coffee from The Blend Den, and repeat
this over and over until everyone had been assigned.
The best match, of course, for a person is themselves.
I’m just like me.
So you decide to call another random sample of 16 of your friends.
This time you give all of them both Caf-fiend coffee AND The Blend Den coffee and they record
their scores.
Now that everyone has scored both coffees, you can be sure that the two groups have the
exact same level of “coffee affinity” since it’s the exact same people.
The mean scores are still affected by variation due to individual coffee preferences, but
since the exact same people are in both groups, we can extract that variation and “throw
it away” so to speak.
One way to do this, is to make a difference score for each person.
This will tell you how much more they like coffee from Caf-fiend than coffee from The Blend Den.
Now that we have only one list of values--the difference scores--our matched pairs t-test
will look surprisingly similar to the one sample t-test that we’ve seen before.
We observed a mean difference (Caf-fiend - The Blend Den) of -0.18125, which means that on
average, people rated coffee from The Blend Den 0.18125 points higher than coffee from Caf-fiend.
The null hypothesis here is that there’s no difference between ratings for coffee from
Caf-fiend and coffee from The Blend Den, so we’d expect our mean difference to be 0.
And our measure of average variation is just the standard error of the difference scores:
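The on-screen formula is not captured in this transcript. Based on the one-sample form the episode refers to, it would look roughly like this, where $\bar{d}$ (mean of the difference scores), $s_d$ (their sample standard deviation), and $n$ (the number of people, 16 here) are notation assumed for this reconstruction:

```latex
SE_{\bar{d}} = \frac{s_d}{\sqrt{n}},
\qquad
t = \frac{\bar{d} - 0}{SE_{\bar{d}}}
```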
Putting it together, we get a t-statistic of about -3.212.
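A minimal sketch of the matched pairs calculation: make one difference score per person, then treat that single list like a one-sample t-test against an expected mean of 0. The function name `paired_t` and the scores are hypothetical, invented for illustration; they are not the episode's data.

```python
import math
import statistics


def paired_t(first, second, expected_mean_diff=0.0):
    """Return (mean_diff, standard_error, t, df) for a matched pairs t-test."""
    if len(first) != len(second):
        raise ValueError("each person needs a score for both coffees")
    # One difference score per person removes their overall coffee affinity.
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    t = (mean_diff - expected_mean_diff) / standard_error
    return mean_diff, standard_error, t, n - 1


# Hypothetical scores from the same eight people (NOT the episode's data).
caf_fiend = [7.0, 8.5, 6.0, 9.0, 7.5, 8.0, 5.5, 9.5]
blend_den = [7.5, 8.5, 6.5, 9.5, 7.5, 8.5, 6.0, 9.5]

mean_diff, se, t, df = paired_t(caf_fiend, blend_den)
print(f"mean diff = {mean_diff:.4f}, SE = {se:.4f}, t = {t:.3f}, df = {df}")
```

As a consistency check on the episode's own numbers, a mean difference of -0.18125 and a t-statistic of about -3.212 imply a standard error of roughly 0.056.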
Before we get to the corresponding p-value that our computer spit out, let’s consider
another way to think about what t-statistics are actually telling us.
T-statistics tell you how many standard errors away from the mean our observed difference is.
Though the t-distribution isn’t EXACTLY normal, it’s reasonably close, so we can
use our intuition about normal distributions to understand our t-values.
Normal distributions have about 68% of their data within one standard deviation from the mean.
And about 95% within 2 standard deviations.
That means that t-scores around 3, like ours, are about 3 standard errors away from the
mean...only around 0.3% of scores are that far away!
So it makes sense that our p-value is very small: 0.00582.
Which allows us to reject the null hypothesis that there is no difference between the scores
for Coffee from Caf-fiend and coffee from The Blend Den.
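The normal-distribution intuition used above (68% within one standard deviation, 95% within two, about 0.3% beyond three) can be checked with Python's standard library. The helper name `share_beyond` is hypothetical.

```python
from statistics import NormalDist


def share_beyond(z):
    """Fraction of a standard normal distribution more than z SDs from the mean."""
    return 2 * (1 - NormalDist().cdf(z))


for z in (1, 2, 3):
    within = 1 - share_beyond(z)
    print(f"within {z} SD: {within:.4f}, beyond: {share_beyond(z):.4f}")
```

The share beyond 3 standard deviations comes out near 0.0027. The episode's p-value of 0.00582 is somewhat larger because a t-distribution has heavier tails than the normal distribution, so the normal figure is a rough guide rather than an exact match.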
Which means that from now on, I’ll be buying my coffee from The Blend Den.
Except for when I’m meeting up with Alex, then I’ll buy tea.
Statistical tests help us wade through the murky waters of variability, and our goal
should be to get rid of as MUCH of that variability as possible so that we can see patterns.
We can see whether exercise improves sleep...which your friends might be lacking after all that coffee.
Or whether your hearing could be hurt by listening to loud music by Cream or Ice Cube or Vanilla Ice
or some other musician that sounds like it belongs in coffee.
Like Spoon! Spoon. Yeah? Brandon Spoon.
But more importantly, we’re learning that all those formulas you may have seen floating
around, really aren’t that different.
We’re just comparing what we see, to what we think we should see.
We’re always comparing the way things are to how we expect them to be.
And statistics is no exception.
We now have the tools to design experiments and answer a lot of interesting questions
and do our own experiments even if we over-caffeinate some of our friends in the process.
Thanks for watching. I'll see you next time.