T-Tests: A Matched Pair Made in Heaven: Crash Course Statistics #27

CrashCourse
15 Aug 2018 · 11:16

Summary

TLDR: In this Crash Course Statistics episode, Adriene Hill explores the concept of test statistics through a coffee tasting experiment. Comparing two local coffee shops, she uses a two-sample t-test to determine if there's a significant difference in scores. Initially, no significant difference is found, but after considering individual preferences, a paired t-test reveals a preference for one shop. The lesson emphasizes the importance of controlling for variability in experiments and the power of statistical tests to reveal patterns amidst noise.

Takeaways

  • 📊 The script introduces the concept of test statistics and how they can be generalized for different situations without memorizing a new formula for each.
  • 🔍 It uses a coffee tasting experiment to illustrate the process of hypothesis testing, comparing two coffee shops, Caf-fiend and The Blend Den.
  • 📈 The experiment involves a two-sample t-test, which is also known as an independent or unpaired t-test, to determine if there's a significant difference in the mean scores of coffee ratings.
  • 🎯 The null hypothesis is that there is no difference between the two coffee shops, while the alternative hypothesis suggests that one is better than the other.
  • 📉 The standard error is used as a measure of average variation in the two-sample t-test, taking into account the sample variance of both groups.
  • 📋 The script explains how to calculate the critical t-value and p-value to determine statistical significance, using an alpha level of 0.05.
  • 🔎 It discusses the limitations of the initial experiment, such as not accounting for individual preferences for coffee, which can introduce variability.
  • 🔄 The concept of a paired t-test is introduced as a solution to control for individual differences in coffee preference by comparing scores from the same individuals.
  • 📝 The paired t-test involves creating difference scores for each person, which allows for the comparison of the two coffees while controlling for individual preferences.
  • 🔑 The script emphasizes the importance of reducing variability in experiments to better identify true differences and patterns.
  • 📚 It concludes by highlighting that statistical tests help in understanding and comparing observed data with expected outcomes, a fundamental principle in statistics.

Q & A

  • What is the main topic discussed in this Crash Course Statistics episode?

    -The main topic discussed in this episode is the application of the general formula for creating test statistics in various situations, with a focus on how to use statistical tests to determine if there is a significant difference between two groups, exemplified by comparing coffee scores from two shops.

  • Why is it important not to have to memorize a new formula for every statistical test?

    -It's important not to memorize a new formula for every statistical test because it would be inefficient and overwhelming. The general formula for test statistics allows for adaptability across different situations, making the process of statistical analysis more streamlined and manageable.

  • What is the null hypothesis in the coffee tasting experiment described in the script?

    -The null hypothesis in the coffee tasting experiment is that there is no difference between the two coffee shops, meaning the mean scores for coffee from Caf-fiend and The Blend Den are the same.

  • What statistical test is used to analyze the difference in coffee scores between Caf-fiend and The Blend Den?

    -A two-sample t-test, also known as an independent or unpaired t-test, is used to analyze the difference in coffee scores between Caf-fiend and The Blend Den.

  • How is the standard error calculated for a two-sample t-test?

    -The standard error for a two-sample t-test is calculated from the sample variances of the two groups and their sizes (n1 and n2). Each group's standard deviation is squared to get its variance, each variance is divided by its group size, and the standard error is the square root of the sum: sqrt(s1²/n1 + s2²/n2).

  • What are the two methods mentioned to determine if there is a statistically significant difference between the two coffee shops?

    -The two methods mentioned are calculating the critical t-value and comparing it with the t-statistic, and calculating the p-value from the t-statistic and comparing it with the chosen alpha level.

  • Why is an alpha level of 0.05 commonly used in statistical tests?

    -An alpha level of 0.05 is commonly used because it means that in the long run, only 5% of tests done on groups with no real difference will incorrectly reject the null hypothesis, providing a balance between avoiding false positives and detecting true effects.

  • What is the significance of the p-value in the context of this coffee experiment?

    -The p-value in the context of the coffee experiment represents the probability of observing a difference in sample means at least as extreme as the one observed, assuming the null hypothesis is true. A p-value greater than the alpha level means we fail to reject the null hypothesis, so the experiment provides no evidence of a difference between the coffee shops.

  • Why is a paired t-test considered a better approach for the coffee experiment after considering individual preferences?

    -A paired t-test is considered better because it accounts for individual preferences by comparing the scores of the same individuals for both coffees, thus controlling for variability due to personal coffee affinity and making it easier to detect true differences between the coffees.

  • What does the t-statistic of -3.212 in the paired t-test indicate about the difference in coffee scores?

    -The t-statistic of -3.212 in the paired t-test indicates that the observed mean difference in scores is 3.212 standard errors away from the expected mean difference under the null hypothesis, suggesting a statistically significant difference between the coffees from Caf-fiend and The Blend Den.

  • How does the script illustrate the concept that 'absence of evidence is not evidence of absence'?

    -The script illustrates the concept by showing that even though the initial experiment did not find a significant difference between the coffee shops, it doesn't mean there is no difference. It could be due to variability in the sample or other factors not accounted for, hence the need for a more controlled experiment like the paired t-test.

Outlines

00:00

📊 Introduction to Test Statistics and Coffee Experiment

In this segment, Adriene Hill introduces the concept of test statistics and their application in various scenarios, emphasizing the adaptability of the general formula. She uses the example of a coffee taste test between two popular coffee shops, Caf-fiend and The Blend Den, to explain the process. A random sample of 16 people is split at random into two groups of 8, and each person tastes an unmarked cup from one of the two shops, resulting in mean scores of 7.6 for Caf-fiend and 7.9 for The Blend Den. Adriene highlights the need to define null and alternative hypotheses before conducting a two-sample t-test to determine if the observed difference is statistically significant. The null hypothesis posits no difference between the shops, while the alternative suggests one is better. The t-test formula is introduced, and the observed difference, standard error, and critical t-values are discussed, leading to the conclusion that the sample data provide no evidence of a significant difference between the two coffee shops.

05:05

🔍 Deep Dive into P-Values and Paired T-Test

This paragraph delves into the interpretation of p-values and the concept of a paired t-test. Adriene explains that a p-value of 0.6684 means that, if there were no true difference between the coffee shops, a difference in sample means of 0.3 or greater would still occur by random chance 66.84% of the time. This leads to the decision not to reject the null hypothesis based on the initial experiment. However, recognizing potential variability due to individual coffee preferences, Adriene suggests a paired t-test as an improved method. In this approach, each participant rates both coffees, effectively controlling for individual preferences. The mean difference score and standard error of the difference scores are calculated, resulting in a t-statistic of -3.212. This value, being far from the mean in a t-distribution, corresponds to a very small p-value (0.00582), leading to the rejection of the null hypothesis and the conclusion that there is a statistically significant difference in scores, favoring The Blend Den's coffee over Caf-fiend's.

10:05

🧐 The Essence of Statistical Testing and Experiment Design

In the final paragraph, Adriene Hill wraps up the discussion by emphasizing the core of statistical testing: comparing observed data to expected outcomes under the null hypothesis. She stresses that the various formulas encountered in statistics are fundamentally about this comparison, aiming to reduce variability to identify true patterns. The paragraph concludes with a call to action, encouraging viewers to use the tools of statistical testing to design experiments and answer intriguing questions, even if it involves the occasional over-caffeination of friends in the process.

Keywords

💡Test Statistics

Test statistics are quantities calculated from sample data that are used to make inferences about a population parameter. In the video, test statistics are introduced as a general formula that can be adapted for various situations, emphasizing their importance in hypothesis testing. The script uses the example of comparing coffee scores from two shops to illustrate how test statistics are calculated and used to determine if there is a significant difference between the means of two groups.

💡Null Hypothesis

The null hypothesis is a statement of no effect or no difference in statistical testing. It serves as a baseline assumption that is tested against an alternative hypothesis. In the video, the null hypothesis is that there is no difference between the coffee scores from Caf-fiend and The Blend Den. The script explains that if the null hypothesis were true, any observed difference would be due to random chance.

💡Alternative Hypothesis

The alternative hypothesis is a statement that there is an effect or a difference, which is the opposite of the null hypothesis. It is what researchers hope to prove when conducting a test. In the context of the video, the alternative hypothesis is that there is a difference in the quality of coffee between the two shops, meaning one is better than the other.

💡Standard Error

Standard error is a measure of the variability of a sample statistic such as the sample mean. For a single sample it is calculated by dividing the standard deviation by the square root of the sample size; for two independent groups it combines each group's variance and size. In the video, the standard error is used as a measure of average variation in the two-sample t-test, helping to determine the significance of the observed difference between the coffee scores.

💡Two-Sample T-Test

A two-sample t-test, also known as an independent or unpaired t-test, is used to compare the means of two groups that are independent of each other. The video uses this test to compare the coffee scores from Caf-fiend and The Blend Den, demonstrating the calculation of the t-statistic and the interpretation of the results.

💡Critical T-Value

The critical t-value is the value that separates the region where we fail to reject the null hypothesis from the region of rejection. In a two-tailed test there are two cutoffs, and the null hypothesis is rejected if the t-statistic is more extreme than either one (that is, if its absolute value exceeds the critical t-value). The video notes that a computer or calculator usually finds the critical t-value and shows how it is used to make a decision about the null hypothesis.

💡P-Value

The p-value is the probability of observing a test statistic as extreme as, or more extreme than, the one calculated from the sample data, assuming the null hypothesis is true. In the video, the p-value is calculated to determine whether the observed difference in coffee scores is significant. A high p-value, as found in the video, indicates that the observed difference is not statistically significant.

💡Alpha Level

The alpha level is the threshold probability of rejecting the null hypothesis when it is actually true, also known as the Type I error rate. Commonly set at 0.05, it represents the level of significance in a hypothesis test. The video mentions choosing an alpha level of 0.05, which means that in the long run, only 5% of tests done on groups with no real difference will incorrectly reject the null hypothesis.

💡Matched Pairs T-Test

A matched pairs t-test, also known as a paired t-test, is used when the same subjects are tested under two different conditions. The video introduces this test as a way to account for individual differences in coffee preferences by having the same people rate both coffees, thus controlling for variability due to individual tastes.

💡Variability

Variability refers to the degree to which data points differ from each other. In the context of the video, reducing variability is crucial for detecting true differences between groups. The video discusses how random assignment can introduce variability, and how matched pairs can help control for it, allowing for a clearer comparison of the coffee scores.

💡Statistical Significance

Statistical significance describes whether results at least as extreme as those observed would be unlikely if the null hypothesis were true. A result is considered statistically significant if the p-value is less than the alpha level. In the video, the concept is used to determine whether the difference in coffee scores is due to more than just random chance, with the final test showing a significant difference.

Highlights

Introduction to the logic of test statistics and the general formula for creating them.

Explanation of how to adapt the general formula for various situations without memorizing new formulas for each.

The importance of defining null and alternative hypotheses in statistical testing.

Description of a practical experiment to determine the best coffee shop in town using statistical methods.

Use of a two-sample t-test, also known as an independent or unpaired t-test, to compare coffee scores.

Calculation of the standard error for two groups and its role in the t-test formula.

Decision-making process using critical t-values and the concept of a two-tailed test.

Interpretation of the t-statistic and its comparison to critical values to determine statistical significance.

Introduction to the concept of p-values and their use in hypothesis testing.

The choice of alpha level in statistical testing and its implications.

Discussion on the limitations of random assignment and the introduction of variability in experiments.

Proposal of a paired t-test as a solution to account for individual differences in coffee preferences.

Explanation of how to calculate difference scores and use them in a matched pairs t-test.

Interpretation of t-statistics in terms of standard errors and their significance in hypothesis testing.

Rejection of the null hypothesis based on a small p-value, indicating a significant difference between coffee shops.

Practical application of statistical tests to reduce variability and identify patterns in data.

Final thoughts on the unifying nature of statistical formulas and their role in comparing observed and expected outcomes.

Transcripts

play00:03

Hi, I’m Adriene Hill, and welcome back to Crash Course Statistics.

play00:06

In the last episode we dove into the logic surrounding test statistics and talked about

play00:10

a general formula that allows us to create them for lots of different situations.

play00:15

There are so many questions we might want to answer, and it would be rough if we had

play00:19

to memorize a new formula for EVERY Single One.

play00:23

And sometimes Statistics is taught in a way that makes it seem like there’s a different

play00:27

formula you need to know if you want to test whether your bus is late more often than the

play00:31

average bus in your town.

play00:33

Or if burns treated with aloe heal faster than those that are left alone.

play00:38

But! Hah-zah.

play00:39

We can adapt the general formula...in all sorts of situations.

play00:43

INTRO

play00:53

Let’s say that you just moved to a new place, and you’re looking for the BEST coffee in town.

play00:57

Since you’ve been watching Crash Course Statistics, you decide to do a little impromptu experiment.

play01:01

Word on the street is there are two really popular coffee places near you, Caf-fiend

play01:06

and The Blend Den.

play01:07

So one Sunday after brunch, you grab a random sample of 16 of your new friends, and randomly

play01:12

give half of them an unmarked cup with coffee from Caf-fiend, and the other half an unmarked

play01:18

cup with coffee from The Blend Den.

play01:20

You made sure to get the same roast--dark--to keep things as even as possible.

play01:24

After delicate sniffs and sips of coffee in a process known as “cupping”, the tallies are in.

play01:30

On a scale of 1 to 10, Caf-fiend got a mean score of 7.6 and The Blend Den got a mean

play01:36

score of 7.9.

play01:38

So we observe a difference between the coffee scores.

play01:41

Coffee from Caf-fiend scored 0.3 points lower than Coffee from The Blend Den.

play01:46

So coffee from The Blend Den is better?

play01:48

Right?

play01:48

Done and done.

play01:49

Nope, not yet.

play01:51

Maybe it’s just random chance.

play01:53

So first we need to define our null.

play01:55

There’s no difference between the two coffee shops.

play01:58

And then our alternative hypothesis, that there is a difference.

play02:02

One is better than the other.

play02:03

In this case, we’re interested in whether the mean scores for coffee are different between

play02:08

Caf-fiend and The Blend Den.

play02:10

With a little algebra, we can see that this is the same thing as asking whether the difference

play02:14

between the two means is not zero.

play02:17

Now that we have our hypotheses, we can do a t-test.

play02:19

Specifically, we’ll do a two-sample t-test, also called an independent or unpaired t-test.

play02:25

The formula for a two sample t-test follows our general test statistic formula:

play02:30

The difference we observed is 0.3.

play02:33

If the null hypothesis were true and there’s no difference between the coffee shops, we’d

play02:37

expect a difference of 0.

play02:38

So the numerator of our t-test is 0.3.

play02:41

For this kind of t-test, our measure of average variation is the standard error.

play02:46

For two groups, the standard error is calculated a bit differently since we have to account

play02:51

for the sample variance of two groups.

play02:53

Here, we’re squaring the standard deviation to get the variance and n1 and n2 are the

play02:59

sizes of the two groups--both are 8 here.
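The two-group standard error described here can be sketched in Python. This is only an illustration, assuming the form sqrt(s1²/n1 + s2²/n2), which matches the description of squaring each standard deviation and using both group sizes. The function name `two_sample_t` is made up for this sketch, and the standard deviations are hypothetical placeholders because the transcript does not state them; only the means (7.9 and 7.6) and the group sizes (8 and 8) come from the episode.

```python
import math

def two_sample_t(mean1, sd1, n1, mean2, sd2, n2, expected_diff=0.0):
    """Return (t, standard_error) for two independent groups."""
    # Square each standard deviation to get a variance, then scale by group size.
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    # General test-statistic shape: (observed - expected under the null) / variation.
    t = ((mean1 - mean2) - expected_diff) / se
    return t, se

# Means and group sizes are from the episode.
# The standard deviations 1.4 and 1.3 are HYPOTHETICAL placeholders.
t_stat, se = two_sample_t(7.9, 1.4, 8, 7.6, 1.3, 8)
print(round(t_stat, 3), round(se, 3))
```

With other standard deviations the t-statistic changes, so the printed value should not be read as the episode's exact result.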

play03:01

Now that we have our t-value, we can figure out if there’s a statistically significant

play03:06

difference between the two coffee shops and there are two ways to do this.

play03:10

We can calculate the critical t-value and if our t-statistic is GREATER than the critical

play03:16

value we reject the null hypothesis.

play03:19

Or we can calculate the p-value from our t-statistic and we can reject the null hypothesis if the

play03:25

p-value is SMALLER than our chosen alpha level.

play03:28

To do either of these things, we’ll need to choose our alpha level.

play03:31

Again, our alpha is arbitrary.

play03:33

But usually people will use 0.05 since that means that in the long run, only 5% of tests

play03:39

done on groups with no real difference will incorrectly reject the null.

play03:44

So, we’ll conform :) and use an alpha of 0.05 here.
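The long-run meaning of alpha can be checked with a small simulation. The sketch below is a hedged illustration, not something shown in the episode: it repeatedly draws two groups of 8 from the same normal distribution (so the null hypothesis is true by construction) and counts how often the t-statistic lands beyond the cutoff of 2.145 that the episode reports for this design. The score distribution (mean 7.5, standard deviation 1.5), the trial count, and the function names are all assumptions made for the example.

```python
import math
import random

def t_statistic(group1, group2):
    """Two-sample t-statistic with standard error sqrt(s1^2/n1 + s2^2/n2)."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    v1 = sum((x - m1) ** 2 for x in group1) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in group2) / (n2 - 1)
    return (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)

def false_rejection_rate(trials=20000, cutoff=2.145, seed=0):
    """Share of no-difference experiments whose |t| exceeds the cutoff."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Both groups come from the SAME distribution, so the null is true.
        a = [rng.gauss(7.5, 1.5) for _ in range(8)]
        b = [rng.gauss(7.5, 1.5) for _ in range(8)]
        if abs(t_statistic(a, b)) > cutoff:
            rejections += 1
    return rejections / trials

rate = false_rejection_rate()
print(rate)
```

The printed rate should sit close to 0.05, which is the sense in which alpha is the long-run share of false rejections.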

play03:48

To calculate our critical t-value we need to find the t-values which correspond to the

play03:53

top 5% most extreme values in our t-distribution.

play03:58

Usually a computer or a calculator will do this for you, so we won’t go into the formula,

play04:02

but here are the cutoffs:

play04:03

The cutoffs for our specific problem are about -2.145 and 2.145.

play04:10

We have two cutoffs because we’re doing a two-tailed test.

play04:14

We want to reject the null if coffee from Caf-fiend is better or if coffee from The

play04:20

Blend Den is better.

play04:21

We can already tell that we should fail to reject the null.

play04:23

That there’s no clear difference between the quality of the coffee.

play04:26

Our t-statistic of about 0.44 isn’t close to -2.145 OR 2.145.
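The transcript leaves the cutoff calculation to a computer. As a rough sketch of what that computation involves, the code below integrates the t-distribution's density numerically (Simpson's rule) and searches for the value with 2.5% of the distribution above it. It assumes 14 degrees of freedom (8 + 8 - 2), which the transcript does not state but which is consistent with the reported cutoffs; the function names are made up for this example, and a statistics library would normally do this in one call.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on the density from 0 to t."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def critical_t(alpha, df):
    """Two-tailed cutoff: the t with alpha/2 of the distribution above it."""
    lo, hi = 0.0, 50.0
    for _ in range(60):  # bisection on the upper-tail probability
        mid = (lo + hi) / 2
        if upper_tail(mid, df) > alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

df = 8 + 8 - 2  # ASSUMPTION: n1 + n2 - 2 degrees of freedom
cutoff = critical_t(0.05, df)
reject = abs(0.44) > cutoff  # 0.44 is the episode's rounded t-statistic
print(round(cutoff, 3), reject)
```

The cutoff should come out near the episode's 2.145, and since 0.44 is well inside the two cutoffs, the decision is to fail to reject the null.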

play04:33

The critical value and p-value approach will give you identical results, so we don’t

play04:37

really need to do both.

play04:39

But for the sake of showing we get the same outcome…our calculated p-value is 0.6684.

play04:46

We reject the null if the p-value is smaller than alpha, so again we fail to reject since

play04:51

0.6684 is WAY bigger than 0.05.

play04:56

One thing that’s nice about the p-value approach, and the reason we’ll mainly rely

play04:59

on it throughout the rest of these examples, is that p-values are easier for us non-computers

play05:05

to interpret.

play05:06

A p-value of 0.6684 means that if there were NO difference in scores between coffee from

play05:12

Caf-fiend and coffee from The Blend Den, we’d still expect to see a difference in our sample

play05:17

means that’s 0.3 or greater pretty often...

play05:21

66.84% of the time.

play05:24

Since our observed difference of 0.3 or greater is pretty common under the null hypothesis,

play05:29

we haven’t found evidence that it’s a bad fit.

play05:31

That’s why we failed to reject it.

play05:34

So right now we don’t have any evidence that one coffee shop is better than the other.

play05:38

But remember, absence of evidence is not evidence of absence.

play05:43

And while our coffee excursion and experiment were well designed, we can probably improve it.

play05:48

If you look at the scores that your friends gave the coffees, you’ll see that there’s

play05:51

one person who tried coffee from Caf-fiend and really hated it.

play05:56

After looking through your scorecards, you realize it’s Alex, who has mentioned in

play05:59

the past that she just doesn’t love coffee.

play06:02

Which gets you thinking.

play06:03

Even though you randomly assigned your friends to get either coffee from Caf-fiend or coffee

play06:07

from The Blend Den, that design didn’t account for the fact that some people just like coffee

play06:12

more than others.

play06:14

Alex might give the best coffee in the world a measly 6 point rating just because...coffee’s

play06:19

not really her thing.

play06:20

Whereas your always caffeinated friend Cameron would probably give that day old coffee in

play06:24

the breakroom a score of 7 just because he loves coffee.

play06:27

So in addition to any true difference in scores between coffee from Caf-fiend and coffee from

play06:31

The Blend Den, our sample means are also affected by how much the people in each group like coffee.

play06:38

You randomly assigned your friends to groups, so you don’t expect that there’s some

play06:41

systematic difference between the average coffee enjoyment of the groups.

play06:46

But random assignment adds variation, which can make it harder to see a true difference

play06:51

between the coffee scores.

play06:52

One solution to this issue is a paired t-test.

play06:55

You could try to pair up your friends based on how much they like coffee and then randomly

play07:00

assign one to coffee from Caf-fiend and the other to coffee from The Blend Den, and repeat

play07:05

this over and over until everyone had been assigned.

play07:07

The best match, of course, for a person is themselves.

play07:10

I’m just like me.

play07:12

So you decide to call another random sample of 16 of your friends.

play07:15

This time you give all of them both Caf-fiend coffee AND The Blend Den coffee and they record

play07:20

their scores.

play07:21

Now that everyone has scored both coffees, you can be sure that the two groups have the

play07:25

exact same level of “coffee affinity” since it’s the exact same people.

play07:30

The mean scores are still affected by variation due to individual coffee preferences, but

play07:35

since the exact same people are in both groups, we can extract that variation and “throw

play07:41

it away” so to speak.
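A small simulation can illustrate why difference scores "throw away" the person-to-person variation. The model here is entirely hypothetical: each score is a personal coffee affinity plus noise, with The Blend Den scoring 0.2 points higher, and none of these numbers come from the episode. The sketch compares the standard error you would get by treating the two score lists as independent groups with the standard error of the difference scores.

```python
import math
import random

def sample_sd(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

rng = random.Random(1)
n = 16

# HYPOTHETICAL model: score = personal coffee affinity + shop effect + noise.
affinity = [rng.gauss(7.0, 1.5) for _ in range(n)]
caf_fiend = [a + rng.gauss(0.0, 0.2) for a in affinity]
blend_den = [a + 0.2 + rng.gauss(0.0, 0.2) for a in affinity]

# Treating the lists as independent groups keeps the affinity spread in the SE.
se_unpaired = math.sqrt(sample_sd(caf_fiend) ** 2 / n + sample_sd(blend_den) ** 2 / n)

# Difference scores cancel each person's affinity, leaving mostly the noise.
diffs = [c - b for c, b in zip(caf_fiend, blend_den)]
se_paired = sample_sd(diffs) / math.sqrt(n)

print(round(se_unpaired, 3), round(se_paired, 3))
```

Under this model the paired standard error is much smaller, so the same mean difference produces a larger t-statistic.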

play07:42

One way to do this, is to make a difference score for each person.

play07:46

This will tell you how much more they like coffee from Caf-fiend than coffee from The Blend Den.

play07:51

Now that we have only one list of values--the difference scores--our matched pairs t-test

play07:57

will look surprisingly similar to the one sample t-test that we’ve seen before.

play08:01

We observed a mean difference (Caf-fiend - The Blend Den) of -0.18125, which means that on

play08:07

average, people rated coffee from The Blend Den 0.18125 points higher than coffee from Caf-fiend.

play08:14

The null hypothesis here is that there’s no difference between ratings for coffee from

play08:18

Caf-fiend and coffee from The Blend Den, so we’d expect our mean difference to be 0.

play08:24

And our measure of average variation is just the standard error of the difference scores:

play08:29

Putting it together, we get a t-statistic of about -3.212.
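The matched pairs calculation can be sketched as follows: build one difference score per person, then treat that single list like a one-sample t-test against an expected mean of 0. The function name `paired_t` and the four scorecards are hypothetical, chosen so the arithmetic is easy to follow by hand; the episode's 16 actual scores are not listed in the transcript.

```python
import math

def paired_t(scores_a, scores_b, expected_diff=0.0):
    """Return (mean_diff, standard_error, t) for matched scores."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Sample standard deviation of the difference scores.
    sd = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (n - 1))
    se = sd / math.sqrt(n)  # standard error of the difference scores
    return mean_diff, se, (mean_diff - expected_diff) / se

# HYPOTHETICAL scores from four tasters; differences are Caf-fiend minus The Blend Den.
caf_fiend = [7.0, 6.0, 8.0, 5.0]
blend_den = [8.0, 6.5, 9.0, 6.0]
mean_diff, se, t_stat = paired_t(caf_fiend, blend_den)
print(mean_diff, se, t_stat)  # -0.875 0.125 -7.0
```

A negative t-statistic here means the second list (The Blend Den) scored higher on average, the same sign convention as the episode's -3.212.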

play08:34

Before we get to the corresponding p-value that our computer spit out, let’s consider

play08:39

another way to think about what t-statistics are actually telling us.

play08:43

T-statistics tell you how many standard errors away from the mean our observed difference is.

play08:49

Though the t-distribution isn’t EXACTLY normal, it’s reasonably close, so we can

play08:53

use our intuition about normal distributions to understand our t-values.

play08:58

Normal distributions have about 68% of their data within one standard deviation from the mean.

play09:04

And about 95% within 2 standard deviations.

play09:08

That means that t-scores around 3, like ours, are about 3 standard errors away from the

play09:13

mean...only around 0.3% of scores are that far away!
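The 68%, 95%, and 0.3% figures can be reproduced with Python's standard library. This sketch uses the standard normal distribution as the approximation the transcript describes; the helper name `share_within` is made up for the example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    inside = share_within(k)
    # Print the share inside the band and the share further away than k.
    print(k, round(inside, 4), round(1 - inside, 4))
```

The last column for k = 3 is the "only around 0.3%" in the transcript. Because a t-distribution has heavier tails than the normal, the exact t-based probability is somewhat larger.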

play09:18

So it makes sense that our p-value is very small: 0.00582.

play09:24

Which allows us to reject the null hypothesis that there is no difference between the scores

play09:29

for Coffee from Caf-fiend and coffee from The Blend Den.
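The p-values quoted in the episode can be approximated from the t-statistics alone. The sketch below integrates the t-distribution's density numerically (Simpson's rule) and doubles the tail for a two-tailed test. The degrees of freedom are assumptions not stated in the transcript: 15 for the paired test (16 difference scores minus 1) and 14 for the two-sample test (8 + 8 - 2). Function names are made up for this example.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|), using Simpson's rule on the density from 0 to |t|."""
    t = abs(t)
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # probability between 0 and |t|
    return 2 * (0.5 - area)       # both tails beyond |t|

# ASSUMED degrees of freedom: 15 for the paired test, 14 for the two-sample test.
p_paired = two_tailed_p(-3.212, 15)
p_two_sample = two_tailed_p(0.44, 14)
print(round(p_paired, 5), round(p_two_sample, 4))
```

The paired result should land close to the episode's 0.00582. The two-sample result uses the rounded t-statistic of 0.44, so it will differ slightly from the quoted 0.6684.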

play09:32

Which means that from now on, I’ll be buying my coffee from The Blend Den.

play09:36

Except for when I’m meeting up with Alex, then I’ll buy tea.

play09:39

Statistical tests help us wade through the murky waters of variability, and our goal

play09:43

should be to get rid of as MUCH of that variability as possible so that we can see patterns.

play09:49

We can see whether exercise improves sleep...which your friends might be lacking after all that coffee.

play09:56

Or whether your hearing could be hurt by listening to loud music by Cream or Ice Cube or Vanilla Ice

play10:00

or some other musician that sounds like it belongs in coffee.

play10:05

Like Spoon! Spoon. Yeah? Brandon Spoon.

play10:10

But more importantly, we’re learning that all those formulas you may have seen floating

play10:14

around, really aren’t that different.

play10:16

We’re just comparing what we see, to what we think we should see.

play10:20

We’re always comparing the way things are to how we expect them to be.

play10:25

And statistics is no exception.

play10:26

We now have the tools to design experiments and answer a lot of interesting questions

play10:30

and do our own experiments even if we over-caffeinate some of our friends in the process.

play10:36

Thanks for watching. I'll see you next time.

Related Tags
Statistics, Coffee Test, Hypothesis Testing, T-Test, Experiment Design, Data Analysis, Caf-fiend, The Blend Den, Crash Course, Statistical Significance