Two-sample t test for difference of means | AP Statistics | Khan Academy
Summary
TLDR: This instructional video walks through a two-sample T-test to determine whether there is a significant difference in the sizes of tomato plants grown in two separate fields. The instructor assumes all conditions for inference are met: the random, normal, and independent conditions. At a significance level of 0.05, the test yields a T-statistic of approximately -2.44 and a two-sided P-value of approximately 0.024. Because the P-value is below the significance level, the null hypothesis is rejected, suggesting a difference in mean plant sizes between the fields.
Takeaways
- The scenario involves Kaito growing tomatoes in two fields and wanting to know if there's a difference in plant sizes.
- Kaito takes random samples from each field and measures the heights to compare them.
- A two-sample T-test is suggested to determine if there's a significant difference in plant sizes between the fields.
- The conditions for inference, including randomness, normality, and independence, are assumed to be met.
- The significance level for the test is set at 0.05, which is the threshold for rejecting the null hypothesis.
- The null hypothesis states that there is no difference in mean sizes between the two fields.
- The alternative hypothesis posits that there is a difference in the mean sizes of the tomato plants in the two fields.
- The T statistic is calculated using the difference in sample means over an estimate of the standard deviation of the sampling distribution.
- The formula for the T statistic includes the sum of squared sample standard deviations divided by their respective sample sizes.
- The calculated T statistic for the given data is approximately -2.44.
- The probability of obtaining a T value with an absolute value of 2.44 or more is found to be approximately 0.024.
- The P-value (0.024) is less than the significance level (0.05), leading to the rejection of the null hypothesis.
- The conclusion suggests that there is evidence of a difference in the sizes of tomato plants between the two fields.
Q & A
What is the main objective of the video script?
-The main objective of the video script is to guide the viewer through conducting a two-sample T-test to determine if there is a significant difference in the sizes of tomato plants grown in two separate fields.
What are the assumptions made before conducting the two-sample T-test?
-The assumptions made before conducting the two-sample T-test are that the random condition, normal condition, and independent condition for inference are met.
What is the significance level used in the T-test?
-The significance level used in the T-test is 0.05.
What is the null hypothesis in this context?
-The null hypothesis is that there is no difference between the mean sizes of tomato plants in field A and field B, meaning the mean size in field A is equal to the mean size in field B.
What is the alternative hypothesis in this scenario?
-The alternative hypothesis is that the mean sizes of tomato plants in field A are not equal to the mean sizes in field B, indicating a difference between the two fields.
How is the T statistic calculated in a two-sample T-test?
-The T statistic is calculated as the difference between the sample means divided by the estimated standard deviation of the sampling distribution of the difference of the sample means, which is the square root of the sum of the squared sample standard deviations divided by their respective sample sizes.
What sample data is used to calculate the T statistic in the script?
-The sample data used includes the mean heights of plants in both fields (1.3 for field A and 1.6 for field B), the sample standard deviations (0.5 for field A and 0.3 for field B), and the sample sizes (22 for field A and 24 for field B).
What is the calculated T statistic value in the script?
-The calculated T statistic value is approximately -2.44.
How is the probability associated with the T statistic determined?
-The probability is determined by using a calculator to find the cumulative distribution function for the T distribution and then multiplying the result by two to account for both tails of the distribution.
What degrees of freedom are used in the T-test?
-Taking the conservative approach, the degrees of freedom are the smaller of the two sample sizes minus one, which in this case is 21 (22 - 1).
What conclusion is drawn from the P-value compared to the significance level?
-Since the P-value (approximately 0.024) is less than the significance level (0.05), the null hypothesis is rejected, suggesting that there is indeed a difference between the sizes of tomato plants in the two fields.
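The Q&A above uses the conservative degrees of freedom (smaller sample size minus one). A common alternative, which the video does not use, is the Welch-Satterthwaite approximation. The sketch below compares the two for the sample data quoted above; the function names are illustrative and not taken from the video.

```python
def conservative_df(n_a, n_b):
    """Conservative choice used in the video: smaller sample size minus one."""
    return min(n_a, n_b) - 1


def welch_satterthwaite_df(sd_a, n_a, sd_b, n_b):
    """Welch-Satterthwaite approximation (an alternative, not from the video)."""
    var_a = sd_a ** 2 / n_a  # estimated variance of sample mean A
    var_b = sd_b ** 2 / n_b  # estimated variance of sample mean B
    numerator = (var_a + var_b) ** 2
    denominator = var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1)
    return numerator / denominator


df_conservative = conservative_df(22, 24)
df_welch = welch_satterthwaite_df(0.5, 22, 0.3, 24)
print(df_conservative)     # 21
print(round(df_welch, 1))  # about 33.8
```

Fewer degrees of freedom give a heavier-tailed T distribution and therefore a larger P-value, which is why the smaller value is described as conservative.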
Outlines
Tomato Plant Size Comparison Hypothesis
The instructor introduces a scenario in which Kaito, who grows tomatoes in two separate fields, wants to know whether the sizes of his tomato plants differ between the fields. To investigate, he takes a random sample of plants from each field and measures their heights. The video prompts viewers to conduct a two-sample T-test under the assumption that all conditions for inference are met: the random, normal, and independent conditions. The significance level is set at 0.05. The null hypothesis is that there is no difference in the mean sizes of tomato plants between the two fields, while the alternative hypothesis is that the means differ. The T-statistic is calculated from the sample means, sample standard deviations, and sample sizes, resulting in a value of approximately -2.44.
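As a rough sketch of the calculation described above, the T-statistic can be reproduced from the summary statistics quoted in this page (means 1.3 and 1.6, sample standard deviations 0.5 and 0.3, sample sizes 22 and 24). The function name is made up for illustration.

```python
import math


def two_sample_t_statistic(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Unpooled two-sample t statistic computed from summary statistics."""
    # Estimate of the standard deviation of the sampling distribution
    # of the difference of the sample means.
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


t_stat = two_sample_t_statistic(1.3, 0.5, 22, 1.6, 0.3, 24)
print(round(t_stat, 2))  # -2.44
```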
Two-Sample T-Test Statistical Analysis
This paragraph delves into the statistical analysis of the T-test. The T-statistic of -2.44 is used to determine the probability of obtaining such a value under the null hypothesis. The degrees of freedom are calculated as 21, based on the smaller sample size minus one. Using a calculator, the instructor finds the cumulative probability for a T-value of -2.44, which is then doubled to account for both tails of the distribution. The resulting P-value is approximately 0.024, which is less than the significance level of 0.05. This leads to the rejection of the null hypothesis, suggesting that there is indeed a difference in the sizes of the tomato plants between the two fields.
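The instructor uses a calculator's cumulative distribution function for this step. As a stand-in, the sketch below approximates the same two-tailed probability with only the Python standard library, integrating the T density numerically with Simpson's rule. The helper names and the step count are assumptions made for illustration; the result should land close to the 0.024 reported in the video.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t_stat, df, steps=2000):
    """Approximate P(|T| >= |t_stat|) using Simpson's rule (steps must be even)."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3   # area between 0 and |t_stat|
    one_tail = 0.5 - area_zero_to_t  # area beyond |t_stat| on one side
    return 2 * one_tail              # double it to cover both tails


alpha = 0.05
p_value = two_sided_p_value(-2.44, df=21)
print(round(p_value, 3))
print("reject H0" if p_value < alpha else "fail to reject H0")
```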
Keywords
Two-sample T test
Significance level
Null hypothesis
Alternative hypothesis
Sample mean
Sample standard deviation
Degrees of freedom
T statistic
P-value
Cumulative distribution function
Random sample
Highlights
Kaito grows tomatoes in two separate fields and is curious about the size difference between the plants.
A random sample of tomato plants is taken from each field to measure their heights.
A two-sample T-test is conducted to determine if there's a significant difference in plant sizes between the fields.
The conditions for inference, including random, normal, and independent conditions, are assumed to be met.
The significance level for the test is set at 0.05.
The null hypothesis states that there is no difference in the mean sizes of tomato plants between the two fields.
The alternative hypothesis suggests that the mean sizes of tomato plants in the two fields differ.
The T statistic is calculated based on the differences between the sample means and the standard deviation of the sampling distribution.
The formula for the T statistic includes the sample standard deviations and sizes from both fields.
The calculated T statistic is approximately -2.44.
The T distribution is used to determine the probability of getting a T value as extreme as -2.44.
The probability of getting a T value with an absolute value greater than or equal to 2.44 is calculated.
The degrees of freedom for the T test are determined by the smaller sample size minus one.
The P-value is compared to the significance level to make a decision about the null hypothesis.
The P-value of approximately 0.024 is less than the significance level, leading to the rejection of the null hypothesis.
The rejection of the null hypothesis suggests that there is a difference in the sizes of tomato plants between the two fields.
Transcripts
- [Instructor] "Kaito grows tomatoes in two separate fields.
"When the tomatoes are ready to be picked,
"he is curious as to whether the sizes of his tomato plants
"differ between the two fields.
"He takes a random sample of plants from each field
"and measures the heights of the plants.
"Here is a summary of the results:"
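[On-screen summary: field A: mean 1.3, sample standard deviation 0.5, sample size 22; field B: mean 1.6, sample standard deviation 0.3, sample size 24.]
So what I want you to do, is pause this video,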
and conduct a two sample T test here.
And let's assume that all of the conditions
for inference are met, the random condition,
the normal condition, and the independent condition.
And let's assume that we are working
with a significance level of 0.05.
So pause the video, and conduct the two sample T test here,
to see whether there's evidence
that the sizes of tomato plants differ between the fields.
Alright, now let's work through this together.
So like always, let's first construct our null hypothesis.
And that's going to be the situation
where there is no difference between the mean sizes,
so that would be that the mean size in field A
is equal to the mean size in field B.
Now what about our alternative hypothesis?
Well, he wants to see whether the sizes of his tomato plants
differ between the two fields.
He's not saying whether A is bigger than B,
or whether B is bigger than A,
and so his alternative hypothesis
would be around his suspicion,
that the mean of A is not equal to the mean of B,
that they differ.
And to do this two sample T test now,
we assume the null hypothesis.
We assume our null hypothesis,
and remember we're assuming that all
of our conditions for inference are met.
And then we wanna calculate a T statistic
based on this sample data that we have.
And our T statistic is going to be equal
to the differences between the sample means,
all of that over our estimate
of the standard deviation of the sampling distribution
of the difference of the sample means.
This will be the sample standard deviation
from sample A squared, over the sample size from A,
plus the sample standard deviation
from the B sample squared, over the sample size from B.
And let's see, we have all the numbers here
to calculate it.
This numerator is going to be equal to 1.3 minus 1.6,
1.3 minus 1.6, all of that over
the square root of, let's see,
the standard deviation, the sample standard deviation
from the sample from field A is 0.5.
If you square that, you're gonna get 0.25,
and then that's going to be over the sample size
from field A, over 22,
plus 0.3 squared, so that is,
0.3 squared is 0.09,
all of that over the sample size from field B,
all of that over 24.
The numerator is just gonna be -.3,
divided by the square root
of .25 divided by 22,
plus .09 divided by 24,
and that gets us -2.44.
Approximately -2.44.
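Written out, the arithmetic just described looks roughly like this. The symbols are labels introduced here rather than the instructor's own notation: x-bar for a sample mean, s for a sample standard deviation, and n for a sample size. The square root in the denominator is the estimate of the standard deviation of the sampling distribution.

```latex
t = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}}
  = \frac{1.3 - 1.6}{\sqrt{\dfrac{0.25}{22} + \dfrac{0.09}{24}}}
  \approx \frac{-0.3}{0.1229}
  \approx -2.44
```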
And so if you think about a T distribution,
and we'll use our calculator to figure out this probability,
so this is a T distribution right over here,
this would be the assumed mean of our T distribution.
And so we got a result that is,
we got a T statistic of -2.44,
so we're right over here, so this is -2.44.
And so we wanna say what is the probability
from this T distribution of getting something
at least this extreme?
So it would be this area, and it would also be this area,
if we got 2.44 above the mean, it would also be this area.
And so what I could do is, I'm gonna use my calculator
to figure out this probability right over here,
and then I'm just gonna multiply that by two,
to get this one as well.
So the probability of getting a T value,
I guess I could say where its absolute value
is greater than or equal to 2.44,
is going to be approximately equal to,
I'm going to go to second, distribution,
I'm going to go to the cumulative distribution function
for our T distribution, click that.
And since I wanna think about this tail probability here
that I'm just gonna multiply by two,
the lower bound is a very very very negative number,
and you could view that as functionally negative infinity.
The upper bound is -2.44.
-2.44.
And now what's our degrees of freedom?
Well if we take the conservative approach,
it'll be the smaller of the two samples minus one.
Well the smaller of the two samples is 22,
and so 22 minus one is 21.
So put 21 in there.
Two...
21.
And now I can paste, and I get that number right over there,
and if I multiply that by two, 'cause this just gives me
the probability of getting something lower than that,
but I also wanna think about the probability
of getting something 2.44 or more above the mean
of our T distribution.
So times two, is going to be equal to approximately 0.024.
So approximately 0.024.
And what I wanna do then is compare this
to my significance level.
And you can see very clearly, this right over here,
this is equal to our P value.
Our P value in this situation,
our P value in this situation is clearly less
than our significance level.
And because of that, we said hey,
assuming the null hypothesis is true,
we got something that's a pretty low probability
below our threshold, so we are going to reject
our null hypothesis, which tells us that there is,
so this suggests,
this suggests the alternative hypothesis,
that there is indeed a difference between the sizes
of the tomato plants in the two fields.
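One way to build intuition for what this P-value means is to simulate the null hypothesis directly. The sketch below is not part of the video. It assumes normally distributed heights, a shared true mean (the value 1.45 is arbitrary), and population spreads equal to the sample standard deviations, then counts how often a T statistic at least as extreme as -2.44 turns up.

```python
import math
import random


def sample_mean_and_variance(values):
    """Sample mean and sample variance (n - 1 in the denominator)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, variance


def t_statistic(sample_a, sample_b):
    """Unpooled two-sample t statistic computed from raw samples."""
    mean_a, var_a = sample_mean_and_variance(sample_a)
    mean_b, var_b = sample_mean_and_variance(sample_b)
    return (mean_a - mean_b) / math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))


def simulate_null_tail_rate(observed_t, trials=10000, seed=0):
    """Fraction of simulated null experiments with |t| >= |observed_t|."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(trials):
        # Assumption: both fields share one true mean, so the null hypothesis holds.
        field_a = [rng.gauss(1.45, 0.5) for _ in range(22)]
        field_b = [rng.gauss(1.45, 0.3) for _ in range(24)]
        if abs(t_statistic(field_a, field_b)) >= abs(observed_t):
            extreme += 1
    return extreme / trials


rate = simulate_null_tail_rate(-2.44)
print(rate)
```

The simulated rate should fall in the same neighborhood as 0.024, and typically somewhat below it, because using 21 degrees of freedom is the conservative choice and slightly overstates the tail probability.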