Statistics in 10 minutes. Hypothesis testing, the p value, t-test, chi squared, ANOVA and more

Global Health with Greg Martin
8 Jul 2024 · 09:33

Summary

TLDR: This script is an educational guide on hypothesis testing in statistics. It explains the concept of a null hypothesis and how statistical tests like the Z-test, T-test, ANOVA, and Chi-Square test work to determine if observed differences in a sample are statistically significant or due to chance. The speaker, Greg Martin, uses the example of a population of purple and yellow people to illustrate the process of testing hypotheses and interpreting P-values. He emphasizes the importance of understanding when to use each test and provides a cheat sheet for further reference.

Takeaways

  • 🔍 Understand the concept of sampling: When a full population is too large to measure, a sample is taken to infer about the whole population.
  • 🎨 Recognize the colors of the population: The example uses purple and yellow to represent different groups within the population.
  • 📊 Learn about the null hypothesis: It assumes that there is no difference in the population; statistical tests assess whether the sample gives enough evidence to reject it.
  • ✅ Grasp the significance of the Z-test: It calculates the probability of observing a sample result at least as extreme as the one obtained, assuming the null hypothesis is true.
  • 📉 Understand P-values: They are the probability of obtaining a result as extreme or more extreme than the one observed, assuming the null hypothesis is true.
  • 🚫 Rejection of the null hypothesis: If the P-value is smaller than the predetermined alpha level, the null hypothesis is rejected, indicating a statistically significant result.
  • 📈 Learn about different statistical tests: The script discusses T-Test, ANOVA, Chi-Square, and Correlation tests, each suited for different types of data and research questions.
  • 📚 Greg Martin's educational resource: The speaker provides a cheat sheet and encourages learning more about statistics from his website.
  • 🔗 Accessing resources: Instructions are given on how to download a statistics cheat sheet from the website for further study.
  • 📋 Apply statistical tests to real scenarios: Examples are given for each test to illustrate how they can be used to analyze data and draw conclusions.

Q & A

  • What is the purpose of taking a sample from a population in research?

    -The purpose of taking a sample from a population is to estimate the characteristics of the entire population when it is impractical to measure every single individual.

  • What is the null hypothesis in the context of the purple and yellow people example?

    -The null hypothesis in the example is that there is no difference in the population, meaning there are equal numbers of purple and yellow people.

  • How does the Z-test help in determining the likelihood of observing a sample with 90% purple people if the null hypothesis is true?

    -The Z-test calculates the probability (P-value) of obtaining a result as extreme or more extreme than what was observed, assuming the null hypothesis is true. A small P-value indicates that it is unlikely to have observed such a sample by chance if the null hypothesis were true.

  • What is the significance of the P-value in statistical testing?

    -The P-value indicates the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true. A small P-value suggests that the observed difference is statistically significant and not likely due to chance.

  • What is the Alpha value in hypothesis testing, and how is it used?

    -The Alpha value is a predetermined threshold used to determine the significance of the results. If the P-value is smaller than the Alpha value, the null hypothesis is rejected, indicating the results are statistically significant.

  • Can you explain the concept of hypothesis testing using the T-Test example provided in the script?

    -In the T-Test example, the null hypothesis assumes no difference in the average weight between men and women in the population. The T-Test calculates the probability of observing a difference at least as large as the one in the sample if this null hypothesis were true. A small P-value would lead to the rejection of the null hypothesis, suggesting a real difference exists.

  • What is the main difference between the T-Test and ANOVA?

    -The main difference is that the T-Test compares the means of two groups, while ANOVA compares the means of three or more groups to determine if there are any statistically significant differences among them.

  • How does the Chi-Square test differ from the T-Test and ANOVA?

    -The Chi-Square test is used with categorical variables, examining the relationship between two categorical variables to determine if there is a significant association between them, unlike the T-Test and ANOVA, which are used with numeric variables.

  • What is the purpose of the correlation test mentioned in the script?

    -The correlation test is used to determine if there is a statistically significant relationship between two numeric variables, such as a correlation between age and weight.

  • How can one obtain the statistics cheat sheet mentioned by Greg Martin in the script?

    -The statistics cheat sheet can be obtained by visiting Greg Martin's website, learnmore365.com, signing up for a free account, navigating to the free resources section, finding the statistics cheat sheet, and downloading it.

  • What is the advice Greg Martin gives at the end of the script regarding learning statistics?

    -Greg Martin advises to get the basics right, understand the question being asked, and know how to use a particular test to answer that question, which will make understanding more complicated statistics easier.

Outlines

00:00

🔍 Understanding Hypothesis Testing

This paragraph introduces the concept of hypothesis testing in research, specifically focusing on determining the proportion of a certain characteristic within a population. The scenario involves a population of purple and yellow people, and the goal is to estimate the proportion of purple individuals without measuring every single person. A sample is taken, and it's observed that 90% are purple. The paragraph discusses the possibility of this result being due to chance and introduces the null hypothesis, which assumes no difference in the population (50-50 split). It explains the role of the P-value in a Z-test, which measures the probability of obtaining a result as extreme as the sample if the null hypothesis were true. A small P-value suggests that the observed difference is statistically significant, indicating it's unlikely to be due to chance alone. The concept of the Alpha value, which serves as a threshold to determine the significance of the P-value, is also introduced.

05:02

📊 Exploring Different Statistical Tests

The second paragraph delves into various statistical tests, emphasizing the importance of understanding when to use each test. The speaker, Greg Martin, provides an overview of hypothesis testing and P-value interpretation without focusing on formulas. He introduces a cheat sheet available on his website, learnmore365.com, which can be accessed for free. The paragraph then discusses the application of the T-Test, which is used to determine if a difference observed in a sample, such as the average weight of men and women, is statistically significant or merely a chance occurrence. The paragraph also touches on ANOVA, which extends the T-Test concept to more than two categories, and the Chi-Square test, which is used for analyzing the relationship between two categorical variables. The correlation test, which examines the relationship between two numeric variables, is also mentioned. The speaker concludes by encouraging viewers to understand the basics of these tests to better grasp more complex statistical concepts.

Keywords

💡Population

In the context of the video, 'population' refers to the entire group of individuals or items that are the subject of the research. The script discusses a hypothetical scenario where the population consists of people who are either purple or yellow. The concept is crucial as it sets the stage for understanding sampling and statistical inference. The video aims to explain how researchers use samples to make inferences about the larger population.

💡Sample

A 'sample' is a subset of the population selected for the purpose of study. The script mentions taking a sample to determine the color of individuals in the population, as measuring every single person is impractical. The concept of a sample is fundamental to statistical analysis because it allows researchers to make inferences about a larger population based on a smaller, more manageable group.

💡Proportion

The 'proportion' refers to the share or part of a whole, often expressed as a percentage. In the script, the researcher is interested in the proportion of purple people in the population. The video uses the example of observing that 90% of the sample are purple to illustrate how proportions can be used to make inferences about the population.

💡Null Hypothesis

The 'null hypothesis' is a fundamental concept in statistics. It represents the assumption that there is no relationship between the variables being studied, or no difference between groups. The script introduces the null hypothesis as the starting point for statistical testing, suggesting that there is no difference in the number of purple and yellow people in the population.
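To make the null hypothesis concrete, the sketch below simulates drawing many random samples from a population that really is split 50/50 and counts how often a sample comes out at least 90% purple. The sample size of 20, the number of trials, and the random seed are all assumptions chosen for illustration; the video does not state them.

```python
import random

def share_of_extreme_samples(n=20, min_purple=18, trials=20_000, seed=0):
    """Fraction of simulated samples with at least `min_purple` purple people,
    drawn from a population where purple and yellow are equally common.

    n, min_purple, trials and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    extreme = 0
    for _ in range(trials):
        # Each sampled person is purple with probability 0.5 (the null hypothesis).
        purple = sum(rng.random() < 0.5 for _ in range(n))
        if purple >= min_purple:
            extreme += 1
    return extreme / trials

fraction = share_of_extreme_samples()
print(f"Share of samples with >= 18 of 20 purple: {fraction:.5f}")
```

The printed share should be very small, which matches the intuition in the video that such a sample is possible under the null hypothesis but improbable.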

💡Statistical Test

A 'statistical test' is a method used to determine if a result from a sample is statistically significant. The video explains how a Z-test can be used to calculate the probability of obtaining a sample result as extreme as the one observed, assuming the null hypothesis is true. This helps to decide whether the observed differences are likely due to chance or represent a real effect.
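As a rough sketch of the Z-test described here, the code below runs a one-sample proportion test against a null value of 50% purple. The sample of 18 purple people out of 20 is an assumption (the video only gives the 90% figure), and the two-sided p-value uses the normal approximation, which is what makes this a Z-test.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_proportion_z_test(successes, n, p0=0.5):
    """Two-sided z-test for a single proportion against the null value p0."""
    p_hat = successes / n
    # Standard error of the sample proportion, computed under the null hypothesis.
    standard_error = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    # Probability of a result at least this far from p0 in either direction.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical sample: 18 purple people out of 20.
z, p_value = one_sample_proportion_z_test(successes=18, n=20)
print(f"z = {z:.2f}, p-value = {p_value:.5f}")
```

With a sample this small the normal approximation is only approximate, so treat the number as indicative rather than exact.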

💡P-Value

The 'P-value' is the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true. The script uses the P-value to determine the likelihood of observing a sample with 90% purple people if the null hypothesis (equal numbers of purple and yellow people) were true. A small P-value suggests that the observed difference is statistically significant.
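The phrase "at least as extreme" can be checked directly for the purple/yellow example by adding up exact binomial probabilities. This is an exact binomial calculation rather than the Z-test from the video, and the sample of 18 purple out of 20 is again an assumed illustration.

```python
from math import comb

def binomial_two_sided_p(k, n, p0=0.5):
    """Exact two-sided p-value: total probability of every outcome that is
    no more likely than the observed count k, assuming the null proportion p0."""
    pmf = [comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # The small tolerance guards against floating-point ties.
    return sum(p for p in pmf if p <= observed * (1 + 1e-9))

# Hypothetical sample: 18 purple out of 20, null hypothesis of 50% purple.
p_value = binomial_two_sided_p(18, 20)
print(f"Exact two-sided p-value: {p_value:.6f}")
```

Under a 50/50 null the outcomes counted as "as extreme or more extreme" are 18, 19 or 20 purple, plus the mirror-image results of 0, 1 or 2 purple.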

💡Alpha Value

The 'alpha value' is a predetermined threshold used to decide whether the results of a statistical test are significant. If the P-value is lower than the alpha value, the null hypothesis is rejected. The script mentions choosing an alpha value before conducting a statistical test to determine the cut-off point for what is considered statistically significant.
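The decision rule itself is a single comparison. In the sketch below the default alpha of 0.05 is a common convention and an assumption here; the video does not name a specific value, only that it is chosen before the test is run.

```python
def decide(p_value, alpha=0.05):
    """Compare a p-value with a pre-chosen alpha (0.05 is an assumed default)."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

print(decide(0.0004))  # small p-value
print(decide(0.30))    # large p-value
```

Note that a large P-value leads to "fail to reject" rather than "accept": it means the sample is compatible with the null hypothesis, not that the null hypothesis has been shown to be true.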

💡T-Test

A 'T-Test' is a type of statistical test that compares the means of two groups to see if there is a significant difference between them. The script uses the T-Test as an example to explain how to determine if the difference in average weight between men and women in a sample is statistically significant or could be due to chance.
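A minimal sketch of a two-sample T-Test using only the Python standard library follows. It uses Welch's variant, which does not assume equal variances; the video does not say which variant it has in mind, so that choice is an assumption, as are the weights, which are made-up numbers. The p-value comes from numerically integrating the t distribution's density with Simpson's rule.

```python
import math
from statistics import mean, variance

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_t_p_value(t, df, steps=2000):
    """Two-sided p-value, integrating the density over [0, |t|] with Simpson's rule."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 2 * (0.5 - area))

def welch_t_test(a, b):
    """Welch's t statistic, its approximate degrees of freedom, and the p-value."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df, two_sided_t_p_value(t, df)

# Made-up weights in kilograms, for illustration only.
men = [82, 90, 78, 85, 88, 92, 80, 86]
women = [65, 70, 62, 68, 72, 60, 66, 71]
t_stat, df, p_value = welch_t_test(men, women)
print(f"t = {t_stat:.2f}, df = {df:.1f}, p-value = {p_value:.6f}")
```

In practice a statistics package would do this in one call; the hand-rolled version is only meant to show where the P-value comes from.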

💡ANOVA

ANOVA (Analysis of Variance) is a statistical test used to compare the means of three or more groups. The script mentions ANOVA in the context of comparing the average weight of people from three different countries. It helps determine if the observed differences are statistically significant or could be due to chance.
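The sketch below computes the one-way ANOVA F statistic for three groups. Because the standard library has no F distribution, the p-value is estimated with a permutation test instead of the usual F-distribution lookup; that substitution, the made-up weights, the number of permutations, and the seed are all assumptions for illustration.

```python
import random
from statistics import mean

def f_statistic(groups):
    """One-way ANOVA F: between-group variance divided by within-group variance."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = 0.0
    for g in groups:
        group_mean = mean(g)
        ss_within += sum((x - group_mean) ** 2 for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def permutation_p_value(groups, n_perm=2000, seed=0):
    """Estimate the p-value by reshuffling group labels (not the F distribution)."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Made-up weights (kg) for three hypothetical country samples.
groups = [[80, 85, 90, 78, 88], [70, 72, 68, 75, 71], [60, 65, 62, 58, 64]]
f_value, p_value = permutation_p_value(groups)
print(f"F = {f_value:.2f}, permutation p-value = {p_value:.4f}")
```

The shuffling mirrors the null hypothesis directly: if country made no difference to weight, relabelling the observations at random should produce F values as large as the observed one fairly often.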

💡Chi-Square Test

The 'Chi-Square Test' is used to determine if there is a significant association between two categorical variables. The script explains how the Chi-Square Test can be used to test for an association between sex and height category, which are both categorical variables. It helps to determine if the observed association in the sample is statistically significant.
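Here is a small sketch of the Chi-Square test of association for a 2x3 table of sex by height category. The counts are invented for illustration. The p-value line relies on the fact that, for exactly 2 degrees of freedom, the chi-square upper-tail probability equals exp(-x/2); that shortcut does not hold for tables of other sizes.

```python
from math import exp

def chi_square_statistic(table):
    """Chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

# Invented counts. Rows: male, female. Columns: short, medium, tall.
table = [[10, 30, 40],
         [35, 30, 15]]
chi2, dof = chi_square_statistic(table)
if dof != 2:
    raise ValueError("the closed-form p-value below is only valid for 2 degrees of freedom")
p_value = exp(-chi2 / 2)
print(f"chi-square = {chi2:.2f}, dof = {dof}, p-value = {p_value:.6f}")
```

The expected counts are what the table would look like if the proportions of men and women were the same in every height category, which is the null hypothesis described above.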

💡Correlation Test

A 'Correlation Test' is used to measure the strength and direction of a linear relationship between two numeric variables. The script uses the example of weight and age to illustrate how a correlation test can determine if there is a statistically significant relationship between the two variables in the population.
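The following sketch computes Pearson's correlation coefficient for age and weight and then an approximate two-sided p-value. The p-value uses the Fisher z-transformation with a normal approximation, which is a swapped-in simplification (the usual test uses a t distribution), and the data are invented for illustration. It needs more than three observations and a correlation strictly between -1 and 1.

```python
from math import atanh, sqrt
from statistics import NormalDist, mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_test(x, y):
    """Return r and an approximate two-sided p-value for the null of no correlation."""
    r = pearson_r(x, y)
    # Fisher z-transformation: atanh(r) is roughly normal with SD 1/sqrt(n - 3).
    z = atanh(r) * sqrt(len(x) - 3)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return r, p_value

# Invented data: ages in years and weights in kilograms.
age = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
weight = [62, 65, 63, 70, 72, 71, 78, 76, 80, 83]
r, p_value = correlation_test(age, weight)
print(f"r = {r:.3f}, approximate p-value = {p_value:.2e}")
```

The coefficient r describes the strength and direction of the linear relationship in the sample; the p-value addresses the separate question of whether a correlation that strong could plausibly arise by chance when none exists in the population.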

Highlights

Exploring the proportion of purple people in a population without measuring every individual.

Using a sample to represent a larger population and the concept of representative sampling.

Observing 90% purple people in a sample and questioning if this could be due to chance.

Introducing the null hypothesis of equal numbers of purple and yellow people.

Performing a z-test to determine the probability of observing the sample results under the null hypothesis.

Defining the P-value as the probability of obtaining a result as extreme or more extreme than observed, assuming the null hypothesis is true.

The significance of a small P-value in rejecting the null hypothesis and concluding statistical significance.

Setting an Alpha value as the cutoff point to determine if the P-value is small enough to reject the null hypothesis.

Applying the principles of hypothesis testing and interpreting the P-value for various statistical tests.

Greg Martin's introduction and the availability of longer videos on statistics.

Accessing the statistics cheat sheet from Greg Martin's website, learnmore365.com.

Applying the T-Test to determine if the observed difference in average weight between men and women is statistically significant.

Using ANOVA when the categorical variable has more than two categories, such as comparing the weight of people in different countries.

The difference between the T-Test and ANOVA in terms of the number of categories being compared.

Introducing the Chi-Square test for analyzing the relationship between two categorical variables, such as sex and height category.

The process of conducting a Chi-Square test and interpreting its P-value to determine the existence of a relationship.

Applying the correlation test to two numeric variables, like weight and age, to determine if there's a statistically significant correlation.

The final message encouraging viewers to always do their best and the promise of future content.

Transcripts

00:00

Imagine that you have a research question about a certain population. This is your population. Now, the people in your population are either purple or yellow, and you want to know what proportion of the people are purple. But you don't have the time to measure the color of every single person in the population, so instead you take a sample of the population and determine the color of those people, hoping that what you observe will be representative of what's happening in the wider population. And you observe that 90% of the people in your sample are purple and 10% are yellow.

00:30

But could this be just due to chance? Might it be the case that there are an equal number of purple and yellow people in your population, and that you just happened to, by chance, select a sample with this extreme proportion of purple people? While of course this is possible, your intuition is that it's unlikely, or shall we say improbable.

00:49

Let's imagine that there is in fact no difference, that there are equal numbers of purple and yellow people, and we're going to call this our null hypothesis. Now you can perform a statistical test, and in this case it's a Z-test, and that's going to give you a P value. Now, the P value is the probability of obtaining a result as extreme or more extreme than what you've observed. This assumes the null hypothesis is true. In other words, in this example, if in the population 50% of the people were in fact purple, what are the chances that we would get a random sample in which, by chance, we found that 90% of the people were purple? It would be very unlikely. The probability would be low. In other words, the P value would be very small.

01:29

So if the P value is small, then you can reject the null hypothesis and conclude that the difference observed in your sample is statistically significant. But what do we mean by small? How small is small enough to reject the null hypothesis? Before we do our statistical test, we decide on a cutoff, and we call that our alpha value. So the alpha value is the cutoff point that you compare your P value to, to determine whether the P value is small enough to conclude that your results are unlikely to have occurred by chance.

02:00

Now let's take a look at how to apply the principles of hypothesis testing and interpreting the P value for the T-Test, the Chi-Square test, ANOVA and the correlation test. We're going to take a quick look at some of the more commonly used statistical tests. If you can understand which tests to use when, with respect to these tests, you'll find the more complicated statistics much easier to understand. So get the basics right. You'll notice I'm not going to talk about any formulas here. I'm going to talk about understanding the question that you're asking and how to use a particular test to get an answer to that question.

02:32

Okay, so let's dive right in. My name is Greg Martin. I've got longer videos on stats that you can watch; this is going to be a real quick one. But just so that you know, what you're seeing on the screen at the moment is a cheat sheet that I've created, and we're going to talk through the various bits and pieces there within. But you can get this PDF. It's very easy to get; I'll just quickly show you how in one second. My website is learnmore365.com. I'm signed in. Signing in is for free, of course; you can create an account for free. Click on free resources; that'll take you to the free resource library. You could scroll down; you can filter the resources using these little categories on the side, so I'm going to click on research and stats. Here's the statistics cheat sheet. Click on download and, boom shakalaka, there you go.

03:11

So let's look at how we can apply the T-Test. In this example, right, we've measured the average weight of men and the average weight of women in the sample that we've got, and we've seen that there's a difference. Now, that difference could be real, or it could just be by chance. It might just be a fluke. We may have taken a sample that happens to have a difference that's not really representative of the wider population. So how do we decide the extent to which we can have confidence in this sample that we've got?

03:38

Well, we make an assumption. Let's assume that there's no difference in the real population, right, that we can't really see; we're not going to measure everybody. Let's assume that it's the case that in actual fact there's no difference in the weight of men and women in the population. If that were true, how likely would it be that we would have gotten a sample that shows the difference that we've observed? And that is exactly what the T-Test will tell you. It'll tell you the probability of, by chance, getting a sample that shows this difference if it were the case that in actual fact, in reality, there was in fact no difference.

04:14

So if the P value is very small, we can say it's very unlikely, right, because there's a small probability that the null hypothesis is true, that there is no difference in the weight of men and women. So we can reject it. We can say we don't believe it; we don't have confidence in that idea. And if we can reject the null hypothesis, we then can accept that in actual fact, in the population from where the sample came, there is a difference in the weight of men and women. And that's how the T-Test works. Easy peasy lemon squeezy. Let's keep going. Boom shakalaka.

04:47

Now let's talk about the ANOVA, kind of in its simplest form. This is the analysis of variance. Essentially it's asking the same kind of question as the T-Test, but in this case you've got a categorical variable with more than two categories. Well, you could do it for two, but for two or more categories. So let's say we've looked at the weight of people in three countries, like America, Britain and Russia, and in our sample population we've observed that there is a difference. We conduct an ANOVA test. We get a P value that tells us that if it were the case that in actual fact the weight of people in all three countries was exactly the same, it would be very unlikely, the P value, very unlikely that we would have seen the difference that we did in our sample. So we can reject the null hypothesis and we can accept the fact that there is a difference. Does that make sense?

05:38

Now, the Chi-Square test is a little different, and the difference is because in the Chi-Square test you've got a different combination of variable types, right? You've got two categorical variables, in other words two variables in which the data can be put into buckets in both cases. Right, for the T-Test you had one categorical variable, in this case sex, male or female, and the other variable was numeric, right, so you had this distribution which had a mean, an average. For the Chi-Square test you don't have a numeric variable, but you've got a second categorical variable, right? So you could still have sex, male or female, but you could also have a second categorical variable, as in small, medium height and tall, right, short, medium and tall. So three categories once again.

06:23

And as you can imagine, for these categories you can think of proportions, right? So for short people, a certain proportion of them will be men and a certain proportion will be women, male and female, and for medium-height people a certain proportion will be men and a certain proportion will be women, right? So you've got these proportions that you get from these categorical variables.

06:45

And now, once you understand that these are the two variables we've got, after that everything about what we're going to do in terms of inferential statistics is exactly the same as the T-Test and ANOVA, right? We're going to say we've taken a sample and we've seen a difference; we've seen that there's some sort of association between sex and height category. And we're asking the question: is that association real, is it statistically significant, or is it the case that we happened to, by chance, take a sample that represented that sort of difference, but in actual fact in the population that difference doesn't really exist?

07:20

So we make the assumption; we have a null hypothesis. Let's make the assumption that in actual fact, in the wider population from where the sample came, there's no relationship between sex and height category, that the proportions are all exactly equal. We do the Chi-Square test and it gives us the P value. If the P value is very, very small, it means that we cannot have confidence; it's very unlikely that that null hypothesis is true. It's unlikely that, if it were the case that there's no association between sex and height category, we would have gotten the sample that we did with the differences that we saw. So we can reject that. We reject the null hypothesis and we accept the fact that in actual fact there is some sort of an association between sex and height category. Got it? Easy peasy lemon squeezy. Let's keep going. Boom shakalaka.

08:10

And for the correlation test the exact same principles apply. In this case, again, we've got a difference in the type of variable that we've got. We've got two numeric variables, right? So let's imagine we've got weight and age. Both of them are numeric variables; no categorical variables. And if you've got two numeric variables, you can imagine that there could be an observed association between the two: as age goes up, weight goes up, right? You can imagine that there's a correlation.

08:36

So we take a sample of data. In our observation we see a correlation and we ask the question: is that real, is it statistically significant, or might it be the case that there's no correlation and that by chance we happened to get a sample in which, erroneously, this correlation seemed to exist? So we do a correlation test. Once again we get a P value. The P value tells us that, if the P value is very small, we cannot accept the assumption that there's no correlation, and we must accept the fact that there is in fact some sort of correlation between the two numeric variables.

09:11

Right, so there'll be a link on the screen that you can click on so that you can get this cheat sheet and have a look at it in your own time. Thanks for watching. Don't ever change, don't do drugs, always do your best. Boom shakalaka. Speak to you soon. Take care. Bye.

Related Tags
Statistical Testing, Hypothesis Testing, P Value, T Test, ANOVA, Chi-Square Test, Correlation Test, Data Analysis, Greg Martin, Research Methods