Marginal and conditional distributions | Analyzing categorical data | AP Statistics | Khan Academy

Khan Academy
24 Oct 2018 | 05:16

Summary

TLDR: This educational video script explores the relationship between study time and test performance in a classroom of 200 students. It introduces the concept of a two-way table to analyze data distribution across these variables. The instructor explains marginal distributions, which focus on one variable by summing rows or columns, and conditional distributions, which analyze the distribution of one variable given a condition of the other. The script illustrates these concepts with examples and calculations, emphasizing the importance of understanding data in both joint and individual contexts.

Takeaways

  • The script discusses analyzing the relationship between study time and test performance in a classroom of 200 students.
  • It introduces the concept of creating 'buckets' for study time and test scores to categorize student data.
  • The two-way table or joint distribution is used to visualize the relationship between the two variables.
  • A marginal distribution focuses on one variable by summing counts or percentages across categories of the other variable.
  • Example given: 40 out of 200 students scored between 80-100%, representing 20% of the class.
  • Marginal distributions can be represented as either counts or percentages of the total.
  • Another example of a marginal distribution is the time studied, with percentages indicating how many students fall into each time bracket.
  • The concept of a conditional distribution is introduced, which shows the distribution of one variable given a condition of the other.
  • Conditional distributions are typically represented in percentages to show the likelihood within a specific condition.
  • An example of a conditional distribution is the percentage of students scoring within certain ranges, given they studied between 41-60 minutes.
  • The script emphasizes the importance of understanding and interpreting data through these distributions for educational insights.

Q & A

  • What is the purpose of setting up buckets for time studied and percent correct in the classroom scenario described?

    -The purpose is to categorize students into specific ranges of time studied and percent correct scores, allowing for the analysis of the relationship between study time and test performance.

  • What is a two-way table and how does it relate to the joint distribution in the script?

    -A two-way table is a statistical tool that displays the frequency distribution of two categorical variables. In the script, it represents the joint distribution of time studied and percent correct, showing how these two variables are related.

  • How many students in the class got between a 60 and 79% on the test and studied between 21 and 40 minutes according to the script?

    -According to the script, 20 out of the 200 total students fall into this category.

  • What is a marginal distribution and how is it derived from the two-way table?

    -A marginal distribution focuses on one of the dimensions of the data, ignoring the other. It is derived by summing the counts or percentages across the rows or columns of the two-way table.

  • How many students got between 80 and 100% correct on the test, according to the marginal distribution of percent correct?

    -According to the marginal distribution, 40 out of the 200 students scored between 80 and 100%.

  • What is the percentage of students who scored between 60 and 79% on the test, when represented as a marginal distribution?

    -The percentage is 30%, calculated by dividing the 60 students who scored in this range by the total number of students, which is 200.

  • How many students studied between zero and 20 minutes, according to the marginal distribution of study time?

    -According to the marginal distribution, 14 students studied between zero and 20 minutes.

  • What is a conditional distribution and how does it differ from a marginal distribution?

    -A conditional distribution is the distribution of one variable given a condition related to the other variable. Unlike a marginal distribution, which considers the overall distribution of a single variable, a conditional distribution considers the distribution of one variable within specific conditions of another variable.

  • What is the standard practice for representing a conditional distribution?

    -The standard practice for representing a conditional distribution is to think in terms of percentages, rather than counts.

  • Can you provide an example of calculating a conditional distribution from the script?

    -An example from the script is calculating the conditional distribution of percent correct given that students studied between 41 and 60 minutes. This involves looking at the students in that study time range and calculating the percentage of students falling into each percent correct category.

  • What percentage of students who studied between 41 and 60 minutes scored between 80 and 100% on the test, according to the conditional distribution calculated in the script?

    -According to the script, approximately 18.6% of students who studied between 41 and 60 minutes scored between 80 and 100% on the test.

Outlines

00:00

Understanding Joint and Marginal Distributions

This section shows how to analyze the relationship between two variables, the amount of time studied and the percent correct on a test, in a classroom of 200 students. The instructor sorts students into buckets for time studied and percent correct and displays the counts in a two-way table, which can also be read as a joint distribution. For example, 20 of the 200 students scored between 60 and 79% and studied between 21 and 40 minutes. The video then turns to marginal and conditional distributions. A marginal distribution concentrates on one variable, such as percent correct, and can be given as counts or as percentages of the total: 40 of the 200 students scored between 80 and 100%, which is 20% of the class. A conditional distribution is the distribution of one variable given a condition on the other, such as percent correct given that students studied between 41 and 60 minutes.
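The count-to-percentage step for a marginal distribution can be sketched in a few lines of Python. This is only an illustration: the counts are the row totals read aloud in the video, while the dictionary layout and the helper name `to_percentages` are assumptions made for this example.

```python
# Marginal counts for percent correct, as read off the row totals in the video.
percent_correct_counts = {
    "80-100": 40,
    "60-79": 60,
    "40-59": 70,
    "20-39": 20,
    "0-19": 10,
}


def to_percentages(counts):
    """Convert category counts to percentages of their total (hypothetical helper)."""
    total = sum(counts.values())
    return {bucket: 100 * n / total for bucket, n in counts.items()}


marginal_pct = to_percentages(percent_correct_counts)
for bucket, pct in marginal_pct.items():
    print(f"{bucket}: {pct:.0f}%")
```

Dividing by the sum of the counts rather than a hard-coded 200 keeps the helper usable for either margin of the table.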

05:01

Calculating Conditional Distributions

This paragraph continues the discussion on distributions by focusing on conditional distributions. It explains that a conditional distribution is the distribution of one variable given a specific condition of another variable. Using the example of students who studied between 41 and 60 minutes, the paragraph demonstrates how to calculate the conditional distribution of percent correct. The process involves looking at the column corresponding to the condition and then calculating the percentage of students falling into each category of percent correct within that condition. The example shows that 16 out of 86 students in this time range scored between 80 and 100%, which is approximately 18.6%. The paragraph emphasizes that while marginal distributions can be represented as counts or percentages, conditional distributions are typically expressed in terms of percentages.
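As a rough sketch of that calculation, the Python below divides each stated cell in the 41-60 minute column by the column total of 86. Only two cells of that column are given in the video (16 and 30), so the other three buckets are left out rather than guessed; the variable names are assumptions for this example.

```python
# Counts in the "studied 41-60 minutes" column that the video states explicitly.
# The remaining three cells are not given in the transcript, so they are omitted.
column_total = 86
known_cells = {"80-100": 16, "60-79": 30}

# Conditional percentages: each cell divided by the column total, not by 200.
conditional_pct = {
    bucket: round(100 * n / column_total, 1) for bucket, n in known_cells.items()
}

# Students in this column who fall into the three buckets not read aloud.
remaining_count = column_total - sum(known_cells.values())

print(conditional_pct)
print(f"students in the other three buckets: {remaining_count}")
```

The key design point is the denominator: a conditional distribution is normalized by the size of the conditioning group (86), whereas a marginal distribution is normalized by the whole class (200).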

Keywords

Two-way table

A two-way table, also known as a contingency table, is a statistical tool used to display the frequency distribution of two categorical variables. In the video, the instructor uses a two-way table to illustrate the relationship between the amount of time students studied and their percent correct on a test. The table helps in visualizing the joint distribution of these two variables.

Joint distribution

Joint distribution refers to the way two or more random variables are distributed together. In the context of the video, the joint distribution is depicted through the two-way table, showing how the time studied and the percent correct scores are related across different groups of students.

Buckets

In the script, 'buckets' are used to categorize data into specific ranges or groups. For instance, time studied and percent correct are divided into buckets, allowing for easier analysis and visualization of the data in the two-way table.

Marginal distribution

A marginal distribution focuses on the distribution of a single variable while ignoring the others. The instructor explains how to derive the marginal distribution of percent correct by summing the counts across rows in the two-way table and then converting these counts into percentages of the total.
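In symbols, one way to write this is shown below. The notation is an assumption introduced here and is not used in the video: $n_{ij}$ is the number of students in percent-correct bucket $i$ and time-studied bucket $j$, and $n = 200$ is the class size.

```latex
n_{i\cdot} = \sum_{j} n_{ij},
\qquad
\hat{p}_{i\cdot} = \frac{n_{i\cdot}}{n},
\qquad
\text{e.g. } \frac{40}{200} = 20\%
```

The marginal distribution of time studied is obtained the same way by summing over $i$ instead of $j$.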

Counts

Counts in this context refer to the number of occurrences within each category or bucket. The script mentions counting the number of students who achieved certain scores and studied for specific amounts of time, which are then used to create the marginal and conditional distributions.

Percentages

Percentages are used to express parts of a whole as a proportion of 100. In the video, after obtaining the counts for each category, the instructor shows how to convert these into percentages to better understand the marginal distribution of the percent correct scores.

Conditional distribution

A conditional distribution is the probability distribution of one variable given a condition on another variable. The instructor discusses how to calculate the conditional distribution of percent correct, given that a student has studied a certain amount of time, by looking at the relevant column in the two-way table and calculating the percentages.
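Written as a formula, under notation assumed for this note rather than taken from the video ($n_{ij}$ is the count in percent-correct bucket $i$ and time-studied bucket $j$, and $n_{\cdot j}$ is the total of column $j$):

```latex
\hat{p}_{i \mid j} = \frac{n_{ij}}{n_{\cdot j}},
\qquad
n_{\cdot j} = \sum_{i} n_{ij},
\qquad
\text{e.g. } \frac{16}{86} \approx 18.6\%
```

Because each cell is divided by its own column total, the conditional percentages within a column sum to 100%.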

Survey

A survey is a method of data collection where a sample of subjects is asked questions to gather information. The script mentions surveying students as a means to collect data on the amount of time they studied and their test scores.

Test scores

Test scores are the numerical results of an assessment, which in this video, are categorized into buckets representing different ranges of percentages correct. These scores are a key aspect of the data being analyzed in the two-way table.

Classroom

The classroom in the script is the setting for the study, where 200 students are observed to understand the relationship between study time and test performance. It provides the context for the data being discussed.

Students

Students are the subjects of the study in the video. Their study habits and test performance are being analyzed to understand the correlation between time spent studying and the accuracy of their test answers.

Highlights

Exploring the relationship between study time and test performance in a classroom of 200 students.

Creating buckets for time studied and percent correct to categorize student data.

Utilizing a two-way table to represent the joint distribution of study time and test scores.

Identifying a specific example: 20 students studied 21-40 minutes and scored 60-79% on the test.

Introduction to marginal distributions focusing on a single dimension of data.

Calculating marginal distribution by summing counts across rows or columns.

Representing marginal distributions as both counts and percentages for clearer insights.

Understanding the distribution of percent correct with 40 students scoring 80-100%.

Calculating the marginal distribution for time studied, with 14 students studying 0-20 minutes.

Concept of conditional distributions to analyze the relationship between two variables given a condition.

Determining the conditional distribution of percent correct for students who studied 41-60 minutes.

Emphasizing the standard practice of representing conditional distributions in percentages.

Calculating the conditional percentage: 16 out of 86 students scored 80-100% when studying 41-60 minutes.

Methodology for deriving the full conditional distribution by calculating percentages for each category.

Highlighting the importance of understanding joint, marginal, and conditional distributions in data analysis.

Practical application of these distributions in educational research to improve study habits and test outcomes.

Transcripts

00:00

- [Instructor] Let's say that we are trying to understand a relationship in a classroom of 200 students between the amount of time studied and the percent correct. What we could do is we could set up some buckets of time studied and some buckets of percent correct and then we could survey the students and/or look at the data from the scores on the test. And then we can place students in these buckets.

00:22

So what you see right over here, this is a two-way table. And you can also view this as a joint distribution along these two dimensions. So one way to read this is that 20 out of the 200 total students got between a 60 and 79% on the test and studied between 21 and 40 minutes.

00:44

So there's all sorts of interesting things that we could try to glean from this, but what we're going to focus on this video is two more types of distributions other than the joint distribution that we see in this data.

00:55

One type is a marginal distribution. And a marginal distribution is just focusing on one of these dimensions. And one way to think about it is you can determine it by looking at the margin. So, for example, if you wanted to figure out the marginal distribution of the percent correct, what you could do is look at the total of these rows. So these counts right over here give you the marginal distribution of the percent correct. 40 out of the 200 got between 80 and a hundred. 60 out of the 200 got between 60 and 79, so on and so forth.

01:32

Now, a marginal distribution could be represented as counts or as percentages. So if you represent it as percentages, you would divide each of these counts by the total, which is 200. So 40 over 200, that would be 20%. 60 out of 200, that would be 30%. 70 out of 200, that would be 35%. 20 out of 200 is 10%. And 10 out of 200 is 5%. So this right over here in terms of percentages gives you the marginal distribution of the percent correct based on these buckets. So you can say 10% got between a 20 and a 39.

02:07

Now, you could also think about marginal distributions the other way. You could think about the marginal distribution for the time studied in the class. Then you would look at these counts right over here. You would say a total of 14 students studied between zero and 20 minutes. You're not thinking about the percent correct anymore. A total of 30 studied between 21 and 40 minutes. And likewise, you could write these as percentages. This would be 7%. This would be 15%. This would be 43%. And this would be 35% right over there.
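The time-studied percentages quoted here can be cross-checked with a short Python sketch. The instructor reads out 14 and 30 directly, and 86 appears later as the 41-60 minute column total; the count for the last bucket is never read aloud, so this sketch recovers it from the class total of 200. The label for that last bucket is a placeholder, since the transcript does not name it.

```python
TOTAL_STUDENTS = 200

# Column totals stated in the video; the last bucket's count is not read aloud,
# so it is recovered here from the class total. Its label is a placeholder.
stated_counts = {"0-20 min": 14, "21-40 min": 30, "41-60 min": 86}
last_bucket_count = TOTAL_STUDENTS - sum(stated_counts.values())

time_counts = {**stated_counts, "longest bucket (label assumed)": last_bucket_count}
time_pct = {bucket: 100 * n / TOTAL_STUDENTS for bucket, n in time_counts.items()}

for bucket, pct in time_pct.items():
    print(f"{bucket}: {time_counts[bucket]} students, {pct:.0f}%")
```

The recovered count of 70 matches the 35% quoted for the last bucket, so the four percentages account for the whole class.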

02:42

Now, another idea that you might sometimes see when people are trying to interpret a joint distribution like this or get more information or more realizations from it is to think about something known as a conditional distribution. Conditional distribution. And this is the distribution of one variable given something true about the other variable. So, for example, an example of a conditional distribution would be the distribution of percent correct given that students study between, let's say, 41 and 60 minutes. Between 41 and 60 minutes.

03:36

Well, to think about that, you would first look at your condition. Okay, let's look at the students who have studied between 41 and 60 minutes. That would be this column right over here. And then that column, the information in it, can give you your conditional distribution.

03:51

Now, an important thing to realize is a marginal distribution can be represented as counts for the various buckets or percentages, while the standard practice for conditional distribution is to think in terms of percentages.

04:02

So the conditional distribution of the percent correct given that students study between 41 and 60 minutes, it would look something like this. Let me get a little bit more space. So if we set up the various categories, 80 to 100, 60 to 79, 40 to 59, continue it over here, 20 to 39, and zero to 19, what we'd wanna do is calculate the percentage that fall into each of these buckets given that we're studying between 41 and 60 minutes.

04:34

So this first one, 80 to a hundred, it would be 16 out of the 86 students. So we would write 16 out of 86, which is equal to 16 divided by 86 is equal to, I'll just round to one decimal place. It's roughly 18.6%. 18.6. Approximately equal to 18.6%.

04:58

And then to get the full conditional distribution, we would keep doing that. We would figure out the percentage. 60 to 79, that would be 30 out of 86. 30 out of 86, whatever percentage that is, and so on and so forth in order to get that entire distribution.
