ETC1000 Topic 1b

Brett Inder
16 Feb 2022 22:11

Summary

TLDR: This video continues the exploration of categorical data, focusing on the concepts of probability, marginal and conditional probabilities, and independence. Using examples related to medical conditions and exercise habits, the speaker explains how to calculate and interpret these probabilities. Additionally, the video covers the importance of understanding these concepts for program evaluation, demonstrated through a job search program. The speaker emphasizes the need for statistical tests to validate findings and introduces advanced topics for further study. Practical tips on working with pivot tables and calculating probabilities are also provided.

Takeaways

  • 📊 The session covers categorical data, focusing on different medical conditions and amounts of exercise among 5,000 people, presented in a frequency distribution table.
  • 🔢 The frequency distribution table is used to calculate probabilities, turning raw counts into probabilities by dividing each count by the total population (5,000): interior cells become joint probabilities and row or column totals become marginal probabilities.
  • 🧮 Marginal probabilities focus on one characteristic of interest, such as the amount of exercise or type of illness.
  • 🔀 Joint or intersection probabilities look at the probability of two characteristics occurring together, such as having diabetes and engaging in minimal exercise.
  • 🔍 Conditional probabilities are calculated by dividing each cell by its column or row total, which gives the likelihood of one characteristic given another.
  • 💡 Conditional probabilities are essential for understanding relationships and potential causation between variables, such as the impact of exercise on diabetes.
  • 📐 Independence is a crucial concept where two events are independent if the probability of one occurring is unaffected by the outcome of the other.
  • 📈 Independence can be tested by comparing conditional probabilities across different groups to see if they are equal.
  • 👩‍🏫 The example of a job search program demonstrates the practical application of these concepts, showing how to evaluate the effectiveness of interventions.
  • 🔍 Program evaluation involves comparing the success rates of those who participated in a program versus those who didn't, highlighting the importance of conditional probabilities and independence.
  • 📉 In real-world applications, statistical tests are necessary to determine if differences in probabilities are significant or due to chance, which will be covered in future videos.

Q & A

  • What is a frequency distribution table and why is it used in the script?

    -A frequency distribution table is a statistical tool used to organize and display data in a tabular form, showing the frequency or count of occurrences for different categories. In the script, it is used to represent the medical conditions and exercise habits of 5,000 people, allowing for a clear visualization of the data.

  • What is the difference between marginal and joint probabilities?

    -Marginal probabilities refer to the probability of a single event or characteristic occurring, regardless of other variables. Joint probabilities, on the other hand, refer to the probability of two or more events or characteristics occurring simultaneously. In the script, marginal probabilities are found in the margins of the table, while joint probabilities are found in the intersection of rows and columns.

  • How are probabilities calculated from the frequency distribution table?

    -Probabilities are calculated by dividing the frequency or count of each category by the total number of observations. In the script, the total number of observations is 5,000, and each cell in the table is divided by this number to convert counts into probabilities.

  • What is conditional probability and how is it related to the data presented in the script?

    -Conditional probability is the probability of an event occurring, given that another event has already occurred. In the script, it is calculated by taking the joint probability of two characteristics and dividing it by the marginal probability of one of the characteristics, which provides insight into the relationship between the two.

  • Why is the concept of independence important in analyzing the data in the script?

    -The concept of independence is crucial as it helps determine whether the occurrence of one event has any impact on the occurrence of another. If two variables are independent, the probability of one does not affect the probability of the other. In the script, the analysis of exercise and diabetes shows that they are not independent, indicating a relationship between exercise levels and the likelihood of having diabetes.

  • How does the script illustrate the application of conditional probabilities in real-world scenarios?

    -The script uses the example of a job search program to illustrate the application of conditional probabilities. It shows how the probability of finding a job is different for those who participated in the program versus those who did not, demonstrating the effectiveness of the program in improving employment chances.

  • What is the significance of the pivot table in the script's discussion of probabilities?

    -The pivot table is significant as it allows for the easy manipulation and visualization of data. In the script, it is used to convert raw data into probabilities and to calculate conditional probabilities by showing values as percentages of rows or columns.

  • What statistical concept is briefly mentioned at the end of the script and why is it important?

    -Statistical testing is briefly mentioned at the end of the script. It is important because it helps determine whether observed differences in probabilities are statistically significant and not due to chance, providing a more robust analysis of the data.

  • What is the purpose of the advanced section mentioned in the script for those studying at a higher level?

    -The advanced section is intended to provide a deeper understanding of probability distributions and to introduce common probability distributions. It offers a more in-depth exploration of the topic for those who wish to gain a more comprehensive knowledge of the subject.

  • How does the script use the concept of program evaluation to discuss the effectiveness of a job search program?

    -The script uses program evaluation to compare the employment outcomes of participants and non-participants of a job search program. By comparing the conditional probabilities of finding a job for both groups, it evaluates the effectiveness of the program in improving employment rates.

Outlines

00:00

📚 Introduction to Categorical Data and Frequency Distribution

The speaker introduces the continuation of the topic on categorical data and emphasizes the importance of watching the first half. The discussion centers around a two-way frequency distribution table representing medical conditions and exercise amounts among 5,000 people. Probabilities are introduced as a foundational concept for analyzing complex data.

05:00

📊 Understanding Marginal and Joint Probabilities

The speaker explains marginal probabilities, which focus on one characteristic of interest, and joint or intersection probabilities, which consider two characteristics simultaneously. Examples from the data table, such as the probability of having diabetes and minimal exercise, are used to illustrate these concepts.

10:01

🔄 Conditional Probabilities and Pivot Tables

The concept of conditional probabilities is introduced, which considers the probability of one event given another. The speaker demonstrates how to calculate conditional probabilities using pivot tables, converting percentages to probabilities, and explains the significance of these probabilities in data analysis.

15:03

📉 Independence in Probability

The speaker delves into the concept of independence, where two events are independent if the occurrence of one does not affect the probability of the other. Using the example of diabetes and exercise, the speaker explains how to determine independence and its implications for understanding relationships between variables.

20:03

📝 Program Evaluation and Practical Applications

The final part covers an example of evaluating a job search program to illustrate the importance of understanding probabilities and independence. The speaker highlights the relevance of these concepts in real-world scenarios, such as public health and program evaluation, and hints at more advanced statistical tests to come.

Keywords

💡Categorical Data

Categorical data refers to variables that can be classified into different categories or groups. In the video, the speaker uses categorical data to analyze medical conditions and exercise levels, which are presented in a table format. This data type is essential for understanding the frequency distribution and is foundational in the analysis of the relationships between different characteristics.

💡Frequency Distribution

A frequency distribution is a table that displays the frequency of various categories within a dataset. In the context of the video, the speaker uses a two-way frequency distribution to show the relationship between medical conditions and exercise levels among 5,000 people, illustrating how often each combination occurs.

💡Probability

Probability is a measure of the likelihood that a particular event will occur, expressed as a number between 0 and 1. The video emphasizes the importance of probability in capturing uncertainty and risk in the world. The speaker demonstrates how to convert the frequencies from the table into probabilities to analyze the likelihood of different outcomes, such as having a certain medical condition given a specific level of exercise.
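
As a rough sketch of the conversion described here, each count is simply divided by the number of people surveyed. The counts used below (1,784 moderate-to-frequent exercisers, 146 people with diabetes and minimal exercise, 5,000 people in total) are the ones quoted in the transcript; the function name `to_probability` is only an illustrative choice.

```python
TOTAL_PEOPLE = 5000  # sample size quoted in the video


def to_probability(count, total=TOTAL_PEOPLE):
    """Return the proportion of the sample that falls in a category."""
    if total <= 0:
        raise ValueError("total must be positive")
    return count / total


# Marginal example: everyone who does moderate-to-frequent exercise.
p_moderate = to_probability(1784)

# Joint example: diabetes AND minimal exercise.
p_diabetes_and_minimal = to_probability(146)

print(f"{p_moderate:.4f}")
print(f"{p_diabetes_and_minimal:.4f}")
```

The two printed values correspond to the 0.3568 and 0.0292 read off the probability table in the video.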

💡Marginal Probability

Marginal probability is the probability of a single event occurring, regardless of any other characteristic. In the video, the speaker calculates marginal probabilities by looking at the totals in the margins of the table, such as the total number of people with a certain medical condition or the total number of people engaging in a certain level of exercise.
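
One way to see where a marginal probability comes from is to sum the cell counts across the characteristic being ignored and then divide by the total. The sketch below uses only the diabetes row. The count of 146 is quoted in the transcript, while 56 is inferred from the quoted probability 0.0112 of 5,000 people; the dictionary layout is an assumption made for illustration.

```python
TOTAL_PEOPLE = 5000  # sample size quoted in the video

# Cell counts for the diabetes row, keyed by exercise level.
diabetes_row = {"minimal": 146, "moderate_to_frequent": 56}

# Summing over exercise level ignores that characteristic entirely.
diabetes_total = sum(diabetes_row.values())
p_diabetes = diabetes_total / TOTAL_PEOPLE

print(diabetes_total)
print(f"{p_diabetes:.4f}")
```

The result matches the 0.0404 that the speaker reads from the last column of the diabetes row.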

💡Joint Probability

Joint probability is the probability of two or more events occurring together. The speaker in the video explains joint probabilities by referring to the intersection probabilities within the table, such as the probability of a person having both diabetes and engaging in minimal exercise.

💡Conditional Probability

Conditional probability is the probability of an event occurring, given that another event has already occurred. The video illustrates this by calculating the likelihood of having diabetes given that a person exercises minimally, using the formula for conditional probability and demonstrating it through a pivot table.
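
The formula mentioned here is P(A given B) = P(A and B) / P(B). A minimal sketch, assuming the joint and marginal probabilities quoted in the transcript (0.0292 and 0.6432), might look like this; the function name is hypothetical.

```python
def conditional_probability(p_joint, p_condition):
    """Return P(A | B) computed as P(A and B) / P(B)."""
    if p_condition <= 0:
        raise ValueError("cannot condition on an event with zero probability")
    return p_joint / p_condition


p_diabetes_and_minimal = 0.0292  # joint probability from the table
p_minimal = 0.6432               # marginal probability of minimal exercise

p_diabetes_given_minimal = conditional_probability(
    p_diabetes_and_minimal, p_minimal
)
print(f"{p_diabetes_given_minimal:.4f}")
```

Rounded to four decimal places this reproduces the 0.0454 shown in the percentage-of-column pivot table.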

💡Independence

Independence in probability theory refers to two events that do not affect each other's probability of occurrence. The video discusses the concept of independence by examining whether the probability of having diabetes is the same across different levels of exercise, concluding that exercise level and diabetes are not independent due to differing probabilities.
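
The comparison described above can be sketched as a simple check that every conditional probability equals the marginal one. The three probabilities are those quoted in the transcript; the helper name and the tolerance of 0.001 are arbitrary assumptions, and, as the video itself notes, a formal statistical test is needed before drawing firm conclusions from sample data.

```python
import math


def looks_independent(conditionals, marginal, tol=1e-3):
    """Return True if every conditional probability matches the marginal within tol."""
    return all(
        math.isclose(p, marginal, rel_tol=0.0, abs_tol=tol) for p in conditionals
    )


p_diabetes_given_minimal = 0.0454
p_diabetes_given_moderate = 0.0314
p_diabetes = 0.0404  # marginal probability, ignoring exercise

independent = looks_independent(
    [p_diabetes_given_minimal, p_diabetes_given_moderate], p_diabetes
)
print(independent)
```

With these numbers the check fails, which is consistent with the speaker's conclusion that exercise and diabetes are not independent in this sample.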

💡Program Evaluation

Program evaluation is the process of assessing the effectiveness of a program or intervention. In the video, the speaker uses the concept of program evaluation to determine whether a job search program increases the likelihood of finding employment, by comparing the conditional probabilities of employment for participants and non-participants.

💡Statistical Test

A statistical test is a method used to determine if a result or relationship is statistically significant, meaning it is unlikely to have occurred by chance. The video mentions the need for statistical tests to validate the differences in probabilities observed in the sample data, ensuring that the conclusions drawn are robust and not due to random variation.
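
The video does not say which test it will use later. Purely as an illustration, one common choice for a two-way table of counts is Pearson's chi-square test of independence, and the sketch below computes only its test statistic. It assumes the table is collapsed to diabetes versus everything else, with the non-diabetes counts (3,070 and 1,728) inferred from the column totals implied by the transcript (3,216 and 1,784). It also assumes every expected count is positive.

```python
def chi_square_statistic(table):
    """Return Pearson's chi-square statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Rows: diabetes, no diabetes. Columns: minimal, moderate-to-frequent exercise.
observed_counts = [
    [146, 56],
    [3070, 1728],
]

chi2 = chi_square_statistic(observed_counts)
print(round(chi2, 2))
```

A larger statistic means the observed counts sit further from what independence would predict. Turning it into a decision requires comparing it with a chi-square distribution (here with one degree of freedom), which is beyond what this part of the video covers.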

💡Pivot Table

A pivot table is an interactive table in spreadsheet software that allows for the summarization and manipulation of data. In the video, the speaker uses pivot tables to calculate and display probabilities, conditional probabilities, and to illustrate the concepts of marginal and joint probabilities in an accessible way.
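
For readers without a spreadsheet to hand, the "show values as percentage of column" step can be imitated in a few lines. This is a sketch under stated assumptions: the one-record-per-person list is rebuilt from four cell counts (diabetes versus "other", inferred from figures in the transcript) rather than taken from the actual data file, and all variable names are hypothetical.

```python
from collections import Counter

# Rebuild one (condition, exercise) record per person from the cell counts.
cell_counts = {
    ("diabetes", "minimal"): 146,
    ("diabetes", "moderate_to_frequent"): 56,
    ("other", "minimal"): 3070,
    ("other", "moderate_to_frequent"): 1728,
}
records = [pair for pair, n in cell_counts.items() for _ in range(n)]

# Tally the raw records, as a pivot table would.
cell_tally = Counter(records)
column_tally = Counter(exercise for _, exercise in records)

# "Show values as % of column": divide each cell by its column total.
share_of_column = {
    (condition, exercise): count / column_tally[exercise]
    for (condition, exercise), count in cell_tally.items()
}

print(f"{share_of_column[('diabetes', 'minimal')]:.4f}")
print(f"{share_of_column[('diabetes', 'moderate_to_frequent')]:.4f}")
```

Each column of `share_of_column` sums to one, which is why the speaker describes each column as a probability distribution in its own right.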

💡Uncertainty

Uncertainty refers to the state of being unsure or the inability to predict an outcome with total certainty. The video discusses how probabilities are used to quantify uncertainty in various real-world scenarios, such as the likelihood of different health outcomes or the effectiveness of a job search program.

Highlights

Introduction to the second half of Topic One, emphasizing the importance of watching the first half for context.

Discussion on categorical data and its presentation in a two-way frequency distribution table.

Explanation of how to represent data about medical conditions and exercise levels for 5,000 people in a table format.

Introduction to the concept of probabilities as a foundational idea for analyzing risk and uncertainty.

Description of how to convert frequency distribution data into probabilities by dividing by the total number of observations.

Definition and explanation of marginal probabilities, focusing on the probability of one characteristic of interest.

Introduction to joint or intersection probabilities, which represent the probability of two events occurring together.

Explanation of conditional probabilities and how they are calculated from joint probabilities and marginal probabilities.

Demonstration of how to calculate conditional probabilities using pivot tables and percentage calculations.

Discussion on the importance of understanding the relationship between variables, such as exercise and diabetes, through conditional probabilities.

Introduction to the concept of independence and its significance in determining whether two variables are linked.

Example illustrating how to evaluate the effectiveness of a job search program using conditional probabilities.

Explanation of how to determine if a training program is independent of getting a job by comparing conditional probabilities.

Discussion on the potential issues with program evaluation, such as selection bias, and the need for more sophisticated statistical tests.

Encouragement for higher-level students to explore advanced sections on probability distributions for a deeper understanding.

Acknowledgment of the challenges of working from home and a light-hearted moment involving the speaker's wife.

Transcripts

play00:02

hello again

play00:03

we're on the second half of topic one

play00:07

make sure you watch the first half first

play00:09

because otherwise this second half may

play00:10

not make a lot of sense to you

play00:13

so if you take a

play00:15

look at the notes i will share my screen

play00:17

and we will work our way through the

play00:19

second half there

play00:21

we're going to think

play00:23

as you recall

play00:24

about categorical data and in the

play00:26

example we've got on the screen there is

play00:28

the one we've been looking at so far

play00:30

we've got different medical conditions

play00:32

that people have

play00:33

and also different amounts of exercise

play00:35

that they're embarking on

play00:36

and so we can present

play00:39

that information about these 5,000

play00:42

people in the form of a table like this

play00:44

which we call a frequency distribution

play00:46

this is a two-way frequency distribution

play00:48

because there are two characteristics

play00:50

that we're interested in

play00:52

and so each of the rows represents a

play00:54

medical condition and each of the

play00:55

columns represents the amount of

play00:57

exercise people get

play00:59

so of our 5,000 people for example 43 of

play01:03

those people

play01:05

engage in a moderate to a large amount

play01:07

of exercise and have a heart disease

play01:10

okay and uh at the other extreme uh

play01:13

we've got uh 323 people who do minimal

play01:16

exercise and who suffer from depression

play01:18

as their primary medical condition

play01:22

okay so that's what that data looks like

play01:24

now we're going to

play01:26

take

play01:27

this familiar sort of way of presenting

play01:29

data and think about it now

play01:32

as

play01:32

an idea of probabilities and the reason

play01:36

we do that is because if we want to make

play01:38

the world if we want to start analyzing

play01:39

more complex data we're going to need

play01:41

some sort of

play01:42

sort of more proper tools at our

play01:44

disposal rather than just using pivot

play01:46

tables and we're going to need some

play01:48

um

play01:49

more fancy methodologies and in order to

play01:52

do that we need a sort of theoretical

play01:53

foundation to the way in which we look

play01:55

at data and the likelihood of different

play01:58

things occurring and the theoretical

play02:00

foundation we have is a probability

play02:03

and you know probabilities are pretty

play02:04

important because the world is full of

play02:06

uncertainty as you know there's a lot we

play02:08

don't know

play02:10

and so the way in which

play02:12

we capture uncertainty at least one way

play02:15

in which we capture uncertainty and risk

play02:17

and all those different things is with

play02:18

probabilities what's the likelihood of

play02:20

this or that occurring

play02:23

and so that's really the foundational

play02:24

idea

play02:25

for

play02:26

actually the way we analyze risk and

play02:28

uncertainty in the world and people are

play02:30

always sitting there making judgments

play02:32

based on an estimate of a probability

play02:35

so we as data analytics people need to

play02:37

have a bit of an idea about what

play02:39

probabilities actually are

play02:41

and at their most basic most basic level

play02:44

they're really simple probability is

play02:45

simply the proportion of times something

play02:48

happens that's all it is so of these

play02:51

5,000 people we can divide all of those

play02:53

numbers in that table there by 5,000

play02:56

and turn them all into probabilities and

play02:58

that's what these numbers here are

play03:01

okay so for example uh let's take an uh

play03:05

let's take a person who

play03:07

is engaging in moderate or frequent

play03:08

amount of exercise and let's just look

play03:10

at the bottom row here so we'll ignore

play03:13

the medical condition just look at all

play03:15

of the people

play03:16

all

play03:17

1784 people

play03:19

who embarked in moderate amount of

play03:21

exercise

play03:22

that

play03:23

1784 out of 5,000 is actually

play03:27

35.68 percent of the people in other words

play03:31

in probability sense if you randomly

play03:33

pick one of these people the probability

play03:35

that that person

play03:37

will do a moderate or large amount of

play03:38

exercise is 0.3568

play03:41

okay so that's a statement of

play03:43

probability you just made

play03:45

or if you randomly picked a person

play03:47

what's the chances they've got diabetes

play03:50

0.0404 that number there

play03:54

okay

play03:55

the number in the last column for

play03:56

diabetes

play03:58

right okay so that's the idea of

play04:02

uh

play04:04

probabilities in the most basic sense

play04:08

now in these two examples you'll notice

play04:10

i've only looked at one characteristic

play04:11

of interest i've only looked at the

play04:13

amount of exercise or at

play04:15

the type of illness that they have

play04:17

that's what's called a marginal

play04:19

probability it's called that because

play04:21

it's in the margins of the table

play04:23

that's the simplest way to remember it

play04:24

anyway but importantly it's saying even

play04:27

though you might know information about

play04:29

two characteristics of the people we

play04:30

just want to know a probability about

play04:32

one of those two characteristics and so

play04:33

we call that a marginal probability

play04:37

often though we're interested in some of

play04:39

the numbers that are in the middle of

play04:40

the table and those numbers are our

play04:45

joint or intersection probabilities

play04:48

let's take an example of that okay so

play04:50

what's the probability of

play04:53

somebody having diabetes

play04:55

and engaging in minimal exercise

play04:58

answer

play05:00

0.0292 go to the diabetes row

play05:03

and

play05:05

the

play05:07

minimal exercise column and you get

play05:09

0.0292

play05:11

back to the table up the top here

play05:13

there's 146 of those people

play05:15

out of the 5,000 and that's how when you

play05:18

divide by 5000 that's how you get the

play05:20

0.0292

play05:23

so that is a

play05:24

joint or an intersection probability

play05:27

and you've come across that in high

play05:29

school and you've probably seen the

play05:30

symbol the upside down u symbol to

play05:32

indicate intersection

play05:34

both things have to be true

play05:36

for that probability

play05:38

for that particular event to occur they

play05:40

have to have both diabetes and minimal

play05:42

exercise

play05:44

okay

play05:45

and likewise we could choose diabetes

play05:47

and moderate to high frequency exercise

play05:50

and we get 0.0112

play05:52

okay just by looking at the second of

play05:55

the two numbers in the diabetes row

play05:58

all right then we can do any one of

play06:00

those

play06:01

marginal joint probabilities sorry

play06:03

intersectional joint

play06:04

probabilities now there's another type

play06:06

of view which we think is pretty

play06:09

interesting because this is where we

play06:10

start getting a clue about what causes

play06:12

what in the world and what connections

play06:14

are between things and that's the idea

play06:16

of a conditional probability

play06:18

let's go back to condition to

play06:20

calculating a percentage of column table

play06:23

remember how to do that i go to my pivot

play06:25

table

play06:25

and uh

play06:27

i've got one right here okay i just go

play06:29

to the cell and i click on the right

play06:31

mouse button and show values as

play06:33

percentage of column or i can do

play06:34

percentage of row or percentage of total

play06:37

if i want conditional probabilities i'm

play06:39

going to need conditional

play06:40

um columns or row totals so first of all

play06:43

let's do percentage of column

play06:45

that's what we've got here

play06:47

so these numbers here are just the same

play06:49

as what's in my table except that

play06:51

they're expressed not as percentages but

play06:54

as probabilities that's easy i just

play06:57

click the right mouse button having

play06:58

highlighted them all and format the

play06:59

cells and instead of making it a

play07:00

percentage i'm going to format it as a

play07:02

number

play07:04

then so i get

play07:06

those different

play07:07

probabilities there okay so you can

play07:10

convert pivot tables from percentages to

play07:12

numbers to probabilities quite easily

play07:16

what do these numbers mean

play07:18

well these are what's called conditional

play07:20

because you're saying we're only going

play07:21

to look at the people in the first

play07:23

column who've done minimal exercise if a

play07:26

person's done minimal exercise what's

play07:27

the probability they've got diabetes

play07:30

answer

play07:31

0.0454 so look at the first column and

play07:34

the diabetes row 0.0454

play07:36

okay

play07:37

so that's a conditional probability

play07:39

given so we use that phrase that word

play07:41

given that this is true or

play07:44

considering a person who is only

play07:47

doing minimal exercise what's the

play07:49

chances they'll have depression 0.1084

play07:52

etc

play07:54

so what about someone who's done

play07:55

moderate to frequent exercise that's the

play07:57

second column for example their

play07:59

probability of having diabetes is 0.0314

play08:03

lower than the probability for the

play08:05

person who's done minimal exercise

play08:07

okay so these are examples of

play08:09

conditioning on a particular column and

play08:11

calculate what's called conditional

play08:13

probabilities

play08:14

each column of this table is a

play08:16

probability distribution in its own

play08:18

right each person in the minimal

play08:20

exercise category is in one of these

play08:22

categories here

play08:24

has one of those conditions or no

play08:27

medical condition

play08:29

we write down the probabilities with

play08:31

this little vertical line here to say

play08:32

it's a conditional probability so the

play08:34

probability of having diabetes given

play08:36

that you had minimal exercise is 0.0454

play08:39

so when you see that vertical line you

play08:40

just replace it with the word given

play08:45

now

play08:47

a little bit of maths to show us how we

play08:50

got from one

play08:52

set of probabilities to the other

play08:54

we got the conditional

play08:56

probabilities by doing percentage of

play08:58

column so in other words by taking a

play09:00

particular column in this table and

play09:02

dividing each of these values by the

play09:04

total for their column

play09:06

so diabetes of 0.0292 divided by 0.6432

play09:12

will give me

play09:13

0.0454 there

play09:15

so that's actually what we show you in

play09:17

this

play09:18

expression here

play09:20

so the formula for calculating a

play09:22

conditional probability

play09:24

is to calculate the intersection

play09:26

probability and divide by

play09:28

the probability the marginal probability

play09:30

of the conditioning variable

play09:32

in this case minimal exercise

play09:35

that's the formula

play09:36

and

play09:37

can see its application but but

play09:39

importantly actually it's reasonably

play09:41

straightforward intuitively because it

play09:43

derives exactly from the

play09:45

structure of the pivot table that's the

play09:48

initial pivot table i just to get it to

play09:51

a percentage of column take each value

play09:53

and divide by the total for their column

play09:55

and that's essentially dividing by the

play09:58

marginal

play09:59

probability

play10:01

dividing the intersection probabilities

play10:02

by the marginal probabilities to get the

play10:04

conditional probabilities

play10:05

okay so that's the logic of it you might

play10:08

need to listen to that and think

play10:09

through that again

play10:10

but that's the basic idea

play10:12

everything's the same if i want

play10:14

percentage of row i go back to my pivot

play10:16

table and say oops for some reason

play10:18

instead of that i want you to show me

play10:21

show values as percentage of row

play10:25

there and so now all my rows add to one

play10:28

so now i'm conditioning on having a

play10:30

particular condition so given that you

play10:33

have depression

play10:34

what's the probability that you embark

play10:36

in minimal exercise answer 0.655

play10:40

for example okay

play10:41

so that's what percentage of row is it's

play10:44

the same idea it's still a conditional

play10:46

probability but it's flipped around what

play10:47

are you conditioning on

play10:49

so now

play10:50

you've got for example the probability

play10:52

of minimal exercise given diabetes

play10:54

rather than the probability of diabetes

play10:56

given minimal exercise

play10:57

percentage of row instead of

play10:59

percentage of column

play11:00

how do we calculate them

play11:02

easy just divide the values in the

play11:05

original table by the total for that

play11:08

particular row in other words divide the

play11:10

intersection probability by the marginal

play11:12

probability and you get

play11:14

the conditional probability

play11:17

that's an example given to you right

play11:18

here so you can go back and have a look

play11:20

at the tables and confirm that for

play11:22

yourself

play11:24

okay that's all fun

play11:25

well maybe not fun but just to review

play11:29

we're looking at probability because

play11:30

it's the way we describe uncertainty in

play11:32

the world and we've got three basic

play11:34

types of probability that the pivot

play11:36

table perfectly illustrates for us we've

play11:38

got

play11:40

marginal probabilities the probability

play11:42

of one particular characteristic of

play11:43

interest we've got

play11:45

joint probabilities probability of both

play11:47

things being true at once

play11:49

diabetes and minimal exercise for

play11:51

example

play11:53

and then we've got conditional

play11:54

probabilities where we say given that

play11:56

one thing is true what's the probability

play11:57

of the other thing being true

play12:00

okay so those are our three types of

play12:01

probability

play12:02

now we're going to use those to

play12:05

explore a very important phenomenon and

play12:08

that is this idea of independence so

play12:10

what's independence all about this is

play12:13

our main kind of final sort of key point

play12:15

if you like for topic one so concentrate

play12:18

hard even if you're

play12:20

finding a little bit hard to keep up

play12:21

with everything so far go back over it

play12:23

make sure you're on top of it and then

play12:24

make sure you really nail this stuff

play12:26

here

play12:27

let's go back to the condition the

play12:29

percentage of column probability table

play12:31

that we have here

play12:33

and let's just have a look at the

play12:34

chances of having diabetes that row

play12:36

there corresponding to diabetes

play12:39

and so we've got three different

play12:40

probabilities of getting diabetes

play12:42

depending on what you condition on

play12:44

if you condition on minimal exercise

play12:48

then

play12:49

you get a probability of getting

play12:51

diabetes of 0.0454 you'll see that

play12:54

in this

play12:56

line here okay given that you do minimal

play12:59

exercise you're in that first column the

play13:01

probability of diabetes is 0.0454

play13:05

the second column given that you do

play13:06

moderate exercise

play13:09

the probability of having diabetes is

play13:10

0.0314

play13:13

and the last column

play13:15

is the marginal probability of just

play13:17

having diabetes if you just don't pay

play13:20

any

play13:21

you look at all the people whether they

play13:22

do minimal or lots of exercise just

play13:24

what's the chance of having diabetes

play13:27

answer

play13:28

just over four percent

play13:31

so you'll see i've got three different

play13:33

probabilities of having diabetes

play13:36

depending upon how much exercise i do so

play13:39

there's obviously a link between

play13:41

exercise and having diabetes

play13:44

now the idea of independence

play13:47

is what is that independence will mean

play13:49

that there's no link between these two

play13:51

characteristics of interest

play13:53

an independent thing says it doesn't

play13:55

matter how much exercise you do

play13:57

you won't it makes you no more or less

play14:00

likely to have diabetes

play14:03

so

play14:05

if that was true that would be a very

play14:07

important piece of information because

play14:09

it would tell us sending people off to

play14:11

do lots of exercise isn't going to help

play14:13

it's not going to reduce diabetes in

play14:14

the population so we better find out if

play14:16

it's true or not

play14:18

so the idea of independence is that two

play14:21

events are independent if the

play14:22

probability of one occurring is

play14:24

unaffected by

play14:26

the outcome of the other so whether you

play14:28

exercise a lot or exercise not much

play14:31

doesn't make any difference to the

play14:32

probability of having diabetes that's

play14:35

what would need to be true in order for

play14:37

this to be independent

play14:40

so in other words we would need the

play14:42

probability of having diabetes to be the

play14:44

same

play14:45

whether you do minimal exercise or

play14:46

frequent exercise

play14:48

okay so the probability of diabetes

play14:50

given that you do minimal

play14:52

exercise would need to be the same as the

play14:54

probability given that you do moderate

play14:56

exercise and it'll be the same as the

play14:58

probability of just having diabetes

play14:59

overall

play15:01

that's what we want to see occur

play15:03

for it to be independent
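
Written out as a formula, the condition just described is the following. This is only a restatement of the spoken definition; the event labels are shorthand chosen here, not notation from the slides.

```latex
P(\text{diabetes} \mid \text{minimal exercise})
  = P(\text{diabetes} \mid \text{moderate exercise})
  = P(\text{diabetes})
```

If any one of these three probabilities differs from the others, the two characteristics are not independent.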

play15:05

well guess what

play15:07

they're not equal

play15:09

you're a reasonable

play15:11

amount more likely to have diabetes if

play15:13

you do minimal exercise four and a half

play15:15

percent versus three percent for those

play15:18

that do moderate exercise and four

play15:20

percent overall for the population

play15:23

so because those three probabilities are

play15:25

not equal

play15:26

then we conclude that the amount of

play15:29

exercise you get

play15:30

is not independent

play15:32

of

play15:33

having diabetes okay so there's a

play15:36

connection between these two

play15:38

and that's important obviously

play15:40

this is a simple little study but if we

play15:42

did this more comprehensively and so on

play15:43

that would be important to public health

play15:45

messaging
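
The comparison above can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture materials: the two conditional probabilities are the ones quoted from the table, while the function name `looks_independent` and its `tolerance` parameter are hypothetical names introduced here. The overall (marginal) probability is left out because the transcript only gives it as "just over four percent".

```python
import math

# Conditional probabilities of diabetes quoted in the lecture,
# keyed by the exercise column being conditioned on.
p_diabetes_given = {"minimal": 0.0454, "moderate": 0.0314}


def looks_independent(conditional_probs, tolerance=1e-9):
    """Return True if all the conditional probabilities are numerically equal.

    `tolerance` is an illustrative assumption; it only guards against
    floating-point noise, not against sampling variation.
    """
    values = list(conditional_probs.values())
    return all(math.isclose(v, values[0], abs_tol=tolerance) for v in values)


# How many times more likely diabetes is with minimal exercise.
ratio = p_diabetes_given["minimal"] / p_diabetes_given["moderate"]

print(looks_independent(p_diabetes_given))
print(round(ratio, 2))
```

An exact-equality check like this is stricter than anything you would use on real sample data, where small differences can arise by chance; it is only meant to mirror the "are they equal or not" reasoning in the example.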

play15:47

now i'm just going to run quickly

play15:48

through one more example

play15:50

a different set of data just

play15:52

to illustrate and show you how important

play15:54

this is and i'm going to go fairly

play15:55

quickly through this because it's a bit

play15:56

of a repeat of what you've just done but

play15:59

as i say go back through it again if you

play16:01

don't follow it all the first

play16:02

time

play16:04

now we're trying to evaluate whether or

play16:06

not

play16:07

we can do something to help people find

play16:10

employment so we've got a job search

play16:11

program that we put people through

play16:14

and so we've got 100 people

play16:16

and some of those people participated in

play16:18

this job search program 24 of them and

play16:20

76 people didn't participate

play16:23

okay so that's what we've got

play16:25

and

play16:26

all of these people started the year

play16:28

unemployed

play16:29

by the end of the program six months

play16:31

later

play16:32

hopefully some of these people have

play16:34

found jobs

play16:35

now it turns out

play16:36

that of those hundred people by the end

play16:38

of the six months

play16:40

about half of them 49 of the 100 did

play16:42

find work

play16:43

and 51 were still unemployed

play16:45

so there's been

play16:46

some progress

play16:50

but

play16:52

a lot of the people here didn't do the

play16:54

program and so we need to figure out

play16:57

whether or not and in fact quite a few

play16:59

people who didn't take part in the

play17:00

program okay

play17:02

managed to get themselves a job

play17:04

26 out of the 76 people still managed to

play17:07

get a job even though they didn't take

play17:09

part in any training

play17:11

so do we really need the training

play17:14

in other words does the training help

play17:16

people to get work is the training

play17:19

independent of

play17:21

getting a job

play17:23

that's the question you're answering

play17:25

okay how do we do it well

play17:28

we can look at the conditional

play17:29

probabilities

play17:30

given

play17:33

so first of all

play17:35

what's the probability of getting a job

play17:37

sometime in the next six months

play17:39

well answer 0.49 okay 49 percent of the people

play17:42

so that's the marginal probability of

play17:44

finding a job

play17:45

okay 49 percent of people were able to get a job

play17:49

if you did the program

play17:52

this is impressive 23 out of the 24

play17:55

people that did the program got a job

play17:58

so this is a conditional

play18:00

probability given that you participated in the

play18:02

program your chance of getting a job is

play18:04

0.96

play18:06

if you didn't participate in the program

play18:10

there's 76 of them well some of them did

play18:13

get a job but only 26 of them so in

play18:15

other words about 0.34 is the probability

play18:18

the conditional probability of getting a

play18:20

job if you didn't do the program so

play18:23

given that you did not participate

play18:24

what's the chances of getting a job

play18:27

so

play18:28

is doing the training independent of

play18:30

getting a job well

play18:32

are the probabilities of employment

play18:34

given that you participated the same as

play18:36

the probability of employment given that

play18:38

you did not participate and the overall

play18:40

probability of employment

play18:41

absolutely not 0.96 0.34 and 0.49 are

play18:46

not equal

play18:47

okay clearly

play18:48

doing the training program has made

play18:50

you much more likely to get a job a 96 percent

play18:53

chance of getting a job versus 34 percent for

play18:56

those that didn't do the training the

play18:58

training program worked
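
The three probabilities in this example can be reproduced directly from the counts. The sketch below uses only the numbers quoted in the lecture (24 participants, 23 of whom found work; 76 non-participants, 26 of whom found work); the variable names are illustrative choices made here.

```python
# Counts quoted in the lecture: 100 people who all started unemployed.
participants = 24                # did the job search program
non_participants = 76            # did not do the program
employed_participants = 23       # participants who found work
employed_non_participants = 26   # non-participants who found work

total = participants + non_participants
employed_total = employed_participants + employed_non_participants

# Marginal probability of finding a job.
p_job = employed_total / total

# Conditional probabilities: divide by the size of the group conditioned on.
p_job_given_program = employed_participants / participants
p_job_given_no_program = employed_non_participants / non_participants

# The marginal is the group-size-weighted average of the two conditionals.
p_job_recombined = (
    (participants / total) * p_job_given_program
    + (non_participants / total) * p_job_given_no_program
)

print(round(p_job, 2))
print(round(p_job_given_program, 2))
print(round(p_job_given_no_program, 2))
```

Rounded to two decimals these come out as 0.49, 0.96 and 0.34, matching the figures used in the independence comparison.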

play19:00

okay so this is a great example of

play19:03

examining independence to try and give

play19:05

us an idea about how the world

play19:07

works

play19:08

and in fact this is an example of what

play19:11

we refer to as program evaluation this

play19:14

is just a general comment for you to

play19:16

help you put it in context you might

play19:17

feel we're doing something really nitty

play19:19

gritty and rather uninteresting here but

play19:21

this is the basic idea behind pretty

play19:24

much what everybody does whenever we

play19:26

evaluate anything

play19:27

does it work or not and hopefully we

play19:29

evaluate things because you know the

play19:30

government wants to fix problems whether

play19:32

it's make people healthier or get people

play19:34

jobs or

play19:35

you know help reduce crime or whatever

play19:38

it might be they're going to introduce

play19:40

programs to do it you'd like to know

play19:42

whether the programs work or not

play19:44

well the basic approach you take to

play19:47

evaluating a program is exactly what i

play19:49

just described in this example here

play19:53

namely

play19:54

you look at some of the people that did

play19:55

the program and you look at some people

play19:57

who didn't do the program and you look

play19:59

at the probability of success you know

play20:01

getting a job in this case for those

play20:03

that did the program versus probability

play20:05

of success for those that didn't well it

play20:07

better be better for the ones that did

play20:09

the program

play20:11

and if it is

play20:12

then the program at least partially

play20:15

works in this case really well a 96 percent

play20:18

success rate is pretty impressive

play20:21

okay

play20:21

so that's the idea of program evaluation

play20:23

it's precisely this though it gets more complicated and

play20:25

you can probably think for yourself some

play20:27

of the problems with this particular

play20:29

example here

play20:30

which i'll just hint at which is

play20:33

who chose

play20:35

who participates in the program or not

play20:38

maybe all of the highly motivated people

play20:40

took part in the program and all the

play20:42

lazy people didn't bother

play20:44

maybe that's the reason why there was

play20:45

such a high success rate here's a little

play20:47

clue for you that this is not as neat as

play20:49

it looks but we'll leave that as a

play20:50

question for you to ponder and think

play20:51

about and talk about

play20:54

okay now just briefly

play20:56

to finish up this video

play20:58

everything we've done so far has

play21:00

just been you know calculating

play21:01

probabilities but actually in the real

play21:04

world you've only got a sample of people

play21:05

and these are estimates and so little

play21:07

differences in probability here may not

play21:09

actually be real they might just be

play21:11

flukes so we need something a bit more

play21:13

fancy a statistical test to decide

play21:15

whether these differences in conditional

play21:17

probabilities are big enough to be

play21:20

sort of robust and to be kind of

play21:22

legitimate not just due to chance and

play21:25

that's what we'll do in later

play21:27

videos when you

play21:29

learn more about statistical tests and

play21:31

so on
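
To give a feel for what "not just due to chance" could mean, here is one possible approach, sketched under clearly stated assumptions. The lecture does not say which test the later videos use; the permutation test below is a stand-in chosen here for illustration, and the function name `permutation_p_value`, the number of shuffles and the seed are all hypothetical. It uses the job search counts from the earlier example.

```python
import random


def permutation_p_value(employed_a, n_a, employed_b, n_b,
                        n_shuffles=10_000, seed=0):
    """Estimate how often a gap in employment rates at least as large as the
    observed one appears when group labels are assigned purely at random."""
    rng = random.Random(seed)
    n_employed = employed_a + employed_b
    # 1 = found a job, 0 = still unemployed, pooled across both groups.
    outcomes = [1] * n_employed + [0] * (n_a + n_b - n_employed)
    observed_gap = employed_a / n_a - employed_b / n_b

    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(outcomes)
        shuffled_a = sum(outcomes[:n_a])  # jobs among the first n_a people
        gap = shuffled_a / n_a - (n_employed - shuffled_a) / n_b
        if abs(gap) >= abs(observed_gap) - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_shuffles


# 23 of 24 participants and 26 of 76 non-participants found work.
p_value = permutation_p_value(23, 24, 26, 76)
print(p_value)
```

A very small value suggests the gap between the two groups is hard to explain by random variation alone. It says nothing about who chose to take part in the program, so it does not by itself show that the program caused the difference.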

play21:33

okay my wife has just come to join this

play21:35

video you can say hello to her

play21:38

another time okay now lastly these are

play21:41

the hazards of working from home as you

play21:43

i'm sure are well aware of if you've

play21:44

been studying from home for a while

play21:46

lastly for those of you who are

play21:48

studying this at a higher level i would

play21:50

encourage you to have a look through the

play21:53

advanced section of the topic

play21:56

this is where we go into a little bit

play21:57

more about probability distributions and

play21:59

a couple of very common probability

play22:00

distributions just to give you a taste

play22:02

of future things that you'll look at in

play22:05

times to come okay thank you very much

play22:07

all the best
