Explanatory and Response Variables, Correlation (2.1)

Simple Learning Pro
18 Nov 2015 | 07:25

Summary

TLDR: This video script explores the concepts of explanatory and response variables, as well as correlation, in the context of statistical analysis. It explains how to use time plots and scatter plots to visualize the relationship between two variables, emphasizing the importance of identifying which variable is the explanatory variable and which is the response variable. The script then describes how correlation, denoted as 'R', measures both the direction and strength of a linear relationship between two quantitative variables, and provides a step-by-step guide to calculating it with a formula. It concludes with a cautionary note on the potential for visual deception when interpreting correlation from scatter plots alone.

Takeaways

  • 📊 Graphs such as histograms, stem plots, and box plots are used to describe a single variable, while time plots and scatter plots are used to show relationships between two variables.
  • 🌳 When two variables are related, one can be called the explanatory variable (e.g., the age of a tree) and the other the response variable (e.g., the tree's height); a time plot is one way to show such a relationship.
  • 📈 A scatter plot represents the values of two quantitative variables from the same population, with the explanatory variable typically on the x-axis and the response variable on the y-axis.
  • 🔄 The terms 'explanatory variable' and 'response variable' can be used interchangeably with 'independent variable' and 'dependent variable', respectively.
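  • ⚠️ Not every pair of variables has an explanatory and a response variable; when neither variable explains the other (e.g., points scored in a football game versus points scored in a basketball game), it does not matter which axis each one is plotted on.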
  • 🔍 Correlation, denoted as R, measures the direction and strength of a linear relationship between two quantitative variables, independent of whether they are explanatory or response variables.
  • ⬆️ A positive correlation indicates an upward slope in the data set, while a negative correlation indicates a downward slope.
  • 🔢 The value of R ranges from -1 to 1, with -1 indicating a perfect negative correlation, 1 indicating a perfect positive correlation, and 0 indicating no linear relationship.
  • 📝 The formula for calculating correlation involves the means, standard deviations, and the sum of the products of the differences from the means of the paired variables.
  • 📚 To calculate correlation, gather data, create a table, calculate means and standard deviations, and apply the formula to find the correlation coefficient.
  • 👀 Visual inspection of a scatter plot can be misleading; the actual value of R should be calculated and interpreted numerically rather than relying on visual assessment alone.

Q & A

  • What is the purpose of using histograms, stem plots, and box plots?

    -Histograms, stem plots, and box plots are used to describe one variable. They help in understanding the distribution and characteristics of a single dataset.

  • How can we compare two different populations with respect to the same variable?

    -We can use back-to-back stem plots and side-by-side box plots to compare two different populations with respect to the same variable, which helps in visualizing the differences and similarities between the two groups.

  • What is the difference between a response variable and an explanatory variable?

    -A response variable measures the outcome of a study, while an explanatory variable explains the outcome. For example, in a study about trees, the height of the tree is the response variable, and the age of the tree is the explanatory variable.

  • Why is a time plot used to show the relationship between two variables?

    -A time plot is used to show the relationship between two variables when there is a temporal relationship, where one variable changes in response to the other over time.

  • What is a scatter plot and how is it different from a time plot?

    -A scatter plot is a graphical representation of two quantitative variables from the same population, where each dot represents an individual's values for both variables. Unlike a time plot, a scatter plot does not necessarily have time on the x-axis.

  • Why is the explanatory variable usually plotted on the x-axis and the response variable on the y-axis?

    -The explanatory variable is plotted on the x-axis and the response variable on the y-axis because they correspond to the independent and dependent variables, respectively: the explanatory variable is the one thought to influence the response variable. For the same reason, the explanatory variable is denoted as x and the response variable as y.

  • What is the role of correlation in data analysis?

    -Correlation, denoted as R, measures the direction and strength of a linear relationship between two quantitative variables. It helps in understanding how two variables move in relation to each other.

  • How is the direction of a data set's slope related to the value of R in correlation?

    -If a data set has an upward slope, R is positive, indicating a positive correlation. If it has a downward slope, R is negative, indicating a negative correlation. When the data points follow a perfectly straight line, R is exactly +1 (upward slope) or -1 (downward slope), representing a perfect positive or perfect negative correlation.

  • What does the value of R equal to 0 indicate in terms of correlation?

    -When R is equal to 0, it indicates that there is no correlation, meaning there is no linear relationship between the two variables.

  • How can we calculate the correlation between two variables?

    -Correlation can be calculated using a formula that involves the means, the standard deviations, and the sum of the products of the deviations from the means for each pair of values. The formula looks complicated, but it is easier than it looks because it follows a systematic process of calculation.

  • Why is it misleading to interpret correlation based solely on the visual appearance of a scatter plot?

    -Interpreting correlation based on the visual appearance of a scatter plot can be misleading because different scales can make the relationship appear stronger or weaker. The actual numerical value of R is needed to accurately determine the strength and direction of the correlation.
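The claim that a change of scale does not change R can be checked numerically. The sketch below is only an illustration: the data are hypothetical (they are not the teacher's data from the video), `pearson_r` is a helper name introduced here, and it assumes the sample-standard-deviation form of the correlation formula. It re-expresses hours as minutes and scores as fractions of 100, which is a close cousin of redrawing a scatter plot with different axis scales, and compares the two values of R.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Sample correlation: sum of deviation products over (n - 1) * s_x * s_y."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    products = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return products / ((n - 1) * stdev(xs) * stdev(ys))


# Hypothetical data, chosen only for illustration.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 55, 54, 55]

# The same observations expressed in different units.
minutes = [h * 60 for h in hours]
score_fractions = [s / 100 for s in scores]

r_original = pearson_r(hours, scores)
r_rescaled = pearson_r(minutes, score_fractions)

print(round(r_original, 3), round(r_rescaled, 3))
```

The two printed values should agree, even though a scatter plot of minutes against score fractions would have very different axis labels from a plot of hours against scores.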

Outlines

00:00

📊 Exploring Variables and Correlation

This paragraph introduces the concepts of explanatory and response variables, as well as the idea of correlation. It explains how histograms, stem plots, and box plots can be used to describe a single variable, and how time plots and scatter plots can be utilized to examine the relationship between two variables. The explanatory variable is described as the one that influences the outcome, while the response variable is the outcome itself. The paragraph also clarifies that correlation, denoted by R, can be determined without strictly identifying explanatory and response variables. It discusses how correlation measures the direction and strength of a linear relationship between two quantitative variables, with positive and negative values indicating the slope of the relationship and values close to 1 or -1 indicating a strong linear relationship. The formula for calculating correlation is introduced, and an example of calculating it using a teacher's data on study hours and test scores is provided.
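The remark that correlation can be determined without deciding which variable is explanatory can be checked by swapping the roles of x and y. This is a minimal sketch under stated assumptions: the data are hypothetical, `pearson_r` is a helper name introduced here for illustration, and the sample-standard-deviation form of the formula is assumed.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Sample correlation: sum of deviation products over (n - 1) * s_x * s_y."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    products = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return products / ((n - 1) * stdev(xs) * stdev(ys))


# Hypothetical study hours and test scores.
hours = [2, 4, 6, 8, 10]
scores = [60, 58, 70, 75, 72]

r_hours_as_x = pearson_r(hours, scores)   # hours treated as x
r_scores_as_x = pearson_r(scores, hours)  # roles swapped

print(round(r_hours_as_x, 3), round(r_scores_as_x, 3))
```

Both calls should give the same value, because the formula treats the two variables symmetrically.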

05:00

📈 Calculating and Interpreting Correlation

The second paragraph delves into the process of calculating the correlation coefficient, R, using a formula that involves the means and standard deviations of the variables in question. It provides a step-by-step guide on how to create a table for calculations, starting with finding the means of the X and Y values, then subtracting these means from each respective value to create deviations. These deviations are multiplied together and summed up to form part of the correlation formula. The paragraph emphasizes the importance of calculating standard deviations and correctly applying them in the formula to find the value of R. An example calculation is presented, resulting in an R value of 0.602, which indicates a moderate positive correlation. The paragraph also cautions against relying solely on visual interpretation of scatter plots to determine correlation, as different scales can affect perception, and stresses the importance of using calculated values for accurate interpretation.
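The table-based procedure described above can be sketched in code. This is an illustration under stated assumptions, not the video's own worked example: the data are hypothetical (the teacher's full data set is not given on this page), the function name `correlation_table` is introduced here, and the standard deviations are assumed to be sample standard deviations (dividing by n - 1).

```python
from math import sqrt


def correlation_table(xs, ys):
    """Build the hand-calculation table and return (rows, sum_of_products, r).

    Each row is (x, y, x - x_bar, y - y_bar, product of the two deviations).
    """
    n = len(xs)
    x_bar = sum(xs) / n  # step 1: means
    y_bar = sum(ys) / n
    rows = []
    for x, y in zip(xs, ys):
        dx = x - x_bar   # step 2: subtract the mean from each value
        dy = y - y_bar
        rows.append((x, y, dx, dy, dx * dy))  # step 3: multiply deviations
    sum_of_products = sum(row[4] for row in rows)  # step 4: add the products
    # step 5: sample standard deviations (assumed n - 1 denominator)
    s_x = sqrt(sum(row[2] ** 2 for row in rows) / (n - 1))
    s_y = sqrt(sum(row[3] ** 2 for row in rows) / (n - 1))
    # step 6: plug everything into the formula
    r = sum_of_products / ((n - 1) * s_x * s_y)
    return rows, sum_of_products, r


# Hypothetical data: hours studied and test scores for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 55, 54, 55]

rows, sum_of_products, r = correlation_table(hours, scores)
for row in rows:
    print(row)
print("sum of products:", sum_of_products)
print("r:", round(r, 4))
```

For this hypothetical data the means are 3 and 54, the deviation products are 4, 0, 0, 0, and 2, and their sum is 6, so the intermediate columns are easy to check by hand against the printed table.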

Keywords

💡Explanatory Variable

An explanatory variable is a factor that is believed to influence another variable in a study. In the context of the video, the age of a tree is an explanatory variable because it is thought to explain the height of the tree. The video emphasizes that as the explanatory variable (age) increases, the response variable (height) also increases, illustrating a direct relationship.

💡Response Variable

A response variable is the outcome of a study that is measured to see if it is affected by the explanatory variable. In the video, the height of the tree is the response variable, which is measured to determine how it responds to changes in the explanatory variable (age). The video script uses this concept to explain how one variable can be the result of another.

💡Correlation

Correlation is a measure that expresses the extent to which two variables are linearly related. The video script explains that correlation, denoted as 'R', indicates the direction and strength of the relationship between two quantitative variables. It is a key concept in the video, as it is used to discuss the relationship between studying hours and test scores, and how it can be visually represented through scatter plots.

💡Scatter Plot

A scatter plot is a type of plot that shows the values of two variables for a set of data. Each dot on the scatter plot represents the values of the two variables for an individual data point. In the video, a scatter plot is used to illustrate the relationship between the number of hours studied and test scores, showing how each variable relates to the other without the constraint of time being on the x-axis.

💡Time Plot

A time plot is a specific type of graph where one of the variables is time, and it is used to show the relationship between two variables when one is believed to influence the other over time. The video script mentions time plots as a way to represent the relationship between the age of a tree (explanatory variable) and its height (response variable) over time.

💡Histogram

A histogram is a graphical representation of the distribution of a set of data, which the video script mentions as a way to describe one variable. It is used to show the frequency of data points within certain ranges or 'bins' and is a tool for understanding the distribution of a single variable.

💡Stem Plot

A stem plot is a graphical method of displaying data in which each value is split into a 'stem' (the leading digit or digits, shared by a group of values) and a 'leaf' (the final digit). The video script refers to stem plots, alongside histograms and box plots, as a way to describe one variable.

💡Box Plot

A box plot is a standardized way of displaying the distribution of data based on a five-number summary: minimum, first quartile (Q1), median, third quartile (Q3), and maximum. The video script mentions box plots as a way to describe one variable, and side-by-side box plots as a tool for comparing two different populations with respect to the same variable.

💡Direction

In the context of correlation, direction refers to whether the relationship between two variables is positive (an upward slope) or negative (a downward slope). The video script explains that a positive correlation (R) indicates an upward slope, while a negative correlation indicates a downward slope, reflecting the nature of the linear relationship.

💡Strength

Strength in correlation refers to how closely the data points follow a straight line, indicating the degree of linear relationship between two variables. The video script clarifies that the strength of the relationship increases as the correlation coefficient 'R' approaches +1 or -1, with R = 0 indicating no linear relationship.
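Direction and strength can be illustrated with three small data sets. This is a sketch with hypothetical numbers and a helper name (`pearson_r`) introduced here, assuming the sample-standard-deviation form of the formula. The third data set goes slightly beyond the video: it follows a symmetric curve, to show that R = 0 means no linear relationship rather than no relationship of any kind.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Sample correlation: sum of deviation products over (n - 1) * s_x * s_y."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    products = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return products / ((n - 1) * stdev(xs) * stdev(ys))


xs = [1, 2, 3, 4]

# Points on a rising straight line: perfect positive correlation.
r_up = pearson_r(xs, [3, 5, 7, 9])

# Points on a falling straight line: perfect negative correlation.
r_down = pearson_r(xs, [9, 7, 5, 3])

# Points on a symmetric curve (y = x squared): no linear relationship.
r_curve = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(round(r_up, 6), round(r_down, 6), round(r_curve, 6))
```

Up to floating-point rounding, the three values should come out as +1, -1, and 0.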

💡Formula

The formula mentioned in the video script is used to calculate the correlation coefficient 'R'. It involves several steps: calculating the means, subtracting the mean from each value, multiplying the paired differences, adding up the products, and using the standard deviations to arrive at the final 'R' value. The formula is central to the video's explanation of how to quantify the relationship between two variables.
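The on-screen formula is not reproduced on this page. A common textbook form that uses the same ingredients named in the video (means, deviations, the sum of products, n, and the two standard deviations) is sketched below. It assumes that s_x and s_y are sample standard deviations, computed with n - 1 in the denominator.

```latex
r = \frac{1}{n-1} \sum_{i=1}^{n}
      \left( \frac{x_i - \bar{x}}{s_x} \right)
      \left( \frac{y_i - \bar{y}}{s_y} \right)
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x\, s_y}
```

The numerator of the second form is the "sum of products" built up row by row in the calculation table, and the denominator is where n and the two standard deviations are plugged in.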

Highlights

Explanatory and response variables are introduced as key concepts for understanding the relationship between two variables in statistical analysis.

Histograms, stem plots, and box plots are discussed as methods for describing a single variable.

Back-to-back stem plots and side-by-side box plots are used for comparing two populations with respect to the same variable.

Time plots are used to show the relationship between two variables when one is the outcome of the other.

The age of a tree is given as an example of an explanatory variable influencing the height, which is the response variable.

Scatter plots are introduced as a method to show the relationship between two quantitative variables without the need for time on the x-axis.

Each dot in a scatter plot represents an individual, illustrating the values of two variables for that individual.

The explanatory variable is typically plotted on the x-axis, and the response variable on the y-axis.

The concepts of independent and dependent variables are related to explanatory and response variables.

Examples are given where there is no need for explanatory or response variables, such as comparing unrelated events.

Correlation, denoted as R, is explained as a measure of the direction and strength of a linear relationship between two variables.

The direction of the slope in a data set determines whether the correlation is positive or negative.

Perfect positive and negative correlations are described with R values of +1 and -1, respectively.

The strength of the linear relationship is measured by how close R is to +1 or -1.

A formula for calculating the correlation coefficient is provided, emphasizing its importance in statistical analysis.

A practical example of calculating correlation between study hours and test scores is given, demonstrating the formula's application.

The importance of not relying solely on visual interpretation of scatter plots to determine correlation is stressed.

The concept that numbers do not lie is highlighted as crucial when interpreting correlation to avoid deception by graphs.

Transcripts

00:06

In this video, we will be looking at explanatory variables, response variables, and correlation. In the first video set, we talked about how we can use histograms, stem plots, and box plots to describe one variable, and we also used back-to-back stem plots and side-by-side box plots to help us compare two different populations with respect to the same variable. These were good for describing one variable, but what about two variables? Well, we saw that we can use a time plot to show this.

00:34

If there is a relationship between these two variables, one can be called the response variable and the other can be called the explanatory variable. The response variable measures the outcome of a study, and the explanatory variable explains the outcome of a study. In this example, the age of the tree is the explanatory variable because as the tree gets older, the taller it will be, so we can say that the age of the tree explains its height. This also means that the height of the tree is the response variable.

01:05

Another way to show the relationship between two variables is by using a scatter plot. Unlike a time plot, a scatter plot does not need to show time on the x-axis. A scatter plot shows the values of two quantitative variables that were measured from the same population of individuals. This is a typical scatter plot. You can think of each dot as being an individual. So if we looked at this individual, we can see that this person studied for 14 hours and got a test score of 62, and this individual studied for almost 6 hours and got a test score of 30.

01:39

You might have noticed that the explanatory variable is always plotted on the x-axis and the response variable is always plotted on the y-axis. For this reason, the explanatory variable is denoted as x and the response variable is denoted as y. You can also think of the explanatory variable as being the independent variable and the response variable being the dependent variable.

02:04

Note that it is possible to not have an explanatory variable and a response variable. For example, the number of points scored during a football game versus the number of points scored during a basketball game. These are two unrelated events, and one variable does not precisely explain the other. If there are no explanatory or response variables, then it doesn't matter where you plot each variable on the graph. This is commonly seen when trying to compare two unrelated variables or events.

02:33

Before we talk about correlation, I'd like to point out that when determining correlation, explanatory and response variables are not necessary. Now, correlation is denoted as R, and it tells you about the direction and strength of a linear relationship shared between two quantitative variables. Correlation can be expressed using scatter plots, so let's talk about how direction and strength are measured by correlation.

03:00

We'll talk about direction first. Correlation tells us about the direction, or slope, of a set of data, so it tells us if a data set has an upward slope or a downward slope. If we have an upward slope, we can say that R is positive. If we have a downward slope, then R is negative. If we have an upward slope and the data points follow a perfect straight line, then R is equal to +1, and this is called a perfect positive correlation. In contrast, if we have a downward slope and the data points follow a perfect straight line, then R is equal to -1, and this is called a perfect negative correlation.

03:36

In both of these cases, we have a perfect linear relationship. Correlation measures the strength of this linear relationship. We can actually notice a pattern about how correlation measures strength. We saw that R has values between positive 1 and negative 1, and we saw that the strength of the linear relationship increased as R got close to positive 1 or negative 1. So when R is equal to zero, this means that there is no correlation. In other words, there is no linear relationship whatsoever. We can see that as R gets closer and closer to positive 1, the linear relationship gets stronger, and as R gets closer and closer to -1, the linear relationship also gets stronger.

04:21

So how do you calculate correlation? Correlation can be calculated using this formula. It seems like a complicated formula, but it's easier than it looks. So suppose a teacher wants to determine the correlation between the number of hours spent studying and test scores. To do this, he would first have to gather some data. I will assign the number of hours spent studying as x, and I will assign the test scores to be y. Remember that correlation doesn't care about explanatory or response variables, so it didn't matter how I assigned these variables because I would end up with the same value of R.

04:55

When determining correlation, it's a good idea to make a table to help you with your calculations. This table corresponds to the formula, specifically this part of the formula. So the first step is to calculate the means for the x values and the y values, which you should already know how to do. Then we will subtract x-bar from each x value, so we will have 13 - 12.5, which is equal to 0.5. For the second row, we will have 15 - 12.5, which is equal to 2.5. We will do this for each x value. For the y values, we would have 53 minus 68, which is equal to -15. We are basically doing the same process for each y value.

05:42

For the next step, we will have to multiply each of these terms together, so we will have 0.5 * -15, and for the second row we will have 2.5 * 1, and so on. The next step is to add up all these products together, and that gives us 821. We can now plug this value into the formula, and since we added six products, n will be equal to six.

06:08

Next, we need to calculate the standard deviations for each variable, and you should already know how to do this. At this point, we have all of the ingredients we need for the formula, so we can plug in s_x and s_y into the formula, and when we simplify this, we get our value of R, which is equal to 0.602.

06:30

So if we plotted our data, it would look like this. We determined that R is equal to positive 0.6, and this makes sense because each data value seems to follow an upward direction.

06:41

Be careful when trying to interpret correlation by just looking at the scatter plot. When comparing these graphs, we might think that the scatter plot on the right has a higher R value because the data points are closer together and it looks like it visually displays a stronger linear relationship. Unless R is equal to plus or minus one, it's really hard to determine the value of R just by using our eyes. Both of these graphs are actually the same, and they have the same value of R. They were just made using different scales, and this is why graphs can deceive us. Using the notion that numbers do not lie is applicable when trying to interpret correlation.

Related Tags
Data Analysis, Explanatory Variable, Response Variable, Correlation, Scatter Plot, Histograms, Stem Plots, Box Plots, Time Series, Statistical Methods, Educational Content