How to Calculate a Correlation (and P-Value) in Microsoft Excel

Quantitative Specialists
15 Sept 2014 · 05:15

Summary

TLDR: This video tutorial demonstrates how to test the significance of a correlation coefficient in Microsoft Excel, whose correlation procedure does not report p-values. The workaround is to run the Regression tool and read the p-value from its output. The presenter walks through selecting the Input Y and Input X ranges, running the regression, and finding the p-value in the ANOVA table. The video concludes by interpreting the p-value to decide whether the correlation between hours studied and exam grades is statistically significant.

Takeaways

  • 😕 Microsoft Excel's Data Analysis Toolpak does not provide a p-value for correlation analysis, making it difficult to assess statistical significance.
  • 🔍 A workaround is to use regression analysis in Excel to obtain the p-value, which helps determine the significance of the correlation.
  • 📊 For the Input Y range, select the dependent variable (exam grades); for the Input X range, select the independent variable (hours studied). If the ranges include the variable names, also check the Labels box.
  • 📈 In a simple regression with two variables, the Multiple R value equals the absolute value of the Pearson correlation coefficient (r); the sign of r matches the sign of the slope coefficient.
  • 📉 The p-value appears in the ANOVA table under 'Significance F'; with a single independent variable it is identical to the p-value in that variable's row of the coefficients table.
  • 🔑 An alpha level of 0.05 is used as the threshold for statistical significance; if the p-value is less than 0.05, the correlation is considered significant.
  • 📝 The script demonstrates that a correlation coefficient of .86 is statistically significant, indicating a strong positive relationship between study hours and exam grades.
  • 📋 The results are reported with the Pearson's r value, degrees of freedom (df = N - 2), and the p-value (less than .001 or less than .05, depending on the context).
  • 📖 The video concludes by showing how to interpret and report the p-value in the context of a correlation analysis using Excel's Data Analysis Toolpak.
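
To double-check Excel's numbers outside Excel, the same test can be sketched in plain Python with only the standard library. This is an illustration under stated assumptions, not the video's method: it uses the standard t-test for a Pearson correlation, t = r·√(N - 2) / √(1 - r²) with df = N - 2, which for a single predictor gives the same p-value as the regression's Significance F. The function names (`pearson_r`, `t_two_sided_p`, `correlation_test`) are hypothetical, and the t-distribution tail uses the closed-form series for integer degrees of freedom (Abramowitz & Stegun, section 26.7) so that SciPy is not needed.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_two_sided_p(t, df):
    """Two-sided p-value P(|T| >= |t|) for Student's t with integer df >= 1."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df % 2 == 0:
        # Even df: A = sin(theta) * (1 + (1/2)c^2 + (1*3)/(2*4)c^4 + ...)
        term = total = 1.0
        for k in range(2, df, 2):
            term *= (k - 1) / k * c * c
            total += term
        a = s * total
    else:
        # Odd df: A = (2/pi) * (theta + sin(theta) * (c + (2/3)c^3 + ...))
        term = total = (c if df > 1 else 0.0)
        for k in range(3, df, 2):
            term *= (k - 1) / k * c * c
            total += term
        a = 2.0 / math.pi * (theta + s * total)
    # A = P(|T| < |t|); the two-sided p-value is its complement.
    return max(0.0, 1.0 - a)


def correlation_test(x, y):
    """Return (r, df, two-sided p) for H0: the population correlation is zero."""
    r = pearson_r(x, y)
    df = len(x) - 2
    if abs(r) >= 1.0:
        return r, df, 0.0  # perfect linear fit: t is infinite
    t = r * math.sqrt(df) / math.sqrt(1.0 - r * r)
    return r, df, t_two_sided_p(t, df)
```

Calling `correlation_test(hours, grades)` on two equal-length lists returns r, df = N - 2, and the two-sided p-value. As a rough cross-check with the video's r = .86 and N = 13, t is about 5.59 on 11 df, above the two-sided .001 critical value of about 4.44, which is consistent with the reported p < .001.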

Q & A

  • What is the issue with using Microsoft Excel's Data Analysis Toolpak to calculate correlation?

    -The issue is that it does not provide a p-value, which is necessary to assess whether the correlation is statistically significant.

  • How can we obtain a p-value for correlation in Excel?

    -We can obtain a p-value by using the regression feature in Excel, as it provides a p-value that can be used to assess the statistical significance of the correlation.

  • What are the two boxes in the regression input that need to be filled out?

    -The two boxes are 'Input Y range' for the dependent variable and 'Input X range' for the independent variable.

  • Why is it important to include the variable names when selecting the ranges for regression in Excel?

    -Including the variable names ensures that the correct data is being analyzed and helps in interpreting the results accurately.

  • What is the relationship between 'Multiple R' and 'Pearson r' when there are only two variables?

    -When there are only two variables, 'Multiple R' equals the absolute value of 'Pearson r'; the direction of the relationship comes from the sign of the slope coefficient.

  • Where can the p-value be found in the regression output in Excel?

    -The p-value is in the ANOVA table under 'Significance F'; with one independent variable, the same value also appears in the 'P-value' column of that variable's row in the coefficients table.

  • What is the decision rule for determining statistical significance when using an alpha of .05?

    -If the p-value is less than .05, the correlation is considered statistically significant.

  • What does a p-value of .0001 indicate about the correlation between hours studied and exam grades?

    -A p-value of .0001 indicates that there is a statistically significant positive relationship between hours studied and exam grades.

  • How is the degrees of freedom (df) calculated in this context?

    -The degrees of freedom (df) is calculated as N minus 2, where N is the number of observations.

  • Why is it acceptable to report 'p < .05' even if the actual p-value is less than .001?

    -Reporting 'p < .05' is acceptable because the alpha level used for the test is .05, and any p-value below this threshold indicates statistical significance.

  • What is the practical limit for reporting p-values in written results?

    -In written results, p-values are usually not reported below 'p < .001'; any smaller p-value is simply written as p < .001.
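
The write-up convention from this Q&A (df = N - 2, leading zeros dropped as in the video's r(11) = .86, and p floored at '< .001') can be captured in a small formatting helper. This is a hypothetical sketch: `format_correlation` and its `alpha=0.05` default are illustrative names, and the helper only formats numbers already read from Excel's output.

```python
def _drop_leading_zero(text):
    """Turn '0.86' into '.86' and '-0.30' into '-.30' (values of magnitude < 1)."""
    if text.startswith("0."):
        return text[1:]
    if text.startswith("-0."):
        return "-" + text[2:]
    return text


def format_correlation(r, n, p, alpha=0.05):
    """Return (report_string, verdict) for a Pearson correlation test.

    r: Pearson's r (Multiple R, with the sign of the slope)
    n: number of observations, so df = n - 2
    p: two-sided p-value, e.g. 'Significance F' from Excel's ANOVA table
    """
    df = n - 2
    r_text = _drop_leading_zero(f"{r:.2f}")
    if p < 0.001:
        p_text = "p < .001"  # conventional floor for reported p-values
    else:
        p_text = "p = " + _drop_leading_zero(f"{p:.3f}")
    # Decision rule: significant when p is below the chosen alpha level.
    verdict = "significant" if p < alpha else "not significant"
    return f"r({df}) = {r_text}, {p_text}", verdict


# r = .86 and N = 13 come from the video; 0.0005 stands in for
# "some p-value below .001" because the exact value is not stated.
print(format_correlation(0.86, 13, 0.0005))
# -> ('r(11) = .86, p < .001', 'significant')
```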

Outlines

00:00

📊 Obtaining the Significance of Correlation in Excel

This paragraph discusses how to determine whether a correlation is statistically significant in Microsoft Excel. It explains that while Excel's Data Analysis Toolpak can calculate the correlation coefficient, it does not provide a p-value to assess significance, so the speaker uses the Regression tool instead. The steps are to select the exam grade values as the Input Y range and the hours studied as the Input X range, including the variable names and checking the Labels box. The regression output reports the p-value (Significance F in the ANOVA table), which determines statistical significance: with an alpha level of 0.05, a p-value below this threshold indicates a significant correlation. In the example, there is a significant positive relationship between hours studied and exam grades, with a Pearson's r of 0.86 and a p-value far below 0.05.

05:02

🔚 Conclusion on Correlation Coefficient Significance

The final segment closes the tutorial on obtaining the p-value for a correlation coefficient using Microsoft Excel's Data Analysis Toolpak.

Keywords

💡Correlation

Correlation refers to the statistical relationship between two variables. In the video, the correlation is discussed in terms of the relationship between hours studied and exam grades, showing a positive relationship, meaning that as hours studied increase, so do exam grades. The Pearson correlation coefficient (r) is calculated to quantify this relationship.

💡Significance

Statistical significance indicates whether an observed relationship is unlikely to have arisen by chance alone if there were no real relationship between the variables. In the video, the speaker explains how to determine whether the correlation between hours studied and exam grades is statistically significant using the p-value from regression analysis. A p-value less than 0.05 (the chosen alpha level) indicates that the correlation is significant.

💡p-value

The p-value helps assess whether the observed correlation is statistically significant. In the video, the p-value is obtained through regression analysis in Excel, and it is used to determine whether the positive correlation between hours studied and exam grade is statistically significant. A very small p-value (e.g., 0.0001) indicates strong evidence against the null hypothesis, suggesting the correlation is significant.

💡Regression

Regression is a statistical method used to examine the relationship between dependent and independent variables. In the video, the speaker uses Excel’s regression tool to calculate the p-value for the correlation between hours studied and exam grades. Regression allows them to obtain both the correlation coefficient and the p-value in a more detailed analysis.

💡Pearson's r

Pearson’s r is a measure of the strength and direction of a linear relationship between two variables. In the video, the Pearson correlation coefficient for hours studied and exam grades is 0.86, which indicates a strong positive correlation. The same value appears as Multiple R in the regression output; because Multiple R is never negative, it shows only the magnitude of r.

💡ANOVA table

The ANOVA (analysis of variance) table in Excel's regression output summarizes how much of the variation in the dependent variable the model explains, including the F statistic and its p-value. In the video, the ANOVA table is used to locate the p-value (labeled 'Significance F'), which is needed to determine whether the correlation between hours studied and exam grade is statistically significant.
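
Why do 'Significance F' and the slope's P-value agree? With one predictor, the ANOVA F statistic equals the square of the slope's t statistic, and both are tested on the same N - 2 residual degrees of freedom, so the p-values coincide. The sketch below builds the one-predictor ANOVA quantities by hand and checks F = t²; the function name and the five-point data set are hypothetical illustrations, not from the video.

```python
import math


def simple_regression_anova(x, y):
    """One-predictor least-squares fit with its ANOVA F and slope t statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx

    ss_total = sum((b - my) ** 2 for b in y)
    ss_resid = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_reg = ss_total - ss_resid

    df_reg, df_resid = 1, n - 2            # ANOVA degrees of freedom
    ms_resid = ss_resid / df_resid         # residual mean square
    f_stat = (ss_reg / df_reg) / ms_resid  # the F column of the ANOVA table

    se_slope = math.sqrt(ms_resid / sxx)   # standard error of the slope
    t_slope = slope / se_slope             # 't Stat' in the coefficients table
    return {"slope": slope, "intercept": intercept, "F": f_stat,
            "t": t_slope, "df": (df_reg, df_resid)}


# Hypothetical data, for illustration only.
result = simple_regression_anova([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(result["F"], result["t"] ** 2)  # both approximately 4.5
```

Because F = t² exactly, the upper-tail p-value of F(1, N - 2) equals the two-sided p-value of t(N - 2), which is why Excel shows the same number in both places.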

💡Multiple R

Multiple R is the correlation between the observed and predicted values of the dependent variable; with a single predictor it equals the absolute value of Pearson’s r. In the video, the speaker points out that for a two-variable regression Multiple R matches the Pearson correlation coefficient, which in this example is 0.86, indicating a strong positive relationship between hours studied and exam grades.
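
A quick way to see that Multiple R is the absolute value of r in the two-variable case: Multiple R is √R², where R² = 1 - SSE/SST from the fitted line, so it can never be negative. The sketch below (hypothetical names and data) computes both for a positively and a negatively related pair; the sign of r has to be read from the slope coefficient.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def multiple_r(x, y):
    """Multiple R = sqrt(R^2) from a one-predictor least-squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - my) ** 2 for b in y)
    r_squared = 1.0 - sse / sst
    return math.sqrt(max(0.0, r_squared))  # guard against tiny negative rounding


x = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 5]   # hypothetical, positively related to x
falling = [5, 4, 5, 4, 2]  # hypothetical, negatively related to x
for y in (rising, falling):
    print(round(pearson_r(x, y), 4), round(multiple_r(x, y), 4))
# prints 0.7746 0.7746, then -0.7746 0.7746
```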

💡Alpha level (α)

The alpha level (α) is the threshold used to decide if a p-value indicates statistical significance. In the video, the speaker uses an alpha level of 0.05, meaning that if the p-value is below 0.05, the correlation is deemed statistically significant. In this example, the p-value is much smaller (0.0001), confirming significance.

💡Degrees of freedom (df)

Degrees of freedom refer to the number of independent values in a calculation that can vary. In the video, the degrees of freedom are calculated as N (the sample size) minus 2, where N is 13, giving 11 degrees of freedom. This value is included in the reporting of the correlation results.

💡Data Analysis Toolpak

The Data Analysis Toolpak is an Excel add-in that provides statistical analysis tools, including the Correlation and Regression procedures. In the video, the speaker uses its Regression tool to obtain both the Pearson correlation coefficient (as Multiple R) and the p-value for assessing the relationship between hours studied and exam grades.

Highlights

In Microsoft Excel, the Data Analysis Toolpak does not provide a p-value for correlation.

A workaround is to use regression analysis to obtain a p-value.

Select 'Input Y range' for the dependent variable, such as exam grades.

Select 'Input X range' for the independent variable, like hours studied.

If the variable names are included in the selected ranges, be sure to check the Labels box.

Regression analysis provides a p-value under the ANOVA table.

With two variables, the multiple R is identical to the Pearson r.

The p-value for the correlation can be found under 'Significance F' or 'p-value'.

A p-value less than .05 indicates a statistically significant correlation.

A correlation coefficient of .86 is significant, showing a positive relationship.

The results can be written as a significant positive relationship with r(11) = .86 and p < .001.

The degrees of freedom (df) are calculated as N - 2, where N is the sample size.

Reporting p < .001 is more informative than reporting p < .05.

In written results, p-values are usually not reported below 'p < .001'.

This concludes the method for obtaining the p-value of a correlation coefficient in Excel.

Transcripts

00:00

In our last video, we went ahead and calculated the correlation on these values, but we didn't get a significance value for the correlation to see whether it was statistically significant or not. And unfortunately in Microsoft Excel, when we run the correlation procedure through the Data Analysis Toolpak, it does not give us a p-value, so we can't assess whether a given correlation is statistically significant or not. But thankfully there is a workaround here: if we go to Regression and select that, we can obtain a p-value this way.

00:38

Now what we'll do is, notice there are two boxes here: Input Y Range and Input X Range. For Input Y Range, we'll go ahead and select the exam grade values, and I'm also going to select the label, or the variable name, in the first row here: exam grade. Next, click in the Input X Range box here and then select hours studied and all the values there. And if you do select the variable names as I have here, be sure to select Labels. Okay, that looks good; let's go ahead and click OK. And then here we get our output. I'm going to expand this a little bit, just to the values we need. There's a lot of information here, but I really only need a few values, so let's go ahead and highlight those here.

01:25

First of all, let's make this font a little bit bigger so it's easier to read. All right, and then we'll expand what we need here. Now, if you watched the last video on obtaining a correlation, look under where it says Multiple R in the regression output: if we have just two variables, like we do in this correlation example with hours studied and exam grade, then the Multiple R is identical to the Pearson r. So notice how this is also, once again, .86.

02:06

So we could really just run regression to get our correlation if we look under Multiple R here. Now, for the p-value, we can go to the ANOVA table, and under Significance F here, this is our p-value, and it is very small. Just in case you're interested, when we have just two variables the exact same p-value is also located in the hours studied row, where it says P-value.

02:34

So notice how these two, Significance F and P-value, are exactly the same; they're identical. Now what we're going to do here is use an alpha of .05, and the decision rule is as follows: if our p-value, as given by Significance F or the P-value right here, is less than .05, then the correlation is statistically significant. And since .0001 is definitely less than .05, the correlation is in fact significant.

03:12

Since this correlation coefficient of .86 is statistically significant, that means there is a significant positive relationship between hours studied and the grade on the exam. And we can go ahead and write these results as follows:

03:30

There is a significant positive relationship between the number of hours spent studying and the grade on the exam. And then I have r, which is for Pearson's r, and 11, which is the degrees of freedom, where df is equal to N minus 2; N is equal to 13, so 13 - 2 is 11. So r(11) equals .86, and that was the value of Pearson's r, if you remember. And I put p is less than .001. Now, alternatively, you could put p is less than .05 if you wanted to.

04:13

Now, this p-value gives us more information; it is more informative than a p-value of less than .05. But since we used an alpha of .05 in this test, it would be acceptable to put p is less than .05 if we wanted to, but this does provide more information. And the reason why I said p is less than .001 is that, if you look back at our output window in Excel, this value is less than .001, as it goes to four decimal places before the 1 appears, but it's not less than .0001.

04:52

In any event, in most cases when you see a p-value reported like this, it usually doesn't get smaller than 'less than .001'; that's about the limit where we report it in our written results.

05:05

This concludes the video on obtaining the p-value for the correlation coefficient using the Data Analysis Toolpak in Microsoft Excel.
