04 Correlation in SPSS – SPSS for Beginners
Summary
TLDR: This tutorial from the RStats Institute introduces the basics of correlation analysis in SPSS for beginners. It explains how to perform a Pearson's correlation to examine the relationship between two variables, such as height and weight, using a sample dataset. The video covers setting up data, interpreting the correlation matrix, and understanding significance levels. It also shows that adding a dichotomous third variable, gender, to the analysis produces point biserial correlations, and how to visualize a correlation as a scatter plot with the Chart Builder tool. The tutorial concludes by pointing to related techniques such as regression and by introducing t-tests as the next topic.
Takeaways
- 📚 The video is part of a series on SPSS for beginners by the RStats Institute at Missouri State University.
- 🔍 It focuses on teaching the basics of correlation analysis in SPSS, specifically Pearson's r, which measures the relationship between two variables.
- ⚖️ Pearson's r ranges from -1 to +1, with 0 indicating no relationship, and values closer to +1 or -1 indicating a stronger relationship.
- 📈 The example used in the video involves analyzing the relationship between height and weight, with each person's height paired with their weight.
- 🔎 The video demonstrates how to perform a correlation analysis in SPSS, including selecting the 'Bivariate' option and choosing the variables to correlate.
- 📊 The output of a correlation analysis in SPSS is a 'correlation matrix', which shows the correlation coefficients between variables.
- 🔑 The significance of the correlation is indicated by the p-value; if it's less than .05, the correlation is statistically significant.
- 👥 The video mentions that SPSS will ignore cases with missing values, which is important for understanding the sample size used in the analysis.
- 📊 The video also introduces the concept of a scatter plot as a visual representation of the correlation between two variables.
- 📝 The script suggests that viewers watch additional videos from the RStats Institute for more information on theory, analysis, and APA style reporting.
- 📈 Lastly, the video demonstrates a point biserial correlation and briefly mentions regression, as well as t-tests for analyzing differences between variables.
Q & A
What is the main focus of the fourth video in the SPSS for beginners series?
-The main focus of the fourth video is to teach viewers how to perform simple analyses, specifically how to calculate and interpret Pearson's correlation in SPSS.
What is Pearson's r and what does it measure?
-Pearson's r is a statistical measure that describes the relationship between two variables. It ranges between -1 and +1, where 0 indicates no relationship, and values closer to +1 or -1 indicate a stronger relationship.
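To make the definition concrete, here is a minimal Python sketch of Pearson's r computed from its formula: the sum of cross-products of deviations from the means, divided by the square root of the product of the two sums of squares. The function name `pearson_r` and the toy numbers are illustrative assumptions, not taken from the video or from SPSS.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson's r: co-variation of x and y scaled by their spreads."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of each pair's deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (zero for a constant variable, which has no r).
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data, chosen to hit the ends and the middle of the range.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0: perfect positive relationship
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0: perfect negative relationship
print(pearson_r([1, 2, 3, 4], [1, 3, 3, 1]))  # 0.0: no linear relationship
```

The result is always between -1 and +1; the sign gives the direction and the distance from 0 gives the strength.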
Why is it important that each person's height is paired with their own weight in the dataset?
-It is important because the analysis requires each pair of scores to stay together to make sense. Pairing one person's height with another person's weight would not be meaningful in the context of the analysis.
How does SPSS handle cases with missing values during a correlation analysis?
-SPSS will simply ignore cases with missing values. It does not include them in the analysis, ensuring that only complete pairs of scores are considered.
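A minimal sketch of that behavior, assuming missing scores are stored as Python `None`: the hypothetical helper below keeps a case only when both of its scores are present, so no pair is ever broken apart, and it reports how many complete pairs were actually used. The data are made up and only mirror the video's situation of some people lacking scores.

```python
import math


def complete_pairs(x, y):
    """Keep a case only when both of its scores are present (None = missing)."""
    return [(a, b) for a, b in zip(x, y) if a is not None and b is not None]


def pearson_r_complete(x, y):
    """Pearson's r over complete pairs only; returns (r, number of pairs used)."""
    pairs = complete_pairs(x, y)
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy), n


# Hypothetical data: the third and sixth people are missing at least one score.
height = [64, 70, None, 68, 72, None, 66]
weight = [130, 180, 150, 160, 175, None, 140]
r, n = pearson_r_complete(height, weight)
print(f"r = {r:.3f} from N = {n} complete pairs")  # N = 5
```

The reported N is the number of complete pairs, which is why the sample size in the output can be smaller than the number of rows in the data set.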
What is the significance of the correlation coefficient being flagged with asterisks in SPSS?
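-Asterisks indicate that the correlation is statistically significant, meaning it is significantly different from 0. The number of asterisks shows the significance level: in SPSS output, one asterisk marks significance at the .05 level and two asterisks mark significance at the .01 level.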
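The flagging rule can be sketched as a small Python function. It assumes the convention described above (two asterisks at the .01 level, one at the .05 level) and treats both thresholds as strict inequalities; the name `significance_stars` is hypothetical and this is not SPSS's own code.

```python
def significance_stars(p: float) -> str:
    """Asterisk flag for a two-tailed p-value: '**' below .01, '*' below .05."""
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# .009 is the weight-gender p-value reported in the video; the others are hypothetical.
for p in (0.009, 0.03, 0.20):
    print(p, repr(significance_stars(p)))
```

A p-value of .009 gets two asterisks, matching the flag the video shows for the weight and gender correlation.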
What is a 'correlation matrix' and what does it show?
-A correlation matrix is a table that shows the correlation coefficients for every combination of variables. It includes both the correlation between different variables and the perfect correlation of a variable with itself.
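SPSS builds this table itself; the Python sketch below only mirrors the idea by computing r for every ordered combination of variables, including each variable with itself. The scores are hypothetical, and gender is coded 1 = male, 2 = female as in the video. The diagonal comes out as 1 and the off-diagonal entries are mirror images of each other.

```python
import math


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(data: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """r for every (row, column) combination of variables, diagonal included."""
    names = list(data)
    return {(a, b): pearson_r(data[a], data[b]) for a in names for b in names}


# Hypothetical scores, not the video's dataset.
data = {
    "height": [70, 64, 72, 62, 68, 66],
    "weight": [180, 130, 190, 120, 165, 140],
    "gender": [1, 2, 1, 2, 1, 2],
}
m = correlation_matrix(data)
names = list(data)
print(" " * 8 + "".join(f"{name:>8}" for name in names))
for row in names:
    print(f"{row:<8}" + "".join(f"{m[(row, col)]:8.3f}" for col in names))
```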
What is the significance level mentioned in the script, and what does a level smaller than .05 indicate?
-The significance level (the p-value, labeled "Sig. (2-tailed)" in SPSS output) is used to decide whether a result is statistically significant. A value smaller than .05 indicates that the correlation is statistically significant: a correlation this far from 0 would be unlikely if there were truly no relationship between the variables.
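The standard test behind the p-value for Pearson's r converts r into a t statistic with n - 2 degrees of freedom, t = r * sqrt(n - 2) / sqrt(1 - r^2). The sketch below applies it to the transcript's height and weight result (r = .574, N = 10). To stay dependency-free it integrates the t density numerically with Simpson's rule rather than calling a statistics library; the function names are hypothetical. It gives p of about .08, above .05, which is consistent with the video's report that this correlation was not significant.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """P(|T| >= |t|) = 1 - 2 * (area under the density from 0 to |t|)."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)


def correlation_test(r: float, n: int) -> tuple[float, float]:
    """t statistic and two-tailed p-value for r (undefined when r is exactly +/-1)."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    return t, two_tailed_p(t, df)


t, p = correlation_test(0.574, 10)
print(f"t(8) = {t:.2f}, p = {p:.3f}")  # t(8) = 1.98, p = 0.083
```

Running the same function with a larger N and the same r gives a smaller p, which is the sense in which significance is easier to reach with larger samples.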
What is a point biserial correlation and when is it used?
-A point biserial correlation is a Pearson correlation computed when one variable is continuous and the other is a dichotomous nominal variable. In the video, adding gender to the analysis produces point biserial correlations between gender and height and between gender and weight.
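The Python sketch below illustrates that a point biserial correlation is Pearson's r with a two-valued code. It compares the textbook group-means formula, r_pb = (M_high - M_low) / s * sqrt(n_high * n_low) / n (with s the standard deviation of all scores computed with n in the denominator), against Pearson's r on the 1 = male, 2 = female coding used in the video. The weights are hypothetical and the function names are illustrative. The sketch also shows why the sign depends on the coding: reversing the codes flips it.

```python
import math
from statistics import mean, pstdev


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def point_biserial(scores, codes, low, high):
    """Group-means formula: positive when the high-coded group scores higher."""
    hi = [s for s, c in zip(scores, codes) if c == high]
    lo = [s for s, c in zip(scores, codes) if c == low]
    n = len(scores)
    return (mean(hi) - mean(lo)) / pstdev(scores) * math.sqrt(len(hi) * len(lo)) / n


# Hypothetical data; 1 = male, 2 = female as in the video.
weight = [180, 130, 190, 120, 165, 140]
gender = [1, 2, 1, 2, 1, 2]

print(point_biserial(weight, gender, low=1, high=2))  # negative: code 1 goes with heavier weights
print(pearson_r(weight, gender))                      # same value as the line above
print(pearson_r(weight, [3 - g for g in gender]))     # codes reversed: same size, opposite sign
```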
What does the scatter plot represent and how is it created in SPSS?
-A scatter plot represents the relationship between two variables, with each point corresponding to an individual's scores on the two variables. In SPSS, it is created using the Chart Builder tool by dragging variables to the x- and y-axis drop zones.
What other statistical techniques are mentioned in the script that could be used to explore relationships between variables further?
-The script mentions other techniques such as simple regression and multiple regression for predicting variables, as well as t-tests for examining differences between variables.
Outlines
📊 Introduction to Correlation Analysis in SPSS
This paragraph introduces the fourth video in the SPSS for beginners series by the RStats Institute at Missouri State University. The focus is on performing simple analyses, specifically correlation, using the same dataset from the first video but without the z-score variables. The video explains the concept of Pearson's r, a measure of the relationship between two variables that ranges from -1 to +1, with 0 indicating no relationship. It emphasizes the importance of pairing each person's height with their weight and notes that SPSS will ignore cases with missing values. The process of calculating correlations in SPSS is demonstrated, including selecting the variables, choosing the Bivariate option, and understanding the output, which includes a correlation matrix and significance levels. The significance of the correlation coefficient and the default settings for two-tailed tests are also discussed.
📈 Exploring Correlation with Different Variables and Visualizing Data
In this paragraph, the video script examines correlations among more than two variables, including a nominal variable, gender, which yields point biserial correlations. The script explains how to interpret the correlation matrix, which now includes three variables, and notes the negative correlations observed with gender due to the coding of males as 1 and females as 2. It highlights the significant correlation between weight and gender, indicated by asterisks and a p-value of .009. The paragraph concludes with a demonstration of how to create a scatter plot using the Chart Builder tool in SPSS, which visually represents the relationship between height and weight. The script also mentions the potential for further exploration of correlations, scatter plots, regression, and t-tests in future videos from the RStats Institute.
Keywords
💡SPSS
💡Descriptive Statistics
💡Correlation
💡Pearson's r
💡Significance Tests
💡Sample Size
💡Bivariate
💡Correlation Matrix
💡Scatter Plot
💡Point Biserial Correlation
💡APA Style
Highlights
Introduction to the fourth video in the SPSS for beginners series by the RStats Institute at Missouri State University.
Transition from examining variables with descriptive statistics and graphs, covered in earlier videos, to performing simple analyses in SPSS.
Explanation of correlation analysis and its importance in understanding the relationship between variables.
Introduction to Pearson's r, a measure of the linear relationship between two variables, with a range from -1 to +1.
Demonstration of how to calculate Pearson's r in SPSS using the Bivariate option.
Discussion on the importance of pairing each person's height with their weight for accurate correlation analysis.
Note on SPSS ignoring cases with missing values in the correlation analysis.
Introduction to the correlation matrix and its role in displaying the correlation coefficients between variables.
Explanation of the significance of the correlation coefficient and its relation to the sample size.
Identification of the off-diagonal elements of the correlation matrix as the most interesting for analysis.
Example of a non-significant correlation between height and weight due to a small sample size.
Introduction to point biserial correlation and its application with nominal variables with two levels.
Demonstration of how to include a third variable, such as Gender, in the correlation analysis.
Discussion on the significance of the correlation between weight and gender, flagged with asterisks in SPSS.
Introduction to the Chart Builder tool in SPSS for creating visual representations like scatter plots.
Step-by-step guide on creating a scatter plot to visualize the relationship between height and weight.
Mention of further topics such as scatter plot formatting, regression analysis, and t-tests to be covered in future videos.
Transcripts
Welcome to the fourth video in SPSS for beginners from the RStats Institute at Missouri State University.
Now that we have learned how to examine each variable using descriptive statistics and graphs,
I'm going to show you how to do some simple analyses. So this video will show you the basics of doing correlation.
When you're ready to do a correlation for real,
watch the other RStats Institute videos to learn more about the theory, the analysis, and
how to write up your findings in APA style.
We're still using the same
SPSS data set that we created in the first video; however, I deleted the z-score variables that we created last time.
Now I'm going to show you how to calculate correlations in SPSS.
The correlation that we are doing is called Pearson's r.
A Pearson's correlation describes the relationship between two variables.
Pearson's r ranges between -1 and +1.
0 indicates no relationship at all.
The closer that the correlation is to either +1 or -1, the stronger the relationship
between the variables.
We are interested in the relationship between height and weight.
Notice how the data have already been set up. Each person has a pair of scores.
Your height should be paired with your weight; it makes no sense to pair your height with my weight.
So it's very important that each pair stays together.
We have 10 pairs of scores. Our sample size is 10. Each pair counts as one case.
So remember that we have two people without height and weight scores.
They are not going to be included in this analysis.
In fact, SPSS will simply ignore those cases with missing values. So let's do a correlation.
Go to Analyze
Correlate
A Pearson's r correlates two variables, so choose Bivariate...
As before all of our variables are here on the left. The two that we want to correlate are height and weight.
So we need at least two
variables. When you move over the first, the "OK" is still not available until you move over the second.
And we could add additional variables, but each would be correlated only two at a time.
We have some additional options here as well. We could calculate Kendall's tau or
Spearman's rho if we had different data, but for now let's just stick with Pearson's r.
SPSS assumes that we want two tailed significance tests and
that we want to flag significant correlations.
We haven't talked about significance tests yet,
so for now just know that significance tests tell us something important about the variables. In this case, they tell us whether our
correlation is statistically significantly different from 0. If it is,
SPSS will flag it. All of the default settings are just the way we want them, so click OK to run the analysis.
The box that we see is called a "correlation matrix."
The correlation matrix shows the correlation coefficient for every combination of variables.
So we have two rows: one for height, one for weight.
And we have two columns: one for height, one for weight.
Where each row and column intersect, we see the correlation coefficient between those two variables.
So in this quadrant of the matrix we see the correlation coefficient between height and itself.
No surprise. It's 1. It's a perfect correlation.
We see another perfect correlation down here on the lower right which is the correlation between weight and itself.
SPSS will compare every combination of variables
including each variable and itself.
Now these correlations are not very interesting because we already know that
every variable will always correlate with itself at a +1, no matter the variable.
The interesting correlations are in these off diagonals.
The top number in this box is the correlation coefficient.
It will always be between +1 and -1.
Below that is the significance level.
Significance levels smaller than .05 are statistically significant.
Below that is the N, or the sample size, which is our 10 pairs of scores. So let's look at this coefficient.
Notice that the off diagonal correlations are the same because height correlates with weight exactly the same as weight
correlates with height. In this case it's a .574, which is pretty strong,
but not significant because the sample size of 10 is pretty small.
You are always more likely to find significance with larger sample sizes.
If this correlation were significant, we would see some asterisks next to the coefficient.
So as I mentioned, you can correlate more than two variables at a time,
and you could even use correlation with a nominal variable as long as it has only two levels. In fact, let me show you.
Go to Analyze
Correlate
Bivariate
All we're going to do is throw in a third variable, Gender, and this is actually called a point biserial correlation,
more on all of that later. For now, just click OK.
We get another correlation matrix, but this time it's bigger. It has three rows and three columns.
The correlations between height and weight are exactly the same as before,
but we also have correlations with gender.
The correlations with gender are negative, so as one variable goes up, the other goes down.
Remember that we coded males as 1 and females as 2, so 1 is the smaller code. With this negative
correlation, the smaller code goes with larger heights and weights. So basically the males were taller and weighed more.
And here we also see a significant correlation that's been flagged.
The point biserial correlation between weight and gender has two asterisks, so what does that mean?
We can see that this correlation is significant at the .01 level. In fact, the p-value is .009.
So there is a statistically significant relationship between weight and gender.
So there's one more thing that I want to show you with correlations,
and that is how to make a picture of them. The picture is called a scatter plot,
and it is created using a new tool called the chart builder. And here's how we do it.
Instead of the Analyze menu, we're going to use the Graphs menu. So go to Graphs
Chart Builder
We will learn more about the chart builder later when we learn about graphing.
For now, let's just have some fun and make a scatter plot.
Start by clicking on the word Scatter/Dot in the gallery.
Now we see our eight options. If you hover your cursor above them, SPSS will tell you what they are.
We want this first option: Simple Scatter
Click and drag it into the blank area known as the canvas.
You will see that we now have two drop zones: one for the x-axis and one for the y-axis.
So let's use height to predict weight.
Drag height to the x-axis drop zone
and weight to the y-axis drop zone.
And that is all you have to do. Click OK.
And there is our scatter plot of all 10 of the pairs of scores.
There is much more that we could do with correlation, so for instance, we could format the scatter plot in APA style.
We could do other types of correlations.
We could even use some variables to predict other variables using a technique called regression.
To learn more about correlation, scatter plots, simple regression, and multiple regression, check out these other videos
from RStats Institute.
Correlations are about relationships between variables,
but we might also be interested in
differences between variables. So next we're going to learn about t-tests.