04 Correlation in SPSS – SPSS for Beginners

Research By Design
6 Dec 2017 09:52

Summary

TLDR: This tutorial from the RStats Institute introduces the basics of correlation analysis in SPSS for beginners. It explains how to run a Pearson's correlation to examine the relationship between two variables, such as height and weight, using a sample dataset. The video covers setting up the data, reading the correlation matrix, and interpreting significance levels. It also demonstrates how to add a third, dichotomous variable (gender) to the analysis, which yields a point biserial correlation, and how to visualize a correlation as a scatter plot with the Chart Builder tool. The tutorial concludes by pointing to related techniques that build on correlation, such as regression.

Takeaways

  • 📚 The video is part of a series on SPSS for beginners by the RStats Institute at Missouri State University.
  • 🔍 It focuses on teaching the basics of correlation analysis in SPSS, specifically Pearson's r, which measures the relationship between two variables.
  • ⚖️ Pearson's r ranges from -1 to +1, with 0 indicating no relationship, and values closer to +1 or -1 indicating a stronger relationship.
  • 📈 The example used in the video involves analyzing the relationship between height and weight, with each person's height paired with their weight.
  • 🔎 The video demonstrates how to perform a correlation analysis in SPSS, including selecting the 'Bivariate' option and choosing the variables to correlate.
  • 📊 The output of a correlation analysis in SPSS is a 'correlation matrix', which shows the correlation coefficients between variables.
  • 🔑 The significance of the correlation is indicated by the p-value; if it's less than .05, the correlation is statistically significant.
  • 👥 The video mentions that SPSS will ignore cases with missing values, which is important for understanding the sample size used in the analysis.
  • 📊 The video also introduces the concept of a scatter plot as a visual representation of the correlation between two variables.
  • 📝 The script suggests that viewers watch additional videos from the RStats Institute for more information on theory, analysis, and APA style reporting.
  • 📈 Lastly, the video briefly mentions other statistical techniques such as point biserial correlation, regression, and t-tests for analyzing differences between variables.

Q & A

  • What is the main focus of the fourth video in the SPSS for beginners series?

    -The main focus of the fourth video is to teach viewers how to perform simple analyses, specifically how to calculate and interpret Pearson's correlation in SPSS.

  • What is Pearson's r and what does it measure?

    -Pearson's r is a statistical measure that describes the relationship between two variables. It ranges between -1 and +1, where 0 indicates no relationship, and values closer to +1 or -1 indicate a stronger relationship.

  • Why is it important that each person's height is paired with their own weight in the dataset?

    -It is important because the analysis requires each pair of scores to stay together to make sense. Pairing one person's height with another person's weight would not be meaningful in the context of the analysis.

  • How does SPSS handle cases with missing values during a correlation analysis?

    -SPSS will simply ignore cases with missing values. It does not include them in the analysis, ensuring that only complete pairs of scores are considered.

  • What is the significance of the correlation coefficient being flagged with asterisks in SPSS?

    -Asterisks indicate that the correlation is statistically significant, meaning it is significantly different from 0. The specific significance level is denoted by the number of asterisks.

  • What is a 'correlation matrix' and what does it show?

    -A correlation matrix is a table that shows the correlation coefficients for every combination of variables. It includes both the correlation between different variables and the perfect correlation of a variable with itself.

  • What is the significance level mentioned in the script, and what does a level smaller than .05 indicate?

    -The significance level reported in the SPSS output is the p-value for the correlation. A value smaller than .05 indicates that the correlation is statistically significant, meaning a correlation this large would be unlikely to occur by chance if the true correlation were 0.

  • What is a point biserial correlation and when is it used?

    -A point biserial correlation is a type of correlation used when one variable is continuous and the other is a dichotomous nominal variable. In the video, it arises when gender (coded 1 for males and 2 for females) is added to the analysis and correlated with height and with weight.

  • What does the scatter plot represent and how is it created in SPSS?

    -A scatter plot represents the relationship between two variables, with each point corresponding to an individual's scores on the two variables. In SPSS, it is created using the Chart Builder tool by dragging variables to the x and y-axis drop zones.

  • What other statistical techniques are mentioned in the script that could be used to explore relationships between variables further?

    -The script mentions other techniques such as simple regression and multiple regression for predicting variables, as well as t-tests for examining differences between variables.
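
The answers above on keeping each pair together and on missing values can be sketched in a few lines of Python. This is only an illustration of the idea, not how SPSS is implemented; the records and the variable name `people` are made up for the example, and the video itself works entirely through the SPSS menus.

```python
# Hypothetical records (not the video's dataset); None marks a missing score.
people = [
    {"height": 62, "weight": 120},
    {"height": 64, "weight": None},
    {"height": None, "weight": None},
    {"height": 68, "weight": 160},
    {"height": 70, "weight": 155},
]

# Keep only cases where both scores are present, so every pair stays together.
complete = [
    (p["height"], p["weight"])
    for p in people
    if p["height"] is not None and p["weight"] is not None
]

n = len(complete)  # the sample size a correlation on these data would use
print(n, complete)
```

Only the complete pairs count toward N, which is why the N shown in a correlation matrix can be smaller than the number of rows in the data file.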

Outlines

00:00

📊 Introduction to Correlation Analysis in SPSS

This paragraph introduces the fourth video in the SPSS for beginners series by the RStats Institute at Missouri State University. The focus is on performing simple analyses, specifically correlation, using the same dataset from the first video but without the z-score variables. The video explains the concept of Pearson's r, a measure of the relationship between two variables that ranges from -1 to +1, with 0 indicating no relationship. It emphasizes the importance of pairing each person's height with their weight and notes that SPSS will ignore cases with missing values. The process of calculating correlations in SPSS is demonstrated, including selecting the variables, choosing the Bivariate option, and understanding the output, which includes a correlation matrix and significance levels. The significance of the correlation coefficient and the default settings for two-tailed tests are also discussed.

05:02

📈 Exploring Correlation with Different Variables and Visualizing Data

In this paragraph, the video script delves into the analysis of the correlation between more than two variables, including the introduction of a nominal variable, gender, in a point biserial correlation. The script explains how to interpret the correlation matrix, which now includes three variables, and notes the negative correlations observed with gender due to the coding of males as 1 and females as 2. It highlights the significant correlation between weight and gender, indicated by asterisks and a p-value of .009. The paragraph concludes with a demonstration of how to create a scatter plot using the Chart Builder tool in SPSS, which visually represents the relationship between height and weight. The script also mentions the potential for further exploration of correlations, scatter plots, regression, and t-tests in future videos from the RStats Institute.

Keywords

💡SPSS

SPSS, which stands for Statistical Package for the Social Sciences, is a popular software used for statistical analysis in various academic and professional fields. In the context of the video, SPSS is the tool used to demonstrate how to perform correlation analysis and other statistical methods. The script shows how to navigate the software to calculate Pearson's r and create scatter plots, making it central to the video's theme of statistical analysis.

💡Descriptive Statistics

Descriptive statistics are numerical measures that summarize and describe the features of a dataset. They provide a quick yet comprehensive overview of the data, including measures of central tendency (mean, median, mode) and dispersion (range, variance, standard deviation). The video script mentions that the audience has already learned about descriptive statistics, indicating the progression from basic data description to more advanced analysis techniques like correlation.

💡Correlation

Correlation is a statistical measure that expresses the extent to which two variables are linearly related. It is a fundamental concept in the video, as it is the main focus of the analysis being demonstrated. The script explains that a Pearson's correlation coefficient (Pearson's r) is used to quantify the relationship between two variables, such as height and weight, with values ranging from -1 to +1.

💡Pearson's r

Pearson's r, also known as the Pearson correlation coefficient, is a measure of the linear correlation between two variables. The script specifically mentions Pearson's r as the type of correlation being calculated in SPSS, emphasizing its importance in understanding the strength and direction of the relationship between variables like height and weight.
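
To make the definition concrete, here is a minimal Python sketch of how Pearson's r can be computed by hand. Python is used only for illustration, since the video relies on the SPSS menus; the height and weight values are invented for the example, and `pearson_r` is a hypothetical helper name.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r for two equal-length lists of paired scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations (numerator of r).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired scores: position i holds one person's height and weight.
heights = [62, 64, 66, 68, 70, 72]
weights = [120, 150, 135, 160, 155, 190]

r = pearson_r(heights, weights)
print(round(r, 3))
```

Because the deviations are multiplied pair by pair, shuffling one list relative to the other would change the result, which is why each person's scores must stay together.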

💡Significance Tests

Significance tests are statistical methods used to determine whether an observed correlation or difference is likely to have occurred by chance. The video script refers to the p-value and its role in determining statistical significance, indicating that a correlation is statistically significant if the p-value is less than a chosen threshold, such as .05.
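
As a worked illustration (the video only reads the result off the SPSS output and does not show this calculation), the usual two-tailed test of a Pearson correlation against 0 converts r into a t statistic with n − 2 degrees of freedom. Plugging in the values reported in the video, r = .574 and n = 10:

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}
  = \frac{0.574\sqrt{8}}{\sqrt{1-0.574^{2}}}
  \approx \frac{1.624}{0.819}
  \approx 1.98,
\qquad df = n - 2 = 8
```

The two-tailed critical value of t at the .05 level with 8 degrees of freedom is about 2.306, so 1.98 falls short of it. That agrees with the video's statement that this correlation is not significant.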

💡Sample Size

Sample size refers to the number of observations or data points in a study. In the video, the sample size is mentioned as 10 pairs of scores, which is a small sample size that may affect the statistical power to detect significant correlations. The script implies that larger sample sizes are more likely to yield significant results.
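
The effect of sample size can be sketched numerically. The snippet below is an illustration under stated assumptions, not the SPSS procedure: it holds the coefficient fixed at the .574 reported in the video and asks what the two-tailed p-value would be for different numbers of pairs. The function names `t_density` and `two_tailed_p` are hypothetical, and the p-value is obtained by integrating the Student's t density numerically with Simpson's rule.

```python
import math

def t_density(t, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tailed_p(r, n, steps=2000):
    """Two-tailed p-value for testing a Pearson r against 0 with n pairs (|r| < 1)."""
    df = n - 2
    t = abs(r) * math.sqrt(df) / math.sqrt(1 - r * r)
    # Simpson's rule (steps must be even) for the area under the density from 0 to t.
    h = t / steps
    area = t_density(0, df) + t_density(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_density(i * h, df)
    area *= h / 3
    # Both tails together are what is left after removing the area between -t and t.
    return 1 - 2 * area

r = 0.574  # coefficient reported in the video
for n in (10, 20, 30):
    p = two_tailed_p(r, n)
    print(n, round(p, 3), "significant" if p < 0.05 else "not significant")
```

With the same coefficient, the p-value shrinks as n grows, which is the sense in which larger samples make significance easier to reach. In real data the coefficient itself would also change from sample to sample.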

💡Bivariate

Bivariate analysis involves examining the relationship between two variables at a time. The script describes using SPSS to perform a bivariate correlation, which is a basic form of correlation analysis that looks at the relationship between two variables, such as height and weight.

💡Correlation Matrix

A correlation matrix is a table that displays the correlation coefficients between pairs of variables in a dataset. The video script explains that the correlation matrix in SPSS shows the correlation coefficients for every combination of variables, including the relationship between height and weight, and also between each variable and itself.
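
The layout of a correlation matrix can be sketched as follows. This is a minimal illustration, not SPSS output: the data are invented, the `age` variable is a hypothetical third variable that does not appear in the video, and `pearson_r` is a helper written for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r for two equal-length lists of paired scores."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, not the video's dataset.
data = {
    "height": [62, 64, 66, 68, 70, 72],
    "weight": [120, 150, 135, 160, 155, 190],
    "age": [25, 31, 22, 40, 35, 28],
}

names = list(data)
# One row and one column per variable; each cell holds r for that pair.
matrix = [[pearson_r(data[a], data[b]) for b in names] for a in names]

print(" " * 8 + "".join(f"{name:>8}" for name in names))
for name, row in zip(names, matrix):
    print(f"{name:<8}" + "".join(f"{value:8.3f}" for value in row))
```

The diagonal cells are all 1 because each variable is correlated with itself, and the cells above the diagonal mirror those below it, so only the off-diagonal values on one side carry new information.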

💡Scatter Plot

A scatter plot is a type of plot that displays the values of two variables for a set of data. The video script describes how to create a scatter plot using SPSS to visually represent the relationship between height and weight, providing a graphical representation of the correlation.
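
The idea behind a scatter plot, one mark per person placed by that person's two scores, can be shown with a rough text-only sketch. This is not a substitute for the Chart Builder; the data are invented and `text_scatter` is a hypothetical helper that assumes the scores are not all identical on either variable.

```python
# Hypothetical paired scores, not the video's dataset.
heights = [62, 64, 66, 68, 70, 72]
weights = [120, 150, 135, 160, 155, 190]

def text_scatter(xs, ys, width=30, height=10):
    """Return a list of text rows with one '*' per (x, y) pair."""
    min_x, max_x = min(xs), max(xs)
    min_y, max_y = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - min_x) / (max_x - min_x) * (width - 1))
        # Row 0 is the top of the plot, so larger y values get smaller row numbers.
        row = (height - 1) - round((y - min_y) / (max_y - min_y) * (height - 1))
        grid[row][col] = "*"
    return ["".join(line) for line in grid]

rows = text_scatter(heights, weights)
for line in rows:
    print("|" + line)  # left border stands in for the y axis (weight)
print("+" + "-" * 30)  # bottom border stands in for the x axis (height)
```

A positive correlation shows up as marks drifting from the lower left toward the upper right.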

💡Point Biserial Correlation

Point biserial correlation is a type of correlation used when one variable is continuous and the other is a binary categorical variable. The script introduces this concept by including a third variable, gender, in the correlation analysis, which is a step beyond the basic bivariate correlation.
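
A point biserial correlation can be obtained by running the ordinary Pearson formula with the two-level variable entered as numeric codes. The sketch below uses the video's coding scheme (1 = male, 2 = female) on invented weights; `pearson_r` is a helper written for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r for two equal-length lists of paired scores."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: gender coded 1 = male, 2 = female, as in the video.
gender = [1, 1, 1, 2, 2, 2]
weight = [180, 175, 190, 130, 150, 140]

r_pb = pearson_r(gender, weight)

# Swapping the codes (1 = female, 2 = male) changes only the sign of r.
recoded = [3 - g for g in gender]
r_flipped = pearson_r(recoded, weight)

print(round(r_pb, 3), round(r_flipped, 3))
```

The sign therefore reflects the arbitrary coding rather than anything about the groups themselves: here a negative value means the group with the smaller code has the larger scores.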

💡APA Style

APA Style refers to the guidelines set by the American Psychological Association for writing academic papers, including the format for presenting statistical results. The video script mentions the possibility of formatting the scatter plot and writing up findings in APA style, indicating the importance of adhering to academic standards in reporting research.

Highlights

Introduction to the fourth video in the SPSS for beginners series by the RStats Institute at Missouri State University.

Overview of how to perform simple analyses using descriptive statistics and graphs in SPSS.

Explanation of correlation analysis and its importance in understanding the relationship between variables.

Introduction to Pearson's r, a measure of the linear relationship between two variables, with a range from -1 to +1.

Demonstration of how to calculate Pearson's r in SPSS using the Bivariate option.

Discussion on the importance of pairing each person's height with their weight for accurate correlation analysis.

Note on SPSS ignoring cases with missing values in the correlation analysis.

Introduction to the correlation matrix and its role in displaying the correlation coefficients between variables.

Explanation of the significance of the correlation coefficient and its relation to the sample size.

Identification of the off-diagonal elements of the correlation matrix as the most interesting for analysis.

Example of a non-significant correlation between height and weight due to a small sample size.

Introduction to point biserial correlation and its application with nominal variables with two levels.

Demonstration of how to include a third variable, such as Gender, in the correlation analysis.

Discussion on the significance of the correlation between weight and gender, flagged with asterisks in SPSS.

Introduction to the Chart Builder tool in SPSS for creating visual representations like scatter plots.

Step-by-step guide on creating a scatter plot to visualize the relationship between height and weight.

Mention of further topics such as scatter plot formatting, regression analysis, and t-tests to be covered in future videos.

Transcripts

00:07

Welcome to the fourth video in SPSS for beginners from the RStats Institute at Missouri State University.

00:15

Now that we have learned how to examine each variable using descriptive statistics and graphs,

00:21

I'm going to show you how to do some simple analyses. So this video will show you the basics of doing correlation.

00:28

When you're ready to do a correlation for real,

00:31

watch the other RStats Institute videos to learn more about the theory, the analysis, and

00:38

how to write up your findings in APA style.

00:49

We're still using the same

00:50

SPSS data set that we created in the first video; however, I deleted the z-score variables that we created last time.

00:58

Now I'm going to show you how to calculate

01:01

correlations in

01:03

SPSS.

01:05

The correlation that we are doing is called Pearson's r.

01:10

A Pearson's correlation describes the relationship between two variables.

01:16

Pearson's r ranges between -1 and +1.

01:21

0 indicates no relationship at all.

01:24

The closer that the correlation is to either +1 or -1, the stronger the relationship

01:31

between the variables.

01:34

We are interested in the relationship between height and weight.

01:39

Notice how the data have already been set up. Each person has a pair of scores.

01:45

Your height should be paired with your weight; it makes no sense to pair your height with my weight.

01:52

So it's very important that each pair stays together.

01:56

We have 10 pairs of scores. Our sample size is 10. Each pair counts as one case.

02:04

So remember that we have two people without height and weight scores.

02:09

They are not going to be included in this analysis.

02:13

In fact, SPSS will simply ignore those cases with missing values. So let's do a correlation.

02:20

Go to Analyze

02:23

Correlate

02:24

A Pearson's r correlates two variables, so choose Bivariate...

02:31

As before, all of our variables are here on the left. The two that we want to correlate are height and weight.

02:37

So we need at least two

02:39

variables. When you move over the first, the "OK" is still not available until you move over the second.

02:46

And we could add additional variables, but each would be correlated only two at a time.

02:53

We have some additional options here as well. We could calculate Kendall's tau or

02:59

Spearman's rho if we had different data, but for now let's just stick with Pearson's r.

03:06

SPSS assumes that we want two-tailed significance tests and

03:10

that we want to flag significant correlations.

03:14

We haven't talked about significance tests yet,

03:16

so for now just know that significance tests tell us something important about the variables. In this case, whether our

03:24

correlation is statistically significantly different than 0. If it is,

03:29

SPSS will flag it. All of the default settings are just the way we want them, so click OK to run the analysis.

03:37

The box that we see is called a "correlation matrix."

03:41

The correlation matrix shows the correlation coefficient for every combination of variables.

03:48

So we have two rows: one for height, one for weight.

03:52

And we have two columns: one for height, one for weight.

03:56

Where each row and column intersect, we see the correlation coefficient between those two variables.

04:05

So in this quadrant of the matrix we see the correlation coefficient between height and

04:11

itself.

04:13

No surprise. It's 1. It's a perfect correlation.

04:17

We see another perfect correlation down here on the lower right which is the correlation between weight and itself.

04:25

SPSS will compare every combination of variables

04:29

including each variable and itself.

04:32

Now these correlations are not very interesting because we already know that

04:38

every variable will always correlate with itself at a +1, no matter the variable.

04:45

The interesting correlations are in these off diagonals.

04:50

The top left box is the correlation coefficient.

04:53

It will always be between +1 and -1.

04:58

Below that is the significance level.

05:01

Significance levels smaller than .05 are statistically significant.

05:06

Below that is the N, or the sample size, which is our 10 pairs of scores. So let's look at this coefficient.

05:16

Notice that the off diagonal correlations are the same because height correlates with weight exactly the same as weight

05:22

correlates with height. In this case it's a .574 which is pretty strong,

05:29

but not significant because the sample size of 10 is pretty small.

05:35

You are always more likely to find significance with larger sample sizes.

05:40

If this correlation was significant, we would see some asterisks next to the coefficient.

05:47

So as I mentioned, you can correlate more than two variables at a time,

05:51

and you could even use correlation with a nominal variable as long as it only has two levels. In fact, let me show you.

05:59

Go to Analyze

06:01

Correlate

06:03

Bivariate

06:05

All we're going to do is throw in a third variable, Gender, and this is actually called a point biserial correlation,

06:14

more on all of that later. For now, just click OK.

06:19

We get another correlation matrix, but this time it's bigger. It has three rows and three columns.

06:26

The correlations between height and weight are exactly the same as before,

06:31

but we also have correlations with gender.

06:35

Because the correlations are negative, as one variable goes up the other goes down.

06:41

Remember that we coded males as 1, females as 2. So the 1 is smaller. We see this negative

06:48

correlation: the smaller numbers are associated with larger values. So basically the males were taller and weighed more.

06:58

And here we also see a significant correlation that's been flagged.

07:03

The biserial correlation between weight and gender has two asterisks, so what does that mean?

07:11

We can see that this correlation is significant at the .01 level. In fact, the p-value is .009.

07:21

So there is a statistically significant relationship between weight and gender.

07:26

So there's one more thing that I want to show you with correlations,

07:29

and that is how to make a picture of them. The picture is called a scatter plot,

07:35

and it is created using a new tool called the chart builder. And here's how we do it.

07:42

Instead of the Analyze menu, we're going to use the Graphs menu. So go to Graphs

07:49

Chart Builder

07:51

We will learn more about the chart builder later when we learn about graphing.

07:56

For now, let's just have some fun and make a scatter plot.

08:01

Start by clicking on the word Scatter/Dot in the gallery.

08:07

Now we see our eight options. If you hover your cursor above them, SPSS will tell you what they are.

08:16

We want this first option: Simple Scatter

08:22

Click and drag it into the blank area known as the canvas.

08:27

You will see that we now have two drop zones: one for the x axis and one for the y axis.

08:33

So let's use height to predict weight.

08:37

Drag height to the x axis drop zone

08:42

weight to the y-axis drop zone.

08:46

And that is all you have to do. Click OK.

08:51

And there is our scatter plot of all 10 of the pairs of scores.

08:57

There is much more that we could do with correlation, so for instance, we could format the scatter plot in APA style.

09:06

We could do other types of correlations.

09:09

We could even use some variables to predict other variables using a technique called

09:15

regression.

09:16

To learn more about correlation, scatter plots, simple regression, and multiple regression, check out these other videos

09:24

from RStats Institute.

09:27

Correlations are about relationships between variables,

09:31

but we might also be interested in

09:35

differences between variables. So next we're going to learn about t-tests.
