Spearman Rank Correlation [Simply explained]

DATAtab
5 Apr 2023, 05:40

Summary

TLDR: The video explains Spearman's rank correlation, a non-parametric method that measures the relationship between two variables using their ranks instead of the raw data. It contrasts this with the Pearson correlation and walks through an example involving the reaction times and ages of 8 computer players. It also covers how to calculate Spearman's correlation coefficient, how to interpret its strength and direction, and how to use a t-test to check whether the correlation is significantly different from zero, using a data set linked in the video description.

Takeaways

  • 📊 Spearman correlation is a non-parametric statistical measure used to assess the strength and direction of the relationship between two variables.
  • 🔄 Unlike Pearson correlation, Spearman correlation does not use the raw data but instead uses the ranks of the data.
  • 🎯 In the example provided, reaction time and age of computer players are ranked to calculate Spearman correlation.
  • 📈 The Spearman rank correlation coefficient is the Pearson correlation coefficient calculated on the ranks of the data instead of the raw values.
  • 📉 The Spearman correlation coefficient, 'rs', ranges from -1 to 1, indicating the strength and direction of the relationship between variables.
  • 🔢 The formula for Spearman correlation involves summing the squares of the differences in ranks (D) and is applicable when there are no rank ties.
  • 📝 The correlation coefficient can be used to interpret the strength of the relationship: negative values indicate a negative correlation, positive values a positive correlation, and zero indicates no correlation.
  • ✅ A hypothesis test can be conducted to determine if the correlation coefficient is significantly different from zero, using a t-test and comparing the p-value to a significance level.
  • 📉 The null hypothesis states that there is no relationship (correlation coefficient equals zero), while the alternative hypothesis suggests there is a relationship.
  • 📊 The significance of the correlation can be assessed using a t-test statistic, where a p-value less than the significance level (commonly 0.05) leads to the rejection of the null hypothesis.
  • 🔗 Data from the example in the script yields a p-value of 0.002, which is less than 0.05, indicating a significant correlation and rejection of the null hypothesis.
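
The transcript refers to the shortcut equation without spelling it out. The standard form for data without rank ties, filled in with the two numbers the video does state (sum of squared rank differences 8, number of cases 8), would look like this; the symbols d_i and n follow the video's wording, and the rest is a worked sketch rather than a reproduction of the slide:

```latex
r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^{2}}{n\,(n^{2} - 1)}
    = 1 - \frac{6 \cdot 8}{8 \cdot (8^{2} - 1)}
    = 1 - \frac{48}{504}
    \approx 0.905
```

Rounded to one decimal place, this is the 0.9 reported in the video.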

Q & A

  • What is the Spearman correlation?

    -The Spearman correlation, also known as Spearman's rank correlation, is a non-parametric measure of rank correlation that assesses how well the relationship between two variables can be described using a monotonic function.

  • How does Spearman correlation differ from Pearson correlation?

    -While both Spearman and Pearson correlations measure the strength and direction of the relationship between two variables, Spearman correlation uses the ranks of the data instead of the raw data. This makes it a non-parametric alternative to the Pearson correlation, whose significance test assumes normally distributed data.

  • What is the process of calculating Spearman rank correlation?

    -To calculate the Spearman rank correlation, first assign ranks to the data points for each variable. Then either calculate the Pearson correlation of the two sets of ranks or, if there are no rank ties, use the shortcut formula: calculate the difference in ranks d for each pair of data points, square these differences, sum them up, and insert the sum together with the number of observations n into r_s = 1 - 6·Σd² / (n·(n² - 1)).

  • How are ranks assigned to the data in Spearman correlation?

    -Ranks are assigned based on the order of the data points. The smallest value gets rank 1, the next smallest gets rank 2, and so on. If there are ties, assign the average rank to the tied values.

  • What does a Spearman correlation coefficient value of 0.9 indicate?

    -A Spearman correlation coefficient of 0.9 indicates a very strong positive relationship between the two variables, meaning that as one variable increases, the other tends to increase as well.

  • What is the range of the Spearman correlation coefficient?

    -The Spearman correlation coefficient ranges from -1 to 1, where -1 indicates a perfect negative relationship, 1 indicates a perfect positive relationship, and 0 indicates no relationship.

  • How can you determine if a Spearman correlation coefficient is statistically significant?

    -To determine if a Spearman correlation coefficient is statistically significant, you can use a t-test. If the calculated p-value is less than the significance level (commonly 0.05), the null hypothesis that there is no relationship is rejected.

  • What is the null hypothesis in a Spearman correlation test?

    -The null hypothesis in a Spearman correlation test is that the correlation coefficient (ρ) is equal to zero, indicating no relationship between the two variables.

  • What is the alternative hypothesis in a Spearman correlation test?

    -The alternative hypothesis in a Spearman correlation test is that the correlation coefficient (ρ) is not equal to zero, indicating that there is a relationship between the two variables.

  • Can the Spearman correlation be used with non-normally distributed data?

    -Yes, the Spearman correlation is particularly useful with non-normally distributed data or ordinal data because it does not assume normality and is based on the ranks of the data rather than the actual values.

  • How does the presence of outliers affect the Spearman correlation?

    -Since Spearman correlation is based on ranks, it is less affected by outliers compared to the Pearson correlation, which is sensitive to extreme values in the data.
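
The last two answers can be illustrated with a small sketch. The data below are hypothetical (not from the video): y always rises with x, but the final value is an extreme outlier. The helper names `pearson` and `simple_ranks` are made up for this example, and `simple_ranks` only works when there are no ties.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simple_ranks(values):
    """Rank 1 = smallest value. Assumes there are no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

# Hypothetical data: monotonic increase with one extreme last value.
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 6, 8, 10, 1000]

r_pearson = pearson(x, y)                                 # uses raw values
r_spearman = pearson(simple_ranks(x), simple_ranks(y))    # uses ranks

print(f"Pearson:  {r_pearson:.2f}")
print(f"Spearman: {r_spearman:.2f}")
```

Because the ranks of y are simply 1 to 6, the Spearman coefficient stays at 1, while the outlier pulls the Pearson coefficient down to roughly 0.66.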

Outlines

00:00

📊 Understanding Spearman's Rank Correlation

The first paragraph introduces Spearman's rank correlation, a non-parametric statistical method used to measure the relationship between two variables. Unlike Pearson correlation, which uses raw data, Spearman's method employs the ranks of the data. The paragraph provides an example involving the reaction times and ages of 8 computer players, demonstrating how to assign ranks to the data and how to calculate the Spearman correlation coefficient. It also explains that the Spearman correlation is the Pearson correlation calculated on the ranks: in the example, both routes give 0.9. A shortcut formula based on the squared rank differences is given for the case without rank ties, and the interpretation of the coefficient's value is discussed, including how to determine the strength and direction of the correlation.
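
The two calculation routes described above can be sketched in a few lines. Only the first two reaction times (12 and 15) appear in the transcript; every other value below is hypothetical, chosen so that the sum of squared rank differences is 8 as in the video. It is not the video's actual data set, and the helper names are assumptions for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simple_ranks(values):
    """Rank 1 = smallest value. Assumes there are no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

# Hypothetical values for 8 players (NOT the data set from the video).
reaction_time = [12, 15, 17, 18, 20, 21, 22, 26]
age = [9, 12, 11, 16, 18, 15, 21, 25]

rank_rt = simple_ranks(reaction_time)
rank_age = simple_ranks(age)

# Route 1: Pearson correlation calculated on the ranks.
rs_from_ranks = pearson(rank_rt, rank_age)

# Route 2: shortcut formula, valid only without rank ties.
n = len(rank_rt)
sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_rt, rank_age))
rs_shortcut = 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

print(f"sum of d^2 = {sum_d_squared}")
print(f"Pearson on ranks: {rs_from_ranks:.3f}")
print(f"Shortcut formula: {rs_shortcut:.3f}")
```

Both routes agree here because the ranks contain no ties; with ties, only the Pearson-on-ranks route remains valid.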

05:04

🔍 Hypothesis Testing with Spearman Correlation

The second paragraph delves into hypothesis testing using the Spearman correlation coefficient. It explains the process of testing whether the correlation in the sample data is significantly different from zero, which would indicate a relationship in the population. The paragraph outlines the null hypothesis (no relationship, correlation coefficient equals zero) and the alternative hypothesis (there is a relationship). It also describes how to use a t-test to determine if the correlation coefficient is significantly different from zero, with the significance level typically set at 5%. The example given in the paragraph results in a p-value of 0.002, which is less than the 0.05 threshold, leading to the rejection of the null hypothesis. The paragraph concludes by encouraging viewers to download the dataset for further exploration and thanking them for watching the video.
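
The test described above can be reproduced approximately from the numbers stated in the video (sum of squared rank differences 8, sample size 8). The transcript does not spell out the test statistic, so this sketch assumes the usual form t = r_s · sqrt((n - 2) / (1 - r_s²)) with n - 2 degrees of freedom. To stay within the standard library, the p-value is obtained by numerically integrating the t density with Simpson's rule; that is a stand-in for a statistics package, not the method used by DATAtab.

```python
import math

def t_statistic(r, n):
    """Test statistic for H0: correlation = 0 (assumed standard form)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # P(-|t| < T < |t|)
    return 1 - central_area

n = 8                                   # sample size from the video
r_s = 1 - 6 * 8 / (n * (n ** 2 - 1))    # sum of d^2 = 8 from the video
t = t_statistic(r_s, n)
p_value = two_sided_p(t, df=n - 2)
reject_null = p_value < 0.05            # 5% significance level

print(f"r_s = {r_s:.3f}, t = {t:.2f}, p = {p_value:.3f}, reject H0: {reject_null}")
```

Under these assumptions the p-value rounds to 0.002, matching the value quoted in the video, so the null hypothesis is rejected at the 5% level.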

Keywords

💡Spearman correlation

Spearman correlation, also known as Spearman's rank correlation, is a non-parametric measure of rank correlation that assesses how well the relationship between two variables can be described using a monotonic function. In the video, it is contrasted with Pearson correlation and is used to examine the relationship between a player's reaction time and their age, with the correlation coefficient indicating the strength and direction of the relationship.

💡Pearson correlation

Pearson correlation is a statistical measure that expresses the linear relationship between two continuous variables. The script explains that while Pearson correlation uses the raw data values, Spearman correlation uses the ranks of the data, making it a non-parametric alternative. The video shows that calculating the Pearson correlation on the ranks of the example data gives the same coefficient as the Spearman correlation, 0.9.

💡Non-parametric

Non-parametric refers to statistical methods that do not assume a specific distribution for the underlying populations. In the context of the video, Spearman correlation is described as the non-parametric counterpart of Pearson correlation, meaning it does not require the data to follow a normal distribution and can be used when the assumptions of parametric tests are not met.

💡Ranks

Ranks in statistics refer to the order of data when arranged in ascending or descending order. In the video, ranks are assigned to both reaction time and age of the players. The use of ranks is central to calculating the Spearman correlation, as it allows for the assessment of the relationship between variables without relying on the actual data values.
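
As a minimal sketch of rank assignment, the function below gives rank 1 to the smallest value and lets tied values share the average of the positions they occupy, which is the tie rule mentioned in the Q & A. The function name `average_ranks` and the sample values are assumptions for illustration.

```python
def average_ranks(values):
    """Rank 1 = smallest; tied values share the mean of their positions."""
    # Indices of the values, sorted from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend 'end' over the run of values equal to the one at 'start'.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

# Hypothetical values: the two 15s occupy positions 2 and 3, so both get 2.5.
example = [15, 12, 15, 20]
example_ranks = average_ranks(example)
print(example_ranks)
```

When ties like these occur, the shortcut formula based on squared rank differences no longer applies exactly, and the Pearson correlation of the ranks is the route to use.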

💡Scatter plot

A scatter plot is a type of plot that uses Cartesian coordinates to display values for two variables for a set of data. In the video, a scatter plot is used to visualize the raw data of age and reaction time, and then again to represent the ranks of these variables, which is crucial for understanding how Spearman correlation is derived from the ranked data.

💡Correlation coefficient

The correlation coefficient is a measure of how strongly two variables are related: linearly for Pearson, monotonically for Spearman. In the video, the Spearman coefficient and the Pearson coefficient calculated from the ranks are both 0.9, indicating a strong positive relationship between the variables. The script explains how this coefficient varies between -1 and 1, with the sign indicating the direction of the relationship.

💡Hypothesis testing

Hypothesis testing is a statistical method used to determine whether there is enough evidence to support a claim. In the video, the null hypothesis is that there is no relationship (correlation coefficient equals zero), and the alternative hypothesis is that there is a relationship. The script describes how a t-test is used to calculate a p-value, which helps decide whether to reject or fail to reject the null hypothesis.

💡Significance level

The significance level, often denoted as alpha, is the threshold used in hypothesis testing to determine if the results are statistically significant. In the video, a significance level of 5% is mentioned, meaning if the p-value is less than 0.05, the null hypothesis is rejected, indicating a statistically significant correlation.

💡P-value

The p-value is the probability of obtaining results at least as extreme as the observed results, assuming the null hypothesis is true. In the context of the video, a p-value of 0.002 is calculated, which is less than the significance level, leading to the rejection of the null hypothesis and supporting the existence of a correlation.

💡Data set

A data set is a collection of data, typically used for analysis. The video script mentions a data set that can be downloaded for further analysis, which includes the reaction times and ages of computer players. This data set is used to calculate and compare the Spearman and Pearson correlation coefficients.

💡T-test

A t-test compares a test statistic with the t-distribution. It is best known for comparing the means of two groups, but it can also be used to test a correlation coefficient. In the video, a t-test is used to evaluate the significance of the Spearman correlation coefficient calculated from the sample data, helping to determine if the correlation is significantly different from zero.

Highlights

Spearman correlation is a non-parametric method to examine the relationship between two variables.

Unlike Pearson correlation, Spearman correlation uses the ranks of data rather than the raw data.

The calculation of Spearman correlation involves assigning ranks to each data point for the variables being studied.

An example is given where the reaction time and age of computer players are measured and ranked.

The ranks are used to form a scatter plot for the Spearman correlation analysis.

Calculating the Pearson correlation from the ranks yields the same coefficient as the Spearman correlation.

The Spearman correlation coefficient is calculated using the formula that involves the sum of squared differences in ranks.

The Spearman correlation coefficient ranges from -1 to 1, indicating the strength and direction of the relationship.

A table is used to interpret the strength of the correlation based on the coefficient's value.

The significance of the correlation coefficient can be tested using a t-test against the null hypothesis of no relationship.

A p-value is calculated to determine if the correlation coefficient is significantly different from zero.

If the p-value is less than the significance level, the null hypothesis is rejected, indicating a significant relationship.

The example provided calculates a p-value of 0.002, which is less than 0.05, leading to the rejection of the null hypothesis.

Data can be downloaded for further analysis, with the link provided in the video description.

The video concludes by summarizing the process and significance of calculating and interpreting the Spearman correlation.

Transcripts

play00:00

what is a Spearman correlation

play00:02

Spearman's rank correlation examines the

play00:05

relationship between two variables isn't

play00:09

that exactly what the Pearson

play00:10

correlation does that's right the

play00:13

Spearman rank correlation is the

play00:15

non-parametric counterpart of the

play00:18

Pearson correlation but there is an

play00:20

important difference between both

play00:22

correlation coefficients Spearman

play00:24

correlation does not use the raw data

play00:27

but the ranks of the data let's look at

play00:30

this with an example we measure the

play00:33

reaction time of 8 computer players and

play00:36

ask their age when we calculate a

play00:39

Pearson correlation we simply take the

play00:42

two variables reaction time and age and

play00:46

calculate the Pearson correlation

play00:47

coefficient however we now want to

play00:50

calculate the Spearman rank correlation

play00:52

so first we assign a rank to each person

play00:56

for reaction time and age the reaction

play00:59

time is already sorted by size 12 is the

play01:04

smallest value so gets rank 1 15 the

play01:07

second smallest so gets Rank 2 and so on

play01:11

and so forth we are now doing the same

play01:13

with age here we have the smallest value

play01:16

there the second smallest value the

play01:20

third smallest value fourth smallest and

play01:23

so on and so forth let's take a look at

play01:25

this in a scatter plot here we see the

play01:28

raw data of age and reaction time but

play01:32

now we would like to use the rankings so

play01:35

we form ranks from the variables age and

play01:38

reaction time

play01:40

through this transformation we have now

play01:43

distributed the data more evenly to

play01:46

calculate Spearman correlation now we

play01:49

simply calculate the Pearson correlation

play01:51

from the ranks

play01:54

so the Spearman correlation is equal to

play01:57

the Pearson correlation only that the

play01:59

ranks are used instead of the raw values

play02:02

let's have a quick look at that in DATAtab

play02:05

here we have the reaction time and

play02:08

age and there we have the just created

play02:11

ranks of the reaction time and age now

play02:15

we can either calculate the Spearman

play02:16

correlation of the reaction time and age

play02:19

where we get a correlation of 0.9 or we

play02:24

can calculate the Pearson correlation

play02:26

from the ranks there we also get 0.9 so

play02:31

exactly the same as before if you like

play02:34

you can download the data set you can

play02:37

find the link in the video description

play02:38

if there are no rank ties we can also

play02:42

use this equation to calculate the

play02:44

Spearman correlation n is the number of

play02:47

cases and D is the difference in ranks

play02:51

between the two variables referring to

play02:54

our example we get the different D's

play02:57

with this one minus 1 which is 0 2 minus

play03:01

three is minus one three minus two is

play03:05

one and so on now we Square the

play03:08

individual D's and add them all up

play03:11

so the sum of d i squared is eight n

play03:15

which is the number of people is also 8

play03:18

in this example if you put everything in

play03:21

we get a correlation coefficient of 0.9

play03:24

just like the Pearson correlation

play03:26

coefficient R the Spearman correlation

play03:29

coefficient r s also varies between

play03:33

-1 and 1 with the help of the

play03:35

coefficient we can now determine two

play03:38

things number one how strong the

play03:41

correlation is and number two in which

play03:44

direction the correlation goes the

play03:46

strength of the correlation can be read

play03:49

in a table if we have a coefficient

play03:51

between

play03:52

-1 and less than zero there is a

play03:55

negative correlation thus a negative

play03:58

relationship between the variables if we

play04:01

have a coefficient between greater than

play04:03

zero and one there is a positive

play04:06

correlation that is a positive

play04:08

relationship between the two variables

play04:10

if the result is zero we have no

play04:13

correlation often however starting from

play04:17

a sample we want to test a hypothesis

play04:19

about the population we calculated the

play04:22

correlation coefficient from the sample

play04:25

data and now we can check if the

play04:27

correlation coefficient is significantly

play04:30

different from zero thus the null

play04:32

hypothesis is the correlation

play04:34

coefficient R is equal to zero there is

play04:38

no relationship and the alternative

play04:40

hypothesis is the correlation

play04:42

coefficient R is not equal to zero there

play04:46

is a relationship whether the

play04:48

correlation coefficient is significantly

play04:50

different from zero based on the sample

play04:53

collected can be checked using a t-test

play04:56

where R is the correlation coefficient

play04:59

and N is the sample size a p-value can

play05:03

then be calculated from the test

play05:05

statistic T if the p-value is less than

play05:09

the specified significance level which

play05:11

is usually five percent then the null

play05:14

hypothesis is rejected otherwise it is

play05:17

not if you use DATAtab for the

play05:19

calculation of the example we get a

play05:21

p-value of

play05:23

0.002 the p-value is therefore smaller

play05:27

than 0.05 and we can therefore reject

play05:31

the null hypothesis that in the

play05:33

population the correlation coefficient

play05:35

is zero thanks for watching and I hope

play05:38

you enjoyed the video

Related Tags
Spearman Correlation, Rank Correlation, Pearson Correlation, Statistical Analysis, Data Relationships, Non-Parametric, Correlation Coefficient, Data Science, Hypothesis Testing, Descriptive Statistics