03 Descriptive Statistics and z Scores in SPSS – SPSS for Beginners

Research By Design
5 Dec 2017 · 07:46

Summary

TLDR: The third video in the SPSS for Beginners series from the RStats Institute explores descriptive statistics, including mean and standard deviation, to understand data distribution. It demonstrates how to obtain these statistics in SPSS, with options to customize output and visualize data through histograms, box plots, and stem-and-leaf plots. The video also introduces z-scores, showing how to standardize scores in SPSS and highlighting their significance in data analysis.

Takeaways

  • 📊 Descriptive Statistics are used to describe the characteristics of a dataset, including measures like mean and standard deviation.
  • 📈 The mean provides the average value of a variable, while the standard deviation indicates the spread of data around the mean.
  • 🔍 In SPSS, descriptive statistics can be calculated by going to Analyze -> Descriptive Statistics -> Descriptives.
  • 📝 Options in SPSS allow for the inclusion of various statistics such as mean, standard deviation, range, and sum.
  • 🔑 Valid N (listwise) represents the number of complete cases without missing data on any of the selected variables, which shows how many cases remain usable when every variable is needed.
  • 👥 Gender data may not be useful for mean and standard deviation calculations, as it is categorical rather than continuous.
  • 📋 For continuous variables like height and weight, mean and standard deviation provide insights into the central tendency and dispersion.
  • 📊 The Explore function in SPSS offers a more detailed analysis, including outliers, percentiles, and various plots for data visualization.
  • 📈 Histograms, stem-and-leaf plots, and box plots are useful for visualizing the distribution of data and identifying outliers.
  • 👦🏻👧🏻 Descriptive statistics can be calculated separately for different groups, such as males and females, by using the Explore function with a factor list.
  • 📐 Z-scores can be generated in SPSS by saving standardized values as variables, which indicate how many standard deviations a score is from the mean.
  • 🔍 Negative z-scores represent scores below the average, while positive z-scores indicate scores above the average.
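
The idea behind Valid N (listwise) can be illustrated outside SPSS as a count of rows with no missing value on any selected variable. The sketch below uses Python rather than SPSS; the records, the function names, and the use of `None` to mark a missing value are assumptions made for illustration and do not come from the video.

```python
# Hypothetical records; None marks a missing value (not data from the video).
records = [
    {"gender": 1, "height": 70, "weight": 150},
    {"gender": 2, "height": 64, "weight": None},
    {"gender": 2, "height": None, "weight": 128},
    {"gender": 1, "height": 68, "weight": 141},
]


def valid_n(rows, variable):
    """Count the non-missing values of a single variable."""
    return sum(1 for row in rows if row[variable] is not None)


def valid_n_listwise(rows, variables):
    """Count the rows that have no missing value on any listed variable."""
    return sum(1 for row in rows if all(row[v] is not None for v in variables))


variables = ["gender", "height", "weight"]
per_variable = {v: valid_n(records, v) for v in variables}
listwise = valid_n_listwise(records, variables)

print(per_variable)
print("Valid N (listwise):", listwise)
```

The per-variable counts can each be larger than the listwise count, because a row only counts listwise when every selected variable is present.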

Q & A

  • What is the main focus of the third video in the SPSS for Beginners series?

    -The main focus of the third video is to introduce further descriptive statistics in SPSS and to show how to convert raw scores into z-scores.

  • What are 'Descriptive Statistics' and why are they important?

    -Descriptive Statistics are numerical measures that summarize and describe the characteristics of a set of data. They are important because they provide a quick way to understand the central tendency, dispersion, and shape of the data distribution.

  • How does the mean help in understanding a dataset?

    -The mean, or average, gives us a central value for the dataset, indicating what is the typical or average value of the data points.

  • What does the standard deviation measure in a dataset?

    -The standard deviation measures the amount of variation or dispersion in the dataset. It tells us how spread out the data points are around the mean.

  • What is the purpose of the 'Descriptives' option in SPSS?

    -The 'Descriptives' option in SPSS is used to calculate and display various descriptive statistics for the variables in a dataset, such as mean, standard deviation, minimum, maximum, and range.

  • What is the 'Explore' function in SPSS and how does it differ from 'Descriptives'?

    -The 'Explore' function in SPSS provides a more detailed analysis of the data, including additional statistics, options for plotting, and the ability to identify outliers and percentiles. It differs from 'Descriptives' by offering more flexibility and detail in the analysis.

  • What is the significance of the 'Valid N (listwise)' in the SPSS output?

    -The 'Valid N (listwise)' indicates the number of complete cases in the dataset with no missing data, which is important for understanding the dataset's completeness and reliability.

  • Why might the mean and standard deviation for a categorical variable like gender be meaningless?

    -The mean and standard deviation for a categorical variable like gender are meaningless because these measures are not appropriate for ordinal or nominal data. They are meant for interval or ratio data where the concept of an average makes sense.

  • How can histograms, stem-and-leaf plots, and box plots help in understanding the data distribution?

    -Histograms, stem-and-leaf plots, and box plots provide visual representations of the data distribution, helping to identify patterns, central tendencies, dispersion, and potential outliers or anomalies in the dataset.

  • What is a z-score and how does SPSS calculate it?

    -A z-score is a standardized score that indicates how many standard deviations a data point is from the mean. SPSS calculates z-scores by converting each raw score into a value that represents its distance from the mean in terms of standard deviation units.

  • How can the 'Save standardized values as variables' option in SPSS be useful?

    -The 'Save standardized values as variables' option in SPSS allows you to save the calculated z-scores as new variables in your dataset. This can be useful for further analysis or for comparing the standardized scores across different variables or groups.
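
The effect of saving standardized values can be sketched in Python as adding one new column per variable. This is only an illustration under stated assumptions: the data values and the `standardize` helper are hypothetical, and the "Z" prefix simply mirrors the new variable names the video shows.

```python
from statistics import mean, stdev

# Hypothetical data set; the values are illustrative, not from the video.
data = {
    "height": [62, 64, 66, 68, 70],
    "weight": [118, 125, 133, 140, 149],
}


def standardize(values):
    """Return z-scores based on the sample standard deviation (n - 1)."""
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]


# Add one new "Z"-prefixed column for each original variable.
for name in list(data):
    data["Z" + name] = standardize(data[name])

print(sorted(data))
print([round(z, 3) for z in data["Zheight"]])
```

A standardized column has a mean of 0 and a standard deviation of 1, which is what makes scores on different scales comparable.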

Outlines

00:00

📊 Descriptive Statistics and Z-Scores in SPSS

This paragraph introduces the third video in the SPSS for Beginners series from the RStats Institute at Missouri State University. The focus is on descriptive statistics, which include measures of central tendency and variability. The video demonstrates how to obtain additional descriptive statistics using SPSS, such as mean, standard deviation, range, and sum, by navigating through the Analyze menu. The presenter explains the significance of these statistics, particularly for variables like height and weight, and how they can be used to understand the distribution of data. The paragraph also introduces the concept of z-scores, which are standardized scores indicating how many standard deviations a particular score is from the mean, and shows how to generate these in SPSS by using the 'Save standardized values as variables' option.

05:00

📈 Advanced Descriptive Statistics and Visualization Techniques

The second paragraph delves deeper into the use of SPSS for advanced descriptive statistics and data visualization. The video script describes the process of using the 'Explore' function to analyze scale variables like height and weight, and to generate detailed statistics, including outliers, percentiles, and histograms. The presenter also introduces various plotting options, such as stem-and-leaf plots, box plots, and the use of histograms for a visual representation of data distribution. Additionally, the paragraph explains how to split the analysis by a categorical variable, such as gender, to compare descriptive statistics between different groups. The video concludes with a brief mention of z-scores and an invitation to explore further videos on the topic, as well as an introduction to correlation analysis in subsequent lessons.

Keywords

💡Descriptive Statistics

Descriptive statistics are numerical measures that summarize and describe the features of a set of data. In the video, descriptive statistics such as mean and standard deviation are used to describe the central tendency and variability of the dataset, respectively. For example, the mean height and weight provide an average value, while the standard deviation indicates the spread of the data around the mean.
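
The video's rule of thumb, that about two-thirds of participants fall within one standard deviation of the mean, can be checked with a short Python sketch. The mean (133) and standard deviation (13.4) are the weight statistics quoted in the video; treating weight as normally distributed is an assumption of this sketch, not something the video tests.

```python
from statistics import NormalDist

# Mean and standard deviation of weight as reported in the video.
mean_weight = 133.0
sd_weight = 13.4

lower = mean_weight - sd_weight
upper = mean_weight + sd_weight

# Share of a normal distribution that lies within one SD of its mean.
# Normality is an assumption here; the video does not test for it.
model = NormalDist(mu=mean_weight, sigma=sd_weight)
share = model.cdf(upper) - model.cdf(lower)

print(f"{lower:.1f} to {upper:.1f} pounds covers about {share:.1%}")
```

If the real distribution is skewed or has heavy tails, the share within one standard deviation can differ noticeably from this figure.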

💡Central Tendency

Central tendency refers to the central or typical value for a probability distribution. The video discusses measures of central tendency like the mean, which is the average value of a dataset. This concept is crucial for understanding the 'typical' or 'representative' value within a set of numbers, as it helps in summarizing the data succinctly.

💡Variability

Variability is a measure that describes the spread or dispersion of a set of data points. In the context of the video, standard deviation is used as a measure of variability, indicating how much the data points deviate from the mean. A smaller standard deviation suggests that the data points are closer to the mean, while a larger one indicates greater spread.
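
As a rough illustration of the two statistics, the sketch below computes a mean and a standard deviation in Python. The weights are hypothetical. The sample standard deviation (dividing by n - 1) is shown next to the population version (dividing by n) because statistical packages generally report the sample version; which one applies in a given analysis is for the reader to confirm.

```python
from statistics import mean, pstdev, stdev

# Hypothetical weights in pounds (not the data set from the video).
weights = [118, 125, 133, 140, 149]

avg = mean(weights)
sample_sd = stdev(weights)       # divides the squared deviations by n - 1
population_sd = pstdev(weights)  # divides the squared deviations by n

print(f"mean = {avg}")
print(f"sample SD = {sample_sd:.2f}, population SD = {population_sd:.2f}")
```

The sample standard deviation is always slightly larger than the population version for the same numbers, and the gap shrinks as the number of cases grows.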

💡Frequency Count

A frequency count is a basic descriptive statistic that shows the number of times each data point occurs in a dataset. The video mentions having done a frequency count earlier, which is essential for understanding the distribution of data and identifying the most common occurrences within the dataset.
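
A frequency count is easy to sketch in Python with `collections.Counter`. The gender codes below are hypothetical, and the coding (1 = male, 2 = female) is an assumption. The list was chosen so that its mean comes out near the 1.58 mentioned in the video, which shows why that mean only hints at the proportion of cases coded 2.

```python
from collections import Counter

# Hypothetical gender codes; 1 = male and 2 = female is an assumed coding.
gender = [1, 2, 2, 1, 2, 2, 1, 2, 1, 2, 2, 1]

counts = Counter(gender)
total = len(gender)

for code in sorted(counts):
    print(code, counts[code], f"{counts[code] / total:.1%}")

# For a 1/2 coded variable, the mean is 1 plus the proportion coded 2.
mean_code = sum(gender) / total
print(f"mean of the codes = {mean_code:.2f}")
```

The counts and percentages answer the question directly, whereas the mean of the codes has to be decoded before it says anything.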

💡Z-scores

Z-scores, also known as standard scores, are a measure of how many standard deviations an element is from the mean. In the video, SPSS is used to convert raw scores into z-scores, which helps in understanding the relative standing of each data point. For instance, a z-score of -1 indicates that the data point is one standard deviation below the mean.
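
In symbols, with sample mean $\bar{x}$ and sample standard deviation $s$, the z-score of a raw score $x$ can be written as below. The worked values use the weight statistics quoted in the video (mean 133, standard deviation 13.4); the raw weights of 146.4 and 119.6 pounds are chosen for illustration and are not data points from the video.

```latex
z = \frac{x - \bar{x}}{s}, \qquad
z_{146.4} = \frac{146.4 - 133}{13.4} = 1, \qquad
z_{119.6} = \frac{119.6 - 133}{13.4} = -1
```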

💡Outliers

Outliers are data points that are significantly different from other observations, often falling outside the range of typical values. The video script discusses identifying outliers through the 'Explore' function in SPSS, which is important for understanding potential anomalies or errors in the data that could affect statistical analysis.
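
One simple way to look for suspicious cases is to list the highest and lowest values together with their case numbers, which is similar in spirit to an extreme-values table. The sketch below is a Python illustration only: the heights, the case numbers, and the choice of three values per end are assumptions, and it does not reproduce SPSS output.

```python
import heapq

# Hypothetical heights in inches, keyed by case number.
heights = {1: 70, 2: 64, 3: 66, 4: 61, 5: 75, 6: 68, 7: 63, 8: 72}


def extreme_values(values_by_case, k):
    """Return the k highest and k lowest (case, value) pairs."""
    items = values_by_case.items()
    highest = heapq.nlargest(k, items, key=lambda item: item[1])
    lowest = heapq.nsmallest(k, items, key=lambda item: item[1])
    return highest, lowest


highest, lowest = extreme_values(heights, k=3)
print("Highest:", highest)
print("Lowest: ", lowest)
```

Listing extremes does not decide whether a value is an error; it only points to the cases worth checking against the original records.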

💡Histogram

A histogram is a graphical representation of the distribution of data, showing the frequency of data points within different ranges or 'bins'. In the video, histograms are used to visualize the distribution of height and weight data, providing a quick visual summary of the data's shape and spread.
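
The binning behind a histogram can be sketched in a few lines of Python. The weights are hypothetical and the 10-pound bin width is an arbitrary choice for this example; statistical software normally picks bin widths automatically.

```python
from collections import Counter

# Hypothetical weights in pounds (not the data set from the video).
weights = [112, 118, 121, 125, 127, 131, 133, 134, 138, 140, 144, 149, 158]
bin_width = 10  # assumed width; chosen only to keep the example readable


def bin_start(value, width):
    """Return the lower edge of the bin that contains value."""
    return (value // width) * width


bins = Counter(bin_start(w, bin_width) for w in weights)

# Print one text bar per bin, including any empty bins in between.
for start in range(min(bins), max(bins) + bin_width, bin_width):
    print(f"{start}-{start + bin_width - 1}: " + "#" * bins[start])
```

Changing `bin_width` changes the apparent shape of the distribution, which is why it is worth trying more than one width before drawing conclusions.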

💡Stem-and-Leaf Plot

A stem-and-leaf plot is a type of chart used to display the distribution of data. It is particularly useful for preserving the original data points while showing their distribution. The video mentions stem-and-leaf plots as a way to visualize data distribution, offering a detailed view of the data without aggregating it into bins like in a histogram.
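
A minimal stem-and-leaf plot can be built in Python by splitting each value into a tens digit (the stem) and a ones digit (the leaf). The heights are hypothetical, and the layout is a simplified sketch; a statistical package may add frequency columns or split stems differently.

```python
from collections import defaultdict

# Hypothetical heights in inches (not the data set from the video).
heights = [59, 61, 62, 64, 64, 66, 67, 68, 70, 71, 72, 75]


def stem_and_leaf(values):
    """Group the ones digits (leaves) under their tens digits (stems)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)


plot = stem_and_leaf(heights)
for stem in sorted(plot):
    print(f"{stem} | {''.join(str(leaf) for leaf in plot[stem])}")
```

Every original value can be read back from the plot, for example stem 6 with leaf 4 is a height of 64, which is the property that distinguishes it from a histogram.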

💡Box Plot

A box plot, or box-and-whisker plot, is a standardized way of displaying the distribution of a dataset based on the five-number summary: minimum, first quartile (Q1), median, third quartile (Q3), and maximum. The video describes using box plots to compare distributions of data, such as height and weight, and to identify potential outliers.
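
The numbers behind a box plot can be sketched in Python as a five-number summary plus outlier fences. The weights are hypothetical, with one deliberately large value. The fences use the common 1.5 × IQR convention, and the quartiles use the "inclusive" method of Python's `statistics.quantiles`; both are assumptions of this sketch, and a statistical package may compute quartiles slightly differently.

```python
from statistics import quantiles

# Hypothetical weights in pounds; 185 is included as a deliberate outlier.
weights = [112, 118, 121, 125, 127, 131, 133, 134, 138, 140, 144, 149, 185]

# Quartiles by the "inclusive" method; other methods can give other values.
q1, median, q3 = quantiles(weights, n=4, method="inclusive")
iqr = q3 - q1

# Common convention: flag values more than 1.5 * IQR beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [w for w in weights if w < lower_fence or w > upper_fence]

summary = (min(weights), q1, median, q3, max(weights))
print("Five-number summary:", summary)
print("Fences:", lower_fence, upper_fence)
print("Flagged as outliers:", outliers)
```

In a drawn box plot, the box spans Q1 to Q3, the line inside it marks the median, and flagged values appear as separate points beyond the whiskers.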

💡Categorical Variable

A categorical variable is a variable that can take on one of a limited, and usually fixed, number of possible values, assigning each individual or other unit of observation to a particular group or nominal category on the basis of some qualitative property. In the video, gender is used as a categorical variable to split the analysis and calculate descriptive statistics separately for males and females.
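
Splitting descriptive statistics by a categorical variable amounts to grouping the cases first and summarizing each group separately. The Python sketch below illustrates this with hypothetical cases; the coding 1 = male and 2 = female is an assumption, and the same loop would work unchanged for three or more groups.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical cases as (gender code, height in inches).
# The coding 1 = male, 2 = female is assumed for this example.
cases = [(1, 70), (1, 72), (1, 68), (2, 64), (2, 66), (2, 62), (2, 65)]

# Collect the heights that belong to each gender code.
groups = defaultdict(list)
for gender, height in cases:
    groups[gender].append(height)

# Summarize each group separately.
summary = {
    code: {"n": len(values), "mean": mean(values), "sd": stdev(values)}
    for code, values in groups.items()
}

for code in sorted(summary):
    stats = summary[code]
    print(code, stats["n"], stats["mean"], round(stats["sd"], 2))
```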

💡Explore Function

The 'Explore' function in SPSS is a statistical tool used for obtaining detailed descriptive statistics and for identifying outliers and extreme values. The video script describes using the 'Explore' function to get a comprehensive set of statistics and to visualize data through various plots, enhancing the understanding of data distribution and characteristics.

Highlights

Introduction to the third video in SPSS for Beginners series from the RStats Institute at Missouri State University.

Overview of previously learned concepts: creating variables, entering data, frequency counts, measures of central tendency, and variability.

Explanation of Descriptive Statistics and their role in describing the characteristics of a dataset.

Mean and standard deviation as measures to understand the average and spread of data points.

Demonstration of how to calculate more descriptive statistics in SPSS using the 'Descriptives' option.

Guidance on selecting variables and using options to customize descriptive statistics output.

Discussion on the importance of Valid N (listwise) for assessing complete cases without missing data.

Analysis of gender data to illustrate why the mean and standard deviation are not meaningful for a binary categorical variable.

Use of height and weight data to demonstrate the practical application of mean and standard deviation for interval data.

Introduction of the 'Explore' function in SPSS for detailed descriptive statistics and data visualization options.

Instructions on how to generate histograms, stem-and-leaf plots, and box plots for data visualization.

Explanation of how to identify outliers and extreme values in a dataset.

Demonstration of splitting descriptive statistics by a categorical variable, such as gender, for comparative analysis.

Discussion on the flexibility of SPSS commands like 'Descriptives', 'Frequencies', and 'Explore' for different levels of data analysis needs.

Introduction to standardized values and z-scores in SPSS and their significance in data analysis.

Step-by-step guide on how to save z-scores as variables in SPSS for further analysis.

Explanation of z-scores as a method to understand the relative standing of data points in relation to the mean.

Anticipation of future videos on calculating z-scores manually and performing z-tests in SPSS.

Teaser for upcoming content on exploring correlations and measuring relationships between two scale variables.

Transcripts

00:08

Welcome to the third video in SPSS for Beginners from the RStats Institute

00:15

at Missouri State University. So far, we've learned how to create variables in

00:20

SPSS, enter data, do a frequency count, and calculate measures of central tendency,

00:26

and measures of variability. Collectively, these are called

00:30

"Descriptive Statistics," because they describe what the set of numbers looks

00:34

like. The mean tells us what is the average height, and the standard

00:39

deviation tells us how spread out the heights are around that average. We are

00:45

now going to learn a little bit more about descriptive statistics and how to

00:49

convert them to z-scores.

00:58

Continuing with the data set that we created in the first video, we're going

01:02

to get some more descriptive statistics. Go to Analyze -> Descriptive Statistics and

01:08

choose Descriptives. When this window pops up, move over all of the variables

01:16

except for the random ID number. Now click on Options. Here we are presented

01:25

with all kinds of options for descriptive statistics. Some - like mean

01:30

and standard deviation - are already checked by default.

01:34

Others - like Range or Sum - are available to check, if we want them.

01:40

I think that the default settings are good, so let's just continue and click OK.

01:48

Here in the output, we see some useful information. For instance, we see the

01:53

number of valid scores for each variable. We also see the Valid N (listwise), which

01:59

is the number of complete cases with no missing data. For gender, we see that the

02:05

minimum is 1; the maximum is 2. That is kind-of-useful, at least for

02:10

checking that we do not have any data entry errors, but the mean and standard

02:14

deviation for gender is pointless. The average of 1.58 for male and female

02:21

does not really tell us anything; it just hints that there were a few more females

02:25

than males. On the other hand, the mean and standard deviation for height and

02:30

weight can be very useful. For example, the average weight was 133 pounds. The

02:40

standard deviation was 13.4 pounds, which tells us that about

02:45

two-thirds of our participants are going to be between 13.4 pounds

02:50

heavier and 13.4 pounds lighter, than the mean of 133 pounds.

02:59

Let me show you one more way to get descriptive statistics. This is going to

03:03

give you even more detail about each variable, and more options for plotting

03:08

and statistics. Let me show you how. Go to Analyze -> Descriptive Statistics -> Explore.

03:18

I'm going to do this twice; the first time we're going to focus just on the scale

03:23

variables. Move height and weight into the dependent list box. Now click on

03:30

Statistics. Let's really push it this time: click on Outliers and Percentiles,

03:39

and then Continue. Now click on Plots and choose Histogram.

03:47

We are really going all out, so click Continue and OK. Look at all of this

03:55

data! We have every kind of descriptive statistic that you could dream of. We

04:00

even have a special box for Percentiles. We have another one that would identify

04:04

if we have any outliers or extreme values. And there's our old friend the

04:10

histogram. Plus, a small stem-and-leaf plot.

04:14

Plus a new graph called a box plot. And those are for height only. We have a

04:21

second set of graphs for weight. So, you can see that we get a lot of information

04:26

here. But wait...there's more! Let's split all of this by gender.

04:33

Go to Analyze -> Descriptive Statistics -> Explore. We're going to keep the same

04:40

settings that we had previously. Simply move gender into the factor list,

04:45

and then click OK. Now we see the same descriptive statistics have now been

04:54

calculated separately for males and females. Remember this splitting

05:00

would work similarly if we had three groups, or four, or more. Notice that we

05:05

have separate histograms for males and females, separate stem-and-leaf plots, and

05:12

the box plots are now side-by-side, so that we can do comparisons. Knowing about

05:19

these options for descriptive statistics can help us to visualize our data,

05:24

depending upon the level of information that we need. If you need only the basics,

05:31

use the Descriptives command. If you want flexibility to choose exactly what

05:38

output you get, use Frequencies. And if you want to know the exquisite details

05:44

or to split the analysis by a categorical variable, use Explore.

05:51

But there is something else that I want to show you about these variables. Let's run

05:56

one more analysis and I will show you how to get

06:00

standardized values. Go to Analyze -> Descriptive Statistics -> Descriptives.

06:09

First, I want to move gender out. We don't need that. Notice that there's a

06:14

little box down here that says "Save standardized values as variables." I'm

06:20

going to check that, and then click OK. In the output window, we see exactly the

06:27

same table that we had last time. But if you go back to your spreadsheet, you'll

06:32

notice two new variables. They are called "Z height" and "Z weight." You know what

06:41

those are? Those are the z-scores. SPSS converts each height measurement into a

06:47

standardized score that tells us how many standard deviation units this score

06:53

is away from the mean. Negative z-scores mean that a raw score is below average.

07:01

A positive z-score means that it is above average. So, that is a quick and easy way

07:06

to get z-scores in SPSS. This was just a quick introduction to z-scores. When

07:14

you are ready to learn more, check out these other videos from RStats

07:17

Institute and learn how to calculate z-scores by hand or how to do a z-test

07:23

using SPSS. Next, we are going to explore correlation and learn how to

07:31

measure the relationship between two scale variables.

07:45


Related Tags
Descriptive Statistics, SPSS Tutorial, Data Analysis, Z-Scores, RStats Institute, Measures of Variance, Central Tendency, Data Visualization, Statistical Methods, Educational Content