02 Descriptive Statistics and Frequencies in SPSS – SPSS for Beginners

Research By Design
5 Dec 2017 (14:01)

Summary

TLDR: This tutorial from RStats Institute introduces how to input and analyze data in SPSS. It covers entering data, understanding variable types like nominal and scale, and assigning value labels. The video demonstrates calculating frequencies, generating bar charts and histograms, and using descriptive statistics to explore data distributions. It emphasizes the importance of selecting appropriate statistical methods and visualizations for different types of data.

Takeaways

  • 📊 The video is a tutorial on how to enter and analyze data in SPSS, starting with basic data entry in Data View.
  • 🔑 The first column in the data represents a random identification number for participant anonymity.
  • 👥 The second variable is gender, coded as 1 for Male and 2 for Female, highlighting the concept of 'dichotomous' categorical variables.
  • 📈 The remaining columns are for quantitative variables like height and weight, which are set to Scale due to their ratio level measurement.
  • 🛠 It's possible to change variable names by double-clicking on them, which leads to Variable View for editing.
  • 🏷 Assigning value labels to categorical variables like gender helps in avoiding confusion about what each number represents.
  • 🔢 The video demonstrates how to toggle between numerical values and value labels in Data View for clarity.
  • 📊 Descriptive statistics like frequencies are easily calculated in SPSS using the Analyze menu, providing insights into the distribution of data.
  • 📈 The output from SPSS includes detailed statistics, but it's crucial to understand how to interpret the results, especially the difference between 'Percent' and 'Valid Percent'.
  • 📊 The video also covers how to create visual representations of data, such as bar charts and histograms, and the importance of choosing the right type of chart for the data.
  • 📊 Histograms with a normal curve overlay can help assess the distribution of quantitative data, like height, against a normal distribution.
  • 📊 The tutorial concludes with an introduction to comparing means between groups, such as male and female heights, using SPSS.

Q & A

  • What is the purpose of the random identification number in the data set?

    -The random identification number stands in for the names of the participants and keeps the data anonymous.

  • How is gender coded in the data set and what is the significance of using 1 and 2 for coding?

    -Gender is coded as 1 for Male and 2 for Female. The numbers are placeholders and do not indicate an order or quantity; they simply represent the two categories of gender.

  • What is the difference between nominal and scale variables in the context of this script?

    -Nominal variables, like ID and gender, represent categories without a fixed order. Scale variables, like height and weight, have fixed intervals between scores and a meaningful zero, and they are measured at a ratio level.

  • Why is it important to assign value labels to categorical variables like gender?

    -Assigning value labels helps avoid confusion about what each number represents in the data set, making it easier to interpret the data.

  • How can you toggle between numbers and value labels in Data View?

    -In Data View, click the value labels toggle button. Each click switches the display between the numeric codes (1, 2) and their labels (Male, Female).

  • What is the significance of the 'Sort Ascending' option in the data set?

    -The 'Sort Ascending' option helps to organize the data in ascending order, making it easier to see the range of values, such as the range of weights in the data set.

  • How does SPSS handle missing data in the analysis of frequencies?

    -SPSS provides separate columns for 'Percent' and 'Valid Percent' to differentiate between the total sample size and the valid sample size, which excludes missing values.

  • What is the recommended approach to report percentages in SPSS when there are missing values?

    -It is recommended to report the 'Valid Percent' unless there is a specific reason to report the 'Percent' based on the total sample size.

  • How can you generate a bar chart or histogram in SPSS from the output window?

    -You can generate a bar chart or histogram by selecting 'Charts' from the Frequencies dialog box, choosing the chart type, and then customizing the chart options before running the analysis.

  • What is the difference between a bar chart and a histogram for displaying data?

    -A bar chart is suitable for displaying nominal data with discrete categories, while a histogram is appropriate for scale data that can show the distribution and density of the data.

  • How can you compare means between different groups, such as males and females, in SPSS?

    -You can compare means between different groups by using the 'Compare Means' option in the Analyze menu, selecting 'Means', and specifying the independent (gender) and dependent (height) variables.

  • What is the importance of understanding descriptive statistics before running other types of analysis?

    -Understanding descriptive statistics allows you to get a preliminary understanding of your data, including its distribution, central tendency, and variability, which is crucial before conducting more complex analyses.
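
The difference between 'Percent' and 'Valid Percent' described in the Q & A above can be sketched in a few lines of Python. This is only an illustration, not SPSS itself: the function name `frequency_table` and the height values are hypothetical, chosen so that there are 10 valid scores and 2 missing ones, as in the video.

```python
from collections import Counter


def frequency_table(values):
    """Return {value: (count, percent, valid_percent)}; None marks a missing value."""
    total_n = len(values)                        # every case, missing or not
    valid = [v for v in values if v is not None]
    valid_n = len(valid)                         # cases that actually have a score
    table = {}
    for value, count in sorted(Counter(valid).items()):
        percent = 100 * count / total_n          # based on the total sample size
        valid_percent = 100 * count / valid_n    # based on the valid n only
        table[value] = (count, round(percent, 1), round(valid_percent, 1))
    return table


# Hypothetical heights in inches: 10 valid scores plus 2 missing (None).
heights = [62, 63, 64, 65, 65, 66, 67, 68, 68, 70, None, None]

for value, (count, pct, valid_pct) in frequency_table(heights).items():
    print(f"{value}: n={count}  Percent={pct}  Valid Percent={valid_pct}")
# The row for 65 prints: 65: n=2  Percent=16.7  Valid Percent=20.0
```

With no missing values the two denominators are equal, which is why the two columns match for gender in the video.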

Outlines

00:00

📊 Data Entry and Variable Understanding in SPSS

This paragraph introduces the process of entering data into SPSS following the creation of variables. It explains how to input data in Data View, the significance of each variable (ID, gender, height, and weight), and the importance of maintaining data anonymity. The speaker guides viewers on how to change variable names through Variable View and discusses the measurement levels of variables, such as nominal for ID and gender, and scale for height and weight. The paragraph also covers assigning value labels to categorical variables for clarity and demonstrates how to toggle between numerical codes and value labels in Data View. It concludes with a brief mention of sorting data to visualize the range of values.

05:01

📈 Descriptive Statistics and Data Visualization in SPSS

The second paragraph delves into analyzing data using SPSS's Analyze menu, focusing on calculating frequencies for categorical data. It details the process of selecting variables for analysis, the output provided by SPSS, and the importance of interpreting this output. The summary includes a discussion on valid sample sizes, missing data, and the distinction between 'Percent' and 'Valid Percent' in frequency tables. The paragraph also explores the creation of bar charts and histograms for visual data representation, emphasizing the importance of selecting appropriate statistical graphics for the type of data. It concludes with a demonstration of generating additional statistics like mean, standard deviation, and a normal curve overlay on histograms from the output window.

10:04

🔍 Advanced Data Analysis Techniques in SPSS

The final paragraph discusses advanced techniques for analyzing data in SPSS, such as comparing means between different groups. It explains how to use the 'Compare Means' function to examine differences in height between males and females, using gender as an independent variable. The paragraph highlights the output provided by this analysis, including separate means and standard deviations for each gender. It also touches on the importance of conducting preliminary data analysis through frequency counts, charts, and descriptive statistics before proceeding with more complex analyses. The speaker concludes by previewing the content of the next video, which will cover further descriptive statistics and the conversion of raw scores into z-scores.

Keywords

💡SPSS

SPSS, which stands for Statistical Package for the Social Sciences, is a software package used for statistical analysis in various disciplines. In the video, SPSS is the primary tool being demonstrated for beginners to learn how to enter data, manage variables, and perform basic statistical analyses. It is central to the video's theme of teaching data analysis.

💡Data View

Data View is a mode within SPSS that allows users to input and view data in a spreadsheet format. The script mentions entering numbers into SPSS in Data View, which is the initial step for data analysis, setting the stage for the rest of the tutorial.

💡Variable

In the context of statistics and SPSS, a variable is a characteristic or attribute that can vary from one observation to another. The video script discusses creating variables in the first video and later entering data for these variables, such as ID, gender, height, and weight, which are essential for understanding data structure.

💡Nominal

Nominal is a level of measurement that categorizes data into distinct groups without any inherent order. In the script, the 'ID' and 'gender' variables are identified as nominal because they represent categories like identification numbers and gender without a rank or interval scale.

💡Dichotomous

Dichotomous refers to a variable that has only two categories. The script uses this term to describe the 'gender' variable, which is coded as 1 for Male and 2 for Female, illustrating the concept of binary categorical variables.

💡Quantitative Variables

Quantitative variables are variables that represent quantities and can be measured on a numeric scale. In the video, height and weight are described as quantitative variables, which are important for statistical analysis because they allow for various mathematical operations and interpretations.

💡Scale

Scale in SPSS refers to a measurement level for quantitative data with equal intervals between values. It covers both interval and ratio variables; ratio variables additionally have a meaningful zero. The script mentions that both height and weight are set to Scale because they are ratio level variables, which means they can be summarized with statistics such as the mean and standard deviation.

💡Value Labels

Value labels are used in SPSS to assign a descriptive label to a numeric code. The script explains assigning 'Male' to the number 1 and 'Female' to the number 2 for the gender variable, which helps in interpreting the data without confusion.
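
As a rough analogy outside SPSS, value labels behave like a lookup table from codes to text. The sketch below is in Python; the names `GENDER_LABELS` and `show` and the four example rows are hypothetical, and only the 1 = Male, 2 = Female coding comes from the video.

```python
# Value labels for gender, as coded in the video: 1 = Male, 2 = Female.
GENDER_LABELS = {1: "Male", 2: "Female"}


def show(codes, use_labels):
    """Mimic the Data View toggle: return raw codes or their value labels."""
    if use_labels:
        return [GENDER_LABELS[code] for code in codes]
    return list(codes)


codes = [1, 2, 2, 1]  # hypothetical rows, for illustration only

print(show(codes, use_labels=False))  # [1, 2, 2, 1]
print(show(codes, use_labels=True))   # ['Male', 'Female', 'Female', 'Male']
```

The stored data never change; only the display does, which is why the codes remain placeholders rather than ranks or quantities.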

💡Frequencies

Frequencies in statistics refer to the number of occurrences of each value in a data set. The video script describes using SPSS to calculate frequencies for the gender variable, which is a fundamental step in understanding the distribution of categorical data.
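
A frequency table is just a count per category plus a percentage. The following Python sketch reproduces the gender counts reported in the video (5 males, 7 females, 12 valid scores). The variable names are assumptions, and the order of the cases is arbitrary.

```python
from collections import Counter

GENDER_LABELS = {1: "Male", 2: "Female"}

# Codes arranged to match the counts reported in the video: 5 males, 7 females.
gender = [1] * 5 + [2] * 7

counts = Counter(gender)
n = len(gender)  # no missing values, so total n equals valid n

gender_table = {
    GENDER_LABELS[code]: (counts[code], round(100 * counts[code] / n, 1))
    for code in sorted(counts)
}

for label, (count, percent) in gender_table.items():
    print(f"{label}: frequency={count}, percent={percent}")
# Male: frequency=5, percent=41.7
# Female: frequency=7, percent=58.3
```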

💡Descriptive Statistics

Descriptive statistics are used to summarize and describe the main features of a data set. The script mentions using descriptive statistics in SPSS to find measures like mean, standard deviation, minimum, and maximum for the height variable, which provides a quick overview of the data's central tendency and dispersion.
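
The statistics requested in the Frequencies dialog can be approximated with Python's standard `statistics` module. The ten heights below are hypothetical: the video does not list the raw scores, so these were chosen only to match the summary it reports (n = 10, mean 65.8, minimum 62, maximum 70, sum 658). The standard deviation and variance use the sample (n - 1) formulas.

```python
import statistics

# Hypothetical heights (inches), chosen only so the summary matches the
# figures quoted in the video: n = 10, mean 65.8, min 62, max 70, sum 658.
heights = [62, 63, 64, 65, 65, 66, 67, 68, 68, 70]

std_dev = statistics.stdev(heights)  # sample SD (n - 1 denominator)

summary = {
    "n": len(heights),
    "mean": statistics.mean(heights),
    "std_dev": std_dev,
    "variance": statistics.variance(heights),    # sample variance
    "std_error": std_dev / len(heights) ** 0.5,  # standard error of the mean
    "minimum": min(heights),
    "maximum": max(heights),
    "sum": sum(heights),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}" if isinstance(value, float) else f"{name}: {value}")
```

Missing values would have to be dropped before these calls; SPSS does that automatically and reports the valid n.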

💡Histogram

A histogram is a graphical representation of the distribution of a dataset. The script discusses generating a histogram for the height variable in SPSS, which helps visualize the distribution of quantitative data and assess its normality.
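
To make the idea of a histogram with a normal curve concrete, here is a small Python sketch that bins hypothetical heights and compares each bin's count with the count a normal distribution (same mean and standard deviation) would predict. The data, the 2-inch bin width, and the bin origin of 62 are all assumptions; SPSS chooses its own bins, so its chart will not necessarily match.

```python
import statistics
from collections import Counter

# Hypothetical heights (inches); the video does not list the raw scores.
heights = [62, 63, 64, 65, 65, 66, 67, 68, 68, 70]

BIN_WIDTH = 2   # assumed bin width
ORIGIN = 62     # assumed left edge of the first bin


def bin_start(x):
    """Left edge of the bin [start, start + BIN_WIDTH) that contains x."""
    return ORIGIN + BIN_WIDTH * ((x - ORIGIN) // BIN_WIDTH)


observed = Counter(bin_start(h) for h in heights)
normal = statistics.NormalDist(statistics.mean(heights), statistics.stdev(heights))


def expected_count(start):
    """Approximate normal-curve count for a bin: n * bin width * density at midpoint."""
    midpoint = start + BIN_WIDTH / 2
    return len(heights) * BIN_WIDTH * normal.pdf(midpoint)


for start in sorted(observed):
    bar = "#" * observed[start]
    print(f"{start}-{start + BIN_WIDTH}: {bar:<4} observed={observed[start]} "
          f"normal~{expected_count(start):.1f}")
```

Because the bins cover a continuous range, adjacent bars touch. A nominal variable such as gender has no such range, which is why the same chart makes no sense for it.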

💡Bar Chart

A bar chart is a graph that represents data with rectangular bars, where the length of each bar is proportional to the value it represents. The video script uses the bar chart as an example for displaying gender data, showing the count of males and females, which is appropriate for categorical data.

💡Compare Means

Compare Means is an SPSS menu for examining how averages differ across two or more groups. In the script, the video uses its Means option to report the mean height and standard deviation for males and females separately, which is a common first step in research on group differences. The Means output shown describes the groups; it is not itself a significance test.
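
Splitting a mean by group amounts to grouping the scores by the categorical variable and summarizing each group. The Python sketch below assumes hypothetical (gender, height) pairs, invented so that 5 males and 5 females have heights, 2 females are missing, and the overall mean is 65.8 as in the video. The per-group values themselves are not from the video.

```python
import statistics
from collections import defaultdict

GENDER_LABELS = {1: "Male", 2: "Female"}

# Hypothetical (gender code, height in inches) pairs; None marks a missing height.
cases = [
    (1, 66), (1, 67), (1, 68), (1, 68), (1, 70),
    (2, 62), (2, 63), (2, 64), (2, 65), (2, 65),
    (2, None), (2, None),
]

groups = defaultdict(list)
for code, height in cases:
    if height is not None:  # cases with a missing height are excluded
        groups[GENDER_LABELS[code]].append(height)

# (n, mean, sample standard deviation) for each group, then for everyone.
report = {
    label: (len(h), statistics.mean(h), statistics.stdev(h))
    for label, h in groups.items()
}
all_heights = [h for hs in groups.values() for h in hs]
report["Total"] = (
    len(all_heights),
    statistics.mean(all_heights),
    statistics.stdev(all_heights),
)

for label, (n, mean, sd) in report.items():
    print(f"{label}: N={n}, mean={mean:.1f}, SD={sd:.2f}")
```

The Total row uses the same valid scores as the overall descriptive statistics, so its mean and standard deviation should agree with them.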

Highlights

Introduction to the second video in the SPSS for Beginners series by RStats Institute at Missouri State University.

Teaching how to add data to SPSS by starting in Data View with the previously created variables.

Instructions to enter specific numerical data into the SPSS spreadsheet for practice.

Explanation of the importance of understanding what the data represents, including participant identification and demographic variables.

Demonstration of how to change variable names in SPSS using Variable View.

Clarification of the difference between nominal and quantitative variables with examples.

Description of assigning value labels to categorical variables for clarity in data interpretation.

Illustration of toggling between numeric codes and value labels in Data View for better data comprehension.

Guidance on sorting data in SPSS with "Sort Ascending" to view the range of values.

Introduction to analyzing data frequencies in SPSS using the Analyze menu.

Explanation of interpreting SPSS output, including the difference between total and valid percentages.

Tutorial on creating bar charts and histograms in SPSS for visual data representation.

Discussion on the appropriateness of statistical graphs and measures for different types of variables.

Demonstration of calculating and comparing means for different groups using SPSS.

Preview of the next video's content focusing on descriptive statistics and z-scores.

Transcripts

play00:06

Welcome to the second video in SPSS for Beginners from RStats Institute

play00:12

at Missouri State University. In our first video, we learned how to create

play00:17

variables in SPSS. The next step is to add some data, and we're going to begin

play00:22

in Data View. Here in Data View these are the same four variables that we created

play00:30

in the first video. So now we can add some numbers. Pause the video and enter

play00:40

these same numbers into your SPSS spreadsheet.

play00:54

Now that we have numbers, it's important to understand just what these data

play00:58

represent. The first column is a random identification number. It stands in for

play01:05

the names of the participants and keeps our data anonymous. This second variable

play01:12

is gender. And these last two columns represent the height and the weight for

play01:19

each participant. Even after you've named a variable, it's possible to change the

play01:24

variable names. Double-click on a variable name to change it. When you do,

play01:30

you will be taken to Variable View, which is where you actually will make the

play01:34

changes. We set the measure for each variable previously. The ID variable is

play01:41

nominal because it stands for a number. It stands in for a participant's name. The

play01:48

variable "gender" is also nominal, and we are going to code gender as 1 and 2,

play01:54

for Male and Female. When a categorical variable has only two categories, we call

play02:01

it "dichotomous." So the 1 and the 2 are categories. You can be in one category or

play02:06

the other. You can't be in both; you can't be in neither. These last two columns

play02:14

represent height and weight. Height and weight are both quantitative variables,

play02:19

not categorical. They are measuring something. They both have fixed intervals

play02:25

between the scores, and they both have a meaningful zero. Both height and weight

play02:32

are set to Scale because they are both ratio level. Before we begin analyzing

play02:38

these numbers, there is one other thing that we should do. For a variable like

play02:43

gender - where we did not code the 1 and the 2 - we don't want to get confused with

play02:48

who was male, who was female; what number stood for what. And so we are going to

play02:54

assign value labels for each level of this categorical variable.

play03:00

Click on Values. I'm going to tell SPSS to represent all of the 1's as Male and

play03:09

all of the 2's as Female. Of course, I could make 2 = Female or

play03:15

0 = Female; really any number that I wanted to, depending on the coding.

play03:20

The 1 does not mean that males are "first place." The 2 does not mean that females

play03:25

are twice as good. The number is only a placeholder; it does not indicate an order

play03:30

or a quantity. So, now click OK. In fact, when I return to Data View, you can

play03:37

see the numbers, but watch this: you see this button? Click it and you can toggle

play03:44

between numbers and value labels. Let's leave this set with the value labels on.

play03:49

It's just easier that way. Now we can look at our data. The height of our

play03:55

participant was measured in inches, and we have values between 60 and 70 inches,

play04:00

which is between five and six feet tall (1.5 to 1.8 meters). Weight was measured in

play04:08

pounds. Of course it might be easier to see the range if we sorted these data.

play04:13

Ctrl-click on Mac or right-click on PC and choose "Sort Ascending." All of these

play04:22

participants were between 116 and 153 pounds. Just for illustration, I'm going to

play04:29

pretend that there were two participants for whom we did not get their height or

play04:33

their weight; both of them female. Notice that when the numbers are toggled on, all

play04:40

that I need to do is type "2"; however, when the value labels are toggled on,

play04:47

I need to double-click and select "Female." So now I think we're ready to analyze

play04:55

these data. One of the simplest things that we can do is to count up how often

play05:00

things occur. For example, we want to know how many males and females were in

play05:05

our sample. We want their frequencies. This is easy enough to do in SPSS. We're

play05:11

going to use the Analyze menu. Whenever you

play05:16

run an analysis in SPSS, you use the Analyze menu. We can see that there are lots of

play05:22

options, each with their own sub-menus and sub-sub-menus. The one that we want

play05:30

is Analyze -> Descriptive Statistics -> Frequencies. This window pops up and you

play05:40

will see lots of windows of this type in SPSS. All of the variables that we have

play05:45

in our dataset are on the left, and the variables that we want to analyze go on

play05:50

the right. You can select a variable for analysis by clicking on its name and

play05:55

then clicking on this arrow between the boxes. Alternatively, you can also

play06:01

drag-and-drop, and in some cases you can double click. Let me show you just how

play06:07

easy it is to use SPSS: click OK. What we are seeing now is the output window, and

play06:16

here is something very important to know about SPSS, especially compared with

play06:19

other types of statistical software: SPSS will give you copious amounts of output,

play06:25

often more than you really need, and you need to know how to interpret that

play06:31

output. In SPSS, it is easy to run an analysis, but it takes some education to

play06:38

learn how to interpret the output. First, we see a summary of the variables in the

play06:48

box labeled "Statistics." We have 12 valid scores for gender with no missing data,

play06:54

but for height, we only have scores for 10 people, with 2 missing values.

play07:01

The valid sample size is the number of participants for whom we actually have

play07:06

scores. This first frequency table is for gender. The total tells us that we have

play07:13

12 valid scores. We see that there are 5 males and 7 females.

play07:20

Notice the columns for "Percent" and "Valid Percent." They are exactly the same. They are the

play07:25

same because we have no missing values for gender. This second frequency table

play07:31

is for height. Remember that we have missing values for height for two of our

play07:37

participants, so we see the valid total is 10. Two values are missing in the data

play07:43

set - called system missing - and the total is 12. We see that the Percent column is

play07:50

different than the Valid Percent column. The Percent column is calculated based

play07:56

on the total sample size of 12; the Valid Percent is calculated on the valid

play08:02

n of 10 people for whom we actually have data. I recommend reporting the valid

play08:09

percent column unless you have a specific reason why you need to report

play08:13

Percent. Well, this is a good start, but we can do better. Let's make some pictures

play08:21

of our data. I am going to run another analysis and I want you to see that you

play08:26

do not need to go back to the data set. You can run a new analysis from the

play08:31

output window, as well. Just click on Analyze -> Descriptive Statistics ->

play08:39

Frequencies. You can see that our previous analysis is still in the window.

play08:45

We could clear it by clicking on this Reset button, but let's just continue

play08:50

with these data. So this time, click on Charts, and then under Chart Type, click

play08:58

on Bar Charts. Let's change Chart Values to Percentages. Click Continue, but

play09:05

before you click OK, let's turn off the frequency tables

play09:09

because we already have those. Now, click OK.

play09:14

In the output window, we see that the chart for gender looks really good.

play09:19

We have two distinct bars, one for male, one for female, and we can estimate the

play09:25

percentages of each. But when we look at the bar chart for height, the options

play09:31

just don't look as good. We can definitely do better. Let's run

play09:36

another analysis. Click on Analyze -> Descriptive Statistics -> Frequencies.

play09:45

This time, click on Charts, and then under Chart Type, click on Histogram. Let's also

play09:53

choose "Show normal curve on histogram." Notice that the chart values are now

play09:58

gray, because we don't need them. Click continue, but before clicking OK, let's do

play10:04

one more thing. Click on Statistics. Here we can choose other options like the

play10:09

mean, the standard deviation, the minimum, and maximum. We could also get the

play10:14

variance, the standard error of the mean, and the sum is good, too. As you see, we

play10:21

can pick as many of these options as we would like. If we change our mind, we can

play10:25

unselect them, too. Click continue and then OK. In the output

play10:34

window, we see all of the statistics that we asked for. For example, the average

play10:39

height was 65.8 inches. The tallest person? 70 inches tall. The shortest? 62

play10:46

inches tall. If we added up all of their heights, they would total 658 inches.

play10:55

But notice this first histogram for gender. It just doesn't look good, not like it

play11:00

did with the bar chart. The bars are connected, but gender is supposed to be

play11:04

discrete categories. We no longer see the labels for males and females. And the

play11:10

normal curve makes absolutely no sense. On the other hand, the histogram for

play11:17

height is much improved. The bars touch, indicating that the data are connected,

play11:24

and the superimposed normal curve makes sense with these data. We can see that

play11:30

the shape of the data match reasonably well with a normal distribution. The

play11:37

important thing to learn here is that you should choose the statistics and the

play11:42

graphs that are appropriate to your data. A nominal variable like gender should be

play11:49

reported with frequencies and a bar chart. Scale variables like height should

play11:56

be reported with a mean, standard deviation, and a histogram. We know that

play12:03

the average height for all participants is 65.8 inches, but

play12:08

what if we want to split that by males and females? Let me show you how. Click on

play12:14

Analyze, but instead of Descriptive Statistics, choose Compare Means and this

play12:20

first option, simply labeled, "Means." Here we have the options for dependent

play12:27

variables. "Layers" refers to the independent variable, or categorical

play12:32

variable. We did not really assign people to the condition called gender, so gender

play12:38

would really be what is called a "quasi independent variable." Still, we will use

play12:44

gender as our independent variable. We want to examine differences in height, so

play12:51

height will be the dependent variable. Now click OK. We can see the means and

play12:59

the standard deviations from males and females separately and together. There

play13:06

are 5 each for males and females, 10 total. We see that males were a few

play13:12

inches taller on the average than females.

play13:15

The total mean and standard deviation here are the same as the values that we

play13:20

got earlier using the Frequencies command.

play13:25

Overall, frequency counts, charts, and descriptive statistics are a great way

play13:30

to take a peek at your data and see just what you have. It's a good idea to do

play13:36

this before running any other kind of analysis. In our next video, we will

play13:41

look a little bit more at these descriptive statistics and how to

play13:46

convert raw scores into z-scores. I'll see you then.

play13:57

Related Tags
SPSS Tutorial, Data Analysis, Beginners Guide, Gender Coding, Height Weight, Variable Names, Data Entry, Descriptive Stats, Histograms, Bar Charts, RStats Institute