Normal Data Analysis with Software Part 1

Matt Teachout
16 Apr 2020 (12:36)

Summary

TL;DR: In this tutorial, Matt demonstrates how to use statistical software, specifically StatKey and Statcato, to analyze normal quantitative data. Using a dataset of wrist circumferences from 40 women, he walks through importing the data, calculating descriptive statistics such as the mean and standard deviation, and visualizing the data with histograms and dot plots. The lesson emphasizes checking that the data is bell-shaped before relying on the mean and standard deviation. The video is a hands-on guide to data analysis without manual calculations.

Takeaways

  • The video introduces how to use software to analyze normal quantitative data, specifically focusing on using the mean and standard deviation.
  • The speaker emphasizes the importance of using software to handle calculations, rather than doing them by hand.
  • The dataset used for the analysis is health data, focusing on the wrist circumference of 40 randomly selected women, measured in inches.
  • The speaker demonstrates how to copy the data from an Excel file and paste it into analysis software for processing.
  • StatKey is the first tool used to calculate the mean (5.067) and standard deviation (0.331) of the wrist circumference data.
  • The importance of checking the shape of the data is highlighted, and a histogram is generated to confirm the data has a normal distribution.
  • The speaker reduces the number of bins in the histogram to better visualize the normal distribution, which is bell-shaped with symmetry on both sides.
  • A comparison between the mean and median is made to confirm that the data is not skewed, as they are close in value.
  • The second software, Statcato, is introduced, and similar calculations (mean and standard deviation) are performed using the same data.
  • The speaker explains that the mean plus or minus one standard deviation captures about 68% of the data, showing the typical range of wrist circumferences in this dataset.

Q & A

  • What type of data is being analyzed in the video?

    -The video analyzes normal quantitative data, specifically focusing on the wrist circumference of 40 randomly selected women, measured in inches.

  • Why is software used to calculate the mean and standard deviation?

    -Software is used to calculate the mean and standard deviation to avoid manual calculation errors and speed up the process, especially when working with large datasets.

  • What software tools are demonstrated in the video for data analysis?

    -The video demonstrates two tools, StatKey and Statcato, for calculating descriptive statistics like the mean and standard deviation and for creating histograms and dot plots.

  • How does the presenter suggest pasting data into StatKey?

    -The presenter suggests clicking 'Edit Data' in StatKey, deleting any existing data using Ctrl+A (or Command+A on a Mac), and then pasting the new data with Ctrl+V (or Command+V).

  • What steps are recommended for checking if data is normally distributed?

    -To check for a normal distribution, the presenter recommends creating a dot plot or histogram in StatKey or Statcato and adjusting the number of bars (bins) to see whether the highest bar is in the middle and the tails are roughly symmetric.

  • What do the mean and standard deviation represent in the context of this dataset?

    -The mean (5.067 inches) represents the average wrist circumference of the women in the sample, while the standard deviation (0.331 inches) indicates the spread of the wrist circumference values around the mean.

  • Why is it important to assess the shape of the dataset before using the mean?

    -The shape shows whether the mean can be trusted. As the presenter explains, the mean is an accurate average only when the data is roughly normal (bell-shaped); in skewed data the mean gets pulled in the direction of the skew.

  • What additional statistics can be calculated using Statcato?

    -In addition to the mean and standard deviation, Statcato can calculate other statistics like the minimum, maximum, median, and sample size (n).

  • What is the significance of the mean and median being close to each other?

    -When the mean and median are close to each other, it suggests that the data is symmetric and not skewed, indicating that the mean is a reliable measure of central tendency.

  • How does the video demonstrate calculating the range of typical values?

    -The video calculates the range of typical values by subtracting the standard deviation from the mean and adding it to the mean. The resulting range (4.736 to 5.398 inches) covers about the middle 68% of the data when the data is normally distributed.
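The arithmetic behind the typical range can be checked in a few lines. This is only a sketch in Python, which the video does not use; the function name `typical_range` is made up for illustration, and the two inputs are the values reported in the video.

```python
# Values reported in the video for the 40 wrist measurements (inches).
mean = 5.067
sd = 0.331


def typical_range(mean, sd):
    """Return (mean - sd, mean + sd), rounded to 3 decimal places."""
    return round(mean - sd, 3), round(mean + sd, 3)


low, high = typical_range(mean, sd)
print(f"Typical values: {low} to {high} inches")
```

The rounding only hides floating-point noise; the endpoints match the 4.736 and 5.398 quoted in the video.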

Outlines

00:00

Introduction to Analyzing Quantitative Data with Software

In this section, Matt introduces the topic of using software to analyze normal quantitative data, focusing on why it is more efficient to use computer programs rather than manual calculations. He provides an example using health data from his website, which includes various statistics like age, height, and wrist circumference for 40 men and 40 women. Matt walks through the process of selecting the 'women's wrist circumference' data from an Excel file, copying it, and preparing it for analysis using statistical software tools like StatKey.
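The copy-and-paste step amounts to handing the software one column of text whose first line is a title. A rough Python sketch of that idea follows; the function `parse_pasted_column`, its `header_row` flag, and the three sample numbers are all hypothetical and are not taken from StatKey, Statcato, or the actual health data.

```python
def parse_pasted_column(text, header_row=True):
    """Split pasted single-column text into (title, list of numbers)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # With a header row, the first non-empty line is the column title.
    title = lines.pop(0) if header_row and lines else None
    values = [float(line) for line in lines]
    return title, values


# Hypothetical clipboard contents; NOT the real health data.
pasted = """Women Wrist (in)
5.0
4.7
5.3
"""

title, values = parse_pasted_column(pasted)
print(title, values)
```

The `header_row` flag plays the same role as the "header row" checkbox described in the video: leave it on when the pasted column starts with a title.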

05:01

Understanding Data Shape and Normality

Here, Matt explains the importance of knowing the shape of the data to determine if the mean and standard deviation can be used effectively. He discusses how to visually inspect the data's normality using histograms and dot plots. Matt demonstrates adjusting the number of histogram bins (called 'buckets' in StatKey) and explains how the highest bar in the middle and symmetry in the tails signify a normal, bell-shaped distribution, which justifies using the mean and standard deviation.
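To see why the bin count matters for a small data set, here is a minimal Python sketch that counts values in equal-width bins and prints a text histogram. The function `bin_counts` and the 14 sample values are invented for illustration; StatKey and Statcato may place bin edges differently.

```python
def bin_counts(values, bins):
    """Count values in `bins` equal-width intervals spanning min..max.

    Assumes the values are not all identical (otherwise the width is 0).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value belongs in the last bin, not in a new one.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts


# Hypothetical measurements in inches; not the video's dataset.
sample = [4.2, 4.6, 4.7, 4.8, 4.9, 5.0, 5.0, 5.1, 5.1, 5.2, 5.3, 5.4, 5.5, 5.8]

for bins in (3, 5):
    counts = bin_counts(sample, bins)
    print(bins, "bins:", counts)
    for c in counts:
        print("  " + "#" * c)
```

With either 3 or 5 bins the tallest bar is the middle one and the counts fall off on both sides, which is the bell shape the video looks for.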

10:02

Calculating Basic Statistics with StatKey and Statcato

In this section, Matt compares StatKey and Statcato for calculating statistics like the mean and standard deviation. He walks through pasting the wrist circumference data into both tools and explains how to widen columns and change the number of bins in Statcato. He reiterates that fewer bins (around three or five) work better for small data sets, and both tools show a roughly normal distribution. He notes that a mean and median that are close together are a good sign the data is not skewed.
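The mean-versus-median check can be tried on toy data. The sketch below uses Python's standard `statistics` module; the helper `mean_median_gap` and both lists are hypothetical and only illustrate how one large value pulls the mean away from the median.

```python
from statistics import mean, median


def mean_median_gap(values):
    """Return (mean, median, mean - median) for a list of numbers."""
    m, med = mean(values), median(values)
    return m, med, m - med


# Hypothetical data, for illustration only.
symmetric = [4.6, 4.8, 5.0, 5.0, 5.2, 5.4]
right_skewed = [4.6, 4.8, 5.0, 5.0, 5.2, 9.0]

for name, data in (("symmetric", symmetric), ("right-skewed", right_skewed)):
    m, med, gap = mean_median_gap(data)
    print(f"{name}: mean={m:.3f} median={med:.3f} gap={gap:.3f}")
```

A gap near zero is consistent with symmetric data; it does not prove the data is normal, so the histogram check is still worth doing.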

Using Statistical Tools to Calculate and Interpret Data

Matt discusses the process of finding essential statistics, such as the mean, standard deviation, minimum, maximum, and sample size (n), using the Statistics menu in Statcato. He explains why these values matter for understanding the data and shows that adding and subtracting the standard deviation from the mean gives the 'typical values' that cover about 68% of the data. Finally, he sorts the wrist data in ascending order and picks out the typical values to see them in the data.
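The same workflow (compute n, mean, standard deviation, min, and max, then sort and pick out the typical values) can be sketched in Python. The functions `describe` and `typical_values` and the ten sample numbers are hypothetical; the sketch assumes the sample standard deviation, which is what `statistics.stdev` computes.

```python
from statistics import mean, stdev


def describe(values):
    """Return n, mean, sample standard deviation, min, and max."""
    return {
        "n": len(values),
        "mean": mean(values),
        "sd": stdev(values),  # sample SD (divides by n - 1)
        "min": min(values),
        "max": max(values),
    }


def typical_values(values):
    """Return the sorted values within one standard deviation of the mean."""
    stats = describe(values)
    low = stats["mean"] - stats["sd"]
    high = stats["mean"] + stats["sd"]
    return [v for v in sorted(values) if low <= v <= high]


# Hypothetical wrist measurements (inches); not the video's dataset.
sample = [5.4, 4.6, 5.0, 4.2, 5.1, 4.9, 5.8, 5.0, 5.2, 4.8]

print(describe(sample))
print(typical_values(sample))
```

In this made-up sample, 8 of the 10 values fall in the typical range; small samples will not land on exactly 68%.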

Keywords

Quantitative Data

Quantitative data refers to data that can be measured numerically, such as heights, weights, and ages. In the video, the data analyzed includes the wrist circumferences of women, and the instructor uses this data to demonstrate statistical analysis using software. This concept is central to the video as it deals with how to process and analyze numeric information using mean, standard deviation, and other statistical measures.

Mean

The mean is the average of a set of numbers, calculated by summing all the values and dividing by the total count. In the video, the mean wrist circumference for the sampled women is calculated using statistical software. The mean is crucial because it provides a central value around which other data points are compared, helping determine the 'typical' wrist size in the sample.

Standard Deviation

Standard deviation measures the spread or variability of a dataset, indicating how much individual data points differ from the mean. In the video, the standard deviation for the wrist circumference data is 0.331 inches, which shows how the wrist measurements vary around the mean. This concept is key to understanding the distribution of data and whether it is concentrated around the mean or more spread out.
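For reference, the two statistics can be written as formulas. The video does not show them, so treat this as an assumption: the form below is the sample standard deviation (dividing by n - 1), the usual convention in introductory statistics software, with n = 40 for the wrist data.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```

Here each x_i is one wrist measurement, x-bar is the mean, and s is the standard deviation.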

Normal Distribution

A normal distribution is a bell-shaped curve where most data points are concentrated around the mean, with fewer points as you move away. In the video, the instructor demonstrates how to determine if the wrist circumference data follows a normal distribution using a histogram and dot plot. This concept is essential because many statistical methods, like using the mean and standard deviation, assume normality in the data.
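The "about 68%" figure for one standard deviation comes from the normal curve itself and can be verified with Python's standard library (`statistics.NormalDist`, available since Python 3.8). The helper name `share_within` is made up for this sketch; the mean and standard deviation passed in are the values reported in the video.

```python
from statistics import NormalDist


def share_within(k, mu=0.0, sigma=1.0):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


# Mean and standard deviation reported in the video (inches).
print(round(share_within(1, 5.067, 0.331), 4))  # 0.6827
```

The result does not depend on the particular mean or standard deviation, which is why the same 68% rule applies to any normal data set.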

Histogram

A histogram is a type of bar graph that represents the frequency of data points in different intervals (or 'bins'). In the video, the instructor uses a histogram to visually assess whether the wrist data is normally distributed. The number of bins in the histogram is adjusted to better fit the small sample size, showing how the shape of the distribution is revealed through graphical representation.

Dot Plot

A dot plot is a simple graphical representation where each data point is shown as a dot along a scale. In the video, a dot plot is used alongside the histogram to provide another visual way of assessing the distribution of the wrist circumference data. This helps to visually confirm whether the data is normal, with most dots clustering in the middle.
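A dot plot is simple enough to approximate in text. The Python sketch below stacks one asterisk per observation next to each distinct value; the function `dot_plot` and the sample numbers are hypothetical, and unlike a true dot plot it lists only the values that occur rather than drawing a continuous scale.

```python
from collections import Counter


def dot_plot(values, decimals=1):
    """Return lines like '5.0 | ***', one per distinct rounded value."""
    counts = Counter(round(v, decimals) for v in values)
    return [
        f"{value:.{decimals}f} | {'*' * counts[value]}"
        for value in sorted(counts)
    ]


# Hypothetical measurements in inches; not the video's dataset.
sample = [4.7, 4.9, 5.0, 5.0, 5.0, 5.1, 5.1, 5.3]

for line in dot_plot(sample):
    print(line)
```

Most of the marks pile up near the middle value, which is the pattern the video points to in the StatKey and Statcato dot plots.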

StatKey

StatKey is an online tool used to perform statistical calculations and create graphs, mentioned frequently in the video. The instructor uses StatKey to compute the mean and standard deviation and to generate visualizations like histograms for the wrist circumference data. This tool is central to the video as it performs the 'heavy lifting' of data analysis without requiring manual calculations.

Statcato

Statcato is another statistical software tool introduced in the video for conducting basic statistical analysis. The instructor demonstrates how to use Statcato to input data, create histograms, and compute basic descriptive statistics such as the mean and standard deviation. Statcato provides an alternative to StatKey, showing multiple ways of approaching the same problem with different software.

Descriptive Statistics

Descriptive statistics involve summarizing and organizing data using measures like mean, median, and standard deviation. In the video, the instructor uses descriptive statistics to summarize the wrist circumference data for the group of women. The goal is to provide a clear, concise summary of the dataset, helping to identify key features such as central tendency and variability.

Sample Size

Sample size refers to the number of data points in a dataset, which affects the reliability and accuracy of statistical results. In the video, the sample size consists of 40 women, and understanding this is critical when assessing whether the data is representative of a larger population. The sample size influences the interpretation of statistical measures, including how well the data fits a normal distribution.

Highlights

Introduction to using software for analyzing normal quantitative data.

Focus on not calculating mean and standard deviation by hand, but rather using computer programs.

Example data set used: health data from 40 women and 40 men, including age, height, and weight.

Specific analysis on women's wrist circumference data in inches.

Emphasis on not altering the raw data; the column is copied into a new data set for analysis.

Use of StatKey software to calculate the mean and standard deviation from pasted data.

Initial results: mean wrist circumference is 5.067 inches, and standard deviation is 0.331 inches.

Understanding data shape: importance of normal or bell-shaped distribution to validate the use of mean and standard deviation.

Visualizing data using histograms in StatKey to confirm a normal distribution.

Tips for adjusting histogram bars (bins) in StatKey to better represent small data sets.

Discussion on identifying normal distribution based on symmetrical histogram with the highest bar in the middle.

Comparison between mean and median to check that the data is not skewed.

Use of Statcato software to visualize data and calculate descriptive statistics like mean, standard deviation, and sample size.

Explanation of how mean and standard deviation give typical data values and how to calculate these typical values.

Final data insights: values between 4.736 and 5.398 inches are considered typical for wrist circumference.

Transcripts

00:00

Hi everyone, this is Matt Teachout with Intro Stats, and today we're going to look at how to use software to analyze normal quantitative data. We've seen the theory behind that: we want to use the mean and the standard deviation when the data looks normal. But again, we don't actually want to calculate this stuff by hand; we want a computer program to do the heavy lifting for us.

00:27

Let's take a look at an example. This again is my website, matt-teachout.org, and we need some data, so I'm going to go to the Statistics tab and click on Data Sets. The data we're going to look at today is the health data. It's right here, Health Data, and I'm just going to open the Excel file.

00:59

Now this file actually has a lot of data. It has random sample data from 40 women and 40 men, and it has the data separated, so you can see the men's ages, heights, and weights and the women's ages, heights, and weights, and things like that. It also has the combined data, with all 80 of the men and women. So it's quite a big data set; when you're looking for something, just keep scrolling to the right until you find what you're looking for.

01:32

The data I'm going to look at today is women's wrist circumference in inches. They measured how far around the wrist is for these 40 randomly selected women, in inches.

01:51

Again, whenever you're working with data, it's a good idea not to mess up the raw data, so I'm just going to copy this. We learned last time that you can hold your cursor above the column so it turns into a downward arrow, then left-click, and it'll highlight the entire column for you. Then push Ctrl+C, or Command+C if you're on a Mac, and we're going to go ahead and paste that into a new data set here.

02:28

There we go, there's the new data set. To have the computer calculate, I'm going to use Statcato and StatKey to find the mean and standard deviation. We'll start with StatKey. Again, you go to lock5stat.com; that's where StatKey is hosted. You don't need to save it, so you can just open it and click on the StatKey button.

03:01

Now this was one quantitative data set, so we're going to click on "One Quantitative Variable". It's under "Descriptive Statistics and Graphs", on the top left.

03:20

Whenever you're pasting data into StatKey, you're usually going to click the Edit Data button. Right now this looks kind of weird: you'll see the numbers, the quantitative data, but there's also a word next to every number. That's sometimes called an identifier. These were how long certain animals live, in years, and the word shows which animal each number came from. Most of my data sets don't have an identifier.

04:03

So what I'm going to do is delete this data. You can click and drag down if you want, but I just like to push Ctrl+A. Ctrl+A will highlight everything, even if you've got a really huge data set, and then just push Delete. So Ctrl+A, Delete.

04:24

Now I want to paste in the wrist data from the 40 randomly selected women. If you look, this data does have a title. A title means "header row" in StatKey, so you leave "header row" checked. But it does not have an identifier, a word next to every number, so you want to uncheck that one, and then just push OK. And there we go.

04:51

Notice right away it calculated that there were 40 women; that's the sample size. We see the mean was 5.067 and the standard deviation is 0.331. But we also have to know what the shape of this data set looks like to know whether we can use the mean and standard deviation. What we said was that it needed to be normal, or bell-shaped. We can see from the dot plot that it does look sort of normal: we've got more dots in the middle, and as we get away from the middle we have fewer and fewer dots. But it's better to look at the histogram.

05:28

Let's take a look at the histogram. The number of bars is a lot for a data set this small. We only have 40 women in this data set, so 10 bars, or 10 bins, or as StatKey calls them, buckets, is a lot of bars for this small of a data set. So I like to decrease the number of buckets. You can play around with this a little bit, but you can see right away that it looks pretty normal. Also, the mean, which is the average, 5.067, is pretty close to the highest bar; it's sort of right on the highest bar. This is what we call normal or bell-shaped: the highest bar is in the middle, and the left tail is about symmetric with the right tail. That's how you know normal when you see it.

06:24

You can also decrease the number of buckets, the number of bins, with this slider. Here's five bars, or even three bars. Three bars works pretty well, especially if the data set is very small, but this data set looked normal even at five bars or even at seven bars. We definitely have the highest bar in the middle, and it looks pretty symmetric, so this is normal. What we learned in our theory is that that's the only time the mean is actually an accurate average, so we are allowed to use the mean.

07:00

One thing to take a look at is that the mean and the median are actually very close. When we get to our discussions about skewed data, the mean gets pulled in the direction of the skew. But if the mean and the median are very close, that's usually a good sign that the mean is a somewhat accurate average. Also, standard deviation is going to be our spread for normal data. So we know the mean is 5.067 and the standard deviation is 0.331.

07:39

Now I could have found this on Statcato as well. Whenever you're copying and pasting into Statcato, you always want to make sure the title is in the gray row. So I'm going to paste in that data; there it is, the women's wrist data. By the way, if you put the cursor in between where it says C1 and C2, you'll see it turns into a sideways double arrow, and you can click and drag the column open to see the title better.

08:08

Let's take a look at our graph. To graph this and see what the shape is, I would go to Graph and Histogram, and then you just click on the column that you want to graph. This button down here, Show Legend, is a good one; it puts a title on the graph for you. I always like my graphs to have titles. Then there's the number of bins. I know StatKey called them buckets, but Statcato calls them bins. Ten bins is quite a few, so I'm going to decrease this. Even three bars would probably work pretty well, or five or seven; three usually works pretty well if you have a small data set. And we can see it does look normal: the highest bar is definitely in the middle, and it looks pretty symmetric.

09:02

We can also have a dot plot if we want: Graph and Dot Plot, then click on the column you want. Again, Show Legend will put a title on it, and there's our dot plot. Even the dot plot looks kind of normal.

09:22

Now how do we find statistics in Statcato? It's under the Statistics menu: you go to Statistics, Basic Statistics, Descriptive Statistics. If you just want to calculate some basic statistics for a quantitative data set, that's where you'd look.

09:44

It will ask you for an input variable; it wants the column. Sometimes you can click on the column, but this one doesn't let you, so we have to type it in. C1 stands for column one; it's asking what column your data is in.

10:06

Notice there are all kinds of different statistics it will calculate. What I always do is find the shape first and then identify which of these statistics I want based on the shape. Since we already know this data was normal, we're going to want the mean and the standard deviation, right there. It's also nice to have the min and the max, because those might be outliers. And I also like to know how many numbers there were, so I like this button, N total. If you remember, the letter n stands for sample size, or how many numbers are in your data set. When I have a normal data set, these are the statistics I like to find.

10:50

I just push OK, and there we go, it calculated it for us. Notice it got the same numbers StatKey gave us: the mean was 5.067 and the standard deviation is 0.331. The min was 4.2, the max was 5.8, and for N total, we had 40 women in the data set.

11:12

What I want to show you, though, is what these numbers mean in terms of the theory we learned last time. Last time we learned that if you subtract the standard deviation from the mean and add it to the mean, you get the typical values. So 5.067, the mean, minus 0.331, the standard deviation, gives us 4.736, and the mean plus the standard deviation gives us 5.398. If you remember, I mentioned last time that that's about the middle 68% of the data.

11:43

I just want to show you where that lands; it's good to have a visual of the data. The one thing is that this data is not in order, so what I'd like to do is put it in order. I'm going to highlight this data set, go to Home, and right here you'll see Sort, and I'm going to sort smallest to largest. That just makes it easier for me to see this visually.

12:13

Anything between 4.736 and 5.398 is going to be considered typical. 4.7 would be less, so it has to be between 4.736 and 5.398. So there's our 4.8

Related Tags
Data Analysis, Quantitative Data, StatKey, Statcato, Health Data, Statistics, Descriptive Stats, Data Visualization, Normal Distribution, Standard Deviation