Standard Deviation

Bozeman Science
1 Dec 2012 · 07:49

Summary

TLDR: In this educational video, Mr. Andersen explains the concept of standard deviation, its importance in understanding data spread, and its relation to the normal distribution curve. He demonstrates how to calculate standard deviation manually using a simple dataset and then shows a quick method using Excel. The video aims to demystify the process, highlighting that while the formula may seem daunting, the calculation is straightforward and essential for analyzing data variability.

Takeaways

  • 📊 Standard Deviation is a crucial statistic in data analysis, representing the amount of variation or dispersion in a set of values.
  • 📈 Understanding the normal distribution is key to grasping standard deviation, which is often visualized as a bell-shaped curve.
  • 🔢 Standard deviation quantifies the spread of data points around the mean; in a normal distribution, about 68% of the data falls within one standard deviation of the mean.
  • 🌟 In a normal distribution, approximately 95% of data points lie within two standard deviations of the mean, and about 99.7% (rounded to 99% in the video) within three standard deviations.
  • 📉 The value of standard deviation can vary significantly depending on the dataset, affecting the shape of the normal distribution curve.
  • 🧮 Calculating standard deviation by hand involves summing the squared differences from the mean, dividing that sum by n-1, and taking the square root of the result.
  • 📝 The formula for standard deviation includes dividing by 'n-1', where 'n' is the number of data points, to account for degrees of freedom.
  • 💻 Spreadsheet software like Excel can simplify the calculation of standard deviation using built-in functions like STDEV.
  • 📋 The script provides a step-by-step guide to calculating standard deviation both manually and using a spreadsheet for different datasets.
  • 🔑 The video script emphasizes the importance of understanding and calculating standard deviation for analyzing and interpreting data in scientific contexts.

Q & A

  • What is the main topic of Mr. Andersen's video?

    -The main topic of Mr. Andersen's video is Standard Deviation, including its conceptual understanding, calculation by hand, and calculation using a spreadsheet.

  • Why is standard deviation important in data analysis?

    -Standard deviation is important because it measures the spread or variation in a set of data, which helps in understanding the dispersion of the data points around the mean.

  • What is the normal distribution Mr. Andersen mentions?

    -The normal distribution, also known as the bell curve, is a symmetric probability distribution in which values near the mean are the most likely and values become less likely the farther they are from the mean.

  • What percentage of individuals are typically within one standard deviation from the mean in a normal distribution?

    -Approximately 68% of individuals are within one standard deviation from the mean in a normal distribution.

  • How does the percentage change when considering two standard deviations from the mean?

    -When considering two standard deviations from the mean, about 95% of individuals fall within this range in a normal distribution.

  • What does the standard deviation signify about the spread of data in a set?

    -A higher standard deviation signifies a greater spread of data points, indicating more variation from the mean, while a lower standard deviation indicates less variation and a tighter clustering of data points around the mean.

  • What is the formula for calculating standard deviation by hand?

    -The formula for calculating standard deviation by hand involves taking the square root of the sum of the squared differences between each data point and the mean, divided by the degrees of freedom (n-1).

  • What does Mr. Andersen use to demonstrate the calculation of standard deviation?

    -Mr. Andersen uses a simple data set of five numbers (1, 2, 3, 4, 5) to demonstrate the calculation of standard deviation.

  • How does Mr. Andersen calculate the mean of the data set in the video?

    -Mr. Andersen calculates the mean by adding all the data points together and then dividing by the total number of data points (n), which in this case is 5.

  • What spreadsheet function does Mr. Andersen use to calculate the standard deviation?

    -Mr. Andersen uses the 'STDEV' function in Excel to calculate the standard deviation of a data set.

  • How does Mr. Andersen show the ease of calculating the mean and median in a spreadsheet?

    -Mr. Andersen demonstrates the ease by using the 'AVERAGE' and 'MEDIAN' functions in Excel, which automatically calculate the mean and median when the data set is selected.

Outlines

00:00

📊 Understanding Standard Deviation

In this paragraph, Mr. Andersen introduces the concept of standard deviation as a crucial statistic in data analysis, particularly in a scientific context. He explains that standard deviation measures the spread or variation in a dataset, which is visualized as a bell-shaped curve in a normal distribution. The mean or average is the central point of this curve, and the standard deviation quantifies how spread out the data points are from the mean. Mr. Andersen uses the example of the average height of men in the United States to illustrate how data points are distributed around the mean, with about 68%, 95%, and 99.7% (rounded to 99% in the video) of individuals falling within one, two, and three standard deviations of the mean, respectively. He also points out that the standard deviation varies depending on the dataset and will be higher for more spread out data.
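The percentages quoted for one, two, and three standard deviations can be checked numerically. The sketch below is not from the video; it assumes an ideal normal distribution and uses the standard identity that the fraction within k standard deviations of the mean equals erf(k / √2). The function name `fraction_within` is a hypothetical helper chosen for illustration.

```python
import math

def fraction_within(k: float) -> float:
    """Fraction of an ideal normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

# Print the share of values within 1, 2, and 3 standard deviations.
for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.1%}")
```

Real data such as heights only approximately follow a normal curve, so measured percentages will usually differ a little from these values.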

05:02

🔢 Calculating Standard Deviation by Hand

This paragraph delves into the process of calculating standard deviation manually. Mr. Andersen begins by introducing the formula, which includes the summation symbol, and assures viewers that it is not as daunting as it appears. He provides a step-by-step guide using a simple dataset (1, 2, 3, 4, 5) to demonstrate the calculation. The process involves finding the mean (average) of the dataset, then for each data point, subtracting the mean and squaring the result. These squared differences are summed and then divided by the degrees of freedom (n-1), where n is the number of data points. The final step is taking the square root of this quotient to obtain the standard deviation, about 1.58 for this dataset. Mr. Andersen concludes by encouraging viewers to try the calculation themselves on a second dataset (0, 2, 4, 5, 7) and says he will put the answer in the video description.
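The hand calculation can be followed step by step in code. This is a minimal sketch in Python (the video itself uses pen and paper, then Excel); the variable names are illustrative, and the dataset is the 1 through 5 example from the video.

```python
import math

data = [1, 2, 3, 4, 5]
n = len(data)                                     # sample size, n = 5

mean = sum(data) / n                              # 15 / 5 = 3.0
squared_diffs = [(x - mean) ** 2 for x in data]   # (x - x_bar)^2 for each point
sum_sq = sum(squared_diffs)                       # 4 + 1 + 0 + 1 + 4 = 10.0
variance = sum_sq / (n - 1)                       # divide by degrees of freedom: 10 / 4 = 2.5
std_dev = math.sqrt(variance)                     # square root of 2.5

print(round(std_dev, 2))  # 1.58
```

Swapping in the practice dataset (0, 2, 4, 5, 7) is a quick way to check a hand-worked answer.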

Keywords

💡Standard Deviation

Standard deviation is a measure of the amount of variation or dispersion in a set of values. It indicates how spread out the numbers are from the average, or mean, value. In the video, Mr. Andersen explains that standard deviation is a key statistic in understanding the spread of data in a normal distribution, and he provides a step-by-step guide on how to calculate it both conceptually and mathematically, using a simple dataset of the numbers 1 through 5.

💡Normal Distribution

Normal distribution, also known as Gaussian distribution, is a probability distribution that is characterized by a bell-shaped curve. The video script describes it as a fundamental concept for understanding standard deviation. Mr. Andersen uses the normal distribution to illustrate how data points are distributed around the mean, with about 68%, 95%, and 99.7% (rounded to 99% in the video) of individuals falling within one, two, and three standard deviations of the mean, respectively.

💡Mean

The mean, often referred to as the average, is the sum of all data points in a set divided by the number of data points. It serves as the central value in a dataset. In the script, Mr. Andersen calculates the mean of the dataset (1, 2, 3, 4, 5) by adding the numbers together and dividing by the number of data points, which is 5, resulting in a mean of 3.

💡Bell Curve

A bell curve is a graphical representation of a normal distribution, where the highest point of the curve corresponds to the mean of the data set. The video script uses the bell curve to visually describe the distribution of data points and how standard deviation measures the spread of these points around the mean.

💡Spread

In the context of the video, spread refers to the dispersion of data points in relation to the mean. A larger spread indicates a greater variability in the data, which corresponds to a higher standard deviation. Mr. Andersen explains that the spread can be visualized on the bell curve, with more spread out data points resulting in a higher standard deviation.

💡Degrees of Freedom

Degrees of freedom in statistics is the number of values in the data set that are free to vary. In the calculation of standard deviation, the degrees of freedom is the number of data points minus one (n-1). The script mentions this concept when explaining the formula for standard deviation, where the sum of squared deviations is divided by the degrees of freedom.
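To see what dividing by n-1 changes, the sketch below compares the sample standard deviation (divide by n-1, as in the video) with the population version (divide by n). The population version is not covered in the video and is shown here only for contrast; both functions are from Python's standard `statistics` module.

```python
import statistics

data = [1, 2, 3, 4, 5]

sample_sd = statistics.stdev(data)       # divides the sum of squares by n - 1
population_sd = statistics.pstdev(data)  # divides the sum of squares by n

print(round(sample_sd, 2))      # 1.58
print(round(population_sd, 2))  # 1.41
```

The sample version is always the larger of the two, and the gap shrinks as the number of data points grows.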

💡Summation

The summation symbol (∑) is used in mathematics to represent the total sum of a sequence of numbers. In the script, Mr. Andersen introduces the summation symbol in the context of calculating the sum of the squared differences from the mean for each data point, which is a part of the standard deviation formula.
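Written out, the formula described in the video looks roughly like this. The notation is an assumption based on the spoken description: s for the sample standard deviation, x_i for each data point, x-bar for the mean, and n for the sample size. The second line plugs in the 1 through 5 example.

```latex
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}

s = \sqrt{\frac{(1-3)^2 + (2-3)^2 + (3-3)^2 + (4-3)^2 + (5-3)^2}{5 - 1}}
  = \sqrt{\frac{4 + 1 + 0 + 1 + 4}{4}}
  = \sqrt{2.5} \approx 1.58
```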

💡Data Set

A data set is a collection of data points or values that are often used for statistical analysis. In the video, Mr. Andersen uses a simple data set consisting of the numbers 1, 2, 3, 4, and 5 to demonstrate the process of calculating standard deviation.

💡Histogram

A histogram is a graphical representation of the distribution of data points, typically used to show the frequency of different values in a dataset. In the script, Mr. Andersen suggests plotting the data as a histogram to visually represent the standard deviation, indicating how the data points are spread around the mean.

💡Spreadsheet

A spreadsheet is a computer application that allows users to organize, manipulate, and analyze data in tabular form. In the video, Mr. Andersen demonstrates how to use a spreadsheet, specifically Excel, to calculate the mean, median, and standard deviation of a dataset more efficiently than doing it by hand.
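For readers without a spreadsheet at hand, the same three results can be reproduced with Python's standard `statistics` module. This is a sketch of an equivalent calculation, not what the video shows; it uses the second dataset from the video (0, 2, 4, 5, 7), and `statistics.stdev` divides by n-1, matching the by-hand formula.

```python
import statistics

data = [0, 2, 4, 5, 7]

mean = statistics.mean(data)      # counterpart of =AVERAGE(...)
median = statistics.median(data)  # counterpart of =MEDIAN(...)
std_dev = statistics.stdev(data)  # sample standard deviation, counterpart of =STDEV(...)

print(mean)               # 3.6
print(median)             # 4
print(round(std_dev, 1))  # 2.7
```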

Highlights

Introduction to the concept of Standard Deviation and its importance in data analysis.

Explanation of the relationship between standard deviation and the normal distribution curve.

The significance of the average height of men in the U.S. as an example of the mean.

Description of how standard deviation measures the spread of data in a bell curve.

The 68-95-99.7 rule (stated as 68, 95, and 99% in the video) explained in relation to standard deviation and data distribution.

Visual comparison of two normal distribution curves with different standard deviations.

The equation for calculating standard deviation and its components explained.

Demonstration of calculating the mean (average) from a given data set.

Step-by-step process of calculating the variance, a component of standard deviation.

Explanation of the summation symbol and its role in calculating standard deviation.

Calculation of standard deviation by hand using a simple data set.

The formula for standard deviation and its significance in data analysis.

The practical application of standard deviation in plotting data as a histogram.

Encouragement for viewers to try calculating standard deviation by hand for practice.

Introduction to using spreadsheets for calculating standard deviation efficiently.

Demonstration of calculating the mean and median using Excel functions.

A quick guide on using Excel's STDEV function to find the standard deviation.

Comparison of standard deviation values between two different data sets.

Conclusion on the importance and application of standard deviation in data analysis.

Transcripts

play00:03

Hi. It's Mr. Andersen and in this video I'm going to talk about Standard

play00:08

Deviation. When you're collecting data in a science lab the amount of data you collect

play00:12

is important. So is the average. But another important statistic is going to be the standard

play00:17

deviation of your sample. And so in this video I'm going to show you what it is conceptually.

play00:23

I'm then going to show you how to calculate standard deviation by hand and then finally

play00:26

I'm going to show you how to calculate it using a spreadsheet. And so first of all,

play00:30

what is it? Well to understand standard deviation you'll have to understand the normal distribution.

play00:35

And so what does that mean? Well, it's a bell shaped curve. You might think of it like that.

play00:39

And so in the United States most men are about 5 foot 9. In other words that's the average

play00:45

right here. That's the mean, or in statistics that's the X bar. But there's going to be

play00:51

a lot of men who are obviously taller than that and a lot who are shorter than that.

play00:55

And so the standard deviation is going to measure the spread or the variation in this

play01:01

bell shaped curve. And so basically if we were to go right over to here, this dark area

play01:06

is going to be 1 standard deviation above and 1 standard deviation below the mean. Or

play01:12

it's going to be below the average. And there's something cool about that. About 68% of the

play01:17

individuals are going to be in this area. So 1 standard deviation above and below. Or

play01:22

if we were to look at this for example, down here is two standard deviations and so 95%

play01:29

of individuals are going to be within 2 standard deviations from that mean. And then finally

play01:34

if we go way down here 99% of individuals are going to be within 3 standard deviations

play01:41

of the mean. But the standard deviation is going to vary depending on the data that you

play01:45

collect. So if we have two curves like this, so if this is one curve and then we had another

play01:52

curve that looks like this, that data plotted on the same curve, this one is going to have

play02:01

a smaller standard deviation than this one. They're both going to have standard deviations

play02:05

obviously. They're going to have amounts where it's 68, 95 and 99% of the people, but this

play02:11

one down here since it's more spread out is going to have a higher standard deviation.

play02:15

And so how do we calculate that? Well the equation is a little scary. The scary part

play02:21

ends up being right here. So students are a little scared by that, the summation symbol.

play02:26

But it's actually pretty straightforward. It's not that hard to calculate the standard

play02:29

deviation. And so let me show you how to do that. And so first thing you want to do is

play02:33

you want to have a data set. And so here's going to be our data set right here. And to

play02:37

make this easy let's say we just have five pieces of data. 1, 2, 3, 4, and 5. So you're

play02:42

collecting data and this is the data in your data table. And you want to figure out the

play02:46

standard deviation of that. Well to set that up we're basically going to take the square

play02:51

root of the summation of this divided by the degrees of freedom. So that sounds a little

play02:55

bit scary and so let's go to the scariest part to begin with. Let's look at what's going

play02:59

on right here underneath that square root. And so what this is, so if we go like this,

play03:04

the summation of x minus x bar squared, basically it means for each of these data points that

play03:13

I have we're going to have to figure out what's right here, so x minus x bar. And so the first

play03:20

thing we have to do is figure out what the average is. So we have to figure out what

play03:23

x bar is. Well basically if I add 1, 2, 3, 4, 5 together I get fifteen. And if I divide

play03:31

that by n, which is the total number of data points, so in this case n equals 5. So we

play03:37

have 5 data points over here. So if I divide 15 by 5 hopefully you can figure out an average,

play03:42

the average is going to be 3. And so the mean is 3 or the average is 3. So what we have

play03:48

to do is we have to calculate this value for all five of these data points. What does that

play03:56

mean? Well right here we're going to use x and x for the first case is going to be 1.

play04:00

So that's going to be 1 minus 3 and then we're going to square that. So what is that? 1 minus

play04:07

3 and we square that is going to be negative 2 and if we square that, so that's negative

play04:11

2 squared and if we square that that's 4. Let's go to the next one. Well this is 2 minus

play04:17

3 so that stays the same. So that's negative 1 squared. And so that's going to be negative

play04:22

one squared or that's going to equal one. If we go to the next one, that's easy. That's

play04:26

3 minus 3 squared equals 0. And if we square 0 that's going to be 0. Go to the next one.

play04:34

That's going to be 4 minus 3. That's going to be 1 squared or equal to 1. And then finally

play04:41

if we go 5 minus 3, square it. That's going to be 2 squared and that's going to equal

play04:47

four. And so if you ever see the summation sign, don't be scared by that. It's not scary

play04:52

at all. It just means you've got to do a lot of work. So for each of these data points

play04:56

1 through 5 I had to calculate what was in there. And then I have to add it all up. So

play05:02

I have to add 4 plus 1 plus 1 plus 4. And if I add all of those up I get 10. And so

play05:08

what's going to be inside there is simply going to be 10. So let's figure out the rest

play05:12

of my standard deviation. Standard deviation is going to be the square root, in this case

play05:16

we've solved this as equal to 10 and then we're going to divide that by n minus 1. So

play05:21

what's n? That's our sample size. In this case it's 5 and so we take n minus 1 and that's

play05:27

going to equal four. And so what is our standard deviation? It's the square root of 10 divided

play05:32

by four which is 2.5. Or if we take the standard deviation of, the square root of 2.5, that's

play05:39

going to be something like 1.58. And so you're going to have to use a calculator to figure

play05:44

that out. Well what does that mean? If we were to plot this data as a histogram for

play05:48

example, this would be our standard deviation. 1.58. And so it takes a while to figure that

play05:56

out based on doing it by hand. And so if you want to, give it a try. And so here's a data

play06:04

set over here and so try to calculate the standard deviation using this data set over

play06:10

here. And try to do it by hand. I'll put the answer down in the description below the video.

play06:15

But I would give it a try. It's worth doing once on your own. And again this is going

play06:19

to be our formula, standard deviation and so try to do that. Try to do that by hand.

play06:24

And so I'll wait. No, I won't wait for you to do that. Pause the video. Try to do this

play06:29

one and I'm going to show you how to calculate this really really quickly. And so I'm going

play06:32

to show you the spreadsheet shortcut. And so how do you do that in a spreadsheet. It's

play06:38

pretty simple. So what I'm going to do is going to take this data and I'm going to switch

play06:40

over here to Excel. So here's the data right here. 0, 2, 4, 5 and 7. And so I've entered

play06:46

my data into different cells. And now I'm going to figure out the mean, just to show

play06:50

you how easy this is. To figure out the mean I'm going to hit an = here and then I'm going

play06:54

to just start typing. So I'm going to type in average because the spreadsheet's not

play06:58

going to use the word mean. So I type in the word average and then I select my data. I

play07:03

hit a closed parenthesis, I hit enter and it's going to give me my average which is going

play07:06

to be 3.6. So if I wanted to know the average there it is. If I want to know the median

play07:11

for example I could just type median and I could go down like that and so spreadsheets

play07:17

are super simple. And so what are we looking for? We're looking for the standard deviation.

play07:23

So how do I do that? I just hit =. I then start typing stdev, can you see how it pops

play07:28

up right here, standard deviation, parenthesis and then I'm going to select that and then

play07:32

I'm going to go like that. So what's the standard deviation? It's 2.7. What does that mean?

play07:37

We had a bigger spread in the second data set than we did in the first set. A higher

play07:41

standard deviation. And if you did it by hand it should've looked something like that. So

play07:45

that's standard deviation and I hope that's helpful.
