Standard Deviation
Summary
TLDR: In this educational video, Mr. Andersen explains the concept of standard deviation, its importance in understanding data spread, and its relation to the normal distribution curve. He demonstrates how to calculate standard deviation manually using a simple dataset and then shows a quick method using Excel. The video aims to demystify the process, highlighting that while the formula may seem daunting, the calculation is straightforward and essential for analyzing data variability.
Takeaways
- Standard deviation is a crucial statistic in data analysis, representing the amount of variation or dispersion in a set of values.
- Understanding the normal distribution, often visualized as a bell-shaped curve, is key to grasping standard deviation.
- Standard deviation quantifies the spread of data points around the mean; in a normal distribution, about 68% of the data falls within one standard deviation of the mean.
- Approximately 95% of data points lie within two standard deviations of the mean, and about 99.7% (rounded to 99% in the video) within three standard deviations.
- The value of the standard deviation depends on the dataset: more spread-out data gives a higher standard deviation and a wider, flatter normal curve.
- Calculating standard deviation by hand involves summing the squared differences from the mean, dividing that sum by 'n-1', and taking the square root of the result.
- The formula divides by 'n-1' rather than 'n', where 'n' is the number of data points, to account for degrees of freedom.
- Spreadsheet software like Excel can simplify the calculation of standard deviation using built-in functions like STDEV.
- The script provides a step-by-step guide to calculating standard deviation both manually and using a spreadsheet for different datasets.
- The video script emphasizes the importance of understanding and calculating standard deviation for analyzing and interpreting data in scientific contexts.
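The percentages in the takeaways can be checked numerically. The following is a minimal sketch, not something shown in the video: it assumes a perfectly normal distribution and uses the standard relationship between the normal curve and the error function. The function name `coverage_within` is made up for this illustration.

```python
import math

def coverage_within(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean."""
    # For a normal distribution, P(|x - mean| < k * sd) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {coverage_within(k):.1%}")
```

This prints roughly 68.3%, 95.4%, and 99.7%, which is why the rule is often quoted as 68-95-99.7. Real datasets only follow these percentages to the extent that they are close to normally distributed.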
Q & A
What is the main topic of Mr. Andersen's video?
-The main topic of Mr. Andersen's video is Standard Deviation, including its conceptual understanding, calculation by hand, and calculation using a spreadsheet.
Why is standard deviation important in data analysis?
-Standard deviation is important because it measures the spread or variation in a set of data, which helps in understanding the dispersion of the data points around the mean.
What is the normal distribution Mr. Andersen mentions?
-The normal distribution, also known as the bell curve, is a statistical distribution that shows the probability of a data point occurring within a range, with the highest probability at the mean.
What percentage of individuals are typically within one standard deviation from the mean in a normal distribution?
-Approximately 68% of individuals are within one standard deviation from the mean in a normal distribution.
How does the percentage change when considering two standard deviations from the mean?
-When considering two standard deviations from the mean, about 95% of individuals fall within this range in a normal distribution.
What does the standard deviation signify about the spread of data in a set?
-A higher standard deviation signifies a greater spread of data points, indicating more variation from the mean, while a lower standard deviation indicates less variation and a tighter clustering of data points around the mean.
What is the formula for calculating standard deviation by hand?
-The formula for calculating standard deviation by hand involves taking the square root of the sum of the squared differences between each data point and the mean, divided by the degrees of freedom (n-1).
What does Mr. Andersen use to demonstrate the calculation of standard deviation?
-Mr. Andersen uses a simple data set of five numbers (1, 2, 3, 4, 5) to demonstrate the calculation of standard deviation.
How does Mr. Andersen calculate the mean of the data set in the video?
-Mr. Andersen calculates the mean by adding all the data points together and then dividing by the total number of data points (n), which in this case is 5.
What spreadsheet function does Mr. Andersen use to calculate the standard deviation?
-Mr. Andersen uses the 'STDEV' function in Excel to calculate the standard deviation of a data set.
How does Mr. Andersen show the ease of calculating the mean and median in a spreadsheet?
-Mr. Andersen demonstrates the ease by using the 'AVERAGE' and 'MEDIAN' functions in Excel, which automatically calculate the mean and median when the data set is selected.
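The spreadsheet steps described in these answers have close equivalents outside Excel. As a hedged sketch (Python is not used in the video), the standard-library `statistics` module offers `mean`, `median`, and `stdev`, where `stdev` divides by n-1 in the same way as the hand formula. The data below is the second data set read out in the transcript.

```python
import statistics

# Second data set from the video (the one entered into the spreadsheet).
data = [0, 2, 4, 5, 7]

mean = statistics.mean(data)      # counterpart of =AVERAGE(...)
median = statistics.median(data)  # counterpart of =MEDIAN(...)
sd = statistics.stdev(data)       # sample standard deviation (divides by n-1)

print(f"mean={mean}, median={median}, sd={sd:.1f}")
```

The mean comes out as 3.6 and the standard deviation rounds to 2.7, matching the values quoted in the transcript.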
Outlines
Understanding Standard Deviation
In this paragraph, Mr. Andersen introduces the concept of standard deviation as a crucial statistic in data analysis, particularly in a scientific context. He explains that standard deviation measures the spread or variation in a dataset, which is visualized as a bell-shaped curve in a normal distribution. The mean or average is the central point of this curve, and the standard deviation quantifies how spread out the data points are from the mean. Mr. Andersen uses the example of the average height of men in the United States to illustrate how data points are distributed around the mean, with about 68%, 95%, and 99% (more precisely 99.7%) of individuals falling within one, two, and three standard deviations of the mean, respectively. He also points out that the standard deviation varies depending on the dataset and will be higher for more spread-out data.
Calculating Standard Deviation by Hand
This paragraph delves into the process of calculating standard deviation manually. Mr. Andersen begins by emphasizing the importance of understanding the formula, which includes the summation symbol, and assures that it is not as daunting as it appears. He provides a step-by-step guide using a simple dataset (1, 2, 3, 4, 5) to demonstrate the calculation. The process involves finding the mean (average) of the dataset, then for each data point, subtracting the mean and squaring the result. These squared differences are summed up and then divided by the degrees of freedom (n-1), where n is the number of data points. The final step is taking the square root of this quotient to obtain the standard deviation. Mr. Andersen concludes by encouraging viewers to try the calculation themselves and provides the answer in the video description.
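The hand calculation described above can be sketched in a few lines of code. This is an illustration under stated assumptions rather than anything shown in the video: the function name `sample_std_dev` is hypothetical, and the steps simply follow the order given in the paragraph (mean, squared differences, sum, divide by n-1, square root).

```python
import math

def sample_std_dev(values):
    """Sample standard deviation, following the by-hand steps."""
    n = len(values)
    mean = sum(values) / n                              # step 1: the mean (x bar)
    squared_diffs = [(x - mean) ** 2 for x in values]   # step 2: (x - x bar) squared
    total = sum(squared_diffs)                          # step 3: the summation
    variance = total / (n - 1)                          # step 4: divide by degrees of freedom
    return math.sqrt(variance)                          # step 5: square root

data = [1, 2, 3, 4, 5]
result = sample_std_dev(data)
print(round(result, 2))  # 1.58
```

For the data set 1, 2, 3, 4, 5 the intermediate values are a mean of 3, squared differences of 4, 1, 0, 1, 4, a sum of 10, and a variance of 2.5, which agrees with the worked example in the transcript.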
Keywords
Standard Deviation
Normal Distribution
Mean
Bell Curve
Spread
Degrees of Freedom
Summation
Data Set
Histogram
Spreadsheet
Highlights
Introduction to the concept of Standard Deviation and its importance in data analysis.
Explanation of the relationship between standard deviation and the normal distribution curve.
The significance of the average height of men in the U.S. as an example of the mean.
Description of how standard deviation measures the spread of data in a bell curve.
The 68-95-99.7 rule (stated as 68, 95, and 99% in the video) explained in relation to standard deviation and data distribution.
Visual comparison of two normal distribution curves with different standard deviations.
The equation for calculating standard deviation and its components explained.
Demonstration of calculating the mean (average) from a given data set.
Step-by-step process of calculating the variance, a component of standard deviation.
Explanation of the summation symbol and its role in calculating standard deviation.
Calculation of standard deviation by hand using a simple data set.
The formula for standard deviation and its significance in data analysis.
The practical application of standard deviation in plotting data as a histogram.
Encouragement for viewers to try calculating standard deviation by hand for practice.
Introduction to using spreadsheets for calculating standard deviation efficiently.
Demonstration of calculating the mean and median using Excel functions.
A quick guide on using Excel's STDEV function to find the standard deviation.
Comparison of standard deviation values between two different data sets.
Conclusion on the importance and application of standard deviation in data analysis.
Transcripts
Hi. It's Mr. Andersen and in this video I'm going to talk about Standard
Deviation. When you're collecting data in a science lab the amount of data you collect
is important. So is the average. But another important statistic is going to be the standard
deviation of your sample. And so in this video I'm going to show you what it is conceptually.
I'm then going to show you how to calculate standard deviation by hand and then finally
I'm going to show you how to calculate it using a spreadsheet. And so first of all,
what is it? Well to understand standard deviation you'll have to understand the normal distribution.
And so what does that mean? Well, it's a bell shaped curve. You might think of it like that.
And so in the United States most men are about 5 foot 9. In other words that's the average
right here. That's the mean, or in statistics that's the X bar. But there's going to be
a lot of men who are obviously taller than that and a lot who are shorter than that.
And so the standard deviation is going to measure the spread or the variation in this
bell shaped curve. And so basically if we were to go right over to here, this dark area
is going to be 1 standard deviation above and 1 standard deviation below the mean. Or
it's going to be below the average. And there's something cool about that. About 68% of the
individuals are going to be in this area. So 1 standard deviation above and below. Or
if we were to look at this for example, down here is two standard deviations and so 95%
of individuals are going to be within 2 standard deviations from that mean. And then finally
if we go way down here 99% of individuals are going to be within 3 standard deviations
of the mean. But the standard deviation is going to vary depending on the data that you
collect. So if we have two curves like this, so if this is one curve and then we had another
curve that looks like this, that data plotted on the same graph, this one is going to have
a smaller standard deviation than this one. They're both going to have standard deviations
obviously. They're going to have amounts where it's 68, 95 and 99% of the people, but this
one down here since it's more spread out is going to have a higher standard deviation.
And so how do we calculate that? Well the equation is a little scary. The scary part
ends up being right here. So students are a little scared by that, the summation symbol.
But it's actually pretty straightforward. It's not that hard to calculate the standard
deviation. And so let me show you how to do that. And so first thing you want to do is
you want to have a data set. And so here's going to be our data set right here. And to
make this easy let's say we just have five pieces of data. 1, 2, 3, 4, and 5. So you're
collecting data and this is the data in your data table. And you want to figure out the
standard deviation of that. Well to set that up we're basically going to take the square
root of the summation of this divided by the degrees of freedom. So that sounds a little
bit scary and so let's go to the scariest part to begin with. Let's look at what's going
on right here underneath that square root. And so what this is, so if we go like this,
the summation of x minus x bar squared, basically it means for each of these data points that
I have we're going to have to figure out what's right here, so x minus x bar. And so the first
thing we have to do is figure out what the average is. So we have to figure out what
x bar is. Well basically if I add 1, 2, 3, 4, 5 together I get fifteen. And if I divide
that by n, which is the total number of data points, so in this case n equals 5. So we
have 5 data points over here. So if I divide 15 by 5 hopefully you can figure out an average,
the average is going to be 3. And so the mean is 3 or the average is 3. So what we have
to do is we have to calculate this value for all five of these data points. What does that
mean? Well right here we're going to use x and x for the first case is going to be 1.
So that's going to be 1 minus 3 and then we're going to square that. So what is that? 1 minus
3 and we square that is going to be negative 2 and if we square that, so that's negative
2 squared and if we square that that's 4. Let's go to the next one. Well this is 2 minus
3 so that stays the same. So that's negative 1 squared. And so that's going to be negative
one squared or that's going to equal one. If we go to the next one, that's easy. That's
3 minus 3 squared equals 0. And if we square 0 that's going to be 0. Go to the next one.
That's going to be 4 minus 3. That's going to be 1 squared or equal to 1. And then finally
if we go 5 minus 3, square it. That's going to be 2 squared and that's going to equal
four. And so if you ever see the summation sign, don't be scared by that. It's not scary
at all. It just means you've got to do a lot of work. So for each of these data points
1 through 5 I had to calculate what was in there. And then I have to add it all up. So
I have to add 4 plus 1 plus 1 plus 4. And if I add all of those up I get 10. And so
what's going to be inside there is simply going to be 10. So let's figure out the rest
of my standard deviation. Standard deviation is going to be the square root, in this case
we've solved this as equal to 10 and then we're going to divide that by n minus 1. So
what's n? That's our sample size. In this case it's 5 and so we take n minus 1 and that's
going to equal four. And so what is our standard deviation? It's the square root of 10 divided
by four, and 10 divided by four is 2.5. So if we take the square root of 2.5, that's
going to be something like 1.58. And so you're going to have to use a calculator to figure
that out. Well what does that mean? If we were to plot this data as a histogram for
example, this would be our standard deviation. 1.58. And so it takes a while to figure that
out based on doing it by hand. And so if you want to, give it a try. And so here's a data
set over here and so try to calculate the standard deviation using this data set over
here. And try to do it by hand. I'll put the answer down in the description below the video.
But I would give it a try. It's worth doing once on your own. And again this is going
to be our formula, standard deviation and so try to do that. Try to do that by hand.
And so I'll wait. No, I won't wait for you to do that. Pause the video. Try to do this
one and I'm going to show you how to calculate this really really quickly. And so I'm going
to show you the spreadsheet shortcut. And so how do you do that in a spreadsheet? It's
pretty simple. So what I'm going to do is going to take this data and I'm going to switch
over here to Excel. So here's the data right here. 0, 2, 4, 5 and 7. And so I've entered
my data into different cells. And now I'm going to figure out the mean, just to show
you how easy this is. To figure out the mean I'm going to hit an = here and then I'm going
to just start typing. So I'm going to type in average because the spreadsheet's not
going to use the word mean. So I type in the word average and then I select my data. I
hit a closed parenthesis, I hit enter and it's going to give me my average, which is going
to be 3.6. So if I wanted to know the average there it is. If I want to know the median
for example I could just type median and I could go down like that and so spreadsheets
are super simple. And so what are we looking for? We're looking for the standard deviation.
So how do I do that? I just hit =. I then start typing stdev, can you see how it pops
up right here, standard deviation, parenthesis and then I'm going to select that and then
I'm going to go like that. So what's the standard deviation? It's 2.7. What does that mean?
We had a bigger spread in the second data set than we did in the first set. A higher
standard deviation. And if you did it by hand it should've looked something like that. So
that's standard deviation and I hope that's helpful.
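
The worked answer for the practice data set is shown on screen in the video but is not captured in the transcript. The following is a reconstruction, assuming the same formula (dividing by n-1) and the data set 0, 2, 4, 5, 7 read out above:

```latex
\begin{align*}
\bar{x} &= \frac{0 + 2 + 4 + 5 + 7}{5} = \frac{18}{5} = 3.6 \\
\sum (x - \bar{x})^2 &= (-3.6)^2 + (-1.6)^2 + (0.4)^2 + (1.4)^2 + (3.4)^2 \\
                     &= 12.96 + 2.56 + 0.16 + 1.96 + 11.56 = 29.2 \\
s &= \sqrt{\frac{29.2}{5 - 1}} = \sqrt{7.3} \approx 2.70
\end{align*}
```

This agrees with the mean of 3.6 and the standard deviation of 2.7 that the spreadsheet reports in the transcript.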