Measures of Spread & Variability: Range, Variance, SD, etc| Statistics Tutorial | MarinStatsLectures
Summary
TLDR: This educational video delves into the realm of statistical variability, introducing various measures to quantify data's spread. Key concepts include the range, interquartile range (IQR), and the pivotal sample variance and standard deviation. The video emphasizes the importance of understanding these measures, not just the calculations, to grasp data's true nature. It also touches on the sensitivity of these measures to outliers and their practical implications in statistical analysis, all while maintaining an engaging and accessible tone.
Takeaways
- 📊 Variability is a key concept in statistics, focusing on how spread out or close observations are to each other.
- 🔢 The range, calculated by subtracting the minimum from the maximum value, provides a simple measure of variability.
- 📈 The interquartile range (IQR), which is the difference between the third and first quartiles, measures the spread of the middle 50% of the data and is less sensitive to outliers.
- 📉 Quartiles divide the dataset into quarters, with the first quartile (Q1) having 25% of observations below it and the third quartile (Q3) having 75% below it.
- 🧮 Sample variance, denoted as s^2, is calculated by summing the squared differences from the sample mean and dividing by n-1 (close to an average of the squared deviations), and it is sensitive to outliers.
- 📐 Sample standard deviation (SD), the square root of the sample variance, can be thought of as roughly the average deviation from the mean, in the same units as the data, and is also sensitive to outliers.
- ✂️ The IQR is often paired with the median as a measure of center, providing a robust estimate of the data's spread and center.
- 📚 The video emphasizes understanding the concepts behind these measures rather than focusing on the calculations, which are typically done using statistical software.
- 🔑 Greek letters such as sigma (σ) are used to represent population parameters, while Latin letters are used for sample statistics, highlighting the difference between theoretical and empirical values.
- 💡 The video serves as an introduction to more detailed explanations of these concepts, encouraging viewers to look for further information in subsequent videos.
Q & A
What is the range and how is it calculated?
-The range is a simple measure of variability, calculated as the difference between the maximum and minimum values in a dataset. In the example provided, the range is 104 - 50 = 54 kilograms.
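The following is a minimal Python sketch of the range calculation. It is our own illustration, not something shown in the video. It assumes the eight weights that the transcript reads off the number line: 50, 58, 70, 75, 85, 88, 90 and 104 kg. The variable names are hypothetical.

```python
# Weights (kg) of the eight individuals in the video's example.
weights = [50, 58, 70, 75, 85, 88, 90, 104]

# Range = largest observation minus smallest observation.
data_min = min(weights)
data_max = max(weights)
data_range = data_max - data_min

# Report the min and max alongside the range, as the video recommends.
print(f"min = {data_min} kg, max = {data_max} kg, range = {data_range} kg")
# → min = 50 kg, max = 104 kg, range = 54 kg
```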
What is the interquartile range (IQR) and what does it represent?
-The interquartile range (IQR) is the difference between the third quartile (Q3) and the first quartile (Q1), representing the spread of the middle 50% of the data. It is not sensitive to outliers, making it a useful measure when extreme values are present. In the example, Q3 is roughly 89 and Q1 is 64 (exact quartile values depend on the estimation method), so the IQR is 89 - 64 = 25 kilograms.
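The video does not say which quartile convention it uses. One common convention takes Q1 and Q3 as the medians of the lower and upper halves of the ordered data. That convention happens to reproduce the 64 and 89 quoted above, but treat it as an assumption. The helper name `quartiles_by_halves` is hypothetical.

For contrast, the sketch also calls Python's `statistics.quantiles`, whose "exclusive" method gives different cut points on this small dataset. That difference is exactly why the video warns against getting caught up in the exact Q1 and Q3.

```python
from statistics import median, quantiles

weights = [50, 58, 70, 75, 85, 88, 90, 104]

def quartiles_by_halves(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    This is one of several quartile conventions. With an odd number of
    values, the middle value is left out of both halves here.
    """
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two values")
    half = len(ordered) // 2
    lower = ordered[:half]                 # bottom half of the ordered data
    upper = ordered[len(ordered) - half:]  # top half of the ordered data
    return median(lower), median(upper)

q1, q3 = quartiles_by_halves(weights)
iqr = q3 - q1
print(q1, q3, iqr)  # 64.0 89.0 25.0

# The standard library's estimator interpolates differently, so its
# Q1 and Q3 differ from the values above for this small dataset.
software_q1, _, software_q3 = quantiles(weights, n=4, method="exclusive")
print(software_q1, software_q3)
```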
Why is the range less useful in analytical techniques?
-The range only considers the maximum and minimum values, which means it is highly sensitive to outliers. This makes it less reliable in analytical techniques, as it doesn’t give a full picture of the data’s variability.
How does the interquartile range handle outliers?
-The interquartile range (IQR) is robust against outliers because it focuses on the middle 50% of the data, excluding the bottom and top quarters of the ordered observations. This makes it much less influenced by extreme values.
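The sketch below checks the robustness claim. It reuses the transcript's hypothetical in which the 50 kg weight is recorded as 20 kg. It assumes the same median-of-halves quartile convention as above, which is itself an assumption, and `iqr_by_halves` is an illustrative name.

```python
from statistics import median

def iqr_by_halves(values):
    """IQR using medians of the lower and upper halves of the sorted data."""
    ordered = sorted(values)
    half = len(ordered) // 2
    return median(ordered[len(ordered) - half:]) - median(ordered[:half])

weights = [50, 58, 70, 75, 85, 88, 90, 104]
# Hypothetical from the video: the lightest weight becomes 20 instead of 50.
weights_outlier = [20, 58, 70, 75, 85, 88, 90, 104]

for label, data in [("original", weights), ("50 -> 20", weights_outlier)]:
    data_range = max(data) - min(data)
    print(f"{label}: range = {data_range}, IQR = {iqr_by_halves(data)}")
# → original: range = 54, IQR = 25.0
# → 50 -> 20: range = 84, IQR = 25.0
```

The range jumps from 54 to 84 kg, while the IQR does not move, because the changed value sits in the bottom quarter that the IQR ignores.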
What is the sample variance and what does it measure?
-The sample variance is essentially the average of the squared deviations from the mean, except that the sum of squared deviations is divided by n-1 rather than n. It gives a sense of how far individual data points are from the sample mean. In the example, the variance is 317.7 kilograms squared.
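Here is a step-by-step sketch of the variance calculation, assuming the eight weights from the example. It computes each deviation from the mean of 77.5 kg, squares it, sums the squares, and divides by n-1. Python's `statistics.variance` uses the same n-1 denominator, so it serves as a cross-check.

```python
from statistics import variance

weights = [50, 58, 70, 75, 85, 88, 90, 104]
n = len(weights)
mean = sum(weights) / n  # sample mean: 77.5 kg

# How far each weight is from the sample mean, then squared.
deviations = [x - mean for x in weights]
squared_deviations = [d ** 2 for d in deviations]

# Sample variance: sum of squared deviations divided by n - 1.
s2 = sum(squared_deviations) / (n - 1)

print(deviations)    # [-27.5, -19.5, -7.5, -2.5, 7.5, 10.5, 12.5, 26.5]
print(round(s2, 1))  # 317.7 (kg squared)

# The standard library's sample variance uses the same n - 1 denominator.
print(round(variance(weights), 1))  # 317.7
```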
Why is the formula for variance divided by n-1?
-The formula for variance uses n-1 (where n is the number of observations) to correct for bias in the estimation of population variance from a sample. This correction is known as Bessel’s correction.
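A small simulation makes visible the bias that Bessel's correction removes. This sketch assumes a standard normal population (true variance 1) and samples of size 5; both choices are arbitrary. Averaged over many samples, dividing the sum of squared deviations by n tends to land near (n-1)/n of the true variance. Dividing by n-1 tends to land near the true variance itself.

```python
import random

random.seed(1)     # fixed seed so the run is reproducible
n = 5              # sample size (arbitrary choice for illustration)
trials = 20_000    # number of simulated samples
true_var = 1.0     # population variance of a standard normal

biased_total = 0.0
unbiased_total = 0.0
for _ in range(trials):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    xbar = sum(sample) / n
    ss = sum((x - xbar) ** 2 for x in sample)  # sum of squared deviations
    biased_total += ss / n                     # divide by n
    unbiased_total += ss / (n - 1)             # divide by n - 1 (Bessel)

avg_biased = biased_total / trials
avg_unbiased = unbiased_total / trials
print(f"divide by n:     {avg_biased:.3f}  (theory: {(n - 1) / n * true_var:.3f})")
print(f"divide by n - 1: {avg_unbiased:.3f}  (theory: {true_var:.3f})")
```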
What are the units of variance and how are they interpreted?
-The units of variance are the square of the original units of the data (in this case, kilograms squared). While variance provides useful information, the squared units make it harder to interpret, which is why standard deviation is often preferred.
What is the sample standard deviation and how is it related to variance?
-The sample standard deviation is the square root of the variance. It can be thought of as roughly how much, on average, individual data points deviate from the mean. In this example, the standard deviation is 17.8 kilograms. Because it is in the original units rather than squared units, it is easier to interpret than the variance.
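The sketch below computes the SD for the example weights. For comparison only, it also computes the mean absolute deviation, the literal average distance from the mean, which is a measure the video does not cover. The two come out close but not equal. That gap is why the video calls the SD "approximately" the average deviation.

```python
import math
from statistics import stdev

weights = [50, 58, 70, 75, 85, 88, 90, 104]
n = len(weights)
mean = sum(weights) / n

# Sample SD: square root of the sample variance (n - 1 denominator).
s2 = sum((x - mean) ** 2 for x in weights) / (n - 1)
s = math.sqrt(s2)  # same units as the data (kg)

# Mean absolute deviation, shown only for comparison with the SD.
mad = sum(abs(x - mean) for x in weights) / n

print(round(s, 1))  # 17.8
print(mad)          # 14.25
```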
How does the standard deviation handle outliers?
-Like variance, the standard deviation is sensitive to outliers. Extreme values can cause larger deviations, increasing the overall standard deviation.
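The sketch below uses the same hypothetical as the IQR check, with the 50 kg weight recorded as 20 kg. It shows the variance and SD growing under that change, whereas the IQR stays put by the earlier argument. It relies only on Python's `statistics` module.

```python
from statistics import mean, stdev, variance

weights = [50, 58, 70, 75, 85, 88, 90, 104]
weights_outlier = [20, 58, 70, 75, 85, 88, 90, 104]  # hypothetical: 50 recorded as 20

for label, data in [("original", weights), ("50 -> 20", weights_outlier)]:
    print(f"{label}: mean = {mean(data):.2f}, "
          f"variance = {variance(data):.1f}, SD = {stdev(data):.1f}")
```

With these numbers, the mean shifts from 77.5 to 73.75 kg, and the SD rises from about 17.8 kg to about 25.8 kg.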
Why is it important to pair the IQR with the median and variance with the mean?
-The interquartile range (IQR) should be paired with the median because both are resistant to outliers. Variance and standard deviation, by contrast, are built from deviations around the mean and, like the mean, are sensitive to outliers. They are therefore paired with the mean and give the clearest picture of variability for distributions without extreme values.
Outlines
📊 Introduction to Measures of Variability
The video begins by introducing the concept of variability in statistics, emphasizing that statistics is about estimating how variable observations are. The longer-term goal is judging how close an estimate is to the true population value. The presenter encourages viewers to subscribe for more content and uses a simple example of the weights of eight individuals to illustrate different measures of variability. The range, defined as the difference between the maximum and minimum values, is the first measure discussed; the presenter highlights its simplicity and its limited usefulness in analytic techniques. The video also introduces the interquartile range (IQR), which measures the spread of the middle 50% of the data, and explains its robustness against outliers. The discussion sets the stage for more detailed exploration of variability measures in subsequent content.
🔢 Calculating Sample Variance and Standard Deviation
This section delves into the calculation of sample variance and standard deviation, which are measures of variability that estimate how far individual data points deviate from the sample mean. The presenter builds up the formula for sample variance, noting that each deviation is squared (the reasons are deferred to a separate video). The video clarifies that the sample variance is found by summing the squared deviations and dividing by n-1 rather than n, a step that will be further explained in a dedicated video. The sample standard deviation is introduced as the square root of the sample variance, providing a measure of average deviation in the same units as the data. The presenter also notes the sensitivity of these measures to outliers. It also covers the difference between sample statistics, written with Latin letters, and population parameters, written with Greek letters.
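The transcript describes the formulas in words. Written out in LaTeX with the example's numbers substituted (n = 8, sample mean 77.5 kg, sum of squared deviations 2224 kg²), they look like this:

```latex
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2
    = \frac{(50-77.5)^2 + (58-77.5)^2 + \cdots + (104-77.5)^2}{8-1}
    = \frac{2224}{7} \approx 317.7\ \text{kg}^2,
\qquad
s = \sqrt{s^2} \approx 17.8\ \text{kg}
```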
📈 Understanding the Sample Standard Deviation
The final paragraph focuses on the sample standard deviation, which is described as a measure of the average deviation of individual weights from the sample mean. The presenter offers a conceptual picture of the standard deviation as roughly the 'average difference'. The presenter then clarifies that it is actually calculated by squaring the deviations, summing them, dividing by n-1, and taking the square root. The video emphasizes the importance of understanding the concept behind the calculation rather than focusing on the formula itself. It also notes the sensitivity of the standard deviation to outliers and the difference between sample and population standard deviations. The presenter concludes by reminding viewers that manual calculation of these measures is not practical and encourages them to stay tuned for more videos that will explore these concepts in greater depth.
Keywords
💡Variability
💡Range
💡Interquartile Range (IQR)
💡Quartiles
💡Sample Variance
💡Sample Standard Deviation (SD)
💡Outliers
💡Mean
💡Population Variance
💡Population Standard Deviation
Highlights
Introduction to measures of variability and spread in statistics.
Focus on understanding concepts rather than calculations.
Definition and calculation of range as a measure of variability.
Importance of reporting maximum and minimum values with range.
Limitations of range in analytic techniques due to sensitivity to outliers.
Explanation of interquartile range (IQR) as a measure of spread.
IQR represents the range of the middle 50% of ordered data, making it less sensitive to outliers.
Quartiles, percentiles, and their estimation methods discussed.
Recommendation to pair IQR with the median as a measure of center.
Introduction to sample variance as a measure of variability.
Formula and conceptual understanding of sample variance.
Note that the deviations are squared in the sample variance, with the reasons deferred to a separate video.
Discussion on the sensitivity of variance to outliers.
Difference between sample variance and population variance.
Introduction to sample standard deviation as the square root of variance.
Conceptual understanding of sample standard deviation as approximately the average deviation from the mean.
Visualization of individual deviations from the mean in the context of standard deviation.
Emphasis on the importance of understanding the concept over memorizing the formulas.
Final reminder that manual calculation of these measures is not common practice.
Transcripts
In this video we're gonna talk a little bit about different measures of variability or spread. We're going to go through and show some of the formulas for how to calculate these, although we'd like to focus on the concepts and not the calculations. A quick reminder to subscribe and click on the bell to receive notifications when we upload new videos.

Statistics is all about variability: estimating how variable observations are. Something we're going to build up to is trying to get an idea of how far, or how close, an estimate is to the true, or population, value. For now, we just want to start talking about measures of variability, and we'll get there later on.

For this discussion we'll use this simple example of the weights, in kilograms, of eight individuals: 50, 58, all the way up to 104. I've also drawn those in here along the number line so we can try and visualize some of these measures of spread or variability.

So let's get to discussing our first one. The first and simplest one is the range. The range is just the maximum, or largest, observation minus the minimum, or smallest. In this simple example, this is 104 minus 50, which comes out to be 54 kilograms. So the range gives us an idea of the full span of the data: what's the distance between the largest and the smallest? When reporting the range, it's also good to report the maximum and minimum values along with it. It's worth noting that the range is useful as a descriptive measure, but it's often not very useful in analytic techniques.
So the next measure of spread or variability is called the interquartile range. While that's a technical-sounding term, we'll break it down into exactly what it is.

It is the third quartile minus the first quartile. If we recall, in previous videos we talked about quartiles. The third quartile is the value that has three quarters, or 75%, of observations below it; in this example, we can work it out to be roughly 89. The first quartile is 64. Again, the first quartile has one quarter, or 25%, of observations below it, so 25% of the observations are below 64. So the interquartile range works out to be 89 minus 64, which is 25.

A few notes on the interquartile range. First, it gives us the range of the middle 50% of the ordered data. In other words, you can think of it as a trimmed range: we cut off the bottom quarter, we cut off the top quarter, and look at the range of the 50% of data sitting in the middle. Second, it is not sensitive to outliers or extreme values. We can see that if this observation of 50 were 20, the IQR, or interquartile range, would still be the same.

And again, I want to remind you that when we talked about quartiles, percentiles, and quantiles, there are slightly different ways to estimate them. So let's not get too caught up on the exact calculation of Q1 and Q3. Instead, let's focus on what the interquartile range is and what it is trying to estimate. Here it's also worth mentioning that if we're using the IQR as our measure of spread or variability, we should pair it with the median as our measure of center.

The next measure of variability that we're going to talk about is the sample variance, which in notation we write as lowercase s squared. It's worth noting that we have a separate video that goes into detail explaining the sample variance as well as the sample standard deviation. That video builds up those concepts in much more detail than we're going to do here. So here we're going to introduce the concept and show the formula, and that separate video will break down parts of it in a lot more detail than we will here.

The idea of the sample variance, or the sample standard deviation, which we're going to get to in a moment, is simple. We want some number to help us estimate, on average, how far individuals' weights get from the sample mean of 77.5 kilograms. Again, the mean of these was 77.5; some people went below, some people went far above. We want an estimate that tells us, on average, how far individuals' weights move from that mean.

So for now, let's just write down the formula and build it up, and we'll start to talk about it. To do so, we want to think about how far the first individual's weight of 50 kilograms is from the sample mean of 77.5, and what we're going to do is square that. We'll get into the details of why in a separate video. Then we can add to that how far the second observation is from that sample mean of 77.5, squared, all the way up to the last. So we'll do this for each one.
How far is each individual from that sample mean, squared? Then, if we average all of these, you can see this formula is giving us the average of the squared distances, or deviations: on average, how far an individual gets from the mean, squared. In other words, the average squared deviation. So let's write that down here: the sample variance is giving us the average squared deviation.

Now, one thing to note in the formula is that we actually subtract one from n in the denominator. In that separate video, we'll expand on why we're subtracting one and where that comes from. If we were to work this out, it's going to come out to 317.7 kilograms squared. So again, this says that on average an individual's weight moves 317.7 kilograms squared from that sample mean of 77.5. Of course, this doesn't have much of a meaningful interpretation yet, but when we get to the sample standard deviation, we'll make it a little bit more meaningful.

Now let's simplify this and present it as a statistical formula. We're looking at how far X1, the first observation, is from the sample mean, squared, all the way up to how far Xn, the last observation, is from the sample mean, squared. All of that is divided by n minus 1. With even a bit more notation, we sum, for i going from 1 up to n, X_i minus X-bar squared, and divide by n minus 1.
Again, just a reminder: we don't want to get too caught up in the formula. We should never be calculating this by hand. We're showing it so that we can get a conceptual understanding of what the variance is trying to calculate.

Some things to mention about the variance. First, the units here are kilograms squared; in general, the units are those of our variable X, squared. Second, it's sensitive to outliers, or extreme values. If one of these values, say the 50, were to become 20, that distance would become much larger, and the average squared distance would grow much larger. Third, here we're estimating it for a sample. For a population, we write the population variance using sigma squared. We've talked about this in earlier videos: Greek letters represent population, or true theoretical, values, and Latin letters represent statistics, or estimates from a sample of data.

Now let's get into talking about the sample standard deviation, often just abbreviated SD, which we write with a lowercase s. In notation, it's the square root of the variance: if you take the square root of s squared, you get s, the sample standard deviation. Written out, it's the square root of the sum, for i going from 1 up to n, of X_i minus X-bar squared, divided by n minus 1. If you work that out, it's going to come out to be 17.8 kilograms.

Now I just want to do a quick reminder: we don't want to get caught up in focusing on the formula. I cannot remember the last time that I calculated a standard deviation by hand. Given the data, we can use software to calculate it for us. Still, the formula helps us understand what the standard deviation is and what it's trying to estimate. So let me just write that here. It's not quite this mathematically, but it's pretty close to it, so I'm just gonna say it's approximately the average deviation. Here are the weights of eight individuals, with a sample mean of 77.5 kilograms. Some are far below and some are above. On average, an individual's weight is about 17.8 kilograms from that sample mean weight of 77.5.

Let's draw that in here so we can visualize it. The first observation was 50, which was below the sample mean of 77.5; it's actually 27.5 kilograms below. The weight of 58 kilograms was also below, by 19.5 kilograms. This weight of 70 kilograms is 7.5 below, and the 75 is 2.5 below. The weight of 85 kilograms is 7.5 kilograms above the mean, and the weight of 88 is 10.5 above. The weight of 90 kilograms is 12.5 above, and that weight of 104 kilograms is 26.5 above. So these show all the different deviations, or how far each individual is from the mean.

What the sample standard deviation is trying to capture is the average difference, or average deviation. So conceptually, you can think of it as being the average of these deviations. It's not quite that: we actually square the deviations, average those, and then take the square root of that. But conceptually, it's okay for you to think of the standard deviation as being the average deviation: on average, how far does an individual get from the mean?

Some important notes about the sample standard deviation. It's also sensitive to outliers: an extremely large value creates a large error, or large deviation, and that increases the standard deviation. And finally, if we're talking about the population standard deviation, we write that using sigma.

So again, we have a separate video building these up a bit more and explaining them in a little more detail, including why we subtract one in the bottom there. The final reminder is that you're probably never going to calculate these by hand. So don't get distracted by the formula; focus on the concept of what these measures are trying to estimate, and use the formula to help your understanding. Stick around, guys, because we've got lots more. Hope you guys liked the video. Physics is almost as beautiful as a unicorn.