Measures of Variability (Range, Standard Deviation, Variance)
Summary
TLDR: This video script delves into measures of variability, a crucial aspect of descriptive statistics that complements measures of central tendency. It emphasizes the importance of understanding data spread and consistency, using examples like clustered and spread-out datasets, and the choice between two medications with different improvement variabilities. The script introduces the range as a simple measure of variability, highlighting its limitations and the need for more nuanced metrics like standard deviation and variance. These measures provide deeper insights into data distribution, especially in normally distributed datasets, where they can predict the percentage of data points within certain intervals from the mean.
Takeaways
- 📊 Measures of variability are essential in descriptive statistics to understand the dispersion of data, in addition to measures of central tendency like the mean.
- 🌟 Two datasets can have the same mean but different variability, which is crucial for understanding the data's distribution.
- 💊 Variability is important in real-life decisions, such as choosing between medications with similar effectiveness but different consistency.
- 🔢 The range is a simple measure of variability calculated as the difference between the highest and lowest values in a dataset.
- 🚫 A limitation of the range is that it might not fully represent the dataset, especially if there are outliers or if the data is not evenly distributed.
- 📉 Standard deviation is a measure that describes the typical amount by which data points deviate from the mean and is more informative than the range.
- 📚 The standard deviation is particularly useful in understanding normally distributed data, such as height and weight, providing insights into what is common and uncommon.
- 📊 One standard deviation from the mean covers approximately 68% of the data in a normal distribution, two standard deviations cover about 95%, and three cover around 99.7%.
- 🧮 Variance is calculated as the square of the standard deviation and represents the average squared deviation from the mean.
- 📘 The formulas for standard deviation and variance differ for population and sample data, with the population version using the Greek letter Sigma (σ) and the sample version using 's'.
Q & A
Why are measures of variability important in statistics?
-Measures of variability are important because they provide a way to quantify the differences in a dataset, which cannot be captured by measures of central tendency alone, such as the mean. They describe how scores in a dataset differ from one another and can indicate how spread out or clustered the data points are.
What is the difference between the datasets with a mean of 87 in the video example?
-In the video, the top dataset has scores that are very clustered together, indicating low variability, while the bottom dataset has scores that are spread out, indicating high variability. Despite both having the same mean, the distribution of scores is quite different, highlighting the need for measures of variability.
Why might someone choose medication B over medication A in the pharmaceutical example?
-In the pharmaceutical example, even though the mean improvement scores for medications A and B are the same, medication B is chosen because it shows less variability in improvement. This suggests that medication B provides a more consistent effect across patients, which is often a desirable quality in medical treatments.
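As a rough illustration of this comparison, the sketch below uses made-up improvement scores chosen so that both means are 10. The video does not list the actual data, so `medication_a` and `medication_b` are hypothetical, and `pstdev` (which treats the scores as a whole population) is used only to show the ordering of the two spreads.

```python
from statistics import mean, pstdev

# Hypothetical improvement scores; the video does not give the real data.
medication_a = [0, 1, 10, 19, 20]   # some patients improve a lot, others barely at all
medication_b = [8, 9, 10, 11, 12]   # everyone improves by a similar amount

mean_a, mean_b = mean(medication_a), mean(medication_b)
sd_a, sd_b = pstdev(medication_a), pstdev(medication_b)

print(f"A: mean={mean_a}, SD={sd_a:.2f}")
print(f"B: mean={mean_b}, SD={sd_b:.2f}")
```

Both means come out the same, but medication A has the larger standard deviation, which is the inconsistency the answer above describes.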
What is the range and how is it calculated?
-The range is a simple measure of variability that represents the difference between the highest and lowest values in a dataset. It is calculated by subtracting the lowest value (L) from the highest value (H), as shown by the formula R = H - L.
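The formula R = H - L can be sketched in a few lines. Only the mean (87) and the highest and lowest scores (95 and 80, 150 and 25) come from the video; the individual scores in `clustered` and `spread_out`, and the helper name `value_range`, are assumptions made for illustration.

```python
def value_range(scores):
    """Return R = H - L, the highest score minus the lowest."""
    return max(scores) - min(scores)

# Hypothetical datasets: only the mean and the extremes match the video.
clustered = [80, 84, 86, 87, 88, 89, 95]
spread_out = [25, 60, 80, 90, 100, 104, 150]

for name, scores in [("clustered", clustered), ("spread out", spread_out)]:
    mean = sum(scores) / len(scores)
    print(f"{name}: mean={mean:.0f}, range={value_range(scores)}")
```

Reporting the mean together with the range distinguishes the two datasets even though their means are identical.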
What is the limitation of using the range as a measure of variability?
-The limitation of the range is that it only considers the highest and lowest values in a dataset, potentially missing out on other important information about the distribution of the data. It does not account for the distribution of scores in between the extremes.
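To make this limitation concrete, here is a small sketch with two hypothetical datasets that both run from 30 to 150. The values are invented to mirror the shapes described in the video (one evenly spread, one clustered with a couple of outliers), not taken from it.

```python
from statistics import pstdev

# Hypothetical data: both sets run from 30 to 150, so both ranges are 120.
evenly_spread = list(range(30, 151, 10))  # 30, 40, ..., 150
mostly_clustered = [30, 88, 89, 90, 90, 90, 90, 90, 91, 92, 150]  # two outliers

range_even = max(evenly_spread) - min(evenly_spread)
range_clustered = max(mostly_clustered) - min(mostly_clustered)
sd_even = pstdev(evenly_spread)
sd_clustered = pstdev(mostly_clustered)

print(f"evenly spread:    range={range_even}, SD={sd_even:.1f}")
print(f"mostly clustered: range={range_clustered}, SD={sd_clustered:.1f}")
```

The ranges are identical, while the standard deviation is smaller for the clustered set because it takes every score into account.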
What is standard deviation and why is it useful?
-Standard deviation is a measure of variability that describes the typical amount by which scores deviate from the mean. It is useful because it provides a more comprehensive view of the dataset's dispersion than the range. It is particularly informative in normally distributed data, where it can indicate the proportion of data points within certain intervals from the mean.
What does it mean for a dataset to be normally distributed?
-A dataset is normally distributed if it follows a specific bell-shaped curve, often referred to as the normal curve. This distribution is characterized by the mean, median, and mode being the same, and the data points symmetrically distributed around the mean.
How do standard deviations relate to the normal distribution?
-In a normally distributed dataset, standard deviations provide insights into the commonality of data points. For instance, about 68% of data points fall within one standard deviation of the mean, 95% within two standard deviations, and 99.7% within three standard deviations.
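These percentages can be checked numerically. The sketch below assumes the IQ figures quoted in the video (population mean 100, standard deviation 15) and uses `statistics.NormalDist` from the Python standard library (3.8 or later); the variable names are illustrative.

```python
from statistics import NormalDist

# IQ parameters as quoted in the video: population mean 100, SD 15.
iq = NormalDist(mu=100, sigma=15)

coverage = {}
for k in (1, 2, 3):
    low, high = iq.mean - k * iq.stdev, iq.mean + k * iq.stdev
    # Share of the distribution lying between low and high.
    coverage[k] = iq.cdf(high) - iq.cdf(low)
    print(f"within {k} SD ({low:.0f} to {high:.0f}): {coverage[k]:.1%}")
```

The intervals are 85 to 115, 70 to 130, and 55 to 145, matching the IQ example in the transcript.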
What is variance and how is it different from standard deviation?
-Variance is the average squared deviation from the mean. It is calculated as the square of the standard deviation. Unlike standard deviation, which is in the same units as the data, variance is in squared units, making it less intuitive but useful for certain statistical calculations.
What are the formulas for calculating standard deviation and variance in a population?
-The formula for population standard deviation is σ = √(Σ(x - μ)^2 / N), where Σ represents the sum, x is each value in the dataset, μ is the population mean, and N is the number of observations. The formula for population variance is σ^2 = Σ(x - μ)^2 / N, which is the same expression without the square root, so it describes the spread of the data in squared units.
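A minimal from-scratch sketch of the population versions, assuming the standard definitions (variance is the sums of squares SS divided by N, and standard deviation is its square root). The video shows its formulas on screen and defers the details to a later lesson, so the function names and the `scores` data here are hypothetical.

```python
import math

def population_variance(values):
    """sigma^2 = SS / N, where SS is the sum of squared deviations from the mean."""
    n = len(values)
    mu = sum(values) / n
    ss = sum((x - mu) ** 2 for x in values)  # sums of squares (SS)
    return ss / n

def population_sd(values):
    """sigma = sqrt(SS / N)."""
    return math.sqrt(population_variance(values))

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, not from the video
print(population_variance(scores), population_sd(scores))
```

For these scores the mean is 5 and SS is 32, so the variance is 4 (in squared units) and the standard deviation is 2 (in the original units).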
What are the differences between population and sample standard deviation formulas?
-The population standard deviation formula divides by the total number of observations (N), while the sample standard deviation formula divides by the number of observations minus one (N-1). Dividing by N-1 compensates for the fact that deviations measured from the sample mean tend to understate the spread in the population the sample was drawn from.
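The effect of the two divisors can be seen with the standard library's `statistics` module, where `pstdev` and `pvariance` divide by N and `stdev` and `variance` divide by N-1. The `sample` values below are hypothetical.

```python
from statistics import pstdev, pvariance, stdev, variance

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, not from the video

pop_var = pvariance(sample)   # SS / N
samp_var = variance(sample)   # SS / (N - 1)
pop_sd = pstdev(sample)       # sqrt(SS / N)
samp_sd = stdev(sample)       # sqrt(SS / (N - 1))

print(f"population: variance={pop_var:.3f}, SD={pop_sd:.3f}")
print(f"sample:     variance={samp_var:.3f}, SD={samp_sd:.3f}")
```

Because the divisor is smaller, the sample versions are always slightly larger than the population versions for the same data, and the gap shrinks as N grows.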
Outlines
📊 Introduction to Measures of Variability
This paragraph introduces the concept of measures of variability in descriptive statistics, emphasizing their importance alongside measures of central tendency. The speaker uses two datasets with the same mean to demonstrate how variability can significantly differ between datasets, thus affecting the interpretation of data. The top dataset shows scores clustered closely together, while the bottom dataset shows scores spread out widely. The speaker also discusses a real-life scenario involving the selection of medications for depression, where the variability in effectiveness is a crucial factor in decision-making. The paragraph concludes by introducing three measures of variability: range, standard deviation, and variance, with the range being the simplest to calculate.
🔍 Exploring Standard Deviation and Variance
The second paragraph delves into the significance of standard deviation, which quantifies the average amount by which data points deviate from the mean. The speaker explains that standard deviations are particularly useful for understanding normally distributed data, such as height and weight. It is highlighted that standard deviations can indicate what is common and uncommon within a dataset. For instance, in a normal distribution, approximately 68% of data points fall within one standard deviation of the mean, and 99.7% fall within three standard deviations. The speaker uses IQ scores as an example to illustrate how knowing the mean and standard deviation can provide insights into the distribution of intelligence. The paragraph also introduces the concept of variance, which is the square of the standard deviation, representing the average squared deviation from the mean. The speaker mentions that formulas for calculating standard deviation and variance will be covered in subsequent videos.
Keywords
💡Measures of Variability
💡Descriptive Statistics
💡Mean
💡Range
💡Standard Deviation
💡Variance
💡Population Standard Deviation
💡Sample Standard Deviation
💡Sums of Squares (SS)
💡Normal Distribution
Highlights
Measures of variability are essential for understanding data beyond central tendency.
Datasets with the same mean can have vastly different distributions, necessitating measures of variability.
The range is a simple measure of variability that calculates the difference between the highest and lowest values in a dataset.
The range can be quickly calculated but may not fully represent the dataset's variability due to outliers.
Standard deviation quantifies the average amount of variation from the mean, providing a more nuanced view of data spread.
Variance is the square of standard deviation, representing the average squared deviation from the mean.
In a normal distribution, standard deviations help to identify what is common and uncommon within a dataset.
68% of data points fall within one standard deviation from the mean in a normal distribution.
95% of data points are within two standard deviations of the mean in a normal distribution, so values outside that interval are uncommon.
99.7% of data points lie within three standard deviations from the mean, so values beyond that are extremely rare.
Standard deviations provide a framework for understanding the rarity or commonality of data points, such as IQ scores.
Knowing the standard deviation of a dataset can help in making informed decisions, such as choosing between medications with different variability profiles.
The video will cover the calculation of standard deviation and variance in upcoming lessons, providing tools for deeper data analysis.
Population standard deviation is denoted by the lowercase Greek letter sigma (σ), while sample standard deviation is denoted by 's'.
Population variance is represented as sigma squared (σ²), and sample variance as 's²', indicating the average squared deviation from the mean.
The sums of squares (SS) is a key component in the formulas for calculating standard deviation and variance, to be explained in more detail in future videos.
Transcripts
in this video we're going to learn about
measures of variability another form of
descriptive statistics that people often
want to know in addition to measures of
central tendency but before we get to
any of the nitty-gritty details I wanted
to kind of motivate why we need measures
of variability with two examples so
here's two different datasets one on the
top and one on the bottom I'll just go
ahead and tell you that the mean for
both datasets is 87 now if I were to
just tell you the mean of these data I
would be misleading you a little bit
because in reality the situation in each
dataset is quite different if I were to
plot it out for example you would see
this difference clearly in the top
dataset all the scores are very
clustered together everything is close
but in the bottom dataset scores are
very spread out so again I need some way
to quantify these differences and a
measure of central tendency like the
mean simply can't capture that alone
here's another example let's say you're
working for a pharmaceutical company
something like that and you need to
decide between two different medications
for depression we'll call them
medication a and medication B so let's
say you did a study where you measured
how much improvement happened when
people took one over the other and this
is what you got so let's say over here
that higher scores mean you know more
improvement and lower scores mean little
to no improvement well let's kind of
compare the means in this case are the
same in both cases people improved by
about 10-ish points or so but the
variability is very different on the
left some people benefited very greatly
whereas others really didn't benefit at
all but on the right everyone benefits a
good amount in this case I would
personally pick medication B because
it's more consistent and so this is an
example of why knowing the variability
might help us to you know make some
real-life decisions so in general in
statistics measures of variability are
ways to describe these differences
statistically they describe how scores
in a given data set differ from one
another and they capture things like how
spread out or how clustered together the
points are things we've been looking at
already so there are three that we're
going to talk about we have the range
standard deviation and variance let's
start with the range
the range is nice because it's a really
simple measure of variability of
dispersion of how spread out points are
it can often be calculated in five or
ten seconds here's the formula so we
have the range R don't get confused
later on when we learn about
correlations which are often also
described by r we'll use some
different subscripts to make that
difference clear when the time comes
but for now range is R and then we have
R equals H minus L H means the highest
score in the data set L means the lowest
score in the data set so you can see
that this is a very simple calculation
and if we go back to the example we were
working with a minute ago we can
calculate the range very quickly so for
the first data set we have 95 minus 80
so the range is 15 and in the second
data set we have a hundred and fifty
minus 25 giving us a much larger range
of a hundred and twenty-five so in this
case I would do well to kind of report
both to you I'll tell you the mean and
this measure of variability because that
gives you a more full picture of what's
going on so a mean of 87 and a range of
15 describes a very different situation
compared to a mean of 87 and a range of
125 so again it's a great idea for me to
report both and this is what's often
done a big limitation of the range
though is that by using it even though
it's simple and it's pretty effective
you might miss a little bit of the data
a little bit of the information in your
data set and let me show you an example
to illustrate here's the data set here
although these bars are quite high
there's really just one sort of value in
each bar so we have one person who
scored a thirty one person who scored a
forty and so on
and our range here is 120 it's a
hundred fifty minus thirty but let's
look at a second data set in this case
the range is still one hundred and
twenty because our highest and lowest
values are the same but everybody's kind
of over here and there's just a couple
outliers beyond that so again if I were
to just tell you the range I might be
misleading you a little bit because
you're not sure if it looks like this on
the left or if the data looks like this
on the right and this is where standard
deviation and variance come into play
standard deviation just like the name
suggests describes the standard or
typical amount that scores deviate from
the mean hence standard deviation now
we'll get into exactly what this looks
like once we learn to calculate standard
deviation but I just want to show you
some symbols for now so like with means
we have different symbols to describe
population standard deviation versus
sample standard deviation population
standard deviation is described by Sigma
this sort of O with Elvis hair I like
to think of it as not to be confused
with this Sigma which is a capital S
unfortunately they're named the same
thing which means take the sum of we
learned about that previously this is
Sigma with a little s so for a sample
standard deviation is simply described
by s so I want to take a step back and
talk about why standard deviations are
really useful whenever you have a normal
curve a normally distributed set of data
which is very common in the world things
like height weight and so on are all
normally distributed standard deviations
have this really interesting property of
telling you a lot of information about
what's common and what's uncommon so if
we have zero this is right at the mean
of whatever we're talking about right
this is the mean 0 standard deviations
away from the mean is right here you're
right at the mean we can look at one
standard deviation above the mean and
one standard deviation below and we
automatically know just because of how
standard deviations work that 68% of
people will fall within this range we
can go beyond that we know that between
two standard deviations in either
direction of the mean ninety-five
percent of people will be contained and
three you're getting really extreme
really far out really rare ninety-nine
point seven percent of the data will be
contained within three standard
deviations in either direction from the
mean to illustrate this a little bit
more let's talk some specifics so let's
say I'm looking at IQ scores we know a
lot about IQ scores we know for example
the population mean of IQ is 100 and we
know that the population standard
deviation Sigma is 15 so let's go ahead
and draw that same sort of normal
curve we know that intelligence is
normally distributed and let's kind of
take a look at what information we have
just by knowing standard deviation so
average IQ is right here at a hundred
one standard deviation above the mean
would be 115 two standard deviations
above the mean would be 130 and three
standard deviations would be a hundred
and forty five and we could do the same
in the opposite direction one standard
deviation below the mean of intelligence
is eighty five two standard deviations
below is seventy and three standard
deviations below the mean of
intelligence is fifty five so again I
automatically know 68% of people will
fall between an IQ of 85 and 115 I also
know that ninety-five percent of people
will fall between an IQ of 70 and 130
and finally that ninety-nine point seven
or so will fall between an IQ of 55 and
145 so this is great to know because if
you tell me you have an IQ of 146 I'm
really impressed this is rare this is
very extreme but if you tell me you have
an IQ of say you know a hundred and six
something like that you know that's fine
good for you not very impressed right
so knowing standard deviations helps you
to kind of get this extra information
about a data set so finally we have
variance variance is very simple it's
just the square of standard deviation so
it's the average squared deviation from
the mean
unfortunately for variance it doesn't
get its own symbols we just take the
symbols we already have for standard
deviation and we put a squared because
it's just a squared standard deviation
so here for a population we would call
the variance in a population Sigma
squared and for a sample we would call
the sample variance s squared so in the
next video we'll learn how to calculate
some of these things but I want to at
least highlight some of the formulas
you're going to see so we have four
different formulas because we have
standard deviation and variance and we
have the population versions and the
statistic or sample versions so for
standard deviation in the population
this is our formula
notice we have Sigma on the left and we
have all this mess which I'll get into
next time one thing I'll mention is that
for all of these formulas the numerator
is called the sums of squares SS and
we're gonna learn about what the sums of
squares really means in the next video
but for now just keep that in mind so
for our sample statistic we have this
you're gonna see an S on the left here
and it's gonna have some similarities
but you're gonna notice a difference or
two that we'll talk about in the next
video
for variance we have Sigma squared and
for sample statistic version of variance
we have s squared