Measures of Spread & Variability: Range, Variance, SD, etc. | Statistics Tutorial | MarinStatsLectures

MarinStatsLectures-R Programming & Statistics
15 Oct 2019 (11:32)

Summary

TLDR: This educational video introduces measures that quantify how spread out data are. Key concepts include the range, the interquartile range (IQR), and the sample variance and standard deviation. The video emphasizes understanding what these measures capture, not just how to calculate them. It also covers how sensitive each measure is to outliers and what that means in practice for statistical analysis, all in an engaging and accessible tone.

Takeaways

  • 📊 Variability is a key concept in statistics, focusing on how spread out or close observations are to each other.
  • 🔢 The range, calculated by subtracting the minimum from the maximum value, provides a simple measure of variability.
  • 📈 The interquartile range (IQR), which is the difference between the third and first quartiles, measures the spread of the middle 50% of the data and is less sensitive to outliers.
  • 📉 Quartiles divide the dataset into quarters, with the first quartile (Q1) having 25% of observations below it and the third quartile (Q3) having 75% below it.
  • 🧮 Sample variance, denoted as s^2, is the sum of the squared deviations from the sample mean divided by n-1 (roughly the average squared deviation), and it is sensitive to outliers.
  • 📐 Sample standard deviation (SD), the square root of the sample variance, can be thought of as approximately the average deviation from the mean; it is expressed in the original units and is also sensitive to outliers.
  • ✂️ The IQR is often paired with the median as a measure of center, providing a robust estimate of the data's spread and center.
  • 📚 The video emphasizes understanding the concepts behind these measures rather than focusing on the calculations, which are typically done using statistical software.
  • 🔑 Greek letters such as sigma (σ, σ^2) represent population parameters, while Latin letters (s, s^2) represent sample statistics, highlighting the difference between true theoretical values and estimates from data.
  • 💡 The video serves as an introduction to more detailed explanations of these concepts, encouraging viewers to look for further information in subsequent videos.

Q & A

  • What is the range and how is it calculated?

    -The range is a simple measure of variability, calculated as the difference between the maximum and minimum values in a dataset. In the example provided, the range is 104 - 50 = 54 kilograms.
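    As a quick check of the arithmetic, here is a minimal Python sketch (the video itself shows no code). The list `weights` holds the eight weights as they are read out in the transcript (50, 58, 70, 75, 85, 88, 90 and 104 kg); the variable names are illustrative only.

```python
# Weights (kg) of the eight individuals in the example, as read out in the transcript
weights = [50, 58, 70, 75, 85, 88, 90, 104]

smallest = min(weights)
largest = max(weights)
data_range = largest - smallest  # range = maximum - minimum

# Report the range together with the extremes, as the video recommends
print(f"min = {smallest} kg, max = {largest} kg, range = {data_range} kg")
# prints: min = 50 kg, max = 104 kg, range = 54 kg
```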

  • What is the interquartile range (IQR) and what does it represent?

    -The interquartile range (IQR) is the difference between the third quartile (Q3) and the first quartile (Q1), representing the spread of the middle 50% of the data. It is not sensitive to outliers, making it a useful measure when extreme values are present. In the example, Q3 is 89 and Q1 is 64, so the IQR is 89 - 64 = 25 kilograms.
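    Quartiles can be estimated in slightly different ways, as the video notes. The sketch below assumes one common convention, taking Q1 and Q3 as the medians of the lower and upper halves of the sorted data; that assumption happens to reproduce the video's 64 and 89. The helper name `quartiles_by_halves` is hypothetical. For contrast, it also prints Python's `statistics.quantiles`, whose default ("exclusive") method gives different cut points for the same data.

```python
import statistics

weights = [50, 58, 70, 75, 85, 88, 90, 104]

def quartiles_by_halves(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves of the sorted data.

    This is only one of several quartile conventions. For an odd number of
    observations the middle value is left out of both halves.
    """
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two observations")
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[-half:]
    return statistics.median(lower_half), statistics.median(upper_half)

q1, q3 = quartiles_by_halves(weights)
iqr = q3 - q1
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")  # Q1 = 64.0, Q3 = 89.0, IQR = 25.0

# The standard library's default ("exclusive") method uses another convention,
# so its quartiles for the same eight weights differ.
print(statistics.quantiles(weights, n=4))  # [61.0, 80.0, 89.5]
```

    With the library's cut points the IQR would be 89.5 - 61 = 28.5 kg, which is why it matters less to pin down an exact Q1 and Q3 than to understand what the IQR describes.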

  • Why is the range less useful in analytical techniques?

    -The range only considers the maximum and minimum values, which means it is highly sensitive to outliers. This makes it less reliable in analytical techniques, as it doesn’t give a full picture of the data’s variability.

  • How does the interquartile range handle outliers?

    -The interquartile range (IQR) is robust against outliers because it covers only the middle 50% of the ordered data, trimming off the bottom and top quarters of the observations. For example, if the smallest weight were 20 instead of 50, the IQR would stay the same.

  • What is the sample variance and what does it measure?

    -The sample variance is the sum of the squared deviations from the sample mean divided by n-1, so it is roughly the average squared deviation. It gives a sense of how far individual data points are from the sample mean. In the example, the variance is 317.7 kilograms squared.
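    The calculation can be followed step by step in a short Python sketch, assuming the eight weights read out in the transcript. It squares each deviation from the mean, sums the squares, divides by n-1, and cross-checks the result against the standard library's `statistics.variance`, which uses the same n-1 divisor.

```python
import statistics

weights = [50, 58, 70, 75, 85, 88, 90, 104]
n = len(weights)
x_bar = sum(weights) / n  # sample mean, 77.5 kg

# Square each deviation from the mean, then add them up
squared_deviations = [(x - x_bar) ** 2 for x in weights]
sum_of_squares = sum(squared_deviations)

# Divide by n - 1 (not n) to get the sample variance s^2
s_squared = sum_of_squares / (n - 1)

print(f"sum of squared deviations = {sum_of_squares} kg^2")  # 2224.0 kg^2
print(f"s^2 = {s_squared:.1f} kg^2")                         # 317.7 kg^2

# The standard library's sample variance uses the same n - 1 divisor
library_s_squared = statistics.variance(weights)
```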

  • Why is the formula for variance divided by n-1?

    -The formula for variance uses n-1 (where n is the number of observations) to correct for bias in the estimation of population variance from a sample. This correction is known as Bessel’s correction.
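    One way to see the bias that Bessel's correction addresses is a small simulation. The sketch below is an illustration under assumed settings, not something shown in the video: it draws many samples of size 5 from a standard normal population (true variance 1), computes the variance of each sample with both divisors, and averages the results. The divide-by-n version should settle near (n-1)/n = 0.8, while the divide-by-(n-1) version should settle near 1.

```python
import random
import statistics

random.seed(1)   # fixed seed so the run is reproducible
N = 5            # small samples make the bias easy to see
REPS = 10_000    # number of simulated samples
# Population: standard normal distribution, so the true variance sigma^2 is 1.

divide_by_n = []
divide_by_n_minus_1 = []
for _ in range(REPS):
    sample = [random.gauss(0.0, 1.0) for _ in range(N)]
    x_bar = sum(sample) / N
    sum_of_squares = sum((x - x_bar) ** 2 for x in sample)
    divide_by_n.append(sum_of_squares / N)
    divide_by_n_minus_1.append(sum_of_squares / (N - 1))

# Averaged over many samples, dividing by n tends to come out near
# (n - 1) / n = 0.8 of the true value, while dividing by n - 1 lands near 1.
print(f"average with n divisor:     {statistics.mean(divide_by_n):.3f}")
print(f"average with n - 1 divisor: {statistics.mean(divide_by_n_minus_1):.3f}")
```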

  • What are the units of variance and how are they interpreted?

    -The units of variance are the square of the original units of the data (in this case, kilograms squared). While variance provides useful information, the squared units make it harder to interpret, which is why standard deviation is often preferred.

  • What is the sample standard deviation and how is it related to variance?

    -The sample standard deviation is the square root of the variance. It can be thought of as approximately how far, on average, individual data points deviate from the mean. In this example, the standard deviation is 17.8 kilograms; because it is in the original units, it is easier to interpret than the variance.
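    Taking the square root is what moves the result from kilograms squared back to kilograms. A minimal sketch using Python's standard `statistics` and `math` modules, again assuming the transcript's eight weights:

```python
import math
import statistics

weights = [50, 58, 70, 75, 85, 88, 90, 104]

s_squared = statistics.variance(weights)  # sample variance in kg^2 (n - 1 divisor)
s = math.sqrt(s_squared)                  # square root brings it back to kg

print(f"s^2 = {s_squared:.1f} kg^2, s = {s:.1f} kg")  # s^2 = 317.7 kg^2, s = 17.8 kg
```

    In practice `statistics.stdev(weights)` returns the same value directly.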

  • How does the standard deviation handle outliers?

    -Like variance, the standard deviation is sensitive to outliers. Extreme values can cause larger deviations, increasing the overall standard deviation.

  • Why is it important to pair the IQR with the median and variance with the mean?

    -The interquartile range (IQR) should be paired with the median because both are resistant to outliers. Variance and standard deviation, by contrast, are built from deviations around the mean and, like the mean, are sensitive to outliers; they pair naturally with the mean and give the fullest picture of variability in distributions without extreme values.

Outlines

00:00

📊 Introduction to Measures of Variability

The video begins by introducing the concept of variability in statistics: how spread out or close together observations are, building toward the later question of how close an estimate is to the true population value. The presenter encourages viewers to subscribe and uses a simple example, the weights of eight individuals, to illustrate different measures of variability. The range, defined as the difference between the maximum and minimum values, is the first measure discussed; it is simple but of limited use in analytic techniques. The video then introduces the interquartile range (IQR), which measures the spread of the middle 50% of the data, and explains its robustness against outliers. The discussion sets the stage for more detailed exploration of variability measures in subsequent content.

05:01

🔢 Calculating Sample Variance and Standard Deviation

This section covers the sample variance and standard deviation, which estimate how far individual data points typically deviate from the sample mean. The presenter explains the formula for sample variance: each deviation from the mean is squared, the squares are summed, and the sum is divided by n-1 rather than n, a detail that will be explained further in a dedicated video. The sample standard deviation is introduced as the square root of the sample variance, giving a measure of typical deviation in the same units as the data. The presenter also notes the sensitivity of these measures to outliers and the notation used for each: Latin letters (s^2, s) for sample estimates and Greek letters (σ^2, σ) for population values.

10:02

📈 Understanding the Sample Standard Deviation

The final segment focuses on the sample standard deviation, described as approximately the average deviation of individual weights from the sample mean. The presenter offers 'average difference' as a conceptual reading, while clarifying that the standard deviation is actually calculated by squaring the deviations, summing them, dividing by n-1, and taking the square root. The video emphasizes understanding the concept behind the calculation rather than the formula itself. It also notes the sensitivity of the standard deviation to outliers and the difference between sample and population standard deviations (s versus σ). The presenter concludes by reminding viewers that these measures are rarely calculated by hand and encourages them to stay tuned for more videos that explore these concepts in greater depth.
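To see why the standard deviation is only "approximately" the average deviation, the sketch below compares it with the literal average of the absolute deviations, a different measure usually called the mean absolute deviation. That comparison measure is an addition for illustration and is not discussed in the video; the data are the transcript's eight weights.

```python
import statistics

weights = [50, 58, 70, 75, 85, 88, 90, 104]
x_bar = statistics.mean(weights)  # 77.5 kg

# Literal "average deviation": mean of the absolute distances from the mean
mean_abs_dev = sum(abs(x - x_bar) for x in weights) / len(weights)

# Sample SD: square the deviations, divide their sum by n - 1, take the square root
sd = statistics.stdev(weights)

print(f"mean absolute deviation = {mean_abs_dev:.2f} kg")  # 14.25 kg
print(f"sample SD               = {sd:.2f} kg")            # 17.82 kg
```

Squaring gives extra weight to the larger deviations (27.5 and 26.5 kg here), and the n-1 divisor adds a little more, so the SD comes out somewhat larger than the plain average distance, while still being in the same ballpark.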

Keywords

💡Variability

Variability refers to the degree to which data points differ from each other. In the context of the video, it is a central theme as the speaker discusses various measures that quantify how spread out a set of data is. The video aims to help viewers understand how to estimate the variability in a dataset, which is crucial for statistical analysis. For example, the weights of eight individuals ranging from 50 to 104 kilograms illustrate variability.

💡Range

The range is defined as the difference between the maximum and minimum values in a dataset. It provides a simple measure of variability by showing the spread of the data from the highest to the lowest value. In the video, the range of the weights is calculated as 104 minus 50, which equals 54 kilograms, indicating the full span of the data.

💡Interquartile Range (IQR)

The interquartile range is a measure of variability equal to the difference between the third and first quartiles (75th and 25th percentiles) of a dataset. It is highlighted in the video as a way to understand the spread of the middle 50% of the data, making it less sensitive to outliers. In the example, the IQR is 89 minus 64, which equals 25 kilograms.

💡Quartiles

Quartiles divide a dataset into four equal parts, with each part comprising 25% of the data. The video explains that the third quartile is the value below which 75% of the observations fall, while the first quartile is the value below which 25% of the observations fall. These are used to calculate the interquartile range.

💡Sample Variance

Sample variance is a measure that estimates the average of the squared differences from the mean of a dataset. It is denoted as 's^2' in the video and is calculated by summing the squared deviations of each data point from the sample mean and then dividing by the number of observations minus one. The video emphasizes that sample variance is sensitive to outliers and is expressed in squared units.
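Written out for the example (weights of 50, 58, 70, 75, 85, 88, 90 and 104 kg, sample mean 77.5 kg), the formula works out as below. The deviations are the ones read out in the transcript; the intermediate sum of 2224 is computed here and is not stated in the video.

```latex
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
    = \frac{(-27.5)^2 + (-19.5)^2 + (-7.5)^2 + (-2.5)^2 + 7.5^2 + 10.5^2 + 12.5^2 + 26.5^2}{8 - 1}
    = \frac{2224}{7} \approx 317.7\ \text{kg}^2,
\qquad
s = \sqrt{s^2} \approx 17.8\ \text{kg}
```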

💡Sample Standard Deviation (SD)

The sample standard deviation is the square root of the sample variance and is represented by 's'. It can be thought of as approximately the average distance of each data point from the mean, and, like the variance, it is sensitive to outliers. In the video, it is calculated to be approximately 17.8 kilograms, giving a sense of how far the weights typically deviate from the mean of 77.5 kilograms.

💡Outliers

Outliers are data points that are significantly different from other observations, potentially skewing statistical measures. The video discusses how the interquartile range is less sensitive to outliers compared to other measures of variability. An example given is that changing the lowest weight from 50 to 20 would greatly affect the range but not the IQR.
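The 50 to 20 example can be checked directly. This sketch assumes the transcript's eight weights and the median-of-halves quartile convention (one of several; the helper names are hypothetical), and reports the range, IQR and sample SD before and after the change.

```python
import statistics

def iqr_by_halves(values):
    """IQR with Q1 and Q3 taken as medians of the lower and upper halves (one convention)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    return statistics.median(ordered[-half:]) - statistics.median(ordered[:half])

def spread_summary(values):
    """Range, IQR and sample SD (rounded to 0.1 kg) for a list of weights."""
    return {
        "range": max(values) - min(values),
        "IQR": iqr_by_halves(values),
        "SD": round(statistics.stdev(values), 1),
    }

original = [50, 58, 70, 75, 85, 88, 90, 104]
with_outlier = [20, 58, 70, 75, 85, 88, 90, 104]  # lowest weight changed from 50 to 20

print("original:    ", spread_summary(original))
# original:     {'range': 54, 'IQR': 25.0, 'SD': 17.8}
print("with outlier:", spread_summary(with_outlier))
# with outlier: {'range': 84, 'IQR': 25.0, 'SD': 25.8}
```

The range jumps by the full 30 kg and the SD grows noticeably, while the IQR does not move because the changed value sits in the trimmed bottom quarter.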

💡Mean

The mean, often referred to as the average, is the sum of all data points divided by the number of points. It is a measure of central tendency. In the video, the mean weight of the eight individuals is calculated to be 77.5 kilograms, serving as a reference point for calculating the variance and standard deviation.
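The mean and the individual deviations listed in the transcript (27.5 kg below for the 50 kg weight, 26.5 kg above for the 104 kg weight, and so on) can be reproduced with a few lines of Python, assuming the eight weights read out in the video. The last line shows one reason often given for squaring deviations (the video defers its own explanation to a separate video): the raw deviations cancel out and always sum to zero.

```python
weights = [50, 58, 70, 75, 85, 88, 90, 104]
x_bar = sum(weights) / len(weights)  # 620 / 8 = 77.5 kg

deviations = [x - x_bar for x in weights]
for x, d in zip(weights, deviations):
    side = "above" if d > 0 else "below" if d < 0 else "at"
    print(f"{x:>3} kg is {abs(d):4.1f} kg {side} the mean")

# Positive and negative deviations cancel out exactly, so their plain sum is 0
print(sum(deviations))  # 0.0
```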

💡Population Variance

Population variance is the true variance of an entire population and is denoted by the Greek letter sigma squared (σ^2). The video contrasts it with the sample variance (s^2), which is an estimate based on a sample from the population, and explains that σ^2 is the notation to use when referring to the population value.
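The difference between σ^2 and s^2 shows up in the divisor. The sketch below uses a hypothetical scenario that is not from the video: it pretends the eight weights are the entire population of interest, so that Python's `statistics.pvariance` and `statistics.pstdev` (divide by N) apply, and compares them with the sample versions (divide by n-1).

```python
import statistics

# Hypothetical scenario: pretend these eight weights are the ENTIRE population
# of interest, rather than a sample from a larger population.
population = [50, 58, 70, 75, 85, 88, 90, 104]

sigma_squared = statistics.pvariance(population)  # divides by N: 2224 / 8
sigma = statistics.pstdev(population)             # square root of sigma^2

# For comparison, the sample versions divide by n - 1 instead: 2224 / 7
s_squared = statistics.variance(population)
s = statistics.stdev(population)

print(f"sigma^2 = {sigma_squared:.1f} kg^2, sigma = {sigma:.1f} kg")  # 278.0, 16.7
print(f"s^2     = {s_squared:.1f} kg^2, s     = {s:.1f} kg")          # 317.7, 17.8
```

Which formula is appropriate depends on whether the data really are the whole population or a sample from it, not on the software call.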

💡Population Standard Deviation

Population standard deviation is the square root of the population variance and is represented by the Greek letter Sigma (σ). It is mentioned in the video as the true measure of dispersion for an entire population, as opposed to the sample standard deviation, which is an estimate based on a sample.

Highlights

Introduction to measures of variability and spread in statistics.

Focus on understanding concepts rather than calculations.

Definition and calculation of range as a measure of variability.

Importance of reporting maximum and minimum values with range.

Limitations of range in analytic techniques due to sensitivity to outliers.

Explanation of interquartile range (IQR) as a measure of spread.

IQR represents the range of the middle 50% of ordered data, making it less sensitive to outliers.

Quartiles, percentiles, and their estimation methods discussed.

Recommendation to pair IQR with the median as a measure of center.

Introduction to sample variance as a measure of variability.

Formula and conceptual understanding of sample variance.

Explanation of why sample variance is calculated by squaring the deviations.

Discussion on the sensitivity of variance to outliers.

Difference between sample variance and population variance.

Introduction to sample standard deviation as the square root of variance.

Conceptual understanding of sample standard deviation as approximately the average deviation from the mean.

Visualization of individual deviations from the mean in the context of standard deviation.

Emphasis on the importance of understanding the concept over memorizing the formulas.

Final reminder that manual calculation of these measures is not common practice.

Transcripts

00:00

In this video we're gonna talk a little bit about different measures of variability or spread. We're going to go through and show some of the formulas on how to calculate these, although we'd like to focus on the concepts and not the calculations. A quick reminder to subscribe and click on the bell to receive notifications when we upload new videos.

00:20

Statistics is all about variability: estimating how variable observations are. Something we're going to build up to is trying to get an idea of how far or how close an estimate is to the true or population value. So this is something we're building up to; for now we just want to start talking about measures of variability, and we'll get there later on. For this discussion we'll use this simple example here of having the weights in kilograms of eight individuals, 50, 58, all the way up to 104, and I've also drawn those in here along the number line so we can try and visualize some of these measures of spread or variability.

00:57

So let's get to discussing our first one. The first and simplest one is the range. The range is just the maximum, or largest, observation minus the minimum, or the smallest. So in this simple example this is 104 minus 50, which comes out to be 54 kilograms. So the range gives us an idea of the full span of the data: what's the distance between the largest and the smallest. When reporting the range, it's also good to report the maximum and minimum values along with it, and it's worth noting the range is useful as a descriptive measure, but it's often not very useful in analytic techniques.

01:39

So the next measure of spread or variability is called the interquartile range, and while that's a technical-sounding word, we'll break it down and talk about exactly what it is. What this one is, is the third quartile minus the first quartile. If we recall, in previous videos we talked about quartiles, right? The third quartile is the value that has three quarters, or 75%, of observations below it, so in this example here we can work it out to be roughly 89. The first quartile is 64, and again, the first quartile has one quarter, or 25%, of observations below it, so 25% of the observations are below 64. So the interquartile range works out to be 25.

02:25

A few notes on the interquartile range. First is that it's giving us the range of the middle 50% of the ordered data there. In other words, you can think of it as being a trimmed range: we cut off the bottom quarter, we cut off the top quarter, and look at what's the range of the 50% of data sitting in the middle. A note on this is that it is not sensitive to outliers or extreme values. Again, we can see that if this observation of 50 were 20, the IQR, the interquartile range, would still be the same. And again, I want to remind you that when talking about quartiles, percentiles, and quantiles, we talked about there being slightly different ways to estimate these, so let's not get too caught up on the exact calculation of Q1 and Q3, but focus on what the interquartile range is and what it is trying to estimate. And here it's also worth mentioning that if we're using the IQR as our measure of spread or variability, we should pair that with the median as our measure of center.

03:25

So the next measure of variability that we're going to talk about is the sample variance; in notation we're going to write that as little s squared. It's worth noting that we have a separate video that goes into detail explaining the sample variance as well as the sample standard deviation, building up those concepts in much more detail than we're going to do here. So here we're going to introduce the concept and show the formula, and that separate video will break down the parts of it in a lot more detail than we will do here.

03:52

The idea of the sample variance, or the sample standard deviation, which we're going to get to in a moment, is that we want to get some number to help us estimate on average how far individuals' weights get from that sample mean of seventy-seven and a half kilograms. So again, the mean of these was seventy-seven and a half; some people went below, some people went far above, and we want to get some estimate that tells us on average how far individuals' weights move from that mean. So for now let's just write down the formula, build that up, and we'll start to talk about the [...] of it. To do so, we want to think about how far that first individual's weight of 50 kilograms is from the sample mean of 77 and a half, and what we're going to do is square that; we'll get into the details of why that is in a separate video. Then we can add to that how far the second observation is from that sample mean of 77.5, all the way up to the last. So we'll do this for each one: how far is each individual from that sample mean, squared. And then if we average all of these, you can see this formula here is giving us the average of the squared distances or deviations: on average, how far an individual gets from the mean, squared, or the average squared deviation. So let's write that down here: this sample variance is giving us the average squared deviation.

05:20

Now, one thing to note in the formula is that we actually subtract one from this (from the n in the denominator), and in that separate video we'll expand on why we're subtracting one and where that comes from. If we were to work this out, it's going to come out to 317.7 kilograms squared. So again, this is: on average, an individual's weight is moving 317.7 kilograms squared from that sample mean of 77.5. Of course this doesn't have that much of a meaningful interpretation yet, but when we get to the sample standard deviation we'll make it a little bit more meaningful.

05:59

Now let's just simplify this and present it as a statistical formula. Here we're looking at how far x₁, or the first observation, is from the sample mean, squared, all the way up to how far xₙ, the last observation, is from the sample mean, squared, divided by n minus 1. Or, with even a bit more notation, we're going to sum, for i going from 1 up to n, xᵢ minus x̄ squared, divided by n minus 1.

06:30

And again, just a reminder: we don't want to get too caught up in the formula. We should never be calculating this by hand; we're showing it so we can get a conceptual understanding of what the variance is trying to calculate. Some of the things to mention about the variance: the units here are in kilograms squared, or in general, the units of our variable x, squared. It's sensitive to outliers, or extreme values, right? Again, if one of these values, say the 50, were to become 20, that distance is going to become much further, right, and the average squared distance is going to grow much larger. And here we're estimating it for a sample; if we're looking at it for a population, the population variance, we write it using sigma squared. We've talked about this in earlier videos: the use of Greek letters to represent population or true theoretical values, and Latin letters to represent statistics, or sample estimates from a sample of data.

07:32

Now let's get into talking about the sample standard deviation, often just abbreviated SD, which we write with a lowercase s. In terms of formulas, if we want to write this in notation, it's the square root of the variance, here the square root of s squared. If you take the square root of s squared, we get s, the sample standard deviation. And in notation it's the square root of this here, right, the square root of the variance: so the square root of the sum, for i going from 1 up to n, of xᵢ minus x̄ squared, divided by n minus 1. And if you work that out, it's going to come out to be 17.8 kilograms.

08:20

Now, I just want to give a quick reminder: we don't want to get caught up in focusing on the formula. I cannot remember the last time that I calculated a standard deviation by hand; given the data, we can use software to calculate that for us. But this helps us get an understanding of what the standard deviation is and what it's trying to estimate. So let me just write that here. It's not quite this mathematically, but it's pretty close to it, so I'm just gonna say it's approximately the average deviation.

08:51

Here's the weight of eight individuals, with a sample mean of seventy-seven and a half kilograms. Some are moving far below, some are moving above; on average, okay, an individual's weight moves about 17.8 kilograms from that sample mean weight of 77.5. Okay, let's draw that in here so we can visualize it. The first observation was 50, and that was below the sample mean of 77.5; it is actually 27 and a half kilograms below. The weight of 58 kilograms, again, was below, and it was 19 and a half kilograms below the mean. This weight of 70 kilograms is seven and a half below; the 75 is two and a half below; the weight of 85 kilograms is seven and a half kilograms above; the weight of 88 is 10 and a half above the mean; the weight of 90 kilograms is 12 and a half above; and that weight of 104 kilograms is 26 and a half above. So these here are showing all the different deviations, or how far an individual is from the mean.

10:02

What the sample standard deviation is trying to capture is the average difference, or average deviation, okay? So conceptually you can think of it as being the average of these deviations here. It's not quite that: we actually calculate the squares of the deviations, average those, and then take the square root of that. But conceptually it's okay for you to think of the standard deviation as being the average deviation: on average, how far does an individual get from the mean.

10:26

Some important notes about the sample standard deviation: it's also sensitive to outliers, right? So again, if there's an extremely large value, that creates a large error, or large deviation, and that increases the standard deviation. And finally, if we're talking about the population standard deviation, we write that using sigma. Okay, so again, we have a separate video building these up a bit more and explaining them in a little bit more detail, getting into why we subtract one in the bottom there. The final reminder is you're probably never going to calculate these by hand, so don't get distracted by the formula, but focus on the concept of what these are trying to estimate, and use the formula to help your understanding there. Stick around, guys, because we've got lots more. Hope you guys like the video. Physics is almost as beautiful as a unicorn.

Related Tags
Statistics, Variability, Range, Interquartile Range, IQR, Sample Variance, Standard Deviation, Data Analysis, Statistical Concepts, Descriptive Measures