Describing Distributions: Center, Spread & Shape | Statistics Tutorial | MarinStatsLectures

MarinStatsLectures-R Programming & Statistics
10 Sept 2019, 07:54

Summary

TLDR: This script describes, in words, the shape, center, and spread of a numeric variable's distribution. It introduces symmetry and skewness, using examples to illustrate symmetric, skewed-right, and skewed-left distributions. It also stresses the value of anticipating the distribution shape of variables such as income, height, and class grades before collecting data. Finally, it sets the stage for later videos on quantitative measures such as the mean, median, standard deviation, and interquartile range.

Takeaways

  • 📊 Describing a numeric variable's distribution involves discussing its shape, center, and spread.
  • 🔍 Histograms and box plots are useful for summarizing the distribution of numeric variables visually.
  • 📈 The shape of a distribution can be symmetric or skewed; a skewed distribution has a long tail to the right (positive skew) or to the left (negative skew).
  • 📚 A normal distribution is symmetric and bell-shaped, and it is often used to model natural phenomena.
  • 🤔 When considering the distribution of variables like income, height, or grades, it's helpful to predict their shape before data collection.
  • 💼 Income distributions are often skewed right due to a lower bound and a long tail towards higher values.
  • 🚹 Adult height distributions tend to be more symmetric and bell-shaped, with people roughly evenly distributed above and below the average height.
  • 📚 Grade distributions are typically skewed left, or negatively skewed: they are capped at 100, most scores sit well above 50, and a few very low scores form a long tail towards zero.
  • 📐 Measures of center (mean, median) and, more generally, measures of location (minimum, maximum, quartiles, percentiles) describe where the data lie.
  • 📉 Measures of spread or variability, like standard deviation, variance, and interquartile range, quantify how spread out the data is.
  • 🔢 Quantitative descriptions of center and spread will be explored in more detail in subsequent videos, moving beyond qualitative descriptors.

Q & A

  • What are the two main characteristics of a distribution that are verbally described?

    -The two main characteristics of a distribution that are verbally described are the shape of the distribution and the center and spread of the data.

  • What does a symmetric distribution imply about the data?

    -A symmetric distribution implies that the data is evenly distributed around the center point, with both sides mirroring each other.

  • What is the term used to describe a distribution that is not symmetric and tails out to one side?

    -A distribution that is not symmetric and tails out to one side is described as 'skewed.'

  • How is a distribution that tails out towards the right side characterized?

    -A distribution that tails out towards the right side is characterized as 'positively skewed' or 'skewed right.'

  • What is meant by a 'normal' or 'bell-shaped' distribution?

    -A 'normal' or 'bell-shaped' distribution refers to a symmetric distribution that decreases evenly on either side of the center, resembling the shape of a bell.

  • What is the expected shape of the distribution for individual incomes?

    -The expected shape of the distribution for individual incomes is often 'skewed right': most incomes cluster just above a lower bound of zero, while a small number of very high earners form a long tail towards larger values.

  • Why are adult heights often symmetrically distributed?

    -Adult heights are often symmetrically distributed because, once people stop growing, there is an average height and people are roughly evenly distributed above and below it.

  • What is the typical distribution shape for class grades, and why?

    -Class grades typically have a 'negatively skewed' or 'skewed left' distribution because grades are bounded between 0 and 100 and class averages usually fall around 70 to 80: most scores bunch up below the cap of 100, while a few very low grades form a long tail towards zero.

  • What are some measures of location that can be verbally described for a distribution?

    -Measures of location that can be verbally described for a distribution include the mean, median, maximum, minimum, quartiles, and percentiles (also called quantiles).

  • How is the spread or variability of a distribution verbally described?

    -The spread or variability of a distribution is verbally described by terms like 'spread out,' 'variable,' 'tight,' or 'concentrated,' based on how the data points are dispersed around the center.

  • What are some quantitative measures that will be used to describe the center and spread of a distribution in future videos?

    -Some quantitative measures that will be used to describe the center and spread of a distribution include mean, median, standard deviation, variance, and interquartile range.

Outlines

00:00

📊 Describing Distribution Shapes

This paragraph introduces the concept of verbally describing the shape, center, and spread of a distribution for a numeric variable. It emphasizes the importance of understanding different types of distributions through visualizations like histograms and box plots. The speaker provides examples of symmetric and skewed distributions, explaining the characteristics of each. The symmetric distributions are described as evenly distributed around a center, with one example resembling a normal distribution. Skewed distributions are further classified into right-skewed (positively skewed) and left-skewed (negatively skewed), based on the direction of the tail. The speaker also encourages the audience to consider the expected distribution shapes for variables such as income, height, and class grades before analyzing the data.

05:02

🔍 Exploring Distribution Descriptors and Measures

The second paragraph delves deeper into the descriptive language used to characterize distributions, such as 'exponentially distributed.' It also introduces measures of location and dispersion, which are crucial for understanding the distribution's central tendency and variability. The speaker discusses the concept of the distribution's center, using descriptive terms to pinpoint where the center might be for different examples. Furthermore, the paragraph touches on measures of location like maximum, minimum, quartiles, and percentiles, which were previously introduced in the context of box plots. The discussion on measures of spread includes the interquartile range and range, setting the stage for future quantitative measures like standard deviation and variance. The speaker concludes by hinting at upcoming videos that will provide more quantitative methods for analyzing the center and spread of distributions.

Keywords

💡Distribution

A distribution in statistics refers to the arrangement of values in a dataset, often depicted through graphical representations like histograms or box plots. In the video, the speaker discusses how to describe the shape of a distribution for a numeric variable, emphasizing the importance of understanding whether it is symmetric or skewed. The concept is central to the video's theme as it lays the groundwork for more detailed statistical analysis.

💡Symmetric Distribution

A symmetric distribution is one where the values are evenly distributed around a central point, typically the mean or median. The video uses the term to describe a distribution that is balanced, with no tailing off to one side more than the other. This is exemplified by the first example in the script, where the speaker notes a 'nice symmetric distribution' resembling a bell curve.

💡Skewed Distribution

A skewed distribution is uneven, with the majority of values clustering on one side of the center and the tail extending towards the other side. The video explains two types of skew: right-skewed (positively skewed) and left-skewed (negatively skewed). The script provides examples of each, noting that a right-skewed distribution tails off towards higher values, while a left-skewed one tails off towards lower values.
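
The video describes skew only by eye. One common way to put a number on it, which the video does not cover and which is swapped in here as an assumption, is the Fisher-Pearson moment coefficient of skewness, g1 = m3 / m2^(3/2), where m2 and m3 are the second and third central moments. Its sign follows the same convention as the words: positive for a long right tail, negative for a long left tail. The function name and the income and grade values below are made up for illustration.

```python
from statistics import fmean


def moment_skewness(values):
    """Fisher-Pearson moment coefficient of skewness, g1 = m3 / m2**1.5.

    Positive -> long right tail (skewed right); negative -> long left tail
    (skewed left); zero -> no net skew (e.g. perfectly symmetric data).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("all values are identical; skewness is undefined")
    return m3 / m2 ** 1.5


# Hypothetical, made-up data (not from the video).
incomes = [18, 25, 32, 40, 45, 52, 60, 75, 150, 900]   # thousands per year
grades = [12, 35, 58, 66, 72, 75, 78, 81, 84, 88, 91]  # percentages, capped at 100
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]

print(round(moment_skewness(incomes), 2))    # positive: skewed right
print(round(moment_skewness(grades), 2))     # negative: skewed left
print(round(moment_skewness(symmetric), 2))  # 0.0: symmetric
```

Other software may report an adjusted version of this coefficient for small samples, but the sign, and therefore the skewed-right or skewed-left reading, is the same.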

💡Normal Distribution

The normal distribution, also known as the Gaussian distribution, is a symmetric distribution that is often represented by a bell-shaped curve. The video mentions this concept when describing a symmetric distribution that looks 'sort of normal.' It's a fundamental concept in statistics, used to model various natural and social phenomena.
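
As a small sketch of the bell-curve properties mentioned here, Python's standard-library `statistics.NormalDist` can check that the density is mirrored around the mean, that the mean and median coincide, and that roughly 68% of values fall within one standard deviation of the mean. The height parameters (mean 170 cm, standard deviation 10 cm) are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical model: adult heights ~ Normal(mean=170 cm, sd=10 cm).
heights = NormalDist(mu=170, sigma=10)

# Symmetry: the density is equal at the same distance above and below the mean.
for d in (5, 10, 20):
    below = heights.pdf(170 - d)
    above = heights.pdf(170 + d)
    print(d, round(below, 6), round(above, 6))

# For a normal distribution the mean and the median are the same point.
print(heights.mean, heights.median)

# Share of the distribution within one standard deviation of the mean.
within_one_sd = heights.cdf(180) - heights.cdf(160)
print(round(within_one_sd, 4))
```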

💡Uniform Distribution

A uniform distribution is a type of symmetric distribution where all values within a range are equally likely to occur. The speaker refers to this concept when discussing the second example, noting that it is 'evenly or uniformly distributed' around its center. This implies that there is no skewness and each value has the same probability of occurring.

💡Center of Distribution

The center of a distribution is a measure that describes the middle or average location of the data points. The video discusses the center in relation to the shape of the distribution, using descriptive terms like 'roughly there' to indicate where the center might be. The concept is crucial for understanding the overall tendency of the data.
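
The video locates the center by eye and defers the mean and median to later videos. As a preview using made-up data, the sketch below compares the two on a skewed-right set of incomes: a single very large value drags the mean towards the long tail, while the median stays with the bulk of the data.

```python
from statistics import mean, median

# Made-up incomes in thousands per year: most cluster at the low end,
# and one very large value forms a long right tail.
incomes = [22, 30, 35, 41, 48, 55, 63, 2000]

print(mean(incomes))    # 286.75, pulled towards the long right tail
print(median(incomes))  # 44.5, stays with the bulk of the data

# Drop the single extreme value and compare again.
trimmed = incomes[:-1]
print(mean(trimmed), median(trimmed))  # 42 41
```

Removing one extreme value moves the mean from 286.75 to 42 but the median only from 44.5 to 41, which is one reason the median is often preferred for skewed variables such as income.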

💡Spread

Spread, or variability, refers to the extent to which data points are dispersed around the center of the distribution. The video contrasts two examples to illustrate different levels of spread, noting that one is 'much more spread out' than the other. Understanding spread is essential for gauging the consistency or variability within a dataset.
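
To make the comparison of examples A and B concrete, here is a sketch with two invented samples that share the same center but differ in spread. It uses the range (maximum minus minimum) and the sample standard deviation from Python's `statistics` module; the numbers are illustrative only, not the data behind the video's plots.

```python
from statistics import mean, median, stdev

# Two made-up samples with the same center but different spread,
# loosely mirroring examples A and B from the video.
a = [48, 49, 50, 50, 50, 51, 52]
b = [20, 35, 45, 50, 55, 65, 80]

for name, data in (("A", a), ("B", b)):
    data_range = max(data) - min(data)  # span from minimum to maximum
    print(name, mean(data), median(data), data_range, round(stdev(data), 2))
```

Both samples have mean and median 50, yet B's range and standard deviation are far larger, which is the "more spread out" reading put into numbers.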

💡Quantitative Description

Quantitative description involves numerically measuring and summarizing the properties of a dataset. The script mentions that after discussing descriptive labels for the shape of distributions, the video will delve into more quantitative or numerical descriptions. This includes measures like mean, median, standard deviation, and variance, which provide precise information about the center and spread of the data.

💡Histogram

A histogram is a graphical representation used to show the distribution of data by way of bars, with each bar representing the frequency of data points within a range of values. The video script refers to histograms as a plot type that summarizes the distribution for a numeric variable, emphasizing their utility in visualizing the shape of the data.
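
A histogram is essentially a set of counts over equal-width bins. The helper below, `histogram_counts`, is a hypothetical name for a minimal sketch of that binning step (plotting libraries do this for you). It shows how evenly spread values give flat, uniform-looking bars, while values piled up at the low end give a skewed-right shape.

```python
def histogram_counts(values, low, high, n_bins):
    """Count how many values fall in each of n_bins equal-width bins on [low, high].

    A value equal to `high` goes into the last bin so the right edge is included.
    """
    if n_bins < 1 or high <= low:
        raise ValueError("need n_bins >= 1 and high > low")
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for x in values:
        if not low <= x <= high:
            continue  # ignore values outside the plotted range
        i = min(int((x - low) / width), n_bins - 1)
        counts[i] += 1
    return counts


# Evenly spread values: equal bar heights (a uniform shape).
print(histogram_counts(range(100), 0, 100, 5))
# Values piled at the low end with a long right tail: a skewed-right shape.
print(histogram_counts([1, 2, 3, 5, 8, 12, 15, 22, 40, 95], 0, 100, 5))
```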

💡Box Plot

A box plot, or box-and-whisker plot, is another graphical representation that provides a summary of the distribution of data points. The video mentions box plots in the context of summarizing the distribution and discussing measures of location such as the median and quartiles. It's a useful tool for quickly assessing the central tendency, spread, and skewness of a dataset.
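
A box plot is drawn from the five-number summary (minimum, first quartile, median, third quartile, maximum). The sketch below computes it with `statistics.quantiles`, assuming the `"inclusive"` method; quartile and percentile definitions differ slightly between methods, so other software may report slightly different Q1 and Q3. The 1.5 × IQR outlier rule is a widely used convention assumed here rather than something stated in the video, and the grade values are made up.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max), the values a box plot is drawn from.

    Uses statistics.quantiles with method="inclusive"; the default
    "exclusive" method can give slightly different Q1 and Q3.
    """
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)


grades = [12, 35, 58, 66, 72, 75, 78, 81, 84, 88, 91]  # made-up percentages
lo, q1, med, q3, hi = five_number_summary(grades)
iqr = q3 - q1  # interquartile range: width of the box

# Assumed convention: flag points more than 1.5 * IQR beyond the box.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [g for g in grades if g < lower_fence or g > upper_fence]
print(lo, q1, med, q3, hi, iqr, outliers)
```

Here the single very low grade is flagged, which is how the long left tail of a negatively skewed grade distribution typically shows up on a box plot.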

💡Standard Deviation

Standard deviation is a measure of the amount of variation or dispersion in a set of values. A low standard deviation indicates that the values tend to be close to the mean, while a high standard deviation indicates greater variability. The video script alludes to standard deviation as a way to quantify the spread or variability of a distribution, which is a key aspect of statistical analysis.
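
Standard deviation is covered formally in a later video. As a hedged preview, the sketch below computes the sample variance (the sum of squared deviations from the mean, divided by n − 1), takes its square root, and checks the result against `statistics.variance` and `statistics.stdev`, which use the same n − 1 convention. The data values are arbitrary.

```python
import math
from statistics import stdev, variance


def sample_variance(values):
    """Sample variance: sum of squared deviations from the mean over n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / (n - 1)


def sample_sd(values):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(values))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(data), sample_sd(data))
print(variance(data), stdev(data))  # standard-library versions for comparison
```

For a population rather than a sample, `statistics.pvariance` and `statistics.pstdev` divide by n instead of n − 1.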

Highlights

Introduction to verbally describing the shape, center, and spread of a distribution for a numeric variable.

Reminder to subscribe and click on the bell for notifications on new videos.

Discussion on different plots like histograms and box plots for summarizing distributions.

Explanation of verbally describing shapes as symmetric or skewed.

Description of a normal distribution and its characteristics.

Identification of a uniform distribution and its symmetric nature.

Definition and example of a positively skewed distribution.

Definition and example of a negatively skewed distribution.

Importance of considering expected distribution shapes before data collection.

Expected distribution shape for individual incomes, often skewed right.

Expected distribution shape for adult heights, typically symmetric and bell-shaped.

Expected distribution shape for class grades, often skewed left.

Introduction to more descriptive words for distribution shapes, such as exponentially distributed.

Discussion on measures of location, including mean, median, and percentiles.

Introduction to measures of spread or variability, such as standard deviation and variance.

Explanation of the interquartile range as a measure of spread.

Anticipation of upcoming videos that will quantify center and spread more formally.

Encouragement for viewers to stay tuned for more statistics content.

Transcripts

00:00

But let's talk a little bit about verbally describing the shape, as well as the center and spread, of a distribution for a numeric variable. A quick reminder to subscribe and click on the bell to receive notifications when we upload new videos.

00:15

So we've already talked a little bit about different plots we can make, like a histogram or box plot, and how they summarize the distribution for a numeric variable. But let's start by first verbally describing the shapes we see, as well as center and spread, and then in following videos we'll get to describing some of these things more quantitatively, or numerically.

00:36

So here I've drawn four examples, A, B, C, and D. They're sort of artificial, textbook, very nice and neat examples, again to make the discussion easy for now. So first let's go through each of these and attach a qualitative, or descriptive, label to the shape. What we want to think about is: is the distribution symmetric or skewed?

01:01

Okay, so looking at this first one here, it looks like a nice symmetric distribution. By that we mean that if we pick a center point in there, it's roughly evenly, or symmetrically, distributed around that center. And a word we're going to attach to this later on is that it looks sort of normal, like a bell curve or a normal distribution. That's a topic coming up pretty soon; let's add that descriptive label for now.

01:28

Now this second example here looks fairly symmetric, right? If we look at the center, it looks roughly there, and the data look pretty evenly, or symmetrically, distributed around that. So let's add that word here: it looks pretty symmetrically distributed around a center. And this is one that is later going to get called uniform: evenly, or uniformly, distributed. So it's symmetric, but rather than being bell-shaped and decreasing, it's fairly evenly distributed around its center.

02:01

Then these two here look like what we call skewed. They're not symmetric; they kind of tail out strongly to one side. This one here is skewed, and it's skewed to the right.

02:13

This is what also gets called positively skewed. Now, the terminology can be a little bit confusing at first, but we say it's skewed in the direction where it tails out, where the long tail is. So this is skewed towards the right side, towards the positive or increasing numbers, and we'd call this skewed right. This one here again looks a bit skewed, and it's tailing out towards the left, so we'd say it's skewed to the left, or it can be called negatively skewed.

02:46

Now, often it's good to try and think about the shape you'd expect for the distribution of a variable when taking a sample, before collecting or exploring the data. So I'm going to give you three examples to think about; let me mention those. Suppose we take a sample and record the incomes of individuals, or we record the heights of an adult population, or maybe we record class grades, reported as percentages, for a class. It's good to think about what shape you would expect for the distribution of these variables before collecting them. So take a moment to think about that, and then I'll get into talking about what shape I would expect them to have.

03:34

When we collect incomes of individuals, these often have sort of a skewed-right distribution; that's usually what we'd expect. This often happens when there's a lower bound. Incomes tend to clump somewhere between zero and maybe fifty or a hundred thousand, maybe 150 thousand for those who are getting paid a little bit nicer, but then the distribution tails out to the right: there are people making five hundred thousand, two million, ten million, 20 million a year. There are fewer of them, and that's why it tails down, but incomes often tend to have skewed-right distributions.

04:09

If we think of heights of adults, so once people are no longer growing, they tend to have more symmetric distributions. There's an average (or mean, or median) height, and people are somewhat evenly distributed above and below that, so the distribution often tends to be a bit more normal, or bell-shaped.

04:26

If we think about class grades, now this is a tricky one. People often think normal, or symmetric, but grades are actually usually skewed to the left, or negatively skewed. The reason that happens is that grades are bounded between 0 and 100. The class average depends on how well your class goes, but it's usually in the 70 to 80 range, so the average is definitely usually above 50. So there's usually some average around here, the grades are capped at 100, and some really low grades tail down towards zero. So the distribution of grades is actually often negatively skewed, or skewed to the left.

05:03

Now, besides symmetric, skewed, skewed right, and skewed left, there are often even more descriptive words that we use, things like "exponentially distributed." You can take a look at this graphic here; it shows a few more examples of other descriptive words we might use to describe the shape of a distribution.

05:23

Aside from describing the shape of the distribution, we also want to think about some measures of location, as well as dispersion, or variability. The first one we want to think about is the center: where is the center of the distribution? Again, just looking descriptively, I would say for this one it looks roughly there, and it looks roughly there for this one. Where is the center for these? Somewhere around here, and somewhere around here. Those are just very subjective judgments.

05:56

In following videos we're going to learn about things like the mean, which is just what most people know as an average, or an arithmetic average; the median; and other measures of center, or, closely related, other measures of location. So things like the maximum and the minimum, or things we mentioned when we learned about box plots, like the first quartile, which cuts the data one quarter below and three quarters above. These are often referred to as measures of location: percentiles, or what also often get called quantiles. The words are interchangeable for the most part; there are some slight differences, but we won't get stuck on that for now.

06:41

We also want to think a bit about measures of spread, or variability. So again, looking at example A and example B here, they've got roughly the same center, so they've got roughly the same mean or median, but we can see that example B is much more spread out than A, or much more variable; that's the word we're going to start to attach. Right now we're just using descriptive words: B is more spread out, or more variable, than A. But we're going to start to quantify these using things like the standard deviation or variance. Those are topics coming up, along with things like the interquartile range, which we've touched on when talking about box plots, or even things as simple as the range: what's the span from the maximum to the minimum? We'll define all of these more formally in a separate video. Rather than using all these kinds of descriptive, qualitative words, we're going to start to quantify center and spread a little bit more.

07:39

Almost as beautiful as a limit going... Stick around, guys, 'cause we got a lot more. My dad is a statistics ninja.

Related tags
Data Distribution, Statistics, Symmetric, Skewed, Normal Curve, Descriptive Labels, Center of Data, Spread of Data, Variability, Quantitative Analysis