Statistics - Module 3 - Mean, Median, Mode, Percentiles and Quartiles - Problem 3-1B

Peter Dalley
13 Aug 2017 18:33

Summary

TLDR: This educational video walks through three statistical measures of central tendency: mean, median, and mode. It explains the arithmetic mean, or average, by calculating it for a simple dataset. It contrasts the mean's sensitivity to extreme values with the robustness of the median, the middle value that splits a sorted dataset into two equal halves. It also introduces the mode as the most frequent value and notes that a dataset can have several modes, in which case it is called multimodal. The presenter emphasizes understanding these basic concepts and their formal notation, because they form the foundation for more complex statistical analyses.

Takeaways

  • 📊 The video discusses three measures of central tendency: mean, median, and mode.
  • 🔢 The arithmetic mean (average) is calculated by summing all data points and dividing by the number of points.
  • 📈 The mean is influenced by the magnitude of data points, unlike the median.
  • 🎯 The median is the middle value of a sorted data set: 50% of observations are greater than or equal to it and 50% are less than or equal to it.
  • 🔠 The mode is the value that occurs most frequently in a data set; a set can be unimodal, bimodal, or multimodal.
  • ✅ The video demonstrates how to calculate mean, median, and mode using a sample of 30 company share prices.
  • 📚 The script emphasizes the importance of understanding the notation and formulas for calculating mean, as they will be foundational for more complex concepts.
  • 💡 The video provides a detailed walkthrough of calculating the mean, including the formula and its notation.
  • 📋 The median is calculated by sorting the data and finding the middle value; with an even number of observations, it is the average of the two middle values.
  • 🔍 The mode is identified by observing the frequency of each value in the sorted data set.
  • 📝 The video concludes by reinforcing the importance of comfort with basic statistical calculations and notation for future learning.

Q & A

  • What are the three measures of central location discussed in the video?

    -The three measures of central location discussed in the video are mean, median, and mode.

  • What is the formula for calculating the arithmetic mean?

    -The formula for calculating the arithmetic mean is the sum of all data points divided by the number of data points.

  • How does the arithmetic mean differ from the median in terms of how it is influenced by the data points?

    -The arithmetic mean is influenced by the magnitude of the data points, while the median is not affected by the size of the values but rather their order.

  • What is the median and how is it found in a dataset?

    -The median is the value that separates the higher half from the lower half of a data sample. It is found by arranging the data points in order and picking the middle value or the average of the two middle values if there is an even number of observations.

  • What is the mode and how is it determined?

    -The mode is the value that occurs most frequently in a data set. It is determined by identifying the observation(s) that repeat the most.

  • Can a dataset have more than one mode? If so, what is it called?

    -Yes, a dataset can have more than one mode. If it has two modes, it is called bimodal, and if it has more than two, it is called multimodal.

  • What is the difference between the mean of a population and the mean of a sample?

    -The mean of a population is calculated using the entire dataset, while the mean of a sample is calculated using a subset of the population.

  • What is the notation used for the mean of a sample?

    -The notation used for the mean of a sample is x-bar (x̄).

  • How is the median calculated when the number of observations is even?

    -When the number of observations is even, the median is calculated by taking the average of the two middle values after the data has been sorted.

  • What is the importance of understanding the notation and formula for calculating the mean?

    -Understanding the notation and formula for calculating the mean is important because it provides a foundation for more complex statistical concepts and calculations that will be encountered in more advanced material.

Outlines

00:00

📊 Introduction to Measures of Central Tendency

The speaker begins by introducing the topic of the video, which is to explore various measures of location, specifically mean, median, and mode. The discussion starts with the mean, often referred to as the average. The arithmetic mean is explained as the sum of data points divided by the number of data points. An example is given using the data points 5, 5, 7, 10, and 14, which when summed and divided by 5, results in an arithmetic mean of 8.2. The speaker highlights that the mean is influenced by the magnitude of the data points, as changing one data point to 146 significantly alters the mean. This is contrasted with the median, which is the middle value when the data points are ordered, and is not affected by the magnitude of the data points. The mode, which is the value that occurs most frequently, is also introduced, and the concept of bimodal and multimodal data sets is briefly mentioned.
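The contrast between the mean and the median can be checked directly. The following is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and not from the video, and the data are the five values used in the example, with 14 replaced by 146 in the second list.

```python
from statistics import mean, median

original = [5, 5, 7, 10, 14]        # data set from the video
with_outlier = [5, 5, 7, 10, 146]   # same data with 14 replaced by 146

# The mean moves with the magnitude of the values ...
print(mean(original))        # 41 / 5
print(mean(with_outlier))    # 173 / 5

# ... while the median stays at the middle observation.
print(median(original))
print(median(with_outlier))
```

With these values the mean rises from 8.2 to 34.6, while the median is 7 in both cases.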

05:03

🔢 Formal Notation for Calculating the Mean

The speaker transitions to a more formal discussion of the arithmetic mean, emphasizing the importance of understanding the notation and formula used in calculations. Using the same data set from the previous example, the speaker assigns identifiers to each data point (x1, x2, x3, x4, x5) and explains that the mean is calculated by summing all observations (x1 through xn) and dividing by the number of observations (n). The generic formula for the mean is introduced, using the summation symbol to represent the sum of all xi values for i from 1 to n. The notation for a sample mean (x bar) is also explained: a sample is a subset of a larger population, and the speaker's example is the share prices of 30 companies from the Dow Jones Industrial Average, which are not the full population of companies.
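Written out, the formula described in this segment would look roughly as follows. This is a reconstruction from the spoken description (the sum of the x_i for i from 1 to n, divided by n), not a copy of what appears on screen.

```latex
\[
\bar{x} \;=\; \frac{1}{n}\sum_{i=1}^{n} x_i \;=\; \frac{x_1 + x_2 + \cdots + x_n}{n}
\]
% For the five observations in the video (n = 5):
\[
\bar{x} \;=\; \frac{5 + 5 + 7 + 10 + 14}{5} \;=\; \frac{41}{5} \;=\; 8.2
\]
```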

10:06

📈 Determining the Mode and Median of Share Prices

The speaker proceeds to an exercise involving the calculation of the mean, median, and mode of share prices for a list of 30 companies. The data is sorted from smallest to largest, which is necessary for finding the median and helpful for spotting the mode, because repeated values end up next to each other. The mode is determined by identifying the observations that repeat. No value appears more than twice, so every value that appears twice is a mode (30, 51, 100, and 132). To find the median, the speaker eliminates observations from both ends until reaching the middle of the data set. Because the number of observations is even, the middle is a pair of observations (77 and 80). The median is then calculated as the average of these two values, resulting in 78.5.
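The manual elimination procedure can be sketched in a few lines of Python. The function name `median_by_trimming` is hypothetical, and the list below is only an illustrative subset of values read out in the video, not the full list of 30 share prices, which the summary does not reproduce.

```python
def median_by_trimming(values):
    """Strike off the smallest and largest value until one or two remain."""
    remaining = sorted(values)
    if not remaining:
        raise ValueError("median of an empty data set is undefined")
    while len(remaining) > 2:
        remaining = remaining[1:-1]   # drop one value from each end
    # One value left (odd count) or a pair left (even count): average them.
    return sum(remaining) / len(remaining)


# Illustrative subset only; chosen so the middle pair is 77 and 80.
prices = [27, 30, 30, 77, 80, 132, 132, 168]
print(median_by_trimming(prices))             # average of 77 and 80
print(median_by_trimming([5, 5, 7, 10, 14]))  # odd count: the single middle value
```

An even number of observations must always end on a pair, which is the check the speaker uses to catch the first, mistaken pass.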

15:06

💹 Calculating the Mean Share Price

Finally, the speaker calculates the mean share price for the 30 companies. The formula for the mean is reiterated, and the speaker demonstrates the calculation by summing all the share prices and dividing by the number of companies (30). The sum of the share prices is found to be 2558, and when divided by 30, the mean share price is stated as 85.26 (2558 / 30 = 85.2666..., which is 85.27 when rounded to two decimal places). The speaker concludes by emphasizing the importance of understanding the notation and formulas for calculating the mean, as the formulas will become more complex in subsequent videos.
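As a worked equation, using the sum reported in the video (the individual prices are not all listed here, so the total of 2558 is taken as given rather than recomputed):

```latex
\[
\bar{x} \;=\; \frac{1}{30}\sum_{i=1}^{30} x_i \;=\; \frac{2558}{30} \;=\; 85.2666\ldots \;\approx\; 85.27
\]
```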

Keywords

💡Arithmetic Mean

The arithmetic mean, often referred to simply as the 'mean' or 'average,' is a measure of central tendency that represents the sum of a set of numbers divided by the count of numbers in the set. In the video, the arithmetic mean is calculated by adding together the data points (5, 5, 7, 10, 14) and then dividing by the number of data points (5), resulting in a mean of 8.2. This value is used to describe the central location of the data set, and it's sensitive to the magnitude of the data points, as changing the value of one data point (like increasing 14 to 146) significantly affects the mean.

💡Median

The median is another measure of central tendency that represents the middle value in a data set when the numbers are arranged in ascending order. If there is an even number of observations, as in the video's example with 30 companies, the median is calculated by averaging the two middle numbers after sorting the data. The video demonstrates this by sorting the share prices and then finding the average of the 15th and 16th values to get a median of 78.5. Unlike the mean, the median is not affected by the magnitude of the extreme values, making it a robust measure of central tendency.
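The positions of the middle value or values in a sorted list depend only on the number of observations. The sketch below illustrates that idea; the helper name `middle_positions` is hypothetical, and this is not necessarily the formula the speaker says will be covered in another video.

```python
def middle_positions(n):
    """Return the 1-based position(s) of the middle value(s) among n sorted observations."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if n % 2 == 1:
        return [(n + 1) // 2]        # odd count: a single middle observation
    return [n // 2, n // 2 + 1]      # even count: a middle pair to average


print(middle_positions(5))    # the five-value example
print(middle_positions(30))   # the 30 share prices
```

For n = 30 this gives positions 15 and 16, matching the pair that is averaged in the video; for n = 5 it gives position 3.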

💡Mode

The mode is the value that appears most frequently in a data set. In the context of the video, the mode is identified by looking for the data points that repeat, with the highest frequency being two occurrences for the values 30, 51, 100, and 132. The video explains that the data set is 'multimodal' because it has multiple modes. This is in contrast to a 'unimodal' distribution, which would have only one mode, and a 'bimodal' distribution, which would have two modes.
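Counting frequencies makes the unimodal, bimodal, and multimodal cases easy to tell apart. The following is a minimal sketch with hypothetical function names (`modes`, `describe`), using the small examples the speaker builds by adding a second 14 and then a second 10.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency, sorted."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


def describe(values):
    """Label a data set by its number of modes, as in the video."""
    k = len(modes(values))
    return {1: "unimodal", 2: "bimodal"}.get(k, "multimodal")


print(modes([5, 5, 7, 10, 14]), describe([5, 5, 7, 10, 14]))
print(modes([5, 5, 7, 10, 14, 14]), describe([5, 5, 7, 10, 14, 14]))
print(modes([5, 5, 7, 10, 10, 14, 14]), describe([5, 5, 7, 10, 10, 14, 14]))
```

One caveat of this sketch: if no value repeats, it reports every value as a mode, whereas many textbooks would say such a data set has no mode.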

💡Central Tendency

Central tendency refers to the center, or typical value, of a data set; measures of central tendency are the statistics that describe it. In the video, central tendency is discussed in relation to the mean, median, and mode, which are all statistical measures used to represent the 'middle' or typical value of a data set. These measures help to summarize the data and provide a sense of where its values are located.

💡Data Points

Data points are individual values within a data set. In the video, data points are the specific numbers that are used to calculate the mean, median, and mode. For example, the data points 5, 5, 7, 10, and 14 are used to calculate the arithmetic mean. Each data point contributes to the overall understanding of the data set's characteristics.

💡Observations

Observations are the individual items or measurements that are part of a data set. The video refers to observations when discussing the calculation of the mean, where each data point is considered an observation. The total number of observations (n) is crucial for calculating the mean, as it is the denominator in the formula for the arithmetic mean.

💡Sample Mean

The sample mean is the mean calculated from a subset of a larger population. In the video, the sample mean is denoted by the symbol 'x bar' and is calculated using the same formula as the arithmetic mean but is specifically used when the data set represents only a part of the whole population. The video emphasizes the importance of understanding the notation and formula for the sample mean, as it is foundational for more complex statistical concepts.
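For comparison, the sample mean and the population mean are computed the same way and differ in notation. The symbols μ and N below follow the usual textbook convention and are an assumption here; the video itself only introduces x bar and n.

```latex
\[
\bar{x} \;=\; \frac{1}{n}\sum_{i=1}^{n} x_i
\quad\text{(sample of } n \text{ observations)}
\qquad
\mu \;=\; \frac{1}{N}\sum_{i=1}^{N} x_i
\quad\text{(population of } N \text{ observations)}
\]
```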

💡Notation

Notation in statistics refers to the symbols and shorthand used to represent mathematical operations and concepts. The video discusses the formal notation for calculating the mean, using the symbol Σ to represent the sum of all observations (xi) from i=1 to n. Understanding this notation is important for grasping more complex statistical formulas that will be introduced later in the course.
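The role of the index i can be made concrete with a loop. This is a small sketch using the five observations from the video; the variable names are illustrative.

```python
x = [5, 5, 7, 10, 14]   # x_1 ... x_5 from the video
n = len(x)              # number of observations

total = 0
for i in range(1, n + 1):   # i takes every value from 1 through n
    total += x[i - 1]       # Python lists are 0-based, so x_i is x[i - 1]

x_bar = total / n           # divide the sum by n
print(total, x_bar)
```

The loop is what the summation symbol abbreviates: it visits x_1 through x_n once each and accumulates the total, here 41, before dividing by n to give 8.2.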

💡Dow Jones Industrial Average

The Dow Jones Industrial Average is a price-weighted stock market index of 30 large companies listed on the New York Stock Exchange and the NASDAQ. In the video, it provides the context for the exercise: the speaker calculates the mean, median, and mode of the share prices of 30 companies taken from the index.

💡Multimodal

A multimodal data set is one that has multiple modes, meaning there are several values that occur with the highest frequency. In the video, the data set of share prices is described as multimodal because there are four different values (30, 51, 100, and 132) that each appear twice, which is more frequent than any other values in the set.

Highlights

Introduction to measures of location: mean, median, and mode.

Definition and calculation of the arithmetic mean.

Explanation of how the mean is influenced by extreme values.

Introduction to the median as a measure of central tendency.

Demonstration of how the median is unaffected by the magnitude of data points.

Discussion on the mode as a measure of the most frequent observation.

Explanation of multimodal datasets and how to identify them.

Formal notation and formula for calculating the arithmetic mean.

Differentiation between calculating the population mean and sample mean.

Practical exercise on calculating the mean, median, and mode of share prices.

Sorting data to facilitate the identification of median and mode.

Identification of multiple modes in the dataset.

Manual calculation of the median for an even number of observations.

Calculation of the mean using the formula and practical data.

Final results of the mean, median, and modes for the sample of companies.

Emphasis on the importance of understanding basic statistical notation for future complex concepts.

Conclusion and summary of the video's educational content.

Transcripts

play00:00

hello welcome back uh this in this video

play00:02

we're going to look at various measures

play00:04

of location specifically here we're

play00:07

going to look at mean median and mode

play00:10

now if

play00:11

i begin with the discussion on mean

play00:14

this is one that you've probably heard

play00:16

of you've probably calculated it before

play00:18

it usually goes by the name average

play00:21

uh more specifically uh the type of mean

play00:25

that we're going to look at here and the

play00:26

one that you're probably most aware of

play00:29

is called an arithmetic

play00:32

mean

play00:33

and

play00:34

this is a mean that is had by adding

play00:37

adding your data points together

play00:39

and dividing by the number of of data

play00:42

points that you have so for example if

play00:45

you have

play00:46

5

play00:47

5 7 10

play00:50

14

play00:51

if those are your different data points

play00:53

that you're working with

play00:55

then in order to calculate the mean

play01:00

i'm just getting myself a calculator

play01:02

then you add these together 5 plus 5

play01:05

plus 7 plus 10

play01:08

plus 14.

play01:10

so that equals 41.

play01:12

so here i have my mean is 41 divided by

play01:17

well how many observations do i have one

play01:20

two three four five observations divide

play01:24

that by five and here we get what is

play01:27

called our arithmetic mean which is 8.2

play01:33

so this value of 8.2

play01:36

is one measure of what we call central

play01:38

tendency or central location it's a

play01:41

measure of roughly the middle in this

play01:44

case 8.2 is somewhere around here

play01:47

it's roughly the middle of that data set

play01:50

now one of the characteristics about

play01:52

this mean this arithmetic mean

play01:54

is that it is influenced by the the

play01:57

value of those numbers

play01:59

if this wasn't a 14 if this was instead

play02:03

146

play02:05

well you can imagine this numerator is

play02:08

going to be much larger

play02:10

we still have five observations so this

play02:13

ratio is going to end up being a lot

play02:15

larger meaning our measure of mean is

play02:18

going to be somewhere way out here

play02:20

so having different values within your

play02:22

data set influences where that mean

play02:25

where that arithmetic mean

play02:28

will will fall

play02:30

this is opposed to

play02:32

the median

play02:34

the other measure of of central tendency

play02:38

or central location

play02:39

uh but really the median doesn't pay a

play02:42

lot of attention to the the magnitude of

play02:46

those observations if i start off here

play02:49

with my original data set

play02:52

so if i get rid of that 6 and here we

play02:54

just have 14.

play02:56

what is my median well the median is that

play02:58

value

play03:00

where 50 percent of your observations are

play03:02

greater than or equal to the median and

play03:05

50 percent of your observations are less

play03:08

than or equal to the median

play03:10

so in this exercise where i have five

play03:13

observations here my median is this

play03:17

point right in the middle it's a seven

play03:20

fifty percent of my observations are

play03:22

greater than or equal to and fifty

play03:24

percent are down here so it's really

play03:26

splitting it right in half

play03:29

that doesn't change if this is a 14 or

play03:33

if this is 146. my median is still in

play03:36

exactly the same location

play03:39

so unlike the mean the median is not

play03:42

influenced by the relative magnitude of

play03:45

those values of those data points

play03:48

the mode

play03:51

is similar in the sense that it's not

play03:53

impacted by the magnitude of any given

play03:57

observation but the mode is is uh

play04:01

identified by finding the observation

play04:04

that occurs with the highest frequency

play04:06

so in this case i have

play04:08

two fives a seven a ten let's go back to

play04:11

our fourteen

play04:13

so i have here the the fives occur twice

play04:17

each of the other values only occur once

play04:20

so this is then my mode

play04:24

so it's possible to have one mode if i

play04:27

had another 14 i could have two modes

play04:31

and we would call this

play04:33

bimodal

play04:35

if i had

play04:37

another ten i have two fives two tens

play04:40

two fourteens then we would call it

play04:42

multimodal and so we can have as many

play04:44

modes

play04:46

really any number of modes

play04:51

it's of relatively little value at this

play04:53

point to to get into the discussion of

play04:56

multimodal

play04:57

data sets that may come up a little bit

play04:59

later when we talk about probability

play05:02

in some of the later modules of this

play05:04

course now

play05:06

i want to

play05:08

come back before we do this exercise

play05:10

specifically i want to come back to this

play05:12

discussion on the mean

play05:15

partly because as much as you're aware

play05:18

of how to calculate it

play05:20

you've probably done it before i want to

play05:23

just briefly talk about uh the formal

play05:26

notation uh of the formula that is used

play05:29

and that you've probably used without

play05:31

even knowing it uh for calculating a a

play05:34

mean an arithmetic mean

play05:36

so what we're going to do i'll work with

play05:38

the same

play05:39

data set of these five observations

play05:42

and i'm just going to replace or give

play05:45

each one sort of an identifier so i'll

play05:47

call each of these values x1 x2 x3

play05:51

and x4 and x5

play05:54

so these different x's these can really

play05:56

take on any value okay so our five

play05:59

in in the first case our first

play06:01

observation now this five i can denote

play06:04

it this is x1 this is my first

play06:06

observation

play06:08

x4 this is my fourth observation in this

play06:11

case it happens to be 10

play06:14

out here x5 is in this case 14 this is

play06:18

my fifth observation

play06:20

my data set consists of a total of five

play06:23

observations so i would say

play06:25

n equals in this example

play06:28

five

play06:29

okay now

play06:30

notice that n equals five so i have five

play06:34

observations my last observation that

play06:37

subscript that five is the exactly the

play06:41

same as the number of observations that

play06:42

i have okay maybe that's obvious i don't

play06:45

know

play06:46

so when we're calculating the mean

play06:48

uh regardless of how many observations

play06:51

we have

play06:53

we're adding together x1 plus x2 plus x3

play06:58

in this case i'll work with just these

play07:00

five observations we're adding together

play07:03

all of our observations and we're

play07:05

dividing it by n

play07:07

the number of observations that we have

play07:09

this is probably obvious you've probably

play07:12

been doing this

play07:13

any time you've calculated a mean

play07:16

now when we write

play07:18

the formula the more generic formula for

play07:22

a mean

play07:24

what we'll use the notation is this

play07:27

symbol here which means

play07:30

the sum

play07:32

of

play07:33

so the sum of so adding all of these

play07:35

different observations together the sum

play07:37

of x

play07:39

i

play07:40

where i is just an index i is a

play07:42

placeholder

play07:44

and it takes on every value from 1

play07:48

through to n

play07:50

so this is just a shorthand way

play07:53

of writing

play07:54

what i've got right here in the

play07:55

numerator

play07:56

so it's the sum of x equals one

play07:59

here's one x equals two

play08:02

i equals three sorry two and three and

play08:05

four and five all the way

play08:07

to n

play08:08

representing however many

play08:11

observations we have in that data set

play08:14

and then finally

play08:16

once we take the sum and we've added all

play08:18

of these observations together then we

play08:20

divide by n and that's just the same as

play08:23

what we've done here

play08:25

so for our means this is the formula

play08:28

that we're using that we've always used

play08:30

you've probably used it before without

play08:32

really looking at it in this format and

play08:34

if we're calculating what we call a

play08:36

sample mean

play08:37

then this is denoted by x bar so what i

play08:41

mean by a sample mean is it's a subset

play08:43

of the population so in this example

play08:47

i have a share of 30 companies

play08:50

on the dow jones this is only 30

play08:52

companies this isn't all of the

play08:54

companies it's not the full population

play08:56

of companies this is just a sample of

play08:59

30. it's a subset of 30. so here the

play09:02

notation for a sample

play09:04

is x bar

play09:06

okay so that's maybe a little bit more

play09:08

detail than you want um at this point of

play09:11

how to calculate a mean but the reason

play09:14

why i want to discuss it is because

play09:18

this is the really the simplest formula

play09:21

that we're going to

play09:22

encounter in this in the sequence of

play09:24

videos uh and i i really want whoever

play09:29

whoever is watching to understand this

play09:31

notation and understand these formulas

play09:33

because as we progress through the more

play09:35

difficult material

play09:37

these types of formulas are going to

play09:39

evolve and they're going to grow and are

play09:41

going to become increasingly complex so

play09:44

it's very helpful if at this stage in

play09:46

the game you have a good understanding

play09:49

and a good degree of comfort uh working

play09:52

with

play09:53

this

play09:54

kind of notation because it's going to

play09:56

be following us for a while

play09:58

okay so enough about this let's get into

play10:02

our exercise here so we've got this uh

play10:05

list of 30 companies we want to

play10:08

calculate the mean median and mode

play10:11

share price now i think i've given

play10:14

myself a cheat here so i've already

play10:16

listed the companies i've relisted the

play10:19

companies in one column

play10:21

from smallest value

play10:24

to the largest value

play10:27

now sorting

play10:28

your data like this

play10:30

is helpful for

play10:32

at least two things one

play10:34

it's really imperative to have it sorted

play10:37

like this in order to identify the

play10:38

median value so that value that lies

play10:41

right in the middle of the data set so

play10:44

it's

play10:45

imperative for identifying the median

play10:48

for identifying the mode it's also

play10:50

helpful because having it sorted from

play10:52

smallest to largest any values that

play10:54

repeat themselves which is necessary for

play10:57

identifying a mode they'll all be

play10:59

grouped together and so it's much easier

play11:01

to see repeating observations because

play11:04

they're all grouped together so

play11:06

let's let's start off with the mode

play11:09

because

play11:10

really that's probably the easiest one

play11:12

to spot we're just looking for

play11:13

observations

play11:15

that repeat so as i go through i'm just

play11:18

going to look through and say well

play11:19

there's two thirties

play11:21

there's two fifty ones

play11:24

uh there's two one hundreds

play11:26

there's two 132s

play11:29

and i don't see any observation that

play11:31

repeats more than twice if if we had an

play11:33

observation that repeated three or four

play11:35

or five times

play11:37

then these pairs that i've identified

play11:39

here are no longer relevant um

play11:42

but because the highest frequency of the

play11:45

observation is two

play11:47

uh then it becomes relevant and i have

play11:50

it looks like three of uh four

play11:52

observations one two

play11:55

three four observations

play11:57

each repeating twice

play11:59

none that repeat more than twice

play12:02

so these values 30

play12:05

51

play12:06

100 and 132

play12:09

those are all my modes

play12:11

so there's one mode

play12:13

two

play12:14

so this data set

play12:17

is what we would call multimodal

play12:20

uh it has multiple modes it has multiple

play12:23

observations that

play12:25

repeat uh the same frequency

play12:28

uh

play12:30

more than any others okay

play12:32

so there's our mode uh let's find our

play12:35

median so when we've got a nice a bigger

play12:38

data set like this there's different

play12:40

ways that we can go about finding a

play12:42

median

play12:43

in another video we'll show you a way

play12:45

using a simple formula

play12:48

in this exercise let's just do it sort

play12:50

of the manual way where we'll go through

play12:53

and eliminate observations starting with

play12:56

the smallest and the largest

play12:58

and so if we just go through our data

play13:00

set

play13:01

and eliminate

play13:02

my pen's not lining up

play13:04

and eliminate individual observations

play13:07

small and large

play13:09

until we converge

play13:11

to something in the middle

play13:15

this would be a lot easier if my pen

play13:18

lined up properly on the screen

play13:23

and so here we are is that is that right

play13:30

with a median this is not right because

play13:32

i have an even number of observations so

play13:35

i can't have fallen

play13:36

directly on to one observation if i have

play13:39

an even number

play13:41

of observations

play13:43

then

play13:44

i must narrow down and be left with a

play13:47

pair of observations what i did right

play13:49

there i went through this exercise and i

play13:51

came to one observation well that can't

play13:53

be the case because i know i have 30

play13:55

observations here

play13:57

so i need to finish at a pair of

play13:59

observations so let me try this again

play14:04

and hopefully

play14:07

i'll converge

play14:09

to a pair

play14:14

otherwise i know i've got a larger

play14:16

mistake that i'll need to deal with

play14:19

there

play14:20

so now we've gone through and i've

play14:21

gotten down to here this pair of

play14:24

observations 80

play14:26

and 77. i don't want to erase those in

play14:29

the same sense that i've erased or

play14:30

deleted the others because then i'm left

play14:32

with nothing and that's not really

play14:33

helpful

play14:34

in identifying

play14:36

my my

play14:38

median so what i want to do now is find

play14:42

the middle

play14:43

of those two remaining observations so

play14:47

in this case it's uh this is going to be

play14:49

80 plus 77 divided by 2

play14:53

and so let me get my calculator

play14:56

80 plus 77

play14:59

divided by 2 so that gives me 78.5

play15:06

and so that's my median

play15:09

so half of my observations are less than

play15:12

78.5

play15:14

and half of my observations are greater

play15:16

than 78.5

play15:18

okay so we've got our modes four modes

play15:20

multimodal we have our median

play15:23

now let's calculate our mean and so our

play15:26

mean here this is going to be x bar

play15:28

equals and our formula again

play15:32

this is

play15:34

looking tedious let me actually change

play15:36

my color

play15:38

so x bar equals so the sum

play15:42

observations

play15:44

x i is i equals 1 through n and n in

play15:47

this case is 30 because i know i have 30

play15:50

observations

play15:52

divided by n

play15:53

which is 30.

play15:56

so i'm going to erase all of my blue

play15:58

lines here so i can see my observations

play16:01

better

play16:02

okay

play16:03

now i need my calculator

play16:08

and let's uh just go through with this

play16:10

calculation it might take a few seconds

play16:12

you're welcome to fast forward a little

play16:14

bit i'm just going to punch in each of

play16:15

these numbers add them together and

play16:17

divide by 30. so let's get started 27

play16:21

plus 30

play16:23

30

play16:31

54 57 62

play16:35

63 64 65

play16:40

71

play17:02

almost there 129

play17:05

132

play17:08

132

play17:10

148

play17:12

155

play17:14

168.

play17:16

so 2558 is the numerator

play17:20

divided by

play17:22

30

play17:24

equals 85.26

play17:28

so this is 85.26

play17:32

is my mean so this is my average share

play17:35

price of these 30 uh different companies

play17:40

taken off of the dow jones industrial

play17:42

average

play17:43

okay so there we have our mean our

play17:47

median and our four modes in this case

play17:51

for the sample of 30 shares from the dow

play17:55

jones industrial average i hope that has

play17:58

helped i'm going on to 18 minutes it's a

play18:00

little bit longer than i wanted

play18:03

but i really think it's important to

play18:05

have spent some time going over the

play18:07

details and then the notation and the

play18:09

formula for something that is

play18:11

understandably a simple complex a simple

play18:14

concept simple calculation

play18:17

but as i said before the more

play18:19

comfortable you are with this notation

play18:20

at this point

play18:22

the better off you'll be as the material

play18:24

progresses and evolves into more

play18:27

difficult

play18:28

concepts okay thank you very much for

play18:31

watching
