Ch 3 Lecture Video, Fall 2024: Measures of Central Tendency

Charley H-M, Ph.D.
17 Sept 2024 (24:12)

Summary

TLDR: This educational video script delves into the significance of measures of central tendency in statistical analysis, focusing on mode, median, and mean. It explains how these measures help summarize data, revealing patterns and trends. The script clarifies the concept of range, the calculation of mode as the most frequent value, and the median as the middle value in a dataset. It also discusses calculating the mean, emphasizing its importance in various types of data, including interval, ratio, and ordinal. The script provides examples to illustrate these concepts, aiming to enhance understanding of statistical summaries.

Takeaways

  • 📊 Measures of central tendency are crucial for summarizing and analyzing data, helping to understand the 'story' the data is telling.
  • 🔢 The concept of 'multivariate' data is introduced, referring to datasets with two or more variables, emphasizing the complexity of modern data analysis.
  • 📈 Understanding distributions is key, which involves recognizing how data is spread across different variables and values.
  • 🎯 Central tendency focuses on identifying the 'middle point' and typical trends within a dataset, aiding in data summarization.
  • 🏅 The mode is defined as the most frequently occurring value or category in a dataset, serving as an easy identifier of commonality.
  • 🔢 The median is the middle value in a dataset when ordered from least to greatest, dividing the data into halves.
  • 📐 The range is calculated as the difference between the highest and lowest values in a dataset, providing a sense of data spread.
  • 📊 Percentiles are discussed as a way to understand the relative standing of data points within a dataset, differentiating them from percentages.
  • 🧮 The mean, or average, is calculated using the formula \( \bar{Y} = \frac{\sum f \times y}{N} \), where \( f \) is frequency, \( y \) is the value, and \( N \) is the total number of cases.
  • ⚖️ Different types of data (nominal, ordinal, interval, ratio) are considered in relation to how they can be analyzed using measures of central tendency, with a clarification that nominal data does not have a mean in the traditional sense.

Q & A

  • What are the measures of central tendency and why are they important?

    -The measures of central tendency include mode, median, and mean. They are important because they help summarize data and provide a better understanding of the typical values or patterns within a dataset, allowing for easier analysis and interpretation.

  • What is the definition of 'distribution' in the context of statistics?

    -In statistics, 'distribution' refers to how values are spread across categories and variables. It looks at the similarities and differences, essentially showing how data points are dispersed.

  • Can you explain the concept of 'range' in statistics?

    -The 'range' in statistics is the difference between the highest and lowest values in a dataset, representing the spread of the data.

  • What is the mode and how is it determined?

    -The mode is the value or category that occurs most frequently in a dataset. It is determined by identifying the data point with the highest frequency.

  • How is the median calculated for a dataset with an odd number of observations?

    -For a dataset with an odd number of observations, the median is calculated by arranging the data in ascending order and selecting the middle value.

  • What is the formula used to calculate the mean?

    -The formula to calculate the mean is the sum of all the values (ΣY) divided by the total number of cases (N), written as \( \bar{Y} = \frac{\sum Y}{N} \).

  • How does the calculation of the median differ when the dataset has an even number of observations?

    -When the dataset has an even number of observations, the median is the average of the two middle values after the data is arranged in ascending order.

  • What is the significance of assigning numerical values to non-numerical data?

    -Assigning numerical values to non-numerical data allows for statistical analysis and interpretation, such as calculating means or modes, even when the data is ordinal or nominal.

  • Why can't a mean be calculated for nominal data in the traditional sense?

    -Nominal data, which includes categories like gender or eye color, has no mean in the traditional sense because its categories have no natural order or numerical value that can be averaged.

  • What is the difference between a percentage and a percentile?

    -A percentage represents a proportion of a whole, while a percentile indicates the relative standing of a score within a dataset, showing the percentage of cases that fall below a particular value.
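The odd and even median rules described in the answers above can be sketched in Python. This is a minimal illustration rather than code from the lecture; the function name `median` and the sample values are assumptions made for the example.

```python
def median(values):
    """Return the middle value of `values`.

    For an even number of cases, return the average of the two middle values.
    """
    ordered = sorted(values)  # the data must be put in order first
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd N: the median is the case at position (N + 1) / 2, counting from 1,
        # which is index n // 2 when counting from 0.
        return ordered[mid]
    # Even N: add the two middle values and divide by two.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_scores = [19, 4, 79, 6, 32]   # hypothetical values, N = 5
even_scores = [19, 4, 6, 32]      # hypothetical values, N = 4
print(median(odd_scores))         # 19
print(median(even_scores))        # 12.5
```

Python's standard library also provides `statistics.median`, which follows the same two rules.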

Outlines

00:00

📊 Introduction to Measures of Central Tendency

The paragraph introduces the concept of measures of central tendency, emphasizing its importance in statistical analysis. It mentions that understanding variables and frequency distribution tables is foundational, and now the focus shifts to central tendency to better comprehend data. The paragraph highlights the significance of summarizing data to grasp the underlying story it tells. It also touches on the relevance of multivariate data, explaining that it involves dealing with two or more variables. The measures of central tendency discussed include mode, median, and mean, each serving a distinct purpose but collectively aiding in understanding typical value distributions. The text also introduces the term 'distribution' and explains its role in analyzing how values are spread across categories and variables.

05:03

🔢 Exploring the Mode and Median

This paragraph delves into the specifics of the mode and median, two measures of central tendency. The mode is defined as the category or score with the highest frequency, using the example of Spanish being the mode among non-English languages in the United States because it has the highest number of speakers. The concept of a bimodal distribution is introduced, where two categories share the highest frequency or have top frequencies that are close but not identical. The median is then explained as the middle value of a distribution when data is ordered, with a detailed walkthrough of how to calculate it for both odd and even numbers of data points. The paragraph uses the example of hate crimes reported by states to illustrate the calculation of the median.
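As a rough sketch of how the mode could be found in Python, the example below counts each category and reports the one (or ones) with the highest frequency. The function name `modes` and the eye-color responses are hypothetical. The sketch only reports exact ties; the lecture's looser rule of also reporting two categories whose frequencies are close but not identical is not implemented here.

```python
from collections import Counter


def modes(observations):
    """Return the category or categories with the highest frequency.

    The result is the category itself, never the frequency count.
    """
    counts = Counter(observations)
    if not counts:
        return []
    top = max(counts.values())
    # More than one category can share the top frequency (e.g. a bimodal distribution).
    return sorted(category for category, freq in counts.items() if freq == top)


responses = ["blue", "brown", "brown", "green", "blue", "hazel"]  # hypothetical data
print(modes(responses))  # ['blue', 'brown']
```

Both "blue" and "brown" occur twice, so this hypothetical distribution is bimodal and both categories are reported.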

10:04

📈 Calculating the Mean and Understanding Percentiles

The paragraph discusses the mean, another measure of central tendency, and provides the formula for its calculation. It explains that the mean represents the average value and is crucial for interval and ratio variables. The text provides an example of calculating the mean for the ideal number of children based on survey responses, demonstrating how to use the formula and interpret the result. Additionally, the concept of percentiles is introduced, explaining how they indicate a score's relative standing within a range, differentiating them from percentages. The paragraph also addresses the calculation of medians in frequency distribution tables and the importance of organizing data to identify the mode.
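The frequency-table version of the mean formula, the sum of each value times its frequency divided by the total number of cases, can be sketched as follows. The function name and the small frequency table are assumptions for illustration; only the two totals in the last line (2,139 and 868) are the ones quoted in the lecture for the ideal-number-of-children survey.

```python
def mean_from_frequencies(freq_table):
    """Compute the mean from a table mapping each value y to its frequency f."""
    n = sum(freq_table.values())  # N: total number of cases
    if n == 0:
        raise ValueError("mean of an empty table is undefined")
    total = sum(y * f for y, f in freq_table.items())  # sum of f * y
    return total / n


# Hypothetical table: value -> number of respondents who chose it
hypothetical = {0: 2, 1: 3, 2: 10, 3: 5}
print(mean_from_frequencies(hypothetical))  # 1.9

# Totals quoted in the lecture: sum of f * y = 2139, N = 868
print(round(2139 / 868, 2))  # 2.46
```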

15:05

📉 Assigning Numerical Values and Grouped Data

This paragraph addresses the process of assigning numerical values to non-numerical data for the calculation of the mean, such as in surveys using Likert scales. It explains how to calculate the mean by multiplying the frequency of each response by its assigned numerical value and then summing these products. The text provides an example involving political views and shows how to interpret the calculated mean in the context of the data. It also touches on the calculation of means for grouped data, such as hours worked, and the assignment of midpoint values to ranges. The paragraph concludes by clarifying that nominal data, which lacks order, does not have a mean calculated in the traditional sense but can be analyzed through percentage breakdowns.
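The two ideas in this section, coding labels as numbers and using midpoints for grouped data, might look like this in Python. All labels, codes, group boundaries, and counts below are hypothetical and were chosen only to make the arithmetic easy to follow; they are not the figures from the textbook.

```python
def coded_mean(frequencies, codes):
    """Mean of coded responses: sum(code * frequency) / total number of cases."""
    n = sum(frequencies.values())
    if n == 0:
        raise ValueError("mean of an empty table is undefined")
    total = sum(codes[label] * f for label, f in frequencies.items())
    return total / n


def midpoint(low, high):
    """Midpoint used to stand in for every case inside a grouped range."""
    return (low + high) / 2


# Hypothetical ordinal scale: each label gets an assigned numerical value
view_codes = {"liberal": 1, "moderate": 2, "conservative": 3}
view_counts = {"liberal": 10, "moderate": 25, "conservative": 15}
print(coded_mean(view_counts, view_codes))  # 2.1, closest to the code for "moderate"

# Hypothetical grouped data: hours worked, keyed by (low, high) bounds
hour_counts = {(0, 19): 4, (20, 39): 10, (40, 59): 6}
hour_codes = {bounds: midpoint(*bounds) for bounds in hour_counts}
print(coded_mean(hour_counts, hour_codes))  # 31.5
```

A mean of 2.1 is read back against the coding scheme: it sits nearest to 2, the value assigned to "moderate" in this made-up scale.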

20:06

🚫 Clarification on Nominal Data and Mean Calculation

The final paragraph clarifies that nominal data, such as gender or eye color, does not have a mean calculated in the same way as interval or ordinal data. It emphasizes that while numerical values can be assigned for statistical analysis, these do not represent a central point or average in the same sense. The text provides a clear distinction between calculating percentages, which is similar to means for categorical data, and calculating a mean for ordinal or interval data. It uses the example of calculating the average number of miles driven by individuals to illustrate a straightforward mean calculation. The paragraph reinforces the importance of understanding the nature of data when performing statistical analysis and calculating central tendencies.

Keywords

💡Central Tendency

Central tendency refers to the center point of a data set, which can be represented by measures such as the mean, median, and mode. In the video, central tendency is crucial for summarizing and understanding data distributions. It helps in identifying the typical or average value within a set of data. For instance, the video discusses how measures of central tendency allow for a better understanding of the 'story' the data is telling, highlighting the importance of these measures in statistical analysis.

💡Frequency Distribution Tables

A frequency distribution table is a statistical tool used to organize data into categories and count how often each value or range of values occurs. The video script mentions that understanding these tables is a prerequisite to studying central tendency, suggesting that they are fundamental in building a foundation for statistical analysis. They are used to visualize data and prepare it for further analysis, such as calculating the mode.

💡Mode

The mode is the value that appears most frequently in a data set. It is one of the measures of central tendency discussed in the video. The mode is described as the 'easiest to identify' and is used to determine the most common category or score within a distribution. An example given in the script is the mode for the number of speakers of a language, where Spanish is identified as the mode due to having the highest number of speakers.

💡Median

The median is the middle value in a data set when the numbers are arranged in ascending or descending order. It is another measure of central tendency that the video discusses. The video explains that the median divides the data into two equal halves, with half the values above and half below the median. It is calculated differently for odd and even numbers of data points, as illustrated with examples of hate crimes data and ideal number of children.

💡Mean

The mean, often referred to as the average, is calculated by summing all the values in a data set and then dividing by the number of values. It is a measure of central tendency that the video emphasizes as important for understanding data distributions. The video provides examples of calculating the mean, such as the average number of children people consider ideal or the average number of miles driven by a group of individuals.
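A minimal sketch of the basic calculation, summing the values and dividing by the number of cases. The mileage figures are hypothetical, since the summary does not give the numbers used in the video.

```python
def mean(values):
    """Sum of all values divided by the number of cases."""
    if not values:
        raise ValueError("mean of an empty dataset is undefined")
    return sum(values) / len(values)


miles_driven = [12, 30, 8, 25, 15]  # hypothetical miles for five people
print(mean(miles_driven))           # 18.0
```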

💡Range

The range is the difference between the highest and lowest values in a data set, representing the spread of the data. The video introduces the concept of range to understand the dispersion of data values. It is used to identify the extent of variability within a dataset, as shown when calculating the range of points in an example.
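The range calculation is a one-liner in most languages. The sketch below uses the five points mentioned in the lecture (4, 6, 19, 32, 79); the function name `value_range` is an assumption for the example.

```python
def value_range(values):
    """Distance between the highest and lowest value."""
    if not values:
        raise ValueError("range of an empty dataset is undefined")
    return max(values) - min(values)


points = [4, 6, 19, 32, 79]  # the points used in the lecture
print(value_range(points))   # 75, because 79 - 4 = 75
```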

💡Multivariate Data

Multivariate data refers to data sets that involve multiple variables. The video script discusses the importance of understanding central tendency in the context of multivariate data, where summarizing and analyzing data with two or more variables becomes necessary. This concept is crucial for handling complex data sets and is part of the progression from univariate to more advanced statistical analysis.

💡Distribution

Distribution in the context of the video refers to the way values are spread across categories and variables. It is a key concept in understanding how data is organized and how it behaves. The video mentions distribution in relation to understanding the spread of values and the similarities and differences within the data, which is essential for identifying central tendencies.

💡Percentiles

Percentiles divide a data set into 100 equal parts and indicate the value below which a certain percentage of observations fall. The video script touches upon percentiles as a way to understand a score's location within a range of data. It explains that a percentile placement, such as the 75th percentile, indicates that 75% of cases are below that score, providing a relative measure of standing within a distribution.
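To make the difference between a percentage and a percentile concrete, here is a small sketch. It assumes one common convention (percentile rank as the share of cases strictly below a score); other conventions count cases at or below the score. The test results are hypothetical.

```python
def percentile_rank(scores, score):
    """Percentage of cases that fall strictly below `score`."""
    if not scores:
        raise ValueError("percentile rank needs at least one case")
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)


# Hypothetical results on a hard test, each expressed as percent correct
percent_correct = [5, 8, 10, 12, 15, 18, 20, 35]
print(percentile_rank(percent_correct, 20))  # 75.0
```

In this made-up class, a student who got only 20% of the questions right is still at the 75th percentile, because six of the eight scores are lower.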

💡Nominal Data

Nominal data refers to categorical data that cannot be ordered, such as gender or eye color. The video clarifies that while nominal data can be organized and counted, it does not have a mean in the statistical sense because it lacks an order or hierarchy. The script uses nominal data as an example to contrast with other types of data, such as ordinal or interval data, where measures of central tendency like the mean can be calculated.
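Since nominal categories cannot be averaged, the percentage breakdown mentioned in the video is the usual summary. A possible sketch, with hypothetical eye-color data and an assumed function name:

```python
from collections import Counter


def percentage_breakdown(categories):
    """Map each category to its share of all cases, in percent."""
    counts = Counter(categories)
    n = sum(counts.values())
    return {category: 100 * freq / n for category, freq in counts.items()}


eye_colors = ["brown"] * 5 + ["blue"] * 3 + ["green"] * 2  # hypothetical sample
print(percentage_breakdown(eye_colors))
# {'brown': 50.0, 'blue': 30.0, 'green': 20.0}
```

The category with the largest share ("brown" here) is also the mode, which is the one measure of central tendency that applies directly to nominal data.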

Highlights

Measures of central tendency are crucial for statistical analysis and understanding data.

Understanding variables and frequency distribution tables is fundamental before delving into measures of central tendency.

Measures of central tendency help summarize data and reveal the story it tells.

Multivariate data involves two or more variables and is important for summarizing complex data sets.

Distribution refers to how data is spread across variables and values, showing similarities and differences.

Central tendency focuses on the average or typical patterns or trends within a data set.

Mode is the value or category with the highest frequency in a data set.

Bimodal distribution occurs when two values have the highest frequency.

Median is the middle value in a data set when arranged in order, dividing the data into halves.

Calculating the median involves finding the middle value or averaging the two middle values for even sets.

Mean is calculated by summing all values and dividing by the number of cases.

Mean can be calculated for survey or rating categories by assigning numerical values to them.

Grouped data requires assigning a midpoint value to each group for mean calculation.

Nominal data, such as gender or eye color, does not have a mean but can be presented in percentage breakdowns.

Ordinal data, like stress levels, can have a mean calculated to find a central point.

Understanding measures of central tendency is essential for analyzing and summarizing data effectively.

Transcripts

play00:00

we're going to talk about the measures

play00:01

of central tendency because um this is

play00:03

really one of the the this is really

play00:06

important part to um moving forward and

play00:10

getting the work done and starting to

play00:12

analyze things statistically um you know

play00:15

we've been doing things in starting to

play00:18

build our knowledge and understanding

play00:21

right so far we understand what the

play00:22

variables are we understand what

play00:24

frequency distribution tables are we

play00:26

have an understanding of these things

play00:28

right so now what we're doing is we're

play00:30

going to be working with the measures of

play00:32

central tendency which helps us

play00:34

understand our data a little bit

play00:36

more now with this um understanding the

play00:40

measures of central tendency is

play00:41

important you know we can use it for our

play00:43

visuals of course but really it's

play00:45

important most important because it

play00:48

allows us to summarize our data it

play00:50

allows us to summarize our data and be

play00:52

able to have a better understanding of

play00:55

what's happening the story it's telling

play00:57

the picture that it's painting right

play01:00

and so from there um we're able to be

play01:04

able to work with larger data sets and

play01:06

more variables uh so the idea of

play01:09

multivariate data is important because

play01:12

we might need to summarize more we might

play01:13

need to summarize things that have more

play01:15

than just one variable in this case

play01:18

multivariate means two or more variables

play01:20

so multi being multiple and then the

play01:22

variate right being variables so being

play01:25

able to understand the measures of

play01:26

central tendency and how that influences

play01:27

our movement forward um is important

play01:31

so in terms of the measures of central

play01:34

tendency um there's a couple there's a

play01:37

couple words that your book uses and

play01:39

just kind of acts as if uh you should

play01:41

understand what they are and one is um

play01:43

distribution it's used a lot we're

play01:45

going to continue using that word and

play01:47

with distribution it's essentially how

play01:50

things are distributed right so how it's

play01:51

spread how it's broken up uh across

play01:54

variables and values so like it's it's

play01:57

the similarities and differences so

play01:58

we're looking at how are spread out how

play02:01

values are spread out across categories

play02:03

and

play02:04

variables for the measures of central

play02:06

tendency what we're looking at is we're

play02:08

looking at basically the average or the

play02:10

typical you know what's average or

play02:12

typical about the distribution so

play02:13

thinking about central tendency right

play02:15

Central where is that kind of Middle

play02:17

Point and tendency how do what is what

play02:20

is the what does it tend to do right the

play02:23

averages or typical um patterns or

play02:26

trends that are happening with the data

play02:28

it's essentially looking at categories

play02:30

and scores and then being able to

play02:31

describe these things what is typical

play02:34

across these values so there's three

play02:37

concepts that we're going to talk about

play02:38

mode median and mean each of these serve

play02:41

a different purpose um but they still

play02:43

highlight typical distributions of

play02:45

values so they all have a purpose a

play02:47

different purpose but they still allow

play02:49

us to see the measures of central

play02:50

tendency the the distribution of values

play02:53

within a

play02:56

category so in chapter 4 on page 100

play03:00

um it talks about range that's a little

play03:02

bit of a head a jump ahead but at the

play03:04

same time um at the same time uh it's

play03:09

important I think that we talk briefly

play03:10

about what range is so the range is

play03:13

basically just that it's the range of

play03:16

values um what that means is that it's

play03:21

what is the distance between your lowest

play03:23

and highest value so if you had like

play03:26

points for instance 4 6 19 32 79 those

play03:29

were points your range would be 75

play03:32

because 79 - 4 right 79 is your highest

play03:34

value four is your lowest value you'll

play03:37

also see these little curly brackets and

play03:39

those are called Curly brackets and they

play03:41

basically mean discrete values and

play03:42

that's where you'll see your range the

play03:44

discrete value is basically um are

play03:47

values that cannot be subdivided right

play03:48

so go back to continuous and discrete um

play03:51

from a couple weeks ago and that's what

play03:52

we're looking at with our range and the

play03:55

reason why I bring up range now is

play03:56

because how can we understand you know

play03:59

what our our or what the data does what

play04:01

it tends to do where the center point is

play04:03

if we don't understand kind of what the

play04:05

range is our code word is Apple so the

play04:08

next thing that we're going to talk

play04:09

about is the mode um the mode is a is

play04:13

the category or the score with the

play04:15

largest frequency this does not mean

play04:17

it's

play04:19

it's if you had like a rating scale it

play04:22

does not mean that 10 is your mode

play04:25

simply because it's the highest number

play04:26

it's the value or the category that has

play04:29

the highest number of things that are

play04:31

occurring it's the easiest to identify

play04:34

so it's it's essentially the answer or

play04:36

the selection that somebody chose or

play04:38

that the most amount of people chose um

play04:40

you find the category and you find the

play04:42

highest frequency it's really the

play04:43

easiest to identify it's the thing that

play04:45

occurs the most amount of times

play04:49

um it's the category or the score not

play04:52

the frequency itself so if it occurred

play04:54

17 times you're not saying the mode is

play04:56

17 you're saying the mode is whatever

play04:58

category or score or value is is there

play05:02

um if you have something that has more

play05:05

than you have like two categories that

play05:07

have the same um are the the highest

play05:10

though that's called bimodal so

play05:12

think about bi being two modal for the

play05:14

modes um if they're also close but not

play05:17

exact so like let's say 75 and then 74.8

play05:20

that is essentially still bimodal and you

play05:22

would report those two highest high you

play05:24

would still report the two highest

play05:26

categories or scores um

play05:30

now the next slide we're going to see

play05:31

from your book on page it should be on

play05:33

page 64 uh an example of how mode can

play05:36

look so in this case they listed as

play05:39

foreign languages but I'm referring to

play05:40

it as languages outside of the outside

play05:43

of languages in the United States out

play05:45

that are not English and so this is like

play05:48

the number of speakers on the right and

play05:49

the language on the left and this is the

play05:52

number of speakers in those

play05:54

corresponding

play05:55

languages in this case Spanish is the

play05:57

mode now it may not be organized this

play06:00

way though and it's important that you

play06:02

understand and identify how you will be

play06:04

able to organize your data as well

play06:06

because you're going to have to organize

play06:08

it it's best to organize it to be able

play06:10

to see what the mode is in this case

play06:12

it's easy to identify right we know what

play06:14

the mode is the mode here then is

play06:16

Spanish Spanish is the mode because it's

play06:18

has the highest number of speakers code

play06:20

word is

play06:21

pumpkin now the median is different so

play06:24

the mode is relatively simple because

play06:25

you can identify it remember it's not

play06:27

the frequency so in this case it's not

play06:29

37 million that's not the mode the mode

play06:31

is Spanish in this case the median is

play06:35

different because you're going to want

play06:36

to put things in order there is

play06:39

some sort of logical order not

play06:40

everything will fit into here but you

play06:42

can still find the median in the vast

play06:43

majority of things um but it's the exact

play06:46

middle of the distribution or the spread

play06:47

of numbers so in this case it divides it

play06:50

in half half above and half below um so

play06:54

you have to sort your data you have to

play06:56

sort it from you know highest to lowest

play06:58

or lowest to highest if if it has the

play07:00

even number of cases you divide it in

play07:02

half and you do the calculation based

play07:03

off those if it has an odd number of

play07:06

cases you use this formula (N + 1) / 2

play07:09

where you divide it in half and that is

play07:11

your is your value and I'm going to show

play07:12

you on the next couple of slides so it

play07:14

makes a little more sense so essentially

play07:16

though the median is where you have your

play07:18

data and it's half above half below um

play07:21

and the median is literally that middle

play07:23

point of the data so in your book this

play07:26

is the table that it presents to you um

play07:29

if you look look at it right the number

play07:30

of hate crimes on the left uh is not in

play07:32

order it is not in any order it's just

play07:35

randomly um randomly placed there now if

play07:37

you look on the right it has the state

play07:39

and its corresponding state with the

play07:41

number of hate crimes that occur in it

play07:43

so we see that there are nine cases

play07:45

right so we have 1 2 3 4 5 6 7 8 nine so

play07:49

we have nine cases nine states in this

play07:51

case then that have reported the number

play07:52

of hate

play07:54

crimes now we've reorganized it now

play07:57

we've ordered it so if you look right

play07:58

there's no organization now we've

play08:00

ordered it from fewest to greatest from

play08:02

least to greatest in terms of amount so

play08:04

now we can see that this is in some sort

play08:06

of order still nine cases though now if

play08:09

you look at this here on the left hand

play08:12

side this is nine cases so because it's

play08:14

it's odd we have to use this little

play08:16

formula so we have nine cases right here

play08:18

there's nine cases um plus one because

play08:21

we need to be able to figure out the

play08:22

middle divided by two so in this case we

play08:24

get 10 divided two which is five so what

play08:26

it means is that we need to look at the

play08:28

fifth case so 1 2 3 4 5 which gives us

play08:31

Texas right here and that means that 145

play08:35

is our median that means that that one

play08:39

Texas here gives us our median of 145

play08:42

so that means half the number of cases

play08:44

will be fewer than 145 and half the

play08:46

number of cases will be more than 145 so

play08:48

that's what the median means is the

play08:49

middle point now this is for our odd

play08:53

number of cases but what happens if we

play08:55

have an even number of cases so we know

play08:57

that our median is 145 here

play09:00

let's according to like your textbook

play09:01

did this so let's say for a minute

play09:03

California didn't report um we have

play09:05

eight so we still have this order we've

play09:07

taken California off so we're pretending

play09:08

it doesn't exist right now and we have

play09:10

eight number of cases 1 2 3 4 5 6 7 8

play09:15

right so what happens is you look at the

play09:17

two middle so 1 2 3 4 hm 1 2 3 4 H so

play09:21

that gives us North Carolina and Texas

play09:23

well what you have to do is you have to

play09:25

add these two together right here and

play09:27

divide it by two so gives you your

play09:29

median and your median is

play09:32

142.5 so the median basically means that

play09:34

there are going to be half the amount of

play09:36

cases that are above 142.5 and half that

play09:39

are below and this is if you have an

play09:40

even number of cases you find the two

play09:42

middle ones you add them together and

play09:44

divide them by two to get you your

play09:46

median

play09:53

number and I'm going to end this video

play09:55

here and I'm going to create another

play09:56

video so that way you have them divided

play09:58

into two

play10:00

okay so we're going to go ahead and

play10:04

um leave off now from where the last

play10:06

video was and the last video we were

play10:09

talking about median now you have your

play10:11

frequency distribution tables and um and

play10:15

those are useful when we're coming to

play10:18

when we're trying to figure out our

play10:19

median right because you've already kind

play10:21

of organized all the data but what

play10:23

happens is when you are creating this um

play10:26

sometimes you have certain categories

play10:28

that are associated with it so we kind

play10:29

of already talked about that a little

play10:30

bit uh and essentially when there's some

play10:33

sort of category though that isn't

play10:35

numerical in

play10:36

value um or is like a different category

play10:40

in terms of

play10:43

um what you're trying to measure so on

play10:45

page 70 it talks about political views

play10:48

um it says that the the mean uh or not

play10:50

the mean sorry the the the number um

play10:53

comes to

play11:00

21.5 what it's saying is that value is

play11:02

associated with the label of moderate

play11:04

and political views and that's what

play11:07

you're going to go to um so in a couple

play11:10

slides we'll talk about a little I'll

play11:11

talk about it a little bit more um in

play11:13

more detail but that's just something to

play11:15

think about when it comes to

play11:16

calculations of um of median for these

play11:20

categories another thing about this is

play11:25

um percentiles so you've probably

play11:28

encountered and experience percentiles a

play11:30

lot of standardized testing has

play11:32

percentiles um and with these

play11:35

percentiles it gives you an idea so a

play11:38

percentile is not the same as a

play11:39

percentage it is a general location

play11:41

within a range it's a score at or

play11:43

below a specific range so if you scored

play11:46

in the 70th 75th percentile it means

play11:50

that 75% of cases are below it

play11:54

um and that's a typo so I apologize but

play11:56

75% of the cases are below it so so that

play11:59

means that 25% of cases are above that

play12:02

and it doesn't mean you scored a 75% you

play12:04

scored could have scored a 20% but you

play12:06

still are 75 uh in the 75th percentile

play12:09

or you could have scored 100% but you're

play12:11

still in the 7 maybe the 75th percentile

play12:14

because there was extra credit stuff um

play12:17

so these are things to consider when

play12:19

we're talking about the calculations of

play12:20

the medians we'll talk about percentiles

play12:22

in more detail um at a later point so

play12:26

now with that um is our mean now and

play12:31

this calculation this formula here is

play12:33

super important when I refer to these

play12:35

things this y here stands for Y Bar this

play12:38

e is Sigma which means sum the Y so the

play12:42

combination of these two is the sum of

play12:44

all your scores your variable scores um

play12:46

Y Bar means your mean and this n is the

play12:49

total number of cases and this is the

play12:50

formula that you use to calculate mean

play12:54

um there's a couple different ways that

play12:55

you're going to calculate data values

play12:57

it's mostly for interval ratio

play12:59

variables but it's anything that has a

play13:01

data value to it and we're going to talk

play13:03

about this in more you might also hear

play13:05

it referred to as um average okay so

play13:08

we're going to talk about that in more

play13:10

detail um on the next couple of slides to

play13:13

see how it's applied depending on what

play13:15

type of data we're using that data

play13:19

informs um how we move forward with it

play13:23

so um we're going to go to the next

play13:27

slide

play13:30

and oh too far okay so this is um on

play13:36

page 73 it's in chapter 3 and this is

play13:40

about the ideal number of children so

play13:42

essentially this is a survey um

play13:46

about

play13:51

um the ideal number of children

play13:56

and that means that

play13:59

they were asked what do you believe is

play14:01

the ideal number of children so the list

play14:03

here is you know 0 1 2 3 4 5 6 with the

play14:07

number of

play14:08

children and frequency is the number of

play14:12

responses that selected that option so

play14:14

in this case these were the options zero

play14:16

children, one child, two

play14:18

children three children four five six

play14:21

and this is the number of people who

play14:22

selected these options so 13 people said

play14:25

zero, 16 people said 1, 506 said 2, etc.,

play14:29

which gives us our n at

play14:31

868 so if you look over at this, I'm

play14:35

going to move

play14:40

this so if you look over at this this

play14:42

here is sigma of f times Y, so yeah

play14:46

Sigma is the sum of your frequency times

play14:49

your yvalue so your frequency times your

play14:51

yvalue in this case is frequency times

play14:53

your y-value the reason you're doing it

play14:55

this way is because you're trying to

play14:57

measure the average number of

play14:59

children all right that it seems to be

play15:02

ideal so in this case you have zero you

play15:05

have 16 you have 1,012 because what you're

play15:08

doing is you're multiplying your y value

play15:10

with your frequency so this is 506 * 2

play15:14

is here so this gives you your n of 868

play15:17

and then it gives you your

play15:19

sum of all your y's which is

play15:21

2139 so what you do if you go back to

play15:24

that formula is you have

play15:27

2,100 sorry

play15:30

2,139 um divided by 868 which comes to

play15:34

2.46 and so oops so in this case right

play15:38

that means the ideal number of children

play15:39

is

play15:40

2.46 uh we'll talk about how we can

play15:42

report on this or the different ways

play15:44

that we can report on this but

play15:45

ultimately it's totally cool to say

play15:47

2.46 children um and go from

play15:52

there so what we do is um we're making

play15:55

this calculation and you're trying to

play15:58

figure out what what the answer is so in

play16:00

this case again the ideal number of

play16:01

children is um

play16:06

2.46 all

play16:09

right
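The arithmetic for this example can be checked with a short Python sketch. Only the first three rows of the table (0, 1, and 2 children) are read aloud in the lecture, so the sketch reproduces those three products and then uses the stated totals rather than guessing the remaining rows. The variable names are illustrative.

```python
# (value, frequency) pairs that are read aloud in the lecture.
# The rows for 3 through 6 children are not stated, so they are left out.
stated_rows = [(0, 13), (1, 16), (2, 506)]

# Frequency times Y value for each stated row.
products = [value * freq for value, freq in stated_rows]
print(products)  # [0, 16, 1012]

# Totals stated in the lecture for the full table.
sum_fy = 2139   # sum of f * Y over all rows
n_cases = 868   # N, the total number of responses

mean_ideal_children = sum_fy / n_cases
print(round(mean_ideal_children, 2))  # 2.46
```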

play16:12

now here's a couple things to think

play16:14

about in terms of calculating mean in a

play16:16

variety of other ways um

play16:22

so when we're looking at means this

play16:25

one's a little more straightforward in

play16:26

some ways not this

play16:31

ah all right so well okay so before this

play16:34

was a little more straightforward in

play16:35

terms of the calculations but sometimes

play16:37

what happens is we have to actually

play16:42

assign a numerical value to something so

play16:44

when you're using a Likert scale

play16:46

um generally

play16:49

you're you know if or let's say you have

play16:51

like a value a number value you've

play16:53

already assigned a numerical value to it

play16:54

but sometimes you don't and so you'll

play16:56

just have strongly agree agree neither

play16:58

disagree Etc and you'll assign a

play16:59

numerical value to it if you haven't

play17:01

already now this is then if you have

play17:04

your frequency this is the number of

play17:06

times that somebody selected this

play17:08

option then your frequency same thing

play17:11

and this is what it comes to the way you

play17:13

analyze this and you interpret it though

play17:15

is your mean comes to

play17:18

2.25 you don't have a 2.25 over here

play17:21

that doesn't have any value or any

play17:23

meaning to us right so what happens is

play17:25

you identify which numerical value

play17:27

most closely aligns to the

play17:29

2.25 which basically is this neither

play17:32

agree or disagree so this means that

play17:34

most people are um you know the average

play17:37

comes to 2.25 which aligns with neither

play17:40

agree or disagree so that's essentially

play17:42

how you're identifying that and

play17:44

how you're using the mean in these types

play17:46

of Likert scale

play17:53

calculations

play17:54

now there we go um okay so oops okay so

play18:00

sometimes you're going to have grouped

play18:02

um things too all right or frequency

play18:04

ranges and so for a lot of you

play18:05

you use things like hours worked or

play18:07

credits and so similarly like on the

play18:09

left side hours worked: 0, 1 through 5, 6 through 10,

play18:12

11 through 15 16 through 20 you assign a

play18:15

numerical value to that as well because

play18:18

again you're not calculating this per se

play18:21

on its own you assign a numerical value

play18:23

and the reason this is also a good

play18:25

practice is because if you were to do a

play18:26

quantitative reasoning class where you

play18:28

have to do some additional statistical

play18:30

analysis being able to understand how

play18:32

you assign a numerical value to it is

play18:33

important so then you create a frequency

play18:36

you include your frequency so

play18:38

in this case we have this it comes to 32

play18:41

um and similar to before sorry for

play18:44

having to move this similar to before

play18:46

you have the same process right so it's

play18:49

0 3 10 24

play18:52

16 and then this comes to 53 so darn it

play18:57

um okay so in this

play18:59

case your mean is 1.66 now this is kind

play19:02

of Where It's tricky because it's kind

play19:03

of arbitrary which means there's no real

play19:05

rule for it so the way I would say is

play19:08

like you figure okay well the numerical

play19:09

value is 1.66 it's kind of divided here

play19:12

well which really kind of leans more

play19:13

towards 6-10 but it also means we're not

play19:16

completely there so really you could say

play19:18

that most students work within the range

play19:19

of 1-5 hours per week but they probably work a

play19:22

little bit more than that range because

play19:24

that's essentially what it's telling you

play19:26

right based off of this mean so that's

play19:29

how you would calculate some sort of

play19:31

grouped frequencies or ranges that you

play19:33

have
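The grouped hours-worked calculation can be sketched in Python as follows. The products (0, 3, 10, 24, 16) and N = 32 are the figures read aloud in the lecture. The codes 0 through 4 for the five ranges are an assumption, inferred from the way the mean of 1.66 is interpreted as falling between the 1-5 and 6-10 ranges.

```python
import math

# Ranges from the slide; the numeric codes 0..4 are an assumed assignment.
labels = ["0", "1-5", "6-10", "11-15", "16-20"]

# Frequency times code for each range, as read aloud in the lecture.
fy_products = [0, 3, 10, 24, 16]
n_cases = 32  # N, the total number of students

sum_fy = sum(fy_products)        # 53
mean_code = sum_fy / n_cases     # 1.65625, reported as 1.66
print(round(mean_code, 2))

# The mean falls between two codes, so report both neighbouring ranges.
lower = labels[math.floor(mean_code)]
upper = labels[math.ceil(mean_code)]
print(f"between {lower} and {upper} hours per week")
```

Because the codes stand for ranges rather than exact hours, the result is best read as a position between two categories, not as 1.66 hours.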

play19:34

here now um when it so um I realized

play19:39

that with the nominal data

play19:41

um I was thinking about it in terms of

play19:45

your individual questions and I had

play19:47

worked with most of you and you had like

play19:50

grouped frequencies and ranges you had um

play19:53

a variety of other things and so I

play19:56

wanted to kind of clarify a few things that

play19:59

I may have said so nominal data itself

play20:02

can't be calculated with a mean some of

play20:04

you may actually have nominal and when I

play20:06

say nominal data I don't necessarily

play20:07

mean

play20:11

um I don't necessarily mean it in terms

play20:13

of like hours worked and such I mean

play20:15

nominal data specifically in terms of uh

play20:19

like gender and eye color and even major

play20:22

uh and so you're not going to have a

play20:24

mean calculation in the same way you

play20:27

will have the percent breakdown

play20:30

which is calculated similar to the mean

play20:33

um but it's not the mean in the typical

play20:37

way um and the reason I say this is

play20:40

that you will assign a value to them so

play20:43

similar to like the previous um slides

play20:47

and that gives you insight to be able to

play20:50

run things

play20:51

statistically but it's not a mean in the

play20:55

sense that it shows you a point of the

play20:58

average a use of something or average

play21:00

water or average hours so if I said that

play21:04

you would have a mean for everything the

play21:07

vast majority of things will have a mean

play21:09

some things will only have a mode if

play21:11

it's only a category um so sometimes

play21:15

your data you know if it's ordinal data

play21:19

you will have it and for a lot of your

play21:22

projects most of you have like interval

play21:24

Ratio or ordinal um but thinking about

play21:28

some of the major ones like major that

play21:30

is nominal and it

play21:31

doesn't have any order so you can

play21:33

calculate these averages right um but

play21:36

they don't have a statistical meaning in

play21:38

the same way and so I just wanted to

play21:40

make sure I clear that up if there was

play21:42

any confusion about what I was saying um

play21:45

which is why for some of you when I said

play21:46

to have these uh perhaps the better

play21:49

way I should have said is like have it

play21:51

set up to where you will have two rows

play21:53

two columns um and you can calculate

play21:56

those things in terms of that break so

play21:58

some of your frequency distribution

play22:00

tables will be enough for that but

play22:02

essentially in this case um

play22:04

you can calculate percentages which is

play22:06

essentially very similar um and that was

play22:11

my confusion um based off of the number

play22:14

of projects that I've been working on

play22:16

closely with all of you um so I just

play22:18

wanted to kind of clear that up um but

play22:21

it is important when we're talking about

play22:23

some of the two column ones that I was

play22:24

asking you to do uh which will arise um

play22:29

in terms of your projects so for

play22:31

instance you're thinking let's say you

play22:33

want to know you have been talking to

play22:35

these five people Sam Charlie Sarah Max

play22:37

and Joe and you want to know what the

play22:40

average or mean number of miles driven

play22:42

in a year among these people is so

play22:44

you ask them how many miles

play22:46

have you driven in this past year and

play22:48

they give you this variety of things so

play22:50

Sam gives you

play22:51

7,335 Charlie's

play22:54

36,295 Sarah is

play22:56

21,523, Max is 10,235, and Joe's is 12,567

play23:00

so in order to come up with your

play23:03

mean it's going to be very similar to

play23:05

where you add up all of those numbers

play23:08

um and you divide it by the number of

play23:11

cases so in this case you add these all

play23:13

up comes to 87,955 you divide it by

play23:16

the total number of people which is five

play23:18

um which means that your mean miles

play23:20

driven is 17,591 and that's it that's all

play23:22

you have to do for this uh mean um or

play23:24

average calculation so really the big
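The miles-driven example maps directly onto a few lines of Python. The figures are the ones read aloud in the lecture; the dictionary layout and variable names are just one way to hold them.

```python
# Miles driven in the past year, as reported by each person.
miles = {
    "Sam": 7335,
    "Charlie": 36295,
    "Sarah": 21523,
    "Max": 10235,
    "Joe": 12567,
}

total_miles = sum(miles.values())   # sum of Y
n_cases = len(miles)                # N = 5 people
mean_miles = total_miles / n_cases

print(total_miles)  # 87955
print(mean_miles)   # 17591.0
```

Note that Charlie's 36,295 pulls the mean above what four of the five people actually drove, which is worth keeping in mind when reporting a mean for a small group.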

play23:27

thing is to think about like because

play23:28

there's no order to eye color there's no

play23:31

order to um any of those things but if

play23:34

you had like a category that was ordinal

play23:36

in terms of like stress levels or

play23:41

um a variety of those things then that's

play23:44

where you would calculate the mean you

play23:45

will calculate percentages which are

play23:47

still kind of like averages in a way but

play23:50

you're not assigning some sort of

play23:51

hierarchical value or statistical value to

play23:54

it in the sense that it will inform or

play23:56

influence some sort of Central Point and

play24:00

so that's kind of what I wanted

play24:02

to clear up and clarify here um so

play24:05

that's all there is for measures of

play24:07

central tendency so you are going to be

play24:09

thinking about these things as you move

play24:10

forward
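For the nominal-data point made earlier (categories such as major or eye color get a percent breakdown and a mode, not a mean), a minimal Python sketch might look like this. The list of majors is hypothetical and is not taken from any student project.

```python
from collections import Counter

# Hypothetical nominal data: one declared major per respondent.
majors = [
    "Sociology", "Biology", "Sociology", "Psychology",
    "Sociology", "Biology", "Psychology", "Sociology",
]

counts = Counter(majors)
n_cases = len(majors)

# Percent breakdown: each category's share of all cases.
percentages = {major: 100 * count / n_cases for major, count in counts.items()}

# Mode: the most frequently occurring category.
mode = counts.most_common(1)[0][0]

for major, pct in percentages.items():
    print(f"{major}: {pct:.1f}%")
print("mode:", mode)
```

The categories have no order, so any numbers assigned to them would be arbitrary labels, and averaging those labels would not describe a central point.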
