Types of Data: Nominal, Ordinal, Interval/Ratio - Statistics Help

Dr Nic's Maths and Stats
13 Dec 2011, 06:20

Summary

TLDR: This script explores the fundamental types of data in statistical analysis: Nominal, Ordinal, and Interval/Ratio. It explains how each type affects the choice of summary statistics and graphical representation, using the example of a questionnaire about choconutties. Nominal data, like chocolate preference, is best shown in pie or bar charts, while ordinal data, such as satisfaction levels, should be ordered in column charts. Interval/Ratio data, including age and spending, offers the most versatility for analysis and is effectively displayed in bar charts, histograms, or line charts.

Takeaways

  • 📊 Data is central to statistical analysis and is collected to learn more about a phenomenon or process.
  • 👥 Each thing data is collected about is called an observation, which could be a person, business, product, or period in time.
  • 📈 Variables record the measurements of interest, such as age, sex, or chocolate preference; in a spreadsheet, each column holds one variable and each row holds one observation.
  • 🔢 The level of measurement used (Nominal, Ordinal, Interval/Ratio) determines the appropriate summary statistics, graphs, and analysis.
  • 🏷️ Nominal data, also known as categorical or qualitative, includes labels without a sense of order, like sex or preferred chocolate type.
  • πŸ“ Ordinal data has a meaningful order but unequal intervals between values, such as ranks or satisfaction levels.
  • πŸ“‰ Interval/Ratio data is the most precise level, including measurable quantities like age, weight, or number of customers, and can be either discrete or continuous.
  • πŸ“Š The representation of data in graphs or charts depends on the level of measurement, with specific guidelines for each type.
  • 🍫 In a practical example, Helen collects customer data on various variables, including nominal, ordinal, and interval/ratio data, and analyzes them using appropriate charts and summary statistics.
  • πŸ” The type of analysis performed on a dataset should be based on the level of measurement of the variables.

Q & A

  • What is the purpose of collecting data in statistical analysis?

    -The purpose of collecting data in statistical analysis is to find out more about a phenomenon or process by collecting several measures on each person or thing of interest.

  • What is an observation in the context of data collection?

    -An observation is each thing we collect data about, which could be a person, a business, a product, or a period in time such as a week.

  • What is a variable in data collection?

    -A variable is a characteristic or measurement that is recorded for each observation, such as age, sex, and chocolate preference.

  • How is data typically organized in a spreadsheet or database?

    -In a spreadsheet or database, each row corresponds to a single observation, and each column represents a variable.

  • What is the Nominal level of measurement?

    -The Nominal level is the most basic level of measurement, also known as categorical or qualitative, and includes variables like sex and preferred type of chocolate with no sense of order.

  • How can nominal data be summarized?

    -Nominal data can be summarized using frequency or percentage, but you cannot calculate a mean or average value for it.

  • What is the difference between ordinal and nominal data?

    -Ordinal data has a meaningful order, unlike nominal data, but the intervals between the values may not be equal. Examples include rank and satisfaction.

  • Is it appropriate to calculate a mean for ordinal data?

    -While some argue against calculating a mean for ordinal data, it is common practice in research, especially regarding people's behavior, but one should be cautious and consider the implications.

  • What is the most precise level of measurement and what does it include?

    -The most precise level of measurement is interval/ratio, which includes measurable quantities like the number of customers, weight, age, and size.

  • What are the common summary measures for interval/ratio data?

    -The most common summary measures for interval/ratio data are the mean, the median, and the standard deviation.

  • How should different levels of data be represented graphically?

    -Nominal data can be displayed as pie charts, column or bar charts. Ordinal data is best shown as a column or bar chart. Interval/ratio data is represented as bar charts, histograms, or line charts.

  • In the example of Helen's choconutties, what type of data is the customer's age and how should it be represented?

    -The customer's age is interval/ratio data and can be represented on bar charts or histograms.

  • What is the significance of the mean age of Helen's customers in the sample?

    -The mean age of 38 years for Helen's customers in the sample provides a meaningful summary statistic that can be used for further analysis or marketing strategies.

Outlines

00:00

📊 Understanding Data Types and Their Analysis

This paragraph introduces the fundamental concepts of data types and their significance in statistical analysis. It explains the difference between nominal, ordinal, and interval/ratio data, and how they are measured and represented. Nominal data, such as sex or chocolate preference, is categorical with no inherent order, and is summarized using frequencies or percentages. Ordinal data, like satisfaction levels, has a meaningful order but possibly unequal intervals, and calculating its mean can be controversial. Interval/Ratio data, which includes measurable quantities like age or weight, is the most mathematically versatile and can be summarized using mean, median, and standard deviation. The paragraph also discusses appropriate graphical representations for each data type, such as pie charts for nominal data and bar charts or histograms for interval/ratio data. The example of Helen's choconutties questionnaire illustrates how these concepts apply to real-world data collection and analysis.

05:02

📈 Analyzing Customer Preferences and Behavior

The second paragraph delves deeper into the analysis of the data collected by Helen through her customer questionnaire. It provides specific examples of how to summarize and interpret the different types of data. For nominal data, such as the type of chocolate preferred, percentages are used to show preferences, with 46% favoring dark chocolate, 40% milk chocolate, and 14% white chocolate. Ordinal data, including satisfaction and likelihood to purchase, should be displayed in a logical order using a column chart, and the mean satisfaction score is calculated as 2.06, indicating a 'satisfied' response, though the validity of this calculation is questioned. Interval/Ratio data, such as age, grocery spending, and chocolate bar purchases, are presented with mean values, showing the average age of 38 years, an average grocery spend of $192, and an average of 3.3 chocolate bars bought per week. The paragraph emphasizes the importance of selecting the appropriate statistical analysis based on the level of measurement of the data.

Keywords

💡Data

Data refers to the collection of information or values that are used for statistical analysis. In the context of the video, data is central to understanding a phenomenon or process, and it is collected through various measures on each subject of interest. The script emphasizes the importance of data in forming the basis for statistical analysis and decision-making.

💡Observation

An observation is a single set of data collected for a subject of interest, such as a person, business, product, or period in time. The script explains that each observation is associated with a set of variables, and the data for each observation is recorded as a row in a spreadsheet or database.

💡Variable

A variable is a characteristic or factor that is measured and recorded in a dataset. Examples from the script include age, sex, and chocolate preference. Variables are the elements that are quantified and analyzed to draw conclusions about the observations.
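
As a minimal sketch of this layout, assuming Python and a few invented customer records (the values below are hypothetical and not taken from the questionnaire), each dictionary plays the role of one spreadsheet row and each key the role of one column. The helper name `column` is also just an illustration.

```python
# Hypothetical records: one dict per observation (a spreadsheet row),
# one key per variable (a spreadsheet column). Values are invented.
observations = [
    {"age": 34, "sex": "F", "chocolate": "dark"},
    {"age": 41, "sex": "M", "chocolate": "milk"},
    {"age": 29, "sex": "F", "chocolate": "white"},
]


def column(rows, variable):
    """Collect the value of one variable across all observations."""
    return [row[variable] for row in rows]


ages = column(observations, "age")   # one column of the sheet
first_customer = observations[0]     # one row of the sheet
```

Reading down a key gives a variable; reading across a dictionary gives everything recorded about a single observation.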

💡Level of Measurement

The level of measurement is the scale used to quantify the variables in a dataset. The script outlines different levels, such as nominal, ordinal, and interval/ratio, which determine the types of statistical analysis that can be performed on the data.
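
One way to picture this dependency is as a lookup table. The sketch below paraphrases the guidance given in the script; the dictionary layout and the function name `sensible_options` are illustrative assumptions, not part of the video.

```python
# Guidance paraphrased from the script; the structure and names here
# are assumptions made for illustration.
GUIDANCE = {
    "nominal": {
        "summaries": ["frequency", "percentage"],
        "charts": ["pie chart", "column chart", "bar chart"],
    },
    "ordinal": {
        "summaries": ["frequency", "percentage", "mean (with caution)"],
        "charts": ["column chart", "bar chart"],
    },
    "interval/ratio": {
        "summaries": ["mean", "median", "standard deviation"],
        "charts": ["bar chart", "histogram", "box plot", "line chart"],
    },
}


def sensible_options(level):
    """Look up the summaries and charts suggested for a level of measurement."""
    key = level.strip().lower()
    if key not in GUIDANCE:
        raise ValueError(f"unknown level of measurement: {level!r}")
    return GUIDANCE[key]
```

For example, `sensible_options("Ordinal")["charts"]` leaves out the pie chart, matching the advice that ordinal data should not be shown that way.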

💡Nominal

Nominal data is the most basic level of measurement and is used for categorical or qualitative variables. The script provides examples like sex and preferred type of chocolate, which are labels with no inherent order. Nominal data is summarized using frequencies or percentages, and it does not allow for the calculation of a mean.
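
A frequency table for a nominal variable might be sketched as follows. The counts are an assumption: they are reconstructed from the percentages the script reports for the 50 customers (46% dark, 40% milk, 14% white), and the function name `frequency_table` is hypothetical.

```python
from collections import Counter

# Counts reconstructed from the reported percentages for 50 customers
# (46% dark, 40% milk, 14% white); the raw responses are an assumption.
preferences = ["dark"] * 23 + ["milk"] * 20 + ["white"] * 7


def frequency_table(values):
    """Return {category: (count, percentage)} for a nominal variable."""
    counts = Counter(values)
    total = len(values)
    return {category: (n, 100 * n / total) for category, n in counts.items()}


table = frequency_table(preferences)

# The most frequent category is a valid "typical value" for nominal data;
# a mean of the labels (or of arbitrary numeric codes) is not.
most_common_category = Counter(preferences).most_common(1)[0][0]
```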

💡Ordinal

Ordinal data represents variables that have a meaningful order but with potentially unequal intervals between values. The script cites examples such as satisfaction levels and ranks, which can be summarized by frequencies but with caution when calculating a mean due to the potential for misinterpretation.
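
The sketch below works through the satisfaction question under two loudly stated assumptions: the responses are coded 1 (very satisfied) to 4 (very unsatisfied), and the per-category counts are chosen so that they reproduce the figures reported in the script (32% very satisfied, 72% satisfied or very satisfied, mean score 2.06, 50 customers). The script does not give the counts for the two lower categories, so those are inferred rather than quoted.

```python
# Assumed coding, in logical order: 1 is the most satisfied response.
SCALE = ["very satisfied", "satisfied", "not satisfied", "very unsatisfied"]

# Assumed counts, chosen to match the reported 32%, 72% and mean of 2.06.
counts = {
    "very satisfied": 16,
    "satisfied": 20,
    "not satisfied": 9,
    "very unsatisfied": 5,
}

total = sum(counts.values())

# Cumulative percentages are meaningful only because the categories are ordered.
cumulative = {}
running = 0
for label in SCALE:
    running += counts[label]
    cumulative[label] = 100 * running / total

# Mean of the numeric codes: common practice, but it treats the gaps
# between categories as equal, which ordinal data does not guarantee.
mean_score = sum(code * counts[label]
                 for code, label in enumerate(SCALE, start=1)) / total
```

The cumulative figure for "satisfied" is the "satisfied or very satisfied" share, a summary that needs only the ordering and so avoids the equal-interval assumption behind the mean.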

💡Interval/Ratio

Interval/Ratio data is the most precise level of measurement, applicable to variables that can be measured on a scale, such as age, weight, or the number of customers. The script explains that this type of data can be either discrete or continuous and allows for the calculation of various summary measures like mean, median, and standard deviation.

💡Summary Statistics

Summary statistics are numerical values that summarize and describe the main features of a dataset. The script mentions mean, median, and standard deviation as common measures for interval/ratio data, which provide insights into the central tendency and dispersion of the data.
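
A short sketch of these three measures, using Python's standard `statistics` module on invented data (the ten values below are hypothetical and are not the survey responses):

```python
import statistics

# Invented weekly chocolate-bar counts for ten customers (not survey data).
bars_per_week = [2, 3, 3, 4, 1, 5, 3, 4, 2, 6]

mean_bars = statistics.mean(bars_per_week)      # central tendency
median_bars = statistics.median(bars_per_week)  # middle value once sorted
sd_bars = statistics.stdev(bars_per_week)       # sample standard deviation
```

Note that `statistics.stdev` computes the sample standard deviation (dividing by n - 1); `statistics.pstdev` would be the choice if the ten values were the whole population.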

💡Graphs and Charts

Graphs and charts are visual representations of data used to analyze and communicate patterns or trends. The script discusses how the level of measurement affects the choice of graph or chart, such as pie charts for nominal data, bar charts for ordinal data, and histograms for interval/ratio data.

💡Pie Chart

A pie chart is a circular graph used to display nominal data, showing the proportion of each category in the dataset. The script uses pie charts to illustrate the distribution of preferred chocolate types among customers, with percentages indicating the share of each preference.

💡Histogram

A histogram is a graphical representation of the distribution of interval/ratio data, showing the frequency of data points within specified ranges or 'bins'. The script suggests using histograms to display and analyze continuous data, such as the age of customers.
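
The grouping step behind a histogram can be sketched without any plotting library. The ages and the 10-year bin width below are assumptions made for illustration, and `histogram_counts` is a hypothetical helper name.

```python
from collections import Counter

# Invented ages for illustration; the 10-year bin width is also an assumption.
ages = [19, 23, 27, 31, 34, 36, 38, 41, 45, 52, 58, 63]
BIN_WIDTH = 10


def histogram_counts(values, width):
    """Count how many values fall in each bin [start, start + width)."""
    starts = Counter((value // width) * width for value in values)
    return dict(sorted(starts.items()))


bins = histogram_counts(ages, BIN_WIDTH)

# Crude text histogram: one '#' per observation in the bin.
for start, count in bins.items():
    print(f"{start:>2}-{start + BIN_WIDTH - 1:<2} {'#' * count}")
```

Changing `BIN_WIDTH` changes the shape of the picture, which is why the choice of grouping deserves some thought before reading much into a histogram.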

💡Box Plot

A box plot is a standardized way of displaying the distribution of a dataset based on five summary statistics: minimum, first quartile, median, third quartile, and maximum. The script mentions box plots as a way to illustrate these statistics neatly for a variable.
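
The five values a box plot draws can be computed as in the sketch below. The spending figures are invented, `five_number_summary` is a hypothetical helper, and the "inclusive" quartile method is one convention among several, so other tools may report slightly different quartiles for the same data.

```python
import statistics

# Invented weekly grocery spending in dollars (not Helen's data).
spending = [120, 150, 165, 180, 190, 200, 210, 230, 260, 310]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the values a box plot draws."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


summary = five_number_summary(spending)
```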

💡Line Chart

A line chart is a graph that displays data points connected by straight lines, often used to represent data over time. The script notes that data which occurs over time is best displayed as a line chart; changes in customer satisfaction over a period would be one such case.

Highlights

Data is central to statistical analysis and can be collected on various subjects like people, businesses, or time periods.

Variables record the measurements of interest, such as age, sex, and preferences, for each observation.

Data in spreadsheets is organized with each row representing an observation and each column a variable.

The level of measurement for a variable dictates the types of statistics, graphs, and analyses that can be applied.

Nominal level is the most basic, dealing with categorical data without an inherent order, like sex or color preferences.

Nominal data is summarized using frequencies or percentages, and calculating a mean is not applicable.

Ordinal level includes variables with a meaningful order but potentially unequal intervals, such as satisfaction levels.

Calculating a mean for ordinal data is common but requires careful consideration of its validity.

Interval/Ratio level is the most precise, applicable to quantifiable measurements like age, weight, and size.

Interval/Ratio data can be either discrete or continuous and allows for a wide range of mathematical analysis.

Mean, median, and standard deviation are common summary measures for Interval/Ratio data.

Data representation in graphs or charts should correspond to the level of measurement.

Nominal data is best displayed in pie charts, column charts, or bar charts.

Ordinal data should be presented in column or bar charts, avoiding pie charts.

Interval/Ratio data is optimally represented in bar charts, histograms, or line charts for time-based data.

Box plots are useful for illustrating summary statistics of a variable.

Helen's case study demonstrates how different levels of data are collected and analyzed in a real-world scenario.

Customer preferences for chocolate type are nominal and can be summarized and visualized using percentages.

Satisfaction and purchase likelihood are ordinal, requiring logical ordering in column charts.

Age, grocery spending, and chocolate bar purchases are interval/ratio data, allowing for mean calculations.

The type of analysis suitable for a dataset is determined by the level of measurement of its variables.

Transcripts

00:00

Types of data: Nominal,

00:02

Ordinal, Interval/Ratio

00:06

Data is central to statistical analysis.

00:09

When we wish to find out more about a phenomenon or process we collect data.

00:14

Usually we collect several measures on each person or thing of interest.

00:19

Each thing we collect data about is called an observation.

00:23

If we are interested in how people respond,

00:25

then each observation will be a person.

00:28

Or an observation could be a business

00:31

or a product, or a period in time, such as a week.

00:34

Variables record the measurements we are interested in.

00:38

Age, sex and chocolate preference can all be stored as variables.

00:43

For each observation we record a score or value for each of the variables.

00:48

When we store this data in a spreadsheet or database,

00:51

each row corresponds to a single observation

00:55

and each column is a variable.

00:57

Level of measurement

00:59

The level of measurement used for a variable

01:03

determines which summary statistics,

01:04

graphs and analysis are possible and sensible.

01:08

The Nominal level is the most basic level of measurement.

01:13

Nominal is also known as categorical or qualitative.

01:17

Examples of nominal variables

01:19

are sex,

01:21

preferred type of chocolate

01:22

and colour.

01:24

These are descriptions or labels with no sense of order.

01:27

Nominal values can be stored as a word or text or given a numerical code.

01:33

However, the numbers do not imply order.

01:36

To summarise nominal data we use a frequency or percentage.

01:41

You cannot calculate a mean or average value for nominal data.

01:46

The next level of measurement is ordinal.

01:49

Examples of ordinal variables are rank, satisfaction,

01:53

and fanciness!

01:55

Ordinal variables have a meaningful order,

01:58

but the intervals between the values in the scale may not be equal.

02:02

For example, the gap between first and second runners in a race may be small,

02:06

whereas there is a bigger gap between second and third.

02:09

Similarly, there may be a big difference between satisfied and unsatisfied,

02:14

but a smaller difference between unsatisfied and very unsatisfied.

02:20

Like Nominal data, ordinal data can be given as frequencies.

02:24

Some people state that you should never calculate a mean or average for ordinal data.

02:29

However, it is quite common practice, particularly in research regarding

02:33

people's behaviour, to find mean values for ordinal data.

02:37

You should be careful if you do this to think about what it means and if it is justifiable.

02:43

The most precise level of measurement is interval/ratio.

02:47

This label includes things that can be measured rather than classified or

02:51

ordered,

02:52

such as number of customers,

02:54

weight, age and size.

02:57

Interval/Ratio data is also known as scale, quantitative or parametric.

03:02

Interval/Ratio data can be discrete, with whole numbers

03:07

or continuous, with fractional numbers.

03:09

Interval/Ratio data is very mathematically versatile.

03:13

The most common summary measures

03:15

are the mean, the median and the standard deviation.

03:23

The way data should be represented in a graph or chart depends on the level of measurement.

03:28

Nominal data can be displayed as a pie chart,

03:31

column or bar chart

03:32

or stacked column or bar chart.

03:34

In most cases the best choice for a single set of nominal data

03:38

is a column chart.

03:41

Ordinal data must not be represented as a pie chart,

03:44

but is best shown as a column or bar chart.

03:47

Interval/ratio data

03:49

is best represented as a bar chart or a histogram.

03:52

For these the data is grouped.

03:55

Box plots illustrate the summary statistics for a variable in a neat way.

04:00

Data which occurs over time is best displayed as a line chart.

04:05

Here is an example using different types of data.

04:08

Helen sells choconutties.

04:10

Helen is interested in developing a new product to add to her line of choconutties.

04:15

She develops a questionnaire and asks a random sample of 50 of her customers

04:18

to fill it out.

04:20

She asks them their age and sex, how much they spend on groceries each week,

04:25

how many chocolate bars they buy in a week,

04:29

and which they like best out of dark, milk and white chocolate.

04:30

She asks them how satisfied they are with choconutties:

04:34

very satisfied, satisfied, not satisfied, very unsatisfied.

04:39

And she asks them how likely they are to buy a whole box

04:43

of 10 packets of choconutties.

04:45

Helen enters the data in a spreadsheet.

04:47

Each row has responses from one customer.

04:50

Each column contains the measurements or scores for one variable.

04:55

The type of chocolate preferred is nominal data.

04:58

This can be shown in a pie chart or bar chart.

05:02

We can summarise by saying that 46% of customers prefer dark chocolate,

05:05

40% prefer milk chocolate,

05:06

and 14% prefer white chocolate.

05:12

The measures of satisfaction and likelihood are ordinal level data.

05:15

These should not be shown in a pie chart.

05:18

The values should be put in a logical order in a column chart.

05:21

We could say that 32% are very satisfied with choconutties

05:26

and 72% of people are satisfied or very satisfied.

05:31

The average satisfaction score comes to 2.06,

05:34

which could be interpreted as satisfied.

05:38

However, it is debatable whether it is sensible to calculate a mean satisfaction score.

05:44

Age, amount spent on groceries

05:44

and number of chocolate bars are all interval/ratio data.

05:50

These can be displayed on bar charts or histograms.

05:53

We can say that for the customers in the sample,

05:58

the mean age is 38 years, the mean amount spent on groceries is $192,

06:01

and the mean number of chocolate bars bought per week is 3.3.

06:06

These are all meaningful summary statistics.

06:09

The type of analysis that is sensible for a given dataset

06:13

depends on the level of measurement.

06:15

You can find out more about this in the video, "Choosing the test".

Related Tags
Data Types, Statistics, Analysis, Nominal, Ordinal, Interval, Ratio, Data Analysis, Research, Spreadsheet, Graphs