Data & Infographics: Crash Course Navigating Digital Information #8

CrashCourse
26 Feb 2019 13:02

Summary

TLDR: In this Crash Course episode, John Green explores the nuances of data and statistics, emphasizing the importance of context and source reliability. He discusses how data can be powerful yet deceptive, and urges viewers to critically evaluate its relevance and the credibility of its sources. Green also highlights the potential for manipulation in data visualization, illustrating how charts can be designed to mislead. The episode concludes with a call for vigilance in interpreting data and infographics to ensure quality decision-making.

Takeaways

  • 😀 Data is powerful evidence that can be quickly absorbed but is not inherently neutral due to human involvement in its collection, interpretation, and presentation.
  • 🔍 The context and source of data are crucial; even seemingly positive statistics can be misleading if not properly contextualized.
  • 📊 Data visualizations like charts and infographics can be both informative and misleading, depending on how they are designed and what data they represent.
  • 🤔 It's important to critically evaluate data by asking if it supports the claim and if the source is reliable.
  • 🏓 The example of Serena Williams and tennis penalties illustrates how raw data can be misinterpreted without considering the rate of occurrence.
  • 🔎 Lateral reading is essential to understand who commissioned and conducted research to assess the reliability of data sources.
  • 👀 The 'seeing is believing' trap can lead to uncritical acceptance of data, which is why understanding how data is presented is vital.
  • 🌐 Data can be made to seem more or less significant by adjusting the scale and context in which it is presented, as shown by climate change and graduation rate charts.
  • 📉 Misleading data visualizations can be created by using an inappropriate scale or by focusing on a narrow aspect of the data.
  • 🧐 Always check the accuracy, relevance, source reliability, and honesty of data presentation when encountering data visualizations.

Q & A

  • What is the key message in the introduction of the Crash Course episode?

    -The key message is that data, like statistics, needs to be placed in context to understand its true meaning. Simply seeing numbers without understanding their source and context can be misleading.

  • Why does John Green mention the survey he conducted with 10 Crash Course employees?

    -John Green uses this example to show how data can be manipulated. Although 90% of people surveyed said they loved Crash Course, it’s misleading because the survey was conducted with only 10 employees, making it unreliable.

  • What does Mark Twain’s quote, 'There are three kinds of lies: Lies, damned lies, and statistics,' imply?

    -The quote highlights that statistics, despite appearing factual and neutral, can be used to mislead or manipulate perceptions depending on how they are presented or interpreted.

  • How does the 2015 Stanford History Education Group study relate to people's perception of data?

    -The study found that many middle schoolers believed data in a comment was credible without checking its source, showing how easily people can be swayed by the mere presence of statistics, even when there is no reason to trust the data.

  • What’s the problem with Glenn Greenwald's tweet about male tennis players being punished more often?

    -Glenn Greenwald’s tweet is misleading because it only shows the total number of punishments, not the rate of punishment relative to misbehavior. To determine whether men are punished more frequently, we need to know how often both men and women misbehave.

  • What is 'lateral reading,' and how does it help in evaluating data?

    -Lateral reading involves opening new tabs to research the credibility and authority of the data's source. This helps verify whether the source is reliable, why the data was collected, and whether the source has a vested interest in the results.

  • Why is the '500 million straws per day' statistic often criticized?

    -This statistic, which is frequently cited, originated from a 9-year-old who conducted informal research by calling straw manufacturers. The figure lacks scientific rigor, making it unreliable, though widely circulated.

  • What should we look for when analyzing data visualizations like charts or infographics?

    -We need to ensure the data is accurate, comes from a reliable source, and is presented fairly without manipulation. Visualizations should not sacrifice accuracy for aesthetics or mislead through techniques like inappropriate scaling.

  • How can manipulating the y-axis of a graph mislead viewers, as seen in the climate change and graduation rate examples?

    -Manipulating the y-axis can either exaggerate or downplay trends. In the climate change example, zooming out made the temperature change appear minor, while zooming in on the graduation rate chart made a modest increase look dramatic.

  • What is the 'proportional ink principle' in data visualization?

    -The proportional ink principle states that the size of inked areas in a chart should be proportional to the data values they represent. This ensures that visual representations are accurate and not exaggerated or minimized.

Outlines

00:00

📊 Understanding Data and Statistics

John Green introduces the topic of data and statistics, emphasizing the importance of context and source reliability in interpreting information. He uses a humorous example of a survey conducted among Crash Course staff to show how statistics can mislead without proper context. Green explains that data is a powerful form of evidence but can be deceptive if not critically evaluated. He points out that the humans who gather and present data are not neutral, which can introduce bias. The segment concludes by noting that data, often consumed as statistics or visual representations, can be both helpful and deceptive, and that viewers should ask whether data supports the claim being made and whether its source is reliable.

05:02

🔍 Evaluating Data Sources and Visualizations

This segment covers the need to evaluate the reliability of data sources and the ways data visualizations can mislead. Green discusses lateral reading as a way to verify the credibility of data sources and gives the example of a widely cited statistic about straw usage that originated from a child's report. The segment also touches on how vested interests, such as in advertising, can skew data presentation. Green then turns to the power and pitfalls of data visualizations, using examples to show how charts and graphs can either represent data accurately or mislead by manipulating scale, context, or presentation.

10:02

📈 The Art and Deception of Data Visualization

The final segment focuses on data visualization as a creative field and the potential for manipulation through design choices. Green discusses how charts and graphs can be made to look appealing without accurately representing the data. He uses examples to show how altering the scale or scope of a graph can dramatically change the perceived implications of the data. The segment emphasizes the need to analyze data visualizations critically to ensure they are based on accurate data and presented fairly. Green concludes by encouraging viewers to learn to tell well-designed data visualizations from poorly designed ones and to keep a critical eye out for reliability and misrepresentation in data presentation.

Keywords

💡Data

Data refers to raw quantitative or qualitative information, such as facts, figures, or observations. In the video, data is portrayed as a powerful form of evidence that can provide insights into the world around us. However, the video emphasizes that data is not neutral, as humans collect and interpret it, which means it can be misrepresented depending on the context.

💡Statistics

Statistics are a specific type of data, typically presented in numerical form, often used to support arguments or make claims. The video discusses how statistics can be particularly deceptive because they appear neutral and authoritative, but they can be used to mislead, as seen in the example of Serena Williams and male tennis players’ punishments.

💡Source

The source refers to the origin of data or information. The video highlights the importance of understanding where data comes from and whether the source is reliable. For example, the statistic about Americans using 500 million straws a day came from a report by a 9-year-old, which raises questions about its credibility.

💡Context

Context refers to the circumstances or background that help give meaning to data or claims. The video argues that placing data in its proper context is essential to avoid misinterpretation, as demonstrated with statistics like the punishment rates in tennis, where the absence of additional context led to misleading conclusions.
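
The tennis example comes down to simple arithmetic: a rate is the number of punishments divided by the number of times players misbehaved. The sketch below uses invented counts (the video notes that the misbehavior data was not provided), so every number and the helper name `punishment_rate` are assumptions for illustration only. It shows how one group can receive more punishments in raw terms while being punished at a lower rate.

```python
def punishment_rate(punishments: int, incidents: int) -> float:
    """Share of misbehavior incidents that ended in a punishment."""
    if incidents <= 0:
        raise ValueError("incidents must be positive")
    return punishments / incidents


# Hypothetical counts, invented purely for illustration.
# They are NOT the figures from the New York Times table.
men = {"punishments": 150, "incidents": 600}
women = {"punishments": 50, "incidents": 100}

men_rate = punishment_rate(**men)
women_rate = punishment_rate(**women)

# Raw counts and rates can point in opposite directions.
print(f"Men:   {men['punishments']} punishments, rate {men_rate:.0%}")
print(f"Women: {women['punishments']} punishments, rate {women_rate:.0%}")
```

With these made-up inputs the group with three times as many punishments is punished half as often per incident, which is why a table of raw counts cannot settle a claim about rates.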

💡Data visualization

Data visualization is the representation of data in visual formats such as graphs, charts, or infographics. The video explains that while these tools can make data easier to understand, they can also be manipulated to mislead audiences, as seen in examples like the climate change chart that downplays temperature changes.
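
One way to see the effect of axis choice is to compute how much of the chart's height a change occupies. The sketch below is a rough illustration, not a reconstruction of either chart: the axis ranges (-10 to 110 and 55 to 60) come from the video, but the start and end temperatures and the helper name `share_of_axis` are assumptions chosen to represent a rise of about two degrees.

```python
def share_of_axis(start: float, end: float, axis_min: float, axis_max: float) -> float:
    """Fraction of the y-axis height spanned by a change from start to end."""
    if axis_max <= axis_min:
        raise ValueError("axis_max must exceed axis_min")
    return abs(end - start) / (axis_max - axis_min)


# Assumed values: a roughly two-degree rise that fits inside both axis ranges.
temp_start, temp_end = 56.5, 58.5

zoomed_out = share_of_axis(temp_start, temp_end, -10, 110)  # axis range of the tweeted chart
zoomed_in = share_of_axis(temp_start, temp_end, 55, 60)     # axis range of the truncated chart

print(f"-10 to 110 axis: change fills {zoomed_out:.1%} of the chart height")
print(f"55 to 60 axis:   change fills {zoomed_in:.1%} of the chart height")
```

The same change fills well under a fiftieth of the tall axis and two fifths of the narrow one. Neither number says which scale is appropriate; that depends on how much a two-degree shift matters in context.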

💡Reliability

Reliability refers to the trustworthiness or credibility of data and its sources. The video stresses the importance of verifying whether data comes from an authoritative and unbiased source. For example, the video mentions Pew Research Center as a reliable source, as opposed to a biased or self-interested entity.

💡Misrepresentation

Misrepresentation occurs when data or information is presented in a misleading or inaccurate way. The video discusses how both the manipulation of statistics and the improper use of data visualizations, such as the example of gun control statistics, can lead to skewed or false conclusions.

💡Lateral reading

Lateral reading is the practice of checking the credibility of information by consulting multiple sources. The video suggests this method as a critical skill for evaluating the reliability of data, emphasizing the need to investigate who conducted the research and why, especially when encountering questionable data or claims online.

💡Proportional ink principle

The proportional ink principle is a guideline for data visualization, stating that the size of visual elements should be proportional to the data they represent. The video illustrates how violating this principle can distort reality, such as in the chart showing exaggerated graduation rate improvements during Obama’s administration by manipulating the scale.
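
For a bar chart, the principle can be checked by comparing the ratio of the drawn bar heights with the ratio of the underlying values. The sketch below assumes two hypothetical graduation rates and a baseline of 70%, which is roughly where the video says the truncated chart began. The rates and the helper name `ink_ratio` are illustrative assumptions, not figures from the chart.

```python
def ink_ratio(smaller: float, larger: float, baseline: float) -> float:
    """Ratio of drawn bar heights when both bars start at `baseline`."""
    if smaller <= baseline:
        raise ValueError("both values must exceed the baseline")
    return (larger - baseline) / (smaller - baseline)


# Hypothetical graduation rates (percent), invented for illustration.
first_year, last_year = 75.0, 82.0

data_ratio = last_year / first_year                           # what the data says
truncated = ink_ratio(first_year, last_year, baseline=70.0)   # bars start at 70
full_scale = ink_ratio(first_year, last_year, baseline=0.0)   # bars start at 0

print(f"Data ratio:           {data_ratio:.2f}")
print(f"Ink ratio from 70:    {truncated:.2f}")
print(f"Ink ratio from zero:  {full_scale:.2f}")
```

When the bars start at zero, the ink ratio equals the data ratio, so the principle holds. Starting the bars at 70 makes the taller bar look more than twice the size of the shorter one even though the values differ by less than ten percent.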

💡Critical thinking

Critical thinking refers to the ability to analyze and evaluate information before accepting it as true. The video advocates for applying critical thinking when examining data and statistics, urging viewers to question whether the data supports the claim and if it’s presented fairly, as seen in the example of the misleading gun control chart.

Highlights

90% of people polled say they love Crash Course and find it reliable, but the survey was conducted among only 10 Crash Course staff members.

Data is powerful evidence, but its interpretation can be influenced by the context and source.

Statistics can be used to deceive due to their seemingly neutral and irrefutable nature.

Data is not neutral; it is gathered, interpreted, and presented by flawed humans.

A study by Stanford History Education Group shows that students often trust data without verifying its source.

When encountering data, ask if it supports the claim and if the source is reliable.

Example of data misinterpretation in the context of Serena Williams' penalties at the 2018 U.S. Open.

Statistics can show raw numbers but not the rate, which is crucial for understanding data.

Lateral reading is essential to verify the reliability of data sources.

The source of data can have vested interests that affect the data's presentation.

Data visualizations can be creative but also misleading if not presented accurately.

A chart claiming 'good guys with guns' save lives is based on speculation rather than actual data.

Data visualizations should be checked for accuracy, relevance, reliable sourcing, and honest presentation.

Misleading charts can be created by manipulating the scale or context of the data visualization.

The proportional ink principle states that the size of an area in a chart should be proportional to the data it represents.

Well-designed data visualizations are crucial for accurate interpretation of data.

The importance of maintaining a critical eye for data reliability and misrepresentation in the age of infographics and big data.

Transcripts

play00:00

Hi I’m John Green.

play00:01

This is Crash Course: Navigating Digital Information.

play00:03

So what would you say if I told you that 90% of people polled say that they love Crash

play00:08

Course and think we offer consistently reliable and accurate information on the most important

play00:13

educational topics.

play00:15

You might say, “Hold on.

play00:16

I’ve seen the comments.

play00:17

That can’t be true.”

play00:18

And you’d be kind of right, but I would also be kind of right, because I did do that

play00:24

survey, and 90% of people did agree with those positive statements about Crash Course--but

play00:29

I surveyed 10 people who work on Crash Course.

play00:32

It would’ve been 100%, but Stan said, “Is this for a bit?

play00:35

I’m not participating.”

play00:36

Anyway, whether it’s 4 out of 5 dentists or 9 out of 10 Crash Course viewers, source

play00:41

and context can make all the difference.

play00:44

We like to think of data as just being cold, hard facts, but as we’ve already learned

play00:48

in this series, there is no single magical way to get at the singular truth.

play00:54

We have to place everything in its context--even statistics.

play00:59

In fact, especially statistics.

play01:01

INTRO

play01:10

Okay, so data is raw quantitative or qualitative information, like facts and figures, survey

play01:17

results, or even conversations.

play01:19

Data can be derived from observation, experimentation, investigation or all three.

play01:25

It provides detailed and descriptive information about the world around us.

play01:29

The number of teens who use Snapchat, the rate at which millennials move in or out of

play01:34

a neighborhood, the average temperature of your living room -- those are all data points.

play01:38

And data is a really powerful form of evidence because it can be absorbed quickly and easily.

play01:43

Like we often consume it as numbers, like statistics, or as visual representations,

play01:48

like charts and infographics.

play01:50

But as Mark Twain once famously noted: “There are three kinds of lies.

play01:54

Lies, damned lies, and statistics.”

play01:58

Statistics can be extraordinarily helpful for understanding the world around us, but

play02:02

because statistics can seem neutral and irrefutable, they can be used to profoundly deceive us

play02:09

as well.

play02:10

The truth is, neither data nor interpretations of it are neutral.

play02:14

Humans gather, interpret, and present data and we are flawed, complex, and decidedly

play02:21

unneutral.

play02:22

Unfortunately, we often take data at face value.

play02:25

Just like with photos and videos, we can get stuck in the “seeing is believing” trap

play02:29

because we don’t all have the know-how to critically evaluate statistics and charts.

play02:34

Like a Stanford History Education Group study from 2015 bears this out.

play02:38

SHEG developed the MediaWise curriculum that this series is based on.

play02:41

And they asked 201 middle schoolers to look at this comment on a news article.

play02:45

As you can see, the comment includes healthcare statistics, but doesn’t say where they came

play02:50

from.

play02:51

It doesn’t provide any biographical information on the commenter either.

play02:54

But, 40% of the students indicated they’d use that data in a research paper.

play03:00

In fact many cited the statistics as the reason they found the comment credible and useful.

play03:05

The sheer existence of quote unquote data enhanced its credibility despite there being

play03:11

no real reason to trust that data.

play03:14

Whenever we come across data in the wild, we should ask ourselves two questions:

play03:18

Does this data actually support the claim being made?

play03:21

And is the source of this data reliable?

play03:24

Here’s an example when it comes to data relevance.

play03:26

At the 2018 U.S. Open, Serena Williams was penalized for yelling at the umpire and smashing

play03:32

her racket during the game.

play03:33

On the court, she argued that men yell far worse things at umpires and physically express

play03:38

their emotions all the time without being penalized. A few weeks later, journalist

play03:42

Glenn Greenwald cited a New York Times story in a tweet:

play03:45

“Now, NYT just released a study of the actual data: contrary to that narrative, male tennis

play03:51

players are punished at far greater rates for misbehavior, especially the ones relevant

play03:56

to that controversy: verbal abuse, obscenity, and unsportsmanlike conduct”

play04:01

Well that sounds very authoritative.

play04:02

And also he linked to a table that showed that far more men have been fined for racket

play04:06

throwing and verbal abuse than women during grand slam tournaments.

play04:10

However, as statistician Nate Silver helpfully pointed out, this stat only shows that men

play04:16

are /punished/ more, which could be because they misbehave more.

play04:21

So all these statistics actually show is the raw number of punishments, not the rate of

play04:26

punishment despite Greenwald’s claims.

play04:29

To get the rate of punishment we’d have to divide the number of punishments by how

play04:32

many times men and women misbehave, and that data isn’t provided here.

play04:37

So the data in the end does not support Greenwald’s tweet at all, making his claim that male tennis

play04:43

players are punished more frequently… problematic at best.

play04:47

To be fair, Serena Williams’ claim is also anecdotal, although, you know, she does watch a lot of

play04:51

tennis.

play04:52

We also need to investigate whether the source providing the data is reliable, and we can

play04:56

do that through lateral reading.

play04:58

That means opening new tabs to learn more from other sources about:

play05:02

who commissioned the research behind the data, who conducted the research, and why.

play05:07

We also need to know if the source of the information is authoritative, or in a

play05:10

good position to gather that data in the first place.

play05:13

Like remember in episode 3 of this series when we talked about the claim that Americans

play05:17

use 500 million straws per day?

play05:20

We couldn’t confirm how many straws Americans actually use every day, but we did see that

play05:25

sources across the web cited that statistic even though we found out that it came from

play05:30

a 2011 report written by a then-nine-year-old child, Milo Cress.

play05:36

To come up with the figure, he called up straw manufacturers to ask how many straws they

play05:40

made.

play05:41

There’s no way of knowing if those manufacturers were telling the truth, or if the group he

play05:45

called is representative of the whole industry.

play05:48

He was 9.

play05:49

He was obviously a very bright and industrious 9-year-old, but he was 9!

play05:53

Apologies to all the 9-year-olds watching.

play05:55

Thank you for being careful in how you navigate digital information, friends.

play05:59

A more reliable source of such far-reaching information might be a nonpartisan research

play06:04

organization like the Pew Research Center.

play06:06

They’re known for reliable, large-scale studies on U.S. trends and demographics.

play06:11

Once we know who a source of data is, whether they’re authoritative, and why they gathered

play06:16

it, we should ask ourselves what perspective that source may have.

play06:19

They could have a vested interest in the results.

play06:22

Like the beauty influencer you follow who’s always saying 92% of users of this snail slime

play06:27

facial get glowing skin in 10 days.

play06:30

That study may be accurate but there also may be a hashtag-ad in the caption to quietly

play06:35

let you know that the brand in question is paying them.

play06:38

But forget about snail slime.

play06:39

Have I told you about Squarespace?

play06:41

We have to take into account when people cite data that helps them make money.

play06:46

Including me.

play06:47

Alright, so once we know more about where our data comes from, it’s time to analyze

play06:51

how it’s presented.

play06:53

Data visualizations, like charts and graphs and infographics, can be amazing ways of displaying

play06:57

information because one they’re fun to look at, and two the best infographics take complex

play07:04

subjects and abstract ideas and turn them into something that we understand.

play07:09

Like I love this one that shows how factual movies “based on a true story” really

play07:13

are.

play07:14

Oh, and this one on cognitive biases.

play07:15

Although I might be cognitively biased towards appreciating a graphic about cognitive biases.

play07:21

The great thing about data visualization is that it’s a creative field, limited only

play07:25

by a designer’s imagination.

play07:27

But of course with artistic license comes the ability to present data in ways that sacrifice

play07:33

accuracy.

play07:34

It’s really quite easy to invent a nice-looking graphic that says whatever you want it to

play07:39

say.

play07:40

So we need to read them carefully and make sure there’s actually data behind a data

play07:44

visualization.

play07:45

For instance, look at this chart.

play07:46

It makes a claim that, when guns are legal, lives are saved because gun owners prevent

play07:51

deadly crimes -- the “good guys with guns” theory.

play07:54

But if you read the fine print, the chart acknowledges that statistics are not kept

play07:58

on crime /prevention/, or crimes that never happened -- so these figures are not based

play08:03

on real data at all.

play08:05

The chart also says that fewer homicides take place when guns are legal than when they’re

play08:10

banned.

play08:11

But what it doesn’t say is where this change would supposedly take place, and over what

play08:15

span of time.

play08:17

For instance, homicides went down in Australia after strict gun control legislation was passed;

play08:21

on the other hand, they also went down in the United States as gun ownership increased.

play08:25

What is clear upon closer inspection is that this graphic, which initially appears to have

play08:30

some pretty dramatic estimates about gun control, is by its own admission mostly speculation.

play08:35

To trust a data visualization, we need to make sure that it is based on real data AND

play08:41

that the data is presented fairly.

play08:43

Let’s go to the Thought Bubble.

play08:45

Here’s a graph that was posted to Twitter by The National Review, a conservative site

play08:49

that often denies the effects of climate change.

play08:51

It uses data from NASA on the average global temperature from 1880 to 2015.

play08:56

It looks like a nearly straight line, with only a slight increase at the end and the

play09:00

tweet, “the only #climatechange chart you need to see”

play09:04

implies that it once and for all shows that the climate isn’t really getting warmer.

play09:09

However, the y-axis of this chart shows -10 to 110 degrees,

play09:14

which makes the scale of this data very small.

play09:18

One might say that the chart misleads by zooming out too far.

play09:22

If, for instance, the scale was truncated to show just 55 to 60 degrees, as in this

play09:28

Washington Post graphic using the same data, the change over time looks much more dramatic.

play09:33

And the original post also leaves out some much needed context.

play09:37

The entire globe shifting its average temperature by even a couple degrees over the period shown

play09:42

is extremely unusual and has an outsized impact on how the climate

play09:47

functions.

play09:48

The first chart does not present the change in this data or its significance in good faith.

play09:52

On the other hand, data visualization can also be very misleading if it zooms in too

play09:57

much.

play09:58

This chart produced by the administration of President Barack Obama shows how a truncated

play10:02

y-axis can /create/ manipulation, not solve it.

play10:06

The data behind this chart on graduation rates is reliable, but by zooming in the scale to

play10:11

show from around 70 to 85%, it makes the change throughout Obama’s administration look much

play10:17

more dramatic.

play10:18

Here’s what it would look like if you could see the entire scale.

play10:21

The increase in graduation rates looks much less significant.

play10:24

This follows the proportional ink principle of data visualization.

play10:28

The size of a filled in or inked area should be proportional to the data value it represents.

play10:34

Thanks, Thought Bubble.

play10:35

So a few simple tweaks to how data is presented can really make a big difference in how it’s

play10:41

interpreted.

play10:42

Whenever we encounter data visualizations, we need to check that the data is accurate

play10:45

and relevant, that its source is reliable, and that the information is being presented

play10:50

in a way that is honest about the conclusions it draws.

play10:53

Actually, once you get the hang of sorting the useful, well-designed data visualizations

play10:57

from poorly designed ones, the bad ones can be pretty entertaining.

play11:01

If you’d like to see some exceptionally terrible charts, take a spin through viz.wtf

play11:06

or the subreddit r/dataisugly.

play11:08

I’m especially fond of this completely indecipherable chart about the Now That’s What I Call Music

play11:13

CDs, courtesy of the BBC.

play11:15

The challenge and opportunity of images is that they are so eye-catching that we sometimes

play11:20

forget that they’re created by and for humans who have the ability to manipulate them for

play11:26

their own ends.

play11:28

To make our information of lower quality and thereby make our decisions of lower quality.

play11:34

And the use of infographics and big data has become even more popular as our attention

play11:38

spans have waned.

play11:40

After all, it’s much easier to read a pie chart than an essay or an academic report.

play11:44

Plus it fits into a tweet.

play11:46

In summary, whether you’re encountering raw data on its own or visual representations

play11:50

of it, it’s very important to keep a critical eye out for reliability and misrepresentation.

play11:56

Thank you for spending several minutes of your waning attention with us. We’re going

play12:00

to get deeper into that next time. I’ll see you then.
