What the world looks like to an algorithm

Verge Science
6 Nov 2018 · 07:58

Summary

TL;DR: The video explores the fascinating yet sometimes flawed world of machine vision algorithms, which are increasingly integrated into daily life, from unlocking phones to driving cars. It delves into the science of teaching computers to 'see', highlighting the advances in deep learning that have enabled AI to outperform humans at certain visual tasks. It also points out the limitations of these systems, such as their narrow understanding and potential for error, using the intriguing artwork of Tom White to illustrate the stark differences between human and machine perception.

Takeaways

  • 👀 Machine vision algorithms see the world differently from humans, identifying objects like sharks or binoculars in abstract images where humans see only random arrangements.
  • 🚗 These algorithms are increasingly used in everyday life, from self-driving cars to content monitoring on the internet and unlocking smartphones.
  • 🔬 The science of computer vision dates back to the 1960s and has evolved significantly with the advent of AI and deep learning, leading to systems that can outperform humans in certain tasks.
  • 🏥 Deep learning has been used to create algorithms capable of identifying cancerous tumors more accurately than doctors and distinguishing between various dog breeds almost instantly.
  • 🎨 The script discusses the work of Tom White, an academic and artist, who created abstract prints by reverse engineering vision systems to highlight the differences in how algorithms perceive images.
  • 🤖 The process of creating White's prints involves a drawing system and a machine vision classifier, with the image being tweaked and reclassified multiple times to reflect the algorithm's interpretation.
  • 🧐 The prints challenge human perception, as staff at The Verge were asked to guess the objects represented, demonstrating the gap between human and algorithmic understanding.
  • 📚 Machine learning programs are trained on specific data sets, which can lead to issues when encountering new or unexpected data, such as a penguin in a zoo of known animals.
  • 🔍 The limitations of AI are described as narrow or brittle, with systems only working well in limited scenarios and often breaking down when faced with unfamiliar data.
  • 🚦 The reliance on machine vision in critical applications like self-driving cars is highlighted, where the ability to correctly identify objects is crucial for safety.
  • ❗ The script mentions the first fatal crash involving Tesla's self-driving system, which was partly due to the algorithm's failure to distinguish between a white tractor trailer and the sky.
  • 🤖 Despite the limitations, there is ongoing work to improve machine learning algorithms, with humans often involved in the decision-making process to address shortcomings.

Q & A

  • What do the pictures in the video represent to a computer?

    -The pictures are designed to be recognized by machine vision algorithms, which can see objects like a shark, binoculars, or explicit nudity where humans might only see random arrangements of lines and blobs.

  • What are some everyday applications of machine vision algorithms?

    -Machine vision algorithms are used in self-driving cars, internet content monitoring, and phone unlocking, among other applications.

  • What is the history of computer vision in relation to artificial intelligence?

    -The science of teaching computers to see dates back to the 1960s, coinciding with the creation of the field of artificial intelligence. Early systems were basic, but recent advancements in AI, particularly deep learning, have led to more sophisticated vision systems.

  • How do deep learning vision algorithms outperform humans in certain tasks?

    -Deep learning has enabled the creation of vision algorithms that can identify cancerous tumors more accurately than doctors or distinguish between various dog breeds in milliseconds.

  • Who is Tom White, and how does his work relate to the script's theme?

    -Tom White is an academic and artist from New Zealand who created bizarre prints by reverse engineering vision systems like those used by Google and Amazon. His work demonstrates the differences in how AI and humans perceive the world.

  • How does the process of generating Tom White's prints work?

    -The prints are generated using a production line of algorithmic programs. A drawing system creates abstract lines, which are then fed into a machine vision classifier that guesses the object. The drawing system tweaks the image based on the classifier's guesses and repeats the process. A short code sketch of this loop appears after this Q&A section.

  • What was the purpose of asking Verge staff to guess the objects represented in Tom White's prints?

    -The purpose was to see if people could think like a computer and understand how the machine vision algorithms interpret the abstract images.

  • What are some limitations of machine learning algorithms when it comes to recognizing patterns?

    -Machine learning algorithms may not understand the world beyond the data they are trained on and may make decisions based on patterns that do not make sense in real-world scenarios, such as identifying all striped animals as zebras.

  • How does the script illustrate the difference between human and machine vision?

    -The script uses Tom White's art and a Pictionary game with a human algorithm to show that while humans may struggle to interpret machine vision, machines also have difficulty understanding human interpretations of abstract images.

  • What implications does the difference between human and machine vision have for technologies like self-driving cars?

    -The difference in vision can be critical for technologies like self-driving cars, where the ability to correctly identify objects such as pedestrians and stop signs can be a matter of life and death.

  • How do machine learning engineers address the shortcomings of vision algorithms?

    -Machine learning engineers are aware of these shortcomings and often have humans in the loop to make decisions, ensuring that algorithms are not solely relied upon in critical applications.

  • What is Tom White's perspective on the limitations of machine vision algorithms?

    -Tom White finds it refreshing and comforting that computers still struggle with simple tasks like counting the number of wheels on a tricycle, suggesting that we should be thankful for these limitations.
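
To make the tweak-and-reclassify loop described above concrete, here is a minimal Python sketch of that kind of feedback loop. It is an illustration, not White's actual code: render_strokes and classifier_score are hypothetical placeholders for his drawing system and for the classifiers he queried, and the random hill-climbing step stands in for whatever tweaking strategy his system really uses.

```python
import random

def render_strokes(strokes):
    # Placeholder for the drawing system: rasterize stroke parameters
    # into an image. Here it just passes the parameters through.
    return strokes

def classifier_score(image, target="starfish"):
    # Placeholder for a machine vision classifier's confidence in
    # `target`. A real system would query a trained network; this toy
    # scorer just peaks at an arbitrary point so the loop can run.
    return -sum((x - 0.5) ** 2 for x in image)

strokes = [random.random() for _ in range(16)]   # initial abstract lines
best = classifier_score(render_strokes(strokes))

for _ in range(10_000):
    candidate = list(strokes)
    i = random.randrange(len(candidate))
    candidate[i] += random.gauss(0, 0.05)        # tweak one stroke
    score = classifier_score(render_strokes(candidate))
    if score > best:                             # keep what the classifier rewards
        strokes, best = candidate, score

print(f"confidence proxy after search: {best:.4f}")
```

The property that matters is the one the Q&A describes: the final image is shaped entirely by what the classifier rewards, with no human judgment in the loop.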

Outlines

00:00

🤖 The Dichotomy of Human and Machine Vision

This paragraph delves into the differences between human and machine vision, highlighting how machine vision algorithms interpret images differently from humans. It introduces the concept of machine vision, which is increasingly integrated into everyday technologies like self-driving cars and smartphone unlocking. The script mentions the evolution of computer vision from basic systems to sophisticated AI-driven systems capable of outperforming humans in certain tasks. It also introduces Tom White, a New Zealand academic and artist, who created art pieces by reverse engineering vision systems to demonstrate the peculiar way AI perceives images. The paragraph concludes with a human experiment where participants attempt to interpret the AI-generated images, showcasing the gap in understanding between human and machine perception.

05:01

🔍 Exploring the Limits and Abstractions of AI Vision

The second paragraph explores the limitations and abstract nature of AI vision through the lens of Tom White's artwork and a practical example of training a machine learning program. It discusses how AI systems are trained on specific data sets and can struggle with unfamiliar objects or scenarios, such as a penguin arriving at a zoo when the training data covered only zebras, lions, and giraffes. The paragraph also touches on the brittleness of AI systems, which can fail when encountering unexpected data. It uses Tom's art to illustrate this point, showing how an algorithm can blur an instrument and its player together, confusing a cello with a cellist. The script ends with a human-algorithm game of Pictionary, emphasizing the stark contrast between human and machine interpretation of visual data, and raises concerns about the increasing reliance on machine vision in critical areas such as self-driving cars and surveillance systems.

Keywords

💡Machine Vision Algorithms

Machine vision algorithms are computer programs designed to interpret and make decisions based on visual input from the world, much like the human visual system. In the context of the video, these algorithms are used to recognize objects in images that may appear as random arrangements of lines and blobs to humans. The video highlights how these algorithms interpret images differently, which can lead to unexpected results and is central to the theme of how AI perceives the world differently from humans.
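
As a concrete illustration of what such an algorithm does, the sketch below runs an off-the-shelf ImageNet classifier over a single image. It assumes torchvision 0.13+ and an image file at the hypothetical path "print.png"; the systems in the video are commercial services, but the classify-one-image step looks much the same.

```python
import torch
from PIL import Image
from torchvision.models import resnet50, ResNet50_Weights

weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights).eval()   # pretrained ImageNet classifier
preprocess = weights.transforms()          # matching resize/crop/normalize

image = Image.open("print.png").convert("RGB")  # hypothetical input file
batch = preprocess(image).unsqueeze(0)          # shape: [1, 3, 224, 224]

with torch.no_grad():
    probs = model(batch).softmax(dim=1)[0]

top = probs.argmax().item()
print(weights.meta["categories"][top], f"{probs[top]:.1%}")
```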

💡Deep Learning

Deep learning is a subset of machine learning that uses artificial neural networks with multiple layers to learn and make decisions based on large amounts of data. The video mentions that deep learning has revolutionized the field of computer vision, enabling the creation of sophisticated vision systems that can outperform humans in certain tasks. This concept is integral to the video's exploration of AI's capabilities and limitations in visual recognition.
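
For a sense of what "multiple layers" means in practice, here is a toy convolutional classifier in PyTorch. It is far smaller than the networks the video alludes to, and the layer sizes and 100-class output (say, 100 dog breeds) are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

# Each stacked stage can learn progressively more abstract features:
# roughly edges, then textures, then object parts.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                        # 224x224 -> 112x112
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                        # 112x112 -> 56x56
    nn.Flatten(),
    nn.Linear(32 * 56 * 56, 100),           # scores for 100 classes
)

logits = model(torch.rand(1, 3, 224, 224))  # one fake RGB image
print(logits.shape)                         # torch.Size([1, 100])
```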

💡Classifier Networks

Classifier networks are a type of machine learning model used to categorize data into different classes or groups. In the video, Tom White uses classifier networks to invert the recognition process and create art that reflects how these algorithms see the world. The video uses classifier networks to demonstrate the peculiarities of AI's understanding and the resulting artwork to illustrate the gap between human and machine perception.
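
One standard way to "invert that process" is activation maximization: freeze the network and use gradients to adjust the input image until a chosen class's score rises. This is a generic sketch of the idea, not White's method (his system draws with strokes rather than raw pixels), and it assumes "tricycle" appears in the pretrained model's label set.

```python
import torch
from torchvision.models import resnet50, ResNet50_Weights

weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights).eval()
for p in model.parameters():
    p.requires_grad_(False)              # optimize the image, not the network

target = weights.meta["categories"].index("tricycle")
image = torch.rand(1, 3, 224, 224, requires_grad=True)
optimizer = torch.optim.Adam([image], lr=0.05)

for _ in range(200):
    optimizer.zero_grad()
    score = model(image)[0, target]      # the classifier's logit for "tricycle"
    (-score).backward()                  # ascend: maximize the score
    optimizer.step()
    with torch.no_grad():
        image.clamp_(0, 1)               # keep pixel values valid
```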

💡Artificial Intelligence (AI)

Artificial intelligence refers to the simulation of human intelligence in machines that are programmed to think like humans and mimic their actions. The video's theme revolves around AI's development, its application in computer vision, and the challenges it faces in interpreting the world as humans do. AI's role in everyday life, such as in self-driving cars and phone unlocking, is also discussed, emphasizing the importance of its accurate perception.

💡Tom White

Tom White is a New Zealand-based academic and artist featured in the video. He is known for creating art by reverse engineering vision systems, which helps to visualize and understand how AI perceives images. His work serves as a central example in the video to illustrate the differences between human and machine vision.

💡Vision Systems

Vision systems in the context of the video refer to the technological infrastructure that enables machines to interpret visual data. These systems are crucial for applications like self-driving cars, content monitoring on the internet, and phone security. The video discusses the evolution of these systems from basic to sophisticated, capable of tasks that can surpass human capabilities in speed and accuracy.

💡Self-Driving Cars

Self-driving cars, also known as autonomous vehicles, are a prominent application of machine vision algorithms. The video points out the reliance of these vehicles on the computer's ability to accurately perceive and interpret visual data, which is critical for safe navigation. The video also mentions a fatal crash involving a self-driving car due to an algorithm's failure to distinguish between a white tractor trailer and the sky, underscoring the potential risks of relying on machine vision.

💡Human-in-the-Loop

Human-in-the-loop is a concept where human oversight is integrated into AI systems to ensure that decisions made by the algorithms are accurate and appropriate. The video mentions that despite the advancements in AI, most algorithms still have some form of human involvement to address the limitations of machine vision and decision-making.
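
A minimal sketch of the pattern, assuming a classifier that returns a label with a confidence score; the 0.90 threshold and the function names are illustrative, not drawn from any system described in the video.

```python
CONFIDENCE_THRESHOLD = 0.90  # below this, a person reviews the call

def handle_frame(frame, classify, ask_human):
    """Route each decision: automate when confident, defer when not."""
    label, confidence = classify(frame)
    if confidence >= CONFIDENCE_THRESHOLD:
        return label                              # automated decision
    return ask_human(frame, suggestion=label)     # human makes the call
```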

💡Narrow AI

Narrow AI, also known as weak AI, refers to AI systems that are designed and trained for a specific task or narrow set of tasks. The video describes AI as narrow or brittle, indicating that these systems work well within their limited scope but may fail when faced with unexpected data or situations outside their training.
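
The brittleness has a simple mechanical cause: a classifier trained on a fixed label set must pick from that set, however poor the fit. The sketch below shows this with made-up scores for a penguin photo presented to a model that only knows zebras, lions, and giraffes.

```python
import math

LABELS = ["zebra", "lion", "giraffe"]       # everything this model knows

def softmax(logits):
    exps = [math.exp(x) for x in logits]
    return [e / sum(exps) for e in exps]

# Hypothetical raw scores for a penguin photo: nothing fits well,
# but the output layer offers only three choices.
penguin_logits = [0.3, 1.1, 0.2]
probs = softmax(penguin_logits)

best = max(range(len(LABELS)), key=probs.__getitem__)
print(LABELS[best], f"{probs[best]:.0%}")   # -> lion 54%: confidently wrong
```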

💡Pictionary

Pictionary is a game where players draw clues for their teammates to guess a word or phrase. In the video, Pictionary frames the comparison in both directions: humans try to guess the objects in the machine-generated prints, and a human-drawn sketch is then fed to an algorithm that tries to guess what it depicts.

💡CCTV Cameras

CCTV cameras are closed-circuit television systems used for surveillance and security purposes. The video mentions the application of AI algorithms in CCTV cameras as an example of where machine vision is being integrated into everyday life, raising concerns about the potential for misinterpretation and the importance of accurate AI perception.

Highlights

Pictures are designed to be recognized by machine vision algorithms, which see objects differently than humans.

Machine vision is increasingly used in everyday life, including self-driving cars, internet content monitoring, and phone unlocking.

The science of teaching computers to see dates back to the 1960s, coinciding with the creation of artificial intelligence.

Deep learning has revolutionized computer vision, enabling systems to outperform humans in certain tasks.

Deep learning has been used to create algorithms that can identify cancerous tumors and differentiate between dog breeds.

AI does not perceive the world as humans do, as demonstrated by Tom White's prints, which machines recognize readily but humans struggle to interpret.

Tom White reverse engineered vision systems to create prints that challenge our understanding of how algorithms see the world.

The prints are generated through an iterative process involving a drawing system and machine vision classifier.

Humans were asked to guess objects in the prints, highlighting the gap between human and computer perception.

AI systems are described as narrow or brittle, only working well in limited scenarios and breaking down with unexpected data.

Tom's art reveals AI's limitations, such as blending a cello with the musician due to lack of understanding.

Vision algorithms struggle with tasks like counting, as seen with the tricycle example.

The differences between human and machine vision were explored through a Pictionary game with an algorithm.

Machine vision's increasing role in our lives, such as in self-driving cars and CCTV cameras, raises concerns about its limitations.

The first fatal Tesla self-driving crash was partly due to an algorithm failing to distinguish a white tractor trailer from the sky.

Machine learning engineers are aware of AI's shortcomings and often include humans in the decision-making process.

Tom White finds comfort in the simplicity of these models, suggesting it's good to know their limitations and the different ways they perceive the world.

The stories people create while trying to interpret the prints reflect the subjective nature of human perception.

Transcripts

(gentle music)

- What do you see in these pictures? Objects, faces, or do they just look like random arrangements of lines and blobs? If you don't see anything specific it's probably because you're not a computer. These pictures were specially designed to be recognized by machine vision algorithms. You see blobs; they see a shark, binoculars, explicit nudity. But the algorithms that see these things are the same ones being used in more and more parts of everyday life. They steer self-driving cars, they monitor content on the internet, and they even unlock your phone. And the fact that they don't see the world the same way you do could be a problem.

(cosmic music)

The science of teaching computers to see goes back to the 1960s, and it coincides with the creation of the field of artificial intelligence. Early computer vision systems were very basic. They could process only the simplest versions of 3D scenes, rendering the world in crude shapes and planes. But in recent years a revolution in AI, particularly deep learning, has created sophisticated vision systems which can outperform humans at a number of tasks. To date we've used deep learning to create vision algorithms that can identify cancerous tumors better than a doctor, or tell the difference between a hundred different dog breeds in a millisecond. Or they can just tell you whether the food on your plate is a hotdog or not. Okay, most humans can do that too.

But for all these achievements, AI doesn't look at the world the same way humans do, and that's what these bizarre prints are meant to demonstrate. They're the work of Tom White, an academic and an artist from New Zealand. He made them by essentially reverse engineering a number of vision systems like those used by Google and Amazon.

- [Tom] So I started looking at these classifier networks which knew how to classify or how to understand an image, and I was wondering if I can invert that process.

- [James] The prints are generated using a production line of algorithmic programs. First, a drawing system generates some abstract lines, and the image is fed into a machine vision classifier, which then tries to guess what object it might be. Based on the classifier's guesses, the drawing system then tweaks the image and feeds it through again.

- [Tom] I mostly take a hands-off approach because I really want to know how the algorithms see the world. And so after I set the systems up I kind of sit back and let it run for a long time and see what comes out.

- [James] But if that's an algorithm's view, what does it look like to humans? We asked Verge staff to guess which object each print represented. In essence, we asked people: can you think like a computer?

- Okay, you want me to say what I think that is?

- Like I'm just imagining it's under water and they're a bunch of fish or bugs.

- This one just looks like Hot Topic.

- A car?

- [Tom] It's a starfish.

- It's a starfish, great.

- A train on train tracks, subway.

- This is someone getting pushed in front of a train.

- It definitely looks like somebody's conducting, with the wave motions.

- Like something on a stove?

- [Tom] It's a spider.

- Oh, for real? (chuckles) Okay.

- This is a dancing elephant.

- Like the wheel of a boat.

- An elephant.

- This was a whale. Is that a whale? Did people do good on this? Like, am I the, okay.

- [James] Tom's work plays with the limits of AI, but in an abstract way. So here's a more concrete example. Let's say you're a computer scientist and you're training a machine learning program to recognize animals at the zoo. You would collect a bunch of pictures of zebras, lions, giraffes, et cetera, and you'd feed them into this algorithm, which would then look for patterns in this data. So after studying the pictures it might conclude that if it sees something with a long neck, boom, giraffe. If it sees stripes, that's a zebra, and so on.

There are problems with this approach, though. First, the algorithm doesn't know anything about the world beyond that data. So if you get a new addition to the zoo, like a penguin, it's not gonna know what that is. The second problem is that the way it makes decisions might not be very sensible. If it decides that all animals with stripes are zebras, for example, what happens when it sees a tiger? This is why researchers often describe AI as narrow or brittle. These systems only work in very limited scenarios, and when they come across unexpected data they often break down.

This becomes really clear when you look back at Tom's art. For example, look at this red print here. (cello music) The object it's supposed to represent is a cello. You can see the curves of the body of the instrument and the vertical lines with strings. But there's also this other shape hovering behind it in light red. That's the person playing the cello. Now, the reason this figure appears is because the pictures that we used to train the algorithm included the cellos and the cellists too. But because the program has no understanding of what an instrument or a musician is, it just blurs the two together.

Or there's Tom's tricycle.

- [Tom] When I see a tricycle the first thing I think about are the wheels, and maybe in the back of my head I count them.

- [James] But vision algorithms are terrible at counting, so the number of wheels is no help. They lock onto the shape of the frame, a triangle, and think that that represents the essence of a tricycle at its best.

- [Tom] To me, this shape that it comes up with doesn't remind me of a tricycle. It looks like a bunch of lines, but I guess I gain an appreciation for the different ways of viewing things. It's almost like visiting another culture that has different ways of interpreting or relating to objects.

- [James] To drive home the differences between human and machine vision, we decided to turn the tables on the algorithms. So we took the same objects in Tom's art and abstracted them using a human algorithm. We made the algorithm play Pictionary against us.

- So I'm drawing a spider; weirdly, it kind of intimidates me. See what the computer thinks. Well, they said it was an invertebrate, but I actually can't remember now if spiders are invertebrates or not.

- The cello says text, black and white drawing, wing, joint. The cellos are more complex, so it didn't get that at all, like anywhere near that.

- [James] But why does this matter? Well, because despite the limitations of machine vision, we're trusting it with more and more aspects of our lives. Take self-driving cars, for example. In the future, the idea is that they'll rely totally on what computers can and cannot see, no humans needed. So teaching a machine to spot the difference between a pedestrian and a stop sign will literally be a life or death matter. The first fatal crash involving Tesla's self-driving system, for example, was partly caused by algorithms which couldn't distinguish between the side of a white tractor trailer and the bright sky behind it. And when you think about the other places we're starting to use these algorithms, in CCTV cameras, in military drones, it becomes worrying.

Now, this doesn't mean we're building completely broken systems. Machine learning engineers are aware of these shortcomings, and most algorithms, like the ones we described, still have humans in the loop somewhere making decisions. And Tom, he takes solace in the fact that the algorithms aren't any smarter than this. He suggests that we should be thankful that a computer still struggles to count the number of wheels on a tricycle.

- [Tom] I think it's kind of refreshing to see, even when they have very simple models of the world; in a way that's comforting. It's good to know how these things work. It sometimes can give us insight into different ways that we can see the world.

- [Lady] In this one I have birds, like it's a bird. It's a bird maybe being crushed in someone's fist. It's like a bird.

- [Tom] It's a butterfly.

- It's a but, yeah, okay.

- [Tom] I like the stories.

- I see it.

- [Tom] A bird being crushed in someone's fist?

Rate This

5.0 / 5 (0 votes)

Related Tags
AI VisionMachine LearningArtificial IntelligenceComputer VisionDeep LearningAlgorithm ArtHuman PerceptionTech InnovationVision SystemsAI Limitations