What the world looks like to an algorithm
Summary
TLDR: The video explores the fascinating yet sometimes flawed world of machine vision algorithms, which are increasingly integrated into daily life from unlocking phones to driving cars. It delves into the science behind teaching computers to 'see', highlighting the advancements in deep learning that have enabled AI to outperform humans in certain visual tasks. However, the script also points out the limitations of these systems, such as their narrow understanding and potential for error, using the intriguing artwork of Tom White to illustrate the stark differences between human and machine perception.
Takeaways
- 👀 Machine vision algorithms see the world differently from humans, identifying objects like sharks or binoculars in abstract images where humans see only random arrangements.
- 🚗 These algorithms are increasingly used in everyday life, from self-driving cars to content monitoring on the internet and unlocking smartphones.
- 🔬 The science of computer vision dates back to the 1960s and has evolved significantly with the advent of AI and deep learning, leading to systems that can outperform humans in certain tasks.
- 🏥 Deep learning has been used to create algorithms capable of identifying cancerous tumors more accurately than doctors and distinguishing between various dog breeds almost instantly.
- 🎨 The script discusses the work of Tom White, an academic and artist, who created abstract prints by reverse engineering vision systems to highlight the differences in how algorithms perceive images.
- 🤖 The process of creating White's prints involves a drawing system and a machine vision classifier, with the image being tweaked and reclassified multiple times to reflect the algorithm's interpretation.
- 🧐 The prints challenge human perception, as staff at The Verge were asked to guess the objects represented, demonstrating the gap between human and algorithmic understanding.
- 📚 Machine learning programs are trained on specific data sets, which can lead to issues when encountering new or unexpected data, such as a penguin in a zoo of known animals.
- 🔍 The limitations of AI are described as narrow or brittle, with systems only working well in limited scenarios and often breaking down when faced with unfamiliar data.
- 🚦 The reliance on machine vision in critical applications like self-driving cars is highlighted, where the ability to correctly identify objects is crucial for safety.
- ❗ The script mentions the first fatal crash involving Tesla's self-driving system, which was partly due to the algorithm's failure to distinguish between a white tractor trailer and the sky.
- 🤖 Despite the limitations, there is ongoing work to improve machine learning algorithms, with humans often involved in the decision-making process to address shortcomings.
Q & A
What do the pictures in the video represent to a computer?
-The pictures are designed to be recognized by machine vision algorithms, which can see objects like a shark, binoculars, or explicit nudity where humans might only see random arrangements of lines and blobs.
What are some everyday applications of machine vision algorithms?
-Machine vision algorithms are used in self-driving cars, internet content monitoring, and phone unlocking, among other applications.
What is the history of computer vision in relation to artificial intelligence?
-The science of teaching computers to see dates back to the 1960s, coinciding with the creation of the field of artificial intelligence. Early systems were basic, but recent advancements in AI, particularly deep learning, have led to more sophisticated vision systems.
How do deep learning vision algorithms outperform humans in certain tasks?
-Deep learning has enabled the creation of vision algorithms that can identify cancerous tumors more accurately than doctors or distinguish between various dog breeds in milliseconds.
Who is Tom White, and how does his work relate to the script's theme?
-Tom White is an academic and artist from New Zealand who created bizarre prints by reverse engineering vision systems like those used by Google and Amazon. His work demonstrates the differences in how AI and humans perceive the world.
How does the process of generating Tom White's prints work?
-The prints are generated using a production line of algorithmic programs. A drawing system creates abstract lines, which are then fed into a machine vision classifier that guesses the object. The drawing system tweaks the image based on the classifier's guesses and repeats the process.
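The iterative loop described in this answer can be sketched in code. This is a minimal illustration under invented assumptions, not Tom White's actual system: `classifier_confidence` is a made-up stand-in for a real vision classifier, and the drawing system is reduced to random line segments tweaked by hill climbing.

```python
import random

def classifier_confidence(strokes, target):
    # Stand-in for a real machine vision classifier (hypothetical):
    # it rewards images whose total stroke length happens to sit near
    # a target-specific value. A real system would score pixel data.
    goal = sum(ord(c) for c in target) % 50 + 25
    total = sum(abs(x2 - x1) + abs(y2 - y1) for (x1, y1, x2, y2) in strokes)
    return 1.0 / (1.0 + abs(total - goal))

def generate_print(target, steps=2000, seed=0):
    rng = random.Random(seed)
    # The "drawing system": start from random abstract line segments.
    strokes = [tuple(rng.uniform(0, 10) for _ in range(4)) for _ in range(8)]
    score = classifier_confidence(strokes, target)
    for _ in range(steps):
        candidate = list(strokes)
        i = rng.randrange(len(candidate))
        # Tweak one stroke slightly, then re-run the classifier.
        candidate[i] = tuple(v + rng.uniform(-0.5, 0.5) for v in candidate[i])
        new_score = classifier_confidence(candidate, target)
        if new_score >= score:  # keep only tweaks the classifier prefers
            strokes, score = candidate, new_score
    return strokes, score

strokes, score = generate_print("starfish")
print(f"final classifier confidence: {score:.3f}")
```

The "hands-off" quality White describes comes from the acceptance rule: the human sets up the loop, but only the classifier's score decides which tweaks survive.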
What was the purpose of asking Verge staff to guess the objects represented in Tom White's prints?
-The purpose was to see if people could think like a computer and understand how the machine vision algorithms interpret the abstract images.
What are some limitations of machine learning algorithms when it comes to recognizing patterns?
-Machine learning algorithms may not understand the world beyond the data they are trained on and may make decisions based on patterns that do not make sense in real-world scenarios, such as identifying all striped animals as zebras.
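The failure mode in this answer can be shown with a toy sketch. Everything here is invented for illustration (the animal features and the `train`/`predict` helpers are hypothetical, not a real training pipeline): a "classifier" that latches onto single features works on its training zoo, but mislabels a tiger as a zebra and draws a blank on a penguin.

```python
def train(examples):
    # "Training" just memorizes the first label seen for each feature,
    # mimicking an algorithm that latches onto one pattern per class.
    rules = {}
    for features, label in examples:
        for f in features:
            rules.setdefault(f, label)
    return rules

def predict(rules, features):
    # Return the label of the first feature with a memorized rule.
    for f in features:
        if f in rules:
            return rules[f]
    return "unknown"

zoo = [
    (["long neck", "spots"], "giraffe"),
    (["stripes", "hooves"], "zebra"),
    (["mane", "claws"], "lion"),
]
rules = train(zoo)

print(predict(rules, ["stripes", "hooves"]))  # zebra: matches training data
print(predict(rules, ["stripes", "claws"]))   # a tiger also comes back "zebra"
print(predict(rules, ["flippers", "beak"]))   # a penguin: "unknown"
```

The tiger case is the "all striped animals are zebras" problem from the answer; the penguin case is the new-arrival problem, where the system has no concept outside its training data.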
How does the script illustrate the difference between human and machine vision?
-The script uses Tom White's art and a Pictionary game with a human algorithm to show that while humans may struggle to interpret machine vision, machines also have difficulty understanding human interpretations of abstract images.
What implications does the difference between human and machine vision have for technologies like self-driving cars?
-The difference in vision can be critical for technologies like self-driving cars, where the ability to correctly identify objects such as pedestrians and stop signs can be a matter of life and death.
How do machine learning engineers address the shortcomings of vision algorithms?
-Machine learning engineers are aware of these shortcomings and often have humans in the loop to make decisions, ensuring that algorithms are not solely relied upon in critical applications.
What is Tom White's perspective on the limitations of machine vision algorithms?
-Tom White finds it refreshing and comforting that computers still struggle with simple tasks like counting the number of wheels on a tricycle, suggesting that we should be thankful for these limitations.
Outlines
🤖 The Dichotomy of Human and Machine Vision
This paragraph delves into the differences between human and machine vision, highlighting how machine vision algorithms interpret images differently from humans. It introduces the concept of machine vision, which is increasingly integrated into everyday technologies like self-driving cars and smartphone unlocking. The script mentions the evolution of computer vision from basic systems to sophisticated AI-driven systems capable of outperforming humans in certain tasks. It also introduces Tom White, a New Zealand academic and artist, who created art pieces by reverse engineering vision systems to demonstrate the peculiar way AI perceives images. The paragraph concludes with a human experiment where participants attempt to interpret the AI-generated images, showcasing the gap in understanding between human and machine perception.
🔍 Exploring the Limits and Abstractions of AI Vision
The second paragraph explores the limitations and abstract nature of AI vision through the lens of Tom White's artwork and a practical example of training a machine learning program. It discusses how AI systems are trained on specific data sets and can struggle with unfamiliar objects or scenarios, such as a penguin in a zoo filled with zebras and giraffes. The paragraph also touches on the brittleness of AI systems, which can fail when encountering unexpected data. It uses Tom's art to illustrate this point, showing how an algorithm might confuse a cello with a cellist or blur the lines between an instrument and its player. The script ends with a human-algorithm game of Pictionary, emphasizing the stark contrast between human and machine interpretation of visual data, and raises concerns about the increasing reliance on machine vision in critical areas such as self-driving cars and surveillance systems.
Mindmap
Keywords
💡Machine Vision Algorithms
💡Deep Learning
💡Classifier Networks
💡Artificial Intelligence (AI)
💡Tom White
💡Vision Systems
💡Self-Driving Cars
💡Human-in-the-Loop
💡Narrow AI
💡Pictionary
💡CCTV Cameras
Highlights
Pictures are designed to be recognized by machine vision algorithms, which see objects differently than humans.
Machine vision is increasingly used in everyday life, including self-driving cars, internet content monitoring, and phone unlocking.
The science of teaching computers to see dates back to the 1960s, coinciding with the creation of artificial intelligence.
Deep learning has revolutionized computer vision, enabling systems to outperform humans in certain tasks.
Deep learning has been used to create algorithms that can identify cancerous tumors and differentiate between dog breeds.
AI does not perceive the world as humans do, as demonstrated by Tom White's prints that confuse human and machine vision.
Tom White reverse engineered vision systems to create prints that challenge our understanding of how algorithms see the world.
The prints are generated through an iterative process involving a drawing system and machine vision classifier.
Humans were asked to guess objects in the prints, highlighting the gap between human and computer perception.
AI systems are described as narrow or brittle, only working well in limited scenarios and breaking down with unexpected data.
Tom's art reveals AI's limitations, such as blending a cello with the musician due to lack of understanding.
Vision algorithms struggle with tasks like counting, as seen with the tricycle example.
The differences between human and machine vision were explored through a Pictionary game with an algorithm.
Machine vision's increasing role in our lives, such as in self-driving cars and CCTV cameras, raises concerns about its limitations.
The first fatal Tesla self-driving crash was partly due to an algorithm failing to distinguish a white tractor trailer from the sky.
Machine learning engineers are aware of AI's shortcomings and often include humans in the decision-making process.
Tom White finds comfort in AI's simplicity, suggesting it's good to know their limitations and how they perceive the world differently.
The stories people create while trying to interpret the prints reflect the subjective nature of human perception.
Transcripts
(gentle music)
- What do you see in these pictures?
Objects, faces, or do they just look
like random arrangements of lines and blobs?
If you don't see anything specific it's probably
because you're not a computer.
These pictures were specially designed
to be recognized by machine vision algorithms.
You see blobs, they see a shark,
binoculars, explicit nudity.
But the algorithms that see these things
are the same ones being used
in more and more parts of everyday life.
They steer self-driving cars, they monitor content
on the internet and they even unlock your phone.
And the fact that they don't see the world the same way
you do could be a problem.
(cosmic music)
The science of teaching computers to see goes back
to the 1960s and it coincides with the creation
of the field of artificial intelligence.
Early computer vision systems were very basic.
They could process only the simplest versions of 3D scenes
rendering the world in crude shapes and planes.
But in recent years a revolution in AI,
particularly deep learning, has created sophisticated
vision systems which can outperform humans
at a number of tasks.
To date we've used deep learning to create vision
algorithms that can identify cancerous tumors
better than a doctor or tell the difference
between a hundred different dog breeds in a millisecond.
Or they can just tell you whether the food
on your plate is a hotdog or not.
Okay, most humans can do that too.
But for all these achievements AI doesn't look
at the world the same way humans do
and that's what these bizarre prints
are meant to demonstrate.
They're the work of Tom White, an academic
and an artist from New Zealand.
He made them by essentially reverse engineering
a number of vision systems like those used
by Google and Amazon.
- So I started looking at these classifier networks
which knew how to classify or how to understand
an image and I was wondering if I can invert that process.
- [James] The prints are generated using a production line
of algorithmic programs.
First, a drawing system generates some abstract lines
and the image is fed into a machine vision classifier
which then tries to guess what object it might be.
Based on the classifier's guesses the drawing system
then tweaks the image and feeds it through again.
- I mostly take a hands-off approach because I really want
to know how the algorithms see the world.
And so after I set the systems up I kind of sit back
and let it run for a long time and see what comes out.
- [James] But if that's an algorithm's view
what does it look like to humans?
We asked Verge staff to guess which object
each print represented.
In essence we asked people can you think like a computer?
- Okay, you want me to say what I think that is?
- Like I'm just imagining like it's under water
and they're a bunch of fish or bugs.
- This one just looks like Hot Topic.
- A car?
- [Tom] It's a starfish.
- It's a starfish, great.
- A train on train tracks, subway.
- This is someone getting pushed in front of a train.
- It definitely looks like somebody's conducting
with the wave motions.
- Like something on a stove?
- [Tom] It's a spider.
- Oh, for real?
- (chuckles) Okay.
- This is a dancing elephant.
- Like the wheel of a boat.
- An elephant.
- This was a whale.
Is that a whale?
Did people do good on this?
Like am I the, okay.
- Tom's work plays with the limits of AI
but in an abstract way.
So here's a more concrete example.
Let's say you're a computer scientist
and you're training a machine learning program
to recognize animals at the zoo.
You would collect a bunch of pictures of zebras, lions,
giraffes, et cetera, and you'd feed them into this algorithm
which would then look for patterns in this data.
So after studying the pictures it might conclude
that if it sees something with a long neck, boom, giraffe.
If it sees stripes that's a zebra and so on.
There are problems with this approach though.
First, the algorithm doesn't know anything
about the world beyond that data.
So if you get a new addition to the zoo,
like a penguin, it's not gonna know what that is.
The second problem is that the way it makes decisions
might not be very sensible.
If it decides that all animals with stripes are zebras,
for example, what happens when it sees a tiger?
This is why researchers often describe AI
as narrow or brittle.
These systems only work in very limited scenarios
and when they come across unexpected data
they often break down.
This becomes really clear when you look back at Tom's art.
For example, look at this red print here (cello music).
The object it's supposed to represent is a cello.
You can see the curves of the body of the instrument
and the vertical lines with strings.
But there's also this other shape hovering
behind it in light red.
That's the person playing the cello.
Now the reason this figure appears is because the pictures
that we used to train the algorithm
included the cellos and the cellists too.
But because the program has no understanding
of what an instrument or a musician is
it just blurs the two together.
Or there's Tom's tricycle.
- [Tom] When I see a tricycle the first thing I think
about are the wheels and maybe in the back
of my head I count them.
- [James] But vision algorithms are terrible at counting
so the number of wheels is no help.
They lock onto the shape of the frame, a triangle,
and think that that represents the essence
of a tricycle at its best.
- To me this shape that it comes up with
doesn't remind me of a tricycle.
It looks like a bunch of lines but I guess I gain
an appreciation for the different ways of viewing things.
It's almost like visiting another culture
that has different ways of interpreting
or relating to objects.
- [James] To drive home the differences between human
and machine vision we decided to turn the tables
on the algorithms.
So we took the same objects in Tom's art
and abstracted them using a human algorithm.
We made the algorithm play Pictionary against us.
- So I'm drawing a spider, weirdly kind of intimidates me.
See what the computer thinks.
Well they said it was an invertebrate but I actually
can't remember now if spiders are invertebrates or not.
- The cello says text, black and white drawing, wing, joint.
The cellos are more complex so it didn't get that at all,
like anywhere near that.
- But why does this matter?
Well because despite the limitations of machine vision
we're trusting it with more and more aspects of our lives.
Take self-driving cars, for example.
In the future, the idea is that they'll rely totally
on what computers can and cannot see, no humans needed.
So teaching a machine to spot the difference
between a pedestrian and a stop sign will literally
be a life or death matter.
The first fatal crash involving Tesla's self-driving
system, for example, was partly caused
by algorithms which couldn't distinguish
between the side of a white tractor trailer
and the bright sky behind it.
And when you think about the other places we're starting
to use these algorithms in CCTV cameras,
in military drones, it becomes worrying.
Now this doesn't mean we're building
completely broken systems.
Machine learning engineers are aware of these shortcomings
and most algorithms, like the ones we described,
still have humans in the loop somewhere making decisions.
And Tom, he takes solace in the fact that the algorithms
aren't any smarter than this.
He suggests that we should be thankful that a computer
still struggles to count the numbers
of wheels on a tricycle.
- I think it's kind of refreshing to see
even when they have very simple models of the world,
in a way that's comforting.
It's good to know how these things work.
It sometimes can give us insight into different ways
that we can see the world.
- [Lady] In this one I have birds, like it's a bird.
It's a bird maybe being crushed in someone's fist.
It's like a bird.
- [Tom] It's a butterfly.
- It's a but, yeah okay.
- [Tom] I like the stories. - I see it.
- [Tom] A bird being crushed in someone's fist?