Why it's harder for AI to open doors than play chess | Pulkit Agrawal | TEDxMIT
Summary
TLDR: In this talk, the presenter explores the paradox of intelligence, highlighting that tasks humans find easy, like opening doors, are challenging for robots, while complex cognitive tasks like chess are relatively easy for AI. The speaker discusses Moravec's paradox, which explains why physical intelligence is undervalued compared to cognitive intelligence. They argue for the importance of developing physical intelligence in AI, showcasing simulations and real-world applications of robots learning to walk on various terrains and manipulate objects. The talk emphasizes the need for a holistic approach, including hardware and perception, to achieve true artificial intelligence.
Takeaways
- 🤖 The speaker discusses the paradox that tasks requiring physical intelligence, like opening a door, are more challenging for robots than intellectual tasks like playing chess, which AI has mastered.
- 🏆 AI systems have been able to surpass human capabilities in complex games like chess and Go, but physical tasks remain a significant challenge.
- 🧠 Moravec's paradox highlights the contrast between what humans perceive as hard (intellectual tasks) and what robots find hard (sensory-motor skills).
- 🕵️♂️ The speaker cites experts like Hans Moravec and Steven Pinker, who argue that high-level reasoning requires less computation than sensory-motor skills.
- 🧬 The evolution of intelligence is discussed, with a focus on how evolution has spent much more time developing sensory-motor skills than language or reasoning.
- 🌐 The speaker emphasizes the importance of physical intelligence in AI, suggesting that it is a prerequisite for true artificial intelligence.
- 🤝 The use of simulation is highlighted as a key technique for generating data and training AI systems to perform complex physical tasks.
- 🤸♂️ Examples of robots learning to walk and manipulate objects in simulation and then transferring these skills to the real world are provided.
- 🤲 The importance of hardware design, such as hands with touch sensing, is discussed in the context of developing physical intelligence.
- 🔧 The speaker argues for a 'full stack' approach to physical intelligence, combining control algorithms, perception, and hardware design.
Q & A
What is the main paradox discussed in the script regarding AI capabilities?
-The main paradox discussed is Moravec's paradox, which states that tasks requiring high-level reasoning, which humans consider difficult, are relatively easy for AI, while tasks humans consider simple, such as sensory-motor skills, are actually difficult for AI.
Why do AI systems struggle with tasks like opening doors, despite their ability to play complex games like chess?
-AI systems struggle with tasks like opening doors because these tasks require a level of physical intelligence and sensory motor skills that are not as developed as their reasoning capabilities. These tasks involve understanding and interacting with the physical world in a complex way, which is still challenging for AI.
How did Moravec's paradox come about, according to the script?
-Moravec's paradox emerged from observations, such as those Hans Moravec made in 1988, that AI systems can solve complex cognitive tasks like chess relatively easily but struggle with tasks involving physical intelligence, such as walking or opening a door, which humans find easy.
How does the script explain the difference in difficulty between chess and sensory motor skills for AI?
-The script explains that while chess, which requires reasoning, is considered a complex task for humans, it is relatively easy for AI systems. In contrast, sensory motor skills, which are second nature to humans, require enormous computational power and are much harder for AI to replicate accurately.
What role does human intuition play in the development of AI, according to the script?
-Human intuition plays a significant role in the development of AI by influencing what tasks are prioritized and how they are approached. The script suggests that human intuition often misjudges the difficulty of tasks for AI, leading to a focus on cognitive tasks at the expense of physical intelligence.
What is the significance of the timeline provided in the script regarding the evolution of intelligence?
-The timeline in the script is significant because it illustrates the vast amount of time evolution has spent developing sensory motor skills compared to language and reasoning. This highlights the complexity of physical intelligence and suggests that AI development should also prioritize physical capabilities.
How does the script suggest AI systems can improve their physical intelligence?
-The script suggests that AI systems can improve their physical intelligence through the use of simulation, where large amounts of data can be generated to train AI in various physical tasks. The skills learned from this data can then be transferred to real-world applications.
What is the importance of hardware in the development of AI physical intelligence as discussed in the script?
-The importance of hardware in the development of AI physical intelligence is highlighted by the need for hands with touch sensing and the development of tools to design better hands for specific tasks. The script emphasizes that hardware, perception, and control algorithms are all crucial for achieving physical intelligence.
How does the script relate the evolution of life on Earth to the development of AI?
-The script relates the evolution of life on Earth to the development of AI by drawing a parallel between the long evolutionary development of sensory motor skills and the relatively recent development of language and reasoning. It suggests that AI should also prioritize the development of physical intelligence before focusing on higher cognitive functions.
What is the speaker's stance on the future of AI and physical intelligence?
-The speaker believes that physical intelligence is a critical foundation for true artificial intelligence. They argue against the idea that AI can be fully realized without embodied intelligence and advocate for a 'full stack' approach that includes control, perception, and hardware in the development of AI.
Outlines
🤖 The Paradox of AI: Chess vs. Door Opening
The speaker begins by questioning the intelligence required for complex tasks like playing chess versus simple tasks like opening a door. They highlight the irony that while AI surpassed human capabilities in chess two decades ago, seemingly simple tasks like opening doors remain challenging for robots. The speaker introduces Moravec's paradox, which holds that tasks humans find easy, such as sensory-motor skills, are difficult for robots, while tasks involving reasoning, like chess, are easier for AI. The paradox is illustrated with examples of AI's success in complex games and the struggles of robots in the DARPA Robotics Challenge finals to perform basic physical tasks.
🌱 Evolution's Emphasis on Sensory-Motor Skills
The speaker delves into the evolutionary timeline, noting that life began with single-cell organisms and evolved to apes capable of simple sensory-motor tasks over billions of years. The evolution of humans and the development of language occurred much more recently, suggesting that nature has prioritized sensory-motor skills over language and reasoning. The speaker uses this timeline to emphasize the disparity in AI development, where systems have advanced rapidly in language understanding but struggle with physical intelligence. They also discuss AI's current capabilities in language and image generation, showcasing AI's prowess in abilities that took only a tiny fraction of evolutionary time to develop.
🤸♂️ Overcoming Moravec's Paradox with Simulation
The speaker addresses the ongoing challenge of Moravec's paradox, where AI excels in language but falls short in physical tasks. They argue that physical intelligence is crucial for true AI and cannot be bypassed. To tackle this, the speaker's lab uses simulation to generate extensive data and train AI in sensory-motor skills. They demonstrate how simulated training can transfer to real-world scenarios, such as robots navigating various terrains and manipulating objects. The speaker also touches on the importance of hardware design, showing experiments with hands that can sense touch and are optimized for specific tasks.
🍰 The Full Stack Approach to Physical Intelligence
In the final section, the speaker summarizes the dichotomy between artificial and natural intelligence, emphasizing the need for a 'full stack' approach to physical intelligence. They argue that while AI has made significant strides in language understanding, true intelligence requires embodiment and physical capabilities. The speaker advocates for the development of physical intelligence as a foundation for achieving artificial general intelligence. They conclude by encouraging more focus on physical intelligence amidst the current hype around AI, suggesting that without building this foundation, AI's potential remains unfulfilled.
Keywords
💡Artificial Intelligence (AI)
💡Moravec's Paradox
💡Sensory Motor Skills
💡Language Models
💡Physical Intelligence
💡Evolution
💡Simulation
💡Generalization
💡Manipulation
💡Hardware
Highlights
Artificial intelligence has surpassed human capabilities in complex games like chess and Go, yet struggles with simple tasks such as opening doors.
The dichotomy between tasks perceived as hard by humans and those challenging for robots is known as Moravec's paradox.
Sensory motor skills, which are effortless for humans, require enormous computational power for robots.
The evolution of intelligence has prioritized sensory motor skills over language and reasoning, which are relatively recent developments.
Current AI systems excel at language understanding but lack physical intelligence, which is crucial for everyday tasks.
Simulation is a key technique used to generate data and train robots in various environments and tasks.
AI systems can be trained in simulation to perform complex physical tasks and then transferred to real-world applications.
Physical intelligence involves not just control algorithms but also perception and hardware design.
The development of hands with touch sensing capabilities is crucial for robots to interact with their environment effectively.
Hardware design, such as the creation of hands optimized for specific tasks, is a significant aspect of physical intelligence.
The speaker argues that physical intelligence must be developed before achieving true artificial intelligence.
The talk emphasizes the importance of building physical intelligence, which is often overlooked in favor of language and reasoning capabilities.
Moravec's paradox, formulated in 1988, is still relevant today, highlighting the persistent challenges in physical intelligence for AI.
The speaker's lab is working on techniques to enhance physical intelligence, including simulation and hardware optimization.
The future of AI may depend on our ability to develop physical intelligence, which is essential for robots to perform everyday tasks.
The talk concludes with a call to action for the AI community to focus on building physical intelligence as a foundation for advanced AI capabilities.
Transcripts
[Music]
thank you
so let us get started with a question
what do you think requires more
intelligence
playing the game of chess
or opening
a door
right I mean many times we think that
chess is a matter of Genius
so if chess was actually hard to do
then building machines which can play
chess should also be way harder than
building machines which can open doors
but let's see what we have managed to do
in artificial intelligence
you know we probably heard about AI
systems surpassing humans at the game of
chess
this is not today but 20 years ago
right since then we have had AI systems
which can play complex multiplayer games
and surpass humans even at this complex
game of Go
but now let's you know take a look at
opening doors
now you might think I am showing you bad
videos
but let me assure you these are the best
teams competing in the DARPA Robotics Challenge
finals
just
seven years ago
you know doing simple things like
opening doors climbing stairs is
actually very hard
right
okay so you know let's try to understand
why is this the case
so there seems to be this dichotomy
between what we think is hard and what
our robots find to be hard
and in this talk you know what I'm going
to communicate to you
is that human intuition of what we think
is hard really gets in our way
and you know kind of stops us from
really building intelligent systems
now many scientists have wondered about
this you know I'll start with a quote
from Hans Moravec you know one of the
people who thought about AI quite a lot
and here is what he has to say
you know reasoning
requires very little computation you
know reasoning like the kind of
reasoning you have in chess
right but sensory motor skill requires
enormous compute
I'll quote another scientist you know
Steven Pinker from Harvard
the main lesson of 35 years of AI
research
is that hard problems like chess are
actually easy
and easy problems like walking and
opening a door are actually hard
these observations came to be known as
Moravec's paradox
you know some people have gone ahead to
speculate
right that machines or AI systems are
going to do jobs which we think are
cognitively challenging quite soon
for example being a board member being a
data analyst or being creative and
making paintings
but jobs which require physical
intelligence
are not going to be done
by machines for a long time to come
now why is this
and the reason is
that we are
least aware of things that we do very
well for example our heart is beating we
are breathing but are we aware of it
when you walk are you aware of it you
know these systems are working all the
time they're Flawless you don't even
think about them right this is a point
made by Marvin Minsky you know one of
the co-founders of the MIT AI Lab and
a Turing Award winner
right and then he goes on to say that we
are more aware of simple processes that
do not work well
and when he says simple again think of
chess
and this is not an abstract concept I
think all of us have experienced this
Moravec's paradox in our lives
you know imagine riding a bicycle
you know when you've learned how to ride
a bicycle very early on you probably
paid attention to every you know every
movement of your foot where is the
handle going
but after some time it becomes natural
it becomes intuitive you don't even
think about it
so let's take this idea and you know
apply it to and use it to understand the
evolution of intelligence
so life started you know some 3.7
billion years ago with single cell
organisms
then it took around you know 3.7 billion
years to come to Apes now what can apes
do simple sensory motor stuff you know
like uh hanging from branches you know
picking up a fruit throwing it so on and
so forth
then it took you know a few million
years or 20 million years for humans to
evolve
then a few million years for language to
come in
and then you know we are just 50 000 to
150 000 years from when language started
so maybe there's a lesson over here also
that evolution spent a lot of time
evolving sensory motor skills and
relatively very very little time
developing language or reasoning that we
think is complex today
now just to give a sense of these numbers
you know imagine that we compress the history of
Earth into one day
right so at the start of the day
Earth is born at midnight and we're
going to look at you know our 24-hour
period
so language is just 10 seconds old
right
humans are just one minute 26 or one
minute 20 seconds old
apes are maybe six minutes old
but life started 20 hours ago
so that gives you the sense of how much
time it took to get to these simple
skills
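The one-day analogy is a linear rescaling of Earth's history onto 86,400 seconds. A minimal sketch, assuming an Earth age of about 4.54 billion years (a standard geological estimate, not a number given in the talk) and the speaker's round figures for the other events; the function names are illustrative:

```python
# Map "years before present" onto a single 24-hour day that starts at
# Earth's formation (midnight) and ends at the present (next midnight).
EARTH_AGE_YEARS = 4.54e9          # assumed standard estimate, not from the talk
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400 seconds


def seconds_before_midnight(years_ago: float) -> float:
    """How many seconds before the end of the 'day' an event happened."""
    return years_ago / EARTH_AGE_YEARS * SECONDS_PER_DAY


def describe(seconds: float) -> str:
    """Format a duration as hours, minutes and seconds."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{int(h)} h {int(m)} min {s:.1f} s"


# Round figures quoted in the talk, in years before present.
EVENTS = {
    "life begins": 3.7e9,
    "apes": 20e6,
    "language (upper estimate)": 150_000,
}

if __name__ == "__main__":
    for name, years in EVENTS.items():
        print(f"{name:>26}: {describe(seconds_before_midnight(years))} before midnight")
```

With these inputs, life starts about 19.6 hours before midnight and apes appear about 6.3 minutes before it, in line with the talk's "20 hours" and "six minutes". Language, even at the 150,000-year upper estimate, lands under 3 seconds before midnight, so the "10 seconds" in the talk is a generous rounding that only strengthens the speaker's point.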
now what is this implication for
building a robot
so I for one you know want to have a
robot which can do the mundane thing
that I do at my house today
right now if I say to the robot you know
make me dinner
the first thing the robot needs to
understand is you know what is dinner
what do I eat what are the recipes and
how to make it right now this is what
language might give us right
now there's another part which is
physical intelligence which is how do I
actually make dinner: I have to
chop vegetables so on and so forth
so now let's look back at our timeline
so how much time it took for language
understanding 10 seconds how much time
for physical intelligence maybe 20 hours
right so what what have we done in AI
today is you know we have taken large
amounts of data from the internet and
developed very capable systems which can
understand language
you know just to give you a very quick
summary of how they work
you know so these systems are called
language models they consume a lot of
data and then given a few words they try
to predict what words are going to come
next right for example you know the
prompt is "Coke is in" and then the AI
system can you know make a prediction of
what the next words are going to be
right here is you know one prediction
made by the system
let us ask it a different question and
see what the prediction is
you know sounds very reasonable right
maybe let's ask another question
and let's see what the answer is
you know maybe maybe a bit nonsensical
but but the point is you know yes you
know there are a few aberrations but
these systems are becoming really really
good
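The next-word idea described above can be illustrated with a toy bigram counter. This is only a sketch of the prediction objective under heavy simplifying assumptions: real language models use neural networks over subword tokens and vastly more data, and the tiny corpus and function names here are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; "." is treated as an ordinary token.
CORPUS = "coke is in the fridge . milk is in the fridge . coke is on the table ."


def train_bigrams(text):
    """Count, for every word, which words follow it and how often."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())  # .get avoids creating empty entries
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def continue_prompt(counts, prompt, n_words=3):
    """Greedily extend a prompt one predicted word at a time."""
    out = prompt.lower().split()
    for _ in range(n_words):
        nxt = predict_next(counts, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)


if __name__ == "__main__":
    model = train_bigrams(CORPUS)
    print(continue_prompt(model, "Coke is in", n_words=2))
```

Given the prompt "Coke is in", this toy model continues with the most frequent follower at each step and produces "coke is in the fridge". Large language models replace the count table with a learned network, which is what lets them handle prompts they have never seen verbatim.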
and we can also hook up images with
these systems it's just not about
language right for example you know we
can ask an AI system to generate an
image of a tutu on a stroll with a dog
right something which probably is you
know we never imagined right or
something like draw images
of an avocado chair
right and these systems can do it
so
what does having such good language
understanding mean in the context
of building robots that could you
know be in my house
Moravec's paradox was formulated in you
know 1988 after observing 35 years of AI
research now we are almost in 2023 which
means it is almost 35 years since then
right and let's look at an attempt you
know to build a robotic system to
replicate some household tasks so this
is a very impressive system you know put
out by Google you know some time back
and the question is
you know someone spilled the Coke and
they want the robot to clean the mess
so let's look at you know what the robot
ends up doing
right it realizes it needs to find a
Coke it goes it grasps this Coke can
and then it tries to throw it in the
trash can
[Music]
but cannot throw it
right then it moves ahead and says hey
you know I need something to wipe off
the table so I'm going to pick up this
pad and take it
to go wipe the table
okay so what is the lesson: Moravec's
paradox still triumphs
right 35 years have passed
you know and still the same problem
exists
right now I don't want to be here 35
years from now and tell you the same
thing
right
so we we need to fix this
so now what what actually is the problem
right the problem is you know to get to
language understanding which is
equivalent of 10 seconds of evolution we
have pretty much consumed all of the
internet
right now how are we going to go to
sensory motor skills
so you know some people think that hey
you know maybe we can get to artificial
intelligence without doing physical
intelligence
now I can talk a lot about this right
but in the interest of time I'm just going to
tell you my bet: my bet is no
right and these paradoxes that we have
you know seen so far also happen in
physical intelligence and this is what
it you know makes physical intelligence
challenging right to give you an example
you know consider a robot doing a
backflip
impressive right
but you know what about
the behavior of walking
seems very simple
but when you do a backflip you know
maybe what you're doing is a specialized
motion that you only have to reason
about your own motor system
but when I'm walking then I have to walk
on many different terrains so I need to
reason about the environment so these
systems have to generalize across a
large variety of terrains and this is
what makes them challenging
like
so
you know so the question which me and my
lab are trying to look at is you know
how do we get to physical intelligence
and I'm going to briefly now tell you
you know some of the ideas and
techniques that we have been using right
so one thing we heavily make use of
is simulation
you know because in simulation we can
generate lots of data right in three
hours we can generate you know 100 days
worth of data right so this is you know
an example where you can simulate Many
Many Robots in parallel
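The "100 days of data in three hours" figure is a throughput ratio, and running many robots in parallel is what makes it reachable. A back-of-the-envelope sketch: the 4,096 parallel environments below are a hypothetical illustration, not a number from the talk, and the function names are made up for this example.

```python
HOURS_PER_DAY = 24


def experience_multiplier(sim_days: float, wall_clock_hours: float) -> float:
    """Hours of simulated experience collected per hour of wall-clock time."""
    return sim_days * HOURS_PER_DAY / wall_clock_hours


def per_env_realtime_factor(multiplier: float, num_envs: int) -> float:
    """Speed each simulated robot must run at (1.0 = real time) so that
    `num_envs` robots running in parallel reach the overall `multiplier`."""
    return multiplier / num_envs


if __name__ == "__main__":
    m = experience_multiplier(sim_days=100, wall_clock_hours=3)  # figures from the talk
    print(f"{m:.0f}x faster than real time")
    n = 4096  # hypothetical number of parallel simulated robots
    print(f"each of {n} robots needs only {per_env_realtime_factor(m, n):.2f}x real time")
```

The talk's figures work out to an 800x speed-up. Splitting that across many simulated robots means no single copy has to run faster than real time; in this example each one runs at roughly a fifth of real-time speed.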
then in simulation you know they can
learn how to walk
right so these are some gaits that we
have learned
now once we learn these walking
behaviors in simulation then we take
them and transfer them into the real
world and by real world I mean different
kinds of terrains so it might be stairs
going over certain obstacles or walking on
sand
so here are you know some results of
these systems which were trained in
simulation but then deployed
you know in the real world right over
here this robot is running fast but it's
not just fast it can go on these
challenging terrains and still be stable
or for example over here it tries to go
under an obstacle
right so it has to crouch before it can
go beneath
or you know for example you know going
up this Gravelly Hill
you know sometimes when the robot is
doing you know these behaviors the
environment is not you know forgiving
you know for example once when we were
running this robot outside in this
building one of the screws underneath
came off
so what you now see is you know this
robot is limping in a way but still
walking and this is the kind of
robustness and generalization that we
hope to achieve
and this is not just in context of
locomotion you know we can also think
about in context of manipulation right
for example you know things that we do
every day right we pick up tools we use
them right and we keep doing it all the
time sometimes for a purpose and
sometimes you know just for fun
right and sometimes you know we do it
because we have to do it
so what we can do is you know we can
also run simulations where we have you
know lots of hands you know simulating
you know this task of reorienting
objects because it is needed to perform
a downstream manipulation task
and then we can take
you know this learned system in
simulation and transfer it into the real
world
right and the way this system is going
to work is it's going to have a camera
and going to give some commands to the
fingers to move right this is just a
side view so you have a sense of what
I'm going to show you
and we're going to evaluate on new
objects that the system has never seen
before
so you know for example over here on
your top right is the goal orientation
and you know let's look at what the
system ends up doing
so it tries to reorient this object to
the Target which is shown on the top
right
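Deciding whether the hand has reached the target needs an error measure between two 3D orientations. The talk does not say which metric the lab uses; a common choice, sketched here under that assumption, is the rotation angle between two unit quaternions. The success threshold and all names are hypothetical.

```python
import math
from typing import Tuple

Quat = Tuple[float, float, float, float]  # (w, x, y, z)

SUCCESS_THRESHOLD_RAD = 0.4  # hypothetical tolerance, not taken from the talk


def axis_angle(axis: Tuple[float, float, float], angle: float) -> Quat:
    """Unit quaternion for a rotation of `angle` radians about `axis`."""
    ax, ay, az = axis
    n = math.sqrt(ax * ax + ay * ay + az * az)
    s = math.sin(angle / 2) / n
    return (math.cos(angle / 2), ax * s, ay * s, az * s)


def normalize(q: Quat) -> Quat:
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)


def rotation_angle(q_current: Quat, q_goal: Quat) -> float:
    """Smallest rotation angle (radians) taking q_current to q_goal.

    q and -q encode the same orientation, hence abs(); the min() clamp
    guards against rounding pushing the dot product slightly above 1.
    """
    a, b = normalize(q_current), normalize(q_goal)
    dot = abs(sum(x * y for x, y in zip(a, b)))
    return 2.0 * math.acos(min(1.0, dot))


def reached_goal(q_current: Quat, q_goal: Quat) -> bool:
    return rotation_angle(q_current, q_goal) < SUCCESS_THRESHOLD_RAD


if __name__ == "__main__":
    goal = axis_angle((0.0, 0.0, 1.0), math.pi / 2)           # 90 degrees about z
    current = axis_angle((0.0, 0.0, 1.0), math.radians(80))   # about 10 degrees short
    print(math.degrees(rotation_angle(current, goal)), reached_goal(current, goal))
```

Using the absolute value of the dot product matters because every orientation has two quaternion representations; without it, an object already at the goal could be scored as a full turn away.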
so again you know these are some examples
showing that in simulation we can
leverage large amounts of data and then
use it to perform things that seem
simple to humans but are actually quite
complex
now you know building these you know
physical intelligence or these systems
is not just about the control algorithm
it's also a lot about the hardware you
know for example the hand that I showed
you had no touch sensing right so we
need to you know activate some more
modalities so it can start you know
sensing where it makes contact
right
and you know we have been running you
know some experiments with you know
doing problems like hey if I give you an
object you know can you feel you know
what the object is
right and you know then if I close the
lights so it's just dark can you go and
find that object again
right so for example now we put more
objects shut off the light and the Hand
still has to find these objects just
based on touch sensing
the other question you can ask is well
is the design of this hand good
maybe maybe not right so we're
developing tools which can help us
design you know better hands so I'm
showing you four examples of four
different hands that we're able to
design which were optimized for the
particular task
right so for example over here you know
here's a hand which can cut things
and you know it can you know use
scissors and it can cut paper but
not cut acrylic
right so in summary
right it is not just about control but
also thinking about perception and also
thinking about Hardware right and we
need to think in a full stack way to
approach physical intelligence
so to end what I discussed was this
dichotomy between artificial and natural
intelligence
and what we think we have is the Cherry
right which is 10 seconds worth of
evolution you know models that we have
trained on the internet
but the question is where is my cake
right and you know while there are some
people who believe
that you know we can just go on the
Internet Train bigger models and not be
embodied and get to artificial
intelligence
you know some other people and I may
be in the other camp we think we
need to build the cake first to build
physical intelligence before we can get
to true artificial intelligence right
and I hope now that more and more people
think about physical intelligence
especially now that there is a lot of you
know hype and a lot of excitement about
this embodied intelligence going on
right I think that hype is very good but
we cannot you know have the cake without
building the cake with that thank you
[Applause]