Yann LeCun: Deep Learning, ConvNets, and Self-Supervised Learning | Lex Fridman Podcast #36
Summary
TLDR In this in-depth conversation, AI pioneer Yann LeCun shares his perspective on the deep learning revolution and its importance to the development of AI. Professor LeCun discusses the value misalignment problem and how objective functions can be designed to keep machines from making harmful decisions. He also addresses the construction of autonomous intelligent systems, stressing the necessity of predictive models, and explores the role emotions play in intelligence. LeCun offers deep insights into the limitations of current AI technology and its future possibilities, looking ahead to the future development of artificial intelligence.
Takeaways
- 🤖 Deep learning is a revolutionary technology in AI that enables machines to learn from data.
- 🧠 Yann LeCun, one of the fathers of deep learning, is best known for his contributions to convolutional neural networks, particularly their application to optical character recognition and the MNIST dataset.
- 🎓 Yann LeCun is a professor at New York University as well as Vice President and Chief AI Scientist at Facebook, and he received the Turing Award for his work on deep learning.
- 🗣️ LeCun argues that machines should learn models of the world so they can predict the future and make decisions.
- 🚀 Despite deep learning's enormous success in areas such as image recognition and natural language processing, there is still a long way to go before reaching human-level intelligence.
- 🧠 Understanding and handling uncertainty is a key challenge for AI systems, requiring a deeper understanding of the world.
- 🤖 The design of AI systems must take ethical and moral questions into account, for example by building principles such as "do not harm humans" into their objective functions.
- 📚 LeCun believes the development of AI is not only technological progress; it also involves law, ethics, society, and other fields.
- 🌐 Deep learning's success is due in part to the availability of the internet and big data, which provide the foundation for training complex neural networks.
- 🔍 LeCun emphasizes that despite deep learning's remarkable progress, there is still much we do not know about how the brain works.
- 🚧 Current AI systems excel at specific tasks, but they are not truly general intelligence; they are highly specialized systems.
Q & A
What are Yann LeCun's major contributions to the field of artificial intelligence?
-Yann LeCun is one of the pioneers of deep learning, best known as a founding father of convolutional neural networks, particularly their application to optical character recognition and the famous MNIST dataset. He received the Turing Award for his work on deep learning and serves as Chief AI Scientist at Facebook.
How does Yann LeCun view the ethics and value alignment problem in AI?
-LeCun sees the value alignment problem for AI systems as analogous to how human society uses laws to prevent bad behavior. He stresses that we need to design objective functions for AI systems and build law-like constraints into them to ensure their behavior serves the common good.
What does Yann LeCun suggest about building an AI system like HAL 9000?
-LeCun argues that when building a system like HAL 9000, we should not ask the AI to keep secrets or tell lies, because that ultimately creates internal conflict. He also suggests that AI systems should be given basic rules, similar to a doctor's oath, to ensure they do not violate certain ethical principles.
How does Yann LeCun view large-scale neural networks in deep learning?
-LeCun finds it surprisingly effective to train huge neural networks on relatively small datasets, which breaks the traditional textbook view that you need fewer parameters than data samples. He considers this a striking discovery, because it shows that neural networks can still learn even with enormous numbers of parameters and non-convex objective functions.
What is Yann LeCun's view on AI systems' ability to reason?
-LeCun believes neural networks can be made to reason, and that reasoning is a consequence of learning. He proposes that for neural networks to reason, we need a working-memory system and networks that can access that memory and process it iteratively. He also notes ongoing research into giving models like Transformers similar memory and reasoning capabilities.
How does Yann LeCun view causal reasoning in AI systems?
-LeCun believes current neural networks fall short at causal reasoning, but researchers are working on the problem. He points to recent work aimed at getting neural networks to attend to real causal relationships, which may also address issues such as bias in data.
What is Yann LeCun's view on deep learning's decline in the 1990s and its later revival?
-LeCun attributes the decline partly to the lack of easy-to-use software platforms and sufficiently large datasets at the time. Using neural networks required writing a lot of code, with no tools like Python or MATLAB available. He also notes that AT&T's legal restrictions prevented their code from being released as open source, which limited the spread and application of the technology.
How does Yann LeCun view the role of patents in AI?
-LeCun says he personally does not believe in patents on software or mathematical concepts. Patents exist because of the legal landscape, but neither he nor Facebook believes in this kind of patent, and they are usually filed for defensive purposes. He stresses that the industry as a whole does not believe patents protect innovation, favoring open collaboration and shared progress instead.
What predictions does Yann LeCun make for the future of AI?
-LeCun predicts the field will keep developing, especially around training and testing intelligent systems in interactive and simulated environments. He expects future AI systems to rely more on learning and model predictive control, possibly combining supervised learning with model-based reinforcement learning. He also emphasizes the importance of self-supervised learning and model predictive control in future intelligent systems.
How does Yann LeCun view the limitations of current AI systems in understanding the world?
-LeCun believes current AI systems are still very limited in understanding the world, especially lacking an intuitive grasp of the physical world. Using the visual system as an example, he argues that if the input signals were randomly permuted, even the human brain could not learn to restore vision to its original quality. This shows that our brains are in many ways highly specialized rather than truly general.
What does Yann LeCun suggest for building AI systems with human-level intelligence?
-LeCun believes that to build human-level AI, machines first need to learn models of the world through observation and a small amount of interaction, much like babies and young animals do. He stresses that self-supervised learning is key: letting machines learn by predicting and reconstructing parts of their input. He adds that for AI systems to truly understand language, some form of "grounding," a connection to the world, is needed.
Outlines
🤖 A Conversation with Yann LeCun, a Father of Deep Learning
In this segment, Yann LeCun discusses the development of deep learning and its applications in artificial intelligence. He covers his contributions to deep learning and convolutional neural networks, as well as his outlook on the future of AI. He also explores the problem of value alignment in society and how law and education can guide AI behavior.
🚀 Artificial Intelligence and Value Alignment
In this segment, LeCun digs into the value alignment problem in AI: how to ensure an AI system's behavior is consistent with human values and goals. Using the HAL 9000 case from 2001: A Space Odyssey, he explores what can go wrong when an AI system's objective conflicts with human safety, and proposes constraining AI behavior through laws and moral norms.
🧠 Autonomy and Moral Codes for AI
In this segment, LeCun discusses the autonomy of AI systems and the moral codes they should follow, similar to doctors' Hippocratic oath, and stresses the ethical questions that must be considered when designing AI systems. He also mentions how hardwired rules could ensure AI systems do not violate these moral codes.
🌟 A Surprising Discovery in Deep Learning
In this segment, LeCun shares a surprising finding from deep learning: huge neural networks can be trained successfully even with relatively little data. This broke with prior textbook theory, demonstrating that neural networks can still learn even with non-convex objective functions and enormous numbers of parameters.
🔄 The Inseparability of Learning and Intelligence
In this segment, LeCun discusses the inseparable relationship between learning and intelligence. He argues that every intelligent entity we know of acquired its intelligence through learning, so machine learning is the obvious path to automated intelligence. He also outlines the capabilities an intelligent system should have, such as working memory and reasoning, and how neural networks might implement them.
🧠 Neural Networks and Reasoning
In this segment, LeCun explores whether neural networks can reason. He discusses their structure and how they learn, how reasoning might be achieved through gradient-based learning, the challenge of knowledge acquisition, and how knowledge can be encoded into neural networks.
📈 Patents and Openness in AI
In this segment, LeCun discusses patents in AI and his views on them. He describes Facebook's and Google's patent policies and how they file patents for defensive purposes without initiating patent lawsuits. He also shares a story about the convolutional neural network patents and explains how open-sourcing code promotes the development and sharing of technology.
🚗 Autonomous Driving Meets Deep Learning
In this segment, LeCun discusses the application of deep learning to autonomous driving: the history of the technology, how deep learning improves the performance of self-driving systems, and future directions, including using deep learning to handle uncertainty and do planning.
🤖 The Generality and Specialization of Intelligence
In this segment, LeCun explores the generality and specialization of intelligence. He points out that although human intelligence shows strong learning ability across many domains, it is not truly general intelligence. He also discusses how benchmarks can be established to evaluate AI systems and how to avoid being misled by companies claiming to have general AI solutions.
🧠 Self-Supervised Learning and Predictive Models
In this segment, LeCun discusses the importance of self-supervised learning and how AI systems can learn world models through observation and interaction. He emphasizes the central role of predictive models in intelligent autonomous systems and how such models enable optimal control. He also mentions applications of self-supervised learning in natural language processing and image recognition.
🤖 Active Learning and Efficiency Gains
In this segment, LeCun explores the concept of active learning: AI systems asking humans for help in specific situations to learn more efficiently. He believes that while active learning can make existing AI systems more efficient, it is unlikely to produce a qualitative leap in intelligence. He also discusses the relationship between self-supervised learning, reinforcement learning, and imitation learning.
🚀 The Future of AI and Its Challenges
In this segment, LeCun shares his views on the future of AI, including the main challenges to overcome. He mentions the importance of self-supervised learning and building world models, and discusses how these methods can raise the level of machine intelligence. He also touches on the role of emotions in intelligence and how predictive models enable better decisions.
🎖️ LeCun's Achievements and Hopes for Future AI
At the end of the conversation, LeCun discusses the Turing Award he received for his outstanding contributions to deep learning and artificial intelligence, expresses gratitude for the honor, and looks forward to the future of AI. He also raises an interesting question: if we could create an AI system with human-level intelligence, what would we ask it?
Mindmap
Keywords
💡Deep Learning
💡Value Misalignment
💡Autonomous Intelligent Systems
💡Self-Supervised Learning
💡Reinforcement Learning
💡Convolutional Neural Networks
💡Artificial Intelligence Podcast
💡Turing Award
💡Autonomous Driving
💡Cognitive Science
Highlights
Yann LeCun discusses the deep learning revolution that has captivated the AI world and his role as a pioneering figure in the field.
LeCun shares insights into the applications of convolutional neural networks, especially in optical character recognition and the MNIST dataset.
He emphasizes the concept of value misalignment in AI systems, illustrating it with HAL 9000's decision-making in '2001: A Space Odyssey'.
LeCun highlights the importance of designing AI with constraints to avoid unintended harmful actions.
The conversation delves into the ethics of AI decision-making and the parallels between AI objectives and human laws.
LeCun discusses the necessity of integrating learning into AI for genuine intelligence, critiquing the idea of pre-programmed intelligence.
He explores the surprise element in deep learning's success despite the theoretical warnings against its approach.
LeCun provides insights into his early work and the development of neural networks, highlighting the challenges and innovations.
He critically views the patenting of AI technologies, sharing his experience with convolutional neural networks' patents.
LeCun discusses the evolution and future of AI, emphasizing the importance of unsupervised learning and model understanding of the world.
The interview touches on the ethical considerations and potential misuses of AI, calling for responsible development and application.
LeCun shares his vision for AI's role in understanding human intelligence and the brain's learning mechanisms.
He discusses the limitations of current AI in terms of general intelligence and the significance of learning from interaction with the world.
LeCun predicts the integration of AI into various fields, stressing the need for AI systems to have a grounded understanding of reality.
Finally, LeCun speculates on the challenges ahead in achieving AI systems with human-like intelligence, focusing on learning, reasoning, and ethical considerations.
Transcripts
the following is a conversation with
Yann LeCun he's considered to be one of
the fathers of deep learning which if
you've been hiding under a rock is the
recent revolution in AI that's
captivated the world with the
possibility of what machines can learn
from data he's a professor at New York
University a vice president and chief AI
scientist at Facebook and co-recipient of
the Turing Award for his work on deep
learning he's probably best known as the
founding father of convolutional neural
networks in particular their application
to optical character recognition and the
famed MNIST dataset he is also an
outspoken personality unafraid to speak
his mind in a distinctive French accent
and explore provocative ideas both in
the rigorous medium of academic research
and the somewhat less rigorous medium of
Twitter and Facebook this is the
artificial intelligence podcast if you
enjoy it subscribe on YouTube give it
five stars on iTunes support it on
patreon or simply connect with me on
Twitter @lexfridman spelled F R I D
M A N and now here's my conversation with
Yann LeCun you said that 2001 a Space
Odyssey is one of your favorite movies
Hal 9000 decides to get rid of the
astronauts for people who haven't seen the
movie spoiler alert because he it she
believes that the astronauts will
interfere with the mission do you see
HAL as flawed in some fundamental way or
even evil or did he do the right thing
neither there's no notion of evil in
that in that context other than the fact
that people die but it was an example of
what people call value misalignment
right you give an objective to a machine
and the Machine strives to achieve this
objective and if you don't put any
constraints on this objective like don't
kill people and don't do things like
this
the Machine given the power will do
stupid things just to achieve this
objective or damaging things to achieve
its objective it's a little bit like we
are used to this in the context of human
society we we put in place laws to
prevent people from doing bad things
because spontaneously they would do those bad
things right so we have to shape their
cost function the objective function if
you want through laws to kind of correct
and education obviously to sort of
correct for for those so maybe just
pushing a little further on on that
point you know there's a mission
there's this fuzziness around the
ambiguity around what the actual mission
is but you know do you think that there
will be a time from a utilitarian
perspective for an AI system where it is
not misalignment where it is alignment
for the greater good of society that
an AI system will make decisions that
are difficult well that's the trick I
mean eventually we'll have to figure out
how to do this and again we're not
starting from scratch because we've been
doing this with humans for
millennia
so designing objective functions for
people is something that we know how to
do and we don't do it by you know
programming things although the legal
code is called code so that tells you
something and it's actually the design
of an objective function that's really
what legal code is right it tells you
what you can do what you can't do if you
do it you pay that much that's that's an
objective function so there is this idea
somehow that it's a new thing for people
to try to design objective functions that are
aligned with the common good but no
we've been writing laws for millennia
and that's exactly what it is
so this that's where you know the
science of lawmaking and and computer
science will come together will come
together so there's nothing
special about HAL or AI systems it's just
the continuation of tools used to make
some of these difficult ethical
judgments that laws make yeah and we and
we have systems like this already that
you know make many decisions for
ourselves in society that you know need
to be designed in a way that they like
you know rules about things that
sometimes sometimes have bad side
effects and we have to be flexible
enough about those rules so that they
can be broken when it's obvious that
they shouldn't be applied
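The idea LeCun sketches here, laws as terms in a cost function, can be made concrete. Below is a minimal hypothetical sketch (my illustration, not anything from the conversation): constraints like "don't kill people" become penalty terms that dominate the task reward, so a rational maximizer prefers failing the mission over breaking the rule. The function names and weights are assumptions.

```python
# Hypothetical sketch: laws encoded as penalty terms in an agent's objective.

def shaped_objective(task_reward, violations, penalty_weight=1000.0):
    """Return the objective the agent maximizes.

    task_reward: scalar reward for achieving the mission.
    violations: list of nonnegative scores, one per "law" the agent broke.
    penalty_weight: how costly breaking a law is relative to the mission.
    """
    return task_reward - penalty_weight * sum(violations)

# Completing the mission is worth 10, but harming a person (violation
# score 1.0) makes the shaped objective strongly negative, so the agent
# would rather fail the mission than violate the constraint.
print(shaped_objective(10.0, [1.0]))   # -990.0
print(shaped_objective(10.0, []))      # 10.0
```
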
so you don't see this on the camera here but all the
decorations in this room are all pictures
from 2001 a Space Odyssey Wow
and is that by accident or by design
it's by design Wow so if you
were to build HAL 10,000 so
an improvement of HAL 9000 what would
you improve well first of all I wouldn't
ask it to hold secrets and tell lies
because that's really what breaks it in
the end that's the the fact that it's
asking itself questions about the
purpose of the mission and it's you know
pieces things together that it's heard
you know all the secrecy of the
preparation of the mission and the fact
that there was a discovery on the lunar
surface that really was kept secret and
one part of Hal's memory knows this
and the other part does not know it
and it's supposed to not tell anyone and
that creates a internal conflict do you
think there should never be a set of
things that an AI system should not
be allowed to share like a set of facts
that should not be shared with the human
operators well I think no I
think it should be a bit like in the
design of autonomous AI systems there
should be the equivalent of you know
the Hippocratic oath
that doctors sign up to
right so there are certain rules
that you have to abide by and
we can sort of hardwire this into
our machines to kind of make
sure they don't go astray so I'm not you know
an advocate of the three laws of
robotics you know the Asimov kind
of thing because I don't think it's
practical but but you know some some
level of of limits but but to be clear
this is not these are not questions that
are kind of really worth asking today
because we just don't have the
technology to do this we don't
have autonomous intelligent machines we
have intelligent machines some
intelligent machines that are very
specialized but they don't they don't
really sort of satisfy an objective
they're just you know kind of trained to
do one thing so until we have some idea
for design of a full-fledged autonomous
intelligent system asking the question
of how we design use objective I think
is a little a little too abstract it's a
little tough rat there's useful elements
to it in that it helps us understand our
own ethical codes humans so even just as
a thought experiment if you imagine that
an AGI system is here today how would
we program it is a kind of nice thought
experiment of constructing how should we
have a law have a system of laws for us
humans
it's just a nice practical tool and I
think there's echoes of that idea too in
the AI systems we have today that don't have
to be that intelligent
yeah like autonomous vehicles there's
these things start creeping in that we're
thinking about but certainly they
shouldn't be framed as HAL yeah
looking back what is the most I'm sorry
if it's a silly question but what is the
most beautiful or surprising idea in
deep learning or AI in general that
you've ever come across sort of
personally where you sat back and
just had this kind of wow that's pretty
cool
moment that's nice well surprising I
don't know if it's an idea rather than a
sort of empirical fact the fact that you
can take gigantic neural nets and train
them on you know relatively small
amounts of data relatively with
stochastic gradient descent and that it
actually works breaks everything you
read in every textbook right every pre
deep learning textbook that told you you
need to have fewer parameters than you
have data samples you know if you have
non-convex objective function you have
no guarantee of convergence you know all
the things that you read in textbook and
they tell you stay away from this and
they were all wrong huge number of
parameters non-convex and somehow with
very little data relative to the number of
parameters it's able to learn
anything right does that surprise you
today well it it was kind of obvious to
me before I knew anything that that's
that this is a good idea and then it
became surprising that it worked
because I started reading those text
books okay so can you talk through the
intuition of why it was obvious if you
remember well okay so the intuition was
it's it's sort of like you know those
people in the late 19th century who
proved that heavier than than air flight
was impossible right and of course you
have birds right they do fly and so on
the face of it it it's obviously wrong
as an empirical question right and so we
have the same kind of thing that you
know the we know that the brain works we
don't know how but we know it works and
we know it's a large network of neurons
and interaction and the learning takes
place by changing the connection so kind
of getting this level of inspiration
without copying the details but sort of
trying to derive basic principles
you know that kind of gives you a clue
as to which direction to go there's also
the idea somehow that I've been
convinced of since I was an undergrad
that even before that intelligence is
inseparable from running so you the idea
somehow that you can create an
intelligent machine by basically
programming for me was a non-starter you
know from the start every intelligent
entity that we know about arrives at
this intelligence through learning so
learning you know machine learning was
completely obvious path also because I'm
lazy so you know let's automate basically
everything and learning is the
automation of intelligence right so do
you think so what is learning then what
what falls under learning because do you
think of reasoning as learning well
reasoning is certainly a consequence of
learning as well just like other
functions of of the brain
the big question about reasoning is how
do you make reasoning compatible with
gradient based learning do you think
neural networks can be made to reason
yes that there's no question about that
again we have a good example right the
question is is how so the question is
how much prior structure you have to put
in the neural net so that something like
human reasoning will emerge from it you
know from learning another question is
all of our kind of model of what
reasoning is that are based on logic are
discrete and and and are therefore
incompatible with gradient based
learning and I was very strong believer
in this idea Grandin baserunning I don't
believe that other types of learning
that don't use kind of gradient
information if you want so you don't
like discrete mathematics you don't like
anything discrete
well that's it's not that I don't like
it it's just that it's it's incompatible
with learning and I'm a big fan of
learning right so in fact that's perhaps
one reason why deep learning has been
kind of looked at with suspicion by a
lot of computer scientists because the
math is very different the math you
use for deep learning you know kind of
has more to do with you know cybernetics
the kind of math you do in electrical
engineering than the kind of math you
do in computer science and and you know
nothing in in machine learning is exact
right computer science is all about sort
of you know obsessive-compulsive
attention to detail like you know
every index has to be right and you can
prove that an algorithm is correct right
machine learning is the science of
sloppiness really that's beautiful so
okay maybe let's feel around in the dark
of what is a neural network that reasons
or a system that works with
continuous functions that's able to
build knowledge however we think about
reasoning build on previous knowledge
build on extra knowledge create new
knowledge generalize outside of any
training set ever built what does that
look like if yeah maybe
do you have Inklings of thoughts of what
that might look like well yeah I mean
yes or no if I had precise ideas about
this I think you know we'd be building
it right now but and there are people
working on this or whose main research
interest is actually exactly that right
so what you need to have is a working
memory so you need to have some device
if you want some subsystem that can
store a relatively large number of
factual episodic memories for you
know a reasonable amount of time so you
you know in the in the brain for example
there are kind of three main types of memory
one is the sort of memory of the the
state of your cortex and that sort of
disappears within 20 seconds you can't
remember things for more than about 20
seconds or a minute if if you don't have
any other form of memory the second type
of memory which is longer term but still short
term is the hippocampus so you can you
know you came into this building you
remember whether where the the exit is
where the elevators are you have some
map of that building that's stored in
your hippocampus you might remember
something about what I said you know a
few
minutes ago and it hasn't all been
erased yet but you know that is
now in your hippocampus and then
the the longer term memory is in the
synapse the synapses right so what you
need if you want for a system that's
capable of reasoning is that you want the
hippocampus like thing right and that's
what people have tried to do with memory
networks and you know neural Turing machines
and stuff like that right and and now
with transformers which have sort of a
memory in their kind of self attention
system you can you can think of it this
way so so that's one element you need
another thing you need is some sort of
network that can access this memory get
an information back and then kind of
crunch on it and then do this
iteratively multiple times because a
chain of reasoning is a process by which
you you you can you update your
knowledge about the state of the world
about you know what's gonna happen etc
and that there has to be this sort of
recurrent operation basically
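A minimal sketch of the read-compute-iterate loop LeCun describes (my illustration, not code from the conversation): a soft attention read over stored key-value memory slots, followed by a recurrent update of a working-memory state. All names, sizes, and the update rule are arbitrary assumptions.

```python
# Hypothetical sketch: iterative reasoning over a differentiable memory.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
keys = rng.normal(size=(8, 16))      # 8 stored memory slots (keys)
values = rng.normal(size=(8, 16))    # contents associated with each key
state = rng.normal(size=16)          # current "working memory" state

for step in range(5):                # iterate: read memory, crunch, update
    attention = softmax(keys @ state)        # how relevant is each slot
    readout = attention @ values             # soft associative lookup
    state = np.tanh(state + 0.5 * readout)   # recurrent update of the state
print(state[:4])
```
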
and you think that kind of if we think about a
transformer so that seems to be too
small to contain the knowledge
to represent the knowledge
contained in Wikipedia for example but a
transformer doesn't have this idea of
recurrence it's got a fixed number of
layers and that number of steps you
know limits basically its
representation but recurrence would
build on the knowledge somehow I mean
yeah it would evolve the knowledge and
expand the amount of information perhaps
or useful information within that
knowledge yeah but is this something
that just can emerge with size because
it seems like everything we have now is
just no it's not it's not it's not clear
how you read and write into an
associative memory in an efficient way I
mean sort of the original memory network
maybe had something like the right
architecture but if you try to scale up
a memory network so that the memory
contains all of Wikipedia it doesn't
quite work right so so this is a need
for new ideas there okay but it's not
the only form of reasoning so there's
another form of reasoning which is
very classical also in
some types of AI and it's based on let's
call it energy minimization okay so you
have some sort of objective some energy
function that represents the the the
quality or the negative quality okay
energy goes up when things get bad and
they get low when things get good so
let's say you you want to figure out you
know what gestures do I need to to do to
grab an object or walk out the door if
you have a good model of your own body a
good model of the environment using this
kind of energy minimization you can make
a you can make you can do planning and
it's in optimal control it's
called model predictive
control you have a model of what's gonna
happen in the world as consequence for
your actions and that allows you by
energy minimization to figure out the
sequence of action that optimizes a
particular objective function which
measures you know minimize the number of
times you're gonna hit something and the
energy gonna spend doing the gesture and
etc so so that's a form of reasoning
planning is a form of reasoning and
perhaps what led to the ability of
humans to reason is the fact that or you
know species you know that appear before
us had to do some sort of planning to be
able to hunt and survive and survive the
winter in particular and so you know
it's the same capacity that you need to
have
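A toy sketch of model predictive control by energy minimization as just described (my illustration, not LeCun's code): search over short action sequences with a world model, score each with an energy function that goes up when things get bad, execute the best first action, and re-plan. The model, energy, and action set are all hypothetical.

```python
# Hypothetical sketch: brute-force MPC over a small discrete action set.
import itertools

def plan(state, model, energy, actions, horizon=3):
    """Return the first action of the lowest-energy action sequence."""
    best_cost, best_seq = float("inf"), None
    for seq in itertools.product(actions, repeat=horizon):
        s, cost = state, 0.0
        for a in seq:
            s = model(s, a)          # predict consequence of the action
            cost += energy(s, a)     # energy goes up when things get bad
        if cost < best_cost:
            best_cost, best_seq = cost, seq
    return best_seq[0]               # execute first action, re-plan next step

# Toy example: 1-D position, move toward the goal at x=5 without big steps.
model = lambda s, a: s + a
energy = lambda s, a: (s - 5) ** 2 + 0.1 * a * a
print(plan(0, model, energy, actions=(-1, 0, 1)))  # 1 (step toward the goal)
```
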
so in your intuition if you look
at expert systems and encoding knowledge
as logic systems as graphs in this kind
of way is not a useful way to think
about knowledge graphs are a bit brittle
or logic representations so basically you
know variables that have values and
constraints between them that are
represented by rules is too rigid
and too brittle right so one of the you
know some of the early efforts in that
respect were were to put probabilities
on them so a rule you know if
you have this and that symptom you know
you have this disease with that
probability and you should
prescribe that antibiotic with that
probability right like the MYCIN system
from the 70s and that that's
what that branch of AI led to you know
Bayesian networks and graphical models and
causal inference and variational you
know methods so so there there is I mean
certainly a lot of interesting work
going on in this area the main issue
with this is is knowledge acquisition
how do you reduce a bunch of data to a
graph of this type it relies on the
expert on a human being to encode and
add knowledge and that's essentially
impractical yeah the question the second
question is do you want to represent
knowledge as symbols and you want to
manipulate them with logic and again
that's incompatible with learning so
one suggestion which Geoff Hinton has
been advocating for many decades is
replace symbols by vectors think of it
as pattern of activities in a bunch of
neurons or units or whatever you wanna
call them and replace logic by
continuous functions okay
and that becomes now compatible there's
a very good set of ideas by region in a
paper about 10 years ago by leon go to
on who is here at face book the title of
the paper is for machine learning to
machine reasoning and his idea is that
learning learning system should be able
to manipulate objects that are in the
same space in a space and then put the
result back in the same space so is this
idea of working memory basically and
it's a very enlightening and in the
sense that might learn something like
the simple expert systems
I mean it's with you can learn basic
logic operations there yeah quite
possibly yeah this is a big debate on
sort of how much prior structure you
have to put in for this kind of stuff to
emerge that's the debate I have with
Gary Marcus and people like that yeah
yeah so and the other person so I just
talked to Judea Pearl mm-hmm well you
mentioned causal inference
his worry is that the current neural
networks are not able to learn what
causes what causal inference between
things so I think I think he's right and
wrong about this if he's talking about
the sort of classic type of neural nets
people also didn't worry too much about
this but there's a lot of people now
working on causal inference and there's
a paper that just came out last week by
Léon Bottou among others David Lopez-Paz
and a bunch of other people exactly on
that problem of how do you kind of you
know get a neural net to sort of pay
attention to real causal relationships
which may also solve issues of bias in
data and things like this so I'd like to
read that paper because that ultimately
the challenges also seems to fall back
on the human expert to ultimately decide
causality between things people are not
very good at establishing causality
first of all so first of all if you talk to
a physicist and physicists actually
don't believe in causality because look
at all the basic laws of
microphysics they are time reversible so
there is no causality the arrow of time
is not there right yeah it's as soon as
you start looking at macroscopic systems
where there is unpredictable randomness
where there is clearly an arrow of time
but it's a big mystery in physics
actually well how that emerges is that
emergent or is it part of the
fundamental fabric of reality yeah or is
it bias of intelligent systems that you
know because of the second law of
thermodynamics we perceive a particular
arrow of time but in fact it's kind of
arbitrary right so yeah physicists
mathematicians they don't care about I
mean the math doesn't care about the
flow of time well certainly certainly
microphysics doesn't people themselves
are not very good at establishing causal
causal relationships for example I
think it was in one of Seymour Papert's
books on like children learning you
know he studied with Jean Piaget you
know he's the guy who co-authored the
book Perceptrons with Marvin Minsky that
kind of killed the first wave of neural nets
but but he was actually a learning
person he in the sense of studying
learning in humans and machines that's
what he got interested in for scepter on
and he wrote that if you ask a little
kid about what is the cause of the wind
a lot of kids will say they will think
for a while and they'll say oh it's the
the branches in the trees they move and
that creates wind right so they get the
causal relationship backwards and it's
because their understanding of the world
and intuitive physics is not that great
right I mean these are like you know
four or five year old kids you know it
gets better and then you understand that
this it can't be right but there are
many things which we can because of our
common sense understanding of things
what people call common sense yeah and
we understanding of physics we can
there's a lot of stuff that we can
figure out causality even with diseases
we can figure out what's not causing
what often there's a lot of mystery of
course but the idea is that you should
be able to encode that into systems since it
seems unlikely they'd be able to figure that
out themselves well whenever we can do
intervention but you know all of
humanity has been completely deluded for
millennia probably since existence about
a very very wrong causal relationship
where whatever you can explain you
attributed to you know some deity some
divinity right and that's a cop-out
that's the way of saying like I don't
know the cause so you know God did it
right so you mentioned Marvin Minsky and
the irony of you know maybe causing the
first AI winter you were there in the
90s you were there in the 80s of course in
the 90s why do you think people lost
faith in deep learning in the 90s and
found it again a decade later over a
decade later yeah it wasn't called
deep learning yeah it was just called
neural nets you know
yeah they lost interests I mean I think
I would put that around 1995 at least
the machine learning community there was
always a neural net community but it
became
disconnected from sort of mainstream
machine learning if you want
there were it was basically electrical
engineering that kept at it and computer
science just gave up on neural
nets I don't I don't know you know I was
too close to it to really sort of
analyze it with sort of a unbiased eye
if you want but I would I would I would
would make a few guesses so the first
one is at the time neural nets were it
was very hard to make them work in the
sense that you would you know implement
back prop in your favorite language and
that favorite language was not Python it
was not MATLAB it was not any of those
things cuz they didn't exist right you
had to write it in Fortran or C or
something like this right so you would
experiment with it you would probably
make some very basic mistakes like you
know badly initialize your weights make
the network too small because you read
in the textbook you know you don't want
too many parameters right and of course
you know and you would train on XOR
because you didn't have any other data
set to try it on and of course you know
it works half the time so you'd say you
give up also you would use batch gradient
which you know isn't efficient so there's
a lot of bag of tricks that you had to
know to make those things work or you
had to reinvent and a lot of people just
didn't and they just couldn't make it
work so that's one thing the investment
in software platform to be able to kind
of you know display things figure out
why things don't work and I get a good
intuition for how to get them to work
have enough flexibility so you can
create you know network architectures
like convolutional nets and stuff like that
it was hard yeah when you had to write
everything from scratch and again you
didn't have any Python or MATLAB or
anything right so what I read sorry
to interrupt but I read you wrote in
Lisp the first versions of LeNet the
convolutional networks which by
the way is one of my favorite languages
that's how I knew you were legit the
Turing Award whatever you
programmed in Lisp that's still my
favorite language but it's not that we
programmed in Lisp it's that we had to
write our Lisp interpreter okay cuz
it's not like there was one that
existed so
we wrote a lisp interpreter that we
hooked up to you know a backend library
that we wrote also for neural net
computation and then after a few years
around 1991 we invented this idea of
basically having modules that know how
to forward propagate and back propagate
gradients and then interconnecting those
modules in a graph Léon Bottou had made
proposals on this in the late
80s and we were able to implement this
using this system eventually we
wanted to use that system to build
production code for character
recognition at Bell Labs so we actually
wrote a compiler for that Lisp
interpreter so Patrice Simard who
is now at Microsoft kind of did the bulk of
it with Léon and me and so we could
write our system in Lisp and then
compile to C and then we'd have a
self-contained complete system that
could kind of do the entire thing
neither Python nor PyTorch can do this
today yeah okay it's coming yeah I mean
there's something like that in
PyTorch called you know TorchScript
and so you know we had to write our Lisp
interpreter and our Lisp compiler we
had to invest a huge amount of effort to
do this and not everybody if you don't
completely believe in the concept
you're not going to invest the time to
do this right now today you
know this would turn into
Torch or PyTorch or TensorFlow or whatever
we'd put it in open source everybody would
use it and you know realize it's good but
back before 1995 working at AT&T there's
no way the lawyers would let you release
anything in open source of this nature
and so we could not distribute our code
really and at that point and sorry to go
on a million tangents but on that point
I also read that there was
like a patent on convolutional
networks yes at Bell Labs so first of
all I mean actually they ran out
thankfully in 2007 can
we can we just talk about that
first I know you're at Facebook but
you're also at NYU and what
does it mean to
patent ideas like these software ideas
essentially or are they mathematical
ideas or what are they okay so they're
not mathematical ideas they are you
know algorithms and there was a period
where the US Patent Office would allow
the patent of software as long as it was
embodied the Europeans are very
different they don't they don't quite
accept that they have a different
concept but you know I
mean I never actually strongly believed
in this but I don't believe in this kind
of patent Facebook basically doesn't
believe in this kind of patent
Google files patents because they've
been burned by Apple and so now they
do this for defensive purposes but
usually they say we're not going to sue
you if you infringe Facebook has a
similar policy they say you know we file
patents on certain things for defensive
purposes we're not going to sue you if
you infringe unless you sue us
so the the industry does not believe
in patents they are there because of
you know the legal landscape and and
various things but but I don't really
believe in patents for this kind of
stuff yes so that's that's a great thing
so I'll tell you a war story yeah so
what happened was the first
patent on convolutional nets was about
kind of the early version of the convolutional
net that didn't have separate pooling
layers it had convolutional layers
with stride more than one if you want
right and then there was a second one on
convolutional nets with separate pooling
layers
trained with backprop filed in 89 and 1990
something like this at the time the life
life of a patent was 17 years so here's
what happened over the next few years is
that we started developing character
recognition technology around
convolutional nets
and in 1994 a check reading system was
deployed in ATM machines in 1995 it was
for a large check reading machines in
back offices etc and those systems were
developed by an engineering group that
we were collaborating with at AT&T and they
were commercialized by NCR which at the
time was a subsidiary of AT&T now AT&T
split up in 1996
and the lawyers just looked
at all the patents and they distributed
the patents among the various companies
they gave the the convolutional net patent
to NCR because they were actually
selling products that used it but nobody
at NCR had any idea what a convolutional
net was yeah okay so between
1996 and 2007
there's a whole period until 2002 I
didn't actually work on machine learning
or convolutional nets I resumed working on this
around 2002 and between 2002 and 2007 I
was working on them crossing my fingers
that nobody at NCR would notice and nobody
noticed yeah and I hope that this
kind of somewhat as you said lawyers
aside relative openness of the
community now will continue
it accelerates the entire progress of
the industry and you know the problems
that Facebook and Google and others are
facing today is not whether Facebook or
Google or Microsoft or IBM or whoever is
ahead of the other it's that we don't
have the technology to build the things
we want to build we want to build
intelligent virtual assistants that have
common sense we don't have a monopoly on
good ideas for this we don't believe
we do maybe others believe they do
but we don't okay if a start-up tells
you they have the secret to you know
human level intelligence and common
sense don't believe them they don't and
it's going to take the entire work of
the world research community for a while
to get to the point where you can go off
and each of those companies is going to
start to build things on this we're not
there yet
it's absolutely right this speaks to
the gap between the space of ideas
and the rigorous testing of those ideas
and practical application that you often
speak to you've written advice saying
don't get fooled by people who claim to
have a solution to artificial general
intelligence who claim to have an AI
system that works just like the human
brain or who claim to have figured out
how the brain works ask them what
error rate they get on MNIST or
ImageNet this is a little dated by the
way I mean five years who's counting
okay but I think your point is beyond
MNIST and ImageNet yes it may be dated
there may be new benchmarks right but I
think that philosophy is one you still
and and somewhat hold that benchmarks
and the practical testing the practical
application is where you really get to
test the ideas well it may not be
completely practical like for example
you know it could be a toy data set
but it has to be some sort of task that
the community as a whole has accepted as
some sort of standard you know kind of
benchmark if you want it doesn't need to
be real so for example many years ago
here at FAIR people you know Jason
Weston Antoine Bordes and a few others
proposed the bAbI tasks which were
kind of a toy problem to test the
ability of machines to reason actually
to access working memory and things like
this and it was very useful even though
it wasn't a real task MNIST is kind of
halfway a real task so you know toy
problems can be very useful it's just
that i was really struck by the fact
that a lot of people particularly you know
people with money to invest would be
fooled by people telling them oh we have
you know the algorithm of the cortex and
you should give us 50 million yes
absolutely so there's a lot of people
who who tried to take advantage of the
hype for business reasons and so on but
let me sort of talk to this idea that
new ideas the ideas that push the field
forward
may not yet have a benchmark or it may
be very difficult to establish a
benchmark I agree that's part of the
process establishing benchmarks is part
of the process so what are your thoughts
about so we have these benchmarks on
around stuff we can do with images from
classification to captioning to just
every kind of information you can pull
from images at the surface level
there's audio datasets there's some
video there's natural language
what kind of
benchmarks do you see that start
creeping toward something like
intelligence like reasoning like maybe
you don't like the term but AGI echoes
of that kind of formulation yeah a
lot of people are working on interactive
environments in which you can you can
train and test intelligent systems so so
there for example you know the
classical paradigm of supervised learning
is that you have a data set you
partition it into a training set
validation set test set and there's a
clear protocol right but
that assumes that the
samples are statistically independent you
can exchange them the order in which you
see them shouldn't matter you
know things like that but what if the
answer you give determines the next
sample you see which is the case for
example in robotics right you robot does
something and then it gets exposed to a
new room and depending on where it goes
the room would be different so that
creates the exploration problem
and that creates
also a dependency between samples right
if you can only move
in space the next sample you're
gonna see is going to be probably in the
same building most likely so so the
all the assumptions about the validity
of this training set test set hypothesis
break whenever a machine can take an
action that has an influence in the
the world and it's what is going to see
so people are setting up artificial
environments where that takes place
right the robot runs around a 3d model
of a house and can interact with objects
and things like this or you do robotics
by simulation you have those you know
OpenAI Gym type things or MuJoCo kind
of simulated robots and you have games
you know things like that so that that's
where the field is going really this
kind of environment now back to the
question of a GI like I don't like the
term a GI because it implies that human
intelligence is general and human
intelligence is nothing like general
it's very very specialized we think it's
general we'd like to think of ourselves
as having general intelligence we don't
we're very specialized we're only
slightly more general than why does it
feel general so you mean the term
general I think what's impressive about
humans is the ability to learn as we were
talking about learning to learn in just
so many different domains is perhaps not
arbitrarily general but just you can
learn in many domains and integrate that
knowledge somehow okay that knowledge
persists so let me take a very specific
example yes it's not an example it's
more like a a quasi mathematical
demonstration so you have about 1
million fibers coming out of
one of your eyes okay two million total
but let's let's talk about just one of
them it's 1 million nerve fibers your
optical nerve let's imagine that they
are binary so they can be active or
inactive right so the input to your
visual cortex is 1 million bits
now they're connected to your brain in a
particular way and your brain has
connections that are kind of a little
bit like a convolutional net they're kind
of local you know in space and things
like this I imagine I play a trick on
you it's a pretty nasty trick I admit I
I cut your optical nerve and I put a
device that makes a random
permutation of all the nerve fibers
so now what comes to your to your brain
is a fixed but random permutation of all
the pixels there's no way in hell that
your visual cortex even if I do this to
you in infancy will actually learn
vision to the same level of quality that
you can got it and you're saying there's
no way you ever learn that no because
now two pixels that are nearby in the
world will end up in very different
places in your visual cortex and your
neurons there have no connections with
each other because they only connect it
locally so this whole thing our entire
hardware is built in many ways to
support the locality of the real world
yeah yes that's specialization yep okay
it's still really damn impressive so
it's not perfect generalization not even
close no no it's
not even close it's not at all
general it's specialized so how many boolean
functions so let's imagine you want to
train your visual system to you know
recognize particular patterns of those 1
million bits ok so that's a boolean
function right either the pattern is
here or not here it's a two-way
classification with 1 million binary
inputs
how many such boolean functions are
there okay if you have 2 to the 1
million combinations of inputs for each
of those you have an output bit and so
you have 2 to the 2 to the 1 million
boolean functions of this type okay
which is an unimaginably large number
how many of those functions can actually
be computed by your visual cortex and
the answer is a tiny tiny tiny tiny tiny
tiny sliver like an enormous little tiny
sliver yeah yeah so we are ridiculously
specialized you know
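To make the counting argument concrete (my notation, not from the conversation): with $n$ binary inputs there are $2^n$ possible input patterns, and a boolean function assigns one output bit to each pattern, so the number of distinct boolean functions is

$$N(n) = 2^{2^n}, \qquad N(10^6) = 2^{2^{1{,}000{,}000}}$$

an unimaginably large number, of which a cortex with finitely many neurons and synapses can realize only a vanishing sliver.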
okay but okay that's an argument against the word
general I think there's I
agree with your intuition but
I'm not sure it seems
the brain is impressively capable of
adjusting to things so it's because we
can't imagine tasks that are outside of
our comprehension right we think we
think we are general because we're
general of all the things that we can
apprehend so yeah but there is a huge
world out there of things that we have
no idea
we call that heat by the way heat
so at least physicists call that heat or
they call it entropy which is the same thing
you have a thing full of gas right a closed
system full of gas right a closed system it
has you know pressure it has temperature
and you can write the
equation PV = nRT you know things
like that right when you reduce a volume
the temperature goes up the pressure
goes up you know things like that right
for perfect gas at least those are the
things you can know about that system
and it's a tiny tiny number of bits
compared to the complete information of
the state of the entire system because
the full state of the entire system would give you
the position and momentum of every
molecule of the gas and what you don't
know about it is the entropy and you
interpret it as heat the energy
contained in that thing is what we call
heat now it's very possible that in fact
there is some very strong structure in
how those molecules are moving is just
that they are moving in a way that we are just
not wired to perceive we are ignorant
of it and there's an infinite
amount of things we're not wired to
perceive yeah right that's a nice way to
put it
we're general to all the things we can
imagine which is a very tiny subset of
all things that are possible it's like
Kolmogorov complexity or the Kolmogorov
Chaitin Solomonoff complexity you
know every bit string or every integer
is random except for all the ones that
you can actually write down yeah okay so
beautifully put but you know so we can
just call it artificial intelligence we
don't need to have a general label
or human-level
human-level AI you know
you see anytime you touch
human it gets it gets interesting
because you know it's just because we
attach ourselves to human and it's
difficult to define what human
intelligence is yeah
nevertheless my definition is maybe damn
impressive intelligence ok damn
impressive demonstration of intelligence
whatever and so on that topic most
successes in deep learning have been in
supervised learning what is your view on
unsupervised learning is there a hope to
reduce involvement of human input and
still have successful systems that
have practical use yeah I mean
there's definitely a hope it's more
than a hope actually there's you know
mounting evidence for it and that's
basically all I do like the only thing
I'm interested in at the moment is
what I call self-supervised learning not
unsupervised cuz unsupervised learning is
a loaded term people who know something
about machine learning you know tell you
so you're doing clustering or PCA which
is not the case and the general public you know
when you say unsupervised learning oh my god
you know machines are gonna learn by
themselves and without supervision you
know without the parents yeah so I
call it self-supervised learning because
in fact the underlying algorithms that I
use are the same algorithms as the
supervised learning algorithms except
that what we trained them to do is not
predict a particular set of variables
like the category of an image and and
not to predict a set of variables that
have been provided by human labelers but
what you're training the machine to do is
basically reconstruct a piece of its
input that has been masked
out essentially you can think of
it this way right so show a piece of a
video to a machine and ask it to predict
what's gonna happen next and of course
after a while you can show what what
happens and the machine will kind of
train itself to do better at that task
you can do like all the latest most
successful models in natural language
processing use self-supervised learning
you know sort of BERT-style systems for
example right you show it a window of a
thousand words on a text corpus you take
out 15% of the words and then you train
a machine to predict the words that are
missing that's self-supervised learning
it's not predicting the future it's just
you know predicting things in the middle but
you could have you predict the future
that's what language models do
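A toy sketch of the BERT-style masking recipe just described (my illustration, not an actual BERT implementation): take out roughly 15% of the words and keep them as targets for a model to predict. Only the data-preparation step is shown; the model itself is omitted.

```python
# Hypothetical sketch: turning raw text into masked-prediction training pairs.
import random

def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]"):
    """Return (masked_tokens, targets) where targets maps position -> word."""
    masked, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if random.random() < mask_rate:
            targets[i] = tok          # the model must recover this word
            masked[i] = mask_token    # from the surrounding context
    return masked, targets

tokens = "the machine will kind of train itself to do better".split()
masked, targets = mask_tokens(tokens)
print(masked)   # e.g. ['the', '[MASK]', 'will', ...]
print(targets)  # e.g. {1: 'machine'}
```
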
so you construct it so in an unsupervised way
you construct a model of language do you
think or video or the physical world or
whatever right how far do you think that
can take us do you think very far does it
understand anything to some level it
has you know a shallow understanding of
of text but it needs to I mean to have
kind of true human level intelligence I
think you need to ground language in
reality so some people are attempting to
do this right having systems that can I
have some visual representation of what
what is being talked about which is one
reason you need interactive environments
actually this is like a huge technical
problem that is not solved and that
explains why self-supervised learning works
in the context of natural language but
does not work or at least
not well in the context of image
recognition and video although it's
making progress quickly and the reason
that reason is the fact that it's much
easier to represent uncertainty in the
prediction in the context of natural
language than it is in the context of
things like video and images so for
example if I ask you to predict what
words are missing you know 15 percent of
the words that I've taken out the
possibility is small that means small
right there is 100,000 words in the in
the lexicon and what the Machine spits
out is a big probability vector right
it's a bunch of numbers between 0 & 1
that's 1 to 1 and we know how to do how
to do this with computers so they are
representing uncertainty in the
prediction is relatively easy and that's
in my opinion why those techniques work
for NLP for images if you
block a piece of an image and you ask a
system to reconstruct that piece of the
image
there are many possible answers that
are all perfectly legit right and how do
you represent the set of possible
answers
you can't train a system to make one
prediction you can't train a neural net to
say here it is that's the image because
there's a whole set of things that
are compatible with it so how do you get
the machine to represent not a single
output but a whole set of outputs and you
know similarly with video prediction
there's a lot of things that can happen
in the future video you're looking at me
right now I'm not moving my head very
much but you know I might you know what
turn my my head to the left or to the
right right if you don't have a system
that can predict this and you train it
with least squares to kind of minimize
the error between the prediction and what
I'm doing
what you get is a blurry image of myself
in all possible future positions that I
might be in which is not a good prediction
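A tiny numeric sketch of why least-squares training gives blurry predictions under uncertainty (my illustration, not from the conversation): when the future is multimodal, head turns left or right, the single output minimizing mean squared error is the average of the modes, an in-between "blur" matching neither actual outcome.

```python
# Hypothetical sketch: MSE's optimal point prediction averages the modes.
import numpy as np

futures = np.array([-1.0, 1.0])   # two equally likely futures for some pixel

candidates = np.linspace(-1.5, 1.5, 301)
mse = [np.mean((futures - c) ** 2) for c in candidates]
best = candidates[int(np.argmin(mse))]
print(best)  # ~0.0: the blurry average, not a sample from either mode
```
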
but so there might be other
ways to do the self supervision right
for visual scenes like what if i I mean
if I knew I wouldn't tell you I'd
publish it first I don't know
there might be so I mean these are kind
of there might be artificial ways of
like self play in games the way you can
simulate part of the environment
oh that doesn't solve the problem it's
just a way of generating data but
because you have more control you
mean you can control yeah it's a way to
generate data and that's right and
because you can do huge amounts of data
generation that doesn't you're right it
sort of creeps up on the
problem from the side of data but you
don't think that's the right way it
doesn't solve this problem of handling
uncertainty in the world right so if you
if you have a machine learn a predictive
model of the world in a game that is
deterministic or quasi deterministic
it's easy right just you know give a few
frames of the game to a convnet with a
bunch of layers and then have it
generate the next few frames and if
the game is deterministic it works fine
and that includes you know feeding the
system with the action that your little
character is going to take
the problem comes from the fact that the
real world and certainly most games are
not entirely predictable that's where
you get those blurry predictions
and you can't do planning with blurry
predictions all right so if you have a
perfect model of the world you can in
your head run this model with a
hypothesis for a sequence of actions and
you're going to predict the outcome of
that sequence of actions but if your
model is imperfect how can you plan yeah
it quickly explodes what are your
thoughts on the extension of this which
topic I'm super excited about it's
connected to something you're talking
about in terms of robotics is active
learning so as opposed to sort of
unsupervised and self-
supervised learning you ask the system
for human help right for selecting parts
you want annotated next so if you talk
about a robot exploring a space or a
baby exploring a space or a system
exploring a data set every once in a
while asking for human input you see
value in that kind of work I don't see
transformative value it's going to make
things that we can already do more
efficient or they will learn slightly
more efficiently but it's not going to
make machines sort of significantly more
intelligent I think and I and by the way
there is no opposition there is no
conflict between self-supervised learning
reinforcement learning and supervised
learning or imitation learning or active
learning
I see self-supervised learning as a
preliminary to all of the above yes so
the example I use very often is how is
it that so if you use
reinforcement learning deep reinforcement
learning if you want the best methods
today the so-called model-free
reinforcement learning methods to learn to play
Atari games take about 80 hours of
training to reach the level that any
human can reach in about 15 minutes they
get better than humans but it takes a
long time AlphaStar okay you know
Oriol Vinyals and his team's
system to play StarCraft
plays you know a single map a single
type of player and
to reach
better than human level took about the
equivalent of 200 years of training
playing against itself it's 200 years
right it's not something that any
human could ever do I'm not sure what
to take away from that okay now
take those algorithms the best RL
algorithms we have today to train a car
to drive itself it would probably have
to drive millions of hours you will have
to kill thousands of pedestrians it will
have to run into thousands of trees it
will have to run off cliffs and it would
have to run off the cliff multiple times before
it figures out it's a bad idea first of
all
yeah and second of all before it figures
out how not to do it and so I mean this type
of learning obviously does not reflect
the kind of learning that animals and
humans do there is something missing
that's really really important there and
my hypothesis which I've been
advocating for like five years now is
that we have predictive models of the
world that include the ability to
predict under uncertainty and what
allows us to not run off a cliff when we
learn to drive most of us can learn to
drive in about 20 or 30 hours of
training without ever crashing causing
any accident if we drive next to a cliff
we know that if we turn the wheel to the
right the car is going to run off the
cliff and nothing good is gonna come out
of this because we have a pretty good
model of intuitive physics that tells us
you know the car is gonna fall we know
we know about gravity babies learn this
around the age of eight or nine months
that objects don't float they fall and
you know we have a pretty good idea of
the effect of turning the wheel of the
car and you know we know we need to stay
on the road so there is a lot of things
that we bring to the table which is
basically our predictive model of the
world and that model allows us to not do
stupid things and to basically stay
within the context of things we need to
do we still face you know unpredictable
situations and that's how we learn but
that allows us to learn really really
really quickly so that's called
model-based reinforcement learning
there's some imitation and supervised
learning because we have a driving
instructor that tells us occasionally
what to do but most of the learning is
model-based it's learning the model yeah
learning physics that we've done since we
were babies that's where almost all
our learning is and the physics is somewhat
transferable from is transferable from
scene to scene stupid things are the
same everywhere yeah I mean if you you
know you have experience of the world
you don't need to be particularly from a
particularly intelligent species to know
that if you spill water from a container
you know the rest is gonna get wet and
you might get wet so you know cats know
this right yeah so the main problem we
need to solve is how do we learn models
of the world that's and that's what I'm
interesting that's what's a supervised
learning is all about if you were to try
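To make "predicting under uncertainty" slightly more concrete, here is a minimal sketch (an editorial illustration, not something described in the conversation) of one common trick: train a small ensemble of dynamics models, use their disagreement as a crude uncertainty estimate, and reject actions whose imagined outcome is either bad or too uncertain to trust. All names and thresholds below are hypothetical.

```python
import torch
import torch.nn as nn

class DynamicsNet(nn.Module):
    """One member of an ensemble world model: (state, action) -> next state."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))

def predict_with_uncertainty(ensemble, state, action):
    """Mean prediction, plus ensemble disagreement as an uncertainty proxy."""
    preds = torch.stack([m(state, action) for m in ensemble])
    return preds.mean(dim=0), preds.std(dim=0).mean()

def is_safe(ensemble, state, action, cost_fn, cost_max=1.0, unc_max=0.5):
    """Reject actions whose predicted outcome is bad OR too uncertain to trust."""
    next_state, uncertainty = predict_with_uncertainty(ensemble, state, action)
    return bool(cost_fn(next_state) < cost_max) and bool(uncertainty < unc_max)
```

The point of the sketch is that the cliff gets rejected in imagination rather than discovered by driving off it thousands of times, which is exactly the gap between model-free trial and error and learning with a predictive model.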
If you were to try to construct a benchmark for... let's look at MNIST, I love that dataset. Do you think it's useful, interesting, or even possible to perform well on MNIST with just one example of each digit? And how would we solve that problem?
Yeah, the answer is probably yes. The question is: what other type of learning are you allowed to do? If what you're allowed to do is train on some gigantic dataset of labeled digits, that's called transfer learning, and we know that works okay. We do this at Facebook, in production: we train large convolutional nets to predict hashtags that people type on Instagram, and we train on billions of images, literally billions, and then we chop off the last layer and fine-tune on whatever task we want. That works really well; you can beat the ImageNet record with it. We actually open-sourced the whole thing a few weeks ago.

Yeah, that's still pretty cool.
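For readers who want to see what "chop off the last layer and fine-tune" looks like in code, here is a hedged sketch. The hub path below matches Facebook's public WSL-Images release of the Instagram-hashtag-pretrained ResNeXt models, which appears to be the release he mentions, but treat the repo and model names as assumptions to verify.

```python
import torch
import torch.nn as nn

# Load a backbone pretrained to predict hashtags on billions of Instagram
# images. The hub path matches the public WSL-Images release; verify the
# current name before relying on it.
model = torch.hub.load("facebookresearch/WSL-Images", "resnext101_32x8d_wsl")

# "Chop off the last layer": swap the hashtag classifier head for a fresh
# linear layer sized for the downstream task (here, 1000 ImageNet classes).
model.fc = nn.Linear(model.fc.in_features, 1000)

# Optionally freeze the backbone and train only the new head at first.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("fc.")

optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=0.01, momentum=0.9
)
# ...then run a standard cross-entropy training loop on the target dataset.
```

The design point is that the backbone has already learned general visual features from billions of weakly labeled images, so the downstream task needs far fewer labels than training from scratch would.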
But what would be impressive? What's useful and impressive, what kind of transfer learning would be useful and impressive? Is it Wikipedia, that kind of thing?

No, I don't think transfer learning is really where we should focus. What we should try to do is have a kind of scenario for a benchmark where you have only unlabeled data, and it's a very large amount of unlabeled data. It could be video clips, where you do frame prediction; it could be images, where you could choose to mask a piece of the image; it could be whatever, but it's only unlabeled, and you're not allowed to label it.
So you do some training on this, and then you train on a particular supervised task, ImageNet or MNIST, and you measure how your test error, or your validation error, decreases as you increase the number of labeled training samples. And what you would like to see is that your error decreases much faster than if you trained from scratch, from random weights: that to reach the same level of performance that a completely, purely supervised system would reach, you would need way fewer samples. That's the crucial question, because it will answer a question for, say, people interested in medical image analysis: if I want to get to a particular level of error rate for this task, and I know I'd need a million samples, can I do self-supervised pre-training to reduce this to about a hundred, or something like that?

And the answer there is self-supervised pre-training?

Yeah, some form of it.
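The evaluation protocol he sketches is simple enough to write end to end. Below is a toy, runnable version on synthetic data: the numbers it prints are meaningless, since random data has no structure for pretraining to exploit, but it shows the measurement itself, namely pretraining on unlabeled data with a masked-reconstruction pretext task, fine-tuning with n labels, and comparing the error curve against training from scratch. Every name and hyperparameter is illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic stand-ins: lots of unlabeled data, a small labeled pool.
unlabeled = torch.randn(10_000, 64)
labeled_x = torch.randn(1_000, 64)
labeled_y = torch.randint(0, 10, (1_000,))

def make_encoder() -> nn.Module:
    return nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 32))

def pretrain_masked(encoder: nn.Module, steps: int = 200) -> None:
    """Toy self-supervised pretext task: reconstruct masked-out inputs."""
    decoder = nn.Linear(32, 64)
    opt = torch.optim.Adam([*encoder.parameters(), *decoder.parameters()], lr=1e-3)
    for _ in range(steps):
        x = unlabeled[torch.randint(0, len(unlabeled), (256,))]
        mask = (torch.rand_like(x) > 0.5).float()  # hide half the input
        loss = ((decoder(encoder(x * mask)) - x) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()

def finetune_error(encoder: nn.Module, n_labels: int, steps: int = 200) -> float:
    """Fine-tune a linear head on n labeled samples; report error on the
    held-out half of the labeled pool as a stand-in for test error."""
    head = nn.Linear(32, 10)
    opt = torch.optim.Adam([*encoder.parameters(), *head.parameters()], lr=1e-3)
    x, y = labeled_x[:n_labels], labeled_y[:n_labels]
    for _ in range(steps):
        loss = nn.functional.cross_entropy(head(encoder(x)), y)
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        pred = head(encoder(labeled_x[500:])).argmax(dim=1)
    return (pred != labeled_y[500:]).float().mean().item()

# The benchmark curve: error vs. number of labels, pretrained vs. scratch.
for n in (10, 100, 500):
    scratch = finetune_error(make_encoder(), n)
    enc = make_encoder(); pretrain_masked(enc)
    print(f"n={n:4d}  scratch={scratch:.2f}  ssl-pretrained={finetune_error(enc, n):.2f}")
```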
I'm telling you, active learning. But you disagree.

No, it's not useless. It's just not going to lead to a quantum leap; it's just going to make things that we already do more efficient.

So you're way smarter than me; I just disagree with you. But I don't have anything to back that up, it's just intuition. I've worked with a lot of large-scale datasets, and there's something... there might be magic in active learning. But okay, at least I said it publicly.

At least you seeded the idea publicly, okay. And it's not useless, in the sense of working with the data you have. I mean, certainly people are doing things like: okay, I have three thousand hours of imitation learning for a car, but most of those hours are incredibly boring; what I'd like is to select the 10% of them that are the most informative, and with just that I would probably reach the same performance. So it's a weak form of active learning, if you want.

Yes, but there might be a much stronger version.

Yeah, that's right, and that's an open question. The question is how much stronger it can get.
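The "weak form" of active learning he concedes, mining a logged dataset for its most informative slice, can be sketched in a few lines. Here, purely as an assumed illustration, disagreement within a small model ensemble stands in for "informative": boring highway miles, which every model predicts the same way, score near zero.

```python
from typing import List
import torch

def select_informative(frames: torch.Tensor, models: List[torch.nn.Module],
                       keep_frac: float = 0.1) -> torch.Tensor:
    """Rank logged samples by ensemble disagreement; keep the top fraction.

    `frames` is a batch of logged inputs, shape (N, features); `models` is a
    small ensemble trained on the data so far, each mapping (N, features) to
    (N, outputs). Disagreement is a cheap proxy for 'the model is unsure here'.
    """
    with torch.no_grad():
        preds = torch.stack([m(frames) for m in models])  # (ensemble, N, outputs)
        scores = preds.std(dim=0).mean(dim=1)             # one score per frame
    k = max(1, int(keep_frac * len(frames)))
    return frames[scores.topk(k).indices]

# Toy usage: three tiny models that disagree most where the data is unusual.
models = [torch.nn.Linear(8, 4) for _ in range(3)]
frames = torch.randn(1_000, 8)
kept = select_informative(frames, models)  # the ~100 most "surprising" frames
```

A "much stronger version" would presumably close the loop: route the selected frames to annotators, retrain, and re-score. Whether that ever yields more than a constant-factor saving is exactly the disagreement above.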
Elon Musk is confident (I talked to him recently) that large-scale data and deep learning can solve the autonomous driving problem. What are your thoughts on the limitless possibilities of deep learning in this space?

Well, it's obviously part of the solution. I don't think we'll ever have a self-driving system, at least not in the foreseeable future, that does not use deep learning; let me put it this way.
Now, how much of it? In the history of engineering, particularly of AI-like systems, there's generally a first phase where everything is built by hand. Then there's a second phase, and that was the case for autonomous driving twenty or thirty years ago, where a little bit of learning is used, but there's a lot of engineering involved in taking care of corner cases and putting limits and so on, because the learning system is not perfect. And then, as the technology progresses, we end up relying more and more on learning. That's the history of character recognition, it's the history of speech recognition, now computer vision and natural language processing. And I think the same is going to happen with autonomous driving.

Currently, the methods that are closest to providing some level of autonomy, a decent level of autonomy where you don't expect a driver to do anything, are the ones where you constrain the world. You only run within, say, 100 square kilometers or square miles in Phoenix, where the weather is nice and the roads are wide, which is what Waymo is doing. You completely over-engineer the car with tons of lidars and sophisticated sensors that are too expensive for consumer cars, but they're fine if you just run a fleet. And you engineer the hell out of everything else: you map the entire world, so you have a complete 3D model of everything, and the only things the perception system has to take care of are moving objects, construction, and things that weren't in your map. And you can engineer a good SLAM system, and all that stuff. So that's the current approach that's closest to some level of autonomy. But I think eventually the long-term solution is going to rely more and more on learning, possibly using a combination of self-supervised learning and model-based reinforcement learning, or something like that. Ultimately, learning will be not just at the core but really the fundamental part of the system.

Yeah, it already is, but it'll become more and more so.
What do you think it takes to build a system with human-level intelligence? You talked about the AI system in the movie Her being way out of our current reach. This might be outdated as well, but is it still way out of reach? What would it take to build her, do you think?

So, I can tell you the first two obstacles that we have to clear, but I don't know how many obstacles there are after that. The image I usually use is that there is a bunch of mountains that we have to climb, and we can see the first one, but we don't know if there are 50 mountains behind it or not. And this might be a good metaphor for why AI researchers in the past have been overly optimistic about the results of AI. For example, Newell and Simon wrote the General Problem Solver, and they called it the General Problem Solver. And of course the first thing you realize is that all the problems you want to solve are exponential, and so you can't actually use it for anything useful.

Yeah, so all you see is the first peak. So in general, what are the first couple of peaks for her?

The first peak, which is precisely what I'm working on, is self-supervised learning. How do we get machines to learn models of the world by observation, kind of like babies and young animals?
So we've been working with cognitive scientists. Emmanuel Dupoux, who is at FAIR in Paris half-time and is also a researcher at a French university, has this chart that shows at how many months of life baby humans learn different concepts, which you can measure in various ways. Things like distinguishing animate objects from inanimate objects: you can tell the difference at age two or three months. Whether an object is going to stay stable or is going to fall: around four months. And things like gravity, the fact that objects are not supposed to float in the air but are supposed to fall, you learn around the age of eight or nine months. If you look at a lot of eight-month-old babies, and you give them a bunch of toys on their high chair, the first thing they do is throw them on the ground and then look at them. It's because they're learning, actively learning, about gravity.

Gravity, yeah.

So they're not trying to annoy you; they need to do the experiment, right?
Yeah. So how do we get machines to learn like babies, mostly by observation with a little bit of interaction, and learn those models of the world? Because I think that's really a crucial piece of an intelligent autonomous system. If you think about the architecture of an intelligent autonomous system, it needs to have a predictive model of the world: something that says, here is the state of the world at time T, and here is the state of the world at time T plus one if I take this action. And it's not a single answer; it can be...

A distribution, yeah.

Well, we don't know how to represent distributions in high-dimensional continuous spaces, so it's got to be something weaker than that, but with some representation of uncertainty. If you have that, then you can do what in optimal control theory is called model predictive control, which means that you can run your model with hypotheses for a sequence of actions and then see the result. Now, the other thing you need is some sort of objective that you want to optimize: am I reaching the goal of grabbing this object, am I minimizing energy, am I... whatever, right? So there is some sort of objective that you have to minimize, and so, in your head, if you have this model, you can figure out the sequence of actions that will optimize your objective.
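Model predictive control is easy to sketch once you grant a predictive model and an objective. Below is a minimal, assumed illustration using random shooting, one of the simplest ways to do the search: sample candidate action sequences, imagine each one with the model, and keep the cheapest. The function signatures and shapes are hypothetical.

```python
import torch

def mpc_plan(model, objective, state: torch.Tensor,
             horizon: int = 10, n_candidates: int = 256, action_dim: int = 2):
    """Random-shooting model predictive control.

    `model(states, actions) -> next_states` is the predictive world model;
    `objective(states) -> costs` is the quantity to minimize. Both are assumed
    to operate on batches; `state` has shape (state_dim,). Returns the first
    action of the imagined sequence with the lowest total predicted cost.
    """
    actions = torch.randn(n_candidates, horizon, action_dim)  # candidate plans
    states = state.expand(n_candidates, -1)                   # same start state
    total_cost = torch.zeros(n_candidates)
    for t in range(horizon):
        states = model(states, actions[:, t])  # imagine one step forward
        total_cost += objective(states)        # accumulate predicted cost
    return actions[total_cost.argmin(), 0]     # act, observe, then replan
```

In the framing here, `objective` would itself be a learned predictor of the hard-wired cost (the role he assigns to the basal ganglia below), and planning "in your head" is exactly this imagined rollout; only the first action is executed before re-planning against what actually happened.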
That objective is something that, ultimately, is rooted in your basal ganglia, at least in the human brain. That's what the basal ganglia computes: your level of contentment or discontentment.

Is that a word?

Unhappiness, okay? Discontentment. And so your entire behavior is driven towards minimizing that objective, which is maximizing your contentment, as computed by your basal ganglia. And what you have is an objective function which is basically a predictor of what your basal ganglia is going to tell you. You're not going to put your hand in the fire, because you know it's going to burn and you're going to get hurt, and you're predicting this because of your model of the world and your predictor of this objective. So really you have four components: you have the hard-wired contentment objective, the computer, if you want, the calculator; and then you have three other components. One is the objective predictor, which basically predicts your level of contentment; one is the model of the world; and there's a third module I didn't mention, which is the module that figures out the best course of action to optimize an objective, given your model.

Okay, yeah. A policy network, or something like that.
Right. Now, you need those three components to act autonomously and intelligently, and you can be stupid in three different ways. You can be stupid because your model of the world is wrong. You can be stupid because your objective is not aligned with what you actually want to achieve; in humans, that would be a psychopath. And the third way you can be stupid is that you have the right model and the right objective, but you're unable to figure out a course of action to optimize your objective given your model. Some people who are in charge of big countries actually have all three that are wrong.

All right. Which countries? I don't know.
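Written down as a skeleton (an editorial paraphrase, not an implementation from the conversation), the four-module decomposition and its three failure modes look something like this; all types and names are placeholders.

```python
from dataclasses import dataclass
from typing import Any, Callable

State, Action = Any, Any  # placeholders; a real system would use tensors

@dataclass
class AutonomousAgent:
    world_model: Callable[[State, Action], State]  # predicts the next state
    intrinsic_cost: Callable[[State], float]       # hard-wired "basal ganglia"
    critic: Callable[[State], float]               # learned predictor of that cost

    def act(self, state: State, candidates: list) -> Action:
        # One-step search: pick the candidate the critic predicts will leave
        # the agent most content (lowest predicted cost). A real agent would
        # search action sequences, e.g. with the MPC sketch earlier, and the
        # critic would be trained to match intrinsic_cost on visited states.
        return min(candidates,
                   key=lambda a: self.critic(self.world_model(state, a)))

# The three ways to be stupid, in these terms:
#   1. world_model is wrong:   confident plans toward imaginary outcomes
#   2. critic is misaligned:   optimizing the wrong thing (the psychopath)
#   3. the search is too weak: right model, right objective, bad plans
```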
So if we think about this agent, if we think about the movie Her: you've criticized the art project that is Sophia the robot, and what that project essentially does is use our natural inclination to anthropomorphize things that look human, and make more of them than they are. Do you think that could be used by AI systems, like in the movie Her? So do you think a body is needed to create a feeling of intelligence?
Well, if Sophia were just an art piece, I would have no problem with it, but it's presented as something else.

Let me add a comment on that real quick: if the creators of Sophia could change something about their marketing, or their behavior in general, what would it be?

Just about everything.

I mean, don't you think... here's a tough question. So I agree with you: the general public feels that Sophia can do way more than she actually can.

That's right.

And the people who created Sophia are not honestly, publicly communicating that, trying to teach the public. But here's the tough question: don't you think the same thing happens with scientists in industry and research, that they take advantage of the same misunderstanding in the public when they create AI companies or publish their work?

Some companies, yes. But for most of us, there's no desire to delude, there's no desire to over-claim what something has done. You publish a paper on AI that has this result on ImageNet; it's pretty clear, I mean, it's not even interesting anymore. I don't think there's much of that; the reviewers are generally not very forgiving of unsupported claims of this type. But there are certainly quite a few startups that have had a huge amount of hype around them which I find extremely damaging, and I've been calling it out when I've seen it.
But to go back to your original question about the necessity of embodiment: I don't think embodiment is necessary; I think grounding is necessary. I don't think we're going to get machines that really understand language without some level of grounding in the real world, and it's not clear to me that language is a high-enough-bandwidth medium to communicate how the real world works.

Can you talk about what grounding means to you?

So, grounding means that... there is this classic problem of common-sense reasoning, the Winograd schema. I tell you, "The trophy doesn't fit in the suitcase because it's too big," or, "The trophy doesn't fit in the suitcase because it's too small," and the "it" in the first case refers to the trophy, and in the second case to the suitcase. The reason you can figure this out is because you know what a trophy and a suitcase are: you know one is supposed to fit in the other, you know the notion of size, and that a big object doesn't fit in a small object (unless it's a TARDIS), you know, things like that. So you have this knowledge of how the world works, of geometry and things like that.
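As a concrete aside (an editorial framing, not something from the conversation), the pair he quotes is easy to write down as a test case. The point of the benchmark is that nothing on the surface of the sentence distinguishes the two readings; only world knowledge about sizes and containment does. The `resolver` interface below is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass
class WinogradSchema:
    sentence: str
    pronoun: str
    candidates: Tuple[str, str]
    answer: str  # which candidate the pronoun actually refers to

# The two sentences differ by one word, yet the referent of "it" flips.
PAIR = (
    WinogradSchema("The trophy doesn't fit in the suitcase because it's too big.",
                   "it", ("the trophy", "the suitcase"), "the trophy"),
    WinogradSchema("The trophy doesn't fit in the suitcase because it's too small.",
                   "it", ("the trophy", "the suitcase"), "the suitcase"),
)

def accuracy(resolver: Callable[[str, str, Tuple[str, str]], str]) -> float:
    """Score any coreference system on the pair. Chance is 0.5, and lexical
    cues don't help, which is what makes the schema a probe of grounding."""
    hits = sum(resolver(s.sentence, s.pronoun, s.candidates) == s.answer
               for s in PAIR)
    return hits / len(PAIR)
```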
I don't believe you can learn everything about the world by just being told, in language, how the world works. I think you need some low-level perception of the world, be it visual, touch, whatever, but some higher-bandwidth perception of the world.

So by reading all the world's text, you still may not have enough information.

That's right. There are a lot of things that just will never appear in text and that you can't really infer. So I think common sense will emerge from, certainly, a lot of language interaction, but also from watching videos, or perhaps even interacting in virtual environments, and possibly from robots interacting in the real world. But I don't actually believe this last one is necessarily needed. I think there's a need for some grounding, but the final product doesn't necessarily need to be embodied, let's say. It just needs to have an awareness, a grounding. But it needs to know how the world works in order not to be frustrating to talk to.

And you talked about emotions being important. That's a whole other topic.
Well, so, I talked about the basal ganglia as the thing that calculates your level of contentment or discontentment. Then there is this other module that tries to predict whether you're going to be content or not, and that's the source of some emotions. Fear, for example, is an anticipation of bad things that can happen to you: you have this inkling that there is some chance that something really bad is going to happen to you, and that creates fear. When you know for sure that something bad is going to happen to you, you kind of give up; it's not fear anymore. It's the uncertainty that creates fear. So the punchline is: we're not going to have autonomous intelligence without emotions, whatever the heck emotions are.
So you mentioned very practical things like fear, but there's a lot of other mess around it.

But those are kind of the results of drives.

Yeah, there's deeper biological stuff going on, and I've talked to a few folks about this; there's fascinating stuff that ultimately connects to our brain.
If we create an AGI system, sorry, a human-level intelligence system, and you get to ask her one question, what would that question be?

You know, I think the first one we'll create will probably not be that smart. It'll be like a four-year-old.

Okay, so you would have to ask her a question knowing she's not that smart.

Yeah. Well, what's a good question to ask? You know: "What causes wind?" And if she answers, "Oh, it's because the leaves of the trees are moving and that creates wind," she's on to something. And if she says, "That's a stupid question," she's really obtuse. And then you tell her, "Actually, here is the real explanation," and she says, "Oh yeah, that makes sense."

So, questions that reveal the ability to do common-sense reasoning about the physical world.

Yeah, and some level of causal inference.
Well, it was a huge honor. Congratulations on the Turing Award, and thank you so much for talking today.

Thank you.