Highlights of the Fireside Chat with Ilya Sutskever & Jensen Huang: AI Today & Vision of the Future
Summary
TLDR In this conversation, Ilya Sutskever shares the intuitions behind deep learning's success. Starting from curiosity about AI's potential impact and about human consciousness, he recounts the early challenges of machine learning and the promise of neural networks. He stresses the importance of unsupervised learning and compression, describes OpenAI's founding ideas and early projects, such as the reinforcement-learning agent for the real-time strategy game DotA 2, and discusses the evolution of the GPT models, in particular GPT-4's marked advances in understanding, reasoning, and multimodal learning, along with where the technology may go next.
Takeaways
- 🤖 The extraordinary power of deep learning has changed a great deal, including interest in AI and awareness of its impact.
- 🧠 Progress in AI may help us understand consciousness and the human experience.
- 💡 In 2002-2003, learning was something computers could not do at all, and artificial learning was seen as the key to progress in AI.
- 🌟 Neural networks were seen as holding enormous potential because they automatically program parallel computers and resemble the brain.
- 🚀 When OpenAI was founded around 2015-2016, the field was still in its infancy, with very few researchers.
- 🔍 One of OpenAI's founding ideas was unsupervised learning through compression, an unsolved problem at the time; the other was reinforcement learning.
- 📈 By predicting the next character or token, a neural network learns to compress the data and thereby uncovers the secrets hidden in it.
- 🎯 Scaling up models and training data improves performance, a belief OpenAI has held from the start.
- 🤔 The main difference between GPT-3 and GPT-4 is that GPT-4 predicts the next word more accurately, which reflects deeper understanding.
- 🌐 Multimodal learning lets neural networks learn from both text and images, improving their understanding of the world.
- 🔮 Near-term progress in language models will focus on reliability and trust, making the systems more useful.
Q & A
What intuition lay behind deep learning's success?
- The intuition came from an appreciation of AI's potential impact and a curiosity about consciousness and the human experience. Learning was seen as the key to progress in AI because it was a capability that humans had and early computers entirely lacked.
What role did Geoff Hinton play in the development of deep learning?
- Geoff Hinton played a pivotal role. He was a professor at Sutskever's university, working on neural networks. His work made Sutskever realize that neural networks automatically program parallel computers and resemble the way the brain works, giving the field enormous long-term promise.
What did OpenAI focus on in its early days?
- OpenAI's early work centered on unsupervised learning and reinforcement learning. The team believed unsupervised learning could be achieved through data compression, and explored reinforcement learning by training agents to play the real-time strategy game DotA 2.
How does the pre-training stage of a GPT model work?
- Pre-training trains a large neural network to accurately predict the next word across a wide variety of texts. This is not merely learning statistical correlations: by compressing the data, the network learns a model of the world, a compressed, abstract, usable representation of people, their hopes, dreams, motivations, and the situations they find themselves in.
How can the reliability of neural networks be improved?
- Through further research and better training methods: teaching the network to ask for clarification when it doesn't understand, to request more information when it is unsure, and to say clearly when it doesn't know something. These improvements would make neural networks far more trustworthy in practice.
How does GPT-4 improve on ChatGPT?
- GPT-4 improves substantially on ChatGPT across many dimensions. Its base model predicts the next word with higher accuracy, reflecting a deeper understanding of text. It also shows markedly better mathematical problem solving, instruction following, and visual understanding.
Why does multimodal learning matter?
- It extends a neural network's understanding beyond text to images. This strengthens its model of the world, and visual information can teach things that are hard to pick up from text alone.
What can GPT-4 do in visual understanding?
- GPT-4 can interpret image content: explaining jokes and memes, and walking through complicated charts and diagrams. It can therefore handle image inputs as well as text, giving more complete answers.
What surprised Sutskever most about the development of deep learning?
- That it actually works. After all these years, the same simple neural-network ideas, scaled up with bigger datasets and better training, delivered such remarkable results.
What progress might language models make in the next year or two?
- Significant gains in reliability: systems you can trust, that understand and accurately follow user intent, ask for clarification when confused, and admit when they don't know something.
What is impressive about GPT-4's mathematical problem solving?
- GPT-4 can carry out long, complicated derivations and unit conversions, showing that it can not only understand mathematical concepts but apply them to solve problems.
Outlines
🤖 Intuition about AI and the origins of deep learning
The conversation opens with Sutskever's early interest in AI, rooted in an appreciation of its potential impact and curiosity about consciousness and the human experience. In 2002-2003, computers could not learn anything useful, which seemed the central obstacle to AI. Geoff Hinton's work on neural networks changed that: their resemblance to the brain and their nature as automatically programmed parallel computers made them, despite many unknowns, by far the most promising long-term direction.
🌟 The breakthrough of unsupervised learning and OpenAI's early explorations
In 2016, unsupervised learning was still an unsolved problem. The idea that compressing data well would yield unsupervised learning gradually became concrete, and the GPT models later demonstrated the connection between compression and learning. The sentiment-neuron work was an influential early validation. OpenAI was founded to explore how to use scale to advance AI, particularly in deep learning and reinforcement learning.
🎮 Reinforcement learning applied to a real-time strategy game
OpenAI's first major reinforcement-learning project trained an agent to play the real-time strategy game DotA 2, demonstrating AI's potential on complex tasks. The conversation also traces how this line of work morphed into reinforcement learning from human feedback, a key ingredient of ChatGPT.
🚀 The evolution of GPT and the system around ChatGPT
ChatGPT is not just one large language model; it is surrounded by a system that shapes its behavior. The pre-training stage learns a world model from text, and fine-tuning with human feedback makes the model reliable and useful. The conversation also covers the main differences between GPT-3 and GPT-4 and how those improvements show up across benchmarks.
🧠 Neural networks' reasoning ability and GPT-4's limitations
Neural networks still have limits in reasoning, but techniques such as asking the network to think out loud improve it. Reliability remains the biggest challenge, and further research should raise performance. The discussion also covers why multimodality matters and how GPT-4 combines text and images to understand the world better.
📈 Where language models go next
Specific predictions are hard, but progress will continue and systems will keep astounding us. The current research frontier is reliability: making outputs trustworthy. The conversation also highlights GPT-4's surprising abilities in mathematical problem solving, poetry, and visual explanation, which exceeded expectations.
🌈 The success of deep learning and surprises ahead
Summing up nearly 20 years in the field, Sutskever says the biggest surprise is that deep learning simply works: the same basic neural-network concepts and training algorithms, scaled from small networks to today's large, complex models, turned out to be correct all along.
Keywords
💡Deep learning
💡Artificial intelligence
💡Neural networks
💡Unsupervised learning
💡OpenAI
💡Reinforcement learning
💡ChatGPT
💡Multimodal learning
💡Scalability
💡Reliability
Highlights
The transformative power of deep learning: a great deal has changed thanks to it.
A personal interest in AI rooted in curiosity about consciousness and the human experience, and the hope that AI progress would illuminate them.
In 2002-2003, learning was something humans could do and computers could not, making it the key to AI progress.
Neural networks' nature as automatically programmed parallel computers, and their resemblance to the brain, made them the most promising direction for AI.
OpenAI's founding and early focus, including its interest in unsupervised learning and compression.
The theory that compressing data yields unsupervised learning, and the fact that GPT models really do compress their training data.
The discovery of the sentiment neuron, demonstrating the potential of unsupervised learning.
OpenAI's two founding ideas: unsupervised learning and reinforcement learning.
Tackling a real-time strategy game, DotA 2, with reinforcement learning.
The evolution of the GPT models and how they became the foundation for OpenAI's other work.
What large-scale pre-training does: learning a world model by predicting the next word.
GPT-4's improvements over ChatGPT, including more accurate next-word prediction.
GPT-4's multimodal abilities: understanding and responding to both text and image inputs.
Why multimodality matters and how it deepens understanding of the world.
GPT-4's gains in reliability and how they affect its usefulness.
GPT-4's abilities in solving math problems, writing poems, and explaining jokes.
Surprise at GPT-4's reliability and problem-solving ability.
Predictions for the next year or two of language models, especially progress in reliability and trust.
The success of deep learning, its impact on the AI field, and the vindication of a long-held belief.
Transcripts
Jensen Huang: What was your intuition around deep learning? Why did you know that it was going to work? Did you have any intuition that it was going to lead to this kind of success?
Ilya Sutskever: First of all, thank you so much for all the kind words. A lot has changed thanks to the incredible power of deep learning. My personal starting point: I was interested in artificial intelligence for a whole variety of reasons, starting from an intuitive appreciation of its impact, and also a lot of curiosity about what consciousness is, what the human experience is. It felt like progress in artificial intelligence would help with that.

The next step was, well, back then I was starting out in 2002-2003, and it seemed like learning was the thing that humans could do that computers couldn't do at all. In 2002-2003, computers could not learn anything, and it wasn't even clear that it was possible in theory. So I thought that making progress in learning, in artificial learning, in machine learning, would lead to the greatest progress in AI.

Then I started to look around for what was out there, and nothing seemed too promising. But to my great luck, Geoff Hinton was a professor at my university, and I was able to find him. He was working on neural networks, and it immediately made sense, because neural networks had the property that we are learning, we are automatically programming parallel computers. Back then the parallel computers were small, but the promise was that if you could somehow figure out how learning in neural networks works, then you could program small parallel computers from data. It was also similar enough to the brain, and the brain works, so you had these several factors going for it. It wasn't clear how to get it to work, but of all the things that existed, it seemed to have by far the greatest long-term promise.

Jensen Huang: The Big Bang of AI. Fast forward to now: you came out to the Valley, you started OpenAI with some friends, and you're the chief scientist. What was the initial idea about what to work on at OpenAI? You worked on several things, and you can see that some of the trails of invention led up to the ChatGPT moment. What was the initial inspiration? How would you approach intelligence from that moment, and what led to this?

Ilya Sutskever: Obviously, when we started, it wasn't 100% clear how to proceed, and the field was very different from the way it is right now. Right now we are already used to these amazing artifacts, these amazing neural nets doing incredible things, and everyone is so excited. But back in 2015 and early 2016, when we were starting out, the whole thing seemed pretty crazy. There were so many fewer researchers, maybe between a hundred and a thousand times fewer people in the field compared to now. Back then you had maybe a hundred people, most of them working at Google/DeepMind, and that was that. Other people were picking up the skills, but expertise was still very, very scarce.

We had two big initial ideas at the start of OpenAI that had a lot of staying power, and they have stayed with us to this day. I'll describe them right now.

The first big idea, one which I was especially excited about very early on, is the idea of unsupervised learning through compression. Some context: today we take it for granted that unsupervised learning is this easy thing, you just pre-train on everything and it all does exactly as you'd expect. In 2016, unsupervised learning was an unsolved problem in machine learning that no one had any insight, any clue, as to what to do. Yann LeCun would go around giving talks saying that unsupervised learning was the grand challenge. And I really believed that really good compression of the data would lead to unsupervised learning.

Now, compression is not language that was commonly used to describe what is really being done, until recently, when it suddenly became apparent to many people that those GPTs actually compress the training data. You may recall Ted Chiang's New Yorker article, which also alluded to this. But there is a real mathematical sense in which training these autoregressive generative models compresses the data, and intuitively you can see why that should work: if you compress the data really well, you must extract all the hidden secrets which exist in it. Therefore, that is the key.

So that was the first idea we were really excited about, and it led to quite a few works at OpenAI, including the sentiment neuron, which I'll mention very briefly. This work might not be well known outside the machine learning field, but it was very influential, especially in our thinking. The result was that when you train a neural network, and back then it was not a Transformer, it was before the Transformer, a small recurrent network, an LSTM...

Jensen Huang: Some of the work that you've done yourself.

Ilya Sutskever: Right, the same LSTM with a few twists, trained to predict the next token, the next character, in Amazon reviews. And we discovered that if you predict the next character well enough, there will be a neuron inside that LSTM that corresponds to the sentiment of the review. That was really cool, because it showed some traction for unsupervised learning, and it validated the idea that really good next-character prediction, next-something prediction, compression, has the property that it discovers the secrets in the data.

Jensen Huang: That's what we see with these GPT models. You train them, and people say it's just statistical correlation, but at this point it should be so clear to anyone. That observation, for me, intuitively opened up the whole world of where to get the data for unsupervised learning. Because there is a whole lot of data: if I can just make you predict the next character, and I know what the ground truth is, I know what the answer is, I can train a neural network with that. That observation, and masking, and other approaches, opened my mind about where the world would get all the data that's unsupervised, for unsupervised learning.
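The compression framing above can be made concrete with a toy sketch. This is purely illustrative, not anything OpenAI built: a model that assigns probability p to the character that actually occurs next can, via arithmetic coding, encode it in about -log2(p) bits, so better next-character prediction means better compression.

```python
import math

def bits_to_encode(text, model):
    """Total bits needed to encode `text` under a next-character model.
    `model` maps a context character to a dict of next-char probabilities."""
    total = 0.0
    for prev, nxt in zip(text, text[1:]):
        p = model.get(prev, {}).get(nxt, 1e-6)  # tiny floor for unseen pairs
        total += -math.log2(p)                  # arithmetic-coding cost in bits
    return total

text = "abababababababab"
# A model that has learned the alternating pattern...
good = {"a": {"b": 0.99, "a": 0.01}, "b": {"a": 0.99, "b": 0.01}}
# ...versus one that predicts uniformly at random.
uniform = {"a": {"a": 0.5, "b": 0.5}, "b": {"a": 0.5, "b": 0.5}}

# The better predictor compresses the same text into far fewer bits.
assert bits_to_encode(text, good) < bits_to_encode(text, uniform)
```

The uniform model pays exactly 1 bit per character; the model that discovered the pattern pays almost nothing, which is the sense in which prediction quality and compression are the same quantity.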
Jensen Huang: You've always believed that scaling would improve the performance of these models: larger networks, deeper networks, more training data. OpenAI wrote a very important paper about the scaling laws, the relationship between loss and the size of the model and the size of the dataset. When Transformers came out, they gave us the opportunity to train very large models in a very reasonable amount of time. In your journey of GPT-1, 2, 3, which came first: did you see the evidence of GPT-1 through 3 first, or was the intuition about the scaling laws there first?
Ilya Sutskever: The intuition. The way I'd phrase it is that I had a very strong belief that bigger is better, and one of the goals that we had at OpenAI was to figure out how to use scale correctly. There was a lot of belief at OpenAI about scale from the very beginning. The question was what to use it for precisely.
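The scaling-law relationship mentioned here, loss falling smoothly as a power law in model size, can be sketched in a few lines. The constants below are placeholders chosen in the spirit of the published scaling-law fits, not exact values from any paper:

```python
def loss(n_params, n_c=8.8e13, alpha=0.076, irreducible=1.69):
    """Illustrative scaling-law curve: a power-law term that shrinks with
    model size, plus an irreducible-entropy floor. Constants are placeholders."""
    return irreducible + (n_c / n_params) ** alpha

sizes = [1e8, 1e9, 1e10, 1e11]   # 100M -> 100B parameters
losses = [loss(n) for n in sizes]

# Bigger is better: loss decreases monotonically with scale,
# with diminishing returns as it approaches the floor.
assert all(a > b for a, b in zip(losses, losses[1:]))
```

The practical reading is the one Sutskever gives: the curve is smooth and predictable, so the open question was never whether scale helps, but what objective to spend it on.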
I'll mention it right now. We're talking about the GPTs, but there is another very important line of work which I haven't mentioned, the second big idea, and I think now is a good time to make a detour: reinforcement learning. It clearly seemed important as well. What do you do with it?

So the first really big project that was done inside OpenAI was our effort at solving a real-time strategy game. For context, a real-time strategy game is like a competitive sport: you need to be smart, you need a quick reaction time, there's teamwork, and you're competing against another team. It's pretty involved, and there is a whole competitive league for the game. The game is called DotA 2. So we trained a reinforcement learning agent to play against itself, with the goal of reaching a level where it could compete against the best players in the world. That was a major undertaking as well, and it was a very different line of work: reinforcement learning.

Jensen Huang: I remember the day you guys announced that work. This is, by the way, what I was asking about earlier: there's a large body of work that has come out of OpenAI, and some of it seemed like detours. But in fact, as you're explaining now, these seeming detours really led up to some of the important work that we're now talking about, GPT.

Ilya Sutskever: Yeah. There has been real convergence, where the GPTs produce the foundation, and the reinforcement learning from DotA morphed into reinforcement learning from human feedback.

Jensen Huang: That's right, and that combination gave us ChatGPT. You know, there's a misunderstanding that ChatGPT is in itself just one giant large language model; there's a system around it that's fairly complicated. Could you explain briefly, for the audience, the fine-tuning, the reinforcement learning, the various surrounding systems that allow you to keep it on rails, give it knowledge, and so on?

Ilya Sutskever: Yeah, I can.
So the way to think about it is that when we train a large neural network to accurately predict the next word in lots of different texts from the internet, what we are doing is learning a world model. It may look on the surface that we are just learning statistical correlations in text, but it turns out that to just learn the statistical correlations in text, to compress them really well, what the neural network learns is some representation of the process that produced the text. This text is actually a projection of the world. There is a world out there, and it has a projection onto this text. And so what the neural network is learning is more and more aspects of the world, of people, of the human condition: their hopes, dreams, and motivations, their interactions, and the situations that we are in. The neural network learns a compressed, abstract, usable representation of that. This is what's being learned from accurately predicting the next word. And furthermore, the more accurate you are at predicting the next word, the higher the fidelity and the more resolution you get in this process.

So that's what the pre-training stage does. But what this does not do is specify the desired behavior that we wish our neural network to exhibit. You see, a language model really tries to answer the following question: if I had some random piece of text on the internet, which starts with some prefix, some prompt, what will it complete to? But this is different from: well, I want to have an assistant which will be truthful, that will be helpful, that will follow certain rules and not violate them. That requires additional training. This is where the fine-tuning and the reinforcement learning from human teachers, and other forms of AI assistance, come in. It's not just reinforcement learning from human teachers; it's also reinforcement learning from human and AI collaboration. Our teachers are working together with an AI to teach our AI to behave. But here we are not teaching it new knowledge. This is not what's happening. We are communicating with it; we are communicating to it what it is that we want it to be. And this process, the second stage, is also extremely important. The better we do the second stage, the more useful and more reliable this neural network will be. So the second stage is extremely important too, in addition to the first stage, whose purpose is to learn everything, to learn as much as you can about the world from the projection of the world.
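The two stages can be caricatured in a few lines of code. This is a deliberately crude stand-in: real second-stage training uses reinforcement learning from human feedback rather than the best-of-n reranking shown here, and every candidate answer and score below is invented for the example.

```python
# Stage 1 (pre-training) yields a generative model; here we fake it with
# a fixed list of plausible completions a base model might sample.
candidates = [
    "I don't know.",
    "The capital of France is Paris.",
    "The capital of France is definitely Lyon!",
]

def reward_model(prompt, completion):
    """Toy stand-in for a learned human-preference reward model:
    favor the correct answer, punish confident wrongness the hardest."""
    score = 0.0
    if "Paris" in completion:
        score += 1.0   # factually correct
    if "Lyon" in completion:
        score -= 1.0   # confidently wrong
    if completion == "I don't know.":
        score += 0.1   # honest uncertainty beats a confident wrong answer
    return score

def best_of(prompt, samples):
    """Stage-2 caricature: rerank stage-1 samples by the reward model."""
    return max(samples, key=lambda c: reward_model(prompt, c))

print(best_of("What is the capital of France?", candidates))
# prints: The capital of France is Paris.
```

The point of the sketch matches Sutskever's description: the second stage adds no knowledge (the candidates already existed); it only communicates which behavior we want.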
Jensen Huang: ChatGPT came out just a few months ago, the fastest-growing application in the history of humanity. There are lots of interpretations about why, but some things are clear. It is the easiest application that anyone has ever created for anyone to use. It performs tasks, it does things, that are beyond people's expectations. Anyone can use it; there are no instruction sets, there are no wrong ways to use it. You just use it, and if your instructions or prompts are ambiguous, the conversation refines the ambiguity until your intent is understood by the application, by the AI. The impact is clearly remarkable. Now, yesterday, this is the day after GPT-4. Just a few months later, the performance of GPT-4 in many areas is astounding: SAT scores, GRE scores, bar exams, the number of tests it is able to perform at very capable human levels. What were the major differences between ChatGPT and GPT-4 that led to its improvements in these areas?
Ilya Sutskever: GPT-4 is a pretty substantial improvement on top of ChatGPT, across very many dimensions. We trained GPT-4 more than six months ago, maybe eight months ago, I don't remember exactly.

The first big difference between ChatGPT and GPT-4, and perhaps the most important difference, is that the base model on top of which GPT-4 is built predicts the next word with greater accuracy. This is really important, because the better a neural network can predict the next word in text, the more it understands it. This claim is now perhaps accepted by many, but it might still not be completely intuitive why that is. So I'd like to take a small detour and give an analogy that will hopefully clarify why more accurate prediction of the next word leads to more understanding, real understanding.

Let's consider an example. Say you read a detective novel. It's a complicated plot, a storyline, different characters, lots of events, mysteries, clues; it's unclear. Then, at the last page of the book, the detective has gathered all the clues and all the people and says: OK, I'm going to reveal the identity of whoever committed the crime, and that person's name is... Predict that word.
Jensen Huang: Predict that word. Exactly, my goodness. There are many different possible words, but by predicting those words better and better and better, the understanding of the text keeps on increasing. GPT-4 predicts the next word better. People say that deep learning won't lead to reasoning, but in order to predict that next word, to figure out, from all of the characters that were there, and all of their strengths or weaknesses or intentions, and the context, who the murderer was, requires some amount of reasoning, a fair amount of reasoning. So how is it that it's able to learn reasoning? And if it did learn reasoning, one of the things I was going to ask you is this: of all the tests that were taken by ChatGPT and GPT-4, there were some tests that GPT-3 or ChatGPT was already very good at, some tests that GPT-3 or ChatGPT was not as good at but GPT-4 was much better at, and some tests that neither is good at yet. Some of it has to do with reasoning, it seems: in calculus, for instance, it wasn't able to break the problem down into its reasonable steps and solve it. Yet in some areas it seems to demonstrate reasoning skills. So, in predicting the next word, is it learning reasoning? And what are the limitations of GPT-4 that, if addressed, would enhance its ability to reason even further?
Ilya Sutskever: You know, reasoning isn't this super well-defined concept, but we can try to define it anyway: it's when you're able to somehow think about something a little bit further and get a better answer because of your reasoning. And I'd say that with our neural nets, maybe there is some kind of limitation which could be addressed, for example, by asking the neural network to think out loud. This has proven to be extremely effective for reasoning. But I think it also remains to be seen just how far the basic neural network will go; I think we have yet to fully tap its potential. There is definitely some sense in which reasoning is still not quite at the level of some of the other capabilities of the neural network, and we would like the reasoning capabilities to be higher. I think it's fairly likely that business as usual will keep improving the reasoning capabilities of the neural network. I wouldn't confidently rule out that possibility.
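The "think out loud" technique is, at its simplest, a prompting pattern (often called chain-of-thought). A minimal sketch follows; the prompt wording and the `Answer:` convention are assumptions for illustration, and the model call itself is left abstract since the exact API is beside the point:

```python
def direct_prompt(question):
    return f"Q: {question}\nA:"

def think_out_loud_prompt(question):
    # Ask the model to produce intermediate reasoning before the final
    # answer, and to mark the answer so it can be extracted mechanically.
    return (
        f"Q: {question}\n"
        "Work through the problem step by step, then give the result "
        "on a final line starting with 'Answer:'.\nA:"
    )

def extract_answer(completion):
    """Pull the final answer out of a step-by-step completion."""
    for line in reversed(completion.splitlines()):
        if line.strip().startswith("Answer:"):
            return line.split("Answer:", 1)[1].strip()
    return None

# A hypothetical completion in the requested format:
completion = "23 - 20 = 3 apples left.\n3 + 6 = 9 apples.\nAnswer: 9"
assert extract_answer(completion) == "9"
```

The design point is that the intermediate lines are not wasted tokens: each step conditions the prediction of the next one, which is exactly the mechanism Sutskever describes.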
Jensen Huang: Yeah, because one of the things that is really cool is that you can ask ChatGPT a question and say: before you answer the question, first tell me what you know. Usually when somebody answers a question, if you give me the foundational knowledge that you have, or the foundational assumptions that you're making, before you answer the question, that really improves my belief in the answer. You're also demonstrating some level of reasoning. So it seems to me that ChatGPT has this capability inherently embedded in it.
Ilya Sutskever: To some degree, yeah. One way to think about what's happening now is that these neural networks have a lot of these capabilities; they're just not quite very reliable. In fact, you could say that reliability is currently the single biggest obstacle to these neural networks being useful, truly useful. Sometimes it is still the case that these neural networks hallucinate a little bit, or make some mistakes which are unexpected, which you wouldn't expect a person to make. It is this kind of unreliability that makes them substantially less useful. But I think that perhaps with a little bit more research, with the current ideas that we have, and perhaps a few more ambitious research plans, we'll be able to achieve higher reliability as well. Then it will be truly useful; it will allow us to have very accurate guardrails, which are very precise.

Jensen Huang: That's right.

Ilya Sutskever: And it will ask for clarification where it's unsure, or say that it doesn't know something when it doesn't know, and do so extremely reliably. So I'd say that these are some of the bottlenecks, really. It's not about whether it exhibits some particular capability, but how reliably, to what degree.

Jensen Huang: Exactly.
Jensen Huang: Multimodality. GPT-4 has the ability to learn from text and images and respond to input from text and images. First of all, the foundation of multimodal learning: of course, Transformers have made it possible for us to learn from multimodality, tokenized text and images. But at the foundational level, help us understand how multimodality enhances the understanding of the world beyond text by itself. My understanding is that when you do multimodal learning, even the text understanding, even with just a text prompt, can actually be enhanced. Tell us about multimodality at the foundation: why it's so important, what the major breakthrough was, and the characteristic differences that result.
Ilya Sutskever: There are two dimensions to multimodality, two reasons why it is interesting. The first reason is a little bit humble: multimodality is useful. It is useful for a neural network to see, vision in particular, because the world is very visual. Human beings are very visual animals; I believe that a third of the human cortex is dedicated to vision. And so, by not having vision, the usefulness of our neural networks, though still considerable, is not as big as it could be. It is a very simple usefulness argument: it is simply useful to see, and GPT-4 can see quite well.

The second reason for vision is that we learn more about the world by learning from images in addition to learning from text. That is also a powerful argument, though it is not as clear-cut as it may seem. I'll give you an example, or rather, before giving an example, let me make a general comment. A human being gets to hear about one billion words in an entire lifetime.

Jensen Huang: Only one billion words? That's amazing.

Ilya Sutskever: That's not a lot, right? And that doesn't include the words in my own head. Make it two billion, but you see what I mean. We can see that because a billion seconds is about 30 years, and we don't get to hear more than a few words a second, and we're asleep half the time, so a couple of billion words is the total we get in our entire life. So it becomes really important for us to get as many sources of information as we can, and we absolutely learn a lot more from vision. The same argument holds true for our neural networks as well.
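The back-of-the-envelope arithmetic here is easy to verify (the lifetime, waking fraction, and words-per-second figures are rough assumptions in the spirit of the conversation):

```python
SECONDS_PER_YEAR = 365 * 24 * 3600               # about 3.15e7

# "A billion seconds is about 30 years."
years_for_a_billion_seconds = 1e9 / SECONDS_PER_YEAR
assert 30 < years_for_a_billion_seconds < 33

# A few words per waking second over a lifetime lands on the order of
# a billion or two words, matching the estimate in the conversation.
lifetime_years = 60
waking_fraction = 0.5     # "asleep half the time"
words_per_second = 2      # rough listening/reading rate
words = lifetime_years * SECONDS_PER_YEAR * waking_fraction * words_per_second
assert 1e9 < words < 3e9
```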
Except for the fact that the neural network can learn from so many more words. Things which are hard to learn about the world from text, in a few billion words, may become easier from trillions of words. I'll give you an example: consider colors. Surely one needs to see to understand colors. And yet the text-only neural networks, which have never seen a single photon in their entire life, will, if you ask them which colors are more similar to each other, know that red is more similar to orange than to blue, and that blue is more similar to purple than to yellow. How does that happen? One answer is that information about the world, even the visual information, slowly leaks in through text. Slowly, not as quickly, but when you have a lot of text, you can still learn a lot. Of course, once you also add vision, and learning about the world from vision, you will learn additional things which are not captured in text. But I would not say that it is binary, with some things simply impossible to learn from text only; I think of it more as an exchange rate. And in particular, if, like a human being, you want to learn from a billion words or a hundred million words, then of course the other sources of information become far more important.
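The phrase "tokenized text and images" can be illustrated with a toy sketch of how two modalities end up in one Transformer input sequence. Everything here, the vocabulary, the sentinel tokens, and the crude patch quantizer, is invented for illustration; real systems use learned tokenizers and patch embeddings:

```python
TEXT_VOCAB = {"<img>": 0, "</img>": 1, "what": 2, "color": 3, "?": 4}

def tokenize_text(words):
    return [TEXT_VOCAB[w] for w in words]

def tokenize_image(pixels, patch=2, vocab_offset=100):
    """Split a small grayscale 'image' (2D list) into patch x patch blocks
    and map each block to a discrete token id via a crude mean quantizer."""
    tokens = []
    for r in range(0, len(pixels), patch):
        for c in range(0, len(pixels[0]), patch):
            block = [pixels[r + i][c + j] for i in range(patch) for j in range(patch)]
            tokens.append(vocab_offset + sum(block) // len(block))
    return tokens

image = [[0, 0, 255, 255],
         [0, 0, 255, 255],
         [255, 255, 0, 0],
         [255, 255, 0, 0]]

# One interleaved sequence: image tokens wrapped in sentinels, then text.
# A Transformer attends over all of it uniformly; that is the whole trick.
sequence = (
    [TEXT_VOCAB["<img>"]]
    + tokenize_image(image)
    + [TEXT_VOCAB["</img>"]]
    + tokenize_text(["what", "color", "?"])
)
assert len(sequence) == 1 + 4 + 1 + 3  # 4 patches from a 4x4 image
```

Once both modalities are just token ids in one stream, the next-token objective described earlier applies unchanged, which is why vision slots into the same training recipe.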
Jensen Huang: In the context of the scores I saw, the thing that was really interesting was the data you guys published on which of the tests were performed well by GPT-3 and which of the tests were performed substantially better by GPT-4. How did multimodality contribute to those tests, do you think?

Ilya Sutskever: In a pretty straightforward way: any time a test had a problem where, to understand the problem, you need to look at a diagram. For example, there is a math competition for high school students called the AMC 12, and presumably many of the problems have a diagram. GPT-3.5 does quite badly on that test. GPT-4 with text only gets, I don't remember exactly, maybe from a 2% to a 20% success rate. But then, when you add vision, it jumps to a 40% success rate. So the vision is really doing a lot of work; the vision is extremely good. And I think being able to reason visually as well, and to communicate visually, will also be very powerful and very nice, things which go beyond just learning about the world. There are several things: you can learn about the world, you can reason about the world visually, and you can communicate visually. In some future version, perhaps, if you ask your neural net, hey, explain this to me, rather than just producing four paragraphs it will produce a little diagram which clearly conveys exactly what you need to know.

Jensen Huang: And so that's incredible. Tell us whatever you can about where we are now, and what you think the not-too-distant future will bring. Pick your own horizon, a year or two: where do you think this whole language model area will be, in some of the areas that you're most excited about?
Ilya Sutskever: You know, predictions are hard, and it's a little difficult to say things which are too specific. I think it's safe to assume that progress will continue, and that we will keep on seeing systems which astound us in the things that they can do. The current frontiers will be centered around reliability, around the system being able to be trusted: really getting to a point where you can trust what it produces, really getting to a point where, if it doesn't understand something, it asks for clarification, says that it doesn't know something, or says that it needs more information. I think those are perhaps the areas where improvement will lead to the biggest impact on the usefulness of these systems, because right now that's really what stands in the way. You ask a neural net to, say, summarize some long document, and you get a summary. But are you sure that some important detail wasn't omitted? It's still a useful summary, but it's a different story when you know that all the important points have been covered. In particular, it's OK if there is some ambiguity, but if a point is clearly important, such that anyone else who saw it would say this is really important, then the neural network will also recognize that reliably. That's when you know. The same goes for the guardrails, and the same for its ability to clearly follow the intent of the user, of its operator. So I think we'll see a lot of that in the next two years.

Jensen Huang: Yeah, because the progress in those two areas will make this technology trusted by people to use, and applicable to so many things.
Jensen Huang: I was thinking that was going to be the last question, but I do have another one, sorry about it. So, ChatGPT to GPT-4: when you first started using GPT-4, what are some of the skills that it demonstrated that surprised even you?
Ilya Sutskever: Well, there were lots of really cool things that it demonstrated which were quite surprising. It was quite good. I'll mention two. Let's see, I'm just trying to think about the best way to go about it. The short answer is that the level of its reliability was surprising. With the previous neural networks, if you asked them a question, sometimes they might misunderstand something in a kind of a silly way; with GPT-4, that stopped happening. Its ability to solve math problems became far greater: you could really, say, sometimes do the derivation, a long, complicated derivation, convert the units and so on, and that was really cool.

Jensen Huang: You know, like many people, I've asked: what's your proof? And it works through a proof. It's pretty amazing.

Ilya Sutskever: Not all proofs, naturally, but quite a few. Or another example: many people noticed that it has the ability to produce poems with, you know, every word starting with the same letter. It follows instructions really, really clearly. Not perfectly still, but much better than before. And on the vision side, I really love how it can explain jokes; it can explain memes. You show it a meme and ask it why it's funny, and it will tell you, and it will be correct. The vision part is also very strong; it's like really, actually seeing it, when you can ask follow-up questions about some complicated image with a complicated diagram and get an explanation. That's really cool.

But overall, I will say, to take a step back: you know, I've been in this business for quite some time, actually almost exactly 20 years, and the thing which I find most surprising is that it actually works. It turned out to be the same little thing all along, which is no longer little, and is a lot more serious and much more intense. But it's the same neural network, just larger, trained on maybe larger datasets in different ways, with the same fundamental training algorithm. So it's like: wow. I would say this is what I find the most surprising. Whenever I take a step back, I go: how is it possible that those ideas, those conceptual ideas about, well, the brain has neurons, so maybe artificial neurons are just as good, and so maybe we just need to train them somehow with some learning algorithm, that those arguments turned out to be so incredibly correct? That would be the biggest surprise.