Highlights of the Fireside Chat with Ilya Sutskever & Jensen Huang: AI Today & Vision of the Future

Xiao Yang
16 May 2023 · 36:08

Summary

TLDR: In this interview, Ilya Sutskever shares the intuitions behind deep learning and the thinking behind its success. Starting from AI's potential impact and his curiosity about human consciousness, he discusses the early challenges of machine learning and the promise of neural networks. He stresses the importance of unsupervised learning and compression, and describes OpenAI's founding motivation and early projects, such as its reinforcement learning research on the real-time strategy game DotA 2. He also discusses the development of the GPT models, in particular GPT-4's marked progress in understanding, reasoning, and multimodal learning, and where these technologies may go next.

Takeaways

  • 🤖 The sheer power of deep learning has changed a great deal, including the level of interest in AI and the awareness of its impact.
  • 🧠 Progress in AI can help us understand consciousness and the human experience.
  • 💡 In 2002-2003, learning was something computers could not do at all, and machine learning was seen as the key to progress in AI.
  • 🌟 Neural networks were seen as having huge potential because training them automatically programs parallel computers, and they resemble the brain.
  • 🚀 When OpenAI was founded around 2015-2016, the field was still in its infancy, with very few researchers.
  • 🔍 One of OpenAI's two founding ideas was unsupervised learning through compression, which at the time was an unsolved problem.
  • 📈 By predicting the next character or token, a neural network learns to compress the data and thereby uncovers the secrets hidden in it.
  • 🎯 Scaling up models and adding training data improves performance, a belief OpenAI held from the very beginning.
  • 🤔 The main difference between ChatGPT and GPT-4 is that GPT-4 predicts the next word more accurately, which reflects deeper understanding.
  • 🌐 Multimodal learning lets a neural network learn from both text and images, improving its understanding of the world.
  • 🔮 Future work on language models will focus on improving reliability and trust, making the systems genuinely useful.

Q & A

  • What was the intuition behind deep learning's success?

    - The intuition came from an intuitive appreciation of AI's potential impact and from curiosity about consciousness and the human experience. Deep learning was seen as the key to progress in AI because it concerns a machine's ability to learn, which early computers could not do at all.

  • What role did Geoff Hinton play in the development of deep learning?

    - Geoff Hinton played a pivotal role. He was a professor at the speaker's university, working on neural networks. His work made the speaker realize that neural networks automatically program parallel computers in a way that resembles the brain, which gave the field enormous long-term promise.

  • What was the focus of OpenAI's early work?

    - OpenAI's early work centered on unsupervised learning and reinforcement learning. The team believed that unsupervised learning could be achieved through data compression, and explored reinforcement learning through research on real-time strategy games such as DotA 2 (the prediction-compression link is sketched below).
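To make the compression-learning link concrete: a model's average next-token cross-entropy is, via arithmetic coding, the number of bits it would need to encode its training text. A minimal sketch of that conversion in Python; the loss values below are made-up placeholders.

```python
import math

# Compression view of next-token prediction: a model that assigns probability
# p to the observed next token pays -log2(p) bits for it (arithmetic coding).
# So average cross-entropy converts directly to a compressed size.
def bits_per_token(avg_cross_entropy_nats: float) -> float:
    return avg_cross_entropy_nats / math.log(2)

def compressed_size_bytes(num_tokens: int, avg_cross_entropy_nats: float) -> float:
    return num_tokens * bits_per_token(avg_cross_entropy_nats) / 8

# Illustrative numbers only: a better language model (lower loss) is,
# in this exact sense, a better compressor of its training data.
print(compressed_size_bytes(1_000_000, 3.0))  # weaker model: ~541 KB per 1M tokens
print(compressed_size_bytes(1_000_000, 2.2))  # stronger model: ~397 KB per 1M tokens
```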

  • How does the GPT pre-training stage work?

    - In pre-training, a large neural network is trained to accurately predict the next word across lots of different texts. This is not merely learning statistical correlations: to compress the data well, the network must learn a model of the world that produced the text, ending up with a compressed, abstract, usable representation of the human condition, of people's hopes, dreams, and motivations, and of the situations they find themselves in. (A minimal sketch of the objective follows.)
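A minimal sketch of the pre-training objective described above: shift each token sequence by one position and minimize cross-entropy on every next-token prediction. Here `model` stands in for any autoregressive network; the real systems differ in scale and in many details.

```python
import torch
import torch.nn.functional as F

# The pre-training objective in miniature: given tokens t_1..t_n, train the
# network so that its output at position i assigns high probability to t_{i+1}.
# `model` is a stand-in for any autoregressive network (the real ones are
# Transformers with billions of parameters).
def pretraining_step(model, optimizer, tokens: torch.Tensor) -> float:
    inputs, targets = tokens[:, :-1], tokens[:, 1:]   # shift by one position
    logits = model(inputs)                            # (batch, seq, vocab)
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),          # every position is a
        targets.reshape(-1),                          # next-word prediction
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()  # lower loss = higher-fidelity model of the text's source
```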

  • How can the reliability of neural networks be improved?

    - Through further research and more precise training methods: getting the network to ask for clarification when it does not understand, to request more information when it is unsure, and to say plainly when it does not know something. These improvements would make neural networks far more trustworthy in real applications. (A crude confidence-threshold sketch follows.)
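One crude way to approximate "ask when unsure" is to read the model's own output probabilities and abstain below a threshold. This is a hedged sketch only; production systems train this behavior in rather than using a fixed cutoff, and the threshold here is arbitrary.

```python
import torch

# A crude proxy for "ask for clarification when unsure": if the model's own
# probability for its next token is low, abstain instead of answering.
# Real reliability work trains this in; a fixed threshold is only a sketch.
def answer_or_abstain(model, prompt_tokens: torch.Tensor, threshold: float = 0.5):
    with torch.no_grad():
        logits = model(prompt_tokens)                  # (1, seq, vocab) assumed
        probs = torch.softmax(logits[0, -1], dim=-1)   # next-token distribution
        confidence, token = probs.max(dim=-1)
    if confidence.item() < threshold:
        return None, "Could you clarify? I'm not confident I understood."
    return token.item(), None
```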

  • How does GPT-4 improve on ChatGPT?

    - GPT-4 improves on ChatGPT substantially across many dimensions. Its base model predicts the next word with greater accuracy, which reflects a deeper understanding of text. GPT-4 also shows clear gains in solving math problems, following instructions, and visual understanding.

  • Why does multimodal learning matter?

    - Multimodal learning matters because it extends a network's understanding beyond text to images. This strengthens the model's understanding of the world, and learning from visual information can capture things that are hard to pick up from text alone. (A sketch of the usual recipe follows.)
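A common recipe for this, sketched below: project image patches and text tokens into one shared embedding space and let a single Transformer attend across both. All sizes and the patch encoder are illustrative assumptions; GPT-4's actual architecture has not been published.

```python
import torch
import torch.nn as nn

# One common multimodal recipe: project image patches and text tokens into a
# shared embedding space and let a single Transformer attend over both.
# All sizes are illustrative; GPT-4's architecture is not public.
d_model = 512
patch_encoder = nn.Linear(16 * 16 * 3, d_model)       # flatten 16x16 RGB patches
token_embedding = nn.Embedding(50_000, d_model)
transformer = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
    num_layers=4,
)

image_patches = torch.rand(1, 64, 16 * 16 * 3)        # stand-in image, 64 patches
text_tokens = torch.randint(0, 50_000, (1, 32))       # stand-in prompt

sequence = torch.cat(
    [patch_encoder(image_patches), token_embedding(text_tokens)], dim=1
)                                                     # one interleaved sequence
output = transformer(sequence)                        # attends across both modalities
```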

  • What can GPT-4 do in visual understanding?

    - GPT-4 can understand and interpret image content, including explaining jokes and memes and walking through complex charts and diagrams. It can therefore handle both text and image inputs and give more complete answers.

  • What surprised the speaker most about the development of deep learning?

    - What surprised him most is that deep learning actually works. After years of development, the same simple neural network concept, scaled up and trained on better data in better ways, has produced remarkable results.

  • What progress might language models make in the next year or two?

    - Over the next year or two, the field is likely to make significant progress on reliability: making systems trustworthy, ensuring they understand and faithfully follow user intent, and having them ask for clarification or admit what they do not know.

  • What is impressive about GPT-4's mathematical problem solving?

    - GPT-4's mathematical ability is markedly improved: it can carry out long, complicated derivations, convert units, and so on. This shows it can not only understand mathematical concepts but also apply them to solve problems.

Outlines

00:00

🤖 Intuition about AI and the origins of deep learning

The conversation opens with the intuition behind deep learning and a personal history with AI. The initial interest grew out of an appreciation of AI's potential impact and curiosity about human consciousness. In 2002-2003, computers could not learn effectively, which was a major obstacle for AI. With Geoff Hinton's work on neural networks, however, deep learning began to show enormous promise. Although it was not yet clear how to make it work, the similarity of neural networks to the brain, and the fact that training them amounts to programming parallel computers, gave the field great long-term research value.

05:03

🌟 The breakthrough of unsupervised learning and OpenAI's early explorations

The conversation turns to unsupervised learning, a problem that was still far from solved in 2016. The idea of achieving unsupervised learning by compressing data gradually became clear, and the GPT models later demonstrated the connection between compression and learning. The sentiment neuron work, which deeply influenced thinking in machine learning, is also discussed. OpenAI was founded to explore how to use scale to push AI forward, particularly in deep learning and reinforcement learning.

10:05

🎮 Reinforcement learning applied to a real-time strategy game

This segment covers OpenAI's first major reinforcement learning project: training an agent to play a real-time strategy game. Using DotA 2 as the example, the project showed AI's potential on complex tasks. The conversation also touches on the development of the GPT models and on reinforcement learning from human feedback, the key ingredient behind ChatGPT.

15:06

🚀 The evolution of GPT and the complex system around ChatGPT

This segment digs into the development of the GPT models and, in particular, the complexity of ChatGPT. ChatGPT is not just a large language model; a set of surrounding systems keeps its behavior on track. The discussion covers the importance of the pre-training stage, how fine-tuning with human feedback improves reliability and usefulness, the main differences between ChatGPT and GPT-4, and how those improvements show up across a range of tests.

20:06

🧠 Neural network reasoning and GPT-4's limitations

This segment discusses the reasoning abilities of neural networks and GPT-4's progress. While reasoning remains limited, techniques such as asking the network to "think out loud" can improve it. The conversation also covers reliability as the biggest remaining challenge, how further research might address it, the importance of multimodal learning, and how GPT-4 combines text and images to strengthen its understanding of the world.

25:07

📈 Where language models go next

This segment looks ahead. Specific predictions are hard, but it is safe to assume that progress will continue and that ever more astounding systems will appear. Current research focuses on reliability: making sure users can trust what these systems produce. The conversation also covers GPT-4's surprising abilities in solving math problems, writing poems under constraints, and explaining images, and how these advances exceeded expectations.

30:08

🌈 The success of deep learning and the surprises ahead

The conversation closes by reflecting on deep learning's success over the speaker's nearly 20 years in the field. What surprises him most is that deep learning simply works: the basic neural network concept and the fundamental training algorithm have been right all along. The discussion traces the journey from small neural networks to today's large, complex models and their potential to solve real problems.

Keywords

💡Deep Learning

Deep learning is an AI technique that mimics the way networks of neurons in the brain process information, allowing computers to learn from data and make decisions. In the video, deep learning is presented as the key to progress in AI because it can handle and learn the kinds of tasks humans are good at, such as understanding and prediction.

💡Artificial Intelligence

Artificial intelligence is the scientific field of making computers and machines exhibit human-like intelligent behavior, including learning, reasoning, perception, and language understanding. Its development is the core theme of the video, which discusses its impact on the human experience and how deep learning drives its progress.

💡Neural Networks

A neural network is a computational model inspired by the way neurons in the brain connect, used for machine learning. In the video, neural networks are the foundation of deep learning: they learn by processing information in parallel, much as the brain does.

💡Unsupervised Learning

Unsupervised learning is a machine learning approach that does not rely on labeled training data, instead learning from the structure of the data itself. The video argues that unsupervised learning can be achieved through compression: compressing data well forces a model to extract the hidden secrets within it.

💡OpenAI

OpenAI is an AI research organization dedicated to ensuring that artificial intelligence benefits all of humanity. The video discusses its founding and its contributions to the field, notably the development of the GPT series of models.

💡Reinforcement Learning

Reinforcement learning is a machine learning paradigm in which an algorithm learns a behavior policy by interacting with an environment to maximize cumulative reward. The video covers OpenAI's reinforcement learning research, particularly its application to the real-time strategy game DotA 2.

💡ChatGPT

ChatGPT is a language model developed by OpenAI on the GPT (Generative Pre-trained Transformer) architecture. It generates coherent text and holds natural-language conversations with users. The video discusses its origins and how it combines deep learning with reinforcement learning from human feedback.

💡Multimodal Learning

Multimodal learning means a machine learning model can process and integrate several kinds of input, such as text, images, and audio. The video notes that GPT-4 is multimodal: it can learn from, and respond to, both text and images.

💡Scalability

Scalability is the ability of a system, model, or algorithm to keep performing as it grows, particularly on larger or more complex datasets. The video stresses its importance in deep learning: as models and data grow, performance improves, as captured by the scaling laws below.
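The scaling-laws paper referenced in the interview (Kaplan et al., 2020) found that test loss falls smoothly as a power law in parameter count N and dataset size D. A commonly quoted approximate form, with empirically fitted constants:

```latex
% Approximate neural scaling laws (Kaplan et al., 2020); N_c, D_c and the
% exponents \alpha_N, \alpha_D are constants fit to experiments.
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N},
\qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}
```

Bigger models and more data push the loss down predictably, which is the quantitative form of the "bigger is better" belief discussed in the interview.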

💡Reliability

Reliability is a system's ability to perform its function accurately and consistently over time. In the video, improving neural network reliability is identified as the key to making these systems truly useful, including correctly understanding user intent and asking for clarification when unsure.

Highlights

A great deal has changed thanks to the incredible power of deep learning.

A personal interest in AI grew out of curiosity about consciousness and the human experience, and the hope that progress in AI would help answer those questions.

In 2002-2003, learning was something humans could do and computers could not, and making progress on it became the key to progress in AI.

Neural networks automatically program parallel computers and resemble the brain, which made them the most promising direction for AI.

The founding of OpenAI and its early priorities, including the interest in unsupervised learning and compression.

The theory that compressing data well yields unsupervised learning, and the fact that GPT models really do compress their training data.

The discovery of the sentiment neuron, showing early traction for unsupervised learning.

OpenAI's two initial big ideas: unsupervised learning and reinforcement learning.

The attempt to solve a real-time strategy game, DotA 2, with reinforcement learning.

The development of the GPT models and how they became the foundation for OpenAI's other work.

What the pre-training stage of a large neural network does, and how predicting the next word amounts to learning a world model.

GPT-4's substantial improvements over ChatGPT, including more accurate next-word prediction.

GPT-4's multimodal abilities, including understanding and responding to both text and image inputs.

Why multimodal learning matters and how it strengthens a model's understanding of the world.

GPT-4's gains in reliability and how they affect its usefulness.

GPT-4's abilities in solving math problems, writing and interpreting poetry, and explaining jokes.

Surprise at GPT-4, including its reliability and problem-solving ability.

Predictions for language models over the next year or two, especially progress in reliability and trust in these systems.

The success of deep learning, its impact on the field of AI, and the vindication of a long-held belief.

Transcripts

[00:00] Jensen Huang: What was your intuition around deep learning? Why did you know that it was going to work? Did you have any intuition that it was going to lead to this kind of success?

[00:09] Ilya Sutskever: Okay, well, first of all, thank you so much for all the kind words. A lot has changed thanks to the incredible power of deep learning. My personal starting point: I was interested in artificial intelligence for a whole variety of reasons, starting from an intuitive understanding and appreciation of its impact, and also a lot of curiosity about what is consciousness, what is the human experience. It felt like progress in artificial intelligence would help with that.

The next step was, well, back then I was starting out, in 2002-2003, and it seemed like learning is the thing that humans can do that computers can't do at all. In 2002 and 2003, computers could not learn anything, and it wasn't even clear that it was possible in theory. So I thought that making progress in learning, in artificial learning, in machine learning, would lead to the greatest progress in AI. Then I started to look around for what was out there, and nothing seemed too promising. But to my great luck, Geoff Hinton was a professor at my university, and I was able to find him, and he was working on neural networks. It immediately made sense, because neural networks had the property that when they learn, you are automatically programming parallel computers. Back then the parallel computers were small, but the promise was that if you could somehow figure out how learning in neural networks works, then you can program small parallel computers from data. It was also similar enough to the brain, and the brain works, so you had these several factors going for it. Now, it wasn't clear how to get it to work, but of all the things that existed, it seemed like it had by far the greatest long-term promise.

[02:20] Jensen Huang: The Big Bang of AI. Fast forward to now: you came out to the Valley, you started OpenAI with some friends, and you're the chief scientist. What was the first initial idea about what to work on at OpenAI? You worked on several things, and you can see some of the trails of inventions and work that led up to the ChatGPT moment. What were the initial inspirations? How would you approach intelligence from that moment, leading to this?

[02:56] Ilya Sutskever: Yeah. Obviously, when we started, it wasn't 100% clear how to proceed, and the field was also very different compared to the way it is right now. Right now we are already used to these amazing artifacts, these amazing neural nets that are doing incredible things, and everyone is so excited. But back in 2015 and early 2016, when we were starting out, the whole thing seemed pretty crazy. There were so many fewer researchers, maybe between a hundred and a thousand times fewer people in the field compared to now. Back then you had something like 100 people, most of them working at Google slash DeepMind, and that was that. Then there were people picking up the skills, but it was still very, very scarce, very rare.

We had two big initial ideas at the start of OpenAI that had a lot of staying power; they have stayed with us to this day, and I'll describe them right now. The first big idea, one I was especially excited about very early on, is the idea of unsupervised learning through compression. Some context: today we take it for granted that unsupervised learning is this easy thing; you just pre-train on everything and it all does exactly as you'd expect. In 2016, unsupervised learning was an unsolved problem in machine learning that no one had any insight, any clue, what to do about. Yann LeCun would go around giving talks saying that you have this grand challenge in unsupervised learning. And I really believed that really good compression of the data would lead to unsupervised learning.

Now, compression is not language that was commonly used to describe what is really being done, until recently, when it suddenly became apparent to many people that those GPTs actually compress the training data. You may recall the Ted Chiang New Yorker article which also alluded to this. But there is a real mathematical sense in which training these autoregressive generative models compresses the data, and intuitively you can see why that should work: if you compress the data really well, you must extract all the hidden secrets that exist in it. Therefore, that is the key.

That was the first idea we were really excited about, and it led to quite a few works at OpenAI, including the sentiment neuron, which I'll mention very briefly. This work might not be well known outside the machine learning field, but it was very influential, especially in our thinking. The result was that when you train a neural network (back then it was not a Transformer, it was before the Transformer: a small recurrent neural network, an LSTM, building on some of the work that you've done yourself, the same LSTM with a few twists) to predict the next token in Amazon reviews, the next character, we discovered that if you predict the next character well enough, there will be a neuron inside that LSTM that corresponds to its sentiment. That was really cool, because it showed some traction for unsupervised learning, and it validated the idea that really good next-character prediction, next-something prediction, compression, has the property that it discovers the secrets in the data. That's what we see with these GPT models: you train, and people say it's just statistical correlation, but at this point it should be so clear to anyone.
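A minimal sketch of the sentiment-neuron setup just described: a character-level LSTM trained to predict the next byte of review text, after which individual hidden units can be probed for correlation with sentiment. The sizes and the probing step here are illustrative assumptions; the original work used a large multiplicative LSTM trained on Amazon reviews.

```python
import torch
import torch.nn as nn

# Character-level next-character prediction, as in the sentiment-neuron work
# (illustrative scale, not the original configuration).
class CharLSTM(nn.Module):
    def __init__(self, vocab_size=256, hidden_size=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.lstm = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, state=None):
        h, state = self.lstm(self.embed(x), state)
        return self.head(h), h, state

model = CharLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One training step on a batch of byte sequences: predict each next character.
batch = torch.randint(0, 256, (8, 128))          # stand-in for review text bytes
logits, hidden, _ = model(batch[:, :-1])
loss = loss_fn(logits.reshape(-1, 256), batch[:, 1:].reshape(-1))
loss.backward(); opt.step(); opt.zero_grad()

# After training, probe each hidden unit: the "sentiment neuron" is the unit
# whose activation best separates positive from negative reviews.
with torch.no_grad():
    _, h, _ = model(batch)
    unit_activations = h[:, -1, :]               # final-step activation per unit
```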

[07:21] Jensen Huang: That observation also, for me, intuitively opened up the whole world of: where do I get the data for unsupervised learning? Because I do have a whole lot of data. If I could just make you predict the next character, and I know what the ground truth is, I know what the answer is, I could train a neural network model with that. So that observation, and masking, and other technologies, other approaches, opened my mind about where the world would get all the data that's unsupervised, for unsupervised learning. Well, you've always believed that scaling will improve the performance of these models: larger networks, deeper networks, more training data. There was a very important paper that OpenAI wrote about the scaling laws and the relationship between loss and the size of the model and the size of the dataset. When Transformers came out, it gave us the opportunity to train very, very large models in a very reasonable amount of time. With the intuition about the scaling laws and the size of models and data, and your journey through GPT-1, 2, 3: which came first? Did you see the evidence of GPT-1 through 3 first, or was there the intuition about the scaling law first?

[08:46] Ilya Sutskever: The intuition. The way I'd phrase it is that I had a very strong belief that bigger is better, and that one of the goals we had at OpenAI was to figure out how to use scale correctly. There was a lot of belief at OpenAI about scale from the very beginning. The question was what to use it for precisely. Right now we're talking about the GPTs, but there is another very important line of work which I haven't mentioned, the second big idea, and I think now is a good time to make a detour to it: reinforcement learning. That clearly seemed important as well. What do you do with it?

The first really big project that was done inside OpenAI was our effort at solving a real-time strategy game. For context, a real-time strategy game is like a competitive sport: you need to be smart, you need a quick reaction time, there's teamwork, and you're competing against another team. It's pretty involved, and there is a whole competitive league for the game. The game is called DotA 2. So we trained a reinforcement learning agent to play against itself, with the goal of reaching a level where it could compete against the best players in the world. That was a major undertaking as well, and it was a very different line: it was reinforcement learning.

[10:27] Jensen Huang: Yeah, I remember the day you guys announced that work. This, by the way, is what I was asking about earlier: there's a large body of work that has come out of OpenAI, and some of it seemed like detours. But in fact, as you're explaining now, those seeming detours really led up to some of the important work we're now talking about, GPT.

Ilya Sutskever: Yeah, I mean, there has been real convergence, where the GPTs produce the foundation, and the reinforcement learning from DotA morphed into reinforcement learning from human feedback, and that combination gave us ChatGPT.

[11:07] Jensen Huang: You know, there's a misunderstanding that ChatGPT is, in itself, just one giant large language model; there's a system around it that's fairly complicated. Could you explain briefly, for the audience, the fine-tuning, the reinforcement learning, the various surrounding systems that allow you to keep it on rails, give it knowledge, and so on and so forth?

[11:41] Ilya Sutskever: Yeah, I can. The way to think about it is that when we train a large neural network to accurately predict the next word in lots of different texts from the internet, what we are doing is learning a world model. It may look, on the surface, like we are just learning statistical correlations in text. But it turns out that to just learn the statistical correlations in text, to compress them really well, what the neural network learns is some representation of the process that produced the text. This text is actually a projection of the world: there is a world out there, and it has a projection onto this text. And so what the neural network is learning is more and more aspects of the world, of people, of the human condition: their hopes, dreams, and motivations, their interactions, and the situations that we are in. The neural network learns a compressed, abstract, usable representation of that. This is what's being learned from accurately predicting the next word. Furthermore, the more accurate you are at predicting the next word, the higher the fidelity, the more resolution you get in this process. That's what the pre-training stage does.

But what this does not do is specify the desired behavior that we wish our neural network to exhibit. You see, what a language model really tries to do is to answer the following question: if I had some random piece of text on the internet, which starts with some prefix, some prompt, what will it complete to, if you just randomly ended up on some text from the internet? But this is different from: well, I want to have an assistant which will be truthful, which will be helpful, which will follow certain rules and not violate them. That requires additional training. This is where the fine-tuning and the reinforcement learning from human teachers, and other forms of AI assistance, come in. It's not just reinforcement learning from human teachers; it's also reinforcement learning from human and AI collaboration. Our teachers are working together with an AI to teach our AI to behave. But here we are not teaching it new knowledge; that is not what's happening. We are communicating with it; we are communicating to it what it is that we want it to be. And this process, the second stage, is also extremely important. The better we do the second stage, the more useful, the more reliable this neural network will be. So the second stage is extremely important too, in addition to the first stage, which is: learn everything, learn as much as you can about the world from the projection of the world.
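The two stages Sutskever lays out, condensed into schematic Python. Every function below (the loss helpers, the reward model, the policy-gradient step) is an illustrative placeholder for a generic RLHF pipeline, not OpenAI's actual code.

```python
# Stage 1: pre-training learns a world model by next-word prediction.
# Stage 2: alignment communicates desired behavior via human/AI feedback.
# All names below are generic placeholders, not a real library API.

def pretrain(model, internet_text):
    for tokens in internet_text:                     # huge corpus of token sequences
        logits = model(tokens[:-1])
        loss = cross_entropy(logits, tokens[1:])     # predict every next token
        loss.backward(); optimizer_step(model)

def fit_reward_model(reward_model, comparisons):
    # Humans (aided by AI) rank pairs of model answers; the reward model
    # learns to score the preferred answer higher.
    for prompt, preferred, rejected in comparisons:
        margin = reward_model(prompt, preferred) - reward_model(prompt, rejected)
        loss = -log_sigmoid(margin)
        loss.backward(); optimizer_step(reward_model)

def rlhf_finetune(model, reward_model, prompts):
    # Reinforcement learning against the learned reward: the policy (the LLM)
    # is nudged toward helpful, rule-following behavior, not new knowledge.
    for prompt in prompts:
        answer = model.generate(prompt)
        reward = reward_model(prompt, answer)
        policy_gradient_step(model, prompt, answer, reward)  # e.g., PPO
```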

[14:57] Jensen Huang: ChatGPT came out just a few months ago; it's the fastest-growing application in the history of humanity. There are lots of interpretations of why, but some things are clear. It is the easiest application that anyone has ever created for anyone to use. It performs tasks, it does things that are beyond people's expectations. Anyone can use it: there are no instruction sets, there are no wrong ways to use it, you just use it. And if your instructions or prompts are ambiguous, the conversation refines the ambiguity until your intents are understood by the application, by the AI. The impact, of course, is clearly remarkable. Now, yesterday (this is the day after GPT-4), just a few months later, the performance of GPT-4 in many areas is astounding: SAT scores, GRE scores, bar exams, the number of tests it is able to perform at very capable levels, very capable human levels. What were the major differences between ChatGPT and GPT-4 that led to its improvements in these areas?

[16:36] Ilya Sutskever: GPT-4 is a pretty substantial improvement on top of ChatGPT, across very many dimensions. We trained GPT-4, I would say, more than six months ago, maybe eight months ago, I don't remember exactly. The first big difference between ChatGPT and GPT-4, and perhaps the most important one, is that the base model on top of which GPT-4 is built predicts the next word with greater accuracy. This is really important, because the better a neural network can predict the next word in text, the more it understands it. This claim is now perhaps accepted by many at this point, but it might still not be completely intuitive why that is. So I'd like to take a small detour and give an analogy that will hopefully clarify why more accurate prediction of the next word leads to more understanding, real understanding.

Let's consider an example. Say you read a detective novel: a complicated plot, a storyline, different characters, lots of events, mysteries, clues; it's unclear. Then, say, at the last page of the book, the detective has gathered all the clues, gathered all the people, and says, "Okay, I'm going to reveal the identity of whoever committed the crime, and that person's name is..." Predict that word.

Jensen Huang: Predict that word, exactly. My goodness.

Ilya Sutskever: Right. Now, there are many different words. But by predicting those words better and better and better, the understanding of the text keeps on increasing. GPT-4 predicts the next word better.

[18:37] Jensen Huang: People say that deep learning won't lead to reasoning. But in order to predict that next word, to figure out, from all of the agents that were there, and all of their strengths or weaknesses, their intentions, and the context, who the murderer was, that requires some amount of reasoning, a fair amount of reasoning. So how is it able to learn reasoning? And if it did learn reasoning: one of the things I was going to ask you is, of all the tests taken between ChatGPT and GPT-4, there were some tests that GPT-3, or ChatGPT, was already very good at, there were some tests that GPT-3 was not as good at but GPT-4 was much better at, and there were some tests that neither is good at yet. Some of it seems to have to do with reasoning: maybe in calculus, it wasn't able to break the problem down into its reasonable steps and solve it; yet in some areas it seems to demonstrate reasoning skills. So, in predicting the next word, is it learning reasoning? And what are the limitations of GPT-4 now that would enhance its ability to reason even further?

[20:19] Ilya Sutskever: You know, reasoning isn't this super well-defined concept, but we can try to define it anyway: it's when you're able to somehow think about something a little bit and get a better answer because of your reasoning. And I'd say that with our neural nets, maybe there is some kind of limitation there, which could be addressed by, for example, asking the neural network to think out loud. This has proven to be extremely effective for reasoning. But I think it also remains to be seen just how far the basic neural network will go; I think we have yet to fully tap its potential. There is definitely some sense in which reasoning is still not quite at the level of some of the other capabilities of the neural network, though we would like its reasoning capabilities to be higher. I think it's fairly likely that business as usual will keep improving the reasoning capabilities of the neural network; I wouldn't necessarily confidently rule out that possibility.
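"Asking the neural network to think out loud" is what is now commonly called chain-of-thought prompting. A minimal sketch against a hypothetical chat-completion client; the client object, method, and model name are assumptions, not a real API.

```python
# Chain-of-thought ("think out loud") prompting, sketched against a
# hypothetical chat client; `client`, `client.chat`, and the model id are
# placeholders, not a real library API.
def ask_with_reasoning(client, question: str) -> str:
    response = client.chat(
        model="some-llm",  # hypothetical model id
        messages=[
            {"role": "system",
             "content": "Reason step by step out loud, then give the final answer."},
            {"role": "user", "content": question},
        ],
    )
    return response.text

# Eliciting intermediate steps tends to improve answers on multi-step
# problems such as derivations and unit conversions, e.g.:
# answer = ask_with_reasoning(client, "Convert 60 miles per hour to meters per second.")
```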

[21:43] Jensen Huang: Yeah, because one of the things that is really cool is that you can ask ChatGPT a question and say: before you answer the question, first tell me what you know, and then answer. Usually when somebody answers a question, if you give me the foundational knowledge you have, or the foundational assumptions you're making, before you answer, that really improves my believability of the answer. You're also demonstrating some level of reasoning. So it seems to me that ChatGPT has this inherent capability embedded in it.

[22:18] Ilya Sutskever: Yeah, to some degree. One way to think about what's happening now is that these neural networks have a lot of these capabilities; they're just not quite very reliable. In fact, you could say that reliability is currently the single biggest obstacle to these neural networks being useful, truly useful. If it's still sometimes the case that these neural networks hallucinate a little bit, or make some unexpected mistakes that you wouldn't expect a person to make, it is this kind of unreliability that makes them substantially less useful. But I think that perhaps with a little bit more research, with the current ideas that we have, and perhaps a few more ambitious research plans, we'll be able to achieve higher reliability as well. And that will be truly useful: it will allow us to have very accurate guardrails, which are very precise, and it will make the model ask for clarification where it's unsure, or say that it doesn't know something when it doesn't know, and do so extremely reliably. So I'd say those are some of the bottlenecks, really. It's not about whether it exhibits some particular capability, but more about how reliably, to what degree.

[23:47] Jensen Huang: Multimodality. GPT-4 has the ability to learn from text and images, and to respond to input from text and images. First of all, the foundation of multimodal learning: of course, Transformers made it possible for us to learn from multimodality, from tokenized text and images. But at the foundational level, help us understand how multimodality enhances the understanding of the world beyond text by itself. My understanding is that when you do multimodal learning, even when the input is just a text prompt, the text understanding could actually be enhanced. Tell us about multimodality at the foundation: why it's so important, what the major breakthrough was, and the characteristic differences as a result.

[24:57] Ilya Sutskever: So there are two dimensions to multimodality, two reasons why it is interesting. The first reason is a little bit humble: multimodality is useful. It is useful for a neural network to see, vision in particular, because the world is very visual. Human beings are very visual animals; I believe that a third of the human cortex is dedicated to vision. So by not having vision, the usefulness of our neural networks, though still considerable, is not as big as it could be. It is a very simple usefulness argument: it is simply useful to see, and GPT-4 can see quite well.

There is a second reason for vision, which is that we learn more about the world by learning from images in addition to learning from text. That is also a powerful argument, though it is not as clear-cut as it may seem. I'll give you an example, or rather, before giving an example, I'll make a general comment. For a human being, we get to hear about one billion words in our entire life.

Jensen Huang: Only one billion words? That's amazing.

Ilya Sutskever: That's not a lot.

Jensen Huang: Does that include my own words in my own head?

Ilya Sutskever: Make it two billion, but you see what I mean. We can see that, because a billion seconds is 30 years, and we don't get to hear more than a few words a second, and we're asleep half the time, so a couple billion words is the total we get in our entire life. So it becomes really important for us to get as many sources of information as we can, and we absolutely learn a lot more from vision. The same argument holds true for our neural networks as well, except for the fact that the neural network can learn from so many words. So things which are hard to learn about the world from text, in a few billion words, may become easier from trillions of words.

I'll give you an example: consider colors. Surely one needs to see to understand colors. And yet the text-only neural networks, which have never seen a single photon in their entire life, if you ask them which colors are more similar to each other, will know that red is more similar to orange than to blue, and that blue is more similar to purple than to yellow. How does that happen? One answer is that information about the world, even visual information, slowly leaks in through text; slowly, not as quickly, but when you have a lot of text, you can still learn a lot. Of course, once you also add vision, and learning about the world from vision, you will learn additional things which are not captured in text. But I would not say that it is binary, that there are things which are impossible to learn from text only; I think this is more of an exchange rate. In particular, if, like a human being, you want to learn from a billion words or a hundred million words, then of course the other sources of information become far more important.

[28:53] Jensen Huang: On the context of the scores that I saw, the thing that was really interesting was the data you guys published about which tests were performed well by GPT-3 and which tests performed substantially better with GPT-4. How did multimodality contribute to those tests, do you think?

[29:21] Ilya Sutskever: Oh, in a pretty straightforward way: any time there was a test where, to understand the problem, you need to look at a diagram. For example, in some math competitions, like the math competition for high school students called AMC 12, presumably many of the problems have a diagram. GPT-3.5 does quite badly on that test. GPT-4 with text only does, I don't remember exactly, but maybe from a 2% to a 20% success rate. But when you add vision, it jumps to a 40% success rate. So the vision is really doing a lot of work; the vision is extremely good. And I think being able to reason visually, and to communicate visually, will also be very powerful and very nice: things which go beyond just learning about the world. You have several things going on: you can learn about the world, you can reason about the world visually, and you can communicate visually. Perhaps in some future version, if you ask your neural net, "hey, explain this to me," rather than just producing four paragraphs it will produce, "hey, here's a little diagram which clearly conveys to you exactly what you need to know."

[30:44] Jensen Huang: That's incredible. Tell us whatever you can about where we are now, and what you think this whole language model area will look like in the not-too-distant future; pick your horizon, a year or two. What are the areas you're most excited about?

[31:04] Ilya Sutskever: You know, predictions are hard, and it's a bit difficult to say things which are too specific. I think it's safe to assume that progress will continue, and that we will keep on seeing systems which astound us in the things that they can do. The current frontiers will be centered around reliability, around the system being trusted: really getting to a point where you can trust what it produces, really getting to a point where, if it doesn't understand something, it asks for a clarification, says that it doesn't know something, or says that it needs more information. I think those are perhaps the areas where improvement will lead to the biggest impact on the usefulness of these systems, because right now that's really what stands in the way. You ask a neural net to summarize some long document, and you get a summary; but are you sure that some important detail wasn't omitted? It's still a useful summary, but it's a different story when you know that all the important points have been covered. Some ambiguity is okay, but if a point is clearly important, such that anyone else who saw it would say, "this is really important," then when the neural network also recognizes that reliably, that's when you know. The same goes for the guardrails, and the same for its ability to clearly follow the intent of the user, of its operator. So I think we'll see a lot of that in the next two years.

[32:47] Jensen Huang: Yeah, progress in those two areas will make this technology trusted by people, and able to be applied to so many things. I was thinking that was going to be the last question, but I did have another one, sorry about it. So, ChatGPT to GPT-4. When you first started using GPT-4, what are some of the skills that it demonstrated that surprised even you?

[33:15] Ilya Sutskever: Well, there were lots of really cool things that it demonstrated, which were quite cool and surprising. I'll mention two. The short answer is that the level of its reliability was surprising. With the previous neural networks, if you asked them a question, sometimes they might misunderstand something in a kind of silly way; with GPT-4, that stopped happening. Its ability to solve math problems became far greater: you could really, say, do the derivation, a long, complicated derivation, and it could convert the units and so on, and that was really cool.

Jensen Huang: You know, like many people: what's your proof? And it works through a proof. It's pretty amazing.

Ilya Sutskever: Not all proofs, naturally, but quite a few. Or, another example: many people noticed that it has the ability to produce poems where every word starts with the same letter. It follows instructions really, really clearly; not perfectly still, but much better than before. And on the vision side, I really love how it can explain jokes; it can explain memes. You show it a meme and ask it why it's funny, and it will tell you, and it will be correct. The vision part is really like actually seeing it, when you can ask questions, follow-up questions, about some complicated image with a complicated diagram and get an explanation. That's really cool.

But overall, I will say, to take a step back: I've been in this business for quite some time, actually almost exactly 20 years. And the thing which I find most surprising is that it actually works. It turned out to be the same little thing all along, which is no longer little; it's a lot more serious and much more intense. But it's the same neural network, just larger, trained on maybe larger datasets, in different ways, with the same fundamental training algorithm. So it's like: wow. I would say this is what I find the most surprising. Whenever I take a step back, I go: how is it possible that those ideas, those conceptual ideas, that the brain has neurons, so maybe artificial neurons are just as good, and so maybe we just need to train them somehow with some learning algorithm, that those arguments turned out to be so incredibly correct? That would be the biggest surprise.

5.0 / 5 (0 votes)

Related Tags
Deep Learning · Artificial Intelligence · GPT-4 · Machine Learning · Neural Networks · Future Trends · Technology Breakthroughs · Cognitive Science · Data Compression · Multimodal Learning