Ilya Sutskever | This will all happen next year | I totally believe | AI is come

Me&ChatGPT
18 Apr 2024 · 16:07

Summary

TLDR: This video transcript explores the importance of multimodal learning and its impact on the development of neural networks. Multimodality not only strengthens a network's visual understanding; learning from images as well as text gives it a fuller picture of the world. The discussion notes that a human hears only a limited number of words in a lifetime, which is why knowledge must be enriched through vision and other sources of information. It also touches on the potential of AI-generated data for training other AIs and on the possibility of future AI self-improvement. Finally, it covers the reliability and future direction of large language models, stressing the importance of making systems trustworthy and faithful to user intent.

Takeaways

  • 📈 Multimodality is especially useful for neural networks because the world is highly visual; humans are visual animals, with roughly a third of the human cortex devoted to visual processing.
  • 🧠 A person hears only about a billion words in a lifetime, which underlines the importance of learning from other sources of information such as vision.
  • 🌐 Neural networks can learn about the world from text alone; even without direct visual input they pick up concepts such as color relationships.
  • 🔍 Learning from vision teaches us about the structure, physics, and motion of the world, and audio can provide an additional source of information for a model to learn from.
  • 📊 On math-competition problems, visual input dramatically improves a neural network's success rate.
  • 🤖 By learning from both vision and text, neural networks can reason and communicate visually; in the future they may explain things with images rather than words alone.
  • 🔑 The next stage of language-model development will focus on reliability and trust, ensuring that outputs are accurate and complete.
  • 🔄 Neural networks may train themselves on data they generate, somewhat like humans learning through self-reflection and the brain's activity during sleep.
  • 🚀 GPT-4 stands out for its reliability, its ability to solve math problems, and how well it follows instructions, including its ability on the vision side to explain jokes and memes.
  • 🎯 The progress of neural networks shows that the early ideas about artificial neurons and learning algorithms were correct, which is the biggest surprise of the past 20 years.
  • 🌟 The compute and data used to train neural networks have grown by a factor of about a million over the past ten years, an achievement computer science would once have found hard to believe.

Q & A

  • What is the importance of multimodality in neural networks?

    - Multimodality matters because it adds visual input, allowing a network to understand and interpret the world better. Humans are highly visual creatures, with roughly a third of the cortex devoted to visual processing, so multimodality can significantly increase a neural network's usefulness.

  • Why does learning from images give us a deeper understanding of the world?

    - Learning from images adds knowledge beyond what text provides. Color is a good example: even without any direct visual experience, text alone can indirectly teach a model that red is more similar to orange than to blue.

  • About how many words does a person hear in a lifetime?

    - Roughly one billion. That may sound like a lot, but it isn't: a billion seconds is about 30 years, we sleep half the time, and we only hear a few words per second, so the lifetime total stays on the order of a billion or two.

  • Why can neural networks learn knowledge about the world from large amounts of text?

    - Even though a neural network may never have directly seen anything, it can learn about the world from huge amounts of text, because text carries indirect information about the world even when that information is not visual.

  • Why does multimodal learning make neural networks more useful?

    - Because it lets a network learn from multiple sources of information rather than text alone. Through visual input, for example, a network can learn about colors, shapes, and the relationships between objects.

  • Why is audio also useful for a neural network's learning?

    - Audio is another source of information. It can help a network pick up the emotion and context of language, such as distinguishing a sarcastic tone from an enthusiastic one. Audio is probably not as rich as images or video, but it is still a valuable complementary signal.

  • How do GPT-3 and GPT-4 differ on math problems?

    - GPT-4 performs far better than GPT-3.5. On the AMC 12 high-school math competition, GPT-3.5 does quite poorly; GPT-4 with text only reaches perhaps a 20% success rate, and adding visual input lifts it to about 40%. This shows how much vision contributes to a network's problem-solving ability.

  • Why is the reliability of neural networks an important direction for future research?

    - Reliability means a system can be trusted to complete tasks accurately. If a neural network can reliably recognize what is important and follow the user's intent, its usefulness increases enormously.

  • In which areas did GPT-4 show surprising skills?

    - GPT-4 surprised in several ways: solving difficult math problems, writing poems under constraints, explaining jokes and memes, and, on the vision side, interpreting complicated images and diagrams.

  • Why might self-generated data become an important part of training future AI?

    - Self-generated data would let an AI keep learning and improving without new external data. Much as humans train their own brains by reflecting and working through problems, an AI could improve itself by generating adversarial content or by posing and solving new problems.

  • In the next year or two, where are large language models likely to improve most?

    - Over the next year or two we can expect significant progress in reliability and in understanding user intent, which will make the technology trustworthy enough to be applied in many more areas.

Outlines

00:00

👀 The Importance of Multimodality and Visual Understanding

The first segment discusses why multimodality matters for neural networks, especially vision. Humans are visual animals, with a large share of the cortex devoted to visual processing. Ilya points out that neural networks are useful even without vision, but adding it greatly increases their usefulness. Learning from images also gives a fuller picture of the world, though not always in the obvious way: a person may hear only about a billion words in a lifetime, so vision supplies information that text alone cannot deliver quickly. He uses color as an example, showing that even without direct visual experience a network can learn about colors indirectly from text.
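
A quick back-of-the-envelope check of that lifetime word budget, using the rough rates from the talk (a few words per waking second, half of life asleep); the exact values below are illustrative assumptions, not figures quoted in the video:

```python
# Rough sanity check of the "about a billion words in a lifetime" figure.
SECONDS_PER_YEAR = 365 * 24 * 3600      # ~31.5 million seconds

lifetime_years = 30                     # a billion seconds is roughly 30 years
waking_fraction = 0.5                   # we sleep about half the time
words_per_second = 2                    # only a few words per waking second

waking_seconds = lifetime_years * SECONDS_PER_YEAR * waking_fraction
words_heard = waking_seconds * words_per_second

print(f"{words_heard:.1e} words")       # on the order of 1e9, i.e. about a billion
```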

05:03

🔊 Audio as a Complementary Source of Information

The second segment discusses audio as a complementary source of information, even though it is probably not as rich as images or video. Ilya notes its value on both the recognition and the production side, and uses the GPT-3.5 versus GPT-4 results on multimodal tests to show how strongly vision improves accuracy. He also suggests that future networks may understand the world better by reasoning and communicating visually, for example by producing a diagram to explain something more directly. The segment ends with the idea of AI training itself on data it generates, hinted at as a possible direction for future AI development.

10:05

🤖 The Future and Reliability of Language Models

The third segment focuses on where language models are headed, particularly reliability. Ilya stresses the importance of being able to trust what a system produces, including asking for clarification when it does not understand something or saying when it needs more information. Progress here, he argues, will do the most to increase the value of these systems. He also shares what surprised him about GPT-4: its reliability, its ability to solve math problems, how well it follows user intent, and its ability on the vision side to explain jokes and memes. He closes by reflecting on 20 years of work in the field and his astonishment that the basic neural-network idea actually worked.

15:06

🎉 Closing Remarks and Celebration of Achievements

In the final segment the interviewer praises Ilya's work, calling his account of large language models one of the best "beyond PhD" descriptions of the state of the art. They celebrate his pioneering work in computer vision and on the GPT models and pay tribute to his 20-year career. The conversation ends with congratulations and a look toward the future.


Keywords

💡Multimodality

In AI, multimodality refers to a system's ability to process and understand several kinds of input at once, such as text, images, and sound. The video stresses its importance because it lets a neural network learn about and understand the world more fully, for example by using visual information to enrich its picture of the world.

💡Neural Network

A neural network is a computational model inspired by the structure of the brain, used to process complex data patterns. The video discusses neural networks' visual abilities, their loose analogy to the human visual cortex, and how multimodal input improves their performance.

💡Visual Cortex

The visual cortex is the region of the brain that processes visual information. The video notes that roughly a third of the human cortex is devoted to visual processing, which shows how central vision is to human cognition and why visual input matters when designing multimodal AI.

💡Information Source

An information source is a channel through which knowledge is acquired. The video argues that images and vision, not just text, are important information sources that help both people and neural networks learn about the world more completely.

💡Color

Color is the example the video uses to illustrate the value of visual information. Text can describe colors, but seeing them is how we normally understand them; yet a neural network with no visual input at all can still learn the relationships between colors from large amounts of text.
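
To make the mechanism behind this keyword concrete, here is a toy sketch: the co-occurrence counts are entirely made up (not from any real corpus or model), but even plain vector similarity over such counts places "red" closer to "orange" than to "blue", which is the kind of structure a text-only network can absorb at scale.

```python
# Toy illustration of how color relationships can "leak" into a text-only model
# via co-occurrence statistics. The counts below are invented for the example.
import math

# Rows: how often each color word co-occurs with a few context words.
cooccurrence = {
    #            warm  fire  sunset  sky   ocean  cold
    "red":    [  90,   80,   60,     5,    2,     3],
    "orange": [  85,   60,   70,     8,    3,     4],
    "blue":   [   5,    3,   10,    90,   85,    70],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

print("red~orange:", round(cosine(cooccurrence["red"], cooccurrence["orange"]), 3))
print("red~blue:  ", round(cosine(cooccurrence["red"], cooccurrence["blue"]), 3))
# red~orange comes out much higher than red~blue, mirroring the claim in the talk.
```

A real model learns far richer structure than this, but the principle, visual facts leaking in through textual statistics, is the one Ilya describes.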

💡Synthetic Data

Synthetic data is artificially generated data used to train and test AI models. The video raises the possibility of an AI generating its own data to train itself, which could become an important part of future AI training.
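
The video only raises this as a possibility, but one way such a loop could be shaped is sketched below; the `Model` class, its methods, and the verification step are hypothetical placeholders, not any real training API.

```python
# One possible shape of a self-generated-data loop, purely illustrative.
class Model:
    def generate_problem(self) -> str: ...
    def solve(self, problem: str) -> str: ...
    def verify(self, problem: str, solution: str) -> bool: ...
    def finetune(self, examples: list[tuple[str, str]]) -> None: ...

def self_improvement_round(model: Model, n_problems: int = 1000) -> None:
    """Generate problems, keep only solutions the model can verify, then train on them."""
    kept: list[tuple[str, str]] = []
    for _ in range(n_problems):
        problem = model.generate_problem()   # the model writes its own exercise
        solution = model.solve(problem)      # and attempts to solve it
        if model.verify(problem, solution):  # keep only checkable successes
            kept.append((problem, solution))
    model.finetune(kept)                     # learn from its own filtered output
```

Whether a filter like `verify` can be made strong enough to keep such a loop from amplifying its own mistakes is exactly the open question the conversation points to.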

💡Reliability

Reliability refers to the accuracy and consistency of a system's outputs. The video emphasizes that improving the reliability of AI systems is the key to their future development, so that they can be trusted and widely applied.

💡Intent Recognition

Intent recognition is an AI system's ability to understand the real intent behind a user's instruction or request. The video notes that improving how accurately AI follows user intent is key to making it more useful.

💡Self-Learning

Self-learning is an AI system's ability to improve its own performance by generating or processing data. The video discusses the idea of an AI training itself when it is not otherwise in use, which could matter greatly for its future development.

💡GPT-3 and GPT-4

GPT-3 and GPT-4 are advanced language models developed by OpenAI. The video highlights GPT-4's improved multimodal abilities, especially with visual information and math problems, showing how the newer generation understands and solves problems better.

💡Visual Reasoning

Visual reasoning is the ability to analyze and solve problems using visual information. The video points to GPT-4's visual reasoning, such as explaining jokes and interpreting images, as an example of advanced visual understanding in AI.

Highlights

The importance of multimodality: multimodality is especially useful for giving neural networks vision, because the world is highly visual and humans are visual animals.

Multimodality increases usefulness: adding vision substantially increases how useful a neural network is.

Humans learn about the world from images: images, not just text, are an important way we come to understand the world.

The lifetime word budget is limited: a person hears only about a billion words, which underlines the importance of learning from other sources of information.

Neural networks learn from vast amounts of text: a network can train on trillions of words, which makes learning about the world from text easier.

The color example: even without direct visual experience, a text-only network learns the relationships between colors.

Visual information leaks slowly through text: large amounts of text convey visual information, though not as quickly as learning from vision directly.

The importance of multimodal learning: images and video matter as information sources beyond text.

The role of video and sound in understanding the world: from video and sound we can learn the structure and physics of the world.

The value of audio as an information source: audio adds information beyond images and video.

Multimodality's contribution on the GPT-3 and GPT-4 tests: adding vision substantially raised the problem-solving success rate.

The importance of visual reasoning and communication: vision not only teaches us about the world but can also be used to reason and to communicate.

AI-generated tests for training AI: the idea that an AI could train itself on data it generates.

The future of language models: expected progress in reliability and trustworthiness.

GPT-4's reliability and problem solving: GPT-4 is much better at understanding questions and solving math problems.

GPT-4's vision abilities: GPT-4 can explain jokes and memes, demonstrating advanced visual understanding.

The basic principles of neural networks: the underlying ideas turned out to be correct and have endured through AI's development.

Exponential growth in compute: the compute used to train neural networks grew by a factor of about a million over the past decade.

Transcripts

00:00

Ilya Sutskever: So there are two dimensions to multimodality, two reasons why it is interesting. The first reason is a little bit humble: multimodality is useful. It is useful for a neural network to see, vision in particular, because the world is very visual. Human beings are very visual animals; I believe a third of the human cortex is dedicated to vision. So by not having vision, the usefulness of our neural networks, though still considerable, is not as big as it could be. It is a very simple usefulness argument: it is simply useful to see, and GPT-4 can see quite well.

The second reason to do vision is that we learn more about the world by learning from images in addition to learning from text. That is also a powerful argument, though it is not as clear-cut as it may seem, and I'll give you an example. Or rather, before giving an example, I'll make a general comment: as human beings, we get to hear about one billion words in our entire life.

Interviewer: Only one billion words? That's amazing.

Ilya Sutskever: That's not a lot.

Interviewer: Does that include my own words in my own head?

Ilya Sutskever: Make it two billion, but you see what I mean. You can see that because a billion seconds is about 30 years, we don't get to hear more than a few words a second, and we're asleep half the time, so a couple of billion words is the total we get in our entire life. So it becomes really important for us to get as many sources of information as we can, and we absolutely learn a lot more from vision.

The same argument holds true for our neural networks as well, except for the fact that a neural network can learn from so many words. Things which are hard to learn about the world from text in a few billion words may become easier to learn from trillions of words. I'll give you an example: consider colors. Surely one needs to see to understand colors, and yet text-only neural networks, which have never seen a single photon in their entire life, if you ask them which colors are more similar to each other, will know that red is more similar to orange than to blue, and that blue is more similar to purple than to yellow. How does that happen? One answer is that information about the world, even visual information, slowly leaks in through text; slowly, not as quickly, but when you have a lot of text you can still learn a lot. Of course, once you also add vision and learn about the world from vision, you will learn additional things which are not captured in text. But I would not say it is binary, that there are things which are impossible to learn from text only; I think of it more as an exchange rate. In particular, if you are like a human being and you want to learn from a billion words, or a hundred million words, then of course the other sources of information become far more important.

03:59

Interviewer: And so you learn from images. Is there a sensibility that would suggest that if we wanted to also understand the construction of the world, as in, the arm is connected to my shoulder, my elbow is connected, that somehow these things move, the animation of the world, the physics of the world, if I wanted to learn that as well, can I just watch videos and learn that?

Ilya Sutskever: Yes.

Interviewer: And if I wanted to augment all of that with sound? For example, the meaning of "great": "great" could be great, or "great" could be great, you know, one is sarcastic and one is enthusiastic. There are many, many words like that: "that's sick," or "I'm sick," or "I'm sick," depending on how people say it. Would audio also make a contribution to the learning of the model, and could we put that to good use soon?

Ilya Sutskever: Yes. I think it's definitely the case. Well, what can we say about audio? It's useful, it's an additional source of information, probably not as much as images or video, but there is a case to be made for the usefulness of audio as well, both on the recognition side and on the production side.

05:23

Interviewer: In the context of the scores that I saw, the thing that was really interesting was the data you published about which of the tests were performed well by GPT-3 and which performed substantially better with GPT-4. How did multimodality contribute to those tests, do you think?

Ilya Sutskever: Oh, in a pretty straightforward way: any time there was a test where, to understand the problem, you need to look at a diagram. For example, in some math competitions, there is a math competition for high school students called the AMC 12, and presumably many of the problems have a diagram. GPT-3.5 does quite badly on that test. GPT-4 with text only does, I think, I don't remember exactly, maybe a 2% to 20% success rate, but then when you add vision it jumps to a 40% success rate. So the vision is really doing a lot of work; the vision is extremely good. And I think being able to reason visually as well, and communicate visually, will also be very powerful and very nice things which go beyond just learning about the world. There are several things: you can learn about the world, you can then reason about the world visually, and you can communicate visually. Now, in the future, perhaps in some future version, if you ask your neural net, "hey, explain this to me," rather than just producing four paragraphs it will produce, "hey, here's a little diagram which clearly conveys to you exactly what you need to know."

07:14

Interviewer: Yeah, that's incredible. You know, one of the things you said earlier, about an AI generating a test to train another AI: there was a paper written, and I don't completely know whether it's factual or not, saying that there's a total of somewhere between 4 trillion and something like 20 trillion useful language tokens that the world will be able to train on over some period of time, and that we're going to run out of tokens to train on. First of all, I wonder if you feel the same way, and secondarily, whether the AI generating its own data could be used to train the AI itself, which you could argue is a little circular. We train our brains with generated data all the time, by self-reflection, by working through a problem in our heads, or, I guess neuroscientists suggest, by sleeping; we do a fair amount of developing our neurons that way. How do you see this area of synthetic data generation? Is that going to be an important part of the future of training AI, and the AI teaching itself?

Ilya Sutskever: Well, I wouldn't underestimate the data that exists out there. I think there is probably more data than people realize. As to your second question: certainly a possibility. It remains to be seen.

09:03

Interviewer: Yeah. It really does seem that one of these days our AIs, when we're not using them, will maybe be generating either adversarial content for themselves to learn from, or imagined problems to solve, that they can go off and then improve themselves with. Tell us whatever you can about where we are now, and where you think we'll be in the not too distant future, pick your horizon, a year or two: where do you think this whole language-model area will be, and some of the areas you're most excited about?

Ilya Sutskever: You know, predictions are hard, and it's a bit difficult to say things which are too specific. I think it's safe to assume that progress will continue, and that we will keep on seeing systems which astound us in the things they can do. The current frontiers will be centered around reliability, around the system being trusted: really getting to a point where you can trust what it produces, really getting to a point where, if it doesn't understand something, it asks for a clarification, says that it doesn't know something, says that it needs more information. I think those are perhaps the areas where improvement will lead to the biggest impact on the usefulness of those systems, because right now that's really what stands in the way. You ask a neural net to, say, summarize some long document, and you get a summary, but are you sure that some important detail wasn't omitted? It's still a useful summary, but it's a different story when you know that all the important points have been covered. And in particular, it's okay if there is ambiguity, that's fine, but if a point is clearly important, such that anyone else who saw it would say "this is really important," and the neural network will also recognize that reliably, that's when you know. The same goes for the guardrails, and the same for its ability to clearly follow the intent of the user, of its operator. So I think we'll see a lot of that in the next two years.

play11:23

a lot of that in the next two years yeah

play11:25

that's terrific because those the

play11:26

progress in those two areas will make

play11:28

this Tech technology uh trusted by

play11:31

people to use and be able to apply for

play11:33

so many things I I was thinking that was

play11:35

going to be the last question but I did

play11:36

have another one sorry about that so so

play11:38

chat uh chat GPT to gp4 um gp4 when when

play11:43

it first when you first started using it

play11:45

uh what are some of the skills that it

play11:49

demonstrated that surprised even you

play11:54

well there were lots of really cool

play11:56

things that it demonstrated which

play12:01

which is which were quite cool and

play12:03

surprising it was it was quite good so

play12:08

I'll mention two ex so let's see I'm

play12:10

just I'm TR trying to think about the

play12:13

best way to go about it the short answer

play12:16

is that the level of its reliability was

play12:19

surprising mhm where the previous neural

play12:22

networks if you ask them a question

play12:25

sometimes they might misunderstand

play12:27

something in a kind of a

play12:29

silly way where it was gp4 that stopped

play12:32

happening its ability to solve math

play12:35

problems became far greater it's like

play12:37

you could really like say you know

play12:39

really do the derivation and like long

play12:41

complicated derivation you could convert

play12:43

the units and so and that was really

play12:45

cool you know like many people it works

play12:47

through a proof it works through a proof

play12:49

it's pretty amazing not all proofs

play12:51

naturally but but quite a few or another

play12:54

example would be like many people

play12:56

noticed that it has the ability to

play13:00

produce poems with you know every word

play13:03

starting with the same letter or every

play13:05

word starting with some it follows

play13:07

instructions really really clearly not

play13:10

perfectly still but much better than

play13:12

before yeah really good and on the

play13:13

vision side I really love how it can

play13:16

explain jokes it can explain memes you

play13:19

show it a meme and ask it why it's funny

play13:21

and it will tell you and it will be

play13:23

correct the V the vision part I think is

play13:27

very was also very it's like really

play13:29

actually seeing it when you can ask

play13:31

followup questions about some

play13:34

complicated image with a complicated

play13:36

diagram and get an explanation that's

play13:37

really

13:38

Ilya Sutskever: But yeah, overall, I will say, to take a step back: you know, I've been in this business for quite some time, actually almost exactly 20 years, and the thing which I find most surprising is that it actually works. It turned out to be the same little thing all along, which is no longer little, and it's a lot more serious and much more intense, but it's the same neural network, just larger, trained on maybe larger datasets in different ways, with the same fundamental training algorithm. So it's like, wow. I would say this is what I find the most surprising. Whenever I take a step back, I go, how is it possible that those ideas, those conceptual ideas, that, well, the brain has neurons, so maybe artificial neurons are just as good, and so maybe we just need to train them somehow with some learning algorithm, that those arguments turned out to be so incredibly correct? That would be the biggest surprise, I'd say.

14:46

Interviewer: In the ten years that we've known each other, the models that you've trained and the amount of data you've trained on, from what you did on AlexNet to now, have grown by about a million times. No one in the world of computer science would have believed that the amount of computation done in that ten-year span would be a million times larger, and you dedicated your career to doing that. You've done many more, your body of work is incredible, but two seminal works: the co-invention with Alex in that early work, and now GPT at OpenAI. It is truly remarkable what you've accomplished. It's great to catch up with you again, Ilya, my good friend. It is quite an amazing moment, and today's talk, the way you break down the problem and describe it, is one of the best beyond-PhD descriptions of the state of the art of large language models. I really appreciate that. It's great to see you. Congratulations.

Ilya Sutskever: Thank you so much. Thank you, I had so much fun. Thank you.

Related Tags
Multimodality, Neural Networks, Visual Learning, Text Analysis, AI Development, Image Understanding, Language Models, Self-Learning, Data Generation, Technological Progress