Ilya Sutskever | The future of AGI will be like what you see in the movies

Me&ChatGPT
12 Aug 2024 | 30:10

Summary

TLDR: This video explores the complexity of neural networks and their applications in artificial intelligence. It discusses why neural networks are hard to interpret precisely, and how that very property may be an advantage for building real intelligence. It also reviews the history of neural networks, from early setbacks to the rapid progress of recent years, covers the original motivation for founding OpenAI, and looks ahead to future AI technology, particularly its use in programming, law, and other white-collar professions.

Takeaways

  • 🧠 Writing a neural network is unlike traditional programming: you write complicated equations inside a loop, and once it runs it is hard to understand precisely what it does, which is tied to neural networks' lack of interpretability.
  • 🤖 That uninterpretability may be a feature rather than a flaw, because genuine intelligence is itself hard to understand, just like our own cognitive functions.
  • 📈 Part of neural networks' success is that they are difficult to reason about mathematically, in contrast to other approaches in early AI.
  • 🕵️‍♂️ Early neural network researchers needed long-term perseverance: the AI field dismissed neural networks until around 2010, when they finally began to attract attention.
  • 🚀 The potential of neural networks was recognized anew, and expectations grew as computing power increased.
  • 🌐 OpenAI was founded on the vision of tightly merging science and engineering, and on a sober understanding of AI's potential impact.
  • 🛡️ One of OpenAI's goals is to ensure AI is used safely and beneficially, while also attending to policy and ethics.
  • 🔮 The future of neural networks is not only about technical progress but also about addressing the challenges and problems AI creates.
  • 🔢 The scalability of neural network models shows that performance keeps improving as compute and datasets grow.
  • 💡 Generalization is one of neural networks' core strengths, but today large amounts of data are still needed to compensate for its shortfall.
  • 🎨 Neural networks' performance on creative tasks has been a surprise, which may stem from their generative-model nature; generative models show great potential for artistic creation.

Q & A

  • Why does the speaker think that the difficulty of understanding neural networks is a feature rather than a flaw?

    - He argues that if we want to build intelligence, it should not be simple to understand. Just as we cannot explain how we perform cognitive functions such as vision, hearing, and language understanding, if computers can produce objects that are similarly difficult to understand, it may mean we are on the right track.

  • When did the speaker become interested in neural networks and begin working with Geoff Hinton?

    - He became interested in neural networks in the early 2000s and began working with Geoff Hinton in 2003.

  • Why does the speaker believe neural networks succeeded partly because they are hard to reason about mathematically?

    - He believes that being hard to reason about mathematically kept the field from falling into the trap of theorem-proving and let it focus on practical progress.

  • What is the speaker's hope for the future of artificial intelligence?

    - He hopes we can build not only powerful and useful AI but actual AGI, and ensure it is used to solve a large number of problems and create many amazing applications.

  • Why did the speaker decide to leave Google and found OpenAI?

    - One motivation was his belief that merging science and engineering is the right way to advance AI. Another was that he had come to a more sober view of AI technology and wanted to build a company that is aware of the challenges AI brings and committed to addressing them.

  • What were OpenAI's initial vision and goals?

    - The initial vision was to merge science and engineering and erase the boundary between them: scientific ideas guiding engineering practice, and engineering rigor driving scientific progress. The speaker also wanted OpenAI to be conscious of AI's complexity and committed to making AI safe and beneficial.

  • Why does the speaker consider combining science and engineering important?

    - He argues that as the AI field matures, small-scale tinkering is no longer enough for substantial progress. Combining the two ensures that scientific discoveries receive careful engineering execution and that engineering practice is guided by scientific ideas.

  • Why does the speaker consider data availability critical to neural network performance?

    - He points out that neural networks need large amounts of training data to generalize well. In data-rich domains such as language, they can achieve excellent results; in data-poor domains such as law, they may not reach the same level of performance.

  • How does the speaker view neural networks applied to programming, such as the Codex model?

    - He is excited about Codex because it shows that neural networks can go from natural language to code, an important advance for AI. Codex can understand requirements described in natural language and generate runnable code, which may change the nature of the programming profession.

  • Why does the speaker think neural networks may do better on creative tasks than on simple robotics tasks?

    - Because creative tasks such as image and text generation have large amounts of digital data to learn from, while robotics tasks are constrained by how much data can be collected in the real world.

  • How does the speaker view the generalization ability of neural networks?

    - He sees generalization as one of their core strengths: it lets a network respond correctly in new situations unlike its training data. However, this ability is not yet at a human level, so large amounts of data are needed to compensate.

  • What does the speaker expect for the future development of neural networks?

    - He expects them to keep improving at generalization and to depend less on huge amounts of data. He also expects them to get better at understanding and generating code, changing the nature of programming and other white-collar professions.

  • Why does the speaker think creative applications are especially well suited to neural networks?

    - Because generative models play a central role in machine learning, and the way they generate new, plausible data resembles the artistic process.

Outlines

00:00

🤖 The uninterpretability of neural networks and the development of intelligence

The first segment discusses the difference between neural networks and traditional programming, emphasizing their complexity and uninterpretability. The speaker argues this property is not a flaw but part of what makes them intelligent systems, mirroring the complexity of human intelligence. He also recalls becoming interested in neural networks in the early 2000s and working with Geoff Hinton, his expectations for their potential, and his early view of the field's direction.

05:02

🌐 AI as a double-edged sword and its global impact

The second segment discusses AI's potential positive impact alongside its challenges, stressing its complexity, the problems it may bring, and the improvements to human life. The speaker expresses a sober understanding of AI's future, including policy and safety considerations, and argues that the global distribution and use of AI will define the world order of the coming decades.

10:05

🔬 Merging science and engineering, and the future of AI

The third segment recounts the speaker's motivation for founding OpenAI, including merging science and engineering to advance AI, and careful thinking about AI's potential impact. He views the tight integration of science and engineering as a sign of the field's maturity and lays out a clear vision for using AI to address challenges and policy questions.

15:08

📈 The scalability of deep learning and the importance of data

The fourth segment digs into the scalability of deep learning models: as compute and data grow, performance improves. The speaker notes that although some worry we have already reached deep learning's limits, history shows such worries are usually short-sighted. He also raises the impact of data volume on performance, especially in data-poor domains such as law, where AI progress may face challenges.

20:08

🛠️ The Codex model: from natural language to code

The fifth segment discusses Codex, a large GPT neural network trained to predict the next token in code rather than in text. The speaker emphasizes the model's usefulness and novelty and how it changes programming. Codex can turn natural language into machine-executable code, which not only shows AI's potential in programming but also foreshadows changes in the programming profession.

25:08

🎨 AI in creative fields and its social impact

The final segment explores AI's applications in creative fields such as art and writing, and their potential social impact. The speaker notes that although people generally assumed simple robotics tasks would be automated first, creative tasks have in fact been affected more. AI has made notable progress in generating art and text, which may have far-reaching effects on the economy and the structure of professions.

Keywords

💡 Neural network

A neural network is a mathematical model loosely inspired by the structure and function of the brain, used to learn complex patterns in data. The video notes that writing a neural network differs from traditional programming: it amounts to running complicated equations inside a loop, and the result is hard to understand precisely, which relates to neural networks' opacity but is also part of their power.
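A minimal sketch of that "equation inside a loop" idea, with a single weight and invented toy data (plain gradient descent, not any particular production system):

```python
# The whole "program" is one update equation repeated in a loop. What the
# loop converges to is easy to check; *why* is the hard-to-inspect part.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])   # toy inputs (invented)
y = 2.0 * x                          # toy targets: the rule to be discovered
w = 0.0                              # a single weight
lr = 0.01                            # learning rate

for step in range(1000):             # the loop
    grad = 2 * np.mean((w * x - y) * x)   # gradient of mean squared error
    w -= lr * grad                   # the complicated equation, repeated

print(round(w, 3))  # ~2.0: the loop "found" the rule without stating it
```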

💡 Interpretability

Interpretability is how understandable a model or system is: whether we can see how it reaches particular decisions. The video argues that neural networks' lack of interpretability is not a flaw but one of their characteristics, similar to the complexity of human intelligence, whose cognitive functions we cannot fully explain either.

💡 Deep learning

Deep learning is a subfield of machine learning that uses multi-layer neural networks to model complex cognitive processes. The video mentions that deep learning is hard to reason about mathematically, and ties its success in AI to the complexity and power of deep models.

💡 Transformer

The Transformer is a deep learning architecture especially suited to sequence data such as text and speech. The video mentions the GPT models and the Transformer architecture, emphasizing their effectiveness in language modeling and prediction tasks.

💡 Prediction

In machine learning, prediction is a model's ability to forecast unknown outcomes from input data. The video discusses the link between prediction and understanding, and how good prediction can be used to build understanding.

💡 Understanding

Understanding usually means a deep grasp of information or concepts. The video notes that if a model can predict what comes next very well, it has some degree of understanding, even if that understanding is imperfect.

💡 Dataset

A dataset is a collection of data used to train and evaluate machine learning models. The video stresses datasets' importance and notes that data scarcity in specific domains, such as law, can limit model performance.

💡 Compute

Compute is the capacity of hardware and software to carry out computational tasks. The video discusses how deep learning performance rises with growing compute, one of the key drivers of AI progress.

💡 Codex

Codex is a large language model trained specifically to generate code. The video mentions that Codex can turn natural language into machine-executable code, demonstrating AI's potential in programming.

💡 AGI (artificial general intelligence)

AGI is artificial intelligence with broad cognitive abilities, able to learn and apply knowledge across many domains the way humans do. The video mentions the pursuit of AGI and the wide range of applications and impact it could bring.

💡 Innovation

Innovation means creating new ideas, methods, or products, or improving existing ones. The video mentions that when data is scarce, innovative methods are needed to improve model performance, including better algorithms and more efficient data generation.

💡 Generalization

Generalization is a model's ability to perform on data it has not seen. The video discusses neural networks' generalization and how training on large amounts of data compensates for its current limits, so that models can make correct predictions or decisions in new situations.

💡 Creative tasks

Creative tasks involve generating novel, original content such as art, writing, or code. The video notes that although simple robotics tasks were widely expected to be automated first, creative tasks are in fact being significantly affected by AI, owing to AI's ability to generate new content.

Highlights

Neural networks differ from traditional programming: they are built from complicated equations run inside a loop, and it is hard to understand precisely what they do.

The opacity of neural networks may be a feature rather than a flaw, akin to the complexity of human intelligence.

The speaker became interested in neural networks in the early 2000s and began working with Geoff Hinton in 2003.

The AI field spent a long stretch in the wilderness, demanding persistence and perseverance.

Neural networks succeeded in part precisely because they are hard to reason about mathematically.

OpenAI was founded to merge science and engineering in order to advance AI.

AI development must attend to safety and policy alongside technical progress.

AI will deeply shape the balance of power between nations and the future course of the world.

Blurring the lines between disciplines is an important route to innovation, as with Apple's blending of hardware and software.

The GPT models show deep learning's extreme demands on engineering together with its need for novel research.

Prediction and understanding are closely linked: good prediction requires some degree of understanding.

The Transformer architecture and growing model scale were key to the GPT models' success.

The amount of available data shapes how well deep learning works in any given domain.

Throughout deep learning's history, every year someone declared the limits reached, and every year brought new breakthroughs.

Codex can turn natural language into code, demonstrating AI's potential in programming.

AI may affect creative industries first, rather than simple repetitive labor.

Generalization underpins AI progress, but today it still must be backed by large amounts of data.

Generative models applied to artistic creation demonstrate AI's potential in creative fields.

Transcripts

00:00

Because it's a very different way of writing code. Normally you write code and you can kind of think it through and understand it, whereas with a neural network, you write a complicated equation inside a loop, then you run the loop, and good luck figuring out what it does precisely. That connects to neural nets not being interpretable. But you could also argue that the difficulty of understanding what neural networks do is not a bug but a feature. We want to build intelligence, and intelligence is not simple to understand. We can't explain how we do the cognitive functions that we do: how we see, how we hear, how we understand language. So if computers can produce objects that are similarly difficult to understand (not impossible, but similarly difficult) it means we're on the right track. All those things helped me converge on neural networks fairly early on.

What year was it when you remember initially getting excited about neural networks and being pretty convinced?

The early 2000s. I started working with Geoff Hinton in 2003, so quite a while ago now, long before the craze started around 2010.

I think this is a common theme whenever you look at anybody who works in any field that becomes very big: there's a long stretch of wandering in the desert, maybe, is one way to put it.

Yeah, definitely lots of perseverance is required, because you don't know how long you'll have to stay in the desert. You just have to endure.

And did you expect, back in 2003 or the early 2000s, that in your lifetime you would see the things that we're seeing now with AI and machine learning?

I was hoping, but I did not expect it. Back then the field of AI was on the wrong track. It was in a mindset of rejection of neural networks, and the reason is that neural networks are difficult to reason about mathematically, while for other approaches you can prove theorems. There is something very seductive and dangerous about proving theorems about things, because it's a way to showcase your skill, but it's not necessarily aligned with what makes the most progress in the field. I think neural networks are as successful as they are precisely because they're difficult to reason about mathematically. Anyway, my earlier hope was simply to convince the field to work on neural networks rather than the other things it was doing. But then, when computers started to get fast, my level of excitement about their potential increased as well.

And so what are your aspirations today? In your lifetime, what's the thing you hope for? I think it's obvious from the OpenAI mission statement, but...

Exactly, right. Now the hopes are much larger. Now I think we can really try to build not only really powerful and useful AI, but actually AGI: make it useful, make it beneficial, and make it so that it will be used to solve a large number of problems and create lots of amazing applications. That's what I'd like; that's what I hope to see happen.

03:30

And then along the way you had been doing a lot of this research, a lot of groundbreaking work at Google, and then you left and started OpenAI with Sam Altman and Greg Brockman and a bunch of others. What were your goals in starting OpenAI at the outset? What was the initial conception, the initial vision, and what did you hope to accomplish by starting a new lab?

There were multiple motivations on my side for starting OpenAI. The first was that I felt the way to make the most progress in AI was by merging science and engineering into a single whole, into a unit, so that there is no distinction, or as little distinction as possible, between them: all the science is infused with engineering discipline and careful execution, and all the engineering is infused with the scientific ideas. The reason is that the field is becoming mature, and so it is hard to do small-scale tinkering without a lot of engineering skill and effort to really make something work. So that was one motivation: I really wanted a company that would operate on this principle.

Another motivation was that I came to see AI technology in a more sober way. I used to think that AI would just be this endless good, and now I see it in a more complex way: I think there will be a lot of truly incredible, inconceivable applications that will improve our lives in dramatic ways, but I also think there will be challenges. There will be lots of problems posed by the misapplication of AI and by its peculiar properties, which may be difficult for people to understand. I wanted a company that would operate with this awareness in mind and would try to address those challenges as best as possible: not only working on advancing the technology, but also on making it safe, and on the policy side of things, as much as is rational and reasonable, to make the whole as useful and as beneficial as possible.

Totally, and I think it's something we agree on. One thing that is very obvious to me is that which countries have access to AI technology, and the ways in which they use it, are going to define how the world plays out over the course of the next few decades. I think that's the path we're on as a world.

That's right, among many other things.

Right. And this thing you mentioned about bringing together the science and engineering, I think it's quite profound, for a few reasons. First, I think a lot of the best, most incredibly innovative things happen from blurring the lines between disciplines. Apple is one of the best examples: from the very beginning they were always saying, hey, we're blending hardware and software, and that's our special sauce, and obviously it has produced some incredible things. And I think a lot of other research labs operate in a very "scientists tell the engineers what to do" mindset, which is counterproductive, because you really need to understand both very well to understand what the limits of the technology are.

Yeah, that's right. And on that point you might even say: isn't it obvious that the science and the engineering should be together? On some level it is, but it just so happens that historically it hasn't been this way. Empirically it has been the case in the past, less so now, that people who gravitate to research have a certain taste that also makes them less drawn to engineering, and vice versa. And I think now, because people are seeing this reality on the ground, that to do any kind of good science you need good engineering, you have more and more people who are strong on both of these axes.

07:43

Totally. And switching gears a little bit to the GPT models, this is a great illustration, because the GPT models are impossible without incredible engineering, and yet they still require novel research, novel science, to accomplish. They've obviously been some of the biggest breakthroughs in the field of AI as of late, and they've blown open many people's imaginations about what AI can accomplish, or at least increased people's confidence that AI can accomplish incredible things. I'm kind of curious: originally, at OpenAI, when you had been working on these language models for some time, what were the original research inspirations behind them, and what led you all to say, hey, this is something worth working on, worth scaling up, worth continuing to double down on?

There have been multiple lines of thinking that led us to converge on language models. There has been an idea that we believed in relatively early on: that you can somehow link understanding to prediction, specifically to prediction of whatever data you give to the model. Before diving into an example, I'll start with the conclusion. The conclusion is that if you can make really good guesses as to what's going to come next (you can't make them perfectly, that's impossible, but really good guesses) then you need to have a meaningful degree of understanding. Take the example of a book. Suppose you're reading a mystery novel, and in the last chapter all the pieces are coming together, and there is a critical sentence, and you read the first word and the second word. Now you say: okay, the identity of some person is about to be revealed, and your mind is homing in: it's either this person or that person, you don't know which. Maybe someone who read the book and thought about it very carefully says: I think it's probably this person, maybe that one, but probably this one. What this example goes to show is that really good prediction is connected to understanding.

This kind of thinking led us to experiment with all kinds of approaches: can we predict things really well, can we predict the next word, can we predict the next pixel, and study their properties. Through this line of work, before the GPTs, before the Transformers were invented, we did some work with something that we call the sentiment neuron: a neural net that was trying to predict the next character in reviews of Amazon products. It was a small neural net, because it was maybe four years ago, but it did prove the principle that if you predict the next character well enough, you will eventually start to discover the semantic properties of the text. Then with the GPTs we took it further. We said: okay, we have the Transformer, it's a better architecture, so we have a stronger effect. And then later came the realization that if you make it larger, it will be better. So let's make it larger, and it will be better.
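To make the next-character objective concrete, here is a toy sketch: a count-based bigram table "predicting" the next character of review text. The actual sentiment-neuron work trained a much larger recurrent network on millions of Amazon reviews; the two strings below are invented, and only the shape of the objective is the point.

```python
# Count how often each character follows each other character, then predict
# the most frequent follower. A real model replaces the table with a neural net.
from collections import Counter, defaultdict

reviews = [
    "this product is great and works well",
    "terrible product broke after one day",
]

follows = defaultdict(Counter)
for text in reviews:
    for prev, nxt in zip(text, text[1:]):
        follows[prev][nxt] += 1          # statistics of "what comes next"

def predict_next(prev_char: str) -> str:
    """Most likely next character, judged purely by next-char statistics."""
    return follows[prev_char].most_common(1)[0][0]

print(predict_next("t"))  # whatever most often followed 't' in the reviews
```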

11:05

Yeah, there are a lot of great nuggets in what you just mentioned. I think the first is the elegance of this concept: if you get really good at predicting the next whatever, really good at prediction, that obligates you to be good at all these other things. I think it's probably underrated how much vision that required, because early on you try to get really good at predicting things and you get the sentiment neuron, which is cool, but it's a blip relative to what we've obviously seen with the large language models. So that, I think, is significant. And I think the other significant piece is what you just mentioned: scaling it up. You released this paper about the scaling laws you found as you scaled up compute, data, and model size in concert with one another. Obviously there's some intuition that scaling things up is good and yields great behaviors, but what's your intuition, looking out over the next few years or even the next few decades? What does scaling up mean, why is it likely to keep producing great results, and what do you think the limits are, if any?

I think two statements are true at the same time. On the one hand, it does look like our models are quite large. Can we keep scaling them up even further? Can we keep finding more data for the scale-up? I want to spend a little bit of time on the data question, because I think it's not obvious at all. Traditionally, because the field of machine learning has been fundamentally academic, fundamentally concerned with discovering new methods and less with the development of very big and powerful systems, the mindset has been: someone creates a fixed benchmark, a dataset of a certain shape, of certain characteristics, and then different people can compare their methods on it. But what that does is force everyone to work with a fixed dataset. What the GPTs have shown in particular is that scaling requires that you increase the compute and the data in tandem, at the same time, and if you do this, you keep getting better and better results. In some domains, like language, there is quite a bit of data available; in other, maybe more specialized subdomains, the amount of data is a lot smaller. That could be the case, for example, if you want an automated lawyer. I think your big language model will know quite a bit about language, and it will be able to converse very intelligently about many topics, but it may perhaps not be as good at being a lawyer as we'd like. It will be quite formidable, but will it be good enough? This is unknown, because the amount of data there is smaller. But any time data is abundant, it's possible to apply the magic deep-learning formula and produce these increasingly good, increasingly powerful models.

Then, in terms of the limits of scaling: I think one thing that's notable about the history of deep learning over the past ten years is that every year people said, okay, we had a good run, but now we've hit the limits. And that happened year after year after year. So I think we absolutely may hit the limits at some point, but I also think it would be unwise to bet against deep learning.
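The smooth improvement with scale described here is usually summarized as a power law: loss falls along a straight line in log-log coordinates as data (or compute, or parameters) grows. A hedged sketch of fitting and extrapolating such a curve; all the numbers are invented for illustration.

```python
# Fit loss = a * x**b in log-log space and extrapolate one decade further.
import numpy as np

x = np.array([1e6, 1e7, 1e8, 1e9])      # hypothetical dataset sizes
loss = np.array([4.1, 3.3, 2.7, 2.2])   # hypothetical test losses

b, log_a = np.polyfit(np.log(x), np.log(loss), 1)   # line in log-log space

print(f"fitted exponent b = {b:.3f}")                        # small and negative
print(f"extrapolated loss at 1e10: {np.exp(log_a) * 1e10**b:.2f}")
```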

14:58

Yeah, there are a number of things I want to dig into here, because they're all pretty interesting. One is this mental model you certainly have, which I think is quite good: Moore's law is an incredible accelerant for everything we do, and the more there's a Moore's law for the different inputs that go into the machine learning life cycle, the more we can push all these things to the max and see incredible performance. I think that's significant because, as you mentioned on this data point, we're getting more efficient at compute, which is already happening; we're getting more efficient at producing, finding, or generating data; and obviously there's more efficiency coming out of the algorithms. All of these things will keep enabling us to do the next incredible thing, and the next, and the next. So first, we've talked about this a little before, so I know you broadly agree, but are there any flaws to that logic? What would you be worried about in terms of how everything will scale up over the next few years?

I think over the next few years I don't have too much concern about continued progress. We will have faster computers, we will find more data, and we'll train better models; I don't see particular risk there. Moving forward, we will need to start being more creative: what do you do when you don't have a lot of data? Can you somehow intelligently use the same compute to compensate for that lack of data? I think those are the questions that we, and the field, will need to grapple with to continue our progress.

Yeah. And on this point about data, the other thing I wanted to touch on, because it's obviously something we focus on at Scale: the large language models, thankfully, can leverage the internet, the fact that all this data has existed and been accumulating for a while, so you can show some pretty incredible things. In all new domains you need efficient ways to generate lots of data, and there's this whole question of how you make it so that each ounce of human effort that goes into generating data produces as much data as possible. Something we're passionate about, which I think we've talked a little about, is how you get a Moore's law for data: more and more efficiency out of the human effort that goes into producing data. That might require novel new paradigms, but it's something that I think is required for, say, this lawyer you mentioned: we have a pretty finite set of lawyers, so how do we get those lawyers to produce enough data to create some great legal AI?

The choices we have are either to improve our methods, so that we can do more with the same data or the same with less data, or, as you say, to somehow increase the efficiency of the teachers. And I think both will be needed to make the most progress.

Yeah. Well, I really think Moore's law is instructive: to get these chips performing better, people try all sorts of random crap, and the end output is that you have chips with more transistors. If we think about it as, do we have models that perform better with certain amounts of data or certain amounts of teaching, how do we make that go?

I'm sure there will be ways to do that. For example, if you ask the human teachers to help you only in the hardest cases, I think that will allow you to move faster.
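The "help only in the hardest cases" idea is essentially what the literature calls active learning: score unlabeled examples by the model's uncertainty and spend the human-labeling budget on the least certain ones. A minimal sketch; the class probabilities are made up, and any real system would recompute them as the model retrains.

```python
# Send humans only the examples the current model is least sure about.
import numpy as np

def entropy(p: np.ndarray) -> np.ndarray:
    """Predictive entropy per example; higher means the model is less sure."""
    return -(p * np.log(p + 1e-12)).sum(axis=1)

probs = np.array([      # hypothetical predicted probabilities, 5 examples
    [0.98, 0.02],       # easy: confident
    [0.55, 0.45],       # hard: near the decision boundary
    [0.90, 0.10],
    [0.51, 0.49],       # hardest
    [0.70, 0.30],
])

budget = 2              # how many labels we can afford from human teachers
hardest = np.argsort(-entropy(probs))[:budget]
print("send to human teachers:", hardest)   # -> [3 1]
```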

18:41

I want to switch gears to one of the offshoots of the large language model efforts which is particularly exciting, especially to me as an engineer, and probably to most people who spend a lot of time coding: Codex, which demonstrated some pretty incredible capabilities of going from natural language to code, and of being able to interact with a program in a very novel new way. I'm curious: what excites you about this effort, what are the reasonable expectations for what Codex and Codex-like systems will enable in the next few years, what about far beyond that, and ultimately why are you so excited about it?

For some context, Codex is pretty much a large GPT neural network that's trained on code. Instead of training to predict the next word in text, it's trying to predict the next word, the next token I guess, in code. The thing that's cool about it is that it works at all. I don't think it's self-evident to most people that it would be possible to train a neural net in such a way that if you just give it some representation of text that describes what you want, the neural network will process this text and produce code, and this code will be correct and will run. It's exciting for a variety of reasons. First of all, it is useful, and it is new. Code has been a domain that hasn't really been touched by AI too much, even though it's obviously very important, and it touches on aspects where today's AI, deep learning, has been perceived as weak: reasoning, carefully laying out plans, not being fuzzy. It turns out that in fact these models can do quite a good job here. One distinction between Codex and language models is that the Codex models, the code models, can in effect control the computer. It's like they have the computer as an actuator, and that makes them much more useful: you can do so many more things with them. And of course we want to make them better still; I think they can improve in lots of different ways. These are just the preliminary code models.

I expect them to be quite useful to programmers, especially in areas where you need to know random APIs. One small digression: the GPT neural networks don't learn quite like people. A person will often have somewhat narrow knowledge in great depth, while these neural networks want to know everything that exists, and they really try to. So their knowledge is encyclopedic. It's not as deep (pretty deep, but not as deep as a person's), and because of that, the way they work today, these neural networks complement people with their breadth. You might say: I want to do something with a library I don't really know. It could be some existing library, or maybe the neural network has read the code of all my colleagues and knows what they've written. So I want to use some library I don't know how to use, and the network will have a pretty good guess of how to use it. You'd still need to make sure that what it said is correct, because such is its level of performance today: you cannot trust it blindly, especially if the code is important. In some domains, where it's easy to undo any code that it writes, I think you can trust it just fine; but for real code that matters, you want to check it.

I expect that in the future those models will continue to improve, that the code neural networks will keep getting better, and I think the nature of the programming profession will change in response. In a sense it's a natural continuation of how, in the software engineering world, we've been using higher- and higher-level programming languages: at first people wrote assembly, then they had Fortran, then C, now we have Python, and now all these amazing Python libraries as a layer on top of that. Now we can be a little bit more imprecise, a little bit more ambitious, and the neural network will do a lot of the work for us. And I should say, I expect something similar to happen across the board in lots of other white-collar professions as well. If you think about the economic impact of AI, there's been an inversion. There was a lot of thinking that maybe simple robotics tasks would be the first ones to be hit by automation, but instead we are finding that creative tasks, counterintuitively, seem to be affected quite a bit. If you look at the generative neural networks and the way you generate images now, you can find all kinds of stunning images being generated on Twitter. Generating cool text is happening as well, but the images are getting most of the attention. And then with things like code and a lot of writing tasks, the white-collar tasks are also being affected by these AIs. I do expect that society will change as progress continues, and I think it is important for economists and people who think about these questions to pay careful attention to these trends, so that as the technology continues to improve, there are good ideas in place to, in effect, be ready for it.
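A toy sketch of the training objective described above: next-token prediction over source code. The regex tokenizer is a crude stand-in (Codex itself uses a learned subword tokenizer and a large Transformer, neither shown), but it makes the shape of the training data concrete.

```python
# Turn a source snippet into (context -> next token) training pairs.
import re

source = "def add(a, b):\n    return a + b\n"

tokens = re.findall(r"\w+|[^\w\s]|\n", source)   # words, symbols, newlines

pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

for context, target in pairs[:3]:
    print(context, "->", repr(target))
# ['def'] -> 'add'
# ['def', 'add'] -> '('
# ['def', 'add', '('] -> 'a'
```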

25:04

Yeah, there are a number of really interesting nuggets in there again. One is that one of the big ideas behind Codex, or Codex-like models, is that you go from human language to machine language, and, as you mentioned, all of a sudden the machine is an actuator. Many of us, when we think about AI, think about the Star Trek computer: you just ask the computer and it does things. This is a key enabling step, because if all of a sudden you can go from how humans speak to things a machine can understand, you bridge this key translation step. So I think that's super interesting. Another thing, this inversion you just mentioned, is super interesting too, because one of my beliefs is that this is the reason some things have become much easier than others: it's all a product of the availability of data. There are some areas where there just exist lots and lots of digital data that you can suck up into the algorithms, and they can do quite well; and then in things like robotic tasks, or setting a table, all these things we've had a lot of trouble building machines to do, you're fundamentally limited by the amount of data you have, first just by how much has been collected so far, but also because you can only have so much happening in the real world to collect data from. I'm curious how you think about that. Or do you think there is actually something intrinsic to the creative tasks that is somehow more suited to current neural networks?

I think it's both. We can take a step backwards: at the base of all AI progress that has happened, at least in all of deep learning and arguably more, is the ability of neural networks to generalize. Generalization is a technical term which means that you understand something correctly, or take the right action, in a situation unlike any situation you've seen in the past, in your experience. A system generalizes better if, from the same data, it can do the right thing, or understand the right situation, in a broader set of situations. To make an analogy, suppose you have a student at a university studying for an exam. That student might say: this is a very important exam for me, let me memorize the material, let me make sure I can solve every single exercise in the textbook. Such a student will be very well prepared and could achieve a very high grade in the exam. Now consider a different student who might say: you know what, I don't need to know how to solve every exercise in the textbook, as long as I've got the fundamentals right; I read the first 20 pages and I feel I've got the fundamentals. If that second student also achieves a high grade in the exam, that second student did something harder than the first, and exhibited a greater degree of generalization: even though the questions were the same, the situation was less familiar to the second student than to the first. Our neural networks are a lot like the first student. They have an incredible ability to generalize, for a computer, but we could do more, and because their generalization is not yet perfect, definitely not yet at a human level, we need to compensate by training on very large amounts of data. That's where the data comes in. The better you generalize, the less data you need, or equivalently, the further you can go with the same data. So maybe once we figure out how to make our neural networks generalize a lot better, all those small domains where we don't have a lot of data won't matter: the neural network will say, it's okay, I know what to do well enough even with this limited amount of data. But today we need a lot of data.

Now, when it comes to the creative applications in particular, there is a way in which they are especially well suited to neural networks, and that's because generative models play a very central role in machine learning, and the nature of the generations of generative models is somehow analogous to the artistic process. It's not perfect, it doesn't capture everything, and there are certain kinds of art which our models cannot do yet. But I think this second connection, the generative aspect of art and the ability of generative models to generate new, plausible data, is another reason why we've seen so much progress in generative art.
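The two-students analogy can be made concrete with a toy experiment: a 1-nearest-neighbor "memorizer" is perfect on the exercises it has seen but generalizes worse than a simple rule learned from the same data. Everything below is invented for illustration.

```python
# Compare a memorizer (1-NN) with a "fundamentals" model (one linear rule).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] + rng.normal(0, 0.5, 300) > 0).astype(int)  # noisy labels
X_tr, y_tr, X_te, y_te = X[:150], y[:150], X[150:], y[150:]

def memorizer(x):
    """Student 1: recall the label of the nearest training example."""
    return y_tr[np.argmin(np.linalg.norm(X_tr - x, axis=1))]

w = np.linalg.lstsq(X_tr, y_tr - 0.5, rcond=None)[0]  # Student 2's one rule

def fundamentals(x):
    return int(x @ w > 0)

for name, f in [("memorizer", memorizer), ("fundamentals", fundamentals)]:
    tr = np.mean([f(a) == b for a, b in zip(X_tr, y_tr)])
    te = np.mean([f(a) == b for a, b in zip(X_te, y_te)])
    print(f"{name}: train acc {tr:.2f}, test acc {te:.2f}")
# The memorizer aces the training set; better generalization is what lets
# the simpler rule do as well or better on the unseen "exam".
```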


Related Tags
Deep Learning, Neural Networks, Artificial Intelligence, Natural Language, Machine Translation, Programming, Models, Data Efficiency, Technological Breakthroughs, Social Change, Innovative Applications