AI Pioneer Shows The Power of AI AGENTS - "The Future Is Agentic"

Matthew Berman
29 Mar 2024 · 23:47

Summary

TLDR: In this talk, Dr. Andrew Ng shares his optimism about agents, and in particular how agentic workflows can lift a model like GPT-3.5 to roughly GPT-4-level performance. He emphasizes the importance of multi-agent collaboration and iteration, and uses a case study to show how effective agentic workflows can be. He also discusses agentic design patterns such as reflection, tool use, planning, and multi-agent collaboration, and how these patterns will shape the future of AI. Finally, he highlights fast token generation and agentic reasoning design patterns, arguing that they will open the door to a much broader set of AI tasks.

Takeaways

  • 🧠 Dr. Andrew Ng is a computer scientist, co-founder and former head of Google Brain, former chief scientist of Baidu, and one of the most influential figures in artificial intelligence.
  • 🌟 Sequoia is one of Silicon Valley's most famous venture capital firms; its portfolio companies account for more than 25% of the NASDAQ's total market value.
  • 📈 With an agentic workflow, even a model like GPT-3.5 can match or exceed zero-shot GPT-4 performance.
  • 🔄 The core advantage of agentic workflows is iteration: multiple agents with different roles and capabilities keep improving the result of a task.
  • 🤖 Agents can follow different design patterns, including reflection, tool use, planning, and multi-agent collaboration, to solve problems more effectively.
  • 🔧 Reflection lets a large language model critique and improve its own output, which raises performance.
  • 🛠️ Tool use lets agents call predefined tools, such as APIs and external libraries, to extend their capabilities.
  • 📝 Planning lets agents slow down, lay out their steps, and reason step by step, which improves output quality.
  • 🤼‍♂️ Multi-agent collaboration has different agents work together, mimicking how human teams solve problems.
  • 🚀 Fast token generation is critical for agentic workflows because it lets agents iterate and improve quickly.
  • 🌐 As models become more commoditized, we can expect further progress in agents and inference speed, which will drive AI forward.

Q & A

  • Who is Dr. Andrew Ng?

    - Dr. Andrew Ng is a computer scientist, co-founder and former head of Google Brain, and co-founder of the online education platform Coursera. He is one of the most influential figures in artificial intelligence.

  • What kind of company is Sequoia?

    - Sequoia is a legendary Silicon Valley venture capital firm. Its portfolio companies account for more than 25% of the NASDAQ's total market value, and its investments include well-known companies such as Apple, Instagram, and Zoom.

  • What is a non-agentic workflow?

    - A non-agentic workflow is one where the user gives an AI model a prompt and the model generates an answer in a single pass, much like asking a person to write an essay in one go without ever using backspace.

  • What characterizes an agentic workflow?

    - An agentic workflow is an iterative process involving multiple agents with different roles, backgrounds, personas, and tools that collaborate and complete a task through many rounds of iteration.

  • Why does Dr. Andrew Ng consider agentic workflows so powerful?

    - Because they combine the strengths of multiple agents, each with its own specialty, and reach the best result through iteration and collaboration, which closely mirrors how humans work.

  • On the coding benchmark, what accuracy do GPT-3.5 and GPT-4 achieve with zero-shot prompting?

    - With zero-shot prompting, GPT-3.5 gets 48% correct and GPT-4 gets 67% correct.

  • Which agentic workflow design patterns does Dr. Andrew Ng introduce?

    - He introduces four design patterns for agentic workflows: reflection, tool use, planning, and multi-agent collaboration.

  • How is reflection applied in an agentic workflow?

    - Reflection means asking the large language model to review its own output, identify ways to improve it, and return a revised result. This simple step can noticeably improve a model's performance.

  • What role does tool use play in an agentic workflow?

    - Tool use means giving the large language model additional tools, such as custom code, API calls, or complex math libraries, which extends what the model can do and where it can be applied.

  • How does multi-agent collaboration improve a workflow?

    - Multi-agent collaboration assigns different roles to different agents, such as a coder, a critic, or a designer, and has them complete the task together. This tends to produce better results and mimics how human teams collaborate.

  • What does Dr. Andrew Ng expect for the future of AI?

    - He sees agentic workflows as a major trend and expects the range of tasks AI can complete to expand dramatically because of them. He is also optimistic about faster inference and the continued commoditization of models.

Outlines

00:00

🤖 The future of AI agents

This section introduces Dr. Andrew Ng's talk at Sequoia. He is very optimistic about AI agents, noting that a model like GPT-3.5 driving agentic reasoning can reach roughly GPT-4's level, and he shares many other interesting insights about AI. Andrew Ng is a computer scientist, co-founder and former head of Google Brain, and co-founder of the online education platform Coursera. The talk emphasizes the importance and future potential of AI agents.

05:02

📈 Agentic workflows and a case study

This section contrasts non-agentic and agentic workflows. An essay-writing example shows how an agentic workflow is iterative and collaborative: multiple agents (a writing agent, a reviewing agent, a spell-checking agent, and so on) work together and keep improving the result. A coding-benchmark example then shows that GPT-3.5 used in an agentic workflow can beat zero-shot GPT-4, underscoring how powerful agentic workflows are.

10:04

🛠️ Design patterns for AI agents

This section introduces several design patterns for AI agents: reflection, tool use, planning, and multi-agent collaboration. Reflection means having the large language model examine and improve its own output. Tool use means the model can be given tools such as a web-scraping tool or a math library. Planning means letting the model think more slowly and lay out its steps. Multi-agent collaboration means having different agents cooperate, for example one agent generating code and another reviewing it. Together these patterns show how varied and flexible AI agents can be.

15:05

🌐 Where AI agents are heading

This section discusses the future of AI agents, including expectations for faster inference and more capable agentic models. Agentic workflows can already deliver significant productivity gains, and as better agent models and tooling arrive, the finicky aspects of agents should shrink dramatically. It also covers the potential of multi-agent systems and multi-agent debate, and how collaboration between different models can yield better performance.

20:07

🚀 Fast iteration and performance gains

The final section stresses how important fast iteration and improvement are for AI agents. The ability to generate tokens quickly matters because agentic workflows loop many times, so faster tokens mean faster improvement. It also mentions the anticipation for next-generation models such as GPT-5 and Claude 4, how they bring us closer to the goal of artificial general intelligence (AGI), and closes with strong optimism about AI agents, encouraging viewers to keep following the field.

Keywords

💡AI agents

AI agents are systems that can take on a specific role or carry out a specific task. In the video, Dr. Andrew Ng stresses how important agents are to the development of AI, arguing that future AI will be systems composed of multiple agents. For example, having different agents act as the writer, the reviewer, and the spell checker makes it possible to complete a task more efficiently and at higher quality.

💡Iterative workflow

An iterative workflow is one in which the work is repeatedly reviewed, corrected, and improved while the task is being completed. In the video, Dr. Andrew Ng contrasts non-agentic and agentic workflows to show how iteration raises the quality of the final result. Writing an essay, for example, is not a single pass but many rounds: an initial draft, revisions, further thinking, and so on, until the best possible result is reached.

💡Tool use

Tool use means letting an AI system call specific tools or functions to complete a task. Dr. Andrew Ng presents tool use as one of the agent design patterns: giving an AI system tools such as a web-scraping tool or an SEC lookup tool extends its capabilities and improves how efficiently and how well it completes tasks.
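
As a rough sketch of the idea, independent of any particular framework's API: the model is told which tools exist, asked to reply with the tool it wants plus an argument, and a small dispatcher runs the chosen function. The `call_llm` helper and the two tool functions below are hypothetical placeholders, not code from the talk.

```python
import json

def call_llm(prompt: str) -> str:
    """Hypothetical helper: send a prompt to a chat model and return its text reply."""
    raise NotImplementedError("wire this up to whatever LLM provider you use")

# Ordinary Python functions exposed as tools; the placeholder bodies stand in for real code.
def web_search(arg: str) -> str:
    return f"(search results for {arg!r})"

def stock_lookup(arg: str) -> str:
    return f"(latest filing summary for {arg})"

TOOLS = {"web_search": web_search, "stock_lookup": stock_lookup}

def answer_with_tools(question: str) -> str:
    # Describe the available tools and ask the model to pick one, as JSON.
    decision = json.loads(call_llm(
        "You may call one of these tools: web_search(query), stock_lookup(ticker).\n"
        f"Question: {question}\n"
        'Reply with JSON such as {"tool": "web_search", "arg": "..."} '
        'or {"tool": null} if no tool is needed.'
    ))
    if decision.get("tool") in TOOLS:
        result = TOOLS[decision["tool"]](decision["arg"])
        # Hand the tool output back to the model so it can compose the final answer.
        return call_llm(f"Question: {question}\nTool output: {result}\nAnswer the question.")
    return call_llm(question)
```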

💡Multi-agent collaboration

Multi-agent collaboration is the process of multiple AI agents working together to complete one task. Dr. Andrew Ng discusses its potential: by having different agents take on different roles and sub-tasks, more complex coordination and problem solving become possible.
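
A minimal two-agent version of this, a coder agent plus a critic agent built from the same underlying model with different system prompts, might look like the sketch below; `call_llm(system, user)` is a hypothetical helper and the number of review rounds is arbitrary.

```python
def call_llm(system: str, user: str) -> str:
    """Hypothetical helper: a chat completion with a system prompt and a user message."""
    raise NotImplementedError

CODER = "You are an expert Python programmer. Write clean, working code."
CRITIC = "You are an expert code reviewer. Point out bugs, edge cases, and style problems."

def coder_critic_loop(task: str, rounds: int = 2) -> str:
    # The coder agent drafts a first version.
    code = call_llm(CODER, f"Write Python code for this task:\n{task}")
    for _ in range(rounds):
        # The critic agent reviews it; both agents can share the same base model.
        review = call_llm(CRITIC, f"Review this code for the task '{task}':\n{code}")
        # The coder agent revises its code in light of the review.
        code = call_llm(
            CODER,
            f"Task: {task}\nYour previous code:\n{code}\nReview comments:\n{review}\n"
            "Revise the code accordingly."
        )
    return code
```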

💡Reflection

Reflection is the process of an AI agent re-examining its own output and trying to improve it. Dr. Andrew Ng describes reflection as a powerful tool: having an agent critique and revise what it has just produced can significantly raise the quality and performance of its output.
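
As a rough illustration of the pattern, here is a minimal reflection loop in Python. It assumes a hypothetical `call_llm(prompt)` helper that sends a prompt to whatever chat model you use and returns its reply; the prompts and the iteration count are illustrative, not taken from the talk.

```python
# Minimal self-reflection loop (a sketch; call_llm is a stand-in, not a real API).

def call_llm(prompt: str) -> str:
    """Hypothetical helper: send a prompt to a chat model and return its text reply."""
    raise NotImplementedError("wire this up to whatever LLM provider you use")

def reflect_and_improve(task: str, rounds: int = 2) -> str:
    # First pass: produce an initial answer, zero-shot.
    draft = call_llm(f"Complete the following task:\n{task}")
    for _ in range(rounds):
        # Ask the same model to critique its own output.
        critique = call_llm(
            f"Here is a draft answer to the task '{task}':\n{draft}\n\n"
            "Check it carefully for correctness, clarity, and efficiency, "
            "and list concrete improvements."
        )
        # Feed the critique back and ask for a revised version.
        draft = call_llm(
            f"Task: {task}\nDraft:\n{draft}\nCritique:\n{critique}\n\n"
            "Rewrite the draft, applying the critique."
        )
    return draft
```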

💡Planning

Planning is the process of an AI agent thinking ahead and laying out steps and a strategy before executing a task. Dr. Andrew Ng lists planning as one of the agent design patterns and notes that getting a model to plan, for example by explaining its reasoning step by step, improves its ability to solve problems.
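
As a prompt-level sketch of what planning can look like in practice (the two-stage prompts below are illustrative and `call_llm` is a hypothetical helper, not an API from the talk):

```python
def call_llm(prompt: str) -> str:
    """Hypothetical helper: send a prompt to a chat model and return its text reply."""
    raise NotImplementedError

def plan_then_solve(task: str) -> str:
    # Stage 1: ask the model to slow down and lay out its steps before doing anything.
    plan = call_llm(
        f"Task: {task}\n"
        "List the steps you will take to solve it, one per line. Do not solve it yet."
    )
    # Stage 2: have the model execute its own plan, reasoning step by step.
    return call_llm(
        f"Task: {task}\nYour plan:\n{plan}\n"
        "Now carry out the plan step by step, explaining your reasoning as you go, "
        "and finish with the final answer."
    )
```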

💡Dr. Andrew Ng

Dr. Andrew Ng is a renowned computer scientist, co-founder and former head of Google Brain, and co-founder of the online education platform Coursera. His talk focuses on the development and potential of AI agents and on how agentic workflows drive progress in AI.

💡Sequoia

Sequoia is one of Silicon Valley's best-known venture capital firms, with a long list of successful technology investments and deep influence on the industry. It is mentioned to underline the significance of the venue where Dr. Andrew Ng gave his talk and the weight his views carry with such an investor.

💡GPT-3.5 and GPT-4

GPT-3.5 and GPT-4 are advanced large language models developed by OpenAI. Dr. Andrew Ng compares how they perform under different workflows to show how agentic workflows and iteration improve AI performance.

💡Cloud computing

Cloud computing is the delivery of computing resources and services over the internet. It comes up in the video to underline the importance of fast token generation and how cloud inference providers can make AI agents work faster and more efficiently.

💡AGI (artificial general intelligence)

Artificial general intelligence (AGI) refers to an AI system capable of performing any intellectual task at a human level. Dr. Andrew Ng treats AGI as a long-term goal, with agentic workflows and iteration as important steps on the way there.

Highlights

Dr. Andrew Ng is incredibly bullish on agents and their potential in AI development.

Agents powered by models like GPT-3.5 can reason at levels comparable to GPT-4.

Dr. Ng is a renowned computer scientist, co-founder of Google Brain, and co-founder of Coursera.

Sequoia, the venue of the talk, has an impressive portfolio representing more than 25% of the NASDAQ's total market value.

Non-agentic workflows are likened to asking a person to write an essay without ever using backspace.

Agentic workflows involve multiple agents with different roles and tools, iterating on tasks for better outcomes.

Wrapped in an agentic workflow, GPT-3.5 can outperform zero-shot GPT-4 on certain tasks.

Agentic workflows can lead to remarkable improvements in coding benchmarks.

Reflection is a tool that allows large language models to self-improve their outputs.

Tool use allows LLMs to utilize pre-existing code and libraries, expanding their capabilities.

Planning and multi-agent collaboration are emerging technologies with great potential.

Multi-agent systems can simulate different roles and collaborate on complex tasks.

Agent workflows can produce better results than zero-shot prompting or single model use.

The future of AI may require shifting away from expecting an immediate response to every prompt.

Fast token generation is crucial for iterative agentic workflows.

Agentic reasoning design patterns are expected to greatly expand the set of tasks AI can perform.

The path to AGI is seen as a journey, with agent workflows contributing to incremental progress.

Transcripts

00:00

Dr. Andrew Ng just did a talk at Sequoia, and it's all about agents. He is incredibly bullish on agents; he said things like GPT-3.5 powering agents can actually reason at the level of GPT-4, and a lot of other really interesting tidbits. So we're going to watch his talk together, and I'm going to walk you through, step by step, what he's saying and why it's so important. I am incredibly bullish on agents myself, which is why I make so many videos about them, and I truly believe the future of artificial intelligence is going to be agentic. So first, who is Dr. Andrew Ng? He is a computer scientist, the co-founder and former head of Google Brain, the former chief scientist of Baidu, and a leading mind in artificial intelligence. He went to UC Berkeley, MIT, and Carnegie Mellon, so a smart, smart dude, and he co-founded Coursera, where you can learn a ton about computer science, math, and a bunch of other topics, absolutely free. What he's doing is truly incredible, so when he talks about AI, you should listen.

01:07

Now let's get to the talk. It was given at Sequoia, and if you're not familiar with Sequoia, they are one of the most legendary Silicon Valley venture capital firms ever. Here's an interesting stat that shows how incredible they are at picking technological winners: their portfolio companies represent more than 25% of today's total value of the NASDAQ. In other words, of the total market capitalization of all the companies listed on the NASDAQ, 25% belongs to companies that Sequoia owns, has owned, or has invested in. Incredible stat. Look at some of their companies: Reddit, Instacart, DoorDash, Airbnb, a little company called Apple, Block, Snowflake, Vanta, Zoom, Stripe, WhatsApp, Okta, Instagram. This list is absolutely absurd.

02:01

Alright, enough of the preface, let me get into the talk itself. So, agents: today, the way most of us use large language models is with a non-agentic workflow, where you type a prompt and it generates an answer. That's a bit like asking a person to write an essay on a topic and saying, please sit down at the keyboard and just type the essay from start to finish without ever using backspace. And despite how hard that is, LLMs do it remarkably well. In contrast, an agentic workflow may look like this: have an LLM write an essay outline; ask whether any web research is needed, and if so, do it; then write a first draft; then read your own first draft and think about which parts need revision; then revise the draft; and so on. This workflow is much more iterative: you may have the LLM do some thinking, then revise the article, then do some more thinking, and iterate through this a number of times.

03:00

I want to pause there and talk about this, because it's the best explanation I've heard for why agents are so powerful. A lot of people say, well, agents are just LLMs, right? Technically that's true, but the power of an agentic workflow is that you can have multiple agents, all with different roles, different backgrounds, different personas, different tools, working together and iterating (that's the important word, iterating) on a task. So in this example he said, okay, write an essay. Yes, an LLM can do that, and usually it's pretty darn good. But now say you have one agent that is the writer, another that is the reviewer, another for the spell checker, another for the grammar checker, another for the fact checker, and they all work together, iterating over and over, passing the essay back and forth, making sure it finally ends up as the best possible outcome. And this is how humans work. Humans, as he said, do not just do everything in one take without thinking things through and planning. We plan, we iterate, and then we find the best solution. So let's keep listening.

04:07

What not many people appreciate is that this delivers remarkably better results. I've actually really surprised myself, working with these agent workflows, by how well they work. Let's do one case study. My team analyzed some data using a coding benchmark called the HumanEval benchmark, released by OpenAI a few years ago. It has coding problems like: given a non-empty list of integers, return the sum of all the odd elements that are in even positions. It turns out the answer is a code snippet like that.
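
For reference, the exact snippet shown on the slide is not visible in the video, but one straightforward solution to that example problem looks like this:

```python
def solution(lst: list[int]) -> int:
    """Return the sum of the odd elements that sit at even positions (0, 2, 4, ...)."""
    return sum(x for i, x in enumerate(lst) if i % 2 == 0 and x % 2 == 1)

# Positions 0 and 2 hold 5 and 7, both odd, so the result is 12.
assert solution([5, 8, 7, 1]) == 12
```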

04:37

Today, a lot of us use zero-shot prompting, meaning we tell the AI to write the code and have it run on the first try. Who codes like that? No human codes like that, just typing out the code and running it. Maybe you do; I can't do that. It turns out that if you use GPT-3.5 with zero-shot prompting, it gets 48% right. GPT-4 is way better, at 67%. But if you take an agentic workflow and wrap it around GPT-3.5, it actually does better than even GPT-4. And if you wrap this type of workflow around GPT-4, it also does very well.

05:19

Let's pause here and think about what he just said. Zero-shot basically means you simply tell the large language model to do the thing: no examples, no chance to think or iterate, no fancy prompting, just do this thing. With that, GPT-3.5 got the HumanEval benchmark 48% correct. GPT-4 got 67%, which is a huge improvement, and we're going to keep seeing improvement when GPT-5 comes out and so on. However, look at this: GPT-3.5 wrapped in an agentic workflow, in any of these configurations, performs better than zero-shot GPT-4. Using only GPT-3.5, the LDB-plus-reflection result is actually nearly 100%; it's over 95%. Then, of course, if we wrap GPT-4 in an agentic workflow, MetaGPT for example, which we all know about, it performs incredibly well across the board, with AgentCoder near the top here. So it's really just showing the power of agentic workflows, and notice that GPT-3.5 with an agentic workflow actually outperforms GPT-4.

06:26

I think this has significant consequences for how we all approach building applications. "Agents" is a term that has been tossed around a lot; there are plenty of consultant reports about agents being the future of AI, blah blah blah. I want to be a bit concrete and share with you the broad design patterns I'm seeing in agents. It's a very messy, chaotic space, with tons of research and tons of open source; there's a lot going on. But let me try to categorize a bit more concretely what's going on in agents. Reflection is a tool that I think many of us just use, and it just works. Tool use, I think, is more widely appreciated, but it actually works pretty well. I think of these as pretty robust technologies.

07:08

Let's stop there and talk about what these things are. Reflection is as obvious as it sounds: you are literally saying to the large language model, reflect on the output you just gave me, find a way to improve it, then return another result, or just return the improvements. Very straightforward, and it seems so obvious, but it actually makes large language models perform a lot better. Then we have tool use, and we learned all about tool use with projects like AutoGen and CrewAI. Tool use just means you can give the model tools; you can custom-code tools, and it's like function calling. So you could say, okay, I want a web-scraping tool and I want an SEC lookup tool so I can get stock information about ticker symbols. You can even plug complex math libraries into it. The possibilities are literally endless. You give the model a bunch of tools it didn't previously have, you just describe what each tool does, and the large language model can actually choose when to use it. It's really cool.

08:10

When I use them, I can almost always get them to work well. Planning and multi-agent collaboration, I think, are more emerging. When I use them, sometimes my mind is blown by how well they work, but at least at this moment in time I don't feel like I can always get them to work reliably. So let me walk through these four design patterns.

08:30

Alright, he's going to walk through them, but I just want to touch on what planning and multi-agent collaboration are. With planning, we're basically giving the large language model the ability to think more slowly and to plan steps. That's usually why, in all of my LLM tests, I say "explain your reasoning step by step": it forces the model to plan and think through each step, which usually produces better results. Multi-agent collaboration is what AutoGen and CrewAI do. It's a very emergent technology, and I am extremely bullish on it. It is sometimes difficult to get the agents to behave the way you need them to, but with enough QA, testing, and iteration you usually can, and the results are phenomenal. Not only do you get the benefit of having the large language model essentially reflect with different personalities or different roles, you can actually have different models powering different agents, so you're getting the benefit of the reflection based on the quality of each model. You're basically getting genuinely different opinions as these agents work together. So let's keep listening.

09:35

If some of you go back and use these yourselves, or ask your engineers to use them, I think you'll get a productivity boost quite quickly. So, reflection. Here's an example. Say I ask a system, please write code for me for a given task. Then we have a coder agent, just an LLM that you prompt to write code, something like "def do_task(...)", write a function like that. An example of self-reflection would be if you then prompt the LLM with something like: here's code intended for a task, give it back the exact same code it just generated, and then say, check the code carefully for correctness, style, and efficiency, and give constructive criticism. Just write a prompt like that. It turns out the same LLM that you prompted to write the code may be able to spot problems, like "there's a bug on line 5; fix it by doing such and such." If you now take its own feedback, give it back, and reprompt it, it may come up with a version two of the code that could well work better than the first version. Not guaranteed, but it works often enough to be worth trying for a lot of applications.

10:32

What you usually see me doing in my LLM test videos is, for example, I say "write the game Snake in Python" and it gives me the game Snake. That is zero-shot: I'm just saying write it all out in one go. Then I take it, put it in my VS Code, play it, get the error or look for any bugs, and paste that back into the large language model to fix. That's essentially me acting as an agent, and what we can do is use an agent to automate me: look at the code, look for any potential errors, and even have agents that can run the code, get the error, and pass it back into the large language model. Now it's completely automated coding.

11:18

To foreshadow tool use: if you let it run unit tests, and it fails a unit test, then ask why it failed the unit test, have that conversation, figure out that it failed the test and should try changing something, and come up with V3.
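
As a rough sketch of that automated fix loop (generate code, run it, capture the failure, feed it back, regenerate), assuming a hypothetical `call_llm(prompt)` helper; running model-generated code like this should of course be sandboxed in practice:

```python
import subprocess
import tempfile

def call_llm(prompt: str) -> str:
    """Hypothetical helper: send a prompt to a chat model and return code as plain text."""
    raise NotImplementedError

def write_and_debug(task: str, max_attempts: int = 3) -> str:
    code = call_llm(f"Write a Python script for this task. Return only the code.\n{task}")
    for _ in range(max_attempts):
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            path = f.name
        # Run the generated script (sandbox this in real use) and capture any failure.
        result = subprocess.run(
            ["python", path], capture_output=True, text=True, timeout=30
        )
        if result.returncode == 0:
            return code  # it ran cleanly, so stop iterating
        # Paste the traceback back into the model, exactly as a human tester would.
        code = call_llm(
            f"Task: {task}\nYour code:\n{code}\n"
            f"Running it failed with:\n{result.stderr}\n"
            "Return a fixed version of the code, code only."
        )
    return code
```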

11:31

By the way, for those of you who want to learn more about these technologies (I'm very excited about them), for each of the four sections I have a little recommended reading section at the bottom that hopefully gives more references. And just to foreshadow multi-agent systems: I've described this as a single coder agent that you prompt to have this conversation with itself. One natural evolution of the idea is, instead of a single coder agent, to have two agents, where one is a coder agent and the second is a critic agent. These could be the same base LLM, but you prompt them in different ways: you tell one, you're an expert coder, write code; you tell the other, you're an expert code reviewer, review this code. This type of workflow is actually pretty easy to implement, and I think it's a very general-purpose technique; for a lot of workflows it will give you a significant boost in LLM performance.

12:22

The second design pattern is tool use. Many of you will already have seen LLM-based systems using tools. On the left is a screenshot from Copilot; on the right is something I extracted from GPT-4. If you ask an LLM today "what's the best coffee maker," it can do a web search; for some problems, LLMs will generate code and run the code. It turns out there are a lot of different tools that many different people are using for analysis, for gathering information, for taking action, for personal productivity. Interestingly, a lot of the early work on tool use came out of the computer vision community, because before large multimodal models, LLMs couldn't do anything with images, so the only option was for the LLM to generate a function call that could manipulate an image, like generating an image or doing object detection. If you look at the literature, it's been interesting how much of the work on tool use seems to have originated from vision, because LLMs were blind to images before GPT-4V, LLaVA, and so on. So that's tool use.

13:30

Alright, tool use is incredibly important, because you're basically giving the large language model code to use. It is hard-coded, so you always know the result; it's not another large language model that might produce something a little different each time. It's hard-coded and always produces the same output, so these tools are very valuable. And the cool thing about tools is that we don't have to rewrite them. We don't have to write them from scratch; these are tools that programmers have already built and tested for use in their own code. So whether it's external libraries or API calls, all of these things can now be used by large language models, and that is really exciting. We're not going to have to rewrite all of this tooling.

14:12

Then there's planning. For those of you who have not yet played much with planning algorithms: a lot of people talk about the ChatGPT moment, where you think, wow, I've never seen anything like this. I think if you haven't used planning algorithms, many people will have a kind of "AI agent wow" moment, where you couldn't imagine an AI agent doing that. I've run live demos where something failed and the AI agent rerouted around the failure; I've actually had quite a few of those moments where I can't believe my AI system just did that autonomously. One example I adapted from the HuggingGPT paper: you give an instruction like, please generate an image of a girl reading a book, with her pose the same as the boy in the image example.jpg, then please describe the new image with your voice. (By the way, I made a video about HuggingGPT; it's an amazing paper, and I'll link it in the description below.) Given an example like this, today we have AI agents that can decide: the first thing I need to do is determine the pose of the boy; then find the right model, maybe on Hugging Face, to extract the pose; then find a pose-to-image model to synthesize a picture of a girl following the instructions; then use image-to-text; and finally use text-to-speech. Today we actually have agents that, I don't want to say work reliably, they're kind of finicky and don't always work, but when it works, it's actually pretty amazing. And with agentic loops, sometimes you can recover from earlier failures as well.
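
To make the planning pattern concrete, here is a minimal sketch in the spirit of that example: the model emits a plan as a JSON list of steps, and a small loop executes them against a registry of step handlers. The `call_llm` helper and the handler names are hypothetical placeholders, not the actual HuggingGPT implementation.

```python
import json

def call_llm(prompt: str) -> str:
    """Hypothetical helper: send a prompt to a chat model and return its text reply."""
    raise NotImplementedError

# Placeholder step handlers; in a HuggingGPT-style system each would call a real model.
def detect_pose(args: dict) -> str:
    return "(pose landmarks)"

def pose_to_image(args: dict) -> str:
    return "(generated image)"

def image_to_text(args: dict) -> str:
    return "(caption of the generated image)"

def text_to_speech(args: dict) -> str:
    return "(audio file)"

HANDLERS = {
    "detect_pose": detect_pose,
    "pose_to_image": pose_to_image,
    "image_to_text": image_to_text,
    "text_to_speech": text_to_speech,
}

def plan_and_execute(request: str) -> list:
    # Ask the model to lay out a plan before anything is executed.
    plan = json.loads(call_llm(
        f"Request: {request}\n"
        f"Available steps: {sorted(HANDLERS)}\n"
        'Reply with a JSON list such as [{"step": "detect_pose", "args": {}}, ...].'
    ))
    results = []
    for item in plan:
        # Execute each planned step in order, collecting intermediate outputs.
        results.append(HANDLERS[item["step"]](item.get("args", {})))
    return results
```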

15:42

So yeah, that's a really important point: agents are a little bit finicky, but since you can iterate, and the agents can usually recover from their own issues, that makes them a lot more powerful. As we continue to evolve agents, as we get better agentic models, better tooling, and better frameworks like CrewAI and AutoGen, all of these finicky aspects of agents are going to be reduced tremendously.

16:09

I already find myself using research agents in some of my work. When I want a piece of research but I don't feel like Googling it myself and spending a long time on it, I send it to a research agent, come back a few minutes later, and see what it has come up with. It sometimes works and sometimes doesn't, but it's already part of my personal workflow.

16:25

The final design pattern is multi-agent collaboration. This is one of those funny things, but it works much better than you might think. On the left is a screenshot from a paper called ChatDev (I made a video about this; it will be in the description below as well), which is completely open source. Many of you saw the flashy social media demo of Devin; ChatDev is open source and runs on my laptop. What ChatDev does is an example of a multi-agent system: you prompt one LLM to sometimes act like the CEO of a software company, sometimes a designer, sometimes a product manager, sometimes a tester. This flock of agents, which you build by prompting an LLM and telling it "you're now the CEO," "you're now a software engineer," collaborates in an extended conversation, so that if you tell it, please develop a game, develop a Gomoku game, they'll actually spend a few minutes writing code, testing it, iterating, and then generate surprisingly complex programs. It doesn't always work; I've used it, and sometimes it doesn't work and sometimes it's amazing, but this technology is really getting better. And one more design pattern: it turns out that multi-agent debate, where you have different agents, for example ChatGPT and Gemini, debate each other, also results in better performance.
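
A bare-bones sketch of that multi-agent debate idea might look like the following; `call_model_a` and `call_model_b` are hypothetical stand-ins for two different model backends, and the judging step at the end is just one simple way to close the loop.

```python
def call_model_a(prompt: str) -> str:
    """Hypothetical wrapper around one model backend."""
    raise NotImplementedError

def call_model_b(prompt: str) -> str:
    """Hypothetical wrapper around a different model backend."""
    raise NotImplementedError

def debate(question: str, rounds: int = 2) -> str:
    answer_a = call_model_a(f"Answer the question:\n{question}")
    answer_b = call_model_b(f"Answer the question:\n{question}")
    for _ in range(rounds):
        # Each model sees the other's latest answer and may revise its own.
        answer_a = call_model_a(
            f"Question: {question}\nYour answer: {answer_a}\n"
            f"Another model answered: {answer_b}\nCritique it and give your revised answer."
        )
        answer_b = call_model_b(
            f"Question: {question}\nYour answer: {answer_b}\n"
            f"Another model answered: {answer_a}\nCritique it and give your revised answer."
        )
    # Let one of the models act as judge and merge the final positions.
    return call_model_a(
        f"Question: {question}\nAnswer 1: {answer_a}\nAnswer 2: {answer_b}\n"
        "Combine them into the single best final answer."
    )
```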

17:55

Alright, so he said the important part right there: when you have different agents, each powered by a different model, maybe even fine-tuned models, fine-tuned specifically for their task and their role, you get really good performance. And that is exactly what a project like CrewAI or AutoGen is made for.

18:14

So having multiple simulated AI agents work together has been a powerful design pattern as well. To summarize, these are the patterns I've seen, and I think that if we use these patterns in our work, a lot of us can get a productivity boost quite quickly. I think agentic reasoning design patterns are going to be important. This is my small slide: I expect the set of tasks AI can do will expand dramatically this year because of agentic workflows. One thing that is actually difficult for people to get used to is that when we prompt an LLM, we want a response right away. In fact, a decade ago, when I was having discussions at Google about what we called big-box search, where you type in a long prompt, one of the reasons I failed to push for it successfully was that when you do a web search, you want a response back in half a second. That's just human nature; we like that instant feedback. But for a lot of agent workflows, I think we'll need to learn to delegate a task to an AI agent and patiently wait minutes, maybe even hours, for a response. It's just as I've seen a lot of novice managers delegate something to someone and then check in five minutes later, which is not productive: it will be difficult, but I think we need to learn to do the same with some of our AI agents.

19:41

This is actually a point where I want to pose a different way of thinking about it. Think about Groq. With Groq you get 500, 700, 850 tokens per second with their architecture, and all of a sudden the agents, which you usually expect to take a few minutes for a semi-complex task, all the way up to 10, 15, or 20 minutes depending on the task, look different. A lot of the time in that task completion is the inference running, assuming you're getting 10, 15, or 20 tokens per second with OpenAI. But if you can get 800 tokens per second, it's essentially instant. A lot of people, when they first saw Groq, thought, what's the point of 800 tokens per second when humans can't read that fast? This is the best use case for it: agents using hyper-fast inference and reading each other's responses is the best way to leverage that really fast inference speed, because humans don't actually need to read it. So this is a perfect example. If all of a sudden that part of your agent workflow is extremely fast, and then, say, we get an embeddings model that's just as fast, all of a sudden the slowest part of the entire agent workflow is going to be searching the web or hitting a third-party API. It's no longer going to be the inference and the embeddings, and that is really exciting. Let's keep watching.
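
To put rough numbers on that point (the figures are illustrative, not measurements from the video): an agent run that generates 50,000 tokens in total across its iterations takes very different wall-clock time at 20 tokens per second than at 800.

```python
# Back-of-envelope: wall-clock time spent on generation for an iterative agent run.
total_tokens = 50_000  # illustrative total across all iterations

for tokens_per_second in (20, 800):
    seconds = total_tokens / tokens_per_second
    print(f"{tokens_per_second:>4} tok/s -> {seconds / 60:.1f} minutes of pure generation")

# 20 tok/s  -> ~41.7 minutes of pure generation
# 800 tok/s -> ~1.0 minute, at which point web calls and other I/O dominate instead
```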

21:05

And then one other important trend: fast token generation matters, because with these agentic workflows we're iterating over and over, so the LLM is generating tokens for an LLM to read. I think that generating more tokens really quickly from even a slightly lower-quality LLM might give good results compared with slower tokens from a better LLM; maybe that's a little controversial, but it's because it may let you go around this loop a lot more times, kind of like the results I showed with GPT-3.5 and an agent architecture on the first slide. And candidly, I'm really looking forward to Claude 5 and Claude 4 and GPT-5 and Gemini 2.0 and all the other wonderful models many of you are building. Part of me feels that if you're looking forward to running your application on GPT-5 zero-shot, you may be able to get closer to that level of performance on some applications than you might think by using agentic reasoning on an earlier model. I think this is an important trend. And honestly, the path to AGI feels like a journey rather than a destination, but I think this type of agent workflow could help us take a small step forward on that very long journey. Thank you.

22:19

Okay, so he said a lot of important things at the end there. One thing he said is that if you're already looking forward to GPT-5 or Claude 4, basically the next generation of cutting-edge models, you might be able to achieve ...

23:28

... and what's the cost of all these tokens, and all of that, I think, is going to get sorted out as models become more and more commoditized. So I'm super excited about agents, I'm super excited about inference speed improvements, and I hope you liked Andrew Ng's talk. If you liked this video, please consider giving it a like and subscribing, and I'll see you in the next one.


Related tags

AI agents · Agentic workflows · Dr. Andrew Ng · GPT-3.5 · Tech innovation · Silicon Valley venture capital · Coding benchmarks · Multi-agent collaboration · Self-reflection · Tool use