AI Pioneer Shows The Power of AI AGENTS - "The Future Is Agentic"

Matthew Berman
29 Mar 2024 · 23:47

Summary

Matthew Berman walks through Dr. Andrew Ng's talk at Sequoia about AI agents. Ng argues that agentic design patterns (reflection, tool use, planning, and multi-agent collaboration) dramatically improve what large language models can do: GPT-3.5 wrapped in an agentic workflow can beat zero-shot GPT-4 on coding benchmarks. He expects these workflows, helped along by faster token generation, to greatly expand the set of tasks AI can perform this year and to be a small step on the road to AGI.

Takeaways

  • 🧠 Dr. Andrew Ng is very bullish on AI agents, noting that agents powered by GPT-3.5 can reach the reasoning level of GPT-4.
  • 🎓 Dr. Andrew Ng is a computer scientist, co-founder of Google Brain, former Chief Scientist of Baidu, and a pioneer in the field of artificial intelligence.
  • 🌐 Coursera, which Ng co-founded, lets anyone learn computer science, math, and many other subjects for free.
  • 💡 Sequoia Capital is one of Silicon Valley's most legendary venture capital firms; its portfolio companies account for 25% of the NASDAQ's total market capitalization.
  • 🔍 In a non-agentic workflow you type a prompt and get a generated answer straight back, whereas in an agentic workflow the AI can iterate and improve its answer.
  • 🤖 In agentic workflows, multiple agents with different roles and tools iterate on a task together to produce the best final result.
  • 📈 With zero-shot prompting, GPT-3.5 solved only 48% of the coding problems, whereas GPT-3.5 wrapped in an agentic workflow outperformed GPT-4.
  • 🛠 Reflection is the process of having a language model find ways to improve its own output, which raises the model's performance.
  • 🔧 Tool use means having the AI call specific tools to carry out specific tasks, which is how a language model gains capabilities it did not previously have.
  • 📝 Planning and multi-agent collaboration are workflows in which the AI plans its steps and multiple agents cooperate to produce better results.
  • 🔑 In his talk, Andrew Ng suggests these agentic design patterns could dramatically expand the set of tasks AI can perform this year.
  • 🚀 Agentic logic and workflows will play an important role in the future of AI and help take a step forward on the road to AGI (artificial general intelligence).

Q & A

  • Who is Dr. Andrew Ng?

    -Dr. Andrew Ng is a computer scientist, co-founder and former head of Google Brain, former Chief Scientist of Baidu, and a co-founder of the online learning platform Coursera, and he is one of the most influential thinkers in artificial intelligence.

  • What kind of company is Sequoia Capital?

    -Sequoia Capital is a legendary Silicon Valley venture capital firm whose portfolio companies account for 25% of the NASDAQ's total market capitalization.

  • What does Dr. Andrew Ng mean by "agents"?

    -The agents Dr. Andrew Ng describes are self-contained AI systems, built on large language models such as GPT-3.5 and GPT-4, that can carry out sophisticated reasoning and iterative work, and that he believes will shape the future of artificial intelligence.

  • What is an agentic workflow?

    -An agentic workflow is a process in which multiple agents with different roles and tools work together and iterate repeatedly in order to complete a task.

  • What strength of agents does Dr. Andrew Ng point to?

    -The strength he points to is that multiple agents can divide up different roles and iterate on the work, which produces a better final result.

  • What is the advantage of the tools agents use?

    -The advantage is that the AI can call code that provides exactly the functionality a task needs, which makes results more predictable and extends what the AI can do.

  • What is the "self-reflection" process Dr. Andrew Ng describes?

    -Self-reflection is the process in which the AI evaluates its own generated output, looks for improvements, and produces a better result.

  • What is the benefit of multi-agent collaboration?

    -The benefit is that different agents bring different perspectives and expertise, and by cooperating to solve a problem they can produce higher-quality results.

  • How are the agentic workflows Dr. Andrew Ng describes applied in real applications?

    -Developers and researchers apply them by combining steps such as planning, iteration, and tool use so that AI agents complete tasks more efficiently and with higher quality.

  • What does Dr. Andrew Ng think about the future of agents?

    -He is very optimistic: he believes agents will play an important role in the evolution of artificial intelligence, and he expects agentic workflows to greatly expand the set of tasks AI can handle and to push AI capabilities further.

Outlines

00:00

🤖 The Future and Impact of AI Agents

Introduces Dr. Andrew Ng's talk at Sequoia and discusses the importance and potential of AI agents. Ng co-founded Google Brain and is a major figure in AI. He says that agents powered by GPT-3.5 or GPT-4 can carry out real reasoning and explains why agentic workflows beat non-agentic ones. He also uses examples from Sequoia's portfolio to underline how good the firm is at picking technology winners.

05:02

🔧 Agentic Design Patterns and Tool Use

Covers the agentic design patterns and tool use. Reflection, tool use, and planning plus multi-agent collaboration are presented as ways to improve agent performance. Multi-agent collaboration in particular resembles how humans work: several agents with different roles cooperate to produce the final deliverable.

10:04

🛠 Agent Self-Reflection and Code Improvement

Explains how self-reflection raises code quality. A large language model (LLM) is asked to re-evaluate the code it just generated and propose improvements. This process helps not only with finding bugs but also with optimization and better structure.
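As a rough sketch of that loop (not code from the video), the snippet below generates code, tries to run it, and feeds any traceback back into the model for another attempt. The `llm` function is a hypothetical placeholder for whatever model call you actually use.

```python
# Sketch of the "write code, run it, feed the error back" loop described above.
# `llm` is a hypothetical stand-in for a real chat-completion call.
import traceback

def llm(prompt: str) -> str:
    return "def solve():\n    return 42"  # placeholder model output

def generate_with_self_reflection(task: str, max_rounds: int = 3) -> str:
    code = llm(f"Write Python code for this task:\n{task}")
    for _ in range(max_rounds):
        try:
            exec(compile(code, "<generated>", "exec"), {})  # try to run the code
            return code                                      # it ran cleanly
        except Exception:
            error = traceback.format_exc()
            code = llm(
                "This code raised an error. Fix it and return only the corrected code.\n"
                f"Code:\n{code}\nError:\n{error}"
            )
    return code
```

In practice you would replace the bare `exec` with the project's unit tests, which is the unit-test variant Matthew mentions later in the transcript.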

15:05

🔄 Applying Agent Planning and Multi-Agent Collaboration

Shows how the planning and multi-agent collaboration design patterns broaden what AI agents can be used for. Giving an AI the ability to plan lets it recover from failures and produce better results, and multi-agent collaboration lets different models take on different roles and tackle more sophisticated tasks.

20:07

🚀 The Evolution of Agents and the Road to AGI

Looks at how agents are evolving and at the road to artificial general intelligence (AGI). Faster responses make agentic workflows quicker and more iterative, and faster token generation could let agents exploit their iterative loops more effectively and improve overall performance.
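To make the speed point concrete, here is a small back-of-the-envelope calculation; the call counts and token sizes are assumptions for illustration, not figures from the talk.

```python
# Rough wall-clock estimate for an agentic loop at different generation speeds.
def loop_minutes(calls: int, tokens_per_call: int, tokens_per_second: float) -> float:
    return calls * tokens_per_call / tokens_per_second / 60

for tps in (20, 800):  # roughly: a typical hosted API vs. Groq-class throughput
    minutes = loop_minutes(calls=30, tokens_per_call=1000, tokens_per_second=tps)
    print(f"{tps:>4} tokens/s -> {minutes:.1f} minutes of pure generation")
# 20 tokens/s  -> ~25 minutes for a 30-call loop
# 800 tokens/s -> well under a minute, so web searches and third-party APIs
#                 become the slowest part of the workflow
```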

Keywords

💡Agents

In artificial intelligence, an agent is a program or algorithm that acts autonomously to complete a task. The video stresses that, compared with a non-agentic workflow, an agentic workflow lets agents exercise much stronger problem-solving ability. For example, the video explains that "an agent can outline an essay, run web searches, write a draft, and make revisions, iterating until it reaches the best result."
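As a loose illustration of that iterative workflow (an assumption-laden sketch, not code from the video), the snippet below chains outline, draft, and revision steps around a single model call; `llm` is a hypothetical placeholder for a real chat-completion function.

```python
# Sketch of an agentic essay workflow: outline, draft, then revise in a loop.
# `llm` is a hypothetical stand-in for a real model call.
def llm(prompt: str) -> str:
    return f"[model output for: {prompt[:40]}...]"  # placeholder

def write_essay_agentically(topic: str, revisions: int = 2) -> str:
    outline = llm(f"Write a short outline for an essay about: {topic}")
    draft = llm(f"Using this outline, write a first draft:\n{outline}")
    for _ in range(revisions):
        feedback = llm(f"Read this draft and list the parts that need revision:\n{draft}")
        draft = llm(f"Revise the draft to address this feedback:\n{feedback}\n\nDraft:\n{draft}")
    return draft
```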

💡Dr. Andrew Ng

Dr. Andrew Ng is a computer scientist known as a co-founder and former head of Google Brain and former Chief Scientist of Baidu. In the video he talks about the future of artificial intelligence and is especially optimistic about agents. For example, the video notes that "Dr. Andrew Ng believes the future of artificial intelligence is agentic and has spoken about its potential many times."

💡Sequoia

Sequoia is a legendary Silicon Valley venture capital firm with many technology companies in its portfolio. The video touches on Sequoia's success, noting that "Sequoia's portfolio companies account for 25% of the NASDAQ's total market capitalization."

💡GPT

GPT stands for Generative Pre-trained Transformer, a family of large language models used for natural-language tasks. The video suggests that models such as GPT-3.5 and GPT-4 can be embedded in agentic workflows and take on more sophisticated tasks, explaining for example that "GPT-3.5 wrapped in an agentic workflow outperforms GPT-4."

💡Zero-shot prompting

Zero-shot prompting means instructing an AI model to perform a task in a single pass, expecting it to get it right the first time. The video notes that GPT-3.5 reached only 48% accuracy this way, and that performance improves when an agentic workflow is used instead.

💡Iteration

Iteration is the process of repeating work in order to improve it. The video emphasizes the agents' ability to run a task iteratively and optimize the final result, explaining for example that "this workflow is iterative: the agent revises the article, thinks some more, and repeats until it reaches the best outcome."
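A minimal sketch of that reflect-and-improve loop, under the assumption of a generic `llm` helper (hypothetical, not from the video), might look like this:

```python
# Reflection loop: ask the model to critique its own answer, then revise it,
# stopping early if it reports nothing left to improve.
def llm(prompt: str) -> str:
    return "DONE" if prompt.startswith("List") else f"[draft answer for: {prompt[:40]}]"  # placeholder

def reflect_and_improve(task: str, max_rounds: int = 3) -> str:
    answer = llm(task)
    for _ in range(max_rounds):
        critique = llm(f"List concrete ways to improve this answer, or reply DONE:\n{answer}")
        if critique.strip() == "DONE":
            break  # the model sees nothing left to fix
        answer = llm(f"Rewrite the answer applying this critique:\n{critique}\n\nAnswer:\n{answer}")
    return answer
```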

💡Tool use

Tool use refers to the tools and libraries an AI agent can call to complete a task. The video shows that tool use extends what an AI can do and lets it handle more complex tasks, mentioning for example that "an AI agent can gather information using tools such as a web-scraping tool or an SEC lookup tool."
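The sketch below shows one way such a tool step can be wired up: the model is shown a small catalog of plain Python functions and names the one it wants. The `web_search` and `sec_lookup` functions and the `llm` helper are hypothetical stand-ins, not APIs from the video.

```python
# Tool use sketch: deterministic Python functions exposed to the model,
# which picks a tool by name; all names here are illustrative placeholders.
def web_search(query: str) -> str:
    return f"(pretend search results for '{query}')"

def sec_lookup(ticker: str) -> str:
    return f"(pretend SEC filing summary for {ticker})"

TOOLS = {"web_search": web_search, "sec_lookup": sec_lookup}

def llm(prompt: str) -> str:
    return "web_search: best coffee maker"  # placeholder tool choice

def answer_with_tools(question: str) -> str:
    choice = llm(
        "Pick one tool and its argument as 'name: argument'.\n"
        f"Tools: {list(TOOLS)}\nQuestion: {question}"
    )
    name, _, arg = choice.partition(":")
    observation = TOOLS[name.strip()](arg.strip())  # run the chosen tool
    return llm(f"Question: {question}\nTool result: {observation}\nFinal answer:")
```

Because the tools themselves are ordinary code, their output is predictable in a way a second model call would not be, which is the point made above.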

💡Planning

Planning is the up-front thinking and strategy that goes into completing a task. The video stresses that giving agents the ability to plan leads to better results, explaining for example that "through planning, an AI agent can recover from failures."
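As a rough sketch (again assuming a hypothetical `llm` helper), the planning pattern can be as simple as asking for a numbered plan first and then executing the steps one at a time:

```python
# Planning sketch: get a step-by-step plan, then work through it,
# carrying the accumulated results forward as context.
def llm(prompt: str) -> str:
    return "1. gather sources\n2. summarize findings\n3. draft report"  # placeholder

def plan_and_execute(goal: str) -> str:
    plan = llm(f"Break this goal into short numbered steps:\n{goal}")
    log = ""
    for step in (s for s in plan.splitlines() if s.strip()):
        log += "\n" + llm(f"Goal: {goal}\nDone so far:{log}\nNow do: {step}")
    return llm(f"Goal: {goal}\nWork log:{log}\nWrite the final result.")
```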

💡Multi-agent collaboration

Multi-agent collaboration is a process in which several agents cooperate to complete a task. The video suggests this improves AI performance and gives the example that "in a multi-agent system, one agent acts as the CEO, one as the designer, and one as the software engineer, and together they build a game."
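A minimal sketch of that pattern, assuming a hypothetical `llm(role, prompt)` helper (in practice each role could even be served by a different model), is shown below:

```python
# Multi-agent sketch: the same or different base models are prompted into
# different roles and pass the work product back and forth.
def llm(role: str, prompt: str) -> str:
    return f"[{role} output for: {prompt[:40]}...]"  # placeholder

def coder_and_reviewer(task: str, rounds: int = 2) -> str:
    code = llm("expert coder", f"Write code for: {task}")
    for _ in range(rounds):
        review = llm("expert code reviewer", f"Review this code and list problems:\n{code}")
        code = llm("expert coder", f"Revise the code to address this review:\n{review}\n\nCode:\n{code}")
    return code
```

This is the two-agent coder/critic setup the transcript also describes; frameworks such as AutoGen and Crew AI wrap the same idea with more roles and tooling.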

💡The future of artificial intelligence

The future of artificial intelligence is the video's central topic, with particular emphasis on Dr. Andrew Ng's belief that agents will shape that future. The video discusses how AI is evolving, how the set of tasks it can handle is expanding, and how agents may sit at the center of that, noting that "Dr. Andrew Ng believes the future of artificial intelligence is agentic and emphasizes the importance of agents."

Highlights

Dr Andrew Ng is extremely bullish on agents and believes they will drive the progress of artificial intelligence.

Andrew Ng is a computer scientist, co-founder and former head of Google Brain, and former Chief Scientist of Baidu.

Sequoia is one of Silicon Valley's best-known venture capital firms; its portfolio companies account for 25% of the NASDAQ's total market capitalization.

A comparison of non-agentic and agentic workflows shows the iterative advantage of the agentic approach.

Agentic workflows let multiple agents work together, each playing a different role, iterating jointly on a task.

Agentic workflows outperform a single large language model (LLM) on the HumanEval benchmark.

Wrapped in an agentic workflow, GPT-3.5 even outperforms GPT-4.

Andrew Ng lays out several broad design patterns seen in agents: reflection, tool use, planning, and multi-agent collaboration.

Reflection is a technique in which a large language model evaluates and improves its own output.

Tool use lets a large language model call predefined tools to carry out specific tasks.

Planning lets a large language model think more slowly and work out tasks step by step.

Multi-agent collaboration shows how agents with different roles and backgrounds can work together.

On coding benchmarks, agentic workflows perform better than zero-shot prompting.

Andrew Ng believes agentic workflows will greatly expand the range of tasks AI can perform.

Fast token generation matters for agentic workflows because it allows more frequent iteration.

Andrew Ng sees agentic workflows as a possible step toward artificial general intelligence (AGI).

The video recommends further reading for viewers who want to dig deeper into these techniques.

Transcripts

play00:00

Dr Andrew Ng just did a talk at

play00:03

Sequoia and is all about agents and he

play00:07

is incredibly bullish on agents he said

play00:09

things like GPT 3.5 powering agents can

play00:12

actually reason to the level of GPT 4

play00:15

and a lot of other really interesting

play00:17

tidbits so we're going to watch his talk

play00:19

together and I'm going to walk you

play00:20

through step by step what he's saying

play00:22

and why it's so important I am

play00:24

incredibly bullish on agents myself

play00:26

that's why I make so many videos about

play00:28

them and I truly believe the future of

play00:30

artificial intelligence is going to be a

play00:33

gentic so first who is Dr Andrew Ng he

play00:36

is a computer scientist he was the

play00:38

co-founder and head of Google brain the

play00:41

former Chief scientist of Baidu and a

play00:44

leading mind in artificial intelligence

play00:47

he went to UC Berkeley MIT and Carnegie

play00:50

Mellon so smart smart dude and he

play00:52

co-founded this company Coursera where

play00:54

you can learn a ton about computer

play00:57

science about math a bunch of different

play00:59

topics absolutely free and so what he's

play01:02

doing is truly incredible and so when he

play01:05

talks about AI you should listen so

play01:07

let's get to this talk this is at Sequoia

play01:11

and if you're not familiar with Sequoia

play01:13

they are one of the most legendary

play01:14

Silicon Valley venture capital firms

play01:17

ever now here's an interesting stat

play01:18

about Sequoia that just shows how

play01:20

incredible they are at picking

play01:22

technological winners their portfolio of

play01:24

companies represents more than 25% of

play01:27

Today's total value of the the NASDAQ so

play01:31

the total value of all the companies

play01:33

that are listed on the NASDAQ 25% of

play01:35

that market capitalization are companies

play01:37

that are owned or have been owned or

play01:40

invested in by Sequoia incredible stat

play01:43

let's look at some of their companies

play01:44

Reddit Instacart DoorDash Airbnb a

play01:48

little company called Apple Block

play01:50

Snowflake Vanta Zoom Stripe WhatsApp

play01:55

Okta Instagram this list is absolutely

play01:58

absurd all right enough of the preface

play02:01

let me get into the Talk itself so a

play02:03

agents you know today the way most of us

play02:05

use large language models is like this with a

play02:08

non- agentic workflow where you type a

play02:10

prompt and generates an answer and

play02:12

that's a bit like if you ask a person to

play02:15

write an essay on a topic and I say

play02:17

please sit down to the keyboard and just

play02:19

type the essay from start to finish

play02:21

without ever using backspace um and

play02:24

despite how hard this is LLMs do it

play02:26

remarkably well in contrast with an

play02:30

agentic workflow this is what it may

play02:32

look like have an AI have an LM say

play02:34

write an essay outline do you need to do

play02:37

any web research if so let's do that

play02:39

then write the first draft and then read

play02:42

your own first draft and think about

play02:44

what parts need revision and then revise

play02:46

your draft and you go on and on and so

play02:49

this workflow is much more iterative

play02:51

where you may have the LLM do some

play02:54

thinking um and then revise this article

play02:57

and then do some more thinking and

play02:59

iterate this

play03:00

through a number of times so I want to

play03:02

pause it there and talk about this

play03:03

because this is the best explanation for

play03:06

why agents are so powerful I've heard a

play03:08

lot of people say well agents are just

play03:10

llms right and yeah technically that's

play03:13

true but the power of an agentic

play03:15

workflow is the fact that you can have

play03:17

multiple agents all with different roles

play03:19

different backgrounds different personas

play03:21

different tools working together and

play03:23

iterating that's the important word

play03:26

iterating on a task so in this example

play03:28

he said okay write an essay and yeah an

play03:31

llm can do that and usually it's pretty

play03:34

darn good but now let's say you have one

play03:36

agent who is the writer another agent

play03:39

who is the reviewer another for the

play03:41

spell checker another for the grammar

play03:42

checker another for the fact Checker and

play03:45

they're all working together and they

play03:47

iterate over and over again passing the

play03:49

essay back and forth making sure that it

play03:51

finally ends up to be the best possible

play03:53

outcome and so this is how humans work

play03:57

humans as he said do not just do

play04:00

everything in one take without thinking

play04:02

through and planning we plan we iterate

play04:05

and then we find the best solution so

play04:07

let's keep listening what not many

play04:08

people appreciate is this delivers

play04:11

remarkably better results um I've

play04:13

actually really surprised myself working

play04:15

these agent workflows how well how well

play04:18

they work other let's do one case study

play04:20

at my team analyzed some data using a

play04:23

coding Benchmark called the human eval

play04:25

Benchmark released by OpenAI a few years

play04:27

ago um but this says coding problems

play04:29

like given the non-empty list of integers

play04:32

return the sum of all the odd elements

play04:33

that are in even positions and it turns out

play04:35

the answer is a code snippet like that

play04:37

so today lot of us will use zero shot

play04:40

prompting meaning we tell the AI write

play04:42

the code and have it run on the first

play04:44

spot like who codes like that no human

play04:46

codes like that just type out the code

play04:47

and run it maybe you do I can't do that

play04:50

um so it turns out that if you use GPT

play04:53

3.5 uh zero shot prompting it gets it

play04:56

48% right uh GPT-4 way better 67% right

play05:02

but if you take an agentic workflow and

play05:04

wrap it around GPT 3.5 say it actually

play05:08

does better than even

play05:10

GPT-4 um and if you were to wrap this

play05:13

type of workflow around GPT-4 you know it

play05:16

it it also um does very well all right

play05:19

let's pause here and think about what he

play05:20

just said over here we have the zero

play05:23

shot which basically means you're simply

play05:25

telling the large language model do this

play05:27

thing not giving it any example not

play05:30

giving it any chance to think or to

play05:31

iterate or any fancy prompting just do

play05:34

this thing and it got the human eval

play05:36

Benchmark 48% correct then GPT 4 67%

play05:40

which is you know a huge Improvement and

play05:42

we're going to continue to see

play05:43

Improvement when GPT 5 comes out and so

play05:45

on however look at this GPT 3.5 wrapped

play05:49

in an agentic workflow any of these all

play05:53

perform better than the zero shot GPT 4

play05:56

using only GPT 3.5 and this lb BD plus

play06:00

reflection it's actually nearly 100%

play06:02

it's over 95% then of course if we wrap

play06:05

GPT 4 in the agentic workflow MetaGPT

play06:08

for example we all know about it

play06:10

performs incredibly well across the

play06:12

board and AgentCoder kind of at the top

play06:15

here so it's really just showing the

play06:17

power of agentic workflows and you

play06:19

notice that GPT 3.5 with an agentic

play06:22

workflow actually outperforms

play06:26

GPT-4 um and I think this has and this

play06:29

means that this has significant consequences

play06:31

I think how we all approach building

play06:33

applications so agents is the term has

play06:36

been tossed around a lot there's a lot

play06:38

of consultant reports how about agents

play06:40

the future of AI blah blah blah I want

play06:42

to be a bit concrete and share of you um

play06:44

the broad design patterns I'm seeing in

play06:47

agents it's a very messy chaotic space

play06:49

tons of research tons of Open Source

play06:51

there's a lot going on but I try to

play06:53

categorize um bit more concretely what's

play06:55

going on agents reflection is a tool

play06:58

that I think many of us are just use it

play07:00

just works uh tool use I think it's more

play07:03

widely appreciated but actually works

play07:04

pretty well I think of these as pretty

play07:06

robust Technologies when I all right

play07:08

let's stop there and talk about what

play07:09

these things are so reflection is as

play07:12

obvious as it sounds you are literally

play07:14

saying to the large language model

play07:17

reflect on the output you just gave me

play07:19

find a way to improve it then return

play07:22

another result or just return the

play07:23

improvements so very straightforward and

play07:26

it seems so obvious but this actually

play07:29

causes large language models to perform

play07:31

a lot better and then we have tool use

play07:33

and we learned all about tool use with

play07:35

projects like AutoGen and Crew AI tool

play07:38

use just means that you can give them

play07:40

tools to use you can custom code tools

play07:43

it's like function calling so you could

play07:45

say Okay I want a web scraping tool and

play07:48

I want an SEC lookup tool so you can get

play07:51

stock information about ticker symbols

play07:53

you can even plug in complex math

play07:57

libraries to it I mean the possibilities

play07:59

are literally endless so you can give a

play08:01

bunch of tools that the large language

play08:03

model didn't previously have you just

play08:05

describe what the tool does and the

play08:06

large language model can actually choose

play08:08

when to use the tool it's really cool

play08:10

use them I can you know almost always

play08:12

get them to work well um planning and

play08:15

multi-agent collaboration I think is

play08:17

more emerging when I use them sometimes

play08:20

my mind is blown for how well they work

play08:22

but at least at this moment in time I

play08:23

don't feel like I can always get them to

play08:25

work reliably so let me walk through

play08:28

these full design Pat

play08:30

all right so he's going to walk through

play08:31

it but I just want to touch on what

play08:32

planning and multi-agent collaboration

play08:34

is so planning we're basically saying

play08:36

giving the large language model the

play08:38

ability to think more slowly to plan

play08:40

steps and that's usually by the way why

play08:42

in all of my llm tests I say explain

play08:44

your reasoning step by step because that

play08:46

kind of forces them to plan and to think

play08:49

through each step which usually produces

play08:52

better results and then multi-agent

play08:54

collaboration that is AutoGen and Crew

play08:56

AI that is a very emergent technology

play08:59

I am extremely bullish on it

play09:01

it is sometimes difficult to get the

play09:03

agents to behave like you need them to

play09:06

but with enough QA and enough testing

play09:08

and iteration you usually can and the

play09:10

results are phenomenal and not only do

play09:13

you get the benefit of having the large

play09:15

language model essentially reflect with

play09:17

different personalities or different

play09:18

roles but you can actually have

play09:21

different models powering different

play09:22

agents and so you're getting the benefit

play09:24

of the reflection based on the quality

play09:26

of each model so you're basically

play09:28

getting really different opinions as

play09:30

these agents are working together so

play09:32

let's keep listening and if some of you

play09:35

go back and yourself will ask your

play09:36

engineers to use these I think you get a

play09:38

productivity boost quite quickly so

play09:40

reflection here's an example let's say I

play09:43

ask a system please write code for me for

play09:46

a given task then we have a coder agent

play09:49

just an LM that you prompt to write code

play09:51

to say you def do_task write a function

play09:54

like that um an example of

play09:57

self-reflection would be if you then

play09:59

prompt the LM with something like this

play10:01

here's code intended for a task and just

play10:03

give it back the exact same code that

play10:05

they just generated and then say check

play10:07

the code carefully for correctness style and

play10:09

efficiency and give constructive criticism just

play10:10

write a prompt like that it turns out

play10:12

the same L that you prompted to write

play10:14

the code may be able to spot problems

play10:17

like this bug in line five and fix it by

play10:19

blah blah blah and if you now take his

play10:21

own feedback and give it to it and

play10:22

reprompt it it may come up with a

play10:25

version two of the code that could well

play10:26

work better than the first version not

play10:28

guaranteed but it works you know often

play10:30

enough but this to be worth trying for a

play10:32

law of appli so what you usually see me

play10:34

doing in my llm test videos is for

play10:36

example let's say I say write the Game

play10:38

snake in Python and it gives me the game

play10:41

Snake it's that is zero shot I'm just

play10:44

saying write it all out in one go then I

play10:47

take it I put it in my VS Code I play

play10:50

it I get the error or I look for any

play10:52

bugs and then I paste that back in to

play10:56

the large language model to fix now

play10:58

that's essentially me acting as an agent

play11:00

and what we can do is use an agent to

play11:02

automate me so basically look at the

play11:05

code look for any potential errors and

play11:07

even agents that can run the code get

play11:11

the error and pass it back into the

play11:13

large language model now it's completely

play11:15

automated coding to foreshadow tool use if

play11:18

you let it run unit tests if it fails a

play11:20

unit test then why do you fail the unit

play11:23

test have that conversation and be able

play11:24

to figure out failed the unit test so

play11:26

you should try changing something and

play11:28

come up with V3 by the way for those of

play11:31

you that want to learn more about these

play11:32

Technologies I'm very excited about them

play11:34

for each of the four sections I have a

play11:36

little recommended reading section in

play11:37

the bottom that you know hopefully gives

play11:39

more references and again just the

play11:41

foreshadow of multi-agent systems I've

play11:44

described as a single coder agent that

play11:46

you prompt to have it you know have this

play11:48

conversation with itself um one Natural

play11:51

Evolution of this idea is instead of a

play11:53

single code agent you can have two

play11:56

agents where one is a code agent and the

play11:58

second is a critic agent and these could

play12:01

be the same base LM model but they you

play12:04

prompt in different ways where you say

play12:06

one you're an expert coder write code the

play12:08

other one say you're an expert code reviewer as

play12:10

to review this code and this type of

play12:12

workflow is actually pretty easy to

play12:13

implement I think such a very general

play12:16

purpose technology for a lot of

play12:17

workflows this will give you a

play12:18

significant boost in in the performance

play12:20

of LMS um the second design pattern is

play12:24

to use many of you will already have

play12:26

seen you know LLM-based systems uh uh using

play12:29

tools on the left is a screenshot from

play12:32

um co-pilot on the right is something

play12:34

that I kind of extracted from uh GPT-4

play12:37

but you know LM today if you ask it

play12:39

what's the best coffee maker can do web

play12:41

search for some problems LMS will

play12:43

generate code and run codes um and it

play12:45

turns out that there are a lot of

play12:48

different tools that many different

play12:49

people are using for analysis for

play12:52

gathering information for taking action

play12:54

personal productivity um it turns out a

play12:56

lot of the early work in tool use turned

play12:58

out to be in the computer vision

play13:00

Community because before large language

play13:03

models LMS you know they couldn't do

play13:05

anything with images so the only option

play13:07

was that the LM generate a function call

play13:09

that could manipulate an image like

play13:11

generate an image or do object detection

play13:13

or whatever so if you actually look at

play13:14

literature it's been interesting how

play13:16

much of the work um in tool use seems

play13:19

like it originated from Vision because

play13:21

LLMs were blind to images before you

play13:24

know GPT-4V and and and LLaVA and so on

play13:27

um so that's tool use all right so

play13:30

tool use incredibly incredibly important

play13:33

because you're basically giving the

play13:34

large language model code to use it is

play13:37

hardcoded code so you always know the

play13:40

result it's not another large language

play13:42

model that might produce something a

play13:43

little different each time this is

play13:45

hardcoded and always is going to produce

play13:48

the same output so these tools are very

play13:50

valuable and the cool thing about tools

play13:53

is we don't have to rewrite them right

play13:54

we don't have to write them from scratch

play13:56

these are tools that programmers already

play13:58

test and use in their code so whether

play14:01

it's external libraries API calls all of

play14:04

these things can now be used by large

play14:06

language models and that is really

play14:08

exciting we're not going to have to

play14:09

rewrite all of this tooling and then

play14:12

planning you know for those of you that

play14:13

have not yet played a lot with planning

play14:15

algorithms I I feel like a lot of people

play14:17

talk about the chat GPT moment where

play14:19

you're wow never seen anything like this

play14:22

I think if you've not used planning algorithms many

play14:24

people will have a kind of a AI agent

play14:27

wow I couldn't imag imagine the AI agent

play14:30

doing this so I've run live demos where

play14:32

something failed and the AI agent

play14:34

rerouted around the failure I've

play14:36

actually had quite a few of them like

play14:38

wow you can't believe my AI system just

play14:40

did that autonomously but um one example

play14:43

that I adapted from the HuggingGPT paper

play14:46

you know you say please generate an image

play14:48

where a girl is reading a book and by

play14:49

the way I made a video about HuggingGPT

play14:52

it is an amazing paper I'll link that in

play14:54

the description below I was reading a

play14:56

book and her pose is the same as a boy in

play14:57

the image example.jpg and please

play15:00

describe the new image with your voice so give

play15:01

an example like this um today we have ai

play15:04

agents who can kind of decide first

play15:06

thing I need to do is determine the post

play15:08

of the boy um then you know find the

play15:11

right model maybe on hugging face to

play15:14

extract the post then next need to find

play15:16

a post image model to synthesize a

play15:19

picture of a of a girl of as following

play15:22

the instructions then use uh image to

play15:24

text and then finally use text to speech

play15:27

and today we actually have agents that

play15:29

I don't want to say they work reliably

play15:32

you know they're kind of finicky they

play15:34

don't always work but when it works is

play15:36

actually pretty amazing but with agentic

play15:39

Loop sometimes you can recover from

play15:40

earlier failures as well so yeah and

play15:42

that's a really important Point agents

play15:44

are a little bit finicky but since you

play15:46

can iterate and the Agents can usually

play15:49

recover from their issues that makes

play15:52

them a lot more powerful and as we

play15:54

continue to evolve agents as we get

play15:56

better agentic models better tooling

play15:58

better Frameworks like Crew AI and

play16:00

AutoGen all of these kind of finicky

play16:03

aspects of agents are going to start to

play16:06

get reduced tremendously I find myself

play16:09

already using research agents in some of

play16:11

my work when I want a piece of research but

play16:13

I don't feel like you know Googling

play16:15

myself and spending a long time I just send

play16:16

to the research agent come back in a few

play16:19

minutes and see what it's come up with

play16:20

and and it it sometimes works sometimes

play16:22

doesn't right but that's already a part

play16:24

of my personal

play16:25

workflow the final design pattern multi-

play16:28

agent collaboration this is one of

play16:29

those funny things but uh um it works

play16:33

much better than you might think uh uh

play16:36

but on the left is a screenshot from a

play16:38

paper called um ChatDev I made a video

play16:42

about this it'll be in the description

play16:44

below as well uh which is completely

play16:46

open which actually open source many of

play16:48

you saw the you know flashy social media

play16:50

announcement of demo of Devin uh uh

play16:53

ChatDev is open source it runs on my

play16:56

laptop and what ChatDev does is example

play16:59

of a multi-agent system where you prompt

play17:02

one LM to sometimes act like the CEO of

play17:05

a software engineering company sometimes act

play17:07

a designer sometime a product manager

play17:09

sometimes act as a tester and this flock of

play17:12

agents that you build by prompting an LM

play17:14

to tell them you're now CEO you're now

play17:16

software engineer they collaborate have

play17:18

an extended conversation so that if you

play17:21

tell it please develop a game develop a

play17:24

Gomoku game they'll actually spend you know

play17:26

a few minutes writing code testing it

play17:29

iterating and then generate a like

play17:31

surprisingly complex programs doesn't

play17:34

always work I've used it sometimes it

play17:36

doesn't work sometimes is amazing but

play17:38

this technology is really um getting

play17:40

better and and just one of design

play17:42

pattern it turns out that multi-agent

play17:45

debate where you have different agents

play17:46

you know for example could be have Chat

play17:48

GPT and Gemini debate each other that

play17:51

actually results in better performance

play17:54

as well all right so he said the

play17:55

important part right there when you have

play17:57

different agents and each of them are

play17:58

are powered by different models maybe

play18:00

even fine-tuned models fine-tuned

play18:03

specifically for their task and their

play18:06

role you get really good performance and

play18:09

that is exactly what a project like crew

play18:11

AI like AutoGen is made for so having

play18:14

multiple simulated AI agents work

play18:16

together has been a powerful design

play18:18

pattern as well um so just to summarize

play18:21

I think these are the these are the the

play18:24

the uh patterns I've seen and I think

play18:26

that if we were to um use these uh uh

play18:29

patterns you know in our work a lot of

play18:32

us can get a productivity boost quite quickly

play18:35

and I think that um agentic reasoning

play18:38

design patterns are going to be

play18:39

important uh this is my small slide I

play18:42

expect that the set of task AI could do

play18:44

will expand dramatically this year uh

play18:48

because of agentic workflows and one

play18:51

thing that it's actually difficult

play18:52

people to get used to is when we prompt

play18:54

an LM we want to response right away um

play18:57

in fact a decade ago when was you know

play18:59

having discussions around at at at

play19:01

Google on um called a big box search

play19:04

type in Long prompt one of the reasons

play19:07

you know I failed to push successfully

play19:09

for that was because when you do a web

play19:11

search you want to have a response back in

play19:13

half a second right that's just human

play19:14

nature we like that instant gratification instant

play19:16

feedback but for a lot of the agent

play19:18

workflows um I think we'll need to learn

play19:21

to delegate the task to an AI agent and

play19:23

patiently wait minutes maybe even hours

play19:26

uh to for response but just like us I've

play19:28

seen a lot of novice managers delegate

play19:31

something to someone and then check in

play19:32

five minutes later right and that's not

play19:34

productive um I think we need to it be

play19:37

difficult we need to do that with some

play19:38

of our AI agents as well all right so

play19:41

this is actually a point which I want to

play19:44

pose a different way of thinking about

play19:45

it think about Groq g-r-o-q you get

play19:48

500 700 850 tokens per second with Groq

play19:53

with their architecture and all of a

play19:55

sudden the agents which you know you

play19:57

usually expect them to take a few

play19:59

minutes to do a semi complex task all

play20:01

the way up to 10 15 20 minutes depending

play20:03

on what the task is a lot of the time in

play20:06

that task completion is the inference

play20:09

running that is assuming you're getting

play20:11

you know 10 15 20 tokens per second with

play20:14

open AI but if you're able to get 800

play20:16

tokens per second it's essentially

play20:18

instant and a lot of people when they

play20:20

first saw Groq they thought well what's

play20:23

the point of 800 tokens per second

play20:25

because humans can't read that fast this

play20:27

is the best use case for that agents

play20:29

using hyper inference speed and reading

play20:31

each other's responses is the best way

play20:34

to leverage that really fast inference

play20:37

speed humans don't actually need to read

play20:39

it so this is a perfect example so if

play20:42

all of a sudden that part of your agent

play20:44

workflow is extremely fast and then

play20:47

let's say we get an embeddings model to

play20:49

be that fast all of a sudden the slowest

play20:52

part of the entire agent workflow is

play20:55

going to be searching the web or hitting

play20:58

a third party API it's no longer going

play21:00

to be the inference and the embeddings

play21:02

and that is really exciting let's keep

play21:05

watching the end and then one other

play21:07

important Trend fast token generation is

play21:09

important because with these agentic

play21:11

workflows we're iterating over and over

play21:13

so the LLM is generating tokens for the LLM

play21:15

to read and I think that um generating

play21:18

more tokens really quickly from even a

play21:20

slightly lower quality LM might give

play21:23

good results compared to slower tokens

play21:25

from a betm maybe it's a little bit

play21:27

controversial because it may let you go

play21:29

around this Loop a lot more times kind

play21:31

of like the results I showed with GPT 3.5

play21:33

and an agent architecture on the first

play21:35

slide um and cand I'm really looking

play21:38

forward to Claude 5 and Claude 4 and GPT-5

play21:41

and Gemini 2.0 and all these other wonderful

play21:43

models that many are building and part of me

play21:46

feels like if you're looking forward to

play21:48

running your thing on GPT-5 zero shot you

play21:51

know you may be able to get closer to that

play21:53

level of performance on some

play21:55

applications than you might think with

play21:57

agent reasoning um but on an early model

play22:00

I think I I I I think this is an

play22:02

important Trend uh uh and honestly the

play22:07

path to AGI feels like a journey rather

play22:10

than a destination but I think this typ

play22:12

of agent workflows could help us take a

play22:14

small step forward on this very long

play22:16

journey thank you okay so he said a lot

play22:19

of important things at the end there one

play22:21

thing he said is if you're already

play22:22

looking forward to GPT 5 Claude 4 the

play22:24

basically the next generation of The

play22:25

Cutting Edge models you might be able to

play22:27

achieve

play23:28

and what's the cost of all these tokens

play23:30

and all of that I think is going to get

play23:32

sorted out as models become more and

play23:34

more commoditized so I'm super excited

play23:37

about agents I'm super excited about

play23:39

inference speed improvements and I hope

play23:41

you liked Andrew Ng's talk if you

play23:42

liked this video please consider giving

play23:44

a like And subscribe and I'll see you in

play23:46

the next one
