AI Pioneer Shows The Power of AI AGENTS - "The Future Is Agentic"

Matthew Berman
29 Mar 2024 · 23:47

Summary

TL;DR: Dr. Andrew Ng, a prominent figure in AI and co-founder of Google Brain, delivered a talk at Sequoia Capital emphasizing the transformative potential of AI agents. Ng highlighted the importance of an 'agentic workflow' in AI, where multiple agents with distinct roles collaborate and iterate on tasks, leading to better outcomes than traditional single-model approaches. He demonstrated the effectiveness of this method using benchmarks like the HumanEval coding test, on which an agentic workflow wrapped around GPT-3.5 outperformed GPT-4 with zero-shot prompting. Ng also covered the key agentic design patterns (reflection, tool use, planning, and multi-agent collaboration), which are set to expand the scope of AI capabilities. He concluded by advocating for patience with AI agents, since their iterative processes may require longer wait times for more refined results, and expressed optimism about the future of AI with the advent of faster inference speeds.

Takeaways

  • 🧠 Dr. Andrew Ng, a leading AI expert, is very optimistic about the future of agents in AI, emphasizing their iterative and collaborative nature.
  • 📈 Agents, when compared to non-agentic workflows, can produce better results through a process of planning, iteration, and collaboration.
  • 🤖 The concept of 'reflection' in agents allows language models to review and improve their own outputs, leading to higher quality results.
  • 🛠️ 'Tool use' is a significant feature where agents can utilize predefined tools to perform specific tasks, enhancing their capabilities.
  • 📈 Sequoia, a renowned venture capital firm, has a portfolio representing over 25% of the NASDAQ's total value, highlighting their successful tech investments.
  • 🔍 'Planning' and 'multi-agent collaboration' are emerging as robust technologies that can lead to surprising and effective outcomes.
  • 📝 An agentic workflow wrapped around GPT-3.5 can outperform even newer models like GPT-4 used with zero-shot prompting.
  • 🚀 The potential of agentic workflows is expected to expand the scope of tasks AI can perform, possibly leading to significant productivity boosts.
  • ⏱️ Fast token generation is crucial for agentic workflows, as it allows for more iterations and quicker responses in complex tasks.
  • 🔗 The use of multi-agent systems, where different agents play different roles, can lead to better performance and more reliable outcomes.
  • 🌟 The path to Artificial General Intelligence (AGI) is viewed as a journey, and agentic workflows are seen as a step forward in this long-term goal.

Q & A

  • Who is Dr. Andrew Ng and why is he considered a leading mind in artificial intelligence?

    -Dr. Andrew Ng is a computer scientist known for co-founding and heading Google Brain, being the former Chief Scientist at Baidu, and his significant contributions to the field of AI. He has studied at prestigious institutions like UC Berkeley, MIT, and Carnegie Mellon, and he co-founded Coursera, an online learning platform offering a wide range of courses in computer science and other subjects.

  • What is the significance of Sequoia in the context of Silicon Valley venture capital firms?

    -Sequoia is one of the most legendary venture capital firms in Silicon Valley, known for its ability to pick technological winners. Their portfolio of companies represents more than 25% of the total value of the NASDAQ, which is an incredible statistic considering the vast number of companies listed on the exchange.

  • What is the difference between a non-agentic workflow and an agentic workflow in the context of using language models?

    -A non-agentic workflow involves using a language model to generate an answer to a prompt in one go, without any back-and-forth interaction. An agentic workflow, on the other hand, is iterative and involves multiple agents with different roles working together, revising, and iterating on a task to achieve the best possible outcome.

  • How does the agentic workflow improve the results of tasks compared to a non-agentic approach?

    -The agentic workflow improves results by allowing multiple agents, each with different roles and tools, to work together and iterate on a task. This collaborative and iterative process is more aligned with how humans work, leading to higher quality outcomes.

  • What is the 'reflection' tool in the context of agentic workflows?

    -Reflection is a tool used in agentic workflows where a large language model is prompted to review and find ways to improve its own output. This self-evaluation and iterative improvement process can significantly enhance the performance of the language model.
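
    The loop described above can be sketched in a few lines. This is a minimal illustration, not code from the talk: `llm` stands in for any prompt-to-text model call, and the deterministic stub at the bottom exists only so the sketch runs without an API.

    ```python
    def reflect_and_revise(llm, task, rounds=2):
        """Draft an answer, then repeatedly ask the same model to critique
        and revise its own output. `llm` is any callable: prompt -> text."""
        draft = llm(f"Complete this task:\n{task}")
        for _ in range(rounds):
            critique = llm(
                f"Here is a draft for the task '{task}':\n{draft}\n"
                "Check it carefully for correctness and quality; list improvements."
            )
            draft = llm(
                f"Task: {task}\nDraft:\n{draft}\nCritique:\n{critique}\n"
                "Rewrite the draft, applying the critique."
            )
        return draft

    # Deterministic stand-in model so the sketch is runnable without an API key.
    calls = []
    def toy_llm(prompt):
        calls.append(prompt)
        return f"output-{len(calls)}"

    # One initial draft plus a critique and a revision per round: 5 calls total.
    final = reflect_and_revise(toy_llm, "write an essay", rounds=2)
    ```

    Each round costs two extra model calls, which is why reflection trades latency (and tokens) for quality.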

  • How does tool use enhance the capabilities of language models?

    -Tool use allows language models to leverage pre-existing, hardcoded code for specific functions, such as web scraping or stock information retrieval. By providing these tools to the language model, it can use them as needed, enhancing its capabilities without the need to generate new code from scratch.
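
    A minimal sketch of the pattern, with hypothetical tool names (the stock lookup and web search stubs are illustrative, not real APIs): the tools are described to the model as text, the model emits a tool call by name, and ordinary code dispatches it.

    ```python
    # Hypothetical tools; real ones would wrap an API client or a web scraper.
    TOOLS = {
        "stock_price": lambda ticker: {"AAPL": 190.0}.get(ticker, 0.0),
        "web_search": lambda query: f"results for {query!r}",
    }

    def tool_descriptions():
        """The text an LLM would see so it can choose a tool by name."""
        return "\n".join(f"- {name}" for name in TOOLS)

    def dispatch(tool_call):
        """Execute a call the model emitted, e.g. {'tool': ..., 'arg': ...}."""
        return TOOLS[tool_call["tool"]](tool_call["arg"])

    # If the model decides it needs a stock quote, it emits a call like this:
    price = dispatch({"tool": "stock_price", "arg": "AAPL"})
    ```

    The key point is that the model only chooses *which* tool to invoke and with what argument; the tool itself is pre-written, tested code.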

  • What is the concept of planning in the context of AI agents?

    -Planning in AI agents involves giving the language model the ability to think through steps more slowly and methodically, often by explaining its reasoning step by step. This forced planning can lead to more thoughtful and accurate results.
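
    One way to force that slower thinking is to ask for a numbered plan first and only then execute it step by step. A minimal sketch follows; the prompt wording and the toy stub are assumptions for illustration, not from the talk.

    ```python
    def plan_then_execute(llm, goal):
        """Ask for a numbered plan, then carry out each step in order.
        `llm` is any callable: prompt -> text."""
        plan = llm(f"Goal: {goal}\nThink step by step and list a numbered plan.")
        steps = [line.strip() for line in plan.splitlines() if line.strip()]
        return [llm(f"Goal: {goal}\nStep: {step}\nCarry out this step.") for step in steps]

    # Deterministic stand-in model so the sketch runs without an API.
    def toy_llm(prompt):
        if "numbered plan" in prompt:
            return "1. outline\n2. draft\n3. revise"
        return "done " + prompt.split("Step: ")[1].splitlines()[0]

    results = plan_then_execute(toy_llm, "write an essay")
    ```

    Splitting the plan from its execution also makes failures easier to localize: you can see which step went wrong and retry it alone.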

  • What are the benefits of multi-agent collaboration in AI workflows?

    -Multi-agent collaboration allows different agents, potentially powered by different models, to work together, each contributing their specialized skills or perspectives. This collaboration can lead to more robust and higher quality outcomes compared to a single-agent approach.
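
    The simplest version is the coder/critic pair Ng describes later in the talk: two prompts, possibly backed by two different models, passing work back and forth. A minimal sketch follows; the logging stubs only make the loop runnable and visible without an API.

    ```python
    def coder_critic_loop(coder, critic, task, rounds=2):
        """One agent writes code, another reviews it, and the coder revises
        using the review. `coder` and `critic` can be different models."""
        code = coder(f"You are an expert coder. Write code for: {task}")
        for _ in range(rounds):
            review = critic(f"You are an expert code reviewer. Review:\n{code}")
            code = coder(f"Task: {task}\nCode:\n{code}\nReview:\n{review}\nRevise.")
        return code

    # Stand-in agents that just label their outputs, so the flow is visible.
    log = []
    def coder(prompt):
        log.append("coder")
        return f"code-v{log.count('coder')}"
    def critic(prompt):
        log.append("critic")
        return f"review-{log.count('critic')}"

    result = coder_critic_loop(coder, critic, "parse a CSV file", rounds=2)
    ```

    Even when both roles share one base model, the differing system prompts make the critic catch issues the coder glossed over, which is the "different perspectives" benefit the video describes.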

  • How does the concept of 'fast token generation' relate to agentic workflows?

    -Fast token generation is important for agentic workflows because these workflows often involve multiple iterations. The ability to generate more tokens quickly, even from a slightly lower quality language model, can lead to better results due to the increased number of iterations possible.

  • What are some of the challenges in implementing agentic workflows?

    -One of the challenges is that agents can be finicky and may not always work as expected. However, the iterative nature of agentic workflows allows for recovery from earlier failures. Another challenge is adjusting to the slower pace of response, as agentic workflows may require waiting for minutes or even hours for the best results.

  • How does Dr. Andrew Ng's talk relate to the future of AI and the concept of AGI (Artificial General Intelligence)?

    -Dr. Ng's talk emphasizes the potential of agentic workflows in pushing the boundaries of what AI can do, which aligns with the pursuit of AGI. While AGI is a long-term goal, the advancements in agentic workflows, tool use, and multi-agent collaboration could contribute to incremental progress towards achieving more general intelligence in AI systems.

  • What are some of the key takeaways from Dr. Ng's talk regarding the future applications of AI?

    -Dr. Ng suggests that the set of tasks AI can perform will expand dramatically due to agentic workflows. He also highlights the importance of fast token generation and the potential for productivity boosts in various workflows. Additionally, he emphasizes the need for patience and a shift in expectations regarding the speed of AI responses, especially when leveraging agentic reasoning.

Outlines

00:00

🚀 Dr. Andrew Ng's Optimism on AI Agents

Dr. Andrew Ng, a prominent computer scientist and co-founder of Google Brain, shares his enthusiasm for AI agents at Sequoia, a prestigious Silicon Valley venture capital firm. Ng discusses the potential of models like GPT-3.5 to reason at the level of GPT-4. He emphasizes the iterative and collaborative nature of agentic workflows, which allow multiple agents to work together, improving tasks through continuous iteration. Ng's talk is significant as it outlines the future of AI, which he believes lies in agents, and provides insights into the powerful combination of different agents with specialized roles.

05:02

📈 Agentic Workflows Surpass Zero-Shot Performance

The paragraph delves into the effectiveness of agentic workflows, comparing them to the zero-shot approach, where AI is given a task without any examples or opportunities for reflection. It highlights a case study in which an agentic workflow with GPT-3.5 outperformed GPT-4's zero-shot prompting. The summary outlines the broad design patterns seen in agents: reflection, tool use, planning, and multi-agent collaboration. These patterns are seen as robust technologies that can significantly enhance the productivity and performance of AI applications.

10:04

🔍 Reflection and Tool Use in Agentic Design

This section focuses on the concept of reflection, where a large language model (LLM) is asked to review and improve its own output. It also discusses tool use, which allows LLMs to utilize custom-coded tools or existing libraries to enhance their capabilities. The paragraph explains how these tools can be hardcoded for predictable outcomes and how they can be integrated into LLMs to improve their functionality and efficiency.

15:05

🤖 Multi-Agent Collaboration and Planning

The paragraph explores the idea of multi-agent collaboration, where different agents with distinct roles work together to achieve a task. It also touches on planning, which enables LLMs to think through steps more deliberately. The speaker shares his experience with research agents and how they can be integrated into personal workflows. The potential of multi-agent systems like crew AI and autogen is highlighted, emphasizing their ability to produce complex and high-quality outcomes when agents collaborate effectively.

20:07

⚡ Fast Token Generation and the Future of Agents

The final paragraph discusses the importance of fast token generation for agentic workflows, which rely on rapid iteration. It suggests that even a slightly lower-quality LLM can produce good results if it generates tokens quickly, allowing for more iterations in the workflow. The speaker expresses excitement about upcoming models like GPT-5 and the potential for agents to take a step forward in the journey towards artificial general intelligence (AGI). The paragraph concludes with a call to wait patiently for AI agents to complete their tasks, comparing it to the way managers delegate work and expect results.

Keywords

💡Agents

In the context of the video, 'agents' refer to AI systems that can perform tasks autonomously, often with the ability to make decisions and iterate on solutions. They are a core focus of the discussion, as they represent a shift from passive language models to active, goal-oriented systems. The video emphasizes the potential of agents to revolutionize AI applications by collaborating, planning, and iterating to achieve better results, as illustrated by the example of different agents working together to write, review, and revise an essay.

💡Dr. Andrew Ng

Dr. Andrew Ng is a renowned computer scientist, co-founder of Google Brain, and a leading mind in AI. He is highlighted in the video as an authority on the subject, delivering a talk at Sequoia that the speaker is reviewing. His opinions and insights on agents and AI are central to the video's narrative, as they provide credibility and depth to the discussion on the future of AI.

💡Sequoia

Sequoia is a prestigious venture capital firm known for its successful investments in technology companies. The video mentions Sequoia to underscore the significance of Dr. Andrew Ng's talk, as it took place at an influential firm that has a history of identifying and supporting technological winners.

💡GPT (Generative Pre-trained Transformer)

GPT is a type of AI language model that is capable of generating human-like text. The video discusses different versions of GPT, such as GPT-3.5 and GPT-4, and how they power agents to perform tasks. The advancements in GPT models are crucial to the video's theme of the future of AI, as they enable more sophisticated and effective agentic workflows.

💡Iterative Workflow

An iterative workflow is a process that involves repeated cycles of development and refinement. In the context of the video, it is used to describe how agents can improve their output by going through multiple rounds of thinking and revising. This concept is central to the video's argument that agents can outperform non-agentic systems by simulating human-like planning and iteration.

💡Tool Use

Tool use in the video refers to the ability of agents to utilize predefined tools or functions to accomplish tasks. This can include web scraping, data analysis, or any other task-specific tool. The concept is important as it extends the capabilities of language models, allowing them to perform a wider range of tasks more effectively.

💡Reflection

Reflection, as used in the video, is a technique where a language model is prompted to review and improve its own output. This process is likened to self-editing and is a key component in enhancing the performance of agents. It is showcased as a method to achieve higher accuracy and quality in the tasks agents perform.

💡Multi-Agent Collaboration

This concept involves multiple agents, each potentially powered by different models, working together to achieve a common goal. The video discusses how this collaboration can lead to better outcomes than single-agent systems, as each agent can contribute specialized skills or perspectives to the task at hand.

💡HumanEval Benchmark

The HumanEval benchmark is a set of coding problems used to test the performance of AI systems. In the video, it is used as a case study to demonstrate the effectiveness of agentic workflows. The benchmark serves as a metric to compare how different AI systems and versions of GPT perform at solving coding problems.
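
    The example problem Ng quotes from the benchmark (given a non-empty list of integers, return the sum of all the odd elements that are in even positions) is small enough to show in full:

    ```python
    def solution(lst):
        """Sum of the odd elements that sit at even indices (0, 2, 4, ...)."""
        return sum(x for i, x in enumerate(lst) if i % 2 == 0 and x % 2 == 1)

    # 5 (index 0) and 7 (index 2) are the odd elements at even positions.
    answer = solution([5, 8, 7, 1])
    ```

    The benchmark scores a model by whether generated solutions like this one pass the problem's hidden unit tests, which is what the pass percentages in the talk refer to.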

💡Zero-Shot Prompting

Zero-shot prompting is a method where an AI is asked to perform a task without any prior examples or training on that specific task. The video contrasts this approach with agentic workflows, showing that while zero-shot prompting can be effective, agentic workflows can yield even better results due to their iterative nature.

💡Planning

Planning in the context of the video refers to the ability of agents to think through steps and devise a strategy to achieve a goal. It is an essential part of the agentic workflow, allowing agents to approach problems in a more structured and efficient manner, which is likened to human problem-solving processes.

Highlights

Dr. Andrew Ng, a leading mind in AI and co-founder of Google Brain, is very optimistic about the future of agents in AI.

Agents powered by models like GPT-3.5 can reason at the level of GPT-4, indicating significant advancements in AI capabilities.

Sequoia, a renowned Silicon Valley venture capital firm, has a portfolio representing over 25% of the NASDAQ's total value, demonstrating their success in identifying technological winners.

Non-agentic workflows are compared to asking a person to write an essay without revision, whereas agentic workflows involve iterative processes similar to human planning and revision.

The power of agentic workflows lies in the ability to have multiple agents with different roles working together and iterating on a task.

Case studies show that an agentic workflow with GPT-3.5 outperforms zero-shot GPT-4 on coding benchmarks, highlighting the effectiveness of iterative agentic processes.

Reflection, a tool that allows large language models to review and improve their own outputs, significantly enhances performance.

Tool use in AI involves providing agents with custom-coded tools, APIs, and libraries, expanding their capabilities beyond their initial programming.

Planning and multi-agent collaboration are emerging technologies that, despite being finicky, can produce phenomenal results when agents work together.

Different models can power different agents, providing diverse perspectives and enhancing the quality of the final outcome.

Self-reflection in coding involves an agent reviewing and improving its own code, which can lead to more efficient and error-free results.

Automating the coding process through agents can lead to significant productivity boosts and improved code quality.

The use of tools by AI agents allows them to leverage existing, tested code and functionalities, making them more reliable and efficient.

Planning algorithms enable AI agents to autonomously find solutions to problems, circumventing failures, and adapting to new requirements.

Multi-agent collaboration, where different agents play different roles, can lead to complex problem-solving and innovative solutions.

The future of AI is expected to expand dramatically with agentic workflows, potentially changing how we interact with and utilize AI systems.

As AI models become more commoditized, the cost of using these advanced AI functionalities is expected to decrease, making them more accessible.

Fast token generation is crucial for agentic workflows, allowing for more iterations and faster response times, which can lead to better results.

The journey towards AGI (Artificial General Intelligence) is incremental, and agentic workflows could represent a step forward in this long-term progression.

Transcripts

play00:00

Dr Andrew ning just did a talk at

play00:03

Sequoia and is all about agents and he

play00:07

is incredibly bullish on agents he said

play00:09

things like GPT 3.5 powering agents can

play00:12

actually reason to the level of GPT 4

play00:15

and a lot of other really interesting

play00:17

tidbits so we're going to watch his talk

play00:19

together and I'm going to walk you

play00:20

through step by step what he's saying

play00:22

and why it's so important I am

play00:24

incredibly bullish on agents myself

play00:26

that's why I make so many videos about

play00:28

them and I truly believe the future of

play00:30

artificial intelligence is going to be a

play00:33

gentic so first who is Dr Andrew ning he

play00:36

is a computer scientist he was the

play00:38

co-founder and head of Google brain the

play00:41

former Chief scientist of Buu and a

play00:44

leading mind in artificial intelligence

play00:47

he went to UC Berkeley MIT and Carnegie

play00:50

melon so smart smart dude and he

play00:52

co-founded this company corsera where

play00:54

you can learn a ton about computer

play00:57

science about math a bunch of different

play00:59

topics absolutely free and so what he's

play01:02

doing is truly incredible and so when he

play01:05

talks about AI you should listen so

play01:07

let's get to this talk this is at seoa

play01:11

and if you're not familiar with seoa

play01:13

they are one of the most legendary

play01:14

Silicon Valley venture capital firms

play01:17

ever now here's an interesting stat

play01:18

about seoa that just shows how

play01:20

incredible they are at picking

play01:22

technological winners their portfolio of

play01:24

companies represents more than 25% of

play01:27

Today's total value of the the NASDAQ so

play01:31

the total value of all the companies

play01:33

that are listed on the NASDAQ 25% of

play01:35

that market capitalization are companies

play01:37

that are owned or have been owned or

play01:40

invested in by seoa incredible stat

play01:43

let's look at some of their companies

play01:44

Reddit instacart door Dash Airbnb a

play01:48

little company called Apple block

play01:50

snowflake vanta Zoom stripe WhatsApp

play01:55

OCTA Instagram this list is absolutely

play01:58

absurd all right another of the preface

play02:01

let me get into the Talk itself so a

play02:03

agents you know today the way most of us

play02:05

use l Shish models is like this with a

play02:08

non- agentic workflow where you type a

play02:10

prompt and generates an answer and

play02:12

that's a bit like if you ask a person to

play02:15

write an essay on a topic and I say

play02:17

please sit down to the keyboard and just

play02:19

type the essay from start to finish

play02:21

without ever using backspace um and

play02:24

despite how hard this is L's do it

play02:26

remarkably well in contrast with an

play02:30

agentic workflow this is what it may

play02:32

look like have an AI have an LM say

play02:34

write an essay outline do you need to do

play02:37

any web research if so let's do that

play02:39

then write the first draft and then read

play02:42

your own first draft and think about

play02:44

what parts need revision and then revise

play02:46

your draft and you go on and on and so

play02:49

this workflow is much more iterative

play02:51

where you may have the L do some

play02:54

thinking um and then revise this article

play02:57

and then do some more thinking and

play02:59

iterate this

play03:00

through a number of times so I want to

play03:02

pause it there and talk about this

play03:03

because this is the best explanation for

play03:06

why agents are so powerful I've heard a

play03:08

lot of people say well agents are just

play03:10

llms right and yeah technically that's

play03:13

true but the power of an agentic

play03:15

workflow is the fact that you can have

play03:17

multiple agents all with different roles

play03:19

different backgrounds different personas

play03:21

different tools working together and

play03:23

iterating that's the important word

play03:26

iterating on a task so in this example

play03:28

he said okay write an essay and yeah an

play03:31

llm can do that and usually it's pretty

play03:34

darn good but now let's say you have one

play03:36

agent who is the writer another agent

play03:39

who is the reviewer another for the

play03:41

spell checker another for the grammar

play03:42

checker another for the fact Checker and

play03:45

they're all working together and they

play03:47

iterate over and over again passing the

play03:49

essay back and forth making sure that it

play03:51

finally ends up to be the best possible

play03:53

outcome and so this is how humans work

play03:57

humans as he said do not just do

play04:00

everything in one take without thinking

play04:02

through and planning we plan we iterate

play04:05

and then we find the best solution so

play04:07

let's keep listening what not many

play04:08

people appreciate is this delivers

play04:11

remarkably better results um I've

play04:13

actually really surprised myself working

play04:15

these agent workflows how well how well

play04:18

they work other let's do one case study

play04:20

at my team analyzed some data using a

play04:23

coding Benchmark called the human eval

play04:25

Benchmark released by open a few years

play04:27

ago um but this says coding problems

play04:29

like given the nonent list of integers

play04:32

return the sum of all the odd elements

play04:33

are an even positions and it turns out

play04:35

the answer is you co snipper like that

play04:37

so today lot of us will use zero shot

play04:40

prompting meaning we tell the AI write

play04:42

the code and have it run on the first

play04:44

spot like who codes like that no human

play04:46

codes like that just type out the code

play04:47

and run it maybe you do I can't do that

play04:50

um so it turns out that if you use GPT

play04:53

3.5 uh zero shot prompting it gets it

play04:56

48% right uh gbd4 way better 67 7% right

play05:02

but if you take an agentic workflow and

play05:04

wrap it around GPT 3.5 say it actually

play05:08

does better than even

play05:10

gbd4 um and if you were to wrap this

play05:13

type of workflow around gbd4 you know it

play05:16

it it also um does very well all right

play05:19

let's pause here and think about what he

play05:20

just said over here we have the zero

play05:23

shot which basically means you're simply

play05:25

telling the large language model do this

play05:27

thing not giving it any example not

play05:30

giving it any chance to think or to

play05:31

iterate or any fancy prompting just do

play05:34

this thing and it got the human evalve

play05:36

Benchmark 48% correct then GPT 4 67%

play05:40

which is you know a huge Improvement and

play05:42

we're going to continue to see

play05:43

Improvement when GPT 5 comes out and so

play05:45

on however look at this GPT 3.5 wrapped

play05:49

in an agentic workflow any of these all

play05:53

perform better than the zero shot GPT 4

play05:56

using only GPT 3.5 and this lb BD plus

play06:00

reflection it's actually nearly 100%

play06:02

it's over 95% then of course if we wrap

play06:05

GPT 4 in the agentic workflow metag GPT

play06:08

for example we all know about it

play06:10

performs incredibly well across the

play06:12

board and agent coder kind of at the top

play06:15

here so it's really just showing the

play06:17

power of agentic workflows and you

play06:19

notice that GB 3.5 with an agentic

play06:22

workflow actually outperforms

play06:26

gp4 um and I think this has and this

play06:29

means that this has signant consequences

play06:31

I think how we all approach building

play06:33

applications so agents is the term has

play06:36

been tossed around a lot there's a lot

play06:38

of consultant reports how about agents

play06:40

the future of AI blah blah blah I want

play06:42

to be a bit concrete and share of you um

play06:44

the broad design patterns I'm seeing in

play06:47

agents it's a very messy chaotic space

play06:49

tons of research tons of Open Source

play06:51

there's a lot going on but I try to

play06:53

categorize um bit more concretely what's

play06:55

going on agents reflection is a tool

play06:58

that I think many of us are just use it

play07:00

just works uh to use I think it's more

play07:03

widely appreciated but actually works

play07:04

pretty well I think of these as pretty

play07:06

robust Technologies when I all right

play07:08

let's stop there and talk about what

play07:09

these things are so reflection is as

play07:12

obvious as it sounds you are literally

play07:14

saying to the large language model

play07:17

reflect on the output you just gave me

play07:19

find a way to improve it then return

play07:22

another result or just return the

play07:23

improvements so very straightforward and

play07:26

it seems so obvious but this actually

play07:29

causes large language models to perform

play07:31

a lot better and then we have tool use

play07:33

and we learned all about tool use with

play07:35

projects like autogen and crew AI tool

play07:38

use just means that you can give them

play07:40

tools to use you can custom code tools

play07:43

it's like function calling so you could

play07:45

say Okay I want a web scraping tool and

play07:48

I want an SEC lookup tool so you can get

play07:51

stock information about ticker symbols

play07:53

you can even plug in complex math

play07:57

libraries to it I mean the possibilities

play07:59

are literally endless so you can give a

play08:01

bunch of tools that the large language

play08:03

model didn't previously have you just

play08:05

describe what the tool does and the

play08:06

large language model can actually choose

play08:08

when to use the tool it's really cool

play08:10

use them I can you know almost always

play08:12

get them to work well um planning and

play08:15

multi-agent collaboration I think is

play08:17

more emerging when I use them sometimes

play08:20

my mind is blown for how well they work

play08:22

but at least at this moment in time I

play08:23

don't feel like I can always get them to

play08:25

work reliably so let me walk through

play08:28

these full design Pat

play08:30

all right so he's going to walk through

play08:31

it but I just want to touch on what

play08:32

planning and multi-agent collaboration

play08:34

is so planning we're basically saying

play08:36

giving the large language model the

play08:38

ability to think more slowly to plan

play08:40

steps and that's usually by the way why

play08:42

in all of my llm tests I say explain

play08:44

your reasoning step by step because that

play08:46

kind of forces them to plan and to think

play08:49

through each step which usually produces

play08:52

better results and then multi-agent

play08:54

collaboration that is autogen and crew

play08:56

AI that is a very emergent technology

play08:59

techology I am extremely bullish on it

play09:01

it is sometimes difficult to get the

play09:03

agents to behave like you need them to

play09:06

but with enough QA and enough testing

play09:08

and iteration you usually can and the

play09:10

results are phenomenal and not only do

play09:13

you get the benefit of having the large

play09:15

language model essentially reflect with

play09:17

different personalities or different

play09:18

roles but you can actually have

play09:21

different models powering different

play09:22

agents and so you're getting the benefit

play09:24

of the reflection based on the quality

play09:26

of each model so you're basically

play09:28

getting really different opinions as

play09:30

these agents are working together so

play09:32

let's keep listening and if some of you

play09:35

go back and yourself will ask your

play09:36

engineers to use these I think you get a

play09:38

productivity boost quite quickly so

play09:40

reflection here's an example let's say I

play09:43

ask a system please write Cod for me for

play09:46

a given task then we have a coder agent

play09:49

just an LM that you prompt to write code

play09:51

to say you def do Tas write a function

play09:54

like that um an example of

play09:57

self-reflection would be if you then

play09:59

prompt the LM with something like this

play10:01

here's code intended for a toss and just

play10:03

give it back the exact same code that

play10:05

they just generated and then say check

play10:07

the code carefully for correctness sound

play10:09

efficiency good construction CRI just

play10:10

write a prompt like that it turns out

play10:12

the same L that you prompted to write

play10:14

the code may be able to spot problems

play10:17

like this bug in line five and fix it by

play10:19

blah blah blah and if you now take his

play10:21

own feedback and give it to it and

play10:22

reprompt it it may come up with a

play10:25

version two of the code that could well

play10:26

work better than the first version not

play10:28

guaranteed but it works you know often

play10:30

enough but this to be worth trying for a

play10:32

law of appli so what you usually see me

play10:34

doing in my llm test videos is for

play10:36

example let's say I say write the Game

play10:38

snake in Python and it gives me the game

play10:41

Snake it's that is zero shot I'm just

play10:44

saying write it all out in one go then I

play10:47

take it I put it in my VSS code I play

play10:50

it I get the error or I look for any

play10:52

bugs and then I paste that back in to

play10:56

the large language model to fix now

play10:58

that's essentially me acting as an agent

play11:00

and what we can do is use an agent to

play11:02

automate me so basically look at the

play11:05

code look for any potential errors and

play11:07

even agents that can run the code get

play11:11

the error and pass it back into the

play11:13

large language model now it's completely

play11:15

To foreshadow tool use: if you let it run unit tests and it fails a unit test, then ask "why did you fail the unit test?" Have that conversation, and it may be able to figure out "I failed the unit test, so I should try changing something" and come up with a V3.

By the way, for those of you who want to learn more about these technologies — I'm very excited about them — for each of the four sections I have a little recommended-reading section at the bottom that hopefully gives more references. And again, just to foreshadow multi-agent systems: I've described this as a single coder agent that you prompt to have this conversation with itself. One natural evolution of the idea is that instead of a single coder agent, you can have two agents, where one is a coder agent and the second is a critic agent. These could be the same base LLM, but you prompt them in different ways: you tell one "you're an expert coder, write code," and the other "you're an expert code reviewer, review this code." This type of workflow is actually pretty easy to implement, and I think it's such a general-purpose technology that for a lot of workflows it will give you a significant boost in the performance of LLMs.
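The coder/critic pairing really is easy to implement: same base model, two system prompts. A minimal sketch, again assuming a generic `llm(prompt) -> str` callable and illustrative prompt wording:

```python
def coder_critic_loop(llm, task, rounds=2):
    """Two 'agents' from one base model, prompted differently:
    a coder writes, a critic reviews, the coder revises."""
    coder = lambda p: llm("You are an expert coder. " + p)
    critic = lambda p: llm("You are an expert code reviewer. " + p)
    code = coder(f"Write code to {task}.")
    for _ in range(rounds):
        review = critic(f"Review this code:\n{code}")
        code = coder(f"Revise your code given this review:\n{review}\n\n{code}")
    return code
```

The only difference from single-agent reflection is that the critique comes from a separately-prompted persona, which in practice often elicits sharper feedback.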

The second design pattern is tool use. Many of you will already have seen LLM-based systems using tools. On the left is a screenshot from Copilot; on the right is something I extracted from GPT-4. An LLM today, if you ask it "what's the best coffee maker," can do a web search; for some problems, LLMs will generate code and run code. It turns out there are a lot of different tools that many different people are using — for analysis, for gathering information, for taking action, for personal productivity. And it turns out a lot of the early work on tool use was in the computer vision community, because before large multimodal models, LLMs couldn't do anything with images, so the only option was for the LLM to generate a function call that could manipulate an image — generate an image, do object detection, or whatever. If you actually look at the literature, it's been interesting how much of the tool-use work seems to have originated from vision, because LLMs were blind to images before GPT-4V and LLaVA and so on. So that's tool use.

All right, so tool use is incredibly, incredibly important, because you're basically giving the large language model code to use. It's hardcoded, so you always know the result — it's not another large language model that might produce something a little different each time. It's hardcoded and always going to produce the same output, so these tools are very valuable. And the cool thing about tools is that we don't have to rewrite them; we don't have to write them from scratch. These are tools that programmers already test and use in their code, so whether it's external libraries or API calls, all of these things can now be used by large language models, and that is really exciting — we're not going to have to rewrite all of this tooling.
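The key property described here — the model picks a deterministic tool rather than computing the answer itself — can be shown in a tiny dispatcher. This is a generic sketch (the JSON call format and the two stand-in tools are illustrative, not any particular vendor's function-calling API):

```python
import json

# The tools are ordinary, already-tested code (libraries, API calls):
# the same call always gives the same result.
TOOLS = {
    "web_search": lambda q: f"results for {q!r}",  # stand-in for a real search API
    "run_python": lambda src: eval(src),           # stand-in; sandbox in practice
}

def dispatch(model_output):
    """Parse a tool call emitted by the model (as JSON here) and run it."""
    call = json.loads(model_output)
    return TOOLS[call["tool"]](call["arg"])
```

For example, if the model emits `{"tool": "run_python", "arg": "2 + 3"}`, the dispatcher runs the deterministic tool and gets `5` — every time.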

And then planning. For those of you who have not yet played much with planning algorithms: a lot of people talk about the ChatGPT moment, where you go "wow, I've never seen anything like this." If you've not used planning algorithms, I think many people will have a kind of AI-agent "wow" moment — "I couldn't imagine an AI agent doing this." I've run live demos where something failed and the AI agent rerouted around the failure; I've actually had quite a few of those "wow, I can't believe my AI system just did that autonomously" moments. One example I adapted from the HuggingGPT paper — and by the way, I made a video about HuggingGPT; it is an amazing paper, and I'll link it in the description below — is a prompt roughly like: "please generate an image where a girl is reading a book, with her pose the same as the boy in the example image, then please describe the new image with your voice."

Given an example like this, today we have AI agents that can kind of decide: "the first thing I need to do is determine the pose of the boy" — then find the right model, maybe on Hugging Face, to extract the pose; then find a pose-to-image model to synthesize a picture of a girl following the instructions; then use image-to-text; and finally use text-to-speech. Today we actually have agents that — I don't want to say they work reliably; they're kind of finicky, they don't always work — but when it works, it's actually pretty amazing. And with agentic loops, sometimes you can recover from earlier failures as well.
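The planning pattern — decompose the request into steps, then execute each with the right tool or model — can be sketched as a pipeline. The step names mirror the pose example above, but the functions are stand-ins: a real system (like HuggingGPT) would select actual models, e.g. from Hugging Face, for each step.

```python
def plan(request):
    """Stand-in planner: a real one would have an LLM emit this step list."""
    return ["detect_pose", "pose_to_image", "image_to_text", "text_to_speech"]

# Stand-in executors; each tag shows what a real model would produce.
STEPS = {
    "detect_pose":    lambda x: f"pose({x})",
    "pose_to_image":  lambda x: f"image({x})",
    "image_to_text":  lambda x: f"caption({x})",
    "text_to_speech": lambda x: f"audio({x})",
}

def execute(request, inputs):
    out = inputs
    for step in plan(request):
        out = STEPS[step](out)  # each step's output feeds the next
    return out
```

Running `execute("girl reading like the boy", "example.jpg")` chains the four stages, which is exactly the dependency structure the agent has to plan out.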

So yeah, that's a really important point: agents are a little bit finicky, but since you can iterate, and the agents can usually recover from their issues, that makes them a lot more powerful. And as we continue to evolve agents — as we get better agentic models, better tooling, better frameworks like CrewAI and AutoGen — all of these finicky aspects of agents are going to be reduced tremendously. I find myself already using research agents in some of my work: when I want a piece of research but I don't feel like Googling it myself and spending a long time, I send it to the research agent, come back in a few minutes, and see what it's come up with. It sometimes works and sometimes doesn't, but that's already a part of my personal workflow.

The final design pattern: multi-agent collaboration. This is one of those funny things, but it works much better than you might think. On the left is a screenshot from a paper called ChatDev — I made a video about this; it'll be in the description below as well — which is actually open source. Many of you saw the flashy social-media announcement demo of Devin; ChatDev is open source and runs on my laptop. What ChatDev does is an example of a multi-agent system, where you prompt one LLM to sometimes act like the CEO of a software company, sometimes act like a designer, sometimes a product manager, sometimes a tester. This flock of agents, which you build by prompting an LLM and telling it "you're now the CEO," "you're now a software engineer," collaborates and has an extended conversation, so that if you tell it "please develop a game, develop a Gomoku game," they'll actually spend a few minutes writing code, testing it, iterating, and then generate surprisingly complex programs. It doesn't always work — I've used it; sometimes it doesn't work, sometimes it's amazing — but this technology is really getting better. And it's just one design pattern: it turns out that multi-agent debate, where you have different agents — for example, you could have ChatGPT and Gemini debate each other — actually results in better performance as well.
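The debate pattern mentioned here is simple to set up: each agent sees the other's latest answer and may revise its own. A minimal sketch, assuming two generic `agent(prompt) -> str` callables (which could wrap two different models, as in the ChatGPT-versus-Gemini example):

```python
def debate(agent_a, agent_b, question, rounds=2):
    """Two agents debate: each sees the other's latest answer and may
    revise. Returns both final answers (a judge or vote would follow)."""
    a = agent_a(question)
    b = agent_b(question)
    for _ in range(rounds):
        a = agent_a(f"{question}\nAnother agent answered: {b}\nRevise or defend.")
        b = agent_b(f"{question}\nAnother agent answered: {a}\nRevise or defend.")
    return a, b
```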

All right, so he said the important part right there: when you have different agents, and each of them is powered by a different model — maybe even fine-tuned models, fine-tuned specifically for their task and their role — you get really good performance. And that is exactly what projects like CrewAI and AutoGen are made for. So having multiple simulated AI agents work together has been a powerful design pattern as well.

So just to summarize: I think these are the patterns I've seen, and I think that if we use these patterns in our work, a lot of us can get a productivity boost quite quickly. I think agentic reasoning design patterns are going to be important. This is my last slide: I expect the set of tasks AI can do will expand dramatically this year because of agentic workflows.

One thing that's actually difficult for people to get used to is that when we prompt an LLM, we want a response right away. In fact, a decade ago, when I was having discussions at Google about what we called "big-box search" — type in a long prompt — one of the reasons I failed to push successfully for that was that when you do a web search, you want a response back in half a second. That's just human nature; we like that instant feedback. But for a lot of agent workflows, I think we'll need to learn to delegate a task to an AI agent and patiently wait minutes, maybe even hours, for a response. I've seen a lot of novice managers delegate something to someone and then check in five minutes later — that's not productive — and, difficult as it will be, I think we need to do the same with some of our AI agents as well.

of our AI agents as well all right so

play19:41

this is actually a point which I want to

play19:44

pose a different way of thinking about

play19:45

it think about grock grock grq you get

play19:48

500 700 850 tokens per second with grock

play19:53

with their architecture and all of a

play19:55

sudden the agents which you know you

play19:57

usually expect them to take a few

play19:59

minutes to do a semi complex task all

play20:01

the way up to 10 15 20 minutes depending

play20:03

on what the task is a lot of the time in

play20:06

that task completion is the inference

play20:09

running that is assuming you're getting

play20:11

you know 10 15 20 tokens per second with

play20:14

open AI but if you're able to get 800

play20:16

tokens per second it's essentially

play20:18

instant and a lot of people when they

play20:20

first saw grock they thought well what's

play20:23

the point of 800 tokens per second

play20:25

because humans can't read that fast this

play20:27

is the best use case for that agents

play20:29

using hyper inference speed and reading

play20:31

each other's responses is the best way

play20:34

to leverage that really fast inference

play20:37

speed humans don't actually need to read

play20:39

it so this is a perfect example so if

play20:42

all of a sudden that part of your agent

play20:44

workflow is extremely fast and then

play20:47

let's say we get an embeddings model to

play20:49

be that fast all of a sudden the slowest

play20:52

part of the entire agent workflow is

play20:55

going to be searching the web or hitting

play20:58

a third party API it's no longer going

play21:00

to be the inference and the embeddings

play21:02

and that is really exciting let's keep

play21:05
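A quick back-of-envelope check of this point. Suppose an agentic run generates 12,000 tokens in total across its iterations (an illustrative figure, not from the talk): at 20 tokens per second that's 10 minutes of pure generation time, while at 800 tokens per second it's 15 seconds — essentially interactive.

```python
def completion_time(total_tokens, tokens_per_second):
    """Wall-clock seconds spent purely on token generation."""
    return total_tokens / tokens_per_second

# Illustrative figures from the discussion: ~20 tok/s vs ~800 tok/s.
slow = completion_time(12_000, 20)   # 600 s  -> 10 minutes
fast = completion_time(12_000, 800)  # 15 s   -> essentially instant
```

At that point generation stops being the bottleneck, which is exactly why the slowest step shifts to web search or third-party APIs.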

And then one other important trend: fast token generation is important, because with these agentic workflows we're iterating over and over, so the LLM is generating tokens for the LLM to read. I think that generating more tokens really quickly from even a slightly lower-quality LLM might give good results compared to slower tokens from a better LLM — maybe that's a little bit controversial — because it may let you go around this loop a lot more times, kind of like the results I showed with GPT-3.5 and an agent architecture on the first slide. And candidly, I'm really looking forward to Claude 5 and Claude 4 and GPT-5 and Gemini 2.0 and all these other wonderful models that many of you are building. Part of me feels like, if you're looking forward to running your application on GPT-5 zero-shot, you may be able to get closer to that level of performance on some applications than you might think with agentic reasoning on an earlier model.

I think this is an important trend. And honestly, the path to AGI feels like a journey rather than a destination, but I think this type of agentic workflow could help us take a small step forward on that very long journey. Thank you.

Okay, so he said a lot of important things at the end there. One thing he said is that if you're already looking forward to GPT-5, Claude 4 — basically the next generation of the cutting-edge models — you might be able to achieve something close to that level of performance on some applications today by wrapping current models in an agentic workflow.

And what about the cost of all these tokens? All of that, I think, is going to get sorted out as models become more and more commoditized. So I'm super excited about agents, and I'm super excited about inference-speed improvements. I hope you liked Andrew Ng's talk. If you liked this video, please consider giving it a like and subscribing, and I'll see you in the next one.
