What are AI Agents?

IBM Technology
15 Jul 2024 · 12:28

Summary

TLDR: 2024 will be the year of AI agents. AI agents build on a shift in the field of generative AI: the move from monolithic models to compound AI systems. Models on their own are limited by their training data and hard to adapt to new tasks. Compound AI systems apply system design principles, integrating models with external tools to improve problem-solving. AI agents go further by putting a large language model at the core of the control logic, solving complex problems autonomously through the abilities to reason, act, and access memory. The video also introduces the ReACT model and shows how to configure an AI agent to handle complex queries.

Takeaways

  • 🧠 2024 will be the year of AI agents, one of the major shifts in the field of generative AI.
  • 🔄 A shift from monolithic models to compound AI systems; standalone models are limited by their training data and hard to adapt to new tasks.
  • 🏖️ As an example, a standalone model cannot correctly answer a question about someone's vacation days because it has no access to that person's sensitive information.
  • 🔍 Building a system that integrates the model with existing processes, for example by querying a database for personal information, improves accuracy.
  • 🤖 Compound AI systems solve specific problems by applying system design principles, making them modular and extensible.
  • 🛠️ A system can include many components, such as tuned models, output verifiers, and database search, to increase the rate of correct answers.
  • 🔗 Retrieval-augmented generation (RAG) is a commonly used compound AI system, but it can fail on queries it was not designed for.
  • 📈 Improved reasoning in large language models (LLMs) lets them tackle complex problems and devise solution plans.
  • 🛑 The agentic approach hands the control logic to a large language model, letting the system plan and iterate while solving a problem.
  • 🔑 The key capabilities of LLM agents are reasoning, acting (calling external tools), and accessing memory (internal logs or conversation history).
  • 🔄 The ReACT model combines reasoning and acting, prompting the model to think slowly, plan its work, and then execute, raising the odds of success.
  • 🌟 Compound AI systems and the agentic approach represent a new trend in AI, enabling systems to handle more complex tasks with greater autonomy.

Q & A

  • What is an AI agent?

    - An AI agent is a compound AI system that puts a large language model at the core of the control logic, enabling the model to solve complex problems and execute solutions via external tools.

  • Why move from monolithic models to compound AI systems?

    - Standalone models are limited by their training data and hard to adapt to new tasks. Compound AI systems use modular design, so different components can be integrated flexibly to solve more complex problems.

  • How should we understand the limitations of models?

    - Models can only understand and solve problems based on their training data; they adapt poorly to new situations, and tuning a model requires additional data and resources.

  • What is retrieval-augmented generation (RAG)?

    - RAG is a popular compound AI system that combines search and generation to improve the accuracy and efficiency of answers.

  • Why does an AI system need control logic?

    - Control logic defines the path a program follows to answer a query, guiding the system to solve problems effectively, especially for complex queries.

  • How do you build an AI system that solves real problems?

    - Integrate the model with external databases, search tools, and other resources, so the model can access and use them to generate accurate answers.

  • What is the agentic approach to large language models (LLMs)?

    - The agentic approach puts a large language model at the core of the system's control logic, letting it plan and execute the steps to solve a problem on its own.

  • What are the three main capabilities of AI agents?

    - Reasoning, acting, and accessing memory; together these capabilities let an agent solve problems.

  • What is the ReACT model?

    - ReACT combines a reasoning (Reasoning) component and an acting (Acting) component, letting an AI agent plan and execute solutions more effectively.

  • Why will 2024 be the year of AI agents?

    - With significant progress in LLM reasoning and a deeper understanding of system design, AI agents are expected to make major advances in problem-solving ability and range of applications in 2024.

  • What does the "memory" component of an AI system mean?

    - "Memory" can refer to the model's internal logs while solving a problem, or to the history of conversations with a human user; both help deliver a more personalized experience.

Outlines

00:00

🧠 From monolithic models to compound AI systems

2024 is seen as the year of AI agents. The explanation of AI agents begins with shifts in the field of generative AI, starting with the move from monolithic models to compound AI systems. Standalone models are limited by their training data, adapt poorly to new tasks, and are costly to tune. By integrating a model into existing systems, for example giving it access to a database with personal vacation information, you can build a compound AI system that solves complex problems. The underlying system design principle is that some problems are better solved with a systems approach. Compound AI systems are modular: you can choose different models and programmatic components, such as output verifiers and database search, to raise the rate of correct answers, which is faster and more flexible than tuning a model. The segment also introduces retrieval-augmented generation (RAG), a popular compound AI system that can fail on out-of-scope queries because it relies on predefined control logic.
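The vacation-days pipeline described above can be sketched in a few lines. This is a minimal illustration with a scripted stand-in for the language model and a fake in-memory database; the `llm` function, the `VACATION_DB` dict, and their prompts are assumptions for the sketch, not a real API.

```python
# Sketch of a compound AI system: the model turns the question into a
# database lookup, a programmatic component fetches the data the model
# lacks, and the model phrases the final answer.

VACATION_DB = {"maya": 10}  # vacation days remaining, keyed by user

def llm(prompt: str) -> str:
    # Stand-in for a real language-model call; it fakes the two prompts
    # this pipeline uses.
    if "Write a database key" in prompt:
        return "maya"
    if "days left" in prompt:
        days = prompt.split("=")[-1].strip()
        return f"Maya, you have {days} days left in your vacation database."
    return "I don't know."

def answer_vacation_query(question: str) -> str:
    # Step 1: the model is prompted to create a search query (here, a key).
    key = llm(f"Write a database key for this question: {question}")
    # Step 2: a programmatic component fetches the sensitive data.
    days = VACATION_DB[key]
    # Step 3: the model turns the raw value into a sentence.
    return llm(f"Answer the user. days left = {days}")

print(answer_vacation_query("How many vacation days do I have left?"))
```

The point of the sketch is the shape of the system, not the fake model: the correctness of the answer comes from the database lookup, a component sitting outside the model.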

05:00

🤖 The rise and capabilities of agentic AI systems

Agentic AI systems put a large language model (LLM) at the core of the control logic and use its reasoning to solve problems. Unlike fast-response systems, agentic AI systems are designed to "think slow", planning and executing step by step to raise the odds of success. Their key capabilities are reasoning, acting, and accessing memory. Acting is done by calling external tools, such as search, calculators, or database operations. Memory covers internal logs and the history of interactions with the user, enabling a personalized experience. One way to configure an agent is the ReACT model, which combines reasoning and acting components. For example, when a user asks a vacation question, the agent draws on earlier memory and the current query, calling external tools and iterating on its plan to provide an accurate answer.
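The thought-action-observation cycle of ReACT described above can be sketched as a simple loop. The tools and the scripted plan below are illustrative stand-ins (there is no real LLM driving the loop here), not an actual agent framework.

```python
# Minimal ReACT-style loop: the agent alternates between a "thought"
# (reasoning), an "action" (a tool call), and an "observation" (the
# tool's result), iterating until it decides it can give a final answer.

TOOLS = {
    "vacation_db": lambda _: "10 days remaining",
    "weather": lambda q: "sunny, 8 average sun hours/day",
}

def react_loop(question: str, plan: list, max_steps: int = 5) -> str:
    observations = []
    for thought, action, arg in plan[:max_steps]:
        print(f"Thought: {thought}")
        if action == "finish":            # the model decides it can answer
            return arg.format(*observations)
        result = TOOLS[action](arg)       # act: call an external tool
        print(f"Action: {action}({arg!r}) -> {result}")
        observations.append(result)       # observe, then iterate

    return "No answer found."

# A scripted plan standing in for the LLM's step-by-step reasoning:
plan = [
    ("I need the user's remaining vacation days.", "vacation_db", "days"),
    ("I need the forecast for Florida.", "weather", "Florida next month"),
    ("I have enough to answer.", "finish", "You have {0}; forecast: {1}."),
]
print(react_loop("Plan my vacation", plan))
```

In a real agent, the `plan` list would not exist up front: the LLM would produce each thought and action on the fly, with the previous observations fed back into its prompt.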

10:01

🔄 Modularity and autonomy of compound AI systems

Because they are modular, compound AI systems can solve more complex problems, offering many paths to a solution. The video walks through a complex vacation-and-sunscreen question, showing how the system solves it in multiple steps: retrieving memory, checking the weather forecast, looking up the recommended sunscreen dosage, and doing the math. The level of autonomy can be tuned to how complex and how narrowly defined the problem is. For well-defined, narrow problems, a programmatic approach may be more efficient; for tasks that demand independently solving complex problems such as GitHub issues, an agentic system has the advantage. Although we are still in the early days of agent systems, combining system design with agentic behavior is driving rapid progress, and in most cases a human stays in the loop to improve accuracy.
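The final "do the math" step of the sunscreen example above amounts to simple arithmetic once the earlier steps have gathered the inputs. The figures below (10 vacation days, 8 sun hours per day, 0.05 oz of sunscreen per hour) are illustrative assumptions, not values from the video.

```python
import math

def bottles_needed(days: int, sun_hours_per_day: float,
                   oz_per_hour: float, bottle_oz: float = 2.0) -> int:
    """How many bottles of sunscreen to pack for the trip."""
    total_oz = days * sun_hours_per_day * oz_per_hour
    # Round up: a partially used bottle still has to be packed.
    return math.ceil(total_oz / bottle_oz)

print(bottles_needed(days=10, sun_hours_per_day=8, oz_per_hour=0.05))
```

In the agentic version, each of these three inputs would be produced by a different tool call (memory, weather forecast, public-health lookup) before the calculator tool runs this computation.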

Keywords

💡AI agents

AI agents are intelligent systems in the field of AI that perform tasks or solve problems, usually by integrating multiple components and capabilities. In the video, AI agents are described as having the abilities to reason, act, and access memory, making them a key part of building complex AI systems.

💡Generative AI

Generative AI is an AI technology that can create new content, such as text, images, or music. The video notes that the field of generative AI is shifting from monolithic models to compound AI systems, reflecting the progress and diversifying applications of the technology.

💡Compound AI systems

A compound AI system is built from multiple components: different AI models, programmatic logic, or other tools. Using a vacation-days query as an example, the video shows how integrating a database query augments the capability of a standalone AI model.

💡System design principles

System design principles are the rules and methods followed when building a system to keep it efficient and maintainable. The video notes that applying them, such as using modularity and integrating different components into a compound AI system, is a better way to solve certain problems.

💡Retrieval-augmented generation (RAG)

Retrieval-augmented generation is a compound AI system that combines a retrieval step and a generation step to give more accurate answers. The video notes that RAG systems usually have a fixed query path and can fail on different queries, such as a question about the weather.

💡Control logic

Control logic is the set of instructions in a program that determines how a query is answered or a task is executed. In the video, control logic can be predefined programmatic logic or generated dynamically by a large language model, depending on the type of AI system.

💡Large language models (LLMs)

A large language model is a complex AI model that can process and generate natural-language text. The video emphasizes the importance of LLMs in improving the reasoning and problem-solving abilities of AI agents; they can be used to devise plans for solving problems.

💡Tools

In the context of AI agents, tools are external programs or APIs the model can call to perform specific tasks, such as search, calculation, or data manipulation. The video notes that using tools is part of an agent's ability to act.

💡Memory access

Memory access is an agent's ability to store and retrieve information: this can be the model's internal logs, or the history of interactions with a human user. The video notes that memory access makes the agent's experience more personalized.

💡ReACT

ReACT is a way of configuring AI agents that combines reasoning and acting components. Through a concrete example, the video shows how a ReACT agent handles a user query: planning its work, taking actions, then observing and iterating until it reaches a final answer.

💡AI autonomy

AI autonomy is a system's ability to carry out tasks without human intervention. The video discusses the level of autonomy to consider when designing an AI system and how to choose the right control logic for different kinds of problems.

Highlights

2024 will be the year of AI agents.

Explaining AI agents starts with the shifts in the field of generative AI.

The move from monolithic models to compound AI systems.

Models on their own are limited by the data they were trained on.

Models are hard to adapt; tuning them takes an investment of data and resources.

A concrete example of a model's limitations: planning a vacation.

Tasks a model can handle include summarizing documents and drafting emails and reports.

Building systems around the model unlocks its potential.

Compound AI systems solve specific problems by applying system design principles.

A system is composed of multiple modular components, including models and programmatic components.

Retrieval-augmented generation (RAG) is one of the most commonly used compound AI systems.

Control logic is the path a program follows to answer a query.

Agents are built by putting a large language model in charge of the control logic.

Large language models have improved dramatically in reasoning ability.

The three capabilities of agents: reasoning, acting, and accessing memory.

Tools are external programs an agent can call to execute a solution.

The ReACT model combines reasoning and acting components.

When a ReACT agent is configured, the model is prompted to think slowly and plan its work.

Agent systems can handle more complex tasks, such as solving GitHub issues.

AI autonomy is a sliding scale; choose the right level of autonomy for the nature of the problem.

Compound AI systems will become more agentic, combining system design with agentic behavior.

Transcripts

play00:00

2024 will be the year of AI agents.

play00:04

So what are AI agents?

play00:05

And to start explaining that,

play00:07

we have to look at the various shifts that  we're seeing in the field of generative AI.

play00:10

And the first shift I would like to talk  to you about

play00:13

is this move from monolithic models to compound AI systems.

play00:26

So models on their own are limited by the data they've been trained on.

play00:31

So that impacts what they know about the world

play00:34

and what sort of tasks they can solve.

play00:40

They are also hard to adapt.

play00:42

So you could tune a model, but it would take  an investment in data,

play00:46

and in resources.

play00:51

So let's take a concrete example  to illustrate this point.

play00:55

I want to plan a vacation for this summer,

play00:58

and I want to know how many vacation days are at my disposal.

play01:06

What I can do is take my query,

play01:10

feed that into a model that can generate a response.

play01:19

I think we can all expect that this answer will be incorrect,

play01:23

because the model doesn't know who I am

play01:26

and does not have access  to this sensitive information about me.

play01:30

So models on their own could be useful for a  number of tasks, as we've seen in other videos. 

play01:35

So they can help with summarizing documents,

play01:38

they can help me with creating first drafts for emails

play01:41

and different reports I'm trying to do.

play01:43

But the magic gets unlocked when I start building systems

play01:47

around the model and actually take the model and  integrate them into the existing processes I have.

play01:52

So if we were to design a system to solve this,

play01:56

I would have to give the model access to the  database where my vacation data is stored.

play02:03

So that same query would get  fed into the language model. 

play02:07

The difference now is the model would  be prompted to create a search query,  

play02:13

and that would be a search query that  can go into the database that I have. 

play02:18

So that would go and fetch the information  from the database, output an answer,  

play02:23

and then that would go back into the  model that can generate a sentence

play02:28

to answer, so, "Maya, you have ten days  left in your vacation database."

play02:33

So the answer that I would get here would be correct. 

play02:42

This is an example of a compound AI system,

play02:45

and it recognizes that certain problems are better solved

play02:48

when you apply the principles of system design.

play02:55

So what does that mean?

play02:58

By the term "system", you can understand there's multiple components.

play03:02

So systems are inherently modular.

play03:04

I can have a model, I can choose between tuned models,

play03:08

large language models, image generation models,

play03:11

but also I have programmatic components that can come around it.

play03:15

So I can have output verifiers.

play03:18

I can have programs that can take a query and then break it down

play03:21

to increase the chances of the answer being correct.

play03:25

I can combine that with searching databases.

play03:27

I can combine that with different tools.

play03:30

So when we talk about a systems approach,

play03:33

I can break down what I desire my program to do

play03:36

and pick the right components to be able to solve that.

play03:40

And this is inherently easier to solve for than tuning a model.

play03:45

So that makes this much faster and quicker to adapt.

play03:54

Okay, so the example I use below,

play03:58

is an example of a compound AI system.

play04:00

You also might be familiar with retrieval augmented generation (RAG),

play04:05

which is one of the most popular  and commonly used compound AI systems out there.

play04:11

Most RAG systems and the example I  use below are defined in a certain way. 

play04:18

So if I bring a very different query, let's  ask about the weather in this example here. 

play04:23

It's going to fail because the path that this program has to follow

play04:28

is to always search my vacation policy database.

play04:32

And that has nothing to do with the weather.

play04:34

So when we say the path to answer a query,

play04:37

we are talking about something called  the control logic of a program.

play04:43

So compound AI systems, we said   most of them have programmatic control logic.

play04:49

So that was something that I defined myself as the human.

play04:55

Now let's talk about, where do agents come in?

play05:00

One other way of controlling the logic  of a compound AI system

play05:04

is to put a large language model in charge,

play05:07

and this is only possible because   we're seeing tremendous improvements

play05:11

in the capabilities of reasoning   of large language models.

play05:15

So large language models, you  can feed them complex problems

play05:18

and you can prompt them to break them down  and come up with a plan on how to tackle it.

play05:23

Another way to think about it is,

play05:25

on one end of the spectrum,  I'm telling my system to think fast,

play05:30

act as programmed, and don't deviate  from the instructions I've given you.

play05:34

And on the other end of the spectrum,

play05:36

you're designing your system to think slow.

play05:40

So, create a plan, attack each part of the plan,

play05:44

see where you get stuck, see if you need to readjust the plan.

play05:47

So I might give you a complex question,

play05:49

and if you would just give me the  first answer that pops into your head,

play05:53

very likely the answer might be wrong,

play05:55

but you have higher chances of success  if you break it down,

play05:59

understand where you need external help to  solve some parts of the problem,

play06:02

and maybe take an afternoon to solve it.

play06:05

And when we put an LLM in charge of the logic,

play06:08

this is when we're talking  about an agentic approach.

play06:13

So let's break down the components of LLM agents.

play06:19

The first capability is the ability to reason, which we talked about.

play06:24

So this is putting the model at the core of how problems are being solved.

play06:29

The model will be prompted to come up with a plan  and to reason about each step of the process along the way.

play06:35

Another capability of agents is the ability to act.

play06:39

And this is done by external programs  that are known in the industry as tools.

play06:45

So tools are external pieces of the program,

play06:48

and the model can define when to call them  and how to call them

play06:52

in order to best execute the  solution to the question they've been asked.

play06:56

So an example of a tool can be search,

play06:59

searching the web, searching a database at their disposal.

play07:03

Another example can be a  calculator to do some math. 

play07:08

This could be a piece of program code  that maybe might manipulate the database. 

play07:13

This can also be another language model that  maybe you're trying to do a translation task,  

play07:18

and you want a model that can be able to do that.

play07:21

And there's so many other possibilities of what you can do here.

play07:23

So these can be APIs.

play07:25

Basically any piece of external program  you want to give your model access to. 

play07:30

Third capability, that is  the ability to access memory. 

play07:35

And the term "memory" can mean a couple of things.

play07:37

So we talked about the models thinking through the program

play07:41

kind of how you think out loud  when you're trying to solve through a problem.

play07:45

So those inner logs can be stored and can be  useful to retrieve at different points in time. 

play07:51

But also this could be the history of  conversations that you as a human had  

play07:56

when interacting with the agent.

play07:57

And that would allow the experience to be much more personalized.

play08:01

So for configuring agents, there are many ways to approach it.

play08:05

One of the most popular ways of going about it is through something called ReACT,

play08:11

which, as you can tell by the name,

play08:13

combines the reasoning and act components of LLM agents.

play08:18

So let's make this very concrete.

play08:21

What happens when I configure a ReACT agent?

play08:23

You have your user query that gets fed into a model, an LLM. The LLM is given a prompt.

play08:31

So the instruction that's given is: don't give me the first answer that pops into your head.

play08:37

Think slow, plan your work, and then try to execute something.

play08:44

Try to act. And when you want to act, you can define

play08:49

whether you want to use external tools to help you come up with the solution.

play08:53

Once you call a tool, you get an answer.

play08:56

Maybe it gave you the wrong answer  or it came up with an error. 

play09:00

You can observe that. So the LLM would observe

play09:02

the answer and determine whether it does answer the question at hand, or whether it needs to iterate

play09:08

on the plan and tackle it differently. Up until I get to a final answer. 

play09:17

So let's go back and make  this very concrete again. 

play09:20

Let's talk about my vacation example. And as you can tell, I'm really excited  

play09:25

to go on one, so I want to take  the rest of my vacation days. 

play09:29

I'm planning to go to Florida next month.

play09:32

I'm planning on being outdoors  a lot and I'm prone to burning. 

play09:35

So I want to know what is the number of two ounce  sunscreen bottles that I should bring with me? 

play09:43

And this is a complex problem.

play09:45

There's a number of things to plan. One is how many vacation days  

play09:49

am I planning to take? And maybe that is information

play09:52

the system can retrieve from its memory. Because I asked that question before. 

play09:56

Two is how many hours do I plan to be in the sun? I said, I plan to be out there a lot,

play10:01

so maybe that would mean looking into the weather forecast for next month in Florida and seeing

play10:06

what is the average sun hours that are expected. Three is maybe going to a public health

play10:13

website to understand what is the recommended  dosage of sunscreen per hour in the sun. 

play10:17

And then four is doing some math, to be able to determine how much of that sunscreen

play10:22

fits into two ounce bottles. So that's quite complicated. 

play10:25

But what's really powerful here is  there's so many paths that can be  

play10:29

explored in order to solve a problem. So this makes the system quite modular. 

play10:33

And I can hit it with much more complex problems. So going back to the concept of compound AI  

play10:40

systems, compound AI systems are here to stay. What we're going to observe this year is that  

play10:44

they're going to become more agentic. The way I like to think about it is

play10:49

you have a sliding scale of AI autonomy. And the person defining the system

play11:02

would examine what trade offs they want in terms  of autonomy in the system for certain problems,  

play11:09

especially problems that are narrow, well-defined. So you don't expect someone to ask them about the  

play11:14

weather when they need to ask about vacations. So a narrow problem set. 

play11:19

You can define a narrow system like this one. It's more efficient to go the programmatic  

play11:24

route because every single query  will be answered the same way. 

play11:27

If I were to apply the agentic approach here, there might be unnecessary

play11:32

looping and iteration. So for narrow problems, a programmatic approach can

play11:36

be more efficient than going the agentic route. But if I expect to have a system accomplish very

play11:43

complex tasks like, say, trying to solve  GitHub issues independently, and handle  

play11:50

a variety of queries, a spectrum of queries. This is where an agentic approach can be helpful,

play11:54

because it would take you too much effort to  configure every single path in the system. 

play11:59

And we're still in the early days of agent systems. 

play12:02

We're seeing rapid progress when you combine the effects of system design with agentic behavior.

play12:08

And of course, you will have a human in the  loop in most cases as the accuracy is improving. 

play12:13

I hope you found this video very useful, and  please subscribe to the channel to learn more.

