What are AI Agents?
Summary
TL;DR: 2024 will be the year of AI agents. AI agents grow out of a shift in the field of generative AI from monolithic models to compound AI systems. Models on their own are limited by their training data and are hard to adapt to new tasks. Compound AI systems apply system design principles, integrating models with external tools to improve problem-solving. AI agents go further by putting a large language model at the core of the control logic, solving complex problems autonomously through reasoning, acting, and accessing memory. The video also introduces the ReACT model and shows how to configure an AI agent to handle complex queries.
Takeaways
- 🧠 2024 will be the year of AI agents, one of the major shifts in the field of generative AI.
- 🔄 A shift from monolithic models to compound AI systems: standalone models are limited by their training data and hard to adapt to new tasks.
- 🏖️ As an example, a standalone model cannot correctly answer a question about an individual's vacation days, because it has no access to that personal, sensitive information.
- 🔍 Building a system that integrates the model with existing processes, such as querying a database for personal information, improves accuracy.
- 🤖 Compound AI systems solve specific problems by applying system design principles, and are modular and extensible.
- 🛠️ A system can include many components, such as tuned models, output verifiers, and database search, to increase the chance of a correct answer.
- 🔗 Retrieval-augmented generation (RAG) is a common pattern for compound AI systems, but it can fail on queries it was not designed for.
- 📈 Improved reasoning in large language models (LLMs) lets them tackle complex problems and devise plans to solve them.
- 🛑 The agentic approach hands the control logic to the large language model, allowing the system to plan and iterate while solving a problem.
- 🔑 The key capabilities of an LLM agent are reasoning, acting (calling external tools), and accessing memory (internal logs or conversation history).
- 🔄 The ReACT model combines reasoning and acting: the model is prompted to think slowly, plan its work, and then execute, improving the success rate.
- 🌟 Compound AI systems and the agentic approach represent a new direction in AI, enabling systems to handle more complex tasks with greater autonomy.
Q & A
What is an AI agent?
-An AI agent is a compound AI system that puts a large language model at the core of its control logic, letting the model solve complex problems and execute solutions through external tools.
Why move from monolithic models to compound AI systems?
-A standalone model is limited by its training data and hard to adapt to new tasks. A compound AI system's modular design makes it easier to integrate different components to solve more complex problems.
How should we understand the limitations of models?
-Models can only understand and solve problems based on their training data, lack the ability to adapt to new situations, and tuning them requires additional data and resources.
What is retrieval-augmented generation (RAG)?
-Retrieval-augmented generation (RAG) is a popular compound AI system that combines search with generation to improve the accuracy and efficiency of answers.
Why does an AI system need control logic?
-Control logic defines the path a program takes to answer a query; it guides the system to solve problems effectively, especially for complex queries.
How do you build an AI system that solves real problems?
-By integrating the model with external databases, search tools, and other resources the model can draw on to generate accurate answers.
What is the agentic approach to large language models (LLMs)?
-The agentic approach puts the large language model at the core of the system's control logic, so it can autonomously plan and execute the steps needed to solve a problem.
What are the three main capabilities of an AI agent?
-Reasoning, acting, and accessing memory; together these capabilities support the agent in solving problems.
What is the ReACT model?
-The ReACT model combines a reasoning component and an acting component, enabling an AI agent to plan and execute solutions more effectively.
Why will 2024 be the year of AI agents?
-Given the dramatic improvements in LLM reasoning and a deeper understanding of system design, AI agents are expected to make major progress in 2024, both in problem-solving ability and in range of applications.
How should we understand the "memory" component of an AI system?
-"Memory" can mean the model's internal logs while solving a problem, or the conversation history with a human user; both help make the experience more personalized.
Outlines
🧠 From Monolithic Models to Compound AI Systems
2024 is seen as the year of AI agents. The explanation of AI agents starts with the shifts happening in generative AI, the first being the move from monolithic models to compound AI systems. Standalone models are limited by their training data, hard to adapt to new tasks, and expensive to tune. By integrating a model into existing systems, for example giving it access to a database of personal vacation data, you can build a compound AI system that solves complex problems. The underlying system design principle is that certain problems are better solved with a systems approach. Compound AI systems are modular: you can choose among different models and programmatic components, such as output verifiers and database search, to increase answer accuracy, which is faster and more flexible than tuning a model. The video also introduces retrieval-augmented generation (RAG), a popular compound AI system, which can fail on out-of-scope queries because it relies on predefined control logic.
🤖 The Rise and Capabilities of Agentic AI Systems
Agentic AI systems put a large language model (LLM) at the core of the control logic, using its reasoning abilities to solve problems. Unlike fast-response systems, agentic systems are designed to "think slow", improving their success rate through planning and step-by-step execution. Their key capabilities are reasoning, acting, and accessing memory. Acting is realized by calling external tools, such as search, calculators, or database operations. Memory covers internal logs and the history of interactions with the user, enabling a personalized experience. One way to configure an agent is the ReACT model, which combines reasoning and acting components. For example, when a user asks about vacation, the agent draws on prior memory and the current query, calls external tools, and iterates on its plan to produce an accurate answer.
🔄 Modularity and Autonomy in Compound AI Systems
Thanks to their modularity, compound AI systems can solve more complex problems by exploring many solution paths. The video uses a complex vacation-and-sunscreen question to show how a system works through multiple steps: retrieving memory, checking the weather forecast, looking up the recommended sunscreen dosage, and doing the math. Autonomy can be dialed up or down depending on how complex and how narrowly defined the problem is. For narrow, well-defined problems, a programmatic approach may be more efficient; for systems expected to solve complex tasks independently, such as GitHub issues, the agentic approach has the advantage. We are still in the early days of agent systems, but progress is rapid when system design is combined with agentic behavior, and in most cases a human remains in the loop to improve accuracy.
Keywords
💡AI Agent
💡Generative AI
💡Compound AI System
💡System Design Principles
💡Retrieval-Augmented Generation (RAG)
💡Control Logic
💡Large Language Model (LLM)
💡Tools
💡Memory Access
💡ReACT
💡AI Autonomy
Highlights
2024 will be the year of AI agents.
Explaining AI agents starts with the shifts happening in generative AI.
The move from monolithic models to compound AI systems.
Models on their own are limited by the data they were trained on.
Models are hard to adapt; tuning them requires data and resources.
A concrete example of model limitations: planning a vacation.
Tasks a model can handle include summarizing documents and drafting emails and reports.
Building systems around the model unlocks its potential.
Compound AI systems solve specific problems by applying system design principles.
A system is composed of multiple modular components, including models and programmatic components.
Retrieval-augmented generation (RAG) is one of the most commonly used compound AI systems.
Control logic is the path a program follows to answer a query.
Agents put a large language model in charge of the control logic.
Large language models have seen dramatic improvements in reasoning.
The three capabilities of agents: reasoning, acting, and accessing memory.
Tools are external programs the agent can call to execute its solution.
The ReACT model combines reasoning and acting components.
A ReACT agent is prompted to think slowly and plan before acting.
Agent systems can handle more complex tasks, such as solving GitHub issues.
There is a sliding scale of AI autonomy; choose the level that fits the problem.
Compound AI systems will become more agentic, combining system design with agentic behavior.
Transcripts
2024 will be the year of AI agents.
So what are AI agents?
And to start explaining that,
we have to look at the various shifts that we're seeing in the field of generative AI.
And the first shift I would like to talk to you about
is this move from monolithic models to compound AI systems.
So models on their own are limited by the data they've been trained on.
So that impacts what they know about the world
and what sort of tasks they can solve.
They are also hard to adapt.
So you could tune a model, but it would take an investment in data,
and in resources.
So let's take a concrete example to illustrate this point.
I want to plan a vacation for this summer,
and I want to know how many vacation days are at my disposal.
What I can do is take my query,
feed that into a model that can generate a response.
I think we can all expect that this answer will be incorrect,
because the model doesn't know who I am
and does not have access to this sensitive information about me.
So models on their own could be useful for a number of tasks, as we've seen in other videos.
So they can help with summarizing documents,
they can help me with creating first drafts for emails
and different reports I'm trying to do.
But the magic gets unlocked when I start building systems
around the model, actually taking the model and integrating it into the existing processes I have.
So if we were to design a system to solve this,
I would have to give the model access to the database where my vacation data is stored.
So that same query would get fed into the language model.
The difference now is the model would be prompted to create a search query,
and that would be a search query that can go into the database that I have.
So that would go and fetch the information from the database, output an answer,
and then that would go back into the model that can generate a sentence
to answer, so, "Maya, you have ten days left in your vacation database."
So the answer that I would get here would be correct.
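The flow just described can be sketched as follows. This is a minimal, illustrative mock rather than the video's implementation: the "model" steps are deterministic stand-ins for real LLM calls, and the database is a hypothetical in-memory table.

```python
# Minimal sketch of the compound AI system described above. The "model"
# steps are deterministic stand-ins for real LLM calls; the point is the
# shape of the flow: query -> search query -> database -> answer.

VACATION_DB = {"maya": 10}  # hypothetical table: employee -> days remaining

def model_make_search_query(user: str, question: str) -> str:
    # A real LLM would be prompted to turn the question into a lookup key;
    # here that behavior is hard-coded for illustration.
    return user.lower()

def model_phrase_answer(user: str, days: int) -> str:
    # A real LLM would phrase the retrieved value as a natural sentence.
    return f"{user.capitalize()}, you have {days} vacation days left."

def answer_vacation_question(user: str, question: str) -> str:
    key = model_make_search_query(user, question)
    days = VACATION_DB[key]  # fetch from the database
    return model_phrase_answer(user, days)

print(answer_vacation_question("maya", "How many vacation days do I have left?"))
# -> Maya, you have 10 vacation days left.
```

The division of labor is the key design point: the model handles language (query formulation and answer phrasing) while the database supplies the facts the model cannot know.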
This is an example of a compound AI system,
and it recognizes that certain problems are better solved
when you apply the principles of system design.
So what does that mean?
By the term "system", you can understand there's multiple components.
So systems are inherently modular.
I can have a model, I can choose between tuned models,
large language models, image generation models,
but also I have programmatic components that can come around it.
So I can have output verifiers.
I can have programs that can take a query and then break it down
to increase the chances of the answer being correct.
I can combine that with searching databases.
I can combine that with different tools.
So when we're talking about a systems approach,
I can break down what I desire my program to do
and pick the right components to be able to solve that.
And this is inherently easier to solve for than tuning a model.
So that makes this much faster and quicker to adapt.
Okay, so the example I used above
is an example of a compound AI system.
You might also be familiar with retrieval-augmented generation (RAG),
which is one of the most popular and commonly used compound AI systems out there.
Most RAG systems, and the example I used above, are defined in a certain way.
So if I bring a very different query, let's ask about the weather in this example here.
It's going to fail because the path that this program has to follow
is to always search my vacation policy database.
And that has nothing to do with the weather.
So when we say the path to answer a query,
we are talking about something called the control logic of a program.
So compound AI systems, we said most of them have programmatic control logic.
So that was something that I defined myself as the human.
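That human-defined control logic can be sketched like this. The function names and policy text are illustrative assumptions; the point is that the path is fixed by the programmer, so an out-of-scope query fails.

```python
# Sketch of programmatic control logic: the human designer fixes the path,
# so every query is routed to the vacation-policy database. The database
# contents and function names are illustrative assumptions.

def search_vacation_policy_db(query: str):
    # Hypothetical retrieval step; it only knows about vacation policy.
    if "vacation" in query.lower():
        return "Employees receive 30 vacation days per year."
    return None  # nothing relevant in this database

def rag_answer(query: str) -> str:
    # Fixed control logic: always search this one database.
    context = search_vacation_policy_db(query)
    if context is None:
        return "Sorry, I could not find an answer."
    return f"Based on policy: {context}"

print(rag_answer("How many vacation days do I get?"))  # answered from the DB
print(rag_answer("What's the weather in Florida?"))    # fails: out of scope
```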
Now let's talk about, where do agents come in?
One other way of controlling the logic of a compound AI system
is to put a large language model in charge,
and this is only possible because we're seeing tremendous improvements
in the capabilities of reasoning of large language models.
So large language models, you can feed them complex problems
and you can prompt them to break them down and come up with a plan on how to tackle it.
Another way to think about it is,
on one end of the spectrum, I'm telling my system to think fast,
act as programmed, and don't deviate from the instructions I've given you.
And on the other end of the spectrum,
you're designing your system to think slow.
So, create a plan, attack each part of the plan,
see where you get stuck, see if you need to readjust the plan.
So I might give you a complex question,
and if you would just give me the first answer that pops into your head,
very likely the answer might be wrong,
but you have higher chances of success if you break it down,
understand where you need external help to solve some parts of the problem,
and maybe take an afternoon to solve it.
And when we put an LLM in charge of the logic,
this is when we're talking about an agentic approach.
So let's break down the components of LLM agents.
The first capability is the ability to reason, which we talked about.
So this is putting the model at the core of how problems are being solved.
The model will be prompted to come up with a plan and to reason about each step of the process along the way.
Another capability of agents is the ability to act.
And this is done by external programs that are known in the industry as tools.
So tools are external pieces of the program,
and the model can define when to call them and how to call them
in order to best execute the solution to the question they've been asked.
So an example of a tool can be search,
searching the web, searching a database at their disposal.
Another example can be a calculator to do some math.
This could be a piece of program code that might manipulate the database.
This can also be another language model that maybe you're trying to do a translation task,
and you want a model that can be able to do that.
And there are so many other possibilities of what you can do here.
So these can be APIs.
Basically any piece of external program you want to give your model access to.
Third capability, that is the ability to access memory.
And the term "memory" can mean a couple of things.
So we talked about the model thinking through the problem,
kind of how you think out loud when you're trying to solve a problem.
So those inner logs can be stored and can be useful to retrieve at different points in time.
But also this could be the history of conversations that you as a human had
when interacting with the agent.
And that would allow the agent to make the experience much more personalized.
So as for configuring agents, there are many ways to approach it.
One of the most popular ways of going about it is through something called ReACT,
which, as you can tell by the name,
combines the reasoning and act components of LLM agents.
So let's make this very concrete.
What happens when I configure a ReACT agent?
You have your user query that gets fed into a model, and the LLM is given a prompt.
So the instruction it's given is: don't give me the first answer that pops up.
Think slowly. Plan your work, and then try to execute something. Try to act.
And when you want to act, you can define whether you want to use external tools
to help you come up with the solution.
Once you call a tool, you get an answer.
Maybe it gave you the wrong answer, or it came up with an error.
You can observe that. So the LLM would observe the answer
and determine whether it answers the question at hand, or whether it needs to iterate
on the plan and tackle it differently, up until it gets to a final answer.
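The loop just described can be sketched as below. This is a toy harness, not a real agent framework: the "plan" that the model's reasoning would normally produce is scripted, and the only tool is a calculator.

```python
# Toy sketch of the ReACT loop: act (call a tool), observe the result,
# and finish with an answer. In a real agent, an LLM would produce the
# plan and decide after each observation whether to iterate.

def react_agent(question: str, tools: dict, plan: list) -> str:
    """`plan` is the list of (tool_name, tool_input) steps the model's
    reasoning would normally produce; it is scripted here for illustration."""
    observations = []
    for tool_name, tool_input in plan:         # act
        result = tools[tool_name](tool_input)  # call the external tool
        observations.append(result)            # observe
    # A real agent would now check whether the observations answer the
    # question, and re-plan if not; here we just report the last one.
    return f"Final answer: {observations[-1]}"

tools = {"calculator": lambda expr: str(eval(expr))}  # toy tool
print(react_agent("What is 6 * 7?", tools, [("calculator", "6 * 7")]))
# -> Final answer: 42
```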
So let's go back and make this very concrete again.
Let's talk about my vacation example. And as you can tell, I'm really excited
to go on one, so I want to take the rest of my vacation days.
I'm planning to go to Florida next month.
I'm planning on being outdoors a lot, and I'm prone to burning.
So I want to know: what is the number of two-ounce sunscreen bottles that I should bring with me?
And this is a complex problem, so there's a number of things to plan.
One is: how many vacation days am I planning to take? And maybe that is information
the system can retrieve from its memory, because I asked that question before.
Two is: how many hours do I plan to be in the sun? I said I plan to be outdoors a lot,
so maybe that would mean looking into the weather forecast for next month in Florida and seeing
what is the average number of sun hours that are expected.
Three is maybe going to a public health website to understand
what is the recommended dosage of sunscreen per hour in the sun.
And four is doing some math, to be able to determine how much of that sunscreen
fits into two-ounce bottles. So that's quite complicated.
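The final math step might look like this. The sun hours and dosage here are made-up numbers for illustration, not values from the video.

```python
import math

# Illustrative inputs (assumptions, not values from the video).
sun_hours = 60       # total expected hours in the sun over the trip
oz_per_hour = 0.05   # assumed recommended sunscreen dosage, ounces/hour
bottle_oz = 2.0      # two-ounce bottles, as in the question

total_oz = sun_hours * oz_per_hour          # total ounces of sunscreen needed
bottles = math.ceil(total_oz / bottle_oz)   # round up to whole bottles
print(bottles)  # -> 2
```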
But what's really powerful here is there's so many paths that can be
explored in order to solve a problem. So this makes the system quite modular.
And I can hit it with much more complex problems. So going back to the concept of compound AI
systems, compound AI systems are here to stay. What we're going to observe this year is that
they're going to become more agentic. The way I like to think about it is
you have a sliding scale of AI autonomy. The person defining the system
would examine what trade-offs they want in terms of autonomy in the system for certain problems,
especially problems that are narrow, well-defined. So you don't expect someone to ask them about the
weather when they need to ask about vacations. So a narrow problem set.
You can define a narrow system like this one. It's more efficient to go the programmatic
route because every single query will be answered the same way.
If I were to apply the agentic approach here, there might be unnecessary
looping and iteration. So for narrow problems, the programmatic approach can
be more efficient than going the agentic route. But if I expect to have a system accomplish very
complex tasks like, say, trying to solve GitHub issues independently, and handle
a variety of queries, a spectrum of queries, this is where an agentic approach can be helpful,
because it would take you too much effort to configure every single path in the system.
And we're still in the early days of agent systems.
We're seeing rapid progress when you combine the effects of system design with agentic behavior.
And of course, you will have a human in the loop in most cases as the accuracy is improving.
I hope you found this video very useful, and please subscribe to the channel to learn more.