Agentic AI: The Future Is Here?
Summary
TLDR: This video takes a deep look at the concept of "agentic AI", arguing that the term is often used as marketing hype. It first introduces the traits an agentic AI should possess, such as autonomy, intentionality, self-determination, and accountability. It then compares large language models (LLMs) with truly agentic systems, making clear that LLMs, however powerful, are not genuinely agentic. The video also distinguishes function calling from agency and suggests possible directions for future AI development, including planning with world-knowledge models. Finally, it urges viewers to separate scientific terms from marketing terms and to focus on the scientific ingredients that actually drive AI forward.
Takeaways
- 🧠 LLMs (large language models) and RAG (retrieval-augmented generation) systems are not currently agentic: they lack autonomy and self-awareness, and their operational parameters are predetermined.
- 🔍 Agency involves independent action, intentionality, self-determination, and accountability, and is usually associated with living beings; an AI agent, by contrast, is a computational system that operates autonomously within its programming constraints.
- 🤖 An AI agent perceives its environment through sensors, reasons about it, and executes tasks to achieve predefined goals, whereas agency denotes an entity's inherent capacity for independent action, intentionality, and self-determination.
- 📚 Function calling is an LLM capability that lets the model access external resources and perform tasks beyond its textual limits, but it does not amount to agency.
- 🛠️ A RAG system improves factual accuracy and relevance by combining an information-retrieval module with a large language model, yet it still lacks agency.
- 🔑 Planning is the key capability for a truly agentic AI system, covering goal setting, strategy development, action sequencing, and execution.
- 🚀 Future AI systems may need to move beyond the LLM as the logical center and plan via world-knowledge models, possibly grounded in scientific and natural laws.
- 🔄 Current AI research is exploring how to turn the verbal world knowledge inside LLMs into state knowledge, in order to build more accurate world models.
- 📈 Better planning will increase an AI system's efficiency, effectiveness, and adaptability while reducing errors, which matters especially in logistics, healthcare, stock trading, and manufacturing.
- 🔗 There is a clear distinction between function calling and planning: the former is predefined and structured, the latter a dynamic, adaptive decision process.
- 🎯 Today's AI systems are not yet agentic, but understanding their limitations helps us focus on the key ingredients for future development.
Q & A
What is agentic AI, and how does it differ from an AI agent?
- Agentic AI refers to an entity with the capacity for independent action, autonomous decision-making, and imposing its will on the world, involving intentionality, autonomy, and accountability. An AI agent is a computational system designed to perceive its environment, reason, and act to achieve specific goals; it operates autonomously on the basis of predefined rules and learned data, without direct human intervention.
Do large language models (LLMs) possess agency?
- According to the discussion in the transcript, LLMs, while powerful, have no autonomy, no inherent goals or motivations, and no sense of self-efficacy, so they are not considered agentic.
What is function calling, and how does it relate to agency?
- Function calling is a capability that lets a model invoke external functions or APIs to interact with its environment, receive new information, or perform specific tasks. It enhances LLMs, but it is not agency, because it involves no independent goal formation, action selection, or self-monitoring.
What is a RAG system, and how does it differ from a standard function-calling AI system?
- A RAG system combines an information-retrieval module with a large language model. Compared with a standard function-calling AI system, RAG focuses on retrieving information rather than performing broader external operations or interacting with the physical world.
Why is planning so important for building truly intelligent AI agents?
- Planning is an intelligent agent's ability to set goals, develop strategies, sequence actions, and execute them. It is essential for improving an AI system's efficiency, effectiveness, and adaptability and for reducing errors.
How should we understand the roles of LLMs and world models in planning?
- LLMs provide rich linguistic and world knowledge but lack the ability to act on it. A world model, grounded in scientific and natural laws, offers one possible way to build an intelligent system that can plan and execute complex tasks.
What new developments are there in current AI research on planning for agentic systems?
- The latest research explores combining LLM knowledge with world models to create systems capable of complex planning and decision-making. This includes using graph structures to represent world states and possible action paths, and integrating real-time sensory data to build virtual world models.
Why are some so-called "agentic strategies" on the market really marketing terms rather than scientific ones?
- Some "agentic strategies", such as those mentioned in LlamaIndex, merely use LLMs for decision-making and involve no genuinely agentic behavior such as autonomy, intentionality, self-determination, or accountability. These terms are therefore marketing language rather than scientifically accurate descriptions of agency.
How can AI systems move from function calling to more advanced planning capabilities?
- By integrating more sophisticated world models and stronger reasoning abilities, AI systems could move from simple function calling to advanced planning. This might include self-coding new functions, dynamically integrating them, and interacting with the environment in real time.
What limitations do current AI systems have in planning and decision-making?
- Current AI systems, especially LLM-based ones, lack genuine autonomy, intentionality, self-determination, and accountability. Their ability to adapt to new tasks, learn new situations, and self-monitor and adjust is limited.
Why are understanding and planning so critical to AI's development?
- Understanding and planning are key to achieving truly intelligent agents. They enable AI systems to set goals, devise strategies, sequence actions, and execute them, which is essential for solving complex problems and adapting to changing environments.
Outlines
🤖 Exploring the difference between agentic AI and AI agents
This segment discusses the concept of agentic AI and how it differs from an AI agent. Agency is an entity's capacity to act independently, decide autonomously, and influence the world, involving autonomy, intentionality, self-determination, and accountability. An AI agent is a computational system designed to perceive its environment, reason on the basis of algorithms and data, and execute tasks toward specific goals. The segment introduces large language models (LLMs) and "agentic strategies", and probes whether they are agentic by asking several AI models.
🧐 A closer look at whether LLMs have agency
In this segment, several large language models are asked whether LLMs are agentic. Gemini 1.5 Pro and Llama 3, among others, give their views. Gemini 1.5 Pro holds that agency involves a level of consciousness and internal experience, while Llama 3 states plainly that LLMs lack autonomy and inherent goals or motivations and therefore cannot be considered agentic. The segment also discusses how LLMs can call external functions or APIs to perform specific tasks, which still does not amount to real agency.
🔍 The difference between function calling and agency
This segment digs into the difference between function calling and agency. Function calling is a powerful tool that lets LLMs access external resources and perform tasks beyond their textual limits, but it is not agency. Gemini 1.5 Pro explains that true agency involves independent action, goal-directed behavior, self-monitoring, and adaptive planning, whereas LLMs lack consciousness, self-preservation, and intrinsic motivation and so cannot be considered truly agentic systems.
🛠️ How function calling is implemented
This segment walks through implementing function calling with an LLM: first define the function, then specify the function schema (typically as a JSON schema), then prompt the LLM with that schema, and finally execute the function call the LLM generates. Code examples show how to define a function schema, use the tools parameter, and execute a function call.
📚 RAG systems and agency
This segment examines RAG (Retrieval-Augmented Generation) systems, which combine an information-retrieval module with a large language model. Although RAG improves factual accuracy and relevance, it is not agentic, since it lacks autonomy and self-awareness and operates within predetermined parameters. Several LLMs are asked, and they agree that RAG systems are not agentic.
🤔 The role of planning in agentic AI
This segment discusses why planning matters for achieving agentic AI. Planning involves goal setting, strategy development, action sequencing, and execution. Neither current LLMs nor RAG systems are agentic, but planning may be the key capability for getting there. The segment also suggests that future AI systems may rely not only on LLMs for planning but on world models grounded in scientific laws.
🚀 Directions for future AI systems
This segment proposes possible directions for future AI systems, stressing the importance of planning. Future systems may need to self-code new functions and integrate them into their existing function set, which would amount to a kind of self-learning. Current AI systems cannot do this yet, but such a capability could change the game significantly.
🔗 The gap between current AI systems and the ideal
This segment discusses the gap between today's AI systems and an ideal agentic AI system. By comparing existing AI system structures with a business-finance system, it points out the limitations of current AI. It also argues that where AI logic is applied should depend on the nature of the task, not on a predefined flow of operations.
📈 The difference between planning and function calling
This segment distinguishes planning from function calling in detail. Planning is a dynamic process involving goal setting, strategy formation, action sequencing, and execution, whereas function calling rests on predefined functions and structures. Although LLMs can learn when and how to call specific functions, they lack true adaptability and intelligence because they cannot improvise new actions.
🎯 Improving planning capabilities, and an outlook
This segment explores how LLM planning capabilities could be improved and offers an outlook on future AI research: expanding function libraries, dynamic function integration, and building world-knowledge models. It also recommends literature for further reading on LLM agent planning and current research.
🚧 Limitations and misconceptions of current AI systems
In this segment the author points out the limitations of current AI systems and criticizes misleading marketing language around "agentic strategies". Even if a system uses LLMs for decision-making, that does not make it agentic. The author urges viewers to keep a correct understanding of the scientific terms so as not to be misled by marketing.
👋 Closing: thoughts on the future of agentic AI
In the closing section, the author sums up his thoughts on the future of agentic AI and encourages viewers to keep scientific terms straight. Although current AI systems are not agentic, understanding their limitations clarifies the direction of future AI development. The author hopes viewers enjoyed the video and looks forward to seeing them in the next one.
Keywords
💡Agentic AI
💡AI Agent
💡Autonomy
💡Intentionality
💡Responsibility
💡Function Calling
💡RAG (Retrieval-Augmented Generation)
💡Planning
💡World Model
💡LLM (Large Language Model)
Highlights
The hot topic in AI right now is "agentic" AI, but the concept has no clear scientific definition yet.
Agency usually refers to an entity's capacity to act independently, make its own choices, and impose its will on the world.
An AI agent is a computational system designed to perceive its environment, reason about it, and take actions to achieve specific goals.
The main differences between agency and an AI agent lie in the source of intentionality, the nature of the autonomy, and the framework of responsibility and accountability.
Large language models (LLMs) lack autonomy and inherent goals and are therefore not agentic.
LLMs can be designed to call external functions or APIs, but this is a form of scripted agency, not true agency.
A truly agentic AI system would make decisions, adapt to changing circumstances, and exhibit autonomous behavior.
Agency represents an advanced form of intelligence characterized by goal-directed behavior, self-monitoring, and adaptive planning.
LLMs, however impressive, do not possess true agency.
Function calling is a powerful tool for enhancing LLM capabilities, but it does not equate to agency.
Function calling involves defining functions, specifying function schemas, prompting the model, and executing the function calls.
RAG (Retrieval-Augmented Generation) systems combine an information-retrieval module with a large language model, but they are not agentic.
Although a RAG system can retrieve information, its capabilities are narrower than those of a function-calling system that interacts directly with external systems.
Both LLMs and RAG systems lack agency: their autonomy is limited, they have no self-awareness, and their operational parameters are predetermined.
Future AI systems may need to self-code new functions and integrate them into their own function set, which would significantly change the game.
The goal for AI development is to create truly intelligent agents capable of goal setting, strategy development, action sequencing, and execution.
Planning is a key emergent capability for AI systems, essential for improving efficiency, effectiveness, and adaptability and for reducing errors.
Current research explores combining world-knowledge models with the verbal world knowledge of LLMs to create action models.
Recent work proposes planning with world-knowledge models rather than relying on LLMs alone, which could open a new path toward agentic AI.
Transcripts
Hello community! Isn't this fascinating? We have now agentic AI.
And you said, hey, wait a second, what is an agentic LLM in scientific detail?
And you asked, is it related to the AI agent that I just programmed?
And what does it mean, as my prof is telling me, I have to increase the RAG agency of my system?
And are RAGs agentic? And what exactly is the mathematical and scientific
definition if I want to improve here my agency? So great that you asked, because I have to tell you,
I don't know. This is such a beautiful marketing buzzword that we have here as a new hype in AI.
And now you give me the opportunity that I can learn. And let's have a look what I found out.
Now, if you go, for example, to LlamaIndex, as you can see, we have here the official home page,
agentic strategies. LlamaIndex RAG pipeline here. This is beautiful.
And you even have simpler agentic strategies. And I have no idea what this is.
And then you look here and I put in agentic RAG in Google. Yes, I just use Google.
Imagine I'm that old fashioned. And you have here, agentic RAG system,
agentic RAG in any enterprise, AI agent and the agentic processes. And you might say,
unbelievable, we have something fascinating. So let's look what is agency.
Agency refers to the capacity of an entity, not specified if it's a human or a machine,
to act independently, make its own choices, impose its will on the world.
It involves intentionality, self-determination, and the ability to execute actions based on
personal motivation and goals. The key characteristics of agency are autonomy,
intentionality, self-determination, and responsibility. The entity can be held
accountable for its action. And OpenAI says, hey, this is great. So GPT is responsible
and accountable, also in financial terms, whatever GPT is doing to its users.
And so all the big, beautiful global corporations say, yeah, of course, we have agency in our AI
because then the systems are responsible and not we as a company. And you might say, this is
fascinating. Of course, it may also be that the term agentic comes from an AI
agent being an agent itself. So an AI agent refers to a computational system
designed to perceive its environment with sensors, reason about it, and take actions
to achieve specific goals. It operates based on algorithms and data, executing tasks autonomously
with its programming constraints. Key characteristic is autonomy. It operates
without direct human intervention, making decisions based on predefined rules and learned data.
It has its own perception using its own sensors and its own input structures. It can do reasoning
using the algorithm and it can act and execute tasks or behaviors to achieve specific objectives.
Now, what are the key differences between agency versus an AI agent? And there are three points.
It's the source of intentionality, the nature of the autonomy and the responsibility and the
accountability. So in summary, we can say the agency refers to an inherent capacity
for an independent action, intentionality, and self-determination, typically associated
with living beings. An AI agent, on the other hand, is a computational system designed to operate
autonomously within the constraint of its programming, perceiving its environment,
reasoning about it, and taking action to achieve predefined goals. And yeah, taking action limits
this more or less to a robotic system. So the primary distinctions lie in the source of
intentionality, the nature of the autonomy, and the framework of responsibility and accountability.
And you say, okay, so if I have now the word agentic, is it referring to agency or is it referring
here to the agent? Because those are two different systems. And I had a brilliant idea. I said,
hey, we're talking about the AI systems and large language model. So let's ask the LLM a question.
Are they agentic? What do they think about themselves? Because those are
the artificial intelligence system that it's all about.
In the upper right corner, you will see here the model that answers. And I asked about 12
different LLMs. So Gemini 1.5 Pro, a beautiful new system, says, hey, in, let's call it human
psychology and cognitive science, agency refers to the capacity of an individual
to act independently and make their own free choices. It encompasses several key components
from intentionality, forethought, self-reactiveness, and self-reflectiveness. Crucially, agency implies
a level of consciousness, an internal experience that goes beyond simple stimulus response
mechanisms. So Gemini 1.5 Pro, yeah, that sounds good. Then I ask Llama 3 70B Instruct that
everybody loves. And Llama 3 tells me, hey, while LLMs are incredibly powerful and versatile,
they are not inherently agentic systems. And Llama 3 goes on and explains, here's why.
They lack autonomy. LLMs do not operate independently and require human input,
such as prompts or training data to generate responses. LLMs have a limited intentionality.
LLMs do not have inherent goals or motivations. Their intentions are largely determined by their
training objectives in the input they receive. And there is no self-efficacy. LLMs do not perceive
themselves as capable of influencing the environment or achieving goals beyond generating text based
on their programming and on their training. So Llama 3 tells us, no, LLMs are no agentic systems.
And I have here a short discussion with Llama 3 70B and I say, but wait, LLMs can be designed,
like the latest Mistral 7B v0.3, to call external functions or APIs, enabling them
to interact with other systems with their environment to receive new information or
perform specific tasks. Is this not an agentic way?
And Llama 3 comes back with its own logic and says, in this sense, an LLM's function calling
ability can be seen as a form of scripted agency where the model is programmed to execute specific
actions or call functions in response to certain input or conditions. While this is not a true
agency, it is a powerful tool for automating tasks. So Llama 3, quite clear on this topic.
Llama 3 goes on, hey, to illustrate the difference, consider a chatbot that uses an
LLM to generate responses. And the chatbot will be able to call external functions,
like Mistral 7B v0.3, to retrieve information or perform tasks. But it is still a scripted system
that lacks autonomy, intentionality, and self-efficacy. In contrast, a true agentic AI system
would be able of making decisions, adapting to changing circumstances, and exhibiting autonomous
behavior. And if you think here about a certain movie about the future of the machines, you do
not want that an AI exhibits some real autonomous behavior and connect to the internet of the world.
Okay, this was Llama 3. And I go to Gemini 1.5 Pro and say, hey, buddy, you know me, no? So,
can you do a scientific comparison between this function calling and LLMs that is so beautiful
to this strange word, agentic, that everybody says is here? And Gemini 1.5 Pro then comes back
and says, hey, the term agentic and the term function calling describe distinct capabilities,
so they can appear intertwined in the context of LLMs. So let's do a definition. By agentic,
being agentic, it implies having the capacity for independent action and goal-directed behavior.
This involves a goal formation, identifying desired outcomes and formulating plans to achieve
them. Please remember, this is so important. Formulating plans to achieve them. I come back
later in the video to this point. Second, action selection, choosing actions that maximize the
likelihood of a goal attainment and a self-monitoring. It can self-evaluate its progress.
Now, you know, on a biological basis, tells me Gemini, agency is a complex emergent property
arising from the interaction of various cognitive functions. And I thought, yeah, this is exactly
what I'm looking for in my AI, no? It's a complex emergent property and it comes from the interaction
of billions of little GPUs. But unfortunately, Gemini tells me, hey, the perception, the memory,
and again, we have here the planning. So creating mental representation of future actions
and their potential outcomes. Yeah, okay. This is not what an AI system is currently able to do.
And Gemini goes on, while AI system can exhibit behaviors that kind of resemble agency,
they're not truly agentic in the same way as biological systems. Because this AI system
lacks three points, consciousness, self-preservation, and intrinsic motivation. Fascinating.
I said, this is real nice. So coming back to the question of agentic,
agentic represents an advanced form of intelligence characterized by goal-directed behavior,
self-monitoring, and again, so important, adaptive planning. Gemini tells us, LLM,
while they are impressive and beautiful, LLMs do not possess true agency.
And function calling, what I ask here, is a powerful tool that enhances the LLM's capability
by allowing them access to external sources and perform tasks beyond their textual limitations.
However, function calling does not equate to agency. So I have to say,
those LLMs are really clear about what they are. Now to my green grasshoppers, yes, I know you are
listening. This function calling, if you're not really familiar with it, it is simple. It just has
four steps when you do function calling with your LLM, like your Mistral 7B v0.3.
So at first, you're not going to believe it, you have to define the functions.
Then you specify the function schema, and we use here normally a JSON schema.
Then we prompt our LLM with this, and then we execute the function based on a function call
generated by the LLM. The LLM executes the function with the provided arguments and returns to the
result. So you see, define function, specify schemas, prompt, execute. Let's have a look at
this. What is the code? Step one, define the function. Simple. Step two, define the function
schema. Let's go with a JSON. You have a function, you give it a name. If you are a nice coder,
you give it a description so other people understand what it is, and you define the
parameters. For example, you say, hey, I have here an object with a property of a location,
and the description is the city and state, like San Francisco, California. And I have a unit,
string, that gives me Celsius or Fahrenheit. And required is only the location. We have a schema
defined. And then we prompt a model with this. We say, hey, the role of the user is, question,
what is the weather like in London? And now we use here the tools command.
And we define what tools the LLM can use. Let's say here, OpenAI chat completion. We go with
GPT-4. We have the functions, our tools that are defined. And now we just define one function,
but we can define multiple functions. So if we want the system to choose the function,
we say function call auto, or we specify our specific function: what is the weather in London?
And then we execute the function with a function call.
The name gets the current weather, the argument is London, and the unit is Celsius.
And you execute the function with this command. This is it. This is function calling.
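The four steps just described can be sketched in Python. This is a minimal, self-contained sketch: the function is a stub, and the model's reply is hard-coded instead of coming from a real chat-completions call, so the names and shapes here are illustrative assumptions, not a definitive implementation.

```python
import json

# Step 1: define the function. A stub stands in for a real weather lookup.
def get_current_weather(location: str, unit: str = "celsius") -> str:
    return json.dumps({"location": location, "temperature": 15, "unit": unit})

# Step 2: specify the function schema as a JSON schema, as in the video.
weather_schema = {
    "name": "get_current_weather",
    "description": "Get the current weather for a given city",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string",
                         "description": "The city and state, e.g. San Francisco, CA"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["location"],
    },
}

# Step 3: prompt the LLM with the schema. With a hosted API this would be a
# chat-completions call passing the schema as a tool with automatic tool
# choice; here we hard-code the function call the model would generate
# for the London question.
model_function_call = {
    "name": "get_current_weather",
    "arguments": json.dumps({"location": "London", "unit": "celsius"}),
}

# Step 4: execute the function the LLM asked for, with its arguments.
available = {"get_current_weather": get_current_weather}
args = json.loads(model_function_call["arguments"])
result = available[model_function_call["name"]](**args)
print(result)
```

The LLM never executes anything itself; it only emits the JSON in step 3, and your own code does the dispatch in step 4.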
If you want to learn more about it, I would like to recommend here
Mistral function calling. They have a beautiful summary. Let's have a look at this.
Mistral AI has here really beautiful information for you. Function calling. You have a Colab
notebook you can follow along. You have a video you can follow along for Mistral specific function
calling. And they tell you the available models are Mistral Small, Large, and the Mixtral;
now also here, the latest Mistral 7B v0.3 is doing function calling. And I tell you,
hey, there's four steps. You have a user, a model, execute the function and generate the answer.
Easily explained. And if you want to look at the code, they go step by step and they explain for
each step which tools are available. They show you here the Mistral specific code for the Mistral
specific models. If you have the tools, how you define function one, how you define function two.
Beautiful. Explained. Go there. You have your user query. Step two, the tool choice. As I told
you, for example, here "auto": the model decides which tool to use. "Any" forces a tool use,
or "none" prevents tool use. Of course, you need your Mistral API key, and you have your step three and
step four explained beautifully. Now, if we go now to OpenAI, we have the same. We have here
function calling, a beautiful description. They introduce you to what function calling is in an
API call. You can describe functions and have the model intelligently choose to output a JSON object
containing arguments to call one or many functions. The Chat Completions API does not call the
function. Instead, the model generates JSON that you can use to call the function in your code.
Those are the models that have these use cases. For example, create a system that
answers questions by calling external APIs, or convert natural language into API calls.
So you say, hey, who are my top customers? If you have a company AI, this is then converted into
the function get_customers with the minimum revenue, the created-before threshold, and the limit,
and it calls your external or internal API. Or you have structured data from a text. So you see,
beautiful demonstration. They tell you the behavior. They even show you parallel function calling.
Remember, you have to pay for each function calling. No, this is not a way to get you out of this.
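Handling a response that contains several parallel function calls might be dispatched roughly like this. The `tool_calls` shape below is an assumption, loosely modeled on the tool-call format the providers document, and the weather function is a stub.

```python
import json

def get_current_weather(location, unit="celsius"):
    # Stub in place of a real weather API call.
    return {"location": location, "temperature": 15, "unit": unit}

# Hypothetical model response containing two parallel function calls.
tool_calls = [
    {"id": "call_1", "name": "get_current_weather",
     "arguments": json.dumps({"location": "London"})},
    {"id": "call_2", "name": "get_current_weather",
     "arguments": json.dumps({"location": "Paris"})},
]

dispatch = {"get_current_weather": get_current_weather}

# Execute every call and collect one result per call id; the results would
# then be fed back to the model so it can generate the final answer.
results = {c["id"]: dispatch[c["name"]](**json.loads(c["arguments"]))
           for c in tool_calls}
print(results["call_1"], results["call_2"])
```

Note that the parallelism is in the model's single response, not in your executor: you are still free to run the calls sequentially, as here.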
And they show you how to invoke multiple function call in one response. You can copy the code. It
is immediately available for you and it all follows the same schema. There might be some small code
adaptation if you work with OpenAI or if you work with Mistral. Never mind. Also, I would recommend
here the literature from Ollama, because in the new Ollama updates, you can work now with the
Ollama raw mode, and Mistral 7B v0.3 is also able to use raw mode, which is a beautiful thing
I will do in a later video. So coming back, I know you have been waiting for Claude 3 Opus.
All my subscribers say, hey, I like Claude. Here we are. And Claude tells us the concept of agency
in AI is still an active area of research. While LLMs with function calling ability demonstrate
impressive language understanding, they may not yet possess the full range of characteristics
associated with agentic systems. So not yet agentic systems. So also Claude 3 Opus tells us
LLMs with function calling are not agentic. And I asked Claude 3 now, hey, can you tell me here
the differences between being agentic and the function calling ability? Isn't it the same?
And Claude comes back and says, hey, no way. I have five points where I can show you it is not.
Let's look at autonomy. An agentic system has a higher degree of autonomy in decision making,
whereas the LLM's function calling ability is more dependent on a specific input and the predefined
functions it has been trained, and the user has defined, to recognize and execute on command.
The second point, the goal-directed behavior. Agentic systems have their own goals and can adapt
their own behavior to achieve those goals, while LLMs with function calling ability are
primarily focused on executing the requested function based on the input provided. Point three,
adaptability. Agentic systems can adapt to new situations and learn from their experiences.
LLMs with function calling ability are limited to the specific function they have been trained
to perform and may not be able to adapt to novel situations without further training.
Point four, self-regulation. Agentic systems can monitor their own performance and make
adjustments to improve their effectiveness. LLMs with function calling ability do not inherently
possess this level of self-regulation and rely on external human evaluation and fine tuning to
improve their performance. Point five, the contextual understanding. Agentic systems have
a deeper understanding of the context and can use this understanding to inform their decision
and actions. LLMs with function calling ability have some level of contextual understanding
based on the training data, but are often more limited. So this was Claude 3. And then I looked
in total at 12 LLMs, and believe me: Gemini Flash says attributing full agency to LLMs is premature.
Qwen: no full agentic systems. Qwen 1.5 110B: does not qualify as an agentic system. And then we have a funny one,
one that does not officially exist at all, a Gemini 1.5 Pro tuned that is not official here yet. And they say
it falls short of the criteria for an agentic system. So, by all means,
every LLM tells me LLMs are no agentic systems. Yeah, yeah, this Pro tuned is a secret stealth LLM.
If you ask it about the characteristics of an agentic system, it has a different twist to it. Look at
this. It says, agentic systems are defined by goal-directed behavior, autonomy, perception,
and action. But it's a little bit different. If you read it in detail, you will understand what I mean.
Okay, I went back to Gemini 1.5 Pro, the new API preview, and it tells me, hey,
function calling with LLMs still lacks inherent goals or motivation. There's no long-term planning.
There's limited autonomy, and function calling makes LLMs beautiful tools, but it does not
transform them into independent agents. So LLMs with function calling are no agents.
Now this planning, as you noticed, maybe since I mentioned it several times now, is of crucial
importance for the next step, for the next evolution of AI systems.
Agents can conceptualize future states and strategize pathways to achieve the desired
outcomes. This involves considering different options, anticipating obstacles, and making
decisions based on predicted consequences. This is not an easy task for an LLM.
Now, let me give you a very simple idea of mine. Whenever an AI system encounters a specific task
where it would need to utilize a new function that it has not been programmed with, that is not in its
memory, and for which it does not know how to connect to external sources, this AI system should be
able to create this new, adequate function itself with its given set of functions. If it has access to
a Python environment, it should be able to self-code this new function and integrate this synthetically
generated new function into its own set of functions, which means for me kind of a self-learning.
This would change the game significantly, but we are not there, not at any means.
Where are we today? Our best systems, and I will show you later on,
they are a DAG, a directed acyclic graph system. So, we start at a node with a function or an
operation or whatever you like, then we go either to A or to B. And if we went to A,
then we maybe have the option here to go over there and perform this
task, or maybe this node tells us, hey, you have three possibilities: given your specific input
data or your specific threshold or whatever parameter you have, you can go either to A1,
A2 or A3. But this is a predefined structure, and this predefined flow that some systems try
to integrate into a multiple function calling set of operations,
it is not what we are looking for, because this is just like SAP for business finances.
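The DAG flow described above can be made concrete in a few lines. The nodes, thresholds, and branches are illustrative; the point is that every "decision" is a predefined table lookup, not planning.

```python
# A minimal sketch of a predefined DAG flow: every node's successors are
# fixed in advance, so the "decision" at each node is just a lookup.
dag = {
    "start": lambda x: "A" if x > 0 else "B",
    "A":     lambda x: "A1" if x < 10 else ("A2" if x < 100 else "A3"),
}

def run_flow(x):
    node = "start"
    path = [node]
    while node in dag:          # follow predefined edges until a leaf node
        node = dag[node](x)
        path.append(node)
    return path

print(run_flow(5))    # ['start', 'A', 'A1']
print(run_flow(-3))   # ['start', 'B']
```

However many branches you add, the system can never reach a node that was not wired in beforehand, which is exactly the limitation being described.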
If you want to calculate your, I don't know, your profits in your company, well, you start with the
cost, then you divide the cost, I don't know, per staff, and you divide this then by another
criterion, you compare it here to your revenues, and somewhere down the line you have a clear
defined flow of mathematical operation, and then the SAP system generates here the statement of
your financial profit in the company. This is a known way how to do financial calculations.
There is no intelligence, because I, as a programmer, I predefined here the way of the
logic that the system should follow, and please, in business finances, there are national laws,
so I have to make sure that all of these steps are followed and calculated. But this is not an AI
system, this is a Python or SAP system for business finances. So you see the difference?
But currently we are here, and we want to go over there.
So whenever an AI system encounters a new task, now the first question:
is this really the task of a large language model, that it reasons and decides what programs
or function calls to activate in the flow? And you see, in my other videos I talked about
causal logical reasoning so much, and now you understand why. Because this reasoning ability,
let's say of an LLM, is an essential feature for the planning, planning the logical path of our
AI system to the desired solution, but not a predefined flow of operation that is necessary
if you want to go with your national accounting laws, if you want to calculate your business
finances. So you see, where and how strong you apply AI logic is a complete different game.
And then I know you have been waiting for this about RAG, and I know that my subscriber says,
hey, if there's an obstacle, I just build the RAG system and RAG solves everything.
RAG is a unique solution to the problems in the world of AI.
Or maybe not so. Okay, so RAG, yeah, beautiful. So what is RAG? It is an information retrieval
module and a large language model. Now we know that a large language model is not agentic.
It is not an agent. It does not have agency. So you might say, hey, no problem then, it is
here the combination. Okay, let's look at the key differences between the RAG
and a standard function calling AI system. And we look here at the dynamic knowledge integration
and the improved factual accuracy and the relevance, the grounding, and some domain
specific expertise. You can read this, so I can tell you RAG is just not as powerful as function calling,
because with function calling I can access Python systems, interactive systems. I can have robotic
systems that really act in the external world. With RAG, I just retrieve information. So RAG,
let's be progressive, is a little subsystem of function calling, because we only do the retrieval;
the augmentation is done in the LLM and the generation is done in the LLM.
But function calling is a much more powerful system.
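To make the "retrieval only" point concrete, here is a minimal sketch of the retrieval and prompt-augmentation steps of a RAG pipeline. The documents and the keyword-overlap scoring are illustrative stand-ins for a real retriever and embedding index.

```python
# Toy document store standing in for a vector database or search index.
docs = [
    "Mistral 7B v0.3 supports function calling.",
    "RAG combines an information retrieval module with a large language model.",
    "SAP is used for business finance calculations.",
]

def retrieve(query, k=1):
    # Score documents by word overlap with the query (a stand-in for
    # embedding similarity) and return the top k.
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def augment(query):
    # Assemble the augmented prompt; the augmentation and generation
    # themselves happen inside the LLM, which is not called here.
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"

print(augment("what is a rag retrieval module"))
```

Everything RAG adds happens before the model sees the prompt, which is why it can be viewed as a subsystem of function calling rather than the other way around.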
Now there are multiple ways RAG can access this beautiful external information.
You can have here just your local file system or you go to a network file or you go here with
SQL or SQLite. You can have a Mongo client that you say, beautiful, this is the way.
Or you go here with access to S3 data. Or you go here and you have an FTP protocol. Or, and this is
normally also happening, you have an API call. So you see, there are multiple ways that external
information can be accessed via RAG. And then, let's make it short, I ask here,
so LLMs with their function calling abilities and RAG with their information retrieval engine,
they are just simple AI systems that both lack agency. And I say, would you agree? And if not,
explain in scientific terms, why not? And I asked 12 LLMs and all the systems told me this.
And I have here Llama 3 and a Gemini. And Llama 3 said, yep: RAG systems also lack agency due to their limited
autonomy, lack of self-awareness, and predetermined operational parameters. Gemini tells us, yes:
RAG systems also lack inherent agency in the same way humans possess it; characterizing them as simple,
however, is an oversimplification. So Gemini said, yeah, I know they don't have agency,
but you know it, they are not really simple. Okay, I see what you want to do, Gemini.
And of course, if you want to experience the other 10 yourself, just go to lmsys.org and
put this in. This is the screenshot here from my last prompt, what I showed you. And you have then
Llama 3 or Gemini. Do it in a blind test. You will be surprised how different models,
different LLMs answer. A different kind of pathway of the argumentation,
but the result is always the same. RAGs are not agentic. That's it.
So here we go. What we have proven until now, there are no agentic LLMs. There is no agency
to a RAG system. So you can go back to your professor and say, you know what, you are wrong.
So what is it that we are looking for? And what is the cool new buzzword that we should use that
somebody watches here, some new, I don't know, information on some platform?
And are we interested in this or do we all understand that these are just buzzwords
for marketing like agency and agentic? Or do we take the step and say, hey, those terms have
a scientific meaning and it's important to understand because they open up a new avenue
for our development. And I go, of course, with the second one and I say, have you noticed that
in the whole session up until now, planning seems to be the most emergent functionality of the
system we have to talk about? And from Lilian Weng here, this is a beautiful, very old statement.
What is an agent? And she did an equation: an agent is an LLM that has quite a long memory, an
internal memory structure with a 2 million token context length, can use tools like our function
calling abilities, and has a plan what to do if given a new task. And this is this beautiful
planning. So you see, it was all the time there. Just all this marketing really shifted our focus
where it should be. Now have a look at this. Today, we think that the LLM is the main course,
the main source of logic, that the causal reasoning of an LLM is essential for the planning
strategy of the whole agent system, if it would be an agent.
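Lilian Weng's little equation, agent = LLM + memory + tool use + planning, can be written out as a schematic skeleton. Everything here is a toy: the "LLM" is a lambda and the component names are assumptions, just to show how the pieces fit together.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    llm: Callable[[str], str]                   # the language model core
    memory: list = field(default_factory=list)  # long-context history
    tools: dict = field(default_factory=dict)   # function calling abilities

    def plan(self, task: str) -> list:
        # A real agent would derive a plan; here we just ask the toy "LLM"
        # for a semicolon-separated list of steps.
        return [step for step in self.llm(task).split(";") if step]

    def run(self, task: str):
        self.memory.append(task)
        # Execute each planned step, calling a tool when one matches.
        return [self.tools[s](task) if s in self.tools else s
                for s in self.plan(task)]

# A toy "LLM" that always proposes the same two-step plan.
toy = Agent(llm=lambda t: "retrieve;answer",
            tools={"retrieve": lambda t: f"docs for {t}"})
print(toy.run("weather in London"))  # ['docs for weather in London', 'answer']
```

The skeleton makes the video's point visible: memory and tools are easy to bolt on; the `plan` method is the part where the real intelligence would have to live.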
But there's also another option we are now starting to explore in AI research,
that what plans what to do is not the LLM, but a world model that is based on scientific laws
and the laws of nature. So hey, this is interesting. So we have now two kinds of
intelligent systems. The rest is just tool use, just planning here new steps.
So let's look at the key aspect of planning in a theoretical true AI agent. We have goal setting,
strategy development, action sequencing, like in robotics, and then the execution.
And yeah, you see it all goes to robotics, to visual language model that interact in an external
environment. So goal setting is easy: establish specific objectives that the agent aims to achieve.
Strategy development: you create a plan, a planning strategy, outline the steps necessary, okay?
The action sequencing is nice, because remember, with action sequencing the LLM would know
theoretically what to do, but it doesn't understand why. So there's some treasure hidden here,
and the execution. The importance of the new planning functionality in our new real AI agents,
it would have to increase here the efficiency, the effectiveness, the adaptability,
and it will reduce errors. And I've given you here examples in logistics, healthcare, stock trading,
and manufacturing. And you might say, but wait a second, what exactly is the difference between planning and function calling in LLMs? Are they not somehow related? Because if you have a set of 10 functions, and the LLM learned to differentiate when to call each specific function, and in what order to call them in a kind of time series? No, there is an underlying difference. Look at this. We have functions predefined by the user, the structure of the function calling framework, and the limitations in its adaptability. The functions need to be defined beforehand by coders, by developers: I define the functions that my system needs for a task. These functions are the only actions the LLM can call during its operation, and through training and fine tuning, given a natural language command, the LLM understands: hey, this means I have to issue an API call to the weather station. The LLM uses the function call to interact with external systems and databases. But those interactions are encapsulated and predefined within the defined functions. So there is nothing new in this system; it is just a mechanically predisposed system. And LLMs cannot improvise new actions beyond the defined functions, and this is a strong limit on their ability to adapt to new tasks.
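The predefined setup just described can be sketched in a few lines. The schema shape loosely mirrors common tool-calling APIs, but all names here are illustrative assumptions:

```python
import json

# Sketch of predefined function calling: the developer declares every
# callable up front, and the model can only select from that fixed
# list. Illustrative names, not a specific vendor's API.

def get_weather(city: str) -> str:
    # Stand-in for a real API call to a weather station.
    return f"Sunny in {city}"

TOOLS = {
    "get_weather": {
        "fn": get_weather,
        "description": "Get the current weather for a city",
        "parameters": {"city": "string"},
    },
}

def dispatch(tool_call: str) -> str:
    """Execute the function the model chose; anything outside TOOLS fails."""
    call = json.loads(tool_call)
    if call["name"] not in TOOLS:
        # The model cannot improvise actions beyond the defined functions.
        raise ValueError(f"unknown function: {call['name']}")
    return TOOLS[call["name"]]["fn"](**call["arguments"])

# A model response like this triggers the predefined weather function:
print(dispatch('{"name": "get_weather", "arguments": {"city": "Berlin"}}'))
# → Sunny in Berlin
```

The `raise` branch is exactly the limit discussed above: any action not encapsulated in `TOOLS` is simply impossible for the system.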
Now, it's interesting to look at the future: how can we enhance these planning capabilities? What if we move beyond the LLM as the central logical switchboard? You can expand the function libraries: say, somewhere in an index, you have a database with 10,000 functions defined, and for each one a description of what the function can do, what it should take as input, and what it should generate as output. But you understand this is, again, a kind of database system. Or creating functions from a variety of related tasks, the dynamic function integration: the system can be designed, by training on a lot of data, to dynamically integrate these new functions. This is also not really a sign of intelligence. And even if you combine it in a hybrid model, where you have static and dynamic planning, where a little bit of intelligence shines through in the dynamic part, it is still not really what we want. So, here we have now, from the end of May 2024,
a beautiful new arXiv preprint: agent planning, now not with an LLM, but with a world knowledge model. And they say: hmm, currently, if we look at the graph structure, it need not even be a DAG. We have the agent model: one action, another action, make a decision, go there, find out, no, not working; we have a hallucination somewhere, a wrong link, and whatever. This is trial and error, and this is really a problem. And they propose: hey, in our LLMs we have the verbal world knowledge, but we do not know how to execute it. So we transfer this into state knowledge, and from this state knowledge we can build a graph structure out of our semantic knowledge. Then, with the nodes and edges, the probability of travelling along specific edges, and the properties of certain nodes, we build a new world model in this graph structure, where we have a kind of predefined correct path to the solution.
Of course, they need to integrate real sensory data, continuous data that builds up a copy of the real-world scenario, if you have a robot moving in a room, to build a virtual twin, this virtual scenario of the world model. But currently we only have the description of our world in our LLMs, not the understanding. So they asked: hey, is it possible to use the verbal information, what to do and how the world works, that we have in our LLM, and build an action model from it? Interesting. Another piece of literature I would like to recommend to
you, although it's not the absolute latest; it is from February 2024. This is a survey done by some beautiful institutions: understanding the planning of LLM agents, a survey. And if you want to see the taxonomy of an LLM agent planning environment, it gives you a very good overview, and you will be amazed how many system elements we already have, how much operational code we already have, that we have just not yet put together. So this is interesting literature if you're interested in this topic, I think. Or if you are currently doing your PhD thesis somewhere at a university on this beautiful planet, think about those topics, because those are topics that will be important in the next months. And another one, from the end of May 2024:
you see here LLM plus reasoning capabilities, causal reasoning, logical reasoning, plus the planning in the presence of our APIs. Also an interesting publication if you want a little bit more of a deep dive, if you want to see where the current research is happening right now. So there you have it. We do not have an agentic AI today, but we have understood what all these systems lack, what they are not able to do, and therefore we know which elements to focus on for the future of AI. And you know, I think agentic AI systems are not that far off in the future, maybe just years.
And at the end, you know, as a short personal idea, I want to show you where the code is today, and I really mean today. If we go, for example, to OpenAI agent query planning, there are tools, and the OpenAI function agent with a query planning tool is already there. And if you go to LlamaIndex, here is your link. But if you look at this, you understand the shortcomings immediately. Look at this: we have here an engine, and its name is September 2022; it provides information about quarterly financial data. Then we have another engine, and its name is June 2022; it provides information about the quarterly financials, yes. And then we have a third engine, and you get the idea.
And then we have here this beautiful new code, and it says: yeah, the query plan tool, so the real complexity of planning. It has three tools: the query tool for September, the query tool for June, and the query tool for March of 2022. And you might say, yeah, but this is just... I mean, yeah, okay. And then you have the query plan tool metadata to OpenAI tool, and you say, my goodness, this sounds so intelligent. But you know what it actually does? You define your agent with OpenAIAgent.from_tools, you have your function calls, you have the LLM that you use, and you have defined your query plan tool with your three tools. And then you say: hey, agent, query: what are the risk factors in, and now, September 2022? And you say, my goodness, this is intelligent. And then you hard-code it: from the LlamaIndex core tools you import the QueryPlan and the QueryNode, and you have your query plan with the nodes, and query node one is "risk factors", and the tool name is set. So if you want to answer the question, what are the risk factors in September 2022, you go with the tool for September 2022. Okay. Now, this is, of course, just a demonstration of the code, and they say, yeah, of course, this should be a DAG system, and you have to start with one node, of course, and you go here from the first node. Okay. But you know, the real power of a graph interaction is not yet there in this open source community, but they are trying to go there.
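Stripped of the framework classes, the hard-coded query plan in that demo effectively boils down to something like this. This is an illustrative reduction, not the real LlamaIndex API: the "plan" routes the question to a tool by matching the date baked into the tool name:

```python
# What the hard-coded query plan effectively does: route the question
# to a query engine by matching the month in its name. Illustrative
# reduction only; the real demo wraps this in framework classes.

ENGINES = {
    "sept_2022":  "quarterly financial data for September 2022",
    "june_2022":  "quarterly financial data for June 2022",
    "march_2022": "quarterly financial data for March 2022",
}

def route_query(question: str) -> str:
    """The 'planning' is just matching a month onto a tool name."""
    normalized = question.lower().replace("september", "sept")
    for tool in ENGINES:
        month = tool.split("_")[0]  # e.g. "sept"
        if month in normalized:
            return tool
    raise ValueError("no predefined tool matches this question")

print(route_query("What are the risk factors in September 2022?"))
# → sept_2022
```

No goal formation, no strategy, no adaptation: if the question falls outside the three predefined engines, the "plan" simply fails.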
However, if you read a statement like the one in the LlamaIndex docs talking about agentic strategies, you now have to decide for yourself: is this just a marketing term, or is it really a scientific term that provides an avenue to the future of AI? Because when they say, like in the agentic strategies section of the official page, that a lot of modules like routing or query transformation and more are already agentic in nature, in that they use LLMs for decision making: this is not agentic. This is simply, plainly, scientifically wrong. This statement is not correct. This statement is a marketing statement, and when you see this, you should be able to understand exactly that this is not an agentic strategy at all. But I hope I've shown you in this video possible avenues for the future development of agentic strategies. And we should not let this opportunity be taken from us by some crazy marketing, whatever, designed just to grab your attention, making us lose focus on what is really necessary to achieve an agentic AI system in real scientific terms and in the real scientific world. I hope you had a little bit of fun today, you enjoyed it, and maybe you had some new thoughts about something. It would be great to see you in my next video.