Agentic AI: The Future is here?

code_your_own_AI
28 May 2024 · 46:27

Summary

TLDR: This video takes a deep look at the concept of "agentic AI", arguing that the term is often used as marketing hype. It first introduces the qualities an agentic AI would need, such as autonomy, intentionality, self-determination, and responsibility. A comparative analysis of large language models (LLMs) and truly agentic systems then makes clear that LLMs, however powerful, do not possess real agency. The video also distinguishes function calling from agency and proposes possible directions for future AI development, including planning with world-knowledge models. It closes by urging viewers to separate scientific terms from marketing terms and to focus on the scientific elements that genuinely advance AI.

Takeaways

  • 🧠 LLMs (large language models) and RAG (retrieval-augmented generation) systems are not currently agentic: they lack autonomy and self-awareness and operate within predetermined parameters.
  • 🔍 Agency involves independent action, intentionality, self-determination, and responsibility, and is typically associated with living beings; an AI agent, by contrast, is a computational system that operates autonomously within its programming constraints.
  • 🤖 An AI agent perceives its environment through sensors, reasons about it, and executes tasks to achieve predefined goals, whereas agency refers to an entity's inherent capacity for independent action, intentionality, and self-determination.
  • 📚 Function calling is an LLM capability that lets models access external resources and perform tasks beyond their textual limitations, but it is not equivalent to agency.
  • 🛠️ A RAG system improves factual accuracy and relevance by combining an information-retrieval module with a large language model, but it still lacks agency.
  • 🔑 Planning is the key capability for achieving agentic AI systems, involving goal setting, strategy development, action sequencing, and execution.
  • 🚀 Future AI systems may need to move beyond the LLM as the logical core and plan via world-knowledge models, which may be grounded in scientific and natural laws.
  • 🔄 Current AI research is exploring how to turn the verbal world knowledge in LLMs into state knowledge, in order to build more accurate world models.
  • 📈 Better planning will make AI systems more efficient, effective, and adaptive, and will reduce errors, which matters especially in logistics, healthcare, stock trading, and manufacturing.
  • 🔗 There is a clear difference between function calling and planning: the former is predefined and structured, while the latter is a dynamic, adaptive decision-making process.
  • 🎯 Although today's AI systems have not reached agency, understanding their limitations helps us focus on the key elements of future development.

Q & A

  • What is agentic AI, and how does it differ from an AI agent?

    -Agentic AI refers to an entity with the capacity for independent action, autonomous decision-making, and imposing its will on the world, involving intentionality, autonomy, and responsibility. An AI agent, by contrast, is a computational system designed to perceive its environment, reason about it, and execute actions to achieve specific goals; it operates autonomously on the basis of predefined rules and learned data, without direct human intervention.

  • Do large language models (LLMs) have agency?

    -According to the discussion in the transcript, LLMs, while powerful, lack autonomy, inherent goals or motivations, and self-efficacy, and are therefore not considered agentic.

  • What is function calling, and how does it relate to agency?

    -Function calling is a capability that lets a model invoke external functions or APIs in order to interact with its environment, receive new information, or perform specific tasks. Although this extends what LLMs can do, it is not equivalent to agency, because function calling involves no independent goal formation, action selection, or self-monitoring.

  • What is a RAG system, and how does it differ from a standard function-calling AI system?

    -A RAG system combines an information-retrieval module with a large language model. Compared with a standard function-calling AI system, RAG focuses on retrieving information rather than performing broader external operations or interacting with the physical world.

  • Why is planning so important for building truly intelligent AI agents?

    -Planning is an intelligent agent's ability to set goals, formulate strategies, sequence actions, and execute them. It is essential for making AI systems more efficient, effective, and adaptive, and for reducing errors.

  • How should we understand the roles of LLMs and world models in planning?

    -LLMs provide rich linguistic and world knowledge but lack the ability to act on that knowledge. A world model, grounded in scientific and natural laws, offers a possible way to build an intelligent system that can plan and execute complex tasks.

  • What new developments in current AI research concern planning for agentic AI systems?

    -The latest research explores how to combine the knowledge in LLMs with world models to create systems capable of complex planning and decision-making. This includes using graph structures to represent world states and possible action paths, and integrating real-time sensory data to build virtual world models.
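The idea of representing world states and possible action paths as a graph can be sketched with a toy breadth-first-search planner. This is an illustration only: the states, actions, and graph below are invented for the example, while real research systems would derive such graphs from world models and sensory data.

```python
from collections import deque

# Toy planning over a graph of world states: nodes are states, edges are
# actions, and a plan is the action path found by breadth-first search.
world = {
    "at_home":    {"drive": "at_office", "walk": "at_park"},
    "at_park":    {"walk": "at_office"},
    "at_office":  {"take_elevator": "in_meeting"},
    "in_meeting": {},
}

def plan(start, goal):
    """Return the shortest list of actions leading from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, actions = queue.popleft()
        if state == goal:
            return actions
        for action, nxt in world[state].items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, actions + [action]))
    return None  # no plan reaches the goal

print(plan("at_home", "in_meeting"))  # → ['drive', 'take_elevator']
```

The planner improvises an action sequence from the structure of the graph rather than following a single predefined flow, which is the distinction the video draws between planning and scripted function calling.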

  • Why are some so-called 'agentic strategies' on the market really marketing terms rather than scientific ones?

    -Some so-called 'agentic strategies', such as those mentioned in LlamaIndex, merely use LLMs for decision-making and involve no genuinely agentic behavior such as autonomy, intentionality, self-determination, or responsibility. These terms are therefore marketing language rather than scientifically accurate descriptions of agency.

  • How can AI systems move from function calling to more advanced planning capabilities?

    -AI systems could move from simple function calling to advanced planning by integrating more sophisticated world models and enhanced reasoning abilities. This might include self-coding new functions, integrating new functions dynamically, and interacting with the environment in real time.

  • What limitations do current AI systems have in planning and decision-making?

    -Current AI systems, especially those based on LLMs, lack genuine autonomy, intentionality, self-determination, and responsibility. Their ability to adapt to new tasks, learn in new situations, and monitor and adjust themselves is limited.

  • Why are understanding and planning so critical to the development of AI?

    -Understanding and planning are key to achieving truly intelligent agents. They enable AI systems to set goals, formulate strategies, sequence actions, and execute them, which is essential for solving complex problems and adapting to constantly changing environments.

Outlines

00:00

🤖 Exploring the difference between agentic AI and AI agents

This section discusses the concept of agentic AI and how it differs from an AI agent. Agency is an entity's capacity to act independently, make autonomous decisions, and affect the world, involving autonomy, intentionality, self-determination, and responsibility. An AI agent is a computational system designed to perceive its environment, reason on the basis of algorithms and data, and execute tasks to achieve specific goals. The section mentions large language models (LLMs) and agentic strategies, and probes several AI models on whether they possess agency.

05:05

🧐 A closer look at whether LLMs have agency

This section asks different large language models (LLMs) whether LLMs are agentic. Models such as Gemini 1.5 Pro and Llama 3 give their views. Gemini 1.5 Pro holds that agency involves a level of consciousness and internal experience, while Llama 3 states plainly that LLMs lack autonomy and inherent goals or motivations and therefore cannot be regarded as agentic systems. The section also discusses how LLMs can call external functions or APIs to perform specific tasks, which is not the same as true agency.

10:10

🔍 The difference between function calling and agency

This section digs into the difference between function calling and agency. Function calling is a powerful tool that lets LLMs access external resources and perform tasks beyond their textual limitations, but it is not equivalent to agency. Gemini 1.5 Pro explains that true agency involves independent action, goal-directed behavior, self-monitoring, and adaptive planning, whereas LLMs lack consciousness, self-preservation, and intrinsic motivation and so cannot be considered truly agentic systems.

15:14

🛠️ Implementing function calling in practice

This section walks through how to implement function calling with LLMs. First you define the functions, then you specify the function schema (usually as a JSON schema), then you prompt the LLM with that schema, and finally you execute the function based on the function call the LLM generates. Code examples show how to define a function schema, how to use the tools command, and how to execute a function call.
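The four steps can be sketched in Python. This is a minimal illustration rather than the video's code: the LLM round-trip is simulated by hard-coding the JSON function call a model such as GPT-4 would return, and `get_current_weather` is a hypothetical stand-in for a real weather lookup.

```python
import json

# Step 1: define the function (hypothetical stand-in; a real application
# would query a weather API here).
def get_current_weather(location, unit="celsius"):
    return f"20 degrees {unit} in {location}"

# Step 2: specify the function schema as JSON Schema, the format used by
# the OpenAI and Mistral function-calling APIs.
weather_schema = {
    "name": "get_current_weather",
    "description": "Get the current weather for a given location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA",
            },
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["location"],
    },
}

# Step 3: prompt the LLM with the schema. The API call is simulated here:
# this string stands in for the function call the model would generate.
llm_output = (
    '{"name": "get_current_weather",'
    ' "arguments": {"location": "London", "unit": "celsius"}}'
)

# Step 4: execute the function call generated by the LLM.
call = json.loads(llm_output)
available = {"get_current_weather": get_current_weather}
result = available[call["name"]](**call["arguments"])
print(result)  # → 20 degrees celsius in London
```

Note that, as the video stresses, the model only emits the JSON call; your own code looks up the function and executes it.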

20:17

📚 Exploring RAG systems and agency

This section examines RAG (Retrieval-Augmented Generation), a system that combines an information-retrieval module with a large language model. Although RAG improves factual accuracy and relevance, it is not agentic, because it lacks autonomy and self-awareness and operates within predetermined parameters. Several LLMs were asked, and they unanimously agreed that RAG systems are not agentic.
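As a rough sketch of the retrieve-augment-generate pipeline (not code from the video): the keyword retriever and the `generate` stub below are toy stand-ins for a real vector store and a real LLM, and the documents are invented examples.

```python
import re

# Toy corpus standing in for an external knowledge base.
documents = [
    "RAG combines an information retrieval module with a large language model.",
    "Function calling lets an LLM invoke external functions or APIs.",
]

def retrieve(query):
    # Retrieval: naive word-overlap scoring instead of embedding search.
    q = set(re.findall(r"\w+", query.lower()))
    return max(documents,
               key=lambda d: len(q & set(re.findall(r"\w+", d.lower()))))

def generate(prompt):
    # Generation: stand-in for the LLM call.
    return f"Answer grounded in: {prompt}"

def rag_answer(query):
    context = retrieve(query)                  # retrieve
    prompt = f"{context}\nQuestion: {query}"   # augment
    return generate(prompt)                    # generate

print(rag_answer("What is RAG?"))
```

The sketch makes the video's point concrete: the only "external action" in the pipeline is retrieval; augmentation and generation both happen inside the LLM, so nothing here amounts to agency.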

25:19

🤔 The role of planning in agentic AI

This section discusses the importance of planning for achieving agentic AI. Planning involves goal setting, strategy development, action sequencing, and execution. Neither today's LLMs nor RAG systems are agentic, but planning may be the key capability for achieving agentic AI. The section also suggests that future AI systems may plan not only with LLMs but with world models grounded in scientific laws.

30:20

🚀 Where future AI systems may be heading

This section proposes possible directions for future AI systems, stressing the importance of planning. Future AI systems may need to self-code new functions and integrate them into their existing function set, which would imply a capacity for self-learning. Current AI systems cannot do this yet, but such a capability could change the game significantly.

35:21

🔗 The gap between current AI systems and the ideal

This section discusses the gap between today's AI systems and an ideal agentic AI system. By comparing existing AI system structures with business-finance systems, it points out the limitations of AI systems. It also argues that where and how AI logic is applied should be determined by the nature of the task, not by a predefined operational flow.

40:27

📈 The difference between planning and function calling

This section distinguishes planning from function calling in detail. Planning is a dynamic process involving goal setting, strategy formulation, action sequencing, and execution, whereas function calling is based on predefined functions and structures. Although LLMs can learn when and how to call particular functions, they lack true adaptability and intelligence because they cannot improvise new actions.

45:29

🎯 Improving planning capabilities and looking ahead

This section explores how to improve the planning capabilities of LLMs and offers an outlook on future AI research. It mentions the possibility of strengthening planning through expanded function libraries, dynamic function integration, and world-knowledge models. It also recommends further reading on the LLM agent-planning landscape and current research.

🚧 Limitations and misconceptions of current AI systems

Here the author points out the limitations of current AI systems and criticizes misleading marketing terms around 'agentic strategies'. Even though some systems use LLMs for decision-making, that does not make them agentic. The author urges viewers to keep a correct understanding of scientific terminology and avoid being misled by marketing language.

👋 Closing thoughts on the future of agentic AI

In the closing part of the video, the author sums up his thoughts on the future of agentic AI and encourages viewers to keep scientific terminology straight. Although current AI systems are not agentic, understanding their limitations clarifies the direction of future AI development. The author hopes viewers enjoyed the video and looks forward to seeing them in the next one.

Keywords

💡Agentic AI

Agentic AI refers to AI systems with the capacity for independent action, autonomous decision-making, and exerting influence on the world. In the video, agentic AI is described as a marketing buzzword; the concept involves an entity's autonomy, intentionality, self-determination, and responsibility. The video explores it by contrasting agentic AI with AI agents and discussing how increasing a system's agency might enhance AI capabilities.

💡AI Agent

An AI agent is a computational system designed to perceive its environment, reason, and execute actions to achieve specific goals. It operates autonomously on the basis of algorithms and data, without direct human intervention. The video notes that the key to an AI agent is autonomy, and that it makes decisions based on predefined rules and learned data.

💡Autonomy

Autonomy is an entity's ability to act independently and make its own choices. In the video, autonomy is one of the key characteristics of agency, allowing an entity to act on its own motivations and goals. The video discusses the source and nature of autonomy by comparing agency with AI agents.

💡Intentionality

Intentionality is an entity's capacity to act purposefully and make choices based on personal motivations and goals. The video notes that agentic entities possess intentionality, whereas AI agents, although they can execute tasks, have intentions determined by their training objectives and the input they receive.

💡Responsibility

Responsibility means an entity can answer for its actions and be held accountable. In the video, responsibility is a key characteristic of agency, closely tied to an entity's autonomy and intentionality. The video discusses how AI systems might bear responsibility, for instance in financial terms.

💡Function Calling

Function calling is an AI system's ability to invoke external functions or APIs to perform specific tasks. The video notes that although function calling extends an AI system's capabilities, it is not equivalent to agency: it lets the system access external resources, but the system still operates within predefined programming constraints.

💡RAG (Retrieval-Augmented Generation)

A RAG system is an AI system that combines an information-retrieval module with a large language model. The video points out that although RAG can retrieve information and augment an LLM's capabilities, it is not agentic. RAG mainly improves factual accuracy and relevance, and its function-calling ability is limited, confined chiefly to information retrieval.

💡Planning

Planning is the formulation of a sequence of action steps and strategies to achieve a specific goal. The video stresses the importance of planning for developing truly agentic AI systems. Planning involves goal setting, strategy development, action sequencing, and execution, and is key to an AI system's ability to adapt to new situations and learn from experience.

💡World Model

A world model is a cognitive model of the world built on natural laws and scientific principles. The video suggests that future AI research may explore planning via world models rather than relying on LLMs alone. Such a model could provide a deeper understanding of the world and guide an AI system's decisions.

💡LLM (Large Language Model)

LLMs are large language models capable of understanding and generating natural-language text. The video discusses the role of LLMs in agentic AI, noting that despite their excellence at language understanding, they lack the full characteristics of agency, such as autonomy, intentionality, and self-efficacy.

Highlights

The hot topic in AI right now is agentic AI, but the concept has no clear scientific definition.

Agency usually refers to an entity's capacity to act independently, make choices, and impose its will on the world.

An AI agent is a computational system designed to perceive its environment, reason, and take actions to achieve specific goals.

The main differences between agency and an AI agent lie in the source of intentionality, the nature of autonomy, and the framework of responsibility and accountability.

Large language models (LLMs) lack autonomy and inherent goals and therefore do not possess agency.

LLMs can be designed to call external functions or APIs, but this is regarded as a form of scripted agency, not true agency.

A truly agentic AI system would be able to make decisions, adapt to changing circumstances, and exhibit autonomous behavior.

Agency represents an advanced form of intelligence characterized by goal-directed behavior, self-monitoring, and adaptive planning.

LLMs, impressive as they are, do not possess true agency.

Function calling is a powerful tool that enhances LLM capabilities, but it is not equivalent to agency.

Function calling involves defining functions, specifying function schemas, prompting the model, and executing function calls.

RAG (Retrieval-Augmented Generation) systems combine an information-retrieval module with a large language model, but they are not agentic.

Although a RAG system can retrieve information, its function-calling ability is weaker than that of a function-calling system that interacts directly with external systems.

Both LLMs and RAG systems lack agency: their autonomy is limited, they lack self-awareness, and their operational parameters are predetermined.

Future AI systems may need to self-code new functions and integrate them into their own function sets, which would be a significant game changer.

The goal of current AI development is to create truly intelligent agents capable of goal setting, strategy development, action sequencing, and execution.

Planning is a key emerging capability in AI systems, essential for improving efficiency, effectiveness, and adaptability and for reducing errors.

Current research explores how to combine world-knowledge models with the verbal world knowledge in LLMs to create action models.

The latest research proposes planning with world-knowledge models rather than relying on LLMs alone, which may open a new path toward agentic AI.

Transcripts

play00:00

Hello community! Isn't this fascinating? We have now agentic AI.

play00:06

And you said, hey, wait a second, what is an agentic LLM in scientific detail?

play00:14

And you asked, is it related to the AI agent that I just programmed?

play00:18

And what does it mean, as my prof is telling me, I have to increase the RAG agency of my system?

play00:26

And are RAGs agentic? And what exactly is the mathematical and scientific

play00:32

definition if I want to improve here my agency? So great that you asked, because I have to tell you,

play00:41

I don't know. This is such a beautiful marketing buzzword that have here a new hype in AI.

play00:48

And now you give me the opportunity that I can learn. And let's have a look what I found out.

play00:55

Now, if you go, for example, to LlamaIndex, as you can see, we have here the official home page,

play01:01

agentic strategies. LlamaIndex RAG pipeline here. This is beautiful.

play01:08

And you even have simpler agentic strategies. And I have no idea what this is.

play01:14

And then you look here and I put in agentic RAG in Google. Yes, I just use Google.

play01:19

Imagine I'm that old fashioned. And you have here, agentic RAG system,

play01:24

agentic RAG in any enterprise, AI agent and the agentic processes. And you might say,

play01:30

unbelievable, we have something fascinating. So let's look what is agency.

play01:37

Agency refers to the capacity of an entity, not specified if it's a human or a mechanic,

play01:44

to act independently, make its own choices, impose its will on the world.

play01:49

It involves intentionality, self-determination, and the ability to execute actions based on

play01:58

personal motivation and goals. The key characteristics of agency are autonomy,

play02:04

intentionality, self-determination, and responsibility. The entity can be held

play02:10

accountable for its action. And OpenAI says, hey, this is great. So GPT is responsible

play02:17

and accountable, also in financial terms, whatever GPT is doing to its users.

play02:23

And so all the big, beautiful global corporations say, yeah, of course, we have agency in our AI

play02:29

because then the systems are responsible and not we as a company. And you might say, this is

play02:35

fascinating. Of course, it may also be that the agentic label comes from an AI

play02:44

agent being an agent itself. So an AI agent refers to a computational system

play02:52

designed to perceive its environment with sensors, reason about it, and take actions

play03:00

to achieve specific goals. It operates based on algorithms and data, executing tasks autonomously

play03:08

with its programming constraints. Key characteristic is autonomy. It operates

play03:15

without direct human intervention, making decisions based on predefined rules and learned data.

play03:23

It has its own perception using its own sensors and its own input structures. It can do reasoning

play03:30

using the algorithm and it can act and execute tasks or behaviors to achieve specific objectives.

play03:42

Now, what are the key differences between agency versus an AI agent? And there are three points.

play03:49

It's the source of intentionality, the nature of the autonomy and the responsibility and the

play03:58

accountability. So in summary, we can say the agency refers to an inherent capacity

play04:08

for an independent action, intentionality, and self-determination, typically associated

play04:16

with living beings. An AI agent, on the other hand, is a computational system designed to operate

play04:23

autonomously within the constraint of its programming, perceiving its environment,

play04:29

reasoning about it, and taking action to achieve predefined goals. And yeah, taking action limits

play04:35

this more or less to a robotic system. So the primary distinction lie in the source of

play04:41

intentionality, the nature of the autonomy, and the framework of responsibility and accountability.

play04:47

And you say, okay, so if I have now the word agentic, is it referring to agency or is it referring

play04:56

here to the agent? Because those are two different systems. And I had a brilliant idea. I said,

play05:05

hey, we're talking about the AI systems and large language model. So let's ask the LLM a question.

play05:13

Are they agentic? What do they think about themselves? Because those are

play05:21

the artificial intelligence system that it's all about.

play05:26

In the upper right corner, you will see here the model that answers. And I asked about 12

play05:31

different LLMs. So Gemini 1.5 Pro, a beautiful new system, says, hey, in, let's call it human

play05:39

psychology and cognitive science, agency refers to the capacity of an individual

play05:45

to act independently and make their own free choices. It encompasses several key components

play05:52

from intentionality, forethought, self-reactiveness, and self-reflectiveness. Crucially, agency implies

play06:04

a level of consciousness, an internal experience that goes beyond simple stimulus response

play06:11

mechanisms. So Gemini 1.5 Pro, yeah, that sounds good. Then I asked Llama 3 70B Instruct that

play06:19

everybody loves. And Llama 3 tells me, hey, while LLMs are incredibly powerful and versatile,

play06:28

they are not inherently agentic systems. And Llama 3 goes on and explains, here's why.

play06:37

They lack autonomy. LLMs do not operate independently and require human input,

play06:43

such as prompts or training data to generate responses. LLMs have a limited intentionality.

play06:51

LLMs do not have inherent goals or motivations. Their intentions are largely determined by their

play06:57

training objectives in the input they receive. And there is no self-efficacy. LLMs do not perceive

play07:05

themselves as capable of influencing the environment or achieving goals beyond generating text based

play07:12

on their programming and on their training. So Llama 3 tells us, no, LLMs are no agentic systems.

play07:19

And I have here a short discussion with Llama 3 70B and I say, but wait, LLMs can be designed,

play07:28

like the latest Mistral version 0.3, designed to call external functions or APIs, enabling them

play07:38

to interact with other systems with their environment to receive new information or

play07:44

perform specific tasks. Is this not an agentic way?

play07:51

And Llama 3 comes back with its own logic and says, in this sense, an LLM's function calling

play07:57

ability can be seen as a form of scripted agency where the model is programmed to execute specific

play08:04

actions or call functions in response to certain input or conditions. While this is not a true

play08:13

agency, it is a powerful tool for automating tasks. So Llama 3, quite clear on this topic.

play08:23

Llama 3 goes on, hey, to illustrate the difference, consider a chatbot that uses an

play08:28

LLM to generate responses. And the chatbot will be able to call external functions,

play08:34

like Mistral version 0.3, to retrieve information or perform tasks. But it is still a scripted system

play08:42

that lacks autonomy, intentionality, and self-efficacy. In contrast, a true agentic AI system

play08:53

would be able of making decisions, adapting to changing circumstances, and exhibiting autonomous

play09:00

behavior. And if you think here about a certain movie about the future of the machines, you do

play09:08

not want that an AI exhibits some real autonomous behavior and connect to the internet of the world.

play09:16

Okay, this was Llama 3. And I go to Gemini 1.5 Pro and say, hey, buddy, you know me, no? So,

play09:24

can you do a scientific comparison between this function calling and LLMs that is so beautiful

play09:32

to this strange word, agentic, that everybody says is here? And Gemini 1.5 Pro then comes back

play09:39

and says, hey, the term agentic and the term function calling describe distinct capabilities,

play09:47

so they can appear intertwined in the context of LLMs. So let's do a definition. By agentic,

play09:55

being agentic, it implies having the capacity for independent action and goal-directed behavior.

play10:02

This involves a goal formation, identifying desired outcomes and formulating plans to achieve

play10:09

them. Please remember, this is so important. Formulating plans to achieve them. I come back

play10:17

later in the video to this point. Second, action selection, choosing actions that maximize the

play10:24

likelihood of a goal attainment and a self-monitoring. It can self-evaluate its progress.

play10:32

Now, you know, on a biological basis, tells me Gemini, agency is a complex emergent property

play10:39

arising from the interaction of various cognitive functions. And I thought, yeah, this is exactly

play10:45

what I'm looking for in my AI, no? It's a complex emergent property and it comes from the interaction

play10:51

of billions of little GPUs. But unfortunately, Gemini tells me, hey, the perception, the memory,

play11:00

and again, we have here the planning. So creating mental representation of future actions

play11:07

and their potential outcomes. Yeah, okay. This is not what an AI system is currently able to do.

play11:17

And Gemini goes on, while AI system can exhibit behaviors that kind of resemble agency,

play11:24

they're not truly agentic in the same way as biological systems. Because this AI system

play11:32

lacks three points, consciousness, self-preservation, and intrinsic motivation. Fascinating.

play11:41

I said, this is real nice. So coming back to the question of agentic,

play11:49

agentic represents an advanced form of intelligence characterized by goal-directed behavior,

play11:56

self-monitoring, and again, so important, adaptive planning. Gemini tells us, LLM,

play12:05

while they are impressive and beautiful, LLMs do not possess true agency.

play12:12

And function calling, what I ask here, is a powerful tool that enhances the LLM's capability

play12:19

by allowing them access to external sources and perform tasks beyond their textual limitations.

play12:24

However, function calling does not equate to agency. So I have to say,

play12:32

those LLMs are really clear what they are. Now to my green grasshoppers, yes, I know you are

play12:40

listening. This function calling, if you're not real familiar with it, it is simple. It just has

play12:46

four steps when you do function calling with your LLM, like your Mistral version 0.3.

play12:53

So at first, you're not going to believe it, you have to define the functions.

play12:57

Then you specify the function schema, and we use here normally a JSON schema.

play13:04

Then we prompt our LLM with this, and then we execute the function based on a function call

play13:11

generated by the LLM. The LLM executes the function with the provided arguments and returns to the

play13:18

result. So you see, define function, specify schemas, prompt, execute. Let's have a look at

play13:26

this. What is the code? Step one, define the function. Simple. Step two, define the function

play13:35

schema. Let's go with a JSON. You have a function, you give it a name. If you are a nice coder,

play13:42

you give it a description so other people understand what it is, and you define the

play13:46

parameters. For example, you say, hey, I have here an object with a property of a location,

play13:54

and the description is the city and state, like San Francisco, California. And I have a unit,

play14:00

string, that gives me Celsius or Fahrenheit. And required is only the location. We have a schema

play14:07

defined. And then we prompt a model with this. We say, hey, the role of the user is, question,

play14:16

what is the weather like in London? And now we use here the tools command.

play14:21

And we define what tools the LLM can use. Let's say here, OpenAI chat completion. We go with

play14:28

GPT-4. We have the functions, our tools that are defined. And now we just define one function,

play14:35

but we can define multiple functions. So if we want the system to choose the function,

play14:40

we say function call auto, or we specify our specific function. What is the weather in London?

play14:48

What is the weather in London? And then we execute the function with a function call.

play14:53

The name gets the current weather, the argument is London, and the unit is Celsius.

play14:57

And you execute the function with this command. This is it. This is function calling.

play15:05

If you want to learn more about it, I would like to recommend here

play15:08

Mistral function calling. They have a beautiful summary. Let's have a look at this.

play15:14

Mistral AI has here a real beautiful information for you. Function calling. You have a call up

play15:21

notebook you can follow along. You have a video you can follow along for Mistral specific function

play15:27

calling. And they tell you the available models are Mistral Small, Large, and the Mixtral;

play15:32

now also here, the latest Mistral 7B version 0.3 is also doing function calling. And I tell you,

play15:40

hey, there's four steps. You have a user, a model, execute the function and generate the answer.

play15:46

Easily explained. And if you want to look at the code, they go step by step and they explain for

play15:54

each step which tools are available. They show you here the Mistral specific code for the Mistral

play16:01

specific models. If you have the tools, how you define function one, how you define function two.

play16:08

Beautiful. Explained. Go there. You have your user query. Step two, the tool choice. As I told

play16:16

you, for example, here auto: the model decides which tool to use. Any forces a tool use,

play16:23

or none prevents a tool use. Of course, you need your Mistral key and you have your step three and

play16:32

step four explained beautifully. Now, if we go now to OpenAI, we have the same. We have here

play16:38

function calling, a beautiful description. They introduce you what is function calling in an

play16:44

IPI call. You can describe function and have the model intelligently choose to output a JSON object

play16:51

containing arguments to call one or many function. You have the ChatCompletion API does not call the

play16:58

function. Instead, the model generates JSON that you can use to call the function in your code.

play17:03

Those are the models that have these use cases. For example, create a system that

play17:10

answer question by calling external APIs, convert natural language into API calls.

play17:16

So you say, hey, who are my top customers? If you have a company AI, this is then converted into

play17:23

the function get_customers, with a minimum revenue, a created-before threshold, and a limit,

play17:29

and calls your external or internal API or you have structured data from a text. So you see,

play17:36

beautiful demonstration. They tell you the behavior. They even show you parallel function calling.

play17:42

Remember, you have to pay for each function calling. No, this is not a way to get you out of this.

play17:48

And they show you how to invoke multiple function call in one response. You can copy the code. It

play17:54

is immediately available for you and it all follows the same schema. There might be some small code

play18:01

adaptation if you work with OpenAI or if you work with Mistral. Never mind. Also, I would recommend

play18:09

here the literature from Ollama, because in the new Ollama updates, you can now work with the

play18:17

Ollama raw mode, and Mistral version 0.3 is also able to use raw mode, which is a beautiful thing

play18:25

I will do in a later video. So coming back, I know you have been waiting for Claude 3 Opus.

play18:34

All my subscribers say, hey, I like Claude. Here we are. And Claude tells us the concept of agency

play18:41

in AI is still an active area of research. While LLMs with function calling ability demonstrate

play18:49

impressive language understanding, they may not yet possess the full range of characteristics

play18:58

associated with agentic systems. So, not yet agentic systems. So Claude 3 Opus also tells us

play19:05

LLMs with function calling are not agentic. And I asked Claude 3 now, hey, can you tell me here

play19:13

the differences between being agentic and the function calling ability? Isn't it the same?

play19:19

And Claude comes back and says, hey, no way. I have five points where I can show you it is not.

play19:26

Let's look at autonomy. An agentic system has a higher degree of autonomy in decision making,

play19:32

whereas the LLM function calling ability is more dependent on a specific input and a predefined

play19:38

function it has been trained, or the user has defined, to recognize and execute.

play19:45

The second point, the goal-directed behavior. Agentic systems have their own goals and can adapt

play19:52

their own behavior to achieve those goals, while LLMs with function calling ability are

play19:57

primarily focused on executing the requested function based on the input provided. Point three,

play20:04

adaptability. Agentic systems can adapt to new situations and learn from their experiences.

play20:11

LLMs with function calling ability are limited to the specific function they have been trained

play20:16

to perform and may not be able to adapt to novel situations without further training.

play20:24

Point four, self-regulation. Agentic systems can monitor their own performance and make

play20:29

adjustments to improve their effectiveness. LLMs with function calling ability do not inherently

play20:35

possess this level of self-regulation and rely on external human evaluation and fine tuning to

play20:42

improve their performance. Point five, the contextual understanding. Agentic systems have

play20:49

a deeper understanding of the context and can use this understanding to inform their decision

play20:55

and actions. LLMs with function calling ability have some level of contextual understanding

play21:01

based on the training data, but are often more limited. So this was Claude 3. And then I looked

play21:08

in total at 12 LLMs, and believe me, they agree. Gemini Flash: attributing full agency to LLMs is premature.

play21:17

Qwen: no full agentic systems. Qwen 110B: does not qualify as an agentic system. And then we have a funny one,

play21:27

one that does not exist at all, a Gemini 1.5 Pro tuned that is not official here yet. And they say

play21:33

it falls short of the criteria for an agentic system. So, by all means,

play21:39

every LLM tells me LLMs are no agentic systems. Yeah, yeah, this Pro tuned is a secret stealth LLM.

play21:50

If you ask it about the characteristics of an agentic system, it has a different twist to it. Look at

play21:56

this. It says, agentic systems are defined by goal-directed behavior, autonomy, perception,

play22:02

and action. But it's a little bit different. If you read it in detail, you will understand what I mean.

play22:10

Okay, I went back to Gemini 1.5 Pro, the new API preview, and it tells me, hey,

play22:18

function calling with LLMs still lack inherent goals or motivation. There's no long-term planning.

play22:25

There's a limited autonomy and function calling makes LLM a beautiful tool, but it does not

play22:32

transform them into independent agents. So LLMs with function calling are no agents.

play22:42

Now this planning, as you noticed, maybe since I mentioned it several times now, is of crucial

play22:49

importance for the next step, for the next evolution of AI systems.

play22:55

Agents can conceptualize future states and strategize pathways to achieve the desired

play23:01

outcomes. This involves considering different options, anticipating obstacles, and making

play23:09

decisions based on predicted consequences. This is not an easy task for an LLM.

play23:14

Now, let me give you a very simple idea of mine. Whenever an AI system encounters a specific task

play23:24

where it would need to utilize a new function, it has not been programmed, that is not in its

play23:30

memory and does not know how to connect to external sources, this AI system should be able to create

play23:38

this new, adequate function itself, using its given set of functions. And,

play23:47

if it has access to a Python environment, it should be able to self-code this new function

play23:54

and integrate this synthetically generated new function into its own set of functions,

play24:01

which means for me kind of a self-learning.

play24:05

This would change the game significantly, but we are not there, not at any means.
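The mechanics of the idea just described, self-coding a new function and integrating it into the system's own function set, can be sketched as follows. This is an illustration only: the "generated" source is hard-coded here, where the hypothetical system would have the LLM write it, and the tool registry is an invented minimal example.

```python
# The AI system's current set of callable functions (its "tool set").
tools = {}

# Source code the hypothetical system would synthesize on the fly;
# hard-coded here for illustration.
generated_source = """
def fahrenheit_to_celsius(f):
    return (f - 32) * 5.0 / 9.0
"""

# Compile the new function in a fresh namespace and register it,
# so the system can now call a function it did not originally possess.
namespace = {}
exec(generated_source, namespace)
tools["fahrenheit_to_celsius"] = namespace["fahrenheit_to_celsius"]

print(tools["fahrenheit_to_celsius"](212))  # → 100.0
```

In a real system this step would of course need sandboxing and validation of the generated code; the sketch only shows why dynamic integration of self-coded functions would feel like a form of self-learning.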

play24:13

Where are we today? Our best systems, and I will show you later on,

play24:18

they are a DAG, a Directed Acyclic Graph system. So, we start at a node with a function or an

play24:25

operation or whatever you like, then either we go in A or either we go in B, and if we went to A,

play24:33

then we have, maybe, the option here to go over there and perform this

play24:38

task, or maybe this node tells us, hey, you have three possibilities, given your specific input

play24:45

data or your specific threshold or whatever parameter you have, you can go either to A1,

play24:51

A2 or A3. But this is a predefined structure, and this predefined flow that some systems try

play25:01

to integrate into a multiple function calling set of operation,

play25:09

it is not what we are looking for, because this is just like SAP for business finances.

play25:18

If you want to calculate your, I don't know, your profits in your company, well, you start with the

play25:25

cost, then you divide the cost, I don't know, per staff, and you divide this then by another

play25:33

criterion, you compare it here to your revenues, and somewhere down the line you have a clear

play25:39

defined flow of mathematical operation, and then the SAP system generates here the statement of

play25:46

your financial profit in the company. This is a known way how to do financial calculations.

play25:55

There is no intelligence, because I, as a programmer, I predefined here the way of the

play26:02

logic that the system should follow, and please, in business finances, there are national laws,

play26:08

so I have to make sure that all of these steps are followed and calculated. But this is not an AI

play26:15

system, this is a Python or SAP system for business finances. So you see the difference?
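The predefined DAG flow described above can be sketched like this. The node names (A, B, A1, A2) and numeric thresholds are invented for the example; the point is that every branch is hard-coded by the programmer, so the system only routes, it never plans.

```python
# A predefined directed acyclic flow: each node maps the input to the
# next node, exactly as a programmer wired it up in advance.
def start(x):
    return "A" if x > 0 else "B"

dag = {
    "start": start,
    "A":  lambda x: "A1" if x > 10 else "A2",  # predefined options A1/A2
    "B":  lambda x: "end",
    "A1": lambda x: "end",
    "A2": lambda x: "end",
}

def run(x):
    """Follow the hard-coded flow and return the visited path."""
    node, path = "start", []
    while node != "end":
        path.append(node)
        node = dag[node](x)
    return path

print(run(15))  # → ['start', 'A', 'A1']
print(run(-3))  # → ['start', 'B']
```

However many branches you add, the structure stays a fixed lookup, which is the contrast the video draws with genuine planning.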

play26:25

But currently we are here, and we want to go over there.

play26:31

So whenever an AI system encounters a new task, now the AI, and now the first question,

play26:39

is this really the task of a large language model, that it reasons and decides what programs

play26:45

or function calling to activate in the flow? And you see, in my other videos I talked about

play26:52

causal logical reasoning so much, and now you understand why. Because this reasoning ability,

play26:58

let's say of an LLM, is an essential feature for the planning, planning the logical path of our

play27:05

AI system to the desired solution, but not a predefined flow of operation that is necessary

play27:13

if you want to go with your national accounting laws, if you want to calculate your business

play27:18

finances. So you see, where and how strong you apply AI logic is a complete different game.

play27:30

And then I know you have been waiting for this about RAG, and I know that my subscriber says,

play27:36

hey, if there's an obstacle, I just build the RAG system and RAG solves everything.

play27:42

RAG is a unique solution to the problems in the world of AI.

play27:47

Or maybe not so. Okay, so RAG, yeah, beautiful. So what is RAG? It is an information retrieval

play27:55

module and a large language model. Now we know that a large language model is not agentic.

play28:01

It is not an agent. It does not have agency. So you might say, hey, no problem than it is.

play28:09

Here the combination. Okay, let's look at the key differences between the RAG

play28:16

and a standard function calling AI system. And we look here at the dynamic knowledge integration

play28:22

and the improved factual accuracy and the relevance, the grounding, and some domain

play28:27

specific expertise. You can read this, so I can tell you: RAG is just not as powerful as function calling,

play28:37

because with function calling I can access Python system, interactive system. It can have robotic

play28:44

system that really act in the external world. With RAG, I just retrieve information. So RAG,

play28:51

let's be progressive, is a little subsystem of function calling because we only have to retrieve

play28:58

it; the augmentation is done in the LLM and the generation is done in the LLM.

play29:04

But function calling is a much more powerful system.

play29:09

Now there are multiple ways RAG can access this beautiful external information.

play29:17

You can have here just your local file system or you go to a network file or you go here with

play29:24

SQL or SQLite. You can have a Mongo client that you say, beautiful, this is the way.

play29:31

Or you go here and access S3 data. Or you go here and you have an FTP protocol. Or, and this is

play29:40

normally also happening, you have an API call. So you see, there are multiple ways that external

play29:49

information can be accessed via RAG. And then, let's make it short, I ask here,

play29:55

so LLMs with their function calling abilities and RAG with their information retrieval engine,

play30:02

they are just simple AI systems that both lack agency. And I say, would you agree? And if not,

play30:10

explain in scientific terms, why not? And I asked 12 LLMs and all the systems told me this.

play30:19

And I have here Llama 3 and Gemini. And they said, yep, RAGs also lack agency due to their limited

play30:29

autonomy, lack of self-awareness, and predetermined operational parameters. Gemini tells us, yes.

play30:38

RAGs also lack inherent agency in the same way humans possess it; characterizing them as simple,

play30:45

however, is an oversimplification. So Gemini said, yeah, I know they don't have agency,

play30:51

but you know, they are not really simple. Okay, I see what you want to do, Gemini.

play30:57

And of course, if you want to experience the other 10 yourself, just go to lmsys.org,

play31:04

put this in. This is the screenshot from what I just showed you. And you have then

play31:09

Llama 3 or Gemini. Do it in a blind test. You will be surprised how different models,

play31:16

different LLMs answer, with a different kind of pathway of argumentation,

play31:21

but the result is always the same. RAGs are not agentic. That's it.

play31:29

So here we go. What we have proven until now: there are no agentic LLMs. There is no agency

play31:36

in a RAG system. So you can go back to your professor and say: you know, you are wrong.

play31:41

So what is it that we are looking for? And what is the cool new buzzword that we should use so that

play31:54

somebody watches, here, some new, I don't know, information on some platform?

play31:54

And are we interested in this or do we all understand that these are just buzzwords

play31:58

for marketing like agency and agentic? Or do we take the step and say, hey, those terms have

play32:05

a scientific meaning and it's important to understand because they open up a new avenue

play32:12

for our development. And I go, of course, with the second one and I say, have you noticed that

play32:19

in the whole session up until now, planning seems to be the most emergent functionality of the

play32:28

system we have to talk about? And from Lilian Weng here, this is a beautiful, very old statement.

play32:35

What is an agent? And she did an equation. An agent is an LLM that has quite a long memory,

play32:43

internal memory structure with 2 million token context length, can use tools like our function

play32:50

calling abilities, and has a plan for what to do when given a new task. And this is this beautiful

play33:00

planning. So you see, it was there all the time. Just all this marketing really shifted our focus

play33:10

away from where it should be. Now have a look at this. Today, we think that the LLM is the main core,

play33:18

the main source of logic, that the causal reasoning of an LLM is essential for the planning

play33:24

strategy of the whole agent system, if it were an agent.

play33:32

But there's also another option we are now starting to explore in AI research,

play33:38

that the plan of what to do comes not from the LLM, but from a world model that is based on scientific laws

play33:45

and the laws of nature. So hey, this is interesting. So we have now two kinds of

play33:54

intelligent systems. The rest is just tool use, just planning new steps here.

play34:03

So let's look at the key aspects of planning in a theoretical true AI agent. We have goal setting,

play34:12

strategy development, action sequencing, like in robotics, and then the execution.
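These four stages can be sketched as a toy loop. Everything below is hypothetical: the goal, the step names, and the stubbed executor are invented purely to show how goal setting, strategy development, action sequencing, and execution chain together.

```python
# Toy sketch of the four planning stages; goal, steps, and executor
# are all invented for illustration.

def set_goal() -> str:
    # 1. Goal setting: a specific objective the agent aims to achieve.
    return "make_tea"

def develop_strategy(goal: str) -> list:
    # 2. Strategy development: outline the steps necessary for the goal.
    strategies = {"make_tea": ["add_teabag", "boil_water", "pour_water"]}
    return strategies[goal]

def sequence_actions(steps: list) -> list:
    # 3. Action sequencing: order matters (you cannot pour before boiling).
    priority = {"boil_water": 0, "add_teabag": 1, "pour_water": 2}
    return sorted(steps, key=priority.__getitem__)

def execute(actions: list) -> list:
    # 4. Execution: carry out each action (stubbed as a log entry).
    return ["done: " + a for a in actions]

plan = sequence_actions(develop_strategy(set_goal()))
print(execute(plan))
```

The point of the sketch is the separation: a real agent would have to generate the strategy and the ordering itself, instead of looking them up in hand-written tables as here.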

play34:19

And yeah, you see it all goes to robotics, to vision language models that interact in an external

play34:27

environment. So goal setting is easy: establish specific objectives that the agent aims to achieve.

play34:34

Strategy development: you create a planning strategy and outline the steps necessary, okay?

play34:41

Action sequencing is nice, because remember: with action sequencing, the LLM would know

play34:46

theoretically what to do, but it doesn't understand why. So there's some treasure hidden here.

play34:54

And then the execution. The importance of the new planning functionality in our new real AI agents:

play35:03

it would have to increase here the efficiency, the effectiveness, the adaptability,

play35:09

and it will reduce errors. And I've given you here examples in logistics, healthcare, stock trading,

play35:15

and manufacturing. And you might say, but wait a second, what exactly is now the difference

play35:21

between planning and function calling in LLMs? Are they not somehow related? Because if you have

play35:28

a set of 10 functions, and this LLM learned to differentiate when to call each specific function,

play35:36

and in what order to call each specific function, in a kind of time series?

play35:42

No, there is some underlying difference. Look at this. We look here at predefined functions

play35:48

by the user, function calling framework structure, and the limitations in its adaptability.

play35:55

So the functions need to be defined by coders, by developers, beforehand. I define the functions

play36:02

that my system needs for a task. So the functions are the only actions the LLM can call during its operation,

play36:09

and we have training and fine-tuning so that if I have a natural language command,

play36:14

the LLM understands, hey, this means I have to issue an API call to the weather station.

play36:19

The LLM uses function calls to interact with external systems and databases. But those

play36:25

interactions are encapsulated and predefined within the defined functions. So there is nothing new to

play36:31

this system. It's just a mechanically predisposed system. And LLMs cannot improvise new actions

play36:38

beyond the defined functions. And this is a strong limit on their ability to adapt to new tasks.
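This limitation fits in a few lines of code. The sketch below is hypothetical throughout: `get_weather` stands in for the weather-station API call mentioned above, and the crude keyword routing stands in for the mapping that fine-tuning teaches the LLM. The key property is that the function set is closed, so any request outside it simply fails.

```python
# Sketch: function calling is a closed, predefined set of actions.
# get_weather and the keyword routing are hypothetical stand-ins for
# what an LLM learns during fine-tuning.

def get_weather(city: str) -> str:
    return f"22C and sunny in {city}"        # stubbed API call

FUNCTIONS = {"get_weather": get_weather}     # defined by developers beforehand

def route(command: str):
    # A crude stand-in for the trained natural-language -> function mapping.
    if "weather" in command.lower():
        return ("get_weather", {"city": command.split()[-1]})
    return None                              # no predefined function matches

def handle(command: str) -> str:
    match = route(command)
    if match is None:
        # The model cannot improvise a new action beyond the defined set.
        return "cannot handle: no predefined function"
    name, kwargs = match
    return FUNCTIONS[name](**kwargs)

print(handle("What is the weather in Berlin"))
print(handle("Book me a flight to Tokyo"))
```

The flight request fails not because it is harder, but because no developer registered a function for it; that is the adaptability limit in miniature.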

play36:44

Now, it's interesting if you look to the future: how can we enhance these planning capabilities?

play36:49

What if we move beyond an LLM as the logical central switching board?

play36:56

You can expand the function libraries. Like, I don't know, somewhere on an index, you say, hey,

play37:03

this is now a database, and I have here 10,000 functions defined. So, let's look at this.

play37:11

10,000 functions defined. And I give you the description of what a function can do and what

play37:18

it should have as an input and what it should generate as an output. But you understand this

play37:24

is, again, kind of a database system. Creating functions from a variety of related tasks:

play37:31

the dynamic function integration. The system can be designed by training and learning a lot of

play37:37

training data to dynamically integrate now these new 10,000 functions. This is also not really a

play37:46

sign of intelligence. And even if you combine it in a hybrid model, where you have a static

play37:52

and dynamic planning, where a little bit of intelligence shines through in the dynamic part,

play37:57

it is still not really what we want. So, here we have now from the end of May 2024,

play38:04

a beautiful new arXiv preprint: agent planning, now not with an LLM, but with a world knowledge

play38:12

model. And they say, hmm, currently, if we look here at the graph structure, it need not be a

play38:20

DAG. We have the agent model, you know, one action, another action, make a decision, go there,

play38:28

find out, no, not working. We have hallucinations somewhere and we have a wrong link and whatever.

play38:34

This is trial and error, and this is really a problem. And they propose here, hey,

play38:42

in our LLMs, we have the verbal world knowledge, but we do not know how to execute it.

play38:52

So, we transfer this to state knowledge, and from this state knowledge we can build here a

play39:00

graph structure from our semantic knowledge. And then, with the nodes and edges, the probability

play39:06

of traveling along specific edges, and the properties of certain nodes, we build here a new world

play39:16

model in the graph structure, where we have a kind of predefined correct path to the solution.

play39:25

Of course, they need here to integrate real sensory data, continuous data that builds up here

play39:33

as a copy of the real-world scenario, if you have a robot moving in a room, to build

play39:39

out a virtual twin, this virtual scenario of the world model. But currently, we only have the

play39:46

description of our world in our LLMs, not the understanding. So they thought, hey, is it

play39:53

possible to use the verbal information, what to do, how the world works, that we have in our LLM,

play40:03

and kind of build an action model? Interesting. Another piece of literature I would like to recommend to

play40:11

you, although it's not really the absolute latest, it is from February 2024. This is a survey done

play40:19

here by some beautiful institutions: Understanding the Planning of LLM Agents, a Survey.

play40:27

and if you want to see here the taxonomy of an LLM agent planning environment,

play40:33

it gives you a very good overview and you are amazed how many system elements we already have,

play40:41

we already have kind of operational code, but we have not yet put it together.

play40:48

So, interesting literature if you're interested in this topic, I think, or if you just do your

play40:54

PhD thesis currently somewhere at a university on this beautiful planet: think about those topics,

play41:00

because those are topics that will be important in the next months. And another one, from the end of May 2024:

play41:08

you see here LLM plus reasoning capability, causal reasoning, logical reasoning, plus the planning

play41:16

here in the presence of our APIs. Also an interesting publication if you want to have

play41:22

a little bit more of a deep dive if you want to see where the current research is happening right

play41:27

now. So there you have it. We do not have an agentic AI today, but we understood what all the

play41:36

systems lack, what they are not able to do, and therefore we know on what elements to focus on

play41:43

for the future of AI. And you know, I think agentic AI systems are not that far off in the future,

play41:54

maybe just years.

play41:59

And at the end of this, you know, short personal idea, I want to show you where the code is today,

play42:04

and I really mean today. And if we go here, for example, to OpenAI agent query planning,

play42:12

there are tools. And the OpenAI function agent with a query planning tool is already there.

play42:22

And if you go there, LlamaIndex, here is your link. But if you look at this, you understand

play42:27

the shortcomings immediately. Look at this. We have here an engine, and its name is September

play42:35

2022. Provide information about quarterly financial data. Then we have another engine,

play42:42

and the name is now June 2022. Provides information about the quarterly financial,

play42:48

yes. And then we have a third engine, and you get the idea.

play42:53

And then we have here this beautiful new code, and it says, yeah, the query plan tool,

play42:58

so the real complexity of planning. It has three tools, and we have here the query tool for

play43:04

September, the query tool for June, and the query tool for March of 2022. And you might say, yeah,

play43:10

but this is just, I mean, yeah, okay. And then you have here the query plan tool metadata to

play43:18

OpenAI tool, and you say, my goodness, this sounds so intelligent, but you know what it actually does?

play43:27

If you have defined your agent with OpenAIAgent.from_tools, and then you have your function

play43:32

calls, and then you have here your LLM that you use, and you have defined here your query plan

play43:38

tool with your three engines, and then you say, hey, now, agent.query, what are the risk factors in, and now,

play43:45

September 2022, and you say, my goodness, this is intelligent. And then you hard-code here, from

play43:52

the llama_index core tools query plan module, the import of QueryPlan and QueryNode, and you have your query

play43:58

plan with the nodes, and you have query node one, risk factors, and the tool name is now:

play44:03

If you want to answer the question, what is the risk factor in September 2022,

play44:07

you go with the tool September 2022. Okay. Now, this is, of course, just a demonstration of the

play44:17

code, and they say, yeah, of course, this should be a DAG system, and you have to start with one

play44:25

node, of course, and you go here with the first node. Okay. But you know, the real power of a graph

play44:31

interaction is not yet there in this open source community, but they are trying to go there.
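Stripped of the framework vocabulary, the "intelligence" of that query plan boils down to something like the following toy re-implementation. The engine names mirror the LlamaIndex example above; the answers and the matching rule are invented, and the real library of course does more, but the routing logic being demonstrated is this simple.

```python
# Sketch: what the hard-coded query-plan routing effectively does.
# Engine names follow the example above; the answers are invented.

engines = {
    "september_2022": "Q3 2022 risk factors: supply chain, currency.",
    "june_2022":      "Q2 2022 risk factors: inflation.",
    "march_2022":     "Q1 2022 risk factors: logistics.",
}

def plan_and_query(question: str) -> str:
    # "Planning": pick the engine whose month and year appear in the question.
    for name, answer in engines.items():
        month, year = name.split("_")
        if month in question.lower() and year in question:
            return answer
    return "no matching engine"

print(plan_and_query("What are the risk factors in September 2022?"))
```

If the question names a quarter outside the three predefined engines, the system has no move left; that is why calling this "agentic" overstates what it does.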

play44:42

But, however, if you read a statement here, like in the LlamaIndex docs, talking about

play44:47

agentic strategies, you now have to decide for yourself: is this just a marketing term,

play44:55

or is it really a scientific term that really provides an avenue to the future of AI?

play45:02

Because if you say, like on the official agentic strategies page here,

play45:07

that a lot of modules like routing or query transformation and more are already agentic

play45:14

in nature, in that they use LLMs for decision making, this is not

play45:21

agentic. This is simply, plainly, scientifically wrong. This statement is not correct.

play45:28

This statement is a marketing statement, and you should be able to decide,

play45:36

when you see this, to understand exactly that this is not an agentic strategy at all.

play45:44

But I hope I've shown you in this video possible avenues to the future development of agentic

play45:51

strategies. But we should not give this opportunity away, because there is some crazy

play45:58

marketing, whatever, just grabbing your attention, that makes us lose the focus on what is really

play46:06

necessary to achieve an agentic AI system in real scientific terms and in the real scientific

play46:14

world. I hope you had a little bit of fun today, you enjoyed it, you had maybe some new thoughts

play46:21

about something and it would be great to see you in my next video.
