Harrison Chase - Agents Masterclass from LangChain Founder (LLM Bootcamp)

The Full Stack
25 May 2023 · 25:47

Summary

TL;DR: The speaker discusses the concept of agents in language models, focusing on their use as reasoning engines that interact with external tools. They cover prompting strategies like ReAct, challenges in making agents work reliably, and recent advances in memory and personalization. Projects like Auto-GPT, BabyAGI, CAMEL, and Generative Agents are highlighted for their contributions to agent development.

Takeaways

  • 🧠 Agents are considered the most interesting part of LangChain due to their non-deterministic nature and ability to interact with the outside world based on user input and previous actions.
  • 🔧 The use of agents is tied to tool usage, connecting to external data sources or computational tools like search APIs and databases to overcome language models' limitations.
  • 💡 Agents offer flexibility and power, allowing for better error recovery and handling of multi-hop tasks through their reasoning capabilities as opposed to predetermined step sequences.
  • 🔄 The typical implementation of agents involves using a language model to choose a tool and its input, execute the action, and feed the observation back into the model until a stopping condition is met (see the sketch after this list).
  • 📚 The ReAct method (Reasoning and Acting) is a prominent strategy for implementing agents, combining chain-of-thought reasoning with action-taking to improve reliability and effectiveness.
  • 🛠 Challenges with agents include appropriate tool usage, avoiding unnecessary tool use in conversational contexts, parsing agent instructions into executable code, and maintaining memory of previous steps.
  • 🔑 Tool descriptions and retrieval methods are crucial for agents to understand when and how to use various tools effectively.
  • 🔄 Output parsers are used to convert the agent's string output into actionable code, addressing issues like misformatting and missing information.
  • 🔑 Long-term memory is vital for agents dealing with long-running tasks, and methods like using a retriever vector store have been introduced to handle this.
  • 🤖 Recent projects like Auto-GPT, BabyAGI, CAMEL, and Generative Agents have built upon the ReAct-style agent framework, introducing concepts like long-term memory, planning vs. execution separation, and simulation environments.
  • 🔬 The concept of reflection in agents, where they review past actions and update their state, is a recent development that could generalize to various applications and improve agent reliability.
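
A minimal sketch of that tool-choosing loop in plain Python follows. The `call_llm` stub, the `parse_action` helper, and the "Action:" / "Final Answer:" output format are assumptions for illustration, not LangChain's actual API.

```python
# Minimal agent-loop sketch: the LLM picks a tool and its input, we run the
# tool, feed the observation back, and stop on "Final Answer" or a step cap.

def call_llm(prompt: str) -> str:
    """Stand-in for a real language model call."""
    raise NotImplementedError("plug in your model here")

def parse_action(text: str) -> tuple[str, str]:
    """Pull the 'Action:' and 'Action Input:' lines out of the model output."""
    tool = text.split("Action:")[1].split("\n")[0].strip()
    tool_input = text.split("Action Input:")[1].split("\n")[0].strip()
    return tool, tool_input

def run_agent(question: str, tools: dict, max_steps: int = 5) -> str:
    scratchpad = ""
    for _ in range(max_steps):          # hard-coded stopping condition
        output = call_llm(f"Question: {question}\n{scratchpad}")
        if "Final Answer:" in output:   # the model decides it is done
            return output.split("Final Answer:")[-1].strip()
        tool, tool_input = parse_action(output)
        observation = tools[tool](tool_input)   # execute the chosen action
        scratchpad += f"{output}\nObservation: {observation}\n"
    return "Stopped after max_steps without reaching a final answer."
```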

Q & A

  • What is the core idea of agents in the context of language models?

    -The core idea of agents is to use the language model as a reasoning engine. This means the language model determines actions and interactions with the outside world based on user input and results of previous actions, rather than following a hard-coded sequence of actions.

  • Why are agents considered more flexible and powerful than hard-coded chains?

    -Agents are considered more flexible and powerful because they can recover from errors better, handle multi-hop tasks, and act as a reasoning engine. This allows them to adapt their actions based on user input and the outcomes of previous actions, making them more dynamic and responsive.

  • What is the significance of tool usage in the context of agents?

    -Tool usage is significant because it allows agents to connect to other sources of data or computation, overcoming limitations of language models such as lack of knowledge about specific data or poor mathematical capabilities. This integration enhances the agent's ability to perform tasks and provide more accurate responses.

  • How does the ReAct method improve the reliability of agents?

    -ReAct (Reasoning and Acting) is a prompting strategy that combines reasoning with action-taking. It helps agents think through the steps and then take actions grounded in real data, improving the reliability of their responses and actions. This method allows agents to arrive at more accurate answers by integrating reasoning with tool usage.
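
To make the format concrete, here is roughly what a ReAct trace for the HotpotQA example from the talk looks like; the wording is paraphrased for illustration rather than quoted from the paper.

```python
# Illustrative shape of a ReAct trace (paraphrased, not the paper's verbatim prompt).
REACT_TRACE = """\
Question: Aside from the Apple Remote, what other device can control the
program the Apple Remote was originally designed to interact with?
Thought: I need to find the program the Apple Remote was designed for.
Action: search
Action Input: Apple Remote
Observation: The Apple Remote was designed to control the Front Row media program...
Thought: Now I need to find what else can control Front Row.
Action: search
Action Input: Front Row (software)
Observation: Front Row can be controlled by an Apple Remote or keyboard function keys...
Thought: I now know the final answer.
Final Answer: Keyboard function keys.
"""
```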

  • What are some challenges in getting agents to work reliably in production?

    -Challenges include getting agents to use tools appropriately, avoiding unnecessary tool usage, parsing the language model's output into actionable code, remembering previous steps, and evaluating the agent's trajectory and efficiency. These challenges affect the agent's ability to perform tasks reliably and efficiently in real-world applications.

  • How does the concept of memory play a role in the functionality of agents?

    -Memory is crucial for agents as it allows them to recall previous interactions and steps, which can inform their current actions. This can include remembering user interactions, AI-to-tool interactions, and personalizing the agent's behavior over time. Memory helps in maintaining context and continuity in agent-based systems.
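
As the talk describes later, a common pattern is to combine the N most recent steps with the K most relevant earlier ones. A minimal sketch, using word overlap as a stand-in for a real embedding search:

```python
# Combine the N most recent steps with the K most relevant earlier steps.
# Word overlap stands in for an embedding-based relevance score.

def overlap(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / (len(wa | wb) or 1)

def select_context(history: list[str], query: str,
                   n_recent: int = 3, k_relevant: int = 2) -> list[str]:
    recent = history[-n_recent:]
    older = history[:-n_recent]
    relevant = sorted(older, key=lambda step: overlap(step, query),
                      reverse=True)[:k_relevant]
    return relevant + recent   # relevant context first, most recent last
```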

  • What is the role of tool descriptions in helping agents decide when to use tools?

    -Tool descriptions provide context and information about the capabilities and limitations of each tool. This helps the agent understand when and how to use specific tools to overcome its limitations and perform tasks more effectively.

  • How can retrieval methods help in managing the complexity of tool usage by agents?

    -Retrieval methods, such as embedding search lookup, can help agents manage the complexity of tool usage by retrieving the most relevant tools based on the task at hand. This can reduce the need for lengthy tool descriptions in the prompt and make the agent's decision-making process more efficient.
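
A minimal sketch of that retrieval step, where the tool names and descriptions are invented for illustration and word overlap again stands in for a real embedding lookup:

```python
# Retrieve the top-k tools whose descriptions best match the query, so that
# only those descriptions need to be placed in the prompt.

TOOLS = {
    "search":     "look up current information on the web",
    "calculator": "evaluate arithmetic expressions",
    "sql":        "run read-only queries against the orders database",
}

def score(query: str, description: str) -> float:
    q, d = set(query.lower().split()), set(description.lower().split())
    return len(q & d) / (len(q | d) or 1)

def retrieve_tools(query: str, k: int = 2) -> list[str]:
    return sorted(TOOLS, key=lambda name: score(query, TOOLS[name]),
                  reverse=True)[:k]

print(retrieve_tools("run a query against the orders database"))
# -> ['sql', 'search']  (the sql tool ranks first for this query)
```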

  • What are some strategies to prevent agents from using tools unnecessarily?

    -Strategies include providing instructions in the prompt that remind the agent it doesn't always need to use tools, adding a tool that explicitly returns to the user, and using output parsers to correct mistakes in tool usage. These strategies help keep the agent focused on the task and prevent unnecessary tool usage.
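
The "tool that returns to the user" trick can be sketched as below; the exception-based control flow is one possible implementation choice, not how any specific framework does it.

```python
# Give the tool-happy agent a tool whose only job is to reply to the user.

class ReturnToUser(Exception):
    """Raised to break out of the agent loop with a direct reply."""
    def __init__(self, reply: str):
        self.reply = reply

def respond_to_user(text: str) -> str:
    raise ReturnToUser(text)

tools = {
    # description shown to the agent: "use this to answer the user directly"
    "respond_to_user": respond_to_user,
    "search": lambda q: f"(search results for {q!r})",
}

# Inside the agent loop, wrap tool execution in try/except ReturnToUser and
# end the run with err.reply when it fires.
```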

  • How can the concept of reflection be beneficial in the context of agent-based systems?

    -Reflection allows agents to review their recent actions and update their understanding of the world or task at hand. This can help in maintaining focus, improving decision-making, and adapting to new information. It is particularly useful in long-running tasks where continuous learning and adaptation are necessary.
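
A minimal sketch of a reflection step, with a hypothetical `call_llm` stand-in: after every batch of actions, the model summarizes what happened and folds it into a running state.

```python
# After every N steps, ask the model to update a running "understanding"
# of the task or world based on the most recent actions and observations.

def call_llm(prompt: str) -> str:
    """Stand-in for a real language model call."""
    raise NotImplementedError

def reflect(state: str, recent_steps: list[str]) -> str:
    return call_llm(
        "Current understanding:\n" + state
        + "\n\nRecent actions and observations:\n" + "\n".join(recent_steps)
        + "\n\nRewrite the current understanding in a short paragraph."
    )
```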

Outlines

00:00

🤖 Introduction to Agents in Language Models

The speaker begins by introducing the topic of 'agents' in the context of language models. They highlight the interest in agents due to their dynamic and non-deterministic nature, which allows for flexible interaction with the user and the environment. The core idea is to use language models as reasoning engines to determine actions based on user input and previous actions. The speaker also touches on the benefits of agents, such as their ability to use tools, connect to external data sources, and handle multi-hop tasks more effectively than hard-coded chains.

05:01

🔍 The REACT Method and Tool Usage in Agents

This paragraph delves into the ReAct (Reasoning and Acting) method, a prominent strategy for implementing agents. The speaker discusses how ReAct combines reasoning with action-taking, utilizing language models to decide on the appropriate tools and actions. They provide an example from the HotpotQA dataset to illustrate the effectiveness of ReAct in answering multi-hop questions. The paragraph also addresses the challenges of making agents reliable, especially in production environments, and the importance of tool descriptions and retrieval in guiding agents to use tools effectively.

10:06

🛠️ Challenges in Agent Implementation and Tool Use

The speaker outlines several challenges in implementing agents, focusing on the appropriate use of tools. They discuss the need for agents to understand when to use tools and when not to, emphasizing the importance of instructions and tool descriptions. The paragraph also covers the technical challenge of parsing language model outputs into actionable code, introducing the concept of output parsers. Additionally, the speaker mentions the issue of agents remembering previous steps, suggesting the use of retrieval methods to manage long-running tasks and maintain context.
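
The talk also covers taming long observations, such as the big JSON blobs APIs return, before they go back into the prompt. A minimal sketch of the two simplest options mentioned:

```python
# Two simple ways to shrink a large JSON observation: hard truncation, or
# keeping only the keys you know are relevant for a specific API.
import json

def truncate_observation(blob: dict, max_chars: int = 1000) -> str:
    return json.dumps(blob)[:max_chars]       # crude but effective first cut

def project_keys(blob: dict, keep: list[str]) -> str:
    # Custom logic for a known API: drop everything except relevant fields.
    return json.dumps({k: blob[k] for k in keep if k in blob})
```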

15:07

🏗️ Building and Evaluating Agent Systems

This paragraph discusses the complexities of building and evaluating agent systems. The speaker highlights the need for agents to stay on track and reiterate objectives, suggesting methods like reiterating the objective before each action and separating planning and execution steps. Evaluation methods for agents are also explored, including assessing the correctness of the final answer and evaluating the efficiency and correctness of the agent's trajectory. The speaker emphasizes the importance of memory in agent systems, both for tool interactions and user interactions.
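
A minimal sketch of the plan-then-execute split described above, assuming a hypothetical `call_llm` helper; real systems such as BabyAGI also re-prioritize the task list between steps.

```python
# One call drafts sub-objectives; a second call works through each one,
# restating the top-level objective each time to keep the agent on track.

def call_llm(prompt: str) -> str:
    """Stand-in for a real language model call."""
    raise NotImplementedError

def plan_then_execute(objective: str) -> list[str]:
    plan = call_llm(
        f"Objective: {objective}\nList the sub-tasks needed, one per line."
    )
    tasks = [line.strip() for line in plan.splitlines() if line.strip()]
    return [
        call_llm(f"Objective: {objective}\nCurrent task: {task}")
        for task in tasks
    ]
```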

20:09

🌐 Recent Advances in Agent Research and Projects

The speaker reviews recent projects and papers in the field of agents, noting their contributions to the development of agent systems. They mention Auto-GPT, which introduces long-term memory for agent-tool interactions, and BabyAGI, which separates planning and execution steps. The CAMEL paper is highlighted for its use of a simulation environment for agent interaction, and Generative Agents is noted for its complex simulation environment and memory retrieval system. The speaker also discusses the concept of reflection in agent systems, where agents reflect on their actions and update their state accordingly.
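
The Generative Agents retrieval scheme can be sketched as a weighted score over recency, importance, and relevance; the decay rate, equal weighting, and word-overlap relevance below are illustrative simplifications of the paper's approach.

```python
# Score each memory by recency, importance, and relevance; fetch the top k.
import time
from dataclasses import dataclass, field

@dataclass
class Memory:
    text: str
    importance: float                       # e.g. 1-10, assigned at write time
    created: float = field(default_factory=time.time)

def retrieve(memories: list[Memory], query: str, k: int = 3) -> list[Memory]:
    now = time.time()

    def score(m: Memory) -> float:
        recency = 0.99 ** ((now - m.created) / 3600)   # decay per hour
        q, t = set(query.lower().split()), set(m.text.lower().split())
        relevance = len(q & t) / (len(q | t) or 1)
        return recency + m.importance / 10 + relevance  # equal weights

    return sorted(memories, key=score, reverse=True)[:k]
```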

25:11

📚 Conclusion and Invitation for Questions

In the final paragraph, the speaker concludes their presentation and invites the audience to ask questions. They express their readiness to engage in a discussion, indicating that they are open to exploring topics in greater depth and addressing any queries the audience might have. The speaker also acknowledges the time constraint, showing their consideration for the audience's time and the format of the presentation.

Keywords

💡Agents

Agents in this context refer to AI systems that can autonomously determine actions based on user input and previous actions. They are central to the video's theme as they represent a dynamic approach to AI interaction. The script discusses how agents use language models as reasoning engines to interact with the world, making decisions on what actions to take, which is a departure from hardcoded, deterministic sequences.

💡Tool Usage

Tool usage is a concept closely tied to agents, where they connect to external data sources or computational tools like search APIs and databases. This is crucial for overcoming limitations of language models, such as their lack of knowledge about specific user data or their limited mathematical capabilities. The script highlights the importance of tool usage in agents to enhance their functionality and flexibility.

💡ReAct

ReAct, which stands for Reasoning and Acting, is a prompting strategy that accelerated agent development and made agents more reliable. It is the method the script discusses for implementing agents, combining reasoning with action-taking. The script uses ReAct as an example to illustrate how agents can effectively interact with external tools and data to provide accurate responses.

💡Chain of Thought

Chain of Thought is a reasoning method associated with agents, where they think through steps sequentially to reach a conclusion. This method is mentioned in the script as a way to improve the quality and reliability of agent responses. It helps agents to think more deeply about the task at hand, leading to more accurate outcomes.

💡Multi-hop Tasks

Multi-hop tasks are complex tasks that require multiple steps or queries to complete. The script mentions this in the context of interacting with SQL databases, where an agent might need to execute multiple queries to synthesize a response. This concept is important as it showcases the flexibility and power of agents in handling complex tasks.

💡Language Models

Language models are AI systems that generate human-like language. In the script, they are described as the core of agents, used as reasoning engines to determine actions and interactions. The limitations of language models, such as their inability to perform advanced math or access specific user data, are also discussed, highlighting the need for agents to use tools to overcome these limitations.

💡Reliability

Reliability in the context of the script refers to the consistency and dependability of agents in performing tasks and providing accurate responses. The script discusses challenges in achieving reliability in agents, such as ensuring they use tools appropriately and recover from errors effectively. This is a key theme as it addresses the practical application of agents in real-world scenarios.

💡Memory

Memory in the script is discussed in relation to agents remembering previous steps and interactions. It is crucial for agents to maintain context and continuity in their tasks. The script mentions the use of memory to improve the performance of agents, especially in long-running or complex tasks, by recalling past actions and observations.

💡Output Parsers

Output parsers are tools mentioned in the script that convert the string output from language models into actionable code or commands. They are essential for translating the agent's decisions into executable actions, such as invoking specific tools or APIs. The script discusses the challenges and strategies in parsing agent outputs effectively.
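
A minimal sketch of a parse-then-fix loop in that spirit, assuming the agent was asked to answer in JSON with 'action' and 'action_input' keys and assuming a hypothetical `call_llm` helper:

```python
# Try to parse the model's JSON; on failure, send the bad output plus the
# error back to the model and ask it to fix the response.
import json

def call_llm(prompt: str) -> str:
    """Stand-in for a real language model call."""
    raise NotImplementedError

def parse_with_retry(output: str, max_retries: int = 2) -> dict:
    for _ in range(max_retries + 1):
        try:
            parsed = json.loads(output)
            if isinstance(parsed, dict) and "action" in parsed \
                    and "action_input" in parsed:
                return parsed
            error = "missing 'action' or 'action_input' field"
        except json.JSONDecodeError as exc:
            error = str(exc)
        output = call_llm(
            "Fix this response so it is valid JSON with keys 'action' and "
            f"'action_input'.\nError: {error}\nResponse: {output}"
        )
    raise ValueError("could not parse agent output")
```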

💡Evaluation

Evaluation in the script refers to the methods used to assess the performance and effectiveness of agents. It is a critical aspect discussed in the script, as it helps in understanding whether the agents are achieving their goals and taking the correct steps. The script mentions evaluating not only the final result but also the efficiency and correctness of the agent's trajectory.
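
Trajectory evaluation can start as simple checks over the steps the agent took; in the sketch below, `steps` is a list of (tool, tool_input) pairs and the criteria are illustrative.

```python
# Check the trajectory, not just the final answer: did the agent use the
# expected tools, and did it stay within an efficient step budget?

def evaluate_trajectory(steps: list[tuple[str, str]],
                        expected_tools: list[str],
                        max_steps: int) -> dict:
    tools_used = [tool for tool, _ in steps]
    return {
        "used_expected_tools": all(t in tools_used for t in expected_tools),
        "within_step_budget": len(steps) <= max_steps,
        "num_steps": len(steps),
    }
```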

Highlights

Agents are considered the most interesting part of LangChain.

Agents use language models as reasoning engines to determine actions and interactions with the outside world.

The non-deterministic nature of agents allows for flexible and powerful responses to user inputs.

Agents can overcome limitations of language models by connecting to external data sources and computation tools.

Tool usage is a key feature of agents, enhancing their capabilities beyond predefined steps.

The ReAct method (Reasoning and Acting) is a prominent strategy for implementing agents, combining reasoning and action-taking.

Challenges with agents include ensuring reliable tool usage and managing errors in tool selection or execution.

Memory management is crucial for agents to remember previous steps and maintain context over long-running tasks.

The importance of tool descriptions and retrieval for agents to effectively select and use tools.

Strategies to prevent agents from using tools inappropriately, such as in conversational scenarios.

Parsing language model outputs into actionable tool invocations presents a significant challenge.

The use of output parsers to handle and correct errors in agent responses.

Evaluation of agents is complex, involving both the final result and the efficiency and correctness of the steps taken.

The role of memory in personalizing agents and giving them a sense of long-term objectives.

Recent projects like Auto-GPT and BabyAGI build upon the ReAct framework, introducing long-term memory and planning.

The use of simulation environments in agent research for evaluation and interaction studies.

Innovative approaches in generative agents for memory retrieval and reflection, influencing future agent behavior.

Transcripts

play00:00

um so so I'll be talking about agents um and uh  yeah there are many things in in LangChain but  

play00:07

I think agents are probably the most interesting  one to talk about so that's what I'll be doing  

play00:10

I'll cover kind of like what are agents why  use agents the typical implementation of Agents  

play00:16

talk about react which is one of the first uh  prompting strategies that really accelerated and  

play00:22

made reliable the use of Agents um then I'll  then I'll talk a bunch about challenges with  

play00:27

with agents and challenges of getting them to  work reliably getting them to work in production  

play00:32

um I'll then touch a little bit on memory and  segue that into more recent kind of like papers  

play00:37

and projects that do agentic things um I'll  probably skim over the initial slides because  

play00:44

I think most people here are probably familiar  with the idea of Agents but the the core idea  

play00:49

of Agents is using the language model as as a  reasoning engine so using it to determine kind  

play00:55

of like what to do and how to interact with the  outside world and and this means that there is a  

play01:00

non-deterministic kind of like sequence of actions  that'll be taken depending on the user input so  

play01:05

there's no kind of like hard-coded do do a then  do B then do c rather that the agent determines  

play01:11

what actions to do depending on the user input  and depending on results of previous actions  

play01:18

so why why would you want to do this in  the first place um so so one there's this  

play01:23

very tied to agents is the idea of tool usage  and connecting it to the to the outside world  

play01:29

um and so connecting it to other sources of data  or computation like search apis databases these  

play01:33

are these are all very useful to overcome some  of the limitations of language models um such  

play01:38

as they don't know about your data they can't do  math amazingly well um but but those the idea of  

play01:43

tool usage isn't kind of like unique to agents  you can still use tools you can still connect  

play01:48

llms to search engines without using an agent  so so why use an agent and I think some of the  

play01:53

benefits of agents are that they're they're more  flexible they're more powerful they allow you to  

play01:58

kind of like recover from errors better they allow  you to handle kind of like multi-hop tasks again  

play02:03

with this idea of being the reasoning engine and  so an example that I like to use here is thinking  

play02:07

about like interacting with a SQL database you can  do that in a sequence of predetermined steps you  

play02:14

can have kind of like a natural language query  which you then use a language model to convert  

play02:18

to a SQL query which you then pass or you then  execute that get back a result pass that back  

play02:24

to the language model ask it to synthesize it  with respect to the original question and get  

play02:29

kind of this natural language wrapper around  a SQL database very very useful by itself but  

play02:33

there are a lot of edge cases and things that  can go wrong so there could be an error in the  

play02:37

SQL query maybe it hallucinates a table name or  a field name maybe it just writes incorrect SQL  

play02:43

and then there are also queries that need multiple  kind of like queries to be made under the hood in  

play02:48

order to answer and so although kind of like  a simple chain can can you know handle maybe  

play02:55

I don't know 50, 80 percent of the use cases you very  quickly run into these like edge cases where a  

play03:01

more flexible framework like agents helps kind of  like circumvent so the typical implementation of  

play03:08

Agents generally and it's so early in this field  that it kind of feels a bit weird to be talking  

play03:11

about a typical implementation because I'm sure  we'll see a bunch of different variants but but  

play03:15

generally you get a user query you use the the llm  that's that's the agent to choose a tool to use  

play03:23

um you then and also the input to that tool you  then you then do that you take that action you  

play03:28

get back an observation and then you feed that  back into the language model and and you kind of  

play03:33

continue doing this until a stopping condition  is met and so there can be different types of  

play03:37

stopping conditions probably the most common is  the language model itself realizes hey I'm done  

play03:42

with this one with this task with this question  that someone asked of me I should now return to  

play03:47

the user but there can be other more hard-coded  rules and so when we talk about like reliability  

play03:52

of Agents some of these can really help with that  so you know if an agent has done five different  

play03:57

steps in a row and it hasn't reached a final  answer it might be nice to have it just return  

play04:01

something then there are also certain tools that  you can just like return automatic so so basically  

play04:07

the general idea is choose a tool to use observe  the output of that tool and kind of continue going  

play04:13

um so how do you actually like so that was  pseudocode now let's talk about the actual kind of  

play04:18

like ways to get this to do to do what we want um  and the the first and still kind of like the main  

play04:27

um way of doing this the main prompting strategy  slash algorithm slash method for doing this is  

play04:33

react which stands for reasoning and then acting  so re from reasoning act from acting and it's the  

play04:39

the paper is a great paper came out in I think  October out of Princeton about synergizing  

play04:46

two different kind of like methods um and we  and we can we can look at this example which  

play04:50

is which is taken from their paper and see  why it's so effective um so this uh example  

play04:56

comes from the the HotpotQA data set which is  basically a data set where it's asking questions  

play05:01

over Wikipedia Pages where there are multi-hop  usually kind of like two or three intermediate  

play05:08

questions that need to be reasoned about before  answering so we can see so so here there's this  

play05:13

question aside from the Apple remote what other  device can control the program  

play05:18

Apple remote was originally designed to interact  with and so the most basic prompting strategy is  

play05:24

kind of just pass that into the language model  and get back an answer and so we can see that in  

play05:28

in 1A standard and it just returns uh you know  a single answer and we can see that it's wrong  

play05:33

another method that had emerged maybe like a  month or so prior was the idea of like Chain  

play05:38

of Thought reasoning um and so this is usually  uh associated with the let's think step by step  

play05:44

prefix to the response and you can see in red  that there's this like Chain of Thought thing  

play05:50

that's happening as it thinks through step by step  um and uh it returns an answer that that's also  

play05:57

incorrect In this case this has been shown to kind  of like get the agent or get the language model to  

play06:04

to think a little bit better so to speak so it's  yielding higher quality more reliable results  

play06:12

the issue is it still only knows kind of like  what is actually present in the data that the  

play06:17

language model is trained on and so there's  there's another technique that came out which  

play06:20

is basically action only where you give it uh  access to different Tools in this case I think  

play06:26

all the examples in in this uh picture are of  search but it there's I think in the paper it  

play06:32

had search and then look up and so you can  see here that the language model outputs  

play06:38

um kind of like search Apple remote it looks up  Apple remote it gets back an observation it then  

play06:42

does search front row can't find that so that's an  instance of the kind of like recovering from an  

play06:48

error and then it does search front row software  finds uh an answer and then and then finishes  

play06:57

and you can see here that the uh the the output  that it gave yes um kind of loses some of what  

play07:05

it was actually supposed to answer so you've  got this Chain of Thought reasoning which helps  

play07:10

the language model think about what to do then  you've got this action taking step that basically  

play07:16

actually allows it to plug into more kind of like  sources of real data what if you combine them and  

play07:22

that's the idea of react and and that's the idea  of a lot of the the popular agent Frameworks  

play07:28

because again agents use a language model as  a reasoning engine so you want it to be really  

play07:34

good at reasoning so if there are any prompting  techniques you can use to improve its reasoning  

play07:39

you should probably do those and then the big  part of it is also connecting to the tools and  

play07:43

that's where action comes in and so you can see  here it arrives at kind of like the final answer

play07:50

so that's the that's the idea of Agents react is  still one of the more popular implementations for  

play07:55

it but what are some of the current challenges  and and there are a lot of challenges like I  

play08:00

I think most agents are not amazingly production  ready at the moment um and and and these are some  

play08:08

of the reasons why and we can we can I'll walk  through them in a bunch more detail I'll also  

play08:15

leave a lot of time that ends for the question so  I'd love to hear kind of like what you guys are  

play08:19

observing as as issues for getting agents to work  reliably this is probably far from a complete list  

play08:25

so the most basic challenge with agents is  getting them to use tools in appropriate scenarios  

play08:30

um and so you know in in the react paper they  address this challenge by bringing in the the  

play08:37

reasoning aspect of it the Chain of Thought style  prompting asking it to think um kind of like  

play08:43

common uh ways to do this include just like saying  in the instructions you have access to these tools  

play08:48

you should use these to overcome some of your  limitations and and just basically instructing the  

play08:53

language model tool descriptions are really really  important for this if you want the agent to use a  

play08:59

particular tool it should probably have enough  context to know that this tool is good at this  

play09:04

and that that generally comes in the form of like  tool descriptions or some some information about  

play09:09

the tool that's passed into the prompt um that  can that can maybe not scale super well if you've  

play09:15

got a lot of tools because now you've got maybe  these more complex descriptions you want to put  

play09:19

them in the final prompt for the language model  to know what to do with them but you can quickly  

play09:25

run into a kind of like context length issues  and so that's where I think the idea of tool  

play09:29

retrieval comes in handy so you can have hundreds  thousands of tools you can take in you can do some  

play09:35

retrieval step and I think retrieval is another  really interesting topic that I'm not going to  

play09:38

go into too much depth here so for the sake of  this we'll just say it's some embedding search  

play09:42

lookup although there's I think there's a lot more  interesting things to do there you basically do  

play09:46

the retrieval step you get back 5 10 however  many tools that you think are most promising  

play09:50

you can then pass those to the prompt and kind  of have the language model take the final steps  

play09:55

from there a few shot examples I think can also  be really helpful so you can use those to guide  

play10:00

the language model in what to do again I think  the idea of retrieval to find the most relevant  

play10:05

few shot examples is particularly promising  here so if you give it examples similar to the  

play10:09

one it's trying to do those help a lot better than  random examples and then probably the most extreme  

play10:15

version of this is fine tuning the model like like  tool former to really help with tool selection  

play10:21

um there's also a a subtle Second Challenge  which is getting them not to use tools when  

play10:26

they don't need to so a big use case for agents is  having conversational style agents one of the big  

play10:31

problems that we've seen is oftentimes these types  of Agents just want to use tools no matter what  

play10:36

even if they're having a conversation and so again  like the most basic thing you can do is probably  

play10:41

put some some information in the instructions some  reminder in the prompt like hey you don't have to  

play10:47

use a tool you can respond to the user if it seems  like it's more of a conversation that that can get  

play10:53

you so far another kind of like clever hack  that we've seen here is add another tool that  

play10:57

explicitly just returns to the user and then you  know they they like to use tools so they'll but  

play11:03

they'll usually use that tool um so so I thought  that was a pretty uh clever and interesting hack  

play11:09

um a third challenge is the language models  tell you what tool to use and how to use it  

play11:16

um but that's in the form of a string and so  you need to go from that string into some code  

play11:21

or something that can actually be run and so that  involves parsing kind of like the output of the  

play11:25

language model into this this tool invocation  and so um some some tips and tricks and hacks  

play11:30

here are one like the more structured you ask for  the response the easier it is to parse generally  

play11:34

so language models are pretty good at writing uh  Json so we've kind of transitioned a few of our  

play11:40

agents to use that schema still doesn't always  work especially some of the chat models like to  

play11:47

add in a lot of uh kind of like language so we've  introduced uh kind of like this concept of output  

play11:56

parsers which generically encapsulate all the  logic that's needed to parse this response and  

play12:01

we've tried to make that as modular as possible  so if you're seeing errors you can hopefully kind  

play12:06

of like substitute that out very related to  that we also have a concept of like output  

play12:11

parsers that can retry and fix mistakes so and  I think there's there's some subtle uh there's  

play12:18

some subtle differences here that I think are  really cool basically like you know if you have  

play12:22

misformatted schema you can fix that explicitly  by just passing it the the output and the error  

play12:29

and saying fix this response but if you have uh  if you have an output that just forgets one of  

play12:35

the fields like it Returns the action but not  the action input or something like that you  

play12:39

need to provide more information here so I think  there's actually a lot of subtlety in fixing some  

play12:43

of these errors but the basic idea is that you  can try to parse it if you if it fails you can  

play12:48

then try to fix it all this we we currently  encapsulate in this idea like output parsers  

play12:55

um so the the fourth challenge is getting them  to remember previous steps that were taken the  

play12:59

most basic thing the thing that the react paper  does is just keep a list of those steps in memory  

play13:05

um again that starts to run into  some some context uh window issues  

play13:10

um especially when you're dealing with long  running tasks and so the thing that we've  

play13:14

seen done here is again fetch previous steps with  some retrieval method and put those into context  

play13:19

usually we've actually seen a combination of  the two so we've seen using the the N most  

play13:24

recent actions and observations combined with  the K most relevant actions and observations  

play13:33

um incorporating long observations is another  really interesting one this actually comes up  

play13:37

or this came up a lot when we were dealing  with working with apis because apis often  

play13:42

return really big Json blobs that are really  big and hard to put in context um so the the  

play13:48

most common thing that we've done here is just  parse that in some way you can do really simple  

play13:54

stuff like convert that blob to a string and put  the first like thousand characters or something  

play13:58

as the response you can also do some more if if  you know that you're working with a specific API  

play14:03

you can probably write some custom logic to kind  of like take only the relevant keys and put that  

play14:09

if you want to make something general you can also  maybe do something dynamically to like figure out  

play14:14

what key like basically explore the Json object  and figure out what keys to put in that's a bit  

play14:19

more exploratory I would say but the basic idea  is yeah there is this issue of um and so uh zapier  

play14:27

I always have to think about how to pronounce  it but zapier makes you happier so zapier when  

play14:30

they did this with their natural language API not  only did they have something before the API that  

play14:37

was like natural language to some API call they  also spent a lot of time working on the output  

play14:41

and so the output is actually very specifically  I think it's like under like 200 or 300 tokens or  

play14:46

something like that and they did that on purpose  they spent a lot of time thinking about that and  

play14:49

so I I think for Tool usage that is that is really  important as well another more kind of like uh  

play14:56

exploratory way of doing this is also you could  perhaps just store the the long output and  

play15:01

then do retrieval on it when you're when you're  trying to think of like what next steps to take  

play15:07

um agents can often go off track  especially in long-running things  

play15:12

um and so there's kind of like two methods that  I've seen to kind of like keep them on track one  

play15:16

you can just reiterate the objective  uh right before it makes each action  

play15:21

um so and why this works I think we've seen that  with at least with a lot of the current models  

play15:26

with instructions that are earlier in the prompt  it might forget it by the time it gets to the end if  

play15:31

it's a really long prompt so putting it at the end  seems to help and then another really interesting  

play15:35

one that I'll talk about when when I talk about  some of the more recent papers and stuff that have  

play15:40

come out is this idea of separating explicitly a  planning and execution step and basically have one  

play15:47

step that explicitly kind of thinks about these  are kind of like all the objectives that I want  

play15:52

to do at a high level and then a second  step that says okay given this objective  

play15:57

given this one sub-objective now how do I do  this one sub-objective and basically break it  

play16:01

down even more in a hierarchical manner  and there's a there's a good example of  

play16:05

that with baby AGI which I'll talk about in  a bit and another big issue is evaluation of  

play16:10

these things I think evaluation of language  models in general very difficult evaluation  

play16:15

of applications built on top of language models  also very difficult and agents are no exception  

play16:22

I think there's the obvious kind of like evaluate  whether it arrived at the correct result in terms  

play16:28

of in terms of getting metrics on evaluation and  so um yeah you know if you're asking the agent to  

play16:33

produce some answer that's like a natural language  response there's techniques you can do there in  

play16:39

a lot of them in the flavor of asking a language  model to score the expected answer and the actual  

play16:45

answer and come up with some grade and stuff like  that that that applies to the output of Agents as  

play16:49

well but then there's also some agent specific  ones that I think are really interesting mostly  

play16:53

around evaluating this idea of like the agent  trajectory or the intermediate steps and so uh  

play16:59

where we'll actually have something coming out for  this so someone opened a PR that I need to get in  

play17:06

but basically there's there's a lot of like little  different things you can look at like did it take  

play17:09

the correct action is the input to the action  correct um is it the correct number of steps and  

play17:14

by this uh you know like sometimes you and this  is very related to the next one which is like the  

play17:19

most efficient sequence of steps and so there's  a bunch of different things that you can do to  

play17:22

evaluate not only the final answer but like is  the agent getting there like efficiently correctly  

play17:28

um and and those are sometimes just as useful if  not more useful than evaluating the end result  

play17:34

um I'm trying to see what time it is because I  also want to leave lots of time for questions  

play17:39

um but I think I'm good so memory I think  is really interesting as well so we've  

play17:42

obviously talked about like memory of  remembering the AI to tool interactions  

play17:48

um there's also like a more basic idea of  remembering the user to AI interactions um  

play17:54

but I think the third type which is is showing up  in a lot of the recent uh papers on agents is this  

play18:02

idea of like personalization of giving an agent  kind of like its own kind of like uh objective and  

play18:07

own Persona and stuff like that the most obvious  way to do that is just like you encode it in the  

play18:11

prompt you say like hey like you know this is  your job as an agent you're supposed to do this  

play18:16

yada yada yada but I think there's some really  cool work being done on how to kind of like evolve  

play18:21

that over time and give agents a sense of like  this long-term memory um and and one of the papers  

play18:27

in particular around generative agents I think  does a really interesting job of diving into this  

play18:33

um and I think what a lot of people the reason  this is here in the agent section is I think  

play18:39

when when people think of Agents there's the  obvious like kind of like tool usage deciding  

play18:43

what to do but I think agents is also starting to  take on this concept of some kind of like uh more  

play18:51

encapsulated kind of like program that that  adapts over time and memory is a big part of that  

play18:57

um and so I think memories is is uh there's  a lot to explore here so that's why this is  

play19:02

a bit of outlier slide um okay I wanted to  chat very quickly about four uh projects  

play19:08

that that came out in the past two three  weeks specifically how they relate build  

play19:14

upon improve upon this slide the the ReAct  style agent that has been around for a  

play19:19

while first up is auto GPT which I'm  assuming most people have heard of um

play19:27

oh there we go all right um so Auto GPT the one  of the main differences between this and the react  

play19:35

style agents is is just the objective of what  it's trying to solve Auto GPT a lot of the initial  

play19:40

goals are like you know improve or increase my  Twitter following or something like that very kind  

play19:45

of like open-ended broad long-running goals react  on the other hand was designed and benchmarked  

play19:50

on more kind of like um uh short-lived kind of  like uh really immediately quantifiable or more  

play19:58

immediately quantifiable goals and so as a result  one of the things that auto GPT introduced is this  

play20:04

idea of long-term memory between the agent and  tools interactions and using a retriever Vector  

play20:08

store for that which becomes necessary because  now you have this doing like 20 or 30 kind of like  

play20:14

steps and it's just really long-running project  and so it's something that react just didn't need  

play20:18

but due to the change in objectives Auto GPT kind  of had to introduce baby AGI is another popular  

play20:24

one it also has this idea of long-term memory  for the agent tool interactions and this is the  

play20:30

project that introduced separate kind of like  planning and execution steps which I think is  

play20:33

a really interesting idea to improve upon some of  the long-running objectives and so specifically it  

play20:41

comes up with tasks it then takes the first tasks  it then thinks about how to do that which usually  

play20:45

involves actually baby AGI initially didn't have  any tools so it kind of just like made stuff up  

play20:51

um I think I think they're giving it tools now so  it can actually actually execute those things but  

play20:56

the idea of separating the planning execution  steps is I think that's a really interesting  

play21:00

idea that might help with some of the the  reliability and focus issues of longer term agents  

play21:07

uh camel is another paper that came out the  main novel thing here was they put two agents  

play21:14

in a simulation environment which in this case  because it was just two was just a chat room and  

play21:18

had them interact with each other um and so  the agents themselves I think were basically  

play21:23

just kind of like prompted language models so  I don't even think they were hooked up with  

play21:27

tools but going back to this idea of kind of  like memory and personalization um when people  

play21:32

kind of like talk about agents that is part  of what they're talking about and so I think  

play21:36

like the camel paper in my mind the main thing  is this idea of simulation environment there's  

play21:42

um there's maybe like two reasons you might want  to do this and have a simulation environment one  

play21:49

is kind of like practically to maybe like  evaluate an agent if you're kind of like  

play21:53

testing out an agent and you want you want to see  how it's interacting and for whatever reason you  

play21:57

don't want to test it out yourself so you put two  of them and you kind of like make sure they don't  

play22:02

go off the rails or something like that another  one is just uh kind of entertainment purposes so  

play22:07

there are a lot of examples of this by by people  I think there was one with like a VC and a founder  

play22:12

and had them chatting with each other and kind  of like uh solving stuff there so so this is a  

play22:17

little bit entertainment a little bit practical  generative agents was another paper that came out  

play22:23

I think this was maybe like a week and a half ago  so very recent it also it also had a simulation  

play22:28

environment aspect it was more complex so I think  they had like 25 different agents and kind of like  

play22:33

a sims-like world interacting with each other so  a much more complex environment set up um and then  

play22:41

they also did some really cool stuff around  memory and and reflection so memory uh refers  

play22:47

to basically remembering previous uh things that  happened in the world so basically in a simulation  

play22:54

environment they had kind of like things that  happened then they had the agents decide what  

play22:58

to do take actions observe kind of like the  results of those actions observe more things  

play23:03

that came in and so all this is encapsulated  in the idea of memory and then you fetch things  

play23:08

from this memory to inform kind of like their  their their actions in next time sequences  

play23:15

um so they and so there's three kind of like  main components to this memory retrieval thing  

play23:19

they had a time-weighted component which  basically fetched more recent memories they  

play23:24

had an importance-weighted uh piece which  fetched more like important information  

play23:30

um so you know like trivial things like like I  forget what I had for breakfast today um but uh  

play23:36

I don't know what's something that's really but  I remember meeting Charles way back when right so  

play23:40

there's different levels of importance there that  get ascribed to events and so you want to fetch  

play23:44

events that are kind of like more um bigger in  importance and then they had the typical kind of  

play23:49

like relevancy weighted thing so depending on  what situation you're in you want to remember  

play23:52

events that are relevant for that then they also  introduced a really interesting reflection step  

play23:57

which basically after like I think they I think it  was like 20 different steps or something happened  

play24:04

um they would reflect on those things and kind  of like update different states of the world  

play24:09

um and so I think this is uh I've been thinking  about this a bit because I think this idea of like  

play24:14

reflecting on recent things and then updating  state is maybe like a generalization that can  

play24:21

be kind of like applied to a bunch of different  things so so some of the other memory types that  

play24:24

we have in LangChain are we have like an entity  memory type which basically based on conversation  

play24:31

um kind of like extract relevant entities and then  constructs some type of graph and updates that  

play24:36

there's a more General kind of like uh Knowledge  Graph version of that as well and then we also  

play24:40

have kind of like a summary uh uh conversation  memory which based on the conversation updates  

play24:47

a running summary so you can get around some  of the context window limits and so I think if  

play24:50

you look at it sort of through a certain angle  all of those kind of like relate to this idea  

play24:54

of taking recent observations and updating  some State whether that state is like a graph  

play24:59

or just a piece of text or anything like that so  there's also been some other papers recently that  

play25:05

have Incorporated this idea of like reflection I  haven't had time to read those as carefully but I  

play25:10

think that's uh yeah I don't know my personal  take is I think that's really interesting and  

play25:14

something to keep an eye out for the future  um and that's it I have no idea what time it  

play25:20

is because I can't see the time but I'm happy  to take questions until Charles kicks me off
