Harrison Chase - Agents Masterclass from LangChain Founder (LLM Bootcamp)
Summary
TLDR: The speaker discusses the concept of agents in language models, focusing on their use as reasoning engines to interact with external tools. They cover prompting strategies like ReAct, challenges in implementing agents, and recent advancements in memory and personalization. Projects like AutoGPT, BabyAGI, CAMEL, and Generative Agents are highlighted for their contributions to agent development.
Takeaways
- 🧠 Agents are considered the most interesting aspect of LangChain due to their non-deterministic nature and ability to interact with the outside world based on user input and previous actions.
- 🔧 The use of agents is tied to tool usage, connecting to external data sources or computational tools like search APIs and databases to overcome language models' limitations.
- 💡 Agents offer flexibility and power, allowing for better error recovery and handling of multi-hop tasks through their reasoning capabilities as opposed to predetermined step sequences.
- 🔄 The typical implementation of agents involves using a language model to choose a tool and input, execute the action, and feed the observation back into the model until a stopping condition is met.
- 📚 The ReAct method (Reasoning and Acting) is a prominent strategy for implementing agents, combining chain of thought reasoning with action-taking to improve reliability and effectiveness.
- 🛠 Challenges with agents include appropriate tool usage, avoiding unnecessary tool use in conversational contexts, parsing agent instructions into executable code, and maintaining memory of previous steps.
- 🔑 Tool descriptions and retrieval methods are crucial for agents to understand when and how to use various tools effectively.
- 🔄 Output parsers are used to convert the agent's string output into actionable code, addressing issues like misformatting and missing information.
- 🔑 Long-term memory is vital for agents dealing with long-running tasks, and methods like using a retriever vector store have been introduced to handle this.
- 🤖 Recent projects like AutoGPT, BabyAGI, CAMEL, and Generative Agents have built upon the ReAct-style agent framework, introducing concepts like long-term memory, planning vs. execution separation, and simulation environments.
- 🔬 The concept of reflection in agents, where they review past actions and update their state, is a recent development that could generalize to various applications and improve agent reliability.
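The typical implementation described in the takeaways above — choose a tool and input, execute, feed the observation back until a stopping condition is met — can be sketched in a few lines. This is a minimal illustration, not LangChain's actual API; `llm_choose_action` and `fake_llm` are hypothetical stand-ins for a real language-model call.

```python
# Minimal sketch of the agent loop: the LLM picks a tool and input, the
# tool runs, and the observation is fed back until the model signals it
# is done or a hard-coded step limit is hit.

def run_agent(llm_choose_action, tools, user_query, max_steps=5):
    steps = []  # history of (action, action_input, observation)
    for _ in range(max_steps):
        action, action_input = llm_choose_action(user_query, steps)
        if action == "Final Answer":  # model-decided stopping condition
            return action_input
        observation = tools[action](action_input)
        steps.append((action, action_input, observation))
    return "Stopped after reaching the step limit."  # hard-coded stopping rule

# A toy demonstration: a fake "LLM" that searches once, then finishes.
def fake_llm(query, steps):
    if not steps:
        return ("search", query)
    return ("Final Answer", steps[-1][2])

demo_tools = {"search": lambda q: "results for: " + q}
```

The hard-coded `max_steps` fallback matches the talk's point that an agent which has taken several steps without finishing should just return something rather than loop forever.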
Q & A
What is the core idea of agents in the context of language models?
-The core idea of agents is to use the language model as a reasoning engine. This means the language model determines actions and interactions with the outside world based on user input and results of previous actions, rather than following a hard-coded sequence of actions.
Why are agents considered more flexible and powerful than traditional language models?
-Agents are considered more flexible and powerful because they can recover from errors better, handle multi-hop tasks, and act as a reasoning engine. This allows them to adapt their actions based on user input and the outcomes of previous actions, making them more dynamic and responsive.
What is the significance of tool usage in the context of agents?
-Tool usage is significant because it allows agents to connect to other sources of data or computation, overcoming limitations of language models such as lack of knowledge about specific data or poor mathematical capabilities. This integration enhances the agent's ability to perform tasks and provide more accurate responses.
How does the ReAct method improve the reliability of agents?
-ReAct (Reasoning and Acting) is a prompting strategy that combines reasoning with action-taking. It helps agents think through steps and then take actions based on real data, improving the reliability of their responses and actions. This method allows agents to arrive at more accurate and reliable answers by integrating reasoning and tool usage.
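A ReAct-style prompt interleaves Thought / Action / Observation lines, rebuilding the scratchpad on every step so the model reasons before acting. The template wording below is illustrative, not the paper's verbatim prompt:

```python
# Sketch of a ReAct-style prompt builder. The format instructions and the
# scratchpad of prior steps are concatenated and sent to the model, which
# is expected to emit the next Thought / Action pair.

REACT_TEMPLATE = """Answer the question using the tools below.

Tools: {tool_names}

Use this format:
Thought: reason about what to do next
Action: the tool to use
Action Input: the input to the tool
Observation: the tool's result
... (this Thought/Action/Observation loop can repeat)
Final Answer: the answer to the question

Question: {question}
{scratchpad}"""

def build_prompt(question, tool_names, steps):
    scratchpad = ""
    for thought, action, action_input, observation in steps:
        scratchpad += (f"Thought: {thought}\nAction: {action}\n"
                       f"Action Input: {action_input}\nObservation: {observation}\n")
    return REACT_TEMPLATE.format(tool_names=", ".join(tool_names),
                                 question=question, scratchpad=scratchpad)
```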
What are some challenges in getting agents to work reliably in production?
-Challenges include getting agents to use tools appropriately, avoiding unnecessary tool usage, parsing the language model's output into actionable code, remembering previous steps, and evaluating the agent's trajectory and efficiency. These challenges affect the agent's ability to perform tasks reliably and efficiently in real-world applications.
How does the concept of memory play a role in the functionality of agents?
-Memory is crucial for agents as it allows them to recall previous interactions and steps, which can inform their current actions. This can include remembering user interactions, AI-to-tool interactions, and personalizing the agent's behavior over time. Memory helps in maintaining context and continuity in agent-based systems.
What is the role of tool descriptions in helping agents decide when to use tools?
-Tool descriptions provide context and information about the capabilities and limitations of each tool. This helps the agent understand when and how to use specific tools to overcome its limitations and perform tasks more effectively.
How can retrieval methods help in managing the complexity of tool usage by agents?
-Retrieval methods, such as embedding search lookup, can help agents manage the complexity of tool usage by retrieving the most relevant tools based on the task at hand. This can reduce the need for lengthy tool descriptions in the prompt and make the agent's decision-making process more efficient.
What are some strategies to prevent agents from using tools unnecessarily?
-Strategies include providing instructions in the prompt that remind the agent it doesn't always need to use tools, adding a tool that explicitly returns to the user, and using output parsers to correct mistakes in tool usage. These strategies help keep the agent focused on the task and prevent unnecessary tool usage.
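The "add a tool that explicitly returns to the user" hack mentioned above can be sketched like this — tool-happy agents still get to call a tool, but that tool is simply a direct reply, which the dispatcher treats as a stopping action. The names here are illustrative, not LangChain's API:

```python
# Sketch of the "return to the user" tool: conversational agents often
# insist on calling a tool, so one of the tools is just a direct reply.

def make_tools(search_fn):
    return {
        "search": search_fn,
        "respond_to_user": lambda message: message,  # the message IS the answer
    }

def dispatch(action, action_input, tools):
    result = tools[action](action_input)
    done = (action == "respond_to_user")  # treat this tool as terminal
    return result, done
```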
How can the concept of reflection be beneficial in the context of agent-based systems?
-Reflection allows agents to review their recent actions and update their understanding of the world or task at hand. This can help in maintaining focus, improving decision-making, and adapting to new information. It is particularly useful in long-running tasks where continuous learning and adaptation are necessary.
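Reflection as described above — periodically reviewing recent actions and distilling them into a higher-level note — can be sketched as follows. `summarize` is a hypothetical stand-in for a language-model call that compresses the recent steps:

```python
# Sketch of reflection: every few steps the agent reviews its recent
# actions and writes a distilled note into long-term memory.

def reflect(recent_steps, long_term_memory, summarize, every=3):
    if len(recent_steps) >= every:
        note = summarize(recent_steps)
        long_term_memory.append(note)
        return []  # the recent buffer is cleared after reflection
    return recent_steps
```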
Outlines
🤖 Introduction to Agents in Language Models
The speaker begins by introducing the topic of 'agents' in the context of language models. They highlight the interest in agents due to their dynamic and non-deterministic nature, which allows for flexible interaction with the user and the environment. The core idea is to use language models as reasoning engines to determine actions based on user input and previous actions. The speaker also touches on the benefits of agents, such as their ability to use tools, connect to external data sources, and handle multi-hop tasks more effectively than traditional language models.
🔍 The REACT Method and Tool Usage in Agents
This paragraph delves into the ReAct (Reasoning and Acting) method, a prominent strategy for implementing agents. The speaker discusses how ReAct combines reasoning with action-taking, utilizing language models to decide on the appropriate tools and actions. They provide an example from the HotpotQA dataset to illustrate the effectiveness of ReAct in answering multi-hop questions. The paragraph also addresses the challenges of making agents reliable, especially in production environments, and the importance of tool descriptions and retrieval in guiding agents to use tools effectively.
🛠️ Challenges in Agent Implementation and Tool Use
The speaker outlines several challenges in implementing agents, focusing on the appropriate use of tools. They discuss the need for agents to understand when to use tools and when not to, emphasizing the importance of instructions and tool descriptions. The paragraph also covers the technical challenge of parsing language model outputs into actionable code, introducing the concept of output parsers. Additionally, the speaker mentions the issue of agents remembering previous steps, suggesting the use of retrieval methods to manage long-running tasks and maintain context.
🏗️ Building and Evaluating Agent Systems
This paragraph discusses the complexities of building and evaluating agent systems. The speaker highlights the need for agents to stay on track and reiterate objectives, suggesting methods like reiterating the objective before each action and separating planning and execution steps. Evaluation methods for agents are also explored, including assessing the correctness of the final answer and evaluating the efficiency and correctness of the agent's trajectory. The speaker emphasizes the importance of memory in agent systems, both for tool interactions and user interactions.
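The trajectory-evaluation idea in this section — grading not just the final answer but whether the agent took the right actions in a reasonable number of steps — can be sketched as a simple checker. The metrics below are illustrative, not a standard benchmark:

```python
# Sketch of trajectory evaluation: compare the actions the agent actually
# took against an expected sequence, and flag inefficient runs.

def evaluate_trajectory(steps, expected_actions, max_reasonable_steps):
    # steps: list of (action, action_input, observation)
    actions = [action for action, _input, _obs in steps]
    return {
        "correct_actions": actions == expected_actions,
        "num_steps": len(actions),
        "efficient": len(actions) <= max_reasonable_steps,
    }
```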
🌐 Recent Advances in Agent Research and Projects
The speaker reviews recent projects and papers in the field of agents, noting their contributions to the development of agent systems. They mention AutoGPT, which introduces long-term memory for agent-tool interactions, and BabyAGI, which separates planning and execution steps. The CAMEL paper is highlighted for its use of a simulation environment for agent interaction, and Generative Agents is noted for its complex simulation environment and memory retrieval system. The speaker also discusses the concept of reflection in agent systems, where agents reflect on their actions and update their states accordingly.
📚 Conclusion and Invitation for Questions
In the final paragraph, the speaker concludes their presentation and invites the audience to ask questions. They express their readiness to engage in a discussion, indicating that they are open to exploring topics in greater depth and addressing any queries the audience might have. The speaker also acknowledges the time constraint, showing their consideration for the audience's time and the format of the presentation.
Keywords
💡Agents
💡Tool Usage
💡ReAct
💡Chain of Thought
💡Multi-hop Tasks
💡Language Models
💡Reliability
💡Memory
💡Output Parsers
💡Evaluation
Highlights
Agents are considered the most interesting aspect of LangChain.
Agents use language models as reasoning engines to determine actions and interactions with the outside world.
The non-deterministic nature of agents allows for flexible and powerful responses to user inputs.
Agents can overcome limitations of language models by connecting to external data sources and computation tools.
Tool usage is a key feature of agents, enhancing their capabilities beyond predefined steps.
The ReAct method (Reasoning and Acting) is a prominent strategy for implementing agents, combining reasoning and action-taking.
Challenges with agents include ensuring reliable tool usage and managing errors in tool selection or execution.
Memory management is crucial for agents to remember previous steps and maintain context over long-running tasks.
The importance of tool descriptions and retrieval for agents to effectively select and use tools.
Strategies to prevent agents from using tools inappropriately, such as conversational scenarios.
Parsing language model outputs into actionable tool invocations presents a significant challenge.
The use of output parsers to handle and correct errors in agent responses.
Evaluation of agents is complex, involving both the final result and the efficiency and correctness of the steps taken.
The role of memory in personalizing agents and giving them a sense of long-term objectives.
Recent projects like AutoGPT and BabyAGI build upon the ReAct framework, introducing long-term memory and planning.
The use of simulation environments in agent research for evaluation and interaction studies.
Innovative approaches in generative agents for memory retrieval and reflection, influencing future agent behavior.
Transcripts
um so so I'll be talking about agents um and uh yeah there are many things in in LangChain but
I think agents are probably the most interesting one to talk about so that's what I'll be doing
I'll cover kind of like what our agents why use agents the typical implementation of Agents
talk about react which is one of the first uh prompting strategies that really accelerated and
made reliable the use of Agents um then I'll then I'll talk a bunch about challenges with
with agents and challenges of getting to them to work reliably getting them to work in production
um I'll then touch a little bit on memory and segue that into more recent kind of like papers
and projects that do agentic things um I'll probably skim over the initial slides because
I think most people here are probably familiar with the idea of Agents but the the core idea
of Agents is using the language model as as a reasoning engine so using it to determine kind
of like what to do and how to interact with the outside world and and this means that there is a
non-deterministic kind of like sequence of actions that'll be taken depending on the user input so
there's no kind of like hard-coded do do a then do B then do c rather that the agent determines
what actions to do depending on the user input and depending on results of previous actions
so why why would you want to do this in the first place um so so one there's this
very tied to agents is the idea of tool usage and connecting it to the to the outside world
um and so connecting it to other sources of data or computation like search apis databases these
are these are all very useful to overcome some of the limitations of language models um such
as they don't know about your data they can't do math amazingly well um but but those the idea of
tool usage isn't kind of like unique to agents you can still use tools you can still connect
llms to search engines without using an agent so so why use an agent and I think some of the
benefits of agents are that they're they're more flexible they're more powerful they allow you to
kind of like recover from errors better they allow you to handle kind of like multi-hop tasks again
with this idea of being the reasoning engine and so an example that I like to use here is thinking
about like interacting with a SQL database you can do that in a sequence of predetermined steps you
can have kind of like a natural language query which you then use a language model to convert
to a SQL query which you then pass or you then execute that get back a result pass that back
to the language model ask it to synthesize it with respect to the original question and get
kind of this natural language wrapper around a SQL database very very useful by itself but
there are a lot of edge cases and things that can go wrong so there could be an error in the
SQL query maybe it hallucinates a table name or a field name maybe it just writes incorrect SQL
and then there are also queries that need multiple kind of like queries to be made under the hood in
order to answer and so although kind of like a simple chain can can you know handle maybe
I don't know 50 80 of the use cases you very quickly run into these like edge cases where a
more flexible framework like agents helps kind of like circumvent so the typical implementation of
Agents generally and it's so early in this field that it kind of feels a bit weird to be talking
about a typical implementation because I'm sure we'll see a bunch of different variants but but
generally you get a user query you use the the llm that's that's the agent to choose a tool to use
um you then and also the input to that tool you then you then do that you take that action you
get back an observation and then you feed that back into the language model and and you kind of
continue doing this until a stopping condition is met and so there can be different types of
stopping conditions probably the most common is the language model itself realizes hey I'm done
with this one with this task with this question that someone asked of me I should now return to
the user but there can be other more hard-coded rules and so when we talk about like reliability
of Agents some of these can really help with that so you know if an agent has done five different
steps in a row and it hasn't reached a final answer it might be nice to have it just return
something then there are also certain tools that you can just like return automatic so so basically
the general idea is choose a tool to use observe the output of that tool and kind of continue going
um so how do you actually like so that was pseudocode now let's talk about the actual kind of
like ways to get this to do to do what we want um and the the first and still kind of like the main
um way of doing this the main prompting strategy slash algorithm slash method for doing this is
react which stands for reasoning and then acting so re from reasoning act from acting and it's the
the paper is a great paper came out in I think October out of Princeton about synergizing
two different kind of like methods um and we and we can we can look at this example which
is which is taken from their paper and see why it's so effective um so this uh example
comes from the HotpotQA dataset which is basically a dataset where it's asking questions
over Wikipedia Pages where there are multi-hop usually kind of like two or three intermediate
questions that need to be reasoned about before answering so we can see so so here there's this
question aside from the Apple Remote what other device can control the program Apple Remote
was originally designed to interact with and so the most basic prompting strategy is
kind of just pass that into the language model and get back an answer and so we can see that in
in 1A standard and it just returns uh you know a single answer and we can see that it's wrong
another method that had emerged maybe like a month or so prior was the idea of like Chain
of Thought reasoning um and so this is usually uh associated with the let's think step by step
prefix to the response and you can see in red that there's this like Chain of Thought thing
that's happening as it thinks through step by step um and uh it returns an answer that that's also
incorrect In this case this has been shown to kind of like get the agent or get the language model to
to think a little bit better so to speak so it's yielding higher quality more reliable results
the issue is it still only knows kind of like what is actually present in the data that the
language model is trained on and so there's there's another technique that came out which
is basically action only where you give it uh access to different Tools in this case I think
all the examples in in this uh picture are of search but it there's I think in the paper it
had search and then look up and so you can see here that the language model outputs
um kind of like search Apple remote it looks up Apple remote it gets back in observation it then
does search front row, can't find it, so that's an instance of the kind of like recovering from an
error and then it does search front row software finds uh an answer and then and then finishes
and you can see here that the uh the the output that it gave yes um kind of loses some of what
it was actually supposed to answer so you've got this Chain of Thought reasoning which helps
the language model think about what to do then you've got this action taking step that basically
actually allows it to plug into more kind of like sources of real data what if you combine them and
that's the idea of react and and that's the idea of a lot of the the popular agent Frameworks
because again agents use a language model as a reasoning engine so you want it to be really
good at reasoning so if there are any prompting techniques you can use to improve its reasoning
you should probably do those and then the big part of it is also connecting to the tools and
that's where action comes in and so you can see here it arrives at kind of like the final answer
so that's the that's the idea of Agents react is still one of the more popular implementations for
it but what are some of the current challenges and and there are a lot of challenges like I
I think most agents are not amazingly production ready at the moment um and and and these are some
of the reasons why and we can we can I'll walk through them in a bunch more detail I'll also
leave a lot of time that ends for the question so I'd love to hear kind of like what you guys are
observing as as issues for getting agents to work reliably this is probably far from a complete list
so the most basic challenge with agents is getting them to use tools in appropriate scenarios
um and so you know in in the react paper they address this challenge by bringing in the the
reasoning aspect of it the Chain of Thought style prompting asking it to think um kind of like
common uh ways to do this include just like saying in the instructions you have access to these tools
you should use these to overcome some of your limitations and and just basically instructing the
language model tool descriptions are really really important for this if you want the agent to use a
particular tool it should probably have enough context to know that this tool is good at this
and that that generally comes in the form of like tool descriptions or some some information about
the tool that's passed into the prompt um that can that can maybe not scale super well if you've
got a lot of tools because now you've got maybe these more complex descriptions you want to put
them in the final prompt for the language model to know what to do with them but you can quickly
run into a kind of like context length issues and so that's where I think the idea of tool
retrieval comes in handy so you can have hundreds thousands of tools you can take in you can do some
retrieval step and I think retrieval is another really interesting topic that I'm not going to
go into too much depth here so for the sake of this we'll just say it's some embedding search
lookup although there's I think there's a lot more interesting things to do there you basically do
the retrieval step you get back 5 10 however many tools that you think are most promising
you can then pass those to the prompt and kind of have the language model take the final steps
from there a few shot examples I think can also be really helpful so you can use those to guide
the language model in what to do again I think the idea of retrieval to find the most relevant
few shot examples is particularly promising here so if you give it examples similar to the
one it's trying to do those help a lot better than random examples and then probably the most extreme
version of this is fine tuning the model like like Toolformer to really help with tool selection
um there's also a a subtle Second Challenge which is getting them not to use tools when
they don't need to so a big use case for agents is having conversational style agents one of the big
problems that we've seen is oftentimes these types of Agents just want to use tools no matter what
even if they're having a conversation and so again like the most basic thing you can do is probably
put some some information in the instructions some reminder in the prompt like hey you don't have to
use a tool you can respond to the user if it seems like it's more of a conversation that that can get
you so far another kind of like clever hack that we've seen here is add another tool that
explicitly just returns to the user and then you know they they like to use tools so they'll but
they'll usually use that tool um so so I thought that was a pretty uh clever and interesting hack
um a third challenge is the language models tell you what tool to use and how to use it
um but that's in the form of a string and so you need to go from that string into some code
or something that can actually be run and so that involves parsing kind of like the output of the
language model into this this tool invocation and so um some some tips and tricks and hacks
here are one like the more structured you ask for the response the easier it is to parse generally
so language models are pretty good at writing uh Json so we've kind of transitioned a few of our
agents to use that schema still doesn't always work especially some of the chat models like to
add in a lot of uh kind of like language so we've introduced uh kind of like this concept of output
parsers which generically encapsulate all the logic that's needed to parse this response and
we've tried to make that as modular as possible so if you're seeing areas you can hopefully kind
of like substitute that out very related to that we also have a concept of like output
parsers that can retry and fix mistakes so and I think there's there's some subtle uh there's
some subtle differences here that I think are really cool basically like you know if you have
misformatted schema you can fix that explicitly by just passing it the the output and the error
and saying fix this response but if you have uh if you have an output that just forgets one of
the fields like it Returns the action but not the action input or something like that you
need to provide more information here so I think there's actually a lot of subtlety in fixing some
of these errors but the basic idea is that you can try to parse it if you if it fails you can
then try to fix it all this we we currently encapsulate in this idea like output parsers
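The parse-then-fix flow described here could be sketched as follows. This is an illustration of the idea, not LangChain's actual output-parser classes; `fix_llm` is a hypothetical stand-in for a model call that receives the bad output plus the error and returns a repaired string:

```python
import json

# Sketch of an output parser with a fixing step: try to parse the model's
# string into a tool invocation; on failure, ask the model to repair its
# own output and parse again.

def parse_action(text):
    data = json.loads(text)  # raises on misformatted output
    if "action" not in data or "action_input" not in data:
        raise ValueError("missing action or action_input")
    return data["action"], data["action_input"]

def parse_with_fixing(text, fix_llm, max_retries=1):
    for _ in range(max_retries + 1):
        try:
            return parse_action(text)
        except (ValueError, json.JSONDecodeError) as err:
            # feed the bad output and the error message back for repair
            text = fix_llm(text, str(err))
    raise ValueError("could not parse model output")
```

Note the subtlety the talk points out: a malformed schema can be fixed from the output and error alone, but a missing field (an action with no action input) needs more context to repair.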
um so the the fourth challenge is getting them to remember previous steps that were taken the
most basic thing the thing that the react paper does is just keep a list of those steps in memory
um again that starts to run into some some context uh window issues
um especially when you're dealing with long running tasks and so the thing that we've
seen done here is again fetch previous steps with some retrieval method and put those into context
usually we've actually seen a combination of the two so we've seen using the the N most
recent actions and observations combined with the K most relevant actions and observations
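The combined memory fetch described just above — the N most recent steps plus the K most relevant older ones — could be sketched like this, with a toy word-overlap score standing in for a real retriever:

```python
# Sketch of combined recency + relevance memory: keep the most recent steps
# verbatim and add the most relevant older steps, scored here by a crude
# word-overlap function instead of a real embedding lookup.

def relevance(query, step_text):
    q, s = set(query.lower().split()), set(step_text.lower().split())
    return len(q & s)

def fetch_context(query, history, n_recent=2, k_relevant=2):
    recent = history[-n_recent:] if n_recent else []
    older = history[:-n_recent] if n_recent else list(history)
    relevant = sorted(older, key=lambda step: relevance(query, step),
                      reverse=True)[:k_relevant]
    # relevant older steps first, then the most recent ones
    return relevant + recent
```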
um incorporating long observations is another really interesting one this actually comes up
or this came up a lot when we were dealing with working with apis because apis often
return really big Json blobs that are really big and hard to put in context um so the the
most common thing that we've done here is just parse that in some way you can do really simple
stuff like convert that blob to a string and put the first like thousand characters or something
as the response you can also do some more if if you know that you're working with a specific API
you can probably write some custom logic to kind of like take only the relevant keys and put that
if you want to make something general you can also maybe do something dynamically to like figure out
what key like basically explore the Json object and figure out what keys to put in that's a bit
more exploratory I would say but the basic idea is yeah there is this issue of um and so uh zapier
I always have to think about how to pronounce it but zapier makes you happier so zapier when
they did this with their natural language API not only did they have something before the API that
was like natural language to some API call they also spent a lot of time working on the output
and so the output is actually very specifically I think it's like under like 200 or 300 tokens or
something like that and they did that on purpose they spent a lot of time thinking about that and
so I I think for Tool usage that is that is really important as well another more kind of like uh
exploratory way of doing this is also you could perhaps just store the the long output and
then do retrieval on it when you're when you're trying to think of like what next steps to take
um agents can often go off track especially in long-running things
um and so there's kind of like two methods that I've seen to kind of like keep them on track one
you can just reiterate the objective uh right before it makes each action
um so and why this works I think we've seen that with at least with a lot of the current models
with instructions that are earlier in the prompt it might forget it by the time it gets the end if
it's a really long prompt so putting it at the end seems to help and then another really interesting
one that I'll talk about when when I talk about some of the more recent papers and stuff that have
come out is this idea of separating explicitly a planning and execution step and basically have one
step that explicitly kind of thinks about these are kind of like all the objectives that I want
to do at a high level and then a second step that says okay given this objective
given this one sub-objective now how do I do this one sub-objective and basically break it
down even more in a hierarchical manner and there's a there's a good example of
that with baby AGI which I'll talk about in a bit and another big issue is evaluation of
these things I think evaluation of language models in general very difficult evaluation
of applications built on top of language models also very difficult and agents are no exception
I think there's the obvious kind of like evaluate whether it arrived at the correct result in terms
of in terms of getting metrics on evaluation and so um yeah you know if you're asking the agent to
produce some answer that's like a natural language response there's techniques you can do there in
a lot of them in the flavor of asking a language model to score the expected answer and the actual
answer and come up with some grade and stuff like that that that applies to the output of Agents as
well but then there's also some agent specific ones that I think are really interesting mostly
around evaluating this idea of like the agent trajectory or the intermediate steps and so uh
where we'll actually have something coming out for this so someone opened a PR that I need to get in
but basically there's there's a lot of like little different things you can look at like did it take
the correct action is the input to the action correct um is it the correct number of steps and
by this uh you know like sometimes you and this is very related to the next one which is like the
most efficient sequence of steps and so there's a bunch of different things that you can do to
evaluate not only the final answer but like is the agent getting there like efficiently correctly
um and and those are sometimes just as useful if not more useful than evaluating the end result
um I'm trying to see what time it is because I also want to leave lots of time for questions
um but I think I'm good so memory I think is really interesting as well so we've
obviously talked about like memory of remembering the AI to tool interactions
um there's also like a more basic idea of remembering the user to AI interactions um
but I think the third type which is is showing up in a lot of the recent uh papers on agents is this
idea of like personalization of giving an agent kind of like its own kind of like uh objective and
own Persona and stuff like that the most obvious way to do that is just like you encode it in the
prompt you say like hey like you know this is your job as an agent you're supposed to do this
yada yada yada but I think there's some really cool work being done on how to kind of like evolve
that over time and give agents a sense of like this long-term memory um and and one of the papers
in particular around generative agents I think does a really interesting job of diving into this
The reason this is here in the agents section is that when people think of agents, there's the obvious part, tool usage and deciding what to do, but agents are also starting to take on this concept of a more encapsulated program that adapts over time, and memory is a big part of that. So there's a lot to explore with memory; that's why this is a bit of an outlier slide. Okay, I wanted to chat very quickly about four projects that came out in the past two or three weeks, specifically how they relate to, build upon, and improve upon the ReAct-style agent that has been around for a while. First up is AutoGPT, which I'm assuming most people have heard of.
Oh, there we go. All right. So AutoGPT: one of the main differences between this and ReAct-style agents is just the objective it's trying to solve. A lot of AutoGPT's initial goals are things like "improve or increase my Twitter following": very open-ended, broad, long-running goals. ReAct, on the other hand, was designed and benchmarked on short-lived, more immediately quantifiable goals. And so one of the things AutoGPT introduced is long-term memory of the agent-to-tool interactions, using a retriever vector store for that. This becomes necessary because now the agent is doing 20 or 30 steps in a really long-running project; it's something ReAct just didn't need, but due to the change in objectives AutoGPT had to introduce it.

BabyAGI is another popular one. It also has this idea of long-term memory for the agent-tool interactions, and it's the project that introduced separate planning and execution steps, which I think is a really interesting idea for improving on long-running objectives. Specifically, it comes up with tasks, takes the first task, and then thinks about how to do it. BabyAGI initially didn't have any tools, so it kind of just made stuff up; I think they're giving it tools now so it can actually execute those things. But the idea of separating the planning and execution steps is, I think, a really interesting idea that might help with some of the reliability and focus issues of longer-running agents.
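That separate plan/execute loop can be sketched in a few lines. This is a simplified illustration, not BabyAGI's actual code: the real project re-prioritizes tasks with an LLM and stores results in a vector store, whereas here `plan()` and `execute()` are deterministic stand-ins for LLM calls.

```python
# Sketch of a BabyAGI-style loop with separate planning and execution
# steps. plan() and execute() are stand-ins for LLM calls.

def plan(objective, completed):
    """Break the objective into remaining tasks (stand-in for an LLM planner)."""
    all_tasks = ["research topic", "draft outline", "write summary"]
    return [t for t in all_tasks if t not in completed]

def execute(task, memory):
    """Perform one task, with access to results of earlier tasks."""
    return f"result of {task!r} (context: {len(memory)} prior results)"

def run(objective, max_steps=10):
    memory = []     # long-term memory of task results (a vector store in practice)
    completed = []
    for _ in range(max_steps):
        tasks = plan(objective, completed)    # planning step
        if not tasks:
            break                             # nothing left to do
        task = tasks[0]                       # take the highest-priority task
        memory.append(execute(task, memory))  # execution step
        completed.append(task)
    return memory

results = run("write a report")
```

The separation means the planner only has to think about *what* to do next, while the executor only has to think about *how* to do one task, which is part of why it may help with the focus issues of long-running agents.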
Camel is another paper that came out. The main novel thing here is that they put two agents in a simulation environment, which in this case, because it was just two agents, was simply a chat room, and had them interact with each other. The agents themselves were, I think, basically just prompted language models; I don't even think they were hooked up with tools. But going back to this idea of memory and personalization: when people talk about agents, that is part of what they're talking about. So for the Camel paper, in my mind, the main thing is this idea of a simulation environment. There are maybe two reasons you might want one. One is practical: to evaluate an agent. If you're testing an agent and want to see how it interacts, and for whatever reason you don't want to test it yourself, you put two of them together and make sure they don't go off the rails or something like that. The other is just entertainment; there are a lot of examples of this. I think there was one with a VC and a founder chatting with each other and solving stuff. So it's a little bit entertainment, a little bit practical.

Generative Agents was another paper that came out.
I think this was maybe a week and a half ago, so very recent. It also had a simulation environment aspect, but more complex: I think they had 25 different agents in a Sims-like world interacting with each other, so a much more involved environment setup. And then they also did some really cool stuff around memory and reflection. Memory here refers to remembering previous things that happened in the world. In the simulation environment, things happened; the agents decided what to do, took actions, observed the results of those actions, observed more things coming in, and all of this is encapsulated in the idea of memory. Then you fetch things from this memory to inform their actions in subsequent time steps.
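That observe-act-record loop amounts to an append-only "memory stream". Here's a toy sketch of the idea, with a hypothetical class and a logical tick counter instead of real timestamps; in the actual paper, the importance score for each record comes from an LLM rating.

```python
# Sketch of a generative-agents-style memory stream: every observation
# and action is appended as a timestamped record with an importance score.

class MemoryStream:
    """Append-only log of everything an agent observes or does."""

    def __init__(self):
        self.records = []
        self.clock = 0  # logical time; the paper uses in-game timestamps

    def add(self, text, importance):
        # importance would come from an LLM rating in the actual paper
        self.records.append({"t": self.clock, "text": text, "importance": importance})
        self.clock += 1

    def recent(self, n):
        """Return the text of the n most recent records, oldest first."""
        return [r["text"] for r in self.records[-n:]]

stream = MemoryStream()
stream.add("woke up and made breakfast", importance=1)
stream.add("met Charles at the conference", importance=9)
stream.add("walked to the park", importance=2)
```

Everything the agent later does is conditioned on records fetched from this stream, which is where the retrieval weighting described next comes in.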
So there are three main components to this memory retrieval. They had a time-weighted component, which fetches more recent memories. They had an importance-weighted piece, which fetches more important information: trivial things, like what I had for breakfast today, get forgotten, but I remember meeting Charles way back when. So there are different levels of importance that get ascribed to events, and you want to fetch events that are bigger in importance. And then they had the typical relevancy-weighted component: depending on what situation you're in, you want to remember events that are relevant to it. They also introduced a really interesting reflection step, where after, I think, something like 20 different steps had happened, they would reflect on those things and update different states of the world.
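The three retrieval weights might look something like the following. This is a toy sketch, not the paper's implementation: the paper normalizes each term and uses embedding similarity for relevance, which is faked here with keyword overlap, and the exact decay constant is an assumption.

```python
# Sketch of the three-part retrieval score from the Generative Agents
# paper: recency + importance + relevance, summed.

def score(memory, query_words, now, decay=0.9):
    recency = decay ** (now - memory["t"])          # recent memories score higher
    importance = memory["importance"] / 10          # LLM-rated importance, 0-10
    overlap = len(query_words & set(memory["text"].split()))
    relevance = overlap / max(len(query_words), 1)  # stand-in for embedding similarity
    return recency + importance + relevance

def retrieve(memories, query, now, k=2):
    """Return the k highest-scoring memories for this query at this time."""
    words = set(query.split())
    return sorted(memories, key=lambda m: score(m, words, now), reverse=True)[:k]

memories = [
    {"t": 0, "text": "had toast for breakfast", "importance": 1},
    {"t": 1, "text": "met Charles at the bootcamp", "importance": 9},
    {"t": 5, "text": "saw Charles give a talk", "importance": 6},
]
top = retrieve(memories, "what do I know about Charles", now=6)
```

With this scoring, the trivial breakfast memory loses to the two Charles memories: it is older, rated unimportant, and shares no words with the query, even though all three terms contribute to every memory's score.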
I've been thinking about this a bit, because this idea of reflecting on recent things and then updating state is maybe a generalization that can be applied to a bunch of different things. Consider some of the other memory types we have in LangChain. We have an entity memory type which, based on the conversation, extracts relevant entities, constructs some type of graph, and updates it. There's a more general knowledge-graph version of that as well. And then we also have a summary conversation memory which, based on the conversation, updates a running summary, so you can get around some of the context window limits. If you look at it through a certain angle, all of those relate to this idea of taking recent observations and updating some state, whether that state is a graph, a piece of text, or anything like that. There have also been some other papers recently that have incorporated this idea of reflection. I haven't had time to read those as carefully, but my personal take is that it's really interesting and something to keep an eye on for the future. And that's it. I have no idea what time it is, because I can't see the clock, but I'm happy to take questions until Charles kicks me off.