5 Problems Getting LLM Agents into Production
Summary
TLDR: This video discusses the top five challenges in deploying AI agents into production, focusing on reliability as the primary issue. The speaker emphasizes the need for agents to be consistently reliable, as most reach only around 60-70% reliability, far short of the 99% that companies would accept. Other issues include agents getting stuck in loops, the need for custom tools, self-checking mechanisms, and explainability. The video suggests strategies for mitigating these problems and hints at future content on building and debugging agents.
Takeaways
- 🛡️ Reliability is the top concern for deploying AI agents into production, with most agents struggling to achieve even 60-70% reliability, far from the desired 'five nines' or even 'two nines' (99%).
- 🔁 Agents often fall into excessively long loops, which can be due to failing tools or the LLMs deciding to repeat parts of the process unnecessarily, leading to inefficiency and potential costs.
- 🛠️ Customizing tools for specific use cases is crucial as generic tools may not meet the needs of an agent and can lead to failures in the agent's operation.
- 🔄 The importance of creating intelligent tools that can manipulate and prepare data for LLMs effectively, and handle failures in a way that prevents endless loops.
- 🔍 Agents require self-checking mechanisms to ensure the outputs are useful, such as running unit tests for code generation or verifying the existence of URLs.
- 📑 Explainability is key for user trust; agents should provide explanations or citations for their outputs to show the reasoning behind decisions or results.
- 🐞 Debugging is an essential part of agent development; logs and outputs should be intelligently designed to help trace and understand where and why an agent fails.
- 📊 Minimizing decision points in an agent's operation can lead to more straightforward and reliable outcomes, reducing the complexity and potential for errors.
- 💡 The script suggests that not all tasks require the complexity of an LLM; sometimes, a simpler, more direct approach might be more effective.
- 🚀 The speaker plans to create more videos discussing building agents with frameworks like LangGraph and even without frameworks, using plain Python for certain tasks.
- ❓ The video encourages viewers to think critically about their own agent designs, assessing decision points and reliability to improve their agents' performance.
Q & A
What are the five common problems discussed in the video script that people face when trying to get their AI agents into production?
-The script discusses five key issues: 1) Reliability, with agents often not meeting the desired level of consistency; 2) Excessive looping, where agents get stuck in repetitive processes; 3) Tool issues, including the need for custom tools tailored to specific use cases; 4) Self-checking, where agents should be able to verify the usefulness of their outputs; 5) Lack of explainability, which is important for users to understand and trust the agent's decisions.
Why is reliability considered the number one problem for AI agents according to the script?
-Reliability is the top issue because companies typically require a high level of consistency for production use, often expecting 'five nines' or 99.999% reliability. Most agents, however, are only able to achieve around 60-70% reliability, which is insufficient for production needs.
What is the concern with agents going into excessively long loops?
-Long loops can occur for various reasons, such as a failing tool or the agent deciding to repeat a process unnecessarily. This can lead to inefficiency, increased costs if using an expensive model, and a lack of progress, ultimately hindering the agent's performance.
Why is it important to have custom tools for AI agents?
-Custom tools are crucial because they can be tailored to specific use cases, ensuring that the agent can filter inputs, manipulate data, and prepare it in a way that is beneficial for the LLMs. This customization helps in avoiding common pitfalls and enhances the overall functionality and efficiency of the agent.
What is the purpose of self-checking in AI agents?
-Self-checking allows the agent to verify the usefulness of its outputs, ensuring that the results are accurate and relevant to the task. This is particularly important in tasks like code generation, where running unit tests can confirm the correctness of the code produced by the agent.
How does the lack of explainability in AI agents affect their usability in production?
-Without explainability, it's difficult for users to trust the agent's outputs, as they cannot understand the reasoning behind the decisions made. This is crucial for gaining user confidence and ensuring that the agent's decisions are transparent and justifiable.
What is the role of citations in improving the explainability of AI agents?
-Citations provide a way to attribute the information used by the agent to make decisions or perform tasks. By showing where the information came from, citations offer transparency and help users understand the basis for the agent's actions or conclusions.
Outlines
🔒 Key Challenges in Agent Reliability
The speaker addresses the common issues encountered when trying to put AI agents into production, focusing on their reliability. Companies often seek high reliability levels, but most agents struggle to achieve even 60-70% effectiveness. The speaker emphasizes the need for agents to be consistently reliable to be useful in production environments, as unreliable agents necessitate constant human oversight, which defeats the purpose of automation. The speaker also touches on the problem of agents getting stuck in loops, a common issue with certain frameworks like CrewAI, and the importance of architecting agents to avoid or quickly exit such loops.
🛠️ The Importance of Custom Tools for Agents
This paragraph delves into the critical role of tools in the functionality of AI agents. The speaker points out that while tools like those in LangChain are good for starting out, they often need to be heavily customized for specific use cases. The speaker suggests that understanding and improving these tools is essential, as they are the 'secret sauce' for agents, affecting how data is obtained, manipulated, and prepared for the LLMs. Examples are given, such as a webpage diffing tool, to illustrate how custom tools can be developed to meet specific needs and prevent endless loops, thereby enhancing the agent's efficiency and effectiveness.
🔍 Ensuring Agent Outputs are Useful and Explainable
The speaker discusses the necessity for agents to have self-checking mechanisms to ensure the usefulness of their outputs. This is particularly important in scenarios like code generation, where running unit tests can verify the correctness of the code produced by the agent. The speaker also highlights the importance of explainability in agents, where the agent should be able to provide explanations or citations for its outputs, thereby increasing user confidence. Additionally, the speaker touches on the need for intelligent debugging tools and logs that can help trace the agent's decision-making process and identify points of failure.
🛑 Minimizing Decision Points and Debugging for Agent Efficiency
In the final paragraph, the speaker emphasizes the importance of minimizing decision points within an agent to streamline its operation and increase reliability. The speaker advises assessing existing agents to identify unnecessary decision points and to ensure conformity at each point to achieve desired outcomes. The speaker also mentions the intention to create more videos on building agents with frameworks like LangGraph and CrewAI, despite reservations about using CrewAI for production, and encourages viewers to consider moving from high-level frameworks to more direct coding approaches for greater control and efficiency.
Keywords
💡Reliability
💡Frameworks
💡Production
💡Autonomy
💡Loops
💡Tools
💡Customization
💡Self-checking
💡Explainability
💡Debugging
💡Decision Points
Highlights
The video discusses five common problems faced when trying to put AI agents into production.
Reliability is the top issue for AI agents, with most only achieving around 60-70% effectiveness.
The desire for agents to be fully autonomous without the need for human oversight.
Agents often get stuck in excessively long loops, which can be due to failing tools or repeated attempts at a task.
The importance of hardcoding limits on the number of steps an agent can take to prevent endless loops.
Customizing tools for specific use cases is crucial for the success of AI agents.
Tools should be able to handle data effectively and communicate failures to the LLM in a beneficial way.
Creating custom tools for specific tasks can greatly enhance an agent's functionality.
The necessity for agents to have self-checking mechanisms to ensure the usefulness of their outputs.
The example of using unit tests for code generation by agents to verify their outputs.
The challenge of ensuring agents generate accurate URLs and the importance of validating them.
The lack of explainability in LLM agents and the need for providing explanations for their decisions.
The use of citations as a method to increase confidence in an agent's output by showing information sources.
The importance of debugging and having intelligent logs to trace an agent's decision-making process.
Minimizing decision points in an agent's architecture to streamline outcomes and reliability.
The suggestion that sometimes simple tasks do not require an LLM and can be sequenced without decision points.
The speaker's intention to make more videos on building with LangGraph and evaluating the effectiveness of CrewAI for prototyping.
The emphasis on the importance of considering these problems when developing AI agents for production environments.
Transcripts
All right.
So in this video, I want to talk about the five problems that I keep seeing
again and again that people face in getting their agents good enough
to basically put into production.
I get a lot of questions about frameworks around this.
And while I'm trying to be sort of reasonably framework agnostic
here, certainly some of these things apply a lot more to some
frameworks than to other frameworks.
So one of the things that came up recently was someone asked me about
putting CrewAI into production.
And my comment was that I actually would never currently put CrewAI into production,
based on the fact that there were so many issues with it that I wouldn't trust it.
Putting things like LangGraph into production is
certainly much more reliable.
But I think you've got some of these problems with all of the different
agent frameworks if you're not aware of them and if you're not
thinking about how to basically fix these problems as we go through.
So let's dive into this.
By far the number one problem for all agents out there
at the moment is reliability.
So talking to a lot of startups, talking to a lot of companies that
want to do agents, the thing I'm seeing consistently is that companies are very
reluctant to do agents for anything really complicated, just because the
reliability of the agents is so low.
Your typical company wants five nines of reliability, and they'd probably
even settle for two nines of reliability, meaning 99%, but most agents are
probably at best getting around 60 to 70 percent of being able to do things.
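To make the gap between those figures concrete, here is a small arithmetic sketch. The helper names are hypothetical, and 65% is only an assumed midpoint of the 60 to 70 percent range mentioned above.

```python
def nines_to_rate(nines: int) -> float:
    """Success rate for a given number of nines, e.g. 2 -> 0.99, 5 -> 0.99999."""
    return 1 - 10 ** (-nines)


def expected_failures(success_rate: float, runs: int) -> float:
    """Expected number of failed runs at a given success rate."""
    return (1 - success_rate) * runs


if __name__ == "__main__":
    runs = 10_000
    # 0.65 is an assumed midpoint of the 60-70% range, not a measured value.
    for label, rate in [
        ("typical agent (assumed 65%)", 0.65),
        ("two nines", nines_to_rate(2)),
        ("five nines", nines_to_rate(5)),
    ]:
        print(f"{label}: about {expected_failures(rate, runs):.1f} failures per {runs} runs")
```

Over 10,000 runs that works out to roughly 3,500 failures at 65%, about 100 at two nines, and about 0.1 at five nines.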
Now, there are some places where maybe that's okay, but for the majority
of things, getting something into production, you have to make it reliable.
You have to be able to make it consistently produce
an output that the end user would be able to benefit from.
The end result has to be like they expect it to
be, something that they can benefit from and actually use.
There's no use in creating agents that only work some of the time and then end
up failing a large percentage of the time.
The issue that creates is the whole issue of humans then having to basically
check every single thing in the agent.
Now that's fine if you're starting out and you're trying to make training
data or something like that, and you've got a human in the loop and
you're doing that kind of thing.
But really what we want for agents eventually is for them to
be fully autonomous, to be fully operating by themselves, producing a
consistent level of result, without a human having to be in the loop there.
So this brings us to some of the things that actually go wrong.
So the second thing that I see happening a lot is agents going into
excessively long loops, and this can be for a variety of different reasons.
It's quite common to see this in CrewAI and some of the other frameworks,
where you'll have it set up and the agents will basically not like the
output of a tool. One of the ways that this happens quite
often is a failing tool, or a tool that's just not working in some way.
The other way, though, is where the LLM basically gets a response passed
from one sub-agent to the next part,
and it just decides that no, it needs to do that part again.
And it just gets into this loop of going through it again and again and again.
Now this is one of the frustrations I've felt a lot with CrewAI
and with some of the others.
With LangGraph, what I actually do is I sort of hard code it so that we kind
of know how many steps it's taking.
Now CrewAI has actually set up a thing that does something like that
nowadays too, where you can limit the number of steps that it
goes through, or the repeats and retries that it does for this kind of thing.
But this is a very common pattern that you see with LLM agents, that
they get into these kind of loops.
And a lot of what you have to think about when you're architecting an agent is
actually how to handle any of these loops.
Ideally you want to reduce them to none.
But if they do happen, you want to make sure that your overall agent or
system is aware that they're happening
and then puts a stop to them pretty quickly.
Otherwise, you find that you end up with an agent just going
on, making LLM call after LLM call.
And if it is fully autonomous, where you're not watching it, that can get
very expensive very quickly if you're using an expensive model or something.
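A minimal sketch of that kind of hard limit, in plain Python and not tied to LangGraph or CrewAI. Everything here is hypothetical: `next_action` stands in for one LLM call, and `max_steps` and `max_repeats` are illustrative defaults.

```python
from collections import Counter
from typing import Callable


class LoopGuardError(RuntimeError):
    """Raised when the agent exceeds its step budget or keeps repeating itself."""


def run_with_guard(
    next_action: Callable[[list], str],
    max_steps: int = 10,
    max_repeats: int = 2,
) -> list:
    """Run an agent loop with a hard step cap and a repeated-action check.

    `next_action` receives the history of actions so far and returns the
    next action, or "DONE" to finish.
    """
    history = []
    seen = Counter()
    for _ in range(max_steps):
        action = next_action(history)
        if action == "DONE":
            return history
        seen[action] += 1
        if seen[action] > max_repeats:
            # The same action keeps coming back: stop instead of paying for more calls.
            raise LoopGuardError(f"action repeated too often: {action!r}")
        history.append(action)
    raise LoopGuardError(f"no result after {max_steps} steps")
```

Raising an explicit error means the surrounding system knows a loop happened and can log it or hand off to a human, rather than quietly spending on more calls.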
The third problem that can go wrong is around tools.
Now, tools are something that I've been meaning to make a
lot more videos about.
In the previous section, I talked about failing tools.
And this is something that happens a lot, that I feel like
people are often not aware of.
While the tools in things like LangChain are pretty nice for starting out, you're
going to find that you want to customize them a lot to your specific use case.
You need to understand that a lot of those tools were made over a year ago.
They were very simple at the time.
They're not really made for agents a lot.
They're often made more for use in RAG than agentic stuff.
And you really find that what you want to do is basically make
your own set of custom tools.
Now I will follow up with a video talking a bit about custom tools,
but I will say that tools are really your agent's secret sauce,
if you've got a really good set of tools that can filter inputs,
can use inputs in the right way,
and can generate outputs that are going to be beneficial to the actual LLMs.
So really the whole tools thing is all about: how do you get data?
How do you manipulate data?
And how do you prepare it for an LLM?
And then when it fails, how does the tool basically tell the LLM that it's
failed in a way that is actually going to be beneficial, rather than
going into an endless loop?
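One way to sketch the failure-reporting part: wrap every tool call so that errors come back as a structured message the LLM can act on. The `ToolResult` shape and the wording of the messages are assumptions for illustration, not part of any framework.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolResult:
    ok: bool
    content: str             # tool output, or a message written for the LLM
    retryable: bool = False  # whether calling again with the same input may help


def call_tool(tool: Callable[[str], str], arg: str) -> ToolResult:
    """Call a tool and turn exceptions into guidance instead of a raw traceback."""
    try:
        return ToolResult(ok=True, content=tool(arg))
    except TimeoutError as exc:
        return ToolResult(
            ok=False,
            content=f"Tool timed out ({exc}). Retrying once may help.",
            retryable=True,
        )
    except Exception as exc:
        return ToolResult(
            ok=False,
            content=(
                f"Tool failed: {exc}. Do not retry with the same input; "
                "try a different approach or report the failure."
            ),
        )
```

The point of the `retryable` flag is that the loop-handling code, not the LLM, gets to decide whether another attempt is allowed.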
So you can see, often for really simple things, I will make quite complex tools.
This is an example of a webpage diffing tool, just to check the
outputs of a web page so that an agent can tell when a web page has been updated.
So for example, this was a simple use of the tool for checking
if OpenAI's webpage had been updated.
It could then assess what new links were there, and then
go to those new links
and find out what had been announced, for returning news,
returning different kinds of things.
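The tool shown in the video is not reproduced in this transcript, so the following is only a rough sketch of the idea: compare the links in two snapshots of a page and report the ones that are new. It uses the standard library's `html.parser`; the function names are hypothetical, and fetching and storing the snapshots is left out.

```python
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Collects the href of every <a> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.add(value)


def extract_links(html: str) -> set:
    collector = _LinkCollector()
    collector.feed(html)
    return collector.links


def new_links(old_html: str, new_html: str) -> list:
    """Links present in the new snapshot but not in the old one, sorted."""
    return sorted(extract_links(new_html) - extract_links(old_html))
```

An agent would then only visit the URLs returned by `new_links`, instead of re-reading the whole page on every check.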
Now the same kind of thing worked nicely on sites like CNN and other
news sites and stuff like that.
The idea here though, is that this is a very custom tool
for a very specific use case.
And that's how you want to think about most of the things that you're doing.
When I look at some of the best agents that I see companies doing, they've
generally got very specific tools that are able to handle
different kinds of input, work out what they need to do to generate data,
et cetera, and provide that back to the agent in a way that's useful so that
the agent can know what's going on.
One of the classic examples is if you look at a lot of the simple
search tools: while they'll return information about what's on the page,
they don't actually provide the URL.
So you want to go through and customize some of those things so that
you're actually getting the URL back.
You're storing those URLs.
You're then caching any response to that URL.
So, if you're scraping that URL, then you're caching it so that your
agent can use that cache again and again, without having to
repeat itself calling these different things.
This is a whole class of what I would call intelligent tools
that you want to build in here.
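A small sketch of the URL-plus-cache idea, assuming search results arrive as dicts with `title` and `url` keys and that `fetch` is whatever scraping function you already have. Both are assumptions, and the cache here is just an in-memory dict.

```python
from typing import Callable


class CachedFetcher:
    """Fetch each URL at most once and serve repeats from memory."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch
        self._cache = {}
        self.misses = 0  # how many real fetches were made

    def get(self, url: str) -> str:
        if url not in self._cache:
            self.misses += 1
            self._cache[url] = self._fetch(url)
        return self._cache[url]


def search_with_urls(results: list, fetcher: CachedFetcher) -> list:
    """Return results that keep their URL alongside the page text."""
    return [
        {"title": r["title"], "url": r["url"], "text": fetcher.get(r["url"])}
        for r in results
    ]
```

Keeping the URL in every result is also what later makes citations possible, since the agent can point back to where a piece of text came from.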
All right.
This brings us to the fourth problem that I see a lot, which is the
whole idea of self-checking.
You need your agent to have some way of being able to check
its outputs and see, is it generating outputs that are useful or not useful?
The classic example of this would be with code.
So if you've got an agent that's actually generating code,
you want to make sure that at some point that code is checked, and that
might be as simple as running a unit test on it to see, do all the imports
work, do the functions actually run and return what I expect from them?
You want to set up some tests for things like that so that you can
actually check the output of the code that the agent is generating.
Now in lots of other use cases, you're not going to be generating code.
So you need to think about, in those sorts of situations, how will your agent
have the ability to know if something is right versus if something is wrong?
How can it check to see that this is something that's going to be useful versus
something that's just going to be totally off base from what the end user wants?
And that can be things like checking URLs. LLMs love to hallucinate URLs.
So check, do those URLs actually exist?
Do they not exist?
That's the kind of thing that you want to think about as you're going through,
but this idea of self-checking is a really key thing.
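A best-effort sketch of that URL check using only the standard library. The function names are hypothetical, and a HEAD request is an assumption that will not suit every site, since some servers reject HEAD or block automated clients.

```python
import urllib.request


def url_exists(url: str, timeout: float = 5.0) -> bool:
    """Return True if a HEAD request to the URL succeeds."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError and timeouts; ValueError covers malformed URLs.
        return False


def split_urls(urls, exists=url_exists):
    """Split URLs into (live, dead) using the given existence check."""
    live, dead = [], []
    for url in urls:
        (live if exists(url) else dead).append(url)
    return live, dead
```

Passing the check in as `exists` keeps the splitting logic testable without network access, and lets you swap in a cached or rate-limited checker later.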
The last thing that I think you need to think about a lot, and that
I see as a big problem with LLM agents, is the lack of explainability.
So you really want to think about when the user actually gets a result
back at the end from an agent:
can the agent point to some explanation?
Now, citations are a great way of doing this,
citations showing exactly where the information that was used to make
a decision or to do something came from.
That gives people a lot more confidence in the output of the agent when
they can see why the agent said something, or why the agent gave a
certain result, that kind of thing.
It can also be things like being able to look at a set of log files
or look at a set of outputs that the agent made along the way.
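The citation idea can be sketched as a result type that carries its sources with it. The class and field names below are hypothetical, chosen only to illustrate the shape.

```python
from dataclasses import dataclass, field


@dataclass
class Citation:
    url: str    # where the information came from
    quote: str  # the passage the agent relied on


@dataclass
class AgentAnswer:
    text: str
    citations: list = field(default_factory=list)

    def render(self) -> str:
        """Format the answer with numbered sources, or flag that it has none."""
        if not self.citations:
            return f"{self.text}\n(no sources available)"
        lines = [self.text, "Sources:"]
        for i, c in enumerate(self.citations, start=1):
            lines.append(f'[{i}] {c.url} : "{c.quote}"')
        return "\n".join(lines)
```

Flagging an answer that has no sources is a deliberate choice here: the user sees at a glance which results they can verify and which they cannot.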
So this brings us to sort of a sixth, or bonus,
thing that you need to think of, which is debugging an agent.
You need to have some kind of outputs or some kind of logs that are kind
of intelligent, and not just a raw record of every call to the LLMs and the agents.
That's one way of doing it, but it can be a very tedious way of going through.
You need to be able to assess at which point the agent starts to fall apart.
Now, remember, with a lot of this stuff,
if you're using an LLM agent, you should be using it to basically make decisions,
and perhaps generate tokens out, as either text or as code or something
like that. But mostly what you're using the reasoning part of an LLM
agent for is to be able to make decisions.
Now you want to make sure that's something that gets logged
independently, so that it's quite easy for you to see, ah, okay, this looks a
bit suspicious, what's going on here?
Can we debug this?
We can look at the reasoning points in the agent as we go along.
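One possible shape for logging decisions on their own, separate from the raw LLM calls. `DecisionLog`, its fields, and the allowed-choices check are all hypothetical names used for illustration.

```python
import json
import time
from dataclasses import asdict, dataclass


@dataclass
class Decision:
    step: int
    node: str     # which decision point in the agent made the choice
    choice: str   # what the LLM decided
    reason: str   # the LLM's stated reasoning, kept short
    timestamp: float


class DecisionLog:
    def __init__(self) -> None:
        self.entries = []

    def record(self, node: str, choice: str, reason: str) -> None:
        step = len(self.entries) + 1
        self.entries.append(Decision(step, node, choice, reason, time.time()))

    def to_jsonl(self) -> str:
        """One JSON object per line, easy to grep or load into a notebook."""
        return "\n".join(json.dumps(asdict(d)) for d in self.entries)

    def first_suspicious(self, allowed: dict):
        """Return the first decision whose choice is not allowed for its node."""
        for d in self.entries:
            if d.choice not in allowed.get(d.node, set()):
                return d
        return None
```

With a log like this, finding where the agent starts to fall apart is a scan over a handful of decisions, not a read through every prompt and response.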
So these are things that I think you need to be thinking about constantly
when you're doing anything with LLM agents or autonomous agents.
Far too often, I see people doing stuff where you don't even need an
LLM to do some of these things; you can just sequence them up.
There's no need for any sort of decision point in there.
Make sure that when you're building your agent, it has as few decision
points as possible to get the outcome that you want to achieve.
So go back and assess some of your own agents and think
about, okay, where are the points of decision in here?
And how am I checking to make sure that each of these things is being conformed
to, so that you do get the actual reliability out of these things?
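A sketch of what a single, checked decision point could look like inside an otherwise fixed sequence. The ticket-routing scenario, the labels, and `ask_llm` (a stand-in for any function that sends a prompt and returns text) are all made up for illustration.

```python
from typing import Callable, Sequence


def decide(
    ask_llm: Callable[[str], str],
    prompt: str,
    allowed: Sequence[str],
    default: str,
) -> str:
    """One constrained decision point: accept only an allowed label."""
    answer = ask_llm(prompt).strip().lower()
    return answer if answer in allowed else default


def handle_ticket(ticket: str, ask_llm: Callable[[str], str]) -> str:
    # Step 1: plain code, no LLM needed to normalize whitespace.
    cleaned = " ".join(ticket.split())
    # Step 2: the only decision point, with its output checked against a fixed set.
    route = decide(
        ask_llm,
        f"Route this ticket: {cleaned}",
        allowed=("billing", "technical"),
        default="human_review",
    )
    # Step 3: plain code again.
    return f"{route}: {cleaned}"
```

Anything the model returns outside the allowed labels falls back to `human_review`, so a bad decision degrades to a safe path and does not propagate through the later steps.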
I'll be making a bunch more videos looking at building things with
LangGraph, and even with things like CrewAI.
Even though I don't think CrewAI is ideal for production,
I think it's great for trying ideas out really quickly.
I'll show you some of the things that I've been doing with that to
be able to build some of these crews really quickly and try out ideas and
get a sense of what is probably going to work and what is not going to work,
and then look more at converting them across to much more
low-level code, things like LangGraph, things like just coding
some of these things in plain Python.
Often you don't need a framework to do some of these things,
and that's something that I want to go into more in the
future as we go through this.
Anyway, hopefully this video was useful to get you thinking about
the key things that go wrong in getting LLM agents into production.
And how you can start to think about mitigating some of these
problems that you come across.
As always, if you've got comments or questions, please
put them in the comments below.
If you found the video useful, please click like and subscribe.
And I will talk to you in the next video.
Bye for now.