Watch how a Pro develops AI Agents in real-time
Summary
TLDR: The script details a tutorial on creating an AI agent using CrewAI and AgentOps to scrape websites and summarize content into tables. It introduces Augment as a coding assistant, and covers using Firecrawl for web scraping and OpenAI for text summarization. The process involves setting up functions, handling API requests, and monitoring agent performance through a dashboard, emphasizing the ease of tracking costs and debugging with AgentOps.
Takeaways
- 🤖 The script discusses building AI agents with CrewAI and AgentOps for web scraping and summarizing content into tables.
- 📝 The importance of initializing and ending sessions with AgentOps for tracking purposes is highlighted.
- 🔍 The use of Firecrawl, an open-source web scraping framework, is introduced for fetching web data.
- 🔑 The 'requests' library is used to interact with APIs, specifically the Firecrawl service.
- 💡 Tips are provided for debugging, such as pasting a screenshot of confusing code into ChatGPT and asking it to explain.
- 🛠️ The script covers the implementation of functions like crawl_web and summarize_text in Python.
- 📈 AgentOps is used for monitoring AI agent performance, including tracking costs and latencies of large language model (LLM) calls.
- 🔗 The integration of AgentOps with various LLM providers for automatic tracking of requests and costs is explained.
- 📊 The script demonstrates creating a table summary using OpenAI's language model and handling errors with the AgentOps dashboard.
- 🔍 AgentOps provides detailed observability into agent behavior, including chat history, waterfall graphs, and cost tracking.
- 🛑 The script identifies issues such as 'yappy' agents that produce too much output, making debugging difficult without proper tools.
Q & A
What is the primary objective of the project described in the script?
-The primary objective is to build an AI agent using CrewAI with AgentOps that can scrape web data, summarize it, and present the information in a table format.
Which tool is mentioned for web scraping in the script?
-The tool mentioned for web scraping is Firecrawl, an easy-to-use open-source web scraping framework.
What is the alternative to Copilot that the speaker is using, and why is it preferred?
-The speaker is using Augment as an alternative to Copilot because it is faster and works better: it indexes the entire codebase instead of relying only on in-context files.
How does the speaker suggest to handle confusion during the coding process?
-The speaker suggests taking a screenshot of the confusing part, pasting it into ChatGPT, and asking it to explain.
What is the purpose of the agentops.init function in the script?
-The agentops.init call kicks off an AgentOps session for the AI agent, so everything that follows is tracked until end_session is called.
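Based on that description, the overall flow can be sketched as below. This is a minimal sketch rather than the video's exact code: crawl_web and summarize_text are stand-ins for the functions implemented later, and the agentops calls assume the package is installed (pip install agentops):

```python
import os

def crawl_web(url):
    # Stand-in: the real version (discussed later) scrapes `url` via Firecrawl.
    return f"<scraped content of {url}>"

def summarize_text(text):
    # Stand-in: the real version (discussed later) summarizes via an LLM.
    return f"summary of: {text[:40]}"

def main():
    import agentops  # third-party: pip install agentops
    # Kick off the tracked session; everything until end_session is recorded.
    agentops.init(os.environ.get("AGENTOPS_API_KEY"))
    web_data = crawl_web("https://agentops.ai")
    summary = summarize_text(web_data)
    print(summary)
    # Mark the session as finished so the dashboard shows success or failure.
    agentops.end_session("Success")
```

Calling main() runs the whole pipeline; the session link printed by agentops.init leads to the dashboard described below.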
How does the speaker plan to summarize the web data into a table?
-The speaker plans to use a large language model (LLM) to read through the web content and create a table summary with the help of the summarize_text function.
What is the significance of the agentops.record_function decorator mentioned in the script?
-The agentops.record_function decorator records the functions being executed, allowing AgentOps to track which functions run at which moment, providing traceability.
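The real decorator comes from the agentops package (roughly @agentops.record_function("event name") per the video), but its effect can be illustrated with a minimal stand-in that records each call's name and duration. The names and the local event list here are illustrative only, not AgentOps internals:

```python
import time
import functools

RECORDED_EVENTS = []  # stand-in for AgentOps' hosted event store

def record_function(event_name):
    """Minimal stand-in for an AgentOps-style recording decorator."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.time()
            result = fn(*args, **kwargs)
            # Record which function ran, under which event label, and for how long.
            RECORDED_EVENTS.append({
                "event": event_name,
                "function": fn.__name__,
                "seconds": time.time() - start,
            })
            return result
        return wrapper
    return decorator

@record_function("scrape the web")
def crawl_web(url):
    return f"content of {url}"

crawl_web("https://agentops.ai")
```

After the call, RECORDED_EVENTS holds one entry per decorated invocation, which is the traceability the answer above describes.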
What does the speaker mean by 'yappy agents' and why is it a problem?
-'Yappy agents' refers to agents that produce a large amount of output, making it difficult to parse through the console and debug. It's a problem because it lacks observability and clarity on the agent's actions and costs.
How does the AgentOps dashboard help in debugging and understanding agent behavior?
-The AgentOps dashboard provides a visual representation of the agent's actions, including a chat breakdown, waterfall diagram, and cost tracking, making it easier to understand and debug agent behavior.
What is the role of client.chat.completions.create in the script?
-client.chat.completions.create is used to interact with the large language model, sending prompts and receiving completions that summarize the web data.
How does the speaker suggest tracking and managing the costs associated with using LLMs?
-The speaker suggests using AgentOps to automatically track all the requests and costs associated with using LLMs, providing insight into the expenses and helping to manage them.
Outlines
🤖 Building AI Agents with AgentOps
This paragraph introduces a project to create AI agents capable of web scraping and summarizing information into a table format. The process begins with setting up a Python environment with AgentOps, using Augment as an alternative to Copilot for code completion. A crawl_web function is defined to scrape data, and a summarize_text function is planned. Firecrawl, an open-source web scraping framework, is highlighted for its ease of use and efficiency. The paragraph concludes with the implementation of the web scraping function in Python, using the requests library to call the Firecrawl API.
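A sketch of that crawl_web function, with caveats: the endpoint path and payload shape are assumptions based on the video (verify against the Firecrawl docs), and requests is a third-party dependency, so the request-building step is separated out:

```python
import json
import os

# Assumed endpoint; verify against the Firecrawl API documentation.
FIRECRAWL_URL = "https://api.firecrawl.dev/v0/scrape"

def build_firecrawl_request(url):
    """Build the headers and JSON body for a Firecrawl scrape call."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ.get('FIRECRAWL_API_KEY', '')}",
    }
    body = json.dumps({"url": url})
    return headers, body

def crawl_web(url):
    """POST the target URL to Firecrawl and return the scraped content."""
    import requests  # third-party: pip install requests
    headers, body = build_firecrawl_request(url)
    response = requests.post(FIRECRAWL_URL, headers=headers, data=body)
    if response.status_code != 200:
        raise RuntimeError(f"Failed to scrape {url}: HTTP {response.status_code}")
    return response.json()
```

This mirrors the flow described in the transcript: set JSON headers and an API key, POST a body containing the URL, then check for a 200 status before returning response.json().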
🔍 Implementing AgentOps for Data Recording and Debugging
The focus shifts to using AgentOps for recording data and debugging. A decorator is introduced to mark specific functions for recording, allowing traceability of events within the session. The paragraph demonstrates how to run the script and access the AgentOps dashboard to monitor the web scraping event, and how to use the dashboard to understand the cost, latency, and success of the operations. The use of large language models (LLMs) via OpenAI is discussed, with AgentOps automatically tracking requests and costs. The paragraph ends with a practical example of summarizing web data into a table using OpenAI's chat completions API.
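A sketch of the summarize_text step described here, hedged: the prompt wording is paraphrased from the transcript, and the code assumes the openai>=1.0 client (pip install openai), so the message-building step is separated from the network call:

```python
import os

def build_summary_messages(web_content):
    """Assemble chat messages for the summary request (wording paraphrased from the video)."""
    return [
        {"role": "system",
         "content": ("You are a web summarizer agent. Your task is to read through "
                     "web content and tell me the name, title, and interesting facts "
                     "about the company the website describes. Be clear and don't "
                     "make things up.")},
        {"role": "user", "content": web_content},
    ]

def summarize_text(web_content):
    """Send the scraped content to OpenAI and return the summary text."""
    from openai import OpenAI  # third-party: pip install openai
    client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=build_summary_messages(web_content),
    )
    return response.choices[0].message.content
```

The second, table-producing call in the video follows the same pattern with a "summarize the information below as a table in markdown" system prompt.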
🛠️ Debugging and Observability with AgentOps
This paragraph delves into debugging and observability using AgentOps. An error encountered during the LLM call is used to illustrate how AgentOps can help identify and fix issues by providing detailed traceback information. The discussion includes the benefits of using AgentOps for understanding agent behavior, such as cost tracking and performance monitoring. The paragraph also provides insights into optimizing agent performance by avoiding repetitive tasks and managing context lengths effectively. The use of AgentOps with different agent frameworks like LangChain and AutoGen is mentioned, emphasizing the tool's versatility.
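The "ChatCompletion object is not subscriptable" error described here typically comes from indexing the response like a dict (the pre-1.0 openai style) instead of using attribute access. A minimal illustration with a stand-in response object (the real one comes from the openai package):

```python
from types import SimpleNamespace

# Stand-in for the object returned by client.chat.completions.create()
# in openai>=1.0: it exposes attributes, not dictionary keys.
response = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="Name: AgentOps ..."))]
)

# Wrong (raises TypeError on the real object):
#   response["choices"][0]["message"]["content"]
# Right:
summary = response.choices[0].message.content
```

Reading the recorded parameters and returns in the AgentOps traceback, as the paragraph describes, is what points to this kind of misuse.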
📊 AgentOps Dashboard for Session Analysis and Cost Management
The paragraph highlights the capabilities of the AgentOps dashboard for analyzing agent sessions and managing costs. It discusses how the dashboard provides a comprehensive view of agent performance over time, including completion rates, error analysis, and expenditure on AI credits. The dashboard's ability to offer a historical perspective on agent development and performance improvement is emphasized. The paragraph also touches on the integration of AgentOps with CI/CD pipelines and its role in evaluating agent performance against various tests, providing a robust framework for agent testing and monitoring.
🤝 Collaboration and Community Support for AgentOps Users
The final paragraph wraps up the discussion by emphasizing the collaborative aspect of using AgentOps. It mentions the availability of community support through a Discord server and the potential for further collaboration, such as podcasts or interviews. The paragraph reflects on the positive experience of creating AI agents with AgentOps and the value of the community in troubleshooting and sharing knowledge.
Mindmap
Keywords
💡CrewAI
💡Web Scraper
💡Table Formatting
💡Augment
💡Firecrawl
💡API Key
💡Environment Variables
💡AgentOps
💡Large Language Model (LLM)
💡Error Dashboard
💡Observability
Highlights
Introduction to building AI agents with CrewAI and AgentOps to scrape web data and summarize it in a table format.
Use of Augment as an alternative to Copilot for faster and more effective code completion.
Explanation of initializing AgentOps and the importance of session management in Python scripts.
Utilization of Firecrawl, an open-source web scraping framework, for easy data extraction.
Demonstration of using the requests library to call the Firecrawl API for web scraping.
Importance of environment variables in Python scripts for successful execution.
Testing the crawl_web function and observing the scraping process through the AgentOps dashboard.
Introduction of the agentops.record_function decorator for tracking the functions executed during the session.
Integration of OpenAI for summarizing web content using large language models.
Automatic tracking of requests and costs when using large language model providers with AgentOps.
Debugging process using the AgentOps dashboard to identify and fix errors in the code.
Overview of creating a job posting agent using separate research, writing, and review agents.
Discussion of the challenges of managing chatty agents and the lack of observability in agent operations.
Use of the AgentOps session ID to access detailed information and chat history for debugging purposes.
Analysis of agent performance over time and identification of common errors affecting completion rates.
Introduction of WebArena, an open-source evaluation set for testing agent performance on web tasks.
Explanation of agent testing, monitoring, and the importance of observability for scaling to production.
Final thoughts on the value of AgentOps for understanding, debugging, and improving AI agent operations.
Transcripts
We are going to use CrewAI with AgentOps to build AI agents, so let's get started. Sounds good. We're going to make an agent that can scrape the web, fetch information about what's on the website, and then format it in a table. The first thing we do with every Python file is create an explanation of what we're doing. We're going to do two things: number one, we're going to create a website scraper. So we'll create a summarize_text function, and we'll also create a crawl_web function. And import agentops, obviously.

Which code assistant are you using right now? Right now I'm using Augment, which is an alternative to Copilot. It's a lot faster, actually, and it seems to work a lot better because it indexes your entire codebase instead of just working in-context. Yeah, I need to try it. I'll actually hook you up with the guys afterwards. All right, sweet.

So let's define our main. We'll say agentops.init, and all we have to do is run end_session after all of this is done. All right, so number one: web_data equals crawl_web, and we'll make this return a string. Quick tip for the people watching: if you're confused at any part, you can take a screenshot, paste it into ChatGPT, and ask it to explain. So we'll say web_data equals crawl_web, and then summarized_text equals summarize_text on web_data. Awesome, so we're just storing the outputs of the functions in simple variables? Exactly. We're just going to call two functions: crawl the web, and then with that web data we're going to summarize it. We've yet to implement what that looks like, but we'll figure that out right now.

For crawl_web, I'm going to use a third-party tool called Firecrawl. Firecrawl is an easy-to-use web scraping framework. It's open source, and you can basically just put in any website and it will give you markdown text that you can feed to any LLM. It's super clean and super easy, so we're just going to use it to create a simple web crawler. Let's look at the documentation and see how we can make it work. A lot of people aren't sure which crawler to use, so would you say this one is the best, from your experience? Firecrawl is probably one of the easiest ones to use, and it works super well. So let's just get the code for this, take the Python code, assume that works, and there we go. All right, so what are we missing here? Parentheses. Awesome.
We're just going to pip install firecrawl real quick. Okay, we're going to say that a URL belongs here; it should be a string. Awesome. We're just going to use the requests library to hit the API, and then we're going to dump the JSON, so json.dumps, and we should be good to go.

Okay, can you quickly go over the entire function for people who might not be following? Sure. What we're doing here is using the requests library to hit the Firecrawl API. Firecrawl takes a simple parameter, the URL of the website we're trying to scrape. It will wait, and then it will return exactly what we're looking for. So we set the headers, just application/json, which is pretty standard; we set an API key, which is the Firecrawl API key; we do requests.post and send it to the URL with the headers and a body, which should be the JSON containing the URL we want to scrape. Then we wait to see if we get the right status code. If it's not 200, we say it failed; otherwise we return the body, which should be response.json().

One other thing we're going to do is make sure you load your environment variables, so we'll call load_dotenv(). This makes sure your environment variables are available in your Python script when you run it. Now we're going to test that this actually works: we'll run the crawl_web function, give it the URL https://agentops.ai, and see what happens. python main.py, let's see if this guy runs. Okay, "no module named firecrawl", just get rid of that import. Okay, python main.py. Nothing happened, because we did not call main.

So, a high-level view of the code again: we defined our web crawler with crawl_web, we have summarize_text, which is yet to be defined, and then we have a main function that runs agentops.init, which kicks off our session, runs the crawler function, and then ends the session with success or fail. So I'm just going to run main.py and see what happens.

Okay, we got our session replay ID right here. Right now it's probably scraping the web, so let's see what happens. All right, in our dashboard we can see that this run took 10 seconds, but we didn't record any data. The way you record data with AgentOps is by adding a simple decorator on top of your function. So we'll say agentops.record_function("scrape the web"), and then agentops.record_function("summarize text"). This way AgentOps actually knows which functions are happening at which moment, so you can trace back exactly what's happening. We can even add it to the main function too. Cool, so let's run it one more time and see what goes on.
At what point can you go into the AgentOps dashboard and start seeing it there? As soon as you see the session ID show up in the terminal, you're free to go check it, and the events should come streaming in automatically. Okay, so you can see the session ID up here, which means the events are streaming in automatically. If we check out the link, it should start showing up, so let's take a look.

All right, awesome. We can see that the web scraping event took about 13 seconds. If we look down at the waterfall graph, we can see this was the scrape-the-web function, and it actually got all the data from agentops.ai: the content of the whole web page is here. This is all text that I can feed to my large language model.

So now we're going to feed this to a large language model and see if we can get a table summary of it. Let's import openai. We could use any LLM here, Groq or Anthropic, but OpenAI for simplicity. The magic of AgentOps is that when you import almost any large language model provider, we automatically track all the requests that go through: the cost, the latency, and whether it actually worked. Just by importing it, you're good to go.

So in summarize_text we'll set up an OpenAI client: client equals OpenAI with our OpenAI key. Then we'll set the messages: "You are a web summarizer agent. Your task is to read through web content and tell me the name, title, and interesting facts about the company the website describes. Be clear and don't make things up." Awesome, this looks pretty good to me for an automatic completion. Then we call client.chat.completions.create, set our model to gpt-3.5-turbo, and set messages to messages. That should give us a good starting point, so let's run main.py.

All right, we got our session ID up here. We'll wait for the run to finish, and then we can track all the events in the AgentOps dashboard. Let's take a look: we have three events so far, the web scrape and then the OpenAI call, and we can see it cost less than a penny. Here's all the text it took in, and here's our summary. It tells us the name of the website, AgentOps; the title, "Build AI agents with LLM apps"; and interesting facts: AgentOps is a platform to build reliable AI agents with monitoring. Awesome, all this information checks out.

One last step: we're going to add one more OpenAI call. We'll say response_text equals the text, and make another call with table_messages: "Your role is to summarize the information below as a table in markdown." Okay, awesome. We've got the messages in here, and we'll just copy and paste the code from above: response equals client.chat.completions.create, set the messages to this, print that at the end, and return response.choices.text. That should just about do it, and we'll print the result of summarize_text.
So in the past few minutes we've created an agent that can crawl the web, get that web data, and summarize it into a table, and we're going to take one more pass at running it. Also, for all prior runs, every time you finish a run you get a chat summary saying exactly how much you spent on that run, so it's a lot easier to track what's going on.

Awesome. It looks like we actually had some errors here, so now we can use AgentOps to debug what the errors were in the dashboard. I'm going to open up the link and see exactly what happened. "ChatCompletion object is not subscriptable." Okay, I can use this traceback to see exactly what happened. My guess is that I used the OpenAI client incorrectly, so I can go back in time and see what occurred. We get a big error bar here on the main function, which we decorated at the beginning; it was the encapsulating function that surfaced the issue. We can see the web scraping worked, because it's nice and blue, but there's an error attached to the LLM call: a TypeError saying "ChatCompletion object is not subscriptable". We can use this to read through what the function's parameters and returns were, see exactly what went wrong, and then fix it in our code.

Next, we took an example CrewAI notebook. Essentially this project uses three separate agents that work together: a research analyst, a writer agent, and a review agent. If we look at what these agents do, the research analyst analyzes company websites, the writer agent uses the insights from the website to create a detailed, engaging, enticing job posting, and the review specialist turns that into a clear, concise, grammatically correct job posting that we can post on LinkedIn, Indeed, or any other job site. All we have to do to plug AgentOps in is two lines of code: import agentops and agentops.init. Optionally you can add tags so you can track your sessions more easily, and then you're good to go. Let's try running it and see what we get; we'll also link this GitHub repo below the video so people can clone it themselves.

All right, we're going to run the Python script, main.py. It's going to ask us two things. Number one, what is the company description? And it also gives us a handy-dandy link where we can inspect the session later. We'll say: AgentOps does AI agent testing, monitoring, and evals. Agents suck; we're fixing that. Company domain: agentops.ai. Hiring needs: AI engineers. Specific benefits: work with the most cracked developers in San Francisco.
All right, so right now the agents are all going to work together to spin up and create this amazing job posting. It's going to scrape the web, use a set of tools, and they're all going to chat together to solve this problem dynamically. The first thing we notice is that the agents are kicking off, and one big problem is that these things are very yappy (that's a technical term, by the way). These agents are just spewing tokens all over my console, and if I want to debug this thing it's a huge pain in the neck. You can see there's just so much going on. Wall of text after wall of text, exactly. Parsing through this is a nightmare to begin with, and secondly, we have no observability into what's going on: what step of the sequence it's at, how much it's costing us, how long it's going to run.

Thankfully, since we have this AgentOps session ID up here, all we have to do is click the link and it opens our current session. We can see that we started about 2 minutes ago and it's already cost us 14 cents in OpenAI credits, which is pretty crazy to think about. We get all the environment information about where we're running; it could be a pod or a Docker container, you name it. We're running on my MacBook right now. More interestingly, we can see the entire chat breakdown. Remember how we spun up three different agents? We can see the research analyst, the job description writer, and the review specialist all here in the timeline, and we can filter down by those if we want. Here's the full chat history too, so instead of parsing through that really messy block of text again, we can just read it here, which is a huge time saver. Lastly, we get this dynamic waterfall breakdown of all the actions the agent was taking: all the tool calls, like searching websites, and all the LLM calls using GPT-4. We can see this one cost us 4 cents, and we can see the entire prompt and completion and all the reasoning it goes through. I'm going to go back to a previous session I ran in the past and show you how powerful this can actually be.

Yeah, I mean, this dashboard is just going to let people understand agents much faster, because especially when people are new to building agents, it's hard to understand what they're doing, right? But in AgentOps you can clearly see every step of the way. Basically, agents used to be a black box, and we're taking a flashlight to it: you can see exactly what's going on, how much they cost, how long they take, and actually how they solve problems and reason through things.
So here's an agent, for example, that ran and cost me almost $10 in OpenAI credits and took almost 12 minutes: 11 minutes and 8 seconds. You can see it actually stopped prematurely; I can see the whole end-session reason that was detected, plus all the environment variables. And I can reason through which agent was taking the most time. Suppose I want to look at the review specialist and see exactly what was happening: I can see the review specialist made 44 tool calls and 40 LLM calls, which is probably a big mistake. One big challenge here is that it had a repeat thought: it was constantly doing the same thing over and over again. We have a repeat-thought catcher that can flag when agents are going in circles, so we can make sure they stop. Since we know the review specialist in particular was problematic, we can go back in time and essentially rewire the prompt to make sure it works.

Now I'm going to look at the entire timeline of events and see exactly what could have been changed. For example, here, oh, what's this? There's a big red error; I wonder what that could have been. We actually ran out of context length because this agent was so yappy: we got a 400 error code because we consumed over 8,000 tokens. That showed up as a huge stack trace that we could have prevented with better context management. This is a huge step up: instead of just staring at the terminal trying to figure out what's going on, you basically have a giant error dashboard. Super easy, give it a shot.

And you can use any of the main agent frameworks, right? We support most of the main agent frameworks. We're totally plugged in with CrewAI: you can just pip install agentops and pip install crewai with a special branch and you're good to go. We're also native on pyautogen, which is the Microsoft framework for building multi-agent systems. If you're building agents with LangChain, we work with that, and we're rolling out support for LlamaIndex later this week. Nice.
In addition to looking at individual sessions, I can get an overall view of how all of my agents are performing over time. So here's a session breakdown: I'll select agents I've run in the past 30 days, and we can see most of my agents never completed, 36 of them. I can see basically all the high-level metrics I care about for knowing what's causing agents to fail. For example, I interrupted about six of them myself, but a lot of failures come from 429 error codes, and a lot come from 400 error codes, and so on. I can use that information to understand why my agents break. More interestingly, I can also see how much I've been spending on these things. Agents are super expensive right now, and they're unreliable, and you want observability into exactly why they're breaking and how much they cost you; otherwise you're not going to scale to production, and they're not going to change the world if they're too expensive and too slow. That's how this dashboard gives you a high-level view of exactly what the agents are doing.

It's also sort of a personal history of developing agents. For example, on GitHub you have the heat map of when you're active, right? This is kind of similar: you can see how many agents you've built and how you've improved over time, maybe reduced the number of errors you're getting, things like that. Yeah, and we're actually working on a handful of CI/CD features, so if you're trying to roll out a test kit for your agents, we have that covered too.

For example, if you're running agents against a variety of different tests, we have several thousand loaded into the platform right now. Here's an agent configuration test called WebArena. WebArena is basically an open-source evaluation set where you can run agents against websites and see how well they perform on web tasks. For example, on this website the task was to find the top-selling brand in Q1 2022, and we can take a look at exactly what that looks like: some sort of dashboard, where the agent's job is basically to log into the dashboard and find that information. We can see that the agent failed; the answer was supposed to be Sprite, and it gave the wrong answer, so the evaluation failed. But we can do this for human evals, exact evals, fuzzy matches, you name it. You can track all this information and see how your agents are gaining or losing performance over time. Wow, that's actually super powerful.
We do everything related to agent tracking, agent testing, and agent monitoring: replay analytics, LLM cost tracking, benchmarking, compliance. We have a ton of integrations with frameworks like LangChain, CrewAI, and AutoGen, and all it really takes is two lines of code. Really, it's that easy: all you have to do is import agentops and run agentops.init. Optionally, if you want to track specific functions, you just add the agentops.record_function decorator and it automatically adds them to your dashboard. The only other thing you might have to do, and it's totally optional, is end your session; that way you can see whether your agents succeeded or failed. From doing that, you get these really fancy dashboards that show you exactly what your agents were doing. You have the ability to rewind and understand what your agents cost, how long they take, and how much compute they use, and then lastly you get these waterfall diagrams showing exactly what they were doing at any given moment in time. You can see how much they cost in terms of LLMs, whether there are errors, and whether you have interruptions in your services. It all just works out of the box.

By the way, guys, if you get stuck at any point building with AgentOps, they have a really cool Discord server with over 1,000 members. It's super active and you can ask any question, so even though it's just a few lines of code, sometimes you can get unexpected errors, and I'm sure Alex or other people will help you out. Yeah man, this was super fun. I'm really glad we put this together. Your channel is sick; I was watching some of the interviews and I really love how you're growing, so it's really cool. I mean, we could do a podcast too, just me and you, if you want to.