Watch how a Pro develops AI Agents in real-time
Summary
TLDR: This script demonstrates building an AI agent that extracts web content and formats it into a table. The AgentOps platform is used to monitor and analyze agent performance. Using Firecrawl for scraping and OpenAI for text summarization, each step of the process is illustrated. It also shows how to diagnose errors with the AgentOps dashboard and how to review cost and performance analytics across different agent runs.
Takeaways
- 🤖 The discussion covers building an AI agent that can extract web content and present it in table form.
- 🔍 The agent should be able to crawl websites and gather information using a 'crawl web' function.
- 📝 Summarizing the web content is handled by a 'summarize text' function, which is yet to be implemented.
- 🛠 The open-source framework Firecrawl is used for web crawling; it returns Markdown text.
- 🔑 API keys and libraries such as 'requests' are used to communicate with the Firecrawl API.
- 🔄 Environment variables must be loaded so they are available to the Python script when it runs.
- 📊 AgentOps provides dashboards for tracing and analyzing the flow and costs of AI agents.
- 🔎 AgentOps can track and analyze the interactions with, and costs of, various large language models (LLMs).
- 🚀 By using AgentOps, developers can monitor their AI agents, improving reliability and speeding up troubleshooting.
- 💡 AgentOps makes it possible to identify and fix errors quickly by tracing the agents' steps and decisions in the dashboard.
- 🔗 AgentOps is integrated with various agent frameworks and can easily be added to existing projects.
Q & A
What is the main goal of the project described in the discussion?
-The main goal is to create an AI agent that can crawl websites and gather information, then format it into a table.
What functions should the AI agent have?
-The agent has two main functions: first, crawling content from websites; second, summarizing that content into a clean table.
Which tool is used for web crawling?
-Web crawling is done with Firecrawl, an open-source tool that offers an easy way to collect content from websites.
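As a rough illustration of the Firecrawl call the video builds with the requests library, here is a minimal sketch. The endpoint path and payload shape are assumptions (the video does not spell them out), so check the Firecrawl docs before relying on them; only the request construction is shown, not a live network call.

```python
import json

# Assumed Firecrawl scrape endpoint; verify against the current Firecrawl docs.
FIRECRAWL_ENDPOINT = "https://api.firecrawl.dev/v0/scrape"

def build_scrape_request(url: str, api_key: str):
    """Build the headers and JSON body for a Firecrawl scrape call,
    mirroring the requests.post(...) call described in the video."""
    headers = {
        "Content-Type": "application/json",   # "application/json ... pretty standard"
        "Authorization": f"Bearer {api_key}",  # the Firecrawl API key
    }
    body = json.dumps({"url": url})
    return headers, body

headers, body = build_scrape_request("https://agentops.ai", "fc-demo-key")
print(json.loads(body)["url"])
```

In the video this request is sent with `requests.post(...)`, the status code is checked for 200, and `response.json()` is returned as the scraped Markdown.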
What is 'Augment' and how is it used in this project?
-Augment is an alternative code assistant, comparable to Copilot, used in this project to speed up and improve the coding process.
How is the summary of the web content created?
-The summary is produced by a separate function called 'summarize text', which presents the collected data in a concise form.
What role does the AgentOps platform play in this project?
-AgentOps is used to monitor and analyze the AI agent's activity, including cost, latency, and success tracking for the requests it makes.
How is it ensured that the environment variables are loaded in the Python script?
-The 'load_dotenv' function is called, which loads the environment variables into the Python script.
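In the project this is done with python-dotenv's `load_dotenv()`. To show what that call does without the third-party dependency, here is a minimal stdlib stand-in (the file contents and key names are made up for the demo):

```python
import os
import pathlib
import tempfile

def load_env(path: str) -> None:
    """Minimal stand-in for python-dotenv's load_dotenv():
    read KEY=VALUE lines from a .env file into os.environ."""
    for line in pathlib.Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())

# Demo with a temporary .env file holding placeholder keys.
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as f:
    f.write("FIRECRAWL_API_KEY=fc-demo\nOPENAI_API_KEY=sk-demo\n")
load_env(f.name)
print(os.environ["FIRECRAWL_API_KEY"])
```

In the real script you would simply `from dotenv import load_dotenv` and call `load_dotenv()` once at startup.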
What is the purpose of the decorators used with AgentOps?
-Decorators such as 'agentops.record_function' track the actions performed by the functions so they can be traced in the dashboard.
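To illustrate the decorator pattern AgentOps uses, here is a hypothetical stand-in for `agentops.record_function` (not the real SDK implementation) that logs each call with an event name and duration, the way AgentOps attributes events to functions in its dashboard:

```python
import functools
import time

def record_function(event_name: str):
    """Hypothetical stand-in for agentops.record_function: wrap a function
    so each call is recorded with its event name and duration."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.time()
            result = fn(*args, **kwargs)
            print(f"[{event_name}] finished in {time.time() - start:.3f}s")
            return result
        return wrapper
    return decorator

@record_function("scrape_web")
def crawl_web(url: str) -> str:
    # Placeholder body; the real function calls the Firecrawl API.
    return f"<markdown for {url}>"

print(crawl_web("https://agentops.ai"))
```

The real decorator additionally ships the event to the AgentOps backend so it shows up in the session's waterfall view.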
What is the purpose of the OpenAI client in the 'summarize text' function?
-The OpenAI client in the 'summarize text' function is used to send the collected web content to a large language model (LLM) and receive a summary.
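A sketch of how that call is assembled; the prompt wording is paraphrased from the video, and the actual API call is shown only in comments since it needs the openai package and a key:

```python
def build_summary_messages(web_text: str) -> list:
    """Assemble the chat messages for the summarization call
    (prompt paraphrased from the video, not quoted exactly)."""
    system_prompt = (
        "You are a web summarizer agent. Your task is to read through web "
        "content and tell me the name, title, and interesting facts about "
        "the company. Be clear and don't make things up."
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": web_text},
    ]

# The actual call, with the openai package installed and OPENAI_API_KEY set:
#   from openai import OpenAI
#   client = OpenAI()
#   response = client.chat.completions.create(
#       model="gpt-3.5-turbo",
#       messages=build_summary_messages(web_data),
#   )
#   summary = response.choices[0].message.content
messages = build_summary_messages("...scraped markdown...")
print(len(messages))
```

Because AgentOps patches the OpenAI client on import, this call is what produces the cost and latency entries seen later in the dashboard.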
How is agent performance measured and visualized over time?
-Agent performance is measured and visualized through the AgentOps dashboard, which shows the cost, duration, and success of the actions performed.
What are the benefits of the AgentOps dashboard for debugging and analysis?
-The dashboard enables detailed error analysis and performance monitoring by providing a timeline breakdown of events and a visualization of the agent's actions.
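Putting the pieces together, the overall flow of the script is: start a session, crawl, summarize, end the session. A minimal sketch, with the crawl and summarize steps injected as callables so it stays self-contained (the AgentOps calls are shown in comments and assumed from the video, not verified against the SDK):

```python
def run_pipeline(crawl_web, summarize_text, url: str) -> str:
    """The main flow described above: crawl a URL, then summarize the result.
    crawl_web and summarize_text are passed in to keep the sketch testable;
    in the video they are module-level functions instrumented by AgentOps."""
    web_data = crawl_web(url)
    return summarize_text(web_data)

# In the real script this is wrapped in an AgentOps session, roughly:
#   import agentops
#   agentops.init()              # starts a session and prints a replay link
#   summary = run_pipeline(crawl_web, summarize_text, "https://agentops.ai")
#   agentops.end_session("Success")
summary = run_pipeline(
    lambda url: f"content of {url}",        # stub crawler
    lambda text: f"summary: {text}",        # stub summarizer
    "https://agentops.ai",
)
print(summary)
```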
Outlines
🤖 Introduction to building AI agents
This video script introduces building AI agents with the AgentOps tool. The focus is on building a web crawler that gathers information from websites and presents it in a table. The process starts with an explanation of the project and the setup of AgentOps. Augment is mentioned as a faster alternative to Copilot because it indexes the entire codebase rather than working only in context. The workflow covers defining functions for web crawling and text summarization, using Firecrawl as a third-party scraping tool, and using OpenAI for summarization. Finally, a test run of the crawler shows how it collects data and how that data appears in the AgentOps dashboard.
🔍 Recording and monitoring agent functions
This section explains how to record data and monitor functions with AgentOps. It describes using a decorator to mark which functions are being executed and using the dashboard to track agent performance. It shows how 'record_function' is used to trace activity, how the decorator is added to the 'main' function to monitor the whole run, how the cost and latency of OpenAI requests are tracked when using a large language model (LLM), and how the summarized text is turned into a table.
🛠 Troubleshooting and error analysis
This section describes how to trace and fix errors with AgentOps. An error, 'ChatCompletion object is not subscriptable', points to incorrect use of the OpenAI client. The error is traced in the dashboard, and the section shows how to inspect the function's parameters and return values to identify the cause. It also covers using AgentOps to monitor multiple agents working together on tasks such as writing job postings.
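The error itself comes from treating the modern openai client's typed response object like a dictionary. A small demonstration using a stand-in object (the attribute layout mirrors the real ChatCompletion shape; the content string is made up):

```python
from types import SimpleNamespace

# A stand-in shaped like the ChatCompletion object the modern openai client returns.
response = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="AgentOps summary"))]
)

# Dict-style access raises a TypeError, mirroring the
# "'ChatCompletion' object is not subscriptable" error from the video:
try:
    _ = response["choices"]
except TypeError:
    failed = True

# Attribute access is the correct pattern with the current openai client:
print(response.choices[0].message.content)
```

The fix in the video's code is the same: replace subscript access with `response.choices[0].message.content`.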
📊 Monitoring agent performance and costs
This section focuses on monitoring agent performance and costs with AgentOps. It shows how to use the dashboard to analyze the cost and duration of agent activity, how to follow the chat history and agent actions over time, and how to track spending on LLMs such as OpenAI. It also highlights the benefits of AgentOps for improving agent performance and keeping costs under control.
🤝 Wrap-up and community support
The closing section summarizes the benefits of AgentOps for developing and monitoring AI agents. It points to the easy integration with various agent frameworks and the support of an active community on Discord. It emphasizes how AgentOps helps developers improve their agents through detailed analysis and monitoring. The section ends with an offer to record a podcast together.
Keywords
💡AgentOps
💡Web-Crawler
💡Summarize Text
💡OpenAI
💡Firecrawl
💡Requests Library
💡JSON
💡Decorator
💡Observability
💡CrewAI
Highlights
Building an AI agent that extracts web content and formats it into a table.
Using AgentOps to build and monitor AI agents.
Importing AgentOps for instrumentation and session management.
Creating a function that summarizes text with AI.
Using the Firecrawl framework for web scraping.
Installing Firecrawl via 'pip install firecrawl'.
Using the 'requests' library for API calls.
Loading environment variables with 'load_dotenv' in Python scripts.
Testing the web crawler function with a given URL.
Using the AgentOps dashboard to monitor agent activity.
Adding AgentOps decorators for function tracking.
Integrating OpenAI to summarize text into table form.
Tracking requests and costs for LLM providers through AgentOps.
Error handling and debugging with the AgentOps dashboard.
Monitoring agent performance with the AgentOps session history.
Analyzing agent errors and costs in the AgentOps dashboard.
Using AgentOps to develop and monitor agent-based tasks.
Integrating AgentOps with various agent frameworks and platforms.
Using AgentOps to run CI/CD tests for agents.
Overview of agent performance in the AgentOps dashboard.
Using AgentOps to improve agent development through analysis of errors and costs.
Transcripts
we are going to use CrewAI with
AgentOps to build AI agents so let's get
started sounds good we're going to make
a agent that can scrape the web fetch
information about what's on the website
and then format it in a table uh first
thing we do with every python file is we
create a um an explanation what we're
doing so we're going to do two things
number one we're going to create a
website scraper so we'll say we'll
create a summarize text function we'll
also create a def crawl web function yeah
import agentops obviously which code
assistant that you're using right now
right now I'm using augment which is a
um it's an alternative to co-pilot it's
a lot faster actually and it seems to
work a lot better cuz it scrapes your
entire database instead of just doing in
context yeah I need to try it I actually
I'll hook you up with the guys
afterwards all right sweet so let's
define our main we'll say agentops.init
we'll create this all we have to do is
run end session after all of this done
all right so we're going to say number
one uh web data equals crawl web and
then we'll say we'll make this a return
to string quick tip for the people
watching if you are confused at any part
you can take a screenshot and paste it
into ChatGPT and ask it to explain so we'll
say web data equals crawl web and we'll
say uh summarized text equals summarize
text on web data all right awesome so so
we're just storing the outputs of the
functions into simple variables exactly
so we're just going to call two
functions we're going to say crawl the web
and then with that web data we're going
to summarize it uh so we've yet to
implement what that looks like but we'll
figure that out just right now so uh the
crawl web I'm going to use a third party
tool called Firecrawl so Firecrawl is a
uh it's an easy web scraping framework
it's open source uh you can basically
just put in any website and we'll give
you just markdown text that you can feed
to any LLM it's super clean and super
easy so we're just going to use this guy
to create a simple WebCrawler um let's
look at the documentation see how we can
make it work a lot of people are not
sure like which crawler to use so we
just say like this one is the best from
your experience Firecrawl is probably
one of the easiest ones to use uh and it
works like super super well so uh let's
just get the code for this take the
python code assume that works and there
we go all right so um what are we
missing here
a paren awesome
we're just going to pip install firecrawl
real quick pip install firecrawl okay
we're going to say that a URL belongs
here it should be a string awesome and
we're just going to use the request
library to hit the API and then we're
going to just dump the JSON so
json.dumps and we should be good to
go um okay can you quickly go over the
entire function for people who might you
know not be following okay so what we're
doing here is we're just going to use
the requests library to hit the Firecrawl
API Firecrawl takes a simple parameter
takes the URL of the website we're
trying to scrape it will wait and then
it will just return exactly what we're
looking for at the end of it so we set
the headers so we just say application/json
which is pretty standard we
set an API key uh which is the Firecrawl API
key uh we just do requests.post we send
it to the URL with the headers and then
this body which should be the Json
containing the URL that we want to post
and then we just wait to see if we get
the right status code if it's uh not 200
then going to say it failed otherwise
we're going to return the status which
will or return the uh the body which
should be response.json() so uh all we're
going to do next is um one one other
thing we're going to do is make sure
that you load in your environmental
variables so we're going to do load_dotenv
so this makes sure that your
environmental variables are going to be
in your python script when you run it so
what we're going to do right now is
we're just going to test to make sure
that this actually works so uh we're
going to run this crawl web function
we're going to give it the URL
https://agentops.ai
uh we're going to run the uh web
crawler function and see what happens
python main.py let's see if this guy
runs okay no module named firecrawl just get
rid of that
guy okay so python main.py uh nothing
happened because we did not run
main okay so kind of a high level view
of the code again we defined our web
crawler with crawl web we have a
summarized text which is yet to be
defined and then we have a main function
that takes we're going to run agent ops.
init that basically kicks off our
session we're going to run the crawler
function and then we're going to end the
session to make sure it's success or
fail so I'm just going to run main.py
see what
happens okay so we got our session
replay ID right here uh right now it's
probably scraping the web so let's see
what happens all right so in our
dashboard we can see that this run took
10
seconds and we didn't record any data
though so the way that we record data
with agent Ops is you just add the
simple decorator function on top of your
function so we'll say agentops.
record_function we'll say scrape the web
and then we'll say @agentops.record_function
summarize text so this way
agent Ops actually knows which functions
are happening at at which moment so you
can actually Trace back exactly what's
happening we can even add it to the main
function too so we'll say a
main cool so let's see what happens next
let's run it one more time see what goes
on so at what point can you go into the
agent op agent Ops dashboard and start
seeing it there so as soon as you see
the session ID show up in the terminal
you're free to go check it and the
events should come streaming in
automatically okay so you can see the
session ID up here so that means the
events are automatically streaming in so
if we actually check out the link it
should start showing up so let's go
check it out all right awesome so we
could see that that web scraping event
probably took 13 seconds if we look down
at the waterfall graph we could see okay
so this was the the scraping the web
function uh we can see that it actually
got all the data from agent Ops so we
can see the content is all on this web
page so this is all text that I can feed
to my large language model all right
awesome so now what we're going to do is
now we're going to feed this to a large
language model and see if we can get a
table summary of it so let's import open
AI we can use any llm here we can use
Groq we can use Anthropic but OpenAI for
Simplicity so the magic about agent Ops
is that when you import almost any large
language model provider we automatically
track all the requests that go through
and we can track the cost we can track
the latency we track whether it actually
works for you so just by importing it
alone you're good to go so what we're
going to do here is uh we're going to
set an open AI client in the function so
let's say summarize text we'll say uh
client equals open AI open AI we set our
open a key uh and we'll say uh messages
equals your web summarizer agent your
task is to read through web content and
tell me the name title interesting facts
of the company the website does be clear
and don't make things up awesome so this
this looks actually pretty good to me
for a automatic completion uh so we're
going to take here is we have client.
chat.completions.create we're going to
set our model to GPT-3.5 Turbo we're
going to set our messages to messages so
so um that should give us a good way of
starting so let's run
main.py all right we got our session ID
up
here we're going to wait for that to
occur uh and then we can basically track
all the events happening in the agent
Ops dashboard so let's take a look here
uh we have one three events so far we
have the uh web scrape and then we have
the uh the open Ai call and we can see
here it costs less than a penny
and here's all the text it
took and here's our easy summary so it
tells us the name of the website agent
Ops title build AI agents with LM apps
interesting facts agent Ops is a
platform to build reliable AI agents and
monitoring awesome all this information
checks out one last step we're going to
take here is we're just going to add one
more open Ai call and we're going to say
uh response equals that we'll say
response text equals the text and we're
going to make one more open Ai call say
uh table messages
equals make the response text a
table Your Role is to summarize the
table the information below as a table
and
markdown okay awesome so we got the the
messages in here and we're just going to
copy paste the code from above and see
what we get
so say response equals client.
chat.completions.create reset our message to
this uh and we'll say all right awesome
and we'll just print that at the end
return
response
print response and we'll say return
response. choices. text and yeah that
should just about do it so uh and we're
just going to print here print
summarize text
got it so in the past few minutes we
have now created an agent that can crawl
the
web get that web data and then summarize
to a table and we're just going to spend
one more time trying to
um to uh run it also you can also see
for all prior runs every time you finish
a run you get a chat summary saying
exactly how much you spent for that run
so that way it's a lot easier of a way
to just track what's going on awesome so
it looks like we actually had some
errors here so now we can use AgentOps
to debug what the errors were in the
dashboard so I'm going to open up the
link and see exactly what
happens chat completion object is not
subscriptable okay so I can use this
traceback to see exactly what happened
here uh my guess is that I actually used
the open AI client incorrectly so that
way I can go back in time and see what
exactly occurred so we get a big error
bar here saying in the main function
which we decorated at the beginning uh
this was kind of like the encapsulating
function that caused the issue we can
see that the web scraping worked cuz
it's nice and blue but we have we have
an error that was attached to the llm uh
so we actually had a type error that
said chat completion object not
subscriptable and we could basically use
this as a way to read through what the
parameters of the function were and also
what the returns were and that way we
can basically see exactly what were
wrong and then fix that in our code we
just took an example CrewAI uh notebook
so essentially this agent takes three
separate agents that work together it
takes a research
analyst a writer agent and then a review
agent if we actually look at what these
agents do we can see that the research
analyst will analyze company websites
the writer agent will use the insights
from the website to create a detailed
engaging and enticing job posting and
then the final The Specialist the review
specialist will use the job posting to
create a clear more concise
grammatically correct job posting that
we can post on LinkedIn or indeed or any
other job websites uh so all we have to
do for agent Ops if we're going to plug
this in is two lines of code you just do
import agent Ops and agent ops. in it
and then optionally you can add any tags
so you can track your sessions more
easily and then you're good to go so
let's try running it see what we get and
we're also going to link this GitHub
repo below the video so people can just
clone it themselves all right so uh
we're going to run the python script so
just run main.py so it's going to ask us
two things number one what is the
company description and number two it's
actually going to give us a little handy
dandy link to where we can inspect the
session later so we'll say say agent Ops
does a i agent testing monitoring and
evals agents suck we're fixing that
company domain agentops.ai hiring
needs AI
Engineers specific benefits work with the
most cracked developers in San Francisco
all right so right now the agents are
all going to work together to spin up
and create this amazing job posting it's
going to scrape the web it's going to
use a set of tools and they're all going
to work and chat together to be able to
solve this problem all dynamically so
first thing that we notice here is that
the agents are kicking off uh one big
problem here is these things are very
yappy that's a technical term by the way
so these agents are just spewing tokens
all over my console and if I want to
debug this thing it's a huge pain in the
neck so you can see here there's just
like so much stuff going on um wall of
text wall of text exactly so being able
to just parse through this is a
nightmare to begin with uh secondarily
like we actually have no observability
what's going on what step of the
sequence is at how much it's costing us
how long it's going to go on for so
that's a huge pain in the neck but
thankfully since we have this agent off
session ID up here all we have to do is
click this link and it's going to open
our current session so our current
session we can see that we started this
about 2 minutes ago and it's already
cost us 14 cents in open AI credits
which is pretty crazy to think about we
get all of the environment information
about uh which environment we're running
on so it could be a pod or Docker
container you name it uh we're running
on my MacBook right now but kind of more
interestingly we can see like the entire
chat breakdown so remember how we spun
up three different agents we can
actually see that we have the research
analyst the job description writer and
the review specialist all here in the
timeline and we can filter down by those
if we really wanted to um also so here's
the full chat history so instead of
parsing through again this really messy
block of text we can just watch it
through here which is a huge time saver
uh lastly we actually get this like
Dynamic waterfall breakdown of all the
actions that the agent was taking all
the tool calls like searching websites
all the llm calls that use GPT-4 so we
can see this one cost us 4 cents we can
see the entire prompt and completion
here and all the reasoning that it goes
through uh I'm going to go back to a
previous session I ran in the past and
just show you how powerful this could
actually
be yeah I mean this dashboard is just
going to let people understand agents
much faster because especially when
people are new to building agents it's
got hard to understand what they're
doing right but in AgentOps you can
clearly see that every step of the
way basically we are taking agents we a
black box agents used to be a black box
we're making it so we're taking a
flashlight you can see exactly what's
going on you see how much they cost you
see how long they take you can see
actually how they solve problems how
they reason through things so here's an
agent for example that ran and cost me
almost $10 an open AI credits and took
almost 12 minutes took 11 minutes and 8
seconds uh and you can see actually had
it stopped prematurely I can see the
whole end session reason the sign
detected but also all the environment
variables uh and I can reason through
like okay so which uh agent was doing
the most amount of time so I could see
all the agents let's suppose I want to
look at the review specialist and see
exactly what was happening I could see
that the review Specialist made 44 tool
calls and 40 llm calls so that's
probably a big mistake and one big
challenge here is that had this repeat
thought and the repeat thought was that
it was constantly doing the same thing
over and over again so we have this
repeat thought catcher that can
basically say sometimes the agents going
circles how do we make sure they stopped
going in circles and so we know that
agent the the review specialist in
particular was problematic so we can go
back in time and essentially rewire the
prompt to make sure that it
works so I'm going to go look at the
entire timeline of events and see
exactly what could have been changed
here so for example here oh what's this
there's a big red error I wonder what
that could have been so basically we
actually ran out of context length that
this agent was so yappy so we got a 400
error code we consumed over 8,000 tokens
uh because we used 88,200 38 tokens and
so that actually showed a huge stack
Trace error that we could have prevented
by doing better context management so
this is a huge step up from instead of
just looking at the terminal and trying
to figure out what's going on you
basically have a giant error dashboard
just says okay super easy to do through
here give it a shot and also you can use
any of the main agent Frameworks
right we support most of the main agent
Frameworks so we're totally plugged in
with CrewAI you can just pip install
agent Ops and pip install crew uh with a
special branch and you're good to go
also we're native on pyautogen which
is the Microsoft framework uh for
building multi-agent Frameworks uh if
you're building agents of Lang chain we
work with that and we're rolling out
support with uh llama index later this
week nice so in addition to actually
looking at individual sessions I can get
an overall view of how all of my agents
are performing over time so I've been
able to look at individual sessions but
how about the aggregate so here's a
session breakdown so I'm going to select
agents I've run in the past 30 days we
can see most of my agents uh actually
never completed 36 of them never
completed and I can see like basically
all of the highle metrics I care about
to know what's causing the agents to
fail for example this one I interrupted
about six of them but a lot of them also
uh come from these 429 error codes a lot
of them come from 400 error codes so on
and so forth so I can use that as
information to understand why my agents
Break um more interestingly I can also
see how much I've been spending on these
things agents again super expensive
right now they're expensive and they're
unreliable and you want to have
observability to exactly why they're
breaking and how much they cost you
otherwise you're not going to scale the
production and they're not going to
change the world if they're too
expensive and too slow so that's how
this dashboard gives you a high level
view of exactly what the agents are
doing it's going also sort of like a
personal history of you know developing
agents like for example on GitHub you
have the heat map of when you're active
right so this is kind of similar where
you can see like how many agents you've
built and how you've improved over time
maybe reduced the amount of Errors
you're getting stuff like
that yeah we're working in uh actually a
handful of like cicd things so if you're
trying to roll out a test kit for your
agents we also have that covered too
okay so for example if you're running
agents against variety of like different
tests we have several thousand actually
loaded into the platform right now so
here's an agent configuration test
called Web Arena web arena is basically
an open source evaluation set where you
can run agents against websites and see
how well they perform on doing web tasks
so for example on this website the task
was to find the top selling brand in q1
2022 and here's the website we can take
a look and see exactly what that looks
like so some sort of dashboard the
agent's job is to basically log into the
dashboard and find that information we
can see that the agent failed the answer
was supposed to be Sprite and it gave
the wrong answer so that means the
evaluation failed but we can do this for
human evals exact evals fuzzy matches
you name it all this information you can
basically track and see how your agents
are increasing or losing performance
over time W that's that's actually super
powerful uh the way it works is we do
all sorts of thing everything related to
agent tracking agent testing and agent
monitoring so replay Analytics LM cost
tracking benchmarking compliance we have
a ton of Integrations with Frameworks
like LangChain CrewAI and AutoGen uh
and all it really takes is two lines of
code really it's that easy all you have
to do is import agent Ops and run agent
Ops out in it optionally if you want to
track specific functions you just run
this agent ops. record function and it
automatically adds to your dashboard the
only other thing you might have to do
and it's totally optional is just end
your session and this way you can see
whether your agents succeeded or your
agents failed and all from doing that
you get these really fancy dashboards
that show you exactly what your agents
were doing you have the ability to
rewind and understand what your agents
cost and how much they take in terms of
time and how much compute they take and
then lastly you get these like waterfall
diagrams showing you exactly what they
were doing at any given moment uh at any
given moment in time you can see how
much they cost in terms of llms whether
they're errors you can see if you have
interruptions in your services it just
all works perfectly out of the box by
the way guys if you get stuck at any
point building with agent Ops they have
a really cool Discord server with over
1,000 members but it's super active and
you can ask any question so even though
it's like three lines of code sometimes
you know you can get unexpected errors
and you know I'm sure Alex or other
people will help you out yeah man this
was super fun uh I'm really glad to put
this together man your channel is sick
like I was watching some of the
interviews like I I really love how
you're growing so it's really really
cool I mean we can do a podcast too like
just me and you if you want to