Watch how a Pro develops AI Agents in real-time

David Ondrej
15 Jun 2024 · 20:28

Summary

TL;DR: This script demonstrates building an AI agent that extracts web content and formats it into a table. The AgentOps platform is used to monitor and analyze agent performance. The steps of the process are illustrated using Firecrawl for scraping and OpenAI for text summarization. The video also shows how to diagnose errors with the AgentOps dashboard and how to review cost and performance analytics across different agent runs.

Takeaways

  • 🤖 The discussion covers building an AI agent that can extract web content and present it in table form.
  • 🔍 The agent crawls websites and gathers information via a 'crawl_web' function.
  • 📝 Web content is condensed by a 'summarize_text' function, which is implemented during the session.
  • 🛠 Web crawling uses the open-source framework Firecrawl, which returns clean markdown text.
  • 🔑 API keys and the 'requests' library are used to communicate with the Firecrawl API.
  • 🔄 Environment variables must be loaded so they are available in the Python script when it runs.
  • 📊 AgentOps provides dashboards for tracking and analyzing the flow and cost of AI agents.
  • 🔎 AgentOps can track and analyze interactions and costs across different large language models (LLMs).
  • 🚀 Monitoring AI agents with AgentOps improves reliability and speeds up troubleshooting.
  • 💡 AgentOps makes it easy to identify and fix errors by retracing the agents' steps and decisions in the dashboard.
  • 🔗 AgentOps integrates with various agent frameworks and can easily be added to existing projects.
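The workflow these takeaways describe boils down to a short script skeleton. This is a sketch, not the video's exact code: the AgentOps calls are shown as comments because the SDK is assumed here rather than verified, and `crawl_web` / `summarize_text` are placeholder stubs for the real implementations.

```python
# Skeleton of the script built in the video (stubs only).
def crawl_web(url: str) -> str:
    # Real version: POST the URL to the Firecrawl API, return markdown.
    return f"markdown scraped from {url}"

def summarize_text(text: str) -> str:
    # Real version: send the markdown to an OpenAI model for a table summary.
    return f"summary of: {text}"

def main() -> str:
    # agentops.init()                  # start a monitored session
    web_data = crawl_web("https://agentops.ai")
    summary = summarize_text(web_data)
    # agentops.end_session("Success")  # mark the run as success or fail
    return summary

print(main())
```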

Q & A

  • What is the main goal of the project described in the discussion?

    -The main goal is to build an AI agent that can crawl websites and gather information, then format it into a table.

  • What functions should the AI agent have?

    -The agent has two main functions: first, it crawls content from websites, and second, it condenses that content into a clear table.

  • Which tool is used for web crawling?

    -Web crawling uses Firecrawl, an open-source tool that provides an easy way to collect content from websites.
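A rough sketch of such a crawl function follows. The endpoint path, payload shape, and environment-variable name here are assumptions modeled on Firecrawl's public API, not the video's exact code.

```python
import os
import requests

# Hypothetical endpoint; check Firecrawl's docs for the current path.
FIRECRAWL_URL = "https://api.firecrawl.dev/v0/scrape"

def build_payload(url: str) -> dict:
    """JSON body sent to the API: just the URL to scrape."""
    return {"url": url}

def crawl_web(url: str) -> dict:
    headers = {
        "Content-Type": "application/json",
        # FIRECRAWL_API_KEY is an assumed variable name.
        "Authorization": f"Bearer {os.environ['FIRECRAWL_API_KEY']}",
    }
    response = requests.post(FIRECRAWL_URL, headers=headers,
                             json=build_payload(url))
    if response.status_code != 200:
        raise RuntimeError(f"Scrape failed with status {response.status_code}")
    return response.json()  # the LLM-ready markdown lives inside this body
```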

  • What is 'Augment' and how is it used in this project?

    -Augment is a code assistant comparable to Copilot; it is used in this project to speed up and improve the process of writing code.

  • How is the summary of the web content created?

    -The summary is produced by a separate function called 'summarize_text', which presents the collected data in a clear form.

  • What role does the AgentOps platform play in this project?

    -AgentOps monitors and analyzes the AI agent's activity, including tracking the cost, latency, and success of each request it makes.

  • How are the environment variables made available to the Python script?

    -The 'load_dotenv' function is called, which loads the environment variables into the Python script.
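The actual call, from the python-dotenv package, is `load_dotenv()`. As a stdlib-only sketch of what it does (the real package handles more edge cases), it reads KEY=VALUE lines from a `.env` file into `os.environ`:

```python
import os

def load_env(path: str = ".env") -> None:
    """Minimal stand-in for python-dotenv's load_dotenv():
    read KEY=VALUE lines from a .env file into os.environ."""
    if not os.path.exists(path):
        return
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            key, _, value = line.partition("=")
            # Variables already set in the real environment take priority.
            os.environ.setdefault(key.strip(), value.strip())
```

In the video's script the real call would simply be `from dotenv import load_dotenv` followed by `load_dotenv()`.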

  • What do the decorators in the AgentOps functions do?

    -Decorators such as 'agentops.record_function' mark the actions each function performs so they can be traced in the dashboard.

  • What is the purpose of the OpenAI client in the 'summarize_text' function?

    -The OpenAI client sends the collected web content to a large language model (LLM) and receives a summary back.
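A hedged sketch of that function using the openai v1 client follows; the system prompt is paraphrased from the video, and the model/message shape is standard Chat Completions usage rather than a verified copy of the on-screen code.

```python
def build_messages(web_content: str) -> list:
    """System prompt paraphrased from the video, plus the scraped text."""
    system = ("You are a web summarizer agent. Read through the web content "
              "and tell me the name, title, and interesting facts about the "
              "company. Be clear and don't make things up.")
    return [{"role": "system", "content": system},
            {"role": "user", "content": web_content}]

def summarize_text(web_content: str) -> str:
    # Imported inside the function so build_messages stays usable
    # without the openai package installed.
    from openai import OpenAI
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=build_messages(web_content),
    )
    # Attribute access, not subscripting: the response is an object.
    return response.choices[0].message.content
```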

  • How is agent performance measured and visualized over time?

    -Performance is measured and visualized in the AgentOps dashboard, which shows the cost, duration, and success of each action the agent takes.

  • What are the benefits of the AgentOps dashboard for debugging and analysis?

    -The dashboard enables detailed error analysis and performance monitoring by providing a timeline breakdown of events and a visualization of the agent's actions.

Outlines

00:00

🤖 Introduction to Building AI Agents

This video script introduces building AI agents with the AgentOps tool. The focus is on building a web crawler that gathers information from websites and presents it in a table. The process starts with an explanation of the project and the setup of AgentOps. Augment is used as a faster alternative to Copilot because it searches the entire codebase rather than only the local context. The walkthrough covers defining functions for web crawling and text summarization, using Firecrawl as a third-party scraping tool, and calling OpenAI for summarization. Finally, a test run of the crawler shows how it collects data and how the run appears in the AgentOps dashboard.

05:01

🔍 Recording and Monitoring Agent Functions

This passage explains how to record data and monitor functions with AgentOps. It describes how a decorator marks which functions are being executed and how the dashboard tracks agent performance. The 'record_function' decorator is applied to each function, and to 'main' as well, so that all agent activity is captured. The section also shows how AgentOps tracks the cost and latency of OpenAI requests when a large language model (LLM) is used, and how the summarized text is turned into a table.

10:01

🛠 Troubleshooting and Error Analysis

This section shows how to trace and fix errors with AgentOps. A "ChatCompletion object is not subscriptable" error appears, caused by using the OpenAI client incorrectly. The error is retraced in the dashboard, where the function's parameters and return values can be inspected to identify the cause. The section also introduces using AgentOps to monitor multiple agents working together on tasks such as writing a job posting.
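The error comes from treating the openai v1 response object like the older dict-style responses. A tiny stand-in object (no API call, purely illustrative) reproduces the difference:

```python
class Message:
    def __init__(self, content):
        self.content = content

class Choice:
    def __init__(self, message):
        self.message = message

class ChatCompletion:
    """Mimics the openai v1 response: attributes only, no __getitem__."""
    def __init__(self, text):
        self.choices = [Choice(Message(text))]

response = ChatCompletion("| Name | AgentOps |")

try:
    response["choices"]  # dict-style access: the bug shown in the video
except TypeError:
    print("'ChatCompletion' object is not subscriptable")

print(response.choices[0].message.content)  # attribute access works
```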

15:01

📊 Monitoring Agent Performance and Cost

This section focuses on monitoring agent performance and cost with AgentOps. It shows how the dashboard is used to analyze the cost and duration of agent activity, how to follow the chat history and the agents' actions over time, and how to track spend on LLMs such as OpenAI's models. The benefits of AgentOps for improving agent performance and controlling cost are also discussed.

20:03

🤝 Wrap-Up and Community Support

The closing section summarizes the benefits of AgentOps for developing and monitoring AI agents. It points to easy integration with various agent frameworks and to support from an active Discord community. AgentOps helps developers improve their agents by enabling detailed analysis and monitoring. The section ends with an invitation to continue the conversation in a podcast.


Keywords

💡AgentOps

AgentOps is a platform for building and monitoring reliable AI agents. The video shows how AgentOps is used while building an agent-based web crawler that gathers and summarizes information, and how it tracks the interactions and costs of each function, which is essential for developing and optimizing agents.

💡Web Crawler

A web crawler is a program that automatically visits websites and collects their content. In the video, a web crawler extracts data from a website so it can later be processed by a language model. The crawler is implemented using the Firecrawl API.

💡Summarize Text

'Summarize text' refers to the function that condenses the collected web content into a shorter summary. This is done with a language model capable of identifying and presenting the key information.

💡OpenAI

OpenAI is the company behind the language-model platform used in the video. OpenAI provides models such as GPT-3.5, which are used for text summarization and generation. In the video, OpenAI is used to automate the text summarization.

💡Firecrawl

Firecrawl is an open-source web-scraping framework mentioned in the video. It crawls websites and returns clean, LLM-ready markdown text, and serves as the easy-to-use scraping tool in the demonstration.

💡Requests Library

The requests library is a popular Python package for sending and receiving HTTP requests. In the video, it is used to call the Firecrawl API and fetch the scraped data.

💡JSON

JSON (JavaScript Object Notation) is a data-interchange format. In the video, JSON is used to send the request body to the Firecrawl API and to receive the results, which is how the web crawler communicates with the API.

💡Decorator

In Python, a decorator is a function that extends or modifies the behavior of another function without changing its code. In the video, an AgentOps decorator is used to track and analyze the functions as they run.

💡Observability

Observability is the ability to observe and measure a system's state and behavior without direct access to its internals. The video highlights the observability AgentOps provides for monitoring and analyzing the behavior and performance of the agents.

💡Crew AI

Crew AI is a framework for building teams of AI agents that collaborate on a workflow. In the video, a Crew AI example with three cooperating agents is used to generate a job posting.

Highlights

Building an AI agent that extracts web content and formats it into a table.

Using AgentOps to create and monitor AI agents.

Importing AgentOps for session management in code.

Creating a function that summarizes text with AI.

Using the Firecrawl framework for web scraping.

Installing Firecrawl's Python package with pip.

Using the requests library for API calls.

Loading environment variables with load_dotenv in Python scripts.

Testing the web-crawler function on a given URL.

Using the AgentOps dashboard to monitor agent activity.

Adding AgentOps decorators to record function calls.

Calling OpenAI to summarize text into table form.

Tracking requests and costs across LLM providers with AgentOps.

Debugging errors with the AgentOps dashboard.

Monitoring agent performance with the AgentOps session history.

Analyzing agent errors and costs in the AgentOps dashboard.

Using AgentOps for developing and monitoring agent-based tasks.

Integrating AgentOps with various agent frameworks and platforms.

Using AgentOps to run CI/CD tests for agents.

Reviewing agent performance in the AgentOps dashboard.

Improving agent development with AgentOps by analyzing errors and costs.

Transcripts

play00:00

we are going to use crew AI with agent

play00:02

Ops to build AI agents so let's get

play00:05

started sounds good we're going to make

play00:07

a agent that can scrape the web fetch

play00:10

information about what's on the website

play00:12

and then format it in a table uh first

play00:13

thing we do with every python file is we

play00:16

create a um an explanation what we're

play00:18

doing so we're going to do two things

play00:19

number one we're going to create a

play00:20

website scraper so we'll say We'll

play00:22

create summarized text function we'll

play00:24

also create def craw web function yeah

play00:27

import agent Ops obviously which code

play00:30

assistant that you're using right now

play00:31

right now I'm using augment which is a

play00:33

um it's an alternative to co-pilot it's

play00:35

a lot faster actually and it seems to

play00:36

work a lot better cuz it scrapes your

play00:38

entire database instead of just doing in

play00:40

context yeah I need to try it I actually

play00:42

I'll hook you up with the guys

play00:43

afterwards all right sweet so let's

play00:45

define our M we'll say agent ops. init

play00:48

we'll create this all we have to do is

play00:50

run end session after all of this done

play00:53

all right so we're going to say number

play00:54

one uh web data equals crawl web and

play01:00

then we'll say we'll make this a return

play01:02

to string quick tip for the people

play01:04

watching if you are confused at any part

play01:06

you can take a screenshot and paste it

play01:07

into cgbd and ask it to explain so we'll

play01:10

say web data equals craw web and we'll

play01:13

say uh summarized text equals summarize

play01:18

text on web data all right awesome so so

play01:21

we're just storing the outputs of the

play01:23

functions into simple variables exactly

play01:26

so we're just going to call two

play01:27

functions we're going say craw the web

play01:29

and then with that web data we're going

play01:30

to summarize it uh so we've yet to

play01:32

implement what that looks like but we'll

play01:34

figure that out just right now so uh the

play01:37

crawl web I'm going to use a third party

play01:39

tool called fir crawl so fir crawl is a

play01:43

uh it's an easy web scraping framework

play01:47

it's open source uh you can basically

play01:49

just put in any website and we'll give

play01:51

you just markdown text that you can feed

play01:53

to any LM it's super clean and super

play01:55

easy so we're just going to use this guy

play01:56

to create a simple WebCrawler um let's

play02:00

look at the documentation see how we can

play02:01

make it work a lot of people are not

play02:04

sure like which crawler to use so we

play02:06

just say like this one is the best from

play02:08

your experience fir crawl is probably

play02:10

one of the easiest ones to use uh and it

play02:12

works like super super well so uh let's

play02:15

just get the code for this take the

play02:17

python code assume that works and there

play02:22

we go all right so um what are we

play02:26

missing here

play02:27

pren awesome

play02:30

we're just go into pip install fir crawl

play02:32

real quick pip install fir crawl okay

play02:36

we're going to say that a URL belongs

play02:38

here it should be a string awesome and

play02:40

we're just going to use the request

play02:41

library to hit the API and then we're

play02:43

going to jump uh just dump the Json so

play02:47

json. dump best and we should be good to

play02:49

go um okay can you quickly go over the

play02:52

entire function for people who might you

play02:53

know not be following okay so what we're

play02:55

doing here is we're just going to use

play02:57

the request library to hit the fire CRA

play02:59

AP fir crawl takes a simple parameter

play03:02

takes the URL of the website we're

play03:03

trying to scrape it will wait and then

play03:05

it will just return exactly what we're

play03:07

looking for at the end of it so we set

play03:09

the headers so we just say application

play03:11

jsol which is pretty stand standard we

play03:13

set an API key uh which is the fire API

play03:16

key uh we just do request. poost we send

play03:19

it to the URL with the headers and then

play03:21

this body which should be the Json

play03:23

containing the URL that we want to post

play03:25

and then we just wait to see if we get

play03:26

the right status code if it's uh not 200

play03:29

then going to say it failed otherwise

play03:30

we're going to return the status which

play03:32

will or return the uh the body which

play03:34

should be response. Json so uh all we're

play03:37

going to do next is um one one other

play03:41

thing we're going to do is make sure

play03:42

that you load in your environmental

play03:43

variables so we're going to do load. EnV

play03:48

so this makes sure that your

play03:49

environmental variables are going to be

play03:51

in your python script when you run it so

play03:53

what we're going to do right now is

play03:54

we're just going to test to make sure

play03:55

that this actually works so uh we're

play03:57

going to run this craw web function

play04:00

we're going to give it the URL

play04:02

htps agent ops.

play04:05

a uh we're going to run the uh web

play04:09

crawler function and see what happens

play04:11

python main.py let's see if this guy

play04:13

runs okay no module Nam fire C just get

play04:16

rid of that

play04:19

guy okay so python main.py uh nothing

play04:23

happened because we did not run

play04:27

main okay so kind of a high level view

play04:30

of the code again we defined our web

play04:31

crawler with crawl web we have a

play04:33

summarized text which is yet to be

play04:34

defined and then we have a main function

play04:36

that takes we're going to run agent ops.

play04:38

init that basically kicks off our

play04:40

session we're going to run the crawler

play04:41

function and then we're going to end the

play04:43

session to make sure it's success or

play04:44

fail so I'm just going to run main.py

play04:47

see what

play04:48

happens okay so we got our session

play04:50

replay ID right here uh right now it's

play04:53

probably scraping the web so let's see

play04:55

what happens all right so in our

play04:56

dashboard we can see that this run took

play04:59

10

play05:00

seconds and we didn't record any data

play05:03

though so the way that we record data

play05:04

with agent Ops is you just add the

play05:06

simple decorator function on top of your

play05:08

function so we'll say AO or agent ops.

play05:12

record function we'll say scrape the web

play05:14

and then we'll say agent Ops at repord

play05:17

function summarize text so this way

play05:21

agent Ops actually knows which functions

play05:22

are happening at at which moment so you

play05:25

can actually Trace back exactly what's

play05:26

happening we can even add it to the main

play05:28

function too so we'll say a

play05:30

main cool so let's see what happens next

play05:33

let's run it one more time see what goes

play05:35

on so at what point can you go into the

play05:38

agent op agent Ops dashboard and start

play05:41

seeing it there so as soon as you see

play05:43

the session ID show up in the terminal

play05:45

you're free to go check it and the

play05:46

events should come streaming in

play05:48

automatically okay so you can see the

play05:50

session ID up here so that means the

play05:52

events are automatically streaming in so

play05:53

if we actually check out the link it

play05:55

should start showing up so let's go

play05:57

check it out all right awesome so we

play05:59

could see that that web scraping event

play06:00

probably took 13 seconds if we look down

play06:02

at the waterfall graph we could see okay

play06:05

so this was the the scraping the web

play06:07

function uh we can see that it actually

play06:09

got all the data from agent Ops so we

play06:10

can see the content is all on this web

play06:13

page so this is all text that I can feed

play06:15

to my large language model all right

play06:17

awesome so now what we're going to do is

play06:21

now we're going to feed this to a large

play06:22

language model and see if we can get a

play06:23

table summary of it so let's import open

play06:26

AI we can use any llm here we can use

play06:29

gro we can use anthropic but open AI for

play06:31

Simplicity so the magic about agent Ops

play06:34

is that when you import almost any large

play06:36

language model provider we automatically

play06:39

track all the requests that go through

play06:40

and we can track the cost we can track

play06:42

the latency we track whether it actually

play06:43

works for you so just by importing it

play06:45

alone you're good to go so what we're

play06:47

going to do here is uh we're going to

play06:49

set an open AI client in the function so

play06:52

let's say summarize text we'll say uh

play06:55

client equals open AI open AI we set our

play06:59

open a key uh and we'll say uh messages

play07:04

equals your web summarizer agent your

play07:06

task is to read through web content and

play07:08

tell me the name title interesting facts

play07:10

of the company the website does be clear

play07:12

and don't make things up awesome so this

play07:15

this looks actually pretty good to me

play07:16

for a automatic completion uh so we're

play07:19

going to take here is we have client.

play07:22

chat. completion. create we're going to

play07:25

set our model to GPD 3.5 turbo we're

play07:27

going to set our messages to messages so

play07:29

so um that should give us a good way of

play07:34

starting so let's run

play07:35

main.py all right we got our session ID

play07:38

up

play07:38

here we're going to wait for that to

play07:41

occur uh and then we can basically track

play07:44

all the events happening in the agent

play07:46

Ops dashboard so let's take a look here

play07:49

uh we have one three events so far we

play07:52

have the uh web scrape and then we have

play07:55

the uh the open Ai call and we can see

play07:57

here it costs less than a penny

play08:00

and here's all the text it

play08:01

took and here's our easy summary so it

play08:04

tells us the name of the website agent

play08:07

Ops title build AI agents with LM apps

play08:09

interesting facts agent Ops is a

play08:11

platform to build reliable AI agents and

play08:12

monitoring awesome all this information

play08:14

checks out one last step we're going to

play08:16

take here is we're just going to add one

play08:18

more open Ai call and we're going to say

play08:21

uh response equals that we'll say

play08:24

response text equals the text and we're

play08:28

going to make one more open Ai call say

play08:30

uh table messages

play08:33

equals make the response text a

play08:37

table Your Role is to summarize the

play08:42

table the information below as a table

play08:46

and

play08:47

markdown okay awesome so we got the the

play08:50

messages in here and we're just going to

play08:51

copy paste the code from above and see

play08:53

what we get

play08:56

so say response equals client.

play09:00

completion. create reset our message to

play09:02

this uh and we'll say all right awesome

play09:06

and we'll just print that at the end

play09:09

return

play09:12

response

play09:13

print response and we'll say return

play09:17

response. choices. text and yeah that

play09:21

should just about do it so uh and we're

play09:24

just going to print here print

play09:27

summarize text

play09:29

got it so in the past few minutes we

play09:32

have now created an agent that can crawl

play09:34

the

play09:35

web get that web data and then summarize

play09:38

to a table and we're just going to spend

play09:39

one more time trying to

play09:42

um to uh run it also you can also see

play09:46

for all prior runs every time you finish

play09:49

a run you get a chat summary saying

play09:50

exactly how much you spent for that run

play09:52

so that way it's a lot easier of a way

play09:53

to just track what's going on awesome so

play09:56

it looks like we actually had some

play09:57

errors here so now we can use Asian Ops

play09:59

to bug what the errors were in the

play10:01

dashboard so I'm going to open up the

play10:02

link and see exactly what

play10:04

happens chat completion object is not

play10:07

subscriptable okay so I can use this

play10:09

traceback to see exactly what happened

play10:11

here uh my guess is that I actually used

play10:13

the open AI client incorrectly so that

play10:15

way I can go back in time and see what

play10:17

exactly occurred so we get a big error

play10:18

bar here saying in the main function

play10:20

which we decorated at the beginning uh

play10:23

this was kind of like the encapsulating

play10:24

function that caused the issue we can

play10:26

see that the web scraping worked cuz

play10:27

it's nice and blue but we have we have

play10:29

an error that was attached to the llm uh

play10:32

so we actually had a type error that

play10:33

said chat completion object not

play10:34

subscriptable and we could basically use

play10:36

this as a way to read through what the

play10:38

parameters of the function were and also

play10:41

what the returns were and that way we

play10:43

can basically see exactly what were

play10:44

wrong and then fix that in our code we

play10:47

just took an example crew AI uh notebook

play10:49

so essentially this agent takes three

play10:52

separate agents that work together it

play10:54

takes a research

play10:56

analyst a writer agent and then a review

play10:59

agent if we actually look at what these

play11:00

agents do we can see that the research

play11:03

analyst will analyze company websites

play11:05

the writer agent will use the insights

play11:08

from the website to create a detailed

play11:10

engaging and enticing job posting and

play11:13

then the final The Specialist the review

play11:15

specialist will use the job posting to

play11:17

create a clear more concise

play11:19

grammatically correct job posting that

play11:20

we can post on LinkedIn or indeed or any

play11:23

other job websites uh so all we have to

play11:26

do for agent Ops if we're going to plug

play11:28

this in is two lines of code you just do

play11:31

import agent Ops and agent ops. in it

play11:34

and then optionally you can add any tags

play11:35

so you can track your sessions more

play11:36

easily and then you're good to go so

play11:39

let's try running it see what we get and

play11:41

we're also going to link this GitHub

play11:42

repo below the video so people can just

play11:44

clone it themselves all right so uh

play11:46

we're going to run the python script so

play11:49

just run main.py so it's going to ask us

play11:52

two things number one what is the

play11:53

company description and number two it's

play11:54

actually going to give us a little handy

play11:55

dandy link to where we can inspect the

play11:58

session later so we'll say say agent Ops

play12:00

does a i agent testing monitoring and

play12:06

evals agents suck we're fixing that

play12:10

company domain agent ops. hiring

play12:14

needs AI

play12:17

Engineers speciic benefits work with the

play12:21

most cracked developers in San Francisco

play12:26

all right so right now the agents are

play12:27

all going to work together to spin up

play12:30

and create this amazing job posting it's

play12:31

going to scrape the web it's going to

play12:32

use a set of tools and they're all going

play12:34

to work and chat together to be able to

play12:35

solve this problem all dynamically so

play12:39

first thing that we notice here is that

play12:40

the agents are kicking off uh one big

play12:43

problem here is these things are very

play12:44

yappy that's a technical term by the way

play12:47

so these agents are just spewing tokens

play12:48

all over my console and if I want to

play12:50

debug this thing it's a huge pain in the

play12:51

neck so you can see here there's just

play12:53

like so much stuff going on um wall of

play12:57

text wall of text exactly so being able

play13:00

to just parse through this is a

play13:02

nightmare to begin with uh secondarily

play13:04

like we actually have no observability

[13:05] …what's going on, what step of the sequence it's at, how much it's costing us, how long it's going to go on for, so that's a huge pain in the neck. But thankfully, since we have this AgentOps session ID up here, all we have to do is click this link and it opens our current session. We can see that we started it about 2 minutes ago and it's already cost us 14 cents in OpenAI credits, which is pretty crazy to think about. We get all of the information about which environment we're running on; it could be a pod or a Docker container, you name it. We're running on my MacBook right now. More interestingly, we can see the entire chat breakdown. Remember how we spun up three different agents? We can see the research analyst, the job description writer, and the review specialist all here in the timeline, and we can filter down by those if we really wanted to. Also, here's the full chat history, so instead of parsing through that really messy block of text again, we can just watch it through here, which is a huge time saver. Lastly, we get this dynamic waterfall breakdown of all the actions the agent was taking: all the tool calls, like searching websites, and all the LLM calls that use GPT-4. We can see this one cost us 4 cents, and we can see the entire prompt and completion here, plus all the reasoning it goes through. I'm going to go back to a previous session I ran in the past and show you how powerful this can actually be.

[14:22] Yeah, I mean, this dashboard is just going to let people understand agents much faster, because especially when people are new to building agents, it's hard to understand what they're doing, right? But in AgentOps you can clearly see it every step of the way. Basically, agents used to be a black box; we're taking a flashlight to them so you can see exactly what's going on. You see how much they cost, you see how long they take, and you can see how they actually solve problems and reason through things. So here's an agent, for example, that ran, cost me almost $10 in OpenAI credits, and took almost 12 minutes (11 minutes and 8 seconds). You can see it actually stopped prematurely; I can see the whole end-session reason that was detected, and also all the environment variables. And I can reason through it: okay, which agent was taking the most time? I can see all the agents. Suppose I want to look at the review specialist and see exactly what was happening. I can see that the review specialist made 44 tool calls and 40 LLM calls, so that's probably a big mistake. One big challenge here is that it had this repeat thought: it was constantly doing the same thing over and over again. We have a repeat-thought catcher for that, because sometimes agents go in circles, and we need to make sure they stop going in circles. So we know the review specialist in particular was problematic, and we can go back in time and essentially rewire the prompt to make sure it works.

[15:39] So I'm going to look at the entire timeline of events and see exactly what could have been changed here. For example, oh, what's this? There's a big red error; I wonder what that could have been. It turns out we actually ran out of context length: this agent was so yappy that we got a 400 error code. We blew past the roughly 8,000-token limit because we used 88,238 tokens, and that showed up as a huge stack trace error that we could have prevented by doing better context management.
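One simple way to do that context management (my own sketch, not something shown in the video) is to trim the oldest non-system messages before each LLM call so the history stays under a token budget. The 4-characters-per-token estimate is a rough assumption; a real setup would use a proper tokenizer.

```python
# Sketch: keep a chat history under a rough token budget so an LLM call
# doesn't blow past the model's context window. The 4-chars-per-token
# estimate and the budget value are assumptions, not AgentOps specifics.

def estimate_tokens(message: dict) -> int:
    """Very rough token count: about 4 characters per token."""
    return max(1, len(message["content"]) // 4)

def trim_history(messages: list[dict], budget: int = 8000) -> list[dict]:
    """Drop the oldest non-system messages until the history fits the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    total = sum(estimate_tokens(m) for m in system + rest)
    while rest and total > budget:
        total -= estimate_tokens(rest.pop(0))  # drop the oldest message first
    return system + rest

# Example: a yappy agent that accumulated ten huge messages.
history = [{"role": "system", "content": "You are a review specialist."}]
history += [{"role": "user", "content": "x" * 4000} for _ in range(10)]
trimmed = trim_history(history, budget=8000)
print(len(trimmed))  # → 8 (system prompt kept, oldest three user turns dropped)
```

Calling `trim_history` right before each completion request keeps the request under the model's window instead of letting it fail with a 400.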

[16:05] This is a huge step up: instead of just looking at the terminal and trying to figure out what's going on, you basically have a giant error dashboard. It's super easy to do through here, so give it a shot. And you can use any of the main agent frameworks, right? We support most of them. We're totally plugged in with CrewAI: you can just pip install agentops and pip install crewai (with a special branch) and you're good to go. We're also native on pyautogen, which is the Microsoft framework for building multi-agent systems. If you're building agents with LangChain, we work with that, and we're rolling out support for LlamaIndex later this week.

[16:40] Nice. So in addition to looking at individual sessions, I can get an overall view of how all of my agents are performing over time. I've been able to look at individual sessions, but what about the aggregate? Here's a session breakdown: I'm going to select the agents I've run in the past 30 days. We can see that most of my agents actually never completed (36 of them), and I can see basically all of the high-level metrics I care about to figure out what's causing the agents to fail. For example, I interrupted about six of them myself, but a lot of the failures also come from these 429 error codes, and a lot come from 400 error codes, and so on. I can use that information to understand why my agents break. More interestingly, I can also see how much I've been spending on these things. Agents, again, are super expensive right now; they're expensive and they're unreliable, and you want observability into exactly why they're breaking and how much they cost you. Otherwise you're not going to scale to production, and they're not going to change the world if they're too expensive and too slow. So that's how this dashboard gives you a high-level view of exactly what the agents are doing. It's also sort of a personal history of developing agents. For example, on GitHub you have the heat map of when you're active, right? This is kind of similar: you can see how many agents you've built and how you've improved over time, maybe reduced the number of errors you're getting, stuff like that.
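The failure breakdown described above (interrupted runs, 429s, 400s) boils down to counting session end states and error codes. A minimal sketch with made-up session data:

```python
# Sketch: aggregating session outcomes the way the 30-day dashboard
# breakdown does. The session records here are invented sample data.
from collections import Counter

sessions = [
    {"end_state": "Fail", "error": 429},          # rate-limited
    {"end_state": "Fail", "error": 400},          # e.g. context overflow
    {"end_state": "Indeterminate", "error": None}, # never completed
    {"end_state": "Success", "error": None},
    {"end_state": "Fail", "error": 429},
]

# Count sessions by end state, and failed sessions by HTTP error code.
by_state = Counter(s["end_state"] for s in sessions)
by_error = Counter(s["error"] for s in sessions if s["error"] is not None)

print(by_state)
print(by_error)
```

Grouping by error code like this is exactly the kind of signal that tells you whether to fix rate limiting (429s) or context management (400s) first.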

[17:52] Yeah, we're actually working on a handful of CI/CD things, so if you're trying to roll out a test kit for your agents, we have that covered too. For example, if you're running agents against a variety of different tests, we have several thousand loaded into the platform right now. Here's an agent configuration test called WebArena. WebArena is basically an open-source evaluation set where you can run agents against websites and see how well they perform at web tasks. For example, on this website the task was to find the top-selling brand in Q1 2022. Here's the website; we can take a look and see exactly what that looks like. It's some sort of dashboard, and the agent's job is basically to log into it and find that information. We can see that the agent failed: the answer was supposed to be Sprite, and it gave the wrong answer, so the evaluation failed. But we can do this for human evals, exact evals, fuzzy matches, you name it. You can track all this information and see how your agents are gaining or losing performance over time. Wow, that's actually super powerful.
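The exact and fuzzy eval styles mentioned here can be sketched in a few lines. The normalization and the 0.8 similarity threshold are my assumptions, not AgentOps specifics:

```python
# Sketch of "exact evals" and "fuzzy matches" for grading agent answers.
from difflib import SequenceMatcher

def exact_match(answer: str, expected: str) -> bool:
    """Case- and whitespace-insensitive exact comparison."""
    return answer.strip().lower() == expected.strip().lower()

def fuzzy_match(answer: str, expected: str, threshold: float = 0.8) -> bool:
    """Similarity-ratio comparison that tolerates near-miss answers."""
    ratio = SequenceMatcher(None, answer.strip().lower(),
                            expected.strip().lower()).ratio()
    return ratio >= threshold

# The WebArena task above expected "Sprite":
print(exact_match("sprite ", "Sprite"))    # → True
print(fuzzy_match("Sprit", "Sprite"))      # → True (ratio ≈ 0.91)
print(exact_match("Coca-Cola", "Sprite"))  # → False
```

A human eval would simply replace these functions with a reviewer's pass/fail judgment recorded against the same session.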

[18:51] The way it works is we do everything related to agent tracking, agent testing, and agent monitoring: replay analytics, LLM cost tracking, benchmarking, compliance. We have a ton of integrations with frameworks like LangChain, CrewAI, and autogen, and all it really takes is two lines of code. Really, it's that easy: all you have to do is import agentops and run agentops.init. Optionally, if you want to track specific functions, you just use agentops.record_function and it automatically adds them to your dashboard. The only other thing you might have to do, and it's totally optional, is end your session, so you can see whether your agents succeeded or failed. From doing just that, you get these really fancy dashboards that show you exactly what your agents were doing. You have the ability to rewind and understand what your agents cost, how long they take, and how much compute they use. And lastly, you get these waterfall diagrams showing you exactly what they were doing at any given moment in time: how much they cost in terms of LLM calls, whether there are errors, and whether you have interruptions in your services. It all just works out of the box.
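The two-line setup described above can be sketched roughly as follows. The function names (agentops.init, agentops.record_function, end_session) follow what's said in the video and may differ in newer releases, so check the current AgentOps docs; the wrapper itself is my own hypothetical helper:

```python
# Sketch of the AgentOps setup described above, wrapped in a helper. The
# import is deferred so this file loads even without the third-party
# agentops package installed.
import os

def run_with_agentops(run_agents):
    """Run an agent workflow (e.g. a CrewAI kickoff) with AgentOps tracking."""
    import agentops  # third-party: pip install agentops

    # The "two lines": import agentops, then initialize with your API key.
    agentops.init(api_key=os.environ.get("AGENTOPS_API_KEY"))

    # Optionally, decorate specific functions with
    # @agentops.record_function("tool-name") so they appear on the dashboard.
    try:
        result = run_agents()
        agentops.end_session("Success")  # optional: mark the session outcome
        return result
    except Exception:
        agentops.end_session("Fail")
        raise
```

You would call it as `run_with_agentops(crew.kickoff)` for a CrewAI crew, and the session then shows up on the dashboard with its end state set.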

[19:54] By the way, guys, if you get stuck at any point building with AgentOps, they have a really cool Discord server with over 1,000 members. It's super active and you can ask any question, because even though it's like three lines of code, you can sometimes get unexpected errors, and I'm sure Alex or other people will help you out. Yeah man, this was super fun; I'm really glad we put this together. Your channel is sick; I was watching some of the interviews, and I really love how you're growing, so it's really cool. I mean, we could do a podcast too, just me and you, if you want to…
