Watch how a Pro develops AI Agents in real-time

David Ondrej
15 Jun 2024 · 20:28

Summary

TL;DR: The script details a tutorial on creating an AI agent with Crew AI and AgentOps that scrapes websites and summarizes their content into tables. It introduces Augment as a coding assistant, uses Firecrawl for web scraping and OpenAI for text summarization, and walks through setting up functions, handling API requests, and monitoring agent performance through a dashboard, emphasizing how AgentOps simplifies cost tracking and debugging.

Takeaways

  • The script discusses building AI agents with Crew AI and AgentOps for web scraping and summarizing content into tables.
  • The importance of initializing and ending sessions with AgentOps for tracking purposes is highlighted.
  • The use of Firecrawl, an open-source web scraping framework, is introduced for fetching web data.
  • A mention of using the requests library to interact with APIs, specifically for the Firecrawl service.
  • Tips are provided for debugging, such as pasting a screenshot of confusing code into ChatGPT and asking it to explain.
  • The script covers the implementation of functions like 'crawl_web' and 'summarize_text' using Python.
  • AgentOps is used for monitoring AI agent performance, including tracking costs and latencies of large language model (LLM) calls.
  • The integration of AgentOps with various LLM providers for automatic tracking of requests and costs is explained.
  • The script demonstrates creating a table summary using OpenAI's language model and handling errors with the AgentOps dashboard.
  • AgentOps provides detailed observability into agent behavior, including chat history, waterfall graphs, and cost tracking.
  • The script identifies issues such as 'yappy' agents that produce too much output, making debugging difficult without proper tools.

Q & A

  • What is the primary objective of the project described in the script?

    -The primary objective is to build an AI agent using Crew AI with Agent Ops that can scrape web data, summarize it, and present the information in a table format.

  • Which tool is mentioned for web scraping in the script?

    -The tool mentioned for web scraping is Firecrawl, an easy-to-use open-source web scraping framework.

  • What is the alternative to Co-Pilot that the speaker is using, and why is it preferred?

    -The speaker is using Augment as an alternative to Copilot because it is faster and works better by indexing the entire codebase instead of relying only on in-context information.

  • How does the speaker suggest to handle confusion during the coding process?

    -The speaker suggests taking a screenshot of the confusing part and pasting it into a chatbot, asking it to explain.

  • What is the purpose of the 'agentops.init' function in the script?

    -The 'agentops.init()' call kicks off the session for the AI agent, ensuring every subsequent event is tracked under a single session; a combined sketch of this flow appears after the last question in this section.

  • How does the speaker plan to summarize the web data into a table?

    -The speaker plans to use a large language model (LLM) to read through the web content and create a table summary via the 'summarize_text' function.

  • What is the significance of the 'agentops.record_function' decorator mentioned in the script?

    -The 'agentops.record_function' decorator records the functions being executed, allowing AgentOps to track which functions run at which moment and providing traceability.

  • What does the speaker mean by 'yappy agents' and why is it a problem?

    -'Yappy agents' refers to agents that produce a large amount of output, making it difficult to parse through the console and debug. It's a problem because it lacks observability and clarity on the agent's actions and costs.

  • How does the Agent Ops dashboard help in debugging and understanding agent behavior?

    -The Agent Ops dashboard provides a visual representation of the agent's actions, including a chat breakdown, waterfall diagram, and cost tracking, making it easier to understand and debug agent behavior.

  • What is the role of 'client.chat.completions.create' in the script?

    -The 'client.chat.completions.create' call is used to interact with the large language model, sending prompts and receiving completions that help in summarizing the web data (see the combined sketch after the last question in this section).

  • How does the speaker suggest tracking and managing the costs associated with using LLMs?

    -The speaker suggests using Agent Ops to automatically track all the requests and costs associated with using LLMs, providing insights into the expenses and helping in managing them.
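
Tying the answers above together ('agentops.init', the recording decorator, and 'client.chat.completions.create'), here is a minimal end-to-end sketch. The AgentOps names follow the SDK version shown in the video and may differ in newer releases; the prompt text is paraphrased:

```python
import agentops
from openai import OpenAI

agentops.init()  # start the tracked session (typically reads AGENTOPS_API_KEY from the environment)

@agentops.record_function("summarize_text")   # decorator so this call shows up on the timeline
def summarize_text(web_data: str) -> str:
    client = OpenAI()  # uses OPENAI_API_KEY from the environment
    messages = [
        {"role": "system", "content": (
            "You are a web summarizer agent. Read the web content and report the company "
            "name, title, and interesting facts. Be clear and don't make things up."
        )},
        {"role": "user", "content": web_data},
    ]
    response = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
    return response.choices[0].message.content

summary = summarize_text("...scraped page text goes here...")
print(summary)

agentops.end_session("Success")  # or "Fail", so the dashboard records the outcome
```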

Outlines

00:00

Building AI Agents with AgentOps

This paragraph introduces a project to create AI agents capable of web scraping and summarizing information into a table format. The process begins with setting up a Python environment with AgentOps, using Augment (an alternative to Copilot that indexes the whole codebase) as the coding assistant. A 'crawl_web' function is defined to scrape data, and a 'summarize_text' function is planned. Firecrawl, an open-source web scraping framework, is highlighted for its ease of use and efficiency. The paragraph concludes with a discussion of implementing the web scraping function in Python, using the 'requests' library to interact with the Firecrawl API.
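
A sketch of what that 'crawl_web' function might look like. The Firecrawl endpoint path, request body, and environment variable name are assumptions based on Firecrawl's v0 REST API, so check the current Firecrawl documentation:

```python
import os
import requests

def crawl_web(url: str) -> dict:
    """Scrape a URL via the Firecrawl REST API and return the parsed JSON response."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ['FIRECRAWL_API_KEY']}",  # key name assumed
    }
    # Endpoint path follows Firecrawl's v0 API and may differ in newer versions.
    response = requests.post(
        "https://api.firecrawl.dev/v0/scrape",
        headers=headers,
        json={"url": url},
    )
    if response.status_code != 200:
        raise RuntimeError(f"Firecrawl request failed: {response.status_code} {response.text}")
    return response.json()
```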

05:01

Implementing AgentOps for Data Recording and Debugging

The focus shifts to using AgentOps for recording data and debugging. A decorator is introduced to mark specific functions for recording, allowing events within the session to be traced. The paragraph demonstrates how to run the script and access the AgentOps dashboard to monitor the web scraping event, and how to use the dashboard to understand the cost, latency, and success of the operations. The use of large language models (LLMs) from providers like OpenAI is discussed, with AgentOps automatically tracking requests and costs. The paragraph ends with a practical example of summarizing web data into a table using OpenAI's chat completions API.
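
The final step described here, turning the summary into a table, is just a second chat-completions call; a rough sketch with the prompt paraphrased from the video (the function name is illustrative):

```python
from openai import OpenAI

def to_markdown_table(response_text: str) -> str:
    """Second LLM call: reformat the earlier plain-text summary as a Markdown table."""
    client = OpenAI()  # uses OPENAI_API_KEY from the environment
    table_messages = [
        {"role": "system",
         "content": "Your role is to summarize the information below as a table in Markdown."},
        {"role": "user", "content": response_text},
    ]
    table_response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=table_messages,
    )
    return table_response.choices[0].message.content
```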

10:01

Debugging and Observability with AgentOps

This paragraph delves into debugging and observability using AgentOps. An error encountered during the LLM call illustrates how AgentOps helps identify and fix issues by providing detailed traceback information. The discussion covers the benefits of using AgentOps for understanding agent behavior, such as cost tracking and performance monitoring, and offers insights into optimizing agent performance by avoiding repetitive tasks and managing context length effectively. The use of AgentOps with other agent frameworks such as LangChain and AutoGen is mentioned, emphasizing the tool's versatility.
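
For context, the "'ChatCompletion' object is not subscriptable" TypeError shown in the video is commonly caused by indexing the response from the openai>=1.0 Python client as if it were a dictionary; a minimal illustration:

```python
from openai import OpenAI

client = OpenAI()
messages = [{"role": "user", "content": "Say hello"}]
response = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)

# Raises TypeError: 'ChatCompletion' object is not subscriptable
# text = response["choices"][0]["message"]["content"]

# The openai>=1.0 client returns typed objects, so use attribute access instead:
text = response.choices[0].message.content
print(text)
```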

15:01

AgentOps Dashboard for Session Analysis and Cost Management

The paragraph highlights the capabilities of the AgentOps dashboard for analyzing agent sessions and managing costs. It discusses how the dashboard provides a comprehensive view of agent performance over time, including completion rates, error analysis, and expenditure on AI credits. The dashboard's ability to offer a historical perspective on agent development and performance improvement is emphasized. The paragraph also touches on the integration of AgentOps with CI/CD pipelines and its role in evaluating agent performance against various tests, providing a robust framework for agent testing and monitoring.

20:03

Collaboration and Community Support for AgentOps Users

The final paragraph wraps up the discussion by emphasizing the collaborative aspect of using AgentOps. It mentions the availability of community support through a Discord server and the potential for further collaboration, such as podcasts or interviews. The paragraph reflects on the positive experience of creating AI agents with AgentOps and the value of the community in troubleshooting and sharing knowledge.

Keywords

Crew AI

Crew AI is a platform for building AI agents. In the video, it is mentioned as the foundation for creating a web scraping agent that can fetch and summarize information from websites. The term is central to the video's theme of developing AI agents for various tasks.

Ops

Ops typically refers to the operational aspects of a system or process. In the context of the video, 'Ops' appears as part of 'AgentOps', the framework used alongside Crew AI to build and monitor AI agents effectively.

Web Scraper

A web scraper is a software tool used to extract information from websites. The video discusses creating a web scraper as part of an AI agent's functionality, which scrapes data from a given URL and formats it for further processing.

Table Formatting

Table formatting refers to the presentation of data in a structured grid of rows and columns. The script mentions formatting the summarized web data into a table, which is a way to organize and present information clearly.

Augment

Augment is a code assistant presented as an alternative to Copilot, described as faster and more effective because it indexes the entire codebase instead of working only in-context. It is used in the video to assist in developing the AI agent.

Firecrawl

Firecrawl is described as an easy-to-use web scraping framework. It is an open-source tool that takes any website and returns clean Markdown text, which the AI agent then uses for further analysis or summarization.

API Key

An API key is a unique identifier used to authenticate requests to an API. In the script, setting an API key is part of the process of using the Fir Crawl tool to perform web scraping tasks.

Environment Variables

Environment variables are used to configure and customize the runtime environment of an application. The video mentions loading them (for example with load_dotenv) so that API keys are available to the Python script when it runs.
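
A minimal sketch of that pattern using python-dotenv; the variable names in the .env file are examples rather than the exact keys used in the video:

```python
# .env (kept out of version control):
#   OPENAI_API_KEY=...
#   FIRECRAWL_API_KEY=...
#   AGENTOPS_API_KEY=...

import os
from dotenv import load_dotenv

load_dotenv()  # copies the key/value pairs from .env into os.environ

openai_key = os.environ["OPENAI_API_KEY"]  # now available to the rest of the script
```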

AgentOps

AgentOps is a platform for building reliable AI agents and monitoring their performance. The script discusses using AgentOps to track the requests, costs, and latencies associated with the large language model provider, providing observability into the agent's operations.

Large Language Model (LLM)

A large language model (LLM) is an AI model trained on vast amounts of data to understand and generate human-like text. The video describes using an LLM to summarize web content into a table format, demonstrating the application of such models in processing and analyzing data.

Error Dashboard

An error dashboard is a tool that provides a visual representation of errors and issues within a system. The script mentions using an error dashboard to identify and troubleshoot problems encountered during the agent's operations, enhancing the debugging process.

Observability

Observability refers to the ability to understand the internal state of a system by only observing its outputs. The video emphasizes the importance of observability in agent development, allowing developers to track agent performance, costs, and reliability over time.

Highlights

Introduction to building AI agents with Crew AI and AgentOps to scrape web data and summarize it in a table format.

Use of Augment as an alternative to Copilot, preferred for being faster and indexing the entire codebase.

Explanation of initializing AgentOps and the importance of session management in Python scripts.

Utilization of Firecrawl, an open-source web scraping framework, for easy data extraction.

Demonstration of using the 'requests' library to interact with the Firecrawl API for web scraping.

Importance of loading environment variables in Python scripts for successful execution.

Testing the 'crawl_web' function and observing the scraping process through the AgentOps dashboard.

Introduction of the 'agentops.record_function' decorator for tracking the functions executed during the session.

Integration of OpenAI for summarizing web content using large language models.

Automatic tracking of requests and costs when using large language model providers with AgentOps.

Debugging process using the AgentOps dashboard to identify and fix errors in the code.

Overview of creating a job posting agent using separate research, writing, and review agents.

Discussion of the challenges of managing 'yappy' agents and the lack of observability in agent operations.

Use of the AgentOps session ID to access detailed information and chat history for debugging purposes.

Analysis of agent performance over time and identification of common errors affecting completion rates.

Introduction of WebArena, an open-source evaluation set for testing agent performance on web tasks.

Explanation of agent testing, monitoring, and the importance of observability for scaling to production.

Final thoughts on the value of AgentOps for understanding, debugging, and improving AI agent operations.

Transcripts

play00:00

we are going to use crew AI with agent

play00:02

Ops to build AI agents so let's get

play00:05

started sounds good we're going to make

play00:07

a agent that can scrape the web fetch

play00:10

information about what's on the website

play00:12

and then format it in a table uh first

play00:13

thing we do with every python file is we

play00:16

create a um an explanation what we're

play00:18

doing so we're going to do two things

play00:19

number one we're going to create a

play00:20

website scraper so we'll say We'll

play00:22

create summarized text function we'll

play00:24

also create def crawl web function yeah

play00:27

import agent Ops obviously which code

play00:30

assistant that you're using right now

play00:31

right now I'm using augment which is a

play00:33

um it's an alternative to co-pilot it's

play00:35

a lot faster actually and it seems to

play00:36

work a lot better cuz it scrapes your

play00:38

entire database instead of just doing in

play00:40

context yeah I need to try it I actually

play00:42

I'll hook you up with the guys

play00:43

afterwards all right sweet so let's

play00:45

define our M we'll say agent ops. init

play00:48

we'll create this all we have to do is

play00:50

run end session after all of this done

play00:53

all right so we're going to say number

play00:54

one uh web data equals crawl web and

play01:00

then we'll say we'll make this a return

play01:02

to string quick tip for the people

play01:04

watching if you are confused at any part

play01:06

you can take a screenshot and paste it

play01:07

into ChatGPT and ask it to explain so we'll

play01:10

say web data equals crawl web and we'll

play01:13

say uh summarized text equals summarize

play01:18

text on web data all right awesome so so

play01:21

we're just storing the outputs of the

play01:23

functions into simple variables exactly

play01:26

so we're just going to call two

play01:27

functions we're going to say crawl the web

play01:29

and then with that web data we're going

play01:30

to summarize it uh so we've yet to

play01:32

implement what that looks like but we'll

play01:34

figure that out just right now so uh the

play01:37

crawl web I'm going to use a third party

play01:39

tool called Firecrawl so Firecrawl is a

play01:43

uh it's an easy web scraping framework

play01:47

it's open source uh you can basically

play01:49

just put in any website and we'll give

play01:51

you just markdown text that you can feed

play01:53

to any LM it's super clean and super

play01:55

easy so we're just going to use this guy

play01:56

to create a simple WebCrawler um let's

play02:00

look at the documentation see how we can

play02:01

make it work a lot of people are not

play02:04

sure like which crawler to use so we

play02:06

just say like this one is the best from

play02:08

your experience Firecrawl is probably

play02:10

one of the easiest ones to use uh and it

play02:12

works like super super well so uh let's

play02:15

just get the code for this take the

play02:17

python code assume that works and there

play02:22

we go all right so um what are we

play02:26

missing here

play02:27

pren awesome

play02:30

we're just going to pip install Firecrawl

play02:32

real quick pip install Firecrawl okay

play02:36

we're going to say that a URL belongs

play02:38

here it should be a string awesome and

play02:40

we're just going to use the request

play02:41

library to hit the API and then we're

play02:43

going to jump uh just dump the Json so

play02:47

json.dumps and we should be good to

play02:49

go um okay can you quickly go over the

play02:52

entire function for people who might you

play02:53

know not be following okay so what we're

play02:55

doing here is we're just going to use

play02:57

the requests library to hit the Firecrawl

play02:59

API Firecrawl takes a simple parameter

play03:02

takes the URL of the website we're

play03:03

trying to scrape it will wait and then

play03:05

it will just return exactly what we're

play03:07

looking for at the end of it so we set

play03:09

the headers so we just say application

play03:11

json which is pretty standard we

play03:13

set an API key uh which is the Firecrawl API

play03:16

key uh we just do requests.post we send

play03:19

it to the URL with the headers and then

play03:21

this body which should be the Json

play03:23

containing the URL that we want to post

play03:25

and then we just wait to see if we get

play03:26

the right status code if it's uh not 200

play03:29

then going to say it failed otherwise

play03:30

we're going to return the status which

play03:32

will or return the uh the body which

play03:34

should be response. Json so uh all we're

play03:37

going to do next is um one one other

play03:41

thing we're going to do is make sure

play03:42

that you load in your environmental

play03:43

variables so we're going to do load. EnV

play03:48

so this makes sure that your

play03:49

environmental variables are going to be

play03:51

in your python script when you run it so

play03:53

what we're going to do right now is

play03:54

we're just going to test to make sure

play03:55

that this actually works so uh we're

play03:57

going to run this craw web function

play04:00

we're going to give it the URL

play04:02

https agentops.

play04:05

ai uh we're going to run the uh web

play04:09

crawler function and see what happens

play04:11

python main.py let's see if this guy

play04:13

runs okay no module named Firecrawl just get

play04:16

rid of that

play04:19

guy okay so python main.py uh nothing

play04:23

happened because we did not run

play04:27

main okay so kind of a high level view

play04:30

of the code again we defined our web

play04:31

crawler with crawl web we have a

play04:33

summarized text which is yet to be

play04:34

defined and then we have a main function

play04:36

that takes we're going to run agent ops.

play04:38

init that basically kicks off our

play04:40

session we're going to run the crawler

play04:41

function and then we're going to end the

play04:43

session to make sure it's success or

play04:44

fail so I'm just going to run main.py

play04:47

see what

play04:48

happens okay so we got our session

play04:50

replay ID right here uh right now it's

play04:53

probably scraping the web so let's see

play04:55

what happens all right so in our

play04:56

dashboard we can see that this run took

play04:59

10

play05:00

seconds and we didn't record any data

play05:03

though so the way that we record data

play05:04

with agent Ops is you just add the

play05:06

simple decorator function on top of your

play05:08

function so we'll say AO or agent ops.

play05:12

record function we'll say scrape the web

play05:14

and then we'll say agent Ops at record

play05:17

function summarize text so this way

play05:21

agent Ops actually knows which functions

play05:22

are happening at at which moment so you

play05:25

can actually Trace back exactly what's

play05:26

happening we can even add it to the main

play05:28

function too so we'll say a

play05:30

main cool so let's see what happens next

play05:33

let's run it one more time see what goes

play05:35

on so at what point can you go into the

play05:38

agent op agent Ops dashboard and start

play05:41

seeing it there so as soon as you see

play05:43

the session ID show up in the terminal

play05:45

you're free to go check it and the

play05:46

events should come streaming in

play05:48

automatically okay so you can see the

play05:50

session ID up here so that means the

play05:52

events are automatically streaming in so

play05:53

if we actually check out the link it

play05:55

should start showing up so let's go

play05:57

check it out all right awesome so we

play05:59

could see that that web scraping event

play06:00

probably took 13 seconds if we look down

play06:02

at the waterfall graph we could see okay

play06:05

so this was the the scraping the web

play06:07

function uh we can see that it actually

play06:09

got all the data from agent Ops so we

play06:10

can see the content is all on this web

play06:13

page so this is all text that I can feed

play06:15

to my large language model all right

play06:17

awesome so now what we're going to do is

play06:21

now we're going to feed this to a large

play06:22

language model and see if we can get a

play06:23

table summary of it so let's import open

play06:26

AI we can use any llm here we can use

play06:29

gro we can use anthropic but open AI for

play06:31

Simplicity so the magic about agent Ops

play06:34

is that when you import almost any large

play06:36

language model provider we automatically

play06:39

track all the requests that go through

play06:40

and we can track the cost we can track

play06:42

the latency we track whether it actually

play06:43

works for you so just by importing it

play06:45

alone you're good to go so what we're

play06:47

going to do here is uh we're going to

play06:49

set an open AI client in the function so

play06:52

let's say summarize text we'll say uh

play06:55

client equals open AI open AI we set our

play06:59

open a key uh and we'll say uh messages

play07:04

equals your web summarizer agent your

play07:06

task is to read through web content and

play07:08

tell me the name title interesting facts

play07:10

of the company the website does be clear

play07:12

and don't make things up awesome so this

play07:15

this looks actually pretty good to me

play07:16

for a automatic completion uh so we're

play07:19

going to take here is we have client.

play07:22

chat. completion. create we're going to

play07:25

set our model to GPT 3.5 turbo we're

play07:27

going to set our messages to messages so

play07:29

so um that should give us a good way of

play07:34

starting so let's run

play07:35

main.py all right we got our session ID

play07:38

up

play07:38

here we're going to wait for that to

play07:41

occur uh and then we can basically track

play07:44

all the events happening in the agent

play07:46

Ops dashboard so let's take a look here

play07:49

uh we have one three events so far we

play07:52

have the uh web scrape and then we have

play07:55

the uh the open Ai call and we can see

play07:57

here it costs less than a penny

play08:00

and here's all the text it

play08:01

took and here's our easy summary so it

play08:04

tells us the name of the website agent

play08:07

Ops title build AI agents with LLM apps

play08:09

interesting facts agent Ops is a

play08:11

platform to build reliable AI agents and

play08:12

monitoring awesome all this information

play08:14

checks out one last step we're going to

play08:16

take here is we're just going to add one

play08:18

more open Ai call and we're going to say

play08:21

uh response equals that we'll say

play08:24

response text equals the text and we're

play08:28

going to make one more open Ai call say

play08:30

uh table messages

play08:33

equals make the response text a

play08:37

table Your Role is to summarize the

play08:42

table the information below as a table

play08:46

and

play08:47

markdown okay awesome so we got the the

play08:50

messages in here and we're just going to

play08:51

copy paste the code from above and see

play08:53

what we get

play08:56

so say response equals client.

play09:00

completion. create reset our message to

play09:02

this uh and we'll say all right awesome

play09:06

and we'll just print that at the end

play09:09

return

play09:12

response

play09:13

print response and we'll say return

play09:17

response. choices. text and yeah that

play09:21

should just about do it so uh and we're

play09:24

just going to print here print

play09:27

summarize text

play09:29

got it so in the past few minutes we

play09:32

have now created an agent that can crawl

play09:34

the

play09:35

web get that web data and then summarize

play09:38

to a table and we're just going to spend

play09:39

one more time trying to

play09:42

um to uh run it also you can also see

play09:46

for all prior runs every time you finish

play09:49

a run you get a chat summary saying

play09:50

exactly how much you spent for that run

play09:52

so that way it's a lot easier of a way

play09:53

to just track what's going on awesome so

play09:56

it looks like we actually had some

play09:57

errors here so now we can use agent Ops

play09:59

to debug what the errors were in the

play10:01

dashboard so I'm going to open up the

play10:02

link and see exactly what

play10:04

happens chat completion object is not

play10:07

subscriptable okay so I can use this

play10:09

traceback to see exactly what happened

play10:11

here uh my guess is that I actually used

play10:13

the open AI client incorrectly so that

play10:15

way I can go back in time and see what

play10:17

exactly occurred so we get a big error

play10:18

bar here saying in the main function

play10:20

which we decorated at the beginning uh

play10:23

this was kind of like the encapsulating

play10:24

function that caused the issue we can

play10:26

see that the web scraping worked cuz

play10:27

it's nice and blue but we have we have

play10:29

an error that was attached to the llm uh

play10:32

so we actually had a type error that

play10:33

said chat completion object not

play10:34

subscriptable and we could basically use

play10:36

this as a way to read through what the

play10:38

parameters of the function were and also

play10:41

what the returns were and that way we

play10:43

can basically see exactly what were

play10:44

wrong and then fix that in our code we

play10:47

just took an example crew AI uh notebook

play10:49

so essentially this agent takes three

play10:52

separate agents that work together it

play10:54

takes a research

play10:56

analyst a writer agent and then a review

play10:59

agent if we actually look at what these

play11:00

agents do we can see that the research

play11:03

analyst will analyze company websites

play11:05

the writer agent will use the insights

play11:08

from the website to create a detailed

play11:10

engaging and enticing job posting and

play11:13

then the final The Specialist the review

play11:15

specialist will use the job posting to

play11:17

create a clear more concise

play11:19

grammatically correct job posting that

play11:20

we can post on LinkedIn or indeed or any

play11:23

other job websites uh so all we have to

play11:26

do for agent Ops if we're going to plug

play11:28

this in is two lines of code you just do

play11:31

import agent Ops and agent ops. in it

play11:34

and then optionally you can add any tags

play11:35

so you can track your sessions more

play11:36

easily and then you're good to go so

play11:39

let's try running it see what we get and

play11:41

we're also going to link this GitHub

play11:42

repo below the video so people can just

play11:44

clone it themselves all right so uh

play11:46

we're going to run the python script so

play11:49

just run main.py so it's going to ask us

play11:52

two things number one what is the

play11:53

company description and number two it's

play11:54

actually going to give us a little handy

play11:55

dandy link to where we can inspect the

play11:58

session later so we'll say say agent Ops

play12:00

does AI agent testing monitoring and

play12:06

evals agents suck we're fixing that

play12:10

company domain agent ops. hiring

play12:14

needs AI

play12:17

Engineers specific benefits work with the

play12:21

most cracked developers in San Francisco

play12:26

all right so right now the agents are

play12:27

all going to work together to spin up

play12:30

and create this amazing job posting it's

play12:31

going to scrape the web it's going to

play12:32

use a set of tools and they're all going

play12:34

to work and chat together to be able to

play12:35

solve this problem all dynamically so

play12:39

first thing that we notice here is that

play12:40

the agents are kicking off uh one big

play12:43

problem here is these things are very

play12:44

yappy that's a technical term by the way

play12:47

so these agents are just spewing tokens

play12:48

all over my console and if I want to

play12:50

debug this thing it's a huge pain in the

play12:51

neck so you can see here there's just

play12:53

like so much stuff going on um wall of

play12:57

text wall of text exactly so being able

play13:00

to just parse through this is a

play13:02

nightmare to begin with uh secondarily

play13:04

like we actually have no observability

play13:05

what's going on what step of the

play13:07

sequence is at how much it's costing us

play13:09

how long it's going to go on for so

play13:10

that's a huge pain in the neck but

play13:12

thankfully since we have this agent off

play13:13

session ID up here all we have to do is

play13:15

click this link and it's going to open

play13:17

our current session so our current

play13:19

session we can see that we started this

play13:21

about 2 minutes ago and it's already

play13:23

cost us 14 cents in open AI credits

play13:25

which is pretty crazy to think about we

play13:26

get all of the environment information

play13:28

about uh which environment we're running

play13:30

on so it could be a pod or Docker

play13:32

container you name it uh we're running

play13:34

on my MacBook right now but kind of more

play13:36

interestingly we can see like the entire

play13:38

chat breakdown so remember how we spun

play13:40

up three different agents we can

play13:42

actually see that we have the research

play13:44

analyst the job description writer and

play13:45

the review specialist all here in the

play13:47

timeline and we can filter down by those

play13:49

if we really wanted to um also so here's

play13:52

the full chat history so instead of

play13:54

parsing through again this really messy

play13:57

block of text we can just watch it

play13:58

through here which is a huge time saver

play14:01

uh lastly we actually get this like

play14:03

Dynamic waterfall breakdown of all the

play14:06

actions that the agent was taking all

play14:07

the tool calls like searching websites

play14:09

all the llm calls that uses gp4 so we

play14:11

can see this one cost us 4 cents we can

play14:13

see the entire prompt and completion

play14:15

here and all the reasoning that it goes

play14:16

through uh I'm going to go back to a

play14:18

previous session I ran in the past and

play14:20

just show you how powerful this could

play14:21

actually

play14:22

be yeah I mean this dashboard is just

play14:24

going to let people understand agents

play14:26

much faster because especially when

play14:27

people are new to building agents it's

play14:28

got hard to understand what they're

play14:30

doing right but in agent Ops you can

play14:32

clearly see that every step of the

play14:35

way basically we are taking agents we a

play14:38

black box agents used to be a black box

play14:41

we're making it so we're taking a

play14:42

flashlight you can see exactly what's

play14:43

going on you see how much they cost you

play14:44

see how long they take you can see

play14:46

actually how they solve problems how

play14:47

they reason through things so here's an

play14:49

agent for example that ran and cost me

play14:51

almost $10 an open AI credits and took

play14:54

almost 12 minutes took 11 minutes and 8

play14:56

seconds uh and you can see actually had

play14:58

it stopped prematurely I can see the

play14:59

whole end session reason the sign

play15:01

detected but also all the environment

play15:03

variables uh and I can reason through

play15:05

like okay so which uh agent was doing

play15:07

the most amount of time so I could see

play15:09

all the agents let's suppose I want to

play15:10

look at the review specialist and see

play15:12

exactly what was happening I could see

play15:13

that the review Specialist made 44 tool

play15:15

calls and 40 llm calls so that's

play15:18

probably a big mistake and one big

play15:20

challenge here is that had this repeat

play15:21

thought and the repeat thought was that

play15:23

it was constantly doing the same thing

play15:24

over and over again so we have this

play15:25

repeat thought catcher that can

play15:27

basically say sometimes the agents going

play15:29

circles how do we make sure they stopped

play15:30

going in circles and so we know that

play15:32

agent the the review specialist in

play15:33

particular was problematic so we can go

play15:35

back in time and essentially rewire the

play15:37

prompt to make sure that it

play15:39

works so I'm going to go look at the

play15:41

entire timeline of events and see

play15:42

exactly what could have been changed

play15:43

here so for example here oh what's this

play15:46

there's a big red error I wonder what

play15:47

that could have been so basically we

play15:49

actually ran out of context length that

play15:51

this agent was so yappy so we got a 400

play15:54

error code we consumed over 8,000 tokens

play15:57

uh because we used 8,238 tokens and

play16:00

so that actually showed a huge stack

play16:01

Trace error that we could have prevented

play16:03

by doing better context management so

play16:05

this is a huge step up from instead of

play16:07

just looking at the terminal and trying

play16:08

to figure out what's going on you

play16:09

basically have a giant error dashboard

play16:11

just says okay super easy to do through

play16:13

here give it a shot and also you can use

play16:16

any of the main agent Frameworks

play16:18

right we support most of the main agent

play16:21

Frameworks so we're totally plugged in

play16:22

with crew AI you can just pip install

play16:24

agent Ops and pip install crew uh with a

play16:26

special branch and you're good to go

play16:28

also Al we're native on Pi autogen which

play16:30

is the Microsoft framework uh for

play16:32

building multi-agent Frameworks uh if

play16:35

you're building agents of Lang chain we

play16:36

work with that and we're rolling out

play16:37

support with uh llama index later this

play16:40

week nice so in addition to actually

play16:42

looking at individual sessions I can get

play16:44

an overall view of how all of my agents

play16:46

are performing over time so I've been

play16:48

able to look at individual sessions but

play16:49

how about the aggregate so here's a

play16:51

session breakdown so I'm going to select

play16:53

agents I've run in the past 30 days we

play16:55

can see most of my agents uh actually

play16:57

never completed 36 of them never

play16:58

completed and I can see like basically

play17:00

all of the highle metrics I care about

play17:02

to know what's causing the agents to

play17:03

fail for example this one I interrupted

play17:06

about six of them but a lot of them also

play17:08

uh come from these 429 error codes a lot

play17:10

of them come from 400 error codes so on

play17:11

and so forth so I can use that as

play17:13

information to understand why my agents

play17:14

Break um more interestingly I can also

play17:17

see how much I've been spending on these

play17:18

things agents again super expensive

play17:20

right now they're expensive and they're

play17:21

unreliable and you want to have

play17:23

observability to exactly why they're

play17:25

breaking and how much they cost you

play17:26

otherwise you're not going to scale the

play17:27

production and they're not going to

play17:28

change the world if they're too

play17:29

expensive and too slow so that's how

play17:32

this dashboard gives you a high level

play17:33

view of exactly what the agents are

play17:35

doing it's going also sort of like a

play17:37

personal history of you know developing

play17:40

agents like for example on GitHub you

play17:42

have the heat map of when you're active

play17:44

right so this is kind of similar where

play17:45

you can see like how many agents you've

play17:47

built and how you've improved over time

play17:49

maybe reduced the amount of Errors

play17:50

you're getting stuff like

play17:52

that yeah we're working in uh actually a

play17:55

handful of like cicd things so if you're

play17:57

trying to roll out a test kit for your

play17:59

agents we also have that covered too

play18:01

okay so for example if you're running

play18:02

agents against variety of like different

play18:05

tests we have several thousand actually

play18:07

loaded into the platform right now so

play18:08

here's an agent configuration test

play18:10

called Web Arena web arena is basically

play18:12

an open source evaluation set where you

play18:14

can run agents against websites and see

play18:15

how well they perform on doing web tasks

play18:18

so for example on this website the task

play18:20

was to find the top selling brand in q1

play18:23

2022 and here's the website we can take

play18:25

a look and see exactly what that looks

play18:27

like so some sort of dashboard the

play18:29

agent's job is to basically log into the

play18:31

dashboard and find that information we

play18:33

can see that the agent failed the answer

play18:34

was supposed to be Sprite and it gave

play18:36

the wrong answer so that means the

play18:38

evaluation failed but we can do this for

play18:40

human evals exact evals fuzzy matches

play18:42

you name it all this information you can

play18:45

basically track and see how your agents

play18:46

are increasing or losing performance

play18:49

over time W that's that's actually super

play18:51

powerful uh the way it works is we do

play18:53

all sorts of thing everything related to

play18:55

agent tracking agent testing and agent

play18:56

monitoring so replay Analytics LM cost

play18:59

tracking benchmarking compliance we have

play19:02

a ton of Integrations with Frameworks

play19:03

like LangChain crew AI and autogen uh

play19:07

and all it really takes is two lines of

play19:09

code really it's that easy all you have

play19:11

to do is import agent Ops and run agent

play19:13

ops. init optionally if you want to

play19:15

track specific functions you just run

play19:17

this agent ops. record function and it

play19:19

automatically adds to your dashboard the

play19:21

only other thing you might have to do

play19:23

and it's totally optional is just end

play19:24

your session and this way you can see

play19:25

whether your agents succeeded or your

play19:27

agents failed and all from doing that

play19:29

you get these really fancy dashboards

play19:30

that show you exactly what your agents

play19:32

were doing you have the ability to

play19:34

rewind and understand what your agents

play19:35

cost and how much they take in terms of

play19:37

time and how much compute they take and

play19:40

then lastly you get these like waterfall

play19:41

diagrams showing you exactly what they

play19:43

were doing at any given moment uh at any

play19:45

given moment in time you can see how

play19:46

much they cost in terms of llms whether

play19:48

they're errors you can see if you have

play19:50

interruptions in your services it just

play19:52

all works perfectly out of the box by

play19:54

the way guys if you get stuck at any

play19:55

point building with agent Ops they have

play19:57

a really cool Discord server with over

play19:59

1,000 members but it's super active and

play20:01

you can ask any question so even though

play20:02

it's like three lines of code sometimes

play20:04

you know you can get unexpected errors

play20:06

and you know I'm sure Alex or other

play20:08

people will help you out yeah man this

play20:10

was super fun uh I'm really glad to put

play20:12

this together man your channel is sick

play20:14

like I was watching some of the

play20:15

interviews like I I really love how

play20:17

you're growing so it's really really

play20:19

cool I mean we can do a podcast too like

play20:21

just me and you if you want to


Related Tags
AI Agents, Web Scraping, Text Summarization, Agent Monitoring, Python Scripting, API Integration, Error Tracking, Cost Analysis, Performance Metrics, Dashboard Insights