How OpenTelemetry Helps Generative AI - Phillip Carter, Honeycomb
Summary
TLDR: Phillip from Honeycomb's product team discusses the role of OpenTelemetry in improving generative AI applications. He emphasizes the importance of observability for understanding user interactions and model performance, highlighting challenges in managing costs and ensuring reliability given AI's unpredictable nature. The talk explores the practical aspects of building AI applications, the use of language models, and the significance of context and prompting techniques. It also touches on the ongoing work within the OpenTelemetry community to standardize tracing and metrics for AI applications.
Takeaways
- OpenTelemetry is a crucial tool for improving generative AI applications, even though the speaker considers it one of the least interesting parts of the project.
- The speaker emphasizes the importance of observability in AI, especially in understanding user inputs and outputs to make AI applications more reliable.
- AI and generative models are becoming more accessible and affordable, shifting the bottleneck from large tech companies to the broader developer community.
- The speaker discusses the challenges of managing costs and understanding model performance when building AI applications.
- The key to building successful AI applications is understanding the right prompting techniques and knowing when to fine-tune or train your own language models.
- The speaker highlights the significance of feeding good data into the AI models, as well as providing the right context for user inputs to produce accurate outputs.
- The process of building AI applications involves a stack of operations before calling a language model, including search services and retrieval-augmented generation.
- Observability here is akin to a tracing problem, where capturing the entire flow from user input to model output is essential for analysis and improvement.
- Metrics like latency, error rates, and cost are important but often easier to manage than ensuring the right data is fed to the model and the model behaves correctly.
- The speaker suggests logging extensive information about the AI application process, including prompts, model responses, and post-processing steps, for better debugging and improvement (see the instrumentation sketch after this list).
- OpenTelemetry is being extended with semantic conventions for AI applications, aiming to standardize how operations and data are represented in traces and logs.
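As a rough illustration of the logging takeaway above, here is a minimal sketch of what that kind of instrumentation might look like with the OpenTelemetry Python API. The callables build_prompt, call_model, and postprocess are hypothetical application functions, and the app.* attribute names are illustrative rather than any official convention.

```python
from opentelemetry import trace

tracer = trace.get_tracer("my-genai-app")

def handle_request(user_input, build_prompt, call_model, postprocess):
    """Wrap one generative-AI request in a span, recording the prompt, the
    model response, and post-processing steps. The three callables are
    hypothetical application functions, not part of any library."""
    with tracer.start_as_current_span("genai.request") as span:
        span.set_attribute("app.user_input", user_input)

        prompt = build_prompt(user_input)
        # Full prompt text can be large; events (or correlated logs) keep it
        # out of the span's regular attributes if your backend prefers that.
        span.add_event("prompt.built", {"app.prompt": prompt})

        try:
            response = call_model(prompt)   # e.g. a call to a hosted model API
        except Exception as exc:
            span.record_exception(exc)
            span.set_status(trace.Status(trace.StatusCode.ERROR))
            raise

        span.add_event("model.response", {"app.response": response})

        cleaned, steps = postprocess(response)   # returns (result, step names)
        span.set_attribute("app.postprocessing_steps", steps)
        return cleaned
```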
Q & A
What is the speaker's name and what team does he work for at Honeycomb?
-The speaker's name is Phillip, and he works on the product team at Honeycomb.
What is the main topic of Phillip's talk?
-The main topic of Phillip's talk is how OpenTelemetry helps generative AI, although he mentions that he won't be discussing OpenTelemetry in depth.
What does Phillip consider to be the least interesting part of the project?
-Phillip considers OpenTelemetry to be the least interesting part of the project, in the sense that it should just work and be helpful without being the main focus.
What is the purpose of good observability in the context of generative AI applications?
-Good observability is important for understanding what users are inputting, what the outputs look like, and how to improve the AI based on real-world usage.
What is the current state of AI in terms of accessibility and cost?
-AI, particularly powerful machine learning models, is becoming more accessible and affordable for a broader audience, with costs decreasing over time.
What challenges do developers face when managing generative AI applications?
-Developers face challenges such as managing costs, understanding model performance, and determining the right kind of application to build.
What is the significance of the 'killer apps' mentioned in the script?
-The 'killer apps' like ChatGPT and GitHub Copilot represent successful applications of AI, but they also indicate the competitive landscape and the need for innovation beyond these applications.
What does Phillip mean by 'inscrutable black boxes' in the context of generative AI?
-By 'inscrutable black boxes,' Phillip refers to the non-deterministic nature of AI models, which can be difficult to understand and predict in terms of their outputs.
What is the importance of understanding user behavior and inputs in AI application development?
-Understanding user behavior and inputs is crucial for improving AI applications, as it helps developers to refine prompts, model usage, and overall application performance.
What role does OpenTelemetry play in addressing the challenges faced by developers in AI applications?
-OpenTelemetry provides observability into the AI application's performance, helping developers trace and understand the flow of data and the impact of various components.
What is the 'golden triplet' that Phillip mentions for analyzing AI applications?
-The 'golden triplet' refers to the combination of inputs, errors, and responses for each user request, which is essential for evaluating and improving AI application performance.
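A hedged sketch of how the golden triplet could be used for analysis once those per-request records are exported from a tracing backend. The `categorize` function is a hypothetical classifier (keyword rules, embedding clusters, etc.), not something the speaker describes.

```python
from collections import defaultdict

def failing_categories(triplets, categorize, min_requests=20):
    """triplets: iterable of dicts with 'input', 'error', and 'response' keys,
    one per user request. Returns input categories sorted by error rate, so
    you can spot classes of requests that consistently fail."""
    stats = defaultdict(lambda: {"total": 0, "errors": 0})
    for t in triplets:
        cat = categorize(t["input"])
        stats[cat]["total"] += 1
        if t["error"] is not None:
            stats[cat]["errors"] += 1
    # Only report categories with enough traffic to be meaningful.
    rates = {
        cat: s["errors"] / s["total"]
        for cat, s in stats.items()
        if s["total"] >= min_requests
    }
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)
```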
Outlines
Introduction to Generative AI and OpenTelemetry
Phillip, a product team member from Honeycomb, introduces the topic of generative AI and its integration with OpenTelemetry. He emphasizes the project's goal to operate quietly in the background, benefiting users without being a central focus. The talk is based on his experience in enhancing AI features by observing user interactions post-release. Observability plays a key role in understanding user inputs and system outputs, which is crucial for improving AI applications. The discussion touches on the accessibility of powerful machine learning models and the challenges faced in managing costs and performance, as well as the need for creativity and reliability in AI outputs.
The Role of Observability in AI Application Development
This section delves into the importance of observability in developing AI applications, particularly focusing on the input-output dynamics of generative AI models. It outlines the process of gathering contextual information to enhance the model's output, mentioning the concept of retrieval-augmented generation (RAG). The speaker discusses the challenges of ensuring the right data is fed to the model and monitoring the model's behavior with the correct inputs. The section also touches on the less critical but still important aspects like latency, error rates, and cost, suggesting that these are usually easier to manage compared to the core data handling and model behavior.
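For readers unfamiliar with the retrieval-augmented generation flow described above, a simplified sketch follows. The `search_docs` and `chat_completion` callables stand in for a search service and a model API client; they are assumptions for illustration, not anything named in the talk.

```python
def answer_with_rag(user_input, search_docs, chat_completion, top_k=5):
    """Gather context relevant to the user's input, package it into the
    prompt, and call an off-the-shelf model (no fine-tuning required)."""
    # 1. Retrieve context: vector search, keyword search, or both.
    docs = search_docs(user_input, limit=top_k)
    context = "\n\n".join(d["text"] for d in docs)

    # 2. Build the "context package" into the prompt.
    prompt = (
        "Answer the user's question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {user_input}"
    )

    # 3. Call the language model with the assembled prompt.
    return chat_completion(prompt)
```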
Tracing and Observability in AI Application Management
The speaker presents a simplified diagram of a typical generative AI application, highlighting the complexity of the processes involved in input handling and output generation. He discusses the use of tracing to monitor the end-to-end flow of user interactions with the AI system. The importance of capturing detailed information about the system's operations, such as input prompts, model responses, and post-processing steps, is emphasized. The section also introduces the concept of using OpenTelemetry for observability, suggesting that while it may not be the most exciting part of the project, it is a crucial and well-suited tool for the job.
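On the "typical boring stuff" side mentioned in the talk (latency, cost, error rates), a minimal sketch with the OpenTelemetry metrics API might look like the following. The instrument names and attributes are illustrative, and `call_model` is a hypothetical wrapper assumed to return a dict with a `usage` field, as most hosted model APIs do.

```python
import time
from opentelemetry import metrics

meter = metrics.get_meter("my-genai-app")
latency_ms = meter.create_histogram("genai.request.latency", unit="ms")
tokens_used = meter.create_counter("genai.request.tokens")

def timed_model_call(call_model, prompt, model="gpt-3.5-turbo"):
    """Record latency and token usage for one model call; token counts map
    directly to cost for most hosted APIs."""
    start = time.monotonic()
    response = call_model(prompt)                    # your model API call
    elapsed_ms = (time.monotonic() - start) * 1000.0
    latency_ms.record(elapsed_ms, {"genai.model": model})
    tokens_used.add(response["usage"]["total_tokens"], {"genai.model": model})
    return response
```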
OpenTelemetry's Application in AI and Ongoing Developments
This part of the script discusses the practical application of OpenTelemetry in AI systems, focusing on the need for detailed logging and analysis of user inputs, errors, and model responses. The speaker shares his experience in using OpenTelemetry for pattern recognition and improvement of AI applications. He mentions the ongoing work in the OpenTelemetry community to define standards for instrumenting AI applications, including the handling of prompts, responses, and other metadata. The potential for auto-instrumentation in the future is also highlighted, suggesting that OpenTelemetry will become an even more integral part of AI application development.
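To make the semantic-conventions discussion concrete, here is a hedged sketch of the kind of span a convention-following instrumentation might emit. The attribute and event names are illustrative of the naming style under discussion in the working group, not its final spec, and `client.chat(...)` is a hypothetical model client.

```python
from opentelemetry import trace

tracer = trace.get_tracer("my-genai-app")

def traced_chat(client, model, prompt, capture_content=False):
    """Call a (hypothetical) chat client inside a span with semantic-
    convention-style attributes. Prompt/response content is sensitive and
    potentially large, so it is redacted unless explicitly enabled -- one of
    the open questions the working group is discussing."""
    with tracer.start_as_current_span("gen_ai.chat") as span:
        span.set_attribute("gen_ai.request.model", model)
        if capture_content:
            span.add_event("gen_ai.prompt", {"content": prompt})

        result = client.chat(model=model, prompt=prompt)

        span.set_attribute("gen_ai.response.model", result.get("model", model))
        span.set_attribute("gen_ai.usage.total_tokens", result.get("total_tokens", 0))
        if capture_content:
            span.add_event("gen_ai.response", {"content": result.get("text", "")})
        return result
```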
Debugging and Data Collection for AI Model Improvement
The final paragraph discusses the intricacies of debugging AI models and the importance of data collection for model improvement. The speaker talks about the challenges of dealing with black box AI models and the strategies used to understand and improve their outputs. He describes the process of building databases of examples for few-shot prompting and the use of CSV files to analyze patterns in user behavior and system responses. The section also touches on the practical aspects of data volume and the cost of observability systems, suggesting that while there are challenges, they are manageable and often lead to better system performance and understanding.
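Two of the mechanics mentioned above can be sketched briefly: picking a small subset of few-shot examples per request from a larger example database, and flattening each request into a wide CSV row that an ML engineer can use for evaluation sets. The word-overlap scoring is a naive stand-in for whatever search technique is actually used, and the row layout is an assumption.

```python
import csv

def pick_few_shot_examples(user_input, examples, k=3):
    """Pick the k examples whose inputs share the most words with user_input.
    `examples` is a list of dicts with at least an 'input' key; this naive
    word-overlap score stands in for a real search technique."""
    words = set(user_input.lower().split())
    def overlap(ex):
        return len(words & set(ex["input"].lower().split()))
    return sorted(examples, key=overlap, reverse=True)[:k]

def append_request_row(path, row):
    """Append one flat dict per request (input, chosen examples, retrieval
    results, model output, post-processing steps, error, ...) to a wide CSV.
    Assumes every row uses the same set of keys."""
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=sorted(row))
        if f.tell() == 0:   # new file: write the header once
            writer.writeheader()
        writer.writerow(row)
```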
Keywords
OpenTelemetry
Generative AI
Observability
Language Models
Prompting Techniques
Retrieval-Augmented Generation (RAG)
Vector Search
Traces
Auto Instrumentation
Evaluation
Debuggability
Highlights
OpenTelemetry's role in generative AI is less about the technology itself and more about its seamless integration and utility.
Observability is crucial for understanding user interaction with generative AI applications and improving AI features based on real-world usage.
Generative AI models, while powerful, are often non-deterministic and can be challenging to manage for consistent output quality.
The accessibility of advanced machine learning models has democratized AI development, making it more widely available to developers.
Managing costs and understanding model performance are key challenges in the practical application of generative AI.
The importance of selecting the right application for AI to ensure its success and avoid competing with established market leaders.
Generative AI applications often involve a combination of user input, context gathering, and language model calls to produce output.
The concept of retrieval-augmented generation (RAG) allows for leveraging existing language models with contextual data to enhance responses.
Two key questions in building AI applications are ensuring the right data is fed to the model and verifying the model's correct behavior with the right inputs.
Latency and error rates are often secondary concerns in AI applications due to user expectations and the nature of language models.
The importance of logging and tracing in understanding the flow of data and decisions leading up to a language model call.
OpenTelemetry's fit-for-purpose nature makes it a suitable tool for tracing and observability in generative AI applications.
The ongoing work in the OpenTelemetry semantic conventions working group to standardize the instrumentation of AI applications.
The potential for auto-instrumentation in AI SDKs to simplify the implementation of OpenTelemetry for AI applications.
The use of the 'golden triplet' of inputs, errors, and responses to analyze and improve AI application performance.
The practical approach to debugging and improving AI models by capturing detailed logs and understanding user behavior patterns.
The consideration of sampling strategies and cost management when dealing with large volumes of data in AI applications (see the sampling sketch after this list).
Current research into true debuggability of AI models and the potential for more sophisticated understanding of model decisions.
The comparison of AI observability challenges with regular system observability and the focus on pattern recognition for improvement.
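As flagged in the sampling highlight above, a very small sketch of category-aware sampling: keep every failure and every rare input, but sample the common, healthy categories aggressively, since most traffic tends to follow a Pareto-style distribution. The `categorize` function and the category names are hypothetical.

```python
import random

COMMON_CATEGORY_SAMPLE_RATE = 0.05   # keep ~5% of the common, healthy traffic
COMMON_CATEGORIES = {"error_rate_question", "latency_question"}

def should_keep(user_input, had_error, categorize):
    """Decide whether to keep the telemetry for one request."""
    if had_error:
        return True                   # always keep failures for debugging
    if categorize(user_input) in COMMON_CATEGORIES:
        return random.random() < COMMON_CATEGORY_SAMPLE_RATE
    return True                       # keep everything rare by default
```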
Transcripts
so hi uh my name is Philip I work for
honeycomb um I'm on the product team uh
I'm um talking about a fun topic uh how
open Telemetry helps generative AI I'm
not going to talk about open Telemetry
very much though um and that's because
open Telemetry is like one of the least
interesting parts of all of this which I
think is kind of like a goal of the
project if you will just kind of works
and is really helpful for people um so
that's what this is going to be about uh
this is based on uh last year uh early
last year um I did the old around and
find out thing um where in the course of
around I found out a whole lot about how
you can make AI better when you uh build
a feature and then release it to all of
your users uh and um find out what they
actually want to do with it uh once it's
live um and it turns out having good
observability for things like what are
people putting into this input box what
is the output look like what do we do
about that um it's kind of a good
use for observability so let's get into
it um so AI is all the
hype these days um this talk is not
going to be focused on infrastructural
level stuff so this is not about like
monitoring your gpus um or anything like
that if you're like working in the cloud
offering gen Services you might care
more about that you might care more
about how you do inference um monitoring
all of those are like completely other
talks that we could give uh this is for
the majority of people out there who are
building applications that use some
generative AI model like
GPT-4 um behind an API and they want to
just make it good because what's really
cool is like it's kind of got into this
world where like quite literally the
world's most powerful machine learning
models are broadly available for anyone
to use at a relatively cheap price and
getting a lot faster and a lot cheaper
by the like month basically um so like
this bottleneck where like you could
maybe build something really cool using
AI um was like stuck in the likes of
Amazon and Google and meta and Microsoft
and all that like you just couldn't do
that as a normal developer now that's
not the case anymore but that doesn't
mean that like everything is all magic
and sugar and all of that um there's
there's a lot of problems that people
have um when it comes to managing costs
when it comes to understanding how these
models even perform when it comes to
figuring out what the right kind of
application you're going to build is um
there's like killer apps already like
ChatGPT and GitHub Copilot but like chances
are if you're going to create like a
chat wrapper it's not going to do very
well if you're going to try to create
like a little code completion thing um
Good Luck competing against GitHub uh
they've got like a five-year head start
on you so um you know it's great but
like there's a lot of opportunities out
there that are just outside of chat apps
and outside of little tab completion
code helpers um but I think like it's
safe to say that this sort of broke the
tech world uh and it's still a little
bit broken and we're probably due for
one of those like Gartner hype cycle
troughs of disillusionment pretty soon
um but like the the the like it's
fundamentally changed now like we have
fundamental Computing capabilities that
we just didn't have before um so what
does that mean well they are powerful
but inscrutable black boxes um It Is by
design that language models that
generative AI uh in general is either
non-deterministic by Design or even if
you turn down the temperature config which
is a value that you can use
um down to zero uh depending on the
model you're using it's still
non-deterministic and like there's a lot
of variance there where some smaller
models actually are deterministic and
all of that but like the point is if you
want to generate so-called creative
responses to Things based off of inputs
that come in you don't want something
that produces boring outputs usually
like that you just don't use AI if
that's the case like you want something
that produces really interesting outputs
that are interesting but like now your
users kind of do expect some degree of
reliability and you have an inscrutable
black box that like you try to prompt it
and like good luck trying to understand
what the best prompting technique is
there's like 20 or 30 of them that are
probably helpful and some are going to
regress certain things and make other
things better and like you're not going
to know up front which one is the right
one um you're going to have totally
different behavior in production
compared to development because your
users are going to do things you could
never possibly expect and you're going
to have to learn the hard way that like
you got to do something different you
can't just like write some unit test and
hope it gets better that's not going to
happen if you just try to say well it
looks good on my machine let's throw it
in production like it's going to it's
going to produce garbage it's not going
to be any good and you're not going to
be able to keep that feature in
production so um let's stare at a
diagram for a little bit this is
basically every GenAI app today uh it's
massively oversimplified of course um
but outside of like the super boring
useless chat apps that people write um
when they're not named OpenAI um
there's some form of input generally
producing some kind of output it's
almost always some kind of Json and the
things that happen in between are really
interesting there's one or more language
model calls it's usually only one
because that's usually all that you need
but there's this whole stack of stuff
beforehand um called search Service uh I
pulled this this diagram out there where
it talks about Vector search a lot of
the AI world is sort of relearning that
Vector search is not the only kind of
search that you can do that's really
helpful the goal is that you want to
take user input gather a whole bunch of
contextual information about like what
could be helpful to produce an answer to
the question that they have or the
output that you're trying to achieve and
gather as much of that as possible and
produce um uh a a context package if you
will it's uh it's called retrieval
augmented generation or rag it was this
really cool behavior that some meta
researchers found in 2020 where they
figured out that language models if they
are not trained on a certain kind of
data but you feed in that data on a
request to it they can kind of act like
they were trained on that data and
there's a little bit of wiggle room
there but like it's really cool because
you don't need to train your own
language model you can use an
off-the-shelf language model and pass in
a whole bunch of stuff and produce
useful things and so this is what almost
everybody who is building AI apps today
is building some form of this diagram
okay so there's really two key questions
that you need to answer when you're
building this stuff and you want to make
it better um notice I don't have the
words latency error rate CPU statistic
GPU blah blah blah whatever uh it's is
the data right like is the right data
being fed to the model in the first
place like if I'm gathering context for
somebody's user input am I actually
Gathering the right context uh I talked
to someone last year who has a version
of that diagram where that search
service is actually six different
databases and one of the questions they
have is okay based off of the user's
input are we calling the right database
or not um how many should we call how do
we merge those results together what
sorts of search um systems do we do we
actually have and like can I
systematically show that on like classes of
inputs I can produce context packages if
you will that are actually right for
that kind of input how do I like measure
that and see if it's continuing to
improve over time and like not
regressing over time similarly on the
model side how do you know that it's
behaving correctly when you actually do
have the right inputs right so like
assume that you have retrieval which is
a really hard problem in a lot of cases
solved is it actually still doing the
right thing like are you using the right
prompting techniques are you at a point
where you need to actually fine-tune a
language model are you at a point where
God help you you have to train your own
language model I certainly hope that's
not the case and similarly can you
systematically show that you're making
progress when you go to production and
people are inputting all kinds of weird
stuff and there's like weird outputs and
you start thinking that you're fixing
those outputs are you a actually fixing
those outputs and B are you not
regressing the stuff that was already
working in the first place these are
like these are really important things
um there's some other stuff that doesn't
really matter as much but it still kind
of matters around like latency and error
rates um I'm labeling them this way
because to be frank they're usually
pretty easy to solve in part because
users don't expect language models to be
instantaneous and so if it takes like
one or two seconds to produce a response
it's usually fine uh these things are
getting hugely better over time uh when
we released our our application early
last year um average response times were
like 5 Seconds and now it's down to like
1.5 seconds uh through like no no action
of our own um cost is also something
that like I mean in this economy
everybody's worried about cost but like
let's be real like most organizations
have budget for AI and they're willing
to spend it uh and you really don't need
the most powerful models to achieve most
outcomes that you're looking for uh
we've been live in production with GPT-3.5
since May of last year and have had
no need to change it uh if we can do it
you probably can too um and
hallucinations like see previous slide
people talk about oh I don't want the AI
app to hallucinate but like it's not
about hallucinations it's am I feeding
the right information to the model and
am I producing the right output based
off of that right information that I'm
feeding in the first place and can I
actually systematically show that over
time this is just the core of making
these apps more
reliable and so like a way that that
might look is you could imagine you have
a whole bunch of info like you want to
log like a full prompt that you build up
programmatically maybe you have a whole
bunch of steps that lead up to that um
in the the application that we built
last year there's actually on the order
of about 38 distinct operations that
happen Upstream of a language model call
so like logging all of that stuff and
tracking that understanding your latency
like your status code what your error
was like your usage if you're doing any
post-processing on the Json like what
postprocessing steps you're actually
doing your diagram kind of looks like
this um and uh it just involves
Gathering user input contextual
information request to a service
sometimes you may do multiple searches
sometimes you may have to rerank search
results based off of like different
techniques that could work better and
certain like certain inputs may lend
themselves better to a different like
search um system like these are kind of
complicated things um and like you
eventually get to the point where you're
calling an llm and you want to have like
okay what was the input what was the
output but like there's this whole
system that you're trying to gather
information about and post processing
steps can often be a rather large um set
of things like um speaking again from
from production we have about two dozen
or so possible post-processing steps
that can occur where a language model
gets something mostly right and that
mostly right is actually something that
we can deterministically check and
either insert or remove data from like
the response that we get like this is
when you're in production you're trying
to make stuff better for your users you
find out all of this fun stuff where you
can make this stuff actually work um so
sounds an awful lot like a tracing
problem right I got all this stuff
happening Upstream of this black box
maybe it involves some other black boxes
maybe it involves a whole bunch of calls
to language models maybe it eventually
calls a language model maybe it calls a
language model 20 times maybe it calls
it five times who knows who cares I do
something afterwards like there's all
these words like Services flying around
this is literally just a tracing problem
this is an observability problem so this
is where I talk about open Telemetry um
and as I said the otel part is like one
of the least interesting Parts but I
think that's great because
uh it's actually quite fit for purpose
here um what do you want to capture well
traces yay uh you have like an end to
end flow like a user types in a thing
and they like click a button or they hit
enter like what are all the different
things that are actually hit how do you
capture that well you use traces to like
tie all of that together now it gets a
little bit more um into the Weeds about
if you want to capture a whole bunch of
information in that Trace data or if you
want to capture like for example like a
full prompt text or LLM response
depending on its size like that may be
more fit for a log that you then
correlate with the trace it's kind of up
to you it's kind of up to like what you
use for your tracing backend to analyze
this data in the first place um you want
to capture information about
post-processing results um and you can
also aggregate some metrics around
things like latency and cost your
typical error rate just typical boring
stuff you can throw up on a dashboard to
sort of say okay like I know that
generally speaking it's doing all right
um there is literally nothing as far as
I can tell at least in my experience in
open Telemetry that prevents you from
doing this today um there depending on
the language you're using maybe like for
example go with logs is like not as far
along as Java with logs so if you have
a Java app it's going to be a lot easier
than if you use go or something but like
fundamentally all the places are there
for you to be able to do this um and so
then you get into the fun stuff like
actually analyzing this information um I
have found I I I I put it in quotes I
called it the golden triplet I don't
know if it's actually that um inputs
errors and responses for each request
that a user gives and uh like if I have
an agent or a chain or something like
maybe there's there's like a correlation
ID that I that I tie to like that
particular thing that I'm doing or maybe
it's represented as multiple traces that
are linked together via span links um
again otel fit for purpose for this kind
of stuff um and I just look at patterns
of inputs and outputs like in the
natural language query feature that we
built last year it was somebody that
like for example Honeycomb's back end is
like strangely complicated to ask what
an error rate is if you don't have a
metric about that um and people were
asking for what's my error rate and it's
like well crap actually that is like
weirdly unanswerable in certain ways so
like what do we even do um when when
like this is a common thing they want to
do so like we were failing in like a
category like you can imagine all the
different ways that somebody might
phrase what is my error rate um doesn't
matter how they phrased it the category
of input led to a category of output
that just sucked and so we're like great
this is like a class of bug that we can now
try to solve for and we can dig into
some of those requests be like okay
these are all the decisions that we made
Upstream with a language model call this
is what the language model actually
produced these are the post-processing
steps where like we accidentally just
removed a bunch of stuff that we
shouldn't have removed and there was
like a bug in that that was unrelated to
the language model it was just us being
dumb um and uh brought it into
development and just said great I have
like concrete what is actually happening
here and I can start then annotating
outputs and saying this is what the
output was this is what the output
should be and that's called an
evaluation if you're in the ml world and
you start building up sets of these
evaluations and then you can start
systematically actually fixing this
stuff and making it better and it makes
these inscrutable black boxes tangible
and actionable rather than just throw
stuff at the wall and hope it sticks so
what's open Telemetry doing to help well
aside from being mostly fit for purpose
um there is work going on in the uh llm
semantic conventions working group on
slack uh this is where it turns out
there's a whole lot of common operations
in this kind of application that you're
building when you're talking about
Vector databases you're talking about
calling different language models whether a
language model is like a single shot
sort of thing or if it's a part of an
agent um like there's names that you can
assign to this kind of stuff and names
that we are uh assigning to like what
should live on a span versus like should
this be captured in an event that's
correlated to a span and like what
should the default be should this data
be captured or should it be redacted by
default and can you turn it on what does
that mechanism look like um and we're
working with the uh OpenLLMetry folks
who have taken a spike at like let's
build a bunch of Auto instrumentations
for this stuff and see what it actually
looks like and working with them to say
okay based off of that this works this
one doesn't work this one works really
well this one yeah maybe I don't know
and see if we can formalize that into a
spec so it's very much underway right
now um there are pieces that are like
pretty like I don't want to say it's
stable it's like totally experimental
but like you could reasonably build
instrumentations off of what's been
defined today uh but there's a lot more
work to come and uh we really I would
really encourage anyone who's interested
in this space to uh uh engage in this
area um especially if you're working for
any of the tech companies that is
involved in building models because
y'all models have like weird ways to
capture inputs and outputs and like we
like standards and stuff so it'd be
great if we could figure out the best
possible way to represent stuff
instead of treating OpenAI as a de facto
standard for example um so this is
what's going on otel is like good enough
for you to use today you got to do a
little bit more manual instrumentation
but like chances are uh with the budget
that's being assigned to these
applications you're going to have the
time to do that uh and there's going to
be more Auto instrumentations coming
there's good um uh good spec level stuff
being defined right now and I think in
the near future you could see this as
being as commonplace in otel as like
database stuff or HTTP stuff um and
hopefully without too much churn in the
spec itself so that's what I
got uh right now this is all patching
from the outside um so like I've written
like a library for like python that just
wraps the OpenAI calls for example from
the python SDK um the OpenLLMetry project
does similar
sorts of things um what we are hoping as
a part of this that um like the AI
providers in their sdks just have the
otel apis and just you know like it's
just producing like no-op spans for
example so it doesn't impact anyone
unless they turn it on um kind of going
again with sort of the goal of otel
where like instrumentation is everywhere
and then it just you can just turn it on
and and it's available so um but first I
think like we need to lay some of the
groundwork because they're going to have
immediately questions like hey should I
like put the prompt in the span or
should I create an event or like should
I even do that like what do I I do uh
and that's where like us nailing this
down on the spec Level side from the
open Telemetry project standpoint is
really going to help them out yeah so so
that yeah for anyone who didn't hear the
question this is so I gave input an
example of like user input possible
error and LLM output as something that
you could look at what are some other
examples of things um so uh some some
examples that I can tell you by way of
um example from one of Honeycomb's
features is so like it's like natural
language to querying tool um so you need
to query a data set that data set has a
schema that schema can be massive and
you can't just include literally every
single name of everything in the schema
inside of every request that you make to
the model so there's this problem of
like okay what subset do we actually
pick which one is the most appropriate
subset so um we have like like there's
there's text based search and there's
Vector search and there's like which one
did we pick um which subset from each
did we end up picking what was the
actual like result that we gave so like
you know I'm I'm like you know you you
don't want to capture like the actual if
you use Vector embeddings you don't want
to capture the actual Vector embeddings
because they're massive and like you're
not going to be able to interpret them
but you want to distill that down a
little bit uh there's also other
contextual things so like for example in
in our application um each request that
a user makes may be different from other
requests so like if you're talking to a
different data set that's a different
schema involved so you want to capture
some information about what that what
what's actually going on there so you
can distinguish between what are my
errors for this data set versus what are
my errors for this data set and like do
we perform better on one or the other
and does that tell us like okay is that
a problem with our prompting or is that
a problem with how we do retrieval
across different data sets um another
thing is there's other like specific
stuff that you pull in so a very common
prompting technique is called um like
few shot prompting where you sort of
embed little examples inside of the
prompt that you send as a part of a
request uh you can actually create a
database of those examples of like
well-known okay given some like
representation of what retrieval data
looks like a user's input and what the
ideal output for that input should look
like based off of that data you can
build up a whole a whole database of
that you can also do search techniques
on which pieces of that you actually
pull in on a per request basis so if you
have like 50 few shot examples that are
all like generally really good which
three to five are going to be the most
helpful for this specific request
capture that information and then like
basically what you end up with is you
end up with like a really really big
grouping like imagine like a big CSV
with just tons and tons of columns and
you're like okay for each request here's
all the stuff that was interesting about
that and now you get into like okay what
are the patterns in each of those um
user behaviors uh fun fact if you are
working with an ml engineer they're
going to want that CSV uh and they're
going to want as many columns in it as
possible because that's going to help
their job if they're building like
evaluation sets it's going to make them
more um like I don't know I've talked
with a bunch of ml engineers and they're
like please load that CSV up with as
much data as you possibly can like err
on the side of too much data because
it's probably not even enough um so like
I don't know that that hopefully that's
helpful um well gigabytes per
hour I don't know it kind of depends on
the application I would say that first
chances are your prompts don't need to
be as big as they are and your responses
may not necessarily be that big either
um but like this I think is not too
different from any other observability
problem regarding sampling like chances
are that for some system there's going
to be some like Pareto distribution of
like the kinds of inputs that people
actually want to ask about like if it's
a natural language tool for like
Prometheus for example um like 80% of
the questions that people are going to
ask are going to follow a pretty similar
kind of pattern and so you could sample
that much more aggressively than others
and so there's ways that you could
actually detect that um there are other
like to be frank like some observability
systems are a lot cheaper than others
and like it's a great opportunity to be
like oh wow maybe my bill's a little too
high right now and um maybe per gigabyte
pricing is not the right pricing scheme
for what I'm trying to deal with um I
think it kind of depends there but like
I don't think we're really at a point
where we're going to be limited by that
unless you're at the like Amazon
Microsoft scale of like oh I have a
million users who are doing this well I
don't know it's just going to be
expensive operating at that scale is
expensive um I think today yes there's
like there is certainly some active
research being done around like true
debuggability into these things but like
I I I think some of that could also just
end up being like incomprehensible where
like a model like GPT-3.5 even is just
there's so many like activations of
different layers that are going on that
like it that may not even be helpful um
or it may just be too hard to sort of
wrangle around now I know that there are
certain um things that you can do like
you can um you can ask it to generate
multiple responses and you have a system
that picks which response you want and
there's these there's these things that
are called log probabilities that assign
like okay the the probability of like
this token like this and and it'll
basically say like here's like a set of
tokens that we were going to generate
and these were the probabilities that
were assigned to them and that's why
this one was was was chosen now it
doesn't tell you the actual
decision-making process that led into
that but that can inch you a little bit
closer to that um to be honest I've not
really run into anyone who's like really
used that stuff a whole lot uh like I
know that it exists but like it's um
you're you're getting pretty
sophisticated and you're debugging it at
that point and I would say that most
people are just not at that point yet um
if ever um I think like also it's like
regular observability of systems is like
well sometimes we're working with stuff
that are black boxes that kind of suck
um sometimes and do weird stuff in
production that you can never reproduce
locally uh and then you are still still
in this place of like okay like what
patterns are leading to these outputs
and what can I do with that info um and
uh I think we're there right now
now um we might get somewhere in the
future where you could more like
fine-tune debug something but um
probably not for a while