Mastering Summarization Techniques: A Practical Exploration with LLM - Martin Neznal
Summary
TL;DR: The speaker discusses using large language models like GPT for text summarization and other natural language processing tasks. He outlines common issues when deploying these models in production, such as poor-quality output, instability, and evolving model versions. The talk then covers techniques to improve summarization quality, including data cleaning and careful prompting, as well as methods of evaluating summary quality. The speaker concludes by describing the challenges of scaling multiple production NLP services that rely on a single provider's API.
Takeaways
- The talk focused on using large language models like GPT for text summarization and other natural language tasks
- Cleaning and processing input text before feeding it into models improves summarization quality
- Careful prompting, including context, instructions, and examples, significantly impacts model performance
- There are various methods to evaluate summarization quality, from reference-based to annotation-based
- The OpenAI API provides high-quality summaries, but has downsides like rate limits, changes, and outages
- Deploying summarization at scale has challenges around processing speed, errors, and rate limits
- Regularly evaluating new language models is key to maintaining optimal production systems
- Relying solely on one provider like OpenAI has risks, so backup plans should be considered
- Managing customer data privacy with third-party models requires transparency and secure pipelines
- For free alternatives, quality depends on the specific use case and available pretrained open-source models
Q & A
What were some of the initial challenges faced when deploying large language models into production?
-Some initial challenges were getting low-quality or nonsense summaries, figuring out which model works best for each use case, handling instability and outages of providers like OpenAI, and dealing with constantly evolving models.
What are two main categories of problems encountered with using large language models?
-The two main problem categories are: 1) Quality of results - unclear how to achieve the best results for each model and use case. 2) ML engineering - issues like outages, instability, and models rapidly evolving over time.
How can preprocessing and cleaning of input text improve summarization results?
-Preprocessing to remove irrelevant text, filter common/uncommon n-grams, select key sentences etc. helps GPT focus on the most salient parts of the document for better summarization.
How does prompting help in generating high quality summaries using GPT?
-Prompting provides critical context about the purpose, reader, expected structure/length. It also includes examples and clear instructions of what content to include/exclude. This guides GPT to produce more accurate summaries.
What are some common methods used to evaluate quality of AI-generated summaries?
-Reference-based (compare to a human-written summary), pseudo-reference-based (compare to an auto-generated summary of key sentences), and annotation-based (manually label and validate summary quality).
What are some advantages and disadvantages of using OpenAI APIs in production?
-Advantages are high output quality and model selection flexibility. Disadvantages are low rate limits, instability, frequent changes, and outages.
How often does the author's team evaluate new language models for production use?
-The author's team re-evaluates the landscape of new language models, their quality, APIs, costs etc. every quarter to determine the optimal model for production.
What were some complications faced in deploying summarization models at scale?
-Issues faced were OpenAI API slowness and errors, hitting rate limits quickly, instability requiring model switching, and prioritizing requests across multiple concurrent services and new customers.
What other NLP services does the author's company offer beyond summarization?
-Other services offered are topic and entity summarization, real-time conversational summarization, embedding generation for search, sentiment analysis, and more.
What is the current challenge the engineering team is working on?
-They are working on a middleware to optimally manage and prioritize requests across their various NLP services into the OpenAI APIs to maximize throughput.
Outlines
Introducing the topic of mastering summarization techniques
The speaker introduces the topic of the talk - mastering summarization techniques using large language models. He discusses the hype around large language models and some of the challenges with using them, such as unpredictable quality of results, constantly changing models, and infrastructure/engineering challenges.
Improving summarization quality with data processing and prompting
The speaker explains two key techniques that helped improve the quality of GPT-generated summaries: 1) Cleaning and processing the input text to remove noise, filter out certain n-grams, etc. 2) Carefully crafting prompts to provide context, instructions, and examples to GPT on what is needed.
Evaluating quality of AI-generated summaries
The speaker discusses different methods to evaluate the quality of GPT-generated summaries, including: 1) Reference-based methods like BLEU and ROUGE that compare to human summaries 2) Pseudo-reference methods that compare to auto-generated reference summaries 3) Annotation methods where humans manually assess and label summary quality.
Comparing capabilities of different language models
The speaker compares different large language models like GPT, Jurassic, Anthropic Claude etc. in terms of quality, cost, limits etc. For their use cases, OpenAI provided the best balance but they reevaluate models quarterly as the landscape keeps changing.
Sharing experience deploying summarization in production
The speaker shares challenges faced while deploying GPT summarization in production - dealing with OpenAI outages, rate limits, scaling requests optimally. He discusses the need for a middleware to manage requests across services using OpenAI.
Discussing current challenges and future work
The speaker concludes by listing their other production services using LLMs (real-time streaming, embeddings etc.) and the challenge of making these services aware of each other to optimize OpenAI API usage. He invites interested folks to join them in solving these problems.
Wrapping up main points covered in the talk
The speaker wraps up by highlighting the key points covered in his talk: techniques for summarization using GPT, evaluating quality of summaries, comparing different language models, and experience deploying summarization in production.
Keywords
- Summarization
- GPT
- Prompting
- Evaluation
- Production
- OpenAI
- Alternative models
- Multitask orchestration
- Data privacy
- Model evolution
Highlights
There is hype around large language models, but actually using them can be challenging
Key problems when using LLMs are unpredictable quality and ML engineering issues
Cleaning and processing input text before feeding it to the LLM improves results
Prompting is critical for getting good predictions from LLMs
Reference-based, pseudo-reference-based, and annotation-based methods can evaluate LLM summary quality
OpenAI APIs have good quality but can have rate limits, changes, and outages
We regularly evaluate new LLM models for production use cases
Deploying LLMs has many real-world complexities to handle
We use GPT-3.5 for most production summarization
Relying solely on one LLM provider has risks
Other LLM models can have advantages for specific use cases
We tell customers OpenAI does not train on data sent through the API
For free summarization, would use an open-source LLM matched to the use case
We process each customer's data in a separate environment
We do not currently evaluate the quality of document embeddings
Transcripts
Hi everyone. I prepared for you the topic of mastering summarization techniques: a practical exploration with large language models. This talk will be mainly about summarization, but I think it's applicable not only to summarization but to many different NLP tasks. Using summarization, I basically want to show you our story of how we first deployed a model into production using large language models.
I would like to start this talk with the hype around large language models. I suppose all of you here saw it and know about it; everyone is talking about it. Two days ago, OpenAI had a big keynote presentation during which they presented a lot of new stuff. So there is super big hype around it — but when it comes to the actual usage, is it really that easy? Can we just connect some NLP API to our text and get summaries, topics, sentiment, whatever we want?
We actually started using large language models almost two years ago. We started with GPT, and we wanted to get summaries. This was one of the first examples we got back: a summary that was a recipe for how to make scrambled eggs. But we didn't feed that document to GPT — we had a feedback document, some of our customers had a problem and we wanted to summarize it, and GPT generated this summary instead.
There are many other problems with large language models in general. Here I group them into two categories. The first is from the data science point of view, basically about the quality of the results. I think it's better than it was a year or two ago, but it's still unclear how to get the best summaries, how to achieve the best results when using these models. There are many different models to choose from, and it's not that straightforward to know which model is best for which use case, and so on.
The second type of problem is related to ML engineering. To illustrate, I just wanted to show you what happened two or three hours ago: there was a big outage of OpenAI. For one and a half hours, the OpenAI APIs — GPT, GPT-4, and so on — weren't working at all, and this affected all our services; we have around five different models in production. This is an internal dashboard of ours showing some of the errors we were getting — we were basically unable to generate predictions in production.
Back to the presentation. Another problem in this era is that these models are constantly changing and evolving. Some people tell you that you have to deploy your own open-source model and run it on your own infrastructure, even though that may not be the right choice for you — if you use OpenAI, it's cheap. So you have to think about it from all these points of view. So that was the hype.
Now, on to summarization: I would like to tell you how we are using GPT for summarization and other tasks in Productboard, and I want to show it on an example. This is a feedback document from the OpenAI community forum — someone is having a problem with the OpenAI website. When we just naively ask GPT to generate a summary — we just say "GPT, summarize this" — this is the summary we get. It's not optimal: it's in the first person, and so on. But when we simply work with the prompt and ask in a better way, we receive a much better summary — more concise, in the third person, and so on.
I would like to mention a few steps that helped us when using GPT. I will not cover all of them — that would be a complete talk on its own — but I want to mention the two most important things. The first is processing and cleaning of the text that we feed into GPT. GPT itself doesn't require text in a human-readable form, so you can clean the data: process it, remove all system text, and do some n-gram filtering to remove n-grams that occur either very often or very rarely. You can also apply more advanced methods. For example, when you are facing multi-document summarization — say you want to summarize thousands of documents and generate one summary for all of them — you can select the most salient sentences by some approach and generate summaries only from those sentences. So that was the first thing that helped us.
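To make this concrete, here is a minimal sketch of the kind of cleaning and n-gram filtering described above. The rules, thresholds, and function names are illustrative assumptions, not Productboard's actual pipeline:

```python
import re
from collections import Counter

def clean_text(text: str) -> str:
    """Strip markup, URLs, and extra whitespace before feeding text to the model."""
    text = re.sub(r"<[^>]+>", " ", text)   # HTML remnants
    text = re.sub(r"http\S+", " ", text)   # URLs
    return re.sub(r"\s+", " ", text).strip()

def boilerplate_ngrams(docs: list[str], n: int = 4, max_df_ratio: float = 0.5) -> set[str]:
    """n-grams that occur in more than max_df_ratio of the documents are
    likely system text (signatures, footers) rather than actual content."""
    df = Counter()
    for doc in docs:
        words = doc.lower().split()
        df.update({" ".join(words[i:i + n]) for i in range(len(words) - n + 1)})
    return {g for g, c in df.items() if c > max_df_ratio * len(docs)}

def strip_ngrams(doc: str, bad: set[str]) -> str:
    """Remove the offending n-grams from a document before summarization."""
    for gram in bad:
        doc = re.sub(re.escape(gram), " ", doc, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", doc).strip()
```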
The second thing — the one that really helped us the most, not only with summaries but with any prediction we generate — was prompting. I think it's a fairly well-known fact that prompting is really important, so let me just quickly summarize how we actually prompt GPT. It is really important to give it context: basically tell GPT why you are asking it to do something, who will be reading the output, what the output should look like, and so on. Give it a really clear definition of what you want to happen — what you want it to generate and what you don't want it to generate. You can also use quotes to separate the instructions from the input, and you can try few-shot prompting to give it some examples — in this case, examples of good summaries.
This is an example of what we use in production for generating some summaries. You can see that we tell it: "You are an assistant helping product managers summarize feedback from various customers." That is the context — the people usually reading our summaries are product managers at different companies, so this should help GPT know what the output should look like. You can also give specific instructions. This is quite a short list, but you can easily have tens of different instructions. In this case we tell it that the output shouldn't be in bullet points or an ordered list, and should be a maximum of two sentences. So this is basically the idea.
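As a sketch of how such a prompt might be assembled against the OpenAI chat API (the wording paraphrases the slide; the model choice and temperature are assumptions, not the exact production setup):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM = ("You are an assistant helping product managers summarize "
          "feedback from various customers.")

INSTRUCTIONS = (
    "Summarize the feedback delimited by triple quotes. "
    "Do not use bullet points or an ordered list. "
    "Write at most two sentences, in the third person."
)

def summarize(feedback: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # the talk mentions GPT-3.5 for most use cases
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": f'{INSTRUCTIONS}\n"""{feedback}"""'},
        ],
        temperature=0.2,  # assumption: a low temperature keeps summaries stable
    )
    return response.choices[0].message.content
```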
So now you kind of know what you can use to generate good summaries. Let's say you have those summaries — let's say you have a system generating tens of thousands of summaries. Can you somehow assess the quality? Can you know which summaries are good and which ones you don't want to show to customers? There are many different methods for this, and I prepared a brief list. The first group is called reference-based evaluation. The idea is that you generate the summary using the AI method — in our case GPT — then you ask a human to actually read the document and write a summary, and then you compare the two. Based on their similarity, you measure how good the AI-generated summary is.
Another set of methods is pseudo-reference-based evaluation. The idea is similar, but you don't have to ask a human to write the summary; you generate the reference summary automatically — for example, by taking the most salient sentences in the document — and then you compare the two artificially created summaries. The advantage is that it doesn't require any input from humans, but it's not as precise.
The last set of methods is annotation-based evaluation. The idea is that you generate the summaries, then you read them along with the input document and validate the quality — you basically assess the quality with labels, for example good, okay, or bad. Say you generate summaries for a hundred examples: you read, validate, and evaluate them, and in the end you sum it all up to know the overall quality.
Let me briefly name some concrete examples of these methods. From reference-based evaluation there are methods called BLEU and ROUGE. These are quite simple methods — they have existed for 20 or 30 years, I think. The idea is that you compare the number of matching words: you take the words in the reference summary and the words in the summary generated by GPT, count the matching words, and divide either by the number of words in the reference summary or by the number of words in the machine-generated summary. So these are kind of the recall and precision of the summary. As you can probably imagine, these methods are quite simple — they can only match exact words, so if there is only semantic similarity they miss it, and the usage is quite limited.
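The word-overlap computation the speaker describes can be written down directly; real BLEU and ROUGE add n-gram variants and other refinements, but the core idea is this:

```python
def overlap_scores(reference: str, candidate: str) -> tuple[float, float]:
    """Recall/precision of matching words between a reference summary and a
    model-generated one — the core idea behind ROUGE-1 and BLEU-1."""
    ref_words = reference.lower().split()
    cand_words = candidate.lower().split()
    matches = sum(min(ref_words.count(w), cand_words.count(w))
                  for w in set(cand_words))
    recall = matches / len(ref_words)      # ROUGE-style: divide by reference length
    precision = matches / len(cand_words)  # BLEU-style: divide by candidate length
    return recall, precision

r, p = overlap_scores("the cat sat on the mat", "a cat sat on a mat")
print(f"recall={r:.2f} precision={p:.2f}")  # recall=0.67 precision=0.67
```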
On the other hand, there are methods like BERTScore. The idea of this method is that it uses embeddings and then compares the similarity of those embeddings. The advantage is that it's more precise and the results are better, but it's more time-consuming — it can take longer to compute the similarities and get the overall quality of the summary.
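For the embedding-based variant, the `bert-score` package is one readily available implementation (assuming BERTScore is the metric meant here):

```python
# pip install bert-score
from bert_score import score

candidates = ["The user cannot log in to the website after the latest update."]
references = ["Customer reports being unable to sign in since the recent release."]

# Token embeddings are compared instead of exact words, so paraphrases
# like "log in" / "sign in" still count toward the score.
P, R, F1 = score(candidates, references, lang="en")
print(f"BERTScore F1: {F1.mean().item():.3f}")
```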
From pseudo-reference-based evaluation, what you can do is somehow find the important sentences in the input document and use them to generate the reference summary. You can select the important sentences manually, use TextRank, or basically any method that finds the important parts of the document; then you compare those important parts with the summary you generated and use that as the quality of the summary. As you can probably imagine, this is not as precise, but the advantage is that it doesn't require any human input, and you can use it to assess the quality of thousands of summaries.
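A toy stand-in for the salient-sentence selection described above — scoring sentences by word frequency instead of a full TextRank, purely for illustration:

```python
import re
from collections import Counter

def salient_sentences(document: str, k: int = 3) -> list[str]:
    """Pick the k sentences whose words are most frequent in the document
    overall — a cheap stand-in for TextRank when building a pseudo-reference."""
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    word_freq = Counter(re.findall(r"[a-z']+", document.lower()))
    def score(sentence: str) -> float:
        words = re.findall(r"[a-z']+", sentence.lower())
        return sum(word_freq[w] for w in words) / (len(words) or 1)
    return sorted(sentences, key=score, reverse=True)[:k]
```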
So that was about the evaluation methods — again, an area for a whole presentation of its own, so I only briefly described some of them. I was talking about summaries, but I think this is applicable to almost any NLP task where you want to validate the quality of some predictions.
In Productboard we actually use a combination of the second and third methods. We use the third method — the annotation-based one — when we want to assess the quality of a new prompt. Let's say we want to test a new prompt variant: we generate results based on the new prompt, actually read those summaries, compare them with the input documents, and label them. Then we have a number at the end that we can compare with the current result in production, and we replace the prompt if the new results are better.
The second method — the pseudo-reference-based one — we use in production when we want to assess the quality of thousands of summaries. We generate tens of thousands of summaries daily, and we obviously cannot read them all, but we also don't want to send our customers a recipe for how to make scrambled eggs. That's where this comes in: we use these techniques to automatically check whether the summary is talking about a similar thing as the input document. So that was about generating summaries and how to assess their quality.
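One way such an automatic on-topic check could look — embed both texts and threshold the cosine similarity. The `embed` function and the 0.7 threshold are assumptions to be tuned, not the production values:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def summary_on_topic(document: str, summary: str, embed, threshold: float = 0.7) -> bool:
    """Flag summaries that drift away from the source document.
    `embed` is any text-embedding function (e.g. an embeddings API call);
    the threshold is an illustrative value to be tuned against labeled data."""
    return cosine(embed(document), embed(summary)) >= threshold
```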
Now I would like to zoom out a bit to the large language model space and discuss what language models exist. I won't describe all of them. As you all probably know, there are open-source models and paid ones — here is a list, and it's changing basically every day. For example, Grok was announced just last week by Elon Musk. When you have production jobs or models generating NLP predictions, you want the best predictions, so you need to evaluate the quality and parameters of these models quite often. Here I just wanted to show example summaries from these models. We generally found — this was a year ago, so I don't know if it's still the case — that the summaries from the open-source models were not great; for example, this summary from the FC model was basically nonsense.
So this was the list of models. As was probably visible from my talk, we use OpenAI for all our use cases. It has pros and cons. The advantages of OpenAI, I would say, are that the quality of the results is the best, and we can choose which models we want to use: if we have something that needs to be super precise, we use GPT-4; for use cases that don't have to be super precise, we can use GPT-3.5, and so on. The other advantage is that we don't need to serve and maintain it ourselves. We are a small team — only three ML engineers — and we don't have the capacity to deploy a model on our own infrastructure and take care of it, especially since using OpenAI is quite cheap and relatively stable.
The disadvantages are, for example, rate limits: if you want to use OpenAI to generate quite a lot of predictions, it's not always that easy. I think the base rate limits for OpenAI are quite low, so if you have, say, three use cases in production using OpenAI, you might struggle with the rate limits, and your pipelines will sit in a queue waiting for room to actually generate those predictions. The other problems are that it's changing basically every day, and that there are outages — as we saw, there was one a few hours ago. These are kind of the cons of OpenAI.
This basically means that every quarter we regularly sit down together and look at the new models that were launched: how we could actually use them, whether they have an API, what the rate limits are, what the performance is, and what the quality of summaries and other predictions looks like. We do this to have the best possible model in production. Currently we are using OpenAI, and I would say we will keep using it for at least the next few quarters — but who knows, maybe in a year we will migrate to some new model.
In the end, I would like to share with you the story of how we deployed summarization in the wild. It sounds like an easy thing, right? Say there is a customer who wants you to generate thousands of summaries — they have thousands of documents and want them summarized. You feed those documents into your ML pipeline, they get processed, the prompt is created, you feed it into OpenAI, and you send the results to production. That's the ideal situation. In a real scenario it doesn't work like that. OpenAI is quite slow, so you would like to generate the summaries in parallel — but that doesn't work either, because quite often there are errors or some minor outage, so some summaries are not processed and you need to retry. Sometimes one of the models is not working properly, so you need to switch to a different GPT model. And there are the rate limits I mentioned: you cannot feed all thousand requests to GPT in parallel because you would immediately hit the limits, so you have to feed them in as capacity frees up. And it gets even more complicated when a new customer comes along at the same time and wants more and more predictions to be generated.
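A minimal sketch of the retry behavior this implies — exponential backoff around a provider call, with `call_llm` standing in for the real API and all constants illustrative:

```python
import random
import time

def generate_with_retries(prompt: str, call_llm, max_attempts: int = 5) -> str:
    """Retry transient API errors and rate limits with exponential backoff.
    `call_llm(prompt)` stands in for the real provider call."""
    for attempt in range(max_attempts):
        try:
            return call_llm(prompt)
        except Exception:  # in practice, catch the provider's rate-limit/outage errors
            if attempt == max_attempts - 1:
                raise
            time.sleep(2 ** attempt + random.random())  # 1-2s, 2-3s, 4-5s, ...
```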
So that was summarization. I would like to briefly describe the problems we are working on now. Summarization is only one of the services we have. We summarize single documents, and we also summarize topics and other entities. We also have a real-time streaming service — it works a bit like ChatGPT, in that you communicate with it: say you have a long document and you want to extract pain points from it; the service streams these results into Productboard. We also have an embedding service that generates embeddings, and we use those embeddings for semantic search, for topics, and for other things. This is all nice, but then — how do you tell one system to be aware of the other initiatives? How do you tell the summarization service that it should wait for a while because something more important is happening in the streaming service, where a user is actively waiting for the prediction, while summaries don't have to arrive within one second because the user doesn't mind much? This is our current challenge: to prepare some middleware that will take the requests from all the services and feed them optimally into OpenAI and then into Productboard.
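A toy sketch of that middleware idea — a single shared priority queue so latency-sensitive streaming requests jump ahead of batch summaries. Service names and priorities are illustrative assumptions:

```python
import queue

# Lower number = higher priority: a user is actively waiting on streaming
# output, while batch summaries can tolerate some delay.
PRIORITY = {"streaming": 0, "embeddings": 1, "summarization": 2}

requests: queue.PriorityQueue = queue.PriorityQueue()
_seq = 0  # tie-breaker so same-priority requests stay first-in, first-out

def submit(service: str, payload: str) -> None:
    global _seq
    requests.put((PRIORITY[service], _seq, payload))
    _seq += 1

def worker(call_llm) -> None:
    """Single consumer draining the queue — one place to enforce rate limits."""
    while True:
        _, _, payload = requests.get()
        call_llm(payload)
        requests.task_done()
```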
That was it from me. Here I have a summary generated by GPT — I just pasted the deck into it: I've discussed summarization techniques using the GPT model, how to actually evaluate the quality of summaries, what other models we can use and their advantages and disadvantages, and how we actually deployed it into production. Thank you for your attention. And if the problem I was describing two minutes ago is something you would like to help us solve, we are hiring — let us know; we look forward to hearing from you. I think now is the time for questions.
Q: Do you have your own workforce to evaluate summaries when doing the supervised evaluation, or do you outsource? If you outsource, whom do you use and how satisfied are you with the provider?

A: We use our own workforce — we ask our product managers and our support team for help. It's not as if we were validating tens of thousands of summaries this way, so we are able to manage it on our own. I'll go to the next one.
Q: Which GPT model version are you using? Did you do a cost-benefit analysis?

A: We are using GPT-3.5 — there were multiple variants when it was launched and I'm not sure exactly which one, but we use it for the majority of our initiatives. We also have semantic search in production: you write some input and it finds feedback similar to what you wrote. For that we actually use GPT-4 — we use an embeddings model to generate the embeddings, but when searching across those embeddings we use GPT-4, because it helps us get the best results. I hope that answers the question, but let me know if you have a follow-up.
I'll just repeat the question: did we compare GPT-4 with GPT-3.5? We did compare them, and the performance of GPT-4 was better. But in the case of summarization, for example, the price — I'm not sure now, because I think the price changed on Tuesday — was something like 20 times higher for GPT-4 than for GPT-3.5. The predictions weren't 20 times better or anything like that, so we were completely fine with the current version.
Q: Do you think that in day-to-day work people would also benefit from enhanced prompting — I mean context, concrete requests?

A: I'm not sure I'm the right person to answer this, but I think that if we explained why we want things to happen, then probably yes.

Q (follow-up): Would people benefit from introducing a quality metric — something that flags when a prompt to a colleague doesn't satisfy the criteria and suggests rephrasing, like a Grammarly for prompt quality — or would that be too much?

A: Well... yes.
Q: For which languages are we providing the solution?

A: The majority of customers that use Productboard use it in English, but we have customers that use it only in French, Russian, maybe Czech. The solution we have works in all languages — the majority of these use cases work in all languages — however, the prediction we generate is always in English. So if you have some text you want summarized, say in French, we generate the summary in English. There are some initiatives in which we don't support other languages; sentiment analysis, for example, for which we use a pretrained model that is English-only.
Q: OpenAI models are dominant in the field due to their brand recognition and implementation. Is there any risk in relying on a single model provider? And do different models from different companies have advantages for certain use cases?

A: To the first question — definitely, yes. Take the outage that happened a few hours ago: we had no backup, and if that outage had lasted two days, we would kind of be in trouble, because we don't have a backup. So it is a risk. We have an idea in mind for how to solve it, but it hasn't been a priority for us. For OpenAI, we are quite a small customer — we're not, say, Notion, which might be using it quite heavily — and a long outage would be a huge problem for OpenAI itself, so I don't think it's very likely to happen. But we definitely have it in mind and are thinking about it.

To the second question — I would say yes, and not only models from different companies, but also open-source models. If you have a model trained, for example, to summarize conversations, it can perform similarly to OpenAI on conversation summarization. You cannot use that model to generate summaries for other things or other domains, but for conversations the results are good. So definitely, some other models have advantages for specific use cases.
Q: Have you tried to improve the prompt using meta-prompting — for instance, you have a summary score you are trying to optimize, include it in the prompt, and iteratively let the model converge to a better prompt?

A: No — it's a good question, but we haven't tried it.
I don't know how much time we have... OK.

Q: How do you manage to process data from your customers in a private and safe way, considering third-party large language models?

A: That was one of the main problems our customers had with large language models. They were not happy — OpenAI had some data leaks, for example — and they wanted to know what happens to their data, whether OpenAI trains on it. Currently we tell those customers what OpenAI states publicly: that they do not train their models on data submitted through the API. And for our pipelines — we have thousands of customers — we always process data from different customers in separate environments, so we don't mix customers' data.
Q: How do you evaluate the quality of document embeddings?

A: Good question — we are not doing it. I would say some of the approaches I mentioned would be applicable, but we are not evaluating it.

Maybe time for one last question.
Q: If paying for a service is not an option, what free LLM would you use for text summarization?

A: I would say it depends on the specific use case. If I needed to summarize books, or to summarize conversations, I would use some pretrained open-source model — I don't recall the name right now; I think I have some of them on my slides. If I wanted the model to be as general as OpenAI's, I'm not sure, and I don't have a recommendation in that field.

Cool — that's it from me. Thank you.