RAG for long context LLMs
Summary
TL;DR: The talk explores the evolving role of Retrieval-Augmented Generation (RAG) in the context of increasing context window sizes in language models. It discusses the phenomenon of 'context stuffing' and its limitations, particularly in retrieving and reasoning over multiple facts within a large context. The speaker presents experiments and analyses that highlight the challenges of retrieving information from the start of the context and suggests that RAG will continue to evolve, potentially moving towards document-centric approaches and incorporating more sophisticated reasoning mechanisms.
Takeaways
- Context windows for large language models (LLMs) are increasing, while the pre-training data for some proprietary models surpasses the 2 trillion token regime.
- The rise of larger context windows has sparked a debate on the relevance of retrieval-augmented generation (RAG) systems, questioning if they are still necessary.
- RAG involves reasoning and retrieval over chunks of information, typically documents, to ground responses to questions.
- Experiments show that as the context window grows, the ability to retrieve and reason about information (needles) decreases, especially for information at the start of the context.
- The phenomenon of decreased retrieval performance with larger context windows may be due to a recency bias, where the model favors recent tokens over older ones.
- There are concerns about the reliability of long-context LLMs for retrieval tasks, as they may not guarantee the quality of information retrieval.
- The future of RAG may involve less focus on precise chunking and more on document-centric approaches, using full documents or summaries for retrieval.
- New indexing methods like multi-representation indexing and hierarchical indexing with Raptor provide interesting alternatives for document-centric RAG systems.
- Iterative RAG systems, which include reasoning on top of retrieval and generation, are becoming more relevant as they provide a more cyclic and self-correcting approach.
- Techniques like question rewriting and web searches can be used to handle questions outside the scope of the retrieval index, offering a fallback for RAG systems.
- The evolution of RAG systems is expected to continue, incorporating long-context embeddings and cyclic flows for improved performance and adaptability.
Q & A
What is the main topic of Lance's talk at the San Francisco meetups?
-The main topic of Lance's talk is whether the Retrieval-Augmented Generation (RAG) approach is becoming obsolete due to the increasing context window sizes of large language models (LLMs).
How has the context window size for LLMs changed recently?
-The context window size for LLMs has been increasing, with state-of-the-art models now able to handle hundreds to thousands of pages of text, as opposed to just dozens of pages a year ago.
What is the significance of the 'multi-needle' test conducted by Lance and Greg Kamradt?
-The 'multi-needle' test is designed to pressure test the ability of LLMs to retrieve and reason about multiple facts from a larger context window, challenging the idea that LLMs can effectively replace RAG systems.
What did the analysis of GPT-4 with different numbers of needles placed in a 120,000 token context window reveal?
-The analysis revealed that the performance or the percentage of needles retrieved drops with respect to the number of needles, and it also gets worse if the model is asked to reason on those needles.
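A minimal sketch of how a multi-needle check like this can be wired up. This is illustrative only: `call_llm` is a placeholder for a real chat-model client (stubbed here with a canned answer), and the needles, haystack text, and placement percentage are example values, not the exact benchmark configuration.

```python
def insert_needles(haystack: str, needles: list[str], first_pct: float = 0.1) -> str:
    """Place the first needle at `first_pct` of the context and the remaining
    needles at roughly equal intervals through the rest of the context."""
    step = (1.0 - first_pct) / len(needles)
    positions = [first_pct + i * step for i in range(len(needles))]
    text = haystack
    # Insert from the back so earlier character offsets stay valid.
    for needle, pct in sorted(zip(needles, positions), key=lambda p: p[1], reverse=True):
        idx = int(len(haystack) * pct)
        text = text[:idx] + " " + needle + " " + text[idx:]
    return text

def grade(answer: str, expected: list[str]) -> dict:
    """Report which expected facts are missing from the model's answer."""
    missing = [fact for fact in expected if fact.lower() not in answer.lower()]
    return {"recall": 1 - len(missing) / len(expected), "missing": missing}

def call_llm(prompt: str) -> str:
    # Placeholder: swap in a real model call.
    return "The secret ingredients are figs, prosciutto, and goat cheese."

needles = ["figs", "prosciutto", "goat cheese"]
haystack = "..."  # e.g. concatenated essays trimmed to the target token budget
context = insert_needles(haystack, needles)
question = "What are the secret ingredients needed to build the perfect pizza?"
print(grade(call_llm(context + "\n\n" + question), needles))
```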
What is the 'recency bias' mentioned in the talk and how does it affect retrieval?
-The 'recency bias' refers to the tendency of models to focus on recent tokens, which makes retrieval of information from the beginning of the context window more difficult compared to information near the end.
What are the three main observations from the analysis of the 'multi-needle' test?
-The three main observations are: 1) Reasoning is harder than retrieval, 2) More needles make the task more difficult, and 3) Needles towards the start of the context are harder to retrieve than those towards the end.
What is the 'document-centric RAG' approach mentioned in the talk?
-The 'document-centric RAG' approach involves operating on the context of full documents rather than focusing on precise retrieval of document chunks. It uses methods like multi-representation indexing and hierarchical indexing to retrieve the right document for the LLM to generate a response.
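A minimal sketch of the multi-representation idea under these assumptions: `summarize`, `embed`, and `generate` are placeholders for an LLM summarization prompt, an embedding model, and a chat model. The point is that summaries are what get indexed, but the full document is what gets passed to the model.

```python
import numpy as np

def summarize(doc: str) -> str: ...      # placeholder: LLM summarization prompt
def embed(text: str) -> np.ndarray: ...  # placeholder: embedding model
def generate(prompt: str) -> str: ...    # placeholder: chat model

docs = {"doc-1": "full text of document 1 ...", "doc-2": "full text of document 2 ..."}

# Index one summary vector per document, keyed by the id of the full document.
summary_index = {doc_id: embed(summarize(text)) for doc_id, text in docs.items()}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def answer(question: str) -> str:
    q = embed(question)
    best_id = max(summary_index, key=lambda d: cosine(q, summary_index[d]))
    full_doc = docs[best_id]  # retrieve via the summary, generate over the full document
    return generate(f"Use this document to answer.\n\n{full_doc}\n\nQuestion: {question}")
```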
How does the 'Raptor' paper from Stanford propose to handle questions that require information integration across many documents?
-The 'Raptor' paper proposes a method where documents are embedded, clustered, and summarized recursively until a single high-level summary for the entire corpus of documents is produced. This summary is used in retrieval for questions that draw information across numerous documents.
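A rough sketch of that recursive build step, assuming placeholder `embed` and `summarize` helpers and using plain k-means for clustering; the paper itself uses soft clustering over dimensionality-reduced embeddings, so this is a simplification of the idea rather than the reference implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def embed(text: str) -> np.ndarray: ...           # placeholder: embedding model
def summarize(texts: list[str]) -> str: ...       # placeholder: LLM cluster summary

def build_raptor_index(leaves: list[str], branching: int = 5) -> list[str]:
    """Return the leaves plus every level of cluster summaries, up to a single root."""
    index, level = list(leaves), list(leaves)
    while len(level) > 1:
        k = max(1, len(level) // branching)
        vectors = np.stack([embed(t) for t in level])
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(vectors)
        # Summarize each cluster; the summaries become the next level of the tree.
        level = [summarize([t for t, lab in zip(level, labels) if lab == c]) for c in range(k)]
        index.extend(level)
    return index  # embed and index all of it, so retrieval can hit any level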
What is the 'self-RAG' paper and how does it change the RAG paradigm?
-The 'self-RAG' paper introduces a cyclic flow to the RAG paradigm, where the system grades the relevance of documents, rewrites the question if necessary, and iterates through retrieval and generation stages to improve accuracy and address errors.
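A simplified, framework-free sketch of that cyclic flow; each grader below is a placeholder for a small LLM grading prompt, and the iteration cap is one way to keep latency bounded. (The talk implements this kind of flow with LangGraph; this is just the control loop.)

```python
def retrieve(question: str) -> list[str]: ...                 # placeholder retriever
def grade_relevance(question: str, doc: str) -> bool: ...     # placeholder LLM grader
def rewrite_question(question: str) -> str: ...               # placeholder LLM rewriter
def generate(question: str, docs: list[str]) -> str: ...      # placeholder generator
def is_grounded(answer: str, docs: list[str]) -> bool: ...    # placeholder hallucination check

def self_correcting_rag(question: str, max_turns: int = 2) -> str:
    for _ in range(max_turns):                      # cap iterations to bound latency
        docs = [d for d in retrieve(question) if grade_relevance(question, d)]
        if not docs:                                # nothing relevant: rewrite and retry
            question = rewrite_question(question)
            continue
        answer = generate(question, docs)
        if is_grounded(answer, docs):               # only return well-grounded answers
            return answer
    return "I could not produce a well-grounded answer."
```

The graders are deliberately cheap prompts, so fast models can be used for them without adding much latency to the overall pipeline.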
How does the 'corrective RAG' approach handle questions that are outside the scope of the retriever's index?
-The 'corrective RAG' approach grades the documents and if they are not relevant, it performs a web search and returns the search results to the LM for final generation, providing a fallback mechanism for out-of-domain questions.
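A sketch of the corrective fallback, with `web_search` standing in for whatever search tool is wired up; only the control flow is the point here.

```python
def retrieve(question: str) -> list[str]: ...              # placeholder retriever
def is_relevant(question: str, doc: str) -> bool: ...      # placeholder LLM grader
def web_search(question: str) -> list[str]: ...            # placeholder search tool
def generate(question: str, context: list[str]) -> str: ...  # placeholder generator

def corrective_rag(question: str) -> str:
    docs = [d for d in retrieve(question) if is_relevant(question, d)]
    if not docs:                     # out of domain for the index: fall back to the web
        docs = web_search(question)
    return generate(question, docs)
```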
What are some key takeaways from Lance's talk regarding the future of RAG systems?
-Key takeaways include the continued relevance of routing and query analysis, the potential shift towards working with full documents, the use of innovative indexing methods like multi-representation and hierarchical indexing, and the integration of reasoning in the retrieval and generation stages to create more robust and cyclic RAG systems.
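For the routing and query-analysis piece specifically, a minimal sketch, assuming a placeholder `classify_llm` that labels the question and stub handlers for each backend (the text-to-SQL handler is only named, not implemented):

```python
def classify_llm(question: str) -> str: ...       # placeholder: returns "vectorstore" | "sql" | "web"
def query_vectorstore(question: str) -> str: ...  # placeholder: document-centric retrieval + generation
def text_to_sql(question: str) -> str: ...        # placeholder: query construction against a SQL DB
def search_web(question: str) -> str: ...         # placeholder: web-search fallback

def route(question: str) -> str:
    handlers = {"vectorstore": query_vectorstore, "sql": text_to_sql, "web": search_web}
    # Default to the document index if the classifier returns an unknown label.
    return handlers.get(classify_llm(question), query_vectorstore)(question)
```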
Outlines
Introduction to the Debate on the Relevance of RAG in Large Language Models
The speaker, Lance, introduces the topic of the debate surrounding the relevance of Retrieval-Augmented Generation (RAG) in the context of increasingly large language models (LLMs). He notes the growing size of context windows in LLMs and questions the need for a retrieval system when models can process thousands of pages. Lance discusses the phenomenon of 'context stuffing' and its implications for RAG, highlighting the importance of understanding the limitations and potential of current models in retrieving and reasoning over information.
Analysis of GPT-4's Performance in Needle Retrieval
Lance presents an analysis of GPT-4's performance in retrieving 'needles' (specific facts) from a larger context. He explains the methodology of placing needles at various intervals within the context and testing the model's ability to retrieve them. The results show a decrease in retrieval performance with an increase in the number of needles and a notable difficulty in retrieving needles placed earlier in the context. Lance also discusses the potential reasons for this phenomenon, such as recency bias, and shares insights from others in the field.
The Evolution of RAG and the Shift Towards Document-Centric Systems
The speaker discusses the evolution of RAG and the potential shift towards more document-centric systems. He questions the traditional approach of precise chunking and suggests that long context models may change the way we think about RAG. Lance introduces the idea of multi-representation indexing and the Raptor approach for document retrieval, emphasizing the importance of considering full documents and their summaries for efficient information retrieval.
Enhancing RAG with Iterative Reasoning and Adaptive Retrieval
Lance explores the concept of enhancing RAG with iterative reasoning and adaptive retrieval. He introduces the idea of self-RAG, which involves grading the relevance of documents and using this feedback to improve the generation process. The speaker also discusses the potential of using web searches as a fallback for questions outside the scope of the index, thus making RAG systems more robust and adaptable.
Future Directions for RAG and Large Context Models
In the concluding part, Lance outlines the future directions for RAG and the use of large context models. He emphasizes the continued relevance of query analysis, document-centric indexing, and iterative reasoning in enhancing RAG systems. The speaker also highlights the importance of balancing performance, accuracy, and latency, and suggests that we will likely see more cyclic and self-reflective RAG pipelines as we move towards more sophisticated language models.
Keywords
Context Windows
Retrieval-Augmented Generation (RAG)
Token
Needle in a Haystack Challenge
Recency Bias
Long Context LLMs
Multi-Needle Challenge
Reasoning
Latency
Document-Centric RAG
Multi-Representation Indexing
Highlights
Context windows are getting larger for LLMs, while pre-training token counts for proprietary models now exceed the 2 trillion token regime.
State-of-the-art models a year ago processed 4,000 to 8,000 tokens (dozens of pages), which has grown to hundreds or thousands of pages with recent models like Claude 3 and Gemini.
The phenomenon of larger context windows has sparked a debate on whether RAG (retrieval-augmented generation) is becoming obsolete.
RAG involves reasoning and retrieval over chunks of information, typically documents, to ground responses in the retrieved content.
Experiments were conducted with Greg Kamradt to pressure test the capabilities of LLMs in multi-needle scenarios, which mimic RAG use cases.
Results show that as the number of needles (facts) increases, the performance of retrieval drops, especially when reasoning is involved.
There is a tendency for models to have better retrieval for information closer to the end of the context window, indicating a potential recency bias.
The talk discusses the limitations of context stuffing in long-context LLMs, emphasizing that there are no retrieval guarantees.
The future of RAG may involve a shift from precise chunking to more document-centric approaches, using full documents or summaries for retrieval.
Multi-representation indexing is introduced as a method for document retrieval, using document summaries for indexing and retrieval, then passing full documents to the LM.
Raptor, a hierarchical document summarization and indexing approach, is presented as a solution for integrating information across many documents.
Self-RAG is a cyclic flow approach that involves grading document relevance and performing question rewriting or further iterations to improve accuracy.
C-RAG (Corrective RAG) is a method that uses web searches as a fallback when questions are outside the domain of the retriever.
The talk emphasizes the importance of query analysis, routing, and construction in RAG systems, regardless of the LLM context length.
The future of RAG is likely to see more cyclic flows and document-centric indexing, moving away from a naive prompt-response paradigm.
The discussion on recency bias highlights the need for careful consideration of information retrieval mechanisms in LLMs.
The talk concludes with the assertion that RAG is not dead but will evolve alongside improvements in long context LLMs.
Transcripts
hi this is Lance from LangChain this is
a talk I gave at two recent meetups in
San Francisco called is rag really dead
um and I figured since you know a lot of
people actually weren't able to make
those meetups I just record this and put
this on YouTube and and see if this is
of interest to folks um so we all kind
of recognize that context windows are
getting larger for llms so on the x-axis
you can see the tokens used in
pre-training that's of course you know
getting larger as well um proprietary
models are somewhere over the 2 trillion
token regime we don't quite know where
they sit uh and we go all the way down
to smaller models like Phi-2 trained on far
fewer
tokens um but what's really notable is
on the y axis you can see about a year
ago state of the art models were on the
order of 4,000 to 8,000 tokens and
that's you know dozens of pages um we
saw Claude 2 come out with a 200,000
token model earlier I think it was last
year um GPT-4 128,000 tokens now that's
hundreds of pages and now we're seeing
Claude 3 and Gemini come out with million
token models so this is hundreds to
thousands of pages so because of this
phenomenon people have been kind of
wondering is rag dead if you can stuff
you know many thousands of pages into
the context window of an LLM why do you
need a retrieval system um it's a good
question it sparked a lot of
interesting debate on Twitter um and
it's maybe first just kind of grounding
on what is rag so rag is really the
process of reasoning and retrieval over
chunks of of information that have been
retrieved um it's starting with you know
documents that are indexed um they're
retrievable through some mechanism
typically some kind of semantic similarity
search or keyword search other
mechanisms retrieved docs are then passed to an
llm and the llm reasons about them to
ground response to the question in the
retrieve document so that's kind of the
overall
flow but the important point to make is
that typically it's multiple documents
and involve some form of
reasoning so one of the questions I
asked recently is you know if long-context LLMs
can replace rag they should be able to
perform you know multi-fact retrieval and
reasoning from its own context really
effectively so I teamed up with Greg
Kamradt uh to kind of pressure test this
and he had done some really nice needle
in the haystack analyses already focused
on kind of single facts called needles
placed in a haystack of Paul Graham
essays um so I kind of extended that to
kind of mirror the rag use case or kind
of the rag context uh where I took
multiple facts so I call it multi-
needle um I built on a funny needle in
the haystack challenge published by Anthropic
where they basically placed
pizza ingredients in the context uh and
asked the LLM to retrieve this
combination of pizza ingredients I
kind of riffed on that and I basically
split the pizza ingredients up into
three different needles and placed those
three ingredients different places in
the context and then ask the LM to
recover those three ingredients um from
the context so again the setup is the
question is whether the secret
ingredients need to build a perfect
Pizza the needles are the ingredients
figs prosciutto goat cheese um I place them in
the context at some specified intervals
so the way this test works is you can
basically set the percent of context you
want to place the first needle and the
remaining two are placed at roughly
equal intervals in the remaining context
after the first so that's kind of the
way the test is set up now it's all open
source by the way the link is below so
needs are placed um you ask a question
you prompt LM with with kind of um with
this context in the question and then
produces the answer and now the the
framework will grade the response both
one are you know all are all the the
specified ingredients present in the
answer and two if not which ones are
missing so I ran analysis on this with
GPT-4 and kind of came up with some
with some fun results um so you can see
on the left here what this is looking at
is different numbers of needles placed
in 120,000 token context window for GPT-4
and I'm asking um GPT-4 to retrieve
either one three or 10 needles now I'm
also asking it to do reasoning on those
needles that's what you can see in those
red bars so green is just retrieved the
ingredients red is reasoning and the
reasoning challenge here is just return
the first letter of each ingredient so
we find is basically two things the
performance or the percentage of needles
retrieved drops with respect to the
number of needles that's kind of
intuitive you place more facts
performance gets worse but also it gets
worse if you ask it to reason so if you
say um just return the needles it does a
little bit better than if you say return
the needles and tell me the first letter
so you overlay reasoning so this is the
first observation more facts is harder uh and
reasoning is harder uh than just
retrieval now the second question we ask
is where are these needles actually
present in the context that we're
missing right so we know for example um
retrieval of um 10 needles is around 60%
so where are the missing needles in the
context so on the right you can see
results telling us actually which
specific needles uh it our the model
fails to retrieve so we can see is as
you go from 1,000 tokens up to 120,000
tokens on the X here and you look at
needle one placed at the start of the
document to needle 10 placed at the end
at a 1,000 token context length you can
retrieve them all so again kind of matches
what we see over here small well
actually sorry over here everything I'm
looking at 120,000 tokens so that's
really not the point uh the point is
actually smaller context uh better
retrieval so that's kind of point one um
as I increase the context window I
actually see that uh there is increased
failure to retrieve needles which you
see can see in red here towards the
start of the
document um and so this is an
interesting result um and it actually
matches what Greg saw with single needle
case as well so the way to think about
it is it appears that um you know if you
for example read a book and I asked you
a question about the first chapter you
might have forgotten it same kind of
phenomenon appears to happen here with
retrieval where needles towards the
start of the context are are kind of
Forgotten or are not well retrieved
relative to those of the end so this is
an effect we see with GPT-4 it's been
reproduced quite a bit so I ran nine
different trials here Greg's also seen
this repeatedly with single needle so it
seems like a pretty consistent
result and there's an interesting point
I put this on Twitter and a number of
folks um you know replied and someone
sent me this paper which is pretty
interesting and it mentions recency bias
is one possible reason so the most
informative tokens for predicting the
next token uh you know are are are
present close to or recent to kind of
where you're doing your generation and
so there's a bias to attend to recent
tokens which is obviously not great for
the retrieval problem as we saw here so
again the results show us that um
reasoning is a bit harder than retrieval
more needles is more difficult and
needles towards the start of your
context are harder to retrieve than
towards the end those are three main
observations from this and it may be
indeed due to this recency bias so
overall what this kind of tells you is
be wary of just context stuffing in
large long context LMS there are no
retrieval
guarantees and also there's some recent
results that came out actually just
today suggesting that single needle may
be misleadingly easy um you know there's
no reason
it's retrieving a single needle um and
also these guys I'm I show this tweet
here showed that um the in a lot of
these needle in a haystack challenges
including mine the facts that we look
for are very different than um the
background kind of haystack of Paul
Graham essays and so that may be kind of
an interesting artifact they note that
indeed if the needle is more subtle
retrieval is worse so I think basically
when you see really strong performing
needle in a haystack analyses put up by
model providers you should be skeptical
um you shouldn't necessarily assume that
you're going to get high quality
retrieval from these long-context LLMs uh
for numerous reasons you need to think
about retrieval of multiple facts um you
need to think about reasoning on top of
retrieval you need to think about the
subtlety of the retrieval relative to
the background context because for many
of these needle in the haystack challenges
it's a single needle no reasoning and
the needle itself is very different from
the background so anyway those may all
make the challenge a bit easier than a
real world scenario of fact retrieval so
I just want to like kind of lay out that
those cautionary notes but you know I
think it is fair to say this will
certainly get better and I think it's
also fair to say that rag will change
and this is just like a nearly not a
great joke but Frank Zappa a musician made
the point Jazz isn't dead it just smells
funny you know I think same for rag rag
is not dead but it will change I think
that's like kind of the key Point here
um so just as a followup on that rag
today is focus on precise retrieval of
relevant doc chunks so it's very focused
on typically taking documents chunking
them in some particular way often using
very idiosyncratic chunking methods
things like chunk size are kind of
picked almost arbitrarily embedding them
storing them in an index taking a
question embedding it doing KNN uh
similarity search to retrieve relevant
chunks you're often setting a k
parameter which is the number of chunks
you retrieve you often will do some kind
of filtering or processing on the
retrieved chunks and then ground your
answer in those retrieved chunks so it's
very focused on precise retrieval of
just the right chunks now in a world
where you have very long context models
I think there's the a fair question to
ask is is this really kind of the most
reasonable approach so kind of on the
left here you can kind of see this
notion closer to today of I need the
exact relevant chunk you can risk over
engineering you can have you know higher
complexity sensitivity to these odd
parameters like chunk size k um and you
can indeed suffer lower recall because
you're really only picking very precise
chunks you're beholden to very
particular embedding models so you know
I think going forward as long-context
models get better and better there are
definitely question you should certainly
question the current kind of very
precise chunking rag Paradigm but on the
flip side I think just throwing all your
docs into context probably will also
not be the preferred approach you'll
suffer higher latency higher token usage
I should note that today 100,000 token
GPT-4 is like $1 per generation I spent
a lot of money on LangChain's account
uh on that multi needle analysis I don't
want to tell Harrison how much I spent
uh so it's it's you know it's not good
right um You Can't audit retrieval um
and security and and authentication are
issues if for example you need different
users different different access to
certain kind of retrieved documents or
chunks in the context stuffing case you
kind of can't do security as easily so
there's probably some Pareto optimal
regime kind of here in the middle and um
you know I I put this out on Twitter I
think there's some reasonable points
raised I think you know this inclusion
at the document level is probably pretty
sane documents are self-contained chunks
of context um so you know what about
document Centric rag so no chunking uh
but just like operate on the context of
full documents so you know if you think
forward to the rag Paradigm that's
document Centric you still have the
problem of taking an input question
routing it to the right document um this
doesn't change so I think a lot of
methods that we think about for kind of
query analysis um taking an input
question rewriting it in a certain way
to optimize retrieval things like
routing taking a question routing it to
the right database be it a relational
database graph database Vector store um
and query construction methods so for
example text to SQL text to Cypher for
graphs um or text to even like metadata
filters for for Vector stores those are
all still relevant in the world that you
have long-context LLMs um you're probably
not going to dump your entire SQL DB and
feed that to the llm you're still going
to have SQL queries you're still going
to have graph queries um you may be more
permissive with what you extract but it
still is very reasonable to store the
majority of your structured data in
these in these forms likewise with
unstructured data like documents like we
said before it still probably makes
sense to you know store
independently but just simply aim to
retrieve full documents rather than
worrying about these idiosyncratic
parameters like like chunk size um and
along those lines there's a lot of
methods out there we've we've done a few
of these that are kind of well optimized
for document retrieval so one I want to
flag is what we call
multi-representation indexing and
there's actually a really nice paper on
this called dense X retriever or
proposition indexing but the main point
is simply this what you do is you take
your raw document you produce a
representation like a summary of that
document you index that summary right
and then um at retrieval time you ask
your question you embed your question
and you simply use a high-level summary to
just retrieve the right document you
pass the full document to the LM for uh
kind of final generation so it's kind of
a nice trick where you don't have to
worry about embedding full documents in
this particular case you can use kind of
very nice descriptive summarization
prompts to build descriptive summaries
and the problem you're solving here is
just get me the right document it's an
easier problem than get me the right
chunk so this is kind of a nice approach
it there's also different variants of it
which I share below one is called parent
document retriever where you could use
in principle if you wanted smaller
chunks but then just return full
documents but anyway the point is
preserving full documents for Generation
but using representations like summaries
or chunks for retrieval so that's kind
of like approach one that I think is
really interesting approach two is this
idea of raptor is a cool paper came out
of Stanford somewhere recently and this
solves the problem of what if for
certain questions I need to integrate
information across many documents so
what this approach does is it takes
documents and it it embeds them and
clusters them and then it summarizes
each cluster um and it does this
recursively until you end up with only one
very high level summary for the entire
Corpus of documents and what they do is
they take this kind of this abstraction
hierarchy so to speak of different
document summarizations and they just
index all of it and they use this in
retrieval and so basically if you have a
question that draws an information
across numerous documents you probably
have a summary present and and indexed
that kind of has that answer captured so
it's a nice trick to consolidate
information across documents um the
paper actually reports you know
these documents in their case or the
leaves are actually document chunks or
slices but I actually showed I have a
video on it in a notebook that this
works across full documents as well um
and that's a nice segue into to do this
you do need to think about long context
embedding models because you're
embedding full documents and that's a
really interesting thing to track um the
you know Hazy Research uh put out a
really nice um uh blog post on this
using the Monarch Mixer so it's
kind of a new architecture that extends to
longer context they have a 32,000 token
embedding model that's
available on Together AI absolutely
worth experimenting with I think this is
really interesting Trend so long long
Contex embeddings kind of play really
well with this kind of idea you take
full documents embed them using for
example long-context embedding models and
you can kind of build these document
summarization trees um really
effectively so I think this another nice
trick for working with full documents in
the long context kind of llm regime um
one other thing I'll note I think
there's also going to be a move away
from kind of single shot rag well
today's rag we typically you know we
chunk documents uh embed them store
them in an index you know do retrieval
and then do generation but there's no
reason why you shouldn't kind of do
reasoning on top of the generation or
reasoning on top of the retrieval and
feed back if there are errors so there's
a really nice paper called Self-RAG um
that kind of reports this we implemented
this using LangGraph works really well
and the idea is simply to you
know grade the relevance of your
documents relative to your question
first if they're not relevant you
rewrite the question you can do you can
do many things in this case we do
question rewriting and try again um we
also grade for hallucinations we grade
for answer relevance but anyway it kind
of moves rag from a single shot Paradigm
to a kind of a cyclic flow uh in which
you actually do various gradings
Downstream and this is all relevant in
the long context llm regime as well in
fact you know it you you absolutely
should take advantage of of for example
increasingly fast and performant LLMs to
do these gradings
um frameworks like LangGraph allow you to
build these kind of flows which
allows you to kind of have a
more performant uh kind of kind of
self-reflective rag pipeline now I did
get a lot of questions about latency
here and I completely agree there's a
trade-off between kind of performance
accuracy and latency that's present here
I think the real answer is you can opt
to use very fast uh for example models
like Groq we're seeing um you know GPT-3.5
Turbo is very fast these are fairly
easy grading challenges so you can use
very very fast LMS to do the grading and
for example um you you can also restrict
this to only do one turn of of kind of
cyclic iteration so you can kind of
restrict the latency in that way as well
so anyway I think it's a really cool
approach still relevant in the world as
we move towards longer context so it's
kind of like building reasoning on top
of rag um in the uh generation and
retrieval stages and a related point one
of the challenges with rag
is that your index for example you
may have a question that asks
something that's outside the scope of
your index and this is kind of always a
problem so a really cool paper called
CRAG or corrective rag came out you
know a couple months ago that basically
does a grading just like we talked about
before and then if the documents are not
relevant you kick off and do a web
search and basically return the search
results to the LM for final generation
so it's a nice fallback in cases where
um you're you the questions out of the
domain of your retriever so you know
again nice trick overlay reasoning on top
of rag I think this trend you know
continues um because you know it it just
it makes rag systems you know more
performant uh and less brittle to
questions that are out of domain so you
know that's another kind of nice idea
this particular approach also we showed
works really well with with uh with open
source models so I ran this with Mistral 7B
it can run locally on my laptop using
Ollama so again really nice approach I
encourage you to look into this um and
this is all kind of independent of the
llm kind of context length these are
reasoning you can add on top of the
retrieval stage that that can kind of
improve overall performance and so the
overall picture kind of looks like this
where you know I think that the the the
problem of routing your question to the
right database and or to the right
document kind of remains in place query
analysis is still quite relevant routing
is still relevant query construction is
still relevant
um in the long context regime I think
there is less of an emphasis on document
chunking working with full documents is
probably kind of more Pareto optimal so to
speak um there's some clever tricks
for indexing of documents like the
multi-representation indexing we talked
about the hierarchical indexing using
Raptor that we talked about as well are
two interesting ideas for document
Centric indexing um and then kind of
reasoning in generation post retrieval on
retrieval itself to grade on the
generations themselves checking for
hallucinations those are all kind of
interesting and relevant parts of a rag
system that I think we'll probably will
see more and more of as we move more
away from like a more naive prompt
response Paradigm more to like a flow
Paradigm we're seeing that actually
already in code generation it's probably
going to carry over to rag as well where
we kind of build rag systems that have
kind of a cyclic flow to them operate on
documents use long-context LLMs um and still
use kind of routing and query analysis
so reasoning pre-retrieval reasoning
post-retrieval so anyway that was kind
of my talk um and yeah feel free to
leave any comments on the video and I'll
try to answer any questions but um yeah
that's that's probably about it thank
you