Building Corrective RAG from scratch with open-source, local LLMs
Summary
TLDR: The transcript discusses building self-reflective retrieval-augmented generation (RAG) applications using open-source, local models. It highlights the concept of self-reflection in RAG, where the system grades the relevance of retrieved documents and performs knowledge refinement. The speaker introduces LangGraph, a tool for implementing these ideas locally, and demonstrates its use with a local LLM served by Ollama and a CPU-optimized embedding model. The process involves creating a local index, grading documents, and using conditional logic to decide whether to perform web searches or generate responses. The transcript emphasizes the potential of logical flows and local models for complex reasoning tasks without the need for an agent.
Takeaways
- 🌟 The concept of self-reflection in RAG (Retrieval-Augmented Generation) is gaining popularity, allowing for more dynamic and relevant information retrieval and generation based on feedback loops.
- 📚 The 'Corrective RAG' (C-RAG) paper demonstrates a straightforward approach to self-reflection: grading retrieved documents and refining knowledge based on their relevance and correctness.
- 💡 Implementing self-reflective RAG apps can be achieved using open-source and local models, which can run efficiently on a laptop without the need for large-scale, API-gated models.
- 🛠️ The LangChain team has developed a tool called LangGraph which facilitates the implementation of self-reflective RAG using local LLMs (large language models).
- 🔍 For local information retrieval, the GPT4All embeddings from Nomic are suggested for their CPU optimization and effectiveness.
- 📈 The process of building a RAG app involves creating a graph of logical steps, where each node represents a specific operation or function, and the state is propagated through these steps.
- 🔗 The use of Ollama with the Mistral 7B model is highlighted for its ability to run capable models locally and its support for JSON mode, which structures the model's output for easy interpretation and flow control.
- 🔄 The concept of logical gates is introduced, where the output from one step (e.g., document grading) determines the next step in the process (e.g., appending relevant documents or performing a web search).
- 🔍 The demonstration showcases a multi-step logical flow in action, including retrieval, grading, web search, question transformation, and generation, all running locally and seamlessly integrated.
- 🚀 The potential of using local models in a constrained, step-by-step manner is emphasized over using them as agent executors, which can lead to more reliable and effective logical reasoning tasks.
Q & A
What is the main focus of the LangChain team's discussion?
-The main focus of the LangChain team's discussion is building self-reflective RAG (retrieval-augmented generation) applications from scratch, using only open-source and local models that run strictly on a laptop.
What is the significance of self-reflection in RAG research?
-Self-reflection in RAG research is significant because it allows the system to perform retrieval based on a question from an index, assess the relevance or quality of the retrieved documents, and perform reasoning to potentially retry various steps, leading to more accurate and refined outputs.
How does the concept of self-reflection improve the RAG process?
-Self-reflection improves the RAG process by allowing the system not just to perform single-shot retrieval and generation but also to self-reflect, reason, and retry steps from alternative sources, leading to enhanced accuracy and relevance in the final output.
What is the role of local LLMs in the discussed approach?
-Local LLMs play a crucial role in the discussed approach: they are smaller, more manageable models that run locally on a system, allowing for efficient and fast processing without relying on API-gated, large-scale models.
How does the 'Corrective RAG' paper contribute to the self-reflection idea?
-The 'Corrective RAG' paper contributes to the self-reflection idea by demonstrating a method where the system performs retrieval, grades the documents for relevance, refines knowledge when documents are correct, and performs a web search to supplement retrieval when documents are ambiguous or incorrect.
What is the benefit of using open-source tools like Ollama and LangGraph for local model implementation?
-Open-source tools like Ollama and LangGraph provide an easy, efficient, and seamless way to run models locally, letting users leverage powerful machine-learning capabilities without extensive infrastructure or API access.
How does the use of GPT4All embeddings from Nomic enhance the local indexing process?
-The GPT4All embeddings from Nomic enhance the local indexing process by providing a CPU-optimized, contrastively trained embedding model that works well locally, ensuring fast and efficient document indexing without relying on external APIs or cloud services.
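As a minimal sketch (assuming the `langchain-community` and `gpt4all` packages; import paths can vary by version), loading the model and embedding a query might look like:

```python
# Sketch: load the CPU-optimized GPT4All embedding model and embed a query.
from langchain_community.embeddings import GPT4AllEmbeddings

embeddings = GPT4AllEmbeddings()  # downloads a small local model on first use
vector = embeddings.embed_query("How does agent memory work?")
print(len(vector))  # dimensionality of the resulting embedding
```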
What is the purpose of the conditional edge in the logical flow of the RAG process?
-The conditional edge makes decisions based on the output of certain nodes, such as the grading step, to determine the next course of action: whether to append a relevant document or perform a web search to supplement the retrieval.
How does the JSON mode in Ollama help in constraining the output of the local LLM?
-JSON mode in Ollama constrains the output of the local LLM by enforcing a specific output format, such as a binary yes/no score in JSON, which makes it easier to interpret and process the model's output within the logical flow of the RAG application.
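A minimal sketch of enabling JSON mode through LangChain's `ChatOllama` wrapper (the model tag is an assumption):

```python
# Sketch: format="json" asks Ollama to constrain output to valid JSON.
from langchain_community.chat_models import ChatOllama

llm = ChatOllama(model="mistral:instruct", format="json", temperature=0)
msg = llm.invoke(
    "Is the sky blue? Answer as JSON with a single key 'score', yes or no."
)
print(msg.content)  # e.g. '{"score": "yes"}'
```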
What is the key takeaway from the discussion on building logical flows using local models and LangGraph?
-The key takeaway is that building logical flows with local models and LangGraph allows for reliable and efficient RAG applications by breaking the process into a series of logical steps, each performed by the local model, without the need for a complex agent executor. This approach enhances the reliability and manageability of the system.
Outlines
🚀 Introduction to Self-Reflective RAG Apps
Lance from the LangChain team introduces the concept of building self-reflective RAG apps from scratch using open-source and local models. He discusses the trend of self-reflection in RAG research, where the system performs retrieval based on a question, grades the relevance of documents, and refines its process based on the quality of generations. Lance highlights the importance of feedback and retry mechanisms in self-reflective RAG, and introduces the 'Corrective RAG' paper as an example of this approach. He also mentions the use of LangGraph for implementing these ideas effectively with smaller local models.
🛠️ Setting Up Local LLMs with Ollama
Lance explains how to set up local LLMs using Ollama, a tool that allows for easy model deployment on various platforms. He walks through downloading the Ollama application, selecting a model from the model list, and preparing the environment. Lance chooses the Mistral instruct model, a 7-billion-parameter model, and demonstrates how to pull it locally with `ollama pull`. He also discusses the use of Nomic embeddings for local retrieval and the creation of a local index for performing RAG on a specific blog post.
🔍 Building a Retrieval and Grading System
In this section, Lance details the process of building a retrieval and grading system for the RAG app. He uses GPT4All embeddings and Chroma, a local vector store, to create a retriever. Lance demonstrates how to retrieve relevant documents for a query and how to grade those documents using the local LLM. He emphasizes the use of Ollama's JSON mode for structuring the model's output to facilitate downstream processing in the graph.
📈 Defining the Logical Flow of the RAG Graph
Lance outlines the logical flow of the RAG graph, explaining how state is transformed at each node. He describes the state as a dictionary containing keys relevant to RAG, such as the question, the appended documents, and the eventual generation. He also discusses the conditional edge in the graph that decides the next step based on the grading results. Lance highlights the convenience of Ollama's JSON mode in enforcing structured output for logical reasoning in the graph.
🧠 Implementing Functions for Each Node and Conditional Edge
Lance provides a walkthrough of implementing functions for each node and conditional edge in the RAG graph. He explains how each node modifies the state and how functions are defined for retrieval, grading, query transformation, web search, and generation. He demonstrates how the grading function filters relevant documents and triggers a web search when necessary. Lance also shows how the 'decide to generate' function acts as a conditional edge, determining the next node to traverse based on the search flag.
🎯 Testing the RAG App with Different Queries
Lance tests the RAG app with two queries: one relevant to the indexed context and one not. He shows how the app performs retrieval, grading, and generation for a question about agent memory, which is successfully answered from the blog-post index. For a question about AlphaCodium, which is not in the context, the app correctly identifies the irrelevance and performs a web search to supplement the answer. Lance emphasizes the reliability of using local models for logical reasoning tasks and the effectiveness of constraining the model to a specific task at each step of the graph.
📚 Conclusion and Encouragement for Local LLM Usage
Lance concludes by encouraging the use of local models for complex logical-reasoning tasks. He suggests that for certain problems, a state machine or a graph with a series of logical steps may be more effective than an agent. He highlights the benefits of constraining the local model to small tasks at each step, which he finds more reliable for logical reasoning. Lance notes that the code for the RAG app will be shared and encourages others to experiment with this approach.
Keywords
💡self-reflective
💡RAG (Retrieval-Augmented Generation)
💡open source
💡local models
💡LangChain
💡knowledge refinement
💡web search
💡query rewrite
💡logical flow
💡state machine
Highlights
Building self-reflective RAG (Retrieval-Augmented Generation) apps from scratch using open source and local models.
Utilizing recent trends in self-reflection within RAG research to improve the quality of document retrieval and generation.
Implementing the idea of self-reflection in RAG by performing retrieval, grading documents, and potentially retrying steps based on relevance and quality.
The introduction of the corrective RAG (C-RAG) paper, which has gained attention and presents a straightforward approach to enhancing RAG.
Using LangGraph, a recently developed tool that works well with smaller, local LLMs, as an alternative to relying on large-scale, API-gated models.
The process of running LLMs locally, with a focus on Ollama as a simple and efficient way to run models on personal devices.
Downloading and using the Mistral open-source model as a demonstration of how to work with local models for RAG applications.
Creating a local index for RAG using a blog post on autonomous agents and splitting it into chunks for efficient retrieval.
Employing GPT4All embeddings from Nomic, a CPU-optimized embedding model that runs locally without the need for an API.
Using Chroma, an open-source local vector store, to facilitate efficient document retrieval and indexing.
Defining a logical flow for RAG that involves a series of steps including retrieval, grading, decision-making, query transformation, web search, and generation.
The use of conditional edges in the logical flow graph, which allows for dynamic decision-making based on the output of previous steps.
Ollama's JSON mode is highlighted as a crucial tool for structuring model output so that it can be reliably interpreted by subsequent steps in the logical flow.
A detailed example of how to build a RAG app using local models, including defining graph states, implementing functions for each node, and connecting nodes through edges.
The demonstration of a multi-step logical flow working effectively with local models, showing the potential for reliable and efficient RAG applications without the need for large-scale models.
The encouragement to consider the use of state machines or graph-based logical flows instead of agent-based executors for certain tasks, as it can be more reliable and manageable.
Transcripts
Hi, this is Lance from the LangChain team. I'm going to talk about building self-reflective RAG apps from scratch, using only open-source and local models that run strictly on my laptop.

One of the most interesting trends in RAG research, and in a lot of methods that have become pretty popular in recent months and weeks, is this idea of self-reflection. When you do RAG, you perform retrieval based upon a question from an index, and this idea of self-reflection says: based upon, for example, the relevance of the retrieved documents to my question, or the quality of the generations relative to my question or to the documents, I want to perform some kind of reasoning and potentially feed back and retry various steps. That's the big idea, and there are a few really interesting papers that implement it. What I want to show is that implementing these ideas using something we've developed recently called LangGraph is a really nice approach, and it works really well with local LLMs that are much smaller than, for example, API-gated, very large-scale foundation models.

We're going to look at a particular paper called Corrective RAG, or C-RAG. There's been some attention, for example on Twitter, about this work; it's a really neat paper, and the idea is actually pretty simple and straightforward. If you go down to the figure here: you perform retrieval, and you grade the documents relative to the question, so you're doing a relevance grading. Then there's a heuristic: if the documents are deemed correct, they do some knowledge refinement, where they further strip the documents to compress the relevant chunks within them and retain those. If the documents are deemed either ambiguous relative to the query or incorrect, the system performs a web search and supplements retrieval with the web search results. It's a nice illustration of the general principle: don't just do RAG as a single-shot process where you perform retrieval and then go to generation. You can actually perform self-reflection and reasoning, you can retry, and you can retrieve from alternative sources, and so forth. That's kind of the big idea.
Now, in our build here we're going to make some minor simplifications. Here's a layout of the graph we're interested in: we're going to perform retrieval, and for that we're going to use Nomic embeddings, which run locally. We're going to build a node for grading those documents relative to the question, to say whether they're relevant or not, and if any documents are deemed irrelevant, we'll go ahead and do a query rewrite and web search, and then go to generation based upon the web search results. So that's the flow.
Now, first things first: how do I get started running LLMs locally, and where do I go? Where I often direct people, and what I've found to be really useful, is Ollama. It's a really nice way to run models locally, for example on your Mac laptop, very easily, and they're launching support for various other platforms as well. If you go to their website, it's very simple: you download their application (you can see it's running here on my machine), and once you have it downloaded, all you need to do is go to their model list and search around. I think it's sorted by popularity, so you can see Mistral, obviously a really interesting open-source model, near the top; it has something like 210,000 pulls, one of the top models. If I click on it, it takes me to a model page, and the tags tab shows a bunch of model versions that I can very easily download and run; we'll show how to do that shortly.

What I'm going to do is choose Mistral instruct, their 7-billion-parameter instruct model. So I'll go over to my notebook. I have an empty notebook; all I've done is a few pip installs, and I've also set a few environment variables to use LangSmith (we'll see why that's useful later). That's really all I've done. Now I'm going to put a note here for Ollama, and I'm going to run `ollama pull` for the model I want. Normally this takes a little while, because you're actually pulling the model and it's typically a couple of gigabytes; I already have this model, so it's faster, and in fact it's already done. That's really all you do. So that's step one. Then I'm going to create a variable, `local_llm`, and define it as `mistral:instruct`, because that's the model I downloaded with `ollama pull mistral:instruct`. That's all that's going on here: this is the LLM I'm going to work with. I've pulled it, so it's local on my system, available via Ollama, which is running in the background, and you can see it's really seamless and easy to use.
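As a rough sketch of this setup step (assuming Ollama is installed and the `mistral:instruct` tag, with imports from `langchain-community`):

```python
# Pull the model once from a terminal (a couple of gigabytes on first pull):
#   ollama pull mistral:instruct
from langchain_community.chat_models import ChatOllama

local_llm = "mistral:instruct"  # tag assumed to match the model pulled above

# Plain (non-JSON) mode, used for generation and query-rewriting steps.
llm = ChatOllama(model=local_llm, temperature=0)
print(llm.invoke("Reply with one short sentence.").content)
```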
Now, the first thing I want to do for this approach is what I'll call the index. Because this is a corrective RAG approach, I need an index that I care about, that I'm actually performing RAG on. Here I'm going to use a particular blog post that I like, on agents; we can pull it up and have a look. It's a pretty neat blog post on autonomous agents, and it's pretty long and meaty, so it's a good target for performing retrieval on: lots of details, really detailed. What I'm going to do is load it, split it, and use a chunk size of 500 tokens. These are somewhat arbitrary parameters; you can play with them as you want. The point is that I'm just building a quick local index: I load the post and split it into chunks.

Now, this is the interesting bit: I'm going to use the GPT4All embeddings from Nomic. Let's pull up the link; you can see right here that it's a CPU-optimized, contrastively trained model, basically a Sentence-BERT-style model. You can drill into Sentence Transformers and see that the initial work is described in the Sentence-BERT paper. The key point is that this is a locally running, CPU-optimized embedding model that I've found works quite well. It runs on your system, no API, nothing, and it runs fast, so we're going to go ahead and use that, from our friends at Nomic. I'm also going to use Chroma, an open-source local vector store that's really easy to spin up and runs locally. All I'm doing is taking my documents, defining a new collection with my embedding model (the GPT4All embeddings), and creating a retriever from it. There we go; it shows some parameters. Cool, I have a retriever, so we can call get relevant documents and ask something like "agent memory", just as a test. And okay, cool, look at that: it's nice and quick, and we get a bunch of documents out that relate to memory. You can see the memory stream; the documents look sane, so it looks like everything's working. Great, we have a retriever.
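A sketch of the index build might look like the following; the blog-post URL (Lilian Weng's autonomous-agents post) and the exact splitter settings are assumptions based on the walkthrough:

```python
# Sketch: load a blog post, split it into ~500-token chunks, embed with
# GPT4All embeddings from Nomic, and index into a local Chroma collection.
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.embeddings import GPT4AllEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter

docs = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/").load()

splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=500, chunk_overlap=100  # somewhat arbitrary; tune as needed
)
doc_splits = splitter.split_documents(docs)

vectorstore = Chroma.from_documents(
    documents=doc_splits,
    collection_name="rag-chroma",
    embedding=GPT4AllEmbeddings(),
)
retriever = vectorstore.as_retriever()

retriever.get_relevant_documents("agent memory")  # quick sanity check
```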
Now let's think a little bit about what we want to do next. When I build these kinds of logical RAG flows as graphs, I always try to lay out the logic first. In each logical step, what's happening is that I'm transforming state. In these graphs, really all you're doing is defining a state that you modify throughout the flow of the graph. In this case, because we're interested in RAG, our state is just going to be a dictionary, and that dictionary (I've laid it out schematically here) contains a few keys relevant to RAG: there's a question, then you append documents to your dict, and eventually your generation. That's really all that's going on in terms of how your state is propagated through the graph, and at every node you make some modification to the state; that's the key point. So you start with a question from the user, and you perform retrieval relevant to the question. You then grade the documents, which is a modification of the documents, and make a decision: are they relevant or not? If they're not relevant, you transform the query (modifying the question) and do a web search. The final step is a generation based upon the final documents. So that's your flow.
Now, what I want to call out here is that there's one very important edge, what we call a conditional edge, where depending upon the results of the grading step I want to do one thing or another; I'm making a decision. So I want to show you something very convenient that we can use with Ollama to help us here, and I'll make a note of what I'm going to highlight: this is Ollama's JSON mode.

The basic logic behind that conditional edge, "decide to generate", is going to be something like this. I already have the prompt laid out, but basically I'm going to take a document and my question and do some kind of comparison to say: is the document relevant to the question? That's really all I want to do. But here's the catch: because I want that edge to process a very particular output, either yes or no, I want to make sure my output is structured in a way that can reliably be interpreted downstream in my graph. This is where JSON mode from Ollama is really useful. You can see all I do is import ChatOllama, which references the local model I specified up here, Mistral instruct, which I've downloaded, so I have the model locally, and I'm setting the flag `format="json"` to tell the model to output JSON specifically. In my prompt, I'm basically saying: you're a grader; here's the document; here's the question; and here's the catch: give a binary score, yes or no, and provide it as JSON with a single key, "score", and no preamble or explanation. So I explain in the prompt what I want, and when I call this with JSON mode, it will enforce that JSON is returned, hopefully with the single key we expect, "score", and a binary yes/no value. I'm going to run that as a chain: I supply the prompt to my LLM and then parse the JSON string out into a JSON object that I can work with.

So let's try that. We run the chain we defined: we run retrieval with a question, get our docs, and grade one of them, basically passing the question and one document (taking the page content from the document, which is essentially all its text). Let's test that quickly... it's still running... now it's finished; let's check the output. We get JSON back, which is just the score, yes or no. That's exactly right; that's what we want. And we can actually look under the hood in LangSmith at that grading process. We can see that our prompt got populated with the context: here is the document, here was the question, and the task, of course, was to grade it. So we can see the full prompt, "you're a grader assessing the relevance of a retrieved document", here's the document, and then the model output: score, yes. This is really nice: we've enforced the output format from our local LLM using JSON mode, so we know that every time it's going to output a binary yes/no score as a JSON object, which we then extract. That's the key point I wanted to flag; it's a very nice thing that Ollama offers, and it's extremely helpful when building these kinds of logical graphs, where you really want to constrain the flow at certain edges.
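Putting that together, the grading chain might look like this sketch; the prompt paraphrases the one shown on screen, and `local_llm` and `retriever` come from the earlier steps:

```python
# Sketch of the relevance grader: JSON mode plus a JSON output parser
# yields a plain Python dict like {'score': 'yes'}.
from langchain_community.chat_models import ChatOllama
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import PromptTemplate

grader_llm = ChatOllama(model=local_llm, format="json", temperature=0)

grade_prompt = PromptTemplate(
    template=(
        "You are a grader assessing the relevance of a retrieved document "
        "to a user question.\nDocument: {document}\nQuestion: {question}\n"
        "Give a binary score 'yes' or 'no' as JSON with a single key "
        "'score' and no preamble or explanation."
    ),
    input_variables=["document", "question"],
)

retrieval_grader = grade_prompt | grader_llm | JsonOutputParser()

question = "agent memory"
docs = retriever.get_relevant_documents(question)
print(retrieval_grader.invoke(
    {"document": docs[0].page_content, "question": question}
))  # e.g. {'score': 'yes'}
```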
So that's the really key thing I wanted to highlight; a lot of the rest of this is actually pretty straightforward. Let's now define our graph state. This is the dictionary that we're going to pass between our nodes, so this is just some code I'm going to copy over: it defines your graph state, and you're just saying it's a dict. That's really all there is to it.
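A minimal sketch of that state definition (the single-`keys` dict shape is one common way to write it):

```python
# Sketch: the graph state is just a dict carried from node to node.
from typing import Any, Dict, TypedDict

class GraphState(TypedDict):
    # Holds keys such as 'question', 'documents', 'generation',
    # and the 'run_web_search' flag set by the grading node.
    keys: Dict[str, Any]
```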
Now, here's where I'm going to copy over some code that implements a function for every node and every conditional edge in our graph. If you remember, we can go over and look: our graph is laid out like this, and all we're doing is, for every node drawn, defining a corresponding function that performs some operation. Retrieve is basically just using the retriever we defined, calling get relevant documents, and writing the results out to state. Again, we take a question in: if you look here, the state dict is passed into the function, we extract the question from the state dict, we do retrieval, and we write the state dict back out. So think of every node as doing some modification on the state: reading it in, doing something, writing it back out. That's really all that's going on, and we can just march across our little diagram and see how each one of these nodes is implemented as a function. In every case we're using, for example, ChatOllama; in some of these cases we don't need JSON mode. If we're just doing a generation step, as you can see here, we don't need JSON mode; for the grading we do, so here we implement the same thing we just showed: ChatOllama with JSON mode. What happens is that we generate our score each time and extract our grade from the JSON, and we know the grade is constrained to the output yes or no.

Then, and here's the key point, we do some logical reasoning on that: if the grade is yes, we append the document, since it's relevant; if not, we filter that document out, and we also set the flag to perform a web search to yes. What's really happening here is that we're applying a kind of logical gate: if a document is scored as relevant, we add it to our final list of filtered documents; if not, we're going to go ahead and do a web search, so we set the search flag to yes and don't include that document in the output. You can see here that we return a dictionary containing our filtered documents, our question, and the flag to run a web search, yes or no. It defaults to no, but if we ever encounter an irrelevant document, we change it to yes. That's really all that's going on here.
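A sketch of that grading node, reusing the `retrieval_grader` chain and the `keys` state shape assumed above:

```python
# Sketch: grade each retrieved document; keep the relevant ones and flag
# a web search if any document is irrelevant.
def grade_documents(state):
    question = state["keys"]["question"]
    documents = state["keys"]["documents"]

    filtered_docs = []
    search = "No"  # default: no web search needed
    for d in documents:
        grade = retrieval_grader.invoke(
            {"document": d.page_content, "question": question}
        )["score"]
        if grade == "yes":
            filtered_docs.append(d)   # relevant: keep it
        else:
            search = "Yes"            # any miss triggers a web search

    return {"keys": {"documents": filtered_docs,
                     "question": question,
                     "run_web_search": search}}
```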
You can see we do our query transform down here; again we just use Mistral, and here's a transform prompt, but you get the idea. For the web search node we use Tavily web search, which is a really nice, quick way to perform web searches, and you can see we just supplement the documents with the web search results. Then there's the final piece: we wrote yes or no out to our search key, and depending upon the state, which we read in here, we make a decision to return either "transform query" or "generate", which determines the next node to go to. This "decide to generate" function is our conditional edge, right here: it looks at the results written out by the grade-documents node, in particular that search yes/no key in our dict, and then determines the next node to traverse to. That's really all we're doing here, and it's kind of nice.
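The conditional edge itself can then be a tiny function that returns the name of the next node; a sketch under the same assumed state shape:

```python
# Sketch: route on the flag written by grade_documents. The returned
# string must match a node name registered in the graph.
def decide_to_generate(state):
    if state["keys"]["run_web_search"] == "Yes":
        return "transform_query"  # rewrite the question, then web search
    return "generate"             # all documents relevant: answer directly
```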
Now that we've copied over all these functions, we can go ahead and run that, and then we just lay out our graph. Again, our graph was explained here, and this is where we lay out the full graph organization: how we're going to connect each node. We add the nodes first, set our entry point, and then add the edges between the nodes accordingly. The logic here just maps over to our diagram; that's really all that's happening. Cool.
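A sketch of that wiring with LangGraph, assuming node functions `retrieve`, `generate`, `transform_query`, and `web_search` are defined alongside `grade_documents` above (the node names are illustrative):

```python
# Sketch: declare nodes, an entry point, one conditional edge, and the
# remaining fixed edges, then compile the graph into a runnable app.
from langgraph.graph import StateGraph, END

workflow = StateGraph(GraphState)
workflow.add_node("retrieve", retrieve)
workflow.add_node("grade_documents", grade_documents)
workflow.add_node("generate", generate)
workflow.add_node("transform_query", transform_query)
workflow.add_node("web_search", web_search)

workflow.set_entry_point("retrieve")
workflow.add_edge("retrieve", "grade_documents")
workflow.add_conditional_edges(
    "grade_documents",
    decide_to_generate,
    {"transform_query": "transform_query", "generate": "generate"},
)
workflow.add_edge("transform_query", "web_search")
workflow.add_edge("web_search", "generate")
workflow.add_edge("generate", END)

app = workflow.compile()
```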
So I'm going to go down, and now let's see this all working together. I'm going to compile my graph, and I'm going to ask a question: "Explain how the different types of agent memory work." Let's go back to our diagram so we can reference it. When I call this, it will traverse every step along the way and print out something to explain what's happening. You can see that I perform retrieval, and now I'm doing my grading steps, all running locally. The documents were all deemed relevant, so I'm going to go ahead and generate, and it's running right now. And there we go.
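Invoking the compiled graph might look like this sketch; streaming yields one update per node, so each step can be printed as it completes:

```python
# Sketch: stream the graph so each node announces itself as it finishes.
inputs = {"keys": {"question": "Explain how the different types of agent memory work."}}

final_state = None
for output in app.stream(inputs):
    for node_name, state in output.items():
        print(f"Finished node: {node_name}")
        final_state = state

print(final_state["keys"]["generation"])  # the final answer
```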
We can go over to LangSmith and have a look at what happened under the hood. This is what just ran, and we can see that at each one of these steps we called ChatOllama with our Mistral 7B model, running locally. This is our grading step, with each document being graded; again, look at this: it outputs a binary score, yes or no, as a dict, which is great. There's a bunch more down here: these are all of our documents being graded. And here is the final LLM call, which packed it all into our RAG prompt: "you're an assistant for question-answering tasks; use the following context to answer the question", here are all of our docs, and here's the answer. That's pretty cool; we can see that this multi-step logical flow all works.
Now let's try something kind of interesting: I'm going to ask a question that I know is not in the context and see if it performs that default to web search. I'm going to say: "Explain how AlphaCodium works." This is a recent paper that came out, and it's not relevant at all to this blog post, so I know that retrieval should not be considered relevant. Let's run that and convince ourselves that it's true. Good, this is perfect: the grader is determining that these documents are not relevant, so it should be making the decision to perform a web search, going down this lower branch: transform the query, run the web search. And it looks like that all ran: it tells us AlphaCodium is an open-source AI code generation tool developed by CodiumAI. This is perfect; that's exactly what it is.

We can go into LangSmith again and see what happened. You can see the trace is a little more extensive here, because all of our grades came back irrelevant; again we get the nice JSON out. And this is pretty cool: here's our question-rewriting node, which says "provide an improved input question without any preamble" and produces "what is the mechanism behind AlphaCodium's functionality?" So it modifies the question. We use Tavily search right here: it basically does retrieval, searching for material related to AlphaCodium, which is great. Then we finally pass that to our model for generation based on this new context, and there we go: AlphaCodium, an open-source AI code assistant tool.

That gives you the main idea, and the key point is that this is all running locally. Again, I used the GPT4All embeddings for indexing up at the top, and I used Ollama with Mistral 7B instruct, with JSON mode for that one crucial step where I need to constrain the output to a yes/no score; for the other things, I just used the model without JSON mode to perform generations, like the question rewrite or the final generation.
In any case, I hope this gives you an overview of how to think about building logical flows (it doesn't have to be RAG, but RAG is a really good use case for this) using local models and LangGraph. The thing I want to leave you with is that there's a lot of interest in complex logical reasoning using local LLMs, and a lot of focus on using agents, and I want to encourage you to think about whether, depending on the problem you're trying to solve, you actually need an agent. It's possible that implementing a state machine or a graph, as shown here, with some series of logical steps (these can incorporate cycles or loops back to prior stages; we have some more complex examples that show that) can work really well with local models, because the local model is only performing one step within each node. You're constraining it to just do one little thing at a time: just rewrite the question, just grade the document, rather than using the local LLM as an agent executor that has to make all these decisions jointly, or in a less controlled workflow where, for example, the ordering of the various tasks can be determined arbitrarily by the agent. Here we very deliberately constrain the logical flow and let the local model do little tasks at each step, and I've found that to be a lot more reliable and really useful for these kinds of logical reasoning tasks. So hopefully this is helpful; give it a try, and we'll make sure all this code is easily shared. Thank you.