Self-reflective RAG with LangGraph: Self-RAG and CRAG
Summary
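TL;DR: Lance from LangChain explains how to use LangGraph to build diverse and sophisticated retrieval-augmented generation (RAG) flows. He first outlines the basic RAG flow: a question triggers retrieval, and the retrieved documents are passed to a large language model (LLM) to generate an answer. He then turns to questions that arise in practice, such as when to retrieve based on the context of the question, whether the retrieved documents are good enough, and what to do if they are not. Lance introduces the idea of active RAG, in which the LLM decides when and where to retrieve based on existing retrievals or generations. He discusses the different levels of control an LLM can have in a RAG application and introduces the state machine, which lets the LLM choose between steps while all available transitions are specified in advance. Lance shows how to implement such a state machine with LangGraph by building a graph, which makes more complex and varied RAG flows possible. Using the paper on CRAG (Corrective RAG) as an example, he implements a flow that retrieves documents, grades them, and then either generates an answer or retrieves from an external source depending on the grades. He also shows how to inspect the resulting traces in LangSmith, which gives a clear view of the whole flow. Finally, he encourages viewers to try flow engineering with LangGraph and to look out for an upcoming blog post on implementing Self-RAG and CRAG.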
Takeaways
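- 📚 The basic RAG (Retrieval-Augmented Generation) flow starts by retrieving documents relevant to the question and then uses an LLM (Large Language Model) to generate an answer.
- 🤔 In practice several questions come up: when to retrieve based on the context of the question, whether the retrieved documents are good enough, and whether to discard them and retry retrieval with an improved question if they are not.
- 🔄 Active RAG is the idea that the LLM decides when and where to retrieve based on existing retrievals or generations.
- 🎛️ A RAG application can give the LLM different levels of control: choosing the output of a single step, making routing decisions, or navigating a more complex logical flow.
- 🤖 A state machine lets the LLM choose between steps in a RAG flow while all available transitions are specified.
- 📈 LangGraph, recently released by LangChain, offers a good way to build state machines for RAG and other applications.
- 🔍 The CRAG (Corrective Retrieval-Augmented Generation) paper presents an active RAG method that combines several ideas: document retrieval, grading, answer generation, and knowledge refinement.
- 📝 The demo implements the state machine in LangGraph with a few simplifications, using web search to supplement the output and query rewriting to optimize the search.
- 📈 LangSmith shows every step of the RAG flow, including retrieval, grading, document filtering, and answer generation.
- 🛠️ "Flow engineering" matters: when building a complex logical flow, think carefully about how the state changes at each stage.
- 📝 LangGraph makes it possible to implement complex RAG logic and is well suited to encoding logical reasoning and workflows.
- 🔗 An upcoming blog post will discuss implementing two active RAG approaches, Self-RAG and CRAG, using state machines and LangGraph.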
Q & A
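What is the basic RAG flow?
- The basic RAG flow retrieves documents relevant to the question from an index and passes them into the context window of a large language model (LLM), which generates an answer grounded in the retrieved documents.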
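The linear flow described above can be sketched in a few lines. This is a minimal illustration rather than the code from the talk: `INDEX`, `retrieve`, `generate`, and `basic_rag` are hypothetical names, word overlap stands in for a vector-store similarity search, and a string template stands in for the LLM call.

```python
# Hypothetical in-memory "index"; a real system would use a vector store.
INDEX = [
    "Agents keep short-term memory in the context window.",
    "Long-term memory is stored in an external vector store.",
    "Prompt engineering shapes model behavior.",
]

def retrieve(question: str, k: int = 2) -> list[str]:
    """Rank documents by word overlap with the question (stand-in for similarity search)."""
    q_words = set(question.lower().split())
    ranked = sorted(
        INDEX,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def generate(question: str, documents: list[str]) -> str:
    """Stand-in for an LLM call: returns the prompt that would be sent to the model."""
    context = "\n".join(documents)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

def basic_rag(question: str) -> str:
    """One straight path: question -> retrieve -> generate."""
    return generate(question, retrieve(question))
```

Calling `basic_rag("how does agent memory work")` always takes the same two steps, which is the linear path that the active RAG approaches below try to improve on.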
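How do we handle the different kinds of questions that come up in practice?
- In practice we may face several kinds of questions: when to retrieve based on the context of the question, whether the retrieved documents are good enough, whether to discard them if they are not, and how to retry retrieval with an improved question.
What is active RAG?
- Active RAG is a process in which the LLM decides when and where to retrieve based on existing retrievals or generations.
How can we control the LLM in a RAG application?
- There are several levels of control. The LLM can choose the output of a single step; a router can decide whether a question goes to a vector store or a graph database; or we can build a more complex logical flow in which the LLM chooses between steps while all available transitions are specified in advance. The last option is called a state machine.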
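To make the routing level concrete, here is a small sketch. It assumes a keyword check in place of the LLM call that a real router would make; `route_question` and the two datastore names are hypothetical and do not come from the talk's code.

```python
def route_question(question: str) -> str:
    """Return the name of the datastore to query.

    A keyword check stands in for an LLM routing call (an assumption of this sketch).
    """
    relationship_terms = ("related to", "connected to", "relationship")
    if any(term in question.lower() for term in relationship_terms):
        return "graph_db"
    return "vector_store"
```

Routing makes a single decision up front. A state machine generalizes this by allowing several such decisions at fixed points in the flow.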
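What is LangGraph, and how does it help build state machines?
- LangGraph is a library that offers a convenient way to build state machines for RAG and other applications. It lets users lay out more diverse and complex RAG flows and implement them as graphs, which supports the broader idea of flow engineering: thinking through the workflow you want and then implementing it.
What is CRAG (Corrective RAG), and how does it implement active RAG?
- CRAG is an active RAG method that combines several ideas. It first retrieves documents and then grades them. If at least one document exceeds the relevance threshold, it proceeds to generation. If no document meets the bar, it retrieves from an external source using web search and uses the search results as context for generating the answer.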
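The decision rule in this answer can be written down directly. The sketch below assumes each document already carries a numeric relevance score from some grader; the threshold value, the name `choose_context`, and the choice to pass only above-threshold documents on to generation are illustrative assumptions, not details taken from the paper or the talk.

```python
RELEVANCE_THRESHOLD = 0.5  # arbitrary value chosen for illustration

def choose_context(scored_docs: list[tuple[str, float]], web_search) -> list[str]:
    """Keep retrieved documents if any clears the threshold, else fall back to web search.

    `web_search` is any zero-argument callable returning a list of documents.
    """
    relevant = [doc for doc, score in scored_docs if score > RELEVANCE_THRESHOLD]
    if relevant:
        return relevant      # at least one relevant document: go to generation
    return web_search()      # none relevant: retrieve from an external source
```

For example, `choose_context([("a", 0.9), ("b", 0.1)], lambda: ["web result"])` keeps only `"a"`, while a list whose scores all fall below the threshold returns the web results instead.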
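How is CRAG implemented with LangGraph?
- Implementing CRAG with LangGraph involves defining a state, creating nodes, and adding a conditional edge. The talk makes one simplification: if any retrieved document is graded as irrelevant, a web search supplements the context. The state is the core object that is passed around and modified in the graph, and a function is defined for each node to modify it, covering retrieval, grading, the conditional decision, query transformation, and generation.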
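The shape of that graph can be illustrated without any library. The sketch below is a tiny plain-Python state machine, not the LangGraph API: `MiniGraph` and its methods are hypothetical, the nodes are placeholders that only record that they ran, and the `"Yes"`/`"No"` flag mirrors the search value described in the talk.

```python
from typing import Callable

State = dict
Node = Callable[[State], State]

class MiniGraph:
    """Tiny stand-in for a graph library: nodes, fixed edges, conditional edges."""

    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}
        self.edges: dict[str, str] = {}
        self.conditional: dict[str, Callable[[State], str]] = {}
        self.entry = ""

    def add_node(self, name: str, fn: Node) -> None:
        self.nodes[name] = fn

    def set_entry_point(self, name: str) -> None:
        self.entry = name

    def add_edge(self, src: str, dst: str) -> None:
        self.edges[src] = dst

    def add_conditional_edge(self, src: str, chooser: Callable[[State], str]) -> None:
        self.conditional[src] = chooser

    def run(self, state: State) -> State:
        current = self.entry
        while current != "END":
            state = self.nodes[current](state)
            if current in self.conditional:
                current = self.conditional[current](state)  # decided from state
            else:
                current = self.edges[current]                # fixed transition
        return state

def step(name: str) -> Node:
    """Build a placeholder node that only records that it ran."""
    def node(state: State) -> State:
        return {**state, "trace": state.get("trace", []) + [name]}
    return node

graph = MiniGraph()
for name in ["retrieve", "grade_documents", "transform_query", "web_search", "generate"]:
    graph.add_node(name, step(name))
graph.set_entry_point("retrieve")
graph.add_edge("retrieve", "grade_documents")
graph.add_conditional_edge(
    "grade_documents",
    lambda s: "transform_query" if s["run_web_search"] == "Yes" else "generate",
)
graph.add_edge("transform_query", "web_search")
graph.add_edge("web_search", "generate")
graph.add_edge("generate", "END")
```

Running `graph.run({"run_web_search": "Yes"})` visits all five nodes, while `"No"` goes straight from grading to generation. Every possible transition is written out in the wiring, which is the property the talk highlights.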
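How is state modified in LangGraph?
- In the talk's LangGraph example, the state is a dictionary holding RAG-related content such as the question, documents, and generation. Each node in the graph is a function that modifies the state; for example, the retrieve node adds the retrieved documents to the state.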
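A minimal sketch of such a state and of a node that modifies it follows. The keys match those named in the talk (question, documents, generation); `GraphState`, `fake_retriever`, and the exact layout of the dictionary are assumptions made for illustration and may differ from the shared notebook.

```python
from typing import TypedDict

class GraphState(TypedDict, total=False):
    """State passed between nodes; keys follow those named in the talk."""
    question: str
    documents: list[str]
    generation: str

def fake_retriever(question: str) -> list[str]:
    """Hypothetical stand-in for a vector-store retriever."""
    return [f"chunk about: {question}"]

def retrieve(state: GraphState) -> GraphState:
    """Node: read the question from state and write state back with documents added."""
    question = state["question"]
    documents = fake_retriever(question)
    return {"question": question, "documents": documents}
```

Each node follows the same pattern: take the state in, pull out the keys it needs, and return the state with one more piece filled in.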
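What is LangSmith, and how does it help?
- LangSmith is a platform for logging and inspecting LangGraph runs. Once an API key is set, all generations are logged to LangSmith, where users can view the nodes, the grading results, and the output of every step, which makes it easy to inspect and understand the whole RAG flow.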
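Tracing is typically switched on through environment variables set before the graph runs. The variable names below are believed to match the LangSmith documentation from around the time of the talk, but they are an assumption here and should be checked against the current docs. The key value is a placeholder.

```python
import os

# Placeholder value; substitute a real key from your LangSmith account.
os.environ.setdefault("LANGCHAIN_API_KEY", "<your-langsmith-api-key>")
# Assumed switch that enables tracing for LangChain / LangGraph runs in this process.
os.environ.setdefault("LANGCHAIN_TRACING_V2", "true")
```

`setdefault` leaves any value already exported in the shell untouched, so the snippet is safe to keep at the top of a notebook.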
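How do you do flow engineering with LangGraph?
- Flow engineering with LangGraph means thinking the whole workflow through carefully and then implementing it. That includes defining the state, creating nodes and conditional edges, and defining a function for each node that performs the required state modification. The result is a clean, well-engineered graph that encodes a more complex logical reasoning workflow.
What advantages does LangGraph offer for building complex logical workflows?
- LangGraph lets you specify explicitly every transition you want executed, and every node is clearly enumerated, which is not always the case with other, more complex reasoning approaches such as agents. The speaker also finds it intuitive to work with, which helps in understanding and building logical workflows.
How is web search triggered and used in the LangGraph flow?
- A search value is set in the state, and the grading results determine whether web search runs. In the talk's implementation, web search is triggered when any document is graded as not relevant; the query is rewritten, and the search results are added to the context used to generate the answer.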
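The flag-and-branch logic can be sketched as two small functions. The names `grade_documents` and `decide_to_generate`, the `run_web_search` key, and the `is_relevant` callable (a stand-in for the LLM grader with a binary yes/no output) are assumptions for illustration and may not match the shared notebook exactly.

```python
def grade_documents(state: dict, is_relevant) -> dict:
    """Node: filter out irrelevant documents and flag whether web search is needed.

    `is_relevant(question, doc)` stands in for an LLM grader returning a binary verdict.
    """
    kept = []
    run_web_search = "No"
    for doc in state["documents"]:
        if is_relevant(state["question"], doc):
            kept.append(doc)
        else:
            run_web_search = "Yes"  # one irrelevant document is enough
    return {**state, "documents": kept, "run_web_search": run_web_search}

def decide_to_generate(state: dict) -> str:
    """Conditional edge: return the name of the next node."""
    if state["run_web_search"] == "Yes":
        return "transform_query"
    return "generate"
```

Keeping the decision in a separate function that only reads the state is what lets the graph treat it as an edge rather than a node: it chooses a path but modifies nothing.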
Outlines
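📚 Introduction to LangChain and the RAG flow
Lance from LangChain explains how to use LangGraph to build diverse and sophisticated RAG (Retrieval-Augmented Generation) flows. He first outlines the basic RAG flow: retrieve documents relevant to the question from an index, then generate an answer with an LLM (large language model). He then discusses several questions that arise in practice, such as when to retrieve, how good the retrieved documents are, and what to do if they are poor. Lance introduces active RAG, in which the LLM decides when and where to retrieve based on existing retrievals or generations. He also describes the different levels of LLM control in a RAG application: the base case, routing, and state machines for more complex logical flows. Lance highlights the role of LangGraph in building state machines and RAG flows, and introduces flow engineering: plan the workflow first, then implement it.
🔍 Implementing the RAG flow and defining state
Lance shows how to implement the CRAG (Corrective Retrieval-Augmented Generation) flow with LangGraph. CRAG is an active RAG method that first retrieves documents and then grades them. If at least one document exceeds the relevance threshold, it generates. If no document qualifies, it retrieves from an external source, using web search to supply the context for generating the answer. Lance explains the definition of state, the core object passed around and modified in LangGraph, which is a dictionary containing the question, documents, generation, and so on. He walks through how the state is modified at each node: retrieval, document grading, the logical decision (web search or direct generation, depending on the grades), and query transformation.
🔗 State machine logic and implementation
Lance goes on to discuss how to implement the state machine with LangGraph, stressing the logical decision and showing how state is modified by defining functions. He explains how a function is defined for each node; each function takes the state and modifies it, for example by retrieving, generating an answer, or grading documents. Lance also explains how a conditional edge decides the next step from the grading results: if a document is not relevant, the flow transforms the query and runs web search. Finally, he shows how to inspect the results of a LangGraph run in LangSmith, including the output and grade at each node.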
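As an illustration of one such node function, here is a sketch of a generate node. In the talk the chain pipes a prompt into an LLM and parses the output to a string; in this sketch `prompt`, `fake_llm`, and `parse_to_string` are hypothetical plain-Python stand-ins for those three pieces, so no model is actually called.

```python
def prompt(inputs: dict) -> str:
    """Format question and context into a single prompt string."""
    return f"Context:\n{inputs['context']}\n\nQuestion: {inputs['question']}"

def fake_llm(prompt_text: str) -> dict:
    """Hypothetical stand-in for a chat model; returns a message-like dict."""
    return {"content": f"(model answer to {len(prompt_text)} prompt characters)"}

def parse_to_string(message: dict) -> str:
    """Stand-in for an output parser that extracts plain text from the model reply."""
    return message["content"]

def generate(state: dict) -> dict:
    """Node: run prompt -> model -> parser and write the answer back into state."""
    context = "\n\n".join(state["documents"])  # minor post-processing of documents
    message = fake_llm(prompt({"context": context, "question": state["question"]}))
    return {**state, "generation": parse_to_string(message)}
```

The node leaves the question and documents in place and adds a `generation` key, so later inspection of the state shows both the answer and the context it was grounded in.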
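🚀 Applications and advantages of LangGraph in RAG
Lance sums up how LangGraph applies to RAG and stresses the benefits of using it for flow engineering. With LangGraph, more complex logical reasoning workflows can be encoded in a clean, well-engineered way, with every desired transition specified explicitly. Lance finds this way of building logical workflows very intuitive, and he also finds trace inspection with LangGraph intuitive because every node is clearly enumerated. He encourages viewers to try LangGraph and mentions an upcoming blog post on implementing two active RAG approaches, Self-RAG and CRAG, using state machines and LangGraph.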
Keywords
💡LangGraph
💡Retrieval
💡Context Window
💡Large Language Model (LLM)
💡Active Retrieval
💡State Machine
💡Vector Store
💡Document Grading
💡Conditional Edge
💡Flow Engineering
💡LangSmith
Highlights
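Lance introduces the basic concepts behind building sophisticated RAG flows with LangGraph.
The basic RAG flow consists of a question, retrieval of documents, and generation of an answer grounded in those documents.
Active RAG is introduced: letting the LLM decide when and where to retrieve based on existing retrievals or generations.
More complex logical flows, such as state machines, are presented as a way to manage the different steps of a RAG pipeline.
LangGraph supports building state machines for RAG, which makes complex flows easier to implement.
The talk shows how to use LangGraph to implement CRAG, a method that improves on basic RAG.
In CRAG, documents are retrieved first and then graded for relevance; if they are relevant, an answer is generated.
If the retrieved documents are not relevant, new documents are retrieved from an external source and used to generate the answer.
The demo walks through the steps of building a simplified RAG state machine with LangGraph.
It shows in detail how a function and the surrounding logic are defined for each node in LangGraph.
Flow engineering is discussed: designing and implementing complex workflows systematically.
Actual runs of the LangGraph RAG flow and their traces are inspected in LangSmith.
An upcoming blog post is announced that covers the details of implementing different RAG methods with LangGraph.
Inspecting a flow built with LangGraph is shown to be intuitive and efficient.
Viewers are encouraged to use LangGraph for flow engineering to streamline complex logic.
The talk closes by pointing to the potential of LangGraph for new RAG flows.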
Transcripts
Hi, this is Lance from LangChain. I'm going to be talking about using LangGraph to build diverse and sophisticated RAG flows.

So just to set the stage, the basic RAG flow you can see here starts with a question, then retrieval of relevant documents from an index, which are passed into the context window of an LLM for generation of an answer grounded in the retrieved documents. So that's kind of the basic outline, and we can see it's a very linear path.
In practice, though, you often encounter a few different types of questions. When do we actually want to retrieve, based upon the context of the question? Are the retrieved documents actually good or not? If they're not good, should we discard them? And how do we loop back and retry retrieval with, for example, an improved question?
question so these types of questions
motivate an idea of active rag which
which is a process where an llm actually
decides when and where to retrieve based
upon like existing
retrievals or existing
Generations now when you think about
this there's a few different levels of
control that you have over an llm in a
rag
application the base case like we saw
with our chain is you just use an llm to
choose a single steps output so for
example in traditional rag you feed it
documents and it decides the generation
so it's just kind of one step now a lot
of rag workflows will use the idea of
routing so like given a question should
I route it to a vector store or a graph
DB um and we have seen this quite a
bit now this newer idea that I want to
introduce is how do we build more
sophisticated logical
flows um in a rag
pipeline um that you let the llm choose
between different steps but specify all
the transitions that are
available and this is known as we call a
state
machine now there's a few different
architectures that have emerged uh to
build different types of radic chains
and of course chains are traditionally
used just for like very basic graph but
there's no State machine is a bit newer
and Lang graph which we recently
released provides a really nice way to
build State machines for Rag and for
other
things and the general idea here is that
you can lay out more diverse and
complicated rag flows and then Implement
them as
graphs and it kind of motivates this
more broad idea of of like flow
engineering and thinking through the
actual like workflow that you want and
then implementing it um and we're going
to actually do that right now so I'm
going to Pi uh a recent paper called CAG
corrective rag which is really a nice
method um for active rag that
incorporates a few different
ideas um so first you retrieve documents
and then you grade
them now if at least one document
exceeds the threshold for relevance
you go to generation you generate your
answer um and it does this knowledge
refinement stage after that but let's
not worry about that for right now it's
kind of not essential for understanding
the basic flow here so again you do a
grade for relevance for every document
if any is relevant you
generate now if they're all ambiguous or
incorrect based upon your grader
you retrieve from an external Source
they use web
search and then they pass that as their
context for answer
generation so it's a really neat
workflow where you're doing retrieval
just like with basic rag but then you're
reasoning about the documents if they're
relevant go ahead and at least one is
relevant go ahead and generate if
they're not retrieve from alternative
source and then pack that into the
context and generate your answer
So let's see how we would implement this as a state machine using LangGraph. We'll make a few simplifications.

First, if any documents are judged irrelevant, we'll go ahead and do the web search to supplement the output, so that's just one minor modification. We'll use Tavily search for web search. We use query rewriting to optimize the web search, but it follows a lot of the intuitions of the main paper. A small note here: we set the Tavily API key. And another small note: I've already set my LangSmith API key, which we'll see is useful a bit later for observing the resulting traces.

Now I'm going to index three blog posts that I like. I'm going to use Chroma DB and OpenAI embeddings. I'm going to run this right now, and this will create a vector store for me from these three blog posts.

Then what I'm going to do is define a state. This is the core object that's going to be passed around my graph and that I'm going to modify. Right here is where I define it, and the key point to note right now is that it's just a dictionary. It can contain things that are relevant for RAG, like question, documents, and generation, and we'll see how we update that in a little bit. But the first thing to note is that we define our state, and this is what's going to be modified in every node of our graph.

Now here's really the crux of it, and this is the thing I want to zoom in on a little bit.
So when you move from just thinking about prompts to thinking about overall flows, it's kind of a fun and interesting exercise. I think about this, as it's been mentioned on Twitter a little bit, more like flow engineering. So let's think through what was actually done in the paper and what modifications to our state are going to happen in each stage.

We start with a question; you can see that on the far left. This state is represented as a dictionary, and we start with a question. We perform retrieval from our vector store, which we just created, and that's going to give us documents. So that's one node: we made an adjustment to our state by adding documents. That's step one.

Now we have a second node where we're going to grade the documents, and in this node we might filter some out. We are making a modification to state, which is why it's a node. So we're going to have a grader.

Then we're going to have what we call a conditional edge. We went from question to retrieval, retrieval always goes to grading, and now we have a decision. If any document is irrelevant, we're going to go ahead and do web search to supplement, and if they're all relevant, we'll go to generation. It's a minor logical decision that we're going to make. If any are not relevant, we'll transform the query, do web search, and use that for generation.
So that's really it, and that's how we can think about our flow and how our state can be modified throughout this flow. I found that spending 10 minutes thinking carefully through your flow engineering is really valuable, because from here it's really just implementation details, and it's pretty easy, as you'll see.

So basically I'm going to run this code block, and then we can walk through some of it. I won't show you everything, so it doesn't get boring. Really all we're doing is defining functions for every node that take in the state and modify it in some way. That's all that's going on.

So think about retrieval. We take in state (remember, it's a dict), we get our state dict like this, we extract one key, question, from our dict, we pass that to a retriever, we get documents, and we write state back out, now with the documents key added. That's all.

Generate is going to be similar. We take in state, and now we have our question and documents. We pull in a prompt, we define an LLM, we do minor post-processing on the documents, and we set up a chain for generation, which is just going to take our prompt, plumb that to an LLM, and parse the output to a string. We run it right here, invoking it with our documents and our question to get our answer, and we write that back to state. That's it.

And you can follow along here: for every node, we just define a function that performs the state modification that we want to do on that node.

Grading documents is going to be the same. In this case I do a little extra, because I define a Pydantic data model for my grader so that the output of that particular grading chain is a binary yes or no. You can look at the code; it's all shared. That just makes sure that our output is very deterministic, so that down here we can perform logical filtering. What you can see here is that we initialize this search value to no and iterate through our documents, grading them. If any document is graded as not relevant, we flag search as yes, which means we're going to perform web search. We then add that to our state dict at the end, so run web search is now yes. That's it.

You can see we go through some other nodes here; there's the web search node. Now here is our one conditional edge, which we define right here. This is where we decide whether to generate or not based on that search key. We again get our state and extract the various values, so we have the search value. If search is yes, we return the next node that we want to go to, which in this case will be transform query, which will then go to web search. Else we go to generate.

So what we can see is that we laid out our graph, which you can see up here, and now we've defined functions for all those nodes as well as the conditional edge.
And now we scroll down. All we have to do is lay that out here again as our flow, and this is what you might think of as flow engineering, where you're just laying out the graph as you drew it. We set our entry point as retrieve. We add an edge between retrieve and grade documents, so we went retrieval to grade documents. We add our conditional edge: depending on the grade, either transform the query and go to web search, or just go to generate. We create an edge between transform query and web search, then web search to generate, and then we also have an edge from generate to end. And that's our whole graph. That's it.
So we can just run this. Now I'm going to ask a question. Let's just say, "How does agent memory work?" for example. Let's just try that. What this is going to do is print out what's going on as we run through this graph.

First we're going to see output from retrieve. This is going to be all of the documents that we retrieved from our retriever, so that's fine. Then you can see that we're doing a relevance check across our documents, and this is kind of interesting: you can see we're grading them here, and one is graded as not relevant. You can see the documents are now filtered, because we've removed the one that's not relevant. And because one is not relevant, we decide to transform the query and run web search.

You can see that after query transformation we rewrite the question slightly, and we then run web search. You can see that web search pulled from some additional sources, which you can actually see here. Here it is: a new document appended from web search, which is from a page on memory and knowledge requirements. So it basically looked up some AI architecture results related to memory from the web. That's fine; that's exactly what we want to do. And then we generate a response.

So that's great, and this is just showing you everything in gory detail. But I'm going to show you one other thing that's really nice about this. If I go to LangSmith, I have my API key set, so all my generations are just logged to LangSmith, and I can see my LangGraph run here. What's really cool is that this shows me all of my nodes. Remember, we had retrieve and grade, we evaluated the grade, and because one was irrelevant we went ahead and transformed the query, did a web search, and appended that to our context. You can see all those steps laid out here.

In fact, you can even look at every single grader and its output. I'll move this up slightly so you can see the different scores for the grades. This particular retrieval was graded as not relevant, so that's fine; that can happen in some cases. Because of that, we did a query transformation, so we modified the question slightly: "How does the memory system in artificial agents function?" It's just a minor rephrasing of the question. We did this Tavily web search, and this is where it queried from this particular blog post on Medium. So it's a single web query, and we can sanity check it.

Then what's neat is that we can go to our generate step, look at OpenAI, and here's our full prompt: "How does the memory system in artificial agents function?" And then here are all of our documents: this is the web search result, and we still have the relevant chunks that were retrieved from our blog posts. And then here's our answer. So that's really it.
The transition from laying out simple chains to flows is a really interesting and helpful way of thinking about why graphs are really interesting, because you can encode more sophisticated logical reasoning workflows, but in a very clean and well-engineered way, where you can specify all the transitions that you actually want to have executed. I actually find this way of thinking and building logical workflows really intuitive.

We have a blog post coming out tomorrow that discusses implementing both Self-RAG and CRAG, two different active RAG approaches, using this idea of state machines and LangGraph. So I encourage you to play with it. I found it really intuitive to work with. I also found inspection of traces to be quite intuitive using LangGraph, because every node is enumerated pretty clearly for you, which is not always the case when you're using other types of more complex reasoning approaches, for example agents.

So in any case, I hope this was helpful, and I definitely encourage you to check out this notion of flow engineering using LangGraph. In the context of RAG it can be really powerful, hopefully as you've seen here. Thank you.