Part 1: Advanced RAG Pipeline (bilingual subtitles)
Summary
TLDR: This lesson walks through how to build both a basic and an advanced retrieval-augmented generation (RAG) pipeline with LlamaIndex. It first explains how the basic RAG pipeline works, covering its three components: ingestion, retrieval, and synthesis. Then, by defining a set of evaluation metrics with TruLens, advanced RAG techniques can be benchmarked against the basic pipeline. The lesson also explores how to improve performance with advanced techniques such as sentence window retrieval and auto-merging retrieval, using TruLens for evaluation and benchmarking.
Takeaways
- 📚 Shows how to build basic and advanced retrieval-augmented generation (RAG) pipelines with LlamaIndex.
- 🔍 A RAG pipeline consists of three components: ingestion, retrieval, and synthesis.
- 📈 TruLens is used to define a set of metrics for benchmarking advanced RAG techniques against the basic pipeline.
- 📊 Demonstrates how to create a simple RAG application using LlamaIndex and an OpenAI LLM.
- 📝 Discusses how documents are chunked, embedded, and indexed.
- 🔑 Uses GPT-3.5 Turbo as the LLM and the Hugging Face BGE-small model for embeddings.
- 💡 Initializes feedback functions through TruLens to create the RAG evaluation triad: answer relevance, context relevance, and groundedness.
- 📌 Emphasizes the importance of automated evaluation (e.g., with TruLens) for assessing generative AI applications at scale.
- 🌟 Compares the basic RAG pipeline against advanced retrieval techniques such as sentence window retrieval and auto-merging retrieval.
- 🚀 Shows how to set up and evaluate the two advanced retrieval techniques: sentence window retrieval and auto-merging retrieval.
- 📈 Presents a comprehensive leaderboard of how the different retrieval techniques compare on the evaluation metrics and in efficiency.
Q & A
What is a RAG (Retrieval-Augmented Generation) pipeline?
-A RAG pipeline combines information retrieval with text generation. It has three components: ingestion, retrieval, and synthesis. Together they let the system generate richer and more accurate answers to user queries.
What happens in the 'ingestion' stage of a RAG pipeline?
-In the ingestion stage, a set of documents is loaded, each document is split into text chunks, an embedding vector is generated for each chunk, and the embeddings are stored in an index.
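The ingestion steps above can be sketched in miniature. This is an illustrative toy, not the LlamaIndex implementation: `split_into_chunks`, `embed`, and `ingest` are hypothetical helpers, and the bag-of-words "embedding" merely stands in for a real embedding model such as BGE-small.

```python
# Toy sketch of the ingestion stage: chunk -> embed -> index.
from collections import Counter

def split_into_chunks(text, chunk_size=5):
    """Split a document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]

def embed(text):
    """Stand-in embedding: lowercase bag-of-words counts."""
    return Counter(text.lower().split())

def ingest(documents):
    """Build an index of (chunk, embedding) pairs from raw documents."""
    index = []
    for doc in documents:
        for chunk in split_into_chunks(doc):
            index.append((chunk, embed(chunk)))
    return index

index = ingest(
    ["Start small and gradually increase the scope and complexity of your projects"]
)
```

In a real pipeline the same three moves (split, embed, store) happen inside the framework's indexing call, with a text splitter and an embedding model doing the work.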
How do you build a RAG pipeline with LlamaIndex?
-With LlamaIndex we can create a simple LLM application that internally uses an OpenAI LLM. First we create a service context object specifying the LLM and the embedding model, then index the documents with LlamaIndex's VectorStoreIndex.
How do you evaluate the performance of a RAG pipeline?
-Initialize feedback functions with TruLens to create the RAG evaluation triad: pairwise comparisons between the query, response, and context. This yields three evaluation modules: answer relevance, context relevance, and groundedness.
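The triad's three pairings can be sketched minimally as follows. This is a crude word-overlap proxy, not the LLM-based feedback functions TruLens actually uses; `overlap` and `rag_triad` are illustrative names, not TruLens APIs.

```python
# Toy sketch of the RAG evaluation triad: three pairwise scores
# between query, retrieved context, and generated response.

def overlap(a, b):
    """Fraction of words in `a` that also appear in `b` (crude proxy score)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa) if wa else 0.0

def rag_triad(query, context, response):
    return {
        "answer_relevance": overlap(response, query),   # response vs. query
        "context_relevance": overlap(context, query),   # retrieved context vs. query
        "groundedness": overlap(response, context),     # response vs. context
    }

scores = rag_triad(
    query="how do I start an AI project",
    context="start an AI project by picking a small scope",
    response="start by picking a small AI project scope",
)
```

The point of the sketch is the wiring, not the scoring: each module compares a different pair of the three artifacts, which is why all three are needed to localize a failure.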
What is sentence window retrieval?
-Sentence window retrieval is an advanced retrieval technique that embeds and retrieves individual sentences; after retrieval, each sentence is replaced with a larger window of sentences around the originally retrieved one. This gives the LLM more context with which to answer the query.
How does the auto-merging retriever work?
-The auto-merging retriever builds a hierarchy in which larger parent nodes contain smaller child nodes that reference them. During retrieval, if a majority of a parent's child nodes are retrieved, the children are replaced by the parent, hierarchically merging the retrieved nodes.
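The merging rule just described can be sketched as follows; `auto_merge` is an illustrative stand-in for the retriever's logic, with the parent/child hierarchy hard-coded here rather than built by an indexing step (real hierarchies might use, e.g., 512-token parents over 128-token children).

```python
# Toy sketch of the auto-merging rule: if a majority of a parent's
# child chunks were retrieved, replace those children with the parent.

def auto_merge(retrieved, parents):
    """`parents` maps parent_text -> list of child_texts."""
    merged = list(retrieved)
    for parent, children in parents.items():
        hits = [c for c in children if c in merged]
        if len(hits) > len(children) / 2:          # majority of children retrieved
            merged = [c for c in merged if c not in hits]
            merged.append(parent)                  # swap the children for the parent
    return merged

parents = {"A B C D": ["A", "B", "C", "D"]}
result = auto_merge(retrieved=["A", "B", "C"], parents=parents)
```

Three of four children were retrieved, so they collapse into the single parent chunk; retrieving only two of four would leave the children untouched.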
How do you set up and use the auto-merging retriever?
-Using the auto-merging retriever requires building a hierarchical index and obtaining a query engine from it. A helper function is used to build the index, and later lessons dig into how it works under the hood.
What role does TruLens play in evaluating RAG pipelines?
-TruLens provides a standard mechanism for evaluating generative AI applications at scale. It lets us evaluate applications in a way that is custom to our domain and dynamic to changing demands, without relying on expensive human evaluation or fixed benchmarks.
How can context relevance be improved in a RAG pipeline?
-By using more advanced retrieval techniques such as sentence window retrieval and auto-merging retrieval. These techniques surface more relevant context, which in turn makes the synthesized answer more relevant and accurate.
What happens in the 'synthesis' stage of a RAG pipeline?
-In the synthesis stage, the retrieved context chunks are combined with the user query and placed in the LLM's prompt window to generate the final response.
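A minimal sketch of that prompt-stuffing step, assuming a hypothetical template (`build_prompt` is illustrative; the template a real framework uses internally will differ):

```python
# Toy sketch of the synthesis stage: pack the retrieved chunks and the
# user query into a single prompt string for the LLM.

def build_prompt(context_chunks, query):
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\n"
        "Answer:"
    )

prompt = build_prompt(
    ["Start small and scale gradually.", "Build a portfolio of projects."],
    "How do I gain AI experience?",
)
```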
How can total cost be optimized in a RAG pipeline?
-By improving retrieval and synthesis performance, total cost can be reduced while keeping relevance high. For example, sentence window retrieval and auto-merging retrieval improve groundedness and context relevance without increasing total cost.
Outlines
📚 Building the basic and advanced RAG pipelines
This section shows how to set up both basic and advanced retrieval-augmented generation (RAG) pipelines. Documents are loaded and split into chunks with a text splitter, embeddings are created with an embedding model, and the embeddings are stored in an index. At query time, retrieval fetches the chunks most similar to the user query, which are then combined with the query in the synthesis stage to generate the final response. The section also covers setting up an evaluation benchmark with TruEra and getting started quickly with LlamaIndex and an OpenAI LLM. Finally, users are encouraged to upload their own PDF and run some basic sanity checks on the document.
🔍 Evaluation and benchmarking
This section describes how to initialize feedback functions with TruLens to create the RAG evaluation triad: answer relevance, context relevance, and groundedness. With a pre-written list of questions and the initialized TruLens modules, the application can be evaluated. TruLens provides an evaluation mechanism that can be customized to the needs of a specific domain and to changes in the application. The section also shows how to run evaluations with a TruLens recorder and how to inspect results in the dashboard UI, including the answer relevance, context relevance, and groundedness metrics.
🚀 Advanced retrieval techniques
This section explores two advanced retrieval techniques: sentence window retrieval and auto-merging retrieval. Sentence window retrieval embeds and retrieves individual sentences, then replaces each with a larger window of surrounding sentences to provide more context. Auto-merging retrieval builds a hierarchy and merges retrieved nodes into larger parent nodes. The section shows how to set up both techniques, runs example queries, and evaluates their performance with TruLens.
📈 Evaluation results and comparison
This section compares the evaluation results of the basic RAG pipeline against the two advanced retrieval techniques (sentence window retrieval and auto-merging retrieval). Comparing metrics such as groundedness, answer relevance, and context relevance, along with total cost and latency, shows which technique is more effective. The results indicate that auto-merging retrieval performs better on groundedness and context relevance at lower total cost. Finally, the dashboard UI is used to compare the techniques side by side.
📖 A closer look at the evaluation modules
The final section previews the next lesson, a deep dive into the RAG triad of evaluation modules: answer relevance, context relevance, and groundedness. This will help you understand how to use each module and what each one requires.
Keywords
💡RAG pipeline
💡LlamaIndex
💡embedding model
💡user query
💡evaluation benchmark
💡TruLens
💡sentence window retrieval
💡auto-merging retriever
💡context relevance
💡groundedness
Highlights
This lesson gives a full overview of how to set up both basic and advanced RAG (retrieval-augmented generation) pipelines with LlamaIndex.
A RAG pipeline consists of three distinct components: ingestion, retrieval, and synthesis.
An index is created as a view over a storage system such as a vector database.
TruLens is used to define a set of metrics so that advanced RAG techniques can be benchmarked against the basic pipeline.
In the ingestion stage, a set of documents is loaded and each document is split into text chunks.
For each text chunk, an embedding is generated with an embedding model, and the embedded chunks are offloaded to an index.
In the retrieval stage, a user query is launched against the index and the top-K chunks most similar to the query are fetched.
In the synthesis stage, the retrieved chunks are combined with the user query and placed in the LLM's prompt window to generate the final response.
The lesson digs into setting up an evaluation benchmark with LlamaIndex and TruLens.
A simple LLM application is created with LlamaIndex, which internally uses an OpenAI LLM.
Feedback functions are initialized with TruLens to create the RAG evaluation triad: pairwise comparisons between the query, response, and context.
LLM evaluations are becoming a standard mechanism for assessing generative AI applications, allowing evaluations customized to the application.
The TruLens dashboard shows the input, output, record ID, tags, and more for each query.
Advanced retrieval techniques such as sentence window retrieval and auto-merging retrieval can improve both retrieval and synthesis performance.
Sentence window retrieval embeds and retrieves individual sentences, then replaces each with a larger window of sentences around the originally retrieved one.
The auto-merging retriever works by merging retrieved nodes into larger parent nodes, enabling hierarchical merging during retrieval.
Comparing results across retrieval techniques yields a comprehensive leaderboard of how each technique performs.
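The retrieval step from the highlights (embed the query, fetch the top-K most similar chunks) can be sketched with a toy bag-of-words embedding standing in for a real embedding model; `embed`, `cosine`, and `retrieve` are illustrative helpers, not library APIs.

```python
# Toy sketch of top-K retrieval: rank indexed chunks by cosine
# similarity to the query embedding and return the K best.
import math
from collections import Counter

def embed(text):
    """Stand-in embedding: lowercase bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(index, query, k=2):
    """`index` is a list of (chunk_text, embedding). Returns top-k chunk texts."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(item[1], q), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

index = [(c, embed(c)) for c in [
    "networking helps you meet experienced people",
    "start small projects and scale complexity",
    "teamwork lets you leverage colleagues' expertise",
]]
top = retrieve(index, "how to scale project complexity", k=1)
```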
Transcripts
In this lesson, you'll get a full overview of how to set up both a basic and an advanced RAG pipeline with LlamaIndex. We'll load in an evaluation benchmark and use TruLens to define a set of metrics so that we can benchmark advanced RAG techniques against the baseline, or basic, pipeline. In the next few lessons, we'll explore each of these pieces in a bit more depth.

Let's first walk through how a basic retrieval-augmented generation pipeline, or RAG pipeline, works. It consists of three different components: ingestion, retrieval, and synthesis.
Going through the ingestion phase, we first load in a set of documents. For each document, we split it into a set of text chunks using a text splitter. Then, for each chunk, we generate an embedding using an embedding model, and each chunk with its embedding is offloaded to an index, which is a view of a storage system such as a vector database. Once the data is stored within an index, we then perform retrieval against that index: first we launch a user query against the index, and then we fetch the top-K most similar chunks to the user query. Afterwards, we take these relevant chunks, combine them with the user query, and put them into the prompt window of the LLM in the synthesis phase, and this allows us to generate a final response.

This notebook will walk you through how to set up a basic and an advanced RAG pipeline with LlamaIndex. We will also use TruEra to help set up an evaluation benchmark so that we can measure improvements against the baseline.
For this quick start, you will need an OpenAI API key. Note that for this lesson we'll use a set of helper functions to get you set up and running quickly, and we'll do a deep dive into some of these sections in future lessons.

Next, we'll create a simple LLM application using LlamaIndex, which internally uses an OpenAI LLM. In terms of the data source, we'll use the "How to Build a Career in AI" PDF written by Andrew Ng. Note that you can also upload your own PDF file if you wish, and for this lesson we encourage you to do so. Let's do some basic sanity checking of what the document consists of, as well as the length of the document. We see that we have a list of documents; there are 41 elements in there. Each item in that list is a document object, and we'll also show a snippet of the text for a given document.

Next, we'll merge these into a single document, because it helps with overall text splitting accuracy when using more advanced retrieval methods such as sentence window retrieval as well as auto-merging retrieval. The next step here is to index these documents, and we can do this with the VectorStoreIndex within LlamaIndex. Next, we define a service context object, which contains both the LLM we're going to use as well as the embedding model. The LLM is GPT-3.5 Turbo from OpenAI, and the embedding model is the Hugging Face BGE-small model. These steps show the ingestion process right here: we've loaded in documents, and then in one line, VectorStoreIndex.from_documents, we're doing the chunking, embedding, and indexing under the hood with the embedding model that you specified. Next, we obtain a query engine from this index that allows us to send user queries that do retrieval and synthesis against this data.

Let's try out our first request. The query is: "What are steps to take when finding projects to build your experience?" The response: "Start small and gradually increase the scope and complexity of your projects." Great, so it's working.

So now you've set up the basic RAG pipeline. The next step is to set up some evaluations against this pipeline to understand how well it performs, and this will also provide the basis for defining our advanced retrieval methods: the sentence window retriever as well as the auto-merging retriever. In this section, we use TruLens to initialize feedback functions. We initialize a helper function, get_feedbacks, to return a list of feedback functions to evaluate our app. Here we've created a RAG evaluation triad, which consists of pairwise comparisons between the query, response, and context, and so this really creates three different evaluation modules: answer relevance, context relevance, and groundedness. Answer relevance asks: is the response relevant to the query? Context relevance asks: is the retrieved context relevant to the query? And groundedness asks: is the response supported by the context? We'll walk through how to set this up yourself in the next few notebooks.

The first thing we need to do is create a set of questions on which to test our application. Here we've pre-written the first ten, and we encourage you to add to this list. Now we have some evaluation questions: "What are the keys to building a career in AI?", "How can teamwork contribute to success in AI?", etc. Here we specify a new question, "What is the right AI job for me?", and we add it to the eval questions list. Now we can initialize the TruLens modules to begin our evaluation process. We've initialized the TruLens module, and now we've reset the database, so we can initialize our evaluation modules. LLM evaluations are growing as a standard mechanism for evaluating generative AI applications at scale. Rather than relying on expensive human evaluation or set benchmarks, LLM evaluations allow us to evaluate our applications in a way that is custom to the domain in which we operate and dynamic to the changing demands for our application.

Here we've pre-built a TruLens recorder to use for this example. In the recorder, we've included the standard triad of evaluations for evaluating RAG systems: groundedness, context relevance, and answer relevance. We'll also specify an ID so that we can track this version of our app; as we experiment, we can track new versions by simply changing the app ID. Now we can run the query engine again with the TruLens context. So what's happening here is that we're sending each query to our query engine, and in the background the TruLens recorder is evaluating each of our queries against these three metrics. If you see some warning messages, don't worry about them.

Here we can see a list of queries as well as their associated responses. You can see the input, output, record ID, tags, and more. You can also see the answer relevance, context relevance, and groundedness for each row. In this dashboard, you can see your evaluation metrics, like context relevance, answer relevance, and groundedness, as well as average latency, total cost, and more, in a UI. Here we see that the answer relevance and groundedness are decently high, but context relevance is pretty low. Now let's see if we can improve these metrics with more advanced retrieval techniques like sentence window retrieval as well as auto-merging retrieval.

The first advanced technique we'll talk about is sentence window retrieval. This works by embedding and retrieving single sentences, so more granular chunks, but after retrieval the sentences are replaced with a larger window of sentences around the original retrieved sentence. The intuition is that this allows the LLM to have more context for the information retrieved, in order to better answer queries, while still retrieving on more granular pieces of information, so ideally improving both retrieval and synthesis performance. Now let's take a look at how to set it up.

First, we'll use OpenAI's GPT-3.5 Turbo. Next, we'll construct our sentence window index over the given document. Just a reminder that we have a helper function for constructing the sentence window index, and we'll do a deep dive into how this works under the hood in the next few lessons. Similar to before, we'll get a query engine from the sentence window index, and now that we've set this up, we can try running an example query. Here the question is: "How do I get started on a personal project in AI?" And we get back the response: "To get started on a personal project in AI, it is first important to identify and scope the project." Great. Similarly to before, let's get the TruLens evaluation context and try benchmarking the results.
So here we import the pre-built TruLens recorder for the sentence window index, and now we'll run the sentence window retriever on top of these evaluation questions and then compare performance on the RAG triad of evaluation modules. Here we can see the responses come in as they're being run. Some example questions and responses: "How can teamwork contribute to success in AI?" "Teamwork can contribute to success in AI by allowing individuals to leverage the expertise and insights of their colleagues." "What's the importance of networking in AI?" "Networking is important in AI because it allows individuals to connect with others who have experience and knowledge in the field."

Great. Now that we've run evaluations for two techniques, the basic RAG pipeline as well as the sentence window retrieval pipeline, let's get a leaderboard of the results and see what's going on. Here we see that groundedness is eight percentage points better than the baseline RAG pipeline, answer relevance is more or less the same, context relevance is also better for the sentence window query engine, latency is more or less the same, and the total cost is lower.
Since the groundedness and context relevance are higher but the total cost is lower, we can intuit that the sentence window retriever is actually giving us more relevant context, and more efficiently as well. When we go back into the UI, we can see that we now have a comparison between the direct query engine, the baseline, as well as the sentence window engine, and we can see the metrics that we just saw in the notebook displayed in the UI as well.

The next advanced retrieval technique we'll talk about is the auto-merging retriever. Here we construct a hierarchy of larger parent nodes with smaller child nodes that reference the parent node. So, for instance, we might have a parent node of chunk size 512 tokens, and underneath there are four child nodes of chunk size 128 tokens that link to this parent node. The auto-merging retriever works by merging retrieved nodes into larger parent nodes, which means that during retrieval, if a parent actually has a majority of its child nodes retrieved, then we'll replace the child nodes with the parent node. So this allows us to hierarchically merge our retrieved nodes. The combination of all the child nodes is the same text as the parent node. Similarly to the sentence window retriever, in the next few lessons we'll do a bit more of a deep dive on how it works; here we'll show you how to set it up with our helper functions.

Here we've built the auto-merging index, again using GPT-3.5 Turbo for the LLM as well as the BGE model for the embedding model, and we got the query engine from the auto-merging retriever. Let's try running an example query: "How do I build a portfolio of AI projects?" In the logs here, you actually see the merging process go on: we're merging nodes into a parent node, to basically retrieve the parent node as opposed to the child nodes. "To build a portfolio of AI projects, it is important to start with simple undertakings and gradually progress to more complex ones." Great, so we see that it's working.

Now let's benchmark the results with TruLens. We got a pre-built TruLens recorder on top of our auto-merging retriever, and we then run the auto-merging retriever with TruLens on top of our evaluation questions. Here, for each question, you actually see the merging process going on, such as merging three nodes into the parent node for the first question. If we scroll down just a little bit, we see that for some of the other questions we're also performing the merging process: merging three nodes into the parent node, merging one node into the parent node. An example question-response pair is: "What is the importance of networking in AI?" "Networking is important in AI because it helps in building a strong professional network and community."

Now that we've run all three retrieval techniques, the basic RAG pipeline as well as the two advanced retrieval methods, we can view a comprehensive leaderboard to see how all three techniques stack up. We get pretty nice results for the auto-merging query engine on top of the evaluation questions: we get 100% in terms of groundedness, 94% in terms of answer relevance, and 43% in terms of context relevance, which is higher than both the sentence window and the baseline RAG pipeline. And we get roughly equivalent total cost to the sentence window query engine, implying that the retrieval here is more efficient, with equivalent latency. At the end, you can view this in the dashboard as well.

This lesson gives you a comprehensive overview of how to set up a basic and an advanced RAG pipeline, and also how to set up evaluation modules to measure performance. The next lesson will do a deep dive into these evaluation modules, specifically the RAG triad of groundedness, answer relevance, and context relevance, and you'll learn a bit more about how to use these modules and what each module needs.