Introduction to Query Pipelines (Building Advanced RAG, Part 1)
Summary
TLDR: This video introduces query pipelines and their applications. A query pipeline is a declarative API for defining complex tasks simply and executing them efficiently. The video shows how to build advanced workflows such as query rewriting, retrieval, reranking, and response synthesis, and how to compose them efficiently into more complex systems. It also touches on advanced query pipeline features such as async and parallel execution.
Takeaways
- 📚 Introduction: Explains the concept of query pipelines and the flexibility and customizability they offer for building advanced retrieval systems.
- 🔍 Query pipelines: A declarative API that chains different modules into a DAG (directed acyclic graph), providing an abstraction for building complex retrieval systems.
- 🔧 Better functionality: Query pipelines abstract common patterns and make it easy to define custom workflows.
- 🚀 Better efficiency: Query pipelines reduce boilerplate code and let you combine different components to improve system efficiency.
- 🔄 Simple chains: You can feed an input through a simple chain of a prompt, an LLM, and an output parser.
- 🌐 Complex DAGs: More complex DAGs can be expressed, combining modules such as prompts, LLMs, retrievers, rerankers, and response synthesis.
- 📈 Visualization: Query pipelines can be visualized with a network visualization tool.
- 🔁 Loops and complex systems: Query pipelines can be used to build loops and more complex systems, enabling advanced retrieval workflows.
- 🌟 Custom components: You can create subclassed custom query components or simple function components.
- ⚙️ Async/parallel execution: Query pipelines can run asynchronously or in parallel, improving system performance.
- 🛠️ Examples: You can learn how to use query pipelines through retrieval-system examples such as query rewriting, retrieval, and response synthesis.
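As a mental model, the chain form mentioned in the takeaways can be sketched in a few lines of plain Python (a toy stand-in, not the LlamaIndex API): each module is a callable, and each module's output becomes the next module's input.

```python
# Toy stand-in for the "chain" form of a query pipeline (not the
# LlamaIndex API): modules are plain callables run in sequence.
class ToyQueryPipeline:
    def __init__(self, chain):
        self.chain = chain

    def run(self, value):
        # feed each module's output into the next module
        for module in self.chain:
            value = module(value)
        return value

# a "prompt" module and an "LLM" module, both faked as plain functions
prompt = lambda movie_name: f"Please generate related movies to {movie_name}"
fake_llm = lambda text: f"LLM response for: {text!r}"

p = ToyQueryPipeline(chain=[prompt, fake_llm])
result = p.run("The Departed")
```

The real API works the same way conceptually: the chain hides the plumbing (prompt formatting, message conversion) that you would otherwise write by hand.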
Q & A
What tasks were query pipelines introduced to automate?
-Query pipelines are an API for automating tasks such as query rewriting, routing, reranking, and response synthesis.
What are the advantages of query pipelines?
-Query pipelines abstract common patterns, make custom workflows easy to define, reduce boilerplate code, let you combine different components, and improve system efficiency.
How does the query pipeline syntax work?
-A query pipeline is a declarative API that composes different modules into a DAG (directed acyclic graph), letting you express complex workflows concisely and readably.
What simple chain was shown as a query pipeline example?
-A workflow that passes an input to a prompt, through an LLM (large language model), and into an output parser.
What is the query pipeline DAG syntax?
-The DAG syntax is used to define a graph-like structure as a set of nodes and edges, which makes it possible to express more complex workflows.
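A minimal sketch of that node-and-edge style, using only the standard library (the `add_modules`/`add_link` method names mirror the API described in the video, but the class itself is a toy):

```python
from graphlib import TopologicalSorter

# Library-free sketch of the DAG syntax: register modules by key, draw
# links between them, then execute in topological order. A toy, not the
# real LlamaIndex QueryPipeline.
class ToyDagPipeline:
    def __init__(self):
        self.modules = {}
        self.preds = {}  # module key -> list of upstream module keys

    def add_modules(self, modules):
        self.modules.update(modules)

    def add_link(self, src, dest):
        self.preds.setdefault(dest, []).append(src)

    def run(self, root_input):
        graph = {k: self.preds.get(k, []) for k in self.modules}
        outputs = {}
        for key in TopologicalSorter(graph).static_order():
            upstream = self.preds.get(key, [])
            if upstream:  # toy simplification: pass the first upstream output
                outputs[key] = self.modules[key](outputs[upstream[0]])
            else:         # a root module receives the pipeline input
                outputs[key] = self.modules[key](root_input)
        # the leaf is the module that feeds no other module
        sources = {s for ps in self.preds.values() for s in ps}
        leaf = next(k for k in self.modules if k not in sources)
        return outputs[leaf]

p = ToyDagPipeline()
p.add_modules({
    "prompt": lambda name: f"Related movies to {name}:",
    "llm": lambda text: text + " Heat, Infernal Affairs",
})
p.add_link("prompt", "llm")
result = p.run("The Departed")
```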
What about streaming support in query pipelines?
-Query pipelines support streaming, which is useful both for development in notebooks and for improving UX in full-stack applications.
How do you add custom components to a query pipeline?
-You can either subclass a custom query component or pass an arbitrary function into a function component, which lets you implement your own logic and validation.
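The function-component idea can be illustrated with a small stand-in that derives its input keys by inspecting the function signature (a hypothetical class, not LlamaIndex's actual `FnComponent`):

```python
import inspect

# Toy function component: auto-inspect the function signature to derive
# the expected input keys, then run the function as a pipeline module.
class ToyFnComponent:
    def __init__(self, fn):
        self.fn = fn
        self.input_keys = set(inspect.signature(fn).parameters)

    def run_component(self, **kwargs):
        unexpected = set(kwargs) - self.input_keys
        if unexpected:
            raise ValueError(f"unexpected inputs: {unexpected}")
        return {"output": self.fn(**kwargs)}

def related_movies(movie_name: str) -> str:
    return f"Movies related to {movie_name}"

comp = ToyFnComponent(related_movies)
out = comp.run_component(movie_name="Toy Story")
```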
What are the benefits of async and parallel execution?
-Async and parallel execution let a query pipeline run in the most optimal way, improving latency and making the most of system performance.
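A dependency-free illustration of the latency win, where each query-engine call is simulated with a short sleep (all names here are illustrative):

```python
import asyncio
import time

# Four independent "query engine" calls, each faked with a short sleep,
# run concurrently with asyncio.gather instead of one after another.
async def fake_query_engine(chunk_size: int, query: str) -> str:
    await asyncio.sleep(0.05)  # stand-in for retrieval + LLM latency
    return f"[chunk={chunk_size}] answer to {query!r}"

async def run_parallel(query: str):
    tasks = [fake_query_engine(size, query) for size in (128, 256, 512, 1024)]
    return await asyncio.gather(*tasks)  # all four run concurrently

start = time.perf_counter()
results = asyncio.run(run_parallel("What did the author do?"))
elapsed = time.perf_counter() - start  # roughly one sleep, not four
```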
What ensemble retrieval example was shown with query pipelines?
-The ensemble retrieval example defines four different query engines over the same data with different chunk sizes (128, 256, 512, 1024), runs them in parallel, and combines the results.
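A library-free sketch of that fan-out/join shape, with toy retrievers standing in for the four query engines:

```python
# Ensemble pattern sketch: fan the same query out to several toy
# retrievers (one per chunk size), pack their outputs into a list with
# a "join" step, then hand the list to a summarizer-like combiner.
def make_retriever(chunk_size):
    def retrieve(query):
        return f"chunk={chunk_size} hit for {query!r}"
    return retrieve

def join(*outputs):  # arg-pack: N incoming edges, one list out
    return list(outputs)

def summarize(nodes, query_str):
    return f"{len(nodes)} results combined for {query_str!r}"

retrievers = {size: make_retriever(size) for size in (128, 256, 512, 1024)}
query = "What did the author do in YC?"
packed = join(*(r(query) for r in retrievers.values()))
answer = summarize(nodes=packed, query_str=query)
```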
How does visualizing a query pipeline help?
-Visualizing a query pipeline gives you an overview of the workflow and helps with debugging and troubleshooting.
What are the main things to watch out for when using query pipelines?
-The main points are choosing appropriate modules, connecting them correctly, and implementing error handling and validation logic properly.
Outlines
📚 Introduction and query pipeline basics
This section introduces Jerry from LlamaIndex and explains the concept of query pipelines and why they matter. Advanced retrieval systems (advanced RAG) go beyond simple top-k search over a vector database, with customizable components such as query rewriting, routing, reranking, and response synthesis. The goal of the video is to show how to abstract, define, and combine these advanced workflows to build systems efficiently.
🔍 Advantages and applications of query pipelines
This section explains the advantages of query pipelines: simpler code, better readability, built-in tracing of each processing step, and easier serialization. It also walks through a basic workflow built with query pipelines and a more advanced example (e.g., combining a prompt, an LLM, and an output parser).
📈 Running query pipelines and streaming support
This section explains how to execute query pipelines and their streaming support, from a simple prompt-and-LLM chain to adding an output parser. It also covers why streaming matters and how to use it.
🔄 Query rewriting and retrieval workflow
This section introduces a query rewriting and retrieval workflow. As a concrete example, it generates a question about Paul Graham's life and retrieves the answer, demonstrating a complete workflow that includes query rewriting, an LLM, retrieval, and response synthesis modules.
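The HyDE-style step in this workflow (hallucinate an answer, then retrieve with it) can be sketched without any libraries, using a fake LLM and a toy word-overlap retriever; the corpus and all names here are illustrative:

```python
# HyDE-style flow sketch: rewrite a topic into a question, "hallucinate"
# a passage with a fake LLM, then search a toy corpus with the passage
# (not the question) using simple word overlap.
corpus = [
    "Paul Graham studied painting and later started Y Combinator.",
    "The essay discusses what the author worked on before college.",
    "Lisp was the language the author used for Viaweb.",
]

def rewrite(topic):
    return f"What can you tell me about {topic}?"

def fake_llm_passage(question):
    # stand-in for an LLM hallucinating an answer passage
    return "The author worked on painting and started Y Combinator."

def retrieve(passage, docs):
    # pick the document with the largest word overlap with the passage
    words = set(passage.lower().split())
    return max(docs, key=lambda d: len(words & set(d.lower().split())))

question = rewrite("Paul Graham's career")
passage = fake_llm_passage(question)
best = retrieve(passage, corpus)
```

The point of the technique is visible even in the toy: the hallucinated passage shares more vocabulary with the relevant document than the bare question does.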
🛠️ Building customized query pipelines
This section explains how to customize query pipelines, both by subclassing custom query components and by using simple function components. It also shows an example custom component that generates related movies and rewrites them in the voice of Shakespeare.
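A toy sketch of the subclassing route, with the validate/input-keys/output-keys/run methods the video describes (the base class here is a stand-in, not LlamaIndex's actual `CustomQueryComponent`):

```python
# Toy base class mirroring the shape of a custom query component:
# declared input/output keys, input validation, and a run method.
class ToyCustomQueryComponent:
    input_keys = frozenset()
    output_keys = frozenset()

    def validate_component_inputs(self, inputs):
        missing = self.input_keys - set(inputs)
        if missing:
            raise ValueError(f"missing inputs: {missing}")
        return inputs

    def run_component(self, **inputs):
        raise NotImplementedError

class RelatedMovieComponent(ToyCustomQueryComponent):
    input_keys = frozenset({"movie"})
    output_keys = frozenset({"output"})

    def run_component(self, **inputs):
        inputs = self.validate_component_inputs(inputs)
        # stand-in for the prompt -> LLM chain from the first section
        return {"output": f"Movies related to {inputs['movie']}"}

comp = RelatedMovieComponent()
result = comp.run_component(movie="The Departed")
```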
🚀 Async and parallel execution example
The final section explains the benefits of async and parallel execution in query pipelines with a concrete example: sending an input to multiple query engines in parallel and combining the results. The example defines four separate query engines over different chunk sizes, runs them in parallel, and merges their outputs.
Keywords
💡Query Pipelines
💡Advanced RAG (Retrieval-Augmented Generation)
💡Rewriting
💡Routing
💡Hybrid Search
💡Embedding Search
💡Response Synthesis
💡Async and Parallel Execution
💡Observability
💡Custom Components
Highlights
Introduction to query pipelines, a declarative query API for chaining different modules into a directed acyclic graph (DAG).
Advanced RAG (Retrieval-Augmented Generation) systems can include custom components such as query rewriting, routing, ranking, and response synthesis.
Query pipelines aim to abstract common patterns in RAG workflows, simplify custom workflow definitions, reduce boilerplate code, and improve system efficiency.
The ability to express simple chains, like feeding input into a prompt and then into an LLM (large language model), in a concise and readable manner.
Query pipelines support arbitrarily complex DAGs, allowing for sophisticated system compositions and interactive examples.
Visualization of query pipelines using networkx for better understanding of the system's structure.
Advantages of using query pipelines include reduced lines of code, better readability, and built-in tracing for system monitoring.
Example of a simple query pipeline chaining a prompt and an LLM to generate related movies.
Adding output parsing to the pipeline for structured outputs using Pydantic models.
Support for LM streaming in query pipelines, which is essential for a good user experience in notebooks or full-stack applications.
Building an end-to-end RAG pipeline with query rewriting and retrieval, covering basic retrieval concepts like query rewriting and HyDE (Hypothetical Document Embeddings).
Creating a full RAG pipeline using a graph-like syntax for more expressive representation of the system's workflow.
Defining custom components in a query pipeline through subclassing or using a simpler function component syntax.
Example of a custom component for generating related movies and rewriting prompts in the voice of Shakespeare.
Async and parallel execution capabilities of query pipelines, improving latency and taking advantage of system resources.
Ensemble retrieval example, running inputs in parallel to multiple RAG engines and combining results for a more comprehensive response.
The importance of observability tools like Arize Phoenix for debugging and understanding the flow of data through the system.
Overall, the video provides an introductory guide to using query pipelines in both simple and advanced RAG workflows, showcasing their versatility and potential impact on system efficiency.
Transcripts
hey everyone uh Jerry from LlamaIndex here
and in this video I'll talk about an
introduction to query
pipelines so just to set the stage
Advanced rag can have a lot of different
components inside of it and there's a
lot of stuff that you can do beyond the
simple top K rag over a vector database
so when you're actually building these
Advanced components they can oftentimes
be deeply custom um some examples shown
here include query rewriting for
instance transforming the query into
another query before you feed it into
your Downstream retrieval system uh
routing so taking a query translating it
into a set of choices retrieval being
able to actually query your uh Vector
database and doing dense search sparse
search Hybrid search Etc ranking taking
in all these different nodes and
actually reordering them based uh on the
query using a more fancy model than
dense embedding search response synthesis
actually feeding these contexts to the
LM to synthesize the response and of
course a lot of other components right
you can include arbitrary complicated
prompt chains tool use um other modules
like other storage systems graph stores
Vector DBS uh SQL DBS and Etc and so
this diagram from you know uh uh
excellent blog post here shows showcases
just a snippet of the different
components of advanced
RAG so our goal in this video is to
really introduce some abstractions that
help you one abstract away some common
patterns uh in composing some of these
Advanced rag workflows two letting you
easily Define custom workflows three
reducing boilerplate code in being able
to actually string together different
components and four improving some
systems efficiency being able to get
stuff like streaming and async out of
the
box so today I'll be talking about query
pipelines query pipelines are a
declarative query API that allow you to
chain different modules into a dag or
directed acyclic graph in future videos
we'll show how you can even compose or
combine this into an agent so you can
actually do loops over this dag and
therefore create a more sophisticated
system but for the purposes of this
video we'll do an introduction to how
this syntax and this overall API allows
you to compose these Advanced rag
workflows in the diagram shown here it
shows you a basic rag workflow with
reranking um and query rewriting and so
given a query you know you might pass it
to a prompt uh which you then pass to an
llm to then rewrite the query and of
course from there you know you you send
it through the retriever re ranker
response synthesis and of course for
these three modules at each stage you
also want to pass in the generated query
string as well um so in the end you have
a dag and when you execute this entire
dag you get back a response uh you can
also represent this as a set of nodes
and edges as shown in this diagram right
here some of the features of this query
pipeline include being able to concisely
express simple chains uh so for instance
uh feeding in an input into a prompt
then an LLM then an output parser this
is a pretty popular workflow these days
and we let you do it in a very concise
but also very readable manner we also
let you express an arbitrarily complex
dag similar to the example below uh or
beforehand but actually you know even
more complex and we'll walk through a
few interactive examples on how this
works um and so we will show you for
instance how to compose rag systems tool
use and all that stuff in this video and
also subsequent
videos uh in this screenshot you can see
basically two modes of using the query
Pipeline and again we'll go through the
notebook walkthrough but in the first
stage you can see like a sequential
chain where you know given a query
pipeline you can instantiate what we
call like a linear chain right and this
is just a sequence of values um and you
know every subsequent module follows uh
the previous module
and so an example of such a very simple
chain is just you chain together a
prompt and an LLM and when you pass an
input it flows through both modules and
gives you back an
output a more complex uh you know uh dag
might require our dag syntax and here we
just show you a very basic example where
you can Define stuff kind of more like a
graph you define the set of nodes where
you add the modules in this case it's
also just a prompt and an LLM it's a very
simple example and then you can also
Define the link between these modules uh
between the prompt and also the LLM
and you know regardless how you define
this you can combine these as well you
run this pipeline um and you B you get
back the response you feed in the input
that the first component expects like
the root module and then you get back
the output of the leaf
module you might ask before we jump into
the example why do we want to do this um
and and it's a good question right
because you can actually orchestrate all
these workflows without the query
pipeline abstraction by just
imperatively using L index modules we
have all the modules for llms prompts
retrievers respon synthesis you can just
write the python code you know in a
function with if statements while Loops
Etc to basically achieve the result that
you want you know it's fine either way I
think but we thought a little bit about
this and there are some potential
advantages of using a query pipeline
some of these include being able to
express common workflows with fewer
lines of code or boilerplate for
instance if you're stringing together like
a prompt LM output parser it's readable
to just Define it as a query pipeline
you can do it in one line of code and
you don't have to worry about like
string conversions and those types of
things uh it's there's a certain element
of just like readability when you can
actually um compose things into a chain
or a dag uh and we also will show you
how to actually visualize this so it's
kind of nice to just show what the
pipeline looks like through networkx um
uh additional benefit is just from a
system side you can pass callbacks
everywhere under the hood um so that
tracing is baked in you know we have a
variety of different partners uh where
we enable tracing and in this example
we'll show you an example with Arize
Phoenix and then in the future there's
also some other benefits by having a
more declarative syntax uh you can have
potentially easier serializability of
these components which makes it a little
bit easy to for instance like deploy
this to a different server or or Port
this to to someone else's machine um but
you know that that's kind of something
that's more work in progress
so in this video we'll go through an
introductory guide uh to using the query
pipelines uh we'll go through basically
these two docs in the documentation uh
if you click on these you can find these
in the docs yourselves one is an
introduction to llama index query
pipelines and another is query pipelines
with async and parallel execution um and
so we'll actually just directly jump
into a notebook example that covers both
both of these
aspects so let's go through a walkr of
how to use LL index query pipelines in
both simple and advanced workflows so
we'll go through some basic stuff of
just prompt chaining and then expand to
more fancy you know rag retrieval use
cases and then also go through some use
cases of async and streaming so let's
get
started the first thing we'll actually
do is just set up an observability Tool
uh LlamaIndex provides
Integrations with a lot of different
observability tools with a very easy to
use interface so in about two to four
lines of code you can set up uh trace a
callback and by calling set Global
Handler you're able to log all LlamaIndex
traces to a provider in this case we'll
use Arize
Phoenix and let's actually just go
through this and we'll also Define some
imports so here um you'll see that
there's um one main import which is from
llama_index.query_pipeline import
QueryPipeline all other Imports will be used
in Downstream modules like llms prompts
Vector
stores the next part is we'll load in
the classic Paul gram essay this is our
favorite example if you follow the quick
start tutorial La
index and you'll see that the docs look
something like this what I work
on
after we load in the documents we want
to define a vector store over these
documents so we call Vector store index
stop from documents on the docs this
takes in the documents parses it
transforms it by chunking it into a
bunch of little chunks and then calls an
embedding model on each chunk and puts
it into an in-memory Vector store of
course we have a lot of different Vector
store Integrations we also have a you
know default chunk size and embedding
model that we use and then afterwards we
persist it onto a storage so we call
Storage context. persist into
storage this is just some wrapper code
that says you know if it doesn't already
exist build it if it already exists then
just load the index from disk so let's
call all
this now we're ready to go through some
of the usage uh use cases of query
pipelines in the very first example uh
we'll just chain together a prompt and an LLM
so we actually don't use the vector
store index uh right now but we'll use
it in a few subsequent sections in this
case we'll just do the simple workflow
of we have a prompt and an llm and let's
use our query pipeline syntax to combine
them together if the prompt is please
generate related movies to movie name
where movie name is a template variable
we then want to pass the formatted
prompt into an
llm what we can do is Define a query
pipeline that chains a prompt template
and an llm together that way given any
sort of input it'll first go through the
prompt template and the output is a
fully formatted prompt that then gets
passed into the LM and then the output
is the LM output so let's run this very
simple example in action where we do
output equals p. Run movie name equals
The
Departed you'll see that by calling
verbose equals true it logs the inputs
to each
module and then let's take a look at the
output and we'll see a list of 10
generated
movies just quick note on this
declarative syntax of a query pipeline
versus the imperative syntax that you
could just do yourself already you can
absolutely create this using
existing LlamaIndex modules and this
actually just walks you through what
that looks like given a prompt in llm
and given an input you have a movie name
The
Departed you just need to call Prompt
template. format movie name equals movie
name take the output and then pass it in
as a chat message to the LM to use lm.
chat and the chat message uh is a
container that takes in some text as
well as the role itself once it takes in
this input it generates the output and
we can run this and we'll get back the
same result you'll see that you know you
can basically do either one the main
advantage of a query pipeline is that
the syntax is a little bit more concise
um a little bit more readable and that's
just one of these advantages and you'll
see some of the other advantages down
the road as well including ease of
visualization uh and being able to run
stuff like async and parallel under the
hood after combining the prompt with LM
let's add an additional module of output
parsing so let's say we actually want
structured outputs so what do we do if
we want an output like this where we
have have uh class movies where movies
takes in a list of movie objects and
both movies and movie is a Pydantic model
right each movie contains a name as well
as a year of the movie and movies just
contains a list of
movies then we can add this as a Pydantic
output parser um a Pydantic output parser
it just takes in some string and then
parses it out you know uh the Json from
that string into a pantic object and
this will be the last component in our
chain the other component that we're
going to need to modify is a prompt
where instead of just saying please
generate related movies we're going to
need to modify a little bit to also tell
the LLM to output stuff in JSON
format so here we see this is what the
Json prompt string looks like please
generate related movies to the movie
name here is the Json schema to follow
um so this is the output pantic schema
and output a valid JSON object
we can now Define the query pipeline
chain which now contains this modified
prompt template the LM as well as the
output parser and if we run this on for
instance Toy Story as a movie we'll run
through all three modules and get back
one you know a Json that goes into the
output parser and we get back a Pydantic
object this is uh movies object with
movie Finding Nemo cars Monsters And The
Incredibles
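the parsing step just described can be sketched without Pydantic, using dataclasses and the standard library (a stand-in for the notebook's actual code):

```python
import json
from dataclasses import dataclass

# Output-parsing sketch: the LLM is asked to answer in JSON matching a
# schema, and the parser turns that JSON string into typed objects
# (the video uses Pydantic models; dataclasses keep this dependency-free).
@dataclass
class Movie:
    name: str
    year: int

@dataclass
class Movies:
    movies: list

def parse_movies(llm_output: str) -> Movies:
    data = json.loads(llm_output)
    return Movies(movies=[Movie(**m) for m in data["movies"]])

# pretend this string came back from the LLM
raw = '{"movies": [{"name": "Finding Nemo", "year": 2003}, {"name": "Cars", "year": 2006}]}'
result = parse_movies(raw)
```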
so that's output parsing and next
let's walk through streaming support so
the query pipelines have LM streaming
support uh streaming is a fundamental
part of having a good ux um whether
you're building stuff in a notebook or
building a full stack application and
you you just need to toggle lm. as query
component streaming equals true here um
if the llm is the last component in this
overall pipeline then the output of this
pipeline will be streaming and if if
it's a middle component then the outputs
will be be collected and then passed to
Downstream
modules so let's walk through a basic
chain of multiple prompts with streaming
where the first prompt is please
generate related movies to the movie
name and then the second prompt is given
some sort of text uh which is you know
the set of movies please rewrite this
with the summary of each
movie uh and the output here you see we
Define an llm object equal to OpenAI but
then we just call lm. as query component
streaming equals true um why do we do
this it's just because the query
pipeline expects generally a set of
query components um when you pass in a
default object like LMS or query
entrance or retrievers it actually gets
autoc converted under the hood but you
can of course explicitly Define this if
you want um and when you explicitly
Define this you have the option to pass
in additional parameters like stream
equals true as well as
partials so this chain contains a prompt
the LLM as well as the second prompt and
llm and if we run this over a given
movie The Dark Knight it'll run through
all these different modules and you can
see the output is
streaming
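the streaming rule just described, where a mid-chain streaming LLM gets collected but a final one streams through, can be sketched with plain generators (a toy model, not the LlamaIndex implementation):

```python
import inspect

# Streaming sketch: a streaming "LLM" is modeled as a generator of
# tokens. If it is the last module in the chain, the generator is
# returned as-is; if another module follows, the tokens are collected
# into a string before being passed downstream.
def fake_streaming_llm(prompt):
    for token in ["The", " Dark", " Knight", " sequels"]:
        yield token

def summarizer(text):
    return f"summary of: {text}"

def run_chain(modules, value):
    for i, module in enumerate(modules):
        value = module(value)
        is_last = i == len(modules) - 1
        if inspect.isgenerator(value) and not is_last:
            value = "".join(value)  # collect tokens for downstream modules
    return value

stream = run_chain([fake_streaming_llm], "related movies prompt")
collected = run_chain([fake_streaming_llm, summarizer], "related movies prompt")
```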
of course you can also feed the
streaming output to an output parsing
module as well so given a basic Json
prompt template pass it to an LM that is
streaming you can parse it to an output
parsing module and then you can see the
structured output right
here the next step is to show you not
just how to do prompt orchestration but
also build an end-to-end RAG
pipeline um and in this first example
we'll show you a basic query rewriting
and retrieval workflow and this will
cover some basic uh retrieval Concepts
such as query rewriting as well as HyDE
and then we'll plug it into a
retriever here we can still Define a
chain where the first prompt is please
generate a concise question about Paul
Graham's life regarding the following
topic so your first input is a topic and
then this will convert it into a
question and then the second prompt says
please write a passage to answer these
question try to include as many key
details as possible note that this does
not actually take in any context it just
tries to use the LLM's prior to answer
the question so this is essentially the
hide technique right given a question
try to hallucinate the answer and then
actually use that answer the
hallucinated answer to um as the input
to a retriever to try to fetch relevant
results so this query pipeline contains
the first prompt generate question to LLM
second prompt which is the HyDE prompt
generate passage to LM and then take the
output and pass it to a
retriever uh we can ignore this
reranking module for now go ahead and comment
this
out now given you know um an input topic
college we'll run this and then it'll
actually run through all four of these
modules given an input topic please
generate concise question about his life
regarding the following topic you can
see the input is how did Paul Graham's
college experience shape his career and
entrepreneurial
mindset and then this is the
hallucinating
answer this is a good time to actually
jump to
Phoenix and you can see that all the
traces and chains that we've already
called have already been logged um and
so from the most basic single llm calls
to the slightly more advanced uh
multi-LLM calls and so here you can see
that you know the items that
we've logged are two subsequent LLM
calls question generation and then
passage writing so this is uh the HyDE
prompt and then the retrieval method
right and the HyDE passage you know
given an input will hallucinate the
answer you can see that it says Paul
Graham um went to Cornell which is not
true so that part is hallucinated um but
then the idea is that you use this
hallucinated passage as an input into
the retriever to try to fetch the actual
Knowledge from the knowledge bank and
these are the set of output
documents so definitely encourage you to
check out these observability
Integrations they're pretty neat and
it's a nice way to debug your
flow so that was actually just the
retrieval flow but now let's actually
create a full rag pipeline to to create
a full rag pipeline we need to use a
slightly different syntax um because we
can't just Define a chain a chain just
assumes that modules are linear and so
one module follows the the previous one
but to define a full rag pipeline uh
some nodes might have multiple outputs
or multiple edges some nodes might take
in multiple edges and so you need a more
expressive syntax to actually represent
that um basically you need a graph-like
syntax so to define a rag pipeline you
essentially just need to Define some
sort of graph
um this is a very declarative approach
towards building a pipeline and in this
first example we'll show you how to
build a rag Pipeline with query
rewriting so this rag pipeline just
consists of the following components um
it takes in an input rewrites it um into
a query passes it to a retriever then a
ranker then a response synthesis module
so the modules here are again you know
question generation given a
topic passes it to the downstream
retriever then a Cohere rerank
module and then tree summarize um this
is a response synthesis module that
takes in input context or input nodes
and hierarchically summarizes it given a
query to try to answer a
question so this is a core module in
LlamaIndex so um let's run
this and then um once we have these
modules to the next step is to define
the query
pipeline you see that here instead of
defining a chain we first actually just
register the modules each module
contains a key mapped to the module
value itself so here is query pipeline
um here's llms prompts retrievers
summarizers
rerankers so let's run
that the next step is to draw links
between these modules um and instead of
again kind of like defining a chain
we'll just manually Define the edges
between these
modules we'll link the prompt to the LM
similar to before and then the LM to the
retriever right so inputs flow through
the prompt to the LM gets Rewritten into
a query that query goes into the
retriever this retriever the output of
this retriever goes into the re ranker
but the main thing is the re ranker
actually takes in uh two inputs right it
takes in the nodes from the retriever
but it also takes in the original input
query so actually when you're defining
these links to the ranker you you need
to explicitly specify which input is
corresponding to and this is what you do
through the destination keys so the dest
key nodes is the um you know the reranker
expects nodes and this is what the
retriever supplies the reranker also
expects query strings and that's what
the LM supplies so let's run
that you can take a look at the
summarizer required input Keys which is
again query string and
nodes next we can actually visualize
this overall graph and this is a nice
property of a query pipeline which is
because it's declarative and you define
this as the overall dag you can easily
visualize this um here's an input prompt
to an
llm and then the llm goes to you know a
retriever um this retriever goes to a
ranker along with the input from the LM
and then the summarizer takes the
outputs of the reranker as well as the
LLM let's run
this you'll see we've run all the
modules end to end and we get back a
final
response
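the destination-key linking just described can be sketched as a small table of (source, dest, dest_key) links, with toy modules standing in for the LlamaIndex classes:

```python
# Dest-key link sketch: the reranker needs two inputs (nodes from the
# retriever, query_str from the LLM), so each link names which
# destination key it feeds.
def run_dag(modules, links, order, root_input):
    outputs = {}
    for key in order:  # order is a valid topological order of the DAG
        incoming = {dest_key: outputs[src]
                    for src, dest, dest_key in links if dest == key}
        if incoming:
            outputs[key] = modules[key](**incoming)
        else:  # root module receives the pipeline input
            outputs[key] = modules[key](root_input)
    return outputs[order[-1]]

modules = {
    "llm": lambda topic: f"question about {topic}",
    "retriever": lambda query_str: [f"doc for {query_str}"],
    "reranker": lambda nodes, query_str: sorted(nodes),
    "summarizer": lambda nodes, query_str: f"{len(nodes)} node(s) answering {query_str!r}",
}
links = [
    ("llm", "retriever", "query_str"),
    ("retriever", "reranker", "nodes"),
    ("llm", "reranker", "query_str"),
    ("reranker", "summarizer", "nodes"),
    ("llm", "summarizer", "query_str"),
]
answer = run_dag(modules, links,
                 ["llm", "retriever", "reranker", "summarizer"], "college")
```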
great the next step is to show you what
a rag pipeline looks like without query
rewriting which is and sounds very
similar um and actually the main reason
we show you this is the use of a special
component called the input component so
in this case we do from llama index.
query pipeline import input
component and the main use for the input
component is in cases where you actually
want to link the input to multiple
Downstream modules instead of just one
Downstream module in that case you
actually need basically a special
placeholder component um that allows you
to you know register the input as a
module so here we register it under
input as a key to find the input
component and then we add a link from
the input to multiple downst modules to
set up you know a rag pipeline the input
which is the query needs to be linked to
the retriever but also the downstream
response synthesis module right like both
of these take in the input and this
essentially provides you a placeholder
variable to do
this so we run
this and then run this rag pipeline what
did the author do in
YC and this is the classic rag pipeline
you know without query re uh rewriting
or even reranking and then you get back
in
output the next section here is learning
how to define a custom component in a
query pipeline um there's actually a
variety of ways to do this um you can uh
here we show you how to subclass uh
custom query components
um and so you define a subclass and
Implement a set of required functions
and so this is the most comprehensive
way of doing so and allows you to
define stuff like uh
validation input keys and so on and so
forth there's also a simpler syntax
where you can actually just pass in an
arbitrary function into a function
component I'll show you a little bit of
that it's not defined in this notebook
but will be defined in subsequent
notebooks here to subclass a custom query
component we want a subclass
to define a related movie
component the main methods you need to
implement are validate component inputs
input Keys output keys and run component
run component is where you define the
core
logic um input Keys is where you define
the expected input
variables um you also can call validate
component inputs to um decide whether or
not the inputs passed in from other
modules are valid so you can throw an
error if they're not and it's
implemented similar to a Pydantic
validator so the input just gets passed
through and you can modify along the
way you also Define a set of output Keys
um to define the number and the type of
outputs that you
have so after you define that you can
Define run component and in this case we
just use the simple query pipeline that
we Define in the very first section
which again is just a prompt and LLM uh
given an input movie name just you know
run it through a chain of prompt and LLM
to give you back an output you see we
pass in the input key which is in kar's
movie we run this Pipeline and then we
return this as a dictionary uh the
dictionary contains the output Keys
which is just output and then the output
value so let's run
this and then actually you know right
below this I show you a very dumb simple
example of what a function component
looks like which also is a way of
defining custom components but is way
simpler um all you have to do is import
function component Define any sort of
function you want right this is a dummy
foo function and then just Define a
function component over it what this
will do is it'll Auto inspect the
function signature and Define those as
input Keys it'll inspect the output
signature and then um it'll just run
your function as a component so we'll
show this more in later um videos and
tutorials as we set up more advanced
stuff but just keep this in mind for the
future so let's try this custom
component out we'll run this custom
component which again generates related
movies and then also just run it through
a prompt afterwards that says can you
rewrite this in the voice of Shakespeare
right so at this point we all know the
drill we take in a component um run it
generates for related movies passes into
this prompt and then an LM and this is
back to our favorite chain syntax and
then um once we Define this module and
then uh run this overall
pipeline we're able to get back a list
of movies um whose summaries are
actually uh written in a Shakespearean
voice so this finished running and we
can see this is what the output looks
like um it outputs a list of movies and
this is the
output okay last but not least we'll
Define um we'll show you how to do
async and parallel execution with our
query pipelines one of the benefits of
query pipelines is the fact that you can
run them in an async or parallel fashion
and we'll handle the coordination under
the hood without you having to worry
about whether or not your function is
truly async or parallel so we try to
run it in the most optimal manner this
improves latency um and takes advantage
of your system so to really take
advantage of this feature we want to
Define an ensemble retrieval example
where we actually take in an input and
send it in parallel to four different
rag engines uh take the results and then
combine them into one response and so
we'll send this uh we'll Define this toy
example which is similar to Ensemble
retrieval or some sort of fusion and
we'll Define a different query engine
over the same data one per chunk size so
if our chunk sizes are 128 256 512 and
1024 we'll Define four separate rag
Pipelines over these chunk sizes and
then send an input to all four of these
in parallel and then combine the results
at the end this code snippet just takes
each chunk size and uh splits up the
document as such and defines it as a
separate vector store and then stores
everything in a query engine dictionary
so the query engine dictionary is just a
mapping from the chunk size string um to
the output query engine right so you'll
have four elements after running this uh
cell
so the key again is keyed by chunk size
and then the output is a query
engine next we'll construct a query
pipeline you'll see that the set of
modules we Define include the query
engines that we defined above so the IDS
are the chunk size um and then the the
value is the corresponding query engine
we'll Define an input component a
summarizer component and and then what
we call a join component you'll see this
in the graph visualization in just a bit
but the join component takes in an
arbitrary set of input edges from other
modules uh from Upstream modules and
then similar to the pack you know syntax
in Python where you can take in some
arbitrary components via star args and
packs it into a list this does something
similar so it takes in an arbitrary
number of edges and packs it into a list
of objects and then this list of objects
is a single output that you can then
pass to other down
modules what happens here is first we
will add a link from the input to every
chunk size um and again every chunk size
is the query engine right so we pass the
input to every to the four query engines
we then take the outputs of the these
query engines and Link them to the join
module so this join module is taking the
outputs from all the query
engines it'll pack them into a list so
now you have a list of different nodes
there's actually convert function which
will convert the outputs into a node
with score right so this is element wise
and convert it into a list of nodes and
it'll pass this list of nodes to the
summarizer right with the destination
key
nodes the summarizer of course also
takes in the input
query so let's run this and let's
visualize this right just to make this
Crystal Clear we take a look at this
input um here is the
input
it links to chunk sizes 128 256 512 1024
these are all separate query
engines all of these outputs flow into
the join module Which packs it into a
list this list goes into the summarizer
which takes in the set of nodes as well
as the input and then returns an
output so now that we visualize this
let's run this we'll see that by running
this in an async fashion um await p dot
arun it actually runs very quickly um
if you can tell this actually ran all
four of these Retrievers in parallel the
128 256
512 1024 all these query engines were run in
parallel and the final output is 3.53
seconds right and the benefit of running
all these in parallels each of these
actually requires an LM call so it's not
just retrieval um so it does
legitimately take some
time in the meantime if if you compare
this with the synchronous
method you'll see that it does run
everything sequentially right 128 256
512 doesn't take advantage of
parallelizable function
calling and then at the end of the day
it takes 8.4
seconds so that's all the sections in
this introductory query pipeline video
and of course we'll highlight more
advanced use cases and other components
and exciting features in subsequent
videos but thank you for listening and
take
care