Introduction to Query Pipelines (Building Advanced RAG, Part 1)

LlamaIndex
9 Feb 2024 · 33:01

Summary

TLDR: This video introduces query pipelines and their applications. A query pipeline is a declarative API for defining complex tasks simply and executing them efficiently. The video shows how to build advanced workflows such as query rewriting, retrieval, reranking, and response synthesis, and how to combine them efficiently into complex systems. It also touches on advanced query pipeline features such as async processing and parallel execution.

Takeaways

  • 📚 Introduction: Explains the concept of query pipelines and the flexibility and customizability they bring to building advanced retrieval systems.
  • 🔍 Query pipelines: A declarative API that chains different modules into a DAG (directed acyclic graph), providing an abstraction for building complex retrieval systems.
  • 🔧 Better functionality: Query pipelines abstract common patterns and make custom workflows easy to define.
  • 🚀 Better efficiency: Query pipelines reduce boilerplate code and let you combine different components to improve system efficiency.
  • 🔄 Simple chains: You can feed an input through a simple chain of a prompt, an LLM, and an output parser.
  • 🌐 Complex DAGs: You can express more complex DAGs, combining modules such as prompts, LLMs, retrievers, rerankers, and response synthesis.
  • 📈 Visualization: Query pipelines can be visualized with a network visualization tool.
  • 🔁 Loops and complex systems: Query pipelines can be used to build loops and more complex systems, enabling advanced retrieval pipelines.
  • 🌟 Custom components: You can create subclassed custom query components or simple function components.
  • ⚙️ Async/parallel execution: Query pipelines can run asynchronously or in parallel, improving system performance.
  • 🛠️ Worked examples: Retrieval-system examples covering query rewriting, retrieval, and response synthesis show how to use query pipelines in practice.
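
The "simple chain" takeaway (input → prompt → LLM → output parser) can be sketched in a few lines of pure Python. This is a conceptual stand-in with a stubbed LLM, not the actual LlamaIndex `QueryPipeline` API:

```python
# Minimal sketch of a declarative chain: each module is a callable,
# and the pipeline threads the output of one module into the next.
# The "LLM" here is a deterministic stub standing in for a real model.

def make_chain(*modules):
    """Return a pipeline that runs `modules` left to right."""
    def run(value):
        for module in modules:
            value = module(value)
        return value
    return run

prompt = lambda movie: f"Please generate related movies to {movie}"
stub_llm = lambda text: text.upper()       # stand-in for a chat-model call
output_parser = lambda text: text.split()  # stand-in for structured parsing

pipeline = make_chain(prompt, stub_llm, output_parser)
result = pipeline("The Departed")
```

The point of the declarative form is that the three stages are listed once, in order, with no glue code between them.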

Q & A

  • What tasks were query pipelines introduced to automate?

    - Query pipelines are an API for automating tasks such as query rewriting, routing, reranking, and response synthesis.

  • What are the advantages of query pipelines?

    - Query pipelines abstract common patterns, make custom workflows easy to define, reduce boilerplate code, let you combine different components, and improve system efficiency.

  • How does the query pipeline syntax work?

    - A query pipeline is a declarative API that composes different modules by chaining them into a DAG (directed acyclic graph). This lets you express complex workflows concisely and readably.

  • What simple chain was shown as an example of a query pipeline?

    - As a simple chain example, a workflow was shown that passes an input to a prompt, then through an LLM (large language model), and finally to an output parser.

  • What is the query pipeline DAG syntax?

    - The DAG syntax is used to define a graph-like structure as a set of nodes and edges, which lets you express more complex workflows.

  • How do query pipelines support streaming?

    - Query pipelines support streaming, which helps both when developing in a notebook and when improving the UX of a full-stack application.

  • How do you add custom components to a query pipeline?

    - To add a custom component, you can either subclass a custom query component or pass an arbitrary function in as a function component. This lets you implement your own logic and validation.

  • What are the benefits of async and parallel execution?

    - With async and parallel execution, query pipelines run in an optimal way, improving latency and making full use of the system's performance.

  • What ensemble retrieval example was shown with query pipelines?

    - In the ensemble retrieval example, four different retrieval engines are defined over the same data with different chunk sizes (128, 256, 512, and 1024), run in parallel, and their results are combined.

  • How does visualizing a query pipeline help?

    - Visualizing a query pipeline helps you grasp the overall workflow and aids debugging and troubleshooting.

  • What are the main things to watch out for when using query pipelines?

    - The main things to watch out for are choosing the right modules, connecting them correctly, and implementing error handling and validation logic properly.
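
The ensemble-retrieval answer above (the same data indexed at chunk sizes 128/256/512/1024, queried in parallel, results merged) can be sketched with `asyncio`. The retrievers here are toy stand-ins, not LlamaIndex retrievers:

```python
import asyncio

# Toy ensemble retrieval: four "retrievers" (one per chunk size) are
# queried concurrently with asyncio.gather and their results merged
# with de-duplication, preserving chunk-size order.

CORPUS = "Paul Graham wrote essays about startups and programming".split()

async def retrieve(chunk_size: int, query: str) -> list[str]:
    await asyncio.sleep(0)  # simulate I/O; a real retriever awaits a DB call
    # Stand-in scoring rule: return words sharing a first letter with the
    # query; larger chunk sizes return more results.
    matches = [w for w in CORPUS if w[0].lower() == query[0].lower()]
    return matches[: chunk_size // 128]

async def ensemble(query: str) -> list[str]:
    results = await asyncio.gather(
        *(retrieve(cs, query) for cs in (128, 256, 512, 1024))
    )
    merged, seen = [], set()
    for hits_per_engine in results:  # merge, dropping duplicate documents
        for hit in hits_per_engine:
            if hit not in seen:
                seen.add(hit)
                merged.append(hit)
    return merged

hits = asyncio.run(ensemble("programming"))
```

Because the four `retrieve` coroutines are awaited together, total latency is close to the slowest single retriever rather than the sum of all four.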

Outlines

00:00

📚 Introduction and query pipeline basics

This section introduces Jerry from LlamaIndex and explains the concept of query pipelines and why they matter. Advanced retrieval systems (advanced RAG) go beyond simple top-k retrieval over a vector database and include customizable components such as query rewriting, routing, reranking, and response synthesis. The goal of the video is to show how to abstract, define, and combine these advanced workflows to build systems efficiently.

05:00

🔍 Advantages and applications of query pipelines

This section explains the advantages of query pipelines: simpler code, better readability, the ability to trace a sequence of steps, and easier serialization. It also introduces basic workflows built with query pipelines and a more advanced example combining a prompt, an LLM, and an output parser.

10:02

📈 Running query pipelines and streaming support

This section explains how to run query pipelines and how streaming works, from a simple prompt-and-LLM chain to adding an output parser. It also covers why streaming support matters and how to use it.

15:03

🔄 Query rewriting and retrieval workflow

This section introduces a query rewriting and retrieval workflow. As a concrete example, it walks through generating a question about Paul Graham's life and retrieving the answer. The process demonstrates a complete workflow spanning query rewriting, an LLM, retrieval, and response synthesis modules.

20:06

🛠️ Building customized query pipelines

This section explains how to customize query pipelines: by subclassing a custom query component, or by using a simple function component. It also provides an example custom component that generates related movies and rewrites them in the voice of Shakespeare.

25:10

🚀 Async and parallel execution examples

The final section explains the benefits of async and parallel execution of query pipelines, with a concrete example of sending an input to multiple retrieval engines in parallel and combining the results. The example defines four separate retrieval engines with different chunk sizes, runs them in parallel, and merges the results.


Keywords

💡Query Pipelines

Query Pipelines is a declarative API for chaining different modules into a single graph. The video explains how to use query pipelines to define complex tasks in a simple, readable form: for example, transforming a search query, passing it to a retriever and a reranking module, and finally synthesizing a response.

💡Advanced RAG (Retrieval-Augmented Generation)

Advanced RAG is an advanced system that combines retrieval with a generation model. The video introduces advanced RAG capabilities and explains how to implement them with query pipelines. A RAG system first retrieves information and then generates text grounded in what it retrieved.

💡Rewriting

Rewriting is the process of transforming a search query into another query. The video shows how to rewrite a query and pass it to a retriever, which yields more relevant retrieval results and improves the quality of the final generated text.

💡Routing

Routing is the process of translating a query into a set of choices, used to send the query to different modules. The video explains how routing directs a query to the most appropriate module.
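
Routing, translating a query into a choice among downstream modules, can be sketched as a selector in front of several handlers. A toy keyword rule stands in for the LLM-based selector a real system would use; the handler names here are hypothetical:

```python
# Toy router: pick a downstream module based on the query. A real
# system would ask an LLM to choose; a keyword rule stands in here.

def vector_search(query: str) -> str:
    return f"vector results for: {query}"

def sql_query(query: str) -> str:
    return f"SQL results for: {query}"

ROUTES = {
    "table": sql_query,        # structured questions go to the SQL engine
    "default": vector_search,  # everything else goes to semantic search
}

def route(query: str) -> str:
    choice = "table" if "table" in query.lower() else "default"
    return ROUTES[choice](query)

answer = route("Which table has revenue by region?")
```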

💡Hybrid Search

Hybrid Search combines multiple search methods. The video explains how hybrid search lets you query not only a vector database but other data sources as well, producing more comprehensive results and improving system effectiveness.

💡Embedding Search

Embedding Search runs a search using vector representations of data points. The video explains how to search a vector database with embedding search, which provides powerful retrieval and improves the accuracy of results.

💡Response Synthesis

Response Synthesis is the process of synthesizing the final answer text. The video explains how response synthesis produces a single, unified output from multiple inputs, improving the quality of the final response.
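
One common response-synthesis strategy, hierarchically merging per-chunk answers until a single response remains (the "tree summarize" style used later in the video), can be sketched like this; the `combine` stub stands in for an LLM summarization call:

```python
# Toy hierarchical (tree) summarization: pairwise-merge chunk summaries
# level by level until one response remains. `combine` stands in for
# an LLM call that merges two partial answers.

def combine(a: str, b: str) -> str:
    return f"({a}+{b})"

def tree_summarize(chunks: list[str]) -> str:
    level = list(chunks)
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):  # merge adjacent pairs
            pair = level[i:i + 2]
            next_level.append(pair[0] if len(pair) == 1 else combine(*pair))
        level = next_level
    return level[0]

summary = tree_summarize(["c1", "c2", "c3", "c4", "c5"])
```

With n chunks this makes O(n) merge calls but only O(log n) sequential levels, so each level's merges can in principle run in parallel.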

💡Async and Parallel Execution

Async and parallel execution means running work asynchronously or concurrently. The video explains how query pipelines can run multiple tasks at once, improving system efficiency and reducing processing time.

💡Observability

Observability is the ability to monitor and understand a system's state and behavior. The video explains how to trace query pipeline execution with observability tools, which makes troubleshooting and debugging easier.

💡Custom Components

Custom Components are user-defined modules created to extend or customize existing functionality. The video explains how to add your own components to a query pipeline, letting you tailor the system to your requirements.
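
The simpler of the two customization options, wrapping a plain function so it can be dropped into a pipeline, can be illustrated with a minimal function-component adapter. This is a pure-Python sketch, not the actual LlamaIndex classes:

```python
# Minimal sketch of a "function component": wrap any plain function,
# with an optional validation hook, so it behaves like a pipeline module.

class FnComponent:
    def __init__(self, fn, validate=None):
        self.fn = fn
        self.validate = validate

    def __call__(self, value):
        if self.validate is not None:
            self.validate(value)  # run custom input validation first
        return self.fn(value)

def must_be_str(value):
    if not isinstance(value, str):
        raise TypeError("expected a string input")

# Toy "rewrite in the voice of Shakespeare" component from the video's
# example, with a deterministic stub instead of an LLM call.
shakespeare = FnComponent(lambda text: f"Hark! {text}", validate=must_be_str)
line = shakespeare("related movies to The Departed")
```

Subclassing buys you more control (multiple named inputs, custom validation logic); wrapping a function is the quick path for single-input transforms like this one.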

Highlights

Introduction to query pipelines, a declarative query API for chaining different modules into a directed acyclic graph (DAG).

Advanced RAG (Retrieval-Augmented Generation) systems can include custom components such as query rewriting, routing, ranking, and response synthesis.

Query pipelines aim to abstract common patterns in RAG workflows, simplify custom workflow definitions, reduce boilerplate code, and improve system efficiency.

The ability to express simple chains, like feeding input into a prompt and then into an LLM (large language model), in a concise and readable manner.

Query pipelines support arbitrarily complex DAGs, allowing for sophisticated system compositions and interactive examples.

Visualization of query pipelines using networkx for better understanding of the system's structure.

Advantages of using query pipelines include reduced lines of code, better readability, and built-in tracing for system monitoring.

Example of a simple query pipeline chaining a prompt and an LLM to generate related movies.

Adding output parsing to the pipeline for structured outputs using Pydantic models.

Support for LLM streaming in query pipelines, which is essential for a good user experience in notebooks or full-stack applications.

Building an end-to-end RAG pipeline with query rewriting and retrieval, covering basic retrieval concepts like query rewriting and HyDE (Hypothetical Document Embeddings).

Creating a full RAG pipeline using a graph-like syntax for more expressive representation of the system's workflow.

Defining custom components in a query pipeline through subclassing or using a simpler function component syntax.

Example of a custom component for generating related movies and rewriting prompts in the voice of Shakespeare.

Async and parallel execution capabilities of query pipelines, improving latency and taking advantage of system resources.

Ensemble retrieval example, running inputs in parallel to multiple RAG engines and combining results for a more comprehensive response.

The importance of observability tools like Arize Phoenix for debugging and understanding the flow of data through the system.

Overall, the video provides an introductory guide to using query pipelines in both simple and advanced RAG workflows, showcasing their versatility and potential impact on system efficiency.
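
The HyDE step in the highlights, hallucinate a passage first and then retrieve with it, can be sketched with a stubbed LLM and a word-overlap retriever (both are toy stand-ins; the documents and the stub's output are made up for illustration):

```python
# Toy HyDE: answer the question from the model's "prior" (a stub here),
# then use that hallucinated passage, not the raw question, as the
# retrieval query for a simple word-overlap retriever.

DOCS = [
    "Paul Graham studied at Cornell and Harvard before starting Viaweb",
    "Query pipelines chain modules into a DAG",
]

def stub_llm(question: str) -> str:
    # Stand-in for an LLM hallucinating a detailed (partly wrong) passage.
    return "Paul Graham studied painting and started Viaweb after Harvard"

def retrieve(query: str, k: int = 1) -> list[str]:
    words = set(query.lower().split())
    # Rank documents by word overlap with the query, highest first.
    ranked = sorted(DOCS, key=lambda d: -len(words & set(d.lower().split())))
    return ranked[:k]

question = "Where did Paul Graham go to college?"
hyde_passage = stub_llm(question)
results = retrieve(hyde_passage)
```

Even though the hallucinated passage contains errors, it shares far more vocabulary with the relevant document than the short question does, which is the intuition behind HyDE.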

Transcripts

00:00

Hey everyone, Jerry from LlamaIndex here, and in this video I'll talk about an introduction to query pipelines. Just to set the stage: advanced RAG can have a lot of different components inside of it, and there's a lot you can do beyond simple top-k RAG over a vector database. When you're actually building these advanced components, they can often be deeply custom. Some examples shown here include query rewriting, for instance transforming the query into another query before you feed it into your downstream retrieval system; routing, taking a query and translating it into a set of choices; retrieval, being able to actually query your vector database and do dense search, sparse search, hybrid search, etc.; reranking, taking in all these different nodes and reordering them based on the query using a fancier model than dense embedding search; and response synthesis, actually feeding these contexts to the LLM to synthesize the response. And of course there are a lot of other components: you can include arbitrarily complicated prompt chains, tool use, and other modules like other storage systems (graph stores, vector DBs, SQL DBs, and so on). The diagram here, from an excellent blog post, showcases just a snippet of the different components of advanced RAG.

01:22

Our goal in this video is to introduce some abstractions that help you: one, abstract away some common patterns in composing these advanced RAG workflows; two, easily define custom workflows; three, reduce the boilerplate code needed to string different components together; and four, improve system efficiency, getting things like streaming and async out of the box.

01:51

So today I'll be talking about query pipelines. Query pipelines are a declarative query API that allows you to chain different modules into a DAG, or directed acyclic graph. In future videos we'll show how you can even compose or combine this into an agent, so you can actually do loops over this DAG and therefore create a more sophisticated system; but for the purposes of this video we'll do an introduction to how this syntax and overall API allow you to compose these advanced RAG workflows. The diagram shown here is a basic RAG workflow with reranking and query rewriting: given a query, you might pass it to a prompt, which you then pass to an LLM to rewrite the query, and from there you send it through the retriever, reranker, and response synthesis. For these three modules you also want to pass in the generated query string at each stage. So in the end you have a DAG, and when you execute this entire DAG you get back a response. You can also represent this as a set of nodes and edges, as shown in this diagram right here.

02:57

Some of the features of this query pipeline include being able to concisely express simple chains, for instance feeding an input into a prompt, then an LLM, then an output parser. This is a pretty popular workflow these days, and we let you do it in a very concise but also very readable manner. We also let you express an arbitrarily complex DAG, similar to the earlier example but even more complex, and we'll walk through a few interactive examples of how this works. We'll show you, for instance, how to compose RAG systems, tool use, and all that stuff in this video and also subsequent videos.

03:37

In this screenshot you can see basically two modes of using the query pipeline (and again, we'll go through the notebook walkthrough). In the first, you can see a sequential chain: given a query pipeline, you can instantiate what we call a linear chain, which is just a sequence of modules where every subsequent module follows the previous one. An example of such a very simple chain is just chaining together a prompt and an LLM: when you pass an input, it flows through both modules and gives you back an output. A more complex DAG might require our DAG syntax, and here we show a very basic example where you can define things more like a graph: you define the set of nodes, where you add the modules (in this case also just a prompt and an LLM), and then you define the links between these modules, between the prompt and the LLM. Regardless of how you define it, you run the pipeline the same way: you feed in the input that the first component (the root module) expects, and you get back the output of the leaf module.

04:47

Before we jump into the example, you might ask why we want to do this at all, and it's a good question, because you can actually orchestrate all these workflows without the query pipeline abstraction by just imperatively using LlamaIndex modules. We have all the modules for LLMs, prompts, retrievers, and response synthesis; you can just write the Python code in a function with if statements, while loops, etc. to achieve the result that you want. It's fine either way, I think, but we thought a little bit about this and there are some potential advantages of using a query pipeline. These include being able to express common workflows with fewer lines of code or boilerplate: for instance, if you're stringing together a prompt, an LLM, and an output parser, it's readable to just define it as a query pipeline, you can do it in one line of code, and you don't have to worry about string conversions and those types of things. There's a certain element of readability when you can compose things into a chain or a DAG, and we'll also show you how to visualize this; it's nice to just show what the pipeline looks like through networkx. An additional benefit is that, on the systems side, you can pass callbacks everywhere under the hood, so tracing is baked in; we have a variety of different partners where we enable tracing, and in this example we'll show you Arize Phoenix. And in the future there are also some other benefits of having a more declarative syntax: potentially easier serializability of these components, which makes it a little bit easier to, for instance, deploy this to a different server or port it to someone else's machine. But that's more of a work in progress.
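
The chain-versus-DAG distinction described above can be made concrete with a tiny DAG runner: modules are registered by key, links carry a destination key so one node can feed a specific named input of another (the way the reranker later in the video needs both `nodes` and `query_str`). This is a pure-Python sketch of the idea, not the LlamaIndex `QueryPipeline`:

```python
# Tiny DAG pipeline: add_module registers nodes, add_link draws edges,
# and run() executes modules once all their inputs are ready. dest_key
# names which input of the destination module an edge feeds.

class TinyPipeline:
    def __init__(self):
        self.modules = {}
        self.links = []  # list of (src, dst, dest_key) edges

    def add_module(self, key, fn):
        self.modules[key] = fn

    def add_link(self, src, dst, dest_key="input"):
        self.links.append((src, dst, dest_key))

    def run(self, **root_input):
        outputs = {}
        pending = dict(self.modules)
        while pending:
            for key, fn in list(pending.items()):
                deps = [(s, dk) for s, d, dk in self.links if d == key]
                if any(s not in outputs for s, _ in deps):
                    continue  # an upstream module hasn't produced output yet
                kwargs = {dk: outputs[s] for s, dk in deps} or root_input
                outputs[key] = fn(**kwargs)
                del pending[key]
        # The leaf is any module that is never a source of an edge.
        leaves = [k for k in self.modules
                  if all(s != k for s, _, _ in self.links)]
        return outputs[leaves[-1]]

p = TinyPipeline()
p.add_module("prompt", lambda topic: f"question about {topic}")
p.add_module("llm", lambda input: input.upper())
p.add_module("retriever", lambda input: ["doc1", "doc2"])
p.add_module("reranker", lambda nodes, query_str: (query_str, nodes[:1]))
p.add_link("prompt", "llm")
p.add_link("llm", "retriever")
p.add_link("retriever", "reranker", dest_key="nodes")  # two named inputs
p.add_link("llm", "reranker", dest_key="query_str")
result = p.run(topic="college")
```

Running it executes prompt → llm → retriever → reranker, with the reranker receiving both the retrieved nodes and the rewritten query string.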

06:32

So in this video we'll go through an introductory guide to using query pipelines. We'll go through these two docs in the documentation (if you click on them, you can find them in the docs yourselves): one is an introduction to LlamaIndex query pipelines, and the other is query pipelines with async and parallel execution. We'll actually just directly jump into a notebook example that covers both of these aspects. So let's go through a walkthrough of how to use LlamaIndex query pipelines in both simple and advanced workflows: we'll go through some basic prompt chaining, then expand to fancier RAG retrieval use cases, and then also go through some use cases of async and streaming. Let's get started.

07:23

The first thing we'll actually do is just set up an observability tool. LlamaIndex provides integrations with a lot of different observability tools through a very easy-to-use interface, so in about two to four lines of code you can set up a trace callback, and by calling set_global_handler you're able to log all traces to a provider; in this case we'll use Arize Phoenix. Let's go through this, and we'll also define some imports. You'll see that there's one main import, which is `from llama_index.query_pipeline import QueryPipeline`; all the other imports will be used in downstream modules like LLMs, prompts, and vector stores.

08:10

The next part is that we'll load in the classic Paul Graham essay. This is our favorite example if you follow the LlamaIndex quick start tutorial, and you'll see that the docs look something like "What I Worked On". After we load in the documents, we want to define a vector store over these documents, so we call `VectorStoreIndex.from_documents` on the docs. This takes in the documents, parses and transforms them by chunking them into a bunch of little chunks, then calls an embedding model on each chunk and puts it into an in-memory vector store. Of course, we have a lot of different vector store integrations, and there's a default chunk size and embedding model that we use. Afterwards we persist it onto storage, so we call `storage_context.persist`. This is wrapped in some code that says: if the index doesn't already exist, build it; if it already exists, just load it from disk. So let's call all this.

09:15

Now we're ready to go through some of the usage examples of query pipelines. In the very first example we'll just chain together a prompt and an LLM; we actually don't use the vector store index right now, but we'll use it in a few subsequent sections. In this case we'll just do the simple workflow where we have a prompt and an LLM, and we'll use our query pipeline syntax to combine them together. If the prompt is "please generate related movies to {movie_name}", where movie_name is a template variable, we then want to pass the formatted prompt into an LLM. What we can do is define a query pipeline that chains a prompt template and an LLM together: that way, given any sort of LLM, the input first goes through the prompt template, the output is a fully formatted prompt that then gets passed into the LLM, and the final output is the LLM output. So let's run this very simple example in action, where we do `output = p.run(movie_name="The Departed")`. You'll see that by setting verbose=True it logs the inputs to each module, and if we take a look at the output, we'll see a list of 10 generated movies.

10:38

Just a quick note on this declarative syntax of a query pipeline versus the imperative syntax that you could already write yourself: you can absolutely create this using existing LlamaIndex modules, and this cell just walks you through what that looks like. Given a prompt, an LLM, and an input (a movie name, "The Departed"), you just need to call `prompt_template.format(movie_name=movie_name)`, take the output, and then pass it in as a chat message to the LLM via `llm.chat`. A chat message is a container that takes in some text as well as the role itself; once the LLM takes in this input, it generates the output. We can run this and we'll get back the same result. You'll see that you can basically do either one; the main advantage of a query pipeline is that the syntax is a little bit more concise and a little bit more readable, and that's just one of the advantages. You'll see some of the other advantages down the road as well, including ease of visualization and being able to run things like async and parallel under the hood.

11:46

After combining the prompt with the LLM, let's add an additional module for output parsing. Let's say we actually want structured outputs: what do we do if we want an output like this, where we have a class Movies that takes in a list of Movie objects, and both Movies and Movie are Pydantic models? Each Movie contains a name as well as a year, and Movies just contains a list of movies. We can then add a Pydantic output parser: it just takes in some string, parses the JSON out of that string into a Pydantic object, and this will be the last component in our chain. The other component that we're going to need to modify is the prompt: instead of just saying "please generate related movies", we need to modify it a little bit to also tell the LLM to output in JSON format. Here's what the JSON prompt string looks like: "please generate related movies to {movie_name}; here is the JSON schema to follow" (this is the output Pydantic schema) "and output a valid JSON object". We can now define the query pipeline chain, which contains this modified prompt template, the LLM, and the output parser. If we run this on, for instance, "Toy Story" as the movie, we'll run through all three modules and get back a JSON string that goes into the output parser, and we get back a Pydantic object: a Movies object with Finding Nemo, Cars, Monsters, and The Incredibles.
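
The output-parsing step just described can be sketched with the standard library alone: parse the model's JSON string into typed objects. Dataclasses stand in for the Pydantic `Movie`/`Movies` models from the video, and the sample JSON is a hypothetical LLM response:

```python
import json
from dataclasses import dataclass

# Sketch of output parsing: turn the LLM's JSON string into typed
# objects. Dataclasses stand in for the Pydantic models in the video.

@dataclass
class Movie:
    name: str
    year: int

@dataclass
class Movies:
    movies: list

def parse_movies(llm_output: str) -> Movies:
    data = json.loads(llm_output)
    return Movies(movies=[Movie(**m) for m in data["movies"]])

# Hypothetical raw LLM output following the requested JSON schema.
llm_output = (
    '{"movies": [{"name": "Finding Nemo", "year": 2003},'
    ' {"name": "Cars", "year": 2006}]}'
)
result = parse_movies(llm_output)
```

The prompt-side half of the technique is simply embedding the schema in the prompt so the model knows to emit JSON matching these fields.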

play13:31

so that's output parsing and next next

play13:34

let's walk through streaming support so

play13:37

the query pipelines have LM streaming

play13:39

support uh streaming is a fundamental

play13:41

part of having a good ux um whether

play13:43

you're building stuff in a notebook or

play13:45

building a full stack application and

play13:48

you you just need to toggle lm. as query

play13:50

component streaming equals true here um

play13:54

if the llm is the last component in this

play13:56

overall pipeline then the output of this

play13:58

pipeline will be streaming and if if

play14:00

it's a middle component then the outputs

play14:02

will be be collected and then passed to

play14:05

Downstream

play14:06

modules so let's walk through a basic

play14:09

chain of multiple prompts with streaming

play14:12

where the first prompt is please

play14:13

generate related movies to the movie

play14:15

name and then the second prompt is given

play14:18

some sort of text uh which is you know

play14:20

the set of movies please rewrite this

play14:22

with the summary of each

play14:24

movie uh and the output here you see we

play14:27

Define an llm object equal open AI but

play14:30

then we just call lm. as query component

play14:33

streaming equals true um why do we do

play14:37

this it's just because the query

play14:39

pipeline expects generally a set of

play14:41

query components um when you pass in a

play14:45

default object like LMS or query

play14:47

entrance or retrievers it actually gets

play14:49

autoc converted under the hood but you

play14:52

can of course explicitly Define this if

play14:54

you want um and when you explicitly

play14:55

Define this you have the option to pass

play14:57

in additional parameters like stream

play14:59

equals true as well as

play15:03

partials so this chain contains a prompt

play15:06

the Alm as well as the second prompt and

play15:10

llm and if we run this over a given

play15:12

movie The Dark Knight it'll run through

play15:15

all these different modules and you can

play15:17

see the output is

play15:27

streaming

play15:31

of course you can also feed the

play15:32

streaming output to an output parsing

play15:34

module as well so given a basic Json

play15:38

prompt template pass it to an LM that is

play15:40

streaming you can parse it to an output

play15:42

parsing module and then you can see the

play15:45

structured output right

play15:50

here the next step is to show you not

play15:54

just how to do prompt orchestration but

play15:56

also build an end to-end fragment

play15:58

pipeline um and in this first example

play16:01

we'll show you a basic query rewriting

play16:03

and retrieval workflow and this will

play16:06

cover some basic uh retrieval Concepts

play16:08

such as career rewriting as well as hide

play16:10

and then we'll plug it into a

play16:14

retriever here we can still Define a

play16:16

chain where the first prompt is please

play16:18

generate a concise question about Paul

play16:20

Gram's life regarding the following

play16:22

topic so your first input is a topic and

play16:24

then this will convert it into a

play16:26

question and then the second prompt says

play16:29

please write a passage to answer these

play16:31

question try to include as many key

play16:32

details as possible note that this does

play16:35

not actually take in any context it just

play16:37

tries to use the llms prior to answer

play16:39

the question so this is essentially the

play16:41

hide technique right given a question

play16:43

try to hallucinate the answer and then

play16:46

actually use that answer the

play16:47

hallucinated answer to um as the input

play16:50

to a retriever to try to fetch relevant

play16:53

results so this quer pipeline contains

play16:56

the first prompt G right question to LM

play16:59

second prompt which is the hide prompt

play17:01

generate passage to LM and then take the

play17:04

output and pass it to a

play17:07

retriever uh we can ignore this

play17:09

reranking module for now going comment

play17:11

this

play17:14

out now given you know um a input topic

play17:18

College will'll run this and then it'll

play17:20

actually run through all four of these

play17:23

modules given an input topic please

play17:26

generate concise question about his life

play17:28

regarding the following topic you can

play17:30

see the input is how did Paul Graham's

play17:32

college experience shape his career and

play17:33

entrepreneural

play17:36

mindset and then this is the

play17:38

hallucinating

play17:39

answer this is a good time to actually

play17:42

jump to

play17:46

Phoenix and you can see that all the

play17:48

traces and chains that we've already

play17:50

called have already been logged um and

play17:53

so from the most basic single llm calls

play17:56

to the slightly more advanced uh multi

play17:59

all un calls and so here you can see

play18:01

that you know the the the items that

play18:04

we've logged are two subsequent Al un

play18:06

calls question generation and then

play18:09

passage writing so this is uh hide

play18:13

prompt and then the retrieval method

play18:16

right and and the hide passage you know

play18:18

given an input will hallucinate the

play18:20

answer you can see that it says Paul

play18:22

gram um went to Cornell which is not

play18:25

true so that part is hallucinated um but

play18:27

then the idea is that you use this

play18:29

hallucinated passage as an input into

play18:32

the retriever to try to fetch the actual

play18:35

Knowledge from the knowledge bank and

play18:37

these are the set of output

play18:44

documents so definitely encourage you to

play18:46

check out these observability

play18:48

Integrations they're pretty neat and

play18:49

it's a nice way to debug your

play18:51

flow so that was actually just the

play18:53

retrieval flow but now let's actually

play18:55

create a full rag pipeline to to create

play18:57

a full rag pipeline we need to use a

play18:59

slightly different syntax um because we

play19:02

can't just Define a chain a chain just

play19:03

assumes that modules are linear and so

play19:05

one module follows the the previous one

play19:08

but to define a full rag pipeline uh

play19:11

some nodes might have multiple outputs

play19:13

or multiple edges some nodes might take

play19:15

in multiple edges and so you need a more

play19:18

expressive syntax to actually represent

play19:20

that um basically you need a graph likee

play19:22

syntax so to define a rag pipeline you

play19:26

essentially just need to Define some

play19:27

sort of graph

play19:29

um this is a very declarative approach

play19:31

towards building a pipeline and in this

play19:33

first example we'll show you how to

play19:35

build a rag Pipeline with Creer

play19:36

rewriting so this rag pipeline is just

play19:40

consists of the following components um

play19:43

it takes in an input rewrites it um into

play19:45

a query passes it to a retriever then a

play19:49

ranker then a response synthesis module

play19:52

so the modules here is again you know

play19:54

gener question generation given a a

play19:57

topic pass passes it to the downstream

play20:00

retriever then a coher rank

play20:03

module and then tree summarize um this

play20:06

is a response synthesis module that

play20:07

takes in input context or input nodes

play20:10

and hierarchically summarizes it given a

play20:11

query to try to answer a

play20:13

question so this is a core module in Llama

play20:19

Index so um let's run

play20:24

this and then um once we have these

play20:27

modules to the next step is to define

play20:30

the query

play20:31

pipeline you see that here instead of

play20:34

defining a chain we first actually just

play20:36

register the modules each module

play20:39

contains a key mapped to the module

play20:42

value itself so here is query pipeline

play20:45

um here's LLMs prompts retrievers

play20:48

summarizers and

play20:50

rerankers so let's run

play20:53

that the next step is to draw links

play20:56

between these modules um and instead of

play20:59

again kind of like defining a chain

play21:00

we'll just manually Define the edges

play21:02

between these

play21:03

modules we'll link the prompt to the LM

play21:06

similar to before and then the LM to the

play21:08

retriever right so inputs flow through

play21:10

the prompt to the LM gets Rewritten into

play21:12

a query that query goes into the

play21:15

retriever the output of

play21:17

this retriever goes into the reranker

play21:19

but the main thing is the reranker

play21:21

actually takes in uh two inputs right it

play21:24

takes in the nodes from the retriever

play21:26

but it also takes in the original input

play21:28

query so actually when you're defining

play21:30

these links to the reranker you need

play21:32

to explicitly specify which input it

play21:34

corresponds to and this is what you do

play21:36

through the destination keys so the dest

play21:38

key nodes is the um you know the reranker

play21:42

expects nodes and this is what the

play21:43

retriever supplies the reranker also

play21:46

expects query strings and that's what

play21:47

the LM supplies so let's run

play21:53

that you can take a look at the

play21:55

summarizer required input Keys which is

play21:57

again query string and

play22:01

nodes next we can actually visualize

play22:03

this overall graph and this is a nice

play22:07

property of a query pipeline which is

play22:08

because it's declarative and you define

play22:10

this as the overall dag you can easily

play22:12

visualize this um here's an input prompt

play22:16

to an

play22:17

llm and then the llm goes to you know a

play22:20

retriever um this retriever goes to a

play22:23

reranker along with the input from the LM

play22:26

and then the summarizer takes the

play22:28

outputs of the reranker as well as the

play22:33

LLM let's run

play22:36

this you'll see we've run all the

play22:38

modules end to end and we get back a

play22:41

final

play22:47

response

play22:49
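The register-modules-then-draw-links flow just described can be sketched without any library at all. Below is a toy, pure-Python stand-in for the declarative DAG pattern (all class and function names here are hypothetical, not the llama-index API): modules live under string keys, links carry an explicit `dest_key` when a module such as the reranker takes more than one input, and the graph is run in dependency order.

```python
from collections import defaultdict

class ToyPipeline:
    """Minimal declarative DAG runner (illustration only)."""

    def __init__(self):
        self.modules = {}               # key -> callable(**kwargs)
        self.links = defaultdict(list)  # src key -> [(dst key, dest_key)]
        self.indegree = defaultdict(int)

    def add_modules(self, modules):
        self.modules.update(modules)

    def add_link(self, src, dst, dest_key="input"):
        self.links[src].append((dst, dest_key))
        self.indegree[dst] += 1

    def run(self, root, value):
        # topological walk from the root; each module fires once all
        # of its incoming edges have delivered a value
        indeg = dict(self.indegree)
        inputs = defaultdict(dict)
        inputs[root]["input"] = value
        ready, out = [root], None
        while ready:
            key = ready.pop()
            out = self.modules[key](**inputs[key])
            for dst, dest_key in self.links[key]:
                inputs[dst][dest_key] = out
                indeg[dst] -= 1
                if indeg[dst] == 0:
                    ready.append(dst)
        return out  # output of the final (sink) module

p = ToyPipeline()
p.add_modules({
    "llm": lambda input: f"rewritten({input})",        # stands in for prompt + LLM
    "retriever": lambda input: [f"doc about {input}"],
    "reranker": lambda nodes, query_str: nodes,        # two inputs -> needs dest keys
    "summarizer": lambda nodes, query_str: f"answer from {nodes[0]}",
})
p.add_link("llm", "retriever")
p.add_link("llm", "reranker", dest_key="query_str")    # dest keys disambiguate inputs
p.add_link("retriever", "reranker", dest_key="nodes")
p.add_link("llm", "summarizer", dest_key="query_str")
p.add_link("reranker", "summarizer", dest_key="nodes")
result = p.run("llm", "paul graham")
print(result)
```

The point of the `dest_key` argument mirrors the transcript: the reranker and summarizer each take both `nodes` and `query_str`, so every edge into them must say which slot it fills.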

great the next step is to show you what

play22:51

a rag pipeline looks like without query

play22:53

rewriting which looks and sounds very

play22:56

similar um and actually the main reason

play22:58

we show you this is the use of a special

play23:00

component called the input component so

play23:03

in this case we do from llama_index.

play23:05

query_pipeline import

play23:08

InputComponent and the main use for the input

play23:11

component is in cases where you actually

play23:13

want to link the input to multiple

play23:15

Downstream modules instead of just one

play23:18

Downstream module in that case you

play23:20

actually need basically a special

play23:22

placeholder component um that allows you

play23:24

to you know register the input as a

play23:26

module so here we register it under

play23:29

input as a key to find the input

play23:31

component and then we add a link from

play23:33

the input to multiple downstream modules to

play23:36

set up you know a rag pipeline the input

play23:38

which is the query needs to be linked to

play23:40

the retriever but also the downstream

play23:43

response synthesis module right like both

play23:45

of these take in the input and this

play23:47

essentially provides you a placeholder

play23:48

variable to do

play23:50

this so we run

play23:53
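The input-component idea can be illustrated with nothing but plain functions: a placeholder identity node gives the raw query a name in the graph, so it can fan out to more than one consumer (here, both a retriever and a summarizer). The names below are illustrative stand-ins, not the llama-index classes.

```python
def input_component(input):
    # placeholder node: just forwards the raw query under a known key
    return input

def retriever(query_str):
    # toy retriever: returns a single matching "node"
    return [f"node matching '{query_str}'"]

def summarizer(query_str, nodes):
    # toy response synthesizer: needs BOTH the query and the nodes
    return f"'{query_str}' answered using {len(nodes)} node(s)"

# wiring by hand: the one input fans out to two downstream modules
query = input_component("what did the author do in YC")
nodes = retriever(query_str=query)
answer = summarizer(query_str=query, nodes=nodes)
print(answer)
```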

this and then run this rag pipeline what

play23:57

did the author do in

play24:00

YC and this is the classic rag pipeline

play24:03

you know without query rewriting

play24:05

or even reranking and then you get back

play24:07

an

play24:13

output the next section here is learning

play24:15

how to define a custom component in a

play24:18

query pipeline um there's actually a

play24:20

variety of ways to do this um you can uh

play24:24

here we show you how to subclass uh

play24:26

custom query components

play24:28

um and so you define a subclass and

play24:30

Implement a set of required functions

play24:32

and so this is the most comprehensive

play24:34

way of doing so and allows you to

play24:36

import uh or Define stuff like uh

play24:39

validation input keys and so on and so

play24:41

forth there's also a simpler syntax

play24:43

where you can actually just pass in an

play24:45

arbitrary function into a function

play24:46

component I'll show you a little bit of

play24:48

that it's not defined in this notebook

play24:50

but will be defined in subsequent

play24:53

notebooks here to subclass a custom query

play24:56

component we want a subclass

play24:59

to define a related movie

play25:01

component the main methods you need to

play25:03

implement are validate component inputs

play25:05

input Keys output keys and run component

play25:09

run component is where you define the

play25:11

core

play25:12

logic um input Keys is where you define

play25:14

the expected input

play25:16

variables um you also can call validate

play25:19

component inputs to um decide whether or

play25:23

not the inputs passed in from other

play25:24

modules are valid so you can throw an

play25:26

error if they're not and it's

play25:28

implemented similar to a Pydantic

play25:29

validator so the input just gets passed

play25:32

through and you can modify along the

play25:34

way you also Define a set of output Keys

play25:37

um to define the number and the type of

play25:40

outputs that you

play25:41

have so after you define that you can

play25:43

Define run component and in this case we

play25:46

just use the simple query pipeline that

play25:49

we Define in the very first section

play25:50

which again is just a prompt and LLM uh

play25:53

given an input movie name just you know

play25:55

run it through a chain of prompt and LLM

play25:57

to give you back an output you see we

play26:00

pass in the input key which is in this case

play26:03

movie we run this Pipeline and then we

play26:05

return this as a dictionary uh the

play26:08

dictionary contains the output Keys

play26:09

which is just output and then the output

play26:12

value so let's run

play26:15
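The subclassing contract just walked through — declare input keys, declare output keys, validate incoming values, and put the core logic in `run_component` — can be sketched from scratch like this. The base class below is written here for illustration; it mirrors the method names from the transcript but is not the actual llama-index class.

```python
class CustomQueryComponent:
    """Toy base class mirroring the described contract."""

    def input_keys(self):
        raise NotImplementedError

    def output_keys(self):
        raise NotImplementedError

    def validate_component_inputs(self, inputs):
        # Pydantic-validator style: inputs pass through, possibly
        # modified, and an error is raised if something is missing
        missing = set(self.input_keys()) - set(inputs)
        if missing:
            raise ValueError(f"missing inputs: {missing}")
        return inputs

    def run_component(self, **kwargs):
        raise NotImplementedError

    def __call__(self, **kwargs):
        kwargs = self.validate_component_inputs(kwargs)
        return self.run_component(**kwargs)

class RelatedMovieComponent(CustomQueryComponent):
    def input_keys(self):
        return {"movie"}

    def output_keys(self):
        return {"output"}

    def run_component(self, movie):
        # stand-in for "run a prompt + LLM chain on the movie name"
        return {"output": f"movies related to {movie}"}

c = RelatedMovieComponent()
print(c(movie="Love Actually"))
```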

this and then actually you know right

play26:18

below this I show you a very dumb simple

play26:20

example of what a function component

play26:22

looks like which also is a way of

play26:24

defining custom components but is way

play26:25

simpler um all you have to do is import

play26:28

function component Define any sort of

play26:30

function you want right this is a dummy

play26:33

foo function and then just Define a

play26:35

function component over it what this

play26:37

will do is it'll Auto inspect the

play26:39

function signature and Define those as

play26:41

input Keys it'll inspect the output

play26:44

signature and then um it'll just run

play26:46

your function as a component so we'll

play26:48
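The "auto-inspect the function signature" behavior mentioned here is easy to demonstrate with the standard `inspect` module. This is an illustrative reimplementation of the idea, not llama-index's own `FunctionComponent` code: parameter names become the component's input keys, and the wrapped function is run as the component body.

```python
import inspect

class FnComponent:
    """Wrap any function; derive input keys from its signature."""

    def __init__(self, fn):
        self.fn = fn
        # parameter names of the wrapped function become input keys
        self.input_keys = set(inspect.signature(fn).parameters)

    def run(self, **kwargs):
        unexpected = set(kwargs) - self.input_keys
        if unexpected:
            raise ValueError(f"unexpected inputs: {unexpected}")
        return self.fn(**kwargs)

def join_with_dash(a, b):
    return f"{a}-{b}"

comp = FnComponent(join_with_dash)
print(comp.input_keys)  # the set of parameter names, 'a' and 'b'
print(comp.run(a="x", b="y"))
```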

show this more in later um videos and

play26:51

tutorials as we set up more advanced

play26:53

stuff but just keep this in mind for the

play26:56

future so let's try this custom

play26:59

component out we'll run this custom

play27:01

component which again generates related

play27:03

movies and then also just run it through

play27:05

a prompt afterwards that says can you

play27:07

rewrite this in the voice of Shakespeare

play27:09

right so at this point we all know the

play27:10

drill we take in a component um run it

play27:13

generates related movies passes them into

play27:15

this prompt and then an LM and this is

play27:17

back to our favorite chain syntax and

play27:20

then um once we Define this module and

play27:23

then uh run this overall

play27:26

pipeline we're able to get back a list

play27:29

of movies um whose summaries are

play27:31

actually uh written in a Shakespearean

play27:40

voice so this finished running and we

play27:44

can see this is what the output looks

play27:46

like um it outputs a list of movies and

play27:48

this is the

play27:52

output okay last but not least we'll

play27:56

show you how to do

play27:58

async and parallel execution with our

play28:00

query pipelines one of the benefits of

play28:02

query pipelines is the fact that you can

play28:05

run them in an async or parallel fashion

play28:08

and we'll handle the coordination under

play28:10

the hood without you having to worry

play28:12

about whether or not your function is

play28:14

truly async or parallel so we try to

play28:16

run it in the most optimal manner this

play28:18

improves latency um and takes advantage

play28:20

of your system so to really take

play28:23

advantage of this feature we want to

play28:26

Define an ensemble retrieval example

play28:28

where we actually take in an input and

play28:31

send it in parallel to four different

play28:33

rag engines uh take the results and then

play28:35

combine them into one response and so

play28:38

we'll send this uh we'll Define this toy

play28:41

example which is similar to Ensemble

play28:43

retrieval or some sort of fusion and

play28:45

we'll Define a different query engine

play28:48

over the same data one per chunk size so

play28:51

if our chunk sizes are 128 256 512 and

play28:54

1024 we'll Define four separate rag

play28:56

Pipelines over these chunk sizes and

play28:59

then send an input to all four of these

play29:01

in parallel and then combine the results

play29:02

at the end this code snippet just takes

play29:06

each chunk size and uh splits up the

play29:09

document as such and defines it as a

play29:11

separate vector store and then stores

play29:14

everything in a query engine dictionary

play29:16

so the query engine dictionary is just a

play29:18

mapping from the chunk size string um to

play29:21

the output query engine right so you'll

play29:23

have four elements after running this uh

play29:26

cell

play29:32
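The per-chunk-size setup just described can be shown as a short toy: split the same document at each chunk size and key the resulting engine by the chunk-size string. The splitter and engine here are simple stand-ins (hypothetical helpers, not llama-index's node parser or query engine classes).

```python
def split(text, chunk_size):
    # naive fixed-width splitter, standing in for a real node parser
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def make_query_engine(chunks):
    # toy engine: return the first chunk containing the query term
    def engine(query):
        return next((c for c in chunks if query in c), None)
    return engine

document = "paul graham wrote essays and worked on YC " * 50
query_engines = {}
for chunk_size in [128, 256, 512, 1024]:
    chunks = split(document, chunk_size)
    # dictionary maps the chunk-size string to its query engine
    query_engines[str(chunk_size)] = make_query_engine(chunks)

print(sorted(query_engines))
```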

so the dictionary again is keyed by chunk size

play29:34

and then the output is a query

play29:38

engine next we'll construct a query

play29:41

pipeline you'll see that the set of

play29:43

modules we Define include the query

play29:45

engines that we defined above so the IDS

play29:47

are the chunk size um and then the the

play29:50

value is the corresponding query engine

play29:53

we'll Define an input component a

play29:55

summarizer component and then what

play29:57

we call a join component you'll see this

play29:59

in the graph visualization in just a bit

play30:02

but the join component takes in an

play30:04

arbitrary set of input edges from other

play30:07

modules uh from Upstream modules and

play30:10

then similar to the pack you know syntax

play30:12

in Python where you can take in some

play30:13

arbitrary components via star args and

play30:16

packs it into a list this does something

play30:18

similar so it takes in an arbitrary

play30:20

number of edges and packs it into a list

play30:22

of objects and then this list of objects

play30:24

is a single output that you can then

play30:26

pass to other downstream

play30:28
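The join component's `*args`-style packing can be sketched in a few lines. This `ArgPackComponent` is an illustrative reimplementation of the idea (the element-wise convert function turns each engine response into a node-with-score-like dict), not the library's own class.

```python
class ArgPackComponent:
    """Pack any number of upstream outputs into one list output."""

    def __init__(self, convert_fn=None):
        self.convert_fn = convert_fn  # optional element-wise conversion

    def run(self, *args):
        # like *args packing in Python: N inputs -> one list output
        if self.convert_fn:
            return [self.convert_fn(a) for a in args]
        return list(args)

# convert each query-engine response into a (text, score) "node"
join = ArgPackComponent(convert_fn=lambda resp: {"text": resp, "score": 1.0})
packed = join.run("answer@128", "answer@256", "answer@512", "answer@1024")
print(len(packed), packed[0])
```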

modules what happens here is first we

play30:32

will add a link from the input to every

play30:34

chunk size um and again every chunk size

play30:36

is the query engine right so we pass the

play30:39

input to every to the four query engines

play30:42

we then take the outputs of the these

play30:44

query engines and Link them to the join

play30:47

module so this join module is taking the

play30:49

outputs from all the query

play30:52

engines it'll pack them into a list so

play30:54

now you have a list of different nodes

play30:56

there's actually a convert function which

play30:58

will convert the outputs into a node

play31:00

with score right so this is element wise

play31:02

and convert it into a list of nodes and

play31:05

it'll pass this list of nodes to the

play31:06

summarizer right with the destination

play31:08

key

play31:09

nodes the summarizer of course also

play31:12

takes in the input

play31:14

query so let's run this and let's

play31:17

visualize this right just to make this

play31:20

Crystal Clear we take a look at this

play31:23

input um here is the

play31:26

input

play31:29

it links to chunk sizes 128 256 512 1024

play31:33

these are all separate query

play31:36

engines all of these outputs flow into

play31:39

the join module Which packs it into a

play31:41

list this list goes into the summarizer

play31:43

which takes in the set of nodes as well

play31:45

as the input and then returns an

play31:48

output so now that we visualize this

play31:50

let's run this we'll see that by running

play31:54

this in an async fashion um await p.

play31:57

arun it actually runs very quickly um

play32:01

if you can tell this actually ran all

play32:03

four of these Retrievers in parallel the

play32:05

128 256

play32:08

512 and 1024 all these query engines were run in

play32:12

parallel and the final output is 3.53

play32:15

seconds right and the benefit of running

play32:18

all these in parallel is that each of these

play32:19

actually requires an LLM call so it's not

play32:21

just retrieval um so it does

play32:23

legitimately take some

play32:25

time in the meantime if you compare

play32:27

this with the synchronous

play32:29

method you'll see that it does run

play32:32

everything sequentially right 128 256

play32:36

512 doesn't take advantage of

play32:38

parallelizable function

play32:40
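The parallel-versus-sequential difference being described (3.53s async versus 8.4s sync in the video) comes from awaiting all four engines concurrently instead of one after another. Here is a self-contained toy using `asyncio.gather` with fake engines; the sleep stands in for retrieval plus an LLM call, and the timings are illustrative, not the video's.

```python
import asyncio
import time

async def fake_query_engine(chunk_size, query):
    await asyncio.sleep(0.1)  # stands in for retrieval + LLM latency
    return f"answer from chunk_size={chunk_size}"

async def run_parallel(query):
    # all four engines are awaited concurrently
    return await asyncio.gather(
        *(fake_query_engine(cs, query) for cs in [128, 256, 512, 1024])
    )

start = time.perf_counter()
results = asyncio.run(run_parallel("what did the author do?"))
elapsed = time.perf_counter() - start
# concurrent: ~0.1s total, versus ~0.4s if run sequentially
print(len(results), f"{elapsed:.2f}s")
```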

calling and then at the end of the day

play32:42

it takes 8.4

play32:44

seconds so that's all the sections in

play32:47

this introductory query pipeline video

play32:50

and of course we'll highlight more

play32:51

advanced use cases and other components

play32:53

and exciting features in subsequent

play32:54

videos but thank you for listening and

play32:57

take

play32:59

care
