Reliable, fully local RAG agents with LLaMA3

LangChain
19 Apr 2024 · 21:19

Summary

TLDR In this video from LangChain, Lance explains how to build reliable agents that run locally using the newly released Llama 3. Llama 3's 8-billion-parameter model boasts strong performance. The video draws on ideas from three academic papers to create a complex and interesting RAG (Retrieval-Augmented Generation) flow. The flow routes a query to either a vector store or web search based on its content, and introduces a fallback based on document relevance. It also checks whether the generated content contains hallucinations and whether it is relevant to the original query, falling back to web search as needed. Lance demonstrates this complex RAG flow running reliably and locally on a reasonably sized laptop, a Mac M2 with 32GB of RAM.

Takeaways

  • 🚀 Lance from LangChain discusses the release of the new Llama 3 model; its 8-billion-parameter version shows very strong performance.
  • 💡 Lance explains how to build reliable agents that run locally, noting they can run on an ordinary laptop such as a Mac M2 with 32GB.
  • 🔍 Ideas from three academic papers are combined to create a complex, interesting, and powerful RAG (Retrieval-Augmented Generation) flow.
  • 🛤️ Routing from Adaptive RAG directs a question to either the vector store or web search based on its content.
  • 🔄 A fallback mechanism retrieves and grades documents for questions relevant to the vector store, and otherwise falls back to web search.
  • 🔧 From Corrective RAG, the generated content is checked for hallucinations and for relevance to the original question, with a fallback to web search when needed.
  • 🧠 An agent, as described in Lilian Weng's blog post, can break a task into smaller sub-goals, plan, and maintain memory.
  • 🤖 Building an agent with the ReAct framework involves planning, observing, thinking, and choosing the next action.
  • 📈 Using LangGraph, the control flow is defined in advance so the LLM focuses on one specific task at each step, which gives even small, locally run LLMs high reliability.
  • 🔩 Compared with the more flexible ReAct agent, a LangGraph agent follows only a constrained path, but gains reliability and works well with local models.
  • 📝 Lance shows the code in action: running Llama 3 locally, he builds a complex agent that retrieves, grades, and generates from documents and falls back to web search depending on the query.
  • 🔬 Lance adds two new graders that evaluate the generation against the documents for hallucinations and against the question for relevance, falling back to web search when needed.

Q & A

  • What performance improvements does Llama 3 offer?

    - Llama 3's 8-billion-parameter model has very strong performance characteristics, scoring somewhat better than Mistral on a number of metrics and benchmarks.

  • What is a reliable agent that can run locally?

    - It is an agent with planning (breaking a task into smaller sub-goals), memory (chat history and a vector store), and the ability to use tools.

  • What should you watch out for in the prompt format used with Llama 3?

    - Llama 3 has a specific prompt format that must be followed. Also, Ollama's JSON mode can be used to get output from the LLM in JSON format.
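
    The template mentioned above can be sketched as follows. This is a minimal illustration of Llama 3's chat template as published by Meta; when using Ollama's chat API the template is applied for you, so building it by hand is only needed when calling a raw completion endpoint.

    ```python
    def llama3_prompt(system: str, user: str) -> str:
        """Wrap a system/user message pair in Llama 3's chat template.

        Ollama applies this template automatically; this sketch is only
        for illustrating what the special tokens look like.
        """
        return (
            "<|begin_of_text|>"
            "<|start_header_id|>system<|end_header_id|>\n\n"
            f"{system}<|eot_id|>"
            "<|start_header_id|>user<|end_header_id|>\n\n"
            f"{user}<|eot_id|>"
            "<|start_header_id|>assistant<|end_header_id|>\n\n"
        )

    p = llama3_prompt("You grade documents.", "Is this document relevant?")
    ```

    The trailing assistant header leaves the model positioned to write its reply.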

  • Explain how the document index is created.

    - The index is built from three chosen web pages (blog posts): a chunk size is set and the documents are stored in Chroma, a local vector store.
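
    The chunking step can be illustrated without any LangChain dependency. This toy splitter only mimics the chunk_size/chunk_overlap idea; the notebook's actual splitter (a recursive character splitter) is smarter and prefers paragraph and sentence boundaries.

    ```python
    def split_text(text: str, chunk_size: int = 250, chunk_overlap: int = 0) -> list[str]:
        """Naive character-level splitter illustrating chunk_size/chunk_overlap.

        Each chunk is at most chunk_size characters, and consecutive chunks
        share chunk_overlap characters.
        """
        step = chunk_size - chunk_overlap
        return [text[i:i + chunk_size] for i in range(0, len(text), step)]

    chunks = split_text("x" * 600, chunk_size=250, chunk_overlap=50)
    ```

    The resulting chunks are what gets embedded and stored in the vector store.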

  • What is the role of the retrieval grader?

    - The retrieval grader grades retrieved documents for relevance to the question and returns the result as JSON. This makes it possible to judge document relevance and decide the next appropriate action.
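
    A sketch of the grader's contract. The prompt wording is paraphrased, and the stub standing in for the LLM call is an assumption for illustration; the notebook actually invokes ChatOllama in JSON mode.

    ```python
    import json

    GRADER_PROMPT = """You are a grader assessing the relevance of a retrieved
    document to a user question. Give a binary score 'yes' or 'no'.
    Return a JSON with a single key 'score'.
    Document: {document}
    Question: {question}"""

    def grade_document(document: str, question: str, llm) -> bool:
        """Ask the LLM (expected to reply in JSON) whether the document is relevant."""
        raw = llm(GRADER_PROMPT.format(document=document, question=question))
        return json.loads(raw)["score"] == "yes"

    # Stub LLM: "relevant" iff any question word appears in the document.
    def stub_llm(prompt: str) -> str:
        doc = prompt.split("Document: ")[1].split("\nQuestion: ")[0]
        question = prompt.split("Question: ")[1]
        hit = any(w in doc.lower() for w in question.lower().split())
        return json.dumps({"score": "yes" if hit else "no"})
    ```

    Because the grader only has to emit one JSON key, even a small local model can do this step reliably.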

  • What is generation?

    - Generation is the process of feeding the retrieved documents and the question into the LLM to produce an answer. A custom RAG prompt is used to generate answers quickly.
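
    The prompt assembly for this step can be sketched as below. The wording follows common RAG prompts rather than the notebook's exact text, and `build_rag_prompt` is an illustrative helper name.

    ```python
    def format_docs(docs: list[str]) -> str:
        """Join retrieved documents into a single context block."""
        return "\n\n".join(docs)

    RAG_PROMPT = """You are an assistant for question-answering tasks.
    Use the following retrieved context to answer the question.
    If you don't know the answer, just say that you don't know.
    Question: {question}
    Context: {context}
    Answer:"""

    def build_rag_prompt(question: str, docs: list[str]) -> str:
        return RAG_PROMPT.format(question=question, context=format_docs(docs))

    prompt = build_rag_prompt("What is agent memory?", ["Doc A", "Doc B"])
    ```

    The assembled prompt is then sent to the local Llama 3 model to produce the answer.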

  • What is the search tool, and how is web search performed?

    - The search tool queries the web; here Tavily, a fast search tool, is used. The search API is called to run a web search, and the results are appended to the state.

  • What is the graph state, and how is the control flow defined?

    - The graph state holds the information to be persisted across the agent's entire control flow; each node (a function) of the graph modifies the state. The control flow is defined ahead of time by the engineer, and the LLM performs only a specific task at each step.
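
    The state-and-nodes pattern can be sketched in plain Python. The field names follow the video's description (question, generation, documents, a web-search flag); the retrieval and grading bodies are stubs standing in for the real Chroma retriever and LLM grader.

    ```python
    from typing import List, TypedDict

    class GraphState(TypedDict, total=False):
        question: str
        generation: str
        web_search: str       # "Yes" when any retrieved document was irrelevant
        documents: List[str]

    def retrieve(state: GraphState) -> GraphState:
        # A real node would invoke the Chroma retriever here.
        docs = ["doc about agent memory"]
        return {"documents": docs, "question": state["question"]}

    def grade_documents(state: GraphState) -> GraphState:
        # A real node would call the LLM grader per document.
        relevant = [d for d in state["documents"] if "agent" in d]
        flag = "Yes" if len(relevant) < len(state["documents"]) else "No"
        return {"documents": relevant, "question": state["question"], "web_search": flag}

    state: GraphState = {"question": "agent memory"}
    state.update(retrieve(state))
    state.update(grade_documents(state))
    ```

    Each node takes the state in and writes its updates back out, exactly the "modify the state at each node" behavior described above.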

  • What is the difference between a ReAct agent and a control-flow agent?

    - A ReAct agent makes a decision at every step, which gives it high flexibility but lower reliability. A control-flow agent, whose flow is defined in advance, has high reliability but limited flexibility.

  • Explain how self-correction is performed.

    - Self-correction checks whether the generated answer contains hallucinations and whether it actually answers the original question. If hallucinations are detected, the agent retries generation; if the answer is not relevant to the question, it falls back to web search.
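
    The decision logic of this conditional edge can be sketched as a pure function. The three outcome labels (useful / not useful / not supported) come from the video; the function and mapping names are illustrative.

    ```python
    def grade_generation(grounded: bool, answers_question: bool) -> str:
        """Conditional edge: map the two grader verdicts to an outcome label.

        - hallucination (not grounded)  -> "not supported": regenerate
        - grounded but off-topic        -> "not useful": fall back to web search
        - grounded and on-topic         -> "useful": finish
        """
        if not grounded:
            return "not supported"
        return "useful" if answers_question else "not useful"

    # Outcome label -> next node in the graph.
    edge_map = {"useful": "END", "not useful": "websearch", "not supported": "generate"}
    ```

    The graph builder then wires `edge_map` into the conditional edge so each label routes to its node.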

  • What is the router, and how are questions routed?

    - The router directs a question to either the vector store or web search based on its content. If the question concerns topics covered by the vector store, the vector store is used; otherwise the agent falls back to web search.
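
    A sketch of the router's decision. In the video the call is made by Llama 3 in JSON mode after being told which topics the vector store covers; the keyword matching below is only an illustrative stand-in for that LLM call.

    ```python
    # Topics the router is told the vector store covers (from the video).
    VECTORSTORE_TOPICS = ("agents", "prompt engineering", "adversarial attacks")

    def route_question(question: str) -> str:
        """Return 'vectorstore' for questions on indexed topics, else 'websearch'.

        Stand-in for the JSON-mode LLM router; real routing asks the model
        to return one of these two strings.
        """
        q = question.lower()
        if any(topic in q for topic in VECTORSTORE_TOPICS):
            return "vectorstore"
        return "websearch"
    ```

    The returned string is the name of the graph's entry node, so routing plugs into the same conditional-edge machinery as the graders.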

  • What key points are emphasized throughout this script?

    - The key points are how to build reliable agents that run locally, how to define the control flow, and how to implement a complex RAG flow with Llama 3. It is also shown that these agents run locally, that the process is easy to understand, and that it can be monitored through traces.

Outlines

00:00

🚀 Lance from LangChain on the Llama 3 release and building agents that run locally

Lance from LangChain discusses the release of Llama 3. He explains that he will use this new language model to build reliable agents that run locally, on his Mac M2 laptop with 32GB of RAM. Lance draws on ideas from three academic papers to create a complex and interesting RAG (Retrieval-Augmented Generation) flow and explains how it works. He also discusses what an agent is, and how it breaks tasks into smaller sub-goals and executes them using memory and tools.

05:03

🛠️ Building and testing a local model with Llama 3

Lance explains how to build with Llama 3 locally and demonstrates it with actual code. First he indexes several web pages, then builds a grader that evaluates document relevance. Using Llama 3, he performs generation from the documents and the question, then defines a web search tool. He wraps each node as a function that updates the state, building up the control flow.

10:05

🔍 Filtering by document relevance and web search

Lance explains how to filter documents by relevance and perform a web search. He details how each node modifies the state, and how to implement conditional edges that decide the next step based on the state. The agent is configured to run a web search when one is needed, or when the generated text is not grounded in the documents.

15:06

🔄 Implementing a complex RAG flow with conditional edges

Lance shows how to build and run a complex RAG flow with conditional edges. He adds two new graders that check whether the generated text contains hallucinations and whether it is relevant to the question, and implements a conditional edge that decides the next step based on their results. He also adds a router so the agent chooses between the vector store and web search depending on the question.

20:07

📈 Demonstration of a complex, locally run RAG flow with Llama 3

Lance actually runs the complex RAG flow locally with Llama 3 and shows the results. He walks through retrieving documents from the vector store based on the question, grading and filtering them, evaluating the generated text, performing a web search when needed, and producing a final answer. The overall run time is short, underscoring that the flow is practical to run locally.

Keywords

💡Llama 3

Llama 3 is a recently released language model whose 8-billion-parameter version shows strong performance on many metrics and benchmarks. Lance runs it locally to build reliable agents.

💡Agent

An agent is software that can break a task into smaller pieces and work through it with planning. In this video, Lance uses Llama 3 to build a reliable agent that runs locally.

💡RAG (Retrieval-Augmented Generation)

RAG is a technique for generating text while referring to information in documents. The video builds an agent that uses RAG to generate answers to questions.

💡Vector Store

A vector store is a database that stores text as embedding vectors. The video uses a vector store to retrieve documents relevant to the question.

💡Web Search

Web search is the process of searching for information on the internet. The video's agent can fall back to web search for questions that are not relevant to the vector store.

💡Control Flow

Control flow is the mechanism that determines the order of execution. The video uses a control flow decided in advance to keep the agent executing its tasks accurately.

💡Graph State

Graph state is the state shared by the nodes of the graph. The video uses graph state to persist the agent's state across the entire control flow.

💡Reactivity

Reactivity here refers to a ReAct-style agent deciding and acting step by step in response to the situation. The video uses a predefined control flow instead, to achieve higher reliability.

💡Self-correction

Self-correction is the process of checking whether the generated text is accurate and useful. The video checks whether the generation contains hallucinations and whether it is relevant to the question.

💡Router

The router is the component that decides whether the agent uses the vector store or web search for a given question. The video uses the router to pick the best retrieval method based on the question's content.

💡Local Execution

Local execution means software runs on a local computer rather than a remote server. The video runs the Llama 3 agent locally, demonstrating its reliability and performance.

Highlights

Lance from LangChain discusses the release of Llama 3, a highly anticipated model with strong performance characteristics.

Introduction to building reliable agents using LLaMa 3 that can run locally on a laptop.

Llama 3 outperforms Mistral on popular benchmarks, indicating its potential for superior results.

Proposed approach combines ideas from three sophisticated research papers to create a complex and reliable agent flow.

Implementation of routing from the Adaptive RAG paper to direct questions to either a vector store or web search.

Incorporation of a fallback mechanism for retrieval from the vector store and web search based on question relevance.

Self-correction or checking of generations for hallucinations and relevance to the original question.

Introduction of a control flow that allows the LLM to perform specific tasks within predefined steps, enhancing reliability.

Comparison between ReAct-style agents and LangGraph agents in terms of reliability and flexibility.

Demonstration of how to use LLaMa 3 with a local language model to build an index of web pages.

Use of GPT4All embeddings (Nomic's embedding model) for local models, via the LangChain nomic partner package.

Lance's personal laptop, a Mac M2 with 32GB, is used to run the complex agent flow, showcasing its local capabilities.

Explanation of an agent's components, including planning, memory, and tool use, as described by Lilian Wang.

Construction of a corrective RAG flow using custom Llama 3 prompts and a local vector store.

Real-time tracing and inspection of the agent's operations using LangSmith, providing transparency into the agent's decision-making process.

Inclusion of a router to decide between vector store and web search based on the content of the question.

Successful execution of the complex agent flow, including routing, retrieval, grading, and generation, all within 14 seconds on a local laptop.

Public availability of the code for others to experiment with and replicate the demonstrated agent flow.

Transcripts

play00:00

hi this is Lance from L chain So Meta

play00:03

llama 3 came out today which is super

play00:04

exciting and something that I've been

play00:05

waiting for for a while and I want to

play00:07

hop on here and H and talk about how to

play00:09

build uh reliable agents using llama 3

play00:12

that can actually run on your laptop so

play00:14

it can run locally now just for a quick

play00:17

kind of refresher here llama 3 just

play00:19

dropped today we can see looking at the

play00:21

performance characteristics for the 8

play00:22

billion parameter model they're very

play00:25

strong so I've done a lot of work with

play00:26

mraw which was previously kind of my

play00:28

go-to and it looks like on a number of

play00:31

popular metrics or benchmarks you know

play00:33

met llama 3 is is indeed a bit better so

play00:37

again I haven't tested this yet this is

play00:39

kind of like a first dry run but uh it's

play00:41

really exciting so to convince you that

play00:44

we can build local and reliable agents

play00:47

I'm going to pick ideas from three

play00:48

different rag papers they're all pretty

play00:51

sophisticated and they're going to kind

play00:52

of roll up into this pretty in this

play00:54

pretty uh kind of interesting and

play00:56

complex rag flow so we're going to do

play00:58

routing from the adaptive rag paper

play01:01

which will basically take a question

play01:02

route it to either a vector store or to

play01:05

web search based on the content of the

play01:06

question where they're going to

play01:08

introduce the idea of fallback so

play01:10

basically we're going to do retrieval

play01:11

from our Vector store if the question's

play01:13

relevant to the vector store we're going

play01:14

to grade our documents if they're not

play01:16

relevant to the question we're going to

play01:18

fall back and do web search so that's an

play01:20

idea from this corrective rag paper and

play01:23

then we're also going to do

play01:24

self-correction or checking of the

play01:25

generations to see if they're if they

play01:27

contain hallucinations and if they're

play01:29

relevant to the to the original question

play01:32

and if they're not we'll fall back and

play01:33

do web search so the point is we're

play01:35

going to implement an interesting

play01:36

complex raglow we're going to show we

play01:38

can run this reliably and locally on my

play01:40

laptop I have a Mac M2 30 32 gigs so it

play01:44

is a reasonably sized laptop but it's

play01:45

not insane um so first and foremost

play01:49

what's an agent and this is kind of

play01:50

controversial in itself but a really

play01:52

good blog post from Lilian Wang lays out

play01:53

an agent is something that has planning

play01:56

so it can break down task into smaller

play01:57

sub goals or subtasks it has memory

play02:00

um chat history your long-term memory in

play02:02

a vector store and it can use

play02:04

tools now let's just say we want to use

play02:06

an agent to build corrective rag which

play02:08

is that middle blue thing we talked

play02:10

about right so typically when people

play02:12

think about an agent they immediately

play02:13

say oh you know react that's like a very

play02:16

popular framework for building agents um

play02:19

and it typically involves a flow looks

play02:20

like this for planning uh the LM will

play02:23

select an action observe the result

play02:25

think and then choose the next action um

play02:28

and again you know rea agents typically

play02:30

will use memories chat history or vector

play02:32

store and of course can use different

play02:34

tools so if I want to do this above flow

play02:37

as a react agent it would look like this

play02:39

I would take my question it would I was

play02:41

first perform an action like use my

play02:43

Vector story to get documents I would

play02:45

then observe the documents I would say

play02:47

okay I need to think about grading them

play02:49

and then I would go back to my action

play02:51

and then I would you choose the grader

play02:52

tool and then I would kind of go in this

play02:55

Loop until hopefully I follow this

play02:57

trajectory as laid out here right

play03:00

so that's kind of how it work with a

play03:01

react agent now I want to introduce

play03:04

another idea from implementing this that

play03:08

uh you can basically lay this out as a

play03:09

control flow so instead of having an

play03:11

agent make a decision at every step in

play03:14

this like kind of in this Loop instead

play03:16

we're going to layout as the engineer

play03:18

ahead of time here's the control flow I

play03:20

want my agent to take every time it's

play03:22

run so I'm basically taking this uh kind

play03:25

of like planning away from the llm and

play03:27

I'm actually creating a control flow

play03:30

that I'm defining and what's nice is

play03:32

that the llm then only has specific

play03:35

tasks within each of these steps so you

play03:39

know in terms of planning I'm laying out

play03:40

a control flow ahead of time in terms of

play03:43

memory I can basically use what I'm

play03:45

going to call a graph state to

play03:46

persistent information across this

play03:48

control flow um and of course it can be

play03:51

relevant talk things relevant to rag

play03:54

like documents question and in terms of

play03:57

tool use you know each uh graph note

play03:59

like we talked about we can use a

play04:01

different tool like the vector s

play04:03

retrieval we'll just use a retriever

play04:04

tool um the greater we'll use a greater

play04:06

tool um you know the web search will use

play04:09

a web search tool so you know again

play04:11

thinking about this what are the

play04:12

tradeoffs react of course one of the big

play04:15

challenges is that you get lower

play04:17

reliability when you have this react

play04:19

style flow the agent has to make the

play04:21

correct decision at every point and this

play04:23

is kind of when you can see things go

play04:25

off the rails and get off track

play04:27

particularly with small llms whereas

play04:29

with Lang graph you're actually laying

play04:30

out this flow ahead of time so the the

play04:32

agent effectively always traverses this

play04:34

path every time the llm doesn't have to

play04:36

make choices about you know which node

play04:38

to go to

play04:39

next uh in kind of an unconstrained way

play04:43

um now in terms of flexibility a react

play04:44

agent would be more flexible so it could

play04:46

choose any sequence of actions through

play04:49

this given these tools whereas the

play04:52

control flow that I lay out with L graph

play04:54

is constrained so it only ever traverses

play04:56

this path um but we'll see because of

play05:00

this constrained kind of control flow uh

play05:03

this these landcraft agents are very are

play05:05

compatible and actually quite reliable

play05:06

with with local and smaller llms and

play05:08

that's kind of one of the main benefits

play05:09

I want to kind of bring home to you

play05:11

today so let's actually get to the code

play05:13

let's kick this off and start with the

play05:15

the corrective rag piece so that's kind

play05:17

of the middle piece of this overall

play05:18

agent we want to build and what I'm

play05:20

going to do is I'm just going to take a

play05:21

few of these components and just test

play05:23

them individually so you can see them

play05:24

working so I have a notebook here a few

play05:26

pip installs um here's again my flow to

play05:29

reference now for local models for

play05:31

embeddings I'm going to use GPD for all

play05:33

uh which is you know from Lang chain

play05:35

nomic I I uh we have a partner package

play05:38

with them with nomic and uh I really

play05:40

like these embeddings they're really

play05:41

good now AMA just came out it's

play05:44

available uh sorry llama 3 just came out

play05:46

it's available on AMA uh and all I have

play05:49

to do is AMA pull llama 3 um and the

play05:52

only other thing I'm going to reference

play05:53

is occasionally met llama 3 has a

play05:56

particular prompt format which we have

play05:58

to pay attention to so that's really it

play06:00

now let's kick this off I'm going to

play06:01

choose a local LM I'm going to say llama

play06:03

3 and first I'm going to do I'm just

play06:06

going to build an index so I'm going to

play06:08

build an index of three web pages uh

play06:10

blog post that I like and I'm going kick

play06:12

that off I'm setting a chunk size um I'm

play06:15

using chroma local Vector store and that

play06:18

all ran so cool now I have an index so

play06:21

that's this piece right basically the

play06:22

index is the key component of my rag

play06:24

flow I need to be able to retrieve

play06:26

documents then now I'm going to get into

play06:29

some fun stuff here I want a retrieval

play06:32

grader so that is this piece I want the

play06:34

be to retrieve documents and grade them

play06:36

for relevance relative to my question so

play06:38

that's what's happening here now here's

play06:40

where llama 3 comes in and I'm not going

play06:41

to use something really convenient I set

play06:43

my local LM to llama 3 AMA has Json mode

play06:47

which confirms that the output from the

play06:48

llm is Json so my prompt basically just

play06:52

says grade the documents and return a

play06:55

Json with score yes no that's it so I'm

play06:59

going to do a mock retrieval so I'm

play07:00

going to say my question is Agent memory

play07:03

I can just call invoke here actually uh

play07:05

on my Vector store it's a little bit

play07:07

more convenient um I can kick that off

play07:09

so this is now running now one other

play07:12

thing I did I set tracing in lsmith so I

play07:16

actually and that already finished but

play07:18

what's nice is you'll see me reference

play07:19

over here um when this runs um as this

play07:22

is running I can actually inspect what's

play07:24

happening under the hood so we've called

play07:25

a shadow llama and we get this nice Json

play07:28

out that's great that's exactly what we

play07:30

want cool so that's our grader now for

play07:33

Generation I'm just going to do good Old

play07:35

Rag so again I have some custom rag

play07:38

prompt here nothing too unusual I'm

play07:40

still going to use uh llama 3 of course

play07:43

and I'm basically just taking my

play07:46

documents and my question I'm Plumbing

play07:47

them into llama 3 there you go so you

play07:49

can see it runs pretty quick let's

play07:50

actually check over

play07:53

here um so I can see my the time is

play07:56

around 4 seconds not bad and this is

play07:59

pretty cool so I can look at the my my

play08:02

prompt which contains the documents and

play08:04

the output great so we're rolling here

play08:06

this is pretty good

play08:09

um and okay we got our index we got our

play08:12

grater we got our generation done uh I'm

play08:15

going to find a Search tool as well so

play08:17

this is basically just a tool that I'm

play08:18

going to use to query the web I like tab

play08:21

for this it's kind of a really nice

play08:23

Quick Search tool and here's where I'm

play08:25

going to um basically Define My Graph so

play08:29

all that's happening here is each of our

play08:32

uh I'll should go up and show you here

play08:35

so each of these pieces these green

play08:37

things let's call them nodes so each of

play08:39

these is just a function okay so this

play08:42

first node retrieve documents I'm just

play08:44

going to wrap that as a function that's

play08:45

called

play08:46

retrieve uh this generate I'm just going

play08:49

to wrap this a function

play08:51

generate grade documents again it's just

play08:53

going to be a function now what you're

play08:55

going to see here is each of these

play08:56

functions these are basically the nodes

play08:58

of my graph I'm going to take in the

play09:00

state I'm going to modify it in some way

play09:02

so in the retriever node all it's

play09:04

happening is I'm taking the state which

play09:06

I've defined up here this is a

play09:07

placeholder uh dictionary that contains

play09:09

my state so the way to think about this

play09:11

is the state is information that I want

play09:14

to persist across my agent so again it's

play09:16

kind of that notion of memory it's

play09:17

basically short-term memory that lives

play09:19

over the lifetime of my agent and it

play09:21

contains everything I want my agent to

play09:22

be aware of throughout this control flow

play09:24

right so for rag it's like question

play09:26

generation web search intuitive stuff

play09:28

right so so my retrieve node is just

play09:31

going to take in my question um and this

play09:33

is going to be passed from the user and

play09:35

it's just going to do a document

play09:36

retrieval so I'm actually going to use

play09:37

invoke for this it's slightly kind of a

play09:39

nicer way to do it um that's my

play09:42

retrieval node generation same deal I'm

play09:44

just going to use that rag tun we

play09:46

defined above I'm just going to call it

play09:47

but I'm going to invoke it now on my

play09:49

graph State my question and my

play09:51

documents um and you can see at every

play09:53

note here we just write uh you know in

play09:56

this case we write the question and the

play09:57

documents back out to state so we just

play10:00

update the state at each of these nodes

play10:01

that's it now grading is what I would

play10:05

call um you know another node so we're

play10:07

basically going through our documents we

play10:09

are grading them for relevance and if

play10:11

they're not relevant we're going to

play10:13

filter them out and we're also going to

play10:14

turn on this flag to say web search so

play10:16

if anyone's not relevant we'll go ahead

play10:18

and do web search and then web search is

play10:20

my final node this is basically going to

play10:23

hit my hit my search API that tab API

play10:26

and again just append those search

play10:28

documents to my state

play10:30

now here's where I'm introducing of a

play10:32

conditional Ed so all Happening Here is

play10:35

those noes we just talked about they all

play10:37

just take in state modify it in some way

play10:39

right so retrieval will'll just grab

play10:41

documents from my retriever add it to

play10:43

State grading will filter those

play10:44

documents in state now I have this

play10:46

notion of an edge where I want to make a

play10:48

decision based upon State as to where to

play10:51

go next so this is where I can Implement

play10:53

kind of interesting logic here all I'm

play10:55

doing is basically I'm taking in state

play10:57

and actually I'm doing something really

play10:59

simp simple and kind of silly here I

play11:01

previously set this flag web search

play11:03

that's going to tell it to do web search

play11:05

if any document was deemed irrelevant uh

play11:07

we set that up here in this uh grade

play11:10

documents node so what all I need to do

play11:12

is in this Edge I'm basic going to take

play11:14

in my state I'm going to see does the

play11:16

state contain web search if yes then go

play11:19

to web search if no go to generate and

play11:22

this what I return this string is just

play11:24

the name of a node right web search or

play11:26

generate that's it so we defined all our

play11:30

nodes and that's it we are kind of

play11:34

rolling

play11:35

here now I'm just going to scroll down

play11:38

uh now all we need to do is to build our

play11:42

graph and I'm going to kind of flag this

play11:46

graph

play11:47

build very

play11:50

nice and so all it's happening here is

play11:53

I'm actually just implementing the

play11:55

control flow I want so I registered all

play11:57

my nodes up here and here here's where

play11:59

I'm just setting like the order of the

play12:00

noes so you can see my entry points give

play12:02

me retrieval I'm going to go from

play12:04

retrieval to grading um and then I'm

play12:06

going to add my conditional Edge again

play12:08

it's following our diagram up here

play12:09

retrieval grading Edge and then um we

play12:13

can see here uh after web search I go to

play12:16

generate and after generate I go to end

play12:18

so that's all it's going to happen and I

play12:21

can go ahead and compile that so

play12:22

basically I can run my graph and let's

play12:24

see if this works I kick that off so you

play12:27

can see it ran our retriever so it's I'm

play12:28

basically printing out the steps as we

play12:30

go cool so we're doing our grading right

play12:31

now so we're grading our documents first

play12:34

document's relevant second document's

play12:36

relevant and I can go over to lsmith

play12:38

here and let's see what's going on so

play12:40

it's kind of chugging along so I can see

play12:42

I can really dig into this like I can

play12:44

look at every every document getting

play12:46

graded I can look at the individual

play12:47

grade prompts the individual documents

play12:49

it's all pretty nice it's all logged for

play12:50

us so that's all our grading stuff and

play12:54

it ran that's great so we just build a

play12:56

little simple agent again it has memory

play13:00

it has state it has planning it has a

play13:02

control flow um and it uses tools it's

play13:05

an agent and it ran locally all on my

play13:08

laptop so that's cool that's kind of

play13:10

step one now let's beef this up a little

play13:12

bit I can throw in the selfrag stuff

play13:15

which is what we see in our diagram up

play13:17

here in green I just need two new

play13:19

graders but it's going to use the same

play13:21

stuff we just talked about I want to

play13:22

grade the the generations for

play13:24

hallucinations I want to grade the

play13:26

generations for relevance to my question

play13:29

so let's throw in uh two additional

play13:31

graders here and um why don't I add them

play13:34

up here just for convenience um so this

play13:38

is me my hallucination grader let's kick

play13:41

that off and I'm just doing a kind of

play13:44

simple test here so all this is going to

play13:46

do is really simply um determine whether

play13:50

or not there's the the answer is

play13:52

grounded in my documents so yes no right

play13:54

if if it's grounded then yes otherwise

play13:56

no um and same here so answer grader um

play14:01

you know does my generation answer the

play14:03

question again you can look at the

play14:05

prompts I'll share all this code of

play14:06

course and so that all runs cool so we

play14:09

have that now all I need to add this is

play14:11

actually pretty simple is just one

play14:13

additional conditional Edge to My Graph

play14:16

and let's scroll down to where my edges

play14:18

are so you you before we defined decide

play14:21

to generate as a conditional Edge and

play14:22

again what was that doing that was

play14:24

making this decision the next decision

play14:26

is is this um you know hallucination

play14:29

conditional Edge so basically if my

play14:30

hallucination grader uh identifies that

play14:33

there's hallucinations I'm going to feed

play14:35

back and we will go ahead and see that

play14:39

here shortly so

play14:42

um here's additional conditional Edge

play14:45

and this is actually going to wrap in

play14:47

both of my checks so you can see what's

play14:49

happening here is first my hallucination

play14:51

grader looks at the generation R to the

play14:53

documents it gets the grade um if the

play14:56

grade is yes um then the generation is

play15:00

grounded in in the document so that's a

play15:02

good thing um and then we go ahead and

play15:05

move on to test whether or not it's

play15:07

grounded in the if relevant to the

play15:08

question um and so it's basically going

play15:11

to return three things um either the a

play15:15

generation has hallucinations I it's not

play15:17

supported by the documents um or the

play15:20

generation is in which case then it's

play15:24

either useful or it's not useful okay so

play15:27

that's kind of how we set up our

play15:29

conditional ledge and all we need to do

play15:32

now is we can update basically our graph

play15:35

build here and so what you're going to

play15:38

do is we can map from the outputs of our

play15:41

conditional Edge so remember we're

play15:43

outputting either useful not useful and

play15:45

not supported we map between those to

play15:47

the associated nodes we want to go to so

play15:50

if it's not you if it's not useful we're

play15:52

going to fall back to web search okay so

play15:54

that's basically this case so basically

play15:57

if it's not use doesn't answer the

play15:59

question we're kicking back to web

play16:01

search right so that's kind of scenario

play16:04

one uh if it's not supported we try

play16:07

again we go back to generate otherwise

play16:09

it's useful and we finish that's it so

play16:13

let's go ahead and try that so we're

play16:17

going to retrieve again we're going to

play16:19

check our document relevance again

play16:21

documents relevant so we can see our

play16:22

agent rolling here this is always kind

play16:24

of fun uh just kicking just chugging

play16:27

along and I can close this down so it's

play16:31

checking my relevance it determined

play16:33

relevance is good it's doing generation

play16:35

now so that's also cool let's also open

play16:37

that up so we can see the whole Trace in

play16:39

real time doing our grading here

play16:42

generation so we again we can like

play16:44

really dig in and then this is our this

play16:46

is our second grading step you can

play16:47

really drill drill into all these pieces

play16:49

so I really like having my traces here

play16:51

where I can actually see what's going on

play16:52

under the hood in each of these pieces

play16:54

you can see it's like really nicely laid

play16:56

out you can kind of close this stuff

play16:57

down if you don't want to see it this is

play16:59

pretty cool right um so yeah we're going

play17:02

generation then we're doing our grading

play17:04

and so this is pretty cool so it did

play17:06

generation it checked hallucinations it

play17:08

found that the generation is grounded in

play17:10

the documents um and then so the the

play17:13

hallucinations are good and then it

play17:15

found that the generation addresses our

play17:17

question so that's really cool we're

play17:19

We're really rolling here, and we only need one more piece, so let's throw in a router. The router is pretty easy and builds on what we just talked about. Again I'm using JSON mode, and I basically say: given the question, and given what's in my vector store (I tell it the vector store covers LLM agents, prompt engineering, and adversarial attacks), use the vector store if the question relates to those topics, otherwise fall back to web search, and return either "vectorstore" or "websearch". That's it. Let's run a quick test to make sure it actually works: I pass a question related to my vector store, and it determines, yes, use the vector store. Easy. Now I want one more edge to route questions.
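The router just described can be sketched as follows. The prompt wording and the key name `datasource` are assumptions, and `fake_llm` is a keyword-matching stub standing in for the JSON-mode Llama 3 call so the example runs on its own.

```python
import json

ROUTER_PROMPT = (
    "You route user questions to a vector store or web search. The vector "
    "store covers LLM agents, prompt engineering, and adversarial attacks. "
    "Use it for questions on those topics; otherwise use web search. Return "
    'JSON with a single key "datasource", value "vectorstore" or "websearch".\n'
    "Question: {question}"
)

def fake_llm(prompt: str) -> str:
    # Stub: only inspects the question portion of the prompt for topic keywords.
    question = prompt.split("Question:")[-1].lower()
    topics = ("agent", "prompt", "adversarial")
    source = "vectorstore" if any(t in question for t in topics) else "websearch"
    return json.dumps({"datasource": source})

def route(llm, question: str) -> str:
    """Return 'vectorstore' or 'websearch' for the given question."""
    return json.loads(llm(ROUTER_PROMPT.format(question=question)))["datasource"]

print(route(fake_llm, "What is agent memory?"))
print(route(fake_llm, "Who are the Bears expected to draft first in the NFL draft?"))
```

In the real flow the stub is replaced by a JSON-mode chat model call, but the routing contract is the same: one small, constrained decision per LLM call.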

play18:04

Let me go ahead and throw this in here with the rest of my edges. The route-questions edge follows exactly what we did before: we look at our question, invoke our router, and branch depending on the router's output. I could probably get rid of these extra print statements, but I'll keep them. Basically, if the source is web search, go to web search; if it's the vector store, go to the vector store. Really simple stuff.
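The conditional edge just described might look like this minimal sketch. The state keys and node names (`"retrieve"`, `"websearch"`) are illustrative; the `router` argument is any callable returning `"vectorstore"` or `"websearch"`, such as the router above.

```python
def route_question(state: dict, router) -> str:
    """Conditional edge: pick the next node based on the router's decision."""
    question = state["question"]
    source = router(question)
    print(f"---ROUTE QUESTION: {source}---")  # the extra prints mentioned above
    if source == "websearch":
        return "websearch"   # node name for the web-search branch
    return "retrieve"        # node name for the vector-store branch

# Usage: the returned string tells the graph which node to visit next.
print(route_question({"question": "What is agent memory?"}, lambda q: "vectorstore"))
```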

play18:36

Now let's build our graph. This time we set an entry point: my entry point is the router, which decides whether to go to web search or to the retriever, and from there the control flow is the same as we had before. That's it. Nice, let's try it.
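To make the overall control flow concrete, here is a dependency-free sketch of the shape of the graph: the router is the entry point, choosing between the web-search branch and the retrieve-then-generate branch. The node table, state keys, and stub node functions are all illustrative; the real version wires the same flow with LangGraph's StateGraph.

```python
def run_flow(question: str, router, nodes: dict) -> dict:
    """Walk the flow: entry point is the router, then follow node transitions."""
    state = {"question": question}
    current = "websearch" if router(question) == "websearch" else "retrieve"
    path = []
    while current is not None:
        path.append(current)
        node_fn, next_node = nodes[current]
        state = node_fn(state)
        current = next_node
    state["path"] = path
    return state

# Stub nodes standing in for the real retrieval / web-search / generation steps.
nodes = {
    "retrieve": (lambda s: {**s, "documents": ["doc"]}, "generate"),
    "websearch": (lambda s: {**s, "documents": ["web result"]}, "generate"),
    "generate": (lambda s: {**s, "generation": "answer"}, None),
}

result = run_flow("What is agent memory?", lambda q: "vectorstore", nodes)
print(result["path"])
```

Laying the flow out as explicit nodes and edges like this, rather than letting the model plan freely, is what keeps a small local model on a reliable path.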

play18:57

First it routes the question, printing that out as it goes, and it decides to go to the vector store, as we expect. Now it's following the same flow we talked about before, so we've just implemented this final routing piece, and you can see it correctly routed to our vector store. Let's look at the trace: we can dig into all of this, look at the router, and see that it makes the right decision to go to the vector store. Then it does retrieval, then it grades the documents; we saw this before, so nothing new there. It's still chugging along, we can look at the generation, and it looks like it finished. Very cool.

play19:42

That's basically everything, but let's sanity check it once more with a question about current events: who are the Bears expected to draft first in the NFL draft? Let's see if that all flows as expected. It routes to web search and checks for hallucinations, and we can look at our LangGraph trace just to sanity check what's going on. Cool: the Bears are expected to draft USC star Caleb Williams, which is the consensus pick.

play20:17

That looks great. So, in a relatively short period of time, we've seen that we can build a pretty complex RAG flow, as you see right here: routing, retrieval grading with interesting decision points, a fallback to web search, and grading of generations against two different criteria. It runs reliably, it runs locally on my laptop, and it runs with Llama 3 8B. We can look at the LangSmith traces and the latencies: this whole thing ran in 14 seconds, which is pretty good for something running locally on my laptop. This is a non-trivial RAG flow, introducing ideas from three papers, all done locally. And again, this idea of control flows is really what allows you to lay things out in such a way that a local agent can actually run reliably; I think that's the important point to bring home. I encourage you to play with this; I'll make sure the code is public. Hopefully this is useful, and feel free to leave any comments. Thanks.
