Reliable, fully local RAG agents with LLaMA3
Summary
TLDRランスがL chain So Metaからお届けするビデオでは、最新の「llama 3」を利用してローカルで実行可能な信頼性の高いエージェントを構築する方法について解説しています。llama 3は8億パラメータモデルで、強力な性能を誇ります。ビデオでは、3つの学術論文のアイデアを取り入れ、複雑で興味深いRAG(Retrieval-Augmented Generation)フローを作成します。このフローは、問い合わせの内容に基づいてベクターストアまたはウェブ検索にルーティングし、ドキュメントの関連性に基づいてフェールバックを導入します。さらに、生成された内容が幻覚を含んでいるかどうか、また元の問い合わせに関連しているかどうかを自らチェックし、必要に応じてウェブ検索にフォールバックします。ランスは、Mac M2 30 32GBという比較的手頃なサイズのノートパソコンで、この複雑なRAGフローを実行し、ローカルで確実に動作するデモンストレーションを行います。
Takeaways
- 🚀 ランチャインのLanceが、新しいLlama 3モデルの登場について話している。これは8億パラメータのモデルで、性能が非常に優れているとされています。
- 💡 Lanceはローカルで実行できる信頼性の高いエージェントの構築方法について説明し、Mac M2 30 32GBのような通常サイズのラップトップで実行できると述べています。
- 🔍 3つの学術論文からアイデアを取り入れ、複雑で面白く、かつ強力なRAG(Retrieval-Augmented Generation)フローを作成する予定です。
- 🛤️ Adaptive RAGからのルーティングアイデアを取り入れ、質問の内容に基づいてベクターストアまたはウェブ検索にルーティングします。
- 🔄 Fallbackの概念を取り入れ、ベクターストアに関連性のある質問にはドキュメントの検索と評価を行い、そうでない場合はウェブ検索にフォールバックします。
- 🔧 修正型RAGのアイデアを取り入れ、生成された内容が幻覚を含んでいるか、元の質問に関連しているかをチェックし、必要に応じてウェブ検索にフォールバックします。
- 🧠 エージェントとはタスクを小さなサブゴールに分割し、計画し、メモリを持つことができる存在であり、Lilian Wangのブログポストで説明されています。
- 🤖 Reactフレームワークを使用してエージェントを構築する際には、計画、観察、思考、そして次のアクションの選択を行います。
- 📈 L graphを使用することで、事前に制御フローを定義し、LLMが特定のタスクにのみ焦点を当てることができます。これにより、ローカルで実行される小さなLLMでも高い信頼性を確保できます。
- 🔩 Lanceは、制御フローを定義することで、より高い柔軟性を持つReactエージェントと比較して、L graphを使用したエージェントは制約されたパスしか辿りませんが、信頼性と互換性が向上します。
- 📝 コードの実行結果を示し、Lanceは実際にローカルでLlama 3を使用して、問い合わせに応じてドキュメントを検索、評価、生成し、ウェブ検索にフォールバックする複雑なエージェントを構築しました。
- 🔬 Lanceは、生成された内容をドキュメントと照らし合わせて、幻覚や質問への関連性に基づいて評価し、必要に応じてウェブ検索にフォールバックする2つの新しいグラダーを追加しました。
Q & A
ラマ3がどのような性能向上を提供しているのか説明してください。
-ラマ3は、8億パラメータモデルの性能特性が非常に優れており、以前のmrawよりも多くの指標やベンチマークで若干優れているとされています。
ローカルで実行できる信頼性の高いエージェントとは何ですか?
-ローカルで実行できる信頼性の高いエージェントとは、計画性があり、タスクを小さなサブゴールに分け、メモリ、チャット履歴、ベクターストアを使用し、ツールを利用できるものです。
ラマ3で使用するプロンプトフォーマットにはどのような注意点がありますか?
-ラマ3には特定のプロンプトフォーマットがあり、そのフォーマットに注意して提示する必要があります。また、AMAのJsonモードを使用することで、LLMからの出力をJson形式で得ることができます。
どのようにしてドキュメントのインデックスを作成するのか説明してください。
-ドキュメントのインデックスは、好きなウェブページやブログ記事を3つ選んで、chunk sizeを設定し、Chromaというローカルベクターストアを使用して作成します。
リトリーブ・グレーダーの役割は何ですか?
-リトリーブ・グレーダーは、リtrieveしたドキュメントを質問に対する関連性に基づいてグレードし、その結果をJsonで返します。これにより、ドキュメントの関連性を判断し、次に適切なアクションを決定することができます。
生成(Generation)とは何ですか?
-生成とは、リtrieveしたドキュメントと質問をLLMに入力して、回答を生成するプロセスです。カスタムのRAGプロンプトを使用して、迅速に回答を生成することが可能です。
検索ツールとは何ですか?また、どのようにしてウェブ検索を行うのですか?
-検索ツールは、ウェブを検索するためのツールであり、タブ(Tab)のようなクイック検索ツールが挙げられます。検索APIを利用して、ウェブ検索を行い、検索結果を状態に追加します。
グラフ状態とは何ですか?また、どのようにして制御フローを定義するのですか?
-グラフ状態とは、エージェントの制御フロー全体を通じて保持したい情報を保持する状態であり、グラフの各ノード(関数)が状態を変更します。制御フローは、事前にエンジニアによって定義され、LLMは各ステップで特定のタスクしか実行しません。
リアクティブエージェントと制御フローエージェントの違いは何ですか?
-リアクティブエージェントは、各ステップで決定を行うため、低い信頼性と柔軟性があります。一方、制御フローエージェントは、事前に制御フローが定義されるため、高い信頼性と限定的な柔軟性を持っています。
どのようにして自己修正(Self-correction)を実行するのか説明してください。
-自己修正は、生成された回答が妄想を含んでいないか、元の質問に対する回答かどうかをチェックするプロセスです。妄想を検出すると、再度ウェブ検索にフォールバックします。また、回答が質問に関連しているかどうかもチェックし、関連性がない場合はウェブ検索にフォールバックします。
ルーターとは何ですか?また、どのようにして質問をルーティングするのですか?
-ルーターは、質問の内容に基づいてベクターストアまたはウェブ検索にルーティングする機能です。質問がベクターストアに関連するトピックであればベクターストアを使用し、そうでなければウェブ検索にフォールバックします。
このスクリプト全体を通じて、どのような主要なポイントが強調されていますか?
-スクリプト全体を通じて強調されている主要なポイントは、ローカルで実行できる信頼性の高いエージェントの構築方法、制御フローの定義、およびラマ3を使用した複雑なRAGフローの実装です。また、これらのエージェントがローカルで実行され、そのプロセスが理解しやすく、トレースを通じて監視できることが示されています。
Outlines
🚀 ランチェインのLanceがLLaMa 3の登場とローカルで実行可能なエージェントの構築について語る
ランチェインのLanceが、LLaMa 3のリリースについて話しています。彼は、この新しい言語モデルを使ってローカルで実行可能な信頼性の高いエージェントを構築し、自分のMac M2 30 32ギガのラップトップで実行できると述べています。Lanceは、3つの学術論文からアイデアを取り入れ、複雑で興味深いRAG(Retrieval-Augmented Generation)フローを作成し、それがどのように機能するかを説明しています。また、エージェントとは何か、そしてどのようにしてタスクを小さなサブゴールに分け、メモリとツールを使って実行するかについても語っています。
🛠️ LLaMa 3を使ったローカルモデルの構築とテスト
Lanceは、LLaMa 3を使ったローカルモデルの構築方法を説明し、実際にコードを使ってデモンストレーションしています。まず、ウェブページのインデックスを作成し、次にドキュメントの関連性を評価するGRADERを構築します。LLaMa 3を使って、ドキュメントと質問を入力して生成(Generation)を行い、その後、ウェブ検索ツールを定義します。さらに、各ノードを関数としてラップし、状態を更新することで、制御フローを構築しています。
🔍 ドキュメントの関連性とウェブ検索によるフィルタリング
Lanceは、ドキュメントの関連性に基づいてフィルタリングし、ウェブ検索を行う方法を説明しています。各ノードが状態をどのように変更するか、そしてエッジを通じて状態に基づいて次のステップを決定する条件式を実装する方法についても詳述しています。ウェブ検索が必要な場合や、生成されたテキストがドキュメントに基づいていない場合に、ウェブ検索を実行するようにエージェントを設定しています。
🔄 条件付きエッジを使った複雑なRAGフローの実装
Lanceは、条件付きエッジを使って複雑なRAGフローを構築し、実行する方法を紹介しています。生成されたテキストが疑わしい(hallucinations)かどうか、また質問に関連しているかどうかをチェックする2つの新しいグレーダーを追加し、それらの結果に基づいて次のステップを決定する条件式を実装しました。また、ルーターを追加して、質問の内容に応じてベクターストアかウェブ検索を選択するようにエージェントを設定しました。
📈 LLaMa 3を使ったローカルで実行可能な複雑なRAGフローのデモンストレーション
Lanceは、LLaMa 3を使用してローカルで実行される複雑なRAGフローを実際に実行し、その結果を示しています。質問に応じてベクターストアからドキュメントを取得し、それらを評価、フィルタリングし、生成されたテキストを評価し、必要に応じてウェブ検索を行って、最終的な回答を生成するプロセスを説明しています。全体的な実行時間も短く、ローカルで実行可能であることが強調されています。
Mindmap
Keywords
💡LLaMa 3
💡エージェント
💡RAG(Retrieval-Augmented Generation)
💡Vector Store
💡Web Search
💡Control Flow
💡Graph State
💡Reactivity
💡Self-correction
💡Router
💡Local Execution
Highlights
Lance from L chain So Meta discusses the release of LLaMa 3, a highly anticipated model with strong performance characteristics.
Introduction to building reliable agents using LLaMa 3 that can run locally on a laptop.
LLaMa 3 outperforms mraw on popular benchmarks, indicating its potential for superior results.
Proposed approach combines ideas from three sophisticated research papers to create a complex and reliable agent flow.
Implementation of routing from the Adaptive Rag Paper to direct questions to either a vector store or web search.
Incorporation of a fallback mechanism for retrieval from the vector store and web search based on question relevance.
Self-correction or checking of generations for hallucinations and relevance to the original question.
Introduction of a control flow that allows the LLM to perform specific tasks within predefined steps, enhancing reliability.
Comparison between React-style agents and Lang graph agents in terms of reliability and flexibility.
Demonstration of how to use LLaMa 3 with a local language model to build an index of web pages.
Use of GPD embeddings for local models and the integration with the Lang chain nomic partner package.
Lance's personal laptop, a Mac M2 with 32GB, is used to run the complex agent flow, showcasing its local capabilities.
Explanation of an agent's components, including planning, memory, and tool use, as described by Lilian Wang.
Construction of a corrective Rag flow using custom LLaMa 3 prompts and the use of a local vector store.
Real-time tracing and inspection of the agent's operations using lsmith, providing transparency into the agent's decision-making process.
Inclusion of a router to decide between vector store and web search based on the content of the question.
Successful execution of the complex agent flow, including routing, retrieval, grading, and generation, all within 14 seconds on a local laptop.
Public availability of the code for others to experiment with and replicate the demonstrated agent flow.
Transcripts
hi this is Lance from L chain So Meta
llama 3 came out today which is super
exciting and something that I've been
waiting for for a while and I want to
hop on here and H and talk about how to
build uh reliable agents using llama 3
that can actually run on your laptop so
it can run locally now just for a quick
kind of refresher here llama 3 just
dropped today we can see looking at the
performance characteristics for the 8
billion parameter model they're very
strong so I've done a lot of work with
mraw which was previously kind of my
go-to and it looks like on a number of
popular metrics or benchmarks you know
met llama 3 is is indeed a bit better so
again I haven't tested this yet this is
kind of like a first dry run but uh it's
really exciting so to convince you that
we can build local and reliable agents
I'm going to pick ideas from three
different rag papers they're all pretty
sophisticated and they're going to kind
of roll up into this pretty in this
pretty uh kind of interesting and
complex rag flow so we're going to do
routing from the adaptive rag paper
which will basically take a question
route it to either a vector store or to
web search based on the content of the
question where they're going to
introduce the idea of fallback so
basically we're going to do retrieval
from our Vector store if the question's
relevant to the vector store we're going
to grade our documents if they're not
relevant to the question we're going to
fall back and do web search so that's an
idea from this corrective rag paper and
then we're also going to do
self-correction or checking of the
generations to see if they're if they
contain hallucinations and if they're
relevant to the to the original question
and if they're not we'll fall back and
do web search so the point is we're
going to implement an interesting
complex raglow we're going to show we
can run this reliably and locally on my
laptop I have a Mac M2 30 32 gigs so it
is a reasonably sized laptop but it's
not insane um so first and foremost
what's an agent and this is kind of
controversial in itself but a really
good blog post from Lilian Wang lays out
an agent is something that has planning
so it can break down task into smaller
sub goals or subtasks it has memory
um chat history your long-term memory in
a vector store and it can use
tools now let's just say we want to use
an agent to build corrective rag which
is that middle blue thing we talked
about right so typically when people
think about an agent they immediately
say oh you know react that's like a very
popular framework for building agents um
and it typically involves a flow looks
like this for planning uh the LM will
select an action observe the result
think and then choose the next action um
and again you know rea agents typically
will use memories chat history or vector
store and of course can use different
tools so if I want to do this above flow
as a react agent it would look like this
I would take my question it would I was
first perform an action like use my
Vector story to get documents I would
then observe the documents I would say
okay I need to think about grading them
and then I would go back to my action
and then I would you choose the grader
tool and then I would kind of go in this
Loop until hopefully I follow this
trajectory as laid out here right
so that's kind of how it work with a
react agent now I want to introduce
another idea from implementing this that
uh you can basically lay this out as a
control flow so instead of having an
agent make a decision at every step in
this like kind of in this Loop instead
we're going to layout as the engineer
ahead of time here's the control flow I
want my agent to take every time it's
run so I'm basically taking this uh kind
of like planning away from the llm and
I'm actually creating a control flow
that I'm defining and what's nice is
that the llm then only has specific
tasks within each of these steps so you
know in terms of planning I'm laying out
a control flow ahead of time in terms of
memory I can basically use what I'm
going to call a graph state to
persistent information across this
control flow um and of course it can be
relevant talk things relevant to rag
like documents question and in terms of
tool use you know each uh graph note
like we talked about we can use a
different tool like the vector s
retrieval we'll just use a retriever
tool um the greater we'll use a greater
tool um you know the web search will use
a web search tool so you know again
thinking about this what are the
tradeoffs react of course one of the big
challenges is that you get lower
reliability when you have this react
style flow the agent has to make the
correct decision at every point and this
is kind of when you can see things go
off the rails and get off track
particularly with small llms whereas
with Lang graph you're actually laying
out this flow ahead of time so the the
agent effectively always traverses this
path every time the llm doesn't have to
make choices about you know which node
to go to
next uh in kind of an unconstrained way
um now in terms of flexibility a react
agent would be more flexible so it could
choose any sequence of actions through
this given these tools whereas the
control flow that I lay out with L graph
is constrained so it only ever traverses
this path um but we'll see because of
this constrained kind of control flow uh
this these landcraft agents are very are
compatible and actually quite reliable
with with local and smaller llms and
that's kind of one of the main benefits
I want to kind of bring home to you
today so let's actually get to the code
let's kick this off and start with the
the corrective rag piece so that's kind
of the middle piece of this overall
agent we want to build and what I'm
going to do is I'm just going to take a
few of these components and just test
them individually so you can see them
working so I have a notebook here a few
pip installs um here's again my flow to
reference now for local models for
embeddings I'm going to use GPD for all
uh which is you know from Lang chain
nomic I I uh we have a partner package
with them with nomic and uh I really
like these embeddings they're really
good now AMA just came out it's
available uh sorry llama 3 just came out
it's available on AMA uh and all I have
to do is AMA pull llama 3 um and the
only other thing I'm going to reference
is occasionally met llama 3 has a
particular prompt format which we have
to pay attention to so that's really it
now let's kick this off I'm going to
choose a local LM I'm going to say llama
3 and first I'm going to do I'm just
going to build an index so I'm going to
build an index of three web pages uh
blog post that I like and I'm going kick
that off I'm setting a chunk size um I'm
using chroma local Vector store and that
all ran so cool now I have an index so
that's this piece right basically the
index is the key component of my rag
flow I need to be able to retrieve
documents then now I'm going to get into
some fun stuff here I want a retrieval
grader so that is this piece I want the
be to retrieve documents and grade them
for relevance relative to my question so
that's what's happening here now here's
where llama 3 comes in and I'm not going
to use something really convenient I set
my local LM to llama 3 AMA has Json mode
which confirms that the output from the
llm is Json so my prompt basically just
says grade the documents and return a
Json with score yes no that's it so I'm
going to do a mock retrieval so I'm
going to say my question is Agent memory
I can just call invoke here actually uh
on my Vector store it's a little bit
more convenient um I can kick that off
so this is now running now one other
thing I did I set tracing in lsmith so I
actually and that already finished but
what's nice is you'll see me reference
over here um when this runs um as this
is running I can actually inspect what's
happening under the hood so we've called
a shadow llama and we get this nice Json
out that's great that's exactly what we
want cool so that's our grader now for
Generation I'm just going to do good Old
Rag so again I have some custom rag
prompt here nothing too unusual I'm
still going to use uh llama 3 of course
and I'm basically just taking my
documents and my question I'm Plumbing
them into llama 3 there you go so you
can see it runs pretty quick let's
actually check over
here um so I can see my the time is
around 4 seconds not bad and this is
pretty cool so I can look at the my my
prompt which contains the documents and
the output great so we're rolling here
this is pretty good
um and okay we got our index we got our
grater we got our generation done uh I'm
going to find a Search tool as well so
this is basically just a tool that I'm
going to use to query the web I like tab
for this it's kind of a really nice
Quick Search tool and here's where I'm
going to um basically Define My Graph so
all that's happening here is each of our
uh I'll should go up and show you here
so each of these pieces these green
things let's call them nodes so each of
these is just a function okay so this
first node retrieve documents I'm just
going to wrap that as a function that's
called
retrieve uh this generate I'm just going
to wrap this a function
generate grade documents again it's just
going to be a function now what you're
going to see here is each of these
functions these are basically the nodes
of my graph I'm going to take in the
state I'm going to modify it in some way
so in the retriever node all it's
happening is I'm taking the state which
I've defined up here this is a
placeholder uh dictionary that contains
my state so the way to think about this
is the state is information that I want
to persist across my agent so again it's
kind of that notion of memory it's
basically short-term memory that lives
over the lifetime of my agent and it
contains everything I want my agent to
be aware of throughout this control flow
right so for rag it's like question
generation web search intuitive stuff
right so so my retrieve node is just
going to take in my question um and this
is going to be passed from the user and
it's just going to do a document
retrieval so I'm actually going to use
invoke for this it's slightly kind of a
nicer way to do it um that's my
retrieval node generation same deal I'm
just going to use that rag tun we
defined above I'm just going to call it
but I'm going to invoke it now on my
graph State my question and my
documents um and you can see at every
note here we just write uh you know in
this case we write the question and the
documents back out to state so we just
update the state at each of these nodes
that's it now grading is what I would
call um you know another node so we're
basically going through our documents we
are grading them for relevance and if
they're not relevant we're going to
filter them out and we're also going to
turn on this flag to say web search so
if anyone's not relevant we'll go ahead
and do web search and then web search is
my final node this is basically going to
hit my hit my search API that tab API
and again just append those search
documents to my state
now here's where I'm introducing of a
conditional Ed so all Happening Here is
those noes we just talked about they all
just take in state modify it in some way
right so retrieval will'll just grab
documents from my retriever add it to
State grading will filter those
documents in state now I have this
notion of an edge where I want to make a
decision based upon State as to where to
go next so this is where I can Implement
kind of interesting logic here all I'm
doing is basically I'm taking in state
and actually I'm doing something really
simp simple and kind of silly here I
previously set this flag web search
that's going to tell it to do web search
if any document was deemed irrelevant uh
we set that up here in this uh grade
documents node so what all I need to do
is in this Edge I'm basic going to take
in my state I'm going to see does the
state contain web search if yes then go
to web search if no go to generate and
this what I return this string is just
the name of a node right web search or
generate that's it so we defined all our
nodes and that's it we are kind of
rolling
here now I'm just going to scroll down
uh now all we need to do is to build our
graph and I'm going to kind of flag this
graph
build very
nice and so all it's happening here is
I'm actually just implementing the
control flow I want so I registered all
my nodes up here and here here's where
I'm just setting like the order of the
noes so you can see my entry points give
me retrieval I'm going to go from
retrieval to grading um and then I'm
going to add my conditional Edge again
it's following our diagram up here
retrieval grading Edge and then um we
can see here uh after web search I go to
generate and after generate I go to end
so that's all it's going to happen and I
can go ahead and compile that so
basically I can run my graph and let's
see if this works I kick that off so you
can see it ran our retriever so it's I'm
basically printing out the steps as we
go cool so we're doing our grading right
now so we're grading our documents first
document's relevant second document's
relevant and I can go over to lsmith
here and let's see what's going on so
it's kind of chugging along so I can see
I can really dig into this like I can
look at every every document getting
graded I can look at the individual
grade prompts the individual documents
it's all pretty nice it's all logged for
us so that's all our grading stuff and
it ran that's great so we just build a
little simple agent again it has memory
it has state it has planning it has a
control flow um and it uses tools it's
an agent and it ran locally all on my
laptop so that's cool that's kind of
step one now let's beef this up a little
bit I can throw in the selfrag stuff
which is what we see in our diagram up
here in green I just need two new
graders but it's going to use the same
stuff we just talked about I want to
grade the the generations for
hallucinations I want to grade the
generations for relevance to my question
so let's throw in uh two additional
graders here and um why don't I add them
up here just for convenience um so this
is me my hallucination grader let's kick
that off and I'm just doing a kind of
simple test here so all this is going to
do is really simply um determine whether
or not there's the the answer is
grounded in my documents so yes no right
if if it's grounded then yes otherwise
no um and same here so answer grader um
you know does my generation answer the
question again you can look at the
prompts I'll share all this code of
course and so that all runs cool so we
have that now all I need to add this is
actually pretty simple is just one
additional conditional Edge to My Graph
and let's scroll down to where my edges
are so you you before we defined decide
to generate as a conditional Edge and
again what was that doing that was
making this decision the next decision
is is this um you know hallucination
conditional Edge so basically if my
hallucination grader uh identifies that
there's hallucinations I'm going to feed
back and we will go ahead and see that
here shortly so
um here's additional conditional Edge
and this is actually going to wrap in
both of my checks so you can see what's
happening here is first my hallucination
grader looks at the generation R to the
documents it gets the grade um if the
grade is yes um then the generation is
grounded in in the document so that's a
good thing um and then we go ahead and
move on to test whether or not it's
grounded in the if relevant to the
question um and so it's basically going
to return three things um either the a
generation has hallucinations I it's not
supported by the documents um or the
generation is in which case then it's
either useful or it's not useful okay so
that's kind of how we set up our
conditional ledge and all we need to do
now is we can update basically our graph
build here and so what you're going to
do is we can map from the outputs of our
conditional Edge so remember we're
outputting either useful not useful and
not supported we map between those to
the associated nodes we want to go to so
if it's not you if it's not useful we're
going to fall back to web search okay so
that's basically this case so basically
if it's not use doesn't answer the
question we're kicking back to web
search right so that's kind of scenario
one uh if it's not supported we try
again we go back to generate otherwise
it's useful and we finish that's it so
let's go ahead and try that so we're
going to retrieve again we're going to
check our document relevance again
documents relevant so we can see our
agent rolling here this is always kind
of fun uh just kicking just chugging
along and I can close this down so it's
checking my relevance it determined
relevance is good it's doing generation
now so that's also cool let's also open
that up so we can see the whole Trace in
real time doing our grading here
generation so we again we can like
really dig in and then this is our this
is our second grading step you can
really drill drill into all these pieces
so I really like having my traces here
where I can actually see what's going on
under the hood in each of these pieces
you can see it's like really nicely laid
out you can kind of close this stuff
down if you don't want to see it this is
pretty cool right um so yeah we're going
generation then we're doing our grading
and so this is pretty cool so it did
generation it checked hallucinations it
found that the generation is grounded in
the documents um and then so the the
hallucinations are good and then it
found that the generation addresses our
question so that's really cool we're
really rolling here and we really only
need one more piece so let's just throw
in a router Router is pretty easy it's
going to build on what we just talked
about actually so router is just again
I'm going to use Json mode but here I'm
just going to basically say hey given
the question given what's in my Vector
store so I tell it the vector store has
uh llm agents prompt engineering and
adversarial attacks right um if the
questions related to those topics you
use Vector store otherwise fall back to
web search I tell it return either
Vector store web search that's it let's
just do a test here make sure that
actually works um so I'm passing a
question related to my Vector store it
determines yeah use the vector store
easy um so I want one more Edge route
questions um let me go ahead and throw
this in here so here's all my edges
let's just add this one cool so route
questions so again follows ex what we
did before we look at our question
invoke our router depending on the
router State um so I can probably get
rid of these extra prints H why not I'll
keep them um so basically if the source
is web search then go to we Sear search
if it's not good if it's Vector store go
to Vector store right really simple
stuff um cool let's go ahead and do
that um let's now build our graph so
we're going to set an entry point now so
my entry points the router it's going to
decide to go to web search or
retriever um and then you can see this
this control flow is the same as we had
before that's it nice and let's try that
so first we're going to Route the
question so it's kind of printing that
out and it decides to go to the vector
store as we expect that's great and now
it's following the same flow we talked
about before so we've just implemented
this final piece of the routing you can
see it kind of went to uh it went
correctly routed to our Vector store so
that's great let's look at the
trace and yeah so we can like dig into
all this we can look at the router yeah
so it makes the right decision let's go
to the vector store that's
fantastic um then it does retrieval then
it grades the documents we saw this
before so it's kind of nothing new it's
still chugging along here we can look at
the
generation okay and it looks like it
probably
finished very cool so that's kind of
everything now let's actually just s
check this again let's let's ask a
question like related to current events
uh okay so we can ask a question who are
the Bears expected to draft first in the
NFL draft let's see if that kind of all
flows as
expected we can route to we'll route to
web search
um check for
hallucinations we can actually look at
our land graph Trace just to kind of
Sandy check what's going
on
um cool the Bears are expected to draft
USC star yeah Caleb Williams this is
this is kind of the consensus pick so
that looks great so anyway we've seen in
relatively short period of time that we
can build a pretty complex rag flow as
you see right here with routing with
retrieval grading with different
interesting decision points a fall back
to web search uh grading of generations
for two different criteria we can build
this it runs reliably it runs locally on
my laptop it runs with llama llama 38b
and you know we can look at you can look
at Lang Smith traces here um we can look
at the latencies this whole thing ran in
14 seconds which is pretty good for
something that's running locally on my
laptop um and you can see this is like a
non-trivial rag flow introducing ideas
from three papers they were able to do
uh all locally
and again this idea of control flows is
really what allows you to kind of lay
these out in such a way that a local
kind of agent can actually run reliably
I think that's like really the important
point to to kind of bring home and uh I
encourage you to play with this I'll
make sure the code's public but um yeah
hopefully this is useful and feel free
to leave any comments thanks
Browse More Related Video
RAG From Scratch: Part 1 (Overview)
RAG Evaluation (Document Relevance) | LangSmith Evaluations - Part 14
Adding Agentic Layers to RAG
Online Evaluation (RAG) | LangSmith Evaluations - Part 20
Introduction to Query Pipelines (Building Advanced RAG, Part 1)
RAG Evaluation (Answer Correctness) | LangSmith Evaluations - Part 12
5.0 / 5 (0 votes)