LangChain Agents with Open Source Models!
Summary
TLDR This video introduces how to build applications that combine a language model and an embedding model using the LangChain framework. Specifically, it uses the Mixtral and Nomic Embed Text v1.5 models, both accessed through platforms that host them, and walks through building a LangChain agent on top of them. It also covers using Chroma as the vector database and using LangSmith, LangChain's debugging and observability tool. The video offers valuable information for developers interested in building LLM applications with LangChain.
Takeaways
- 😀 Using LangChain to build LLM agents on top of open source models like Mixtral and Nomic Embed
- 😀 Leveraging hosted platforms like Mistral AI and Fireworks to run the models
- 😀 Starting from a retrieval agent template and customizing it
- 😀 Using Chroma as a vector database to index and search documentation
- 😀 Splitting long text documents to fit the embedding model's token limit
- 😀 Swapping the arXiv retriever for a vector store retriever
- 😀 Adding routes and imports to integrate the template into the agent app
- 😀 Using LangSmith for debugging and observability of the agent
- 😀 Hosting the finished agent locally via LangServe
- 😀 Potential to improve ingestion process and document cleaning
Q & A
What language model is being used in this example?
-The language model is Mixtral, an open source model from Mistral AI; hosting is available through the Mistral AI platform, and the video accesses it through the Fireworks connector.
What embedding model is used to encode text for the vector store?
-The Nomic Embed Text v1.5 model is used as the embedding function to encode text for the vector store.
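A minimal sketch of instantiating that embedding function, assuming the langchain-nomic package is installed and a NOMIC_API_KEY is set in the environment for the hosted embeddings:

```python
# A minimal sketch, assuming `poetry add langchain-nomic` and a
# NOMIC_API_KEY set in the environment for the hosted embeddings.
from langchain_nomic import NomicEmbeddings

embeddings = NomicEmbeddings(model="nomic-embed-text-v1.5")
```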
What tools are used to split the ingested text into smaller chunks?
-The recursive character text splitter from LangChain is used to split the ingested documentation into 2000-character chunks with 100 characters of overlap.
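A minimal sketch of that splitting step, assuming `docs` comes from the document loader:

```python
# A minimal sketch of the chunking described above; `docs` is assumed
# to come from the document loader.
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=100)
chunked_docs = text_splitter.split_documents(docs)
```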
What is the purpose of using LangSmith in this example?
-LangSmith provides debugging and observability into the agent by allowing us to see what tools get called during execution.
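Enabling LangSmith tracing is a matter of setting environment variables; a minimal sketch (the project name below is illustrative):

```python
# A minimal sketch of enabling LangSmith tracing; the project name is
# illustrative, and a LangSmith API key is required.
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "<your-langsmith-api-key>"
os.environ["LANGCHAIN_PROJECT"] = "retrieval-agent-fireworks"  # optional
```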
Why can loading too many docs cause issues accessing the docs site?
-Loading too many docs can cause the hosting provider (Vercel) to block your IP address, which has happened to LangChain's office in the past.
How does the template allow structured JSON output without explicit JSON mode?
-The template pulls a ReAct-style prompt, authored by Harrison, from the LangChain Hub; the prompt primes the Mixtral model to produce valid JSON output with relatively high probability.
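For illustration, pulling and inspecting a prompt from the LangChain Hub looks like the sketch below; the exact prompt ID the template uses is not shown in the video, so `hwchase17/react-json` is an assumption:

```python
# A minimal sketch; the prompt ID is an assumption, as the video does
# not show the exact Hub reference used by the template.
from langchain import hub

prompt = hub.pull("hwchase17/react-json")
print(prompt.pretty_repr())  # inspect how the prompt primes JSON output
```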
What improvements could be made to the document ingestion process?
-Cleaning up the ingested documents and filtering out irrelevant sidebar content could improve relevance and reduce hallucinated responses.
What tool is used to host the agent as a REST API?
-LangServe is used to easily host the agent as a REST API that can be accessed through the provided playground.
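A minimal sketch of the server.py wiring; the import name below is an assumption, since the actual name is whatever the LangChain CLI prints when the template is added:

```python
# server.py: a minimal sketch; the imported chain name is an assumption
# (use the import line printed by `langchain app new`).
from fastapi import FastAPI
from langserve import add_routes
from retrieval_agent_fireworks import agent_executor

app = FastAPI()
add_routes(app, agent_executor, path="/retrieval-agent-fireworks")
# `poetry run langchain serve` then serves the playground at
# http://localhost:8000/retrieval-agent-fireworks/playground
```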
How are runnables used to pass data between modules?
-The RunnablePassthrough interface allows data to be passed between runnables without modification.
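For illustration, a minimal LCEL sketch where RunnablePassthrough forwards the user's question unchanged while a retriever fills in the context; `retriever`, `prompt`, and `model` are assumed to be defined elsewhere:

```python
# A minimal LCEL sketch; `retriever`, `prompt`, and `model` are assumed
# to be defined elsewhere (e.g., as in the rest of this walkthrough).
from langchain_core.runnables import RunnablePassthrough

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
)
# The question flows through RunnablePassthrough without modification.
answer = chain.invoke("What is a RunnablePassthrough?")
```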
Outlines
Introducing the Video and Models to Be Used
The video builds a LangChain agent using the Mixtral and Nomic Embed Text v1.5 models, hosted on platforms rather than run locally. It starts from the LangChain retrieval agent template and modifies it to use a vector store instead of a web retriever. Mixtral drives the agent's decisions, and the Nomic Embed v1.5 model produces the embeddings. A Chroma vector database is used locally, LangSmith is used for debugging, and LangServe hosts the API.
Initializing the Template and Testing It
The video initializes the retrieval agent template using the LangChain CLI and adds the necessary code to integrate the template into the app. After installing dependencies with Poetry, it starts the server and tests the agent by asking about 'Matryoshka learning'. The default arXiv retriever is used initially, which tries to find academic paper summaries.
Ingesting Documentation Pages for the Vector Store
The video sets up ingestion of some LangChain documentation pages into the Chroma vector store using the Docusaurus loader, taking care to ingest only a small section to avoid overloading the docs site. The pages are split to conform to the Nomic model's token limit. After ingestion, the retriever is switched from arXiv to the vector store.
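Putting those pieces together, a hedged sketch of what the ingestion script could look like; the sitemap filter URL is illustrative, and recent chromadb versions persist automatically when a persist directory is set:

```python
# ingest.py: a hedged sketch of the ingestion described above, assuming
# langchain-community, langchain-nomic, chromadb, beautifulsoup4, and
# lxml are installed. The filter URL below is illustrative.
from langchain_community.document_loaders import DocusaurusLoader
from langchain_community.vectorstores import Chroma
from langchain_nomic import NomicEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load only one section of the docs; never crawl the whole site, or the
# docs host may block your IP.
loader = DocusaurusLoader(
    "https://python.langchain.com",
    filter_urls=["https://python.langchain.com/docs/expression_language/"],
)
docs = loader.load()

# Split pages to stay under the embedding model's 8,192-token limit.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=100)
chunked_docs = text_splitter.split_documents(docs)

# Embed and persist to a local Chroma database.
embeddings = NomicEmbeddings(model="nomic-embed-text-v1.5")
vector_store = Chroma(embedding_function=embeddings, persist_directory="./chroma_db")
vector_store.add_documents(chunked_docs)
```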
Connecting the Agent to the Vector Store
The vector store and embeddings are connected in the agent code, replacing the arXiv tool with a vector store retrieval tool. After restarting the server, the agent is tested with questions about LangChain Expression Language concepts, verifying that it can now search the ingested documentation.
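A hedged sketch of the agent-side change; the tool name is illustrative, and the description mirrors the one written in the video:

```python
# A hedged sketch: reconnect to the persisted Chroma store and expose it
# as a retriever tool in place of the template's arXiv tool.
from langchain.tools.retriever import create_retriever_tool
from langchain_community.vectorstores import Chroma
from langchain_nomic import NomicEmbeddings

embeddings = NomicEmbeddings(model="nomic-embed-text-v1.5")
# Must match the persist_directory used by the ingestion script,
# otherwise the agent connects to an empty database.
vector_store = Chroma(embedding_function=embeddings, persist_directory="./chroma_db")
retriever = vector_store.as_retriever()

doc_store_tool = create_retriever_tool(
    retriever,
    "langchain_expression_language_docs",  # illustrative name
    "A tool that looks up documentation about the LangChain expression "
    "language. Use this tool to look anything up about LCEL or the "
    "runnable interface.",
)
tools = [doc_store_tool]  # replaces the arXiv tool in the agent
```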
Conclusions and Next Steps
The video showed a quick build of an agent with a vector store and open source models. Some issues were seen with the relevance of the ingested documents; as a next step, the ingestion strategy could be improved to address this. The video also noted how the Mixtral model can produce valid JSON output without explicit JSON mode.
Keywords
💡LangChain
💡Mixtral
💡Nomic Embed v1.5
💡Chroma
💡LangSmith
💡LangServe
💡Fireworks
💡Vector Store
💡Poetry
💡Docusaurus Loader
Highlights
We're going to build a LangChain agent on top of Mixtral and Nomic Embed Text v1.5
We'll be using the Fireworks connector, which enables forcing structured output from models via JSON mode
We'll be using the new Nomic Embed v1.5 model, which allows choosing the embedding length and dimensionality via Matryoshka representation learning
We'll index sections of the LangChain Python docs using the Docusaurus loader to answer questions
We split docs into 2,000-character chunks (the Nomic model has an 8,192-token limit), then ingest them into the Chroma vector DB
Transcripts
Hello and welcome to another LangChain build video. Today we're going to be building a LangChain agent on top of Mixtral and Nomic Embed Text v1.5, and we're going to be using both of these through platforms that host those models instead of running them locally. Lance is making a video later this week doing some LangGraph work with local models, so stay tuned for that as well. Let's get started. I'm Eric from LangChain.

First, let's dive into the platforms and models we're going to be using. First and foremost, we're going to be using LangChain. LangChain is the best way to develop LLM applications: it's a framework that allows you to connect to language models and embedding models and then stitch them together into a functional application. We also offer templates, which are reference architectures for how to build things. Today we're going to be starting from a retrieval agent template and then modifying it to use a vector store instead of a web-based retriever.

The language model we're going to use to make decisions in our agent is Mixtral, an open source model; Mistral AI has hosting built in through their platform, or you can host it locally or through anything else. We're going to be using the Fireworks connector today. Fireworks has an added bonus feature of JSON mode, which enables forcing structured output from some of these models. Funnily enough, the template today actually doesn't start from JSON mode, but we can play with that a bit if we have time.

The embedding model we're going to be using is the brand new Nomic Embed v1.5 model, which is an open source model. It has a lot of cool features, such as Matryoshka representation learning, which lets you choose the length, the dimensionality, of the embeddings you actually produce. The Nomic announcement, which will be linked in the description, has a cool write-up of how that works, and we're going to be using that v1.5 model.

For our vector store we're going to be using Chroma, just a local persisted version of Chroma as a vector database. We'll be using LangSmith, which is LangChain's first-party debugging and observability product, in order to see what's actually going on in our agent and what tools are getting called. And we're going to be using LangServe in order to host it. LangChain templates are all designed to be hosted as REST APIs, and LangServe is a FastAPI extension that makes it really easy to host chains and play with them in a playground, which we'll be using a bunch.

The initial template uses an arXiv retriever, which looks up academic paper summaries based on the query, so you'll see that for a little bit. Then we'll be using a community-contributed Docusaurus loader in order to index a few pages of the LangChain Python docs, so that we can ask questions about them. And with that, let's dive in.
The template we're going to start with you can find on templates.langchain.com; that's where all of our templates are. We'll search for our retrieval agent. We're going to be starting with the Fireworks one, which uses an open source model; the base one uses an Azure-hosted OpenAI model instead, but today we're focused on open source models. We can just follow the instructions that are in here.

First, let's install the LangChain CLI, which we're going to use to pull down that template. I actually have the command here already: we run a `langchain app new` command to create a new application, and we initialize it with the retrieval-agent-fireworks package. We'll skip installing dependencies with pip and use Poetry instead.

It spits out a helpful message, which gives us some code that we'll want to add to our app's server.py file, where we're supposed to add our routes. We can even put the import at the top to make VS Code a little bit happier with us. Here we're just going to import our default retrieval agent, and we add it at our retrieval-agent-fireworks route here.

Then we're going to run it. First we need to do a `poetry install` to install our dependencies; the CLI has helpfully added the package to our pyproject.toml, so Poetry knows where to find it. Then we can do a `poetry run langchain serve` to access that playground. Oh, it looks like I didn't save my server file; if I save it, it reloads, and it works better now. We can see that we have a little playground link here for the path, which is going to be at localhost:8000/retrieval-agent-fireworks/playground.

To start, let's ask it a question it's going to want to use the arXiv tool for, which can be: what is Matryoshka learning? It looks like it's looking up Matryoshka learning on arXiv, and we can follow its progress in LangSmith, where hopefully it's going to look that up. It looks like it wasn't successful at actually pulling something out, because the two arXiv searches for Matryoshka learning had summaries that were not particularly helpful at describing what it is, but that's okay. Here it retrieved a PhD thesis as well, which is probably a little bit in the weeds for Mixtral to be able to synthesize what Matryoshka learning is. But that's all good: we'll go through the code and see how we can swap out this arXiv retriever for one over some of our own documents instead, which is the end goal of today.
So to do that, let's start by ingesting some documents. Today we're going to be using our Chroma connector; this is a LangChain Community vector store, so we can import it as such down here. We're going to be persisting it to disk, just so we can have a separate ingestion script, which will download all the documents and put them in there, and then connect to the store in our agent through a retrieval tool. We're also going to be using our Nomic embeddings, which will allow us to use that v1.5 text model. (In the docs we use the v1 model, because the v1.5 model is coming out with the release of this video tomorrow.)

So let's actually create an ingestion script. We'll call it ingest.py and put it in the root of our project. First we'll import our Nomic embeddings and Chroma: from langchain_community.vectorstores we'll import Chroma, and from langchain_nomic we'll import NomicEmbeddings. We'll need to add a dependency for the Nomic one; Chroma will already be available because the template had LangChain as a dependency. So here we'll `poetry add langchain-nomic` to get access to that package.

We'll first instantiate our embeddings, which we want to be based on our nomic-embed-text-v1.5 model, and then we'll have Chroma, which we'll call our vector store, with an embedding function of those embeddings, and we'll set a persist directory of ./chroma_db. Then we will ingest some documents. Actually, while we're here, let's skip doing these test documents and jump straight into using the Docusaurus loader to populate the store with a section of the LangChain docs.

To do that, we can look up our Docusaurus loader in the LangChain docs. We're going to need two more dependencies at ingestion time: Beautiful Soup and lxml, in order to load the sitemap as well as the documentation pages. We won't actually need those at runtime for our app, so we don't need to add them as explicit dependencies; instead of `poetry add`, we can just `poetry run pip install` those two, just so I can access them locally. You can achieve the exact same thing by adding optional groups, so you can have an ingestion group in your Poetry file; either works.

Then we'll start to use that document loader. We can skip the nest_asyncio step, since we're not doing this in a Jupyter notebook; we're doing this in an actual Python script. This is how we configure our loader: we're just going to index one section of our docs. If you do this yourself, please, oh please, do not run this against the entire docs. We have a lot of docs on our docs site, so Vercel, who hosts our docs, will block your IP address and not let you access it anymore; that has actually happened to our office a few times from running this. So we're going to filter the sitemap URLs with this here. By default this does not get the pages that we want; we can instead just take everything under the expression language, which will get all the pages about the expression language, so we can ask questions about the runnable interface and that kind of thing. That's going to get our documents.

Next, we're going to want to split those documents a little bit. Some of those pages are too long for the Nomic embedding model, which has an 8,192-token limit, so let's do some text splitting. I think that's in here under Retrieval, Text Splitters. We're just going to use the main recommended one, which is the recursive character text splitter; we can do `from langchain.text_splitter import RecursiveCharacterTextSplitter`, and we can configure that with a chunk size and a chunk overlap. Since we have an 8,000-token limit, let's just do 2,000 there. Or rather, this is actually characters, so a 2,000-character chunk size with about 100 characters of overlap should work fine.

Then we'll want to split documents from that, so we can do `text_splitter.split_documents(documents)` and call the result `chunked_docs`, and then we call `vector_store.add_documents(chunked_docs)`. We can try this with `poetry run python ingest.py`, and hopefully this will create a chroma_db folder in here, which will persist all of those. It's complaining that it couldn't import the chromadb package because we haven't added it; that was a step I forgot, so we can `poetry add chromadb` to address that and then try running ingestion again. Hopefully this will fetch those pages. It looks like it's fetching 29 pages and then adding those documents at the end, so so far so good.

One thing to mention is that I do have a Nomic API key set in my environment. You will have to get one of those yourself if you use the hosted Nomic embeddings; obviously, if you use one of our other integrations, like a local one, for your embeddings function, you won't need that. If you follow along in the Nomic docs, they'll walk you through it.
With that, let's actually go and switch up our package a bit to give our agent the ability to search our documentation. Instead of this arXiv tool, we're going to want a retrieval tool based on our Chroma connection. So we first connect to our vector store, and we can actually use the exact same code that we have up here, except we won't need the Docusaurus loader or the text splitter, since we're not ingesting any documents here; and we can move those imports to the top. That gives us our embeddings and our vector store. We can create our vector store retriever, which is going to be `vector_store.as_retriever()`, and then we can create our vector store tool, which Copilot is nicely formatting for me. Let's call this our doc store tool, and let's write a slightly prettier description: a tool that looks up documentation about the LangChain expression language; use this tool to look anything up about LCEL or the runnable interface. Then we pass that in as our description. So that's our vector store tool, and instead of our arXiv tool we'll pass in our vector store tool here; we can even delete the arXiv tool up here if we want to. That should connect our agent back to our vector store: we've given it another retrieval tool.

Let's try running that and see if we get any errors. Note that an important part here was that our persist directory is set the same as in our ingestion script, because otherwise it would connect to an empty database.
So far so good. Let's go back to our playground, and then we can ask: what is a RunnablePassthrough()? (I probably didn't need to include the parentheses there.) It's looking up documentation for that because it's a runnable, so Mixtral did a good job there. Let's try opening this in LangSmith instead. The final output says it allows passing data through the chain without any modifications, and we can see which chunks of documents it actually retrieved here. It looks like it got some sidebar content; I actually don't even see RunnablePassthrough mentioned in that one, and it's mostly sidebar, so it may have hallucinated that. Let's try without the parentheses and see if it grabs anything different.

So we can look at this run, and this time it actually tried looking it up twice. It got our RunnablePassthrough page about passing data through, which is good, and on the second one it searched for binding runtime args, which got something else, and it was able to produce the final output, which was actually pretty similar. But it's producing some references while producing that final answer, which I feel pretty good about. So we have tested that, and this was actually on our Docusaurus docs, so we had a lightning-quick build video today.
Let's try some other things. Let's ask about runnables: how do I pipe the output of one runnable into another in LCEL? And let's see if it's able to get a relevant document. Here, the first time it calls the tool, we're getting some sidebar again; this shows how important setting up your ingestion is. A future exercise would be to clean up some of the documents we ingest to make sure the relevance is very high. Here we're talking about what LCEL is good at, then here is an overview again, and then here we're searching for passing data between runnables in LCEL, which got all sidebar again. I think these are probably more shortcomings in how I was loading the documents, and in a future video we can do some fancier ingestion strategies on these.

But here, oh sorry, here was the output about piping the output of one runnable into another, and this one actually seemed to answer more about RunnablePassthrough and intermediate content before producing the final answer. It's saying that we're supposed to use a RunnablePassthrough, because our documentation returned more about RunnablePassthrough than RunnableSequence, which is a little bit too bad. As you can see, there's definitely some prompt engineering and document cleaning to be done when setting up one of these systems, especially when using open source models. We can do some work to improve that
in a future video.

One thing I want to emphasize here is that this is using a pretty cool ReAct construct in order to produce the JSON blob. If we look at the code, we can see that the prompt is being pulled from Harrison's prompt on the LangChain Hub, so we can look it up and see how it's formatted. What it's doing is basically priming the Mixtral model to produce valid JSON with relatively high probability. So even without JSON mode, this Mixtral model is able to produce good agent tool calls without much custom prompt engineering, aside from using Harrison's prompt here. And then, of course, we're using our Nomic Embed v1.5 to actually retrieve things from the documentation, and our Chroma DB to host all of that. Thanks for tuning in!