Building open source LLM agents with Llama 3
Summary
TLDR: This video walks through how to build an open source LLM agent. Lance from LangChain explains the core components of an agent (planning, memory, and tool use) and shows how to implement them with Llama 3. Using Groq as the LLM, Tavily for web search, and LangSmith for tracing, he demonstrates how an LLM is given awareness of external tools and returns the payload needed to invoke them. He then builds an agent that combines the LLM with its tools using LangChain and LangGraph, showing how a diverse set of tools lets the agent carry out complex tasks.
Takeaways
- 🧠 Key point: Lance from LangChain notes the high interest in building agents with open source LLMs and walks through how to do it.
- 🛠️ Tool use: the central components of an agent are planning, memory, and tool use. Tool use means giving the LLM awareness of external tools and having it return the payload needed to invoke them.
- 🔧 Binding tools to the LLM: any function can be bound to the LLM as a tool, and the LLM returns the payload needed to run that tool.
- 📚 LangChain's tool decorator: a mechanism for turning an arbitrary function into a tool and binding it to the LLM.
- 🔍 Web search tool example: web search is implemented as a tool the LLM can use.
- 🤖 Response prompting: the LLM acts as a helpful assistant, answering with web search or a custom function as appropriate.
- 🔗 Tool-calling flow: how the LLM decides whether to call a tool, and how the response flows from that result.
- 🌐 LangGraph: LangGraph lays out the agent's flow and can handle complex flows that contain cycles.
- 🔄 State and nodes: in LangGraph, state persists across the graph and is accessible to every node.
- 🎨 Adding diverse tools: Replicate is used to add text-to-image, image-to-text, and text-to-speech tools to the LLM.
- 🌟 Agent flexibility: agents are general purpose and can combine many different kinds of tools.
Q & A
What is Lance from LangChain interested in building?
-Lance from LangChain is highly interested in building LLM agents using open source LLMs (Large Language Models).
What are the central components of an agent?
-The central components of an agent are planning, memory, and tool use.
What does tool use mean?
-Tool use means giving the LLM awareness of the external tools that exist and having the LLM return the payload necessary to invoke those tools.
What does the Magic Function example do?
-Magic Function is a simple function that adds two to a given input. The LLM is made aware of this function, decides based on the user's input whether it needs to be run, and returns the necessary arguments.
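The mechanism can be sketched in plain Python. This is an illustrative stand-in for LangChain's `@tool` decorator, not its actual implementation; the `TOOLS` registry name is invented for the example. The point is that the decorator captures metadata (name, description, argument names) that can be surfaced to the LLM.

```python
import inspect

TOOLS = {}  # registry of tools the LLM will be told about

def tool(fn):
    """Illustrative stand-in for LangChain's @tool decorator:
    record the function's name, description, and argument names."""
    TOOLS[fn.__name__] = {
        "fn": fn,
        "name": fn.__name__,
        "description": (fn.__doc__ or "").strip(),
        "args": list(inspect.signature(fn).parameters),
    }
    return fn

@tool
def magic_function(input: int) -> int:
    """Apply a magic function to an input."""
    return input + 2

# The metadata (not the code itself) is what gets handed to the LLM.
print(TOOLS["magic_function"]["args"])  # -> ['input']
print(magic_function(3))                # -> 5
```

In real LangChain the metadata is carried as a Pydantic schema rather than a plain dict, but the shape of the information is the same.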
How are tools bound to the LLM when using Groq?
-With Groq, an arbitrary function can be turned into a tool and bound to the LLM, and the LLM can then answer a natural language question by returning the tool's name and arguments.
What does the payload the LLM returns when calling a tool contain?
-The payload contains the name of the tool to call and the arguments to pass to it.
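Since the model only returns strings, the actual invocation happens in the surrounding code. A minimal sketch of that dispatch (the `run_tool_call` helper and the exact payload shape here are illustrative, though they mirror the `tool_calls` object described above):

```python
def magic_function(input: int) -> int:
    """Add two to the input."""
    return input + 2

TOOLS = {"magic_function": magic_function}

# The LLM's entire contribution: a payload naming the tool and its arguments.
tool_call = {"name": "magic_function", "args": {"input": 3}}

def run_tool_call(call: dict):
    """Look up the named tool and invoke it with the supplied arguments."""
    fn = TOOLS[call["name"]]
    return fn(**call["args"])

print(run_tool_call(tool_call))  # -> 5
```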
What is LangGraph, and how is it used to build agents?
-LangGraph is a way to lay out flows, including flows with feedback (cycles), and it is used to build the agent. State persists over the course of the graph and is accessible from every node in it.
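The state-and-nodes idea can be illustrated with a plain dict of messages that every node reads and appends to. This is a conceptual sketch, not the LangGraph API; the node behaviors below are hard-coded stand-ins for the LLM call and the tool call.

```python
def assistant_node(state):
    """Stand-in for the LLM node: reads the shared message history."""
    messages = state["messages"]
    last = messages[-1] if messages else None
    if last and last[0] == "tool":
        # A tool result came back: answer in natural language.
        messages.append(("ai", f"the result is {last[1]}"))
    else:
        # Otherwise request a tool call.
        messages.append(("ai", "tool_call: magic_function(3)"))
    return state

def tool_node(state):
    """Stand-in for the tool node: runs the tool and records the result."""
    state["messages"].append(("tool", "5"))
    return state

# The state lives across the whole flow and every node can access it.
state = {"messages": [("human", "what is magic function of 3?")]}
for node in (assistant_node, tool_node, assistant_node):
    state = node(state)

print(state["messages"][-1])  # -> ('ai', 'the result is 5')
```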
What is the conditional edge that decides whether the LLM calls a tool?
-The conditional edge checks the LLM's output to see whether a tool was called; if so, it routes to the tool node, and if not, it ends.
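A conditional edge is just a routing function over the LLM's output. A sketch of that check (the names `tools_condition` and `END` mirror LangGraph's conventions, but this is an illustration rather than the library's code):

```python
END = "__end__"

def tools_condition(ai_message: dict) -> str:
    """Route to the tool node if the LLM asked for a tool, else end."""
    return "tools" if ai_message.get("tool_calls") else END

# Tool requested -> route to the tool node.
print(tools_condition(
    {"tool_calls": [{"name": "magic_function", "args": {"input": 3}}]}))  # -> tools

# Plain natural-language answer -> finish.
print(tools_condition(
    {"content": "The capital of the US is Washington, DC.",
     "tool_calls": []}))  # -> __end__
```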
What capabilities can the Replicate service add to the LLM?
-Replicate can add a variety of models to the LLM as tools, such as text-to-image, image-to-text, and text-to-speech.
How is a new tool added to the LLM with Replicate?
-To add a new tool via Replicate, you wrap a function that calls the model with the tool decorator and then add the new function to the tools list.
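In outline, extending the agent is just extending the tool list and rebinding. A hedged sketch: `text_to_image` below is a placeholder that fakes the Replicate call (a real implementation would call `replicate.run` with a chosen model) so the shape of the change is visible without the service.

```python
def magic_function(input: int) -> int:
    """Add two to the input."""
    return input + 2

def text_to_image(prompt: str) -> str:
    """Placeholder for a Replicate-backed text-to-image tool; a real
    version would call replicate.run(<model>, input={"prompt": prompt})."""
    return f"image-url-for:{prompt}"

tools = [magic_function]
tools.append(text_to_image)  # the only change the agent needs

# llm_with_tools = llm.bind_tools(tools)  # rebinding picks up the new tool
print([t.__name__ for t in tools])  # -> ['magic_function', 'text_to_image']
```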
How does this walkthrough demonstrate the agent's flexibility and extensibility?
-Beyond a simple function and web search, it attaches diverse models to the LLM through Replicate, showing how easily an agent's capabilities can be extended.
Outlines
🤖 Introduction to tool use for LLM agents
Lance from LangChain explains how to build an LLM agent. Drawing on Lilian Weng's blog post, he describes the three central components of agents (planning, memory, and tool use) and walks step by step through tool use with Llama 3. He explains how to bind tools and give the LLM access to external tools, then demonstrates with Groq how the LLM answers natural language queries using web search and a custom function.
🔄 Building an agent with LangChain and LangGraph
Using LangChain's tool decorator, any function can be turned into a tool and bound to the LLM. LangGraph then lays out the agent's flow: a conditional edge routes to the tool node when a tool is called, and the tool's response is returned to the LLM for processing. A simple demonstration with the "magic function" traces the process all the way to a natural language answer.
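The flow above can be condensed into a pure-Python sketch of the control loop; LangGraph expresses the same thing declaratively with nodes and edges. `fake_llm` below is a hard-coded stand-in for the tool-calling model, so this shows the orchestration logic only, under that assumption.

```python
def magic_function(input: int) -> int:
    """Add two to the input."""
    return input + 2

TOOLS = {"magic_function": magic_function}

def fake_llm(messages):
    """Stand-in for the tool-calling LLM: request the tool once,
    then answer in natural language after seeing the tool result."""
    last = messages[-1]
    if last["role"] == "tool":
        return {"role": "ai",
                "content": f"The result is {last['content']}.",
                "tool_calls": []}
    return {"role": "ai", "content": "",
            "tool_calls": [{"name": "magic_function", "args": {"input": 3}}]}

def run_agent(question: str) -> str:
    messages = [{"role": "user", "content": question}]
    while True:
        ai = fake_llm(messages)              # assistant node
        messages.append(ai)
        if not ai["tool_calls"]:             # conditional edge: no tool -> end
            return ai["content"]
        for call in ai["tool_calls"]:        # tool node
            result = TOOLS[call["name"]](**call["args"])
            messages.append({"role": "tool", "content": str(result)})

print(run_agent("what is magic function of 3?"))  # -> The result is 5.
```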
🌐 Extending the agent with more complex tools
Replicate is used to add multimodal capabilities to Llama 3: text-to-image, image-to-text, and text-to-speech models are added as tools. Binding these tools to the LLM agent extends its ability to respond to more complex queries.
🎨 Agent versatility and tool flexibility
As an example of the agent's versatility, the text-to-image tool converts a given prompt into an image. The variety of models accessible through Replicate extends the LLM agent's capabilities to handle complex tasks.
Keywords
💡LLM agent
💡Tool use
💡Groq
💡Tavily
💡Llama 3
💡LangChain
💡Conditional edge
💡LangGraph
💡Replicate
💡Text-to-image
💡Text-to-speech
Highlights
Discussion of building agents with open source LLMs, specifically using LangChain and Llama 3.
Lilian Weng's blog post outlines the three core components of agents: planning, memory, and tool use.
Demonstrates how LangChain turns arbitrary functions into tools the agent can use.
Uses Groq as the language model (LM), Tavily for web search, and LangSmith for tracing.
Explains how the LLM recognizes external tools and returns the payload needed to call them.
Demonstrates binding a custom function and web search to the LLM.
Shows how the LLM decides whether to call a tool based on user input, returning the function name and arguments.
Notes the importance of an LLM that has tool use enabled via fine-tuning or prompting.
Introduces LangGraph, a way to lay out flows, particularly flows with cycles.
Shows how the agent is defined in LangGraph with two nodes: an assistant node and a tool node.
Explains conditional edges, which decide whether to call a tool based on the LLM's output.
Demonstrates building the full agent flow in LangGraph and visualizing it as a graph.
Tests the agent flow, showing calls to both the custom function and the web search tool.
Discusses using the Replicate service to add multimodal capabilities to Llama 3.
Shows how text-to-image, image-to-text, and text-to-speech tools are integrated into the agent.
Shows how Llama 3's capabilities can be extended by integrating different models through Replicate.
Finishes by building a complex agent with multiple tools using LangChain and LangGraph.
Transcripts
hey this is Lance from LangChain we've seen very high interest in building llm agents using open source llms and so we wanted to talk through how to do that from scratch using llama 3 so first what is an agent so Lilian Weng has a very nice blog post that laid out the central components of agents being planning memory and tool use so I want to walk through these components individually and how I can use them with llama 3 so first let's talk about tool use I'm going to copy over some code and we're going to walk through it so in this notebook I've done a few pip installs set a few API keys we'll use Groq as our LM we'll use Tavily for web search as one of our tools and we'll use LangSmith for tracing but that's all I've done here
okay and I'm going to kind of have this
image side by side so we can look at it
so first tool use what's the big idea
here the big idea is simply this I want
to take an llm give it awareness of some
external tool that exists and have the
llm return the payload necessary to invoke that tool that's really all that's going on now this is often kind of confused and I wanted to zoom in and explain this exactly so let's say I have a function called magic function which takes an input and adds two to it I want to give an llm the ability to recognize whether or not to invoke this function and to return the payload necessary to run the function given the user input so here's
exactly what I want to have happen I
want to take that function somehow bind
it to my llm and give it an input then
return both the function name itself and
the arguments necessary to run the
function remember llms are just string
to string right it doesn't have the
magic ability to call that function
natively but what it can do is return
okay I've seen this function I know it
exists and I'm going to give you exactly
like the input format necessary or the
payload to run the function as well as
the name of the function okay so that's
really all that's going on so first this
tool decorator in Lang chain allows you
to take any arbitrary function just turn
it into a tool and let's just kick this
off so here's my magic function and
here's a web search function so these
are two things that I want to kind of
turn into tools and I can do that right
here so we can run this now if I look at
Magic function now it's a structured
tool it has a name it has a
description and um it also has that
input or arguments captured as a Pydantic schema okay so all this
information can be passed directly to
our llm that's the key point so this
allows us to go from arbitrary functions
to tools that can be bound to an llm
okay so that's kind of step one now step
two this is where things are kind of
interesting I'm going to use Groq here
and I'm going to use a prompt I'm
basically going to say you're helpful
assistant with two tools web search and
a custom function use web search for
current events use the magic function if
the user directly asked for it otherwise
just answer directly okay so that's kind
of my prompt and let's test this in two
cases to explain exactly how this works
okay so all I'm doing I'm using ChatGroq setting Llama 3 70B and I'm creating this uh runnable
this is kind of a lang chain primitive
for it basically invoking llm so that's
all I've done now here's what's
interesting this is piping the prompt to
an llm and I've bound my tools to the LM
so this is automatically taking those
tools we defined and it's basically
giving them to the LM such that it's
aware of them so it's that's basically
represented in this red box here you
take external tools and you basically
bind them to the LM so the LM is aware that
they exist that's kind of step one now
here's step two I can basically take a
question so I'm going to ask what is
Magic function 3 I'm going to invoke my
runnable or my chain right with this and
let's see what happens I'm going to run
this now here's what's interesting that
payload contains an object tool calls
which contains the name of the function
and the arguments that's it so that's
the key thing and I can look at the raw
payload as well so the raw payload is
just simply this AI message it contains
you know a bunch of information but
here's the main thing it contains
basically um the name of the function to
call and the arguments pass to the
function so again that's exactly
represented here all that's happening is
I've taken a function I've turned it
into a tool I've bound it to my llm I
can ask a question natural language and
the llm can respond directly with the function to call or the tool to use
and the input argument to use based upon
the user input that's the key point and
that's really all that's happening
function calling that's all I need you
to know okay so here's the other key
thing what if I just ask a question
about the United States based on my
prompt it should not try to invoke any
of these tools now let's test that I
run this good and so this payload tool
calls empty I can look at the raw
payload and yeah now it's just a chat
response right the capital of the US is
Washington DC great okay so that's it so
hopefully now you understand how tool
use works and now remember this requires
an LM that's actually been fine-tuned or
prompted or otherwise is compatible with
tool use and this is a very important
Point uh we talked to the folks at Groq they have kind of a proprietary implementation for how they do this um which we don't know fully but it is reported that it works very well with Llama 3 70B and in my experience I've seen it indeed work quite well
so in any case the key point is this I
can take any arbitrary functions I want
I can turn them into tools I can then
pass those tools to an llm I can bind
them and then you can see right here
when I invoke my llm with a question the
LM makes decision to use one of the
tools and if it does it's going to
return to you the name of the tool it
wants to use and the input argument
that's the key Point okay so that is
really what uh you need to know about
tool use now we get to the fun stuff
we're going to build the agent and for
this I'm going to use LangGraph and I'm going to explain kind of how this works over time but first the way to think about LangGraph is basically it's a way to lay out flows and flows in particular with LangGraph are often characterized by cycles so the ability to kind of do feedback and that's really relevant for agents and we'll explain why here shortly so LangGraph basically takes a state which can live over the course of your graph or flow and it can be accessed by all kind of what we're going to call nodes in
your graph okay so first as state I'm just going to define a set of messages and
don't worry too much about this for now
this will all make sense in about a
minute okay now here's where things are
going to get interesting I'm going to
Define an agent that contains two nodes
okay so first we're going to take our
input again it's a human message we pass
that to our LM which has the bound tools
the llm is going to make a decision to
use a tool or not and we just walk
through this that's the step one that's
this thing we've already seen right now
what we're going to do in LangGraph is we're going to add basically what we're going to call a conditional edge so
this Edge is going to all it's going to
do is say was there a tool call or not
if there was a tool call I'm going to
route that over to a separate node that
basically runs the tool so let's walk
through with our example we just did um
what is Magic function of
three the llm made the decision to invoke the magic function and it gave us the payload right we just saw that so that's arguments input
is three name is Magic function those
get plumbed over to what we're going to
call tool node which actually invokes
the necessary tool so it's going to
basically take in this name magic
function it's going to look up magic
function itself and it's basically just
going to run that function with this
input payload and then it's going to
return that as a tool message to the llm
that's all it's going to go on llm is
going to see that tool message it's
going to make a decision about what to
do next and eventually this is going to
keep running until there's a natural
language response
and this in this kind of toy example the
tool message would return with the
result of five that would be returned to
the LM the LM would see that and say
okay the result is five and then you
would exit so that's like the toy
example we want to we want to see now we
can implement this all in line graph
really easily and let's actually just
talk through that quickly I've copied
over the code here so all basic we've
defined here is we have this assistant
so this is basically just wrapping the
chain that we defined up here this
assistant runnable we just wrap that and
basically all doing here is we're adding
a
retry so basically if a tool is if a
tool is called then we're good that's
valid if it has meaningful text we're
good but otherwise we do reprompt it
that's all we're doing here right we're
just making sure that the llm actually
return to valid response so that's
really all to worry about here there um
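that retry check can be sketched in plain Python like this (an illustrative stand-in, not LangChain's actual retry runnable: re-invoke the model until the response contains either a tool call or non-empty text)

```python
def with_retry(call_llm, max_attempts=3):
    """Wrap an LLM call so invalid (empty) responses are re-prompted."""
    def invoke(messages):
        for _ in range(max_attempts):
            resp = call_llm(messages)
            # Valid if it asked for a tool or produced meaningful text.
            if resp.get("tool_calls") or resp.get("content", "").strip():
                return resp
            # Otherwise reprompt and try again.
            messages = messages + [
                {"role": "user", "content": "Respond with a real output!"}]
        raise RuntimeError("no valid response after retries")
    return invoke

# A model that returns an empty reply once, then a valid one.
replies = iter([{"content": ""}, {"content": "The result is 5."}])
assistant = with_retry(lambda msgs: next(replies))
print(assistant([{"role": "user", "content": "magic function 3?"}])["content"])
```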
we're also creating this tool node so
this tool node basically just will try
to invoke the tool um and we're going to add a little thing to handle errors in the feedback these are all just like utility functions so don't
really worry too much about them now
here's kind of the interesting bit we're
just going to build the graph and it's
going to look exactly like we show here
so what we're going to do is we're going
to add a node for our assistant right
we're going to add a node for our tool
node and that's kind of this piece and
this piece that's our tool node um and
then we're going to add this conditional
Edge which is a tools condition which is
all it's going to be is this piece it's
basically going to take the result from the LM was a tool called if yes go to the tool node if no end and we can
Implement that right here um so this
tools condition that's all it's going to
do it's basically going to return either
a tool is invoked or end um and then we
go from tools back to the assistant now
let's run all this and we can see what's nice about LangGraph is it'll automatically lay this out as a graph for us we can visualize it here so
what's going to happen is we're going to
start we're going to invoke our
assistant um our assistant will in some
cases um ask to use a tool it'll go then
go to the tool node the tool will be
invoked that'll return to the assistant
and that will continue until there's
natural language response and then we'll
end that's it nice and easy so let's
actually test this
out
um and I'm going to go ahead and ask a super simple question so I have kind of two questions what's magic function 3 and what's the weather in SF let's ask the first question what's magic function 3 boom so we're going to run this now I'd like to go over to LangSmith and look at the result here so let's actually just walk through this this basically allows us to see we started we went to our assistant and these are the functions available to our assistant so you know we gave it magic function
the prompt what's Magic function 3 and
what we get as an output is again the
function to use and the payload to pass
to the function so again remember this
is kind of always a little bit of a
confusing thing an llm can't magically call functions an llm is typed string to string it can return strings um and it ingests strings so that's fine all it's
going to return in this particular case
is just the payload to run the function
as well as the function name but that's
it that's all the LM is responsible for
then what we need to do is we have this
tools node see that's here that will
then invoke our function and so you can
see the input is just the argument the output is you know 3 + 2 = 5 great
now this goes back to our llm
and then our llm just simply sees okay
it sees this tool message that the
function was called here's the output of
five and it returns natural language the
result of magic function is five and
then we end that's it nice and simple
and we can see that also kind of laid
out here here's our human message this
is the AI message um so basically the AI
makes a decision to invoke the tool and
it gives you the input payload then
here's the output tool message saying I
ran the tool here's the output the llm
gets that back and basically gives you
natural language and then based upon our
condition here this tools condition if
it's natural language it ends if it's a
tool invocation it goes back to the tool
node right so that goes to here um so in
this particular case it went back to the
assistant and now it's a natural
language response which means we just
end that's
it so that's kind of a nice and simple
example now let's try something slightly more complicated let's try our other tool so let's try what's the weather in SF right now um so we're
going to try to run that and cool we can
actually see that it's going to call our
web search endpoint that's great it gets
this kind of raw tool message back
from the endpoint and then the AI will
synthesize that into uh you know the
weather is 60° right now with mist okay
so that's really it this explains how
you can lay out arbitrary agents with
llama 3 open source llm uh we use ChatGroq to do that Groq has been adapted for tool use and that's the main important thing you need to recognize that you need an LM that actually has tool use enabled via prompting or fine tuning or otherwise um and what you can see is if
we kind of go back to the
diagram what we've done here is we're using LangGraph to kind of orchestrate this process and what's going to happen is you take a question in our LM makes the decision based on the question to invoke a tool and then this conditional edge will determine hey if a tool is invoked then go to the tool node
and actually execute the tool the tool
is executed you get a tool message back
with the tool output send that back to
the LM LM reasons again and it could
make a decision to call another tool but
in our particular case in both cases the
tool message output was returned to the
LM the LM then responds in natural
language here is the solution and
because of that we end and that's it
that's kind of how to build an agent
from scratch using an open source llm
llama 3 with LangGraph to orchestrate it hopefully um from kind of very simple components and first principles and again the key thing here really is the ability for an LM to reliably invoke tools so we
talked through the case of adding two
tools magic function and web search to
our agent now let's say we wanted to
make this a little bit more complicated
and try some additional tools so
replicate is a service that allows you to access many different uh models which is
really convenient and I'm going to go
ahead and use it uh to augment llama 3
with a few multimodal capabilities so
all I've done is I've set my replicate
API key so I've actually already done
that I've import replicate and I'm going
to use a few different things here so
I'm going to do a text to image tool which is going to call this particular model which is basically an open Dalle model which will go text to image um I'm going to create again another tool image to text in this case take an image in it'll use a version of LLaVA to then produce text from the image and text to speech this
is another option so really all you need
to do here is very simply just again use
this tool decorator with a function
definition that invokes the model of
choice so now the question is how do we
add these as tools to our agent so again
it's kind of like before all we need to
do is just update our tools list to
include some of our new functions here
that's it pretty simple now that tools
list is already bound to our uh
agent here so let's just go ahead and
kind of rerun everything just to make
sure this all works and I'm going to do
here is just update my question list to
include a few new questions that related
to my new tools and let's go ahead and
try one so let's say I want to try my index 2 question so questions two and this is going to be my question related to text to image so let's basically say I'll kick this off and I'll go back and show you um so this is going to hopefully invoke the text to image tool based on this prompt a yellow puppy running free with wildflowers in the mountains behind so that's our prompt we're going to pass it to our text to image
tool and it looks like that has been
called correctly so that's great now we
can also go over to LangSmith I can check my projects here uh cool here's my agent here it is running so we can also look at the trace to confirm that everything's working so cool so it looks like it is calling the text to image tool so that's fantastic that's running right
now great so our tool ran now we can
check our image here and look at that
very nice so again this is just showing
you the ability to create agents that
have many different types of tools again
previously we only had covered uh kind
of two very simple tools a magic
function web search but we can actually
do pretty interesting things so this
actually shows how you can take
replicate for example and basically
invoke many different llms hosted by replicate or you know not just llms but different types of models so this is a text to image model image to text and so forth text to speech basically to augment llama 3 and give it multimodal
capabilities so in any case it's a
really nice kind of illustration of the
fact that um agents are very general and
tools can be composed of many different
kinds of things in this particular case
different models through replicate which we can attach to llama 3 to augment its capabilities thanks