Self-correcting code assistants with Codestral
Summary
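TLDR: Lance introduces Codestral, a code generation model. The model excels at programming-language tasks such as code completion and also supports tool use. Code generation models suit the custom code assistants that many companies want, and it is easy to evaluate whether generated code executes. The video adopts the flow engineering idea for code generation proposed by Codium AI: after generating code, test it inline, and on failure loop back and retry. This approach produces better results and is demonstrated in combination with LangChain's tool use.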
Takeaways
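- 😀 Lance from LangChain announces that Mistral has released Codestral, a code generation model.
- 🔥 Codestral excels at code generation tasks and is well suited to fill-in-the-middle and code completion.
- 📚 Codestral is trained on programming languages and has an instruct version that supports tool use.
- 🛠️ Lance says code generation models are very useful and notes that many companies want custom code assistants.
- 🔍 Lance highlights that code is easy to evaluate and test, and refers to the flow engineering idea for code generation from Codium AI, as summarized by Karpathy.
- 🔧 He presents a simple flow that uses Codestral to generate a code solution for a question, tests the code, and loops back to retry on failure.
- 💻 Lance demonstrates generating and testing code in response to user questions with Codestral and LangChain.
- 📝 LangChain's tool use support lets you impose a specific structure on the model's output.
- 🔗 The LangGraph library is used to build a flow that repeats code generation and testing.
- 🔄 The demonstration shows how, when an error occurs, the error is fed back to the LLM so it can attempt to self-correct.
- 📈 Lance concludes that the Codestral model and LangGraph together make code generation with self-correction effective.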
Q & A
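What is Codestral?
-Codestral is a new code generation model that excels at code generation tasks, especially code completion and fill-in-the-middle.
What are Codestral's main features?
-Codestral is trained specifically on programming languages and has an instruct version that supports tool use. The code it generates is also easy to test.
What is LangChain's Chat LangChain?
-Chat LangChain is a tool for QA over the LangChain documentation; it can generate working code blocks based on user questions.
Why are code generation models generally useful?
-Many companies want their own custom code assistants, and because it is easy to test whether generated code runs, code generation models suit that need.
What is flow engineering for code generation?
-Flow engineering is the idea of generating a code solution, checking the generated code inline, and retrying if there is a problem.
What are concrete examples of code checks?
-Examples include checking whether the imports work, whether the code executes, and whether it passes unit tests.
What is LangGraph's role?
-LangGraph is a library for building flows, particularly well suited to workflows that include cycles or feedback.
What is the basic structure of the code generation flow?
-The flow takes a user question, generates code, tests that code, and retries if there is a problem.
What is LangChain's structured output feature?
-LangChain's structured output feature has the model produce output that follows a specified schema; the output is returned as a JSON string and then parsed back into an object.
What is the benefit of self-correction in the code generation flow?
-With self-correction, errors in the generated code are detected and fed back to the model for another attempt, which significantly improves accuracy and usability.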
Outlines
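😀 Introducing the code generation model
Lance talks about Codestral, a new code generation model released by Mistral. The model excels at code generation tasks such as fill-in-the-middle and code completion. It is trained on programming languages and has an instruct version that supports tool use. Lance explains that code generation models are very generally useful and that many companies want custom code assistants. For example, LangChain has a QA system called Chat LangChain that can generate working code blocks based on user questions. Code also has the advantage of being easy to evaluate and test, which means a generated code solution can be verified inline.
🛠️ Introducing the code generation flow
Lance emphasizes the power of flow engineering for code generation. Based on the idea put forward by Codium AI and summarized by Karpathy, he explains that a code generation flow produces better results. The flow is simple: generate a code solution, test it inline, and loop back to retry if it fails. Lance applies this approach to simple test cases to show how it improves the accuracy and usability of code generation models. He also shows how to combine tool use with Codestral to produce an output object containing a preamble, the imports, and the code itself, and how to build in simple code checks.
🔄 Automating code checks with LangGraph
Lance shows how to automate code checks with the LangGraph library. LangGraph suits tasks with feedback loops and can be combined with code generation. Lance defines the nodes and edges of a graph and builds a flow that runs from code generation through code checking. If there is an error, the error message is passed to the LLM, which tries to regenerate the solution. He walks through the process with examples he actually runs, including cases where the LLM corrects itself after an error.
📈 Test cases for code generation and self-correction
Lance goes through the generation and self-correction process in more detail. He demonstrates the model on test cases ranging from a simple problem, writing a Python program that prints 'hello world', to a more complex one, vectorizing a function. In each test case the model generates code to solve the problem, and the flow checks whether that code executes. If an error occurs, the error message is passed to the LLM to prompt self-correction. The runs are traced with LangSmith, and Lance concludes that the process is effective.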
Keywords
💡Code generation model
💡Tool use
💡Code evaluation
💡Flow engineering
💡LangChain
💡Codium AI
💡LangSmith
💡Structured output
💡LangGraph
💡Self-correction
Highlights
Lance from LangChain introduces Mistral's release of Codestral, a code generation model.
Codestral excels in tasks like code completion and is trained on 80+ programming languages.
The instruct version of Codestral supports tool use.
Lance discusses the general usefulness of code generation models for companies.
Introduces Chat LangChain, a QA system that produces functioning code blocks from questions.
Code is easy to evaluate, either by execution or through unit tests.
Flow engineering for code generation is highlighted as a powerful idea from Codium AI.
The concept allows for inline checking of code solutions during the inference flow.
Lance demonstrates a simple test case using Codestral with a code generation flow.
Shows how to use function calling or tool use with Codestral to produce an output object.
Details the structure of the output object including preamble, imports, and code.
Lance explains the incorporation of simple code checks into the workflow.
Demonstrates how to loop back and retry if code checks fail.
LangGraph is introduced as a library for building flows, especially with cycles or feedback.
LangGraph is used to create a workflow for code generation with self-correction.
Lance builds and runs a graph for a simple 'Hello World' program.
Shows how the flow appends error messages to guide the model for self-correction.
Demonstrates a more sophisticated example of vectorizing a function with self-correction.
Lance concludes by showcasing the effectiveness of using code generation with self-correction in practical scenarios.
Transcripts
Hey, this is Lance from LangChain. Mistral released Codestral today, which is a code generation model that I'm actually really excited about. It's really good at code generation tasks like fill-in-the-middle or code completion, it's trained on 80+ programming languages, and it has an instruct version that supports tool use. One of the reasons why I really like code generation models, and I've actually done quite a bit of work with them, is that they're just very generally useful. Lots of companies, for example, want custom code assistants that might combine some documentation plus code generation. At LangChain, for example, we have something called Chat LangChain: it's basically QA over our docs, and it can produce functioning code blocks for users based on questions. One of the other things that's cool about code is that it's really easy to evaluate: it's really easy to test whether this code actually executes or not.
A really powerful idea related to code generation was put out a few months ago by the folks at Codium AI, and Karpathy summarized it really nicely here: this idea of flow engineering for code generation is really powerful. The idea shown in the paper, the AlphaCodium work, and highlighted here in this tweet from Karpathy, is simply that if you produce a code solution, you can really easily check it inline. As mentioned here, it's pretty easy to evaluate code: at the minimum, does it execute and do the imports work; in the maximum case, do you have an actual solution, checked with unit tests. In any case, the point is that code is very easy to test, and you can actually test it in your inference flow. You produce a generation, you then test the code, and if it fails you can loop back and try again. This idea of a code generation flow was shown in the paper to produce much better results, and it's something that I want to show today using Codestral in a really simple test case. This is something I've done a little bit in the past and found to be extremely effective. It's a very simple idea, but here's the basic flow that we will highlight.
I want to be able to take a question related to code generation and pass it to the model, so pass it to Codestral, and have Codestral produce a solution. What I'm going to do is use function calling, or tool use, with Codestral to produce an output object that has three things: a preamble stating "here is the problem I'm trying to solve", the imports, and the code itself. And I'm going to show how it's really easy to incorporate some simple code checks, like do the imports work and does the code execute. If either fails, like there's a bug in the code, then I'm going to show how to loop back and retry. This simple check-and-retry loop is a way to significantly improve the accuracy and usability of code generation models. I'm going to show how to do that right now.
To kick this off, I have a notebook here. I've done a few pip installs and I've just set my Mistral API key; that's really it. I'm also going to use LangSmith for tracing. Of course, this is optional. I'm going to set an environment variable for my LangChain project, which will indicate where all my traces will go.
we want to lay out so we basically want
to use cod strol to take a user question
produce a solution and we want to test
that solution if it passes our test
return to the user if it doesn't try
again that's all we want to do so what
I'm going to do here is first let me
just show some very basic components so
first let's just talk about how to
actually use code here's my general
prompt I basically just I'm going to
tell the model your code assistant
ensure that all the code can be executed
with all the Imports and variables
defined structure your answer in three
ways give me a preamble or a prefix
describing the code solution give me the
Imports give me a function and code
block so I'm going to ask for those
three things now here's where tool use
comes in I can actually Define the
schema of the output that I actually
want and what I can do is I can bind
that using a lang chain very convenient
with structured output I can basically
bind that to the llm and then this chain
will invoke the LM using the structured
output now here's how that actually
works under the
hood basically this object that passes a
pantic object is converted into function
schema form a STW and it's then passed
or bound to the llm so the llm has
access to this function and it knows the
schema that it should return when that
function call or tool is invoked so
basically what happens is I can take a
user question the function is invoked
and then the llm knows to produce an
output that adheres to my schema and
this will basically be a Json string
again remember llm is just string to
string so it's going to be a Json string
and then under the hood with this with
structured output thing that I'm using
from Lang chain we apply an output
parser that basically pantic parti
should take a Json string convert it
back to pantic object so that's it
that's all that's going on um but I'll
show you how this is really cool so I'm
defining this object this is what I want
to get out here's my chain now let's
test this out write a function for
Fibonacci
um I passed it in as a user question um
so that's
it now this is
running great and we see a result now
here's what's cool if you look at this
result object it actually is a code
object so it basically it's pantic
object following the scheme we specify
here it has a prefix boom it has some
imports actually in this case none and
then it has the code block that's it so
we'll see why this is really useful in a
little bit but want to introduce that
idea of basically we can use cod strol
with tool use to produce structured
outputs which is generally very useful
and in the particular for this notion of
kind of like inline self-correction is
extremely useful cool so that's that
Cool, so that's the first piece. Now what I'm going to introduce here is LangGraph. LangGraph is a library from the LangChain team. We've used it in a number of other videos, and I've used it extensively in general, to build flows, and this is an example of a flow. The main characteristic of the flow I highlight here, which LangGraph is really well suited for, is anything with a cycle, so anything with feedback. Basically what it's saying is: every time I run my app, I want to do this code generation, produce a structured output, do some kind of code checks, and make a decision based on the outcome of those code checks: feed back if they fail, finish if they pass. You can think of that as a very simple kind of workflow, and LangGraph is a great way to build these kinds of workflows, and we'll see why.
The first thing I need to specify with LangGraph is simply the graph state. This is just a thing that lives throughout the lifetime of my graph. It basically represents all the information that's shared across what you might call these nodes. In this case I have two particular nodes, and you might call this an edge; this is where I'm making a decision. State lives across these nodes and edges, and that's really it. I'm going to define my state. It's going to contain some information that's relevant to the flow I just talked about: it's going to contain an error message, it's going to contain my final generation, and it's going to contain the messages that are being passed to my LLM. This will all become a little bit more clear as we go forward.
Here I'm going to lay out the nodes and the edges of my graph. What you'll see is that every node (here's my generate node, and that's what we laid out here, generate) takes in the state, and the nodes just modify the state in some way. That's how to think about the nodes. In this case I take in the state and unpack it into some messages, a number of iterations, and an error message. These are things we're going to use throughout our graph. Then what I'm going to do is simply compute a code solution: I'm going to look at my messages and generate a solution. Remember, that's exactly what we did up here, so this is actually nothing new. Look at this: we define a set of messages, invoke our code gen chain, and get an output. Same thing we're going to be doing in our graph, so this is nothing exotic; we've actually already tested this. Once that runs, I'm just going to append that output, the code solution, to my messages. So again, "here's my attempt to solve the problem": I'm going to take the prefix, the imports, and the code, and add that as a new message. I'm going to increment my iterations; we'll use iterations to determine when to stop. Then I'm just going to return my state with a few things. First is my code solution, my generation. Then it's going to be my stack of messages, which is basically just appended to, and then the number of iterations. That's really it, so nice and easy there.
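A minimal sketch of the state and the generate node as described, using only the standard library. The field names follow the talk, but `stub_chain` is a hypothetical stand-in for the real Codestral chain, and the message format is an assumption.

```python
from typing import Callable, List, Optional, Tuple, TypedDict


class Code(TypedDict):
    prefix: str
    imports: str
    code: str


class GraphState(TypedDict):
    error: str                       # "yes" or "no", written by the check node
    messages: List[Tuple[str, str]]  # (role, content) pairs sent to the model
    generation: Optional[Code]       # latest structured solution
    iterations: int                  # number of generation attempts so far


def stub_chain(messages: List[Tuple[str, str]]) -> Code:
    """Hypothetical stand-in for the real code-generation chain."""
    return {
        "prefix": "Print a greeting.",
        "imports": "",
        "code": "print('hello world')",
    }


def generate(state: GraphState, chain: Callable = stub_chain) -> GraphState:
    """Generate a solution, record it as a message, and bump the counter."""
    messages = list(state["messages"])  # copy so the input state is untouched
    solution = chain(messages)
    messages.append((
        "assistant",
        f"Here is my attempt to solve the problem: {solution['prefix']}\n"
        f"Imports: {solution['imports']}\nCode: {solution['code']}",
    ))
    return {
        "error": state["error"],
        "messages": messages,
        "generation": solution,
        "iterations": state["iterations"] + 1,
    }


state: GraphState = {
    "error": "no",
    "messages": [("user", "Write a Python program that prints hello world")],
    "generation": None,
    "iterations": 0,
}
new_state = generate(state)
```

Returning a fresh dictionary instead of mutating the input keeps each node easy to test in isolation, which matches the idea that nodes "just modify the state in some way".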
Now the code check is the second big node that we're going to be working with. Our first node is generation; the second node is our code checks. We just saw generation: it returns the generation with the three pieces, the preamble, the imports, and the code block. Now it's going to be passed to code check.
Code check is really anything we want it to be; we can do any kind of checks on this code. Maybe in the best case there's some kind of unit test we could run. I'm going to show you the simplest possible code check that we might want to do. In this particular case, I'm going to get the code solution from our state. Remember, we wrote that out to state, so the generation contains our code solution, and in this node I just pick it back up from state; state is passed to every node. I get the code solution and extract the three pieces, as we showed above, so I get the prefix, the imports, and the code. Now all I'm going to do is simply test execution. Do the imports execute? If not, I'm going to throw a flag, a message here: code import failed. I'm going to take an error message, append that error message to our messages object, and return that. That's it. Alternatively, if that passes, I'm going to go ahead and try the whole thing: I'm going to combine the imports and the code and execute the code. Again, if that fails, I'm going to return another error message. If there are no errors, then that's great: I confirm that there are no test failures, I set this error flag to "no", and everything else is the same as before. I return the messages, I return iterations, I return the code solution. That's it.
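The two-stage check described above can be sketched as follows. The wording of the feedback message and the dictionary-based state are assumptions for illustration. Note that `exec` runs model-written code in the current process, so a real system would want some form of sandboxing.

```python
from typing import Dict, List, Tuple


def code_check(state: Dict) -> Dict:
    """Run the imports, then imports plus code; flag and report any failure."""
    solution = state["generation"]
    messages: List[Tuple[str, str]] = list(state["messages"])
    imports, code = solution["imports"], solution["code"]

    def fail(stage: str, exc: Exception) -> Dict:
        # Append the failure so the next generation attempt can see it.
        messages.append((
            "user",
            f"Your solution failed the {stage} test: {exc!r}. "
            "Reflect on this error, state what went wrong, and try again.",
        ))
        return {**state, "messages": messages, "error": "yes"}

    try:
        exec(imports, {})  # stage 1: do the imports work?
    except Exception as exc:
        return fail("import", exc)

    try:
        exec(imports + "\n" + code, {})  # stage 2: does the whole thing run?
    except Exception as exc:
        return fail("code execution", exc)

    return {**state, "messages": messages, "error": "no"}


base = {"messages": [], "iterations": 1, "error": "no"}
good = code_check({
    **base,
    "generation": {"prefix": "", "imports": "import math", "code": "x = math.sqrt(4)"},
})
bad = code_check({
    **base,
    "generation": {"prefix": "", "imports": "", "code": "y = image + 1"},
})
```

In the second call the undefined name `image` raises a `NameError`, so the returned state carries `error` set to `"yes"` and one extra feedback message.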
Now this is the final bit. All we're going to do here is decide whether or not to finish. This is basically our little conditional edge, which we talked about here. This is simple because we wrote that error flag to state. Again, remember we wrote error "no" if none of these tests failed, and we wrote error "yes" if either one does, right there. If that's the case, all we need to do is get our error from the state. If it's "no", or we've exceeded the max iterations, then we just go ahead and finish, and if "yes", then we go back to generate. That's it.
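The conditional edge reduces to a small routing function. In this sketch the cap of 3 is an assumed value (the talk does not state one), and the returned labels `"end"` and `"generate"` are illustrative names for the two branches.

```python
MAX_ITERATIONS = 3  # assumed cap; the talk does not give the actual value


def decide_to_finish(state: dict) -> str:
    """Route to 'end' on success or when attempts run out, else retry."""
    if state["error"] == "no" or state["iterations"] >= MAX_ITERATIONS:
        return "end"
    return "generate"


route_ok = decide_to_finish({"error": "no", "iterations": 1})
route_retry = decide_to_finish({"error": "yes", "iterations": 1})
route_give_up = decide_to_finish({"error": "yes", "iterations": 3})
```

The iteration cap matters as much as the error flag: without it, a model that keeps producing failing code would loop indefinitely.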
So we're just going to define all of those pieces, and we're basically almost done here. Let's build this graph. This is how in LangGraph you can actually assemble your workflow. All I need to do is take that function I defined, generate, and add it as a node; take the function we defined, code check, and add it as a node. Again, this is the state graph. I just build the graph here: set my entry point as generate, add an edge so I go from generate to check code, and from check code I decide whether to finish based upon this logic right here. If it returns end, then I end; if it returns generate, I go back to generate. That's really the crux of all you need to do. I can go ahead and run that, and this will actually draw my graph for me using this little display feature right here. We can see we start, we go to generate, we go to code check, and optionally, depending on what happens in the code check, based on our conditional edge we'll go back to try to regenerate, or if there are no errors we go to end. So that's really it; it's pretty nice.
The one thing I'll make a note of is that as we're going through this flow, we're actually appending to our messages. If there's an error, we're appending that failure to our messages, and we're basically telling the LLM: here is the failure, reflect on this error, state what you think went wrong, and try again. So that's really it. Nice and easy, and there's our flow.
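To make the assembly steps concrete without depending on LangGraph itself, here is a tiny stand-in graph runner. `MiniGraph` is hypothetical and is not LangGraph's API or implementation; its method names simply mirror the operations named in the talk (add a node, set the entry point, add an edge, add a conditional edge). The generate node is a stub whose first attempt is deliberately buggy.

```python
from typing import Callable, Dict

END = "__end__"  # sentinel node name that stops the run


class MiniGraph:
    """Tiny stand-in for a state graph: named nodes plus routing between them."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Callable[[dict], dict]] = {}
        self.routers: Dict[str, Callable[[dict], str]] = {}
        self.entry = ""

    def add_node(self, name: str, fn: Callable[[dict], dict]) -> None:
        self.nodes[name] = fn

    def set_entry_point(self, name: str) -> None:
        self.entry = name

    def add_edge(self, src: str, dst: str) -> None:
        self.routers[src] = lambda state: dst

    def add_conditional_edges(self, src: str, decide, mapping: Dict[str, str]) -> None:
        self.routers[src] = lambda state: mapping[decide(state)]

    def invoke(self, state: dict) -> dict:
        current = self.entry
        while current != END:
            state = self.nodes[current](state)
            current = self.routers[current](state)
        return state


def generate(state: dict) -> dict:
    # Hypothetical model: the first attempt has a bug, the retry fixes it.
    code = "total = undefined_name + 1" if state["iterations"] == 0 else "total = 1 + 1"
    return {**state, "generation": code, "iterations": state["iterations"] + 1}


def check_code(state: dict) -> dict:
    try:
        exec(state["generation"], {})
    except Exception as exc:
        feedback = ("user", f"Your solution failed: {exc!r}. Reflect and try again.")
        return {**state, "error": "yes", "messages": state["messages"] + [feedback]}
    return {**state, "error": "no"}


def decide_to_finish(state: dict) -> str:
    return "end" if state["error"] == "no" or state["iterations"] >= 3 else "generate"


graph = MiniGraph()
graph.add_node("generate", generate)
graph.add_node("check_code", check_code)
graph.set_entry_point("generate")
graph.add_edge("generate", "check_code")
graph.add_conditional_edges(
    "check_code", decide_to_finish, {"end": END, "generate": "generate"}
)

final = graph.invoke(
    {"messages": [("user", "Add one and one")], "iterations": 0, "error": "no", "generation": ""}
)
```

The first pass fails the check, the failure is appended to the messages, the conditional edge routes back to generate, and the second pass succeeds, so the run ends after two iterations.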
Cool, now let's try this out. Here's the simplest possible problem: write a Python program that prints "hello world". Let's try this out. I'm going to run this, and what's nice is I have this nice formatting, so you can see the input: write a Python program that prints hello world. Generating code solution, and then here's my attempt to solve it. Here are the imports, none, and here's the code. It goes through the checks, no test failures, and it ends. Cool.
What's nice is I can go over to LangSmith, and I have this project right here. This project lays out exactly what we just did, but we can actually dig into each piece. Here we start our graph and go to generate. Again, this is using the Codestral model. Here is my human message, my question in. This is showing that it does indeed invoke my function, so that's great. Here's the prefix, here are the imports, here's the code block. It uses a Pydantic tools parser to write that out as a Pydantic object; we talked about that previously. Then here's the code check: it goes through the various code checks, and you can check all of these here. Then it goes through the decision to finish, and in this particular case, because none of the code checks failed, it finished. So that's great. This is a good example of how the flow works in a very simple test case.
Now let's try something a little more sophisticated. In this case I'm asking it to vectorize a function. I give it a function and ask it to show me a test case with this actually working. We can see that it kicks off the flow: "I want to vectorize a function". Here's the attempt to solve the problem, so here's the initial solution. What we see here is: your solution failed the code execution test, it did not define image; reflect on the error and attempt to solve it. "Here's my attempt to solve the problem: the error occurs because the variable image is not defined. To solve this problem..." So you make it reflect on its error and try again. It goes and tries again, and we see it fails again for a different reason: your solution failed the code execution test, it could not broadcast 50, 50, 50 into a shape 50, 50, 3. So here's the attempt to solve the problem; it explains itself, it goes back through, and then all the code tests now pass. That's basically it, and then it finishes.
This showcases how you can use code generation with the new Codestral model from Mistral, with self-correction using LangGraph. What we showed in general is the ability to perform code generation and perform arbitrary checks on the output of the generation itself. If any checks fail, loop back, use a message queue to accumulate the various errors over iterations in this flow, and then pass them back to the LLM to attempt to self-correct. We've seen this work pretty effectively in a very simple test case, but I've actually seen this work really well in the case of code generation with RAG, and we have some other resources on that which I'll share later. Thanks.