Building a self-corrective coding assistant from scratch

LangChain
27 Feb 2024 · 24:26

Summary

TL;DR: The video discusses using LangGraph to implement iterative code generation with error checking and handling, inspired by AlphaCodium. It shows loading documentation as context, structuring LLM outputs, defining graph nodes that generate code, check imports and execution, and retry on failure. An example is shown where a coding mistake is fixed via the graph by passing the error back into the prompt to induce reflection. Experiments found that using the graph boosts code-execution success rates by 25 percentage points (55% to 80%) on a 20-question set. The video encourages viewers to try this simple yet effective technique of code generation with tests and reflection themselves.

Takeaways

  • 😀 Introduced LangGraph as a way to build arbitrary logic flows and graphs with LLMs
  • 👌 Showed how to implement iterative code generation and testing using LangGraph, inspired by the AlphaCodium paper
  • 💡 Structured LLM outputs using Pydantic for easy testing and iteration on components
  • 🔬 Evaluated code generation success rates with vs. without LangGraph; execution success rose from 55% to 80%
  • 📈 LangGraph enables feedback loops and reflection by re-prompting with prior errors
  • 🌟 Built an end-to-end example flow for answering coding questions using LangGraph
  • 📚 Ingested ~60K tokens of documentation for code generation context
  • ✅ Checked both imports and execution success of generated code before final output
  • ❤️ Emphasized simplicity of the idea and approach for reproducing key concepts from sophisticated systems like AlphaCodium
  • 👍 Encouraged viewers to try out LangGraph flows for their own applications

Q & A

  • What is the key innovation introduced in the Alpha Codium paper for code generation?

    -The AlphaCodium paper introduces the idea of flow engineering for code generation, where solutions are tested on public and AI-generated tests and then iteratively improved based on the test results.

  • How does LangGraph allow building arbitrary graphs to represent logical flows?

    -LangGraph allows defining nodes as functions in a workflow, specifying conditional edges to determine the next node based on output, and mapping the nodes and edges to logical flows like in the code generation example.

  • What is the benefit of using a structured output format from the generation node?

    -Using a structured output format with distinct components allows easily implementing tests and checks for aspects like imports and code execution, as well as feeding errors back into the regeneration process.

  • How does the error handling and regeneration process work?

    -When an error occurs in checking imports or executing code, it is appended to the prompt to provide context. The regeneration node then produces a new solution attempt, using the prior error information.

  • What were the results of evaluating the LangGraph method on a 20 question dataset?

    -While import checks were similar with and without LangGraph, code execution success improved from 55% to 80%, showing a significant benefit from the retry and reflection mechanism.

  • How many iterations does the graph allow before deciding to finish?

    -The example graph allows up to 3 iteration attempts before deciding to finish, to prevent arbitrarily long execution.

  • What size context is used for the generation node?

    -The generation node ingests around 60,000 tokens of documentation related to LangChain Expression Language (LCEL) to use as context for answering questions.

  • What model architecture is used for the generation node?

    -The example implements the generation node with GPT-4 (the 0125-preview model with a 128k-token context window), grounding it in the LCEL documentation passed in as context rather than using a fine-tuned model.

  • What is the purpose of tracking the question iteration count?

    -Tracking the number of generation attempts for each question allows implementing logic to finish execution after a certain number of tries.

  • How could this approach be extended to more complex use cases?

    -Possibilities include testing against larger public benchmarks, integrating more sophisticated testing frameworks, and using additional regeneration strategies.

Outlines

00:00

😀 Introducing code generation with LangGraph

The first paragraph introduces using LangGraph for code generation, inspired by AlphaCodium. It talks about representing flows as graphs and iterating on solutions based on test results.

05:00

😃 Structured output using Pydantic models

The second paragraph demonstrates how to use Pydantic models to structure code generation output into a prefix, imports, and code for later processing. It also shows inspecting the long-context model calls in LangSmith.

10:00

😊 Building a code generation graph

The third paragraph walks through building a code generation graph with nodes for generation, import checking, code execution, and conditionals for retries. It shows how errors can be passed back to the generator.

15:02

🤓 Completing the code generation workflow

The fourth paragraph completes the workflow, connecting the nodes into a full graph. It then tests an example question, showing how errors trigger retries and reflection.

20:05

🧐 Evaluation shows improved performance

The fifth paragraph summarizes an evaluation on 20 questions. Using the graph boosts code execution success rates from 55% to 80%, showing the value of retry and reflection.

Keywords

💡code generation

Code generation refers to automatically generating source code using AI models. It relates to the video's theme as Lance from LangChain introduces code generation as an interesting capability of large language models. He highlights the AlphaCodium paper, which does code generation using flow engineering.

💡flow engineering

Flow engineering is an approach to code generation where an AI system iteratively tries to improve a code solution using testing and feedback loops. The video focuses on implementing similar ideas using LangGraph to add tests and retries.

💡LangGraph

LangGraph allows building graphs to represent logical flows and steps with AI models. Lance shows how to implement flow engineering concepts like generation, testing, and retries using LangGraph.

💡tests

Tests refer to validation checks on the generated code, like checking if imports work or if the code executes properly. Tests allow catching errors and provide feedback to retry code generation.

💡retries

When tests fail, the system can retry code generation using the error trace as feedback. Retries with reflection on previous mistakes relate to flow engineering ideas shown in the video.

💡feedback loop

A feedback loop feeds test outputs back to improve subsequent code generations. The video focuses on implementing feedback loops using LangGraph for tasks like coding.

💡reflection

Reflection here refers to introspecting on previous failed attempts, including the error trace, to improve the next code generation retry.

💡context

Context means the reference documents - over 60,000 tokens of code docs - provided to the language model to enable answering coding questions.

💡performance

Performance refers to the success rate of generating executable code. Simple tests and retries using LangGraph improve performance significantly, as shown in the video.

💡language model

A language model is the AI system generating the code, conditioned on the coding question prompt and documentation context provided to it.

Highlights

Introduced idea of flow engineering for code generation using testing and iteration to improve solutions

Paper showed ranking and testing solutions on public and AI-generated tests, then iterating to improve based on results

Tweet by Karpathy highlighted moving from prompt-answer to flow where you build up an answer iteratively over time using testing

Introduced LangGraph weeks ago as a way to build arbitrary graphs representing different kinds of flows

Showed using tooling to always format code generation output as a Pydantic object for easy testing and iteration

Implemented a simple version of the AlphaCodium ideas in LangGraph with a generation node, import and execution testing nodes, and retry logic on errors

On errors, retry generation appends the error trace to the prompt to induce reflection and retry answering based on prior output

Import checks performed fine without retry logic, but code execution success rate increased from 55% to 80% using graph and reflection

Showed example of catching error on first try, passing to prompt, and second try getting correct functional code

Simple checks and reflections with graphs can significantly improve code generation performance

AlphaCodium shows sophistication; this shows the simplicity and ease of implementing the key ideas yourself

LangGraph is great for building reflective, self-improving loops with logical flows and feedback

All code available to run this on any codebase and see improvements

Showed real evaluation results over multiple runs to demonstrate statistical validity of performance gains

Encouraged experimenting with these ideas using the provided building blocks

Transcripts

00:01

Hi, this is Lance from LangChain. I want to talk about using LangGraph for code generation. Code generation is one of the really interesting applications of LLMs; we've seen projects like GitHub Copilot become extremely popular. A few weeks ago a paper came out from the folks at Codium AI called AlphaCodium, and it was a really cool paper in particular because it introduced this idea of doing code generation using what you can think of as flow engineering. So instead of just an LLM, a coding prompt like "solve this problem," and a solution, it generates a set of solutions and ranks them. That's fine; that's kind of standard prompt-response style flow. But what it does here that I want to draw your attention to is that it actually tests that code in a few different ways, on public tests and on AI-generated tests, and the key point is this: it actually iterates and tries to improve the solution based upon those test results. So that was really interesting.

A tweet came out by Karpathy on this theme, which mentions: hey, this idea of flow engineering is a really nice paradigm, moving away from naive prompt-answer to a flow where you can build up an answer iteratively over time using testing.

So it's a really nice idea, and what's kind of cool is that a few weeks ago we introduced LangGraph as a way to build arbitrary graphs which can represent different kinds of flows. I've done some videos on this previously, talking about LangGraph for things like RAG, where you can do retrieval and then a retrieval quality check: grade the documents, and if they're not good, try to retrieve again or do a web search. It's a way to represent arbitrary logical flows with LLMs, in a lot of the same way we do with agents, but the benefit of graphs is that you can outline a flow that's a little more constrained. It's kind of like an agent with guardrails: you define the steps in a very particular order, and every time you run the graph it just executes in that order.

02:27

So what I want to do is try to implement some of the ideas from AlphaCodium using LangGraph, and we're going to do that right now. In particular, let's say we want to answer coding questions about some part of the LangChain documentation, and for this I'm going to choose the LangChain Expression Language (LCEL) docs. It's a subset of our docs, around 60,000 tokens, and it focuses only on LCEL, which is basically a way you can represent chains in LangChain; we'll talk about that in a little bit.

I want to do a few simple things. I want one node in our graph that takes a question and outputs an answer using the LCEL docs as a reference. Then, with that answer, I want to be able to parse out components: the preamble (what is this answering?), the imports specifically, and then the code. To do this I want to use a Pydantic object, so it's very nicely formatted. If I have that, I can really easily implement tests for things like checking that the imports work and checking that the code executes, and if either of those fails, I can loop back to my generation node and say: hey, try again, here's the error trace. Again, what they're doing in AlphaCodium is way more sophisticated; I don't mean to suggest we're implementing it as-is. That system works on a bunch of public coding challenges, with tests for each question that are both AI-generated and publicly available. We're doing something much simpler, but I want to show how you can implement these kinds of ideas, and you can make it arbitrarily complex if you want.

04:20

So I'm going to copy some code over into a notebook that I have running. All I've done is some pip installs, and I've defined a few environment variables for LangSmith, which we'll see later is pretty useful. I'm going to call this docs: this is where I ingest the docs related to LangChain Expression Language, and I'm going to kick that off right now. So that's running. This is using a URL loader to grab all the docs, sort them, and clean them up a little bit. And here we go: these are all the docs related to LangChain Expression Language, around 60,000 tokens; I've measured it in the past. So there are our docs.
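The loader code itself isn't shown on screen for long, but a minimal sketch of this kind of ingestion step might look like the following. The URL, max_depth, and separator are assumptions; the notebook's actual loader and cleanup may differ.

```python
# Sketch of the docs-ingestion step; URL and loader settings are illustrative.
from bs4 import BeautifulSoup
from langchain_community.document_loaders import RecursiveUrlLoader

# Crawl the LCEL section of the LangChain docs.
loader = RecursiveUrlLoader(
    url="https://python.langchain.com/docs/expression_language/",
    max_depth=20,
    extractor=lambda html: BeautifulSoup(html, "html.parser").text,  # HTML -> plain text
)
docs = loader.load()

# Sort pages by source URL and join them into one ~60k-token context string.
docs_sorted = sorted(docs, key=lambda d: d.metadata["source"])
concatenated_content = "\n\n\n --- \n\n\n".join(d.page_content for d in docs_sorted)
```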

05:03

Now I want to show you something that's very useful. I'll call it tool use; this is with OpenAI models, though other LLMs have similar functionality. What I'm going to do here is show how to build a chain that will output the three things we talked about in our diagram for every solution: a preamble, imports, and code, as a structured object that we can work with individually. I'll show you right here how to do that. What we're doing is importing BaseModel and Field from Pydantic and defining a data model for our output. I want a prefix, which is just the plain-language solution ("here's the setup to the problem"), the import statements, and the code; I want those as three distinct things that I can work with later. I'll use GPT-4, the 0125-preview model with the 128k-token context window. What I'm going to do is take this data model, turn it into a tool, and bind it to my model. Basically, what's happening here is that it's always going to perform a function call to attempt to output in the format I specify. I define a prompt that says: here are all the LCEL (LangChain Expression Language) docs; answer the question; structure your output in a few ways. What's cool is we're always forcing that function call to try to output a Pydantic object.

Now what's nice is I can just invoke this with a question, so let's try that. I'm going to ask: how do I create a RAG chain in LCEL? (Okay, this needs to be a dict; there we go.) Boom, so that's running. We can see right here that we passed in all those docs we previously loaded, so it's like 60,000 tokens of context. And again, when you think about newer long-context LLMs like Gemini, it becomes more and more feasible to do things like this: take a whole codebase or a whole set of documentation, load it, stuff it into a model, and have it answer questions about it. That's still running; the latency is definitely higher because it's a very large context, but that's fine, we have a little bit of time, and we can go over to LangSmith while this runs and have a look. Here was our prompt: there you go, look at this, 63,000 tokens; you can see it's a lot of context. And we can actually see it all there in LangSmith. We don't want to scroll through all that mess, but you can see we've asked a question, we're grounding the response in all these LCEL docs, and we're hopefully going to output the response as a Pydantic object that we can play with.

Let's just see. Okay, nice, it's done. You can see our object here has a prefix, and it's also going to have our imports as well; we can see that in LangSmith. The answer is going to be here, and there you go: you see your imports, your code, and your prefix, and these can all be extracted from that object really easily. It's basically a list of Pydantic code objects, and you can extract each field: answer.prefix, answer.imports, answer.code, whatever our keys are. So that's great. That just shows you how tool use works and how we can get the structured output out of our generation node.
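As a rough sketch of the structured-output chain described here (the schema fields match the walkthrough, while the exact prompt wording and parser wiring are assumptions, written against recent LangChain/Pydantic versions):

```python
from langchain_core.output_parsers.openai_tools import PydanticToolsParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

class code(BaseModel):
    """Structured code solution to a question about LCEL."""
    prefix: str = Field(description="Plain-language setup for the problem")
    imports: str = Field(description="Import statements for the solution")
    code: str = Field(description="Code block, not including imports")

# GPT-4 0125-preview (128k context), with the schema bound as a tool so the
# model always emits a function call in this format.
llm = ChatOpenAI(model="gpt-4-0125-preview", temperature=0)
llm_with_tool = llm.bind_tools([code], tool_choice="code")

prompt = ChatPromptTemplate.from_messages([
    ("system",
     "Here is a full set of LCEL documentation:\n{context}\n"
     "Answer the user question based on the docs above. Structure the answer "
     "as a prefix, the imports, and a functioning code block."),
    ("human", "{question}"),
])

# prompt -> forced tool call -> parsed list of `code` objects
chain = prompt | llm_with_tool | PydanticToolsParser(tools=[code])

answer = chain.invoke({"context": concatenated_content,
                       "question": "How do I create a RAG chain in LCEL?"})[0]
print(answer.prefix, answer.imports, answer.code, sep="\n")
```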

09:00

Now that we've established we can do that, I'm going to start setting up our graph. First, I'm going to define our graph state. This is just going to be a dictionary which contains things relevant to our problem: it'll contain our code solution, it'll contain any errors, and that's all we're going to need. And here is all the code related to my graph; we're going to walk through it, so don't worry too much, I just want to get it all in here.

Now, the way to think about this is simply this (I want to go back to my diagram here): every node in our graph has a corresponding function, and that function modifies the state in some way. So our generation node is going to be working with question and iteration; those are the parts of state that we want as inputs. You can see it kind of maps to the diagram: you have the question, and iteration just counts how many times you've tried this; we'll see why that's interesting later. This is exactly what we saw before: data model, LLM, tool use, all the same stuff, plus the template.

Now here's where it's interesting. If our state contains an error, this error key, what that means is we've fed back from some of our tests and an error has already been generated, so we're retrying. If we're retrying, we're going to append to our prompt, just like we saw above: we add something that says, hey, you tried this before, here was your solution (we saved that as the generation key; in our state you can see it's right here, the code solution), here is your error, please retry to answer this. So it's kind of like inducing a reflection based on your prior generation and error, and retrying. That's a very important point, because it basically gives us feedback: if there's a mistake in either the imports or the execution, we're feeding it back to generation, and generation is going to retry with that information present. That's all that's happening there: we're adding that to the prompt, invoking the chain with that error, and getting a new code solution. Again, that's if error is in our state dict; if it isn't, we go ahead and generate our solution just like we did above. Same thing, so, easy.

One little thing: every time we write that output back to the state, we increment our iterations and record how many times we've tried to answer this question. That's really it, and you can see that's all we do: return the generation, return the question, return the number of iterations. Easy.
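A minimal sketch of the state and generation node just described, continuing from the chain and docs in the sketches above (key names like keys, error, and iterations follow the walkthrough; the reflection wording is an assumption):

```python
from typing import TypedDict

class GraphState(TypedDict):
    keys: dict  # question, generation, error, iterations, ...

def generate(state: GraphState) -> GraphState:
    """Generate a code solution; on a retry, reflect on the prior error."""
    state_dict = state["keys"]
    question = state_dict["question"]
    iterations = state_dict["iterations"]

    if state_dict.get("error"):
        # Retry path: append the failed solution and its error trace to the
        # question so the model can reflect on the prior mistake.
        question = (f"{question}\n\nYou previously tried this solution:\n"
                    f"{state_dict['generation']}\nIt failed with:\n"
                    f"{state_dict['error']}\nPlease reflect on the error and retry.")

    code_solution = chain.invoke({"context": concatenated_content,
                                  "question": question})[0]

    # Write the new solution back and increment the attempt counter.
    return {"keys": {**state_dict,
                     "generation": code_solution,
                     "iterations": iterations + 1}}
```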

11:54

Now here's what's kind of nice. We talked about having these two checks: the check for imports and the check for execution. Our check-imports node is going to be really simple. We have our solution, and from the solution we can get the imports out, just like we showed above; this imports field is from our Pydantic object. A Pydantic object has imports, we can get the imports, and all we do is attempt to execute them. If that fails, we alert: hey, the import check failed. And here's the key point: we're just going to create a new key, error, in our dict, identifying that an error is present, that something failed here, and you'll see we're going to use that later. One other little trick: if there was a prior error in our state, we're just going to append to it. We do want to maintain that. If errors accumulate as we run multiple iterations, we want to keep accumulating them so we don't revert and make the same mistake we already made on a future iteration. So we maintain our set of errors. Now, if there's no error here, then we write None; we say we're good, keep going.

It's basically the same thing with code execution. In that case we're just extracting our code and our imports, creating a code block of imports plus code, and trying to execute it. Again, if it fails, write our error and append all prior errors; if it doesn't, return None. That's it; that's all you really need to know.
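A sketch of the two check nodes, continuing the sketches above (error accumulation is simplified to string concatenation here; the notebook may track errors differently). Note that running exec on model-generated code is risky anywhere but a sandbox:

```python
def check_code_imports(state: GraphState) -> GraphState:
    """Attempt to exec the import block; record any failure in state."""
    state_dict = state["keys"]
    solution = state_dict["generation"]
    prior = state_dict.get("error") or ""
    try:
        exec(solution.imports)
        error = None  # imports are fine, keep going
    except Exception as e:
        print("---CODE IMPORT CHECK: FAILED---")
        error = f"{prior}\nImport error: {e}".strip()  # accumulate prior errors
    return {"keys": {**state_dict, "error": error}}

def check_code_execution(state: GraphState) -> GraphState:
    """Attempt to exec imports + code together; record any failure."""
    state_dict = state["keys"]
    solution = state_dict["generation"]
    prior = state_dict.get("error") or ""
    try:
        exec(solution.imports + "\n" + solution.code)
        error = None
    except Exception as e:
        print("---CODE BLOCK CHECK: FAILED---")
        error = f"{prior}\nExecution error: {e}".strip()
    return {"keys": {**state_dict, "error": error}}
```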

13:29

Now, note that we're going to have two kinds of gates. We want to know: did either of those tests fail? All we need to do is grab our error, and remember, if error is None, keep going. So here we're at the code-execution decision point: do you want to go to code execution, or do you want to revert back and retry? You can see that if there's no error when we get to this point, then, because we've done our import check and there's no error there, we keep going: go to code execution. We return the node we want to go to. And if there is an error, we say: hey, return to the generate node. So really what these functions do (these are conditional edges) is some kind of conditional check based on our output state: if there's no error, it tells you to go to this node; if there is an error, it tells you to go back to the generate node. That's it.

Same deal with deciding to finish. Again, if there's no error, finish; and here's the iteration thing. For the sake of simplicity, what I say is: give it three tries. I don't want it to run arbitrarily long. If there's no error, or if you've tried three times, just end; otherwise go back to generate. So again, same kind of thing: decide to finish based on whether or not there's an error in our code execution. That's really it; that's all we're doing.
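The two gates as conditional-edge functions might look like this (the node-name strings are assumptions; they just have to match the graph wiring below):

```python
def decide_to_check_code_exec(state: GraphState) -> str:
    """After the import check: proceed to execution, or loop back to generate."""
    return "check_code_execution" if state["keys"].get("error") is None else "generate"

def decide_to_finish(state: GraphState) -> str:
    """After the execution check: end on success or after three attempts."""
    state_dict = state["keys"]
    if state_dict.get("error") is None or state_dict["iterations"] >= 3:
        return "end"
    return "generate"
```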

15:08

So we can go down; we already grabbed all this. Now here is where we actually define what we call our workflow. We've defined all our nodes and edges as these functions, and here is just where we stitch them all together. It's actually pretty straightforward; it follows exactly the diagram we showed above. We're basically adding all of our nodes and building our graph following that diagram, so you can kind of follow along: set your entry point, which is generate; add an edge from generate to the import check; and now our conditional edge. If we're going to decide to check code execution, that was our function right here: depending on the output, we decide the next node to go to. If the output of the function says check code execution, we go to that node; if the output says generate, we go back to generate. These are where you specify the logic of the next node you want to go to, and same here. So that's all we do: compile it, done, and it maps to the diagram roughly one-to-one. So that's actually pretty straightforward.
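Stitching it together with LangGraph, a sketch of the wiring using the node and gate functions sketched above could be:

```python
from langgraph.graph import END, StateGraph

workflow = StateGraph(GraphState)

# One node per box in the diagram.
workflow.add_node("generate", generate)
workflow.add_node("check_code_imports", check_code_imports)
workflow.add_node("check_code_execution", check_code_execution)

workflow.set_entry_point("generate")
workflow.add_edge("generate", "check_code_imports")

# Conditional edges: the gate function's return value picks the next node.
workflow.add_conditional_edges(
    "check_code_imports",
    decide_to_check_code_exec,
    {"check_code_execution": "check_code_execution", "generate": "generate"},
)
workflow.add_conditional_edges(
    "check_code_execution",
    decide_to_finish,
    {"end": END, "generate": "generate"},
)

app = workflow.compile()

# Run a question through the graph.
final_state = app.invoke({"keys": {"question": "How do I build a RAG chain in LCEL?",
                                   "iterations": 0}})
```

Each conditional edge maps the gate function's string return value to the next node, which is what makes the retry loop in the diagram explicit.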

16:31

And there's just one little thing we now need to do: we're going to go ahead and try a question. So here's a question. I've run a bunch of these tests already; this is a question that seems kind of random, but we actually built an eval set, and this is a question we've found has some problems, so I want to show you why this is pretty cool. The question is roughly: I'm passing a text key "foo" in my prompt, and I want to process it with some function process_text; how do I do this using LangChain Expression Language? It's a weird question, but you'll see why it's kind of fun in a little bit. What I'm going to do is just run my graph, and because we print out what happens at every step, we can follow along and see what's happening here.

So it's going to generate a solution. We can see this may take a little bit, because it's the same kind of long-context generation that we saw previously. This is now running; we can go to LangSmith, check this LangGraph run, and see that it's loading up and we're at generate, so it's actually doing this generation. This is still pending; here are all our input docs, so you can see that we passed this very large context to our LLM. So that's cool. Okay, so here, this is interesting: it's going through some checks. The code import check worked; decide to check code execution; decision: testing code execution. Here's an interesting one: code block check failed; decision: retry. So it's actually doing a regeneration.

Okay, let's see; it looks like it came to an answer. Let's go and look at what happened in our LangGraph run to understand it. Let me pull up the error here. Here was our response, and what I want to show you is the error that we appended to our prompt (we can make this a bit faster and scroll; this is the crux of what I want to show you). Okay, here it is. What's cool is that our initial attempt to solve this problem introduced an error; there was an execution error, unsupported operand types "dict" and "str". So basically it did something wrong, and we passed that in the prompt to the LLM when it performs the retry. Our initial solution was here, and it had a coding error, as noted, but here you can see we provide that error and we say: please try to re-answer this, structured with the same instructions as before, and here was the question. And we can see that this is the test of code execution, which now works. Previously, when we tried this, it failed, and this was the error; that error was passed along in the prompt, like we just saw; the new test indeed works; our final solution is functional code.

That's it. So you can get some intuition for the fact that when you have this retry loop, you can recover from errors using a little bit of reflection. That's really the big idea. And again, you get your answer out here. There are a bunch of keys; I'll show you quickly: we can just look at the generation key. It's going to be a list, so let's break it out, and there it is, there's our code object. We can see the prefix, okay, so there's the prefix, the imports, and let's try the code. And hey, let's just convince ourselves this actually works: we can exec the imports, that works, then exec the code, and this should work. It's doing something there; it tells a joke. Great. So this is pretty cool: when it initially tried to answer this question it produced an error, and it then retried by passing that error back into the context, just like we outlined in our graph, and on the second try it got it correct. So that's nice; it's a good example of how you can do this feedback and reflection stuff.
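To mirror that final sanity check, a sketch of pulling the solution out of the graph's final state and executing it (key names follow the sketches above, and the same sandboxing caveat about exec applies):

```python
# Extract the parsed solution from the final state and run it.
solution = final_state["keys"]["generation"]
print(solution.prefix)   # plain-language description of the approach
exec(solution.imports)   # run the import block
exec(solution.code)      # then the code block itself
```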

21:27

Now, we've actually done quite a bit more work on this. I built an eval set of 20 questions related to LangChain Expression Language and evaluated them all using this approach, relative to not using LangGraph, and here are the results. I want to draw your attention to this because it's a pretty interesting result. For the import check, without LangGraph versus with LangGraph, it's about the same; imports weren't really a problem before this retry-and-reflection stuff. Imports were okay on our eval set of 20 questions. I should make a note: we actually ran this four times, and the chart is showing standard errors; I accumulate the results and compute standard errors, so you can see there's some degree of statistical reasonableness to these results. In any case, import checks were fine without it.

But here's the big difference: there's a big difference in our code execution performance with and without LangGraph. Before LangGraph, if you just try single-shot answer generation, this was around a 55% success rate; in many cases we saw code execution fail. But with LangGraph, with this kind of retry and reflection, the success rate goes up to around, I believe it was, 80%. So it's almost a 50% relative improvement in performance with versus without LangGraph. That was actually really impressive, and it just shows the power of a very simple idea: attempting code generation with these very simple checks and reflection can significantly bump up your performance. And again, the AlphaCodium paper shows this in a very sophisticated context, but what's cool is that this is a very simple idea you can implement yourself in not much time.

We have this all available as a notebook, and you can run this on any piece of code you want: just take whatever documents you want, plumb them in, and test this out for yourself. I've been really impressed; I think it's pretty cool. In general, I think LangGraph is a really nice way to build these kinds of reflective or self-reflective applications, where you can build feedback loops: you do a check, and if the check fails, you try again with that feedback present in the retry.

I'll just show you: we have a blog coming out. I'm not sure there's anything in that blog I haven't already shown you; nothing really new to highlight. These were our results again; this chart is maybe a little clearer to see, but again, a pretty significant improvement in performance from a simple idea. I definitely encourage you to experiment with this, and of course all this code will be available for you, so feel free to experiment and let us know how it goes. Thank you.