How to Use LangSmith to Achieve a 30% Accuracy Improvement with No Prompt Engineering

LangChain
2 May 2024, 15:50

Summary

TLDR: In this video, Harrison explains how Dosu, an AI code-engineering teammate, used LangSmith to significantly improve application performance without any prompt engineering. LangSmith is a platform that is separate from LangChain but compatible with it, and it combines logging, tracing, dataset testing, and human annotation in one place. Using a classification task as the example, Harrison shows how to set environment variables, run the classifier directly with the OpenAI client, and use LangSmith to trace runs and collect feedback. Automation rules then move runs with feedback into a dataset, and those data points are pulled back into the application as examples. Semantic search can further select the examples most similar to the current input. The whole process demonstrates a feedback loop: collect feedback, automate its curation, and feed it back into the application to keep improving performance.

Takeaways

  • 🚀 Harrison is from LangChain; they published a blog about how Dosu improved its application performance by 30% without any prompt engineering, using tools built by LangChain.
  • 🛠️ Dosu uses the LangSmith platform, which is separate from LangChain and can be used on its own or together with LangChain.
  • 🔍 LangSmith improves application performance through logging, tracing, testing, and evaluation of the data flowing through an application; its real power is that these capabilities all live in one platform.
  • 📈 The specific task Dosu improved with LangSmith is classification, which is a relatively simple task by large language model (LLM) standards.
  • 📝 The tutorial starts by setting environment variables that are used to log data to a LangSmith project (see the sketch after this list).
  • 🔗 Dosu calls the OpenAI client directly for the classification task rather than using LangChain.
  • 🔑 LangSmith lets you leave feedback on runs; the feedback is associated with a specific run ID so it can be collected over time.
  • 🔄 The data flywheel in LangSmith uses automation rules to move runs that have feedback into datasets.
  • 📊 Automation rules can add runs with positive feedback, and runs with corrections, to a dataset, using the corrected value in place of the original output for the latter.
  • 🔧 Rules only apply to runs logged after they are created, so the data points need to be re-run for the rules to pick them up.
  • ⏱️ Rules run every 5 minutes by default; the rule logs show whether a rule has fired and which runs it processed.
  • 📚 The feedback and datasets collected in LangSmith can be used to improve the application, for example by feeding the data points back in as few-shot examples so the model learns the pattern and generalizes to other inputs.
  • 🔍 Dosu also uses semantic search to find, out of a large pool of examples, the few that are most similar to the current input, further improving performance.
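As a rough sketch of that setup step, these are the environment variables LangSmith's tracing typically reads; the project name and keys below are placeholders rather than values from the video.

```python
import os

# Minimal setup sketch (all values are placeholders).
os.environ["LANGCHAIN_TRACING_V2"] = "true"          # turn on tracing to LangSmith
os.environ["LANGCHAIN_API_KEY"] = "<your-langsmith-api-key>"
os.environ["LANGCHAIN_PROJECT"] = "classifier-demo"  # project the runs get logged to
os.environ["OPENAI_API_KEY"] = "<your-openai-api-key>"
```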

Q & A

  • How did Dosu improve application performance by using LangSmith?

    -By using the LangSmith platform, which combines logging, tracing, testing, and evaluation with user feedback, Dosu built a data flywheel that improved application performance by 30%.

  • How does LangSmith help Dosu improve its application?

    -LangSmith brings logging, tracing, testing, and evaluation together so users can work in a single platform; that integration creates the data flywheel Dosu used to improve its application.

  • How do users leave feedback associated with a run in LangSmith?

    -A run ID is created up front and passed in with the run, and feedback is then attached to that run ID. Using the run ID, users can leave positive or negative feedback for a specific run, including a corrected label.
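A rough sketch of that flow with the LangSmith Python client: `classify_issue` stands in for the traceable classifier from the walkthrough (sketched later under the Outlines), the feedback key is illustrative, and the pre-generated run ID is handed to the traceable call via `langsmith_extra`.

```python
import uuid
from langsmith import Client

ls_client = Client()

# Create the run ID up front so feedback can be attached to this run later.
run_id = str(uuid.uuid4())
label = classify_issue("fix bug in LCEL", langsmith_extra={"run_id": run_id})

# Positive feedback: the user kept the label, so record a user score of 1.
ls_client.create_feedback(run_id=run_id, key="user-score", score=1)
```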

  • How can LangSmith's automations be used to improve an application?

    -By setting up automation rules, runs that have feedback attached can be moved into a dataset. That dataset can then be pulled back into the application to improve its performance.

  • How does Dosu use the classification task with LangSmith?

    -Dosu classifies issues by topic, such as bug, improvement, new feature, documentation, or integration, and uses LangSmith's tracing and feedback mechanisms to improve classification accuracy over time.

  • How are positive and negative feedback defined in LangSmith?

    -In this workflow, positive feedback is logged as a user score of 1, meaning the user was satisfied with the result. Negative feedback is logged by attaching a correction value, which records what the output should have been.
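Continuing the sketch above, a correction is logged on the run instead of a score; the exact shape of the correction payload here is an assumption.

```python
# Negative feedback: the user changed the label, so log the corrected value.
# "correction" is LangSmith's first-class field for what the run should have produced.
ls_client.create_feedback(
    run_id=run_id,
    key="correction",
    correction={"label": "documentation"},  # corrected output (payload shape assumed)
)
```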

  • How do LangSmith datasets improve the application's classification accuracy?

    -The positive feedback and the corrections collected in LangSmith (using the corrected values) are added to a dataset, and those data points are then used in the application as few-shot examples, improving the classifier without retraining a model.

  • How does Dosu use semantic search to choose which examples to include?

    -Dosu creates embeddings for all of the examples, embeds the current input, and finds the examples most similar to it. Passing only those few, most relevant examples into the prompt improves application performance.
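The video describes this step but does not show its code; below is a minimal sketch of the idea using OpenAI embeddings and cosine similarity. The embedding model, the `(input_text, label)` example format, and the helper names are all assumptions. In practice the examples would be the dataset entries pulled from LangSmith, with only the top few placed into the prompt.

```python
import numpy as np
from openai import OpenAI

openai_client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    """Embed a batch of texts (embedding model choice is an assumption)."""
    resp = openai_client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

def most_similar_examples(examples: list[tuple[str, str]], query: str, k: int = 5):
    """Return the k (input, label) examples whose input is most similar to the query."""
    example_vecs = embed([inp for inp, _ in examples])
    query_vec = embed([query])[0]
    # Cosine similarity between the query and every example input.
    sims = example_vecs @ query_vec / (
        np.linalg.norm(example_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    top = np.argsort(sims)[::-1][:k]
    return [examples[i] for i in top]
```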

  • How are automation rules in LangSmith triggered?

    -Once a rule is set up, it fires automatically on runs that match its filter, checking on a schedule. For example, one rule can add runs with a positive user score to a dataset, and another can add runs that have corrections.

  • How does LangSmith help Dosu handle a large volume of user feedback?

    -Automations collect the feedback into datasets, and semantic search narrows the many accumulated examples down to the few most relevant to the current input, which are then used as examples to improve the application.

  • How are the feedback and datasets in LangSmith used in the actual application?

    -Using the LangSmith client, the application pulls the examples in the dataset and formats them into the prompt as few-shot examples, improving performance without retraining a model.

  • Is this approach limited to simple tasks like classification?

    -Although the classification task Dosu improved is relatively simple, the same concepts and tooling in LangSmith apply to more complex tasks and can help improve performance across many application scenarios.

Outlines

00:00

🚀 A 30% performance improvement without prompt engineering

Harrison explains how Dosu, an AI code-engineering teammate, improved its application performance by 30% without any prompt engineering by using the LangSmith platform. LangSmith is a standalone platform that can be used with or without LangChain. It improves an application's data flywheel through logging, tracing, testing and evaluation, and human annotation, all integrated in one place. The tutorial shows how to set environment variables, run a classification task directly against OpenAI, and use LangSmith for tracing and feedback as the basis for the improvement.
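A minimal sketch of what such a classifier might look like; the prompt wording, topic list, model, and function name mirror the walkthrough but are assumptions rather than the exact tutorial code.

```python
from langsmith import traceable
from openai import OpenAI

openai_client = OpenAI()

TOPICS = ["bug", "improvement", "new feature", "documentation", "integration"]

PROMPT_TEMPLATE = """Classify the type of the issue as one of the following topics:
{topics}

Issue: {text}"""

@traceable  # logs this call's inputs, outputs, and timing to the LangSmith project
def classify_issue(text: str) -> str:
    prompt = PROMPT_TEMPLATE.format(topics=", ".join(TOPICS), text=text)
    response = openai_client.chat.completions.create(
        model="gpt-3.5-turbo",  # model choice is an assumption
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

Calling `classify_issue("fix bug in LCEL")` would be expected to return "bug", matching the first example in the walkthrough.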

05:00

🔍 LangSmith's data feedback loop

This section covers how feedback collected in LangSmith is turned into datasets that can improve the application. Two automation rules are created: one adds runs with positive feedback to a dataset, and the other adds runs that have corrections, using the corrected value. The rules run on a schedule, and when they fire, matching runs and their feedback are moved into the dataset. Those data points can then be used to improve the application's performance, enabling continuous optimization.

10:02

📈 Using the dataset to improve application performance

Once the dataset exists, its data points can be fed back into the application as examples to improve performance. Concretely, the LangSmith client pulls the examples from the dataset and formats them into a string that is inserted into the prompt template. The application then learns the pattern from previous inputs and outputs and classifies new inputs more accurately, and leaving more feedback continues to improve it.
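A sketch of that step, reusing `TOPICS` and `openai_client` from the classifier sketch above; the dataset name and the input/output key names are assumptions about how the automation rules stored the examples.

```python
from langsmith import Client

ls_client = Client()

def create_example_string(examples) -> str:
    """Format dataset examples as alternating input/output lines for the prompt."""
    lines = []
    for ex in examples:
        lines.append(f"Input: {ex.inputs['text']}")      # key name assumed
        lines.append(f"Output: {ex.outputs['output']}")  # key name assumed
    return "\n".join(lines)

FEW_SHOT_TEMPLATE = """Classify the type of the issue as one of the following topics:
{topics}

Here are some examples:
{examples}

Issue: {text}"""

def classify_issue_with_examples(text: str) -> str:
    # list_examples returns an iterator, so materialize it into a list.
    examples = list(ls_client.list_examples(dataset_name="classifier demo"))
    prompt = FEW_SHOT_TEMPLATE.format(
        topics=", ".join(TOPICS),
        examples=create_example_string(examples),
        text=text,
    )
    response = openai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

With the corrected example in the dataset, `classify_issue_with_examples("fix bug in documentation")` should come back as "documentation", matching the behavior shown in the video.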

15:02

🔗 Building an effective feedback loop

The final section summarizes how to build a feedback loop that keeps improving application performance: capture feedback associated with runs and store it in LangSmith, set up automations that move those runs and their feedback into datasets, then pull the examples from the datasets back into the application. The approach is not limited to classification and can be applied to more complex scenarios; the team encourages anyone interested to try it and offers to help.

Keywords

💡LangChain

LangChain is an open-source framework for building applications powered by large language models. In the video, the tooling built by the LangChain team is what enables the performance improvement; LangSmith is used alongside LangChain but can run independently of it.

💡LangSmith

LangSmith is a platform that works with or without LangChain. It integrates logging, tracing, testing, evaluation, and human annotation to improve an application's data flywheel. In the video it is used to collect feedback and turn that feedback into improvements in application performance.

💡Data flywheel

The data flywheel is the loop of data moving through an application: logging and tracing runs, collecting feedback, and feeding curated examples back in. In the video, strengthening this flywheel with LangSmith is the key to improving application performance.

💡Classification task

Classification is a machine-learning task that assigns an input to one of a set of predefined categories. In the video it is the example task: issue text is classified by topic, and feedback collected through LangSmith is used to improve the classifier's performance.

💡Feedback

Feedback is a user's or system's assessment of an application's output. In the video, feedback collected through LangSmith is used to build datasets, which in turn improve the application's classification accuracy.

💡Automations

Automations are rules set up in LangSmith that automatically move runs with matching feedback into specific datasets. In the video, automation rules streamline data collection and curation.

💡Dataset

A dataset is a collection of examples used to test or improve a model or application. In the video, the datasets built by the automation rules contain runs with user feedback, and their examples are fed back into the prompt to improve the application's classifier.

💡few-shot learning

Few-shot learning is a paradigm in which a model is conditioned on a very small number of examples rather than retrained. In the video, a handful of examples collected through LangSmith are placed in the prompt so the application learns and improves its classification behavior.

💡Semantic search

Semantic search retrieves information based on semantic similarity rather than keyword matching. In the video, Dosu uses semantic search to find the examples most relevant to the current input, improving classification accuracy.

💡Embeddings

An embedding converts data such as text into a numeric vector that machine-learning models can compare. In the video, Dosu creates embeddings for all of the examples and for the incoming input, then uses them to find the most similar examples.

💡GitHub interface

GitHub is a software development platform for version control and collaboration. In the video, the GitHub interface is where user feedback arises: a label left unchanged can be treated as positive feedback, and a corrected label becomes a correction used to improve the classifier.
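The video only gestures at this, so the following is a purely hypothetical sketch of how a GitHub "issue labeled" webhook event could be translated into LangSmith feedback; the predicted label and run ID are assumed to come from bookkeeping the application keeps itself.

```python
from langsmith import Client

ls_client = Client()

def handle_label_event(payload: dict, predicted_label: str, run_id: str) -> None:
    """Map a GitHub 'issues labeled' webhook event onto LangSmith feedback.

    `predicted_label` and `run_id` come from wherever the application stored the
    original classification for this issue (that bookkeeping is assumed here).
    """
    new_label = payload["label"]["name"]  # label the user applied in GitHub
    if new_label == predicted_label:
        # User kept the predicted label: treat it as positive feedback.
        ls_client.create_feedback(run_id=run_id, key="user-score", score=1)
    else:
        # User changed the label: log the new label as a correction.
        ls_client.create_feedback(
            run_id=run_id, key="correction", correction={"label": new_label}
        )
```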

Highlights

Harrison is from LangChain; they published a blog about how Dosu improved its application performance by 30% without any prompt engineering, using tools LangChain has built.

Dosu uses the LangSmith platform, which is separate from LangChain and can be used on its own, to improve its application's data flywheel.

LangSmith combines logging, tracing, testing, evaluation, and human annotation, all integrated in a single platform.

With LangSmith you can set up a data flywheel and start improving application performance.

The tutorial shows how Dosu achieved the improvement on a classification task, which is relatively simple by large language model (LLM) standards.

Dosu calls the OpenAI client directly for the classification task rather than using LangChain.

LangSmith's tracing makes it easy to trace runs cleanly and fold feedback back into the application.

The challenge Dosu faces is building something that works both for LangChain and for other projects such as Pydantic.

In LangSmith, feedback can be left on individual runs.

A create-feedback call can mark a run as good or bad, using the run ID to associate the feedback with the run.

LangSmith lets users move runs with feedback into datasets via automation rules.

Automation rules can add runs with positive feedback, and runs with corrections, to a dataset, with the corrected value used in place of the original output.

The tutorial shows how to re-run the data points so the automation rules can pick them up.

In LangSmith you can inspect runs, their inputs and outputs, and their feedback, including corrections.

Using the dataset's examples as few-shot examples improves the application's performance.

Dosu uses semantic search to find the examples most similar to the current input, which improves the application's accuracy.

The guide walks through how to pull data points from LangSmith into the application to improve performance.

By continuously collecting feedback and using it as examples, the model learns and classifies new inputs more accurately.

The approach is not limited to classification; LangChain believes the same concepts apply to more complex tasks.

LangChain is excited about applying these ideas to a wider range of scenarios and is happy to help.

Transcripts

00:00

Hi, this is Harrison from LangChain. Today we released a blog about how Dosu, a code-engineering teammate, improved some of their application performance by 30% without any prompt engineering, using a lot of the tools that we've built at LangChain over the past few months. So today in this video I want to walk through roughly how they did that, and walk through a tutorial that will teach you how you can do it on your application as well. Specifically, what they used was LangSmith. LangSmith is our separate platform; it's separate from LangChain the open source, it works with and without LangChain, and Dosu doesn't actually use LangChain, but they use LangSmith. What LangSmith is, is a combination of things aimed at improving the data flywheel of your application. This generally consists of a few things: logging and tracing all the data that goes through your application, testing and evaluation (and Lance is doing a whole great series on that right now), a prompt hub, and human annotation queues. But the real power of LangSmith comes from the fact that these aren't all separate things; they are all together in one platform, so you can set up a really nice flywheel of data to start improving the performance of your application. So let's see what exactly that means.

01:27

There's a tutorial that we put together that walks through, in similar steps, some of the same things that Dosu did to achieve a 30% increase. The task they did it for was classification, which is a relatively simple task by LLM standards, but let's take a look at what exactly it involves. We're just going to walk through the tutorial. The first thing we're going to do is set up some environment variables; this is how we're going to log data to our LangSmith project, which I'm going to call "classifier demo". Set that up, let me restart my kernel, clear all previous state, and set that up again. Awesome.

02:10

So this is the simple application that mimics some of what Dosu did. If we take a look at it, we can see that we're using OpenAI; we're not even using LangChain, we're just using the OpenAI client directly, and we're basically doing a classification task. We've got this f-string prompt template that says "classify the type of the issue as one of the following topics", with the topics up here: bug, improvement, new feature, documentation, or integration. We then put in the issue text, and we just wrap this in the LangSmith traceable decorator, which will trace things nicely to LangSmith. And this is our application. If we try it out, we can see that it does some classification. If I paste in this issue, "fix bug in LCEL", I would expect it to be classified as a bug, and we can see that indeed it is. If I do something else, like "fix bug in documentation", this is slightly trickier because it touches on two concepts: it touches on bug and it touches on documentation. Now, in the LangChain repo we would want this classified as a documentation-related issue, but we can see that off the bat our prompt template classifies it as a bug. Adding even more complexity, the fact that we want it classified as documentation is something that's maybe a little bit unique to us. If Pydantic or some other project were doing this, maybe they would want it classified as a bug. So Devin at Dosu has a really hard job of trying to build something that will work for both us and Pydantic, and part of the way he's able to do that is by starting to incorporate feedback from us, the end users, into his application.

04:01

One of the things you can do in LangSmith is leave feedback associated with runs. So this first run gets a positive score. If we run this again, notice one of the things we're doing is passing in this run ID. The run ID is basically a UUID that we pass in; the reason we create it up front is so that we can associate feedback with it over time. So if we run this, and then create our LangSmith client and create the feedback associated with it: this is a pretty good one, so we can assume it's been marked as good. We've collected that in some way; if you're using the GitHub interface, that might be that they don't change the label, meaning they think it's good. So we'll mark this as user score one, using the run ID that we created above and passed in. Now we've got this follow-up, "fix bug in documentation"; it creates the wrong label, and we can leave feedback on that as well. We can call this create-feedback function, and notably we're leaving a correction. This key can be anything (I'm just calling it "correction" to line up), but instead of passing in a score as we do up here, I'm passing in this correction value. The correction value is a first-class citizen in LangSmith for denoting the corrected value of what a run should be, and in this case it should be "documentation". Let's assume I've gotten this feedback somehow; maybe as an end user I correct the label in GitHub to say documentation instead of bug. So let's log that to LangSmith. Okay, awesome.

05:51

So this is generally what I set up in my code. I now need to do a few things in LangSmith in order to take advantage of this data flywheel, so let's switch over to LangSmith. I can see I've got this classifier demo project. If I click in, I can see the runs that I just ran. If I click into a given run, I can see the inputs, the output, and any feedback; here I can see "correction" with the corrected value of "documentation". If I go to the run below, I can see a score of one, because this is the run whose input was "fix bug in LCEL" and whose output was correct. Okay, so I have this data and the feedback in here; let's start to set up some automation. What I want to do is move data that has feedback associated with it into a dataset. I'll do that by clicking "add a rule". I'll call this one "positive feedback", set a sampling rate of one, and add a filter where the feedback user score is one. One thing that's nice to do is preview what the filters you add to the rule will actually do, so I can filter on feedback user score one here and see that, when applied, it matches one run. Now I click "add rule" (it remembers that filter), call it "positive feedback", and if I get this positive feedback I just want to add it to a dataset. Let me create a new one and name it "classifier demo". It's going to be a key-value dataset, which basically just means dictionaries in, dictionaries out. Let me create it, and I've now got this rule. I am not going to click "use corrections" here, because remember, this is the positive feedback I'm collecting. Okay, great, let's save that. Now let's add another rule: remove this filter and add another one, which is instead when a run has corrections. So now I'm saying, any time there are corrections (I can see the filter applied), add a rule. Let's call it "negative feedback", add it to the same "classifier demo" dataset, and this time click "use corrections", because when this gets added to the dataset I want it to use the correction instead of the run's original output. Let's save this, and now I've got two rules. Awesome.

08:43

Okay, so now I've got these rules set up. The rules only apply to data points and feedback that are logged after they are set up, so we basically need to rerun these same data points so that the rules can pick them up. Let's run this; this is the one with positive feedback, so let's leave that feedback. Let's rerun this one; this is the one with negative feedback, so let's leave that correction. Now we need to wait for the rules to trigger; by default they run every 5 minutes. We can see that it is 11:58, almost 11:59, so this will trigger in about a minute. I'm going to pause the video and wait for that to happen.

09:26

All right, we're back. It's just after noon, which means the rules should have run. The way I can check, by the way, is by clicking on the rules and viewing their logs. I can see the logs, and I can see that one run was triggered by this rule; I can go to the other one and see, again, that one run was triggered by that rule. That's how I can tell whether these rules ran and which data points they ran over. Now that they've run, I can go to datasets and testing, search for "classifier demo", look in, and see that I have two examples. I have "fix bug in LCEL" with the output of "bug" (this is just the original output), and I also have the other one, "fix bug in documentation", with the new output of "documentation", which is the corrected value. So what I'm doing is building up this dataset of correct values, and then I'm going to use those data points in my application to improve its performance. Let's see how to do that.

10:29

We can go back to this nice little guide; it walks through the automations, and now we've got some new code for our application, so let's pull it down and take a look at what's going on. We've got the LangSmith client, and we're going to need it in our application because now we're pulling down the examples in the dataset. I've got this little function that takes in examples and creates a string that I'm going to put into the prompt: a string of alternating inputs and outputs. Super easy, and that's honestly most of the new code. This is all the same code as before; here we changed the prompt template by adding two lines, "here are some examples" and then a placeholder for the examples. We'll see how we use that later on. And now, inside this function, we're pulling down all the examples that are part of this classifier demo dataset. I'm listing the examples that belong to the dataset; by default that returns an iterator, so I call list on it to get a concrete list, pass that list into the create-example-string function I defined above, and then format the prompt by setting the examples variable to that example string. All right, let's now try this out with the same input as before. If we scroll up and take the same input, "fix bug in documentation", and run it through this new method, we can see that we get back "documentation". Notice that the input is the same as before, so it's just learning that if it gets the exact same input, it should produce the same output. The thing we get by using this as a few-shot example is that it can also generalize to other inputs. If we change this to something like "address bug in documentation", we can see that it is still classified as documentation. There are still these conflicting bug and documentation ideas, but it's learning from the example that this should be documentation. Does this fix all issues? No. Let's try some other things, like "make improvement in documentation": is this going to be classified as an improvement or as documentation? It's classified as improvement, and we probably want it to be classified as documentation. So one thing we can do is leave more feedback for it, and this imitates exactly what would happen in real life in GitHub issues: you keep seeing new types of questions come in that aren't exactly the same as previous inputs, because obviously they're not, and then you can capture that as feedback and use it as examples to improve your application. So we can create more feedback for this run: hey, we want this to be about documentation. Great. That's a little bit about how we can start to capture these examples, use them as few-shot examples, and have the model learn from previous patterns in what it's seen.

13:36

The last cool thing that Dosu did, which I'm not going to replicate in code but will walk through, is a semantic search over examples. What is this and why did they do it? They did it because they were getting a lot of feedback: they had hundreds of data points of good and corrected feedback that they were logging to LangSmith, and at some point it becomes too much to pass in hundreds or thousands of examples. Instead, they only wanted to pass in five or ten examples, but not five or ten random examples: they wanted to pass in the examples most similar to the current input. The rationale is that if you look for examples similar to the input, the outputs should also be similar, or at least the logic that applies to those inputs should be similar to the logic that applies to the new input. So what they did was take all the examples and create embeddings for them, then embed the incoming input as well, and find the examples most similar to it. This is a really cool way to have thousands of examples but still only use five or ten in your application at any given point in time.

15:03

Hopefully this is a nice overview of how you can start to build the feedback loop: you can capture feedback associated with runs and store it in LangSmith, you can set up automations to move those runs, and sometimes their feedback as well, into datasets of good examples, and you can then pull those examples down into your application and use them to improve performance going forward. Doing this with classification is a relatively simple example, but there are lots of more complex cases that we think these same concepts can be relevant for, and we're very excited to try those out. If you have any questions or want to explore this more, please get in touch; we'd love to help.


Related Tags
Performance optimization, LangSmith, Data feedback, Automation rules, Classification task, Application improvement, User experience, Tutorial, Code engineering, Artificial intelligence, Machine learning