Optimization of LLM Systems with DSPy and LangChain/LangSmith

LangChain
6 Apr 2024, 57:54

Summary

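TLDR In this conversation, Harrison Chase and Omar discuss DSPy (Declarative Self-improving Language Programs), a framework for developing and optimizing language programs. Harrison, CEO and co-founder of LangChain, hosts the session. Omar, a PhD candidate at Stanford who works on NLP and on the ColBERT retrieval system, explains how DSPy helps developers solve hard tasks by breaking a complex problem into smaller pieces that a language model can handle. They also cover how DSPy produces effective prompts, how the same program can be optimized for different language models, how DSPy could integrate with tools such as LangChain, and how it relates to flow engineering and optimization.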

Takeaways

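  • 🚀 Harrison Chase is the CEO and co-founder of LangChain, a developer toolkit that aims to make it easy to build large language model (LLM) applications.
  • 🎓 Omar is a PhD candidate at Stanford who focuses on natural language processing (NLP) and information retrieval, and he is excited about DSPy, a framework for developing and optimizing language programs.
  • 🤖 DSPy (Declarative Self-improving Language Programs) lets developers write code to solve complex problems where some "fuzzy steps" are handled by AI models, language models in particular.
  • 🔍 The goal of DSPy is to take such a program and optimize it so that it runs effectively for a given language model and task.
  • 🔧 In DSPy, developers can use standard programming tools such as loops, recursion, and assertions; the key step is to break the problem down and use language models for the pieces.
  • 🧩 DSPy treats problems in a modular way: developers declare the stages of a program, and the framework produces an optimized version of it.
  • 🔗 The name originally came from Demonstrate, Search, Predict (DSP), a way of designing program flows; in DSPy the "demonstrate" stage has been automated, which simplifies development.
  • 📈 DSPy improves programs through optimizers, which produce effective prompts or fine-tune the language model for the task at hand.
  • 🔬 DSPy can optimize from a small number of examples (sometimes without labels), using techniques such as bootstrapped few-shot and random search.
  • ⚙️ LangChain and DSPy can be used together: LangChain provides higher-level abstractions and model integrations, while DSPy provides tools for optimization and flow engineering.
  • 🌐 The speakers discuss combining LangChain's tracing and testing features with DSPy's optimization to improve the effectiveness and observability of the whole system.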

Q & A

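  • How does DSPy help developers build language programs?

    - DSPy gives developers a way to think about breaking a complex problem into smaller pieces that a language model can handle. Developers keep using standard programming tools such as loops and recursion, use DSPy to express the fuzzy steps that are hard to write as ordinary code, and DSPy then turns these high-level declarations into an optimized program that runs effectively on a particular language model.

  • What are the main problems DSPy tries to solve?

    - There are two: how to structure a program and design it for different kinds of problems, and, once the program is written and you want to run it on a given language model, how to produce effective prompts so that the model carries out every stage of the program well.

  • How does DSPy help developers optimize a program for different language models?

    - Developers declare the fuzzy steps of the program at a high level, and DSPy handles turning those steps into an optimized version for a specific language model. If developers switch models, or need to change models for cost or performance reasons, they do not rewrite prompts; they run a DSPy optimizer again for the new model.

  • What does 'flow engineering' mean in this discussion?

    - 'Flow engineering' refers to treating a system as a flow made of distinct stages. Developers declare the stages of the system and optimize each one to improve the performance of the system as a whole.

  • What does 'demonstrate search predict' mean?

    - Demonstrate-Search-Predict is a design pattern for building language programs and the original meaning of the name DSP. 'Search' is how the system gathers information, and 'predict' is how the system architecture is designed to output answers grounded in that information. 'Demonstrate' is the logic for teaching the model the task, which DSPy now automates. The DSP paper provides a set of design patterns for building these stages.

  • How can DSPy be used to optimize a program that generates search queries?

    - Developers can define a program that generates search queries from the user's question. DSPy lets them declare what that step takes in and produces, and then optimizes the step so that the queries find the relevant information.

  • How do modules and optimizers work in DSPy?

    - Modules handle specific steps of a program. The Chain of Thought module, for example, is responsible for learning how to teach a language model a particular kind of reasoning. Optimizers tune the behavior of modules, which may include generating instructions, creating examples, or building data for fine-tuning. Optimizers use a small number of examples (sometimes without labels) and run a search that maximizes average quality on those examples.

  • How are optimizers used to improve system performance?

    - Optimizers improve performance by searching the design space of prompts. The developer defines a metric, and the optimizer tries different prompts to find one that maximizes it. The search can be random search, Bayesian optimization, or another optimization technique.

  • What is a 'trace' in DSPy, and how does it help optimization?

    - A 'trace' is a record of a program's execution that captures how the program behaved on a particular input. By analyzing traces, developers can collect effective examples and use them as inputs to optimization, which improves its efficiency and quality.

  • How can DSPy be combined with LangChain?

    - DSPy can be combined with LangChain to take advantage of LangChain's higher-level abstractions and tooling. For example, a LangChain chain or agent could be paired with a DSPy optimizer to improve the whole system. LangChain's tracing and evaluation tools can also be used to collect datasets that then feed DSPy's optimization.

  • How does DSPy make sure the generated prompts are effective?

    - Prompts are produced by optimizers, which try different prompts and evaluate them on a small number of examples. In this way the optimizer learns which prompts work better for a particular language model and task, so that the final prompts guide the model effectively.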
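
The answers above describe an optimizer as a search over candidate prompts that maximizes a developer-defined metric on a few examples. A minimal sketch of that idea follows. It is not DSPy's API: `fake_lm`, `exact_match`, and `random_search` are hypothetical names, and the "language model" is a stub so the example runs without network access.

```python
import random


def fake_lm(prompt: str, question: str) -> str:
    """Hypothetical stand-in for an LM call (assumption, not a real API).

    It answers tersely only when the prompt asks for a short answer.
    """
    answers = {"capital of France?": "Paris", "2 + 2?": "4"}
    answer = answers.get(question, "unknown")
    if "short" in prompt.lower():
        return answer
    return f"Well, I think the answer might be {answer}, but I am not sure."


def exact_match(prediction: str, gold: str) -> float:
    """Metric: 1.0 when the prediction equals the gold answer, else 0.0."""
    return 1.0 if prediction.strip() == gold else 0.0


def average_score(prompt: str, examples) -> float:
    """Average metric value of one candidate prompt over the examples."""
    scores = [exact_match(fake_lm(prompt, q), gold) for q, gold in examples]
    return sum(scores) / len(scores)


def random_search(candidates, examples, trials, seed=0):
    """Try up to `trials` distinct candidates in random order; keep the best."""
    rng = random.Random(seed)
    tried = rng.sample(candidates, k=min(trials, len(candidates)))
    best_prompt, best_score = None, -1.0
    for prompt in tried:
        score = average_score(prompt, examples)
        if score > best_score:
            best_prompt, best_score = prompt, score
    return best_prompt, best_score
```

With two labeled questions and three candidate instructions, the search keeps the instruction that scores highest on the metric. A real optimizer would also generate the candidates, which this sketch leaves out.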

Outlines

00:00

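😀 Introducing DSPy and what it is for

Harrison Chase, CEO and co-founder of LangChain, opens the session, and DSPy (Declarative Self-improving Language Programs) is introduced. DSPy is a tool for building and optimizing language programs: developers solve complex problems by breaking them down and combining ordinary programming tools with language models. The discussion covers how to build programs with DSPy, in particular the fuzzy steps that need an AI model, and how those steps are optimized so the program runs effectively on a specific language model.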

05:00

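🔍 Core ideas and workflow of DSPy

Omar, a PhD candidate at Stanford, discusses the core ideas behind the framework, including the concepts of search, predict, and demonstrate. He explains how program flows are designed in DSPy and how those flows are optimized to improve a language model's performance on a specific task. He also describes how DSPy produces prompts for different language models, and stresses its flexibility when moving between and adapting to different models.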

10:02

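🤖 Programming and compiling with DSPy

This part covers how to write a program in DSPy and compile it into lower-level "code" that runs on a language model. The process turns high-level Python code into prompts that the language model can understand. It emphasizes optimization during compilation: how to choose different optimizers depending on the language model and the requirements, and how compilation improves how well the program works.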

15:03

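🔧 Optimization techniques in DSPy

This part goes into DSPy's optimization techniques: improving a language model's performance by optimizing instructions, examples, or model weights. It discusses how a small number of examples (few-shot examples) can be used, and how techniques such as random search or Bayesian optimization pick the best prompt. It also covers tracing the execution of the whole program to collect a dataset, and using that dataset to optimize the program.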
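
The tracing idea above can be sketched as follows: run the program on a few training questions, record every intermediate step, and keep only the traces whose final answer passes the metric, so the kept steps can serve as few-shot demonstrations. This is a toy illustration under stated assumptions, not DSPy's implementation; `Trace`, `toy_program`, and `bootstrap_demos` are hypothetical names, and the program uses a lookup table in place of a language model.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Record of one program run: the input, each step, and the final answer."""
    question: str
    steps: list = field(default_factory=list)  # (step_name, input, output)
    answer: str = ""


def toy_program(question: str) -> Trace:
    """Hypothetical two-step program: write a query, then answer from a table."""
    trace = Trace(question=question)
    query = question.lower().rstrip("?")
    trace.steps.append(("generate_query", question, query))
    knowledge = {"who wrote hamlet": "Shakespeare", "capital of japan": "Tokyo"}
    answer = knowledge.get(query, "unknown")
    trace.steps.append(("generate_answer", query, answer))
    trace.answer = answer
    return trace


def bootstrap_demos(program, trainset, metric, max_demos=4):
    """Keep traces of runs whose final answer passes the metric."""
    demos = []
    for question, gold in trainset:
        trace = program(question)
        if metric(trace.answer, gold):
            demos.append(trace)
        if len(demos) >= max_demos:
            break
    return demos
```

Only the final answer needs a label here; the intermediate query in each kept trace comes for free, which is why a handful of examples can be enough to start.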

20:04

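📈 Combining LangChain and DSPy

This part explores how LangChain and DSPy can be combined for more effective optimization and deployment of language model systems. It discusses LangChain's strengths in expressing programs, optimization, testing, and evaluation, and how they can be paired with DSPy's modularity and optimization techniques. Several possible integrations are proposed, including using DSPy optimizers inside LangChain, and feeding LangChain datasets and traces into DSPy's optimization.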

25:04

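🚀 Future directions for optimization and flow engineering

The final part discusses where optimization and flow engineering are heading, including how to further control and optimize the execution flow of language models. Some ideas about generating program flows automatically are raised, but the speakers stress that program design matters at this stage, as does a deep understanding of and control over the flow. An upcoming webinar on flow engineering is also mentioned, a sign that the area continues to develop.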

Keywords

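💡DSPy

DSPy (Declarative Self-improving Language Programs) is a framework for building and optimizing language programs. Developers define a program's logic declaratively, and the framework then generates or optimizes the prompts used to interact with the language model. In the video, DSPy is used to show how a complex language model task can be broken into modules that are easier to manage and optimize.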

💡Chain of Thought

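Chain of Thought is a prompting strategy that asks the language model to produce a series of reasoning steps before it produces an answer. This tends to improve the quality and relevance of the answers. In the video, Chain of Thought appears as a DSPy module and is used to show how a language model can be taught to follow a particular reasoning strategy.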
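
As a rough illustration of how a chain-of-thought module can sit on top of a declared step, the sketch below parses a signature string such as "question -> answer", adds a reasoning output in front of the answer, and renders a prompt skeleton. The helper names and the prompt layout are assumptions made for this example; they are not DSPy's own functions or prompt format.

```python
def _split_fields(part: str) -> list[str]:
    """Split a comma-separated field list and drop surrounding whitespace."""
    return [name.strip() for name in part.split(",") if name.strip()]


def parse_signature(signature: str) -> tuple[list[str], list[str]]:
    """Parse 'a, b -> c' into (['a', 'b'], ['c'])."""
    inputs, outputs = signature.split("->")
    return _split_fields(inputs), _split_fields(outputs)


def with_chain_of_thought(signature: str) -> str:
    """Return the signature with a 'reasoning' output placed before the rest."""
    inputs, outputs = parse_signature(signature)
    if "reasoning" not in outputs:
        outputs = ["reasoning"] + outputs
    return f"{', '.join(inputs)} -> {', '.join(outputs)}"


def render_prompt(signature: str, **values: str) -> str:
    """Render a prompt skeleton that ends where the model should start writing."""
    inputs, outputs = parse_signature(signature)
    lines = [f"Given {', '.join(inputs)}, produce {', '.join(outputs)}.", ""]
    lines += [f"{name.capitalize()}: {values[name]}" for name in inputs]
    lines.append(f"{outputs[0].capitalize()}:")
    return "\n".join(lines)
```

Because the reasoning field is added by a wrapper, the original declaration stays "question -> answer" and the same declaration can be used with or without the extra reasoning step.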

💡Optimization

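In DSPy, optimization means using different algorithms and techniques to improve how well a language model performs. This includes optimizing the prompts, the generated examples, or the weights of the model itself. The video discusses how DSPy's optimizers are used to improve a language model's performance on a specific task.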

💡Language Model

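A language model is a machine learning model that can understand and generate natural language. In the video, language models are at the core of DSPy and carry out language tasks such as text generation and question answering. DSPy improves overall system performance by optimizing how the program interacts with these models.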

💡Flow Engineering

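Flow engineering is the practice of designing and optimizing how information moves through the stages of a complex system. In the context of the video, it relates to building language programs with DSPy and emphasizes the importance of designing and optimizing modular, composable systems.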

💡Prompt Engineering

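Prompt engineering means designing and adjusting the prompts given to a language model so that it produces the desired output. In the video it is the manual work that DSPy aims to take over: instead of hand-writing prompts, developers declare what each step should do and let optimizers work out how to interact with the language model for the problem at hand.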

💡Language Program

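A language program is a program that uses language models to solve a specific language task. In the video, DSPy is used to build such programs: developers define the program's logic declaratively, and the framework handles the interaction with the language model.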

💡Tracing

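Tracing means recording what happens while a program runs so that it can be analyzed and optimized. In the context of the video, tracing is used to collect instances of a language program's execution, which can then be used to train or optimize the language model.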

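💡LangChain

LangChain is a framework for building and deploying applications that involve language models. The video discusses how LangChain can be combined with DSPy so that DSPy's optimization can improve the performance of LangChain applications.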

💡Control Flow

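Control flow is the order and the conditions under which a program's steps run. In DSPy, expressing control flow lets developers define the logical structure of the program, such as loops and conditionals. The video mentions that DSPy provides a way to express and optimize these control-flow structures.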

💡Assertions

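Assertions are programming statements that check whether a condition is true. In DSPy, assertions define constraints on outputs, so that generated queries or answers meet specific requirements. The video mentions assertions as part of the optimization process that helps produce outputs matching expectations.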
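
One way to picture an output constraint is a retry loop: generate, check the condition, and on failure call the step again with the constraint's message as feedback. The sketch below assumes that mechanism for illustration only. `fake_query_writer` and `generate_with_assertion` are hypothetical names, and the stub stands in for a language model.

```python
def fake_query_writer(question: str, feedback: str = "") -> str:
    """Hypothetical LM stand-in: writes a long query unless told to be brief."""
    if "at most" in feedback:
        return " ".join(question.split()[:5])
    return question + " please find every relevant document about this topic"


def generate_with_assertion(generate, question, check, message, max_retries=2):
    """Call `generate` until `check` passes, feeding `message` back on failure."""
    feedback = ""
    for _ in range(max_retries + 1):
        output = generate(question, feedback)
        if check(output):
            return output
        feedback = message  # the next attempt sees why the last one failed
    raise ValueError(f"Constraint not met after {max_retries} retries: {message}")
```

The constraint is ordinary code (here, a word-count check), so it can be tested on its own and reused across steps.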

Highlights

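Harrison Chase introduces LangChain, a developer toolkit that aims to make it easy to build large language model (LLM) applications.

Omar, a PhD candidate at Stanford, describes his work in natural language processing (NLP), in particular his research on ColBERT and retrieval.

DSPy is presented as a framework for developing and optimizing language programs; users build a program by breaking down the structure of the problem rather than relying only on the model to understand the task.

The key to DSPy is using standard programming tools and abstractions while expressing the fuzzy steps of a program, which are usually handled by a language model.

One main goal of DSPy is to help users think about how to build programs and how to design solutions for different types of problems.

DSPy also addresses how to produce effective prompts for language models such as GPT-4 so that every stage of the program runs effectively.

In DSPy, users write Python classes to define modules, and a compilation step turns the high-level program into an optimized version for a specific language model.

By separating program logic from the steps that teach the language model, DSPy makes programs more modular and easier to optimize and port.

DSPy lets users optimize programs through optimizers, which are responsible for turning the high-level description into an effective system.

DSPy offers a way to treat natural language processing tasks as machine learning problems, so users can work with language models more systematically.

Optimization in DSPy is usually fast and inexpensive, works for new settings, and can learn from a small number of examples.

DSPy optimizers can generate examples to learn module behavior and search the design space to maximize average quality on the examples.

DSPy supports different language models, and a program can move between models without rewriting prompts.

DSPy lets users declare modules in code and compose them in a forward function, which avoids the complexity of hand-written prompts.

Natural-language signatures in DSPy tell the model what task it should perform, while modules are responsible for learning specific patterns.

LangSmith is a tool that can automatically build datasets from production logs; those datasets can then be used for optimization and evaluation, including as examples for DSPy.

Used together, LangChain and DSPy offer a powerful way to optimize a language model's prompts, examples, and weights while keeping control over the system.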

Transcripts

00:00

Before that we can do brief introductions and then jump right into it. My name is Harrison Chase, CEO and co-founder of LangChain, a developer toolkit making it easy to build LLM applications.

00:11

Thank you, Harrison, and thanks everyone for joining. Looks like we have a lot of folks. I'm Omar, a PhD candidate at Stanford. I work in NLP broadly; you might know about my work on ColBERT in retrieval. I'm excited to talk about DSPy today, which is a framework for developing and optimizing language programs, and I'll tell you all about language programs today.

00:40

All right, so "DSPy", that's how it's pronounced. That puts all the questions to bed.

00:45

Yes, like NumPy or PyTorch: DSPy.

00:48

All right, so let's jump right into it. What is DSPy? Why did you build it? How are people using it?

00:57

Yeah, so this is a framework for really thinking about this notion of language programs. You're writing a piece of code and you're trying to solve a hard problem, and the way you're going to solve that hard problem is not by working really hard at teaching that model to understand your task. It's going to be by breaking this problem down into pieces and thinking about the problem structure that you have. You could employ all the standard sorts of tools we have in programming, so you could have loops or recursion or assertions or all other sorts of standard programming tools and abstractions. But the key thing is that in this program there are these steps that you don't know how to express in standard code. There are these fuzzy steps that are generally best tackled by an AI model, by a language model in particular.

01:51

And so DSPy is basically a framework for thinking about that layer: how do we express this type of thing, and then how do we actually go from a high-level declaration of what these fuzzy steps are going to do, and how they compose together into a bigger program, into an optimized version of the program that runs effectively given any particular language model in any particular program.

02:20

One of the main problems we're trying to tackle is thinking about how you build a program of this kind: what should you do for different types of problems? And then the one that I think DSPy is better known for is: now that you've written this program and you want to run it on GPT-4, what should the prompts to GPT-4 look like so that it does all of these stages in your program effectively?

02:51

Say you're doing RAG, for example, and you don't want your RAG component to rely entirely on the retrieval system, because in many cases retrievers are optimized to be fast and cheap and scalable, but you really want it to be very accurate. So maybe your program is: given the user's question, I want the language model to generate a bunch of search queries that will be optimized, one way or another, to find as many of the relevant facts as possible. Maybe this is patent search or something, so you really, really care about recall. Now that you've generated those queries, you want to find the documents that match those queries, and then you want the last stage of your system to summarize the information there, with citations.

03:41

In principle you could just ask a language model to do these steps, but as we're all aware, these models are very, very sensitive, and they're sensitive in different ways. Over time these models get updated, or you want to switch from GPT-4 to Claude, or you want to go to a smaller model to save costs over time, or because you're struggling with rate limits or some other challenges.
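
The recall-oriented RAG flow described above (write several queries, retrieve for each, merge the results, summarize with citations) can be sketched in plain Python. Everything here is a stand-in chosen for illustration: `CORPUS`, `generate_queries`, `search`, and `summarize_with_citations` are hypothetical, the "language model" steps are hard-coded, and the retriever is a toy keyword matcher.

```python
CORPUS = {
    "d1": "Solid-state batteries use a solid electrolyte.",
    "d2": "Sulfide electrolytes offer high ionic conductivity.",
    "d3": "Lithium-ion cells use liquid electrolytes.",
}


def generate_queries(question: str) -> list[str]:
    """Stand-in for the LM step that writes several search queries."""
    return ["solid electrolyte", "sulfide conductivity"]


def search(query: str, k: int = 2) -> list[str]:
    """Toy keyword retriever: rank documents by the number of shared words."""
    terms = set(query.lower().split())
    scored = []
    for doc_id, text in CORPUS.items():
        words = set(text.lower().replace(".", "").split())
        scored.append((len(terms & words), doc_id))
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in ranked[:k] if score > 0]


def summarize_with_citations(doc_ids: list[str]) -> str:
    """Stand-in for the LM summary step: quote each document with its id."""
    return " ".join(f"{CORPUS[d]} [{d}]" for d in doc_ids)


def answer(question: str) -> str:
    """Run every query, merge results without duplicates, then summarize."""
    seen: list[str] = []
    for query in generate_queries(question):
        for doc_id in search(query):
            if doc_id not in seen:
                seen.append(doc_id)
    return summarize_with_citations(seen)
```

Issuing more than one query is what raises recall here: each query alone finds one document, and the merged list contains both.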

04:06

And the general problem that you face is that even if you manage to get prompts that more or less declare the behavior you're interested in and enforce it, they're non-transferable: they're tied to the particular system that you built with a particular language model. So DSPy is basically adding this layer in between, where you can express those stages in terms of what you want to do rather than how you want the particular language model to do it. And it gives you the rest of the pieces of the puzzle to treat this almost like an extended machine learning problem, where you're trying to learn to fill in the blanks of how to use a language model for a particular task.

04:54

So one of the terms that I've heard a bunch on Twitter recently is flow engineering, this idea of your system being kind of like a flow with different stages. Is that essentially what you're talking about with the first part of it, this way of declaring that system?

05:09

Right. So DSPy stands for "declarative self-improving language programs, pythonic", which is a complex kind of backronym, because originally DSP was Demonstrate-Search-Predict. That's a 2022 paper we wrote. Demonstrate-Search-Predict is a particular way to think about designing these flows, if you will.

05:34

Search is this notion that, to solve a task, you want to be able to break down how your language system takes the input, maybe a question or whatever you're trying to solve, and gathers information. If you read the DSP paper, not the DSPy paper, it includes a number of design patterns for how you should build a search stage with language models. There's a lot of literature on stuff of this kind now, like HyDE and others, that for example use language models to generate documents and then search for them, or multi-hop retrieval, or other things of that kind.

06:15

So let me see if I can... hopefully you don't get a lot of that.

06:20

And predict is: now that you've gathered that stuff, how do you design the system architecture so that you output grounded answers, potentially with citations and other constraints of this type?

06:36

So DSP was very retrieval focused, right? DSPy is more general and can be used for kind of anything, or would you say that it's still fairly retrieval focused?

06:49

No, DSPy is not retrieval focused. But the bigger difference was that DSP required you to think about this demonstrate stage. I said Demonstrate-Search-Predict and I didn't describe what demonstrate was: you had to manually define the logic of how your language model would be taught to do the task. You wouldn't be writing the prompts, but you would be writing the code that essentially searched over prompts to define how the program should work. In DSPy we automated that for people, and we created a bunch of optimizers so that the standard flows that we saw were very effective could just work.

07:35

So in DSPy you basically just write a Python class. Maybe this is a good place to share screen. I don't have slides specifically for today, but I have some things I could probably just show.

07:48

Yeah, I think that's helpful, because one of the things that struck me looking at it, and I think you've said this, is that it looks very similar to PyTorch.

07:57

So what I could do is share screen. The slides are not quite for today, but let's see if we can make this work.

08:09

I have the repo pulled up as well, if you want me to share that.

08:12

Oh, there we go. So I started by saying that these language programs are advancing many tasks today. This is in the context of a talk where I discuss language programs I built in 2020 and 2021, which was not with prompting, it was with fine-tuning, and it was hard to figure out how to generate data for them. But now it's much easier, because you can prompt these language models to build stuff like STORM. People might be familiar with this cool work from Yijia at Stanford and folks with us; it basically writes whole Wikipedia articles. And the interesting thing about something like STORM is not really the language model itself; the language model is more of a device in the background.

09:00

The interesting thing is how these parts are composed together to solve this hard task and to ensure that outputs, which in this case are entire Wikipedia articles, are planned properly and cite their sources properly.

09:15

Another one is IReRa, with Karel and folks. This is basically a system for classifying long documents, but not into five or ten classes: it's classifying them into up to, I think, 15,000 classes. The challenge is that in general you don't have labels for all these classes, so you're bringing in a language model, and in this case a retrieval model, in an interesting way to solve this.

09:43

This sort of thing is obviously very popular in LangChain; these are all the sorts of chains you guys have popularized dramatically. But lots of folks are building these: text-to-SQL stuff; a really cool one from Google, RARR, that revises text generation from models; AlphaCodium, which is the flow engineering paper. So this stuff is super popular.

10:12

The problem that DSPy is solving is that these diagrams are really cool. You look at these diagrams and they're very modular, they're very attractive, there's a lot to like about them. But the problem is that language models are very sensitive to how you prompt them. If you think of them as basically CPUs, it's like their instruction sets are all very different and ad hoc. So under the hood, as we're all familiar, this stuff looks more like this: here's a long and very carefully crafted prompt that teaches the model how each of these stages should work, and we really hope that the output of each stage looks good enough as the input of the next stage, because if it doesn't, we're going to have a system that doesn't work.

10:59

People are writing these, and I'm guilty of that as well; we're all writing these tens of kilobytes of raw text files and trying to do our best with that. And this leaves us very dissatisfied in DSPy, because we're thinking: these language model pipelines that people are building are so modular in principle, and there's lots to love about them, and there's something really attractive in moving beyond the world where language models are so important into a model where they are basically devices: you just buy or rent the best one you can get, and then you're building the software on top. We have all these large design spaces, but in reality this is extremely messy and ad hoc, and that's really limiting what you can do.

11:46

In particular, you take some of the pipelines we looked at, and you want to change your language model to a cheaper one, or you want to switch from prompting to fine-tuning, and you're sort of stuck. Do you have to rewrite the prompts? Do you have to redesign this whole thing so that it can fine-tune the model? That doesn't feel like the right way to go about this, especially given how powerful these language models are supposed to be.

12:13

The goal of DSPy is to make working with these language programs more like systematic machine learning, and we have this theme of programming, not prompting, language models. You're not going to be writing these long, messy prompts, which again we're all guilty of writing; in many cases you write a thousand or two thousand words and maybe you're not even satisfied with your results at the end. Instead, you're going to write a piece of code that, as you said, Harrison, looks like a PyTorch program.

12:53

You're going to declare a DSPy module, and in your DSPy module you're going to define one or more sub-modules. These sub-modules allow you to express internal control flow that declares how a signature is going to work. I'll say more about these terms, but basically you declare a bunch of modules here, and then you have a forward function that's just using these modules: it's using generate query and generate answer, in a loop or whatever, to solve an actual problem.
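
The shape described here, a class that declares sub-modules and a forward function that calls them in a loop, can be sketched without any framework. This is an assumption-laden illustration rather than DSPy code: `Step` and `MultiHopQA` are hypothetical names, and plain Python functions play the role of the language model and the retriever.

```python
class Step:
    """Hypothetical declared step; `fn` plays the role of the language model."""

    def __init__(self, signature, fn):
        self.signature = signature  # e.g. "context, question -> query"
        self.fn = fn
        self.calls = []  # (inputs, output) pairs, kept for later inspection

    def __call__(self, **inputs):
        output = self.fn(**inputs)
        self.calls.append((inputs, output))
        return output


class MultiHopQA:
    """Declares two steps, then composes them with ordinary control flow."""

    def __init__(self, retrieve, write_query, write_answer, hops=2):
        self.retrieve = retrieve
        self.generate_query = Step("context, question -> query", write_query)
        self.generate_answer = Step("context, question -> answer", write_answer)
        self.hops = hops

    def forward(self, question):
        context = []
        for _ in range(self.hops):
            query = self.generate_query(context=context, question=question)
            context = context + self.retrieve(query)  # new list on each hop
        return self.generate_answer(context=context, question=question)


# Toy stand-ins so the sketch runs end to end.
FACTS = {
    "author of Dune": ["Dune was written by Frank Herbert."],
    "Frank Herbert birthplace": ["Frank Herbert was born in Tacoma."],
}


def retrieve(query):
    return FACTS.get(query, [])


def write_query(context, question):
    return "Frank Herbert birthplace" if context else "author of Dune"


def write_answer(context, question):
    return "Tacoma" if any("Tacoma" in passage for passage in context) else "unknown"
```

Note that no prompt text appears anywhere in the class; the loop and the data flow are the program, and how each step talks to a model is left to whatever fills in `fn`.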

13:31

And you notice here that we're not writing any prompts anywhere, and we're not writing how the models should be used. Again, the slides are not really meant for today, so the ordering might not be ideal, but we'll try to make it work.

13:42

The idea then is that we want to take this program and essentially compile it down. Compiling, in the general sense I'm using here, means taking a high-level program and generating a lower-level language. Something that strikes people as interesting or odd at first when they see DSPy is that we're actually compiling computer code in Python to a low-level language that in many cases is English, for example, for the prompts. So we're treating the natural language that we use to speak to the models as a lower-level abstraction. That's not a mistake, that's intentional, because the language these models speak is not necessarily the English that you or I understand. It's a much more subtle thing, where differences that are very small to us, or in some cases even very strange changes to us, may impact model quality dramatically. Maybe I can share some examples of that which ended up getting really popular.

14:43

But the idea is that when you compile this program, which is actually shorter than the prompt we had in the last slide, you can get better scores. You can get a whole lot better scores with bigger models, but the same program can also be compiled into smaller open models, or even tiny language models through fine-tuning, and you're not really changing your code or your description of the problem in any way.
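
To make "same program, different compilation target" concrete, the sketch below takes one declaration, "question -> answer", and produces three different artifacts from it: a bare instruction for a strong model, a few-shot prompt for a smaller one, and training records for fine-tuning. The function `compile_step`, the target names, and the output formats are all assumptions made for this example; real optimizers do considerably more than string formatting.

```python
import json


def compile_step(signature, target, demos=()):
    """Turn one single-input, single-output declaration into a target artifact."""
    inputs, outputs = [part.strip() for part in signature.split("->")]
    if target == "instruction":
        # Strong model: a short instruction and nothing else.
        return f"Given the {inputs}, produce the {outputs}."
    if target == "few_shot":
        # Smaller model: show worked examples, then leave the next input open.
        shots = [
            f"{inputs.capitalize()}: {demo[inputs]}\n{outputs.capitalize()}: {demo[outputs]}"
            for demo in demos
        ]
        return "\n\n".join(shots + [f"{inputs.capitalize()}:"])
    if target == "finetune":
        # Tiny model: emit training records instead of a prompt.
        return [
            json.dumps({"prompt": demo[inputs], "completion": demo[outputs]})
            for demo in demos
        ]
    raise ValueError(f"unknown target: {target}")
```

The declaration string never changes across the three calls; only the target does, which is the property the talk is pointing at.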

play15:08

you're just selecting different

play15:10

optimizers um which are the um sort of

play15:14

modules in thepi responsible for turning

play15:17

this this thing as it being compiled

play15:19

into a highly effective um into a highly

play15:23

effective system and what makes this all

play15:26

possible is two ideas um want to

play15:29

separate the program logic so what you

play15:31

just call like this information flow um

play15:34

from how we're going to teach the

play15:35

language model each of the

play15:37

steps and we want to um optimize we want

play15:42

to think about algorithms that can

play15:44

optimize how we will prompt or fine-tune

play15:46

the language models um and in general

play15:49

the role the the the approach we'll take

play15:51

here is some sort of discrete search

play15:53

over modular language elements so we

play15:56

don't want to be doing like you know

play15:57

reinforcement learning or something

play15:59

where we need a whole lot of data or um

play16:02

other approaches we we actually want to

play16:04

use the language models themselves to

play16:06

think about hey how should we convert

play16:08

this high level sort of description of

play16:10

our task into uh into something that's

play16:13

effective um and obviously language

play16:15

models are not very good at this sort of

play16:17

thing on their own so we want to give

play16:18

them the U the support to um to make

play16:22

this to make this

play16:24

effective um so let's see what's a good

play16:26

way to do this so let's yeah I think

play16:28

this may be a decent

play16:30

example um a decent example basically is

play16:34

let's say you want to teach three

play16:36

different language models to answer a

play16:38

question after they do some Chain of

play16:40

Thought how would you express that so

play16:42

let's say you want to do this with gp4

play16:44

I'd say one way to do this is you would

play16:46

like think about how you would craft an

play16:49

instruction and then you trust that GPT

play16:50

for can answer questions and can do

play16:52

Chain of Thought So you want to keep

play16:55

this cheap and you don't want to like

play16:56

make the prompt too long unnecessarily

play16:58

but if you want to do this with like an

play17:00

open model that's a bit smaller maybe

play17:02

mistal or or llu um on the smaller side

play17:05

of those models um then maybe you want

play17:08

to start showing examples and this is

play17:10

trickier than it sounds because your

play17:12

examples should demonstrate the pattern

play17:16

you're trying to uh to show so for

play17:17

example if you want to do a Chain of

play17:18

Thought well you need to think about how

play17:20

do I write the right Chain of Thought to

play17:22

teach this model how to do this for my

play17:23

types of questions um and let's let's

play17:26

say you write something that looks like

play17:27

this and maybe you have several more

play17:29

examples because more is usually more

play17:31

effective um but in some cases you want

play17:33

to solve the same task but with a much

play17:34

smaller model and here you know with a

play17:37

T5 model for example prompting is not

play17:39

going to do much at all the model is not

play17:41

you know um in context learning aware so

play17:44

you're going to have to synthesize or

play17:46

otherwise create maybe a thousand

play17:47

examples of this pattern um so we have

play17:50

questions the Chain of Thought and the

play17:52

answers and you want to actually fine

play17:53

tune the model on it but it's what's

play17:55

dissatisfying here is that we're solving

play17:57

the exact same problem and actually

play17:59

we're expressing the exact same um

play18:02

control flow um although this is just

play18:04

one language model call um in three

play18:07

vastly different ways and and the idea

play18:10

is that like instructing models fop

play18:12

prompting fine-tuning are just different

play18:14

implementations of the idea that I have

play18:16

a question and I want to generate some

play18:18

reasoning and I want to generate an

play18:21

answer and this is actually how you

play18:23

would Express this in dsy you say I you

play18:25

know this is like a function signature

play18:28

if you will so this is my input and

play18:30

these are my outputs and you could have

play18:31

multiple inputs or multiple outputs um

play18:34

and the key question that um dspi is to

play18:39

answer automatically for you is what is

play18:41

a question like how do I tell a model

play18:44

this is what a question is and this is

play18:45

what you should do to it you know to

play18:47

generate a reasoning and this is what a

play18:48

reasoning is and this is what an answer

play18:49

is um but this is something that we want

play18:52

uh to learn from the from the data so

play18:54

you could have say things like document

play18:55

to summary or context and question to

play18:57

search query um or as we said question

play19:00

to answer um and you can obviously

play19:02

specify a lot more constraints you don't

play19:04

have to it doesn't have to be a string

play19:05

like this you could have a more like you

play19:07

could say I want the output to be you

play19:09

know integer and and I want you know I

play19:12

want it to behave like this or that

play19:14

there there are various ways to

play19:15

constrain these things but the key thing

play19:17

is that what we'll Define what a

play19:19

document or search query or reasoning is

play19:21

is something that we want to learn

play19:23

within the context of this program that

play19:24

we built um so um I guess maybe we I can

play19:30

stop after this part and and and and

play19:32

just make sure we're on uh you know we

play19:35

we

play19:36

we everything makes makes sense to folks

play19:38

in the audience um but let's say you

play19:41

want to express this simple kind of

play19:43

pipeline so you have a system that takes

play19:45

questions and it generates search queries

play19:48

it searches with them and then it like

play19:51

asks further questions it searches again

play19:53

and then it uses a retriever to find

play19:55

some documents and then uses these

play19:57

documents to finally answer the question
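The pipeline just described can be sketched in plain Python. This is not DSPy code: the three callables stand in for the language model and retriever calls, and every name here (`MultiHopQA`, the `toy_*` functions, `CORPUS`) is a hypothetical placeholder chosen so the example runs without a model or a search index.

```python
from typing import Callable, List


class MultiHopQA:
    """Generate a query, retrieve, repeat, then answer from the gathered context."""

    def __init__(
        self,
        generate_query: Callable[[List[str], str], str],
        retrieve: Callable[[str], List[str]],
        generate_answer: Callable[[List[str], str], str],
        hops: int = 2,
    ) -> None:
        self.generate_query = generate_query
        self.retrieve = retrieve
        self.generate_answer = generate_answer
        self.hops = hops

    def forward(self, question: str) -> str:
        context: List[str] = []  # the first hop sees an empty context
        for _ in range(self.hops):
            query = self.generate_query(context, question)
            context.extend(self.retrieve(query))
        return self.generate_answer(context, question)


# Toy stand-ins for the model and the retriever.
CORPUS = {
    "author of Dune": ["Dune was written by Frank Herbert."],
    "Frank Herbert birthplace": ["Frank Herbert was born in Tacoma."],
}


def toy_generate_query(context: List[str], question: str) -> str:
    # A real system would ask a language model; here the two hops are hard-coded.
    return "author of Dune" if not context else "Frank Herbert birthplace"


def toy_retrieve(query: str) -> List[str]:
    return CORPUS.get(query, [])


def toy_generate_answer(context: List[str], question: str) -> str:
    # Pretend the answer is simply the most recently retrieved passage.
    return context[-1] if context else "unknown"


pipeline = MultiHopQA(toy_generate_query, toy_retrieve, toy_generate_answer)
answer = pipeline.forward("Where was the author of Dune born?")
print(answer)
```

Passing the steps in as callables keeps the control flow fixed while leaving each language model step free to be replaced or optimized separately.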

play19:59

so in DSPy you would express it like this

play20:01

um and initialization you think of this

play20:04

almost like a neural network if you are

play20:06

familiar with that stuff but now each of

play20:08

these layers is actually a language

play20:10

model call if you're not familiar with

play20:12

neural networks and PyTorch and all that

play20:14

it doesn't matter you don't you don't

play20:15

need to know that but if you are

play20:17

familiar it's a good analogy so now

play20:19

you're going to declare um that you just

play20:21

want a chain of thought that will

play20:23

Express each of the blue uh Parts here

play20:26

so both of them take a question and

play20:27

generate the query

play20:29

um but you know they could also take

play20:31

some context that was retrieved earlier

play20:33

or they could take an empty context at

play20:34

the beginning and the black sort of box

play20:37

here uh takes this context that was

play20:39

retrieved and takes the question and

play20:41

generates an answer and this is the full

play20:43

sort of description of of you know the

play20:47

different blocks on the on the

play20:49

right you can then sort of compose them

play20:52

together in a forward function where

play20:54

you're calling these modules so you're

play20:55

generating query and then you're

play20:57

retrieving um and then you're dumping

play20:59

the stuff you're retrieving in this

play21:00

context and then you're generating an

play21:01

answer at the end um so this is the the

play21:04

forward function and and this is

play21:06

basically nothing but like a very direct

play21:08

translation of the diagram into a piece

play21:10

of code um with the interesting piece

play21:13

being that we have these signatures that

play21:15

are telling the model what it's supposed

play21:17

to be doing um so these are called

play21:19

natural language signatures um and we

play21:22

touched on them like a

play21:24

couple minutes ago um how these are or

play21:27

like you know the strategy or the control

play21:29

flow of how these signatures will be

play21:32

expressed are are through what we call

play21:34

modules so there are several modules in

play21:36

DSPy for all the kind of prompting

play21:39

techniques that people have um if you

play21:42

want to do Chain of Thought in general

play21:43

as we said earlier you kind of have to

play21:46

think of how to teach your particular

play21:48

model this particular reasoning strategy

play21:51

but in DSPy it's just a module that you

play21:53

give your signature and there's an

play21:55

actual like module there that is going

play21:58

to learn that particular pattern for us
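A rough sketch of what such a module might hold, written in plain Python. How DSPy implements this internally is not shown in the talk, so the class below is an illustrative assumption: `ChainOfThoughtSketch`, its fields, and `render_prompt` are hypothetical names. The point is only that the module wraps a signature, adds a reasoning field to it, and keeps instructions and examples as state that an optimizer could later fill in.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ChainOfThoughtSketch:
    inputs: List[str]
    outputs: List[str]
    # State an optimizer could overwrite: instructions and few-shot demos.
    instructions: str = "Given the inputs, produce the outputs."
    demos: List[Dict[str, str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Extend the user's signature with a reasoning field before the outputs.
        if "reasoning" not in self.outputs:
            self.outputs = ["reasoning", *self.outputs]

    def render_prompt(self, **values: str) -> str:
        lines = [self.instructions, ""]
        for demo in self.demos:
            for name in self.inputs + self.outputs:
                lines.append(f"{name}: {demo[name]}")
            lines.append("")
        for name in self.inputs:
            lines.append(f"{name}: {values[name]}")
        # Leave the first output field open for the model to complete.
        lines.append(f"{self.outputs[0]}:")
        return "\n".join(lines)


module = ChainOfThoughtSketch(inputs=["question"], outputs=["answer"])
module.demos.append(
    {"question": "What is 2 + 2?", "reasoning": "Two plus two is four.", "answer": "4"}
)
prompt = module.render_prompt(question="What is 3 + 5?")
print(prompt)
```

Because the instructions and demos are ordinary attributes, swapping in better ones changes the prompt without touching the code that calls the module.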

play22:01

um and I I think I might have a slide on

play22:03

what it's what it does you know in the

play22:05

case of Chain of Thought but the

play22:06

interesting piece here is that

play22:08

internally this Chain of Thought is not

play22:10

like a template uh although it has a

play22:12

template um it's actually a module that

play22:15

maintains three things it basically is

play22:17

going to remember what instructions it

play22:19

should use and so the idea is it wants

play22:21

to learn the right instructions for your

play22:23

language model for this particular

play22:25

signature um it also wants to learn what

play22:27

input output examples it should

play22:29

show to the model and um if you use fine

play22:33

tuning then it also wants to know you

play22:35

know what fine-tuned model it should be

play22:38

using and there are other Notions of

play22:41

constraining the outputs like assertions

play22:43

where you say well you know I don't just

play22:45

want to generate a query I want this

play22:46

query to have a particular property for

play22:48

example I want it to be less than 100

play22:50

words um and I want the program to learn

play22:53

to generate queries that pass my

play22:57

assertions uh um so I'll pause here for

play23:00

a bit and then we can continue and again

play23:02

apologies that the slides are not really

play23:05

kind of purpose built I I wasn't

play23:06

expecting to present them no this is

play23:09

great um a bunch of questions around

play23:12

optimizers and you mentioned this is

play23:14

kind of like one of the the two big

play23:15

things what exactly is being optimized

play23:19

and and how how does that work yeah

play23:22

let's do that um so I guess this is

play23:24

another way of asking you know what

play23:26

defines the context the question the query

play23:28

these are terms and if you go to a

play23:30

language model and you say I want an

play23:32

answer it kind of has some prior over

play23:34

what that might mean but it might not be

play23:36

exactly right

play23:38

um this is we should go we should get

play23:41

back to this so that

play23:42

people understand actually let's go in

play23:45

order so that we don't forget so

play23:47

prompting strategies like Chain of

play23:49

Thought they're just DSPy modules too the

play23:51

difference is that they're going to take

play23:52

our signature and change it so they're

play23:55

going to take our signature and say like

play23:57

oh I just want to add a field um that's

play24:00

going to be like let's think step by

play24:01

step or or whatever um and then I'm

play24:03

going to build kind of a another module

play24:06

with the newer signature and then

play24:08

basically I'm going to like forward all

play24:10

my calls to that um to that extended

play24:13

signature and this is

play24:18

nice because now this extended signature

play24:20

is is what we would learn we would

play24:22

actually going to learn like you

play24:23

specified that you have questions you

play24:25

want and you want answers but when you

play24:26

have a Chain of Thought we're actually

play24:27

going to learn this reasoning um uh

play24:30

field as well uh when we're optimizing

play24:33

this thing um yeah this is so the role

play24:37

of of the of the language program

play24:39

optimizers is this process of taking

play24:41

each of your signatures and figuring out

play24:43

how to construct those prompts um and

play24:47

you know there could either be by

play24:48

creating instructions or by creating

play24:51

examples or by building data to fine-tune

play24:53

models um for the same signature um

play24:56

and the way this is going to work is not

play24:58

blind you don't just go to the model and

play25:00

say like give me good prompts which I'm

play25:02

starting to see out there that doesn't

play25:04

make sense because language models you

play25:06

know they're not necessarily aware of

play25:09

what will work as a good prompt they

play25:10

give you a long prompt for sure and odds

play25:13

are it's not completely random but it

play25:16

might not actually be very effective um

play25:18

instead what you want is some way to

play25:19

get feedback um so

play25:22

you want to build a metric um and you

play25:25

want to have a small number of examples

play25:27

maybe you know 20 some people are

play25:29

successful with as little as 20 usually

play25:32

around 50 or 100 is is is good um

play25:35

sometimes without any labels um you can

play25:37

still proceed and you know I don't

play25:40

think we need to like go through the

play25:41

formula but basically um the idea is

play25:44

we're trying in these optimizers to

play25:46

search over basically the design space

play25:49

of these prompts um to maximize the

play25:52

quality on average over your examples
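The objective described here, picking the prompt configuration with the best average metric over a small set of examples, can be sketched as follows. This is a toy stand-in rather than any DSPy optimizer: each "candidate" is a lookup table playing the role of a program run under one particular prompt, and the names `evaluate`, `pick_best`, and `exact_match` are hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

Example = Dict[str, str]
Program = Callable[[str], str]


def exact_match(example: Example, prediction: str) -> float:
    return float(prediction.strip() == example["answer"])


def evaluate(program: Program, examples: List[Example], metric) -> float:
    """Average metric value of one candidate over all examples."""
    return mean(metric(ex, program(ex["question"])) for ex in examples)


def pick_best(candidates: Dict[str, Program], examples, metric) -> Tuple[str, float]:
    """Search the candidate space and keep the one with the highest average score."""
    scored = [(evaluate(prog, examples, metric), name) for name, prog in candidates.items()]
    best_score, best_name = max(scored)
    return best_name, best_score


examples = [
    {"question": "2+2", "answer": "4"},
    {"question": "3+3", "answer": "6"},
    {"question": "10-7", "answer": "3"},
]

# Simulated outputs of the same program under three different prompts.
terse = {"2+2": "4", "3+3": "6", "10-7": "3"}
verbose = {"2+2": "The answer is 4", "3+3": "The answer is 6", "10-7": "The answer is 3"}
sloppy = {"2+2": "4", "3+3": "7", "10-7": "3"}

candidates = {"terse": terse.get, "verbose": verbose.get, "sloppy": sloppy.get}
best_name, best_score = pick_best(candidates, examples, exact_match)
print(best_name, best_score)
```

With only three candidates an exhaustive comparison is fine; the optimizers discussed in the talk exist because the real space of instructions and examples is far too large to enumerate this way.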

play25:55

and I say sometimes it's without labels

play25:57

um

play25:58

metrics can be something like the output

play26:01

of my let's go back to the pipeline we

play26:03

had here one metric one set of uh of

play26:08

of Optimizer inputs here could be take

play26:11

this program here are 20 questions and

play26:14

here are their answers and the metric is

play26:17

you know does the answer match what I

play26:19

what I give you but a different uh type

play26:22

of of metric could be here's here's set

play26:26

of 20 questions um I don't know what the

play26:28

right answer is but I do know that I

play26:30

want my answers to be not based on

play26:33

hallucinations I want them to be

play26:34

grounded so my metric is going to be

play26:37

another DSPy program that checks for

play26:39

hallucination um and it says when I get

play26:42

an answer from my system the text that

play26:45

I'm getting is actually grounded you

play26:47

know is is based on the documents or the

play26:50

context that was retrieved and the nice

play26:53

thing about that being a DSPy program is

play26:54

that we can actually optimize that

play26:57

metric so that it is reliable at

play26:59

detecting hallucination it's just a

play27:00

classification problem it's very easy to

play27:02

optimize um and now we have a really

play27:04

good way to optimize this program so

play27:07

that it generates answers that

play27:09

don't have hallucinations and we did not

play27:10

even need labels for that but in many

play27:13

cases you know if you have some labels

play27:14

you should use them
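The two kinds of metric described above can be sketched side by side. In the talk the groundedness check is itself a language model program; to keep this example runnable without a model, a simple token-overlap heuristic is substituted for it, which is an assumption made here and not the approach from the talk. The function names and the `threshold` parameter are likewise hypothetical.

```python
import re
from typing import Dict, List


def tokens(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


def exact_match(example: Dict, prediction: Dict) -> bool:
    """Labelled metric: compare the predicted answer with the gold answer."""
    return tokens(prediction["answer"]) == tokens(example["answer"])


def grounded(example: Dict, prediction: Dict, threshold: float = 0.8) -> bool:
    """Label-free metric: is the answer supported by the retrieved context?

    Stand-in heuristic: the share of answer tokens that occur in the context.
    The `example` argument is unused because no gold label is needed.
    """
    answer_tokens = tokens(prediction["answer"])
    if not answer_tokens:
        return False
    context_tokens = set(tokens(" ".join(prediction["context"])))
    overlap = sum(tok in context_tokens for tok in answer_tokens) / len(answer_tokens)
    return overlap >= threshold


context = ["Dune was written by Frank Herbert."]
supported = {"answer": "Frank Herbert", "context": context}
hallucinated = {"answer": "Isaac Asimov", "context": context}

print(exact_match({"answer": "frank herbert"}, supported))
print(grounded({}, supported))
print(grounded({}, hallucinated))
```

Both functions share the same call shape, so an optimizer that accepts one can accept the other, which is what makes the label-free setting workable.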

play27:17

um so the general idea of these

play27:20

optimizers is we want to learn module

play27:23

Behavior by generating examples so

play27:25

you're giving us those inputs um let's

play27:28

actually do this again on the same

play27:30

diagram you're giving us

play27:33

um you're giving us this program here

play27:37

and you're giving us 20 questions and we

play27:40

want to teach the model to generate

play27:41

search queries but we don't know what

play27:43

types of search queries will work for

play27:45

this retriever like you know you just

play27:47

switch from ColBERTv2 to like I don't know

play27:49

ColBERTv3 when it exists or something or

play27:51

to like a um You.com search or some other

play27:55

thing um and the types of queries that

play27:58

work for each of them is different so

play28:01

how can you teach a particular language

play28:03

model to generate good queries here well

play28:05

one idea is what if we take this

play28:06

question and we actually try to run it

play28:08

through the pipeline in various ways and

play28:11

actually we trace the um inputs and

play28:14

outputs of this stage and see if they've

play28:16

been effective in the past at you know

play28:19

getting the retriever to do what we want

play28:21

and we can in general detect that

play28:22

because we're measuring you know um

play28:25

we're measuring the the the metric uh

play28:28

the end or because you're giving us

play28:30

those assertions although um you know in

play28:32

general you don't need them that sort of

play28:34

give us a signal as to whether things

play28:35

are going right or not and when we have

play28:37

those examples well we could um you know

play28:40

we could go back to the um optimizers

play28:43

here and basically use them as generated

play28:46

examples for each of these stages of our

play28:48

system um so this is the notion of

play28:50

BootstrapFewShot and then there are

play28:52

extensions of that that you should be

play28:54

using if you want to treat this as an

play28:56

actual optimization problem like BootstrapFewShot

play28:58

with random search um and so

play29:00

that's basically saying um I'll create

play29:03

few-shot examples for the stages

play29:05

of your pipeline starting from your

play29:06

inputs um and to choose which

play29:09

combination um you know I I should work

play29:11

with I'll run this random search process

play29:14

um trying to maximize your

play29:16

metric um BootstrapFinetune is just

play29:18

saying well since I can create examples

play29:20

now and I have this potentially

play29:22

optimized set of prompts I can create a

play29:24

lot more data now pretty efficiently and

play29:27

use it to actually update the weights of

play29:30

each of these modules through my

play29:32

pipeline assuming that you're working

play29:34

with a language model that we have an

play29:35

API to fine-tune it so you know you

play29:37

can't do this with GPT-4 for example and

play29:39

in general you shouldn't um even if they

play29:41

give you an API to fine-tune it but maybe

play29:42

you can fine-tune a small Llama model

play29:44

or a T5 model um or you could do things

play29:47

that actually learn instructions and

play29:50

examples uh like the recent MIPRO that we

play29:52

released uh like a couple of months ago

play29:55

um and several other things and I

play29:57

think um people look at optimization and

play30:00

they kind of fear that this stuff is

play30:01

expensive but in reality you know partly

play30:04

because llms are so cheap if you're

play30:06

doing like 3.5 level models um

play30:09

optimization is something that generally

play30:11

takes a few minutes and you know a few

play30:13

dollars um for a new kind of setting

play30:16

that you have um if you have a couple

play30:18

hundred examples if you have less it's

play30:20

even cheaper um but don't you know

play30:23

generally start small and generally

play30:24

start with smaller models so you

play30:26

understand what's happening and can sort

play30:28

of track the quality before adding

play30:29

complexity or scale um so I can tell you

play30:33

how the MIPRO optimizer works this is you

play30:35

know this you could also look at the

play30:36

thread about MIPRO and other resources

play30:39

um but I'll also pause here and just

play30:42

sort of make sure we're uh we're doing

play30:44

well on time as well so so just to maybe

play30:47

summarize some of that and correct me if

play30:49

I'm wrong but it seems like there's

play30:50

three main ways that you can kind of

play30:51

like optimize some of these programs you

play30:53

optimize the instructions the the

play30:56

examples or the few shot that you're

play30:58

giving or the weights of the underlying

play31:00

model itself and sometimes combinations

play31:02

of them is is that correct or is is that

play31:05

yeah this is right this is exactly right

play31:08

what's interesting though is um if you

play31:11

take a

play31:12

program and and this is not really

play31:15

changing what you said Harrison it's

play31:16

kind of just building on it um if you

play31:18

take this program and you optimize It

play31:21

generally the easiest way to get started

play31:23

is to optimize over examples and this is

play31:26

giving you like a pretty powerful

play31:28

boost very quickly um um and and the

play31:32

idea isn't just adding examples it's

play31:33

like the notion that they're actually

play31:35

coming from your program and they're

play31:37

things that worked before and then on

play31:39

top of that if you want to do random

play31:40

search or you want to do beam search or

play31:42

like Bayesian optimization um to sort of

play31:45

Select these things uh these things

play31:48

together it can be really

play31:51

powerful but what's really cool here is

play31:53

that when you take this program and

play31:55

let's say you optimize it for GPT 3.5

play31:58

with BootstrapFewShot with random search

play32:01

for

play32:02

example or with MIPRO so let's say you use MIPRO

play32:05

so this gives you a setup where each of

play32:07

these modules now learns a new

play32:09

instruction and learns like two or three

play32:11

nice examples um in in the prompt you

play32:15

can take this exact same um optimized

play32:17

program and say well that's my teacher

play32:19

program now I want to actually learn

play32:22

again with BootstrapFewShot or with MIPRO

play32:26

um but now I want examples that are

play32:29

being constructed to come from this

play32:31

pre-optimized version and so you're

play32:33

basically composing the optimization and

play32:36

what's really powerful about that is in

play32:37

some cases it means that you can get

play32:39

much better examples um because now

play32:41

you're sampling them from this better

play32:43

program um the other thing is now that

play32:45

you've optimized your prompts you can

play32:47

take this really high quality version

play32:48

and nest it again into optimizing

play32:51

the weights so you don't need to start

play32:53

each of these from scratch you can

play32:55

actually compose the process of

play32:56

optimizing them and when when I say

play32:58

compose the process of optimizing them I

play33:00

don't know if that sounds scary or not

play33:01

but in in practice that's like two lines

play33:03

you just pass an extra argument

play33:05

somewhere I can try to find if I have a

play33:07

an example of the um of the code for

play33:10

optimizing it's like two lines but it's

play33:12

not in this set of slides I I'll get it

play33:14

at the end if we want um yeah uh that's

play33:18

hope that answered that question yeah

play33:21

absolutely another thing I wanted to

play33:22

highlight because I think it's um I

play33:24

think it's a really interesting

play33:26

similarity between kind of like LangChain

play33:28

and DSPy is something you

play33:31

said around kind of like tracing and

play33:32

traces and just to emphasize why I think

play33:34

this is cool is like you could have a

play33:36

program with like 30 different steps in

play33:38

it and when you're talking about

play33:39

creating these uh data set of examples

play33:42

you're not talking about creating a data

play33:44

set for each of those 30 steps you're

play33:45

talking about creating one for the whole

play33:48

end to end Pipeline and then from that

play33:50

you can generate you can generate

play33:51

synthetically examples for each of those

play33:53

30 steps or instructions for each of

play33:54

those 30 steps but you do it once for

play33:56

this high level pipeline

play33:58

and that that all and I you know I I dug

play34:01

around in the DSPy codebase a bit and as

play34:03

far as I can tell tracing is really

play34:05

important for that and and so um do you

play34:07

maybe want to talk a little bit about

play34:08

like what tracing is and why it's needed

play34:11

for that and like what it actually

play34:13

provides yeah sure um so the idea there

play34:17

is you've written this program or you

play34:20

know this could have been you know a LangChain

play34:22

chain we have done this

play34:23

nice collab in the past where you know

play34:25

you could take an LCEL expression and

play34:27

then basically run a DSPy optimizer on

play34:29

top and we showed like a nice cool

play34:31

example of I think it was

play34:34

optimizing rag to generate engaging

play34:37

tweets or something for an answers data set

play34:40

um so basically you have this this

play34:42

program or this chain which has all these

play34:44

pieces and um as you said you have a

play34:47

data set that's describing the task not

play34:49

your pipeline and this is really

play34:51

important because let's say you're you

play34:53

know you're working

play34:55

at a startup or building a

play34:58

nice project um usually the task is not

play35:02

going to change every day it's like I'm

play35:04

writing this thing that's helping people

play35:07

schedule on email or something that task

play35:10

is fixed but your pipeline that you're

play35:12

building is something you want to

play35:13

explore all the time and what

play35:16

that means is we want to enable you to

play35:19

like go and break the pipeline add

play35:20

pieces remove pieces um but also we

play35:23

still want to enable some notion of

play35:24

learning of what your task looks like so

play35:27

in general as you said if if you have a

play35:29

bunch of inputs and ideally a bunch of

play35:32

good outputs at the end um if you don't

play35:35

have that we could rely basically on

play35:36

language model based evaluations um and

play35:40

and in some cases that can take you far

play35:41

in other cases it's harder

play35:44

um now the notion of tracing is well

play35:48

given those inputs if we just make an

play35:50

educated guess about the initial setting

play35:52

of each of your prompts of each of your

play35:54

modules then we can simply I don't know

play35:57

set a high temperature and run

play35:59

through your pipeline and just see what

play36:01

happens right see if we're getting

play36:02

good outputs see if we're getting bad

play36:04

outputs and we know that because we have

play36:05

the metric at the end or because we have

play36:07

these assertions but the metric is is

play36:09

the primary thing here um and so let's

play36:12

say you're

play36:13

building a question answering setup well

play36:15

what if I just tell the model please

play36:17

generate search queries and you know it's

play36:19

hit and miss right sometimes it's pretty

play36:21

good as language models you know

play36:23

sometimes they surprise us they're

play36:24

pretty decent um other times well

play36:26

doesn't really know what we're saying so

play36:27

it generates I've seen things like it

play36:29

generates SQL queries um or it generates

play36:32

um you know um like queries in this

play36:36

the archaic kind of like um keyword

play36:39

format with pluses and minuses and like

play36:41

uh you know which many modern retrievers

play36:43

are not really optimized for and it's

play36:45

kind of overkill or even

play36:47

distracting um for these models but like

play36:49

you can't blame it it doesn't really

play36:50

know like how is it

play36:52

supposed to know that it's

play36:54

not to do that um but you know if you

play36:57

sample sufficiently many of these

play36:59

um you know maybe a couple hundred um

play37:02

you're bound to find some interesting

play37:04

ones that perform really well um and we

play37:07

can basically collect those traces so

play37:09

that's what BootstrapFewShot does it

play37:11

will store those traces and then we can

play37:13

plug them back into the prompt and say

play37:15

when I ask you to generate the search

play37:17

query here's actually a good example of me

play37:19

doing that in the past and here's the

play37:20

query you gave me so how about do more of

play37:22

that um and you know it's basically few

play37:24

shot learning but on these

play37:26

traces um
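The trace-collection loop described here can be sketched as a toy simulation. This is not the DSPy implementation: a random choice of query style stands in for sampling a language model at high temperature, the "retriever" is a dictionary that only understands plain natural-language queries, and all names (`bootstrap_demos`, `toy_program`, `QUERY_STYLES`, `KNOWLEDGE`) are hypothetical.

```python
import random
from typing import Callable, Dict, List

# Three ways a model might phrase a search query; only one suits this retriever.
QUERY_STYLES: Dict[str, Callable[[str], str]] = {
    "sql": lambda q: f"SELECT * FROM docs WHERE text LIKE '%{q}%'",
    "keyword": lambda q: "+" + " +".join(q.split()),
    "natural": lambda q: q,
}
KNOWLEDGE = {"capital of France": "Paris", "capital of Japan": "Tokyo"}


def toy_program(question: str, rng: random.Random) -> Dict[str, str]:
    """Run the pipeline once and return a trace of its intermediate steps."""
    style = rng.choice(sorted(QUERY_STYLES))
    query = QUERY_STYLES[style](question)
    answer = KNOWLEDGE.get(query, "unknown")  # the retriever needs an exact match
    return {"question": question, "query": query, "answer": answer}


def metric(example: Dict[str, str], trace: Dict[str, str]) -> bool:
    return trace["answer"] == example["answer"]


def bootstrap_demos(program, examples, metric, attempts=20, max_demos=3, seed=0) -> List[Dict]:
    """Keep traces of runs that passed the metric, to reuse as few-shot demos."""
    rng = random.Random(seed)
    demos: List[Dict] = []
    for example in examples:
        if len(demos) >= max_demos:
            break
        for _ in range(attempts):
            trace = program(example["question"], rng)
            if metric(example, trace):
                demos.append(trace)  # a query that worked, never hand-written
                break
    return demos


examples = [
    {"question": "capital of France", "answer": "Paris"},
    {"question": "capital of Japan", "answer": "Tokyo"},
]
demos = bootstrap_demos(toy_program, examples, metric)
for demo in demos:
    print(demo)
```

Only the final answer is checked, yet every kept trace also carries a query that worked for this particular retriever, which is how an end-to-end dataset yields examples for an intermediate step.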

play37:29

and it becomes a discrete search over

play37:32

those so assuming we can generate and

play37:34

filter for good examples um optimizing

play37:38

with having selected them is actually a

play37:40

pretty simple um process from there on

play37:43

the question just becomes like how do we

play37:45

do this without like trying all possible

play37:46

combinations or you know um other stuff

play37:49

and sort of research on hyperparameter

play37:52

optimization for neural networks is actually

play37:53

very useful here because it's kind of a

play37:55

similar type of problem where you have

play37:57

the architecture and then you're trying

play37:58

to plug in those hyper parameters but

play38:00

you don't want to like try all

play38:02

combinations in an expensive

play38:04

way yeah that that's a that's a great

play38:07

explanation um I see a bunch of

play38:09

questions in the chat around how to

play38:11

combine LangChain and DSPy um I think

play38:14

that would be really interesting to

play38:15

discuss before that I want to spend

play38:16

maybe like three minutes um showing some

play38:19

of the things in LangChain and LangSmith

play38:22

that we've been working on because I

play38:23

think they're exactly related to tracing

play38:27

and uh uh testing and evaluation um and

play38:30

data sets and so I I maybe want to set

play38:33

the stage with that and then we can and

play38:34

then I think the rest of the time would

play38:35

be great to discuss okay like and maybe

play38:37

brainstorm with the audience as well

play38:38

what is the right kind of like way to

play38:40

put these puzzle pieces together um so

play38:42

let me share my screen really quick and

play38:44

you were just talking about tracing um

play38:46

and and that's one of the main things

play38:48

that we've invested pretty heavily in at

play38:51

LangChain so this is a list

play38:53

of Chat LangChain queries um so

play38:56

we have a chatbot for chatting over LangChain

play38:57

documentation and if I click into

play39:00

one of these let me find one that has

play39:02

multiple uh language model steps cuz uh

play39:05

that would be interesting um so here you

play39:07

can see that this this is a trace of

play39:09

kind of what's happening and so this

play39:10

this mimics um one of the uh examples

play39:14

that uh Omar was talking about earlier

play39:17

where you have kind of like this this

play39:19

generation of a query um and uh then

play39:22

there's this retrieval step and you get

play39:24

back documents and

play39:27

then there's this generation of a

play39:29

response um and so this this is a very

play39:32

similar uh kind of like uh in this case

play39:35

this is written in LCEL but it's a

play39:36

very similar program uh

play39:38

putting aside how it's written um and

play39:41

again I think one of the key

play39:42

important things here is the

play39:43

importance of tracing and here we we use

play39:45

it for observability we find that

play39:47

there's huge value in just seeing what's

play39:48

going on in this pipeline but as Omar

play39:50

talked about you can also use it to

play39:52

collect all these these examples um one

play39:55

of the things that's interesting is when

play39:59

you're collecting them in LangSmith there's

play40:01

actually a difference to what

play40:03

we're collecting and it's a very

play40:04

productive difference um the traces we collect

play40:07

are like they're kind of synthetic on

play40:11

like you basically hit compile or you

play40:13

run the code with compiling and it's

play40:14

collecting them but the traces you're

play40:16

collecting here are maybe like

play40:17

interactions with users and they're kind

play40:20

of in production and they increase over

play40:22

time and they kind of get sustained the

play40:24

traces in our case we build them we

play40:25

optimize we throw them away essentially

play40:27

or we cach them but we we're not you

play40:29

know this is this is a different type of

play40:31

tracing but it's you know very very uh

play40:34

valuable yeah no that's a that's a great

play40:36

point I think it comes down to like what

play40:37

the data set is right like what's the

play40:38

data you're running it over in in the

play40:40

case of uh DSPy it's a bunch of examples

play40:43

that you've curated and you

play40:45

have some metrics for and maybe you have

play40:46

some labels for in this case this is

play40:48

like production traffic um and I think

play40:50

there's pros and cons to each I'd say

play40:52

the pros probably of this and and to be

play40:55

clear we also have this concept of kind

play40:56

of like data sets and testing in LangSmith

play40:58

which I can jump to but the pros

play40:59

of that is that you generally have maybe

play41:01

thought a little bit more about metrics

play41:04

and you maybe have labels as well

play41:06

like you have ground truth labels so you

play41:07

can create better metrics here the pros

play41:09

of this are you you get to see questions

play41:12

that people are actually asking kind

play41:14

of like your chatbot in in real life and

play41:16

so I think there's just a broader swath

play41:18

of them for better or or for worse but

play41:21

yeah you're absolutely right that it's

play41:22

different different settings that that

play41:24

this tracing is being done in um

play41:27

and and on the concept of kind of like

play41:29

data sets one of the things that we

play41:31

purposefully added since the beginning

play41:32

is this concept of

play41:35

adding um any part of this Trace to a

play41:38

data set so you can add like the high

play41:40

level thing you can add like this

play41:42

individual kind of like component you

play41:43

can click add to data set and you can

play41:45

basically add it to to an existing data

play41:48

set um data sets themselves uh just

play41:50

showing them really quickly are nothing

play41:53

more than kind of like examples um so

play41:56

examples of inputs and optionally

play41:59

outputs um so as Omar said there's a

play42:01

bunch of cool things you can do if even

play42:03

if you don't have outputs um and then

play42:06

you can run tests over them and so I

play42:08

think the tie in here is like just as

play42:10

you would Define a metric to kind of

play42:11

like optimize your system here you're

play42:13

defining a very similar metric and a

play42:15

very similar kind of like data set but

play42:16

you're using it for this this kind of

play42:18

just like evaluation of your system as a

play42:20

whole um and so I think very very

play42:24

similar Concepts kind of like across the

play42:26

two we've also this is one thing we've

play42:28

added recently is this way of kind of

play42:29

like moving data um automatically from

play42:33

production logs into data sets um and I

play42:36

think uh one of the things that we

play42:38

wanted to make sure if if you think

play42:41

about what um and and tell me tell me if

play42:43

this is a bad understanding Omar

play42:45

but what you're what you're doing in the

play42:47

optimization is you're basically saying

play42:50

uh I'm going to run it over these high

play42:51

level inputs and high level outputs I'm

play42:53

going to look at places where it was

play42:56

scored

play42:59

correctly and then I'm going to take all

play43:02

the like sub examples for each

play43:04

individual node or thing associated with

play43:06

that and create a data set of few shot

play43:08

examples or a data set to fine-tune on

play43:11

or or somehow update the prompt

play43:12

instructions for those nodes but

play43:14

basically there's this again there's

play43:15

this this Trace which has the

play43:17

association between high-level metric

play43:19

low-level actually what went into the

play43:21

prompt what actually are the examples um

play43:24

and and that's kind of like the data

play43:25

sets that you want together um that's

play43:29

exactly right that's exactly right and

play43:30

when you said instructions you basically

play43:32

are taking this small data set and

play43:34

you're basically getting a language

play43:35

model to say oh well the task that is

play43:39

kind of um described by those examples

play43:41

is the following instructions and you

play43:44

want to generate several of these so

play43:45

that you can run run a search over those

play43:47

as well but it's really it's building

play43:49

those mini data
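
As a rough sketch, the loop being described here (run the program over the training inputs, keep only the traces whose final output passes the metric, then collect each step's inputs and outputs as per-step demonstrations) might look like the following in plain Python. This is not DSPy's API: `Trace`, `toy_program`, `exact_match` and `bootstrap_demos` are hypothetical names, and the two steps are simple string functions standing in for language model calls.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Records (step_name, inputs, output) for every step of one program run."""
    steps: list = field(default_factory=list)

    def record(self, name, inputs, output):
        self.steps.append((name, inputs, output))
        return output


def toy_program(question, trace):
    # Hypothetical two-step program; each step stands in for an LM call.
    query = trace.record(
        "make_query", {"question": question}, question.lower().rstrip("?")
    )
    answer = trace.record(
        "answer",
        {"question": question, "query": query},
        "4" if "2 + 2" in query else "unknown",
    )
    return answer


def exact_match(expected, predicted):
    # High-level metric: compares only the final output of the whole program.
    return expected.strip() == predicted.strip()


def bootstrap_demos(program, trainset, metric):
    """Keep traces whose final output passes the metric, grouped per step."""
    demos = defaultdict(list)
    for question, expected in trainset:
        trace = Trace()
        predicted = program(question, trace)
        if metric(expected, predicted):
            for name, inputs, output in trace.steps:
                demos[name].append({"inputs": inputs, "output": output})
    return dict(demos)


trainset = [("What is 2 + 2?", "4"), ("Who wrote Hamlet?", "Shakespeare")]
demos = bootstrap_demos(toy_program, trainset, exact_match)
```

Only the first question passes the metric in this toy setup, so `demos` holds one demonstration for each of the two steps. Those per-step lists play the role of the mini data sets mentioned above, which could then feed few-shot prompts, fine-tuning, or instruction proposals.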

play43:50

sets yeah and so that's that's one of

play43:53

the things that we put a lot of um

play43:55

effort into and so when you have

play43:57

filtering here it's not just filtering

play43:59

on the top level that's the default but

play44:01

you can do things you can add filter so

play44:03

like okay give me uh all things where

play44:07

the name is like find docs or

play44:10

something like this so this would be a

play44:12

substep then you can add a filter on the

play44:15

trace so you can say okay all places and

play44:17

by trace we mean like high-level parent

play44:18

level trace so give me like all

play44:22

sub runs where there's this find docs

play44:24

where it has this high-level user score

play44:27

is one and you can filter for that and

play44:29

then you can set up an automation um and

play44:31

you can set up an automation where it

play44:33

automatically adds all of these to a

play44:34

data set over time or something like

play44:36

that um and so kind of like
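
A minimal sketch of the filter-then-automate idea just described, in plain Python rather than the LangSmith SDK: select sub-runs by name, keep only those whose parent trace received a user score of one, and add them to a data set in a way that can be re-run over time. The `Run` record, its field names, and the `select_sub_runs` and `add_to_dataset` helpers are all assumptions made for illustration, not LangSmith's actual schema or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Run:
    # Hypothetical log record; a top-level run has id == trace_id.
    id: str
    trace_id: str
    name: str
    inputs: str
    outputs: str
    user_score: Optional[int] = None  # feedback, set only on top-level runs


def select_sub_runs(runs, step_name, min_score=1):
    """Return runs named step_name whose parent trace scored >= min_score."""
    good_traces = {
        r.id
        for r in runs
        if r.id == r.trace_id
        and r.user_score is not None
        and r.user_score >= min_score
    }
    return [r for r in runs if r.name == step_name and r.trace_id in good_traces]


def add_to_dataset(dataset, runs):
    """Idempotent: re-running the automation does not duplicate examples."""
    for r in runs:
        dataset.setdefault(r.id, {"inputs": r.inputs, "outputs": r.outputs})
    return dataset


logs = [
    Run("t1", "t1", "rag_chain", "q1", "a1", user_score=1),
    Run("t1-a", "t1", "find_docs", "q1", "docs1"),
    Run("t2", "t2", "rag_chain", "q2", "a2", user_score=0),
    Run("t2-a", "t2", "find_docs", "q2", "docs2"),
]

dataset = {}
add_to_dataset(dataset, select_sub_runs(logs, "find_docs"))
add_to_dataset(dataset, select_sub_runs(logs, "find_docs"))  # second pass adds nothing
```

Keying the data set by run id is one simple way to make the automation safe to run repeatedly as new production logs arrive.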

play44:37

programmatically taking these production

play44:39

data and building uh and then and this

play44:43

is where this is where I think uh you

play44:45

know it's maybe a good spot to

play44:47

transition into like how to merge these

play44:50

because we have no concept of

play44:52

optimization um and and so you know

play44:55

that's obviously one really big thing

play44:57

that's missing from things in LangChain

play44:59

LangSmith and DSPy and the value

play45:03

it

play45:04

provides and and maybe just backtracking

play45:07

you into the beginning I know you

play45:08

mentioned in the beginning there's kind

play45:09

of like two big components one's like

play45:11

the way of expressing this program

play45:13

and then the other is kind of like this

play45:14

optimization um how with those

play45:18

things as context how can we think

play45:21

about combining LangChain and

play45:23

DSPy so I um and we've discussed this a

play45:27

couple times Harrison um I I like to

play45:29

think of DSPy as sort of the PyTorch in

play45:32

this stack so if you I like analogies

play45:35

and if you think of deep neural networks

play45:37

as like this wonderfully successful um

play45:39

Paradigm in in machine learning that you

play45:42

know got us here with language

play45:44

models it really only I think there was

play45:47

a threshold of Crossing from very fairly

play45:50

messy space to like something where we

play45:52

can actually all talk about the same

play45:54

thing precisely um or or and and sort of

play45:57

run the same code with the introduction

play46:00

of like Theano and you know Caffe and

play46:02

eventually things like PyTorch and

play46:04

TensorFlow and other cool

play46:05

frameworks um and PyTorch is kind of really

play46:07

neat in my opinion in a bunch of ways

play46:09

but it's not unique so I like to think

play46:12

here of like the stack we have in neural

play46:14

networks there is something like PyTorch

play46:16

and then there's lower level things like

play46:18

you know um kernels you know um you know

play46:21

MKL for CPUs or you know various CUDA

play46:25

stuff and then on top of that there are

play46:26

higher-level things like Hugging Face

play46:28

Transformers which is you know most

play46:31

people I'd argue a lot more people are

play46:34

familiar with using Hugging Face

play46:35

Transformers than PyTorch um because you

play46:38

what you just want to get BERT or GPT-2

play46:40

or one of these models and you just get

play46:42

it ready from there PyTorch is more of like

play46:45

here are kind of like the

play46:47

modular building pieces good luck um

play46:50

here are optimizers and here are like

play46:51

layers and here's convolution layers

play46:54

here's Dropout layers um do something

play46:57

cool um and I kind of like to think of

play46:59

DSPy as the PyTorch of this stack of

play47:02

basically saying you know it's a small

play47:04

framework we're going to give you these

play47:05

layers here's a Chain of Thought layer

play47:07

here is a program of thought layer here

play47:08

is a ReAct layer that um sort of

play47:11

is also similarly abstract and here's

play47:14

the notion of signatures and the notion

play47:16

of optimizers here is how you build

play47:18

stuff with them good luck sort of you

play47:20

know um and and we can help you sort of

play47:23

think about how to construct these

play47:24

programs um if you want but otherwise

play47:27

here is a bunch of optimizers that we're

play47:28

continually expanding to take these

play47:31

programs and give you high quality

play47:32

systems um in my conception obviously

play47:35

you know please feel free to correct

play47:36

that or otherwise like you

play47:38

know describe your framing um

play47:41

one of the things it's not necessarily

play47:43

the only thing one of the things that

play47:44

LangChain is just outstanding at is

play47:46

kind of how it brings in higher-level

play47:49

pieces of the stack and also lower level

play47:51

pieces of the stack so you have all

play47:52

these like language models integrated

play47:54

and all these like uniform abstractions

play47:57

around all kinds of tools that's on the

play47:58

lower level but on the higher level you

play48:00

have all these you know various chains

play48:01

and various agents and lots of sort of

play48:04

um sort of tracing systems and other

play48:06

things that that say like well um

play48:10

assuming you know the right prompts or

play48:12

assuming you're given the right prompts

play48:13

in one way or another um or that you're

play48:15

using prompts that are sort of

play48:17

pre-built um either from The Hub or that

play48:20

are sort of internal to LangChain you know

play48:22

or otherwise um here's what you

play48:24

could do with it and here's how you

play48:25

could sort of think about streaming and

play48:27

think about um these

play48:29

types of things in DSPy we don't think

play48:31

about almost any of that um except to

play48:33

the extent that we strictly need um

play48:35

where we're focused is um what are the

play48:38

right declarative modules like Chain of

play48:40

Thought um and what are the right um

play48:43

sort of fundamental abstractions like

play48:45

assertions or metrics um and what are

play48:47

the right optimizers and this is really

play48:49

where we spend most of the time it's