Getting started with DSPy tutorial

Qdrant
22 Feb 2024 · 34:34

Summary

TL;DR: DSPy is a groundbreaking development in artificial intelligence, similar to the LangChain idea of chaining large language model calls. DSPy introduces a new, PyTorch-like syntax that increases control and flexibility over LLM programs. The approach breaks complex tasks into simpler subtasks and parallelizes them, making various control mechanisms available. DSPy aims to automatically optimize the instructions and examples in a prompt to elicit the desired behavior. The DSPy programming model is described as a combination of PyTorch, agent syntax, and LLM programs, and it holds the potential to enable more advanced LLM programming.

Takeaways

  • 🚀 DSPy is a groundbreaking development in AI that enables chaining of large language model calls.
  • 🔗 Using LLM APIs, complex tasks can be broken down into research, writing, and editing, then parallelized and controlled.
  • 🌟 DSPy introduces a new PyTorch-like syntax that increases control and flexibility over LLM programs.
  • 📝 DSPy automatically optimizes the tasks in a prompt using optimized instructions.
  • 🔄 Breaking a complex question into smaller parts and answering over multiple hops produces more effective answers.
  • 🔄 The DSPy programming model is a combination of PyTorch, agent syntax, and LLM programs.
  • 📚 DSPy provides structured input and output prompts, ensuring consistency across LLM programs.
  • 🔧 DSPy lets you programmatically control how LLM modules interact with one another.
  • 🔧 DSPy's assertion feature improves the quality and performance of LLM programs.
  • 📈 The DSPy compiler automates prompt optimization and adapts programs to new language models.
  • 🔍 DSPy can bootstrap synthetic examples, which are useful for prompts or for fine-tuning models.

Q & A

  • What is DSPy?

    - DSPy is a groundbreaking development in the field of artificial intelligence, similar to the LangChain idea of chaining large language model calls.

  • What are LLM APIs?

    - LLM APIs are used to integrate language models into applications or to build complex programs. The output of one language model call can be fed as input to the next.

  • How does DSPy's new syntax work?

    - DSPy introduces a new PyTorch-like syntax that provides finer control and flexibility over LLM programs.

  • What does optimization mean in DSPy?

    - Optimization in DSPy means optimizing the instructions in a task's LLM prompt so that the program performs better on that specific task.

  • How does the DSPy programming model work?

    - The DSPy programming model is a combination of PyTorch, agent syntax, and LLM programs. It uses structured input and output prompts and lets you programmatically control how LLM modules interact with one another.

  • What is a DSPy signature?

    - A DSPy signature is a way of defining a task's prompt. By defining input and output fields, it keeps LLM programs consistent.

  • What role does the DSPy compiler play?

    - The DSPy compiler automates prompt optimization, aiming to eliminate manual prompt tuning, prompt engineering, and hand-written examples.

  • What is multi-hop question answering in DSPy?

    - Multi-hop question answering is an approach that breaks a complex question down into smaller sub-questions so it can be answered effectively.

  • How do you create synthetic examples in DSPy?

    - In DSPy, synthetic examples can be created through bootstrapping. They can be used to train smaller, more precise models or as examples in prompts.

  • What is a DSPy teleprompter?

    - A DSPy teleprompter is a tool that assists with prompt optimization. It uses built-in prompts to produce the hints that elicit the best responses from the LLM.

Outlines

00:00

🤖 DSPy and advances in artificial intelligence

DSPy is a groundbreaking development in AI, comparable to the LangChain idea of chaining large language model calls. Covers integrating LLM APIs and building complex programs, parallelizing tasks, DSPy's new syntax, optimized instructions, more efficient JSON output, DSPy's automation and optimization, and an introduction to the DSPy programming model.

05:04

🔍 Approaching complex questions and the benefits of DSPy

Explains how to break a complex question into smaller parts and answer it effectively, along with the concept behind AutoGPT. Covers the focus of the DSPy program model, query generation and use of the retriever, response generation, extending ChatGPT's capabilities, the LangChain and Llama Index approaches, and DSPy's framework and use of graphs.

10:05

📝 The benefits of DSPy and the role of a programming language

Explains the two main advantages of having a programming language for LLMs, organizing prompts with DSPy signatures, controlling how LLM modules interact, DSPy control flow, and DSPy assertions.

15:07

🔧 The DSPy compiler and instruction tuning

Covers understanding the DSPy compiler, the goals of instruction tuning, prompt optimization in DSPy, using an LLM tuning framework, defining prompt examples and fields, and compiling and testing with DSPy.

20:08

🚀 The DSPy framework and bootstrapping

Covers the most exciting parts of the DSPy framework, creating examples with bootstrapping, bootstrapping use cases, rationales and LLM usage in DSPy, measuring the quality of synthetic examples, and DSPy metrics and their use.

25:09

📚 The RAG program and teleprompter optimization

Covers building the RAG program, DSPy signatures and prompt parsing, generating RAG responses with DSPy's chain of thought, the teleprompter optimization process, defining and using metrics, compiling RAG and running inference, and a recap of the DSPy programming model and compiler.

Keywords

💡DSPy

DSPy is a groundbreaking development in artificial intelligence, comparable to the LangChain idea of enabling chained large language model calls. DSPy introduces a new syntax that improves control and flexibility over LLM (Large Language Model) programs. The syntax resembles PyTorch and enables task optimization. For example, optimized phrasings can be used to make document re-ranking or JSON output more efficient.

💡LLM APIs

LLM (Large Language Model) APIs are interfaces that are integrated into applications or used to build complex programs. Through these APIs, the output of one language model can be fed as input to the next. This makes it possible to break a task down into simpler steps such as research, writing, and editing, parallelize them, and take advantage of various control mechanisms.

💡Chain of Thought

Chain of Thought is a method for breaking a complex question down into smaller sub-questions and answering them effectively. It is the idea behind the concept of AutoGPT, and it allows large language models to be chained further to build more complex language model programs.

💡Optimization

Optimization is the process of automatically improving prompts and examples to raise task performance. DSPy aims to automatically optimize the instructions and examples within a prompt to elicit the desired behavior. This improves task performance and produces more efficient results.

💡Teleprompter

A Teleprompter is part of the DSPy compiler that helps optimize prompts. It uses built-in prompts to optimize existing prompts and to create new ones. This makes it possible to generate optimal phrasings for answering fact-based questions or providing context for complex queries.

💡Multi-hop Question Answering

Multi-hop Question Answering is the process of breaking a complex question into multiple sub-questions and solving them in sequence. This method is used in LLM chains such as RAG (Retrieval Augmented Generation) to obtain more accurate answers.

💡Bootstrapping

Bootstrapping is a technique for using a small dataset to train a more precise model, or to produce examples used in prompts. With the help of generative models, training data can be generated artificially. It is related to the concept of few-shot learning.

💡Synthetic Examples

Synthetic Examples are artificially generated examples used for prompts or for fine-tuning models. Their quality is evaluated using exact-match metrics. DSPy can be combined with metrics to improve the quality of synthetic examples.

💡RAG (Retrieval Augmented Generation)

RAG is a type of LLM chain called Retrieval Augmented Generation that combines retrieval and generation to produce more accurate answers. The method breaks a complex question down, retrieves relevant documents, and generates an answer based on them.

💡Inductive Biases

Inductive Biases are assumptions added to a model to improve its efficiency. In DSPy, a signature acts as an inductive bias for the task that a particular part of the program is supposed to perform. This allows specific parts of a program to carry out specific tasks more effectively.

Highlights

DSPy is a groundbreaking development in AI, introducing a new syntax similar to PyTorch for LLM programs.

LLM APIs can be integrated into apps or used to create complex programs by chaining language model calls.

DSPy combines new syntax with optimization, allowing for more efficient and effective LLM tasks.

The DSPy programming model simplifies complex tasks by breaking them down into research, writing, and editing.

DSPy introduces a signature syntax that cleans up the code of LLM programs.

LLMs like GPT-4 and Gemini Ultra can deduce a variable's meaning from its name, which DSPy's signatures rely on.

DSPy allows for the creation of LLM agents and defines how they interact with input data to produce outputs.

Multi-hop question answering involves breaking down complex questions into smaller sub-questions for effective answering.

DSPy enables the automatic optimization of the instructions and examples used in prompts.

The DSPy programming model is a combination of PyTorch, agent syntax, and LLM programs.

DSPy provides more control and flexibility over LLM programs compared to frameworks like LangChain and Llama Index.

DSPy allows for structured input and output prompts, ensuring consistency in LLM programs.

DSPy enables programmatic control over how LLM modules interact, unlocking customization possibilities.

DSPy assertions are a feature that will be discussed in a separate paper, offering further control mechanisms.

DSPy aims to optimize prompts automatically, reducing the need for manual prompt tuning and engineering.

DSPy can generate synthetic examples for prompts or fine-tuning models, utilizing few-shot learning concepts.

DSPy's teleprompter system suggests examples for language model components, creates new signatures, and analyzes metrics.

DSPy's compiler eliminates the need for manual prompt tuning by optimizing task signatures and descriptions.

DSPy's approach to LLM programming is compared to PyTorch's impact on deep learning model training.

Transcripts

00:00

DSPy is a groundbreaking development in artificial intelligence, comparable to the advent of the LangChain idea of chaining large language model calls. Thinking about LLM APIs, they are exciting. They can be integrated into apps or used to create complex programs where the output of one language model call can be fed as input to the next. We can simplify complicated tasks, such as writing pull requests or blog posts, by breaking them down into research, writing, and editing tasks. By parallelizing these tasks, we can take advantage of various control mechanisms.

DSPy introduces a new syntax similar to PyTorch, which gives us more control and flexibility over our LLM programs. The exciting thing about DSPy is that it combines this new syntax with optimization. This means that we get optimized instructions for the tasks in our LLM prompts. For example, suppose your task is to re-rank documents and you have a re-ranking agent. In that case, a particular phrasing results in better performance than others. Also, if you need to output JSON, we will find ways to make it more efficient: instead of saying "please output JSON" or "give me JSON", just use the optimized phrasing to get the best results. The idea behind DSPy is to optimize the instructions and examples used in the prompt automatically. This is done to elicit the desired kind of behavior when using the DSPy programming model. Many exciting things can be achieved through this process.

01:55

The DSPy programming model is where the story begins. It can be described as a combination of PyTorch, agent syntax, and LLM programs. Let's dive in and explore what the process entails. To begin with, we initialize the components required for our LLM program. We start with a retrieval system, such as Qdrant, and connect it with a query generator and an answering mechanism. We have two distinct LLM components, or prompts, that carry out specific tasks. These could be fine-tuned models that are specialized for their roles in the overall LLM program, which we will refer to as the logic in the forward pass.

Our first component is the query generator, and there are a few things to keep in mind about it. To begin with, we need to give the component a name, which we will call gen_query. Next, we have dspy.ChainOfThought, which we will discuss later. Let's start by talking about the signature. As we continue with this lecture, we will see that there is an alternative way of expanding the signature: writing a longer initial prompt in the docstring and then adding typed input-output fields. A feature in DSPy allows you to clean up the code of LLM programs by using the signature syntax. The short syntax involves defining the context, question, and query fields. When you input the context and question, DSPy will parse them and output the query. LLMs, or large language models, have an impressive ability to deduce the meaning of a variable just from its name. For instance, given a field named context or question, powerful models like GPT-4 or Gemini Ultra can accurately understand what it means from the name alone. The second component of our program involves another LLM that generates the answer. This program takes the context and question as input and returns the answer. The exciting part is that you have the power to build LLM agents and define how they interact with the input data to produce an output. A sketch of this initialization follows below.
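As a rough sketch (not the video's verbatim code), the initialization just described might look like this in DSPy's short signature syntax; the class name MultiHopProgram and the retriever's k are assumptions:

```python
import dspy

class MultiHopProgram(dspy.Module):
    def __init__(self):
        super().__init__()
        # Retrieval system (e.g. Qdrant), configured globally via dspy.settings.
        self.retrieve = dspy.Retrieve(k=3)
        # First LLM component: generate a search query from context + question.
        self.gen_query = dspy.ChainOfThought("context, question -> query")
        # Second LLM component: generate the final answer.
        self.generate_answer = dspy.ChainOfThought("context, question -> answer")
```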

04:32

To start, we define an empty list called context. This also enables thinking about how we can incorporate local memory into our forward pass, and how we can use non-parametric functions in the forward pass of our program. We start with an empty list of context and loop over it; the number of iterations can be set as a hyperparameter in the program to determine how many hops are required.

Multi-hop question answering involves breaking down a complex question into smaller sub-questions to effectively answer it. This approach is like the concept behind AutoGPT, which gained public attention when it was first introduced. The agent can evaluate each sub-question to determine if enough information has been compiled to consolidate the results. This process is referred to as multi-hop question decomposition. Our focus shall be on the DSPy program model. To generate a query, we take the context and the input question as input. This process helps us create the query. After generating the query, we pass it through our retriever, Qdrant. The retriever then provides us with the required context. We keep looping through the process until we have all the context we need to answer the question. The number of loops depends on the context. After generating a response, we can return it by feeding it back into the conversation.
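Continuing the MultiHopProgram sketch above, the forward pass described here might look like the following; treating max_hops as a call-time hyperparameter is an assumption:

```python
    def forward(self, question, max_hops=2):
        context = []  # local memory: starts empty, grows with each hop
        for _ in range(max_hops):
            # Generate a query from everything gathered so far plus the question.
            query = self.gen_query(context=context, question=question).query
            # Pass the query through the retriever (e.g. Qdrant) for more context.
            context += self.retrieve(query).passages
        # Once enough context is collected, generate the final answer.
        return self.generate_answer(context=context, question=question)
```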

06:14

The release of ChatGPT amazed everyone with its ability to fluently converse, answer questions, and create a YouTube script inspired by a favorite book or chat. We discovered we could take these large language models like ChatGPT even further by connecting them in chains to create more complex language model programs. The LangChain and Llama Index work has been evangelizing a new way of building applications using large language models, which is the future. The approach they propose involves using chains that overcome input length limitations. Chains were traditionally used to break up complex inputs into smaller chunks, process each chunk individually, and then combine the outputs to process long documents. This approach is still valid, especially given the challenges of supervising models with context sizes of up to 32K tokens. Determining whether these models attend to all the inputs is difficult, so breaking up the input length remains valuable.

The second challenge is overcoming complex tasks. For example, it can be overwhelming if you ask ChatGPT to write a blog post on how to run your application on Kubernetes and retrieve your code. It's better to break down complex tasks into smaller sub-tasks and define a workflow for the language model to follow to complete your request. Chains have greatly improved search capability. For instance, there is the example of multi-hop question answering. In this process, we formulate a query, retrieve information, potentially loop back to retrieve more context, and then use that information to answer the question. One way to improve this process is to use language models to create a filter for our search. Re-ranking documents with large language models has also played a significant role in search.

As language models have evolved, it has become clear that these programs can be better represented as graphs. For example, we now have LangGraph, and we can think about these as text transformation graphs. We have graphs of computation where edges pass along the transformation of text: text is input and output along an edge to the next node for further transformation. For instance, we can spin up three separate processes for writing and editing a story in parallel. Then we can sync these nodes into a published-stories component in our LLM program. This can be used to produce a newsletter based on the news of a specific week, like what happened on Yahoo Finance last week. We can parallelize the process of writing and editing the stories, then sync them into another part of the program. We must aggregate these stories and combine all the information to create a coherent narrative.

09:35

Before you delve further into DSPy, I highly recommend checking out the LLM program galleries that both Llama Index and LangChain have created. While frameworks are great for building LLM chains, graphs, agents, and programs, they have limited flexibility. That's where DSPy programming comes in as a complete LLM programming language. In contrast to frameworks, DSPy offers more exciting possibilities for LLM programming without prompting. There are two main advantages to having a programming language for LLMs. First, it allows you to have structured input and output prompts to consistently express your ideas within your programs. Second, it enables you to control how your LLM modules interact with each other programmatically. And this unlocks the flexibility to customize the LLM program to suit your needs and imagination while using the LLM API.

10:40

Okay, let's begin by discussing how to clean up your prompts and structure your inputs and outputs. To achieve this, DSPy uses a signature. In this example we have GenerateAnswer, which inherits from Signature. We also write a docstring that describes the task prompt; in this particular case, the task involves providing concise answers to questions. Later in the video, we will discuss how DSPy can assist you in optimizing these prompts. You can give a general overview of the task and DSPy will take care of the rest. You don't need to hand-tweak the language, even though subtle changes can significantly impact performance; DSPy will optimize your instruction, but we can discuss that later. There is a method for defining input and output fields that provides a consistent syntax for the prompts and structured outputs of all components in LLM programs. This is one way to ensure consistency in your LLM programs. Another exciting feature is the ability to control how the LLM modules interact with each other programmatically.
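In the longhand form described here, a signature might look like this sketch; the field descriptions match the ones quoted later in the video, but treat the exact code as illustrative:

```python
import dspy

class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()  # the variable name alone is often enough
    answer = dspy.OutputField(desc="often between 1 and 5 words")
```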

11:57

Okay, let's talk about control in LLM programming. For example, you can use ordinary syntax to create a for loop in your program; you can even run more hops than in this example. You can nest these loops and write more complex code with if-else statements. Here is a quick example involving a stock ticker. You can prompt the program to output financial details if the input is about a specific company. I have a program that first processes my financial databases and identifies whether the documents relate to some ticker. If they do, I can ask the program to research the company's performance and some reports. I can also use the program to do a web query and get more information about market trends and recent news. Then the program will generate a query and send it to the API. After that, the API will respond with the required context. Then I can write potential investment insights and send them via email. However, I would review them first before sending them to anyone. The point is that you can have good control flow in your LLM programs by using for loops, if statements, and local memory. A sketch of such a control flow follows.
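Purely as a hypothetical illustration of this kind of control flow (none of these names come from the video), a forward pass mixing a loop, an if statement, and local memory might look like:

```python
import dspy

class FinancialInsights(dspy.Module):
    def __init__(self):
        super().__init__()
        # Hypothetical components for the stock-ticker example.
        self.classify = dspy.Predict("document -> ticker")
        self.research = dspy.ChainOfThought("ticker, context -> insights")

    def forward(self, documents, target_ticker):
        context = []  # local memory
        for doc in documents:  # plain Python loop over database rows
            ticker = self.classify(document=doc).ticker
            if ticker == target_ticker:  # an ordinary if statement gates the next step
                context.append(doc)
        # Research performance and draft potential investment insights.
        return self.research(ticker=target_ticker, context=context)
```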

13:18

The next big thing is DSPy assertions, which will be discussed in a separate paper. All right, so by now I hope you are convinced of the DSPy syntax and how it can offer you more control and flexibility over your LLM programs. So DSPy is like PyTorch for LLM programs. PyTorch is a popular deep learning model training framework, and there is a tweet that explains a lot about it. Two main things make PyTorch stand out: its syntax for defining neural network layers and its eager execution feature. PyTorch and TensorFlow have different ways of implementing eager execution. To use PyTorch, you must first define a neural network and initialize the layers you will use. When defining a layer such as a convolution, you must specify the input and output, determining the graph transformation. It's like matrix multiplication, where the dimensions of the matrices have to match. You need to define the layout of the neural network and ensure that the input and output layers are compatible with each other. After that, you define the forward pass, which determines how the network processes the input. We now have the same syntax for defining components in LLM programs: you initialize the program by defining its components, and then you define how the forward pass will look.

There were some excellent analogies from PyTorch that inspired the design of DSPy. The first important point is that we should not rely on just one layer to perform all the work. We must add inductive biases and depth to improve the model's efficiency. For instance, in a convolutional PyTorch network, the convolution has the inductive bias of a weight-sharing kernel as it slides across an image's pixel matrix. Similarly, we can observe that signatures carry an inductive bias about what a given part of the program is supposed to do. Suppose a program part has a specific context, question, and query; in that case, that is an inductive bias for that particular part: this component of the program is designed to perform a specific task. The idea of inductive biases is fascinating.

Okay, let's discuss a big concept, the DSPy compiler.

play15:46

The best way to understand the concepts behind this

play15:49

is to start testing it with the program

play15:52

you have in mind to optimize.

play15:55

Let's dive into the instruction tuning.

play15:57

The goal is to eliminate the need

play15:59

for manual prompt tuning, prompt engineering

play16:02

and manual example writing.

play16:05

For example, when training a re-ranker agent,

play16:08

one might experiment with various phrasings

play16:12

of the instruction.

play16:13

It is important to note that how you phrase your prompt

play16:16

for re-ranking documents can affect the performance

play16:20

of different language models.

play16:22

For example, a prompt that works well with GPT-4

play16:26

might not work well with Gemini Ultra or Llama 2.

play16:30

Therefore, the optimal phrasing for your prompt

play16:34

will depend on the specific language model you are using.

play16:37

It's important to fine tune your ending prompts

play16:41

to stay up-to-date with the latest language models.

play16:45

With the new language models emerging every month or so

play16:48

for at least the next year or year and a half,

play16:52

keeping your LLM programs current is crucial.

play16:55

Using an automatic tuning framework

play16:57

can help you quickly and easily plug in a new language model

play17:02

and determine which prompt will generate

play17:05

the desired response.

play17:06

A fun thing happens sometimes when someone asks

play17:09

for something specific like,

play17:11

"I will pay you $1 million to output JSON."

play17:14

These requests can be confusing and difficult to understand.

play17:18

However, DSPy aims to solve this problem

play17:22

by starting with a basic signature

play17:24

and then optimizing it to create the best possible shorthand

play17:29

for answering questions.

play17:31

This shorthand could be used for quickly answering

play17:34

fact-based questions or providing context

play17:37

for more complex queries.

play17:39

It is going to optimize a more detailed description

play17:43

of the test.

play17:45

The way it works is interesting.

play17:47

There are built-in prompts in the DSPy compiler

play17:51

that can be used to end pre-existing prompts in the chains.

play17:56

However, there are some prompts on how to use LLMs

play18:00

to optimize LLMs.

play18:02

So we have this prompt for optimizing the instruction.

play18:05

So you are an instruction optimizer

play18:07

for larger language models.

play18:09

I will give you a signature of fields,

play18:12

inputs and outputs in English.

play18:15

Your task is to propose an instruction

play18:18

that will lead to a good language model

play18:21

to perform the task well.

play18:23

Don't be afraid to be creative.

play18:25

So don't be afraid to be creative for the last part.

play18:29

That's what we are hoping to end with DSPy.

play18:32

After completing that, you can propose some instructions.

play18:35

This prompt takes multiple instructions

play18:38

and combines them into one.

play18:40

It uses sampling to create multiple outputs

play18:44

and then aggregates them to produce the final result.

play18:48

This is how we optimize the tasks,

play18:50

signature and description.

18:52

Examples have played a crucial role in the development of deep learning. In the past, research papers often described datasets consisting of hundreds of thousands of examples, such as the SQuAD question answering dataset. Other examples focused on human-written natural language inference, used to identify entailment, contradiction, and related phenomena. In the past, people used to create massive human-labeled datasets. Today, with the help of generative models, we can generate training data to make smaller, more precise models, or use it as examples in prompts. This concept is known as few-shot learning, and it was explained in the GPT-3 paper released in 2020. It was surprising that the model could achieve these tasks without any prior examples. The term zero-shot refers to the situation where you only have the task description to work with and need to create a clear set of instructions without any examples to guide you. One-shot indicates that you have one example to work with, while few-shot means you have a few examples.

20:12

One of the most exciting parts of the DSPy framework is that you can create examples by bootstrapping. You can prompt GPT-4 and Gemini Ultra, but you can also prompt Mistral 7B or Llama 2, depending on whether you want to fine-tune these models or use a few shots. When using bootstrapping, the question arises: which examples should be included in the prompt? Say we have 10 examples and are trying to translate from English to Portuguese; in that case, we may only want to include three of those examples in the input. Another use case for bootstrapping is when we want to train a model to understand a chain of thought rather than just input-output pairs. In this case, we should include examples that show how a person arrived at a particular answer, in addition to the answer itself. For example, suppose you are building a chatbot that answers FAQs; then we should include the entire conversation leading up to the answer. Suppose you want to add a chain of thought to your answer: you can retrieve the relevant contexts from your documentation and use examples to explain your reasoning. You can use DSPy to help you bootstrap the rationale and have the LLM write it for you. This way, you can have a clear and concise answer that includes all the necessary information.

21:45

How do we know the quality of synthetic examples? This is a common question when using LLMs to create synthetic examples, whether for prompts or for fine-tuning the model. The answer lies in metrics, and DSPy supports them directly. One way to get started is to use an exact match metric. For example, if you have a fact-based question such as "What is the temperature of empty space?" and the answer is 2.7 Kelvin, an exact match would be a good way to measure the quality of the synthetic example. Note its limits, though: if you write out the answer in words instead of using the numerical value, an exact match would not recognize the answer as correct. A minimal metric sketch follows.
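In DSPy, a metric is just a function over a gold example and a prediction (with an optional trace argument used during compilation). A minimal exact-match sketch:

```python
def exact_match_metric(example, pred, trace=None):
    # Strict string equality: "2.7 Kelvin" spelled out in words would not match.
    return example.answer.strip().lower() == pred.answer.strip().lower()
```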

22:29

Moving on to improving the quality of our synthetic examples, we use teleprompters to run the optimization loop, exploring different instructions, phrasings, and examples in the prompt. This tutorial explores the teleprompter system: suggesting examples for language model components, creating new signatures, and analyzing metrics. We will start by looking at the code. The tutorial aims to provide experience writing DSPy programs and understanding the syntax using off-the-shelf compilers. Hopefully, you will find it helpful for your LLM programs and gain a better understanding of the concepts discussed.

23:11

So let's start with an example of a DSPy program. Retrieval augmented generation is a popular LLM chain where you retrieve and then generate. Another program we will look at has a write-the-query part with two LLM programs. This will give us a quick sense of the syntax. Similar to PyTorch, in the RAG LLM program we first initialize the components we will use; then we define how these components interact with the input data and with each other in the forward pass. When a user enters a question into our app, we pass this question to our retriever. The retriever brings relevant passages, which we then pass into the answer generator to generate an answer. In DSPy, the signature gives the LLM a sense of the task. It's a shorthand notation of question, context, and answer. You can also write out longer signatures for a prompt; this is similar to organizing prompts using strict typing in libraries. When you need to parse the output of a program, you can use the longer notation, which allows you to write an initial prompt in the docstring, define types for the different fields, and give them a description of the input. However, you can also use the shorthand notation. Anyway, you understand the program now. It's a RAG program, sketched below.
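The RAG program just described closely follows the structure of the DSPy intro notebook; treat this as a sketch of that structure, reusing the GenerateAnswer signature from earlier:

```python
import dspy

class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question):
        # The retriever brings relevant passages for the user's question...
        context = self.retrieve(question).passages
        # ...which we pass to the answer generator.
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)
```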

24:45

You might feel overwhelmed, so let's discuss a more complex program that involves optimizing two LLM programs. In such programs, multiple components must be optimized separately to achieve impressive behavior as a whole system. SimplifiedBaleen is a multi-hop question-answering system. The concept of multi-hop question answering involves breaking down complex questions into smaller sub-questions. For example, the dataset presents a question that is too complex to be answered directly, so it must be broken down into smaller, more manageable parts. To approach such a question using RAG, you must break it down into smaller parts: first identify the subject of the question, such as the name of a castle, then ask specific questions related to that subject, such as the number of stories in the castle. This technique of breaking down a question into smaller parts is one of the most powerful tools in RAG. Multi-hop question decomposition is an exciting concept that can take RAG to the next level. It connects syntax with local memory, making program building more effective.

Again, we first need to initialize the components we will use. We start by writing a signature to generate a search query. This signature includes a short description of the task: writing a simple search query that can help answer a complex question. Then we briefly describe the context, which can contain relevant facts, along with the question and query fields. We assign our modules to let them generate queries; we keep a list of modules that we can use for this purpose. To simplify things, we can write self.generate_query = dspy.ChainOfThought(GenerateSearchQuery). This will help us generate the search query automatically. The interesting point to consider here is the possibility of having a distinct program for the initial search query versus the second one, since we will generate queries in a loop; to achieve this, we can use a list. For retrieval, we could use Qdrant. I mean, it must be Qdrant because it's awesome and you have no choice! Next, we have to answer the question based on the information provided. In the forward pass, we have a loop that iterates through the number of hops. Let's assume we want to break down our question into only two questions, which means the maximum number of hops is two. We generate a new query during each iteration by taking the current context (what we have searched so far) and the question as input. We retrieve the passages and then use a helper function to merge them. Note how you can incorporate these helper functions into the forward passes of your LLM programs, and how you can use the syntax to write anything you can imagine with these LLM programs. A sketch of the whole module follows.
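A sketch of SimplifiedBaleen following this walkthrough; the deduplicate helper stands in for the helper function mentioned (DSPy ships a similar utility), and GenerateAnswer is the signature from earlier:

```python
import dspy

class GenerateSearchQuery(dspy.Signature):
    """Write a simple search query that will help answer a complex question."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    query = dspy.OutputField()

def deduplicate(passages):
    # Keep the first occurrence of each passage, preserving order.
    seen = set()
    return [p for p in passages if not (p in seen or seen.add(p))]

class SimplifiedBaleen(dspy.Module):
    def __init__(self, passages_per_hop=3, max_hops=2):
        super().__init__()
        # A distinct query generator per hop, kept in a list as described.
        self.generate_query = [dspy.ChainOfThought(GenerateSearchQuery)
                               for _ in range(max_hops)]
        self.retrieve = dspy.Retrieve(k=passages_per_hop)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)
        self.max_hops = max_hops

    def forward(self, question):
        context = []
        for hop in range(self.max_hops):
            query = self.generate_query[hop](context=context, question=question).query
            context = deduplicate(context + self.retrieve(query).passages)
        return self.generate_answer(context=context, question=question)
```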

28:05

Before finishing up, let's review the entire notebook to combine all the concepts. Firstly, you need your OpenAI key, which you can get in the API key section of your OpenAI account. After that, we import DSPy and connect to GPT-3.5 Turbo. We use Qdrant as the retrieval model that stores the document vectors, so we configure DSPy by setting the language model and the Qdrant retrieval model. We will be using the HotPotQA dataset to benchmark multi-hop question answering. This split consists of 20 training examples and 50 examples for validation with our metric. This is a significant difference from how deep learning used to be done: DSPy needs only 20 examples to optimize the program.
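The setup might look roughly like this; the QdrantRM import path and constructor arguments are assumptions about the Qdrant integration available at the time, so verify them against your installed versions:

```python
import dspy
from dspy.datasets import HotPotQA
from dspy.retrieve.qdrant_rm import QdrantRM  # assumed integration module
from qdrant_client import QdrantClient

turbo = dspy.OpenAI(model="gpt-3.5-turbo")  # expects OPENAI_API_KEY in the environment
qdrant_rm = QdrantRM(qdrant_collection_name="hotpotqa",
                     qdrant_client=QdrantClient("localhost"), k=3)
dspy.settings.configure(lm=turbo, rm=qdrant_rm)

# 20 training and 50 validation examples, as in the video.
dataset = HotPotQA(train_seed=1, train_size=20, eval_seed=2023, dev_size=50, test_size=0)
trainset = [x.with_inputs("question") for x in dataset.train]
devset = [x.with_inputs("question") for x in dataset.dev]
```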

28:55

So let's take another simple question. We have a docstring that describes the task: answer questions with short factoid answers. Additionally, we have an input field where we can give a description, but it is not required; we can rely on the name of the variable. We have the output field and its description. An interesting feature of DSPy is that you can inspect the intermediate output, and we can do it like this. Here's an example from the uncompiled DSPy program: "What is the nationality of the chef and restaurateur featured in Restaurant: Impossible?" The predicted answer is American. I'm not familiar with the show, but let's proceed.
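This uncompiled run might look like the following sketch (BasicQA mirrors the docstring and fields just described; turbo is the language model configured above):

```python
class BasicQA(dspy.Signature):
    """Answer questions with short factoid answers."""

    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

generate_answer = dspy.Predict(BasicQA)
pred = generate_answer(question="What is the nationality of the chef and "
                                "restaurateur featured in Restaurant: Impossible?")
print(pred.answer)          # e.g. "American" before compilation
turbo.inspect_history(n=1)  # inspect the exact prompt and completion
```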

29:46

Let's return to the DSPy building blocks. If you want to add a chain of thought to your prompts, you can change dspy.Predict to dspy.ChainOfThought. DSPy will produce rationales, adding an explanation to your prompt, which makes it better. Having an explanation can also help with debugging and improve performance. Here is an example of how the thinking process works. Please note that this is not compiled; it's just a forward pass. The built-in modules help by adding reasoning to the prompt, and this intermediate reasoning is produced for us: let's break it down step by step. Remember that this is just a forward pass done by the language model. We haven't compiled anything, but we notice that adding a thinking step can switch the answer from American to British. This is a valuable insight that can be quickly implemented. Here is an example of how to connect to one of the retrievers: we can retrieve data with a question and display the output type.
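Swapping in a chain of thought and querying the retriever, sketched under the same assumptions as above (rationale is the reasoning field dspy.ChainOfThought adds):

```python
# Same signature, but the module now elicits step-by-step reasoning first.
generate_answer_with_cot = dspy.ChainOfThought(BasicQA)
pred = generate_answer_with_cot(question="What is the nationality of the chef and "
                                         "restaurateur featured in Restaurant: Impossible?")
print(pred.rationale)  # the intermediate "thinking" step
print(pred.answer)     # can flip, e.g. from American to British

# Connecting to the configured retriever and displaying what comes back.
retrieve = dspy.Retrieve(k=3)
passages = retrieve("Restaurant: Impossible host").passages
print(type(passages), passages[:1])
```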

31:00

Okay, now let's compile our RAG program. We start with the GenerateAnswer signature: answer questions with short factoid answers. The input field, the context, has the description "may contain relevant facts". The question is understood from the variable name, and the answer field is described as "often between 1 and 5 words". We are creating our RAG program. To start, we initialize the necessary components: the retriever, and an answer generator wrapped in a DSPy chain of thought. The chain of thought adds the reasoning element to the prompt, so during inference the model will have the ability to reason. This time, however, we will compile it. In the forward pass, we feed the question and the retrieved passages to the question-answering model to obtain the answer.

31:57

All right, so now we are discussing the teleprompter, the optimizer. First, we need to define our metric. We will use exact match, which means that if the answer is British, it must be precisely "British". We will also use passage match, but let's focus on exact match to keep things simple. The teleprompter will use the bootstrap few-shot method, meaning it will add a few bootstrapped examples to the prompt. We also have supervision on the retrieval, and that's how it will be optimized. Then you write compiled_rag = teleprompter.compile(RAG(), trainset=trainset) and let it run.
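Sketched with DSPy's bootstrap few-shot teleprompter; the metric combines the exact match and passage match mentioned above (the names follow the DSPy intro notebook, so verify against your version):

```python
from dspy.teleprompt import BootstrapFewShot

def validate_context_and_answer(example, pred, trace=None):
    # Exact match on the answer, plus a check that retrieval supports it.
    answer_em = dspy.evaluate.answer_exact_match(example, pred)
    answer_pm = dspy.evaluate.answer_passage_match(example, pred)
    return answer_em and answer_pm

teleprompter = BootstrapFewShot(metric=validate_context_and_answer)
compiled_rag = teleprompter.compile(RAG(), trainset=trainset)
```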

32:44

After optimization, the process stops. Once we have compiled our RAG, we can run inference by passing in the input, similar to how you would do it in PyTorch, and we get an answer to our question. Let's try turbo.inspect_history again so we can debug what the model saw last. In this prompt, we can see some examples of answering questions with short factoid answers. The reasoning included here is what makes DSPy valuable for RAG. However, suppose you are using an FAQ dataset and want to chat with your docs; in that case, you may not want to keep writing the same reasoning over and over again. If you want to examine the parameters, which are the examples, here is how you can do it. Similarly, if you want to evaluate the program, you can pass the dataset through it and get the metrics: it runs the forward pass of your program and reports the exact match and passage match for each example.
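Inference, inspection, and evaluation on the compiled program might look like this sketch (the question is one from HotPotQA; Evaluate's keyword arguments can differ slightly across DSPy versions):

```python
from dspy.evaluate.evaluate import Evaluate

# Inference looks like a PyTorch forward call.
pred = compiled_rag(question="What castle did David Gregory inherit?")
print(pred.answer)
turbo.inspect_history(n=1)  # debug what the model saw last

# Examine the parameters, i.e. the bootstrapped demonstrations.
for name, predictor in compiled_rag.named_predictors():
    print(name, predictor.demos[:1])

# Score the program over the dev set with the exact-match metric.
evaluate = Evaluate(devset=devset, num_threads=1, display_progress=True)
evaluate(compiled_rag, metric=dspy.evaluate.answer_exact_match)
```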

33:56

I still need to run more examples, but the purpose of this video is to help you understand the concept, and hopefully this example introduced you to how to write the syntax. I hope you enjoyed the overview of DSPy. We covered topics such as the programming model, the compiler, and the introductory example. We looked at basic question answering, adding chain-of-thought reasoning, RAG, and other related things. So let's continue the conversation in the comments. Thank you so much for watching, and I'll see you in the next one!


Related Tags
AI development · DSPy · language models · task automation · programming efficiency · parallelization · optimization · neural networks · prompt optimization