Getting started with DSPy tutorial
Summary
TL;DR: DSPy is a groundbreaking development in artificial intelligence, similar to the Langchain idea of chaining large language model calls. DSPy introduces a new, PyTorch-like syntax that gives more control and flexibility over LLM programs. This approach breaks complex tasks into simple subtasks and parallelizes them, making various control mechanisms available. DSPy aims to automatically optimize the instructions and examples in a prompt to elicit the desired behavior. The DSPy programming model is described as a combination of PyTorch, agent syntax, and LLM programs, and has the potential to enable more advanced LLM programming.
Takeaways
- 🚀 DSPy is a groundbreaking development in AI that enables chaining large language model calls.
- 🔗 Using LLM APIs, complex tasks can be broken down into research, writing, and editing, then parallelized and controlled.
- 🌟 DSPy introduces a new syntax similar to PyTorch, giving more control and flexibility over LLM programs.
- 📝 DSPy automatically optimizes the tasks in a prompt using optimized instructions.
- 🔄 Breaking a complex question into smaller ones and performing multi-hop question answering enables more effective answers.
- 🔄 The DSPy programming model combines PyTorch, agent syntax, and LLM programs.
- 📚 DSPy provides structured input and output prompts, ensuring consistency in LLM programs.
- 🔧 DSPy lets you programmatically control how LLM modules interact with each other.
- 🔧 DSPy assertions improve the quality and performance of LLM programs.
- 📈 The DSPy compiler automates prompt optimization and adapts to new language models.
- 🔍 DSPy can bootstrap synthetic examples, which are useful for prompts or for fine-tuning models.
Q & A
What is DSPy?
-DSPy is a groundbreaking development in the field of artificial intelligence, similar to the Langchain idea of chaining large language model calls.
What are LLM APIs?
-LLM APIs are used to integrate language models into applications or to build complex programs. The output of one language model call can be fed as input to the next.
How does DSPy's new syntax work?
-DSPy introduces a new syntax similar to PyTorch, providing finer control and more flexibility over LLM programs.
What does optimization mean in DSPy?
-Optimization in DSPy means optimizing the instructions in a task's LLM prompt, which leads to better performance on that specific task.
How does the DSPy programming model work?
-The DSPy programming model combines PyTorch, agent syntax, and LLM programs. It uses structured input and output prompts and lets you programmatically control how LLM modules interact with each other.
What is a DSPy signature?
-A DSPy signature is a way to define a task's prompt. By defining input and output fields, it ensures consistency in LLM programs.
What role does the DSPy compiler play?
-The DSPy compiler automates prompt optimization, aiming to eliminate manual prompt tuning, prompt engineering, and manual example writing.
What is multi-hop question answering in DSPy?
-Multi-hop question answering is an approach that breaks a complex question down into smaller sub-questions in order to answer it effectively.
How can you create synthetic examples in DSPy?
-In DSPy, you can create synthetic examples via bootstrapping. These can be used to train smaller, more precise models or as examples in prompts.
What is the DSPy teleprompter?
-The DSPy teleprompter is a tool that assists with prompt optimization, using built-in prompts to help generate the best responses from the LLM.
Outlines
🤖 DSPy and advances in artificial intelligence
DSPy is a groundbreaking development in AI, comparable to the Langchain idea of chaining large language model calls. Covers integrating LLM APIs and building complex programs, parallelizing tasks, DSPy's new syntax, optimized instructions, efficient JSON output, DSPy's automation and optimization, and an introduction to the DSPy programming model.
🔍 Approaching complex questions and the benefits of DSPy
Explains how to break a complex question into smaller ones to answer it effectively, along with the concept behind AutoGPT. Covers the focus of the DSPy program model, query generation and the use of a retriever, response generation, extending the capabilities of ChatGPT, the LangChain and Llama Index approach, and DSPy's framework and use of graphs.
📝 The benefits of DSPy and the role of a programming language
The two main benefits of having a programming language for LLMs, organizing prompts with DSPy signatures, controlling how LLM modules interact, DSPy control flow, and DSPy assertions.
🔧 The DSPy compiler and instruction tuning
Understanding the DSPy compiler, the goals of instruction tuning, prompt optimization, DSPy's prompt optimization, using an LLM tuning framework, defining prompt examples and fields, and compiling and testing with DSPy.
🚀 The DSPy framework and bootstrapping
The exciting parts of the DSPy framework, creating examples via bootstrapping, bootstrapping use cases, rationales and the use of LLMs in DSPy, measuring the quality of synthetic examples, and DSPy metrics and usage.
📚 The RAG program and teleprompter optimization
Building a RAG program, DSPy signatures and prompt parsing, generating RAG responses and using DSPy's chain of thought, the teleprompter optimization process, defining and using metrics, compiling and running inference with RAG, and an introduction to the DSPy programming model and compiler.
Keywords
💡DSPY
💡LLM APIs
💡Chain of Thought
💡Optimization
💡Teleprompter
💡Multi-hop Question Answering
💡Bootstrapping
💡Synthetic Examples
💡RAG (Retrieval Augmented Generation)
💡Inductive Biases
Highlights
DSPy is a groundbreaking development in AI, introducing a new syntax similar to PyTorch for LLM programs.
LLM APIs can be integrated into apps or used to create complex programs by chaining language model calls.
DSPy combines new syntax with optimization, allowing for more efficient and effective LLM tasks.
The DSPy programming model simplifies complex tasks by breaking them down into research, writing, and editing.
DSPy introduces a signature syntax that improves the readability of LLM program code.
LLMs like GPT-4 and Gemini Ultra can deduce a variable's meaning from its name, enhancing DSPy's functionality.
DSPy allows for the creation of LLM agents and defines how they interact with input data to produce outputs.
Multi-hop question answering involves breaking down complex questions into smaller sub-questions for effective answering.
DSPy enables the automatic optimization of the instructions and examples used in prompts.
The DSPy programming model is a combination of PyTorch, agent syntax, and LLM programs.
DSPy provides more control and flexibility over LLM programs compared to frameworks like LangChain and Llama Index.
DSPy allows for structured input and output prompts, ensuring consistency in LLM programs.
DSPy enables programmatic control over LLM modules' interactions, unlocking customization possibilities.
DSPy assertions are a feature that will be discussed in a separate paper, offering further control mechanisms.
DSPy aims to optimize prompts automatically, reducing the need for manual prompt tuning and engineering.
DSPy can generate synthetic examples for prompts or fine-tuning models, utilizing few-shot learning concepts.
DSPy's teleprompter system suggests examples for language model components, creates new signatures, and analyzes metrics.
DSPy's compiler eliminates the need for manual prompt tuning by optimizing task signatures and descriptions.
DSPy's approach to LLM programming is compared to PyTorch's impact on deep learning model training.
Transcripts
DSPy is a groundbreaking development in artificial intelligence,
comparable to the advent of the Langchain idea
of chaining large language model calls.
Thinking about LLM APIs, they are exciting.
They can be integrated into apps or used
to create complex programs where the output of one language model
call can be fed as input to the next.
We can simplify complicated tasks, such as writing pull requests
or blog posts by breaking them down into research, writing,
and editing tasks.
By parallelizing these tasks, we can
take advantage of various control mechanisms.
DSPy introduces a new syntax similar to PyTorch,
which gives us more control and flexibility over our LLM programs.
The exciting thing about DSPY is that it combines a new syntax
with optimization.
This means that we have optimizing instructions
for tasks in our LLM prompts.
For example, suppose your task is to re-rank documents
and you have a re-ranking agent.
In that case, a particular phrasing results
in better performance than others.
Also, if you need to output JSON, we will find ways
to make it more efficient.
Instead of saying, please, output JSON or give me JSON,
just use the optimized phrasing to get the best results.
The idea behind DSPy is to optimize the instructions and examples
used in the prompt automatically.
This is done to elicit the desired kind of behavior
when using the DSPy programming model.
Many exciting things can be achieved through this process.
The DSPy programming model is where the story begins.
It can be described as a combination of PyTorch, agent syntax,
and LLM programs.
Let's dive in and explore what the process entails.
To begin with, we initialize the components
required for our LLM program.
We start with a retrieval system, such as Qdrant,
and connect it with a query generator and an answering
mechanism.
We have two distinct LLM components or prompts
that carry out specific tasks.
These could be fine-tuned models that
are specialized for their roles in the overall LLM program,
which we will refer to as the logic in the forward pass.
Our first component is the query generator.
And there are a few things to keep in mind about it.
To begin with, we need to give the component a name, which
we will call gen_query.
Next, we have dspy.ChainOfThought,
which we will discuss later.
Let's start by talking about the signature.
As we continue with this lecture,
we will see that there is an alternative way of expanding
the signature by writing a longer initial prompt
in the docstring and then adding a typed
input-output field.
A feature in DSPy allows you to improve
the readability of LLM program code
by using signature syntax.
The short syntax involves defining the context,
question, and query fields.
When you input the context and question,
the DSPy will parse it and output the query.
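This parse-and-fill behavior can be pictured in plain Python. The following is a hedged sketch, not DSPy's actual implementation: a shorthand string like "context, question -> query" is split into named input and output fields, which then drive prompt formatting. The helper names here are hypothetical.

```python
def parse_signature(signature: str):
    """Split a DSPy-style shorthand like 'context, question -> query'
    into input and output field names."""
    inputs_part, outputs_part = signature.split("->")
    inputs = [f.strip() for f in inputs_part.split(",")]
    outputs = [f.strip() for f in outputs_part.split(",")]
    return inputs, outputs

def format_prompt(signature: str, **values) -> str:
    """Render a structured prompt: one labeled line per input field,
    then an empty label for the output field the LLM should fill in."""
    inputs, outputs = parse_signature(signature)
    lines = [f"{name.capitalize()}: {values[name]}" for name in inputs]
    lines.append(f"{outputs[0].capitalize()}:")
    return "\n".join(lines)

prompt = format_prompt("context, question -> query",
                       context="DSPy docs", question="What is a signature?")
print(prompt)
```

The point of the sketch is that the field names themselves carry meaning, which is exactly what the next part of the transcript relies on.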
The LLMs, or large language models,
have an impressive ability to deduce
the meaning of a variable just from its name.
For instance, in a context question,
powerful models like GPT-4 or Gemini Ultra
can accurately understand the context from the name alone.
Therefore, the second component of our program
involves another LLM that generates the answer.
This program takes the context and question as input
and returns the answer.
The exciting part is that you have the power
to build LLM agents and define how they interact
with the input data to produce an output.
To start, we define an empty list called context.
This also enables thinking about how we can incorporate
local memory into our forward pass
and how we can use non-parametric functions
in the forward pass of our program.
We start with an empty list of context
and loop through it, which can be set
as a hyperparameter in the program
to determine how many hops are required.
Multi-hop question answering involves breaking down
a complex question into smaller sub-questions
to answer it effectively.
This approach is like the concept behind AutoGPT,
which gained public attention when it was first introduced.
The agent can evaluate each sub-question
to determine if enough information has been compiled
to consolidate the results.
This process is referred to as multi-hop question
decomposition.
Our focus shall be on the DSPy program model.
To generate a query, we take the context
and the input question as input.
This process helps us create the query.
After generating the query,
we pass it through our retriever Qdrant.
The retriever then provides us with the required context.
We keep looping through the process
until we have all the context we need
to answer the question.
The number of loops depends on the context.
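The loop just described can be sketched in plain Python, with stub functions standing in for the query-generating LLM, the Qdrant retriever, and the answer generator (all names here are illustrative placeholders, not DSPy's actual API):

```python
def gen_query(context, question):
    # Stub for the query-generating LLM component.
    return f"search: {question} given {len(context)} passages"

def retrieve(query):
    # Stub for the Qdrant retriever; returns passages for the query.
    return [f"passage for '{query}'"]

def gen_answer(context, question):
    # Stub for the answer-generating LLM component.
    return f"answer based on {len(context)} passages"

def multi_hop(question, max_hops=2):
    context = []                  # local memory in the forward pass
    for _ in range(max_hops):     # max_hops acts as a hyperparameter
        query = gen_query(context, question)
        context.extend(retrieve(query))
    return gen_answer(context, question)

print(multi_hop("Who built the castle with 10 stories?"))
```

Each iteration sees the context gathered so far, which is what lets later queries build on earlier retrievals.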
After generating a response,
we can use it in the conversation.
The release of ChatGPT amazed everyone
with its ability to fluently converse, answer questions,
and create a YouTube script
inspired by a favorite book or chat.
We discovered we could take these large language models
like ChatGPT even further by connecting them in chains
to create more complex language models.
LangChain and Llama Index have been evangelizing
a new way of building applications
using large language models, which is the future.
The approach they propose involves using chains
that overcome the input length limitations.
Chains were traditionally used
to break up complex inputs into smaller chunks,
process each chunk individually,
and then combine the outputs to process long documents.
This approach is still valid,
especially given the challenges of supervising models
with context windows of up to 32K tokens.
Determining whether these models attend
to all the inputs is difficult,
so breaking up the input length remains valuable.
The second challenge is overcoming complex tasks.
For example, it can be overwhelming
if you ask ChatGPT to write a blog post
on how to run your application on Kubernetes
and retrieve your code.
It's better to break down complex tasks
into smaller sub-tasks and define a workflow
for the language model to follow to complete your request.
Chains have greatly improved search capability.
For instance, there is the example
of multi-hop question answering.
In this process, we formulate a query,
retrieve information,
potentially loop back to retrieve more context,
and then we use that information to answer the question.
One way to improve this process is to use language models
to create a filter for our search.
Re-ranking documents with large language models
has also played a significant role in search.
As language models have evolved,
it has become clear that they can be better
represented as graphs.
For example, we now have LangGraph,
and we can think about this as text transformation graphs.
We have graphs of computation
where edges pass along the transformation of text.
We input and output text sent along the edge
to the next node for further transformation.
For instance, we can spin up three separate processes
of writing and editing a story in parallel.
Then we can sync these nodes
into a published stories component in our LLM program.
This can be used to produce a newsletter based on
the news of a specific week,
like what happened on Yahoo Finance last week.
We can parallelize the process of writing
and editing the story,
and then sync it into another part of the program.
We must aggregate these stories
and combine all the information
to create a coherent narrative.
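The fan-out/fan-in pattern described here can be sketched with Python's standard concurrency tools, using a stub in place of the write-and-edit LLM calls (all names are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def write_and_edit(topic: str) -> str:
    # Stub standing in for a write-then-edit LLM chain for one story.
    return f"Edited story about {topic}."

def publish_newsletter(topics):
    # Fan out: draft each story in parallel.
    with ThreadPoolExecutor() as pool:
        stories = list(pool.map(write_and_edit, topics))
    # Fan in: sync the parallel branches into one coherent narrative.
    return "\n".join(stories)

print(publish_newsletter(["earnings", "markets", "rates"]))
```

`pool.map` preserves input order, so the aggregated newsletter reads in the same order the topics were given, even though the branches ran concurrently.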
Before you delve further into DSPy,
I highly recommend checking out the LLM program galleries
that both Llama Index and LangChain have created.
While frameworks are great for building LLM chains,
graphs, agents, and programs,
they have limited flexibility.
That's where DSPy programming comes in
as a complete LLM programming language.
In contrast to frameworks,
DSPy offers more exciting possibilities
for LLM programming without prompting.
There are two main advantages
to having a programming language for LLMs.
First, it allows you to have structured input
and output prompts to consistently express your ideas
within your programs.
Second, it will enable you to control
how your LLM modules interact
with each other programmatically.
And this unlocks the flexibility
to customize the LLM program
to suit your needs and imagination
while using the LLM API.
Okay, let's begin by discussing how to clean up your prompts
and structure your inputs and outputs.
To achieve this, DSPy uses a signature.
In this example, we have GenerateAnswer,
which inherits from Signature.
Also, we write a docstring that describes the task prompt.
In this particular context,
the task involves providing concise answers to questions.
Later in the video,
we will discuss how DSPy can assist you
in optimizing these prompts.
You can give a general overview of the task
and DSPy will take care of the rest.
You don't need to tweak the language
because even subtle changes
can significantly impact the performance.
DSPy will optimize your instructions,
but we can discuss that later.
There is a method for defining input and output fields
that provides a consistent syntax for the prompts
and structured outputs of all components in LLM programs.
This is one way to ensure consistency in your LLM programs.
Another exciting feature is the ability to control
how the LLM modules interact
with each other programmatically.
Okay, let's talk about the controls in LLM programming.
For example, use a specific syntax to create a for loop
in your program.
You can even add hops beyond this example.
You can nest these loops
and write more complex code with if-else statements.
Here is a quick example of inputting a stock ticker.
You can prompt the program to output financial details
if it's about a specific company.
I have a program that first processes my financial databases
and identifies whether they relate to some ticker.
If they do, I can ask the program to research
the company's performance and some reports.
I can also use the program to do a web query
and get more information about market trends
and recent news.
Then the program will generate a query
and send it to the API.
After that, the API will respond with the required context.
Then I can write potential investment insights
and send them via email.
However, I would review it first
before sending it to anyone.
The point is that you can have a good control flow
for the LLM programs by using the for loop,
if statement and local memory.
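The ticker workflow above is really just control flow in plain Python. Here is a sketch where every function is a hypothetical stub, not a DSPy or finance API:

```python
def is_known_ticker(text):        # stub: check local financial database
    return text.upper() in {"AAPL", "MSFT"}

def research_company(ticker):     # stub: LLM call over company reports
    return f"performance notes for {ticker}"

def web_search(ticker):           # stub: web query for trends and news
    return f"recent news about {ticker}"

def draft_insights(notes):        # stub: LLM call drafting the email
    return "DRAFT: " + "; ".join(notes)

def investment_pipeline(text):
    notes = []                    # local memory
    if is_known_ticker(text):     # if-statement control flow
        notes.append(research_company(text))
        notes.append(web_search(text))
    else:
        notes.append(f"'{text}' is not a tracked ticker")
    return draft_insights(notes)  # a human reviews this before sending

print(investment_pipeline("AAPL"))
```

The branch, the accumulating `notes` list, and the final draft correspond to the if-statement, local memory, and email-drafting steps described in the transcript.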
The next big thing is DSPy assertions,
which will be discussed in a separate paper.
All right, so by now, I hope you are convinced
of the DSPy syntax and how it can offer you more control
and flexibility over your LLM programs.
So DSPy is like PyTorch for LLM programs.
PyTorch is a popular deep learning model training framework.
This tweet explains a lot about PyTorch.
Two main things make PyTorch stand out:
its syntax for defining neural network layers
and its eager execution feature.
PyTorch and TensorFlow both have different ways
of implementing eager execution.
To use PyTorch, you must first define a neural network
and initialize the layers you will use.
When defining a layer such as a convolution,
you must specify the input and output
determining the graph transformation.
It's like matrix multiplication,
where the dimensions of the matrices have to match.
You need to define the layout of the neural network
and ensure that the input and output layers
are compatible with each other.
After that, you define the forward pass,
which determines how the network processes the input.
We now have the syntax for defining components
in LLM programs.
You can initialize the program by defining its components.
Next, you define how the forward pass will look.
There were some excellent analogies
for designing the DSPy inspired by PyTorch.
The first important point is that
we should not rely on a single layer
to perform all the work.
We must add inductive biases and depth
to improve the model's efficiency.
For instance, in the convolutional PyTorch network,
the convolution has an inductive bias
of the weight-sharing kernel
as it slides across an image pixel matrix.
Similarly, we can observe that signatures
have this inductive bias
of what the part of the program is supposed to do.
Suppose a program has a specific context,
question and query.
In that case, it is an inductive bias
for that particular program part.
This component of the program is designed
to perform a specific task.
The idea of inductive biases is fascinating.
Okay, let's discuss a big concept, the DSPy compiler.
The best way to understand the concepts behind this
is to start testing it with the program
you have in mind to optimize.
Let's dive into the instruction tuning.
The goal is to eliminate the need
for manual prompt tuning, prompt engineering
and manual example writing.
For example, when training a re-ranker agent,
one might experiment with various phrasings
of the instruction.
It is important to note that how you phrase your prompt
for re-ranking documents can affect the performance
of different language models.
For example, a prompt that works well with GPT-4
might not work well with Gemini Ultra or Llama 2.
Therefore, the optimal phrasing for your prompt
will depend on the specific language model you are using.
It's important to fine-tune your prompts
to stay up-to-date with the latest language models.
With the new language models emerging every month or so
for at least the next year or year and a half,
keeping your LLM programs current is crucial.
Using an automatic tuning framework
can help you quickly and easily plug in a new language model
and determine which prompt will generate
the desired response.
A fun thing happens sometimes when someone asks
for something specific like,
"I will pay you $1 million to output JSON."
These requests can be confusing and difficult to understand.
However, DSPy aims to solve this problem
by starting with a basic signature
and then optimizing it to create the best possible shorthand
for answering questions.
This shorthand could be used for quickly answering
fact-based questions or providing context
for more complex queries.
It is going to optimize a more detailed description
of the task.
The way it works is interesting.
There are built-in prompts in the DSPy compiler
that can be used to optimize pre-existing prompts in the chains.
However, there are some prompts on how to use LLMs
to optimize LLMs.
So we have this prompt for optimizing the instruction.
So: you are an instruction optimizer
for large language models.
I will give you a signature of fields,
inputs and outputs, in English.
Your task is to propose an instruction
that will lead a good language model
to perform the task well.
Don't be afraid to be creative.
That last part, "don't be afraid to be creative,"
is what we are hoping to achieve with DSPy.
After completing that, you can propose some instructions.
This prompt takes multiple instructions
and combines them into one.
It uses sampling to create multiple outputs
and then aggregates them to produce the final result.
This is how we optimize the tasks,
signature and description.
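The sample-then-aggregate idea can be sketched as follows. The proposer and metric here are stand-in stubs, not DSPy's built-in optimizer prompts; the function names are hypothetical:

```python
import random

def propose_instruction(seed_instruction, rng):
    # Stub for the "instruction optimizer" LLM: rewrite the seed instruction.
    styles = ["Be concise.", "Think step by step.", "Cite the context."]
    return seed_instruction + " " + rng.choice(styles)

def score(instruction):
    # Stub metric: pretend longer, more specific instructions score better.
    return len(instruction)

def optimize_instruction(seed, num_samples=5, seed_value=0):
    rng = random.Random(seed_value)
    # Sample several candidate phrasings, then keep the best-scoring one.
    candidates = [propose_instruction(seed, rng) for _ in range(num_samples)]
    return max(candidates, key=score)

best = optimize_instruction("Answer the question using the context.")
print(best)
```

In the real compiler, the proposer is itself an LLM prompt and the score comes from running the program against a metric, but the sample-and-select loop has this shape.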
Examples have played a crucial role
in the development of deep learning.
In the past, research papers often described datasets
consisting of hundreds of thousands of examples,
such as the SQuAD question answering dataset.
These examples were focused on human-written
natural language inference.
They were used to identify entailment, contradiction
and other related phenomena.
In the past, people used to create
massive human-labeled datasets.
Today, with the help of generative models,
we can generate training data to make smaller,
more precise models or use them as examples in prompts.
This concept is known as few-shot learning
and it was explained in the GPT-3 paper released in 2020.
It was surprising that models could perform these tasks
without any prior examples.
The term zero-shot refers to the situation
where you only have the task description to work with
and need to create a clear set of instructions
without any examples to guide you.
One shot indicates that you have one example to work with,
while a few shots means you have a few examples.
One of the most exciting parts of the DSPy framework
is that you can create examples by bootstrapping.
You can prompt GPT-4 and Gemini Ultra,
but you can also prompt the Mistral 7B or a Llama 2,
depending on whether you want to fine-tune these models
or use few-shot prompts.
When using bootstrapping, the question arises,
which examples should be included in the prompt?
For example, we have 10 examples
and are trying to translate from English to Portuguese.
In that case, we may only want to include
three of those examples in the input.
Another use case for bootstrapping
is when we want to train a model
to understand a chain of thought
rather than just input-output pairs.
In this case, we should include examples
that show how a person arrived at a particular answer
in addition to the answer itself.
For example, suppose you are building a chatbot
that answers FAQs,
in that case, we should include the entire conversation
leading up to the answer.
Suppose you want to add a chain of thought to your answer.
You can retrieve the relevant contexts
from your documentation
and use examples to explain your reasoning.
You can use DSPy to help you bootstrap the rationale
and have the LLM write it for you.
This way, you can have a clear and concise answer
that includes all the necessary information.
How do we know the quality of synthetic examples?
This is a common question
when using LLM to create synthetic examples,
whether for prompts or for fine-tuning the model.
The answer lies in metrics and DSPy.
One way to get started is to use an exact match metric.
For example, if you have a fact-based question
such as what is the temperature of empty space?
And the answer is 2.7 Kelvin.
An exact match would be a good way
to measure the quality of the synthetic example.
For example, if you write out the answer
instead of using the numerical value,
an exact match would not recognize the answer as correct.
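An exact-match metric, plus the light normalization that forgives formatting differences (though not wording differences like writing a number out), can be sketched in plain Python. This is illustrative, not DSPy's built-in metric:

```python
import string

def normalize(text: str) -> str:
    # Lowercase, strip punctuation and extra whitespace before comparing.
    text = text.lower().strip()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())

def exact_match(prediction: str, gold: str) -> bool:
    return normalize(prediction) == normalize(gold)

print(exact_match("2.7 Kelvin", "2.7 kelvin."))            # → True
print(exact_match("two point seven Kelvin", "2.7 Kelvin")) # → False
```

The second case is exactly the failure mode mentioned above: a spelled-out answer is semantically correct but exact match will not recognize it.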
Moving on to the quality of our synthetic examples,
we use teleprompters to optimize the loop,
exploring different instructions, writings,
and examples in the prompt.
This tutorial explores the teleprompter system,
suggesting examples for language model components,
creating new signatures, and analyzing metrics.
We will start by looking at the code.
The tutorial aims to provide the experience
writing DSPy programs and understanding syntax
using off-the-shelf compilers.
Hopefully, you will find it helpful for your LLM programs
and gain a better understanding of the concepts discussed.
So let's start with an example of a DSPy program.
Retrieval augmented generation is a popular LLM chain
where you retrieve and generate.
Another program we will look at has a write-the-query part
with two LLM programs.
This will give us a quick sense of the syntax.
Similar to PyTorch in the LLM program of RAG,
we first initialize the components we will use.
Then we define how these components interact
with the input data and each other in the forward pass.
When a user enters a question into our app,
we pass this question to our retriever.
The retriever brings relevant passages,
which we then pass into the answer generator
to generate an answer.
In DSPy, the signature gives the LLM a sense of the task.
It's a shorthand notation of question, context, and answer.
You can also write out longer signatures for a prompt.
This is similar to organizing prompts
using strict typing in libraries.
When you need to parse the output of a program,
you can use a longer hand notation.
This notation allows you to write an initial prompt
in the docstring.
You can also define types of the different fields
and give them a description of the input.
However, you can also use a shorthand notation.
Anyway, you understand the program now.
It's a RAG program.
You might feel overwhelmed.
So let's discuss a more complex program
that involves optimizing two LLM programs.
In such programs, multiple components
must be optimized separately to achieve impressive behavior
as a whole system.
SimplifiedBaleen is a multi-hop question-answering system.
The concept of multi-hop question answering
involves break down complex questions
into smaller sub-questions.
For example, the dataset presents
the question.
This question is too complex to be answered directly.
So it must be broken down into smaller, more manageable parts.
To approach a question using RAG,
you must break it down into smaller parts.
First, identify the subject of the question,
such as the name of the castle,
then ask specific questions related to that subject,
such as the number of stories in the castle.
This technique of breaking down a question into smaller parts
is one of the most powerful tools in RAG.
Multi-hop question decomposition is an exciting concept
that can take RAG to the next level.
It connects syntax with local memory,
making program building more effective.
Again, first, we need to initialize the components
we will use.
We start by writing a signature to generate a search query.
This signature includes a short description of the task,
writing a simple search query
that can help answer a complex question.
Then we briefly describe the context
which can contain relevant facts, questions, and queries.
We assign our modules to let them generate queries.
We have a list of modules that we can use for this purpose.
To simplify things, we can write
self.generate_query = dspy.ChainOfThought(GenerateSearchQuery).
This will help us to generate the search query automatically.
The interesting point to consider here
is the possibility of having a distinct program
for the initial search query from the second one,
since we will generate queries in a loop.
To achieve this, we could use a list.
For example, we could use Qdrant for our retrieval purpose.
I mean, it must be Qdrant because it's awesome
and you have no choice!
Next, we have to answer the question
based on the information provided.
In the forward pass, we have a loop
that iterates through the number of hops.
Let's assume we want to break down our question
into only two questions,
which means the maximum number of hops is two.
We generate a new question during each iteration
by taking the current context, what we have searched,
and the question as input.
First, we will retrieve the passages
and then use a helper function.
Note how you can incorporate these helper functions
into the forward passes of your LLM programs
and how you can use the syntax to write anything
you can imagine with these LLM programs.
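One such helper is the passage de-duplication step used between hops. DSPy ships a utility of this kind; the plain-Python version below is illustrative:

```python
def deduplicate(passages):
    # Keep the first occurrence of each passage, preserving order,
    # so repeated retrievals across hops don't bloat the context.
    seen = set()
    unique = []
    for passage in passages:
        if passage not in seen:
            seen.add(passage)
            unique.append(passage)
    return unique

context = ["castle history", "castle height"]
new_passages = ["castle height", "architect bio"]
context = deduplicate(context + new_passages)
print(context)  # → ['castle history', 'castle height', 'architect bio']
```

Because later hops tend to retrieve some of the same passages again, this keeps the accumulated context compact without reordering it.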
Before finishing up, let's review the entire notebook
to combine all the concepts.
Firstly, you need your OpenAI key,
which you can get in the API key section
of your OpenAI account.
After that, we import DSPy
and connect to GPT-3.5 Turbo.
We use Qdrant as a retrieval model
to store the document vectors.
So we configure the DSPy by setting the language
and Qdrant retrieval models.
We will be using the HotPotQA dataset
to benchmark multi-hop question answering.
This dataset consists of 20 training examples
and 50 examples for validation with our metric.
This is a significant difference
from how deep learning used to be done.
And we have DSPy which needs only 20 examples
to optimize it.
So let's take another simple question.
We have a docstring that describes the task
and provides answers to questions
in short factoid answers.
Additionally, we have an input field
where we can give a description,
but it is not required.
We can use the name of the variable.
We have the output field and its description.
An interesting feature of DSPy
is that you can inspect the intermediate output
and we can do it like this.
Here's an example of the uncompiled DSPy program.
What is the nationality of the chef
and restaurateur featured in Restaurant: Impossible?
The predicted answer is American.
I'm not familiar with it, but let's proceed.
Let's return to the DSPy building blocks.
If you want to add a chain of thought to your prompts,
you can change dspy.Predict to dspy.ChainOfThought.
DSPy will provide rationales for you to add
an explanation to your prompt, which will make it better.
Having an explanation can also help with debugging
and improve the performance.
Here is an example of how your thinking process works.
Please note that this is not compiled.
It's just a forward pass.
The built-in modules help by adding reasoning
to the prompt.
This intermediate reasoning has been produced for us.
Let's break it down step by step.
Remember that this is just a forward pass done
by the language model.
We haven't compiled this, but we noticed that adding
a thinking step can make the switch
from American to British.
This is a valuable insight that can be quickly implemented.
Here is an example of how to connect
to one of the retrievers.
We can retrieve data with a question
and display the output type.
Okay, now let's compile our RAG program.
So we will start with generating a signature,
answer questions with short factoid answers.
The input field, I mean the context, has a description:
"may contain relevant facts."
We will get the question from the variable name,
and the answer has the description "often between 1 and 5 words."
We are creating our RAG program.
To start, we need to initialize the necessary components,
retrieve them, and generate a response
using a chain of thought.
This response will then be passed
into the DSPy chain of thought.
The DSPY chain of thought will add
the reasoning element to the prompt.
During the inference, the model will have
the ability to reason.
However, this time we will compile it.
In the forward pass, we fed the question
and passages to the question-answering model
to obtain the answer.
Alright, so now we are discussing
the teleprompter, the optimizer.
First, we need to define our metric.
We will use the exact match, which means that
if the answer is British, it must be precisely British.
We will also use the passage match,
but let's focus on the exact match to keep things simple.
The teleprompter will use the bootstrap few-shot method,
which means it will add few-shot examples to the prompt.
Also, we have supervision on the retrieval,
and that's how it will be optimized.
So then you have
compiled_rag = teleprompter.compile(RAG(), trainset=trainset).
Let's run.
After optimization, the process stops.
Once we have compiled our RAG,
we can run inference by passing in the input,
similar to how you would do it in PyTorch.
And we get this answer to our question.
Let's try turbo.inspect_history again
so we can debug what the model saw last.
So in this prompt, we can see some examples
of answering questions with short factoid answers.
The reasoning behind this is what makes DSPy
valuable for RAG.
However, suppose you are using an FAQ dataset
and want to chat with your docs.
In that case, you may not want to keep writing
the same reasoning over and over again.
If you want to examine the parameters,
which are the examples, here is how you can do it.
Similarly, if you want to evaluate it,
you can pass the dataset through it and get the metrics.
It runs the forward pass of your program
and provides you with the exact match
and passage match for each example.
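The evaluation loop described here can be sketched with stubs: run the program's forward pass over a labeled dev set and average a metric. The names below are placeholders, not DSPy's Evaluate class:

```python
def forward(question):
    # Stub for the compiled program's forward pass.
    answers = {"What is the capital of France?": "Paris"}
    return answers.get(question, "unknown")

def exact_match(pred, gold):
    return pred.strip().lower() == gold.strip().lower()

def evaluate(dev_set):
    # Run every example through the program and average the metric.
    scores = [exact_match(forward(q), gold) for q, gold in dev_set]
    return sum(scores) / len(scores)

dev_set = [("What is the capital of France?", "Paris"),
           ("Who wrote Hamlet?", "Shakespeare")]
print(evaluate(dev_set))  # → 0.5
```

Swapping in a passage-match metric, or both at once, only changes the scoring function, not the loop.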
I still need to run more examples,
but the purpose of this video
is to help you understand the concept.
Hopefully this example introduced you
to how to write the syntax.
I hope you enjoyed the overview of DSPy.
We covered topics such as the programming model,
the compiler and the introduction example.
We looked at basic question answering,
adding a chain of thought reasoning,
RAG, and other related things.
So let's continue the conversation in the comments.
Thank you so much for watching!
And I see you in the next one!