Building Systems with the ChatGPT API, Part 1 — Language Models, the Chat Format, and Tokens
Summary
TLDR This video gives a comprehensive overview of how OpenAI's large language models work: how they are trained, how the tokenizer affects output, and how the chat format specifies system and user messages. It explains how supervised learning serves as the core building block for training these models, and how an LLM is built by repeatedly predicting the next word. It also covers the difference between base models and instruction-tuned models, and how reinforcement learning from human feedback improves model quality. Finally, it offers practical tips on calling the OpenAI API securely, using prompts effectively, and exercising fine-grained control over model output via the chat format.
Takeaways
- 😀 LLMs are trained using supervised learning to predict the next word given a sequence of words
- 😃 Instruction-tuned LLMs like ChatGPT try to follow instructions rather than freely generate text
- 🤓 The tokenizer breaks text into commonly occurring character sequences called tokens
- 😮 Reversing a word's letters is hard for LLMs because they see tokens, not individual letters
- 📝 The chat format lets you specify system and user messages to control the LLM's behavior
- 👍 Gathering human feedback helps further improve LLM quality over time
- 🔢 On average, each token is about 4 characters or 3/4 of a word long
- ⚠️ Loading API keys from a local file is more secure than writing them in plain text
- 🚀 Prompting lets you build AI apps much faster than traditional ML methods
- 📋 This doesn't work as well for structured/tabular data as it does for text
Q & A
How does a large language model work?
- A large language model is trained on huge amounts of text data, using supervised learning to predict the next word, so that given a text fragment it can predict the most likely word to follow.
How is supervised learning applied when training large language models?
- In supervised learning, a model learns an input-to-output (X to Y) mapping from labeled training data — for example, predicting the next word given a sentence fragment.
What are base language models and instruction-tuned language models?
- A base language model repeatedly predicts the next word from its text training data. An instruction-tuned language model is further fine-tuned so that it produces outputs that follow input instructions.
What does the training process for an instruction-tuned language model look like?
- First a base language model is trained; it is then fine-tuned on a much smaller set of examples showing how to respond to instructions. This step can be done on a smaller dataset with far less compute.
What is reinforcement learning from human feedback (RLHF)?
- It is a process in which humans rate the quality of the language model's outputs (for example, whether they are helpful, honest, and harmless), and the model is then further tuned to increase the probability of generating highly rated outputs.
How can you improve a large language model's performance on word games or letter-level tasks?
- By changing the input format — for example, adding dashes or spaces between letters — you can help the model see and process the individual characters.
How can you manage API keys more safely when using the OpenAI API?
- Store the API key in an environment variable rather than typing it into code in plain text; this can be done by reading a local .env file, which improves security.
What are the advantages of building with large language models compared with traditional machine learning development?
- With a large language model, you can build a text application simply by writing a prompt, shrinking the development-to-deployment cycle from months to minutes or hours.
Why are large language models less effective for structured-data applications?
- Because these models are designed primarily for unstructured data such as text or images; for structured applications with lots of numerical and tabular data, specialized machine learning methods may work better.
How do you use system, user, and assistant messages to customize a large language model's output?
- Specify a system message to set the overall behavior or tone, then provide specific instructions via user messages, so the output follows the desired style or answers the particular request.
Outlines
🤖 How Large Language Models Work
This part introduces the foundations of large language models (LLMs): how they are trained with supervised learning, and how the tokenization process affects model output. It explains how LLMs generate text by predicting the next word, and distinguishes base models from instruction-tuned models (like ChatGPT), which are better at following given instructions. It also discusses fine-tuning models on human-written examples to improve their instruction-following ability.
🔍 Techniques for Improving Model Output
This part covers how human feedback is used to improve a language model's output, in particular reinforcement learning from human feedback (RLHF). This process is faster than base-model training, needs a smaller dataset, and requires far less compute. It also shows how to get model output using OpenAI's API and helper functions, and discusses how the model processes and predicts tokens (rather than individual words), which is essential for understanding how the model works.
📝 Advanced Tips for the API and the Message Format
This part explores advanced techniques for calling large language models through the API, including fine-grained control of the model's behavior via system, user, and assistant messages. Concrete examples show how different system messages can set the style and length of the output, and how assistant messages can be added to manage multi-turn conversations. It also presents a way to count the tokens used, which is useful for optimizing and controlling API calls.
🔐 Managing API Keys Safely and the Prompting Revolution
This part discusses how to manage and use API keys safely, recommending environment variables over hard-coding keys into source code. It emphasizes the revolutionary impact of prompting on AI application development: tasks that once took months can now be finished in hours. While this approach works well for unstructured data such as text and images, it is less effective for structured-data applications. It closes with the outlook for LLMs, especially in text applications, and previews the next video.
Keywords
💡Large Language Model
💡Supervised Learning
💡Tokenizer
💡Base LLM
💡Instruction-Tuned LLM
💡Reinforcement Learning
💡API Calls
💡Token
💡Prompting Techniques
💡Environment Variables
Highlights
Overview of how OpenAI's large language models (LLMs) work, including training and what influences their output.
Explanation of supervised learning as the fundamental method for training LLMs.
Description of the process of teaching LLMs to predict the next word in a sentence.
Distinction between base LLMs and instruction-tuned LLMs like ChatGPT.
Explanation of fine-tuning base LLMs on human-written instruction-response examples.
The use of reinforcement learning from human feedback (RLHF) to improve LLM outputs.
Discussion of the computational resources and time required for training LLMs.
Demonstration of how to use an LLM with Python libraries and the OpenAI API.
Explanation of tokenization in LLMs and its impact on text processing.
Techniques for improving LLM responses on specific tasks like reversing letters.
Overview of token limits in models like GPT-3.5 Turbo and handling long inputs.
Introduction to the chat format for LLMs, specifying system, user, and assistant messages.
Illustration of customizing LLM responses using system messages in the chat format.
Examples of using system messages to control the style and length of LLM responses.
Tips on securely managing API keys when using LLMs.
Comparison of traditional ML workflows with prompting-based development for rapid prototyping.
Transcripts
In this first video, I'd like to share with you an overview of how LLMs, large language models, work. We'll go into how they are trained, as well as details like the tokenizer and how it can affect the output when you prompt an LLM. We'll also take a look at the chat format for LLMs, which is a way of specifying both system as well as user messages, and understand what you can do with that capability.

Let's take a look. First, how does a large language model work? You're probably familiar with the text generation process, where you can give a prompt, "I love eating", and ask an LLM to fill in what the likely completions are given this prompt. It may say "bagels with cream cheese", or "my mother's meatloaf", or "out with friends". But how did the model learn to do this? The main tool used to train an LLM is actually supervised learning. In supervised learning, a computer learns an input-output, or X-to-Y, mapping using labeled training data. For example, if you're using supervised learning to classify the sentiment of restaurant reviews, you might collect a training set like this, where a review like "the pastrami sandwich is great!" is labeled as a positive-sentiment review, "service was slow, the food was so-so" is labeled negative, and "the earl grey tea was fantastic" has a positive label. (By the way, both Isa and I were born in the UK, and so both of us like our earl grey tea.) The process for supervised learning is typically to get labeled data and then train a model on that data. After training, you can deploy and call the model: give it a new restaurant review like "best pizza I've ever had", and hopefully it outputs that this has positive sentiment.
It turns out that supervised learning is a core building block for training large language models. Specifically, a large language model can be built by using supervised learning to repeatedly predict the next word. Let's say that in your training set of text data you have the sentence "my favorite food is a bagel with cream cheese and lox". This sentence is turned into a sequence of training examples: given the sentence fragment "my favorite food is a", you want to predict the next word, in this case "bagel"; given the sentence fragment, or sentence prefix, "my favorite food is a bagel", the next word would be "with"; and so on. Given a large training set of hundreds of billions, or sometimes even more, words, you can create a massive training set where you start off with part of a sentence or part of a piece of text and repeatedly ask the language model to learn to predict what the next word is.

Today there are broadly two major types of large language models. The first is the base LLM, and the second, which is what is increasingly used, is the instruction-tuned LLM. The base LLM repeatedly predicts the next word based on its text training data. So if I give it the prompt "once upon a time there was a unicorn", then it may, by repeatedly predicting one word at a time, come up with a completion that tells a story about a unicorn living in a magical forest with all its unicorn friends.
Now, the downside of this is that if you were to prompt it with "What is the capital of France?", it's quite plausible that on the internet there might be a list of quiz questions about France, so it may complete this with "What is France's largest city? What is France's population?" and so on. But what you really want is for it to tell you what the capital of France is, rather than list all these questions. The instruction-tuned LLM instead tries to follow instructions, and will hopefully say "The capital of France is Paris."

How do you go from a base LLM to an instruction-tuned LLM? This is what the process of training an instruction-tuned LLM like ChatGPT looks like. You first train a base LLM on a lot of data — hundreds of billions of words, maybe even more — and this is a process that can take months on a large supercomputing system. After you've trained the base LLM, you then further train the model by fine-tuning it on a smaller set of examples where the output follows an input instruction. For example, you may have contractors help you write many examples of an instruction paired with a good response to that instruction, and that creates a training set to carry out this additional fine-tuning, so the LLM learns to predict the next word when it's trying to follow an instruction.

After that, to improve the quality of the LLM's output, a common process now is to obtain human ratings of the quality of many different LLM outputs on criteria such as whether the output is helpful, honest, and harmless. You can then further tune the LLM to increase the probability of it generating the more highly rated outputs. The most common technique for this is RLHF, which stands for reinforcement learning from human feedback. Whereas training the base LLM can take months, the process of going from the base LLM to the instruction-tuned LLM can be done in maybe days, on a much more modest-sized dataset with much more modest computational resources.
So this is how you would use an LLM. You can import a few libraries. I'm going to load my OpenAI key here — I'll say a little bit more about this later in this video — and here's a helper function to get a completion given a prompt. If you have not yet installed the openai package on your computer, you might have to run `pip install openai`, but I already have it installed here, so I won't run that. Let me hit Shift-Enter to run these, and now I can set `response = get_completion("What is the capital of France?")`, and hopefully it will give me a good result.
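For reference, here is a minimal sketch of such a helper. It assumes the pre-1.0 openai Python package (the interface shown on screen in the course); the name get_completion matches the helper used in the video, but treat the details as illustrative:

```python
import openai  # pip install openai (pre-1.0 interface assumed here)

# Assumes openai.api_key has already been set — see the .env discussion later in this video.

def get_completion(prompt, model="gpt-3.5-turbo"):
    """Send a single user message and return the model's text reply."""
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0,  # low temperature keeps the output close to deterministic
    )
    return response.choices[0].message["content"]

response = get_completion("What is the capital of France?")
print(response)  # expected: something like "The capital of France is Paris."
```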
Now, in the description of the large language model so far, I talked about it as predicting one word at a time, but there's actually one more important technical detail. If you were to tell it, "Take the letters in the word lollipop and reverse them", this seems like an easy task — maybe a four-year-old could do it. But if you were to ask ChatGPT to do this, it actually outputs a somewhat garbled string: that is not "lollipop" with its letters reversed. So why is ChatGPT unable to do what seems like a relatively simple task? It turns out there's one more important detail of how a large language model works: it doesn't actually repeatedly predict the next word, it repeatedly predicts the next token. What the LLM actually does is take a sequence of characters, like "Learning new things is fun!", and group the characters together to form tokens that comprise commonly occurring sequences of characters. Here, each word in "Learning new things is fun!" is fairly common, so each token corresponds to one word, or one word with a space, or an exclamation mark. But if you give it input with somewhat less frequently used words, like "Prompting is a powerful developer tool", the word "prompting" is still not that common in the English language (though it's certainly gaining in popularity), and so "prompting" is actually broken down into three tokens — "prom", "pt", and "ing" — because those are commonly occurring sequences of letters.

And if you were to give it the word "lollipop", the tokenizer actually breaks it down into three tokens: "l", "oll", and "ipop". Because ChatGPT isn't seeing the individual letters, but instead seeing these three tokens, it's more difficult for it to correctly print out the letters in reverse order. So here's a trick you can use to fix this: if I add dashes between the letters (spaces would work too, as would other separators) and ask it to take the letters in "l-o-l-l-i-p-o-p" and reverse them, then it actually does a much better job: "p-o-p-i-l-l-o-l". The reason is that if you pass it "lollipop" with dashes between the letters, it tokenizes each character into an individual token, making it easier for it to see the individual letters and print them out in reverse order. So if you ever want to use ChatGPT to play a word game like Wordle or Scrabble or something, this nifty trick helps it to better see the individual letters of the words.
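As a quick illustration of the trick, reusing the get_completion helper sketched earlier (exact outputs will vary from run to run):

```python
# Whole words collapse into multi-letter tokens, so reversal often fails:
print(get_completion("Take the letters in lollipop and reverse them."))

# Dashes force roughly one token per letter, so reversal usually succeeds:
print(get_completion("Take the letters in l-o-l-l-i-p-o-p and reverse them."))
# expected: p-o-p-i-l-l-o-l
```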
For the English language, one token on average corresponds to roughly four characters, or about three-quarters of a word. Different large language models will often have different limits on the number of input plus output tokens they can accept. The input is often called the context, and the output is often called the completion. The model gpt-3.5-turbo, for example — the most commonly used ChatGPT model — has a limit of roughly 4,000 tokens across input plus output. If you try to feed in an input context that's much longer than this, it will actually throw an exception and generate an error.
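If you want to check an input's length before sending it, OpenAI's tiktoken library exposes the same tokenizer (this library isn't shown in the video; it's mentioned here as an assumption about available tooling):

```python
import tiktoken  # pip install tiktoken

def num_tokens(text, model="gpt-3.5-turbo"):
    """Count how many tokens `text` occupies under the given model's tokenizer."""
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

print(num_tokens("Learning new things is fun!"))  # a few tokens, roughly one per word
print(num_tokens("l-o-l-l-i-p-o-p"))              # many more tokens than "lollipop"
```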
Next, I want to share with you another powerful way to use the API, which involves specifying separate system, user, and assistant messages. Let me show you an example, and then we can explain in more detail what it's actually doing. Here's a new helper function called get_completion_from_messages, and when we prompt this LLM, we're going to give it multiple messages. Here's an example of what you can do. I'm going to specify first a message in the role of a system — so this is a system message — and the content of the system message is "You are an assistant who responds in the style of Dr. Seuss". Then I'm going to specify a user message: the role of the second message is "user", and its content is "Write me a very short poem about a happy carrot". Let's run that. With temperature equal to 1, I actually never know what's going to come out, but okay — that's a cool poem: "Oh, how jolly is this carrot that I see", and it actually rhymes pretty well. All right, well done, ChatGPT.
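A sketch consistent with what is run on screen, again assuming the pre-1.0 openai package and the openai import from the earlier snippet; get_completion_from_messages mirrors the helper named in the video:

```python
def get_completion_from_messages(messages, model="gpt-3.5-turbo", temperature=0):
    """Like get_completion, but takes a full list of role-tagged chat messages."""
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=temperature,
    )
    return response.choices[0].message["content"]

messages = [
    {"role": "system", "content": "You are an assistant who responds in the style of Dr. Seuss."},
    {"role": "user", "content": "Write me a very short poem about a happy carrot."},
]
print(get_completion_from_messages(messages, temperature=1))
```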
So in this example, the system message specifies the overall tone of what you want the large language model to do, and the user message is a specific instruction that you want it to carry out, given the higher-level behavior specified in the system message. Here's an illustration of how it all works: the system message sets the overall tone or behavior of the large language model, or the assistant, and then when you give a user message — such as "tell me a joke" or "write me a poem" — it will output an appropriate response that follows what you asked for in the user message and is consistent with the overall behavior set in the system message.
By the way, although I'm not illustrating it here, if you want to use this in a multi-turn conversation, you can also include assistant messages in this messages format, to let ChatGPT know what it had previously said, if you want to continue the conversation based on the things it said before.
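The video doesn't show the multi-turn case, but a plausible sketch looks like this: you replay the model's earlier replies as assistant messages so it can refer back to them (the conversation content here is invented for illustration):

```python
messages = [
    {"role": "system", "content": "You are a friendly chatbot."},
    {"role": "user", "content": "Hi, my name is Isa."},
    {"role": "assistant", "content": "Hi Isa! It's nice to meet you."},  # the model's previous turn, fed back in
    {"role": "user", "content": "Can you remind me what my name is?"},
]
print(get_completion_from_messages(messages, temperature=1))  # with the history present, it can answer "Isa"
```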
Here are a few more examples. If you want to constrain it to a one-sentence output, then in the system message I can say "All your responses must be one sentence long". When I execute this, it outputs a single sentence: it's no longer a poem, and not in the style of Dr. Seuss, but it is a single-sentence story about the happy carrot. And if we want to combine both — specifying the style and the length — then I can use the system message to say "You are an assistant who responds in the style of Dr. Seuss. All your responses must be one sentence long." And now this generates a nice one-sentence poem: the carrot was always smiling and never scary. I like that; that's a very happy poem.
Lastly, just for fun, if you're using an LLM and you want to know how many tokens you're using, here's a helper function that is a little bit more sophisticated: it gets a response from the OpenAI API endpoint and then uses other values in the response to tell you how many prompt tokens, completion tokens, and total tokens were used in your API call. Let me define that, and if I run it now, here's the response, and here's a count of how many tokens were used: this output had 55 tokens, whereas the prompt input had 37 tokens, so this used 92 tokens altogether. When I'm using LLMs in practice, I frankly don't worry that much about the number of tokens I'm using. Maybe one case where it's worth checking is if you're worried that the user might have given you an input so long that it exceeds the roughly 4,000-token limit of ChatGPT, in which case you could double-check how many tokens it was and truncate it to make sure you're staying within the input token limit of the large language model.
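A sketch consistent with the helper described here: the API response carries a usage field with exactly these three counts, so the function just passes them through (pre-1.0 openai package assumed, as before):

```python
def get_completion_and_token_count(messages, model="gpt-3.5-turbo", temperature=0):
    """Return the reply text plus the token counts reported by the API."""
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=temperature,
    )
    content = response.choices[0].message["content"]
    token_dict = {
        "prompt_tokens": response["usage"]["prompt_tokens"],
        "completion_tokens": response["usage"]["completion_tokens"],
        "total_tokens": response["usage"]["total_tokens"],
    }
    return content, token_dict
```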
Now I want to share one more tip for how to use a large language model. Calling the OpenAI API requires an API key that's tied to either a free or a paid account, and many developers will write the API key in plain text directly into the Jupyter notebook. This is a less secure way of handling API keys that I would not recommend, because it's just too easy to share the notebook with someone else, or check it into GitHub or somewhere, and end up leaking your API key. In contrast, what you saw me do in the Jupyter notebook was this piece of code, where I use the dotenv library, run load_dotenv and find_dotenv, and read a local file called .env that contains my secret key. So with this code snippet, I have locally stored a file called .env that contains my API key, and this loads it into the operating system's environment variables. Then os.getenv('OPENAI_API_KEY') stores it into a variable, and in this whole process I never have to enter the API key in unencrypted plain text into my Jupyter notebook. This is a relatively more secure and better way to access the API key, and in fact it's a general method for storing API keys for lots of different online services that you might want to call from a Jupyter notebook.
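A minimal sketch of that pattern, assuming the python-dotenv package and a local .env file containing a line such as OPENAI_API_KEY=sk-... (the key value itself is elided):

```python
import os
import openai
from dotenv import load_dotenv, find_dotenv  # pip install python-dotenv

_ = load_dotenv(find_dotenv())  # reads .env into the process's environment variables
openai.api_key = os.getenv("OPENAI_API_KEY")  # the key never appears in the notebook itself
```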
Lastly, I think the degree to which prompting is revolutionizing AI application development is still underappreciated. In the traditional supervised machine learning workflow — like the restaurant-review sentiment-classification example I touched on just now — if you want to build a classifier to label restaurant reviews as positive or negative sentiment, you first get a bunch of labeled data, maybe hundreds of examples; this might take, I don't know, weeks, maybe a month. Then you would train a model on the data; getting an appropriate open-source model, tuning it, and evaluating it might take days, weeks, maybe even a few months. Then you might have to find a cloud service to deploy it, get your model uploaded to the cloud, run the model, and finally be able to call your model — and it's, again, not uncommon for this to take a team a few months to get working.

In contrast, with prompting-based machine learning, when you have a text application, you can specify a prompt; this can take minutes, maybe hours, if you need to iterate a few times to get an effective prompt. Then in hours, maybe at most days, but frankly more often hours, you can have this running via API calls and start making calls to the model. Once you've done that, in, again, maybe minutes or hours, you can start calling the model and making inferences. So there are applications that used to take me maybe six months or a year to build that you can now build in minutes or hours, maybe a very small number of days, using prompting, and this is revolutionizing what AI applications can be built quickly.

One important caveat: this applies to many unstructured-data applications, including specifically text applications, and maybe increasingly vision applications, although the vision technology is much less mature right now — but it's getting there. This recipe doesn't really work for structured-data applications, meaning machine learning applications on tabular data with lots of numerical values, in Excel spreadsheets, say. But for the applications to which it does apply, the fact that AI components can be built so quickly is changing the workflow of how the entire system is built. Building the entire system might still take days or weeks or so, but at least this piece of it can be done much faster.

And so with that, let's go on to the next video, where Isa will show how to use these components to evaluate the input to a customer service assistant. This will be part of a bigger example that you'll see developed throughout this course, for building a customer service assistant for an online retailer.