Fine-Tuning Large Language Models, Part 2: Why Fine-Tune
Summary
TLDR: This lesson covers why you should fine-tune large language models (LLMs), what fine-tuning is, and how to compare a fine-tuned model against a non-fine-tuned model through hands-on experiments. Fine-tuning specializes a general-purpose model for a particular use case, such as chat or code autocompletion. Compared with prompt engineering, fine-tuning can take in far more data, correct previously learned misinformation, and reduce hallucinations. It also improves performance and consistency in a specific domain, strengthens privacy, lowers cost, and gives you more control. The lesson also introduces the tooling used for fine-tuning, including the PyTorch, Hugging Face, and Lamini libraries.
Takeaways
- 📚 Fine-tuning turns a general-purpose model (such as GPT-3) into a model for a specific use case, such as a chatbot or a code-autocompletion tool.
- 👨‍⚕️ Fine-tuning a model is like going from a primary care physician (the general-purpose model) to a cardiologist (the specialized model): you get much deeper domain expertise.
- 🧠 Fine-tuning lets the model learn from large amounts of data rather than merely access it, improving its performance and expertise.
- 💡 Fine-tuning helps the model produce more consistent outputs and behavior, and reduces hallucinations.
- 🚀 Compared with prompt engineering, fine-tuning can take in almost unlimited amounts of data and lets you correct misinformation the model learned earlier.
- 💼 Fine-tuning suits enterprise or domain-specific use cases and is a good fit for production.
- 🔒 Fine-tuning can happen in a private environment, helping prevent data leakage and protect privacy.
- 💰 Fine-tuning improves cost transparency; for heavily used models, it can lower the cost per request.
- ⏱️ Fine-tuning can reduce response latency, which matters for applications that need fast responses, such as autocomplete.
- 🛡️ Fine-tuning lets you add more guardrails to the model, such as custom responses and content moderation.
- 📚 Several toolchains can be used for fine-tuning, including the PyTorch, Hugging Face, and Lamini libraries.
- 📈 In practice, fine-tuned models clearly outperform non-fine-tuned models, giving more accurate and useful answers.
Q & A
Why fine-tune a model?
-Fine-tuning specializes a general-purpose model (such as GPT-3) for a particular use case, for example turning GPT-4 into GitHub Copilot for code autocompletion. It lets the model learn from more data, deepening its domain expertise and consistency, reducing misinformation, and better adapting to users' needs.
What is the difference between fine-tuning and prompt engineering?
-Prompt engineering steers the model toward a desired output through carefully crafted prompts; it needs no extra data and is good for getting started quickly and for prototyping. Fine-tuning trains the model on large amounts of task-specific data so it performs better on that task, but it requires more data and compute.
What are the advantages of a fine-tuned model?
-A fine-tuned model performs better, produces less misinformation, has deeper expertise in its domain, gives more consistent outputs, and can moderate content better. Fine-tuning can also happen in a private environment, which helps protect data privacy and prevent data leakage.
What are the potential downsides of fine-tuning?
-Fine-tuning requires a large amount of high-quality data, carries an upfront compute cost, and may need some technical knowledge to prepare and use the data correctly. The barrier to entry is higher than with simple prompt engineering.
Which use cases is a fine-tuned model suited for?
-Fine-tuned models suit enterprise or domain-specific use cases, especially when the model is used frequently or must handle many requests. Fine-tuning makes the model more specialized and stable, which fits production environments.
How is privacy protected during fine-tuning?
-Fine-tuning can run in a virtual private cloud (VPC) or on premises, preventing the data leakage and data breaches that off-the-shelf third-party solutions might cause.
How does fine-tuning help lower cost?
-Fine-tuning a smaller model can lower the cost per request, saving money at high usage volumes. A fine-tuned model also gives you greater control over cost, as well as uptime and latency.
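The per-request savings can be sketched with back-of-envelope arithmetic. All prices below are made-up placeholders for illustration, not real vendor pricing; substitute your own numbers:

```python
# Hypothetical per-1K-token prices -- illustrative only, not real pricing.
LARGE_API_PRICE = 0.03      # $ per 1K tokens, large hosted model
SMALL_TUNED_PRICE = 0.002   # $ per 1K tokens, self-hosted fine-tuned model

def monthly_cost(requests_per_day, tokens_per_request, price_per_1k):
    """Estimated monthly spend for a given traffic profile."""
    tokens_per_month = requests_per_day * 30 * tokens_per_request
    return tokens_per_month / 1000 * price_per_1k

large = monthly_cost(100_000, 500, LARGE_API_PRICE)
small = monthly_cost(100_000, 500, SMALL_TUNED_PRICE)
print(f"large model: ${large:,.0f}/mo, fine-tuned small model: ${small:,.0f}/mo")
# → large model: $45,000/mo, fine-tuned small model: $3,000/mo
```

At low traffic the upfront compute cost of fine-tuning dominates; the gap only favors fine-tuning once request volume is high, which matches the lesson's advice that fine-tuning pays off when you "hit the model a lot of times".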
Which tools and libraries can be used for fine-tuning?
-You can use a range of tools, including PyTorch, Hugging Face, and the Lamini (llama) library. PyTorch is the lowest-level interface, Hugging Face offers a higher-level interface on top of it, and Lamini provides a very high-level interface that can train a model with just a few lines of code.
How do you handle unwanted autocompletion when working with a model?
-You can add instruction tags to tell the model where the instruction starts and ends, so it stops autocompleting unrelated content.
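The instruction-tag idea can be sketched as a small helper. This assumes the `[INST]`/`[/INST]` delimiters used by Llama 2 chat models; other models may use different markers, and you can choose your own when fine-tuning your own model:

```python
def wrap_instruction(prompt: str) -> str:
    """Wrap a prompt in Llama 2 style instruction tags so the model
    treats it as a complete instruction rather than text to continue."""
    return f"[INST] {prompt} [/INST]"

tagged = wrap_instruction("Tell me how to train my dog to sit.")
print(tagged)
# → [INST] Tell me how to train my dog to sit. [/INST]
```

Without the tags, the model may keep autocompleting the prompt itself ("...to sit on command"); with them, it knows the instruction is complete and should be answered.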
What obvious differences show up when comparing fine-tuned and non-fine-tuned models?
-On a specific task, such as the instruction "tell me how to train my dog to sit", the fine-tuned model gives more detailed and accurate guidance. The non-fine-tuned model may fail to understand or respond correctly to the instruction.
How does a fine-tuned model perform in conversation?
-A fine-tuned model understands context and questions better in conversation, giving coherent and relevant answers. A non-fine-tuned model may fail to hold an effective conversation; its replies can be irrelevant or incoherent.
Outlines
📚 Understanding why AI models need fine-tuning
This section introduces the importance and basic concepts of fine-tuning. Fine-tuning turns a general-purpose model (such as GPT-3) into a model for a specific use case (such as a chatbot or GitHub Copilot) to improve its performance on that task. Through fine-tuning, the model learns from more data and becomes more specialized, much like going from a primary care physician to a specialist. Fine-tuning also helps the model give more consistent outputs, reduces misinformation, and lets it update its knowledge with new data.
🔍 Fine-tuning versus prompt engineering
This section discusses the differences between fine-tuning and prompt engineering and their respective pros and cons. Prompt engineering needs no extra data, costs little up front, and is easy to start with, making it good for rapid prototyping and generic use cases. Fine-tuning needs more high-quality data, has an upfront compute cost, and may require technical knowledge, but it can take in large amounts of data, correct misinformation, and suits enterprise or domain-specific use cases. Fine-tuning also improves the model's performance, consistency, and privacy while lowering cost and increasing control.
🧠 A hands-on comparison of fine-tuned and non-fine-tuned models
This section uses live coding examples to show how fine-tuned and non-fine-tuned models differ in practice. Comparing both models' answers to the same questions makes it clear that the fine-tuned model is better at understanding instructions, giving relevant answers, and holding a conversation. The fine-tuned model provides concrete steps for training a dog to sit, while the non-fine-tuned model gives no useful answer. The fine-tuned model also shows better understanding on the Mars question and the Taylor Swift search query.
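The lab's comparison pattern can be sketched without GPU access by stubbing the two models with canned responses. In the actual notebook, the stubs would be replaced by Lamini's model runner loaded with the base and chat variants of Llama 2; the canned strings below are invented to imitate the behavior described in the lesson, not real model output:

```python
def compare(prompts, base_model, chat_model):
    """Run the same prompts through both models and pair up the outputs."""
    return [
        {"prompt": p, "base": base_model(p), "chat": chat_model(p)}
        for p in prompts
    ]

# Stand-ins for real model calls: the base model keeps autocompleting
# the prompt, while the chat-tuned model actually answers it.
base = lambda p: p + " Tell me how to train my dog to stay."
chat = lambda p: "Sure! Step 1: hold a treat just above your dog's nose..."

results = compare(["Tell me how to train my dog to sit."], base, chat)
for r in results:
    print("base:", r["base"])
    print("chat:", r["chat"])
```

Running the same prompt list through both models side by side, as the notebook does, makes the behavioral gap obvious at a glance.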
Mindmap
Keywords
💡fine-tuning
💡prompt engineering
💡general-purpose models
💡specialized models
💡data
💡hallucinations
💡retrieval augmented generation (RAG)
💡cost
💡privacy
💡moderation
💡libraries
Highlights
In this lesson, you'll learn why you should fine-tune large language models (LLMs) and what fine-tuning really is.
Fine-tuning specializes a general-purpose model like GPT-3 for a specific use case, such as a chatbot or GitHub Copilot autocompleting code.
A fine-tuned model can take in far more data than fits in a prompt, learning from that data and improving itself.
Fine-tuning helps the model give more consistent outputs and behavior and reduces hallucinations.
Compared with prompt engineering, fine-tuning can take in almost unlimited amounts of data and lets you correct misinformation the model learned earlier.
After fine-tuning, the model performs better within its domain and can be more specialized and consistent.
Fine-tuning can happen in a private environment, helping prevent data leakage and data breaches.
Fine-tuning can lower cost and gives you greater control over cost, as well as uptime and latency.
Fine-tuning improves the model's ability to moderate content and lets you give it custom responses.
A fine-tuned model handles conversational turns better and gives more useful replies.
A fine-tuned model gives clear, specific guidance for a given instruction rather than simple repetition or irrelevant text.
A fine-tuned model answers queries more accurately, such as the question about Taylor Swift's best friend.
A fine-tuned model understands context better in simulated conversations and gives relevant replies.
Instruction tags can be used to control a fine-tuned model's output and avoid unwanted autocompletion.
When compared side by side, the fine-tuned model performs better on specific tasks than the non-fine-tuned model.
A fine-tuned model gives more detailed and useful step-by-step guidance based on the user's input.
A fine-tuned model provides more precise and relevant information for complex queries.
A fine-tuned model captures user intent better in conversation and gives more personalized replies.
Transcripts
in this lesson you'll get to learn why
you should fine tune what fine tuning
really even is compare it to prompt
engineering and go through a lab where
you get to compare a fine-tuned model to
a non-fine-tuned model
cool let's get started
all right so why should you fine-tune
llms
well before we jump into why let's talk
about what fine-tuning really is so what
fine tuning is is taking these general
purpose models like GPT-3 and
specializing them into something like
ChatGPT the specific chat use case to
make it chat well or using GPT-4 and
turning that into a specialized GitHub
Copilot use case to autocomplete code
an analogy I like to make is a PCP a
primary care physician is like your
general purpose model you go to your PCP
every year for a general checkup but a
fine tune or specialized model is like a
cardiologist or dermatologist a doctor
that has a specific specialty and can
actually take care of your heart
problems or skin problems in much more
depth so what fine tuning actually does
for your model is that it makes it
possible for you to give it a lot more
data than what fits into the prompt so
that your model can learn from that data
rather than just get access to it and
from that learning process it's able to
upgrade itself from that PCP into
something more specialized like a
dermatologist so you can see in this
figure you might have some symptoms that
you input into the model like skin
irritation redness itching and the base
model which is the general purpose model
might just say this is probably acne a
model that is fine-tuned on Dermatology
data however by taking the same symptoms
and be able to give you a much clearer
more specific diagnosis in addition to
learning new information
fine-tuning can also help steer the
model to more consistent outputs or more
consistent behavior for example you can
see the base model here when you ask it
what's your first name it might respond
with what's your last name because it's
seen so much survey data out there of
different questions so it doesn't even
know that it's supposed to answer that
question but a fine-tuned model by
contrast when you ask it what's your
first name would be able to respond
clearly my first name is Sharon this bot
was probably trained on Me In addition
to steering the model to more consistent
outputs or behavior fine tuning can help
the model reduce hallucinations which is
a common problem where the model makes
stuff up maybe it will say my first name
is Bob when this was trained on my data
and my name is definitely not Bob
overall fine tuning enables you to
customize the model to a specific use
case in the fine-tuning process which
will go into far more detail later is
very similar to the model's earlier
training recipe
so now to compare it with something that
you're probably a little bit more
familiar with which is prompt
engineering this is something that
you've already been doing for a while
with large language models but maybe
even for over the past decade with
Google which is just putting a query in
editing the query to change the results
that you see so there are a lot of Pros
to prompting one is that you really
don't need any data to get started you
can just start chatting with the model
there's a smaller upfront cost so you
don't really need to think about cost
since every single time you ping the
model it's not that expensive
and you don't really need technical
knowledge to get started you just need
to know how to send a text message
what's cool is that there are now
methods you can use such as retrieval
augmented generation or rag to connect
more of your data to it to selectively
choose what kind of data goes into the
prompt
now of course if you have more than a
little bit of data then it might not fit
into the prompt so you can't use that
much data oftentimes when you do try to
fit in a ton of data unfortunately it
will forget a lot of that data there are
issues with hallucination which is when
the model does make stuff up and it's
hard to correct that incorrect
information that it's already learned so
while using retrieval augmented
generation can be great to connect your
data it will also often miss the right
data get the incorrect data and cause
the model to Output the wrong thing fine
tuning is kind of the opposite of
prompting so you can actually fit in
almost an unlimited amount of data which
is nice because the model gets to learn
new information on that data as a result
you can correct that incorrect
information that it may have learned
before or even put in recent information
that it hadn't learned about previously
there's less cost afterwards if you do
fine tune a smaller model and this is
particularly relevant if you expect to
hit the model A lot of times so have a
lot of either throughput or you expect
it to just handle a larger load
and also retrieval augmented generation
can be used here too I think sometimes
people think it's a separate thing but
actually you can use it for both cases
so you can actually connect it with far
more data as well even after it's
learned all this information
there are cons however you need more
data and that data has to be higher
quality to get started there is an
upfront compute cost as well so it's not
free necessarily it's not just a couple
dollars just to get started of course
there are now free tools out there to
get started but there is compute
involved in making this happen far more
than just prompting
and oftentimes you need some technical
knowledge to get the data in the right
place
um and that that's especially you know
surrounding this data piece and you know
there are more and more tools now that's
making this far easier but you still
need some understanding of that data and
uh you don't you don't have to be just
anyone who can send a text message
necessarily
so finally what that means is for
prompting you know that's great for
generic use cases it's great for
different side projects and prototypes
it's great to just get started really
really fast meanwhile fine tuning is
great for more Enterprise or domain
specific use cases and for production
usage and we'll also talk about how it's
useful for privacy in this next section
which is the benefits of fine-tuning
your own llm so if you have your own llm
that you fine-tuned one benefit you get
is around performance so this can stop
the llm from making stuff up especially
around your domain it can have far more
expertise in that domain it can be far
more consistent so sometimes these
models will just produce you know
something really great today but then
tomorrow you hit it and it isn't
consistent anymore it's not giving you
that great output anymore and so this is
one way to actually make it far more
consistent and reliable and and you can
also have it be better at moderating if
you've played a lot with ChatGPT you
might have seen ChatGPT say
I'm sorry I I can't respond to that and
you can actually get it to say the same
thing or something different that's
related to your company or use case to
help the person chatting with it stay on
track and again so now I want to touch
on privacy when you fine-tune your own
llm this can happen in your VPC or on
premise this prevents data leakage and
data breaches that might happen on off
the shelf third-party Solutions and so
this is one way to keep that data safe
that you've been collecting for a while
that might be the last few days it might
be the last couple decades as well
another reason you might want to
fine-tune your own llm is around cost so
one is just cost transparency maybe you
have a lot of people using your model
and you actually want to lower the cost
per request then fine-tuning a smaller
llm can actually help you do that and
overall you have greater control over
cost and a couple other factors as well
that includes uptime and also latency
you can greatly reduce the latency for
certain applications like autocomplete
you might need latency that is sub 200
milliseconds so that it is not
perceivable by the person doing
autocomplete you probably don't want
autocomplete to happen across 30 seconds
which is currently the case with running
GPT-4 sometimes and finally in moderation
we talked about that a little bit here
already but basically if you want the
model to say I'm sorry to certain things
or to say I don't know to certain things
or even to have a custom response this
is one way to actually provide those
guard rails to the model and what's
really cool is you actually get to see
an example of that in the notebooks all
right so across all of these different
Labs you'll be using a lot of different
Technologies to fine-tune so there are
three Python libraries one is PyTorch
developed by meta this is the lowest
level interface that you'll see and then
there's a great library by Hugging Face
on top of PyTorch and a lot of the
great work that's been done and it's a
much higher level you can import data
sets and train models very easily and
then finally you'll see the Lamini
library which I've been developing with
my team and we call it the Llama library
for all the great llamas out there and
this is an even higher level interface
where you can train models with just
three lines of code all right so let's
hop over to the notebooks and see some
fine-tuned models in action
okay so we're going to compare a
fine-tuned model with a non-fine-tuned
model so first we're importing from the
Llama Library again this is from lamini
the basic model Runner and all this
class does is it helps us run open
source models so these are hosted open
source models on gpus to run run them
really efficiently and the first model
you can run here is the Llama 2 model
which is very popular right now and this
one is not fine-tuned so we're going to
just instantiate it based on this is its
Hugging Face name and we're gonna say
tell me how to train my dog to sit so
it's just you know really really simple
here into the non-fine-tuned model
we're going to get the output out
and let's print
non-tuned
output and see
okay so we asked it tell me how to train
my dog to sit it said period and then
tell me how to train my dog to stay tell
me how to teach my dog to come and tell
me how to get my dog to heel so clearly
this is very similar to the what's your
first name what's your last name answer
this model has not been told or trained
to actually respond to that command
so maybe a bit of a disaster but let's
keep looking so maybe we can ask it
um what do you think of Mars
so now you know at least it's responding
to the question but it's not great
responses I think it's a great Planet I
think it's a good Planet I think it'll
be a great Planet so it keeps going
um very philosophical potentially even
existential if you keep reading
all right what about something like a
Google search query like Taylor Swift's
best friend let's see what that actually
says
all right well uh it doesn't quite get
Taylor Swift's best friend but
um it did say that it's a huge Taylor
Swift fan
um
all right let's keep exploring maybe
something that's a conversation to see
if it can do turns in a conversation
like ChatGPT so this is uh an agent for
an Amazon delivery order
okay so uh at least it's doing the
different customer agent turns here uh
but it isn't quite getting getting
anything out of it this is not something
usable for any kind of like fake turns
or or help with making an auto agent all
right so you've seen enough of that
let's actually compare this to llama 2
that has been fine-tuned to actually
chat
so I'm going to instantiate the fine
tune model notice that this name all
that's different is this chat here and
then I'm going to let this fine tune
model
do the same thing so tell me how to
train my dog to sit I'm going to print
that
okay very interesting so you can
immediately tell a difference so tell me
how to train my dog to sit it's still
trying to auto complete that so tell me
how to train my dog to sit on command
but then it actually goes through almost
a step-by-step guide of what to do to
train my dog to sit
cool so that's much much better
and the way to actually quote unquote
get rid of this extra autocomplete thing
is actually to inform the model that you
want instructions so I'm actually
putting these instruction tags here this
was used for Llama 2 you can use
something different when you fine-tune
your own model but this helps with
telling the model hey these are my
instructions and these are the
boundaries I'm done with giving this
instruction stop stop continuing to give
me an instruction so here you can see
that it doesn't autocomplete that on
command thing and just to compare just
to be fair we can see what the
non-fine-tuned model actually says
um great it just repeats the same uh
same thing or something very similar so
um not quite right cool let's keep going
down so what do you think of Mars this
model
oh it's a fascinating planet let's
capture the imagination of humans for
centuries Okay cool so something that's
much better out uh output here what
about Taylor Swift's best friend let's
see how this does
okay this one's pretty cute it has a few
candidates for who uh Taylor Swift's
best friend actually is
let's take a look at these turns from
the Amazon Delivery Agent okay it says I
see can you provide me with your order
number this is much much better it's
interesting because I down here it also
summarizes what's going on
which you know may or may not be
something that you would want and that
would be something you can fine tune
away and now I'm curious what ChatGPT
would say for tell me how to train my
dog to sit
okay so it gives you know different
steps as well great all right feel free
to use ChatGPT or any other model to
to see what else they can each do uh and
compare the results but it's pretty
clear I think that the ones that have
been fine-tuned including ChatGPT and
this Llama 2 chat LLM they're clearly
better than the one that was not
fine-tuned
now in the next lesson we're going to
see where fine tuning fits in in the
whole training process so you'll get to
see the first step in in how to even get
here with this fine-tuned model