Determinism in the AI Tech Stack (LLMs): Temperature, Seeds, and Tools
Summary
TLDR This video focuses on the differences between the AI stack and the software stack, and in particular on the problem of determinism and non-determinism in large language models (LLMs). The AI tech stack is often non-deterministic and, compared with the software stack, less reliable. The video explores experiments and improvements that can raise determinism, testing an LLM's ability to solve a math problem, and introduces several techniques for improving determinism, such as adjusting the temperature parameter, improving prompts, and using seeds. Finally, it touches on the question of whether "hallucinations" in generative AI are a bug or a feature.
Takeaways
- 🧑‍💻 Compared the AI stack and the software stack in terms of determinism and non-determinism.
- 🔄 The AI tech stack is far less deterministic than the software stack: running the same code multiple times can produce different results.
- 🛠️ An LLM's determinism can be improved using vector databases, LLM orchestrators, and parameter changes.
- 📈 The AI tech stack (model providers) and the software stack (frontend developers) mix together, but are distinguished by non-deterministic versus deterministic outputs.
- 🧩 Experiments to improve LLM determinism include adjusting parameters, improving prompts, using seeds, and fine-tuning.
- 🔢 An equation-solving experiment evaluated LLMs' ability to solve math problems and compared their levels of determinism.
- 🌡️ Experiments confirmed that adjusting the temperature parameter affects an LLM's determinism.
- 📚 Using OpenAI's seed feature makes LLM outputs more deterministic.
- 🤖 A large foundation model such as GPT-4o can produce a deterministic, correct answer.
- 🔧 Combining the AI stack with the software stack yields more deterministic outputs.
Q & A
What is the difference between the AI stack and the software stack?
-The AI stack has to be thought of in terms of non-determinism versus determinism, whereas the software stack is reliable and deterministic: running the same code repeatedly yields the same result every time.
What can be done to improve an LLM's determinism?
-You can improve an LLM's determinism by using vector databases or LLM orchestrators, or by changing parameters.
How does the temperature parameter affect an LLM's output?
-Changing the temperature from 1 to 0 makes the LLM's output more deterministic, though the answer may still be wrong.
What differences are there between large foundation models and open-source models?
-A large foundation model (e.g., GPT-4o) can solve the math problem correctly, while an open-source model (e.g., Phi-3) can be made more deterministic but may still give an incorrect answer.
What do you gain by combining deterministic software with an LLM?
-Combining deterministic software with an LLM produces more deterministic output. Executing code generated by the Phi-3 model makes it possible to solve the equation.
What is the seed feature offered by OpenAI?
-The seed feature is a beta feature: if you specify the same seed value and parameters, the system makes a best effort to return the same result for repeated requests. Determinism is not guaranteed, however, and you should refer to the system_fingerprint response parameter to monitor backend changes.
Are "hallucinations" in generative AI a bug or a feature?
-In generative AI, non-deterministic elements like hallucinations are needed to create something new, but certain use cases always require accurate results.
How does improving the prompt affect an LLM's determinism?
-Prompt improvements such as adding context, grounding, system messages, seeds, and fine-tuning help increase an LLM's determinism.
What is the free ebook from sponsor HubSpot about?
-HubSpot's free ebook covers how AI is redefining startup GTM strategy, touching in particular on popular AI tools and best practices for scaling.
What are the advantages of combining the AI stack and the software stack?
-Combining the AI stack and the software stack not only yields more deterministic output, it also balances AI's creativity and diversity with software's reliability and determinism.
Outlines
🤖 Comparing the AI and software stacks and the potential to improve determinism
The video compares the AI stack and the software stack, discussing in particular how to improve determinism in large language models (LLMs). The AI stack is treated as non-deterministic, while the software stack is deterministic; the video explores through experiments whether the AI stack's determinism can nevertheless be improved. The video is sponsored by HubSpot.
🔢 Verifying determinism with Python code and comparing AI models
The video contrasts a deterministic answer computed with Python code against the Phi-3 model run with the temperature parameter set to 1 and to 0. At temperature 1 the answers varied; set to 0 they became consistent but remained inaccurate. This illustrates how changing the temperature parameter affects determinism.
🔧 Solving a math problem by combining AI with deterministic software
The video uses the Phi-3 model to attempt the math problem, exploring how setting the temperature to 0 raises determinism. It then solves the same problem with the large foundation model GPT-4o, and finally proposes combining the Phi-3 model with deterministic software to solve it.
🌱 Improving determinism with OpenAI's seed feature
The video introduces OpenAI's seed feature as a way to make LLM outputs more deterministic, verifying whether responses to the same prompt stay consistent when the seed parameter is used, and showing the feature's effectiveness as a new technique for obtaining more deterministic output.
🎯 Discussing LLM determinism and hallucinations
The video closes by considering LLM determinism and the role of hallucinations. It notes that LLMs' non-determinism contributes to their creativity, while certain use cases demand determinism, and concludes that we will have to see how these models evolve and how that balance is struck.
Keywords
💡AI Stack
💡Software Tech Stack
💡Non-determinism
💡Determinism
💡LLM (Large Language Models)
💡Temperature
💡Vector Database
💡LLM Orchestrator
💡Prompt
💡Seed
Highlights
Compared the AI stack with the software tech stack, exploring the concepts of determinism and non-determinism.
Proposed ways to improve the determinism of large language models (LLMs), including vector databases and LLM orchestrators.
Showed how changing the temperature parameter affects the determinism of LLM output.
Ran experiments testing how well an LLM solves a math problem at different temperature settings.
Discussed hallucinations in generative AI, and whether they are a bug or a feature.
Introduced sponsor HubSpot and its free ebook for startups.
Used Python code to demonstrate computing the equation deterministically.
Observed LLM output go from non-deterministic to deterministic by changing the temperature parameter.
Explained the role of the temperature parameter in LLMs and how it affects the creativity and determinism of the output.
Showed the GPT-4o model solving the same math problem correctly and deterministically.
Proposed combining the AI tech stack with the software tech stack to improve determinism.
Used the Phi-3 model to generate Python code, then executed it to solve the equation, showing the advantage of a mixed stack.
Introduced OpenAI's seed parameter for improving the determinism of LLM output.
Verified experimentally that the seed parameter keeps LLM output consistent.
Discussed LLM hallucinations, and how in some cases they may be a feature rather than a defect.
Summarized the video, emphasizing that although LLMs are inherently non-deterministic, there are ways to make their output more deterministic.
Transcripts
I want to do a bit of a different video today, so I want to talk about three topics: the AI stack versus the software tech stack; how we can improve the determinism of LLMs, with some experiments testing different things we can actually do to improve it; and a quick look at hallucinations — is that a bug or is it a feature in generative AI? So let's just get into it. This video was sponsored by HubSpot.

Okay, so this is my idea of how I look at the AI tech stack versus the software tech stack. You can see I have put down non-deterministic and deterministic, and that is how I think about the difference between these stacks. The software tech stack is easy and reliable, and it should be deterministic: if you run the same code 10 times, you expect to get the same result every time, right? That is kind of not the case when we are using the AI tech stack, especially when we are involving LLMs. There are things we can do to improve this, of course — a vector database, an LLM orchestrator, some parameters we can change — but basically, in my head, when I'm operating with the AI tech stack, I'm thinking more in this deterministic versus non-deterministic mode. That, I guess, is what's new to software devs working with this stack: it's not so easy to get the same results every time. So I wanted to dive a bit into how we can improve this. Can we? I'm not 100% sure, but there are some things we can do, and one of them is mixing these stacks together — I have some examples of that we're going to look at. If you want to pause here and read what I think about this, you can see we have the model providers on one side and the frontend dev on the software tech stack side, plus some key trends I put down: tool development, AI agents, emphasis on reduction and efficiency. These tech stacks of course mix together, but what sets them apart, for me, is the non-deterministic versus the deterministic outputs.

So I wanted to dig into that and see whether we can create some experiments that show we can actually improve the deterministic side of LLM outputs. You can see we have a math problem here — a kind of big equation — and if you have played around with these LLMs, you know they are not very good at solving math problems, especially when they get a bit complicated. Maybe the best foundation models have gotten quite good at it now, to be honest, but we are going to do some experiments with a very small open-source model: Phi-3. In the first test we're just going to run this equation, trying to solve it with the temperature (on the softmax) set to 1, and then we're going to move on to 0. I'll explain what the actual difference is between setting our temperature to 1 versus 0, and how it affects the determinism of the output. Then I think we'll run it on GPT-4o, a big foundation model, to see what kind of result we get, and we're going to do a combination of the stacks: we'll take the LLM side of the tech stack and mix it with some more deterministic software — maybe just straight-up Python code. There are other things we can do to improve determinism too: improve our prompts, add context and grounding, use a system message, use seeds (which are very interesting — that's from OpenAI), and fine-tuning. I'm going to skip fine-tuning for now, but it is also something we can do to improve our deterministic outputs when using LLMs. So the first thing I want to do is just produce a straight-up deterministic answer to this equation: we'll run some Python code to calculate it and look at the result.

But first, let's take a quick look at today's sponsor. Are you running a startup, or are you planning to start one? Do you know how AI is revolutionizing the startup world? To make this easier to understand, HubSpot has created a free ebook called "How AI Is Redefining Startup GTM Strategy". It's a must-read for any startup looking to harness the power of AI to skyrocket their growth. One section that really grabbed my attention was the "Most Popular AI Tools for GTM and Best Practices for Scaling" section. The book looks at how to build an AI-charged tech stack that can take your startup to new heights, and the subsection "What's in an AI-Charged Startup Tech Stack" breaks down the most game-changing AI tools out there, from ChatGPT to HubSpot's very own AI-powered suite. What sets this ebook apart is that it doesn't just list tools: it provides a clear, actionable framework for choosing the right AI solutions for your specific needs and integrating them seamlessly into your existing workflows. Imagine being able to automate repetitive tasks, personalize customer interaction at scale, and uncover hidden insights in your data, all with the power of AI. That's the kind of competitive edge this ebook can help you achieve. So if you are ready to take your startup strategy to the next level with AI, you don't want to miss out on this ebook — just click the link in the description to get your free copy today. A big thanks to HubSpot for sponsoring this video. Now let's get back to the project.

Okay, so here we are operating on the traditional software stack: we want to calculate this equation using Python. We just start by importing math, set up the expression, run it, and print the result. When we run it now, we get 1116.95, and every time we run it we are not expecting a different result — we want to get this every single time, and that is the deterministic rule. But let's see what happens when we use our Phi-3 model, with the temperature set to 1, to try to do the same. Will we get the same result every time? Let's take a look.

Okay, so here we have a simple function that runs through Ollama. We set our model to Phi-3, we have a system message, and we have a user query argument. Let's change our temperature to 1, and down here we run it with Phi-3 at temperature 1. We feed in the system message ("You are a helpful AI assistant"), we feed in our equation as the prompt, run it, and print the answer. So let's see: when we run this a few times, will we get the same result every single time? Running it the first time — you can see it takes a bit more time because we are running on our local computer, but it's a pretty fast model. The first result we got was 72.34. That was horribly wrong; it tried to do some calculations here but failed big time. Let's run it again — remember, 72.34. Now we got 117.0. One more time: now we got 147.450. So this is all over the place; we can't really use it for anything, because it's way too unreliable.

But let's see what changes when we take the temperature from 1 down to 0. Set it to 0, clear the output, and run again. The first run we got 1115.9 — that is not correct, but still, let's see if we get the same result again. Now 1115.89, again 1115.89, and one more time: 1115.89. So you can see that at least now we got a deterministic output: almost exactly the same answer every single time. What a big difference that made, just changing our temperature from 1 to 0. The answer is wrong, but at least we got some consistency in our output; it wasn't all over the place. So let me explain a bit about what is actually happening when we change the temperature from 1 to 0.

The best explanation I have seen for temperature and softmax is this video from 3Blue1Brown. I'm going to leave a link to it in the description — you should watch it if you're into transformers, deep learning and such; it's a great video explaining everything that happens inside an LLM. If you look at his example, you can see one case where the temperature is set to 0 and another where the temperature is set to 5, both completing the prompt "once upon a time there was a". In the first case, with temperature 0, 100% of the time we're going to select the word "little", from a probability standpoint. But in the other case the probability of which word comes next is spread out: "little" has a 6% chance of being selected, but "young" also has a 6% chance, and "great" a 4% chance. So here almost any word can pop up, because the probability mass is so widely spread. This temperature setting is what makes LLMs more creative — more diverse writing, if you want to think of it like that. But that's not always a good thing: if we want to do some math, we don't want random things happening in the output. So for math equations and the like, you'd better set the temperature quite low if you want some kind of determinism. But listen to this for a bit; I think it's very interesting:

"In some situations, like when ChatGPT is using this distribution to create a next word, there's room for a little bit of extra fun by adding a little extra spice into this function, with a constant T thrown into the denominator of those exponents. We call it the temperature, since it vaguely resembles the role of temperature in certain thermodynamics equations. The effect is that when T is larger, you give more weight to the lower values, meaning the distribution is a little bit more uniform, and if T is smaller, then the bigger values will dominate more aggressively."

So you can see: when he increases the temperature, the probability of any given token being selected is more spread out, but when we decrease the temperature, the biggest ones always get selected.

"In the extreme, setting T equal to zero means all of the weight goes to that maximum value. For example, I'll have GPT-3 generate a story with the seed text 'once upon a time there was a', but I'm going to use different temperatures in each case. Temperature zero means that it always goes with the most predictable word, and what you get ends up being kind of a trite derivative of Goldilocks."

That's what I was talking about: it can be very generic, very boring, and very close to the training data. But when we turn the temperature up, it can be more creative. Let's hear what he has to say about that:

"A higher temperature gives it a chance to choose less likely words, but it comes with a risk. In this case, the story starts out a bit more originally, about a young web artist from South Korea, but it quickly degenerates into nonsense."

So here there was too much spread: sometimes the output doesn't even make sense, because the model is so far from selecting the word that makes sense. And this is interesting:

"Technically speaking, the API doesn't actually let you pick a temperature bigger than 2. There is no mathematical reason for this; it's just an arbitrary constraint imposed, I suppose, to keep their tool from being seen generating things that are too nonsensical. If you're curious, the way this animation is actually working is I'm taking the 20 most probable next tokens that GPT-3 generates, which seems to be the maximum they'll give me, and then I tweak the probabilities based on an exponent of 1/5."

So I think we get it — a very good explanation, with a great visual component. Go check out this video; it's in the description.
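The scaling described above can be reproduced in a few lines. This is a minimal sketch, not the video's code, and the logits are invented for illustration: dividing by a larger T flattens the distribution ("more spread out"), while T → 0 collapses all the probability onto the top token — greedy, deterministic decoding.

```python
import math

def softmax_with_temperature(logits, t):
    """Softmax over logits / t; small t sharpens the distribution, large t flattens it."""
    if t == 0:
        # Limit case: all probability mass goes to the maximum logit (greedy decoding).
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    exps = [math.exp(x / t) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]                       # hypothetical next-token scores
cold = softmax_with_temperature(logits, 0)     # deterministic: always the top token
warm = softmax_with_temperature(logits, 1.0)   # normal sampling distribution
hot = softmax_with_temperature(logits, 5.0)    # nearly uniform: any token can pop up
```

Note that T = 0 is handled as a limit, which matches how "temperature zero" behaves in practice: always pick the most probable token.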
Okay, so now we know what the temperature is. Let's look at what we have so far: we ran the test with Phi-3 at temperature 1 and got different results all over the place; we set it down to 0 and got the same result every single time, but it was wrong — this here is the correct answer. Now let's move on to running a very new, big foundation model and see if GPT-4o can solve this, and then use Phi-3 again with some deterministic software added to help it.

So let's do the GPT-4o test. We swap the model out for GPT-4o — no Python code, just the same setup — and see if we can solve this without any extra help. The temperature should be set to 0, because we know that's the best way to do this. Clear the output and run it. You can see the answer is 1116.95 — that is correct, so good job by the GPT-4o model. Let's run it one more time just to be sure; remember we have set the temperature to 0, so we should get exactly the same output. Yeah, perfect. That means with the bigger foundation models we can actually solve this equation by running it through the AI tech stack alone, without adding any additional tools.

So GPT-4o solved it correctly, but now let's take a look at our final test. This is going to be the Phi-3 model, which didn't solve it, but we'll add in some extra help by combining it with some deterministic software. Let me show you how I set that up. To solve this, we do it a bit differently: we start by creating a code prompt. We feed our equation into the prompt — question is set equal to our prompt, which is the equation plus "from the question above, generate a Python code to solve the equation". We run this through Ollama chat to generate the Python code, and here we switch up our system message: "You are an expert at writing correct, executable Python code. Don't use any extra explanation or text; write pure, executable Python code to solve the problem." So we generate Python code for the equation, clean it up with a simple function (just to remove some stuff and make it executable), and then we have a function that actually executes our Python code and prints the result. Remember, we are still running the Phi-3 model. We could set the temperature to 0, but let's try it with 1 first, then change it back to 0 and see if that makes any difference. Hopefully this gives us a more deterministic output and actually solves the equation.

Okay, let's run this with the temperature set to 1. The result is correct: you can see the generated code imported math, computed the result from our equation, and we just ran it — Phi-3 combined with Python code. Perfect. I'm pretty sure that if we switch the temperature to 0 now we'll also get it correct, but let's do it anyway: set it to 0 and run again. The generated code changes a bit — actually, it looks pretty similar to be honest; I guess we got print(result) here instead, but other than that it's the same. So here you can see the advantage of mixing the traditional software stack with the AI stack — the LLMs — and using these models as tools to get a more deterministic output. When we combine Phi-3 with a tool, we could actually solve this, because we used the Phi-3 model only to generate the code we needed, and then we just executed that code.

Just to sum up: using the deterministic approach — plain Python code — we solved this easily. With the temperature set to 1 on the Phi-3 model we couldn't solve it, and the same at temperature 0. When we ran the bigger foundation model, we could solve the equation. And then we could take the open-source Phi-3 model, add some deterministic software — just a simple Python script — feed in our equation, and actually solve it.
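The mixed-stack pattern just described — let the LLM write the code, then let deterministic software run it — can be sketched roughly like this. The Ollama/Phi-3 call is stubbed out with a canned response here, and clean_code / execute_code are my guesses at what the video's helper functions do, not its exact code:

```python
import re

def clean_code(text):
    """Strip the markdown fences a model often wraps around generated code."""
    match = re.search(r"```(?:python)?\s*(.*?)```", text, re.DOTALL)
    return match.group(1).strip() if match else text.strip()

def execute_code(code):
    """Run the generated snippet and return whatever it stored in `result`."""
    namespace = {}
    exec(code, namespace)  # fine for a demo; don't exec untrusted model output in production
    return namespace.get("result")

# Stand-in for the Ollama chat call; a hypothetical model response with a
# placeholder equation (the video's actual equation isn't shown in the transcript).
generated = """```python
import math
result = 2 * math.pi * 3  # placeholder equation, not the video's
```"""

print(execute_code(clean_code(generated)))  # deterministic: same value on every run
```

The point of the pattern is that any non-determinism is confined to the code-generation step; once the snippet exists, executing it is as repeatable as any other software.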
Finally, I wanted to take a quick look at some other things we can do to improve determinism. I think we'll just look at the seed part now — maybe we'll do some other stuff with context in another video; I don't want to drag this out to 45 minutes. Let's take a quick look at the OpenAI solution: they have something called seeds, so let's go over and see how we can actually use it.

If you go into the API docs for the OpenAI chat completion, you can see an option called seed. "This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend." So this is their attempt at improving determinism in these LLM outputs. Let's see how it works — does it work, and how does it compare to a regular chat completion request?

I went ahead and created an implementation of this. We have one function that includes the seed parameter (seed=seed), fed in as an argument — we'll set our seed value to a random number, 69 — and a second function that does not use the seed. Both temperatures are set to 1, and we feed in the same prompt: "What is the best pizza in New York, like with a lot of cheese? Only list the one top place." We run this as two requests, one with the seed and one without, and see whether the first answer is more deterministic than the second. Let's fire it up and see if it works.

Let me run this a few times, and then we can compare the results to see any differences between using the seed and not. I think we had some good results. The first answer is always the one where we use the seed: "The best pizza in New York is often cited to be from Di Fara Pizza in Brooklyn." And here are the requests without the seed: "The best pizza in New York is often said to be...", "...is often considered...", "...is often attributed to Di Fara Pizza...", "...is widely regarded...". So the unseeded request produces a different result every single time, but where we use the seed, we get the same output on every request. It seems to be working — but, like they said, it's not a guarantee. My thinking is that if we expand the prompt and ask for a long result, this is probably not going to hold up; we'll probably check that some other time. But for this simple answer it produces the same output every single time, so at least it's something we can do to improve our deterministic output when using LLMs, and it could be an interesting feature to try going forward.
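The mechanism the seed parameter leans on can be illustrated with plain Python: sampling from a distribution stays random at temperature 1, but pinning the random number generator's seed makes every draw — and therefore the whole output — repeatable. This is a toy sketch, not OpenAI's implementation, and the word probabilities are invented:

```python
import random

# Toy next-word distribution; the numbers are made up for illustration.
probs = {"little": 0.06, "young": 0.06, "great": 0.04, "old": 0.84}

def sample_sequence(seed, n=10):
    """Sample n words with a dedicated RNG; the seed pins down every draw."""
    rng = random.Random(seed)
    words = list(probs)
    weights = list(probs.values())
    return [rng.choices(words, weights=weights, k=1)[0] for _ in range(n)]

# Same seed -> identical sequence, even though each draw is still "random".
assert sample_sequence(69) == sample_sequence(69)
# A fresh, unseeded run would usually diverge, like the unseeded API requests.
```

In the real API the caveat is that determinism also depends on the backend staying identical, which is why OpenAI tells you to watch the system_fingerprint field.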
Okay, just to wrap this up: I think we proved today that we can do something to improve the deterministic outputs of an LLM. But, as you probably already know, these models are not designed to be deterministic: the way they use the neural network is to come up with a probabilistic answer for the next sentence, the next word, the next token. By nature these are not tools created to be deterministic, because they are generative. If they were 100% deterministic, what new stuff could they create? Could they create an image, could they create a new kind of text? I don't think so — that would just be reciting the training data — and without that hallucination part, I don't think these LLMs would be that interesting. Of course, some use cases rely heavily on being able to reproduce the same output and be precise. So I think the big companies are working on this — like OpenAI with the seed parameter — to make these models much more deterministic. We can add context, like we said, and we can add different tools, so going forward I think we're going to see more of these tools pop up to get the outputs we are expecting. We've seen it with RAG: adding grounding, improving the prompt, system messages — all of these tools exist to reduce hallucinations. But if you think about it, is it a bug or is it a feature? I'm leaning more towards the view that hallucinations have to be there if anything interesting is going to come out of these models — but in some cases, like we saw with Google last week, it's not going to work every single time. We just have to wait and see what happens in this space when it comes to these problems with hallucinations.

That's just my take on it — a bit of a different video today. Hope you enjoyed it. Other than that, thank you for tuning in, and I'll see you again on Wednesday.