Determinism in the AI Tech Stack (LLMs): Temperature, Seeds, and Tools

All About AI
2 Jun 2024 · 22:59

Summary

TLDR This video focuses on the differences between the AI stack and the software stack, looking in particular at determinism and non-determinism in large language models (LLMs). The AI tech stack is often non-deterministic and therefore less reliable than the software stack. The video explores experiments and fixes that might increase determinism and tests an LLM's ability to solve a math problem. It also walks through several techniques for improving determinism, such as adjusting the temperature parameter, improving prompts, and using seeds. Finally, it touches on whether "hallucination" in generative AI is a bug or a feature.

Takeaways

  • 🧑‍💻 Compared the AI stack and the software stack in terms of determinism and non-determinism.
  • 🔄 The AI tech stack is more non-deterministic than the software stack: running the same code several times can produce different results.
  • 🛠️ An LLM's determinism can be improved using vector databases, LLM orchestrators, and parameter changes.
  • 📈 The AI tech stack and the software stack, spanning model providers and frontend developers, blend together, but are distinguished by deterministic versus non-deterministic outputs.
  • 🧩 Ran experiments to improve LLM determinism, including parameter tuning, prompt improvements, the use of seeds, and fine-tuning.
  • 🔢 Used an equation-solving experiment to evaluate an LLM's ability to solve math problems and to compare levels of determinism.
  • 🌡️ Confirmed experimentally that adjusting the temperature parameter affects an LLM's determinism.
  • 📚 Using OpenAI's seed feature makes LLM output more deterministic.
  • 🤖 A large foundation model such as GPT-4o can produce a deterministic, correct answer.
  • 🔧 Combining the AI stack with the software stack produces more deterministic output.

Q & A

  • What is the difference between the AI stack and the software stack?

    - The AI stack has to be thought about in terms of non-determinism and determinism, whereas the software stack is reliable and deterministic: running the same code many times yields the same result.

  • What can be done to improve the determinism of an LLM?

    - You can improve an LLM's determinism by using vector databases or LLM orchestrators, or by changing parameters.

  • How does the temperature parameter affect an LLM's output?

    - Changing the temperature from 1 to 0 makes the LLM's output more deterministic, although the answer may still be wrong.

  • What differences are there between large foundation models and open-source models?

    - A large foundation model (e.g., GPT-4o) can solve the math problem correctly, whereas an open-source model (e.g., Phi-3) can be made more deterministic but may still give an incorrect answer.

  • What do you gain by combining deterministic software with an LLM?

    - Combining deterministic software with an LLM yields more deterministic output. By executing code generated by the Phi-3 model, the equation can actually be solved.

  • What is the seed feature that OpenAI provides?

    - The seed feature is a beta feature in which specifying the same seed value and parameters makes the system try its best to return the same result for repeated requests. Determinism is not guaranteed, however, and you should refer to the system_fingerprint response parameter to monitor backend changes.

  • Is "hallucination" in generative AI a bug or a feature?

    - In generative AI, a non-deterministic element like hallucination is needed to create anything new, but certain use cases always require exact results.

  • How do prompt improvements affect an LLM's determinism?

    - Prompt improvements such as adding context, grounding, using a system message, seeds, and fine-tuning help increase an LLM's determinism.

  • What is the free ebook offered by sponsor HubSpot about?

    - HubSpot's free ebook covers how AI is redefining startup GTM strategy, touching in particular on the most popular AI tools and best practices for scaling.

  • What are the advantages of combining the AI stack and the software stack?

    - Combining the AI stack with the software stack not only yields more deterministic output but also balances AI's creativity and diversity with software's reliability and determinism.

Outlines

00:00

🤖 Comparing the AI and software stacks and the potential for improving determinism

The video compares the AI stack with the software stack and discusses improving determinism, particularly in large language models (LLMs). The AI stack is treated as non-deterministic, while the software stack is deterministic; through experiments, the video explores whether determinism can be raised even within the AI stack. The video is sponsored by HubSpot.

05:01

🔢 Verifying determinism with Python code and comparing AI models

The video contrasts a deterministic answer computed with Python code against the Phi model run with the temperature parameter set to 1 and then 0. At temperature 1 the answers varied; set to 0 they became consistent but remained inaccurate. The segment explains how changing the temperature parameter affects determinism.

10:03

🔧 Solving a math problem by combining AI with deterministic software

The video uses the Phi-3 model to attack the math problem, setting the temperature to 0 to increase determinism. It then solves the same problem with the large foundation model GPT-4o, and proposes combining the Phi-3 model with deterministic software to solve it.

15:04

🌱 Improving determinism with OpenAI's seed feature

The video shows how OpenAI's seed feature can improve the determinism of LLM output, checking whether responses to the same prompt stay consistent when a seed parameter is used and demonstrating the seed feature's value as a technique for obtaining more deterministic output.

20:04

🎯 Discussing LLM determinism and hallucinations

The video closes by reflecting on LLM determinism and the role of hallucinations. It argues that LLMs' non-determinism contributes to their creativity, while certain use cases demand determinism, and concludes that we will have to watch how these models evolve and how that balance gets struck.

Keywords

💡AI Stack

The AI stack is the set of technologies and tools needed to build systems that use artificial intelligence. The video compares the AI tech stack with the software tech stack and explores how they differ, centering the discussion on non-determinism versus determinism.

💡Software Tech Stack

The software tech stack is the combination of tools and technologies used in software development, characterized by determinism: running the same code always produces the same result. The video contrasts it with the AI tech stack, emphasizing its reliability and consistency.

💡Non-determinism

Non-determinism is the property of producing different results even when the same operation is repeated. The video treats as a problem the fact that in the AI tech stack, especially with large language models (LLMs), running the same code repeatedly can yield different results.

💡Determinism

Determinism is the property of producing the same result whenever an operation is repeated. The video discusses improving determinism in the AI tech stack, compared against the determinism of the software tech stack.

💡LLM (Large Language Models)

LLMs are language models trained on very large datasets that perform strongly on natural-language-processing tasks. The video explains the non-determinism of LLM output and the experiments run to improve it.

💡Temperature

The temperature parameter is a hyperparameter that controls the flatness of the probability distribution and is used to tune the diversity and creativity of LLM output. The video shows how changing the temperature from 1 to 0 improves the determinism of LLM output.

💡Vector Database

A vector database is a database that can efficiently store and search data points in a high-dimensional space. The video mentions vector databases as a tool that helps improve determinism in the AI tech stack.

💡LLM Orchestrator

An LLM orchestrator is a tool or system for effectively managing and coordinating language models. The video notes that LLM orchestrators help improve determinism in the AI tech stack.

💡Prompt

A prompt is the text or question fed to an LLM, and it strongly influences the model's output. The video mentions prompt improvement as one means of increasing LLM determinism.

💡Seed

A seed is an initial value used to obtain a specific outcome from a stochastic process. The video shows how using a seed with the OpenAI API can improve the determinism of LLM output.

Highlights

A comparison of the AI stack and the software tech stack, exploring the concepts of determinism and non-determinism.

Proposed ways to improve the determinism of large language models (LLMs), including vector databases and LLM orchestrators.

Showed how changing the temperature parameter affects the determinism of LLM output.

Ran experiments testing how well an LLM solves a math problem at different temperature settings.

Discussed the hallucination problem in generative AI and whether it is a bug or a feature.

Introduced sponsor HubSpot and its free ebook for startups.

Used Python code to demonstrate a deterministic way of computing the equation.

Observed LLM output go from non-deterministic to deterministic by changing the temperature parameter.

Explained the role of the temperature parameter in LLMs and how it affects the creativity and determinism of the output.

Showed the correctness and determinism of the GPT-4o model when solving the same math problem.

Proposed combining the AI tech stack with the software tech stack to improve determinism.

Used the Phi-3 model to generate Python code, then executed it to solve the equation, showing the advantage of a mixed stack.

Introduced the seed parameter offered by OpenAI for improving the determinism of LLM output.

Verified experimentally the seed parameter's effect on keeping LLM output consistent.

Discussed LLM hallucinations and how, in some cases, they may be a feature rather than a flaw.

Summarized the video, emphasizing that although LLMs are inherently non-deterministic, there are ways to improve the determinism of their output.

Transcripts

00:00

I want to do a bit of a different video today, so I want to talk about these three topics: the AI stack versus the software tech stack; how we can improve the determinism of LLMs, where I want to run some experiments testing different things we can actually do; and a quick look at hallucinations, asking whether they are a bug or a feature in generative AI. So let's just get into it. This video was sponsored by HubSpot.

Okay, so this is roughly how I look at the AI tech stack versus the software tech stack. You can see I have labeled them non-deterministic and deterministic, because that is how I think about the difference between these stacks. The software tech stack is straightforward and reliable, and it should be deterministic: if you run the same code 10 times, you expect the same result every time. That is generally not the case with the AI tech stack, especially when LLMs are involved. There are things we can do to improve this, of course: a vector database, an LLM orchestrator, some parameters we can change. But basically, when I'm operating with the AI tech stack, I'm thinking in this deterministic versus non-deterministic mode, and that, I guess, is what's new to software devs working with this stack, because it's not so easy to get the same results every time. So I wanted to dive into how we can improve this. Can we? I'm not 100% sure, but there are some things we can do, and one of them is mixing the stacks together; we have some examples of that coming up. If you want to pause here and read through my view of this, you can see we have the model providers, we have the frontend dev over on the software tech stack, and some key trends I put down: tool development, AI agents, an emphasis on reduction and efficiency. These tech stacks of course mix together, but what sets them apart, for me, is the non-deterministic versus the deterministic outputs. So I wanted to dig into that and see whether we can create some experiments showing that we can actually improve the deterministic side of LLM outputs.

You can see we have a math problem here, a fairly big equation, and if you have played around with these LLMs you know they are not very good at solving math problems, especially once they get a bit complicated. The best foundation models have honestly gotten quite good at it by now, but we are going to run these experiments with a very small open-source model, Phi-3. For the first test we're going to run this equation and try to solve it with the temperature (the softmax temperature) set to 1, and then move on to 0. I'll explain what actually changes when we set the temperature to 1 versus 0 and how that affects the determinism of the output. Then we'll run it through GPT-4o to see what result a big foundation model gives, and finally we'll combine the stacks: take the LLM side of the tech stack and mix it with some more deterministic software, maybe just straight-up Python code. There are other things we can do to improve determinism too: improve our prompts, add context and grounding, use a system message; we're going to look at seeds, which is a very interesting feature from OpenAI; and there's fine-tuning, which I'll skip today, but it is also a way to make LLM outputs more deterministic. So the first thing I want to do is produce a straight-up deterministic answer to this equation: we're going to run some Python code to calculate it and look at the result.

04:07

But first, let's take a quick look at today's sponsor. Are you running a startup? Are you planning to start one? Do you know how AI is revolutionizing the startup world? To make this easier to understand, HubSpot has created a free ebook called "How AI Is Redefining Startup GTM Strategy". It's a must-read for any startup looking to harness the power of AI to skyrocket its growth. One section that really grabbed my attention was the one on the most popular AI tools for GTM and best practices for scaling. The book looks at how to build an AI-charged tech stack that can take your startup to new heights, and the subsection on what's in an AI-charged startup tech stack breaks down the most game-changing AI tools out there, from ChatGPT to HubSpot's very own AI-powered suite. What sets this ebook apart is that it doesn't just list tools; it provides a clear, actionable framework for choosing the right AI solutions for your specific needs and integrating them seamlessly into your existing workflows. Imagine being able to automate repetitive tasks, personalize customer interaction at scale, and uncover hidden insights in your data, all with the power of AI. That's the kind of competitive edge this ebook can help you achieve. So if you are ready to take your startup strategy to the next level with AI, you don't want to miss out on this ebook; just click the link in the description to get your free copy today. A big thanks to HubSpot for sponsoring this video. Now let's get back to the project.

05:27

Okay, so here we are operating on the traditional software stack. We want to calculate this equation using Python. We start by importing math, we set up the equation, we run it, and we print the result. When we run this we get 1116.95, and no matter how many times we run it we are not expecting a different result: we want this exact value every single time, and that is what deterministic means here.
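
The exact equation never appears in the transcript, so here is a minimal sketch of that deterministic baseline with a hypothetical stand-in expression; the point holds for any pure computation, and in the video the printed value was 1116.95.

```python
import math

# Hypothetical stand-in for the video's equation, which isn't shown
# in the transcript; any pure arithmetic expression behaves the same way.
result = math.sqrt(3**7 + 5**4) * 2 + 10 / 4

# Prints the identical value on every run: the deterministic baseline.
print(result)
```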

06:01

But let's see what happens when we use our Phi model, with the temperature set to 1, to try to do this. Will we get the same result every time? Let's take a look. Here we have a simple function that runs through Ollama. We set our model to phi3, we have a system message, and we have a user query argument. So let's set our temperature to 1 and go down here: we're running this with phi3 at temperature 1. We feed in the system message "You are a helpful AI assistant", we feed in our equation as the prompt, we run that, and we print the answer.
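
As a rough sketch of that call, assuming the ollama Python package; the exact script isn't shown on screen, so the helper name and prompt wiring here are my reconstruction:

```python
import ollama

def ask_phi3(query: str, temperature: float) -> str:
    # Chat with the local phi3 model through Ollama; the temperature
    # is passed via the options dict.
    response = ollama.chat(
        model="phi3",
        messages=[
            {"role": "system", "content": "You are a helpful AI assistant."},
            {"role": "user", "content": query},
        ],
        options={"temperature": temperature},
    )
    return response["message"]["content"]

# The equation isn't shown in the transcript, so this prompt is a placeholder.
print(ask_phi3("Solve this equation: <the video's equation>", temperature=1))
```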

06:42

Now let's see, when we run this a few times, whether we get the same result every single time. Running it the first time, you can see it takes a bit longer because we are running on our local computer, but it's a pretty fast model. The first result we got was 72.34, which was horribly wrong; it tried to do some calculations but failed big time. Let's run it again, remembering 72.34: this time we got 117.0. One more time: now we got 147.45. So this is all over the place; we can't really use it for anything because it's way too unreliable. But let's see what changes when we actually bring the temperature from 1 down to 0.

07:39

So let's go in here, set the temperature to 0, and see if this improves anything. Clear that and run it again. The first run gave 1115.9, which is not correct, but still, let's see if we get the same result again: now 1115.8, now 1115.89, and again 1115.89. So at least we got a deterministic output; we got almost exactly the same output every single time, and then we got it once more. You can see what a big difference it made just changing the temperature from 1 to 0: the answer is wrong, but at least we got some consistency in the output; it wasn't all over the place. So let me explain a bit about what exactly happens when we change the temperature from 1 to 0.

08:38

The best explanation I have seen of temperature and softmax is this video from 3Blue1Brown. I'll leave a link in the description; you should watch it if you're into transformers and deep learning, because it's a great explanation of everything happening inside an LLM. In his example you can see one case where the temperature is set to 0 and another where it is set to 5, both continuing the text "once upon a time there was a" at the two different temperatures. In the first case, with temperature 0, the word "little" is selected 100% of the time from a probability standpoint. In the other case the probability of which word comes next is spread out: "little" has a 6% chance of being selected, but "young" also has a 6% chance and "great princess" has 4%, so almost any word can pop up because the distribution is spread so widely. This temperature setting is what makes LLMs more creative, with more diverse writing if you want to think of it like that, but that's not always a good thing. Say we want to do some math: we don't want random things happening in the output. So for math equations and the like, you'd better set the temperature quite low if you want some kind of determinism. But listen to this for a bit; I think it's very interesting: "In some situations, like when ChatGPT is using this distribution to create a next word, there's room for a little bit of extra fun by adding a little extra spice into this function, with a constant T thrown into the denominator of those exponents. We call it the temperature, since it vaguely resembles the role of temperature in certain thermodynamics equations. And the effect is that when T is larger, you give more weight to the lower values, meaning the distribution is a little bit more uniform; and if T is smaller, then the bigger values will dominate more aggressively." So you can see: when he increases the temperature, the probability of any given token being selected is more spread out, but when we decrease the temperature, the biggest values are selected nearly always. "In the extreme, setting T equal to zero means all of the weight goes to that maximum value. For example, I'll have GPT-3 generate a story with the seed text 'once upon a time there was a', but I'm going to use different temperatures in each case. Temperature zero means that it always goes with the most predictable word, and what you get ends up being kind of a trite derivative of Goldilocks." That is what I was talking about: it can be very generic, very boring, and very close to the training data, but when we turn the temperature up it can be more creative. Let's hear what he says about that: "A higher temperature gives it a chance to choose less likely words, but it comes with a risk. In this case, the story starts out a bit more originally, about a young web artist from South Korea, but it quickly degenerates into nonsense." So there the spread was too wide; sometimes the output doesn't even make sense, because it strays so far from selecting the word that fits. And this is interesting: "Technically speaking, the API doesn't actually let you pick a temperature bigger than 2. There is no mathematical reason for this; it's just an arbitrary constraint imposed, I suppose, to keep their tool from being seen generating things that are too nonsensical. If you're curious, the way this animation actually works is that I'm taking the 20 most probable next tokens that GPT-3 generates, which seems to be the maximum they'll give me, and then I tweak the probabilities based on an exponent of 1/T." I think we get it. A very good explanation, with a great visual component, so go check out that video; it's in the description.
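
To make the mechanism concrete, here is a small sketch of temperature-scaled softmax, p_i = exp(z_i / T) / sum_j exp(z_j / T); the logits are made up, but it shows how a high T flattens the distribution and a low T concentrates it on the largest logit:

```python
import numpy as np

def softmax_with_temperature(logits: np.ndarray, T: float) -> np.ndarray:
    # p_i = exp(z_i / T) / sum_j exp(z_j / T). In the T -> 0 limit, all
    # probability mass collapses onto the largest logit (greedy decoding).
    if T == 0:
        probs = np.zeros_like(logits, dtype=float)
        probs[np.argmax(logits)] = 1.0
        return probs
    z = (logits - logits.max()) / T  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.5, 1.0, 0.2])  # made-up next-token logits
for T in (5.0, 1.0, 0.0):
    print(T, softmax_with_temperature(logits, T).round(3))
# T=5.0 is close to uniform, T=1.0 favors the top token,
# and T=0.0 picks it deterministically.
```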

12:43

Okay, so now we know what the temperature is; let's look at what we have so far. In the test with Phi-3 at temperature 1 the results were all over the place; when we set it down to 0 we got the same result every single time, but it was wrong (this is the correct answer). Now let's move on to a very new, big foundation model and see whether this GPT-4o model can solve it; after that we'll use Phi-3 again and add some deterministic software to help it. So let's do the GPT-4o test: swap the model out for GPT-4o, no Python code, otherwise the same setup, and run it to see whether we can solve this without any extra help. The temperature should be set to 0, because we know that's the best way to do this. Clear it and run: you can see the answer is 1116.95, which is correct, so good job by the GPT-4o model. Let's run it one more time just to be sure; remember the temperature is set to 0, so we should get exactly the same output. Yes, perfect. That means with the bigger foundation models we can actually solve this equation just by running it through the AI tech stack, without adding any additional tools. GPT-4o solved this correctly.
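
A minimal sketch of that GPT-4o call with the temperature pinned to 0, assuming the official openai Python client; the model name matches the video, but the prompt wiring is my reconstruction:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Solve this equation: <the video's equation>"},
    ],
    temperature=0,  # favor the most likely tokens for repeatable output
)
print(response.choices[0].message.content)
```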

14:22

But now let's take a look at our final test. This is going to be the Phi-3 model, which didn't solve this on its own, but we'll add in some extra help by combining it with some deterministic software, so let me show you how I set that up. To solve this we go about it a bit differently: we start by creating a code prompt, feeding our equation into it, so question equals our prompt, which is the equation, followed by "From the question above, generate Python code to solve the equation." We run this through Ollama chat to try to generate Python code, and we switch up our system message: "You are an expert at writing correct, executable Python code. Don't use any extra explanation or text; write pure executable Python code to solve the problem." So we generate Python code to try to solve this equation, we clean it up with a simple function, just removing some stuff to make it executable, then we have a function that actually executes our Python code, and finally we print the result. Remember, we are still running the Phi-3 model. We can set the temperature to 0, but let's try it with 1 first and then change it back to 0 and see if that makes any difference. Hopefully this gives us a more deterministic output and hopefully solves the equation.

play15:55

the equation okay so let's run this so

play15:58

this is with the temperature set to one

play16:00

right uh yeah so the result is correct

play16:03

so you can see we imported mat we run

play16:06

the result and we have the equation here

play16:09

and we just run this using pi 3 uh

play16:12

combined with a python code and yeah

play16:14

perfect so that was correct uh so I'm

play16:18

pretty sure like if we switch up the

play16:19

temperature to zero now we also going to

play16:21

get this correct but let's just do it so

play16:23

let's say that to zero let's run it

play16:27

again uh we might actually change up the

play16:31

input a bit Yeah you can

play16:33

see uh it's a bit different isn't

play16:37

it uh I don't know it looks pretty

play16:39

similar to be honest I guess we got

play16:41

print result here instead uh but other

play16:44

than that yeah so here you can kind of

play16:47

see the advantage of adding like a

play16:50

mixing up like with the traditional

play16:52

software stack mixing that with the AI

play16:54

stack the llms and using this as tools

play16:57

to kind of get a more deterministic

play17:00

output so when we combine 53 with a tool

play17:04

we could actually solve this right

play17:06

because we used the fight3 model to just

play17:08

generate the code we needed and then we

play17:10

can just execute on that code right so

play17:12

just to sum this up you can see uh with

play17:15

using the deterministic just the python

play17:17

code we solve this easy right uh with

play17:19

the temperature set to one using the 53

play17:21

model we couldn't Solve IT same here

play17:23

with the zero temperature but when we

play17:25

run the bigger Foundation model we can

play17:27

solve the equation but then we can use

play17:29

the F tree model open source model and

play17:32

add some deterministic software like

play17:34

just a simple python code feed in our

play17:36

equation and then we actually solve it

17:39

Finally, I wanted to take a quick look at some other things we can do to improve determinism. I think we'll just look at the seed part now; maybe we'll do some other things with context in another video, since I don't want to drag this out to 45 minutes. Let's take a quick look at OpenAI's solution to this: they have something called seeds, so let's go over and see how to use it. In the API docs for the OpenAI chat completion, you can see an option called seed: "This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend." So this is their attempt at improving determinism in LLM outputs. Now let's look at how it works, whether it works, and how it compares to a regular chat completion request. I went ahead and created an implementation of this: we have a function that includes the seed parameter, so seed equals seed, fed in as an argument; we set our seed value to just a random number, 69; and we have a second function that does not use the seed. Both temperatures are set to 1, and we feed in the same prompt: "What is the best pizza in New York, like with a lot of cheese? Only list the one top place." We run this as two requests, one with the seed and one without, and check whether the first answer is more deterministic than the second. So let's fire this up and see if it works.

play19:31

yeah if it works okay so let me just run

play19:33

this a few times now and then we can

play19:35

kind of compare the result to see if we

play19:37

can see any differences in using the

play19:40

seed and not using the seed okay so I

play19:42

think we had some good results here so

play19:44

you can see the first one here is always

play19:46

where we use the seed so the best pizza

play19:48

in New York is often cited to be from

play19:50

the far Pizza in Brooklyn and you can

play19:53

see here is just uh the other uh request

play19:56

without the seed the best pizza New York

play19:59

is often said to be the best P New York

play20:02

is often considered the best PSI New

play20:04

York is often attributed to the far piz

play20:08

size Brooklyn's wiely regarded so this

play20:10

produces a different result every single

play20:13

time but you can see where we use the

play20:15

seed we get the same output every single

play20:19

request here so it seems to be working

play20:22

but like they said it's not a guarantee

play20:24

and my thinking is if we expand the

play20:27

prompt we want like a WR long result

play20:29

this is probably not going to hold up

play20:31

but we probably going to check that some

play20:33

other time but for this simple answer

play20:35

here it looks to produce the same output

play20:38

every single time so at least that's

play20:41

something we can do to kind of improve

play20:44

our deterministic output using llms

play20:46

right so this could be an interesting

play20:48

feature to try out going forward right

play20:52

okay just to wrap this up uh I think we

play20:54

kind of pro today that we can do

play20:56

something to improve the terministic out

play20:58

puts of an llm but probably as you

play21:01

already know that these are not designed

play21:04

to actually be deterministic the way

play21:06

they calculate or the way they use the

play21:10

neural network to kind of come up with a

play21:12

a probabilistic answer or the next

play21:15

sentence or the next word or the next

play21:17

token so by Nature these are not tools

play21:21

created to be deterministic because they

play21:23

are generative uh if they were 100%

play21:27

deterministic then

play21:29

yeah what new stuff could they create

play21:31

could they create an image could they

play21:32

create a new kind of text uh I don't

play21:35

think so that would be just reciting the

play21:37

training data and without that

play21:41

hallucination part of it I don't think

play21:43

these llms would be that interesting but

play21:46

of course some use cases are heavily

play21:49

reliant on that we can reproduce the

play21:51

same output and be precise right so I

play21:54

think these uh big companies are working

play21:56

like with open Ai and the seed part they

play21:59

are working on making these models much

play22:01

more deterministic we can add context

play22:04

like we said we can add different tools

play22:06

so going forward I think we're going to

play22:08

see these tools pop up more to kind of

play22:11

get the outputs we are expecting we seen

play22:13

this with rag R adding grounding

play22:15

improving at prompt system messages all

play22:18

of these tools are to reduce

play22:21

hallucinations right but if you think

play22:23

about it is it a bug is it a feature I'm

play22:26

leaning more towards that this

play22:28

hallucinations has to be there if it's

play22:31

going to be something interesting coming

play22:33

out of these models but in some cases

play22:35

like we saw with Google last week uh

play22:38

it's not going to work every single time

play22:41

uh but yeah we just have to wait and see

play22:43

what happens in this space when it comes

play22:45

to these problems with hallucinations

play22:47

and stuff but uh yeah that is just kind

play22:50

of my take on it a bit of a different

play22:52

video today hope you enjoyed it uh other

play22:54

than that thank you for tuning in and

play22:56

I'll see you again on Wednesday
