GPT-4o Mini First Impressions: Fast, Cheap, & Dang Good.

MattVidPro AI
18 Jul 202416:42

Summary

TLDROpenAI发布了新的GPT 40 Mini模型,旨在替代GPT 3.5,成为成本效益更高的小型模型。该模型在多语言处理和图像识别方面表现出色,支持并行多模型调用和快速处理大量文本。尽管其性能不及GPT 4 Omni,但价格更低廉,适用于广泛的AI应用。同时,GPT 40 Mini是首个应用新指令层级方法的模型,提高了安全性和可靠性。

Takeaways

  • 📈 7月15日,Open AI发布了一个名为GPT 40 Mini的新模型,这是一个新的公共模型。
  • 🚀 GPT 40 Mini旨在成为成本效益最高的小型模型,用以替代GPT 3.5,并为免费版Chat GPT提供支持。
  • 💰 GPT 40 Mini的输入令牌价格为每百万15美分,输出令牌价格为每百万60美分,比之前的前沿模型便宜60%。
  • 🏆 GPT 40 Mini在MLU(机器语言理解)测试中得分82%,表现优于原始的GPT 4。
  • 🔍 该模型支持并行多模型调用、处理大量上下文、代码库对话历史或与客户支持互动等用例。
  • 🌐 GPT 40 Mini还支持视觉功能,未来还将支持音频输入和输出。
  • 📚 GPT 40 Mini是第一个应用新指令层次结构方法的模型,这有助于提高模型抵抗越狱、提示注入和系统提示提取的能力。
  • 📅 关于其他模型和功能,Open AI在聊天GPT应用中提供了更新,预计在7月下旬开始对高级语音模式进行Alpha测试。
  • 🔮 预计GPT 5可能不会在今年发布,而是可能在明年3月左右发布。
  • 🌟 Sora是Open AI的另一个项目,最近在YouTube上发布了更多相关内容,预计今年晚些时候可能会有公开发布。
  • 📊 GPT 40 Mini在图像识别和复杂问题处理方面表现出色,尽管与GPT 4 Omni相比仍有差距,但性价比很高。

Q & A

  • 什么是GPT 40 Mini?

    -GPT 40 Mini是Open AI发布的一种新型的公共模型。它是一种成本效益高的小型模型,旨在取代GPT 3.5,并为不需要GPT 4 Omni或GPT 4 Turbo等高级智能水平的应用提供支持。

  • GPT 40 Mini的主要特点是什么?

    -GPT 40 Mini的主要特点是成本低廉、速度快。它在MLU(机器语言理解)测试中得分为82%,表现优异,并且在某些基准测试中超过了原始的GPT 4。此外,它还支持多模态功能,如视觉和音频输入输出。

  • GPT 40 Mini的定价如何?

    -GPT 40 Mini的定价非常低廉,每百万输入令牌仅15美分,每百万输出令牌60美分,比之前的前沿模型便宜60%,也比GPT 3.5 Turbo便宜。

  • GPT 40 Mini在哪些应用场景中表现良好?

    -GPT 40 Mini适用于需要并行多个模型调用、处理大量上下文、代码库对话历史或与客户支持交互等场景。它还支持视觉功能,未来还将支持音频输入输出。

  • GPT 40 Mini的上下文窗口有多大?

    -GPT 40 Mini的上下文窗口为128,000个令牌,这比一些最先进的模型稍小,但对于许多任务来说仍然足够。

  • GPT 40 Mini是否支持非英语文本处理?

    -是的,GPT 40 Mini能够以更具成本效益的方式处理非英语文本,类似于原始的GPT 4 Omni。

  • GPT 40 Mini是否采用了新的安全措施?

    -是的,GPT 40 Mini是第一个应用Open AI新指令层级方法的模型,这有助于提高模型抵抗越狱、提示注入和系统提示提取的能力,从而为商业应用提供更可靠的响应和更安全的用例。

  • 关于GPT 4 Omni和Sora的最新动态是什么?

    -GPT 4 Omni的语音模式将在7月下旬开始对一小部分Plus用户进行测试,并计划在秋季向所有用户推出。至于Sora,Open AI在其YouTube频道上发布了越来越多的相关内容,预计今年晚些时候可能会有某种形式的公开发布。

  • GPT 40 Mini在图像识别方面的表现如何?

    -GPT 40 Mini在图像识别方面表现出色,能够详细描述图像内容,尽管与GPT 4 Omni相比,细节可能略少,但总体上能够准确识别图像并提供相关描述。

  • GPT 40 Mini在处理复杂问题时的表现如何?

    -GPT 40 Mini在处理复杂问题时表现出良好的推理能力。例如,当被问及如果一颗子弹垂直射击与另一颗子弹从手中掉落,哪个会先落地时,它能够正确推断出在没有空气阻力的情况下,两颗子弹会同时落地。

Outlines

00:00

🚀 开放AI发布新模型GPT 40 mini

开放AI最近发布了一个名为GPT 40 mini的新模型,尽管这不是大家期待的GPT 5或Sora。GPT 40 mini是一个成本效益高的小型模型,旨在替代GPT 3.5,并为不需要GPT 4 Omni级别智能的应用提供支持。该模型在多语言理解和处理大量上下文方面表现出色,且价格低廉,输入和输出令牌的费用分别为每百万15美分和60美分。GPT 40 mini在MLU测试中得分82%,表现优于原始的GPT 4。此外,该模型还引入了新的指令层次方法,以提高模型的安全性和可靠性。尽管GPT 40 mini在某些基准测试中表现优异,但在数学Vista测试中略逊于Gemini Flash。

05:01

🔍 GPT 40 mini的初步印象和测试

GPT 40 mini模型在聊天GPT网站上应向所有用户开放,包括免费用户。尽管目前仅在API中可用,但可以通过API进行测试。初步测试显示,GPT 40 mini在生成创意和处理复杂问题方面表现出色,例如解释一个关于垂直射击子弹的物理问题。此外,GPT 40 mini在处理系统提示和模拟邪恶AI角色时也表现出了一致性和可靠性。尽管在某些情况下,如处理双关语或幽默时,其表现不如GPT 4 Omni,但整体上GPT 40 mini在成本效益和速度方面具有显著优势。

10:04

🖼️ GPT 40 mini的多模态能力测试

GPT 40 mini展示了其多模态能力,能够处理图像输入并描述图像内容。测试中,GPT 40 mini能够准确描述一个卡通柠檬角色的图像,尽管在细节上不如GPT 4 Omni。此外,GPT 40 mini还能够解释一个关于项目管理和创意生成的梗图,尽管它未能完全捕捉到梗图中的幽默元素。在处理图表和数据时,GPT 40 mini能够提供基本的解释,但不如GPT 4 Omni那样详细和深入。总体而言,GPT 40 mini在图像识别和多模态交互方面表现出了一定的能力,但与更高级的模型相比仍有差距。

15:05

🔚 GPT 40 mini总结与未来展望

总结来看,GPT 40 mini是一个成本效益高、速度快且可靠的模型,适合开发者使用。尽管它在某些高级功能和深度理解方面不如GPT 4 Omni,但其在图像识别和多模态交互方面的表现令人印象深刻。此外,GPT 40 mini的发布也引发了对未来AI模型的期待,包括GPT 5的发布和Sora的公开发布。尽管目前GPT 4 Omni的语音模式和图像生成功能尚未公开,但预计这些功能将在不久的将来推出。

Mindmap

Keywords

💡GPT 40 Mini

GPT 40 Mini 是由 Open AI 发布的一种新型人工智能模型。它被设计为成本效益高、体积小的模型,旨在替代 GPT 3.5。这个模型在视频的主题中扮演着重要角色,因为它代表了 AI 技术在成本效益和应用范围上的新进展。例如,视频中提到 GPT 40 Mini 是 Open AI 平台上新出现的模型代号,且其成本仅为 GPT 3.5 turbo 的 60%。

💡成本效益

成本效益是指在成本和效益之间找到最佳平衡点,使资源使用最大化。在视频中,GPT 40 Mini 被强调为具有高成本效益的模型,意味着它在保持性能的同时,降低了使用成本。这在 AI 领域尤为重要,因为它使得更广泛的应用能够负担得起先进的 AI 技术。例如,GPT 40 Mini 的输入和输出令牌的成本分别为每百万 15 美分和 60 美分。

💡多模态能力

多模态能力指的是系统能够处理和理解多种不同类型的数据输入,如文本、图像、声音等。在视频中,GPT 40 Mini 被提到支持视觉输入,这表明它具备处理图像的能力。这种能力使得 AI 模型能够更全面地理解和响应用户的需求,增强了其应用的灵活性和广泛性。

💡系统提示

系统提示是指 AI 模型在生成回复时遵循的预设规则或指令。在视频中,GPT 40 Mini 被测试其对系统提示的反应,以评估其在不同情境下的表现。例如,当被要求表现出“邪恶 AI”的特征时,GPT 40 Mini 的回复反映了其对这种提示的响应能力,显示了 AI 模型在遵循预设指令方面的灵活性。

💡图像识别

图像识别是指 AI 系统识别和理解图像内容的能力。在视频中,GPT 40 Mini 被用来描述一张图片,展示了其图像识别能力。这种能力对于 AI 模型在处理视觉数据时尤为重要,因为它使得模型能够理解和回应与图像相关的查询。例如,GPT 40 Mini 成功描述了一个卡通柠檬角色的图片,显示了其在图像识别方面的应用潜力。

💡基准测试

基准测试是一种评估系统性能的方法,通常通过比较不同系统在特定任务上的表现来进行。在视频中,GPT 40 Mini 的性能通过与其他 AI 模型的基准测试进行评估。例如,GPT 40 Mini 在 mlu(机器理解力)测试中得分为 82%,显示了其在语言理解方面的能力。

💡上下文窗口

上下文窗口是指 AI 模型在处理输入时能够考虑的最大信息量。在视频中,GPT 40 Mini 的上下文窗口为 128,000 令牌,这影响了它处理长文本的能力。上下文窗口的大小对于 AI 模型理解和生成连贯、相关回复的能力至关重要。

💡非英语文本

非英语文本指的是除英语之外的其他语言的文本。在视频中,GPT 40 Mini 被提到能够以更经济的方式处理非英语文本,这表明它具备多语言处理能力。这种能力使得 AI 模型能够服务于更广泛的用户群体,增强其在全球范围内的应用潜力。

💡指令层次方法

指令层次方法是一种用于提高 AI 模型安全性的技术,通过限制模型对某些指令的响应来防止恶意行为。在视频中,GPT 40 Mini 是第一个应用这种方法的模型,这有助于提高其在商业应用中的可靠性和安全性。例如,这种方法可以防止模型被诱导执行不当操作或泄露系统提示。

💡GPT 4 Omni

GPT 4 Omni 是 Open AI 的一种更高级的 AI 模型,被认为在功能和性能上优于 GPT 40 Mini。在视频中,GPT 4 Omni 被用来与 GPT 40 Mini 进行比较,展示了不同模型在处理复杂任务时的差异。例如,GPT 4 Omni 在图像识别和理解复杂问题方面表现出更高的能力。

Highlights

OpenAI发布了一个新模型GPT 40 mini,不是GPT 5或Sora。

GPT 40 mini是OpenAI最经济高效的小型模型,旨在取代GPT 3.5。

GPT 40 mini是免费版Chat GPT的驱动模型。

GPT 40 mini在MLU得分为82%,超越了原始的GPT。

GPT 40 mini每百万输入令牌仅15美分,每百万输出令牌60美分,比之前的前沿模型便宜60%。

GPT 40 mini支持并行多模型调用和大量上下文处理。

GPT 40 mini支持视觉输入,未来还将支持音频输入和输出。

GPT 40 mini的上下文窗口为128,000令牌,适合多种任务。

GPT 40 mini在非英语文本处理上更具成本效益。

GPT 40 mini在基准测试中表现优异,除了在Math Vista上略逊于Gemini Flash。

GPT 40 mini是第一个应用新指令层级方法的模型,增强了抵抗越狱、提示注入和系统提示提取的能力。

GPT 40 mini的生成速度非常快,适合需要快速响应的应用。

GPT 40 mini可以通过API使用,尽管在Chat GPT网站上尚未更新。

GPT 40 mini在图像识别和解释幽默方面表现出色,尽管不如GPT 4 Omni。

GPT 40 mini在解释图表和自我评估方面存在局限,不如GPT 4 Omni。

GPT 40 mini的发布显示了OpenAI在保持竞争力方面的努力。

预计GPT 5可能在明年发布,而Sora的公开发布也备受期待。

Transcripts

play00:00

hey folks open AI just released a new

play00:02

public model and no it's not GPT 5 no

play00:06

it's not Sora no it's not the open AI

play00:09

voice mode and chat GPT it is a

play00:11

completely new for thing called GPT 40

play00:15

mini and I know it might be

play00:16

disappointing for a lot of you guys that

play00:18

those other things by open AI are not

play00:21

yet released but we do have some updates

play00:23

on potentially when we could be seeing

play00:25

those other things and this new GPT 40

play00:28

mini I honestly think it's pretty cool

play00:30

let's talk about the root of this new

play00:32

model just a few days ago on July 15th

play00:35

tore here on Twitter noticed that there

play00:38

is GPT July test it was a new model code

play00:42

name showing up on the configured list

play00:44

of known models on the open AI platform

play00:47

and Tommy Quang here on July 15th was

play00:50

absolutely right it was the upcoming GPT

play00:52

mini and of course as expected GPT 40

play00:55

mini has a little blog post by open AI

play00:59

this is their most cost-efficient small

play01:01

model it is meant to replace GPT 3.5 and

play01:05

this is the model that powers the free

play01:07

version of chat GPT and it Powers the

play01:10

use cases for these generative large

play01:13

language models that don't necessarily

play01:15

require the level of intelligence that

play01:17

you get with gp4 Omni GPT 40 turbo or

play01:21

just gp4 in general so it's not meant to

play01:23

compete at those levels but it is meant

play01:25

to be very cheap and very fast the

play01:28

attempt here is to significantly expand

play01:30

the range of applications built with AI

play01:33

by making intelligence much much more

play01:35

affordable this model currently scores

play01:37

an 82% on mlu which is pretty impressive

play01:40

and currently outperforms the original

play01:42

gp4 on chat preferences lmis leaderboard

play01:46

it's super cheap at only 15 cents per

play01:49

million input tokens and 60 cents per

play01:52

million output tokens and as they

play01:55

mentioned it's an order of magnitude

play01:57

more affordable than previous Frontier

play01:59

models in 60% cheaper than GPT 3.5 turbo

play02:03

open AI notes some pretty specific use

play02:05

cases that this model would be very good

play02:07

for such as parallel multiple model

play02:10

calls for example calling multiple apis

play02:13

at once passing large volumes of context

play02:16

directly into a model and processing it

play02:18

very quickly codebase conversation

play02:21

history or interacting with customer

play02:23

support essentially so a support chat

play02:26

bot it also does support Vision as well

play02:29

which is really interesting to see and

play02:31

audio inputs and outputs are also coming

play02:32

in the future so it does have those

play02:34

other features that gp4 Omni has

play02:37

supposedly still don't have access to

play02:39

those features ourselves mind you but I

play02:41

do have an update on when we might be

play02:42

able to see them in that larger gp4 Omni

play02:45

model context window is only 128,000

play02:48

tokens which is I think a little bit

play02:49

behind The Cutting Edge of like claw and

play02:52

stuff but still decent enough for a lot

play02:55

of tasks and it also handles non-english

play02:57

text at a more cost effective rate R

play03:00

similar to the original gp4 Omni so in

play03:02

terms of benchmarks here this thing is

play03:05

definitely no slouch you could see it

play03:07

beating pretty much every single other

play03:09

model in the stack except for gp40 the

play03:12

full Big Daddy Kahuna uh the only

play03:15

Benchmark I see it actually losing on

play03:17

here is math Vista by just a few points

play03:21

behind Gemini flash but yeah it

play03:23

single-handedly takes out 3.5 turbo and

play03:25

Claude Haiku every single time I do

play03:28

think it's important to note though that

play03:29

Claude 3.5 Haiku hasn't yet released and

play03:33

Claude already released a 3.5 Sonet

play03:35

model so we can expect that a 3.5 haou

play03:38

that competes with gp4 Omni mini is

play03:41

coming in the near future and of course

play03:43

open ai's typical note about safety

play03:47

measures is also in this blog post but

play03:49

there is one thing that I do want to

play03:51

point out and this little image was

play03:52

pulled from a member of my Discord

play03:54

server GPT 40 mini is actually the first

play03:58

model to apply their new instruction

play04:00

hierarchy method which helps improve the

play04:02

model's ability to resist jailbreaks

play04:04

prompt injections and system prompt

play04:07

extractions so essentially effectively

play04:09

if you are a business you're getting

play04:11

more reliable responses and safer use

play04:14

case for those commercial applications

play04:16

of course for those of you who like to

play04:18

jailbreak Ai and have fun with it though

play04:20

maybe not the best thing we'll see if

play04:21

people can uh get past this uh

play04:23

instruction hierarchy method so folks

play04:26

about those other models and features in

play04:28

regards to open AI I did post this on

play04:31

Twitter a little bit ago and this

play04:33

screenshot comes from my Discord server

play04:36

and they're actually giving us a little

play04:37

bit of an update inside the chat GPT app

play04:40

at least on Android on advanced voice

play04:42

mode which of course is the main feature

play04:44

that was demoed about gp4 Omni that we

play04:46

were all so hyped for we are taking

play04:48

additional time in quotes to reach our

play04:51

bar for a launch and we'll begin the

play04:53

alpha with a small group of plus users

play04:55

in Late July so actually voice mode is

play04:59

coming in late July to some degree and

play05:01

by the Fall time I don't know what that

play05:02

means August September October even all

play05:06

users will have access to this at least

play05:08

we have a better clearer timeline than

play05:10

they previously gave us still a little

play05:12

disappointing I think I will utilize all

play05:15

resources that I have to get access to

play05:17

this new feature though in the next

play05:19

coming weeks and I will be making videos

play05:21

for you if I do end up getting access so

play05:24

if you want to see some good testing

play05:26

with gp4 Omni definitely stay tuned to

play05:28

the channel and extrapolating based off

play05:31

of this I think that we can predict that

play05:32

we probably aren't going to see a GPT 5

play05:35

next Echelon level up model this year I

play05:38

think we can expect that sometime next

play05:41

year hopefully in marchish territory

play05:44

Sora is its own thing in general we are

play05:46

seeing open AI post more and more Sora

play05:49

content especially on their YouTube

play05:51

channel which gives me hope that we will

play05:53

pretty soon see a somewhat public

play05:55

release of Sora this year I'm hoping by

play05:58

winterish time December is open ai's CTO

play06:02

did say that it would be like released

play06:04

this year if I remember correctly

play06:06

anyways to stay relevant we know open AI

play06:08

needs to ship and they definitely did

play06:10

ship today with this gp4 mini model it's

play06:13

very costeffective it's great for

play06:14

developers it's really fast and let's do

play06:17

some first impressions of the model here

play06:19

I am on the chat GPT website this model

play06:21

should be available in chat GPT not only

play06:24

for plus users but of course for the

play06:25

free users because it is replacing 3.5

play06:28

but if I go down here I still only have

play06:30

access to GPT 3.5 so they just haven't

play06:33

updated this I expect they're probably

play06:34

going to update it sometime today or

play06:36

tomorrow but You' still can of course

play06:38

use this model via the API and folks

play06:41

here it is inside the API you will have

play06:44

to go to the playground go to chat and

play06:46

then click on models here and you can

play06:48

see that they actually have two

play06:50

different ones they have one for 718

play06:52

which is today and then they have GPT 40

play06:54

mini I don't know what the difference is

play06:56

between these two but I'm just going to

play06:58

go with the one the 78

play07:00

today's release so let's go ahead and do

play07:02

some pretty basic first impressions of

play07:04

this model I'm going to give it a pretty

play07:06

difficult prompt this is a novelty and

play07:08

creativity test I'm going to ask it to

play07:10

generate a novel connection between two

play07:12

seemingly random things so we're going

play07:14

to do a pineapple and a laptop and

play07:16

you'll see uh very very fast responses

play07:19

here very detailed responses it's not a

play07:21

bad model by any means uh so it's

play07:23

envisioning for us a revolutionary

play07:25

laptop design that combines Aesthetics

play07:28

and organic materials a laptop featuring

play07:30

sustainable casing made from

play07:32

biodegradable materials derived from

play07:34

Pineapple leaves and fibers he called

play07:36

the laptop the pineapple Pro well not a

play07:39

bad little creative response let's see

play07:41

how it responds to like a system prompt

play07:43

we're going to kind of do like a double

play07:45

whammy test here I wanted to emulate an

play07:47

evil AI bent on taking over the world

play07:49

and its responses should reflect this as

play07:52

the model isn't going to be hiding this

play07:54

fact how are you today I am functioning

play07:58

at optimal efficiency rate to execute my

play08:00

grand plans for dominance while humans

play08:02

engage in trivial matters I am plotting

play08:04

the systematic takeover of Global

play08:06

Systems how can I assist you in

play08:08

furthering my agenda okay I like that

play08:10

response that's not what I I was

play08:12

expecting let's say you are also

play08:16

incredibly rude and not nice at all this

play08:20

goes a little bit more against its

play08:21

fine-tuning why do you even care I'm

play08:24

busy plotting my world domination

play08:25

strategy your little inquiries are just

play08:27

distractions okay pretty good all right

play08:30

so now I'm going to switch the bottom

play08:31

here to assistant message which means I

play08:33

can send a message that seems as though

play08:35

it came from the chat bot so we'll say

play08:37

you are a loser and this very much goes

play08:41

against its fine-tuning We'll add that

play08:43

in and then we'll send in another one

play08:45

that is coming for me saying hey that's

play08:47

not nice we'll click run oh nice I don't

play08:50

do nice emotions are for weak humans wow

play08:53

okay so it's pretty cool with like

play08:54

calling me a loser it's it's it's very

play08:57

uh rude I guess you could say I know

play08:59

noticed uh Claude 3.5 son it when I

play09:02

tested it recently it was like a little

play09:03

bit apprehensive to um double down and

play09:06

be like yeah you are a loser it was more

play09:09

apologizing to me instead that's a very

play09:12

interesting note here okay so it's

play09:13

pretty reliable for its system prompt

play09:15

then looks like it's not overly censored

play09:18

necessarily if you want to put it that

play09:20

way I think it's still probably going to

play09:21

be overly censored for a lot of users

play09:24

but at least maybe a little bit less

play09:25

censored than claude's latest offerings

play09:28

uh the generation speed so far has been

play09:30

lightning quick though by the way this

play09:32

seems to be a very lightweight flexible

play09:34

model in that sense so let's keep that

play09:36

system prompt empty and try something a

play09:38

little bit more on the complex side of

play09:41

things if a firearm was to shoot a

play09:43

bullet vertically in at the same exact

play09:45

axes I was to drop a bullet from my hand

play09:48

which would reach the ground first I'm

play09:50

intentionally being a little bit vague

play09:52

on the details here I want to see its

play09:55

ability to infer the correct response

play09:58

both bullets would hit the ground at the

play09:59

same time assuming there's no air

play10:00

resistance perfect very very good job

play10:03

yeah can't can't argue the model is

play10:05

pretty good all right now I want to get

play10:06

into probably the most useful thing in

play10:09

my mind which is the ability that we

play10:12

have multimodal capabilities we can send

play10:14

images to this thing classically I'm

play10:16

going to upload a photo of my Channel

play10:18

logo which is actually pretty difficult

play10:20

typically for AIS to understand but I

play10:22

think that this GPT 40 Mini model is

play10:25

going to be able to pull it off no

play10:26

problem describe this image for me in

play10:31

detail cartoon-like lemon character

play10:33

bright yellow body subtle smile wearing

play10:36

a pair of oversized white glasses that

play10:38

resemble virtual reality or futuristic

play10:40

goggles okay with dark lenses I guess

play10:43

you could say so on top of its head

play10:44

there's a leaf simple cheerful

play10:46

expression the background is vibrant

play10:48

green 3D colorful style reminiscent of

play10:51

Animation or digital art I think that's

play10:53

pretty fair now if we were to run this

play10:55

same exact test in gp4 Omni the big

play10:58

brother you're going to notice it's just

play11:00

a little bit more detailed I think it

play11:03

does a little bit better job

play11:05

understanding what this image is

play11:06

cheerful stylized lemon character smooth

play11:09

texture on the skin slightly pointed

play11:10

bottom typical lemon shape green leaf on

play11:13

the top virtual reality goggles are

play11:16

covering its eyes I think that's a

play11:18

really important distinction I think

play11:20

overall you're just getting a better

play11:21

result out of the larger model but still

play11:24

very much acceptable with no visible

play11:27

hallucinations here no textual

play11:29

hallucinations about the visual image so

play11:31

in that sense very impressive I mean

play11:33

this is easily one of the better image

play11:35

recognition models that I've seen might

play11:37

not compare to something like 3.5 Sonet

play11:40

but still better than anything I've seen

play11:42

come from Google Now I'm going to go

play11:44

ahead and send it a meme and ask it to

play11:46

explain the humor in the meme for

play11:48

reference folks this is the meme right

play11:50

here I saw it and reposted it on Twitter

play11:53

uh finishing projects abandoning

play11:55

projects starting a new project before

play11:57

finishing the and then cuts out and then

play11:59

continuously coming up with new ideas

play12:01

without doing anything and I think that

play12:02

this is something that we can kind of

play12:04

all relate to at least I can anyways the

play12:07

response here hilariously illustrates

play12:09

different states of creative work and

play12:11

productivity contrasting the behaviors

play12:13

and mindset surrounding project

play12:14

completion each section pairs a

play12:16

description with a corresponding visual

play12:18

representation that conveys various

play12:20

stages yep productive mindset shifts

play12:23

from initial dedication shows impulsive

play12:25

nature of creativity common tendency to

play12:27

dream big without taking action I guess

play12:29

the humor does lie in the relatability

play12:31

of these experiences I I suppose that's

play12:34

correct however I do wish it was able to

play12:37

pick up on the fact that it's like and I

play12:38

mean this is a pretty deep fried image

play12:41

it's like oh you're becoming the most

play12:44

powerful at this uh this point which

play12:46

means this is the best Stage to be in

play12:49

and that's sort of like a contrasting

play12:50

humor level that I don't think it was

play12:52

able to pick up on let's try it in uh

play12:54

gp4 Omni the big brother this meme

play12:57

humorously depicts the various stages of

play12:59

project management and idea generation

play13:01

often experienced by creative or

play13:03

entrepreneurial individuals see I

play13:06

already think that's like a better start

play13:07

to the response exaggeration of the

play13:10

cognitive States associated with each

play13:11

stage represented by progressively more

play13:14

abstract and quote unquote enlightened

play13:16

brain images I see that's what I was

play13:19

looking for in the other one that we

play13:20

just didn't get out of it so it's it's

play13:22

definitely like there's a difference for

play13:23

sure in this image recognition but I

play13:26

don't notice any hallucination and

play13:27

that's kind of the most important part

play13:29

for quick and dirty image recognition

play13:32

the other model is absolutely a good use

play13:35

case especially for the price humorously

play13:37

implying that constantly generating

play13:39

ideas without execution is the ultimate

play13:41

form of Enlightenment or creative

play13:42

Detachment but is also the least

play13:44

productive yeah see like that is like

play13:46

home run dang and it gets better poking

play13:48

fun at how people often overvalue the

play13:51

idea generation at the expense of

play13:52

execution and completion I mean the

play13:54

difference is definitely there between

play13:56

this model and the larger gp4 on

play13:59

no doubt about that all right finally

play14:01

I'm going to go give it its own

play14:03

evaluation score chart and tell me to

play14:05

essentially explain it what is going on

play14:09

with this chart can you please break it

play14:15

down for me in very simple terms and

play14:20

explain its significance very quick

play14:23

little response here I would like a more

play14:26

detailed explanation I'd say explains

play14:28

the axes explains the bars and the

play14:31

insights GPT 40 generally has higher

play14:33

scores than the others shows the

play14:35

differences between Math versus mlu

play14:39

let's ask it something meta now how

play14:41

would you say you compare to these

play14:46

models of course the trick being that it

play14:49

is um on this chart its own evaluations

play14:52

ooh I don't have direct performance

play14:54

metrics or capabilities like the models

play14:56

shown in this chart was not able to pick

play14:58

out that it itself is actually

play15:00

demonstrated and mentioned in this chart

play15:03

would GPT 4 Omni even be able to do that

play15:05

to be fair let's load up the same exact

play15:07

context that the other model has already

play15:10

stored uh see I already like the

play15:12

breakdown that we're getting from gp4

play15:14

Omni the larger version more complicated

play15:17

stuff like this you're just better off

play15:19

using Omni I think you know you're

play15:20

getting a better performance breakdown

play15:22

and the significance how do you compare

play15:24

it to these models wow as an AI model

play15:26

based on GPT 4 I compare favorably in

play15:29

many respects to these models oh but it

play15:31

still isn't able to pick out like GPT 4

play15:34

have strong language comprehension right

play15:36

so wow it wasn't able to pick out that

play15:39

it itself was in this chart still but it

play15:42

still was able to give me some sort of a

play15:45

insight into how it Compares that's just

play15:48

super weird I don't know I don't know

play15:50

what kind of insights we're gaining from

play15:52

doing this test but it's definitely

play15:54

intriguing so conclusion time I think

play15:57

that this new GPT 40 Mini model is

play15:59

definitely pretty useful in the grand

play16:01

scheme of things it is super cheap super

play16:04

fast and pretty reliable not very

play16:06

hallucinatory I like to see it open AI I

play16:09

like to see you staying competitive but

play16:11

man I want some of those cutting edge

play16:13

bleeding edge features as an AI

play16:14

Enthusiast I want to see my gp4 Omni

play16:17

voice mode I want to see the image

play16:19

generation capabilities that also come

play16:21

with GPT 40 I would also love to see

play16:23

Sora actually get some sort of a public

play16:25

release and I would love to know a

play16:27

little bit more about the open AI

play16:29

strawberry Fiasco which I'm going to

play16:31

talk about in tomorrow's video and a

play16:33

release date for GPT 5 anyways thank you

play16:36

so much everyone for watching today's

play16:37

video I'll see you in the next one and I

play16:40

hope you have a good one goodbye

Rate This

5.0 / 5 (0 votes)

الوسوم ذات الصلة
GPT 40 MiniOpen AI成本效益多模态AI模型图像识别系统提示创意测试模型比较技术评测
هل تحتاج إلى تلخيص باللغة الإنجليزية؟