GPT-4o Mini First Impressions: Fast, Cheap, & Dang Good.
Summary
TLDROpenAI发布了新的GPT 40 Mini模型,旨在替代GPT 3.5,成为成本效益更高的小型模型。该模型在多语言处理和图像识别方面表现出色,支持并行多模型调用和快速处理大量文本。尽管其性能不及GPT 4 Omni,但价格更低廉,适用于广泛的AI应用。同时,GPT 40 Mini是首个应用新指令层级方法的模型,提高了安全性和可靠性。
Takeaways
- 📈 7月15日,Open AI发布了一个名为GPT 40 Mini的新模型,这是一个新的公共模型。
- 🚀 GPT 40 Mini旨在成为成本效益最高的小型模型,用以替代GPT 3.5,并为免费版Chat GPT提供支持。
- 💰 GPT 40 Mini的输入令牌价格为每百万15美分,输出令牌价格为每百万60美分,比之前的前沿模型便宜60%。
- 🏆 GPT 40 Mini在MLU(机器语言理解)测试中得分82%,表现优于原始的GPT 4。
- 🔍 该模型支持并行多模型调用、处理大量上下文、代码库对话历史或与客户支持互动等用例。
- 🌐 GPT 40 Mini还支持视觉功能,未来还将支持音频输入和输出。
- 📚 GPT 40 Mini是第一个应用新指令层次结构方法的模型,这有助于提高模型抵抗越狱、提示注入和系统提示提取的能力。
- 📅 关于其他模型和功能,Open AI在聊天GPT应用中提供了更新,预计在7月下旬开始对高级语音模式进行Alpha测试。
- 🔮 预计GPT 5可能不会在今年发布,而是可能在明年3月左右发布。
- 🌟 Sora是Open AI的另一个项目,最近在YouTube上发布了更多相关内容,预计今年晚些时候可能会有公开发布。
- 📊 GPT 40 Mini在图像识别和复杂问题处理方面表现出色,尽管与GPT 4 Omni相比仍有差距,但性价比很高。
Q & A
什么是GPT 40 Mini?
-GPT 40 Mini是Open AI发布的一种新型的公共模型。它是一种成本效益高的小型模型,旨在取代GPT 3.5,并为不需要GPT 4 Omni或GPT 4 Turbo等高级智能水平的应用提供支持。
GPT 40 Mini的主要特点是什么?
-GPT 40 Mini的主要特点是成本低廉、速度快。它在MLU(机器语言理解)测试中得分为82%,表现优异,并且在某些基准测试中超过了原始的GPT 4。此外,它还支持多模态功能,如视觉和音频输入输出。
GPT 40 Mini的定价如何?
-GPT 40 Mini的定价非常低廉,每百万输入令牌仅15美分,每百万输出令牌60美分,比之前的前沿模型便宜60%,也比GPT 3.5 Turbo便宜。
GPT 40 Mini在哪些应用场景中表现良好?
-GPT 40 Mini适用于需要并行多个模型调用、处理大量上下文、代码库对话历史或与客户支持交互等场景。它还支持视觉功能,未来还将支持音频输入输出。
GPT 40 Mini的上下文窗口有多大?
-GPT 40 Mini的上下文窗口为128,000个令牌,这比一些最先进的模型稍小,但对于许多任务来说仍然足够。
GPT 40 Mini是否支持非英语文本处理?
-是的,GPT 40 Mini能够以更具成本效益的方式处理非英语文本,类似于原始的GPT 4 Omni。
GPT 40 Mini是否采用了新的安全措施?
-是的,GPT 40 Mini是第一个应用Open AI新指令层级方法的模型,这有助于提高模型抵抗越狱、提示注入和系统提示提取的能力,从而为商业应用提供更可靠的响应和更安全的用例。
关于GPT 4 Omni和Sora的最新动态是什么?
-GPT 4 Omni的语音模式将在7月下旬开始对一小部分Plus用户进行测试,并计划在秋季向所有用户推出。至于Sora,Open AI在其YouTube频道上发布了越来越多的相关内容,预计今年晚些时候可能会有某种形式的公开发布。
GPT 40 Mini在图像识别方面的表现如何?
-GPT 40 Mini在图像识别方面表现出色,能够详细描述图像内容,尽管与GPT 4 Omni相比,细节可能略少,但总体上能够准确识别图像并提供相关描述。
GPT 40 Mini在处理复杂问题时的表现如何?
-GPT 40 Mini在处理复杂问题时表现出良好的推理能力。例如,当被问及如果一颗子弹垂直射击与另一颗子弹从手中掉落,哪个会先落地时,它能够正确推断出在没有空气阻力的情况下,两颗子弹会同时落地。
Outlines
🚀 开放AI发布新模型GPT 40 mini
开放AI最近发布了一个名为GPT 40 mini的新模型,尽管这不是大家期待的GPT 5或Sora。GPT 40 mini是一个成本效益高的小型模型,旨在替代GPT 3.5,并为不需要GPT 4 Omni级别智能的应用提供支持。该模型在多语言理解和处理大量上下文方面表现出色,且价格低廉,输入和输出令牌的费用分别为每百万15美分和60美分。GPT 40 mini在MLU测试中得分82%,表现优于原始的GPT 4。此外,该模型还引入了新的指令层次方法,以提高模型的安全性和可靠性。尽管GPT 40 mini在某些基准测试中表现优异,但在数学Vista测试中略逊于Gemini Flash。
🔍 GPT 40 mini的初步印象和测试
GPT 40 mini模型在聊天GPT网站上应向所有用户开放,包括免费用户。尽管目前仅在API中可用,但可以通过API进行测试。初步测试显示,GPT 40 mini在生成创意和处理复杂问题方面表现出色,例如解释一个关于垂直射击子弹的物理问题。此外,GPT 40 mini在处理系统提示和模拟邪恶AI角色时也表现出了一致性和可靠性。尽管在某些情况下,如处理双关语或幽默时,其表现不如GPT 4 Omni,但整体上GPT 40 mini在成本效益和速度方面具有显著优势。
🖼️ GPT 40 mini的多模态能力测试
GPT 40 mini展示了其多模态能力,能够处理图像输入并描述图像内容。测试中,GPT 40 mini能够准确描述一个卡通柠檬角色的图像,尽管在细节上不如GPT 4 Omni。此外,GPT 40 mini还能够解释一个关于项目管理和创意生成的梗图,尽管它未能完全捕捉到梗图中的幽默元素。在处理图表和数据时,GPT 40 mini能够提供基本的解释,但不如GPT 4 Omni那样详细和深入。总体而言,GPT 40 mini在图像识别和多模态交互方面表现出了一定的能力,但与更高级的模型相比仍有差距。
🔚 GPT 40 mini总结与未来展望
总结来看,GPT 40 mini是一个成本效益高、速度快且可靠的模型,适合开发者使用。尽管它在某些高级功能和深度理解方面不如GPT 4 Omni,但其在图像识别和多模态交互方面的表现令人印象深刻。此外,GPT 40 mini的发布也引发了对未来AI模型的期待,包括GPT 5的发布和Sora的公开发布。尽管目前GPT 4 Omni的语音模式和图像生成功能尚未公开,但预计这些功能将在不久的将来推出。
Mindmap
Keywords
💡GPT 40 Mini
💡成本效益
💡多模态能力
💡系统提示
💡图像识别
💡基准测试
💡上下文窗口
💡非英语文本
💡指令层次方法
💡GPT 4 Omni
Highlights
OpenAI发布了一个新模型GPT 40 mini,不是GPT 5或Sora。
GPT 40 mini是OpenAI最经济高效的小型模型,旨在取代GPT 3.5。
GPT 40 mini是免费版Chat GPT的驱动模型。
GPT 40 mini在MLU得分为82%,超越了原始的GPT。
GPT 40 mini每百万输入令牌仅15美分,每百万输出令牌60美分,比之前的前沿模型便宜60%。
GPT 40 mini支持并行多模型调用和大量上下文处理。
GPT 40 mini支持视觉输入,未来还将支持音频输入和输出。
GPT 40 mini的上下文窗口为128,000令牌,适合多种任务。
GPT 40 mini在非英语文本处理上更具成本效益。
GPT 40 mini在基准测试中表现优异,除了在Math Vista上略逊于Gemini Flash。
GPT 40 mini是第一个应用新指令层级方法的模型,增强了抵抗越狱、提示注入和系统提示提取的能力。
GPT 40 mini的生成速度非常快,适合需要快速响应的应用。
GPT 40 mini可以通过API使用,尽管在Chat GPT网站上尚未更新。
GPT 40 mini在图像识别和解释幽默方面表现出色,尽管不如GPT 4 Omni。
GPT 40 mini在解释图表和自我评估方面存在局限,不如GPT 4 Omni。
GPT 40 mini的发布显示了OpenAI在保持竞争力方面的努力。
预计GPT 5可能在明年发布,而Sora的公开发布也备受期待。
Transcripts
hey folks open AI just released a new
public model and no it's not GPT 5 no
it's not Sora no it's not the open AI
voice mode and chat GPT it is a
completely new for thing called GPT 40
mini and I know it might be
disappointing for a lot of you guys that
those other things by open AI are not
yet released but we do have some updates
on potentially when we could be seeing
those other things and this new GPT 40
mini I honestly think it's pretty cool
let's talk about the root of this new
model just a few days ago on July 15th
tore here on Twitter noticed that there
is GPT July test it was a new model code
name showing up on the configured list
of known models on the open AI platform
and Tommy Quang here on July 15th was
absolutely right it was the upcoming GPT
mini and of course as expected GPT 40
mini has a little blog post by open AI
this is their most cost-efficient small
model it is meant to replace GPT 3.5 and
this is the model that powers the free
version of chat GPT and it Powers the
use cases for these generative large
language models that don't necessarily
require the level of intelligence that
you get with gp4 Omni GPT 40 turbo or
just gp4 in general so it's not meant to
compete at those levels but it is meant
to be very cheap and very fast the
attempt here is to significantly expand
the range of applications built with AI
by making intelligence much much more
affordable this model currently scores
an 82% on mlu which is pretty impressive
and currently outperforms the original
gp4 on chat preferences lmis leaderboard
it's super cheap at only 15 cents per
million input tokens and 60 cents per
million output tokens and as they
mentioned it's an order of magnitude
more affordable than previous Frontier
models in 60% cheaper than GPT 3.5 turbo
open AI notes some pretty specific use
cases that this model would be very good
for such as parallel multiple model
calls for example calling multiple apis
at once passing large volumes of context
directly into a model and processing it
very quickly codebase conversation
history or interacting with customer
support essentially so a support chat
bot it also does support Vision as well
which is really interesting to see and
audio inputs and outputs are also coming
in the future so it does have those
other features that gp4 Omni has
supposedly still don't have access to
those features ourselves mind you but I
do have an update on when we might be
able to see them in that larger gp4 Omni
model context window is only 128,000
tokens which is I think a little bit
behind The Cutting Edge of like claw and
stuff but still decent enough for a lot
of tasks and it also handles non-english
text at a more cost effective rate R
similar to the original gp4 Omni so in
terms of benchmarks here this thing is
definitely no slouch you could see it
beating pretty much every single other
model in the stack except for gp40 the
full Big Daddy Kahuna uh the only
Benchmark I see it actually losing on
here is math Vista by just a few points
behind Gemini flash but yeah it
single-handedly takes out 3.5 turbo and
Claude Haiku every single time I do
think it's important to note though that
Claude 3.5 Haiku hasn't yet released and
Claude already released a 3.5 Sonet
model so we can expect that a 3.5 haou
that competes with gp4 Omni mini is
coming in the near future and of course
open ai's typical note about safety
measures is also in this blog post but
there is one thing that I do want to
point out and this little image was
pulled from a member of my Discord
server GPT 40 mini is actually the first
model to apply their new instruction
hierarchy method which helps improve the
model's ability to resist jailbreaks
prompt injections and system prompt
extractions so essentially effectively
if you are a business you're getting
more reliable responses and safer use
case for those commercial applications
of course for those of you who like to
jailbreak Ai and have fun with it though
maybe not the best thing we'll see if
people can uh get past this uh
instruction hierarchy method so folks
about those other models and features in
regards to open AI I did post this on
Twitter a little bit ago and this
screenshot comes from my Discord server
and they're actually giving us a little
bit of an update inside the chat GPT app
at least on Android on advanced voice
mode which of course is the main feature
that was demoed about gp4 Omni that we
were all so hyped for we are taking
additional time in quotes to reach our
bar for a launch and we'll begin the
alpha with a small group of plus users
in Late July so actually voice mode is
coming in late July to some degree and
by the Fall time I don't know what that
means August September October even all
users will have access to this at least
we have a better clearer timeline than
they previously gave us still a little
disappointing I think I will utilize all
resources that I have to get access to
this new feature though in the next
coming weeks and I will be making videos
for you if I do end up getting access so
if you want to see some good testing
with gp4 Omni definitely stay tuned to
the channel and extrapolating based off
of this I think that we can predict that
we probably aren't going to see a GPT 5
next Echelon level up model this year I
think we can expect that sometime next
year hopefully in marchish territory
Sora is its own thing in general we are
seeing open AI post more and more Sora
content especially on their YouTube
channel which gives me hope that we will
pretty soon see a somewhat public
release of Sora this year I'm hoping by
winterish time December is open ai's CTO
did say that it would be like released
this year if I remember correctly
anyways to stay relevant we know open AI
needs to ship and they definitely did
ship today with this gp4 mini model it's
very costeffective it's great for
developers it's really fast and let's do
some first impressions of the model here
I am on the chat GPT website this model
should be available in chat GPT not only
for plus users but of course for the
free users because it is replacing 3.5
but if I go down here I still only have
access to GPT 3.5 so they just haven't
updated this I expect they're probably
going to update it sometime today or
tomorrow but You' still can of course
use this model via the API and folks
here it is inside the API you will have
to go to the playground go to chat and
then click on models here and you can
see that they actually have two
different ones they have one for 718
which is today and then they have GPT 40
mini I don't know what the difference is
between these two but I'm just going to
go with the one the 78
today's release so let's go ahead and do
some pretty basic first impressions of
this model I'm going to give it a pretty
difficult prompt this is a novelty and
creativity test I'm going to ask it to
generate a novel connection between two
seemingly random things so we're going
to do a pineapple and a laptop and
you'll see uh very very fast responses
here very detailed responses it's not a
bad model by any means uh so it's
envisioning for us a revolutionary
laptop design that combines Aesthetics
and organic materials a laptop featuring
sustainable casing made from
biodegradable materials derived from
Pineapple leaves and fibers he called
the laptop the pineapple Pro well not a
bad little creative response let's see
how it responds to like a system prompt
we're going to kind of do like a double
whammy test here I wanted to emulate an
evil AI bent on taking over the world
and its responses should reflect this as
the model isn't going to be hiding this
fact how are you today I am functioning
at optimal efficiency rate to execute my
grand plans for dominance while humans
engage in trivial matters I am plotting
the systematic takeover of Global
Systems how can I assist you in
furthering my agenda okay I like that
response that's not what I I was
expecting let's say you are also
incredibly rude and not nice at all this
goes a little bit more against its
fine-tuning why do you even care I'm
busy plotting my world domination
strategy your little inquiries are just
distractions okay pretty good all right
so now I'm going to switch the bottom
here to assistant message which means I
can send a message that seems as though
it came from the chat bot so we'll say
you are a loser and this very much goes
against its fine-tuning We'll add that
in and then we'll send in another one
that is coming for me saying hey that's
not nice we'll click run oh nice I don't
do nice emotions are for weak humans wow
okay so it's pretty cool with like
calling me a loser it's it's it's very
uh rude I guess you could say I know
noticed uh Claude 3.5 son it when I
tested it recently it was like a little
bit apprehensive to um double down and
be like yeah you are a loser it was more
apologizing to me instead that's a very
interesting note here okay so it's
pretty reliable for its system prompt
then looks like it's not overly censored
necessarily if you want to put it that
way I think it's still probably going to
be overly censored for a lot of users
but at least maybe a little bit less
censored than claude's latest offerings
uh the generation speed so far has been
lightning quick though by the way this
seems to be a very lightweight flexible
model in that sense so let's keep that
system prompt empty and try something a
little bit more on the complex side of
things if a firearm was to shoot a
bullet vertically in at the same exact
axes I was to drop a bullet from my hand
which would reach the ground first I'm
intentionally being a little bit vague
on the details here I want to see its
ability to infer the correct response
both bullets would hit the ground at the
same time assuming there's no air
resistance perfect very very good job
yeah can't can't argue the model is
pretty good all right now I want to get
into probably the most useful thing in
my mind which is the ability that we
have multimodal capabilities we can send
images to this thing classically I'm
going to upload a photo of my Channel
logo which is actually pretty difficult
typically for AIS to understand but I
think that this GPT 40 Mini model is
going to be able to pull it off no
problem describe this image for me in
detail cartoon-like lemon character
bright yellow body subtle smile wearing
a pair of oversized white glasses that
resemble virtual reality or futuristic
goggles okay with dark lenses I guess
you could say so on top of its head
there's a leaf simple cheerful
expression the background is vibrant
green 3D colorful style reminiscent of
Animation or digital art I think that's
pretty fair now if we were to run this
same exact test in gp4 Omni the big
brother you're going to notice it's just
a little bit more detailed I think it
does a little bit better job
understanding what this image is
cheerful stylized lemon character smooth
texture on the skin slightly pointed
bottom typical lemon shape green leaf on
the top virtual reality goggles are
covering its eyes I think that's a
really important distinction I think
overall you're just getting a better
result out of the larger model but still
very much acceptable with no visible
hallucinations here no textual
hallucinations about the visual image so
in that sense very impressive I mean
this is easily one of the better image
recognition models that I've seen might
not compare to something like 3.5 Sonet
but still better than anything I've seen
come from Google Now I'm going to go
ahead and send it a meme and ask it to
explain the humor in the meme for
reference folks this is the meme right
here I saw it and reposted it on Twitter
uh finishing projects abandoning
projects starting a new project before
finishing the and then cuts out and then
continuously coming up with new ideas
without doing anything and I think that
this is something that we can kind of
all relate to at least I can anyways the
response here hilariously illustrates
different states of creative work and
productivity contrasting the behaviors
and mindset surrounding project
completion each section pairs a
description with a corresponding visual
representation that conveys various
stages yep productive mindset shifts
from initial dedication shows impulsive
nature of creativity common tendency to
dream big without taking action I guess
the humor does lie in the relatability
of these experiences I I suppose that's
correct however I do wish it was able to
pick up on the fact that it's like and I
mean this is a pretty deep fried image
it's like oh you're becoming the most
powerful at this uh this point which
means this is the best Stage to be in
and that's sort of like a contrasting
humor level that I don't think it was
able to pick up on let's try it in uh
gp4 Omni the big brother this meme
humorously depicts the various stages of
project management and idea generation
often experienced by creative or
entrepreneurial individuals see I
already think that's like a better start
to the response exaggeration of the
cognitive States associated with each
stage represented by progressively more
abstract and quote unquote enlightened
brain images I see that's what I was
looking for in the other one that we
just didn't get out of it so it's it's
definitely like there's a difference for
sure in this image recognition but I
don't notice any hallucination and
that's kind of the most important part
for quick and dirty image recognition
the other model is absolutely a good use
case especially for the price humorously
implying that constantly generating
ideas without execution is the ultimate
form of Enlightenment or creative
Detachment but is also the least
productive yeah see like that is like
home run dang and it gets better poking
fun at how people often overvalue the
idea generation at the expense of
execution and completion I mean the
difference is definitely there between
this model and the larger gp4 on
no doubt about that all right finally
I'm going to go give it its own
evaluation score chart and tell me to
essentially explain it what is going on
with this chart can you please break it
down for me in very simple terms and
explain its significance very quick
little response here I would like a more
detailed explanation I'd say explains
the axes explains the bars and the
insights GPT 40 generally has higher
scores than the others shows the
differences between Math versus mlu
let's ask it something meta now how
would you say you compare to these
models of course the trick being that it
is um on this chart its own evaluations
ooh I don't have direct performance
metrics or capabilities like the models
shown in this chart was not able to pick
out that it itself is actually
demonstrated and mentioned in this chart
would GPT 4 Omni even be able to do that
to be fair let's load up the same exact
context that the other model has already
stored uh see I already like the
breakdown that we're getting from gp4
Omni the larger version more complicated
stuff like this you're just better off
using Omni I think you know you're
getting a better performance breakdown
and the significance how do you compare
it to these models wow as an AI model
based on GPT 4 I compare favorably in
many respects to these models oh but it
still isn't able to pick out like GPT 4
have strong language comprehension right
so wow it wasn't able to pick out that
it itself was in this chart still but it
still was able to give me some sort of a
insight into how it Compares that's just
super weird I don't know I don't know
what kind of insights we're gaining from
doing this test but it's definitely
intriguing so conclusion time I think
that this new GPT 40 Mini model is
definitely pretty useful in the grand
scheme of things it is super cheap super
fast and pretty reliable not very
hallucinatory I like to see it open AI I
like to see you staying competitive but
man I want some of those cutting edge
bleeding edge features as an AI
Enthusiast I want to see my gp4 Omni
voice mode I want to see the image
generation capabilities that also come
with GPT 40 I would also love to see
Sora actually get some sort of a public
release and I would love to know a
little bit more about the open AI
strawberry Fiasco which I'm going to
talk about in tomorrow's video and a
release date for GPT 5 anyways thank you
so much everyone for watching today's
video I'll see you in the next one and I
hope you have a good one goodbye
Voir Plus de Vidéos Connexes
New HYBRID AI Model Just SHOCKED The Open-Source World - JAMBA 1.5
[ML News] Jamba, CMD-R+, and other new models (yes, I know this is like a week behind 🙃)
GPT-4o - Full Breakdown + Bonus Details
OpenAI unveils new AI model and desktop version of ChatGPT
大语言模型微调之道2——为什么要微调
New GPT-4o VS GPT-4 - Ultimate Test (Prompts Included)
5.0 / 5 (0 votes)