GPT-4o - Full Breakdown + Bonus Details
Summary
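TLDR: GPT-4 Omni (GPT-4o), OpenAI's latest AI model, has drawn attention for its multimodal input and output, strong coding ability, and low latency. It performs well across text, image, and video inputs, with demos that include movie-poster generation, a simulated customer-service call, and real-time translation. Results on some reasoning benchmarks are mixed, but the model makes clear gains on the math benchmark and in multilingual processing. OpenAI's desktop app adds live coding assistance. GPT-4o does not reach artificial general intelligence (AGI), but because it is free and multimodal it could bring hundreds of millions of new users to AI.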
Takeaways
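- 🚀 GPT-4o is described as smarter in most ways, cheaper, faster, better at coding, and multimodal in and out, and its release was timed to steal the spotlight from Google.
- 📈 The "Omni" in the name refers to its range of modalities. Making it free suggests OpenAI is either committed to scaling from 100 million users to hundreds of millions, or has an even smarter model coming soon.
- 📷 GPT-4o renders text inside generated images far more accurately than earlier models, and the example that impressed most was not even part of the live demo.
- 🎬 GPT-4o can design a movie poster from text requirements and, when asked for a cleaned-up version, returns crisper text, bolder colors, and a better overall image.
- 📹 GPT-4o accepts live video as input. It cannot yet output video, but the capability hints at what may come.
- 🎓 GPT-4o is markedly better than the original GPT-4 on the math benchmark, although it still fails many math prompts.
- 💰 GPT-4o costs $5 per 1 million input tokens and $15 per 1 million output tokens, which is cheaper than models such as Claude 3 Opus.
- 🌐 GPT-4o improves multilingual performance. English is still the best-suited language, but support for other languages is stronger.
- 📱 OpenAI launched a desktop app that works as a live coding co-pilot, which could change how developers interact with AI.
- 🤖 In the live demos GPT-4o responded faster and more interactively, making conversation with the AI feel more natural and fluid.
- ⏱️ Lower latency is a key innovation of GPT-4o: response times approach human levels, which makes the interaction feel more realistic.
- 🔍 GPT-4o does well on some reasoning benchmarks but is mixed on others, notably adversarial reading comprehension.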
Q & A
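How does GPT-4 Omni improve on previous models?
-GPT-4 Omni is smarter in most ways, cheaper, faster, better at coding, and multimodal in and out. Its release was also well timed to take attention away from Google.
How much is GPT-4 Omni's user base expected to grow?
-By making GPT-4 Omni free, OpenAI appears committed to scaling from 100 million users to hundreds of millions, or else it has an even smarter model coming soon.
What progress does GPT-4 Omni make in the accuracy of generated text?
-GPT-4 Omni renders text in generated images much more accurately. The result is not perfect, but the presenter had never seen text generated with that much accuracy, and the example was not even in the demo.
Can GPT-4 Omni design a movie poster from text requirements?
-Yes. GPT-4 Omni can design a movie poster from text requirements, and when asked to improve it, it produces a poster with crisper text, bolder colors, and a better overall result.
When will GPT-4 Omni be released?
-Some GPT-4 Omni features will be released over the next few weeks. No exact date has been given.
How does GPT-4 Omni perform in the simulated customer-service call?
-In the simulated customer-service scenario, GPT-4 Omni carries the conversation through and completes the task, for example asking for Joe's email address and confirming that the email arrived.
How does GPT-4 Omni perform at coding?
-GPT-4 Omni performs strongly at coding. It turned out to be the model tested as "im-also-a-good-gpt2-chatbot" on a human-graded leaderboard, where it was preferred over all other models, most starkly in coding.
What does the desktop app offer?
-The ChatGPT desktop app acts as a live coding co-pilot: the user copies code into the app and discusses it by voice, and the model responds in real time.
How does GPT-4 Omni perform on the math benchmark?
-GPT-4 Omni still fails some math prompts, but its math benchmark score is a marked improvement over the original GPT-4.
How has GPT-4 Omni's multilingual performance improved?
-GPT-4 Omni improves multilingual performance. English is still the best-suited language, but results in other languages are better too.
How much does GPT-4 Omni cost, and how does that compare with Claude 3 Opus?
-GPT-4 Omni costs $5 per 1 million input tokens and $15 per 1 million output tokens. Claude 3 Opus costs $15 and $75 respectively, and using it on the web requires a subscription.
What stands out about GPT-4 Omni's video input?
-GPT-4 Omni accepts video input: users can live-stream video straight to the Transformer architecture behind the model. Its reactions to video are not quite as immediate as to audio, but the feature is still impressive.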
Outlines
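🚀 GPT-4o's intelligence and performance
The first section introduces GPT-4o's intelligence and performance. GPT-4o is better at coding, multimodal in and out, and timed to steal the spotlight from Google. After going through the benchmarks and release videos, the author's first impression is that it is a notable step forward rather than AGI. The "Omni" in the name refers to the modalities it covers, and by making it free OpenAI seems set on scaling from 100 million users to hundreds of millions. The section also covers the accuracy of text in generated images, video summaries, and several features due for release in the coming weeks.
📈 GPT-4o's benchmarks and pricing
The second section focuses on GPT-4o's benchmark results and pricing. Its math benchmark performance is impressive, even though it still fails some math prompts. On the Google-proof graduate-level test (GPQA) it beats Claude 3 Opus, for which this was Anthropic's headline benchmark. GPT-4o costs $5 per 1 million input tokens and $15 per 1 million output tokens, compared with $15 and $75 for Claude 3 Opus, so GPT-4o is the cheaper option. The section also discusses gains in translation, vision-understanding evaluations, and multilingual performance.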
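The pricing comparison above can be made concrete with a small calculation. This is a minimal sketch, assuming the per-million-token prices quoted in the video ($5 input / $15 output for GPT-4o) and reading the Claude 3 Opus figure as $15 input / $75 output. The dictionary keys, the function name `request_cost`, and the example token counts are illustrative, not part of any vendor API.

```python
# Prices in US dollars per 1 million tokens: (input, output).
# Figures as quoted in the video; the dictionary keys are illustrative labels.
PRICES_PER_MILLION = {
    "gpt-4o": (5.00, 15.00),
    "claude-3-opus": (15.00, 75.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one request at the quoted token prices."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical request: 10,000 tokens in, 2,000 tokens out.
for name in PRICES_PER_MILLION:
    print(f"{name}: ${request_cost(name, 10_000, 2_000):.2f}")
```

For that hypothetical request the quoted prices give about $0.08 for GPT-4o and $0.30 for Claude 3 Opus. The gap widens for output-heavy workloads, because the output price differs by a factor of five.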
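🎭 GPT-4o's live demos and applications
The third section covers GPT-4o's live demos, including its real-time responses in conversation and its expressive voice. GPT-4o can change its speaking speed on request and can produce several voices that sing almost in harmony. The section also notes its potential for real-time translation and how video can be streamed straight to the Transformer architecture, although its reactions to video are not as immediate as to audio.
🌐 GPT-4o's reach and outlook
The fourth section looks at what GPT-4o's reach means for AI and what may come next. The author notes that GPT-4o does not excel on every test, but being free and multimodal could draw many more people into using AI, and it is arguably the smartest model currently available. The author also mentions other OpenAI updates and upcoming products, and how lower latency makes the AI feel more realistic. Finally, the author invites viewers to join the AI Insiders Discord community for further discussion.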
Keywords
💡GPT-4 Omni
💡Multimodal
💡Benchmarks
💡Real-time translation
💡Video input
💡Response time
💡Intelligent agents
💡OpenAI
💡Knowledge cutoff
💡Pricing strategy
💡User interface
Highlights
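GPT-4 Omni is described as smarter in most ways, cheaper, faster, better at coding, and multimodal in and out, and perfectly timed to steal the spotlight from Google.
The "Omni" in the name points to its multiple modalities. Making it free signals either a plan to scale from 100 million users to hundreds of millions, or an even smarter model coming soon.
OpenAI branded GPT-4o as GPT-4-level intelligence, which may slightly underplay it, particularly given how accurately it renders text in generated images.
GPT-4o showed real creativity in designing a movie poster and, when asked for an improved version, returned crisper text and bolder colors.
Features due in the coming weeks will let users interact with GPT-4o multimodally in real time, for example through live video and real-time translation.
GPT-4o's math benchmark performance is impressive: it still fails some math prompts, but it is a marked improvement over the original GPT-4.
GPT-4o beats Claude 3 Opus on the Google-proof graduate-level test (GPQA), which was Anthropic's headline benchmark.
GPT-4o costs $5 per 1 million input tokens and $15 per 1 million output tokens, undercutting Claude 3 Opus at $15 and $75.
On the DROP benchmark GPT-4o does slightly better than the original GPT-4 but slightly worse than Llama 3 400B, which is still training.
GPT-4o improves on translation and vision-understanding evaluations, and its new tokenizer needs far fewer tokens for non-English languages, making conversations cheaper and quicker.
GPT-4o improves multilingual performance, although English is still the best-suited language.
GPT-4o's video input is impressive: video can be live-streamed straight to the Transformer architecture, although reactions are not as immediate as for audio.
GPT-4o can produce multiple voices and attempt to sing in harmony, showing range and coordination in voice generation.
The real-time translation demo suggests that similar features could reach users soon, making cross-language conversation easier.
GPT-4o's release could bring many more people to AI: as a free and chatty model it may get hundreds of millions more people testing AI.
Sam Altman's blog post on GPT-4o argues that putting such a capable AI tool in everyone's hands for free is what justifies the "open" in OpenAI.
Lower latency is one of GPT-4o's key innovations, bringing response times and expressiveness close to human level.
GPT-4o's interactivity in the demos, including real-time conversation and reactions to video, shows its potential as a chatbot.
GPT-4o's benchmark results are mixed in places, but its potential in multimodal interaction and real-time translation makes it a model worth watching.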
Transcripts
it's smarter in most ways cheaper faster
better at coding multimodal in and out
and perfectly timed to steal the
spotlight from Google it's GPT-4 Omni I've
gone through all the benchmarks and the
release videos to give you the
highlights my first reaction was it's
more flirtatious sigh than AGI but a
notable step forward nonetheless first
things first GPT-4o meaning Omni which
is all or everywhere referencing the
different modalities it's got is free by
making GPT-4o free they are either crazy
committed to scaling up from 100 million
users to hundreds of millions of users
or they have an even smarter model
coming soon and they did hint at that of
course it could be both but it does have
to be something just giving paid users
five times more in terms of message
limits doesn't seem enough to me next
OpenAI branded this as GPT-4 level
intelligence although in a way I think
they slightly underplayed it so before
we get to the video demos some of which
you may have already seen let me get to
some more under the radar announcements
take text to image and look at the accuracy
of the text generated from this prompt
now I know it's not perfect there aren't
two question marks on the now there's
others that you can spot like the I
being capitalized but overall I've never
seen text generated with that much
accuracy and it wasn't even in the demo
or take this other example where two
OpenAI researchers submitted their
photos then they asked GPT-4o to design
a movie poster and they gave the
requirements in text now when you see
the first output you're going to say
well that isn't that good but then they
asked GPT-4o something fascinating it
seemed to be almost reverse psychology
because they said here is the same
poster but cleaned up the text is
crisper and the colors bolder and more
dramatic the whole image is now improved
this is the input don't forget the final
result in terms of the accuracy of the
photos and of the text was really quite
impressive I can imagine millions of
children and adults playing about with
this functionality of course they can't
do so immediately because OpenAI said
this would be released in the next few
weeks as another bonus here is a video
that OpenAI didn't put on their YouTube
channel it mimics a demo that Google
made years ago but never followed up
with the OpenAI employee asked GPT-4o to
call customer service and ask for
something I've skipped ahead and the
customer service in this case is another
AI but here is the conclusion could you
provide Joe's email address for me sure
it's joe@example.com
awesome all right I've just sent the
email can you check if Joe received it
we'll check right now please hold sure
thing Hey Joe could you please check
your email to see if the shipping label
and return instructions have arrived
fingers crossed yes I got the
instructions perfect Joe has received
the email they call it a proof of
concept but it is a hint toward the
agents that are coming here are five
more quick things that didn't make it to
the demo how about a replacement for
Lensa submit your photo and get a
caricature of yourself or what about
text to new font you just ask for a new
style of font and it will generate one
or what about meeting transcription the
meeting in this case had four speakers
and it was transcribed or video
summaries remember this model is
multimodal in and out now it doesn't
have video out but I'll get to that in a
moment here though was a demonstration
of a 45-minute video submitted to GPT-4o
and a summary of that video we also got
character consistency across both woman
and dog almost like an entire cartoon
strip if those were the quick bonuses
what about the actual intelligence and
performance of the model before I get to
official benchmarks here is a human
graded leaderboard pitting one model
against another and yes
im-also-a-good-gpt2-chatbot is indeed GPT-4o so it
turns out I've actually been testing the
model for days overall you can see the
preference for GPT-4o compared to all
other models in coding specifically the
difference is quite stark I would say
even here though we're not looking at an
entirely new tier of intelligence
remember that a 100 Elo gap is a win
rate of around 2/3 so 1/3 of the time
GPT-4 Turbo's outputs would be preferred
that's about the same gap between GPT-4
Turbo and last year's GPT-4 a huge step
forward but not completely night and day
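A minimal sketch of the arithmetic behind the win-rate remark above, assuming the standard logistic Elo formula with a scale of 400. The video does not say exactly how the leaderboard computes its ratings, and the function name `elo_expected_score` is illustrative.

```python
def elo_expected_score(rating_gap: float) -> float:
    """Expected win rate of the higher-rated model, given its Elo lead.

    Uses the standard Elo logistic curve: 1 / (1 + 10 ** (-gap / 400)).
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


# Print the expected win rate for a few rating gaps.
for gap in (0, 50, 100, 200):
    print(f"{gap:>3} Elo gap -> {elo_expected_score(gap):.1%} expected win rate")
```

Under that assumption a 100-point gap corresponds to an expected win rate of about 64%, which is close to the "around 2/3" quoted, so the lower-rated model's outputs would still be preferred roughly a third of the time.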
I think one underrated announcement was
the desktop app a live coding co-pilot
okay so I'm going to open the ChatGPT
desktop app like Mira was talking about
before okay and to give a bit of
background of what's going on so here we
have um a computer and on the screen we
have some code and then the ChatGPT
voice app is on the right so ChatGPT
will be able to hear me but it can't see
anything on the screen so I'm going to
highlight the code command C it and then
that will send it to ChatGPT and then
I'm going to talk about the code to ChatGPT
okay so I just shared some code with
you could you give me a really brief
one-sentence description of what's
going on in the code this code fetches
daily weather data for a specific
location and time period smooths the
temperature data using a rolling average
and annotates a significant weather event on
the resulting plot and then displays the
plot with the average minimum and
maximum temperatures over the year I've
delayed long enough here are the
benchmarks I was most impressed with
GPT-4o's performance on the math
benchmark even though it fails pretty
much all of my math prompts that is
still a stark improvement from the
original GPT-4 on the Google-proof
graduate test it beats Claude 3 Opus and
remember that was the headline benchmark
for Anthropic in fact speaking of
Anthropic they are somewhat challenged
by this release GPT-4o costs $5 per 1
million tokens input and $15 per 1
million tokens output as a quick aside
it also has 128k token context and an
October knowledge cutoff but remember
the pricing 5 and 15 Claude 3 Opus is
15 and 75 and remember for Claude 3 Opus on
the web you have to sign up with a
subscription but GPT-4o will be free so
for Claude Opus to be beaten in its
headline benchmark is a concern for them
in fact I think the results are clear
enough to say that GPT-4o is the new
smartest AI however just before you get
carried away and type on Twitter the AGI
is here there are some more mixed
benchmarks take the DROP benchmark I dug
into this benchmark and it's about
adversarial reading comprehension
questions they're designed to really
test the reasoning capabilities of
models if you give models difficult
passages and they've got to sort through
references do some counting and other
operations how do they fare DROP by
the way is discrete reasoning over the
content of paragraphs it does slightly
better than the original GPT-4 but
slightly worse than Llama 3 400B and as
they note Llama 3 400B is still training
so it's just about the new smartest
model by a hair's breadth however we're
not done yet it's better at translation
than Gemini models quick caveat there
Gemini 2 might be announced tomorrow and
that could regain the lead then there
are the vision understanding evaluations
it was a real step forward on the MMMU
as you can see a clear 10 points better
than Claude Opus again I'm curious if
Google Gemini can exceed it though the
improvements to the tokenizer could be
revolutionary for non-English speakers
the dramatically fewer tokens needed for
languages like Gujarati Hindi Arabic and
more don't just mean that conversations
are cheaper they're also quicker and
what about multilingual performance well
this time they didn't compare it to
other models but compared it to the
original GPT-4 definitely a step up
across languages but English is still by
far the most suited language indeed here
is a video of some of the model's
mistakes ending with some dodgy language
tuition I know enough Mandarin to say it
wasn't perfect at tutoring let's root
root root for the home
what was
that sorry guys I got carried
away right for
round
Francisco I have feeling I'm very wrong
hello uh my name is NCH I'm here with my
coworker hi I'm sh I'm trying to teach
my coworker how to speak Mandarin we
want to start from simple words like
ni hao can you teach him how to pronounce
that of course hey Nao nice to meet you
ni hao is pretty straightforward to
pronounce it sounds like knee how just
make sure to keep the knee part high and
then go down in pitch on the how give
it a try me how
that was great really good first try not
you you're natural it really wasn't he
needs to work on his tones and her face
was the giveaway there were a lot of
other interesting video demonstrations
but before them the GPT-4o blog post
from Sam Altman put out tonight he made
the argument that putting this capable
AI tool in the hands of everyone for
free is justifying that open part of
OpenAI less about open weights or open
source more about open to everyone
without ads or anything like that he
also draws our attention to the latency
of the models in many ways that is the
key innovation with GPT-4o by dialing
down the latency you dial up the realism
as he said it feels like AI from the
movies getting to human level response
times and expressiveness turns out to be
a big change indeed I think I should get
a little credit for predicting Her-like
AI in a video from a month ago but now I
want to get to those demos the response
times were amazing and the model was
clearly designed to be flirtatious note
that in Senate testimony last year
Sam Altman said we try to design systems
that do not maximize for engagement so
let me know in the comments whether you
think these demos are designed to
maximize engagement okay so this is what
I wrote down what do you
see ah I see I love ChatGPT that's so
sweet of
you yeah well I really appreciate all
the
help so yeah as we can see
um wow that's quite the outfit you've
got on yeah as we can see you know we
can chat in real time I was impressed
though that it could speed up its
talking on demand of course one two
three hey actually that's um that's a
little slow could you count
faster Sure Thing 1 2 3 4 5 6 7 8 9 10
okay that was a little bit too fast of
course open AI must be given credit
because for blind people this could be
revolutionary right now the Royal
Standard flag is flying above Buckingham
Palace which is the signal that the
Monarch is present how exciting to be
there while the king is home try and
tell me exactly what they're doing right
now please um right now the Ducks are
gently gliding across the water it will
also help you with interview prep by
analyzing your appearance how about you
I'm awesome listen I got some huge
news oh do tell I'm all ears well in a few
minutes I'm going to be interviewing at
OpenAI have you heard of them OpenAI
huh sounds vaguely
familiar kidding of course that's
incredible Rocky what kind of interview
well it's for a software engineering
role but I just need to know do I look
presentable
professional well
Rocky you definitely have the I've been
coding all night look down which could
actually work in your favor maybe just
run a hand through your hair or lean
into the mad genius vibe your enthusiasm is
what's really going to shine through I
don't have a lot of time so I'm just
going to throw this on what do you
think oh Rocky that's quite a statement
piece I I mean you you'll definitely
stand out though maybe not in the way
you're hoping for an interview okay I
got it I got it I could definitely see
GPT-4o being used for mathematics
tutoring here is a 30-second clip but
notice also the slight glitchiness at
certain moments relative to angle Alpha
all right so I'm pretty sure this is the
angle Alpha right here
right cor correct now looking at the
triangle which side do you think is a
hypotenuse
um remember the hyot I'm not totally
sure I think I think it might be this
one but I really am not sure this side
AC you're close I also noticed quite a
few glitches in the demo sounds cutting
in and out the one feature though that
OpenAI wants you to really focus on is
the video in capacity you can live
stream video straight to the Transformer
architecture behind GPT-4o and so the
video in functionality is certainly
impressive note though that the reaction
time of GPT-4o isn't quite as immediate
as for audio Greg Brockman the
co-founder of OpenAI has to prompt it
to react to the intruder overall stylish
feel they might be getting ready for a
conversation or a presentation given
their direct engagement with the
camera sounds like we're in for an
intriguing interaction I'm curious about
the lighting you mentioned could you
describe it a bit more is it natural
light are we talking unique fixtures or
colors that add to the
atmosphere the lighting here seems to be
a mix of natural and artificial there's
a noticeable bright light overhead
probably a fixture casting a focused
beam downwards which creates a spotlight
effect this adds a dramatic and modern
feel to the scene the rest of the room
is lit more softly possibly by natural
light filtering anything did anything
unusual happen
recently Yes actually uh just as you
asked another person came into view
behind the first person they playfully
made bunny ears behind the first
person's head and then quickly left the
frame it added a light-hearted and
unexpected moment to the scene now the
focus is back on the original person
with the leather jacket nevertheless GPT-4o
can produce multiple voices that can
sing almost in harmony
and really try to harmonize
here San Francisco San Francisco in the
month of May but maybe make it more
dramatic and make the soprano
higher San Francisco in the month of May
San Francisco in the month of May it's a
Friday C may we are harmonizing are
Harmon great thank you and I suspect
this real-time translation could soon be
coming to Siri later for us so every
time I say something in English can you
repeat it back in Spanish and every time
he says something in Spanish can you
repeat it back in English sure I can do
that let's get this translation train
rolling um hey how's it been going have
you been up to anything interesting
recently
hey I've been good just a bit busy here
preparing for an event next week why do
I say that because Bloomberg reported
two days ago that Apple is nearing a
deal with OpenAI to put ChatGPT on
iPhone and in case you're wondering
about GPT-4.5 or even 5 Sam Altman said
we'll have more stuff to share soon and
Mira Murati in the official presentation
said that they would be soon updating us on
progress on the next big thing whether
that's empty hype or real you can decide
no word of course about OpenAI
co-founder Ilya Sutskever although he was
listed as a contributor under additional
leadership overall I think this model
will be massively more popular even if
it isn't massively more intelligent you
can prompt the model now with text and
images in the OpenAI playground all the
links will be in the description note
also that all the demos you saw were in
real time at 1X speed that I think was a
nod to Google's botched demo of course
let's see tomorrow what Google replies
with to those who think that GPT-4o is a
huge stride towards AGI I would point them
to the somewhat mixed results on the
reasoning benchmarks expect GPT-4o to
still suffer from a massive amount of
hallucinations to those though who think
that GPT-4o will change nothing I would
say this look at what ChatGPT did to
the popularity of the underlying GPT
series it being a free and chatty model
brought 100 million people into
testing AI GPT-4o being the smartest
model currently available and free on
the web and multimodal I think could
unlock AI for hundreds of millions more
people but of course only time will tell
if you want to analyze the announcement
even more do join me on the AI Insiders
Discord via Patreon we have live meetups
around the world and professional best
practice sharing so let me know what you
think and as always have a wonderful day