Is ChatGPT-4o Actually Better Than GPT-4? OpenAI's Newest Flagship Model and Its Capabilities
Summary
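TL;DR: This video compares OpenAI's new model, GPT-4o, with the older GPT-4. Identical prompts and tasks covering web browsing, PDF analysis, image processing, and coding are used to evaluate the new model's speed and output quality. In the tests, GPT-4o was as fast or faster than GPT-4 and produced better output. In the coding and article-writing tasks in particular, GPT-4o not only responded faster but also produced better-structured, higher-quality output. The video also notes that GPT-4o is reportedly available on the free plan, which raises the question of whether a ChatGPT Plus subscription is still worth keeping.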
Takeaways
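- 🚀 The video introduces OpenAI's new model, GPT-4o, and sets out to test how its performance differs from GPT-4.
- 🕒 The author uses a timer to quantify the speed of the new and old models, and reads the responses to judge quality.
- 📊 Feedback on Twitter suggests GPT-4o is better than GPT-4; the video puts that claim to the test.
- 🔍 The tests cover coding, PDF analysis, image processing, and more, to assess the models broadly.
- ⏱️ In the web-browsing test, GPT-4o referenced more sources (three sites versus two) and was still about 4 seconds faster than GPT-4.
- 📝 In the PDF analysis test, GPT-4o gave a clearer, more readable answer, while GPT-4's formatting and structure were worse.
- 💻 In the coding test, GPT-4o was faster and produced more specific, more complete code.
- ✍️ In the copywriting test, GPT-4o produced a well-structured article in 27 seconds, while GPT-4 took about a minute and delivered lower-quality content.
- 🆓 Surprisingly, GPT-4o is reportedly available even on the free plan.
- 🤔 The video closes by asking whether it is still worth keeping the ChatGPT Plus plan.
- 📹 The author plans a follow-up video on why users might still want to keep ChatGPT Plus.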
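The timing in the video was done by hand with a phone stopwatch. The same idea can be sketched in Python as follows. This is only an illustration, not the author's method: `ask` is a hypothetical callable that sends a prompt to a model and returns its reply, and `fake_model` is a stand-in so the snippet runs on its own.

```python
import time
from typing import Callable, Tuple


def time_response(ask: Callable[[str], str], prompt: str) -> Tuple[str, float]:
    """Send one prompt and return (reply, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    reply = ask(prompt)
    elapsed = time.perf_counter() - start
    return reply, elapsed


def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for a real model call; sleeps briefly to simulate latency.
    time.sleep(0.01)
    return f"echo: {prompt}"


reply, seconds = time_response(fake_model, "same prompt for both models")
print(f"{seconds:.2f}s -> {reply}")
```

Passing the same prompt to two different `ask` callables would give a like-for-like comparison, although a single run per model, as in the video, says little about run-to-run variance.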
Q & A
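What is the video about?
-The video is a comparison test of OpenAI's GPT-4 and GPT-4o, meant to check whether GPT-4o really is faster and more effective than GPT-4.
How does the author test the new and old models?
-The author times each response and reads the answers, so both response speed and answer quality are assessed.
What feedback did the author receive on Twitter?
-The author received responses on Twitter saying GPT-4o is better than GPT-4, which he took as a good sign.
Which features were used to evaluate the models?
-The tests used web browsing, PDF analysis, image processing, and coding.
In the web-browsing test, which model used more references?
-GPT-4o used more references and gave its answer about 4 seconds faster than GPT-4.
In the PDF analysis test, which model's answer was better structured?
-GPT-4o's answer was better structured and easier to read, while GPT-4's formatting and structure were worse.
In the coding test, which model was faster and produced higher-quality output?
-GPT-4o was faster, and its code was more specific and of higher quality.
Which pets does the author mention during the tests?
-The author mentions an Australian shepherd and a husky as possible pets.
In the article-writing test, which model was faster and clearer in structure?
-GPT-4o was faster, with a clearer structure and better quality.
What conclusion does the author reach?
-The author concludes that GPT-4o beats GPT-4 on both speed and quality, and that GPT-4o is reportedly available to free users at no cost.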
Outlines
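🚀 Testing the new GPT-4o model
The first part of the video introduces a new AI model, GPT-4o, which is claimed to be faster and more effective than the previous version. To check this claim, the author gives both models the same prompts and times them to compare response speed and quality. The author also mentions feedback received on Twitter suggesting the new model is indeed an improvement. The video covers tests in coding, PDF analysis, image processing, and more.
🔍 GPT-4 versus GPT-4o comparison tests
The second part walks through a series of comparison tests between GPT-4 and GPT-4o. The web-browsing test comes first: GPT-4o beats GPT-4 on both speed and number of sources cited. Next is a PDF analysis test, where the two models take about the same time and pick different key facts in support of the paper's thesis, but GPT-4o's answer is better formatted and structured. In the coding test, GPT-4o beats GPT-4 on both the quality and the speed of its React code. Finally, in the copywriting test, GPT-4o again does better on article structure and speed. The author concludes that GPT-4o is better than GPT-4 in both speed and quality, and notes that free-plan users reportedly get GPT-4o at no cost, which raises the question of whether the ChatGPT Plus plan is still worth keeping.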
Keywords
💡GPT-4o
💡Efficiency
💡Comparison test
💡PDF analysis
💡Coding
💡Response time
💡Formatting
💡Copywriting
💡Article writing
💡User feedback
Highlights
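The video introduces OpenAI's new model, GPT-4o, and sets out to test whether it is faster and more effective than the previous version.
User feedback on Twitter gives an early indication that the new model may perform better.
A timer is used to quantify each model's speed, and the answers are read to assess quality.
The tests cover coding, PDF analysis, image processing, and more.
In the web-browsing test, GPT-4o referenced more websites and was still about 4 seconds faster than GPT-4.
GPT-4o's answer was more detailed than GPT-4's in some respects, such as analyst sentiment.
In the PDF analysis test, GPT-4o and GPT-4 took the same amount of time, but GPT-4o's formatting and structure were better.
The two models chose different facts to support the PDF's thesis; viewers are left to judge which is more relevant.
In the coding test, GPT-4o produced more specific and more useful code.
GPT-4 did worse than GPT-4o in the coding test, notably by not providing any CSS.
In the copywriting test, GPT-4o was roughly twice as fast as GPT-4, with a clearer structure.
The video concludes that GPT-4o is better than GPT-4 in both speed and quality.
GPT-4o is reportedly available even on the free plan, which calls the value of the ChatGPT Plus plan into question.
The video ends by announcing a follow-up video on whether it is still worth keeping ChatGPT Plus.
The video also points to another video on downloading the ChatGPT desktop app, which is said to offer features not available in the web version.
The video closes on a humorous note, telling viewers that clicking a random video may or may not do something.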
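As a quick check on the "faster" claims, the stopwatch readings quoted in the transcript can be turned into ratios. The figures below are the approximate, hand-timed, single-run values the author reads out; the dictionary layout and the `speedup` helper are illustrative assumptions, not something from the video.

```python
# Approximate stopwatch readings in seconds, as quoted in the transcript.
# Single runs timed by hand, so treat them as rough figures.
timings = {
    "web browsing": {"gpt-4o": 10.0, "gpt-4": 14.0},
    "pdf analysis": {"gpt-4o": 17.5, "gpt-4": 17.5},
    "react code": {"gpt-4o": 36.0, "gpt-4": 56.0},
    "article": {"gpt-4o": 27.0, "gpt-4": 60.0},
}


def speedup(task: str) -> float:
    """How many times longer GPT-4 took than GPT-4o on a task."""
    t = timings[task]
    return t["gpt-4"] / t["gpt-4o"]


for task in timings:
    saved = timings[task]["gpt-4"] - timings[task]["gpt-4o"]
    print(f"{task}: GPT-4o saved {saved:.1f}s, ratio {speedup(task):.2f}x")
```

On these numbers only the article test comes close to the "twice as long" figure mentioned in the video; the PDF test is a tie on time and differs only in formatting.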
Transcripts
we got a new model from OpenAI called
GPT-4o now what we've been told about
this model is that it's a lot faster and
a lot more effective than GPT-4
therefore in today's video we're going
to put it to the test I'm going to time
both of these models ask them the exact
same prompt and see if it's actually
faster and then we're going to just read
the response and see if it's actually
better welcome back y'all in today's
video we're going to gut check GPT-4o
and see if it's actually better than GPT-4
on my Twitter I've been getting some
responses telling me it is better so
that's a good indication make sure to
follow me here let's see if it actually
is though cuz a lot of times when we are
told that a new model's come out oh my
gosh it's so much better and you
actually play with it and you're just
like wait I don't really see a
substantial difference this is why we're
actually going to quantify that with a
timer and read it let's go ahead and
begin I have two chats open one is GPT-4
one's GPT-4o
let's see if it works better we're
going to do coding we're going to do PDF
analysis we're going to do images we're
going to do everything above the board
let's begin so my timer looks good but
you can see there's a little like
there's like a void that's just because
the button's green green screen I'm
going be timing both of these we're
going to be looking at this together the
first problem we're going to try out is
going to be basing it off its web
browsing feature I've been getting
feedback that it's a lot faster than it
used to be let's try it I'm going to go
ahead and shrink myself so I don't take
up anything I can't run both at the same
time just cuz how it's built and as you
see like it will switch the models so
we're going to use my timer on my
phone I'm going to do 4o here let's see
if this thing is actually better so I'm
going to reset this and right when I hit
enter here I'm hit start
boom okay we're looking at around 10
seconds I know this is like this is
probably not the best because you it's
like flipped but 10 seconds for 4o same
question 4 and boom wait oopsie and
boom so left has three right has two and
that was around 14 seconds two major
things that we just took from this is
first one is that the one on the left
used more sites to reference uh left is
4o right is 4 I'm just going to say
ChatGPT 4o and 4 I would say 4o 4
used two sites and it took four seconds
longer to give us the answer let's read
the answers now when comparing the
two answers it seems like 4o is better
and you can pause the video and check
it out yourself but the one thing I really
like about the 4o response specifically
to the question I asked is analyst
sentiment remains positive with a buy
consensus and then this one doesn't
really outline that but it does have
like an overall positive you know
sentiment on the underlying stock and we
go to the first prompt here 4o
definitely outshines 4 now another big
thing that a lot of people come to my
channel for or seen videos on is PDF
analysis or dealing with large data
files can 4o perform better than 4 in
this context let's go ahead and upload a
PDF study about sugar I got a free PDF
here found on the internet about the
impact of sugar on the body the brain
and behavior so we're going to ask a
couple questions here first let's go
ahead and attach this file and see what
it comes up with I'm going to click
attach here we have uploaded the PDF
let's ask a prompt got to make sure I
switch it back to 4o as you know anytime
you do one chat in one area it will
switch to the default model there we're
going to go ahead and I guess I need to
reattach the PDF with this PDF attached
we're going to ask the question what is
the biggest fact that supports the
thesis in this PDF so me saying PDF a
lot hit enter here and start
boom all right so we're looking at a
time around 17 17 and a half seconds to
give us an answer the biggest fact that
supports the thesis so in this context
it says the biggest fact that supports
the thesis is excessive sugar intake is
associated with obesity inflammatory
diseases and so on let's see what GPT-4
says PDF is attached we're in ChatGPT 4
we're asking the same question let's hit
start other one took 17 and a half
seconds uh-oh same amount of
time but definitely worse formatting and
worse uh structuring of the answer
itself this is three paragraphs easier
and more legible this is one paragraph
let's see what it says for the biggest
fact though interesting so they both
actually chose different facts here GPT-4
went with the fact that sugar is an
addictive behavior GPT-4o though went
over the fact that it causes different
types of diseases therefore you should
make a judgment on which fact you think
is more relevant in this context
let's try something else let's try some
coding yes coding we're going to go
ahead and see which one codes better one
of the easiest ways for me to see which
one could perform better on code is to
take a screenshot of a website and ask
for the JSX in React for the code so
we can see which one formats the code
better which code looks better and then
overall see which one is faster yes I'm
doing the Chewy website again do I have
a pet no if I were to get a pet what
would it be I'm thinking Australian
shepherd
I'm not too sure yet I would prefer a
husky but I feel like you got to live in
a very cold environment and I would
never want to put a husky in the summer
in a very like you know if I'm in like
the heat you don't want to husky in the
heat come on you want to be in the cold
environment for that slight sidetrack
and I didn't realize they sell for
chickens or farm animals okay let's
screenshot this now that I've screenshot
it I'm going to Simply add it as an
attachment click the little paper clip
we are loading up the image here and
also if you didn't even know you could
do this this is pretty cool I'm going to
open this this is the screenshot we have
here let's make our prompt now I can get
more complex on how I structure this
prompt if you actually want to learn how
to code with ChatGPT check out this
video right here I show you how I code
with ChatGPT everything in detail with it
it isn't as simple as just like a
one-liner like this let's go and test
this right here resetting my phone and
my clock and we're going to get hit boom
and we're going
okay this is what I love about GPT like even
if the code is not perfect at least it
gives you like some structuring right
cuz who wants to type all that all right
we'll keep going here and
stop around 36 seconds it took for this
output we're going to quality check it
well let's go ahead and first ask that
same question to ChatGPT 4 we're going to
add this file add the exact same prompt
no bias found here we're going ahead and
reset the timer here reset and we are
going to proceed last one was 36 seconds
we are still going here we're reaching
40 seconds here it hasn't given us CSS
it's just JSX right now oh okay we're
seeing a pretty dramatic difference here
y'all and
stop 36
seconds 56 seconds
here 20 seconds more and the quality of
output was not as good how do you know
that Corbin right off the bat it didn't
give me even though it's simple CSS it
didn't even give me some CSS I could
play around with it just gave me the JSX
on top of that let's go and go to the
top of each file here imported a
slightly relevant class I suppose but
coming to the actual structuring itself
hm this one took more of a route of let's
just be more general smooth the cake
this one took a route of let's be more
specific what could be wrong like one
thing that could be very annoying in the
context of this output is it assumed all
these different images and the paths to
the images like it didn't this one
didn't assume that which is better
because if I wanted to give like it gave
it one time here this one gave it
multiple times here and when dealing
with code and dealing specifically a
path to an image you would probably want
to provide that so it just puts it out
right off the bat furthermore we got the
CSS here and this was overall a faster
output so it seems like 4o is going to
take the cake there last test here let's
go ahead and see how good it is at
copywriting so specifically I'm going
to put a prompt here that's pretty long
but it's going to be for the context of
creating an article so boom we got a
very long prompt here we're going to go
ahead and put the desired headline for
said article we will go ahead and just
say poses impact on your health good and
bad I'm going to command a command C and
before we begin here let me make sure I
time this and boom it's going faster
it's structuring it
nicely I don't know if nicely is a word
or not there you go in 27 seconds let's
try this exact same prompt in GPT-4 27
seconds here I'm going to reset and boom oh
they give us a big header okay y'all
there is actually a very significant
difference between the 4o output and the
4 and we have quantified that in
today's video because right now we are
reaching around we have reached around
an endpoint of a minute so a minute to
do this output compared to this
output which was 27 seconds here and
just from a brief skimming look how it's
structuring
this compared to how we got this one
structured out here which when it does
provide a specific point it's one
sentence and just not as good and it
took basically twice as long therefore
is ChatGPT 4o better than 4 I think
we can come to the conclusion yes we've
quantified it with time when it comes to
speed and then we also double checked it
with quality and overall it seems like
4o is better than 4 what's even
crazier is that supposedly if you have a
free plan you get access to 4o for free
so I'm going to make a video later this
week of why you even keep your ChatGPT
Plus plan as that has come into question
for a lot of people so make sure to
subscribe make sure to leave a like if
you found value in today's video and
I'll see you in the next video this
video right here shows you how to
download the ChatGPT app on your
desktop what is that Corbin there's a
lot of cool stuff that we couldn't do on
the web version that's a random video
maybe good maybe bad that's my face
click it something may or may not happen
I don't know