OpenAI o1 VS Sonnet 3.5 in Coding Physics Games - AI Showdown
Summary
TLDR: In this video, the speaker compares two AI models, Sonnet 3.5 and OpenAI's new o1 model, by challenging them to build a car parking simulator with physics. Sonnet 3.5 failed in earlier attempts, while the OpenAI model succeeds in a single attempt. The speaker then gives the OpenAI model a harder task, a 3D parking simulator, where it runs into difficulties and makes errors that point to its limits.
Takeaways
- 😀 The video pits two AI models against each other to test how well each can build a parking simulator with physics.
- 🤖 Anthropic's Sonnet 3.5 is compared with OpenAI's new o1 model, which comes in two versions: o1-preview (larger, slower, more expensive, better) and o1-mini (faster, smaller, cheaper).
- 🚗 The challenge is to build a GTA-style top-down game with realistic physics and wheel behavior, something Sonnet 3.5 had not managed before.
- 📊 OpenAI's o1 models show a significant performance jump on mathematical problem solving compared with earlier models such as GPT-4.
- 💡 The o1 models are specifically trained to think for longer before solving a problem; the raw reasoning stays hidden and the user sees only a summary.
- 🔧 o1-preview built the parking simulator in a single attempt, which Sonnet 3.5 could not do.
- 🎯 Starting from o1-preview's code, WebSim was able to improve the game through iteration, adding parking spots, a speedometer, and a score.
- 🛠 The demands on the models are high: they have to write code and also understand how external libraries and physics engines work.
- 🔄 Even advanced models such as o1-preview cannot solve every task perfectly; iteration and errors remain part of the process.
- 🔮 The future of AI development appears to point toward using smarter models for hard problems and more efficient models for further refinement.
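The parking spots and score mentioned in the takeaways can be driven by a simple geometric check. The sketch below is only an illustration, not the code WebSim produced: it assumes an axis-aligned rectangular spot, and the names `isInsideSpot` and `parkingScore`, the speed threshold, and the rule "100 points for a full stop inside the spot" are all hypothetical.

```javascript
// Hypothetical parking check: a spot is an axis-aligned rectangle
// { x, y, width, height } and the car is described by its corner points.
function isInsideSpot(points, spot) {
  return points.every(
    (p) =>
      p.x >= spot.x &&
      p.x <= spot.x + spot.width &&
      p.y >= spot.y &&
      p.y <= spot.y + spot.height
  );
}

// Assumed scoring rule: 100 points when every corner is inside the spot
// and the car is (almost) standing still, otherwise 0.
function parkingScore(corners, spot, speed, maxParkedSpeed = 1) {
  const parked =
    isInsideSpot(corners, spot) && Math.abs(speed) <= maxParkedSpeed;
  return parked ? 100 : 0;
}

// Example usage with made-up numbers.
const demoSpot = { x: 100, y: 100, width: 60, height: 120 };
const demoCorners = [
  { x: 110, y: 110 },
  { x: 150, y: 110 },
  { x: 110, y: 200 },
  { x: 150, y: 200 },
];
console.log(parkingScore(demoCorners, demoSpot, 0));
```

A rotated spot would need the corner points transformed into the spot's local frame first; the axis-aligned case keeps the idea visible.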
Q & A
Which two models are compared in the video?
-The video compares Sonnet 3.5 and OpenAI's new o1 model.
What is the main goal of the comparison?
-The main goal is to see how well each model can write a parking simulator with physics.
Why had Sonnet 3.5 previously failed to build a parking simulator?
-Sonnet 3.5 failed repeatedly because it could not correctly implement the simulator's complex requirements, such as realistic physics and rotating wheels.
What is one example of o1's improved performance?
-According to the OpenAI blog post cited in the video, o1 solved 83% of the mathematical problems correctly, whereas the previous model answered only 13% correctly.
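The two figures above are described in the video both as a "6x improvement" and as a "600% improvement". A quick calculation, using only the 13% and 83% scores quoted there, shows how the ratio and the percentage increase differ (the variable names are illustrative):

```javascript
// Scores quoted in the video: previous model 13%, o1 83%.
const previousScore = 13;
const o1Score = 83;

// Ratio: how many times larger the new score is.
const ratio = o1Score / previousScore;

// Relative increase in percent: (new - old) / old * 100.
const percentIncrease = ((o1Score - previousScore) / previousScore) * 100;

console.log(ratio.toFixed(2)); // 6.38
console.log(percentIncrease.toFixed(0)); // 538
```

So the score is roughly 6.4 times higher, which corresponds to an increase of about 538% rather than 600%.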
How does o1 differ from earlier models?
-o1 was specifically trained to think for longer, uncensored and hidden from the user; it then summarizes that reasoning, shows the summary to the user, and proceeds to solve the problem.
What limits apply to using o1?
-There is a limited number of calls per week: 30 for the larger model (o1-preview) and 50 for the smaller model (o1-mini).
What is the uploader trying to achieve with o1 that he could not achieve with Sonnet 3.5?
-The uploader is trying to build a parking simulator with realistic physics and rotating wheels, which he could not get Sonnet 3.5 to produce.
How did o1 perform when building the parking simulator?
-o1-preview produced a working parking simulator in a single attempt, which is impressive because it ran without errors and without corrections.
What did the uploader show by passing o1's result on to WebSim?
-The uploader showed that the output of a smarter model (o1) can be used correctly and improved by a less capable model (WebSim, which runs Sonnet 3.5).
What does the uploader try next to test o1's limits?
-The uploader gives o1 an even harder test by asking it to build a 3D parking simulator with realistic physics.
Outlines
🚀 Comparing AI models on a parking simulation game
This section introduces a comparison between Sonnet 3.5 and a new OpenAI model to see how well each can build a parking simulation game with physics. The speaker describes his earlier experiments with Sonnet 3.5, which failed repeatedly. The OpenAI model is introduced as a reasoning model available in two versions: o1-preview, which is larger and slower, and o1-mini, which is faster and cheaper. The tests show that the OpenAI model builds a working game on the first attempt, while Sonnet 3.5 keeps failing.
🔍 Challenges in building a parking simulation game
This section explains why a parking simulation game with physics is so hard to build. Even small mistakes in the physics implementation can make the game unusable. The speaker describes how he tried to get a model to build such a game and how this did not work with Sonnet 3.5. The OpenAI model was able to produce a game that meets the requirements, although it is not yet perfect.
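To make the difficulty concrete, here is a minimal sketch of one common way to move a top-down car without a physics engine: a kinematic bicycle model with simple friction. This is not the code any model in the video produced; the function name `stepCar`, the state fields, and all constants are assumptions chosen for illustration.

```javascript
// Hypothetical car state: position (x, y) in pixels, heading in radians,
// speed in pixels per second. wheelBase is the distance between axles.
function stepCar(car, input, dt) {
  const { wheelBase, accel, friction, maxSpeed } = car;

  // input.throttle is in [-1, 1]; input.steer is the front wheel angle in radians.
  let speed = car.speed + input.throttle * accel * dt;

  // Simple linear friction: the car slows down when no throttle is applied.
  speed -= speed * friction * dt;
  speed = Math.max(-maxSpeed, Math.min(maxSpeed, speed));

  // Kinematic bicycle model: the turn rate grows with speed and steering angle.
  const heading =
    car.heading + (speed / wheelBase) * Math.tan(input.steer) * dt;

  return {
    ...car,
    x: car.x + speed * Math.cos(heading) * dt,
    y: car.y + speed * Math.sin(heading) * dt,
    heading,
    speed,
  };
}

// Example usage: accelerate for one second while steering slightly.
let demoCar = {
  x: 0, y: 0, heading: 0, speed: 0,
  wheelBase: 50, accel: 100, friction: 0.5, maxSpeed: 200,
};
for (let i = 0; i < 60; i++) {
  demoCar = stepCar(demoCar, { throttle: 1, steer: 0.3 }, 1 / 60);
}
console.log(demoCar);
```

A sign error in the heading update or a missing `dt` is enough to make the car spin or fly off the canvas, which is the kind of fragility the section describes.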
🚗 Improving the game through further iteration
The speaker shows how he has the game produced by the OpenAI model improved further by handing it to the less capable model in WebSim, which extends it. Features such as a speedometer, parking spot markings, a score, and buildings are added. Through iteration the game keeps getting better, and the section shows how different models can work together to produce a better end result.
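The transcript mentions that the first version drew a single trail from the middle of the car, and that the speaker then asked for four trails, one per wheel. A rough sketch of how wheel positions can be derived from the car's pose follows; it assumes canvas coordinates (y grows downward) and a car described by its centre, heading, `wheelBase`, and `track` (distance between left and right wheels). All names are hypothetical.

```javascript
// Returns the world positions of the four wheels for a car centred at
// (car.x, car.y) and rotated by car.heading (radians).
function wheelPositions(car) {
  const halfL = car.wheelBase / 2;
  const halfW = car.track / 2;
  const cos = Math.cos(car.heading);
  const sin = Math.sin(car.heading);

  // Local offsets: +x is forward; with canvas y pointing down,
  // -y is the driver's left and +y is the driver's right.
  const offsets = [
    [halfL, -halfW], // front left
    [halfL, halfW], // front right
    [-halfL, -halfW], // rear left
    [-halfL, halfW], // rear right
  ];

  // Rotate each offset by the heading and translate to the car's position.
  return offsets.map(([lx, ly]) => ({
    x: car.x + lx * cos - ly * sin,
    y: car.y + lx * sin + ly * cos,
  }));
}

// Appends the current wheel positions to four trail arrays and
// drops the oldest points once a trail exceeds maxPoints.
function recordTrails(trails, car, maxPoints = 500) {
  wheelPositions(car).forEach((p, i) => {
    trails[i].push(p);
    if (trails[i].length > maxPoints) trails[i].shift();
  });
}

// Example usage with made-up dimensions.
const trailDemo = [[], [], [], []];
recordTrails(trailDemo, { x: 100, y: 100, heading: 0, wheelBase: 40, track: 20 });
console.log(trailDemo.map((t) => t[0]));
```

The same rotation applied to the body's half-length and half-width would give the car's corners, which the speaker says he would prefer to trace.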
🛠 Live testing and iteration with o1-preview
This last section describes a live test with o1-preview. The speaker wants a 3D parking simulation game and sets demanding requirements for its realism and interactivity. The model tries to meet the requirements but struggles to turn them into working code. The section stresses that, although the model is fast and can handle more complex tasks, it is not yet at the level of an experienced human developer and still fails on some challenges.
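In the live test, the speaker reports that the wheels lie parallel to the ground. The video does not show the cause, so the following is only an assumption about one common source of this symptom: in three.js, `CylinderGeometry` is built along its local Y axis, so a wheel made from it lies flat until it is rotated so that its axis points sideways. The sketch below shows the underlying rotation in plain JavaScript without any library; `rotateAboutZ` is a hypothetical helper.

```javascript
// Rotates a 3D vector [x, y, z] about the Z axis by the given angle (radians).
function rotateAboutZ([x, y, z], angle) {
  const c = Math.cos(angle);
  const s = Math.sin(angle);
  return [x * c - y * s, x * s + y * c, z];
}

// A cylinder built along the Y axis has this as its axis of symmetry.
const cylinderAxis = [0, 1, 0];

// Rotating by 90 degrees about Z turns the axis into the X direction,
// which is where the axle of a car driving along Z should point.
const axle = rotateAboutZ(cylinderAxis, Math.PI / 2);
console.log(axle.map((v) => Math.round(v)));
```

In a three.js scene the equivalent step would be setting `rotation.z = Math.PI / 2` on the wheel mesh, and the matching physics body would need the same orientation. Whether that was the actual bug in the generated code is not known from the video.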
Keywords
💡Sonnet 3.5
💡OpenAI model
💡Parking simulator
💡Physics engine
💡Spell checking
💡Code generation
💡Canvas
💡JavaScript
💡CSS
💡Interactivity
Highlights
Comparing Sonnet 3.5 and OpenAI's new model for coding a car parking simulator with physics.
Sonnet 3.5's repeated failures in creating a physics-based car parking simulator.
Introduction of OpenAI's new reasoning model, featuring two versions: o1-preview and o1-mini.
Limited access to the new models with 30 and 50 calls per week respectively.
The new model's ability to solve 83% of International Mathematics Olympiad problems, compared with 13% for the previous best model, roughly a 6x improvement.
The model's training to think longer and uncensored, then summarize its thinking for problem-solving.
Testing the new model by asking it to build an educational parking game with realistic physics.
Claude's initial failure to create the game, despite understanding the prompt.
WebSim's attempt with the same prompt, resulting in errors and eventual partial success.
The complexity of integrating physics engines for top-down games and the model's challenges in handling it.
OpenAI's o1-preview model's successful creation of a basic, working car parking simulator in one attempt.
The impressive single-shot success of o1-preview compared to Sonnet 3.5's repeated failures.
Demonstration of using the output from o1-preview as a starting point for further improvements with a less capable model.
The potential of using smarter models for complex tasks and cheaper, faster models for routine improvements.
Live testing with o1-preview to push its limits by asking for a 3D parking game with physics.
The model's struggle with the complexity of creating a 3D game, showing it's not yet at human-level development.
The model's iterative process in attempting to correct the wheel positioning in the 3D game.
Final thoughts on the model's capabilities, its room for improvement, and future potential.
Transcripts
hello today in this video we're going to
pit two models one against the other to
see how they perform one model is Sonnet
3.5 that you can use in Claude or in
WebSim another model is a new OpenAI o1
model and the way we're going to do it
is by asking them to write a car parking
simulator with
physics and this is something that Sonnet
3.5 repeatedly failed to do for me before
so it's going to be interesting why is this
going to be interesting well o1 is the
new OpenAI reasoning model it's
actually two models in ChatGPT you will see
o1 preview which is the larger slower more
expensive better model and there is o1
mini which is faster smaller cheaper you
will only get 30 calls to o1 preview per
week and 50 calls to o1 mini per week
so kind of limited for now now what does it
mean that it's a reasoning model well
one example OpenAI gives in their blog is
this number their previous best model
GPT-4 could only solve
13% of mathematical problems correctly
in international mathematics Olympiad
while o1 can score 83% this is a 6x
Improvement not 100x Improvement it's
still 600% Improvement so not as high as
hype but pretty high and very very
impressive how does it do it well it was
specifically trained to think for longer
uncensored in a way that OpenAI hides so
it thinks but you do not see it then it
summarizes and censors its thinking
shows it to you and proceeds to solve
your problem and answer your questions
this together allowed it this kind of 6X
Improvement which is again very
impressive but I wanted to test it for
myself how can I myself build this
Improvement
and what is the easiest way to do this
to answer like is it better the easiest
way is by comparison so I want to
compare it today to a previous best
model I know of which is Sonnet 3.5 from
Anthropic in Claude and WebSim now however
good the model is however impressive it
is it did very impressive things for me
before surprised me I asked it once to
write for me educational game of how to
park top down game with physics with
rotating Wheels with acceleration with
brakes
and with Trails for car wheels to show
where everything is when you drive kind
of a visualization for you to learn how
to park correctly and this is what I
asked before I asked it to write this
kind of game and it was failing time and
time and time again well today we will
try and see what o1 can do for this
problem so this is a fight and this is
round one let's
go okay so here is a chat I had with Claude
I copy pasted my prompt it lost the
formatting I'm asking here for HTML CSS
and JavaScript game in GTA top down
style it should be parking game car
should have four wheels they should
rotate they should be realistic physics
and friction and it should be parking
game it leaves trails uh behind its
wheels like trails of wheels okay not a
super prompt there just something I
wrote
quickly here is what Claude understood and
tried to do so it said yes I will create
for you this kind of HTML CSS JavaScript
game with rotating Wheels realistic
physics and wheel
trails and wrote like everything else I can
see that it understood correctly what I
wanted and it wrote
this a gray square nothing happens you
cannot drive nothing happens no errors
nothing so it fails Claude failed from one
shot we can take a look at code as far
as code goes it didn't use any kind of
third party libraries there is some CSS
and there is Javascript it uses canvas
it wrote it all on its own it's just not
working considering there were not even
errors it's even hard to say what's
wrong okay next let's see what happened
in WebSim it uses the same model we can see
that I have here the same prompt just
formatted here with lines and from first
try also failed it showed an error and
when I've tried to fix the error it did
this exactly like I wanted
right uh I did ask it to change some
things and got
this now at least it doesn't fall down
partially problem is that I asked it for
physics game and it used a physics engine
made not for top-down games but for
side-scroller
games but we can also here see why this
prompt is problematic it includes using
physics engines a third party dependency
and physics engines are notoriously very
flaky in a sense that even small
mistakes in how physics Works makes them
go crazy that's what you see here so
asking model to write this kind of code
it needs to write a code it needs to
understand how to use library and it
needs to understand what I asked in
context that I want a top- down parking
game how do you use physics engine
correctly to make it work and also
render correctly this is a very it's not
easy CU we are asking a textual model it
never seen things how can it reason
about physics and how to use physics
engine for it to correctly simulate what
what I'm asking this is this is hard
even with iteration with Sonnet 3.5 in WebSim
or in Claude this thing fails for me I've
tried multiple and multiple times before
I wanted to make this to help my wife
illustrate how parking works when and
how you should reason about where and
where your car is and how different
parts trajectories go when you're
parking and I couldn't do it and I don't
don't have time to sit down and do it on
my own it could take takes it's not
actually easy to do this from scratch to
work correctly so Sonnet and Claude and
WebSim fail for me with this kind of prompt
let's take a look at what OpenAI o1 did
here is the same prompt here is the chat with uh
o1 preview the slower more expensive
smarter model it thought for 24 seconds
it didn't show whole reasoning it just
showed summary of it and we here see
that it says crafting the simulation I'm
creating GTA 2 style top down car game
it will be HTML and
JavaScript it will Design on
canvas will be
controls mapping out the game I'm
setting up HTML the game canvas will
serve as rendering area setting up
setting up environment I'm pulling
together the game canvas driving car
class focusing on speed acceleration
steering position updates and rendering
we can see that it's reasoning through
in different ways through what it's
going to be doing then it would have
enhancing car physics I'm working
through realistic car physics including
wheel rotation it's like self-talk you
know like it's self-affirmation what
humans do in the sense that what is it
that I'm doing now oh yeah that's what
I'm doing okay let's do
it so interesting what's interesting
that Claude also does that we can see here
some of its reasoning being put out but
there is also hidden part of the
reasoning that they do not show we know
from prompts that they have thinking parts
and they are not showing them here
they're hiding them so Claude does this too
and this does this too does it perform
better well it generated this kind of
code actually big one usually ChatGPT was
not generating such a big one code from
what I seen o1 preview and o1 mini can
output very large answers something like
eight or more like 8 to 16 times larger
than before this is not 8 to 16 times
larger than before but it's a large
answer and we can say that it gives us
explanation like there is HTML canvas is
of this size there are colors there is
car code it explains everything about
the code you can read it and learn on
some level of what the code it wrote
does
it has features realistic physics how to
use use with this arrows and we have
code what we can do is we can copy that
code we can go to CodePen paste that
code let's save so that we have empty
one save again and this is what we
got it's a little bit
unfinished but it's
working Wheels rotate going I can go
back I can go forward and it is drawing
a line only issue is that this line
seems to be the middle of the rectangle
of the car it's a little bit not what I
want I probably would draw actually
Trails of corners of the car's body I
think this is the most important thing
for this kind of game but um it did it
it did it from first try single prompt
one shot no correction it just
works this is impressive
so that's it I just compared Sonnet to the o1
model o1 did in one shot something I was
trying to achieve with WebSim or
Claude it's not
perfect then one more interesting thing
happened I gave this code to WebSim
to see what it's going to do let's take
a look at
that so I think it was here here I gave
WebSim the same code and it worked we
have the same car that o1 generated but
WebSim added other things we can see
speedometer we can see parking spots we
can see score we can see buildings and
if we go into the building it's going to
become
red we can go and try to
park oh we got 100
score and four parking spots remaining
we can go to the next
one I also wanted a little bit more work
around the
trails now we have four
trails and we have also some
settings so we can change acceleration
and some other things now it rides
faster we we we we we we oh my
God so what happened here and we can go
and see the prompts so we can see that
first prompt was just code that o1
generated and then I said that make it
draw four trails one for each wheel leave
other things as is add settings
control minimal and maximal speed
friction control and so on make left
right arrows rotate the wheel and so on
so on so I I iterated little bit and I
got even better version so what's
happening here is is I've gave result of
smarter model presumably seems like o1
is smarter than Sonnet 3.5 I gave thinking
from smarter model to a less capable
model and it was capable of using it
correctly not breaking it and improving
upon it which is interesting and
impressive use case you can use smarter
models to get you started or for hard
problems and for other things use
cheaper faster models it's an
interesting future we're going into so
this is another thing I wanted to show
now all of what I showed you I've did
yesterday evening now I want to do a
little bit of live testing with you the
GPT-4 model and here I want to create so
we using preview I want a new
chat so what I'm thinking I still
slightly dislike this I think it's not
realistic enough in some
ways it doesn't really behave like a
physical
car and it got me thinking while o1 is
impressive it moves the bar further
it can do more it's not that sky is the
limit ceiling has risen but like but
it's lower than the sky and this is what
I want to show you because I was playing
with it a little bit and it does fail it
just it does more complex things and
fails less it still
fails so what I was thinking let's ask
it to make a 3D game let's make it even
harder let's try to push it Until It
Breaks A little bit let's make a harder
game I want
HTML JavaScript and
CSS okay it should be in 3D with physics
it should
be parking
game there is a car four wheels front
wheels
steer wheels are round there is friction
inertia and everything else to make it
realistic purpose of the game is to
allow people to learn how to park how to
think and reason about when and why and
how you need to rotate the
wheel what are
trails and trajectories of wheels and
car
Corners so people can learn in the game
and apply that knowledge to real
parking let's ask it this what's
interesting here and I I do suspect it's
going to fail what's interesting here is
that I'm now asking it to use third
party Library game engine and it's in 3D
and it should be realistic and there is
purpose in teaching people how to park
let's see what it's going to
do it's a very challenging ask in some
ways so it's thinking it's
crafting so crafting parking game 3D in
HTML JavaScript CSS setting up 3D
elements crafting a game creating a
realistic 3D parking HTML game it will
use the three.js
library and cannon.js for physics note due
to the complexity of full-fledged game
the following code provides a basic
framework you can expand upon it to add
more features
okay
okay so what I dislike so far is that it
writes it as a bunch of different
segments I want to ask it can you write
it as one large HTML code block I can
copy and paste at
once so on the reasoning side I think
it's a good choice three.js is the most
popular library cannon.js I do not know
I have not much experience with 3D
physics but it seems more or less
correct it creates the car there is
geometry there are wheels and there is a
lot of opportunity for failure it is
very hard task so now I asked it to
do a one block it thought for 7 seconds
and wrote for me a block that actually
looks
smaller is it we can see it's probably
not as good let's create a new pen see
what this small block
did okay it's
actually the only thing it's not
writing can
I but it's almost
there um
okay I wonder if we could add all of
that let's try to
check uh
if we search for this it is here twice
okay okay let's give it feedback let's
see if it can
iterate oh I I see that wheels are in
different directions it cannot analyze
images as well so it's not a multimodal
model just yet I didn't check in maybe
it is multimodal they just didn't expose
the capability oh what's interesting it
hand it still continue to
work it didn't finish uh so let's give
it feedback it seems like wheels are not
correctly
positioned they
go parallel to the
ground so when I press arrows the car
doesn't uh ride anywhere can you
ruminate
and why and how to
fix so I don't want to make this video
too long after this iteration we're
going to stop for this last part I just
wanted to demonstrate that it's not
magic it is better in some ways it's
considerably better but it's not human
level developer yet on other hand it's
fast it is faster than human developer
doing these kind of things so it's again
noticing the issue the user has pointed
out the wheels are parallel to the ground
causing the car to stay motionless when
arrows are
pressed hang out the options analyzing
code
setup ensuring correct wheel setup
navigating orientation
differences oh it's thinking it was
taking for a while and it failed so yeah
no magic there let's
see okay this time it thought slightly
less
and it's writing new code well let's
keep some fingers crossed maybe it's
going to
work as you can see it writes more code
I wonder how many lines that is I
remember they were speaking about
something like 32,000 16,000 64,000
tokens this is a lot it wrote A big one
it explains what it try to
fix it even wants to run oh boy this is
going to be interesting in agentic
workflows it speaks about how to test
that you will not run to server open it
I wonder if it's going to be able to
manipulate a browser
eventually okay so we here have no new
code let's give it a try
save and I think it completely failed and
it failed so not magic it's better and
impressive but still will require
iteration there will be type of problems
for which it's going to be failing this
is all I wanted to show today it's
better it's going to be interesting to
see where it goes this was round one in
it it did win against Sonnet 3.5 I will be
playing with it more during next weeks
and I will make another video uh if I
learn anything interesting so if you
like this kind of things push subscribe
down there and see you next time