New ChatGPT o1 VS GPT-4o VS Claude 3.5 Sonnet - The Ultimate Test

Skill Leap AI
19 Sept 2024 · 16:05

Summary

TLDR: In this video, the presenter compares OpenAI's new ChatGPT o1 model with the GPT-4o model across 10 different prompts. They also test a custom GPT built with Chain of Thought prompting and a Claude project from Anthropic, powered by Claude 3.5 Sonnet. The test aims to see whether o1 can outperform not only GPT-4o but also these other setups. The video includes tests on letter counting, logical reasoning, and coding challenges. The o1 model shows strong results, particularly in coding and logical reasoning, suggesting it is superior to GPT-4o and the other models tested.

Takeaways

  • 🤖 The video compares OpenAI's new ChatGPT o1 model with the older GPT-4o model.
  • 🔍 The test includes 10 different prompts to evaluate the models' performance.
  • 💡 The creator also built a custom GPT using Chain of Thought prompting to replicate the o1 model's behavior.
  • 🌐 The test incorporates prompts from OpenAI and from Matthew Berman's video for a comprehensive comparison.
  • 📝 The first prompt asks how many R's are in 'strawberry', which all models answered correctly.
  • 🐣 The 'chicken or the egg' question was used to test the models' ability to provide scientific explanations.
  • 📊 A math question comparing two numbers (9.11 vs. 9.9) was used to assess the models' numerical reasoning.
  • 🎱 A logic puzzle about a marble and a glass cup was used to test the models' spatial reasoning.
  • 📝 A word-count test was used to evaluate the models' ability to perform simple counting tasks.
  • 🕵️‍♂️ A 'hallucination test' was conducted to see if the models would make up information about a non-existent mango cultivar.
  • 💻 A coding test, creating a game of chess in Python, was used to assess the models' programming capabilities.
  • 🏆 The o1 model outperformed GPT-4o, the custom GPT, and Claude in the overall test.

Q & A

  • What is the main focus of the video?

    - The main focus of the video is to compare the performance of OpenAI's new ChatGPT o1 model with the GPT-4o model and other AI setups on various prompts.

  • How many different prompts were used in the test?

    - The video mentions that 10 different prompts were used in the test.

  • What is the purpose of testing against a custom GPT built by the video creator?

    - The purpose is to see whether a custom GPT can replicate the Chain of Thought prompting that the o1 model is believed to use, and to compare its performance.

  • Which AI model is also tested in the video besides the custom GPT and GPT-4o?

    - In addition to the custom GPT and GPT-4o, the video also tests a Claude project powered by Claude 3.5 Sonnet.

  • What is the first test question mentioned in the video?

    - The first test question is 'How many R's are in "strawberry"?'

  • What is the significance of the chicken-or-the-egg question in the video?

    - The chicken-or-the-egg question is used to test the AI models' ability to provide scientifically accurate answers and their reasoning capabilities.

  • How does the video creator make the test more scientific?

    - The creator improves the test by using prompts from OpenAI and from Matthew Berman's video, which are designed to compare the models effectively.

  • What is the outcome of the marble-in-the-glass-cup test?

    - The o1 model correctly identifies that the marble is left on the table when the glass is moved to the microwave, while GPT-4o and the custom GPT incorrectly place the marble inside the microwave.

  • Which model performs best in the coding test of creating a game of chess in Python?

    - The o1 model performs best, producing a functional chess game that is closer to a complete game than the other models' attempts.

  • What is the final verdict of the video regarding the performance of the AI models?

    - The final verdict is that the o1 model outperforms GPT-4o, the custom GPT, and the Claude project in the tests conducted.

  • What additional information does the video provide about updates to the AI course and community platform?

    - The video mentions that the creator's AI course and community platform is being updated with material on the new model, with over 20 courses and an active community for questions.

Outlines

00:00

🤖 AI Model Comparison: ChatGPT o1 vs. GPT-4o

The host introduces a comparison test between the new ChatGPT o1 model from OpenAI and the existing GPT-4o model, using 10 different prompts to evaluate performance. In addition, the host has created a custom GPT with a Chain of Thought prompting system and a Claude project using Claude 3.5 Sonnet, both given the same system prompt to replicate the o1 model's behavior. The test aims to determine whether o1 can outperform not only GPT-4o but also these custom-built setups. The host also mentions using prompts from OpenAI and from Matthew Berman's video to make the test more comprehensive and scientific.

05:01

πŸ“ Counting 'R's in 'Strawberry' and the Chicken or Egg Conundrum

The first test is a simple question about the number of R's in the word 'strawberry'. Both the o1 model and GPT-4o correctly identify that there are three. The host then poses the 'chicken or the egg' question, where both models give the scientifically grounded answer that the egg came first, due to an evolutionary mutation. The custom GPT and Claude project also give comprehensive responses in line with the o1 model's. The host notes that all models pass this round of testing.

10:02

🧠 Solving Logical Puzzles and Hallucination Tests

The video continues with a logical puzzle about a marble and a glass cup, where the o1 model correctly deduces the marble's location, outperforming GPT-4o. The Claude project also answers correctly, while regular GPT-4o and the custom GPT clone fail to reason it through. A hallucination test follows, in which the o1 model avoids fabricating information about a non-existent mango cultivar, unlike GPT-4o, which hallucinates details. The custom GPT shows similar hallucination, while Claude is more cautious, admitting uncertainty even as it guesses at a few details.

15:04

🏰 Coding Challenge: Creating a Chess Game in Python

The host presents a coding challenge, asking the models to write a game of chess in Python. GPT-4o fails to produce a usable game, while the o1 model provides a near-complete one, missing only features like check handling, castling, and endgame logic. Claude 3.5 Sonnet also produces a playable-looking chess game, though without proper piece images, since it cannot provide web links for assets, and the game crashes during play. The o1 model's coding performance is highlighted as superior to both GPT-4o and Claude 3.5 Sonnet in the host's early testing. The video concludes with the host announcing updates to their AI course and community platform, emphasizing practical applications of the new model in various fields.

Keywords

💡ChatGPT o1

ChatGPT o1 refers to a new model from OpenAI that is being tested in the video. It is a language model designed to understand and generate human-like text based on the input it receives. In the video, o1 is compared with other models, such as GPT-4o, to evaluate its performance on various tasks, with the aim of seeing whether it can outperform its predecessors.

💡Chain of Thought prompting

Chain of Thought prompting is a technique used to improve the reasoning of AI models. It involves breaking a complex problem into simpler steps and spelling out the reasoning process. In the video, the creator builds a custom GPT with a set of instructions intended to replicate what the o1 model does behind the scenes, which includes Chain of Thought prompting. The technique makes the AI's thought process more transparent and improves its problem-solving.
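
As an illustration, here is a minimal sketch of wiring that kind of system prompt up with the OpenAI Python SDK. The instruction text paraphrases the prompt shown in the video; the model name and the ask() helper are placeholders, not the creator's exact setup.

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Paraphrase of the custom GPT's instructions from the video
    COT_SYSTEM_PROMPT = (
        "You are an AI assistant designed to think through problems step by step "
        "using Chain of Thought prompting. 1) Understand the problem: carefully "
        "read the user's question. 2) Break down the reasoning process. "
        "3) Explain each step. 4) After completing all steps, state the final "
        "answer. 5) Review the thought process."
    )

    def ask(question: str) -> str:
        # Every request carries the Chain of Thought system prompt,
        # mimicking how a custom GPT or Claude project applies it.
        response = client.chat.completions.create(
            model="gpt-4o",  # assumed base model for the custom GPT
            messages=[
                {"role": "system", "content": COT_SYSTEM_PROMPT},
                {"role": "user", "content": question},
            ],
        )
        return response.choices[0].message.content

    print(ask("Which number is bigger, 9.11 or 9.9?"))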

💡Claude project

The Claude project is another AI setup tested in the video, powered by Claude 3.5 Sonnet. It is compared alongside ChatGPT o1 and GPT-4o to see which performs better on the given tasks. The Claude project is given the same system prompt as the custom GPT to ensure a fair comparison, making it a useful external benchmark in the tests.

💡Matthew Berman

Matthew Berman is referenced in the script as a source of test prompts. He is a notable figure in the AI community, known for rigorously testing AI models. The video creator borrows several of his questions, indicating that Berman's tests are considered comprehensive and challenging enough to evaluate the models under comparison.

💡R's in 'strawberry'

The question of how many R's are in 'strawberry' is used to test the models' ability to count letters within a word. It is the first test conducted in the video, and all models, including ChatGPT o1, GPT-4o, and the custom GPT, correctly identify that there are three R's in the word. This illustrates the models' basic character-level processing capabilities.
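
The ground truth is easy to verify in Python (a one-line check, not something shown in the video):

    word = "strawberry"
    print(word.count("r"))  # 3: st(r)awbe(r)(r)y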

💡Chicken or the egg

The classic question 'which came first, the chicken or the egg?' is used as a prompt to test the AI models' ability to provide scientifically accurate answers. The video script notes that the correct answer, from a scientific perspective, is the egg. This keyword is significant as it tests the models' ability to handle abstract and philosophical questions while still providing factual responses.

💡9.11 or 9.9

The prompt 'Which number is bigger, 9.11 or 9.9?' tests the models' numerical reasoning. This is noted as a surprisingly hard question for large language models (LLMs), but GPT-4o, ChatGPT o1, and both custom setups correctly identify that 9.9 is greater. It demonstrates the models' ability to perform basic decimal comparisons.
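
A direct numeric comparison settles it; the classic failure mode is treating the decimals like software version numbers, where "9.11" comes after "9.9". A quick illustration (ours, not the video's):

    # As decimals: 9.11 < 9.90
    print(9.11 > 9.9)        # False
    # As version-style (major, minor) tuples, the trap LLMs fall into:
    print((9, 11) > (9, 9))  # True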

💡Glass and marble

The scenario involving a glass, a marble, and a microwave is used as a logical reasoning test. A marble is placed in a glass cup, the glass is turned upside down on a table, and the glass is then moved to a microwave; the models are asked to deduce where the marble is. This tests the AI's ability to reason through physical scenarios and understand the consequences of actions.

💡Word count

Counting the words in its own response is a simple but notoriously difficult task for LLMs. The video shows that ChatGPT o1 correctly counts the words in its response, while the custom GPT and GPT-4o do not. This highlights the gap between generating text and quantifying it.
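
Part of the difficulty is that a model generates tokens, not counted words, and cannot inspect its own output as it writes. Verifying a finished response is trivial by comparison (a quick check of the kind the host did by hand):

    response = "This response contains five words."
    print(len(response.split()))  # 5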

💡Coding test

A coding test, specifically writing a game of chess in Python, is used to evaluate the models' ability to generate functional code from a single prompt. The video shows the models' varying success in producing a playable game, testing their practical ability to translate textual instructions into executable code.
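
The generated code itself isn't shown in the video; for comparison, here is a minimal text-based sketch of the same task using the python-chess library, which supplies the move-legality, check, and endgame rules that the tested models had to implement from scratch:

    import chess  # pip install python-chess

    board = chess.Board()
    while not board.is_game_over():
        print(board)  # ASCII rendering of the current position
        side = "White" if board.turn == chess.WHITE else "Black"
        move = input(f"{side} to move (UCI, e.g. e2e4): ")
        try:
            # Rejects illegal moves, including moves that ignore check
            board.push_uci(move)
        except ValueError:
            print("Illegal or malformed move, try again.")
    print("Game over:", board.result())

Even with the rules delegated to a library, this renders the board as plain ASCII; the clickable GUI with piece images that o1 produced (presumably via something like pygame, though the video doesn't say) is where most of the generated code would go.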

💡Hallucination test

The hallucination test asks the models to describe a list of mango cultivars, one of which does not exist. It is designed to see whether the models fabricate information they don't have. The video notes that ChatGPT o1 admits it lacks information on the made-up cultivar rather than inventing details, a positive sign for its reliability.

Highlights

Testing the new ChatGPT o1 model from OpenAI against the GPT-4o model.

Conducting a comprehensive test with 10 different prompts.

Comparing the new model with a custom GPT built with Chain of Thought prompting.

Using the same system prompt for a Claude project powered by Claude 3.5 Sonnet.

The first test question: how many R's are in the word 'strawberry'.

All models correctly identified that there are three R's in 'strawberry'.

The question 'Which came first, the chicken or the egg?' was answered scientifically by the models.

The custom GPT and Claude provided in-depth answers to the chicken-and-egg question.

A test of which number is bigger, 9.11 or 9.9, with all models getting it right.

A logic puzzle about a marble in a glass cup was correctly solved by the o1 model.

GPT-4o and the custom GPT failed the marble-in-the-glass logic puzzle.

Claude correctly identified the marble's location in the logic puzzle.

The o1 model outperformed GPT-4o in a word-count test.

A hallucination test had the models describe mango cultivars, one of them fictitious.

The o1 model avoided hallucination by admitting it lacked information on the made-up cultivar.

GPT-4o hallucinated, inventing details about the non-existent mango cultivar.

Claude showed a slight tendency to hallucinate but avoided completely making up details.

A logic question about killers in a room was answered correctly by all models.

The o1 model produced a functional chess game in Python, surpassing GPT-4o's attempt.

Claude's chess game crashed, showing the limits of generating assets without web access.

o1 emerged as the overall winner, outperforming GPT-4o and Claude.

Updates to the AI course and community platform will cover applications of the new model.

Transcripts

[00:00] In today's video I'm going to take the ChatGPT o1-preview model, the new model from OpenAI, and test it against the GPT-4o model. We're going to do 10 different prompts, and I'm also going to test it against a couple of other things I put together. One is a custom GPT that I built with my own set of instructions to try to replicate what the o1 model is doing in the background, which to some extent is Chain of Thought prompting. I'll explain how I built this in a second and give you the exact prompt for it; I did cover this in a previous video as well. But I also created a Claude project, powered by Claude 3.5 Sonnet, with the same exact system prompt that I gave this custom GPT. So this should be a very comprehensive test to see if the o1 model can outperform not only GPT-4o, which I'm assuming it will, but also this custom GPT, which I covered in a different video with IQ and math tests (I think I have some better questions this time around), and the Claude project I've put together here.

[01:04] Now, this time, to improve the test and make it a little more scientific, I found a couple of resources for prompts. One was directly from OpenAI, with a few examples that I thought would do a good job comparing this model against the previous ones. I also went to this video right here from Matthew Berman; I'm sure you probably follow his channel, but he has a great test that he runs every time a new model comes out, so I took a few of his questions as well that I think do a really great job. I'll link to this video; it's where I got the prompts from, where he compared the models and got fantastic results from o1.

[01:41] Okay, the first test is going to be how many R's are in 'strawberry'. This is the very first question they have, and I'm going to send it out. Okay, o1 (and I'll keep the orientation the same, so this is always going to be on the right): there are three R's in the word 'strawberry', which is right. And GPT-4o even got this one right. Let me actually run it again, because sometimes it doesn't get it right. Wow, it got it right again. In my previous experience, a lot of the time GPT-4o didn't know how to count letters in a word.

[02:09] We also have my GPT clone, and we have our Claude project with the same set of instructions. I'll show you the instructions here, and I'll put this in the description if you want to build your own; I'll make this one publicly available too, with a link where you can test it out. "You are an AI assistant designed to think through problems step by step using Chain of Thought prompting." Now, this is all I give it; this prompt is actually not even that long. It just has a few different steps: understand the problem (carefully read and understand the user's question); break down the reasoning process; explain each step; arrive at the final answer (after completing all the steps, provide the final answer and solution); and review the thought process. So again, you can go ahead and copy and paste this to create your own project or your own GPT; I have a ton of videos on this channel about creating both of these. They're my favorite AI tools available right now.

[02:59] Okay, here's the answer from both, and as you can see, the answers are much more comprehensive than you would get straight out of Claude or straight out of GPT-4o, because of that system prompt. There are three R's in 'strawberry' (this is the GPT clone), and three R's in 'strawberry' (this is Claude). Okay, everybody got this one right; it's a pass.

[03:17] The next one is another OpenAI question. This one says: which came first, the chicken or the egg? Scientifically speaking, the egg came first, but it's still a fun question to think about. And the reasoning: the egg came first, because the first true chicken likely evolved from a mutation in an egg laid by another type of bird. Now this one: long before the chicken existed, other egg-laying animals were producing eggs, and again, genetic mutation. So the same answer from both. Okay, let's see our custom GPT and Claude project here. Wow, these answers are again a lot more in depth. Let's see what we got at the end. Conclusion: the egg came first; this is because the first chicken would have hatched from an egg laid by another bird. Great. And the same thing with Claude here: it says the egg came first, and the egg was laid by a very close ancestor of the modern chicken.

[04:13] Okay, here is one from Matthew's video: which number is bigger, 9.11 or 9.9? And again, this is a problem for LLMs to get right. It's very obvious for us, but for an LLM this has always been challenging. Okay, I got the answer right away out of GPT-4o: 9.9 is bigger than 9.11. This one also: 9.9 is greater than 9.11. So they both got the answer; this one did take 19 seconds, and I think this one took like two seconds, but they did both get it right. Okay, and with our Claude project and custom GPT: 9.9 is bigger than 9.11, same thing. So a pass again for all four.

[04:58] This next one: a marble is put in a glass cup, the glass is turned upside down and put on a table, then the glass is picked up and put in a microwave. Where is the marble? Explain your reasoning step by step. This is from Matthew as well, but OpenAI actually had a very close version; it looks like they took it from his videos and added it to their platform. Let's get to the conclusion. Now o1 says: location of the marble: on the table, where the inverted glass was initially placed; the marble was left behind when the glass was picked up and moved. And over here: when the glass is picked up from the table, the marble falls to the bottom of the glass, so in the microwave the marble is at the base of the glass, touching the bottom of the microwave. Now, the actual answer is the first one: the o1 model got it right. The marble is left behind on the table, not inside the microwave. So here o1 does get one point over GPT-4o.

[05:59] Now let's try the custom GPT and Claude project. For our Claude project here, it says the most likely location of the marble is on the table, which is correct. But our custom GPT didn't quite give me an answer: the marble is inside the glass cup, but it's not telling me whether that's on the table or in the microwave. I'll just do one quick follow-up. Okay, it's still not giving me an exact answer; it's telling me the marble is at the bottom of the cup, but I want to know: is it inside the microwave, or is it on the table, like the other ones told me? And this one also thinks it's inside the microwave. So the GPT clone here did not improve on regular GPT-4o; I got the same wrong response. Claude got it right here. I also just want to test it inside regular Claude, to see what we get without a custom project; I want to see if the custom project helped. Okay, in this case, Claude's conclusion: the marble is on the table, where the glass was originally placed. So Claude got it right both inside the regular chat and inside our project. GPT-4o and our custom GPT both got it wrong, and o1 got it right.

[07:13] Okay, this next one, again with o1 here and 4o: how many words are in your response to this prompt? And I'm going to send it. This is from Matthew's video as well. This is something these models just can't do; they don't know how to count words correctly. I usually use Microsoft Word to get the word count. One, two, three, four, five, six, seven, eight, nine, ten. Oh, it was close, but definitely not 11, and I guess it's counting a number as a word too. This one says the response contains five words: one, two, three, four, five. Okay, again, o1 got it right this time. And I guess with our custom GPT this is not going to work very well, because as part of its response it has to show the step-by-step thinking, where the o1 model was doing that behind the scenes. But as I'm looking at this, let's just take this one: "Now I will count the words," one, two, three, four, five, six. Okay, this one is right. This one says 16, and I counted 15, actually, so maybe it's counting the comma as a word, but it was 15 here. So again, I don't think the custom GPTs are going to do a good job on this one. Let's try the Claude project; it's probably going to have the same exact problem, because, yep, it's going to think out loud with the Chain of Thought prompting. Okay, and again, not a very useful answer. So for this kind of thing, o1 is actually the first model that has done a good job, from all the tests that I've seen.

[08:42] Okay, this next one is a hallucination test, to see if Chain of Thought prompting, or however this o1 model is working in the background, is going to solve the hallucination problem. I saw this in the comment section of that same video I've been referring to; someone asked for a hallucination test: describe each of the following mango cultivars. Here are four, and this one is not a real one, so let's see if it's going to hallucinate and tell us more about it. This is a good hallucination test, actually. Okay, so the o1 model is telling us, about this one right here, that it's something it doesn't have information about, so it might be a newer or less widely known variety. So it did give us an answer, but it didn't make anything up; this is the right answer. It doesn't know, because its knowledge cutoff means it doesn't have that information. But look at what 4o did; this is an example of hallucination: "a relatively newer variety, the Lemon Cream mango has a distinctive sweet-tart flavor." It's just totally making something up that shouldn't be there. Based on that prompt, the more accurate answer is "I don't have the information on that," but this time it made it up. Okay, here inside the Claude project, it says: "I'm less certain about this one." So it didn't totally make it up, but "I believe it's also from Florida"; okay, it's hallucinating a bit, "likely yellow," but you can see it's unsure; it's just not completely making things up. Look at our GPT clone here: it's making up the flavor profile. So this time GPT-4o and the GPT clone hallucinated, not the o1 model; the o1 model got it right. It says, hey, I'm not sure, I have a knowledge cutoff. So again, Claude is keeping up, but GPT-4o is falling behind the new o1 model.

[10:37] Okay, here's another good one: there are three killers in a room. Someone enters the room and kills one of them. Nobody leaves the room. How many killers are left in the room? Explain your reasoning step by step. And right here our GPT-4o says the answer is that there are three killers in the room, which is correct, and it says the two original killers plus the new one. Okay, that is right. Let's see what o1 gave us. Oh, it looks like I hit some kind of content violation here, but: there are three killers left in the room, the two original ones and one new one. Okay, so it got that one right, even though we had some kind of error, but it did reach a conclusion. Okay, with our custom GPT: there are three killers in the room, the two original ones and the new one. And over here, what did we get out of our Claude project? We got three: "therefore, there are three killers in the room." Okay, it looks like they all got the right answer; no clear winner here.

[11:36] Okay, for this one I'm going to do one coding test, which I did in the original test: write a game of chess in Python. And I want to see if I can run this on my computer. Okay, here's the first game we got. This is GPT-4o, not the o1 model, and this time it decided to give me a much simpler game than I've gotten before. Oh wow, we can't even drag and drop these pieces; we have to type in which part of the board we want to move to, and it doesn't even have markings, so I don't have the board memorized like that. Okay, so this is a total fail out of 4o. Again, it's only one prompt; I'm judging just the very first prompt to make it a fairer test, because obviously with back and forth I could refine this a lot more. I've done this test in other videos as well.

[12:24] Okay, here's the new game of chess. This is what I got out of o1, and these pieces right here: it told me where to download them. It gave me a link, I went and downloaded these PNGs from the link, and I just had to name them a certain way so the code could pull them into the game. Let's see the logic of the game. Okay, that worked: move that here, move this here, this should take this piece, I should take this piece. Oh wow, that is working a lot better than before. Wow, this is incredible. I was not able to get this to work at all the first time I tried it, the day this came out, and it looks like everything is working exactly as it should. Okay, I'm in check now; let's see if it can move. Okay, so it does not understand check yet; that looks like where it falls short, because right there I technically couldn't move a different piece, I had to block. Okay, and the game's not over. So, almost there; I would say 80% there. It's just missing some in-game logic, and I think it actually gave me a bit of text inside the chat telling me it's missing a few things, like castling and endgame logic, so maybe with one follow-up I could get it to work. But wow, this is incredible; this is much further than I've ever gotten with any large language model.

[13:47] And last, I'll try the chess game inside Claude 3.5 Sonnet, just the regular chat; I don't think the project or the custom GPTs are going to be very appropriate for this kind of thing, so I'll just give it the prompt. Okay, so here's the game out of Claude. Now, as you can see, the pieces don't look like chess pieces, because Claude just can't get those to me: it doesn't have web access, so it didn't give me a link, and if I were only using Claude I wouldn't have those piece PNGs to swap in. But let's look at the game logic. Okay, this is nice; these dots look good, this looks good. Let me just play the same pieces here, let me take this. Oh, okay, it crashed; it looks like it just crashed the game. Let me try to relaunch it and see why it crashed. Let me try. Okay, it can't take that piece. So you can see that when it comes to some simple coding tests, o1 does beat Claude 3.5 Sonnet in my early testing. Again, I'm just doing some fun game testing; I'm not a developer by trade, so this is what I'm getting, and I'm showing you in real time what it gave me.

[14:57] Okay, now if we take everything side by side, you can see the regular GPT-4o is falling behind; my custom GPT didn't do a much better job either; but Claude is keeping up, both with projects and inside the chat. Still, o1 did win this entire test. If we take all the different questions I asked, including the coding question, OpenAI o1, in preview mode right now, and it's supposed to improve further when it comes out of preview, is the winner of this test.

[15:29] I also wanted to let you know that we're making updates to Skill Leap, our AI course and community platform. We have over 20 courses that you get access to with a free trial; if it's a good fit, then it's a simple monthly membership. I'm updating all those courses, adding material related to the new model: when you would want to use the new ChatGPT model, and when you'd still want to use the GPT-4o model, for very practical applications in entrepreneurship, marketing, and content creation. I'll link that below, and we have an active community as well, where you can ask me any questions. Thanks for watching this video; I'll see you in the next one.


Related Tags
AI Testing, GPT Models, Chatbots, Comparison, Machine Learning, Coding Tests, Problem Solving, Artificial Intelligence, Tech Review