GPT-4o VS Claude 3.5 Sonnet - Which AI is #1?

Skill Leap AI
22 Jun 2024, 25:18

Summary

TLDR: This video provides a comprehensive, practical comparison between GPT-4o and Claude 3.5 Sonnet, focusing on real-world applications rather than benchmark tests. The host examines both models' performance in tasks like writing, summarizing, data analytics, coding, and reasoning. The tests reveal strengths and weaknesses in each model: Claude excels in coding and visualization, while GPT-4o offers superior functionality in writing, summarization, and customizability. Despite some limitations, both models demonstrate significant capabilities, making the choice between them dependent on specific user needs.

Takeaways

  • 😀 The video is a comprehensive test comparing GPT-4o and Claude 3.5 Sonnet, two AI models, focusing on practical applications rather than scientific benchmarks.
  • 🔍 The test covers a range of practical uses, including writing, summarizing, vision, data analytics, coding, and reasoning to determine which AI is most practical for everyday work.
  • 💰 Both AI models are available in paid versions, but the free versions are limited in usage, prompting the need for a subscription for extensive testing.
  • 📝 In creative writing tasks, both GPT-4o and Claude 3.5 Sonnet performed well, with neither showing a clear advantage in generating product descriptions or emails.
  • 📚 Text summarization capabilities were tested with both AIs providing accurate and concise summaries, with GPT-4o showing a slight edge in tone and detail.
  • 🖼️ When analyzing a complex image, GPT-4o reported an incorrect time frame for the timeline, while Claude 3.5 Sonnet's answer matched the image much more closely.
  • 📊 In data analytics, both AIs were comparable, but GPT-4o had an advantage in creating PowerPoint presentations directly from data, a functionality lacking in Claude 3.5 Sonnet.
  • 🏗️ Claude 3.5 Sonnet excelled in coding tasks, creating interactive visual dashboards and games, outperforming GPT-4o in these areas.
  • 🔎 Research capabilities were found to be lacking in both AIs, with the video suggesting the use of other tools like Perplexity AI for more accurate research.
  • 🤖 Complex reasoning tasks were handled well by both AIs, solving riddles and mathematical problems with correct logic and reasoning.
  • 📱 For content creation, Claude 3.5 Sonnet provided a more usable tweet for social media, while GPT-4o's output was less practical and engaging.
  • 🚀 The video highlights the importance of choosing the right AI tool based on specific needs, acknowledging the strengths and limitations of both GPT-4o and Claude 3.5 Sonnet.

Q & A

  • What is the main purpose of the video?

    -The main purpose of the video is to conduct a practical head-to-head test comparing GPT-4o and Claude 3.5 Sonnet, two AI models, to determine which is more practical for everyday work and business use.

  • What types of tasks will the video cover in the comparison?

    -The video will cover tasks such as writing, text summarizing, vision and data analytics, coding, and reasoning to evaluate the AI models' performance in everyday applications.

  • How does the video differentiate between a scientific test and a practical test?

    -A scientific test is typically more structured and formal, like those in benchmark testing. A practical test, as used in the video, focuses on how the AI models perform in real-world scenarios and everyday tasks.

  • What are the limitations of the free versions of GPT-4o and Claude 3.5 Sonnet mentioned in the video?

    -The free versions of both AI models are extremely limited in terms of usage, with Claude 3.5 Sonnet only allowing about 10 messages before requiring an upgrade to a subscription.

  • What is the first writing prompt given to both AI models in the video?

    -The first writing prompt asks the AI models to create a short, punchy product description for a game-changing software tool in the world of marketing that revolutionizes customer relationship management for businesses.

  • How does the video evaluate the AI models' performance in text summarization?

    -The video asks the AI models to provide two summaries of an article: one with two to three sentences and another with five to six sentences that includes more details.

  • What is the main difference between the AI models' capabilities in handling vision tasks as shown in the video?

    -The main difference is that GPT-4o allows uploading of more images and has connected apps for easier image handling, while Claude 3.5 Sonnet has a feature called 'artifacts' for creating visual presentations and tables.

  • How does the video assess the AI models' performance in data analytics?

    -The video tests the AI models' ability to analyze complex images and data, such as a graph representing interest rates, and to create visual presentations or tables based on the data.

  • What limitations does the video highlight regarding Claude 3.5 Sonnet's capabilities in research?

    -The video highlights that Claude 3.5 Sonnet does not have internet access and therefore cannot provide current articles, reports, or relevant links for research, unlike GPT-4o, which can browse the web but sometimes provides incorrect links.

  • What is the video's conclusion regarding the comparison between GPT-4o and Claude 3.5 Sonnet?

    -The video concludes that Claude 3.5 Sonnet performs better in coding and data visualization using code, while GPT-4o has advantages in writing, summarization, and having a memory function. The choice between the two depends on the specific needs and use cases.

  • What additional capabilities does GPT-4o offer that Claude 3.5 Sonnet does not, according to the video?

    -GPT-4o offers additional capabilities such as web browsing, image generation with DALL·E 3, a memory function to improve responses based on previous interactions, and the ability to build custom GPTs for specific tasks.

Outlines

00:00

🤖 AI Model Comparison: GPT-4o vs. Claude 3.5

This paragraph introduces a comparative test between GPT-4o and Claude 3.5 Sonnet, focusing on practical applications rather than scientific benchmarks. The test covers tasks like writing, summarizing, vision, data analytics, coding, and reasoning. Both AIs are used in their paid versions because the free tiers are heavily usage-limited. The test begins with creative writing prompts for marketing a new CRM tool, assessing the AIs' ability to generate punchy product descriptions and emails within a specified word count.
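
The "approximately 50 words" instruction in the first prompt can be checked mechanically instead of asking the model to count its own words. The following is a minimal Python sketch, not something shown in the video; the function names and the 20% tolerance are assumptions chosen purely for illustration.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated words in a piece of text."""
    return len(text.split())


def within_target(text: str, target: int = 50, tolerance: float = 0.2) -> bool:
    """Return True if the word count is within +/- tolerance of the target.

    The 20% default tolerance is an assumed reading of "approximately".
    """
    return abs(word_count(text) - target) <= target * tolerance


if __name__ == "__main__":
    # Stand-in texts with the word counts reported in the video (41 and 54).
    chatgpt_like = " ".join(["word"] * 41)
    claude_like = " ".join(["word"] * 54)
    print(word_count(chatgpt_like), within_target(chatgpt_like))
    print(word_count(claude_like), within_target(claude_like))
```

Under this assumed tolerance, both reported lengths (41 and 54 words) would count as approximately 50.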

05:02

📝 Writing and Summarization Capabilities

The paragraph discusses the AI models' performance in writing and summarization tasks. It highlights the AIs' ability to produce accurate and professional text without obvious AI-generated traits. The models are tested on summarizing an article into short and detailed versions, with Claude 3.5 showing quick response and accuracy, while GPT-4o provides a factual and toned-down summary. The paragraph also touches on the ability to switch models within the paid version of ChatGPT to get a different response in the same chat.

10:03

🔎 Vision and Data Analytics Tests

This section examines the AI models' capabilities in vision and data analytics. The AIs are tasked with interpreting a complex historical image and a graph representing US used-car interest rates. GPT-4o reports an incorrect time frame for the historical image, while Claude 3.5 identifies the time frame more accurately; neither model pins down the specific subject of the graph. Both AIs can draft presentation content, but GPT-4o has an edge because it can generate a downloadable PowerPoint file.

15:03

🔍 Limitations and Research Flaws

The paragraph points out the limitations of the AI models in conducting research and providing accurate links. GPT-4o often generates non-functional links, while Claude lacks internet access altogether. The narrator recommends using other tools like Perplexity AI or Google Gemini for research purposes due to the inaccuracies and limitations of the AI models in this context.

20:06

💻 Coding and Interactive Dashboards

This paragraph assesses the AI models' coding abilities, specifically their capacity to create interactive dashboards from financial reports. Claude 3.5 excels in this task, quickly generating visual representations and interactive elements. In contrast, GPT-4o struggles to produce the same results, offering lengthy explanations and processes instead of direct coding solutions.

25:08

🎲 Game Development and Complex Reasoning

The paragraph explores the AI models' capabilities in game development and complex reasoning. Claude 3.5 successfully creates a functional checkers game with interactive elements, while GPT-4o fails to produce a working game. Both AIs perform equally well in solving logic puzzles and riddles, demonstrating their reasoning abilities.

📊 Content Creation and Social Media Optimization

This paragraph evaluates the AI models' skills in content creation and social media optimization. Claude 3.5 effectively condenses a YouTube script into a concise tweet suitable for quick consumption, while GPT-4o provides both a tweet and a LinkedIn post, although the quality of the tweet is questionable. The paragraph emphasizes the importance of creating shareable and engaging content for social media platforms.

📝 Conclusion and Practical Usage Decisions

The conclusion summarizes the head-to-head test, highlighting Claude 3.5's strengths in coding and visual tasks, and GPT-4o's advantages in writing and summarization. The narrator discusses the limitations of both AIs, such as Claude's lack of web browsing and GPT-4o's occasional inaccuracies. The paragraph concludes with the narrator's personal decision to use both AIs for different tasks, emphasizing the importance of using the right tool for the job.

Keywords

💡AI models

AI models, or Artificial Intelligence models, refer to the algorithms and computational frameworks that simulate human intelligence processes. In the video, AI models are the central focus as the host compares two models, GPT-4o and Claude 3.5 Sonnet, to determine their practical utility in everyday tasks such as writing, summarizing, vision, data analytics, and coding. The video uses these models to demonstrate their capabilities and limitations in performing specific tasks, which is central to its theme.

💡Benchmark testing

Benchmark testing is a method of evaluating the performance of a system or component by comparing it against a set of predefined metrics. In the context of the video, the host mentions that Claude 3.5 Sonnet outperforms GPT-4o in benchmark testing, indicating that it achieves better results in standardized tests. However, the host is more interested in practical applications than in theoretical benchmarks, which shapes the direction of the video's comparative analysis.

💡Practical test

A practical test is an assessment that measures the performance of a tool or system in real-world scenarios, as opposed to theoretical or simulated conditions. The video script emphasizes the host's intention to conduct a practical test of the AI models to determine which is more suitable for everyday work and business applications. This approach is exemplified by the various tasks the AI models are given, such as writing product descriptions and summarizing text, which are typical of real-world usage.

💡Paid version

The term 'paid version' in the script refers to the subscription-based access to the full capabilities of the AI models being tested. The host mentions that while both GPT-4o and Claude 3.5 Sonnet have free versions, they are extremely limited in usage. The paid versions offer more extensive access, allowing for a thorough head-to-head comparison in the video.

💡Creative writing

Creative writing is a form of writing that uses imagination to produce original work, as opposed to writing that is factual or based on existing information. In the video, creative writing is one of the tasks assigned to the AI models, where they are asked to write a product description for a hypothetical CRM tool. This task is used to evaluate the models' ability to generate engaging and original content.

💡Text summarizing

Text summarizing is the process of condensing a longer piece of text into a shorter version while retaining the key points and overall meaning. The video script includes a task where the AI models are asked to provide summaries of an article, demonstrating their ability to process and distill information effectively.

💡Vision capabilities

Vision capabilities, in the context of AI, refer to the ability of a model to interpret and understand visual data, such as images. The script describes a test where the AI models are presented with a complex image and asked to analyze and describe its contents, showcasing their visual understanding and analytical skills.

💡Data analytics

Data analytics involves examining, processing, and interpreting data to extract valuable insights, patterns, and trends. In the video, the AI models are tested on their data analytics capabilities by being asked to analyze a graph representing interest rates and to create presentations based on the data, demonstrating their ability to handle and derive meaning from numerical information.

💡Coding

Coding is the process of writing computer programs or scripts in a specific programming language. The video script includes a test where the AI models are asked to write code for a game of checkers, evaluating their ability to generate functional and executable code.

💡Reasoning

Reasoning is the cognitive process of forming conclusions, judgments, or inferences from facts or premises. In the video, the AI models are presented with logic puzzles and mathematical problems to solve, which tests their capacity for complex reasoning and problem-solving.

💡Content creation

Content creation refers to the process of producing original content for various platforms, such as social media posts, articles, or video scripts. The video script includes a task where the AI models are asked to condense a YouTube script into a tweet, demonstrating their ability to create concise and engaging content suitable for quick consumption.

Highlights

Comprehensive head-to-head test comparing GPT-4o versus Claude 3.5 Sonnet.

GPT-4o and Claude 3.5 Sonnet benchmark testing results discussed.

Practical test approach to evaluate AI models for everyday work and business.

Paid versions of both AI models are available but have usage limitations.

Creative writing prompt to assess AI's ability to generate product descriptions.

AI's performance in writing email drafts evaluated.

Preference for AI responses without additional explanatory text.

AI models' performance in text summarization tested with a complex article.

Comparison of AI's vision capabilities with a complex historical image.

AI's data analytics ability tested with a graph of interest rates.

ChatGPT's ability to create presentations from data analyzed.

Claude's limitations in creating PowerPoint presentations directly.

AI's performance in coding tasks, such as creating a dashboard from a financial report.

Claude's superior performance in creating visual presentations with code.

AI's capability in creating interactive games like checkers.

AI's performance in complex reasoning tasks, such as solving riddles and mathematical problems.

AI's ability to condense YouTube scripts into concise tweets or LinkedIn posts.

Claude 3.5 Sonnet's overall lead in the practical test but with significant limitations.

Limitations of Claude in web browsing, image generation, and memory function.

Advantages of GPT-4o in custom AI model creation for specific tasks.

Recommendation to use both AI models depending on the task at hand.

Transcripts

play00:00

in this video I want to do a

play00:01

comprehensive head-to-head test

play00:02

comparing GPT-4o versus Claude 3.5 Sonnet

play00:06

which just came out and it beats GPT-4o

play00:10

in their Benchmark testing but I don't

play00:12

want to do a scientific test like they

play00:13

do in their benchmarks I want to do a

play00:15

practical test how we use these AI

play00:17

models every day for work and business

play00:20

so then we could actually figure out

play00:21

which one is the most practical model

play00:22

that we use if we had to pick one which

play00:24

one is the right one to pick so that's

play00:26

typically how I do these tests with very

play00:28

practical applications so I'm going to

play00:30

cover about 10 different things and

play00:32

we're going to use things like writing

play00:34

text summarizing text and then we'll get

play00:36

into vision and data analytics and then

play00:38

we'll do some coding and reasoning too

play00:41

at the end so both GPT-4o that I'm going

play00:43

to use and the Claude 3.5 that I'm going

play00:46

to use are going to be the paid version

play00:49

but both of those are available

play00:50

completely for free they are extremely

play00:53

limited and how much you could use them

play00:55

right now so I was using Claude 3.5 Sonnet

play00:58

and I only got about 10 messages before

play01:00

I ran out so I had to upgrade and get

play01:02

the subscription to do a real

play01:04

head-to-head test first let's start with

play01:06

a writing prompt a lot of us use these

play01:07

models to write all kinds of different

play01:09

things right so this one is going to be

play01:10

a little bit of a creative writing in

play01:12

the world of marketing you're launching

play01:14

a game-changing software tool

play01:16

revolutionizing customer relationship

play01:18

management for business write a short

play01:20

Punchy product description and I told it

play01:22

exactly how many words in this case and

play01:24

I'll do the same thing for Claude 3.5

play01:27

here here's the result from chat GPT it

play01:29

went right to the answer introducing

play01:31

customer connect the ultimate CRM tool

play01:33

that transforms customer interactions

play01:36

pretty good I asked it how many words

play01:37

it was 41 our instruction is 50 words

play01:40

approximately now Claude gave us a little

play01:43

bit of a longer one it's 54 words here

play01:45

but again I said approximately and he's

play01:48

asking me if I wanted to adjust it this

play01:50

one named it autoc CRM revolutionize

play01:52

your customer relationship our AI

play01:55

powered platform automates follow-ups

play01:57

delivers real-time insights and boosts

play02:00

retention effortlessly okay so they both

play02:02

did a good job nothing here that screams

play02:04

that an AI wrote it and I typically use

play02:07

these models to help with writing email

play02:09

drafts so write an email introducing

play02:10

this to our email list and keep it short

play02:12

let's see what it comes up with now here

play02:14

we got Claude's answer and this time

play02:17

again it does the same thing it gives us

play02:19

that sentence that we don't need that's

play02:21

not part of our email and it kind of

play02:23

gives you a quick recap of what it's

play02:25

done and then you'll have to kind of

play02:26

copy and paste the middle part here

play02:28

where chat GPT just gives us the copy

play02:31

option right here I could just copy this

play02:33

there's no extra words so I personally

play02:36

prefer to have zero extra words when I

play02:38

ask something in a prompt I just want

play02:41

the answer I don't want it to kind of

play02:42

explain it to me if I need it to explain

play02:45

to me I would ask it to explain it to me

play02:48

and I found chat GPT typically every

play02:50

time I give it tasks where it has to

play02:52

give me an answer that I could just copy

play02:54

and paste it just does exactly that and

play02:56

as I'm reading the tone of this email

play02:59

here they both again kind of did an

play03:01

equal job this one is a little bit

play03:03

overly excited so it's a little bit

play03:05

sounds too promotional uses words like

play03:08

boost which is very common with AI and

play03:11

if I go to chat GPT again the exact same

play03:14

thing happens there's boost I commonly

play03:16

see that and I erased my memory here on

play03:19

chat GPT because I've actually trained

play03:21

it in the memory function which I've

play03:23

made a different video about not to use

play03:25

specific words but since Claude doesn't

play03:27

have memory I excluded that and I just

play03:29

cleared down my memory here so it

play03:31

just writes like it would without any

play03:33

kind of backend instructions here and

play03:35

when it comes to writing I ran it across

play03:37

10 different tests and there was not a

play03:39

clear-cut winner I think if I was

play03:42

comparing Claude 3.5 versus the previous

play03:45

GPT model it would have been a clear

play03:46

winner but with 4o and Claude 3.5 Sonnet

play03:50

right now I think when it comes to

play03:51

writing it's pretty on par now next I want

play03:54

to show you text summary and a lot of

play03:55

times I consume information now using

play03:58

these models when there's a large amount

play04:00

of information or a huge article or a

play04:02

big newsletter I usually just put it

play04:04

here and I let it just summarize it for

play04:06

me very quickly so let's do that here

play04:09

okay here's the prompt provide two

play04:11

summaries for this article the first one

play04:13

two to three sentences the second one

play04:16

five to six sentences and includes more

play04:19

details and I'm going to use this

play04:21

Anthropic introduction here about Sonnet

play04:24

so I'm going to go ahead and copy this

play04:26

this whole page and I'll just paste it

play04:28

here I typically just copy the entire

play04:30

page with the footer and everything and

play04:31

he knows what to do with it it's a much

play04:33

faster way to do this I'll send this out

play04:35

here and this time let me just kind of

play04:37

show you the speed because this 3.5

play04:40

Sonnet is actually pretty quick so it

play04:42

paid attention to the length he got it

play04:44

right I read through this and everything

play04:46

was accurate there was no kind of

play04:47

hallucination with this answer

play04:49

everything was exactly from that article

play04:52

okay here's chat gpt's answer the short

play04:54

summary again right on point the detail

play04:56

summary I really like the tone it was

play04:58

not at all emotional very factual kind

play05:01

of how I like it without much detail

play05:03

prompting I didn't really tell it what

play05:05

kind of tone to take wanted to see what

play05:07

it does by default and one functional

play05:09

thing I like here GPT actually if you

play05:11

have the paid version you could use GPT

play05:13

4 and actually get a different response

play05:16

using a different model and compare your

play05:18

results inside of Claude if I try to do

play05:21

the same thing with a paid version I

play05:23

could change models and use Opus for

play05:25

example but then that's going to require

play05:27

me to start a new chat so I can't

play05:30

functionally use it the same way not

play05:32

that I would do that often but sometimes

play05:34

I find myself not quite liking what 4o

play05:36

gave me and I wanted to see what the

play05:38

older four model which was good which is

play05:40

good still I wanted to see what that

play05:42

gave me something I could do in chat GPT

play05:44

okay now let's test its Vision

play05:46

capabilities then I'll test its Vision

play05:49

capabilities with some data analytics

play05:50

but here I want to see if you could find

play05:53

out what's in a very complex image this

play05:55

is as weird of an image I could find

play05:58

this is world history in one image

play06:00

that's what I Googled and just looking

play06:03

at it you can't tell you can't make out

play06:06

just about anything that's happening

play06:07

maybe some some of the years here you

play06:10

could make out but nothing else let's

play06:12

see if Claude and chat GPT could figure it

play06:15

out so again with these models you could

play06:17

upload images here with Claude you could

play06:20

upload five and each has to be 10

play06:22

megabytes with chat GPT you could upload

play06:25

10 here so that is one of the benefits

play06:28

and it has connected apps so you could

play06:30

actually connect to your one drive with

play06:32

Microsoft or mine is connected to my

play06:35

Google Drive which makes this a whole

play06:37

lot more useful when it just comes to

play06:39

these functionalities Claude is really

play06:42

lacking when it comes to just basic

play06:44

functionalities I'll point out a bunch of

play06:45

them too as we go through this video but

play06:47

let me upload that image and I'm just

play06:49

going to press enter I'm not even going

play06:50

to give it a prompt here this is chat

play06:52

GPT-4o okay and from what I could see it

play06:55

found the name for it stream of time a

play06:58

timeline map the rise and fall of

play07:01

different civilizations empires and

play07:02

Nations and it says it's from 250 ad to

play07:06

1700 ad and I'm going to say analyze

play07:08

this image and explain this to me so

play07:11

sometimes just changing something and

play07:13

giving it in different format bullet

play07:14

point table I use that all the time okay

play07:17

and it's giving us a really nice

play07:18

presentation here in table format so it

play07:20

looks like it's putting the different

play07:21

country civilizations here and showing

play07:24

the time period where they were around

play07:25

the key events and so on and summary

play07:28

okay really nice answer here that we got

play07:30

out of chat GPT assuming it's correct

play07:33

let's go to Claude and do the same thing

play07:35

okay again I'll just press enter to

play07:36

start with no prompt okay so it got the

play07:39

same theme it picked up on some of the

play07:41

colors and it's telling us it's from

play07:44

580 to 1180 let's make that same table

play07:48

here and this one is using this thing

play07:50

called artifacts which is something you

play07:52

could turn on in settings is completely

play07:54

beta right now but it kind of creates

play07:56

things on the right side and it's great

play07:59

for coding and visual presentations like

play08:01

tables like this so I really like this

play08:03

new update I covered this in a different

play08:05

video that I posted about Claude 3.5 and

play08:08

it created kind of a different table so

play08:10

it just gave us the elements and the

play08:12

description for these elements and chat

play08:14

GPT gave us something completely

play08:16

different so this looks more usable but

play08:19

when I'm looking at this it actually got

play08:20

the time wrong I did a little bit closer

play08:23

look at this image if you look a little

play08:25

bit closer on the very bottom it does

play08:27

end at the Timeline ends at 1100 ad so

play08:31

Claude was correct he got the

play08:32

information right so I'm just going to

play08:34

follow up here to ask for the similar

play08:36

table that chat GPT gave me I'm going to

play08:38

ask it for a timeline with each

play08:41

civilization and the rise and fall okay

play08:44

and this time it looks like it did a

play08:45

better job and just to be sure I ran

play08:47

this three different times and each time

play08:49

I got a very similar result here so

play08:52

basically chat GPT gave us something

play08:54

that looks better right so at first

play08:57

glance this looks more interesting in

play09:00

and it looks like it gave us better

play09:01

information but it totally made up the

play09:04

time that graph started from 500 to 1100

play09:08

this did not give us anything that is in

play09:11

that time frame it kind of extended the

play09:13

time frame here so I wouldn't take any

play09:15

of this information at face value these

play09:17

are some of the limitations of these

play09:19

large language models in general you

play09:21

can't just look at the output and just

play09:23

assume it's right so sometimes it even

play09:26

makes sense to have two different

play09:27

subscriptions if you're doing Vision

play09:29

capabilities or data analytics to just

play09:31

run them and then using your own

play09:33

judgment to see actually which one makes

play09:35

sense I had to really take a close look

play09:38

at this picture to kind of try to figure

play09:40

it out and this was one of the more

play09:41

complex things that I've given these

play09:43

large language models to analyze

play09:45

but I could tell Claude 3.5 right now is

play09:49

winning there here's kind of a

play09:50

challenging graph to see what it comes

play09:51

up with so I'm giving it this graph but

play09:53

I cropped out what this graph represents

play09:56

but what it is is the interest rate for

play09:58

used cars in the US here so I'm going to

play10:01

see what it comes up with okay here's

play10:02

chat GPT's telling us this is a trends

play10:05

graph from 2014 to

play10:07

2024 is stable all the way till the

play10:09

pandemic then it has a dip which it's

play10:11

telling us right here it has a 3% or it

play10:14

dips below 3% which is right so it was

play10:17

all the way around four dip to closer to

play10:19

three and then significant increase

play10:21

which is accurate let's see if you could

play10:24

actually figure out what this represents

play10:26

and he thinks it represents the federal

play10:28

funds rate which again they do set the

play10:30

interest rates so pretty close but I

play10:33

didn't really think it would figure out

play10:35

that this is for the car market in the

play10:37

US in this time frame but I was curious

play10:39

to see if it's going to do any type of

play10:42

research it's going to look online but

play10:44

it came up with two different

play10:45

conclusions here federal funds rate or

play10:47

Central Bank policy which is not correct

play10:50

let's go see what Claude did but that

play10:52

was not my test by the way I just wanted

play10:54

to see if it did that extra step right

play10:56

now I want to see if it could just

play10:57

analyze things pull in the numbers and

play11:00

then use those numbers to do deeper

play11:02

analysis and it looks like Claude again

play11:05

no problems here it gave us the range it

play11:06

told us the range of the interest rate

play11:09

here and it told us exactly how it's

play11:11

changing over time and this time I also

play11:13

asked it what this graph represented and

play11:16

it gave us five different options none

play11:17

of them were very specific to what I was

play11:20

going for but generally it's all about

play11:22

the interest rate here and it kind of

play11:24

figured that out but it wasn't specific

play11:26

enough to use car market in the US and

play11:28

the interest rate on that now I'm going

play11:30

to follow up with chat GPT I'm going to

play11:32

say create a presentation based on this

play11:34

information now it's going to go through

play11:36

here create these kind of slides for us

play11:38

so it's giving us titles for the slides

play11:41

then it's telling us would you like to

play11:43

have a PowerPoint file created with

play11:45

this content let's say yes okay it's

play11:48

done and gave us a link let me go ahead

play11:49

and download this link to see what it

play11:51

gave us okay it looks like it gave us a

play11:53

detailed PowerPoint here it does need

play11:56

some styling it typically doesn't do the

play11:57

styling here but PowerPoint has this AI

play12:00

this designer AI where you could just go

play12:02

ahead and select different designs here

play12:04

from the side and get yourself a

play12:07

finished presentation so nice job with

play12:10

ChatGPT and I asked Claude to create a

play12:12

visual presentation and look what Claude

play12:15

did here this is with the artifacts

play12:17

option turned on again you could turn

play12:19

that on in the settings but it wrote

play12:21

some code and then it creates this

play12:24

preview window and it created this nice

play12:27

visual graph I mean this is kind of the

play12:28

same as our current graph let me see if

play12:31

it could make us a PowerPoint

play12:32

presentation but this is really cool

play12:34

right inside of your viewport here let

play12:36

me see if it could create a PowerPoint

play12:38

presentation here okay it's doing the

play12:40

same kind of thing it's creating the

play12:41

slides or the text here for the

play12:43

different slides and what bullet point

play12:45

should go in each one and it looks like

play12:47

it cannot do that so I can't create or

play12:50

edit or provide download links to

play12:52

PowerPoints directly so all it was able

play12:55

to do is kind of write code and create

play12:57

this nice visual presentation for us

play12:59

right within chat but in this case I did

play13:02

want a PowerPoint presentation now

play13:04

that's one of the big limitations of

play13:06

Claude there's a lot of functionality

play13:08

like this one was a really useful

play13:10

practical thing right I want to give it

play13:11

some data just from an image get the

play13:14

context from that and turn it into a

play13:16

presentation ChatGPT could do that in

play13:18

one minute right and we could then use

play13:20

PowerPoint to design it using the AI

play13:22

inside of that Claude can't do that so it

play13:25

could only do things like these visual

play13:27

representations and again I ran this

play13:29

through a bunch of different tests and I

play13:31

think with data analytics so far in my

play13:35

early testing they were pretty equal so

play13:37

functionality goes to ChatGPT but in

play13:39

the function of data analytics they both

play13:42

are about the same right now now at this

play13:43

point I usually do a head-to-head test

play13:45

with image generation but the only way

play13:48

you get image generation right now is

play13:49

using ChatGPT with a paid subscription

play13:52

and that gives you access to DALL-E 3

play13:54

that generates images for you Claude

play13:56

cannot and never has been able to

play13:58

generate images so I can't compare that

play14:01

so that obviously is a point for

play14:03

ChatGPT so if you need image generation in

play14:06

your day-to-day work you'll have to use

play14:08

another tool like Copilot which is free and

play14:10

has DALL-E 3 built into it but you

play14:12

can't just use Claude because that

play14:14

doesn't have image generation at all so

play14:16

if that's part of your workflow keep

play14:18

that in mind now let's do a little bit

play14:19

of research here I'm going to ask

play14:21

ChatGPT to write about AI's disruption in the

play14:24

accounting industry and give me specific

play14:26

links and articles and reports and here

play14:29

it gave us some information I'm going

play14:30

to go to the bottom of it to make sure

play14:31

it's giving us some relevant links here

play14:33

and for some reason the links are not

play14:36

clickable so sometimes it makes up links

play14:38

sometimes it gives us links that look

play14:40

like hyperlinks but when you go to click

play14:42

them they're not clickable I'm going to

play14:43

tell it they're not clickable I asked it

play14:45

to give me the links again and I still

play14:47

couldn't click them the third time I

play14:49

couldn't click them so I asked it to

play14:51

give me the links like this so I could

play14:52

copy and paste the links let's see nope

play14:55

made-up page there let's try this one

play14:59

cannot find this page okay so a lot of

play15:03

times ChatGPT when you use it for

play15:05

research when you need specific

play15:07

information from specific websites and

play15:11

resources like this it just does not

play15:13

work it literally makes up links like

play15:15

you're seeing here this has happened to

play15:17

me probably every other time that I've

play15:19

used ChatGPT for research okay on the

play15:22

other hand let's look at Claude so

play15:23

Claude did again a nice job gave us

play15:25

specific use cases of things that could

play15:27

be disrupted by AI potential

play15:30

challenges if I go to the bottom here I

play15:33

don't have access to current articles or

play15:35

reports okay so keep this in mind Claude

play15:38

does not have internet access it never

play15:41

has had internet access whereas GPT-4o has

play15:44

internet access sometimes it makes huge

play15:46

mistakes like you just saw but sometimes

play15:48

it works so in this case it doesn't work

play15:51

at all ChatGPT could follow URLs a lot

play15:54

of times I'm optimizing my website for

play15:56

example I give it a URL it goes and crawls

play15:57

the website it tells me things to

play15:59

improve I can't do things like that here

play16:02

for research I would not use any of

play16:04

these tools I wouldn't use Claude I

play16:06

wouldn't use ChatGPT I would use

play16:08

Perplexity AI so that is a great

play16:11

research tool it uses the power of these

play16:13

models in the background but it's really

play16:15

designed to be a search engine that's AI

play16:17

powered I have a different video about

play16:19

that or I'll use Google Gemini and let

play16:22

Google Gemini give me a snippet and I

play16:24

know that is kind of pulling from more

play16:26

accurate listings based on the Google

play16:28

search right so both of these in my

play16:31

opinion get a zero okay now let's do

play16:33

some coding I'm going to see if I can

play16:35

make a dashboard with these models we're

play16:37

going to go to the Nvidia website here

play16:39

and I'm going to pull in one of their

play16:41

financial reports here so this is a

play16:43

massive massive document I believe it's

play16:45

98 Pages let's download this okay I'm

play16:48

going to ask Claude I uploaded this

play16:49

document turn this into a visual

play16:52

dashboard here to see what we get and

play16:55

usually if you have that artifacts

play16:57

option turned on it starts writing the

play16:59

code right here on the side and as soon

play17:01

as it's done it turns into preview mode

play17:05

where you could actually see the output

play17:07

which is awesome this is one of my

play17:08

favorite updates look at this it created

play17:10

this visual update for me and it's

play17:13

interactive so I could hover over things

play17:16

wow this is nice all right let's see if

play17:18

ChatGPT could do the same now the nice

play17:21

thing by the way is both GPT-4o and Claude

play17:25

3.5 Sonnet now they have such a big

play17:29

context window that I could just use a

play17:31

98 page document as part of my prompt

play17:34

and upload that okay so ChatGPT just

play17:37

gave us a bunch of information from that

play17:39

document so it pulled in a bunch of

play17:41

different numbers and things like that

play17:42

it did not create the visual

play17:44

presentation it's asking me if I want to

play17:46

proceed I'll say yes and again it looks

play17:48

like it gave us a ridiculously long

play17:51

step-by-step process on how to use this

play17:54

other app to do this outside of

play17:56

ChatGPT it's not even attempting to

play17:59

write oh it's still going it's not even

play18:00

attempting to write any code for us

play18:02

again I went back and forth three

play18:04

different times with ChatGPT to try to

play18:06

just get it to do this before it used to

play18:09

create interactive and I think it still

play18:11

does but for some reason in the last

play18:12

couple of days I haven't got it to do

play18:14

any type of coding or create any type of

play18:16

interactive graphs here even when I give it

play18:19

very specific instructions to do so okay

play18:22

so when it comes to visualization of

play18:25

data using code well Claude is obviously

play18:28

the winner there now let's see if we

play18:30

could create a game this time I'm going

play18:32

to create a game of checkers I typically

play18:34

do a game of snake or Tic Tac Toe let's

play18:36

see if it could do a game of checkers

play18:38

without again any information about what

play18:41

kind of language to use in just 10

play18:43

seconds it wrote the code and it gave us

play18:45

the preview now let's see if it actually

play18:47

works it says current player red let's

play18:50

go ahead and try to move our piece from

play18:52

here to here black from here to here and

play18:55

I'll take this piece nope oh it almost

play18:59

worked but it doesn't quite know how to

play19:02

take a piece I asked ChatGPT to create a

play19:04

game of checkers and this time it's

play19:07

giving us again a bunch of text board

play19:09

layouts okay where's the code okay there

play19:12

we go we finally got it and it chose

play19:14

Python here and here's the Python game

play19:16

that ChatGPT wrote it does not have any

play19:19

pieces I can't start a new game okay so

play19:22

it just made the board so I'll just give

play19:24

it one prompt to try to fix it although

play19:26

I didn't give Claude any prompts okay

play19:28

here's a new one we got pieces this time

play19:31

and okay it did not add any

play19:34

functionality so it just basically

play19:36

designed a game that doesn't do anything

play19:39

so I don't know so far I've tested this

play19:42

a handful of times and every single time

play19:47

Claude 3.5 Sonnet when it came to any type

play19:51

of coding it beat ChatGPT 4o okay let's

play19:55

test out complex reasoning here so

play19:57

here's the prompt at a party each guest

play19:59

shakes hands with every other guest

play20:02

exactly once if there were a total of 66

play20:06

handshakes how many guests were at a

play20:08

party okay so ChatGPT created a nice formula

play20:11

over here and I know the answer is 12 so

play20:15

let's see if we get to that answer it

play20:17

came up with two answers 12

play20:20

and -11 since the number of guests can't be negative

play20:23

the number is 12 and it gave us that

play20:25

answer let's try Claude okay Claude took

play20:28

the same path it came up with 12 or -11

play20:31

since the answer can't be negative the

play20:34

number must be 12 okay let's try this
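
The handshake answer above can be checked with a short sketch. It assumes both models used the standard counting formula, where n guests produce n(n-1)/2 handshakes; the video does not show the formula itself, and the function name `guests_from_handshakes` is made up for this illustration.

```python
import math


def guests_from_handshakes(handshakes: int) -> int:
    """Return n such that n * (n - 1) / 2 == handshakes.

    Assumes every guest shakes hands with every other guest exactly once.
    Raises ValueError when no whole number of guests fits.
    """
    # n(n - 1)/2 = h  rearranges to  n^2 - n - 2h = 0,
    # whose roots are n = (1 +/- sqrt(1 + 8h)) / 2.
    discriminant = 1 + 8 * handshakes
    root = math.isqrt(discriminant)
    if root * root != discriminant or (1 + root) % 2 != 0:
        raise ValueError("no whole number of guests gives that many handshakes")
    # Keep only the positive root; the other one, (1 - root) / 2, is negative.
    return (1 + root) // 2


print(guests_from_handshakes(66))  # 12
```

For 66 handshakes the discriminant is 529, whose square root is 23, so the two roots are 12 and -11, and only 12 makes sense as a guest count.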

play20:37

one what has a voice that can't speak it

play20:39

has a bed but never sleeps it has a

play20:42

mouth but never eats and it runs but it

play20:45

has no feet okay Claude says this is a

play20:48

riddle and the answer is a river and

play20:50

ChatGPT same thing it thinks it's a

play20:53

river okay I'm obviously not doing a

play20:55

scientific test but you could see

play20:56

they're both kind of doing a good job

play20:58

when it comes to logic and solving

play20:59

riddles and puzzles now this next one is

play21:02

for content creation so what I'm going

play21:03

to do is I'm going to give it a YouTube

play21:05

script I'm just going to upload my last

play21:06

YouTube script here and I'm going to ask

play21:08

it to turn it into a tweet this is my

play21:10

prompt extract the core lessons and

play21:12

actionable items from this YouTube

play21:14

script condense it into a concise tweet

play21:17

or LinkedIn post suitable for quick

play21:19

consumption and I'll add that YouTube

play21:21

transcript here okay here's our tweet

play21:24

Claude 3.5 Sonnet a new free AI model

play21:27

from Anthropic outperforms GPT-4 it

play21:30

forgot the 4o part on most benchmarks

play21:34

Improvement in speed vision 200k context

play21:38

window introduces artifacts okay really

play21:41

nice there is no reason why besides

play21:44

fixing this little missing o here I

play21:46

would use this right here it created a

play21:48

couple hashtags it's good usually I'm

play21:50

having a hard time finding even when I'm

play21:52

building my own GPTs to do a really good

play21:55

job where I would actually use that

play21:57

tweet or that LinkedIn post okay so

play21:59

ChatGPT decided to give us a tweet and

play22:02

a LinkedIn post so the other one only

play22:04

gave us one and then I guess I could use

play22:07

it for both but look at this tweet new

play22:09

Claude 3.5 is here faster forget

play22:12

subscriptions dive into the future never

play22:15

would I use that that's just extremely

play22:18

uh bad okay if I see a post like this

play22:22

I'm not following that person right that

play22:24

is not a useful post so once again it

play22:28

looks like Claude 3.5 Sonnet is beating

play22:31

ChatGPT okay so with my practical test

play22:35

Claude 3.5 Sonnet came out ahead overall but

play22:38

huge limitations I want to point out so

play22:40

if you had to choose between one of them

play22:43

here's a huge limitation when it comes

play22:44

to the paid subscription you don't have

play22:47

web browsing with Claude a huge downside

play22:50

the information is not going to be

play22:52

relevant and up to date if you use this

play22:54

for research I recommend you use

play22:55

Perplexity anyway for research even the

play22:57

free version of Perplexity is going to

play23:00

beat both of these I think that's a huge

play23:02

downside for Claude if you need to

play23:04

create images huge downside for Claude

play23:07

Claude doesn't have a good way to search

play23:09

any previous conversations but neither

play23:12

does ChatGPT but ChatGPT does if you

play23:15

have the desktop app or if you have the

play23:17

mobile app they have a search function

play23:19

which is huge it's missing though right

play23:21

now inside of the ChatGPT website for

play23:24

some reason now two huge downsides with

play23:27

Claude it doesn't yet have a memory

play23:30

function and that I had no idea how much

play23:33

it's going to improve the functionality

play23:34

of ChatGPT by default when you talk to

play23:37

it sometimes it stores things to memory

play23:39

and it gets smarter and smarter based on

play23:41

your previous conversations and gives

play23:43

you better responses so that's a huge

play23:45

benefit of ChatGPT and the biggest

play23:47

reason why I would choose GPT-4o over

play23:51

Claude is because with a paid version of

play23:54

ChatGPT you could build custom GPTs

play23:57

those are very specific little mini GPTs

play24:00

basically with your knowledge base and

play24:02

with your very specific set of

play24:04

instructions for my company we've built

play24:06

well over I think we have 15 20

play24:09

different custom GPTs and each one does

play24:11

a very specific task at this point I

play24:14

wouldn't really even know how to

play24:15

function day-to-day without those Claude

play24:17

obviously just doesn't have those

play24:19

Copilot for some reason is getting rid

play24:20

of those but that is my favorite part of

play24:24

generative AI is those custom GPTs that

play24:26

I could train to just do one thing

play24:28

really really well where the broad

play24:30

version of ChatGPT and Claude are just not

play24:32

going to be that good at it right they

play24:33

don't have that specific knowledge base

play24:35

they don't have that specific set of

play24:37

instructions so I've covered custom GPTs

play24:39

in a different video and exactly how to

play24:40

build them so if you haven't watched

play24:42

that and you're not using that I highly

play24:44

recommend them they will solve so many

play24:47

issues for you and it will save you so

play24:48

much time throughout the week so I'll

play24:50

link that video here I hope you found

play24:52

this head-to-head useful and you can

play24:53

make a clear decision between the two

play24:56

right now just from the function for my

play24:59

personal use I kind of have to use both

play25:01

because some of these coding just basic

play25:03

coding things that I'm doing Claude is

play25:05

just so much better with that so I'm

play25:07

going to use Claude for that kind of

play25:08

stuff I'm going to use ChatGPT for my

play25:10

everyday writing summarization things

play25:12

like that and I'm going to use

play25:14

Perplexity for research I'll see you on

play25:17

the next video


Related Tags
AI Comparison, GPT-4, Claude 3.5, Practical Test, Business Tools, Text Summarization, Vision Analytics, Coding, Reasoning, Content Creation, Research Limitations