Debunking Devin: "First AI Software Engineer" Upwork lie exposed!

Internet of Bugs
9 Apr 2024 · 25:16

Summary

TL;DR: The video discusses the hype around AI, focusing on the claims made about Devin, which was introduced as the 'world's first AI software engineer.' The speaker, a software professional with 35 years of experience, critiques the exaggeration and misinformation surrounding AI capabilities, using Devin as a case study. He argues that while AI can perform impressive tasks, it cannot fully replace human software engineers, especially when it comes to understanding and communicating with clients. The speaker emphasizes the importance of truthful representation of AI's abilities and the potential harm caused by overhyping its capabilities.

Takeaways

  • 😶 The video script critiques the hype around AI, specifically calling out the exaggerated claims made about an AI named Devin.
  • 🚫 The main claim that Devin can 'make money taking on messy Upwork tasks' is disputed as false and misleading.
  • 🙅‍♂️ The speaker emphasizes that they are not anti-AI, but rather against the hype and misinformation surrounding AI capabilities.
  • 🎥 The video description and company tweets are highlighted as sources of the misleading claims about Devin's capabilities.
  • 🤖 The speaker acknowledges that generative AI tools like GitHub Copilot and ChatGPT are impressive and useful, but stresses the importance of honesty in their representation.
  • 🛠️ The actual task that Devin was supposed to accomplish is discussed, noting that it was a cherry-picked, specific task rather than a general capability.
  • 🔍 The speaker conducted their own research and attempted to replicate Devin's work, finding that the AI's output was not as groundbreaking as claimed.
  • ⏳ The time it took for the speaker to replicate Devin's work was significantly less than what was shown in the video, questioning the efficiency of the AI's process.
  • 💻 The speaker points out that Devin generated errors and then attempted to fix them, which does not align with the narrative of fixing existing code in the repository.
  • 🔗 The speaker encourages viewers to check the original sources of information and to be skeptical of headlines and claims made about AI.
  • 📢 The video ends with a call for transparency, honesty, and skepticism in the face of AI hype and the potential for misinformation.

Q & A

  • What is the main claim presented in the video description that the speaker argues is a lie?

    -The main claim presented in the video description that the speaker argues is a lie is that Devin, an AI software engineer, can make money by taking on messy Upwork tasks. The speaker asserts that this does not happen in the video and that it is misleading.

  • What is the speaker's professional background and stance on AI?

    -The speaker has been a software professional for 35 years and is not anti-AI. However, the speaker is anti-hype and believes that lying about the capabilities of AI tools like Devin does a disservice to everyone.

  • How does the speaker describe the impact of hype and misinformation around AI capabilities?

    -The speaker suggests that the hype and misinformation around AI capabilities can lead non-technical people to overestimate the current capabilities of AI. This can result in less skepticism towards AI outputs, leading to potential problems such as increased bugs, exploits, and hacks in the software ecosystem.

  • What was the specific task that Devin was supposed to perform on Upwork?

    -Devin was supposed to provide detailed instructions on how to make inferences with a specific model in a repository on AWS EC2. However, the speaker argues that Devin did not fulfill this task as it did not generate the required detailed instructions.

  • What does the speaker criticize about the way Devin was presented and the hype around it?

    -The speaker criticizes the hype around Devin for being exaggerated and misleading. The speaker points out that the claims about Devin's capabilities were not truthful and that the company behind Devin should have been more honest about what it could actually do.

  • What is the speaker's opinion on the importance of communication in software engineering?

    -The speaker believes that communication is a crucial part of software engineering, involving understanding the needs of the customer, stakeholders, and team members. The speaker argues that AI is currently not capable of handling these communication aspects, which are some of the most important tasks in software engineering.

  • What did the speaker find when replicating Devin's work?

    -The speaker found that Devin did not fix any actual errors from the repository provided by the Upwork client. Instead, Devin generated its own code with errors and then attempted to debug and fix those self-generated issues.

  • How long did it take the speaker to replicate Devin's results?

    -It took the speaker about 36 minutes (35 minutes and 55 seconds) to replicate what Devin did, which was significantly less time than the six hours and 20 minutes that Devin apparently took.

  • What is the speaker's advice for AI product creators and those who report on AI?

    -The speaker advises AI product creators to be truthful about their products' capabilities and not to exaggerate. For journalists, bloggers, and influencers, the speaker urges them to verify the claims they read on the Internet and not to blindly amplify unverified information.

  • What does the speaker suggest is the current state of generative AI in terms of coding?

    -The speaker suggests that the current state of generative AI in coding often produces complicated, convoluted, and sometimes nonsensical code. It may work, but it is not efficient and can create more work for maintenance, bug fixing, or updates in the future.

  • What is the speaker's final message to the audience regarding skepticism and the internet?

    -The speaker's final message is a call for skepticism towards everything seen on the Internet or news, especially when it comes to AI-related content. The speaker emphasizes the importance of not taking information at face value and verifying the truthfulness of claims before accepting them as facts.

Outlines

00:00

🗣️ Introduction and Critique of AI Hype

The speaker, Carl, introduces himself and sets the stage for a critical examination of AI hype, specifically focusing on Devin, an AI software engineer. Carl clarifies that while he supports AI, he is against the exaggerated claims surrounding it. He criticizes the claim that Devin can make money by taking on Upwork tasks, stating that this is a lie and that the video does not demonstrate this capability. Carl emphasizes the damage caused by such lies, especially to non-technical individuals who may develop unrealistic expectations of AI capabilities.

05:01

📝 Analysis of Devin's Upwork Task

Carl delves into the specifics of the Upwork task that Devin was supposed to complete. He points out that the task was not randomly selected, implying that it may not represent Devin's capabilities accurately. Carl outlines what the customer requested and contrasts it with what Devin actually did, highlighting the discrepancies. He stresses the importance of understanding customer needs and the limitations of AI in this aspect, suggesting that AI is currently not capable of fully comprehending and executing complex tasks as required by human software engineers.

10:03

🛠️ Devin's Actual Performance and Shortcomings

Carl provides a detailed critique of Devin's actual performance on the Upwork task. He explains that Devin did not fulfill the customer's request and instead generated code with errors. Carl suggests that Devin's actions gave the false impression of fixing repository errors, when in fact it was creating and then fixing its own mistakes. He also notes that a real error in the repository went unnoticed and unfixed by Devin, further demonstrating the limitations of AI in software engineering tasks.

15:04

⏱️ Time Efficiency and Quality of Devin's Work

Carl discusses the time it took Devin to complete the task, which was significantly longer than what Carl himself took to replicate the results. He questions the efficiency of Devin's process and highlights the unnecessary complexity introduced by the AI. Carl also points out a nonsensical command used by Devin, illustrating the AI's current shortcomings in generating efficient and sensible code. He emphasizes that while Devin's output might seem impressive, it is not practical or efficient in real-world scenarios.

20:05

🚫 The Need for Skepticism and Truth in AI

Carl concludes by urging viewers to be skeptical of AI-related claims and to verify information before accepting it as truth. He calls for honesty from AI developers, journalists, and influencers when presenting AI capabilities to the public. Carl reiterates that while AI can be impressive, it is crucial to manage expectations and not to overstate its current capabilities. He ends with a reminder that the internet is full of misinformation and that skepticism is essential, especially when it comes to AI.

Keywords

💡AI Software Engineer

The term 'AI Software Engineer' refers to an artificial intelligence system or tool, like Devin, that is designed to perform tasks typically associated with software engineering. In the context of the video, this term is contentious as the presenter argues that Devin, despite being touted as the 'world's first AI software engineer,' does not live up to the expectations set by such a title. The presenter is skeptical of the hype surrounding AI capabilities and emphasizes the importance of truthful representation of AI tools.

💡Hype

In the context of this video, 'hype' refers to the exaggerated or misleading promotion of a product or concept, in this case, AI technology. The presenter is critical of the hype around Devin, arguing that the claims made about its capabilities are not accurate and do a disservice to the field by creating unrealistic expectations. Hype can lead to disappointment and mistrust when the actual performance does not match the promoted potential.

💡Upwork

Upwork is a freelancing platform where individuals and businesses can post jobs and find independent contractors to complete them. In the video, the claim is made that Devin can take on and complete tasks from Upwork, specifically a task related to making inferences with a model in a repository. However, the presenter argues that this claim is false and that Devin does not actually perform the work as advertised.

💡Generative AI

Generative AI refers to artificial intelligence systems that are capable of creating new content, such as text, images, or code. In the video, the presenter acknowledges the coolness factor of generative AI and mentions using tools like GitHub Copilot, ChatGPT, and Stable Diffusion. However, the presenter also expresses concern about the potential for these tools to be misrepresented and to create a false sense of their capabilities, leading to potential harm if their outputs are not critically evaluated.

💡Technical Audience

The term 'technical audience' refers to individuals who have a background or understanding in technology, particularly in fields such as software development, programming, or IT. In the video, the presenter addresses this audience, assuming they have a certain level of technical knowledge. The presenter warns that while the technical audience may be more discerning, there are many non-technical individuals who may be more easily misled by the hype around AI.

💡Bug

In the context of this video, a 'bug' refers to an error, flaw, or failure in a software program or system. The presenter uses the term 'Internet of Bugs' to illustrate the prevalence of software issues and the potential for AI-generated code to contribute to these problems. The concern is that if AI tools like Devin are not accurate or reliable, they could generate code that introduces new bugs or exacerbates existing ones.

💡Code Quality

Code quality refers to the standard and effectiveness of the programming code produced, which includes factors such as readability, efficiency, maintainability, and the absence of errors. The video criticizes Devin for producing code that is not only unnecessary but also potentially problematic, suggesting that the AI's output may compromise code quality. The presenter argues that good code quality is essential for the long-term health of software ecosystems and that AI-generated code should be held to high standards.

💡Transparency

Transparency in the context of this video refers to the openness and clarity with which AI developers or companies communicate about the capabilities and limitations of their AI products. The presenter advocates for transparency, arguing that companies should provide verifiable evidence of their AI's performance and not mislead the public with exaggerated claims. Transparency is crucial for building trust and ensuring that users have accurate expectations of AI capabilities.

💡Skepticism

Skepticism here refers to a critical and questioning attitude towards claims and information, particularly those related to AI. The video encourages viewers to maintain a level of skepticism when encountering claims about AI capabilities, especially since the field is rife with hype and misinformation. By approaching AI announcements and demonstrations with a skeptical mindset, individuals can protect themselves from accepting false or misleading information at face value.

💡Communication

In the context of software development, 'communication' refers to the interaction and information exchange between developers, clients, and other stakeholders to ensure that the project meets the requirements and expectations. The video emphasizes that AI, as currently developed, is not adept at understanding and managing the nuances of communication with clients, which is a crucial aspect of a software engineer's job. Effective communication helps in clarifying project requirements, managing expectations, and addressing any issues or changes that may arise during the development process.

💡Cloud Instance

A 'cloud instance' refers to a virtual machine that is provisioned and hosted on a cloud computing platform, such as Amazon Web Services (AWS) or other similar services. In the context of the video, the presenter discusses the need for a cloud instance to run a software model and how Devin, the AI, fails to provide detailed instructions on setting up and using such an instance for the task at hand. Proper setup and management of cloud instances are essential for running applications and services efficiently and securely.

Highlights

The speaker, Carl, introduces himself and clarifies that the video will be divided into three parts, focusing on the claim about Devin, an AI software engineer, and the hype surrounding it. (Start time: 0s)

Carl emphasizes his 35 years of experience in software and his stance against AI hype rather than AI itself. (Start time: 18s)

Devin was introduced as the 'world's first AI software engineer,' a claim Carl disputes. (Start time: 30s)

Carl criticizes the false claim that Devin can make money by taking on messy Upwork tasks, stating that this does not happen in the video. (Start time: 47s)

The speaker expresses his appreciation for generative AI tools like GitHub Copilot, ChatGPT, and Stable Diffusion, but stresses the importance of honesty about their capabilities. (Start time: 1m 20s)

Carl argues that the hype around Devin is based on a lie that has been repeated and embellished, causing harm to the perception of AI capabilities. (Start time: 1m 10s)

The video description contains a link to Carl's previous video about Devin, which provides context for the discussion. (Start time: 41s)

Carl points out that the lies about Devin's capabilities are not in the video itself but in the description and company tweets. (Start time: 1m 58s)

The speaker highlights the damage caused by non-technical people believing AI is more capable than it currently is, leading to issues such as fake cases and scientific papers. (Start time: 3m 10s)

Carl explains the actual job Devin was supposed to do on Upwork, which involved making inferences with a model in a repository. (Start time: 4m 20s)

The speaker criticizes the bidding process on Upwork and suggests a better approach involving a Q&A section and clear assumptions. (Start time: 7m 9s)

Carl discusses the importance of communication with customers and stakeholders in software development, a skill he believes AIs lack. (Start time: 6m 30s)

The speaker clarifies that Devin's report did not contain what the customer asked for, and questions the actual value of the work done. (Start time: 5m 35s)

Carl describes the actual process of reproducing Devin's work, emphasizing the simplicity of the task and the inefficiency of Devin's approach. (Start time: 8m 10s)

The speaker points out that Devin's work involved fixing errors in code that Devin itself generated, rather than fixing the original repository's code. (Start time: 14m 54s)

Carl reveals that Devin's video showed a lengthy process, taking six hours and 20 minutes, which he finds inefficient and not reflective of competent work. (Start time: 20m 5s)

The speaker concludes by urging AI product creators and influencers to be truthful about AI capabilities and by encouraging internet users to be skeptical of AI-related claims. (Start time: 23m 48s)

Transcripts

00:00

This is the Internet of Bugs, my name is Carl, and that is a lie. So this video is in three parts. First, we're going to talk about that claim. We're going to talk about what should have been done. What Devin actually did, and how it did it, and how well it did it. I have been a software professional for 35 years. I am not anti-AI, but I really am anti-hype, and that's why I'm doing this.

00:30

Devin was intro'd not quite a month ago now, and it was touted as the world's "first AI software engineer." I don't believe that it's the first AI software engineer, and I already made a video about that; I'll put the links in the description. But today is about the specific claim that's the first line of the video description, which says "watch Devin make money taking on messy Upwork tasks." That statement is a lie. You cannot watch that in the video. It does not happen in the video. It does not happen.

01:01

What's worse, though, is the hype and the fear, uncertainty and doubt from people repeating and embellishing on that claim because they're trying to get clicks, or they're trying to go viral, or they just want to be part of the zeitgeist. The hype around Devin in general is just crazy, and that statement seems to be what a lot of it is pinned on. For the record, personally, I think generative AI is cool. I use GitHub Copilot on a regular basis. I use ChatGPT, Llama 2, Stable Diffusion. All that kind of stuff is cool, but lying about what these tools can do does everyone a disservice.

01:34

So Devin does some impressive things, and I wish the company had just been truthful and just taken the win, but they didn't. They had to pretend that it did a lot more than it actually did. Now, I don't want to take anything away from the engineers that actually built Devin. I think Devin is impressive in many ways, and I'm especially not trying to pick on the guy that's in the video. The lies are not in the video itself. They're in the description, and they're in the tweets that the company made pointing to it. And then they're in a lot of places and people that have repeated that lie over and over again. It shouldn't be okay. Companies should just not be allowed to lie without getting called out on it. And people shouldn't repeat things they heard on the Internet without checking for themselves. I realize that's tilting at windmills, but I'm going to die on that hill. Since nobody else that I've seen seems to be explaining why this is a lie, I guess if it's going to get done, I'm going to have to do it. So here I go.

02:36

Before you think this is harmless, understand this kind of lie does real damage. You're watching this; you're probably at least somewhat technical. Keep in mind that there are a lot of people out there that see headlines, don't read the articles, that are not technical. And what these lies do is they cause non-technical people to believe that AI is far more capable than it is at the moment. And that causes all kinds of problems. People end up being a lot less skeptical of AI than they should be. They're a lot less skeptical of the output of AI than they really should be. And taking AI at face value these days is getting a lot of people in trouble. Just Google "AI lawyer fake cases" or "AI fake scientific papers." And those are just the prominent ones. And this hurts real software professionals too, because there are going to be folks that are going to trust the code that AIs generate. And that just means more bugs on the Internet, and there are already way too many. It's already a mess. There are already too many exploits. There are already too many hacks. And the more bad code that gets out there, the worse the ecosystem becomes for everyone. Enough of that. On to section two.

03:47

What was the job that Devin was supposed to have done? So this is the beginning of the video, or early in the video. Note that in the bottom left hand corner of your screen, I have stuck the time code of every frame that I'm going to be breaking down for you. So this is 2.936 seconds into the video. So you can go look yourself if you're curious about any particular thing or want to know the context around something that I'm talking about.

04:12

This is the job that Devin supposedly did on Upwork. We'll talk about it in a minute. First off, look at the left of your screen at the top. Notice that they searched for this. So this is not some random job. This is not "Devin can do any job on Upwork," right? They cherry picked this. That isn't deceptive necessarily; you would kind of expect them to. But keep in mind that what that means is chances are Devin is actually worse at most jobs than Devin turned out to be on this one, which wasn't great.

04:44

So zooming into that particular request. There at the bottom, that's what the customer actually wanted: "I want to make inferences with this repository." "Your deliverable is detailed instructions." I'm not going to talk about the estimate-to-complete-the-job thing. Devin didn't do that. That's fine. I'm not worried about that. But look at this. This is what Devin was actually told. This is what was copied and pasted into Devin: "I'm looking to make inferences with this model in the repository. Here's the repository. Please figure it out." Okay, back to the job: "Your deliverable will be detailed instructions on how to do it in EC2 on AWS." "Please figure it out" is not the same as "detailed instructions on how to do it in an EC2 instance in AWS." For the record, this at the end of the video is the report that Devin generated. There is nothing in that at all about what the customer was actually asking for.

05:45

So what should the results of this job actually look like? To start with, this is what you really need to know in order to be able to figure out how to do this. You're going to have to have some kind of instance in the cloud. You need to figure out what size, type, how much memory, all that kind of stuff. You need to find out from the customer: would you rather have one that runs faster and is more expensive, or would you rather have one that's cheaper and runs slower? Is this going to be something that's always going to be up, so you can just throw stuff at it whenever and have it give you an answer? Or are you going to launch it, run it, and then turn it off to save money? How are you going to get the stuff you want to make inferences on? How are you going to take the images that you want to analyze? How are you going to get that onto the server? Do you want a web interface for that? You can SSH them. You can put them in an S3 bucket. And how are you going to get access to the output of that? These are all questions that you need to know, right?
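To make the "what size, what type" question concrete, here is a minimal, hypothetical sketch of provisioning a single GPU instance on EC2 with boto3. The AMI ID, key pair name, region, and instance type are placeholders chosen for illustration; none of them come from the video or the Upwork job.

```python
# Hypothetical sketch only: launching one GPU instance on EC2 with boto3.
# The AMI ID, key name, region, and instance type are placeholders, not
# values from the video or from the Upwork job.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder region

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder: a deep-learning AMI would go here
    InstanceType="g4dn.xlarge",       # one possible GPU type; faster vs. cheaper is the customer's call
    KeyName="my-keypair",             # placeholder SSH key pair for getting data on and off
    MinCount=1,
    MaxCount=1,
)

instance_id = response["Instances"][0]["InstanceId"]
print("Launched", instance_id)
```

Whether an instance like this stays up around the clock or gets started and stopped per run is exactly the kind of cost trade-off that has to be settled with the customer before any of it is worth writing down.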

06:35

This is going back to another video that I made: the part of the job of a software developer that the AIs are bad at. The hard part, the important part, the difficult part, the time consuming part of being a software engineer is communication with the customer, with your boss, with the stakeholders. Figuring out what actually needs to get done, going back and forth, saying, "okay, this would be a lot easier. How about we do that?" Those are the kinds of things that AI just isn't capable of doing, and those are some of the most important things that we do. This just starts right off as AI doing the wrong thing.

07:09

Unfortunately, this is Upwork. So just for those of you that are actually ever going to be in this situation: Requests For Proposals like this are bad. If you can avoid doing them, avoid it. A competent Request For Proposals process is going to have a Q&A section. So they tell you, "This is what we want." You send them questions, other vendors send them questions. They answer all the questions, they send out the answers to everybody, and then the bidding happens. Since we can't do that on Upwork, because it's not set up that way, the next best thing, which isn't actually a good thing, but the next best thing, is you write down your questions. You pick the answer that will cause the cheapest amount of work, right? The least amount of work for you. Then at the top of your proposal, you say, "okay, here are all the assumptions I'm making. If any of these assumptions turn out not to be true, that's negotiable, but it means that the cost is going to go up." Because you want to bid as low as you can, but you want to make sure that the customer understands that you're bidding that value with these assumptions, and if they want any of those assumptions done differently, they're going to have to pay more. It's not a good bidding process, but if you're going to have to do that kind of bidding process, that's how you do it.

08:09

So, a deliverable for this particular job should contain what kind of cloud instance type to use, and what kind of operating system and image to use. How do you set up the install environment? So CUDA, Apex, PyTorch; don't worry if you don't know what any of those are, it's not really important for this purpose. How to install that repo: that's a four year old repo, so you're either going to need to update that repo for modern Python and modern libraries, or you're going to have to explain how to install a four year old or older environment. One of those two things is going to have to happen. You're going to have to explain to the customer how the data should be gotten onto the instance, how they're going to get their output off the instance, all that kind of stuff.

08:47

I actually reproduced what Devin did myself. We'll talk more about that later. This is the actual instance size that I used. I used a company called Vultr instead of AWS, because AWS's interface is a mess and it wouldn't make good videos. And on top of that, by the time this video got edited and uploaded, probably the new version of something would have been released and I would have the numbers wrong. So this is just a lot more stable and easier for this. For the customer's job, I would have actually done it on AWS. We have no idea what kind of image Devin used. They didn't tell us anything about it.

09:22

If you are a masochist, there is a link, which I'll put in the description, for the whole uncut version of me spending 35 minutes and 55 seconds, or however long it took, actually reproducing what Devin ended up doing. So if you have no life, you're welcome to watch that. I think transparency is important. It's really boring to watch, but it's important, and I wish that the company that made Devin, and anybody else making these kinds of claims on the Internet, would actually just post "Here's the raw footage of what actually happened," so that we can verify their claims if we need to.

09:57

All right, on to the next section. Given that we know that Devin didn't do what the customer asked, and Devin's report did not have any of the stuff that the customer wanted, and that Devin didn't actually get paid for any of this: what did Devin actually do? If it didn't make money, what did it make, and how good a job of that did it do?

10:17

So here's a screenshot from the video. This is the repo in question. We'll come back to screens like this later. This is the first thing that Devin really changed. So there's a thing called a requirements.txt file. It determines what versions of dependent libraries your code is going to run with. And it had to change some things, because some of the libraries that this repo originally used from four years ago aren't downloadable anymore, they're so old. So something had to change. Here it says that Devin is actually updating the code. I guess that's kind of arguably true. I would say it's more a configuration file than changing the code, but I'll allow it. It is really cool that Devin can do this: if what the tool did was just change all of the requirements so they all lined up, that would be something that would save me time. So that would be a cool thing to do. So it's good that you can do this. I don't know that I'd call it code, but it's a very, very small part of what actually needs to get done, as opposed to what the customer asked for, which is basically "I want to be able to make my own inferences." Devin was told that just using the sample data is fine, so that's what I did when reproducing what Devin did. Normally it would be more complicated than that, but that's what we're going to show that Devin actually did.

11:40

Okay, so Devin, fairly early on, hits an error. I did not hit this error, and you'll see why in a second. So zooming in, here's this command line error. So here at the top, we have this error with image.open: "file not found, no such file or directory." This error is in a code file called "visualize_detections.py", and the reason that I didn't run into this problem is because there is no file called visualize_detections.py in that repository. I don't know where that file came from, but more about that in a sec. So back to that command line. If you zoom in on the other part of that window, you see this: Devin is echoing a bunch of stuff into a file called inspect_results.py, and then it's running Python on it, and it's getting a syntax error. You can't put backslash 'n' in a Python file. It doesn't work that way. Echo doesn't work that way. None of this works that way. This is just nonsensical. This is the kind of thing that you might do as a human because you're not paying attention, and then you go, oh yeah, I need to change the way I did that. But what seems to be happening is Devin is creating files that have errors in them, and then it's fixing the errors.
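As a hypothetical illustration of the failure mode described above (not Devin's actual command or file), writing code with literal backslash-n characters produces one long, unparseable line:

```python
# Hypothetical illustration of the failure mode described above, not Devin's
# actual command: a literal backslash-n does not become a newline, so the
# generated "script" is one long line that Python cannot parse.
from pathlib import Path
import subprocess
import sys

bad_code = r"import json\nprint('hello')"      # raw string: \n stays as two characters
Path("inspect_results_demo.py").write_text(bad_code)

# Running it fails with a SyntaxError, the same class of error shown in the video.
result = subprocess.run([sys.executable, "inspect_results_demo.py"],
                        capture_output=True, text=True)
print(result.stderr)

# Writing real newlines avoids the problem entirely.
good_code = "import json\nprint('hello')"      # normal string: \n is an actual newline
Path("inspect_results_demo.py").write_text(good_code)
```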

13:03

So here the video says that Devin is actually "doing print line debugging," and that's cool. That's something a lot of us do. You know, there are always times that printf debugging or print line debugging ends up being useful. So it's cool that Devin can do that in at least some circumstances. But here's another error I didn't see, and Devin is coming in trying to figure this out. The commentary here says "Devin is adding statements to track down these data flows until Devin understands." Now, I'm okay with that. I don't know if the word "understands" there is technically true. I don't know that Devin actually "understands" anything; I would doubt it, but we anthropomorphize stuff like that all the time, and it's a handy way of using language, so I'm not going to give them a hard time for that.

13:42

But that said, let's look at what Devin's actually doing here. Zooming in on this, we've got this weird loop that it's doing. It's going through this file and reading stuff into a buffer. So this is the update_image_ids.py file. And again, this file does not exist anywhere in the repository that the customer wanted us to use. In fact, I searched all of GitHub, and there are only two places where a file with this name exists at all. The reason there are three on the screen there is because one of them is a fork of the other. And none of them look anything like the one that Devin is using. So I don't know where this came from. We don't have any idea. But the problem is that Devin is here debugging a file that it created itself, and it's not in the repo at all.

14:30

This is pretty insidious. It gives the person who's viewing the video, who's not paying that much attention, who didn't have time or take the effort to look at the repo, the impression that Devin is finding errors in the repository that the Upwork user asked us to look at, and fixing the errors in that repository. That's not the case. Devin is generating its own errors and then debugging and fixing the errors that it made itself. That's not what it seems like Devin would be doing. It's not what Devin is implied to be doing. It's not what many people who have written articles and posted videos about Devin have thought Devin was doing. But in fact, Devin isn't fixing code that it found on the Internet. Devin isn't fixing code that a customer asked it to fix. Devin is fixing code that it generated with errors in it. And that's not at all what most of the people who watch this video will think that it's doing.

15:37

What's worse is that there's no reason for this. This is the README file from that repo. I told you we'd come back to this page. There is a file called infer.py that is in that repo, and it does exactly what Devin does in this video. The README tells you that it does it. It tells you how to use it. There on the right, there's even a little button that you can click where you can copy the whole command line, paste it in your window, and hit return. And if you watch the long video where I reproduce the result, that's exactly what I did: I copied and pasted things, changed the path names, hit return, and it worked. I don't think the person that wrote this repository, detecting road damage, could have made it any easier to understand how we were supposed to use it. But Devin didn't seem to be able to figure that out. And so Devin had to create this other thing that was a mess.

16:31

This code right here, this reading-into-a-buffer thing: it's bad. Right? This is the way we had to read files a decade ago in 'C' and really lower level languages. Python has much better ways to handle this. As Devin is figuring out, this kind of thing is hard to debug. It's complicated. It's difficult, easy to get off by a little bit, which is, I think, what Devin is trying to debug here. I'm not exactly sure what was going wrong, but what it seems like is going wrong is that it got off by some characters, and so the JSON didn't parse right. But I mean, this is not how you would do it these days. This is not how you would do it in Python. This is not something that I would accept in a code review from a junior developer. This is causing more problems than it actually solves. This is bad. It's just bad.
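For contrast, here is a minimal sketch of the idiomatic Python alternative to a manual read-into-a-buffer loop. The file name "results.json" is a placeholder echoing the results file that appears later in the video; the actual structure of that file is unknown.

```python
# Minimal sketch: manual buffered reading versus the idiomatic standard-library call.
# "results.json" is a placeholder name; the real file's structure is not shown.
import json

# The C-style approach: read fixed-size chunks into a buffer and stitch them
# together yourself. Easy to get off by a few characters, hard to debug.
def load_json_the_hard_way(path, chunk_size=4096):
    buffer = ""
    with open(path, "r", encoding="utf-8") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buffer += chunk
    return json.loads(buffer)

# The Python way: let the standard library handle buffering and parsing.
def load_json_the_easy_way(path):
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

# Both return the same data; only one is worth a code reviewer's time.
# data = load_json_the_easy_way("results.json")
```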

17:16

In addition, there is a real error in the repo, and Devin didn't find it or fix it. Devin just created a bunch of other stuff.

17:25

So like I said, I replicated Devin's work myself. There's the link; again, it'll be in the description. I used torch 2.2.2, which is a much more current version than the one Devin specified, if you go back to that requirements.txt file. The hard part of what I did was getting a software package called Apex installed with the right version of CUDA, which is NVIDIA's driver stuff. It was a pain. I ended up having to build it from source, which took about 16 of the 36 minutes that I was working on the thing. So there might have been an easier way to do it, but for a 16 minute build time, that just seemed to be the most expedient way. I did remove the hard coding from the requirements.txt file; Devin just changed some of the numbers. I think my way is better, but either way, technically, is okay.
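As a hedged illustration of the difference being described (the full contents of the repo's requirements.txt and of Devin's edited version are not shown in the video, apart from the torch 1.4.0 pin mentioned later), bumping pinned numbers versus dropping the hard-coded pins might look something like this:

```text
# Hypothetical illustration only; the real requirements.txt is not fully shown.

# Original, four-year-old pins:
torch==1.4.0
torchvision==0.5.0

# Devin's style of change (keep the pins, bump the numbers - versions here are invented):
torch==1.13.0
torchvision==0.14.0

# The approach described here (remove the hard-coded pins, take current versions):
torch
torchvision
```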

18:11

See, in the next slides there is actually one error that needed to get fixed, and I'll show you what it is. It took me about 36 minutes (35 minutes and 55 seconds, I think) to actually do what I did. That will become important later, when we talk about how long Devin took.

18:24

Okay, so this is a screenshot from that long video that I posted. It's unlisted, but I gave you a link to it if you want to watch the whole thing. Zooming in: this is where the actual error was. It's in a file called dataset.py, on line 33. And the error is that the module called torch has no attribute called '_six'. I did a Google search. I found a comment on a GitHub issue. I changed that line of code the way that issue told me would fix it, and it did fix it. I put in a link to show where I got the idea to do that, because I'm not an expert in exactly how Apex works, so it was good that I found somebody on the Internet who is. The entire time on task that it took me to fix that error was something like a minute and seven seconds. It was a quick Google search.
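The video doesn't show the exact line that was changed, but this class of error is well known when running old PyTorch code on a modern release: `torch._six` was removed, so references to it get replaced with standard-library equivalents. A hedged sketch of what such a one-line fix typically looks like:

```python
# Hedged sketch of a typical fix for "module 'torch' has no attribute '_six'".
# This is NOT the exact line from that repo's dataset.py (the video does not
# show it); it only illustrates the common pattern. torch._six was removed in
# newer PyTorch, and its old aliases map back to the standard library.

# Old code (works on torch 1.x, fails on torch 2.x):
#   from torch._six import container_abcs, string_classes
#   if isinstance(batch, container_abcs.Sequence) and not isinstance(batch, string_classes):
#       ...

# Typical replacement:
import collections.abc as container_abcs   # stand-in for torch._six.container_abcs
string_classes = str                        # stand-in for torch._six.string_classes

def is_plain_sequence(batch):
    # True for lists/tuples of samples, but not for bare strings.
    return isinstance(batch, container_abcs.Sequence) and not isinstance(batch, string_classes)
```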

19:14

So here is the change that I made, in context. This is a diff between what I started with and what I ended up with. This is a diff of the requirements.txt file. So torch 1.4.0 is what it started with. I used the most recent version of torch, which is 2.2.2, or at least a relatively recent one; there might have been a more recent one released in the last hour, for all I know.

19:38

And then here is, on the right, one of the last screens from Devin's video, and on the left is my video, the final output. They were both more or less the same. My box is yellow; their box is red. I don't know which one might be better or worse, but it only took me 36 minutes. Devin took slightly longer than that.

19:56

So here is the early part of the Devin video. There's a timestamp at 3:25 PM on March the 9th. Later in the video, you see a timestamp from 9:41 PM on March the 9th. So we're looking at six hours and 20 minutes. I have no idea what would have been happening for six hours and 20 minutes. Hopefully Devin was waiting on people for a while, because it doesn't make any sense that it would take that long. That's just crazy, because like I said, it took me a little over half an hour. There's another timestamp, and I'm assuming they just left it overnight and then came back to it or something, but it's from the next day, from 6 PM, and hopefully it wasn't doing stuff over that whole time. So I'm assuming it just took six hours, but it could have taken, you know, a day and two hours. I don't know why it would have taken that long. It's not efficient. It's not what I would call competent.

20:49

A little weird command line use popped up in one of the screens when you frame-by-frame it. So here's a weird one. Let me zoom in on that: `head -n 5 results.json | tail -n 5`. What that says is take the first five lines of this JSON file, and then take the last five lines of those first five lines. There's no reason to do that. No human would do that. And it's the kind of thing that AI does that just doesn't make any sense, so that when you come around later and look at it, trying to debug what's going on, there's all this extraneous stuff all over the place, and it makes it really, really hard to figure out what the point was. In fact, the right way to do this is `head -5 results.json`. The `-n` is redundant; you can just say `-5`. That extra stuff in there is for no good reason.

21:34

And it's the kind of thing that just makes it way more complicated when AI generates stuff right now. Hopefully that will get better, but at the moment AI generates a lot of stupid stuff. It does things in Python the way you would do them in 'C', when no one would do it that way in Python these days. Even when it gets things to work, right now the state of the art of generative AI is that it just does a bad, complicated, convoluted job that makes more work for everybody else if you're ever going to try to maintain it, or fix a bug in it, or update it to a new version, or anything like that, any time in the future.

22:08

Let's look at the list of things that Devin thought it needed to do. If you look at the left there, there's this series of checkboxes. I'm going to run through some pages. Exactly what they are isn't really important; just look how many there are. This list of checkboxes gives the impression that Devin did something complicated or difficult. And when you're watching the video and you see all this scroll by, you're like, wow, Devin must have done a bunch of stuff. All you needed to do, all I had to do to replicate Devin's results, was get an environment set up on a cloud instance with the right hardware and run literally two commands with the right paths. All of this stuff makes it look like Devin did a bunch of work. It makes it look like Devin accomplished a lot of stuff. And really, all you had to do was run two commands once you set the environment up. None of those code fixes are relevant at all, because it's all code that Devin generated itself.

22:56

And at the end, the person narrating the video says, "Good job, Devin." Now, what Devin actually got done was kind of cool for an AI. If you had asked me a couple of months ago what an AI would have done given that problem, I would have guessed an output that's worse than what Devin actually did. So it is, honestly, as far as I'm concerned, kind of impressive. But in the context of what an Upwork job should have been, and especially in the context of a bunch of people saying that Devin is "taking jobs off of Upwork and doing them," and especially in the context of the company saying that this video will let us watch Devin get paid for doing work, which is, again, just a lie: I don't know that I would agree with saying "Good job."

23:48

So look, if you make AI products, that's great. AI is good. I use it a lot. I want it to get better. Please make products. Just please tell people the truth about them. If you're a journalist or a blogger or an influencer, just please don't blindly repeat and amplify things that people say on the Internet, things that you read on the Internet, without doing some due diligence, without looking to see if they're actually true. If you don't understand whether they're true, if you can't figure out on your own whether they're true, ask someone, or just don't amplify it. Because there are a lot of people that are never going to look at the original source. They're just going to see the headline, and they're going to think that it's true. That's unfortunate, but that's just the way we are. And if you're just someone who's using the Internet: please, for the love of all that's holy, be skeptical of everything you see on the Internet or anything you see on the news, especially anything that might possibly be AI related. There's so much hype out there, and so much stuff that people are bouncing around and telling each other is true that's just not true. So please, don't forget to be skeptical. It's important.

24:58

Okay, so that's what I have for this video. Until next time: always keep in mind that the Internet is full of Bugs, and anyone who says differently is trying to sell you something. Have a good one, everybody.
