Debunking Devin: "First AI Software Engineer" Upwork lie exposed!
Summary
TLDR: The video discusses the hype around AI, specifically focusing on the claims made about an AI named Devin, which was introduced as the 'world's first AI software engineer.' The speaker, a software professional with 35 years of experience, critiques the exaggeration and misinformation surrounding AI capabilities, using Devin as a case study. He argues that while AI can perform impressive tasks, it is not capable of fully replacing human software engineers, especially in understanding and communicating with clients. The speaker emphasizes the importance of truthful representation of AI's abilities and the potential harm caused by overhyping its capabilities.
Takeaways
- 😶 The video script critiques the hype around AI, specifically calling out the exaggerated claims made about an AI named Devin.
- 🚫 The main claim that Devin can 'make money taking on messy Upwork tasks' is disputed as false and misleading.
- 🙅‍♂️ The speaker emphasizes that they are not anti-AI, but rather against the hype and misinformation surrounding AI capabilities.
- 🎥 The video description and company tweets are highlighted as sources of the misleading claims about Devin's capabilities.
- 🤖 The speaker acknowledges that generative AI tools like GitHub Copilot and ChatGPT are impressive and useful, but stresses the importance of honesty in their representation.
- 🛠️ The actual task that Devin was supposed to accomplish is discussed, noting that it was a cherry-picked, specific task rather than a general capability.
- 🔍 The speaker conducted their own research and attempted to replicate Devin's work, finding that the AI's output was not as groundbreaking as claimed.
- ⏳ The time it took for the speaker to replicate Devin's work was significantly less than what was shown in the video, questioning the efficiency of the AI's process.
- 💻 The speaker points out that Devin generated errors and then attempted to fix them, which does not align with the narrative of fixing existing code in the repository.
- 🔗 The speaker encourages viewers to check the original sources of information and to be skeptical of headlines and claims made about AI.
- 📢 The video ends with a call for transparency, honesty, and skepticism in the face of AI hype and the potential for misinformation.
Q & A
What is the main claim presented in the video description that the speaker argues is a lie?
-The main claim presented in the video description that the speaker argues is a lie is that Devin, an AI software engineer, can make money by taking on messy Upwork tasks. The speaker asserts that this does not happen in the video and that it is misleading.
What is the speaker's professional background and stance on AI?
-The speaker has been a software professional for 35 years and is not anti-AI. However, the speaker is anti-hype and believes that lying about the capabilities of AI tools like Devin does a disservice to everyone.
How does the speaker describe the impact of hype and misinformation around AI capabilities?
-The speaker suggests that the hype and misinformation around AI capabilities can lead non-technical people to overestimate the current capabilities of AI. This can result in less skepticism towards AI outputs, leading to potential problems such as increased bugs, exploits, and hacks in the software ecosystem.
What was the specific task that Devin was supposed to perform on Upwork?
-Devin was supposed to provide detailed instructions on how to make inferences with a specific model in a repository on AWS EC2. However, the speaker argues that Devin did not fulfill this task as it did not generate the required detailed instructions.
What does the speaker criticize about the way Devin was presented and the hype around it?
-The speaker criticizes the hype around Devin for being exaggerated and misleading. The speaker points out that the claims about Devin's capabilities were not truthful and that the company behind Devin should have been more honest about what it could actually do.
What is the speaker's opinion on the importance of communication in software engineering?
-The speaker believes that communication is a crucial part of software engineering, involving understanding the needs of the customer, stakeholders, and team members. The speaker argues that AI is currently not capable of handling these communication aspects, which are some of the most important tasks in software engineering.
What did the speaker find when replicating Devin's work?
-The speaker found that Devin did not fix any actual errors from the repository provided by the Upwork client. Instead, Devin generated its own code with errors and then attempted to debug and fix those self-generated issues.
How long did it take the speaker to replicate Devin's results?
-It took the speaker approximately 35 minutes and 55 seconds to replicate what Devin did, which was significantly less time than the six hours and 20 minutes that Devin supposedly took.
What is the speaker's advice for AI product creators and those who report on AI?
-The speaker advises AI product creators to be truthful about their products' capabilities and not to exaggerate. For journalists, bloggers, and influencers, the speaker urges them to verify the claims they read on the Internet and not to blindly amplify unverified information.
What does the speaker suggest is the current state of generative AI in terms of coding?
-The speaker suggests that the current state of generative AI in coding often produces complicated, convoluted, and sometimes nonsensical code. It may work, but it is not efficient and can create more work for maintenance, bug fixing, or updates in the future.
What is the speaker's final message to the audience regarding skepticism and the internet?
-The speaker's final message is a call for skepticism towards everything seen on the Internet or news, especially when it comes to AI-related content. The speaker emphasizes the importance of not taking information at face value and verifying the truthfulness of claims before accepting them as facts.
Outlines
🗣️ Introduction and Critique of AI Hype
The speaker, Carl, introduces himself and sets the stage for a critical examination of AI hype, specifically focusing on Devin, an AI software engineer. Carl clarifies that while he supports AI, he is against the exaggerated claims surrounding it. He criticizes the claim that Devin can make money by taking on Upwork tasks, stating that this is a lie and that the video does not demonstrate this capability. Carl emphasizes the damage caused by such lies, especially to non-technical individuals who may develop unrealistic expectations of AI capabilities.
📝 Analysis of Devin's Upwork Task
Carl delves into the specifics of the Upwork task that Devin was supposed to complete. He points out that the task was not randomly selected, implying that it may not represent Devin's capabilities accurately. Carl outlines what the customer requested and contrasts it with what Devin actually did, highlighting the discrepancies. He stresses the importance of understanding customer needs and the limitations of AI in this aspect, suggesting that AI is currently not capable of fully comprehending and executing complex tasks as required by human software engineers.
🛠️ Devin's Actual Performance and Shortcomings
Carl provides a detailed critique of Devin's actual performance on the Upwork task. He explains that Devin did not fulfill the customer's request and instead generated code with errors. Carl suggests that Devin's actions gave the false impression of fixing repository errors, when in fact it was creating and then fixing its own mistakes. He also notes that a real error in the repository went unnoticed and unfixed by Devin, further demonstrating the limitations of AI in software engineering tasks.
⏱️ Time Efficiency and Quality of Devin's Work
Carl discusses the time it took Devin to complete the task, which was significantly longer than what Carl himself took to replicate the results. He questions the efficiency of Devin's process and highlights the unnecessary complexity introduced by the AI. Carl also points out a nonsensical command used by Devin, illustrating the AI's current shortcomings in generating efficient and sensible code. He emphasizes that while Devin's output might seem impressive, it is not practical or efficient in real-world scenarios.
🚫 The Need for Skepticism and Truth in AI
Carl concludes by urging viewers to be skeptical of AI-related claims and to verify information before accepting it as truth. He calls for honesty from AI developers, journalists, and influencers when presenting AI capabilities to the public. Carl reiterates that while AI can be impressive, it is crucial to manage expectations and not to overstate its current capabilities. He ends with a reminder that the internet is full of misinformation and that skepticism is essential, especially when it comes to AI.
Keywords
💡AI Software Engineer
💡Hype
💡Upwork
💡Generative AI
💡Technical Audience
💡Bug
💡Code Quality
💡Transparency
💡Skepticism
💡Communication
💡Cloud Instance
Highlights
The speaker, Carl, introduces himself and clarifies that the video will be divided into three parts, focusing on the claim about Devin, an AI software engineer, and the hype surrounding it. (Start time: 0s)
Carl emphasizes his 35 years of experience in software and his stance against AI hype rather than AI itself. (Start time: 10s)
Devin was introduced as the 'world's first AI software engineer,' a claim Carl disputes. (Start time: 20s)
Carl criticizes the false claim that Devin can make money by taking on messy Upwork tasks, stating that this does not happen in the video. (Start time: 30s)
The speaker expresses his appreciation for generative AI tools like GitHub Copilot, ChatGPT, and Stable Diffusion, but stresses the importance of honesty about their capabilities. (Start time: 50s)
Carl argues that the hype around Devin is based on a lie that has been repeated and embellished, causing harm to the perception of AI capabilities. (Start time: 1m 10s)
The video description contains a link to Carl's previous video about Devin, which provides context for the discussion. (Start time: 1m 30s)
Carl points out that the lies about Devin's capabilities are not in the video itself but in the description and company tweets. (Start time: 2m 20s)
The speaker highlights the damage caused by non-technical people believing AI is more capable than it currently is, leading to issues such as fake cases and scientific papers. (Start time: 3m 10s)
Carl explains the actual job Devin was supposed to do on Upwork, which involved making inferences with a model in a repository. (Start time: 4m 20s)
The speaker criticizes the bidding process on Upwork and suggests a better approach involving a Q&A section and clear assumptions. (Start time: 5m 40s)
Carl discusses the importance of communication with customers and stakeholders in software development, a skill he believes AIs lack. (Start time: 6m 30s)
The speaker clarifies that Devin's report did not contain what the customer asked for, and questions the actual value of the work done. (Start time: 7m 20s)
Carl describes the actual process of reproducing Devin's work, emphasizing the simplicity of the task and the inefficiency of Devin's approach. (Start time: 8m 10s)
The speaker points out that Devin's work involved fixing errors in code that Devin itself generated, rather than fixing the original repository's code. (Start time: 9m 00s)
Carl reveals that Devin's video showed a lengthy process, taking six hours and 20 minutes, which he finds inefficient and not reflective of competent work. (Start time: 10m 00s)
The speaker concludes by urging AI product creators and influencers to be truthful about AI capabilities and by encouraging internet users to be skeptical of AI-related claims. (Start time: 11m 20s)
Transcripts
This is the Internet of Bugs, my
name is Carl, and that is a lie.
So this video is in three parts.
First, we're going to talk about
that claim.
We're going to talk about what
should have been done.
What Devin actually did and how it
did it and how well it did it.
I have been a software professional
for 35 years.
I am not anti-AI, but I really am
anti-hype and that's why I'm doing
this.
Devin was intro'd not quite a month
ago now.
And it was touted as the world's
"first AI software engineer."
And I don't believe that it's the
first AI software engineer, and I
already made a video about that.
I'll put the links in the
description.
But today is about the specific
claim
that's the first line of the video
description, which says "watch Devin
make money taking on messy Upwork
tasks."
That statement is a lie.
You cannot watch that in the video.
It does not happen in the video.
It does not happen.
What's worse, though, is the
hype and the fear, uncertainty, and
doubt from people repeating and
embellishing on that claim because
they're trying to get clicks or
they're trying to go viral or they
just want to be part of the zeitgeist.
The hype around Devin in general is
just crazy.
And that statement seems to be what
a lot of it is pinned on. For the
record, personally,
record, personally,
I think generative AI is cool.
I use GitHub Copilot on a regular
basis.
I use ChatGPT, Llama 2, Stable
Diffusion.
All that kind of stuff is cool, but
lying about what these tools can do
does everyone a disservice.
So Devin does some impressive
things.
And I wish the company had just
been truthful and just taken the
win, but they didn't.
And they had to pretend that it did
a lot more than it actually did.
Now, I don't want to take anything
away from the engineers that
actually built Devin.
I think Devin is impressive in many
ways and I'm especially not trying
to pick on the guy that's in the
video.
The lies are not in the video
itself.
They're in the description and they're
in the tweets that the company made
pointing to it.
And then they're in a lot of places
and people that have repeated that
lie over and over again.
It shouldn't be okay.
Companies should just not be
allowed to lie without getting
called out on it.
And people shouldn't repeat things
they heard on the Internet without
checking for themselves.
I realize that's tilting at windmills,
but I'm going to die on that hill.
Since nobody else that I've seen
seems to be explaining why this is
a lie.
I guess if it's going to get done,
I'm going to have to do it.
So here I go. Before you think this
is harmless,
understand this kind of lie does
real damage.
You're watching this.
You're probably at least somewhat
technical.
Keep in mind that there are a lot
of people out there that see
headlines, don't read the articles,
and are not technical.
And what these lies do is they
cause non-technical people to
believe that AI is far more capable
than it is at the moment.
And that causes all kinds of
problems.
People end up being a lot less
skeptical of AI than they should be.
They're a lot less skeptical of the
output of AI than they really
should be.
And taking AI at face value these
days is getting a lot of people in
trouble.
Just Google "AI lawyer fake cases" or
"AI fake scientific papers."
And those are just the prominent ones.
And this hurts real software
professionals too, because there are
going to be folks that are going to
trust the code that AIs generate.
And that just means more bugs on
the Internet, and there are already
way too many.
It's already a mess.
There are already too many exploits.
There are already too many hacks.
And the more bad code that gets out
there, the worse the ecosystem
becomes for everyone.
Enough of that. On to section two.
What was the job that Devin was
supposed to have done?
So this is the beginning of the
video or early in the video.
Note that in the bottom left hand
corner of your screen, I have stuck
the time code of every frame that I'm
going to be breaking down for you.
So this is 2.936 seconds into the
video.
So you can go look yourself if you're
curious about any particular thing
or want to know the context around
something that I'm talking about.
This is the job that Devin
supposedly did on Upwork.
We'll talk about it in a minute.
First off, look at the left of your
screen at the top.
Notice that they searched for this.
So this is not some random job.
This is not "Devin can do any job on
Upwork," right?
They cherry picked this.
That isn't deceptive necessarily.
You would kind of expect them to.
But keep in mind that what that
means is chances are Devin is
actually worse at most jobs than
Devin turned out to be on this one,
which wasn't great.
So zooming into that particular
request.
There at the bottom, that's what
the customer actually wanted.
"I want to make inferences with this
repository."
"Your deliverable is detailed
instructions."
I'm not going to talk about the
estimate to complete the job thing.
Devin didn't do that.
That's fine.
I'm not worried about that.
But look at this.
This is what Devin was actually
told.
This is what was copied and pasted
into Devin.
"I'm looking to make inferences with
this model in the repository.
Here's the repository.
Please figure it out."
Okay, back to the job.
"Your deliverable will be detailed
instructions on how to do it in EC2
on AWS."
"Please figure it out" is not the
same as "detailed instructions on how
to do it in an EC2 instance in AWS."
For the record, this at the end of
the video is the report that Devin
generated.
There is nothing in that at all
about what the customer was
actually asking for.
So what should the results of this
job actually look like?
To start with, this is what you
really need to know in order to be
able to figure out how to do this.
You're going to have to have some
kind of instance in the cloud.
You need to figure out what size,
type, how much memory, all that
kind of stuff.
You need to find out from the
customer.
Would you rather have one that runs
faster and is more expensive?
Or would you rather one that's
cheaper that runs slower?
Is this going to be something that's
always going to be up and you can
just throw stuff at it whenever and
have it give you an answer?
Or are you going to launch it, run
it, and then turn it off to
save money?
How are you going to get the stuff
you want to make inferences on?
How are you going to take the
images that you want to analyze?
How are you going to get that onto
the server?
You want to do a web interface for
that?
You can SSH them over.
You can put them in an S3 bucket.
You know, how are you going to get
access to the output of that?
These are all questions that you
need to know, right?
This is going back to another video
that I made, the part of the job of
a software developer that the AIs
are bad at.
The hard part, the important part,
the difficult part, the time
consuming part of being a software
engineer is communication with the
customer, with your boss, with the
stakeholders.
Figuring out what actually needs to
get done, going back and forth,
saying, "okay, this would be a lot
easier.
How about we do that?"
Those are the kinds of things that
AI just isn't capable of doing, and
those are some of the most
important things that we do.
This just starts right off as AI
doing the wrong thing.
Unfortunately, this is Upwork.
So just for those of you that
actually are ever going to be in
this situation, Requests For
Proposals like this are bad.
If you can avoid doing them, avoid
it. A competent Request For Proposals
process is going to have a Q&A
section.
So they tell you, "This is what we
want."
You send them questions; other
vendors send them questions.
They answer all the questions, they
send out the answers to everybody,
and then the bidding happens.
Since we can't do that in Upwork
because it's not set up that way,
the next best thing, which isn't
actually a good thing, but the next
best thing is you write down your
questions.
You pick the answer that will cause
the cheapest amount of work, right?
The least amount of work for you.
Then at the top of your proposal,
you say, "okay, here are all the
assumptions I'm making.
If any of these assumptions turn
out not to be true, that's negotiable,
but it means that the cost is going
to go up." Because you want to bid as
low as you can, but you want to
make sure that the customer
understands
that you're bidding that value with
these assumptions.
And if any of those assumptions,
they want it done differently, they're
going to have to pay more.
It's not a good bidding process,
but if you're going to have to do
that kind of bidding process, that's
how you do it.
So, a deliverable for this
particular job should contain what
kind of cloud instance type to use,
what kind of operating system and
image to use.
How do you set up the install
environment?
So CUDA, Apex, PyTorch; don't worry
about it if you don't know what any
of those are.
It's not really important for this
purpose.
How to install that repo, so that's
a four year old repo.
You're either going to need to
update that repo for modern Python
and modern libraries, or you're
going to have to explain how to
install a four year old or an older
environment.
One of those two things is going to
have to happen.
You're going to have to explain to
the customer how the data should be
gotten onto the instance, how they're
going to get their output off the
instance, all that kind of stuff.
I actually reproduced what Devin
did myself.
We'll talk more about that later.
This is the actual instance size
that I used.
I used a company called Vultr
instead of AWS because AWS's
interface is a mess and it wouldn't
make good videos.
And on top of that, by the time
this video got edited and uploaded,
probably the new version of
something would have been released
and I would have the numbers wrong.
So this is just a lot more
stable.
It's easier for this job. For the
customer, I would have actually
done it on AWS.
There's no... we have no idea what
kind of image Devin used.
They didn't tell us anything about
it.
If you are a masochist, there is a
link, and I'll put it in the
description, for the whole uncut
version of me spending 35 minutes
and 55 seconds, or however long it
took, actually reproducing what
Devin ended up doing.
So if you have no life, you're
welcome to watch that.
I think transparency is important.
It's really boring to watch, but it's
important and I wish that the
company that made Devin and anybody
else that's making these kinds of
claims on the Internet would
actually just post
"Here's the raw footage of what
actually happened" so that we can
verify their claims if we need to.
All right, so on the next section
given that we know that Devin didn't
do what the customer asked and
Devin's report did not have any of
the stuff that the customer wanted
and that Devin didn't actually get
paid for any of this.
What did Devin actually do if it
didn't make money, what did it make
and how good a job of that did it
do?
So here's a screenshot from the
video.
This is the repo in question.
We'll come back to screens like
this later.
This is the first thing that Devin
really changed.
So there's a thing called a
requirements.txt file.
It determines what versions of
dependent libraries your code is
going to run against.
And it had to change some things
because the libraries that this
repo originally used from four
years ago, some of them aren't
downloadable anymore because they're
so old.
So something had to change.
Here it says that Devin is actually
updating the code.
I guess that's kind of arguably
true.
I would say it's more a
configuration file than changing
the code, but I'll allow it.
It is really cool that Devin can do
this. If what the tool did was just
change all of the requirements so
they all lined up,
that would be something that would
save me time.
So it's good that you can do this.
I don't know that I'd call it code,
but it's a very, very small part of
what actually needs to get done
instead of what the customer asked
for, which is basically "I want to
be able to make my own inferences."
Devin was told just using the
sample data is fine.
So that's what I did in my
reproduction of what Devin did.
Normally it should be more
complicated than that, but that's
what we're going to show that Devin
actually did.
Okay, so Devin is fairly early on
hits an error.
I did not hit this error and you'll
see why in a second.
So zooming in, here's this command
line error.
So here at the top.
We have this error with
image.open:
"file not found, no such file
or directory."
So this error is in a code file
called "visualize_detections.py"
and the reason that I didn't run
into this problem is because there
is no file called visualize_detections.py
in that repository.
I don't know where that file came
from, but more about that in a sec.
So back to that command line.
If you zoom in on the other part of
that window, you see this.
So Devin is echoing a bunch of
stuff into a file called
inspect_results.py
and then it's running Python on it
and it's getting a syntax error.
You can't put backslash 'n' in a
Python file.
It doesn't work that way.
Echo doesn't work that way.
None of this works that way.
This is just this is just
nonsensical.
This is the kind of thing that you
might do as a human because you're
not paying attention.
And then you go, oh, yeah, I need
to change the way I did that.
But what seems to be happening is
Devin is creating files that have
errors in them
and then it's fixing the errors.
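Carl's point about echo and backslash-n can be sketched concretely. This is a hypothetical reconstruction (the exact command Devin ran isn't fully shown in the video): a generated file containing a literal backslash-n, which echo typically emits without escape processing, is not valid Python source.

```python
# Hypothetical reconstruction of the failure mode: echo (e.g. bash's
# builtin without -e) writes "\n" literally, so the generated script is a
# single line containing literal backslash-n characters, which the Python
# parser rejects. The file name inspect_results.py is taken from the video.
source = r"import json\nprint(json.dumps({}))"  # literal \n, as echo emits it

try:
    compile(source, "inspect_results.py", "exec")
    error = None
except SyntaxError as exc:
    error = exc

# The parser treats the stray backslash as a broken line continuation.
print(type(error).__name__)  # → SyntaxError
```

The same text written with real newlines (e.g. via a heredoc or printf) would compile fine; the bug is purely in how the file was emitted.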
So here the video says that Devin
is actually "doing print line
debugging" and that's cool.
That's something a lot of us do.
You know, there are always times
that printf debugging or print
line debugging ends up being useful.
So it's cool that Devin can do that
in at least some circumstances.
But here's another error I didn't
see and Devin is coming in trying
to figure this out.
The commentary here says "Devin is
adding statements to track down
these data flows until Devin
understands."
Now, I'm okay with that.
I don't know if the word
"understands" there is technically
true.
I don't know that Devin actually
"understands" anything.
I would doubt it, but we anthropomorphize
stuff like that all the time and it's
a handy way of using language.
So I'm not going to give them a
hard time for that.
But that said, let's look at what
Devin's actually doing here.
So zooming in on this, we've got
this weird loop that it's doing.
It's going through this file and
reading stuff into a buffer.
So this is the update_image_ids.py
file.
And again, this file does not exist
anywhere in the repository that the
customer wanted us to use.
In fact, I searched all of GitHub
and there are only two places where
a file with this name exists at all.
The reason there are three on the
screen there is because one of them
is a fork of the other.
And none of them look anything like
the one that Devin is using.
So I don't know where this came
from.
We don't have any idea.
But the problem is Devin is here
debugging a file that it created
itself, and that file is not in the
repo at all.
This is pretty insidious.
So this gives the person who's
viewing the video, who's not paying
that much attention, who didn't have
time or take the effort to look at
the repo.
It gives that viewer the impression
that Devin is finding errors in the
repository that the Upwork user
asked us to look at.
And fixing the errors in the
repository.
That's not the case.
Devin is generating its own errors
and then debugging and fixing the
errors that it made itself.
That's not what it seems like Devin
would be doing.
It's not what Devin is implied to
be doing.
It's not what many people who have
written articles and posted videos
about Devin have thought Devin was
doing.
But in fact, Devin isn't fixing
code that it found on the Internet.
Devin isn't fixing code that a
customer asked it to fix.
Devin is fixing code that it
generated with errors in it.
And that's not at all what most of
the people who watch this video
will think that it's doing.
What's worse is that there's no
reason for this.
This is the README file from that
repo.
I told you we'd come back to this
page.
There is a file called infer.py
that is in that repo and it does
exactly what Devin does in this
video.
The README tells you that it
does it.
It tells you how to use it.
There on the right.
there's even a little button that
you can click on where you can copy
the whole command line and paste it
in your window and hit return.
And if you watch the long video
where I reproduce the result, that's
exactly what I did.
I copied and pasted things, changed
the path names and hit return and it
worked.
I don't think the person that wrote
this repository, detecting road
damage,
I don't think the person that wrote
that could have made it any easier
to understand how we were supposed
to use it.
But Devin didn't seem to be able to
figure that out.
And so Devin had to create this
other thing that was a mess.
This code right here, this reading
into a buffer thing.
It's bad.
Right? This is the way we had to
read files a decade ago in C and
really low-level languages.
Python has much better ways to
handle this.
As Devin is figuring out, this kind
of thing is hard to debug.
It's complicated.
It's difficult, easy to get off by
a little bit, which is I think what
Devin is trying to debug here.
I'm not exactly sure what was going
wrong, but it seems like
it got off by some characters
and so the JSON didn't parse right.
But I mean, this is not how you
would do it these days.
This is not how you would do it in
Python.
This is not something that I would
accept in a code review from a
junior developer.
This is causing more problems than
it actually solves.
This is bad.
It's just bad.
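For contrast, here is the idiomatic Python approach the critique implies: hand the whole file-like object to json.load rather than looping over a manual buffer and parsing by hand. The data below is made up purely for illustration.

```python
import io
import json

# A minimal sketch, assuming a results file shaped roughly like the repo's
# detection output (the field names here are illustrative, not from the repo).
data_file = io.StringIO('{"detections": [{"image_id": 1, "score": 0.9}]}')

# json.load parses the entire document in one call; no chunked buffer
# reads, no off-by-a-few-characters bugs to debug.
data = json.load(data_file)
print(data["detections"][0]["image_id"])  # → 1
```

With a real file you would use `with open("results.json") as f: data = json.load(f)`; either way there is no manual buffer management to get wrong.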
In addition, there is a real error
in the repo and Devin didn't find
it or fix it.
Devin just created a bunch of other
stuff.
So like I said, I replicated Devin's
work myself.
There's the link.
Again, it'll be in the description.
I used torch 2.2.2, which is a much
more current version than the one
that Devin said.
If you go back to that requirements.txt
file, the hard part of what I did
was getting a software package
called Apex installed with the
right version of CUDA, which is
NVIDIA's driver stuff.
It was a pain.
I ended up having to build it from
source, which took about 16 minutes
of the 36 minutes that I was
working on the thing.
There might have been
an easier way to do it, but for a
16-minute build time, that just
seemed to be the most expedient way.
I did remove the hard coding from
the requirements.txt file. Devin
just changed some of the numbers.
I think my way is better, but
either way, technically is okay.
See in the next slides, there is
actually one error that needed to
get fixed.
And I'll show you what it is.
It took me about 36 minutes (35
minutes and 55 seconds, I think) to
actually do what I did.
That will become important later when
we talk about how long Devin took.
Okay, so this is a screenshot from
that long video that I posted.
It's unlisted, but I gave you a
link to it if you want to watch the
whole thing. Zooming in:
So this is where the actual error
was.
It's in a file called dataset.py
on line 33.
And the error is that the module
called torch has no attribute
called '_six'.
I did a Google search.
I found this comment on a GitHub
issue.
I changed the line of that code the
way that issue told me that would
fix it.
It did fix it.
I put in a link to show where it
was that I got the idea to do that.
Because I'm not an expert in
exactly how Apex works.
It was good that I found somebody
on the Internet. The entire time on
task that it took me to do that was
like a minute and seven seconds, or
something like that; that's all it
took me to fix that error.
It was a quick Google search.
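For context, this is the general shape of that kind of fix, hedged: modern PyTorch releases removed the private torch._six compatibility shim, so code importing from it fails with errors like "module 'torch' has no attribute '_six'", and the usual repair is to fall back to the built-in the shim aliased. This mirrors the style of the fix, not the exact line from dataset.py.

```python
# Hedged sketch: torch._six.string_classes was simply an alias for str on
# Python 3, so code can try the old import and substitute the built-in.
try:
    from torch._six import string_classes  # old code path (pre-1.9 torch)
except ImportError:
    string_classes = str  # what the removed shim aliased on Python 3

print(isinstance("road_damage.jpg", string_classes))
```

The actual one-line change Carl made came from a GitHub issue comment, as he says; the pattern above is just the common family of torch._six repairs.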
So here is the change that I made
in context.
So this is a diff between what I
started with and what I ended up
with.
This is a diff of the
requirements.txt file.
So the torch 1.4.0 is what it
started with.
I used the most recent version of torch, which is 2.2.2, or at least a relatively recent one; there might have been a newer one released in the last hour for all I know.
And then here is, on the right, one
of the last screens from Devin's
video and on the left,
there is my video, the final output.
They were both more or less the
same.
My box is yellow.
Their box is red.
I don't know which one might be
better or worse, but it only took
me 36 minutes.
Devin took slightly longer than
that.
So here is the early part of the
Devin video.
There's a timestamp at 3:25 PM on
March the 9th.
Later in the video, you see a
timestamp from 9:41 PM on March the
9th.
So we're looking at six hours and 16 minutes. I have no idea what would have been happening for six hours and 16 minutes.
Hopefully Devin was just waiting on people for a while during that, because it doesn't make any sense that it would take that long. That's just crazy, because, like I said, it took me a little over half an hour.
There's another timestamp, and I'm assuming this is just that they left it overnight and then came back to it or something. But it's from the next day, at 6 PM, and hopefully it wasn't doing stuff over that whole time. So I'm assuming it just took six hours, but it could have taken, you know, a day and two hours. That's just... I don't know why it would have taken that long.
It's not efficient. It's not what I would call competent. A little weird command-line use popped up in one of the screens when you go through it frame by frame. So here's a weird error. Let me zoom in on that: `head -n 5 results.json | tail -n 5`.
So what that says is take the first
five lines of this JSON file and
then take the last five lines
of the first five lines.
There's no reason to do that.
No human would do that.
And it's the kind of thing that AI does that just doesn't make any sense when you come around later and look at it, trying to debug what's going on.
And there's all this extraneous
stuff all over the place and it
makes it really, really
hard to figure out what the point
was.
In fact, the right way to do this
is `head -5 results.json`.
The `-n` is redundant.
You can just say `-5`. That
extra stuff in there is for no good
reason.
And it's the kind of thing that
just makes it way more complicated
when AI generates stuff right now.
Hopefully that will get better.
But at the moment AI generates a
lot of stupid stuff.
It does things in Python the way
you would do it in 'C' when no one
would do it that way in Python
these days.
Even when it gets things to work
right now, the state of the art of
generative AI
is it just does a bad, complicated,
convoluted job that just makes more
work for everybody else
if you're ever going to try to
maintain it or fix a bug in it or
update it to a new version
or anything like that any time in
the future.
Let's look at the list of things
that Devin thought it needed to do.
If you look at the left there, there's this series of checkboxes. I'm going to run through some pages. Exactly what they are isn't really important, but just look how many there are.
This list of checkboxes gives the
impression that Devin did something
complicated or difficult.
And when you're watching the video
and you see all this scroll by, you're
like, you know, wow,
Devin must have done a bunch of
stuff.
All you needed to do, all I had to
do to replicate Devin's results was
get an environment set up
on a cloud instance with the right
hardware and run literally two
commands with the right paths.
All of this stuff makes it look
like Devin did a bunch of work.
It makes it look like Devin
accomplished a lot of stuff.
And really, all you had to do was
run two commands once you set the
environment up.
None of those code fixes are
relevant at all because it's all
code that Devin generated itself.
And at the end, the person that was narrating the video says, "Good job, Devin."
Now, what Devin actually got done
was kind of cool for an AI.
If you had asked me a couple of months ago what an AI would have done given that problem, I would have guessed an output that's worse than what Devin actually did. So honestly, as far as I'm concerned, it is kind of impressive.
But in the context of what an Upwork job should have been, and especially in the context of a bunch of people saying that Devin is "taking jobs off of Upwork and doing them," and especially in the context of the company saying that this video will let us watch Devin get paid for doing work, which is, again, just a lie, I don't know that I would agree with saying "Good job."
So look, if you make AI products,
that's great.
AI is good.
I use it a lot.
I want it to get better.
Please make products.
Just please tell people the truth
about them.
If you're a journalist or a blogger
or an influencer, just please don't
blindly repeat and
amplify things that people say on
the Internet, things that you read
on the Internet without
doing some due diligence, without
looking to see if they're actually
true.
If you don't understand if they're
true, if you can't figure out on
your own if they're
true, ask someone or just don't
amplify it.
Because there are a lot of people
that are never going to look at the
original source.
They're just going to see the
headline and they're going to think
that that's true.
That's unfortunate, but that's just
the way we are.
And if you're just someone who's
using the Internet now, please, for
the love of all that's
holy, be skeptical of everything
you see on the Internet or anything
you see on the news,
especially anything that might
possibly be AI related.
There's so much hype out there and
there's so much stuff that people
are bouncing around and
saying to each other is true.
That's just not true.
So please just don't forget to be
skeptical.
It's important.
Okay, so that's what I have for
this video. Until next time,
Always keep in mind that the Internet is full of bugs, and anyone who says differently is trying to sell you something.
Have a good one, everybody.