Will Devin AI Take Your Job?
Summary
TLDR: The video discusses the new AI tool, Devin, which has generated buzz for its software engineering capabilities. While Devin can learn to use new technologies, fix bugs, and even perform some real-world tasks, the video argues that it's not as revolutionary as it seems. Devin's abilities are showcased through a limited set of well-documented GitHub issues, and it requires specific prompts to learn from resources. The video emphasizes that Devin is a tool to aid developers, not replace them, as it lacks the problem-solving skills and technical knowledge that human developers possess.
Takeaways
- 🤖 Devin is a new AI tool developed by Cognition Labs, designed to mimic the functions of a software engineer, causing some concern among professionals.
- 📈 Despite impressive claims, Devin's capabilities are not as overwhelming as they are portrayed, and it is important to analyze them critically.
- 💡 Devin can learn to use unfamiliar technology by teaching itself using existing documentation, but this is not entirely autonomous learning.
- 🔍 The AI can find and fix bugs, but it is not as autonomous as it seems; it requires specific prompts and does not actively seek out errors.
- 📊 Devin's reported ability to solve 13.86% of GitHub issues is based on a limited sample and may not represent its full capabilities.
- 🛠️ Devin's real-world job capabilities are showcased through carefully selected examples on platforms like Upwork, which may not be representative of its overall potential.
- 📖 The AI's learning process is facilitated by specific instructions and existing scripts, rather than independent discovery and understanding.
- 🐞 Devin's bug-finding process is more about writing and refining tests based on developer prompts, rather than independently identifying and fixing issues.
- 🔢 The AI is not fast; tasks can take hours to complete, and it does not respond as quickly as tools like ChatGPT or Copilot.
- 🔧 Devin is best seen as a tool to assist developers by speeding up workflows and handling tedious tasks, rather than replacing the need for human problem-solving skills.
Q & A
What is Devin and who created it?
-Devin is a new AI tool designed to act and work like a software engineer, created by Cognition Labs.
How much funding has Cognition Labs raised according to the script?
-Cognition Labs has raised $21 million in funding.
What are some of the capabilities of Devin mentioned in the script?
-According to its creators, Devin can learn how to use unfamiliar technology, find and fix bugs autonomously, and accomplish real-world jobs on platforms like Upwork.
What is SWE-bench and what does it measure?
-SWE-bench is a benchmark for testing AI against GitHub issues, specifically measuring how well the AI can resolve issues drawn from 12 popular Python repositories.
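As the transcript explains, a benchmark task counts as solved when the model's code passes the unit tests that accompanied the original pull request. The scoring rule can be sketched roughly as follows; the `TaskResult` type, its fields, and the sample data are hypothetical illustrations, not the benchmark's actual harness or API.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical record of one benchmark task (names are assumptions)."""
    issue_id: str
    tests_passed: int
    tests_total: int


def is_resolved(result: TaskResult) -> bool:
    # A task counts as resolved only if every associated unit test passes.
    return result.tests_total > 0 and result.tests_passed == result.tests_total


def resolve_rate(results: list[TaskResult]) -> float:
    # Fraction of tasks whose tests all pass; 0.0 for an empty list.
    if not results:
        return 0.0
    return sum(is_resolved(r) for r in results) / len(results)


# Made-up sample data, for illustration only.
results = [
    TaskResult("issue-a", tests_passed=3, tests_total=3),
    TaskResult("issue-b", tests_passed=2, tests_total=3),
    TaskResult("issue-c", tests_passed=0, tests_total=2),
    TaskResult("issue-d", tests_passed=1, tests_total=1),
]
rate = resolve_rate(results)
print(f"resolved {rate:.0%} of {len(results)} tasks")
```

Note that passing the tests is a proxy: it suggests the issue was fixed, but it does not prove the code is fully correct.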
According to the script, what percentage of GitHub issues can Devin solve?
-Devin resolves 13.86% of the GitHub issues it was tested on, according to the SWE-bench benchmark.
Why is the claim that Devin can solve 13.86% of GitHub issues considered misleading?
-This claim is misleading because it only considers a very small subset of issues from 12 Python repositories with exceptionally well-documented and structured issues, not the entirety of GitHub.
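To put the percentage in absolute terms, here is a rough back-of-the-envelope calculation using the figures quoted in the video: about 2,300 benchmark issues, a random 25% evaluation subset, and a 13.86% success rate. The rounding is an assumption, so treat the results as approximate.

```python
TOTAL_ISSUES = 2300       # approximate benchmark size quoted in the video
SUBSET_FRACTION = 0.25    # share of the benchmark Devin was evaluated on
REPORTED_RATE = 0.1386    # reported success rate on that subset

# Number of issues actually attempted (rounded; an approximation).
subset_size = round(TOTAL_ISSUES * SUBSET_FRACTION)

# Approximate number of issues resolved within that subset.
resolved = round(subset_size * REPORTED_RATE)

# Resolved issues as a share of the whole benchmark, not of all GitHub.
share_of_benchmark = resolved / TOTAL_ISSUES

print(f"evaluated on ~{subset_size} issues, resolved ~{resolved}")
print(f"that is ~{share_of_benchmark:.1%} of the full benchmark demonstrated")
```

This does not mean Devin would fail on the remaining issues, only that the headline figure rests on a few dozen demonstrated fixes rather than on "all of GitHub".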
What does the script suggest about Devin's ability to learn from resources like blog articles?
-While Devin is said to learn from blog articles and resources, the script suggests that its ability to do so may be limited, and in the example given, it largely relied on existing scripts and instructions rather than generating new knowledge.
How does Devin's performance in writing tests and finding bugs compare to human developers?
-Devin can write tests and identify bugs through that process, but it requires specific instructions to do so. Its ability to autonomously find and fix bugs is not as advanced as it might initially appear.
What is the significance of Devin's performance on Upwork tasks according to the script?
-Devin's ability to accomplish work on Upwork, particularly tasks involving the implementation of existing AI models, is highlighted as impressive. However, these tasks are carefully selected and do not represent the full spectrum of freelance work available.
What is the main argument against the fear of Devin replacing software engineering jobs?
-The script argues that while Devin is a powerful tool, it cannot replace the core problem-solving skills and creative thinking of software engineers, highlighting that AI tools are meant to empower rather than replace human developers.
Outlines
🤖 Introduction to Devin AI and Addressing Concerns
This paragraph introduces Devin, a new AI tool that has been creating a buzz on social media platforms like Twitter and YouTube. The speaker, Kyle, aims to address concerns that Devin might replace software engineers' jobs. He explains that after researching and analyzing the claims made about Devin, he believes the AI is not as intimidating as it's portrayed. Kyle plans to discuss what Devin can and cannot do, debunking some myths around its capabilities.
💡 Analyzing Devin's Marketing and Real Capabilities
In this section, Kyle scrutinizes the marketing strategies of AI companies like Cognition Labs, the creators of Devin. He points out that the company's blog article and other promotional materials highlight the best-case scenarios to attract funding. The speaker expresses skepticism about the claim that Devin can solve 13.86% of GitHub issues, noting that this figure comes from a random 25% subset of a small benchmark and that the subset could have been cherry-picked. He emphasizes the importance of evaluating these claims critically and understanding the limitations of AI in real-world scenarios.
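The concern about subset selection can be illustrated with a small simulation. The sketch below assumes a hypothetical "true" success rate of 12% across roughly 2,300 issues (a made-up number, not a measured one) and compares one random 25% subset with the best of many redrawn subsets. It only shows why reporting a single subset invites doubt; it is not evidence that any such selection took place.

```python
import random


def subset_rate(outcomes, fraction, rng):
    """Success rate on one random subset of the given fraction."""
    k = int(len(outcomes) * fraction)
    sample = rng.sample(outcomes, k)
    return sum(sample) / k


def best_of_n(outcomes, fraction, n_draws, rng):
    """Highest success rate seen across n_draws independent random subsets."""
    return max(subset_rate(outcomes, fraction, rng) for _ in range(n_draws))


rng = random.Random(0)      # fixed seed so the run is repeatable
TOTAL_ISSUES = 2300         # approximate benchmark size quoted in the video
ASSUMED_SOLVED = 276        # hypothetical: 12% of issues solvable overall

# True marks a solved issue, False an unsolved one.
outcomes = [True] * ASSUMED_SOLVED + [False] * (TOTAL_ISSUES - ASSUMED_SOLVED)

full_rate = ASSUMED_SOLVED / TOTAL_ISSUES
single = subset_rate(outcomes, 0.25, rng)
best = best_of_n(outcomes, 0.25, 100, rng)

print(f"full data: {full_rate:.2%}")
print(f"one random 25% subset: {single:.2%}")
print(f"best of 100 random 25% subsets: {best:.2%}")
```

Evaluating on the full dataset removes this source of doubt, which is why the speaker would have preferred it.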
🔍 A Closer Look at Devin's Learning Process and Bug Fixing
Kyle delves deeper into specific capabilities of Devin, such as its ability to learn from external resources and fix bugs in code. He critiques the way these features are presented in promotional videos, arguing that they might be misleading. For instance, while Devin can generate code based on instructions from a blog post, it doesn't demonstrate true self-learning across all situations. Similarly, its bug-fixing abilities are more about writing tests that fail and then adjusting the code to pass those tests, rather than independently identifying and fixing bugs in existing code.
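The workflow described here (write a test when asked, watch it fail, then patch the code until it passes) might look something like the following. The function and the bug are invented for illustration and are not taken from the demo video.

```python
def average_buggy(values):
    """Hypothetical function under test: crashes on an empty list."""
    return sum(values) / len(values)


def average_fixed(values):
    """Same function with a small guard added for the empty case."""
    if not values:  # the added guard that makes the failing test pass
        return 0.0
    return sum(values) / len(values)


def empty_input_test_passes(fn):
    """Return True if fn returns 0.0 for an empty list instead of raising."""
    try:
        return fn([]) == 0.0
    except ZeroDivisionError:
        return False


# The bug only surfaces once someone asks for a test of this edge case.
buggy_ok = empty_input_test_passes(average_buggy)
fixed_ok = empty_input_test_passes(average_fixed)
print(f"buggy passes: {buggy_ok}, fixed passes: {fixed_ok}")
```

The point of the sketch is that the failing test, not an unprompted scan of the repository, is what reveals the bug.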
🚀 Devin's Real-World Application and Speed
The speaker discusses Devin's application in real-world tasks, such as solving problems on Upwork, but notes that the tasks chosen are carefully selected to showcase the AI in the best light. Kyle also highlights that Devin operates at a slower pace compared to other AI tools, taking hours to complete tasks. He emphasizes that while Devin can automate certain coding tasks, it still requires technical knowledge to use effectively. Kyle concludes that Devin is a tool to assist developers rather than replace them, as AI is currently incapable of the complex problem-solving required in software engineering.
Keywords
💡Devin AI
💡GitHub
💡Unit Tests
💡Bug Fixing
💡Self-Learning AI
💡Upwork
💡Software Engineering
💡AI Tool
💡Funding
💡Problem Solving
💡Technical Knowledge
Highlights
Devin is a new AI tool by Cognition Labs that's causing a buzz for its potential to work like a software engineer.
Despite the hype, Devin may not be as revolutionary or threatening to software engineering jobs as some think.
Cognition Labs has raised $21 million in funding, highlighting the significant interest in Devin's capabilities.
One of Devin's touted capabilities is learning to use unfamiliar technology through existing resources.
Devin's bug fixing feature is shown to be less autonomous and more limited than initially suggested.
Devin's real-world job accomplishment on Upwork is highlighted, but with a note on its selective showcasing.
A benchmark claims Devin can solve 13.86% of GitHub issues, but this is based on a specific, small dataset.
The dataset for the benchmark consists of 12 popular Python repositories and roughly 2,300 issues with associated pull requests.
Devin's self-learning capability through blog articles and resources is questioned based on the detailed prompting required.
The process of Devin writing tests and finding bugs is more iterative and requires specific prompts from the developer.
Devin's execution of real-world tasks, such as on Upwork, is not as quick or seamless as might be expected.
Devin is viewed more as a tool to aid developers rather than replace them, emphasizing the importance of human problem-solving skills.
The video concludes that Devin, while impressive, is not a threat to software engineering jobs due to the core skills required in problem-solving and creativity.
Devin's potential is seen in empowering developers, making certain tasks easier, rather than in taking over their roles.
The analysis stresses that understanding and interpreting complex problems remain a uniquely human skill outside Devin's current capabilities.
Transcripts
if you've been on Twitter or YouTube
over the last week you've definitely
heard of Devin the brand new AI tool
that supposedly acts and works just like
a software engineer and a lot of people
are worried that this is going to be the
thing that takes over your job as a
software engineer and there's a lot of
really impressive claims that Devin is
making but how true are they actually
and how impressive is this AI tool I've
gone through I've done the research read
the papers looked at all the different
claims that they're making and I really
think Devin is not nearly as impressive
or scary as people are making it out to
be and in this video I kind of want to
talk about what Devin is what it
actually can accomplish and some of the
things that it really cannot
do welcome back to Web Dev Simplified my
name is Kyle and my job is to simplify
the web for you so you can start
building your dream project sooner and
today we're going to be talking about
Cognition Labs' newest AI which is Devin
and this is pretty much a brand new
company that really hasn't released
anything at all before until releasing
this Devin AI now they put out a blog
article which I'm going to link in the
description of this video and this blog
article goes through quite a few
different things about Devin what it's
capable of what it can all do and really
is showcasing all of the best case
scenarios for Devin because they want
this to look as good as possible and
that's because most of the time these AI
companies what they're trying to
accomplish is actually getting tons and
tons of funding if we actually scroll to
the top of this page you can see that
they've already raised $21 million in
funding pretty much immediately from
announcing this and all of that stuff
going along with this so really the goal
of these types of blog articles and all
this information is to really drum up as
much hype as possible to get as much
funding as possible into these
particular AIs so they want it to look
as good as possible on paper now there's
a few different things I want to talk
about in this video that specifically
are the things people are most scared of
so if we scroll down to this Devin's
capabilities there's a bunch of
different videos that we can go through
that talk about the different things
Devin can do and I want to focus on some
of the main ones and why they're maybe
not as scary as you think the first one
here is that Devin can learn how to use
unfamiliar technology this one is scary
to a lot of people because the AI
essentially can teach itself using
existing blog articles videos
documentation and so on which sounds
really scary but honestly we'll deep
dive into this it's not that bad another
thing that we want to talk about is how
it can actually find and fix bugs for
you autonomously which is very
misleading compared to what they
actually do in the video again I'll dive
deeper into why this is not nearly as
scary as they make it out to be
especially based on the video that they
show you and then finally here if we go
down a little bit further we can see
that Devin is actually able to
accomplish real-world jobs on Upwork which
is again something that's really scary
for people because it's like replacing
essentially jobs that people could do
but again this may not be as scary as
you think it is now if we scroll all the
way down to the bottom here you may see
this chart this is probably something
you've seen if you've heard people talk
about Devin and essentially it's saying
that Devin is able to accomplish 13.86%
of GitHub issues and that's how a lot of
people present it but essentially it's
just using SWE-bench which is
essentially a paper a benchmark for
testing AI against GitHub issues and if
we go to the actual site for this you'll
notice that this is actually much less
of a scary thing than people think they
may think that okay it can solve
essentially what is it 13.8% of all
GitHub issues but really what this does
is it takes just 12 GitHub repositories
if we scroll all the way down here you
can see it's 12 popular Python
repositories and it's only pulling 2300
different pull request issues so the way
that this works is it takes 2300 issues
and the associated pull request that was
generated for that issue and each of
these pull requests has test data that
was written for it for unit tests and in
order to be considered passing for the
AI model all it has to do is write code
that passes the unit tests that were
written to go along with that PR it
doesn't actually mean that the code is
100% correct or that it does things
exactly like it's supposed to it just
has to pass those unit tests which is
generally a good idea to say that the
code is most likely correct now if we go
ahead and we look at an example of one
of the issues that is used inside of
this data set you'll see that this is an
issue for some Python library for
something where new lines were being
added in wrong places and you'll notice
something really important about this is
that the issue is very well documented
you can see here is exactly what I
searched for here is exactly what's
happening you can see the expected
behavior what the observed behavior
should be how to reproduce this all the
different stuff with versioning
configuration files I mean this is an
incredibly well-written issue much
better than 99% of GitHub repositories
out there and this is actually a
recurring theme between pretty much all
these different GitHub issues that are
tested they have very good documentation
in the issue side of things now if we
look at the pull request that was
submitted by an actual user this is not
generated by AI you notice that the
amount of files changed was 10 it's not
a huge amount of data that was changed
and if we go all the way down to the
test you can see that this person wrote
a few different test cases inside of
here so if we look at a few of these
different tests you can see there's just
a couple tests that are being written
and modified so this is essentially what
the data is being tested on is these like
two or three different test cases that
were added or modified so really as long
as the AI model is able to actually
correctly write some code that passes
these tests that's the only thing that's
being checked on but in general that's a
pretty good indicator that they were
able to solve the problem and it's still
impressive that they're able to solve
essentially 13% of these different
problems but another thing to worry
about here is if we scroll down you'll
notice that Devin was evaluated on a
random 25% subset of the data now I'm
not sure why they decided to go with
only 25% of the data instead of doing
100% of the data it makes me a little
bit concerned because since there's no
way for us to actually test with Devin
right now since it's a closed off system
currently it's not open to the public
it's a little bit scary for me to think
maybe they kind of randomly chose 25%
until they got a 25% that gave them this
good number for their announcement to
try to raise money they could have just
continually tested a random 25% until
they landed on a random 25% that gave
them the best possible number
because obviously some issues are going
to be easier than others to solve so
it's a little bit strange they didn't do
it with 100% I don't know if there's
certain resource constraints or if there
was a different data set they used or
what it was but it would be much more
comforting to actually see that they did
this on 100% of the data instead of only
25% of it especially because like I said
there's only 2,300 issues so doing 25%
versus 100% is not that big of a
difference so if you see this type of
chart being thrown around where it's
like they can solve 14% of all issues on
GitHub that's very misleading it's 14%
of issues in a very small subset
across a very few select repositories
that have very good documentation and
very good issue support now the other
things I talked about one thing is that
this AI can learn for itself this is the
video that they mentioned that
specifically that the AI can actually
learn from blog articles and resources
out there so in this particular video
this person is asking Devin they're
pasting in a link to a blog article and
they're saying hey this blog article
says that it can do X Y and Z and it
even mentions in the blog article a
script that you can use to do this
that's what they tell Devin and they say
hey can you set this up and generate
images for me with these specific
criteria so if we go over to that blog
article at the very bottom you'll see
that it has this try-it-yourself section
and it even has a link to a GitHub
repository with that script if we open
up that script you can see right here is
the GitHub repository with all the
information you need to be able to set
this up it even tells you the exact code
you need to use obviously it has the
script files and everything so
essentially all the code to do this is
already written it's just giving you
instructions on how to get set up with
that so Devin's not really writing too
much custom code it's just mostly
following these instructions that are
set up in this blog article and set up
in this GitHub repository and it's able
to generate these things based on the
code that's already been written by
other people and I noticed something
really specific about this prompt they
give it they specifically in the prompt
say here's the blog article and they
mention that there's a script in the
blog article that is supposed to be used
to generate those things so they're
specifically telling this AI hey look
for this script inside this blog article
they maybe ignored everything else in
the blog article went straight to the
script and looked at this actual GitHub
repository with all the information
and code to be able to do what it needs
to do so it says that it can teach
itself based on these different things
and sure there may be some degree of
that to it but the fact that they had to
specifically prompt telling it where the
script was telling it where the blog
article was and having that blog article
pretty much already have all the code
inside of it makes me a little bit leery
saying that it can really learn for
itself in all situations it seems rather
limited in its capabilities in this
regard at least based on this particular
video example now the next one that I
think is kind of scary for a lot of
people is that this AI is able to find
and fix bugs in your specific code and
if you go through and you watch
this video you'll realize that it's
really actually not finding and fixing
these bugs for you so if you watch this
video essentially what happens is this
guy wrote some particular code to do
something inside of his repository and
he wrote that code but he didn't want to
write any test cases for that code so
there are no tests at all for this code and
he comes to Devin and he says hey Devin
I would like you to write a test for
this code it's specifically asking in
the prompt I would like you to write
tests for this particular code and it's
going to write out that test case and
what happens is that he goes back and
forth a couple times with Devin asking
it to write more and more tests based on
more specific things and finally Devin
writes a test that actually fails and
Devin isn't necessarily finding this bug
per se he's telling it to write tests
then Devin is going ahead and it's
writing out these tests and in the
process of writing out the tests that
this developer specifically told Devin
to write out it is then finding that
these tests do not pass now the cool
thing about Devin in this regard that I
will give it credit for is that when
it finds this bug in the code
essentially it says hey this test does
not pass it actually goes through and
finds where that bug is in the code to
make the test pass and is able to solve
the bug which is essentially one line of
code that needs to be added to the
actual thing as you can see right here
on line 36 he adds this one single line
of code and that essentially fixes the
bug so Devin is able to go through it's
writing these tests and it's finding the
bug it's really cool but as you can see
it's not just looking at a code
repository and saying hey I found the
bug for you instead it's kind of a very
step-by-step process of hey write these
tests for me this test failed so
obviously there must be a bug it's a
very cherry-picked example and they're
really kind of blowing it out of
proportion a little bit with the
language they're using it's not
necessarily finding bugs in your code
it's just writing these tests and
through that process happens to stumble
upon the bug now the last one I want to
talk about is honestly the one that is
probably the most impressive and that is
that Devin is able to accomplish work on
Upwork so if we look at this particular
Upwork task obviously they very much
cherry-picked it they chose the one
thing that is obviously going to work
for them there's probably hundreds of
Upwork examples that do not work for
them but this one is very simple for
them because essentially all that this
person is asking is hey all I want you
to do is to take this model that already
exists and I want you to be able to
implement it and use it for me an AI
model specifically so in this video
essentially the person goes through and
they tell Devon hey here's this thing
this model that I want you to implement
and start using and it goes through and
it implements that model and it starts
using the information from it and it
ends up generating some results now one
important thing to note about pretty
much everything that Devin is doing is
that it's not particularly fast this
example for this Upwork thing I think
took about two maybe 3 hours to actually
accomplish and a lot of these other
things are taking an hour two hours to
actually run through and generate this
code so it's not like ChatGPT or
Copilot or something like that where
it's really quickly giving you
responses this is a relatively slow
process and it might be very iterative
where you're working directly with it
trying to help prompt it along which is
another reason why I think that you
shouldn't really be super worried while
it can do these really cool things where
it generates some code based on
different GitHub repositories or
articles which is really cool to see
it's something that still requires
technical knowledge in order to use if I
were to give my wife this tool and tell
her hey you can use this to solve Upwork
problems or something like that she
would maybe be able to solve some really
simple things but as soon as Devin
ran into a snag or didn't really know
what to do she would obviously be
completely underwater not know where to
go because she doesn't have that
technical background so you still need
those problem solving and technical
skills in order to actually use a tool
like this and I keep using the word tool
because really this is a tool this is
something that software engineers and
developers are going to be able to use
to speed up their coding workflow maybe
make certain things easier for them
maybe make some tedious tasks not be
something that you need to manually do
just like things like AI autocomplete
like ChatGPT and Copilot have made
doing certain things in coding a lot
easier they haven't replaced your job
they just modified how you work and made
certain things easier I think Devin is
just another example of a tool that's
going to make actually working in
programming a little bit easier it's
going to clean up certain things for you
make certain learnings a little bit
easier but the actual knowledge of being
a developer where you actually need to
think about how to solve real world
problems and you need to develop custom
solutions to complex problems and just
be a problem solver that is something
that AI is really not capable of
replacing currently and something I
don't think it'll be able to replace in
the future these tools are really cool
and they have a lot of potential but
really their potential is to empower you
as a developer and not to replace you
now don't get me wrong I think these
tools are really impressive and really
cool but if you're worried about Devin
replacing your job you really don't have
to worry about it because you as a
developer knowing how to think like a
programmer are the core skills you have
and being able to write out like code
for certain things is not your core
skill it's your ability to problem solve
and so on that these AI tools really
struggle with and are probably never
going to be able to replicate now with
that said I really hope you enjoyed this
video and have a good day