Devin AI - Are Software Engineers finally doomed?
Summary
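TLDR: This video discusses the release of Devin, an AI software engineer that solves engineering tasks through its own shell, code editor, and web browser. Although Devin resolved about 14% of the issues on a software engineering benchmark, the commentator does not find this earth-shattering, because the benchmark covers only open-source projects rather than large enterprise code bases. The commentator looks forward to AI handling more complex integration and debugging work in enterprise-scale code bases and sees that as an important next step for the technology.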
Takeaways
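- 🚀 Devin is a new AI engineer from Cognition Labs that can carry out tasks the way a software engineer would.
- 🧠 Devin has passed practical engineering interviews and completed some real jobs on Upwork.
- 🛠️ Devin correctly resolved about 14% of the issues on the SWE-bench benchmark unassisted; earlier AI systems resolved about 2% unassisted, or about 5% with assistance.
- 🔒 There are security concerns about Devin operating autonomously and handling sensitive information.
- 🔍 Devin uses its own shell, code editor, and web browser to solve engineering tasks.
- 📈 Despite the progress, Devin still faces challenges with large enterprise code bases and microservices.
- 🤖 When debugging, Devin relies on print statements rather than the breakpoints a human engineer would typically use.
- 📚 An open-source issue is usually confined to a single library or application, whereas real work can span multiple code bases and services.
- 🔄 Devin can train and fine-tune its own AI models, which is promising for future software engineering work.
- 🌐 Companies may be reluctant to open their code bases for an AI to learn from and solve problems in.
- 🔮 Devin's progress is exciting, but there is still a long way to go before it becomes a true AI co-worker.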
Q & A
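What does Devin do?
-Devin is an AI engineer developed by Cognition Labs that can carry out tasks the way a software engineer would.
How did Devin perform on the software engineering benchmark?
-Devin correctly resolved about 14% of the issues on the SWE-bench benchmark unassisted. Before that, AI resolved only about 2% unassisted, or about 5% with assistance.
What about the practical engineering interviews and the real Upwork jobs Devin completed?
-Devin has passed practical engineering interviews and completed some real jobs on Upwork, but those jobs tend to be fairly direct and simple tasks.
In what ways is Devin autonomous?
-Devin solves engineering tasks through its own shell, code editor, and web browser, which means it can carry out programming and web-related work independently.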
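How is security addressed in Devin's operation?
-The video does not describe concrete safeguards. The commentator assumes Devin runs in its own container and argues that its access to secrets and to potentially destructive operations, such as deleting database rows or calling APIs, will need to be restricted.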
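What role do open-source projects play in training Devin?
-Open-source projects are what this kind of AI can be trained and benchmarked on, because they provide a large supply of code samples and problem scenarios, but they may be less complex and less sensitive than large enterprise code bases.
How does Devin handle unexpected errors?
-When it hits an unexpected error, Devin adds a debugging print statement, reruns the code, and uses the error in the logs to fix the bug.
How does Devin's ability to learn show up?
-Devin can train and fine-tune its own AI models, which means it could adapt and optimize its solutions for a specific task or code base.
What is Devin's future potential in software engineering?
-Devin's long-term potential lies in becoming a true AI co-worker that can work in large enterprise code bases and understand and solve complex problems that span multiple services and libraries.
What challenges does Devin currently face?
-Its current challenges include handling more complex enterprise code bases, understanding how multiple services and libraries interact, and operating safely without taking destructive actions.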
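What is the commentator's overall assessment of Devin?
-The commentator considers Devin a huge step in the right direction and is excited about AI becoming a true software engineering co-worker, but thinks Devin is not at that level yet and still has a long way to go.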
Outlines
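🤖 Introduction to the new AI engineer Devin and first impressions
This section introduces Devin, a new AI engineer: an autonomous agent from Cognition Labs that takes instructions and completes tasks the way a software engineer would. Although Devin posted notable results on a software coding benchmark and passed practical engineering interviews, the commentator is not impressed by that. Engineering interviews follow a fixed process and set of rules, which makes them well suited to an AI. Devin has also completed real jobs on Upwork, but those jobs are usually simple tasks. The commentator is amazed that Devin can use its own shell, code editor, and web browser autonomously, but raises security concerns, such as the risks when an AI operates on a database or an API. Even so, the commentator sees this as interesting progress and looks forward to what comes next.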
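The concern about destructive database actions can be made concrete with a small guard. What follows is a minimal sketch in Python, not anything Cognition Labs has described: the keyword list, `BlockedStatement`, `is_destructive`, and `guarded_execute` are all hypothetical names, and a keyword check like this is far weaker than proper database-level permissions.

```python
import sqlite3

# Hypothetical policy: statements starting with these keywords need human sign-off.
DESTRUCTIVE_KEYWORDS = {"DELETE", "DROP", "TRUNCATE", "UPDATE", "ALTER"}


class BlockedStatement(Exception):
    """Raised when a destructive statement is attempted without approval."""


def is_destructive(sql):
    """Return True if the statement's first keyword is on the destructive list."""
    words = sql.strip().split(None, 1)
    return bool(words) and words[0].upper() in DESTRUCTIVE_KEYWORDS


def guarded_execute(conn, sql, params=(), approved=False):
    """Run sql on conn, refusing destructive statements unless approved."""
    if is_destructive(sql) and not approved:
        raise BlockedStatement(f"needs approval: {sql.strip()}")
    return conn.execute(sql, params)


def demo():
    """Insert one row, attempt an unapproved DELETE, and report what happened."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    guarded_execute(conn, "INSERT INTO users (name) VALUES (?)", ("ada",))
    try:
        guarded_execute(conn, "DELETE FROM users")
        blocked = False
    except BlockedStatement:
        blocked = True
    remaining = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    return blocked, remaining
```

In this sketch the agent would be handed `guarded_execute` instead of the raw connection, so a delete only goes through when a person passes `approved=True`. A real deployment would more likely rely on a read-only database role than on string inspection.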
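🔍 Devin in action: an example task and its challenges
In this section, Scott from Cognition AI presents an example of Devin at work. Devin draws up a step-by-step plan for the problem and carries out the task on its own, in this case benchmarking performance across different API providers. The commentator likes the way Devin tackles the problem autonomously but points out security issues, such as how to store and manage secrets safely. Devin runs into an error during the task and resolves it by adding a debugging print statement and reading the logs, which shows that it can debug its own work. Finally, Devin builds and deploys a fully styled website. This shows the potential of AI in software engineering, although the commentator does not find it earth-shattering.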
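The debugging loop described above (add a print statement, rerun, read the log) can be sketched as follows. This is an illustrative Python example, not Devin's actual code: the latency-parsing functions and the sample readings are made up for the demonstration.

```python
import contextlib
import io


def mean_latency_buggy(samples, debug=False):
    """First attempt: assumes every reading is a bare number."""
    total = 0.0
    for raw in samples:
        if debug:
            print(f"DEBUG raw={raw!r}")  # the added debugging print statement
        total += float(raw)  # raises ValueError on readings such as '95ms'
    return total / len(samples)


def mean_latency_fixed(samples):
    """Second attempt: strips the 'ms' suffix that the debug log exposed."""
    total = 0.0
    for raw in samples:
        text = raw.strip()
        if text.endswith("ms"):
            text = text[:-2]
        total += float(text)
    return total / len(samples)


def rerun_with_debug(samples):
    """Rerun the buggy code with debugging on; return (log text, error message)."""
    log = io.StringIO()
    error = None
    with contextlib.redirect_stdout(log):
        try:
            mean_latency_buggy(samples, debug=True)
        except ValueError as exc:
            error = str(exc)
    return log.getvalue(), error
```

Calling `rerun_with_debug(["120", "95ms"])` leaves the offending value in the captured log, which is the only information this style of debugging has to work with. A breakpoint would instead let the engineer inspect every local variable at the moment of failure, which is the contrast the commentator draws.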
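🚀 Hopes for and criticisms of Devin's future development
In this section the commentator sets out hopes for and criticisms of Devin's future. Devin is a positive step, but there is still a long way to go. Looking at other examples and comments, the commentator finds that Devin shows some ability to pick up unfamiliar technologies and to contribute to mature code bases, but those tasks are usually isolated and do not involve the complexity of modern software engineering, such as microservices. The commentator wants to see AI handle large enterprise code bases and integrate with existing systems. Devin is a good tool but not yet a true co-worker, and the commentator hopes AI will eventually assess and handle problems across a whole system.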
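One small piece of the multi-service problem, spotting libraries that different services pin to different versions, can be sketched in Python. The service names, library names, and version strings below are invented for illustration; a real setup would read them from each repository's lock file, and this sketch says nothing about how Devin itself works.

```python
from collections import defaultdict

# Invented example data: service name -> {library name: pinned version}.
EXAMPLE_SERVICES = {
    "billing": {"http-client": "2.3.0", "schema-lib": "4.2.0"},
    "checkout": {"http-client": "2.3.0", "schema-lib": "3.9.1"},
    "search": {"http-client": "2.3.0"},
}


def find_version_conflicts(services):
    """Return {library: {version: [services]}} for libraries pinned to more than one version."""
    by_library = defaultdict(lambda: defaultdict(list))
    for service, pins in services.items():
        for library, version in pins.items():
            by_library[library][version].append(service)
    return {
        library: {version: sorted(names) for version, names in versions.items()}
        for library, versions in by_library.items()
        if len(versions) > 1
    }
```

With the example data, only `schema-lib` is reported, because `billing` and `checkout` disagree on its version. Detecting the mismatch is the easy part; knowing which service to change, and what breaks when it changes, is the system-level reasoning the commentator wants to see from an AI.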
Keywords
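💡AI engineer
💡Software engineering
💡Autonomous agent
💡Security concerns
💡Open-source projects
💡Microservices
💡Long- and short-term planning
💡Debugging
💡API
💡Devin AI
💡Upwork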
Highlights
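Devin is an AI engineer from Cognition Labs that can carry out tasks the way a software engineer would.
Devin set a new state of the art on a software coding benchmark and passed practical engineering interviews.
Devin has even completed real jobs on Upwork, but those jobs are usually direct and simple.
Devin solves engineering tasks through its own shell, code editor, and web browser, which is striking.
Devin correctly resolved about 14% of the issues on SWE-bench; the previous figure was only 2% to 5%.
Open-source projects may be less complex and less sensitive than large enterprise code bases, so Devin may not perform as well on the latter.
Devin's abilities include training and fine-tuning its own AI models, which is very promising for the future.
Devin's demo includes drawing up a step-by-step plan for the problem, which is part of its approach.
Devin can use the browser to look up API documentation, which is very helpful for software engineers.
When it hits an unexpected error, Devin can add a debugging print statement and fix the bug.
Devin ultimately built and deployed a fully styled website.
Devin's progress shows the potential of AI in software engineering, although there is still a long way to go.
Other examples mentioned in the comments show that Devin can learn and use unfamiliar technologies.
Devin debugs inside a mature production repository, showing that it can debug within a specific library.
The real jobs Devin completed on Upwork are relatively simple and not astonishing.
Devin's development and use point to the possibility of AI working alongside software engineers.
Devin's current abilities do not yet reach the level needed to work in large enterprise code bases.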
Transcripts
Wow, this is actually really exciting. Yesterday there was an announcement that there's a new AI engineer called Devin.
No, not you, Devin: an AI engineer called Devin, from Cognition Labs.
Devin seems to be an autonomous agent: you give it instructions as if it were a software engineer, and it completes them.
The video is actually pretty wild, so I'm going to go over the announcement line by line and the video frame by frame and give you my impressions.
I've been a software engineer for 15 years, and I've been following AI advancements pretty closely.
This is pretty crazy to watch, but let's just dive into the post. All right, so
here's the post. Devin is claiming to be "the new state of the art on the SWE-bench coding benchmark" (that's worded weirdly) "and has successfully passed practical engineering interviews".
That doesn't really impress me. If it's in an interview, it's not really a hard problem: if you're expected to finish a task and solve a problem within an hour or two, that's not a hard problem.
Nearly everyone who has gone through an engineering interview knows it's a very robotic process; you just have to learn the rules, the algorithms, and the process to do well in those interviews.
So that's perfect for a bot, and it's not really that interesting.
"And has even completed real jobs on Upwork." That also doesn't surprise me that much. Some of those jobs are pretty straightforward; they just need the work done.
The demo they had down below of what the Upwork job was is actually kind of interesting, but again, it's progress, nothing earth-shattering in my eyes.
"It's an autonomous agent that solves engineering tasks through its own shell, code editor, and web browser." Okay, that's bananas to me
for multiple reasons. For the most part, when you let AI run through these kinds of things in its own environment, I immediately think of so many different security concerns.
If it's doing real work, there are a lot of secrets involved, and there are concerns about write guards: whether or not you want it to delete a row from a database, interact with the database, or even call an API.
You've got to be super careful that the AI isn't doing destructive actions.
It's more of a curiosity, and there are just a bunch of security concerns in that regard, but that's all stuff you can work out and that can get better in the future.
Right now it's one of my main concerns. It's wild that it can use its own shell, editor, and browser, but you've got to think about the side effects, what it's able to do, and what you're able to prevent it from doing.
That's still cool. So this was the
big thing: the SWE-bench benchmark, where they have AI try to solve real-world issues in open-source projects, and Devin correctly solved about 14% of them unassisted.
That's pretty sweet. Before, it was only about 2%, or about 5% if you helped it.
I love this benchmark, but I have one big fault with it, and it's this part: open-source projects.
I understand that's the only way you can really train this, because you're not going to get proprietary code to play with real business problems, but I feel like this is the kiddie pool.
Open-source projects are not going to be as big, as complicated, or as sensitive as real, large corporate code bases, so I could see AI falling on its face when you get to real legacy code.
This is still impressive, but
there's still a long way to go before this is really interesting.
There are a lot of concerns with that, too: you also need the company to sign off on having an AI ingest all of your code so that it can learn from it and then try to solve issues or real tickets.
I don't see a lot of companies being very excited to open up their code for this to happen.
Maybe there are ways, but eventually, I imagine, there's going to be a Devin that's self-contained or that can run on site instead of over the network. Maybe; I don't know.
But that's a lot. One other thing that I have
kind of a huge issue with is that an open-source project is isolated to one problem: it's one library or one application.
But in real-world software engineering, especially nowadays, people are really pushing microservices. I hate microservices, but that's fine.
If you're trying to solve problems across multiple code bases, having the AI be trained on and knowledgeable about how all those services and all those other libraries match up and work together is a different problem.
It can get complicated super quickly, especially when you start thinking about libraries and their versioning, which version this service is using versus which version that one is using, and how each part of the application is supposed to communicate with the others.
Like I said, I think open-source projects are a very kiddie-pool example.
There are big, complicated, mature projects out there, but I'm wondering whether the 14% of issues that were solved are on those complicated ones or are issues found in smaller applications.
Still exciting, but I feel like there's still a long way to go
there. "Hey, I'm Scott from Cognition AI, and today I'm really excited to introduce you to Devin, the first AI software engineer. Let me show you an example of Devin in action.
I'm going to ask Devin to benchmark the performance of Llama on a couple of different API providers. From now on, Devin is in the driver's seat.
First, Devin makes a step-by-step plan of how to tackle the problem."
Okay, I love this part: it's mapping out how it's approaching the problem.
The problem it's been given isn't that exciting: it's interacting with newer technologies and comparing them.
But this could be a real task for more junior engineers, and it's super interesting that it's going to run off and do this on its own, console and
everything. "After that, it builds a whole project using all the same tools that a human software engineer would use. Devin has its own command line, its own code editor,"
This is another thing about security that I think is really interesting and that they're going to have to solve: how to handle appropriate secret storage.
If you want AI to act like an actual engineer, you need some level of trust to give it these kinds of secrets, and trust that they will never get leaked outside of its own container.
I'm assuming it's in its own container. But when you give it real keys, you're also giving it the permission and the ability to do destructive actions on any of these services, for example.
That's something they have to
consider. "and even its own browser. In this case, Devin decides to use the browser to pull up API documentation so that it can read and learn how to plug into each of these APIs."
That's sweet, because that's something that is a big pain in the butt to do over and over again as a software engineer.
If Devin can help you ramp up on all these APIs, and you can collaborate with it, that's a really cool feature.
"Here, Devin runs into an unexpected error. Devin actually decides to add a debugging print statement, reruns the code with the debugging print statement, and then uses the error in the logs to figure out how to fix the bug."
See, this is super
interesting in my eyes, because it's trying to debug, but it's working at the level of the console output.
For this to work, it has to use print statements to understand what the output is, rather than breakpoints and inspecting the local data at that point in time.
This is super fascinating, and I want to see what happens in the near future.
"Finally, Devin decides to build and deploy a website with full styling as the visualization."
This has been done
before. "All of this is possible today because of the advancements that we've made in both reasoning and long-term planning. It's a really hard problem, and we've only..."
Oh, this is them repeating the fact that they solved 14% of issues unassisted, which is a huge increase.
But again, we're talking about open-source projects, open-source issues, and issues that are scoped to just one application, one repo.
When you start integrating these sorts of libraries and applications, that's when things get complicated fast, and that's what I want AI to help me with. I guess we're not there yet.
"...just started, but we're super excited about the progress that we've made so far."
Okay, so this isn't earth-shattering to me, but it is super
exciting. What makes me more excited is the path they're choosing and the path they're growing toward: actually doing software engineering work.
It's a step in the right direction, but there's still a long way to go.
I also looked at the comments and some of the other examples, so let's pull those up too.
Okay, now looking at this, one of the first things is that it can learn how to use unfamiliar technologies.
This is pretty cool, but it seems a lot more in line with DALL-E and image manipulation, and it just feels more mature in that sense.
I feel like there's technology in it that's spinning up static websites pretty easily.
Pretty cool, but I don't know how many companies are going to be asking for this kind of work. Contribute
to mature production repositories. Now, you might think that contradicts my comment earlier, but it doesn't.
When you look at this, it's a mature algebra library that they're trying to debug, which is still pretty interesting, but again it's self-contained in a very specific library.
That's not an average use case for software engineering or web development, where everything is integrated.
It's a step in the right direction, and it's exciting how it can start debugging like that.
I feel like this is more an advancement of it being a tool that software engineers can use to enhance how they work; it'd be good for pair programming with this Devin AI.
Devin can train and fine-tune its own AI models. This is
awesome. This is promising for the future: when you start advancing your own code base, and if you do have one of these agents inside your business, inside your team, you're going to need this kind of skill to pivot and adjust what it has learned so far so that it can be more productive in the future.
So this is a pretty solid foundational move.
And this is what they said earlier about being given real jobs on Upwork: it can do those too.
Yeah, some of these Upwork jobs are pretty straightforward. It's not crazy.
I feel like that's something people can already do today, so it doesn't blow me away that much. So what's
my take? What do I think about this? I think this is a wild, huge step in the right direction toward having AI be someone you can work with.
Maybe I'm just overly critical, but there are a lot of things I want to see it do.
You might think I hate the idea, or that I think I'm so much better than AI. No: I want this technology to advance, because there's so much about my job that I'd rather have a robot helping me through, or even replacing some of the work that I'd like to do. That's fine.
That being said, I want things like this to grow into real co-workers, in a sense: real AI co-workers.
But from what I'm seeing here, I don't think it's there yet.
The biggest steps I want to see are it being able to go into enterprise-level, large code bases that are not open source and that it wasn't trained on, and seeing how well you can train it on your existing code base: how it can ingest, I don't know, 30 applications and how they all interact with one another, and then learn and be productive from there.
That's half the battle of debugging or building new features: knowing where to go, where the problem is.
It's not just a line of code. It could be an entire application, a different service, a library, or a library you don't own, where you have to figure out how to work around the issue if the bug in that library hasn't been fixed.
There are so many different factors, and I feel like Devin and all this AI is focusing on one part of it.
When we start getting better ideas of how it can holistically assess an entire system, that's when things will start getting cool. This
Devin is wild. This is pretty cool, but it's not earth-shattering yet.
I'm going to keep working. I don't think I'm going to lose my job very soon, but you know, famous last words. We'll see.
So that's all about the announcement. If you're really curious, here's a video where I talk more about why I don't think AI is going to replace us as software engineers.
I also want to start the conversation in the comments, so let me know what you think of this advancement and what it's going to look like in the next year or two when we have this kind of technology out here.
Thanks for watching, guys. I'll see you later.