Devin AI - Are Software Engineers finally doomed?

Cody Codes
13 Mar 2024 · 14:50

Summary

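TLDR: This video discusses the launch of Devin, an AI software engineer that solves engineering tasks through its own shell, code editor, and web browser. Devin resolved about 14% of the issues in the SWE-bench benchmark, but the reviewer is not blown away, because the benchmark covers only open-source projects rather than large corporate codebases. The reviewer wants to see AI handle more complex integration and debugging work in enterprise codebases and regards that as the important next step.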

Takeaways

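  • 🚀 Devin is a new AI engineer from Cognition Labs that carries out tasks the way a software engineer would.
  • 🧠 Devin has passed practical engineering interviews and completed some real jobs on Upwork.
  • 🛠️ Devin correctly resolved about 14% of SWE-bench issues unassisted; earlier systems managed about 2% unassisted and about 5% with assistance.
  • 🔒 There are security concerns about Devin acting autonomously and handling sensitive information.
  • 🔍 Devin solves engineering tasks through its own shell, code editor, and web browser.
  • 📈 Despite the progress, large enterprise codebases and microservices remain a challenge.
  • 🤖 Devin debugs with print statements rather than the breakpoints that software engineers commonly use.
  • 📚 Open-source issues are usually confined to a single problem, while real work can span several codebases and services.
  • 🔄 Devin can train and fine-tune its own AI models, which the reviewer finds promising for future software engineering work.
  • 🌐 Companies may be reluctant to open their codebases for an AI to learn from and work on.
  • 🔮 Devin's progress is exciting, but there is a long way to go before it becomes a true AI coworker.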

Q & A

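  • What does Devin do?

    -Devin is an AI engineer developed by Cognition Labs that can carry out tasks the way a software engineer would.

  • How did Devin perform on the software engineering benchmark?

    -Devin correctly resolved about 14% of SWE-bench issues unassisted. Before that, systems resolved about 2% unassisted and about 5% with assistance.

  • What about the practical engineering interviews and the real jobs on Upwork?

    -Devin has passed practical engineering interviews and completed some real jobs on Upwork, but the reviewer notes that these tend to be fairly straightforward tasks.

  • In what ways is Devin autonomous?

    -Devin solves engineering tasks through its own shell, code editor, and web browser, so it can carry out programming and web-related steps on its own.

  • How is security addressed in the way Devin operates?

    -The video does not describe concrete safeguards. The reviewer assumes Devin runs in its own container and argues that secrets must be kept from leaking and that the AI must be prevented from taking destructive actions against databases and APIs.

  • What role do open-source projects play for Devin?

    -SWE-bench is built from real-world open-source projects. The reviewer accepts that open-source code is the only realistic material for training and testing, since proprietary code is not available, but expects these projects to be less large, complicated, and sensitive than corporate codebases.

  • How does Devin handle unexpected errors?

    -When it hits an unexpected error, Devin adds a debugging print statement, reruns the code, and uses the error in the logs to fix the bug.

  • How does Devin's ability to learn show up?

    -Devin can train and fine-tune its own AI models, which the reviewer sees as a way to adapt it to a particular task or codebase over time.

  • What is Devin's future potential in software engineering?

    -The long-term potential is for Devin to become a true AI coworker that can work in large enterprise codebases and understand and solve complex problems spanning multiple services and libraries.

  • What challenges does Devin face today?

    -Handling more complex enterprise codebases, understanding how multiple services and libraries interact, and operating securely without taking destructive actions.

  • What is the reviewer's overall verdict on Devin?

    -The reviewer considers Devin a huge step in the right direction and is excited about AI becoming a real software engineering coworker, but thinks Devin is not at that level yet and still has a long way to go.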

Outlines

00:00

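🤖 Introduction to the new AI engineer Devin and first impressions

This section introduces Devin, a new AI engineer: an autonomous agent from Cognition Labs that takes instructions and completes tasks like a software engineer. Devin claims a new state of the art on the SWE-bench coding benchmark and has passed practical engineering interviews, but the reviewer is not impressed by the interviews, since they follow a fixed process and set of rules that suit a bot well. Devin has also completed real jobs on Upwork, though such jobs are often simple. The reviewer is astonished that Devin works on its own through a shell, code editor, and web browser, and raises security concerns such as the risks of an AI operating on a database or API. The reviewer still sees it as interesting progress and looks forward to what comes next.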

05:03

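🔍 Devin in action: a worked example and its challenges

In this section, Scott from Cognition AI presents a working example of Devin. Devin draws up a step-by-step plan and carries out the task on its own: benchmarking the performance of Llama across several API providers. The reviewer likes the way Devin tackles the problem independently but points to security questions, such as how to store and manage secrets safely. Devin hits an error during the task and resolves it by adding a debugging print statement and reading the logs, which shows that it can debug its own work. Finally, Devin builds and deploys a fully styled website, which shows the potential of AI in software engineering, although the reviewer does not find it astonishing.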
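The secret-storage concern raised here can be pictured with a small sketch. It assumes the common practice of passing a key through an environment variable rather than hard-coding it, and of redacting it before anything is logged. The variable name `PROVIDER_API_KEY` and both helper functions are hypothetical and are not taken from Devin or from the video.

```python
import os


def load_api_key(name: str = "PROVIDER_API_KEY") -> str:
    """Read a secret from the environment instead of hard-coding it in source."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"missing environment variable {name}")
    return value


def redact(secret: str, visible: int = 4) -> str:
    """Return a form that is safe to log: only the last few characters are shown."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


if __name__ == "__main__":
    # Hypothetical value, set here only so the example runs on its own.
    os.environ.setdefault("PROVIDER_API_KEY", "sk-test-12345678")
    key = load_api_key()
    print("using key", redact(key))
```

This only keeps the key out of source code and logs. It does nothing about the larger worry in the video, which is that whoever holds a real key also holds permission to take destructive actions with it.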

10:06

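🚀 Hopes for and criticism of Devin's future

In this section the reviewer sets out hopes for and criticism of where Devin goes next. Devin is a positive step, but there is still a long way to go. Looking at the other examples and the comments, the reviewer finds that Devin shows some ability to pick up unfamiliar technologies and to contribute to mature repositories, but these tasks are usually isolated and do not involve the complexity of modern software engineering, such as microservices. The reviewer wants to see AI handle large enterprise codebases and integrate with existing systems. Devin is a good tool, the reviewer stresses, but not yet a real coworker, and the hope is for AI that can assess and work on a whole system.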

Keywords

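💡AI engineer

An AI engineer is normally a professional who designs, develops, and maintains software in the field of artificial intelligence. In the video the term refers to Devin, an autonomous agent that takes instructions and completes a software engineer's tasks.

💡Software engineering

Software engineering is the discipline of applying engineering principles, methods, tools, and practices to develop, maintain, and retire software. Devin's core job, as an AI engineer, is to do software engineering work.

💡Autonomous agent

An autonomous agent is a system or entity that can carry out tasks independently, without step-by-step direction from outside. In the video, Devin acts as an autonomous agent that completes programming tasks and solves engineering problems on its own.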

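💡Security concerns

Security concerns are risks that threaten the integrity, availability, or confidentiality of a system's data. The video points out that when an AI engineer such as Devin carries out tasks, the security risks it introduces have to be considered, such as improper operations on a database or misuse of an API.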
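One way to picture the kind of guard the reviewer asks for is a check that sits between the agent and the database and refuses destructive statements unless a human has approved them. The sketch below is hypothetical: `is_destructive`, `run_agent_sql`, and the keyword list are illustrative names chosen here, not part of Devin or any real product, and a keyword check is far weaker than real database permissions.

```python
import re
import sqlite3

# Hypothetical deny-list: statement types treated as destructive in this sketch.
DESTRUCTIVE = re.compile(r"^\s*(DELETE|DROP|TRUNCATE|UPDATE|ALTER)\b", re.IGNORECASE)


def is_destructive(sql: str) -> bool:
    """Return True if the statement starts with a destructive keyword."""
    return DESTRUCTIVE.match(sql) is not None


def run_agent_sql(conn: sqlite3.Connection, sql: str, approved: bool = False):
    """Run agent-issued SQL, refusing destructive statements without approval."""
    if is_destructive(sql) and not approved:
        raise PermissionError(f"blocked destructive statement: {sql!r}")
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO users VALUES (1, 'ada')")
    print(run_agent_sql(conn, "SELECT name FROM users"))
    try:
        run_agent_sql(conn, "DELETE FROM users")
    except PermissionError as exc:
        print(exc)
```

In practice the stronger control is to hand the agent credentials that only permit reads, so the database itself enforces the rule. The wrapper above just makes the idea concrete.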

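💡Open-source projects

An open-source project is a software project whose source code is open to the public, so that anyone may view, use, modify, and distribute it. The video discusses Devin's results on open-source projects, which the reviewer notes are the only code realistically available for training and testing such systems.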

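💡Microservices

Microservices are an architectural style that splits an application into a set of small services, each running in its own process and communicating through lightweight mechanisms. The video argues that real software engineering problems often involve integrating and coordinating several microservices.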
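The reviewer's point about library versions differing between services can be made concrete with a small check. This is a minimal sketch under stated assumptions: the service names, libraries, and pinned versions in `SERVICE_PINS` are made up for illustration, and `find_version_conflicts` is a hypothetical helper, not something shown in the video.

```python
from collections import defaultdict

# Hypothetical dependency pins per service; names and versions are made up.
SERVICE_PINS = {
    "billing": {"requests": "2.31.0", "pydantic": "1.10.13"},
    "orders": {"requests": "2.31.0", "pydantic": "2.6.1"},
    "search": {"requests": "2.28.2"},
}


def find_version_conflicts(
    pins: dict[str, dict[str, str]],
) -> dict[str, dict[str, list[str]]]:
    """Map each library pinned at more than one version to {version: [services]}."""
    by_library: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for service, libs in pins.items():
        for library, version in libs.items():
            by_library[library][version].append(service)
    return {
        library: {version: sorted(services) for version, services in versions.items()}
        for library, versions in by_library.items()
        if len(versions) > 1
    }


if __name__ == "__main__":
    for library, versions in find_version_conflicts(SERVICE_PINS).items():
        print(library, versions)
```

Even this toy case needs knowledge of every service at once, which is the reviewer's complaint about benchmarks scoped to a single repository.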

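💡Long- and short-term planning

Long- and short-term planning means keeping both near-term goals and long-term objectives in view while carrying out a task or project. In the video, Cognition attributes Devin's abilities to advances in reasoning and long-term planning, which show up in its step-by-step plan and in how it handles an unexpected error.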

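💡Debugging

Debugging is the step in software development that finds and fixes errors in program code. In the video, Devin fixes a bug by adding a debugging print statement and reading the error in the logs, which demonstrates its debugging ability.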
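The print-statement loop described here can be sketched as follows: add a print, rerun, and read the output next to the error. The function `parse_latency_ms` and its inputs are hypothetical and chosen only to illustrate the technique; the video does not show the code Devin was debugging.

```python
def parse_latency_ms(raw: str) -> float:
    """Parse a latency string such as '120ms' or '0.12s' into milliseconds."""
    # Debugging print of the kind described: show the exact input before parsing.
    print(f"DEBUG parse_latency_ms raw={raw!r}")
    text = raw.strip().lower()
    if text.endswith("ms"):
        return float(text[:-2])
    if text.endswith("s"):
        return float(text[:-1]) * 1000.0
    raise ValueError(f"unrecognized latency format: {raw!r}")


if __name__ == "__main__":
    for sample in ["120ms", " 2s ", "fast"]:
        try:
            print(parse_latency_ms(sample))
        except ValueError as exc:
            # Read together with the DEBUG line above it, the message
            # points straight at the input that caused the failure.
            print(f"ERROR {exc}")
    # A person at a debugger might instead call the built-in breakpoint()
    # inside parse_latency_ms and inspect local variables interactively.
```

The contrast the reviewer draws is that print statements only expose what reaches the console, while a breakpoint lets you inspect local data at the moment of failure.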

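💡API

An API (application programming interface) is a set of predefined functions or rules that lets different software applications interact. The video shows Devin using its own browser to pull up API documentation so that it can learn how to use each API.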
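The demo task, benchmarking a model across several API providers, comes down to timing repeated calls. Below is a minimal sketch of such a harness. It makes no real network requests: `make_stub_provider` is a hypothetical stand-in that sleeps instead of calling a service, and the provider names are invented. A real comparison would swap in each provider's actual client.

```python
import statistics
import time
from typing import Callable


def benchmark(call: Callable[[str], str], prompt: str, runs: int = 5) -> dict[str, float]:
    """Time `runs` calls and return the mean and max latency in seconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        call(prompt)
        latencies.append(time.perf_counter() - start)
    return {"mean_s": statistics.mean(latencies), "max_s": max(latencies)}


def make_stub_provider(delay_s: float) -> Callable[[str], str]:
    """Build a stand-in for a provider client; it sleeps instead of using the network."""
    def call(prompt: str) -> str:
        time.sleep(delay_s)
        return f"echo: {prompt}"
    return call


if __name__ == "__main__":
    providers = {
        "provider_a": make_stub_provider(0.01),
        "provider_b": make_stub_provider(0.03),
    }
    for name, call in providers.items():
        print(name, benchmark(call, "hello", runs=3))
```

Keeping the provider behind a plain callable means the timing loop does not change when a stub is replaced by a real client.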

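💡Devin AI

Devin AI is the name of the AI software engineer discussed in the video. It completes programming tasks and solves engineering problems on its own, and it stands for the application and progress of AI in software engineering.

💡Upwork

Upwork is a global online marketplace that matches freelancers with clients for all kinds of projects and tasks. The video mentions that Devin has completed some real jobs on Upwork, which suggests that AI engineers could be applied to real projects.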

Highlights

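Devin is an AI engineer from Cognition Labs that carries out tasks the way a software engineer would.

Devin claims a new state of the art on the SWE-bench coding benchmark and has passed practical engineering interviews.

Devin has even completed real jobs on Upwork, though such jobs are often straightforward.

Devin solves engineering tasks through its own shell, code editor, and web browser, which the reviewer finds remarkable.

Devin correctly resolved about 14% of SWE-bench issues unassisted, up from about 2% unassisted and about 5% assisted.

Open-source projects may be less complicated and sensitive than large corporate codebases, and the reviewer doubts Devin would do as well on the latter.

Devin can train and fine-tune its own AI models, which looks promising for the future.

The demo shows Devin drawing up a step-by-step plan for the problem as part of its approach.

Devin can use its browser to look up API documentation, a real help to software engineers.

When it hits an unexpected error, Devin adds a debugging print statement and fixes the bug.

Devin ends by building and deploying a fully styled website.

Devin's progress shows the potential of AI in software engineering, though there is still a long way to go.

Other examples in the announcement show Devin learning and using unfamiliar technologies.

Devin debugs inside a mature production repository (an algebra library), which shows debugging ability within one specific library.

The real jobs Devin completed on Upwork are relatively simple and not astonishing.

Devin's development shows that AI and software engineers could work together.

Devin is not yet able to work in large enterprise codebases.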

Transcripts

play00:00

wow this is actually kind of really

play00:02

exciting so yesterday there was an

play00:04

announcement that there's a new AI

play00:06

engineer called Devin no not you Devin

play00:11

from Cognition Labs an AI engineer

play00:14

called Devin so Devin seems to be this

play00:17

autonomous agent that allows you to give

play00:19

them instructions as if they were a

play00:21

software engineer and have him complete them

play00:24

and the video is actually pretty wild

play00:26

and you know what I'm going to go over

play00:27

the announcement line by line and the

play00:30

video frame by frame and give you my

play00:32

impression I've been doing this I've

play00:34

been a software engineer for 15 years

play00:35

and I've been following AI like uh

play00:38

advancements pretty closely this is

play00:40

actually pretty crazy to watch but let's

play00:42

just dive into the post all right so

play00:44

here's a post Devin is is claiming to be

play00:47

the new

play00:48

state-of-the-art uh on the SWE-bench

play00:51

coding benchmark that's worded

play00:53

weirdly uh and has successfully passed

play00:56

practical engineering interviews this

play00:58

doesn't really impress me

play01:00

the Practical engineering interviews if

play01:02

it's in an interview then it's not

play01:04

really a hard problem if you're expected

play01:06

to finish a task and solve a problem

play01:08

within an hour or two that's not really

play01:11

a hard problem and nearly everyone knows

play01:14

that has gone through an engineering

play01:15

interview knows that um it's kind of a

play01:17

very robotic process and you just got to

play01:19

learn the rules and algorithms and the

play01:21

process to do well on those interviews

play01:24

so that's perfect for a bot and that's

play01:26

not really that

play01:28

interesting uh and has even completed

play01:31

real jobs on Upwork that also doesn't

play01:33

really surprise me that much uh some of

play01:36

those jobs are pretty straightforward

play01:37

they just need the work done um the demo

play01:41

they had was actually kind of

play01:42

interesting down below of what the the

play01:44

job was on Upwork but again um it's

play01:48

progress but nothing like Earth

play01:51

shattering in my eyes it's an autonomous

play01:54

agent that solves engineering tasks

play01:56

through its own shell code editor and

play01:59

web browser okay that's bananas to me uh

play02:02

for multiple reasons for the most part

play02:05

like letting AI just run through these

play02:08

kind of things in their own environment

play02:10

um I immediately just think of like so

play02:13

many different like cons security

play02:14

concerns like if they're doing real work

play02:16

there's a lot of like Secrets involved

play02:19

there's a lot of concerns about uh right

play02:22

right right guards of like whether or

play02:24

not you want to delete rows from a

play02:25

database interact with the database or

play02:27

even an API you got to be super careful

play02:30

that the AI is not doing destructive

play02:33

actions that it's more of a curiosity

play02:36

and there's just a bunch of security

play02:37

concerns in that regards but that's all

play02:40

stuff that you can work out that can get

play02:42

better in the future like right now

play02:44

that's like one of my main concerns it's

play02:45

wild that you can do its own shell

play02:47

editor and browser but uh you got to

play02:49

think about the side effects and what

play02:51

they're able to do and what they're able

play02:53

to prevent it from

play02:55

doing that's still cool so this was the

play02:58

big thing is that the

play03:00

SWE-bench benchmark which is uh where

play03:04

they have AI try to solve real world

play03:06

open-source projects and Devin correctly

play03:09

solved about 14% of it unassisted that's

play03:13

pretty sweet and and before it was only

play03:16

like 2% or like if you helped it it was

play03:19

like 5% um I love this Benchmark but

play03:24

there's one big fault I have with it is

play03:27

this part open source project

play03:30

um I I understand that that's the only

play03:32

way that you can really train this

play03:34

because you're not going to get

play03:35

proprietary code to really like play

play03:37

with uh real business problems but uh I

play03:41

I feel like this is the kiddie pool this

play03:43

I feel like open-source projects are

play03:46

not going to be as big as complicated

play03:48

and as sensitive as like real corporate

play03:53

uh uh large large code

play03:56

bases uh so I could see AI falling on

play04:00

its face when you get to like real

play04:02

Legacy

play04:04

code this is still impressive but

play04:06

there's still like a long ways for this

play04:08

to be really interesting and there's a

play04:10

lot of um concerns with that is that you

play04:12

also need the company to like sign off

play04:15

on having an AI like con uh ingest all

play04:20

of your code so that it can learn off of

play04:22

that and then be able to try to solve

play04:24

issues or real tickets um I don't see a

play04:27

lot of companies being very excited to

play04:29

just being able to like open up their

play04:31

code for this to happen but maybe maybe

play04:34

there's there are ways where you can

play04:36

like but eventually I imagine there's

play04:38

going to be a Devin that's able to be

play04:39

self-contained or be able to be run on

play04:42

site versus like through the network um

play04:46

maybe I don't know but that's a lot

play04:47

that's a lot one other thing that I have

play04:50

kind of a huge issue with is open source

play04:52

projects is that's um isolated on one

play04:56

problem so it's isolated into one

play04:58

library or one application but uh real

play05:02

uh real world software engineering

play05:04

especially nowadays people are like

play05:06

really pushing microservices I hate

play05:08

microservices but that's fine but like

play05:11

if you're trying to solve problems in

play05:13

multiple code bases and having the AI be

play05:16

trained and knowledgeable about how all

play05:18

those uh services and how these all

play05:21

these others libraries like match up and

play05:22

work together that's a different issue

play05:25

that's a different uh uh problem it can

play05:27

get complicated super quickly especially

play05:30

when you start thinking about libraries

play05:32

and their versioning and uh what version

play05:35

this service is using versus what that's

play05:37

using and how each of the parts of the

play05:40

application are supposed to communicate

play05:42

with one another um like I said I think

play05:45

the open source projects is a very uh

play05:47

kiddie pool example they they do have

play05:50

complicated big mature projects out

play05:53

there but I'm wondering if the 14% of

play05:56

the issues that are solved are on those

play05:59

complicated ones versus issues found on

play06:01

smaller

play06:03

applications still exciting but I feel

play06:05

like there's still a long ways to go

play06:14

there hey I'm Scott from Cognition AI

play06:17

and today I'm really excited to

play06:18

introduce you to Devin the first AI

play06:20

software engineer let me show you an

play06:22

example of Devin in

play06:25

action I'm going to ask Devin to

play06:27

benchmark the performance of Llama on a

play06:28

couple different API providers

play06:31

from now on Devin is in the driver's

play06:33

seat first Devin makes a step-by-step

play06:35

plan of how to tackle the problem okay I

play06:37

love this part is that it's mapping out

play06:40

how it's approaching all the problems uh

play06:42

this problem that it's giving isn't that

play06:45

exciting it's uh interacting with uh

play06:48

newer Technologies and comparing them

play06:50

but uh this could be a real task for

play06:52

like more Junior Engineers but um it's

play06:57

super interesting that it's going to run

play06:58

off and do this on its own uh console

play07:01

and

play07:04

everything after that it builds a whole

play07:06

project using all the same tools that a

play07:07

human software engineer would use Devin

play07:10

has its own command

play07:12

line its own code editor this is another

play07:15

thing about security that I think is

play07:17

really interesting that they're going to

play07:18

have to try to solve is how to handle

play07:21

appropriate uh secret storage of like if

play07:25

you want AI to act like an actual

play07:27

engineer then you need to have some

play07:28

level of trust of giving it these kind

play07:30

of Secrets and trust that it will never

play07:32

get leaked outside of its own container

play07:35

I'm assuming it's it's uh in its own

play07:37

container but uh when you give them real

play07:41

Keys you're also giving it the

play07:43

permission and the ability to do

play07:45

destructive actions on any of these

play07:47

services for example that's something

play07:49

they have to

play07:52

consider and even its own

play07:55

browser in this case Devin decides to

play07:57

use the browser to pull up API

play07:58

documentation so that it can read and

play08:00

learn how to plug into each of these

play08:02

APIs that's sweet that's sweet because

play08:05

that's something that is a big pain in

play08:07

the butt to do over and over again as a

play08:09

software engineer if Devin can help you

play08:12

ramp up on all these APIs and be able to

play08:15

uh collaborate with it that's a really

play08:17

cool

play08:20

feature here Devin runs into an

play08:22

unexpected

play08:27

error Devin actually decides to add a

play08:29

debugging print

play08:32

statement reruns the code with the

play08:33

debugging print statement and then uses

play08:36

the error in the logs to figure out how

play08:38

to fix the bug see this is super

play08:40

interesting in my eyes because it's

play08:42

trying to debug but it's it's going at

play08:45

the level of the console out so

play08:55

that so in order for this to work it has

play08:58

to use print statements to understand

play08:59

what the output is versus like um break

play09:04

points and dealing with the the the

play09:06

local data at the

play09:07

time this is super fascinating but I

play09:11

want to see what happens in the near

play09:13

future finally Devin decides to build

play09:15

and deploy a website with full styling

play09:17

as the

play09:19

visualization this has been done

play09:23

before all of this is possible today

play09:25

because of the advancements that we've

play09:26

made in both reasoning and long-term

play09:28

planning it's really hard problem and

play09:30

we've only oh this is them this is re

play09:33

them regurgitating the fact that they uh

play09:35

solved 14% of issues unassisted uh which

play09:38

is a huge increase but again we're

play09:41

talking about open source projects open

play09:43

source issues and issues that are only

play09:45

uh scoped to just

play09:47

one uh one application one repo um when

play09:52

you start integrating these sort of

play09:53

libraries and these applications that's

play09:56

when things get complicated fast and

play09:57

that's what I want AI to help me with

play09:59

with but I guess we're not there

play10:02

yet just started but we're super excited

play10:05

about the progress that we've made so

play10:07

far okay so this isn't like Earth

play10:09

shattering to me but it is super

play10:11

exciting what makes me more excited is

play10:13

like the path they're choosing and the

play10:15

path that they're they're growing to

play10:17

like actually doing software engineering

play10:18

work um it's it's a step in the right

play10:21

direction but there's still like a long

play10:22

ways to go uh I also looked at the

play10:26

comments and some of the other examples

play10:28

let's pull those up too okay now

play10:31

looking at this this is thing and one of

play10:33

the first things is uh Devin can learn how

play10:36

to use unfamiliar Technologies and uh

play10:39

this this is pretty cool but it seems a

play10:41

lot more in line with like uh DALL-E and

play10:44

image manipulation and it just feels

play10:46

more mature in that sense and I feel

play10:48

like there's technology on it that's

play10:49

like spinning up the the static websites

play10:52

pretty easily pretty cool but I don't

play10:55

know how many companies are going to be

play10:56

asking for this kind of work contribute

play10:58

to mature production repositories now you

play11:00

might think that that's me that's

play11:01

criticizing my comment earlier but it's

play11:04

not if if when you look at this it's a

play11:06

mature like uh algebra library that

play11:10

they're trying to debug which is still

play11:11

pretty interesting but again

play11:13

self-contained in a very specific

play11:15

Library that's not like an average use

play11:18

case for software engineering or web

play11:20

development and how integrated

play11:22

everything works um it's a step in the

play11:25

right direction and it's exciting how it

play11:26

can start debugging like that and I feel

play11:28

like this is more a advancement of it

play11:30

being a tool of how software

play11:32

Engineers can enhance how they work be

play11:35

good for like pair programming if you're

play11:37

with uh this Devin AI Devin can train

play11:39

and fine-tune its own AI models this is

play11:42

awesome this is this is promising for like

play11:44

the future when you start advancing your

play11:46

own code base and if you do have one of

play11:48

these agents inside of your business

play11:50

inside of your team then you're going to

play11:52

need this kind of skill to be able to

play11:54

like uh pivot and manipulate uh what it

play11:57

has learned so far so that it can be

play11:59

more productive in the future so this is

play12:00

like a this is a pretty solid

play12:02

foundational

play12:04

move and this is what they said earlier

play12:06

about uh real jobs on Upwork and it

play12:08

can do those too yeah yeah some some of

play12:11

these Upwork jobs are pretty

play12:12

straightforward um it's not

play12:15

crazy um I feel like that's something

play12:17

that people can already do today so that

play12:19

doesn't blow me away that much so what's

play12:21

my take what do I think about this I

play12:23

think this is a wild huge step in the

play12:25

right direction of having AI be someone

play12:28

that you can like work with but um I

play12:32

don't know maybe I'm just overly

play12:33

critical but there's a lot of things

play12:34

that I want to see it do and this is

play12:36

something that like people think that I

play12:38

uh you might think that I hate the the

play12:41

idea that I'm so much better than AI no

play12:43

I want this technology to advance

play12:46

because there's so much about my job and

play12:47

the things that I do that I'd rather

play12:49

have a robot helping me going through it

play12:52

or even replacing some of the work that

play12:54

I'd like to do that's fine so that being

play12:57

said I want this to I want things like

play13:00

this to grow into like being real

play13:03

co-workers in a sense real AI co-workers

play13:05

but what I'm seeing here is something

play13:07

that I I don't think it's it's there yet

play13:10

the biggest steps that I'm that I want

play13:12

to see are it being able to go into

play13:15

Enterprise level like large large code

play13:18

bases that are not open-source that

play13:20

they're not trained on and seeing how

play13:22

well that you can train with your

play13:24

existing code base and how it can ingest

play13:26

I don't know like 30 applications and

play13:28

how they all interact with one another

play13:30

and then learn and be productive from

play13:32

then because that's half the battle of

play13:35

debugging or trying to advance new

play13:37

feature work is where it needs to go

play13:40

where the problem is and it's not just

play13:42

like a line of code it could be an

play13:44

entire application or it can be a

play13:46

different service or it can be a library

play13:48

or it could be a library that you don't

play13:50

own that you have to try to figure out

play13:52

uh how to work around the issue if the

play13:56

bug in that Library hasn't been solved

play13:58

in itself so there's so many different

play14:00

factors and I feel like Devin and all

play14:02

this AI is focusing on one part of it

play14:04

when we start getting into better ideas

play14:06

of like how it can holistically uh

play14:09

assess an entire system that's when

play14:11

things will start getting cool this

play14:13

Devin is wild this is pretty cool uh but

play14:16

it's not earth shattering yet uh I'm

play14:18

gonna keep working I don't think I'm

play14:20

going to lose my job very very soon but

play14:22

you know famous last words we'll see so

play14:24

that's all about the announcement but

play14:26

also if you're really curious here's a

play14:27

video that I talk about more about how I

play14:29

don't think AI is going to replace us as

play14:31

software Engineers but also I want to

play14:33

start the conversation in the comments

play14:34

so let me know what you guys think of

play14:36

this advancement and what it's going to

play14:38

look like in the next year or two when

play14:40

we have this kind of Technology out here

play14:42

but thanks for watching guys I'll see

play14:43

you

play14:48

later

Related Tags
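AI engineer, Devin capabilities, software engineering, automated programming, security concerns, open-source projects, technical progress, programming challenges, future outlook, industry trends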