World's First AGI Agent SHOCKS the Entire Industry! (FULLY Autonomous AI Software Engineer Devin)

AI Unleashed - The Coming Artificial Intelligence Revolution and Race to AGI
12 Mar 2024 · 24:00

Summary

TLDR: Cognition Labs introduces Devin, billed as the world's first AI software engineer, which can autonomously tackle complex engineering tasks. In the demos, Devin benchmarks API performance, debugs code, builds and deploys websites, and even fine-tunes AI models. Its proficiency with developer tools and its ability to learn from documentation point to a future in which AI assistants like Devin automate and enhance many aspects of software engineering.

Takeaways

  • 🚀 Introduction of Devin, the first AI software engineer, capable of performing complex tasks like a human engineer.
  • 🛠️ Devin can create a step-by-step plan, build projects, and use tools such as a command line, code editor, and browser.
  • 📚 Devin autonomously learns by reading API documentation and other technical materials to solve problems.
  • 💡 Devin can debug code by adding print statements and fixing bugs based on error logs.
  • 🌐 Devin can build and deploy fully styled websites, showcasing its capabilities in web development.
  • 📈 Devin has passed practical engineering interviews and completed real jobs, demonstrating its real-world applicability.
  • 🤖 The development of Devin reflects significant advances in AI reasoning, long-term planning, and autonomous task execution.
  • 🎥 A video from 6 months prior discussed the concept of autonomous AI agents running software businesses, which is now becoming a reality with Devin.
  • 🔧 Devin is equipped with common developer tools within a sandboxed computer environment, allowing it to perform tasks securely.
  • 🏆 Devin outperforms other AI models in benchmarks for resolving real-world GitHub issues, indicating stronger problem-solving skills.
  • 🌟 Autonomous AI agents like Devin could eventually run businesses, handling tasks and customer service without human intervention.

Q & A

  • What is Devin and what makes it unique?

    -Devin is presented as the world's first fully autonomous AI software engineer, developed by Cognition Labs. It is unique because it can perform complex engineering tasks, learn over time, and fix its own mistakes. Devin is equipped with common developer tools and operates within a sandboxed computer environment, which makes it capable of end-to-end development and deployment of applications.

  • How does Devin tackle a problem?

    -Devin first creates a step-by-step plan. It then builds a project using the same tools a human software engineer would use. If it encounters an error, Devin adds debugging statements, reruns the code, and uses the error logs to fix the bug.

  • What are some real-world applications of Devin?

    -Devin has been used to complete real jobs on Upwork, fine-tune a 7B Llama model, set up a computer vision model, and fix bugs in existing software. It has also been used to implement the Game of Life, improve the user experience of an open-source tool, and autonomously learn from a blog post to generate a desktop background image.

  • How does Devin's performance compare to other AI models in solving real-world GitHub issues?

    -In a benchmark for resolving real-world GitHub issues, Devin achieved a 13.86% success rate, which is significantly higher than other models like GPT-4, making it around 7 times more effective than GPT-4 in this context.

  • What kind of support does Devin provide to human engineers?

    -Devin can assist human engineers by taking on tasks such as running commands, tracking their status, fixing bugs, writing test cases, and improving the user experience of tools. This lets engineers focus on more interesting problems and pursue more ambitious goals.

  • How does Devin's learning process work?

    -Devin learns by reading documentation, running code, and understanding the context of a task. It can recall relevant context at every step and adapt its approach based on the information it gathers, which lets it learn from experience and improve over time.

  • What is the potential impact of Devin on the software engineering field?

    -Devin could change the software engineering field by automating complex tasks, reducing the time taken to solve problems, and freeing engineers to work on more innovative projects. It could also lead to new job roles focused on managing and optimizing AI software engineers like Devin.

  • How does Devin handle versioning issues?

    -When faced with versioning issues, Devin updates the code to make it compatible with the required versions. It also uses tools like pip to manage dependencies so that the project runs.

  • What is the significance of Devin's ability to use a browser?

    -The browser lets Devin access API documentation, learn how to integrate with various APIs, and gather information from the internet to help with problem-solving and project development.

  • How does Devin's deployment of a website showcase its capabilities?

    -Deploying a fully styled website shows that Devin can not only write code but also produce a visually appealing, functional end product. It can understand design requirements, implement them in code, and deploy the result, much as a human developer would.

  • What is the future potential of autonomous AI agents like Devin?

    -Autonomous AI agents like Devin could automate many aspects of business operations, from customer service to product development. As they become more advanced, they could potentially run entire businesses, allowing humans to focus on higher-level tasks and innovation.

Outlines

00:00

🤖 Introduction of Devin, the AI Software Engineer

The video introduces Devin, the first AI software engineer, developed by Cognition Labs. Scott, from Cognition AI, showcases Devin's capabilities by asking it to benchmark the performance of Llama across a couple of API providers. Devin creates a step-by-step plan and uses a command line, code editor, and browser to carry it out. When it encounters an error, it adds a debugging print statement and fixes the bug using the logs. The video emphasizes advances in AI reasoning and long-term planning, and invites viewers to request access to Devin for real-world tasks.

05:01

🛠️ Devin's Problem-Solving and Debugging Skills

This section covers Devin handling a complex task: setting up a computer vision model for a client on Upwork. After running into versioning issues, Devin updates the code to resolve them, loads and imports the packages, and downloads images to run through the model. Devin also uses print-line debugging to understand the data flow and corrects the code accordingly. The section highlights Devin's persistence, and ends with Devin delivering a report with sample images and a text file explaining its work.

10:03

🎮 Devin Assists in Game Development and AI Training

Devin's versatility is showcased as it implements the Game of Life and fine-tunes a 7B Llama model. For the game, Devin creates a React application, writes the code, deploys it, and makes adjustments based on user feedback. In the AI training task, Devin sets up fine-tuning for a large language model, works through CUDA issues, and gets the training job running. The section emphasizes Devin's ability to learn and adapt to new tasks, such as understanding a fine-tuning method from its repository and applying it to a language model.
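The Game of Life task mentioned above comes down to a single update rule. As a minimal sketch in Python (the demo itself used a React app whose code is not shown in the video; the function name `life_step` and the set-of-cells representation are assumptions made here for illustration), one generation can be computed like this:

```python
from collections import Counter


def life_step(live):
    """Advance one generation; `live` is a set of (row, col) live cells."""
    # For every cell next to a live cell, count its live neighbours.
    neighbour_counts = Counter(
        (r + dr, c + dc)
        for r, c in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # A cell is alive next if it has exactly 3 live neighbours,
    # or has exactly 2 and is already alive.
    return {
        cell
        for cell, n in neighbour_counts.items()
        if n == 3 or (n == 2 and cell in live)
    }


# A "blinker": three cells in a row that flip between horizontal and vertical.
blinker = {(1, 0), (1, 1), (1, 2)}
after_one = life_step(blinker)
after_two = life_step(after_one)
print(sorted(after_one))
print(after_two == blinker)
```

Storing only the live cells keeps the board unbounded, which is one possible design choice; a fixed-size grid, as a browser app would likely use, works with the same rule.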

15:06

🔧 Devin Fixes Bugs and Improves User Experience

This section describes Devin fixing bugs and improving user experience across several projects. It improves the UX of an open-source tool by reading the unfamiliar code and making the necessary changes. Devin also debugs a Python algebra system, identifying and fixing an issue with log calculations. Another engineer describes how Devin wrote and expanded test cases for his open-source repository, which led to finding and fixing a bug. The section highlights Devin's ability to understand and modify code, debug issues, and improve software quality.
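The brute-force testing idea mentioned above can be sketched as follows. The video does not name the algorithm that was tested, so this example assumes, purely for illustration, a hypothetical `crt` function (a Chinese Remainder Theorem solver) that must also handle inputs that are not relatively prime, and checks it against a slow but obviously correct reference using four nested loops.

```python
from math import gcd


def crt(a, m, b, n):
    """Smallest x >= 0 with x % m == a and x % n == b, or None if impossible.

    Hypothetical function under test; expects 0 <= a < m and 0 <= b < n.
    """
    g = gcd(m, n)
    if (b - a) % g != 0:
        return None  # the two congruences contradict each other
    # Every candidate has the form a + m*t; t only matters modulo n // g.
    for t in range(n // g):
        x = a + m * t
        if x % n == b:
            return x
    return None


def crt_naive(a, m, b, n):
    """Reference answer: try every x below m * n."""
    for x in range(m * n):
        if x % m == a and x % n == b:
            return x
    return None


def brute_force_check(limit):
    """Compare crt with crt_naive on all small inputs; return the case count."""
    checked = 0
    for m in range(1, limit + 1):
        for n in range(1, limit + 1):
            for a in range(m):
                for b in range(n):
                    # The tuple in the message identifies the failing input.
                    assert crt(a, m, b, n) == crt_naive(a, m, b, n), (a, m, b, n)
                    checked += 1
    return checked


print(brute_force_check(8))
```

Exhaustive checks like this are only practical for small inputs, but they cover the edge cases (here, moduli that share a factor) that a single hand-picked test case can miss.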

20:08

🚀 Devin's Autonomous Learning and Business Potential

The video ends with a scenario in which Devin autonomously learns from a blog post to generate a desktop background image. It also discusses Devin's potential in a business context, where an AI software engineer could handle every aspect of a custom automation business, from customer interaction and solution brainstorming to software development, part ordering, and customer service. The scenario illustrates how AI could change business operations by requiring minimal human intervention while improving efficiency and profit.

Keywords

💡Devin

Devin is the first AI software engineer introduced by Cognition Labs. It is designed to autonomously perform complex software engineering tasks, such as benchmarking, debugging, and deploying websites. In the video, Devin is shown tackling real-world programming challenges, learning from documentation, and even completing jobs on platforms like Upwork. It works through its own shell, code editor, and web browser.

💡AI Software Engineer

An AI software engineer is an artificial intelligence system, like Devin, that can perform the tasks typically associated with a human software engineer. This includes coding, debugging, project management, and learning new technologies or programming languages. The concept is significant because it suggests that AI can now operate with a level of autonomy and complexity previously reserved for human professionals.

💡Benchmark

Benchmarking is the process of evaluating the performance of a system or component by comparing it to a standard or to similar systems. In the video, Devin benchmarks the performance of Llama across a couple of API providers, which involves calling each provider and measuring how it performs. Benchmarking helps confirm that software meets its performance requirements and identifies areas for improvement.
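To make the idea concrete, here is a minimal sketch of a timing harness. The video does not show which providers or metrics were used, so the two providers below are stand-in functions that simply sleep; `benchmark`, `fake_provider_a`, and `fake_provider_b` are hypothetical names, and a real benchmark would replace the stand-ins with actual API calls.

```python
import statistics
import time


def benchmark(call, runs=5):
    """Call `call()` `runs` times and return latency statistics in seconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        latencies.append(time.perf_counter() - start)
    return {
        "min": min(latencies),
        "mean": statistics.mean(latencies),
        "max": max(latencies),
    }


# Stand-ins for real API clients (assumption: each call blocks until done).
def fake_provider_a():
    time.sleep(0.01)


def fake_provider_b():
    time.sleep(0.02)


providers = {"provider_a": fake_provider_a, "provider_b": fake_provider_b}
results = {name: benchmark(call) for name, call in providers.items()}

for name, stats in results.items():
    print(f"{name}: mean {stats['mean'] * 1000:.1f} ms")
```

Reporting the minimum, mean, and maximum rather than a single run helps separate a provider's typical latency from one-off spikes.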

💡API Providers

API (Application Programming Interface) providers are entities that supply the protocols and tools for building software applications, allowing different pieces of software to communicate with each other. In the video, Devin reads the documentation of several API providers and integrates with their services, a common task for software engineers building applications that rely on external data or services.

💡Debugging

Debugging is the process of identifying and fixing errors, or bugs, in a computer program. It is a critical part of software development and maintenance. In the video, Devin debugs by adding print statements to track data flow and locate the problem in the code, then removes them once the fix is in place.
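As a small illustration of print-statement debugging, the sketch below reproduces the kind of bug described in the video, where integer division produces a zero that is then passed to a logarithm. The function names are hypothetical, and the example uses Python's standard `math` module, where `math.log(0)` raises `ValueError`; in the video's algebra system the same mistake surfaced as `zoo` (complex infinity) instead.

```python
import math


def log_ratio_buggy(p, q):
    ratio = p // q  # bug: floor division, so 1 // 2 == 0
    # Temporary debugging print to see what actually reaches math.log.
    print(f"debug: p={p} q={q} ratio={ratio}")
    return math.log(ratio)


def log_ratio_fixed(p, q):
    # Fix: true division keeps the fractional value; debug print removed.
    return math.log(p / q)


try:
    log_ratio_buggy(1, 2)
except ValueError as err:
    print(f"buggy version failed: {err}")

print(log_ratio_fixed(1, 2))
```

The debug line shows `ratio=0` just before the failure, which points directly at the division rather than at the logarithm.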

💡Upwork

Upwork is a global freelancing platform where businesses and individuals can find independent contractors for various projects, including software development. In the video, Devin's capabilities are highlighted by its completion of real jobs on Upwork, demonstrating its practical application in the freelance market.

💡Autonomous AI Agents

Autonomous AI agents are artificial intelligence systems that can operate independently, making decisions and executing tasks without continuous human intervention. The video discusses the concept in the context of Devin's capabilities, suggesting a future where AI can run businesses, fulfill customer orders, and write scripts autonomously.

💡Long-term Planning

Long-term planning is the ability to strategize and make decisions that take future outcomes and goals into account. For AI software engineers like Devin, this means planning and executing complex engineering tasks that require a series of decisions and actions over an extended period. It is a critical skill for completing projects successfully and efficiently.

💡Code Editor

A code editor is a software application used for writing and modifying computer code. It typically offers features like syntax highlighting, auto-indentation, and code suggestions. In the video, Devin uses its own code editor to write and debug code, just as a human software engineer would.

💡Sandboxed Environment

A sandboxed environment is a secure, isolated space within a computer system where programs can run without affecting the host system. It is used to test and develop software safely, since any changes or potential security issues are contained within the sandbox. In the video, Devin operates within a sandboxed computer environment equipped with common developer tools, which lets it perform software engineering tasks without risking the stability or security of the host system.

💡GitHub Issues

GitHub Issues is a feature of the GitHub platform that lets users track and manage tasks, enhancements, and bugs for their projects. It is a collaborative tool that facilitates communication between developers and helps organize and prioritize work. In the video, Devin's ability to resolve real-world GitHub issues demonstrates its practical use in software development and its capacity to work with widely used development tools and platforms.

Highlights

Cognition Labs introduces Devin, the world's first AI software engineer, capable of performing complex engineering tasks.

Devin can benchmark performance, create step-by-step plans, and build projects using the same tools as a human software engineer.

Devin has its own command line, code editor, and browser to execute tasks and learn from API documentation.

The AI software engineer can debug code by adding print statements and fixing errors based on logs.

Devin can build and deploy fully styled websites, showcasing its capabilities in web development.

Advancements in reasoning and long-term planning have made it possible for AI like Devin to tackle complex problems.

Devin has passed practical engineering interviews and completed real jobs on platforms like Upwork.

The AI agent autonomously solves engineering tasks, including setting up computer vision models and fine-tuning large language models.

Devin demonstrates the ability to learn from blog posts and apply the knowledge to generate desktop background images.

Cognition AI's development of Devin represents a significant leap in AI capabilities, surpassing current tools available to the general public.

Devin's performance on benchmarks shows it is seven times more effective than GPT-4 at resolving real-world GitHub issues.

The AI software engineer can autonomously learn and improve code, fixing bugs and edge cases not covered in the original documentation.

Engineering teams can achieve more ambitious goals with Devin's assistance, as it can handle complex tasks requiring thousands of decisions.

Devin is equipped with common developer tools within a sandboxed computer environment, simulating a human developer's workflow.

Cognition Labs' creation of Devin signifies rapid progress in AI, with capabilities beyond what was anticipated just six months prior.

The potential applications of Devin include running businesses autonomously, providing custom automation solutions, and handling customer service.

Devin's ability to learn and apply new skills suggests a future where AI agents could become commonplace in various industries.

Cognition Labs has raised $21 million in Series A funding led by Founders Fund to further develop Devin's capabilities.

Transcripts

00:00

So Cognition Labs just drops this: Devin, the first AI software engineer.

00:07

Hey, I'm Scott from Cognition AI, and today I'm really excited to introduce you to Devin, the first AI software engineer. Let me show you an example of Devin in action. I'm going to ask Devin to benchmark the performance of Llama on a couple different API providers. From now on, Devin is in the driver's seat.

00:27

First, Devin makes a step-by-step plan of how to tackle the problem. After that, it builds a whole project using all the same tools that a human software engineer would use. Devin has its own command line, its own code editor, and even its own browser. In this case, Devin decides to use the browser to pull up API documentation so that it can read up and learn how to plug into each of these APIs.

00:57

Here Devin runs into an unexpected error. Devin actually decides to add a debugging print statement, reruns the code with the debugging print statement, and then uses the error in the logs to figure out how to fix the bug.

01:19

Finally, Devin decides to build and deploy a website with full styling as the visualization. You can see the website here.

01:30

All of this is possible today because of the advancements that we've made in both reasoning and long-term planning. It's a really hard problem and we've only just started, but we're super excited about the progress that we've made so far. In the meantime, if you'd like to try out Devin on your own real-world tasks, send us a request below and we'd be happy to forward it to Devin.

01:47

About 6 months ago I made this video about autonomous AI agents. Now, this was before that terminology was quite as used. We're hearing it more now; back then it wasn't quite as common. So this was 6 months ago, August 27th, 2023. I'll play a clip from this towards the end of the video.

02:06

But the question I was asking in that video was: how far away are we from a situation where basically an autonomous AI agent would be able to kind of run a software business on your behalf, satisfy customers' orders, create little scripts for them, send those out, etc.? And, you know, that video was very well received. A few people thought, "That's never going to happen, that's a million years away," and some people gave their kind of timelines for that. But I think overall a lot of people said how blown away they were by this concept of having an autonomous worker that's able to generate money for you. Kind of a big concept to think about on many different levels, not just for yourself, but how would having access to something like that change the world? It's been about 6 months. That was 6 months ago. Keep that in mind.

02:54

People were super interested; some small percentage was calling BS. That was 6 months ago, and this is today: "Today we're announced to excite..." I can't even read right now. Okay, Wes, calm down and try that again. "Today we're excited to announce Devin, the first AI software engineer."

03:09

So Devin has successfully passed practical engineering interviews from leading AI companies and has even completed real jobs on Upwork. So it does the job interview, it completes the jobs. Devin is an autonomous agent that solves engineering tasks through the use of its own shell, code editor, and web browser.

03:25

So let me do this. So this thing isn't out yet for everyone. I am spamming the crap out of anyone that can get me access, so I'm really hoping to have access to this thing ASAP. Big props to David Andre for breaking the story; that's where I saw it first. The title said "World's first AGI agent, yes, this is real." I was like, no, it's not, and I clicked it anyways, and yeah, it kind of seems like it just might be. Let me know in the comments what you want to see this thing do, and watch these videos that showcase its abilities.

04:01

Hey, I'm Walden, one of the developers here at Cognition AI. We were playing around with whether or not Devin could start a side hustle on Upwork. So here's an actual real job from Upwork where the client wants to set up this computer vision model, which actually looks quite interesting. Seems very difficult to set up. I'm not sure how I would start doing this, but, you know, you give the task to Devin and ask Devin to figure it out, and things just kick off.

04:26

Devin immediately goes ahead, and you can see it sort of starts setting up the repo. It actually runs into some issues here with the versioning, so if you watch how Devin deals with it, Devin's actually updating the code to make these things work. It continues with this, loading and importing packages. You can see that it actually downloads images from the internet to run through the model.

04:55

But you can see here that there are actually some issues that come across. However, Devin knows how to handle these things. Devin kind of pushes through, and if you look closely, Devin's actually doing print-line debugging here, where Devin is adding these statements to track where the data flows. And Devin continues to do this until Devin understands how everything's working, and actually then updates the code with the fixes after removing the print-line statements.

05:31

Devin continues this pattern of fixing code and running it again until it runs the image model across all these roads across the world, and we can ask for a report from Devin, at which point Devin sends over some sample images of roads with damage marked out and a nice txt file explaining Devin's work and the different kinds of outputs of the model. Good job, Devin.

06:03

Hi, I'm Adon, and today I felt like playing the Game of Life, so I asked Devin to implement it for me. Devin started by creating a new React application using the shell, and then it started writing some code through its editor. After that, it deployed the app through Netlify. Let's check it out. That seems nice, but there's a lot more features which I want to add, so let's ask Devin to do this one at a time.

06:35

I want the word "Devin" to be written at the initialization screen instead of it being random. Then I want the word to be slightly bigger and the frame rate to be faster. I also want it to fix a bug where the screen gets freezed after 3 seconds. Let's see the progress Devin has made so far.

07:00

You can see the diff, and the last diff shows that Devin just fixed the bug where the screen gets frozen after 3 seconds. This seems reasonable to me, so let's move on.

07:15

Next, I want Devin to increase the frame rate after 10 seconds, and also to make the website responsive to different window sizes. I also wanted to make it interactive, so that when I click my mouse somewhere it should spawn a new block.

07:35

Let's check out what Devin has made so far. It started with "Devin", which is what we asked for, and when I click something it creates a new block as well. That's fun. Let's play around with it. Well, there goes my evening.

08:02

Hey guys, today I'm going to show you an AI training an AI. So here we're going to take the QLoRA repo, which is a fine-tuning method for quantizing large language models. We're going to feed this repo to our agent Devin, and all we have to ask Devin is to fine-tune a 7B Llama model.

08:24

Devin clones the repo, figures out how to run it using the README, sets up all of the requirements using pip, looks through all the scripts, and is able to start running the training job. There are a few hiccups where Devin runs into some CUDA issues, which is to be expected with open-source repos, but it's not a problem. Devin looks at the NVIDIA environment and figures out how to reinstall the packages to make it work. After a few more runs to figure out the correct model names, Devin successfully gets the training run working.

09:16

Here we see training proceeding smoothly. Loss is going down, and after a few steps it looks pretty good. I tell Devin to wait as the training job runs. After about an hour I come back and ask Devin, "Hey, how's the training going?" Devin helps me look. A few hundred steps are done now and everything is still proceeding smoothly. Looks great. Thanks, Devin, for helping me set up my training run.

09:53

Hey, I'm Tony, an engineer at Cognition. I helped build Devin, and now Devin helps me too. Today at work I wanted to run a bunch of commands at once and be able to track their status on one screen. I found an open-source tool named impro to do this. Here it is right here. Looks like it all finished, but the status is way too vague. I don't know which ones failed; they all just say "down". I really want to improve the UX here, but I'm not familiar with the code at all, so I had Devin, my AI software engineer, help me. Looks like this person right here had the same issue, so all I gave Devin was the link to the issue and asked Devin to fix it. You can see me make the request right here on the left. Let's see what Devin did. On the right we can track Devin's work and watch Devin jump from tool to tool.

10:43

First, Devin clones the repository using the shell, then reads the README in the editor to learn how to set up the code, then goes back to the shell to install the required dependencies. Devin also opens up a web browser to take a look at the issue. Now Devin starts coding. At some point Devin even opens up some Rust documentation to debug a compiler error. Finally, Devin finishes the task and reports a summary of the changes that were made.

11:19

Let's see if the changes work. I have Devin's code right here. Looks like it worked: the third command succeeded. I can even see the status codes. Here's all the code that Devin wrote for this change. Thanks, Devin.

11:42

Hey, I'm Neil, and I wanted to show you an example of Devin, our AI software engineer, helping me fix a bug. So I've been using this repo called SymPy. SymPy is an algebra system written in Python, and I noticed this issue where, when you take the log of a fraction, you get zoo, which is a type of infinity. So that's definitely wrong, but instead of trying to figure this out myself, I just asked Devin to take a look.

12:13

Devin immediately jumps in, sets up the repo, and is able to reproduce that same zoo output. Devin then figures out the right part of the code and adds print statements in order to figure out what the cause of this issue is.

play12:33

and we can see here that the cause is

play12:38

that integer division leads to a zero

play12:40

and then we take the log of zero So

play12:43

based on that Devon's able to fix the

play12:45

issue in The Code by replacing that

play12:47

integer division with true Division and

play12:50

then cleans up the debug

play12:52

output and verifies that the result is

play12:56

what we want and then Devon even runs

play12:59

the test and the repo as well to make

play13:01

sure nothing else is broken so that was

play13:04

great um saved me a ton of time so thank

play13:07

you

play13:10

Dev hey I'm Andrew an engineer at

play13:13

cognition and I wanted to share a pretty

play13:15

amazing experience I had with Devon so I

play13:18

maintained this big open source

play13:20

repository uh which contains a lot of

play13:22

different algorithms uh used for

play13:24

competitive programming a lot of people

play13:26

use it and a few weeks ago uh my friend

play13:28

texted me that you know there was

play13:30

actually a bug in one of the in one of

play13:32

the implementations uh the

play13:33

implementation wasn't quite right when

play13:35

the inputs weren't uh weren't relatively

play13:37

prime I kind of glossed over that case

play13:40

when I was implementing it so I never

play13:41

really thought about it so I implemented

play13:43

a quick fix and then I thought that I

play13:45

should test it but I actually never

play13:47

really got around to writing any test

play13:49

cases so I thought if I don't want to do

play13:51

it I should just ask Devon to do it

play13:53

instead so I gave Deon the repository

play13:56

asked uh asked Deon to just check it out

play13:58

and start working on it uh so Deon you

play14:01

know found the right repository checked

play14:03

it out you know found all files that are

play14:05

in the repository and then I told Devon

play14:08

what test case I wanted him to

play14:11

write uh I just told Devon you know these

play14:14

are the inputs and then try checking for

play14:16

these conditions for me so Devon wrote

play14:19

the test without too much trouble uh it

play14:21

was Devon just looked around to

play14:23

understand what exactly uh what exactly

play14:25

the test should look like and what

play14:27

exactly the interfaces were and with

play14:29

this Devon ran the tests ran into a

play14:32

quick hiccup which was a compile error but

play14:34

Devon is able to solve those very

play14:35

effectively and just added an extra

play14:37

include to fix that and then uh was done

play14:41

writing this initial test so then I

play14:44

asked Devon to actually expand the test a

play14:46

little bit instead of just testing this

play14:47

one input I wanted Devon to test it

play14:50

on all inputs so kind of the Brute Force

play14:52

testing strategy I use this a lot in my

play14:55

test and I just wanted Devon to

play14:56

implement it so that I didn't have to

play14:57

worry about it so Devon went and rewrote

play15:00

the test function to use four nested for

play15:03

loops but this time after Devon ran the

play15:05

tests Devon actually found a test

play15:09

failure now you know if the code were

play15:11

correct there could be errors in the

play15:13

test but you know the test seemed really

play15:15

pretty reasonable so there probably

play15:17

shouldn't be a failure so Devon went and

play15:19

tried to debug the program for me so

play15:22

Devon here actually wrote uh actually

play15:25

added a print statement to debug the

play15:27

outputs uh and the inputs to the failing

play15:30

test reran the tests and actually found

play15:33

which case was wrong in this case these

play15:36

are the inputs and then the return value

play15:38

was actually negative 9 uh and the code I'm

play15:42

running actually should never really

play15:43

return negative values so Devon realized

play15:45

this and actually went looking in the uh

play15:49

went looking in the code that we're

play15:51

trying to test and actually added this

play15:53

line of code that if extra less than

play15:55

zero extra plus equals you know plus

play15:57

equals something and in order to make

play16:00

sure that the return value was actually

play16:02

non-

play16:03

negative so after fixing this Devon

play16:07

actually reran the tests and now uh now

play16:10

I can be confident that my code is

play16:12

correct and I have some tests to prove

play16:13

it thanks
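The transcript names neither the algorithm nor the language. As a purely illustrative sketch, assume the routine merges two congruences (a Chinese remainder theorem step) and must also work when the moduli are not relatively prime. The names ext_gcd, crt, crt_brute and run_brute_force_test are hypothetical; only the variable name extra and the "if extra is less than zero, add something" guard come from the walkthrough. The test uses the brute-force strategy described above: four nested for loops over every small input, compared against a naive search.

```python
def ext_gcd(a, b):
    """Return (g, x, y) such that a*x + b*y == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = ext_gcd(b, a % b)
    return g, y, x - (a // b) * y


def crt(a, m, b, n):
    """Smallest x >= 0 with x % m == a and x % n == b, or None if none exists.

    Assumes 0 <= a < m and 0 <= b < n. The moduli need not be coprime.
    """
    g, p, _ = ext_gcd(m, n)
    if (b - a) % g != 0:
        return None  # the two congruences contradict each other
    step = n // g
    extra = (b - a) // g * p % step
    if extra < 0:
        # Guard in the spirit of the fix described in the transcript.
        # Python's % is already non-negative for a positive modulus, so this
        # never fires here; in C or C++ the remainder can be negative.
        extra += step
    return a + m * extra


def crt_brute(a, m, b, n):
    """Reference answer by trying every candidate below m * n."""
    for x in range(m * n):
        if x % m == a and x % n == b:
            return x
    return None


def run_brute_force_test(limit=8):
    """Compare crt with crt_brute on all inputs with moduli below limit."""
    checked = 0
    for m in range(1, limit):
        for n in range(1, limit):
            for a in range(m):
                for b in range(n):
                    got, want = crt(a, m, b, n), crt_brute(a, m, b, n)
                    # Report the failing inputs, like the debug print above.
                    assert got == want, f"crt({a}, {m}, {b}, {n}) = {got}, expected {want}"
                    checked += 1
    return checked


if __name__ == "__main__":
    print("cases checked:", run_brute_force_test())
```

Exhaustive testing over a small range is cheap here and tends to catch exactly the case that a single hand-picked input misses, such as moduli that share a factor.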

play16:17

Devon hey everyone my name is Sarah and

play16:21

I'm going to show you how Devon our AI

play16:23

software engineer can autonomously learn

play16:25

from a blog post within a few minutes

play16:27

Devon successfully generated this

play16:29

desktop background image for me with my

play16:31

name on it so all I had to do was send

play16:33

this blog post in a message to Devon

play16:36

from there Devon actually does all the

play16:37

work for me starting with reading this

play16:39

blog post and figuring out how to run

play16:42

the

play16:43

code in a couple minutes Devon's

play16:46

actually made a lot of progress and if

play16:48

we jump to the middle here you can see

play16:51

that Devon's been able to find and fix

play16:53

some edge cases and bugs that the blog

play16:56

post did not cover for me and if we jump

play16:58

to the end we can see that Devon uh

play17:01

sends me the final result which I love I

play17:04

also got two bonus images uh here and

play17:07

here so uh let me know if you guys see

play17:11

anything hidden in these so this is

play17:13

cognition Labs website so they've

play17:16

raised a $21 million Series A led by

play17:18

Founders fund meet Devon the world's

play17:20

first fully autonomous AI software

play17:22

engineer Devon is a tireless skilled

play17:24

teammate equally ready to build

play17:25

alongside you or independently complete

play17:28

tasks for you to review with Devon

play17:30

Engineers can focus on more interesting

play17:31

problems and Engineering teams can

play17:33

strive for more ambitious goals with our

play17:36

advances in long-term reasoning and

play17:38

planning Devon can plan and execute

play17:40

complex engineering tasks requiring

play17:42

thousands of decisions Devon can recall

play17:44

relevant context at every step learn

play17:46

over time and fix mistakes we've also

play17:49

equipped Devon with common developer tools

play17:51

including the shell, code editor and

play17:53

browser within a sandboxed computer

play17:55

environment everything a human would

play17:56

need to do their work what I find really

play17:58

interesting about this stuff is

play18:00

generally when you're building tools

play18:02

aimed at developers um I mean generally

play18:05

speaking you have less restrictions

play18:08

versus something that's aimed at sort of

play18:10

everybody so from what we've seen so far

play18:12

because this seems much more powerful

play18:14

than anything else that's available

play18:17

right now to the general public being

play18:19

able to learn unfamiliar technology so

play18:22

like giving it a blog post and then it

play18:24

knows how to do stuff I mean that was

play18:26

the promise that's kind of what we

play18:28

expected these things to be able to do

play18:30

we've seen some of that like that

play18:32

in-context learning by large language

play18:35

models we've seen it but this seems Next

play18:38

Level certainly build and deploy apps

play18:40

end to end now on this channel we've

play18:42

shown for example ChatDev and AutoGen

play18:45

and stuff like that so this isn't new

play18:48

but just everything about this seems

play18:50

smoother faster more intuitive and more

play18:54

what is even the word for this more

play18:56

agentic like it just seems smarter and

play18:58

more competent you tell it what to do

play19:00

and it goes and does that so the SWE-bench

play19:04

is a benchmark for you know can language

play19:06

models resolve real world GitHub issues

play19:09

and so here's kind of the results that

play19:11

they have right here with Claude 2 GPT 4

play19:14

Etc assisted and unassisted so looks

play19:17

like yeah Claude 2 4.8% unassisted GPT 4

play19:21

1.74% unassisted and so Devon just

play19:23

completely knocks it out of the park so

play19:25

that's

play19:27

13.86% 13 Point let's say just call it

play19:30

14% so what is that 7x better than GPT 4
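As a quick check of that estimate, here is a small sketch that divides the quoted unassisted resolve rates. The dictionary and function names are hypothetical, and the percentages are simply the ones read out above, not independently verified figures.

```python
# Unassisted resolve rates (percent) as quoted in the transcript.
scores = {"Devon": 13.86, "Claude 2": 4.80, "GPT-4": 1.74}


def improvement_factor(new, baseline):
    """How many times larger the new score is than the baseline."""
    return new / baseline


vs_gpt4 = improvement_factor(scores["Devon"], scores["GPT-4"])
vs_claude2 = improvement_factor(scores["Devon"], scores["Claude 2"])

print(f"vs GPT-4:    {vs_gpt4:.1f}x")     # prints 8.0x
print(f"vs Claude 2: {vs_claude2:.1f}x")  # prints 2.9x
```

On these numbers the gap to GPT 4 works out closer to 8x than 7x, and the gap to Claude 2 is a little under 3x.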

play19:34

looks like Claude 3 hasn't been tested

play19:36

yet it isn't on the benchmark quite yet

play19:38

so I am tirelessly trying to wrestle up

play19:40

the beta access to this thing and see it

play19:43

kind of in effect for real but let me

play19:45

play you that clip that I was talking

play19:47

about from this channel that you can see

play19:49

here on this channel 6 months ago back

play19:51

then I was asking how long until we have

play19:54

something like this is it a year 5 years

play19:56

10 years 20 and I don't know if this

play19:59

thing will exactly hit all those

play20:00

criteria it might but the rate of

play20:02

progress here is quite astonishing from

play20:05

6 months ago to today we can't even

play20:07

make babies that fast imagine this if

play20:10

you can you wake up in the morning you

play20:11

have your coffee or whatever other

play20:13

stimulant gets you going you turn on

play20:15

your computer and pops up your very own

play20:18

autonomous AI agent you named him

play20:20

goalgetter or GG for short that was very

play20:24

clever of you GG reports to you what

play20:26

it's been doing while you were sleeping

play20:28

and it's been very busy you see you

play20:30

created GG to run a business for you the

play20:34

business idea is simple millions of

play20:36

people in the world are looking for

play20:37

simple inexpensive custom automation

play20:40

Solutions some want a little script that

play20:42

helps them automate their home some want

play20:44

a morning routine automation where the

play20:46

alarm rings their curtains open the

play20:48

coffee maker turns on Etc some want

play20:50

their emails answered and sorted in very

play20:53

specific ways some want a thing that

play20:55

feeds their cat automatically other

play20:57

people want solutions for their business

play20:58

that are a little bit more advanced

play20:59

requires cameras and tracking Etc but

play21:01

the point is that people are lazy people

play21:04

are busy and you have developed quite a

play21:06

reputation for delivering smart and

play21:08

inexpensive custom automation Solutions

play21:11

one-of-a-kind automation Solutions

play21:13

people go on your website they type in

play21:15

what they want done they don't even have

play21:16

to know exactly what they want they just

play21:18

type in something like my stupid cat

play21:20

keeps running away now you respond

play21:22

within a minute or two with an exact

play21:24

plan of how to fix that problem and

play21:26

multiple choices with different price

play21:28

options

play21:29

depending on how fancy they want to get

play21:31

starting from a simple tracker for your

play21:32

cat to an army of drones following it

play21:34

around everywhere it goes they choose

play21:36

whatever option they want and they

play21:38

provide a credit card for you to bill

play21:39

once the automation is done you type a

play21:41

script doing the thing that they want

play21:43

maybe you have to order some parts like

play21:45

trackers or video cameras and have it

play21:47

shipped to their house and then you

play21:48

record a quick voice over with visuals

play21:50

on how to set it all up you're kind of

play21:53

like the Ikea of automations the

play21:55

instructions are great they're brain

play21:57

dead simple you made sure the

play21:59

documentation can be understood by

play22:00

anyone regardless of their Tech

play22:02

background if they can use a smartphone

play22:04

they can set up the thing that you've

play22:05

made for them if they run into any

play22:07

issues you provide unlimited support

play22:08

through text emails or whatever they

play22:10

want once you send everything over to

play22:12

them all the instructions all the

play22:13

details Etc you charge their credit card

play22:16

and you're done but here's the key you

play22:19

didn't do any of that you were sleeping

play22:21

the whole time remember everything was

play22:23

done by GG it talked to the customers and

play22:26

figured out what they wanted it

play22:27

brainstormed some Solutions told them

play22:29

where to put their credit card

play22:30

information Etc then it went to work

play22:32

building the needed software ordering the

play22:34

parts that were needed and writing out

play22:36

very clear instructions for how to use

play22:39

the thing that it just built it shipped

play22:41

the product either digitally or the

play22:43

physical components that I needed it

play22:44

handled the customer service it made

play22:46

sure that everybody was happy Etc

play22:48

recently you've been teaching it how to

play22:49

do marketing campaigns and you're seeing

play22:51

some great results of that it learned to

play22:54

hook people in with little free

play22:55

automations then it goes online to find

play22:57

whatever information about them that it

play22:59

can and then starts pitching them ideas

play23:01

that it thinks they will respond to it

play23:04

produces content showing how their lives

play23:06

could be improved by these various

play23:09

Solutions custom made tailor made to

play23:11

them now your customers are ecstatic and

play23:14

you are basically printing money you

play23:16

don't do any work to run the business

play23:19

zero the only work you do is on

play23:21

improving the agent the autonomous AI

play23:23

agent that is running this whole thing

play23:25

you work on optimizing it on teaching it

play23:27

new skills

play23:29

now you are aware that eventually people

play23:30

will catch on and these agents will be

play23:32

more commonplace and more available to

play23:35

everyone but until then you basically

play23:37

found an unlimited money glitch GG so my

play23:41

question for you is when will this

play23:42

scenario likely play out never is it

play23:45

just science fiction is it possible 10

play23:47

years in the future 50 what do you think

play23:50

what number of years will pass before a

play23:52

handful of people have something like

play23:54

this

play23:57

running
