How Developers Might Stop Worrying About AI Taking Software Jobs and Learn to Profit from LLMs

Internet of Bugs
6 May 2024 · 12:22

Summary

TL;DR: The video critiques the hype surrounding artificial general intelligence (AGI), likening it to the 1990s 'Underpants Gnomes' critique of startups, where the end goal is assumed but the path to it is unclear. It argues that despite advances in large language models (LLMs), we are still far from producing human-level intelligence, which remains tied to the complexity of the human brain. It suggests the current AI cycle may be reaching a plateau, with diminishing returns on additional computational power and training data. Rather than waiting for AGI, developers should leverage existing LLMs to solve real-world problems, a potential 'phase two' of AI application that could be economically valuable.

Takeaways

  • The 'Underpants Gnomes' analogy is used to describe the lack of a clear path to achieving artificial general intelligence (AGI), suggesting that, as in 1990s startup culture, there's a disconnect between current efforts and the desired outcome.
  • The human brain is considered the most complex system known to humans, and our current Large Language Models (LLMs) are significantly simpler and less capable of generating human-level intelligence.
  • LLMs cannot incorporate feedback continuously like the human brain, which is a key limitation in their ability to achieve AGI.
  • The speaker argues against the hype of AGI being imminent, suggesting that we are still clueless about how to reach that level of intelligence with current technology.
  • The script challenges the idea of exponential growth in AI capabilities, explaining that real-world growth is always limited by resources and cannot continue indefinitely.
  • Evidence suggests that LLMs may be reaching a point of diminishing returns, where more resources do not proportionally improve performance, hinting at resource constraints.
  • The 'Chinchilla' experiment by Google indicates an optimal ratio between the amount of training data (measured in tokens) and the number of model parameters, beyond which additional compute is wasted.
  • High-quality data for training LLMs may be running out, or may already have run out, which could be a significant limiting factor for the growth of AI capabilities.
  • The AI Index report and other studies point to a crisis of data quality, with phenomena like 'Model Collapse' reducing the effectiveness of training data.
  • For developers, the implication is that the growth of LLMs is likely to slow, making them a more stable foundation for building software products and solutions.
  • The speaker predicts a future where developers will wrap non-AI functionality around LLMs to specialize them for specific business use cases, similar to the app development boom post-2008.

Q & A

  • What is the 'Underpants Gnomes' analogy in relation to startup culture and AI landscape?

    -The 'Underpants Gnomes' analogy refers to a critique from the 1990s suggesting that many startups claimed to know where they were going but actually had no clear plan. The speaker compares this to the current AI landscape, where there is a belief that general artificial intelligence (AGI) is just around the corner, despite a lack of understanding of how to achieve it.

  • Why does the speaker compare LLMs (Large Language Models) to the 'Underpants Gnomes'?

    -The speaker compares LLMs to the 'Underpants Gnomes' because, similar to the startup culture critique, there is a belief that LLMs will eventually lead to human-level intelligence (AGI) without a clear understanding of how to get there. The speaker expresses skepticism about the simplicity of LLMs being able to create human-level intelligence.

  • What is the speaker's view on the current state of LLMs in comparison to the human brain?

    -The speaker believes that despite the advancements, our current LLMs are far simpler than the human brain, which is arguably the most complex system known to humans. The speaker also mentions that the human brain can incorporate feedback continuously, unlike LLMs, which have their networks frozen at the time of training.

  • What is the significance of the 'exponential growth' concept in the context of AI development?

    -The speaker argues against the unchallenged use of the term 'exponential growth' in AI development, stating that exponential growth is a theoretical construct and cannot occur indefinitely in the real world due to finite resources. The speaker suggests that the growth of AI models will eventually face limitations.

  • What evidence does the speaker present to suggest that AI models might be reaching a point of diminishing returns?

    -The speaker cites a study that shows improvements on the multi-task language understanding benchmark have been linear, not exponential, since mid-2019. Additionally, the speaker refers to the Chinchilla experiment, which found a sweet spot for model training beyond which increasing compute does not improve functionality.

  • What is the 'Chinchilla' experiment, and what does it imply for AI model training?

    -The 'Chinchilla' experiment conducted by Google found an optimal ratio between the amount of training data (measured in tokens) and the number of parameters a model has. Beyond this ratio, increasing compute on the same-size dataset does not improve functionality but instead wastes resources.

  • What does the speaker suggest about the future of high-quality data for AI models?

    -The speaker suggests that high-quality data might be running out, or may already have run out, as indicated by a paper from Epoch AI estimating that we could run out of high-quality language stock in the next year. This could be a limiting factor for AI model growth.

  • How does the speaker view the impact of AI on code generation and its quality?

    -The speaker mentions that AI-generated code has a higher likelihood of being reverted or rewritten within the first two weeks, indicating a 'code churn' problem. This suggests that while AI can help generate code faster, it may not be suitable for long-term maintenance.

  • What is the speaker's perspective on the economic value and profitability of AI in the current cycle?

    -The speaker believes that for the current AI cycle, there might be a 'step 2' that provides economic value and generates profits, similar to how e-commerce and internet advertising emerged from the dot-com bubble. However, this is not for AGI but for the practical applications of current LLM technology.

  • What advice does the speaker give to software developers and companies regarding AGI hype?

    -The speaker advises software developers and companies to reject the AGI hype and start planning for a profitable phase two. This involves applying current LLM technology to real-world problems and creating software that interfaces between LLMs and business issues.

  • What does the speaker predict for the next few years in terms of LLMs and software development?

    -The speaker predicts that in the next few years, many people will take LLM models and wrap non-AI functionality around them to specialize them for specific use cases. This could be similar to the period from 2008 to 2014 when existing services were converted into mobile apps.

Outlines

00:00

The Underpants Gnomes of AI and the Quest for AGI

The speaker likens the current state of artificial general intelligence (AGI) to the 'Underpants Gnomes' analogy from the 1990s, suggesting that while the industry claims to have a clear path forward, there's a lack of understanding about how to achieve AGI. The human brain is acknowledged as an incredibly complex system, and current large language models (LLMs) are far simpler. Skepticism is expressed about whether simple LLMs can generate human-level intelligence. The speaker also contrasts the human brain's continuous feedback incorporation with the static nature of LLMs at the time of training. The discussion suggests that LLMs may not be the final step towards AGI. It's argued that the focus should shift from waiting for AGI to leveraging current LLMs for economic value and profit, drawing a parallel to the post-dotcom bubble era where e-commerce and internet advertising emerged as valuable, despite initial failures.

05:00

Diminishing Returns in AI Development and the Data Dilemma

This paragraph delves into the potential limitations of AI growth, suggesting that exponential growth in AI models is not sustainable due to finite resources. The speaker references the Chinchilla experiment by Google, which found an optimal ratio between the amount of training data (measured in tokens) and the number of model parameters. Beyond this ratio, increased computational resources do not improve functionality and only waste resources. Evidence is presented that suggests we may be reaching a point of diminishing returns for LLMs, with benchmarks not improving proportionally to the resources invested. The speaker also discusses the possibility that high-quality data, necessary for training, may be running out, and the issue of 'Model Collapse,' where training on LLM-generated data leads to a degradation in data quality. The implications are that the growth of AI may be more limited than the hype suggests, and the focus should be on applying current technology to real-world problems rather than waiting for AGI.
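
As a rough illustration of the scale of those efficiency gains, here is a minimal back-of-the-envelope sketch in Python. It assumes the 8.4-month median doubling time for effective compute (95% confidence interval of 4.5 to 14.3 months) quoted in the transcript below; the function itself is plain arithmetic, not a claim about any particular model.

```python
# Back-of-the-envelope arithmetic: if "effective compute" efficiency doubles
# every `doubling_months`, what fraction of the original resources does
# answering the same question take later on? The 8.4-month median and its
# confidence interval are the figures quoted in the transcript below.

def resource_fraction(months_elapsed: float, doubling_months: float) -> float:
    """Fraction of the original resources needed after `months_elapsed`."""
    return 0.5 ** (months_elapsed / doubling_months)

for doubling in (4.5, 8.4, 14.3):   # CI lower bound, median, CI upper bound
    one_year = resource_fraction(12, doubling)
    four_years = resource_fraction(48, doubling)
    print(f"doubling every {doubling:>4} mo: "
          f"after 1 yr {one_year:.2f}x, after 4 yrs {four_years:.4f}x")
```

The speaker's rounder framing, roughly half the resources year over year and about 1/16th over four years, corresponds to a 12-month doubling time, which sits inside the quoted confidence interval.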

10:00

The Future of Software Development with LLMs and the Need for Specialization

The speaker predicts that the next phase for software development will involve wrapping non-AI functionalities around LLMs to specialize them for specific use cases. They suggest that the improvements in code generation AI may slow down due to the scarcity of high-quality data, as there are significantly fewer tokens from human-written source code compared to English tokens available on the internet. The paragraph also highlights studies showing that while AI can speed up coding, it also leads to code quality issues that require rework. The speaker envisions a future where developers will still be needed to create and maintain code, especially for long-term projects, and that the focus should shift from waiting for better AI models to applying current LLMs to solve real-world problems. The economic climate is compared to 2008, with generative AI as a potential money maker, and the speaker expresses cautious optimism about the future of AI in business and software development.
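
To make the "wrap non-AI functionality around an LLM" idea concrete, here is a minimal, hypothetical sketch. Everything in it is illustrative: `call_llm` stands in for whichever model API is used, and the single `lookup_order` tool is a placeholder for real business logic (database queries, browser or terminal modules, reporting, and so on).

```python
# Illustrative sketch of "wrapping non-AI functionality around an LLM" for one
# business use case. Everything here is hypothetical: `call_llm` stands in for
# whichever model API you use, and `lookup_order` stands in for real business
# logic.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    description: str
    run: Callable[[str], str]          # plain, deterministic, non-AI code

def lookup_order(order_id: str) -> str:
    # Placeholder for a database query or internal API call.
    return f"Order {order_id}: shipped two days ago, tracking number on file."

TOOLS = [Tool("lookup_order", "Fetch order status by ID from the order system", lookup_order)]

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a chat-completion API; swap in a real SDK here."""
    raise NotImplementedError

def answer_customer(question: str) -> str:
    # 1. Planning step: ask the model whether a tool is needed.
    tool_menu = "\n".join(f"- {t.name}: {t.description}" for t in TOOLS)
    plan = call_llm(
        f"Question: {question}\nAvailable tools:\n{tool_menu}\n"
        "Reply with '<tool name> <argument>' or NONE."
    )
    # 2. Tool step: run the chosen tool with ordinary code.
    tool_output = ""
    if plan.strip() != "NONE":
        name, _, arg = plan.partition(" ")
        tool = next((t for t in TOOLS if t.name == name), None)
        if tool is not None:
            tool_output = tool.run(arg.strip())
    # 3. Drafting step: let the model write the reply, grounded in the tool output.
    return call_llm(
        f"Question: {question}\nTool output: {tool_output}\n"
        "Write a short, accurate reply for the customer."
    )
```

The useful part is the deterministic scaffolding around the model, which matches the structure the video attributes to Devin-style systems: an LLM plus browser, terminal, planning, and reporting modules.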

Keywords

Underpants Gnomes

The term 'Underpants Gnomes' is used metaphorically to describe a flawed business or strategic plan that lacks a clear or logical second step. In the script, it is used to criticize the startup culture of the 1990s and later to draw a parallel with the current state of AI, suggesting that while there is enthusiasm and investment in AI, there is a lack of understanding or a coherent plan on how to achieve the ultimate goal of general artificial intelligence (AGI).

AGI (Artificial General Intelligence)

AGI, or Artificial General Intelligence, refers to a type of AI that possesses the ability to understand, learn, and apply knowledge across a wide range of tasks at a level equal to or beyond that of a human. The script discusses skepticism around the idea that current AI models, particularly Large Language Models (LLMs), are on the verge of achieving AGI, highlighting the complexity of the human brain and the limitations of current technology.

LLMs (Large Language Models)

LLMs are advanced AI systems designed to process and generate human-like text based on vast amounts of data. The script mentions LLMs as being far simpler than the human brain and questions whether they can achieve human-level intelligence. It also discusses the limitations of LLMs in terms of their inability to incorporate feedback continuously, unlike the human brain.
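
A tiny sketch of that "frozen at training time" distinction, with purely illustrative names rather than any framework's API: once training ends, the parameters never change, so feedback or new information can only reach the model through the prompt or added context, not through weight updates.

```python
# Minimal sketch of the "frozen at training time" point. The class is
# illustrative only: inference reads the weights but never updates them,
# so new information has to arrive via the prompt/context.

class FrozenModel:
    def __init__(self, weights: list[float]):
        self._weights = weights            # fixed once training is finished

    def generate(self, prompt: str, context: str = "") -> str:
        # Inference reads the weights but never writes them.
        return (f"(answer from {len(self._weights)} frozen weights; "
                f"prompt={prompt!r}; fresh context supplied: {bool(context)})")

model = FrozenModel(weights=[0.1, 0.2, 0.3])
print(model.generate("What changed in the API yesterday?"))
print(model.generate("What changed in the API yesterday?",
                     context="yesterday's changelog pasted into the prompt"))
```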

E-commerce

E-commerce refers to the buying and selling of goods or services using the internet, as well as the transfer of money and data to execute these transactions. In the script, e-commerce is cited as an example of a useful outcome from the dot-com bubble, suggesting that despite the failure of many startups, valuable innovations can still emerge from periods of hype and investment in technology.

Exponential Growth

Exponential growth is a pattern of increase where the value of a variable grows at a rate proportional to its current value. The script challenges the idea of exponential growth in the context of AI development, arguing that it is a theoretical construct and not sustainable in the real world due to finite resources.

Logistic Curve

A logistic curve is an S-shaped curve that describes growth that starts slowly, accelerates, and then slows again, reaching a saturation point. The script uses the logistic curve to illustrate the limitations of growth, suggesting that what is often referred to as exponential growth is actually just a part of this curve and will eventually level off.
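
A minimal numerical sketch of that point (generic curves, not fitted to any AI benchmark): early on the two curves are nearly identical, but the logistic one flattens as it approaches its capacity.

```python
import math

# Exponential and logistic growth look the same early on, but the logistic
# curve saturates at its capacity once the resource limit bites.

def exponential(t: float, rate: float = 1.0) -> float:
    return math.exp(rate * t)

def logistic(t: float, rate: float = 1.0, capacity: float = 100.0) -> float:
    # Starts at 1, grows like exp(rate * t) at first, saturates at `capacity`.
    return capacity / (1 + (capacity - 1) * math.exp(-rate * t))

for t in range(0, 11, 2):
    print(f"t={t:2d}   exponential={exponential(t):10.1f}   logistic={logistic(t):6.1f}")
```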

Chinchilla Experiment

The Chinchilla experiment, conducted by Google, is mentioned in the script as evidence that there may be a limit to the effectiveness of increasing computational resources for AI models. The experiment suggests an optimal ratio between the amount of training data (measured in tokens) and the number of parameters in a model, beyond which additional resources do not improve functionality.
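
For a sense of the numbers, here is a small sketch using two rules of thumb commonly cited from the Chinchilla work: training compute of roughly 6 × parameters × tokens FLOPs, and a compute-optimal budget of roughly 20 training tokens per parameter. Treat both as approximations for illustration rather than exact values from the paper.

```python
# Rules of thumb commonly cited from the Chinchilla work, used here only as
# approximations: training compute C ~= 6 * N * D FLOPs (N = parameters,
# D = training tokens), and a compute-optimal budget of ~20 tokens/parameter.

def optimal_tokens(params: float, tokens_per_param: float = 20.0) -> float:
    return params * tokens_per_param

def training_flops(params: float, tokens: float) -> float:
    return 6.0 * params * tokens

n = 70e9                          # a 70B-parameter model, Chinchilla-sized
d = optimal_tokens(n)             # roughly 1.4 trillion tokens
print(f"compute-optimal training tokens: {d:.2e}")
print(f"approximate training FLOPs:      {training_flops(n, d):.2e}")
# Past this ratio, more compute on the same dataset buys little; what is
# needed is more high-quality tokens, which is the bottleneck the script flags.
```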

Data Quality

Data quality refers to the overall quality of the data used for training AI models, including its accuracy, completeness, and relevance. The script discusses a crisis in data quality, particularly the issue of 'Model Collapse,' where training on LLM-generated data leads to a degradation in the quality of the data and the models' performance.
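
The effect can be imitated with a toy simulation; this is only an illustration of the qualitative behavior, not the setup used in the papers the script refers to. A trivially simple "model" (a mean and a spread) is refit, generation after generation, on data sampled from its own previous fit, and the spread steadily collapses.

```python
import random
import statistics

# Toy illustration of the "Model Collapse" effect: fit a trivially simple
# model to data, sample synthetic data from it, refit on that synthetic data,
# and repeat. With a small sample each round, the fitted spread drifts toward
# zero and the tails disappear (the photocopy-of-a-photocopy analogy).

random.seed(0)
mean, stdev = 0.0, 1.0                       # generation 0: the "real" distribution
for generation in range(1, 51):
    synthetic = [random.gauss(mean, stdev) for _ in range(20)]   # small sample
    mean = statistics.fmean(synthetic)        # refit the model ...
    stdev = statistics.pstdev(synthetic)      # ... on its own output
    if generation % 10 == 0:
        print(f"generation {generation:2d}: fitted spread = {stdev:.3f}")
```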

Code Generation

Code generation is the process of using AI to create or suggest code for software development. The script highlights the challenges with code generation, noting that AI-generated code tends to have a higher likelihood of being reverted or rewritten, indicating that while it may speed up initial development, it could lead to maintenance challenges.
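
As a sketch of what a "code churn" style measurement looks like, the snippet below computes the share of newly added lines that were modified or deleted again within two weeks. The record format, the toy data, and the 14-day window are assumptions for illustration; GitClear's published metric is computed from real commit histories.

```python
from datetime import datetime, timedelta

# Sketch of a "code churn" style metric: what fraction of newly added lines
# get modified or deleted again within two weeks?

CHURN_WINDOW = timedelta(days=14)

# (when a line was added, when it was next modified/removed, or None if never)
line_history = [
    (datetime(2024, 3, 1), datetime(2024, 3, 5)),    # reworked after 4 days  -> churn
    (datetime(2024, 3, 1), None),                    # still in place         -> kept
    (datetime(2024, 3, 2), datetime(2024, 4, 10)),   # reworked after 5 weeks -> kept
    (datetime(2024, 3, 3), datetime(2024, 3, 9)),    # reworked after 6 days  -> churn
]

def churn_rate(history) -> float:
    churned = sum(
        1 for added, changed in history
        if changed is not None and changed - added <= CHURN_WINDOW
    )
    return churned / len(history)

print(f"code churn: {churn_rate(line_history):.0%} of new lines reworked within two weeks")
```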

GPT-5

GPT-5 refers to the hypothetical next iteration of the GPT (Generative Pre-trained Transformer) series of LLMs developed by OpenAI. The script speculates about the potential release and capabilities of GPT-5, suggesting that if it does not represent an exponential improvement over previous models, it may signal a shift in the AI community's approach to applying current technology to real-world problems.

Highlights

Critique of startup culture in the 1990s as 'Underpants Gnomes', implying a lack of clear direction or understanding of how to achieve goals.

Comparison of the 'Underpants Gnomes' analogy to the current AI landscape, suggesting a similar lack of understanding in achieving AGI (Artificial General Intelligence).

Historical perspective on cycles of 'True AI is just around the corner' with no clear path to achieving it.

The human brain as the most complex system known, with current Large Language Models (LLMs) being far simpler.

Skepticism about the ability of relatively simple LLMs to create human-level intelligence.

Differences between the human brain's continuous feedback incorporation and the static nature of LLMs' training.

The potential for LLMs to be a stepping stone rather than the final solution for AGI.

E-commerce and internet advertising as useful outcomes of the '90s startup failures, suggesting that similarly useful applications can emerge from the current AI cycle.

The importance of moving past the hype cycle of AGI for practical application and profit generation.

The need for software developers and companies to plan for a profitable phase two, beyond the AGI hype.

Exponential growth in the real world being limited by finite resources, contrasting with theoretical mathematical models.

Evidence suggesting that LLMs may be reaching a point of diminishing returns, with increased resources not proportionally improving benchmarks.

The Chinchilla experiment indicating an optimal ratio between training data (tokens) and parameters for LLMs, beyond which additional compute is wasteful.

Concerns about running out of high-quality data for training LLMs, potentially within the next year.

The 'Model Collapse' effect, where training on LLM-generated data leads to a degradation in data quality.

GitHub's 2022 study showing programmers working faster with AI, while GitClear's later analysis points to code-quality problems such as churn.

Prediction that future development will involve wrapping non-AI functionality around LLMs to specialize them for specific use cases.

Comparison of the current AI landscape to the early days of the Apple App Store, suggesting a potential for future growth and profitability.

The anticipation of GPT-5 and how a less-than-exponential improvement could push the industry toward applying current LLMs to real-world problems.

Transcripts

00:00

Back in the 1990s, because of course I have to bring up the 1990s, there was this critique of startup culture as "Underpants Gnomes." The implication was that a lot of startups were claiming that they knew where they were going, but they had no actual clue. Basically, it was "step one, do a thing, collect underpants, whatever. Step two, question mark. Step three, profit." That struck home at the time. I was part of the Austin startup community, but I think it's actually more apt as an analogy for the current AI landscape. So we've been through at least two previous cycles of "True AI is just around the corner" in the last, oh, 50 or so years. And each time it turned out we actually had no clue how to get from what we were working on to where we wanted to go. So we'll see what happens, but many people, including me, think that when it comes to general artificial intelligence, or AGI, we're still just as clueless as the underpants gnomes.

00:52

The human brain, according to experts, is arguably the most complex system that humans are aware of. Now, not everybody believes this, but there are some links below. And it's without dispute, at least, that our current state-of-the-art LLMs are far, far, far simpler than what we know of the human brain. And we don't understand a lot of the human brain. And the human brain is the only thing we know that can generate human-level intelligence. So the idea that a really simple, relatively speaking, LLM will be able to create human-level intelligence? I'm kind of skeptical about that. And in addition, the human brain is able to incorporate feedback almost continuously, whereas LLMs, even the ones that can go and look at the internet at the moment, have their networks frozen at the time of training. Because of those two factors at least, although there are a lot of others, a lot of people, including me, don't believe that LLMs are the last step on the way to human-level intelligence.

01:45

But the last underpants gnome question eventually got answered, right? So a lot of startups failed in the late '90s and early 2000s. But what came out of that was e-commerce and internet targeted advertising, which turned out to be useful, although, like a lot of things, there have been some downsides. Like e-commerce, the LLMs we have now, despite some issues, are genuinely widely useful, and they have some really useful applications. So I think that, for the current AI cycle, we might have a step 2. Not for AGI, but when it comes to how we might be able to provide economic value and actually generate profits. And by we, I mean a lot of us, not just the OpenAIs and the Anthropics of the world. Although big companies like Amazon grabbed a huge share of e-commerce, a whole bunch of other companies benefited from the rise of that ecosystem too.

02:39

But those two things are tied together. As long as we're listening to the hype and we believe that AGI is just around the corner, and all of the jobs are about to go away and be replaced by AI, there's no incentive to try to apply any of this, and there's no path to profit. In order for us to get to the part where we start using the current LLM technology to actually get stuff done, we have to give up on AGI and we have to get past the hype cycle. Because few people will invest in building something now if ChatGPT-5 or 6 is going to make it irrelevant in a year or two. And even if we did build something now that's useful, few businesses, if any, are going to be willing to try to adopt that thing if it turns out that ChatGPT-5 or 6 is likely to make it irrelevant in a year or two. So today I'm going to try to make two cases. I'm going to try to make the case that it's time for software developers and software companies to both reject the AGI hype and start at least planning for a profitable phase two. The comments on this video are going to be...

03:47

This is the Internet of Bugs. My name is Carl. And I'm going to start today with an interview that Ezra Klein released with the CEO of Anthropic. I've put the link in the show notes as always. The word "exponential" was mentioned, unchallenged, like 18 times. Now, I'm not an AI or an ML expert, but I do know software and I know computational scaling; my degrees are in Physics. Any experimental physicist or scientist can tell you that exponential growth doesn't actually happen. It doesn't exist in the real world. It's a theoretical mathematical construct, and it just can't happen. The reason is that in the real world there aren't infinite resources. Growth always has limitations. What we refer to as exponential is actually a particular part of what's called a logistic curve, where there are far more resources available than are needed to grow. Once those resources become constrained, the exponential part is going to stop. So here's a graph from 3Blue1Brown, which is a fantastic explainer YouTube channel. Again, link to the full video below; this is the video that explains about logistic curves. We know that our current AI models can't grow forever, because nothing can grow forever. We know that we'll run out of resources at some point. The thing is, we don't know what resource will run out first. We won't know what the limiting factor is until we actually hit it.

04:59

Now, it's possible we'll start running out of the critical resource very soon. It's possible we already have and just don't know it yet. It's actually possible that a few people know it, but they aren't admitting it to the public yet. There's an article that came out the day after the Ezra Klein interview was posted, detailing evidence that LLMs are already reaching a point of diminishing returns. We're throwing more and more resources at LLMs, but the benchmarks just aren't going up commensurately. That article linked to this paper from last month showing that since 2012, "the median doubling time for effective compute is 8.4 months with a 95% confidence interval 4.5 to 14.3 months." This graph from that paper shows that basically every 8 months or so, models are getting roughly twice as efficient. Note, this graph is log scale, so a straight line down and to the right indicates an exponential reduction. This is a graph I made of the same data. It's the same data as the previous graph, only I'm not using log scale, I'm using smaller dots so you can see better, and I put in a trend curve that was just added by Apple's "Numbers" spreadsheet. So, answering a question in 2023 should have taken roughly half as many resources as it took to answer that same question in 2022, and roughly 1/16th as many resources as four years earlier in 2019. But here's a graph from another source. It shows that since mid-2019, improvements on the multi-task language understanding benchmark have been linear, not exponential. So we've decreased the amount of effort we're expending, and that decrease is exponential, but the benchmark improvement that we're getting for it is only linear. And that disconnect between the exponential and the linear suggests there might be some other factor that's limiting our efficiency.

06:28

And we might actually know what it is. So Google did an experiment in 2022 with a model called Chinchilla. There's a paper presented at the AI Alignment Forum that breaks down the implications of that. Chinchilla found there's a sweet spot between the amount of data a model is trained on, the number of tokens, and the number of parameters it has. It seems to show that past this optimal ratio, if you throw more compute at the same size data set, it just doesn't increase functionality; it just wastes resources. So if the Chinchilla paper is right, then the number of tokens is a limiting factor. And if that's true, how close are we to running out? Last month, the 2024 Stanford annual AI Index report came out. There was a four-page special section called "Will models run out of data?", and it started on page 26. Of course, no mention of running out of data makes it to any of the key takeaways or summary, or anything that you would know if you just read the first few pages. And I wonder why that is. Actually, I don't wonder why that is at all. That section references a paper from Epoch AI that estimates we could run out of high-quality language stock in the next year, if we haven't already. In addition, there's a crisis of data quality. Several papers have shown an effect called "Model Collapse," which happens when a model is trained on LLM-generated data. Over time it becomes like having a photocopy of a photocopy of a photocopy of a photocopy, if anybody on YouTube even knows what a photocopy is anymore. But it causes the data to become more and more similar, all of the interesting stuff goes away, and eventually it just kind of converges on boring. There's a graphic that kind of displays that. Now, we don't know for sure yet that the data problem is going to be the limiting factor, but evidence points to that, and we should find out before too long.

08:06

Now, we're still going to get a GPT-5 one of these days. But if this evidence is correct, it's not going to be the dramatic increase that the hype has led many to believe. Closer to home, talking to the developers out there: the "running out of data" problem is even worse for code generation. If training data is the bottleneck, the improvements in code generation are going to slow down even more, because there are orders of magnitude more English tokens available on the internet than there are available tokens from human-written source code, right? And there's already evidence that code generation isn't all that it's claimed to be. So GitHub released a study in 2022 that showed that programmers were 55% faster when they were using AI, which is cool. But with two more years of GitHub Copilot data, a new paper shows that AI-generated code "creates a downward pressure on code quality." That paper, which is from GitClear, shows that AI-generated code has a much higher likelihood of being reverted or rewritten within the first two weeks. It's a metric they call "code churn." There are other quality metrics that are a problem, copy/paste, repetition, that kind of stuff, but the code churn one is really bad.

09:04

So, just a quick recap. 1) We know that growth must be limited by something. 2) The Chinchilla experiments imply that something might be data. 3) High-quality data is running out, or it has already. 4) There's evidence that general LLM progress has dramatically slowed recently, increasing only linearly since 2020. 5) Code generation has orders of magnitude less data, and that data requires more correctness than English. And 6) we're already seeing evidence of AI reducing general code quality.

09:27

And if all that turns out to be correct, in many ways that's really good for business, and it's good for us, the developers. We've been living in a world where no one really wanted to invest in building software products around LLMs that would apply to business problems, because everybody believed that the LLMs would be twice as good next year, so they might be able to handle the business problem on their own and you would have wasted your time and money. Once it becomes clear that the growth is becoming just incremental, then it will be time to apply LLMs to real-world problems. And that means writing software to interface between the LLMs and the business issues. Based on the code quality data from GitClear, we're still going to need plenty of programmers. Those programmers might be able to create code a little faster with AI, but since the AI-generated code seems to require a lot of rework, it might be fine for one-shot "let's get the answer and throw the code away" problems, but not for code that needs to be maintained long-term.

10:16

So I'm guessing that over the next few years, maybe three to five, there are going to be a lot of people taking the LLM models and wrapping non-AI functionality around them to specialize them for specific use cases. This is going to be a thing kind of the way Devin is, or at least the open-source versions like Devika and OpenDevin, which work about as well as Devin but let us actually look under the hood and see how they work. Those systems are an LLM, but with a browser module plugged in, and a terminal module, and a planning module, and a reporting module, and some expert systems around it that kind of tell it what to do. That kind of system can be applied not just to, you know, "let's pretend to be an AI engineer," but it can also be applied to all kinds of business problems. The same way that 2008 to 2012 or 2014, something in there, was a lot about taking existing stuff like websites and converting them to become mobile apps, 2024 to 2027 or 2029, something like that, might be about taking a lot of existing things, services, software, apps, websites, that kind of thing, and reforming them around the capabilities of LLMs.

11:20

It feels to me now a lot like 2008. The first Apple App Store had just been announced, and we thought we could make money from it, but we weren't quite sure. And the economy was kind of illiquid because the mortgage bubble had just popped and we were in the beginnings of the Great Recession. At the moment, it seems like generative AI is going to be a money maker, but we're not quite sure, and the economy is kind of illiquid because interest rates are still really high. If and when that settles, we could well be off to the races. Am I right? No idea, but we should know soon. We've heard a lot recently about how ChatGPT-5 is supposed to be a huge leap, and it's supposed to be released as early as this summer. There's a link to that in the notes below. If we don't get an exponentially better ChatGPT-5 this year, then hopefully people will start recognizing that it's time to change from "let's wait for better LLMs" to "let's take the current LLMs and start applying them to real-world problems." And if we can do that, that will be a phase two that we can live with. Remember, the internet is full of bugs. Anyone who tells you different is probably trying to sell you some stupid AI service. Let's be careful out there.

Related Tags
AI Limitations, Startup Culture, LLM Future, AGI Discussion, Economic Value, Hype Cycle, Software Development, Data Quality, Code Generation, Tech Analysis