How Developers Might Stop Worrying About AI Taking Software Jobs and Learn to Profit from LLMs
Summary
TL;DR: The video critiques the hype surrounding artificial general intelligence (AGI), likening it to the 'Underpants Gnomes' of the 1990s, where the end goal is assumed but the path to it is unclear. It argues that despite advances in large language models (LLMs), we are still far from producing human-level intelligence, which so far emerges only from the human brain, the most complex system we know of. The video suggests that the current AGI push may be reaching a plateau, with diminishing returns on investment in computational power. It posits that instead of waiting for AGI, developers should focus on leveraging existing LLMs to solve real-world problems, hinting at a potential phase two of AI application that could be economically valuable.
Takeaways
- The 'Underpants Gnomes' analogy describes the lack of a clear path to artificial general intelligence (AGI): as in 1990s startup culture, there is a disconnect between current efforts and the desired outcome.
- The human brain is the most complex system known to humans, and current Large Language Models (LLMs) are significantly simpler and less capable of generating human-level intelligence.
- LLMs cannot incorporate feedback continuously the way the human brain can, a key limitation in their ability to achieve AGI.
- The speaker argues against the hype of AGI being imminent, suggesting that we are still clueless about how to reach that level of intelligence with current technology.
- The video challenges the idea of exponential growth in AI capabilities, explaining that real-world growth is always limited by resources and cannot continue indefinitely.
- Evidence suggests that LLMs may be reaching a point of diminishing returns, where more resources do not proportionally improve performance, hinting at resource constraints.
- Google's 'Chinchilla' experiment indicates that there is an optimal ratio of training data, tokens, and parameters for a model, beyond which additional compute is wasted.
- High-quality data for training LLMs may be running out, or may have run out already, which could be a significant limiting factor for the growth of AI capabilities.
- The AI Index report and other studies point to a crisis of data quality, with phenomena like 'Model Collapse' reducing the effectiveness of training data.
- For developers, the implication is that the growth of LLMs is likely to slow, making them a more stable foundation for building software products and solutions.
- The speaker predicts a future where developers wrap non-AI functionality around LLMs to specialize them for specific business use cases, similar to the app development boom after 2008.
Q & A
What is the 'Underpants Gnomes' analogy in relation to startup culture and AI landscape?
-The 'Underpants Gnomes' analogy refers to a critique from the 1990s suggesting that many startups claimed to know where they were going but actually had no clear plan. The speaker compares this to the current AI landscape, where there is a belief that general artificial intelligence (AGI) is just around the corner, despite a lack of understanding of how to achieve it.
Why does the speaker compare LLMs (Large Language Models) to the 'Underpants Gnomes'?
-The speaker compares LLMs to the 'Underpants Gnomes' because, similar to the startup culture critique, there is a belief that LLMs will eventually lead to human-level intelligence (AGI) without a clear understanding of how to get there. The speaker expresses skepticism about the simplicity of LLMs being able to create human-level intelligence.
What is the speaker's view on the current state of LLMs in comparison to the human brain?
-The speaker believes that despite the advancements, our current LLMs are far simpler than the human brain, which is arguably the most complex system known to humans. The speaker also mentions that the human brain can incorporate feedback continuously, unlike LLMs, which have their networks frozen at the time of training.
What is the significance of the 'exponential growth' concept in the context of AI development?
-The speaker argues against the unchallenged use of the term 'exponential growth' in AI development, stating that exponential growth is a theoretical construct and cannot occur indefinitely in the real world due to finite resources. The speaker suggests that the growth of AI models will eventually face limitations.
What evidence does the speaker present to suggest that AI models might be reaching a point of diminishing returns?
-The speaker cites a study that shows improvements on the multi-task language understanding benchmark have been linear, not exponential, since mid-2019. Additionally, the speaker refers to the Chinchilla experiment, which found a sweet spot for model training beyond which increasing compute does not improve functionality.
What is the 'Chinchilla' experiment, and what does it imply for AI model training?
-The 'Chinchilla' experiment conducted by Google found an optimal ratio between the amount of data a model is trained on, the number of tokens, and the number of parameters it has. Beyond this ratio, increasing compute on the same size dataset does not improve functionality but instead wastes resources.
What does the speaker suggest about the future of high-quality data for AI models?
-The speaker suggests that high-quality data might be running out, or might have run out already, as indicated by a paper from Epoch AI estimating that we could run out of high-quality language stock in the next year. This could be a limiting factor for AI model growth.
How does the speaker view the impact of AI on code generation and its quality?
-The speaker mentions that AI-generated code has a higher likelihood of being reverted or rewritten within the first two weeks, indicating a 'code churn' problem. This suggests that while AI can help generate code faster, it may not be suitable for long-term maintenance.
What is the speaker's perspective on the economic value and profitability of AI in the current cycle?
-The speaker believes that for the current AI cycle, there might be a 'step 2' that provides economic value and generates profits, similar to how e-commerce and internet advertising emerged from the dot-com bubble. However, this is not for AGI but for the practical applications of current LLM technology.
What advice does the speaker give to software developers and companies regarding AGI hype?
-The speaker advises software developers and companies to reject the AGI hype and start planning for a profitable phase two. This involves applying current LLM technology to real-world problems and creating software that interfaces between LLMs and business issues.
What does the speaker predict for the next few years in terms of LLMs and software development?
-The speaker predicts that in the next few years, many people will take LLM models and wrap non-AI functionality around them to specialize them for specific use cases. This could be similar to the period from 2008 to 2014 when existing services were converted into mobile apps.
Outlines
The Underpants Gnomes of AI and the Quest for AGI
The speaker likens the current state of artificial general intelligence (AGI) to the 'Underpants Gnomes' analogy from the 1990s, suggesting that while the industry claims to have a clear path forward, there's a lack of understanding about how to achieve AGI. The human brain is acknowledged as an incredibly complex system, and current large language models (LLMs) are far simpler. Skepticism is expressed about whether simple LLMs can generate human-level intelligence. The speaker also contrasts the human brain's continuous feedback incorporation with the static nature of LLMs at the time of training. The discussion suggests that LLMs may not be the final step towards AGI. It's argued that the focus should shift from waiting for AGI to leveraging current LLMs for economic value and profit, drawing a parallel to the post-dotcom bubble era where e-commerce and internet advertising emerged as valuable, despite initial failures.
Diminishing Returns in AI Development and the Data Dilemma
This paragraph delves into the potential limitations of AI growth, suggesting that exponential growth in AI models is not sustainable due to finite resources. The speaker references the Chinchilla experiment by Google, which found an optimal ratio between the amount of data, tokens, and parameters for training AI models. Beyond this ratio, increased computational resources do not improve functionality and only waste resources. Evidence is presented that suggests we may be reaching a point of diminishing returns for LLMs, with benchmarks not improving proportionally to the resources invested. The speaker also discusses the possibility that high-quality data, necessary for training, may be running out and the issue of 'Model Collapse' where training on LLM-generated data leads to a degradation in data quality. The implications are that the growth of AI may be more limited than the hype suggests, and the focus should be on applying current technology to real-world problems rather than waiting for AGI.
The Future of Software Development with LLMs and the Need for Specialization
The speaker predicts that the next phase for software development will involve wrapping non-AI functionalities around LLMs to specialize them for specific use cases. They suggest that the improvements in code generation AI may slow down due to the scarcity of high-quality data, as there are significantly fewer tokens from human-written source code compared to English tokens available on the internet. The paragraph also highlights studies showing that while AI can speed up coding, it also leads to code quality issues that require rework. The speaker envisions a future where developers will still be needed to create and maintain code, especially for long-term projects, and that the focus should shift from waiting for better AI models to applying current LLMs to solve real-world problems. The economic climate is compared to 2008, with generative AI as a potential money maker, and the speaker expresses cautious optimism about the future of AI in business and software development.
Keywords
Underpants Gnomes
AGI (Artificial General Intelligence)
LLMs (Large Language Models)
E-commerce
Exponential Growth
Logistic Curve
Chinchilla Experiment
Data Quality
Code Generation
GPT-5
Highlights
Critique of startup culture in the 1990s as 'Underpants Gnomes', implying a lack of clear direction or understanding of how to achieve goals.
Comparison of the 'Underpants Gnomes' analogy to the current AI landscape, suggesting a similar lack of understanding in achieving AGI (Artificial General Intelligence).
Historical perspective on cycles of 'True AI is just around the corner' with no clear path to achieving it.
The human brain as the most complex system known, with current Large Language Models (LLMs) being far simpler.
Skepticism about the ability of relatively simple LLMs to create human-level intelligence.
Differences between the human brain's continuous feedback incorporation and the static nature of LLMs' training.
The potential for LLMs to be a stepping stone rather than the final solution for AGI.
E-commerce and internet advertising as outcomes of the '90s startup failures, suggesting that similarly useful applications can emerge from the current AI cycle.
The importance of moving past the hype cycle of AGI for practical application and profit generation.
The need for software developers and companies to plan for a profitable phase two, beyond the AGI hype.
Exponential growth in the real world being limited by finite resources, contrasting with theoretical mathematical models.
Evidence suggesting that LLMs may be reaching a point of diminishing returns, with increased resources not proportionally improving benchmarks.
The Chinchilla experiment indicating an optimal ratio between data, tokens, and parameters for LLMs, beyond which additional compute is wasteful.
Concerns about running out of high-quality data for training LLMs, potentially within the next year.
The 'Model Collapse' effect, where training on LLM-generated data leads to a degradation in data quality.
GitHub's 2022 study showing faster coding with AI, alongside later GitClear data showing code quality problems.
Prediction that future development will involve wrapping non-AI functionality around LLMs to specialize them for specific use cases.
Comparison of the current AI landscape to the early days of the Apple App Store, suggesting a potential for future growth and profitability.
The anticipation of ChatGPT-5, whose reception could trigger recognition that it's time to apply current LLMs to real-world problems.
Transcripts
Back in the 1990s, because of
course, I have to bring up the 1990s,
there was this critique of startup
culture as "Underpants Gnomes."
The implication was that a lot of
startups were claiming that they knew
where they were going, but they had no
actual clue.
Basically, it was "step one, do a
thing, collect underpants, whatever.
Step two, question mark, step three,
profit."
That struck home at the time. I was
part of the Austin startup
community,
but I think it's actually more apt
as an analogy for the current AI
landscape.
So we've been through at least two
previous cycles of "True AI
is just around the corner" in the
last, oh, 50 or so years.
And each time it turned out we
actually had no clue how to get
from what we were working on
to where we wanted to go.
So we'll see what happens, but many
people, including me,
think that when it comes to general
artificial intelligence or AGI,
we're still just as clueless as the
underpants gnomes.
The human brain, according to
experts, is arguably the most
complex system
that humans are aware of.
Now, not everybody believes this,
but there are some links below.
And it's without dispute at least
that our current state of the art LLMs
are far, far, far simpler than what
we know of the human brain.
And we don't understand a lot of
the human brain.
And the human brain is the only
thing we know that can generate
human-level intelligence.
So the idea that a really simple,
relatively speaking,
LLM will be able to create human-level
intelligence.
I'm kind of skeptical about that.
And in addition, the human brain is
able to incorporate feedback almost
continuously,
whereas LLMs, even the ones that
can go and look at the internet at the moment,
their networks are frozen at the
time of training.
Because of those two factors at
least, although there are a lot of others,
a lot of people, including me, don't
believe that LLMs are the last step
on the way to human-level
intelligence.
But the last underpants gnome
question eventually got answered, right?
So a lot of startups failed in the
late '90s and early 2000s.
But what came out of that was e-commerce
and internet targeted advertising,
which turned out to be useful,
although like a lot of things,
there have been some downsides.
Like e-commerce, the LLMs we have
now, despite some issues, are
genuinely widely useful.
And they have some really useful
applications.
So I think that, for the current AI
cycle, we might have a step 2.
Not for AGI, but when it comes to
how we might be able to provide
economic value
and actually generate profits.
And by we, I mean a lot of us, not
just the OpenAI's and the Anthropic's
of the world.
Although big companies like Amazon
grabbed a huge share of e-commerce,
a whole bunch of other companies
benefited from the rise of that
ecosystem too.
But those two things are tied
together.
As long as we're listening to the
hype and we believe that AGI is
just around the corner,
and all of the jobs are about to go
away and be replaced by AI,
there's no incentive to try to
apply to any of this, and there's
no path to profit.
In order for us to get to the part
where we start using the current LLM
technology to actually get
stuff done, we have to give up on
AGI and we have to get past the
hype cycle.
Because few people will invest in
building something now if ChatGPT-5
or 6 is going to
make it irrelevant in a year or two.
And even if we did build something
now that's useful, few businesses, if any,
are going to be willing to try to
adopt that thing, if it turns out
that ChatGPT-5 or 6 is
likely to make that irrelevant in a
year or two.
So today I'm going to try to make
two cases.
I'm going to try to make the case
that it's time for software
developers and software companies
to both reject the AGI hype and
start at least planning for a
profitable phase two.
The comments on this video are
going to be...
This is the Internet of Bugs. My
name is Carl.
And I'm going to start today with
an interview that Ezra Klein
released with the CEO of Anthropic.
I've put the link in the show notes
as always.
The word "exponential" was mentioned
unchallenged like 18 times.
Now I'm not an AI or an ML expert,
but I do know software, and I know
computational scaling,
as my degrees are in Physics.
Any experimental physicist or
scientist can tell you that
exponential growth
doesn't actually happen.
It doesn't exist in the real world.
It's a theoretical mathematical
construct and it just can't happen.
The reason is in the real world
there aren't infinite resources.
Growth always has limitations.
What we refer to as exponential is
actually a particular part of what's
called a logistic curve,
where there are far more resources
available than are needed to grow.
Once those resources become
constrained, the exponential part
is going to stop.
So here's a graph from 3Blue1Brown
which is a fantastic explainer
YouTube channel.
Again, link to the full video below.
This is the video that explains
about logistic curves.
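For reference, here is the standard textbook form of the logistic curve (my notation, not taken from the video):

$$\frac{dN}{dt} = rN\left(1 - \frac{N}{K}\right) \quad\Longrightarrow\quad N(t) = \frac{K}{1 + \frac{K - N_0}{N_0}\,e^{-rt}}$$

While $N \ll K$ (resources plentiful), this is indistinguishable from the pure exponential $N_0 e^{rt}$; as $N$ approaches the resource ceiling $K$, growth flattens toward zero.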
We know that our current AI models
can't grow forever because nothing
can grow forever.
We know that we'll run out of
resources at some point.
The thing is, we don't know what
resource will run out at first.
We won't know what the limiting
factor is until we actually hit it.
Now it's possible we'll start
running out of the critical
resource very soon.
It's possible we already have and
just don't know it yet.
It's actually possible that a few
people know it, but they aren't
admitting it to the public yet.
There's an article that came out
the day after the Ezra Klein Interview
was posted,
detailing evidence that LLMs are
already reaching a point of
diminishing returns.
We're throwing more and more
resources at LLMs,
but the benchmarks just aren't
going up commensurately.
That article linked to this paper
from last month showing that since
2012,
"the median doubling time for
effective compute is 8.4 months
with a 95% confidence interval of
4.5 to 14.3 months."
This graph from that paper shows
that basically every 8 months or so,
models are getting roughly twice as
efficient.
Note, this graph is log scale.
So a straight line down and to the
right indicates an exponential
reduction.
This is a graph I made of the same
data.
It's the same data as the previous graph,
only I'm not using log scale.
I'm using smaller dots so you can
see better.
And I put a trend curve in there
that was just added by Apple's
"Numbers" spreadsheet.
So, answering a question in 2023
should have taken roughly half as
many resources
as it took to answer that same
question in 2022.
And roughly 1/16th as many
resources as four years earlier in
2019.
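As a sanity check on that arithmetic, here is a minimal sketch. The 8.4-month figure is the paper's median; the "half per year, 1/16th in four years" framing corresponds to a rounder 12-month doubling:

```python
# Fraction of the original compute needed for a fixed capability after
# `months` have elapsed, if effective compute halves every `doubling_months`.
def relative_cost(months: float, doubling_months: float = 8.4) -> float:
    return 0.5 ** (months / doubling_months)

print(relative_cost(12))        # ~0.37 at the paper's 8.4-month median
print(relative_cost(12, 12.0))  # 0.5    -> "half as many resources" after a year
print(relative_cost(48, 12.0))  # 0.0625 -> "roughly 1/16th" after four years
```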
But here's the graph from another
resource.
It shows that since mid 2019,
improvements on the multi-task
language understanding benchmark
have been linear, not exponential.
So we decreased the amount of
effort we're expending and that
decrease is exponential.
But the benchmark improvement that
we're getting for it is only linear.
And that disconnect between the
exponential and the linear
suggests there might be some other
factor that's limiting our
efficiency.
And we might actually know what it
is. So Google did an experiment in 2022
with a model called Chinchilla.
There's a paper presented at the
AI Alignment Forum that
breaks down the implications of
that. Chinchilla found there's a sweet
spot between the amount of data a
model is trained on,
the number of tokens, and the
number of parameters it has.
It seems to show that past this
optimal ratio, if you throw more
compute at the same size data set,
it just doesn't increase
functionality, it just wastes
resources.
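To make that sweet spot concrete, the commonly quoted rule of thumb from the Chinchilla paper is on the order of 20 training tokens per parameter. A minimal sketch, assuming that approximation:

```python
# Chinchilla-style rule of thumb: compute-optimal training wants roughly
# ~20 tokens of training data per model parameter. The exact coefficient
# varies by analysis; 20 is the commonly cited approximation, not a law.
def chinchilla_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    return n_params * tokens_per_param

# A 70-billion-parameter model would "want" about 1.4 trillion tokens:
print(f"{chinchilla_optimal_tokens(70e9):.1e}")  # 1.4e+12
```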
So if the Chinchilla paper is right,
then the number of tokens is a
limiting factor.
And if that's true, how close are
we to running out?
Last month, the 2024 Stanford
annual AI index report came out.
There was a four page special
section called "Will models run out of data?"
And it started on page 26.
Of course, no mention of running
out of data makes it to any of the
key takeaways or summary,
or anything that you would know if
you just read the first few pages.
And I wonder why that is.
Actually, I don't wonder why that
is at all.
That section references a paper
from Epoch AI that estimates we
could run out of high quality
language stock in the next year, if
we haven't already.
In addition, there's a crisis of
data quality.
Several papers have shown an effect
called "Model Collapse,"
which happens when a model is
trained on LLM generated data.
Over time it becomes like having a
photocopy of a photocopy of a
photocopy of a photocopy-
if anybody on YouTube even knows
what a photocopy is anymore.
But it causes the data to become
more and more and more similar,
and all of the interesting stuff to go away.
And eventually it just kind of converges
on boring.
There's a graphic that kind of
displays that.
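To make the photocopy analogy concrete, here is a toy simulation (my illustration, not the papers' exact setup): each generation fits a Gaussian to its data, then generates synthetic data that over-samples the mode, the way a model favors its highest-probability outputs. The spread, the "interesting stuff" in the tails, shrinks every pass:

```python
import random
import statistics

random.seed(42)
data = [random.gauss(0.0, 1.0) for _ in range(10_000)]
for generation in range(5):
    mu, sigma = statistics.fmean(data), statistics.pstdev(data)
    print(f"generation {generation}: sigma = {sigma:.3f}")
    # "Retrain" on synthetic data, keeping only high-probability samples
    # (within one standard deviation of the mean): the collapse mechanism.
    draws = (random.gauss(mu, sigma) for _ in range(40_000))
    data = [x for x in draws if abs(x - mu) <= sigma][:10_000]
```

Each pass multiplies sigma by roughly 0.54, so after five generations the distribution has lost over 90% of its original spread.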
Now, we don't know for sure yet
that the data problem is going to
be the limiting factor,
but evidence points to that.
And we should find out before too long.
Now, we're still going to get a GPT-5
one of these days.
But if this evidence is correct, it's
not going to be the dramatic
increase that the hype has led
many to believe. Closer to home,
talking to the developers out there:
the "running out of data" problem is
even worse for code generation.
If training data is the bottleneck,
the improvements in code generation
are going to slow down even more.
Because there are orders of
magnitude more English tokens
available on the internet,
than there are tokens available
from human-written source code, right?
And there's already evidence that
code generation isn't all that it's
claimed to be.
So GitHub released a study in 2022
that showed that programmers were
55% faster
when they were using AI, which is
cool. But with two more years of GitHub
Copilot data, a new paper shows
that AI generated code
"creates a downward pressure on code
quality."
That paper, which is from GitClear,
shows that AI generated code has a
much higher likelihood
of being reverted or rewritten
within the first two weeks.
It's a metric they call "code churn."
There are other quality metrics
that show problems too,
like copy/paste and repetition,
that kind of stuff.
But the code churn one is really bad.
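For a sense of what that metric measures, here is a hedged sketch in the spirit of GitClear's definition (the exact methodology is theirs; this shape is illustrative): churn is the fraction of added lines that get reverted or rewritten within two weeks.

```python
from datetime import datetime, timedelta

# Illustrative churn metric: fraction of added lines that were removed or
# rewritten within `window` of being added. `line_events` pairs each added
# line's timestamp with the timestamp it was removed (None if it survived).
def churn_rate(line_events: list[tuple[datetime, datetime | None]],
               window: timedelta = timedelta(days=14)) -> float:
    added = len(line_events)
    churned = sum(1 for added_at, removed_at in line_events
                  if removed_at is not None and removed_at - added_at <= window)
    return churned / added if added else 0.0
```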
So: just a quick recap.
1) We know that growth must be limited
by something.
2) The Chinchilla experiments imply
that something might be data.
3) High quality data is running out or
it has already.
4) There's evidence that general LLM
progress has dramatically slowed recently,
increasing only linearly since 2020.
5) Code generation has orders of
magnitude less data than English,
and code requires more correctness.
And 6) we're already seeing evidence
of AI reducing general code quality.
And if all that turns out to be
correct, in many ways, that's
really good for business.
And it's good for us, the
developers.
So we've been living in a world
where no one really wanted to
invest in building
software products around LLMs that
would apply to business problems.
Because everybody believed that the
LLMs would be twice as good next
year. So they might be able to handle the
business problem on their own
and you would have wasted your time and money.
Once it becomes clear that the
growth is becoming just incremental,
then it will be time to apply LLMs
to real world problems.
And that means writing software to
interface between the LLMs and the
business issues.
Based on the code quality data from
GitClear, we're still going to need
plenty of programmers.
Those programmers might be able to
create code a little faster with AI,
but since the AI generated code
seems to require a lot of rework,
it might be fine for one-shot,
"let's get the answer and throw the
code away" problems,
but not for code that needs to be
maintained long-term.
So I'm guessing the next few years,
maybe three to five,
there are going to be a lot of
people that are going to be taking
the LLM models
and they're going to be wrapping
non-AI functionality around them
to specialize them for specific use
cases.
This is going to be a thing kind of
the way Devin,
or at least the open source
versions like Devika and OpenDevin
that work about as well as Devin,
but we can actually look under the
hood and see how they work.
Those systems are an LLM,
but with a browser module plugged
in and a terminal module and a
planning module
and a reporting module and some
expert systems around it
that kind of tell it what to do.
That kind of system can be applied
not just to,
you know, "let's pretend to be an AI
engineer,"
but it can also be applied to all
kinds of business problems.
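Sketching the shape of such a system (the names and the plan format here are hypothetical, and call_llm stands in for whichever model API you use):

```python
from typing import Callable

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # stand-in for a real model API call

class WrappedLLM:
    """An LLM with non-AI tool modules wrapped around it."""
    def __init__(self, tools: dict[str, Callable[[str], str]]):
        self.tools = tools  # e.g. {"browser": fetch_page, "terminal": run_cmd}

    def run(self, task: str) -> str:
        # Planning module: ask the LLM for tool calls, one "name: arg" per line.
        plan = call_llm(f"Plan tool calls for this task, one per line "
                        f"as 'tool: argument'.\nTask: {task}")
        results = []
        for line in plan.splitlines():
            name, _, arg = line.partition(":")
            if name.strip() in self.tools:  # only deterministic, vetted tools run
                results.append(self.tools[name.strip()](arg.strip()))
        # Reporting module: summarize what the tools actually did.
        return call_llm(f"Task: {task}\nTool outputs: {results}\nReport results.")
```

The point of the pattern is that the planning, gating, and reporting around the model are ordinary, testable code, not AI.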
The same way that 2008 to 2012 or
2014 something in there
was a lot about taking existing
stuff like websites
and then converting them to become
mobile apps.
2024 to 2027 or 2029, something like that
might be taking a lot of existing
things,
services, software, apps, websites,
that kind of thing
and reforming them around the
capabilities of LLMs.
It feels to me now a lot like 2008.
The first Apple App Store had just
been announced
and we thought we could make money
from it, but we weren't quite sure.
And the economy was kind of illiquid
because the mortgage bubble had
just popped
and we were in the beginnings of
the Great Recession.
At the moment, it seems like generative
AI is going to be a money maker,
but we're not quite sure and the
economy is kind of illiquid
because interest rates are still really high.
If and when that settles, we could
very well be off to the races.
Am I right?
No idea, but we should know soon.
We've heard a lot recently about
how ChatGPT-5 is supposed to be a
huge leap
and it's supposed to be released as
early as this summer.
There's a link to that in the notes
below. If we don't get an exponentially
better ChatGPT-5 this year,
then hopefully people will start
recognizing that it's time to
change from "let's wait for better LLMs" to
"let's take the current LLMs
and let's start applying it to real
world problems."
And if we can do that, that will be
a phase two that we can live with.
Remember, the internet is full of bugs.
Anyone who tells you different is
probably trying to sell you some
stupid AI service. Let's be careful
out there.