AI isn't gonna keep improving

Theo - t3.gg
7 Aug 2024 · 22:10

Summary

TLDR: The video script discusses the concept of hitting an 'AI Plateau,' suggesting that despite advancements in AI models like ChatGPT and Claude, we may be nearing a limit in performance improvements due to physical constraints, similar to the end of Moore's Law. It explores the idea that future progress in AI might not come from incremental improvements but from entirely new architectures and approaches. The script also touches on the environmental impact of AI and the need for a shift in focus towards more general AI methods that can scale effectively.

Takeaways

  • πŸš€ The AI industry might be reaching a plateau in terms of performance improvements, despite the rapid advancements in AI models like GPT-2 to GPT-4 and the emergence of open-source models.
  • πŸ“‰ The script references Moore's Law, which predicted the doubling of transistors on a microchip every two years, but acknowledges that this trend has slowed due to physical limitations in manufacturing processes.
  • 🏭 The manufacturing complexity for advanced chips is so high that only a few companies like TSMC can produce them, leading to a concentration in the industry and affecting the pace of innovation.
  • πŸ“Š A study suggests that the performance growth of processors has flattened, with the rate of improvement decreasing significantly compared with past decades.
  • πŸ’‘ The script discusses the potential of different architectures, like Apple's approach with specialized cores for efficiency and performance, as a way to circumvent the limitations of traditional CPUs.
  • 🎯 There's a call for a shift in focus from just increasing compute power to developing more general AI methods that can scale effectively, such as search and learning algorithms.
  • 🌐 The environmental impact and cost of AI research, which often requires massive amounts of compute resources, are becoming concerns as the field matures.
  • πŸ“ˆ Despite the plateau in performance, there's potential for AI to grow in new areas, like video generation, that are not yet fully explored and could provide the next wave of advancements.
  • πŸ” The script highlights the need for new benchmarks and methods to measure AI progress, as current models struggle with tasks that require general intelligence and adaptability.
  • 🚧 The future of AI may lie in hybrid models that combine the strengths of both handcrafted algorithms and AI, moving beyond the limitations of large language models (LLMs).

Q & A

  • What is the AI Plateau mentioned in the script?

    -The AI Plateau refers to the perceived slowing down of significant advancements in AI capabilities, despite the continuous release of new models. It suggests that we might be reaching a limit in how much AI can improve within the current paradigm.

  • What is Moore's Law and how does it relate to the script's discussion on AI?

    -Moore's Law is the observation that the number of transistors on a microchip doubles approximately every two years, leading to increased performance at a lower cost. The script discusses how this law has seemingly plateaued in recent years, which parallels the perceived plateau in AI improvements.
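
    In rough formula form (an illustrative restatement, not from the video): with t in years and N0 the starting transistor count, the observation is N(t) = N0 × 2^(t/2), with cost per transistor falling at a comparable rate.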

  • What is the significance of the flatlining in the performance graph presented in the script?

    -The flatlining in the performance graph indicates that the rate of improvement in AI performance has slowed down, suggesting that we may be reaching a physical or theoretical limit in how much further AI can advance using current technologies and methods.

  • What does the script suggest about the future of AI if we continue to rely on increasing compute power?

    -The script suggests that relying solely on increasing compute power may not yield the same exponential improvements in AI as it has in the past. Instead, it implies that innovative architectural changes and new methods will be necessary to continue AI's advancement.

  • What is the script's view on the comparison between CPUs and AI models in terms of hitting a plateau?

    -The script draws a parallel between CPUs, which have seen a slowdown in performance improvements due to physical limitations, and AI models, which seem to be reaching a similar plateau in terms of the improvements seen with each new generation.

  • What is the role of TSMC in the context of chip manufacturing as discussed in the script?

    -TSMC (Taiwan Semiconductor Manufacturing Company) is highlighted as one of the few companies capable of manufacturing chips at the smallest, most efficient scales. The script notes that reliance on TSMC by major tech companies signifies a bottleneck in innovation due to the complexity of these manufacturing processes.

  • What does the script imply about the relationship between the number of transistors and AI performance?

    -The script implies that historically, an increase in the number of transistors has been a measure of progress in processors and by extension, AI performance. However, it suggests that this measure may no longer be as indicative of progress due to the plateau in AI improvements.

  • What alternative to Moore's Law is proposed in the script, and how does it compare?

    -The script mentions an alternative law proposed in a study that attempts to account for the slowdown in performance gains. This alternative law is shown as a flatter line compared to Moore's Law, indicating a slower rate of improvement in recent years.

  • How does the script discuss the potential of different architectures for AI advancement?

    -The script suggests that different architectures, such as those being researched by companies like IBM for analog AI chips, could potentially offer new paths for AI advancement that bypass the limitations of current models and hardware.

  • What does the script suggest about the future of AI research and development?

    -The script suggests that the future of AI research and development may need to focus on creating and iterating on entirely new architectures and methods, rather than simply increasing the compute power or refining existing models.

  • What is the significance of the ARC AI prize mentioned in the script?

    -The ARC AI prize is significant as it represents a public competition aimed at encouraging new ideas and solutions in AI that can efficiently acquire new skills and solve open-ended problems, which the script argues is a more accurate representation of general intelligence.

Outlines

00:00

πŸš€ AI Development and the End of Moore's Law

The paragraph discusses the current state of AI development and draws a parallel to Moore's Law, which predicted the doubling of transistors on a microchip every two years. It highlights the rapid advancements in AI models like GPT-2 to GPT-4 and the emergence of Claude and open-source models like Llama and Mistral. The speaker then delves into the concept of Moore's Law, its historical significance, and how it led to significant performance improvements in computers. However, the paragraph notes that we've hit a wall in physics, making it difficult to continue this pace of improvement. The speaker points out that companies like TSMC are now the only ones capable of manufacturing chips at the scale required for the most efficient processors, and even they are not meeting the goals set by Moore's Law. The paragraph concludes with a discussion of how the performance gains in AI and computing have started to plateau, similar to the trend seen with Moore's Law.

05:00

πŸ” The Plateau in AI and CPU Performance

This paragraph explores the idea that AI and CPU performance improvements may be reaching a plateau. It discusses how companies like Nvidia admit to a slowdown in CPU performance gains, moving from significant yearly improvements to much smaller ones. The speaker uses their own experience with computer hardware to illustrate that the performance difference between a 2020 processor and a 2024 one is not as vast as historical improvements would suggest. The paragraph also touches on the benefits of this plateau, such as being able to purchase older processors at a lower cost without a significant loss in performance. It then transitions into a discussion about the potential for new architectures to break through the current limitations, mentioning IBM's research into analog AI chips as an example. The speaker also reflects on the release of new AI models like Mistral Large 2 and the thoughts of AI researchers on the future of AI development, suggesting that the focus should shift from incremental improvements to entirely new architectures.

10:02

🌐 The Future of AI and Compute

The paragraph delves into the future of AI and the limitations of traditional CPU manufacturing, suggesting that a new architecture is necessary for further improvements. It discusses the potential of GPUs for AI tasks and contrasts it with the exploration of new AI chip architectures by companies like IBM. The speaker also mentions the release of Mistral Large 2 and its capabilities, then compares the development of AI models to the evolution of CPUs, noting a similar trend of diminishing returns on performance improvements. The paragraph includes a tweet from Yann LeCun, emphasizing the need to look beyond large language models (LLMs) for the next generation of AI systems. It concludes with a discussion of Apple's approach to innovation in processor design and the potential for AI to follow a similar path, focusing on specialized tasks and efficiency rather than general performance gains.

15:03

πŸ“ˆ The Diminishing Returns of Compute in AI

This paragraph discusses the diminishing returns on compute power in AI, referencing a 2019 blog post by AI pioneer Rich Sutton. It contrasts the historical approach of encoding expert knowledge into AI systems with the more general methods that have proven more effective. The speaker points out that as we reach the limits of traditional CPU architecture, the focus should shift to more general AI methods that can scale, such as search and learning. The paragraph also touches on the environmental impact of AI's increasing compute demands and the concentration of AI research in large corporations that can afford the necessary resources. It concludes with a call for a change in approach, suggesting that the future of AI lies in finding new methods and architectures rather than simply increasing compute power.

20:05

🚧 The Stalled Progress of Artificial General Intelligence (AGI)

The final paragraph addresses the stagnation in progress towards Artificial General Intelligence (AGI) and the misconceptions surrounding it. It critiques the current benchmarks used to measure AI progress, arguing that they do not accurately reflect general intelligence. The speaker agrees with the definition of AGI as a system capable of efficiently acquiring new skills and solving open-ended problems, rather than just automating economically valuable work. The paragraph highlights the need for new ideas and benchmarks to measure true AGI progress, pointing to initiatives like the Arc Prize, which challenges the AI community to develop solutions that can beat open-source benchmarks. It concludes with a reflection on the plateau in AI development and the need to move beyond current models like LLMs to achieve further advancements in AI capabilities.

Keywords

πŸ’‘AI Plateau

The term 'AI Plateau' refers to a perceived slowdown in the rapid advancements of artificial intelligence capabilities. In the video, the speaker discusses the possibility that AI development might be reaching a point where significant improvements are becoming less frequent or less impactful. This concept is central to the video's theme, as it explores whether the pace of AI innovation is slowing down, despite the continuous introduction of new models and technologies.

πŸ’‘Moore's Law

Moore's Law is the observation that the number of transistors on a microchip doubles approximately every two years, which historically led to a corresponding increase in computational power. In the video, the speaker uses Moore's Law as an analogy to discuss the potential slowing down of AI progress. The script mentions how Moore's Law has faced physical limitations, which parallels the discussion on AI reaching a plateau.

πŸ’‘Transistors

Transistors are semiconductor devices used to amplify or switch electronic signals and electrical power. They are a fundamental component of modern electronic devices. The video script discusses how the doubling of transistors on microchips, as per Moore's Law, has historically driven performance improvements in computers. The speaker reflects on the current state of transistor technology in relation to AI's potential plateau.

πŸ’‘Manufacturing Processes

Manufacturing processes refer to the methods and techniques used to produce goods or components on a large scale. In the context of the video, the speaker mentions that the complexity of modern chip manufacturing processes has become a limiting factor in the continued advancement of Moore's Law. This ties into the broader discussion about the physical and practical barriers that could also be affecting AI development.

πŸ’‘Specialized Chips

Specialized chips are designed for specific tasks, as opposed to general-purpose processors. The video script points out that companies like Apple have embedded specialized chips in their processors to optimize certain functions, such as video encoding. This concept is used to draw a parallel with AI, suggesting that specialized architectures might be necessary to overcome the plateau in AI performance.

πŸ’‘LLMs (Large Language Models)

Large Language Models (LLMs) are AI models trained on vast amounts of text data to understand and generate human-like text. The video discusses how LLMs might be reaching their limits in terms of improvements, which is a central concern as it questions the future of AI progress. The script mentions models like GPT-2, GPT-3, and Claude, indicating a potential plateau in the performance gains of these models.

πŸ’‘General AI Methods

General AI methods refer to approaches that aim to create AI systems capable of handling a wide range of tasks without needing specific programming for each one. The video script suggests that focusing on general AI methods, rather than trying to encode expert knowledge into AI systems, might be a more fruitful path for AI development. This aligns with the video's exploration of how to overcome the AI plateau.

πŸ’‘Hype Cycle

A Hype Cycle is a graphical representation of the maturity and adoption of technologies, which typically includes a period of overenthusiasm followed by a period of disillusionment. The video uses the concept of the Hype Cycle to discuss the current state of AI, suggesting that AI might be entering a 'trough of disillusionment' where the reality of AI capabilities does not match the initial hype.

πŸ’‘AGI (Artificial General Intelligence)

Artificial General Intelligence refers to AI systems that possess the ability to understand, learn, and apply knowledge across a broad range of tasks at a human level. The video script contrasts AGI with the current state of AI, which is more narrowly focused on specific tasks. The speaker discusses the challenges in achieving AGI and how the current trajectory of AI development might not lead to it.

πŸ’‘Neuromorphic Computing

Neuromorphic computing is an approach to designing computer systems that mimic the human brain's structure and function. The video script briefly touches on this concept as a potential alternative to traditional computing architectures. It suggests that neuromorphic computing could be a way to overcome the limitations faced by current AI models and hardware.

Highlights

The discussion suggests we might be reaching an AI Plateau, despite advancements in AI models like ChatGPT and Claude.

Moore's Law, which predicted the doubling of transistors on microchips every two years, is no longer holding true due to physical limitations.

The manufacturing complexity of microchips has increased, with only a few companies like TSMC capable of producing the smallest, most efficient chips.

A study shows a flatlining of performance improvements in recent years, contrary to the steady growth seen from the 70s to the 2000s.

Nvidia's own data indicates a significant plateau in CPU performance improvements.

The value of older processors has increased as their performance remains competitive with newer, more expensive models.

The discussion points out that GPUs, while not the best for AI tasks, can still see performance improvements by adding more chips to their architecture.

IBM is researching analog AI chips, potentially offering a new architecture for AI that could surpass the limitations of traditional CPUs and GPUs.

Mistral, an open-source AI company, released Mistral Large 2, a model that significantly improves on code generation, mathematics, and reasoning.

The head of AI at Meta suggests that students interested in next-gen AI systems should look beyond Large Language Models (LLMs).

Apple's strategy of integrating specialized chips for tasks like video encoding shows a path forward for performance improvements.

The discussion predicts a future where AI models become commodities, with performance differences between models diminishing.

The transcript highlights the need for new AI methods that can scale, such as search and learning, rather than trying to encode human knowledge into AI systems.

The fast inverse square root algorithm used in Doom is cited as an example of how clever hacks, rather than raw compute power, can lead to significant advancements.

The Gartner Hype Cycle for AI shows that AI is currently in the 'trough of disillusionment', indicating that expectations may be outpacing realistic progress.

The discussion concludes that the future of AI may not lie in making existing models faster, but in developing entirely new types of AI architectures.

The Arc prize is introduced as a competition to encourage new ideas in AI, highlighting the need for AI to efficiently acquire new skills, not just demonstrate existing ones.

Transcripts

00:00

I got a hot take for y'all today: I think we might have hit an AI plateau. Well, we haven't hit it yet, but I think we're getting there fast. What do I mean? This can't be possible, right? When we look at all the new models and all the crazy things you can do with them, the improvements from GPT-2 to 3 to 3.5 to 4, with Claude coming out of nowhere and being really good, with open-source models like Llama and Mistral, we can't be at a plateau. That's crazy, right? Well, there are a lot of things that follow these patterns, and I want to start with a bit of an interesting tangent: I want to talk about Moore's Law. If you're not familiar, Moore's Law is an old concept from the computing world. It's a "law" that isn't a real law; it was speculation from a dev and a hardware enthusiast back in, what, the '70s, who noticed how fast things were improving in terms of performance. His observation was that the number of transistors on a microchip roughly doubles every 2 years, and the cost is halved over the same time frame. So if you had a chip that had, let's say, four transistors on it, within 2 years, with advancements in how we were manufacturing these chips, we'd get it up to eight transistors and it would be cheaper. And we did that over and over again and saw massive growth in the performance of our machines.
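
To make the compounding concrete, here is a minimal sketch of that doubling in C (the four-transistor chip and the cost units are just the toy numbers from the example above):

    #include <stdio.h>

    int main(void) {
        /* Moore's Law as described above: transistor count doubles
           every 2 years while cost halves over the same window. */
        long transistors = 4;   /* the toy starting chip */
        double cost = 100.0;    /* arbitrary starting cost units */
        for (int year = 0; year <= 10; year += 2) {
            printf("year %2d: %4ld transistors, cost %6.2f\n",
                   year, transistors, cost);
            transistors *= 2;
            cost /= 2.0;
        }
        return 0;
    }

Run for a decade, that is 32x the transistors at roughly 1/32 the cost, which is why upgrades felt so dramatic at the time.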

01:17

It was actually realistic for a bit: if you took a computer that you went and bought at Best Buy, then waited 2 to 4 years and bought a new one, the processor could be two times faster in a very short window. And it's crazy to think about, because now, if you buy an Apple M1 computer from 2020 and a brand-new top-of-the-line machine from 2024, the performance difference between those things is not that big. But back in the day we saw insane improvements year over year. We've started to hit walls with the physics, though. We realized you can only get so small with the silicon before you start running into manufacturing problems, and now the manufacturing processes for a lot of these things are so complex that there are only one or two companies that can even do it at the small sizes we're expected to hit. If you want to make the most efficient chips possible, ones that fit as many transistors into your die as possible, you have to do that through a company like TSMC, because they're one of the only places in the world that can manufacture that way. Companies like Intel, Apple, and Nvidia all rely on that one manufacturer, TSMC, which is still not hitting the Moore's Law goals, but they're the only ones even vaguely close. We have effectively accepted that Moore's Law, due to physics, is no longer true. Here we see a study where somebody proposed a new alternative to Moore's Law: the blue line is Moore's Law, the orange line is their alternative law, but the green is performance, and you'll notice things are flatlining pretty hard up here, when before they were going up at a relatively steady rate from the '70s to the 2000s. Even up to around 2015 things were going up pretty steadily, but we've started to see a flatline, and the harsh reality is that from 2020 onwards it's gotten worse, not better. That's terrifying. Obviously there are companies that disagree. Here is a diagram from Nvidia where they actually admit that the CPU performance we're seeing has plateaued pretty hard: we went from getting multiple giant wins to around 1.5x per year, down to like 1.1x per year.
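
Compounding those two rates shows how big that change really is; a quick sketch, using the 1.5x and 1.1x per-year figures quoted from Nvidia's diagram:

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        /* Compound the per-year CPU gains quoted above over a decade:
           the old ~1.5x/year era versus today's ~1.1x/year. */
        int years = 10;
        printf("1.5x/year over %d years: %5.1fx total\n", years, pow(1.5, years));
        printf("1.1x/year over %d years: %5.1fx total\n", years, pow(1.1, years));
        return 0;
    }

Over ten years that is roughly a 58x total improvement versus under 3x, which is exactly the flattening the chart shows.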

03:14

I'm streaming right now using a PC with hardware from... when did the 10700K release? Yeah, the processor in my desktop is from 2020. It's not even one of the high-end ones, and it doesn't perform much worse than the top-spec one in the new computer I just bought for a different room. Performance wins year over year have gotten way worse, even though the technology is still advancing. We're still making big wins in manufacturing, and Intel and AMD are as competitive as ever, but we're still not seeing massive wins anymore. This does have benefits, though: if you buy an old processor for way cheaper, you still get really good performance. You can buy a used MacBook Air M1 from, like, Walmart for $400 and have great performance on a machine that I paid two grand for not long ago, and those things are great. I know people are going to complain that this isn't about performance, it's about transistor count. We've used the number of transistors as a way to measure the progress of processors, and historically, if you had a big win in the manufacturing process, if you made the dies go from 10-nanometer transistors down to like 4 nanometers, you would see massive wins. This is rough. Obviously Nvidia is doing their thing here, claiming that GPU compute performance continues to grow. The fun thing with graphics cards is that they don't have the same model, with cores and the complexity of sharing things between them, because the cores in a GPU are significantly dumber. It's a different abstraction, which means you can just staple more and more GPUs onto each other to improve performance. You might end up with a GPU that's no longer a tiny little thing you slide into your computer but a giant room full of things; it's still one GPU because of the way the chips are architected. But the only way Nvidia is going to see these kinds of performance wins continuously is if they just add more and more chips to their actual architecture.

05:00

It's kind of cheating, but the reality is that with the tech we use today, which is traditional CPU manufacturing, we have hit a physics wall for how much improvement we can see, and the only way out of it, theoretically, is an entirely different architecture and way of building compute. Things that rely on this model will not benefit from these advancements as much, but anything that can work with a different model could theoretically continue to see growth. On that note, GPUs are not necessarily the best way to do AI stuff. Just a quick tie-in: I think it's interesting that IBM is researching analog AI chips, similar to the stuff we saw with Bitcoin back in the day, where you would mine Bitcoin with a GPU before ASICs were made, which were specialized computers built just to make Bitcoin mining as efficient as possible. We're starting to see some research into doing this for AI as well, which is exciting. Potentially GPUs aren't the right architecture for AI, and we can see advancements; these chips, once they work, will probably advance significantly faster than CPUs or GPUs.

So why am I talking about all of this when I'm talking about models? Hell, why am I even talking about models? I saw a very interesting post from Mistral. Mistral is one of the two big open-source AI businesses; it's them and, funny enough, Meta. So Meta (Facebook) is working on Llama, which is their open-source model. It's technically not open source because you can't run the code yourself, but you get the model and you can use it however you want. Mistral is doing the same thing, and they just released Mistral Large 2, the new generation of their flagship model. Compared to its predecessor, Mistral Large 2 is significantly more capable in code generation, mathematics, and reasoning. It also has stronger multi-language support and function-calling support. Cool. The key here is "large enough." This made me start thinking a lot about the plateau we're likely reaching, and I'm not the only one thinking about this. Here's a tweet from Yann LeCun, who is the head of AI and LLM research at Facebook/Meta; he's one of the people most directly responsible for the creation of Llama. He said: if you're a student interested in building the next generation of AI systems, don't work on LLMs. What? LLMs are how all of these things work! Well, let's rephrase this: "If you're a student interested in building the next generation of computers, don't work on CPUs," or "don't work at Intel." It's obvious when you look at the numbers that iteration on CPUs is not going to be where we see massive performance wins and massive computation wins going forward. Different architectures will have to be invented and iterated on for us to see meaningful improvements in performance year over year.

Apple does this in all sorts of interesting ways. One of the crazy things Apple introduced was the idea of having different cores with different roles, so you had efficiency cores that try to use as little power as possible to do simple things, and then performance cores that use way more power but are quite a bit more powerful. They also started embedding things like video processing and video encoding chips that just do H.264/H.265 decoding and encoding way more efficiently. Apple started adding things to their processors that weren't just CPUs and also weren't just GPUs, in order to optimize specific things so they could keep seeing massive performance wins. I think this is the future for AI as well, and I have a reason: I have a very similar chart to this one. Notice how much smaller the wins are getting. Claude saw another solid one with Sonnet 3.5, but the gap from GPT-4 Turbo to Turbo 2 to 4o is a lot smaller than from 4 to 4 Turbo, and way smaller than from 3 to 4. Claude 1 to 2 to 3 saw some massive wins, but those are starting to slow down as well. We're seeing a plateau in the quality of the responses these models are generating, and it's not like going from 4 to 4 Turbo to 4o was less work than going from 3.5 to 4. If anything, there is more money, more time, more GPUs, more effort going into these bumps, and the actual bump we're seeing is going down. So each of these iterations takes more money, more time, more compute, more energy, and the results are not as big as they used to be. I know a lot of people are saying the AI future is going to doom us all because the AIs keep getting so much smarter and eventually they're going to be smarter than all of us. I don't see that here. I don't see that here at all. What I see is a theoretical ceiling that we're getting very close to, and a closing of the gap in performance between these different models. More and more, these options are going to become commodities, the same way you have like 15 different computer manufacturers making the same Windows laptop with roughly the exact same performance. We're starting to see that here too. I have to read a LinkedIn post, which, I know: pain, cringe, miserable. So I'm going to soften the blow with an xkcd first. This one was linked in chat and I thought it was really funny: the number of computers created is going up a lot year over year, in fact I think it's going up exponentially, but the number destroyed by hurling them into Jupiter is a much smaller number; it's only three so far. NASA needs to pick up the pace if they ever want to finish the job.

10:00

Yeah, if they ever want to catch up, they've got work to do. It's a fun way to think about data in these ways, how the compute changes over time. Anyway, "The Bitter Lesson," the famous 2019 blog post, claims that general AI methods using massive compute are the most effective. Nvidia's soaring stock price supports the thesis, but is this approach sustainable, and what are the alternatives? In the original blog post, AI pioneer Rich Sutton makes the following observations: over the last 70 years, AI researchers have repeatedly made the same mistake of trying to bake human knowledge into AI systems, only to be eventually outperformed by a more general method using brute-force compute. This is funny, because we're seeing the opposite in processors now, where processors were trying to just increase how many transistors were in them and how fast they could solve problems, and now we're seeing specialized chips embedded in the processors that do certain things way better. Some prominent examples of what was happening before with models were custom chess and Go engines versus Deep Blue and AlphaZero. This was a fun one: Go (the board game, not the programming language) was really hard for software developers to solve, because the game has so many different possibilities that you can't just encode all of them and then figure out which is optimal. And we learned, after trying to make custom engines for these things, that AI solutions like Deep Blue and AlphaZero, which were more generic, more traditional AI, did a better job than the custom code we wrote. It took hilariously more compute to do it, like hundreds of times more, but the results were always better. The main reasons for this are the following: building in expert knowledge is personally satisfying for the experts and often useful in the short term. It's a very good point; if you have experts that know this game really well, or know video encoding really well, they can flex their knowledge, feel useful, and see an immediate result, all of which feels good. On top of that, researchers tend to think in terms of a fixed availability of compute, when it's actually increasing daily. This is also a fair point: yes, the amount a given processor improves year over year has gone down, but the number of processors you have available is going up, especially with Nvidia going insane with their manufacturing. Sutton concludes that we should focus on general AI methods that can continue to scale, most notably search and learning. We should stop trying to bake the contents of the human mind into AI systems, as they are too complex, and instead focus on finding meta-methods that can capture this complexity themselves. Some of the important things that people pointed out are that Moore's Law is fading; that the architectures of our most successful learning models were actually carefully handcrafted by humans, like Transformers, ConvNets, LSTMs, etc.; and that for general computation problems like integer factorization, progress based on human understanding was often far greater than progress according to Moore's Law.

Another great point: we're still optimizing algorithms in ways that we never would have imagined possible before. One that I love to cite here is the fast inverse square root, which was used in Doom in order to handle lighting, reflections, and rendering, because knowing the inverse square root lets you know how far something is relative to multiple points, and it's used a ton for doing math in games. Previously, getting this number, the inverse square root, took a lot of compute, and as such the idea of 3D games was basically impossible. But someone discovered a math hack they didn't even understand at the time. The fast inverse square root function that was in this code base had "evil floating point bit level hacking" (that's the comment here): this weird bit shift where they take this random hardcoded value and subtract the bit-shifted long representation of y; the next comment is "what the...", and then "1st iteration," where we multiply it by three-halves in this function here, and we could run it again if we wanted to be more accurate. A 3D graphics program is supposed to perform millions of these calculations every second to simulate lighting, and when the code was developed in the early '90s, most floating-point processing power lagged the speed of integer processing. So yeah, if you were trying to do this with floating point, which everyone was, it would eat your processor. The advantage in speed in this fast function came from treating the 32-bit floating-point word as an integer and subtracting it from a magic constant. This integer subtraction and bit shift resulted in a bit pattern which, when redefined as a floating-point number, is a rough approximation of the inverse square root of that number.
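
For reference, this is the widely circulated version of the routine being described, under its traditional Q_rsqrt name, lightly modernized: the original used a then-32-bit "long" and pointer casts, alongside the "evil floating point bit level hacking" and "what the..." comments quoted above.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* The fast inverse square root described above: reinterpret the
       float's bits as an integer, shift and subtract from a magic
       constant for a rough first guess, then refine it with one
       Newton-Raphson step. */
    float Q_rsqrt(float number)
    {
        const float threehalfs = 1.5F;
        float x2 = number * 0.5F;
        float y = number;
        int32_t i;
        memcpy(&i, &y, sizeof i);            /* the bit-level hack */
        i = 0x5f3759df - (i >> 1);           /* magic constant minus the bit-shifted value */
        memcpy(&y, &i, sizeof y);            /* now a rough approximation of 1/sqrt(x) */
        y = y * (threehalfs - (x2 * y * y)); /* "1st iteration"; the original left an
                                                optional 2nd iteration commented out */
        return y;
    }

    int main(void)
    {
        printf("Q_rsqrt(4.0) = %f (exact: 0.5)\n", Q_rsqrt(4.0f));
        return 0;
    }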

14:06

This function, this crazy math hack, allowed us to add dynamic lighting to 3D games. This wasn't something we got because processors were way more powerful; it was a clever hack that effectively allowed us to invent a new genre of game. Pretty nuts. It's pretty crazy that this enabled as much as it enabled, because somebody came up with a clever math hack that's not even that accurate; it's just accurate enough. So, as is said here, the wins we saw in compute, the revolution in 3D games that we saw after that code came out and people started using the engine, that wasn't because GPUs or CPUs got way better; it's because our understanding of how to use them to do these specific things got better. We saw massive wins not because the CPU got way faster, but because we found smarter ways to use it. And I think this is going to be true now more than ever: in the same way we're reaching the cap of how much you can do with a CPU, we're reaching the cap of how much you can do with an LLM.

15:03

Companies like OpenAI show that focusing on more compute may still lead to massive gains, as compute power, despite the warnings around Moore's Law, continues to increase several orders of magnitude over the next decades. I don't necessarily agree. Currently the hype is definitely outpacing Moore's Law, see the image below. As a result, AI is at risk of creating a deep environmental footprint, and research is increasingly restricted to large corporations that can afford to pay for the compute; it's a bitter lesson of the last year. Yeah, this is a fun one: Moore's Law versus AI popularity. But again, Moore's Law is plateauing, and AI is now way more popular than what Moore's Law enables, so we're just spending billions on GPUs. I found a surprisingly good chart from Gartner, believe it or not: the hype cycle for artificial intelligence. Hype cycles are very common. Take this particular chart, the startup hype cycle: an idea happens, we have a spike of excitement, then the first valley of death happens where you realize this is hard. You go hard, you go really hard, you get inflated expectations, irrational exuberance, and then pain. You end up in this thing called the trough of disillusionment, where you're unsure of everything, then the slow slope of reality as you figure out what you're actually capable of and what your product, company, vision, whatever it is, could actually resolve to, and then you hit the real company and real value. So back to the Gartner chart. It's funny, they have all these examples in here: first-principles AI, multi-agent systems, neuro-symbolic AI, more and more things happening, and we start getting into generative AI. And then we hit a massive point where we realized we needed more optimization: things like synthetic data, better model optimization, AI that is "on the edge," so to speak, so it runs on our phones instead of on the servers, knowledge graphs. But you notice we're going down, because these things aren't fun; these things suck, and they're necessary for us to keep evolving. Then we started seeing AI maker and teaching kits to try to get people to actually learn, and autonomous vehicles, which were very painful and still are; cars that drive themselves are far from functioning. But now we're seeing more and more things that will hopefully allow us to really benefit from AI, and we need to make sure our expectations are realistically set: not around exponential growth every year, but rather around how we apply the functionality of these things to actually benefit our lives day in and day out. I am honestly just annoyed that people pretend the models are going to get two times better every couple of years, because we went through that and it's clearly over; we're just not seeing level-ups like that anymore. What I'm expecting us to see instead is massive wins in things we're not currently using models for. Like, we're starting to see video generation catch on; it's taking us a lot of time to get there, but I could see us growing there really quickly, similar to how ChatGPT got way better really quickly. But it will also hit a plateau, and I think we're going to see more and more of those plateaus hit. Our solution isn't going to be "magically make it better"; it's going to be entirely different models, and hybrids where we take advantage of handwritten and crafted code, maybe human massaging of things, and AIs, intermingling and mixing those the same way CPUs and GPUs take turns working on things depending on what each is best at. Handwritten code and AI code doing similar stuff has a ton of potential, and I think that's going to be the future of AI. Because this? This is not the future of AI. This is a flatline, this is a plateau, this is us reaching the end, not the beginning. And if Mistral is saying their model is "large enough," I'm inclined to agree, especially when you look at the numbers here and see how close all of these models are getting to being basically even. The wins are no longer one model being way better than the others; the wins are going to be efficiency, performance, speed of responses, and then the next set of wins is going to be how we use these things in new and unique ways.

This is actually a very interesting link: there's a project called the ARC Prize that was just linked from chat. "AGI progress has stalled. New ideas are needed." It's a million-dollar public competition to beat an open-source solution to the ARC-AGI benchmark. Most AI benchmarks measure skill, but skill is not intelligence; general intelligence is the ability to efficiently acquire new skills. Chollet's unbeaten 2019 Abstraction and Reasoning Corpus for artificial general intelligence is the only formal benchmark of AGI. It's easy for humans, but it's hard for AI. Oh, this is fun; this is going to be like captchas, basically. So we have these patterns, an input and an output. It's pretty clear what we do here: configure your output grid there, and then we have to put the dark blues here, here, here, and here. Submit. Fun. So the point here is that these are the types of puzzles we can intuit: we look at the pattern and we can learn quickly what the pattern is. With these, it looks like the light blue is ignored, red has the outward pattern, and then dark blue has the pattern with the T-shape.
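
To give a feel for the format: an ARC task is a handful of input/output grid pairs, where colors are small integers, plus a test input, and the solver has to infer the transformation. Here is a toy sketch in C; the grids and the mirror rule are invented for illustration, not taken from a real ARC task:

    #include <stdio.h>

    #define N 3

    /* Candidate rule for a toy ARC-style task: mirror the grid
       left-to-right. Real tasks hide far less obvious rules. */
    static void apply_rule(const int in[N][N], int out[N][N]) {
        for (int r = 0; r < N; r++)
            for (int c = 0; c < N; c++)
                out[r][c] = in[r][N - 1 - c];
    }

    static int grids_equal(const int a[N][N], const int b[N][N]) {
        for (int r = 0; r < N; r++)
            for (int c = 0; c < N; c++)
                if (a[r][c] != b[r][c])
                    return 0;
        return 1;
    }

    int main(void) {
        /* One invented training pair (0 = black, 2 = red, 8 = light blue). */
        int train_in[N][N]  = {{2, 0, 0}, {2, 8, 0}, {2, 0, 0}};
        int train_out[N][N] = {{0, 0, 2}, {0, 8, 2}, {0, 0, 2}};
        int predicted[N][N];

        apply_rule(train_in, predicted);
        printf("mirror rule %s this training pair\n",
               grids_equal(predicted, train_out) ? "matches" : "misses");
        return 0;
    }

A solver that generalizes would have to propose rules like this on its own and verify them against every training pair before answering the test input.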

20:02

But AI is historically really bad at solving these types of things. So here's the ARC-AGI progress. If we look at other AI benchmarks that people use, a lot of the ones we were looking at earlier like HellaSwag and ImageNet, it seems like things are improving at an insane rate; but when you look at general intelligence through a benchmark like this, AI sucks at it. Progress towards artificial general intelligence has stalled. LLMs are trained on unimaginably vast amounts of data, yet they remain unable to adapt to simple problems they haven't been trained on or make novel inventions, no matter how basic. Strong market incentives have pushed frontier AI research to go closed-source; research attention and resources are being put towards a dead end. You can change that. I like that they call out that the consensus definition of AGI, "a system that can automate the majority of economically valuable work," is wrong, and that in reality AGI is a system that can efficiently acquire new skills and solve open-ended problems. Yes, that's what the "general" in AGI stands for. I actually fully agree with this callout. Definitions are important, because we turn them into benchmarks to measure progress towards AGI. I fully agree. I love that Nat Friedman, the old CEO of GitHub, is one of the advisers. We also have Mike Knoop, who's an absolute legend who's been involved in all things software dev and AI for a very long time. Yeah, I love this, and I think this is the only way we're going to really see improvements and wins with AI. LLMs are hitting their limitations, and as we saw here, they're not really winning on general benchmarks like this. Sure, we have these fancy benchmarks that everybody loves, but even on those we're starting to see a flatline and a plateau. We might be at the end of the LLM revolution, and if we want to see AI continue to grow and advance in its capabilities, we might have to leave LLMs behind the same way we're starting to leave CPUs behind. The future isn't an LLM but faster; if we want the future to be AI, it has to be a different type of AI. Let me know what you guys think, and tell me all the ways I'm wrong. Until next time, peace nerds.


Related Tags
AI Plateau, Computing Future, Moore's Law, AI Models, Tech Innovation, Hardware Limits, Software Advancements, General AI, Performance Trends