The Fastest Way to AGI: LLMs + Tree Search – Demis Hassabis (Google DeepMind CEO)

Dwarkesh Patel
27 Feb 2024 · 05:14

Summary

TL;DR: The speaker believes that large language models will likely form a key component of future AGI systems, but additional planning and search capabilities will need to be layered on top to enable goal-oriented behavior. He argues that leveraging existing data and models as a starting point will allow quicker progress than a pure reinforcement learning approach. However, specifying the right objectives and rewards remains an open challenge when expanding beyond games to real-world applications. Overall, the path forward involves improving predictive models of the world and combining them with efficient search to explore massive spaces of possibilities.

Takeaways

  • 😀 AGI will likely require large language models as a component, but additional planning and search capabilities will also be needed on top.
  • 👍 Using existing knowledge and data to bootstrap learning in models seems more promising than learning purely from scratch.
  • 😮 More efficient search methods can reduce compute requirements; improving the world model allows more effective search.
  • 🤔 Defining the right rewards and goals is challenging for real-world systems compared to games with clear winning conditions.
  • 🚀 There is great potential in adding tree-search planning capabilities on top of large language models (see the sketch after this list).
  • 🔢 Combining scalable algorithms like Transformers to ingest knowledge, with search and planning holds promise.
  • 😥 Current large models may be missing the ability to chain thoughts or reasoning together with search.
  • 📈 Keep improving large models to make them better world models - a key piece.
  • 👀 Some are exploring building all knowledge purely from reinforcement learning, but this may not be the fastest path.
  • 💡 The final AGI system will likely involve both learned prior knowledge and new on-top mechanisms.
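One of the takeaways above points at tree-search planning over LLM outputs; below is a minimal sketch of what that might look like in practice. Everything in it is a hypothetical stand-in: `generate_candidates` represents sampling a few continuation steps from an LLM, and `score_state` represents a value or reward model judging how promising a partial chain of reasoning is. Neither corresponds to a real API.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Node:
    neg_score: float                       # heapq is a min-heap, so we store the negated score
    steps: list = field(compare=False)     # the chain of reasoning steps built so far

def generate_candidates(steps, k=3):
    """Placeholder for sampling k candidate next reasoning steps from an LLM."""
    return [steps + [f"step {len(steps)}.{i}"] for i in range(k)]

def score_state(steps):
    """Placeholder for a value/reward model scoring how promising a partial chain is."""
    return 0.1 * len(steps)                # trivially prefers longer chains; a real system would call a model

def tree_search(goal, max_expansions=20, branch=3, max_depth=5):
    """Best-first search over chains of model-generated steps toward a goal."""
    root = Node(-score_state([goal]), [goal])
    frontier, best = [root], root
    for _ in range(max_expansions):
        if not frontier:
            break
        node = heapq.heappop(frontier)     # expand the most promising chain so far
        if len(node.steps) >= max_depth:
            continue
        for child_steps in generate_candidates(node.steps, branch):
            child = Node(-score_state(child_steps), child_steps)
            heapq.heappush(frontier, child)
            if child.neg_score < best.neg_score:
                best = child
    return best.steps

print(tree_search("answer the question"))
```

In a real system the scoring model, not the search loop, carries most of the weight: the better it judges partial chains, the fewer branches the search needs to expand, which is the model-quality versus search-efficiency tradeoff the interview returns to later.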

Q & A

  • What systems has DeepMind pioneered that can think through different steps to achieve a goal?

    -DeepMind has pioneered systems like AlphaZero that can think through different possible moves in games like chess and Go to try to win the game. It uses a planning mechanism on top of a world model to explore massive spaces of possibilities.

  • What does the speaker believe are the necessary components of an AGI system?

    -The speaker believes the necessary components of an AGI system are: 1) Large models that are accurate predictors of the world, 2) Planning mechanisms like AlphaZero's that can make concrete plans to achieve goals using the world model (a minimal sketch of this pattern follows the Q&A list), and 3) Possibly search algorithms to chain lines of reasoning and explore possibilities.

  • What potential does the speaker see for AGI to come from a pure reinforcement learning approach?

    -The speaker thinks theoretically it's possible for AGI to emerge entirely from a reinforcement learning approach with no priors or data given to the system initially. However, he believes the quickest and most plausible path is to use existing knowledge and scalable algorithms like Transformers to ingest information to bootstrap the learning.

  • How can systems like AlphaZero be more efficient in their search compared to brute force methods?

    -By having a richer, more accurate world model, AlphaZero can make strong decisions by searching far fewer possibilities than brute force methods that lack an accurate model. This suggests improving the world model allows more efficient search.

  • What challenge exists in defining reward functions for real-world systems compared to games?

    -Games have clear reward functions like winning the game or increasing the score. But specifying the right rewards and goals in a general yet specific way for real-world systems is more challenging.

  • What benefit did DeepMind gain from using games as a proving ground for its algorithms?

    -Games provided an efficient research domain with clearly defined reward functions in terms of winning or scoring. This made them ideal testbeds before tackling real-world complexity.

  • How might search mechanisms explore the possibilities generated by large language models?

    -They could chain together lines of reasoning produced by the LLMs, using search to traverse trees of possibilities originating from the models' outputs.

  • Do LLMs have inherent goals and rewards driving their behavior?

    -No, LLMs themselves don't have inherent goals and rewards. They produce outputs based on their training, requiring search/planning mechanisms and predefined goals/rewards to drive purposeful, goal-oriented behavior.

  • What role might hybrid systems play in developing AGI?

    -The speaker believes hybrid systems combining large models with search, planning, and reinforcement learning components may provide the most promising path to developing AGI.

  • Why does the speaker believe starting from existing knowledge will enable quicker progress towards AGI compared to learning 'tabula rasa'?

    -Starting tabula rasa forgoes all the progress made in collecting knowledge and developing algorithms for processing it. Building on top of this using hybrid approaches allows bootstrapping rather than starting from scratch.
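To make the recurring "planning mechanism on top of a world model" idea a little more concrete, here is a minimal sketch of the simplest version of that recipe: UCB-guided action selection at a single root node, where every evaluation queries a learned model instead of the real environment. The `WorldModel` class, its `predict` method, and the random values it returns are hypothetical placeholders, not AlphaZero's actual implementation; a full AlphaZero-style planner would expand a deeper tree of positions in the same model-queried way.

```python
import math
import random

class WorldModel:
    """Hypothetical learned model: predicts a next state and a value estimate for an action."""
    def predict(self, state, action):
        # Stand-in for a learned network; a real system would return a predicted
        # next state and a value-head estimate rather than random noise.
        next_state = state + (action,)
        value = random.random()
        return next_state, value

def plan(model, root_state, actions, simulations=100, c_uct=1.4):
    """Pick the action with the best average predicted value after UCB-guided simulations."""
    visits = {a: 0 for a in actions}
    total_value = {a: 0.0 for a in actions}
    for sim in range(1, simulations + 1):
        def ucb(a):
            # UCB1 rule: trade off the mean value seen so far against how rarely
            # the action has been tried.
            if visits[a] == 0:
                return float("inf")
            mean = total_value[a] / visits[a]
            return mean + c_uct * math.sqrt(math.log(sim) / visits[a])
        action = max(actions, key=ucb)
        _, value = model.predict(root_state, action)   # query the model, no real environment step
        visits[action] += 1
        total_value[action] += value
    return max(actions, key=lambda a: total_value[a] / max(visits[a], 1))

best = plan(WorldModel(), root_state=(), actions=["a", "b", "c"])
print("chosen action:", best)
```

The point of the sketch is the division of labour the speaker describes: the search loop supplies goal-directed behavior, while all knowledge about consequences lives in the model it queries.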

Outlines

00:00

🤔 How additional planning and search mechanisms could make LLMs more capable

This paragraph discusses how large language models (LLMs) are good at predicting the world but likely insufficient on their own for artificial general intelligence (AGI). Additional planning and search mechanisms modeled after AlphaZero could allow LLMs to chain thoughts, reason through possibilities, and achieve goals, making them more capable. Combining LLMs with planning on top is likely the quickest path to AGI.

05:02

😅 The challenge of defining reward functions and goals for real-world AI systems

This paragraph identifies defining the right reward functions and goals as a key challenge when developing real-world AI systems beyond games. Games provide clear reward functions around winning, scoring points, and so on. Specifying rewards and goals that point a system in the right direction in the real world is far less straightforward.
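A minimal way to see why games are the easy case: their reward function can be written in a couple of lines, as in the illustrative sketch below, whereas the real-world analogue ("summarize this document well", "run this process safely") has no such closed form. The code is a generic Python illustration, not any particular system's reward definition.

```python
def game_reward(winner, player):
    """Board games: the reward is unambiguous and comes for free with the rules."""
    if winner is None:
        return 0.0                      # draw
    return 1.0 if winner == player else -1.0

def real_world_reward(outcome):
    """Real-world tasks: 'did this go well?' has no built-in scoring rule.
    Any hand-coded proxy (clicks, completion flags, human ratings) risks being
    gamed or pointing the system in a subtly wrong direction."""
    raise NotImplementedError("specifying this is the open research problem")
```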

Keywords

💡AlphaZero

AlphaZero is an AI system created by DeepMind that mastered the games of chess, shogi and Go, achieving superhuman performance with no domain knowledge except the rules. It serves as an example of advanced AI planning and search capabilities that could be built on top of large language models to make them more capable of reasoning, exploring possibilities, and achieving goals.

💡world models

World models refer to internal models that AI systems build to represent beliefs, concepts and knowledge about the world. The script argues that large language models alone are likely insufficient for AGI, but they could serve as an essential component by providing sufficiently accurate world models onto which planning and search capabilities can be layered.

💡planning mechanisms

Planning mechanisms allow AI systems to conceptualize and evaluate different possible sequences of actions to achieve a desired goal. The script suggests planning mechanisms similar to those used by AlphaZero could be built on top of large language models to chain lines of reasoning and explore massive spaces of possibilities.

💡Reinforcement Learning

Reinforcement learning is a machine learning approach centered around an agent learning by interacting with an environment and receiving feedback in the form of rewards and punishments. The script discusses whether AGI could arise solely from a pure reinforcement learning approach rather than by combining learned priors from large language models with search and planning.

💡sample efficiency

Sample efficiency refers to the ability of a machine learning system to learn effectively from a limited number of examples. The speaker notes that DeepMind focuses on sample efficient methods that can reuse existing data rather than learning purely from scratch.
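Experience replay, mentioned in the interview as one route to sample efficiency, amounts to storing past interactions and training on random batches of them instead of discarding each transition after a single use. Below is a minimal, generic replay-buffer sketch; the field names are illustrative, not tied to any DeepMind codebase.

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores past (state, action, reward, next_state, done) transitions for reuse in training."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)   # oldest transitions drop out automatically

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Random sampling breaks the temporal correlation of consecutive steps,
        # which is what lets the same data be reused across many gradient updates.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

buf = ReplayBuffer()
buf.add(state=0, action=1, reward=0.5, next_state=1, done=False)
print(buf.sample(1))
```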

💡objective function

The objective function, or reward function, defines the goal an AI agent should achieve. Games have clear objective functions, like winning, that reinforcement learning agents can optimize for. Defining the right objective functions for real-world systems is noted as a key challenge.

💡proof of concept

Games serve as a proof of concept for AI and machine learning approaches before tackling messier, harder-to-define real-world problems. They allow efficient testing and iteration with unambiguous reward signals such as scores.

💡brute force

Brute force refers to solving problems by exhaustively searching all possible solutions. Traditional game-playing systems like Deep Blue relied on brute force search to evaluate millions of possible moves. In contrast, AlphaZero plays at superhuman levels while evaluating far fewer moves due to its learned knowledge.
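To put rough numbers on the brute-force versus learned-model contrast described in the interview (millions of positions for Deep Blue-style search, tens of thousands for AlphaZero, a few hundred for a human grandmaster), the toy calculation below shows how quickly exhaustive lookahead explodes and how aggressively a learned policy must prune to stay within the smaller budgets. The branching factor and depths are illustrative assumptions, not measured values.

```python
def positions_examined(branching_factor, depth):
    """Exhaustive lookahead: every legal move at every ply is expanded."""
    return branching_factor ** depth

def pruned_positions(top_k, depth):
    """Policy-guided lookahead: only the top-k moves suggested by a learned model are expanded."""
    return top_k ** depth

# Chess has roughly 30-35 legal moves per position on average.
print(positions_examined(35, 5))   # ~52 million positions for 5 plies of brute force
print(pruned_positions(4, 5))      # ~1,000 positions if a model narrows each ply to 4 candidates
```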

💡tradeoffs

There are tradeoffs in AI system design, for example between the sophistication of planning algorithms and learned models. The script argues that improving world models can allow for more efficient search, reducing the computation needed to reason through possibilities.

💡general intelligence

The overarching theme is pathways towards artificial general intelligence (AGI) - AI systems with flexible learning and reasoning capabilities that can handle a wide range of tasks. Large language models are viewed as a key building block but likely insufficient on their own without additional search, planning and reasoning modules.

Highlights

LLMs are necessary but probably not sufficient components of an AGI system

Planning mechanisms like AlphaZero could be built on top of LLMs to achieve goals and chain reasoning

LLMs currently lack the search capabilities to explore possibilities like AlphaZero does

The most likely path to AGI is using all available knowledge to pre-train Transformers, then adding planning and search on top

The final AGI system will likely combine LLMs and planning/search mechanisms

In theory AGI could emerge from a pure reinforcement learning approach but using existing knowledge is faster

Better world models allow more efficient search, like AlphaZero beating humans while searching far fewer positions

There is a tradeoff between model quality and search efficiency

Specifying rewards and goals is challenging in real-world systems compared to games

Games provide easy reward specification which aids AI research

The objective function and rewards are key challenges in developing real-world AI systems

Planning mechanisms could be built on top of LLMs to explore possibilities

Combining LLMs and search/planning is the most likely path to AGI

Better world models enable more efficient search

Specifying rewards is easier in games than in real-world systems

Transcripts

00:00

Dwarkesh Patel: DeepMind has been at the frontier for many years with systems like AlphaZero and so forth, building agents that can think through different steps to get to an end outcome. Is the path for LLMs to have this sort of tree search on top of them? How do you think about this?

00:16

Demis Hassabis: I think that's a super promising direction, in my opinion. We've got to carry on improving the large models, basically making them more and more accurate predictors of the world, in effect making them more reliable world models. That's clearly a necessary, but I would say probably not sufficient, component of an AGI system. Then, on top of that, we're working on AlphaZero-like planning mechanisms that make use of that model in order to make concrete plans to achieve certain goals in the world, and perhaps chain thoughts or lines of reasoning together, and maybe use search to explore massive spaces of possibility. I think that's kind of missing from our current large models.

01:02

Dwarkesh Patel: Is there any potential for AGI to eventually come from just a pure RL approach? The way we're talking about it, it sounds like the LLM will form the right prior and then this sort of search will go on top of that. Or is there a possibility it comes completely out of that?

01:16

Demis Hassabis: Theoretically, I think there's no reason why you couldn't go full AlphaZero-like on it, and there are some people here at Google DeepMind and in the RL community who work on that: fully no priors, no data, just building all knowledge from scratch. I think that's valuable, because those ideas and algorithms should also work when you have some knowledge too. But having said that, my betting would be that by far the quickest and most plausible way to get to AGI is to use all the knowledge that exists in the world right now, on things like the web and that we've collected, together with these scalable algorithms like Transformers that are capable of ingesting all of that information. I don't see why you wouldn't start with a model as a kind of prior, to build on and to make predictions that help bootstrap your learning. I just think it doesn't make sense not to make use of that. So my betting would be that the final AGI system will have these large multimodal models as part of the overall solution, but they probably won't be enough on their own. You will need this additional planning and search on top.

02:30

Dwarkesh Patel: How do you get past the immense amount of compute these approaches tend to require? Even the AlphaGo system was a pretty expensive system, because you had to run a model on each node of the tree. How do you anticipate that will be made more efficient?

02:45

Demis Hassabis: Well, we focus a lot on sample-efficient methods and reusing existing data, things like experience replay, and also on looking for more efficient approaches generally. The better your world model is, the more efficient your search can be. One example I always give is AlphaZero, our system for playing Go, chess, and really any game. It's stronger than human world-champion level at all these games, and it uses a lot less search than a brute-force method like Deep Blue, say, at chess. Deep Blue, or one of these traditional Stockfish-style systems, would maybe look at millions of possible moves for every decision it's going to make. AlphaZero and AlphaGo looked at around tens of thousands of possible positions in order to decide what to move next. But a human grandmaster, a human world champion, probably only looks at a few hundred moves, even the top players, in order to make their very good decision about what to play next. That suggests the brute-force systems don't have any real model other than heuristics about the game, AlphaZero has quite a decent model, but top human players have a much richer, much more accurate model of Go or chess, which allows them to make world-class decisions with a very small amount of search. So I think there's a sort of tradeoff there: if you improve the models, then your search can be more efficient, and therefore you can get further with your search.

04:17

Dwarkesh Patel: I have two questions based on that. The first: with AlphaGo you had a very concrete win condition, at the end of the day do I win this game of Go or not, and you can reinforce on that. When you're thinking of an LLM putting out thoughts, do you think there will be this kind of ability to discriminate, in the end, whether that was a good thing to reward or not?

04:37

Demis Hassabis: Well, of course, that's why we pioneered, and DeepMind is somewhat famous for, using games as a proving ground, partly because it's efficient to do research in that domain, but also because it's extremely easy to specify the reward function: winning the game or improving the score, something like that, which is built into most games. That is one of the challenges of real-world systems: how does one define the right objective function, the right reward function, and the right goals, and specify them in a way that is general yet specific enough that it actually points the system in the right direction?