The Fastest Way to AGI: LLMs + Tree Search – Demis Hassabis (Google DeepMind CEO)

Dwarkesh Patel
27 Feb 2024 · 05:14

Summary

TL;DR: The speaker believes that large language models will likely form a key component of future AGI systems, but additional planning and search capabilities will need to be layered on top to enable goal-oriented behavior. He argues that leveraging existing data and models as a starting point will allow quicker progress than a pure reinforcement learning approach. However, specifying the right objectives and rewards remains an open challenge when expanding beyond games to real-world applications. Overall, the path forward involves improving predictive models of the world and combining them with efficient search to explore massive spaces of possibilities.

Takeaways

  • 😀 AGI will likely require large language models as a component, but additional planning and search capabilities will also be needed on top.
  • 👍 Using existing knowledge and data to bootstrap learning in models seems more promising than learning purely from scratch.
  • 😮 More efficient search methods can reduce compute requirements; improving the world model allows more effective search.
  • 🤔 Defining the right rewards and goals is challenging for real-world systems compared to games with clear winning conditions.
  • 🚀 There is great potential in adding tree-search planning capabilities on top of large language models (see the sketch after this list).
  • 🔢 Combining scalable algorithms like Transformers to ingest knowledge, with search and planning holds promise.
  • 😥 Current large models may be missing the ability to chain thoughts or reasoning together with search.
  • 📈 Keep improving large models to make them better world models - a key piece.
  • 👀 Some are exploring building all knowledge purely from reinforcement learning, but this may not be the fastest path.
  • 💡 The final AGI system will likely involve both learned prior knowledge and new on-top mechanisms.
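One of the takeaways above points at tree-search planning over LLM outputs; below is a minimal sketch of what that might look like in practice. Everything in it is a hypothetical stand-in: `generate_candidates` represents sampling a few continuation steps from an LLM, and `score_state` represents a value or reward model judging how promising a partial chain of reasoning is. Neither corresponds to a real API.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Node:
    neg_score: float                       # heapq is a min-heap, so we store the negated score
    steps: list = field(compare=False)     # the chain of reasoning steps built so far

def generate_candidates(steps, k=3):
    """Placeholder for sampling k candidate next reasoning steps from an LLM."""
    return [steps + [f"step {len(steps)}.{i}"] for i in range(k)]

def score_state(steps):
    """Placeholder for a value/reward model scoring how promising a partial chain is."""
    return 0.1 * len(steps)                # trivially prefers longer chains; a real system would call a model

def tree_search(goal, max_expansions=20, branch=3, max_depth=5):
    """Best-first search over chains of model-generated steps toward a goal."""
    root = Node(-score_state([goal]), [goal])
    frontier, best = [root], root
    for _ in range(max_expansions):
        if not frontier:
            break
        node = heapq.heappop(frontier)     # expand the most promising chain so far
        if len(node.steps) >= max_depth:
            continue
        for child_steps in generate_candidates(node.steps, branch):
            child = Node(-score_state(child_steps), child_steps)
            heapq.heappush(frontier, child)
            if child.neg_score < best.neg_score:
                best = child
    return best.steps

print(tree_search("answer the question"))
```

In a real system the scoring model, not the search loop, carries most of the weight: the better it judges partial chains, the fewer branches the search needs to expand, which is the model-quality versus search-efficiency tradeoff the interview returns to later.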

Q & A

  • What systems has DeepMind pioneered that can think through different steps to achieve a goal?

    -DeepMind has pioneered systems like AlphaZero that can think through different possible moves in games like chess and Go to try to win the game. It uses a planning mechanism on top of a world model to explore massive spaces of possibilities.

  • What does the speaker believe are the necessary components of an AGI system?

    -The speaker believes the necessary components of an AGI system are: 1) Large models that are accurate predictors of the world, 2) Planning mechanisms like AlphaZero's that can make concrete plans to achieve goals using the world model (a minimal sketch of this pattern follows the Q&A list), and 3) Possibly search algorithms to chain lines of reasoning and explore possibilities.

  • What potential does the speaker see for AGI to come from a pure reinforcement learning approach?

    -The speaker thinks theoretically it's possible for AGI to emerge entirely from a reinforcement learning approach with no priors or data given to the system initially. However, he believes the quickest and most plausible path is to use existing knowledge and scalable algorithms like Transformers to ingest information to bootstrap the learning.

  • How can systems like AlphaZero be more efficient in their search compared to brute force methods?

    -By having a richer, more accurate world model, AlphaZero can make strong decisions by searching far fewer possibilities than brute force methods that lack an accurate model. This suggests improving the world model allows more efficient search.

  • What challenge exists in defining reward functions for real-world systems compared to games?

    -Games have clear reward functions like winning the game or increasing the score. But specifying the right rewards and goals in a general yet specific way for real-world systems is more challenging.

  • What benefit did DeepMind gain from using games as a proving ground for its algorithms?

    -Games provided an efficient research domain with clearly defined reward functions in terms of winning or scoring. This made them ideal testbeds before tackling real-world complexity.

  • How might search mechanisms explore the possibilities generated by large language models?

    -They could chain together lines of reasoning produced by the LLMs, using search to traverse trees of possibilities originating from the models' outputs.

  • Do LLMs have inherent goals and rewards driving their behavior?

    -No, LLMs themselves don't have inherent goals and rewards. They produce outputs based on their training, requiring search/planning mechanisms and predefined goals/rewards to drive purposeful, goal-oriented behavior.

  • What role might hybrid systems play in developing AGI?

    -The speaker believes hybrid systems combining large models with search, planning, and reinforcement learning components may provide the most promising path to developing AGI.

  • Why does the speaker believe starting from existing knowledge will enable quicker progress towards AGI compared to learning 'tabula rasa'?

    -Starting tabula rasa forgoes all the progress made in collecting knowledge and developing algorithms for processing it. Building on top of this using hybrid approaches allows bootstrapping rather than starting from scratch.
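To make the recurring "planning mechanism on top of a world model" idea a little more concrete, here is a minimal sketch of the simplest version of that recipe: UCB-guided action selection at a single root node, where every evaluation queries a learned model instead of the real environment. The `WorldModel` class, its `predict` method, and the random values it returns are hypothetical placeholders, not AlphaZero's actual implementation; a full AlphaZero-style planner would expand a deeper tree of positions in the same model-queried way.

```python
import math
import random

class WorldModel:
    """Hypothetical learned model: predicts a next state and a value estimate for an action."""
    def predict(self, state, action):
        # Stand-in for a learned network; a real system would return a predicted
        # next state and a value-head estimate rather than random noise.
        next_state = state + (action,)
        value = random.random()
        return next_state, value

def plan(model, root_state, actions, simulations=100, c_uct=1.4):
    """Pick the action with the best average predicted value after UCB-guided simulations."""
    visits = {a: 0 for a in actions}
    total_value = {a: 0.0 for a in actions}
    for sim in range(1, simulations + 1):
        def ucb(a):
            # UCB1 rule: trade off the mean value seen so far against how rarely
            # the action has been tried.
            if visits[a] == 0:
                return float("inf")
            mean = total_value[a] / visits[a]
            return mean + c_uct * math.sqrt(math.log(sim) / visits[a])
        action = max(actions, key=ucb)
        _, value = model.predict(root_state, action)   # query the model, no real environment step
        visits[action] += 1
        total_value[action] += value
    return max(actions, key=lambda a: total_value[a] / max(visits[a], 1))

best = plan(WorldModel(), root_state=(), actions=["a", "b", "c"])
print("chosen action:", best)
```

The point of the sketch is the division of labour the speaker describes: the search loop supplies goal-directed behavior, while all knowledge about consequences lives in the model it queries.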

Outlines

00:00

🤔 How additional planning and search mechanisms could make LLMs more capable

This paragraph discusses how large language models (LLMs) are good at predicting the world but likely insufficient on their own for artificial general intelligence (AGI). Additional planning and search mechanisms modeled after AlphaZero could allow LLMs to chain thoughts, reason through possibilities, and achieve goals, making them more capable. Combining LLMs with planning on top is likely the quickest path to AGI.

05:02

😅 The challenge of defining reward functions and goals for real-world AI systems

This paragraph identifies defining the right reward functions and goals as a key challenge when developing real-world AI systems beyond games. Games provide clear reward functions around winning, scoring points, and so on. Specifying rewards and goals that point a system in the right direction in the real world is far less straightforward.
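A minimal way to see why games are the easy case: their reward function can be written in a couple of lines, as in the illustrative sketch below, whereas the real-world analogue ("summarize this document well", "run this process safely") has no such closed form. The code is a generic Python illustration, not any particular system's reward definition.

```python
def game_reward(winner, player):
    """Board games: the reward is unambiguous and comes for free with the rules."""
    if winner is None:
        return 0.0                      # draw
    return 1.0 if winner == player else -1.0

def real_world_reward(outcome):
    """Real-world tasks: 'did this go well?' has no built-in scoring rule.
    Any hand-coded proxy (clicks, completion flags, human ratings) risks being
    gamed or pointing the system in a subtly wrong direction."""
    raise NotImplementedError("specifying this is the open research problem")
```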

Keywords

💡AlphaZero

AlphaZero is an AI system created by DeepMind that mastered the games of chess, shogi and Go, achieving superhuman performance with no domain knowledge except the rules. It serves as an example of advanced AI planning and search capabilities that could be built on top of large language models to make them more capable of reasoning, exploring possibilities, and achieving goals.

💡world models

World models refer to internal models that AI systems build to represent beliefs, concepts and knowledge about the world. The script argues that large language models alone are likely insufficient for AGI, but they could serve as an essential component by providing sufficiently accurate world models onto which planning and search capabilities can be layered.

💡planning mechanisms

Planning mechanisms allow AI systems to conceptualize and evaluate different possible sequences of actions to achieve a desired goal. The script suggests planning mechanisms similar to those used by AlphaZero could be built on top of large language models to chain lines of reasoning and explore massive spaces of possibilities.

💡Reinforcement Learning

Reinforcement learning is a machine learning approach centered around an agent learning by interacting with an environment and receiving feedback in the form of rewards and punishments. The script discusses whether AGI could arise solely from a pure reinforcement learning approach rather than by combining learned priors from large language models with search and planning.

💡sample efficiency

Sample efficiency refers to the ability of a machine learning system to learn effectively from a limited number of examples. The speaker notes that DeepMind focuses on sample efficient methods that can reuse existing data rather than learning purely from scratch.
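Experience replay, mentioned in the interview as one route to sample efficiency, amounts to storing past interactions and training on random batches of them instead of discarding each transition after a single use. Below is a minimal, generic replay-buffer sketch; the field names are illustrative, not tied to any DeepMind codebase.

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores past (state, action, reward, next_state, done) transitions for reuse in training."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)   # oldest transitions drop out automatically

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Random sampling breaks the temporal correlation of consecutive steps,
        # which is what lets the same data be reused across many gradient updates.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

buf = ReplayBuffer()
buf.add(state=0, action=1, reward=0.5, next_state=1, done=False)
print(buf.sample(1))
```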

💡objective function

The objective function, or reward function, defines the goal an AI agent should achieve. Games have clear objective functions, like winning, that reinforcement learning agents can optimize for. Defining the right objective functions for real-world systems is noted as a key challenge.

💡proof of concept

Games serve as a proof of concept for AI and machine learning approaches before tackling messier, harder-to-define real-world problems. They allow efficient testing and iteration with unambiguous reward signals such as scores.

💡brute force

Brute force refers to solving problems by exhaustively searching all possible solutions. Traditional game-playing systems like Deep Blue relied on brute force search to evaluate millions of possible moves. In contrast, AlphaZero plays at superhuman levels while evaluating far fewer moves due to its learned knowledge.
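To put rough numbers on the brute-force versus learned-model contrast described in the interview (millions of positions for Deep Blue-style search, tens of thousands for AlphaZero, a few hundred for a human grandmaster), the toy calculation below shows how quickly exhaustive lookahead explodes and how aggressively a learned policy must prune to stay within the smaller budgets. The branching factor and depths are illustrative assumptions, not measured values.

```python
def positions_examined(branching_factor, depth):
    """Exhaustive lookahead: every legal move at every ply is expanded."""
    return branching_factor ** depth

def pruned_positions(top_k, depth):
    """Policy-guided lookahead: only the top-k moves suggested by a learned model are expanded."""
    return top_k ** depth

# Chess has roughly 30-35 legal moves per position on average.
print(positions_examined(35, 5))   # ~52 million positions for 5 plies of brute force
print(pruned_positions(4, 5))      # ~1,000 positions if a model narrows each ply to 4 candidates
```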

💡tradeoffs

There are tradeoffs in AI system design, for example between the sophistication of planning algorithms and learned models. The script argues that improving world models can allow for more efficient search, reducing the computation needed to reason through possibilities.

💡general intelligence

The overarching theme is pathways towards artificial general intelligence (AGI) - AI systems with flexible learning and reasoning capabilities that can handle a wide range of tasks. Large language models are viewed as a key building block but likely insufficient on their own without additional search, planning and reasoning modules.

Highlights

LLMs are necessary but probably not sufficient components of an AGI system

Planning mechanisms like AlphaZero could be built on top of LLMs to achieve goals and chain reasoning

LLMs currently lack the search capabilities to explore possibilities like AlphaZero does

The most likely path to AGI is using all available knowledge to pre-train Transformers, then adding planning and search on top

The final AGI system will likely combine LLMs and planning/search mechanisms

In theory AGI could emerge from a pure reinforcement learning approach but using existing knowledge is faster

Better world models allow more efficient search, like AlphaZero beating humans while searching far fewer positions

There is a tradeoff between model quality and search efficiency

Specifying rewards and goals is challenging in real-world systems compared to games

Games provide easy reward specification which aids AI research

The objective function and rewards are key challenges in developing real-world AI systems

Planning mechanisms could be built on top of LLMs to explore possibilities

Combining LLMs and search/planning is the most likely path to AGI

Better world models enable more efficient search

Specifying rewards is easier in games than in real-world systems

Transcripts

00:00

Dwarkesh Patel: DeepMind has been at the frontier for many years with systems like AlphaZero and so forth, building agents that can think through different steps to get to an end outcome. Is the path for LLMs to have this sort of tree search on top of them? How do you think about this?

00:16

Demis Hassabis: I think that's a super promising direction, in my opinion. We've got to carry on improving the large models, basically making them more and more accurate predictors of the world, in effect making them more reliable world models. That's clearly a necessary, but I would say probably not sufficient, component of an AGI system. Then, on top of that, we're working on AlphaZero-like planning mechanisms that make use of that model in order to make concrete plans to achieve certain goals in the world, and perhaps chain thoughts or lines of reasoning together, and maybe use search to explore massive spaces of possibility. I think that's kind of missing from our current large models.

01:02

Dwarkesh Patel: Is there any potential for AGI to eventually come from just a pure RL approach? The way we're talking about it, it sounds like the LLM will form the right prior and then this sort of search will go on top of that. Or is there a possibility it comes completely out of that?

01:16

Demis Hassabis: Theoretically, I think there's no reason why you couldn't go full AlphaZero-like on it, and there are some people here at Google DeepMind and in the RL community who work on that: fully no priors, no data, just building all knowledge from scratch. I think that's valuable, because those ideas and algorithms should also work when you have some knowledge too. But having said that, my betting would be that by far the quickest and most plausible way to get to AGI is to use all the knowledge that exists in the world right now, on things like the web and that we've collected, together with these scalable algorithms like Transformers that are capable of ingesting all of that information. I don't see why you wouldn't start with a model as a kind of prior, to build on and to make predictions that help bootstrap your learning. I just think it doesn't make sense not to make use of that. So my betting would be that the final AGI system will have these large multimodal models as part of the overall solution, but they probably won't be enough on their own. You will need this additional planning and search on top.

02:30

Dwarkesh Patel: How do you get past the immense amount of compute these approaches tend to require? Even the AlphaGo system was a pretty expensive system, because you had to run a model on each node of the tree. How do you anticipate that will be made more efficient?

02:45

Demis Hassabis: Well, we focus a lot on sample-efficient methods and reusing existing data, things like experience replay, and also on looking for more efficient approaches generally. The better your world model is, the more efficient your search can be. One example I always give is AlphaZero, our system for playing Go, chess, and really any game. It's stronger than human world-champion level at all these games, and it uses a lot less search than a brute-force method like Deep Blue, say, at chess. Deep Blue, or one of these traditional Stockfish-style systems, would maybe look at millions of possible moves for every decision it's going to make. AlphaZero and AlphaGo looked at around tens of thousands of possible positions in order to decide what to move next. But a human grandmaster, a human world champion, probably only looks at a few hundred moves, even the top players, in order to make their very good decision about what to play next. That suggests the brute-force systems don't have any real model other than heuristics about the game, AlphaZero has quite a decent model, but top human players have a much richer, much more accurate model of Go or chess, which allows them to make world-class decisions with a very small amount of search. So I think there's a sort of tradeoff there: if you improve the models, then your search can be more efficient, and therefore you can get further with your search.

04:17

Dwarkesh Patel: I have two questions based on that. The first: with AlphaGo you had a very concrete win condition, at the end of the day do I win this game of Go or not, and you can reinforce on that. When you're thinking of an LLM putting out thoughts, do you think there will be this kind of ability to discriminate, in the end, whether that was a good thing to reward or not?

04:37

Demis Hassabis: Well, of course, that's why we pioneered, and DeepMind is somewhat famous for, using games as a proving ground, partly because it's efficient to do research in that domain, but also because it's extremely easy to specify the reward function: winning the game or improving the score, something like that, which is built into most games. That is one of the challenges of real-world systems: how does one define the right objective function, the right reward function, and the right goals, and specify them in a way that is general yet specific enough that it actually points the system in the right direction?