The Fastest Way to AGI: LLMs + Tree Search – Demis Hassabis (Google DeepMind CEO)
Summary
TL;DR: The speaker believes that large language models will likely form a key component of future AGI systems, but additional planning and search capabilities will need to be layered on top to enable goal-oriented behavior. He argues that leveraging existing data and models as a starting point will allow for quicker progress compared to a purely reinforcement learning approach. However, specifying the right objectives and rewards remains an open challenge when expanding beyond games to real-world applications. Overall, the path forward involves improving predictive models of the world and combining them with efficient search to explore massive spaces of possibilities.
Takeaways
- 😀 AGI will likely require large language models as a component, but additional planning and search capabilities will also be needed on top.
- 👍 Using existing knowledge and data to bootstrap learning in models seems more promising than learning purely from scratch.
- 😮 More efficient search methods can reduce compute requirements - improving the world model allows more effective search.
- 🤔 Defining the right rewards and goals is challenging for real world systems compared to games with clear winning conditions.
- 🚀 There is great potential in adding tree search planning capabilities on top of large language models.
- 🔢 Combining scalable algorithms like Transformers to ingest knowledge, with search and planning holds promise.
- 😥 Current large models may be missing the ability to chain thoughts or reasoning together with search.
- 📈 Continuing to improve large models so they become more reliable world models is a key piece of the overall system.
- 👀 Some are exploring building all knowledge purely from reinforcement learning, but this may not be the fastest path.
- 💡 The final AGI system will likely involve both learned prior knowledge and new on-top mechanisms.
Q & A
What systems has DeepMind pioneered that can think through different steps to achieve a goal?
-DeepMind has pioneered systems like AlphaZero that can think through different possible moves in games like chess and Go to try to win the game. It uses a planning mechanism on top of a world model to explore massive spaces of possibilities.
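To make the "planning mechanism on top of a world model" idea concrete, here is a minimal Monte Carlo tree search sketch over a toy game. This is an illustrative simplification, not DeepMind's implementation: AlphaZero combines MCTS with a learned neural network, whereas the `Nim` game and random rollouts below are stand-ins chosen so the example runs on its own.

```python
import math, random

class Nim:
    """Toy game: players alternately remove 1 or 2 stones; taking the last stone wins."""
    def __init__(self, stones, player=1):
        self.stones, self.player = stones, player
    def legal_moves(self):
        return [m for m in (1, 2) if m <= self.stones]
    def play(self, move):
        return Nim(self.stones - move, -self.player)
    def winner(self):
        # If no stones remain, the player who just moved took the last one and won.
        return -self.player if self.stones == 0 else None

class Node:
    def __init__(self, state, parent=None, move=None):
        self.state, self.parent, self.move = state, parent, move
        self.children, self.visits, self.value = [], 0, 0.0
    def ucb(self, c=1.4):
        # Upper-confidence bound: balance exploiting good moves with exploring rare ones.
        if self.visits == 0:
            return float("inf")
        return self.value / self.visits + c * math.sqrt(math.log(self.parent.visits) / self.visits)

def mcts(root_state, iters=3000):
    root = Node(root_state)
    for _ in range(iters):
        node = root
        # 1. Selection: descend by UCB until reaching a leaf.
        while node.children:
            node = max(node.children, key=Node.ucb)
        # 2. Expansion: add children unless the game is already decided.
        if node.state.winner() is None:
            for m in node.state.legal_moves():
                node.children.append(Node(node.state.play(m), node, m))
            node = random.choice(node.children)
        # 3. Rollout: play randomly to the end (AlphaZero uses a value net instead).
        state = node.state
        while state.winner() is None:
            state = state.play(random.choice(state.legal_moves()))
        winner = state.winner()
        # 4. Backpropagation: the move into `node` was made by -node.state.player,
        #    so credit a win at `node` when that mover won the rollout.
        while node:
            node.visits += 1
            node.value += 1.0 if winner == -node.state.player else 0.0
            node = node.parent
    return max(root.children, key=lambda n: n.visits).move

print(mcts(Nim(4)))  # optimal play from 4 stones is to take 1, leaving a multiple of 3
```

The search "thinks through different possible moves" exactly as the answer describes: it simulates futures with the world model (here, the game rules) and concentrates visits on the most promising branch.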
What does the speaker believe are the necessary components of an AGI system?
-The speaker believes the necessary components of an AGI system are: 1) Large models that are accurate predictors of the world, 2) Planning mechanisms like AlphaZero that can make concrete plans to achieve goals using the world model, and 3) Possibly search algorithms to chain lines of reasoning and explore possibilities.
What potential does the speaker see for AGI to come from a pure reinforcement learning approach?
-The speaker thinks theoretically it's possible for AGI to emerge entirely from a reinforcement learning approach with no priors or data given to the system initially. However, he believes the quickest and most plausible path is to use existing knowledge and scalable algorithms like Transformers to ingest information to bootstrap the learning.
How can systems like AlphaZero be more efficient in their search compared to brute force methods?
-By having a richer, more accurate world model, AlphaZero can make strong decisions by searching far fewer possibilities than brute force methods that lack an accurate model. This suggests improving the world model allows more efficient search.
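The model-quality/search-effort tradeoff in this answer can be demonstrated numerically. The sketch below (my own illustration, not from the talk) runs greedy best-first search over bit-strings, where the "world model" is a heuristic: exact Hamming distance stands in for an accurate model, and the same heuristic corrupted by noise stands in for a poor one. The noisier the model, the more states must be expanded to reach the goal.

```python
import heapq, random

def best_first(start, goal, noise, rng):
    """Greedy best-first search over bit-strings; each step flips one bit.
    `noise` controls how inaccurate the heuristic 'world model' is."""
    def h(state):
        exact = sum(a != b for a, b in zip(state, goal))  # true distance to goal
        return exact + rng.uniform(0, noise)              # corrupted estimate
    frontier = [(h(start), start)]
    seen = {start}
    expanded = 0
    while frontier:
        _, state = heapq.heappop(frontier)
        expanded += 1
        if state == goal:
            return expanded
        for i in range(len(state)):
            nxt = state[:i] + ('1' if state[i] == '0' else '0') + state[i + 1:]
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (h(nxt), nxt))
    return expanded

rng = random.Random(0)
accurate = best_first('0' * 12, '1' * 12, noise=0.0, rng=rng)   # perfect model
sloppy = best_first('0' * 12, '1' * 12, noise=30.0, rng=rng)    # noisy model
print(accurate, sloppy)  # the accurate model expands far fewer states
```

With a perfect heuristic the search walks straight to the goal (one expansion per step); with a noisy one it wanders, mirroring the Deep Blue / AlphaZero / human-grandmaster progression of millions, tens of thousands, and a few hundred positions per decision.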
What challenge exists in defining reward functions for real-world systems compared to games?
-Games have clear reward functions like winning the game or increasing the score. But specifying the right rewards and goals in a general yet specific way for real-world systems is more challenging.
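The asymmetry in this answer is easy to see in code. The sketch below is a hypothetical illustration: a game reward is one unambiguous line, while a real-world reward must be assembled from proxy features whose names and weights (all invented here) are exactly the judgment calls the answer warns about.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class GameState:
    winner: Optional[str]  # "agent", "opponent", or None while the game is ongoing

def game_reward(state):
    """Games: the reward is built in and unambiguous."""
    if state.winner == "agent":
        return 1.0
    if state.winner == "opponent":
        return -1.0
    return 0.0

def real_world_reward(outcome):
    """Real world: no ground-truth 'winner' field exists, so the designer
    must invent a proxy. These features and weights are arbitrary
    assumptions, which is precisely the specification problem."""
    return (0.5 * outcome["task_completed"]
            + 0.25 * outcome["user_satisfaction"]
            - 0.25 * outcome["side_effects"])
```

Any choice of proxy features risks pointing the system in a subtly wrong direction, which is why reward specification stays open once AI moves beyond games.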
What benefit did DeepMind gain from using games as a proving ground for its algorithms?
-Games provided an efficient research domain with clearly defined reward functions in terms of winning or scoring. This made them ideal testbeds before tackling real-world complexity.
How might search mechanisms explore the possibilities generated by large language models?
-They could chain together lines of reasoning produced by the LLMs, using search to traverse trees of possibilities originating from the models' outputs.
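One way to picture chaining LLM outputs with search, sketched under loose assumptions: `propose` generates candidate next "thoughts" and `score` rates how promising a partial chain looks. With a real LLM both would be model calls; here they are toy stand-ins solving a "reach the target by adding numbers" puzzle, so the example is self-contained.

```python
import heapq
from itertools import count

def propose(state, numbers):
    """Stand-in for an LLM generating candidate next thoughts."""
    total, used = state
    return [(total + n, used | {i}) for i, n in enumerate(numbers) if i not in used]

def score(state, target):
    """Stand-in for an LLM (or value model) rating a partial chain."""
    total, _ = state
    return -abs(target - total)  # closer to the target = more promising

def tree_search(numbers, target, budget=50):
    tie = count()  # tie-breaker so the heap never compares states directly
    frontier = [(-score((0, frozenset()), target), next(tie), (0, frozenset()))]
    for _ in range(budget):
        if not frontier:
            break
        _, _, state = heapq.heappop(frontier)  # expand the most promising thought
        total, used = state
        if total == target:
            return sorted(numbers[i] for i in used)
        for child in propose(state, numbers):
            heapq.heappush(frontier, (-score(child, target), next(tie), child))
    return None  # budget exhausted or no chain reaches the target

print(tree_search([2, 3, 5, 8], 10))
```

The search keeps a frontier of partial reasoning chains, always extends the highest-scoring one, and abandons dead ends, traversing a tree of possibilities originating from the proposer's outputs rather than committing to a single left-to-right generation.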
Do LLMs have inherent goals and rewards driving their behavior?
-No, LLMs themselves don't have inherent goals and rewards. They produce outputs based on their training, requiring search/planning mechanisms and predefined goals/rewards to drive purposeful, goal-oriented behavior.
What role might hybrid systems play in developing AGI?
-The speaker believes hybrid systems combining large models with search, planning, and reinforcement learning components may provide the most promising path to developing AGI.
Why does the speaker believe starting from existing knowledge will enable quicker progress towards AGI compared to learning 'tabula rasa'?
-Starting tabula rasa forgoes all the progress made in collecting knowledge and developing algorithms for processing it. Building on top of this using hybrid approaches allows bootstrapping rather than starting from scratch.
Outlines
🤔 How additional planning and search mechanisms could make LLMs more capable
This paragraph discusses how large language models (LLMs) are good at predicting the world but likely insufficient on their own for artificial general intelligence (AGI). Additional planning and search mechanisms modeled after AlphaZero could allow LLMs to chain thoughts, reason through possibilities, and achieve goals, making them more capable. Combining LLMs with planning on top is likely the quickest path to AGI.
😅 The challenge of defining reward functions and goals for real-world AI systems
This paragraph acknowledges defining the right reward functions and goals as a key challenge when developing real-world AI systems beyond games. Games provide clear reward functions around winning, scoring points, etc. Specifying rewards and goals that point systems in the right direction in the real world is more complex.
Keywords
💡AlphaZero
💡world models
💡planning mechanisms
💡reinforcement learning
💡sample efficiency
💡objective function
💡proof of concept
💡brute force
💡tradeoffs
💡general intelligence
Highlights
LLMs are necessary but probably not sufficient components of an AGI system
Planning mechanisms like AlphaZero could be built on top of LLMs to achieve goals and chain reasoning
LLMs currently lack the search capabilities to explore possibilities like AlphaZero does
The most likely path to AGI is to bootstrap from all existing knowledge using scalable models like Transformers, then add planning and search on top
The final AGI system will likely combine LLMs and planning/search mechanisms
In theory AGI could emerge from a pure reinforcement learning approach but using existing knowledge is faster
Better world models allow more efficient search, as shown by AlphaZero beating world champions while searching far fewer positions than brute-force engines
There is a tradeoff between model quality and search efficiency
Specifying rewards and goals is challenging in real world systems compared to games
Games provide easy reward specification which aids AI research
The objective function and rewards are key challenges in developing real world AI systems
Transcripts
Interviewer: Obviously DeepMind is at the frontier, and has been for many years, with systems like AlphaZero and so forth: agents that can think through different steps to get to an end outcome. Will this be a path for LLMs, having this sort of tree search on top of them? How do you think about this?

Hassabis: I think that's a super promising direction, in my opinion. We've got to carry on improving the large models, and we've got to carry on making them more and more accurate predictors of the world, in effect making them more reliable world models. That's clearly a necessary, but I would say probably not sufficient, component of an AGI system. And then on top of that, we're working on things like AlphaZero-like planning mechanisms that make use of that model in order to make concrete plans to achieve certain goals in the world, and perhaps chain thought together, or lines of reasoning together, and maybe use search to explore massive spaces of possibility. I think that's kind of missing from our current large models.

Interviewer: Is there any potential for AGI to eventually come from just a pure RL approach? The way we're talking about it, it sounds like the LLM will form the right prior and then this sort of search will go on top of that. Or is there a possibility it comes completely out of pure RL?

Hassabis: Theoretically, I think there's no reason why you couldn't go full AlphaZero-like on it, and there are some people here at Google DeepMind and in the RL community who work on that: fully no priors, no data, just building all knowledge from scratch. And I think that's valuable, because of course those ideas and those algorithms should also work when you have some knowledge too. But having said that, my betting would be that by far the quickest and most plausible way to get to AGI is to use all the knowledge that exists in the world right now, on things like the web, that we've collected, and we have these scalable algorithms like Transformers that are capable of ingesting all of that information. I don't see why you wouldn't start with a model as a kind of prior, or to build on, and to make predictions that help bootstrap your learning. I just think it doesn't make sense not to make use of that. So my betting would be that the final AGI system will have these large multimodal models as part of the overall solution, but they probably won't be enough on their own; you will need this additional planning and search on top.

Interviewer: How do you get past the immense amount of compute these approaches tend to require? Even the AlphaGo system was a pretty expensive system, and here you'd be running an LLM on each node of the tree. How do you anticipate that will be made more efficient?

Hassabis: Well, we focus a lot on sample-efficient methods and on reusing existing data, things like experience replay, and also just on more efficient ways to search. The better your world model is, the more efficient your search can be. One example I always give with AlphaZero, our system to play Go and chess and any game, is that it's stronger than human world-champion level at all these games, and it uses a lot less search than a brute-force method like Deep Blue. A traditional system like Stockfish or Deep Blue would maybe look at millions of possible moves for every decision it's going to make. AlphaZero and AlphaGo looked at around tens of thousands of possible positions in order to make a decision about what to move next. But a human grandmaster, a human world champion, probably only looks at a few hundred moves, even the top players, in order to make their very good decision about what to play next. So that suggests the brute-force systems don't have any real model other than heuristics about the game; AlphaZero has quite a decent model; but the top human players have a much richer, much more accurate model of Go or chess, and that allows them to make world-class decisions on a very small amount of search. So I think there's a tradeoff there: if you improve the models, then your search can be more efficient, and therefore you can get further with your search.

Interviewer: I have two questions based on that. The first: with AlphaGo you had a very concrete win condition, namely, at the end of the day, do I win this game of Go or not, and you can reinforce on that. When you're just thinking of an LLM putting out thoughts, do you think there will be this kind of ability to discriminate, in the end, whether that was a good thing to reward or not?

Hassabis: Well, of course, that's why we pioneered, and DeepMind is sort of famous for, using games as a proving ground: partly because it's efficient to do research in that domain, but the other reason is that it's extremely easy to specify the reward function, winning the game or improving the score, something like that is built into most games. So that is one of the challenges of real-world systems: how does one define the right objective function, the right reward function, and the right goals, and specify them in a way that is general, yet specific enough, and actually points the system in the right direction?