Are Hallucinations Popping the AI Bubble?
Summary
TLDR: The video discusses the recent dip in Nvidia stock and attributes it to the bursting of a bubble around Large Language Models (LLMs), rather than AI as a whole. It highlights the issue of 'hallucinations' in LLMs, where nonsensical outputs are confidently produced. The speaker argues for the integration of symbolic language and logic into AI to address these issues, citing DeepMind's progress in AI for mathematical proofs as an example of 'neurosymbolic' AI. The script concludes that companies focusing solely on LLMs may struggle, while those building AI with logical reasoning and real-world models, like DeepMind, are more likely to succeed.
Takeaways
- The speaker bought Nvidia stocks hoping to benefit from the AI boom but has seen the stocks drop, attributing the decline to a bubble bursting in the market, specifically around Large Language Models (LLMs).
- The speaker believes that the current AI enthusiasm will resume once people realize there's more to AI than just LLMs, and that their stocks will recover as a result.
- The main issue with LLMs, as highlighted, is the phenomenon of 'hallucinations', where the models confidently produce nonsensical outputs.
- Improvements have been made in LLMs to avoid obvious errors, such as providing real book recommendations, but they still struggle with logical consistency, like referencing non-existent papers and reports.
- The speaker uses the example of kindergarten riddles to illustrate how LLMs can provide answers that are 'close' to correct but fundamentally flawed due to missing context.
- The crux of the problem is identified as the difference in metrics for 'good' output between humans and models, suggesting that models lack an understanding of what makes their output valuable to humans.
- A proposed solution is the integration of logic and symbolic language into AI, termed 'neurosymbolic', which could address many of the issues with LLMs by providing a more human-like logical structure.
- The speaker cites DeepMind's progress in AI for mathematical proofs as an example of the potential of neurosymbolic AI, emphasizing the importance of logical rigor in AI development.
- The speaker suggests that building intelligent AI requires starting with mathematical and physical reality models, then layering language on top, rather than focusing solely on language models.
- There is a warning that companies heavily invested in LLMs may not recover their expenses, and that those building AI on logical reasoning and real-world models, like DeepMind, are likely to be the winners.
- The speaker ends with a humorous note about the potential for AI to manage stocks and the recommendation of Brilliant.org for those interested in learning more about AI and related fields, offering a discount for new users.
Q & A
Why did the speaker decide to buy Nvidia stocks?
-The speaker bought Nvidia stocks to benefit from the AI boom, as they wanted to get something out of the advancements in artificial intelligence.
What is the speaker's opinion on the current state of AI and its impact on the stock market?
-The speaker believes that the current drop in AI-related stocks is not due to a problem with AI itself, but rather with a specific type of AI called Large Language Models, and they are optimistic that AI enthusiasm will resume once people realize there's more to AI.
What is the main issue with Large Language Models according to the script?
-The main issue with Large Language Models is that they sometimes produce 'hallucinations,' which means they confidently generate incorrect or nonsensical information.
How have Large Language Models improved in providing book recommendations?
-Large Language Models have become better at avoiding obvious pitfalls by listing books that actually exist, tying their output more closely to the training set.
What is an example of a problem that Large Language Models still struggle with?
-An example is when solving riddles like the wolf, goat, and cabbage problem, where leaving out critical information leads the model to provide an answer that is logically incorrect, even though it uses similar words to the original problem.
What is the fundamental problem the speaker identifies with Large Language Models?
-The fundamental problem is that Large Language Models use a different metric for 'good' output than humans do, and this discrepancy cannot be fixed by simply training the models with more input.
What solution does the speaker propose to improve the output of AI models?
-The speaker suggests teaching AI logic and using symbolic language, similar to what math software uses, and combining this with neural networks in an approach called 'neurosymbolic' AI.
Why does the speaker believe that AI needs to be built on logical reasoning and models of the real world?
-The speaker believes that because the world at its deepest level is mathematical, building intelligent AI requires starting with math and models of physical reality, and then adding words on top.
What is the speaker's view on the future of companies that have invested heavily in Large Language Models?
-The speaker thinks that companies that have invested heavily in Large Language Models may never recover those expenses, and the winners will be those who build AI on logical reasoning and models of the real world.
What does the speaker suggest for AI researchers to focus on?
-The speaker suggests that AI researchers should think less about words and more about physics, implying that a deeper understanding of the physical world is crucial for developing truly intelligent AI.
What resource does the speaker recommend for learning more about neural networks and large language models?
-The speaker recommends Brilliant.org for its interactive visualizations and follow-up questions on a variety of topics, including neural networks and large language models.
Outlines
AI Stock Woes and the Large Language Model Challenge
The speaker begins by sharing their experience of purchasing Nvidia stocks in hopes of capitalizing on the AI boom, only to witness a decline in stock value. They argue that the current market panic is not a reflection of AI's potential but rather a response to the shortcomings of Large Language Models (LLMs). The speaker highlights the issue of 'hallucinations' in LLMs, where these models produce nonsensical outputs with confidence. Despite improvements in avoiding obvious errors, such as generating real book recommendations, LLMs still struggle with logical consistency, as illustrated by the kindergarten riddle example. The speaker emphasizes that the problem lies in the models' inability to discern 'good' output based on human metrics, suggesting that integrating logic and symbolic language, as seen in neurosymbolic AI, could resolve many of these issues. They conclude by expressing optimism for AI's future, particularly in approaches that combine neural networks with logical reasoning, citing Deepmind's progress in AI-assisted mathematical proofs as an encouraging sign.
Beyond Words: The Necessity of Logic in AI Development
In the second paragraph, the speaker addresses the limitations of training large language models on vast amounts of text and images, arguing that such an approach does not lead to true understanding. They discuss the concept of 'linguistic confusion', pointing out the inherent subjectivity in language use and the difficulty of establishing logical relations from diverse textual sources. The speaker advocates for a foundational approach to AI that starts with mathematical and physical models, suggesting that companies investing solely in LLMs may not recoup their investments. They predict that success in AI will come to those who build on logical reasoning and real-world models, using Deepmind's virtual mouse example to illustrate this point. The speaker ends with a tongue-in-cheek comment about the potential chaos of AI in stock management and promotes Brilliant.org for those interested in learning more about neural networks and AI, offering a discount for new users through a provided link.
Keywords
Nvidia stocks
AI boom
Large Language Models (LLMs)
Hallucinations
Symbolic language
Neurosymbolic AI
DeepMind
Linguistic confusion
Physical reality models
Brilliant.org
Highlights
Investment in Nvidia stocks was motivated by the AI boom, but the stocks have been dropping due to investor panic.
The current bubble bursting is not due to AI itself, but rather Large Language Models (LLMs).
AI enthusiasm is expected to resume as people realize there's more to AI than just LLMs.
LLMs are known to produce 'hallucinations', confidently generating incorrect information.
An example of LLMs' limitations includes citing non-existent legal cases, as seen with a lawyer using ChatGPT.
LLMs have improved in avoiding obvious pitfalls, such as recommending real books, but still struggle with made-up references.
For a Large Language Model, correct output is quantifiably 'close' to wrong output, as illustrated with modified riddles.
The issue with LLMs is the discrepancy between human and model metrics for what constitutes 'good' output.
A potential solution is 'neurosymbolic' AI, combining neural networks with symbolic language for improved logic.
DeepMind's progress in AI for mathematical proofs demonstrates the potential of neurosymbolic AI.
Neurosymbolic AI could resolve many issues with LLMs by applying logical rigor to verbal arguments.
The need for AI to understand and use logic, as opposed to just processing more text and images.
The difficulty of integrating symbolic reasoning with existing models due to 'linguistic confusion'.
The necessity for retraining large language models to incorporate a deeper understanding of logic and physical reality.
The world's fundamental nature is mathematical, suggesting AI should be built on models of physical reality.
Companies that invested heavily in LLMs may not recover their expenses, unlike those focusing on logical reasoning.
DeepMind's virtual mouse example illustrates a promising approach to building intelligent AI.
A call for AI researchers to focus more on physics and less on words for truly intelligent AI development.
Recommendation of Brilliant.org for learning about neural networks, large language models, and other scientific topics.
Transcripts
A few weeks ago, I finally came around to buying a few Nvidia stocks because, hey, I also want to get something out of the AI boom. These stocks have been dropping ever since. Why? Oh right, I don't believe in god.
Yes, so, it doesn't look good for AI at the moment as investors are panicking and stocks are dropping. But in this video I want to make a case that this bubble which is currently bursting is not that of AI per se, it's that of the specific type of AI called Large Language Models. I am sure AI enthusiasm will resume once people get it into their head that there's more to AI. And my stocks will recover.
The best known problem with Large Language Models is what has become known as 'hallucinations'. They sometimes confidently ramble along and produce nonsense. You'd think we all learned this lesson in 2022, but then there was the lawyer who used ChatGPT to cook up a defence and ended up citing cases that simply didn't exist. Oops.
Large Language Models have become better at avoiding some obvious pitfalls. For example, if you ask ChatGPT for book recommendations it will now list books that actually exist. It still often refers to made-up papers and reports though. And Midjourney now for the most part puts 5 fingers on each hand, so much so that if you explicitly ask for a hand with 6 fingers, it will still have 5 fingers.
You can do this by tying some output closely to the training set. That might make the problem appear solvable. But it's not that simple, because hallucinations are just one symptom of a much bigger problem: for a Large Language Model, correct output is -- in a quantifiable sense -- 'close' to wrong output.
A very illustrative example comes from Colin Fraser, who has been using modified versions of kindergarten riddles, like the wolf, goat, and cabbage problem. In this riddle, the farmer has to get all three across the river in a boat. But the boat will only carry one item in addition to the farmer. Left unattended, the wolf will eat the goat, and the goat will eat the cabbage. The solution is that the farmer has to take one of the items back on a return trip.
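The riddle's constraints are small enough to check mechanically. As a sketch (mine, not from the video), a breadth-first search over bank states finds the classic seven-crossing plan:

```python
from collections import deque

ITEMS = frozenset({"wolf", "goat", "cabbage"})

def safe(bank):
    # Left unattended, the wolf eats the goat and the goat eats the cabbage.
    return not ({"wolf", "goat"} <= bank or {"goat", "cabbage"} <= bank)

def solve():
    # State: (items on the left bank, farmer on the left bank?).
    start, goal = (ITEMS, True), (frozenset(), False)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (left, farmer_left), path = queue.popleft()
        if (left, farmer_left) == goal:
            return path
        here = left if farmer_left else ITEMS - left
        for cargo in [None] + sorted(here):  # cross alone or with one item
            new_left = set(left)
            if cargo:
                (new_left.discard if farmer_left else new_left.add)(cargo)
            new_left = frozenset(new_left)
            # The bank the farmer leaves behind must stay safe.
            unattended = new_left if farmer_left else ITEMS - new_left
            state = (new_left, not farmer_left)
            if safe(unattended) and state not in seen:
                seen.add(state)
                queue.append((state, path + [cargo or "return empty"]))

print(solve())  # seven crossings, starting and ending with the goat
```

Notably, if you delete the two conditions in safe() -- the altered riddle from the video -- the same search returns a shorter five-crossing plan, whereas a model that pattern-matches on wording keeps reciting the seven-step answer.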
If you ask a large language model this question but leave out the information that the wolf will eat the goat and the goat the cabbage, then it will still give the same answer, which now makes no sense.
I like this example because it's obvious what's going wrong. By way of word content, the altered riddle is similar to the riddle that the models have been trained on. So they extrapolate from what they know and spit out an answer that is close to the answer for the original riddle. But as with hallucinations, these answers are 'close' in a sense that we don't care about. Yes, they use similar words. But the content is wrong. It's like how, in some sense, a hand with six fingers is 'close' to one with five fingers. But it's still wrong.
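That notion of 'close' can be made concrete with plain string similarity -- a toy illustration, not the metric the models actually use internally. Two answers that differ in the one word that matters score as near-identical:

```python
from difflib import SequenceMatcher

# Textually near-identical answers with opposite factual content.
correct = "The hand has five fingers."
wrong = "The hand has six fingers."

# Ratio of matching characters: 1.0 means identical strings.
similarity = SequenceMatcher(None, correct, wrong).ratio()
print(f"surface similarity: {similarity:.2f}")
```

Any surface-level metric rates these as almost the same answer; the one-word difference that a human cares about barely registers.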
The issue is that we have a different metric for good output than the one that the models use. When I say 'we' I mean humans, just in case there are some misunderstandings. And these different metrics for what is 'good' are a problem you can't fix by just training a model with more and more input. It's fundamentally missing information about what makes its output good for us.
The solution is to teach AI logic and to use symbolic language, similar to what most maths software uses. If you combine that with a neural network, it's called 'neurosymbolic'. This can fix a lot of problems with large language models, and some of those approaches already exist.
For example, I mentioned already in January that DeepMind made remarkable progress with using AI for mathematical proofs. Just last month they reported that their maths AI now reached the level of a silver medallist in the maths Olympiad. Not only does it solve the problems, it also provides a proof that humans can understand. Well, kind of.
The relevant point isn't that AI can solve maths Olympiad problems because, let's be honest, who really cares. The relevant point is that this AI can parse the problems and can formulate logically correct answers that a human can understand. Apply this logical rigor to verbal arguments, and boom, a lot of problems with large language models will disappear.
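A toy version of that neurosymbolic division of labour (my sketch; the generator is a hypothetical stand-in for a language model): a proposer guesses fluently, and only answers that pass an exact symbolic check get through.

```python
from fractions import Fraction

def propose(question):
    # Hypothetical stand-in for a language model: fluent, sometimes wrong.
    yield from ["0.3", "0.33", "1/3"]

def verify(answer):
    # Symbolic check with exact arithmetic instead of fuzzy similarity.
    try:
        return Fraction(answer) == Fraction(1, 3)
    except ValueError:
        return False

question = "What is one third, exactly?"
answer = next(a for a in propose(question) if verify(a))
print(answer)  # only "1/3" survives verification
```

To the proposer, "0.33" is 'close' to right; to the checker, it is simply wrong, which is exactly the distinction the video argues current LLMs are missing.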
Imagine an AI that could win every internet argument. Reddit would become a ghost town overnight. What I just told you is neither new nor particularly original. It's been pointed out for decades by many computer scientists, including Gary Marcus and others. I just want to say: I think they're right. You can't just train large language models on more and more text and images and hope that they will begin to 'understand' what's going on. And I think no one really expected that.
That said, it's more difficult than lumping symbolic reasoning on top of the already existing models, basically because of what Wittgenstein called 'linguistic confusion'. It's that no two people use a word to mean exactly the same thing. And once you have lumped together text from billions of different people, logical relations between these words become washed out, if there ever were any to begin with. I mean, it's not like people are all that good with logic. So I'm afraid that the already trained large language models will have to be retrained.
Ultimately, the problem with large language models is that the world is not made of words. At the deepest level we know of, the world is mathematics. If you want to build an intelligent AI you need to start with maths, and with models of physical reality, and then put words on top of that. What this all means is that companies which have poured a lot of money into large language models might never recover those expenses. The winners will eventually be those who build an AI on logical reasoning and models of the real world, like DeepMind. What you see here is a recent example in which they created a virtual mouse in a virtual environment that's moving with its own neural network modelled after a real mouse brain. This, I think, is how you'll get to really intelligent AI. Next up: virtual cats chasing virtual mice across your computer screen during important Zoom calls. DeepMind was acquired in 2014 by Google, so I haven't yet lost faith in my Google stocks.
The brief summary is that all those people working on AI need to think less about words and more about physics. Just wait until people start using AI to manage their stocks, it'll be great.
Artificial intelligence is really everywhere these days. If you want to learn more about how neural networks and large language models work, I recommend you check out the courses on Brilliant.org. All courses on Brilliant have interactive visualizations and come with follow-up questions. I found it to be a very effective way to learn something new. It really gives you a feeling for what's going on and helps you build general problem-solving skills. They cover a large variety of topics in science, computer science, and maths, from general scientific thinking to dedicated courses on differential equations or large language models. And they're adding new courses each month. It's a fast and easy way to learn, and you can do it whenever and wherever you have the time. Sounds good? I hope it does! You can try Brilliant yourself for free if you use my link brilliant.org/sabine. That way you'll get to try out everything Brilliant has to offer for a full 30 days, and you'll get 20% off the annual premium subscription. So go and give it a try; I'm sure you won't regret it. Thanks for watching, see you tomorrow.