How Far Can We Scale AI? Gen 3, Claude 3.5 Sonnet and AI Hype
Summary
TLDR: The script discusses the rapid advancements in AI video generation, exemplified by models like Runway Gen 3 and the anticipated Sora from OpenAI. It raises questions about the reliability of AI leaders and the scalability of language models, highlighting Claude 3.5 Sonnet's capabilities and the incremental improvements in AI. The potential for AI in fields like drug discovery is mentioned, with a cautionary note on separating AI hype from reality and the unpredictable impact of scaling and new research.
Takeaways
- 🌐 AI video generation is rapidly advancing and becoming more accessible, with models like Runway Gen 3 and Sora promising highly realistic outputs despite training on a small fraction of available video data.
- 🎥 The Luma Dream Machine offers an engaging way to experiment with AI-generated images, allowing users to interpolate between two images or generate new ones.
- 📈 There is skepticism about the continuous scaling of language models, with concerns about whether increased scale will necessarily lead to more accurate or reliable AI.
- 🤖 The release of advanced voice models, such as OpenAI's real-time advanced voice model, has been delayed to improve content detection and refusal capabilities.
- 📊 Benchmark results for AI models like Claude 3.5 Sonnet show improvements with increased compute, but the gains are incremental and not proportional to the scale.
- 🧠 The potential of AI in fields like biology and drug discovery is being discussed, with some suggesting AI could accelerate the rate of discoveries, though this is speculative.
- 🔍 There is a call for caution in interpreting benchmark results and a recognition that AI models still struggle with basic tasks, indicating that scale alone may not solve all issues.
- 📚 The script highlights the importance of metacognition in AI development, suggesting that understanding how to think about problems is as crucial as scaling computational power.
- 🤝 Open source AI models are seen as a way to encourage innovation and allow for diverse applications, contrasting with the idea of a single 'true AI'.
- 🚧 There is acknowledgment from AI leaders that the field is moving fast, and there is a need to ensure that understanding keeps pace with the capabilities of AI models.
- 🔮 Predictions about the future of AI are made with caution, acknowledging the many unknowns and the difficulty in forecasting exact outcomes or timelines for AI advancements.
Q & A
What is the current state of AI video generation technology?
-AI video generation technology is rapidly advancing, with models such as Runway Gen 3 becoming more accessible and generating increasingly realistic outputs. However, these models are likely trained on less than 1% of available video data, indicating that future generations could become even more realistic in a relatively short time.
What is the significance of the Luma Dream Machine in AI image generation?
-The Luma Dream Machine is a tool for AI image generation that allows users to create images or interpolate between two real ones. It represents a fun and engaging way for users to experiment with AI-generated visuals while waiting for the release of more advanced models.
What is the status of the video generation model called Sora from OpenAI?
-Sora is a highly anticipated video generation model from OpenAI, considered the most promising in its field. It is still unreleased, and comparisons with models like Runway Gen 3 suggest it benefits from larger-scale training and compute resources.
Why is the release of the real-time advanced voice mode from OpenAI delayed?
-The release of the real-time advanced voice mode from OpenAI has been delayed to improve the model's ability to detect and refuse certain content. It also addresses concerns about the model occasionally producing inaccurate or unreliable outputs.
What are some of the limitations of scaling AI models?
-While scaling AI models can lead to improvements, it does not necessarily solve all problems. For example, even with more data and compute, models may still struggle with basic tasks or produce hallucinations in language generation. The hope is that scaling will eventually lead to more accurate and reliable AI, but there is skepticism about whether this will fully address current limitations.
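The trade-off described above — paying much more for a modestly lower error rate when every answer must be checked anyway — can be sketched as a toy expected-cost calculation. All figures here are hypothetical illustrations, not measured rates:

```python
# Toy expected-cost model (hypothetical figures, not measured rates):
# if a human must check every answer regardless, a lower hallucination
# rate only saves the rework spent fixing the failures.

def expected_cost(hallucination_rate, check_cost=1.0, rework_cost=3.0):
    """Expected cost per answer: always pay to check, sometimes pay to fix."""
    return check_cost + hallucination_rate * rework_cost

cheap = expected_cost(0.08)   # model with an 8% hallucination rate
pricey = expected_cost(0.05)  # model with a 5% rate, at (say) 4x the price

# The per-answer saving is small relative to a 4x price difference
print(f"8% model: {cheap:.2f}, 5% model: {pricey:.2f}, saving: {cheap - pricey:.2f}")
```

Under these assumed costs the per-answer saving is about 0.09 units, which is the sense in which incremental accuracy gains may not justify a large multiple of the price.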
What is the current performance of Claude 3.5 Sonnet from Anthropic compared to other language models?
-Claude 3.5 Sonnet is a language model from Anthropic that is free, fast, and in certain domains more capable than comparable language models. It shows improvements in basic mathematical ability and general knowledge over models like GPT-4o and Google's Gemini 1.5 Pro, but the differences are not as large as the jump in compute might suggest.
What is the concept of 'metacognition' in the context of AI development?
-Metacognition in AI refers to the ability of models to understand how to think about a problem in a broad sense, to assess the importance of an answer, and to use external tools to check their answers. It represents a significant frontier in AI development, moving beyond simple scaling to more human-like thinking processes.
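As a rough illustration of this idea, here is a minimal self-check loop. `ask_model` is a hypothetical stand-in for a language-model call, and the "external tool" is simply Python arithmetic:

```python
import re

def ask_model(question):
    # Hypothetical stand-in for a language-model call; returns a wrong
    # answer on purpose so the external check has something to catch.
    return "2 + 2 = 5"

def check_with_tool(answer):
    """Verify simple 'a + b = c' claims using Python itself as the external tool."""
    m = re.match(r"(\d+)\s*\+\s*(\d+)\s*=\s*(\d+)", answer)
    if m is None:
        return None  # nothing checkable in this answer
    a, b, claimed = map(int, m.groups())
    return a + b == claimed

def answer_with_check(question):
    answer = ask_model(question)
    if check_with_tool(answer) is False:
        return f"{answer} (FAILED external check)"
    return answer

print(answer_with_check("What is 2 + 2?"))  # flags the bad answer
```

The point of the sketch is the control flow, not the checker: metacognition as described is the model deciding when an answer matters and which tool could verify it.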
What are some of the challenges in accurately assessing the capabilities of AI models?
-Assessing the capabilities of AI models is challenging due to the limitations and flaws in benchmark tests, which may not accurately reflect real-world performance. Additionally, models may struggle with basic tasks despite high scores on benchmarks, indicating a need for more nuanced evaluation methods.
How do AI lab leaders view the potential of AI in fields like biology and drug discovery?
-Some AI lab leaders, such as those from Anthropic, see the potential for AI to significantly impact fields like biology and drug discovery, possibly leading to new discoveries and even cures for diseases. However, these views are sometimes seen as overly optimistic or 'hype' by others in the industry.
What is the current sentiment regarding the hype around AI capabilities and scaling?
-There is a growing sentiment that the hype around AI capabilities and scaling may have gone too far, with some industry insiders expressing skepticism about the pace of progress and the reliability of benchmark results. The challenge is to separate hype from reality and to manage expectations about what AI can and cannot do.
What are the potential future developments in AI model scaling and algorithmic improvements?
-Future developments in AI are expected to include scaling to training runs costing billions, and eventually tens or hundreds of billions, of dollars, alongside continued improvements in algorithms and chip technology. If these advancements continue at their current pace, there is a possibility that by 2025-2027 AI models could be better than most humans at most tasks.
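The "two more turns of the crank" framing from the transcript can be sketched as back-of-envelope arithmetic. Each turn is assumed here to mean roughly one order of magnitude (10x) more training spend, starting from the ~$100 million run cost mentioned in the interview; the figures are illustrative assumptions, not forecasts:

```python
# Back-of-envelope scaling sketch (illustrative assumptions, not forecasts):
# one "turn of the crank" is taken to mean a 10x increase in training cost.

current_run_cost = 1e8  # ~$100M training runs, per the interview in the transcript

for turn in (1, 2):
    projected = current_run_cost * 10 ** turn
    print(f"after {turn} turn(s) of scaling: ~${projected:,.0f}")

# If benchmark gains stay roughly logarithmic in compute, each 10x buys a
# similar incremental improvement, which is the economic question raised above.
```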
Outlines
🌐 AI Video Generation and Model Scalability
The script discusses the rapid advancements in AI video generation, highlighting the capabilities of models like Runway Gen 3 and Sora from OpenAI. It emphasizes the transformative potential of these technologies on content consumption. The author also raises questions about the reliability of AI leaders and the limitations of scaling language models. Comparisons between different models, such as Runway Gen 3 and Sora, illustrate the improvements in video generation quality. The script also touches on the potential and limitations of AI in understanding and generating accurate world models.
📈 Incremental Improvements and the Economics of AI Scaling
This paragraph delves into the incremental improvements brought by scaling AI models, questioning the economic viability of investing in such advancements. It uses Claude 3.5 Sonnet from Anthropic as a case study, comparing its performance to other models and highlighting the diminishing returns on investment as models approach human-level intelligence. The author also explores the concept of 'metacognition' in AI, suggesting that understanding how to think about a problem is as important as scaling computational power.
🤖 AI's Evolving Capabilities and the Hype Surrounding Them
The script addresses the hype around AI's capabilities, contrasting it with the reality of current technology. It cites comments from industry leaders who express skepticism about the pace and impact of AI advancements. The author discusses the potential of AI in fields like drug discovery and cancer treatment, while also acknowledging the uncertainty and the need for caution in interpreting AI's capabilities. The paragraph concludes with a call for a balanced view between optimism and realism in the AI community.
🚀 The Future of AI: Scaling, Research, and Real-World Applications
In the final paragraph, the script contemplates the future trajectory of AI, considering both scaling and algorithmic improvements. It presents predictions from industry experts about the potential of AI to surpass human capabilities in various domains. The author also reflects on the unpredictability of AI's impact and the importance of separating hype from reality. The paragraph ends with an invitation for the audience to engage in further discussions on AI's future through the author's Patreon platform.
Keywords
💡AI Video Generation
💡Scaling
💡Language Models
💡Multimodal Training
💡Hype vs. Reality
💡Benchmarks
💡Metacognition
💡Emergent Behaviors
💡OpenAI
💡Anthropic
💡AI Ethics and Responsibility
Highlights
AI video generation is becoming increasingly tangible and accessible, set to transform content consumption.
Runway Gen 3, with AI-generated audio from Udio, represents the current state of AI video generation.
AI models are trained on a fraction of available video data, indicating potential for more realistic video generation.
Luma Dream Machine allows for image generation and interpolation, offering a fun user experience.
The Chinese model 'Kling' and the anticipated release of 'Sora' from OpenAI are compared for video generation capabilities.
Training models on more data doesn't guarantee accurate world models, as seen in Sora's generation capabilities.
The question of whether scale in AI will solve all issues is raised, with indications it may not.
Real-time advanced voice mode from OpenAI, showcased in the GPT-4o demo, has been delayed.
Claude 3.5 Sonnet from Anthropic is free, fast, and more capable than comparable models in certain domains.
Benchmarks for AI models have significant flaws and should be interpreted with caution.
The incremental benefits of scaling in AI models may not justify the increased cost and compute.
The Artifacts feature in Claude 3.5 Sonnet allows for interactive project work alongside the language model.
AI models still struggle with basic tasks despite increased scale, contradicting naive scaling hypotheses.
Bill Gates discusses the potential of scaling in AI and the need for understanding beyond just data access.
Metacognition in AI is identified as a significant frontier for development.
Mustafa Suleyman from Microsoft AI suggests that consistent action from AI models may not be achievable until GPT-6.
Sam Altman from OpenAI discusses the use of AI in cancer screening and potential future contributions to discovering cures.
Mark Zuckerberg expresses skepticism about the grandiose claims made by AI lab leaders.
Dario Amodei from Anthropic discusses the potential of AI in biology and drug discovery, with caveats about the uncertainty of predictions.
The pace of AI development is a challenge, with the need to ensure understanding keeps pace with capabilities.
AI models are compared to undergraduates, with the potential to reach professional levels in various fields.
Transcripts
artificial worlds generated by AI video
models have never been more tangible and
accessible and look set to transform how
millions and then billions of people
consume content and artificial
intelligence in the form of the new free
Claude 3.5 Sonnet is more capable than it
has ever been but I will draw on
interviews in the last few days to show
that there are more questions than ever
not just about the merits of continued
scaling of language models
but about whether we can rely on the
words of those who lead these giant AI
orgs but first AI video generation which
is truly on fire at the moment these
outputs are from Runway gen 3 available
to many now and to everyone apparently
in the coming days the audio by the way
is also AI generated this time from Udio
[Music]
and as you watch these videos remember
that the AI models that are generating
them are likely trained on far less than
1% of the video data that's available
unlike high-quality text data video data
isn't even close to being used up expect
generations to get far more realistic
and not in too long either and by the
way if you're bored while waiting on the
Gen 3 wait list do play about with the
Luma dream machine I've got to admit it
is pretty fun to generate two images or
submit two real ones and have the model
interpolate between them now those of
you in China have actually already been
able to play with a model of similar
capabilities called Kling but we are all
waiting on the release of Sora the most
promising video generation model of them
all from OpenAI here are a couple of
comparisons between Runway gen 3 and
Sora the prompts used in both cases are
identical and there's one example that
particularly caught my eye as many of us
may have realized by now simply training
models on more data doesn't necessarily
mean they pick up accurate world models
now I strongly suspect that Sora was
trained on way more data with way more
compute with its generation at the
bottom you can see that the dust emerges
from behind the car this neatly
demonstrates the benefits of scale but
still leaves open the question about
whether scale will solve all now yes it
would be simple to extrapolate a
straight line upwards and say that with
enough scale we get a perfect world
simulation but I just don't think it
will be like that and there are already
more than tentative hints that scale
won't solve everything more on that in
just a moment but there is one more
modality I am sure we were all looking
forward to which is going to be delayed
that's the real-time advanced voice mode
from OpenAI it was the star of the demo
of GPT-4o and was promised in the coming
weeks alas though it has now been
delayed to the fall or the Autumn and
they say that's in part because they
want to improve the model's ability to
detect and refuse certain content I also
suspect though like dodgy physics with
video generation and hallucinations with
the language generation they also
realized it occasionally goes off the
rails now I personally find this funny
but you let me know whether this would
be acceptable to release refreshing
coolness in the air that just makes you
want to smile and take a deep breath of
that crisp invigorating Breeze the Sun's
shining but it's that this lovely gentle
warmth that's just perfect for a light
jacket so either way we're definitely
going to have epic entertainment but the
question is what's next particularly
when it comes to the underlying
intelligence of models is it a case of
shooting past human level or diminishing
returns well here's some anecdotal
evidence with the recent release of
Claude 3.5 Sonnet from Anthropic it's
free and fast and in certain domains
more capable than comparable language
models this table I would say shows you
a comparison on things like basic
mathematical ability and general
knowledge compared to models like GPT-4o
and Gemini 1.5 Pro from Google I would
caution that many of these benchmarks
have significant flaws so decimal point
differences I wouldn't pay too much
attention to the most interesting
comparison I would argue is between
Claude 3.5 Sonnet and Claude 3 Sonnet
there is some evidence that Claude 3.5
Sonnet was trained on about four times as
much compute as Claude 3 Sonnet and you
can see the difference that makes
definitely a boost across the board but
it would be hard to argue that it's four
times better and in the visual domain it
is noticeably better than its
predecessor and than many other models
and I got Early Access so I tested it a
fair bit these kind of benchmarks test
reading charts and diagrams and
answering basic questions about them but
the real question is how much extra
compute and therefore money can these
companies continue to scale up and
invest if the returns are still
incremental in other words how much more
will you and more importantly businesses
continue to pay for these incremental
benefits after all in no domains are
these models reaching 100% and let me
try to illustrate that with an example
and as we follow this example ask
yourself whether you would pay four
times as much for a 5% hallucination
rate versus an 8% hallucination rate if
in both cases you have to check the
answer anyway let me demonstrate with
the brilliant new feature you can
use with Claude 3.5 Sonnet from Anthropic
it's called artifacts think of it like
an interactive project that you can work
on alongside the language model I dumped
a multi hundred page document on the
model and asked the following question
find three questions on functions from
this document and turn them into
clickable flash cards in an artifact
with full answers and explanations
revealed interactively it did it and
that is amazing but there's one slight
problem question one is perfect it's a
real question from the document
displayed perfectly and interactive with
the correct answer and explanation same
thing for question two but then we get
to question three where it copied the
question incorrectly worse than that it
rejigged and changed the answer options
also is there a real difference between
q^2 and negative q^2 when it claimed that
negative q^2 is the answer now you might
find this example trivial but I think
it's revealing don't get me wrong this
feature is immensely useful and it
wouldn't take me long to Simply tweak
that third question and by the way
finding those three examples strewn
across a multi hundred page document is
impressive even though it would save me
some time I would still have to
diligently check every character of
claude's answer and at the moment as I
discussed in more detail in my previous
video there is no indication that scale
will solve this issue now if you think
I'm just quibbling and benchmarks show
the real progress well here is the
reasoning lead at Google deepmind
working on their Gemini series of models
someone pointed out a classic reasoning
error made by Claude 3.5 Sonnet and Denny
Zhou said this love seeing tweets like
this rather than those on llms with phds
superhuman intelligence or fancy results
on leaked benchmarks I'm definitely not
the only one skeptical of Benchmark
results and an even more revealing
response to Claude 3.5's basic errors
came from OpenAI's Noam Brown I think
it's more revealing because it shows
that those AI Labs anthropic and open AI
had their hopes slightly dashed based on
the results they expected in reasoning
from multimodal training Noam Brown said
frontier models like GPT-4o and now
Claude 3.5 Sonnet may be at the level of a
quote smart high schooler mimicking the
words of Mira Murati CTO of OpenAI in
some respects but they still struggle on
basic tasks like Tic-tac-toe and here's
the key quote there was hope that native
multimodal training would help with this
kind of reasoning but that hasn't been
the case that last sentence is somewhat
devastating to the naive scaling
hypothesis there was hope that native
multimodal training on things like video
from YouTube would teach models a world
model it would help but that hasn't been
the case now of course these companies
are working on Far More Than Just naive
scaling as we'll hear in a moment from
Bill Gates but it's not like you can
look at the benchmark results on a
chart and just extrapolate forwards
here's Bill Gates promising two more
turns of scaling I think he means two
more orders of magnitude but notice how
he looks skeptical about how that will
be enough the big Frontier is not so
much scaling we have probably two more
turns of the crank on
scaling where by accessing video data
and getting very good at synthetic
data that we can scale up probably
you know two more times that's not the
most interesting Dimension the most
interesting Dimension is is what I call
metacognition where understanding how to
think about a problem in a broad sense
and step back and say okay how important
is this answer how could I check my
answer you know what external tools
would help me with this so we're get
we're going to get the scaling benefits
but at the same time the various
actions to change the underlying
reasoning algorithm from the trivial that we
have today to more humanlike
metacognition that's the big
Frontier that uh it's a little hard to
predict how quickly that'll happen you
know I've seen that we will make
progress on that next year but we won't
completely solve it uh for some time
after that and there were others who
used to be incredibly bullish on scaling
that now sound a little different here's
Microsoft AI's CEO Mustafa Suleyman
perhaps drawing on lessons from the
mostly defunct inflection AI that he
used to run saying it won't be until GPT
6 that AI models will be able to follow
instructions and take consistent action
there's a lot of cherry-picked examples
that are impressive you know on Twitter
and stuff like that but to really get it
to consistently do it in novel
environments is is pretty hard and I
think that it's going to be not one two
orders of magnitude more computation of
training the models um so not GPT-5 but
more like GPT-6 scale models so I think
we're talking about two years before we
have systems that can really take action
now based on the evidence that I put
forward in my previous video let me know
if you agree with me that I still think
that's kind of naive reasoning
breakthroughs will rely on new research
breakthroughs not just more scale and
even Sam Altman said as much about a year
ago saying the era of ever more
scaling of parameter count is over now
as we'll hear he has since contradicted
that saying current models are small
relative to where they'll be but at this
point you might be wondering about
emergent behaviors don't certain
capabilities just spring out when you
reach a certain scale well I simply
can't resist a quick plug for my new
Coursera series that is out this week the
second module covers emergent behaviors
and if you already have a Coursera
account do please check it out it'd be
free for you and if you were thinking of
getting one there will be a link in the
description anyway here's that quote
from Sam Altman somewhat contradicting the
comments he made a year ago models he
says get predictably better with scale
we're still just like so early in
developing such a complex system um
there's data issues There's algorithmic
issues uh the models are still quite
small relative to what they will be
someday and we know they get predictably
better but this was the point I was
trying to make at the start of the video
as I argued in my previous video I think
we're now at a time in AI where we
really have to work hard to separate the
hype from the reality simply trusting
the words of the leaders of these AI
Labs is less advisable than ever and of
course it's not just Sam Altman here's the
commitment from Anthropic led by Dario
Amodei back last year they described why
they don't publish their research and
they said it's because we do not wish to
advance the rate of AI capabilities
progress but their CEO 3 days ago said
AI is progressing fast due in part to
their own efforts to try and keep Pace
with the rate at which the complexity of
the models is increasing I think this is
one of the biggest challenges in the
field the field is moving so fast
including by our own efforts that we
want to make sure that our understanding
keeps pace with our abilities our
capabilities to produce powerful models
he then went on to say that today's
models are like undergraduates which if
you've interacted with these models
seems pretty harsh on undergraduates if
we go back to the analogy of like
today's models are like
undergraduates uh you know let's say
those models get to the point where you
know they're kind of you know graduate
level or strong professional level think
of biology and Drug Discovery think of
um a model that is as strong as you know
a Nobel prize winning scientist or you
know the head of drug discovery at a major
pharmaceutical company now I don't know
if he's placing that on a naive trust in
benchmarks or whether he is deliberately
hyping and then later in the
conversation with the guy who's in
charge of the world's largest Sovereign
wealth fund he described how the kind of
AI that anthropic works on could be
instrumental in curing cancer I look at
all the things that have been invented
you know if I look back at biology you
know crisper the ability to like edit
genes if I look at um you know C
therapies which have cured certain kinds
of cancer
there's probably dozens of discoveries
like that lying around and if we had a
million copies of an AI system that are
as knowledgeable and as creative about
the field as all those scientists that
invented those things then I think the
rate of of those discoveries could
really proliferate and you know some of
our really really
longstanding diseases uh you know could
be could be addressed or even cured now
he added some caveats of course but that
was a claim echoed on the same day
actually I think by OpenAI's Sam Altman one
of our partners color health is now
using uh GPT-4 for cancer screening and
treatment plans and that's great and
then maybe a future version will help uh
discover cures for cancer other AI lab
leaders like Mark Zuckerberg think those
claims are getting out of hand but you
know part of that is the open source
thing too so that way other companies
out there can create different things
and people can just hack on it
themselves and mess around with it so I
guess that's a pretty deep worldview
that I have and I don't know I find it a
pretty big turnoff when people in the um
in the tech industry kind of talk about
building this one true AI it's like it's
almost as if they they kind of think
they're creating God or something and
it's it's like it's just that's not
that's not what we're doing that's I
don't think that's how this plays out
implicitly he's saying that companies
like OpenAI and Anthropic are getting
carried away and later though in that
interview the CEO of anthropic admitted
that he was somewhat pulling things out
of his hat when it came to biology and
actually with scaling you know let's say
you know you extend people's productive
ability to work by 10 years right that
could be you know one sixth of the whole
economy do you think that's a realistic
Target I mean again like I know some
biology I know something about how the
AI models are going to happen I wouldn't
be able to tell you exactly what would
happen but like I can tell a story where
it's possible so so 15% and when will be
so when could we have added the
equivalent of 10 years to our life I
mean how long what what's the time frame
again like you know this involves so
many unknowns right if I if I try and
give an exact number it's just going to
sound like hype but like a thing I could
a thing I could imagine is like I don't
know like two to three years from now we
have ai systems that are like capable of
making that kind of Discovery 5 years
from now those those discoveries are
actually being made and 5 years after
that it's all gone through the
regulatory apparatus and and really so
you know we're talking about more we're
talking about you know a little over a
decade but really I'm just pulling
things out of my hat here like I don't
know that much about drug Discovery I
don't know that much about biology and
frankly although although I invented AI
scaling I don't know that much about
that either I can't predict it the truth
of course is that we simply don't know
what the ramifications will be of the
scaling and of course of new research
regardless these companies are pressing
ahead uh right now $100 million there
are models in training today that are
more like a billion I think if we go to
10 or 100 billion and I think that will
happen in 2025 2026 maybe 2027 and the
algorithmic improvements continue a pace
and the chip improvements continue a
pace then I think there there is in my
mind a good chance that by that time
we'll be able to get models that are
better than most humans at most things
but I want to know what you think are we
at the dawn of a new era in
entertainment and intelligence or has
the hype gone too far if you want to
hear more of my Reflections do check out
my podcasts on patreon on AI insiders
you could also check out the dozens of
bonus videos I've got on there and the
live meetups arranged via Discord but
regardless I just want to thank you for
getting all the way to the end and
joining me in these wild times have a
wonderful day