New AI Chip, GPT-4o, Claude 3.5, SpaceX Double Landing, AI Video Games
Summary
TL;DR: The video script discusses recent AI advancements, focusing on Etched's new AI chip 'Sohu', which promises to outperform GPUs in processing speed and cost. It also covers OpenAI's delayed voice capabilities, Hugging Face's new LLM leaderboard highlighting Qwen 72B as a top performer, and Claude 3.5 Sonnet's coding prowess. The script wraps up with AI-generated videos resembling real-time gaming and Apple's decision not to integrate Meta's AI models into Siri due to privacy concerns.
Takeaways
- 🚀 A new AI chip company, Etched, has developed a chip called Sohu, specialized for Transformer models, which claims to generate over 500,000 tokens per second running Llama 70B.
- 🔋 Sohu is said to be more efficient than GPUs, with one server equipped with eight Sohu chips replacing 160 Nvidia H100s, though it's not yet in production.
- 💡 The chip's specialization for Transformer models is likened to how ASICs were created for Bitcoin mining, suggesting a shift towards specialized hardware for AI tasks.
- 📉 GPUs are not improving significantly, with only a 15% improvement in compute density over four years, indicating a need for specialized chips to enhance performance.
- 🌐 OpenAI's anticipated voice capabilities are delayed to further improve the model's content detection and refusal abilities, and its infrastructure scalability.
- 🎙️ OpenAI's advanced Voice Mode is expected to roll out in alpha to a small group of users in late June, with a full rollout planned for the fall.
- 🏆 Hugging Face has launched a new Open LLM Leaderboard, with Qwen 72B emerging as the top performer, indicating the strength of Chinese open models in AI.
- 📊 There's a concern that AI model makers are focusing too much on public benchmarks, potentially at the expense of overall model performance.
- 🥇 Claude 3.5 Sonnet has achieved the top spot in coding and hard prompts, showcasing its capabilities against other leading models like GPT-4o and Gemini 1.5 Pro.
- 🎮 AI-generated video content, resembling a Call of Duty game, demonstrates the potential future of video games, though real-time generation requires significant computational power.
- 🔄 Reports suggest that Apple was in talks with Meta AI to integrate Llama 3 into Siri but has since decided against it due to privacy concerns, despite Apple's capacity to host the model themselves.
Q & A
What is the new AI chip company mentioned in the script called, and what is its claim to fame?
-The new AI chip company is called Etched, and it claims to be able to generate over 500,000 tokens per second running Llama 70b, with a chip named Sohu that is specialized for Transformer models.
How does the Sohu chip compare to Nvidia's H100 in terms of performance and efficiency?
-One server with eight Sohu chips is said to replace 160 Nvidia H100s. Sohu is more than 10 times faster and cheaper than Nvidia's next-generation Blackwell GPUs, running over 500,000 Llama 70b tokens per second compared to H100's 23,000 tokens per second.
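As a sanity check, the speedup ratios implied by these claimed figures can be computed directly. This is a quick sketch using the numbers as quoted in the video; the figures themselves are Etched's claims and are not independently verified.

```python
# Speedups implied by the throughput figures claimed in the video
# (one Sohu server vs. single Nvidia GPUs; numbers not independently verified).
sohu_server_tps = 500_000  # Llama 70B tokens/sec, one server with eight Sohu chips
h100_tps = 23_000          # Llama 70B tokens/sec, Nvidia H100
b200_tps = 45_000          # Llama 70B tokens/sec, Nvidia Blackwell B200

print(f"vs H100: {sohu_server_tps / h100_tps:.1f}x")  # 21.7x, close to the quoted "20x"
print(f"vs B200: {sohu_server_tps / b200_tps:.1f}x")  # 11.1x, close to the quoted "10x"
```

The quoted "20 times" and "10 times" figures line up with these ratios once rounded.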
What does the script suggest about the future of AI models and hardware specialization?
-The script suggests that within a few years, every large AI model will run on custom chips, which are more than 10 times faster and cheaper than current GPUs, indicating a shift towards specialized hardware for AI models.
Why is OpenAI delaying the release of its advanced voice mode?
-OpenAI is delaying the release of its advanced voice mode to improve the model's ability to detect and refuse certain content, and to further enhance the user experience and infrastructure to scale to millions of users while maintaining real-time responses.
What is the significance of the new Open LLM Leaderboard announced by Hugging Face's CEO?
-The new Open LLM Leaderboard is significant as it provides a comprehensive evaluation of major open LLMs, with Qwen 72B emerging as the top performer, indicating a shift in the dominance of AI models and the importance of specialized benchmarks.
What is the current status of the integration talks between Apple and Meta's AI models for Siri?
-According to recent reports, Apple is no longer considering integrating Meta's AI models into Siri due to privacy concerns, despite previous talks suggesting otherwise.
What does the script imply about the potential impact of AI-generated content on the future of video games?
-The script implies that AI-generated content, as demonstrated by the realistic AI-rendered video, could revolutionize the video game industry by enabling highly realistic and immersive gaming experiences.
What is the script's perspective on the importance of specialized AI chips like Sohu for the future of AI development?
-The script highlights the importance of specialized AI chips like Sohu for the future of AI development, suggesting that they will become the standard for running large AI models due to their superior performance and cost-effectiveness.
What is the script's view on the current state of GPUs and their limitations in AI model performance?
-The script suggests that GPUs are not improving at a rate that matches the needs of AI model performance, with compute density only improving by 15% in the past four years, indicating a need for more specialized hardware.
How does the script describe the potential of AI in creating realistic video content, as shown in the AI-generated video?
-The script describes the potential of AI in creating realistic video content as impressive and mind-blowing, with the AI-generated video showcasing high-quality visuals and sound that are almost indistinguishable from real footage.
What is the script's opinion on the role of benchmarks in evaluating AI models?
-The script suggests that benchmarks are crucial for evaluating AI models, but there is a concern that model makers might be focusing too much on major public benchmarks at the expense of overall model performance.
Outlines
🚀 Revolutionary AI Chip 'Sohu' and Its Impact on the Future of AI
The script discusses the emergence of a new AI chip company called Etched, which has developed a chip named 'Sohu' capable of generating over 500,000 tokens per second using the Llama 70B model. This chip is specialized for Transformer models, similar to how ASICs were created for Bitcoin mining. 'Sohu' promises performance that surpasses Nvidia's H100 GPUs, suggesting a future where AI models will predominantly run on custom chips. The company emphasizes the importance of specialization in improving performance and reducing costs, and they predict a shift towards hardware tailored for AI models, potentially rendering general-purpose GPUs obsolete for AI tasks.
🎙️ OpenAI's Delayed Voice Capabilities and the New Open LLM Leaderboard
The script addresses the delay in OpenAI's voice capabilities, which were expected to launch but have been postponed to ensure high safety and reliability standards. OpenAI plans to roll out advanced Voice Mode in alpha to a select group of users and aims for broader access by fall. Additionally, the CEO of Hugging Face introduces a new leaderboard for evaluating major open LLMs, revealing that Qwen 72B outperforms others, and there's a concern that model makers might be focusing too much on benchmark scores rather than overall performance. The script also highlights the impressive performance of Anthropic's Claude 3.5 Sonnet model, which has secured top spots in various categories.
🎮 AI's Role in the Future of Video Games and Apple's Decision on AI Integration
The script explores the potential of AI in creating realistic and immersive video game experiences, as demonstrated by an AI-generated video that mimics the quality of a Call of Duty game. It also discusses the recent reports about Apple's consideration to integrate Meta AI's Llama 3 model into Siri, which was later dismissed due to privacy concerns. The summary touches on the implications of such integrations and Apple's stance on privacy, suggesting that Apple might have chosen to maintain control over user data by not proceeding with the integration.
Mindmap
Keywords
💡AI chips
💡Transformer models
💡GPUs
💡Nvidia H100s
💡ASICs
💡OpenAI
💡ChatGPT Plus
💡Hugging Face
💡Claude 3.5
💡AI-generated video
💡Privacy concerns
Highlights
AI chip company 'Etched' claims its new chip 'Sohu' can generate over 500,000 tokens per second running Llama 70b, outperforming Nvidia GPUs.
Sohu is the first specialized chip for Transformer models, similar to how ASICs were created for Bitcoin mining.
Etched's Sohu chip is not yet in production, but promises more than 10 times the performance of Nvidia's next-generation GPUs.
OpenAI's voice capabilities are delayed for further improvements, including the model's ability to detect and refuse certain content.
OpenAI plans to roll out advanced Voice Mode in alpha to a small group of users in late June, with full access expected in the fall.
Hugging Face's CEO introduces a new Open LLM Leaderboard, revealing that Qwen 72B is the top-performing model.
The new Sohu chip is expected to push every large AI model onto custom chips within a few years, marking a shift toward AI hardware specialization.
AI builders may be focusing too much on the main public benchmarks at the expense of overall model performance.
Claude 3.5 Sonnet secures the top spot in coding and hard prompts, showcasing its competitive edge against other models like GPT-4o.
Anton Osika demonstrates Claude 3.5's performance against GPT-4o in coding, highlighting its success in task and full project completion.
Elon Musk shares a video of two rockets landing for reuse, showcasing a significant achievement in space engineering.
AI-generated video content is becoming increasingly realistic, hinting at the future of video games as predicted by Nvidia's CEO.
Apple was reportedly in talks with Meta AI to integrate Llama 3 into Siri but has since dropped the idea due to privacy concerns.
The decision to not integrate Meta AI's models into iPhones may also be influenced by Apple's criticism of Meta's privacy practices.
AI labs are spending millions optimizing kernels for Transformers, indicating a significant investment in AI model efficiency.
Startups are utilizing special Transformer software libraries to enhance features like speculative decoding and tree search.
Once Sohu hits the market, Etched expects the industry to reach a point of no return in favor of Transformer-specialized hardware.
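The speculative decoding mentioned in these highlights can be sketched in miniature. This is a toy illustration, not the TensorRT-LLM or vLLM API: `draft_model` and `target_model` are hypothetical deterministic stand-ins, and real systems compare token probabilities with an accept/reject rule rather than exact matches. The point is the control flow: a cheap draft model proposes several tokens per step, and the expensive target model verifies them, keeping the agreeing prefix.

```python
# Toy sketch of greedy speculative decoding (hypothetical stand-in "models").
# A real system samples from probability distributions and uses an
# accept/reject rule; this only illustrates the propose-then-verify loop.

def draft_model(prefix):
    # Cheap model: agrees with the target on most tokens, but is wrong after "c".
    return {"a": "b", "b": "c", "c": "x", "x": "y"}.get(prefix[-1], "a")

def target_model(prefix):
    # Expensive model: defines the correct next token.
    return {"a": "b", "b": "c", "c": "d", "d": "a"}.get(prefix[-1], "a")

def speculative_decode(prompt, n_tokens, k=4):
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        # 1) The draft proposes k tokens autoregressively (cheap calls).
        ctx = out[:]
        proposed = []
        for _ in range(k):
            ctx.append(draft_model(ctx))
            proposed.append(ctx[-1])
        # 2) The target verifies: keep the agreeing prefix, and emit the
        #    target's own token at the first mismatch.
        for tok in proposed:
            correct = target_model(out)
            out.append(correct)
            if tok != correct:
                break
    return "".join(out[len(prompt):][:n_tokens])

print(speculative_decode(["a"], 6))  # "bcdabc", identical to pure target decoding
```

Every accepted draft token saves one expensive target step, so the draft's agreement rate determines the speedup, which is why inference stacks pair it with fast specialized hardware.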
Transcripts
Again, it's just been a few days since my last news video, and so much has happened. We're going to be talking about AI chips, OpenAI, Apple, Claude 3.5's dominance, and what looks to be the future of video games. Let's get into it.

The first story today is about a new AI chip company called Etched, and it claims to be able to generate over 500,000 tokens per second running Llama 70B. This chip is called Sohu, I believe, and it lets you build products that are impossible on GPUs. One server with eight of these replaces 160 Nvidia H100s, and this is all assuming this is true, because I don't think this is actually in production yet. Now, here's how it's able to do that: Sohu is the first specialized chip (ASIC) for Transformer models. Now, if you remember, in the earlier days of crypto mining, specific chips were created for Bitcoin mining, and those chips were made just to crack that algorithm. This chip is specific to the Transformer model, and they say that's how we get way more performance. Sohu can't run CNNs (convolutional neural networks), LSTMs, SSMs, or any other AI model. Every major AI product (ChatGPT, Claude, Gemini, Sora) is powered by Transformers. Within a few years, every large AI model will run on custom chips. It is more than 10 times faster and cheaper than even Nvidia's next-generation Blackwell GPUs: one Sohu server runs over 500,000 Llama 70B tokens per second, 20 times more than an H100 (23,000 tokens per second) and 10 times more than a B200 (45,000 tokens per second). And here are the benchmarks. Now, interestingly enough, they do not compare themselves to Groq, which I wonder why.

And it says here: GPUs aren't getting better, they're just getting bigger. In the past four years, compute density has only improved by 15%. Next-gen GPUs are now counting two chips as one card to double their performance. With Moore's law slowing, the only way to improve performance is specialization. And they specifically call out Bitcoin mining here: when Bitcoin miners hit the market in 2014, it became cheaper to throw out GPUs than to use them to mine Bitcoin, and that's exactly what happened; everybody started transitioning to ASICs. They go on to say Transformers have a huge moat: "We believe in the hardware lottery: the architecture that wins is the one that runs fastest and cheapest on hardware." I'm not really sure what they meant by hardware lottery in this context; typically the hardware lottery means that when you get a GPU, some of them are just made better than others. AI labs spend hundreds of millions of dollars optimizing kernels for Transformers. Startups use special Transformer software libraries like TensorRT-LLM and vLLM, which offer features built on Transformers like speculative decoding and tree search. "Once Sohu hits the market, we will reach the point of no return: Transformer killers will need to run faster on GPUs than Transformers run on Sohu. If that happens, we'll build an ASIC for that too." And on their website: "scale is all you need for superintelligence". So if you want to learn more about Sohu and the technical details, check out the website, etched.com, and I'll drop a link to that in the description below. Next, it seemed like
the OpenAI voice capabilities were taking forever to come out, and now we know why: they're actually delaying them. As a reminder, the GPT-4o voice capabilities were essentially that movie Her. "Hey ChatGPT, how are you doing?" "I'm doing fantastic, thanks for asking. How about you?" All right, so: "We're sharing an update on the advanced Voice Mode we demoed during the Spring Update, which we remain very excited about. We plan to start rolling this out in alpha to a small group of ChatGPT Plus users in late June, but need one more month to reach our bar for launch. For example, we're improving the model's ability to detect and refuse certain content." This is disappointing. This is very similar to what it felt like when they announced and demoed Sora, and then we basically have no idea when that's actually coming; I'm guessing they were able to jailbreak it. "We're also working on improving the user experience and preparing our infrastructure to scale to millions while maintaining real-time responses. As part of our iterative deployment strategy, we'll start the alpha with a small group of users to gather feedback and expand based on what we learn. We are planning for all Plus users to have access in the fall." So that's great, right around the corner. "Exact timelines depend on meeting our high safety and reliability bar. We are also working on rolling out the new video and screen sharing capabilities we demoed separately, and will keep you posted on that timeline. ChatGPT's advanced Voice Mode can understand and respond with emotions and non-verbal cues, moving us closer to real-time, natural conversations with AI." Great. And Bilawal responds: not sure what the point of flexing a live demo was if the intention was to delay the launch like this; clearly the product wasn't ready. I could probably say the same thing about Microsoft's Recall, which was recalled. Both of these things I was incredibly excited about, but now we have to wait. Next, Clem Delangue, the CEO and co-founder
of Hugging Face, has announced a new Open LLM Leaderboard: "We burned 300 H100s to rerun new evaluations like MMLU-Pro for all major open LLMs. Some learnings: interestingly, Qwen 72B is the king, and Chinese open models are dominating overall." You all know that I tested Qwen 72B, and yes, it did perform very well; I would like to see a coding-specific flavor of Qwen 72B. "Previous evaluations have become too easy for recent models." That is something that I found, at least with my own LLM rubric, and I asked you for suggestions for new tests in my previous video. Keep them coming; drop them in the comments below if you have suggestions for new tests that I should use going forward on these open-source models. "There are indications that AI builders have started to focus on the main evaluations too much, at the expense of model performance on other ones." Now, interestingly, a lot of you have mentioned and suggested that some of these model makers might actually use my tests to train their models. I didn't really think that was true, because I'm not actually doing a formal benchmark and it's just a little YouTube channel, but it seems like open-source model makers are focusing on these major public benchmarks. "And bigger is not always smarter." That is something I've been thinking a lot about lately. Think about this: the Llama 3 8B model is substantially better than the Llama 2 7B model, so just slightly bigger but much better performance. And they've decided to cover the following general tasks: knowledge testing, reasoning on short and long context, complex math abilities, and tasks well correlated with human preference like instruction following. To do that, they have these benchmarks: MMLU-Pro, GPQA, MuSR (which I've never heard of; multistep soft reasoning, very cool), MATH, IFEval, and BBH. And here are the results. As you can see here, Qwen2 72B Instruct is number one, Meta Llama 3 70B Instruct number two, Phi-3 Medium 4K Instruct number three, Yi 1.5 34B number four, and so on. So if you want to learn more, read about the details, or see the benchmarks themselves, I'll drop a link to Hugging Face and this specific page in the description below. And speaking about benchmarks and leaderboards, per lmsys.org, Claude 3.5 Sonnet has just made a huge
leap, securing the number one spot in the coding arena and the hard prompts arena, and number two on the overall leaderboard. The new Sonnet has surpassed Opus at five times lower cost, and is competitive with frontier models GPT-4o and Gemini 1.5 Pro across the board. Now, in a previous video I tested Claude 3.5 Sonnet, and yes, it is the best model I've ever tested; in fact, that was the video in which I asked for new tests, because it completely demolished mine. So huge congrats to Anthropic: it is now number one in coding, number one in hard prompts, and number two overall. And remember, this is the Sonnet model, 3.5 Sonnet; Opus, their largest model, is still coming as 3.5 Opus. That is very cool to see. I'm now paying for Claude, and it is my go-to model.

Anton Osika has also shown Claude 3.5's performance versus GPT-4o on coding. On build success, GPT-4o wins; Claude 3.5 wins on task success and full project success. Let's see what else he says about it. Code that compiles: fails a bit more for Claude, a small difference. Passing human QA: fails significantly less for Claude. Even more interesting are the realistic benchmarks and qualitative results. He says Claude is more verbose, which is nice for long pieces of code but makes generation slower. That's interesting, because I guess the overall generation takes longer since the code is longer, which might make sense, but from what I've noticed the actual tokens per second seem to be faster with Claude. Verbosity is also "generally not desirable in an agent setting", which is interesting and I haven't tested; if you want to see me test that, let me know in the comments. "Does not follow instructions in large prompts as reliably as GPT-4o; it tends to miss crucial instructions, for example how to format output. We experimentally switched to Claude for a write-long-code sub-agent at Lovable; unfortunately, we decided to revert." One of the things I'm going to be testing going forward, for all the models, is the ability to format output in specific formats, really JSON. Next, Elon Musk posted an
absolutely incredible video of two rockets landing for reuse at the same time. Look at this video; this is the stuff of science fiction. Imagine how much science and engineering had to go into getting these massive rockets to come back down to Earth and land for later reuse. So congratulations to SpaceX, a really cool accomplishment; keep up the amazing work. And I see Matt Wolfe commented "this will never not be impressive to me". Completely agreed.

Next, Twitter user Chubby, who always posts really interesting stuff, posted this video that is rendered completely using AI. This is not Unreal Engine, this is actually AI, and the quality is mind-blowing. What we're looking at is an AI-generated video of what looks to be kind of a Call of Duty-ish game. The sound is AI-generated, the visuals are AI-generated. I can see a little bit of morphing, but overall it looks incredible and very, very realistic. Now, the amount of compute it would take to do this in real time is going to be tremendous, so we're not quite there yet, but as Jensen, the CEO of Nvidia, has said, this is truly the future of video games and will really take video games to the next level. And just because people were doubting that this was an actual video game created by AI, here is another clip of it. Let's look at this one. Here's another version; again, you can see a little bit of morphing, a little bit of clipping here and there, but overall it looks really good, and it again is kind of like a Call of Duty-ish game. All right, and last, in the quickest
turnaround in reporting ever: just a couple of days ago it was reported that Apple was in talks with Meta AI to integrate Llama 3 into Siri, the same way that OpenAI is integrated into Siri, so not a deep integration, just simply an API call. But just a day or two later, it is now reported that they are not considering that anymore, specifically over privacy concerns. That is really surprising, because if Apple actually took the Llama model and hosted it themselves, they would be able to control the privacy; so maybe that means they were depending on Meta to actually operate the inference endpoint. Days after the Wall Street Journal reported that Apple and Meta were in talks to integrate the latter's AI models, Bloomberg's Mark Gurman said that the iPhone maker was not planning any such move, and that they were in talks with multiple companies to explore integration; I know they were also in talks with Google, or at least it was reported as such. They shelved the idea of putting Meta AI models on iPhones over privacy concerns, and the report also noted that partnering with the social networking company won't do a lot of good for Apple's image, given that the Cupertino-based company has continuously criticized Meta's privacy practices. Very true, but again, I think they could have easily hosted the model themselves, them being Apple and all, and then the privacy concerns would be under their control, so I'm not sure why they didn't do that.

So that's it for today. Thanks for watching. If you enjoyed this video, please consider giving a like and subscribing, and I'll see you in the next one.