NVIDIA'S HUGE AI Chip Breakthroughs Change Everything (Supercut)
Summary
TL;DR: Huang unveils Nvidia's new Grace Hopper and H100 chips, representing a tipping point in accelerated computing and AI. Powering Nvidia's end-to-end AI platform, these chips enable breakthroughs in language models and vast improvements in performance and efficiency. Huang also announces partnerships and modular server designs to bring AI capabilities to the enterprise and every industry. Ultimately, he conveys a vision where AI empowers every company to become an 'intelligence producer', with AI factories fueled by Nvidia's accelerated computing platform.
Takeaways
- Nvidia has reached a tipping point in accelerated computing and generative AI.
- Software is now programmed by engineers working with AI supercomputers.
- Accelerated computing is reinventing software from the ground up.
- Nvidia has accelerated computer graphics 1,000x in 5 years.
- AI supercomputers are a new type of factory for intelligence production.
- Grace Hopper, Nvidia's new accelerated computing superchip, is now in full production.
- The new DGX GH200 AI supercomputer delivers 1 exaflop of AI performance.
- Every data center will be re-engineered for accelerated computing.
- Nvidia AI Enterprise makes accelerated computing enterprise-grade.
- Nvidia is extending accelerated computing and AI from cloud to edge.
Q & A
What are the two fundamental transitions happening in the computer industry according to Huang?
-The two fundamental transitions are: 1) CPU scaling has ended, so the ability to get 10x more performance every 5 years has ended. 2) A new way of doing software via deep learning and AI was discovered, reinventing computation from the ground up.
What is accelerated computing and how does it work?
-Accelerated computing uses GPUs and other specialized hardware to dramatically speed up workloads like AI, data analytics, and graphics. It allows much higher performance and efficiency than general purpose CPUs.
What is the Grace Hopper superchip?
-The Grace Hopper Superchip is Nvidia's new accelerated computing processor for AI and high performance computing, combining a Grace CPU and a Hopper GPU. It has nearly 200 billion transistors and nearly 600GB of high-speed memory coherent between the CPU and GPU for efficiency.
What is the H100 and why is it important?
-The H100 is Nvidia's latest AI accelerator GPU. With advanced features like the Transformer Engine, it delivers giant leaps in AI performance to power the next wave of generative AI.
What is the goal of the Nvidia MGX system?
-The Nvidia MGX is a new open and modular server architecture optimized specifically for accelerated computing. It allows flexible configurations for AI and HPC workloads.
What does Huang mean when he says the computer is now the data center?
-He means that with the rise of cloud computing and AI supercomputers, the focus has shifted from optimizing single servers to building integrated data centers with networks of servers acting as one giant computer.
What is Spectrum-X?
-Spectrum-X is Nvidia's new high performance data center interconnect platform. It helps optimize AI supercomputing clusters using advanced networking and software.
How does Nvidia plan to bring accelerated computing to enterprises?
-Nvidia aims to make accelerated computing enterprise-ready through solutions like the MGX reference architecture and the Nvidia AI Enterprise software stack for managing AI infrastructure securely.
What is the benefit of using accelerated computing in the cloud?
-Accelerated computing in the cloud can provide up to 24x higher throughput and 20x lower cost compared to CPU-only platforms for workloads like image processing.
What role do Grace Hopper Superchips play in Nvidia's plan for AI supercomputing?
-Nvidia will use Grace Hopper Superchips as the base components to build exascale AI supercomputers made up of 256 connected GH200 systems, delivering over 1 exaflop/s of AI performance.
Outlines
Introducing the H100 AI Supercomputer
Jensen Huang introduces the H100 AI supercomputer, which has over 35,000 components and 8 Hopper GPUs. He shows the H100 system board and compute tray, calling it the world's single most expensive computer at $200,000. He explains how the H100 replaces an entire room of computers and accelerates workloads significantly.
Accelerated Computing Has Reached a Tipping Point
Huang explains the two fundamental computing industry transitions happening: CPU scaling has ended, but deep learning allows giant leaps in AI every two years. He shows how accelerated computing with GPUs processes an AI workload much faster and cheaper than CPUs. Accelerated computing now has high utilization across clouds, data centers, and applications.
AI Will Impact Every Industry
Huang discusses how easy access to large language models closes the digital divide and makes everyone a programmer. He explains why AI will enhance all existing applications, not just create new ones. The rate of progress will be fast since it's easy to use across industries.
Introducing Grace Hopper - The First Accelerated AI Supercomputer Processor
Huang unveils Grace Hopper, the world's first accelerated processor for AI, with nearly 200 billion transistors and nearly 600GB of memory coherent between the CPU and GPU. He shows how 256 Grace Hopper chips connected together form an exascale system aimed at advancing generative AI.
Announcing the NVIDIA MGX - A Modular Accelerated Computing Server Architecture
Huang introduces NVIDIA MGX, an open server architecture specification optimized for accelerated computing. It's modular and flexible to address diverse data center requirements across industry verticals and make AI accessible.
Spectrum-X - A New Interconnect to Make Any Data Center an AI Data Center
Huang introduces Spectrum-X, a high-throughput, low-latency Ethernet platform for supercomputing and AI workloads. Its key capabilities, adaptive routing and congestion control, dramatically boost performance. This makes accelerated computing accessible across industries.
Keywords
Accelerated Computing
Generative AI
Grace Hopper
Nvidia AI
Hyperscale cloud
Spectrum-X
Nvidia AI Enterprise
Modular architecture
AI supercomputer
Exascale computing
Highlights
Software is no longer programmed just by computer engineers, it's now programmed by engineers working with AI supercomputers.
We have reached the tipping point of accelerated computing and generative AI.
GPU servers are expensive, but the computer is now the data center. The goal is to build the most cost effective data center, not server.
This computing era can understand multi-modality, so it can impact every industry.
The programming barrier is low - everyone is a programmer now. You just have to say something to the computer.
This computer can improve every existing application with AI. New applications aren't needed for it to succeed.
Grace Hopper has nearly 200 billion transistors and almost 600GB of memory coherent between CPU and GPU.
Google Cloud, Meta, and Microsoft will pioneer AI research with the new DGX GH200 AI supercomputer.
The NVIDIA MGX is an open, modular server design for accelerated computing that's multi-generation and standardized.
NVIDIA AI Enterprise makes accelerated computing enterprise-grade and secure for the first time.
New adaptive routing and congestion control in switches will dramatically increase Ethernet performance for AI.
We're in a transition with accelerated computing and generative AI that's full stack, data center scale, and domain specific.
H100 is in full production as the engine of generative AI. Grace Hopper scales it out in data centers.
Spectrum-X with switches, cables and software extends generative AI from hyperscale to enterprise.
NVIDIA AI Enterprise brings accelerated computing to the cloud for every enterprise.
Transcripts
this is the new computer industry
software is no longer programmed just by
computer Engineers software is
programmed by computer Engineers working
with AI supercomputers we have now
reached the Tipping Point of accelerated
Computing we have now reached the
Tipping Point of generative Ai and we
are so so so excited to be in full
volume production of the h100 this is
going to touch literally every single
industry let's take a look at how h100
is produced
[Music]
okay
[Music]
35,000 components on that system board
eight Hopper GPUs
let me show it to you
all right this
I would lift this but I
still have the rest of the keynote I
would like to give this is 60 pounds 65
pounds it takes robots to lift it of
course and it takes robots to insert it
because the insertion pressure is so
high and has to be so perfect
this computer is two hundred thousand
dollars and as you know it replaces an
entire room of other computers it's the
world's single most expensive computer
so you can say the more you buy the
more you save
this is what a compute tray looks like
even this is incredibly heavy
see that
so this is the brand new H100
the world's first computer that has a
Transformer Engine in it
the performance is utterly incredible
there are two fundamental transitions
happening in the computer industry today
all of you are deep within it and you
feel it there are two fundamental Trends
the first trend is because CPU scaling
has ended the ability to get 10 times
more performance every five years has
ended the ability to get 10 times more
performance every five years at the same
cost is the reason why computers are so
fast today
that trend has ended it happened at
exactly the time when a new way of doing
software was discovered deep learning
these two events came together and are
driving computing today
accelerated computing and generative AI
this new way of doing software this new
way of doing computation is a
reinvention from the ground up and it's
not easy accelerated computing has taken
us nearly three decades to accomplish
well this is how accelerated Computing
works
this is accelerated Computing used for
large language models basically the core
of generative AI this example is a 10
million dollar server and so 10 million
dollars gets you nearly a thousand CPU
servers and to train to process this
large language model takes 11 gigawatt
hours okay and this is
what happens when you accelerate this
workload with accelerated Computing and
so with 10 million dollars for a 10
million dollar server you buy 48 GPU
servers it's the reason why people say
that GPU servers are so expensive
remember people say GPU servers are so
expensive however the GPU server is no
longer the computer the computer is the
data center
your goal is to build the most cost
effective data center not build the most
cost effective server
back in the old days when the computer
was the server that would be a
reasonable thing to do but today the
computer is the data center so for 10
million dollars you buy 48 GPU servers
it only consumes 3.2 gigawatt hours and
44 times the performance
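The before/after comparison above works out as a quick calculation. The energy figures and server counts are as quoted in the talk ("nearly a thousand" CPU servers is rendered as 960 here, an assumption); the rest is arithmetic:

```python
# Comparison from the talk: the same $10M budget spent on CPU servers vs.
# GPU servers to process a large language model workload.
cpu = {"servers": 960, "energy_gwh": 11.0, "relative_perf": 1.0}   # "nearly a thousand"
gpu = {"servers": 48,  "energy_gwh": 3.2,  "relative_perf": 44.0}  # 48 GPU servers, 44x perf

# Performance delivered per gigawatt-hour, normalized to the CPU baseline.
perf_per_gwh_cpu = cpu["relative_perf"] / cpu["energy_gwh"]
perf_per_gwh_gpu = gpu["relative_perf"] / gpu["energy_gwh"]

print(f"GPU energy use: {gpu['energy_gwh'] / cpu['energy_gwh']:.0%} of CPU")  # ~29%
print(f"Perf per GWh advantage: {perf_per_gwh_gpu / perf_per_gwh_cpu:.0f}x")  # ~151x
```

The point of the calculation is the one Huang makes next: at data center scale, what matters is work delivered per dollar and per watt, not the price of an individual server.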
let me just show it to you one more time
this is before and this is after
we want dense computers not big ones
we want dense fast computers not big
ones let me show you something else
this is my favorite
if your goal is to get the work done
and this is the work you want to get
done iso work the same amount of work
all right look at this
look at this before
after you've heard me talk about this
for so many years
in fact every single time you saw me
I've been talking to you about
accelerated computing
and now
why is it that finally it's the Tipping
Point because we have now addressed so
many different domains of science so
many Industries and in data processing
in deep learning classical machine
learning
so many different ways for us to deploy
software from the cloud to Enterprise to
Super Computing to the edge
so many different configurations of gpus
from our hgx versions to our Omniverse
versions to our Cloud GPU and Graphics
version so many different versions now
the utilization is incredibly High
the utilization of Nvidia GPU is so high
almost every single cloud is
overextended almost every single data
center is overextended there are so many
different applications using it so we
have now reached the Tipping Point of
accelerated Computing we have now
reached the Tipping Point of generative
AI
people thought that gpus would just be
gpus they were completely wrong we
dedicated ourselves to Reinventing the
GPU so that it's incredibly good at
tensor processing and then all of the
algorithms and engines that sit on top
of these computers we call Nvidia AI the
only AI operating system in the world
that takes you from data
processing to training to optimization
to deployment and inference
end to end deep learning processing it
is the engine of AI today
we connected GPUs to other GPUs with
NVLink to build one giant GPU and we
connected those GPUs together using
InfiniBand into larger scale computers
the ability for us to drive the
processor and extend the scale of
computing
made it possible
for the AI research organization the
community to advance AI at an incredible
rate
so every two years we take giant leaps
forward and I'm expecting the next leap
to be giant as well
this is the new computer industry
software is no longer programmed just by
computer Engineers software is
programmed by computer Engineers working
with AI supercomputers these AI
supercomputers
are a new type of factory
it is very logical that the car industry
has car factories they build things you
can see cars
it is very logical that the computer
industry has computer factories they
build things you can see computers
in the future
every single major company will also
have ai factories
and you will build and produce your
company's intelligence
and it's a very sensible thing
we are intelligence producers already
it's just that the intelligence
producers the intelligence are people in
the future we will be intelligence
producers artificial intelligence
producers and every single company will
have factories and the factories will be
built this way
using accelerated Computing and
artificial intelligence we accelerated
computer graphics by 1,000 times in five
years
Moore's Law is probably currently
running at about two times
a thousand times in five years a
thousand times in five years is one
million times in ten we're doing the
same thing in artificial intelligence
now question is what can you do when
your computer is one million times
faster
what would you do if your computer was
one million times faster well it turns
out that we can now apply the instrument
of our industry to so many different
fields that were impossible before
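The arithmetic behind "a thousand times in five years is one million times in ten" is just compound growth applied over two five-year periods, and it can be compared against the roughly 2x-per-five-years pace Huang attributes to Moore's Law:

```python
# "1,000x in five years is one million times in ten": the same compound
# growth factor over two five-year periods multiplies, not adds.
five_year_factor = 1_000
ten_year_factor = five_year_factor ** 2      # 1,000,000

# Implied annual growth for the accelerated-computing trajectory:
annual = five_year_factor ** (1 / 5)         # ~3.98x per year

# Versus a ~2x-every-five-years pace (the Moore's Law figure cited above):
moores_annual = 2 ** (1 / 5)                 # ~1.15x per year

print(ten_year_factor)          # 1000000
print(round(annual, 2))         # 3.98
print(round(moores_annual, 2))  # 1.15
```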
this is the reason why everybody is so
excited
there's no question that we're in a new
Computing era
there's just absolutely no question
about it every single Computing era you
could do different things that weren't
possible before and artificial
intelligence certainly qualifies this
particular Computing era is special in
several ways one
it is able to understand information of
more than just text and numbers it can
Now understand multi-modality which is
the reason why this Computing Revolution
can impact every industry
every industry two
because this computer
doesn't care how you program it
it will try to understand what you mean
because it has this incredible large
language model capability and so the
programming barrier is incredibly low we
have closed the digital divide
everyone is a programmer now you just
have to say something to the computer
third
this computer
not only is it able to do amazing things
for the for the future
it can do amazing things for every
single application of the previous era
which is the reason why all of these
apis are being connected into Windows
applications here and there in browsers
and PowerPoint and word every
application that exists will be better
because of AI
you don't have to build anything new
this generation this computing era does
not need
new applications it can succeed with old
applications and it's going to have new
applications
the rate of progress the rate of
progress because it's so easy to use
is the reason why it's growing so fast
this is going to touch literally every
single industry and at the core with
just as with every single Computing era
it needs a new Computing approach
the last several years I've been talking
to you about the new type of processor
we've been creating
and this is the reason we've been
creating it
ladies and gentlemen
Grace Hopper is now in full production
this is Grace Hopper
nearly 200 billion transistors in this
computer
look at this this is Grace Hopper
this this processor
this processor is really quite amazing
there are several characteristics about
it this is the world's first accelerated
processor
accelerated Computing processor that
also has a giant memory it has almost
600 gigabytes of memory that's coherent
between the CPU and the GPU and so the
GPU can reference the memory the CPU can
reference the memory and any
unnecessary copying back and forth can
be avoided
the amazing amount of high-speed memory
lets the GPU work on very very large
data sets this is a computer this is not
a chip practically the Entire Computer
is on here this uses
low power DDR memory just like your cell
phone except this has been optimized and
designed for high resilience data center
applications so let me show you what
we're going to do so the first thing is
of course we have the Grace Hopper
Superchip
put that into a computer the second
thing that we're going to do is we're
going to connect eight of these together
using NVLink this is an NVLink switch
so eight of these connect
into three switch trays into an eight
Grace Hopper pod
in these eight Grace Hopper pods each one
of the Grace Hoppers is connected to
the other Grace Hoppers at 900 gigabytes
per second
eight of them connected together
as a pod and then we connect 32 of them
together
with another layer of switches
and in order to build in order to build
this
256 Grace Hopper Superchips connected
into one exaflop you know
that countries and Nations have been
working on exaflops Computing and just
recently achieved it
256 Grace Hoppers for deep learning is
one exaflop Transformer engine and it
gives us
144 terabytes of memory that every GPU
can see
this is not 144 terabytes distributed
this is 144 terabytes connected
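The topology just described reduces to a small calculation. Note the per-chip memory figure below is back-solved from the 144 TB total, an assumption consistent with the "almost 600 gigabytes" per superchip quoted earlier:

```python
# DGX GH200 topology as described in the talk: 8 Grace Hopper superchips
# per NVLink pod, 32 pods joined by a second layer of NVLink switches.
chips_per_pod = 8
pods = 32
total_chips = chips_per_pod * pods           # 256 superchips, "one giant GPU"

# Shared memory pool: 144 TB visible to every GPU. Back-solving gives the
# approximate CPU+GPU coherent memory each superchip contributes (assumed).
total_memory_tb = 144
per_chip_gb = total_memory_tb * 1000 / total_chips

print(total_chips)   # 256
print(per_chip_gb)   # 562.5
```

562.5 GB per superchip lines up with the "almost 600 gigabytes" figure from the Grace Hopper introduction, which is why the 256-chip system is described as 144 terabytes connected rather than distributed.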
why don't we take a look at what it
really looks like play please
[Applause]
this
is
150 miles of cables
fiber optic cables
2,000 fans
70,000 cubic feet per minute
it probably
recycles the air in this entire room in
a couple of minutes
forty thousand pounds
four elephants
one GPU
if I can get up on here this is actual
size
so this is this is our brand new
Grace Hopper AI supercomputer it is one
giant GPU
utterly incredible we're building it now
and we're so excited that
Google Cloud Meta and Microsoft will be
the first companies in the world to have
access
and they will be doing
exploratory research on the pioneering
front the boundaries of artificial
intelligence with us so this is the DGX
GH200 it is one giant GPU
okay I just talked about how we are
going to extend the frontier of AI
data centers all over the world and all
of them over the next decade will be
recycled
and re-engineered into accelerated data
centers and generative AI capable data
centers but there are so many different
applications in so many different areas
scientific computing
data processing cloud and video and
Graphics generative AI for Enterprise
and of course the edge each one of these
applications has different
configurations of servers
different focus of applications
different deployment methods and so
security is different the operating
system is different how it's managed is
different
well this is just an enormous number of
configurations and so today we're
announcing in partnership with so many
companies here in Taiwan the NVIDIA MGX
it's an open modular server design
specification designed for
accelerated computing most of the
servers today are designed for general
purpose computing the mechanical thermal
and electrical design is insufficient
for a very highly dense computing system
accelerated computers take as you know
many servers and compress it into one
you save a lot of money you save a lot
of floor space but the architecture is
different and we designed it so that
it's multi-generation standardized so
that once you make an investment our
next generation gpus and Next Generation
CPUs and next generation dpus will
continue to easily configure into it so
that we can best time to Market and best
preservation of our investment different
data centers have different requirements
and we've made this modular and flexible
so that it could address all of these
different domains now this is the basic
chassis let's take a look at some of the
other things you can do with it this is
the Omniverse ovx server
it has x86 four L40S GPUs BlueField-3 two
CX-7s six PCI Express slots this is the
Grace Omniverse server
Grace the same four L40S BlueField-3 and
two CX-7s okay this is the Grace Cloud
Graphics server
this is the Hopper NVLink generative AI
inference server
and of course Grace Hopper liquid cooled
okay for very dense servers and then
this one is our dense general purpose
Grace Superchip server this is just CPU
and has the ability to accommodate
four Grace CPUs or two Grace
Superchips enormous amounts of
performance at iso performance Grace
only consumes 580 watts for the whole
server versus the latest
generation x86 CPU servers at 1090
watts it's basically half the power at
the same performance or another way of
saying
you know at the same power if your data
center is power constrained you get
twice the performance most data centers
today are power limited and so this is
really a terrific capability
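The two ways of stating the Grace power claim above are the same ratio read in opposite directions, using the wattage figures from the talk:

```python
# Grace vs. x86 at iso-performance, with the power figures from the talk.
grace_watts = 580
x86_watts = 1090

# Same performance: Grace draws about half the power.
power_ratio = grace_watts / x86_watts          # ~0.53

# Same power budget (a power-constrained data center): roughly double
# the performance from the same watts.
perf_at_same_power = x86_watts / grace_watts   # ~1.9x

print(f"{power_ratio:.0%} of the power")        # 53% of the power
print(f"{perf_at_same_power:.1f}x performance") # 1.9x performance
```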
we're going to expand AI into a new
territory
if you look at the world's data centers
the data center is now the computer and
the network defines what that data
center does largely there are two types
of data centers today there's the data
center that's used for hyperscale where
you have application workloads of all
different kinds the number of CPUs you
the number of gpus you connect to it is
relatively low the number of tenants is
very high the workloads are Loosely
coupled
and you have another type of data center
they're like super Computing data
centers AI supercomputers where the
workloads are tightly coupled
the number of tenants far fewer and
sometimes just one
its purpose is high throughput on very
large Computing problems
and so super Computing centers and Ai
supercomputers and the world's cloud
hyperscale cloud are very different in
nature
the ability for ethernet to interconnect
components from almost anywhere is
the reason why the world's internet was
created if it required too much
coordination how could we have built
today's internet so ethernet's profound
contribution is this lossy resilient
capability and because of that
it can basically connect almost anything
together
however a super Computing data center
can't afford that you can't interconnect
random things together because that
billion dollar supercomputer the
difference between 95 percent
networking throughput achieved versus 50
percent is effectively 500 million dollars
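The $500 million figure follows from a simple (and admittedly simplified) model in which the value a cluster delivers scales linearly with the network throughput it actually achieves:

```python
# Why network efficiency matters at supercomputer scale: the fraction of a
# cluster's cost effectively wasted by poor network throughput, assuming
# (as a simplification) delivered value scales linearly with throughput.
cluster_cost = 1_000_000_000   # the "billion dollar supercomputer" in the talk

def effective_value(cost: float, achieved_throughput: float) -> float:
    """Value delivered if the cluster achieves this fraction of peak throughput."""
    return cost * achieved_throughput

loss = effective_value(cluster_cost, 0.95) - effective_value(cluster_cost, 0.50)
print(f"${loss / 1e6:.0f}M")   # roughly the $500M Huang cites
```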
now it's really really important to
realize that in a high performance
Computing application every single GPU
must finish their job so that the
application can move on
in many cases where you do all
reductions you have to wait until the
results of every single one so if one
node takes too long everybody gets held
back
the question is how do we introduce
a new type of ethernet that's of course
backwards compatible with everything but
it's engineered in a way that achieves
the type of capabilities we need so we
can bring AI workloads to any of the
world's data centers first
adaptive routing adaptive routing
basically says based on the traffic that
is going through your data center
depending on which one of the ports of
that switch is congested it will
tell BlueField-3 to send the data
to another port BlueField-3 on the
other end will reassemble it and
present the data to the GPU without any
CPU intervention second congestion
control congestion control it is
possible for certain ports
to become heavily congested in which
case each switch will see how the
network is performing and communicate to
the senders please don't send any more
data right away
because you're congesting the network
that congestion control requires
basically an overarching system which
includes software the switch working
with all of the endpoints to overall
manage the congestion or the traffic and
the throughput of the data center this
capability is going to increase
ethernet's overall performance
dramatically
now one of the things that very few
people realize
is that today there's only one software
stack that is Enterprise secure and
Enterprise grade
that software stack runs on CPUs
and the reason for that is because in
order to be Enterprise grade it has to
be Enterprise secure and has to be
Enterprise managed and Enterprise
supported over 4,000 software packages
is what it takes for people to use
accelerated Computing today in data
processing and training and optimization
all the way to inference so for the very
first time we are taking all of that
software
and we're going to maintain it and
manage it like Red Hat does for Linux
Nvidia AI Enterprise will do it for all
of nvidia's libraries now Enterprise can
finally have an Enterprise grade and
Enterprise secure software stack this is
such a big deal otherwise
even though the promise of accelerated
computing is available to many
researchers and scientists it is not
available to enterprise companies and
so let's take a look at the benefit for
them this is a simple image processing
application if you were to do it on a
CPU versus on a GPU running
Nvidia AI Enterprise you're
getting
31.8 images per minute or basically 24
times the throughput or you only pay
five percent of the cost
this is really quite amazing this is the
benefit of accelerated Computing in the
cloud but for many companies Enterprises
is simply not possible unless you have
this stack
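The 24x throughput and roughly 5%-of-the-cost claims above are consistent if GPU instance time is priced somewhat higher per hour than CPU time. The price premium below is an assumption chosen to reconcile the two figures, not a number from the talk:

```python
# Image-processing comparison from the talk: GPU throughput is ~24x the CPU
# baseline. If cost scales with instance-hours, the same job costs:
gpu_speedup = 24          # throughput vs. CPU (from the talk)
gpu_price_premium = 1.2   # assumed: GPU instance ~1.2x the CPU hourly price

relative_cost = gpu_price_premium / gpu_speedup
print(f"{relative_cost:.0%}")   # 5%
```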
Nvidia AI Enterprise is now fully
integrated into AWS Google Cloud
Microsoft Azure and Oracle Cloud it is
also integrated into the world's machine
learning operations pipeline as I
mentioned before AI is a different type
of workload and this new type of
software has a
whole new software industry and
100 of these software companies we have
now connected with Nvidia AI Enterprise
I told you several things I told you
that we are going through two
simultaneous Computing industry
transitions accelerated computing and
generative AI
two
this form of computing is not like the
traditional general purpose Computing it
is full stack
it is Data Center scale because the data
center is the computer and it is domain
specific for every domain that you want
to go into every industry you go into
you need to have the software stack and
if you have the software stack then the
utility the utilization of your machine
the utilization of your computer will be
high
so number two
it is full stack data center scale and
domain specific we are in full
production of the engine of generative
AI and that is HGX H100 meanwhile
this engine that's going to be used for
AI factories will be scaled out using
Grace Hopper the engine that we created
for the era of generative AI we also
took Grace Hopper connected into a 256
node NVLink system and created the
largest GPU in the world the DGX
GH200
we're trying to extend generative AI and
accelerated Computing in several
different directions at the same time
number one we would like to of course
extend it in the cloud
so that every cloud data center can be
an AI data center not just AI factories
and hyperscale but every hyperscale data
center can now be a generative AI Data
center and the way we do that is
Spectrum-X it takes four components to
make Spectrum-X possible the switch
the BlueField-3 NIC the interconnects
themselves the cables are so important
in high speed high-speed Communications
and the software stack that goes on top
of it we would like to extend generative
AI to the world's Enterprise and there
are so many different configurations of
servers and the way we're doing that
with partnership with our Taiwanese
ecosystem the mgx modular accelerated
Computing systems we put Nvidia into
Cloud so that every Enterprise in the
world can engage us to create generative
AI models and deploy them in an
enterprise-grade enterprise-secure way in every
single Cloud I want to thank all of you
for your partnership over the years
thank you
[Applause]