NVIDIA'S HUGE AI Chip Breakthroughs Change Everything (Supercut)

Ticker Symbol: YOU
11 Jun 2023 · 26:07

Summary

TLDR: Huang announces that Nvidia's new Grace Hopper Superchip and H100 GPU are in full production, marking a tipping point in accelerated computing and AI. Powering Nvidia's end-to-end AI platform, these chips enable breakthroughs in language models and vast improvements in performance and efficiency. Huang also announces partnerships and modular server designs to bring AI capabilities to the enterprise and to every industry. Ultimately, he conveys a vision where AI lets every company become an 'intelligence producer', with AI factories fueled by Nvidia's accelerated computing platform.

Takeaways

  • Nvidia has reached a tipping point in accelerated computing and generative AI.
  • Software is now programmed by engineers working with AI supercomputers.
  • Accelerated computing is reinventing software from the ground up.
  • Nvidia has accelerated computer graphics 1,000x in five years and is doing the same for AI.
  • AI supercomputers are a new type of factory for intelligence production.
  • Grace Hopper, Nvidia's new accelerated computing superchip, is now in full production.
  • The new DGX GH200 AI supercomputer delivers 1 exaflop of AI performance.
  • Every data center will be re-engineered for accelerated computing.
  • Nvidia AI Enterprise makes accelerated computing enterprise-grade.
  • Nvidia is extending accelerated computing and AI from cloud to edge.

Q & A

  • What are the two fundamental transitions happening in the computer industry according to Huang?

    -The two fundamental transitions are: 1) CPU scaling has ended, so the ability to get 10x more performance at the same cost every five years is gone. 2) A new way of doing software, deep learning and AI, was discovered, reinventing computation from the ground up.

  • What is accelerated computing and how does it work?

    -Accelerated computing uses GPUs and other specialized hardware to dramatically speed up workloads like AI, data analytics, and graphics. It allows much higher performance and efficiency than general purpose CPUs.

  • What is the Grace Hopper superchip?

    -Grace Hopper is Nvidia's new superchip for AI and high performance computing, combining a Grace CPU and a Hopper GPU. It has nearly 200 billion transistors and almost 600 GB of high-speed memory coherent between the CPU and GPU for efficiency.

  • What is the H100 and why is it important?

    -The H100 is Nvidia's latest AI accelerator GPU. With advanced features like the Transformer Engine, it delivers giant leaps in AI performance to power the next wave of generative AI.

  • What is the goal of the Nvidia MGX system?

    -The Nvidia MGX is a new open and modular server architecture optimized specifically for accelerated computing. It allows flexible configurations for AI and HPC workloads.

  • What does Huang mean when he says the computer is now the data center?

    -He means that with the rise of cloud computing and AI supercomputers, the focus has shifted from optimizing single servers to building integrated data centers with networks of servers acting as one giant computer.

  • What is Spectrum-X?

    -Spectrum-X is Nvidia's new high performance Ethernet networking platform for AI data centers. It combines switches, BlueField-3 NICs, cables, and software with adaptive routing and congestion control to optimize AI supercomputing clusters.

  • How does Nvidia plan to bring accelerated computing to enterprises?

    -Nvidia aims to make accelerated computing enterprise-ready through solutions like the MGX reference architecture and the Nvidia AI Enterprise software stack for managing AI infrastructure securely.

  • What is the benefit of using accelerated computing in the cloud?

    -Accelerated computing in the cloud can provide up to 24x higher throughput and 20x lower cost compared to CPU-only platforms for workloads like image processing.

  • What role do Grace Hopper Superchips play in Nvidia's plan for AI supercomputing?

    -Nvidia uses Grace Hopper Superchips as the building blocks of its exascale AI supercomputer: 256 of them connected over NVLink form the DGX GH200, delivering one exaflop of AI performance.

Outlines

00:00

Introducing the H100 AI Supercomputer

Jensen Huang introduces the H100 AI supercomputer, whose system board carries 35,000 components and eight Hopper GPUs. He shows the H100 system board and compute tray, calling the $200,000 machine the world's single most expensive computer, one that replaces an entire room of other computers and accelerates workloads significantly.

05:01

Accelerated Computing Has Reached a Tipping Point

Huang explains the two fundamental transitions underway in the computing industry: CPU scaling has ended, while deep learning lets AI take giant leaps every two years. He shows how accelerated computing with GPUs processes an AI workload far faster and cheaper than CPUs, and notes that GPU utilization is now extremely high across clouds, data centers, and applications.

10:03

AI Will Impact Every Industry

Huang discusses how easy access to large language models closes the digital divide and makes everyone a programmer. He explains why AI will enhance all existing applications, not just create new ones, and argues that progress will be fast because AI is so easy to use across industries.

15:04

Introducing Grace Hopper - The First Accelerated AI Supercomputer Processor

Huang unveils Grace Hopper, the world's first accelerated processor for AI with a giant memory: nearly 200 billion transistors and almost 600 GB of memory coherent between the CPU and GPU. He shows how 256 Grace Hopper Superchips connect into an exascale system aimed at advancing generative AI.

20:08

Announcing the NVIDIA MGX - A Modular Accelerated Computing Server Architecture

Huang introduces NVIDIA MGX, an open server architecture specification optimized for accelerated computing. It's modular and flexible to address diverse data center requirements across industry verticals and make AI accessible.

25:08

๐ŸŒ Spectrum-X - A New Interconnect to Make Any Data Center an AI Data Center

Huang announces Spectrum-X, a high-throughput, low-latency Ethernet platform for supercomputing and AI workloads. Its key capabilities, adaptive routing and congestion control, dramatically boost performance, making accelerated computing accessible across industries.

Keywords

Accelerated Computing

Accelerated computing refers to the use of specialized hardware like GPUs to dramatically speed up computing tasks like AI and deep learning. It is a core theme of the video, with the presenter explaining how they have focused on reinventing the GPU to handle tensor processing and AI workloads. He states that accelerated computing combined with AI represents a fundamental transition in the computing industry. Examples of accelerated computing in the script include Nvidia GPUs and the new Grace Hopper specialized AI chip.

Generative AI

Generative AI refers to AI systems that can generate new content like images, video, text etc. The presenter states that we have reached a tipping point for generative AI, enabled by progress in accelerated computing. He envisions generative AI being used to create 'AI factories' that can produce intelligence and content for companies. The Grace Hopper chips will help scale generative AI.

Grace Hopper

Grace Hopper is Nvidia's new specialized chip for accelerating AI and deep learning workloads. It has nearly 200 billion transistors, and the presenter highlights advanced features like its giant coherent memory pool shared between the GPU and CPU. He announces that 256 Grace Hopper chips will be connected into a one-exaflop Transformer Engine, the DGX GH200, aimed at pushing the boundaries of generative AI.

Nvidia AI

Nvidia AI refers to Nvidia's end-to-end software stack and tools that support deep learning and AI workloads. It works with their accelerated computing hardware like GPUs. The presenter explains how Nvidia AI takes AI from data processing and training all the way to deployment and inference. It aims to increase ease of use and adoption of AI.

Hyperscale cloud

Hyperscale cloud refers to the massive cloud data centers run by companies like Amazon, Google and Microsoft that host websites and applications. The presenter contrasts the networking and architecture needs of hyperscale cloud with high performance computing uses like AI training that require tightly coupled workflows.

Spectrum-X

Spectrum-X is a new type of programmable ethernet platform announced by Nvidia to help optimize networking in AI and high performance computing applications. It introduces capabilities like adaptive routing and congestion control to dynamically manage traffic, helping accelerate jobs that require tight coordination between nodes.

Nvidia AI Enterprise

Nvidia AI Enterprise is a new software stack announced by Nvidia that packages all their AI libraries, tools and drivers into a single enterprise-grade solution. The presenter explains this will finally make accelerated computing and AI accessible for business and enterprise use cases that require security, manageability and support.

Modular architecture

The presenter emphasizes the need for modular and flexible server architecture to address the diverse accelerated computing use cases across industries. To that end, they announce the NVIDIA MGX - an open standard for modular accelerated computing system designs that can be adapted for needs ranging from cloud graphics to AI training.

AI supercomputer

AI supercomputers refer to specialized computing systems optimized for AI workloads like training deep learning models. The presenter draws contrasts between the networking and architectural needs of AI supercomputers versus hyperscale cloud data centers. The DGX GH200, with 256 connected Grace Hopper chips, is presented as a new AI supercomputer for pushing generative AI.

Exascale computing

Exascale computing refers to high performance computing systems capable of at least a billion billion (10^18) operations per second. The presenter reveals that the new DGX GH200 AI supercomputer, with 256 Grace Hopper chips, achieves one exaflop of Transformer Engine compute, allowing it to take on cutting-edge generative AI workloads.

Highlights

Software is no longer programmed just by computer engineers, it's now programmed by engineers working with AI supercomputers.

We have reached the tipping point of accelerated computing and generative AI.

GPU servers are expensive, but the computer is now the data center. The goal is to build the most cost-effective data center, not the most cost-effective server.

This computing era can understand multi-modality, so it can impact every industry.

The programming barrier is low - everyone is a programmer now. You just have to say something to the computer.

This computer can improve every existing application with AI. New applications aren't needed for it to succeed.

Grace Hopper has nearly 200 billion transistors and almost 600 GB of memory coherent between CPU and GPU.

Google Cloud, Meta, and Microsoft will pioneer AI research with the new DGX GH200 AI supercomputer.

The NVIDIA MGX is an open, modular server design for accelerated computing that's multi-generation and standardized.

NVIDIA AI Enterprise makes accelerated computing enterprise-grade and secure for the first time.

New adaptive routing and congestion control in switches will dramatically increase Ethernet performance for AI.

We're in a transition with accelerated computing and generative AI that's full stack, data center scale, and domain specific.

H100 is in full production as the engine of generative AI. Grace Hopper scales it out in data centers.

Spectrum-X, with its switches, NICs, cables, and software, turns any hyperscale data center into a generative AI data center.

NVIDIA AI Enterprise brings accelerated computing to the cloud for every enterprise.

Transcripts

00:00
This is the new computer industry. Software is no longer programmed just by computer engineers; software is programmed by computer engineers working with AI supercomputers. We have now reached the tipping point of accelerated computing. We have now reached the tipping point of generative AI. And we are so, so, so excited to be in full volume production of the H100. This is going to touch literally every single industry. Let's take a look at how the H100 is produced.

00:36
[Music]

01:28
35,000 components on that system board. Eight Hopper GPUs. Let me show it to you. I would lift this, but I still have the rest of the keynote I would like to give. This is 60 pounds, 65 pounds. It takes robots to lift it, of course, and it takes robots to insert it, because the insertion pressure is so high and has to be so perfect. This computer is two hundred thousand dollars, and as you know, it replaces an entire room of other computers. It's the world's single most expensive computer. You can say: the more you buy, the more you save. This is what a compute tray looks like; even this is incredibly heavy. See that? So this is the brand new H100, the world's first computer with a Transformer Engine in it. The performance is utterly incredible.

02:37
There are two fundamental transitions happening in the computer industry today. All of you are deep within it, and you feel it. The first trend: CPU scaling has ended. The ability to get ten times more performance every five years at the same cost is the reason why computers are so fast today, and that trend has ended. It happened at exactly the time when a new way of doing software was discovered: deep learning. These two events came together and are driving computing today: accelerated computing and generative AI. This new way of doing software, a new way of doing computation, is a reinvention from the ground up, and it's not easy. Accelerated computing has taken us nearly three decades to accomplish.

03:30
Well, this is how accelerated computing works. This is accelerated computing used for large language models, basically the core of generative AI. This example is a ten million dollar server: ten million dollars gets you nearly a thousand CPU servers, and to train, to process, this large language model takes 11 gigawatt-hours. 11 gigawatt-hours. And this is what happens when you accelerate this workload with accelerated computing: for the same ten million dollars you buy 48 GPU servers. It's the reason why people say that GPU servers are so expensive. Remember, people say GPU servers are so expensive. However, the GPU server is no longer the computer; the computer is the data center. Your goal is to build the most cost-effective data center, not the most cost-effective server. Back in the old days, when the computer was the server, that would have been a reasonable thing to do, but today the computer is the data center. So for ten million dollars you buy 48 GPU servers, it consumes only 3.2 gigawatt-hours, and it delivers 44 times the performance.
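As a back-of-envelope check on those numbers, here is a minimal Python sketch. The server counts, the energy totals, and the 44x figure are the ones quoted above; the printed ratios are derived arithmetic, not figures from the talk.

```python
# Back-of-envelope check of the $10M CPU vs. GPU comparison from the talk.
cpu_servers, cpu_energy_gwh, cpu_perf = 1_000, 11.0, 1.0   # ~1,000 CPU servers, baseline work
gpu_servers, gpu_energy_gwh, gpu_perf = 48, 3.2, 44.0      # 48 GPU servers, 44x the work

print(f"energy reduction: {cpu_energy_gwh / gpu_energy_gwh:.1f}x")  # ~3.4x less energy
print(f"work per GWh: {(gpu_perf / gpu_energy_gwh) / (cpu_perf / cpu_energy_gwh):.0f}x")  # ~151x
print(f"servers needed: {cpu_servers / gpu_servers:.0f}x fewer")    # ~21x fewer boxes
```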

04:43
Let me just show it to you one more time. This is before, and this is after. We want dense computers, not big ones; we want dense, fast computers, not big ones. Let me show you something else. This is my favorite. If your goal is to get the work done, and this is the work you want to get done, iso-work, OK, this is iso-work: look at this. Look at this: before... after. You've heard me talk about this for so many years; in fact, every single time you saw me, I've been talking to you about accelerated computing. And now, why is it that it's finally the tipping point? Because we have now addressed so many different domains of science, so many industries, in data processing, in deep learning, in classical machine learning. So many different ways for us to deploy software, from the cloud to enterprise to supercomputing to the edge. So many different configurations of GPUs, from our HGX versions to our Omniverse versions to our cloud GPU and graphics versions. So many different versions. Now the utilization is incredibly high. The utilization of NVIDIA GPUs is so high that almost every single cloud is overextended, almost every single data center is overextended; there are so many different applications using it. So we have now reached the tipping point of accelerated computing. We have now reached the tipping point of generative AI.

06:30
People thought that GPUs would just be GPUs. They were completely wrong. We dedicated ourselves to reinventing the GPU so that it's incredibly good at tensor processing, and then all of the algorithms and engines that sit on top of these computers we call NVIDIA AI, the only AI operating system in the world that takes you from data processing to training to optimization to deployment and inference. End-to-end deep learning processing: it is the engine of AI today. We connected GPUs to other GPUs, that's called NVLink, to build one giant GPU, and we connected those GPUs together using InfiniBand into larger-scale computers. The ability for us to drive the processor and extend the scale of computing made it possible for the AI research community to advance AI at an incredible rate. So every two years we take giant leaps forward, and I'm expecting the next leap to be giant as well.

07:29
This is the new computer industry. Software is no longer programmed just by computer engineers; software is programmed by computer engineers working with AI supercomputers. These AI supercomputers are a new type of factory. It is very logical that the car industry has factories: they build things you can see, cars. It is very logical that the computer industry has computer factories: you build things that you can see, computers. In the future, every single major company will also have AI factories, and you will build and produce your company's intelligence. And it's a very sensible thing. We are intelligence producers already; it's just that today the intelligence producers are people. In the future we will be intelligence producers, artificial intelligence producers, and every single company will have factories, and the factories will be built this way: using accelerated computing and artificial intelligence.

08:33
We accelerated computer graphics by 1,000 times in five years. Moore's Law is probably currently running at about two times. A thousand times in five years; a thousand times in five years is one million times in ten. We're doing the same thing in artificial intelligence. Now the question is: what can you do when your computer is one million times faster? What would you do if your computer was one million times faster? Well, it turns out that we can now apply the instrument of our industry to so many different fields that were impossible before. This is the reason why everybody is so excited.
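The "million times in ten years" claim is just compounding: two five-year periods of 1,000x each multiply together. A tiny sketch; the implied per-year rate is derived arithmetic rather than a figure from the talk.

```python
# 1,000x per 5 years, compounded over two 5-year periods.
per_five_years = 1_000
ten_year_gain = per_five_years ** 2       # 1,000,000x over ten years
per_year = per_five_years ** (1 / 5)      # ~3.98x per year (derived, not quoted)
print(ten_year_gain, f"{per_year:.2f}x per year")
```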

09:14
There's no question that we're in a new computing era; there's just absolutely no question about it. In every single computing era you could do different things that weren't possible before, and artificial intelligence certainly qualifies. This particular computing era is special in several ways. One: it is able to understand information of more than just text and numbers. It can now understand multi-modality, which is the reason why this computing revolution can impact every industry. Two: this computer doesn't care how you program it. It will try to understand what you mean, because it has this incredible large language model capability, and so the programming barrier is incredibly low. We have closed the digital divide. Everyone is a programmer now; you just have to say something to the computer. Third: this computer is not only able to do amazing things for the future, it can do amazing things for every single application of the previous era, which is the reason why all of these APIs are being connected into Windows applications, into browsers, into PowerPoint and Word. Every application that exists will be better because of AI. This computing era does not need new applications; it can succeed with old applications, and it's going to have new applications too. The rate of progress, because it's so easy to use, is the reason why it's growing so fast. This is going to touch literally every single industry, and at the core, just as with every single computing era, it needs a new computing approach.

11:01
The last several years I've been talking to you about the new type of processor we've been creating, and this is the reason we've been creating it. Ladies and gentlemen, Grace Hopper is now in full production. This is Grace Hopper: nearly 200 billion transistors in this computer. Look at this; this is Grace Hopper. This processor is really quite amazing, and there are several characteristics about it. This is the world's first accelerated computing processor that also has a giant memory: it has almost 600 gigabytes of memory that's coherent between the CPU and the GPU. The GPU can reference the memory, the CPU can reference the memory, and any unnecessary copying back and forth can be avoided. The amazing amount of high-speed memory lets the GPU work on very, very large data sets. This is a computer; this is not a chip. Practically the entire computer is on here. It uses low-power DDR memory, just like your cell phone, except this has been optimized and designed for high-resilience data center applications.

12:24
So let me show you what we're going to do. The first thing, of course, is the Grace Hopper Superchip: put that into a computer. The second thing we're going to do is connect eight of these together using NVLink; this is an NVLink switch. Eight of these connect through three switch trays into an eight-Grace-Hopper pod, and each of the Grace Hoppers in the pod is connected to the others at 900 gigabytes per second: eight of them connected together as a pod. Then we connect 32 of those pods together with another layer of switches, in order to build this: 256 Grace Hopper Superchips connected into one exaflops. One exaflops. You know that countries and nations have been working on exaflops computing and only just recently achieved it. 256 Grace Hoppers for deep learning is a one-exaflop Transformer Engine, and it gives us 144 terabytes of memory that every GPU can see. This is not 144 terabytes distributed; this is 144 terabytes connected. Why don't we take a look at what it really looks like? Play, please.
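The system arithmetic is easy to check against the totals Huang quotes. A minimal sketch: the pod counts and the 144 TB and one-exaflop totals are from the talk, and the per-chip figures are derived from them (the derived ~3.9 PFLOPS per chip is consistent with the commonly quoted ~4 PFLOPS FP8 Transformer Engine rating for H100).

```python
# Sanity-check the DGX GH200 figures quoted in the keynote.
chips = 8 * 32                  # 8 Superchips per pod, 32 pods
total_mem_tb = 144.0            # quoted: memory every GPU can see
total_compute_eflops = 1.0      # quoted: one exaflop Transformer Engine

print(chips)                                                  # 256 Superchips
print(f"{total_mem_tb / chips * 1000:.0f} GB per chip")       # ~560 GB, "almost 600 GB"
print(f"{total_compute_eflops / chips * 1000:.1f} PFLOPS per chip")  # ~3.9 PFLOPS (FP8)
```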

13:57
[Applause]

14:11
This is 150 miles of cables, fiber-optic cables. 2,000 fans. 70,000 cubic feet per minute; it probably recycles the air in this entire room in a couple of minutes. Forty thousand pounds: four elephants. One GPU. If I can get up on here, this is actual size. So this is our brand new Grace Hopper AI supercomputer. It is one giant GPU. Utterly incredible. We're building it now, and we're so excited that Google Cloud, Meta, and Microsoft will be the first companies in the world to have access, and they will be doing exploratory research on the pioneering front, the boundaries of artificial intelligence, with us. So this is the DGX GH200. It is one giant GPU.

15:33
OK, I just talked about how we are going to extend the frontier of AI. Data centers all over the world, all of them, will over the next decade be recycled and re-engineered into accelerated data centers and generative-AI-capable data centers. But there are so many different applications in so many different areas: scientific computing, data processing, cloud, video and graphics, generative AI for enterprise, and of course the edge. Each one of these applications has different configurations of servers, a different focus of applications, and different deployment methods; the security is different, the operating system is different, how it's managed is different. This is just an enormous number of configurations, and so today we're announcing, in partnership with so many companies here in Taiwan, the NVIDIA MGX. It's an open, modular server design specification for accelerated computing. Most of the servers today are designed for general-purpose computing; the mechanical, thermal, and electrical design is insufficient for a very highly dense computing system. Accelerated computers, as you know, take many servers and compress them into one: you save a lot of money and a lot of floor space, but the architecture is different. We designed it to be multi-generation standardized, so that once you make an investment, our next-generation GPUs, next-generation CPUs, and next-generation DPUs will continue to configure easily into it, for the best time to market and the best preservation of your investment. Different data centers have different requirements, and we've made this modular and flexible so that it can address all of these different domains.

17:14
Now, this is the basic chassis. Let's take a look at some of the other things you can do with it. This is the Omniverse OVX server: x86, four L40s, a BlueField-3, two CX-7s, six PCI Express slots. This is the Grace Omniverse server: Grace, the same four L40s, a BlueField-3, and two CX-7s. This is the Grace cloud graphics server. This is the Hopper NVLink generative AI inference server. And of course Grace Hopper liquid-cooled, for very dense servers. And then this one is our dense general-purpose Grace Superchip server. This is CPU only, and it can accommodate four Grace CPUs or two Grace Superchips: enormous amounts of performance. At iso performance, Grace consumes only 580 watts for the whole server, versus 1,090 watts for the latest-generation x86 CPU servers. It's basically half the power at the same performance, or, to say it another way: at the same power, if your data center is power constrained, you get twice the performance. Most data centers today are power limited, and so this is really a terrific capability.
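A quick sketch of that iso-performance power claim; the two wattages are the ones quoted, the ratios are derived, and the second line assumes performance scales linearly with server count under a fixed power cap.

```python
# Iso-performance power comparison for the Grace Superchip server.
grace_watts, x86_watts = 580, 1090           # quoted whole-server power draws
print(f"{grace_watts / x86_watts:.2f}")      # ~0.53: roughly half the power
print(f"{x86_watts / grace_watts:.2f}x")     # ~1.9x the performance in a power-capped hall
```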

18:32
We're going to expand AI into a new territory. If you look at the world's data centers: the data center is now the computer, and the network largely defines what that data center does. There are two types of data centers today. There's the data center used for hyperscale, where you have application workloads of all different kinds: the number of CPUs or GPUs connected to any one workload is relatively low, the number of tenants is very high, and the workloads are loosely coupled. And there's another type of data center, like supercomputing data centers and AI supercomputers, where the workloads are tightly coupled, the number of tenants is far fewer, sometimes just one, and the purpose is high throughput on very large computing problems. And so supercomputing centers and AI supercomputers on one hand, and the world's hyperscale clouds on the other, are very different in nature.

19:28
The ability of Ethernet to interconnect components from almost anywhere is the reason the world's internet could be created; if it required too much coordination, how could we have built today's internet? So Ethernet's profound contribution is this lossy, resilient capability: it can connect almost anything together. However, a supercomputing data center can't afford that. You can't interconnect random things together, because on a billion-dollar supercomputer, the difference between achieving 95 percent networking throughput and 50 percent is effectively 500 million dollars. Now, it's really, really important to realize that in a high-performance computing application, every single GPU must finish its job so that the application can move on. In many cases, where you do all-reductions, you have to wait for the results from every single node, so if one node takes too long, everybody gets held back.
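The 500-million-dollar figure is back-of-envelope: if the value a cluster delivers scales with the network throughput it achieves, the gap between 95 and 50 percent is a large slice of the machine's cost. A minimal sketch of that reasoning, with linear scaling as the stated assumption:

```python
# Value of network efficiency on a $1B tightly coupled cluster,
# assuming delivered value scales linearly with achieved throughput.
machine_cost_usd = 1_000_000_000
achieved_good, achieved_bad = 0.95, 0.50
lost = machine_cost_usd * (achieved_good - achieved_bad)
print(f"${lost:,.0f}")   # $450,000,000, i.e. "effectively 500 million dollars"
```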

20:29
The question is: how do we introduce a new type of Ethernet, of course backwards compatible with everything, but engineered to achieve the kind of capabilities that let us bring AI workloads to any data center in the world? First: adaptive routing. Adaptive routing basically says that, based on the traffic going through your data center, and depending on which port of a switch is over-congested, the switch will tell BlueField-3 to send the traffic to another port. The BlueField-3 on the other end reassembles it and presents the data to the GPU without any CPU intervention. Second: congestion control. It is possible for certain ports to become heavily congested, in which case each switch observes how the network is performing and communicates to the senders: please don't send any more data right away, because you're congesting the network. That congestion control requires an overarching system, which includes software and the switch working with all of the endpoints, to manage the congestion, the traffic, and the throughput of the data center as a whole. This capability is going to increase Ethernet's overall performance dramatically.
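To make the adaptive-routing idea concrete, here is a toy sketch in Python. This is not Spectrum-X's actual algorithm, just the heuristic the passage describes: per packet, steer traffic away from the most congested switch port and let the receiving NIC handle reordering. The queue depths are hypothetical.

```python
import random

# Toy adaptive routing: pick the least-congested uplink per packet.
# In the system described above, the switch makes this choice and the
# receiving BlueField-3 reassembles out-of-order packets for the GPU.
port_queue_depth = {0: 12, 1: 3, 2: 7, 3: 3}   # hypothetical queue occupancy per port

def route_packet() -> int:
    least = min(port_queue_depth.values())
    # Break ties randomly so flows spread across equally idle ports.
    candidates = [p for p, q in port_queue_depth.items() if q == least]
    port = random.choice(candidates)
    port_queue_depth[port] += 1                # the packet now occupies that queue
    return port

for _ in range(4):
    print("sent via port", route_packet())
```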

21:51
Now, one of the things that very few people realize is that today there is only one software stack that is enterprise-secure and enterprise-grade, and that software stack is the CPU's. The reason is that, to be enterprise-grade, it has to be enterprise-secure, enterprise-managed, and enterprise-supported. Over 4,000 software packages is what it takes for people to use accelerated computing today, from data processing and training and optimization all the way to inference. So for the very first time, we are taking all of that software, and we're going to maintain it and manage it the way Red Hat does for Linux. NVIDIA AI Enterprise will do it for all of NVIDIA's libraries. Now enterprises can finally have an enterprise-grade and enterprise-secure software stack. This is such a big deal; otherwise, even though the promise of accelerated computing is real for many researchers and scientists, it is not available to enterprise companies.

22:55
So let's take a look at the benefit for them. This is a simple image-processing application. If you run it on a CPU versus on a GPU running NVIDIA AI Enterprise, you get 31.8 images per minute, basically 24 times the throughput, or you pay only five percent of the cost. This is really quite amazing. This is the benefit of accelerated computing in the cloud, but for many companies, for enterprises, it is simply not possible unless you have this stack.
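Those two numbers are two views of one ratio: about 24x the throughput at equal spend means about 1/24 of the cost for equal work, which is roughly the five percent quoted. A one-line check:

```python
# 24x throughput at equal cost implies ~1/24 the cost for equal work.
speedup = 24
print(f"{1 / speedup:.1%}")   # 4.2%, roughly the "five percent of the cost" quoted
```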

23:27
NVIDIA AI Enterprise is now fully integrated into AWS, Google Cloud, Microsoft Azure, and Oracle Cloud. It is also integrated into the world's machine learning operations pipelines. As I mentioned before, AI is a different type of workload, and this new type of software has a whole new software industry around it; a hundred of those companies we have now connected with NVIDIA AI Enterprise.

23:52
I told you several things. I told you that we are going through two simultaneous computing-industry transitions: accelerated computing and generative AI. Two: this form of computing is not like traditional general-purpose computing. It is full stack; it is data center scale, because the data center is the computer; and it is domain specific, because for every domain, every industry you go into, you need to have the software stack. And if you have the software stack, then the utility, the utilization of your machine, the utilization of your computer, will be high. So, number two: it is full stack, data center scale, and domain specific. We are in full production of the engine of generative AI, and that is HGX H100. Meanwhile, this engine that is going to be used for AI factories will be scaled out using Grace Hopper, the engine that we created for the era of generative AI. We also took Grace Hopper, connected 256 nodes with NVLink, and created the largest GPU in the world: the DGX GH200.

25:01
We're trying to extend generative AI and accelerated computing in several different directions at the same time. Number one, we would of course like to extend it in the cloud, so that every cloud data center can be an AI data center: not just AI factories and hyperscale, but every hyperscale data center can now be a generative AI data center. The way we do that is Spectrum-X. It takes four components to make Spectrum-X possible: the switch; the BlueField-3 NIC; the interconnects themselves, because the cables are so important in high-speed communications; and the software stack that goes on top of it. We would like to extend generative AI to the world's enterprises, and there are so many different configurations of servers; the way we're doing that is in partnership with our Taiwanese ecosystem, through the MGX modular accelerated computing systems. And we put NVIDIA into the cloud, so that every enterprise in the world can engage us to create generative AI models and deploy them in an enterprise-grade, enterprise-secure way, in every single cloud. I want to thank all of you for your partnership over the years. Thank you.

[Applause]