How Google Makes Custom Cloud Chips That Power Apple AI And Gemini

CNBC
23 Aug 2024 · 13:12

Summary

TL;DR: Google's Silicon Valley lab is home to Trillium, its latest Tensor Processing Unit (TPU), which powers Google search and AI models like Gemini. Despite Google's pioneering work in custom AI chips, some argue the company has fallen behind in the AI race. The lab focuses on testing and developing Google's own microchips, including ASICs such as TPUs for AI and VCUs for YouTube, marking a strategic move away from reliance on traditional chip giants. The TPU's efficiency has been key to Google's cloud services, shaping the company's market position and enabling innovations like the transformer model. Google also faces challenges in supply chain, power efficiency, and water usage, yet remains committed to advancing AI and chip technology.

Takeaways

  • 🌐 Google's Silicon Valley lab houses racks of servers used to test its own Tensor Processing Units (TPUs), which power various Google services.
  • 💡 Trillium is Google's latest-generation TPU, due to be made public later in the year; it is designed for AI model training, including Google's chatbot Gemini and Apple's AI.
  • 🚀 Google was the first major cloud provider to create custom AI chips, starting with voice recognition needs in 2014, which led to the development of TPUs.
  • 🏭 Other tech giants like Amazon, Microsoft, and Meta have since followed Google's lead in creating their own AI chips to meet specific computational needs.
  • 🔋 TPUs are application-specific integrated circuits (ASICs) that are more efficient at their single purpose than general-purpose CPUs and GPUs.
  • 🌟 Google's TPU has been a key differentiator in the AI cloud market, helping Google compete with, and in some eyes surpass, other cloud providers in AI capabilities.
  • 📈 Google's TPUs hold a 58% share of the custom cloud AI chip market, according to Newman's team's research.
  • 🔧 Developing TPUs and other custom chips is a complex and costly process, requiring partnerships with chip developers like Broadcom and manufacturing by TSMC.
  • 🌍 Geopolitical risks, particularly around chip manufacturing in Taiwan, are a concern for tech companies, prompting efforts to diversify supply chains and increase domestic production.
  • 🌿 Google is committed to improving the power efficiency of its chips and data centers, which is crucial as AI servers are projected to consume significant energy in the future.

Q & A

  • What is Trillium and how does it relate to Google's technology?

    -Trillium is Google's latest generation Tensor Processing Unit (TPU), which is a type of AI accelerator chip designed to power various Google services, including search and YouTube. It is part of Google's strategy to create custom hardware for specific tasks, enhancing efficiency and performance.

  • How does Google's TPU differ from general-purpose hardware like CPUs and GPUs?

    -Google's Tensor Processing Units (TPUs) are application-specific integrated circuits (ASICs) designed for specific tasks, making them more efficient for those tasks compared to general-purpose hardware like CPUs and GPUs. TPUs are optimized for AI and machine learning workloads, which require high computational power and efficiency.

  • What role did TPUs play in the development of Google's AI capabilities?

    -TPUs have been crucial in advancing Google's AI capabilities by providing the necessary computational power to train and run complex AI models. They have enabled the development of services like Google's chatbot Gemini and have been instrumental in the research that led to the invention of the transformer, a key technology in generative AI.

  • Why did Google decide to develop its own AI chips instead of relying on existing solutions like Nvidia's GPUs?

    -Google developed its own AI chips to meet the specific needs of its applications more efficiently. By creating custom hardware like TPUs, Google could achieve a factor of 100 more efficiency compared to general-purpose hardware for tasks like voice recognition and AI model training.

  • What is the significance of Google's partnership with Broadcom in the context of chip development?

    -Google's partnership with Broadcom is significant as Broadcom assists in the development of Google's AI chips, including the TPU. Broadcom's expertise in chip development helps Google to design and manufacture custom chips that are tailored to its specific needs, contributing to Google's ability to stay competitive in the AI chip market.

  • How has the introduction of TPUs impacted Google's position in the cloud computing market?

    -The introduction of TPUs has significantly strengthened Google's position in the cloud computing market by differentiating its offerings and enhancing its AI capabilities. It has allowed Google to compete more effectively with other cloud providers and has been a key factor in Google reaching parity with, and in some eyes surpassing, the other major clouds in AI prowess.

  • What is the role of TSMC in Google's chip manufacturing process?

    -TSMC (Taiwan Semiconductor Manufacturing Company) plays a critical role in Google's chip manufacturing process as it is the world's largest chip maker and manufactures some 92% of the world's most advanced semiconductors. Google sends its final chip designs to TSMC for fabrication, which is essential for producing the custom chips like TPUs.

  • How does Google address the potential geopolitical risks associated with its reliance on TSMC for chip manufacturing?

    -Google acknowledges the geopolitical risks associated with its reliance on TSMC and prepares for potential disruptions. It emphasizes the importance of global support for Taiwan and the need for diversification in the semiconductor industry. Additionally, Google supports initiatives like the CHIPS Act funding in the US to encourage domestic chip manufacturing.

  • What is Google's strategy for managing the environmental impact of its data centers and AI operations?

    -Google is committed to reducing the environmental impact of its data centers and AI operations. It focuses on improving the efficiency of its chips, using direct-to-chip cooling to reduce water consumption, and striving to drive carbon emissions towards zero. Google also invests in renewable energy and sustainable practices to mitigate its environmental footprint.

  • What is the significance of Google's announcement of its first general-purpose CPU, Axion?

    -The announcement of Axion signifies Google's expansion into the general-purpose CPU market, which is a significant move as it allows Google to offer a more comprehensive suite of custom hardware solutions. Axion is designed to improve performance and efficiency for Google's internal services and could potentially be offered to third parties, enhancing Google's competitiveness in the cloud computing market.
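The Q&A above contrasts ASICs like the TPU with general-purpose CPUs and GPUs. The heart of a TPU is a matrix-multiply unit that streams data through a grid of multiply-accumulate cells (a systolic array). As a rough illustration only, not Google's actual design, the toy Python sketch below simulates an output-stationary systolic-style multiply-accumulate, showing why a fixed-function grid suits matrix math: every cell does one multiply-add per step, with no instruction fetch or cache machinery in the way.

```python
import numpy as np

def systolic_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Toy output-stationary systolic array: each cell performs one
    multiply-accumulate per time step as operands stream past it."""
    rows, inner = a.shape
    inner_b, cols = b.shape
    assert inner == inner_b, "inner dimensions must match"
    # One accumulator per cell in the rows x cols grid.
    acc = np.zeros((rows, cols))
    for step in range(inner):
        # At time `step`, cell (i, j) receives a[i, step] from the left
        # and b[step, j] from above, and adds their product.
        acc += np.outer(a[:, step], b[step, :])
    return acc

a = np.arange(6, dtype=float).reshape(2, 3)
b = np.arange(12, dtype=float).reshape(3, 4)
result = systolic_matmul(a, b)
assert np.allclose(result, a @ b)  # matches a conventional matmul
```

The per-step accumulation is the whole "program"; that rigidity is exactly the efficiency trade-off the Q&A describes between an ASIC and general-purpose hardware.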

Outlines

00:00

💡 Google's TPU Innovations and AI Advancements

The first paragraph introduces Google's lab at its Silicon Valley headquarters, where servers are used to test Google's Tensor Processing Units (TPUs), which power various Google services including search and YouTube. The TPU, Google's custom AI chip, is highlighted as a key differentiator in the AI race, with Google being an early adopter in creating custom hardware for AI applications. The narrative also touches on Google's competition with Nvidia and its efforts to stay ahead in the AI chip market. The Trillium system, Google's latest TPU generation, is mentioned, along with its role in training AI models like Google's chatbot Gemini and Apple's AI. The paragraph also discusses the history of Google's custom chip development, starting with the need for voice recognition features, and the evolution of TPUs over the years.

05:05

🔍 Google's TPU Market Dominance and Future Challenges

The second paragraph delves into Google's transformer technology, which is foundational to generative AI and was made possible by TPUs. It discusses the computational expense of transformers and how TPUs have facilitated their development and scalability. The paragraph also addresses criticisms regarding Google's product releases in the generative AI space, particularly the delayed launch of its chatbot Gemini compared to OpenAI's ChatGPT. Despite this, Gemini is noted to be in use by major companies. The evolution of TPUs is further explored, from their initial focus on inference to training AI models, and the interconnectivity capabilities of the latest versions. The paragraph also touches on Google's competition with Nvidia, the impact of the AI boom on Nvidia's market value, and the potential for TPUs and other AI-specific chips to challenge Nvidia's dominance. Lastly, it mentions Apple's use of Google's TPUs and the implications of this partnership.

10:06

🌐 Google's Commitment to Custom Chips and Sustainable AI

The third paragraph discusses Google's entry into the general-purpose CPU market with its Axion chip, which is set to be available by the end of the year. It contrasts Google's late entry into the CPU market with its early and successful foray into AI chips with TPUs. The paragraph also addresses the strategic reasons behind Google's timing for launching Axion, emphasizing the company's focus on delivering value through customized solutions. It further explores the use of Arm chip architecture in Google's processors, which offers power efficiency and customization. The importance of power efficiency in AI servers is highlighted, especially given the projected energy consumption of AI servers by 2027. The paragraph concludes with Google's efforts to reduce its environmental impact through energy-efficient chips and innovative cooling technologies, reflecting the company's commitment to sustainable AI development.

Keywords

💡Trillium

Trillium is Google's latest generation of Tensor Processing Units (TPUs), which are specialized hardware accelerators designed for machine learning and AI workloads. In the video, Trillium is highlighted as a significant advancement in Google's custom chip technology, with the capability to support complex AI tasks more efficiently than general-purpose hardware. The script mentions a full Trillium system with 256 chips, emphasizing Google's commitment to developing advanced AI hardware.

💡Tensor Processing Unit (TPU)

A Tensor Processing Unit is a type of application-specific integrated circuit (ASIC) developed by Google, designed to accelerate machine learning workloads. TPUs are integral to the video's theme as they power various Google services, including search and YouTube. The video discusses how TPUs are used to train AI models like Google's chatbot Gemini and even Apple's AI, showcasing their importance in the current AI landscape.

💡AI Models

AI models refer to the algorithms and mathematical models that are trained to perform tasks such as natural language processing, image recognition, and more. In the context of the video, AI models like Google's chatbot Gemini are trained on TPUs, highlighting the synergy between custom hardware and AI development. The video suggests that the efficiency of TPUs allows for more sophisticated AI models to be trained and deployed.

💡Custom Hardware

Custom hardware refers to the specialized, purpose-built electronic components designed for specific tasks or applications. Google's development of custom hardware like TPUs and VCUs (Video Coding Units) is a central theme of the video. It illustrates Google's strategy to optimize performance and efficiency for their services by creating hardware tailored to their unique needs, rather than relying on general-purpose components.

💡ASICs (Application-Specific Integrated Circuits)

ASICs are electronic circuits customized for the performance of a particular task, which contrasts with general-purpose processors. Google's TPU is an example of an ASIC, designed for AI and machine learning applications. The video emphasizes Google's expertise in creating ASICs, which has allowed them to achieve higher efficiency and performance in their data centers compared to using standard CPUs or GPUs.

💡Gemini

Gemini is Google's AI chatbot, mentioned in the video as an example of an AI model trained on TPUs. Gemini represents Google's entry into the competitive field of generative AI and highlights the company's efforts to stay at the forefront of AI technology. The video suggests that Gemini's training on TPUs gives it an edge in performance and efficiency.

💡Voice Recognition

Voice recognition is a technology that enables devices to interpret and respond to spoken language. In the video, Google's early decision to develop custom hardware for voice recognition is cited as a catalyst for the creation of TPUs. This decision underscored the need for specialized hardware to meet the computational demands of emerging AI applications, setting the stage for Google's foray into custom chip design.

💡Market Share

Market share refers to the portion of the market a company controls in terms of sales or revenue. The video discusses Google's TPUs dominating the custom cloud AI chip market with a 58% share, indicating their significant influence and success in this niche. This statistic underscores Google's competitive advantage in providing AI acceleration solutions.

💡Supply Chain

The supply chain encompasses the network of organizations involved in the production and distribution of a product. The video touches on the importance of Google's in-house chip development to reduce dependency on external supply chain partners, ensuring greater control over product development and potentially mitigating risks associated with supply chain disruptions.

💡Axion

Axion is Google's first general-purpose CPU, announced in the video as a significant step in Google's chip-making journey. The introduction of Axion signifies Google's expansion into the broader CPU market, aiming to provide a complete suite of custom hardware solutions for its services. This move is positioned as part of Google's strategy to increase vertical integration and control over its technology stack.

💡Optical Circuit Switch

An optical circuit switch is a device that uses light to direct or control the flow of data within a network. In the video, Google's second-generation optical circuit switch is highlighted as a technology that allows their TPU supercomputers to be optically interconnected. This innovation contributes to the flexibility and efficiency of Google's data centers, enabling them to dynamically configure their hardware to best suit the tasks at hand.

Highlights

Google's Silicon Valley lab is dedicated to testing its own microchips, Tensor Processing Units (TPUs).

TPUs are used to train AI models like Google's chatbot Gemini and Apple's AI.

Google's TPU is a custom AI chip that powers search, video, and ads, differentiating from general-purpose hardware.

TPUs were first designed for voice recognition, leading to a 100x improvement in efficiency over general hardware.

Google's data centers rely on Intel and AMD for CPUs and Nvidia for GPUs, alongside Google's own ASICs like TPUs and VCUs.

TPUs are application-specific integrated circuits (ASICs) built for efficiency in single-purpose tasks.

Google's TPU was the first large-scale hardware accelerator for AI applications when it launched in 2015.

TPUs have helped Google move from third to a leading position in the cloud market for AI capabilities.

Amazon and Microsoft followed Google's lead in creating custom AI chips, Inferentia and Maia, respectively.

Google's TPUs dominate the custom cloud AI chip market with 58% market share, according to Newman's team's research.

The transformer, a foundational concept for generative AI, was invented by Google researchers enabled by TPUs.

Google's chatbot Gemini is trained and served on TPUs, competing with OpenAI's ChatGPT.

TPUs have evolved from connecting 256 chips in version two to almost 9000 chips in version five.

Google's TPUs are available to third parties, offering an alternative to Nvidia's GPUs.

Nvidia's GPUs are more flexible but have been in tight supply amid the AI boom, which has sent Nvidia's market cap soaring.

Apple is using Google's TPUs to train its AI, indicating a shift from reliance on Nvidia.

Developing alternatives to Nvidia's chips is complex and costly, requiring scale and resources.

Google partners with Broadcom for chip development; Broadcom says it has spent more than $3 billion on R&D to make these partnerships happen.

Google's Axion, its first general-purpose CPU, will be available by the end of the year, marking a late entry into the CPU market.

Google's commitment to making its own chips is driven by the need for massive compute power in AI and generative AI tools.

Google is focused on improving chip efficiency to reduce carbon emissions and water usage in its data centers.

Transcripts

00:04 At this sprawling lab at Google's Silicon Valley headquarters, these racks and racks of servers aren't running workloads for Google Cloud's millions of customers.

00:12 Here, for example, is the very first Trillium system that we built. It's a full Trillium system with 256 chips in it, four racks.

00:20 Or for YouTube, or the world's most dominant search engine.

00:24 And what is Trillium? Trillium is our latest generation TPU. It'll be public later this year.

00:29 Instead, they're running tests on its very own microchips, Tensor Processing Units, that help power it all: search, and of course video, YouTube, ads. Everything Google does has been powered in many ways by its own homegrown TPU.

00:43 Now, TPUs are used to train AI models like Google's own chatbot Gemini, and in some big news, Apple's AI, too.

00:50 Apple, actually, we found out yesterday they disclosed in a paper they're using Google-made chips. The world sort of has this fundamental belief that all AI large language models are being trained on Nvidia. But Google took its own path here.

01:04 And yet, despite being the birthplace of some foundational concepts behind generative AI, many say Google's fallen behind in the AI race. But it was the first major cloud provider to do custom AI chips.

01:15 It was ten years ago, almost to the day, when we decided that to meet the needs of our users in terms of a particular application, voice recognition at the time, we needed to design custom hardware.

01:27 In the years since, Amazon, Microsoft and Meta have started making their own AI chips too.

01:32 Here we're turning on the chips and the boards for the first time, making sure they're working properly to specification, debugging any issues that might come up, that sort of thing.

01:40 And no media has been inside here before. First time? Yep.

01:43 We went to Google headquarters for an exclusive look inside the chip lab and sat down with its top executive to ask why and how. Google's betting big on the expensive, complex business of custom chips.

02:05 It all started in 2014, when a group at Google calculated that in order to launch upcoming voice recognition features, Google would need to double the number of computers in its data centers. Amin Vahdat, now the head of custom cloud chips, started at Google four years before that.

02:20 A number of leads at the company asked the question: what would happen if Google users wanted to interact with Google via voice for just 30 seconds a day? And how much compute power would we need to support our users? We realized that we could build custom hardware, not general-purpose hardware, but custom hardware, Tensor Processing Units in this case, to support that much, much more efficiently. In fact, a factor of 100 more efficiently than it would have been otherwise.
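The "30 seconds a day" thought experiment can be sketched as back-of-envelope arithmetic. In the sketch below, the user count and the compute cost per second of audio are illustrative assumptions, not figures from the video; only the 30-seconds-per-user scenario and the roughly 100x efficiency claim come from the source.

```python
# Hypothetical back-of-envelope version of the 2014 voice-search estimate.
# Assumed inputs (NOT from the video): user count, compute per audio-second.
USERS = 1_000_000_000                # assume ~1 billion users
SECONDS_PER_USER_PER_DAY = 30        # scenario cited in the video
CPU_SECONDS_PER_AUDIO_SECOND = 1.0   # assume real-time recognition on one core
SECONDS_PER_DAY = 86_400

audio_seconds_per_day = USERS * SECONDS_PER_USER_PER_DAY
cpu_core_days = audio_seconds_per_day * CPU_SECONDS_PER_AUDIO_SECOND / SECONDS_PER_DAY

# The video cites roughly a 100x efficiency gain from custom hardware.
asic_equivalent_days = cpu_core_days / 100

print(f"{cpu_core_days:,.0f} CPU-core-days of compute needed per day")
print(f"~{asic_equivalent_days:,.0f} equivalent units with a 100x-efficient ASIC")
```

Even under these loose assumptions, the scale lands in the hundreds of thousands of CPU-core-days per day, which is why doubling the data center fleet, or building a 100x more efficient ASIC, were the two options on the table.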

02:44 What is a Tensor Processing Unit? And did you guys coin that term? We did. We coined the term Tensor Processing Unit. We believe that it was certainly the first large-scale hardware accelerator for AI applications.

02:55 There's a whole gamut of qualification and validation tests we do on, you know, power, thermals, functionality. You're really trying to make sure the design has enough margin so that it's going to operate well, you know, at volume, at scale.

03:06 Principal engineer Andy Swing, who ended up leaving Google since our visit, was there for the first launch.

03:11 There's actually four chips inside there. Actually, two of those are connected to a host machine that has CPUs in it. And then all these colorful cables are actually linking together all of the Trillium chips to work as one large supercomputer.

03:26 Google data centers still rely heavily on chip giants like Intel and AMD for central processing units, CPUs, and Nvidia for graphics processing units, GPUs. Google makes a different category of chips called ASICs, application-specific integrated circuits, which are more efficient because they're built for a single purpose.

03:44 Google's best known for its AI-focused ASIC, the TPU. But it also makes ASICs to power YouTube, called VCUs, Video Coding Units. And just like Apple, Google also makes custom chips for its devices. The G4 powers the new, fully AI-enabled Pixel 9, and the new A1 powers Pixel Buds Pro 2. But the TPU is what set Google apart, because when it launched in 2015, it was the first of its kind.

04:09 So the AI cloud era has completely reordered the way companies are seen. And this silicon differentiation, the TPU itself, may be one of the biggest reasons that Google went from the third cloud to being seen truly on parity, and in some eyes, maybe even ahead of the other two clouds for its AI prowess.

04:28 Amazon Web Services announced its first cloud AI chip, Inferentia, in 2018, three years after Google's came out. Microsoft's first custom AI chip, Maia, wasn't announced until the end of 2023.

04:40 In order to stay differentiated, to stay competitive, to stay ahead of the market, and to not become overly dependent on any supply chain partner or provider, they needed to do more, build more in-house.

04:52 According to Newman's team's research, Google TPUs dominate among custom cloud AI chips, with 58% of the market share, and Amazon comes in second at 21%.

05:04 In 2017, a group of eight Google researchers wrote the now-famous paper that invented the transformer, the underpinnings of today's generative AI craze. The invention, Vahdat says, was made possible by TPUs.

05:16 The transformer computation is expensive, and if we were living in a world where it had to run on general-purpose compute, maybe we wouldn't have imagined it. Maybe no one would have imagined it. But it was really the availability of TPUs that allowed us to think, not only could we design algorithms like this, but we could run them efficiently at scale.

05:34 Still, Google has faced criticism for some botched product releases in the current rat race of generative AI, and its chatbot Gemini came out more than a year after OpenAI's ChatGPT.

05:44 Dozens and dozens of customers are leveraging Gemini every day, including some of the most familiar names out there, whether it's Deutsche Bank, Estée Lauder and many, many others that are household names, McDonald's, if you like, and others.

05:56 Was Gemini trained on TPUs? Gemini was trained and is served externally entirely on TPUs.

06:01 Back in 2018, Google expanded the focus of TPUs from inference to training AI models. Version two was actually a pod that connected 256 TPUs together. Now version five is in production, which connects almost 9,000 chips together.

06:16 The real magic of this TPU system is that you actually can interconnect everything over fiber optics dynamically. And so you can build as small or as large of a system as you want.

06:26 With version two in 2018, Google also made its TPUs available to third parties, alongside market-leading chips like Nvidia's GPUs, which are still used by most cloud customers.

06:36 If you're using GPUs, they're more programmable, they're more flexible, but they've been in tight supply.

06:42 The AI boom has sent Nvidia's stock through the roof, catapulting the chipmaker to a $3 trillion market cap in June, surpassing Google's parent company Alphabet, and jockeying with Apple and Microsoft for position as the world's most valuable public company.

06:56 Being candid, these specialty AI accelerators aren't nearly as flexible or as powerful as Nvidia's platform, and that is what the market is also waiting to see: can anyone play in that space?

07:09 Now that we know Apple's using Google's TPUs to train its AI, the real test will come as it rolls out those full AI capabilities on iPhones and Macs next year.

07:18 They were renting chips from Google for about two bucks an hour, times a gazillion chips, to train their AI models. So they didn't even need Nvidia.

07:28 All the market pull is coming from Nvidia, but longer term, people are just going to want to do AI things. And when they want to just do AI things, they may be just as happy to do it on a TPU or do it on another homegrown piece of AI-dedicated silicon.

07:45 But developing alternatives to Nvidia's hugely powerful, and expensive, chips is no small feat.

07:50 It's expensive. You need a lot of scale. And so it's not something that everybody can do. But these hyperscalers, they've got the scale and the money and the resources to go down that path.

07:59 But the process is so complex and costly that even the Googles of the world can't do it alone. Since the very first TPU ten years ago, Google's partnered with Broadcom, a chip developer that also helps Meta design its AI chips. Broadcom says it's spent more than $3 billion on R&D to make these partnerships happen.

08:17 AI chips, they're very complex. There's lots of things on there. So Google brings the compute. Broadcom does all the peripheral stuff. They do like the I/O and the SerDes and all of the different pieces that go around that compute. They also do the packaging.

08:31 Then the final design is sent off to be manufactured at a fabrication plant, or fab, primarily those owned by the world's largest chipmaker, Taiwan Semiconductor Manufacturing Company, which makes some 92% of the world's most advanced semiconductors.

08:46 Do you have any safeguards in place should the worst happen in the geopolitical sphere between China and Taiwan?

08:52 Yeah, it's an important question. And it's certainly something that we prepare for and we think about as well. But we're hopeful that actually it's not something that we're going to have to trigger.

09:01 I think the entire world is at the same risk. It's not unique to Google. It's not unique to Amazon. It's not unique to Apple. It's not unique to Nvidia. If Taiwan is not given the appropriate support, if it deals with unexpected end-of-day circumstances, it is not only going to set back any one of these companies, it's going to set back the whole world.

09:21 That's why the White House is handing out $52 billion in CHIPS Act funding to companies building fabs in the US, with the biggest portions going to Intel, TSMC and Samsung, so far.

09:32 Intel and TSMC are putting a lot of their own money into this as well. I'm heartened to see that. But I mean, it's going to take a long time to duplicate. So let's hope that it doesn't need to be duplicated.

09:47 Risk aside, Google just made another big chip move, announcing its first general-purpose CPU, Axion, will be available by the end of the year.

09:55 Now we're able to bring in that last piece of the puzzle, the CPU. And so a lot of our internal services, whether it's BigQuery, whether it's Spanner, YouTube, advertising and more, are running on Axion.

10:06 But Google is late to the CPU game. Amazon launched its processor Graviton in 2018. Alibaba launched its own server chip in 2021, and Microsoft announced its CPU in November.

10:17 Why didn't you do it sooner? Our focus has been on where we can deliver the most value for our customers, and there it has been, starting with the TPU, our video coding units, our networking. We really thought that the time was now, starting a couple of years ago. Again, these things are a number of years in the making, to really bring our expertise to bear on the Arm CPUs.

10:36 I don't fault Google for pacing out the launch of Axion in a more delayed fashion. This wasn't as differentiated. It's not as differentiated. To me, it is more of a supply game. It's more of a margin and vertical integration game for the company. Whereas the TPU was truly differentiated. Six generations, ten years of experience.

10:55 All these processors from non-chipmakers, including Google's, are made possible by Arm chip architecture, a more customizable, power-efficient alternative that's been gaining traction over the traditional x86 model from Intel and AMD.

11:08 Power efficiency is crucial because by 2027, AI servers are projected to use up as much power every year as a small country. With TPUs, the ability to customize greatly boosts power efficiency.

11:20 This is our second-generation optical circuit switch. So our large TPU supercomputers are actually optically interconnected. It allows us to dynamically link together collections of TPU chips to custom-tailor the dimensions to the job that's running. This is developed all in-house by us.
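The optical circuit switch lets Google carve one large pool of TPU chips into right-sized slices per job instead of offering fixed-size machines. As a loose analogy only (not Google's actual scheduler), the sketch below allocates chip "slices" from a shared pool, illustrating the flexibility a reconfigurable interconnect enables; all names and sizes besides the roughly 9,000-chip pod scale are illustrative.

```python
class TpuPool:
    """Illustrative slice allocator: carve a shared pool of chips into
    job-sized slices, released back to the pool when a job finishes."""

    def __init__(self, total_chips: int):
        self.free_chips = total_chips
        self.slices = {}  # job name -> chips held

    def allocate(self, job: str, chips: int) -> bool:
        if chips > self.free_chips:
            return False  # pool exhausted; job must wait
        self.free_chips -= chips
        self.slices[job] = chips
        return True

    def release(self, job: str) -> None:
        self.free_chips += self.slices.pop(job)

# A pod on the order of TPU v5's scale (almost 9,000 chips, per the video).
pool = TpuPool(total_chips=8960)
assert pool.allocate("small-inference", 256)
assert pool.allocate("large-training", 8192)
assert not pool.allocate("another-big-job", 1024)  # only 512 chips left
pool.release("small-inference")
assert pool.free_chips == 768
```

The point of the analogy is that small and large jobs draw from the same fabric, which is why the interconnect, not just the chip, is part of the efficiency story.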

11:38 Power is a huge thing now, and you know, anything you can do to try to improve efficiency, lower costs and save power, I think you're going to do.

11:45 Google's latest environmental report showed emissions rose nearly 50% from 2019 to 2023, partly due to data center growth for powering AI.

11:54 Without having the efficiency of these chips, the numbers could have wound up in a very different place. We remain committed to actually driving these numbers in terms of carbon emissions from our infrastructure, 24/7, driving it towards zero.

12:06 Training and running AI also takes a massive amount of water to keep the servers cool so they can run 24/7. That's why, with the third generation of TPU, Google started using direct-to-chip cooling, a new way to cool servers that uses far less water, and that's also being used by Nvidia's latest Blackwell GPUs.

12:24 We have four chips, and these are our liquid cooling lines that come in. There's essentially a cold plate here that has little fins in it, and it picks up the heat from the chip, puts it into the water, and that comes back out.

12:35 Despite challenges from geopolitics to power and water, Google is committed not only to its generative AI tools, but to making its own chips to handle the massive compute required by the craze.

12:45 I've never seen anything like this, and no sign of it slowing down quite yet. I think it's fair to say that we really can't predict what's going to be coming as an industry in the next five years, and hardware is going to play a really important part there.


Related Tags
Google TPU, AI Hardware, Cloud Computing, Custom Chips, Tech Innovation, Silicon Valley, AI Efficiency, Data Centers, ASIC Technology, AI Race