Can AMD match NVIDIA in 2025 or 2026?
Summary
TL;DR: AMD's latest advancements in AI hardware and software are setting the stage for a fierce competition with Nvidia. The MI355X GPU, built on the CDNA4 architecture, offers significant improvements in throughput, memory, and cost per token, positioning AMD as a strong alternative in the AI market. With a robust roadmap through 2027, AMD aims to lead with innovations in scaling, performance, and efficiency, focusing on rack-scale solutions and AI optimization. Its full-stack approach, including the ROCm software stack, positions AMD as a credible primary supplier for hyperscale AI infrastructure.
Takeaways
- AMD's new MI355X GPU significantly improves AI throughput, offering up to 30% lower cost per token compared to Nvidia's B200 GPU.
- The MI355X GPU, built on AMD's CDNA4 architecture, features enhanced tensor compute performance with support for low-precision data types like FP6 and FP4.
- AMD aims to position itself as a primary supplier for hyperscale AI infrastructure, with a roadmap that spans through 2027, including annual GPU, CPU, and networking family updates.
- The MI355X GPU boasts 288 GB of HBM3E memory and offers 8 terabytes per second of memory bandwidth, 50% higher than the MI300X.
- AMD's AI roadmap includes the MI400 GPU for 2026, expected to deliver 20 petaflops of FP8 performance and introduce Ultra Accelerator Link (UAL) for true scale-up compute.
- The MI400 GPU will integrate UAL, rivaling Nvidia's NVLink, and will support up to 1,024 GPUs in a single scale-up topology, allowing for the largest memory-bound training workloads.
- ROCm 7, AMD's new software stack, brings full support for the MI355X and faster performance for training and inference workloads, offering up to 3.8x better performance than previous versions.
- AMD is advancing its AI strategy by offering a full-stack solution with GPUs, CPUs, and networking, including the Pollara AI network card, which supports standard Ethernet and open-scale networking.
- In 2026, AMD plans to launch the Helios AI rack platform, designed for higher GPU density, liquid-cooled solutions, and equipped with the MI400 GPU and next-generation networking technologies.
- AMD has committed to achieving a 20x improvement in rack-scale energy efficiency by 2030, driven by hardware and software optimizations to address the growing power demands of AI deployments.
Q & A
What is AMD's main strategy for AI hardware in 2025?
-AMD's main strategy for AI hardware in 2025 is to position itself as a competitive player against Nvidia, focusing on performance, price, and platform scope. This includes introducing new GPUs like the MI355X, expanding the ROCm software stack, and offering a full-stack solution that includes CPUs, GPUs, and networking.
What are the key features of AMD's new MI355X GPU?
-The MI355X GPU is built on the CDNA4 architecture and fabricated on TSMC's N3P node. It doubles AI matrix throughput compared to the MI300X, supports low-precision data types like FP6 and FP4, and has 288 GB of HBM3E memory with 8 terabytes per second of memory bandwidth. It targets high-density deployments with up to 128 GPUs per rack.
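A quick sanity check of the quoted figures; the MI300X bandwidth value is an assumption taken from its public spec, not from this summary:

```python
# Back-of-envelope check of the MI355X claims above.
# Assumption: the MI300X's HBM3 bandwidth is roughly 5.3 TB/s (public spec).
mi300x_bw_tbs = 5.3
mi355x_bw_tbs = 8.0

uplift = mi355x_bw_tbs / mi300x_bw_tbs - 1
print(f"bandwidth uplift over MI300X: {uplift:.0%}")  # ~51%, consistent with "50% higher"

# Aggregate HBM in a maximally dense rack (128 GPUs x 288 GB each).
rack_memory_tb = 128 * 288 / 1024
print(f"HBM per 128-GPU rack: {rack_memory_tb:.1f} TB")  # 36.0 TB
```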
How does the MI355X compare to Nvidia's GPUs in terms of cost and performance?
-AMD claims that the MI355X offers a 30% lower cost per token than Nvidia's B200, or up to 40% more tokens per dollar. The MI355X aims to provide a competitive advantage not only in architecture but also in economic terms, making it a price-performance disruptor.
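The two framings of the claim are roughly equivalent, as a short calculation shows: a 30% lower cost per token implies about 43% more tokens per dollar, in line with the "up to 40%" figure.

```python
# If cost per token falls by 30%, tokens per dollar rise by 1/(1-0.30) - 1.
cost_reduction = 0.30
tokens_per_dollar_gain = 1 / (1 - cost_reduction) - 1
print(f"implied tokens-per-dollar gain: {tokens_per_dollar_gain:.0%}")  # ~43%
```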
What is the significance of the MI400 GPU in AMD's roadmap?
-The MI400 GPU, expected to launch in 2026, is designed to support ultra-large-scale AI workloads. It will feature 20 petaflops of FP8 performance, 432 GB of HBM4 memory, and 19.6 terabytes per second of memory bandwidth. It will also integrate AMD's Ultra Accelerator Link (UAL) for true scale-up compute across up to 1,024 GPUs.
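To put the quoted specs in perspective, a rough calculation of what a full 1,024-GPU UAL scale-up domain would aggregate (assuming the per-GPU figures above hold and ignoring interconnect overhead):

```python
# Aggregate capacity of a hypothetical 1,024-GPU MI400 scale-up domain.
gpus = 1024
hbm4_per_gpu_gb = 432      # per-GPU HBM4 from the roadmap figures above
fp8_per_gpu_pflops = 20    # per-GPU FP8 throughput

total_memory_tb = gpus * hbm4_per_gpu_gb / 1024
total_fp8_eflops = gpus * fp8_per_gpu_pflops / 1000

print(f"aggregate HBM4: {total_memory_tb:.0f} TB")       # 432 TB
print(f"aggregate FP8: {total_fp8_eflops:.2f} exaflops")  # 20.48 exaflops
```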
What is the Ultra Accelerator Link (UAL) and how does it benefit AMD's platform?
-The Ultra Accelerator Link (UAL) is an open interconnect designed to rival Nvidia's NVLink. It allows for large-scale AI workloads by enabling clusters of up to 1,024 GPUs in a single scale-up topology, improving scalability and reducing latency for AI deployments.
How does Rockom 7 software enhance AMD's AI hardware performance?
-ROCm 7 provides full support for the MI355X and delivers day-one compatibility with major AI frameworks like PyTorch and ONNX. AMD claims that ROCm 7 runs inference and training workloads on MI300X hardware up to 3.8x faster than ROCm 6, improving both software optimizations and release cycles.
What role does Rockom play in AMD's AI strategy?
-ROCm is a crucial part of AMD's AI strategy, as it offers software support that was previously lacking. With ROCm 7, AMD can now compete with Nvidia by ensuring day-one support for AI frameworks, improving performance, and offering scalability in AI workloads, especially for enterprise AI applications.
What are AMD's plans for AI networking with the Pollara network card?
-The Pollara network card is AMD's first AI-focused networking product, developed by the Pensando team. It supports the Ultra Ethernet Consortium specification and aims to provide open-scale networking for AI infrastructure, aligning with AMD's push for vendor diversity and compatibility with standard Ethernet, as opposed to Nvidia's proprietary networking stack.
What is the Helios platform, and how does it contribute to AMD's AI ecosystem?
-The Helios platform, set to launch in 2026, is a next-generation AI rack solution built around the MI400 GPU. It integrates AMD's Ultra Accelerator Link, HBM4 memory, and next-generation CPUs, offering a scalable and efficient platform designed for the most demanding AI workloads. The Helios platform will help redefine AI infrastructure by incorporating higher GPU density and improved cooling solutions.
How does AMD plan to address energy efficiency in AI infrastructure?
-AMD has set a target of achieving a 20x improvement in rack-scale energy efficiency by 2030, relative to current systems. This will be achieved through hardware advancements such as denser memory (HBM4) and lower-precision compute (FP4 and FP6), as well as software optimizations in ROCm to improve performance per watt and reduce the overall complexity of AI workloads.
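A 20x goal sounds abstract, so here is what it implies as a compound annual rate, assuming a 2024 baseline (the baseline year is an assumption, not stated in this summary):

```python
# Required annual efficiency gain to reach 20x by 2030 from a 2024 baseline.
target_improvement = 20.0
years = 2030 - 2024  # assumed baseline year

annual_gain = target_improvement ** (1 / years) - 1
print(f"required annual efficiency gain: {annual_gain:.0%}")  # ~65% per year
```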