DeepSeek R1: RTX 4090 vs M3 Max MacBook Pro | 2025

Gimel12
30 Jan 2025 · 07:41

Summary

TL;DR: In this video, the presenter compares inference performance on the DeepSeek R1 model across two machines: a desktop with an Nvidia RTX 4090 GPU and a MacBook Pro with Apple's M3 Max chip. Both run the same 8-billion-parameter model, and the difference shows in tokens per second: roughly 79-92 for the RTX 4090 versus about 55 for the M3 Max (64 with Apple's MLX-optimized build). The RTX 4090 clearly wins on raw speed, while the M3 Max delivers respectable throughput for a laptop. The video also covers the hardware requirements for running AI models locally and offers insights into quantization and GPU choices for different tasks.

Takeaways

  • The video compares the performance of the RTX 4090 GPU and the M3 Max chip when running the DeepSeek R1 model for inference tasks.
  • The comparison is aimed at testing performance across different devices, with the author promising to test the RTX 5090 when it arrives.
  • Users can download the DeepSeek model and run it locally, whether on Windows, Linux, or macOS.
  • The RTX 4090 has 24 GB of VRAM, while the M3 Max system has 128 GB of unified memory but runs slower despite the larger capacity.
  • The 8-billion-parameter model is ideal for local machines; larger models like the 70B version cannot fit into the RTX 4090's GPU memory.
  • The author demonstrates loading and running DeepSeek R1, showing how the assistant generates text from user input while tokens per second are measured.
  • On the RTX 4090, the model achieves around 79-92 tokens per second, using about 8 GB of memory and 271 W of power at roughly 80% GPU utilization.
  • On the M3 Max, the same model runs slower at about 55 tokens per second with a similar ~8 GB of GPU memory use, which is still respectable for a laptop.
  • The author then tests the optimized version of the model from the MLX Community on macOS, which raises the generation rate to 64 tokens per second.
  • Both machines are usable, but the RTX 4090 outperforms the M3 Max in raw speed, making it better suited for GPU-heavy workloads.
  • Around 8 GB of GPU memory is needed to run these models smoothly, a trade-off to weigh between gaming PCs and laptops like the M3 Max MacBook.
  • The author hints at future testing with the upcoming RTX 5090 and invites viewers to comment on topics they'd like covered, such as quantization and model differences.
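The tokens-per-second metric used throughout the video is simple to reproduce: count the generated tokens and divide by wall-clock time. A minimal sketch (the `generate_fn` hook and `fake_generate` stub are hypothetical stand-ins for whatever model runner you use, not an API shown in the video):

```python
import time

def tokens_per_second(generate_fn, prompt: str) -> tuple[int, float]:
    """Time one generation call and return (token_count, tokens/sec)."""
    start = time.perf_counter()
    tokens = generate_fn(prompt)  # assumed to return the list of generated tokens
    elapsed = time.perf_counter() - start
    return len(tokens), len(tokens) / elapsed

def fake_generate(prompt: str) -> list[str]:
    """Stub standing in for a real model call (e.g. an 8B DeepSeek R1 run)."""
    time.sleep(0.01)               # pretend the model takes some time
    return prompt.split() * 10     # pretend it emits 10x the prompt's words

count, rate = tokens_per_second(fake_generate, "hello world from the benchmark")
print(f"{count} tokens at {rate:.0f} tok/s")
```

With a real backend plugged in, the printed rate is directly comparable to the 79-92 tok/s (RTX 4090) and 55-64 tok/s (M3 Max) figures quoted above.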

Q & A

  • What is the main goal of the video?

    -The main goal of the video is to compare the performance of the DeepSeek R1 model on two different machines: an Nvidia RTX 4090 GPU and an Apple M3 Max chip, specifically focusing on their token generation speeds during inference tasks.

  • How do the RTX 4090 and M3 Max compare in terms of performance?

    -The RTX 4090 outperforms the M3 Max in terms of tokens per second. The RTX 4090 achieves between 79 and 92 tokens per second, while the M3 Max reaches about 55 tokens per second with the regular model, but with the optimized version using MLX, it performs better at 64 tokens per second.

  • What are the system specifications of the RTX 4090 used in the test?

    -The system used for testing the RTX 4090 has 24 GB of VRAM, and the GPU consumes up to 271 watts of power during the test. The GPU's memory utilization during the test was about 8 GB.
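Since the video quotes both a throughput range and a power draw for the RTX 4090, a rough energy-efficiency figure can be derived. This is back-of-envelope arithmetic only; the M3 Max's power draw isn't reported in the video, so no comparison number is computed:

```python
# Figures quoted in the video for the RTX 4090.
rtx4090_tps_low, rtx4090_tps_high = 79, 92   # tokens per second
rtx4090_watts = 271                          # reported power draw

midpoint_tps = (rtx4090_tps_low + rtx4090_tps_high) / 2
# 1 watt = 1 joule per second, so tok/s divided by W gives tokens per joule.
tokens_per_joule = midpoint_tps / rtx4090_watts  # roughly 0.32 tokens per joule

print(f"{midpoint_tps} tok/s at {rtx4090_watts} W ~ {tokens_per_joule:.3f} tok/J")
```

A laptop chip like the M3 Max would likely score better on this metric despite its lower raw speed, which is the efficiency trade-off the video alludes to.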

  • What makes the M3 Max's performance impressive despite being slower than the RTX 4090?

    -The M3 Max's performance is impressive because it is a laptop chip, and for a laptop, achieving 55 to 64 tokens per second with the DeepSeek R1 model is quite fast, especially without the specialized hardware of a desktop GPU like the RTX 4090.

  • What role does the MLX framework play in optimizing the M3 Max's performance?

    -The MLX framework, which is optimized for macOS hardware, significantly improves the performance of the M3 Max when running models like DeepSeek R1. When using MLX, the token generation rate increases from 55 to 64 tokens per second.

  • Why does the presenter recommend using the 8-billion parameter version of the DeepSeek model?

    -The 8-billion-parameter version is recommended because it fits comfortably on both machines tested in the video. Larger models, such as the 70-billion-parameter version, require more GPU memory and take longer to run, which may not be feasible on the systems tested here.
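The memory constraint can be sketched numerically: weight storage scales as parameter count times bytes per parameter. A hedged estimate (weights only; real usage adds KV-cache and activation overhead on top):

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Rough weight-only memory estimate; ignores KV cache and activations."""
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1e9  # decimal gigabytes

# 8B model: fits on a 24 GB RTX 4090 even at 16-bit precision.
print(weight_memory_gb(8, 16))   # 16.0 GB
# 70B model at 16-bit: far beyond a single 24 GB GPU.
print(weight_memory_gb(70, 16))  # 140.0 GB
# 8B model quantized to 8-bit.
print(weight_memory_gb(8, 8))    # 8.0 GB
```

At 8-bit quantization the 8B model's weights come to about 8 GB, which lines up with the ~8 GB memory use reported in the video; a 70B model at fp16 needs roughly 140 GB, which is why it cannot fit on either machine's GPU memory budget.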

  • What is the key difference between running models on the RTX 4090 and M3 Max?

    -The key difference is that the RTX 4090, with its higher VRAM and more powerful processing capabilities, can run larger models faster and more efficiently than the M3 Max, which is a more power-efficient but less powerful system overall.

  • What kind of tasks is the DeepSeek R1 model most suitable for?

    -The DeepSeek R1 model is most suitable for text-based tasks such as coding, data manipulation, and generating or rephrasing text. For more complex tasks, larger and more powerful models like GPT-4 or Claude 3.5 are recommended.

  • How does the cost of the M3 Max compare to the RTX 4090 setup?

    -The M3 Max setup (in this case, a MacBook) is relatively more expensive, with the system costing around $5800, compared to the RTX 4090-based setup, which is around $2000 for the GPU. However, additional costs for other components (like the CPU and motherboard) need to be considered in a custom-built PC.

  • What does the presenter plan to test next, and why is it relevant?

    -The presenter plans to test the upcoming RTX 5090 GPU to compare its performance with the RTX 4090 and M3 Max. This test is relevant as it will help further explore how next-gen GPUs handle large models like DeepSeek R1, especially in terms of token generation speeds.


Related Tags
GPU performance, DeepSeek R1, RTX 4090, M3 Max, tokens per second, tech comparison, machine learning, model testing, Mac optimization, AI hardware, gaming GPU