Deepseek R1 671b on a $500 AI PC!
Summary
TLDR: In this detailed video, the creator demonstrates the surprisingly effective performance of the DeepSeek R1 671B model running on a budget HP Z440 workstation costing around $500. With a $5 CPU and 512 GB of RAM (a 64 GB configuration is also tested), the system manages to run inference at modest speeds. Comparing various models, the creator finds that while performance degrades as model size grows, smaller ones like Gemma 3 perform impressively well. The video emphasizes the importance of flexibility in system design and highlights the evolving landscape of affordable AI hardware setups.
Takeaways
- 😀 A $500 machine, using a $5 processor, 512GB of RAM, and an HP Z440 workstation, can run the DeepSeek 671b AI model surprisingly well.
- 😀 Performance metrics for the DeepSeek 671b are modest, offering around 2 tokens per second on a budget setup, with a noticeable slowdown on larger prompts.
- 😀 The processor used (E5 2650 v4) is a low-cost option, demonstrating decent performance, though it’s not as fast as higher-end CPUs like the E5 2696 v4.
- 😀 Using a GPU like the 3090 doesn’t provide significant improvements for AI inference in this setup, as the GPU was not fully utilized during testing.
- 😀 Model choice matters for CPU inference: reasoning-heavy models like QWQ run at about 1.6 tokens per second, while lighter models like Gemma 3 12B (Q8) reach up to 15 response tokens per second.
- 😀 Memory bandwidth is crucial for AI performance. Broadwell Xeons such as the E5 v4 series deliver around 75 GB/s of throughput, enough for usable CPU-only inference.
- 😀 Mid-size models such as Gemma 3 12B Q8 run far faster than the 671B model, reaching 4.1 tokens per second for responses and 7.5 tokens per second for prompts.
- 😀 Although the Z440 workstation has limitations (e.g., it doesn’t fully support LR-DIMMs), it still functions well for running smaller AI models if configured correctly.
- 😀 System memory, especially 64GB to 128GB, is critical for better performance with AI models, avoiding frequent memory bottlenecks and enabling more complex tasks.
- 😀 For those looking for long-term performance, flexible, expandable systems are recommended, especially if you plan on growing your setup with new hardware in the future.
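The bandwidth figures above can be turned into a rough back-of-envelope estimate. For memory-bound CPU inference, each generated token requires streaming the model's active weights through the CPU once, so bandwidth divided by weight size gives an upper bound on tokens per second. The 37B active-parameter figure for DeepSeek R1's mixture-of-experts design and the ~4.5 bits/parameter quantization are assumptions not stated in the video:

```python
def est_tokens_per_sec(bandwidth_gb_s, active_params_b, bytes_per_param):
    # Memory-bound inference: every generated token streams the active
    # weights through the CPU once, so bandwidth sets a hard ceiling.
    weight_gb = active_params_b * bytes_per_param
    return bandwidth_gb_s / weight_gb

# DeepSeek R1 is a mixture-of-experts model: roughly 37B of its 671B
# parameters are active per token. At ~4.5 bits/param (Q4-class
# quantization, an assumed figure), that is about 21 GB read per token.
ceiling = est_tokens_per_sec(75, 37, 4.5 / 8)
print(round(ceiling, 1))  # ~3.6 tokens/s theoretical ceiling
```

The observed ~2 tokens per second sits plausibly below this ~3.6 tokens/s theoretical ceiling, since real workloads never achieve peak bandwidth.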
Q & A
What hardware setup is used to run DeepSeek 671b on a $500 machine?
-The setup includes a $5 processor (E5 2650 v4), 512 GB of RAM using LR-DIMMs, and an HP Z440 workstation, totaling around $500.
What are the observed token processing speeds for the $5 CPU setup?
-For simple prompts, the $5 CPU achieves around 2 tokens per second for response tokens and 2.9 tokens per second total tokens.
How does the inference speed change when running larger prompts?
-For larger prompts, the $5 CPU setup slows significantly, reaching around 1.3 response tokens per second and 6.35 prompt tokens per second.
How does the 2696 v4 CPU compare to the 2650 v4 in performance?
-The 2696 v4 performs somewhat better, but its measured throughput on larger prompts remains close to the 2650 v4's (around 1.3 response tokens and 6.35 prompt tokens per second), and the difference is not drastic for smaller prompts.
What impact does model size have on CPU inference performance?
-Larger models, like Kito 14B, process tokens more slowly (3.7 response tokens/sec and 9.5 prompt tokens/sec), whereas smaller models, like Gemma 3 Q4, run much faster (15 response tokens/sec and 20 prompt tokens/sec).
What are the recommended RAM configurations for effective CPU inference?
-A minimum of 32 GB is suggested, with 64 GB or more being ideal to handle ancillary models and prevent memory limitations.
Why might the HP Z440 display the 539 warning screen and how can it be managed?
-The 539 warning appears due to unsupported or partially supported LR-DIMMs. The user can skip it by pressing Enter, but there’s no confirmed BIOS workaround for fully bypassing it.
How does CPU memory bandwidth affect inference performance?
-Higher memory bandwidth directly improves inference speed. For example, Broadwell Xeons peak at around 75 GB/s, while newer platforms such as AMD Epyc Rome can reach roughly 200 GB/s, significantly boosting throughput.
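These bandwidth figures line up with theoretical DRAM peaks, which can be computed from channel count and transfer rate (the DDR4-2400 and DDR4-3200 speeds below are assumed typical configurations for each platform, not stated in the video):

```python
def peak_bandwidth_gb_s(channels, mt_per_s, bus_bytes=8):
    # Theoretical peak DRAM bandwidth: channels x transfers/s x 64-bit bus.
    return channels * mt_per_s * bus_bytes / 1000

# Quad-channel DDR4-2400 on a Broadwell Xeon such as the E5-2650 v4:
print(peak_bandwidth_gb_s(4, 2400))  # 76.8 GB/s, matching the ~75 GB/s figure
# Eight-channel DDR4-3200 on an AMD Epyc Rome system:
print(peak_bandwidth_gb_s(8, 3200))  # 204.8 GB/s, matching the ~200 GB/s figure
```

Real-world streaming throughput lands a little under these peaks, which is why the video quotes ~75 and ~200 GB/s.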
Are GPUs necessary for running DeepSeek 671b in these setups?
-Not strictly; CPUs can handle inference for these models, but GPUs provide faster performance and can take offloaded layers using frameworks like KTransformers.
What advice is given regarding building future-proof AI systems?
-Start with a flexible system that allows for expansion in RAM, CPU, and GPU. Prioritize configurations that can grow with your future AI model requirements.
Which models are recommended for mid-range CPU setups?
-Gemma 3 12B Q8 is suggested for mid-range setups due to its reasonable token processing speed and manageable footprint in 16 GB of RAM.
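The 16 GB figure follows from simple weight-size arithmetic. A sketch, assuming ~8 bits per parameter for Q8 quantization plus a modest allowance for the KV cache and runtime overhead (the 1 GB overhead value is an illustrative assumption):

```python
def model_footprint_gb(params_b, bits_per_param, overhead_gb=1.0):
    # Quantized weight size plus an assumed KV-cache/runtime allowance.
    return params_b * bits_per_param / 8 + overhead_gb

# Gemma 3 12B at Q8: ~12 GB of weights, ~13 GB total
print(model_footprint_gb(12, 8))  # fits comfortably within 16 GB of RAM
```

The same arithmetic explains the earlier RAM advice: 64 GB leaves headroom for larger models, longer contexts, and ancillary models running alongside.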
How does system idle power consumption look during CPU inference?
-Idle power drops to around 55–60 watts, which is reasonable considering active PCIe lanes, USB devices, and system fans.