Linux vs Windows on Ryzen AI Max+ 395 – Which Is Faster for LLMs?

AIex The AI Workbench

11 May 2025 · 20:43

Summary

TLDR: In this detailed performance review, Alex tests the HP G1A workstation, equipped with an AMD Ryzen AI Max+ PRO 395 processor and Radeon 8060S integrated graphics, under both Linux and Windows. The video compares the performance of various large language models in LM Studio and Ollama. Alex explores how different configurations, including GPU memory allocation and the charger used, affect performance, and finds mostly small differences in tokens per second. The conclusion highlights that future software and driver updates are needed to fully optimize the hardware, and invites viewer feedback and suggestions for further tests.

Takeaways

😀 The video tests the performance of the HP G1A workstation with the AMD Ryzen AI Max+ PRO 395 and Radeon 8060S graphics in Linux and Windows environments.
😀 The tests run several AI models in both Ollama and LM Studio, comparing the speed and efficiency of each on the same hardware.
😀 The workstation is configured with 128 GB of RAM, split into 64 GB for the GPU and 64 GB for system memory, and runs Fedora Linux 42.
😀 Initially, LM Studio does not fully recognize the hardware configuration, particularly the GPU, so a software update is required.
😀 Several models are tested, including Qwen3 4B, Gemma 3 27B, and Qwen3 32B, with performance measured in tokens per second.
😀 The highest speed achieved for Qwen3 4B in LM Studio was 52.76 tokens per second, while the models ran slightly slower in Ollama.
😀 Performance differences between Ollama and LM Studio were minor, with LM Studio showing a slight edge in some tests.
😀 Under Windows, the same models were tested in LM Studio; switching from a non-original charger to the original charger made a performance difference of roughly 10%.
😀 The charger also affected power consumption: with the original charger, the system drew more power and the GPU performed better.
😀 Overall, performance differences between Linux and Windows were negligible, suggesting that both operating systems make similar use of the hardware for the tested models.
😀 The video concludes with the speaker inviting feedback on the testing process and asking for suggestions on settings or tests for future videos.

Q & A

What is the primary focus of the video?
-The primary focus of the video is to test LLM performance on an HP G1A workstation under Linux and Windows, comparing the results of two software packages, Ollama and LM Studio, across various AI models.
What hardware is being tested in the video?
-The video tests an HP G1A workstation equipped with an AMD Ryzen AI Max+ PRO 395 processor and Radeon 8060S integrated graphics. The workstation has 128 GB of RAM, with 64 GB allocated to system memory and 64 GB to the GPU.
What is the purpose of testing with different models in the video?
-The purpose of testing with different models is to evaluate their performance in tokens per second in both Linux and Windows environments. The goal is to compare how different software packages (Ollama vs LM Studio) handle these models and to assess overall efficiency and speed.
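Tokens per second here means decode speed: generated tokens divided by generation time. As a minimal sketch, assuming Ollama's non-streaming /api/generate response (served on the default port 11434), which reports `eval_count` (tokens generated) and `eval_duration` (nanoseconds spent generating), the rate can be computed as below. The response values are hypothetical, not measurements from the video; `ollama run --verbose` prints the same figure as "eval rate".

```python
import json


def tokens_per_second(response: dict) -> float:
    """Decode speed from an Ollama /api/generate response.

    eval_count is the number of generated tokens; eval_duration is the
    generation time in nanoseconds, so divide by 1e9 to get seconds.
    """
    return response["eval_count"] / (response["eval_duration"] / 1e9)


# Hypothetical response fragment: 500 tokens generated in 10 seconds.
sample = json.loads('{"eval_count": 500, "eval_duration": 10000000000}')
print(f"{tokens_per_second(sample):.2f} tokens/s")
```

Only generation time is counted; prompt processing is reported separately by Ollama, so this figure isolates the decode phase that the video's comparisons focus on.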
What was the performance of the smallest model, Qwen3 4B, in terms of tokens per second?
-The smallest model, Qwen3 4B, reached 52.54 tokens per second on the first test; subsequent tests yielded results between 41.60 and 47.57 tokens per second.
How does the performance of the models in LM Studio compare to Ollama?
-LM Studio generally performed better than Ollama, especially with larger models such as Qwen3 32B: about 10 tokens per second in LM Studio versus about 8.61 in Ollama.
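To put the gap in practical terms, here is a rough sketch converting decode speed into wall-clock time for a fixed-length answer. The 1,000-token answer length is a hypothetical choice, and the calculation ignores prompt processing time, which adds to real response latency.

```python
def generation_time(num_tokens: int, tokens_per_s: float) -> float:
    """Seconds needed to generate num_tokens at a steady decode rate."""
    return num_tokens / tokens_per_s


# Rates reported in the summary for the 32B model.
lm_studio_rate, ollama_rate = 10.0, 8.61
answer_tokens = 1000  # hypothetical answer length

lm_time = generation_time(answer_tokens, lm_studio_rate)
ol_time = generation_time(answer_tokens, ollama_rate)
speedup = lm_studio_rate / ollama_rate

print(f"LM Studio: {lm_time:.1f} s, Ollama: {ol_time:.1f} s, speedup {speedup:.2f}x")
```

At these rates, a 1,000-token answer takes about 100 s in LM Studio versus about 116 s in Ollama, a speedup of roughly 1.16x.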
What was the impact of using a non-original charger on the performance?
-With a non-original charger, the system drew 45 W and generated 7.77 tokens per second. With the original charger, power consumption rose to 70 W and performance improved to 8.54 tokens per second, a speed difference of roughly 10%.
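The "roughly 10%" figure can be checked directly from the reported numbers. This sketch also computes the relative increase in power draw for comparison; which component the wattage readings refer to (whole system or processor package) is not stated, so treat the power figure as indicative.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


# Figures as reported in the summary.
speed_gain = percent_change(7.77, 8.54)  # tokens per second
power_gain = percent_change(45, 70)      # watts

print(f"speed +{speed_gain:.1f}%, power +{power_gain:.1f}%")
```

By this arithmetic, a roughly 56% increase in power draw corresponded to about a 10% increase in generation speed.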
What software version was the Linux workstation running?
-The Linux workstation ran Fedora Linux 42 Workstation Edition with the Cinnamon 6.4.9 desktop environment and the latest drivers for the CPU and GPU.
How does the performance of the models compare between the Linux and Windows environments?
-Performance was similar in the two environments, with only minor differences in speed. For example, Qwen3 32B in LM Studio ran at 8.61 tokens per second on Linux and 8.44 tokens per second on Windows.
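Since neither operating system is a natural baseline, a symmetric percent difference (the gap divided by the mean of the two values) is one reasonable way to size the Linux vs Windows gap. This is a sketch on the single pair of figures reported; judging whether a gap of about 2% is meaningful would require repeated runs to estimate run-to-run variation.

```python
def symmetric_percent_difference(a: float, b: float) -> float:
    """Absolute gap between a and b as a percentage of their mean.

    Unlike a plain percent change, the result does not depend on which
    value is treated as the baseline.
    """
    return abs(a - b) / ((a + b) / 2) * 100


linux, windows = 8.61, 8.44  # tokens per second, as reported in the summary
gap = symmetric_percent_difference(linux, windows)
print(f"Linux vs Windows gap: {gap:.1f}%")
```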
What is the significance of the GPU memory being properly recognized in LM Studio?
-Proper recognition of the GPU memory in LM Studio ensures that the software offloads processing to the GPU rather than running it on the CPU. This improves performance; during the tests, the system correctly used 23 GB of video memory.
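A back-of-the-envelope estimate shows why GPU memory recognition matters: a mid-sized quantized model needs tens of gigabytes, which fits comfortably in the 64 GB GPU allocation. This is a minimal sketch, assuming a 32B-parameter model in a Q4_0-style 4-bit GGUF quantization (about 4.5 bits per weight); the video does not state the quantization used, nor which model produced the 23 GB reading.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights in GB (1 GB = 1e9 bytes).

    Billions of parameters times bits per weight, divided by 8 bits per
    byte, gives gigabytes directly. KV cache, activations and runtime
    buffers are not included and add to the real footprint.
    """
    return params_billions * bits_per_weight / 8


GPU_ALLOCATION_GB = 64  # GPU share of the 128 GB configured in the video

# Hypothetical: a 32B model at about 4.5 bits per weight (Q4_0-style).
estimate = weight_memory_gb(32, 4.5)
print(f"~{estimate:.0f} GB of weights, fits in GPU share: {estimate < GPU_ALLOCATION_GB}")
```

The KV cache and runtime buffers come on top of the weights, which is one reason observed video memory usage can exceed a weights-only estimate.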
What improvements does the video suggest might be necessary for future testing?
-The video suggests that software updates and new drivers for the CPU are needed to fully optimize performance on the HP G1A workstation. The speaker also invites viewers to suggest any settings or adjustments that could improve the results in future tests.