Using Ollama to Run Local LLMs on the Raspberry Pi 5

Ian Wootten
17 Jan 2024 · 09:29

Summary

TL;DR: The video demonstrates how to use an 8 GB Raspberry Pi 5 to run open-source large language models (LLMs) on a local network. The creator installs and tests models like Tiny Llama and Llama 2, comparing their performance to a MacBook Pro. Tiny Llama runs efficiently, but larger models, like the 7 billion-parameter Llama 2, perform much slower on the Raspberry Pi. The video also showcases image recognition capabilities, albeit at a slow speed. Overall, it highlights the Pi's potential in running LLMs despite its limitations in processing power and speed.

Takeaways

  • 🖥️ The Raspberry Pi 5, with 8 GB of RAM, costs £80 in the UK or $80 in the US and is great for running open-source projects.
  • ⚙️ The video demonstrates running a large language model (LLM) on the Raspberry Pi 5 and compares its performance to a MacBook Pro.
  • 💡 The creator installed and tested Tiny LLaMA, an open-source LLM, on the Raspberry Pi using simple commands.
  • 🌐 Tiny LLaMA was able to process questions and generate text, though its phrasing was different from larger models due to its size.
  • ⚖️ Performance comparison: the Raspberry Pi 5 generated responses at about half the speed of an M1 Pro MacBook Pro, with an eval rate of 12.9 tokens per second.
  • 🚀 The larger LLaMA 2 model was significantly slower on the Raspberry Pi, with an eval rate of 1.78 tokens per second, demonstrating the impact of model size.
  • 🔒 LLaMA 2 uncensored version was used to bypass overzealous filtering found in the default LLaMA 2 model.
  • 🔍 The creator tested the LLaVA model's ability to recognize images, such as a Raspberry Pi board, which it processed successfully but took over five minutes.
  • 🛠️ Smaller models like Tiny LLaMA are recommended for faster performance on the Raspberry Pi, whereas larger models like LLaMA 2 are too slow.
  • 🎬 The video emphasizes the usefulness of the Raspberry Pi 5 for experimenting with LLMs but highlights the need to choose models wisely based on speed and capability.

Q & A

  • What is a Raspberry Pi 5?

    -The Raspberry Pi 5 is a small, affordable computer designed for educational use and loved by makers. The model mentioned in the script has 8 GB of RAM and costs around £80 in the UK or $80 in the US.

  • What is the main purpose of the video described in the script?

    -The video's main purpose is to demonstrate how to use the 8 GB Raspberry Pi 5 to run an open-source large language model (LLM) on a local network and compare its performance to other devices like the MacBook Pro.

  • What open-source large language models are mentioned in the script?

    -The script mentions several open-source models, including Mixtral, LLaMA 2, Tiny LLaMA, Code LLaMA, and Mistral, as well as the LLaVA multimodal model.

  • How does the Tiny LLaMA model perform on the Raspberry Pi 5?

    -The Tiny LLaMA model was successfully installed and tested on the Raspberry Pi 5. It generated output at an evaluation rate of 12.9 tokens per second, about half the rate the presenter achieved on a MacBook Pro with an M1 Pro processor.
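
    For context, Ollama reports these statistics when a model is run with the --verbose flag. A minimal sketch of the commands behind this benchmark, using the model name as listed in the Ollama library:

        # Run Tiny Llama interactively; timing stats (eval rate etc.) print after each reply
        ollama run tinyllama --verbose
        >>> Why is the sky blue?

    The eval rate line at the end of each response is the tokens-per-second figure quoted throughout the video.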

  • What are the performance benchmarks for the Raspberry Pi 5 running larger models like LLaMA 2?

    -When running the LLaMA 2 uncensored model (7 billion parameters), the performance was slower, with an evaluation rate of 1.78 tokens per second. This is significantly slower compared to Tiny LLaMA due to the larger model size.

  • Why did the presenter choose the uncensored version of LLaMA 2?

    -The presenter chose the uncensored version of LLaMA 2 because the standard version applies more restrictions, which may prevent the model from providing certain information, such as regular expressions (regex) in Python.
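
    A sketch of how this step might be reproduced, assuming the llama2-uncensored model name as published in the Ollama library:

        # Pull and run the uncensored 7B Llama 2 variant, with timing stats enabled
        ollama run llama2-uncensored --verbose
        >>> Can you write a regular expression to match email addresses?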

  • What was the presenter’s experience with running image interpretation on the Raspberry Pi 5?

    -The presenter tested the LLaVA model's ability to interpret an image of a Raspberry Pi, which worked but was very slow, taking over 5 minutes to generate a response. The model described the image accurately without relying on external services.
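
    With Ollama's multimodal models such as LLaVA, an image is passed by including its path in the prompt. A minimal sketch (the path and file name below are illustrative, not the exact ones from the video):

        # Run the LLaVA multimodal model; an image path in the prompt is sent to the model
        # (the path below is illustrative)
        ollama run llava --verbose
        >>> What's in this picture? /home/pi/Downloads/image.jpg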

  • What challenges did the presenter face when running large models on the Raspberry Pi 5?

    -The primary challenges were related to the slower processing speed and the high memory requirements of larger models like LLaMA 2, resulting in much slower token generation rates compared to smaller models like Tiny LLaMA.

  • What are the presenter's recommendations for running LLMs on a Raspberry Pi 5?

    -The presenter recommends using smaller models, such as Tiny LLaMA or Mistral, due to their faster performance on the Raspberry Pi 5, which has limited hardware capabilities compared to more powerful machines like the MacBook Pro.

  • What additional features of LLaMA does the presenter mention?

    -The presenter briefly mentions Ollama's API functionality, which can be explored further in other videos or tutorials.
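
    The API referred to here is Ollama's local REST API, which listens on port 11434. A minimal sketch of a generate request (the hostname is an assumption; Ollama also binds to localhost only by default, so exposing it on the network requires setting OLLAMA_HOST):

        # Query the Ollama REST API running on the Pi
        # (replace raspberrypi.local with the Pi's actual hostname or IP;
        #  set OLLAMA_HOST=0.0.0.0 on the Pi to accept connections from the LAN)
        curl http://raspberrypi.local:11434/api/generate -d '{
          "model": "tinyllama",
          "prompt": "Why is the sky blue?",
          "stream": false
        }'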

Outlines

00:00

🖥️ Introduction to Raspberry Pi 5 and its Capabilities

The Raspberry Pi 5, released a few months ago, is a tiny computer designed for schools and makers. This version comes with 8 GB of RAM and costs around £80 in the UK or $80 in the US. The video focuses on using this computer to run a large language model (LLM) on a local network. The speaker compares the performance of the Pi 5 against a MacBook Pro, beginning with installing and running a Tiny LLaMA model. The model installs quickly, and the Pi 5 processes tasks at a respectable rate, although its performance is lower compared to the MacBook Pro. The fan activates during high CPU usage, indicating the computer's processing demands. The Tiny LLaMA runs smoothly, responding to prompts with decent results, despite the model's smaller size.
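
For concreteness, the installation shown in the video boils down to Ollama's published one-line install script followed by a model run; a minimal sketch:

    # Install Ollama on Raspberry Pi OS using the official install script
    curl -fsSL https://ollama.com/install.sh | sh

    # Pull the 1.1B-parameter Tiny Llama model and start an interactive chat
    ollama run tinyllama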

05:03

💡 Testing LLaMA 2 Uncensored and Performance Observations

The speaker moves on to testing the LLaMA 2 uncensored model, which has a larger 7-billion parameter size, requiring the full 8 GB of RAM on the Raspberry Pi 5. The model runs significantly slower compared to the smaller Tiny LLaMA, confirming that it’s not well-suited for this setup. The speaker emphasizes that the uncensored version of LLaMA 2 allows more flexibility in responses, avoiding overly restrictive system prompts. They prompt the model to generate a regular expression (regex) for matching email addresses, and while it does respond, the slower speed highlights the limitations of running larger models on the Pi.
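
To check that a 7B model fits in the Pi's 8 GB, and to watch the CPU load that makes the fan kick in, the stock Raspberry Pi OS tools are enough; a small sketch of the sort of checks that apply here:

    # Show total and available memory before loading the 7B model
    free -h

    # Watch per-core CPU load while the model generates (install with: sudo apt install htop)
    htop

    # Read the SoC temperature via the Raspberry Pi firmware tool
    vcgencmd measure_temp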

Keywords

💡Raspberry Pi

The Raspberry Pi is a small, affordable computer used for educational purposes and by hobbyists for a wide range of projects. In the video, the Raspberry Pi 5 with 8 GB of RAM is highlighted as the device being used to run an open-source large language model. The Raspberry Pi is important because it allows users to perform complex tasks, such as running AI models, on a budget-friendly platform.

💡Raspberry Pi 5

Raspberry Pi 5 is the latest version of the Raspberry Pi computer, offering improved performance and specifications, such as 8 GB of RAM. In the video, the creator uses the Raspberry Pi 5 to install and run large language models, showcasing its ability to handle computationally demanding tasks despite its small size and low cost.

💡Llama

Llama is an open-source large language model mentioned in the video. The user installs and runs a smaller version called Tiny Llama on the Raspberry Pi 5. Llama models are used for natural language processing tasks such as answering questions and generating text, and the video demonstrates how even a small version can function on a compact device like the Raspberry Pi.

💡Tiny Llama

Tiny Llama is a smaller version of the Llama language model that can be run on less powerful hardware, such as the Raspberry Pi. In the video, the creator successfully runs Tiny Llama on their Raspberry Pi 5 to test its performance and responsiveness, illustrating its utility for lightweight applications where resource availability is limited.

💡MacBook Pro

The MacBook Pro is used as a point of comparison in the video to demonstrate the performance differences between running large language models on a high-end laptop versus the Raspberry Pi. The creator notes that the Raspberry Pi's performance is slower than the MacBook Pro’s, but still impressively capable given the Pi’s lower cost and size.

💡Tokens per second

Tokens per second is a performance metric used to measure how quickly a language model generates output. In the video, the Raspberry Pi is shown to handle around 12.9 tokens per second for Tiny Llama, compared to higher speeds on the MacBook Pro. This metric helps to assess the efficiency of the device when processing text.
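
For a sense of scale: at 12.9 tokens per second, a 200-token answer arrives in roughly 16 seconds (200 / 12.9 ≈ 15.5 s), whereas at Llama 2's 1.78 tokens per second the same answer takes nearly two minutes (200 / 1.78 ≈ 112 s).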

💡Llama 2

Llama 2 is a larger and more advanced version of the Llama language model. The video creator installs and runs the uncensored version of Llama 2 on the Raspberry Pi 5. However, due to its larger size and memory requirements, the model runs much slower than Tiny Llama, showcasing the limitations of running such large models on a small device.

💡Uncensored model

The uncensored version of Llama 2 removes some of the restrictions typically placed on language models, allowing the user to access content that might otherwise be blocked. In the video, the creator uses the uncensored version because it provides more flexibility when generating responses, such as writing regular expressions that the restricted version might block.

💡Regular expression (regex)

A regular expression is a sequence of characters that define a search pattern, often used for pattern matching in strings. In the video, the creator asks Llama 2 to generate a regular expression to match email addresses. This showcases the model’s ability to perform specific programming-related tasks, even though the larger model runs slower on the Raspberry Pi.
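
As an illustration only (the video does not show the model's full pattern), a typical email-matching regex can be sanity-checked from the shell:

    # Test a common email-matching pattern using grep's extended regex syntax;
    # the address is printed if it matches, nothing otherwise
    echo "user@example.com" | grep -E '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'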

💡Image interpretation

Image interpretation refers to the ability of a model to analyze and describe the content of an image. In the video, the creator tests the Raspberry Pi's ability to run a model that interprets an image of a Raspberry Pi. Although the process is slow, the model successfully provides a detailed description of the image, demonstrating the potential for running more complex tasks on this hardware.

Highlights

Introduction of Raspberry Pi 5 with 8GB RAM, available for £80 in the UK or $80 in the US.

Demonstration of running an open-source large language model (LLM) on Raspberry Pi 5.

Successfully installed and ran the Tiny Llama model on the Raspberry Pi 5.

Tested Tiny Llama with the classic question 'Why is the sky blue?', which it answered successfully, if oddly phrased.

Tiny Llama performance evaluation: achieved a 12.9 tokens per second generation rate, about half the speed of an M1 Pro MacBook Pro.

Experiment with running the larger Llama 2 uncensored model, highlighting the significant difference in performance.

Llama 2 uncensored model was much slower compared to Tiny Llama, processing at only 1.78 tokens per second.

Llama 2 model required 8GB RAM, matching Raspberry Pi 5's capacity but showed performance limitations.

Installed and tested the LLaVA model for image interpretation, analyzing a picture of a Raspberry Pi.

Successfully interpreted the image of a Raspberry Pi, recognizing the circuit board and its components.

LLaVA image processing took over 5 minutes, demonstrating slow performance on Raspberry Pi 5.

Conclusion: Tiny Llama is a more practical option for Raspberry Pi 5 compared to larger models like Llama 2.

Discussion on how using smaller models like Tiny Llama or Mistral improves performance on limited hardware.

Mention of Raspberry Pi 5's fan kicking in during LLM processing, indicating significant CPU usage.

Final takeaway: Running LLMs on Raspberry Pi is feasible but larger models are significantly slower due to hardware constraints.

Transcripts

00:00

This tiny computer is a Raspberry Pi. It's made for schools and loved by makers. More specifically, this is the Raspberry Pi 5, which was released a few months ago. This version is the 8 GB of RAM model and costs just £80 in the UK or $80 in the US, if you're lucky enough to be able to get hold of one. This tiny computer can be used for many things, but specifically in this video I want to show you how you can use that 8 GB of RAM for running an open-source large language model on your own network, and what sort of benchmarks we can get versus, say, something like the MacBook Pro that I've used Ollama on in the past. So, that all said, let's get started.

00:42

I'm on my Pi 5, and I'm going to try and install Ollama on it and see how it goes. I should be able to run the Ollama instructions and just see how they pan out. So if we just copy that curl command and paste it... see how that does. Okay, cool, that seems to have just gone in and installed straight away. If you're not familiar with Ollama, you can go and pick up any of the models it's got listed here: we've got Mixtral, we've got Llama 2, Tiny Llama, Code Llama. I'm just going to try and run Tiny Llama at this point; we can just run it and it will pull down that model, so let's see how we do. I've never run Tiny Llama before, so this is going to be a new one for me. I'm running Raspberry Pi OS, as you can see, and I've updated everything and installed all the latest packages. I haven't installed anything else; I literally just installed Ollama there.

01:45

Okay, cool, so it's pulled down everything. Let's ask it a question, the classic: why is the sky blue? See how that does. "The sky blue is a natural color. Why is the sky blue?" Oh, that's interesting, the way it's phrased that. I'm guessing this is basically down to it being Tiny Llama, which is not as big a model as the other options. Okay, so it actually works, which is superb. I'm pretty surprised that it got installed so quickly and was so easy. I'm going to try out a few things. The fan did kick in on the heatsink there when I was trying things, so it is obviously using the CPU a bit. We can run this with the verbose flag: if I do "ollama run tinyllama --verbose", I think it is, then ask the same thing, we should get some stats out in terms of how fast it's generating those responses. Now, when I was doing this on my M1 Pro, on Llama (not Tiny Llama), we were getting about 20 a second, I think, and on my M1 I think it was like 17 or something like that. So: an eval rate of 12.9 tokens a second. That is not bad. That's the prompt eval; the eval rate is 10 tokens a second, so roughly half what I was getting on the M1 Pro, which is not too shabby.

03:28

We could actually do a better comparison if we pull down the other model. So we type /bye to come out, and then do "ollama run llama2". I'm actually going to pull down the uncensored one, because Llama 2 is pretty restrictive; it's pretty aggressive with the restrictions it applies. I think in my other video I asked it for regexes in Python and it wouldn't give me the answer to those regexes, because it felt that they were inappropriate and that I might be trying to do nefarious things with them. This is saying it's going to take about 10 minutes, so that's obviously a 4 gig model. Now just wait a second and let that pull down.

04:22

Okay, cool, that's all finished downloading. As well as the Llama 2 uncensored model, I've pulled down LLaVA too, because I wanted to check how well it copes with doing image interpretation. So let's first run the Llama 2 uncensored model and see how that fares, and in fact let's do that with the verbose command again. I'm going to prompt it with "Can you write a regular expression to match email addresses?" When I did this in a previous video, this is the reason for using the uncensored version: then this doesn't get caught. Like I said, it's a little overzealous, and that generally is to do with the initial system prompt. You can see that this is much slower than the Tiny Llama that we were running.

05:33

Okay, so it's doing it in JavaScript. I didn't actually specify that, or Python, but there we go, that's fine. I have no idea if that's going to match an email address well. This is really slow in comparison, so you'd probably want to be using one of those smaller models. This is the 7 billion parameter model; I didn't state that, but it says on the Ollama website, under the Llama 2 uncensored model, that 7 billion parameter models generally require 8 gig of RAM, which is what we've got here, but you can see that it's not fast. Okay, so you can see there we've got an eval rate of 1.78, tiny in comparison to what we had just now with Tiny Llama; obviously the model is that much bigger. I think Tiny Llama is a 3 billion parameter model; let me have a squiz at the website. In fact, no, it's a 1.1 billion parameter model, which is obviously a lot smaller. We're going from 1.1 billion to 7 billion parameters and getting a much slower eval rate. So this is probably not the way you want to go; you probably want to be using something like Mistral on this, or in fact Tiny Llama is a good option, because it seemed to be going pretty fast.

07:05

I'm going to try this image as well. I've downloaded an image into Downloads; it's a picture of the Raspberry Pi. Let me see if I can get it to understand that, because it would be pretty awesome to know that it can do that as well. So let's run LLaVA, and we'll run that verbose as well. (Man, I've got an absolute tweet storm going on in a tree in my garden; this happens all the time.) Okay, so: let's see what's in this picture, /home/…/Downloads/image.jpg, I think that's what it was called. Okay, let's go. Wow, this is slow, and you get no feedback, which is the other thing here; we're not seeing anything aside from a spinner.

08:16

And it's finally responding with an answer. Here we go: "The image features a close-up of the back of a computer circuit board. The green and yellow computer board has many screws on it attaching various components. The detailed view showcases the inner workings of electronic devices such as laptops or computers." So it's obviously looked at that image and it understands it, and it's done all that locally, which is really impressive. It hasn't gone out to a third-party service in order to do that, and it hasn't been able to pick anything out from the image file name, because I've made sure that it's not identifiable from what I've named the file. So that's really impressive, but it's incredibly slow. How long did that take? Total duration: 5 minutes 33 seconds. So, a long time.

09:01

We've obviously got all of the features that Ollama has as well, such as the API stuff. You can go and check my previous videos if you want to see how to do that. But yeah, I hope you found this useful. Let me know if you're going to be trying it out on your own Raspberry Pi. I'll speak to you soon in a new video, and check out one of my other videos on Ollama; there'll be one popping up in a minute, probably. Okay, bye for now. Bye.


Related Tags
Raspberry Pi, Llama 2, AI models, Open-source, Tiny Llama, Performance comparison, MacBook Pro, Tech tutorial, AI benchmarks, Local processing