Using Ollama to Run Local LLMs on the Raspberry Pi 5
Summary
TL;DR: The video demonstrates how to use an 8 GB Raspberry Pi 5 to run open-source large language models (LLMs) on a local network. The creator installs and tests models such as TinyLlama and Llama 2, comparing their performance to a MacBook Pro. TinyLlama runs efficiently, but larger models, like the 7-billion-parameter Llama 2, run much more slowly on the Raspberry Pi. The video also showcases image recognition via the LLaVA model, albeit at a slow speed. Overall, it highlights the Pi's potential for running LLMs despite its limits in processing power and speed.
Takeaways
- 🖥️ The Raspberry Pi 5, with 8 GB of RAM, costs £80 in the UK or $80 in the US and is great for running open-source projects.
- ⚙️ The video demonstrates running a large language model (LLM) on the Raspberry Pi 5 and compares its performance to a MacBook Pro.
- 💡 The creator installed and tested TinyLlama, an open-source LLM, on the Raspberry Pi using simple commands (see the command sketch after this list).
- 🌐 Tiny LLaMA was able to process questions and generate text, though its phrasing was different from larger models due to its size.
- ⚖️ Performance comparison: the Raspberry Pi 5 generated responses at about half the speed of an M1 Pro MacBook Pro, with an eval rate of 12.9 tokens per second.
- 🚀 The larger LLaMA 2 model was significantly slower on the Raspberry Pi, with an eval rate of 1.78 tokens per second, demonstrating the impact of model size.
- 🔒 LLaMA 2 uncensored version was used to bypass overzealous filtering found in the default LLaMA 2 model.
- 🔍 The creator tested the LLaVA model's ability to recognize images, such as a photo of a Raspberry Pi board, which was processed successfully but took over five minutes.
- 🛠️ Smaller models like Tiny LLaMA are recommended for faster performance on the Raspberry Pi, whereas larger models like LLaMA 2 are too slow.
- 🎬 The video emphasizes the usefulness of the Raspberry Pi 5 for experimenting with LLMs but highlights the need to choose models wisely based on speed and capability.
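For readers who want to try this themselves, here is a minimal sketch of the setup the video walks through, assuming a Raspberry Pi running Raspberry Pi OS with network access; the one-liner is Ollama's standard Linux install script, and the model name follows the Ollama library:

```bash
# Install Ollama using its official Linux install script
curl -fsSL https://ollama.com/install.sh | sh

# Pull and run the small TinyLlama model interactively
# (the first run downloads the model weights)
ollama run tinyllama
```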
Q & A
What is a Raspberry Pi 5?
-The Raspberry Pi 5 is a small, affordable computer designed for educational use and loved by makers. The model mentioned in the script has 8 GB of RAM and costs around £80 in the UK or $80 in the US.
What is the main purpose of the video described in the script?
-The video's main purpose is to demonstrate how to use the 8 GB Raspberry Pi 5 to run an open-source large language model (LLM) on a local network and compare its performance to other devices like the MacBook Pro.
What open-source large language models are mentioned in the script?
-The script mentions several open-source models available through Ollama, including Mixtral, LLaMA 2, TinyLlama, Code LLaMA, Mistral, and the LLaVA vision model.
How does the Tiny LLaMA model perform on the Raspberry Pi 5?
-The Tiny LLaMA model was successfully installed and tested on the Raspberry Pi 5. It generated an output with an evaluation rate of 12.9 tokens per second, which is about half of the rate the presenter achieved using the MacBook Pro's M1 processor.
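The statistics quoted above come from Ollama's --verbose flag, which prints timing data after each response, including the prompt eval rate and the eval rate in tokens per second. A minimal sketch:

```bash
# Run TinyLlama and print per-response timing statistics;
# after each reply, Ollama reports total duration, prompt eval rate,
# and eval rate (tokens per second)
ollama run tinyllama --verbose
```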
What are the performance benchmarks for the Raspberry Pi 5 running larger models like LLaMA 2?
-When running the LLaMA 2 uncensored model (7 billion parameters), the performance was slower, with an evaluation rate of 1.78 tokens per second. This is significantly slower compared to Tiny LLaMA due to the larger model size.
Why did the presenter choose the uncensored version of LLaMA 2?
-The presenter chose the uncensored version of LLaMA 2 because the standard version applies more restrictions, which may prevent the model from providing certain information, such as regular expressions (regex) in Python.
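The video does not show the model's final pattern in full, so the snippet below is only an illustrative email-matching regex of the kind the presenter asked for, tested here with grep:

```bash
# Illustrative email-matching regex (not the model's exact output);
# grep -E enables extended regular expressions
echo "user@example.com" | grep -E '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'
```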
What was the presenter’s experience with running image interpretation on the Raspberry Pi 5?
-The presenter tested the model's ability to interpret an image of a Raspberry Pi, which worked but was very slow, taking over 5 minutes to generate a response. The model was able to describe the image accurately without relying on external services.
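In the Ollama CLI, LLaVA takes an image by referencing its file path inside the prompt; the path below is a stand-in for wherever the test image was saved, not the presenter's actual file:

```bash
# Ask LLaVA to describe a local image; Ollama picks up the file path
# embedded in the prompt (the path here is a placeholder)
ollama run llava --verbose "What's in this picture? /home/pi/Downloads/image.jpg"
```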
What challenges did the presenter face when running large models on the Raspberry Pi 5?
-The primary challenges were related to the slower processing speed and the high memory requirements of larger models like LLaMA 2, resulting in much slower token generation rates compared to smaller models like Tiny LLaMA.
What are the presenter's recommendations for running LLMs on a Raspberry Pi 5?
-The presenter recommends smaller models such as TinyLlama (or alternatives like Mistral) for faster performance on the Raspberry Pi 5, which has limited hardware capabilities compared to more powerful machines like the MacBook Pro.
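Switching between models is a matter of pulling them from the Ollama library; a short sketch of the relevant housekeeping commands, with model names as listed on ollama.com:

```bash
# Pull the other models mentioned in the video
ollama pull llama2-uncensored   # 7B; needs roughly the Pi's full 8 GB of RAM
ollama pull mistral             # 7B alternative the presenter mentions

# List the models already downloaded locally
ollama list
```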
What additional features of Ollama does the presenter mention?
-The presenter briefly mentions Ollama's additional features, such as its API, which are explored further in the presenter's other videos.
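The API in question is Ollama's local REST API, which listens on port 11434 by default. A minimal sketch of a generate request:

```bash
# Query the local Ollama REST API (default port 11434);
# "stream": false returns a single JSON object instead of a token stream
curl http://localhost:11434/api/generate -d '{
  "model": "tinyllama",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```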
Outlines
🖥️ Introduction to Raspberry Pi 5 and its Capabilities
The Raspberry Pi 5, released a few months ago, is a tiny computer designed for schools and makers. This version comes with 8 GB of RAM and costs around £80 in the UK or $80 in the US. The video focuses on using this computer to run a large language model (LLM) on a local network. The speaker compares the performance of the Pi 5 against a MacBook Pro, beginning with installing and running a Tiny LLaMA model. The model installs quickly, and the Pi 5 processes tasks at a respectable rate, although its performance is lower compared to the MacBook Pro. The fan activates during high CPU usage, indicating the computer's processing demands. The Tiny LLaMA runs smoothly, responding to prompts with decent results, despite the model's smaller size.
💡 Testing LLaMA 2 Uncensored and Performance Observations
The speaker moves on to testing the LLaMA 2 uncensored model, which has a larger 7-billion parameter size, requiring the full 8 GB of RAM on the Raspberry Pi 5. The model runs significantly slower compared to the smaller Tiny LLaMA, confirming that it’s not well-suited for this setup. The speaker emphasizes that the uncensored version of LLaMA 2 allows more flexibility in responses, avoiding overly restrictive system prompts. They prompt the model to generate a regular expression (regex) for matching email addresses, and while it does respond, the slower speed highlights the limitations of running larger models on the Pi.
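A sketch of the corresponding run, assuming the uncensored model has already been pulled; Ollama accepts a one-shot prompt as a command-line argument:

```bash
# Run the 7B uncensored Llama 2 variant once, with timing stats
ollama run llama2-uncensored --verbose \
  "Can you write a regular expression to match email addresses?"
```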
Keywords
💡Raspberry Pi
💡Raspberry Pi 5
💡Ollama
💡Tiny Llama
💡MacBook Pro
💡Tokens per second
💡Llama 2
💡Uncensored model
💡Regular expression (regex)
💡Image interpretation
Highlights
Introduction of Raspberry Pi 5 with 8GB RAM, available for £80 in the UK or $80 in the US.
Demonstration of running an open-source large language model (LLM) on Raspberry Pi 5.
Successfully installed and ran the Tiny Llama model on the Raspberry Pi 5.
Tested Tiny Llama's response to the question 'Why is the sky blue?' with successful generation of the response.
Tiny Llama performance evaluation: achieved a 12.9 tokens-per-second generation rate, about half the speed of an M1 Pro MacBook Pro.
Experiment with running the larger Llama 2 uncensored model, highlighting the significant difference in performance.
Llama 2 uncensored model was much slower compared to Tiny Llama, processing at only 1.78 tokens per second.
The 7B Llama 2 model requires about 8GB of RAM, matching the Raspberry Pi 5's capacity, but still showed performance limitations.
Installed and tested the LLaVA model for image interpretation, analyzing a picture of a Raspberry Pi.
Successfully interpreted the image of a Raspberry Pi, recognizing the circuit board and its components.
LLaVA image processing took over 5 minutes, demonstrating slow performance on Raspberry Pi 5.
Conclusion: Tiny Llama is a more practical option for Raspberry Pi 5 compared to larger models like Llama 2.
Discussion of how using smaller models like TinyLlama or Mistral improves performance on limited hardware.
Mention of Raspberry Pi 5's fan kicking in during LLM processing, indicating significant CPU usage.
Final takeaway: Running LLMs on Raspberry Pi is feasible but larger models are significantly slower due to hardware constraints.
Transcripts
This tiny computer is a Raspberry Pi. It's made for schools and loved by makers, and more specifically this is the Raspberry Pi 5, which was released a few months ago. This version is the 8 GB of RAM model and costs just £80 in the UK or $80 in the US, if you're lucky enough to be able to get hold of one. This tiny computer can be used for many things, but specifically in this video I want to show you how you can use that 8 GB of RAM for running an open-source large language model on your own network, and what sort of benchmarks we can get versus, say, something like the MacBook Pro that I've used Ollama on in the past. So, that all said, let's get started.
So I'm on my Pi 5. I'm going to try and install Ollama on this and see how it goes; I should be able to run the Ollama install instructions and just see how they pan out. So if we just copy that curl command and paste that, and see how that does... okay, cool, so that seems to have just gone in and installed straight away. If you're not familiar with Ollama, you can go and pick up any of the models it's got listed here: we've got Mixtral, we've got Llama 2, TinyLlama, Code Llama. I'm just going to try and run TinyLlama at this point; we can just run TinyLlama and it'll pull down that model, so let's see how we do.
I've never run TinyLlama before, so this is going to be a new one for me. I'm running Raspbian, as you can see, and I've updated everything and installed all the latest packages, but I haven't installed anything else; I literally just installed Ollama there. Okay, cool, so it's pulled down everything. Let's ask it a question, the classic: why is the sky blue? See how that does. "The sky blue is a natural color. Why is the sky blue?" Oh, that's interesting, the way it's phrased that. I'm guessing this is basically down to it being TinyLlama, which is not as big a model as other options. Okay, so it's actually worked, which is superb.
I'm actually pretty surprised that that got installed so quickly and was so easy. I'm going to try out a few things. The fan did kick in on the heat sink there when I was trying things, so it is obviously using the CPU a bit; it'd be interesting to know how much when we do this. So we can run this with the verbose flag: if I do "ollama run tinyllama --verbose", I think it is, then let me do the same thing, and we should get some stats out in terms of how fast it's generating those responses. Now, when I was doing this on my M1 Pro, on Llama, not TinyLlama, we were getting about 20 a second I think, and on my M1 I think it was like 17 or something like that. So: eval rate, 12.9 tokens a second. That is not bad. Ah, that's the prompt eval rate; the eval rate is 10 tokens a second, so roughly half what I was getting on the M1 Pro, which is not too shabby.
We could actually do a better comparison if we pull down the other model. So we say /bye and come out, and then do "ollama run llama2". I'm actually going to pull down the uncensored one, because Llama 2 is pretty restrictive; it's pretty aggressive with the restrictions it applies. I think in my other video I asked for a regex in Python, and it wouldn't give me the answer to those regexes because it felt that they were inappropriate and that I might be trying to do nefarious things with them. So this is saying it's going to take about 10 minutes; that's obviously a 4 gig model. We'll just wait a second and let that pull down.
Okay, cool, that's all finished downloading. As well as the Llama 2 uncensored model, I've pulled down LLaVA too, because I wanted to check out how well it copes with doing image interpretation. So let's first run the Llama 2 uncensored model and see how that fares; in fact, let's do that with the verbose command again. I'm going to prompt it with "can you write a regular expression to match email addresses". In a previous video when I did this, it got caught by the filtering; this is the reason for using the uncensored version, because then this doesn't get caught. Like I said, it's a little overzealous, and that generally is to do with the initial system prompt. You can see that this is much slower than the TinyLlama we were running.
Okay, so it's doing it in JavaScript; I didn't actually specify that, or Python, but there we go, that's fine. I have no idea if that's going to match an email address. Well, this is really slow in comparison, so you'd probably want to be using one of those smaller models. This is the 7 billion parameter model; I didn't state that, but it says on the Ollama website, under the Llama 2 uncensored model, that the memory requirements for 7 billion parameter models are generally 8 gig of RAM, which we've got here, but you can see that it's not fast. Okay, yeah, so you can see there we've got an eval rate of 1.78, tiny in comparison to what we had just now with TinyLlama; obviously the model is that much bigger. I think TinyLlama is a 3 billion parameter model, let me have a look at the website... in fact, no, it's a 1.1 billion parameter model, which is obviously a lot smaller. We're going from 1.1 billion to 7 billion parameters and getting a much slower eval rate, so this is probably not the way you want to go; you probably want to be using something like Mistral on this, or in fact TinyLlama is a good option there, because it seemed to be going pretty fast.
I'm going to try this image as well. I've downloaded this image into Downloads; it's a picture of the Raspberry Pi. Let me see if I can get it to understand that, because it would be pretty awesome to know that it can do that as well. So let's run LLaVA, and we're going to run that verbose as well. (Man, I've got an absolute tweet storm going on in a tree in my garden; this happens all the time.) Okay, so let's see what's in this picture: home, Downloads, image.jpg, I think that's what it was called. Okay, let's go.
Wow, this is slow, and you get no feedback, that's the other thing here; we're not seeing anything aside from the spinner. And it's finally responding with an answer. Here we go: "The image features a close-up of the back of a computer circuit board. The green and yellow computer board has many screws on it attaching various components. The detailed view showcases the inner workings of electronic devices such as laptops or computers." So it's obviously looked at that image and it understands it, and it's done all that locally, which is really impressive; it's not gone out to a third-party service in order to do that. And it hasn't been able to pick anything out from the image file name, because I've made sure that it's not identifiable from what I've named the file. So that's really impressive, but it's incredibly slow. How long did that take? Total duration: 5 minutes 33, so a long time.
We've obviously got all of the features that Ollama has as well, such as the API stuff; you can go and check my previous videos if you want to see how to do that. But yeah, I hope you found this useful. Let me know if you're going to be trying it out on your own Raspberry Pi. I'll speak to you soon in a new video, and check out one of my other videos on Ollama; there'll be one popping up in a minute, probably. Okay, bye for now. Bye.