WTF is an AI PC?
Summary
TLDR: In this video, Nirav Patel, founder and CEO of Framework, introduces large language models and how to run them on laptops. He discusses how local models like Meta's Llama 3.1 can deliver intelligent answers without sending any data online. He also explores the privacy and customization advantages of local models, as well as the rapid progress in AI and machine learning.
Takeaways
- 🤖 Nirav Patel, founder and CEO of Framework, discusses AI in a YouTube video.
- 🧠 There is a lot of noise around AI, but Patel focuses on real, working use cases.
- 💡 Patel demonstrates running Meta's Llama 3.1 model locally on a Framework laptop.
- 🔍 Llama 3.1 answers complex questions quickly, showing how far AI has come on consumer hardware.
- 🆚 Giant language models like GPT are proprietary and expensive, unlike smaller, open models such as Llama 3.1.
- 🌐 Running models locally keeps you in full control of your data and lets you customize the model.
- 🏆 Llama 3.1 ranks in the top 10 of the Chatbot Arena, proving that open models are competitive.
- 🔢 A model's parameters (its internal weights) determine its capacity to learn and respond.
- 💾 Model bit depth can be reduced (quantized) so that larger models fit in limited GPU memory.
- 🖥️ GPUs are well suited to running AI models thanks to their matrix-processing capability and high memory bandwidth.
- 📝 Ollama is a simple tool for running language models on a personal machine without deep technical knowledge.
Q & A
What does Nirav Patel identify as the problem with the hype around AI?
-He says there is a lot of noise and BS around hyped topics, and that he prefers to focus on what is real and what works.
What is Nirav Patel's goal when talking about AI?
-He wants to avoid the hyped use cases and talk about what actually works with AI on real hardware.
Why does Nirav Patel say he is not an AI expert?
-He describes himself as a hobbyist, so his approach is to share what he has learned so far rather than speak with an expert's authority.
Which AI model did he run on a Framework laptop?
-He ran Meta's Llama 3.1 model.
What question did he ask Llama 3.1 during the demonstration?
-He asked the model what is important about the right to repair.
What are the advantages of running an AI model locally?
-They include full control, data privacy (your data never leaves the machine), and the ability to modify the model or inspect what is happening under the hood.
What is the difference between giant language models like ChatGPT and smaller ones?
-Large models like ChatGPT have hundreds of billions of parameters and run on expensive servers, while smaller models are designed to run on consumer-grade GPUs.
Which AI models are starting to be competitive with the largest ones?
-Meta's Llama 3.1 models are mentioned as being competitive with the largest models while being small enough to run on a laptop.
What is the role of parameters in an AI model?
-Parameters are the weights inside the model; roughly speaking, the more there are, the smarter the model can be.
Why are GPUs effective at running AI models locally?
-GPUs have a lot of memory bandwidth and a lot of matrix-multiplication capability and throughput, which is useful for machine learning, similar to how they are used for video games.
Which tool did Nirav Patel use to run language models on his laptop?
-He used a tool called Ollama, developed by a group of the same name, to run language models.
What other types of AI models does Nirav Patel mention?
-He mentions vision models, such as LLaVA and llava-llama3, which can analyze images and interact with the user based on an image.
What is interesting about image generation with AI models?
-Image generation is another compelling AI application, where models like Stable Diffusion or Flux can create images from scratch.
Outlines
🤖 Introduction to AI with Nirav Patel
Nirav Patel, founder and CEO of Framework, approaches the topic of AI while avoiding the hyped use cases, focusing instead on what actually works today. He notes that he is not an AI or machine learning expert and comes at the subject as a hobbyist. He gives a concrete demonstration by running Meta's Llama 3.1 model on a Framework laptop, answering a complex question about the right to repair.
🔍 AI models and their advantages explained
The video contrasts giant AI models such as ChatGPT, which run on expensive servers, with smaller models that run on consumer GPUs. It highlights the advantages of Meta's Llama 3.1, a mid-sized model that runs on consumer GPUs, delivers solid intelligence, and remains modifiable and inspectable by the user. The model ranks in the top 10 of the Chatbot Arena, a global competition between AI models.
📊 The power of parameters and GPU memory
The video explores the concept of parameters in AI models, which are the model's weights and determine its ability to learn and respond. It explains how reducing the bit depth of the parameters makes it possible to fit larger models into less memory, and why GPUs are efficient at running these models thanks to their matrix-processing capability and high memory bandwidth.
💬 Using language models, and future applications
The video covers various uses of language models, from text interaction to image recognition with the llava-llama3 model. It also mentions other types of AI models beyond LLMs, such as image generators, and how they could be set up to run locally on a laptop. It concludes by noting that the state of the art in AI is advancing quickly and that we will hear more about it in the future.
Keywords
💡AI
💡Hype
💡Framework Laptop
💡Discrete GPU
💡Meta's Llama 3.1
💡Proprietary Model
💡Right to Repair
💡Matrix Multiplication
💡Open Source
💡Chatbot Arena
Highlights
Nirav Patel, founder and CEO of Framework, discusses AI and its practical applications.
AI hype often leads to noise and confusion, so Framework focuses on real, workable solutions.
Demonstration of running Meta's LLaMA 3.1 model locally on a Framework laptop to answer complex questions.
Comparison between large, proprietary AI models like ChatGPT and smaller, open-source models.
Advantages of running AI models locally include data privacy and the ability to modify and inspect the model.
Emergence of mid-sized AI models that are smart enough and can run on consumer GPUs.
LLaMA 3.1's performance in the Chatbot Arena, competing with proprietary models.
Parameters define the model's intelligence, with more parameters generally leading to smarter models.
GPUs are well-suited for running AI models due to their memory bandwidth and matrix multiplication capabilities.
Practical demonstration of asking LLaMA 3.1 about the best burritos in San Francisco.
Exploration of large language models beyond text, including models with vision support.
Demonstration of image analysis using the LLaMA model to describe content of an image.
Knowledge cutoff of AI models and their reliance on training data for information.
Potential future improvements in AI models' knowledge and capabilities.
Introduction to other AI applications like image generation using models like Stable Diffusion.
The rapid advancement of AI and machine learning technology and its impact on consumer electronics.
Framework's commitment to discussing AI and machine learning as the technology evolves.
Demonstration of the model writing a script for a YouTube video explaining how to run large language models locally on a laptop.
Transcripts
I'm Nirav Patel, I'm the founder and CEO of Framework, and today we're going to talk about everyone's favorite topic: AI. We haven't talked a lot about AI, and part of the reason for that is that there's just a ton of hype around it, and when there's hype around a topic there tends to be a lot of noise and BS in general. We try to avoid that; we try to focus on what's real, we try to focus on what works. So when we talk about AI, we're going to avoid the hypey use cases and really just talk through what works today on real, running hardware.
[Music]
Let's talk about what AI is in the context of a Framework Laptop. One thing I want to call out from the start is that I'm not an AI expert, I'm not a machine learning expert, so I'm really approaching this from the perspective of a hobbyist. And so before we go deep into what I've learned so far, we're actually just going to run something. We're going to open up Ollama, we're going to run Meta's Llama 3.1 model, and we're just going to ask it a question to show what AIs can actually do running locally on a Framework Laptop, or really any consumer laptop with a discrete GPU. So we're just going to ask it: what's important about the right to repair? An important question, a very important question. And you can see it's just dumping out an answer pretty quickly. Just quickly reading through this: it's an issue that's gained significant attention in recent years; environmental impact, cost savings, increased product lifespan (some of these are a little iffier), job creation, economic growth (obviously we're hiring people, so maybe we've created some jobs with this, that's great), product design innovation. So this is actually a pretty good answer. It's pretty fast that it's dumping this out, it's a pretty smart answer, and it's actually probably a better answer than what you'd get in, you know, a few minutes of Googling around, in that it's just nicely summarized. So basically, just as a starting point, this model running locally on a Framework Laptop 16 was able to answer a quite complex question with a reasonable answer, pretty quickly. That's the starting point.
So let's talk about what that just was, what just happened here running locally on this computer, and how it compares to maybe some of the other AIs that you've seen or used in the past, like ChatGPT. The thing about ChatGPT, of course, is that it is this enormous proprietary model that OpenAI has developed. There are similar models like Claude, and there are other ones from Google and X and other companies. The key thing, of course, is that these are enormous models running on very expensive servers in data centers, and they are proprietary and inaccessible in the sense that if you wanted to modify one, or inspect what's happening under the hood, you actually can't do that. If you've got concerns around privacy or security, you basically have to trust that those companies are doing the right things with the queries and data you're providing. So obviously, the advantage of being able to run a model locally is that you have complete control: you can trust that your data is not leaving your machine, you can modify the model, you can inspect what's happening under the hood; it's all open and available to you. And the trade-off, of course, is whether it's smart enough.
As we look at those giant models like ChatGPT, they have hundreds of billions of parameters, and they're running on these multi-million-dollar machines filled with Nvidia GPUs that oftentimes you can't even get access to. Versus, let's say, the types of models that have been getting hype in the PC space: things that run on the tiny bit of silicon area dedicated to NPUs on these recent-generation processors. There's this massive gulf between those two ends: the giant models like ChatGPT on one side, and then the tiny little models that run on 40 to 50 TOPS on your processor. But the cool thing is that there is this middle ground starting to emerge, where there are models that fit on consumer-level GPUs that are actually getting smart enough to be useful. And so this model that we just ran here was Meta's latest and greatest, called Llama 3.1, and they actually have multiple sizes of it, ranging from too big to reasonably run on a laptop to perfectly sized to run in 8 GB of graphics memory, which is exactly what we have here with our Radeon RX 7700S in the Framework Laptop 16.
So we're actually just going to look at some of these models. This is a site called LMArena; there's this competition called the Chatbot Arena, which is this global competition between both proprietary and open large language models, basically to see which one is the smartest, which one is delivering the best responses to questions. And you can see the stack ranking that's been generated now over the course of a few years. Obviously, the top models, the greatest models giving the best answers, are these very, very large proprietary models that have the most data going into them, that have the largest number of parameters and the largest model size overall, and that are closed, of course. But the cool thing is that as you start to scroll down just a little bit, right there in the top 10 we've got an open model: this model called Llama 3.1 that Meta has been investing in now over a few generations. And the exciting thing about this is that it is open. It's available under an open license; you can actually download the entire model locally and play with it. The license, of course, is a little bit tricky: it is an open model in the sense that it's open and accessible, but there are some restrictions, primarily for the sake of safety, and we're not going to address that in this video.
One thing to call out here is that as you look down this ranking, you're kind of going from these giant models that have hundreds of billions of parameters, and as you scroll down you start to see the models that get a bit smaller: you've got Llama 3.1 70 billion, you've got Llama 3.1 8 billion, which is the one that we just ran and which is still in the top 50. And actually, if you look at some of the models around it, these were best-in-class proprietary models just a generation ago. So basically, this frontier in AI and machine learning is moving incredibly quickly, where open models that are small enough to run on a consumer laptop are actually competitive with the top models in the world, the most advanced models that were closed and proprietary just a generation ago. Which is just such a cool place to be, and part of why machine learning and AI genuinely are interesting, and why there is stuff there beyond the hype.
So, one cool thing: I mentioned parameters a few times, so just to explain a bit about what parameters are. Parameters are basically the weights, the number of items that are inside that model, and just as a rough approximation, the more parameters there are, the smarter the model can be. That's why you see Meta offering three different model sizes: the biggest model is the smartest, and it goes down as you shrink the model down a bit. The number of parameters would normally be, you know, let's say 16 bits per parameter, two bytes per parameter, so for this 8-billion-parameter model you'd need a 16 GB chunk of memory to be able to run through it. You can, though, shrink the bit depth down to 8-bit or 6-bit or even smaller without substantially making the model dumber, and so oftentimes I'll run a model at 8-bit or even 6-bit just to make it fit. So we can take these 8-billion-parameter models and fit them in 8 GB of video memory, which is cool.
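The memory arithmetic in that explanation is easy to check. A minimal sketch of the calculation the video walks through — it counts only the weights themselves, whereas a real runtime also needs extra memory for activations and the context cache:

```python
def weights_gb(num_params: float, bits_per_param: int) -> float:
    """Memory needed just to hold the model weights, in decimal gigabytes."""
    total_bytes = num_params * bits_per_param / 8  # 8 bits per byte
    return total_bytes / 1e9

# An 8-billion-parameter model at different bit depths:
for bits in (16, 8, 6, 4):
    print(f"8B params @ {bits:2d}-bit: {weights_gb(8e9, bits):4.1f} GB")
# 16-bit needs 16 GB; quantizing to 8-bit (8 GB) or 6-bit (6 GB)
# is what makes the model fit in 8 GB of video memory.
```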
One other thing to call out: when we ran that demo asking a question (so I'll just say "tell me more"), you can see that it answers pretty quickly. One cool thing that makes GPUs so effective at running these models locally, versus something like an NPU running within the silicon area of your CPU or APU, is that there's actually quite a lot of memory bandwidth and a lot of matrix-multiplication capability and throughput inside a GPU, because they're crunching, you know, polygons, they're crunching shaders — things that are just matrix multiplication, which very conveniently is largely what machine learning is. So what made GPUs perfect for gaming has translated really, really well over to machine learning. And so you can see, in general, these 8-billion-parameter models, because of that matrix-multiplication throughput and the memory bandwidth that's available, can output answers basically faster than you can read them. For general conversational speed, you'd need about five to seven words per second, or tokens per second, for it to not feel sluggish, and with these models running on the RX 7700S like we have here, you can get up to about 30 to 35 tokens per second, which is basically faster than you can read. So it's really quite usable while still delivering answers that are pretty smart, and we've entered a cool sweet spot here.
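Those throughput numbers line up with a common back-of-the-envelope estimate: token generation is usually memory-bandwidth bound, because producing each new token streams all of the weights through the GPU once. The bandwidth figure below is an illustrative assumption, not a quoted spec for this GPU:

```python
def max_tokens_per_sec(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Rough upper bound on generation speed for a bandwidth-bound model:
    each new token requires reading every weight once."""
    return bandwidth_gb_s / weights_gb

# e.g. a laptop GPU with ~250 GB/s of memory bandwidth running an
# 8 GB quantized model tops out around 31 tokens/s -- the same
# ballpark as the ~30-35 tokens/s seen in the demo.
print(max_tokens_per_sec(250, 8))
```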
Going back to focusing on real use cases instead of hype, the cool thing here is that we're really just approaching this like we would any other piece of software. You can choose what you want to run. This isn't like Copilot, baked into your PC with no control over it; this is literally open-source software. The specific tool that I've been playing with here, showing these demos, is a tool called Ollama, from a group called Ollama, and they make what is probably the easiest way to get up and running with large language models. Actually, that's literally the one sentence on their homepage: "Get up and running with large language models." So I just hit the download button, installed it, it popped open Windows Terminal, and then I just ran `ollama run`. The cool thing here is that they've actually made all of these open models super simple to run. You can go to their website, go to the models list, and see Llama 3.1, Gemma 2 from Google, Mistral (which is a big AI company, I think based out of France), DeepSeek Coder, CodeGemma — all these different models that have different specializations — and you can just run any one of them; you can just say run Gemma 2. And the cool thing here is that it doesn't take any programming knowledge, it doesn't take really deep technical knowledge of any kind; it's literally just you interacting in text with this model. But Ollama, and a lot of the open-source tools around this, are flexible enough that if you want to go deeper as a developer or tinkerer, you can go in and write, you know, Python interfaces or other interfaces to download data, or have the model interact with the internet or with data on your machine. Again, all of that you control, in a way where you can see the code or write the code, modify the model if you want to, and have complete ownership, without having to wonder what's happening with your data or what's happening out in the cloud. So, really cool stuff happening locally these days.
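For that "go deeper as a developer" path, Ollama also serves a local REST API (by default on port 11434, with a `/api/generate` endpoint). A minimal sketch of a Python interface against that API, using only the standard library; the model name and prompt are just examples:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(model: str, prompt: str) -> bytes:
    """Serialize a non-streaming generate request for Ollama's REST API."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def ask(model: str, prompt: str) -> str:
    """Send a prompt to a locally running Ollama server and return its reply."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage, with `ollama serve` running and the model already pulled:
#   answer = ask("llama3.1", "What's important about the right to repair?")
```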
So what else can you do with large language models? I've been kind of treating it a bit like Google, but you can go a bit deeper; you know, you can actually be friends with the virtual friend in your Framework Laptop. So let's pick a couple of other examples, like maybe we ask Llama 3.1 what's the best burrito in San Francisco, and just see. Yeah, and it's just dumping out this answer super quickly. So, I live in San Francisco, I love burritos, and I have to say these are pretty good answers, actually. These are good burritos. The stack ranking, of course, is controversial, but the specific few that it picks are good, good burritos.
We've been focused on large language models so far, which are basically models that you can feed text into; they'll crunch it and then output an answer back as text. But as we look at machine learning and AI, some of the coolest stuff happening goes outside of text, outside of LLMs, and into other types of models. And so one of the cool things about Ollama is that there's actually vision support: you can load in a model that you can feed an image, and the model will basically parse the image, understanding what's happening in it, and then you can interact with the model, ask it questions, kind of interrogate it about the image, or have a conversation that's based on the image. So I'm going to download an image; I just picked this one off of our website. It's now downloaded, and I'm going to run this model that's called LLaVA. Actually, I'm going to run a variant of it that's called llava-llama3, which uses Meta's Llama 3 in combination with LLaVA to give some pretty good results. So I'm just going to feed it this image, which is conveniently called image.png, and I'm just going to ask it what's in this image, and we're going to see how smart it is.
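Under the hood, the same local API handles vision models: Ollama's generate endpoint accepts an `images` field containing base64-encoded image data for multimodal models like llava-llama3. A small sketch of building such a request (sending it works the same way as a text request):

```python
import base64
import json

def build_vision_request(model: str, prompt: str, image_bytes: bytes) -> bytes:
    """Serialize a generate request that attaches one image, base64-encoded,
    for a multimodal model served by Ollama."""
    payload = {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }
    return json.dumps(payload).encode()

# e.g. build_vision_request("llava-llama3", "What's in this image?",
#                           open("image.png", "rb").read())
```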
So: "In this image, a young woman is engrossed in her work on a silver..." — a silver MacBook laptop? Hold on, that's not a MacBook. Other than that, it was pretty accurate: behind her there's a window that offers a serene view of lush green plants outside, in addition to the ambience; the woman seems to be enjoying her work, and I think she probably was. Wow, this is even worse: I told it that's not a MacBook, and now it thinks it's an Acer Aspire or a Toshiba Satellite. I don't even know what those are. So one key factor here is that there is actually a knowledge cutoff. Because this model is not touching the internet — it is running entirely locally — it only has knowledge of whatever was baked in at the point in time it was fed that data to crunch and train on. Obviously, this model has no knowledge of what Framework is as a company; it might even have been trained before Framework existed as a company. So unfortunately we can't ask it about the Framework Laptop; we can only ask it about the Acer Aspire, the Toshiba Satellite, or the MacBook. It's okay though, I'm not offended by llava-llama3. In the future, of course, these models will know a bit more about the right to repair, and hopefully a bit more about Framework.
There's other cool stuff you can do too, like image generation. There are models called Stable Diffusion, and a newer one called Flux, that are open image generation models. They are quite a bit more complex to set up, so we're not going to play with them today, but they have actually gotten a lot better on AMD. Historically, what's been open to run locally on a laptop or desktop has been largely Nvidia-focused, but AMD has been closing the gap really, really quickly with their ROCm framework. As you can see, with Ollama it just runs straight out of the box; some of the image generation stuff takes a little bit more work, but it is getting there pretty quickly.
And so, of course, this was the first time we've talked about AI, the first time we've talked about machine learning, but the state of the art is advancing really quickly. The frontier is just moving incredibly fast, so this is something that we're definitely going to be talking more about as we go. "Write me a script for a YouTube video explaining how to run large language models locally on a laptop." Here's the script. Oh wow, so an intro, a 30-second intro: "Hey there, tech enthusiasts, welcome back to the Framework YouTube channel. Today we're going to talk about something that's been making waves in the tech community: running large language models locally on your laptop. Yeah, you heard that right. With the rise of AI and machine learning, having a powerful language model at your..." — oh my god, it's writing faster than I can read, and scrolling — "...a powerful language model at your fingertips can be super useful for developers, researchers, even hobbyists like ourselves, so let's dive into it." And this is still going. So we've got a 12-minute script that ends with: closing shot, with a friendly smile, "Thanks for watching, everyone, we'll catch you in the next video."
[Music]