FREE Local LLMs on Apple Silicon | FAST!
Summary
TLDR: This video script walks through running a ChatGPT-like setup on a MacBook using the Apple silicon GPU, which makes everything run faster. It uses open-source LLMs (large language models) and introduces Ollama, a tool for downloading and managing models on your local machine. It then demonstrates Open WebUI, a chatbot with a back end built in Python and a front end built in JavaScript, where you can switch between models and chat through a customizable UI. It also touches on how simple deployment is with Docker, concluding that developers gain deeper knowledge and more power by understanding the details behind the code.
Takeaways
- 🚀 You can now run a ChatGPT-like setup on a MacBook using the Apple silicon GPU, and setup has become much easier.
- 💻 You can run your own GPTs on your personal machine, and it looks just as good as ChatGPT.
- 📚 As a software developer, you can read the code and see how it works under the hood.
- 🔍 The setup combines Open WebUI with Ollama.
- 📥 You'll clone the GitHub repository; a local Node environment and a Python environment are required.
- 🐐 Ollama is an agent that runs on your machine and automatically downloads and manages LLM models.
- 📚 You can download open-source models and run them locally.
- 🔧 You can try multiple models to find the one that best fits your use case.
- ⚙️ Models run entirely on the GPU, transparently and with no configuration.
- 🌐 The Open WebUI project gives you a beautiful UI where you can manage models from the user interface.
- 🔗 You can also build the environment with Docker and run everything in containers.
- 📝 You can use prompts shared by the community, and save, import, and export your own prompts.
Q & A
What are the benefits of running a ChatGPT-like setup on a MacBook with the Apple silicon GPU?
-Everything runs faster on the Apple silicon GPU, and you can skip the configuration and setup steps that used to be required. You also get a good-looking, fast, GPT-like experience running entirely on your local machine.
What is Open WebUI?
-Open WebUI is the project used in the video; its front end is written in JavaScript and its back end in Python. It also ships with Docker configurations, so it can be containerized and run that way.
What is Ollama, and why do you need to download it?
-Ollama is an agent that runs on your local machine and automatically downloads and manages LLM (large language model) models. It offers a wide range of open-source LLMs and is designed to run them locally.
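Once Ollama is installed and running, you can talk to it over its local REST API as well as through the CLI. A minimal sketch based on Ollama's documented `/api/generate` endpoint and default port 11434 (the model name assumes you have already pulled `llama3`):

```shell
# Ask a locally running Ollama server a one-shot question.
# Assumes `ollama pull llama3` has already been run.
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```

With `"stream": false` the server returns a single JSON object whose `response` field holds the full answer; omit it to stream tokens as they are generated.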
Why download multiple LLM models and run them locally?
-Different LLM models have different capabilities and training, so you can try several to find which gives the best results for your use case. You can also combine models, which adds flexibility.
Which Python version does the Open WebUI back end require?
-The back end requires Python 3.11. You can use conda to set up a virtual environment and install the required libraries into it.
What does the Open WebUI front end need?
-The front end needs Node.js and npm (the Node package manager). You run npm install to install the required packages listed in package.json.
Which Docker configuration files does Open WebUI use?
-There are two main configuration files, a Dockerfile and docker-compose.yml. You can find them in the project's documentation section.
How do you pull, delete, and combine models in Open WebUI?
-You can download models directly from the UI, delete existing ones, and combine multiple models in a chat, so you can work flexibly according to your needs.
What can you customize in the Open WebUI settings?
-You can choose a theme, set a system prompt, and tune advanced parameters, and you can also save, import, and export your own prompts.
What are Open WebUI's community features?
-Through the community, you can access prompts shared by other users and add or share custom models, giving you access to a much wider range of functionality.
What are the benefits of running Open WebUI with Docker?
-Docker containerizes the project's dependencies, making the runtime environment easy to build and manage. It simplifies development setup and makes the same environment easy to reproduce.
What is the custom model feature in Open WebUI?
-Open WebUI lets users add their own models, so you can run models specialized for particular tasks and unlock more advanced functionality.
Outlines
🚀 Running AI models locally, fast, on the GPU
Explains how to run a ChatGPT-like AI model on a MacBook using the Apple silicon GPU. With a tool called Ollama, you can download, manage, and run a variety of open-source AI models locally, and with the open-source Open WebUI you can build a custom GPT that looks good and responds fast.
💻 Prerequisites and environment setup
The video assumes you already have a local Node environment and a Python environment set up, with links provided if you need a refresher. It covers cloning the repository, looking through the directory, the front-end and back-end code structure, using Docker, and running Ollama, and also touches on the characteristics of various AI models and how to combine them.
🖥️ Running and customizing your own GPT
Shows how to run models downloaded with Ollama in a custom GPT and have them respond to prompts. It also covers customizing the Web UI: selecting and combining models, changing themes, adding system prompts, saving and sharing prompts, and connecting with the community. Finally, running everything via Docker is briefly shown, with detailed steps so developers can understand what happens behind the scenes.
Keywords
💡Apple silicon GPU
💡ChatGPT
💡Ollama
💡Open-source models
💡Docker
💡System prompt
💡Front end
💡Back end
💡Containerization
💡Model fine-tuning
💡Web UI
Highlights
現在可以在MacBook上使用Apple硅GPU运行类似Chad GPT的应用程序,这使得一切运行得更快,且无需进行以往的配置和设置。
可以拥有自己定制的GPT,它在本地机器上运行,并且界面看起来比上周的混乱情况要好得多。
介绍了使用Open Web UI设置AMA(自动管理下载LLM模型的代理)的过程。
展示了如何克隆Open Web UI的仓库,并简要查看了前端和后端的代码结构。
介绍了使用Vit和Svelte Kit编写的前端,以及Python后端。
提供了Docker配置,展示了如何在不使用Docker的情况下常规运行程序。
详细说明了如何下载和运行AMA,以及如何通过命令行界面(CLI)获取模型。
讨论了不同模型的来源和它们各自的优势,如Llama来自Facebook Meta,53来自Microsoft等。
展示了如何在GPU上运行模型,并且无需额外配置,同时展示了GPU使用情况。
介绍了如何使用Open Web UI项目创建一个漂亮的用户界面,并激活Python环境。
展示了如何安装Python依赖项,并运行后端服务器。
说明了如何使用Node和npm安装前端依赖,并构建前端应用程序。
展示了如何启动后端服务,并通过Web界面与模型进行交互。
讨论了如何在UI中选择和组合不同的模型,以及如何通过用户界面直接拉取和删除模型。
介绍了如何使用社区共享的提示,并展示了如何存储和使用自定义提示。
展示了如何通过Open Web UI社区添加和使用自定义模型,如Code Companion。
提供了一个单一的Docker命令,以便快速运行Web UI,而无需手动执行所有步骤。
提到了另一个运行本地LLM的方法,如果不希望受限于olama提供的模型。
Transcripts
We're now at a point where we can run a ChatGPT-like thing on our MacBook using the Apple silicon GPU, which makes everything run faster, and we don't have to configure it or go through all the setup we used to. It's much, much easier now, and you can even have your own GPTs, kind of like the GPTs that ChatGPT has, on your own machine, and it looks way better than, at least, the mess I made last week. Hey, that was a beautiful prompt. If you missed that video, I'll link to it down below; it didn't look very good. Today it's going to look good, we're going to get a ton of functionality, and it's going to be fast. Let's do it.

Now, there are a couple of ways of going about this. I want to show you the details that go into it, because we're software developers; we want to know what's going on behind the scenes. Plus, this has the added advantage of showing you the code, so you can actually take a look under the hood, because I think it's really cool. This is what we're using right here: it's called Open WebUI. We're going to set up Ollama, and then we're going to set this up on top of it. The first thing you're going to do is clone this repository. I'm going to show you the steps one by one; I'm assuming you already know how to clone a git repository, but I won't assume you have a local Node environment or a Python environment set up. For those, I'll link videos down below in case you need a refresher. I'm going to clone this repository: git clone, boom, and here it is. We'll go into that directory and have a peek at the code. This has everything: the front end is written in JavaScript, actually in Svelte. Go to package.json: this runs on Node, and the front end uses Vite and SvelteKit, all the modern stuff, the modern tech stack. The back end is a Python back end, so that's in the backend folder; you can take a look at the Python side of things. This also has Docker configurations, so yes, you can run it through Docker, but first I'm going to show you how to do it the regular way, without Docker.

We're going to kick things off with Ollama: go to ollama.com and download it. But you're going to be like, what the heck is Ollama, and why should I download it and run things you tell me to? This is the internet; you might be doing bad things to me. So: Ollama is like an agent that runs on your machine and automatically manages downloading LLM models. You can go to Models here, and these are the models that are available, open-source models: you've got Llama 3, you've got Gemma, you've got Mixtral, CodeGemma, all these models. It's an open-source project too, so you can go to the GitHub page and check out that code, but we don't need to do that; all we need to do is download the tool. Go to the homepage, click on Download, macOS. It's also available for Linux and Windows, but we're getting the macOS version because this is a macOS tutorial; I'm doing this on macOS on Apple silicon. Let's go, boom. In your Downloads folder you'll get a file; double-click it and it extracts Ollama, which you can then drag to your Applications folder. Boom. In Applications, find Ollama and run it. Are you sure? Yes, we're sure. This puts a cute little llama in your menu bar, and it's running. You can tell it's running by going to localhost:11434 (that's the port), boom, and it says "Ollama is running." So how do you use this thing?
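That browser check can also be done from the terminal; a quick sketch (curl ships with macOS, and 11434 is Ollama's default port):

```shell
# Confirm the Ollama server is up; it answers on port 11434 by default.
curl http://localhost:11434
# The response body should read: Ollama is running
```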
Well, if you go back to the command line, you can now say ollama --version, and it prints the version, whatever version you have when you're watching this. Now you can use the Ollama CLI to fetch models, and you can get the model names on the website under Models: llama3, for example. Let's do that. I'm going to say ollama pull llama3, boom, it goes out and gets the file; it's 4.7 GB. I'd already downloaded it before, so it took me absolutely no time at all, but it might take you a few minutes. If I want another one, like llama2, boom, I also had that one. Let's get one I didn't have: phi3. I'm going to speed up the video, I'm not going to make you watch it, but I want to see how long it takes. This one is small, 2.3 GB, and it says it's going to take about 25 seconds to download. What's up with all these different models? Well, they have different capabilities; they're trained differently. Llama is from Meta (Facebook), Phi-3 is from Microsoft, Gemma is from Google, Mixtral is from Mistral. All these companies spend millions of dollars training these models so that you can get them for free and run them locally on your own machine, and you'll have to play around to see what gives you the best results for your use cases.
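The CLI steps so far, in short (a sketch; the model names and sizes are as listed on ollama.com at the time of writing):

```shell
ollama --version        # confirm the CLI is installed
ollama pull llama3      # ~4.7 GB, from Meta
ollama pull llama2
ollama pull phi3        # ~2.3 GB, the small one from Microsoft
ollama list             # show everything downloaded so far
```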
All right, I've got three of these models; what do we do next? Well, we can run one: ollama run phi3. What this does is open a prompt so you can interact with the model right there. "Hi", boom, look how fast that is, that's insane: "How can I assist you today?" But I want you to see something here. I'm going to open up my Activity Monitor. I have 64 GB of RAM on this thing, so the models I run should fit within that 64 GB; actually, it's a little less than that, about 75% of it. I made a video earlier about why it's less, but we won't get into that here. I want to open the GPU History window and keep it open so we can watch what's happening, because this is not running on the CPU; it's running on the GPU, completely transparently to us. "Write me a 1,000-word essay on JavaScript", boom, and there it goes; it starts writing. It's really fast, crazy fast; it's probably going to be done before we see anything here. Oh, we do see stuff. It's done with the writing, but at least it gave us a little blue mark right there: you can see the GPU usage (this is the Apple silicon GPU, by the way) was almost fully utilized for the moment it was generating that text, and we didn't have to configure anything like we did previously. To exit this, I type /bye, because now we need a front end, a pretty UI that looks like ChatGPT.

Let's go back to our Open WebUI project, and I'll pop open the terminal in Visual Studio Code; Ctrl+backtick does that for me. I have conda installed; if you don't know what I'm talking about, I'll link a video. Conda lets you set up Python environments so you can run Python code per project. That's what I have here; it says "base" because I'm not in any active conda environment. I'll create a new one: conda create --name, and I'm going to call this one open-webui, and I want python=3.11. If you follow my steps right there, you'll be fine, but if you want more details, watch that other video. Now we use this command to activate the environment: conda activate open-webui. Copy that line, paste it, and now instead of "base" we have "open-webui" here, so we're inside a Python environment with Python 3.11; if we say python --version, there we go.

Now we can go to the backend folder, and mistype all kinds of stuff before getting to the point: pip install -r requirements.txt. This installs a bunch of requirements we need; I'm giving it the -U flag for upgrade. If we look at what that is, the backend folder has the requirements.txt file, with all the requirements to run the back end. Wow, youtube-transcript-api; I wonder if it can do that. It probably can, but that's for another time, I'm getting distracted. pip install -r requirements.txt actually takes a couple of minutes, because there are so many requirements and they all have to be installed within this environment. What's nice about this environment setup is that I don't have to mess up my main Python installation with requirements I might need only for this one thing. That's why I like to use conda.
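The back-end environment steps above, as commands (a sketch; the environment name is just what I chose, and Python 3.11 is what the back end expects):

```shell
conda create --name open-webui python=3.11
conda activate open-webui
cd backend
pip install -r requirements.txt -U
```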
We also have to set up the front-end environment, which is a Node application, so while that's happening I'll leave it alone. Oh, it's done, okay, great; but if it were still working, I could go up here and open another terminal. I'm going to use Node and npm, and for Node I also have an environment manager: I use nvm. Let's look at what version of Node I'm working with here: 18. I'm fine with that; you can use a more modern version of Node, but this is modern enough for this purpose. For Node version management I linked another video you can check out later, or you can just follow my steps here; just make sure you have Node installed. And please don't install it wrong; don't install Node globally. You can if you want, but just don't. Let's move on: npm i. What does npm i do, for those of you who don't know? It looks at package.json and installs all the dependencies listed there; npm i is short for npm install. Once that's done, we're ready to build: npm run build, which runs the build script defined right there. Cool, we've got our front end built
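The front-end steps, in short (a sketch; Node 18 is just the version I happen to have selected via nvm):

```shell
nvm use 18        # any reasonably recent Node version works
npm i             # install dependencies from package.json
npm run build     # build the front end
```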
and our back end. Let's go back to the back-end terminal where we just installed all the Python requirements, and run bash start.sh from the backend folder. Boom. When you start it up and go to localhost port 8080, it takes you to auth automatically, because this has authentication built in, with a database; really cool code for you to look at, by the way, check it out. You do need to sign up; it's not going to send your credentials anywhere, it's all stored on your machine. This is just for fun: Alex, let's go with my email (this can be a fake email, by the way, which is what I'm doing right now), Create Account, boom, and now I'm signed in. Look at this beautiful interface. I can start chatting, but not really until I select a model. If we go up here to select the model, you'll see all the models we have installed; you can switch between them, and you can even combine models; I'll show you that in a moment. We've downloaded llama2 (you saw me do that), llama3, and phi3. I'll go with phi3, and maybe I want to mix in llama3; let's do those two, and I'll say hi to both of them. It chose llama3 for some reason: "Hello, it's nice to meet you, is there something I can help you with?" This is basically going out to Ollama and then returning the result to the back end and then to the front end, and that's how you interact with it.

In the settings, you can select a theme, a system prompt, and advanced parameters; you can play around with those. Here's where you manage models: you can pull models directly from this user interface, so if you see another model you like, Mistral for example, you can type it in here and pull it down this way, or delete models that exist. There are all sorts of add-ons for the UI, image generation among them. By the way, I don't have a video for that yet, but if you want to see an image-generation video with AUTOMATIC1111 and how that integrates, let me know in the comments down below; I may do that here, or maybe as a members-only video. If you're a member, thank you so much for being a member; if you want to join the channel and support it, I make special videos just for members as well as these videos. But don't worry, I'll also make more tutorials and things like that for the main channel. Subscribing is completely free; that's your cue to subscribe, and if you like this video, give me a thumbs up so I know.

All right, what have we got here? Chats, imports, exports, account; this is your user management: if you click on the user, you can sign in, sign out, and archive chats. There's also a playground, which is a little different from chat: you can actually add your own system prompt. You might not know this, but when you're using ChatGPT, there's a system prompt that gets sent along with your prompt, in addition to the context. Now here's a really cool thing: Prompts. This lets you store your own prompts: if you've created prompts that work well for you, you can store them here, import them, export them, and you can use community prompts. There's a whole community that shares prompts you can access through this UI. It's not working right now; maybe it'll work when you try it. But what I did try was this right here: if you go to Modelfiles, there are ones made by the Open WebUI community. This is basically like GPTs in ChatGPT, where people put together their own little models, but it's more than GPTs: with GPTs you basically provide a context and some sample prompting, but here people can actually attach their own models. For example, Code Companion: this gives you the model file content, a system prompt, and some parameters to start with. When you install it, it loads it up, icon and everything. See this model tag name? It will actually grab the model that's associated, which has been fine-tuned (I'm guessing it's been fine-tuned; if you know better, let me know in the comments) by the person who created this thing. If you hit Save and Create, it pulls the manifest and downloads any associated models, which may be large. See that pull progress right there? It's going to take a little time, because it's pulling down two pretty large files; I think these are 26-billion-parameter models related to code generation specifically. So I'm going to have some coffee. I don't know about you, but that's what I'm going to do. Actually, I'm out of coffee; I'm going to have to sit here and wait.

We're almost there, folks, we're almost there. We've got Code Companion, so now when I go to New Chat I can select that Code Companion model: "write me some code". I know, super descriptive, but you're probably sick of seeing all the examples; let's see what it comes up with when prompted just like that. Oh, it chose Go, and there we go, get it! I'm sure it can write some beautiful code, but this is not a video about that. We're all set up.
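Those community model files are essentially Ollama "Modelfiles": a base model plus a system prompt and parameters. A minimal hand-rolled sketch (the model name, system prompt, and temperature here are purely illustrative):

```shell
# Define a custom model on top of llama3 with its own system prompt.
cat > Modelfile <<'EOF'
FROM llama3
SYSTEM "You are Code Companion, a concise coding assistant."
PARAMETER temperature 0.2
EOF

ollama create my-code-companion -f Modelfile   # register the custom model
ollama run my-code-companion                   # chat with it
```

Once created this way, the model also shows up in Open WebUI's model selector like any other.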
Folks, there's one more thing: I told you about Docker. I do have Docker installed; you can go to Docker Desktop (Products, Docker Desktop, Download for Mac; make sure it's the Mac Apple chip version, if that's in fact what you have) and run Docker Desktop. Again, I have entire development-setup videos; I'll link one down below on how I set up my development environment on a Mac, and that includes Docker. Once Docker is up and running, it's really easy peasy. All you've got to do is go right here, openwebui.com, and go to the docs and Getting Started. In fact, this is way easier than the steps I showed you, but I wanted to show you the details so you know what goes on behind the scenes, because for developers that's a little more important, I think, than just having the thing run; it gives you extra knowledge and extra power when you're looking at this code. And since you have the project, everything is here; it's a beautiful thing, including the Docker configuration: there's Docker Compose, and a Dockerfile right there, which tells you everything that's being pulled, what ports are being mapped, what directories are being mapped, and so on. But I'll give you one single command to run: you can skip all this stuff and run this command right here, right under this headline in the documentation, and I'll copy and paste it and put it in the description.
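For reference, the one-liner in the Open WebUI getting-started docs looked roughly like this at the time of writing (check the docs for the current version; the image tag and flags may have changed):

```shell
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui --restart always \
  ghcr.io/open-webui/open-webui:main
```

After that, the UI is served on http://localhost:3000; the --add-host flag lets the container reach the Ollama server running on the host, and the volume keeps your data across restarts.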
Now you know how to run this Web UI, which is very good-looking, and it's so easy. There is another way to run local LLMs if you don't want to be limited to the models Ollama provides, and I made a video for that too, right over here. Check that out, and I'll see you in the next one.
[Music]