Set up a Local AI like ChatGPT on your own machine!
Summary
TL;DR: In this video, Dave, a retired software engineer from Microsoft, walks viewers through setting up a ChatGPT-style AI on their own machine. He demonstrates the process on a high-performance Dell Threadripper workstation, but explains that the setup works on more modest hardware as well. Key benefits include full privacy, cost savings, customization, and offline functionality. Dave covers everything from installing WSL2 and Ubuntu on Windows to deploying LLaMA models with Ollama and using a web-based interface for seamless AI interaction. This video is perfect for anyone curious about self-hosting AI or enhancing tech skills.
Takeaways
- 💻 Dave introduces his workshop on how to set up and run a ChatGPT-style AI on a personal machine, no cloud services or fees required.
- ⚙️ The demo machine is a high-end Dell Threadripper workstation with 96 cores, 512GB RAM, and dual Nvidia A6000 GPUs, valued at around $50,000.
- 🔒 Hosting your AI locally offers complete data privacy, ensuring no sensitive information is sent to third-party servers.
- 💡 Running AI models locally provides cost savings, especially for high-volume use, and is a free alternative to paid services like ChatGPT Plus.
- 🚀 Customization is a major advantage, allowing users to fine-tune models, integrate them into workflows, and even train the AI on proprietary data.
- 🌍 Self-hosted AI can run offline, making it useful for environments with unreliable internet, like airplanes or remote locations.
- ⚡ Running the AI locally reduces latency, speeding up responses and improving performance for real-time applications.
- 📚 Setting up a self-hosted AI offers a great learning experience with machine learning, model fine-tuning, and using GPUs, providing valuable tech skills.
- 🖥️ The setup requires WSL2, Linux, and Docker, and Dave walks through how to install these on both Windows and Linux environments.
- 🌐 Open Web UI provides a user-friendly interface similar to ChatGPT, allowing users to interact with AI models, customize settings, and add new models easily.
Q & A
What is the main purpose of this video?
-The main purpose of the video is to show how to set up and run a ChatGPT-style AI on your own machine without relying on cloud services, ensuring full privacy and control.
What is one key advantage of running AI locally on your own machine?
-One key advantage of running AI locally is data privacy. With a self-hosted AI, no data is sent to third-party servers, ensuring that sensitive conversations and private data remain fully secure.
Why does the presenter recommend running the AI on powerful hardware like the Dell Threadripper workstation?
-The presenter recommends powerful hardware, such as the Dell Threadripper workstation, because it can significantly accelerate the performance of the AI model. While modest hardware can run the AI, better hardware will result in faster execution and response times.
What are the two main technologies utilized to set up the AI in this tutorial?
-The two main technologies used to set up the AI are Linux (specifically WSL 2 for running Linux on Windows) and Docker (for running pre-built containers of AI models).
How does running a local AI model save on costs?
-Running a local AI model saves costs by eliminating the need to pay for cloud-based AI services, such as ChatGPT's API or premium subscriptions. This can be especially beneficial for those running a high volume of queries.
Why might developers or businesses prefer running their own AI models locally?
-Developers or businesses might prefer running their own AI models locally because it allows them to customize and fine-tune models to cater to specific needs, integrate them into workflows, and use proprietary data securely.
What is the advantage of running AI locally in terms of response time?
-Running AI locally can significantly reduce latency, as the model can respond immediately without waiting for a round trip to the cloud, which is especially useful for real-time applications like gaming or customer support.
What is LLaMA, and why is it mentioned in this video?
-LLaMA (Large Language Model Meta AI) is the family of open large language models used in this tutorial. It is comparable in capability to ChatGPT, and the presenter demonstrates how to download and run it on a local machine using the Ollama runtime.
What role does Docker play in setting up the AI system?
-Docker is used to run a pre-built container for the Open Web UI, providing a user interface for interacting with the AI model in a manner similar to ChatGPT's interface. Docker makes it easy to set up and manage the AI environment.
What are the steps to set up WSL 2 on a Windows machine, as outlined in the video?
-To set up WSL 2 on a Windows machine, the steps are: 1) Run 'wsl --install' in PowerShell as an administrator, 2) Download and install the Linux kernel update package from Microsoft, 3) Set WSL 2 as the default version, and 4) Install a Linux distribution such as Ubuntu from the Microsoft Store.
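The steps above can be sketched as a short PowerShell session (run as administrator; on recent Windows builds, `wsl --install` alone performs most of these steps, including installing Ubuntu as the default distribution):

```shell
# 1) Enable WSL and the Virtual Machine Platform, and install a default distro
wsl --install

# 2) On older Windows 10 builds, download and install the
#    "WSL2 Linux kernel update package for x64 machines" from Microsoft's site.

# 3) Make WSL 2 the default version for new distributions
wsl --set-default-version 2

# 4) Install Ubuntu explicitly (also available from the Microsoft Store)
wsl --install -d Ubuntu
```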
Outlines
🔧 Setting Up ChatGPT-Style AI on Your Machine
In this introduction, Dave, a former Microsoft software engineer, explains how to set up and run a ChatGPT-style AI on your own machine. He highlights the benefits of hosting a large language model locally, such as privacy and cost savings, and demonstrates its performance on high-end hardware. While it can run on modest systems, Dave focuses on how to significantly accelerate the setup using a powerful Dell Threadripper workstation with dual Nvidia A6000 GPUs. This setup is presented as an alternative to relying on cloud-based AI services, offering full control over data and performance.
🖥️ Hardware Setup and the Role of WSL 2
Dave explains the joy of watching powerful hardware at work and introduces two key technologies, Linux and Docker, necessary for setting up AI. He walks through how to install and configure Windows Subsystem for Linux (WSL 2) on Windows 10 or 11 to enable Linux compatibility. The steps include enabling features in PowerShell and downloading the necessary Linux kernel update. After ensuring WSL 2 is running, Dave recommends installing Ubuntu as a Linux distribution. The setup aims to balance power and simplicity, preparing the system for AI model deployment.
🤖 Installing and Running LLaMA AI Model
Dave covers the installation of Ollama, the runtime used to serve models like LLaMA 3.1. He explains how to install and start the server using a simple curl command, followed by downloading and running a large language model (LLaMA 3.1). The instructions also include listing installed models and launching the AI system for local use. Using a smaller model as an example, Dave showcases the speed and responsiveness of running AI locally. To enhance the experience, he introduces the option to integrate Open Web UI for a more user-friendly interface, mimicking ChatGPT's layout.
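The command sequence described above can be sketched as follows (the model tag matches the one used in the video; any model from the Ollama library would work the same way):

```shell
# Install Ollama via its official install script
curl -fsSL https://ollama.com/install.sh | sh

# Start the Ollama server (leave this running in one shell)
ollama serve

# In a second shell: download the Llama 3.1 model (~5 GB)
ollama pull llama3.1:latest

# Confirm it was installed
ollama list

# Chat with it from the console
ollama run llama3.1:latest
```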
🌐 Open Web UI: A Powerful, Customizable AI Interface
Dave introduces Open Web UI, a web-based interface for interacting with AI models locally. He explains how to install Docker and run a pre-built container that sets up the web interface. Users can access it through their browser and manage the system via an admin control panel. The UI allows for multiple models, file uploads for context, and other advanced features. Dave stresses how easy it is to select models tailored to different tasks, install new ones, and customize parameters, giving users complete control over their AI experience.
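Dave defers the exact Docker command to his video description; a commonly used invocation from the Open WebUI project looks roughly like the sketch below. The port mapping and volume name are typical defaults and may differ from the command in the video, which serves the UI on port 3000:

```shell
# Install Docker on Ubuntu via snap
sudo snap install docker

# Run the pre-built Open WebUI container, connecting to the local Ollama server
sudo docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

# Then browse to http://localhost:3000 and create the first (admin) account
```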
💻 Final Thoughts and Channel Engagement
Dave wraps up the video by discussing the versatility and customization Open Web UI provides when running AI models locally. He encourages viewers to explore the AI space through this hands-on approach. In a lighter tone, Dave reminds his audience to like and subscribe to his channel and check out his secondary content on autism, including his book on living a fulfilling life on the autism spectrum. He concludes by expressing appreciation for his community and invites viewers to return for more content.
Keywords
💡Self-hosted AI
💡Dell Threadripper Workstation
💡WSL2
💡Docker
💡LLaMA 3.1
💡GPU Acceleration
💡Data Privacy
💡Open Web UI
💡Model Fine-tuning
💡Latency Reduction
Highlights
Introduction to the host Dave, a retired software engineer from Microsoft, discussing setting up a ChatGPT-style AI on a home machine without the need for cloud services.
The AI setup can be run on modest hardware but runs much faster with advanced hardware like the Dell Threadripper workstation with 96 cores and 512 GB of RAM.
Privacy and security are key benefits of running AI locally. All data stays on the machine, avoiding third-party servers and ensuring no data breaches.
Significant cost savings when running AI locally, especially for high-volume usage, as opposed to paying for cloud-based services like ChatGPT Plus.
Customization opportunities: Locally running AI allows users to fine-tune models for specific needs, integrate them into workflows, and train the AI on proprietary data for more relevant responses.
Local AI setup runs offline, making it ideal for situations where web access is unavailable or unreliable, such as on airplanes or in remote areas.
Improved latency with locally running AI, providing faster query responses without the need for round trips to cloud servers.
Running AI locally offers a hands-on learning experience with machine learning frameworks, GPUs, and complex systems, valuable for developers.
The first step in setting up the local AI environment is installing WSL2 (Windows Subsystem for Linux) to enable Linux on Windows machines.
Installing Ubuntu as the Linux distribution of choice, which is one of the most popular and well-supported for WSL2.
The LLaMA AI model is used in this setup. After installing WSL2 and Ubuntu, the model is pulled with the 'ollama' command, allowing users to start running the AI locally.
A demonstration of running the LLaMA 3.1 model on a 96-core machine, showcasing its impressive performance even with smaller models.
Introduction to Open Web UI, a web-based user interface that mimics ChatGPT's layout and allows interaction with the AI in a more intuitive way.
Open Web UI supports multiple models and can be easily customized for different applications, offering flexibility in AI usage.
Final thoughts on the potential for AI customization and power when running models locally, highlighting the control and flexibility it gives to users without the need for external services.
Transcripts
Hey, I'm Dave. Welcome to my shop. I'm Dave Plummer, a retired software engineer from Microsoft going back to the MS-DOS and Windows 95 days, and today we're diving into something I think you're going to find incredibly cool: how to set up and run your very own ChatGPT-style AI right at home on your own machine. That's right, we're talking about hosting a powerful large language model on your own machine at home, completely under your private control. No cloud services needed, no monthly fees or guardrails. And while you can run it even on a modest laptop while sitting on an airplane, we're going to significantly accelerate things by showing you how it performs on a top-tier Dell Threadripper workstation: a 96-core beast featuring 512 GB of RAM and dual Nvidia RTX A6000 GPUs, one step above the 4090, packing a whopping 96 GB of video memory.

Now, if you've ever played around with ChatGPT or something similar online, you've probably already realized how incredible these systems are at answering questions, writing and debugging code, generating content, and even holding full conversations. But what if you didn't have to rely on somebody else's server to do all that? What if you could have all that power right there on your own computer, running locally and fully private? We're going to go through why this matters, what kind of hardware you'll need, and of course how to get it all set up. Whether you're just curious about the tech behind it, concerned about privacy, or looking to save on API costs, you're going to want to stick around, because by the end of the episode you'll know exactly how to set up your own AI at home, complete with a ChatGPT-style UI, multiple models, context files, and much more.

Now, the first thing you're going to need is a computer. There's an old saying that applies to hot-rodding cars as equally as it does to setting up a modern AI server: speed costs money, kid; how fast do you want to go? The reality is that you do not need fancy hardware to run it, but it will be a great deal faster with it. Where you put the budget-versus-speed slider is a personal decision, but the TL;DR is that you can run it on modest hardware; the better the hardware you have, the faster it will run. You don't even need a GPU, though an Nvidia card will significantly speed things up. And to prove my point, we're going to run it on a serious workstation: the Dell Threadripper workstation. This machine is on loan from Dell, and it features the Threadripper Pro 7995WX CPU with 96 cores and 192 threads of processing power. Better yet, as I mentioned before, it features dual Nvidia RTX A6000 cards, which retail for about $30,000 a pair. The CPU is also worth $10K on its own, so by the time you add RAM and storage, this machine is pushing some $50,000. Before we set it up, let's take
look at a few of the reasons that you
might wish to do so the first is the
data privacy and security aspect with a
self-hosted AI your data stays yours no
information ever gets sent to the
third-party servers so sensitive
conversations or private data remain
fully secure this is a significant
selling point especially with increasing
concerns over data privacy and potential
data breaches at this point chat GPT
knows more about me than I'm really
comfortable with but that's kind of the
price of entry for using it there are
still certain questions and topics that
I'm not comfortable having part of my
public profile and so in those cases a
private AI is a big win there may also
be cases where you're not comfortable
uploading your context documents like
perhaps your proprietary source code to
the public AI with a private AI you can
give it access to all of your documents
and that information stays private on
your local machine there are also
significant cost savings particularly at
higher volumes chat GPT plus is only
something like 20 bucks a month but if
you're doing a lot of queries above the
basic limits or using their API the
costs can add up quickly since the AI
will be demoing is roughly equivalent to
the power of chat GPT 4.0 it's a
perfectly acceptable free substitute
running your own AI allows for a level
of customization not possible with the
external Services you can fine-tune the
models to cater to your specific needs
integrate them into your workflows and
even train the AI on your proprietary
data setor documents for hyper relevant
responses for developers or business
this level of control can be a
GameChanger the fact that it runs local
also means that it can run offline a
self-hosted AI can function without an
internet connection making it useful in
scenarios where web access is unreliable
or unavailable such as airplanes remote
locations research facilities or
situations requiring data autonomy like
defense and Healthcare depending on your
Hardware running your AI locally can
significantly reduce the latency in
responding to queries rather than
waiting for a round trip to the cloud
servers in back the the AI can respond
immediately making interactions faster
which is especially useful for high
performance applications like gaming
real-time customer support or
interactive
conversations and for the folks
interested in the AI space setting up
your own AI is a powerful learning
opportunity it provides hands-on
experience with machine learning
Frameworks fine-tuning models working
with gpus and handling complex systems
valuable skills in today's Tech
landscape plus it makes it trivial to
test drive dozens of different models
and select the one that's best fre your
situation and finally maybe it's an ASD
thing like watching the washing machine
but I just love to see fast Hardware
working hard the Dell workstation has
been doing a ton of headless work like
compiling a Linux curdle which it can do
in 19 seconds but I'm not really a gamer
so the poor gpus were sitting idle most
of the time and that's when it came to
me I could run AI on them for some
reason it tickles me in a special way
when you see the GPU meter Spike to 100%
on both
A6000s.

There are two technologies we're going to take advantage of today, which are Linux and Docker. But never fear, we're going to do it all on top of Windows, so whether you're running Windows or Linux on your machine, I've got you covered. The first thing we need to do is make sure your system is set up to support WSL2. You'll need Windows 10 version 1903 or later, or Windows 11, which comes with WSL2 out of the box. If you're running Windows 10 and not sure what version you have, hit the Windows key plus R, type winver, and hit Enter; if you see a version number of 1903 or later, you're good to go.

Now, to install WSL 2, you need to enable a couple of features in Windows. First we're going to turn on WSL itself, and then enable the Virtual Machine Platform, which is required for WSL 2 to run. Here's the command you need to run in PowerShell: make sure you run PowerShell as an administrator, and then run 'wsl --install'. That'll take a few minutes, and then, with those features enabled, the next step is to install the actual Linux kernel update package that's needed for WSL2. Microsoft provides this as a download, and I'll put a link to it in the video description. Thankfully, it's easy to get: open your browser, head to Microsoft's website, search for the WSL2 Linux kernel update package for x64 machines, and download and install that file.

Once the kernel is installed, it's time to set WSL 2 as your default version; that way, whenever you install a new Linux distribution, it will default to running under WSL 2 instead of WSL 1. You can set that with the following command: 'wsl --set-default-version 2'. Now we're cooking with gas. At this point you've got WSL 2 set up and ready to go, but you'll still need to install a Linux distribution. My recommendation is to start with Ubuntu, since it's one of the most popular and well-supported Linux distributions for WSL. You can grab it directly from the Microsoft Store, or you can install it from PowerShell with 'wsl --install -d Ubuntu'. Once it's done downloading, launch Ubuntu from the Start menu. It'll go through a brief install process where it sets up your new user account and password for the Linux environment, and that's it: you've now got a full Linux environment running alongside Windows. You can open up your Linux terminal anytime from the Start menu, or from any folder by typing 'wsl' in the address bar of File Explorer. From there you can install software, run Linux commands, or even develop full-scale applications
right within your Windows system.

Now that our system is ready, we can install Ollama, which is the AI system that we'll be using to run the models. To do so, we launch the install script directly from the Ollama website using the curl command, piping it into a command shell; the command looks something like 'curl -fsSL' and then the URL, piped into sh. Once it's installed, all we need to do to start things is run the 'ollama serve' command. Now, with the server running, we'll open another command shell and install our first model, which will be Llama 3.1. To install a model, you pull it using Ollama, like so: 'ollama pull llama3.1:latest'. This is a 5-gigabyte download, so depending on your internet speed it can take some time to complete, but it will display status for you as it goes.

Once you've pulled the model, you're ready to run it. To see the models that have been installed, if any, you can run 'ollama list'. This will produce a list of installed models, and you should see the Llama 3.1 model that we just pulled. To run the model, we use the run command with the model name: 'ollama run llama3.1:latest'. That will give us a console interface to the large language model, where we can use it much as you would ChatGPT on its homepage. You can see how quickly the model responds; you'd be hard-pressed to find an online model that works this quickly. Now, granted, this is a smaller model with 8 billion parameters, but even so, it produces useful answers at an amazing clip, at least on this machine. Let's upgrade our
experience significantly by taking advantage of Open WebUI, which will give us a web-based user interface that looks a surprising amount like ChatGPT. Now, we could enlist in the whole project right from GitHub, but we don't need to do that; in fact, we don't need to install it at all. All we need is Docker on our system, and we can simply run a container that's pre-built for us. The easiest way to install Docker on Linux is with the snap command: 'sudo snap install docker'. With Docker installed, we can then run the container we need to launch the web UI. This will pull the Open WebUI container, which is a couple of gigabytes on its own, so it can also take a few minutes to complete, depending on your internet speed. Now, the Docker command line is fairly long and complicated, so check the video description so you can copy and paste it, and maybe stick it in a batch file for future use. Either way, once you launch it, it will set up the web UI on port 3000 of your machine, so to access it, browse to the machine name followed by a colon and the number 3000,
such as localhost:3000. Now, when you first launch it, you will need to create an account, and as the first account, you will by default be the administrator of the system. You can share your URL with others, but they will also have to create an account, and you will then have to approve them in the admin settings control panel. Interacting with the UI is very much like using ChatGPT: the familiar list of previous chats can be found in the left-hand sidebar, and you can even upload files to it as context for your discussions with the AI.

Open WebUI is designed to offer flexibility and power when working with various AI models, making it a versatile tool for users. One of the most important things to understand is how to select the right model for your task. When you first open the interface, you'll see a list of models, each tailored for different applications, from natural language processing to image generation. The interface makes this process intuitive, allowing you to switch between models depending on what you're aiming to achieve. You don't need to be an expert to know which model to pick, because each one typically includes a description giving you an idea of its strengths. Once you've chosen a model, the system seamlessly loads it, ready for you to interact with right away.

Another key feature is the ability to install new models. Open WebUI doesn't restrict you to the pre-installed models, which is great if you want to experiment with new or more specialized ones. Installing a new model is straightforward: you simply input the repository information or the model name, and the system then integrates it into your workspace, making it available in the same way as the default options. This capability opens up a lot of room for customization and expansion, especially for users who want to try out cutting-edge AI models or ones specifically tuned for niche applications.

As you begin to dive deeper, you'll find a wealth of customization options built into the UI. These are important for tailoring the behavior of the models, or even how the interface itself responds. You can adjust parameters that control the length and creativity of text generations, change how fast or slow responses come, and configure resource usage if you're working on hardware with specific constraints. Each of these features, while advanced, is accessible through the clear graphical interface, which encourages you to explore without fear of breaking something. The combination of user-friendly design with technical depth gives you the freedom to get the results you want without needing to write any code. Together, Ollama and Open WebUI give you the complete AI experience directly on your own machine.

If you found today's episode to be any combination of informative or entertaining, remember, I'm mostly in this for the subs and likes, so I'd be honored if you'd consider subscribing to the channel and leaving a like on the video. And if you're already a subscriber, thanks, and be sure to check out my second channel, Dave's Attic, where you can find our weekly podcast that goes live every Friday at 4 PM. If you have any interest in matters related to the autism spectrum, please check out the free sample of my book on Amazon about the non-visible part of the autism spectrum. It's intended for folks who don't have a diagnosis but who suspect they might have a few traits in common with the spectrum; it's everything I know now about living a great life on the spectrum that I wish I'd known long ago. In the meantime, and in between time, I hope to see you next time right here in Dave's Garage.