Run ALL Your AI Locally in Minutes (LLMs, RAG, and more)
Summary
TL;DR: The video introduces a comprehensive package for setting up local AI infrastructure, developed by n8n. It includes Ollama for LLMs, Qdrant for the vector database, and PostgreSQL for the SQL database. The presenter guides viewers through installation using Docker Compose and customizing the setup for workflow automations in n8n. The script also covers extending the package into a full RAG AI agent, showcasing its capabilities and future expansion plans.
Takeaways
- 🎉 The video introduces an exciting package for local AI developed by the n8n team.
- 🛠️ The package includes components like Ollama for LLMs, Qdrant for the vector database, PostgreSQL for the SQL database, and n8n for workflow automation.
- 🌐 The presenter is enthusiastic about the potential of running your own AI infrastructure, especially with powerful open-source models like Llama.
- 📚 The GitHub repository for the self-hosted AI starter kit is basic, with key files being the environment variable file and the Docker Compose file.
- 🔑 Before starting, ensure you have dependencies like git and Docker installed.
- 💻 Clone the repository and set up your environment variables for services like PostgreSQL.
- 📝 The Docker Compose file needs modifications to expose necessary ports and pull specific models for Ollama.
- 🖥️ Detailed instructions are provided for different system architectures, including those with Nvidia GPUs and Mac users.
- 🔍 The video demonstrates how to check the running containers in Docker and interact with them.
- 🔗 The presenter shares a workflow in n8n that uses PostgreSQL for chat memory, Qdrant for the vector database, and Ollama for the LLM and embedding model.
- 📈 The video also covers setting up a pipeline to ingest files from Google Drive into the local vector database.
- 📝 Custom code is provided to avoid duplicate vectors in the vector database when updating documents.
- 📱 The presenter plans to expand the setup in the future with additional features like caching and possibly a self-hosted frontend.
Q & A
What is the main topic of the video?
-The main topic of the video is setting up a local AI infrastructure using a package developed by the n8n team, which includes Ollama for the LLMs, Qdrant for the vector database, Postgres for the SQL database, and n8n for workflow automations.
Why is the presenter excited about this package?
-The presenter is excited because the package is a comprehensive solution for local AI needs, making it easy to set up and extend, and it includes powerful open-source models that can compete with closed-source models.
What are the key components included in the package?
-The package includes Ollama for the LLMs, Qdrant for the vector database, Postgres for the SQL database, and n8n for workflow automations.
What is the purpose of using Postgres in this setup?
-Postgres is used in the setup for the SQL database needs and to serve as the chat memory for AI agents.
How does the presenter plan to extend the package?
-The presenter plans to extend the package by adding components like Redis for caching, a self-hosted Supabase for authentication, and possibly a frontend. They also consider incorporating best practices for LLMs and n8n workflows.
What are the prerequisites for setting up this local AI infrastructure?
-The prerequisites include having Git and Docker installed, with Docker Desktop also recommended for its Docker Compose feature.
What is the role of n8n in this local AI setup?
-n8n is used to tie together the different components of the local AI infrastructure with workflow automations.
How does the presenter suggest customizing the Docker Compose file?
-The presenter suggests adding a line to expose the Postgres port and pulling an additional Ollama embedding model to customize the Docker Compose file.
What is the significance of the 'env' file in this setup?
-The 'env' file is significant as it contains credentials for services like Postgres and n8n Secrets, which are crucial for customizing the setup to the user's needs.
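As a concrete illustration, a minimal `.env` for this setup might look like the following sketch. The variable names here follow the starter kit's `.env.example`; the values are placeholders you should replace with your own long, random strings:

```env
# Postgres credentials used by n8n and the chat-memory database
POSTGRES_USER=root
POSTGRES_PASSWORD=change-me
POSTGRES_DB=n8n

# n8n secrets - use long random alphanumeric strings
N8N_ENCRYPTION_KEY=replace-with-a-long-random-string
N8N_USER_MANAGEMENT_JWT_SECRET=replace-with-another-long-random-string
```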
How does the presenter demonstrate the functionality of the local AI setup?
-The presenter demonstrates the functionality by creating a fully local RAG AI agent within n8n, using Postgres for chat memory, Qdrant for the vector database, and Ollama for the LLM and embedding model.
What is the importance of the custom node in the n8n workflow?
-The custom node is important for managing the ingestion of documents into the vector database by ensuring there are no duplicate vectors, which could confuse the LLM.
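The idea behind that custom node can be sketched in plain Python with an in-memory stand-in for the vector store. The real node uses LangChain against Qdrant; `store`, `file_id`, and the ID scheme below are illustrative, not the node's actual code:

```python
def upsert_document(store, file_id, new_chunks):
    """Delete every vector tied to file_id, then insert fresh chunks.

    store maps vector_id -> {"file_id": ..., "text": ...}; a plain
    insert would leave the old vectors behind and create duplicates.
    """
    # 1. find vectors whose metadata file_id matches the incoming file
    stale = [vid for vid, rec in store.items() if rec["file_id"] == file_id]
    # 2. delete them so no outdated chunks survive
    for vid in stale:
        del store[vid]
    # 3. insert the newly extracted chunks
    for i, text in enumerate(new_chunks):
        store[f"{file_id}-{i}"] = {"file_id": file_id, "text": text}
    return store
```

Running it twice for the same file leaves only the latest version's chunks, which is exactly the duplicate-avoidance behavior the answer describes.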
Outlines
🚀 Introduction to Local AI Package
The speaker is excited to introduce a comprehensive package for local AI developed by the n8n team. This package includes Ollama for the LLMs, Qdrant for the vector database, PostgreSQL for SQL database needs, and n8n for workflow automations. The video aims to guide viewers on setting up this package quickly and extending its capabilities to create a fully functional local AI agent. The speaker emphasizes the growing accessibility and power of open-source AI models, suggesting that now is the time to invest in local AI infrastructure.
🛠️ Setting Up the Local AI Environment
The speaker provides a step-by-step guide to setting up the local AI environment. The process begins with downloading the code from the GitHub repository using a git clone command. Afterward, the speaker recommends using Visual Studio Code to edit the necessary files. The video covers the installation of dependencies like git and Docker, and the use of Docker Compose to bring together services like PostgreSQL, Qdrant, and Ollama. The speaker also points out the need to edit the environment variable file to set up credentials and secrets for services like PostgreSQL and n8n.
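The download steps described above amount to roughly the following commands. The repository URL is the n8n starter kit's; verify it against the README before running:

```shell
# clone the self-hosted AI starter kit and enter the folder
git clone https://github.com/n8n-io/self-hosted-ai-starter-kit.git
cd self-hosted-ai-starter-kit

# optional: open the folder in VS Code to edit .env and docker-compose.yml
code .
```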
🔧 Customizing Docker Compose for Local AI
The speaker discusses the need to customize the Docker Compose file to ensure proper functionality. This includes exposing the PostgreSQL port to enable its use within n8n workflows and pulling an additional Ollama embedding model so Ollama can also generate embeddings for the vector database. The speaker provides a detailed walkthrough of the necessary code changes and mentions providing a customized version of the Docker Compose file for viewers' convenience.
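Concretely, the two edits described above look roughly like this in `docker-compose.yml`. This is a sketch: the service names follow the starter kit at the time of the video, and the embedding model shown is one example choice (the video does not name the exact model pulled):

```yaml
services:
  postgres:
    # ...existing postgres config...
    ports:
      - 5432:5432   # expose Postgres so n8n workflows can reach it on localhost

  ollama-pull-llama:
    # ...existing init-container config...
    # the original command sleeps, then pulls the chat model; adding a second
    # pull fetches an embedding model so Ollama can serve embeddings for RAG
    command:
      - "-c"
      - "sleep 3; ollama pull llama3.1; ollama pull nomic-embed-text"
```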
🌐 Accessing and Managing Local AI Services
The speaker demonstrates how to access and manage the various local AI services using Docker. They show how to view running containers, check the output, and even execute Linux commands within each container. The video also covers setting up a fully local RAG AI agent within n8n, using PostgreSQL for chat memory, Qdrant for the vector database, and Ollama for the LLM and embedding model. The speaker provides a detailed explanation of the workflow and how to test the agent using a chat widget.
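For reference, the default local endpoints used throughout the walkthrough are listed below. From inside an n8n workflow, replace `localhost` with `host.docker.internal`:

```text
n8n       http://localhost:5678
Ollama    http://localhost:11434
Postgres  localhost:5432
Qdrant    http://localhost:6333   (dashboard at /dashboard)
```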
📈 Future Expansions and Conclusion
In the final part, the speaker talks about future plans to expand the local AI setup. They mention potential additions like Redis for caching, a self-hosted Supabase instead of vanilla PostgreSQL, and possibly incorporating frontend elements. The speaker also expresses excitement about the potential of the local AI stack and encourages viewers to like and subscribe for more content on the topic.
Keywords
💡Local AI
💡n8n
💡Docker
💡Llama
💡Vector Database
💡Postgres
💡RAG AI Agent
💡Workflow Automation
💡Environment Variable File
💡Embeddings
💡Google Drive
Highlights
Introduction of a comprehensive package for local AI developed by the n8n team.
Excitement about the ease of setup and the potential of the package for local AI.
The package includes components like Ollama, Qdrant for the vector database, Postgres for SQL, and n8n for workflow automation.
Emphasis on the accessibility and power of open-source AI models like Llama.
Instructions on setting up the package in minutes.
The GitHub repository for the self-hosted AI starter kit by n8n contains essential files for setup.
Clarification that the official instructions are lacking and will be improved upon in the video.
Prerequisites for setup include having git and Docker installed.
Downloading the repository is the first step in the setup process.
Editing the environment variable file is necessary to set up credentials for services like Postgres.
Customizations to the Docker compose file are required to expose the Postgres port and pull an Ollama embedding model.
Different Docker compose commands are provided based on the user's system architecture.
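The architecture-specific startup commands are roughly as follows, sketched from the video; double-check the profile names against the starter kit README, which also documents a slightly different variant for Mac users:

```shell
# if you have an Nvidia GPU
docker compose --profile gpu-nvidia up

# CPU-only (the simple default the presenter uses in the video)
docker compose --profile cpu up
```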
A live demonstration of the Docker containers running the local AI services.
Introduction to creating a fully local RAG AI agent within n8n.
Explanation of the workflow for using Postgres as chat memory and Qdrant as a vector database.
Details on how to use Ollama for embeddings and parsing responses from RAG.
Demonstration of the custom code used to prevent duplicate vectors in the vector database.
A test of the AI agent showing its ability to retrieve information from the knowledge base.
Plans for future enhancements to the local AI stack, including caching and authentication.
Transcripts
have you ever wished for a single
package that you could easily install
that has everything you need for local
AI well I have good news for you today
because I have exactly what you are
looking for I have actually never been
so excited to make a video on something
before today I'm going to show you an
incredible package for local AI
developed by the n8n team and this thing
has it all it's got Ollama for the
llms Qdrant for the vector database
Postgres for the SQL database and then
n8n to tie it all together with workflow
automations this thing is absolutely
incredible and I'm going to show you how
to set it up in just minutes then I'll
even show you how to extend it to make
it better and use it to create a full
RAG AI agent in n8n so stick around
because I have a lot of value for you
today running your own AI infrastructure
is the way of the future especially
because of how accessible it's becoming
and because open-source models like
Llama are getting to the point where
they're so powerful that they're
actually able to compete with closed
source models like GPT and Claude so now
is the time to jump on this and what I'm
about to show you is an excellent start
to doing so and at the end of this video
I'll even talk about how I'm going to
extend this package in the near future
just for you to make it even better all
right so here we are in the GitHub
repository for the self-hosted AI
starter kit by n8n now this repo is
really really basic and I love it there
are basically just two files that we
have to care about here we have our
environment variable file where we'll
set credentials for things like
Postgres and then we have the docker
compose yaml file here where we'll
basically be bringing everything
together like Postgres Qdrant and
Ollama to have a single package for our
local AI now the first thing that I want
to mention here is that this read me has
instructions for how to install
everything yourself but honestly it's
quite lacking and there's a couple of
holes that I want to fill in here with
ways to extend it to really make it what
I think that you need and so I'll go
through that a little bit and we'll
actually get this installed on our
computer now there are a couple of
dependencies before you start basically
you just need git and Docker so I'd
recommend installing GitHub desktop and
then Docker desktop as well because this
also has Docker compose with it which is
what we need to bring everything
together for one package so with that we
can go ahead and get started downloading
this on our computer so the first thing
you want to do to download this code is
copy the git clone command here with the
URL of the repository you'll go into a
terminal then and then paste in this
command for me I've already cloned this
that's why I get this error message but
you're going to get this code downloaded
on your computer and then you can change
your directory into this new repository
that you've pulled and so with this we
can now go and edit the files in any
editor of our choice I like using VS
Code and so if you have VS Code as well
you can just type in code Dot and this
is going to pull up everything in Visual
Studio code now the official
instructions in the readme that we just
saw would tell you at this point to run
everything with the docker compose up
command now that is not actually the
right Next Step I'm not really sure why
they say that cuz we have to actually go
and edit a couple of things in the code
to make it customized for us and that
starts with the EnV file so you're going
to want to go into your EnV file I've
just made a .env.example file in this
case because I already have my
credentials set up so you'll go into
your EnV and then set up your postgress
username and password the database name
and then also a couple of n8n Secrets
these can be whatever you want just make
sure that they are very secure and
basically just a long alphanumeric
string and then with that we can go into
our Docker compose file and here's where
I want to make a couple of extensions to
really fill in the gaps so the couple of
things that were missing in the original
Docker compose file first of all for
some reason the Postgres container
doesn't have the port exposed by default
so you can't actually go and use
Postgres as your database in an n8n
workflow I think n8n uses Postgres
internally which is why it's set up like
that initially but we want to actually
be able to use Postgres for our chat
memory for our agents and so I'm going
to show you how to do that basically all
you have to do is go down to the
Postgres service here and then just add
these two lines of code right here ports
and then just a single item where we
have 5432 map to the port 5432 inside
the container and that way we can go
localhost 5432 and access Postgres so
that is super super important otherwise
we won't actually be able to access it
within an n8n workflow we're going to
be doing that later when we build the
rag AI agent now the other thing that we
want to do is we want to use Ollama for
our embeddings for our Vector database
as well now the base command when we
initialize Ollama is just this part right
here so we sleep for 3 seconds and then
we pull llama 3.1 with Ollama so that's
why we have llama 3.1 available to us by
default but what I've added here is
another line to pull one of the Ollama
embedding models and we need this if we
want to be able to use Ollama for our
RAG and so I've added this line as well
that is very very key so that is
literally everything that you have to
change in the code to get this to work
and I'll even have a link in the
description of this video to my version
of this you can pull that directly if
you want to have all the customizations
that we just went over here and with
that we can go ahead and actually start
it with Docker compose and so the
installation instructions in the readme
are actually kind of useful here because
there's a slightly different Docker
compose command that you want to run
based on your architecture so if you
have an Nvidia GPU
you can follow these instructions which
are a bit more complicated but if you
want to you can and then you can run
with the gpu-nvidia profile and then if
you are a Mac User you follow this
Command right here and then for everyone
else like what I'm going to use in this
case even though I have a Nvidia GPU
I'll just keep it simple with docker
compose --profile cpu up and so we'll
copy this command go into our terminal
here and paste it in and in my case I
have already created all these
containers and so it's going to run
really really fast for me but in your
case it's going to have to pull each of
the images for Ollama Postgres n8n and
quadrant and then start them all up and
it'll take a little bit because I also
have to do things like pulling llama 3.1
for the Ollama container and so in my
case it's going to blast through this
pretty quick because it's already done a
lot of this I did that on purpose so it
can be a quicker walkthrough for you
here um but you can see all the
different containers the different
colors here that are running everything
to set me up for each of the different
services and so like right here for
example it pulled llama 3.1 and then
right here it pulled the embedding model
that I chose from Ollama as well um and so
at this point it's basically done so I'm
going to pause here and come back when
everything is ready all right so
everything is good to go and now I'm
going to actually take you in a Docker
so we can see all of this running live
so you're going to want to open up your
Docker desktop and then you'll see one
record here for the self-hosted AI
starter kit you can click on this button
on the left hand side to expand it and
then we can see every container that is
currently running or ran for the setup
so there are going to be four containers each
running for one of our different local
AI services and we can actually click
into each one of them which is super
cool because we can see the output of
each container and even go to the exec
tab to run Linux commands within each of
these containers and so you can actually
do things in real time as well without
having to restart the containers you can
go into the Postgres container and run
commands to query your tables and stuff
you can go into actually I'll show you
this really quick you can go into the
Ollama container and you can pull in real
time like if I want to go to exec here I
can do ollama pull llama
3.1:70b if I can spell it right so I can
pull models in real time and have those
updated and available to me in n8n
without having to actually restart
anything which is super super cool all
right so now is the really really fun
part because we get to use all the local
infrastructure that we spun up just now
to create a fully local RAG AI agent
within n8n and so to access your new
self-hosted n8n you can just go to
localhost port 5678 and the way that you know
that this is the URL is either through
the docker logs for your n8n container or
in the readme that we went over um that
was in the GitHub repository we cloned
and with that we can dive into this
workflow that I created to use Postgres
for the chat memory Qdrant for RAG and
Ollama for the llm and the embedding
model and so this is a full RAG AI agent
that I've already built out I don't want
to build it from scratch just because I
want this to be a quicker smooth walk
through for you but I'll still go step
by step through everything that I set up
here and so that you can understand it
for yourself and also just steal this
from me cuz I'm going to have this in the
description link as well so you can pull
this workflow and bring it into your own
n8n instance and so with that we can go
ahead and get started so there are two
parts to this workflow first of all we
have the agent itself with the chat
interaction here so this chat widget is
how we can interact with our agent and
then we also have the workflow that is
going to bring files from Google Drive
into our knowledge base with Qdrant
and so I'll show the agent first and
then I'll dive very quickly into how I
have this pipeline set up to pull files
in from a Google drive folder into my
knowledge base so we have the trigger
that I just mentioned there where we
have our chat input and that is fed
directly into this AI agent where we
hook up all the different local stuff
and so first of all we have our Ollama
chat model and so I'm referencing llama
3.1 colon latest which is the 8 billion
parameter model but if you want to do an
ollama pull within the container like I
showed you how to do you can use
literally any Ollama llm right here it is
just so so simple to set up and then for
the credentials here it is very easy you
just have to put in this base URL right
here it is so important that for the URL
you use
http and instead of localhost you
reference host.docker.internal otherwise
it will not work and then the port for
Ollama is if you don't change it
11434 and you can get this port either
in the Docker compose file or in the
logs for the Ollama container you'll see
this in a lot of places and so with that
we've got our llm set up for this agent
and then for the memory of course we're
going to use Postgres and so I'll click
into this and we're just going to have
any kind of table name you have here and
n8n will create this automatically in your
Postgres database and it'll get the
session ID from the previous node and
then for the credentials here this is
going to be based on what you set in
your .env file so we have our host which is
host.docker.internal again just like
with Ollama and then the database name user
and password all three of those you
defined in your EnV file that we went
over earlier and the port for Postgres
is
5432 and so with that we've got our
local chat memory set up it is that
simple and so we can move on to the last
part of this agent which is the tool for
RAG so we have the vector store tool
that we attach to our agent and then we
hook in our Qdrant Vector store for
this and so we're just going to retrieve
any documents based on the query that
comes into our agent and then for the
credentials for Qdrant we just have an
API key which this was filled in for me
by default so I hope it is for you as
well I think it's just the password for
the n8n instance and then for the
Qdrant URL this should look very very
familiar http host.docker.internal and
then the port for Qdrant is 6333 again
you can get this from the docker compose
file because we have to expose that Port
make it available or you can get it from
the Qdrant logs as well
and so one other thing that I want to
show that is so so cool with hosting
Qdrant locally is if you go to
localhost port
6333 like I have right here you can see
in the top left slash dashboard it's
going to take you to your very own
self-hosted Qdrant dashboard where you
can see all your collections your
knowledge base basically and you can see
all the different vectors that you have
in there you can click into visualize
and I can actually go and see all my
different vectors which this is a
document that I already have inserted as
I was testing things um so you can see
all the metadata the contents of each
chunk it is so so cool so we'll go back
to this in a little bit here but just
know that like you have so much
visibility into your own Qdrant
instance and you can even go and like
run your own queries to uh get
collections or delete vectors or do a
search uh it's just really awesome so
yeah hosting Qdrant is a beautiful
thing um and so with that we have our
Qdrant Vector store and then we're
using Ollama for embeddings using that
embedding model that I pulled that I
added to the docker compose file and
then we're just going to use llama 3.1
again to parse the responses that we get
from RAG when we do our lookups so that
is everything for our agent and so we'll
test this in a little bit but first I
want to actually show you the workflow
for ingesting files into our knowledge
base and so the way that works is we
have two triggers here basically
whenever a file is created in a specific
folder in Google Drive or if a file is
updated in that same folder we want to
run this pipeline to download the file
and put it into our Qdrant Vector
database running locally and so that
folder that I have right here is this
meeting notes folder in my Google Drive
and specifically the document that I'm
going to use for testing purposes here
is these fake meeting notes that I made
I just generated something really really
silly here about a company that is
selling robotic pets and AI startup um
and so we're going to use this document
for our RAG I'm not going to do a
bunch of different documents um because
I want to keep this really simple right
now but you can definitely do that and
the Qdrant Vector database can handle
that but for now I'm just using this
single document and so I'll walk through
step by step what this flow actually
looks like to ingest this into the
vector database and so first of all I'm
going to fetch a test event which is
going to be the creation of this meeting
Note file that I just showed you and
then we're going to feed that into this
node here which is going to extrapolate
a couple of key pieces of information
including the file ID and the folder ID
and so once we have that I'm going to go
on to this next step right here and this
is a very very important step okay let
me just stop here for a second there are
a lot of RAG tutorials with n8n on
YouTube that miss this when you have
this Step at the end here I'm just going
to skip to the end really quick whether
this is Supabase Qdrant Pinecone it
doesn't matter when you have this
inserter it is not an upsert it is just
an insert and so what that means
is if you reinsert a document you're
actually going to have duplicate vectors
for that document so if I update a
document in Google Drive and it
reinserts the vectors into my Qdrant
Vector database I'm going to have the
old vectors for the first time I
ingested my document and then new
vectors for when I updated the file it
does not get rid of the old files or
update the vectors in place that is so
important to keep in mind and so I'm
giving a lot of value to you right here
by including this node and it's actually
custom code because there's not a way to
do it without code in n8n but it is all
good because you can just copy this from
me I'm going to have a link to this
workflow in the description like I said
so you can just download this and bring
it into your own n8n take my code here
which basically just uses Lang chain to
connect to my Qdrant Vector store get
all of the vector IDs where the metadata
file ID is equal to the ID of the file
I'm currently ingesting and then it just
deletes those vectors so basically we
clear everything that's currently in the
vector database for this file so that we
can reinsert it and make sure that we
have zero duplicates that is so so
important because you don't want
different versions of your file existing
at the same time in your knowledge base
that will confuse the heck out of your
llm and so this is a very very important
step and so I'll run this as well and
that's going to delete everything so I
can even go back to Qdrant here go to
my Collections and you can see now that
this number was nine when I first showed
this Qdrant dashboard and now it is
zero but it's going to go back up to 9
when I finish this workflow so next up
we're going to download this Google
drive
file nice and simple uh then we're going
to extract the text from it and so this
it doesn't matter if it's a PDF a CSP a
Google doc it'll take the file and get
the raw text from it and then we're
going to insert it into our Qdrant
Vector store and so now I'm going to run
test step here and we're going to go
back to the UI after it's done doing
these insertions you can see here nine
items because it chunked up my document
so we go back here and I'll refresh it
right now it's zero I'll refresh and
there we go boom we're back up to nine
chunks and the reason there's so many
chunks for such a small document is
because if we go to my chunk size here
in my recursive character text splitter
I have a chunk size of 100 so every
single time I put in a document it's
going to get split up into 100 character
chunks so I want to keep it small just
because I'm running llama 3.1 locally I
don't have the most powerful computer
and so I want my prompts to be small so
I'm keeping my context lower by having
smaller chunk sizes and not returning a
lot of documents when I perform RAG and
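A naive version of that chunking step looks like this. The workflow actually uses n8n's recursive character text splitter, which also prefers paragraph and sentence boundaries; this fixed-size sketch just shows why even a small document yields several chunks:

```python
def chunk_text(text, chunk_size=100):
    # split into consecutive slices of at most chunk_size characters
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
```

With a chunk size of 100, a document of roughly 850 characters splits into nine chunks, consistent with the nine items shown landing in the Qdrant dashboard.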
so the other thing that I wanted to show
really quickly here is my document data
loader and so or my default data loader
I'm adding two pieces of metadata here
the file ID and the folder ID the more
important one right here is the file ID
because that is how I know that a vector
is tied to a specific document so I use
that in that other step right here to
delete the old document vectors before I
insert the new one so that's how I make
that connection there so that's kind of
the most in-depth part of this
walkthrough is how that all works and
having this custom code here but just
know that this is so so important so
just take this from me I hope that it
makes sense to an extent I spent a lot
of time making this work for you um so
yeah with that that is everything we've
got our agent fully set up everything
ingested in uh we have the document
currently in the knowledge base CU I ran
through that step by step and so now we
can go ahead and test this thing so I'm
going to go to the chat widget here
actually I'm going to save it first and
then go to the chat widget and then I'll
ask it a question that it can only
answer if it actually has the document
in the knowledge base and can retrieve
it so I'll say what is the ad campaign
focusing on and because this is llama
3.1 running locally it's going to
actually take a little bit to get a
response because I don't have the beefiest
computer so I'm going to pause and come
back when it has an answer for me all
right so we got an answer from llama 3.1
and this is looking pretty good it's a
little bit awkward at the start of the
response here uh but this is just the
raw output without any instructions from
me to the model on how to format a
response and so you can very very easily
fix this by just adding to the system
prompt for the llm and telling it how to
respond with the information it's given
from rag but overall it does have the
right answer and it's talking about
robotic pets which obviously it is only
going to get that if it's using RAG on
the meeting notes document that I have
uploaded through my Google Drive so this
is working absolutely beautifully now I
would probably want to do a lot more
testing with this whole setup but just
to keep things simple right now I'm
going to leave it at this as a simple
example um but yeah I would encourage
you to just take this forward keep
working on this agent and um yeah it's
fully fully local it is just a beautiful
thing so I hope that this whole local AI
setup is just as cool for you as it is
for me because I have been having a
blast with this and I will continue to
as I keep expanding on it so just
as I promised in the start of the video
I want to talk a little bit about how
I'm planning on expanding this in the
future to make it even better cuz here's
the thing this whole stack that I showed
here is a really good starting point but
there's some things I want to add on to
it as well to make it even more robust
things like redis for caching or a
self-hosted Supabase instead of the
vanilla Postgres cuz then it can handle
things like authentication as well maybe
even turning this into a whole local AI
Tech stack that would even include
things like the front end as well or
maybe baking in best practices for RAG
and llms or n8n workflows for that to
make this more of like a template as
well to actually make it really really
easy to get started with local AI so I
hope that you're excited about that if
you are or if you found this video just
helpful in general getting you set up
with your local AI Tech stack I would
really appreciate a like and a subscribe
and with that I will see you in the next
video