Run ALL Your AI Locally in Minutes (LLMs, RAG, and more)

Cole Medin
15 Sept 2024 · 20:19

Summary

TL;DR: The video introduces a comprehensive package for setting up local AI infrastructure, developed by the n8n team. It includes Ollama for running LLMs, Qdrant as the vector database, and PostgreSQL as the SQL database. The presenter guides viewers through installation using Docker Compose and through customizing the setup for workflow automations in n8n. The video also covers extending the package into a full RAG AI agent, showcasing its capabilities and future expansion plans.

Takeaways

  • 🎉 The video introduces an exciting package for local AI developed by the n8n team.
  • 🛠️ The package includes Ollama for LLMs, Qdrant for the vector database, PostgreSQL for the SQL database, and n8n for workflow automation.
  • 🌐 The presenter is enthusiastic about the potential of running your own AI infrastructure, especially with powerful open-source models like Llama.
  • 📚 The GitHub repository for the self-hosted AI starter kit is intentionally basic; the key files are the environment variable file and the Docker Compose file.
  • 🔑 Before starting, ensure you have dependencies like Git and Docker installed.
  • 💻 Clone the repository and set up your environment variables for services like PostgreSQL.
  • 📝 The Docker Compose file needs modifications to expose necessary ports and to pull specific models with Ollama.
  • 🖥️ Detailed instructions are provided for different system architectures, including those with Nvidia GPUs and Macs.
  • 🔍 The video demonstrates how to check the running containers in Docker and interact with them.
  • 🔗 The presenter shares an n8n workflow that uses PostgreSQL for chat memory, Qdrant for the vector database, and Ollama for the LLM and embedding model.
  • 📈 The video also covers setting up a pipeline to ingest files from Google Drive into the local vector database.
  • 📝 Custom code is provided to avoid duplicate vectors in the vector database when updating documents.
  • 📱 The presenter plans to expand the setup in the future with additional features like caching and possibly a self-hosted frontend.

Q & A

  • What is the main topic of the video?

    -The main topic of the video is setting up a local AI infrastructure using a package developed by the n8n team, which includes Ollama for the LLM, Qdrant for the vector database, Postgres for the SQL database, and n8n for workflow automations.

  • Why is the presenter excited about this package?

    -The presenter is excited because the package is a comprehensive solution for local AI needs, making it easy to set up and extend, and it includes powerful open-source models that can compete with closed-source models.

  • What are the key components included in the package?

    -The package includes Ollama for the LLM, Qdrant for the vector database, Postgres for the SQL database, and n8n for workflow automations.

  • What is the purpose of using Postgres in this setup?

    -Postgres is used in the setup for the SQL database needs and to serve as the chat memory for AI agents.

  • How does the presenter plan to extend the package?

    -The presenter plans to extend the package by adding components like Redis for caching, a self-hosted Supabase for authentication, and possibly a frontend. They also consider baking in best practices for RAG, LLMs, and n8n workflows.

  • What are the prerequisites for setting up this local AI infrastructure?

    -The prerequisites include having Git and Docker installed; Docker Desktop is recommended because it bundles Docker Compose.
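
    For reference, a quick way to verify those dependencies and grab the code from a terminal (the repository path below is the starter kit's public GitHub location; adjust if it has moved):

        # Verify the prerequisites are installed
        git --version
        docker --version
        docker compose version   # ships with Docker Desktop

        # Clone the starter kit and enter the project directory
        git clone https://github.com/n8n-io/self-hosted-ai-starter-kit.git
        cd self-hosted-ai-starter-kit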

  • What is the role of n8n in this local AI setup?

    -n8n is used to tie together the different components of the local AI infrastructure with workflow automations.

  • How does the presenter suggest customizing the Docker Compose file?

    -The presenter suggests adding lines to expose the Postgres port and to pull an additional Ollama embedding model.
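
    As a rough sketch, those two additions look like this in the Compose file (the service names follow the starter kit's conventions, and nomic-embed-text stands in for whichever Ollama embedding model you choose):

        services:
          postgres:
            # ... existing image, environment, and volume settings ...
            ports:
              - "5432:5432"   # expose Postgres so n8n workflow nodes can reach it

          ollama-pull-llama:
            # ... existing container settings ...
            command:
              - "-c"
              - "sleep 3; ollama pull llama3.1; ollama pull nomic-embed-text"   # second pull added for embeddings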

  • What is the significance of the .env file in this setup?

    -The .env file is significant because it contains credentials for services like Postgres and the n8n secrets, which are crucial for customizing the setup to the user's needs.
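
    As a rough illustration, a .env for this setup might look like the following (the variable names mirror the starter kit's .env.example as I understand it; the values are placeholders to replace with your own):

        POSTGRES_USER=root
        POSTGRES_PASSWORD=change-me
        POSTGRES_DB=n8n

        # n8n secrets: use long, random alphanumeric strings of your own
        N8N_ENCRYPTION_KEY=replace-with-a-long-random-string
        N8N_USER_MANAGEMENT_JWT_SECRET=replace-with-another-long-random-string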

  • How does the presenter demonstrate the functionality of the local AI setup?

    -The presenter demonstrates the functionality by creating a fully local RAG AI agent within n8n, using Postgres for chat memory, Qdrant for the vector database, and Ollama for the LLM and embedding model.

  • What is the importance of the custom node in the n8n workflow?

    -The custom node is important for managing the ingestion of documents into the vector database by ensuring there are no duplicate vectors, which could confuse the LLM.

Outlines

00:00

🚀 Introduction to Local AI Package

The speaker is excited to introduce a comprehensive package for local AI developed by the n8n team. This package includes Ollama for the LLMs, Qdrant for the vector database, PostgreSQL for SQL database needs, and n8n for workflow automations. The video aims to guide viewers on setting up this package quickly and extending its capabilities to create a fully functional local AI agent. The speaker emphasizes the growing accessibility and power of open-source AI models, suggesting that now is the time to invest in local AI infrastructure.

05:00

🛠️ Setting Up the Local AI Environment

The speaker provides a step-by-step guide to setting up the local AI environment. The process begins with downloading the code from the GitHub repository using a git clone command. Afterward, the speaker recommends using Visual Studio Code to edit the necessary files. The video covers the installation of dependencies like Git and Docker, and the use of Docker Compose to bring together services like PostgreSQL, Qdrant, and Ollama. The speaker also points out the need to edit the environment variable file to set up credentials and secrets for services like PostgreSQL and n8n.

10:01

🔧 Customizing Docker Compose for Local AI

The speaker discusses the need to customize the Docker Compose file to ensure proper functionality. This includes exposing the PostgreSQL port to enable its use within n8n workflows, and pulling an additional Ollama embedding model so that Ollama can also serve embeddings for the vector database. The speaker provides a detailed walkthrough of the necessary code changes and mentions providing a customized version of the Docker Compose file for viewers' convenience.

15:03

🌐 Accessing and Managing Local AI Services

The speaker demonstrates how to access and manage the various local AI services using Docker. They show how to view running containers, check their output, and even execute Linux commands within each container. The video also covers setting up a fully local RAG AI agent within n8n, using PostgreSQL for chat memory, Qdrant for the vector database, and Ollama for the LLM and embedding model. The speaker provides a detailed explanation of the workflow and how to test the agent using a chat widget.

20:03

📈 Future Expansions and Conclusion

In the final part, the speaker talks about future plans to expand the local AI setup. They mention potential additions like Redis for caching, a self-hosted Supabase instead of vanilla PostgreSQL, and possibly incorporating frontend elements. The speaker also expresses excitement about the potential of the local AI stack and encourages viewers to like and subscribe for more content on the topic.

Keywords

💡Local AI

Local AI refers to the practice of running artificial intelligence models and applications directly on local machines or personal devices, as opposed to relying on cloud-based services. In the video, the presenter is excited about a package that enables local AI development, emphasizing the growing accessibility and power of open-source AI models that can now compete with closed-source models. This concept is central to the video's theme of setting up a personal AI infrastructure.

💡n8n

n8n is an open-source workflow automation tool that facilitates the connection and automation of various services and databases without the need for complex coding. In the script, n8n is highlighted as a crucial component in tying together the different elements of the local AI setup, including the workflow automations that create a full RAG AI agent.

💡Docker

Docker is a platform that uses containerization technology to simplify the deployment of applications. In the video script, Docker is used to package the local AI setup, making it easy to install and run the various services needed for the AI infrastructure. Docker Compose is specifically mentioned as part of the setup process.

💡Llama

Llama, in the context of the video, refers to Meta's family of open-source large language models, comparable to closed models like GPT. In this package, Llama 3.1 provides the language-processing capabilities of the AI agent being set up, served locally through the similarly named Ollama tool.

💡Vector Database

A vector database is a type of database designed to store and retrieve vector representations of data, which is useful for AI applications that rely on machine learning models. In the script, Qdrant is the vector database used for the RAG AI agent, highlighting its importance in managing the knowledge base for the AI.

💡Postgres

Postgres, or PostgreSQL, is an open-source relational database management system. In the video, it is part of the local AI package and is used for handling SQL database needs within the AI infrastructure, including chat memory for the agent. The script provides instructions on how to set up PostgreSQL within the Docker environment.

💡RAG AI Agent

RAG stands for 'Retrieval-Augmented Generation' and refers to a type of AI system that retrieves information to assist in generating responses. In the video, the presenter demonstrates how to create a full RAG AI agent using the local infrastructure, which involves using tools like n8n, PostgreSQL, and Qdrant to manage the agent's memory and knowledge base.

💡Workflow Automation

Workflow automation involves the use of software to automate procedural tasks. In the script, workflow automation is central to setting up the local AI, as it involves creating automated processes that allow the AI agent to function effectively, such as managing chat interactions and retrieving data from a knowledge base.

💡Environment Variable File

An environment variable file contains settings that configure the runtime environment for an application. In the video script, the presenter mentions editing this file to set up credentials for services like PostgreSQL, which is a necessary step in customizing the local AI package to the user's needs.

💡Embeddings

Embeddings in AI refer to the representation of data points (like words or documents) in a vector space, which allows complex patterns to be identified. The script discusses using an Ollama embedding model, indicating that embeddings are used to transform data into a form that can be effectively utilized by the AI agent for tasks like RAG.

💡Google Drive

Google Drive is a file storage and synchronization service. In the context of the video, it is used as a source for documents that are ingested into the AI's knowledge base. The script describes a workflow that monitors Google Drive for new or updated files and incorporates them into the local AI system.

Highlights

Introduction of a comprehensive package for local AI developed by the n8n team.

Excitement about the ease of setup and the potential of the package for local AI.

The package includes components like Ollama for LLMs, Qdrant for the vector database, Postgres for SQL, and n8n for workflow automation.

Emphasis on the accessibility and power of open-source AI models like Llama.

Instructions on setting up the package in minutes.

The GitHub repository for the self-hosted AI starter kit by n8n contains essential files for setup.

Clarification that the official instructions are lacking and will be improved upon in the video.

Prerequisites for setup include having Git and Docker installed.

Downloading the repository is the first step in the setup process.

Editing the environment variable file is necessary to set up credentials for services like Postgres.

Customizations to the Docker Compose file are required to expose the Postgres port and pull an Ollama embedding model.

Different Docker Compose commands are provided based on the user's system architecture.

A live demonstration of the Docker containers running the local AI services.

Introduction to creating a fully local RAG AI agent within n8n.

Explanation of the workflow for using Postgres as chat memory and Qdrant as the vector database.

Details on how to use Ollama for embeddings and for parsing responses from RAG.

Demonstration of the custom code used to prevent duplicate vectors in the vector database.

A test of the AI agent showing its ability to retrieve information from the knowledge base.

Plans for future enhancements to the local AI stack, including caching and authentication.

Transcripts

00:00

Have you ever wished for a single package that you could easily install that has everything you need for local AI? Well, I have good news for you today, because I have exactly what you are looking for. I have actually never been so excited to make a video on something before. Today I'm going to show you an incredible package for local AI developed by the n8n team, and this thing has it all: it's got Ollama for the LLMs, Qdrant for the vector database, Postgres for the SQL database, and then n8n to tie it all together with workflow automations. This thing is absolutely incredible, and I'm going to show you how to set it up in just minutes. Then I'll even show you how to extend it to make it better, and use it to create a full RAG AI agent in n8n, so stick around, because I have a lot of value for you today. Running your own AI infrastructure is the way of the future, especially because of how accessible it is becoming, and because open-source models like Llama are getting to the point where they're so powerful that they're actually able to compete with closed-source models like GPT and Claude. So now is the time to jump on this, and what I'm about to show you is an excellent start to doing so. At the end of this video, I'll even talk about how I'm going to extend this package in the near future, just for you, to make it even better.

01:18

All right, so here we are in the GitHub repository for the self-hosted AI starter kit by n8n. This repo is really basic, and I love it. There are basically just two files that we have to care about: we have our environment variable file, where we'll set credentials for things like Postgres, and then we have the Docker Compose YAML file, where we'll basically be bringing everything together, like Postgres, Qdrant, and Ollama, to have a single package for our local AI. The first thing I want to mention here is that this README has instructions for how to install everything yourself, but honestly it's quite lacking, and there are a couple of holes that I want to fill in here, with ways to extend it to really make it what I think you need. So I'll go through that a little bit, and we'll actually get this installed on our computer. There are a couple of dependencies before you start: basically, you just need Git and Docker. I'd recommend installing GitHub Desktop and then Docker Desktop as well, because the latter also ships with Docker Compose, which is what we need to bring everything together into one package.

02:18

So with that, we can go ahead and get started downloading this onto our computer. The first thing you want to do to download this code is copy the git clone command with the URL of the repository, then go into a terminal and paste in the command. For me, I've already cloned this, which is why I get this error message, but you're going to get this code downloaded onto your computer, and then you can change your directory into this new repository that you've pulled. With this, we can now go and edit the files in any editor of our choice. I like using VS Code, so if you have VS Code as well, you can just type in "code ." and this is going to pull up everything in Visual Studio Code. Now, the official instructions in the README that we just saw would tell you at this point to run everything with the docker compose up command. That is not actually the right next step; I'm not really sure why they say that, because we have to actually go and edit a couple of things in the code to customize it for us, and that starts with the .env file. So you're going to want to go into your .env file (I've just made a .env.example file in this case, because I already have my credentials set up) and then set up your Postgres username and password, the database name, and also a couple of n8n secrets. These can be whatever you want; just make sure that they are very, very secure, basically just long alphanumeric strings.

03:34

Then with that, we can go into our Docker Compose file, and here's where I want to make a couple of extensions to really fill in the gaps. There were a couple of things missing in the original Docker Compose file. First of all, for some reason the Postgres container doesn't have its port exposed by default, so you can't actually use Postgres as your database in an n8n workflow. I think n8n uses Postgres internally, which is why it's set up like that initially, but we want to actually be able to use Postgres for our chat memory for our agents, so I'm going to show you how to do that. Basically, all you have to do is go down to the Postgres service and add these two lines: "ports" and then a single item where we have 5432 mapped to port 5432 inside the container. That way we can go to localhost:5432 and access Postgres. That is super important; otherwise we won't actually be able to access it within an n8n workflow, which we're going to be doing later when we build the RAG AI agent. The other thing we want to do is use Ollama for the embeddings for our vector database as well. The base command when we initialize Ollama is just this part right here: we sleep for 3 seconds and then pull Llama 3.1 with Ollama, which is why we have Llama 3.1 available to us by default. What I've added here is another line to pull one of the Ollama embedding models, and we need this if we want to be able to use Ollama for our RAG, so that added line is very, very key. That is literally everything you have to change in the code to get this to work, and I'll even have a link in the description of this video to my version of this; you can pull that directly if you want all the customizations that we just went over.

05:16

And with that, we can go ahead and actually start it with Docker Compose. The installation instructions in the README are actually kind of useful here, because there's a slightly different Docker Compose command that you want to run based on your architecture. If you have an Nvidia GPU, you can follow these instructions, which are a bit more involved, and then run with the gpu-nvidia profile. If you are a Mac user, you follow this command right here. And for everyone else, which is what I'm going to use in this case even though I have an Nvidia GPU, I'll just keep it simple with docker compose --profile cpu up.
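
For reference, the two profile commands look like this (the profile names are the ones used in the video; Mac users should check the README for the exact Apple Silicon instructions, which differ):

    # Nvidia GPU users, after setting up GPU support in Docker
    docker compose --profile gpu-nvidia up

    # Everyone else (CPU only) -- the command used in this walkthrough
    docker compose --profile cpu up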

05:50

So we'll copy this command, go into our terminal, and paste it in. In my case, I have already created all these containers, so it's going to run really fast for me, but in your case it's going to have to pull each of the images for Ollama, Postgres, n8n, and Qdrant and then start them all up. It'll take a little while, because it also has to do things like pulling Llama 3.1 for the Ollama container. In my case it's going to blast through this pretty quickly, because it's already done a lot of this; I did that on purpose so this can be a quicker walkthrough for you. But you can see all the different containers, in the different colors here, running everything to set me up for each of the different services. Right here, for example, it pulled Llama 3.1, and right here it pulled the embedding model that I chose from Ollama as well. At this point it's basically done, so I'm going to pause here and come back when everything is ready.

06:39

All right, so everything is good to go, and now I'm going to actually take you into Docker so we can see all of this running live. You're going to want to open up Docker Desktop, and then you'll see one record for the self-hosted AI starter kit. You can click the button on the left-hand side to expand it, and then we can see every container that is currently running, or that ran for the setup. There are going to be four containers, each running one of our different local AI services, and we can actually click into each of them, which is super cool, because we can see the output of each container and even go to the exec tab to run Linux commands inside each of these containers. So you can actually do things in real time without having to restart the containers: you can go into the Postgres container and run commands to query your tables and so on, and, actually, I'll show you this really quick, you can go into the Ollama container and pull models in real time. If I go to exec here, I can do "ollama pull llama3.1:70b" (if I can spell it right), so I can pull models in real time and have those updated and available to me in n8n without having to restart anything, which is super cool.

07:46

All right, so now is the really fun part, because we get to use all the local infrastructure that we just spun up to create a fully local RAG AI agent within n8n. To access your new self-hosted n8n, you can just go to localhost port 5678, and the way you know that this is the URL is either through the Docker logs for your n8n container or in the README of the GitHub repository we cloned. With that, we can dive into this workflow that I created, which uses Postgres for the chat memory, Qdrant for RAG, and Ollama for the LLM and the embedding model. This is a full RAG AI agent that I've already built out. I don't want to build it from scratch, just because I want this to be a quicker, smoother walkthrough for you, but I'll still go step by step through everything I set up here so that you can understand it for yourself, and also just steal this from me, because I'm going to have this in the description link as well, so you can pull this workflow and bring it into your own n8n instance. And with that, we can go ahead and get started. There are two parts to this workflow. First, we have the agent itself, with the chat interaction here; this chat widget is how we can interact with our agent. Then we also have the workflow that is going to bring files from Google Drive into our knowledge base with Qdrant. I'll show the agent first, and then I'll dive very quickly into how I have this pipeline set up to pull files in from a Google Drive folder into my knowledge base.

09:11

So we have the trigger that I just mentioned, where we have our chat input, and that is fed directly into this AI agent, where we hook up all the different local pieces. First of all, we have our Ollama chat model, where I'm referencing llama3.1:latest, which is the 8-billion-parameter model, but if you do an ollama pull within the container, like I showed you how to do, you can use literally any Ollama LLM right here. It is just so simple to set up. Then for the credentials, it is very easy: you just have to put in this base URL right here. It is so important that for the URL you use HTTP, and that instead of localhost you reference host.docker.internal; otherwise it will not work. The port for Ollama, if you don't change it, is 11434, and you can get this port either in the Docker Compose file or in the logs for the Ollama container; you'll see it in a lot of places.
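
As a quick sanity check (my own addition, not something shown in the video), you can hit Ollama's REST API to confirm the port and see which models it has pulled:

    # From the host machine
    curl http://localhost:11434/api/tags

    # From inside another container (such as n8n), localhost does not resolve to
    # the host, which is why the credential must use host.docker.internal:
    curl http://host.docker.internal:11434/api/tags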

10:05

And so with that, we've got our LLM set up for this agent. Then for the memory, of course, we're going to use Postgres. I'll click into this: we just need some table name here, and n8n will create it automatically in your Postgres database, and it'll get the session ID from the previous node. Then the credentials here are going to be based on what you set in your .env file. We have our host, which is host.docker.internal again, just like with Ollama, and then the database name, user, and password, all three of which you defined in the .env file we went over earlier. The port for Postgres is 5432. And with that, we've got our local chat memory set up; it is that simple.
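
To verify those Postgres credentials outside n8n (again a supplementary check, not from the video), you can connect with psql using the values from your .env file:

    # Prompts for the password you set in .env; \dt lists tables once connected
    psql -h localhost -p 5432 -U <your-postgres-user> -d <your-database-name>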

10:46

So we can move on to the last part of this agent, which is the tool for RAG. We have the vector store tool that we attach to our agent, and then we hook in our Qdrant vector store for this, so we're just going to retrieve any documents based on the query that comes into our agent. For the Qdrant credentials, we just have an API key, which was filled in for me by default, so I hope it is for you as well; I think it's just the password for the n8n instance. Then the Qdrant URL should look very familiar: HTTP, host.docker.internal, and then the port for Qdrant, which is 6333. Again, you can get this from the Docker Compose file, because we have to expose that port to make it available, or you can get it from the Qdrant logs as well.

11:30

One other thing I want to show that is so cool about hosting Qdrant locally: if you go to localhost port 6333, like I have right here, and add /dashboard, as you can see in the top left, it's going to take you to your very own self-hosted Qdrant dashboard, where you can see all your collections, basically your knowledge base, and all the different vectors that you have in there. You can click into Visualize, and I can actually go and see all my different vectors; this is a document that I already inserted as I was testing things, so you can see all the metadata and the contents of each chunk. It is so cool. We'll come back to this in a little bit, but just know that you have so much visibility into your own Qdrant instance, and you can even run your own queries to get collections, delete vectors, or do a search. It's just really awesome; hosting Qdrant is a beautiful thing.
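
Beyond the dashboard, the same kind of inspection works through Qdrant's standard REST API, for example:

    # List all collections
    curl http://localhost:6333/collections

    # Inspect one collection (vector count, configuration); the collection name
    # is whatever you set in the n8n Qdrant node
    curl http://localhost:6333/collections/<your-collection-name>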

12:25

And so with that, we have our Qdrant vector store, and then we're using Ollama for embeddings, using that embedding model I pulled, the one I added to the Docker Compose file. Then we're just going to use Llama 3.1 again to parse the responses we get from RAG when we do our lookups. So that is everything for our agent.

12:43

We'll test this in a little bit, but first I want to actually show you the workflow for ingesting files into our knowledge base. The way that works is we have two triggers here: whenever a file is created in a specific folder in Google Drive, or a file is updated in that same folder, we want to run this pipeline to download the file and put it into our Qdrant vector database running locally. The folder I have right here is this meeting-notes folder in my Google Drive, and specifically the document I'm going to use for testing purposes is these fake meeting notes that I made. I just generated something really silly about a company that is selling robotic pets, an AI startup, and we're going to use this document for our RAG. I'm not going to ingest a bunch of different documents, because I want to keep this really simple right now, but you can definitely do that, and the Qdrant vector database can handle it; for now I'm just using this single document. So I'll walk through, step by step, what this flow actually looks like to ingest this into the vector database. First of all, I'm going to fetch a test event, which is going to be the creation of the meeting-notes file that I just showed you, and then we're going to feed that into this node here, which is going to extract a couple of key pieces of information, including the file ID and the folder ID.

14:01

Once we have that, I'm going to go on to this next step right here, and this is a very important step. Okay, let me just stop here for a second. There are a lot of RAG tutorials with n8n on YouTube that miss this. When you have this step at the end here (I'm just going to skip to the end really quick), whether it's Supabase, Qdrant, or Pinecone, it doesn't matter: this inserter is not an upsert, it is just an insert. What that means is that if you reinsert a document, you're actually going to have duplicate vectors for that document. So if I update a document in Google Drive and it reinserts the vectors into my Qdrant vector database, I'm going to have the old vectors from the first time I ingested my document and then new vectors from when I updated the file. It does not get rid of the old vectors or update them in place. That is so important to keep in mind, and I'm giving a lot of value to you right here by including this node. It's actually custom code, because there's not a way to do this without code in n8n, but that's all good, because you can just copy it from me. I'm going to have a link to this workflow in the description, like I said, so you can download it, bring it into your own n8n, and take my code, which basically just uses LangChain to connect to my Qdrant vector store, get all of the vector IDs where the metadata file ID is equal to the ID of the file I'm currently ingesting, and then delete those vectors. So basically we clear everything that's currently in the vector database for this file, so that we can reinsert it and make sure we have zero duplicates. That is so important, because you don't want different versions of your file existing at the same time in your knowledge base; that will confuse the heck out of your LLM. So this is a very important step. I'll run this as well, and that's going to delete everything, so I can even go back to Qdrant, go to my collections, and you can see that this number was nine when I first showed the Qdrant dashboard and now it is zero. But it's going to go back up to nine when I finish this workflow.
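
The presenter's actual node uses LangChain inside an n8n code node; as an illustration of the same delete-then-reinsert idea, here is a minimal TypeScript sketch against Qdrant's points/delete REST endpoint. The collection name and metadata key are assumptions to adapt to your own setup:

    // Delete every vector whose metadata file ID matches the document being
    // re-ingested, so the fresh insert cannot leave duplicates behind.
    async function deleteVectorsForFile(fileId: string): Promise<void> {
      const qdrantUrl = "http://localhost:6333"; // host.docker.internal from inside a container
      const collection = "documents";            // assumed collection name

      const response = await fetch(`${qdrantUrl}/collections/${collection}/points/delete`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          filter: {
            must: [
              // "metadata.file_id" assumes the metadata key set in the data loader
              { key: "metadata.file_id", match: { value: fileId } },
            ],
          },
        }),
      });

      if (!response.ok) {
        throw new Error(`Qdrant delete failed: ${response.status} ${await response.text()}`);
      }
    }

    // Usage: clear the old vectors before inserting the updated document's chunks.
    // await deleteVectorsForFile(googleDriveFileId);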

16:05

So next up, we're going to download this Google Drive file, nice and simple, and then we're going to extract the text from it. It doesn't matter if it's a PDF, a CSV, or a Google Doc; it'll take the file and get the raw text from it. Then we're going to insert it into our Qdrant vector store. So now I'm going to run this test step, and we'll go back to the UI after it's done doing these insertions. You can see here nine items, because it chunked up my document. So we go back, and I'll refresh; right now it's zero, I'll refresh, and there we go, boom, we're back up to nine chunks. The reason there are so many chunks for such a small document is that, if we go to the chunk size here in my recursive character text splitter, I have a chunk size of 100, so every time I put in a document, it's going to get split up into 100-character chunks. I want to keep it small just because I'm running Llama 3.1 locally and I don't have the most powerful computer, so I want my prompts to be small: I'm keeping my context lower by having smaller chunk sizes and not returning a lot of documents when I perform RAG.
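
For context, the equivalent splitter configuration in LangChain's JS/TS library looks something like this (the import path varies across LangChain versions, so treat this as a sketch):

    import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";

    // 100-character chunks, mirroring the setting shown in the n8n node; small
    // chunks keep the prompt context short for a locally hosted Llama 3.1.
    const splitter = new RecursiveCharacterTextSplitter({
      chunkSize: 100,
      chunkOverlap: 0,
    });

    // documentText stands for the raw text extracted from the downloaded file.
    const chunks = await splitter.splitText(documentText);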

17:10

The other thing I wanted to show really quickly here is my document data loader, or rather my default data loader. I'm adding two pieces of metadata here: the file ID and the folder ID. The more important one is the file ID, because that is how I know that a vector is tied to a specific document; I use it in that other step to delete the old document vectors before I insert the new ones. That's how I make that connection. So that's the most in-depth part of this walkthrough, how that all works with this custom code, but just know that it is so important. So just take this from me; I hope it makes sense to an extent. I spent a lot of time making this work for you.

17:50

So with that, that is everything. We've got our agent fully set up and everything ingested; we have the document currently in the knowledge base, because I ran through that step by step. And so now we can go ahead and test this thing. I'm going to go to the chat widget here (actually, I'm going to save it first and then go to the chat widget), and I'll ask it a question that it can only answer if it actually has the document in the knowledge base and can retrieve it. So I'll say: "What is the ad campaign focusing on?" And because this is Llama 3.1 running locally, it's going to take a little bit to get a response, since I don't have the beefiest computer, so I'm going to pause and come back when it has an answer for me. All right, so we got an answer from Llama 3.1, and this is looking pretty good. It's a little awkward at the start of the response here, but this is just the raw output, without any instructions from me to the model on how to format a response, and you can very easily fix this by adding to the system prompt for the LLM, telling it how to respond with the information it's given from RAG. But overall it does have the right answer, and it's talking about robotic pets, which it is obviously only going to get if it's using RAG on the meeting-notes document that I uploaded through my Google Drive. So this is working absolutely beautifully. Now, I would probably want to do a lot more testing with this whole setup, but just to keep things simple right now, I'm going to leave it at this as a simple example. I would encourage you to take this forward and keep working on this agent. It's fully local; it is just a beautiful thing.

19:18

So I hope that this whole local AI setup is just as cool for you as it is for me, because I have been having a blast with this, and I will continue to as I keep expanding on it. Just as I promised at the start of the video, I want to talk a little bit about how I'm planning on expanding this in the future to make it even better, because here's the thing: this whole stack that I showed here is a really good starting point, but there are some things I want to add on to it as well to make it even more robust. Things like Redis for caching, or a self-hosted Supabase instead of the vanilla Postgres, because then it can handle things like authentication as well. Maybe even turning this into a whole local AI tech stack that would include things like the frontend, or baking in best practices for RAG and LLMs, or n8n workflows for that, to make this more of a template and actually make it really easy to get started with local AI. So I hope that you're excited about that. If you are, or if you found this video helpful in general, getting you set up with your local AI tech stack, I would really appreciate a like and a subscribe, and with that I will see you in the next video.


Related Tags

Local AI, n8n, LLMs, Database, Docker, Workflow Automation, Self-hosted, Tech Tutorial, AI Infrastructure