Easy 100% Local RAG Tutorial (Ollama) + Full Code
Summary
TLDR: The video details a tutorial on setting up a local, offline Retrieval-Augmented Generation (RAG) system using Ollama. The process involves converting a PDF to text, creating embeddings, and querying a locally served model (Mistral) to extract information. The presenter demonstrates how to download and install Ollama, set up the environment, and run a Python script to extract data from a PDF. They also show how to adjust parameters for better results and invite viewers to try the tutorial from the GitHub repo linked in the description.
Takeaways
- 😀 The video demonstrates how to extract information from a PDF file using a local system.
- 📄 The presenter converted news from the previous day into a PDF and then into a text file for processing.
- 🔍 The process involves using Python scripts to handle the PDF and text data for further analysis.
- 📝 The text from the PDF is appended to a file with each chunk on a separate line for better data structure.
- 🤖 The video introduces the use of a local RAG (Retrieval-Augmented Generation) system for querying the data.
- 🧐 The system can answer questions about the content, such as statements made by Joe Biden, by pulling context from the documents.
- 💻 The setup is completely offline and requires only about 70 lines of code, making it lightweight and easy to implement (a minimal sketch of the loop follows this list).
- 🔧 The video provides a tutorial on setting up the system, including downloading and installing necessary components.
- 🔗 The tutorial and code are available on a GitHub repository, which viewers can clone or fork to try for themselves.
- 🛠️ Adjustments can be made to the system, such as changing the number of top results (top K) displayed in the output.
- 📈 The system is not perfect but is good enough for the presenter's use case, suggesting it's suitable for personal or small-scale projects.
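To make the pipeline concrete, below is a minimal sketch of the loop described above, written against Ollama's documented REST API (/api/embeddings and /api/generate on the default port 11434). The vault.txt file name, the use of Mistral for both embedding and generation, and the prompt wording are illustrative assumptions, not necessarily what the repo's code does.

```python
import json
import urllib.request

import numpy as np

OLLAMA = "http://localhost:11434"

def ollama_post(path, payload):
    # POST a JSON payload to the local Ollama server and decode the JSON reply
    req = urllib.request.Request(
        OLLAMA + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def embed(text):
    # /api/embeddings returns {"embedding": [...]} for a single prompt
    return np.array(
        ollama_post("/api/embeddings", {"model": "mistral", "prompt": text})["embedding"]
    )

# vault.txt is assumed to hold one text chunk per line (see the pdf step below)
with open("vault.txt", encoding="utf-8") as f:
    chunks = [line.strip() for line in f if line.strip()]
chunk_vecs = np.array([embed(c) for c in chunks])

query = "what did Joe Biden say"
q = embed(query)

# cosine similarity between the query and every chunk, then keep the top K
sims = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q))
top_k = 3
context = "\n".join(chunks[i] for i in np.argsort(sims)[::-1][:top_k])

answer = ollama_post("/api/generate", {
    "model": "mistral",
    "prompt": f"Answer using only this context:\n{context}\n\nQuestion: {query}",
    "stream": False,
})["response"]
print(answer)
```

Changing top_k here is exactly the adjustment the presenter demonstrates later in the video.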
Q & A
What is the purpose of the script?
-The script demonstrates how to extract information from a PDF file and create embeddings for an offline retrieval-augmented generation (RAG) system using a local model.
What is the first step mentioned in the script for setting up the RAG system?
-The first step is to convert the source document (in the video, the previous day's news) to PDF, then run 'python pdf.py' to append its text to a text file with each chunk on a separate line.
Why is the text from the PDF appended with each chunk on a separate line?
-Appending text with each chunk on a separate line is beneficial because it was found to work best for creating embeddings and retrieving relevant information.
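As a rough illustration of that preprocessing step, the sketch below extracts a PDF's text with PyPDF2 and appends sentence-packed chunks to a text file, one chunk per line. The 1000-character cap and the file names are assumptions for illustration rather than the repo's exact pdf.py logic.

```python
import re

from PyPDF2 import PdfReader

# Pull the raw text out of every page; extract_text() can return None on empty pages
text = " ".join((page.extract_text() or "") for page in PdfReader("news.pdf").pages)
text = re.sub(r"\s+", " ", text)  # collapse the whitespace PDF extraction leaves behind

# Greedily pack whole sentences into chunks of at most ~1000 characters
sentences = re.split(r"(?<=[.!?]) +", text)
chunks, current = [], ""
for sentence in sentences:
    if len(current) + len(sentence) < 1000:
        current += sentence + " "
    else:
        chunks.append(current.strip())
        current = sentence + " "
if current.strip():
    chunks.append(current.strip())

# Append one chunk per line so several documents can share the same vault file
with open("vault.txt", "a", encoding="utf-8") as f:
    for chunk in chunks:
        f.write(chunk.replace("\n", " ") + "\n")
```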
What command is used to start the RAG system in the script?
-The command used to start the RAG system is 'python localrag.py'.
How does the script handle the retrieval of information about Joe Biden?
-The script uses a search query for 'what did Joe Biden say' and retrieves context from the documents, showing chunks with mentions of President Biden.
What is the significance of setting 'top K' to three in the script?
-Setting 'top K' to three means that the system will pull three different chunks of information that are most relevant to the search query.
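In other words, 'top K' is the k in a nearest-neighbour lookup over the chunk embeddings. A minimal sketch, assuming the embeddings are already held in NumPy arrays (names are illustrative):

```python
import numpy as np

def top_k_chunks(query_vec, chunk_vecs, chunks, k=3):
    """Return the k chunks whose embeddings are most cosine-similar to the query."""
    sims = chunk_vecs @ query_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    best = np.argsort(sims)[::-1][:k]  # indices of the k highest similarities
    return [chunks[i] for i in best]
```

Raising k widens the context handed to the model; lowering it to one keeps only the single closest chunk, as the presenter notes later.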
What is the main advantage of using the described RAG system?
-The main advantage is that the RAG system is lightweight, easy to use, quick, and operates 100% locally without the need for an internet connection.
How can the user adjust the number of results retrieved by the RAG system?
-The user can adjust the 'top K' value in the script to change the number of results retrieved, for example, changing it to five for more results.
What is the recommended way to obtain the code for setting up the RAG system?
-The recommended way is to visit the GitHub repo provided in the description, clone the repository, and follow the instructions there.
What is the minimum requirement for the PDF file before it can be processed by the RAG system?
-The PDF must first be processed with 'python pdf.py', which extracts its text into the required format with each chunk on a separate line.
How can the user ensure that the RAG system is working correctly?
-The user can test the system by asking questions related to the content of the PDF, such as 'what did Joe Biden say', and checking if relevant chunks are retrieved.
Outlines
📄 PDF to Text Extraction and Local Search with Ollama
The speaker walks through extracting text from a PDF and converting it into a searchable format using a local pipeline. They demonstrate appending the PDF's text to a text file with each chunk on a separate line, which is the structure that works best for creating embeddings. The video then covers setting up an offline RAG system on top of Ollama, a tool that runs language models locally with no internet access. The setup involves downloading and installing Ollama, pulling a model, and running a Python script to create embeddings from the text. The speaker also shows how to change the number of search results returned and emphasizes the simplicity of the process, which requires only about 70 lines of code.
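As a rough sketch of that embedding step, the loop below reads the chunk file and requests one embedding per line through the ollama Python client. The vault.txt name and the choice of 'mistral' as the embedding model are assumptions; the repo may use a different client or model.

```python
import ollama  # pip install ollama

with open("vault.txt", encoding="utf-8") as f:
    vault = [line.strip() for line in f if line.strip()]

# One embedding per chunk; "mistral" as the embedding model is an assumption
embeddings = [
    ollama.embeddings(model="mistral", prompt=chunk)["embedding"]
    for chunk in vault
]
print(f"Created {len(embeddings)} embeddings of dimension {len(embeddings[0])}")
```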
🔍 Customizing the Local Ollama Search System
In the second part, the speaker continues the tutorial by guiding viewers on how to customize the local search system. They explain how to adjust the 'top K' parameter to change the number of results returned, from the default three to five for broader search results. The speaker also mentions that the size of the text chunks can be changed, though they do not go into detail in this video. The section concludes with an invitation to try out the system via the GitHub repository, give the project a star, and share it with friends. The speaker thanks the viewers and wishes them a great day.
Keywords
💡Local
💡Embeddings
💡Terminal
💡Ollama
💡GitHub
💡Model
💡RAG
💡Top K
💡Chunk
💡Tutorial
Highlights
The speaker converted news from the previous day into a PDF file to extract information offline using a local system.
The PDF content was appended to a text file with each chunk on a separate line, which is the desired data structure.
The speaker used a Python script called 'localrag.py' to create embeddings from the text for searching and querying.
The system can retrieve context from documents when asked questions, such as what Joe Biden said, by pulling relevant text chunks.
The speaker set the 'top K' parameter to three to retrieve three different chunks of context, which works best for their use case.
The system is running 100% locally and is lightweight, using only about 70 lines of code.
Ollama was used to serve the model locally; it is easy to install and run.
The speaker provided a tutorial and GitHub repo link for others to set up the system themselves.
The tutorial includes cloning the GitHub repo and installing dependencies using pip.
The 'pdf.py' script is used to upload a PDF file, which is then processed for text extraction.
Adjustments can be made to the system, such as changing the 'top K' value to retrieve more or fewer context chunks.
The speaker demonstrated how to adjust the system to retrieve the top five results instead of three.
The system can handle larger documents such as academic papers, with the 'top K' value adjusted as needed.
The speaker showed how the system can answer questions about specific topics in a document, such as 'sampling and voting'.
The paper in the example introduces a method called 'sampling and voting' for handling task queries using large language models (LLMs).
The setup is easy, quick, and suitable for local use on a personal computer, though not intended for enterprise-level applications.
The speaker encourages viewers to try the project, give it a star on GitHub, and share it with friends.
The speaker appreciates the audience's attention and provides a link to the GitHub repo in the description for further exploration.
Transcripts
Okay, so here we have a PDF file. I just gathered some news from yesterday and converted it to PDF, and what we're going to do now is try to extract information from this PDF with our local LLM, so we are now offline, right? I'm just going to run 'python pdf.py' and upload this to our text file before we create embeddings from it. Okay, so this was appended to our text file, each chunk on a separate line. Let's take a look at it: you can see this is the structure I want for my data. I don't know exactly how big the chunks are, but we want them on separate lines, because I found out that works best.

So if we go back to the terminal now, close this, and run 'python localrag.py', then we can start asking questions about our data. The document's embeddings have been created, and we can now use them. If we run the search query "what did Joe Biden say", you can see context pulled from the documents. We have these three chunks here: I set my top K to three, so we pulled three different chunks, and in all of them you can see President Biden, US President Biden, and probably Biden here too. And you can see Mistral's response: Joe Biden spoke with Prime Minister Benjamin Netanyahu. So we get the answer here from Mistral, and this is running on Ollama, 100% locally. If we go down here you can see I'm not online, so yeah, it's working great. And the good thing is that it's very short, only about 70 lines of code, so this is all you need, and it's been working great. So I'm going to show you today how you can set this up yourself and go through the full tutorial, and it's going to be open on our GitHub repo so you can download it and try it out for yourself.
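The last step of the demo above, where Mistral answers from the retrieved chunks, might look like the following sketch using the ollama Python client's chat call; the prompt template and function name are illustrative, not the repo's exact code.

```python
import ollama  # pip install ollama

def answer(query, context_chunks):
    # Paste the retrieved chunks into the prompt and let Mistral answer from them
    context = "\n".join(context_chunks)
    reply = ollama.chat(
        model="mistral",
        messages=[{
            "role": "user",
            "content": f"Use only this context to answer:\n{context}\n\nQuestion: {query}",
        }],
    )
    return reply["message"]["content"]

print(answer("what did Joe Biden say", ["US President Biden said ..."]))
```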
Okay, so I'm just going to walk you through the steps you need to follow to make this work. Head over to ollama.com/download and download Ollama for whatever OS you're using. Today I'm using Windows, so I'm just going to click on Windows and install Ollama; pretty straightforward. After you've done that, head over to the terminal and run the 'ollama pull' command. I'm going to pull Mistral, but you can pull whatever model you want here. Since I've already done this, it's 4.1 GB, and that is pretty much it. You can check it now by doing 'ollama run mistral', and that should be it: send the message "hello" and you can see Ollama is running here now. So it's very easy to set up.
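A quick way to confirm that Ollama is serving and the pull worked is to query its documented /api/tags endpoint, which lists the models available locally:

```python
import json
import urllib.request

# /api/tags lists the models available locally; after 'ollama pull mistral'
# the list should include something like 'mistral:latest'
with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    models = [m["name"] for m in json.loads(resp.read())["models"]]
print(models)
```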
The next step is just to follow the link in the description. You will come to my GitHub repo; just follow the instructions there. You can start by cloning this repo (you can fork it too if you want to), so go to your terminal, clone it, and cd into it. Then we're just going to install our dependencies with 'pip install -r requirements.txt'; I have this installed already.
Then we can just start: if we have a PDF file, we can do 'python pdf.py' and upload our PDF like this. It has been appended, right, so let's just close that. Next part: run 'python localrag.py', and we should be good to go now. Hopefully we will see our embeddings, and we can ask "what did Joe Biden say". We are pulling the context, and hopefully we will get an answer from Mistral now. Pretty good, yeah, so a very easy setup. Of course, I'm going to go through some quick adjustments you can make, so you know how to do that if you want to. Let's do that now.
Okay, so let's say you wanted to upload a PDF of a paper ('More Agents Is All You Need', a pretty big paper), and now we want to make some adjustments. I want to bring in the top five results instead of three, so I'm just going to change this top K here to five. You can also change it down to one if you only want the best result, the one that matches the cosine similarity with the user input most closely. But let's put it to five now, head over to our terminal, do 'python pdf.py', and upload the agent paper instead. Okay, so let's close that, and if we open the file now you can see it's a bit bigger, but it's the same format. This paper mentions sampling and voting a lot, so if we go here now and ask, say, "what does the paper say about sampling and voting", you can see we have our embeddings; that's good. And if I run this now, you can see we bring in more chunks, two, three, or at least more information, and you can see: the paper introduces a method called sampling-and-voting for handling task queries using LLMs. I think that's a pretty good answer.

So those are some adjustments you have to play around with yourself. You can also play around with changing how big the chunks are; I'm not going to go into detail on that in this simple setup, but it is something you can do yourself. I think that's pretty much it, that's what I wanted to cover. I just found it neat that you can create a pretty okay offline RAG system in so few lines of code, and I really found it helpful: it's lightweight, easy to use, quick, and 100% local. Of course it's not perfect, that was not the point either, but it's good enough for my use case. So maybe don't use this at your company, but on your local PC, sure. Again, if you want to try this out, just head over to the repo; you can find the link in the description. I would really appreciate it if you gave the project a star, and if you want to share it with your friends, that's cool. Thank you for tuning in, and have a great day.