Python RAG Tutorial (with Local LLMs): AI For Your PDFs
Summary
TLDR: This tutorial video guides viewers on building a Python RAG application for querying information from a set of PDFs using natural language. It covers advanced features like running the app locally with open-source LLMs, updating the vector database, and evaluating AI responses. The host demonstrates how to index data sources, utilize embeddings, and integrate with local or online models for generating natural language responses, concluding with unit testing strategies to ensure quality.
Takeaways
- 📚 The video demonstrates building a Python RAG (Retrieval-Augmented Generation) application for querying information from a set of PDFs, specifically board game instruction manuals.
- 🔍 It introduces advanced features for the RAG application, including running it locally with open-source LLMs (Large Language Models) and updating the vector database without rebuilding from scratch.
- 🛠️ The tutorial covers the process of setting up the application, from gathering documents to using a PDF document loader and splitting the content into smaller chunks for indexing.
- 📈 The importance of creating embeddings for each chunk of text is highlighted, as these serve as keys in the vector database and are crucial for the RAG system to function effectively.
- 🔧 The video explains how to use ChromaDB as the vector database and how to tag each chunk with a unique ID to manage updates and additions to the database.
- 🔄 It shows how to detect new documents and update the database by checking for unique IDs, allowing for incremental updates instead of full rebuilds.
- 🤖 The application uses an LLM to generate responses to queries, with the video providing a demonstration of how the system formulates answers using context from the PDFs.
- 🔬 The video discusses the evaluation of the AI-generated responses through unit testing, using an LLM to judge the equivalence of expected and actual responses.
- 🔗 The video provides a GitHub link for those interested in accessing the full project code and running the application themselves.
- 💡 The tutorial encourages viewers to suggest further topics for future videos, such as deploying the application to the cloud, fostering a community of learners.
- 🚀 The video concludes by emphasizing the learning outcomes, such as using different LLMs, updating databases, and testing application quality, and invites viewers to engage with the content.
Q & A
What is the primary purpose of the application built in the video?
-The application is designed to allow users to ask natural language questions about a set of PDFs, specifically board game instruction manuals, and receive answers along with references to the source material.
What does RAG stand for and what is its role in the application?
-RAG stands for Retrieval-Augmented Generation. It is a method for indexing a data source so that it can be combined with a Large Language Model (LLM), giving an AI chat experience that leverages the indexed data.
How does the application handle the process of updating the vector database with new entries?
-The application updates the vector database by first giving each chunk of text a unique and deterministic ID based on the source file path, page number, and chunk number. It then checks if the chunk exists in the database; if not, it adds the new chunk.
What is the significance of embeddings in the context of this application?
-Embeddings are a key component in the application, serving as a numerical representation of the text chunks and queries. They are used to fetch the most relevant entries from the vector database when a question is asked.
What is the role of the Ollama server in the application?
-The Ollama server is used to run open-source LLMs locally on the user's computer. It provides the capability to generate responses using a local model, which can be more efficient and cost-effective than relying solely on online models.
How does the application handle the case of adding new PDFs or pages to an existing PDF?
-The application detects new documents or pages by comparing the unique IDs of the existing chunks in the database with the new chunks derived from the added PDFs or pages. Only the new chunks that do not exist in the database are added.
What is the significance of using a unique but deterministic ID for each chunk?
-Using a unique but deterministic ID for each chunk ensures that the application can accurately identify whether a chunk already exists in the database, allowing for efficient updates and avoiding duplication.
What is ChromaDB and how does it fit into the application?
-ChromaDB is a vector database used in the application to store the embeddings of the text chunks. It allows for efficient retrieval of the most relevant chunks when a query is made.
How does the application evaluate the quality of AI-generated responses?
-The application uses unit testing with a helper function that creates a prompt for an LLM to judge whether the expected response and the actual response are equivalent in meaning, despite potential differences in wording.
What is the importance of testing the application with both positive and negative test cases?
-Testing with both positive and negative cases helps ensure the robustness of the application. Positive cases confirm that the application works correctly with expected inputs, while negative cases verify that it can correctly identify and handle incorrect or unexpected inputs.
How can users access the full project code and run the application themselves?
-Users can access the full project code by visiting the GitHub link provided in the video description. This allows them to download the code and run the application end-to-end as demonstrated in the video.
Outlines
🛠️ Building a Python RAG Application
This paragraph introduces a project to create a Python application using Retrieval-Augmented Generation (RAG). The app is designed to answer questions about a set of PDF documents, specifically board game instruction manuals for games like Monopoly and CodeNames. The video promises to cover advanced features, including running the app locally with open-source Large Language Models (LLMs), updating the vector database with new entries without rebuilding from scratch, and testing the AI's responses. The paragraph also provides a quick demo of the app in action, explaining the basic concept of RAG and how it combines an LLM with indexed data to provide natural language responses.
📚 Document Preparation and Embedding Creation
The second paragraph delves into the process of preparing documents for the RAG application. It discusses gathering PDFs as source material and using the LangChain library to load documents and split them into smaller, manageable chunks. The importance of creating an embedding for each chunk is emphasized, since the embedding serves as the key in the database, and the same embedding function must be used for indexing and querying. The paragraph also mentions different embedding options, such as AWS Bedrock and Ollama, and how to integrate them into the application. The process of building a vector database with ChromaDB and updating it with new or modified documents is outlined, including the use of unique IDs for each chunk to avoid duplication.
🔄 Updating the Database with New Content
This paragraph focuses on the functionality of updating the vector database with new PDFs or changes to existing documents. It explains how to detect new documents and avoid re-adding existing ones, ensuring efficient database management. The paragraph also touches on the challenge of updating modified content within a document, hinting at solutions but stating it's beyond the current scope. The code snippets provided demonstrate how to add new documents to the database using unique IDs, so that only new content is added and existing chunks are not duplicated.
🤖 Integrating the LLM for Response Generation
The fourth paragraph describes the integration of a local Large Language Model (LLM) for generating responses to user queries. It details the process of creating a Python script that takes a query, uses an embedding function to search the database for relevant chunks, and constructs a prompt for the LLM. The paragraph explains how to retrieve the most relevant context from the database and combine it with the user's question to form a complete prompt. It also discusses using the LLM to generate a response and how to handle different approaches to local versus online embeddings, including using an Ollama server for local embeddings or an online service like AWS Bedrock for better quality.
📝 Evaluating AI Response Quality with Unit Testing
The final paragraph addresses the evaluation of the AI-generated responses' quality through unit testing. It introduces the concept of writing test cases with expected answers and using an LLM to judge the equivalence of the expected and actual responses. The paragraph outlines creating a prompt template for the LLM to evaluate response correctness and suggests using an LLM's judgment to determine if the test passes or fails. It also mentions the importance of including both positive and negative test cases and setting a threshold for acceptable test success rates. The paragraph concludes with a demonstration of running test cases and adjusting assertions to reflect the correctness of the responses.
Keywords
💡RAG
💡LLM (Large Language Model)
💡Embedding
💡Vector Database
💡ChromaDB
💡Ollama
💡Unit Testing
💡LangChain
💡Natural Language Processing (NLP)
💡Local LLM Model
Highlights
Building a Python RAG application to answer questions about a set of PDFs using natural language.
Using board game instruction manuals as the data source for the RAG application.
Introduction of advanced features for the RAG tutorial, including local running and vector database updates.
Demonstration of how to get the RAG application running locally using open source LLMs.
Explanation of how to update the vector database with new entries without rebuilding from scratch.
Overview of testing and evaluating the quality of AI-generated responses for the app.
Recap and explanation of the RAG concept: Retrieval-Augmented Generation.
Demonstration of the completed app's ability to answer questions about board game instructions.
Use of a local LLM model to generate responses in the app.
Behind-the-scenes explanation of how the app processes the data and queries.
Focus on main features and speeding through other parts for experienced viewers.
Instructions on installing or updating main dependencies for the RAG project.
Guide on gathering and preparing PDF documents as the source material.
Use of the LangChain library for loading documents and its document loader options.
Splitting documents into smaller chunks for indexing and storage.
Creating an embedding function for database indexing and querying.
Recommendation to use the same embedding function for database creation and querying.
Discussion on using AWS Bedrock for embeddings and the option to use local models like Ollama.
Process of creating a vector database with the chunks and their unique IDs.
Explanation of how to add new PDFs to the database without recreating it from scratch.
Challenge of updating modified pages already in the database, left out of scope for this video.
Unit testing approach to evaluate the quality of responses from the RAG application.
Use of an LLM to judge the equivalence of responses in the testing process.
Writing unit tests with sample questions and expected answers for the RAG application.
Demonstration of the testing process and the use of assertions to validate responses.
Inclusion of both positive and negative test cases for comprehensive evaluation.
Final thoughts on the project, invitation for further topics, and reference to GitHub for code.
Transcripts
In this video, we're going to build a Python RAG
application that lets us ask questions about
a set of PDFs we have using natural language.
The PDFs I'm going to use here are a bunch of board game instruction
manuals for games like Monopoly or CodeNames.
I can ask questions about my data, like "how do I
build a hotel in Monopoly?" The app will give me
an answer and a reference to the source material.
Now, I have done a basic RAG tutorial before on this
channel, but in this video we're going to take it up
a notch by introducing some more advanced features
that you guys asked about in the comments last time.
We're going to cover how to get it running locally
on your computer using open source LLMs.
I'll also show you how to update the vector database with new entries.
So if you want to modify or add information, you can do that
without having to rebuild the entire database from scratch.
Finally, we'll take a look at how we can test and evaluate
the quality of our AI generated responses.
This way you can quickly validate your app whenever you make
a change to the data source, the code or the LLM model.
All right, let's get started.
If you haven't built an app like this before,
then I highly recommend checking out my
previous video tutorial on this topic first.
It will help you to get up to speed with all of the basic concepts.
Otherwise, here's a quick recap. RAG stands for Retrieval
Augmented Generation, and it's a way to index a
data source so that we can combine it with an LLM.
This gives us an AI chat experience that can leverage that data.
Here's a quick demo of the completed app.
I have my Python script here and I'm going to
ask a question about my data source, which
is going to be board game instruction manuals.
So I can ask, "how do I build a hotel in Monopoly?"
And the result is that it gives me a response based on the
data that it found in the PDF sources that I provided it.
So the response is going to use that and actually phrase
it into a proper natural language response.
It's not just going to copy and paste the raw data source.
And here it's telling me that if I want to build
a hotel, I need to have four houses in a single
color and then I can buy the hotel from the bank.
And in this version of the app, I'm also using
a local LLM model to generate this response.
So here I have my Ollama server running in a separate terminal.
If you don't know what that is yet, that's okay. We'll cover it later.
But here's the actual LLM reading the question
and then turning this into a response.
Here's a quick recap on how that all works behind the scenes.
First, we have our original data source, the PDFs.
This data is going to be split into small chunks
and then transformed into an embedding
and stored inside of the vector database.
Then when we want to ask a question, we'll also turn our query into an embedding.
This will let us fetch the most relevant entries from the database.
We can then use those entries together in a prompt
and that's how we get our final response.
For this tutorial, we're going to mainly focus on the
features I mentioned at the beginning of the video.
But for everything else, we're going to be speeding through it a little bit.
So if you feel like it's all going a little bit
too fast, you can either check out my previous
RAG tutorial video first to learn the basics.
Or you could also follow along by looking through the code itself on GitHub.
Links will be in the description.
Here are the main dependencies I'll be using in this project.
So go ahead and install or update them first before you start.
First, we'll need some data to feed our RAG application with.
Gather some documents that you'd like to use as your source material.
In my previous video, a lot of you asked me how to do this with PDFs.
So I'm going to be using PDFs here.
I'm going to use board game instruction manuals.
I've got one for Monopoly and I've also got one for Ticket to Ride.
And I just found these for free online.
So you can use whatever you want, but this is what I'm going to use here.
Just download the PDFs you want to use online and then put them inside a folder.
In this case, I've put it inside this data folder here in my project.
This is the code I can then use to load the documents from inside that folder.
It's using a PDF document loader that comes with the LangChain library.
And for future reference, if you want to load other types of
documents, you can head over to the LangChain documentation.
Look up document loaders and then just pick from any
of the various available document loaders here.
There's things for CSV files, a directory, HTML, Markdown and Microsoft Office.
And if that's still not enough, you can click
on the document loader integrations and there's
a whole list of third-party document loaders
available for you to choose from as well.
And if you want to see what one of these documents
looks like after you've loaded it,
you could just go ahead and print it out.
You should see an object like this.
So each document is basically an object containing
the text content of each page in the PDF.
It also has some metadata attached, which tells
you the page number and the source of the text.
Our next problem is that each document or each page
of the PDF is probably too big to use on its own.
We'll need to split it into smaller chunks, and we can use LangChain's
built-in recursive text splitter to do exactly that.
After you run that on your documents, you'll find that each chunk is a lot smaller.
So this is going to be handy when we index and store the data.
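The splitting step can be illustrated with a small, dependency-free sketch. This is a simplified stand-in, not LangChain's recursive text splitter: it cuts text into windows of at most `chunk_size` characters, prefers to break at a space, and overlaps consecutive chunks by up to `chunk_overlap` characters so that text cut at a boundary usually survives intact in a neighbouring chunk. The function name and the default sizes are illustrative assumptions.

```python
def split_text(text: str, chunk_size: int = 800, chunk_overlap: int = 80) -> list[str]:
    """Split text into chunks of at most chunk_size characters, with some overlap.

    Simplified illustration only (not LangChain's RecursiveCharacterTextSplitter).
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    chunks: list[str] = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            # Prefer to cut at the last space inside the window.
            cut = text.rfind(" ", start, end)
            if cut > start:
                end = cut
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        if end >= len(text):
            break
        # Step back by the overlap, but always move forward to guarantee progress.
        start = max(end - chunk_overlap, start + 1)
    return chunks
```

Smaller chunks make each stored entry more focused, while the overlap trades a little duplicated text for fewer sentences split across two chunks.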
Next, we'll need to create an embedding for each chunk.
This will become something like a key for a database.
I actually recommend creating a function that returns
an embedding function because we're actually going to
need this embedding function in two separate places.
The first is going to be when we create the database itself.
And the second is when we actually want to query the database.
And it's very important that we use the exact same
embedding function in both of these places.
Otherwise, it's not going to work.
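A minimal sketch of the "function that returns an embedding function" idea. It assumes a toy hashed bag-of-words embedding in place of a real model such as Bedrock, Ollama, or OpenAI, so the vectors are only useful for showing the structure. The name `get_embedding_function` and the 64-dimension size are illustrative. The point is that indexing and querying both call the same factory, so their vectors live in the same space.

```python
import hashlib
import math
import re
from typing import Callable

Embedding = list[float]


def get_embedding_function(dim: int = 64) -> Callable[[str], Embedding]:
    """Return the single embedding function to use for BOTH indexing and querying.

    Toy example: hashes lowercase word tokens into `dim` buckets, then L2-normalizes.
    """

    def embed(text: str) -> Embedding:
        vec = [0.0] * dim
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            # md5 gives a stable bucket across runs (Python's built-in hash() is salted).
            bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
            vec[bucket] += 1.0
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    return embed
```

If the database were built with one embedding function and queried with another (a different model, or even a different `dim` here), similarity scores between query and chunk vectors would be meaningless, which is the failure described above.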
LangChain also comes with a lot of different embedding functions you can use.
In this case, I'm using AWS Bedrock because I tend
to build a lot of stuff using AWS already.
And the results are pretty good, from what I can tell.
But you can switch to using a different embedding function as well.
You can choose from any of the embedding integrations
listed here on the LangChain website.
For example, if you want to run it completely locally on your
own computer, you can use an Ollama embedding instead.
Of course, for this to work, you also need to install Ollama
and run the Ollama server on your computer first.
If you haven't used Ollama before, you can think
of it as a platform that manages and runs
open source LLMs locally on your computer.
Just download it from the official website, ollama.com, and then install any of the available
open source models like Llama2 or Mistral.
You can then run this command to serve the model as a REST API on your local host.
Now, you'll be able to use an LLM just by calling this local API.
Of course, the LangChain module for Ollama embeddings will handle
all of this for you as long as the server is running.
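For reference, here is a minimal sketch of calling a locally served model over Ollama's REST API with only the standard library. It assumes the server was started with `ollama serve` on Ollama's default port 11434, that the `mistral` model has already been pulled, and that the `/api/generate` endpoint is used with streaming disabled; check the Ollama API documentation for your version. The request builder is kept separate so it can be inspected without a running server.

```python
import json
import urllib.request

# Assumption: Ollama's default local address and the non-chat generate endpoint.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "mistral") -> urllib.request.Request:
    """Build a non-streaming /api/generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "mistral") -> str:
    """Send the prompt and return the generated text (needs the server running)."""
    with urllib.request.urlopen(build_generate_request(prompt, model)) as resp:
        # With streaming disabled, the reply is one JSON object with a "response" field.
        return json.loads(resp.read())["response"]
```

In the project itself, LangChain's Ollama integration makes this kind of call for you; the sketch just shows what "using an LLM by calling a local API" amounts to.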
However, just as a heads up, for my own testing
using one of the 4GB models on Ollama, the
embedding results just weren't very good.
For RAG apps, having good embeddings is essential,
otherwise your queries won't match up with the chunks
of information that are actually relevant.
So for myself on this project, I'm still going to use a
service like OpenAI or AWS Bedrock for the embeddings.
But if your computer can handle it, you can try
using a larger, more powerful model on Ollama
as well, and please let me know how that goes.
By the way, some of you might be wondering at this point,
how did I measure the quality of the embeddings?
Well, we'll get to that later when we look at testing.
Now let's walk through the process of creating the database.
Once we have the documents split into smaller chunks, we can use
the embedding function to build a vector database with it.
So just as a quick recap, a vector is something like
a list of numbers, and our embeddings are actually
a vector because they're just a list of numbers.
So a vector database lets us store information
using vectors as something like a key.
And in this video, we're going to be using ChromaDB as our vector database.
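To make "vectors as keys" concrete, here is a toy in-memory store. It keeps `(vector, text)` entries under string IDs and returns the top `k` IDs by cosine similarity to a query vector. It only illustrates what a vector database does and is not ChromaDB's API; the class and method names are hypothetical.

```python
import math
from dataclasses import dataclass, field


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class ToyVectorStore:
    """Hypothetical in-memory vector store keyed by string IDs."""

    entries: dict[str, tuple[list[float], str]] = field(default_factory=dict)

    def add(self, item_id: str, vector: list[float], text: str) -> None:
        # Re-adding an existing ID overwrites it instead of duplicating it.
        self.entries[item_id] = (vector, text)

    def search(self, query: list[float], k: int = 5) -> list[tuple[str, float]]:
        """Return up to k (id, score) pairs, most similar first."""
        scored = [(item_id, cosine(query, vec)) for item_id, (vec, _text) in self.entries.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]
```

A real vector database adds persistence and approximate nearest-neighbour indexes so search stays fast with many entries, but the query-by-similarity idea is the same.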
In my first video, we actually had code that looked
a lot like this, and it's useful if we wanted
to create a brand new database from scratch.
But what if we wanted to add or update items in an existing database?
ChromaDB will let us do this too, but first we'll
need to tag every item with a string id.
Let's go back to our chunk of text and figure out how we can do this.
So as you can see, each chunk already has its source file path and a page number.
So what if we put it together to do something like this?
We'll use the source path, the page number, and then the chunk number of that page.
Because remember, a single page could have several chunks.
That way, every chunk will have a unique but deterministic id.
We can then use this to see if this particular chunk exists in
the database already, and if it's not, then we can add it.
Implementing this is pretty easy as well.
We can loop through all the chunks and look at its metadata.
We'll concatenate the source and the page number to make an id.
But because a single page is split up into multiple chunks,
we actually have many chunks sharing the same page id.
Solving this is pretty easy though.
We can just keep count of the chunk index for a page,
and then reset it to zero whenever we see a new page.
So putting all that together, we now have a
chunk id that looks something like these.
Each chunk is now guaranteed a unique and deterministic id.
Let's add it back into the metadata of the chunk as well so we can use it later.
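A sketch of the ID scheme described above. It assumes each chunk carries `source` and `page` in its metadata (as the loaded PDF documents do) and that chunks from the same page arrive consecutively, which is the order a splitter produces when it runs over the pages in sequence. `Chunk` is a minimal stand-in for a LangChain document, and `calculate_chunk_ids` is an illustrative name.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Minimal stand-in for a LangChain document: text plus a metadata dict."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def calculate_chunk_ids(chunks: list[Chunk]) -> list[Chunk]:
    """Tag each chunk with a deterministic ID of the form 'source:page:chunk_index'."""
    last_page_id = None
    chunk_index = 0
    for chunk in chunks:
        page_id = f"{chunk.metadata.get('source')}:{chunk.metadata.get('page')}"
        # Same page as the previous chunk: next index. New page: reset to 0.
        chunk_index = chunk_index + 1 if page_id == last_page_id else 0
        chunk.metadata["id"] = f"{page_id}:{chunk_index}"
        last_page_id = page_id
    return chunks
```

Because the ID depends only on where the chunk came from, re-running this over the same files yields the same IDs. For the same reason, it cannot tell when the text on a page has changed.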
Now, if we add new PDFs or add new pages to an existing
PDF, our system will have a way to check
whether it's already in the database or not.
So let's hop over to the code editor and see this in action.
Currently, in my data folder, I've got a Monopoly PDF and a Ticket to Ride PDF.
So now I'm going to add a new PDF to this folder.
It's going to be the one for CodeNames.
This is the one I'm adding.
So now when I populate the database, I want my program to detect
that this one is new, but the other two already exist.
So I only want this one to be added.
So here, right away, it's quickly detected
that there's 41 documents already inside the
database, but we have 27 new documents
that we need to add just because I moved that
new pdf into the data directory as well.
So that was a new one.
And this time, even if we run the same command
again to populate the database, it can see that
all the documents, all the pdfs inside that
data folder have already been added from the previous
step and there was nothing new to add.
So this is exactly the behavior that we want.
Although this implementation will let us add
new data without having to recreate the entire
database itself, it's actually not enough
for us if we wanted to edit an existing page.
For example, if I modify the pdf content in this chunk
here, the chunk ID will still be exactly the same.
So how do we know when we need to actually update this page?
This problem is out of scope for today, but
there's actually many ways to solve this.
If you think you know the solution, then please share it in the comments.
Now let's close the loop on this and actually take a look
at the code that you need for updating your database.
Now that we've given every chunk a unique ID, let's add them to the database.
If you're using Chroma, you can first load up your database like
this, using the same embedding function we used earlier.
Let's go through all the items in the database and get all of the IDs.
If you're running this for the very first time, then this should be an empty set.
After that, we can filter through all of the chunks we're about to add.
If we don't see an ID inside the set, that means
it's a new chunk and we should add it.
From there, it's all pretty easy.
It's just a few lines to add the documents to the database.
Just don't forget to also add the IDs explicitly as well.
If you don't specify a matching list of IDs for
the items that you're adding, then Chroma will
generate new UUIDs for us automatically.
It's convenient, but it also means that we won't be able
to check for the existing items like we did earlier.
So if that's the case, when we try to add new
items, we're just going to end up with a
lot of duplicated items inside the database.
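The filtering logic can be sketched independently of Chroma. Here a plain dict keyed by chunk ID stands in for the database, and chunks are dicts whose metadata already holds the `id` from the previous step. Both are assumptions for illustration. The key detail mirrors the point above: the IDs are passed explicitly, so a second run finds everything already present instead of inserting duplicates under fresh random IDs.

```python
def select_new_chunks(chunks: list[dict], existing_ids: set[str]) -> tuple[list[dict], list[str]]:
    """Return chunks whose ID is not yet stored, plus their IDs in the same order."""
    new_chunks = [c for c in chunks if c["metadata"]["id"] not in existing_ids]
    new_ids = [c["metadata"]["id"] for c in new_chunks]
    return new_chunks, new_ids


def populate(store: dict[str, str], chunks: list[dict]) -> int:
    """Add unseen chunks to a dict-backed store keyed by chunk ID; return how many were added."""
    new_chunks, new_ids = select_new_chunks(chunks, set(store))
    for chunk_id, chunk in zip(new_ids, new_chunks):
        # Store under the explicit, deterministic ID (never an auto-generated one).
        store[chunk_id] = chunk["page_content"]
    return len(new_ids)
```

Running `populate` twice over the same chunks adds everything the first time and nothing the second, which matches the behaviour shown in the demo.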
Now let's put all this together and make this not just
functional, but also able to run locally as well.
If you were using Ollama's local embeddings from before, you'll
be able to do everything 100% locally, end to end.
Or you might end up with more of a hybrid approach like me.
I use an online embedding model because it's better than what I can do locally.
But I found that as long as the embeddings are good,
I can actually get pretty impressive results using
a local LLM to do the actual chat interface.
So that's what we're going to do here.
We can start by creating a new Python script or
function that will take our query as input.
We'll also have to load the embedding function and the database.
We'll need to prepare a prompt for our LLM.
Here's the template I'm going to use.
There's two variables we'll need to replace here.
First is the context, which is going to be all the chunks
from our database that best matches the query.
And then second, it's the actual question that we want to ask.
So we'll put that whole thing together and then we get
the final prompt that we want to send to our LLM.
To retrieve the relevant context, we'll need to search
the database, which will give us a list of
the top K most relevant chunks to our question.
Then we can use that together with the original
question text to generate the prompt.
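A sketch of the prompt-building step. The template wording is an assumption, since the video shows its own template on screen, but it has the two placeholders described above, `{context}` and `{question}`. The retrieved chunks are joined with a visible separator so the model can tell them apart; `build_prompt` is an illustrative name.

```python
PROMPT_TEMPLATE = """Answer the question based only on the following context:

{context}

---

Answer the question based on the above context: {question}"""


def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Fill the template with the top-k retrieved chunk texts and the user's question."""
    context = "\n\n---\n\n".join(retrieved_chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)
```

Placing the question again after the context, as the template does, keeps it close to where the model starts generating.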
If you decide to print out the entire prompt at
this stage, you should see something like this.
So you've got your entire prompt template here, but you
could see that our context section already has some of
the chunks from the instruction manual formatted in.
And I put my k=5, so there's actually five different chunks.
And this is all part of one big prompt.
This is the information that my system thought
was the best matching to answer our query.
And then I kind of reiterate the question that I want right
at the end after I've given all of this context.
So here the question is, how many clues can I give in CodeNames?
And the response is, in CodeNames you can only give one
clue per turn, and the clue should be a single word.
And then I also have the sources of this answer cited here,
so that's basically where all these chunks were found.
After you have the prompt, the rest is super easy.
All you have to do is just invoke an LLM with the prompt.
Here I'll use the Mistral model on my local Ollama server.
It only needs four gigabytes to run, but it's actually quite capable.
And if you want, you can also get the original source of the text like this.
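Citing sources can be as simple as reading the stored chunk IDs back from the search results. This sketch assumes results arrive as `(chunk, score)` pairs, most relevant first, where each chunk's metadata holds the `source:page:index` ID described earlier. The function name and output format are illustrative.

```python
def format_response(response_text: str, results: list[tuple[dict, float]]) -> str:
    """Append the IDs of the retrieved chunks as citations, in retrieval order."""
    sources = [chunk["metadata"].get("id") for chunk, _score in results]
    return f"Response: {response_text}\nSources: {sources}"
```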
Now let's go back to our terminal and see this in action.
So I'm going to use this program and I'm going to query it.
How do I get out of jail in Monopoly?
And now the program stopped running, so let's go and see what it did.
Here you can see that we find all the relevant chunks.
So this one is the most relevant, and it's actually spot on.
It actually gives us step-by-step instructions on how to get out of jail.
So I think really this is the only one we need.
But anyways, we put our limit to five, so we also get a bunch
of other chunks that may be relevant to the question.
And then as part of the prompt, we reiterate the question
again so that our LLM knows what to answer.
And using all of that information, this is the response our LLM came up with.
So it came up with four different ways we can get out of jail in Monopoly.
And then right at the end, we also have the sources of all of this information.
So that's what it's like when we run the entire application.
And even though I used AWS Bedrock for the embeddings,
because I couldn't get local embeddings
that were good enough, the part that generates
the response still uses a local Ollama server.
So if I go to my other terminal here, see where
my Ollama server is running, you could
see it logging the work that we're doing.
We now have a RAG application that works quite well end-to-end.
We can get it to answer our questions by using the embedded
source material, but the quality of the answers we
get would depend on quite a lot of different factors.
For example, it could depend on the source material
itself, or the way we split the text.
And it will also 100% depend on the LLM model we
use for the embedding and the final response.
So the problem we have now is, how do we evaluate the quality of responses?
This seems to be a subjective matter.
Let's see if we can approach this with unit testing.
If you've never worked with unit tests in Python
before, then you can also check out my other
video on how to get started with pytest.
The main idea here is to write some sample questions and also
provide the expected answer for each of those questions.
So given a question like, "How much total money does
a player start with in Monopoly?", the answer I'd
expect my RAG application to respond with is 1500.
You want it to be something that you can already
validate or already know the answer for.
We can then run the test by passing the question
into our actual app, and then comparing
and asserting that the answer matches.
But the challenge with this is that we can't do
a strict equality comparison, because there could
be many ways to express the right answer.
So what we can do instead is actually use an LLM to judge the answer for us.
This won't always guarantee perfect results, but it does get us pretty close.
We can start by having a prompt template like this, that asks
the LLM to judge whether these responses are equivalent.
Then, as part of our test, we'll query the
RAG app with our question, and then we'll
create a prompt based on the question, the
expected response, and the actual response.
We can then invoke our LLM again to give us its opinion.
We can clean up the response we get from that, and
finally check whether the answer is true or false.
And this is something we'll actually be able to assert on as part of our unit test.
So putting all that together, I can wrap this into
a nice helper function that returns true or false.
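A sketch of such a helper. The evaluation prompt wording and the name `query_and_validate` are assumptions. The RAG app and the judge model are passed in as plain callables so the logic can be exercised without a model; in the real project both would call an LLM. The judge's reply is normalized and must be "true" or "false". Anything else raises, so an ambiguous verdict cannot silently pass.

```python
from typing import Callable

EVAL_PROMPT = """Question: {question}
Expected Response: {expected_response}
Actual Response: {actual_response}
---
Answer with only 'true' or 'false': does the actual response mean the same as the expected response?"""


def query_and_validate(
    question: str,
    expected_response: str,
    rag_app: Callable[[str], str],
    judge_llm: Callable[[str], str],
) -> bool:
    """Ask the RAG app a question, then ask a judge LLM whether the answers are equivalent."""
    actual_response = rag_app(question)
    prompt = EVAL_PROMPT.format(
        question=question,
        expected_response=expected_response,
        actual_response=actual_response,
    )
    # Clean up the judge's reply, e.g. " True. " -> "true".
    verdict = judge_llm(prompt).strip().lower().strip(".")
    if verdict == "true":
        return True
    if verdict == "false":
        return False
    raise ValueError(f"Judge returned neither 'true' nor 'false': {verdict!r}")
```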
Then, I can just write a bunch of unit tests using that helper
function, and I can write as many test cases as I want.
This will give me a quick way to see how well my application
is performing, especially after I make updates to
the code, the source documents, or the LLM model itself.
Now let's hop over back to our editor to do a quick demo.
So I've got my test file here, and here is the helper
function that you saw earlier, and here is us trying
to interpret that result into either a true
or a false result, and here is the prompt template.
So these are going to be my two test cases.
I'm going to test the Monopoly rules, and I'm
also going to test the Ticket to Ride rules.
So two test cases. Let's see how it does.
Okay, and in this case, both of my test cases passed.
Let's expand this window and actually take a bit of a closer look.
So here, my expected response is 10 points,
and the actual response is "The longest continual
train gets a bonus of 10 points."
So these are not exactly the same string,
but they're still saying the same thing.
And this is true. So this was successful.
And then if I go up to my Monopoly one, the expected response
is 1,500, and the actual response is also 1,500.
And as you can see again, the format is slightly
different, so we need the LLM to tell us whether
or not these actually mean the same thing.
So this one passed as well. In this case, both of our tests passed.
Now, we have to be careful with this because we
don't know whether it passed because the evaluation
was good and the answer was correct, or
if our LLM turns out to be too generous, we might
actually end up passing the wrong answers.
So it's also good to do a negative test case to kind of check that.
So what we could do is we can turn this expected
response into something we know that's wrong
and then check that it actually fails.
We want it to fail in that case.
So I'm going to put 9999.
Okay, and I'm now running that test again, expecting this case to fail.
And here it actually does fail, which is good. That's exactly what we wanted.
So we have our fake expected response of 9999,
and then the actual response is still the same
from when we asked it before, which is 1,500.
And our LLM evaluation correctly determines that this is the wrong response.
So our test will fail in this case, and our entire test suite will fail.
However, if we want a failing test, if we want this
negative case to be used as part of our suite in the
correct way, what we could actually do is go back
to our test case here and then invert the assertion.
So instead of asserting that this is true, we can
assert that this is actually going to fail.
And that also tells us that this answer should be judged wrong,
and if it isn't, something is wrong, if that makes sense.
So let's go ahead and run this again.
So this time the LLM still believes that the
response doesn't match, and it's false.
But because we've inverted the assert case, the
entire test suite still manages to pass.
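The positive and inverted negative cases can be written as ordinary pytest-style functions. This sketch replaces both models with stubs: a hypothetical `fake_rag_app` that always returns the same Monopoly answer, and a crude `fake_judge` that only checks whether the digits of the expected answer appear in the actual one. It demonstrates the assertion pattern, not the quality of a real LLM judge.

```python
def fake_rag_app(question: str) -> str:
    """Stub standing in for the real RAG pipeline (hypothetical)."""
    return "Each player starts with $1,500."


def _digits(text: str) -> str:
    return "".join(ch for ch in text if ch.isdigit())


def fake_judge(expected: str, actual: str) -> bool:
    """Crude stand-in for the LLM judge: do the expected digits appear in the actual answer?"""
    return _digits(expected) != "" and _digits(expected) in _digits(actual)


QUESTION = "How much total money does a player start with in Monopoly?"


def test_monopoly_rules():
    # Positive case: the correct answer should be judged a match.
    assert fake_judge("1500", fake_rag_app(QUESTION))


def test_monopoly_rules_negative():
    # Negative case with a deliberately wrong expectation. The assertion is inverted,
    # so the suite passes only if the judge correctly rejects the mismatch.
    assert not fake_judge("9999", fake_rag_app(QUESTION))
```

If the judge were too generous and accepted "9999", the negative test would fail, which is exactly the signal it exists to provide.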
So I recommend that if you're going to write tests for
LLM applications like this, it's good to have both
positive cases and negative cases being tested.
And by the way, if you do have a lot of different
test cases you want to use, you maybe don't
need to assert that 100% of them succeed.
You could maybe set a threshold for what is good enough.
For example, 80% or 90%.
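A minimal sketch of a threshold check, assuming each case's outcome has already been reduced to a boolean (for example by an LLM-judge helper). The 0.8 default mirrors the 80% figure mentioned above; the function names are illustrative.

```python
def pass_rate(results: list[bool]) -> float:
    """Fraction of evaluation cases that passed."""
    if not results:
        raise ValueError("no test results to evaluate")
    return sum(results) / len(results)


def assert_good_enough(results: list[bool], threshold: float = 0.8) -> None:
    """Fail only if the overall pass rate drops below the threshold."""
    rate = pass_rate(results)
    assert rate >= threshold, f"pass rate {rate:.0%} is below {threshold:.0%}"
```

This tolerates an occasional miss from a nondeterministic model or judge, while still catching a real regression after a change to the code, the documents, or the model.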
So now you've leveled up your project by learning
how to use different LLMs, including a
local one, and you've also learned how to add
new items to your database, and how to test
the quality of your application as a whole.
These were all topics that were brought up in the
comment section of my previous RAG tutorial.
And so after watching this, if there's more
things you'd like to learn how to do, like
deploy this to the cloud for example, then
let me know in the comments of this video and
we can build it together in the next one.
I know we went through the project quite quickly.
My focus here was to show you the coding snippets that
mattered the most and helping you to understand them.
So I've actually had to simplify a bunch of the code and the ideas along the way.
But if you want to take a closer look and see
how all the pieces fit together into a project,
or you just want to download the code
that you can run right away, then check out
the GitHub link in the video description.
There you'll have access to the entire project that
I used for this video, and something that I was
running end-to-end as you saw in the demo here.
Anyways, I hope this was useful, and I'll see you in the next one.