RAG + Langchain Python Project: Easy AI/Chat For Your Docs
Summary
TLDR: This tutorial video guides viewers through building a retrieval augmented generation app using Langchain and OpenAI, ideal for handling extensive text data like books or documentation. It demonstrates the process from data preparation to creating a vector database with ChromaDB, using the RAG technique to ground AI responses in the source data. The video also covers embedding vectors for text and crafting prompts for AI to generate answers, concluding with examples using 'Alice in Wonderland' and the AWS Lambda documentation.
Takeaways
- 📚 The video is a tutorial on building a retrieval augmented generation app using Langchain and OpenAI, which can interact with personal documents or data sources.
- 🔍 The app is useful for handling large volumes of text data, such as books, documents, or lectures, and allows AI interaction like asking questions or building chatbots.
- 🤖 The technique used is called RAG (Retrieval Augmented Generation), which ensures responses are based on provided data rather than fabricated answers.
- 📁 The data source can be a PDF, text, or markdown files, and the tutorial uses AWS Lambda documentation as an example.
- 🗂 The process starts with loading the data into Python using a directory loader module from Langchain, turning each file into a 'document' with metadata.
- 📐 The documents are then split into smaller 'chunks' using a text splitter, which can be paragraphs, sentences, or pages, to improve search relevance.
- 📊 A vector database, ChromaDB, is used to store the chunks, utilizing vector embeddings as keys, which require an OpenAI account for generation.
- 📈 Vector embeddings represent text meanings as numerical lists, where similar texts have close vector coordinates, measured by cosine similarity or Euclidean distance.
- 🔑 To generate a vector from text, an LLM like OpenAI is used, which can convert words into vector form for comparison and database querying.
- 🔍 The querying process involves turning a user's query into a vector and finding the most relevant chunks in the database based on embedding distance.
- 📝 The relevant chunks are then used to create a prompt for OpenAI to generate a response, which can also include references to the source material.
Q & A
What is the purpose of the video?
-The purpose of the video is to demonstrate how to build a retrieval augmented generation app using Langchain and OpenAI to interact with one's own documents or data sources, such as a collection of books, documents, or lectures.
What does RAG stand for in the context of this video?
-RAG stands for Retrieval Augmented Generation, a technique used in the video to build an application that can provide responses using a data source while also quoting the original source of information.
What is the data source used in the example provided in the video?
-The example in the video uses the AWS documentation for Lambda as the data source.
How does the video ensure the AI's response is based on the provided data source rather than fabricated?
-The video ensures this by demonstrating how the AI can use the provided documentation to give a response and quote the source, preventing the AI from fabricating a response.
What is the first step in building the app as described in the video?
-The first step is to prepare the data that you want to use, which could be a PDF, a collection of text, or markdown files, and then load this data into Python using a directory loader module from Langchain.
Why is it necessary to split a document into smaller chunks?
-Splitting a document into smaller chunks is necessary to make each chunk more focused and relevant when searching through the data, improving the quality and accuracy of the AI's response.
What tool is used to split the text into chunks in the video?
-A recursive character text splitter is used to divide the text into chunks, allowing the user to set the chunk size and the overlap between each chunk.
What is ChromaDB and how is it used in the video?
-ChromaDB is a special kind of database that uses vector embeddings as the key. It is used in the video to create a database from the chunks of text, which can then be queried for relevant data.
What is a vector embedding in the context of this video?
-A vector embedding is a list of numbers that represent text in a multi-dimensional space, capturing the meaning of the text. Similar texts will have similar vector embeddings.
How is the relevance of the retrieved data determined in the app?
-The relevance is determined by calculating the distance between the vector embeddings of the query and the chunks in the database. The chunks with the smallest distance are considered more relevant.
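The ranking idea can be illustrated without any library: compute a distance between the query's vector and each chunk's vector, then sort ascending. The toy 2-D vectors below stand in for real 1536-dimensional embeddings and are invented for illustration:

```python
import math

def euclidean(a, b):
    # Straight-line distance between two vectors of equal length.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical chunk embeddings (real ones come from an embedding model).
chunks = {
    "chunk_a": [0.90, 0.10],
    "chunk_b": [0.20, 0.80],
    "chunk_c": [0.85, 0.15],
}
query = [0.88, 0.12]

# Smallest distance first = most relevant chunk first.
ranked = sorted(chunks, key=lambda name: euclidean(chunks[name], query))
```

Here `ranked[0]` is the chunk whose embedding sits closest to the query in vector space.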
What is the final step in the process shown in the video?
-The final step is to use the relevant data chunks to create a prompt for OpenAI, which is then used to generate a high-quality response to the user's query, also providing references back to the source material.
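The flow of this final step can be sketched in plain Python. The template wording and chunk texts below are illustrative, not copied from the video's repository:

```python
# Sketch: assemble the retrieved chunks and the user's question into a
# single prompt string, ready to send to an LLM.
PROMPT_TEMPLATE = """Answer the question based only on the following context:

{context}

---

Answer the question based on the above context: {question}"""

def build_prompt(chunks, question):
    # Join the retrieved chunks with a visible separator, then fill the
    # template's two placeholders.
    context = "\n\n---\n\n".join(chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)

prompt = build_prompt(
    ["Alice followed the White Rabbit.", "The Hatter was having tea."],
    "How does Alice meet the Mad Hatter?",
)
# With langchain-openai installed and an API key set, you would then call
# something like:
#   from langchain_openai import ChatOpenAI
#   response = ChatOpenAI().predict(prompt)
```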
Outlines
🚀 Building a Retrieval Augmented Generation App
This paragraph introduces a tutorial on constructing an app using Langchain and OpenAI to enhance data retrieval and generation. The app is designed for interacting with large text datasets, such as books or documents, through AI capabilities like asking questions or creating customer support chatbots. The tutorial will cover setting up a data source, creating a vector database, querying the database, and forming coherent responses using the RAG technique with AWS Lambda documentation as an example. The process is made accessible by breaking it down into manageable steps, starting from data preparation to the final response generation.
📚 Organizing and Processing Data for AI Interaction
The speaker explains the initial steps in preparing data for the AI app, including selecting a data source like PDFs or markdown files and organizing them into folders. The markdown files are loaded into Python using the directory loader module from Langchain, and each file is transformed into a 'document' containing text and metadata. The documents are then split into smaller, more focused 'chunks' using a text splitter, which is crucial for enhancing the relevance of search results. The AWS Lambda documentation and Alice in Wonderland book serve as examples of how different documents can be processed and split into chunks for better AI interaction.
🔍 Creating a Vector Database with ChromaDB
The paragraph delves into the technical process of creating a vector database using ChromaDB, which leverages vector embeddings as keys. It requires an OpenAI account to generate embeddings for text chunks. The tutorial outlines creating a Chroma path for persistent storage, removing old database versions, and saving the new database to disk. The concept of vector embeddings is introduced as a method to represent text in a multi-dimensional space, allowing for the measurement of semantic similarity between texts. The tutorial also demonstrates how to generate and evaluate embeddings using OpenAI's functions and Langchain's utility.
🤖 Querying the Database and Crafting AI Responses
The speaker describes how to use the vector database to query for information relevant to a given user query. This involves loading the Chroma database, using the same embedding function as before, and searching for the top matching chunks of information. The process includes checking for relevant matches and crafting a custom AI response based on the retrieved data. The tutorial provides a step-by-step guide on creating a prompt template for OpenAI, formatting it with context and query, and using it to generate a response. It also shows how to extract and print source references from the metadata of document chunks.
🌐 Demonstrating App Functionality with Different Data Sources
In this paragraph, the speaker demonstrates the app's functionality by switching the data source to the AWS Lambda documentation and asking a different question about supported languages or runtimes. The response showcases the app's ability to retrieve relevant information from various files and summarize it accurately. The tutorial concludes by encouraging viewers to try the app with their data and to provide feedback for future tutorial topics. A GitHub link is promised in the video description for those interested in the code.
Keywords
💡Retrieval Augmented Generation (RAG)
💡Langchain
💡OpenAI
💡Vector Database
💡Embeddings
💡Metadata
💡Chunking
💡ChromaDB
💡Query
💡Prompt Template
Highlights
Introduction of a tutorial on building a retrieval augmented generation app using Langchain and OpenAI.
The app allows interaction with personal documents or data sources using AI, suitable for large text datasets.
Demonstration of using AWS Lambda documentation as a data source with the app.
Explanation of the RAG technique for generating responses with source citations.
Assurance that the project is easier than it seems, with a step-by-step guide provided.
The necessity of a data source like PDFs, text, or markdown files for the project.
Use of the directory loader module from Langchain to load markdown data into Python.
The process of splitting documents into smaller chunks for more focused data retrieval.
Utilization of a recursive character text splitter with adjustable chunk size and overlap.
Transformation of text chunks into a vector database using ChromaDB.
Requirement of an OpenAI account for generating vector embeddings with the OpenAI embeddings function.
Explanation of vector embeddings as representations of text that capture meaning.
Use of cosine similarity or Euclidean distance to calculate the distance between vectors.
Demonstration of generating a vector from a word using OpenAI's API.
Introduction of an evaluator function in Langchain to compare embedding distances.
Querying the database to find the most relevant chunks of information for a given question.
Crafting a custom response using the retrieved data chunks with the help of AI.
Loading the Chroma database and using the same embedding function for consistency.
Use of a prompt template to create a prompt for OpenAI with placeholders for context and query.
Final step of using the crafted prompt to get a response from the LLM model.
Inclusion of source material references in the response for traceability.
Switching data sources to AWS Lambda documentation for a different example of app usage.
Summary of how the app uses the query to search for information and answers based on that data.
Invitation to try the tutorial with one's own dataset and provide feedback for future topics.
Transcripts
Hey everyone, welcome to this video where I'm going
to show you how to build a retrieval augmented
generation app using Langchain and OpenAI.
You can then use this app to interact with your
own documents or your own data source.
This type of application is great for when
you have a lot of text data to work with.
For example, a collection of books, documents or lectures. And
you want to be able to interact with that data using AI.
For example, you might want to be able to ask
questions about that data or perhaps build
something like a customer support chatbot that
follows a set of instructions.
Today, we're going to learn how to build this using
OpenAI and the Langchain library in Python.
We're going to be using a technique called RAG, which
stands for retrieval augmented generation.
In this example, the data source I've given it is the AWS documentation for Lambda.
And here I'm asking it a question based on that documentation.
The agent will be able to use that documentation
to give me a response as well as quote the source
where it got that information from originally.
This way, you always know that it's using data from the sources
you provided it with rather than hallucinating the response.
If this project sounds complex or difficult to you, then
don't worry because it's a lot easier than you think.
I'll walk you through every step of the project, starting
with how to prepare the data that you want to use
and then how to turn that into a vector database.
Then we'll also look at how to query that database for relevant pieces of data.
Finally, you can then put all those pieces together to form a coherent response.
If that sounds good, then let's get started.
To begin, we'll first need a data source like a
PDF or a collection of text or markdown files.
This can be anything.
For example, it could be documentation files for your software.
It could be a customer support handbook, or it
could even be transcripts from a podcast.
First, find some markdown files you want to use as data for this project.
But if you want some ideas, then here I've got the Alice
in Wonderland book as a markdown file, or I also have
the AWS documentation as a bunch of markdown files.
And I have each of them in their own separate
folder under this data folder in my project.
So make sure you have a setup like this first before you start.
Once you have that source material, we're going to need to load
it up and then split it into different chunks of text.
To load some markdown data from your folder into Python,
you can use this directory loader module from Langchain.
Just update this data path variable with wherever you've decided to put your data.
Here I'm using data/books.
If you only have one markdown file in that folder, it's okay.
Or if you have multiple markdown files, then
this will load everything and turn each of those
files into something called a document.
If I use this piece of code on my AWS Lambda
documents folder instead, then each of these
markdown files will become a document.
And a document is going to contain all of the content on this page.
So basically all of the text you see here.
And it's also going to contain a bunch of metadata.
For example, the name of the source file where the text originally came from.
And after you've created your document, you can also choose
to add any other metadata you want to that document.
Now the next problem we encounter is that a
single document can be really, really long.
So it's not enough that we load each markdown file into one document.
We have to also split each document if they're too long on their own.
With something as long as this, we're going to want
to split this big document into smaller chunks.
And a chunk could be a paragraph, it could be a
sentence, or it could be even several pages.
It depends on what we want.
By doing this, the outcome that we're looking
for is that when we search through all of this
data, each chunk is going to be more focused
and more relevant to what we're looking for.
To achieve this, we can use a recursive character text splitter.
And here we can set the chunk size in number of characters
and then the overlap between each chunk.
So in this example, we're going to make the chunk
size about 1000 characters, and each chunk
is going to have an overlap of 500 characters.
So I've just run the script to split up my text into several chunks.
And here I've printed out the number of original documents
and the number of chunks it was split into.
Since I used this on the Alice in Wonderland
text, it split one document into 282 chunks.
And down here, I've just picked a random chunk as
a document and I printed out the page content and
the metadata so you could see what it looks like.
So the page content is just literally a part of the text taken out of that chunk.
So here you can see that it's about one or two paragraphs of the story.
And the metadata right now, it only has the source, which is
the path of the file it got this from, and the start index.
So where in that source does this particular chunk begin?
And if you try the same code with the AWS Lambda docs
instead, you'll see that each chunk's source
points to the file it came from.
So this is also useful if you have a lot of different files,
rather than just one big file splitting into smaller chunks.
To be able to query each chunk, we're going to need to turn this into a database.
We'll be using ChromaDB for this, which is a special kind
of database that uses vector embeddings as the key.
This is the code that you can use to create a Chroma database from our chunks.
For this, you're going to need an OpenAI account because
we're going to use the OpenAI embeddings function
to generate the vector embeddings for each chunk.
I'm also going to create a Chroma path and set that
as the persistent directory so that when we create
this database, I have a bunch of folders on my
disk that I can use to load the data later on.
This is useful because normally I might want to
put this database into a Lambda function or I
might want to put it in the cloud somewhere.
So I want to be able to save it to disk so
that I can copy it or deploy it as a file.
Now before I create the database or before I save
it to disk, I can also use this code snippet
to remove it first if it already exists.
This is useful if I want to clear all of my previous versions
of the database before I run the script to create a new one.
Now the database should save automatically after we create it,
but you can also force it to save using this persist method.
So once you've put all of that together and then
you've run your script to generate your database,
you should see this line where it's saved
all of your chunks to the Chroma database.
And you can see here on your disk that the data should be there as well.
And here it's going to be saved as a SQLite3 file.
So now at this point we have our vector database
created and we're ready to start using it.
But first you're probably going to want to know what a vector embedding is.
If you already know what embedding vectors are,
then feel free to skip the section entirely.
Otherwise, I'll give you a really quick explanation just to bring you up to speed.
Embeddings are vector representations of text that capture their meaning.
In Python, this is literally a list of numbers.
You can think of them as sort of coordinates in multi-dimensional
space and if two pieces of text
are closely related to each other in meaning, then
those coordinates will also be close together.
The distance between these vectors can then be calculated pretty
easily using cosine similarity or Euclidean distance.
We don't need to do that ourselves though, because there's a
lot of existing functions that can do that for us already.
And this will give us a single number that tells
us how far these two vectors are apart.
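As a concrete illustration, cosine distance between two small vectors can be computed in a few lines; real embeddings have 1536 dimensions, but the math is identical:

```python
import math

def cosine_distance(a, b):
    # 1 minus the cosine of the angle between the vectors:
    # 0 means identical direction, 1 means orthogonal (unrelated).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (norm_a * norm_b)

same = cosine_distance([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])
```

Identical vectors give a distance of (effectively) zero, and orthogonal vectors give a distance of one, which is why a lower score means a closer semantic match.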
To actually generate a vector from a word, we'll need an LLM, like OpenAI.
And this is usually just an API or a function we can call.
For example, you can use this code to turn
the word "apple" into a vector embedding.
And this is the result I get from using that function.
So you could see that the vector here is literally a really long list of numbers.
And the first number is 0.007 something-something, but
I truncated the rest because the list is quite long.
In fact, if you print the length of the vector, you
could see that the list has 1536 elements.
So this is basically a list of one and a half thousand numbers.
The numbers themselves aren't interesting though.
What's really interesting is the distance between two vectors themselves.
And this is quite hard to calculate from scratch, but
Langchain actually gives us a utility function to compare
the embedding distance directly using OpenAI.
So it's called an evaluator and this is how you can create one.
And here's the code to run an evaluation.
So here I'm comparing the distance of the word "apple" to the word "orange".
And running this, the result is a score of 0.13.
So we don't actually know whether that's good or not
by comparing an apple to an orange, because we don't
know where 0.13 sits on the scale of other words.
So let's try a couple of other words just to see what's a better
match with apple than orange, and what's a worse match.
Here, if I compare "apple" to the word "beach", it's actually 0.2.
So "beach" is further away from "apple" than "orange" is, I suppose because an orange, like an apple, is a fruit, which naturally makes it more similar.
Now if I compare the word "apple" to itself, this should
technically be 0 because it's literally the same word.
But in this case, it's close enough.
It's 2.5 x 10^-6.
Now what about if we compare the word "apple" to "iPhone"?
In this case, the score is even better than when we compared it with "orange".
The score is 0.09.
And this is really interesting as well, because in our
first example with apples and oranges, they were
both fruits, so they were similar in that respect.
But here, we're sort of interpreting the word "apple" from a different perspective.
We're seeing it as the name of the company "apple" instead.
So when you compare it with the word "iPhone",
the association is actually much stronger.
So now that you understand what embeddings are,
let's see how we can use it to fetch data.
To query for relevant data, our objective is to find
the chunks in our database that will most likely contain
the answer to the question that we want to ask.
So to do that, we'll need the database that we created
earlier, and we'll need the same embedding
function that we used to create that database.
Our goal now is to take a query, like the one
on the left here, and then turn that into
an embedding using the same function, and
then scan through our database and find
maybe five chunks of information that are
closest in embedding distance from our query.
So here, in this example, I might ask the question
like, "How does Alice meet the Mad Hatter in Alice
in Wonderland?" And when we scan our database,
we might get maybe four or five snippets of
text that we think is similar to this question.
And from that, we can put that together, have
the AI read all of that information, and decide
what is the response to give to the user.
So although we're not just simply returning the
chunks of information verbatim, we're actually
using it to craft a more custom response
that is still based on our source information.
To load the Chroma database that we created, we're
first going to need the path, which we have from
earlier, and we're going to need an embedding
function, which should be the same one we used
to create the database with in the first place.
So here, I'm just going to use the OpenAI embeddings function again.
This should load your database from that path.
If it doesn't, then just check that the path exists,
or just go back to the previous chapter and
run the script to create the database again.
Once the database is loaded, we can then search for the
chunk that best matches our query by using this method.
We need to pass in our query text as an argument and
specify the number of results we want to retrieve.
So in this example, we want to retrieve three best matches for our query.
The results of the search will be a list of tuples where
each tuple contains a document and its relevance score.
Before actually processing the results though, we can also add some checks.
For example, if there are no matches or if the
relevant score of the first result is below
a certain threshold, we can return early.
This will help us to make sure that we actually
find good, relevant information first before
moving to the next step of the process.
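This guard can be sketched with mocked search results in the shape Langchain's relevance-scored search returns, namely a list of (document, score) tuples; the threshold value is illustrative and should be tuned for your data:

```python
# Sketch: bail out early when the search found nothing, or when even the
# best match scores below a relevance cut-off.
RELEVANCE_THRESHOLD = 0.7  # illustrative cut-off between 0 and 1

def answer_is_possible(results):
    if len(results) == 0:
        return False
    _doc, score = results[0]  # results are sorted best-first
    return score >= RELEVANCE_THRESHOLD

good = answer_is_possible([("chunk about the Hatter", 0.85)])
bad = answer_is_possible([("unrelated chunk", 0.31)])
empty = answer_is_possible([])
```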
So now let's go to our code editor and put all that together and see what we get.
So here I've got the main function.
I just made a quick argument parser so I can
input the query text in the command line.
I've got my embeddings function and I'm going to
search the database that I've loaded and I'm
just going to print the content for each page.
So I'm going to find the top three results for my query.
So that's my script. Let's give it a go.
So here I'm running my script with the query,
which is how does Alice meet the mad hatter?
Here it's returned the three most relevant chunks
in the text that it thought best match our query.
So we have this piece of information, this piece
of information, and then this one here.
Now here the chunk size is quite small, so it doesn't
have the full context of each part of the text.
So if you want to edit that you can play with
that chunk size variable and make it either
bigger or smaller, depending on what
you think will give you the best results.
But for now, let's move on to the next step
and see if we can get the AI to use this
information and give us a direct response.
Now that we have found relevant data chunks for our
query, we can feed this into OpenAI to create a high
quality response using that data as our source.
First, we'll need a prompt template to create a prompt with.
You can use something like this.
Notice that there's placeholders for this template.
The first is the context that we're going to pass in.
So that's going to be the pieces of information that we got from the database.
And then the second is the actual query itself.
Next, here's the code to actually use that data to create the
actual prompt by formatting the template with our keys.
So after running this, you should have a single piece of string.
It's going to be quite a long string, but it's going
to be the entire prompt with all the chunks of information
and the query that you asked at the beginning.
After running that piece of code, you should
get a prompt that looks something like this.
So you're going to have this initial prompt, which is to
answer the question based on the following context.
And then we're going to have our three pieces of information.
And this can be as big or as little as we want
it to be, but here this is what we've chosen.
And then the question that we originally asked.
So here's our query.
How does Alice meet the Mad Hatter?
So this is the overall prompt that we're about to send to OpenAI.
This is actually the easy part.
So simply just call the LLM model of your choice with that prompt.
So here I'm using ChatOpenAI, and then you'll have your response.
Finally, if you want to provide references back to
your source material, you can also find that in
the metadata of each of those document chunks.
So here's the code on how you can extract that out and print it out as well.
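The extraction itself is a one-liner over the metadata; the mocked documents below mimic the (document, score) tuples the search returns, with invented paths and indices:

```python
# Sketch: pull the source references out of each retrieved document's
# metadata so the answer can cite where it came from.
class Doc:
    """Minimal stand-in for a LangChain Document (only metadata needed here)."""
    def __init__(self, metadata):
        self.metadata = metadata

results = [
    (Doc({"source": "data/books/alice_in_wonderland.md", "start_index": 1200}), 0.82),
    (Doc({"source": "data/books/alice_in_wonderland.md", "start_index": 3100}), 0.78),
]

sources = [doc.metadata.get("source") for doc, _score in results]
print(sources)
```

With real results, each entry is the path of the file the chunk came from, so printing the list alongside the model's answer gives readers a traceable reference.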
And going back to our code editor, this is what my script
looks like with all of those pieces put together.
So I've got my prompt template here.
I've got my main argument here, which takes the query, searches
the database for the relevant chunks, creates the
prompt, and then uses the LLM to answer the question.
And then here I'm collecting all the sources that were used
to answer the prompt and print out the entire response.
Let's go ahead and run that.
And here's the result of running that script.
So again, we see the entire prompt here, and this is the final response.
The response is Alice meets the Mad Hatter by walking in
the direction where the March Hare was said to live.
And obviously it took this from the first piece of the context.
And here we also have a list of the source references that it got it from.
This is pretty much pointing to the same file because I only
made it print out the actual file itself and not the index.
But this is pretty good already because you can
see how it's using our query to search for
pieces of information from our source material
and then answer based on that information.
Now let me switch up my data source and show you
a different example just so you can see what
else you can do with something like this.
I switched my database to one I prepared earlier, which
uses the AWS Lambda documentation as a source.
And here the query I'm going to ask it is what
languages or runtimes does AWS Lambda support?
So after I ran this, you can see that the chunks
I use here are much bigger than in the previous
example, but it still managed to find three relevant
chunks of my information and it's produced
a response that summarizes that information.
So here it says AWS Lambda supports Java, C#, Python, and so on.
You can read the rest here.
But this is more interesting because unlike in the first
example, the sources were actually from different files.
So you can see here that each of the source is its own file.
So this is useful as well.
If you have a data source that's spread out across a lot of different
files and you want to reference each source.
So we just covered how you can use Langchain and OpenAI
to create a retrieval augmented generation app.
I'll post a link to the GitHub code in the video
description and I encourage you to try this
out for yourself and with your own data set.
If you want to see more tutorials like this, then please let
me know what type of topics you'd be interested to see next.
Otherwise, I hope you found this useful and thank you for watching.