RAG + Langchain Python Project: Easy AI/Chat For Your Docs

pixegami
20 Nov 2023 · 16:41

Summary

TL;DR: This tutorial video guides viewers through constructing a retrieval augmented generation (RAG) app using Langchain and OpenAI, ideal for handling extensive text data such as books or documentation. It demonstrates the process from data preparation to creating a vector database with ChromaDB, explains embedding vectors for text, and shows how to craft prompts so the AI generates grounded answers, concluding with examples using 'Alice in Wonderland' and the AWS Lambda documentation.

Takeaways

  • 📚 The video is a tutorial on building a retrieval augmented generation app using Langchain and OpenAI, which can interact with personal documents or data sources.
  • 🔍 The app is useful for handling large volumes of text data, such as books, documents, or lectures, and allows AI interaction like asking questions or building chatbots.
  • 🤖 The technique used is called RAG (Retrieval Augmented Generation), which ensures responses are based on provided data rather than fabricated answers.
  • 📁 The data source can be PDF, text, or markdown files; the tutorial uses the AWS Lambda documentation as an example.
  • 🗂 The process starts with loading the data into Python using a directory loader module from Langchain, turning each file into a 'document' with metadata.
  • 📐 The documents are then split into smaller 'chunks' using a text splitter, which can be paragraphs, sentences, or pages, to improve search relevance.
  • 📊 A vector database, ChromaDB, is used to store the chunks, with vector embeddings as keys; generating the embeddings requires an OpenAI account.
  • 📈 Vector embeddings represent text meanings as numerical lists, where similar texts have close vector coordinates, measured by cosine similarity or Euclidean distance.
  • 🔑 To generate a vector from text, an embedding model (here OpenAI's) is used to convert words or sentences into vector form for comparison and database querying.
  • 🔍 The querying process involves turning a user's query into a vector and finding the most relevant chunks in the database based on embedding distance.
  • 📝 The relevant chunks are then used to create a prompt for OpenAI to generate a response, which can also include references to the source material.

Q & A

  • What is the purpose of the video?

    -The purpose of the video is to demonstrate how to build a retrieval augmented generation app using Langchain and OpenAI to interact with one's own documents or data sources, such as a collection of books, documents, or lectures.

  • What does RAG stand for in the context of this video?

    -RAG stands for Retrieval Augmented Generation, a technique used in the video to build an application that can provide responses using a data source while also quoting the original source of information.

  • What is the data source used in the example provided in the video?

    -The example in the video uses the AWS documentation for Lambda as the data source.

  • How does the video ensure the AI's response is based on the provided data source rather than fabricated?

    -The video ensures this by demonstrating how the AI can use the provided documentation to give a response and quote the source, preventing the AI from fabricating a response.

  • What is the first step in building the app as described in the video?

    -The first step is to prepare the data that you want to use, which could be a PDF, a collection of text, or markdown files, and then load this data into Python using a directory loader module from Langchain.

  • Why is it necessary to split a document into smaller chunks?

    -Splitting a document into smaller chunks is necessary to make each chunk more focused and relevant when searching through the data, improving the quality and accuracy of the AI's response.

  • What tool is used to split the text into chunks in the video?

    -A recursive character text splitter is used to divide the text into chunks, allowing the user to set the chunk size and the overlap between each chunk.

  • What is ChromaDB and how is it used in the video?

    -ChromaDB is a special kind of database that uses vector embeddings as the key. It is used in the video to create a database from the chunks of text, which can then be queried for relevant data.

  • What is a vector embedding in the context of this video?

    -A vector embedding is a list of numbers that represent text in a multi-dimensional space, capturing the meaning of the text. Similar texts will have similar vector embeddings.

  • How is the relevance of the retrieved data determined in the app?

    -The relevance is determined by calculating the distance between the vector embeddings of the query and the chunks in the database. The chunks with the smallest distance are considered more relevant.

  • What is the final step in the process shown in the video?

    -The final step is to use the relevant data chunks to create a prompt for OpenAI, which is then used to generate a high-quality response to the user's query, also providing references back to the source material.

Outlines

00:00

🚀 Building a Retrieval Augmented Generation App

This paragraph introduces a tutorial on constructing a retrieval augmented generation (RAG) app using Langchain and OpenAI. The app is designed for interacting with large text datasets, such as books or documents, through AI capabilities like asking questions or creating customer support chatbots. The tutorial covers setting up a data source, creating a vector database, querying the database, and forming coherent responses using the RAG technique, with the AWS Lambda documentation as an example. The process is made accessible by breaking it down into manageable steps, from data preparation to the final response generation.

05:02

📚 Organizing and Processing Data for AI Interaction

The speaker explains the initial steps in preparing data for the AI app, including selecting a data source like PDFs or markdown files and organizing them into folders. The markdown files are loaded into Python using the directory loader module from Langchain, and each file is transformed into a 'document' containing text and metadata. The documents are then split into smaller, more focused 'chunks' using a text splitter, which is crucial for enhancing the relevance of search results. The AWS Lambda documentation and Alice in Wonderland book serve as examples of how different documents can be processed and split into chunks for better AI interaction.

10:03

🔍 Creating a Vector Database with ChromaDB

The paragraph delves into the technical process of creating a vector database using ChromaDB, which leverages vector embeddings as keys. It requires an OpenAI account to generate embeddings for text chunks. The tutorial outlines creating a Chroma path for persistent storage, removing old database versions, and saving the new database to disk. The concept of vector embeddings is introduced as a method to represent text in a multi-dimensional space, allowing for the measurement of semantic similarity between texts. The tutorial also demonstrates how to generate and evaluate embeddings using OpenAI's functions and Langchain's utility.

15:07

🤖 Querying the Database and Crafting AI Responses

The speaker describes how to use the vector database to query for information relevant to a given user query. This involves loading the Chroma database, using the same embedding function as before, and searching for the top matching chunks of information. The process includes checking for relevant matches and crafting a custom AI response based on the retrieved data. The tutorial provides a step-by-step guide on creating a prompt template for OpenAI, formatting it with context and query, and using it to generate a response. It also shows how to extract and print source references from the metadata of document chunks.

🌐 Demonstrating App Functionality with Different Data Sources

In this paragraph, the speaker demonstrates the app's functionality by switching the data source to the AWS Lambda documentation and asking a different question about supported languages or runtimes. The response showcases the app's ability to retrieve relevant information from various files and summarize it accurately. The tutorial concludes by encouraging viewers to try the app with their data and to provide feedback for future tutorial topics. A GitHub link is promised in the video description for those interested in the code.

Keywords

💡Retrieval Augmented Generation (RAG)

Retrieval Augmented Generation (RAG) is a technique that combines the capabilities of retrieving relevant information from a dataset with the generation of new content based on that information. In the context of the video, RAG is used to build an application that can interact with a specific data source, such as AWS documentation, to provide accurate and sourced responses to queries. The script illustrates this by showing how the agent uses the AWS Lambda documentation to answer a question, quoting the exact source of the information.

💡Langchain

Langchain is a library in Python that facilitates the creation of applications using RAG techniques. It provides modules for loading data, splitting text into chunks, and interacting with databases like ChromaDB. The video script discusses using Langchain to build an app that can handle text data, such as a collection of books or documents, and demonstrates its use in creating a vector database from markdown files.

💡OpenAI

OpenAI is a company specializing in artificial intelligence and is mentioned in the script as the provider of the embeddings function used to generate vector representations of text. This function is essential for creating the vector database in ChromaDB and for querying the database to find relevant text chunks. The script also mentions using OpenAI's chatbot model to generate responses based on the retrieved information.

💡Vector Database

A vector database, like ChromaDB mentioned in the script, is a type of database that stores and retrieves information based on vector embeddings of text. It allows for efficient querying of text data by calculating the similarity between query embeddings and stored text embeddings. The script explains how to create such a database from chunks of text and how it's used to retrieve relevant information for generating responses.

💡Embeddings

Embeddings are vector representations of text that capture semantic meaning, turning words or sentences into numerical values that can be analyzed mathematically. In the video, embeddings are used to convert text into a format that can be compared for similarity within the vector database. The script provides examples of how different words are converted into embeddings and how the distances between these vectors can indicate semantic similarity.
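
For concreteness, here is a minimal sketch (not from the video) of how such a distance could be computed with NumPy, using made-up three-dimensional vectors in place of real 1536-dimensional OpenAI embeddings:

    import numpy as np

    def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
        # 1 - cosine similarity: near 0 means very similar direction/meaning.
        return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Toy vectors standing in for real embeddings.
    apple = np.array([0.9, 0.1, 0.3])
    orange = np.array([0.8, 0.2, 0.35])
    beach = np.array([0.1, 0.9, 0.5])

    print(cosine_distance(apple, orange))  # smaller: related meanings
    print(cosine_distance(apple, beach))   # larger: less related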

💡Metadata

Metadata in the context of the video refers to data about other data, such as the source file name and start index of a text chunk. This information is important for tracking the origin of text chunks within the database and for providing references back to the original material when generating responses. The script discusses adding metadata to documents and using it to identify the source of information in the response.

💡Chunking

Chunking is the process of breaking down a large document into smaller, more manageable pieces or 'chunks'. The script explains that this is necessary because a single document can be too long to be effectively processed and searched. By setting a chunk size and overlap, the script demonstrates how to split documents into chunks that can be individually analyzed and retrieved from the vector database.

💡ChromaDB

ChromaDB is a vector database mentioned in the script that is used to store and query text chunks based on their vector embeddings. It is specifically designed to work with vector embeddings as keys, allowing for efficient similarity-based searches. The script details the process of creating a ChromaDB database from text chunks and using it to find relevant information for response generation.

💡Query

In the context of the video, a query is a user's request or question that the application is designed to answer. The script describes how a query is turned into a vector embedding and used to search the ChromaDB database for the most relevant text chunks. The application then uses these chunks to generate a response to the query, as demonstrated with examples like 'How does Alice meet the Mad Hatter in Alice in Wonderland?'

💡Prompt Template

A prompt template is a pre-defined structure for creating prompts to be used with an AI model. In the script, it is used to format the context and query into a single string that can be sent to the AI model to generate a response. The template includes placeholders for the context and the query, which are filled in with the relevant information retrieved from the database to form the final prompt.

Highlights

Introduction of a tutorial on building a retrieval augmented generation app using Langchain and OpenAI.

The app allows interaction with personal documents or data sources using AI, suitable for large text datasets.

Demonstration of using AWS Lambda documentation as a data source with the app.

Explanation of the RAG technique for generating responses with source citations.

Assurance that the project is easier than it seems, with a step-by-step guide provided.

The necessity of a data source like PDFs, text, or markdown files for the project.

Use of the directory loader module from Langchain to load markdown data into Python.

The process of splitting documents into smaller chunks for more focused data retrieval.

Utilization of a recursive character text splitter with adjustable chunk size and overlap.

Transformation of text chunks into a vector database using ChromaDB.

Requirement of an OpenAI account for generating vector embeddings with the OpenAI embeddings function.

Explanation of vector embeddings as representations of text that capture meaning.

Use of cosine similarity or Euclidean distance to calculate the distance between vectors.

Demonstration of generating a vector from a word using OpenAI's API.

Introduction of an evaluator function in Langchain to compare embedding distances.

Querying the database to find the most relevant chunks of information for a given question.

Crafting a custom response using the retrieved data chunks with the help of AI.

Loading the Chroma database and using the same embedding function for consistency.

Use of a prompt template to create a prompt for OpenAI with placeholders for context and query.

Final step of using the crafted prompt to get a response from the LLM.

Inclusion of source material references in the response for traceability.

Switching data sources to AWS Lambda documentation for a different example of app usage.

Summary of how the app uses the query to search for information and answers based on that data.

Invitation to try the tutorial with one's own dataset and provide feedback for future topics.

Transcripts

00:00

Hey everyone, welcome to this video where I'm going to show you how to build a retrieval augmented generation app using Langchain and OpenAI. You can then use this app to interact with your own documents or your own data source. This type of application is great for when you have a lot of text data to work with, for example a collection of books, documents or lectures, and you want to be able to interact with that data using AI. For example, you might want to be able to ask questions about that data, or perhaps build something like a customer support chatbot that you want to follow a set of instructions. Today, we're going to learn how to build this using OpenAI and the Langchain library in Python. We're going to be using a technique called RAG, which stands for retrieval augmented generation. In this example, the data source I've given it is the AWS documentation for Lambda, and here I'm asking it a question based on that documentation. The agent will be able to use that documentation to give me a response as well as quote the source where it got that information from originally. This way, you always know that it's using data from the sources you provided it with rather than hallucinating the response. If this project sounds complex or difficult to you, then don't worry, because it's a lot easier than you think. I'll walk you through every step of the project, starting with how to prepare the data that you want to use and then how to turn that into a vector database. Then we'll also look at how to query that database for relevant pieces of data. Finally, you can then put all those pieces together to form a coherent response. If that sounds good, then let's get started.

01:33

To begin, we'll first need a data source like a PDF or a collection of text or markdown files. This can be anything. For example, it could be documentation files for your software, it could be a customer support handbook, or it could even be transcripts from a podcast. First, find some markdown files you want to use as data for this project. If you want some ideas, here I've got the Alice in Wonderland book as a markdown file, and I also have the AWS documentation as a bunch of markdown files. I have each of them in their own separate folder under this data folder in my project, so make sure you have a setup like this before you start. Once you have that source material, we're going to need to load it up and then split it into different chunks of text. To load some markdown data from your folder into Python, you can use the directory loader module from Langchain. Just update the data path variable with wherever you've decided to put your data; here I'm using data/books. If you only have one markdown file in that folder, that's okay, and if you have multiple markdown files, this will load everything and turn each of those files into something called a document. If I use this piece of code on my AWS Lambda documents folder instead, then each of these markdown files will become a document. A document is going to contain all of the content on the page, so basically all of the text you see here, and it's also going to contain a bunch of metadata, for example the name of the source file where the text originally came from. After you've created your document, you can also choose to add any other metadata you want to that document.

Now the next problem we encounter is that a single document can be really, really long. So it's not enough that we load each markdown file into one document; we also have to split each document if it's too long on its own. With something as long as this, we're going to want to split this big document into smaller chunks. A chunk could be a paragraph, it could be a sentence, or it could even be several pages; it depends on what we want. By doing this, the outcome that we're looking for is that when we search through all of this data, each chunk is going to be more focused and more relevant to what we're looking for. To achieve this, we can use a recursive character text splitter, and here we can set the chunk size in number of characters and the overlap between each chunk. In this example, we're going to make the chunk size about 1000 characters, and each chunk is going to have an overlap of 500 characters. I've just run the script to split up my text into several chunks, and here I've printed out the number of original documents and the number of chunks they were split into. Since I used this on the Alice in Wonderland text, it split one document into 282 chunks. Down here, I've just picked a random chunk as a document and printed out the page content and the metadata so you can see what it looks like. The page content is literally a part of the text taken out of that chunk; here you can see that it's about one or two paragraphs of the story. The metadata right now only has the source, which is the path of the file this came from, and the start index, which tells us where in that source this particular chunk begins. If you try the same code with the AWS Lambda docs instead, you'll see that the source also points to the file that each chunk's information is from. So this is also useful if you have a lot of different files, rather than just one big file split into smaller chunks.
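
As a rough sketch of the loading and splitting step just described (assuming a 2023-era langchain release; in newer versions the loader and splitter live in langchain_community and langchain_text_splitters, and the path and glob below are illustrative choices):

    from langchain.document_loaders import DirectoryLoader
    from langchain.text_splitter import RecursiveCharacterTextSplitter

    DATA_PATH = "data/books"  # wherever your markdown files live

    def load_documents():
        # Every matching markdown file becomes one Document (page_content + metadata).
        loader = DirectoryLoader(DATA_PATH, glob="*.md")
        return loader.load()

    def split_text(documents):
        # Chunk size and overlap are in characters and worth tuning for your data.
        splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,
            chunk_overlap=500,
            length_function=len,
            add_start_index=True,  # record where each chunk starts in its source file
        )
        chunks = splitter.split_documents(documents)
        print(f"Split {len(documents)} documents into {len(chunks)} chunks.")
        return chunks

    chunks = split_text(load_documents())
    print(chunks[0].page_content)
    print(chunks[0].metadata)  # e.g. the source path and start_index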

05:05

To be able to query each chunk, we're going to need to turn this into a database. We'll be using ChromaDB for this, which is a special kind of database that uses vector embeddings as the key. This is the code that you can use to create a Chroma database from our chunks. For this, you're going to need an OpenAI account, because we're going to use the OpenAI embeddings function to generate the vector embeddings for each chunk. I'm also going to create a Chroma path and set that as the persistent directory, so that when we create this database, I have a bunch of folders on my disk that I can use to load the data later on. This is useful because normally I might want to put this database into a Lambda function, or I might want to put it in the cloud somewhere, so I want to be able to save it to disk so that I can copy it or deploy it as a file. Now before I create the database, or before I save it to disk, I can also use this code snippet to remove it first if it already exists. This is useful if I want to clear all of my previous versions of the database before I run the script to create a new one. The database should save automatically after we create it, but you can also force it to save using this persist method. Once you've put all of that together and run your script to generate your database, you should see this line where it's saved all of your chunks to the Chroma database. And you can see here on your disk that the data should be there as well; here it's going to be saved as a SQLite3 file. So now at this point we have our vector database created and we're ready to start using it.
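
A sketch of the database-creation step under the same assumptions (Chroma.from_documents with OpenAIEmbeddings needs an OPENAI_API_KEY in the environment; recent Chroma releases persist automatically, so the explicit persist() call only matters on older versions, and the directory name is illustrative):

    import os
    import shutil

    from langchain.embeddings import OpenAIEmbeddings
    from langchain.vectorstores.chroma import Chroma

    CHROMA_PATH = "chroma"  # illustrative persistent directory

    def save_to_chroma(chunks):
        # Clear out any previous version of the database first.
        if os.path.exists(CHROMA_PATH):
            shutil.rmtree(CHROMA_PATH)

        # Embed every chunk with OpenAI and write the vectors to disk.
        db = Chroma.from_documents(
            chunks,
            OpenAIEmbeddings(),
            persist_directory=CHROMA_PATH,
        )
        db.persist()
        print(f"Saved {len(chunks)} chunks to {CHROMA_PATH}.")

    save_to_chroma(chunks)  # `chunks` from the loading/splitting sketch above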

06:32

But first you're probably going to want to know what a vector embedding is. If you already know what embedding vectors are, then feel free to skip this section entirely. Otherwise, I'll give you a really quick explanation just to bring you up to speed. Embeddings are vector representations of text that capture their meaning. In Python, this is literally a list of numbers. You can think of them as sort of coordinates in multi-dimensional space, and if two pieces of text are closely related to each other in meaning, then those coordinates will also be close together. The distance between these vectors can then be calculated pretty easily using cosine similarity or Euclidean distance. We don't need to do that ourselves though, because there are a lot of existing functions that can do that for us already, and this will give us a single number that tells us how far apart these two vectors are. To actually generate a vector from a word, we'll need a model provider like OpenAI, and this is usually just an API or a function we can call. For example, you can use this code to turn the word "apple" into a vector embedding, and this is the result I get from using that function. You can see that the vector here is literally a really long list of numbers. The first number is 0.007 something, but I truncated the rest because the list is quite long. In fact, if you print the length of the vector, you can see that the list has 1536 elements, so this is basically a list of one and a half thousand numbers.

The numbers themselves aren't interesting though. What's really interesting is the distance between two vectors. This is quite hard to calculate from scratch, but Langchain actually gives us a utility function to compare the embedding distance directly using OpenAI. It's called an evaluator, and this is how you can create one, and here's the code to run an evaluation. Here I'm comparing the distance of the word "apple" to the word "orange", and running this, the result is a score of 0.13. We don't actually know whether that's good or not just by comparing an apple to an orange, because we don't know where 0.13 sits on the scale of other words. So let's try a couple of other words to see what's a better match with apple than orange, and what's a worse match. If I compare "apple" to the word "beach", it's actually 0.2. So "beach" is further away from "apple" than "orange" is, I suppose because an orange is a fruit, which naturally makes it more similar. Now if I compare the word "apple" to itself, this should technically be 0 because it's literally the same word, but in this case it's close enough: it's 2.5 x 10^-6. What about if we compare the word "apple" to "iPhone"? In this case, the score is even better than when we compared it with "orange": the score is 0.09. This is really interesting as well, because in our first example with apples and oranges, they were both fruits, so they were similar in that respect. But here, we're interpreting the word "apple" from a different perspective: we're seeing it as the name of the company Apple instead. So when you compare it with the word "iPhone", the association is actually much stronger. So now that you understand what embeddings are, let's see how we can use them to fetch data.
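
A sketch of the embedding experiment described above, using langchain's OpenAIEmbeddings and its embedding-distance evaluator (exact scores will vary with the embedding model, so treat the numbers as illustrative):

    from langchain.embeddings import OpenAIEmbeddings
    from langchain.evaluation import load_evaluator

    embedding_function = OpenAIEmbeddings()

    # Turn a single word into a vector: a long list of floats.
    vector = embedding_function.embed_query("apple")
    print(vector[:3])   # first few numbers of the embedding
    print(len(vector))  # 1536 for OpenAI's text embedding model

    # Compare how far apart two words sit in embedding space.
    evaluator = load_evaluator("pairwise_embedding_distance")
    result = evaluator.evaluate_string_pairs(prediction="apple", prediction_b="orange")
    print(result)  # e.g. {'score': 0.13...}; smaller scores mean closer meanings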

09:38

To query for relevant data, our objective is to find the chunks in our database that will most likely contain the answer to the question that we want to ask. To do that, we'll need the database that we created earlier, and we'll need the same embedding function that we used to create that database. Our goal now is to take a query, like the one on the left here, turn it into an embedding using the same function, and then scan through our database and find maybe five chunks of information that are closest in embedding distance to our query. So in this example, I might ask a question like, "How does Alice meet the Mad Hatter in Alice in Wonderland?" When we scan our database, we might get maybe four or five snippets of text that we think are similar to this question. From that, we can put it all together, have the AI read all of that information, and decide what response to give to the user. So we're not just simply returning the chunks of information verbatim; we're actually using them to craft a more custom response that is still based on our source information.

To load the Chroma database that we created, we're first going to need the path, which we have from earlier, and we're going to need an embedding function, which should be the same one we used to create the database in the first place. So here, I'm just going to use the OpenAI embeddings function again. This should load your database from that path. If it doesn't, then just check that the path exists, or go back to the previous chapter and run the script to create the database again. Once the database is loaded, we can then search for the chunks that best match our query by using this method. We need to pass in our query text as an argument and specify the number of results we want to retrieve; in this example, we want the three best matches for our query. The results of the search will be a list of tuples, where each tuple contains a document and its relevance score. Before actually processing the results though, we can also add some checks. For example, if there are no matches, or if the relevance score of the first result is below a certain threshold, we can return early. This will help us make sure that we actually find good, relevant information before moving on to the next step of the process.

So now let's go to our code editor, put all that together, and see what we get. Here I've got the main function. I just made a quick argument parser so I can input the query text on the command line. I've got my embeddings function, I'm going to search the database that I've loaded, and I'm just going to print the content for each page. So I'm going to find the top three results for my query. That's my script, so let's give it a go. Here I'm running my script with the query "How does Alice meet the Mad Hatter?" It's returned the three most relevant chunks in the text that it thought best match our query: we have this piece of information, this piece of information, and then this one here. Now here the chunk size is quite small, so it doesn't have the full context of each part of the text. If you want to change that, you can play with the chunk size variable and make it either bigger or smaller, depending on what you think will give you the best results. But for now, let's move on to the next step and see if we can get the AI to use this information and give us a direct response.
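
Putting the query step into a sketch: load the persisted database with the same embedding function, fetch the top matches, and return early if nothing clears a relevance threshold (the 0.7 cutoff below is an illustrative choice, not something fixed by the library):

    import argparse

    from langchain.embeddings import OpenAIEmbeddings
    from langchain.vectorstores.chroma import Chroma

    CHROMA_PATH = "chroma"

    parser = argparse.ArgumentParser()
    parser.add_argument("query_text", type=str, help="The question to ask.")
    args = parser.parse_args()

    # Reuse the exact same embedding function the database was built with.
    db = Chroma(persist_directory=CHROMA_PATH, embedding_function=OpenAIEmbeddings())

    # Returns a list of (Document, relevance_score) tuples, best match first.
    results = db.similarity_search_with_relevance_scores(args.query_text, k=3)

    if len(results) == 0 or results[0][1] < 0.7:
        print("Unable to find matching results.")
    else:
        for doc, score in results:
            print(f"[{score:.2f}] {doc.page_content}\n")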

12:47

Now that we have found relevant data chunks for our query, we can feed them into OpenAI to create a high quality response using that data as our source. First, we'll need a prompt template to create a prompt with. You can use something like this. Notice that there are placeholders in this template. The first is the context that we're going to pass in, which is going to be the pieces of information that we got from the database, and the second is the actual query itself. Next, here's the code to actually create the prompt by formatting the template with our keys. After running this, you should have a single string. It's going to be quite a long string, but it's going to be the entire prompt, with all the chunks of information and the query that you asked at the beginning. After running that piece of code, you should get a prompt that looks something like this. You're going to have this initial instruction, which is to answer the question based on the following context. Then we're going to have our three pieces of information, and this can be as big or as small as we want it to be, but this is what we've chosen here. And then the question that we originally asked. So here's our query: "How does Alice meet the Mad Hatter?" This is the overall prompt that we're about to send to OpenAI.
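
A sketch of the prompt-building step, assuming langchain's ChatPromptTemplate and the `results` list and `args` from the search sketch above (the template wording is just one reasonable phrasing):

    from langchain.prompts import ChatPromptTemplate

    PROMPT_TEMPLATE = """
    Answer the question based only on the following context:

    {context}

    ---

    Answer the question based on the above context: {question}
    """

    # Join the retrieved chunks into one context block.
    context_text = "\n\n---\n\n".join(doc.page_content for doc, _score in results)

    prompt_template = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
    prompt = prompt_template.format(context=context_text, question=args.query_text)
    print(prompt)  # one long string: instruction, the chunks, then the question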

13:58

This is actually the easy part. Simply call the LLM of your choice with that prompt; here I'm using ChatOpenAI, and then you'll have your response. Finally, if you want to provide references back to your source material, you can also find them in the metadata of each of those document chunks. Here's the code for how you can extract that and print it out as well. Going back to our code editor, this is what my script looks like with all of those pieces put together. I've got my prompt template here, and I've got my main function here, which takes the query, searches the database for the relevant chunks, creates the prompt, and then uses the LLM to answer the question. Then here I'm collecting all the sources that were used to answer the prompt and printing out the entire response. Let's go ahead and run that. Here's the result of running that script. Again, we see the entire prompt here, and this is the final response. The response is that Alice meets the Mad Hatter by walking in the direction where the March Hare was said to live, and obviously it took this from the first piece of the context. Here we also have a list of the source references it got this from. This is pretty much pointing to the same file, because I only made it print out the actual file itself and not the index. But this is pretty good already, because you can see how it's using our query to search for pieces of information from our source material and then answering based on that information.
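
And a sketch of the final step: send the prompt to a chat model and pull the source paths out of each chunk's metadata, reusing `prompt` and `results` from the sketches above (ChatOpenAI stands in for whichever model you prefer; newer langchain versions import it from langchain_openai and use .invoke() instead of .predict()):

    from langchain.chat_models import ChatOpenAI

    model = ChatOpenAI()
    response_text = model.predict(prompt)

    # Each retrieved chunk carries the file it came from in its metadata.
    sources = [doc.metadata.get("source", None) for doc, _score in results]

    print(f"Response: {response_text}")
    print(f"Sources: {sources}")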

15:23

Now let me switch up my data source and show you a different example, just so you can see what else you can do with something like this. I switched my database to one I prepared earlier, which uses the AWS Lambda documentation as a source, and here the query I'm going to ask is: what languages or runtimes does AWS Lambda support? After I ran this, you can see that the chunks I use here are much bigger than in the previous example, but it still managed to find three relevant chunks of information, and it's produced a response that summarizes that information. Here it says AWS Lambda supports Java, C#, Python, and so on; you can read the rest here. But this is more interesting because, unlike in the first example, the sources were actually from different files. You can see here that each of the sources is its own file. So this is also useful if you have a data source that is spread out across a lot of different files and you want to see how to reference the sources. So we just covered how you can use Langchain and OpenAI to create a retrieval augmented generation app. I'll post a link to the GitHub code in the video description, and I encourage you to try this out for yourself with your own data set. If you want to see more tutorials like this, then please let me know what type of topics you'd be interested to see next. Otherwise, I hope you found this useful, and thank you for watching.

Related Tags

AI App, Langchain, OpenAI, Text Retrieval, Data Interaction, RAG Technique, AWS Lambda, Documentation, Chatbot, Embeddings, ChromaDB