Python RAG Tutorial (with Local LLMs): AI For Your PDFs
Summary
TLDR: In this video, we build a Python RAG application that lets you ask natural-language questions about a set of PDFs, such as board game instruction manuals. The app returns answers along with references to the source material. We cover running it locally with open-source LLMs, updating the vector database with new entries, and evaluating the quality of the AI-generated responses. The earlier tutorials are recommended for beginners, and a GitHub repository provides the full source code.
Takeaways
- 📚 The tutorial explains how to build a Python RAG (Retrieval-Augmented Generation) application for asking questions about a set of PDFs, such as board game instruction manuals.
- 🔍 The app answers questions using the content of the PDFs and also provides a reference to the source material.
- 💻 The tutorial covers how to run the app locally using open-source large language models (LLMs).
- 🆕 It also explains how to update the vector database with new entries without rebuilding the entire database.
- 🔧 The video covers testing and evaluating the quality of the AI-generated responses, so that changes to the app can be validated quickly.
- 🔄 RAG stands for Retrieval-Augmented Generation, a way of indexing a data source so that it can be combined with an LLM to create an AI chat experience that uses that data.
- 📈 To build the database, the PDF documents are split into smaller chunks, turned into vectors (embeddings), and stored in the vector database.
- 📝 The demo shows the app answering questions about the rules of Monopoly using data from the instruction-manual PDFs.
- 🔗 Good embeddings are crucial so that queries match the relevant chunks of information in the database.
- 🛠️ The tutorial uses Ollama to manage and run open-source LLMs locally, although other options such as OpenAI or AWS Bedrock are also mentioned.
- 📝 The app can be tested with unit tests made of questions and expected answers, using an LLM to judge whether the responses are equivalent.
Q & A
What is RAG and how does it work in this Python application?
-RAG stands for Retrieval-Augmented Generation. It is a way of indexing a data source so that it can be combined with an LLM (large language model), giving an AI chat experience that draws on that data.
What kinds of documents are used in this example RAG application?
-Board game instruction manuals, such as those for Monopoly or CodeNames, are used as the source documents for this RAG application.
How does the application answer a question about the PDFs?
-The application splits the PDF data into small chunks, turns them into vectors (embeddings), and stores them in a vector database. When a question is asked, the database is searched for the most relevant entries, which are then used to generate the answer.
What are the advantages of using a local LLM to generate the answer?
-A local LLM avoids the costs associated with online services and lets the answer-generation step run entirely on your own computer. In the video, a local Mistral model served by Ollama gives good results as long as the embeddings themselves are good.
How can the application be updated with new entries in the database?
-By giving each chunk of text a unique ID, the application can check whether an item already exists in the database and add it only if it is missing.
What are the key tools and libraries used in this tutorial?
-The tutorial uses Langchain for loading documents, ChromaDB for the vector database, and Ollama for the local LLM.
How can new PDFs be added to the application without rebuilding the database?
-Because every chunk of text has a unique ID, the application can identify new documents and add them to the existing database without rebuilding it.
What is the difference between an LLM and an embedding function?
-An LLM (large language model) is a deep learning model used to generate text, whereas an embedding function turns data into vectors, which act as keys in a vector database.
How can the application evaluate the quality of the AI-generated responses?
-By using unit tests and asking an LLM to judge whether the actual and expected responses are equivalent, the application can determine whether its answers are correct.
How are unit tests used to evaluate the application's responses?
-Unit tests are written with questions and expected answers. The application is queried with those questions, and its responses are compared with the expected ones, using an LLM to judge whether they are equivalent.
What are the potential challenges of using an LLM to evaluate responses in unit tests?
-One challenge is that the LLM may be too generous in its evaluation, which could lead to wrong answers being accepted. It is therefore important to include negative cases to make sure that bad answers are correctly rejected.
Outlines
😀 Building a Python RAG application
In this video, the goal is to build a Python RAG application that lets you ask natural-language questions about a set of PDFs. The PDFs used are board game instruction manuals for games like Monopoly or CodeNames. The app answers questions and provides a reference to the source material. The tutorial covers more advanced features: running locally with open-source LLMs, updating the vector database with new entries without rebuilding the entire database, and testing and evaluating the quality of the AI-generated responses. A quick recap of RAG (Retrieval-Augmented Generation) is given, followed by a demo of the finished app.
📚 Processing the data and building the vector index
The script processes the PDF data by splitting it into small chunks, turning those chunks into vectors (embeddings), and storing them in a vector database. It is essential to use the same embedding function when creating the database and when querying it. Several embedding options are discussed, including AWS Bedrock and Ollama, which manages and runs open-source LLMs locally. Creating the database with ChromaDB is explained, including how a unique identifier for each chunk of text makes it possible to add entries to an existing database.
🔍 Updating the database and managing the data
This section explains how to add new PDFs to the database without rebuilding it from scratch. Each chunk of text is identified by a unique ID based on the file path, the page number, and the chunk index. The app can detect new documents and add them to the database while avoiding duplicates. Updating an existing page is a harder problem that is not covered in this video, but viewers are encouraged to suggest solutions.
🤖 Integrating the LLM and generating responses
Responses are generated with a local language model (Ollama with the Mistral model). The script builds a prompt that contains the chunks of text most relevant to the question, followed by the question itself. The LLM is then invoked to generate an answer based on that context. The importance of embedding quality is stressed, because it directly affects the relevance of the information the app retrieves.
📝 Testing and evaluating response quality
To evaluate the quality of the app's responses, a set of unit tests is used. The tests consist of predefined questions with expected answers, and they use an LLM to decide whether the app's responses are equivalent to the expected ones. This gives an approximate measure of correctness that allows for the many ways an answer can be phrased in natural language. Both positive and negative test cases are used to make sure the evaluation itself is reliable.
🚀 Conclusion and invitation to contribute
The video ends by inviting viewers to suggest topics for future projects, such as deploying to the cloud. A link to the source code on GitHub is provided for those who want to examine or run the full project. The author stresses the importance of understanding the key code snippets and encourages viewers to leave comments to help shape future content.
Keywords
💡RAG
💡LLM (Large Language Model)
💡Embedding
💡Vector database
💡ChromaDB
💡Ollama
💡Unit testing
💡Langchain
💡Mistral
Highlights
Building a Python RAG application for asking natural-language questions about a set of PDFs.
Using PDF instruction manuals for board games like Monopoly or CodeNames as the data to index.
Introducing more advanced features, such as running locally with open-source large language models (LLMs).
Updating the vector database with new entries without a full rebuild.
Evaluating the quality of the AI-generated responses to quickly validate changes to the app.
A demo of the finished RAG app answering questions about game instruction manuals.
Using a local LLM to generate responses based on the data found in the PDFs.
An explanation of how the data transformation and response generation work behind the scenes.
Using Langchain to load the PDF documents, with loaders available for other document types.
Splitting the documents into smaller chunks with Langchain's recursive text splitter.
Creating an embedding for each chunk of text so it can be indexed and stored in the database.
Using AWS Bedrock to generate embeddings, with the option of swapping in other embedding functions.
Introducing Ollama as a platform for managing and running open-source LLMs locally.
Creating the vector database with ChromaDB and handling the addition of items to an existing database.
Setting up a way to test and evaluate the quality of the app's responses with unit tests and an LLM.
Running the app with queries and generating responses from the embedded data.
Using pytest to write unit tests that evaluate the quality of the RAG app's responses.
Using an LLM to judge whether the expected and actual responses are equivalent.
Closing thoughts on adding new features to the RAG app and learning together with the community.
Transcripts
In this video, we're going to build a Python RAG
application that lets us ask questions about
a set of PDFs we have using natural language.
The PDFs I'm going to use here are a bunch of board game instruction
manuals for games like Monopoly or CodeNames.
I can ask questions about my data, like "how do I
build a hotel in Monopoly?" The app will give me
an answer and a reference to the source material.
Now, I have done a basic RAG tutorial before on this
channel, but in this video we're going to take it up
a notch by introducing some more advanced features
that you guys asked about in the comments last time.
We're going to cover how to get it running locally
on your computer using open source LLMs.
I'll also show you how to update the vector database with new entries.
So if you want to modify or add information, you can do that
without having to rebuild the entire database from scratch.
Finally, we'll take a look at how we can test and evaluate
the quality of our AI generated responses.
This way you can quickly validate your app whenever you make
a change to the data source, the code or the LLM model.
All right, let's get started.
If you haven't built an app like this before,
then I highly recommend you to check out my
previous video tutorial on this topic first.
It will help you to get up to speed with all of the basic concepts.
Otherwise, here's a quick recap. RAG stands for Retrieval
Augmented Generation, and it's a way to index a
data source so that we can combine it with an LLM.
This gives us an AI chat experience that can leverage that data.
Here's a quick demo of the completed app.
I have my Python script here and I'm going to
ask a question about my data source, which
is going to be board game instruction manuals.
So I can ask, "how do I build a hotel in Monopoly?"
And the result is that it gives me a response based on the
data that it found in the PDF sources that I provided it.
So the response is going to use that and actually phrase
it into a proper natural language response.
It's not just going to copy and paste the raw data source.
And here it's telling me that if I want to build
a hotel, I need to have four houses in a single
color and then I can buy the hotel from the bank.
And in this version of the app, I'm also using
a local LLM model to generate this response.
So here I have my Ollama server running in a separate terminal.
If you don't know what that is yet, that's okay. We'll cover it later.
But here's the actual LLM reading the question
and then turning this into a response.
Here's a quick recap on how that all works behind the scenes.
First, we have our original data source, the PDFs.
This data is going to be split into small chunks
and then transformed into an embedding
and stored inside of the vector database.
Then when we want to ask a question, we'll also turn our query into an embedding.
This will let us fetch the most relevant entries from the database.
We can then use those entries together in a prompt
and that's how we get our final response.
For this tutorial, we're going to mainly focus on the
features I mentioned at the beginning of the video.
But for everything else, we're going to be speeding through it a little bit.
So if you feel like it's all going a little bit
too fast, you can either check out my previous
RAG tutorial video first to learn the basics.
Or you could also follow along by looking through the code itself on GitHub.
Links will be in the description.
Here are the main dependencies I'll be using in this project.
So go ahead and install or update them first before you start.
First, we'll need some data to feed our RAG application with.
Gather some documents that you'd like to use as your source material.
In my previous video, a lot of you asked me how to do this with PDFs.
So I'm going to be using PDFs here.
I'm going to use board game instruction manuals.
I've got one for Monopoly and I've also got one for Ticket to Ride.
And I just found these for free online.
So you can use whatever you want, but this is what I'm going to use here.
Just download the PDFs you want to use online and then put them inside a folder.
In this case, I've put it inside this data folder here in my project.
This is the code I can then use to load the documents from inside that folder.
It's using a PDF document loader that comes with the Langchain library.
And for future reference, if you want to load other types of
documents, you can head over to the Langchain documentation.
Look up document loaders and then just pick from any
of the various available document loaders here.
There's things for CSV files, a directory, HTML, Markdown and Microsoft Office.
And if that's still not enough, you can click
on the document loader integrations and there's
a whole list of third-party document loaders
available for you to choose from as well.
And if you want to see what one of these documents
looks like after you've loaded it,
you could just go ahead and print it out.
You should see an object like this.
So each document is basically an object containing
the text content of each page in the PDF.
It also has some metadata attached, which tells
you the page number and the source of the text.
Our next problem is that each document or each page
of the PDF is probably too big to use on its own.
We'll need to split it into smaller chunks and we can use Langchain's
built-in recursive text splitter to do exactly that.
After you run that on your documents, you'll find that each chunk is a lot smaller.
So this is going to be handy when we index and store the data.
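To make the splitting step concrete, here is a minimal pure-Python sketch of the recursive idea: try coarse separators first (paragraphs, then lines, then words) and fall back to finer ones only when a piece is still too long. It is a simplified stand-in, not Langchain's actual splitter (for instance, it has no chunk overlap), and the function name `split_text` and the default sizes are assumptions made for illustration.

```python
def split_text(text, chunk_size=80, separators=("\n\n", "\n", " ", "")):
    """Simplified recursive splitter (illustrative, not Langchain's implementation)."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    sep, finer = separators[0], separators[1:]
    if sep == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            # The piece still fits, so keep growing the current chunk.
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry with the finer separators.
            chunks.extend(split_text(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


page = "Buying property.\n\nWhen you land on an unowned property you may buy it. " * 5
for chunk in split_text(page, chunk_size=60):
    print(repr(chunk))
```

Every chunk this returns is at most `chunk_size` characters long, which is the property that matters for the indexing step.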
Next, we'll need to create an embedding for each chunk.
This will become something like a key for a database.
I actually recommend creating a function that returns
an embedding function because we're actually going to
need this embedding function in two separate places.
The first is going to be when we create the database itself.
And the second is when we actually want to query the database.
And it's very important that we use the exact same
embedding function in both of these places.
Otherwise, it's not going to work.
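The "function that returns an embedding function" pattern can be sketched as below. The embedding here is a toy hashed bag-of-words, used only so the example runs without any external service; it is not what the video uses (AWS Bedrock or Ollama embeddings), and the name `get_embedding_function` and the `dimensions` parameter are hypothetical. The point is that the database builder and the query code both call the same factory.

```python
import hashlib
import math


def get_embedding_function(dimensions=64):
    """Return an embedding function; call this in BOTH the indexing and the query code."""

    def embed(text):
        # Toy embedding (assumption for illustration): hash each word into a bucket.
        vector = [0.0] * dimensions
        for token in text.lower().split():
            digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
            vector[int(digest, 16) % dimensions] += 1.0
        # Normalise to unit length so vectors are comparable.
        norm = math.sqrt(sum(v * v for v in vector)) or 1.0
        return [v / norm for v in vector]

    return embed


# Indexing side and query side obtain their function from the same place.
index_embed = get_embedding_function()
query_embed = get_embedding_function()
print(index_embed("build a hotel") == query_embed("build a hotel"))
```

Swapping the embedding model then means changing one function, and both sides stay in sync.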
Langchain also comes with a lot of different embedding functions you can use.
In this case, I'm using AWS Bedrock because I tend
to build a lot of stuff using AWS already.
And the results are pretty good, from what I can tell.
But you can switch to using a different embedding function as well.
You can choose from any of the embedding integrations
listed here on the Langchain website.
For example, if you want to run it completely locally on your
own computer, you can use an Ollama embedding instead.
Of course, for this to work, you also need to install Ollama
and run the Ollama server on your computer first.
If you haven't used Ollama before, you can think
of it as a platform that manages and runs
open source LLMs locally on your computer.
Just download it from the official website, ollama.com,
and then install any of the available
open source models like Llama2 or Mistral.
You can then run this command to serve the model as a REST API on your local host.
Now, you'll be able to use an LLM just by calling this local API.
Of course, the Langchain module for Ollama embeddings will handle
all of this for you as long as the server is running.
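For readers curious what "calling this local API" looks like without Langchain, here is a minimal sketch using only the standard library. It assumes Ollama's default address (`http://localhost:11434`) and its `/api/generate` endpoint, and that the `mistral` model has already been downloaded; the helper names are hypothetical. The server is typically started with `ollama serve`.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt, model="mistral"):
    """Build the HTTP request for a single, non-streaming completion."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="mistral"):
    """Send the prompt to the local server (only works while Ollama is running)."""
    with urllib.request.urlopen(build_generate_request(prompt, model)) as reply:
        return json.loads(reply.read())["response"]


request = build_generate_request("How do I build a hotel in Monopoly?")
print(request.full_url, request.get_method())
```

With the server running, `generate("...")` returns the model's text. In the project itself, the Langchain integration wraps this kind of call.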
However, just as a heads up, for my own testing
using one of the 4GB models on Ollama, the
embedding results just weren't very good.
For RAG apps, having good embeddings is essential,
otherwise your queries won't match up with the chunks
of information that are actually relevant.
So for myself on this project, I'm still going to use a
service like OpenAI or AWS Bedrock for the embeddings.
But if your computer can handle it, you can try
using a larger, more powerful model on Ollama
as well, and please let me know how that goes.
By the way, some of you might be wondering at this point,
how did I measure the quality of the embeddings?
Well, we'll get to that later when we look at testing.
Now let's walk through the process of creating the database.
Once we have the documents split into smaller chunks, we can use
the embedding function to build a vector database with it.
So just as a quick recap, a vector is something like
a list of numbers, and our embeddings are actually
a vector because they're just a list of numbers.
So a vector database lets us store information
using vectors as something like a key.
And in this video, we're going to be using ChromaDB as our vector database.
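As a rough mental model of what a vector database does, the sketch below stores (vector, text) pairs in a plain list and ranks them by cosine similarity to a query vector. This is an illustration only: ChromaDB does this far more efficiently, and the three-dimensional vectors and the names `cosine_similarity` and `top_k` are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, store, k=2):
    """Return the texts of the k entries whose vectors are closest to the query."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vector, item[0]),
        reverse=True,
    )
    return [text for _, text in ranked[:k]]


# Hand-written toy vectors standing in for real embeddings.
store = [
    ([1.0, 0.0, 0.0], "Rules for building hotels"),
    ([0.0, 1.0, 0.0], "Rules for getting out of jail"),
    ([0.9, 0.1, 0.0], "Rules for building houses"),
]
print(top_k([1.0, 0.0, 0.0], store, k=2))
```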
In my first video, we actually had code that looked
a lot like this, and it's useful if we wanted
to create a brand new database from scratch.
But what if we wanted to add or update items in an existing database?
ChromaDB will let us do this too, but first we'll
need to tag every item with a string id.
Let's go back to our chunk of text and figure out how we can do this.
So as you can see, each chunk already has its source file path and a page number.
So what if we put it together to do something like this?
We'll use the source path, the page number, and then the chunk number of that page.
Because remember, a single page could have several chunks.
That way, every chunk will have a unique but deterministic id.
We can then use this to see if this particular chunk exists in
the database already, and if it's not, then we can add it.
Implementing this is pretty easy as well.
We can loop through all the chunks and look at its metadata.
We'll concatenate the source and the page number to make an id.
But because a single page is split up into multiple chunks,
we actually have many chunks sharing the same page id.
Solving this is pretty easy though.
We can just keep count of the chunk index for a page,
and then reset it to zero whenever we see a new page.
So putting all that together, we now have a
chunk id that looks something like these.
Each chunk is now guaranteed a unique and deterministic id.
Let's add it back into the metadata of the chunk as well so we can use it later.
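The ID scheme described above can be sketched as follows. Chunks are represented as plain dictionaries rather than Langchain document objects, and the exact `source:page:index` format and the function name `assign_chunk_ids` are assumptions for illustration; the logic (count chunks within a page, reset on a new page) is the one described in the video.

```python
def assign_chunk_ids(chunks):
    """Give each chunk a deterministic ID of the form 'source:page:chunk_index'."""
    last_page_id = None
    chunk_index = 0
    for chunk in chunks:
        meta = chunk["metadata"]
        page_id = f"{meta['source']}:{meta['page']}"
        if page_id == last_page_id:
            # Same page as the previous chunk: move to the next index.
            chunk_index += 1
        else:
            # New page: start counting from zero again.
            chunk_index = 0
        meta["id"] = f"{page_id}:{chunk_index}"
        last_page_id = page_id
    return chunks


chunks = [
    {"text": "a", "metadata": {"source": "data/monopoly.pdf", "page": 6}},
    {"text": "b", "metadata": {"source": "data/monopoly.pdf", "page": 6}},
    {"text": "c", "metadata": {"source": "data/monopoly.pdf", "page": 7}},
]
for chunk in assign_chunk_ids(chunks):
    print(chunk["metadata"]["id"])
```

This relies on chunks arriving in document order, which is what a page-by-page loader followed by a splitter produces.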
Now, if we add new PDFs or add new pages to an existing
PDF, our system will have a way to check
whether it's already in the database or not.
So let's hop over to the code editor and see this in action.
Currently, in my data folder, I've got a Monopoly PDF and a Ticket to Ride PDF.
So now I'm going to add a new PDF to this folder.
It's going to be the one for CodeNames.
This is the one I'm adding.
So now when I populate the database, I want my program to detect
that this one is new, but the other two already exist.
So I only want this one to be added.
So here, right away, it's quickly detected
that there's 41 documents already inside the
database, but we have 27 new documents
that we need to add just because I moved that
new PDF into the data directory as well.
So that was a new one.
And this time, even if we run the same command
again to populate the database, it can see that
all the documents, all the PDFs inside that
data folder have already been added from the previous
step and there was nothing new to add.
So this is exactly the behavior that we want.
Although this implementation will let us add
new data without having to recreate the entire
database itself, it's actually not enough
for us if we wanted to edit an existing page.
For example, if I modify the PDF content in this chunk
here, the chunk ID will still be exactly the same.
So how do we know when we need to actually update this page?
This problem is out of scope for today, but
there's actually many ways to solve this.
If you think you know the solution, then please share it in the comments.
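One possible approach, which is not covered in the video and is offered only as a sketch: store a hash of each chunk's text next to its ID, and treat a chunk as changed when the ID is already known but the hash differs. The names `content_hash` and `classify_chunk`, and the idea of keeping the hashes in a dictionary, are assumptions for illustration.

```python
import hashlib


def content_hash(text):
    """Fingerprint of the chunk text; any edit to the text changes it."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def classify_chunk(chunk_id, text, stored_hashes):
    """Return 'new', 'changed' or 'unchanged' for a chunk about to be indexed."""
    if chunk_id not in stored_hashes:
        return "new"
    if stored_hashes[chunk_id] != content_hash(text):
        return "changed"
    return "unchanged"


# Hashes recorded the last time the database was populated.
stored = {"data/monopoly.pdf:6:0": content_hash("Old rule text")}
print(classify_chunk("data/monopoly.pdf:6:0", "Old rule text", stored))
print(classify_chunk("data/monopoly.pdf:6:0", "Edited rule text", stored))
print(classify_chunk("data/monopoly.pdf:6:1", "Another chunk", stored))
```

A chunk classified as changed would then need to be re-embedded and its stored entry replaced.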
Now let's close the loop on this and actually take a look
at the code that you need for updating your database.
Now that we've given every chunk a unique ID, let's add them to the database.
If you're using Chroma, you can first load up your database like
this, using the same embedding function we used earlier.
Let's go through all the items in the database and get all of the IDs.
If you're running this for the very first time, then this should be an empty set.
After that, we can filter through all of the chunks we're about to add.
If we don't see an ID inside the set, that means
it's a new chunk and we should add it.
From there, it's all pretty easy.
It's just a few lines to add the documents to the database.
Just don't forget to also add the IDs explicitly as well.
If you don't specify a matching list of IDs for
the items that you're adding, then Chroma will
generate new UUIDs for us automatically.
It's convenient, but it also means that we won't be able
to check for the existing items like we did earlier.
So if that's the case, when we try to add new
items, we're just going to end up with a
lot of duplicated items inside the database.
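The "only add what is new" logic can be sketched with a plain dictionary standing in for the database. This is not the Chroma API; it only shows the filtering step and why the IDs must be supplied explicitly. The function name `add_new_chunks` and the dictionary-based chunks are assumptions for illustration.

```python
def add_new_chunks(database, chunks):
    """Add only chunks whose ID is not already stored; return how many were added."""
    existing_ids = set(database.keys())  # empty on the very first run
    new_chunks = [c for c in chunks if c["metadata"]["id"] not in existing_ids]
    for chunk in new_chunks:
        # Store under our own deterministic ID, never an auto-generated one,
        # otherwise the existence check above could not work on the next run.
        database[chunk["metadata"]["id"]] = chunk
    return len(new_chunks)


database = {}
chunks = [
    {"text": "a", "metadata": {"id": "data/monopoly.pdf:0:0"}},
    {"text": "b", "metadata": {"id": "data/monopoly.pdf:0:1"}},
]
print(add_new_chunks(database, chunks))  # first run: everything is new
print(add_new_chunks(database, chunks))  # second run: nothing to add
```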
Now let's put all this together and make this not just
functional, but also able to run locally as well.
If you were using Ollama's local embeddings from before, you'll
be able to do everything 100% locally, end to end.
Or you might end up with more of a hybrid approach like me.
I use an online embedding model because it's better than what I can do locally.
But I found that as long as the embeddings are good,
I can actually get pretty impressive results using
a local LLM to do the actual chat interface.
So that's what we're going to do here.
We can start by creating a new Python script or
function that will take our query as input.
We'll also have to load the embedding function and the database.
We'll need to prepare a prompt for our LLM.
Here's the template I'm going to use.
There's two variables we'll need to replace here.
First is the context, which is going to be all the chunks
from our database that best matches the query.
And then second, it's the actual question that we want to ask.
So we'll put that whole thing together and then we get
the final prompt that we want to send to our LLM.
To retrieve the relevant context, we'll need to search
the database, which will give us a list of
the top K most relevant chunks to our question.
Then we can use that together with the original
question text to generate the prompt.
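Assembling the prompt can be sketched like this. The wording of the template and the `---` separator between chunks are assumptions, since the transcript does not spell them out; what matters is the structure: retrieved context first, then the question restated at the end.

```python
# Assumed template wording, for illustration only.
PROMPT_TEMPLATE = """Answer the question based only on the following context:

{context}

---

Answer the question based on the above context: {question}
"""


def build_prompt(context_chunks, question):
    """Join the top-k chunks into one context block and fill in the template."""
    context = "\n\n---\n\n".join(context_chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


retrieved = [
    "A player in jail may pay a fine before rolling the dice.",
    "Rolling doubles on any of the next three turns also frees the player.",
]
print(build_prompt(retrieved, "How do I get out of jail in Monopoly?"))
```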
If you decide to print out the entire prompt at
this stage, you should see something like this.
So you've got your entire prompt template here, but you
could see that our context section already has some of
the chunks from the instruction manual formatted in.
And I put my k=5, so there's actually five different chunks.
And this is all part of one big prompt.
This is the information that my system thought
was the best matching to answer our query.
And then I kind of reiterate the question that I want right
at the end after I've given all of this context.
So here the question is, how many clues can I give in CodeNames?
And the response is, in CodeNames you can only give one
clue per turn, and the clue should be a single word.
And then I also have the sources of this answer cited here,
so that's basically where all these chunks were found.
After you have the prompt, the rest is super easy.
All you have to do is just invoke an LLM with the prompt.
Here I'll use the Mistral model on my local Ollama server.
It only needs four gigabytes to run, but it's actually quite capable.
And if you want, you can also get the original source of the text like this.
Now let's go back to our terminal and see this in action.
So I'm going to use this program and I'm going to query it.
How do I get out of jail in Monopoly?
And now the program stopped running, so let's go and see what it did.
Here you can see that we find all the relevant chunks.
So this one is the most relevant, and it's actually spot on.
It actually gives us step-by-step instructions on how to get out of jail.
So I think really this is the only one we need.
But anyways, we put our limit to five, so we also get a bunch
of other chunks that may be relevant to the question.
And then as part of the prompt, we reiterate the question
again so that our LLM knows what to answer.
And using all of that information, this is the response our LLM came up with.
So it came up with four different ways we can get out of jail in Monopoly.
And then right at the end, we also have the sources of all of this information.
So that's what it's like when we run the entire application.
And even though I used AWS Bedrock for the embeddings,
because I couldn't get local embeddings
that were good enough, this part to generate
the response still uses a local Ollama server.
So if I go to my other terminal here, see where
my Ollama server is running, you could
see it logging the work that we're doing.
We now have a RAG application that works quite well end-to-end.
We can get it to answer our questions by using the embedded
source material, but the quality of the answers we
get would depend on quite a lot of different factors.
For example, it could depend on the source material
itself, or the way we split the text.
And it will also 100% depend on the LLM model we
use for the embedding and the final response.
So the problem we have now is, how do we evaluate the quality of responses?
This seems to be a subjective matter.
Let's see if we can approach this with unit testing.
If you've never worked with unit tests in Python
before, then you can also check out my other
video on how to get started with pytest.
The main idea here is to write some sample questions and also
provide the expected answer for each of those questions.
So given a question like, "How much total money does
a player start with in Monopoly?", the answer I'd
expect my RAG application to respond with is 1500.
You want it to be something that you can already
validate or already know the answer for.
We can then run the test by passing the question
into our actual app, and then comparing
and asserting that the answer matches.
But the challenge with this is that we can't do
a strict equality comparison, because there could
be many ways to express the right answer.
So what we can do instead is actually use an LLM to judge the answer for us.
This won't always guarantee perfect results, but it does get us pretty close.
We can start by having a prompt template like this, that asks
the LLM to judge whether these responses are equivalent.
Then, as part of our test, we'll query the
RAG app with our question, and then we'll
create a prompt based on the question, the
expected response, and the actual response.
We can then invoke our LLM again to give us its opinion.
We can clean up the response we get from that, and
finally check whether the answer is true or false.
And this is something we'll actually be able to assert on as part of our unit test.
So putting all that together, I can wrap this into
a nice helper function that returns true or false.
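A sketch of such a helper is shown below. To keep it runnable without a model, the RAG app and the judging LLM are passed in as plain functions; in the real project both would call the LLM. The prompt wording and the names `EVAL_PROMPT` and `query_and_validate` are assumptions for illustration.

```python
# Assumed wording for the judging prompt.
EVAL_PROMPT = """Expected Response: {expected}
Actual Response: {actual}
---
(Answer with 'true' or 'false') Does the actual response match the expected response?
"""


def query_and_validate(question, expected, ask_app, ask_judge):
    """Ask the app, then ask a judge LLM whether the answer matches the expectation."""
    actual = ask_app(question)
    verdict = ask_judge(EVAL_PROMPT.format(expected=expected, actual=actual))
    cleaned = verdict.strip().lower()
    if "true" in cleaned:
        return True
    if "false" in cleaned:
        return False
    # The judge ignored the instructions; fail loudly instead of guessing.
    raise ValueError(f"Judge returned neither 'true' nor 'false': {verdict!r}")


# Stand-ins for the real app and judge, so the sketch runs offline.
result = query_and_validate(
    "How much total money does a player start with in Monopoly?",
    "$1500",
    ask_app=lambda question: "Each player starts with $1,500.",
    ask_judge=lambda prompt: " True\n",
)
print(result)
```

Raising an error on an unparseable verdict keeps a confused judge from silently counting as a pass or a fail.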
Then, I can just write a bunch of unit tests using that helper
function, and I can write as many test cases as I want.
This will give me a quick way to see how well my application
is performing, especially after I make updates to
the code, the source documents, or the LLM model itself.
Now let's hop over back to our editor to do a quick demo.
So I've got my test file here, and here is the helper
function that you saw earlier, and here is us trying
to interpret that result into either a true
or a false result, and here is the prompt template.
So these are going to be my two test cases.
I'm going to test the Monopoly rules, and I'm
also going to test the Ticket to Ride rules.
So two test cases. Let's see how it does.
Okay, and in this case, both of my test cases passed.
Let's expand this window and actually take a bit of a closer look.
So here, my expected response is 10 points,
and the actual response is "The longest continual
train gets a bonus of 10 points."
So these are not exactly the same string,
but they're still saying the same thing.
And this is true. So this was successful.
And then if I go up to my Monopoly one, the expected response
is 1,500, and the actual response is also 1,500.
And as you can see again, the format is slightly
different, so we need the LLM to tell us whether
or not these actually mean the same thing.
So this one passed as well. In this case, both of our tests passed.
Now, we have to be careful with this because we
don't know whether it passed because the evaluation
was good and the answer was correct, or
if our LLM turns out to be too generous, we might
actually end up passing the wrong answers.
So it's also good to do a negative test case to kind of check that.
So what we could do is we can turn this expected
response into something we know that's wrong
and then check that it actually fails.
We want it to fail in that case.
So I'm going to put 9999.
Okay, and I'm now running that test again, expecting this case to fail.
And here it actually does fail, which is good. That's exactly what we wanted.
So we have our fake expected response of 9999,
and then the actual response is still the same
from when we asked it before, which is 1,500.
And our LLM evaluation correctly determines that this is the wrong response.
So our test will fail in this case, and our entire test suite will fail.
However, if we want a failing test, if we want this
negative case to be used as part of our suite in the
correct way, what we could actually do is go back
to our test case here and then invert the assertion.
So instead of asserting that this is true, we can
assert that this is actually going to fail.
And that also tells us that this answer should be wrong, and
something is wrong if it's not wrong, if that makes sense.
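In pytest style, a positive case and an inverted negative case might look like the sketch below. The app and the judge are replaced by simple stand-ins so the file runs on its own: `fake_rag_app` returns a fixed answer and `fake_judge` just compares the digits in both strings. Both are hypothetical placeholders for the real RAG query and the LLM-based comparison.

```python
def fake_rag_app(question):
    """Stand-in for the real RAG query function (assumption for illustration)."""
    return "Each player starts with $1,500."


def fake_judge(expected, actual):
    """Crude stand-in for the LLM judge: compare only the digits in both strings."""
    def digits(text):
        return "".join(ch for ch in text if ch.isdigit())
    return digits(expected) == digits(actual)


def test_monopoly_starting_money():
    # Positive case: the judge should accept the correct answer.
    actual = fake_rag_app("How much total money does a player start with in Monopoly?")
    assert fake_judge("$1500", actual)


def test_monopoly_wrong_answer_is_rejected():
    # Negative case: the assertion is inverted, so the test passes
    # only if the judge rejects a deliberately wrong expectation.
    actual = fake_rag_app("How much total money does a player start with in Monopoly?")
    assert not fake_judge("$9999", actual)
```

If the negative test ever starts failing, the judge has become too generous and the positive results can no longer be trusted.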
So let's go ahead and run this again.
So this time the LLM still believes that the
response doesn't match, and it's false.
But because we've inverted the assert case, the
entire test suite still manages to pass.
So I recommend that if you're going to write tests for
LLM applications like this, it's good to have both
positive cases and negative cases being tested.
And by the way, if you do have a lot of different
test cases you want to use, you maybe don't
need to assert that 100% of them succeed.
You could maybe set a threshold for what is good enough.
For example, 80% or 90%.
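A threshold check like that could be sketched as follows, assuming each test case has already been reduced to a true or false result; the function names and the default of 0.8 are illustrative.

```python
def pass_rate(results):
    """Fraction of test cases that passed (0.0 for an empty list)."""
    return sum(results) / len(results) if results else 0.0


def suite_passes(results, threshold=0.8):
    """Accept the suite when the pass rate reaches the chosen threshold."""
    return pass_rate(results) >= threshold


results = [True] * 9 + [False]  # nine of ten cases passed
print(pass_rate(results), suite_passes(results, threshold=0.8))
```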
So now you've leveled up your project by learning
how to use different LLMs, including a
local one, and you've also learned how to add
new items to your database, and how to test
the quality of your application as a whole.
These were all topics that were brought up in the
comment section of my previous RAG tutorial.
And so after watching this, if there's more
things you'd like to learn how to do, like
deploy this to the cloud for example, then
let me know in the comments of this video and
we can build it together in the next one.
I know we went through the project quite quickly.
My focus here was to show you the coding snippets that
mattered the most and helping you to understand them.
So I've actually had to simplify a bunch of the code and the ideas along the way.
But if you want to take a closer look and see
how all the pieces fit together into a project,
or you just want to download code
that you can run right away, then check out
the GitHub link in the video description.
There you'll have access to the entire project that
I used for this video, and something that I was
running end-to-end as you saw in the demo here.
Anyways, I hope this was useful, and I'll see you in the next one.