Explained: The Voiceflow Knowledge Base (Retrieval Augmented Generation)
Summary
TL;DR: This video script introduces Retrieval-Augmented Generation (RAG), a key feature in AI assistants that allows them to answer questions based on uploaded documents. The script explains how documents are broken into chunks and stored in a vector database, which the AI then uses to find relevant information to answer user queries. It also covers the technical aspects of implementing RAG in Voiceflow, including the use of AI models, system prompts, and chunk limits to optimize the accuracy and efficiency of the AI's responses.
Takeaways
- 📚 The script introduces Retrieval-Augmented Generation (RAG), a feature in AI tools that allows AI to answer questions based on uploaded documents.
- 🛠️ The process involves using a vector database to break documents into chunks and match them with user queries to provide relevant information.
- 🔍 Voiceflow's knowledge base is a key feature that enables the creation of AI assistants that can access and utilize uploaded documents to answer questions.
- 📈 The script explains the technical process of how documents are broken into chunks, stored in a vector database, and then matched to user queries.
- 📝 Chunks are small snippets of text from the original document, which are used to provide context to the AI model when assembling answers.
- 🧲 The AI model uses the similarity of concepts within the question to find relevant chunks from the vector database to answer the user's query.
- 🔑 The script highlights the importance of choosing the right AI model, adjusting settings like temperature, max tokens, and chunk limit for optimal results.
- 🔄 The accuracy of the AI's answers depends on the quality of the chunks provided, and the script suggests testing and refining the knowledge base for better accuracy.
- 🛑 The script mentions the ability to debug and inspect the process, including viewing the API information and similarity scores of chunks used in forming answers.
- 🔧 The script discusses the importance of optimizing chunk usage and the trade-off between accuracy and token usage when increasing the chunk limit.
- 🔗 The script concludes with a mention of Voiceflow's API documentation, which allows developers to build custom solutions and integrate document management with the knowledge base.
Q & A
What is Retrieval-Augmented Generation (RAG) and why is it important for AI assistants?
-Retrieval-Augmented Generation (RAG) is a function that allows AI assistants to upload documents and answer questions based on those documents. It is important because it enables AI to provide valuable and contextually accurate answers by referencing specific information from the uploaded documents.
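The retrieve-then-generate loop described above can be sketched in a few lines of Python. The retriever here ranks snippets by simple word overlap and the "model" is a stub that echoes the best snippet; both are stand-ins so the flow runs without external services, not Voiceflow's actual implementation.

```python
# Minimal RAG loop sketch: retrieve relevant snippets, then "generate" an
# answer from them. The knowledge snippets, retriever, and model stub are
# all invented for illustration.

KNOWLEDGE = [
    "Returns are accepted within 30 days of purchase.",
    "Standard shipping takes 3-5 business days.",
    "Refunds are issued to the original payment method.",
]

def retrieve(question: str, top_k: int = 2) -> list[str]:
    """Rank snippets by word overlap with the question (toy retriever)."""
    q_words = set(question.lower().split())
    scored = [(len(q_words & set(s.lower().split())), s) for s in KNOWLEDGE]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:top_k]]

def generate(question: str, context: list[str]) -> str:
    """Stand-in for the LLM call: echo the best supporting snippet."""
    return f"Based on our docs: {context[0]}"

question = "What is your returns policy?"
answer = generate(question, retrieve(question))
print(answer)
```

A real pipeline swaps the overlap retriever for vector similarity search and the stub for an LLM call, but the control flow is the same.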
What is the role of a vector database in RAG?
-A vector database is used in RAG to store and manage the chunks of text from uploaded documents as vectors. These vectors represent the content of the chunks and help the AI model to identify and retrieve the most relevant information when answering questions.
How does Voiceflow's knowledge base differ from a traditional AI's base layer of knowledge?
-Voiceflow's knowledge base allows users to upload specific documents that the AI can then reference for answering questions. Unlike a traditional AI's base layer, which relies on general knowledge from its training data, Voiceflow's knowledge base is tailored to the user's provided documents, offering more customized and specific answers.
What happens when a document is uploaded to Voiceflow's knowledge base?
-When a document is uploaded, Voiceflow breaks it down into smaller sections called chunks. These chunks are then processed and stored in a vector database, where they are turned into vectors that represent the content of the document.
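As a rough sketch of this chunking step: the ~2,500-character size and slight overlap come from the video, but the exact splitting logic below is an assumption, not Voiceflow's internals.

```python
# Illustrative chunking: split a document into fixed-size pieces that
# overlap slightly, so sentences cut at a boundary still appear whole
# in at least one chunk. Sizes are the rough figures from the video.

def chunk_text(text: str, chunk_size: int = 2500, overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks of about chunk_size characters."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
    return chunks

document = "Our return policy allows returns within 30 days. " * 200
chunks = chunk_text(document)
print(len(chunks), len(chunks[0]))
```

Each chunk would then be embedded and written to the vector database.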
How does Voiceflow determine which chunks of information to use when answering a question?
-Voiceflow uses a model to compare the user's question with the chunks in the vector database, identifying the most similar chunks based on the concepts within the question. It then selects the most relevant chunks to include in the AI model's response.
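This similarity matching can be illustrated with cosine similarity over toy vectors. Real systems embed text with a neural model; here each "embedding" is hand-made so the matching logic is visible, and the chunk texts and numbers are invented for the example.

```python
# Toy vector-database lookup: rank stored chunks by cosine similarity
# to the question's embedding. All vectors here are made up.

import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Pretend vector database: chunk text -> embedding.
vector_db = {
    "Returns are accepted within 30 days.": [0.9, 0.1, 0.0],
    "Our products ship worldwide.": [0.1, 0.9, 0.1],
    "Refunds are issued to the original card.": [0.7, 0.3, 0.1],
}

question_embedding = [0.85, 0.15, 0.05]  # e.g. "what is your return policy?"

ranked = sorted(
    vector_db.items(),
    key=lambda item: cosine_similarity(question_embedding, item[1]),
    reverse=True,
)
for text, _ in ranked:
    print(text)
```

The returns chunk ranks first because its vector points in nearly the same direction as the question's, which is what "similar concepts" means geometrically.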
What is the significance of the chunk limit setting in Voiceflow?
-The chunk limit setting determines how many of the most similar chunks are selected to answer a question. Increasing the chunk limit can improve the accuracy of the answer by providing the AI model with more information, but it also increases the number of tokens used per response.
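The trade-off can be made concrete with a sketch: selecting the top-N chunks by similarity, then estimating the input tokens they cost. The scores and chunk texts are made up, and the 4-characters-per-token ratio is the rough rule of thumb from the video, not an exact tokenizer.

```python
# Sketch of the chunk-limit trade-off: more chunks = more context for
# the model, but more input tokens per response.

scored_chunks = [
    ("chunk about returns", 0.93),
    ("chunk about refunds", 0.88),
    ("chunk about shipping", 0.74),
    ("chunk about pricing", 0.61),
    ("chunk about careers", 0.32),
]

def select_chunks(chunks, chunk_limit):
    """Keep the N most similar chunks, as the chunk-limit setting does."""
    ranked = sorted(chunks, key=lambda c: c[1], reverse=True)
    return ranked[:chunk_limit]

def estimate_input_tokens(chunks, chars_per_token=4):
    """Rough token estimate from total character count."""
    return sum(len(text) for text, _ in chunks) // chars_per_token

for limit in (1, 3, 5):
    picked = select_chunks(scored_chunks, limit)
    print(limit, estimate_input_tokens(picked))
```

Raising the limit from 1 to 5 roughly quintuples the input-token cost here, which is the effect the video warns about.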
How can the quality of answers from Voiceflow's knowledge base be improved?
-The quality of answers can be improved by ensuring the chunks in the knowledge base are accurate and relevant. This involves testing the knowledge base, reviewing the chunks used in answers, and updating or removing documents that do not provide correct information.
What is the purpose of the system prompt in Voiceflow's knowledge base?
-The system prompt is used to frame the response from the AI model. It can be customized to influence the length, format, or style of the answer, such as requiring a certain number of sentences or a specific structure like bullet points.
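The assembled input to the AI model might look like the sketch below: the system prompt framing the answer, the retrieved chunks as context, then the user's question. This is a generic RAG prompt layout, not Voiceflow's exact internal template.

```python
# Generic RAG prompt assembly: system prompt + retrieved chunks + question.

def build_prompt(system_prompt: str, chunks: list[str], question: str) -> str:
    """Combine the pieces into one prompt for the AI model."""
    context = "\n".join(f"- {c}" for c in chunks)
    return (
        f"{system_prompt}\n\n"
        f"Use only the information below to answer.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        f"Answer:"
    )

prompt = build_prompt(
    system_prompt="You are a helpful assistant. Answer in at most three sentences.",
    chunks=["Returns are accepted within 30 days of purchase."],
    question="What is your return policy?",
)
print(prompt)
```

Editing the first line (the system prompt) is how you steer length, format, or persona without touching the retrieval side.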
How does Voiceflow handle token usage when multiple chunks are used to answer a question?
-Token usage in Voiceflow is influenced by both the input (number of chunks) and the output (length of the answer). When more chunks are used, the input token count increases, which raises the overall token usage for each response.
What are some of the developer tools available for Voiceflow's knowledge base?
-Voiceflow provides APIs for uploading, deleting, and replacing documents in the knowledge base, as well as for retrieving answers. These APIs can be used to build custom services or widgets that integrate with Voiceflow, allowing for automated updates to the knowledge base.
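A script driving such an upload endpoint could look roughly like this. The base URL, route, header names, and payload shape below are placeholders (check developer.voiceflow.com for the real routes, authentication, and parameters); the snippet only builds the request object, so it runs without network access.

```python
# Hypothetical sketch of automating knowledge-base document uploads over
# HTTP. Endpoint path and payload fields are placeholders, not the real
# Voiceflow API surface.

import json
import urllib.request

API_BASE = "https://api.example.com/knowledge-base"  # placeholder base URL
API_KEY = "YOUR_API_KEY"  # placeholder credential

def upload_document(file_name: str, content: str) -> urllib.request.Request:
    """Build (but do not send) an upload request for one document."""
    payload = json.dumps({"name": file_name, "content": content}).encode()
    return urllib.request.Request(
        f"{API_BASE}/docs/upload",
        data=payload,
        headers={
            "Authorization": API_KEY,
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = upload_document("returns.txt", "Returns are accepted within 30 days.")
print(req.full_url, req.method)
```

Sending the request with `urllib.request.urlopen(req)` (or a library like `requests`) and wiring it to a CMS webhook is what "automatically updating the knowledge base" amounts to.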
How can Voiceflow's knowledge base be used to create custom solutions?
-Developers can use Voiceflow's knowledge base APIs to create custom solutions, such as widgets for uploading documents directly into the knowledge base or services that automatically update the knowledge base with the latest documentation from a CMS.
Outlines
🤖 Introduction to Retrieval-Augmented Generation (RAG)
This paragraph introduces the concept of Retrieval-Augmented Generation (RAG), a feature in AI tools that allows users to upload documents and ask questions based on those documents. The video aims to explain how RAG works conceptually and visually, using Voiceflow's knowledge base as an example. Voiceflow is a platform that enables the design of AI assistants capable of accessing uploaded documents to answer questions. The explanation includes the process of breaking documents into chunks and storing them in a vector database, which the AI uses to find relevant information when answering questions.
📚 Behind the Scenes of Voiceflow's Knowledge Base
This section delves into the inner workings of Voiceflow's knowledge base, starting with the process of uploading documents and breaking them into text chunks. It explains how these chunks are turned into vectors and stored in a vector database. The paragraph then describes the process of answering user questions by finding similar chunks in the database and using them to inform the AI model, which assembles an answer. The use of system prompts and the AI model's output are also discussed, along with the settings that can be adjusted for optimization, such as chunk limit and model selection.
🔍 Debugging and Customizing the Knowledge Base
The final paragraph focuses on debugging and customizing the knowledge base in Voiceflow. It discusses the importance of chunk quality for accurate answers and provides tips on how to identify and improve the quality of chunks. The paragraph also covers the settings that influence the AI's response, such as temperature, max tokens, and system prompt. Additionally, it touches on the chunk limit's impact on accuracy and token usage, advising on how to balance these factors. The paragraph concludes with a mention of Voiceflow's APIs for developers, which can be used to automate the upload, deletion, and replacement of documents in the knowledge base.
Keywords
💡Retrieval Augmented Generation (RAG)
💡Knowledge Base
💡Chunks
💡Vector Database
💡Concepts
💡AI Model
💡Tokens
💡System Prompt
💡Chunk Limit
💡API
Highlights
Retrieval-Augmented Generation (RAG) is a key feature for designing AI assistants that can provide valuable answers based on uploaded documents.
RAG uses the AI's base layer of knowledge and its understanding of how information relates to assemble answers from provided documents.
A vector database is used in the process, where document chunks are turned into vectors for similarity comparison.
Voiceflow's knowledge base allows uploading documents and designing AI assistants to access and answer questions based on those documents.
Documents are broken into chunks in Voiceflow, which are then stored in a vector database for efficient retrieval.
The AI model determines the most similar chunks from the vector database to answer a user's question.
Concepts within the user's question are matched with chunks to find the most relevant information.
The AI model then uses the selected chunks to formulate a response to the user's question.
Voiceflow's settings allow adjusting the chunk limit, which affects the detail and accuracy of the AI's answer.
Debugging in Voiceflow includes inspecting the network to see the API information and similarity scores used to form answers.
Choosing the right AI model and adjusting settings like temperature and max tokens can optimize the AI's response.
The system prompt in Voiceflow can be customized to influence the structure and depth of the AI's response.
Increasing the chunk limit improves accuracy but also increases token usage per response.
The knowledge base's effectiveness depends on the quality of the chunks provided from the uploaded documents.
Testing the knowledge base in preview mode helps identify which documents are being used to answer questions.
Updating or re-uploading documents can improve the accuracy of the knowledge base by replacing incorrect chunks.
Voiceflow offers APIs for uploading, deleting, and replacing documents in the knowledge base, enabling custom integrations.
Developers can leverage Voiceflow's APIs to build custom solutions for knowledge base management.
The presentation by Daniel, Head of Growth at Voiceflow, provides a comprehensive overview of the knowledge base functionality.
Transcripts
if you've used any AI tool of the past
year you'll notice that there's
typically a feature where you can upload
documents and actually ask the AI
questions based on those documents now
this is a function called retrieval
augmented generation or RAG for short
and it's incredibly important to know
how this works if you're going to be
designing AI assistants that can
actually provide valuable answers now in
this video we're going to go over how it
works conceptually but we're also going
to show you visually what's actually
happening when you go and upload
documents into voice flows knowledge
base which is our own version of
retrieval augmented generation this lets
you upload a bunch of documents and then
actually design an AI assistant around
them to be able to send questions and
access those documents and even choose
certain files within the knowledge space
that you've uploaded to be able to uh
answer users questions so at a high
level retrieval augmented generation
works the following way it uses the ai's
base layer of knowledge so this is the
information that it's already trained on
so in the case of GPT it would be the
internet but really what's more
important is its understanding of how
concepts relate and map to each other
then what it does is when you ask it a
question it says can I answer this with
the knowledge that I have and then let
me look through a library of documents
that I've been provided or library of
information that I've been provided to
actually assemble an answer that can
answer this question now that process
typically involves the use of something
called a vector database and vectors and
so without further ado let's actually
hop into voice flow and get a sense of what
happens once you upload your documents
to the knowledge base so the knowledge
base is the most impactful important
feature we built in voice flow and we've
got a lot of questions so the first one
is obviously how does the knowledge base
actually work so if you're within voice
flow if you're a member and I've got
a little retail project up here this is
the retail template you can find on the
website let me just go ahead and copy
these URLs and within the knowledge base
when you go to head add data source and
you drop these URLs in once they start
uploading there's a lot of stuff that's
going on behind the scenes so as these
process I'm going to walk you through
just a little illustration of what this
looks
like so imagine you've got your document
here that you've uploaded right so this
can be the URL that I just did maybe
it's the returns page on your website so
once you upload this into voice flow
what's happening is that voice flow
behind the scenes is actually breaking
this one document so this might be a big
chunk of text right so you've got a
website it's got a bunch of stuff on it
it's basically scraping all the text
assembling it and then it breaks it into
sections called chunks so chunks are
basically like little tiny versions of
the document that you've built so let's
just go ahead and and illustrate this so
I'll make them maybe blue and these are
basically just Snippets so these are
maybe you know, 2,500 characters of of
text that overlap a bit throughout your
project and so that one document that
you've uploaded uh may actually be
broken out into like tens 50s hundreds
of different chunks that basically are
like little Snippets of the information
on the page itself so if you've got a
really long page with a lot of
information and it's talking about your
return policy again we're breaking that
out into little pieces so that we can
digest them and put them in what we call
a vector database so these chunks are
stored here so let's take them out so
let's just say you know this one
document now becomes these many little
different chunks and these chunks are
stored in again what's called a vector
database what this means is that this
these chunks are basically turned into
vectors and you can imagine a vector is
like an arrow and that Arrow generally
represents what's in the chunk and the
chunk is just a piece of information and
so what all we've done by this stage is
just break a bunch of text like a big
chunk document into little pieces that
we can then uh utilize uh within the
flow itself or within some of our
ML now when a user asks a question and
you can imagine here that you know
you're not just uploading one document
you're uploading many different
documents into your project and so
you've now got tons of chunks that are
in this Vector database that are all
kind of waiting to be used so now when a
user asks a question what's happening is
that we take that question and we look
at our Vector database and we say hey
what are some chunks that are similar to
the question being asked and so we
may return let's just say you know maybe
it's this one make it yellow this one
here make it yellow again and maybe
another one from the same
document and so we have a model that's
running that's determining that based on
the question being asked these are the
most similar chunks of information that
most likely answer or pertain to that
question how this works is it's looking
at Concepts that are within the question
so if someone asks how much are your
products or what is your pricing it's
able to take that concept of pricing and
compare it to the different chunks here
to say are there another similar concept
of pricing in these chunks of
information that I have if it finds
multiple and you can set a max value
here so let's say for example the max
value is three here it'll pick the three
three most similar chunks or chunks that
have the most similarity to the question
being
asked then what it's going to do is it's
going to pull out those chunks so let's
just go ahead and actually pull them out
here and now it's going to take that
it's going to package it with a prompt
that's just us being able to say hey
answer this user's question and that's
going to then be sent to a AI
model and so we'll call this AI model
AI
model now this is the information that's
going to the AI model and we're
basically saying hey AI model this is
the question here's the relevant
information from this user's database or
the information they gave us assemble an
answer uh that actually answers the
question and it'll be able to come out
with uh a model an answer just like that
so we'll make this answer green and
we'll just say that this is the
output
so we'll call this output AI
output and so we've got here the AI
input that's being sent over and then
we've got the output which is the actual
answer of the question so that's
generally what's happening behind the
scenes here there's also a lot of
smaller optimizations and a lot of other
models that we have running that we're
constantly learning and trying to
improve to ensure that answers are
actually they're more accurate than what
you would find and so that's how this is
working and so now when we look at voice
flow and let's go into our retail
assistant here and let's go ahead and
ask a question like what is your return
policy or yeah what is your return
policy so if you remember I've already
because I've already uploaded all my
documents those have already been broken
out into chunks it's taking my question
it's looking and finding chunks it's
passing those to a language model and
it's summarizing the answer and you can
see here in the preview mode you can
actually see the chunks that it took it
from and so in this case I've got one
chunk that I grabbed it from and then
this was the chunk that it passed to the
AI model with my question to say come up
with a response now in the settings I
have the option to actually change some
stuff so I can make the chunk limit five
for example so let's hit save and let's
run this question again now and you can
see that the question the answer is
actually going to be much different so
what is your return
policy uh now because I've got five
chunks that are being passed in I've got
a lot more information to work with it's
also going to use more tokens but you
you can see that the answer is actually
a bit more detailed and when I go I've
got 1 2 3 four five different chunks
that it was able to actually pull from
to inform this
answer another thing you can do if you
want to go even deeper into debugging is
hit inspect and what you're able to do
is you can actually see some of the
information that's going on behind the
scenes here so if you go to network and
then let's just rerun this question so
what is your return
policy you can see that here under
knowledge base xhr I can I can actually
see the API information on what's being
pulled and so this is the information
you get when you actually use our API
directly but I can see what's being
passed to voice flow and so I can see
each of the chunks here and you can see
that when I mentioned there's a
similarity score right so our system
determined that these chunks had a
different they all have different
similarity scores and so it's accounting
for that in how it forms the answer as
well with the AI model so there's a lot
more going on behind the scenes here
than just kind of looking up information
in your document that's trying to ensure
the accuracy of
it so going into some of the settings
here that are important to know the
first one is which model you pick so in
our documentation here in our little
diagram this the AI out input and the AI
output are being determined by a model
of your choosing so in voice flow you've
got a couple different models you've got
from gpt3 to GPT 4 and everything in
between these are ranked in order of how
expensive they are and how accurate they
are you can see here that GPT 4 is
incredibly expensive you do not want to
use this for knowledge based just
general stuff it's going to use up a lot
of your tokens you can see the GPT 3.5
and Claude Instant are 1X tokens these are
really good to use for knowledge base
and if you really need to improve
accuracy you can increase it to Claude 1
or 2 but I would recommend just
sticking with these two as your accuracy
is going to be pretty
decent the second one is temperature so
this is an AI model setting most AI
models have this this is basically
telling the model like how creative it
can be with its answer or how closely it
should ST stick to the exact text or the
exact information that it has Max tokens
this is important to know as well so
when it comes to your output and so this
is your output here the answer that the
AI actually forms this slider is what's
going to dictate how long or how short
that answer is going to be but what's
important to note is that this is
actually just setting a limit so if I
set the limit of 300 tokens it's going
to be about like a thousand characters cuz
each token is about three to four
characters and so that's the maximum
length it can be but what I want to do
is I want to play with this and I want
to play with the system prompt right so
the system prompt here is what's
actually going to be used as part of the prompt
to be able to frame the response back
from the AI and if you want a shorter
answer you want a longer answer you want
a formatted answer this is where you'd
put it so I I would say something like
your sentences your answers must be at
least at least you know like three
sentences or you can say you must
structure in a bullet point form or you
can say you know you must say you're an
AI assistant this is where you can start
to tweak how long or how robust or How
deep the response is and then you can
use the max token slider to also help
account for that finally is the chunk
limit so this is now probably the most
important concept to think about is that
when you increase the chunk limit you're
going to have a higher accuracy because
the AI model has more information to
work with but you are also going to
increase the number of tokens you use
per response that's because token out
like the amount of tokens you use is
both input and output we already talked
about a slider where you can control the
output but the input is controlled by
the number of chunks that you
have so in here where I I mentioned I've
got five chunks you can just imagine
that when I do my preview and I do that
same question so what is your return
policy that all five of those chunks so
all of that information is being passed
to the large language model and so there
are like all of these are counting
against your tokens so that's 1 2 3 4 5
it's quite a bit and so your token usage
is going to go way up so you really have
to be cautious around what you want to
do if you want to again increase those
or decrease those honestly two or three
is fine I would really stick with two
you can go as low as one if you're
really conscious about your token usage
but that's how you can start to modify
it so now going back to our presentation
and I hope that made sense we can look
at a couple other really specific
scenarios so on average the knowledge
base uses between 500 and 2,000 tokens
depending on how many chunks it's using
so again when you're running a question
in the knowledge base the like input and
output the input is is determined by the
number of chunks and so that's what you
really want to pay attention to the
quality of the answer depends on the
quality of the chunks provided so my
recommendation is as you're testing out
your knowledge base go ahead and answer
question or ask questions in the preview
mode because you can start seeing
what documents it's actually pulling from
to answer your question if you're
finding that you are getting bad quality
answers it's because you have chunks in
your project that don't have good
information and they're being included
so you want to find out where those are
coming from and you want to either
remove that document or you want to
update and re-upload that document when
you delete a document it removes the
chunks when you reupload the document it
re-adds the chunks so again what is your
return
policy you can see here where all the
chunks are coming from and so if I look
at it and say you know let's say
this isn't correct let's say it's
actually 60 days I can see that okay
cool it's pulling most of the
information from this returns page if my
information in the answer is incorrect
it's because there's something
incorrect in your returns page so you
need to go there figure out where it is
remove it or update it and then
re-upload the document so that it's
pulling the chunks so that's how you
would improve the accuracy of your
project and how you would debug where
those answers are coming
from going back to over here we just
walked through how to determine and find
out where chunks are coming from we
talked about optimizing chunks by
just determining the number that
you're using the last one here is
knowledge based apis and so over here in
our API documentation at
developer.voiceflow.com so if you're a developer you
should absolutely go check this out we
have a ton of apis and so for our
knowledge base the most important ones
uh that are relevant here are the upload
and delete and replace document
endpoints this means that you can build
a custom service on top of those APIs
or if you are a developer with a react
chat you could build a widget so someone
could actually upload a document and
this would allow people to upload
documents right into the knowledge base
or remove them or delete them or replace
them without needing to actually touch
voice flow so if you have a customer or
or you are a business where you have all
of your documentation in like a CMS for
example you can use this API to
constantly upload update and replace all
the documentation in your knowledge base
without touching voice flow automatically
so again something to look at something
to keep an eye for if you're a developer
we've also got an API to actually get an
answer from the knowledge base so this
gives you a bit more control as
well so that's uh going to be really
important you should see some custom
Solutions coming out from community
members as well uh that are using
that so thanks again my name is Daniel
I'm the head of growth at voice flow I will
see you in Discord along with the rest
of my team bye