Retrieval Augmented Generation for Navigating Large Enterprise Documents
Summary
TLDR: The Google Cloud Community session featured the Generali Italia team discussing their experience developing a RAG-based application for navigating complex enterprise documents. The team highlighted the challenges of information retrieval in a heavily regulated industry with extensive documentation. They detailed their approach using large language models, the process of embedding and retrieval, and the importance of in-context learning. The session included a live demonstration and Q&A, emphasizing the team's innovative use of AI to enhance document accessibility and information retrieval within their organization.
Takeaways
- The Generali Italia team developed a RAG (Retrieval-Augmented Generation) based application for navigating complex enterprise documents.
- The project aimed to leverage AI advancements to simplify the information retrieval process within a large volume of technical and regulatory documentation.
- The team faced challenges with over 400 documents totaling more than 5,000 pages, which would take over 100 hours to read.
- Information retrieval was identified as a key field to assist with these challenges, involving searching for information within documents or for the documents themselves.
- The team utilized large language models (LLMs) and generative AI to surpass the state of the art in understanding language and generating meaningful conversations.
- In-context learning was employed to reduce hallucinations in AI responses by providing the model with relevant context from the documents.
- The team conducted experiments with default parameters and later introduced custom strategies for document chunking and hyperparameter tuning.
- They created a synthetic dataset for evaluation purposes due to the lack of an existing validation set, extracting questions and answers using a large language model.
- The experimentation involved tools like the Vertex AI platform, various LLMs, and a vector database for storing embeddings.
- The architecture included an ingestion phase and an inference phase, with the latter involving user interaction and frontend services.
- The team plans to experiment with new foundation models and Vertex AI's Vector Search, as well as work on LLMOps for RAG applications to better handle new documents.
Q & A
What was the main challenge Generali Italia faced with their documentation?
-The main challenge was the continuous growth of textual data and knowledge, which made it difficult to extract information efficiently from a large volume of documents, leading to significant time consumption.
How did Generali Italia leverage AI to simplify the information retrieval process?
-They defined a perimeter of relevant business documents and used large language models within a retrieval-augmented generation (RAG) based solution to develop a document Q&A application.
What is information retrieval and how does it assist in addressing the challenges faced by Generali Italia?
-Information retrieval is the science of searching for information in documents or for documents themselves. It assists by providing methodologies to efficiently locate and extract the needed information from vast document collections.
What is the role of the embedding model in the RAG architecture?
-The embedding model, which is a large language model itself, takes text as input and returns a list of numbers (vector). It helps in creating context embeddings that are used to find similar information chunks for answering user queries.
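As an illustration of the retrieval step, below is a minimal sketch of ranking chunks by cosine similarity between embedding vectors. The vectors and chunk names are placeholders; in practice the embeddings would come from the embedding model (the team mentions a multilingual text-embedding model on Vertex AI), and the vector database performs this search at scale.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder embeddings; real ones come from the embedding model.
chunk_embeddings = {
    "policy_terms_chunk": np.array([0.12, 0.87, 0.33]),
    "premium_statement_chunk": np.array([0.90, 0.05, 0.10]),
}
query_embedding = np.array([0.10, 0.80, 0.40])  # embedding of the user question

# Rank chunks from most to least similar to the query.
ranked = sorted(
    chunk_embeddings.items(),
    key=lambda kv: cosine_similarity(query_embedding, kv[1]),
    reverse=True,
)
print(ranked[0][0])  # most relevant chunk id
```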
How did Generali Italia handle the lack of a validation dataset for their RAG system?
-They created a synthetic dataset by extracting paragraphs from each document and using a large language model to generate questions and answers, which were then used for validation and performance evaluation.
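A rough sketch of how such a synthetic dataset could be produced is shown below. The `generate` function is a placeholder for the LLM call and the prompt wording is illustrative, not the team's actual prompt; as they note, generated pairs still need a quality check.

```python
import json

def generate(prompt: str) -> str:
    """Placeholder for an LLM call (e.g. a text model served on Vertex AI)."""
    raise NotImplementedError("wire this up to your LLM client")

QA_PROMPT = """From the paragraph below, write 3 questions a user might ask
and their answers, strictly as a JSON list of {{"question": ..., "answer": ...}} objects.

Paragraph:
{paragraph}
"""

def build_synthetic_dataset(paragraphs: list[str]) -> list[dict]:
    dataset = []
    for paragraph in paragraphs:
        raw = generate(QA_PROMPT.format(paragraph=paragraph))
        try:
            pairs = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed generations; manual quality checks still apply
        for pair in pairs:
            dataset.append({"context": paragraph, **pair})
    return dataset
```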
What are the key metrics used to evaluate the performance of the RAG-based application?
-Key metrics include Mean Reciprocal Rank (MRR), Mean Average Precision (MAP) at a given cut-off of K, Recall, ROUGE, and BERTScore for comparing the quality of generated responses against real answers, plus an LLM-based QA evaluation that judges whether a predicted answer matches the real one.
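For the retrieval side, a small sketch of two of these metrics (MRR and recall@K) is given below, assuming each synthetic question has exactly one relevant source chunk and that we record the rank at which it was retrieved; the example ranks are made up.

```python
def mrr(ranks: list[int | None]) -> float:
    """Mean Reciprocal Rank; ranks are 1-based, None means the relevant chunk was not retrieved."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)

def recall_at_k(ranks: list[int | None], k: int) -> float:
    """Fraction of questions whose relevant chunk appears within the top k results."""
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)

# Rank of the true source chunk for each synthetic question (illustrative values).
ranks = [1, 3, None, 7, 2, None, 15, 4]
print(f"MRR: {mrr(ranks):.2f}  recall@15: {recall_at_k(ranks, 15):.2f}")
```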
What was the significance of the research paper 'Lost in the Middle' in the context of Generali Italia's RAG system?
-The paper provided insights into how large language models use the information from the context provided. This led Generali Italia to introduce a re-ranking layer to optimize the organization of information presented to the LLM for better performance.
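Below is a minimal sketch of the kind of reordering such a layer can apply: given chunks sorted from most to least relevant, it places the strongest chunks at the beginning and end of the context so the weakest ones fall in the middle. The function and data are illustrative, not the team's implementation.

```python
def reorder_for_long_context(chunks_by_relevance: list[str]) -> list[str]:
    """Alternate chunks front/back so the least relevant ones end up in the middle."""
    front, back = [], []
    for i, chunk in enumerate(chunks_by_relevance):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]

retrieved = ["chunk A (best)", "chunk B", "chunk C", "chunk D", "chunk E (worst)"]
print(reorder_for_long_context(retrieved))
# ['chunk A (best)', 'chunk C', 'chunk E (worst)', 'chunk D', 'chunk B']
```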
How did Generali Italia ensure the scalability and reliability of their RAG-based application?
-They utilized the Vertex AI platform for experimentation and model training, which ensured scalability and reproducibility. Additionally, they used Google's infrastructure for the reliability of their product.
What was the outcome of the experiments with custom chunking strategies and hyperparameter tuning?
-The experiments resulted in improved performance, with the best chunk size identified as 1,000 characters and a recall of 80% at 15 documents, along with a question-answer accuracy of 73%.
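A sketch of a paragraph-first splitting strategy with a fixed character budget per sub-chunk is shown below. The 1,000-character limit mirrors the chunk size reported above, but the splitting code itself is illustrative; the team mentions using LangChain's splitters for the sub-chunking step.

```python
def split_document(text: str, max_chars: int = 1000) -> list[str]:
    """Split by paragraph first, then cut long paragraphs into sub-chunks of at most max_chars."""
    chunks = []
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        if len(paragraph) <= max_chars:
            chunks.append(paragraph)
        else:
            # Naive fixed-width cut; a production splitter would prefer sentence
            # or word boundaries (e.g. a recursive character splitter).
            chunks.extend(
                paragraph[i:i + max_chars] for i in range(0, len(paragraph), max_chars)
            )
    return chunks
```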
How did Generali Italia address the need to explain acronyms and insurance definitions to users?
-They added custom chunks to their collection that explained acronyms and insurance definitions, which improved the chatbot's ability to answer questions related to these topics, despite a slight decrease in overall metrics.
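One way such hand-written glossary chunks could be added alongside the document-derived ones is sketched below. The terms, definitions, and metadata fields are placeholders; the real entries and the embedding/upsert calls that follow are the team's own.

```python
# Hand-written glossary entries (placeholder text, not real definitions).
glossary = {
    "EXAMPLE_ACRONYM": "Hypothetical expansion and explanation of the acronym.",
    "example insurance term": "Hypothetical definition of the insurance term.",
}

def glossary_chunks(entries: dict[str, str]) -> list[dict]:
    """Turn glossary entries into chunks tagged with metadata marking their origin."""
    return [
        {
            "text": f"{term}: {definition}",
            "metadata": {"source": "manual_glossary", "term": term},
        }
        for term, definition in entries.items()
    ]

# These chunks are then embedded and stored in the vector database
# exactly like any document-derived chunk (calls omitted here).
extra_chunks = glossary_chunks(glossary)
```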
What are the next steps for Generali Italia's RAG-based application?
-The next steps include testing new foundation models like Gemini Pro 1.5, using Vertex AI's side-by-side evaluation pipeline to compare different models, and exploring Vertex AI Vector Search for a more efficient vector database solution.
Outlines
Introduction and Overview
The video begins with Ivan, a Google Cloud customer engineer, welcoming the audience to a session of the Google Cloud Community. He introduces the Generali Italia team, including Ian, a tech lead data scientist, and Domino, a tech lead machine learning engineer. They will share their experience in developing a RAG-based application for navigating complex enterprise documents. The agenda includes presenting the scenario of the application's development, discussing the choice of large language models, detailing the deployment process, and concluding with a live demonstration and Q&A session.
The Evolution of Information Retrieval
The presentation continues with a historical overview of information retrieval, from its theoretical foundations in the 1950s to the integration of machine learning in the 2000s. It highlights the current era's focus on large language models (LLMs) and generative AI. The team explains the concept of in-context learning, which reduces hallucinations in AI responses by providing contextual information. They describe the process of embedding documents into a vector database for efficient information retrieval and the importance of metrics for evaluating the system's performance.
Deep Dive into the RAG-Based Architecture
The team delves into the specifics of the rag-based architecture, discussing the process of creating a context section for the large language model. They explain how documents are split into paragraphs and then into chunks, which are embedded and stored in a vector database. The user's query is processed in the same way, and the model uses the embeddings to search the database for relevant information. The retrieved documents are then used to construct a prompt for the large language model to answer the user's question accurately.
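A compressed sketch of this retrieve-then-generate flow is shown below. The `search` and `generate` functions are placeholders for the vector-database query and the LLM call, and the prompt template is illustrative rather than the team's actual one.

```python
def search(query: str, top_k: int) -> list[str]:
    """Placeholder: embed the query and return the top_k most similar chunks."""
    raise NotImplementedError

def generate(prompt: str) -> str:
    """Placeholder: call the large language model with the assembled prompt."""
    raise NotImplementedError

PROMPT_TEMPLATE = """Answer the question using only the information in the context.
If the context does not contain the answer, say you do not know.

Context:
{context}

Question: {question}
Answer:"""

def answer(question: str, top_k: int = 10) -> str:
    chunks = search(question, top_k)              # retrieval step
    context = "\n\n".join(chunks)                 # build the context section
    prompt = PROMPT_TEMPLATE.format(context=context, question=question)
    return generate(prompt)                       # in-context learning step
```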
Experimentation and Metrics
The team shares their initial experiments with the RAG-based system, using default parameters and a synthetic dataset generated by a large language model. They discuss the challenge of lacking evaluation metrics and the strategy they employed to create a validation set. The introduction of metrics such as mean reciprocal rank, mean average precision, and recall at a given cut-off, as well as ROUGE and BERTScore, allowed them to evaluate and improve the system's performance. The team also experimented with custom chunking strategies and tuned hyperparameters to enhance the system's recall and question-answering accuracy.
Enhancing the Information Retrieval Process
The team focuses on enhancing the information retrieval process by moving from a simple embedding search to a combined method that includes both embedding techniques and classical search methods like BM25. They discuss the impact of this change on system performance, including a significant boost in recall and QA accuracy. The team also explores research findings on how large language models use context information and introduces a re-ranking layer to organize the provided information more effectively. They discuss the integration of new features, such as increasing the document collection and testing different models like PaLM and Gemini Pro.
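A rough sketch of one way lexical and embedding scores can be fused is given below; the team describes combining BM25 with embedding search, but the normalisation, weighting, and use of the open-source `rank_bm25` package here are illustrative choices rather than their exact setup.

```python
import numpy as np
from rank_bm25 import BM25Okapi  # pip install rank-bm25

def hybrid_scores(query: str, query_emb: np.ndarray,
                  chunk_texts: list[str], chunk_embs: np.ndarray,
                  alpha: float = 0.5) -> np.ndarray:
    """Blend normalised BM25 scores with cosine similarities (alpha = lexical weight)."""
    bm25 = BM25Okapi([t.lower().split() for t in chunk_texts])
    lexical = np.asarray(bm25.get_scores(query.lower().split()), dtype=float)
    lexical = lexical / (lexical.max() + 1e-9)  # scale to roughly [0, 1]

    dense = chunk_embs @ query_emb / (
        np.linalg.norm(chunk_embs, axis=1) * np.linalg.norm(query_emb) + 1e-9
    )
    return alpha * lexical + (1 - alpha) * dense

# Top-k chunk indices under the blended score:
# top_k = np.argsort(-hybrid_scores(query, q_emb, texts, embs))[:k]
```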
Final Architecture and Takeaways
The team presents the final architecture of their product, which includes an ingestion phase with Cloud Storage and Vertex AI Pipelines for data processing, and an inference phase where the user interacts with a frontend service. They discuss the importance of the Google Cloud infrastructure for scalability and reliability. The team shares their key takeaways from the project, such as increasing document accessibility, experimenting with cutting-edge AI technologies, and the potential for future improvements. They also outline next steps, including testing new foundation models and exploring Vertex AI Vector Search.
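On the ingestion side, a heavily simplified sketch of what a Vertex AI Pipelines (Kubeflow Pipelines SDK) definition might look like is given below. The component body, parameter names, and bucket path are hypothetical, and the real pipeline also builds the BM25 index and runs the retrieval and semantic evaluation steps.

```python
from kfp import dsl, compiler

@dsl.component(base_image="python:3.10")
def chunk_and_embed(source_uri: str, chunk_size: int) -> str:
    # Hypothetical body: read documents from source_uri, split them into chunks,
    # embed the chunks and upsert them into the vector database.
    return f"ingested {source_uri} with chunk_size={chunk_size}"

@dsl.pipeline(name="rag-ingestion-sketch")
def ingestion_pipeline(source_uri: str = "gs://example-bucket/docs", chunk_size: int = 1000):
    chunk_and_embed(source_uri=source_uri, chunk_size=chunk_size)

if __name__ == "__main__":
    compiler.Compiler().compile(ingestion_pipeline, "ingestion_pipeline.json")
```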
Q&A and Live Interaction
The session concludes with a Q&A segment, where the team addresses questions from the audience. Topics covered include building a consistent framework for a RAG-based system without customer Q&A data, handling document updates, chunking strategies, storing chunks with metadata, and handling follow-up questions that lack context. The team provides insights into their approach to these challenges and shares best practices for similar scenarios.
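For the follow-up-question case, a sketch of the condense-the-conversation approach mentioned in the Q&A is shown below: prior turns are rewritten into a standalone question before retrieval. `generate` again stands in for the LLM call and the prompt text is illustrative; the team notes that LangChain already implements both this and the full-history variant.

```python
def generate(prompt: str) -> str:
    """Placeholder for the LLM call."""
    raise NotImplementedError

CONDENSE_PROMPT = """Given the conversation below and a follow-up question,
rewrite the follow-up as a single standalone question.

Conversation:
{history}

Follow-up question: {question}
Standalone question:"""

def standalone_question(history: list[tuple[str, str]], question: str) -> str:
    formatted = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in history)
    return generate(CONDENSE_PROMPT.format(history=formatted, question=question))

# The standalone question then goes through the usual retrieve-and-generate flow.
```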
Keywords
Google Cloud Community
RAG (Retrieval-Augmented Generation)
Information Retrieval
Large Language Models (LLMs)
In-Context Learning
Vector Database
Embedding Model
Synthetic Data Set
QA (Question Answering)
Evaluation Metrics
Highlights
The session focuses on the experience of developing and deploying a RAG (Retrieval-Augmented Generation) based application for navigating complex enterprise documents.
The Google Cloud Community session introduces the Generali Italia team, who share their insights on leveraging AI for technological innovation in the insurance industry.
The insurance industry's significant reliance on documentation presents challenges in information retrieval, with over 400 documents and 5,000 pages requiring over 100 hours to read.
The challenges faced include the continuous growth of textual data, the time it takes to extract information from text, and information access from multiple databases.
Information retrieval is defined and its evolution from the 1950s to the adoption of large language models (LLMs) in the 2020s is discussed.
The presentation covers the RAG architecture, which combines generative AI with information retrieval to surpass the state of the art in language understanding and generation.
In-context learning is introduced as a method to reduce hallucinations in large language models by providing contextual information for accurate responses.
The process of creating a context section by embedding documents and retrieving similar information through a vector database is explained.
The importance of evaluation metrics for determining the direction and success of the RAG application development is emphasized.
A synthetic dataset was created due to the unavailability of an existing validation dataset, using a large language model to generate questions and answers from document paragraphs.
Metrics such as mean reciprocal rank, mean average precision, and recall at a given cut-off, as well as ROUGE and BERTScore, were used to evaluate the performance of the RAG application.
The experimentation phase included custom splitting strategies for documents and tuning hyperparameters like model temperature and chunk lengths for in-context learning.
The team experimented with increasing the number of documents in the collection set and migrating from one model to another, such as from PaLM 1 to PaLM 2, while also testing Gemini Pro for improved performance.
The architecture of the RAG application includes an ingestion phase for data processing and an inference phase for user interaction and prompt engineering.
The session concludes with a live demonstration of the application's user interface, showcasing its ability to answer questions using relevant documents.
The project increased the accessibility of documents within the company, allowed experimentation with cutting-edge AI technologies, and relied on Google's infrastructure for scalability and reliability.
Future steps include trying new foundation models, utilizing Vertex AI's side-by-side pipeline for model comparison, and working on LLMOps for RAG applications to handle new documents effectively.
Transcripts
hello everyone and welcome to this new
session of the Google Cloud Community
today we have the pleasure to have the
generally Italia Italy team uh who will
share their experience in developing and
deploying a rag based application to
navigate complex uh Enterprise
documents before to before to start let
me to introduce uh myself uh I am Ivan
nardini I'm a customer engineer at
Google cloud and I supported generally
in implementing the this generative AI
application and together with me today
we have Ian and Domino Ian Domino would
you like to introduce yourself oh for
sure thank you and welcome everybody my
name is Ian a tech lead data scientist
in general Italia and I support the
development of artificial intelligence
and machine learning Solutions
hello I'm domano and Tech lead machine
learning for General
Italian okay so let's uh take a look at
the agenda
then so first of all we will Begin by
presenting the scenario in which the
document Q&A application was developed
and next we will explore why generally
choose to address these challenges using
large language models within a rag based
solution following that we will Deep
dive uh into the process that enabled
the generally team to successfully
deploy the rug uh based llm application
into production and in particular they
will share with you some details about
the experiments they conducted in terms
of chuning Le lexical search and ranking
strategies that uh ultimately lead to
the deployment of the application and
finally we will conclude the session
with a live demonstration some takeaways
and as always the Q&A session so with
that Ian Domino the stage is
yours okay so thank you even again uh
let's start with the general business
case uh generally is investing in
technological innovation but as an
insurance company documentation is
always an important and significant
component for our business uh the
industry is Guided by technical
regulation all accompanied by
documentation and we have documents such
as policy statement with terms and
conditions premium
statement um risk assessment reports or
internal company knowledge uh with
documents such as uh machine learning
model documentation corporate
regulations information related to uh
legal entities and so on and uh going
through all these documents in in order
to find the right information is very
important uh recognizing this problem we
saw an opportunity um and the question
we asked ourself was how we can Leverage
The advancement in AI to simplify the
information retrieval
process so we defined first of all a
perimeter of relevant business documents
to focus on in order to understand the
complexity of this challenge Uh current
we have more than 400 documents at our
disposal uh toiling more than 5,000
pages and this means that it will take
around more than 100 hours to read them
all uh in front of these numbers we
stand before three significant
challenges uh firstly there is a
continuous growth of available textual
data and knowledge uh so this will um um
expand our resources and this growth
introduce another additional challenge
the T the time it takes to extract
information from from text the third
challenge is related to information
access uh from multiple different uh
data sources located in different
databases and the field of information
retrieval uh can assist us with all
these
challenges first of all
uh what is information retrieval uh so
uh information retrieval is the science
of searching for document searching for
information in a document or searching
for document themselves so lens embark
on a journey through the timeline of
information retrieval just few concept
we me in the 1950s a decade marked by
theoretical Foundation of uh this field
uh with Concept like index indexing and
search algorithm and fast forward to the
1980s we saw the Advent of vector space
model entering in um in the 19 um in in
in two in the 2000 we see the
integration of machine learning and
search engines into uh this field and
now in 2020s particularly in 2023 we are
in a period with the adoption of uh
llms so uh over all there are numerous
information retrieval
methodologies which have been um
categorized here with three main
categories uh there are the C sear
mechanism that are based on precise
identification of words or vectors then
we have probabilistic approaches and
those based on machine learning um and
finally there are uh there are more
advanced methods that could be an emble
of previous ones or other architecture
but here we will talk about rag
architecture which is based on
generative Ai and here we are with the
generative AI part of this presentation
uh so generative AI have recently shown
to surpass the state of Art in terms of
performance of um understanding language
generating new contents and maintaining
a meaningful
conversation uh a large language model
is a machine learning model that's is is
trained on a vast amount of data and
they found a very pleasant definition on
internet uh to describe them they are
large autoc comption systems so in this
slide you can see a a sentence uh and
this is an input of an nlm that from now
on we will call a prompt so slow and
steady wins from a grammatical
standpoint there are many combination
that could be used to finish this
sentence but is a essentially only one
way to conclude it and it's the
race okay so one important concept for
building rug is the in context learning
let's see it in action for example we
can try to ask to our large language
model what is the color of your T-shirt
and if we see the response you will see
that uh the the t-shirt is red but this
is wrong this is what we usually call
hallucination this because the large
language models don't know the answer to
this question but try anyway to uh find
an answer for that question so one way
to uh reduce this type of hallucinations
is the in context learning the in
context learning is a prompt like this
where we say to the large language model
that uh you need to answer to that
question using all the information that
comes from uh the context um windows so
in this example we are seeing that in
the context section uh what even is
wearing and in this case we sub if we
submit this prompt to the large language
model we will see that it will answer
correctly so the idea here is to insert
in this context section all the
information that comes from our
documents and in this way the our large
language model we will able to answer to
the um to our uh question using
information from our documents so the
first step we need to
do is to uh create the context SE
section so we start from our document
database we split each document in
different par paragraph or chunks and we
pass these chunks to an embedding model
the embedding model is a large langage
model itself but but takes as input uh
text and returns a list of number a
vector so for each paragraph at this
point we will have the corrective
embedding here the idea is that the
paragraph with same information or with
similar information we will have also
similar embedding and this similarity
between embeddings can be calculated
mathematically using some distance
metric like the cosine similarity and uh
at this point we can take all these
embeddings and store it in a vector
database that are some database built do
for retrieving and storing uh these
vectors and at this point we can go to
the retrieving generate steps so we have
an user that use a query this query is
processed using the same text Em text
embedding model so we La as output uh
the embedding of the user question and
we can use this uh user embedding to St
to search through the vector database
all the information all the uh chunks
that have similar information the idea
is that these uh context that we are
retrieving uh contains the information
to answer that question so at this point
we can take all this information the
user question and the documents
retrieved and put it in a prompt like
the prompt we have seen in the in
context learning and we can submit the
prompt to the large language model that
at this point we will answer to the user
question here we can see a summary of
this process so we have a question
generated by a user we uh with a chatbot
and a user interface we retrieve all the
knowledge that we need to answer that
question this information are uh used by
the large language model to answer to
the user uh using the internal reg of
our
documents so uh we can say that is not
too complex to create an architecture of
this kind we have conducted some initial
experiments and these activities were
conducted with default parameters
leaving the splitting Methods at theault
and even the Chun CLS uh Chun CLS at
default and even the information
retrieval process unchanged but we
didn't venture to making any kind of
sophistication within it and we obtained
approximately 45,000 chunks for the
database in order to represent the
entire document collection but we came
to realize something very important we
lacked of um evaluation metrics uh we
didn't have a sort of compass that would
allow us in order to uh to determine if
we are moving in the right direction or
in the wrong one uh and essentially
there are two possible strategies the
first one is having access to an
existing data set uh that can serve as a
validation Set uh and the second
strategy is to create a new data set the
first option was not feasible uh that
because a a validation database was not
available so we decided to create a new
synthetic data set um we generated a
synthetic data set using a large
language model and first of all we
extracted paragraphs from each document
and then we have sent them uh into the
large language model asking it directly
to identify three questions and three
Associated answers and by doing that we
have obtained pairs of question and
answers
pairs so uh in this slide you can see
the situation an LA uh the first part on
the left is what we have just just
discussed we have a huge amount of
questions uh and we injected all these
questions in the in the rag architecture
presented by Dominico and as a result we
have obtained pairs of real and
predicted answer on which we can perform
comparisons at this point we need
metrics in order to evaluate the quality
of our work and the introduction of
metrics is is crucial for evaluating the
performance of the model comparing
predicted answers against Real answers
in term of accuracy recall relevance and
the the overall model ability in doing
this task and metrics help us even to
understand the variation in performance
when we introduce a new fature uh into
the system on the left you can see the
metrix for the information retrial
process uh on the right you can see the
metrics related for um evaluating the
the quality of responses generated we
have the mean reciprocal rank which is
the metric that is determined which
determines uh if the chunk is placed at
the top in the middle or at the bottom
um of all the documents returned then we
have metrics like the mean average
precision and recall at a given cut off
of K uh and that refers of the accuracy
of the chunks over all the retriever
chunks or over all the total number of
current chunks in the data set I know
that it's a little bit
complicated um on the other side we have
metrics such as the Rouge and this is a
metric that is related to the machine
translation field so moving from one
language to another and this Bas this
metric is based on the overlap of terms
uh we have also the bir score that is
another metric that we use in order to
understand a comparison of sentence
embeddings between predicted and real
answer and finally last but not least um
something new that is the QA evaluation
and that that is a metric based on llm
so essentially we inject the question
the real answer and the predicted answer
into the large language model and the
large language model's task is to
determine if we are um if if the if
their answer is is correct or not uh so
we decided to ah sorry sorry this part
is
for so before ding into the experiment
let's check out the tool that we used so
as experimentation layer we use the
vertex platform verx AI platform inside
the Google cloud and it's a useful tool
because it allows us to scale the number
of experiment and also the resources
that we are using for that experiment it
ensure the uh experiment reproducibility
and finally it save for us automatically
all the artifact that we we are
generating through our uh experiments
for the large language model layer we
use the all the Gemini and palm models
also the embedding multilingual the
embedding model multilingual that works
well with the Italian language that are
the original language of our
documents as chain layer we used
LangChain this because it has some function
for reading and processing uh PDFs and
it has also function for the information
retrieval step and finally as storage
layer we use Qdrant that is a vector
database where we store our EMB bendings
and we use this because it's as a very
fast algorithm for searching through
vectors and also an EAS installation on
kubernetes so now that we have the
metrix the test set and the tools that
we can use we can start really with the
experiments and so with our second
experiment we implemented a
um custom splitting strategies with our
documents uh in particular we starting
split by
paragraph and then we split the
paragraph in sub chunks using a a
specific length uh on the other side we
also tuned all all the other hper
parameters that we had like the
temperature of the model The Prompt the
length of the various chunks and for
example also the number of chunks that
we insert in our prompt for the in
context learning so after this this all
this tuning we had that the best chunk
size it's 1,000 Char for our chunks and
we generated with this approach uh
13,000 chunks for for the metric
standoff point we had a recall at 15
documents of 80% and a question answer
um of uh 7
73% on the right we can see a plot where
we can see how the recall increase when
we increase the number of chunks that we
insert in our prompt so we can see that
after the 10 10 chunks that we insert we
had a plateau in our recall
Cur okay for the next experiment we
added some custom chunks in our
collection this because a lot of our
documents have acronyms and insurance
defs and a lot of time we have question
from the user where they need to where
they ask the meaning of these acronyms
of or some insurance definition for
example what is the definition of c so
we want that the chatbot is able to
answer to this question and for doing
this we added manually some chunks where
we explain what that acronyms means or
what that insurance definitions uh means
and in this manner we generated another
4,000
chunks and for the metric standoff point
with this experiment we had a recall of
78% and a question answer of
72% uh as you can see these metrics are
a little bit
lower if we compare it with the previous
experiment but we choose to take this as
best experiment just because we want
that our chatboard can be able to
explain acrs to the user or some
Insurance
definitions so uh we have seen after
this experimentation and introducing the
fact the the aspects related to
definitions and so on that there is a
direct correlation between the
information retrieved by uh the vector
database and then then the information
used by the large language model uh but
simply if the information from the
vector database is incorrect the large language
model cannot generate an accurate answer so
we understood that we need to focus um
our efforts and time on the information
retrieval process um enhancing the
quality of the chunks we identify and we
move we moved from a simple let's say
DSE search through embedding method to a
neing method that combines embedding
techniques with the classical uh bm25
and these search U methods are defined
as a mix mixture of search and then sear
and then search and we left the other
parameters unchanged and as you can see
at the bottom of this slide we still
achieved a significant performance uh
boost in terms of recall and even in in
the QA
accuracy but moving forward we also
ventured a little bit in to the research
field and last year uh there was an
interesting paper uh titled lost in the
middle uh I I think that it was
published in in November uh and it's a
paper where a group of researchers aimed
to understand how a large language
models use the information from the
context provided and this research team
um have found that there is a um a a
correlation um of where the the chunk
the cor chunk is used by the large Lang
wi model compared to the overall number
of chunks in the prompt here in this
graph we um we have the what socalled
u-shaped Cur which essentially represent
the reduction of performances in in
relation to the um placement of corrent
information across uh the r return
document set and this consideration lead
us to think about how the information is
provided and is organized to llm so
that's why we introduced the new layer
or re ranking layer that is capable of
sorting uh the information that is the
most accurate information that might be
used by the large language model to
either at the top or at bottom of all
the documents into the
prompt so the last phase of our
experimentation was the integration of
two new features first of all we
increased we increased the number of
documents uh into the collection set and
the second was um to moving from Palm
one to Palm two and currently we are
testing uh even Gemini Pro in terms of
performances and as you can see at the
bottom of these slides there is a recall
reduction uh in term of performances and
we must acknowledge uh some change in in
terms of metrics so this is motivated by
the introduction of new documents
without fine-tuning the input
preprocessing uh pipeline it's very
important to adjust the uh the input
pipeline um to the to the documents that
you are going to use so we have also a
supplementary um documents that contains
definition within them and which can be
used during the QA
evaluation um as chunks
for the questions and this could be a
motivation for the recall reduction on
the other sides as you can see uh from
by migrating from Palm one to Palm two
we obtained an improvement in terms of
QA
accuracy uh oh demo part okay so having
discussed this
architecture um experiments as we have
just seen and many other things
um let's see briefly uh just this
application how it works uh give me one
second to to upload the the the
videos Okay so this um this is the the
the user interface as you can see there
the the user can insert a question uh
into the the the tab and uh in this case
we are asking to the system um explain
the generally strategy in terms of
environmental
sustainability and as you can see there
uh the answer is composed of two parts
the first part is related to uh the real
answer uh for the user query and the
second part is related to um the
document that is used by that is used
for for providing the the the answer
so that's is because we strongly believe
that the user should have the
opportunity to understand which document
uh was used in order to extract the the
information even in this case this is
another question related to AI ethics
from generali and even this case the the
system generates uh the the answer to
the question and then the the
sources
okay so this is our fin architecture
that we build for our product and this
is divided we have seen in two phases
the ingestion pH phase where we start
from a cloud storage where we store all
our documents and then we have the
vertex pipeline vertex pipeline are a
tool inside vertex AI platform it
ensure the training of models and and
can be used also for data processing so
we use that for doing all the data
ingestion chunk the chunking part the
embedding
part and the creation of bm25 index at
that point we have another two pipeline
uh at step three and four that do is the
information retrieval evaluation for
that calculat the recall at 15 documents
that we have seen and the part four
where we we calculate the semantic
evaluation and we calculate the question
answer accuracy at this point all the
eer parameters that we have find like
the prompts the
temperature are stored in an artifact
registry on the other point we have the
inference phase where we are user that
interacts with a frontend service we
have a back service that um manage all
the prompt engineering phase and this
can service read the information and all
the parameter that uh you need to use
like the temperature and the propt from
the artifact registry we had also a no
SQL database on top of fir store where
we store the uh conversation between the
user and the chatbot and this can be
used we use it to store data man that we
can fine tune uh in the next years uh
the large language model
okay so for what concerns the take homes
with uh this project we increased the
accessibility of our documents for all
the company uh we had the opportunity to
experiment with all the Google
Foundation models so with Cutting Edge
AI Technologies and we can rely on the
Google infr infrastructure for the
scalability of our experiments and also
the reliability of our product and
finally we are sharing a lot of
knowledge with Ivan nardini and all the
other Google Cloud
Engineers on the other side for the next
step uh the idea is to try the New
Foundation models published like Gemini Pro
1.5 try also the vertex AI AO side by
side the pipeline these are some
pipelines available in vertex where we
can compare two different model and uh
say how it answer to the same question
so we can choose the better model and
finally trying the vertex Vector search
that is a vector database inside the
Google inside the vertex this because
the current one the vector database that
we are using quadrant is an open source
um tool and uh so we have also verx
search that is implemented by default in
vertex and finally we can also work on
the LLMOps for rag application for
seeing for example uh What uh what to do
when new documents are added to
our um
database I want to thank you also all
the team that work on this incredible
project and thank you you two for being
here with
us
cool so thank you Ian thank you Domino
for this great overview so as we
promised at the beginning now it's time
for having Q&A so we will start with
some questions that we collect uh and
then we will also go through the live
questions that we just received so let's
start with the first question which is
something that you just touched so the
first question is about how to build a
consistent framework for a r based
system when do when do you when you
don't you do not have a possibility to
collect hundreds of Q&A from your
customer okay um so we have partially
answered to this question previously um
in our case was uh was this this
scenario because we haven't access to
our
to an internal knowledge of Q&A so we
decide to create a synthetic data set uh
by providing uh chunks of paragraphs
into nlm and then
creating uh pairs of question and
answers in in the machine learning field
there are many possibilities in order to
do that uh using uh llm is just one of
them uh we suggest to have um a
framework um with pipelines in order to
be to to create this process in
iterative way uh take into account that
a synthetic data
Generation Um is not the same as um as
as the one that you can obtain from your
from the the business unit or your
customers uh so you need always to check
the the quality of your questions and
answers in order to have something that
is uh quite of a good
quality
um nothing else I
think okay well that's a great that's a
great answer I think you you just take a
it I so let's move on yeah let's move on
the next question so what do you do if
one of the documents uh get updated do
you build or rebuild actually the index
again
okay I will say that it depends because
if you add new documents you can just go
through the text embedding model and
update your index and your chat chat we
will able to answer uh to the new
question but maybe if you add too much
documents uh maybe there is there are
some more H parameters that are better
maybe there is a better prompt a better
length of chunks so uh my opinion is
that if we had a lot of new documents
maybe it's uh better to run the
preprocessing pipeline so
to to search if there are some better
parameters okay that makes
sense uh as you can see the question
they're getting shorter and shorter
which is good I think so uh what was
your
approach to
chunking can you can you maybe live a
little bit more yeah okay so I as I said
previously we the first thing that we
did is split by paragraph this because
we want that uh a single Chun will be
semantically different from another one
in in this manner you don't have a chunk
that has two different paragraph that
can lead to some that can
talk about different things so the first
thing that we did is split by paragraph
and then Subs spit um using the L chain
tool uh the iterative
splitter okay I hope uh it was a like
provides the the answer that they were
looking at so the I think this is the
last one um let's see but
what do you where do you store chunks of
tax in a in a gap uh cloud storage or B
query so as I said we use vertex
pipeline for our experiments and vertex
pipeline used by default save all the
artifacts on cloud storage so I will say
cloud
storage
okay you're passing the
exam uh so the the the last no this is
the last one this is the last question
so they're asking for best practices for
chunking large uh table data such as a
complex spread spreadsheet with many
sheets so now I don't know if you had
this kind of data but uh maybe you face
this uh this scenario with the other use
case that you are working on so feel
free to provide uh some best practices
here okay so um maybe I can join this
question with another that I have seen
in in the chat uh currently we are using
for doing that we are using an external
uh Library which is unstructured uh in
order to extract information from uh our
documents um you can use um some CNN
models in order to extract even the text
from images uh but um you you can use um
libraries like um Open PI Exel pandas in
these cases or um the or even
unstructured um it's important to take
into account that uh it's um it's even
important the way in which you feed the
llm uh using this kind of data so if you
are managing a spreadsheet you need to
create a structure in the prompt that is
um that is useful in order to be
understood by the LM maybe you can
integrate some sentences uh separators
and change the things in order to uh
create something that is more uh
semantic semantically valid for the llm
that's in my
opinion no it makes it makes totally
sense I don't know if Dominico you want
to add something here or we can move on
one of the Live question that we just
received
okay so one of the question that uh I
think it's a uh it's valuable
um they're asking about how large was
your validation set so how many uh Q&A
pairs you
have okay uh so we we have around uh
2,000 questions uh that we have used in
order to to
generate um this synthetic data set that
we try to split into validation and test
Set uh in order to be consistent in for
the definition of hyper parameters it's
uh we have seen that
uh the the the
parameters uh and the kind of rag
architecture that we are going to use
the the chunk lengths and so on it
depends on the questions that uh your uh
your rag is going to receive so it's
very important to ask your your business
units or your customers even what kind
of question do you do you expect to
generate or do do you expect to
introduce even for um creating a
um something that is uh okay from the
point of view of the
prompt okay so now let's uh let's uh uh
talk about a couple of question let me
ask you a couple of question uh related
to chunks so one question is how do how
do you store the chunks with metadata
other than the source file
name
Domin okay yeah we store all this
information uh in the vector database
because quadrant have this feature that
you can Store and search through um um
metadata so is very useful to have it in
the vector database because I've seen
another question um that it's asking how
do you uh use the right document of the
right people and if you have metadata in
your vector database you can just do the
semantic search and then filter also on
the on this metadata so if you you can
add a lot of metadata on your documents
that can be useful for your uh
rug and let me ask let me ask the last
question uh so how you handle scenarios
where a user ask a followup question
that lacks
context I'm reading the question
uh where okay okay I can give you the
example by the way or you can read it if
you
prefer uh so uh in this case uh you
could use multiple uh strategies you can
even um to create a summary of the
previous conversation and then using
them in order to feed the The Prompt for
generating the the the new answer or you
can uh embed the the previous
the entire previous conversation into uh
The Prompt but this is based even on the
um of
the the length input that the large
language model could
accept uh currently we are still working
on it so this feature is not integrated
for
us okay Dom do you
have is exactly uh that and if you want
in LangChain these two strategies are
already implemented so it's very easy to
implement okay so yeah I think I think
this is it
so um let me just before to conclude let
me just uh uh go back on the dck on the
on the
slide
and and and yeah just one last uh
reminder uh don't forget that we are
going to have several events uh in March
here you have the some of them so uh
feel free to uh join them you have the
link to like participate uh but for now
uh I hope uh you enjoy the session and
uh thank you for uh
participating thank you thank you
bye-bye bye-bye bye