Retrieval Augmented Generation for Navigating Large Enterprise Documents
Summary
TLDR: In this Google Cloud Community session, the Generali Italia team shares its experience developing and deploying an application based on large language models (LLMs) to navigate complex enterprise documents. The goal was to simplify the information retrieval process using recent advances in AI. After identifying the challenges posed by the continuous growth of textual data, they built a set of relevant documents and used language models to generate synthetic questions and answers. They ran experiments to refine lexical search and ranking strategies, leading to the application's deployment to production with significant improvements in recall and answer accuracy. The team also discusses the key metrics used to evaluate performance and the next steps for improving the application.
Takeaways
- 😀 The Generali Italia team developed a generative AI (RAG) application to make it easier to navigate complex enterprise documents.
- 🔍 The application was designed to address the challenges of continuously growing textual data, the time required to extract information, and access to information spread across multiple data sources.
- 📚 Generali manages more than 400 documents totaling over 5,000 pages, representing more than 100 hours of reading to go through them entirely.
- 🛠️ They used in-context learning to reduce language model 'hallucinations' and improve the accuracy of the AI's answers.
- 📈 They first experimented with default parameters and created a synthetic dataset to evaluate their model's performance.
- 📝 The importance of evaluation was emphasized, using metrics such as recall, mean average precision, ROUGE, and the BERT score to measure answer quality.
- 🔧 They tuned parameters such as chunk size, model temperature, and chunk length to improve the application's performance.
- 🔑 Creating a custom document-splitting scheme improved retrieval and question-answering accuracy.
- 📈 Adding custom chunks for acronyms and definitions helped answer user questions about these specific terms.
- 🌐 Using Google Cloud's Vertex AI platform enabled large-scale experimentation and ensured the reproducibility of experiments.
- 🔄 The application provides a user interface for asking questions and returns answers based on Generali's internal documents, also showing the source of each answer.
Q & A
What is the main objective of this Google Cloud Community session?
-The main objective is to share the Generali Italia team's experience in developing and deploying a RAG-based application to navigate complex enterprise documents.
Who is Ivan Nardini, and what role did he play in this project?
-Ivan Nardini is a customer engineer at Google Cloud. He supported Generali in implementing this generative AI application.
What are the three main challenges Generali faces in document management?
-The three main challenges are the continuous growth of textual data, the time required to extract information, and access to information from multiple data sources.
What is in-context learning, and how does it help reduce hallucinations in language models?
-In-context learning consists of providing the language model with relevant contextual information so that it can answer questions based on that information, thereby reducing incorrect answers, or hallucinations.
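A minimal sketch of such a grounded prompt (the wording and function name are illustrative, not Generali's actual template):

```python
def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble an in-context-learning prompt: the model is told to
    answer ONLY from the supplied context, which curbs hallucination."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using ONLY the information in the context below.\n"
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Example from the talk: telling the model what Ivan is wearing lets
# it answer the T-shirt question correctly instead of guessing.
prompt = build_prompt(
    "What color is Ivan's T-shirt?",
    ["Ivan is wearing a blue T-shirt and jeans."],
)
```

The same template is reused at inference time, with the context chunks coming from the vector database instead of being hand-written.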
How did Generali build a vector database to improve information retrieval?
-Generali split the documents into paragraphs or chunks, passed these chunks to an embedding model to generate vectors, then stored these vectors in a vector database to facilitate retrieval of similar information.
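The chunk-embed-store-retrieve loop can be sketched as below. A toy bag-of-words vector stands in for the real embedding model, and a plain list stands in for the vector database (Generali used a multilingual embedding model and Qdrant; the chunk texts are made up):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' standing in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# "Vector database": a list of (chunk, vector) pairs; real systems use
# a dedicated store such as Qdrant.
chunks = [
    "premium statements describe the amounts due for each policy",
    "risk assessment reports summarise exposure by legal entity",
]
index = [(c, embed(c)) for c in chunks]

def retrieve(query: str, k: int = 1):
    """Return the k chunks whose embeddings are closest to the query's."""
    qv = embed(query)
    return sorted(index, key=lambda cv: -cosine(qv, cv[1]))[:k]

best_chunk, _ = retrieve("what does a premium statement contain")[0]
```

Swapping `embed` for a real embedding model and the list for a vector store gives the production pipeline described in the session.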
What tools and platforms did Generali use to run its experiments and store the data?
-Generali used Google's Vertex AI platform for experimentation, the PaLM model for natural language processing, the LangChain library for processing PDF documents, and the Qdrant vector database for storing embeddings.
What kinds of metrics did Generali use to evaluate the quality of the generated answers?
-Generali used metrics such as mean reciprocal rank, mean average precision, recall at a given cutoff K, the ROUGE score (originally from machine translation), and the BERT score for comparing sentence embeddings.
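Two of these retrieval metrics, recall@K and mean reciprocal rank, are simple to compute; a sketch with hypothetical chunk IDs:

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant chunks that appear in the top-k results."""
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / len(relevant)

def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """Average of 1/rank of the first relevant chunk per query
    (contributes 0 when nothing relevant is retrieved)."""
    total = 0.0
    for retrieved, relevant in runs:
        for rank, doc in enumerate(retrieved, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(runs)

# Two hypothetical queries: relevant chunk found at rank 1 and at rank 2.
runs = [(["c1", "c7", "c3"], {"c1"}), (["c9", "c4"], {"c4"})]
print(recall_at_k(["c1", "c7", "c3"], {"c1", "c3"}, k=2))  # 0.5
print(mean_reciprocal_rank(runs))  # (1 + 0.5) / 2 = 0.75
```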
What approach did Generali take to handle insurance-specific acronyms and definitions in the documents?
-Generali manually added chunks explaining insurance acronyms and definitions to the vector database so that the chatbot could answer questions about these terms.
How did Generali adapt its information retrieval process to improve answer accuracy?
-Generali combined embedding techniques with classical search methods such as BM25 and introduced a re-ranking layer to sort the most accurate information to the top of the retrieved documents.
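One common way to merge a dense (embedding) ranking with a sparse (BM25) ranking is reciprocal rank fusion; the session does not say exactly how Generali fused the two, so this is just one illustrative scheme with made-up chunk IDs:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several rankings (e.g. one from dense/embedding search, one
    from BM25) into a single list. Each document scores 1/(k + rank) in
    every ranking it appears in; scores are summed across rankings."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["c2", "c5", "c1"]   # nearest-neighbour order from embeddings
sparse = ["c5", "c9", "c2"]  # BM25 order
fused = reciprocal_rank_fusion([dense, sparse])
```

Chunks ranked well by both methods (here `c5` and `c2`) float to the top, which is the intuition behind hybrid search.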
What next steps does Generali plan to improve its RAG application?
-Generali plans to try new foundation models such as Gemini Pro, compare different model versions with Vertex AI, and integrate Vertex AI Vector Search to further optimize the information retrieval process.
Outlines
😀 Team and project introduction
Ivan, a customer engineer at Google Cloud, introduces the Generali Italia team and the project of developing and deploying a generative AI application to navigate complex enterprise documents. He is joined by Ian, a tech lead data scientist, and Domenico, a tech lead in machine learning. They walk through the session agenda, which includes presenting the scenario in which the application was developed, exploring the use of large language models, and a live demonstration of the application.
📚 The challenge of managing enterprise documents
The Generali Italia team addresses the problem of the continuous growth of textual data and knowledge, as well as the time required to extract information from these texts. They highlight the challenges of accessing information from multiple data sources and explain how they used large language models to improve the information retrieval process.
🔍 Information retrieval approach
The presentation traces the evolution of information retrieval, from the concepts of the 1950s to the use of large language models (LLMs) in the 2020s. It explains how LLMs work, with examples of prompts and in-context learning to reduce 'hallucinations' and provide accurate answers based on the supplied context.
🤖 Deploying the RAG architecture
The team explains in detail how they deployed their generative AI application, including building the context for in-context learning, splitting the documents, using an embedding model to generate vectors, and setting up a vector database for search. They also share the results of their initial experiments and the importance of evaluation metrics.
📈 Experiments and improvements
Having recognized the importance of evaluation metrics, the Generali Italia team creates a synthetic dataset to evaluate their results. They use tools such as the Vertex AI platform, language models such as Gemini and PaLM, and vector databases such as Qdrant. They share the results of their experiments, including improvements made by tuning chunk size and by adding custom chunks to handle insurance-specific acronyms and definitions.
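The synthetic-dataset step above can be sketched as follows; `ask_llm` is a placeholder for a real model endpoint (the team used PaLM on Vertex AI), and here it returns a canned response so the sketch runs. The prompt wording and paragraph text are illustrative:

```python
import json

def ask_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g. a Vertex AI model).
    Returns a canned JSON response so the sketch is self-contained."""
    return json.dumps([
        {"question": "What does the paragraph define?",
         "answer": "An insurance term."},
    ])

def synthesize_qa(paragraph: str) -> list[dict]:
    """Ask the model to turn one document paragraph into Q&A pairs
    that can serve as a validation set."""
    prompt = (
        "From the paragraph below, write 3 questions and their answers "
        "as a JSON list of {question, answer} objects.\n\n" + paragraph
    )
    return json.loads(ask_llm(prompt))

pairs = synthesize_qa("...an illustrative paragraph from a policy document...")
```

Running this over every extracted paragraph yields the question-answer pairs against which predicted answers are later compared.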
🌐 Conclusion and outlook
The session ends with a live demonstration of the application, key takeaways from the project, and a Q&A session. The Generali Italia team thanks Google Cloud and its engineers for their support throughout the project. They discuss next steps, including experimenting with new foundation models and improving the preprocessing pipelines to adapt to new documents.
🙌 Q&A and discussion
A Q&A session addresses participants' questions about building consistent frameworks for RAG systems, updating documents, the document-chunking approach, and storing text chunks. The speakers share best practices and advice based on their experience with the project.
📝 Final wrap-up and next steps
The session concludes with an invitation to upcoming events and thanks to the participants. The speakers encourage attendees to explore new features and keep experimenting with the latest AI technologies to improve RAG applications.
Keywords
💡Google Cloud Community
💡RAG (Retrieval-Augmented Generation)
💡Generative AI
💡Large Language Model (LLM)
💡Prompt Engineering
💡Natural Language Processing (NLP)
💡Lexical Search
💡Ranking
💡Model Evaluation
💡Data Pipeline
💡Information Retrieval Metrics
💡Synthetic Dataset
💡Hallucination
💡In-context Learning
💡Embedding
💡Vector Database
💡Similarity Metrics
💡Retrieval
💡BM25
💡Data Processing
💡Foundation Model
💡Google Infrastructure
Highlights
Introduction of the session with the Google Cloud Community and the team from Generali Italia.
Ivan Nardini's role as a customer engineer at Google Cloud in supporting the development of the AI application.
Ian and Domino's introduction as tech leads in AI and machine learning at Generali Italia.
The agenda overview covering the development and deployment of a RAG-based application for enterprise documents.
The business case for Generali Italia focusing on the importance of documentation in the insurance industry.
The challenges of managing over 400 documents and the opportunity to simplify information retrieval with AI.
The definition and history of information retrieval, from the 1950s to the integration of LLMs in the 2020s.
The concept of generative AI and its recent advancements in language understanding and content generation.
The importance of in-context learning for reducing hallucinations in large language models.
The process of creating a context section by embedding document chunks and storing them in a vector database.
The retrieval and generation steps for answering user queries using the embedded information.
The initial experiments conducted with default parameters and the realization of the need for evaluation metrics.
The creation of a synthetic data set using a large language model to serve as a validation set.
The introduction of various metrics for evaluating the performance of the model in information retrieval and QA accuracy.
The use of Vertex AI platform for experimentation, ensuring scalability, reproducibility, and artifact storage.
Custom splitting strategies implemented to optimize chunk size and improve recall and QA accuracy.
The addition of custom chunks to address acronyms and insurance definitions for better user query responses.
The correlation found between the information retrieved by the vector database and the accuracy of the LLM.
The integration of a re-ranking layer to improve the organization of information provided to the LLM.
The migration from PaLM to PaLM 2 models and the testing of Gemini Pro for performance improvements.
The architecture of the RAG-based system, including the ingestion and inference phases for document processing.
The takeaways from the project, including increased document accessibility and the opportunity to experiment with cutting-edge AI technologies.
Plans for future steps, such as trying new foundation models and integrating Vertex AI pipelines for model comparison.
The Q&A session addressing questions on building a consistent framework for a RAG-based system without customer Q&A data.
Discussion on handling document updates and the process of rebuilding or updating the index.
Approach to chunking large documents and the use of tools like LangChain for splitting and processing.
Storing chunks of text with metadata in a vector database for efficient retrieval and filtering.
Strategies for handling follow-up questions that lack context in a conversational AI system.
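The re-ranking idea from the highlights above, motivated by the "Lost in the Middle" finding, can be sketched as placing the strongest chunks at the edges of the context; this is an illustrative heuristic, not Generali's exact implementation:

```python
def edge_reorder(chunks_best_first: list[str]) -> list[str]:
    """Place the highest-scoring chunks at the start and end of the
    context, pushing weaker ones toward the middle, to counter the
    'lost in the middle' effect where LLMs underuse mid-context info."""
    front, back = [], []
    for i, chunk in enumerate(chunks_best_first):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]

# Input is already sorted best-first; output puts r1 first and r2 last.
order = edge_reorder(["r1", "r2", "r3", "r4", "r5"])
```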
Transcripts
Hello everyone, and welcome to this new session of the Google Cloud Community. Today we have the pleasure of hosting the Generali Italia team, who will share their experience in developing and deploying a RAG-based application to navigate complex enterprise documents. Before we start, let me introduce myself: I am Ivan Nardini, a customer engineer at Google Cloud, and I supported Generali in implementing this generative AI application. Together with me today we have Ian and Domenico. Ian, Domenico, would you like to introduce yourselves?

For sure, thank you, and welcome everybody. My name is Ian, a tech lead data scientist at Generali Italia, and I support the development of artificial intelligence and machine learning solutions.

Hello, I'm Domenico, tech lead machine learning for Generali Italia.

Okay, so let's take a look at the agenda. First of all, we will begin by presenting the scenario in which the document Q&A application was developed. Next, we will explore why Generali chose to address these challenges using large language models within a RAG-based solution. Following that, we will deep-dive into the process that enabled the Generali team to successfully deploy the RAG-based LLM application into production; in particular, they will share some details about the experiments they conducted in terms of chunking, lexical search, and ranking strategies that ultimately led to the deployment of the application. Finally, we will conclude the session with a live demonstration, some takeaways, and, as always, the Q&A session. So, with that, Ian, Domenico, the stage is yours.

Okay, so thank you again, Ivan.
Let's start with the Generali business case. Generali is investing in technological innovation, but as an insurance company, documentation is always an important and significant component of our business. The industry is guided by technical regulation, all accompanied by documentation, and we have documents such as policy statements with terms and conditions, premium statements, risk assessment reports, or internal company knowledge, with documents such as machine learning model documentation, corporate regulations, information related to legal entities, and so on. Going through all these documents in order to find the right information is very important. Recognizing this problem, we saw an opportunity, and the question we asked ourselves was: how can we leverage the advancements in AI to simplify the information retrieval process?

So we first defined a perimeter of relevant business documents to focus on. To understand the complexity of this challenge: currently we have more than 400 documents at our disposal, totaling more than 5,000 pages, which means it would take more than 100 hours to read them all. In front of these numbers, we stand before three significant challenges. Firstly, there is a continuous growth of available textual data and knowledge, which keeps expanding our resources, and this growth introduces an additional challenge: the time it takes to extract information from text. The third challenge is related to information access from multiple different data sources located in different databases. The field of information retrieval can assist us with all these challenges.

First of all,
what is information retrieval? Information retrieval is the science of searching for information in a document, or searching for the documents themselves. So let's embark on a journey through the timeline of information retrieval, just a few concepts. We meet the 1950s, a decade marked by the theoretical foundations of this field, with concepts like indexing and search algorithms. Fast-forward to the 1980s, and we see the advent of the vector space model. Entering the 2000s, we see the integration of machine learning and search engines into the field. And now, in the 2020s, and particularly in 2023, we are in a period marked by the adoption of LLMs.

Overall, there are numerous information retrieval methodologies, which have been grouped here into three main categories. There are the classic search mechanisms, based on the precise identification of words or vectors; then we have probabilistic approaches and those based on machine learning; and finally there are more advanced methods that can be an ensemble of the previous ones or other architectures. Here we will talk about the RAG architecture, which is based on generative AI.

And here we are at the generative AI part of this presentation. Generative AI has recently been shown to surpass the state of the art in terms of understanding language, generating new content, and maintaining a meaningful conversation. A large language model is a machine learning model trained on a vast amount of data, and I found a very pleasant definition on the internet to describe them: they are large autocompletion systems. In this slide you can see a sentence, which is the input to an LLM and which from now on we will call a prompt: "slow and steady wins". From a grammatical standpoint there are many combinations that could finish this sentence, but there is essentially only one way to conclude it, and it's the
race.

Okay, so one important concept for building RAG is in-context learning. Let's see it in action. For example, we can ask our large language model: "what is the color of your T-shirt?" If we look at the response, it says the T-shirt is red, but this is wrong. This is what we usually call a hallucination: the large language model doesn't know the answer to this question, but it tries anyway to find one. One way to reduce this type of hallucination is in-context learning. With in-context learning we use a prompt in which we tell the large language model that it must answer the question using the information that comes from the context window. In this example, the context section states what Ivan is wearing, and if we submit this prompt to the large language model, we will see that it answers correctly. So the idea is to insert into this context section all the information that comes from our documents; in this way our large language model will be able to answer our questions using information from our documents.

So the first step is to create the context section. We start from our document database, split each document into paragraphs, or chunks, and pass these chunks to an embedding model. The embedding model is a large language model itself, but it takes text as input and returns a list of numbers, a vector. At this point, for each paragraph we have the corresponding embedding. The idea is that paragraphs with the same or similar information will also have similar embeddings, and this similarity between embeddings can be calculated mathematically using a distance metric like cosine similarity. We can then take all these embeddings and store them in a vector database, a kind of database built for storing and retrieving these vectors.

Then we move to the retrieval and generation steps. A user issues a query; this query is processed with the same text embedding model, so we obtain the embedding of the user's question, and we can use this embedding to search the vector database for all the chunks that contain similar information. The idea is that the context we retrieve contains the information needed to answer the question. At this point we can take the user's question and the retrieved documents, put them into a prompt like the one we saw for in-context learning, and submit the prompt to the large language model, which will then answer the user's question. Here we can see a summary of this process: a user asks a question through a chatbot and a user interface; we retrieve all the knowledge needed to answer it; and this information is used by the large language model to answer the user, based on the internal knowledge of our documents.

So, we can say that it is not too complex to create an architecture of this kind. We conducted some initial experiments, and these activities were carried out with default parameters: we left the splitting method at the default, the chunk size at the default, and even the information retrieval process unchanged; we didn't venture into any kind of sophistication. We obtained approximately 45,000 chunks for the database to represent the entire document collection. But we came to realize something very important: we lacked evaluation metrics. We didn't have a sort of compass that would allow us to determine whether we were moving in the right direction or the wrong one. Essentially there are two possible strategies: the first is having access to an existing dataset that can serve as a validation set, and the second is to create a new dataset. The first option was not feasible, because a validation dataset was not available, so we decided to create a new synthetic dataset. We generated the synthetic dataset using a large language model: first we extracted paragraphs from each document, then we sent them to the large language model, asking it directly to produce three questions and the three associated answers, and by doing that we obtained question-and-answer
pairs.

So in this slide you can see the situation. The first part, on the left, is what we have just discussed: we have a huge number of questions, and we injected all of them into the RAG architecture presented by Domenico; as a result we obtained pairs of real and predicted answers on which we can perform comparisons. At this point we need metrics to evaluate the quality of our work. The introduction of metrics is crucial for evaluating the performance of the model, comparing predicted answers against real answers in terms of accuracy, recall, relevance, and the overall ability of the model at this task. Metrics also help us understand the variation in performance when we introduce a new feature into the system. On the left you can see the metrics for the information retrieval process; on the right, the metrics related to evaluating the quality of the generated responses. We have the mean reciprocal rank, the metric that determines whether the correct chunk is placed at the top, in the middle, or at the bottom of all the documents returned. Then we have metrics like mean average precision and recall at a given cutoff K, which refer to the accuracy of the retrieved chunks over all the retrieved chunks, or over the total number of relevant chunks in the dataset. I know it's a little bit complicated. On the other side we have metrics such as ROUGE, a metric that comes from the machine translation field, moving from one language to another, and which is based on the overlap of terms. We also have the BERT score, another metric that we use to compare the sentence embeddings of the predicted and the real answer. And finally, last but not least, something new: the QA evaluation, a metric based on an LLM. Essentially, we inject the question, the real answer, and the predicted answer into the large language model, and the large language model's task is to determine whether the answer is correct or not. So we decided to... ah, sorry, sorry, this part is yours.

So, before diving into the experiments,
let's check out the tools that we used. As the experimentation layer we used the Vertex AI platform inside Google Cloud. It's a useful tool because it allows us to scale the number of experiments and the resources we use for each experiment, it ensures experiment reproducibility, and it automatically saves all the artifacts that we generate through our experiments. For the large language model layer we used the Gemini and PaLM models, along with the multilingual embedding model, which works well with Italian, the original language of our documents. As the chain layer we used LangChain, because it has functions for reading and processing PDFs as well as functions for the information retrieval step. Finally, as the storage layer we used Qdrant, a vector database where we store our embeddings; we chose it because it has a very fast algorithm for searching through vectors and an easy installation on Kubernetes.

So now that we have the metrics, the test set, and the tools, we can really start with the experiments. With our second experiment we implemented custom splitting strategies for our documents. In particular, we start by splitting by paragraph, and then we split each paragraph into sub-chunks of a specific length. We also tuned all the other hyperparameters that we had, like the temperature of the model, the prompt, the length of the various chunks, and the number of chunks that we insert into the prompt for in-context learning. After all this tuning, we found that the best chunk size is 1,000 characters, and with this approach we generated 13,000 chunks. From the metrics standpoint, we obtained a recall at 15 documents of 80% and a question-answering accuracy of 73%. On the right is a plot of how the recall increases as we increase the number of chunks inserted into the prompt; we can see that after 10 chunks the recall
curve reaches a plateau.

Okay, for the next experiment we added some custom chunks to our collection, because many of our documents contain acronyms and insurance definitions, and users often ask about the meaning of these acronyms or definitions, for example "what is the definition of C?". We want the chatbot to be able to answer these questions, so we manually added chunks explaining what each acronym or insurance definition means; in this manner we generated another 4,000 chunks. From the metrics standpoint, with this experiment we obtained a recall of 78% and a question-answering accuracy of 72%. As you can see, these metrics are slightly lower than in the previous experiment, but we chose this as the best experiment because we want our chatbot to be able to explain acronyms and insurance
definitions.

So, after this experimentation, and after introducing the aspects related to definitions and so on, we have seen that there is a direct correlation between the information retrieved by the vector database and the information used by the large language model: simply put, if the information from the vector database is incorrect, the large language model cannot generate an accurate answer. We understood that we needed to focus our efforts and time on the information retrieval process, enhancing the quality of the chunks we identify. So we moved from a simple dense search through embeddings to a hybrid method that combines embedding techniques with the classical BM25; these methods are defined as a mixture of dense and sparse search. We left the other parameters unchanged, and as you can see at the bottom of this slide, we still achieved a significant performance boost in terms of recall and even in QA
accuracy.

Moving forward, we also ventured a little into the research field. Last year there was an interesting paper titled "Lost in the Middle", I think published in November, in which a group of researchers aimed to understand how large language models use the information from the context provided. The research team found a correlation between where the correct chunk is placed, relative to the overall number of chunks in the prompt, and how well the large language model uses it. In this graph we have the so-called U-shaped curve, which essentially represents the reduction in performance in relation to the placement of the correct information across the returned document set. This consideration led us to think about how the information is provided and organized for the LLM, and that's why we introduced a new layer, a re-ranking layer, capable of sorting the information so that the most accurate information, which the large language model is likely to use, is placed either at the top or at the bottom of all the documents in the
prompt so the last phase of our
The last phase of our experimentation was the integration of two new features: first, we increased the number of documents in the collection set, and second, we moved from PaLM 1 to PaLM 2 (we are currently also testing Gemini Pro). As you can see at the bottom of the slide there is a recall reduction, and we must acknowledge some change in the metrics. This is explained by the introduction of the new documents without fine-tuning the input preprocessing pipeline; it is very important to adjust the input pipeline to the documents you are going to use. Some of the supplementary documents contain definitions, which can be picked up as chunks during the QA evaluation, and this could be a reason for the recall reduction. On the other side, as you can see, by migrating from PaLM 1 to PaLM 2 we obtained an improvement in QA accuracy.

Oh, the demo part. Okay, so having
discussed the architecture, the experiments we have just seen, and many other things, let's briefly see how this application works. Give me one second to load the video. Okay, so this is the user interface. As you can see, the user can type a question into the tab; in this case we are asking the system to explain the Generali strategy in terms of environmental sustainability. As you can see, the answer is composed of two parts: the first part is the actual answer to the user's query, and the second part lists the documents that were used to provide the answer. That is because we strongly believe the user should have the opportunity to understand which documents were used to extract the information. Here is another question, about AI ethics at Generali, and in this case too the system generates the answer to the question and then the sources.
Okay, so this is the final architecture that we built for our product, and it is divided, as we have seen, into two phases. In the ingestion phase we start from Cloud Storage, where we store all our documents, and then we have Vertex AI Pipelines. Vertex AI Pipelines is a tool inside the Vertex AI platform: it orchestrates model training and can also be used for data processing, so we use it for all the data ingestion, the chunking part, the embedding part, and the creation of the BM25 index. At that point we have another two pipelines, at steps three and four: one performs the information retrieval evaluation, computing the recall at 15 documents that we have seen, and the other performs the semantic evaluation, where we calculate the question-answering accuracy. At this point, all the hyperparameters we have found, like the prompts and the temperature, are stored in an Artifact Registry. On the other side we have the inference phase, where the user interacts with a frontend service. A backend service manages the whole prompt-engineering phase, reading the information and all the parameters you need, like the temperature and the prompt, from the Artifact Registry. We also have a NoSQL database, on top of Firestore, where we store the conversations between the user and the chatbot; we use this stored data so that in the next years we can fine-tune the large language model.
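Two of the pieces just mentioned, the BM25 index built during ingestion and the recall-at-k computed by the evaluation pipeline, can be sketched with the standard library alone. This is a simplified stand-in, not Generali's pipeline code; it implements plain Okapi BM25 with the usual k1/b defaults:

```python
import math
from collections import Counter

class BM25Index:
    """Minimal Okapi BM25 index over whitespace-tokenized chunks."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.docs = [d.lower().split() for d in docs]
        self.k1, self.b = k1, b
        self.N = len(self.docs)
        self.avgdl = sum(len(d) for d in self.docs) / self.N
        # Document frequency per term, for the IDF component.
        self.df = Counter(t for d in self.docs for t in set(d))

    def score(self, query, idx):
        doc = self.docs[idx]
        tf = Counter(doc)
        s = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((self.N - self.df[term] + 0.5) / (self.df[term] + 0.5) + 1)
            s += idf * tf[term] * (self.k1 + 1) / (
                tf[term] + self.k1 * (1 - self.b + self.b * len(doc) / self.avgdl))
        return s

    def top_k(self, query, k=15):
        """Return indices of the k best-scoring chunks for the query."""
        scores = [(self.score(query, i), i) for i in range(self.N)]
        return [i for _, i in sorted(scores, reverse=True)[:k]]

def recall_at_k(retrieved_ids, relevant_ids, k=15):
    """Fraction of the relevant chunks that appear in the top-k results,
    i.e. the recall@15 metric the evaluation pipeline computes."""
    hits = len(set(retrieved_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)
```

In the real system this scoring runs inside a Vertex AI pipeline step and is combined with the embedding-based retrieval, but the metric itself is exactly this ratio.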
Okay, so as for the take-homes: with this project we increased the accessibility of our documents for the whole company; we had the opportunity to experiment with all the Google foundation models, so with cutting-edge AI technologies; we can rely on the Google infrastructure for the scalability of our experiments and the reliability of our product; and finally, we are sharing a lot of knowledge with Ivan Nardini and all the other Google Cloud engineers. On the other side, for the next steps, the idea is to try the newly published foundation models like Gemini Pro 1.5, and also to try the Vertex AI AutoSxS (side-by-side) pipelines: these are pipelines available in Vertex AI where we can compare how two different models answer the same questions, so we can choose the better model. Finally, we want to try Vertex AI Vector Search, the vector database inside Vertex AI: the vector database we are currently using, Qdrant, is an open-source tool, while Vector Search is available by default in Vertex AI. And we can also work on LLMOps for RAG applications, to decide, for example, what to do when new documents are added to our database. I also want to thank the whole team that worked on this incredible project, and thank you two for being here with us.
Cool! So thank you Ian, thank you Domenico, for this great overview. As we promised at the beginning, now it's time for the Q&A. We will start with some questions that we collected, and then we will also go through the live questions we just received. Let's start with the first question, which is something you just touched on: how do you build a consistent evaluation framework for a RAG-based system when you do not have the possibility to collect hundreds of Q&A pairs from your customers?

Okay, so we have partially
answered this question previously. In our case this was exactly the scenario, because we didn't have access to an internal knowledge base of Q&A, so we decided to create a synthetic dataset by providing chunks of paragraphs to an LLM and having it create pairs of questions and answers. In the machine learning field there are many ways to do this; using an LLM is just one of them. We suggest having a framework with pipelines, so that you can run this process iteratively. Take into account that synthetically generated data is not the same as the data you can obtain from your business units or your customers, so you always need to check the quality of your questions and answers in order to end up with something of good quality.
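As a sketch of that synthetic-dataset step: the prompt wording and the `Q:`/`A:` parsing format below are illustrative assumptions, and the commented-out `call_llm` is a placeholder for whatever LLM client you use, not a real API:

```python
def build_qa_prompt(chunk, n_pairs=2):
    """Prompt asking an LLM to invent Q&A pairs grounded in one chunk."""
    return (
        f"Read the following passage and write {n_pairs} question/answer pairs "
        "that can be answered using only the passage.\n"
        "Format each pair as:\nQ: <question>\nA: <answer>\n\n"
        f"Passage:\n{chunk}\n"
    )

def parse_qa_pairs(llm_output):
    """Parse 'Q: ... / A: ...' lines back into (question, answer) tuples."""
    pairs, question = [], None
    for line in llm_output.splitlines():
        line = line.strip()
        if line.startswith("Q:"):
            question = line[2:].strip()
        elif line.startswith("A:") and question is not None:
            pairs.append((question, line[2:].strip()))
            question = None
    return pairs

# Hypothetical end-to-end use, with call_llm standing in for the model client:
# dataset = [pair for chunk in chunks
#            for pair in parse_qa_pairs(call_llm(build_qa_prompt(chunk)))]
```

The quality check the speakers mention would then happen on the parsed pairs, before they are split into validation and test sets.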
Nothing else, I think.

Okay, well, that's a great answer; I think you nailed it. So let's move on to the next question: what do you do if one of the documents gets updated? Do you build, or rather rebuild, the index again?
Okay, I would say that it depends. If you add a few new documents, you can just run them through the text-embedding model and update your index, and your chatbot will be able to answer the new questions. But if you add too many documents, maybe there are hyperparameters that would now work better, maybe there is a better prompt or a better chunk length. So my opinion is that if we add a lot of new documents, it's better to re-run the preprocessing pipeline to search for better parameters.
Okay, that makes sense. As you can see, the questions are getting shorter and shorter, which is good, I think. So: what was your approach to chunking? Can you maybe dive a little deeper?

Yeah, okay, so as I said
previously, the first thing we did was split by paragraph, because we want a single chunk to be semantically distinct from the others; this way you don't end up with a chunk that contains two different paragraphs talking about different things. So the first thing we did was split by paragraph, and then we sub-split using the LangChain tool, the recursive splitter.
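A minimal stdlib sketch of that two-level strategy. LangChain's RecursiveCharacterTextSplitter does the second step for real; this version only illustrates the idea of trying coarser separators before finer ones:

```python
def split_paragraphs(text):
    """First level: one chunk per paragraph, so chunks stay semantically distinct."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def recursive_split(chunk, max_len=200, separators=("\n", ". ", " ")):
    """Second level: sub-split chunks that are still too long, preferring
    coarser separators (the idea behind LangChain's recursive splitter)."""
    if len(chunk) <= max_len:
        return [chunk]
    for sep in separators:
        parts = chunk.split(sep)
        if len(parts) > 1:
            # Greedily re-merge parts into pieces that fit max_len.
            pieces, buf = [], ""
            for part in parts:
                candidate = (buf + sep + part) if buf else part
                if len(candidate) <= max_len:
                    buf = candidate
                else:
                    if buf:
                        pieces.append(buf)
                    buf = part
            if buf:
                pieces.append(buf)
            # Recurse in case a single part is still too long.
            return [p for piece in pieces
                    for p in recursive_split(piece, max_len, separators)]
    return [chunk]  # no separator worked; keep the chunk as-is

def chunk_document(text, max_len=200):
    return [c for p in split_paragraphs(text) for c in recursive_split(p, max_len)]
```

Short paragraphs pass through untouched, while oversized ones are broken at the most natural boundary available.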
Okay, I hope that provided the answer they were looking for. So, I think this is the last one; let's see: where do you store the chunks of text in GCP, in Cloud Storage or in BigQuery?

So, as I said, we use Vertex AI
Pipelines for our experiments, and Vertex AI Pipelines by default saves all its artifacts to Cloud Storage, so I would say Cloud Storage.
Okay, you're passing the exam! So, no, this is actually the last question: they're asking for best practices for chunking large tabular data, such as a complex spreadsheet with many sheets. Now, I don't know if you had this kind of data, but maybe you faced this scenario in the other use cases you are working on, so feel free to share some best practices here.
here okay so um maybe I can join this
question with another that I have seen
in in the chat uh currently we are using
for doing that we are using an external
uh Library which is unstructured uh in
order to extract information from uh our
documents um you can use um some CNN
models in order to extract even the text
from images uh but um you you can use um
libraries like um Open PI Exel pandas in
these cases or um the or even
unstructured um it's important to take
into account that uh it's um it's even
important the way in which you feed the
llm uh using this kind of data so if you
are managing a spreadsheet you need to
create a structure in the prompt that is
um that is useful in order to be
understood by the LM maybe you can
integrate some sentences uh separators
and change the things in order to uh
create something that is more uh
semantic semantically valid for the llm
that's in my
opinion no it makes it makes totally
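For instance, one way to give a sheet that "semantically valid" structure is to linearize each row into a self-contained sentence. This is only an illustrative approach, not the exact one used by the team:

```python
import csv
import io

def sheet_to_sentences(csv_text, sheet_name="Sheet1"):
    """Turn tabular rows into one sentence per row, so each fact keeps
    its column names attached and survives chunking on its own."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    lines = []
    for i, row in enumerate(rows, start=1):
        fields = "; ".join(f"{col} is {val}" for col, val in row.items())
        lines.append(f"In {sheet_name}, row {i}: {fields}.")
    return "\n".join(lines)

csv_text = "product,revenue\nMotor,120\nLife,340\n"
print(sheet_to_sentences(csv_text))
# In Sheet1, row 1: product is Motor; revenue is 120.
# In Sheet1, row 2: product is Life; revenue is 340.
```

The point is that a chunk containing only row 2 still tells the LLM which sheet, which columns, and which values it is looking at.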
No, it makes total sense. I don't know, Domenico, if you want to add something here, or we can move on to one of the live questions we just received.
Okay, so one of the questions that I think is valuable: they're asking how large your validation set was, that is, how many Q&A pairs you have.

Okay, so we have around
2,000 questions that we used to generate this synthetic dataset, which we split into a validation set and a test set in order to be consistent in the definition of the hyperparameters. We have seen that the parameters, the kind of RAG architecture you are going to use, the chunk lengths and so on, all depend on the questions your RAG is going to receive. So it's very important to ask your business units or your customers what kind of questions they expect to introduce, in order to create something that works from the point of view of the prompt.

Okay, so now let's
talk about a couple of questions related to chunks. One question is: how do you store the chunks with metadata other than the source file name?
Domenico? Okay, yes: we store all this information in the vector database, because Qdrant has a feature where you can store metadata and search through it. It's very useful to have the metadata in the vector database. I've seen another question asking how you serve the right document to the right people: if you have metadata in your vector database, you can just do the semantic search and then also filter on this metadata. So if you can add a lot of metadata to your documents, it can be very useful for your RAG.
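A toy illustration of that pattern, combining a metadata filter with the ranking step. Real vector databases such as Qdrant do this natively via payload filters alongside the vector search; this pure-Python version, with a crude term-overlap score standing in for embedding similarity, only shows the idea:

```python
def search_with_metadata(chunks, query_terms, allowed_departments):
    """chunks: list of dicts like {"text": ..., "metadata": {...}}.
    Keep only the chunks the user may read, then rank the survivors
    by term overlap (a stand-in for real embedding similarity)."""
    visible = [c for c in chunks
               if c["metadata"].get("department") in allowed_departments]

    def score(c):
        words = set(c["text"].lower().split())
        return len(words & {q.lower() for q in query_terms})

    return sorted(visible, key=score, reverse=True)

chunks = [
    {"text": "sustainability strategy targets", "metadata": {"department": "esg"}},
    {"text": "internal salary bands", "metadata": {"department": "hr"}},
    {"text": "sustainability report summary", "metadata": {"department": "esg"}},
]
results = search_with_metadata(chunks, ["sustainability", "strategy"], {"esg"})
# Only the two "esg" chunks are returned, best match first.
```

The access-control question from the chat reduces to exactly this: the filter guarantees the "hr" chunk can never reach the prompt for a user without that permission.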
And let me ask the last question: how do you handle scenarios where a user asks a follow-up question that lacks context? I'm reading the question.
Okay, I can give you the example, by the way, or you can read it if you prefer. So, in this case you could use multiple strategies: you can create a summary of the previous conversation and then use it to feed the prompt for generating the new answer, or you can embed the entire previous conversation into the prompt, but this depends on the input length that the large language model can accept. Currently we are still working on it, so this feature is not yet integrated for us.
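The two strategies just described can be sketched roughly like this. Everything here is illustrative: the character budget stands in for a real token budget, and `summarize` is a placeholder for an LLM summarization call (LangChain ships real implementations of both memory strategies):

```python
def build_prompt_with_history(question, history, max_chars=1000, summarize=None):
    """Strategy 1: if the full conversation fits the model's input budget,
    embed it verbatim. Strategy 2: otherwise condense it first
    (`summarize` is a placeholder for an LLM summarization call; without
    one, we simply keep the most recent part of the history)."""
    full_history = "\n".join(f"{role}: {text}" for role, text in history)
    if len(full_history) <= max_chars:
        context = full_history
    else:
        context = summarize(full_history) if summarize else full_history[-max_chars:]
    return f"Conversation so far:\n{context}\n\nUser question: {question}\nAnswer:"

history = [("user", "What is Generali's ESG strategy?"),
           ("assistant", "It focuses on environmental sustainability targets.")]
prompt = build_prompt_with_history("And what about AI ethics?", history)
# The follow-up question travels together with the conversation context,
# so the model can resolve references like "what about ...".
```

Either way, the point is the same: the follow-up question is never sent to the model alone, but always alongside some representation of the prior turns.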
Do you have anything to add, Domenico? It's exactly that, and if you want, in LangChain these two strategies are already implemented, so they're very easy to adopt.

Okay, so yeah, I think this is it.
So, before concluding, let me just go back to the deck, to the slides. And yeah, just one last reminder: don't forget that we are going to have several events in March. Here you have some of them, so feel free to join; you have the link to participate. But for now, I hope you enjoyed the session, and thank you for participating. Thank you, thank you. Bye-bye!