Retrieval Augmented Generation for Navigating Large Enterprise Documents

Google Cloud Events
28 Feb 2024 · 42:01

Summary

TLDR: In this Google Cloud Community session, the Generali Italia team shares its experience in developing and deploying an application based on large language models (LLMs) for navigating complex enterprise documents. The goal was to simplify the information retrieval process by leveraging recent advances in AI. After identifying the challenges posed by continuously growing textual data, they defined a set of relevant documents and used language models to generate synthetic questions and answers. They ran experiments to refine lexical search and ranking strategies, which led to putting the application into production with significant improvements in recall and answer accuracy. The team also discusses the key metrics used to evaluate performance and the next steps for improving the application.

Takeaways

  • 😀 The Generali Italia team developed a generative AI (RAG) application to make it easier to navigate complex enterprise documents.
  • 🔍 The application was designed to address the challenges of continuously growing textual data, the time needed to extract information, and access to information spread across multiple data sources.
  • 📚 Generali manages more than 400 documents and more than 5,000 pages, which would take more than 100 hours to read in full.
  • 🛠️ They used in-context learning to reduce language-model 'hallucinations' and improve the accuracy of the AI's answers.
  • 📈 They first experimented with default parameters and created a synthetic dataset to evaluate their model's performance.
  • 📝 The importance of evaluation was emphasized, using metrics such as recall, mean average precision, ROUGE, and BERTScore to measure answer quality.
  • 🔧 They tuned parameters such as text chunk size and length and the model temperature to improve the application's performance.
  • 🔑 Creating a custom document-splitting scheme improved retrieval and question-answering accuracy.
  • 📈 Adding custom chunks for acronyms and definitions helped answer user questions about these specific terms.
  • 🌐 Using Google Cloud's Vertex AI platform enabled large-scale experimentation and ensured experiment reproducibility.
  • 🔄 The application offers a user interface for asking questions and returns answers based on Generali's internal documents, also showing the source of each answer.

Q & A

  • What is the main objective of this Google Cloud Community session?

    -The main objective is to share the Generali Italia team's experience in developing and deploying a RAG-based application for navigating complex enterprise documents.

  • Who is Ivan Nardini and what role did he play in this project?

    -Ivan Nardini is a customer engineer at Google Cloud. He supported Generali in implementing this generative AI application.

  • What are the three main challenges Generali faces in document management?

    -The three main challenges are the continuous growth of textual data, the time needed to extract information, and access to information spread across multiple data sources.

  • What is in-context learning and how does it help reduce hallucinations in language models?

    -In-context learning consists of providing the language model with relevant contextual information so that it answers questions based on that information, thereby reducing incorrect answers, or hallucinations.

  • How did Generali build a vector database to improve information retrieval?

    -Generali split the documents into paragraphs or chunks, passed those chunks to an embedding model to generate vectors, and then stored the vectors in a vector database to make it easy to retrieve similar information.

  • Which tools and platforms did Generali use to run its experiments and store the data?

    -Generali used Google's Vertex AI platform for experimentation, the PaLM model for natural language processing, the LangChain library for processing PDF documents, and the Qdrant vector database for storing the embeddings.

  • Which metrics did Generali use to evaluate the quality of the generated answers?

    -Generali used metrics such as mean reciprocal rank, mean average precision, recall at a given cut-off K, the ROUGE score (borrowed from machine translation), and BERTScore to compare sentence embeddings.

  • What approach did Generali take to handle insurance-specific acronyms and definitions in the documents?

    -Generali manually added chunks explaining the acronyms and insurance definitions to the vector database so that the chatbot could answer questions about those terms.

  • How did Generali adapt its information retrieval process to improve answer accuracy?

    -Generali combined embedding techniques with classical search methods such as BM25 and introduced a re-ranking layer to place the most relevant information at the top of the retrieved documents.

  • What are the next steps Generali plans in order to improve their RAG application?

    -Generali plans to try new foundation models such as Gemini Pro, compare different model versions with Vertex AI, and integrate Vertex AI Vector Search to further optimize the retrieval process.

Outlines

00:00

😀 Team and project introduction

Ivan, a customer engineer at Google Cloud, introduces the Generali Italia team and the project of developing and deploying a generative AI application to navigate complex enterprise documents. He is joined by Ian, a tech lead data scientist, and Domenico, a tech lead in machine learning. They walk through the agenda for the session, which includes presenting the scenario in which the application was developed, exploring the use of large language models, and a live demonstration of the application.

05:03

📚 The challenge of managing enterprise documents

The Generali Italia team addresses the problem of continuously growing textual data and knowledge, and the time needed to extract information from text. They highlight the challenges of accessing information spread across multiple data sources, and explain how they used large language models to improve the information retrieval process.

10:05

🔍 Approach to information retrieval

The presentation walks through the evolution of information retrieval, from the concepts of the 1950s to the use of large language models (LLMs) in the 2020s. It explains how LLMs work, with examples of prompts and in-context learning used to reduce hallucinations and provide accurate answers based on the supplied context.

15:07

🤖 Deploying the RAG architecture

The team explains in detail the process of deploying their generative AI application, including building the context for in-context learning, splitting the documents, using an embedding model to generate vectors, and setting up a vector database for search. They also share the results of their initial experiments and stress the importance of evaluation metrics.

20:09

📈 Experiments and improvements

Having recognized the importance of evaluation metrics, the Generali Italia team creates a synthetic dataset to evaluate their results. They use tools such as the Vertex AI platform, language models such as Gemini and PaLM, and the Qdrant vector database. They share the results of their experiments, including the improvements obtained by tuning chunk size and adding custom chunks to handle insurance-specific acronyms and definitions.

25:13

🌐 Conclusion and outlook

The session ends with a live demonstration of the application, key takeaways from the project, and a Q&A session. The Generali Italia team thanks Google Cloud and its engineers for their support throughout the project. They discuss the next steps, including experimenting with new foundation models and improving the preprocessing pipelines to adapt to new documents.

30:15

🙌 Q&A and discussion

A Q&A session addresses participants' questions about building a consistent framework for RAG systems, handling document updates, the document chunking approach, and where chunks of text are stored. The speakers share best practices and advice based on their experience with the project.

35:18

📝 Final wrap-up and next steps

The session concludes with an invitation to join upcoming events and thanks to the participants. The speakers encourage participants to explore new features and keep experimenting with the latest AI technologies to improve RAG applications.


Keywords

💡Google Cloud Community

The Google Cloud Community is a forum where IT professionals share knowledge and experience related to Google's cloud services. In the transcript, it is the setting for the session in which the Generali Italia team shares its experience with a generative AI application.

💡RAG (Retrieval-Augmented Generation)

RAG is an architecture that combines information retrieval with text generation. In the transcript, the team used this architecture to build an application capable of navigating complex enterprise documents.

💡Generative AI

Generative AI is a type of artificial intelligence capable of creating original content, such as text, images or music. In the context of the video, it is used to improve the process of finding information in documents.

💡Large Language Model (LLM)

A large language model is an AI model trained on vast amounts of text data that can understand and generate natural language. In the transcript, language models are at the heart of the RAG application developed by the team.

💡Prompt Engineering

Prompt engineering consists of designing the inputs (prompts) given to a generative language model so that it produces accurate answers. In the transcript, the team ran experiments to optimize the prompts and improve the performance of their application.

💡Natural Language Processing (NLP)

Natural language processing is the field of AI concerned with analyzing and understanding human language. In the transcript, NLP is essential for the application to understand and answer the questions asked by users.

💡Lexical Search

Lexical search refers to searching for specific words or phrases in a text. In the transcript, the team explored lexical search strategies to improve the application's ability to find relevant information.

💡Ranking

Ranking, in the context of information retrieval, is the process of ordering results so that the most relevant ones come first. In the transcript, the team worked on ranking strategies so that the application could surface the most relevant answers.

💡Model Evaluation

Model evaluation is the process of measuring the performance of an AI model. In the transcript, the team used several metrics to evaluate the accuracy and relevance of the answers produced by their RAG-based application.

💡Data Pipeline

A data pipeline is a sequence of operations used to transform and move data. In the transcript, the team used data pipelines to process the documents and generate the input vectors for the language model.

💡Information Retrieval Metrics (IR Metrics)

Information retrieval metrics are used to evaluate how effective a retrieval system is. In the transcript, metrics such as recall and mean average precision were used to evaluate the retrieval performance of the application.

💡Synthetic Data Set

A synthetic dataset is an artificially generated dataset, often used for training and evaluating AI models. In the transcript, the team created a synthetic dataset to evaluate their application because they did not have access to a pre-existing question-answer dataset.

💡Hallucination

In the context of AI, hallucination refers to the tendency of models to generate answers that sound plausible but are actually unfounded. In the transcript, the team mentions hallucination as a problem they had to overcome with their language models.

💡In-context Learning

In-context learning is a method in which an AI model is asked to answer based on the context provided in its input. In the transcript, the team uses in-context learning to improve the accuracy of their language model's answers.

💡Embedding

An embedding is a vector representation of a piece of data, such as a word or a paragraph, in a lower-dimensional space. In the transcript, the team uses embedding models to turn paragraphs into vectors that can be compared mathematically.

💡Vector Database

A vector database is a type of database optimized for storing and searching vectors. In the transcript, the embedding vectors are stored in a vector database so that similar paragraphs can be retrieved efficiently.

💡Similarity Metrics

Similarity metrics are used to measure how similar two items are, such as two vectors. In the transcript, similarity metrics are used to find the paragraphs that are most similar to the user's question.

💡Retrieval (Information Retrieval)

Information retrieval is the process of finding specific documents or pieces of information within a dataset. In the transcript, the team improved their retrieval process to ensure that answers are based on the most relevant documents.

💡BM25

BM25 is an information retrieval algorithm that scores how relevant a document is to a query. In the transcript, the team mentions using BM25 to improve the quality of retrieval.

💡Data Processing

Data processing is the work of cleaning, transforming and preparing data for analysis or for training an AI model. In the transcript, data processing is mentioned as a key step in building the application.

💡Foundation Model

A foundation model is a type of AI model pre-trained on a wide range of tasks. In the transcript, the team mentions using foundation models such as PaLM and Gemini Pro to improve their results.

💡Google Infrastructure

Google infrastructure comprises the cloud services and technologies that enable companies to store, process and analyze data at scale. In the transcript, the team relied on Google infrastructure to experiment with and deploy their application.

Highlights

Introduction of the session with the Google Cloud Community and the team from Generali Italia.

Ivan Nardini's role as a customer engineer at Google Cloud in supporting the development of the AI application.

Ian and Domenico's introduction as tech leads in AI and machine learning at Generali Italia.

The agenda overview covering the development and deployment of a RAG-based application for enterprise documents.

The business case for Generali Italia focusing on the importance of documentation in the insurance industry.

The challenges of managing over 400 documents and the opportunity to simplify information retrieval with AI.

The definition and history of information retrieval, from the 1950s to the integration of LLMs in the 2020s.

The concept of generative AI and its recent advancements in language understanding and content generation.

The importance of in-context learning for reducing hallucinations in large language models.

The process of creating a context section by embedding document chunks and storing them in a vector database.

The retrieval and generation steps for answering user queries using the embedded information.

The initial experiments conducted with default parameters and the realization of the need for evaluation metrics.

The creation of a synthetic data set using a large language model to serve as a validation set.

The introduction of various metrics for evaluating the performance of the model in information retrieval and QA accuracy.

The use of Vertex AI platform for experimentation, ensuring scalability, reproducibility, and artifact storage.

Custom splitting strategies implemented to optimize chunk size and improve recall and QA accuracy.

The addition of custom chunks to address acronyms and insurance definitions for better user query responses.

The correlation found between the information retrieved by the vector database and the accuracy of the LLM.

The integration of a re-ranking layer to improve the organization of information provided to the LLM.

The migration from PaLM 1 to PaLM 2 models and the testing of Gemini Pro for performance improvements.

The architecture of the RAG-based system, including the ingestion and inference phases for document processing.

The takeaways from the project, including increased document accessibility and the opportunity to experiment with cutting-edge AI technologies.

Plans for future steps, such as trying new foundation models and integrating Vertex AI pipelines for model comparison.

The Q&A session addressing questions on building a consistent framework for a RAG-based system without customer Q&A data.

Discussion on handling document updates and the process of rebuilding or updating the index.

Approach to chunking large documents and the use of tools like LangChain for splitting and processing.

Storing chunks of text with metadata in a vector database for efficient retrieval and filtering.

Strategies for handling follow-up questions that lack context in a conversational AI system.

Transcripts

[00:03] Hello everyone, and welcome to this new session of the Google Cloud Community. Today we have the pleasure of having the Generali Italia team, who will share their experience in developing and deploying a RAG-based application to navigate complex enterprise documents. Before we start, let me introduce myself: I am Ivan Nardini, a customer engineer at Google Cloud, and I supported Generali in implementing this generative AI application. Together with me today we have Ian and Domenico. Ian, Domenico, would you like to introduce yourselves?

Sure, thank you, and welcome everybody. My name is Ian, a tech lead data scientist at Generali Italia, and I support the development of artificial intelligence and machine learning solutions.

Hello, I'm Domenico, tech lead machine learning for Generali Italia.

[01:06] OK, so let's take a look at the agenda. First of all, we will begin by presenting the scenario in which the document Q&A application was developed. Next, we will explore why Generali chose to address these challenges using large language models within a RAG-based solution. Following that, we will dive deep into the process that enabled the Generali team to successfully deploy the RAG-based LLM application into production; in particular, they will share some details about the experiments they conducted in terms of chunking, lexical search and ranking strategies that ultimately led to the deployment of the application. Finally, we will conclude the session with a live demonstration, some takeaways and, as always, the Q&A session. With that, Ian and Domenico, the stage is yours.
[02:09] OK, thank you again, Ivan. Let's start with the Generali business case. Generali is investing in technological innovation, but as an insurance company, documentation has always been an important and significant component of our business. The industry is guided by technical regulation, all accompanied by documentation, and we have documents such as policy statements with terms and conditions, premium statements, risk assessment reports, as well as internal company knowledge such as machine learning model documentation, corporate regulations, information related to legal entities, and so on. Going through all these documents in order to find the right information is very important. Recognizing this problem, we saw an opportunity, and the question we asked ourselves was: how can we leverage the advancements in AI to simplify the information retrieval process?

[03:18] So, first of all, we defined a perimeter of relevant business documents to focus on. To understand the complexity of this challenge: currently we have more than 400 documents at our disposal, totalling more than 5,000 pages, which means it would take more than 100 hours to read them all. In front of these numbers we stand before three significant challenges. Firstly, there is a continuous growth of available textual data and knowledge, which keeps expanding our resources; this growth introduces an additional challenge, the time it takes to extract information from text. The third challenge is related to information access from multiple different data sources located in different databases. The field of information retrieval can assist us with all these challenges.
[04:30] First of all, what is information retrieval? Information retrieval is the science of searching for information in a document, or searching for documents themselves. Let's embark on a brief journey through the timeline of information retrieval. It starts in the 1950s, a decade marked by the theoretical foundations of the field, with concepts like indexing and search algorithms. Fast forward to the 1980s, we saw the advent of the vector space model. In the 2000s we see the integration of machine learning and search engines into the field, and now in the 2020s, particularly in 2023, we are in a period marked by the adoption of LLMs. Overall there are numerous information retrieval methodologies, which can be grouped into three main categories: classical search mechanisms based on the precise identification of words or vectors; probabilistic approaches and those based on machine learning; and finally more advanced methods, which can be an ensemble of the previous ones or other architectures. Here we will talk about the RAG architecture, which is based on generative AI.
[06:06] And here we are with the generative AI part of this presentation. Generative AI has recently been shown to surpass the state of the art in terms of understanding language, generating new content and maintaining a meaningful conversation. A large language model is a machine learning model that is trained on a vast amount of data; we found a very pleasant definition on the internet to describe them: they are large autocompletion systems. In this slide you can see a sentence, which is the input to an LLM and which from now on we will call a prompt: 'slow and steady wins...'. From a grammatical standpoint there are many combinations that could be used to finish this sentence, but there is essentially only one natural way to conclude it, and it's '...the race'.
[07:05] One important concept for building RAG is in-context learning. Let's see it in action. For example, we can ask our large language model 'what is the color of your t-shirt?', and if we look at the response, we see that it says the t-shirt is red. But this is wrong; this is what we usually call a hallucination. It happens because the large language model does not know the answer to this question but tries anyway to produce one. One way to reduce this type of hallucination is in-context learning. In-context learning uses a prompt where we tell the large language model that it needs to answer the question using only the information that comes from a context window. In this example, the context section describes what Ivan is wearing, and if we submit this prompt to the large language model, we see that it answers correctly. So the idea is to insert into this context section all the information that comes from our documents; in this way our large language model will be able to answer our questions using information from our documents.
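To make the idea concrete, here is a minimal sketch of such a grounded prompt in Python. The wording and placeholder names are illustrative, not the team's exact template.

```python
# Minimal sketch of an in-context-learning ("grounded") prompt.
# The wording and placeholder names are illustrative, not the team's exact template.
GROUNDED_PROMPT = """Answer the question using ONLY the information in the context below.
If the answer is not contained in the context, say you don't know.

Context:
{context}

Question: {question}
Answer:"""


def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    # Concatenate the retrieved document chunks into the context section of the prompt.
    context = "\n\n".join(retrieved_chunks)
    return GROUNDED_PROMPT.format(context=context, question=question)
```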

[08:39] So the first step we need to do is to create that context section. We start from our document database and split each document into paragraphs, or chunks, and we pass these chunks to an embedding model. The embedding model is a large language model itself, but it takes text as input and returns a list of numbers, a vector. So at this point, for each paragraph we have the corresponding embedding. The idea is that paragraphs with the same or similar information will also have similar embeddings, and this similarity between embeddings can be computed mathematically using a distance metric such as cosine similarity. At this point we can take all these embeddings and store them in a vector database, a kind of database built specifically for storing and retrieving these vectors.
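A minimal sketch of this ingestion step (split, embed, store) is shown below. The naive splitter, the embedding stub and the collection name are assumptions for illustration; the team used a multilingual embedding model on Vertex AI and Qdrant as the vector database, but the exact configuration is not given in the talk.

```python
# Sketch of the ingestion step: split documents into chunks, embed each chunk,
# and store the vectors in a vector database (Qdrant is used here as in the talk).
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams


def split_into_chunks(text: str, size: int = 1000) -> list[str]:
    """Naive fixed-size splitting; the talk later describes a paragraph-aware strategy."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def embed(texts: list[str]) -> list[list[float]]:
    """Call an embedding model (e.g. a multilingual Vertex AI text-embedding model)."""
    raise NotImplementedError  # plug in the embedding API you use


def ingest(documents: dict[str, str], client: QdrantClient, collection: str = "docs") -> None:
    chunks, payloads = [], []
    for doc_name, text in documents.items():
        for chunk in split_into_chunks(text):
            chunks.append(chunk)
            payloads.append({"source": doc_name, "text": chunk})
    vectors = embed(chunks)
    client.recreate_collection(
        collection_name=collection,
        vectors_config=VectorParams(size=len(vectors[0]), distance=Distance.COSINE),
    )
    client.upsert(
        collection_name=collection,
        points=[
            PointStruct(id=i, vector=vector, payload=payload)
            for i, (vector, payload) in enumerate(zip(vectors, payloads))
        ],
    )
```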

[09:44] At this point we can move to the retrieve-and-generate steps. A user submits a query; this query is processed using the same text embedding model, so we get as output the embedding of the user's question, and we can use this embedding to search the vector database for all the chunks that contain similar information. The idea is that the context we retrieve contains the information needed to answer the question. So at this point we take the user's question and the retrieved documents and put them into a prompt like the one we saw for in-context learning, and we submit that prompt to the large language model, which then answers the user's question. Here we can see a summary of this process: a user asks a question through a chatbot and a user interface, we retrieve all the knowledge needed to answer that question, and this information is used by the large language model to answer the user based on the internal knowledge contained in our documents.
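The retrieve-and-generate step can be sketched as follows, reusing the embed() and build_prompt() helpers from the earlier sketches. The LLM call is left as a stub: the team used PaLM/Gemini models on Vertex AI, but the exact invocation is an assumption.

```python
# Sketch of the retrieve-and-generate step, reusing build_prompt() and embed()
# from the snippets above. The generation call is a placeholder, not the team's exact setup.
from qdrant_client import QdrantClient


def call_llm(prompt: str) -> str:
    """Call a text-generation model (e.g. a Vertex AI PaLM/Gemini model)."""
    raise NotImplementedError


def answer(question: str, client: QdrantClient, collection: str = "docs", k: int = 10) -> str:
    query_vector = embed([question])[0]
    hits = client.search(collection_name=collection, query_vector=query_vector, limit=k)
    chunks = [hit.payload["text"] for hit in hits]
    prompt = build_prompt(question, chunks)
    return call_llm(prompt)
```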

[11:05] So we can say that it is not too complex to create an architecture of this kind. We conducted some initial experiments, and these activities were carried out with default parameters: we left the splitting method at its default, the chunk size at its default, and the information retrieval process unchanged, without venturing into any kind of sophistication. We obtained approximately 45,000 chunks for the database in order to represent the entire document collection. But we came to realize something very important: we lacked evaluation metrics. We didn't have a compass that would allow us to determine whether we were moving in the right direction or the wrong one. Essentially there are two possible strategies: the first one is having access to an existing dataset that can serve as a validation set, and the second is to create a new dataset. The first option was not feasible because a validation dataset was not available, so we decided to create a new synthetic dataset. We generated it using a large language model: first of all we extracted paragraphs from each document, then we sent them to the large language model, asking it to produce three questions and the three associated answers. By doing that we obtained question-and-answer pairs.
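A sketch of this synthetic-dataset generation is shown below, assuming an llm callable that wraps whatever model is used; the prompt wording and the JSON output format are illustrative, not the team's actual prompt.

```python
# Sketch of the synthetic-dataset step: for each extracted paragraph, ask an LLM
# for three question/answer pairs. Prompt wording and output parsing are illustrative.
import json

QA_GEN_PROMPT = """You are given a paragraph from an insurance document.
Write 3 questions that can be answered using only this paragraph, and their answers.
Return a JSON list: [{{"question": "...", "answer": "..."}}]

Paragraph:
{paragraph}"""


def generate_qa_pairs(paragraphs: list[str], llm) -> list[dict]:
    """`llm` is any callable that takes a prompt string and returns the model's text output."""
    dataset = []
    for paragraph in paragraphs:
        raw = llm(QA_GEN_PROMPT.format(paragraph=paragraph))
        try:
            pairs = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed generations; synthetic data still needs quality checks
        for pair in pairs:
            dataset.append({**pair, "source_paragraph": paragraph})
    return dataset
```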

[12:49] In this slide you can see the situation. The first part, on the left, is what we have just discussed: we have a large number of questions, and we injected all of them into the RAG architecture presented by Domenico; as a result we obtained pairs of real and predicted answers on which we can perform comparisons. At this point we need metrics in order to evaluate the quality of our work. The introduction of metrics is crucial for evaluating the performance of the model, comparing predicted answers against real answers in terms of accuracy, recall, relevance and the overall ability of the model at this task; metrics also help us understand the variation in performance when we introduce a new feature into the system. On the left you can see the metrics for the information retrieval process; on the right, the metrics for evaluating the quality of the generated responses. We have the mean reciprocal rank, the metric that determines whether the correct chunk is placed at the top, in the middle or at the bottom of all the documents returned. Then we have metrics like mean average precision and recall at a given cut-off K, which refer to the share of correct chunks over all the retrieved chunks, or over the total number of relevant chunks in the dataset; I know it's a little bit complicated. On the other side we have metrics such as ROUGE, a metric that comes from the machine translation field, moving from one language to another, and which is based on the overlap of terms. We also have the BERT score, another metric we use to compare the sentence embeddings of the predicted and the real answer. And finally, last but not least, something new: the QA evaluation, which is an LLM-based metric. Essentially we feed the question, the real answer and the predicted answer to the large language model, and the large language model's task is to determine whether the answer is correct or not. So we decided to... ah, sorry, sorry, this part is yours.
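Two of the retrieval metrics mentioned here can be sketched in a few lines, assuming one known relevant chunk id per synthetic question; that data layout is an assumption.

```python
# Minimal sketch of two retrieval metrics mentioned above, computed over the synthetic
# validation set. The data layout (one relevant chunk id per question) is assumed.
def recall_at_k(retrieved_ids: list[list[str]], relevant_ids: list[str], k: int = 15) -> float:
    """Fraction of questions whose relevant chunk appears among the top-k retrieved chunks."""
    hits = sum(rel in ret[:k] for ret, rel in zip(retrieved_ids, relevant_ids))
    return hits / len(relevant_ids)


def mean_reciprocal_rank(retrieved_ids: list[list[str]], relevant_ids: list[str]) -> float:
    """Average of 1/rank of the relevant chunk (0 when it is not retrieved at all)."""
    total = 0.0
    for ret, rel in zip(retrieved_ids, relevant_ids):
        if rel in ret:
            total += 1.0 / (ret.index(rel) + 1)
    return total / len(relevant_ids)
```

The LLM-based QA evaluation works analogously: a prompt containing the question, the reference answer and the predicted answer, asking the model to judge whether the prediction is correct.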

[15:42] So, before diving into the experiments, let's check out the tools we used. As the experimentation layer we used the Vertex AI platform inside Google Cloud; it's a useful tool because it allows us to scale the number of experiments and the resources we use for each experiment, it ensures experiment reproducibility, and it automatically saves all the artifacts we generate through our experiments. For the large language model layer we used the Gemini and PaLM models, as well as the multilingual embedding model, which works well with Italian, the original language of our documents. As the chaining layer we used LangChain, because it has functions for reading and processing PDFs and also functions for the information retrieval step. Finally, as the storage layer we used Qdrant, a vector database where we store our embeddings; we chose it because it has a very fast algorithm for searching through vectors and an easy installation on Kubernetes.
[17:06] So, now that we have the metrics, the test set and the tools, we can really start with the experiments. With our second experiment we implemented custom splitting strategies for our documents: in particular, we first split by paragraph and then split each paragraph into sub-chunks of a specific length. On the other side, we also tuned all the other hyperparameters we had, like the temperature of the model, the prompt, the length of the various chunks, and also the number of chunks we insert into the prompt for in-context learning. After all this tuning, we found that the best chunk size was 1,000 characters, and with this approach we generated 13,000 chunks. From the metrics standpoint, we had a recall at 15 documents of 80% and a question-answering accuracy of 73%. On the right you can see a plot showing how recall increases as we increase the number of chunks we insert into the prompt; after about 10 chunks the recall curve reaches a plateau.
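A sketch of this two-level splitting is shown below. The paragraph delimiter and the overlap value are assumptions; the 1,000-character chunk size is the value the team reported as best.

```python
# Sketch of the two-level splitting strategy described above: split by paragraph first,
# then sub-split long paragraphs to a target size. The paragraph delimiter and overlap
# are assumptions; 1,000 characters is the chunk size the team reported as best.
from langchain.text_splitter import RecursiveCharacterTextSplitter

sub_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)


def split_document(text: str) -> list[str]:
    chunks = []
    for paragraph in text.split("\n\n"):   # keep paragraphs semantically separate
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        if len(paragraph) <= 1000:
            chunks.append(paragraph)
        else:
            chunks.extend(sub_splitter.split_text(paragraph))
    return chunks
```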

[18:41] OK, for the next experiment we added some custom chunks to our collection. This is because a lot of our documents contain acronyms and insurance definitions, and we often get questions from users asking for the meaning of these acronyms or of some insurance definition, for example asking for the definition of a specific acronym. We want the chatbot to be able to answer these questions, and to do so we manually added some chunks explaining what each acronym or insurance definition means. In this manner we generated another 4,000 chunks. From the metrics standpoint, with this experiment we had a recall of 78% and a question-answering accuracy of 72%. As you can see, these metrics are a little bit lower than in the previous experiment, but we chose this as our best experiment because we want our chatbot to be able to explain acronyms or insurance definitions to the user.
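Adding such hand-written glossary chunks can be sketched as follows; the example entries, payload fields and the embed() helper reused from the ingestion sketch are illustrative, not the team's actual glossary.

```python
# Sketch of adding manually written glossary chunks (acronyms and insurance definitions)
# to the same vector collection; the example entries and payload fields are illustrative.
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

GLOSSARY = {
    "IDD": "IDD stands for Insurance Distribution Directive, the EU directive on insurance distribution.",
    "premium": "The premium is the amount the policyholder pays for the insurance coverage.",
}


def add_glossary_chunks(client: QdrantClient, collection: str, start_id: int) -> None:
    texts = [f"Definition of {term}: {definition}" for term, definition in GLOSSARY.items()]
    vectors = embed(texts)  # same embedding model used for the document chunks
    client.upsert(
        collection_name=collection,
        points=[
            PointStruct(id=start_id + i, vector=vector,
                        payload={"source": "glossary", "text": text})
            for i, (vector, text) in enumerate(zip(vectors, texts))
        ],
    )
```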

[20:01] After this experimentation, and after introducing the aspects related to definitions and so on, we saw that there is a direct correlation between the information retrieved from the vector database and the information used by the large language model: quite simply, if the information coming from the vector database is incorrect, the large language model cannot generate an accurate answer. So we understood that we needed to focus our efforts and time on the information retrieval process, enhancing the quality of the chunks we identify. We moved from a simple dense search through embeddings to a method that combines embedding techniques with classical BM25; these search methods are usually described as a mixture of dense and sparse search. We left the other parameters unchanged and, as you can see at the bottom of this slide, we still achieved a significant performance boost both in recall and in QA accuracy.
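A sketch of such a hybrid (dense plus sparse) retrieval is shown below. The fusion method used here, reciprocal rank fusion, is one common way to merge the two rankings and is an assumption; the talk does not specify how the results were combined.

```python
# Sketch of hybrid retrieval: combine BM25 (sparse) ranking with an embedding (dense) ranking.
# Reciprocal rank fusion and its constant are assumptions, not necessarily what the team used.
from rank_bm25 import BM25Okapi


def hybrid_search(question: str, chunks: list[str], dense_ranking: list[int], k: int = 15) -> list[int]:
    """Return indices of the top-k chunks after fusing BM25 and dense rankings."""
    bm25 = BM25Okapi([chunk.split() for chunk in chunks])
    scores = bm25.get_scores(question.split())
    bm25_ranking = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)

    fused: dict[int, float] = {}
    for ranking in (bm25_ranking, dense_ranking):
        for rank, idx in enumerate(ranking):
            fused[idx] = fused.get(idx, 0.0) + 1.0 / (60 + rank)  # reciprocal rank fusion
    return sorted(fused, key=fused.get, reverse=True)[:k]
```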

[21:22] But moving forward we also ventured a little into the research field. Last year there was an interesting paper titled 'Lost in the Middle' (I think it was published in November), in which a group of researchers aimed to understand how large language models use the information in the context they are given. The research team found a correlation between where the correct chunk sits among all the chunks in the prompt and how well the large language model uses it. In this graph we have the so-called U-shaped curve, which essentially represents the drop in performance in relation to the placement of the correct information across the returned document set. This consideration led us to think about how the information is provided and organized for the LLM, and that's why we introduced a new layer, a re-ranking layer, that is capable of sorting the retrieved information so that the most relevant chunks end up either at the top or at the bottom of all the documents in the prompt.
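A sketch of this kind of re-ordering is shown below: given chunks already sorted by relevance, it places the strongest ones at the beginning and the end of the context and the weakest in the middle, similar in spirit to LangChain's long-context reordering. This is a generic illustration, not the team's re-ranking implementation.

```python
# Sketch of a "lost in the middle" mitigation: given chunks sorted from most to least
# relevant, interleave them so the strongest chunks end up at the edges of the prompt
# and the weakest in the middle.
def reorder_for_long_context(chunks_by_relevance: list[str]) -> list[str]:
    front, back = [], []
    for i, chunk in enumerate(chunks_by_relevance):
        # Ranks 1, 3, 5, ... go to the front; ranks 2, 4, 6, ... go to the back.
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]


# Example: ["r1", "r2", "r3", "r4", "r5"] -> ["r1", "r3", "r5", "r4", "r2"]
```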

[22:52] The last phase of our experimentation was the integration of two new features: first of all, we increased the number of documents in the collection, and second, we moved from PaLM 1 to PaLM 2; currently we are also testing Gemini Pro in terms of performance. As you can see at the bottom of these slides there is a reduction in recall, and we must acknowledge some change in the metrics. This is explained by the introduction of new documents without fine-tuning the input preprocessing pipeline: it is very important to adjust the input pipeline to the documents you are going to use. We also have supplementary documents that contain definitions, which can be used during the QA evaluation as chunks for the questions, and this could be another reason for the recall reduction. On the other side, as you can see, by migrating from PaLM 1 to PaLM 2 we obtained an improvement in QA accuracy.
[24:16] Now the demo part. Having discussed the architecture, the experiments and many other things, let's briefly see how the application works; give me one second to load the videos. OK, this is the user interface. As you can see, the user can type a question into the text box; in this case we are asking the system to explain Generali's strategy in terms of environmental sustainability. The answer is composed of two parts: the first part is the actual answer to the user query, and the second part is the document that was used to provide the answer. That's because we strongly believe that the user should have the opportunity to understand which document was used to extract the information. In this other example there is a question related to AI ethics at Generali, and again the system generates the answer to the question and then the sources.
[26:01] OK, so this is the final architecture we built for our product, and as we have seen it is divided into two phases. In the ingestion phase we start from Cloud Storage, where we store all our documents, and then we have Vertex Pipelines. Vertex Pipelines are a tool inside the Vertex AI platform; they support the training of models and can also be used for data processing, so we use them for all the data ingestion, the chunking part, the embedding part and the creation of the BM25 index. At that point we have another two pipelines, at steps three and four: one runs the information retrieval evaluation, calculating the recall at 15 documents we have seen, and the other runs the semantic evaluation, where we compute the question-answering accuracy. At this point all the hyperparameters we have found, like the prompts and the temperature, are stored in an artifact registry. On the other side we have the inference phase, where the user interacts with a frontend service; we have a backend service that manages the whole prompt engineering phase, and this backend service reads the information and all the parameters it needs, like the temperature and the prompt, from the artifact registry. We also have a NoSQL database on top of Firestore, where we store the conversations between the user and the chatbot; we use this to collect data that we can use to fine-tune the large language model in the coming years.
[28:00] As for the take-homes of this project: we increased the accessibility of our documents for the whole company; we had the opportunity to experiment with all the Google foundation models, so with cutting-edge AI technologies; we can rely on Google infrastructure for the scalability of our experiments and the reliability of our product; and finally we shared a lot of knowledge with Ivan Nardini and the other Google Cloud engineers. On the other side, for the next steps, the idea is to try the newly published foundation models like Gemini Pro 1.5; to try the Vertex AI AutoSxS side-by-side pipelines, which are pipelines available in Vertex that let us compare two different models and see how each answers the same question, so we can choose the better model; and finally to try Vertex AI Vector Search, the vector database inside Vertex, because the vector database we are currently using, Qdrant, is an open-source tool, while Vertex Vector Search is available by default in Vertex. Finally, we can also work on LLMOps for RAG applications, for example deciding what to do when new documents are added to our database. I also want to thank the whole team that worked on this incredible project, and thank you both for being here with us.
[30:03] Cool, so thank you Ian, thank you Domenico, for this great overview. As we promised at the beginning, now it's time for the Q&A. We will start with some questions we collected and then go through the live questions we just received. Let's start with the first question, which touches on something you just mentioned: how do you build a consistent framework for a RAG-based system when you don't have the possibility to collect hundreds of Q&A pairs from your customers?

We partially answered this question previously. This was exactly our scenario, because we didn't have access to an internal knowledge base of Q&A, so we decided to create a synthetic dataset by providing chunks of paragraphs to an LLM and having it create question-and-answer pairs. In the machine learning field there are many ways to do this, and using an LLM is just one of them. We suggest having a framework with pipelines so that you can run this process iteratively. Take into account that synthetic data generation is not the same as the data you can obtain from your business units or your customers, so you always need to check the quality of your questions and answers in order to end up with something of good quality. Nothing else, I think.
[32:08] OK, that's a great answer, I think you covered it, so let's move on to the next question: what do you do if one of the documents gets updated? Do you rebuild the index?

I would say it depends. If you add new documents you can just run them through the text embedding model and update your index, and your chatbot will be able to answer the new questions. But if you add a lot of documents, maybe there are better hyperparameters, maybe there is a better prompt or a better chunk length, so my opinion is that if you add many new documents it is better to re-run the preprocessing pipeline and search for better parameters.

[33:19] OK, that makes sense. As you can see the questions are getting shorter and shorter, which is good, I think. So, what was your approach to chunking? Can you elaborate a little bit more?

Yes. As I said previously, the first thing we did was to split by paragraph, because we want a single chunk to be semantically distinct from the others; this way you don't have a chunk that spans two different paragraphs talking about different things. So we first split by paragraph and then sub-split using the LangChain recursive text splitter.

[34:12] OK, I hope that provides the answer they were looking for. I think this is the next one, let's see: where do you store the chunks of text in GCP, in Cloud Storage or in BigQuery?

As I said, we use Vertex Pipelines for our experiments, and Vertex Pipelines by default save all the artifacts on Cloud Storage, so I would say Cloud Storage.

[34:52] OK, you're passing the exam. So, this is the last pre-collected question: they're asking for best practices for chunking large tabular data, such as a complex spreadsheet with many sheets. I don't know if you had this kind of data, but maybe you faced this scenario in other use cases you are working on, so feel free to share some best practices here.

OK, maybe I can join this question with another one I have seen in the chat. Currently, to extract information from our documents we are using an external library called unstructured; you can also use CNN models to extract text from images, and for spreadsheets you can use libraries like openpyxl or pandas, or again unstructured. It's also important to take into account how you feed this kind of data to the LLM: if you are managing a spreadsheet, you need to create a structure in the prompt that the LLM can understand; maybe you can add some sentences or separators, and rearrange things in order to create something that is more semantically valid for the LLM. That's my opinion.
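One way to follow this advice with pandas is sketched below; the file path, sheet name and column handling are hypothetical.

```python
# Small sketch of turning spreadsheet rows into prompt-friendly sentences, as suggested
# above. The file name and column names are hypothetical.
import pandas as pd


def sheet_to_sentences(path: str, sheet_name: str) -> list[str]:
    df = pd.read_excel(path, sheet_name=sheet_name)  # uses openpyxl for .xlsx files
    sentences = []
    for _, row in df.iterrows():
        # One sentence per row, with explicit "column: value" pairs and separators.
        parts = [f"{col}: {row[col]}" for col in df.columns]
        sentences.append(f"Row from sheet '{sheet_name}': " + "; ".join(parts) + ".")
    return sentences
```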

[36:46] It makes total sense. I don't know if Domenico wants to add something here, or we can move on to one of the live questions we just received. OK, so one of the questions that I think is valuable: they're asking how large your validation set was, so how many Q&A pairs you have.

We have around 2,000 questions in the synthetic dataset we generated, which we tried to split into a validation and a test set in order to be consistent in the definition of the hyperparameters. We have seen that the parameters, the kind of RAG architecture you are going to use, the chunk lengths and so on depend on the questions your RAG system is going to receive, so it's very important to ask your business units or your customers what kind of questions they expect to ask, in order to build something that works well from the point of view of the prompt.
[38:25] OK, so now let me ask you a couple of questions related to chunks. One question is: how do you store the chunks with metadata other than the source file name? Domenico?

OK, yes, we store all this information in the vector database, because Qdrant has a feature that lets you store metadata and search through it, so it's very useful to keep it in the vector database. I've also seen another question asking how you serve the right documents to the right people: if you have metadata in your vector database, you can do the semantic search and then also filter on this metadata. So you can add a lot of metadata to your documents, and that can be useful for your RAG.
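With Qdrant, this metadata filtering can be sketched as follows; the payload field names are illustrative.

```python
# Sketch of filtering a vector search by chunk metadata stored in the Qdrant payload.
# The payload field name ("audience") is illustrative.
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue


def filtered_search(client: QdrantClient, collection: str, query_vector: list[float],
                    audience: str, k: int = 15):
    return client.search(
        collection_name=collection,
        query_vector=query_vector,
        query_filter=Filter(
            must=[FieldCondition(key="audience", match=MatchValue(value=audience))]
        ),
        limit=k,
    )
```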

[39:36] And let me ask the last question: how do you handle scenarios where a user asks a follow-up question that lacks context?

In this case you could use multiple strategies: you can create a summary of the previous conversation and then use it to feed the prompt that generates the new answer, or you can embed the entire previous conversation into the prompt, but this also depends on the input length the large language model can accept. Currently we are still working on it, so this feature is not integrated for us.

Domenico, do you have anything to add? That's exactly it, and if you want, in LangChain these two strategies are already implemented, so it's very easy to implement.
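The first strategy (summarizing the previous turns and feeding the summary back into the grounded prompt) can be sketched as follows; the summarization prompt is illustrative, and as noted the team has not integrated this yet. LangChain also ships ready-made memory classes for both strategies.

```python
# Sketch of handling follow-up questions by summarizing the previous turns with the LLM
# and prepending the summary to the grounded prompt. Reuses call_llm() and build_prompt()
# from the earlier sketches; the summarization prompt is illustrative.
SUMMARY_PROMPT = """Summarize the following conversation in a few sentences,
keeping names, acronyms and document references:

{history}"""


def answer_follow_up(question: str, history: list[tuple[str, str]],
                     retrieved_chunks: list[str]) -> str:
    transcript = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in history)
    summary = call_llm(SUMMARY_PROMPT.format(history=transcript))
    context = "Conversation so far: " + summary + "\n\n" + "\n\n".join(retrieved_chunks)
    return call_llm(build_prompt(question, [context]))
```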

[41:06] OK, so I think this is it. Before we conclude, let me just go back to the slides. One last reminder: don't forget that we are going to have several events in March; here you can see some of them, so feel free to join, you have the link to participate. For now, I hope you enjoyed the session, and thank you for participating. Thank you, bye-bye.
