AWS re:Invent 2023 - Use RAG to improve responses in generative AI applications (AIM336)
Summary
TLDR This presentation explains how to use Retrieval Augmented Generation (RAG) to improve responses in generative AI applications. It covers why customization matters, common customization approaches, and concrete RAG use cases, and it shows how Amazon Bedrock makes everything from the data ingestion workflow to the query workflow easy to implement. It also demonstrates how to build on Knowledge Bases for Amazon Bedrock using open source generative AI frameworks such as LangChain.
Takeaways
- 📚 Customizing foundation models lets you adapt them to domain-specific language and unique tasks and better integrate your company's external data.
- 🔍 Retrieval Augmented Generation (RAG) is a technique that leverages external knowledge sources to improve the quality and accuracy of responses.
- 🌐 Knowledge Bases for Amazon Bedrock makes building RAG applications simple and easy to manage.
- 📈 Optimizing the data ingestion workflow and converting text into numerical representations enables semantic search that preserves meaning.
- 🔧 The built-in Titan text embeddings model creates numerical representations of text so you can retrieve the most relevant documents.
- 🛠️ Knowledge Bases for Amazon Bedrock automates data ingestion, storage, retrieval, and generation for fully managed RAG.
- 📊 Combining knowledge bases with agents integrates static data with dynamic data and APIs, automating sophisticated tasks.
- 📈 RAG use cases include improving content quality, context-based chatbots, personalized search, and text summarization.
- 🔗 Vector database options such as OpenSearch Serverless, Redis, and Pinecone let you choose the best fit for your knowledge base needs.
- 📚 Open source generative AI frameworks such as LangChain integrate easily with Knowledge Bases for Amazon Bedrock.
- 💡 After data ingestion, running a sync automatically processes new files in S3 and keeps the knowledge base up to date.
Q & A
Why should you use retrieval augmented generation to improve responses in generative AI applications?
-Using retrieval augmented generation (RAG) improves response quality by adapting the model to domain-specific language and unique tasks, and by making it aware of external company data such as FAQs and policies.
What are the common approaches to customizing a foundation model?
-Common approaches to customizing a foundation model include prompt engineering, retrieval augmented generation, model fine-tuning, and training a model from scratch.
What is prompt engineering?
-Prompt engineering is the technique of designing and iterating on the user input (the prompt) passed to a foundation model in order to steer it toward the desired output.
What are the basic steps of retrieval augmented generation (RAG)?
-The basic steps of RAG are retrieving text from a corpus of documents and passing it to a foundation model to generate a response grounded in your company's data.
What is Knowledge Bases for Amazon Bedrock?
-Knowledge Bases for Amazon Bedrock is a fully managed service that automatically indexes your company's data sources so you can easily build retrieval augmented generation (RAG) applications.
What is the data ingestion workflow when using Knowledge Bases?
-The data ingestion workflow takes an external data source (for example, documents in S3), chunks it, passes it through an embeddings model (such as Titan text embeddings), and stores it in the vector database of your choice.
What role does the Titan text embeddings model play?
-The Titan text embeddings model converts text into numerical representations while preserving the relationships and meaning between words, optimizing semantic search and RAG use cases.
What is the retrieve and generate API?
-The retrieve and generate API is an Amazon Bedrock API that retrieves the documents relevant to a user's question, feeds them into the model's prompt, and generates the final response.
What are the benefits of combining Knowledge Bases with Agents for Amazon Bedrock?
-Combining Knowledge Bases with Agents for Amazon Bedrock integrates static document information with dynamic databases and APIs, enabling advanced applications that automate task planning and execution.
How do you build on Knowledge Bases using LangChain?
-To build on Knowledge Bases with LangChain, install the latest versions of Boto3 and LangChain, initialize the Amazon Knowledge Bases retriever using LangChain's wrapper, and generate responses to questions.
Why are embeddings important when building RAG applications?
-Embeddings preserve the relationships and meaning between words when converting text into numerical representations, which is critical for the accuracy and precision of semantic search and RAG.
Outlines
📝 Opening and introduction to RAG
The session opens with a discussion of improving responses with RAG (Retrieval Augmented Generation), touching on why customization matters and common approaches. It also introduces Knowledge Bases for Amazon Bedrock and the speakers, Ruhaab and Mani.
🛠 Overview of customization methods
Explains how to customize foundation models, walking in detail through the approaches from prompt engineering and RAG to model fine-tuning and training a model from scratch.
📚 How RAG works and its use cases
Explains the basic mechanics of RAG and its use cases. RAG leverages external knowledge sources to improve the quality and accuracy of responses. Use cases include improving content quality, answering from enterprise data, personalized search, and text summarization.
🔍 Data types and retrieval methods
Covers the types of data used with RAG and how to retrieve them, detailing three methods: rule-based retrieval, structured data, and semantic search. Also touches on why text must be converted into numerical representations and how to choose an embeddings model.
🌐 The importance of text embeddings
Explains why text embeddings matter: embeddings convert text into numerical representations while preserving the relationships between words, which lets the system understand meaning when generating responses. Also covers the Titan text embeddings model, what it is optimized for, and how to use it.
🔧 RAG data ingestion and internals
Explains the RAG data ingestion workflow and what happens under the hood: chunking the data, processing it with an embeddings model, storing it in a vector database, and the text generation workflow.
📋 Features of Knowledge Bases for Amazon Bedrock
Details the capabilities of Knowledge Bases for Amazon Bedrock: automated data ingestion, connecting foundation models and agents, data retrieval and prompt augmentation, and source attribution.
🔄 Working with databases and dynamic information
Explains how Knowledge Bases works together with Agents for Amazon Bedrock and how to fetch dynamic information. Agents handle task planning, execution, and API interaction, while Knowledge Bases supplements them with information from documents, letting applications work with real-time data, databases, and APIs.
🛠 Building RAG applications with LangChain
Shows how to build a RAG application with LangChain: checking Boto3 and LangChain versions, configuring clients, initializing the retriever object, and integrating with the QA chain. The code examples on GitHub can be used to build Q&A applications and context-based chatbots.
🎉 Recap and what's next
Recaps the talk: why customization matters, RAG use cases, the features of Knowledge Bases for Amazon Bedrock, and how Agents and Knowledge Bases work together. Also points to the notebooks and documentation published on GitHub and what's coming next.
Keywords
💡retrieval augmented generation (RAG)
💡foundation models
💡prompt engineering
💡embeddings
💡Knowledge Bases for Amazon Bedrock
💡Agents for Amazon Bedrock
💡LangChain
💡Titan text embeddings model
💡data ingestion workflows
💡fine-tuning
💡source attribution
Highlights
The talk focuses on using retrieval augmented generation (RAG) to improve responses in generative AI applications.
Foundation models have vast pre-trained knowledge but may not understand specific company domains.
Customizing foundation models can adapt them to domain-specific language, improve performance on unique tasks, and incorporate external company data.
Prompt engineering is a simple approach to customize models by crafting user inputs to guide model outputs.
Retrieval augmented generation leverages external knowledge sources to enhance the quality and accuracy of responses.
Model fine-tuning allows for adaptation of foundational models on specialized task-specific datasets.
Training a foundational model from scratch offers complete control over training data but requires extensive resources.
The decision on which customization methodology to use depends on the task and whether it requires context from external data.
Retrieval augmented generation is useful for improving content quality by reducing hallucinations and providing context-based responses.
RAG can be applied for personalized search by retrieving relevant content based on user profiles and preferences.
The Titan text embeddings model is optimized for text retrieval and RAG use cases and supports over 25 languages.
Knowledge Bases for Amazon Bedrock provides a fully managed RAG experience, abstracting away complex workflows.
Agents for Amazon Bedrock can interact with knowledge bases and APIs to orchestrate tasks and provide dynamic responses.
Open source frameworks like LangChain can be used in combination with Amazon Bedrock services for building RAG applications.
The process of data ingestion, embedding, and retrieval is automated and managed by Knowledge Bases for Amazon Bedrock.
Source attribution feature provides transparency by showing the data sources used to generate model responses.
The retrieve and generate API simplifies the process by handling query embedding, document retrieval, and response generation.
Transcripts
- Alright, I think it's time to get started.
So thank you everyone for attending our talk
on how to use retrieval augmented generation
to improve your responses in generative AI applications.
So hopefully that's what you're here for
and if not, then you're stuck with us.
So let's get started.
We've got an action-packed agenda today
where we're gonna cover a variety of topics
including customizing foundation models,
why you should think about customization
and common approaches for how you should customize.
Then we will go specifically
into retrieval augmented generation or RAG,
and we'll deep dive into how it works
and cover all the different components
from the data ingestion workflows to how embeddings work
and demystify a lot of the concepts
that you might have heard throughout this conference.
We'll introduce Knowledge Bases for Amazon Bedrock,
which I hope you heard in the keynotes from Adam and Swami.
And then we'll talk about
how we're making the building of RAG applications
really, really easy.
And then we'll cover lastly how these capabilities
of knowledge bases work
with other parts of the Bedrock ecosystem
such as Agents for Bedrock and also how you can leverage
open source generative AI frameworks
such as LangChain for building
retrieval augmented generation capabilities.
And I forgot to introduce myself.
And so I'm Ruhaab Markas.
I lead product management
for Knowledge Bases and Amazon Lex.
- And I'm Mani Khanuja,
technical lead for generative AI specialist
in the worldwide specialist organization.
And today I'll be co-presenting with Ruhaab
and taking you through this journey of how you can build
your retrieval augmented generative applications
using Knowledge Bases for Amazon Bedrock.
- So quick show of hands, who has heard
of Knowledge Bases for Bedrock,
either through the keynote or through our previews?
Okay, fantastic, quite a few of you have,
and it looks like a few of you haven't,
so we'll definitely have a variety of content
for everyone here today.
Alright, so first I want to talk about why
you should think about customizing a foundational model.
So foundational models
have a vast amount of pre-trained knowledge
directly encoded into the model, right?
That's, if you've heard the term GPT,
that's really what the P stands for, right?
That pre-trained knowledge.
But it's important to understand that in many ways
these models don't really know
a lot about your specific company, right?
And so the first reason
you might want to customize foundational model
is to adapt to domain specific language.
Let's say you're in healthcare
and you need the model to understand
all the medical devices that you sell, right?
That doesn't come out of the box in most instances.
Secondly, you might want these models
to really perform better at really unique tasks
suited for your company, right?
Let's say you're a financial services company
and you're looking to do more advanced accounting
or analysis on earnings reports,
and you wanna teach these models about tasks
and help really specialize these models
on a company specific data set or task.
And lastly, you may want to think about customization
when you want to improve the context and awareness
of these models with your external company data.
So how do you bring company repositories such as FAQs
or policies and other documents that exist in your company
and pass that as context into a foundation model.
So those are a few reasons why
you may want to think about customization.
And now we'll cover, you know, some common approaches
of how you can customize a foundation model.
And we'll talk about a few common approaches,
these aren't exhaustive and there are other approaches
and these will, you know, grow incrementally
in terms of the complexity, the cost
and time it takes to implement these changes.
So the most simple approach
for customizing foundational model is prompt engineering.
And a prompt is simply the user input
that you pass to a foundation model, right?
And these prompts can be crafted and iterated upon
to really steer the direction
and get the right output from the foundation model.
And there's a variety of different approaches
that you can leverage for prompt engineering.
Prompt priming, prompt weighting,
or even chaining different prompts.
So prompt priming is really the most basic form
of prompt engineering, which is just taking an input
or a form of instructions
and passing that to a foundation model.
Sometimes you can even pass specific examples
or tasks in the foundation model through the prompt,
and that's known as in context learning.
Another approach, as I mentioned, is prompt weighting,
which is giving more emphasis
on certain elements of the prompt
that you want the foundation model
to really focus on, right?
So if you tell the model, you know,
definitely don't respond
to something that you don't know about, right?
Capitalizing that and putting 5,000 exclamations,
like those things actually do bias and put emphasis
on certain parts of your instructions.
And lastly, there's prompt chaining,
which is taking more complex prompts
and breaking that down into more discrete parts
where outputs of a certain prompt
are then passed as an input into the next task.
So those are just a few examples of prompt engineering.
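To make the chaining idea concrete, here is a minimal Python sketch against the Bedrock InvokeModel API; the model ID, the prompt wording, and the report_text placeholder are illustrative assumptions, not something shown in the talk.

```python
# A minimal sketch of prompt chaining against the Bedrock InvokeModel API.
# The model ID, prompt wording, and report_text are illustrative assumptions.
import json

import boto3

bedrock = boto3.client("bedrock-runtime")

def invoke_claude(prompt: str) -> str:
    # Claude models on Bedrock (2023-era API) expect the Human/Assistant format.
    body = json.dumps({
        "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
        "max_tokens_to_sample": 500,
    })
    response = bedrock.invoke_model(
        modelId="anthropic.claude-instant-v1", body=body
    )
    return json.loads(response["body"].read())["completion"]

report_text = "..."  # placeholder for your earnings report text

# The output of the first prompt is chained as input into the second.
summary = invoke_claude(f"Summarize this earnings report:\n{report_text}")
risks = invoke_claude(f"List the top three risks mentioned in this summary:\n{summary}")
```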
Secondly, there is retrieval augmented generation,
which is all about leveraging external knowledge sources
to improve the quality and accuracy of responses.
And when I use the term external knowledge sources,
it's likely that these knowledge sources
are actually internal to your company,
but it's external in terms of the knowledge
of the pre-trained model, right?
You're really helping bring new knowledge
to the foundation models, hence the term external, right?
It's external to the pre-trained foundation models.
And we'll really deep dive
into retrieval augmented generation
throughout the presentation, but the basic steps being
you're retrieving some form of text
from a corpus of documents.
You're using that as context to a foundation model
to ultimately generate a response that's grounded
in the knowledge of your company's data, right?
Which is extremely powerful when you think about
using the advanced reasoning capabilities of foundation models,
but really steering that towards the knowledge
specifically from your company data.
So these two forms of customization
are really about augmenting a foundation model.
We're not actually going in
and changing anything in the foundational model itself,
but there are approaches that allow you to do that,
such as model fine-tuning,
and model fine-tuning, it allows you
to really adapt a foundational model
on a specialized task specific data set.
And this is a supervised approach,
meaning that you're training the foundation model
on labeled examples of tasks
and you've specified the expected output and outcome
through those examples, allowing you to really, you know,
train this model on a specialized task.
And through fine-tuning, you're actually updating
the weights of the model, right?
The parameters of the models are actually being adjusted
based on this customization.
And lastly, and arguably the most costly
and time intensive approach
is training the foundation model from scratch.
And it's an approach where you really want
to be in complete control of the data
that's used to train the model.
You may want to remove any inherited bias that might exist
from some of the other contexts
that the model is trained on,
and it's giving you complete control
and building a domain specific model.
But obviously this requires an extensive amount
of task specific data, a lot of compute resources
making it obviously one of the more, you know, complex
and time intensive and costly approaches.
And so while we talk about a few approaches
of customization, again, this is not exhaustive,
but some of the more common approaches
that you'll see in model customization,
and today we'll focus specifically on RAG.
So now that we know why you should customize
and common approaches on how to customize,
let's look at a mental model
for when you should use certain methodologies.
So it's easier to make a decision
of what approach to really take.
And it all starts by thinking about the task
that you want these foundation models to execute.
And does this task require context from external data
is kind of that first decision point.
And if the answer is yes,
you really then have to think about
is that data access needed in real time?
And if the data is relatively static
where it's not changing on a realtime basis,
such as frequently asked questions or policies,
that's a classic use case
for retrieval augmented generation.
But just because I use the term relatively static
doesn't mean that this data isn't changing, right?
And so I don't want that to be misleading
because you can, you know,
have data changing in this construct as well,
but it's not changing real time.
However, if the data is changing real time
and you also need the ability to connect to different tools,
meaning that I'm fetching or querying data from databases
or I'm interacting with APIs or applications and tools,
that's a use case for Agents for Amazon Bedrock.
And we'll also cover how agents and knowledge bases
can be brought together to really build, you know,
some really powerful capabilities,
so these aren't mutually exclusive by any means.
Next, if you have a use case
that's more, it's leveraging historical data, right?
A historical, like a snapshot of information
and it's a relatively simple task
that might already perform really well
from a pre-trained foundation model.
That's where prompt engineering can really help, right?
I'm passing some specific context or task or instructions,
right, as part of my prompt engineering,
and that in many times can be extremely effective.
And lastly, if I have historical data
that's maybe a bit more complex as I mentioned,
that you know, is task specific
and needs a little bit more task training,
that's where model fine-tuning, you know,
serves a really important purpose.
And you might have heard today in the keynote
that we've announced the ability for fine-tuning
for foundational models
for all the Bedrock foundation models
with the support of fine-tuning
for Anthropic Claude coming soon.
And so let's deep dive into RAG and really understand
what is retrieval augmented generation.
- So, so far what Ruhaab has covered
is why we need to customize
and provided us some really good prescriptive guidance
on when to customize and how you can work backwards
from your particular use case.
But now let's understand
what is retrieval augmented generation?
And as the name suggests, the first part is retrieval
where you have to retrieve the information,
relevant information and then augment it
with your original query, pass that to the foundation model
to generate the accurate response.
Now there's so many aspects to it.
Now just imagine if you have
this large amount of information, right?
And then you say, okay,
let me just add everything to the model, everything,
what will happen?
Multiple things might happen.
First of all, your input size that the model can take,
which we call this context length, might not be enough
and you might get errors.
Second, just imagine if somebody
throws a lot of information at us,
as human beings, we'll also be like, "Oh,
let me pick up the relevant one to answer the question."
How do I do that?
It takes us time, right?
The same goes for the model.
So what we need to do is provide relevant information
and that's where the retrieval part becomes super important.
So when we retrieve the relevant information
from our large knowledge data
and then provide that relevant context to the model, right?
So that relevant context,
we augment it with our original query
so that the model knows the question as well.
And then we feed that to the model,
and that helps the model to generate responses.
And the prompt engineering
also plays a very important role over here
because we might want to add more instructions
to the model based on our use case.
So let's take a look at the use cases
for retrieval augmented generation.
So the first use case that comes into mind
when we think about RAG
is really improving the content quality.
How are we improving this content quality using RAG?
It's by reducing hallucinations.
So for example, as Ruhaab mentioned, right?
These models, when we are talking about pre-trained models,
they're really big, they're trained
on a really big amount of data, but that data
was from some point in time, right?
That might not be very recent, so that's number one.
Oh, I don't have the recent data in the model,
and the model can act super intelligent
and provide you with an incorrect answer
if you ask for recent information
on which it was not trained.
So in order to improve the quality of the answers
or the responses and remove hallucinations,
that's where we can use
the retrieval augmented generation technique
to improve it, right?
Now we have covered that part,
but what if I want the model to answer
only based on my knowledge or on my enterprise data?
I don't want it to provide me answers
from its own knowledge.
I want to use the intelligence of this model
and channel it only and make it focus only towards my data.
That's where, you know, applications
such as context-based chatbots
and Q&A come into play, right?
So you can use RAG technique to build those applications.
The third one is personalized search.
Why do we want to limit this to question answering?
Why not apply this technique,
because we are anyway retrieving the relevant content,
to maybe augment our recommendation engine
and create that type of application
based on my profile as a persona,
the preferences that I might have,
or my previous history.
For example, if I'm on the retail side,
I bought certain products,
there is a history which is already there,
what if I want to use that along with my preferences
and show recommendations to my users?
So you can do that using the RAG technique as well.
And the last one is super close to me
just because it, the way it works.
So I wrote a book on applied machine learning
and high performance computing
which was published in December 2022,
and you know, at that time
generative AI was also getting popular,
so somebody posted a review using generative AI
trying to summarize the book,
which was approximately 400 pages.
Now just imagine if you do that,
that was a really cool thing,
I really liked it by the way,
but it was missing the key points.
So how about using RAG techniques
to do the text summarization as well?
Or maybe I just want a summary of a particular chapter
that I'm interested in, right?
And make sure that it has all the key points
so we can use RAG techniques to do that as well.
And then when we have talked about the use cases,
how do we use different types of data, right?
You might be dealing with different types of data sets
or different types of retrievals can also happen, right?
So what technique should I use?
The first one can be simply based on certain rules
or keywords or phrases,
and I fetch the documents, it works for me,
yes, let's use it, right?
So we have to work always backwards from our use case
or the data that we have on hand.
The second one is I might have a lot of structured data,
maybe imagine a use case,
and this is actually something that we have already built
with some of our customers.
Imagine a use case
where there is a natural language question,
but I have my data in a, let's say, analytical database
or a data warehouse or transaction data,
it can be anything, right?
And then based on that natural language question
we use the foundation model to create a query,
generate a query, and that query runs on your database,
gives the results back,
and then we use the foundation model
to synthesize those results to provide you the answer
of your original query, right?
So you as a user get a full experience
that I asked a question, I got the result,
but behind the scene,
you know, so many things were going on.
So that can be one approach.
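As a rough sketch of that pattern, assuming you reuse the invoke_claude helper from the earlier sketch and that run_query is a hypothetical stand-in for your database client:

```python
# A sketch of the natural-language-to-SQL pattern just described.
# invoke_claude() is the helper from the prompt engineering sketch above;
# run_query() is a hypothetical stand-in for your database client.
question = "What were our top five products by revenue last quarter?"

# 1. The foundation model generates a query for a schema we describe to it.
sql = invoke_claude(
    "Given a table sales(product, revenue, quarter), write one SQL query "
    f"that answers: {question}"
)

# 2. Run the generated query against the database (hypothetical helper).
rows = run_query(sql)

# 3. The foundation model synthesizes the raw rows into a natural answer.
answer = invoke_claude(
    f"Question: {question}\nQuery results: {rows}\nAnswer the question concisely."
)
```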
The third one is semantic search.
And this I would like to explain with an example
because it really takes me back to high school,
or even before that actually elementary school.
So when I was in school,
there used to be a reading comprehension
where I was given this passage
and then there were certain questions
that we had to answer based on the passage, right?
So as a kid I was like, oh, I'm smart,
I'm not going to read the whole passage,
I'm going to use the keywords in the question.
I look up those keywords in the passage,
they're just two, three paragraphs, four paragraphs
and I'll be fine.
So I used to get 10 on 10 every time
and up until elementary school,
some part of the middle school,
and by the time I reached high school
that 10 on 10 actually reduced to 3 on 10, 4 on 10,
based on how lucky I was.
The reason was as I was growing, these passages
that were provided to us were becoming complicated.
The questions that were asked were tricky,
they were not anymore based on the keyword.
I literally had to understand the question,
I had to understand what the author is trying to say
at a high level before I can even attempt an answer, right?
And that's where the semantic search
for machines come into play.
So understanding the meaning of the text
and then providing you the answer, right?
So that's the third kind of retrieval
and we'll be mostly focusing
on this third kind of retrieval today.
So in order to do semantic search
on the meaning of the text, it sounds lovely, right?
But it's not me doing it,
it's the machine that's doing it, right?
So then what do I do?
What do we really need to do
in order to enable the machine to do it?
So what we really need to do over here
is convert our text into numerical representations.
Now why do we need to convert the text
into numerical representations?
Because we want to find the similar documents
or the text based on the question that is coming in,
and I'll double click on the numerical part in a moment,
but we have to convert the numerical representations
in such a manner that it's able to retain
the relationship between the words, right?
If it's unable to retain the relationship between the words,
then it won't be meaningful to me, or the machine, right?
Then the purpose is not solved.
So the selection of the embeddings model,
because you're not going to do it yourself, right?
You will use an embeddings model: feed in the text,
and it will convert it into numerical representations
that maintain the meaning and the features
and the relationships between these words.
So that's how, you know, if you have to do semantic search,
you need an embeddings model, you need to convert
into numerical representations, your query will come in
and then it will fetch
the relevant results based on that.
So how is it helping me? Briefly,
if we have to summarize, it is helping me
to fetch the results based on the meaning,
and it is helping me because I'm getting accurate context.
If I have accurate context
and I'm feeding accurate context to the model,
I'm getting accurate results, right?
So look at how we are connecting
each and every dot over here, right?
So first you have your data,
then you have to split the data into chunks
so that you can create embeddings,
and the quality of the model
which will create the embeddings
will influence the retrieval,
and the retrieval will influence your response, right?
So that's the reason why embeddings are important.
So which model to select.
And for that I'll hand it over to Ruhaab.
- Thanks Mani,
so embeddings might seem like a complicated process
and you know, the thought
of actually building an embeddings layer
seems a little daunting,
which is exactly why we launched
the general availability of the Titan text embeddings model.
And we actually launched this in September
and the Titan embeddings model
is optimized for text retrieval use cases and RAG use cases
and is available in over 25 different languages
such as English, Spanish, and Chinese.
And because Titan text embeddings is accessible
via Amazon Bedrock serverless experience,
you can access this model with a single API
without having to manage any infrastructure, right?
It's so easy, right?
You think about just pointing towards a model,
passing it the context and getting the embeddings built,
right, it's so incredibly easy to use through a single API.
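A minimal sketch of what that single API call looks like with Boto3; the model ID is the v1 Titan text embeddings identifier:

```python
# A minimal sketch of calling Titan text embeddings through that single API.
import json

import boto3

bedrock = boto3.client("bedrock-runtime")

def embed(text: str) -> list[float]:
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
    )
    # The response body carries the embedding as a list of floats.
    return json.loads(response["body"].read())["embedding"]

vector = embed("What is retrieval augmented generation?")
print(len(vector))  # Titan v1 embeddings are 1536-dimensional
```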
Okay, so now that we know what embeddings are
and have some foundational knowledge of RAG,
let's really understand
what's happening underneath the hood, right?
Like what enables this from a technological perspective.
And before you can actually ask questions about, you know,
your data, the data actually has to be optimized
for a RAG use case.
And this is the data ingestion layer
and there's a workflow corresponding to that layer
and we'll go right to left in this workflow.
So it starts with your external data sources, right?
Your company data.
This could live in S3
and it could be in different file formats
or it could even be PDF documents and unstructured data.
We take this data
and then go through a process called chunking.
And chunking really is just the splitting of that data
into different segments, which is really useful
for optimizing for things like relevancy.
And then these chunks
are then passed into an embeddings model,
such as Titan text, and then ultimately stored
in a purpose-built vector database,
which is really optimized
for indexing and retrieval of embeddings
and can maintain the relationship and semantic meaning
that you get through an embeddings model
and once you go through this data ingestion workflow,
you're now ready to ask questions
and really see the true power of RAG.
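For illustration, a rough sketch of that ingestion flow, assuming the embed helper from the earlier sketch; note that real pipelines, including Knowledge Bases, chunk by tokens rather than words:

```python
# A rough sketch of fixed-size chunking with overlap. Words keep this
# example self-contained; real pipelines count tokens. embed() is the
# helper sketched earlier.
def chunk(text: str, size: int = 300, overlap: float = 0.2) -> list[str]:
    words = text.split()
    step = max(1, int(size * (1 - overlap)))  # 300 words, 20% overlap -> step 240
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

document_text = "..."  # placeholder for a document pulled from S3
chunks = chunk(document_text)
vectors = [embed(c) for c in chunks]
# Each (chunk, vector) pair would then be indexed in a purpose-built vector DB.
```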
This brings us to the text generation workflow
and it starts with a user asking a question.
So that question or query then also goes through
that same embeddings model
to turn that question into a vector.
And that vector then is searched
in that same vector data source
that allows us to do things
like vector similarity search, right?
So where you don't have to ask questions
in that same rigid keyword context
where we can actually extract meaning
and look at similar aspects of that question
and how it might relate to documents.
And that's the real power of semantic search, right?
Is really looking at that relationships
and understanding meaning more deeply.
So once we get that search result,
that's the retrieval part, right?
We're retrieving that data from the vector database
and then we're passing that context
into the prompt for a foundational model.
So we're augmenting the prompt with these return passages
and that's the augmentation part, right?
That A, we're augmenting the prompt
and then ultimately this large language model,
the foundation model
is generating that final response, right?
And that's the G part.
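Putting the three parts together, here is a conceptual sketch of that workflow; vector_db.similarity_search is a hypothetical stand-in for your vector store's query API, and embed and invoke_claude come from the earlier sketches:

```python
# A conceptual sketch of the retrieve-augment-generate flow just described.
question = "If I work remotely, which state do I owe taxes?"

# Retrieval: embed the question and search the same vector store.
query_vector = embed(question)
passages = vector_db.similarity_search(query_vector, k=5)  # hypothetical API

# Augmentation: pass the retrieved passages as context in the prompt.
context = "\n\n".join(passages)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Generation: the foundation model produces the grounded final response.
answer = invoke_claude(prompt)
```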
And this workflow as you might imagine,
it can be fairly cumbersome, right?
And there's so much inherent complexity
associated with building a, you know,
a complete RAG application.
You have to manage multiple data sources,
you have to think about which vector database to use,
how do I make incremental updates to that vector database?
And it requires, actually, a lot
of different disciplines, right?
You need help from data scientists
and data engineers and infrastructure engineers.
You need folks who know about scaling and DevOps,
and a lot of this can seem daunting;
open source frameworks such as LangChain
have made this a little bit easier,
but it still requires a considerable amount
of development and coding.
And so how might we completely abstract away
all this complexity?
- And that's where we have
Knowledge Bases for Amazon Bedrock,
where we want to implement RAG or build applications
based on this RAG architecture that we just saw,
but in a very fully managed way,
so that you can focus on solving the business use cases
or the problems or you know, working on it.
And we take away all the undifferentiated
heavy lifting from you.
So how is Knowledge Bases for Amazon Bedrock
going to help you?
So first of all, it provides you
with the data ingestion part that we just saw, right?
So it will automate a lot of those things
and we'll see that in a moment.
The second part is
it will securely connect the foundation models
and even Agents for Bedrock with these knowledge bases
or your data sources, right?
The third is retrieval, right?
How we can easily retrieve the relevant data
and then augment our prompts.
So it will help you do that.
So we have features and we did recent announcements
and then we'll be doing a deeper dive on those.
And the last one is source attribution.
I don't trust anyone to be honest.
I'm just kidding, I trust a lot of people. (laughs)
But when it comes to machines, we need proof
and that's where source attribution comes into play.
It's how I know that my foundation model
is giving me the right response,
because the response is based on these data sources
that I was providing, right?
So let's take a look.
Let's dive deep into the data ingestion workflow first,
because if you don't have the data in the vector DB,
you cannot really do the retrieval
augmentation and generation.
So the first part is data ingestion workflow
that we just saw, right?
In this case we are moving from left to right.
So you have new data and then the data sources, chunking,
embedding models, storing into vector DB, right?
Imagine you have to implement
each of these things on your own.
First of all, you would need resources
who can code really well.
Second, once you've written the code,
you then have to do the maintenance of the code.
You might want to use open source frameworks,
which is great, but sometimes then you have to think about
the versioning piece of it, right?
So there's a lot that goes into it.
And then you also have to learn specific APIs
for the vector store that you're using.
What if we change everything
by providing you options and reducing it
to a few simple choices?
What if we say choose your data source
and in this case we support Amazon S3 as a data source.
So you select your bucket
in which you have your documents, right?
And we provide support for incremental updates.
As in, when your new documents are coming in,
all you have to do is start the ingestion job, sync, right?
And then multiple data formats.
You don't have to really, you know,
worry about the different data formats
because with Knowledge Bases for Amazon Bedrock
we provide support for PDFs, comma separated files, CSV,
Excel documents, Word documents, HTML files, Markdown files,
and I think that was pretty much it
and the text files as well, right?
And the list may grow as we move along.
So we have support for a lot of these file formats.
So you can literally have your data and then upload it on S3
and add it as a data source.
Then we provide you an option where you can do chunking,
like splitting your documents.
You might say, you know what,
I don't want to choose anything
because I might not be aware of those things.
That's fine too, so we have default chunking option,
which is defaulted to 300 tokens with 20% overlap.
So you don't have to choose if you don't want to, right?
But if you want to
and if you want to have a particular fixed chunks
that you are interested in, you can provide those as well.
So the second option that we have is the fixed chunking.
You provide the number of tokens
for each chunk of text that you want and an overlap,
we recommend having it between 10 to 20%.
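If you script this instead of using the console, the chunking choice maps to a configuration block when creating the data source; a hedged sketch with placeholder IDs and ARNs, since the exact shape may evolve with the service:

```python
# A sketch of setting the fixed-size chunking strategy via the SDK when
# creating the data source; the knowledge base ID and bucket ARN are
# placeholders.
import boto3

bedrock_agent = boto3.client("bedrock-agent")  # control-plane client

bedrock_agent.create_data_source(
    knowledgeBaseId="YOUR_KB_ID",
    name="tax-docs",
    dataSourceConfiguration={
        "type": "S3",
        "s3Configuration": {"bucketArn": "arn:aws:s3:::your-docs-bucket"},
    },
    vectorIngestionConfiguration={
        "chunkingConfiguration": {
            "chunkingStrategy": "FIXED_SIZE",
            "fixedSizeChunkingConfiguration": {
                "maxTokens": 512,         # tokens per chunk
                "overlapPercentage": 20,  # the recommended 10-20% range
            },
        }
    },
)
```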
And then choose your embedding model,
so right now we support Amazon Titan embeddings model
and Ruhaab has already covered that,
so I will not repeat that and it's important.
Just one thing that I want to double click over here
is when we say it supports 25 languages,
it's a very important aspect
because remember when I was talking about the embeddings,
these embeddings, when we say numerical representations,
they are maintaining the relationship.
If the model doesn't understand the language,
it won't be able to maintain the relationship
between the words, right?
So it is important
that if your text is in a different language
that your model should know about this,
your embeddings model should know about it
and should be able to maintain that relationship.
The next part is the vector store.
So we are providing you options over here whether you want
to use OpenSearch Serverless, Redis or Pinecone, right?
So we have options over here
and all of this, you would do the choices,
you click the create knowledge base button
or if you're using SDK, that's a create knowledge base API
and everything is taken care for you.
It's automated and fully managed data ingestion
using Knowledge Bases for Amazon Bedrock.
So now we have our data ingested and ready to use.
So the next step is: what does my architecture look like now?
It looks like something like this, right?
We have knowledge bases now, the data is ready
but we still have to query: we will provide the query,
create the embedding, retrieve the context,
augment the prompt, provide the foundation model,
still do the prompt engineering
and then get the response, right?
So we still have to do a lot of work.
What if we eliminate that
and take away some of that heavy lifting as well?
So with that, we recently announced
two more features, or the APIs.
One is retrieve and generate,
which will literally retrieve the relevant documents,
feed them to the model and give you the response.
The second one is retrieve API
if you need more customization options.
So let's take a look.
This is how your whole architecture will look.
The user is asking a question
you call retrieve and generate API and you get the response
and this retrieve and generate API does the work for you.
It'll take your query, create the embeddings
with the embeddings model,
it will then augment it to your prompt
and then it will feed it to the model that you select.
Currently we support two models,
Claude Instant and the Claude version 2 by Anthropic.
So we support these two models that you can select
and get the generated response, right?
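A sketch of what that call looks like with Boto3; the knowledge base ID and model ARN are placeholders:

```python
# A sketch of the retrieve and generate API via the bedrock-agent-runtime
# client; the knowledge base ID and model ARN are placeholders.
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")

response = agent_runtime.retrieve_and_generate(
    input={"text": "If I work remotely, which state do I owe taxes?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "YOUR_KB_ID",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-instant-v1",
        },
    },
)
print(response["output"]["text"])
# response["citations"] carries the source attribution shown in the console.
```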
Pretty cool, and then if you say "This is good,
but I need more control," right?
"I don't want to do the heavy lifting
but I still want control, I still want to customize a bit."
That's where we have our second API,
which is the retrieve API where we enable you
and provide you the flexibility as well.
Over here we are still helping you
where you have your query, and the retrieve API
will automatically use the embeddings model,
create the embedding for your query
and provide you the relevant data
or the relevant documents, right?
Now what you have to do once you get the relevant documents
is do the prompt augmentation, you have flexibility,
what instructions you want to provide to your prompt
based on the model
and literally use any model provided by Amazon Bedrock
or maybe, you know, you might have a custom model
or fine-tuned model that you were working with
in the Bedrock system that you want to use
with the retrieve API, you can do that, right?
So we have options
and we still want you to take full control
of your application, your decision points,
which really impact the answers that you are getting, right,
from your application.
So these are very important concepts.
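A sketch of the retrieve API flow, reusing the agent_runtime client and the invoke_claude helper from the earlier sketches, with a placeholder knowledge base ID:

```python
# A sketch of the retrieve API: fetch the relevant chunks yourself, then
# augment your own prompt and call any Bedrock model.
response = agent_runtime.retrieve(
    knowledgeBaseId="YOUR_KB_ID",
    retrievalQuery={"text": "If I work remotely, which state do I owe taxes?"},
    retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 5}},
)
# Each result carries the chunk text, its S3 location, and a relevance score.
context = "\n\n".join(
    r["content"]["text"] for r in response["retrievalResults"]
)
answer = invoke_claude(
    f"Use only this context:\n{context}\n\n"
    "Question: If I work remotely, which state do I owe taxes?"
)
```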
Enough of talk, right?
Let's see something in action.
Let's see how it looks like in the console.
So, the demo part now.
I'll share my screen, and now you can see my screen.
So where I am is basically I'm on the console
and I've searched Amazon Bedrock and then this is Bedrock.
And then I have to go to knowledge bases,
which is literally under the orchestration,
so we click over there,
and then it talks about
what all you can do from the console.
You can create a knowledge base, test the knowledge base
and then maybe use it, right?
So we'll go through that.
Because you know we have limited time,
I've already created a knowledge base,
but I'll still walk you through
if you have to create what you will need to do.
So the first part when you see over here
is you have to create a knowledge base.
So you click on the create knowledge base button.
And by the way, whatever I'm showing you over here,
you can do it via SDK as well.
And then you provide a name.
I would would suggest that you provide
a very meaningful name over here
because you might end up having a lot of knowledge bases
and you don't want any confusion,
also add a meaningful description.
And then you need permissions in the role, right?
Because when we were talking about knowledge bases,
knowledge bases will be accessing your data in S3
and then we'll be storing it, creating the embeddings,
so they need access to the embeddings model as well.
And also they will be storing the embeddings
into the vector DB.
So they need access to those as well.
So make sure that your Amazon Bedrock execution role
for knowledge base has all those permissions.
And if you're unsure how to do that,
simply select create and use a new service role option
so that it's automatically created for you.
And then we go next, data source.
So provide a meaningful data source name,
provide an S3 location.
I'm just going to type,
this is not an existing S3 bucket to be honest,
I just provided the name for the demo purpose.
And then additional settings,
this is where you get to select your chunking strategy.
So you can select three options as I mentioned earlier,
default, fixed size, no chunking, right?
So you have options over here as well.
Let's do fixed size and then I can select,
maybe I want to do 512.
And typically your overlap should be around 10 to 20%,
that's what our recommendation is.
So since right now we only support
the Titan embeddings model, that's there.
And then if you say that, you know what,
I don't want to create a vector DB,
I want you to create a vector DB on my behalf
because we attended that talk
and it said fully managed RAG, right?
So that's where we have that option
that you can select this quick create
which will automatically create a vector DB
and it will create an OpenSearch serverless.
So you can choose that.
But you know what, again, we have to give you options.
What if you have an existing vector database
or an index that you want to populate.
So you can literally, if you have an index
in OpenSearch Serverless, Pinecone
or Redis Enterprise Cloud,
you literally provide the details about those
and then go next, that's it.
- [Ruhaab] And you might have heard
in the announcement today that we will be supporting
new vector database types soon,
including Aurora and MongoDB
with likely more vector database options coming.
- Yeah, so stay tuned.
Okay, and then you review your setup
and click on create knowledge base, right?
So because we want to be cognizant of the time,
we already have a knowledge base.
Now this knowledge base is based on the text documents.
So when you have created the knowledge base,
you will actually land up over here.
You will not land there;
it's only when you go back to the knowledge bases
that you can see the list.
So once you create, you will be here.
And then most important point
is once you create a knowledge base,
you have to click on the sync button,
this is very important.
Because when we say we created a knowledge base,
that was good, but we have to sync.
Sync is the actual thing; when you do that,
it will look up
all your data that you have in S3,
it will pre-process those documents,
extract the text out of it,
split it into chunking strategy that you provided
and then pass it through the embeddings model,
then store it in the vector DB.
So that's the sync thing.
And when you have, let's say new files in your S3,
you press the sync button again
or call the start ingestion job API via the SDK, right?
So it will literally make sure that everything is in sync.
So you need to do that, I've already done that.
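For reference, the sync button corresponds to the start ingestion job API on the control-plane client; a minimal sketch with placeholder IDs:

```python
# The sync button maps to StartIngestionJob on the bedrock-agent
# control-plane client; both IDs here are placeholders.
import boto3

bedrock_agent = boto3.client("bedrock-agent")

job = bedrock_agent.start_ingestion_job(
    knowledgeBaseId="YOUR_KB_ID",
    dataSourceId="YOUR_DATA_SOURCE_ID",
)
print(job["ingestionJob"]["status"])  # poll get_ingestion_job until complete
```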
And then you want to test it, right?
So if you have generate responses on,
meaning we are using the retrieve and generate API,
or when you untoggle this,
then it's only retrieval.
So let's start with generate responses.
I first need to select my model
and then I can select either the Claude Instant model or Claude v2.
We have 2.1 as well, which was also recently announced.
Okay, hold on.
Yeah, it was too much zoomed out. (laughs)
And then you select the model
and then you can ask the question.
Now since my documents are based on tax data,
my knowledge base has all the tax related data set,
then I can ask a question.
So what I'm asking is "If I work remotely,
which state do I owe taxes," right?
I mean I just selected that because a lot of us, you know
during the pandemic we were working from home.
So I was like why not ask something like that?
And I know a lot of us are now back to office,
which is also cool.
Okay, now you click on the show result details
and notice some important things.
First of all, it is giving me the response very quickly.
Second, I can literally see the source attribution
right on my screen, right away.
So important points.
If you work remotely but your employer
is located in a particular state,
you may owe income taxes to that state
and I'm not going to read the entire thing.
And then if you have to look up the source
that the model used, it's basically right over here, right?
So it provides you
and if there are multiple sources, you will see
multiple tabs over here and I'll show you in a moment.
You can literally go to the location
of that document as well.
So this was about how we were doing retrieve and generate,
what if I just want to retrieve, right?
Let me ask the same question
because it will just make it easier for us to go through.
By default it'll retrieve top five documents,
top five most relevant documents.
And then I can go show details and I'll look at it.
So I'm seeing this particular chunk from this p17.pdf
and then another chunk from another PDF
and another chunk from another PDF,
and this retrieve API also gives you a score.
So based on which vector DB you are using
and which metric you are using for that vector DB,
for example, if you're using cosine similarity,
the score will be based on that.
If you're using Euclidean distance,
it will be based on that, right?
So the score option is also there.
So this was about how you can, you know,
use it on the console.
And we also have another demo where I will show you the APIs
and how we can do the integration with LangChain.
But the important point is if I have to build
retrieval augmented generation applications
with Knowledge Bases for Amazon Bedrock and use those APIs,
you can literally do that end-to-end
using the features that we just talked about.
But what if I have some dynamic information
that I need to fetch
in addition to what I have in my knowledge bases?
Maybe I have a knowledge base
which has a lot of order details,
but I also want to call some order API,
which gives me the status of my existing order,
which is in transit, right?
Or do multiple things around that, right?
So what do I need to do
if I want to integrate the knowledge bases
with let's say agents or other ecosystem of Amazon Bedrock?
So Ruhaab, over to you, let's please walk us through it.
- Great, if we could just go back to the slides please.
So as Mani mentioned, you saw how easy it was
to get started in terms of uploading documents
into a vector database and begin
immediately interacting with them through questions, right?
Just in the matter of a few steps,
you had a fully functioning RAG application
and if you recall earlier, the information
that we're storing in that document is relatively static
and even though it's syncing maybe behind the scenes
in some cadence, there's going to be times
where you need an application to interact in real time
with databases or even other tools and systems.
And this is what Agents for Amazon Bedrock
is really built to do
and a knowledge base can work directly with an agent
to enable that use case.
And so if you think about it, the real power of agents
is they're very specific, you know, models
that are used for planning and executing tasks,
leveraging reasoning capabilities
such as chain of thought processing
and these state-of-the-art approaches
are great when you want your application
to interact with an API,
automatically generate the dialogue with the user
to collect the information needed
to execute that API action.
You're not having to define the conversational flow,
the parameters of the API can be automatically collected
with this model asking for those bits of information
to fulfill the required arguments
of calling an API or interacting with the tool
and orchestrating those actions.
And agents, as I mentioned,
can also be combined with knowledge bases.
And you'd want to do this
when you're looking to combine actions
and those information retrieval type use cases
where you're just simply fetching context from a document
and using that as supplemental information
when interacting with the tools.
And lastly with agents and knowledge base,
again, all of this, all of these workflows
are completely abstracted away as a fully managed service
built directly in Amazon Bedrock.
So let's take an example of how this might work
when Knowledge Base and Agents need to work together.
So as I mentioned, Agents can orchestrate
the user requested task, so let's take an example
where you are asking this application to send a reminder
to all your policy holders with pending documents, okay?
And what happens is that this model
can really plan for that execution of that task
by breaking this down into smaller subtasks
planning things such as, you know,
getting the claims from a certain time period.
It may have to identify what is the paperwork even required
in this process, which might be, as I mentioned,
the knowledge that's in your knowledge base.
And then ultimately sending that reminder
so the agent can determine the right sequence of steps,
facilitate the dialogue, collect the information,
and even handle error scenarios along the way.
So an incredibly powerful ability to orchestrate
across a dynamic sequence of actions across, you know,
knowledge bases and APIs and tools
and really offer a seamless experience.
So something you know, as simple as asking for a machine
to send a reminder to all policy holders
with pending decisions, you can see the complexity
that's abstracted away to, you know,
really make that use case possible.
And now we'd like to show you
how you can also use open source generative AI frameworks
like LangChain to build knowledge bases.
And I'll have Mani walk us through another demo.
- Yes, and for that I'll be sharing my screen, so okay.
In this particular demo I'll be using the APIs
because a lot of us here might love the APIs
and the SDK experience
in addition to the console experience as well.
And also LangChain provides you
with a lot of wrappers which are pre-built.
And why do we need to reinvent the wheel
when we have something out there and we want to reuse it
but we want to reuse it with the latest features
that we just showed you, right?
So let me take you to this quick journey.
So first of all, make sure that you have
the latest Boto3 versions
and you have the latest LangChain version.
So it has to be equal to or greater than the versions
that I'm showing over here.
For LangChain it's 0.0.342
for Boto3, it's 1.33.2, right?
So make sure you have equal to
or greater than these versions.
Now the first thing that you need to do
is basically provide setup, right?
So as with any AWS service,
when you want to use the APIs
you first have to create the client.
So for Bedrock, in this particular case,
we need two clients, one is the Bedrock runtime client,
which helps us call the invoke model.
The second is Bedrock agent runtime using which we will call
the retrieve API for the Knowledge Bases for Amazon Bedrock.
So this is what we are doing over here,
providing some model parameters
because remember this is retrieve API
and you can connect to any model provided by Amazon Bedrock.
So that's what we are doing.
You provide the model parameters,
you select your model.
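A sketch of that setup, with illustrative model parameters:

```python
# The setup just described: pin minimum versions, create both clients,
# and declare the generation model. Parameter values are illustrative.
#   pip install "boto3>=1.33.2" "langchain>=0.0.342"
import boto3
from langchain.llms import Bedrock

bedrock_runtime = boto3.client("bedrock-runtime")      # for InvokeModel
agent_runtime = boto3.client("bedrock-agent-runtime")  # for the retrieve API

llm = Bedrock(
    model_id="anthropic.claude-instant-v1",
    client=bedrock_runtime,
    model_kwargs={"max_tokens_to_sample": 500, "temperature": 0},
)
```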
Now the actual retrieval.
So if you are planning to use the retrieve API
with LangChain, you will need to first
initialize a retriever object.
So you have to import Amazon Knowledge Base retriever
from LangChain and then use it.
So what do I need to pass?
I need to pass the number of results,
the relevant documents that I want, right?
So that's what I'm providing it over here
and the knowledge base ID
because how will this know which knowledge base
to get information from, super important, right?
And let me show you how you will get the knowledge base ID,
because if you are using SDK,
then you will automatically get it
as a response from the API and you can leverage it.
If you're using it from the console,
then you click on the knowledge base
and that's where you get the knowledge base ID.
So you can literally copy it
and I'm using the same knowledge base over here as well.
So a quick thing, I just wanted to point that out, right?
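A sketch of initializing that retriever, with a placeholder knowledge base ID:

```python
# Initializing the LangChain wrapper around the retrieve API; the knowledge
# base ID is a placeholder you copy from the console or the
# create knowledge base API response.
from langchain.retrievers import AmazonKnowledgeBasesRetriever

retriever = AmazonKnowledgeBasesRetriever(
    knowledge_base_id="YOUR_KB_ID",
    retrieval_config={"vectorSearchConfiguration": {"numberOfResults": 4}},
    client=agent_runtime,  # the bedrock-agent-runtime client from above
)

docs = retriever.get_relevant_documents(
    "If I work remotely, which state do I owe taxes?"
)
```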
So now you have the relevant documents.
Now what we are going to do
is build a Q&A application, and LangChain
provides you with a retrieval QA chain,
and all I need to do is,
I've already declared the large language model,
I've already declared my retriever,
let's use the retrieval QA chain, pass everything together
and then keep asking questions, right?
Let's move to that part now.
And this is just showing that you know,
if you just want to retrieve the documents,
get relevant documents, you can do that.
But if you're integrating it with retrieval QA chain,
you don't need to do that to be honest,
all you need is this retriever object.
So let's take a look how we integrate.
Okay, so that's where we have the retrieval QA chain.
Now I provide my language model
which will give me the response,
then I provide my retriever object,
which will give the relevant documents
and I also provide the prompt.
So now I have flexibility that I can provide my own prompt,
I can provide my own instructions,
and this retrieval QA chain will automatically augment
the relevant documents with my prompt.
And just so that you are aware,
I just wanted to show you the prompt template as well.
So you can provide specific instructions
and model specific prompting.
So it's very important; you can literally tell
the model to only provide information
based on the documents, right?
Based on these relevant documents.
So based on your use case, you can provide instructions.
And then once you have integrated it
with your retrieval QA chain,
you literally provide the query
to this QA object that you have created
and it will keep giving you answers.
So you don't have to initialize it over and over again.
You can literally ask multiple queries, get the answers,
multiple queries, get the answers,
and now you have a running Q&A application
with just three things, initializing your model,
initializing the Amazon Knowledge Bases retriever
with LangChain, and then the retrieval QA chain,
passing everything together
and we have the application ready, right?
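A sketch of that final wiring; the prompt template wording is an illustrative assumption:

```python
# Wiring the three pieces together with LangChain's RetrievalQA chain.
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=(
        "\n\nHuman: Answer using only the following context:\n{context}\n\n"
        "Question: {question}\n\nAssistant:"
    ),
)

qa = RetrievalQA.from_chain_type(
    llm=llm,                 # the Bedrock LLM from the setup sketch
    retriever=retriever,     # the Knowledge Bases retriever from above
    chain_type="stuff",      # stuff the retrieved chunks into one prompt
    chain_type_kwargs={"prompt": prompt},
)

print(qa.run("If I work remotely, which state do I owe taxes?"))
print(qa.run("Can I deduct home office expenses?"))  # reuse without re-init
```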
So, and you can use the same pattern
if you want to build a context-based chatbot
with the conversational chains
that LangChain provides, right?
So do explore and if you are interested
in looking through the same code, we have it on GitHub,
we'll share the resources with you.
So Ruhaab, can you just do a recap for us?
- Yeah, absolutely, thanks Mani,
and if you go back to the slides
and you know, it's incredible how quick it is
to get started with this, both using LangChain,
but also if you prefer using the Bedrock console,
you saw that there's flexibility of choice
and really getting to the same output.
And so yeah, if we could go back to the slides, please,
if we just quickly recap kinda what we all covered today.
It seems like a lot and it was,
so thank you for attentively listening.
We first covered, you know, why customization is important,
the different approaches for customization,
both between augmentation and then other approaches
that are actually changing the parameters
and weights of the model.
We talked a little bit about retrieval augmented generation,
what the specific use cases are for RAG,
and then all of the different components
from data ingestion to the query workflow
and how a lot of that is just completely abstracted away
using Knowledge Bases for Amazon Bedrock.
And lastly, we talked about how knowledge bases
can be further extended when you need them to interact
with real time data and databases and APIs
where agents and knowledge bases together
can really help enable that capability.
And if you want to take a quick picture of this,
a lot of the notebooks that Mani showed you earlier
and a few more examples will be published in GitHub
for you to take a look and use as inspiration for your work
as well as the documentation
which deep dives further into Knowledge Bases.
So we hope you check that out.
So we just wanna say thank you for attending.
I hope this was useful.
Our LinkedIn handles are here,
we would love to hear from you
and see how you're using Knowledge Bases
and what feedback you have.
And don't forget to take the survey in your app
so that Mani and I can get invited again next year
to give a talk at re:Invent
and really appreciate you coming today.
Thank you and have a great conference.
(audience applauds)