AWS re:Invent 2023 - Use RAG to improve responses in generative AI applications (AIM336)

AWS Events
4 Dec 2023 · 58:50

Summary

TLDR This presentation explains how to use Retrieval Augmented Generation (RAG) to improve responses in generative AI applications. It covers why customization matters, common customization approaches, concrete RAG use cases, and how Amazon Bedrock makes everything from the data ingestion workflow to the query workflow easy to implement. It also shows how to build with Knowledge Bases for Amazon Bedrock using open source generative AI frameworks such as LangChain.

Takeaways

  • 📚 Customizing foundation models lets you adapt them to domain-specific language and unique tasks, and improves how they incorporate your company's external data.
  • 🔍 Retrieval Augmented Generation (RAG) is a technique that leverages external knowledge sources to improve the quality and accuracy of responses.
  • 🌐 Knowledge Bases for Amazon Bedrock makes building RAG applications simple and manageable.
  • 📈 Optimizing the data ingestion workflow and converting text into numerical representations enables semantic search that preserves meaning.
  • 🔧 The built-in Titan text embeddings model creates numerical representations of text so you can retrieve the most relevant documents.
  • 🛠️ Knowledge Bases for Amazon Bedrock automates data ingestion, storage, retrieval, and generation, delivering fully managed RAG.
  • 📊 Combining knowledge bases with agents integrates static data with dynamic data and APIs to automate advanced tasks.
  • 📈 RAG use cases include improving content quality, context-based chatbots, personalized search, and text summarization.
  • 🔗 Vector database options such as OpenSearch Serverless, Redis, and Pinecone let you pick the best fit for your knowledge base needs.
  • 📚 Open source generative AI frameworks such as LangChain integrate easily with Knowledge Bases for Amazon Bedrock.
  • 💡 After data ingestion, running a sync automatically processes new files in S3 and updates the knowledge base.

Q & A

  • Why should you use retrieval augmented generation to improve responses in generative AI applications?

    - Retrieval augmented generation (RAG) improves response quality by adapting the model to domain-specific language and unique tasks, and by making it aware of external company data such as FAQs and policies.

  • What are the common approaches for customizing foundation models?

    - Common approaches to customizing foundation models include prompt engineering, retrieval augmented generation, model fine-tuning, and training a model from scratch.

  • What is prompt engineering?

    - Prompt engineering is the technique of crafting and iterating on the user input (the prompt) passed to a foundation model in order to steer it toward the desired output.

  • What are the basic steps of retrieval augmented generation (RAG)?

    - The basic steps of RAG are retrieving text from a corpus of documents and passing it to the foundation model to generate a response grounded in your company data.

  • What is Knowledge Bases for Amazon Bedrock?

    - Knowledge Bases for Amazon Bedrock is a fully managed service that automatically indexes your company's data sources, making it easy to build retrieval augmented generation (RAG) applications.

  • What is the data ingestion workflow when using Knowledge Bases?

    - The data ingestion workflow takes documents from an external data source (for example, S3), chunks them, passes the chunks through an embeddings model (such as Titan text embeddings), and stores the results in a purpose-built vector database.

  • What role does the Titan text embeddings model play?

    - The Titan text embeddings model converts text into numerical representations while preserving the relationships and meaning between words, which makes it well suited to semantic search and RAG use cases.
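
For illustration, a minimal sketch of generating an embedding with Titan through the Bedrock runtime (the region and model ID are assumptions; adjust for your account):

```python
import json
import boto3

# Bedrock runtime client (region is an assumption; use your own)
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

def embed_text(text: str) -> list[float]:
    """Convert text into a numerical representation with Titan text embeddings."""
    response = bedrock_runtime.invoke_model(
        modelId="amazon.titan-embed-text-v1",  # assumed Titan embeddings model ID
        body=json.dumps({"inputText": text}),
    )
    # The response body is a stream whose JSON payload carries the vector
    return json.loads(response["body"].read())["embedding"]

vector = embed_text("If I work remotely, which state do I owe taxes?")
print(len(vector))  # dimensionality of the embedding
```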

  • What is the retrieve and generate API?

    - The retrieve and generate API is an Amazon Bedrock API that retrieves relevant documents based on the user's question, feeds them into the model's prompt, and generates the final response.
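
A minimal Python sketch of that API via the bedrock-agent-runtime client (the knowledge base ID and model ARN below are placeholders):

```python
import boto3

# Runtime client for knowledge base queries (region is an assumption)
client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = client.retrieve_and_generate(
    input={"text": "If I work remotely, which state do I owe taxes?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB123EXAMPLE",  # placeholder knowledge base ID
            # Placeholder model ARN; the talk mentions Anthropic Claude models
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-instant-v1",
        },
    },
)

print(response["output"]["text"])  # generated answer
print(response["citations"])       # source attribution
```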

  • What is the benefit of combining Knowledge Bases with Agents for Amazon Bedrock?

    - Combining Knowledge Bases with Agents for Amazon Bedrock lets you integrate static document information with dynamic databases and APIs, creating advanced applications that plan and execute tasks automatically.

  • How do you use Knowledge Bases with LangChain?

    - To use Knowledge Bases with LangChain, install recent versions of Boto3 and LangChain, initialize the Amazon Knowledge Bases retriever through LangChain's wrapper, and use it to generate responses to questions.
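
A sketch of that setup, assuming a recent LangChain where the wrapper lives in langchain_community (the import path varies across versions, and the knowledge base ID is a placeholder):

```python
# pip install boto3 langchain langchain-community  (assumed packages)
from langchain_community.retrievers import AmazonKnowledgeBasesRetriever

retriever = AmazonKnowledgeBasesRetriever(
    knowledge_base_id="KB123EXAMPLE",  # placeholder ID
    retrieval_config={"vectorSearchConfiguration": {"numberOfResults": 4}},
)

docs = retriever.get_relevant_documents(
    "If I work remotely, which state do I owe taxes?"
)
for doc in docs:
    # Each document carries the chunk text plus metadata such as the score
    print(doc.page_content[:100], doc.metadata.get("score"))
```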

  • Why are embeddings important when building RAG applications?

    - Embeddings preserve the relationships and meaning between words when converting text into numerical representations, which is critical for the precision and accuracy of semantic search and RAG.

Outlines

00:00

📝 Opening and introduction to RAG

The session opens by introducing retrieval augmented generation (RAG) and why customization matters, then introduces Knowledge Bases for Amazon Bedrock, and Ruhaab and Mani introduce themselves.

05:02

🛠 Overview of customization methods

Explains how to customize foundation models, walking through the approaches in detail: prompt engineering, RAG, model fine-tuning, and training a model from scratch.

10:03

📚 How RAG works and its use cases

Explains the basic mechanics of RAG and its use cases. RAG improves the quality and accuracy of responses by leveraging external knowledge sources. Use cases include improving content quality, answering from enterprise data, personalized search, and text summarization.

15:06

🔍 Data types and retrieval methods

Explains the types of data used with RAG and three retrieval methods: rule-based retrieval, structured data retrieval, and semantic search. Also covers why text must be converted into numerical representations and how to choose an embeddings model.

20:06

🌐 Why text embeddings matter

Explains the importance of text embeddings: they convert text into numerical representations while preserving the relationships between words, which lets the model understand meaning when generating responses. Also covers the Titan text embeddings model, its optimizations, and how to use it.

25:09

🔧 RAG data ingestion and internals

Explains the RAG data ingestion workflow and what happens under the hood: chunking the data, processing it with an embeddings model, storing it in a vector database, and the text generation workflow.

30:09

📋 Features of Knowledge Bases for Amazon Bedrock

Explains the features of Knowledge Bases for Amazon Bedrock: automated data ingestion, connecting foundation models and agents, data retrieval and prompt augmentation, and source attribution, along with the benefits and how to use them.

35:10

🔄 Working with databases and dynamic information

Explains how Knowledge Bases works together with Agents for Amazon Bedrock and how to fetch dynamic information. Agents handle task planning, execution, and API interactions, while Knowledge Bases supplies information retrieved from documents, so applications can work with real-time data, databases, and APIs.

40:13

🛠 Building RAG applications with LangChain

Explains how to build RAG applications with LangChain: checking the Boto3 and LangChain versions, configuring the client, initializing the retriever object, and integrating with a QA chain. The code examples on GitHub show how to build QA applications and context-based chatbots.

45:14

🎉 Wrap-up and what's next

Recaps the talk: why customization matters, RAG use cases, the features of Knowledge Bases for Amazon Bedrock, and how Agents and Knowledge Bases work together. Also points to the notebooks and documentation published on GitHub and what's coming next.

Keywords

💡retrieval augmented generation (RAG)

RAG is a technique for improving the quality and accuracy of a model's responses by combining relevant information from external knowledge sources with the foundation model's generation. The video explains in detail how RAG works and which use cases it applies to.

💡foundation models

Foundation models are AI models with vast amounts of pre-trained knowledge encoded into them, used for general tasks and problems. The video presents customizing these models as a key step in adapting them to domain-specific language and unique tasks.

💡prompt engineering

Prompt engineering is the technique of designing the prompt (user input) given to a foundation model. A well-crafted prompt steers the model's output and yields more accurate responses.

💡embeddings

Embeddings are numerical representations of text, produced in a way that preserves the relationships between words. They let a machine understand the meaning of text and retrieve the most relevant information.

💡Knowledge Bases for Amazon Bedrock

Knowledge Bases for Amazon Bedrock is a fully managed service that simplifies data ingestion, retrieval, prompt augmentation, and response generation for building RAG applications.

💡Agents for Amazon Bedrock

Agents for Amazon Bedrock plan and execute tasks and automate interactions with APIs and tools. This lets an application fetch dynamic information and carry out tasks while conversing with the user.

💡LangChain

LangChain is an open source generative AI framework that provides tools and libraries for building RAG applications. With LangChain, developers can use APIs and SDKs to build applications more easily.

💡Titan text embeddings model

The Titan text embeddings model is optimized for converting text into numerical representations and supports more than 25 languages. It produces high-quality embeddings for semantic search and RAG.

💡data ingestion workflows

Data ingestion workflows are the series of steps that take data from external sources, process it, and store it in a vector database, preparing in advance the information a RAG application needs to answer questions.

💡fine-tuning

Fine-tuning adjusts a foundation model's parameters to adapt it to a specific task or dataset, so the model performs better on tasks tied to your business needs and data.

💡source attribution

Source attribution is the process of verifying which data sources a model's response was generated from. It builds trust in the model and lets users check the accuracy of the cited sources.

Highlights

The talk focuses on using retrieval augmented generation (RAG) to improve responses in generative AI applications.

Foundation models have vast pre-trained knowledge but may not understand specific company domains.

Customizing foundation models can adapt them to domain-specific language, improve performance on unique tasks, and incorporate external company data.

Prompt engineering is a simple approach to customize models by crafting user inputs to guide model outputs.

Retrieval augmented generation leverages external knowledge sources to enhance the quality and accuracy of responses.

Model fine-tuning allows you to adapt foundation models to specialized, task-specific datasets.

Training a foundational model from scratch offers complete control over training data but requires extensive resources.

The decision on which customization methodology to use depends on the task and whether it requires context from external data.

Retrieval augmented generation is useful for improving content quality by reducing hallucinations and providing context-based responses.

RAG can be applied for personalized search by retrieving relevant content based on user profiles and preferences.

The Titan text embeddings model is optimized for text retrieval and RAG use cases and supports over 25 languages.

Knowledge Bases for Amazon Bedrock provides a fully managed RAG experience, abstracting away complex workflows.

Agents for Amazon Bedrock can interact with knowledge bases and APIs to orchestrate tasks and provide dynamic responses.

Open source frameworks like LangChain can be used in combination with Amazon Bedrock services for building RAG applications.

The process of data ingestion, embedding, and retrieval is automated and managed by Knowledge Bases for Amazon Bedrock.

Source attribution feature provides transparency by showing the data sources used to generate model responses.

The retrieve and generate API simplifies the process by handling query embedding, document retrieval, and response generation.

Transcripts

play00:00

- Alright, I think it's time to get started.

play00:02

So thank you everyone for attending our talk

play00:05

on how to use retrieval augmented generation

play00:08

to improve your responses in generative AI applications.

play00:13

So hopefully that's what you're here for

play00:15

and if not, then you're stuck with us.

play00:17

So let's get started.

play00:20

We've got an action-packed agenda today

play00:21

where we're gonna cover a variety of topics

play00:25

including customizing foundation models,

play00:29

why you should think about customization

play00:32

and common approaches for how you should customize.

play00:36

Then we will go specifically

play00:38

into retrieval augmented generation or RAG,

play00:42

and we'll deep dive into how it works

play00:44

and cover all the different components

play00:46

from the data ingestion workflows to how embeddings work

play00:51

and demystify a lot of the concepts

play00:54

that you might have heard throughout this conference.

play00:59

We'll introduce Knowledge Bases for Amazon Bedrock,

play01:02

which I hope you heard in the keynotes from Adam and Swami.

play01:07

And then we'll talk about

play01:09

how we're making the building of RAG applications

play01:12

really, really easy.

play01:15

And then we'll cover lastly how these capabilities

play01:18

of knowledge bases work

play01:20

with other parts of the Bedrock ecosystem

play01:23

such as Agents for Bedrock and also how you can leverage

play01:27

open source generative AI frameworks

play01:29

such as LangChain for building

play01:31

retrieval augmented generation capabilities.

play01:35

And I forgot to introduce myself.

play01:37

And so I'm Ruhaab Markas.

play01:40

I lead product management

play01:41

for Knowledge Bases and Amazon Lex.

play01:44

- And I'm Mani Khanuja,

play01:46

technical lead for generative AI specialist

play01:49

in the worldwide specialist organization.

play01:51

And today I'll be co-presenting with Ruhaab

play01:55

and taking you through this journey of how you can build

play01:58

your retrieval augmented generative applications

play02:04

using Knowledge Bases for Amazon Bedrock.

play02:09

- So quick show of hands, who has heard

play02:11

of Knowledge Bases for Bedrock,

play02:13

either through the keynote or through our previews.

play02:16

Okay, fantastic, quite a few of you have,

play02:19

and it looks like a few of you haven't,

play02:20

so we'll definitely have a variety of content

play02:24

for everyone here today.

play02:27

Alright, so first I want to talk about why

play02:29

you should think about customizing a foundational model.

play02:33

So foundational models

play02:34

have a vast amount of pre-trained knowledge

play02:37

directly encoded into the model, right?

play02:40

That's, if you've heard the term GPT,

play02:42

that's really what the P stands for, right?

play02:43

That pre-trained knowledge.

play02:46

But it's important to understand that in many ways

play02:49

these models don't really know

play02:51

a lot about your specific company, right?

play02:54

And so the first reason

play02:55

you might want to customize foundational model

play02:57

is to adapt to domain specific language.

play03:01

Let's say you're in healthcare

play03:03

and you need the model to understand

play03:04

all the medical devices that you sell, right?

play03:07

That doesn't come out of the box in most instances.

play03:12

Secondly, you might want these models

play03:15

to really perform better at really unique tasks

play03:20

suited for your company, right?

play03:21

Let's say you're a financial services company

play03:24

and you're looking to do more advanced accounting

play03:27

or analysis on earnings report,

play03:29

and you wanna teach these models about tasks

play03:32

and help really specialize these models

play03:35

on a company specific data set or task.

play03:38

And lastly, you may want to think about customization

play03:42

when you want to improve the context and awareness

play03:45

of these models with your external company data.

play03:49

So how do you bring company repositories such as FAQs

play03:53

or policies and other documents that exist in your company

play03:58

and pass that as context into a foundation model.

play04:03

So those are a few reasons why

play04:05

you may want to think about customization.

play04:08

And now we'll cover, you know, some common approaches

play04:12

of how you can customize a foundation model.

play04:16

And we'll talk about a few common approaches,

play04:19

these aren't exhaustive and there are other approaches

play04:22

and these will, you know, grow incrementally

play04:26

in terms of the complexity, the cost

play04:29

and time it takes to implement these changes.

play04:32

So the most simple approach

play04:35

for customizing foundational model is prompt engineering.

play04:38

And a prompt is simply the user input

play04:42

that you pass to a foundation model, right?

play04:44

And these prompts can be crafted and iterated upon

play04:48

to really steer the direction

play04:51

to get the right output from the foundation model.

play04:54

And there's a variety of different approaches

play04:56

that you can leverage for prompt engineering.

play04:59

Prompt priming, prompt weighting,

play05:02

or even chaining different prompts.

play05:05

So prompt priming is really the most basic form

play05:09

of prompt engineering, which is just taking a input

play05:12

or a form of instructions

play05:14

and passing that to a foundation model.

play05:17

Sometimes you can even pass specific examples

play05:19

or tasks in the foundation model through the prompt,

play05:22

and that's known as in context learning.

play05:25

Another approach, as I mentioned, is prompt priming

play05:28

where you, sorry, prompt weighting,

play05:31

which is giving more emphasis

play05:33

on certain elements of the prompt

play05:35

that you want the foundation model

play05:37

to really focus on, right?

play05:38

So if you tell the model, you know,

play05:43

definitely don't respond

play05:45

to something that you don't know about, right?

play05:47

Capitalizing that and putting 5,000 exclamations,

play05:49

like those things actually do bias and put emphasis

play05:52

on certain parts of your instructions.

play05:55

And lastly, there's prompt chaining,

play05:57

which is taking more complex prompts

play05:59

and breaking that down into more discreet parts

play06:02

where outputs of a certain prompt

play06:05

are then passed as an input into the next task.

play06:08

So those are just a few examples of prompt engineering.
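
(Editor's note: as a hedged illustration of prompt chaining, not code from the talk — the output of one Bedrock call feeds the next; the model ID and region are assumptions.)

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")  # assumed region

def ask_claude(prompt: str) -> str:
    """One call to an Anthropic model on Bedrock (legacy text-completions format)."""
    body = json.dumps({
        "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
        "max_tokens_to_sample": 500,
    })
    out = bedrock.invoke_model(modelId="anthropic.claude-instant-v1", body=body)
    return json.loads(out["body"].read())["completion"]

# Chain: first summarize, then feed the summary into the next task.
summary = ask_claude("Summarize this earnings report: ...")
action_items = ask_claude(f"From this summary, list three action items:\n{summary}")
```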

play06:12

Secondly, there is retrieval augmented generation,

play06:17

which is all about leveraging external knowledge sources

play06:21

to improve the quality and accuracy of responses.

play06:26

And when I use the term external knowledge sources,

play06:30

it's likely that these knowledge sources

play06:32

are actually internal to your company,

play06:35

but it's external in terms of the knowledge

play06:37

of the pre-trained model, right?

play06:39

You're really helping bring new knowledge

play06:43

to the foundation models, hence the term external, right?

play06:45

It's external to the pre-trained foundation models.

play06:49

And we'll really deep dive

play06:50

into retrieval augmented generation

play06:52

throughout the presentation, but the basic steps being

play06:55

you're retrieving some form of text

play06:58

from a corpus of documents.

play07:00

You're using that as context to a foundation model

play07:04

to ultimately generate a response that's grounded

play07:08

in the knowledge of your company's data, right?

play07:11

Which is extremely powerful when you think about

play07:13

using the advanced reasoning capabilities of foundation models,

play07:16

but really steering that towards the knowledge

play07:19

specifically from your company data.

play07:23

So these two forms of customization

play07:27

is really about augmenting a foundation model.

play07:30

We're not actually going in

play07:31

and changing anything in the foundational model itself,

play07:35

but there are approaches that allow you to do that,

play07:38

such as model fine-tuning,

play07:41

and model fine-tuning, it allows you

play07:43

to really adapt a foundational model

play07:46

on a specialized task specific data set.

play07:49

And this is a supervised approach,

play07:51

meaning that you're training the foundation model

play07:54

on labeled examples of tasks

play07:58

and you've specified the expected output and outcome

play08:02

through those examples, allowing you to really, you know,

play08:06

train this model on a specialized task.

play08:10

And through fine-tuning, you're actually updating

play08:13

the weights of the model, right?

play08:16

The parameters of the models are actually being adjusted

play08:19

based on this customization.

play08:22

And lastly, and arguably the most costly

play08:26

and time intensive approach

play08:28

is training the foundation model from scratch.

play08:31

And it's a approach where you really want

play08:36

to be in complete control of the data

play08:41

that's used to train the model.

play08:43

You may want to remove any inherited bias that might exist

play08:48

from some of the other contexts

play08:51

that the model is trained on,

play08:52

and it's giving you complete control

play08:54

in building a domain specific model.

play08:58

But obviously this requires an extensive amount

play09:00

of task specific data, a lot of compute resources

play09:05

making it obviously one of the more, you know, complex

play09:09

and time intensive and costly approaches.

play09:11

And so while we talk about a few approaches

play09:13

of customization, again, this is not exhaustive,

play09:16

but some of the more common approaches

play09:18

that you'll see in model customization,

play09:21

and today we'll focus specifically on RAG.

play09:26

So now that we know why you should customize

play09:28

and common approaches on how to customize,

play09:31

let's look at a mental model

play09:33

for when you should use certain methodologies.

play09:38

So it's easier to make a decision

play09:39

of what approach to really take.

play09:42

And it all starts by thinking about the task

play09:45

that you want these foundation models to execute.

play09:48

And does this task require context from external data

play09:52

is kind of that first decision point.

play09:55

And if the answer is yes,

play09:57

you really then have to think about

play09:58

is that data access needed in real time?

play10:03

And if the data is relatively static

play10:06

where it's not changing on a realtime basis,

play10:09

such as frequently asked questions or policies,

play10:13

that's a classic use case

play10:14

for retrieval augmented generation.

play10:17

But just because I use the term relatively static

play10:19

doesn't mean that this data isn't changing, right?

play10:21

And so I don't want that to be misleading

play10:25

because you can, you know,

play10:26

have data changing in this construct as well,

play10:30

but it's not changing real time.

play10:32

However, if the data is changing real time

play10:35

and you also need the ability to connect to different tools,

play10:38

meaning that I'm fetching or querying data from databases

play10:42

or I'm interacting with APIs or applications and tools,

play10:47

that's a use case for Agents for Amazon Bedrock.

play10:51

And we'll also cover how agents and knowledge bases

play10:54

can be brought together to really build, you know,

play10:58

really powerful, you know, capabilities,

play11:02

so these aren't mutually exclusive by any means.

play11:06

Next, if you have a use case

play11:09

that's more, it's leveraging historical data, right?

play11:14

A historical, like a snapshot of information

play11:17

and it's a relatively simple task

play11:19

that might already perform really well

play11:22

from a pre-trained foundation model.

play11:24

That's where prompt engineering can really help, right?

play11:26

I'm passing some specific context or task or instructions,

play11:30

right, as part of my prompt engineering,

play11:33

and that in many times can be extremely effective.

play11:37

And lastly, if I have historical data

play11:40

that's maybe a bit more complex as I mentioned,

play11:43

that you know, is task specific

play11:46

and needs a little bit more task training,

play11:50

that's where model fine-tuning, you know,

play11:53

serves a really important purpose.

play11:55

And you might have heard today in the keynote

play11:57

that we've announced the ability for fine-tuning

play12:00

for foundational models

play12:02

for all the Bedrock foundation models

play12:04

with the support of fine-tuning

play12:06

for Anthropic Claude coming soon.

play12:10

And so let's deep dive into RAG and really understand

play12:13

what is retrieval augmented generation.

play12:18

- So, so far what Ruhaab has covered

play12:21

is how do we, why do we need to customize

play12:25

and provided us some really good prescriptive guidance

play12:28

on when to customize and how you can work backwards

play12:32

from your particular use case.

play12:34

But now let's understand

play12:36

what is retrieval augmented generation?

play12:40

And as the name suggests, the first part is retrieval

play12:46

where you have to retrieve the information,

play12:50

relevant information and then augment it

play12:54

with your original query, pass that to the foundation model

play12:59

to generate the accurate response.

play13:02

Now there's so many aspects to it.

play13:05

Now just imagine if you have

play13:07

this large amount of information, right?

play13:10

And then you say, okay,

play13:12

let me just add everything to the model, everything,

play13:17

what will happen?

play13:18

Multiple things might happen.

play13:20

First of all, your input size that the model can take,

play13:25

which we call this context length, might not be enough

play13:28

and you might get errors.

play13:30

Second, just imagine if somebody throws us,

play13:33

like throws a lot of information on us,

play13:35

as human beings, we'll also be like, "Oh,

play13:37

let me pick up the relevant one to answer the question."

play13:40

How do I do that?

play13:41

It takes us time, right?

play13:43

The same goes for the model.

play13:45

So what we need to do is provide relevant information

play13:49

and that's where the retrieval part becomes super important.

play13:53

So when we retrieve the relevant information

play13:55

from our large knowledge data

play14:00

and then provide that relevant context to the model, right?

play14:05

So that relevant context,

play14:07

we augment it with our original query

play14:10

so that the model knows the question as well.

play14:12

And then we feed that to the model and then the model,

play14:15

that helps the model to generate responses.

play14:18

And the prompt engineering

play14:20

also plays a very important role over here

play14:22

because we might want to add more instructions

play14:26

to the model based on our use case.

play14:29

So let's take a look at the use cases

play14:32

for retrieval augmented generation.

play14:35

So the first use case that comes into mind

play14:38

when we think about RAG

play14:40

is really improving the content quality.

play14:45

How are we improving this content quality using RAG?

play14:50

It's by reducing hallucinations.

play14:53

So for example, as Ruhaab mentioned, right?

play14:56

These models, when we are talking about pre-trained models,

play14:59

they're really big, they're trained

play15:00

on a really big amount of data, but that data

play15:05

was from some point in time, right?

play15:10

That might not be very recent, so that's number one.

play15:14

Oh, I don't have the recent data with the model,

play15:17

and the model can act super intelligent

play15:20

and provide you with an incorrect answer

play15:23

if you ask recent information

play15:26

on which it was not trained on.

play15:29

So in order to improve those quality of the answers

play15:33

or the responses and remove hallucinations,

play15:36

that's where we can use

play15:38

retrieval augmented generation technique

play15:40

to improve it, right?

play15:42

Now we have covered that part,

play15:44

but what if I want the model to answer

play15:48

only based on my knowledge or on my enterprise data.

play15:53

I don't want it to provide me answers

play15:56

from its own knowledge.

play15:57

I want to use the intelligence of this model

play16:00

and channel it only and make it focus only towards my data.

play16:06

That's where there's, you know, the applications

play16:09

such as context-based chatbot

play16:12

and the Q&A comes into play, right?

play16:15

So you can use RAG technique to build those applications.

play16:21

The third one is personalized search.

play16:24

Why do we want to limit it to question and answering,

play16:27

why not add this technique,

play16:28

because we are anyway retrieving the relevant content,

play16:32

and maybe augment our recommendation engine

play16:36

to create that type of application

play16:39

based on my profile as a persona

play16:44

that I might have in my preferences

play16:46

or based on my previous history.

play16:49

For example, if I'm on the retail side,

play16:52

I bought certain products,

play16:54

there is a history which is already there,

play16:56

what if I want to use that along with my preferences

play17:00

and show recommendations to my users?

play17:03

So you can do that using the RAG technique as well.

play17:08

And the last one is super close to me

play17:12

just because it, the way it works.

play17:15

So I wrote a book on applied machine learning

play17:18

and high performance computing

play17:20

in December 2022 it was published,

play17:23

and somebody gave, and you know, at that time

play17:27

this generative AI was also getting popular,

play17:29

so somebody posted a review using generative AI

play17:34

trying to summarize the book,

play17:35

which was approximately 400 pages.

play17:38

Now just imagine if you do that,

play17:41

that was a really cool thing,

play17:42

I really liked it by the way,

play17:43

but it was missing the key points.

play17:46

So how about using RAG techniques

play17:51

to do the text summarization as well?

play17:53

Or maybe I just want a summary of a particular chapter

play17:55

that I'm interested in, right?

play17:57

And make sure that it has all the key points

play18:00

so we can use RAG techniques to do that as well.

play18:05

And then when we have talked about the use cases,

play18:09

how do we use different types of data, right?

play18:14

You might be dealing with different types of data sets

play18:17

or different types of retrievals can also happen, right?

play18:20

So what technique should I use?

play18:22

One can be simply based on certain rules

play18:26

or the keywords or the phrases

play18:28

and I fetch the documents, it works for me,

play18:30

yes, let's use it, right?

play18:33

So we have to work always backwards from our use case

play18:36

or the data that we have on hand.

play18:39

The second one is I might have a lot of structured data,

play18:43

maybe imagine a use case,

play18:45

and this is actually something that we have already built

play18:47

with some of our customers.

play18:48

Imagine a use case

play18:49

where there is a natural language question,

play18:53

but I have my data in a, let's say, analytical database

play18:56

or a data warehouse or transaction data,

play18:58

it can be anything, right?

play19:01

And then based on that natural language question

play19:05

we use the foundation model to create a query,

play19:09

generate a query, and that query runs on your database,

play19:13

gives the results back,

play19:15

and then we use the foundation model

play19:17

to synthesize those results to provide you the answer

play19:20

of your original query, right?

play19:24

So you as a user get a full experience

play19:26

that I asked a question, I got the result,

play19:28

but behind the scene,

play19:29

you know, so many things were going on.

play19:32

So that can be one approach.
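
(Editor's note: a hedged sketch of that natural-language-to-query flow, not code from the talk — generate a query with a foundation model, run it, then synthesize the results; the model ID and table schema are assumptions.)

```python
import json
import boto3
import sqlite3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")  # assumed region

def complete(prompt: str) -> str:
    body = json.dumps({"prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
                       "max_tokens_to_sample": 300})
    out = bedrock.invoke_model(modelId="anthropic.claude-instant-v1", body=body)
    return json.loads(out["body"].read())["completion"]

question = "What were our top three products by revenue last quarter?"

# Step 1: the foundation model turns the question into a query (schema is assumed).
sql = complete(f"Given a table sales(product, revenue, quarter), "
               f"write only the SQL for: {question}")

# Step 2: run the generated query against the database.
rows = sqlite3.connect("sales.db").execute(sql).fetchall()

# Step 3: the foundation model synthesizes the results into a natural answer.
print(complete(f"Question: {question}\nQuery results: {rows}\nAnswer concisely."))
```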

play19:35

The third one is semantic search.

play19:38

And this I would like to explain with an example

play19:40

because it really takes me back to high school,

play19:44

or even before that actually elementary school.

play19:47

So when I was in school,

play19:50

there used to be a reading comprehension

play19:53

where I was given this passage

play19:55

and then there was certain questions

play19:57

that we had to answer based on the passage, right?

play20:01

So as a kid I was like, oh, I'm smart,

play20:06

I'm not going to read the whole passage,

play20:08

I'm going to use the keywords in the question.

play20:11

I look up those keywords in the passage,

play20:13

they're just two, three paragraphs, four paragraphs

play20:16

and I'll be fine.

play20:18

So I used to get 10 on 10 every time

play20:20

and up until elementary school,

play20:22

some part of the middle school,

play20:24

and by the time I reached high school

play20:26

that 10 on 10 actually reduces to 3 on 10, 4 on 10,

play20:31

based on how lucky I was.

play20:33

The reason was as I was growing, these passages

play20:38

that were provided to us were becoming complicated.

play20:41

The questions that were asked were tricky,

play20:43

they were not anymore based on the keyword.

play20:46

I literally had to understand the question,

play20:49

I had to understand what the author is trying to say

play20:53

at a high level before I can even attempt an answer, right?

play20:57

And that's where the semantic search

play20:59

for machines come into play.

play21:01

So understanding the meaning of the text

play21:05

and then providing you the answer, right?

play21:07

So that's the third kind of retrieval

play21:10

and we'll be mostly focusing

play21:11

on this third kind of retrieval today.

play21:15

So in order to do semantic search

play21:18

or the meaning of the text, it looks lovely, right?

play21:21

But when I have to do it and I have to do it,

play21:23

like I'm not doing it like the machine is doing it, right?

play21:27

So then what do I do?

play21:29

what we really need to do

play21:32

in order to enable the machine to do it.

play21:36

So what we really need to do over here

play21:38

is convert our text into numerical representations.

play21:43

Now why do we need to convert the text

play21:45

into numerical representations?

play21:47

Because we want to find the similar documents

play21:50

or the text based on the question that is coming in,

play21:53

and I'll double click on the numerical part in a moment,

play21:57

but we have to convert the numerical representations

play22:01

in such a manner that it's able to retain

play22:04

the relationship between the words, right?

play22:08

If it's unable to retain the relationship between the words,

play22:12

then it won't be meaningful to me, or the machine, right?

play22:15

Then the purpose is not solved.

play22:17

So the selection of the embeddings model,

play22:20

because you're not going to do it yourself, right?

play22:23

You will use an embeddings model feed in the text

play22:26

that will convert into numerical representations

play22:29

that will maintain the meaning and the features

play22:33

and the relationship between these words.

play22:36

So that's how, you know, if you have to do semantic search,

play22:40

you need an embeddings model, you need to convert

play22:42

into numerical representations, your query will come in

play22:46

and then you provide, it will fetch

play22:48

the relevant results based on that.

play22:51

So how is it helping me? Briefly,

play22:54

if we have to summarize, it is helping me

play22:57

to fetch the results based on the meaning,

play23:01

it is helping me because I'm getting accurate context.

play23:05

If I have accurate context

play23:07

and I'm feeding accurate context to the model,

play23:09

I'm getting accurate results, right?

play23:12

So look at how we are connecting

play23:14

each and every dot over here, right?

play23:17

So selecting, so first you have your data,

play23:21

then you have to split the data into chunks

play23:24

so that you can create embeddings

play23:26

and the high quality of the model,

play23:27

which will create the embeddings,

play23:29

will influence your response, the retrieval,

play23:32

and the retrieval will influence your response, right?

play23:36

So that's the reason why embeddings are important.

play23:39

So which model to select.

play23:42

And for that I'll hand it over to Ruhaab.

play23:45

- Thanks Mani,

play23:46

so embeddings might seem like a complicated process

play23:49

and you know, the thought

play23:51

of actually building an embeddings layer

play23:53

seems a little daunting,

play23:55

which is exactly why we launched

play23:57

the general availability of the Titan text embeddings model.

play24:01

And we actually launched this in September

play24:04

and the Titan embeddings model

play24:06

is optimized for text retrieval use cases and RAG use cases

play24:11

and is available in over 25 different languages

play24:15

such as English, Spanish, and Chinese.

play24:20

And because Titan text embeddings is accessible

play24:23

via Amazon Bedrock serverless experience,

play24:26

you can access this model with a single API

play24:29

without having to manage any infrastructure, right?

play24:31

It's so easy, right?

play24:32

You think about just pointing towards a model,

play24:34

passing it the context and getting the embeddings built,

play24:39

right, it's so incredibly easy to use through a single API.

play24:43

Okay, so now that we know what embeddings are

play24:48

and have some foundational knowledge of RAG,

play24:51

let's really understand

play24:52

what's happening underneath the hood, right?

play24:54

Like what enables this from a technological perspective.

play24:58

And before you can actually ask questions about, you know,

play25:02

your data, the data actually has to be optimized

play25:06

for a RAG use case.

play25:08

And this is the data ingestion layer

play25:11

and there's a workflow corresponding to that layer

play25:14

and we'll go right to left in this workflow.

play25:16

So it starts with your external data sources, right?

play25:18

Your company data.

play25:21

This could live in S3

play25:23

and it could be in different file formats

play25:25

or it could even be PDF documents and unstructured data.

play25:28

We take this data

play25:30

and then go through a process called chunking.

play25:32

And chunking really is just the splitting off that data

play25:37

into different segments, which is really useful

play25:40

for optimizing for things like relevancy.

play25:43

And then these chunks

play25:45

are then passed into an embeddings model,

play25:48

such as Titan text, and then ultimately stored

play25:52

in a purpose-built vector database,

play25:55

which is really optimized

play25:56

for indexing and retrieval of embeddings

play26:00

and can maintain the relationship and semantic meaning

play26:05

that you get through an embeddings model

play26:08

and once you go through this data ingestion workflow,

play26:12

you're now ready to ask questions

play26:14

and really see the true power of RAG.

play26:19

This brings us to the text generation workflow

play26:22

and it starts with a user asking a question.

play26:25

So that question or query is then also, it goes through

play26:30

that same embeddings model

play26:32

to turn that question into a vector.

play26:35

And that vector then is searched

play26:38

in that same vector data source

play26:40

that allows us to do things

play26:42

like vector similarity search, right?

play26:44

So where you don't have to ask questions

play26:47

in that same rigid keyword context

play26:50

where we can actually extract meaning

play26:51

and look at similar aspects of that question

play26:54

and how it might relate to documents.

play26:55

And that's the real power of semantic search, right?

play26:59

It's really looking at those relationships

play27:01

and understanding meaning more deeply.

play27:04

So once we get that search result,

play27:06

that's the retrieval part, right?

play27:08

We're retrieving that data from the vector database

play27:12

and then we're passing that context

play27:15

into the prompt for a foundational model.

play27:19

So we're augmenting the prompt with these returned passages

play27:24

and that's the augmentation part, right?

play27:26

That A, we're augmenting the prompt

play27:29

and then ultimately this large language model,

play27:32

the foundation model

play27:33

is generating that final response, right?

play27:35

And that's the G part.

play27:39

And this workflow as you might imagine,

play27:42

it can be fairly cumbersome, right?

play27:44

And there's so much inherent complexity

play27:47

associated with building a, you know,

play27:50

a complete RAG application.

play27:52

You have to manage multiple data sources,

play27:54

you have to think about which vector database to use,

play27:58

how do I make incremental updates to that vector database?

play28:02

And it requires, actually, a lot

play28:03

of different disciplines, right?

play28:04

You need help from data scientists

play28:06

and data engineers and infrastructure engineers.

play28:09

folks who know about scaling and DevOps,

play28:10

and a lot of this can seem daunting,

play28:15

open source frameworks such as LangChain

play28:17

have made this a little bit easier,

play28:19

but it still requires a considerable amount

play28:22

of development and coding.

play28:24

And so how might we completely abstract away

play28:27

all this complexity?

play28:32

- And that's where we have

play28:35

Knowledge Bases for Amazon Bedrock,

play28:38

where we want to implement RAG or build applications

play28:45

based on this RAG architecture that we just saw,

play28:49

but in a very fully managed way,

play28:51

so that you can focus on solving the business use cases

play28:56

or the problems or you know, working on it.

play28:58

And we take away all the undifferentiated

play29:01

heavy lifting from you.

play29:03

So how is Knowledge Bases for Amazon Bedrock

play29:07

going to help you?

play29:08

So first of all, it provides you

play29:11

with the data ingestion part that we just saw, right?

play29:14

So it will automate a lot of those things

play29:16

and we'll see that in a moment.

play29:18

The second part is

play29:20

it will securely connect the foundation models

play29:24

and even Agents for Bedrock with these knowledge bases

play29:29

or your data sources, right?

play29:31

The third is retrieval, right?

play29:33

How we can easily retrieve the relevant data

play29:38

and then augment our prompts.

play29:41

So it will help you do that.

play29:42

So we have features and we did recent announcements

play29:46

and then we'll be doing a deeper dive on those.

play29:49

And the last one is source attribution.

play29:53

I don't trust anyone to be honest.

play29:56

I'm just kidding, I trust a lot of people. (laughs)

play29:59

But when it comes to machines, we need proof

play30:01

and that's where source attribution comes into play.

play30:04

That how do I know that my foundation model

play30:07

is giving me the right response

play30:09

because the response is based on these data sources

play30:12

that I was providing, right?

play30:14

So let's take a look.

play30:15

Let's dive deep into the data ingestion workflow first,

play30:19

because if you don't have the data in the vector DB,

play30:21

you cannot really do the retrieval

play30:24

augmentation and generation.

play30:26

So the first part is data ingestion workflow

play30:29

that we just saw, right?

play30:30

In this case we are moving from left to right.

play30:34

So you have new data and then the data sources, chunking,

play30:39

embedding models, storing into vector DB, right?

play30:42

Imagine you have to implement

play30:44

each of these things on your own.

play30:48

First of all, you would need resources

play30:51

who can code really well.

play30:52

Second, even if you can code,

play30:54

you then have to do the maintenance of the code.

play30:57

You might want to use open source frameworks,

play30:59

which is great, but sometimes then you have to think about

play31:02

the versioning piece of it, right?

play31:04

So there's a lot that goes into it.

play31:07

And then you also have to learn specific APIs

play31:09

for the vector store that you're using.

play31:11

What if we change everything

play31:13

by providing you options and reducing it all

play31:19

to a few simple choices?

play31:21

What if we say choose your data source

play31:24

and in this case we support Amazon S3 as a data source.

play31:28

So you select your bucket

play31:30

in which you have your documents, right?

play31:33

And we provide support for incremental updates.

play31:35

As in, when your new documents are coming in,

play31:37

all you have to do is start the ingestion job, sync, right?

play31:42

And then multiple data formats.

play31:45

You don't have to really, you know,

play31:47

worry about the different data formats

play31:49

because with Knowledge Bases for Amazon Bedrock

play31:52

we provide support for PDFs, comma separated files, CSV,

play31:58

Excel documents, Word documents, HTML files, Markdown files,

play32:03

and I think that was pretty much it

play32:05

and the text files as well, right?

play32:08

And the list may grow as we move along.

play32:11

So we have support for a lot of these file formats.

play32:15

So you can literally have your data and then upload it on S3

play32:20

and add it as a data source.

play32:23

Then we provide you an option where you can do chunking,

play32:27

like splitting your documents.

play32:29

You might say, you know what,

play32:31

I don't want to choose anything

play32:32

because I might not be aware of those things.

play32:35

That's fine too, so we have default chunking option,

play32:38

which is defaulted to 300 tokens with 20% overlap.

play32:42

So you don't have to choose if you don't want to, right?

play32:45

But if you want to

play32:46

and if you want to have a particular fixed chunks

play32:49

that you are interested in, you can provide those as well.

play32:52

So the second option that we have is the fixed chunking.

play32:55

You provide the number of tokens

play32:57

for each chunk of text that you want and an overlap,

play33:02

we recommend having it between 10 to 20%.
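
(Editor's note: a hedged sketch of what fixed-size chunking with overlap means — whitespace splitting stands in for real tokenization, and this is not the service's internal implementation.)

```python
def fixed_size_chunks(text: str, chunk_tokens: int = 300, overlap: float = 0.2):
    """Split text into fixed-size chunks with fractional overlap.

    Whitespace splitting stands in for real tokenization here.
    """
    tokens = text.split()
    # 20% overlap means each new chunk starts 80% of a chunk further along.
    step = max(1, int(chunk_tokens * (1 - overlap)))
    return [
        " ".join(tokens[i:i + chunk_tokens])
        for i in range(0, len(tokens), step)
    ]

chunks = fixed_size_chunks(open("policy.txt").read(), chunk_tokens=512, overlap=0.2)
```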

play33:05

And then choose your embedding model,

play33:08

so right now we support Amazon Titan embeddings model

play33:12

and Ruhaab has already covered that,

play33:13

so I will not repeat that and it's important.

play33:17

Just one thing that I want to double click over here

play33:19

is when we say it supports 25 languages,

play33:23

it's very important aspect

play33:25

because remember when I was talking about the embeddings,

play33:28

these embeddings, when we say numerical representations,

play33:31

they are maintaining the relationship.

play33:34

If the model doesn't understand the language,

play33:37

it won't be able to maintain the relationship

play33:39

between the words, right?

play33:40

So it is important

play33:42

that if your text is in a different language

play33:45

that your model should know about this,

play33:47

your embeddings model should know about it

play33:49

and should be able to maintain that relationship.

play33:54

The next part is the vector store.

play33:56

So we are providing you options over here whether you want

play33:59

to use OpenSearch Serverless, Redis or Pinecone, right?

play34:03

So we have options over here

play34:05

and all of this, you would do the choices,

play34:08

you click the create knowledge base button

play34:10

or if you're using SDK, that's a create knowledge base API

play34:14

and everything is taken care for you.

play34:17

It's automated and fully managed data ingestion

play34:22

using Knowledge Bases for Amazon Bedrock.
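
(Editor's note: a hedged sketch of the SDK path with the bedrock-agent client; every name and ARN below is a placeholder, and the exact configuration shape may differ.)

```python
import boto3

bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")  # assumed region

kb = bedrock_agent.create_knowledge_base(
    name="tax-docs-kb",  # placeholder name
    roleArn="arn:aws:iam::123456789012:role/BedrockKBExecutionRole",  # placeholder
    knowledgeBaseConfiguration={
        "type": "VECTOR",
        "vectorKnowledgeBaseConfiguration": {
            "embeddingModelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v1",
        },
    },
    storageConfiguration={
        "type": "OPENSEARCH_SERVERLESS",  # or Pinecone / Redis Enterprise Cloud
        "opensearchServerlessConfiguration": {
            "collectionArn": "arn:aws:aoss:us-east-1:123456789012:collection/example",
            "vectorIndexName": "kb-index",
            "fieldMapping": {
                "vectorField": "embedding",
                "textField": "text",
                "metadataField": "metadata",
            },
        },
    },
)
print(kb["knowledgeBase"]["knowledgeBaseId"])
```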

play34:27

So now we have our data ingested and ready to use.

play34:30

So the next step is, what does my architecture look like now?

play34:35

It looks like something like this, right?

play34:37

We have knowledge bases now, the data is ready

play34:39

but we still have to query, which we will provide the query,

play34:44

create the embedding, retrieve the context,

play34:47

augment the prompt, provide the foundation model,

play34:50

still do the prompt engineering

play34:52

and then get the response, right?

play34:55

So we still have to do a lot of work.

play34:58

What if we eliminate that

play35:00

and take away some of that heavy lifting as well?

play35:03

So with that, we recently announced

play35:07

two more features, or the APIs.

play35:09

One is retrieve and generate,

play35:11

which will literally retrieve the relevant documents,

play35:15

feed it to the model and give you the response.

play35:18

The second one is retrieve API

play35:20

if you need more customization options.

play35:22

So let's take a look.

play35:25

This is what it'll look like, your whole architecture.

play35:29

The user is asking a question

play35:31

you call retrieve and generate API and you get the response

play35:35

and this retrieve and generate API does the work for you.

play35:40

It'll take your query, create the embeddings

play35:43

with the embeddings model,

play35:44

it will then augment it to your prompt

play35:47

and then it will feed it to the model that you select.

play35:49

Currently we support two models,

play35:51

Claude Instant and the Claude version 2 by Anthropic.

play35:55

So we support these two models that you can select

play35:58

and get the generated response, right?

play36:02

Pretty cool, and then if you say "This is good,

play36:06

but I need more control," right?

play36:09

"I don't want to do the heavy lifting

play36:11

but I still want control, I still want to customize a bit."

play36:15

That's where we have our second API,

play36:18

which is the retrieve API where we enable you

play36:21

and provide you the flexibility as well.

play36:25

Over here we are still helping you

play36:28

where you have your query, it will be, the retrieve API

play36:33

will automatically use the embeddings model,

play36:36

create the embedding for your query

play36:38

and provide you the relevant data

play36:40

or the relevant documents, right?

play36:43

Now what you have to do once you get the relevant documents

play36:46

is do the prompt augmentation, you have flexibility,

play36:50

what instructions you want to provide to your prompt

play36:52

based on the model

play36:54

and literally use any model provided by Amazon Bedrock

play36:59

or maybe, you know, you might have a custom model

play37:01

or fine-tuned model that you were working

play37:04

within the Bedrock system that you want to use

play37:07

with retrieve API, you can do that, right?
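
(Editor's note: a hedged sketch of that retrieve-then-augment flow — retrieve chunks, build your own prompt, and call a Bedrock model of your choice; the IDs are placeholders.)

```python
import json
import boto3

runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")  # assumed region
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

question = "If I work remotely, which state do I owe taxes?"

# Step 1: retrieve -- embeds the query and returns the most relevant chunks.
results = runtime.retrieve(
    knowledgeBaseId="KB123EXAMPLE",  # placeholder
    retrievalQuery={"text": question},
    retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 5}},
)
context = "\n".join(r["content"]["text"] for r in results["retrievalResults"])
# Each result also carries a relevance score, e.g. r["score"]

# Step 2: augment the prompt yourself and call any Bedrock model you like.
prompt = f"\n\nHuman: Use only this context:\n{context}\n\nQuestion: {question}\n\nAssistant:"
body = json.dumps({"prompt": prompt, "max_tokens_to_sample": 500})
out = bedrock.invoke_model(modelId="anthropic.claude-v2", body=body)
print(json.loads(out["body"].read())["completion"])
```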

play37:10

So we have options

play37:12

and we still want you to take full control

play37:16

of your application, your decision points,

play37:19

which really impact the answers that you are getting, right,

play37:24

from your application.

play37:26

So these are very important concepts.

play37:30

Enough of talk, right?

play37:32

Let's see something in action.

play37:34

Let's see how it looks like in the console.

play37:36

So the demo part now,

play37:42

And I'll share my, now you can see my screen.

play37:45

So where I am is basically I'm on the console

play37:48

and I've searched Amazon Bedrock and then this is Bedrock.

play37:52

And then I have to go to knowledge bases,

play37:54

which is literally under the orchestration,

play37:57

so we click over there,

play37:59

and then it talks about

play38:01

what all you can do from the console.

play38:04

You can create a knowledge base, test the knowledge base

play38:06

and then maybe use it, right?

play38:08

So we'll go through that.

play38:10

In order to make sure,

play38:12

because you know we have limited time,

play38:13

I've already created a knowledge base,

play38:15

but I'll still walk you through

play38:16

if you have to create what you will need to do.

play38:19

So the first part when you see over here

play38:22

is you have to create a knowledge base.

play38:24

So you click on the create knowledge base button.

play38:27

And by the way, whatever I'm showing you over here,

play38:29

you can do it via SDK as well.

play38:32

And then you provide a name.

play38:34

I would suggest that you provide

play38:36

a very meaningful name over here

play38:38

because you might end up having a lot of knowledge bases

play38:41

and you don't want any confusion,

play38:44

also add a meaningful description.

play38:47

And then you need permissions in the role, right?

play38:50

Because when we were talking about knowledge bases,

play38:54

knowledge bases will be accessing your data in S3

play38:57

and then we'll be storing it, creating the embeddings,

play39:02

so they need access to the embeddings model as well.

play39:04

And also they will be storing the embeddings

play39:07

into the vector DB.

play39:08

So they need access to those as well.

play39:11

So make sure that your Amazon Bedrock execution role

play39:15

for knowledge base has all those permissions.

play39:17

And if you're unsure how to do that,

play39:20

simply select create and use a new service role option

play39:25

so that it's automatically created for you.

play39:28

And then we go next, data source.

play39:31

So provide a meaningful data source name,

play39:33

provide an S3 location.

play39:35

I'm just going to type,

play39:38

this is not an existing S3 bucket to be honest,

play39:42

I just provided the name for the demo purpose.

play39:45

And then additional settings,

play39:47

this is where you get to select your chunking strategy.

play39:51

So you can select three options as I mentioned earlier,

play39:54

default, fixed size, no chunking, right?

play39:57

So you have options over here as well.

play39:59

Let's do fixed size and then I can select,

play40:02

maybe I want to do 512.

play40:05

And typically your overlap should be around 10 to 20%,

play40:09

that's what our recommendation is.

play40:13

So since right now we only support the

play40:15

Titan embeddings model, so that's there.

play40:17

And then if you say that, you know what,

play40:20

I don't want to create a vector DB,

play40:23

I want you to create a vector DB on my behalf

play40:26

because we attended that talk

play40:27

and it said fully managed RAG, right?

play40:30

So that's where we have that option

play40:32

that you can select this quick create

play40:34

which will automatically create a vector DB

play40:37

and it will create an OpenSearch serverless.

play40:41

So you can choose that.

play40:43

But you know what, again, we have to give you options.

play40:47

What if you have an existing vector database

play40:50

or an index that you want to populate.

play40:52

So you can literally, if you have an index

play40:54

in OpenSearch Serverless, Pinecone

play40:57

or Redis Enterprise Cloud,

play40:59

you literally provide the details about those

play41:02

and then go next, that's it.

play41:04

- [Ruhaab] And you might have heard

play41:06

in the announcement today that we will be supporting

play41:08

new vector database types soon,

play41:12

including Aurora and MongoDB

play41:14

with likely more vector database options coming.

play41:18

- Yeah, so stay tuned.

play41:20

Okay, and then you review your setup

play41:22

and click on create knowledge base, right?

play41:25

So because we want to be cognizant of the time,

play41:28

we already have a knowledge base.

play41:29

Now this knowledge base is based on the text documents.

play41:33

So when you have created the knowledge base,

play41:35

you will actually land up over here.

play41:37

You will not land on there.

play41:38

It's only when you go back to the knowledge bases

play41:41

where you can see the list.

play41:42

So once you create, you will be here.

play41:44

And then most important point

play41:47

is once you create a knowledge base,

play41:48

you have to click on the sync button,

play41:50

this is very important.

play41:52

Because when we are saying we created a knowledge base

play41:55

that was good but we have to sync.

play41:58

Sync is the actual thing, when you do that,

play42:00

it basically, it will look up

play42:03

all your data that you have in S3,

play42:07

it will pre-process those documents,

play42:10

extract the text out of it,

play42:12

split it into chunking strategy that you provided

play42:16

and then pass it through the embeddings model,

play42:19

then store it in the vector DB.

play42:21

So that's the sync thing.

play42:22

And when you have, let's say new files in your S3,

play42:26

you press the sync button again

play42:27

or call the start ingestion job API via the SDK, right?

play42:30

So it will literally make sure that everything is in sync.

play42:35

So you need to do that, I've already done that.
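
(Editor's note: a hedged sketch of triggering that sync from the SDK; both IDs are placeholders.)

```python
import boto3

bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")  # assumed region

# Kicks off ingestion: fetch from S3, chunk, embed, and store in the vector DB.
job = bedrock_agent.start_ingestion_job(
    knowledgeBaseId="KB123EXAMPLE",  # placeholder
    dataSourceId="DS456EXAMPLE",     # placeholder
)
print(job["ingestionJob"]["status"])
```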

play42:38

And then you want to test it, right?

play42:41

So if you have generate responses,

play42:43

meaning we are using the retrieve and generate API,

play42:47

or when you untoggle this,

play42:50

then it's only retrieval.

play42:51

So let's start with generate responses.

play42:53

I first need to select my model

play42:58

and then I can select either Instant model or Claude v2,

play43:03

We have 2.1 as well, which was also recently announced.

play43:08

Okay, hold on.

play43:13

Yeah, it was too much zoomed out. (laughs)

play43:17

And then you select the model

play43:18

and then you can ask the question.

play43:20

Now since my documents are based on tax data,

play43:24

my knowledge base has all the tax related data set,

play43:27

then I can ask a question.

play43:30

So what I'm asking is "If I work remotely,

play43:33

which state do I owe taxes," right?

play43:36

I mean I just selected that because a lot of us, you know

play43:39

during the pandemic we were working from home.

play43:41

So I was like why not ask something like that?

play43:44

And I know a lot of us are now back to office,

play43:47

which is also cool.

play43:50

Okay, now you click on the show result details

play43:53

and notice some important things.

play43:55

First of all, it is giving me the response very quickly.

play43:59

Second, I can literally see the source attribution

play44:04

right on my screen, right away.

play44:08

So important points.

play44:10

If I work remotely but your employer

play44:13

is located in a particular state,

play44:15

you may owe income taxes to that state

play44:17

and I'm not going to read the entire thing.

play44:19

So, and then if you have to look up the source

play44:22

that the model used is basically right over here, right?

play44:25

So it provides you

play44:26

and if there are multiple sources, you will see

play44:29

multiple tabs over here and I'll show you in a moment.

play44:33

You can literally go to the location

play44:34

of that document as well.
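Under the hood, this console experience maps to the retrieve and generate API. As a minimal sketch, assuming a placeholder knowledge base ID and the Claude v2 model ARN in us-east-1, the equivalent boto3 call looks roughly like this, including reading the source attribution from the citations:

```python
import boto3

# Runtime client for querying Knowledge Bases for Amazon Bedrock.
bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

response = bedrock_agent_runtime.retrieve_and_generate(
    input={"text": "If I work remotely, which state do I owe taxes to?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB1234567890",  # placeholder
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2",
        },
    },
)

print(response["output"]["text"])  # the generated answer
# Source attribution: each citation points back to the chunks it used.
for citation in response["citations"]:
    for ref in citation["retrievedReferences"]:
        print(ref["location"]["s3Location"]["uri"])
```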

So that was retrieve and generate. What if I just want to retrieve? Let me ask the same question, because it will make it easier for us to follow. By default it will retrieve the top five most relevant documents. Then I can click show details and look at them. I'm seeing this particular chunk from p17.pdf, another chunk from another PDF, and another chunk from a third PDF, and this retrieve API also gives you a score. The score depends on which vector DB you are using and which similarity metric that vector DB uses: for example, if you're using cosine similarity, the score will be based on that; if you're using Euclidean distance, it will be based on that instead. So the score option is also there.
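As a sketch of the retrieval-only path, the retrieve API takes the query and the number of results and returns scored chunks; again, the knowledge base ID is a placeholder:

```python
response = bedrock_agent_runtime.retrieve(
    knowledgeBaseId="KB1234567890",  # placeholder
    retrievalQuery={"text": "If I work remotely, which state do I owe taxes to?"},
    retrievalConfiguration={
        "vectorSearchConfiguration": {"numberOfResults": 5}  # top five chunks
    },
)

for result in response["retrievalResults"]:
    # The score's scale depends on the vector DB and its similarity metric
    # (e.g. cosine similarity vs. Euclidean distance).
    print(result["score"], result["location"]["s3Location"]["uri"])
    print(result["content"]["text"][:200])  # first part of the chunk
```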

So that was how you can use it on the console. We also have another demo where I will show you the APIs and how we can do the integration with LangChain. The important point is that if you have to build retrieval augmented generation applications with Knowledge Bases for Amazon Bedrock and use those APIs, you can literally do that end-to-end using the features we just talked about.

But what if I have some dynamic information that I need to fetch in addition to what I have in my knowledge bases? Maybe I have a knowledge base with a lot of order details, but I also want to call some order API that gives me the status of an existing order that is in transit, or do multiple things around that. So what do I need to do if I want to integrate knowledge bases with, let's say, agents or the rest of the Amazon Bedrock ecosystem? Ruhaab, over to you, please walk us through it.

- Great, if we could just go back to the slides, please. As Mani mentioned, you saw how easy it was to get started: uploading documents into a vector database and immediately beginning to interact with them through questions. In just a matter of a few steps, you had a fully functioning RAG application. But if you recall from earlier, the information we're storing in those documents is relatively static, and even though it syncs behind the scenes on some cadence, there are going to be times when you need an application to interact in real time with databases or even other tools and systems. That is what Agents for Amazon Bedrock is really built to do, and a knowledge base can work directly with an agent to enable that use case.

The real power of an agent is that it uses models purpose-built for planning and executing tasks, leveraging reasoning capabilities such as chain-of-thought processing. These state-of-the-art approaches are great when you want your application to interact with an API and automatically generate the dialogue with the user to collect the information needed to execute that API action. You're not having to define the conversational flow; the parameters of the API can be collected automatically, with the model asking for the bits of information needed to fulfill the required arguments of calling an API or interacting with a tool, and orchestrating those actions.

And agents, as I mentioned, can also be combined with knowledge bases. You'd want to do this when you're looking to combine actions with information-retrieval use cases, where you're fetching context from a document and using it as supplemental information while interacting with the tools. And lastly, with agents and knowledge bases, all of these workflows are completely abstracted away as a fully managed service built directly into Amazon Bedrock.

So let's take an example of how this might work when knowledge bases and agents need to work together. As I mentioned, agents can orchestrate a user-requested task. Let's say you ask the application to send a reminder to all your policy holders with pending documents. The model can plan the execution of that task by breaking it down into smaller subtasks, such as getting the claims from a certain time period. It may have to identify what paperwork is even required in this process, which might be the knowledge that's in your knowledge base. And then ultimately it sends the reminder. So the agent can determine the right sequence of steps, facilitate the dialogue, collect the information, and even handle error scenarios along the way. It's an incredibly powerful ability to orchestrate a dynamic sequence of actions across knowledge bases, APIs, and tools, and really offer a seamless experience. For something as simple as asking a machine to send a reminder to all policy holders with pending documents, you can see the complexity that's abstracted away to make that use case possible.
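Invoking an agent programmatically is a single streaming call. This is only an illustrative sketch, assuming you have already created an agent, associated your knowledge base with it, and have placeholder agent and alias IDs:

```python
import uuid
import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

response = bedrock_agent_runtime.invoke_agent(
    agentId="AGENT12345",         # placeholder: your agent ID
    agentAliasId="ALIAS12345",    # placeholder: your agent alias ID
    sessionId=str(uuid.uuid4()),  # one session ID per multi-turn conversation
    inputText="Send a reminder to all policy holders with pending documents",
)

# The agent plans the subtasks, calls APIs and knowledge bases as needed,
# and streams its final answer back in chunks.
for event in response["completion"]:
    if "chunk" in event:
        print(event["chunk"]["bytes"].decode("utf-8"), end="")
```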

And now we'd like to show you how you can also use open source generative AI frameworks like LangChain to build on knowledge bases. I'll have Mani walk us through another demo.

- Yes, and for that I'll be sharing my screen, okay. In this particular demo I'll be using the APIs, because a lot of us here might prefer the API and SDK experience in addition to the console experience. Also, LangChain provides a lot of pre-built wrappers, and why reinvent the wheel when there is something out there we can reuse, combined with the latest features we just showed you? So let me take you on this quick journey. First of all, make sure that you have recent Boto3 and LangChain versions; they have to be equal to or greater than the versions I'm showing here: 0.0.342 for LangChain and 1.33.2 for Boto3.
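One possible way to check that you meet those minimums (the commented pip line is one way to install them; the `packaging` helper is an assumption, not part of the demo):

```python
# pip install --upgrade "boto3>=1.33.2" "langchain>=0.0.342"
import boto3
import langchain
from packaging.version import Version

assert Version(boto3.__version__) >= Version("1.33.2")
assert Version(langchain.__version__) >= Version("0.0.342")
print("boto3", boto3.__version__, "| langchain", langchain.__version__)
```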

The first thing you need to do is the basic setup. As with any AWS service, when you want to use the APIs you first have to create the client. For Bedrock, in this particular case, we need two clients: one is the Bedrock runtime client, which lets us call invoke model; the second is the Bedrock agent runtime client, with which we will call the retrieve API for Knowledge Bases for Amazon Bedrock. That's what we are doing here, along with providing some model parameters, because remember, with the retrieve API you can connect to any model provided by Amazon Bedrock. You provide the model parameters and select your model.
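A minimal sketch of that setup; the model choice and parameter values here are illustrative:

```python
import boto3
from langchain.llms.bedrock import Bedrock

# Client 1: model invocation (InvokeModel).
bedrock_runtime = boto3.client("bedrock-runtime")
# Client 2: the Knowledge Bases retrieve API.
bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

# Illustrative Claude parameters; any Bedrock text model works here.
model_kwargs = {"max_tokens_to_sample": 512, "temperature": 0.0}

llm = Bedrock(
    model_id="anthropic.claude-v2",
    client=bedrock_runtime,
    model_kwargs=model_kwargs,
)
```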

Now, the actual retrieval. If you are planning to use the retrieve API with LangChain, you will first need to initialize a retriever object. You have to import the Amazon Knowledge Bases retriever from LangChain and then use it. What do I need to pass? The number of results, that is, how many relevant documents I want, which is what I'm providing here, and the knowledge base ID, because how else would it know which knowledge base to get information from? Super important. Let me show you how you get the knowledge base ID: if you are using the SDK, you automatically get it in the response from the API and you can use it from there; if you're using the console, you click on the knowledge base and that's where you see the knowledge base ID, so you can literally copy it. I'm using the same knowledge base here as well; a quick thing I just wanted to point out.
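The retriever initialization, as a sketch with a placeholder knowledge base ID:

```python
from langchain.retrievers import AmazonKnowledgeBasesRetriever

retriever = AmazonKnowledgeBasesRetriever(
    knowledge_base_id="KB1234567890",  # placeholder: copy from the console or API response
    retrieval_config={
        "vectorSearchConfiguration": {"numberOfResults": 4}  # how many chunks to fetch
    },
)
```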

So now you have the relevant documents. If I'm building a Q&A application, LangChain provides a retrieval QA chain, and all I need to do is this: I've already declared the large language model and my retriever, so let's use the retrieval QA chain, pass everything together, and then keep asking questions. Let's move to that part now. This is just showing that if you only want to retrieve the documents, get relevant documents lets you do that; but if you're integrating with the retrieval QA chain you don't actually need to call it, all you need is the retriever object. So let's take a look at how we integrate. Okay, so here we have the retrieval QA chain. I provide my language model, which will generate the response; I provide my retriever object, which will supply the relevant documents; and I also provide the prompt. So I have the flexibility to provide my own prompt and my own instructions, and the retrieval QA chain will automatically augment my prompt with the relevant documents.
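Wiring those three pieces together might look like this; the `prompt` object is the template shown in the next snippet:

```python
from langchain.chains import RetrievalQA

qa = RetrievalQA.from_chain_type(
    llm=llm,                    # the Bedrock model declared earlier
    retriever=retriever,        # the knowledge base retriever
    chain_type="stuff",         # stuff the retrieved chunks into the prompt
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt},  # custom prompt, defined below
)
```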

And just so that you are aware, I wanted to show you the prompt template as well. You can provide specific instructions and model-specific prompting. This is very important: you can literally tell the model to only provide information based on the retrieved documents. So, based on your use case, you can provide instructions.
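A sketch of such a template for Claude; the wording of the instructions is just an example:

```python
from langchain.prompts import PromptTemplate

# {context} is replaced with the retrieved chunks, {question} with the query.
template = """Human: Use only the information in the following documents to
answer the question. If the answer is not in the documents, say you don't know.

<documents>
{context}
</documents>

Question: {question}

Assistant:"""

prompt = PromptTemplate(template=template, input_variables=["context", "question"])
```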

Once you have integrated everything with your retrieval QA chain, you literally provide the query to the QA object you created and it will keep giving you answers. You don't have to initialize it over and over again; you can ask query after query and get the answers. And now you have a running Q&A application with just three things: initializing your model, initializing the Amazon Knowledge Bases retriever with LangChain, and then the retrieval QA chain that ties everything together. And we have the application ready.
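Querying the finished chain is then a one-liner per question, for example:

```python
result = qa("If I work remotely, which state do I owe taxes to?")
print(result["result"])  # the generated answer

# With return_source_documents=True you also get the supporting chunks.
for doc in result["source_documents"]:
    print(doc.metadata)  # includes source location and relevance score

# Re-use the same qa object for as many follow-up questions as you like.
print(qa("What is the standard deduction?")["result"])
```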

And you can use the same pattern if you want to build a context-based chatbot with the conversational chains that LangChain provides.
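For instance, a minimal conversational variant might swap in LangChain's conversational retrieval chain with a memory object (illustrative sketch, reusing the `llm` and `retriever` from above):

```python
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

chat = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,  # the same knowledge base retriever
    memory=memory,
)

print(chat({"question": "Which states have no income tax?"})["answer"])
# The follow-up question is resolved against the chat history.
print(chat({"question": "Does that apply to remote workers?"})["answer"])
```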

So do explore, and if you are interested in looking through the same code, we have it on GitHub; we'll share the resources with you. Ruhaab, can you do a recap for us?

- Yeah, absolutely, thanks Mani. It's incredible how quick it is to get started with this, both using LangChain and, if you prefer, the Bedrock console; you saw that there's flexibility of choice while getting to the same output.

If we could go back to the slides, please, let's quickly recap what we covered today. It seems like a lot, and it was, so thank you for listening attentively. We first covered why customization is important and the different approaches to customization, both augmentation and the approaches that actually change the parameters and weights of the model. We talked about retrieval augmented generation, the specific use cases for RAG, and then all of the different components, from data ingestion to the query workflow, and how much of that is completely abstracted away using Knowledge Bases for Amazon Bedrock. And lastly, we talked about how knowledge bases can be further extended when you need them to interact with real-time data, databases, and APIs, where agents and knowledge bases together can really enable that capability.

If you want to take a quick picture of this slide: a lot of the notebooks that Mani showed you earlier, plus a few more examples, will be published on GitHub for you to look at and use as inspiration for your work, along with the documentation, which dives deeper into Knowledge Bases. We hope you check that out.

We just want to say thank you for attending; I hope this was useful. Our LinkedIn handles are here; we would love to hear from you, see how you're using Knowledge Bases, and hear what feedback you have. And don't forget to take the survey in your app so that Mani and I can get invited back next year to give a talk at re:Invent. We really appreciate you coming today. Thank you and have a great conference.

(audience applauds)
