Enhancing search on AWS with AI, RAG, and vector databases (L300) | AWS Events

AWS Events
4 Jun 2024 · 37:08

Summary

TLDR: In this session, AWS specialists Divya and Suman discuss how to build large language model applications on top of RDS Postgres and Aurora Postgres. They explain the differences between foundation models and traditional machine learning models and the various ways these models can be customized. They then demonstrate semantic search between a job description and resumes using the PG Vector extension and a vector store, enabling a more efficient hiring process.

Takeaways

  • 😀 This session, led by AWS specialists Divya and Suman, covers the capabilities of RDS Postgres and Aurora Postgres, and in particular how to use them for RAG (Retrieval-Augmented Generation).
  • 🌟 Foundation models, the basis of generative AI, are trained on vast amounts of unlabeled data and can handle a wide variety of tasks.
  • 🔍 Unlike traditional machine learning, where a separate model is trained for each task, a single foundation model is trained to handle many different tasks.
  • 📚 Trained foundation models come with two challenges: a knowledge cutoff (they cannot answer about events newer than their training data) and hallucinations (confidently returning factually incorrect information).
  • 🛠️ There are three ways to customize a foundation model: fine-tuning, continued pre-training, and retrieval augmentation. Fine-tuning trains the model further on labeled data.
  • 🔑 PG Vector is a Postgres extension that provides vector storage and vector search and can be used with RDS and Aurora Postgres.
  • 📈 PG Vector adds a new vector data type, lets you compute distances between vectors, and supports indexes for faster approximate search.
  • 🏢 The session includes a demo that combines RAG and PG Vector, showing practical applications such as shortlisting resumes and generating summaries of the shortlisted candidates.
  • 🔗 RDS and Aurora Postgres can store and search vector data, alongside the other vector store services available on AWS.
  • 🔧 The code for trying RAG with PG Vector yourself is available in a GitHub repository.
  • 📝 Time is reserved at the end of the session for audience questions to deepen understanding.

Q & A

  • What are the main differences between AWS's RDS Postgres and Aurora Postgres?

    -Both RDS Postgres and Aurora Postgres are database services offered on AWS, but Aurora provides more advanced capabilities than RDS Postgres, with better scalability and performance. Aurora also replicates data across multiple Availability Zones to ensure high availability.

  • What is the PG Vector extension used with RDS and Aurora Postgres?

    -PG Vector is a Postgres extension that provides vector search and vector storage. It adds a dedicated vector data type that is not constrained by the dimension limits of built-in types, enabling efficient search over vector data.
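
    As a rough illustration (not code from the session), enabling the extension and defining a table with a vector column on RDS or Aurora Postgres might look like the sketch below; the endpoint, table name, and embedding dimension are assumptions.

        import psycopg2

        # Placeholder connection details; point these at your own RDS/Aurora endpoint.
        conn = psycopg2.connect(
            host="my-aurora-cluster.cluster-xxxx.eu-west-1.rds.amazonaws.com",
            dbname="postgres",
            user="app_user",
            password="secret",
        )
        cur = conn.cursor()

        # One-time setup: enable the PG Vector (pgvector) extension.
        cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")

        # Hypothetical table; 1536 is a typical text-embedding dimension.
        cur.execute("""
            CREATE TABLE IF NOT EXISTS documents (
                id bigserial PRIMARY KEY,
                content text,
                embedding vector(1536)
            );
        """)
        conn.commit()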

  • What kind of technique is RAG (Retrieval-Augmented Generation)?

    -RAG is a generative AI technique in which, before the model is queried, the system first retrieves relevant data and supplies that context to the model along with the question. This allows the model to provide more relevant information and improves the quality of its responses.
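
    A minimal sketch of that flow, assuming Amazon Bedrock with an Anthropic Claude 2 model and a hypothetical retrieve_context helper that performs the vector-store lookup (neither is the session's exact code):

        import json

        import boto3

        bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

        def answer_with_rag(question, retrieve_context, k=5):
            # 1. Ask the vector store (the "librarian") for the most relevant chunks first.
            context_chunks = retrieve_context(question, k=k)
            # 2. Send the question plus that retrieved context to the large language model.
            prompt = (
                "\n\nHuman: Answer the question using only the context below.\n"
                "Context:\n" + "\n".join(context_chunks)
                + f"\n\nQuestion: {question}\n\nAssistant:"
            )
            body = json.dumps({"prompt": prompt, "max_tokens_to_sample": 500})
            response = bedrock.invoke_model(modelId="anthropic.claude-v2", body=body)
            return json.loads(response["body"].read())["completion"]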

  • What are the main benefits of using a vector store?

    -A vector store is a database built for efficiently searching large amounts of data. Text, images, video, and other data types are converted into numeric arrays and stored, which makes semantic search possible.
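
    As a hedged example of the "convert to numeric arrays" step, the sketch below calls a Bedrock embedding model (the Titan model ID and response field follow Bedrock's documented request format, but treat this as an illustration rather than the session's code):

        import json

        import boto3

        bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

        def embed_text(text):
            # Convert a piece of text into its numeric (embedding) representation.
            response = bedrock.invoke_model(
                modelId="amazon.titan-embed-text-v1",
                body=json.dumps({"inputText": text}),
            )
            return json.loads(response["body"].read())["embedding"]

        vector = embed_text("I am at AWS Dublin conference")
        print(len(vector))  # dimensionality of the embedding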

  • What index types are available for vector search in AWS Aurora Postgres?

    -Two index types are available: Inverted File Flat (IVF Flat) and Hierarchical Navigable Small World (HNSW). Both enable fast approximate nearest-neighbor search.
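
    For reference, creating these two index types with PG Vector might look like the sketch below (table and column names reuse the earlier example; the tuning parameters are illustrative defaults, not recommendations from the session):

        import psycopg2

        conn = psycopg2.connect("host=<your-aurora-endpoint> dbname=postgres user=app_user")  # placeholder DSN
        cur = conn.cursor()

        # IVF Flat: vectors are clustered into buckets ("lists"); a query is compared
        # against the cluster centroids and only the closest bucket is scanned.
        cur.execute("""
            CREATE INDEX ON documents
            USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
        """)

        # HNSW: a layered graph that is navigated from coarse to fine layers
        # (available in recent pgvector versions).
        cur.execute("""
            CREATE INDEX ON documents
            USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);
        """)
        conn.commit()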

  • What is the semantic search made possible by RDS and Aurora Postgres?

    -Semantic search evaluates the semantic similarity between a query and the stored data, rather than matching text exactly, and returns the most relevant results. On RDS and Aurora Postgres it is implemented with the PG Vector extension.
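
    Under the hood this is a nearest-neighbor query over the stored embeddings. A sketch using pgvector's cosine-distance operator <=> and the table from the earlier examples (the query embedding is a placeholder; in practice it comes from the same embedding model used at ingestion time):

        import psycopg2

        conn = psycopg2.connect("host=<your-aurora-endpoint> dbname=postgres user=app_user")  # placeholder DSN
        cur = conn.cursor()

        query_embedding = [0.0] * 1536  # placeholder; normally produced by the embedding model

        cur.execute(
            """
            SELECT content, embedding <=> %s::vector AS cosine_distance
            FROM documents
            ORDER BY embedding <=> %s::vector
            LIMIT 5;
            """,
            (str(query_embedding), str(query_embedding)),
        )
        for content, distance in cur.fetchall():
            print(round(distance, 4), content[:80])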

  • Why do AI startups try to improve vector representations by increasing the number of parameters?

    -The number of parameters determines how precisely a model can represent data. The more parameters there are, the more complex and fine-grained the data features the model can capture.

  • Which services can be used as vector stores on AWS?

    -AWS offers several vector store options, including Aurora, OpenSearch, MemoryDB for Redis, Neptune, and DynamoDB. This makes vector search possible across a range of data types.

  • How does the process of automatically shortlisting applicant resumes with RAG work?

    -With RAG, vector representations are created for the job description and for the resumes, and the resumes that are semantically closest to the job description are identified. A large language model then produces a summary of each selected resume so the hiring manager can make a quick decision.
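
    In the demo this is implemented with LangChain's PGVector vector store on Aurora and embeddings from Bedrock. A condensed sketch, assuming an older langchain release where these classes live under langchain.* (newer releases moved them to langchain_community), with file paths, the job description, and the connection string as placeholders:

        from langchain.document_loaders import PyPDFLoader
        from langchain.embeddings import BedrockEmbeddings
        from langchain.vectorstores.pgvector import PGVector

        CONNECTION_STRING = "postgresql+psycopg2://app_user:secret@<aurora-endpoint>:5432/postgres"  # placeholder

        embeddings = BedrockEmbeddings()  # Bedrock embedding model under the hood

        # Load the sample resumes (paths are placeholders).
        resumes = []
        for path in ["resume_1.pdf", "resume_2.pdf", "resume_3.pdf"]:
            resumes.extend(PyPDFLoader(path).load())

        # Embed the resumes and store them in Aurora via PG Vector.
        store = PGVector.from_documents(resumes, embeddings, connection_string=CONNECTION_STRING)

        # Semantic search: which resumes are closest in meaning to the job description?
        job_description = "Senior PostgreSQL database engineer with AWS experience"  # placeholder
        shortlist = store.similarity_search(job_description, k=2)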

  • How can you evaluate the quality of the responses produced by generative AI built on AWS services?

    -Response quality is evaluated on the accuracy, relevance, and level of detail of the information the model provides. Its ability to respond appropriately to real business needs and user questions is also an important criterion.

Outlines

00:00

😀 Session introduction and generative AI basics

The session hosts, Divya and Suman, are introduced along with the agenda and their backgrounds. Divya is a specialist solutions architect for RDS Postgres who has been with AWS for about seven years, helping customers adopt and optimize RDS and Aurora Postgres. Suman is a principal developer advocate for machine learning on the developer relations team. They discuss the PG Vector capabilities of RDS and Aurora Postgres and what RAG (Retrieval-Augmented Generation) can do, followed by a demo. The concept of generative AI is introduced and contrasted with traditional machine learning models.

05:02

🔧 Customizing models and introducing RAG

Suman explains how models can be customized. A foundation model first learns to understand language and can then be applied to a variety of tasks. Customization can be done in three ways: fine-tuning, continued pre-training, and retrieval augmentation (RAG). With RAG, the question is not sent straight to the model; instead, the system first retrieves relevant context from your data and passes it to the model together with the question. The convenience of RAG and how it leads to more specific responses are emphasized.

10:04

📚 Vector embeddings and turning data into numbers

This part explains how data is converted into numbers, the form a model can understand. The concepts of tokenization and vector embeddings are introduced, along with why they matter and how text is turned into arrays of numbers. The various vector stores available on AWS are also mentioned.

15:05

🔎 The RAG retrieval flow and using Aurora

The retrieval mechanism used in RAG is explained, along with how Aurora works as a vector store. The question is first sent to Aurora, which returns the relevant data, and the model then uses that context to produce its answer. The section also explains how this enables meaningful, semantic search and what the benefits are.

20:06

🛠️ RDS and Aurora Postgres extensions

Postgres extensions, and PG Vector in particular, are highlighted. Installing the PG Vector extension on RDS or Aurora Postgres adds vector search and vector storage capabilities. The new vector data type the extension provides, and why it is useful, are explained.

25:08

🔑 Vector search and index types

The index types used in vector search and their characteristics are explained. The Inverted File Flat (IVF Flat) and Hierarchical Navigable Small World (HNSW) indexes are introduced, along with how they speed up nearest-neighbor search.

30:08

📝 Resume screening demo using RAG and a vector store

A resume screening workflow that uses RAG and a vector store is demonstrated. A job description and a set of resumes are prepared, and the resumes that are semantically closest to the job description are selected. The demo also shows how summaries of the selected resumes are generated and why that is convenient.
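
The summaries of the shortlisted resumes are produced by a LangChain summarize chain over a Claude model hosted on Bedrock, roughly as sketched below (again assuming an older langchain layout; shortlist is the list of documents returned by the PGVector similarity search shown earlier, and the model ID is an assumption):

    from langchain.llms import Bedrock
    from langchain.chains.summarize import load_summarize_chain

    # Claude is invoked through the Bedrock wrapper, as described in the demo.
    llm = Bedrock(model_id="anthropic.claude-v2")
    chain = load_summarize_chain(llm, chain_type="stuff")

    # shortlist: documents returned by the semantic search over the resumes.
    for doc in shortlist:
        summary = chain.run([doc])
        print(summary)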

35:12

🌐 GitHub demo and technical details

The demonstration code is published on GitHub, with details of how it is implemented. Embeddings and the indexing algorithms are also covered in more depth for anyone interested in learning further. As the session wraps up, the audience is invited to ask questions.

Keywords

💡AWS

AWS stands for Amazon Web Services, the division of Amazon that provides cloud computing services. It forms the technical backdrop of the video: the speakers are AWS specialists for RDS Postgres, and the demonstrations in the video are built on AWS services.

💡RDS Postgres

RDS Postgres is one of AWS's managed relational database services and runs the PostgreSQL database engine. The video explains how RDS Postgres and its extensions can be used to improve data handling and search capabilities.

💡Aurora

Aurora is AWS's managed cloud database engine. Like RDS Postgres it offers a relational database service, but it aims for higher performance and scalability. The video shows an example of using Aurora and its vector capabilities for efficient data search.

💡Vector embeddings

Vector embedding is the process of converting text data into arrays of numbers, and it is widely used in natural language processing and machine learning. The video explains how text is converted into numbers and then used to perform meaningful, semantic search.

💡Generative AI

Generative AI is a subfield of artificial intelligence in which models learn from large amounts of unlabeled data and generate new content such as text, images, or video. The video explains how generative AI can perform many different tasks and demonstrates an application of it.

💡Fine-tuning

Fine-tuning is the process of further adapting an existing machine learning model to a specific task. The video explains how fine-tuning with labeled data is used to adjust a general-purpose language model to a particular use case.

💡Continuous pre-training

Continuous (continued) pre-training keeps training a model on additional data so that it absorbs new information. The video explains how this differs from fine-tuning and how it is used to update a model.

💡Retrieval augmentation

Retrieval augmentation is the process of first searching your data for the most relevant information and then generating the answer based on that retrieved context. The video introduces this approach as RAG (Retrieval-Augmented Generation) and shows how it can provide more accurate information.

💡Approximate nearest neighbor search

Approximate nearest neighbor search is an efficient way of finding the vectors in a database that are closest to a query vector. The video explains how it is performed with the PG Vector extension and how it speeds up search.

💡Index

An index is a data structure used to speed up lookups in a database. The video explains how indexes over vector data are used for approximate search and discusses the benefits and trade-offs involved.

💡Hierarchical navigable small world (HNSW)

Hierarchical Navigable Small World is an algorithm for efficient approximate nearest-neighbor search over complex datasets. The video explains how it descends through layers of a graph to find the vectors closest to the query.

💡Inverted file flat index (IVF Flat)

The inverted file flat index is an index type used for vector search. Vectors are clustered into buckets, the query vector is compared with each cluster's centroid, and the closest cluster is then searched for the nearest vectors. The video explains how this index performs approximate nearest-neighbor search.

💡Foundation models

Foundation models are AI models trained on many types of data that can perform a large number of tasks. The video explains how these models learn from diverse data and can provide answers to many kinds of questions.

Highlights

Divya introduces her background as an AWS RDS Postgres solutions architect and the topic of today's session: RAG (Retrieval-Augmented Generation) capabilities and some of the features of PG Vector.

The concept of generative AI is explained and contrasted with traditional machine learning models, emphasizing that foundation models are trained on vast amounts of unlabeled data.

The challenges facing foundation models are discussed, including the knowledge cutoff problem and hallucinations, where a model confidently returns incorrect information.

The idea of model customization is introduced: tailoring the model's responses by providing context specific to your use case.

Suman introduces himself as a principal developer advocate for machine learning and continues the discussion of how to customize large language models for specific use cases.

Fine-tuning is explained: providing labeled data to adjust the model's weights so it adapts to a specific task.

Continued pre-training is discussed: continuing to train the model with unlabeled data.

Retrieval augmentation is introduced, where the system retrieves context from your data before the model answers a question.

Vector embeddings are explained: converting text into arrays of numbers that a model can process.

The role of a vector store is discussed: storing and retrieving the numeric representations of your data.

How the Aurora database can be used as a vector store is shown, along with its role in RAG.

The PG Vector extension is introduced, a Postgres extension that adds vector search and storage capabilities.

The advantages of using indexes for approximate search are explained, including faster queries and the ability to handle high-dimensional data.

The different index techniques are discussed, including the Inverted File Flat (IVF Flat) and Hierarchical Navigable Small World (HNSW) indexes.

A demo of resume screening with Aurora Postgres and PG Vector shows how RAG fits into a real workflow.

The demo shows how a large language model generates summaries of the shortlisted resumes to help hiring managers make quick decisions.

A link to a GitHub repository with the demo code and other related examples is provided so the audience can explore further and try it themselves.

Transcripts

play00:01

first of all I know it has been a long

play00:03

day thank you so much for joining this

play00:05

session it's the last session I promised

play00:07

to keep you entertained for with my

play00:10

co-speaker Su for the next uh 40

play00:13

minutes uh so my name is dya and I'm a

play00:16

specialist Solutions architect uh for

play00:18

RDS postgress and I've been with AWS for

play00:22

almost 7 years now and uh my primary uh

play00:26

uh my day-to-day job involves helping

play00:29

customers

play00:30

working with RDS and Aurora postgress by

play00:33

adopting it and also optimizing their

play00:36

workloads on the same so

play00:38

today uh uh I'm joined by su who uh

play00:43

who's just sitting right there maybe

play00:46

introduce yeah he he probably introduce

play00:49

himself uh later but um so we both will

play00:53

be basically discussing uh the rag

play00:56

capabilities of what is rag uh with

play00:59

gender

play01:01

and uh some of the features of uh PG

play01:04

Vector which you can use with RDS and

play01:07

aess so we'll discuss some of those

play01:09

things and then we have a demo at the

play01:11

end as well so generative AI has taken

play01:15

the world biosom and I'm sure you have

play01:18

heard the statement many times today

play01:19

right so um but these uh models are

play01:23

trained on vast majorities of unlabeled

play01:27

data right and this data can be of

play01:30

different types it could be text it

play01:33

could be images it could be videos and

play01:36

and and much more and instead of having

play01:40

one model for per task like traditional

play01:44

machine learning models how how does

play01:47

foundational models differ so if you see

play01:50

if you see on the left I have

play01:52

traditional ml models where you have one

play01:54

model and you train it for some label

play01:57

data and then it does some task right it

play02:00

could be summarization text generation

play02:03

Q&A this is one task one model this is

play02:06

the traditional machine learning model

play02:07

way of doing things but this could be

play02:11

tedious right because if you have

play02:12

different uh use cases you have to train

play02:15

different models and the label data to

play02:17

get the desired result you

play02:19

want whereas for foundational models uh

play02:23

instead of gathering label data for each

play02:26

model and training multiple models you

play02:28

just provide unlabeled data of different

play02:31

types and this is not just this is vast

play02:35

amounts of data I'm talking about right

play02:37

so you train it against vast amounts of

play02:40

unlabeled data to uh to basically enable

play02:44

you to do the different

play02:47

tasks and uh however there are some

play02:51

challenges there there could be some

play02:54

challenges with traditional foundational

play02:55

models models which could be uh

play02:59

knowledge cut off off and hallucinations

play03:01

I'll go to them one by one but by

play03:04

knowledge cut off I mean if I ask um

play03:07

let's say let's take a very simple

play03:09

example of a chatboard if I ask that

play03:11

chatboard a question of some event which

play03:14

very recently happened right for example

play03:17

uh magnitude of a recent earthquake in

play03:20

Taiwan last week it might be trained for

play03:25

a specific cut off date the model so

play03:28

let's say the model is trained till

play03:30

December 2023 so it does not have that

play03:34

information so there is a knowledge cut

play03:36

off that there it does not know the

play03:38

answer or even I I actually think

play03:41

Hallucination is even worse because

play03:45

hallucinations is when a foundational

play03:47

model gives you back a factually

play03:51

incorrect information with

play03:53

confidence so that is even worse because

play03:56

if I don't get an answer that's fine but

play03:58

if I get something incorrect

play04:00

that's worse so that is what

play04:02

Hallucination is when I don't know the

play04:05

fact but uh I am giving you the

play04:08

information uh the incorrect information

play04:11

and these are the limitations for Lear

play04:13

language language models

play04:15

today but um is there a way to fix it so

play04:20

for now let's take this example of we

play04:22

have a foundational model and you ask it

play04:25

a prompt question let's say explain

play04:27

thermodynamics to a middle school

play04:28

student right what it would do it do is

play04:31

it would um take some pre it is drained

play04:35

trained on some data from the web or or

play04:38

uh you know it takes some data from the

play04:40

web or wherever it has information from

play04:42

the data it's trained on and give you a

play04:45

response

play04:47

but is this

play04:49

response is this response basically

play04:52

somehow customized to your use case or

play04:55

what if you could give it some

play04:57

information about your context like your

play05:02

company uh starts from the last month or

play05:05

uh any other statistics which is more

play05:07

contextual to your use case what if you

play05:10

could do that so that is customization

play05:14

and I'm going to hand it over to suan to

play05:17

explain it further

play05:19

sure

play05:21

okay can you hear me okay yes okay

play05:25

thanks uh Diva so uh hello everyone my

play05:27

name is Su dnat uh I work with the

play05:29

developer relations team uh as a

play05:31

principal developer advocate for machine

play05:33

learning and uh what we're going to talk

play05:36

from now onwards is uh how you can

play05:39

customize your large language model to

play05:42

do something meaningful for your use

play05:44

case so Diva touched upon how

play05:46

foundational models are built and how

play05:48

fundamentally they are different from a

play05:50

traditional machine learning model so

play05:51

before I uh go ahead uh one small

play05:54

example that I want to give you which

play05:55

will build the intuition but what we are

play05:57

going to talk next is that

play06:00

when you think about foundational model

play06:02

think about that you are training a

play06:04

model just to understand the language

play06:08

that it might be English Spanish or

play06:10

whatever it is and after that you are

play06:13

expected to do something which you can

play06:15

leverage from that particular model

play06:17

because that model understand uh English

play06:19

or Spanish right so then you can ask any

play06:22

question for example uh if you are in

play06:25

let's say fifth grade and if I tell you

play06:27

that there is a fire uh on the stage

play06:30

so you can immediately say that this is

play06:33

a negative sentiment although till that

play06:35

fifth grade you are never learned what

play06:37

is negative sentiment or what is

play06:39

positive sentiment but since you have an

play06:41

understanding of English it was very

play06:43

easy for you uh to make that judgment

play06:46

that it is uh a sentiment analys I mean

play06:49

it is a negative sentiment okay so how

play06:51

we can customize our model so the first

play06:55

way to customize is using fine-tuning

play06:58

which is similar to to this what the

play07:00

example I gave is that uh that kid is

play07:02

now able to identify what is a positive

play07:05

sentiment and negative sentiment now

play07:07

imagine that I'm telling that kid

play07:10

hundreds of such examples where that kid

play07:12

just know English as of today but you

play07:15

are giving him or her thousands of

play07:18

examples of positive sentiment and

play07:20

negative sentiment and after that you

play07:22

expect that kid to response to any

play07:24

situation so this is nothing but

play07:26

fine-tuning where you take any large

play07:28

language model from from anthropic claw

play07:30

mrail you name it and then what you do

play07:33

is you fine tune with respect to your

play07:36

data and your data should be labeled

play07:39

unlike uh how the large language model

play07:41

was trained when Diva mentioned that it

play07:44

takes only unlabeled data while you're

play07:46

doing the fine tuning you have to make

play07:47

sure that you give the trained data uh

play07:50

labeled data just like in the kits

play07:53

example uh you are giving us a situation

play07:56

and it you are telling the kid that this

play07:58

is a positive sentiment you're telling

play08:00

another uh situation you're telling that

play08:01

this is a negative sentiment right so

play08:03

this is what uh labeled data is about

play08:06

right you have the uh data along with

play08:08

its label but this is a little

play08:10

cumbersome because you need to have some

play08:12

machine learning expertise and what

play08:14

essentially happens under the hood is

play08:16

when you provide your label data to the

play08:18

large language model there are few

play08:21

weights which got changed in the last

play08:23

layer of that model okay so this is

play08:26

about uh fine-tuning what are the other

play08:29

uh things that we can do the other thing

play08:31

that you can do is you can continue the

play08:33

pre-training because entropic CLA Amazon

play08:37

we have trained our model what we are

play08:38

calling it as a large language model or

play08:40

foundational model what you can do is

play08:42

you take that model and continue that

play08:45

pre-training but now you are doing it

play08:47

for with your data and the difference is

play08:50

with respect to fine tuning is that for

play08:53

continuous training you are actually

play08:56

using unlabeled data just like the data

play08:58

that was used for uh pre-training but

play09:01

again this also needs a lot of resource

play09:03

and background around data science and

play09:06

uh it is time cumbersome right and it's

play09:08

not always you have uh all of your data

play09:11

rightly available in the way uh that a

play09:14

model can understand and the last is

play09:17

retrieval augmentation or retrieval

play09:19

information retrieval and this is where

play09:22

u u rag comes in so what you do is you

play09:25

take the model as it is uh but when you

play09:29

ask the model you don't ask the model

play09:32

the question directly rather you first

play09:35

ask the system uh to get some context

play09:38

from your data and then you take that

play09:40

context and your question and then you

play09:43

ask uh the large language model so the

play09:46

large language model knows your question

play09:49

and it also knows some context you can

play09:51

think of it like this um so let's say

play09:53

this is a library and there's a a

play09:56

gentleman sitting in front and you have

play09:59

some question and you just ask uh some

play10:01

question to that person and uh that

play10:03

person uh goes to through this library

play10:06

and he picks five books and he just

play10:09

gives you that five books and he says

play10:11

that okay take this five books you

play10:13

should have the answer in this so this

play10:15

five books are nothing but the context

play10:17

and the question you already have so now

play10:18

it's your job to find the answer and

play10:21

this is where rag comes in and we will

play10:23

uh when we go through the demo you will

play10:25

get the uh intuition about you know how

play10:27

these two things are connected

play10:30

so today we are going to talk about this

play10:31

but always remember that there are other

play10:34

options available one is fine tuning

play10:36

another is continuous pre-training okay

play10:38

so all right so before we jump into uh

play10:41

rag and how we can Implement that uh we

play10:44

have to first understand uh in what

play10:48

language we can talk to the large

play10:50

language model so the fundamental uh

play10:53

Theory here is we are talking about a

play10:56

language or in this case let's say

play10:58

English but it be any other language and

play11:00

the way that computer understands is

play11:03

numbers right it's not the binary

play11:05

numbers that is the under you know it's

play11:06

at the very uh lowest s level but just

play11:10

an array of numbers so let's say you you

play11:13

give any any sentence like I am at AWS

play11:16

Dublin conference so this number uh this

play11:18

statement you cannot throw to any uh

play11:22

large language model or to a rack system

play11:25

so you have to convert that into numbers

play11:28

so what Vector embeddings is basically

play11:32

is to find and convert this English text

play11:36

into an array of numbers how many how

play11:39

many of you have heard of um tokens in

play11:42

the context of large language model

play11:44

tokens 5 billion yeah so what is that

play11:46

token so token is basically is a as to

play11:51

keep it very simple you can consider

play11:53

that as one word as token but

play11:56

technically it's not one word it could

play11:58

be a middle of the word like let's say

play12:00

going maybe two tokens go and ing

play12:03

because in is a pre you know prefix

play12:05

which can be postfix which can be

play12:06

attached to lot of other words so there

play12:09

is no point to store going doing

play12:12

sleeping at different uh uh in in the

play12:16

form of different embeddings so here in

play12:20

this sentence I am at AWS Dublin this is

play12:23

converted into this array of numbers in

play12:27

reality what would happen is it would we

play12:29

first convert that uh entire sentence

play12:32

into small small tokens so for example I

play12:35

am at and so on and then each of those

play12:39

uh words will be converted into an array

play12:42

of numbers so now this is the numeric

play12:45

representation of your data and the

play12:47

better the representation is the better

play12:49

model that you will get or the better

play12:51

response you will get so all the AI

play12:53

based startups Amazon and all other

play12:56

companies they all are training their

play12:58

model to get a better representation of

play13:01

your

play13:02

text and that's why we uh we see that uh

play13:06

50 billion parameters 70 billion

play13:08

parameters it's nothing but that many

play13:10

number of parameters are used to get uh

play13:14

your representation of your

play13:16

number all right so let's see how things

play13:18

uh work uh in the context of rag so

play13:21

let's say you have your data it could be

play13:23

image data Text data or uh or a video or

play13:27

anything of that sense you don't have to

play13:29

uh think that this only works uh with

play13:32

text although that is uh that is the

play13:34

most common use case but imagine that

play13:36

you have an image you break that image

play13:38

into pixels let's say 512 x 512 every

play13:41

pixel you represent that in the form of

play13:44

array of numbers okay so what you do you

play13:47

take your data so first you have to

play13:49

convert that into an array of numbers

play13:51

right so we have not yet come to asking

play13:53

questions and all so we are in the proc

play13:56

in in the time where we are processing

play13:58

the data so we take one of the machine

play14:00

learning model you we don't care what

play14:02

model it is uh for now let's say that we

play14:05

are using bedrock and one of the

play14:07

embedding model which converts that data

play14:09

into an array of numbers that's all it's

play14:11

the most uh basic thing that you have to

play14:13

do before even start working with Rag

play14:16

and what you do with that data you have

play14:17

to store that data uh into some place

play14:21

and what is that place you have to pick

play14:22

up one of the vector stores so we have

play14:25

lot of vector stores uh functionality

play14:28

available on AWS

play14:29

uh in this case we are going to use

play14:32

Amazon Aurora which is one of our

play14:34

relational database service and div is

play14:37

going to dive deep into that uh in a

play14:38

moment so once you have your data stored

play14:42

in this database we are done uh with the

play14:45

prerequisite uh to get started and

play14:48

leverage large language model so once

play14:50

you have your data into this database

play14:53

now when you ask the

play14:55

question uh do you feel that you will

play14:57

ask the question directly to

play14:59

Bedrock or any large language model yes

play15:01

no how many of how many of you feel yes

play15:04

you should ask the question

play15:06

directly okay how many of you feel no is

play15:10

that you should not yes so we should not

play15:12

ask the question directly to bedrock

play15:14

right because if you had to do that what

play15:16

is what is the point of doing all this

play15:18

exercise what we did earlier so first

play15:20

thing that you will do is you are going

play15:22

to ask that question to Aurora or the

play15:24

vector store uh Vector database of your

play15:27

choice and you can think of it as the

play15:30

librarian and it will give you some

play15:32

context right these are the five books

play15:33

in the earlier example which we took and

play15:37

it will just give you some uh relevant

play15:40

data which is relevant to your question

play15:42

and how it is doing that it is doing

play15:44

that uh because you have stored the data

play15:47

uh into numbers and we are doing a

play15:50

semantic search it's not like a

play15:53

uh uh text based search where you say

play15:56

that okay select all employees from this

play15:59

database where employee number is this

play16:02

we don't do that here we are doing a

play16:03

semantic search that means whosoever are

play16:06

closed like it's not exact closed just

play16:08

give me okay so we'll dive into that a

play16:10

little bit late uh when we discuss

play16:12

different uh uh you uh similarity uh

play16:16

mechanisms and now you have the

play16:18

questions and have the context so you

play16:20

can ask bedrock and Bedrock will respond

play16:23

right so this is the whole life cycle of

play16:25

uh rag and who will do this interaction

play16:27

with the database and interaction with

play16:29

your large language model there are lot

play16:31

of uh tools available but uh we are

play16:34

using Lang chain for this demo but uh

play16:37

you can use uh Bedrock agents and there

play16:40

are a lot of third partyy tools but uh

play16:42

langin is mostly uh common in this field

play16:45

so let me give you a quick uh uh 30

play16:47

seconds demo so we have one of our

play16:49

employee Ricardo he's mostly focused on

play16:51

uh open source he's based out of London

play16:53

so he writes newsletter every uh every

play16:56

week so now what I've done is I've Ed

play16:59

all his newsletters uh for last one year

play17:02

and I created a uh PDF so that I can ask

play17:05

questions about the newsletters because

play17:07

his newsletters are very very uh recent

play17:09

and relevant so what Divya mentioned

play17:11

earlier that when you ask some question

play17:13

relevant question like what is the

play17:14

magnitude of earthquake in Taiwan uh it

play17:17

was not able to answer because it's a

play17:18

knowledge cut off but here uh we should

play17:22

be able to get that answer because that

play17:23

newsletters are all very relevant and we

play17:26

are not going to ask questions directly

play17:27

to the large language model but uh we

play17:30

will use a vector store to store the

play17:32

embeddings of this uh uh blogs okay so

play17:36

here if you see uh we you get an option

play17:39

let me see if you can play this can you

play17:41

try to play this

play17:45

uh yeah can you click on play yeah thank

play17:48

you so much so here um you can browse

play17:52

and select uh the PDF or whatever data

play17:55

that you have it could be an S3 bucket

play17:57

as well you can select the large

play17:59

language model that you want to use uh

play18:01

to ask the question and if you see here

play18:04

we asked who who are the top 10 uh open

play18:07

source contributors for that particular

play18:10

year and it could able to fetch that

play18:13

information uh from uh the blogs that

play18:16

was stored in that PDF if you ask the

play18:19

same question to Claude or any large

play18:21

language model it will just give you uh

play18:23

a random answer so this is the benefit

play18:26

so another question that we have asked

play18:29

is when is the open source India

play18:31

conference scheduled so it is saying

play18:32

that the conference is scheduled in

play18:34

October 12th at nimhans convention

play18:37

center in Bangalore so these are the

play18:39

things that you can do when you use rag

play18:42

so it is actually doing a semantic

play18:45

search on the database so when I asked a

play18:49

question that question is not going to

play18:50

bedrock directly it is going to take

play18:53

that uh contextual information from the

play18:56

database uh using the vector store and

play18:59

then it is uh getting the answer from

play19:01

the large language model okay can we go

play19:03

to the next slide

play19:08

please can you go to the next

play19:10

slide yeah just go to the next slide I

play19:12

think it's again playing

play19:16

that and all of this code uh uh uh I'll

play19:20

share it with in the GitHub repo if you

play19:22

want to try this out this is a simple

play19:23

streamlit application and uh uh you can

play19:26

just uh you know just have a look

play19:29

okay so uh now we talked about uh Vector

play19:32

store as Aurora but we do have lot of

play19:35

different uh Vector stores uh like open

play19:38

search uh then Aurora post which is the

play19:41

focus uh today uh then we also have uh

play19:45

memory DB for redis uh Neptune document

play19:49

DB uh and Dynamo DB one more we have

play19:52

it's uh Kendra which is not a vector

play19:54

store but you can also leverage uh

play19:56

Kendra for this but I don't want to dive

play19:58

into all this service uh kind of a uh

play20:00

discussion but uh I'll hand it over to

play20:03

Diva where she will dive into more on

play20:05

Aurora and how you can leverage that

play20:08

yeah

play20:10

much can you hear me again no seems like

play20:13

not now yes okay so thank you San for a

play20:18

great insight into what rag is now let's

play20:22

dive deep into my topic of interest uh

play20:25

RDS and Aurora postgress how you can

play20:27

Leverage the an extension uh to

play20:32

basically have those generative a

play20:33

capabilities right first of all why

play20:36

postgress so how many of you actually

play20:38

use postgress today for our

play20:41

workloads wow okay that's great and how

play20:44

many of you use RDS or Aurora

play20:48

postgress very good very good so um

play20:52

right now the vector capabilities which

play20:54

uh Su actually discussed before is

play20:57

available only for RDS and pogress for

play20:59

the relational side it's not there for

play21:02

the myql bit so we'll get

play21:04

into why postgress so postgress is a the

play21:09

most advanced uh uh open source

play21:12

relational engine and uh it has an

play21:15

extension which is called PG Vector now

play21:19

what are extensions in postgress

play21:20

extension is just a module or a program

play21:23

which can be installed into the existing

play21:26

postgress uh source code to extend its

play21:30

capabilities so you if if because there

play21:33

are a few postgress users in the

play21:35

audience I'm sure you must be using

play21:37

let's say some other extensions like

play21:39

postes for geographical um storing data

play21:42

or using that or PG stat statements or

play21:45

there are a lot of different extensions

play21:48

and PG Vector is one of them to help you

play21:51

with your uh Vector search and Vector

play21:54

storage capabilities so it's an open of

play21:57

course it's it's you can just install it

play21:59

on RDS or Aurora postgress the extension

play22:02

and uh you can use IVF flat or uh hn uh

play22:06

HS snw indexing techniques and uh it

play22:10

works perfectly with the existing uh

play22:12

clients you don't need to do much in RDS

play22:14

and Aurora postgress it's just a command

play22:16

to basically create the extension and is

play22:19

available for both RDS and Aurora so

play22:21

this is the exact command you need to

play22:23

write uh create extension vector and it

play22:27

what it basically does is it creates a

play22:30

new data type of type Vector uh and

play22:35

uh and now you might ask the question

play22:38

but why would we use this because we

play22:41

have had arrays and cube in postgress

play22:45

for a long time now so why should we use

play22:48

vectors or why should we use this

play22:50

extension so the two main reasons which

play22:52

come to my mind are the limitation on

play22:55

the number of parameters what Su was

play22:58

mentioning earlier the number of

play22:59

parameters you can store for um uh for a

play23:03

particular vector and uh and the second

play23:06

one is the vector capabilities so this

play23:11

capability of

play23:12

searching

play23:15

uh the capability of being able to do

play23:17

these searches right so being able to

play23:20

calculate the distance between two

play23:22

vectors so you have uh you have a couple

play23:26

of uh different ways how you can do that

play23:29

because what we are trying to do is we

play23:31

basically have a query Vector right so

play23:34

like for example some gave an example of

play23:36

a question you ask a kid right you have

play23:38

that query Vector that okay uh what

play23:41

would you do if there's a fire on the

play23:42

stage right that is your query but how

play23:46

close is that query to the information I

play23:49

have to the vector embeddings which I

play23:51

have already those numbers right so in

play23:54

order to get the answer so the closer it

play23:56

is to the existing vector Bings that is

play23:59

my answer right so you have these

play24:02

different um ways of calculating the

play24:05

distance between vectors with uh with

play24:08

the PG Vector extension uh with with the

play24:10

vectors you

play24:18

get sorry can you change this okay

play24:22

thanks and

play24:24

uh you can use the searches as it is

play24:29

but what if you need speed right you

play24:32

need it to speed it up because think

play24:35

about it this way if you are storing the

play24:37

vectors and they have a lot of

play24:39

Dimensions millions of Dimensions or the

play24:42

number of Dimensions you can have if you

play24:45

have more dimensions and you searching

play24:48

against it it can take a longer time

play24:51

depending on how much data you have

play24:54

right but and that is because it doesn't

play24:57

exact search so when you are asking uh

play25:02

when you ask a question it matches it

play25:04

exactly to what you have but what if you

play25:08

could do an approximate search but uh

play25:12

faster so using indexes you can do an uh

play25:16

you can do an approximate near nearest

play25:19

neighbor search with some trade-off to

play25:22

the um to the to the answer so basically

play25:27

you might get different different

play25:28

answers if you're using indexes might

play25:31

but it Rec you basically trade off for

play25:34

Speed because you're doing an

play25:36

approximate search and not an exact

play25:38

search it uh it would be

play25:41

faster and uh at this time PG Vector

play25:44

supports two types of indexes inverted

play25:47

file flat and hierarchal navigable small

play25:49

world indexes so let's try to understand

play25:53

these two index types right so the IVF

play25:56

flat indexing

play25:59

uh what we do is you have uh you have

play26:02

your query Vector so let's let's uh

play26:05

let's keep that side for a minute but

play26:07

those are all the vectors on the top so

play26:09

the first uh the first box which you see

play26:12

that is all the vectors you have okay

play26:15

now what you do as a first step is

play26:17

cluster the vectors into buckets each

play26:20

having a centroid and by centr I mean

play26:24

one one vector which is uh which is

play26:27

let's say equidistant to all of them

play26:28

right so you have the centroid and then

play26:32

you take your query vector and find the

play26:35

nearest centroid by comparing the

play26:37

distance to each of those centroids so

play26:40

you take the query Vector there and you

play26:41

compare it with all the centroids and by

play26:44

comparing it you understand that okay

play26:47

this uh this Vector which the query

play26:50

Vector is closest to this bucket so as

play26:54

you see here we are taking a bucket of

play26:56

vectors and it is very clear that this

play26:58

this is an approximate nearest neighbor

play27:01

search

play27:03

right uh the second type is a bit more

play27:07

complex than the first one but I will

play27:08

try my best to explain it so you have

play27:12

the hierarchical navigable small worlds

play27:14

come from navigable small worlds which

play27:17

basically means you have different

play27:20

graphs G different layers of graph

play27:24

graphs this is the first layer right

play27:27

this is the top most layer where the

play27:30

vectors are further away from each other

play27:34

right so in this first layer you have

play27:37

your query Vector just like the previous

play27:39

example and you have an entry point this

play27:42

entry point is defined already right you

play27:46

enter via this entry point into the

play27:48

graph and you calculate the distance of

play27:51

this query Vector to all the points here

play27:54

so if you see the query Vector was

play27:57

closest to to the second point in the

play27:59

first layer right now we move to the Now

play28:02

we move to that point because we know

play28:04

this is closest to my query Vector in

play28:06

the first

play28:07

layer then I have the second layer where

play28:11

now I calculate the distance again of

play28:15

the query Vector with the other points

play28:18

right so now my point of Interest has

play28:22

moved to the point below because that is

play28:25

closer to my query vector and as as we

play28:28

go from layer to layer these vectors are

play28:31

closer to each other so now when we go

play28:34

to the to the lowest layer we see that

play28:37

okay that red dot that red uh dot which

play28:42

you see is the closest to my query

play28:44

Vector so this is the result which I

play28:46

will return and that's why it's

play28:48

hierarchical because it has these layers

play28:50

and we have to calculate it this way at

play28:52

at each

play28:54

layer now for the next part uh Suman

play28:58

will go over a a demo for you for using

play29:02

this this knowledge which we which we

play29:05

have explained just now about vectors

play29:07

and using Aurora pogress for resume

play29:09

screening thank you thanks so

play29:14

much okay so this is a very complex

play29:17

topic today thank uh uh these uh

play29:19

architectures uh and the uh algorithm

play29:22

that is being used for sorting uh if you

play29:25

are if you have worked in distributed

play29:27

system databases you have B B3 b+3 if

play29:31

you have worked in that so it's similar

play29:33

to that but this is used in vector

play29:36

embeddings or vector stores all right so

play29:38

what we are going to do now is I'll

play29:41

quickly show you uh whatever we have

play29:43

learned so far U and we will conclude

play29:46

with that in next 2 minutes is how these

play29:50

things work uh in practice so imagine

play29:53

that you are a hiring manager or an HR

play29:56

farm and you would like to uh short list

play30:01

few of the candidates for your uh

play30:03

company for a particular position now

play30:06

you got hundreds of rums so now you want

play30:08

to make your system little bit uh more

play30:11

efficient by using Rag and Vector stores

play30:15

to short list the top two or top three

play30:17

candidates and you can start uh the

play30:19

interview process so first before I show

play30:22

you this uh don't take this example as

play30:24

is this is just to give you an intuition

play30:26

that what are possible uh what are the

play30:28

things that are possible but you can

play30:30

imagine the same thing in a chatbot in

play30:33

any of the uh third party companies that

play30:36

uh you might be using or any of the apps

play30:38

that you have in your phone in fact with

play30:40

phone uh I just forgot to mention uh how

play30:43

many of you use Amazon app in your

play30:46

phone yeah have you ever tried that um

play30:50

uh image search option when you search

play30:52

for a product have you tried that yes

play30:55

not many okay so when you go out from

play30:57

this room uh you just open your app

play31:01

instead of you go to the search button

play31:02

where you type your product that you are

play31:04

looking for you will see that there will

play31:06

be an icon where you can just click and

play31:08

you can use your camera to take a

play31:10

picture so you take anything like your

play31:12

mobile phone pen whatever you find just

play31:14

take a picture you will find that it

play31:16

will search and it will give you that

play31:17

product we typically use in in in the in

play31:20

in the Western countries uh you will see

play31:22

that they people use that at Walmart or

play31:24

Target because you can search and check

play31:27

the price and you can get the price

play31:29

match in the store itself so under the

play31:32

hood we are using uh Vector search uh in

play31:35

that uh you know when you are doing that

play31:36

search okay just to give you that

play31:38

context all right so uh this is uh a

play31:41

quick demo so what we are going to do

play31:43

now is we will be putting some job

play31:46

description uh and then that job

play31:49

description is nothing but is our query

play31:52

right so this is our job description and

play31:54

then we will also upload some rums and

play31:57

and then we will try to find what Diva

play32:00

was mentioning semantic search that

play32:02

which of the resumés have a semantically

play32:05

closed with respect to the job

play32:06

description and after that we will say

play32:09

that okay these are the two or three

play32:10

resumés we can Define how many uh uh

play32:14

resumes we want to short list we will

play32:16

also use a large language model to do a

play32:19

summary of those shortlisted candidates

play32:22

so that uh you know when a hiring

play32:24

manager gets that resume he or she will

play32:27

also get a summary of that candidate

play32:29

okay so that's a pipeline so here again

play32:32

this is also available in GitHub you can

play32:34

try it out uh of your own in your

play32:36

account so first what we will do is uh

play32:39

we will go and check for one of the uh

play32:44

opening or whatever is the job

play32:46

description and you can just uh copy

play32:50

that and uh we are going to paste that

play32:53

uh here so I also put one tab so that

play32:56

you don't have to search for that when

play32:57

you trying it you can just click on that

play32:59

you will see uh some job description uh

play33:01

downloaded in your browser and then uh

play33:05

we will say that we want the top two

play33:07

rums and these are the rumes sample RS

play33:11

uh this is also available in the GitHub

play33:13

so that you don't have to search for

play33:14

resumes or open a rack just to uh test

play33:17

this out and then uh you can select uh

play33:21

the large language model of your

play33:23

choice and uh that's it okay so that's

play33:26

that's what uh you have to do and uh now

play33:31

what is happening is at this point it is

play33:33

now creating the embeddings of all those

play33:36

resumés and your job description and

play33:38

then it is once it is done it is storing

play33:41

it in Aurora and then using post uh uh

play33:45

and PG Vector as an extension it is able

play33:47

to do a semantic search so here if you

play33:51

see it has short listed uh two rumes

play33:54

it's still going on yeah so these are

play33:56

the two rums which we got as a context

play33:59

from that semantic search and we also

play34:01

got some summary so this summary will

play34:04

help uh the hiring manager to take a

play34:06

quick decision before even uh opening

play34:08

the resume and this summary is coming

play34:11

from the large language model directly

play34:12

because you are going to let's say

play34:15

entropic or CLA in this case and saying

play34:17

that okay this is the vector for a

play34:20

person this is nothing but resume for

play34:23

for me as a human just can you just

play34:25

summarize that and it is able to

play34:27

summarize that person okay so uh that's

play34:30

all uh I have uh for uh uh this

play34:34

particular demo but I just thought to

play34:36

give you a run through of this

play34:38

architecture okay so we had the job

play34:41

description we had the bunch of rumes

play34:44

and we also defined how many resumes we

play34:47

want now this is if you are uh uh from

play34:50

the development side uh I just uh gave

play34:54

few of the uh Cote snippet so that you

play34:57

get the intuition so here how we are

play35:00

creating this uh embeddings uh we are

play35:03

using Lang chain which we have seen and

play35:06

in the earlier in the session and we are

play35:08

using the PG Vector uh class and the way

play35:11

that we create is just simply PG Vector

play35:14

from documents and then we list all

play35:16

those seven rumes so that's all you

play35:19

don't have to do anything of your own

play35:22

and then um you just do a semantic

play35:25

search in your vector store by giving

play35:28

the job description and the uh number of

play35:30

rumes that you want so again we are

play35:32

using uh the vector store object which

play35:35

we created here using Lang chain and

play35:38

once that is done what we want is to

play35:41

create a summary and the way that we

play35:43

create a summary is using again Lang

play35:46

chain summarize and we can just provide

play35:49

the large language model which in this

play35:51

case is bedrock if you see the bedrock

play35:54

in the brackets I have not mentioned

play35:55

anything but if when you look at the

play35:58

code you will see that we have used

play36:00

Cloud so we just mentioned that cloud I

play36:02

model ID of the cloud in the Bedrock

play36:05

function call that's it so we can just

play36:08

get the summary and return that function

play36:10

so this the whole thing is wrapped

play36:12

around uh a streamlit application which

play36:14

you can find in this uh GitHub repo and

play36:18

this GitHub repo uh also have U one

play36:21

block post uh which I published couple

play36:23

of weeks back uh which goes a little

play36:25

deep into embeddings and how different

play36:28

uh uh indexing algorithms work which

play36:31

diva mentioned uh in his presentation

play36:34

that how uh how efficiently you can do

play36:37

that semantic search so if you want to

play36:39

dive deep and learn uh that a little bit

play36:41

you may like to visit that so but

play36:43

everything is there in this GitHub repo

play36:45

it has this particular demo but also

play36:47

there are a bunch of other demos so if

play36:49

you have any questions uh feel free to

play36:51

ask we will be uh around but uh thank

play36:53

you so much for your time uh really

play36:55

appreciate uh hanging on uh till the

play36:58

last session on the uh for this uh for

play37:00

this event thank you thanks everyone

play37:03

yeah

Related tags

AWS, RDS, Postgres, AI, seminar, expert, machine learning, vector search, language model, customization