Enhancing search on AWS with AI, RAG, and vector databases (L300) | AWS Events
Summary
TLDR: In this session, AWS specialists Divya and Suman discuss how to enhance large language model applications with RDS PostgreSQL and Aurora PostgreSQL. They cover the differences between foundation models and traditional machine learning models, and the main ways these models can be customized. They close with a demo that uses the pgvector extension and a vector store to run a semantic search between a job description and a set of resumes, making the screening step of a hiring process more efficient.
Takeaways
- 😀 This session, presented by AWS specialists Divya and Suman, covers RDS PostgreSQL and Aurora PostgreSQL features, in particular how to use them with RAG (Retrieval-Augmented Generation).
- 🌟 Foundation models, the basis of generative AI, are trained on vast amounts of unlabeled data and can handle many different tasks.
- 🔍 Unlike traditional machine learning, which trains a separate model per task, a single foundation model is trained to handle a variety of tasks.
- 📚 Trained foundation models have two key limitations: a knowledge cutoff (they cannot access information newer than their training data) and hallucinations (confidently returning factually incorrect information).
- 🛠️ There are three ways to customize a foundation model: fine-tuning, continued pre-training, and retrieval augmentation. Fine-tuning trains the model further on labeled data.
- 🔑 pgvector is a PostgreSQL extension that provides vector search and vector storage capabilities, and it can be used with both RDS and Aurora PostgreSQL.
- 📈 pgvector adds a new `vector` data type, can compute distances between vectors, and supports indexes for fast approximate nearest-neighbor search.
- 🏢 The session includes a demo that combines RAG with pgvector to shortlist resumes against a job description and generate summaries of the selected candidates.
- 🔗 RDS and Aurora PostgreSQL can serve as vector stores, alongside AWS's other vector store options.
- 🔧 The demo code is available in a GitHub repository so attendees can try it out themselves.
- 📝 The session ends with a Q&A segment to deepen the audience's understanding.
Q & A
What is the main difference between RDS Postgres and Aurora Postgres on AWS?
-Both RDS Postgres and Aurora Postgres are database services offered on AWS, but Aurora provides more advanced capabilities than RDS Postgres, with better scalability and performance. Aurora also offers replication across multiple Availability Zones to ensure high availability.
What is the pgvector extension used with RDS and Aurora Postgres?
-pgvector is a PostgreSQL extension that provides vector search and vector storage capabilities. It enables efficient search over a dedicated vector data type without the dimension limits of the built-in alternatives.
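As a concrete illustration, here is a minimal SQL sketch of using pgvector. The table and column names are made up for the example, and the 3-dimensional vectors are toy values; real embedding models output hundreds or thousands of dimensions.

```sql
-- Enable the extension (available on RDS and Aurora PostgreSQL).
CREATE EXTENSION vector;

-- The extension adds a "vector" data type; the dimension is fixed per column.
CREATE TABLE documents (
    id        bigserial PRIMARY KEY,
    content   text,
    embedding vector(3)     -- toy dimension; e.g. 1536 for many real models
);

INSERT INTO documents (content, embedding) VALUES
    ('about dogs', '[0.9, 0.1, 0.0]'),
    ('about cars', '[0.0, 0.2, 0.9]');

-- Semantic search: the rows nearest to a query embedding (L2 distance).
SELECT content
FROM documents
ORDER BY embedding <-> '[0.8, 0.2, 0.1]'
LIMIT 5;
```

Without an index, the `ORDER BY ... <->` query is an exact scan over all rows; the index types discussed below make it an approximate but much faster search.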
What kind of technology is RAG (Retrieval-Augmented Generation)?
-RAG is a generative AI technique in which the system retrieves relevant data before querying the model, and the model then answers using that retrieved context. This lets the model provide more relevant information and improves the quality of its responses.
What is the main benefit of using a vector store?
-A vector store is a database for efficiently searching large amounts of data. Text, images, video, and other types of data are converted into numeric arrays and stored, enabling semantic search.
What index types are available in AWS Aurora Postgres?
-In AWS Aurora Postgres, two index types are available: Inverted File Flat (IVFFlat) and Hierarchical Navigable Small World (HNSW). These enable fast approximate nearest-neighbor search.
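To make the two index types concrete, here is a sketch of how they are created with pgvector (HNSW requires pgvector 0.5.0 or later; the table and column names are illustrative):

```sql
-- IVFFlat: clusters rows into "lists"; more lists = finer partitioning.
CREATE INDEX ON documents
    USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);

-- At query time, probe more lists for better recall at some cost in speed.
SET ivfflat.probes = 10;

-- HNSW: a layered proximity graph; build-time parameters trade build cost
-- and memory for recall.
CREATE INDEX ON documents
    USING hnsw (embedding vector_l2_ops) WITH (m = 16, ef_construction = 64);

-- Larger ef_search = better recall, slower queries.
SET hnsw.ef_search = 40;
```

Both indexes perform approximate nearest-neighbor search, so results can differ slightly from an exact scan.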
What is semantic search as implemented on RDS and Aurora Postgres?
-Semantic search uses natural-language understanding to evaluate the semantic similarity between a query and the stored data, returning the most relevant results. On RDS and Aurora Postgres it is implemented with the pgvector extension.
Why do AI companies keep increasing parameter counts to improve vector representations?
-The number of parameters determines how precisely a model can represent data. The more parameters a model has, the more complex and fine-grained the features it can capture.
Which AWS services can be used as vector stores?
-AWS offers several vector store options, including Aurora, OpenSearch, MemoryDB for Redis, Neptune, and DynamoDB, supporting search across a variety of data types.
How does the RAG-based process for automatically shortlisting applicants' resumes work?
-With RAG, vector representations are created for the job posting and the resumes, and the resumes that are semantically closest to the posting are identified. A large language model then generates summaries of the selected resumes so the hiring manager can make decisions quickly.
How can you evaluate the quality of responses from generative AI built on AWS services?
-Response quality is evaluated by the accuracy, relevance, and level of detail of the information the model provides, as well as by how well its responses address real business needs and users' questions.
Outlines
😀 Session introduction and generative AI basics
The session hosts Divya and Suman are introduced, along with the session agenda and the speakers' backgrounds. Divya has worked for seven years as a specialist solutions architect for RDS Postgres at AWS, helping customers adopt and optimize RDS and Aurora Postgres. Suman is a principal developer advocate for machine learning on the developer relations team. They discuss the pgvector features available on RDS and Aurora Postgres and the capabilities of RAG (Retrieval-Augmented Generation), followed by a demo. The concept of generative AI is introduced and contrasted with traditional machine learning models.
🔧 Model customization and an introduction to RAG
Suman explains the ways a model can be customized. A foundation model first learns to understand language, and can then be applied to a variety of tasks. Customization can be done in three ways: fine-tuning, continued pre-training, and retrieval augmentation (RAG). With RAG, instead of sending a question directly to the model, the system first retrieves relevant context from your data and passes it to the model along with the question. The convenience of RAG, and how it leads the model to give more specific responses, are emphasized.
📚 Vector embeddings and turning data into numbers
This part explains how data is converted into a numeric form that a model can process. Tokenization and vector embeddings are introduced, along with why they matter and how a model converts text into arrays of numbers. The various vector stores available on AWS are also mentioned.
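The embedding idea described above can be sketched in a few lines. The tiny hand-made vectors below are purely illustrative; real embedding models produce vectors with hundreds or thousands of dimensions.

```python
import math

# Toy 3-dimensional "embeddings"; real models output 1,000+ dimensions.
embeddings = {
    "dog":   [0.9, 0.1, 0.0],
    "puppy": [0.8, 0.2, 0.1],
    "car":   [0.0, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """Cosine similarity: 1.0 = same direction, near 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(embeddings["dog"], embeddings["puppy"]))  # high (≈0.98)
print(cosine_similarity(embeddings["dog"], embeddings["car"]))    # low (≈0.02)
```

Semantically related words end up with nearby vectors, which is what makes semantic search over a vector store possible.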
🔎 The RAG retrieval flow and using Aurora
The retrieval mechanism behind RAG is explained, with a detailed look at how Aurora works as a vector store. A question is first sent to Aurora, which returns the relevant data, and the model then answers using that context. The section also covers how this enables meaningful, semantic search and what its benefits are.
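The request flow described above can be sketched as follows. The vector store and the language model are stubbed out with plain Python so the control flow is visible; in the session's actual demo these roles are played by Aurora PostgreSQL (via pgvector) and a Bedrock model called through LangChain.

```python
DOCUMENTS = [
    "The open source conference is scheduled for October 12th.",
    "Ricardo writes a weekly open source newsletter from London.",
    "pgvector adds a vector data type to PostgreSQL.",
]

def retrieve_context(question, k=2):
    """Stub retriever: rank documents by word overlap with the question.
    A real vector store would rank by embedding distance instead."""
    q_words = set(question.lower().split())
    scored = sorted(DOCUMENTS,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def ask_llm(prompt):
    """Stub LLM: just reports on the prompt so we can see what it receives."""
    return f"(model answer based on a prompt of {len(prompt)} chars)"

def rag_answer(question):
    context = retrieve_context(question)           # step 1: ask the "librarian"
    prompt = ("Answer using only this context:\n"
              + "\n".join(context)
              + f"\n\nQuestion: {question}")
    return ask_llm(prompt)                         # step 2: ask the model

print(rag_answer("When is the open source conference scheduled?"))
```

The point of the sketch is the shape of the pipeline: retrieve context first, then send context plus question to the model, rather than asking the model directly.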
🛠️ Extensions for RDS and Aurora Postgres
PostgreSQL extensions, and pgvector in particular, are highlighted. Installing the pgvector extension on RDS or Aurora Postgres adds vector search and vector storage capabilities. The new `vector` data type the extension provides, and why it is useful, are explained.
🔑 Vector search and index types
The index types used in vector search and their characteristics are explained. Inverted File Flat (IVFFlat) and Hierarchical Navigable Small World (HNSW) indexes are introduced, along with how they speed up nearest-neighbor search.
📝 Resume-screening demo with RAG and a vector store
A resume-screening workflow built on RAG and a vector store is demonstrated. Given a job description and a set of resumes, the resumes that are closest in meaning are selected, and summaries of the selected resumes are then generated, showing how this streamlines the process.
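The screening pipeline above can be sketched with stand-ins for the expensive parts. The similarity score and the summarizer below are stubs of my own; the actual demo computes real embeddings with Bedrock, stores them in Aurora PostgreSQL with pgvector, and summarizes with a large language model.

```python
# Toy resume corpus; keys and contents are invented for the example.
RESUMES = {
    "alice": "python postgresql aws machine learning five years",
    "bob":   "java spring frontend react three years",
    "cara":  "postgresql aws databases sql ten years",
}

def score(job_description, resume):
    """Stub similarity: fraction of job-description words found in the
    resume. pgvector would compute a vector distance here instead."""
    wanted = set(job_description.split())
    return len(wanted & set(resume.split())) / len(wanted)

def shortlist(job_description, top_n=2):
    """Rank all resumes against the job description; keep the top N."""
    ranked = sorted(RESUMES,
                    key=lambda name: score(job_description, RESUMES[name]),
                    reverse=True)
    return ranked[:top_n]

def summarize(name):
    """Stub for the LLM-generated summary sent to the hiring manager."""
    return f"{name}: {RESUMES[name][:40]}..."

jd = "aws postgresql machine learning"
for candidate in shortlist(jd):
    print(summarize(candidate))
```

The structure mirrors the demo exactly: embed, rank by similarity, shortlist, then summarize the shortlist.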
🌐 GitHub demo and technical details
The demo code is published on GitHub, with details on how it is implemented. Embeddings and the indexing algorithms are also discussed for attendees who want to go deeper. The session closes by inviting questions from the audience.
Keywords
💡AWS
💡RDS Postgres
💡Aurora
💡Vector embeddings
💡Generative AI
💡Fine-tuning
💡Continued pre-training
💡Retrieval augmentation
💡Approximate nearest neighbor search
💡Index
💡Hierarchical navigable small world (HNSW)
💡Inverted file flat index (IVFFlat)
💡Foundation models
Highlights
Divya introduces her background as an AWS RDS Postgres solutions architect and today's topics: RAG (Retrieval-Augmented Generation) capabilities and some features of pgvector.
Explains the concept of generative AI and how it differs from traditional machine learning models, emphasizing that foundation models are trained on vast amounts of unlabeled data.
Discusses the challenges facing foundation models, including the knowledge cutoff problem and hallucinations, where a model confidently returns incorrect information.
Introduces model customization: tailoring a model's responses by supplying context specific to your use case.
Suman introduces himself as a principal developer advocate for machine learning and continues the discussion of how to customize large language models for specific use cases.
Explains fine-tuning: adjusting a model's weights by providing labeled data so it adapts to a specific task.
Discusses continued pre-training: continuing to train the model on your own unlabeled data.
Introduces retrieval augmentation, where the model first retrieves contextual information from your data before answering a question.
Explains vector embeddings: converting text into arrays of numbers that a model can process.
Discusses the role of a vector store: storing and retrieving numeric representations of your data.
Shows how the Aurora database can be used as a vector store and explains its role in RAG.
Introduces pgvector, a PostgreSQL extension that enhances vector search and storage capabilities.
Explains the advantages of approximate search with indexes, including faster queries and better handling of high-dimensional data.
Discusses the two index types: Inverted File Flat (IVFFlat) and Hierarchical Navigable Small World (HNSW).
Walks through a resume-screening demo using Aurora Postgres and pgvector, showing RAG in a real workflow.
Demonstrates using a large language model to generate summaries of the shortlisted resumes to help hiring managers decide quickly.
Shares a GitHub repository with the demo code and related examples for the audience to explore and practice with.
Transcripts
First of all, I know it has been a long day, so thank you so much for joining this session. It's the last session, and I promise my co-speaker Suman and I will keep you entertained for the next 40 minutes. My name is Divya, and I'm a specialist solutions architect for RDS Postgres. I've been with AWS for almost seven years now, and my day-to-day job involves helping customers working with RDS and Aurora Postgres, both adopting them and optimizing their workloads on them.

Today I'm joined by Suman, who is sitting right there; he'll introduce himself later. Together we'll be discussing what RAG is, the RAG capabilities you get with generative AI, and some of the features of pgvector which you can use with RDS and Aurora Postgres. We'll discuss some of those things, and then we have a demo at the end as well.

Generative AI has taken the world by storm, and I'm sure you have heard that statement many times today. These models are trained on vast amounts of unlabeled data, and that data can be of different types: text, images, videos, and much more. Instead of having one model per task like traditional machine learning models, how do foundation models differ? On the left I have traditional ML models, where you have one model, you train it on some labeled data, and it does one task. That could be summarization, text generation, Q&A: one task, one model. That is the traditional machine learning way of doing things, but it can be tedious, because if you have different use cases you have to train different models, each with its own labeled data, to get the results you want. With foundation models, instead of gathering labeled data for each model and training multiple models, you just provide unlabeled data of different types, and vast amounts of it. You train against vast amounts of unlabeled data to enable the model to do the different tasks.

However, there are some challenges with foundation models: knowledge cutoff and hallucinations. I'll go through them one by one. By knowledge cutoff I mean this: take the very simple example of a chatbot. If I ask that chatbot a question about an event that happened very recently, for example the magnitude of last week's earthquake in Taiwan, it might not know, because the model is trained up to a specific cutoff date. Say the model is trained on data up to December 2023; then it simply does not have that information. That is the knowledge cutoff: the model does not know the answer. I actually think hallucination is even worse, because a hallucination is when a foundation model gives you back factually incorrect information with confidence. That is worse, because if I don't get an answer, that's fine, but if I get something incorrect, that's a real problem. That is what hallucination is: the model doesn't know the fact, but it gives you incorrect information anyway. These are the limitations of large language models today.

But is there a way to fix this? Take this example: we have a foundation model and you give it a prompt, say, "Explain thermodynamics to a middle school student." What it would do is draw on the data it was trained on, from the web or wherever its information comes from, and give you a response. But is that response in any way customized to your use case? What if you could give it some information about your context, like your company's stats from the last month, or any other statistics that are more contextual to your use case? That is customization, and I'm going to hand it over to Suman to explain it further.
Sure. Okay, can you hear me? Yes? Okay, thanks, Divya. Hello everyone, my name is Suman Debnath. I work with the developer relations team as a principal developer advocate for machine learning, and what we're going to talk about from now on is how you can customize your large language model to do something meaningful for your use case. Divya touched upon how foundation models are built and how fundamentally they differ from a traditional machine learning model.

Before I go ahead, one small example that will build some intuition for what we're going to talk about next. When you think about a foundation model, think of it as training a model just to understand a language, whether that's English, Spanish, or whatever it is. After that, you can leverage that model for further tasks, because the model understands English or Spanish. For example, if you are in, say, fifth grade and I tell you there is a fire on the stage, you can immediately say that this is a negative sentiment, even though up to fifth grade you were never taught what negative sentiment or positive sentiment is. Because you have an understanding of English, it was very easy for you to make that judgment: it is a negative sentiment.

So how can we customize our model? The first way is fine-tuning, which works like the example I gave: that kid, who today just knows English, becomes able to identify positive and negative sentiment because you give him or her thousands of examples of positive sentiment and negative sentiment, and after that you expect the kid to respond to any situation. That is exactly what fine-tuning is: you take any large language model, from Anthropic's Claude, Mistral, you name it, and you fine-tune it with your data. Your data has to be labeled, unlike the way the large language model was originally trained; Divya mentioned that pre-training uses only unlabeled data, but when you're fine-tuning you have to supply labeled training data, just like in the kid example, where you present a situation and tell the kid "this is a positive sentiment," then another situation and "this is a negative sentiment." That is what labeled data is: data along with its label. But this is a little cumbersome, because you need some machine learning expertise. What essentially happens under the hood is that when you provide your labeled data to the large language model, a few weights get changed in the last layers of the model.

What else can we do? The other option is to continue the pre-training. Anthropic, Amazon, and others have trained their models, which we call large language models or foundation models. You can take such a model and continue its pre-training, but now with your own data. The difference from fine-tuning is that for continued pre-training you use unlabeled data, just like the data used for the original pre-training. But again, this needs a lot of resources and a data science background, it is time-consuming, and you don't always have all of your data readily available in a form the model can understand.

The last option is retrieval augmentation, or retrieval of information, and this is where RAG comes in. You take the model as it is, but when you want something from the model, you don't ask it the question directly. Instead, you first ask the system to get some context from your data, and then you take that context together with your question and send both to the large language model. So the model knows your question and it also has some context. You can think of it like this: there's a library with a gentleman sitting at the front, and you ask that person a question. He goes through the library, picks five books, hands them to you and says, "Take these five books; you should find the answer in here." Those five books are nothing but the context, and you already have the question; now it's your job to find the answer. That is where RAG comes in, and when we go through the demo you will get the intuition for how these two pieces connect. Today we are going to focus on RAG, but always remember that there are other options available: one is fine-tuning, another is continued pre-training.
All right. Before we jump into RAG and how we can implement it, we first have to understand what language we can use to talk to a large language model. The fundamental idea is this: we speak a language, in this case let's say English, though it could be any other language, while the way a computer understands things is numbers. Not just the binary digits at the very lowest level, but, for our purposes, arrays of numbers. Say you have a sentence like "I am at the AWS Dublin conference." You cannot throw that statement at a large language model or a RAG system as-is; you have to convert it into numbers. That is what vector embeddings do: they convert this English text into an array of numbers.

How many of you have heard of tokens in the context of large language models? Quite a few, yeah. So what is a token? To keep it very simple, you can consider a token to be one word, but technically it is not exactly a word; it could be part of a word. For example, "going" might be two tokens, "go" and "ing," because "ing" is a suffix that attaches to a lot of other words, so there is no point in storing "going," "doing," and "sleeping" as entirely separate embeddings. So in this example the sentence "I am at AWS Dublin" gets converted into an array of numbers. In reality, the whole sentence is first split into small tokens, for example "I," "am," "at," and so on, and then each of those tokens is converted into an array of numbers. That is the numeric representation of your data, and the better the representation, the better the model, or the better the response you will get. That's why all the AI startups, Amazon, and the other companies are training their models to get a better representation of your text, and that's what those 50-billion- or 70-billion-parameter figures mean: that many parameters are used to build the representation of your text. All right, so let's see how things
work in the context of RAG. Say you have your data; it could be image data, text data, video, or anything of that sort. You don't have to assume this only works with text, although that is the most common use case. Imagine you have an image: you break it into pixels, say 512 by 512, and you represent every pixel as an array of numbers. So what do you do? You take your data and first convert it into arrays of numbers. We haven't come to asking questions yet; we're still at the stage of processing the data. We take a machine learning model, and for now we don't care which one; let's say we're using Bedrock and one of its embedding models, which converts the data into arrays of numbers. That's all: it's the most basic thing you have to do before you even start working with RAG.

And what do you do with that data? You have to store it somewhere, and that somewhere is one of the vector stores. We have a lot of vector store functionality available on AWS; in this case we are going to use Amazon Aurora, which is one of our relational database services, and Divya is going to dive deep into that in a moment. Once your data is stored in this database, we are done with the prerequisites to get started and leverage a large language model.

Now, once your data is in the database and you want to ask a question: do you feel you should ask the question directly to Bedrock, or to any large language model? How many of you feel yes, you should ask directly? How many of you feel no? Yes, we should not ask the question directly to Bedrock, because if we did, what would be the point of all the work we just did? The first thing you do is ask that question to Aurora, or to the vector store or vector database of your choice. You can think of it as the librarian: it will give you some context, the five books from the earlier example we took, the data relevant to your question. And how does it do that? Because you have stored the data as numbers, and we do a semantic search. It is not a text-based search, where you say "select all employees from this database where the employee number is X." We don't do that here. A semantic search means: give me whatever is close in meaning, not an exact match. We'll dive into that a little later, when we discuss the different similarity mechanisms. Now you have the question and you have the context, so you can ask Bedrock, and Bedrock will respond.

That is the whole life cycle of RAG. And who handles this interaction with the database and with your large language model? There are a lot of tools available. We are using LangChain for this demo, but you could use Bedrock Agents, and there are a lot of third-party tools; LangChain is just the most common in this field.
Let me give you a quick 30-second demo. We have one of our colleagues, Ricardo; he is mostly focused on open source, he is based out of London, and he writes a newsletter every week. What I've done is take all his newsletters from the last year and create a PDF, so that I can ask questions about the newsletters, because they are very recent and relevant. Divya mentioned earlier that when you ask a question like "what was the magnitude of the earthquake in Taiwan," the model was not able to answer because of its knowledge cutoff. Here we should be able to get an answer, because the newsletters are all very current, and we are not going to ask questions directly to the large language model; we will use a vector store to hold the embeddings of these posts.

Here, you can browse and select the PDF, or whatever data you have; it could be an S3 bucket as well. You can select the large language model you want to use to ask the question. If you see here, we asked who the top 10 open source contributors were for that particular year, and it was able to fetch that information from the posts stored in that PDF. If you asked the same question to Claude, or any large language model on its own, it would just give you a random answer. That is the benefit. Another question we asked is when the open source India conference is scheduled, and it says the conference is scheduled for October 12th at the NIMHANS Convention Center in Bangalore. These are the things you can do when you use RAG: it is actually doing a semantic search on the database. When I asked a question, that question did not go to Bedrock directly; it first pulled the contextual information from the database using the vector store, and then got the answer from the large language model. All of this code I'll share in the GitHub repo if you want to try it out; it's a simple Streamlit application, so you can just have a look.

Now, we've talked about Aurora as the vector store, but we do have a lot of different vector stores: OpenSearch; Aurora Postgres, which is the focus today; MemoryDB for Redis; Neptune; DocumentDB; and DynamoDB. One more we have is Kendra, which is not a vector store, but you can also leverage Kendra for this. I don't want to turn this into a service-by-service discussion, though, so I'll hand it over to Divya, who will dive deeper into Aurora and how you can leverage it.
Can you hear me? Okay. Thank you, Suman, for a great insight into what RAG is. Now let's dive deep into my topic of interest: RDS and Aurora Postgres, and how you can leverage an extension to get those generative AI capabilities. First of all, why Postgres? How many of you actually use Postgres today for your workloads? Wow, okay, that's great. And how many of you use RDS or Aurora Postgres? Very good. Right now, on the relational side, the vector capabilities Suman discussed are available only for RDS and Aurora Postgres; they are not there for MySQL.

So why Postgres? Postgres is the most advanced open source relational engine, and it has an extension called pgvector. What are extensions in Postgres? An extension is just a module or program that can be installed into an existing Postgres installation to extend its capabilities. Since there are a few Postgres users in the audience, I'm sure you use other extensions, say PostGIS for storing geographic data, or pg_stat_statements; there are a lot of different extensions, and pgvector is one of them, helping you with vector search and vector storage capabilities. It's open source, and you can simply install the extension on RDS or Aurora Postgres. You can use IVFFlat or HNSW indexing techniques, and it works perfectly with your existing clients. You don't need to do much on RDS or Aurora Postgres: it's just one command to create the extension, and it is available on both RDS and Aurora. This is the exact command you need to run: CREATE EXTENSION vector. What it does is create a new data type called vector.

Now you might ask: why would we use this, when Postgres has had arrays and the cube type for a long time? The two main reasons that come to my mind are, first, the limitation on the number of dimensions, which Suman was touching on earlier: how many values you can store for a particular vector; and second, the vector capabilities themselves: being able to do these searches, being able to calculate the distance between two vectors. You have a couple of different ways of doing that, because what we are trying to do is take a query vector. For example, Suman gave the example of a question you ask a kid: "What would you do if there's a fire on the stage?" That is your query. But how close is that query to the information I have, to the vector embeddings I already have, those arrays of numbers? The closer an existing vector embedding is to the query, the more likely it is my answer. So with the pgvector extension you get these different ways of calculating the distance between vectors.
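For reference, pgvector exposes these distance calculations as SQL operators; a minimal sketch on toy 3-dimensional vectors:

```sql
-- pgvector distance operators, shown on toy vectors:
SELECT
    '[1,2,3]'::vector <-> '[3,2,1]'::vector AS l2_distance,        -- Euclidean
    '[1,2,3]'::vector <#> '[3,2,1]'::vector AS neg_inner_product,  -- -(a · b)
    '[1,2,3]'::vector <=> '[3,2,1]'::vector AS cosine_distance;    -- 1 - cos
```

Which operator you use in `ORDER BY` should match the operator class you later build your index with.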
You can use these searches as they are, but what if you need speed? Think about it this way: if you are storing vectors with a lot of dimensions, and you search against them, the search can take a long time depending on how much data you have, because by default it is an exact search: when you ask a question, it matches exactly against everything you have. But what if you could do an approximate search, faster? Using indexes, you can do an approximate nearest-neighbor search, with some trade-off in the answer: you might get slightly different answers when using an index, but you trade accuracy for speed, because you're doing an approximate rather than an exact search, and it will be faster.

At this time pgvector supports two types of indexes: Inverted File Flat (IVFFlat) and Hierarchical Navigable Small World (HNSW). Let's try to understand these two index types. With IVFFlat indexing, you have your query vector, but let's set that aside for a minute; the first box you see represents all the vectors you have. The first step is to cluster the vectors into buckets, each having a centroid, and by centroid I mean one vector which is, let's say, equidistant from all the members of the bucket. Then you take your query vector and find the nearest centroid by comparing its distance to each of the centroids. By comparing, you learn which bucket's centroid the query vector is closest to, and you search within that bucket of vectors. It is very clear that this is an approximate nearest-neighbor search.
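The two-step IVFFlat search just described (find the nearest centroid, then scan only that bucket) can be sketched in a few lines. The buckets below are hand-made; a real index builds them offline with a clustering step, and pgvector's `probes` setting corresponds to the `probes` parameter here.

```python
import math

def l2(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Pre-clustered buckets: centroid -> member vectors (built offline,
# e.g. with k-means, when the index is created).
buckets = {
    (0.0, 0.0): [(0.1, 0.2), (0.2, 0.1), (0.0, 0.3)],
    (5.0, 5.0): [(4.9, 5.1), (5.4, 4.8), (5.0, 5.3)],
}

def ivfflat_search(query, probes=1):
    # Step 1: find the nearest centroid(s).
    nearest = sorted(buckets, key=lambda c: l2(query, c))[:probes]
    # Step 2: do an exact scan only inside those buckets.
    candidates = [v for c in nearest for v in buckets[c]]
    return min(candidates, key=lambda v: l2(query, v))

print(ivfflat_search((5.1, 5.0)))  # → (4.9, 5.1)
```

Because only the probed buckets are scanned, a true nearest neighbor sitting in another bucket can be missed; raising `probes` trades speed back for recall.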
The second type is a bit more complex than the first, but I will try my best to explain it. Hierarchical Navigable Small World indexes come from navigable small worlds, which basically means you have different layers of graphs. The first, topmost layer is where the vectors are furthest apart from each other. In this first layer you have your query vector, just like in the previous example, and you have an entry point, which is defined in advance. You enter the graph via this entry point and calculate the distance from the query vector to the points there. Say the query vector is closest to the second point in the first layer; we then move to that point, because we know it is the closest to my query vector in the first layer. Then I have the second layer, where I again calculate the distance from the query vector to the other points, and now my point of interest moves to the point below, because that is closer to my query vector. As we go from layer to layer, the vectors get closer to each other, and when we reach the lowest layer we see that the red dot you see is the closest to my query vector, so that is the result that will be returned. That's why it's hierarchical: it has these layers, and we have to calculate it this way at each layer.

Now for the next part, Suman will go over a demo that uses everything we have just explained about vectors, using Aurora Postgres for resume screening. Thank you. Thanks so much.
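The layer-by-layer greedy walk Divya just described can be sketched as follows. The two tiny layers below are hand-made (a real HNSW index has many layers built probabilistically, and searches a beam of candidates rather than a single point), but the descent logic is the same: walk greedily toward the query in a sparse top layer, then refine in denser layers.

```python
import math

def l2(a, b):
    """Euclidean (L2) distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two layers over the same points; layer 0 (bottom) contains all points.
layers = [
    # top layer: sparse, long-range links
    {(0, 0): [(8, 8)], (8, 8): [(0, 0)]},
    # bottom layer: dense, short-range links
    {(0, 0): [(1, 1), (8, 8)],
     (1, 1): [(0, 0), (2, 2)],
     (2, 2): [(1, 1), (8, 8)],
     (8, 8): [(0, 0), (2, 2), (9, 9)],
     (9, 9): [(8, 8)]},
]

def hnsw_search(query, entry=(0, 0)):
    current = entry
    for graph in layers:                 # walk the top layer first
        improved = True
        while improved:                  # greedy: move to any closer neighbor
            improved = False
            for neighbor in graph.get(current, []):
                if l2(query, neighbor) < l2(query, current):
                    current = neighbor
                    improved = True
        # the closest point found in this layer becomes the
        # entry point for the next (denser) layer
    return current

print(hnsw_search((9.2, 8.9)))  # → (9, 9)
```

The sparse top layer lets the search cross the space in a few hops; the dense bottom layer then homes in on the nearest neighbor.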
much okay so this is a very complex
topic today thank uh uh these uh
architectures uh and the uh algorithm
that is being used for sorting uh if you
are if you have worked in distributed
system databases you have B B3 b+3 if
you have worked in that so it's similar
to that but this is used in vector
embeddings or vector stores all right so
what we are going to do now is I'll
quickly show you uh whatever we have
learned so far U and we will conclude
with that in next 2 minutes is how these
things work uh in practice so imagine
that you are a hiring manager or an HR
farm and you would like to uh short list
few of the candidates for your uh
company for a particular position now
you got hundreds of rums so now you want
to make your system little bit uh more
efficient by using Rag and Vector stores
to short list the top two or top three
candidates and you can start uh the
interview process so first before I show
you this uh don't take this example as
is this is just to give you an intuition
that what are possible uh what are the
things that are possible but you can
imagine the same thing in a chatbot in
any of the uh third party companies that
uh you might be using or any of the apps
that you have in your phone in fact with
phone. In fact, I just forgot to mention: how many of you use the Amazon app on your phone? And have you ever tried the image-search option when you search for a product? Not many, okay. When you go out of this room, open the app: instead of going to the search box where you type the product you are looking for, you will see an icon you can tap to use your camera and take a picture. Take a picture of anything, your mobile phone, a pen, whatever you find, and it will search and bring up that product. In Western countries you will see people use this at Walmart or Target, because you can search, check the price, and get a price match in the store itself. Under the hood, we are using vector search when you do that search; I just wanted to give you that context. All right, so this is a
quick demo. What we are going to do is provide a job description, and that job description is nothing but our query. We will also upload some resumes, and then we will use the semantic search Diva was describing to find which of the resumes are semantically close to the job description. After that we will say, okay, these are the top two or three resumes (we can define how many resumes we want to shortlist), and we will also use a large language model to produce a summary of each shortlisted candidate, so that when a hiring manager gets a resume, he or she also gets a summary of that candidate.
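Creating the embeddings, the first step of that pipeline, just means mapping each resume and the job description to a fixed-length numeric vector. The demo uses a real embedding model through LangChain; purely to illustrate the shape of the data, here is a hypothetical hash-based toy embedding (it carries no semantic meaning; it only shows that every text becomes a vector of the same dimensionality):

```python
def toy_embed(text, dim=8):
    # Toy stand-in for an embedding model: hash each word into one of
    # `dim` buckets, count hits, then L2-normalize. A real model (the one
    # the demo calls through LangChain) produces dense semantic vectors.
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    norm = sum(x * x for x in vec) ** 0.5 or 1.0
    return [x / norm for x in vec]

emb = toy_embed("senior postgres database engineer")
print(len(emb))  # -> 8, the same for any input text
```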
Okay, so that's the pipeline. Again, this is also available on GitHub, so you can try it out on your own in your account. First,
we will go and check one of the openings, whatever the job description is; you can just copy it, and we are going to paste it here. I also put in a tab so that you don't have to search for it when you try this; you can just click on it and you will see a job description downloaded in your browser. Then we will say that we want the top two resumes. These are the sample resumes, also available in the GitHub repo, so that you don't have to go hunting for resumes just to test this out. Then you can select the large language model of your choice, and that's it. Okay, so
that's all you have to do. Now, what is happening at this point is that it is creating the embeddings of all those resumes and of your job description; once that is done, it stores them in Aurora, and then, using Postgres with the PG Vector extension, it is able to do a semantic search. Here, if you see, it has shortlisted two resumes; it's still running. So these are the two resumes we got back as context from the semantic search, and we also got a summary. This summary will help the hiring manager make a quick decision before even opening the resume, and it comes directly from the large language model: you go to, say, Anthropic's Claude in this case and say, okay, this vector is nothing but a person's resume, can you just summarize it? And it is able to summarize that candidate. Okay, so that's
all I have for this particular demo, but I thought I'd give you a run-through of this architecture. So: we had the job description, we had a bunch of resumes, and we also defined how many resumes we wanted. Now, if you are from
the development side, I included a few code snippets so that you get the intuition. Here is how we create the embeddings: we are using LangChain, which we saw earlier in the session, and its PGVector class; the way we create the store is simply PGVector.from_documents, passing in all seven resumes. That's all; you don't have to do anything else on your own.
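The real store-and-search is delegated to LangChain's PGVector class backed by Aurora, which you cannot run without a database at hand. As a self-contained sketch of what the semantic-search step does, here is a tiny in-memory stand-in, with a simple word-overlap embedding in place of a real model:

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for an embedding model: a sparse word-count vector.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def shortlist(job_description, resumes, k=2):
    # Rank resumes by similarity to the job description, return the top k.
    q = embed(job_description)
    ranked = sorted(resumes, key=lambda r: cosine(q, embed(r)), reverse=True)
    return ranked[:k]

resumes = [
    "java developer spring microservices",
    "postgres database administrator aurora replication",
    "marketing manager social media campaigns",
]
top = shortlist("senior postgres aurora database engineer", resumes, k=2)
print(top[0])  # -> "postgres database administrator aurora replication"
```

In the demo, the vector store's similarity_search call plays the role of shortlist here, with the distance computed inside Postgres by the PG Vector extension.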
And then you just do a semantic search in your vector store by giving it the job description and the number of resumes you want; again, we use the vector-store object we created here with LangChain. Once that is done, we want to create a summary, and the way we create a summary is again with LangChain's summarize chain, to which we just provide the large language model, in this case Bedrock. In the snippet I have not spelled out the arguments in the Bedrock call, but when you look at the code you will see that we used Claude: we just pass Claude's model ID in the Bedrock function call. That's it; we get the summary back and return it from the function.
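The summary in the demo comes from a large language model (Claude, called through Amazon Bedrock and LangChain's summarize chain), which again needs live AWS access to run. As a crude, self-contained stand-in for that last step, here is an extractive "summary" that just keeps the first couple of sentences:

```python
def naive_summary(text, max_sentences=2):
    # Toy extractive summary: keep the first few sentences. The demo
    # replaces this step with an LLM call (Claude via Amazon Bedrock).
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return ". ".join(sentences[:max_sentences]) + "."

resume = ("Ten years of Postgres experience. Led an Aurora migration. "
          "Also enjoys hiking.")
print(naive_summary(resume))  # -> "Ten years of Postgres experience. Led an Aurora migration."
```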
So the whole thing is wrapped in a Streamlit application, which you can find in this GitHub repo. The repo also has a blog post, which I published a couple of weeks back, that goes a little deeper into embeddings and into how the different indexing algorithms Diva mentioned in his presentation work, and how efficiently you can do that semantic search; if you want to dive deep and learn a little more, you may like to visit it. But everything is in this GitHub repo: it has this particular demo, and there are a bunch of other demos as well. If you have any questions, feel free to ask; we will be around. Thank you so much for your time; we really appreciate you hanging on until the last session of this event. Thank you. Thanks, everyone.