GraphRAG: The Marriage of Knowledge Graphs and RAG: Emil Eifrem
Summary
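TLDR This video argues that developers build better applications when they leverage individual data points connected by relationships, not just data points in isolation. It traces the evolution of search and explains how Google used PageRank to improve the quality of search results. It then covers the introduction of the knowledge graph and says that GraphRAG (RAG, Retrieval-Augmented Generation, backed by a knowledge graph) is ushering in the next era of web search. It highlights the benefits of GraphRAG: higher accuracy, easier development, and better explainability. It closes with how to create knowledge graphs and a demo of a new tool, the Knowledge Graph Builder.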
Takeaways
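- 🔍 The talk stresses that developers build better applications by using not just individual data points but data points connected by relationships.
- 🌐 On the evolution of search: in the beginning there were dozens of web search companies such as AltaVista, all returning too many results; Google solved that problem with PageRank and went on to dominate the search market.
- 📊 In 2012 Google introduced the Knowledge Graph, upgrading its search technology to store not only the text and the links between documents but also the concepts embedded in those documents.
- 🧠 The most recent step in search combines LLMs (Large Language Models) with knowledge graphs, which the speaker says ushers in the GraphRAG era of web search.
- 💡 GraphRAG is RAG that uses a knowledge graph on the retrieval path. It can be combined with other technologies such as vector search.
- 🛠️ GraphRAG makes applications such as customer support bots easier to develop, retrieving relevant information from sources such as support articles stored as text files.
- 📈 According to the research cited, GraphRAG substantially improves answer accuracy and makes it possible to answer a class of questions that baseline RAG cannot.
- 🛡️ Beyond easier development, GraphRAG improves explainability, auditability, and governance, which benefits the business as well.
- 🚀 Knowledge graphs can be created from structured, unstructured, and mixed data; mixed data accounts for the majority of enterprise production use cases.
- 🛠️ A new tool, the Knowledge Graph Builder, has been released for creating knowledge graphs from unstructured data. It extracts data from PDF files, Wikipedia pages, YouTube links, and similar sources and builds a graph from it.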
Q & A
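How should developers use data points to build better applications?
-Developers can build better applications, and build applications better, by using not just individual data points but individual data points connected by relationships.
How did Google advance web search?
-Google applied PageRank, a graph algorithm, at the scale of the internet and the web, so that the most important search results show up early, on the first page.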
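To make the idea of a graph algorithm ranking pages concrete, here is a minimal sketch of PageRank by power iteration. It is a textbook simplification, not Google's production algorithm; the damping factor of 0.85, the iteration count, and the four-page toy web are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by power iteration.

    links maps each page to the list of pages it links to.
    The toy data and parameter values are illustrative assumptions.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly across its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "c", and "c" links back to "a".
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
top = max(ranks, key=ranks.get)
```

In this toy web the most linked-to page, "c", ends up with the highest rank, which is the effect the answer above describes: important pages surface first.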
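What is Google's Knowledge Graph?
-Google's Knowledge Graph is a data structure that stores not only the text and the links between documents but also the concepts embedded in those documents. It is a graph in which both the nodes and the relationships can carry many key-value properties.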
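The data structure described above, nodes and relationships that both carry key-value properties, can be sketched in a few lines. This is a minimal in-memory illustration, not Google's or any graph database's implementation; the class names, the `LOCATED_IN` relationship type, and the property values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: str
    rel_type: str
    end: str
    # Relationships carry key-value properties too, not only nodes.
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Tiny in-memory property graph (illustrative only)."""

    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, node_id, label, **properties):
        self.nodes[node_id] = Node(node_id, label, properties)

    def relate(self, start, rel_type, end, **properties):
        self.relationships.append(Relationship(start, rel_type, end, properties))

    def neighbors(self, node_id, rel_type=None):
        # Follow outgoing relationships, optionally filtered by type.
        return [r.end for r in self.relationships
                if r.start == node_id and (rel_type is None or r.rel_type == rel_type)]


graph = PropertyGraph()
graph.add_node("moscone", "Place", name="Moscone Center", kind="convention center")
graph.add_node("sf", "City", name="San Francisco")
graph.relate("moscone", "LOCATED_IN", "sf", source="illustrative")
```

Calling `graph.neighbors("moscone")` walks from the place node to the city node, the kind of lookup that sits behind a search-result panel mixing free text with structured facts.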
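What is GraphRAG?
-GraphRAG is RAG that uses a knowledge graph on the retrieval path. It does not have to use only a knowledge graph; it can be combined with other technologies such as vector search.
What are the benefits of using GraphRAG?
-The benefits are higher answer accuracy, easier development, and better explainability and governance.
How do you create a knowledge graph?
-Knowledge graphs are created from three types of data: structured, unstructured, and mixed. Mixed data is where the majority of enterprise production use cases are.
How do you create a knowledge graph from unstructured data?
-Use the new Knowledge Graph Builder tool: drag and drop PDF files, or add YouTube links, Wikipedia links, and similar sources, and it builds the graph.
How hard is it to develop with GraphRAG?
-Once a knowledge graph exists, building a GraphRAG application is comparatively easy. Creating the knowledge graph in the first place has a learning curve, and the tooling has been immature, so that part can feel hard.
What types of graphs are there in GraphRAG?
-The talk names two types, lexical graphs and domain graphs, both of which are relevant when creating graphs from unstructured information.
What did the Knowledge Graph Builder demo show?
-The demo showed uploading sources such as PDF files, YouTube links, and Wikipedia pages, building a knowledge graph from them, and visualizing the result.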
Outlines
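🌐 The evolution of internet search and an introduction to GraphRAG
The speaker says he has dedicated his professional life to helping developers build better applications by leveraging individual data points connected by relationships. He plans to talk about how that applies in the world of LLMs and generative AI, but starts with the evolution of search. In the mid-90s, dozens of web search companies used keyword-based text search, and as the web grew they returned an overwhelming number of results, a problem he calls the AltaVista effect: a query returned thousands of hits and users had to page through them to find anything relevant. Google solved this with PageRank, a graph algorithm: it is eigenvector centrality applied at the scale of the internet, and it surfaces the most important results first. In 2012 Google introduced the Knowledge Graph, upgrading its search technology to store not only the text and the links between documents but also the concepts embedded in those documents. Users see the Knowledge Graph as the visual panel in search results that combines unstructured text with structured information. Most recently, at Google I/O, Google showed a travel-itinerary example that combines an LLM with the knowledge graph, pointing to the next era of web search.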
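🔍 Definition of GraphRAG and its benefits
GraphRAG is defined as RAG that uses a knowledge graph on the retrieval path. That does not mean using only a knowledge graph; it may be combined with other technologies such as vector search. Take a customer service bot whose support articles are stored as text files. Each article becomes a node with the text stored as a property on that node, and a relationship records that the article is about a particular Wi-Fi product. When a user asks a question, vector search retrieves the core documents, and the application then traverses the graph from those documents to pull in more context around them. The result is a more accurate answer. One study cited in the talk, by data.world, found that combining a knowledge graph with vector search made answers about three times more accurate. GraphRAG can also answer a class of questions that vector search alone cannot.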
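The retrieval pattern described above (vector search for the core documents, then a graph walk for more context) can be sketched as follows. This is a minimal illustration under stated assumptions, not the speaker's implementation: the three-dimensional embeddings are made up, the product names and the `about` and `family` mappings are hypothetical, and a real system would use an embedding model and a graph database.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# One node per support article; text and embedding are properties on the node.
# The embeddings are hand-written toy vectors, not model output.
articles = {
    "art1": {"text": "Yellow flashing lights mean the router lost its uplink.",
             "embedding": [0.9, 0.1, 0.0]},
    "art2": {"text": "How to update firmware on the X200 family.",
             "embedding": [0.1, 0.9, 0.0]},
    "art3": {"text": "Resetting the X100 to factory defaults.",
             "embedding": [0.2, 0.2, 0.9]},
}
# Relationships: (article)-[ABOUT]->(product), (product)-[IN_FAMILY]->(family).
about = {"art1": "X200", "art2": "X200-Pro", "art3": "X100"}
family = {"X200": "X200 family", "X200-Pro": "X200 family", "X100": "X100 family"}


def vector_search(query_embedding, k=1):
    """Step 1: nearest articles by cosine similarity (the entry point into the graph)."""
    ranked = sorted(articles,
                    key=lambda a: cosine(query_embedding, articles[a]["embedding"]),
                    reverse=True)
    return ranked[:k]


def expand(seed_ids):
    """Step 2: walk the graph to articles about products in the same family."""
    families = {family[about[a]] for a in seed_ids}
    related = [a for a in articles
               if a not in seed_ids and family[about[a]] in families]
    return list(seed_ids) + related


def build_prompt(question, query_embedding, k=1):
    """Step 3: assemble the expanded context to pass to an LLM with the question."""
    context_ids = expand(vector_search(query_embedding, k))
    context = "\n".join(articles[a]["text"] for a in context_ids)
    return context_ids, f"Context:\n{context}\n\nQuestion: {question}"


ids, prompt = build_prompt(
    "My Wi-Fi lights are flashing yellow and my connection drops. What should I do?",
    [1.0, 0.0, 0.0],  # stand-in for the embedded question
)
```

Here vector search alone would return only the first article; the graph walk adds the firmware article for the same product family while leaving out the unrelated one. If the expansion returned thousands of nodes, a ranking step over the graph structure could trim it to a top K before the LLM call.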
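🛠 Ease of development with GraphRAG and types of data
GraphRAG makes applications easier to develop, with a caveat: creating the knowledge graph has a learning curve. If a knowledge graph already exists, building a RAG application on top of it is easier, but creating the graph in the first place has to be learned. Data falls into three types: structured, unstructured, and mixed (often called semi-structured, a label the speaker rejects). Mixed data, structured data in which some fields are long-form text, covers most enterprise production use cases. Going from structured data to a property graph model is comparatively easy, whereas unstructured data is hard to handle and the tooling has been immature.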
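As a rough illustration of why structured and mixed data map easily onto a property graph, the sketch below turns relational-style rows into nodes and relationships. The table layout, column names, and the `ABOUT` relationship type are hypothetical; the article `body` column stands in for the long-form text field that makes the data "mixed".

```python
# Hypothetical rows as they might come out of a relational database.
product_rows = [
    {"product_id": 1, "name": "X200", "family": "X200 family"},
]
article_rows = [
    {"article_id": 10, "product_id": 1, "title": "Flashing yellow lights",
     "body": "Long-form troubleshooting text goes here."},
]


def rows_to_graph(products, articles):
    """Map rows to nodes and foreign keys to relationships."""
    nodes = {}
    relationships = []
    for row in products:
        # Each row becomes a node; its columns become properties.
        nodes[("Product", row["product_id"])] = {
            "name": row["name"], "family": row["family"]}
    for row in articles:
        key = ("Article", row["article_id"])
        nodes[key] = {"title": row["title"], "body": row["body"]}
        # The foreign key column becomes an explicit relationship.
        relationships.append((key, "ABOUT", ("Product", row["product_id"])))
    return nodes, relationships


nodes, relationships = rows_to_graph(product_rows, article_rows)
```

The mapping is mechanical: rows become nodes, columns become properties, and foreign keys become relationships. The long-form `body` text can then be chunked and embedded so that vector search has an entry point into the graph.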
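🚀 Creating a knowledge graph and the demo
The talk includes a demo of a new tool, the Knowledge Graph Builder. With it you can drag and drop PDF files or add YouTube links, Wikipedia pages, and similar sources to create a graph. In the demo the speaker added a PDF of Andrew Ng's newsletter, the Wikipedia page for OpenAI, and a YouTube episode of swyx and Alessio's podcast, and uploaded them into the Knowledge Graph Builder. The tool splits each document into chunks and extracts the logical concepts from those chunks into the graph, so the whole body of information can be visualized. The speaker also describes the visualized graph as a very valuable debugging tool, because developers can see and work with the data while building an application.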
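The document, chunk, and concept structure shown in the demo can be sketched as below. This is not how the Knowledge Graph Builder works internally, which the talk does not describe; the fixed-size word chunking, the `HAS_CHUNK` and `MENTIONS` relationship names, and the keyword lookup standing in for concept extraction are all assumptions made for illustration.

```python
def chunk_text(text, size):
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


# Stand-in for real concept extraction: a fixed list of known phrases.
KNOWN_CONCEPTS = ["machine learning", "knowledge graph", "vector search"]


def extract_concepts(chunk):
    lowered = chunk.lower()
    return [concept for concept in KNOWN_CONCEPTS if concept in lowered]


def build_lexical_graph(doc_id, text, chunk_size=8):
    """Return (start, relationship, end) edges linking document, chunks, and concepts."""
    edges = []
    for i, chunk in enumerate(chunk_text(text, chunk_size)):
        chunk_id = f"{doc_id}#chunk{i}"
        edges.append((doc_id, "HAS_CHUNK", chunk_id))
        for concept in extract_concepts(chunk):
            edges.append((chunk_id, "MENTIONS", concept))
    return edges


text = ("Machine learning systems retrieve documents with vector search. "
        "A knowledge graph adds explicit relationships between the concepts.")
edges = build_lexical_graph("doc1", text, chunk_size=8)
```

The resulting edges mirror what the demo visualizes: a document node linked to its chunks, and each chunk linked to the concepts found in it.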
Keywords
💡GraphRAG
💡Knowledge graph
💡Vector search
💡PageRank
💡AltaVista effect
💡Ease of development
💡Explainability
💡Structured data
💡Unstructured data
💡Knowledge Graph Builder
Highlights
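Focuses on building better applications by using individual data points connected through relationships.
Discusses the evolution of search technology, in particular the shift from AltaVista to Google.
Google used the PageRank algorithm to solve the AltaVista effect, the problem of too many search results.
Introduces the Knowledge Graph, Google's biggest back-end search upgrade since PageRank.
The knowledge graph improves the accuracy of search results through structured data made of nodes and relationships.
Presents the concept of GraphRAG: RAG (Retrieval-Augmented Generation) that uses a knowledge graph on the retrieval path.
GraphRAG combines vector search with a knowledge graph to improve application accuracy.
Shows how to use GraphRAG to build a customer service bot that gives more accurate answers.
The three main benefits of GraphRAG are higher accuracy, easier development, and better explainability.
Discusses how to create knowledge graphs from structured, unstructured, and mixed data.
Introduces the Knowledge Graph Builder tool, which can extract data from PDFs, Wikipedia pages, and YouTube links.
Gives a live demo of the Knowledge Graph Builder, extracting data from different sources and creating a graph.
Emphasizes how clear the graph representation is and how easy it is for developers to work with while building applications.
Discusses how vector-space and graph-space representations differ and how each is used in search and development.
Explains how GraphRAG uses graph structure to enrich retrieval results with more context.
Emphasizes the importance of knowledge graphs for application explainability and governance.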
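The contrast between graph-space and vector-space representations can be made concrete with the talk's example sentence, "apples and oranges are both fruit". The sketch below is illustrative only: the `IS_A` relationship name is an assumption, and the vector holds made-up numbers rather than output from any embedding model.

```python
# Graph space: the sentence becomes explicit, human-readable facts.
triples = [
    ("apple", "IS_A", "fruit"),
    ("orange", "IS_A", "fruit"),
]


def is_a(thing, category):
    """Answer a question by looking up an explicit fact."""
    return (thing, "IS_A", category) in triples


def members_of(category):
    return sorted(s for (s, rel, o) in triples if rel == "IS_A" and o == category)


# Vector space: the same sentence as a list of floats (made-up values).
# Nothing in these numbers tells a human reader what they encode.
sentence_embedding = [0.12, -0.48, 0.33, 0.91]

fruit = members_of("fruit")
```

The triples can be read, queried, and debugged directly, while the vector is only useful through similarity calculations against other vectors. That is the sense in which the talk calls the two representations complementary, with the graph being the one a developer can inspect.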
Transcripts
[Music]
I basically dedicated my professional
life towards getting developers to be
able to build better applications and
build applications better by leveraging
not just individual data points kind of
retrieved at once like one at a time or
summed up or grouped calculated averages
but individual data points connected by
relationships right and today I'm going
to talk about that applied in the world
of llms and
gen AI so before I do that though I'm going
to take a little bit of a detour I'm
going to talk about search the evolution
of search everyone here in this room
knows that the vast majority of web
searches today are handled with Google
but some of you know that it didn't
start that way it started this way who
here recognizes this web page right yeah
who here recognizes AltaVista as a name
like a few people right um back in the
mid-90s there were dozens of web search
companies dozens plural like 30 40 50 web
search companies and they all used
basically the same technology they all
used keyword-based text search inverted
index type search BM25 like for those of
you who know what that means and it
worked really really well until it
didn't and the AltaVista effect kicked
in which was the notion that you search for
something you got a thousand or
thousands of hits back and you had to
look through Page after page until you
found the result that was relevant to
you the AltaVista effect you got too
much back from the internet that wasn't
a problem in the beginning because most
of the things you searched for when I
went onto the internet in the
beginning got zero results back because
there was no content about that on the
internet
right but the AltaVista effect too many
search results was solved by Google this
is Google's press release you know
mid 2000 they talk about a billion URLs
they've indexed right but they also talk
about the the technology that they use
behind the scenes the technology called
PageRank that delivers the most
important search results really early on
in fact the top 10 blue links
on that first page right that technology
PageRank is actually a graph algorithm
it's called eigenvector
centrality and the innovation
that Google did was applying that to the
scale of the internet and the scale of
the web right PageRank that ushered in
and created honestly the most valuable
company on the planet for quite
some while the PageRank era right that
lasted for about a decade about a dozen
years until in 2012 Google wrote this
blog post which is an amazing blog post
introducing the knowledge graph things
not strings where they said you
know what guys we've done an upgrade on
the back end of our search technology
the biggest one since we invented
PageRank where we're
not just storing the text and the links
between the documents but also the
concepts embedded in those documents
things not just strings and we all know
what the knowledge graph looks like
visually when you search for something
on Google today Moscone Center
just around the corner from here
you're going to get this little panel
right on the right hand side if you look
at that panel it has a combination of
unstructured text in this case from
Wikipedia with structured text it has the
address the owner of the Moscone building
you know that kind of stuff this thing
is backed on the back end by the data
structure looking like this right it has
these concepts the rings that we call
nodes that are connected to other nodes
through relationships and both the
nodes and the relationships have key
value properties you can attach two
three a thousand 10,000 on both the nodes
and very importantly also on the
relationships this is a Knowledge Graph
and that was the next decade or so 12
years of Google's dominance until a few
months ago a few months ago at Google I/O
they took the next step ushered in by
the AI Engineers conference a year ago
well not quite but of course the entire
craze around gen AI and this is one of the
examples that they did the classic travel
itinerary they helped me plan out
this travel everyone here in this
room knows that this is backed by an llm
and it is backed by an llm in
combination with this knowledge graph
data structure graph rag this is ushering
in the next era of web search the graph
rag era what I'm going to talk to you
about today is how can you use well
first of all should you and if so how
can you use graph rag for your own rag
based
applications so what is graph rag right
it is very very simple graph rag is rag
where on the retrieval path you use a
Knowledge Graph very very simple it
doesn't say you only use a Knowledge
Graph but you use a Knowledge Graph
maybe in combination with other
Technologies like vector search so let's
take the classic example of a customer
service bot right and let's say that you
are working at a company that is
building Wi-Fi routers for example right
and you have a bunch of support articles
right and they've been stored in text
files right and then you are tasked with
building a bot that either gives
direct end users access to it or your
own customer service agent employees
access to this information and you
know how to do this because you live in
the llm world and the gen AI world so
you're going to use rag for this right
and so you have that data it's text
documents you've added that text onto
the properties of particular nodes right
so have a node per article but then
you've also said that you know what this
article is about this particular Wi-Fi
product right you have a relationship to
that Wi-Fi product and that Wi-Fi
product sits in a hierarchy of other
Wi-Fi products and it's written by this
particular customer service engineer you
know that kind of stuff and then the end
user has a question hey my Wi-Fi
lights are flashing yellow and my
connection drops like what should I do
something like that I think we all know
how we do this we vectorize the search
right we get some kind of vector
embedding back we use Vector search to
get the core documents but here's where
the graph rag part kicks in you get
those core articles back which are
linked to the nodes actually the text is
on the nodes but then you use the graph
to Traverse from there and retrieve more
context around it maybe it's not just
that particular article for that
particular Wi-Fi but something else in
that family maybe you use the fact that
this particular engineer has very highly
ranked content and then you rank that
higher right you retrieve more context
than what you get out of the ANN-based
search from your vector store
and you pass that on to the llm along
with the question you get an answer back
and you hand it to the
user so the core pattern is actually
really really simple but really really
powerful right you start with doing a
vector search I think of this almost as
a primary key it's of course not a
primary but almost like a primary key
lookup into the graph you use that
Vector search you get an initial set
of nodes then you walk the graph and you
expand that and find relevant content
based on the structure of the graph then
you take that and you return it to the
llm or optionally maybe that gives you a
thousand or 10,000 nodes back and then you
do what Google did you rank that you get
the top K based on the structure of the
graph maybe you even use PageRank right
you get that you pass it on to the llm
really really simple but really really
powerful and then there's a number of more
advanced patterns but that's kind of
the next talk I'll do in a year the
more sophisticated graph retrieval
patterns right but the core one very
very
simple okay so if that's what graph rag
is what are the benefits of graph rag
when should you use it when should you
not use it the first and most Stark
benefit is accuracy it's directly
correlated to the quality of the answer
there's been a ton of research articles
about this in the last six months or
something like that I believe the first
one was this one by data.world I just
picked out three at random here that
I like this is the first
one that I know of by data.world which is
a data cataloging company based on a
knowledge graph and they proved out
across I think 43 different questions
that on average the response quality the
accuracy was three times higher if they
use a knowledge graph in combination
with Vector
search I love this paper by LinkedIn
it shows a very similar type I think
it's like 75% or 77% increase in
accuracy um but it also has a great
architecture view so you can take the
QR code right there look at that paper
which combines various components and
also the flow through that that I
thought was just really pedagogical um
but by and large it's showing the same
thing a little bit of different numbers
but significantly higher accuracy when
it used graph in combination with Vector
search and then Microsoft had a
fantastic blog post and subsequently I
think two academic papers the blog post
was in February of this year where they
also talk about the increased quality of
response but also beyond that hey
you know what graph rag enables us to
answer another important class of
questions that we couldn't even do with
Vector search alone or Baseline
rag alone so first benefit higher quality
response
back the second one is easier
development and this one is a little bit
interesting because there's an asterisk
in there because what we hear very
clearly from our users is that it's
easier to build rag applications with
graph rag compared to Baseline rag but
we also hear it's like it's actually
hard and what's the Nuance there well
the Nuance is if you already have a
Knowledge Graph up and running so
there's a learning curve where people
need to learn how do I create the
knowledge graph in the first place once
you have that it's a lot easier but how
do you create that Knowledge Graph right
so let's put a little pin in that if I
rush through the next few slides quickly
enough I'm going to show you hopefully a
demo on that but let's put a
little pin in that so this is an example
this is from a very high growth
stage fintech company that is very
cutting edge in AI and they started
playing around with graph rag
about six months ago and they took an
existing application and they said you
know what we're going to port this from
a vector database to Neo4j and most of the
operations yield a better result they
can calculate the embeddings on a
database level getting related actions
is as simple as following the
relationships between nodes and this one
I love the cache and the cache here is
their application they call it the Cache
can be visualized this is an extremely
valuable debugging tool and in the
parenthesis I actually already fixed a
couple of bugs just thanks to this right
amazing like once you've been able to
create that graph it's a lot easier to
build your rag
application and why is that right
so let's talk a little bit about
representation let's say we have the
phrase in there apples and oranges are
both fruit and we want to represent that
in Vector space and in graph space in
graph space we already talked about this
apple is a fruit orange is a fruit
pretty easy that's the representation in
graph space in Vector space it looks
like this maybe or maybe this is
something else like we actually don't
know two different ways of representing
that phrase and then we can run
similarity calculations in different
ways using both
representations that I'm not going to go
through right now we can search in
different ways these are not competing
ways of doing it they're complementary
ways of doing it right one is not better
than the other except I will make one
statement which
is when you sit down and you write your
application when you build your
application I'm actually going to make
the statement that one of them is
superior this Vector space
representation is completely opaque to a
human
being but the graph representation is
very very clear it is explicit it's
deterministic it's visual you can see it
you can touch it as you build your
applications this is the I already fixed
a couple of bugs thanks to this just by
porting it from a vector only store to
graph rag they were able to see and work
with the data and that is really
freaking powerful that shows up in
development time as you're building your
applications it's also showing up for
our friends in IT who worry about things
maybe that is not directly related to
building the application which is
explainability which is auditability
which is
governance that explicit data structure
has knock-on effects over there that are
really really powerful once you're up
and running in production and you need to be
able to explain why something
happened so higher accuracy better
answers easier to build once you're
through the hump of creating the
knowledge graph and then increased
explainability and governance for IT and
the business right those are the three
things so how do you get started with
graph rag well I've talked a lot
about this already like how do you
create the knowledge graph in the first
place so a little bit of nuance here so
basically there are three types of data
out in the world that I care about when
I think about knowledge graph creation
the first one is structured data so this
is your data in your Snowflake or
something like that or Postgres right
the other one is unstructured data PDF
files raw text from a web page and
the third one is mixed people
tend to call this semi-structured but
it's not hit me up afterwards and I'll
tell you why it's not but basically what
this one is is structured data where some
of the fields are long form text right
basically we're great in the first
bucket in the graph world it's very easy
to go from Snowflake or Postgres or
MySQL or Oracle into a property graph
model the unstructured one is really
freaking hard right it's hard to do in
theory it's also had immature tooling
for a long time the middle one is
actually where the majority of at least
Enterprise production use cases are in
the real world
so man two and a half minutes this is
rough um there are two types of graphs
and I'm not going to talk about them I
want to talk about them lexical graphs
and domain graphs is actually really
relevant but I really want to get to
this demo so I've talked about creating
graphs with unstructured information so
we just built this new tool that we
launched just a few weeks ago called the
knowledge graph Builder and you see it
here can you see the screen okay
so basically here you can drag and drop
your PDF files you can put in YouTube
links Wikipedia links you can point it
to your kind of cloud service bucket
right and it can extract the data from
there and create the graph so here I
added a few things I added a PDF of
Andrew Ng's newsletter The Batch I
added the Wikipedia page for OpenAI and
I added the YouTube from swyx and Alessio
you know the Four Wars Latent Space
podcast so I added all that and I
uploaded it into this knowledge graph
Builder and when I do that it creates if
let's see here I knew the ethernet
connection was going to do it it
automatically created a little Knowledge
Graph if it
renders wait for it it says one minute
here so it better render pretty soon all
right let me do this again please work
oh
no yeah oh my my why isn't oh oh crap oh
no and it's ticking down all
right wait for it wait for it all
right you can do
it can do
it and I was like trying to keep it
alive in the in the thing too all right
okay let's see I think we are here and
then it says show me a graph and it's
not going to show me the graph oh yeah
it will come on you can do it all right
yes so what we have here check this
out I would love to sit here and just
drink in your applause but we need to
look at this data so check this out this
is the document the Four Wars document
here are the various chunks and then you
can take a chunk and you can expand that
this I put in the embedding and
you can I'll zoom out here and you can
see that it takes the logical
concept elements out of that chunk like
machine learning they talk about
something that is developed in a similar
fashion I don't even know there's some
company there right and you get that
entire graph of all this information on
top of that I really don't have time to
show it but there's also I really don't
have time to show it there's a chatbot
in here that you can use and you can
introspect the result that gets back
one more second take out your phones
if you think this looks cool take a
photo of this QR code and you're going
to have an amazing landing page where
you have access to all of this
information you can get up and running
yourself thank you for the additional
minute thank you thanks everyone for
paying attention
[Music]