GraphRAG: The Marriage of Knowledge Graphs and RAG: Emil Eifrem

AI Engineer
28 Aug 2024 19:14

Summary

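TLDR: This talk argues that developers build better applications when they leverage individual data points connected by relationships. It traces the evolution of search, including how Google used PageRank to improve the quality of search results, then covers the introduction of the Knowledge Graph and argues that GraphRAG (graph-based Retrieval-Augmented Generation) built on it is ushering in the next era of web search. It highlights the benefits of GraphRAG, namely higher accuracy, easier development, and stronger explainability, and closes with how to create a knowledge graph and a demo of a new tool, the Knowledge Graph Builder.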

Takeaways

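  • 🔍 The talk stresses that, to build better applications, developers should use not just individual data points but data points connected by relationships.
  • 🌐 On the evolution of search: in the beginning there were dozens of web search companies such as AltaVista, and they returned too many results. Google solved that problem with PageRank and went on to dominate the search market.
  • 📊 In 2012 Google introduced the Knowledge Graph, upgrading its search technology to store not just the text of documents and the links between them but also the concepts embedded in those documents.
  • 🧠 The most recent step combines LLMs (Large Language Models) with knowledge graphs, which the speaker says ushers in the GraphRAG (Retrieval-Augmented Generation) era of web search.
  • 💡 GraphRAG is RAG that uses a knowledge graph on the retrieval path. It can be combined with other technologies such as vector search.
  • 🛠️ GraphRAG makes applications such as customer support bots easier to build, retrieving relevant information from text sources such as support articles.
  • 📈 According to the research cited, GraphRAG substantially improves answer accuracy and makes it possible to answer a class of questions that baseline RAG alone cannot.
  • 🛡️ Beyond easier development, GraphRAG improves explainability, auditability, and governance, which also benefits the business.
  • 🚀 Knowledge graphs can be created from structured, unstructured, and mixed data. Mixed data accounts for the majority of enterprise production use cases.
  • 🛠️ A new tool, the Knowledge Graph Builder, has been released for creating knowledge graphs from unstructured data. It extracts data from PDF files, Wikipedia pages, YouTube links, and similar sources and builds a graph from it.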

Q & A

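  • How should developers use data points to build better applications?

    -Developers can build better applications, and build them better, by using not just individual data points but individual data points connected by relationships.

  • How did Google advance web search?

    -Google applied a graph algorithm called PageRank at the scale of the internet and the web, so that the most important search results are delivered first.

  • What is Google's Knowledge Graph?

    -Google's Knowledge Graph is a data structure that stores not just the text of documents and the links between them but also the concepts embedded in those documents. It is a graph in which both nodes and relationships can carry any number of key-value properties.

  • What is GraphRAG?

    -GraphRAG is RAG that uses a knowledge graph on the retrieval path. It does not have to use only a knowledge graph; it can be combined with other technologies such as vector search.

  • What are the benefits of GraphRAG?

    -The benefits are higher answer accuracy, easier development, and stronger explainability and governance.

  • How do you create a knowledge graph?

    -A knowledge graph can be created from three types of data: structured, unstructured, and mixed. Mixed data is where most enterprise production use cases sit.

  • How do you create a knowledge graph from unstructured data?

    -Use the new Knowledge Graph Builder tool: drag and drop PDF files, or supply YouTube links, Wikipedia links, and similar sources, and it creates the graph.

  • How hard is it to develop with GraphRAG?

    -Once a knowledge graph exists, building an application with GraphRAG is comparatively easy. Creating the knowledge graph itself has a learning curve, and the tooling has been immature, so that part can feel hard.

  • What kinds of graphs are involved in GraphRAG?

    -The talk names two types, lexical graphs and domain graphs, and notes that they are relevant when creating graphs from unstructured information.

  • What did the Knowledge Graph Builder demo show?

    -The demo showed uploading sources such as PDF files, YouTube links, and Wikipedia pages, creating a knowledge graph from them, and visualizing it.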

Outlines

00:00

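🌐 The evolution of internet search and an introduction to GraphRAG

The speaker says he has dedicated his professional life to helping developers build better applications by using individual data points connected by relationships. He plans to talk about that in the world of LLMs and GenAI, but starts with the evolution of search. In the mid-90s, dozens of web search companies used keyword-based text search, which returned so many results that it caused what he calls the AltaVista effect: a query returned thousands of hits, and users had to go through page after page to find a relevant one. Google solved this by introducing PageRank, a graph algorithm. PageRank is eigenvector centrality applied at the scale of the internet, and it put the most important results first. Then, in 2012, Google introduced the Knowledge Graph, upgrading its search technology to store not just the text of documents and the links between them but also the concepts embedded in them. Users see the Knowledge Graph as a panel in the search results that combines unstructured text with structured information. Most recently, at Google I/O, Google showed a travel-itinerary example backed by an LLM in combination with the knowledge graph, which the speaker presents as the next era of web search.

05:01

🔍 The definition of GraphRAG and its benefits

GraphRAG is defined as RAG that uses a knowledge graph on the retrieval path. That does not mean using only a knowledge graph; it may be combined with other technologies such as vector search. For example, suppose you are building a customer service bot and the support articles are stored as text files. You add the text to the properties of nodes and create relationships recording that an article is about a particular Wi-Fi product. When a user asks a question, you use vector search to retrieve the core documents, then traverse the graph to retrieve more context around them, which yields more accurate answers. One study cited in the talk found that combining a knowledge graph with vector search tripled answer accuracy. GraphRAG can also answer another class of questions that vector search alone could not.

10:02

🛠 Ease of development with GraphRAG and the types of data

GraphRAG makes applications easier to build, but creating the knowledge graph has a learning curve. If a knowledge graph already exists, building a RAG application is easier; creating the graph in the first place is what has to be learned. Data falls into three types: structured, unstructured, and mixed (often called semi-structured, a label the speaker rejects). Mixed data covers most enterprise production use cases. Going from structured data to a property graph model is comparatively easy, while unstructured data is hard to handle and the tooling has been immature.

15:04

🚀 Creating a knowledge graph, and the demo

The talk closes with a demo of a new tool, the Knowledge Graph Builder. With it you can drag and drop PDF files, or supply YouTube links, Wikipedia pages, and similar sources, to create a graph. In the demo, the speaker adds a PDF of Andrew Ng's newsletter, the Wikipedia page for OpenAI, and a YouTube video of swyx and Alessio's podcast, and uploads them into the Knowledge Graph Builder. The tool extracts the logical concept elements of the documents into a graph and visualizes the whole thing. Earlier in the talk, the speaker also quotes a user who found graph visualization an extremely valuable debugging tool while building an application.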
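The document, chunk, and concept structure shown in the demo can be sketched in a few lines. This is only an illustration of the shape of such a graph, not the Knowledge Graph Builder's implementation: the function names, the `HAS_CHUNK` and `MENTIONS` relationship types, and the fixed word-count chunking are all assumptions, and plain substring matching stands in for whatever model-based extraction a real tool would use.

```python
def chunk_text(text, chunk_size=8):
    """Split text into chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def build_lexical_graph(doc_id, text, known_concepts, chunk_size=8):
    """Return (start, relationship, end) edges linking a document to its
    chunks and each chunk to the known concepts it mentions."""
    edges = []
    for index, chunk in enumerate(chunk_text(text, chunk_size)):
        chunk_id = f"{doc_id}#chunk{index}"
        edges.append((doc_id, "HAS_CHUNK", chunk_id))
        lowered = chunk.lower()
        for concept in known_concepts:
            # Naive stand-in for entity extraction: case-insensitive substring match.
            if concept.lower() in lowered:
                edges.append((chunk_id, "MENTIONS", concept))
    return edges


# Made-up sample text, used only to exercise the sketch.
TEXT = "Machine learning models need data. OpenAI trains large models on text."
LEXICAL_EDGES = build_lexical_graph("doc1", TEXT, ["machine learning", "OpenAI"], chunk_size=6)
```

A concept whose words straddle a chunk boundary would be missed by this sketch, which is one reason real pipelines tend to use overlapping chunks.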

Keywords

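💡GraphRAG

GraphRAG is RAG (Retrieval-Augmented Generation) that uses a knowledge graph on the retrieval path. Text documents are stored on nodes and connected by relationships, so more context can be retrieved. The talk presents it as ushering in a new era of web search and illustrates it with a customer service bot that retrieves relevant information from support articles for Wi-Fi routers.

💡Knowledge graph

A knowledge graph is a data structure that represents concepts as nodes and connects them with relationships. The talk explains that Google's Knowledge Graph played a key role in the evolution of search, moving it toward storing concepts and not just text.

💡Vector search

Vector search embeds text in a vector space and computes similarity there. In the talk, GraphRAG uses vector search to get an initial set of nodes, then walks the graph from those nodes to retrieve related content.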
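As a rough illustration of the similarity step, the sketch below ranks stored vectors by cosine similarity to a query vector. The three-dimensional vectors and article names are invented for the example; real embeddings come from an embedding model and have far more dimensions, and production systems use approximate nearest-neighbor indexes instead of a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the ids of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical toy embeddings for three support articles.
DOC_VECS = {
    "router-reset": [0.9, 0.1, 0.0],
    "flashing-lights": [0.8, 0.6, 0.0],
    "billing-faq": [0.0, 0.1, 0.9],
}

QUERY_VEC = [0.7, 0.7, 0.0]  # pretend embedding of the user's question
NEAREST = top_k(QUERY_VEC, DOC_VECS, k=2)
```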

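💡PageRank

PageRank is Google's algorithm for assessing the importance of web pages, ranking a page according to the quantity and quality of the links pointing to it. The talk notes that it is a graph algorithm (eigenvector centrality) that Google applied at the scale of the web so that the most valuable results appear first.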
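The idea can be sketched as a power iteration over a small link graph. This is a textbook-style simplification, not Google's production algorithm: the damping factor of 0.85 is the conventional choice and is an assumption here, the four-page graph is invented, and every link target is assumed to also appear as a key in the input.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: a, b and d all link to c; c links back to a.
LINKS = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
RANKS = pagerank(LINKS)
```

Page `c` ends up with the highest score because three pages link to it, and `a` outranks `b` and `d` because its single inbound link comes from the highly ranked `c`.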

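💡AltaVista effect

The AltaVista effect is when a search engine returns so many hits for a query that it becomes hard to find the relevant ones. The talk explains that Google solved this problem, and improved the quality of search results, with PageRank.

💡Ease of development

Ease of development refers to how GraphRAG simplifies the process of building an application. The talk says that with an existing knowledge graph, building a RAG application becomes easier, while also noting that creating the knowledge graph itself has a learning curve.

💡Explainability

Explainability is the ability to explain the logic and process behind a result so that people can understand it. The talk explains that because GraphRAG provides an explicit data structure, it improves the explainability and auditability of applications.

💡Structured data

Structured data is data stored in a structured format, such as in a database. The talk says that, when creating a knowledge graph, going from structured data to a property graph model is comparatively easy.

💡Unstructured data

Unstructured data is data in an unstructured form, such as PDF files or the raw text of web pages. The talk points out that creating a knowledge graph from unstructured data is hard in theory and that the tooling has been immature.

💡Knowledge Graph Builder

The Knowledge Graph Builder is the new tool introduced in the talk. It extracts data from PDF files, YouTube links, Wikipedia pages, and similar sources and creates a knowledge graph from it. The talk presents it as a way to simplify the process of creating a knowledge graph.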

Highlights

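Focuses on building better applications from individual data points connected by relationships.

Discusses the evolution of search technology, in particular the shift from AltaVista to Google.

Google used the PageRank algorithm to solve the AltaVista effect, the problem of too many search results.

Covers the introduction of the Knowledge Graph, Google's biggest back-end search upgrade since PageRank.

The Knowledge Graph structures data as nodes and relationships, which improves the accuracy of search results.

Introduces GraphRAG (graph-based Retrieval-Augmented Generation): using a knowledge graph on the retrieval path.

GraphRAG combines vector search with a knowledge graph to improve application accuracy.

Shows how to use GraphRAG to build a customer service bot that gives more accurate answers.

The three main benefits of GraphRAG: higher accuracy, easier development, and better explainability.

Discusses creating knowledge graphs from structured, unstructured, and mixed data.

Introduces the Knowledge Graph Builder, a tool that extracts data from PDFs, Wikipedia pages, and YouTube links.

Gives a live demo of the Knowledge Graph Builder, extracting data from different sources and creating a graph.

Stresses the clarity of the graph representation and how much easier it makes things for developers building applications.

Contrasts vector-space and graph-space representations and their roles in search and development.

Explains how GraphRAG uses graph structure to enrich retrieval results with more context.

Stresses the importance of knowledge graphs for application explainability and governance.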

Transcripts

00:00

[Music]

00:13

I've basically dedicated my professional life to getting developers to be able to build better applications, and build applications better, by leveraging not just individual data points, retrieved one at a time or summed up or grouped into calculated averages, but individual data points connected by relationships. And today I'm going to talk about that applied in the world of LLMs and GenAI.

00:42

So before I do that, though, I'm going to take a little bit of a detour. I'm going to talk about search, the evolution of search. Everyone here in this room knows that the vast majority of web searches today are handled by Google. But some of you know that it didn't start that way. It started this way. Who here recognizes this web page? Right, yeah. Who here recognizes AltaVista as a name? A few people, right.

01:03

Back in the mid-90s there were dozens of web search companies, dozens, plural, like 30, 40, 50 web search companies, and they all used basically the same technology. They all used keyword-based text search, inverted-index type search, BM25-like, for those of you who know what that means. And it worked really, really well until it didn't, and the AltaVista effect kicked in, which was the notion that you searched for something, you got a thousand or thousands of hits back, and you had to look through page after page until you found the result that was relevant to you. The AltaVista effect: you got too much back from the internet. That wasn't a problem in the beginning, because most of the things you searched for when I went onto the internet in the beginning got zero results back, because there was no content about that on the internet.

01:53

Right, but the AltaVista effect, too many search results, was solved by Google. This is Google's press release from mid-2000. They talk about a billion URLs they've indexed, but they also talk about the technology that they use behind the scenes, the technology called PageRank, that delivers the most important search results really early on, in fact in the top 10 blue links on that first page. That technology, PageRank, is actually a graph algorithm. It's called eigenvector centrality, and the innovation that Google made was applying that to the scale of the internet and the scale of the web.

02:34

PageRank ushered in and created, honestly, the most valuable company on the planet for quite some while. The PageRank era lasted for about a decade, about a dozen years, until in 2012 Google wrote this blog post, which is an amazing blog post, "Introducing the Knowledge Graph: things, not strings", where they said: you know what, guys, we've done an upgrade on the back end of our search technology, the biggest one since we invented PageRank, where we're moving from storing just the text and the links between the documents to also storing the concepts embedded in those documents. Things, not just strings.

03:17

And we all know what the Knowledge Graph looks like visually. When you search for something on Google today, say Moscone Center, just around the corner from here, you're going to get this little panel on the right-hand side. If you look at that panel, it has a combination of unstructured text, in this case from Wikipedia, with structured text: it has the address, the owner of the Moscone building, that kind of stuff. This thing is backed on the back end by a data structure looking like this. It has these concepts, the rings that we call nodes, that are connected to other nodes through relationships, and both the nodes and the relationships have key-value properties. You can attach two, three, a thousand, 10,000, on both the nodes and, very importantly, also on the relationships. This is a knowledge graph.
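A minimal in-memory sketch of that data structure, assuming nothing about how Google or any graph database actually stores it: the class names, the `OWNED_BY` relationship type, and every property value below are made up for illustration. The point it shows is that key-value properties hang off both the nodes and the relationships.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: str
    rel_type: str
    end: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Nodes and directed relationships, each carrying key-value properties."""

    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, node_id, labels, **properties):
        self.nodes[node_id] = Node(node_id, set(labels), dict(properties))

    def add_relationship(self, start, rel_type, end, **properties):
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("both endpoints must exist before they are connected")
        self.relationships.append(Relationship(start, rel_type, end, dict(properties)))

    def neighbors(self, node_id, rel_type=None):
        """Ids of nodes reachable over outgoing relationships of the given type."""
        return [
            rel.end
            for rel in self.relationships
            if rel.start == node_id and (rel_type is None or rel.rel_type == rel_type)
        ]


# Illustrative data only; the owner node and the property values are placeholders.
graph = PropertyGraph()
graph.add_node("venue", ["Place"], name="Moscone Center", address="placeholder address")
graph.add_node("owner", ["Organization"], name="placeholder owner")
graph.add_relationship("venue", "OWNED_BY", "owner", source="illustrative example")
```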

04:11

And that was the next decade or so, 12 years of Google's dominance, until a few months ago. A few months ago at Google I/O they took the next step, ushered in by the AI Engineer conference a year ago. Well, not quite, but of course by the entire [inaudible] around GenAI. And this is one of the examples that they showed, the classic travel itinerary: they helped me plan out this trip. Everyone here in this room knows that this is backed by an LLM, and it is backed by an LLM in combination with this knowledge graph data structure. GraphRAG. This is ushering in the next era of web search, the GraphRAG era.

04:53

What I'm going to talk to you about today is, well, first of all, should you, and if so how can you, use GraphRAG for your own RAG-based applications. So what is GraphRAG?

05:08

It is very, very simple. GraphRAG is RAG where, on the retrieval path, you use a knowledge graph. Very, very simple. It doesn't say you only use a knowledge graph, but you use a knowledge graph, maybe in combination with other technologies like vector search.

05:23

So let's take the classic example of a customer service bot. Let's say that you are working at a company that is building Wi-Fi routers, for example, and you have a bunch of support articles, and they've been stored in text files. Then you are tasked with building a bot that either gives end users direct access to it, or gives your own customer service agents access to this information. And you know how to do this, because you live in the LLM world and the GenAI world, so you're going to use RAG for this.

05:59

So you have that data, it's text documents, and you've added that text onto the properties of particular nodes, so you have a node per article. But then you've also said: you know what, this article is about this particular Wi-Fi product, so you have a relationship to that Wi-Fi product, and that Wi-Fi product sits in a hierarchy of other Wi-Fi products, and it's written by this particular customer service engineer, that kind of stuff.

06:25

And then the end user has a question: "Hey, my Wi-Fi lights are flashing yellow and my connection drops, what should I do?" Something like that. I think we all know how we do this. We vectorize the search, we get some kind of vector embedding back, and we use vector search to get the core documents. But here's where the GraphRAG part kicks in. You get those core articles back, which are linked to the nodes (actually the text is on the nodes), but then you use the graph to traverse from there and retrieve more context around it. Maybe it's not just that particular article for that particular Wi-Fi, but something else in that family. Maybe you use the fact that this particular engineer has very highly ranked content, and then you rank that higher. You retrieve more context than what you get out of the ANN-based search from your vector store, and you pass that on to the LLM along with the question. You get an answer back and you hand it to the user.

07:25

So the core pattern is actually really, really simple, but really, really powerful. You start by doing a vector search. I think of this almost as a primary key; it's of course not a primary key, but almost like a primary key lookup into the graph. You use that vector search and you get an initial set of nodes. Then you walk the graph, and you expand that and find relevant content based on the structure of the graph. Then you take that and you return it to the LLM. Or optionally, maybe that gives you a thousand or 10,000 nodes back, and then you do what Google did: you rank that, you get the top K based on the structure of the graph, maybe you even use PageRank. You get that, and you pass it on to the LLM.
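The core pattern just described (vector search for seed nodes, a graph walk to expand them, a structural ranking, then a prompt for the LLM) can be sketched end to end on toy data. Everything here is an assumption made for illustration: the articles, products, embeddings, and relationship names are invented, the ranking uses hop distance from the seed as a simple stand-in for something like PageRank, and the LLM call itself is left out.

```python
import math
from collections import deque

# Hypothetical support articles with toy three-dimensional embeddings.
ARTICLES = {
    "a1": {"text": "Yellow flashing lights mean the router lost its uplink.",
           "embedding": [0.9, 0.4, 0.0]},
    "a2": {"text": "Firmware 2.1 fixes connection drops on the X series.",
           "embedding": [0.2, 0.9, 0.1]},
    "a3": {"text": "How to update your billing address.",
           "embedding": [0.0, 0.1, 0.9]},
}

# Hypothetical relationships: articles are about products, products sit in a family.
EDGES = [
    ("a1", "ABOUT", "x200"),
    ("a2", "ABOUT", "x210"),
    ("x200", "IN_FAMILY", "x-series"),
    ("x210", "IN_FAMILY", "x-series"),
    ("a3", "ABOUT", "billing"),
]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def vector_search(query_embedding, k=1):
    """Step 1: find seed articles by embedding similarity."""
    ranked = sorted(ARTICLES,
                    key=lambda aid: cosine(query_embedding, ARTICLES[aid]["embedding"]),
                    reverse=True)
    return ranked[:k]


def expand(seeds, max_hops=4):
    """Step 2: walk the graph from the seeds; return {article_id: hop distance}."""
    adjacency = {}
    for start, _rel_type, end in EDGES:
        adjacency.setdefault(start, set()).add(end)
        adjacency.setdefault(end, set()).add(start)
    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return {node: hops for node, hops in distance.items() if node in ARTICLES}


def retrieve_context(query_embedding, k_seed=1, top_k=2):
    """Steps 1-3: seed by vector search, expand by graph walk, keep the closest."""
    reachable = expand(vector_search(query_embedding, k_seed))
    ranked = sorted(reachable, key=lambda aid: (reachable[aid], aid))
    return [ARTICLES[aid]["text"] for aid in ranked[:top_k]]


def build_prompt(question, context):
    """Step 4: assemble what would be sent to the LLM."""
    bullet_list = "\n".join(f"- {passage}" for passage in context)
    return f"Context:\n{bullet_list}\n\nQuestion: {question}"


QUESTION = "My Wi-Fi lights are flashing yellow and my connection drops. What should I do?"
CONTEXT = retrieve_context([1.0, 0.3, 0.0])  # pretend embedding of QUESTION
PROMPT = build_prompt(QUESTION, CONTEXT)
```

Vector search alone returns only article `a1`; the graph walk adds `a2`, which is about a sibling product in the same family, while the unrelated billing article stays out of the context.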

08:09

Really, really simple, but really, really powerful. And then there are a number of more advanced patterns, but that's kind of the next talk I'll do in a year, the more sophisticated graph retrieval patterns. But the core one: very, very simple.

08:24

Okay, so if that's what GraphRAG is, what are the benefits of GraphRAG? When should you use it, when should you not use it? The first and most stark benefit is accuracy. It's directly correlated to the quality of the answer. There have been a ton of research articles about this in the last six months or something like that. I believe the first one was this one by data.world. I just picked out three at random here that I like. This is the first one that I know of, by data.world, which is a data cataloging company based on a knowledge graph, and they proved out across, I think, 43 different questions that on average the response quality, the accuracy, was three times higher if they used a knowledge graph in combination with vector search.

09:13

I love this paper by LinkedIn. It shows a very similar result, I think it's like a 75% or 77% increase in accuracy, but it also has a great architecture view. So you can take the QR code right there and look at that paper, which combines various components and also the flow through them, which I thought was just really pedagogical. But by and large it's showing the same thing, slightly different numbers, but significantly higher accuracy when they used graph in combination with vector search.

09:46

And then Microsoft had a fantastic blog post, and subsequently I think two academic papers. The blog post was in February of this year, where they also talk about the increased quality of response, but also, beyond that: hey, you know what, GraphRAG enables us to answer another important class of questions that we couldn't even answer with vector search alone, or baseline RAG alone. So first benefit: a higher-quality response back.

10:18

The second one is easier development. And this one is a little bit interesting, because there's an asterisk in there. What we hear very clearly from our users is that it's easier to build RAG applications with GraphRAG compared to baseline RAG. But we also hear that it's actually hard. What's the nuance there? Well, the nuance is whether you already have a knowledge graph up and running. There's a learning curve where people need to learn: how do I create the knowledge graph in the first place? Once you have that, it's a lot easier. But how do you create that knowledge graph, right?

10:52

So let's put a little pin in that. If I rush through the next few slides quickly enough, I'm hopefully going to show you a demo on that. But let's put a little pin in that.

11:01

So this is an example. This is from a very high-growth-stage fintech company that is very cutting-edge in AI. They started playing around with GraphRAG about six months ago, and they took an existing application and said: you know what, we're going to port this from a vector database to Neo. And most of the operations yield a better result. They can calculate the embeddings on a database level. Getting related actions is as simple as following the relationships between nodes. And this one I love: "the cache", and the cache here is their application, they call it the cache, "can be visualized. This is an extremely valuable debugging tool." And in the parentheses: "I actually already fixed a couple of bugs just thanks to this." Amazing, right? Once you've been able to create that graph, it's a lot easier to build your RAG application. And why is that, right?

11:59

So let's talk a little bit about representation. Let's say we have the phrase "apples and oranges are both fruit", and we want to represent that in vector space and in graph space. In graph space, we already talked about this: apple is a fruit, orange is a fruit. Pretty easy. That's the representation in graph space. In vector space it looks like this, maybe. Or maybe this is something else; we actually don't know. Two different ways of representing that phrase. And then we can run similarity calculations in different ways using both of these representations, which I'm not going to go through right now. We can search in different ways. These are not competing ways of doing it, they're complementary ways of doing it. One is not better than the other.

12:46

Except I will make one statement, which is: when you sit down and write your application, when you build your application, I'm actually going to make the statement that one of them is superior. This vector space representation is completely opaque to a human being. But the graph representation is very, very clear. It is explicit, it's deterministic, it's visual. You can see it, you can touch it as you build your applications. This is the "I already fixed a couple of bugs thanks to this": just by porting it from a vector-only store to GraphRAG, they were able to see and work with the data. And that is really freaking powerful. That shows up in development time as you're building your applications.

13:31

It's also showing up for our friends in IT, who worry about things that maybe are not directly related to building the application: explainability, auditability, governance. That explicit data structure has knock-on effects over there that are really, really powerful once you're up and running in production and you have to be able to explain why something happened.

14:01

So: higher accuracy and better answers; easier to build, once you're over the hump of creating the knowledge graph; and then increased explainability and governance for IT and the business. Those are the three things.

14:16

So how do you get started with GraphRAG? Well, I've talked a lot about this already: how do you create the knowledge graph in the first place? A little bit of nuance here. Basically there are three types of data out in the world that I care about when I think about knowledge graph creation. The first one is structured data, so this is your data in your Snowflake or something like that, or Postgres. The second one is unstructured data: PDF files, raw text from a web page. And the third one is mixed. People tend to call this semi-structured, but it's not. Hit me up afterwards and I'll tell you why it's not. But basically what this one is, is structured data where some of the fields are long-form text.

14:59

Basically, we're great in the first bucket. In the graph world it's very easy to go from Snowflake or Postgres or MySQL or Oracle into a property graph model. The unstructured one is really freaking hard. It's hard to do in theory, and it's also had immature tooling for a long time. The middle one is actually where the majority of at least enterprise production use cases are in the real world.
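To make the mixed case concrete, here is a small sketch that turns rows from a hypothetical support-ticket table into property-graph nodes and relationships. The table, column names, and `ABOUT` relationship type are assumptions for illustration. The structured columns map directly onto node properties and relationships, while the long-form `description` field is kept as text on the node, where it could later be embedded or mined for further entities.

```python
def rows_to_graph(rows):
    """Map relational rows to (nodes, relationships) of a property graph."""
    nodes = {}
    relationships = []
    for row in rows:
        ticket_id = f"ticket:{row['ticket_id']}"
        product_id = f"product:{row['product']}"
        # Structured columns become properties; the free-text column rides along.
        nodes[ticket_id] = {
            "label": "Ticket",
            "status": row["status"],
            "text": row["description"],
        }
        # A product referenced by many rows becomes a single shared node.
        nodes.setdefault(product_id, {"label": "Product", "name": row["product"]})
        relationships.append((ticket_id, "ABOUT", product_id))
    return nodes, relationships


# Hypothetical rows, as they might come back from a SQL query.
ROWS = [
    {"ticket_id": 1, "product": "X200", "status": "open",
     "description": "Lights flash yellow and the connection drops every few minutes."},
    {"ticket_id": 2, "product": "X200", "status": "closed",
     "description": "Resolved after a firmware update."},
]

NODES, RELATIONSHIPS = rows_to_graph(ROWS)
```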

15:29

So, man, two and a half minutes, this is rough. There are two types of graphs, and I'm not going to talk about them. I want to talk about them, lexical graphs and domain graphs, it's actually really relevant, but I really want to get to this demo.

15:41

So I've talked about creating graphs with unstructured information. We just built this new tool that we launched just a few weeks ago, called the Knowledge Graph Builder, and you see it here. Can you see the screen? Okay. So basically, here you can drag and drop your PDF files, you can put in YouTube links, Wikipedia links, you can point it to your cloud service bucket, and it can extract the data from there and create the graph. So here I added a few things. I added a PDF of Andrew Ng's newsletter, The Batch. I added the Wikipedia page for OpenAI. And I added the YouTube video from swyx and Alessio, you know, the "Four Wars" Latent Space podcast. So I added all that and I uploaded it into this Knowledge Graph Builder.

16:30

And when I do that, it creates... let's see here. I knew the ethernet connection was going to do it. It automatically created a little knowledge graph, if it renders. Wait for it. It says one minute here, so it had better render pretty soon. All right, let me do this again. Please work.

17:00

Oh no. Yeah. Oh my, why isn't... oh, oh crap. Oh no. And it's ticking down. All right, wait for it, wait for it. All right, you can do it. You can do it. And I was trying to keep it alive in the thing too. All right, okay, let's see. I think we are here, and then it says "show me a graph", and it's not going to show me the graph. Oh yeah, it will. Come on, you can do it. All right.

17:44

Yes! So what we have here, check this out. I would love to sit here and just drink in your applause, but we need to look at this data. So check this out: this is the document, the "Four Wars" document. Here are the various chunks, and then you can take a chunk and you can expand that. This I put in the embedding. And I'll zoom out here, and you can see that it takes the logical concept elements out of that chunk, like machine learning. They talk about something that is developed in a similar fashion, I don't even know, there's some company there. And you get that entire graph of all this information.

18:22

On top of that, and I really don't have time to show it, there's also a chatbot in here that you can use, and you can introspect the result that comes back. One more second: take up your phones. If you think this looks cool, take a photo of this QR code, and you're going to get an amazing landing page where you have access to all of this information and you can get up and running yourself. Thank you for the additional minute. Thank you. Thanks everyone for paying attention.

18:59

[Music]
