Augmentation of Data Governance with ChatGPT and Large LLMs

Lights On Data Show podcast
15 May 2023 · 37:38

Summary

TLDR: In this episode of the Lights On Data Show, we are joined by Anthony Woodward, co-founder and CEO of RecordPoint, for a deep dive into how data governance can be augmented with ChatGPT and other large language models. Anthony shares his definition of large language models and explains how they can help organizations manage data more effectively, reduce risk, and improve compliance. He also discusses LLM applications in data classification, summarization, question answering, entity recognition, and sentiment analysis, along with the risks and challenges of using these models. The conversation gives viewers a deeper understanding of how to use these emerging technologies to optimize their data governance processes.

Takeaways

  • 😀 Anthony Woodward is the co-founder and CEO of RecordPoint, a fast-growing software-as-a-service solution that helps organizations discover, govern, and control their data for tighter compliance, more efficiency, and less risk.
  • 📊 Large language models (LLMs) such as ChatGPT are discussed as a way to augment data governance; built on neural networks, they can generate text, hold conversations, write essays, and more.
  • ☕ Anthony shares his passion for coffee and how, after moving to Seattle, he kept up that hobby and took up snowboarding as a new one.
  • 🧠 LLMs are mathematical models that approximate how the brain's interconnected neurons process and generate language, which makes them remarkably good at language tasks.
  • 🔍 LLM applications in data governance include data classification, data summarization, question answering, entity recognition, and sentiment analysis, all of which are critical for managing risk and compliance.
  • 🤖 Despite their power, LLMs are prone to bias and need to be managed and tuned to ensure accuracy and fairness.
  • 💡 LLMs can generate classifications and ontologies without a predefined ontology, but fine-tuning improves their accuracy.
  • 🚀 RecordPoint uses LLMs to make data classification and tagging more efficient, with a feedback loop that allows human verification and improvement of classification results.
  • 📈 LLMs are expected to dramatically accelerate data governance and other business processes, signaling that AI will become a core component of future technology.
  • 🔑 Anthony cautions that using the free version of an LLM such as ChatGPT may conflict with data privacy policies, and recommends the paid version or the API for business use.
  • 🌐 Future LLM development will deepen their role in data governance, privacy protection, and business decision support, while also demanding attention to data security and privacy.
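The takeaway about preferring the paid API over the free consumer ChatGPT for business use can be sketched in code. This is an illustrative sketch only: the endpoint and payload shape follow OpenAI's chat completions API, while the model name, label set, and prompt wording are assumptions rather than anything stated in the episode.

```python
import json

# Endpoint for OpenAI's chat completions API (key-based, paid access,
# with different privacy terms than the free consumer ChatGPT).
API_URL = "https://api.openai.com/v1/chat/completions"

def build_tagging_request(document_text: str, labels: list) -> dict:
    """Build a request payload asking the model to tag a document with
    exactly one governance label from a fixed list."""
    prompt = (
        "Classify the following document into exactly one of these "
        "categories: " + ", ".join(labels) + ".\n"
        "Reply with the category name only.\n\n" + document_text
    )
    return {
        "model": "gpt-3.5-turbo",  # assumed model name
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,  # deterministic output for repeatable tagging
    }

payload = build_tagging_request(
    "Incident report: employee slipped in the warehouse ...",
    ["OHS incident", "Customer record", "Financial record"],
)
print(json.dumps(payload, indent=2))
```

Sending this payload (with an `Authorization: Bearer <key>` header) keeps the interaction under the API's terms rather than the free consumer service's.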

Q & A

  • Who is Anthony Woodward?

    -Anthony Woodward is the co-founder and CEO of RecordPoint, a fast-growing software-as-a-service company focused on helping organizations discover, govern, and control their data for tighter compliance, more efficiency, and less risk.

  • What are large language models (LLMs)?

    -Large language models are a variation of AI or machine learning, specifically built on neural networks; they are mathematical models that are astonishingly good at generating text. That means they can hold a conversation, generate essays, and do so in a way that is coherent to the context of the question and grammatically correct.

  • How can large language models help data governance?

    -Large language models can help data governance by: 1. classifying data, 2. summarizing data, 3. answering questions, 4. recognizing entities, and 5. analyzing sentiment. These capabilities can be used to tag data, distill information, and act on it according to risk and governance requirements.

  • Can large language models generate ontologies automatically?

    -Large language models sit between supervised and unsupervised learning. They can generate ontologies on their own to a degree, but fine-tuning the model improves accuracy. Although they can generate English-based ontologies out of the box, getting high precision on a specific ontology requires custom training.

  • What risks can large language models face when handling personal data?

    -The risks include bias, mathematical hallucination, and misleading output. These models can reflect the biases of the data they represent, or draw incorrect connections and produce wrong output because the model gets confused.

  • How can critical thinking be ensured when using large language models?

    -Users need to apply critical thinking to judge whether the results a model provides are trustworthy. That includes understanding the model's limitations, recognizing possible biases, and weighing multiple factors in decision-making.

  • How does RecordPoint use large language models to improve data governance?

    -RecordPoint uses large language models to classify and tag data, understand its privacy context, and provide control and handling of data. They focus on data cleanup, classification, and controlling data across the different systems within an enterprise.

  • Can large language models directly apply a GDPR ontology to build policies and rules?

    -Not by themselves. They can offer suggestions on how to apply GDPR, but actually implementing specific data protection and privacy management measures requires more specialized tools and processes.

  • What is the future direction of large language models?

    -They are expected to be deeply integrated into a wide range of technologies and business processes, improving the efficiency of information summarization, decision support, and data processing. They will play a key role in future technology solutions, affecting areas from data governance to customer service.

  • Which large language model does Anthony Woodward prefer to use?

    -Anthony Woodward has no fixed preference; depending on the situation he chooses either ChatGPT or Bard. He believes each has strengths and weaknesses, and his choice may change as the models keep updating and evolving.
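The GDPR answer above describes classified data feeding into policies and rules downstream of the model. As a hypothetical illustration of that downstream step (the tag names, retention periods, and rule logic are all invented here, not RecordPoint's implementation), a tiny rule function might look like:

```python
from datetime import date, timedelta

# Hypothetical governance rules applied *after* an LLM has tagged records.
# The tags and retention periods below are invented for illustration.
RETENTION = {
    "personal-data": timedelta(days=365 * 2),  # e.g. a storage-limitation policy
    "ohs-incident": timedelta(days=365 * 7),
}

def action_for(record: dict, today: date) -> str:
    """Decide what governance action a tagged record needs today."""
    keep_for = RETENTION.get(record["tag"])
    if keep_for is None:
        return "review"  # unknown tag: route to human triage
    if today - record["created"] > keep_for:
        return "dispose"
    return "retain"

print(action_for({"tag": "personal-data", "created": date(2020, 1, 1)},
                 date(2023, 5, 15)))  # → dispose
```

The point of the sketch is the separation of duties: the LLM supplies the classification, while the policy itself stays in explicit, auditable rules.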

Outlines

00:00

🎉 Introduction and guest welcome

This episode focuses on augmenting data governance with large language models such as ChatGPT. Guest Anthony Woodward, co-founder and CEO of RecordPoint, dives into the state of data governance and its intersection with privacy. Anthony shares his background, including the changes in his life after moving from Australia to Seattle, his love of coffee, and his interest in surfing and snowboarding.

05:00

🔍 An introduction to large language models

Anthony introduces large language models (LLMs) such as ChatGPT and Google Bard, highlighting their powerful text-generation abilities: holding conversations, writing essays, and so on. He explains how these models learn language patterns from large datasets drawn from the internet and keep their responses grammatically correct. Despite their impressive performance, Anthony notes that they cannot yet understand or reflect complex human thought processes.

10:01

🛠️ Applying LLMs to data governance

The discussion covers potential uses of LLMs in data governance, such as classifying data, summarizing it, and answering questions, as well as entity recognition and sentiment analysis. Anthony explains how LLMs can help tag data, make data governance more efficient, and support decision-making by analyzing and summarizing data. He also explores their applications in privacy and control, such as identifying details relating to individuals (like ethnicity or health information) and analyzing the sentiment of data.

15:02

🚀 Challenges of LLMs in practice

Anthony discusses the challenges of applying LLMs in practice, particularly their tendency to reflect and amplify biases in existing data. He highlights the risk that LLMs may "hallucinate" or produce misleading information, usually due to incomplete or biased training data. He also stresses the importance of users applying critical thinking to LLM output, and the irreplaceable role of experts in correcting model bias and verifying accuracy.

20:02

🤖 LLMs as a bridge to business data

The conversation explores how LLMs can act as a bridge between the business and its data, especially for data-driven decision-making and data storytelling. Applying LLMs to internal enterprise datasets can improve information search and summarization and support new product research and market analysis. Anthony also cautions that free LLM services raise data privacy concerns and recommends paid services to protect enterprise data.

25:03

📊 RecordPoint的角色与贡献

介绍了RecordPoint如何利用LLMs增强数据治理,通过自动化数据分类、管理和控制来提高效率。公司开发了大量连接器,使客户能够轻松地将LLMs应用于不同的数据源,如Salesforce和Twitter。Anthony强调了人工检查和模型训练的重要性,确保数据处理的准确性和符合规范性。

30:04

🌟 Future trends in data governance

A discussion of where data governance and LLM applications are heading, anticipating how LLMs will keep transforming data management and analysis. Anthony predicts that as the technology matures, LLMs will be widely integrated into business processes, improving decision-making efficiency and data utilization. He compares the rise of LLMs to the revolutionary impact of the internet, foreshadowing profound change in the data governance field.

35:05

🔑 Choosing LLM tools, with examples

Anthony shares his personal preferences when using LLM tools, comparing the strengths and uses of ChatGPT and Bard. He explains that he chooses different LLM tools for different scenarios depending on the kind of information and processing required. Concrete examples show LLMs being used to draft business plans and reports, demonstrating their value in boosting productivity and information access.

Keywords

💡 Data governance

Data governance is the process of managing and controlling data to ensure its quality and security. The video discusses combining data governance with large language models (LLMs) such as ChatGPT, emphasizing their potential for classifying and summarizing data and achieving compliance, improving efficiency, and reducing risk. For example, applying a large language model to automatically classify data helps enable tighter data governance.

💡 Large language models (LLMs)

Large language models such as ChatGPT are deep-learning models that can generate text and understand and answer questions. The video explains them as mathematical models that approximate the human brain, using neural networks to process large amounts of data and generate grammatically correct text relevant to the input. Their applications in data governance, such as automated classification and summarization, show how they help handle large volumes of data.

💡 Data classification

Data classification is the process of sorting data into categories based on its characteristics, use, or other criteria. The video highlights LLMs' ability to classify data, such as automatically generating labels from content, which can greatly improve how well data is understood and managed. For example, automatically identifying and tagging the sensitivity level of data helps an organization comply with data protection regulations.
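One practical wrinkle with LLM-driven classification is keeping the model's answers inside the agreed label set, since models can invent categories. A minimal sketch, assuming a hypothetical four-level sensitivity ontology:

```python
from typing import Optional

def normalize_label(raw_reply: str, ontology: set) -> Optional[str]:
    """Map a free-text model reply onto a closed label set, or return
    None if the reply is not a known category (e.g. a hallucinated label
    that should be routed to human review)."""
    cleaned = raw_reply.strip().strip(".").lower()
    for label in ontology:
        if cleaned == label.lower():
            return label
    return None

# Hypothetical sensitivity ontology, invented for illustration.
ONTOLOGY = {"Public", "Internal", "Confidential", "Restricted"}

print(normalize_label("confidential.\n", ONTOLOGY))  # → Confidential
print(normalize_label("Top Secret", ONTOLOGY))       # → None
```

Validating replies this way keeps the governance vocabulary fixed even when the model's wording drifts.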

💡 Compliance

Compliance means ensuring business activities conform to relevant laws, policies, and standards. The video notes that LLMs can help organizations discover, govern, and control data more effectively to meet tighter compliance requirements. By automatically classifying and processing data, LLMs help organizations better understand their data and comply with data protection laws such as GDPR.

💡 Risk reduction

Risk reduction means lowering the likelihood of potential harm through various measures. In the video, augmenting data governance with large language models helps organizations identify and manage data-related risks more effectively, for example by identifying sensitive data and ensuring it is handled appropriately, reducing the risk of data breaches or non-compliance.

💡 Efficiency gains

Efficiency gains mean accomplishing more with fewer resources or less time. The video discusses how LLMs improve data governance efficiency by automating large-scale data tasks such as classification and summarization. For example, using LLMs to auto-generate reports or surface data insights saves time and speeds up decision-making.

💡 Summarization

Summarization is the ability to compress large amounts of information into a short, digestible form. The video emphasizes LLMs' ability to automatically summarize data in a governance context, which helps surface key insights quickly and supports more effective decisions and reporting.

💡 Privacy

Privacy is the right to keep personal information from unauthorized access or disclosure. The video discusses how LLMs can identify and handle sensitive personal information, and their potential to help organizations comply with data privacy regulations, for example by identifying data that contains personal information and handling it appropriately to protect privacy.

💡 Entity recognition

Entity recognition is the process of automatically identifying and classifying meaningful information in text (such as names of people, places, and organizations). The video notes that LLMs can perform entity recognition, which matters for data governance because it helps identify and tag data about specific individuals or entities so that information can be better managed and protected.
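For contrast with what the episode describes, a purely rule-based entity pass only catches entities with regular surface patterns; LLM-based recognition goes further, to entities such as health details or ethnicity signals that match no pattern. The patterns and entity types below are invented for illustration:

```python
import re

# A deliberately simple, pattern-based stand-in for an entity pass.
# Only entities with regular shapes (emails, phone numbers) are catchable
# this way; an LLM handles far subtler, pattern-free entities.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def find_entities(text: str):
    """Return (entity_type, matched_text) pairs found in the text."""
    found = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((kind, match.group()))
    return found

print(find_entities("Contact jane@example.com or +1 206 555 0100."))
```

The gap between what these two patterns catch and what a privacy review actually needs is exactly where the episode argues LLMs add value.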

💡 Sentiment analysis

Sentiment analysis is the process of determining the emotional tone of text (positive, negative, or neutral). The video discusses using LLMs to detect the sentiment of data, which can help organizations understand the emotion behind data such as customer feedback and adjust strategy or products accordingly.
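For comparison with LLM-based sentiment analysis, a naive lexicon approach looks like the sketch below. The word lists are invented, and real LLM sentiment uses context rather than word counts, which is why it handles cases like the hospital example in the episode far better:

```python
# A tiny lexicon-based stand-in for sentiment scoring. The word lists
# are invented for illustration; an LLM uses context, not word counts.
POSITIVE = {"improved", "recovered", "successful", "good"}
NEGATIVE = {"negative", "failed", "deteriorated", "adverse"}

def sentiment(text: str) -> str:
    """Classify text as positive/negative/neutral by lexicon hits."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("patient recovered and outcome was good"))  # → positive
print(sentiment("treatment failed with adverse outcome"))   # → negative
```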

💡 Bias and misinformation

Bias and misinformation refer to inaccurate or discriminatory output caused by imperfect data or models. The video notes that while LLMs offer many benefits for data governance, they can also produce misleading results due to biases in their training data, underscoring the importance of monitoring and correcting those biases.

Highlights

Introduction of Anthony Woodward, CEO of RecordPoint, and discussion on augmenting data governance with ChatGPT and other large language models.

Anthony Woodward shares his hobbies, including a passion for coffee and snowboarding, illustrating his personal connection to the topics of discussion.

Explanation of large language models (LLMs) as AI that excels in generating text, holding conversations, and producing grammatically correct essays.

The potential of LLMs in classifying data, summarizing information, and assisting in data governance by applying classifications and ontologies.

Discussion on the role of LLMs in enhancing data governance, including their ability to improve data classification and summarize data points.

Exploration of how LLMs can contribute to data governance by recognizing entities and analyzing sentiment, particularly in privacy-sensitive contexts.

Addressing the capabilities of LLMs in generating ontologies and classifying data based on both predefined ontologies and content-derived ones.

The conversation highlights the risks associated with LLMs, including biases and the potential for generating inaccurate or misleading information.

Insights into using LLMs as a bridge between business and data, enhancing storytelling with data, and providing a more nuanced understanding of business data.

A word of caution on the different privacy implications between the free and paid versions of OpenAI's services.

Discussion on RecordPoint's approach to leveraging LLM technology for data governance, emphasizing the importance of a feedback loop for verifying classification.

Insight into how RecordPoint's unique application of LLMs enables tailored data governance solutions that respect customer privacy and data integrity.

The potential of LLMs to build policies and rules based on specific ontologies, such as GDPR, and apply these to classified data for enhanced governance.

A look into the future of data governance with LLMs, suggesting a significant acceleration in capability and potential applications across various business processes.

Closing remarks on the evolution of LLMs and their impact on data governance, with Anthony Woodward sharing his preference between ChatGPT and Bard based on their respective strengths.

Transcripts

play00:01

[Music]

play00:03

hello everybody everyone welcome on the

play00:06

lights on data show today we're going to

play00:08

talk about augmentation of data

play00:10

governance with ChatGPT and other large

play00:13

language models

play00:15

I love it it's a great great topic and

play00:18

very very relevant today our wonderful

play00:22

guest today is Anthony Woodward Anthony

play00:25

Woodward is the co-founder and CEO of

play00:27

RecordPoint, a fast growing software

play00:30

service solution focused on helping

play00:33

organizations discover govern and

play00:35

control their data for tighter

play00:38

compliance more efficiency and less risk

play00:42

Anthony is regarded as one of the

play00:45

leading thinkers on the intersection of

play00:47

data and privacy welcome Anthony it's a

play00:50

wonderful to have you on the show

play00:51

thank you thanks for having me

play00:54

now before we go into

play00:57

um this great topic today I wanted to

play01:00

ask you if you have some interesting

play01:03

hobbies that you would like to share

play01:04

with our listeners and if maybe you've

play01:07

discovered a new hobby since you moved

play01:09

all the way from Australia to Seattle

play01:12

yeah I know uh look I I bought one of my

play01:15

hobbies um with me and I certainly

play01:17

discovered a hobby as well

play01:19

um I share a a passion for coffee I

play01:21

wouldn't be very Australian if I wasn't

play01:24

feeding everyone over their head about

play01:25

how much better coffee is in Australia

play01:27

and how I think most Australians think

play01:29

we've discovered coffee which I know is

play01:31

very untrue but um you know certainly

play01:34

there's a view that we've perfected it

play01:35

so um I know I followed George on

play01:37

LinkedIn and see your your weekly posts

play01:40

on coffee and um if you head over to my

play01:43

LinkedIn I try to one-up you on that um

play01:45

with some some different uh you know

play01:47

siphon coffee or some other pieces so um

play01:51

certainly coffee is a strong passion and

play01:53

um I don't know how anybody survives a

play01:55

day without something funny

play02:00

we're in great company

play02:02

absolutely and the good news you know is

play02:05

um uh really having moved to Seattle

play02:07

which is the U.S capital of coffee they

play02:10

had some work to do I think they catch

play02:12

up on Melbourne uh but um I was able to

play02:14

at least a transplant

play02:16

um that that particular passion and

play02:17

continue it uh in Seattle and look you

play02:20

know I'm coming from Australia I I

play02:22

certainly um surfed a lot as a child and

play02:25

spent a lot of time out on the waves

play02:27

which is a lot harder in the Pacific

play02:28

Northwest

play02:29

um but at least being how cold it is in

play02:31

the ocean

play02:32

um so I'm a bit of an avid snowboarder

play02:35

these days it certainly transferred my

play02:37

um my board uh you know uh water but

play02:41

water in the water into the board in the

play02:43

snow so that that's uh that's certainly

play02:44

a piece out how to spend my weekends but

play02:46

so yeah I've been going for hours about

play02:48

that but but

play02:50

um those are might kind of be too

play02:52

obvious

play02:53

we love it it sounds very very exciting

play02:56

maybe you can go to Whistler sometime if

play02:58

you haven't already and to Tofino for

play03:01

surfing certainly certainly it may not

play03:03

always though and and planning to be out

play03:05

there again when the season rolls back

play03:07

around so

play03:09

wonderful

play03:11

well we're happy to to have you closer

play03:14

um so let's get into the topic and

play03:16

today's topic so our first question uh

play03:19

just to ease us into it

play03:21

um what are or what is a large language

play03:23

learning model

play03:25

yeah it's a great question you know

play03:27

there's a lot of hype out there uh how

play03:31

chat GPT and you know Bard and a bunch

play03:34

of other uh large language models but

play03:37

um if I can really break it down

play03:39

um to the level of Simplicity you know

play03:42

large language models are uh it's AI or

play03:46

you know A variation of machine learning

play03:49

in particular using a neural network and

play03:52

we'll get to that uh in a little bit in

play03:55

a second but they're just really

play03:58

um mathematical models that are

play04:02

astonishingly good at generating text so

play04:06

you know that what does that mean well

play04:07

it means that they can hold a

play04:09

conversation with you uh they can

play04:11

generate an essay they can

play04:15

um do that

play04:17

um in a way that is

play04:19

um you know really coherent to the

play04:21

context of the question you're asking

play04:23

them

play04:24

um and they can do that you know that is

play04:25

grammatically correct particularly in

play04:27

English I mean it's not talking about

play04:28

other languages at this stage they're

play04:30

not not as good

play04:32

um but what

play04:34

um were there really

play04:37

um are at the end of the day is a

play04:39

mathematical model that

play04:43

um really approximates the human brain

play04:46

in the sense of having interconnected

play04:49

neurons so if you think about a brain

play04:51

and how a brain works where there are

play04:53

neurons and there's a firing to connect

play04:55

different thoughts and processes

play04:58

together as a sort of conceptual model

play05:00

uh you know large language models are a

play05:03

way to use a neural network

play05:05

to transform

play05:07

weighted mathematical models into

play05:10

desired output as language and what that

play05:15

really comes down to is a really really

play05:18

large set of mathematical models that

play05:22

are able to generate and transform data

play05:26

from the statistical numbers it has for

play05:29

each of the words so all of that's kind

play05:31

of a lot of a lot of pieces but to

play05:33

really simplify that at the end of the

play05:34

day if you think about

play05:37

chat GPT or Google bard what they've

play05:40

done is being fed large data sets out

play05:43

of the internet they've broken that down

play05:46

to look at what are the patterns of

play05:48

language or words that go together that

play05:52

understood all right if I I'm asked a

play05:54

question about

play05:56

um a particular topic what is the kind

play05:59

of language I've seen on that topic and

play06:00

then give it back to you in a structured

play06:02

grammatically correct English way

play06:05

they're not able to think they're not

play06:07

able to

play06:09

um really go beyond

play06:12

um you know the recognition of entity so

play06:15

recognition of what are the things

play06:17

occurring around these words

play06:20

um and they're just able to then

play06:21

regurgitate these things within concept

play06:23

so you know the hype cycle around

play06:27

um them being able to you know turn into

play06:31

Skynet or something like that is

play06:33

very far off they are really just a very

play06:37

complex but

play06:39

um large set of data in a mathematical

play06:42

model

play06:43

okay that's good news about about uh not

play06:46

being there yet that's kind of level so

play06:49

we have a question here I love the way

play06:51

you explained it it was really very you

play06:55

know in a very easy to understand way

play06:57

more complex thing you know very easy to

play07:00

understand why thank you yeah I think we

play07:02

have a question here from Kate strasny

play07:03

and she's wondering then what problems

play07:06

are llms solving for data governance or

play07:09

you know what how can we use llm to

play07:12

augment our data governance efforts

play07:16

great question Kate and it's something

play07:18

that certainly

play07:19

um I think still going to evolve right

play07:21

we probably don't have all the answers

play07:22

today by

play07:24

um there are some really exciting areas

play07:26

that this opens up so

play07:28

you know really considering that that

play07:31

definition of large language models

play07:33

there are some things that they're

play07:34

really good at you know firstly

play07:36

classifying data so if it's able to

play07:39

understand

play07:41

different strings of text or potentially

play07:44

even different strings of data rather

play07:47

than just hacks then

play07:49

um you know we're able to quite

play07:50

accurately and reliably apply a

play07:53

classification or a series of different

play07:56

ontologies uh to that data and that gets

play08:00

super interesting because now we can tag

play08:02

the data up for different aspects of

play08:06

governance slash risk if I can think of

play08:09

both sides of the coin governance is as

play08:12

a result of risk that we've identified

play08:15

um you know to do things and really

play08:17

that's where things get interesting

play08:18

right because if we're able to

play08:21

understand those tags we can use a

play08:25

um statistical model that isn't

play08:27

pre-trained to look over a lot of data

play08:30

set that my needs and governance and

play08:31

apply that labeling

play08:33

but that's sort of job one job two is

play08:37

being able to summarize that data you

play08:39

know one of the problems I think we have

play08:40

in the data governance field is not just

play08:43

understanding

play08:45

you know what is the data how should we

play08:47

action the data what is the risk of that

play08:49

data but also how do we give it back to

play08:53

those that need it in forms that's

play08:56

consumable so you know

play08:59

um uh we we've traditionally had search

play09:02

engines and other processors to allow us

play09:04

to search and control through find

play09:07

things that that come up as hits and

play09:09

then go through that data the great

play09:11

thing about Live Language models is it

play09:13

is it's really great at summarizing

play09:17

um different data points but potentially

play09:19

even summarizing multiple data points

play09:21

within a relationship and that's where

play09:23

it gets really interesting so if you can

play09:25

imagine

play09:26

I have data it's classified within

play09:31

um you know an ontology of even risk um

play09:34

you know that allows us to look at you

play09:37

know a good classic example um depending

play09:39

on the jurisdiction you're in

play09:40

occupational health and safety OSHA in

play09:43

the US OHS and other parts of the world

play09:47

um and we can start to break down you

play09:50

know a summarization of all the risks

play09:52

inside the organization based on using a

play09:55

large language model and then output

play09:57

another ontology that can apply to

play09:59

classification this stuff all kind of

play10:00

Builds on itself I'm so it's really good

play10:02

at that it's also really good at

play10:05

question and answering so when you think

play10:07

again about the application of these

play10:10

tools you know going and asking

play10:13

um can you quantify uh how much of a

play10:17

thing do I have how much of information

play10:20

I have to provide for lawyers in a legal

play10:23

case is very very good at that and then

play10:25

then lastly you know the last two points

play10:28

um which I think really come to the area

play10:29

of privacy

play10:31

um and control it's really good at

play10:34

entity recognition but not even the sort

play10:36

of traditional entity recognition that

play10:38

we think about of people places

play10:40

organizations companies you know those

play10:43

are all pretty straightforward and

play10:45

although you know llms are really good

play10:47

at that entity recognition as you

play10:50

recognition to things like ethnicity

play10:53

um you know uh you know health issues

play10:56

um that might relate to a person so

play10:58

really getting more granular and

play11:00

thinking that about the risk of that

play11:02

data and how is your government you know

play11:04

that that's it within that and then

play11:06

connected to that is sentiment analysis

play11:09

so being able to look at the sentiment

play11:11

that wraps around it you know I was

play11:13

talking to a

play11:14

um a large customer today in the

play11:16

healthcare sector and they were talking

play11:19

about look I want to find all of the

play11:21

doctor reports that

play11:24

um are to do with a particular type of

play11:26

cancer that

play11:29

um have had a negative outcome for the

play11:31

patient

play11:32

um and we use a large language model to

play11:34

break that data down and blur through it

play11:37

obviously you know you've got to be very

play11:38

careful when you get get all the

play11:40

doctor's reports in a particular

play11:42

um Health in a particular hospital

play11:43

because there's got to be a bunch of

play11:45

these LLMs are really good at

play11:47

getting away from those false positives

play11:49

understanding the sentiment

play11:50

understanding the connectivity of those

play11:53

words and then bringing that data back

play11:55

um to be used in a meaningful way so you

play11:58

know very good at that kind of data

play12:00

mining content categorization and then

play12:02

summarizing a data right okay well we

play12:06

have some follow-up questions to this

play12:07

and uh the first one comes from Dan

play12:09

Everett and specifically on what you're

play12:11

mentioning on the that the llms can

play12:13

really help with that classification

play12:15

he's wondering then let me just bring it

play12:18

up here

play12:19

is it doing the classification based on

play12:21

ontologies that you have to build or

play12:24

are the ontologies being derived from the

play12:26

content itself

play12:27

what a great question then

play12:29

um the beauty of llms is

play12:33

um in in the world of AI there are a

play12:36

couple of different AI tools so all AI

play12:39

methods so there's supervised and

play12:42

unsupervised so supervised means that

play12:45

we've got a set ontology and we're

play12:48

training it we're going to train it to

play12:49

do a thing um it could be an ontology it

play12:52

could be some other uh Factor the thing

play12:55

about llms is they are somewhere in the

play12:58

middle of supervised and unsupervised

play13:00

in that they're kind of

play13:01

semi-supervised so you do need to give

play13:04

them some tokens to fine-tune the models

play13:08

but

play13:09

um it's it's fair the models by

play13:11

themselves are pretty good at uh

play13:14

building an unsupervised set of

play13:17

processes so what does that mean it

play13:20

means that they are really able to

play13:22

generate ontologies by themselves but

play13:26

they are better if you fine-tune them

play13:28

like if you want to get down to the

play13:30

higher levels of accuracy then you can

play13:33

fine tune them by training them and and

play13:35

manipulating

play13:37

um the models to be more effective but

play13:38

they will generate an ontology without

play13:40

that they'll you know they do have an

play13:42

understanding of English in particular

play13:44

and I'll keep saying that because

play13:45

they're not necessarily so good today at

play13:48

non-english and that will evolve and get

play13:51

better over time by

play13:53

um you know they are very good at

play13:55

building English based

play13:57

ontologies

play13:59

um and being able to do that

play13:59

classification themselves but you can

play14:02

make them better by doing the sort of

play14:04

what AI experts would call transfer

play14:06

learning where you're able to give you

play14:09

more specificity uh to the model

play14:12

um on the ontology you're trying to

play14:14

create

play14:15

and and uh my other question was you

play14:18

you're mentioning that you can really

play14:19

figure out different entities

play14:22

um from from that data so you mentioned

play14:24

ethnicity as an example so it doesn't

play14:25

mean that it's it's really extrapolating

play14:27

figure out the different attributes that

play14:30

uh would be based on everything else

play14:32

that you kind of have in there and

play14:33

determine oh you know you are you know

play14:36

this and that

play14:37

yeah so it's some really interesting

play14:41

um uh academic papers around

play14:44

um you know looking at even the use of

play14:46

language and what that indicates around

play14:49

uh your your ethnicity and background

play14:53

um and the types of words that you use

play14:56

you know I'm I grew up Australian so

play14:59

um we do use some funny

play15:01

um uh phrases and phraseology

play15:05

um that when we write and and talk about

play15:07

things

play15:08

um yeah I confuse the uh my American

play15:11

friends all the time with uh with things

play15:14

I say

play15:15

um and I think you know that will come

play15:16

through in my emails and my speech and

play15:19

and so the models are able to use that

play15:22

to then match where they've seen that

play15:24

you know because these large language

play15:25

models are large they've got you know

play15:27

English language from many places again

play15:29

they can attribute that back into a

play15:32

particular

play15:33

um Regional rationale or driver based on

play15:37

the language and the use of that

play15:38

language it does go further though where

play15:41

um you know you can also look at the um

play15:46

data about you particularly if you're

play15:48

feeding it with

play15:49

um privacy data or data around

play15:52

um the individuals that that are

play15:54

concerned and it can use those factoids

play15:56

to build a a picture of your ethnicity

play15:59

and background also

play16:01

well those are all incredible benefits

play16:04

as well

play16:05

yeah

play16:07

but of course there's risks uh I I would

play16:10

think I mean that last example that you

play16:13

mentioned it can definitely turn into a

play16:15

big risk do you are there any that come

play16:18

to mind that we should watch out for

play16:21

plus you're uh we talked about the good

play16:24

things about large language models in

play16:26

reality there there have already been

play16:28

some bad things

play16:30

um so you know and I think

play16:32

um it is worth saying that the you know

play16:35

I started off by saying it's not Skynet

play16:37

and and some of the proof points that

play16:39

it's not Skynet are that large language

play16:42

models are really prone to biases if not

play16:44

managed properly

play16:46

um you know they can effectively

play16:49

mathematically hallucinate

play16:52

um you know and lie and really that's

play16:54

just because it's a reflection of the

play16:56

data that they represent but they get

play16:58

confused

play17:00

um and they get confused because they

play17:01

see a a group of things and I say they

play17:05

see it because again I'm trying to refer

play17:06

them almost as if they're sentient and

play17:08

they're not

play17:09

um but but the model sees

play17:11

um you know a a mathematical set of

play17:15

Concepts that are connected those

play17:17

neurons again that we spoke about

play17:19

um but but in the same way that your

play17:22

mind can make things up and and give you

play17:25

you know

play17:26

um experience the concept you haven't

play17:28

actually experienced they you know the

play17:30

model can do the same and that's

play17:32

collected a whole bunch of facts so it

play17:33

crew it creates out of those facts or it

play17:36

doesn't actually it sees in those facts

play17:38

that aren't really joined a relationship

play17:41

and from that relationship it will give

play17:43

you a bias or it will give you a

play17:45

hallucination or maybe even the data

play17:47

itself that it was fed had a bias or a

play17:49

hallucination in it but when you look at

play17:52

it across the abstraction you can't see

play17:54

it as a set of individuals but when it's

play17:56

connected it's there so

play17:59

um you know that's something we really

play18:00

have to be very careful about like how

play18:02

factually accurate are they

play18:04

um you know how how

play18:07

um how are those models going to

play18:10

um be used and and what is the

play18:13

provenance behind the model so that you

play18:16

know you know how it's constructed and

play18:18

and what biases are going to lie inside

play18:21

that

play18:22

and also how can we maybe educate the

play18:25

users

play18:26

to you know in critical thinking when it

play18:29

comes to the results that ChatGPT brings

play18:32

up and this is in general also for

play18:34

Google searches right

play18:36

um You need to use your own critical

play18:39

mind to establish if those things that

play18:43

come up are actually trustworthy

play18:46

but I think with ChatGPT this makes it

play18:48

even more complex and more difficult

play18:51

yeah absolutely and it's you know it's

play18:54

why you know

play18:55

um we're always going to need experts

play18:57

because you know

play19:00

um they're still going to need to be

play19:01

some level of human triage of those

play19:03

things like the models are not good at

play19:07

understanding their own biases or

play19:09

self-reflecting you know being able to

play19:10

do that self-reflection they're just a

play19:12

mathematical model so of course they

play19:13

can't self-evaluate that seems

play19:15

self-evident but you know we do I mean

play19:18

even myself you're amazed by ChatGPT you

play19:20

go ask it a question it gives you this

play19:22

fantastic answer it appears sentient but

play19:25

but it's really not it's just this

play19:26

mathematical model that has no ability

play19:28

to self-reflect

play19:30

um so we can't think about whereas

play19:31

biases are and correct for that and

play19:33

let's just talk to so again the good

play19:36

thing about these models is they can be

play19:37

trained out of some of those behaviors

play19:39

and you know we're only at the beginning

play19:41

of the arm to do that well

play19:43

so

play19:47

um Petra here has has a question uh for

play19:50

you Anthony and he's wondering

play19:53

how do you see llms especially ChatGPT

play19:56

but not necessarily can be a bridge

play19:58

between business and data both in terms

play20:01

of being business slash data driven but

play20:04

also in telling stories about data and

play20:06

stories with data

play20:08

Yeah, I think what's really great about the large language models is that you can use them as a seed to match against your own business data. I talked before about the different ways we can use them from a data governance perspective, but it applies just as well to data sets inside the firewall, inside the business. There are some really great scenarios for using them as a much better search engine, because they're much better at that: better at summarizing information, better at understanding the context of information, and better at understanding the context of your question. So you can use them to bring a data set together. When you're asked, in a business context, to research a new product or a new capability, being able to look at all of the data already inside your business, summarize it, and develop that new product from it, for instance: they're very, very good at that.
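As a rough illustration of that "LLM as a better internal search engine" idea, a minimal retrieval step might rank in-house documents against a question before handing the top hits to a summarization model. Everything here, the scoring function and the document store alike, is a hypothetical sketch, not any product's actual implementation:

```python
# Minimal sketch: rank internal documents by term overlap with a question.
# In a real system the top hits would then be passed to an LLM to summarize.
def rank_documents(question: str, documents: dict[str, str], top_k: int = 2) -> list[str]:
    """Return the ids of the documents most relevant to the question."""
    q_terms = set(question.lower().split())
    scores = {
        doc_id: len(q_terms & set(text.lower().split()))
        for doc_id, text in documents.items()
    }
    # Highest overlap first; ties broken by document id for determinism.
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return ranked[:top_k]

docs = {
    "roadmap": "new product roadmap for the payments capability next quarter",
    "hr-policy": "leave policy and onboarding checklist for new employees",
    "research": "market research notes on a new payments product launch",
}

hits = rank_documents("research a new payments product", docs)
```

A production system would use embeddings rather than term overlap, but the shape is the same: retrieve the relevant slice of internal data first, then let the model summarize it.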

You can already do that today using OpenAI, or some of the APIs from Google, Facebook, and others. I will just add one word of caution: if you are going to use OpenAI and ChatGPT, please pay for it. Not that I'm in the business of recommending you pay them money, but there are really big differences in the privacy implications if you use the free version. We probably don't have time to cover it all in today's episode; it's a whole other episode by itself. But just as a word of warning: the privacy policy for the free version versus the paid version is very different. You are effectively sharing your data with OpenAI for free if you don't pay for it, and that can have some really big implications from a business perspective as well.

Very interesting. And yes, like you said, we're not going to get into it, but just to clarify: if you pay for it, so if you upgrade to the paid subscription, or if you're using their API to connect, is it either one, or just the API version, that would respect your privacy a little bit more?

Yeah, unless it's changed, I believe that in order to use their API in any meaningful way you actually have to pay for it anyway. There are some free elements to the API, but in order to properly use the API with business data you would need to pay.

Right, right. And it's reminding us that if it's a free service, then your data is the price.

Absolutely, absolutely. And look, as we've already seen with these large language models, the very large companies behind them are chomping through our data and using it to perfect their models, without attribution. There's a whole bunch of law yet to be written; you can see what's happening in Italy and other parts of Europe, where some real caveats are being raised around what they're doing. That's yet to play out; let's see how those cases develop over time.

So, Anthony, let's talk a little bit about RecordPoint too. Where does it come in? How can it help our data governance efforts?

We've been doing data governance as a SaaS solution for some time. We work with customers like New York City, the City of Seattle, and, there in British Columbia where you are, the liquor and cannabis board, for instance. What we've done for some time is classify data, to apply retention but also to understand its privacy context and how that data should be treated and handled over time. Where we've been doing a lot of work, where we've been adding to our tool, and where we've really been looking at large language models, is in improving that classification, that tagging. What we've done that we think is rather unique is build about a thousand-odd connectors to upstream systems, so you can feed these models, whether that's Salesforce, Twitter, data lakes like Snowflake, or just traditional places like file shares. We've really focused on three points. One is how you clean your data up: the very basic things of looking at redundant, obsolete, and trivial data, because in order for these large language models to work, we actually need better-cleansed data that has fewer biases and all those sorts of things.

So, absolutely, that data cleansing capability, or what we call data inventory. Two is looking at the categorization of data: which ontologies does it apply to? From a data governance perspective, that could well be a risk ontology, a record-keeping file plan ontology, or looking at different dimensions of privacy: who are the entities in this document, who does it refer to, what are the relationships? (I'm using the words "document" and "data" interchangeably here.) And three is providing control, so that the enterprise is able to act on all of those systems.
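Those three steps, inventory and cleansing, ontology-based categorization, then control, could be sketched as a toy pipeline. The ROT filter and the keyword classifier below are purely illustrative stand-ins for what a real product would do with trained models:

```python
# Toy pipeline for the three steps described: cleanse, categorize, control.
# All the rules below are illustrative placeholders, not real product logic.

ROT_MARKERS = ("draft-old", "tmp", "copy-of")  # redundant/obsolete/trivial hints

ONTOLOGY = {  # hypothetical ontology: category -> indicative terms
    "privacy": {"customer", "email", "address"},
    "finance": {"invoice", "payment", "ledger"},
}

def cleanse(items: dict[str, str]) -> dict[str, str]:
    """Step 1: drop redundant/obsolete/trivial (ROT) items by name markers."""
    return {
        name: text for name, text in items.items()
        if not any(marker in name for marker in ROT_MARKERS)
    }

def categorize(text: str) -> list[str]:
    """Step 2: tag an item with every ontology category whose terms it mentions."""
    words = set(text.lower().split())
    return sorted(cat for cat, terms in ONTOLOGY.items() if words & terms)

def control(tags: list[str]) -> str:
    """Step 3: map tags to a handling action (retention, review, and so on)."""
    return "restrict-and-review" if "privacy" in tags else "standard-retention"

items = {
    "tmp-notes": "scratch pad",
    "crm-export": "customer email address list for the payment follow-up",
}
kept = cleanse(items)
actions = {name: control(categorize(text)) for name, text in kept.items()}
```

In practice the categorization step is where the LLM sits; the cleansing and control steps around it stay deterministic so the outcomes can be audited.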

Doing that within a large language model is very much where we've been focused at RecordPoint: building very simple ways for a customer to leverage these technologies. Point and click: connect to a Salesforce, classify that Salesforce, manage that Salesforce.

And Anthony, I'm very curious: as you're leveraging this technology, do you have some sort of feedback loop? Is somebody verifying that classification, for example as a first round, with somebody from the business spot-checking it before it's good to go?

Yeah. One of the things we've always felt is important, though I don't think all our customers agree, is that our tool is semi-supervised. You can operate it in a completely unsupervised fashion and it will just go and do its thing; the accuracy comes down, but there are scenarios where that makes sense: I've got a large set of data, I just want to go and discover it, I just want it classified. It can do those things. But we also recognize, and one of the things we've really focused on and done a lot of R&D on, is the ability to override, or to have a human train the model. It's still the case that these large language models are very good, but as we said earlier, they have biases and problems, and we need humans to be able to write corrections back. We've done a lot in our tool to give you that capability: to say "it's not one of these" or "it is one of these" in a much stronger way, and then adjust the model based on your data set. That's a really important governance piece, because you need to be able to govern the model, not just govern the data.
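That human-in-the-loop override could look something like the sketch below: a reviewer can correct the model's label, and the corrections are retained as future training signal. This is a generic illustration, not RecordPoint's actual mechanism:

```python
# Generic human-in-the-loop sketch: reviewers override model labels,
# and the corrections are kept to retrain or adjust the model later.
class ReviewedClassifier:
    def __init__(self, predict):
        self.predict = predict                  # the automated model (any callable)
        self.corrections: dict[str, str] = {}   # item id -> human-approved label

    def classify(self, item_id: str, text: str) -> str:
        # A recorded human correction always wins over the raw model output.
        return self.corrections.get(item_id, self.predict(text))

    def override(self, item_id: str, label: str) -> None:
        """Record a reviewer's decision; this governs the model, not just the data."""
        self.corrections[item_id] = label

clf = ReviewedClassifier(predict=lambda text: "public")
first = clf.classify("doc-1", "internal salary data")   # model's (wrong) answer
clf.override("doc-1", "confidential")                   # reviewer corrects it
second = clf.classify("doc-1", "internal salary data")  # correction now wins
```

The stored `corrections` dictionary is the piece that makes the model governable: it is an auditable record of every place a human disagreed with the machine.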

And what RecordPoint is doing to help one client, would it be able to benefit another as well?

Yeah, again you get into some interesting legal issues. In terms of process, yes: we learn where people are tweaking the facts in the models, and we learn what seeds we should provide. But we don't share. In our case, every customer has a model to themselves, so their data never leaves their own tenancy within our SaaS solution and is confined to it. That way we're not sharing data across boundaries, because even though the data isn't trapped in a model, there are ways to reverse-engineer it, or you may end up with outputs that are raised based on another business if you shared these models. So we've developed a way to create, effectively, a seed model that doesn't have any of the business's own data in it but gets you started. On top of that you build with your own data over time, and it is completely separated from every other customer on our platform.
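The per-tenant setup Anthony describes, a shared data-free seed model that each customer then extends in isolation, could be sketched like this. The deep copy of a shared seed is an illustrative mechanism, not their actual architecture:

```python
# Sketch of tenant isolation: every tenant gets an independent copy of a
# neutral seed model and trains only on its own data. Illustrative only.
import copy

SEED_MODEL = {"vocabulary": set(), "examples": []}  # contains no customer data

class TenantModel:
    def __init__(self):
        # Each tenant starts from a private deep copy of the shared seed.
        self.state = copy.deepcopy(SEED_MODEL)

    def train(self, text: str) -> None:
        """Fold a tenant's own document into that tenant's model only."""
        self.state["examples"].append(text)
        self.state["vocabulary"].update(text.lower().split())

tenant_a = TenantModel()
tenant_b = TenantModel()
tenant_a.train("confidential merger memo")
# Tenant A's data never reaches tenant B's model or the shared seed.
```

The design point is the one Anthony makes: the only shared artifact is the seed, which by construction holds no customer data.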

Dan is wondering about a very specific application. If you have an LLM and an ontology based on GDPR, can you use the LLM to build policies and rules based on that ontology, classify data based on it, and apply the proper policies and rules to the classified entities?

LLMs by themselves won't do those things. They'll get close to giving you an answer: if you've given them the rules of GDPR and then asked a question about how GDPR applies, they will give you answers. But they won't take the specific policies of, say, data subject access requests (DSARs) and then track the time periodicity of supplying that data back to a customer. They won't do that out of the box. And thank you very much for the question, because it does lead back a little to what we do at RecordPoint. That's how we're augmenting those models: looking at those workflows and those specific processes in GDPR or CCPA or whatever the legislative regime is, and giving you the actions that come from that classification. That's very much the world we live in: how do you apply a set of regulatory actions to an AI outcome?
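The DSAR time tracking Anthony mentions is exactly the kind of deterministic workflow that sits around the model rather than inside it. A minimal sketch of the deadline bookkeeping, using GDPR's baseline one-month response window (extendable by two further months in complex cases) and approximating a month as 30 days for simplicity, might look like:

```python
# Minimal DSAR deadline tracker. GDPR Art. 12(3) requires a response within
# one month of receipt, extendable by two further months for complex requests.
# "One month" is approximated here as 30 days to keep the sketch simple.
from datetime import date, timedelta

RESPONSE_WINDOW = timedelta(days=30)

def dsar_due_date(received: date, extended: bool = False) -> date:
    """Return the date by which the data subject must receive a response."""
    window = RESPONSE_WINDOW * (3 if extended else 1)
    return received + window

def is_overdue(received: date, today: date, extended: bool = False) -> bool:
    """True once the response deadline has passed."""
    return today > dsar_due_date(received, extended)

due = dsar_due_date(date(2023, 5, 1))
late = is_overdue(date(2023, 5, 1), date(2023, 6, 2))
```

An LLM can classify a request as a DSAR; the clock-watching and the escalation when `is_overdue` flips are plain regulatory workflow, which is the augmentation layer being described.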

I understand all these benefits, and it's fantastic; it really makes my heart jump in a positive way, looking at all the possibilities and how the world is advancing. It's definitely removing a lot of the hassle for organizations in dealing with some of these data governance aspects. But do you feel we should still involve stakeholders from across the business in data governance efforts, now that ChatGPT and LLMs and RecordPoint are doing a lot of the groundwork?

Absolutely. You have to bring the business owners and the stakeholders along on this journey. The thing about AI is that it is a big leap; it is really accelerating our ability to do things at a great pace that previously took a lot longer and were much more difficult. But you can't just take that leap off in a corner, because if you do, you've jumped a dimension, if you will, and nobody else came with you. There are so many benefits here for the business, in how we're organizing data and making data more effective, that you'd want to share them: you'd want to bring your stakeholders along and integrate them into it. What's really great about the public discourse around OpenAI and ChatGPT and the rest of it is that there's a real awareness of wanting to have that conversation, and of seeing this as a real mechanism to make people's lives and jobs easier. What we've always focused on at RecordPoint is how to abstract as much of the data governance and risk decisions away from the user as possible, so it doesn't burden them. The really great thing about LLMs is that we've now taken that leap forward, and I think that's the starting point of the conversation: I'm not here to add something more to your day; I'm actually here to take something away from your day and make it more effective and better for you. When you have that conversation, it's really productive.

Right, and it frees up the minds of the stakeholders so they can focus on the strategic decisions, which is awesome. It's great to bring them in, because now they have more capacity to focus on that. I wanted to ask, Anthony: how do you see this evolution helping data governance? Where does this all go? You mentioned acceleration just now, so where are we heading?

Yeah, it's the million-dollar question. Look, we really are at a very early stage for what we've seen from ChatGPT and the really large language models out there. While they are going to revolutionize the processes we have, data governance is actually a really easy application for them; we're one of the first areas that will be able to take advantage of them very quickly. But there is a plethora of additional workloads and capabilities we're going to be able to use them for, because they're really good at understanding context, at summarization, and at capturing, well, "decisions" is a strong word, but supporting evidence for decisions. You're going to see them pop up more and more in the business processes we all have: being able to augment a workflow, say onboarding a patient to a hospital, by bringing that data set in, summarizing the kinds of problems they've had in the past, really bringing the facts and making them bubble to the top in a timely manner in that context. That's a very simple example, but it really applies to everything, so we will see them roll out in systems everywhere. I think in ten to fifteen years' time, every technology will have some AI or LLM component to it. I certainly see LLMs, and what's happened with AI here, as just as revolutionary as the internet, if not more revolutionary than the internet.

Yeah, the future is very exciting. Lastly, I want to take this question from Ravid. He's wondering: what do you use most, ChatGPT or Bard?

I'm really oscillating between the two. They both have some strengths and weaknesses, as you have probably experienced. If I had to advise, taking today as an example, I used Bard a little bit more today. I found Bard just a touch better: today I was working on a new business plan and some board reports, and I was looking for some data to bring into that, and Bard is a little better at bringing in statistical information and those sorts of things, so I've leaned in that direction. But these models are evolving so quickly. OpenAI put out GPT-4, and there are others coming in the pipeline that we'll see come out, so if you asked me that same question again tomorrow, I'd probably give you a different answer. But it's a great question.

Thank you, Anthony. Well, thank you so much for being on the show and putting the lights on the augmentation of data governance with ChatGPT.

Thanks for having me, it was a lot of fun.

Anthony, where can people find you, and also find out more about RecordPoint?

I'm usually on a plane over the Pacific somewhere. But look: I'm Anthony Woodward on LinkedIn and many other places; you can find me on pretty much any social network, and I'm relatively active. If you hit recordpoint.com, we have a whole bunch more description on our website, and a number of blog articles on large language models and ChatGPT and how we're using AI in those processes. From there, if you use one of the contact options, you'll be able to talk to myself or one of the team. I'm always happy to have a reach-out on Twitter, where I'm @woodwas, or hit me up on LinkedIn; always happy to have a chat with anybody who's interested in this topic.

Oh, thank you. All right, well, thank you so much, everybody, for joining in on this exciting topic, and we're looking forward to seeing you next week.

And thank you very much for the questions, everyone; really good questions and very good engagement.

Thank you, and have a wonderful day. Thank you.
