Augmenting Data Governance with ChatGPT and Other Large Language Models
Summary
TLDR: In this episode of the Lights on Data Show, we welcome Anthony Woodward, co-founder and CEO of RecordPoint, for a deep dive into how data governance can be augmented with ChatGPT and other large language models. Anthony shares his definition of large language models and explains how they help organizations manage data more effectively, reduce risk, and improve compliance. He also discusses LLM applications in data classification, summarization, question answering, entity recognition, and sentiment analysis, along with the risks and challenges of using these models. The conversation gives viewers a deeper understanding of how to apply these emerging technologies to optimize the data governance process.
Takeaways
- 😀 Anthony Woodward is the co-founder and CEO of RecordPoint, a fast-growing software-as-a-service solution that helps organizations discover, govern, and control their data for tighter compliance, greater efficiency, and lower risk.
- 📊 Large language models (LLMs) such as ChatGPT are discussed as a way to augment data governance; built on neural networks, they can generate text, hold conversations, write essays, and more.
- ☕ Anthony shares his passion for coffee and how, after moving to Seattle, he has kept up that hobby and taken up snowboarding as a new one.
- 🧠 LLMs are mathematical models that process and generate language in a way loosely analogous to the interconnected neurons of the human brain, which makes them remarkably good with language.
- 🔍 LLM applications in data governance include data classification, summarization, question answering, entity recognition, and sentiment analysis, all of which are critical for managing risk and compliance.
- 🤖 Despite their power, LLMs are prone to bias and need management and tuning to ensure accuracy and fairness.
- 💡 LLMs can generate classifications and ontologies without a predefined ontology, though fine-tuning improves their accuracy.
- 🚀 RecordPoint uses LLMs to make data classification and tagging more efficient, and provides a feedback loop that lets humans verify and improve classification results.
- 📈 LLMs are expected to dramatically accelerate data governance and other business processes, signaling that AI will become a core component of future technology.
- 🔑 Anthony cautions that the free versions of LLMs such as ChatGPT may conflict with data privacy policies, and recommends the paid version or the API for business use.
- 🌐 Future LLM development will deepen their role in data governance, privacy protection, and business decision support, while demanding continued attention to data security and privacy.
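The summarization capability these takeaways mention is commonly implemented with a map-reduce pattern: chunk the data, summarize each chunk, then summarize the summaries. A minimal, hedged sketch of that pattern; `call_llm` is a hypothetical stand-in for a real LLM API call (ChatGPT, Bard, etc.), stubbed locally so the example is self-contained and runnable:

```python
# Map-reduce summarization sketch. `call_llm` is a hypothetical stand-in
# for a real LLM API call, stubbed here as "return the first sentence"
# so the example runs without any external service.

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call, stubbed as first-sentence extraction."""
    text = prompt.split("TEXT:", 1)[-1].strip()
    return text.split(". ")[0].rstrip(".") + "."

def chunk(text: str, size: int = 500) -> list[str]:
    """Naive fixed-size chunking; real systems split on semantic boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def summarize(document: str) -> str:
    """Summarize each chunk, then summarize the combined partial summaries."""
    partials = [call_llm("Summarize briefly. TEXT: " + c) for c in chunk(document)]
    return call_llm("Summarize briefly. TEXT: " + " ".join(partials))
```

The two-stage shape is what lets a bounded-context model cover a data set far larger than its prompt window.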
Q & A
Who is Anthony Woodward?
-Anthony Woodward is the co-founder and CEO of RecordPoint, a fast-growing software-as-a-service company focused on helping organizations discover, govern, and control their data for tighter compliance, greater efficiency, and lower risk.
What are large language models (LLMs)?
-Large language models are a variant of AI or machine learning, specifically using neural networks; they are mathematical models that are remarkably good at generating text. That means they can hold conversations, write essays, stay relevant to the context of the question asked, and remain grammatically correct.
How can large language models help data governance?
-Large language models can help data governance by: 1. classifying data, 2. summarizing data, 3. answering questions, 4. recognizing entities, 5. analyzing sentiment. These capabilities can be used to tag data, condense information, and act on it according to risk and governance requirements.
Can large language models generate ontologies automatically?
-Large language models sit between supervised and unsupervised learning. They can generate ontologies on their own to a degree, but fine-tuning the model improves accuracy. Although they can produce English-based ontologies unaided, achieving high precision for a specific ontology requires customized training.
What risks do large language models face when handling personal data?
-Risks include bias, mathematical hallucination, and misleading output. These models can reflect the biases of the data they represent, or become confused and produce spurious connections and incorrect results.
How should critical thinking be applied when using large language models?
-Users need to develop the critical-thinking skills to judge whether a model's output is trustworthy. That includes understanding the model's limitations, recognizing potential biases, and weighing multiple factors in decision making.
How does RecordPoint use large language models to improve data governance?
-RecordPoint uses large language models to classify and tag data, understand its privacy context, and provide control over how data is handled. The company focuses on data cleansing, classification, and control across the different systems in an enterprise.
Can large language models directly apply a GDPR ontology to build policies and rules?
-Not by themselves. They can offer suggestions on how GDPR should be applied, but actually implementing specific data protection and privacy management measures requires more specialized tools and processes.
What is the future direction of large language models?
-LLMs are expected to be integrated deeply into all kinds of technologies and business processes, improving the efficiency of information summarization, decision support, and data processing. They will play a key role in future technology solutions, affecting areas from data governance to customer service.
Which large language model does Anthony Woodward prefer?
-Anthony Woodward has no fixed preference; depending on the situation he uses either ChatGPT or Bard. He sees strengths and weaknesses in both, and his choice may change as the models continue to be updated and evolve.
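As an illustration of the first capability (classification), a governance pipeline might ask an LLM to pick a label from a fixed ontology and attach it as a tag that downstream retention and risk rules can act on. This is a hedged sketch, not RecordPoint's implementation; `llm_classify` is a hypothetical stand-in for an LLM call, stubbed with keywords so the flow is runnable, and the ontology labels are illustrative:

```python
# Sketch of LLM-based classification for governance tagging.
# `llm_classify` would normally send a prompt like
# "Which of {labels} best fits this text?" to a real LLM;
# here a keyword stub stands in so the example is self-contained.

ONTOLOGY = ["financial-record", "health-record", "general-correspondence"]

def llm_classify(text: str, labels: list[str]) -> str:
    """Hypothetical LLM call, stubbed with a trivial keyword heuristic."""
    lowered = text.lower()
    if any(w in lowered for w in ("invoice", "payment", "tax")):
        return "financial-record"
    if any(w in lowered for w in ("diagnosis", "patient", "cancer")):
        return "health-record"
    return "general-correspondence"

def tag_for_governance(docs: list[str]) -> list[dict]:
    """Attach an ontology label to each document so retention and risk
    rules can act on the tag rather than the raw text."""
    return [{"text": d, "label": llm_classify(d, ONTOLOGY)} for d in docs]
```

The key design point is that the tag, not the raw content, becomes the unit that governance policy operates on.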
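To make that distinction concrete: the LLM can supply a classification, but enforcing obligations like retention periods belongs to a separate, deterministic rules layer. A sketch of such a layer, with illustrative rules and retention periods that are assumptions for the example, not legal guidance:

```python
# Tiny rules engine mapping a classification label to a retention action.
# The labels, periods, and actions are illustrative only; a real GDPR
# implementation would encode the organization's vetted policy.
from datetime import date, timedelta

RULES = {
    "personal-data": {"retention_days": 365, "action_after": "delete"},
    "financial-record": {"retention_days": 7 * 365, "action_after": "archive"},
}

def evaluate(label: str, created: date, today: date) -> str:
    """Return the action a governance system should take for one item."""
    rule = RULES.get(label)
    if rule is None:
        return "review"          # unclassified data needs human triage
    expiry = created + timedelta(days=rule["retention_days"])
    return rule["action_after"] if today >= expiry else "retain"
```

The LLM's job ends at producing `label`; the policy outcome is then fully auditable and repeatable, which the model alone cannot guarantee.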
Outlines
🎉 Introduction and Guest
This episode focuses on augmenting data governance with large language models such as ChatGPT. Featured guest Anthony Woodward, co-founder and CEO of RecordPoint, digs into the state of data governance and its intersection with privacy. Anthony shares his background, including his move from Australia to Seattle, his love of coffee, and his interest in surfing and snowboarding.
🔍 An Introduction to Large Language Models
Anthony introduces large language models (LLMs) such as ChatGPT and Google Bard, emphasizing their powerful text-generation capabilities: holding conversations, writing essays, and more. He explains how these models learn language patterns from massive internet datasets and keep their responses grammatically correct. Despite their impressive performance, Anthony notes they cannot yet understand or mirror complex human thought.
🛠️ Applying LLMs to Data Governance
The discussion covers potential uses of LLMs in data governance: classifying data, summarizing it, answering questions, recognizing entities, and analyzing sentiment. Anthony explains how LLMs help tag data, make governance more efficient, and support decision making by analyzing and summarizing data. He also explores privacy and control applications, such as identifying details related to individuals (like ethnicity or health information) and analyzing the sentiment of data.
🚀 Challenges of LLMs in Practice
Anthony discusses the challenges of applying LLMs, especially their tendency to reflect and amplify biases in existing data. He highlights the risk that LLMs may "hallucinate" or produce misleading information, usually due to incomplete or skewed training data. He also stresses the importance of users' critical thinking when consuming LLM output, and the irreplaceable role of experts in correcting model bias and verifying accuracy.
🤖 LLMs as a Bridge Between Business and Data
The conversation explores how LLMs can bridge business and data, particularly for data-driven decisions and data storytelling. Applied to internal enterprise datasets, LLMs can improve search and summarization, supporting new product development and market analysis. Anthony also cautions about data privacy when using free LLM services, recommending paid services to protect business data.
📊 RecordPoint's Role and Contribution
An introduction to how RecordPoint uses LLMs to strengthen data governance, automating data classification, management, and control for greater efficiency. The company has built a large number of connectors so customers can easily apply LLMs to data sources such as Salesforce and Twitter. Anthony emphasizes the importance of human review and model training to keep data processing accurate and compliant.
🌟 The Future of Data Governance
A discussion of where data governance and LLM applications are headed, anticipating how LLMs will keep transforming data management and analysis. Anthony predicts that as the technology advances, LLMs will be integrated into a broad range of business processes, improving decision speed and data utilization. He compares the rise of LLMs to the revolutionary impact of the internet, signaling deep changes ahead for the data governance field.
🔑 Choosing LLM Tools, with Examples
Anthony shares his personal preferences among LLM tools, comparing the strengths and uses of ChatGPT and Bard. He explains how he picks different tools for different scenarios depending on the information and processing capability required. Through concrete examples, he shows how LLMs help write business plans and reports, demonstrating their value for productivity and information access.
Keywords
💡Data Governance
💡Large Language Models (LLMs)
💡Data Classification
💡Compliance
💡Risk Reduction
💡Efficiency
💡Summarization
💡Privacy
💡Entity Recognition
💡Sentiment Analysis
💡Bias and Misinformation
Highlights
Introduction of Anthony Woodward, CEO of RecordPoint, and discussion on augmenting data governance with ChatGPT and other large language models.
Anthony Woodward shares his hobbies, including a passion for coffee and snowboarding, illustrating his personal connection to the topics of discussion.
Explanation of large language models (LLMs) as AI that excels in generating text, holding conversations, and producing grammatically correct essays.
The potential of LLMs in classifying data, summarizing information, and assisting in data governance by applying classifications and ontologies.
Discussion on the role of LLMs in enhancing data governance, including their ability to improve data classification and summarize data points.
Exploration of how LLMs can contribute to data governance by recognizing entities and analyzing sentiment, particularly in privacy-sensitive contexts.
Addressing the capabilities of LLMs in generating ontologies and classifying data based on both predefined ontologies and content-derived ones.
The conversation highlights the risks associated with LLMs, including biases and the potential for generating inaccurate or misleading information.
Insights into using LLMs as a bridge between business and data, enhancing storytelling with data, and providing a more nuanced understanding of business data.
A word of caution on the different privacy implications between the free and paid versions of OpenAI's services.
Discussion on RecordPoint's approach to leveraging LLM technology for data governance, emphasizing the importance of a feedback loop for verifying classification.
Insight into how RecordPoint's unique application of LLMs enables tailored data governance solutions that respect customer privacy and data integrity.
The potential of LLMs to build policies and rules based on specific ontologies, such as GDPR, and apply these to classified data for enhanced governance.
A look into the future of data governance with LLMs, suggesting a significant acceleration in capability and potential applications across various business processes.
Closing remarks on the evolution of LLMs and their impact on data governance, with Anthony Woodward sharing his preference between ChatGPT and Bard based on their respective strengths.
Transcripts
[Music]
hello everybody everyone welcome on the
lights on data show today we're going to
talk about augmentation of data
governance with chat GPT and other large
language models
I love it it's a great great topic and
very very relevant today our wonderful
guest today is Anthony Woodward Anthony
Woodward is the co-founder and CEO of
RecordPoint a fast growing software as a
service solution focused on helping
organizations discover govern and
control their data for tighter
compliance more efficiency and less risk
Anthony is regarded as one of the
leading thinkers on the intersection of
data and privacy welcome Anthony it's
wonderful to have you on the show
thank you thanks for having me
now before we go into
um this great topic today I wanted to
ask you if you have some interesting
hobbies that you would like to share
with our listeners and if maybe you've
discovered a new hobby since you moved
all the way from Australia to Seattle
yeah I know uh look I I brought one of my
hobbies um with me and I certainly
discovered a hobby as well
um I share a a passion for coffee I
wouldn't be very Australian if I wasn't
beating everyone over the head about
how much better coffee is in Australia
and how I think most Australians think
we've discovered coffee which I know is
very untrue but um you know certainly
there's a view that we've perfected it
so um I know I followed George on
LinkedIn and see your your weekly posts
on coffee and um if you head over to my
LinkedIn I try to one-up you on that um
with some some different uh you know
siphon coffee or some other pieces so um
certainly coffee is a strong passion and
um I don't know how anybody survives a
day without something funny
we're in great company
absolutely and the good news you know is
um uh really having moved to Seattle
which is the U.S capital of coffee they
had some work to do I think to catch
up on Melbourne uh but um I was able to
at least transplant
um that that particular passion and
continue it uh in Seattle and look you
know I'm coming from Australia I I
certainly um surfed a lot as a child and
spent a lot of time out on the waves
which is a lot harder in the Pacific
Northwest
um but at least being how cold it is in
the ocean
um so I'm a bit of an avid snowboarder
these days I've certainly transferred my
board sports you know from the
water into the snow so that that's
certainly a piece of how I spend my weekends but
so yeah I've been going for hours about
that but but
um those are might kind of be too
obvious
we love it it sounds very very exciting
maybe you can go to Whistler sometime if
you haven't already and to Tofino for
surfing certainly certainly it may not
always though and and planning to be out
there again when the season rolls back
around so
wonderful
well we're happy to to have you closer
um so let's get into the topic and
today's topic so our first question uh
just to ease us into it
um what are or what is a large language
model
yeah it's a great question you know
there's a lot of hype out there around
chat GPT and you know Bard and a bunch
of other uh large language models but
um if I can really break it down
um to the level of Simplicity you know
large language models are uh it's AI or
you know A variation of machine learning
in particular using a neural network and
we'll get to that uh in a little bit in
a second but they're just really
um mathematical models that are
astonishingly good at generating text so
you know that what does that mean well
it means that they can hold a
conversation with you uh they can
generate an essay they can
um do that
um in a way that is
um you know really coherent to the
context of the question you're asking
them
um and they can do that you know that is
grammatically correct particularly in
English I mean it's not talking about
other languages at this stage they're
not not as good
um but what
um were there really
um are at the end of the day is a
mathematical model that
um really approximates the human brain
in the sense of having interconnected
neurons so if you think about a brain
and how a brain works where there are
neurons and there's a firing to connect
different thoughts and processes
together as a sort of conceptual model
uh you know large language models are a
way to use a neural network
to transform
weighted mathematical models into
desired output as language and what that
really comes down to is a really really
large set of mathematical models that
are able to generate and transform data
from the statistical numbers it has for
each of the words so all of that's kind
of a lot of a lot of pieces but to
really simplify that at the end of the
day if you think about
chat GPT or Google Bard what they've
done is ingest large data sets out
of the internet they've broken that down
to look at what are the patterns of
language or words that go together that
understood all right if I I'm asked a
question about
um a particular topic what is the kind
of language I've seen on that topic and
then give it back to you in a structured
grammatically correct English way
they're not able to think they're not
able to
um really go beyond
um you know the recognition of entity so
recognition of what are the things
occurring around these words
um and they're just able to then
regurgitate these things within concept
so you know the hype cycle around
um them being able to you know turn into
Skynet or something else like that is is
very far off they are really just a very
complex but
um large set of data in a mathematical
model
okay that's good news about uh not
being there yet so
we have a question here I love the way
you explained it you know in a very easy
to understand way such a complex thing
thank you yeah I think we
have a question here from Kate strasny
and she's wondering then what problems
are llms solving for data governance or
you know what how can we use llm to
augment our data governance efforts
great question Kate and it's something
that certainly
um I think it's still going to evolve right
we probably don't have all the answers
today but
um there are some really exciting areas
that this opens up so
you know really considering that that
definition of large language models
there are some things that they're
really good at you know firstly
classifying data so if it's able to
understand
different strings of text or potentially
even different strings of data rather
than just hacks then
um you know we're able to quite
accurately and reliably apply a
classification or a series of different
ontologies uh to that data and that gets
super interesting because now we can tag
the data up for different aspects of
governance slash risk if I can think of
both sides of the coin governance is as
a result of risk that we've identified
um you know to do things and really
that's where things get interesting
right because if we're able to
understand those tags we can use a
um statistical model that isn't
pre-trained to look over a large data
set that might need governance and
apply that labeling
but that's sort of job one job two is
being able to summarize that data you
know one of the problems I think we have
in the data governance field is not just
understanding
you know what is the data how should we
action the data what is the risk of that
data but also how do we give it back to
those that need it in forms that's
consumable so you know
um uh we we've traditionally had search
engines and other processors to allow us
to search and control through find
things that that come up as hits and
then go through that data the great
thing about large language models is it
is it's really great at summarizing
um different data points but potentially
even summarizing multiple data points
within a relationship and that's where
it gets really interesting so if you can
imagine
I have data it's classified within
um you know an ontology of even risk um
you know that allows us to look at you
know a good classic example um depending
on the jurisdiction you're in
occupational health and safety OSHA in
the US OHS and other parts of the world
um and we can start to break down you
know a summarization of all the risks
inside the organization based on using a
large language model and then output
another ontology that can apply to
classification this stuff all kind of
Builds on itself I'm so it's really good
at that it's also really good at
question and answering so when you think
again about the application of these
tools you know going and asking
um can you quantify uh how much of a
thing do I have how much of information
I have to provide for lawyers in a legal
case is very very good at that and then
then lastly you know the last two points
um which I think really come to the area
of privacy
um and control it's really good at
entity recognition but not even the sort
of traditional entity recognition that
we think about of people places
organizations companies you know those
are all pretty straightforward and
although you know llms are really good
at that entity recognition also extends
to things like ethnicity
um you know uh you know health issues
um that might relate to a person so
really getting more granular and
thinking about the risk of that
data and how you govern it you know
that that sits within that and then
connected to that is sentiment analysis
so being able to look at the sentiment
that wraps around it you know I was
talking to a
um a large customer today in the
healthcare sector and they were talking
about look I want to find all of the
doctor reports that
um are to do with a particular type of
cancer that
um have had a negative outcome for the
patient
um and we use a large language model to
break that data down and work through it
obviously you know you've got to be very
careful when you get all the
doctor's reports in a particular
um Health in a particular hospital
because there's got to be a bunch of
these LLMs are really good at
steering away from those false positives
understanding the sentiment
understanding the connectivity of those
words and then bringing that data back
um to be used in a meaningful way so you
know very good at that kind of data
mining content categorization and then
summarizing data right okay well we
have some follow-up questions to this
and uh the first one comes from Dan
Everett and specifically on what you're
mentioning on the that the llms can
really help with that classification
he's wondering then let me just bring it
up here
is it doing the classification based on
ontologies that you have to build or are
the ontologies being derived from the
content itself
what a great question Dan
um the beauty of llms is
um in in the world of AI there are a
couple of different AI methods
so there's supervised and
unsupervised so supervised means that
we've got a set ontology and we're
training it we're going to train it to
do a thing um it could be an ontology it
could be some other uh Factor the thing
about llms is they are somewhere in the
middle of supervised and
unsupervised in that they're kind of
semi-supervised so you do need to give
them some tokens to fine-tune the models
but
um it's it's fair to say the models by
themselves are pretty good at uh
building an unsupervised set of
processes so what does that mean it
means that they are really able to
generate ontologies by themselves but
they are better if you fine-tune them
like if you want to get down to the
higher levels of accuracy then you can
fine tune them by training them and and
manipulating
um the models to be more effective but
they will generate an ontology without
that they'll you know they do have an
understanding of English in particular
and I'll keep saying that because
they're not necessarily so good today at
non-english and that will evolve and get
better over time but
um you know they are very good at
building English based
ontologies
um and being able to do that
classification themselves but you can
make them better by doing the sort of
what AI experts would call transfer
learning where you're able to give
more specificity uh to the model
um on the ontology you're trying to
create
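Short of the fine-tuning and transfer learning Anthony describes, a lightweight way to give a model more specificity on the ontology you're trying to create is few-shot prompting: embedding labeled examples directly in the prompt so the model imitates them. A hedged sketch that only builds the prompt string (sending it to a model is left out); the ontology labels and record texts are illustrative:

```python
# Few-shot prompt construction for ontology-specific classification.
# Labeled examples of the target ontology are embedded in the prompt;
# the model is expected to continue the pattern for the final record.

def few_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Build a prompt from (text, label) examples plus an unlabeled query."""
    lines = ["Classify each record into the ontology shown by example."]
    for text, label in examples:
        lines.append(f"Record: {text}\nLabel: {label}")
    lines.append(f"Record: {query}\nLabel:")   # model fills in the label
    return "\n\n".join(lines)

examples = [
    ("Quarterly tax filing for FY23", "financial-record"),
    ("Patient intake form, Ward 3", "health-record"),
]
prompt = few_shot_prompt(examples, "Supplier invoice, net 30")
```

Few-shot prompting sits between the fully unsupervised and fine-tuned regimes discussed above: no model weights change, but the ontology is still made explicit to the model.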
and and uh my other question was you
you're mentioning that you can really
figure out different entities
um from from that data so you mentioned
ethnicity as an example so does it
mean that it's it's really extrapolating to
figure out the different attributes that
uh would be based on everything else
that you kind of have in there and
determine oh you know you are you know
this and that
yeah so it's some really interesting
um uh academic papers around
um you know looking at even the use of
language and what that indicates around
uh your your ethnicity and background
um and the types of words that you use
you know I'm I grew up Australian so
um we do use some funny
um uh phrases and phraseology
um that when we write and and talk about
things
um yeah I confuse the uh my American
friends all the time with uh with things
I say
um and I think you know that will come
through in my emails and my speech and
and so the models are able to use that
to then match where they've seen that
you know because these large language
models are large they've got you know
English language from many places again
they can attribute that back into a
particular
um Regional rationale or driver based on
the language and the use of that
language it does go further though where
um you know you can also look at the um
data about you particularly if you're
feeding it with
um privacy data or data around
um the individuals that that are
concerned and it can use those factoids
to build a a picture of your ethnicity
and background also
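The entity-recognition and sentiment ideas discussed above, like the earlier healthcare example of finding doctor reports about a given cancer with a negative outcome, can be combined into a single filter. A self-contained sketch with stubbed scorers; in a real pipeline both steps would call an LLM or dedicated NER and sentiment models, and the keyword lists here are illustrative assumptions:

```python
# Combine entity recognition and sentiment analysis to filter documents,
# mirroring the "negative-outcome reports about one cancer type" example.
# Both scorers are trivial stubs standing in for model calls.

NEGATIVE = {"deteriorated", "relapse", "negative", "worsened"}

def mentions_entity(report: str, entity: str) -> bool:
    """Stub entity check; an LLM would also catch paraphrases and synonyms."""
    return entity.lower() in report.lower()

def sentiment(report: str) -> str:
    """Stub sentiment scorer based on a small negative-word list."""
    words = set(report.lower().split())
    return "negative" if words & NEGATIVE else "non-negative"

def find_reports(reports: list[str], entity: str) -> list[str]:
    """Keep only reports that mention the entity with negative sentiment."""
    return [r for r in reports
            if mentions_entity(r, entity) and sentiment(r) == "negative"]
```

The point of combining the two signals is exactly the false-positive reduction discussed in the conversation: entity match alone would return every screening report, not just the negative outcomes.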
well those are all incredible benefits
as well
yeah
but of course there's risks uh I I would
think I mean that last example that you
mentioned it can definitely turn into a
big risk do you are there any that come
to mind that we should watch out for
plus you're uh we talked about the good
things about large language models in
reality there there have already been
some bad things
um so you know and I think
um it is worth saying that the you know
I started off by saying it's not Skynet
and and some of the proof points that
it's not Skynet are that large language
models are really prone to biases if not
managed properly
um you know they can effectively
mathematically hallucinate
um you know and lie and really that's
just because it's a reflection of the
data that they represent but they get
confused
um and they get confused because they
see a a group of things and I say they
see it because again I'm trying to refer
them almost as if they're sentient and
they're not
um but but the model sees
um you know a a mathematical set of
Concepts that are connected those
neurons again that we spoke about
um but but in the same way that your
mind can make things up and and give you
you know
um experience the concept you haven't
actually experienced they you know the
model can do the same it's
collected a whole bunch of facts and it
creates out of those facts or rather
it sees in those facts a relationship
that isn't really joined
and from that relationship it will give
you a bias or it will give you a
hallucination or maybe even the data
itself that it was fed had a bias or a
hallucination in it but when you look at
it across the abstraction you can't see
it as a set of individuals but when it's
connected it's there so
um you know that's something we really
have to be very careful about like how
factually accurate are they
um you know how how
um how are those models going to
um be used and and what is the
prevalence behind the model so that you
know you know how it's constructed and
and what biases are going to lie inside
that
and also how can we maybe educate the
users
to you know in critical thinking when it
comes to the results that chat GPT brings
up and this is in general also for
Google searches right
um You need to use your own critical
mind to establish if those things that
come up are actually trustworthy
but I think with chat GPT this makes it
even more complex and more difficult
yeah absolutely and it's you know it's
why you know
um we're always going to need experts
because you know
um they're still going to need to be
some level of human triage of those
things like the models are not good at
understanding their own biases or
self-reflecting you know being able to
do that self-reflection they're just a
mathematical model so of course they
can't self-reflect that seems
self-evident but you know we do I mean
even myself you're amazed by chat GPT you
go ask it a question it gives you this
fantastic answer it appears sentient but
but it's really not it's just this
mathematical model that has no ability
to self-reflect
um so it can't think about where its
biases are and correct for that and
so again the good
thing about these models is they can be
trained out of some of those behaviors
and you know we're only at the beginning
of learning to do that well
so
um Petra here has has a question uh for
you Anthony and he's wondering
how do you see llms especially chat GPT
but not exclusively being a bridge
between business and data both in terms
of being business slash data driven but
also in telling stories about data and
stories with data
yeah I think what's really great about
the large language models is you can use
them as a seed
um to match against your own business
data so you know I talked before around
the different ways that we can use them
from a data governance perspective but
um it also can apply just as well to
data sets
inside the firewall inside
the business so the
um you know there are some really great
scenarios of using them as a much better
search engine again because they're much
better at that
um summarization of information and much
better at understanding the context of
information and much better understand
the context of your question
um so you know you can you know go ahead
and use them to bring that data set
together you know uh when you're asked
for you know in a business context of
being asked to research a new product or
research a new capability you know being
able to look at all of the data already
inside your inside your business
summarize that data and you know uh
develop that new product for instance
they're very very good at that
um so you you can already using open AI
using some of the apis from Google and
and others Facebook and others
um you know do that today I will say
just click one word of caution
um if you are going to go use OpenAI
and chat GPT please go pay for it not
that I'm recommending you pay them
money but there are really big different privacy
implications if you use the free version
we probably don't have time to cover it
all in the episode today for the whole
other Episode by itself uh but just a
word of warning the privacy policy for
the free version versus the paid version
is very different you're effectively
sharing your data for free with OpenAI if you
don't pay for it
um and that can have some really big
implications from a business perspective
as well
very interesting so and and yes like you
said we're not going to get into it but
just to clarify is it um if you pay for
it so if you uh upgrade yours you have
the P subscription or if you're using
their API to then connect is it either
or or just the API version that would
not would respect your privacy a little
bit more
yeah unless it's changed like I believe
in order to use their API in any
meaningful way you actually have to pay
for it anyway so it comes to the same
thing
um there are some free elements for the
API but um in order to properly use the
API with business data you would
need to pay right right and that as ever
is you know reminding us if it's a
free service then your data is the price
absolutely absolutely and look I think
you know as we've already seen again um
with these large language models you
know the very large companies behind
them are just chomping through our data
and and using it to perfect their models
and without attribution and you know
there's a whole a bunch of law to be
written you know seeing what's happening
in Italy and other parts of Europe
um where there are some real caveats
around what they're doing uh you know in
the world and that's yet to play out um
let's see how that case will
um develops over time
so so Anthony
um let's talk a little bit about
RecordPoint where where um where does it
come in how how can it help our data
governance efforts
uh we're at a point of um having done
data governance for some time we're a
SaaS solution uh you know we work with
people like New York City Seattle City
and
um there in British Columbia where you
are you know with um the the
liquor and cannabis board for instance
um the
um what what we've done for some time is
classify data to apply retention but
to also understand its privacy context
um how that data should be treated and
handled over time
where
um we've been doing a lot of work and
then where we've been adding to our tool
and where
um we'd be really looking at large
language models is really improving that
classification that tagging so what
we've done
um that we think's rather unique is
um we've built about a thousand odd
connectors to Upstream systems so you
can feed these models
um you know be that Salesforce be that
you know Twitter and and
um like data lakes like Snowflake or
just traditional places like file shares
and so we've really focused I guess on
three points one is how do you clean
your data up so the very basic things of
looking at redundant obsolete trivial
data because we want to you know in
order for these large language models to
work we actually need better cleansed
data that has less biases and all those
sorts of things
um so absolutely that that data
cleansing capability or what we call data
inventory
um two then looking at the
categorization of data what um
ontologies does it apply to so from a
data governance perspective that could
well be a risk ontology it could be a
record-keeping file plan ontology it
could be um you know looking at
different dimensions of privacy you know
who are the entities in this document
who does it talk to what are the
relationships or or data um you know
they use the word document and data
interchangeably here
um and then providing that control so
that the Enterprise is able to do that
in all of those systems
um within a large language model so
that's very much where we've been
focused at record point is really
building out
um very simple ways for a customer to
leverage these Technologies
um but point and click connect to a
Salesforce classify that Salesforce
manage that Salesforce and Anthony I'm
very curious if if as you're leveraging
this technology do you have some sort of
a feedback loop that your somebody is
sort of verifying that classification
for example as a first round and you
have somebody from the business and
they're like yeah yeah you know spot
checking and uh it's good to go
yeah some of the things that we've
always really felt um is important
um and I think our customers agree
um you can operate our tool again it's
it's semi-supervised
um so you can operate it in a completely
unsupervised fashion and it will just go
and do a thing and the accuracy comes
down but there are scenarios where that
makes sense I've got a large set of data
I just wanted to go and discover right I
just wanted to go classified it can go
do those things
but we also recognize and and I think
you know um one of the things that's
quite yeah that we've really focused on
and done a lot of
um development R&D on is the ability to
um override or have a human train the
model because it's it still is the case
these large language models are very
good
um but they still as we said earlier I
have biases and problems and we need
humans to be able to write back things
we've done a lot in our tool that relates to
giving you that capability go it's not one
of these or it is one of these
um in a much stronger way and then
adjust the model
um based on your data set and that's
that's a really important
um governance piece right because you
need to be able to govern the model not
just govern the data
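The override-and-retrain loop described here can be sketched as a small class: human corrections take precedence over the model's label and are queued as future training data, so you govern the model as well as the data. A hedged sketch; the names are illustrative, not RecordPoint's actual API:

```python
# Human-in-the-loop feedback sketch: overrides win over model output,
# and every correction is captured for later model adjustment.

class FeedbackLoop:
    def __init__(self, model_label_fn):
        self.model_label_fn = model_label_fn    # e.g. an LLM classifier
        self.overrides: dict[str, str] = {}     # doc_id -> human label
        self.training_queue: list[tuple[str, str]] = []

    def label(self, doc_id: str, text: str) -> str:
        """Human override takes precedence; otherwise ask the model."""
        if doc_id in self.overrides:
            return self.overrides[doc_id]
        return self.model_label_fn(text)

    def correct(self, doc_id: str, text: str, human_label: str) -> None:
        """Record a human correction and queue it for retraining."""
        self.overrides[doc_id] = human_label
        self.training_queue.append((text, human_label))
```

Keeping corrections in a queue, rather than silently patching outputs, is what makes the loop auditable: you can see exactly which human judgments shaped the model over time.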
and what record point is doing to help
one client would it be able to benefit
another as well
yeah again you get into some some
interesting legal issues so yes in terms
of process we we learn
um where people are tweaking the facts
in the models we learn
um what are the various seeds we should
provide we don't share in our case every
customer has a model to themselves so
their data never leaves their own
tenancy within our SAS solution
um and is confined to that so that way
we're not sharing data across sort of
boundaries because even though the data
isn't trapped in a model there are ways to
reverse engineer that or you know you
may end up with outputs that are
based on another another business if if
you shared these models so we've
developed a way to create effectively
a seed model that um doesn't have any of
the businesses own data in it but gets
you started
um and then on that you'll then build
with your own data over time and it but
it is completely um separated from every
other
um customer in in in our platform
Dan is wondering about this uh very
specific application if if you have the
llm that's um we have an ontology based
on gdpr
then can you use the LMS to build
policies and rules based on that
ontology classify that data based on it
you know and apply the proper policies and
rules to the classified entities
um llms by themselves won't do those
things
um they'll go close to giving you
answers you know if you've given them the
rules of uh of GDPR and then you've
looked at uh asking the question
around how GDPR applies
um they will give you answers but they
won't they won't
um do those things uh where you take the
specific policies of say data subject
access requests DSARs and then track
the time periodicity of supplying that
back um to a customer they won't do that
out of the box
And thank you very much for the question, because it does lead back a little bit to what we do at RecordPoint. That's how we're augmenting those models: looking at those workflows and those specific processes in GDPR or CCPA or whatever the legislative outcome is, and giving you the actions that come from that classification. That's very much the world we live in: how do you apply a set of regulatory actions to an AI outcome?
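The DSAR time-tracking Anthony says LLMs won't do out of the box is ordinary workflow logic, and a minimal sketch makes the contrast concrete. This is a hypothetical example, not RecordPoint's product: the `dsar_status` function and its status labels are invented for illustration, and GDPR's "one month" response window is approximated here as 30 days.

```python
from datetime import date, timedelta

# GDPR Article 12 requires responding to a data subject access
# request within one month; approximated here as 30 days.
RESPONSE_WINDOW = timedelta(days=30)

def dsar_status(received: date, today: date, responded: bool = False) -> str:
    """Classify a DSAR against its response deadline."""
    deadline = received + RESPONSE_WINDOW
    if responded:
        return "closed"
    if today > deadline:
        return "overdue"
    if deadline - today <= timedelta(days=7):
        return "due_soon"
    return "on_track"
```

A deterministic rule like this is what a governance platform layers on top of the LLM's classification output: the model can recognize that a document is a DSAR, but tracking the clock and triggering the regulatory action is conventional code.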
So I can understand all these benefits, and it's fantastic; it really makes my heart jump in a positive way, just looking at all the possibilities and how the world is advancing in this space. It's purely fantastic. It's definitely removing a lot of the hassle for organizations in how to deal with some of these data governance aspects. Do you feel it will still be able to, or should still, involve stakeholders from across the business in their data governance efforts, now that ChatGPT and LLMs and RecordPoint are really helping with a lot of the groundwork?
Absolutely, you have to bring the business owners and the stakeholders along on this journey. The thing about AI is that it is a big leap: it is really accelerating our ability to do things, at a really great pace, that previously took a lot longer and were much more difficult to do. But you can't just go take that leap in the corner, because if you take the leap in the corner you've kind of jumped a dimension, if you will, but nobody else came with you. So I think there are so many benefits here for the business, in how we're organizing data and thinking about making data more effective, that you'd want to share it; you'd want to bring your stakeholders along and integrate them into that. And what's really great about what's happened in the public discourse around OpenAI and ChatGPT and the rest of it is that there's a real awareness of wanting to have that conversation, and of seeing this as a real mechanism to make people's lives and jobs easier. What we've always focused on at RecordPoint is how we abstract as much of the data governance and risk decisions away from the user so they don't burden them. The really great thing about LLMs is that we've now taken that leap forward, and I think that's the starting point of the conversation: I'm not here to add something more to your day, I'm actually here to take something away from your day and make it more effective and better for you. When you have that conversation, it's really productive.
Right, and it frees up the minds of the stakeholders so they can focus on the strategic decisions, which is awesome. So it's great to bring them in, because now they have more capacity to focus on that. I wanted to ask, Anthony: how do you see this evolution helping data governance? Where does this all go? You mentioned acceleration just now, so where are we heading?
Yeah, it's the million-dollar question. Look, we really are at a very early stage for what we've seen from ChatGPT and the really large language models that are out there. Whilst they are going to revolutionize the processes we have, I think data governance is actually a really easy application for them; we're one of the first areas that will be able to take advantage of them very quickly. But there is a plethora of additional workloads and capabilities that we're going to be able to use them for, because they're really good at understanding context, and summarization, and capturing different, well, "decisions" is a strong word, but supporting evidence for decisions. You're really going to see them pop up more and more in the different business processes we all have. You'll be able to augment a workflow, say onboarding a patient to a hospital, by bringing in that data set, summarizing the kinds of problems they've had in the past, really bringing the facts and making them bubble to the top in a timely manner within that context. That's a very simple example, but it really applies to everything. We will see them roll out in systems everywhere. I think in 10 to 15 years' time every technology will have some AI or LLM component to it. I certainly see LLMs, and what's happened with AI here, as just as revolutionary as the internet, if not more revolutionary than the internet.
Yeah, the future is very exciting. Lastly, I want to take this question from Ravid. He's wondering: which do you use most, ChatGPT or Bard?
I'm really oscillating between the two. They both have some strengths and weaknesses, as you've probably experienced. Look, to take today as an example, I used Bard a little bit more today. I found Bard just a touch better: today I was working on a new business plan and some board reports, and I was looking for some data to bring into that, and Bard is a little better at bringing in statistical information and those sorts of things. So I've leaned in that direction, but these models are evolving so quickly. OpenAI put out GPT-4, and there are others in the pipeline that we'll see come out, so if you asked me that same question again tomorrow I'd probably give you a different answer. But it's a great question.
Thank you, Anthony. Well, thank you so much for being on the show and shining a light on the augmentation of data governance with ChatGPT.
Thanks for having me, it was a lot of fun.
Anthony, where can people find you, and also find out more about RecordPoint?
I'm usually on a plane over the Pacific somewhere. But look, I'm Anthony Woodward on LinkedIn and many other places; you can find me on pretty much any social network, and I'm relatively active. If you hit recordpoint.com, we have a whole lot more description on our website, and we have a number of blog articles on large language models and ChatGPT and how we're using AI in those processes. From there, if you use one of the contact forms or those sorts of channels, you'll be able to talk to myself or one of the team. I'm always happy to have you reach out on Twitter, where I'm @woodwas, or hit me up on LinkedIn. Always happy to have a chat with anybody who's interested in this topic.
Oh, thank you. All right, well, thank you so much, everybody, for joining in on this exciting topic, and I'm looking forward to seeing you next week. And thank you very much for the questions, everyone; really good questions and very good engagement. Thank you, and have a wonderful day.