Maggie Little: Data & Digital Ethics

The Kennedy Institute of Ethics
8 Jun 2021 · 27:17

Summary

TL;DR: In this talk, Senior Research Scholar Maggie Little explores the ethical implications of the data and digital revolution, akin to the agricultural revolution 10,000 years ago. She discusses the massive amounts of data generated through new technologies and the ethical concerns surrounding privacy, surveillance, and data usage. Little also addresses the revolution in data analytics, including AI and machine learning, and the potential for bias and mistrust. She emphasizes the importance of designing, deploying, and governing these technologies responsibly to preserve privacy, advance justice, and maintain public trust.

Takeaways

  • 🌐 The data and digital revolution is compared to the monumental shift from hunter-gathering to settled agriculture 10,000 years ago, emphasizing its transformative impact on society.
  • 📈 The revolution is powered by two main components: the data revolution and advancements in data analytics, particularly artificial intelligence (AI) and machine learning.
  • 🔍 The data revolution involves massive amounts of data gathered through new technologies like drones with facial recognition and smart cities, as well as organic data from consumer digital activities.
  • 🤖 AI and machine learning allow for the discovery of hidden patterns in data that traditional statistics might miss, offering new insights but also posing ethical challenges.
  • 🔒 Privacy rights are a central ethical concern in the data revolution, with questions around the collection, storage, and secondary use of personal data.
  • 🚨 Informational risks, such as the potential for data breaches and de-anonymization, are significant ethical issues that must be managed when handling sensitive data.
  • 🏥 The potential for AI tools to improve public health, personalized medicine, and medical imaging is vast, but they must be designed and deployed with careful consideration of ethical implications.
  • 🚫 The misuse of AI tools, including mission creep and commercialization, can lead to ethical concerns, especially if the tools are used for purposes beyond their intended design.
  • 📊 Ethical biases in AI can occur when algorithms unintentionally reinforce existing prejudices or unfairly impact vulnerable populations, highlighting the need for careful algorithm design and validation.
  • 🌟 Public trust is crucial for the successful deployment of AI in healthcare and other sectors, and transparency, communication, and democratic accountability are key to maintaining this trust.

Q & A

  • What is the significance of the data or digital revolution compared to historical societal changes?

    -The data or digital revolution is considered as momentous for human society as the agricultural revolution 10,000 years ago, which saw humans shift from being hunter-gatherers to settled agriculture.

  • What are the two main components of the current revolution in the 21st century?

    -The two main components are the data revolution and the revolution in data analytics, both powered by the massive increase in computational power achieved in the last two decades.

  • How has the way we gather data changed due to new technologies?

    -New technologies like drone planes with facial recognition and smart cities with sensors have enabled the gathering of massive amounts of data, which is part of the data revolution.

  • What is meant by 'organic data' in the context of the data revolution?

    -Organic data refers to novel forms of data that are generated from everyday activities, such as social media posts, which can be scraped and analyzed for insights.

  • What is the role of artificial intelligence and machine learning in the data analytics revolution?

    -Artificial intelligence and machine learning play a crucial role by enabling the analysis of large datasets to find patterns that traditional statistics might miss, thus enhancing our ability to extract information from data.

  • What are the potential benefits of leveraging massive amounts of data in public health?

    -Leveraging massive amounts of data can aid in public health by generating hypotheses, tracking disease hot spots, and improving personalized medicine and medical imaging analysis.

  • What are the ethical concerns regarding the collection and storage of data?

    -Ethical concerns include privacy rights, the question of who owns the data, the risk of data being used beyond its original intent, and the potential for misuse or commercialization of data.

  • Why is it important to consider informational risks when dealing with sensitive data?

    -Informational risks are important because they involve the potential for sensitive information to be identified or inferred, even from anonymized data, through data aggregation or breaches, which can lead to privacy violations.

  • What is the 'mosaic effect' in the context of data ethics?

    -The mosaic effect refers to the risk of de-anonymization where combining non-sensitive data from one database with sensitive data from another can lead to the identification of individuals and their sensitive information.

  • How can the deployment of AI tools in healthcare be ethically problematic?

    -The deployment of AI tools can be ethically problematic due to issues like inaccurate predictions, biases in algorithms that disproportionately affect vulnerable populations, and the potential for misuse or mission creep where the tool is used for purposes beyond its original intent.

  • Why is public trust important when it comes to the use of AI in public health?

    -Public trust is crucial because without it, people may be less likely to engage with health services or follow public health recommendations, which can have negative impacts on individual and community health.

Outlines

00:00

🌐 The Digital Revolution and Its Ethical Challenges

Maggie Little introduces the concept of the data or digital revolution, comparing its significance to the agricultural revolution of 10,000 years ago. She outlines two components, both powered by the massive increase in computational power of the last two decades: the data revolution and the revolution in data analytics. The data revolution involves massive data collection, enabled by new technologies like drones and smart-city sensors, as well as the generation of organic data from consumer digital activities. The analytics revolution encompasses advances in artificial intelligence and machine learning, which allow for the discovery of hidden patterns in data. Little emphasizes the potential benefits of these technologies for public health and personalized medicine but also warns of the ethical perils if they are not used responsibly.

05:04

🔒 Data Ethics: Privacy and Informational Risks

This paragraph delves into the ethical considerations of data ethics, particularly around privacy rights and the collection, storage, and use of data. Little discusses the challenges posed by new surveillance methods and the collection of data without consent. She also addresses the issue of secondary data use, where data initially collected for one purpose may be used for another, raising questions about who should give permission for such use. The concept of informational risks is introduced, highlighting the potential for sensitive information to be inferred or de-anonymized when data is combined from different sources. The importance of considering these risks before using data, even for beneficial purposes, is emphasized.

10:04

🛡️ Data Governance and Mission Creep

The focus shifts to the risks associated with data governance, including the potential for data breaches and the misuse of data through 'mission creep,' where data initially collected for one purpose is used for other purposes over time. Little illustrates this with the example of health records and phone metadata, which, when combined, can reveal sensitive information about individuals. The paragraph also touches on the risks of commercializing databases and the dangers of regime change, where data collected under one government may be misused by a subsequent, potentially less ethical government. The importance of robust data governance policies is underscored to prevent such misuse.

15:04

🤖 AI Ethics: Design, Deployment, and Accountability

The discussion turns to the ethics of AI, specifically in the context of decision support tools. Little highlights the importance of considering accuracy, potential biases, and the broader implications of deploying AI tools. She gives an example of an AI tool developed at the University of Pittsburgh to predict when pneumonia patients could be safely discharged: it learned the spurious rule that patients with both pneumonia and asthma were safer to discharge, because a hospital triage policy (routing such patients straight to the ICU) had skewed outcomes in the training data. The paragraph emphasizes the need for clear verification and validation of AI tools beyond their initial training and validation datasets to ensure they perform well in real-world scenarios and do not perpetuate harmful biases.
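
The pneumonia case is a textbook example of a confounder baked into training data: a triage policy, not biology, created the pattern the model learned. A small, hypothetical simulation (all numbers invented, not the Pittsburgh data) sketches how routing asthma patients straight to the ICU makes "has asthma" look protective to any model fit on raw outcomes:

```python
# Hypothetical simulation of the pneumonia/asthma confounder (invented numbers).
# Policy: pneumonia patients who also have asthma go straight to the ICU.
# The extra care lowers their observed mortality, so a model trained on
# outcomes alone "learns" that asthma makes discharge safer.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
asthma = rng.random(n) < 0.15                # 15% of pneumonia patients
base_risk = np.where(asthma, 0.12, 0.08)     # asthma truly raises mortality risk
icu = asthma                                 # the hospital's triage policy
risk = base_risk * np.where(icu, 0.3, 1.0)   # ICU care cuts risk sharply
died = rng.random(n) < risk

print("observed mortality with asthma:   ", died[asthma].mean())
print("observed mortality without asthma:", died[~asthma].mean())
# Observed: asthma patients die less often (~3.6% vs ~8%), even though their
# underlying risk is higher. The policy, not biology, created the pattern.
```

An outcomes-only model has no way to see the policy; catching the nonsensical rule, as the Pittsburgh team did, required auditing the tool's predictions against clinical sense.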

20:04

🏥 Bias in AI and Its Ethical Implications

This section explores the ethical concerns related to biased algorithms in AI, particularly those that can disproportionately affect vulnerable populations. Little cites a study that found an algorithm was less likely to refer black patients for special medical care compared to white patients, even when they were equally sick. This example illustrates how AI can inadvertently perpetuate existing biases if not carefully designed and monitored. The paragraph stresses the need for algorithms to be transparent, fair, and free from discriminatory impacts, and it calls for a critical examination of the data and decisions that inform AI systems.

25:06

🌟 Trust, Transparency, and the Future of AI

In the final paragraph, Little discusses the importance of public trust in AI and the risks to that trust if AI tools are not well understood or are perceived as infallible. She notes that AI's complexity and the difficulty in explaining its recommendations can lead to mistrust, especially if there are concerns about accuracy or fairness. The paragraph concludes by emphasizing the need for transparency, democratic accountability, and robust governance structures when deploying AI tools. Little suggests that these practices are essential to ensure that AI is used responsibly and ethically, preserving trust and benefiting society.

Keywords

💡Data Revolution

The term 'Data Revolution' refers to the transformative impact of digital technologies on the collection, storage, and analysis of massive amounts of data. In the video, this concept is central to understanding the ethical implications of how data is gathered, used, and protected. The script mentions new technologies like drones with facial recognition and smart cities with sensors, which exemplify the novel ways data is being amassed.

💡Digital Revolution

The 'Digital Revolution' is analogous to the agricultural revolution in its significance, marking a massive shift in human society due to advancements in computational power and digital technologies. The video discusses how this revolution has led to the creation and analysis of vast data sets, impacting fields like healthcare, public health, and social sciences.

💡Artificial Intelligence (AI)

Artificial Intelligence (AI) is the simulation of human intelligence in machines that are programmed to think like humans and mimic their actions. The video emphasizes AI's role in data analytics, particularly in machine learning algorithms that uncover hidden patterns in data. An example given is the use of AI in predicting patient outcomes in healthcare settings.

💡Machine Learning

Machine Learning is a subset of AI that enables computers to learn from data without explicit programming. The video explains how machine learning algorithms are trained on data to identify patterns that can be used for predictive analytics. It is highlighted as a key component in the data revolution, with the potential to reveal insights that are not accessible through traditional statistics.
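
To make the training-and-validation loop concrete, here is a minimal sketch in Python (scikit-learn assumed available; the data is synthetic, not from the talk). It shows the workflow Little describes: train on one slice of data, validate on a held-out slice, and note that even good held-out accuracy is not yet real-world accuracy:

```python
# Minimal sketch of the train/validate workflow (synthetic data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))                # 1,000 records, 10 features
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)  # a hidden pattern in the data

# The model learns on training data and is checked on held-out data.
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print("validation accuracy:", accuracy_score(y_val, model.predict(X_val)))

# Caveat from the talk: high validation accuracy does not guarantee real-world
# accuracy. If deployment data differs from the training distribution (a new
# hospital, new policies, a new population), performance can silently degrade.
```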

💡Privacy Rights

Privacy Rights are the individual's rights to control or influence what information related to them may be collected and stored, and by whom and to whom that information may be disclosed. The video discusses the ethical dilemmas surrounding the collection of data, especially when it comes to surveillance technologies and the potential for misuse of personal information.

💡Data Breaches

A 'Data Breach' refers to an incident where data is accessed, stolen, or used in an unauthorized way. The video script mentions the risk of data breaches as a significant ethical concern, especially when sensitive health data is involved. It underscores the importance of robust cybersecurity measures to protect data.

💡Informational Risks

Informational Risks are the potential harms that may come to individuals or groups due to the misuse or mishandling of their data. The video highlights the concept of 'mosaic effect' or 'inferential privacy risks,' where combining different data sets can lead to the de-anonymization of individuals, even if the data was initially anonymized.
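
To see how the mosaic effect works mechanically, here is a small, hypothetical sketch using pandas (all records invented). Neither table alone links names to diagnoses, yet joining them on shared quasi-identifiers (ZIP code and birth date) re-identifies everyone:

```python
# Hypothetical illustration of the mosaic effect / de-anonymization.
# Each table is "anonymized" on its own; the join is what leaks.
import pandas as pd

# Database 1: anonymized health records (sensitive, but no names).
health = pd.DataFrame({
    "zip":       ["20007", "20007", "20010"],
    "birthdate": ["1980-03-02", "1992-11-15", "1980-03-02"],
    "diagnosis": ["HIV+", "asthma", "depression"],
})

# Database 2: a public roster (named, but nothing sensitive).
roster = pd.DataFrame({
    "name":      ["A. Smith", "B. Jones", "C. Lee"],
    "zip":       ["20007", "20007", "20010"],
    "birthdate": ["1980-03-02", "1992-11-15", "1980-03-02"],
})

# Joining on quasi-identifiers links names to diagnoses, even though
# neither database contained both.
mosaic = roster.merge(health, on=["zip", "birthdate"])
print(mosaic[["name", "diagnosis"]])
```

This is why an informational risk audit asks not only what a database contains, but what it could reveal when combined with other data.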

💡Ethical Bias

Ethical Bias refers to the unfair or disproportionate impact of an algorithm or decision-making process on certain vulnerable populations. The video provides an example of an algorithm that was less likely to refer black patients for special medical care, illustrating how biases in data can lead to ethically concerning outcomes in AI applications.
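
The mechanism behind that example is a proxy label: the algorithm ranked patients by past healthcare spending as a stand-in for medical need. A hypothetical sketch (invented numbers, not the actual Optum data) shows how access barriers that suppress one group's spending translate into fewer referrals for equally sick patients:

```python
# Hypothetical sketch of proxy-label bias (invented numbers, not real data).
# "need" is true medical need; "spend" is the proxy the model ranks on.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
group = rng.choice(["A", "B"], size=n)
need = rng.gamma(shape=2.0, scale=1.0, size=n)   # same need in both groups
access = np.where(group == "A", 1.0, 0.6)        # group B faces access barriers
spend = need * access * rng.lognormal(0.0, 0.2, size=n)

threshold = np.quantile(spend, 0.90)             # top 10% by spend get referred
referred = spend >= threshold

for g in ("A", "B"):
    sickest = (group == g) & (need >= np.quantile(need, 0.90))
    print(f"referral rate among the sickest, group {g}: {referred[sickest].mean():.0%}")
# Group B's rate comes out far lower despite identical need: a proxy choice
# turns a statistical shortcut into an ethically biased outcome.
```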

💡Data Governance

Data Governance is the framework of policies, roles, and responsibilities that manage the entire data lifecycle. The video emphasizes the need for strong data governance policies to prevent misuse of data, ensure ethical use, and maintain public trust. It is highlighted as a critical component in the responsible handling of data.

💡Public Trust

Public Trust refers to the confidence and reliance that the public places in institutions or technologies to act in their best interest. The video discusses how the use of AI and data analytics can affect public trust, especially in healthcare. It stresses the importance of transparency, communication, and democratic accountability in maintaining trust in these technologies.

Highlights

The data or digital revolution is compared to the monumental shift from hunter-gatherer to sedentary agriculture societies around 10,000 years ago.

The revolution is powered by a massive increase in computational power, leading to two distinct revolutions: the data revolution and the revolution in data analytics.

New technologies for data gathering, such as drones with facial recognition and smart city sensors, contribute to the data revolution.

Data is now being generated in massive and novel ways, including consumer digital data from mobile phones and other devices.

Data analytics have evolved with artificial intelligence and machine learning, allowing for the discovery of hidden patterns within data.

AI tools can support decision-making, though they do not replace human judgment and must be carefully validated.

Ethical considerations in data ethics include privacy rights, the ethics of data gathering, and the potential misuse of data.

Informational risks are a key concern, especially when sensitive data is involved or when data can be de-anonymized.

Data breaches and poor data governance policies can lead to the misuse of data and loss of public trust.

The ethics of AI involves considerations of accuracy, potential biases, and the transparency of AI decision-making processes.

AI tools must be designed and deployed with careful attention to privacy preservation, justice, and trust.

The potential for misuse of AI tools, such as mission creep and commercialization, requires robust governance and oversight.

Public trust is crucial for the successful deployment of AI, and misunderstandings about AI can undermine this trust.

The complexity of AI algorithms can make it difficult to explain their recommendations, which is a challenge for transparency and accountability.

Democratic accountability and public communication are essential for the ethical use of AI tools in society.

Good intentions are not sufficient; the design, deployment, and governance of data and AI tools must be ethically responsible.

Transcripts

00:08
Hi, my name is Maggie Little. I'm a Senior Research Scholar at the Kennedy Institute of Ethics, and today I'm going to talk to you about the ethics of what people sometimes call the data or digital revolution.

00:20
This revolution has been analogized as being as momentous for human society as the agricultural revolution of 10,000 years ago, when humans moved from being hunter-gatherers to settled, sedentary agriculture. That massive a change. So what exactly are the components of this current revolution in the 21st century? Let's divide it into two; really, it's two revolutions, both powered by the massive increase in computational power achieved in the last two decades.

00:58
First, there's the data revolution. We've always gathered lots of data, including on humans and health, which is the context we're especially interested in today: think of bench scientists studying molecules, clinical researchers gathering data on humans, or, in social science research, people doing surveys. But the new forms of data are revolutionary because, for one thing, they're massive, in part because we have new technologies for gathering data. Think of drones with facial recognition technology being used in some countries to surveil who is breaking quarantine during COVID, or smart cities putting sensors in all of the light posts to see when warm bodies go by. So we have new ways of gathering massive amounts of data. We also have a treasure trove of data being generated every day in other contexts, not for science, but from the consumer digital revolution. A mobile phone, as long as it's on, is sending out massive waves of data: it pings the cell tower every few seconds, telling it where it's located, so the cell phone companies have enormous amounts of metadata on that phone that we could mine to see what we could learn. That means much of the data is also novel; it's sometimes called organic data. We're finding things out by scraping people's Twitter posts and seeing when people get especially angry versus happy with an election result. And finally, the new data revolution has lots of pooled data: different databases that are now being agglomerated and made to talk with each other, for instance health records cross-checked with people's social media posts. We're going to talk about all of that.

03:09
Next, there's the revolution in the way we analyze data: the data analytics, or data science, methods we have. Here is where we talk about artificial intelligence and machine learning. The traditional way of extracting information from raw or structured data was traditional statistics, which is very complicated, very powerful, and these days often executed by a computer program using algorithms, which are just fixed sets of rules. But we now have a new kind of algorithm, machine learning algorithms. They are still fixed sets of rules, but they're able to find patterns hidden inside data that humans would not be able to see or discover with traditional statistics, or could only find at prohibitive expense. So machine learning gets trained on data, thinks it has found a way to isolate patterns, and you validate it. From that you can build an AI tool, something like a predictive analytic tool; it might not make the decision for you, but it can serve as decision support. Those analytics, in concert with the massive data, allow us to find new things we couldn't see before.

04:40
As with any revolution, there's enormous potential for good. We can now leverage enormous amounts of data, finding wisdom inside, to help with public health. It's being used in COVID right now to generate hypotheses and to track hot spots of the epidemic. It's also being used in personalized medicine, and to read medical imaging, in some cases more accurately than a human can. So there's enormous potential for common good. But there's also enormous potential for peril if the technology is not designed, deployed, and governed for its ethically responsible use. So how should we think about the ethical issues involved here? Let's take the two revolutions in turn.

05:33
Let's start with data ethics. When you've got this new treasure trove of data, how do you think about the ethics of gathering it and keeping it, even if you leave it on a shelf and don't run any analytics on it yet? The first issue in data ethics is a very familiar one: privacy rights, having to do with whose data it is, anyway. Let's look at two different cases. One I mentioned has to do with new surveillance methods like drones and smart city sensors. Here there are important questions about whether it's even okay to gather data on passersby who can't opt out. You can't opt out of a drone going overhead; you can't opt out of the street lamp collecting information about you, except by staying in your house. Is it okay, if it's done with good intentions, for massive amounts of information to be gathered about us, and where are the limits? That is really the ethics of surveillance.

06:55
But there are also fascinating and deeply important questions about data that has already been gathered, we'll assume in ethically acceptable ways, like the cell tower pings of my phone. My mobile network operator has those so that they know, for one thing, how to bill me: how much data am I pulling down, and the like. But it's one thing for the mobile network operator to use the data for its own billing purposes, and another for it to sell it, lease it, or give access to somebody who wants to probe it for a different use. Who should give permission for that? Is it the mobile network operator who gets to say go ahead and mine that data, or is it the people on the phones? This is so far an unsettled question. But the key ethical issue is that you can't assume that because the data already exists, it's okay to use it for anything. This is actually a familiar issue in medical ethics: the ethics of secondary data use.

08:03
Now, when we start trying to figure out the contours of privacy rights, one of the critical things we need to pay attention to is what are called informational risks. When the data collected, or proposed for reuse, is sensitive information, we know it carries more informational risk for the data subjects, the people the data is about. People sometimes say data is just a number, but there's a person behind it when it's human data; health records, for instance, are obviously sensitive. It's also important to understand that sometimes a database holds information that isn't intrinsically sensitive, but if it were joined with another database that does hold sensitive information, we could get what's sometimes called the mosaic effect, or inferential privacy risk: putting the two databases together would let me identify who a person is and learn their sensitive information, even though the subjects have been anonymized in both databases. This is the risk of de-anonymization, and the bigger the data, and the more it's aggregated with other sources, the bigger the risk that somebody could, with just a few inferences, figure out that it was you in front of that clinic, or at that protest, and the like. So it's very important for those who want to probe data, even for very good purposes and with the best of intentions, to do a sort of informational risk audit: what are the actual risks of doing this?

10:00
You might think those risks don't really exist if we just keep the database sequestered away, but think about it for a moment more and you'll see that's not true. The issue with informational risk is that the information may escape, as it were, the software in which the database is encoded. How can that happen? The one people talk about most is data breaches: we have to be stewards of data and maintain cybersecurity so that bad actors can't come in and steal it. In a famous example from 2015, 78.8 million Anthem Blue Cross patient records were stolen, so risks of data breach are very real. But the worry isn't just an outside actor coming in under cover of cyber night and stealing the data. If we don't have good data governance policies ensconced in policy, there are very profound risks that those who are meant to have access to the database might end up misusing it. This is sometimes called mission creep. Imagine you're a government and you have a huge database combining your population's health records, social media, and phone metadata, the whole nine yards. That is enough information to de-anonymize or identify the people in it and know all sorts of things about them. But imagine your government says: that's okay, I only want to use it for COVID protection efforts; that's all I'm going to do. That's great, but once we've pooled data (such pools are sometimes called data oceans), it's so valuable that it becomes valuable for other purposes too, and it's very easy to think: gosh, while we've got it, we could also use it for this good purpose and that good purpose, without doing as rigorous an analysis of the informational risks or of whether it's really an appropriate use. There are huge temptations to commercialize the databases you hold, especially in resource-poor countries. And then think about regime change: it's one thing for a current leader of a government to have that database, but it gets passed down to whoever is in control next, and depending on the political climate you live in, that could be really, really problematic.

12:37
Okay, let's switch over now from data ethics to the ethics of doing those new analytics: the ethics of AI. Here there are a few things it's very important to keep in mind when designing, and deciding to deploy, an AI tool in a given context. For our purposes, let's assume we're talking only about decision support tools. We're not yet talking about a robot autonomously doing its own thing, or someday becoming conscious; we're talking about the kinds of tools that are out there right now, things like predictive analytic tools. Say we trained machine learning algorithms on some super cool, rich data that we couldn't make heads or tails of, and they found ways to sort it. They did this, for instance, with pictures of cats and dogs: given millions of images, machine learning algorithms figured out which parameters in the pixels could decide which is a dog and which is a cat, and now they're pretty good at seeing a new photo and saying that's a dog or that's a cat. So you train it up, you say my machine learning algorithm is giving me a really good result, you verify it on new data to see whether the result still holds, and then you say: now I've got a predictive analytic tool, let's go ahead and use it. But all of that is a far cry from real-world accuracy.

14:14
Let me give an example. The University of Pittsburgh developed an interesting AI tool using a set of machine learning algorithms. They wanted a more accurate way of predicting when a pneumonia patient could safely be discharged rather than stay inpatient. Doctors use their best judgment, but can we do even better if we can see patterns in massive amounts of data and train an AI tool on them? They developed a tool trained on 750,000 pneumonia patients from 78 hospitals, and after all of this work it predicted, better than the doctors in that hospital system, which pneumonia patients could be safely discharged, and when, and which needed to stay inpatient. Fantastic.

15:17
One hitch. When they looked at the results, they found one of its glitches: the tool turned out to be saying that patients who had pneumonia together with asthma were safer to discharge than patients with pneumonia alone. That makes absolutely no sense; anyone in the medical field knows that's crazy, since asthma is a complicating factor for pneumonia and vice versa. When they looked behind the scenes, it turned out that all of the training data in that hospital system reflected a specific hospital policy: patients who came in with both pneumonia and asthma were front-lined to the ICU. That meant those patients had very good outcomes, because the extra care more than compensated for the extra risk of having both conditions. So the computer, looking at that data, concluded that having asthma and pneumonia together is a marker for somebody who is safe to discharge. They caught it, so they don't use the tool anymore. But it's an incredibly important ethics lesson, because there are a lot of for-profit, private, vendor-driven AI tools in health now, and people don't ask for clear verification. They don't ask: have you tested it in contexts sufficiently general, outside of your training and even validation data? And many of those private companies regard all of that data as proprietary. Some have said it's a bit like a drug company saying: I've got a great new drug, I'm not going to show you any of the data, there's no FDA to review it, but just trust me, it's going to work great. So some people have suggested that the FDA expand its authority to regulate what are called digital health tools.

17:14
Here's a second thing people need to worry about. Sometimes the inaccuracies, what we might call statistical biases (self-selection, unrepresentative pools, and the like), are especially bad because they are ethically biased. Let me explain what I mean. To a statistician, a bias is anything that tilts away from a fully accurate generalization. In ethics, a bias is an unfair, disparate impact, especially one that falls on vulnerable populations. There are very deep worries, and a lot of good work has begun, about biased algorithms in the ethical sense.

18:07
Let me once again give an example. A 2019 study examined a particular algorithm by Optum which, by the way, was helping to manage care for 200 million people in the US; it was active and deployed. In a review of the algorithm's deployment, researchers found it was less likely to refer Black patients for special medical care relative to their white counterparts: given an equally sick white and Black patient, the algorithm referred the white patient far more often. The first important thing to point out is that this is more than a statistical bias; it's an ethically infused bias. We care more when errors are concentrated on suspect classifications, sites of historical oppression, and on vulnerable populations, which is both in the case of Black Americans in the United States right now. What had happened? It turns out the algorithm had used, as a proxy for how much medical need you had, how many dollars you spent on health care. It's just a proxy: if you spent a lot on health care over a year, that must have meant you needed more care; if you spent less, you didn't need as much. And it turned out that white patients spent more on health care for equal sickness than Black patients did. But of course the reason was not that Black patients actually needed less care: in the aggregate they had less access to medical care, couldn't afford it to begin with, and in some communities had less trust of medical care and more social reticence. I've got a quote here from Raghavan and Barocas, in a great Brookings Institution report, reminding us that algorithms by their nature don't question the human decisions underlying a data set; instead, they faithfully attempt to reproduce past decisions, which can lead them to reflect the very human biases they're intended to replace. So if your data set contains unwitting but definite bias, the algorithm is going to replicate it and hide it under the cloak of objectivity.

20:56
With the ethics of AI we also need to remember that the worries aren't only about accuracy, including the inaccuracy that leads to bias. Imagine for a minute, magic wand: we're talking about predictive analytic, decision support AI that is fabulously accurate, better than humans; we've tested it, and not only is its accuracy high, it's distributed well across subpopulations. There are still two other things to keep in mind when deploying AI. The first: just as we saw with gathering and maintaining a rich, tempting database, once you've got a predictive analytic tool, a dashboard say, it's subject to misuse: again, mission creep (let's use it for something else), temptations of commercialization, regime change. To give an example, if you have a predictive analytic that assesses the risk of getting COVID, or of acquiring HIV, even if it's very accurate, in fact especially if it's very accurate, you might worry about who will be able to use that tool to make that assessment. For instance, there are AI tools under development now to do facial pattern recognition for mood, including to screen people for mental health issues. They're currently not accurate enough to be deployed, certainly in the healthcare industry, but they are being deployed in employment recruiting and screening. Commercial tools are being sold now that say: we've got an AI tool; you've got too many applicants to interview for the slots you have and you want the best ones, so we'll analyze their facial patterns and tell you who's likely to be a good co-worker, or who would be expensive to hire because of mental health issues. One worry is that they'll be inaccurate in certain biased ways. But conversely, if they're accurate, that's sensitive health information that is not the employer's business to know.

23:28
Finally, and very importantly, no discussion of AI is complete without talking about the risk of mistrust. People in public health know, and remind us all the time, that public trust is hard won and easily lost, and that without it you don't have anything. In health, will people access your hospitals and clinics? If there's mistrust of the COVID vaccine, people won't get vaccinated, which hurts all of us. So public trust is an incredibly valuable commodity that needs to be stewarded carefully, and the use of AI carries real risks to it. That doesn't mean those risks can't be overcome, but they have to be attended to. One is simply that it's hard to understand, and to explain to people, what AI is: not some scary or magical crystal ball, but a set of very technical computer algorithms that are only as good as the data you train them on, often fail, and therefore have to be highly validated. If people have misunderstandings about what AI is, that can really undermine trust. These are issues of scientific literacy, hard enough for all of us to catch up with, but for large public health applications we need to be careful about them.

24:58
AI also faces a separate challenge. I mentioned that machine learning is fundamentally a kind of algorithm that can find hidden patterns inside rich data that humans couldn't see, or could only find at great expense. The patterns it finds are often ones the engineer herself could not explain: she cannot say why the algorithm makes the recommendation it does, because the patterns are so complicated. Imagine what that means for explaining to the public, or to an individual patient: why did you recommend this chemo rather than that one, doc? And you have to say: well, the AI tool tells me that's the right idea, and it's based on a lot of training data. What does the AI tool see that you're not seeing, doc? I don't know; that's the way AI tools work. So again, it's not that this can't be overcome or that we should never use them, but it would be irresponsible to use them without surrounding practices of communication, transparency, and what some people are calling democratic accountability: thoughtful governance around the use of these tools, making sure they're validated, thoughtfulness around the public communication about them, and the kind of oversight that tells us when we shouldn't use them.

26:27
So, in summation: with the data and digital revolution, on both the data side and the AI tools side, good intentions are not enough. How you design, deploy, and govern these technologies has enormous consequences for whether they are used responsibly. They need to be designed from the get-go for privacy preservation, for advancing justice, and for preserving trust. And you need robust governance structures in place before you start: oversight, so you would know if something were going wrong, and, as I mentioned, democratic accountability, so that there is transparency with the society you're trying to help as you use these tools. Thanks.

Related Tags
Data Ethics, AI Accountability, Digital Revolution, Privacy Rights, Health Informatics, Machine Learning, Surveillance Ethics, Algorithmic Bias, Data Governance, Public Trust