Counterfactual Fairness
Summary
TLDR: This talk delves into the concept of Counterfactual Fairness in machine learning, highlighting issues like racial and gender biases in algorithms. The speaker introduces a causal model approach to address unfairness by considering how sensitive attributes influence decisions. The proposed solution involves a metric and algorithm for learning fair classifiers, demonstrated through an example of law school admission. The talk concludes with a discussion on the practical application of these models and the challenges of ensuring fairness in machine learning.
Takeaways
- 🧠 The talk emphasizes the impressive capabilities of machine learning, such as surpassing human performance in image classification and game playing, but also highlights the need to address significant problems like bias and discrimination.
- 🔍 The speaker introduces the concept of Counterfactual Fairness, which is about creating algorithms that do not discriminate based on sensitive attributes like race or sex.
- 🤖 The talk discusses the limitations of 'Fairness Through Unawareness', where simply removing sensitive attributes from a model does not guarantee fairness due to the influence of these attributes on other features.
- 📈 The 'Equality of Opportunity' approach by Hardt et al. is mentioned, which corrects for unfairness by using sensitive attributes but has limitations as it does not account for biases in the target label itself.
- 🔗 The importance of causal models is stressed to understand how sensitive attributes like race and sex can influence other variables and lead to unfair outcomes.
- 📊 Counterfactuals are introduced as a method to evaluate fairness by imagining what the classifier's prediction would be if a person's sensitive attributes were different, thus allowing for a single change to be observed in its effects.
- 📚 The speaker proposes a learning algorithm that uses causal models to create fair classifiers by only considering features that are not descendants of sensitive attributes.
- 📉 The trade-off between fairness and accuracy is acknowledged, as fair classifiers may have lower predictive accuracy due to the exclusion of biased information.
- 📝 The practical application of the proposed method is demonstrated using a dataset of US law school students, showing the impact of different approaches on fairness and accuracy.
- 🤝 The talk concludes by emphasizing the role of causal models in addressing unfairness in machine learning decisions and the need for further research in this area.
- 🙏 The speaker thanks the co-authors and the audience, inviting questions and discussion on the presented topic.
Q & A
What is the main topic of the talk?
-The main topic of the talk is Counterfactual Fairness in machine learning, focusing on how to design algorithms that do not discriminate and are fair.
What are some examples of machine learning applications mentioned in the talk?
-Examples mentioned include image classifications, human-level Atari and Go players, skin cancer recognition systems, predicting police officer deployment, deciding on jail incarceration, and personalized advertisements for housing, jobs, and products.
What issues are highlighted with machine learning systems in terms of fairness?
-Issues highlighted include face detection systems that better identify white people, algorithms showing racist tendencies in advertising recommendations, and sexist biases in word embeddings associating men with bosses and women with assistants.
What is the intuitive notion of fairness proposed in the talk?
-The intuitive notion of fairness proposed is that a fair classifier would give the same prediction had the person been of a different race or sex.
How does the talk address the problem of sensitive attributes in machine learning?
-The talk proposes a method that involves modeling the influences of sensitive attributes causally before constructing a classifier, using counterfactuals to determine what the classifier would predict had someone's race or sex been different.
What is the concept of 'Fairness Through Unawareness' mentioned in the talk?
-'Fairness Through Unawareness' is a technique where sensitive attributes are removed from the classifier to make it unaware of these attributes, aiming to make fair predictions.
What is the issue with simply removing sensitive attributes in a classifier?
-The issue is that the remaining features may still be influenced by the sensitive attributes, leading to biased predictions even though the classifier is unaware of the sensitive attributes directly.
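This failure mode can be illustrated with a small simulation (not from the talk; the data and coefficients below are invented): the sensitive attribute A is dropped from the classifier's inputs, but a feature X that A causally influences still carries the bias through.

```python
import random

random.seed(0)

# Hypothetical data: A is a binary sensitive attribute, and the feature X
# (think of a test score) is causally influenced by A.
data = []
for _ in range(10_000):
    a = random.randint(0, 1)
    x = 2.0 * a + random.gauss(0, 1)  # A leaks into X
    data.append((a, x))

# "Fairness through unawareness": predict from X alone, never looking at A.
threshold = sum(x for _, x in data) / len(data)
preds = [(a, 1 if x > threshold else 0) for a, x in data]

# Positive-prediction rate per group.
rate = {}
for g in (0, 1):
    group = [p for a, p in preds if a == g]
    rate[g] = sum(group) / len(group)

# The classifier never saw A, yet its decisions differ sharply by group,
# because X is a descendant of A in the causal model.
print(rate)
```

Even though A never enters the predictor, the two groups receive very different positive-prediction rates, which is exactly the point made in the answer above.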
What is the 'Equality of Opportunity' approach proposed by Hardt et al. in 2016?
-The 'Equality of Opportunity' approach proposes building a classifier that uses sensitive attributes to correct for unfairness, ensuring equal accuracy in predicting outcomes like law school success for different racial groups.
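As a sketch of what the criterion checks (with toy labels, predictions, and groups invented for illustration), equality of opportunity asks that the true-positive rate of the predictor be the same for each group:

```python
def true_positive_rate(y_true, y_pred, group, g):
    """TPR among members of group g whose true label is positive."""
    tp = fn = 0
    for yt, yp, a in zip(y_true, y_pred, group):
        if a == g and yt == 1:
            if yp == 1:
                tp += 1
            else:
                fn += 1
    return tp / (tp + fn) if (tp + fn) else 0.0

# Toy labels, predictions, and group memberships (illustrative only).
y_true = [1, 1, 0, 1, 1, 0, 1, 1]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
group  = [0, 0, 0, 0, 1, 1, 1, 1]

tpr0 = true_positive_rate(y_true, y_pred, group, 0)
tpr1 = true_positive_rate(y_true, y_pred, group, 1)

# These toy predictions happen to satisfy the criterion: both TPRs are 2/3.
print(tpr0, tpr1)
```

Note that, as the answer above says, equal true-positive rates do not help when the label Y itself is biased: the criterion can be satisfied exactly while the target being predicted is unfair.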
How does the talk propose to model unfair influences in data?
-The talk proposes modeling unfair influences by assigning variables for each feature, introducing causal links from sensitive attributes to these attributes, and using counterfactuals to determine predictions under different conditions.
What is the definition of 'Counterfactual Fairness' introduced in the talk?
-'Counterfactual Fairness' is defined as a predictor being fair if it gives the same prediction in a world where someone had a different race, gender, or other sensitive attributes.
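In the notation of the underlying paper (Kusner et al., 2017), this reads: a predictor Y-hat is counterfactually fair if, for every context X = x and A = a, every outcome y, and every alternative attribute value a',

```latex
P\bigl(\hat{Y}_{A \leftarrow a}(U) = y \mid X = x,\, A = a\bigr)
  = P\bigl(\hat{Y}_{A \leftarrow a'}(U) = y \mid X = x,\, A = a\bigr)
```

where $\hat{Y}_{A \leftarrow a'}(U)$ denotes the counterfactual value of the predictor had the sensitive attribute been set to $a'$.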
How does the talk demonstrate the practical application of the proposed fairness approach?
-The talk demonstrates the practical application by using a dataset of US law school students, fitting a causal model, computing unobserved variables, and learning a classifier based on features that are not descendants of sensitive attributes.
What are the potential limitations or challenges in using the proposed fairness approach?
-Potential limitations include the need for accurate causal models, the assumption that interventions are real, and the possibility that the model may not account for all biases, such as those in the selection of the dataset.
How does the talk address the trade-off between accuracy and fairness in machine learning?
-The talk acknowledges that achieving counterfactual fairness may come at the cost of reduced accuracy, as some biased but predictive features are removed from the model.
Outlines
🧑🏫 Introduction to Counterfactual Fairness
The speaker begins by acknowledging the impressive advancements in machine learning, such as superior image classification and game-playing capabilities, and effective medical diagnostics. However, they emphasize the need to address significant issues, including racial and gender biases in algorithms, using examples like biased face detection systems and Google's advertising recommendation system. The talk introduces the concept of counterfactual fairness, which posits that a fair classifier should provide the same prediction regardless of a person's race or sex. The speaker outlines the plan to design a metric and an algorithm based on this fairness definition, using the example of a machine learning system for law school admissions.
🔍 Challenges with Fairness Through Unawareness
This paragraph delves into the complexities of achieving fairness in machine learning by simply removing sensitive attributes like race and sex from the classifier's input. The speaker points out that other features, such as GPA and LSAT scores, may still be influenced by these sensitive attributes due to systemic biases. They reference a paper by Hardt et al. that proposes using sensitive attributes to correct for unfairness, but also highlight its limitations, particularly when the target label itself may be biased due to factors like stereotype threat.
🔧 Constructing Counterfactually Fair Classifiers
The speaker introduces a novel approach to fairness by using causal models to understand and address the influence of sensitive attributes on other features and the target label. They propose modeling the causal relationships and then using counterfactuals to assess fairness. Counterfactuals allow for the examination of what the classifier's prediction would have been had the individual's race or sex been different. The process involves computing unobserved variables, imagining a counterfactual condition, and recomputing observed variables based on the causal model. The goal is to create a classifier that provides consistent predictions across both actual and counterfactual data.
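The three-step procedure described above (compute U, imagine the counterfactual attribute, recompute the observed variables) can be sketched in a few lines, under the strong simplifying assumption of a linear causal model. The coefficients, variable names, and the already-inferred U value below are all invented for illustration:

```python
# Assumed linear structural equations (coefficients are invented):
#   G = w_ga * A + w_gu * U + eps_g      (college GPA)
#   L = w_la * A + w_lu * U + eps_l      (LSAT score)
w_ga, w_gu = -0.5, 1.0
w_la, w_lu = -0.8, 1.2

def abduct(a, g, l, u):
    """Step 1: recover the noise terms implied by the observed data."""
    eps_g = g - (w_ga * a + w_gu * u)
    eps_l = l - (w_la * a + w_lu * u)
    return eps_g, eps_l

def counterfactual(a_obs, g_obs, l_obs, u, a_cf):
    """Steps 2-3: set A to the counterfactual value a_cf, then recompute
    G and L with the same U and noise, so only the single change to A
    propagates through the model."""
    eps_g, eps_l = abduct(a_obs, g_obs, l_obs, u)
    g_cf = w_ga * a_cf + w_gu * u + eps_g
    l_cf = w_la * a_cf + w_lu * u + eps_l
    return g_cf, l_cf

# One individual with observed A = 1 and an inferred "law knowledge" u.
# (In the talk, U is sampled from the fitted causal model; here it is
# simply given for illustration.)
g_cf, l_cf = counterfactual(a_obs=1, g_obs=1.0, l_obs=0.9, u=1.2, a_cf=0)
print(g_cf, l_cf)
```

Setting a_cf equal to the observed attribute returns exactly the observed data, which is a useful sanity check on any such implementation.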
📈 Demonstrating Counterfactual Fairness in Practice
The speaker discusses the practical application of their proposed method using a dataset of US law school students. They compare the outcomes of different classifiers: one using all available features, one omitting sensitive attributes, and one that is counterfactually fair. The counterfactually fair classifier, while potentially less accurate due to the exclusion of biased information, is presented as the preferred model for its fairness. The speaker also acknowledges the importance of the chosen causal model in determining the fairness and accuracy of the classifier, and invites further exploration of potential biases and the selection of individuals in datasets.
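The recipe in the paragraph above can be sketched as follows, under strong simplifying assumptions: the latent U is taken as already inferred per individual (the talk samples it from a fitted causal model), and the fair predictor is a least-squares fit of the first-year grade Y on U alone, never on G, L, or the sensitive attributes. All numbers below are invented:

```python
def fit_linear(u, y):
    """Closed-form simple linear regression: y ≈ alpha + beta * u."""
    n = len(u)
    mu, my = sum(u) / n, sum(y) / n
    beta = (sum((ui - mu) * (yi - my) for ui, yi in zip(u, y))
            / sum((ui - mu) ** 2 for ui in u))
    alpha = my - beta * mu
    return alpha, beta

# Hypothetical inferred "law knowledge" values and first-year grades.
u = [0.1, 0.5, 0.9, 1.3, 1.7]
y = [0.3, 0.9, 1.5, 2.1, 2.7]  # constructed so that y = 0.15 + 1.5 * u

alpha, beta = fit_linear(u, y)

def predict(u_new):
    """Counterfactually fair predictor: it depends on U only, so it is
    unchanged under counterfactual changes to race or sex."""
    return alpha + beta * u_new

print(alpha, beta)
```

Because U is not a descendant of the sensitive attributes in the assumed causal model, nothing in this predictor changes when a counterfactual race or sex is imagined, at the cost of the accuracy lost by discarding G and L.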
🤝 Closing Remarks and Q&A
In the concluding part of the talk, the speaker summarizes the key points: the influence of race, gender, and other factors on machine learning decisions, the introduction of Counterfactual Fairness as a metric, and the provision of a learning algorithm for fair predictors. They thank their co-authors and the audience and open the floor for questions. The Q&A session touches on the practicality of implementing the model, the handling of non-linear relationships, and the potential for biases in dataset selection, with the speaker acknowledging the complexity of these issues and the need for further research.
Mindmap
Keywords
💡Counterfactual Fairness
💡Machine Learning
💡Face Detection Systems
💡Word Embeddings
💡Causal Model
💡Sensitive Attributes
💡Fairness Through Unawareness
💡Equality of Opportunity
💡Counterfactuals
💡Stereotype Threat
💡Causal Relationship
Highlights
Machine learning advancements have led to systems that outperform humans in image classification and game playing, but significant fairness issues remain.
Machine learning is being used in critical areas such as crime prediction, bail decisions, and personalized advertisements, raising ethical concerns.
Issues like biased face detection systems and racist algorithms in advertising recommendation systems highlight the need for fairness in AI.
Sexist biases are evident in word embeddings trained on Google News, associating men with bosses and women with assistants.
The talk introduces the concept of 'Counterfactual Fairness' as an intuitive notion of fairness in classification.
A fair classifier is defined as one that would give the same prediction if the person's race or sex were different.
The approach involves designing a metric and an algorithm for learning fair classifiers based on the counterfactual fairness definition.
An example of a machine learning system for law school admissions is used to illustrate the approach to fairness.
Sensitive attributes like race and sex are distinguished from other features to address fairness concerns in classification.
The technique 'Fairness Through Unawareness' is critiqued for failing to account for indirect biases in non-sensitive features.
The 'Equality of Opportunity' approach is discussed, which uses sensitive attributes to correct for unfairness in predictions.
The causal model is introduced to represent influences and biases in data, allowing for a more nuanced understanding of fairness.
Counterfactuals are used to imagine the prediction a classifier would make if a person's sensitive attributes were different.
A classifier is considered fair if it provides the same predictions on original and counterfactual data.
The method for constructing fair classifiers involves using features that are not descendants of sensitive attributes.
The practical application of the method is demonstrated using a dataset of US law school students.
The trade-off between accuracy and fairness is discussed, acknowledging the reduction in predictive power when excluding biased information.
The importance of selecting the right causal model is emphasized, as different models can lead to different fairness outcomes.
The presentation concludes with a call to address biases in machine learning decisions caused by race, gender, and other factors.
The Counterfactual Fairness metric and learning algorithm are presented as a solution for creating fair predictors.
The Q&A session explores practical aspects of implementing the model, potential biases, and the impact of selection biases in datasets.
Transcripts
>> Okay, great. Thanks a lot.
Today, I'll be talking about Counterfactual Fairness.
It's a joint work with Josh, Chris, and Ricardo.
The way I like to start off
the talk is just by patting us on the back.
Machine learning these days is amazing.
We have image classifications that do better than humans.
We have human-level Atari and Go players,
and we have skin cancer recognition systems that
do just as well if not better than human doctors.
So, why not use it everywhere? And people are.
People are using it for predicting
where police officers should go in order to catch crime.
They're using it to decide
whether or not to keep someone in jail.
They're also using it to make
increasingly personalized advertisements about
housing, jobs, and products.
But, in this talk,
I want to stress that there's still
significant problems with machine learning
that we need to address.
For instance, there are face detection systems that are
better at identifying the faces
of white people than black people.
Algorithms are even more explicitly racist.
So, this was an example of
Google advertising recommendation system.
When you search for people's names that
were more often associated with black individuals,
you get advertisements like,
"Is this person arrested?"
This is an example shown
by the person who discovered this, Latanya Sweeney,
who's a Harvard Professor
of Government and Technology.
Machine learning has also been shown to be sexist.
So, there are word embeddings.
When trained on the Google News corpus,
they make associations like,
men are bosses and women are assistants.
So, to address this in this talk,
we're going to take a step towards a solution.
Step towards making algorithms that don't
discriminate, that are fair.
Specifically, we're going to start
from a very intuitive notion of fairness,
that a fair classifier gives the same prediction had
the person been of a different race or sex.
We're going to design a metric
to test with this definition,
and an algorithm that describes
how to learn classifiers that are fair.
So, let me demonstrate how this approach
works by considering an example.
A machine learning system that
decides who should be accepted into law school.
To make this decision,
we have data about
individuals that have already been in law school,
specifically their sex, their race,
their college GPA scores before they got to law school,
their law school entrance exam score as in the US,
this is the LSAT,
and their first year law grades,
which is what many law firms are going to use as
a measure of their success.
So, by the same token the way that we're going to try to
predict whether we should admit students or
not is whether or not their first year law grade is good.
So, we're going to try to learn a predictor Y hat
from this set of features to this label.
Now, straight away many people would be hesitant
to use these features in particular
for classification because you're
ensuring that people with
different values of race
or sex get different classifications.
So, let me distinguish between
features where we feel
a bit uncomfortable using directly,
and call these sensitive attributes A,
and the remaining features as X.
Because of this distinction,
the first thing we might try is just
simply remove these sensitive attributes.
Because now, the classifier is unaware of
the sensitive attributes directly.
So, this technique is
called Fairness Through Unawareness.
This should allow us to make fair predictions, right?
Well, actually there's a crucial issue here.
The issue is that if we consider
the remaining features: while we
have a good sense that
someone's GPA is influenced by their knowledge,
at the same time, it may also be unfairly
influenced by the sensitive attributes we just removed.
The reason for this is that there are
studies that show that
minority students may feel that
teachers are unsupportive of them.
At the same time, teachers may believe that students
of a certain race have behavior issues,
which influences how they assign grades to them.
The same goes for
the other non-sensitive attribute, the LSAT score.
Minority students because of economic history may
have limited access to certain academic institutions,
and teachers may implicitly decide to place students
in honors classes or not based on race.
So, even though we removed the sensitive attributes,
we've failed to make our classifier unbiased against
certain races because the features themselves are biased.
So, we might try to do something a bit more clever.
In 2016, there was a very nice paper by Hardt et al.
called Equality of Opportunity.
They realized this problem
with just throwing away the sensitive attributes.
So, they proposed to build
a classifier that uses the sensitive attributes,
and use them to correct for unfairness.
The way they did this, is they said,
well as long as our classifier is equally accurate at
predicting law school success in particular,
then it provides equal opportunity
for individuals who are
black and individuals who are white.
But, what if race also
influences our label, law school success?
There's evidence that it does:
minority students' grades may be
measurably worse simply because there aren't
minority race teachers in law school or at least as
many because of those biases I talked about earlier.
This causal phenomenon is due to
a phenomenon called stereotype threat.
So, this may lead to this result where we have
different acceptance rate or
different good law grades
for members in different groups.
Equality of Opportunity says that as long as we
predict these percentages equally accurately, we're fair.
If I have a classifier that predicts
good law grades for 34 percent of black students
and 51 percent of white students,
and does so equally accurately, then we're fair.
But, in doing so,
we don't account for the fact that the target label,
in this case, is unfairly biased.
So, in this talk,
we're going to take a different approach.
We've proposed to model
these influences that I mentioned,
causally before constructing a classifier.
Specifically, let's assign a variable for each of
our sensitive attributes, our non-sensitive features, and our label.
Because we said, we believe race
influences these attributes,
we're going to introduce a causal link
from race to these attributes.
The same goes for sex.
We believe this is an unfair causal link.
Finally, we said that we believe our features are
also influenced by someone's law knowledge.
Something that we can't observe directly,
but we can model.
The useful thing about having a causal model,
is now we can talk about how we believe
unfairness is playing a role in our data.
So, what is this causal model
telling us, more formally?
Well, every arrow in this causal model is
a functional relationship
between the variables it connects;
specifically, the arrow from U to Y
means that Y is some function of U.
It could be a deterministic or non-deterministic
function, but some function.
Now we ask,
why does this actually help
us deal with a problem of fairness?
It becomes clear when we go back to our original goal,
which was to say we'd like to enforce
this definition of fairness,
giving a metric for it and then showing
how to design algorithms that satisfy it.
But this definition seems quite tricky because we have to
somehow imagine that someone's race or
sex had been different. And how can we do that?
Well, we can do that using
a quantity called counterfactuals.
Imagine we have a classifier that uses GPA
and LSAT score in order to make
a prediction of law school success.
Counterfactuals allow us to ask the question,
what would the classifier have predicted,
had someone's race or sex changed?
I'll tell you how that works.
It works for any individual using a three-step procedure.
In the first step, we just take
our causal model which we believe represents our data,
and we compute any unobserved
variables in the causal model,
in this case, law knowledge.
This is what U looks like for
this person and what I want to
emphasize here is that, in this model,
U describes all the information in G, L,
and Y that isn't described by race or sex.
So it's sort of extracting
fair information from these variables.
Okay. So after we do that,
next we're going to imagine
that someone had a different race,
for instance, imagine that this person had been white.
So this is the counterfactual condition,
and in the third and final step,
we're going to use the structural equations,
equations that are described
by all of these arrows in this causal model.
We use the unobserved variable U that I
calculated, and the counterfactually changed
race, in order to recompute
all our observed variables G, L, and Y.
Let me denote these new variables
by the subscript, this counterfactual subscript.
When we do that we may get something like this.
So counterfactuals, what I
want you to take home about this procedure is,
they allow us to imagine making
just a single change to a person
and to observe how this change
propagates and affects everything else about that person.
So, let's go back to our goal.
A classifier is going to be fair then if it gives us
the same predictions on
the original data as it does on the counterfactual data.
So more formally, for
any features x and any sensitive variable A,
we have that the distribution of
our predictor Y hat under different
counterfactual conditions has to be identical.
I'd like to contrast our definition with
another recent definition that appeared at
last year's [inaudible], by Kilbertus et al.
Their definition is similar,
except that instead of using
counterfactuals they use this do
operator to intervene on the sensitive variable.
What this means is
that the difference here is that in our definition,
we're comparing the same individual with a different,
imagine version of themselves
according to the causal model,
while in their definition,
they are sort of grouping all individuals who happen to
align on the same observed features.
So there's a trade-off: ours is sort of
more specific, but it requires additional assumptions.
Now, most classifiers
won't be counterfactually fair,
because these features will often change.
So, how do we go about constructing
these sorts of classifiers?
Well, intuitively, we're not going to want to use
any features that change when race or
sex changes like these.
But notice that when we change race or
sex, because there are no arrows going into U,
we can use this law knowledge variable.
In general, any features in your causal model
which aren't descendants of your sensitive variables,
and any unobserved variables,
can be used to make a
counterfactually fair predictor.
This sort of makes sense given
how I described U
earlier: U is sort of everything about G, L,
and Y that isn't described by race or sex,
it's like the fair parts of G,
L, and Y, so we should be able to use it.
So how do we go about constructing our classifier?
Well, it's just a simple procedure.
You're going to take your features, your labels,
and your sensitive variables,
and then you're going to fit
the causal model that you believe
best describes your data.
Then for every person,
you're going to compute any unobserved variables
about them in that causal model.
Then you're going to learn a predictor by
training it only on
the unobserved variables and on
features that aren't descendants of A,
and the final classifier is
guaranteed to be counterfactually fair.
So, how does this work in practice?
Well, we demonstrate our method using a dataset of
US law school students with the features
I just described in this running example.
What we can do is, we could say, "Well,
if we didn't care about fairness and used
all the features highlighted in red to make a prediction,
this is the unfair RMSE we would get."
Then, as I described before,
what we could do is, we
could remove the sensitive attributes;
we'd still have something that's likely
unfair, because we believe these attributes are
influenced by race and gender, and we get
this sort of accuracy result.
But because we want to make a fair prediction,
because we care about more things than just accuracy,
we're going to fit
a causal model that we believe represents our data
and by that I mean we're going to
learn all of these weights W and biases B,
and we're going to sample
unobserved variables U and then
learn to classifier just on those samples.
The cost of achieving
this counterfactual fairness is
that our predictor has a higher RMSE.
This is sensible because we believe that parts of G,
L, and Y are biased,
that they are polluted by race.
So, when we take away that information from G, L,
and Y, we should be less able to predict Y.
But because accuracy isn't our only goal,
this is the model we propose.
But I do want to point out that depending on
the causal model that you believe is most
accurate you'll get different results.
So this model here makes
less strict assumptions than
the previous model, and because of that,
the fair classifier will have a different accuracy;
there's a natural trade-off there.
So, I also want to quickly say that,
if we believe that
the first causal model I showed you
is true, then what we can do is,
we can sample people from that model that agree with
the data, and then we can sample
counterfactual individuals from that model,
and we can see how much
classifiers like the first two I showed you,
one that uses all the features,
one that uses all of them except the sensitive attributes,
change
when we actually change someone's race,
when we actually change someone's gender.
So if this is the right causal model, then we really
should be learning counterfactually fair predictors.
So, what I want you all to take away from
this talk is that race, gender,
sexual orientation, other things could
cause machine-learning decisions to change unfairly.
Our idea to address this was to describe how
the sensitive attributes cause
unfair decisions by designing a causal model.
We then introduce a fairness metric
that we call Counterfactual Fairness,
which states that a predictor is fair
if it gives the same prediction in
a world where they had a different race, gender, or otherwise.
We then give a learning algorithm
to learn these predictors,
and we demonstrate our technique for making
fair predictions for law school success.
I'd like to thank my co-authors: Chris,
who's at Turing, Josh,
who's at NYU Stern and Ricardo,
who's at UCL, and I'd like
to thank you all for listening.
I'll take your questions now.
>> That shows my ignorance of structural causal models.
In your final example,
when you had your data set,
it just looked like a normal latent variable model.
Is it also just like, can I
put it in Stan or some other software,
crank the handle and that's it, or do I have to-?
>> Yes, we actually use Stan.
>> So it's really practical actually?
>> Yeah. So the only distinction between
the latent variable model and the causal model
is a philosophical one:
you believe that interventions are real. Yeah.
>> Is anything that you
are presenting dependent on the form of
the relationship? I noticed
your examples are linear, and
we substituted [inaudible] things in there.
>> Yeah. They can be non-linear
but you have to know them.
So in order to compute counterfactuals,
you can learn them, but they have to be explicit.
Yeah, but they could be non-linear. Sure.
>> Is that the [inaudible]?
>> Exactly.
>> Can you find biases in
the selection into your dataset that you're training for-
>> Yeah.
>> -if you assume that there are-
>> That's a very good question. Right, maybe- so
for a predictive policing maybe certain people are
selected more often because of racism.
So no, we haven't addressed the selection problem,
but I think that's
a really interesting future direction.
>> In one of the last
slides showing the difference between
the counterfactual predictor [inaudible] on the attributes,
there were some of the [inaudible] that [inaudible].
>> Yeah.
>> So does it imply that the labels or [inaudible] in that case.
>> Yeah. For that causal model that we fit,
it implies that at least on
the data that we drew and the model we fit,
there's not a change there, but it
may be different under other models.
>> [inaudible] might be wrong?
>> Exactly. Yeah.
>> Okay, [inaudible]
>> Great, let's thank Matt again
for a wonderful presentation.