Explainable AI explained! | #5 Counterfactual explanations and adversarial attacks
Summary
TLDR: This video introduces counterfactual explanations in artificial intelligence. The speaker explains how counterfactuals can be used to show a model user which minimal changes to the input data would lead to a different prediction. A stroke-prediction example shows how a patient can change his risk assessment by adjusting his body mass index. Counterfactuals are used both to explain black-box models and to craft adversarial attacks. Different computation methods, such as random sampling and genetic algorithms, are also discussed.
Takeaways
- 📊 Counterfactuals offer an alternative explanation method for AI models by showing what could be changed to obtain a different prediction.
- 🤖 In contrast to feature-importance techniques, counterfactuals focus on how small changes to the input values can affect the prediction.
- 🧑‍⚕️ An example in the video shows how John could lower his stroke prediction from 90% to 70% by adjusting his body mass index.
- 📝 Counterfactuals represent the minimal changes required to flip the prediction to the target class, as theoretically defined by Lewis (1973).
- 📚 The approach was introduced into machine learning research in 2017 by the paper 'Counterfactual Explanations Without Opening the Black Box'.
- ⚔️ Generating counterfactuals is similar to crafting adversarial examples, since both look for minimal input changes that produce a different output.
- ⚙️ There are white-box and black-box approaches for computing counterfactuals, depending on whether the model internals are accessible or not.
- 🧬 Some methods for computing counterfactuals use genetic algorithms, which involve selection, mutation, and crossover.
- 🔍 With the Python library 'DiCE', Microsoft provides a tool for computing counterfactual explanations that can generate diverse counterfactuals.
- 🎲 Since there is often no single best counterfactual explanation, it can be useful to look at several diverse counterfactuals and pick the one that fits the situation at hand.
Q & A
What are counterfactual explanations compared to feature-importance techniques such as SHAP and LIME?
-Counterfactual explanations state which minimal changes to the input features are needed to push the model to a different outcome, whereas SHAP and LIME show the importance of the features and their influence on the prediction.
What could a counterfactual explanation for the patient John look like?
-An example of a counterfactual explanation for John would be: 'Right now your stroke prediction is 90%. If you lowered your body mass index to 25, the risk would drop to 70%.'
How are counterfactual explanations defined mathematically?
-A counterfactual explanation is the smallest change in the input features that leads the prediction to a different outcome, such as from stroke to no stroke.
What is the difference between counterfactual and contrastive explanations?
-Counterfactual and contrastive explanations are similar in the literature; both aim to identify the minimal change that produces a different prediction.
What role does the decision boundary play in generating counterfactual explanations?
-The decision boundary separates the combinations of feature values that lead to two different predictions. By changing individual values, the minimal change required to flip the prediction can be identified.
Why are there often several possible counterfactual explanations?
-There are often several possible counterfactual explanations because different combinations of feature changes can achieve the desired outcome. This phenomenon is known as the Rashomon effect.
How can counterfactual explanations be used in machine learning?
-Counterfactual explanations can be used to understand which changes to the input features would be required to alter a particular prediction, e.g. to lower the risk of a stroke or to get a loan approved.
What roles do white-box and black-box approaches play in computing counterfactual explanations?
-White-box approaches use internal model information such as the weights of a neural network to compute counterfactual explanations, while black-box approaches query the model many times to learn the relationship between input and output.
What is the purpose of genetic algorithms in computing counterfactual explanations?
-Genetic algorithms use selection, mutation, and crossover to compute counterfactual explanations. They simulate evolutionary processes to find the optimal changes.
What is the benefit of the DiCE library for computing counterfactual explanations?
-Microsoft's DiCE library can generate several counterfactual explanations that are as different as possible, so that users can pick the explanation best suited to their specific situation.
Outlines
🎬 Introduction to Counterfactual Explanations
This section introduces the concept of counterfactual explanations, which differ from traditional interpretability methods such as SHAP or LIME. Using an example, the speaker explains how minimal adjustments to the input data can yield different predictions, e.g. how a patient could reduce his stroke risk by lowering his BMI. Counterfactual explanations thus show how changes to certain factors lead to alternative outcomes.
🤖 Counterfactual Explanations in Practice
This part describes how counterfactuals are computed mathematically and which approaches (white-box and black-box) exist for doing so. The focus is on how these input changes can be found either by querying the model or by accessing its internal parameters. Examples of using genetic algorithms to compute counterfactuals are also given.
📊 A Tabular-Data Example
Using a tabular data example, this section shows how small changes to input values such as BMI can lead to a new prediction. The speaker explains that repeatedly changing the input yields a good approximation of the model's decision boundary. The role of libraries such as DiCE (Diverse Counterfactual Explanations), which can provide several counterfactual explanations for the same situation, is also highlighted.
🖥️ Implementing Counterfactual Explanations in Python
This part explains the practical implementation of counterfactual explanations in Python using the DiCE library. The speaker shows how to create the required data and model objects and how to generate diverse counterfactuals. Changing input values and visualizing the changes makes it possible to trace which factors influence the prediction.
⚙️ Constraints and Optimizations for Counterfactuals
This section discusses the option of setting constraints when generating counterfactuals in order to obtain more realistic results. The speaker explains how to ensure feasible counterfactuals by defining permitted value ranges for the changes. It is also noted that the diversity of the counterfactuals can be limited depending on the model.
Keywords
💡Explainable AI
💡Counterfactual Explanations
💡Black-Box Models
💡Feature Importance
💡Optimization Problem
💡Model Agnostic
💡Decision Boundary
💡Rashomon Effect
💡Adversarial Attacks
💡Gradient Descent
Highlights
Introduction to explainable AI series and focus on interpretability techniques.
Shift from feature importance to counterfactual explanations, providing actionable insights.
Definition of counterfactuals: Minimum input change that alters prediction to a different output.
Example of counterfactuals using a stroke prediction model: Changing BMI to lower stroke risk.
Explanation of contrastive explanations: How they differ from feature importance methods.
Discussion on the historical background of counterfactuals in psychology and AI.
Comparison between counterfactuals and adversarial examples, both aiming to change model outputs.
Overview of optimization problems used to find counterfactuals or adversarial samples.
Introduction to white-box and black-box approaches to compute counterfactuals.
Detailed example of generating counterfactuals for tabular data using model-agnostic methods.
Mention of Dice library for creating diverse counterfactual explanations in Python.
Discussion of the Rashomon effect: Multiple valid counterfactuals for the same prediction.
Use of genetic algorithms and gradient-based methods to generate counterfactuals.
Demonstration of code to generate counterfactuals using a random forest model.
Conclusion and mention of upcoming video on layer-wise relevance propagation for neural networks.
Transcripts
[Music]
hi everyone welcome back to this
explainable ai series in the last videos
we've seen different interpretability
techniques to better understand black
box machine learning models
these methods mainly produced feature
importances and showed us how the inputs
affect our prediction
today i want to talk about another type
of explanation
which is usually called a counterfactual
in my opinion it's a pretty powerful
approach
and i hope that at the end of this video
you will be familiar with it as well
let's start with an intuitive basic
explanation of what counterfactuals are
in the second video i introduced the
data set for this series which is a
binary classification problem for
stroke prediction we said that our test
patient
john is interested in why he gets a
certain prediction from our model
previously our explanations using SHAP
and LIME
had a form like age is the most important
feature or
the higher age gets the higher the stroke
risk will be and so on
so these were mainly feature importances
and dependency plots
now counterfactuals go in another
direction
here we want to tell john what he could
do to avoid a stroke so it's a
counterfactual
that gives him the possibility to change
the situation
a counterfactual could look like this
hey john
right now your prediction is 90% stroke
if you would decrease your body mass
index to 25
the prediction would be 70% no stroke so
the orange part here is now the
counterfactual
which is just another data point that
leads to a different prediction
we can think of it like a newly created
person that would end up with
no stroke so all values for our data
point for john stay the same
just body mass index is changed to
another value
and if this is done we get another
output for our black box model
which is highlighted in green we can
also give a more
formal definition of counterfactuals
generally a counterfactual
is the smallest change in the input
features
that changes the prediction to another
output just like in our example going
from stroke to no stroke
just a side note counterfactual
explanations are sometimes also called
contrastive explanations in the
literature
so here i wanted to visualize the basic
idea for tabular data
one instance so one row in the tabular
data set
corresponds to one person and the
columns are the different features
our model uses so age body mass index
and so on
so all we have to do now is change a
specific value of our inputs
so that the prediction alters to the
target class for example no
stroke it was shown that providing
explanations of that sort so
counterfactuals
lead to a high explanatory value for
humans
counterfactuals exist already a long
time in psychology
a theoretical definition was initially
presented by
lewis in 1973. the idea for using them
in machine learning was
first presented in a paper called
counterfactual explanations without
opening the black box
this paper was published in 2017
the basic idea for calculating
counterfactuals comes from a different
ai
field which is also called ai safety
the goal in this field is to make
machine learning models more secure
against manipulation adversarial
examples are
well-designed input samples that use the
shortages
of machine learning algorithms to
generate false predictions
an example for this is this classifier
for which the input on the left
is slightly changed by adding
adversarial noise in the middle
as a result the prediction changes from
panda to gibbon
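for a simple linear classifier this kind of attack can be sketched in a few lines of numpy (the weights and inputs below are made up for illustration; the panda example in the video uses a deep network, and the sign-of-the-gradient step here is only a stand-in for such attacks):

```python
import numpy as np

# stand-in "image classifier": logits = W @ x for a 4-pixel input, 2 classes
W = np.array([[ 1.0, -0.5,  0.8,  0.2],    # class 0 ("panda")
              [-0.3,  0.9, -0.6,  0.4]])   # class 1 ("gibbon")

x = np.array([0.9, 0.1, 0.8, 0.3])         # original input, predicted class 0

def predict(x):
    return int(np.argmax(W @ x))

# fast-gradient-sign-style step: for this linear model the gradient of
# (logit_1 - logit_0) with respect to x is simply W[1] - W[0]
grad = W[1] - W[0]
epsilon = 0.6                              # step size exaggerated for the toy model
x_adv = x + epsilon * np.sign(grad)        # uniformly bounded adversarial noise

print(predict(x), predict(x_adv))          # prints: 0 1
```

the same mechanism, scaled up to image-sized inputs and deep networks, is what turns the panda into a gibbon while the picture barely changes to the human eye.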
the approach to generate adversarial
examples is quite similar to
generating counterfactuals as both look
for the minimal changes in the input
to generate a different output in both
cases we want to solve the following
optimization problem
find x prime which is the counterfactual
or
adversarial sample that changes the
prediction of our black box model
to a target class c in the example on
the left
x is the panda image with a panda
prediction
and on the right we would have x prime
which is the adjusted input
that leads to a given prediction in this
optimization problem on the right
we also see a distance function d which
is minimized
this distance function makes sure that
we stay as close
as possible to the original inputs
that's because we still want to have a
panda on the image
but aim to get a different prediction by
our blackbox model
so to summarize it in both cases when
explaining black boxes and also when
attacking black boxes
we want to find a similar input data
point
that changes the prediction if you are
further interested in the second case
adversarial attacks i've also uploaded a
video on how to code that for
convolutional neural networks
okay so now we know what counterfactuals
are
and how they can be formulated
mathematically but
how do we calculate them in real life
there exist
several approaches to compute
counterfactuals for a prediction
all we need to do is solve the
previously presented optimization
problem
generally the approaches can be divided
into white box and black box approaches
if we have access to the model internals
such as the weights in the neural
network
then that is perfect and we can use this
information to faster find the minimum
in the optimization problem
for example in a neural network we can
simply calculate gradients for our
problem
that guide us using gradient descent if
we don't have access
we need to rely on the relationship
between inputs and outputs
that means we have to find a solution by
querying the model many times
that is the model agnostic way of
calculating counterfactuals
while the white box approaches on the
left are
model specific for both categories many
different ideas exist and i won't go
into the detail of the individual papers
however here i linked some references if
you want to read more about this topic
just an example the work for CERTIFAI
uses a genetic algorithm to create
counterfactuals
that means it uses selection mutation
and crossovers
so the typical operations in
evolutionary algorithms
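the selection, mutation and crossover loop just mentioned can be sketched as a toy genetic search (the model, fitness function and parameters below are invented for illustration; this is not the CERTIFAI code):

```python
import numpy as np

def toy_model(x):
    """stand-in black-box classifier on two features (age, bmi)"""
    return 1 if 0.5 * x[0] + 2.0 * x[1] > 95 else 0

def genetic_counterfactual(model, x, target, pop_size=50, generations=40, seed=0):
    """toy genetic search: selection, crossover and mutation drive a
    population of candidate inputs toward the target class while
    staying close to the original input x"""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    pop = x + rng.normal(0.0, 5.0, size=(pop_size, x.size))   # initial population
    for _ in range(generations):
        # fitness: L1 distance to x, plus a huge penalty for the wrong class
        dist = np.abs(pop - x).sum(axis=1)
        penalty = np.array([0.0 if model(ind) == target else 1e6 for ind in pop])
        parents = pop[np.argsort(dist + penalty)[: pop_size // 2]]        # selection
        pairs = rng.integers(0, len(parents), size=(pop_size, 2))
        mask = rng.random((pop_size, x.size)) < 0.5
        children = np.where(mask, parents[pairs[:, 0]], parents[pairs[:, 1]])  # crossover
        pop = children + rng.normal(0.0, 1.0, size=children.shape)        # mutation
    dist = np.abs(pop - x).sum(axis=1)
    valid = [i for i in range(pop_size) if model(pop[i]) == target]
    return pop[min(valid, key=lambda i: dist[i])] if valid else None

john = [70.0, 30.4]            # predicted class 1 (stroke) by the toy model
cf = genetic_counterfactual(toy_model, john, target=0)
print(cf)
```

the fitness function is the same objective as before: minimize the distance to the original input, subject to the prediction being flipped.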
the work by Wachter et al is model
specific
and uses an Adam optimizer and gradient
descent to find the optimal
counterfactual
all of these approaches perform some
sort of perturbation on the input
that means they change the feature
values either randomly or guided in some
way
to better explain this let's again
quickly jump to our tabular example
here we would for instance randomly
change values
such as here we decrease the body mass
index to 32
we do this many times for example again
for 29
and then again for 22.
the last change would flip the
prediction to no stroke
if we do this many times we get a good
approximation of the decision boundary
for our model and can determine the
minimal changes that are required to
predict no stroke
and then we can use this information and
tell to this person for example john
hey if you decrease your body mass index
to 22 you will probably not get a stroke
again this approach is model agnostic
and quite similar to brute force
but if we can use the model internals we
can apply these feature changes more
intelligently
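this query-many-times idea can be sketched in a few lines of numpy (the toy model and feature values below are invented for illustration, not taken from the video's dataset):

```python
import numpy as np

def toy_stroke_model(x):
    """stand-in black box: class 1 (stroke) when 0.5*age + 2*bmi exceeds 95"""
    return 1 if 0.5 * x[0] + 2.0 * x[1] > 95 else 0

def random_search_counterfactual(model, x, target, n_trials=5000, scale=5.0, seed=0):
    """model-agnostic brute force: randomly perturb the input, query the
    model each time, and keep the closest point whose prediction flips"""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    best, best_dist = None, np.inf
    for _ in range(n_trials):
        candidate = x + rng.normal(0.0, scale, size=x.size)
        if model(candidate) == target:
            dist = np.abs(candidate - x).sum()   # stay as close as possible to x
            if dist < best_dist:
                best, best_dist = candidate, dist
    return best

john = [70.0, 30.4]                              # age, body mass index
cf = random_search_counterfactual(toy_stroke_model, john, target=0)
print(cf)
```

each accepted candidate lies just across the decision boundary, so the best one found approximates the minimal change the narration describes.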
this approximation of the decision
boundary is also nicely visualized by
this example from
dice which is a counterfactuals python
library by
microsoft in this example they want to
provide an explanation
why a loan has been approved or rejected
and how to change the situation
at this point you might say wait a
minute there are many different
possibilities for counterfactuals
in this example we can either increase
the income by ten thousand dollars
or we increase the income by five
thousand and have
one more year of credit history both
will get us on the other side so both
will get the loan approved
so what is the best counterfactual well
there is no best
both are valid counterfactuals and we
cannot really say which one is better
this behavior is also known as the
rashomon effect
that's why this python library i just
talked about produces several
counterfactuals
so for example five DiCE the name
of the library stands for
diverse counterfactual explanations that
simply means we want
several counterfactuals that are as
different as possible
this gives us the option to select the
most suitable explanation for our
personal situation so now that we are
familiar with all the concepts
let's switch to the code and compute
some counterfactuals
so here we are in vs code again and just
like in the other videos
we imported the data the random forest
model which is our black box classifier
and some metrics to calculate the
performance of our model
so let's run this and you've seen this
many times now
import the data and we have
one second we have a shape of that for
the test and train data
and then we fit the model which again is
having a nice accuracy but imbalanced
data sets so
our f1 score is not so nice all right
and now we come to the counterfactual part
so
this is the library i previously talked
about
you can install it running pip install
dice-ml
and it will create several so diverse
counterfactual explanations for us
for this library we need to create two
things we need a data object
and a model object the data object tells
us
what the data input looks like if there
are continuous features
that's important because if we want to
perturb the features for example
increase or decrease
age we need to consider discrete
features and continuous features because
they have different
perturbation strategies so we need to
manually pass in
which features are continuous in our
data set and then we also specify here
what is our target variable
and then here in the data frame section
we pass in the data from our data loader
okay in the model part we simply pass
the model
and we select the back ends so there are
back ends for tensorflow pytorch
and many others and i will select
sklearn
as this model comes from scikit-learn
so now using those two things so the
data which is called data dice here
and the random forest dice which is the
model
object we can create this explainer
instance
which is called dice and here we can
specify
which methods we want to use for
generating counterfactuals
as i said there exist model agnostic in
model specific approaches
the most model agnostic one is random
sampling
but also genetic algorithms and other
optimizers are available
if you have a deep learning model you
can of course use an optimizer for deep
learning
that simply uses the gradients so here's
an overview
this is the github page from dice and
here are
some of the methods so they call it
gradient based so that's
model specific methods specifically
designed for neural networks
and for model agnostic approaches they
have those three
okay that's pretty much it regarding
what we need
and now what we can do is let me just
quickly run this
now we can select input data points so
as i said
this approach is a local explainability
approach so we need to select
a single data input we can use for our
generation
and then we call this function generate
counterfactuals
on the explainer object we created over
here
and we pass our input data point which
is simply a first
sample in my test data set and here i
specify
i want three counterfactuals so three
diverse counterfactuals to be generated
and as this is a binary classification
problem so the stroke prediction data
set
i can simply pass in here opposite so it
will flip
the class so let's run this
and the second function down here is
then
so this generates us the counterfactual
and then i can call on this
counterfactual objects a function called
visualize as data frame
and this will show us what is the input
so what is this
individual data sample and what are the
three counterfactuals that were
generated
okay so first let's have a look at this
query instance
so we see that the original
classification was
zero so no stroke for this first data
point
and this is based on our random forest
model
and we see this is a male uh married
and all the other features and we can
see 70 years old
body mass index of 30.4 and a glucose
level of 72.
and now when i call this visualize as
data frame
i can pass in show only changes
that means this table down here will
only
show the differences to this data points
so as i said a counterfactual is just
another data point
that is slightly changed from this
individual
input and we can see
we see some values that changed in the
middle here which is
work type private but especially we have
many changes regarding body mass index
so apparently if the body mass index is
too low the stroke risk
increases again i wouldn't trust this
model but this is just for this example
so here we have three counterfactuals
that suggest us to
decrease the body mass index to get a
stroke prediction
so usually we would go in the other
direction and say
when do you not get a stroke prediction
but in this example we just select
stroke because
this first data point is no stroke
okay so of course body mass index 0.9
doesn't really make sense that's why we
also have the option to pass
in feasibility criteria for example you
can say
only change those features because
usually you cannot change from
male to female for example
and additionally we can also say the
permitted ranges
for changes must lie between those two
values
and there are in this library a couple
of additional options to
ensure that we get feasible
counterfactuals so counterfactuals that
actually make sense
so if i run this again i can pass in
those two parameters
into this generate counterfactuals
function again again we want to generate
three counterfactuals for the opposite
class
but now using this feasibility criteria
so again we get the same outputs but now
we should see okay here we have changes
that
lie within the permitted range so
now it says you need to change the body
mass index to the specific value and no
other changes in this example so if you
would have a more complex model you
would have changes at several positions
but again a counterfactual
is the minimal change to change the
prediction
and here apparently the minimal change
is to decrease the body mass index by
around 10 and again we also don't have a
lot of diversity in this model
that's why those three diverse
counterfactuals are still quite similar
that's just because the model is not so
good
alright that's it about counterfactual
explanations today i hope you find them
as powerful as i do
and if you have further questions just
let me know in the comments
the code is again available on github
the link is in the description
the next video will be the last part of
this explainable ai series
where we have a look at layer-wise
relevance propagation
a method which is specifically designed
for neural networks