How to detect DeepFakes with MesoNet | 20 MIN. PYTHON TUTORIAL
Summary
TL;DR: In this video, Kalyn from Kite explores deep fakes and introduces MesoNet, a neural network designed to detect them. The video discusses how deep fakes are generated using autoencoders and how a dataset was assembled from existing deep fake videos. It then delves into MesoNet's architecture, a convolutional neural network, and its performance in identifying real and fake images. The video also highlights a data bias issue: the majority of deep fake images are pornographic, potentially skewing the model's predictions.
Takeaways
- 😲 Deep fakes are synthetic media created by deep neural networks, making them nearly indistinguishable from real images, audio, or video.
- 👀 The potential misuse of deep fakes for spreading misinformation and undermining trust is a serious concern.
- 🧠 MesoNet is a convolutional neural network designed to identify deep fakes, trained on a dataset of real and fake images.
- 🎨 Deep fakes can be generated by training autoencoders on different datasets, allowing for the blending of distinct visual styles.
- 📸 The MesoNet dataset was assembled by extracting face images from existing deep fake videos and real video sources.
- 🤖 MesoNet's architecture consists of four convolutional blocks followed by a fully connected layer, optimized for image feature recognition.
- 🔍 Batch normalization in neural networks helps improve training speed, performance, and stability by normalizing layer inputs.
- 📊 The model's predictions are influenced by the nature of the training data, which in this case includes a significant amount of pornographic deep fakes.
- 🔎 The video suggests a potential bias in MesoNet's predictions due to the prevalence of pornographic content in the deep fake dataset.
- 💻 The video also promotes Kite, an AI-powered coding assistant that enhances programming efficiency through features like autocompletion and documentation.
Q & A
What is the main topic of the video?
-The main topic of the video is deep fakes and the exploration of a neural network called MesoNet designed to identify them.
What is a deep fake?
-A deep fake refers to images, audio, or video that are fake, depicting events that never occurred, created by deep neural networks to be nearly indistinguishable from real counterparts.
How can deep fakes be generated?
-Deep fakes can be generated using autoencoders, which compress and decompress image data, allowing for the blending of different images, such as merging Van Gogh's 'Starry Night' with Da Vinci's 'Mona Lisa'.
What is MesoNet and what is it used for?
-MesoNet is a convolutional neural network designed to identify deep fake images. It is used to make predictions on image data to distinguish between real and deep fake images.
How was the data MesoNet is trained on collected?
-The authors assembled the data by extracting face images from existing deep fake videos and from real video sources like TV shows and movies, stratifying the data so facial angles and resolutions are evenly distributed.
What is the structure of MesoNet's neural network?
-MesoNet's neural network consists of four convolutional blocks followed by one fully connected hidden layer and an output layer for predictions.
What is batch normalization and why is it used in MesoNet?
-Batch normalization is a technique that normalizes the inputs to each layer of a network to improve speed, performance, and stability. It reduces the interdependence between parameters and the input distribution of the next layer.
How does MesoNet make predictions on image data?
-MesoNet makes predictions by loading weights, processing image data through its convolutional and fully connected layers, and outputting a value that indicates whether an image is real or a deep fake.
What is a problematic data artifact mentioned in the video regarding deep fake data?
-A problematic data artifact is that an overwhelming amount of deep fake data is pornographic, which can lead MesoNet to use this statistical reality as a heuristic for its predictions.
What is Kite and how does it relate to the video?
-Kite is an AI-powered coding assistant that helps users code faster and smarter by providing autocomplete suggestions and reducing keystrokes. It is mentioned in the video as a tool that can be integrated into various code editors.
What is the significance of the data collection method for MesoNet's performance?
-The data collection method is significant because it affects the model's ability to generalize and make accurate predictions. The video discusses how MesoNet's training data, which includes a high percentage of pornographic deep fake content, could influence its predictions.
Outlines
😲 Deep Fakes and the Role of AI in Detection
Kalyn introduces the concept of deep fakes, which are AI-generated images, audio, or video that depict events that never occurred, and are nearly indistinguishable from real media. The video discusses the potential for deep fakes to be used maliciously to spread misinformation, highlighting the need for AI tools like MesoNet to identify them. The script also references a deep fake video of President Obama, which is actually voiced by actor Jordan Peele, to illustrate the technology's capabilities. The video will explore MesoNet's ability to predict whether images are deep fakes or real, and touch on the ethical implications and potential misuse of deep fake technology.
💻 Implementing MesoNet: An AI Model for Deep Fake Detection
This segment delves into the technical aspects of implementing MesoNet, a convolutional neural network designed to identify deep fakes. It discusses the creation of the deep fake dataset used to train MesoNet, which involved extracting face images from existing deep fake videos and real video sources. The video explains the process of auto-encoding used to generate deep fakes and how MesoNet's architecture, including its convolutional blocks and fully connected layers, contributes to its ability to make predictions. The script also introduces Kite, an AI-powered coding assistant that helps programmers code more efficiently by providing autocompletion and documentation.
🔍 Analyzing MesoNet's Performance on Deep Fake Detection
The script describes the process of preparing image data for MesoNet, including scaling pixel values and using an image data generator. It outlines the steps for creating a classifier, setting up the network architecture with convolutional and pooling layers, and loading pre-trained weights. The video then demonstrates how to use MesoNet to make predictions on individual images, discussing the implications of predicted output values and how they relate to the model's confidence in its predictions. It also introduces a plotter function to visually analyze the model's performance across different categories of images, such as correctly identified deep fakes and misclassified real images.
🚨 Addressing the Prevalence of Pornographic Deep Fakes in Datasets
The final paragraph addresses a significant issue in deep fake detection: the overwhelming presence of pornographic content in deep fake datasets. It discusses how this bias could lead MesoNet to make predictions based on the statistical likelihood of an image being pornographic rather than its authenticity. The video suggests a potential solution to this problem by proposing the use of real data from pornographic sites to balance the dataset. It concludes by summarizing the key points about deep fakes, the role of AI in detecting them, and the importance of considering data biases when training AI models.
Keywords
💡Deep Fakes
💡Neural Networks
💡MesoNet
💡Autoencoders
💡Convolutional Neural Network (CNN)
💡Batch Normalization
💡Max Pooling
💡Binary Classification
💡Data Stratification
💡Kite
Highlights
Introduction to deep fakes and the challenges they pose in terms of authenticity and misinformation.
Exploration of the potential for deep fakes to be used for both creative and malicious purposes.
Discussion of the importance of developing technology to identify deep fakes to combat misinformation.
Introduction to MesoNet, a convolutional neural network designed to identify deep fakes.
Explanation of the two different models developed to identify deep fakes and their training on distinct datasets.
Overview of the process of generating deep fakes using autoencoders and the creation of a blended artwork example.
Description of how the deep fake dataset was assembled from existing videos, contrasting with generating deep fakes from scratch.
Insight into the stratification of the dataset to ensure even distribution of facial angles and resolutions.
Detailed explanation of the MesoNet architecture, including its convolutional blocks and fully connected layers.
Discussion on the role of batch normalization in improving the performance of neural networks.
Explanation of the max pooling layer's function in reducing data dimensionality for faster computation.
Demonstration of how to instantiate the MesoNet model and load its weights for prediction.
Process of preparing image data for MesoNet, including scaling and data generator setup.
Identification of potential issues with class indices and the solution to remove hidden folders affecting data processing.
Evaluation of MesoNet's predictions on individual images and the interpretation of confidence levels.
Organization of predictions into categories for analysis: correctly predicted deep fakes, reals, misclassified deep fakes, and reals.
Analysis of the problematic data artifact where deep fakes are predominantly pornographic, affecting the model's predictions.
Suggestion for neutralizing the effect of pornographic deep fakes in the dataset by balancing the data sources.
Conclusion summarizing the video's exploration of deep fakes, the MesoNet model, and the challenges in identifying deep fakes accurately.
Transcripts
hey everybody it's kalyn from kite the
ai-powered coding assistant
and today we're going to explore the
topic of deep fakes and examine a neural
network designed to identify them
called mesonet
[Music]
the concept of deep fake refers to
images audio or video that are fakes
that is they depict events that never
occurred but unlike
methods of manipulating media in the
past like photoshop
these deep fakes are created by deep
neural networks to be nearly
indistinguishable
from their real counterparts check out
this video of president obama addressing
the nation for example
we're entering an era in which our
enemies can make it look like
anyone is saying anything at any point
in time did anything sound strange
that's because it wasn't obama speaking
it was jordan peele
the advances in the field of deep fakes
are equal parts impressive
and alarming on the upside the fidelity
with which we can alter media will
certainly lead to some world-class
memes but in the wrong hands this
technology can be used to spread
misinformation
and undermine public trust almost like a
sci-fi type of identity theft
where you can get anyone to say anything
and its opposite
this means that as we get better at
generating deep fakes
we must also get better at identifying
them mesonet is a convolutional neural
network designed for exactly this
purpose
in today's video we'll use mesonet to
make predictions on image data
we'll examine four sets of images
correctly identified deep fakes
correctly identified reals misidentified
deep fakes
and misidentified reals and we'll see
whether the human eye can pick up on
any insights into the world of deep
fakes let's begin with a brief
discussion of mesonet
and by the way you can find a link to
the author's white paper in the
description below
in the paper the researchers explained
that they had developed two different
models to identify deep fakes
both of which were trained and evaluated
on two different data sets
we will be using the meso4 model
trained on the deep fake data set
now let's begin by exploring how deep
fakes are generated and how this data
set was assembled
and then we'll proceed to the model deep
fakes can be generated by using
autoencoders
at the highest level autoencoders work
like this when data such as image data
are processed the data get compressed
by an encoder
the purpose of this compression is to
suppress the effect of noise in the data
and to reduce
computational complexity conversely the
original image can be restored
at least approximately by passing the
compressed version of the image
through a decoder now suppose we want to
create a deep fake that blends van
gogh's starry night and da vinci's mona
lisa
to do so we train the autoencoders
on different data sets
we allow the encoders to share weights
while keeping their decoders separate
that way an image of the mona lisa can
be compressed according to a general
logic
taking into account things like the
illumination position
and the expression of her face but when
it gets restored
this will be done according to the logic
specific to the starry night
which has the effect of overlaying van
gogh's distinctive style
onto da vinci's masterpiece although
this is how deep fakes are
often generated the authors note that
their deep fake data set was created
differently
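The shared-encoder, separate-decoder blending described above can be sketched in plain numpy. This is a minimal illustration of the data flow only: the weights are untrained random matrices and the array names are hypothetical, not anything from the MesoNet repository.

```python
import numpy as np

rng = np.random.default_rng(0)
mona_lisa = rng.random(64)                # toy flattened 8x8 "image"

W_enc = rng.standard_normal((16, 64)) * 0.1         # shared encoder: 64 -> 16
W_dec_starry = rng.standard_normal((64, 16)) * 0.1  # decoder trained on the other style

code = W_enc @ mona_lisa       # compress the mona lisa with the shared encoder
blended = W_dec_starry @ code  # decode with the starry-night decoder
```

In a real deep fake pipeline both the encoder and the decoders are deep networks trained on face data, but the trick is the same: encode with the shared weights, decode with the other identity's decoder.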
rather than generating deep fakes from
scratch which they explained would limit
the amount
and diversity of the fake data that they
could then feed mesonet
they chose to extract face images from
existing deep fake videos
they used about 175 existing videos
pulled from popular deep fake
platforms and that created their deep
fake data set
they explained that they extracted the
specific frames that contain
faces from the deep fake videos they
also note that they followed a similar
process for extracting the real
image data from real video sources like
tv shows and movies
and finally they explain how they
stratified their data so that the
various angles of the faces and levels
of resolution
were distributed evenly across the real
and deep fake data sets
since machine learning is all about
finding patterns in data
it's actually extremely important to
understand the nature of the data and
how it's collected
because you're ultimately going to feed
that through a model to understand it
as you'll see later with this
understanding of the data we reveal some
important insights
but stick around for that let's first
explore the model
meso4 is a convolutional neural network
with four convolutional blocks followed
by one
fully connected hidden layer as we
reproduce the model we'll explain what
each of these means
and you can find a link to the mesonet
author's repository in the description
below as well
i want to take a minute to talk about
kite which is an ai-powered
coding assistant that'll help you code
faster and smarter
whether you're new to python or already
a pro you should try out kite as your
autocomplete to reduce your keystrokes
and save time programming
it's a free plugin for your code editor
that uses machine learning to save you
keystrokes while you're programming
so if you're using atom vs code spyder
pycharm
sublime or vim kite will seamlessly
integrate into your coding workflow
kite can complete entire lines of code
and it has a feature called intelligent
snippets that will help you fill in
arguments and method calls with
variables defined earlier in your script
the window on the right side of my
screen here is also a kite feature
called the kite copilot
it automatically shows you relevant
python documentation while you type
based on your cursor location this saves
you time from having to google search
for docs the best part of kite is that
it's free and you can download it from
the link in the description below
let's begin with our imports
and then let's create a dictionary
called image dimensions to store our
image dimensions
which are the height and width of the
image in pixels
and the number of color channels we set
the height and the width to 256
and since we're using color images we
want to set the channels to three
then we create a classifier class just
as the authors had done
which makes for very neat code by
prescribing just the essential methods
for the network in simple terms
we will be using the methods to load
weights and to make predictions
and next let's create a meso4 class the
meso4 class takes just one argument
and that's the classifier class we just
created we set the gradient descent
optimizer and we set its learning rate
in the constructor
and we set the parameters to compile the
model
now it's time to create the network
architecture we create a method called
init
model first we create our input layer
and assign it to the variable x
for the input layer we just need to pass
the three dimensions of our image data
then we create our four convolutional
blocks
convolutional blocks always include a
convolutional layer
and a max pooling layer and in mesonet
these blocks also include a batch
normalization layer
the convolutional layer represented by
conv 2d
is the trickiest part here we set the
size
and the number of filters we will use in
convolution
check out this illustration here's how
it works
each filter represents a distinct image
feature for example a horizontal line
during convolution this filter is passed
over an image to assess the degree to
which specific regions of that image
correspond with the filter if you're a
math whiz this is done by calculating
the dot product of the filter
with each filter size region of the
image for each of the color channels
for the rest of us the important thing
to know is that the filter
identifies the existence and location of
specific image features
like horizontal or vertical lines
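The filter-as-dot-product idea can be checked in a few lines of numpy. The 3x3 horizontal-edge filter below is a hypothetical hand-made example, not one of MesoNet's learned filters:

```python
import numpy as np

# a hand-made 3x3 filter that responds to horizontal edges
horizontal = np.array([[ 1,  1,  1],
                       [ 0,  0,  0],
                       [-1, -1, -1]])

image = np.zeros((5, 5))
image[2, :] = 1.0        # a bright horizontal line in the middle

# slide the filter over every 3x3 region and take the dot product
h, w = image.shape
response = np.zeros((h - 2, w - 2))
for i in range(h - 2):
    for j in range(w - 2):
        response[i, j] = np.sum(image[i:i+3, j:j+3] * horizontal)
```

The response map peaks where the line sits just below the filter's positive row and dips where it sits below the negative row, which is exactly the "existence and location of a feature" the video describes.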
after the convolutional layer comes the
batch normalization layer
batch normalization is a novel technique
for improving the speed
performance and stability of neural
networks
it works by normalizing the inputs to
each layer of the network
which reduces the interdependence
between the parameters for a given layer
and the input distribution of the next
layer
this interdependence is called internal
covariate
shift and it has a destabilizing effect
on the learning process
for more about batch normalization check
out the link in the description below
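A minimal numpy sketch of the normalization step batch normalization performs; note the real Keras layer also applies learned scale and shift parameters and tracks running statistics, which are omitted here:

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # normalize each feature across the batch to zero mean, unit variance
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

# two features on very different scales get comparable after normalization
batch = np.array([[1.0, 200.0],
                  [3.0, 400.0]])
normed = batch_norm(batch)
```

This is why downstream layers see a stable input distribution regardless of how the previous layer's outputs drift during training.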
the last layer of our convolutional
blocks is the pooling layer
it is in the pooling layer that we
significantly reduce the dimensionality
of our data
which greatly speeds up computation
mesonet uses max pooling for this
layer
which means we reduce a region of pixel
values to that region's maximum value
it may seem like we're throwing away too
much data in this pooling layer
but remember during convolution our
model was able to locate
important image features which means we
can focus on just the parts that matter
most
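Max pooling is simple enough to sketch directly. This toy 2x2 pooling halves each spatial dimension by keeping only the strongest activation in each block:

```python
import numpy as np

def max_pool_2x2(x):
    h, w = x.shape
    # group into 2x2 blocks, then keep each block's maximum
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [9, 1, 2, 3],
              [1, 1, 4, 0]])
pooled = max_pool_2x2(x)   # 4x4 -> 2x2
```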
think about your own visual field right
now odds are you're focusing on a very
small subset of the available data
just like mesonet with successive
blocks in the convolutional base
cnns proceed to higher order feature
representations
from lines to corners to shapes to faces
mesonet has four blocks in its
convolutional base followed by a fully
connected hidden layer
and then the output layer for the
prediction now that we've got the
network architecture established
we need to instantiate the model and
load the weights
we download the weights from the
mesonet github repo and then save the
meso4 df weights file in a folder
called weights
we load the weights using the load
method of our classifier by specifying
the file path
mesonet is now ready to make predictions
on image data
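As a rough sanity check on the architecture, here is how the 256x256 input shrinks through the four convolutional blocks, assuming 'same'-padded convolutions and the pool sizes used in the authors' repository (2, 2, 2, then 4):

```python
size = 256
for pool in (2, 2, 2, 4):   # pool size of each of the four blocks
    size //= pool           # 'same' convolution keeps h/w; pooling divides it
# 256 -> 128 -> 64 -> 32 -> 8, so the final feature maps are 8x8
```

Those small 8x8 maps are what get flattened and fed into the fully connected hidden layer before the single-unit output.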
so next up we start preparing our image
data we download the deep fake
validation data set
from the mesonet github repo again
that's linked in the description below
we structure our real and deep fake
image data in separate folders
underneath a folder we call
data this is important for the flow from
directory method
which infers classes from the file
structure
the pixel values in our image data exist
in the range between 0
and 255. large integer coefficients like
this
complicate gradient descent when using
typical learning rates
so the next step is to scale our data by
a factor of one
divided by 255 that way the pixel values
fall into the range from zero to one
we instantiate our image data generator
that rescales our images
then we pass in our data by specifying
the directory path to our data folder
we set the batch size to 1 so we process
images individually
and then we set the class mode to binary
for a binary classification task of
predicting images
as real or deep fake
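The rescaling step amounts to the following (plain numpy; in the video it is done by passing rescale=1./255 to Keras's ImageDataGenerator):

```python
import numpy as np

pixels = np.array([[0, 128, 255]], dtype=np.uint8)   # raw 8-bit pixel values
scaled = pixels.astype(np.float32) * (1.0 / 255.0)   # values now in [0, 1]
```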
now let's check our class indices
flow from directory should have inferred
the names of our classes from the names
of the subfolders within the data
folder
and there should be just two classes
where our deep fakes are represented by
zero
and our reals by one there are two
problems that might arise
first if your classes are reversed such
that your deep fakes are represented by
one
and the reals by zero you'll have to
flip mesonet predictions by subtracting
these values from one
the second issue relates to having more
than just two classes
as we discussed flow from directory uses
file structure to process data
and is therefore sensitive to the
structure many ides including jupyter
lab which we're using today
will conduct periodic autosaves and
store this data in
hidden folders called ipynb
checkpoints this extra folder
compromises the flow from directory
method
so we'll have to remove it we can do
this by opening our terminal
accessing the working directory that
holds our data and removing the hidden
file
or we can do this directly from a
jupyter cell by using a magic command
we just type the exclamation mark to
invoke the command line interface
and then issue our command to remove
this file
after this let's return the image data
generator and recheck our class
indices to ensure we successfully
removed this autosave folder
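The cleanup step can also be scripted. This sketch builds a throwaway copy of the expected data/ layout (the class folder names are hypothetical) and strips the Jupyter autosave folder so only the two class folders remain:

```python
import shutil
import tempfile
from pathlib import Path

# build a throwaway copy of the layout: two class folders plus a stray
# autosave folder that would confuse flow_from_directory
root = Path(tempfile.mkdtemp())
for sub in ("df", "real", ".ipynb_checkpoints"):
    (root / sub).mkdir()

# remove any hidden Jupyter autosave folders
for hidden in root.glob(".ipynb*"):
    shutil.rmtree(hidden)

classes = sorted(p.name for p in root.iterdir())
shutil.rmtree(root)   # clean up the throwaway directory
```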
great things are looking good now and
we're ready to pass in an image through
mesonet
the generator.next method returns two
items the pixel data of a given image
and the actual label for it whether it's
real or a deep fake
so let's set variables x and y equal to
generator.next
let's write three print statements to
evaluate the prediction
for the first let's show mesonet's
predicted output for the image
rounded to four digits for the second
let's show the actual label and for the
third let's see whether mesonet's
prediction is accurate
that is whether the actual label
corresponds with mesonet's predicted
output after rounding to 0
or 1. we can also see the image in
question by using the imshow
function from the matplotlib.pyplot
package
our x data currently has an extra
dimension for its position in the batch
and this needs to be removed before
imshow can properly render the image
we can do this by using the numpy
function squeeze passing numpy.squeeze(x)
as the argument
to plt.imshow
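The extra batch axis and its removal look like this (shapes only; the matplotlib call is left as a comment):

```python
import numpy as np

batch = np.zeros((1, 256, 256, 3))   # (batch, height, width, channels)
image = np.squeeze(batch)            # drop the size-1 batch axis
# plt.imshow(image) can now render the 256x256 RGB image
```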
now we can see how mesonet performed on
a particular image
when predicted outputs are nearly 0 or 1
this corresponds with high degree of
confidence in the prediction
but when the predicted output approaches
0.5 the confidence in the model
prediction approaches a random guess
so let's run this a few more times and
let's see if we see
any patterns
let's organize our predictions into four
categories correctly predicted deep
fakes
correctly predicted reals misclassified
deep fakes
and misclassified reals
and we create lists to keep track of
which images fall into each category
then we write a for loop to iterate
through our data set and sort each
observation into one of these four
categories
we create two lists for each category
one to store the image data
and the other to store the corresponding
prediction value
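A minimal sketch of that sorting loop, using made-up prediction values (0 = deep fake, 1 = real, threshold at 0.5; the image data is omitted to keep the example short):

```python
# hypothetical (label, prediction) pairs; 0 = deep fake, 1 = real
results = [(0, 0.1), (0, 0.9), (1, 0.8), (1, 0.2)]

correct_df, correct_real = [], []
misclassified_df, misclassified_real = [], []

for label, pred in results:
    predicted = 1 if pred >= 0.5 else 0    # threshold the model output
    if label == 0:
        (correct_df if predicted == 0 else misclassified_df).append(pred)
    else:
        (correct_real if predicted == 1 else misclassified_real).append(pred)
```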
now let's create our plotter function
which we'll use to show random batches
of images in each category
plotter takes two arguments a list of
image data and the list of corresponding
predictions
we use an f string for the x label so we
can show mesonet's predicted output for
the image in question
let's check out a collection of
correctly identified real images
we pass correct real and correct real
pred
into our plotter function
checking out these images things look
pretty good you might see some
characters from tv or movies that you
recognize here
look at the predicted output for the
images the closer these values are to
one the more confidence mesonet
has that the image is real notice that
most of these outputs vary between a
range pretty close to one
okay for contrast let's look at real
images that were misclassified as deep
fakes
let's again use our plotter function but
this time we pass it misclassified real
and misclassified real pred
although these are all representing
errors it is reassuring to note that the
model's confidence for these predictions
tends to be closer to 0.5 and that's to
say that it's less confident in these
predictions that turned out to be wrong
so it's pretty much a guess and i guess
that's okay
now let's check our correctly identified
deep fakes
whoa okay that's an
odd pattern in our data there
and let's also check the deep fakes
misclassified as real
we now return to mesonet's method
collecting deep fake data
you may recall that they acquired deep
fake image data from popular deep fake
video platforms that are online
well according to the september 2019
report called the state of deep fakes
conducted by deeptrace labs 96 percent
of deep fake media is pornographic deep
fake pornography is among
the worst abuses in the world of deep
fakes and besides being a significant
social issue this also complicates the
technical side of our deep fake
classifier
here is why mesonet was trained on
real data
acquired from tv and movies sources that
offer a great variety of facial
expressions and settings
however the deep fake data that the
authors acquired
was from popular deep fake platforms on
the internet sources that are
overwhelmingly dominated by pornographic
content
therefore it's expected that the model
took advantage of a data artifact
the statistical reality that deep fakes
tend to be pornographic
and reals tend to be non-pornographic
and used it as a heuristic for its
predictions
the authors explain that the models make
predictions under real conditions of
diffusion on the internet
which explains their use of popular deep
fake platforms including pornographic
websites
one wonders then whether we can
neutralize this effect and force the
model to recognize deep fakes without
the aid of statistical accidents
by using some different data since deep
fake data is limited in supply
and overwhelmingly pornographic we could
acquire our real data from pornographic
sites as well
rather than just tv shows and movies and
unlike deep fake data pornography is
relatively easy to find on the internet
or so i've heard today we talked about
deep fakes
what they are how they're generated and
why they matter
we examined a model designed to identify
deep fake images called mesonet
and we implemented it to explore how it
works in doing so we did encounter a
problematic data artifact
that an overwhelming amount of deep fake
images are indeed pornographic
we explained why this matters and we
gave a suggestion for how we could
potentially neutralize this effect
well i hope you enjoyed today's video
and that you learned something about
deep fakes and neural nets
make sure to subscribe to our channel
for more data science content like this
and remember to download kite the ai
powered coding assistant
so you can code faster and smarter