The Math Behind Generative Adversarial Networks Clearly Explained!

Normalized Nerd
20 Jun 2020 · 17:03

Summary

TLDR: This educational video demystifies Generative Adversarial Networks (GANs) by explaining their core components: the generative model 'G' and the discriminative model 'D'. It outlines GANs as a competitive process where 'G' creates fake data and 'D' tries to distinguish between real and fake data. The video simplifies complex concepts like the minimax game, value function, and the convergence of probability distributions, using analogies and step-by-step explanations. It aims to make viewers comfortable with the foundational math behind GANs, one of AI's most sophisticated inventions.

Takeaways

  • 😎 GANs (Generative Adversarial Networks) are composed of two models: a generative model 'G' and a discriminative model 'D'.
  • 🧠 Discriminative models predict the target variable given the input, like logistic regression, while generative models learn the joint probability distribution of input and output, like Naive Bayes.
  • 🎭 In GANs, the generator 'G' produces fake data points, and the discriminator 'D' determines if a data point is real or generated.
  • 🤼‍♂️ G and D compete in an adversarial setup, improving each other's performance over time.
  • 🧮 A GAN's structure uses multi-layered neural networks for G and D, with weights theta G and theta D; neural networks are chosen because they can approximate any function.
  • 📊 The generator takes random noise as input and produces 'G of Z', attempting to mimic the original data distribution.
  • 🔢 The discriminator acts as a binary classifier, outputting the probability of the input being real data, trained with labels 1 for real and 0 for generated data.
  • 🎲 GAN training is formulated as a minimax game: G aims to minimize and D aims to maximize a value function closely related to the binary cross-entropy loss.
  • 📉 The value function for GANs is optimized using stochastic gradient descent, with the discriminator updated multiple times for every update of the generator.
  • 🏁 The training goal of GANs is for the generator to replicate the original data distribution so well that the discriminator cannot distinguish between real and fake data.

Q & A

  • What are Generative Adversarial Networks (GANs)?

    -GANs are a class of artificial intelligence algorithms used in unsupervised machine learning, consisting of two neural networks, a generative model G and a discriminative model D, that are trained together in an adversarial setup. G generates new data points (fake data), and D evaluates them to determine if they are real or fake.

  • What is the difference between generative and discriminative models?

    -Generative models learn the joint probability distribution of input and output variables and can create new instances of data by learning the data distribution. Discriminative models learn the conditional probability of the target variable given the input variable and are used for prediction tasks, like logistic regression and linear regression.

  • How does the generator in a GAN work?

    -The generator in a GAN takes a random noise vector as input and transforms it into a data point that resembles the original data distribution. It does this by learning the underlying data distribution through training.
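As a concrete illustration (not from the video), here is a minimal sketch of a generator as a small fully connected network; PyTorch and the layer sizes are assumptions for illustration only:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps a noise vector z ~ p_z to a point G(z) in data space."""
    def __init__(self, noise_dim: int = 100, data_dim: int = 784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim, 256),
            nn.ReLU(),
            nn.Linear(256, data_dim),
            nn.Tanh(),  # assumes the real data is scaled to [-1, 1]
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)

# Sample random noise and produce fake data points G(z).
z = torch.randn(16, 100)      # 16 noise vectors from a standard normal
fake = Generator()(z)         # shape: (16, 784)
```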

  • What role does the discriminator play in a GAN?

    -The discriminator in a GAN acts as a binary classifier that determines whether a given data point is real (from the original dataset) or fake (generated by the generator). It competes with the generator, improving its ability to distinguish between real and fake data.
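A matching sketch of the discriminator as a binary classifier (again with hypothetical sizes, assuming the same PyTorch setup) could look like this; the sigmoid output is the probability D(x) that the input is real:

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Outputs D(x): the estimated probability that x comes from the real data."""
    def __init__(self, data_dim: int = 784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(data_dim, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
            nn.Sigmoid(),  # probability in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)
```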

  • What is the minimax game in the context of GANs?

    -The minimax game refers to the adversarial competition between the generator and discriminator in a GAN. The generator tries to minimize the value function by producing more convincing fake data, while the discriminator tries to maximize it by getting better at distinguishing real from fake data.

  • How is the value function of a GAN defined?

    -The value function of a GAN is defined as the expectation of the log probability of the discriminator correctly classifying real data as real and fake data as fake. It is a mathematical expression that both the generator and discriminator aim to optimize in opposite directions.
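Written out (following the standard GAN formulation the video refers to), the value function is:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\bigl[\log D(x)\bigr] + \mathbb{E}_{z \sim p_z(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]$$

D tries to push both terms up (classify real as real, fake as fake), while G tries to push the second term down by making D(G(z)) close to 1.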

  • What is the goal of the training process in a GAN?

    -The goal of the training process in a GAN is for the generator to produce data that is indistinguishable from the real data, and for the discriminator to be unable to accurately classify the generated data as fake. This is achieved when the generator's output distribution converges to the real data distribution.

  • Why is the training of GANs considered difficult?

    -Training GANs is difficult because it involves finding a balance between the generator's ability to produce realistic data and the discriminator's ability to distinguish between real and fake data. The training process can be unstable, and the models may not converge properly without careful tuning of hyperparameters and training techniques.

  • What is the role of the noise vector in GANs?

    -The noise vector, often sampled from a simple distribution like a Gaussian, serves as the input to the generator. It is a source of randomness that allows the generator to produce a diverse range of outputs, contributing to the variety of the generated data.

  • How does the concept of Jensen-Shannon divergence relate to GANs?

    -Jensen-Shannon divergence is a measure of the difference between two probability distributions. In the context of GANs, it is used to show that as the generator optimizes the value function, the distribution of generated data (PG) converges to the distribution of the real data (Pdata), minimizing the divergence between them.
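For reference, the Jensen-Shannon divergence between two distributions p and q is:

$$\mathrm{JSD}(p \,\|\, q) = \tfrac{1}{2}\,\mathrm{KL}\!\left(p \,\Big\|\, \tfrac{p+q}{2}\right) + \tfrac{1}{2}\,\mathrm{KL}\!\left(q \,\Big\|\, \tfrac{p+q}{2}\right)$$

It is symmetric, non-negative, and zero exactly when p = q, which is why driving it to zero forces PG to match Pdata.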

Outlines

00:00

🧠 Introduction to Generative Adversarial Networks (GANs)

The video begins with an introduction to GANs, emphasizing that they are not a single model but a combination of two models: a generative model (G) and a discriminative model (D). The presenter clarifies the difference between generative and discriminative models in machine learning, explaining that generative models learn the joint probability distribution of input and output variables, while discriminative models learn the conditional probability of the target variable given the input. The video then delves into the adversarial nature of GANs, where the generator creates fake data points, and the discriminator determines whether data points are original or generated. The presenter uses the universal approximation theorem to justify the use of neural networks in GANs, highlighting their ability to approximate any function.
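In symbols, a discriminative model learns $p(y \mid x)$ directly, while a generative model learns the joint distribution $p(x, y)$ and, when a prediction is needed, applies Bayes' theorem:

$$p(y \mid x) = \frac{p(x, y)}{p(x)} = \frac{p(x \mid y)\,p(y)}{\sum_{y'} p(x \mid y')\,p(y')}$$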

05:02

🎲 The Minimax Game and Value Function of GANs

The second paragraph explains the concept of a minimax game, which is a two-player game where one player tries to maximize their chances of winning while the other tries to minimize the first player's chances. This is analogous to the adversarial setup in GANs, where the generator (G) aims to minimize a certain expression, and the discriminator (D) aims to maximize it. The value function for GANs is introduced, which is similar to the binary cross-entropy function used in binary classification tasks. The presenter then demonstrates how the value function relates to the binary cross-entropy function and how it is used in the training process of GANs. The concept of expectations in probability theory is also introduced to explain how the value function is applied to the entire training dataset.
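The correspondence can be made explicit. For one example, the binary cross-entropy (dropping the sign and the sum over examples) is $y \log \hat{y} + (1 - y)\log(1 - \hat{y})$; substituting the discriminator's predictions gives exactly the two terms of the GAN value function:

$$y = 1,\ \hat{y} = D(x) \;\Rightarrow\; \log D(x), \qquad y = 0,\ \hat{y} = D(G(z)) \;\Rightarrow\; \log\bigl(1 - D(G(z))\bigr)$$

Taking expectations over $p_{\text{data}}$ and $p_z$ then yields the value function $V(D, G)$.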

10:04

🔍 Proving Convergence of GANs to the Original Data Distribution

In this segment, the presenter discusses the theoretical guarantee of GANs' ability to replicate the original data distribution. The goal is to show that the generator's output distribution (PG) will converge to the original data distribution (Pdata) if the generator can find the global minimum of the value function. The presenter uses calculus to find the conditions under which the discriminator (D) maximizes the value function and then substitutes these conditions back into the value function to analyze the generator's objective. The concept of Jensen-Shannon divergence is introduced as a measure of the difference between two probability distributions, and it is shown that the generator's objective is to minimize this divergence, which leads to the conclusion that PG will equal Pdata at the global minimum of the value function.
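The key intermediate result of this segment is the optimal discriminator for a fixed generator:

$$D^*_G(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}$$

Substituting $D^*_G$ back into the value function gives the generator's objective, which the video then rewrites in terms of the Jensen-Shannon divergence.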

15:05

📈 Training Phases and Challenges in GANs

The final paragraph describes the different phases of training in GANs and the challenges involved. Initially, neither the generator nor the discriminator performs well, as the generator's output distribution (PG) does not replicate the original data distribution (Pdata), and the discriminator cannot accurately classify data points. As training progresses, the discriminator improves and can distinguish between real and fake data, while the generator's output distribution becomes closer to the original data distribution. Eventually, if training is successful, the generator's output distribution becomes indistinguishable from the original data distribution, and the discriminator cannot tell the difference, outputting 0.5 for every input. The presenter concludes by acknowledging the complexity of training GANs in practice and encourages viewers to appreciate the mathematical concepts behind this advanced AI technology.


Keywords

💡Generative Adversarial Networks (GANs)

GANs refer to a class of machine learning models that consist of two neural networks, a generator and a discriminator, which are trained together in an adversarial fashion. The generator creates new data points, while the discriminator evaluates them to determine if they are real or fake. In the context of the video, GANs are used to produce new data points that closely resemble the original data distribution, with the ultimate goal of the generator being to 'fool' the discriminator into thinking the generated data is real.

💡Generative Model

A generative model is a type of machine learning model that learns the joint probability distribution of the input and output variables. It can be used to generate new data instances by sampling from the learned distribution. In the video, the generator in a GAN is a generative model that produces fake data points, aiming to replicate the distribution of the original data.

💡Discriminative Model

Discriminative models, in contrast to generative models, learn the conditional probability of the target variable given the input variable. They are typically used for classification tasks. In the video, the discriminator in a GAN is a discriminative model that tries to distinguish between real and generated data points.

💡Neural Networks

Neural networks are a set of algorithms modeled loosely after the human brain that are designed to recognize patterns. They interpret sensory data through a kind of machine perception, labeling, or clustering raw input. In the video, both the generator and discriminator are implemented as neural networks due to their ability to approximate any function, as per the universal approximation theorem.

💡Adversarial Setup

In the context of GANs, an adversarial setup refers to the competitive training process where the generator and discriminator are pitted against each other. The generator aims to create increasingly realistic data, while the discriminator aims to improve its ability to identify real from fake data. This competition leads to the improvement of both models.

💡Minimax Game

A minimax game is a two-player game where one player aims to maximize their chances of winning while the other tries to minimize the first player's chances. In the video, the training of GANs is likened to a minimax game, where the generator tries to minimize the discriminator's ability to correctly classify data, and the discriminator tries to maximize its accuracy.

💡Value Function

In the context of GANs, the value function is a mathematical expression that represents the objective of the minimax game. It is what the generator and discriminator aim to minimize or maximize, respectively. The video explains that the value function for GANs is similar to the binary cross-entropy function, which is used to measure the performance of the discriminator.

💡Stochastic Gradient Descent

Stochastic Gradient Descent (SGD) is an optimization algorithm used to update the parameters of a model in a way that minimizes a loss function. In the video, SGD is used to optimize the value function for both the generator and discriminator during the training of a GAN.
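As a rough sketch of this alternating scheme (a minimal PyTorch example under assumed sizes and plain SGD; real GAN training usually needs more careful optimizers and tricks), one version of the loop could be:

```python
import torch
import torch.nn as nn

noise_dim, data_dim, m, k = 100, 784, 64, 1   # hypothetical sizes; k = D updates per G update

G = nn.Sequential(nn.Linear(noise_dim, 256), nn.ReLU(), nn.Linear(256, data_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(data_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1), nn.Sigmoid())
opt_D = torch.optim.SGD(D.parameters(), lr=1e-3)
opt_G = torch.optim.SGD(G.parameters(), lr=1e-3)

def real_batch(m):
    # Stand-in for sampling m real examples; replace with a real data loader.
    return torch.randn(m, data_dim)

for step in range(1000):
    # Inner loop: update D for k steps (gradient ascent on V, done as descent on -V).
    for _ in range(k):
        x, z = real_batch(m), torch.randn(m, noise_dim)
        loss_D = -(torch.log(D(x)).mean() + torch.log(1 - D(G(z).detach())).mean())
        opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Update G once (gradient descent on log(1 - D(G(z))); the log D(x) term has no G dependence).
    z = torch.randn(m, noise_dim)
    loss_G = torch.log(1 - D(G(z))).mean()
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```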

💡Jensen-Shannon Divergence

Jensen-Shannon Divergence is a method for measuring the difference between two probability distributions. It is symmetric and always finite, unlike the more commonly used Kullback-Leibler divergence. In the video, it is mentioned as a way to understand the optimization goal of a GAN, where the generator aims to minimize the divergence between the distribution of generated data and the original data.

💡Global Minimum

The global minimum refers to the lowest possible value of a function, which in the context of GANs, the generator aims to achieve for the value function. Achieving the global minimum means that the generator has successfully learned to produce data that is indistinguishable from the original data distribution. The video discusses the theoretical underpinnings of why and how this can be achieved.

Highlights

Generative Adversarial Networks (GANs) are not a single model but a combination of two models: a generative model (G) and a discriminative model (D).

Discriminative models learn the conditional probability of the target variable given the input variable, like logistic regression.

Generative models learn the joint probability distribution of the input and output variables, allowing for new data instance generation.

In GANs, the generator (G) produces new data points, while the discriminator (D) determines if data points are original or generated.

GANs operate in an adversarial setup where the generator and discriminator compete, improving each other's performance.

The structure of GANs involves multi-layered neural networks for G and D, with weights represented by theta G and theta D.

The generator takes random noise as input and produces data points that mimic the original data distribution.

The discriminator acts as a binary classifier, determining the likelihood of the input data being from the original dataset.

GAN training involves a minimax game where the generator tries to minimize the loss function while the discriminator tries to maximize it.

The value function for GANs is derived, which is similar to the binary cross-entropy function but with expectations over the entire dataset.

The training loop for GANs includes an outer loop for the entire training process and an inner loop for discriminator updates.

For every K updates of the discriminator, the generator is updated once, following the principle of alternating training.

The theoretical guarantee of GANs is discussed, aiming to prove that the generator's distribution will converge to the original data distribution.

Jensen-Shannon divergence is introduced as a method to measure the difference between two probability distributions.

The global minimum of the GAN value function is analyzed, showing that the generator's distribution matches the original data's distribution at this point.

Different phases of GAN training are described, from initial setup to the point where the generator successfully replicates the data distribution.

The video concludes with a simplified view of GANs, emphasizing the difficulty of training and the importance of understanding the underlying math.

Transcripts

00:00

Hello people from the future, welcome to Normalized Nerd! In this video I'm going to explain the GAN, yes, the famous generative adversarial network. I know this is one of those topics that can feel really intimidating if you don't approach it properly, but trust me, by the end of this video you will feel very comfortable with GANs. I put a lot of effort into making these videos, so if you like my content please subscribe and hit the bell icon. Let's get started.

00:25

Okay, the first thing you need to know is that a GAN is not a single model; it's a combination of two models. The first one is a generative model called G and the second one is a discriminative model called D. Now what are discriminative and generative models? Well, in machine learning we have two main methods for building predictive models. The most famous one is the discriminative method: the model learns the conditional probability of the target variable given the input variable. The most common examples are logistic regression, linear regression, and so on. On the other hand, in a generative model the model learns the joint probability distribution of the input variable and the output variable. If the model wants to predict something, it uses Bayes' theorem and computes the conditional probability of the target variable given the input variable. The most common example is the naive Bayes model. The biggest advantage of generative models over discriminative models is that we can use a generative model to make new instances of data, because in this case we are learning the distribution function of the data itself, which is simply not possible with a discriminative model. In a GAN we use the generative model to produce new data points, that is, we produce fake data points using our generator, and we use the discriminator to tell whether a given data point is an original one or has been produced by our generator. These two models work in an adversarial setup, which means they compete with each other, and eventually both of them get better and better at their jobs.

02:19

Let me show you the structure of this thing. Here's the high-level view of our GAN. G and D are nothing but multi-layered neural networks, and theta G and theta D are just their weights. We use neural networks here because they can approximate any function; we know that from the universal approximation theorem. Now, suppose this is the distribution function of the original data. In reality we can't really draw it or even compute it mathematically, because we input images, voices, videos, and they are high-dimensional, complex data, so this is only for mathematical analysis. And here is a noise distribution; you can see it is just a normal distribution. I am going to sample some data randomly from this distribution and feed it to our generator. To get something out of the generator we must input something, right? And here we are inputting noise, which means this Z contains no information. After passing this Z to our generator, it produces something called G of Z. Notice that I have described the distribution of G of Z with the same x that I used for the original data. I am doing this because the domain of our original data is the same as the range of G of Z, which matters because we are trying to replicate our original data. Just remember the short forms: when I say P data, it represents the probability distribution of our original data; when I say P z, it represents the distribution of the noise; and when I say P g, it represents the distribution of the output of our generator. We pass both generated data and original data to our discriminator, and it gives us a single number that tells the probability of the input belonging to the original data. So you can see the discriminator is just a simple binary classifier. For training purposes, when we feed original data to the discriminator we say y equals 1, and when we pass generated data we say the label is 0. D will try to maximize its chances of predicting the correct classes, but G will try to fool D, so we can say that G and D are playing a two-player minimax game. What exactly is that?

05:03

Well, a minimax game is just a two-player game, like tic-tac-toe, where we can interpret the objective as one player trying to maximize its probability of winning while the other player tries to minimize the first player's probability of winning. Okay, we keep talking about maximizing and minimizing, but what should they maximize or minimize? We need a mathematical expression, right? That is called the value function. Let me show you the value function for this minimax game. This is the value function for the GAN, and here min and max simply represent that G wants to minimize this expression and D wants to maximize it. I know that at first this might feel like gibberish, but if you look closely you will find that this expression is surprisingly similar to the binary cross-entropy function, and if you are feeling like that, then you are absolutely correct.

06:00

Let me show you why. This is our ordinary binary cross-entropy function. For a moment, just ignore the negative sign and the summation, so this is just the binary cross-entropy for one input. y is the ground truth, that is, the label, and y hat is the prediction of the model. When y equals 1, that is, when we are passing the original data, y hat equals D of x, so if you substitute these into the formula you get ln of D of x. When we are giving the generated data as input, y hat will be D of G of Z, because first we pass the noise to our generator, it produces something, and then we give the produced fake data to the model D. If you substitute these into the function you get ln of 1 minus D of G of Z. Now let's combine them: adding them together we get this. Does it look similar to the value function? Yes, but we are missing the capital E's at the front. Well, they are just expectations. Understand that this expression is valid for only one data point, but we have to do this for the entire training data set, and to represent that mathematically we need to use expectation. Expectation is just the average value of some experiment if you perform it a large number of times. Suppose you are playing a game where you roll a die and your score is the number on the upper face; if you play this game for a really long time, your expected score is 3.5. The formula is very simple: you just add all the possible outcomes multiplied by their probabilities, so it's a kind of weighted mean. So let's apply the expectation to this equation: here we are adding all the scores weighted by their probabilities, and the same goes for the other term. But this is only true for a discrete distribution; if we assume that our P data and P z are actually continuous distributions, then an integral sign replaces the summation and we have to place the dx and dz accordingly, and this whole thing is written in short form as E. So now you know the value function for the GAN. Does it look intimidating now? I don't think so.
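Written out, the continuous form of the value function referred to in the preceding paragraph is:

$$V(D, G) = \int_x p_{\text{data}}(x)\,\log D(x)\,dx + \int_z p_z(z)\,\log\bigl(1 - D(G(z))\bigr)\,dz$$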

08:37

Now I'm going to tell you how we optimize this function in practice. This is our big training loop, and just like every other neural network we have to optimize the loss function using some stochastic process; here I am using stochastic gradient descent. First we enter the big training loop and we fix the learning of G (that is, hold G constant), and then we enter the inner loop for D. This loop continues for K steps, and in each step we first take m data points from the original distribution and m data points from the fake data, and then we update the parameters of our discriminator by gradient ascent. Why? Because remember that our discriminator is trying to maximize the value function. After we have performed K updates of D, we get out of this loop and fix the learning of D. Now we train the generator: in this case we take only m fake data samples and update the parameters of our generator by gradient descent. Why? Because remember, the generator is trying to minimize the value function. Now you might ask why I haven't included the first portion in the update step of the generator. Well, look closely: does that expression contain any term corresponding to the generator? No, so the partial derivative of that term with respect to theta G will be zero; that's why we take only this portion. One important thing you should note is that for every K updates of the discriminator we update the generator once. If you have understood the video so far, then you know what the value function for the GAN is and how we optimize it in practice. But if you are like me and want to know what guarantee we have that our generator will actually replicate the original distribution, then take a deep breath and continue watching.
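In symbols, the two updates described in the training loop above are stochastic gradient ascent for the discriminator and stochastic gradient descent for the generator (shown here with an explicit learning rate $\alpha$, which the video leaves implicit):

$$\theta_D \leftarrow \theta_D + \alpha\,\nabla_{\theta_D}\,\frac{1}{m}\sum_{i=1}^{m}\Bigl[\log D\bigl(x^{(i)}\bigr) + \log\bigl(1 - D\bigl(G\bigl(z^{(i)}\bigr)\bigr)\bigr)\Bigr]$$

$$\theta_G \leftarrow \theta_G - \alpha\,\nabla_{\theta_G}\,\frac{1}{m}\sum_{i=1}^{m}\log\bigl(1 - D\bigl(G\bigl(z^{(i)}\bigr)\bigr)\bigr)$$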

10:42

Just to be clear, we want to prove that P g will converge to P data if our generator is able to find the global minimum of the value function; in other words, we want to show that P g equals P data at the global minimum of the value function. This is a two-step process. First of all, we fix G and ask for which value of the discriminator the value function is maximum. Look here: I have replaced G of Z with x, which we can do because the domains of both are the same. Now if you differentiate this, you will see that the maximum of this expression occurs when D of x attains the expression P data over P data plus P g. One can obviously just differentiate and obtain this expression, but let's look at it intuitively. We can represent our value function in the form a ln x plus b ln of 1 minus x, and we want to find the value of x for which this expression is maximum. If I take b equal to 0.6 and a equal to 0.45, you will see the graph looks something like this, and the maximum occurs at 0.429, which is nothing but a over a plus b.
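The calculus behind this maximum is one line: for $f(x) = a \ln x + b \ln(1 - x)$ with $a, b > 0$,

$$f'(x) = \frac{a}{x} - \frac{b}{1 - x} = 0 \;\Longrightarrow\; x^* = \frac{a}{a + b}$$

so with $a = p_{\text{data}}(x)$ and $b = p_g(x)$ the optimal discriminator is $D^*(x) = p_{\text{data}}(x) / (p_{\text{data}}(x) + p_g(x))$; with $a = 0.45$ and $b = 0.6$, $x^* = 0.45/1.05 \approx 0.429$, matching the plotted maximum.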

12:09

Now let's fix D of x as this expression and substitute it into our value function. After fixing D and substituting it into the value function we get this, and after a little manipulation we get this long expression; here "min over G" simply means that G will try to minimize this quantity. Now understand what we want to do here: we want to prove that the probability distribution of the generator will be exactly the same as the probability distribution of the data, so it makes sense to talk about methods for measuring the difference between two distributions, and one of the most famous is the JS divergence, that is, the Jensen-Shannon divergence. If you look at the formula for the JS divergence, it looks surprisingly close to this long expression, doesn't it? Just as a refresher, the E here represents an expectation: in the first portion we compute the expectation using the probabilities from the first distribution, while in the second portion we use the probabilities from the second distribution.

13:27

Now let's see if we can get to the JS divergence from this expression. After a little manipulation we get this. What have we done here? We have just multiplied by two inside these two logarithms, and to compensate we need to subtract two times ln 2. If we look closely, this whole portion is actually equal to two times the JS divergence between P data and P g, and of course we have the minus two ln 2 left over. G wants to minimize this, so what is the minimum value of this expression? Well, the JS divergence between any two distributions cannot be negative; the minimum it can reach is zero, and it reaches zero only when the two distributions are equal. That is, only if P data equals P g will this term be zero and the whole expression attain its minimum, which is minus 2 ln 2. So voila, we have now proved that at the global minimum of our value function, P g will be exactly the same as P data, and our generator is actually trying to attain that state.
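Compactly, the chain of identities used in this argument is:

$$C(G) = \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\ln \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}\right] + \mathbb{E}_{x \sim p_g}\!\left[\ln \frac{p_g(x)}{p_{\text{data}}(x) + p_g(x)}\right] = -2\ln 2 + 2\,\mathrm{JSD}\bigl(p_{\text{data}} \,\|\, p_g\bigr) \;\ge\; -2\ln 2$$

with equality exactly when $p_{\text{data}} = p_g$.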

14:48

Now let me show you how G achieves that state, that is, the different phases of training. At the beginning, neither the discriminator nor the generator knows what it is doing, so P g is not replicating P data and the discriminator is not classifying well either. After updating theta D, that is, once the discriminator has learned something, the classifier gets better, so now the discriminator can actually distinguish between the real data and the fake data. Then, after the generator has learned something, the distribution P g is closer to P data, and the discriminator is still trying to predict the true label of the data points, but it is not performing as well. Finally, when the generator has attained the minimum of the value function, it has successfully replicated the distribution function of the data, so P g is now indistinguishable from P data. It is then impossible for the discriminator to tell which data point is an original one and which is a generated one, so the discriminator will output 0.5 for every input, and that is what we want to achieve.

16:13

Well, this is a very simplistic view of the GAN; in reality, training a GAN is really hard. The goal of this video was to make you understand GANs, and I hope you are now very comfortable with the concept. If you have understood everything I have talked about in this video, then do congratulate yourself, because now you know the math behind one of the finest inventions in AI. I hope you have liked this video; please share it and subscribe to my channel. Stay safe, and thanks for watching.


Related Tags
GANs, AI, Machine Learning, Discriminative Models, Generative Models, Neural Networks, Deep Learning, Data Distribution, AI Tutorial, Tech Education