What are Diffusion Models?

Ari Seff
20 Apr 2022 · 15:28

Summary

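TLDR: Diffusion models are an emerging class of generative models. They define a forward process that gradually adds Gaussian noise to a clean image until only pure noise remains, then learn to reverse that process, removing the noise step by step to produce a coherent image. They perform strongly in image generation and in conditional settings, and on certain tasks they have surpassed generative adversarial networks (GANs). The video explains the basic mechanism: the forward diffusion process, the reverse denoising process, and training by optimizing a variational lower bound. It also covers how diffusion models are applied to conditional generation and inpainting.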

Takeaways

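  • 🌟 Diffusion models are an emerging approach in generative modeling and have made notable progress in image generation in particular.
  • 🚀 A diffusion model turns an image into a sample of pure noise by adding Gaussian noise step by step, then learns the reverse process that undoes the noise.
  • 🔄 The forward diffusion process gradually turns data into noise; the reverse process tries to recover data from noise.
  • 📈 The forward noising process is a Markov chain: the distribution at each step depends only on the sample from the immediately previous step.
  • 🎯 In the reverse process, the model learns denoising steps that lead a sample back to the data manifold, producing a reasonable sample.
  • 🔧 Training maximizes a lower bound (the variational lower bound) rather than directly maximizing the density the model assigns to the data.
  • 🔄 Diffusion models can sample conditionally, for example generating images from a class label or a text description.
  • 🖼️ Diffusion models also do well at inpainting, with the best results coming from a model fine-tuned to fill in missing regions of an image.
  • 📊 Compared with other generative models such as GANs, diffusion models perform better on certain tasks.
  • 🛠️ The reverse model takes the time step t as an input, so it can learn to undo the different noise levels of the forward variance schedule individually.
  • 🔗 A continuous-time formulation of diffusion models gives rise to a probability flow ODE, which allows the log-likelihood to be approximated by numerical integration.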

Q & A

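  • What is a diffusion model?

    - A diffusion model is a generative model. It defines a process in which a clean sample is gradually corrupted with noise until it becomes pure noise, then learns to run that process in reverse, removing the noise step by step. In doing so it learns the data distribution.

  • What have diffusion models achieved in image generation?

    - Diffusion models have been notably successful in image generation, and on certain tasks they have surpassed other kinds of generative models such as GANs. For example, recent diffusion models have outperformed GANs on perceptual quality metrics and have shown impressive performance in conditional settings such as text-to-image generation, inpainting, and image manipulation.

  • How is the forward process of a diffusion model defined?

    - The forward process is defined as a Markov chain in which the distribution at each step depends only on the sample from the previous step. It gradually adds noise to the image and ends in a pure noise distribution.

  • Why do diffusion models use a small step size in the forward process?

    - A small step size means that learning to undo the forward steps is not too difficult. Small steps leave much less ambiguity about where the chain came from, so the posterior over the previous step can reasonably be modeled with a unimodal Gaussian.

  • How does the reverse process work?

    - The reverse process is learned. It is a parameterized reverse Markov chain that removes noise step by step, with the goal of arriving back at the data distribution. It is trained by maximizing a variational lower bound rather than by directly maximizing the likelihood.

  • What is the training objective of a diffusion model?

    - The training objective is to maximize the variational lower bound (also called the evidence lower bound), a lower bound on the marginal log-likelihood. In its generic form it consists of a reconstruction term and a KL divergence term: the first encourages the model to maximize the expected density assigned to the data, and the second encourages the approximate posterior to be similar to the prior on the latent variables.

  • How do diffusion models perform conditional generation?

    - The conditioning variable (such as a class label or a sentence description) can be fed to the model as an additional input during training. At inference time, the model uses this information to generate samples specific to the condition.

  • How do diffusion models perform on inpainting?

    - Diffusion models have had success with inpainting. A model fine-tuned specifically for the task fills in missing regions conditioned on the full, clear context, which avoids the edge artifacts of the naive approach.

  • What do diffusion models have in common with variational autoencoders (VAEs)?

    - Both can be viewed as latent variable generative models, and both use a variational lower bound as the training objective. In a diffusion model, however, the forward process is typically fixed, and learning focuses solely on the reverse process.

  • What does the continuous-time formulation of diffusion models give rise to?

    - It gives rise to what is called a probability flow ODE, which allows the log-likelihood to be approximated by numerical integration. This provides a way to estimate density that goes beyond the variational lower bound.

  • How are diffusion models related to score matching models?

    - The two are closely connected. The score in a score matching model is equivalent, up to a scaling factor, to the noise predicted in the denoising diffusion objective. Undoing the noise in a diffusion model can therefore be seen as approximately following the gradient of the data log density.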

Outlines

00:00

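🌟 Basic concepts of diffusion models

This section introduces the basic principle of diffusion models: add Gaussian noise to an image, repeat, and eventually obtain a sample of pure noise. The forward process is defined as a Markov chain with a fixed variance schedule, and small steps make the reverse steps easier to learn. The model's task is then to start from pure noise and remove it gradually until a coherent image remains. The approach has been successful in image generation and has surpassed other generative models, such as GANs, on certain tasks.

05:02

📈 Forward and reverse processes and the training objective

This section recaps the forward and reverse processes. The forward process pushes a sample off the data manifold by adding noise step by step; the reverse process is trained to produce a trajectory back to the data manifold, resulting in a reasonable sample. The model is optimized by maximizing the variational lower bound, which consists of a reconstruction term and a KL divergence term. The section also shows how a diffusion model can be viewed as a latent variable generative model, and how the objective is rearranged into KL divergences between Gaussians to reduce variance.

10:04

🛠️ Training and implementation

This section covers training details: how the reverse step is implemented, how the auxiliary noise variable is used, and the reparameterization under which the network predicts the added noise instead of the Gaussian mean. The DDPM authors set the reverse-step variances to time-specific constants and propose a simplified version of the variational bound that lets training focus on the more challenging, higher-noise steps and yields better sample quality. The section also covers conditional sampling with class labels or text descriptions, classifier guidance and classifier-free guidance, inpainting, and a comparison with other generative models.

15:04

🚀 Progress and outlook

The final section covers the connection to score matching and the outlook. Sampling from diffusion models is slow because of the Markov chain, and ongoing work aims to speed it up. A continuous-time formulation gives rise to a probability flow ODE, which allows the log-likelihood to be approximated by numerical integration, and the score estimated by score matching models is equivalent to the predicted noise up to a scaling factor. Diffusion models are gaining momentum, and their progress on image generation and density estimation benchmarks is exciting.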

Keywords

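💡Diffusion model

A diffusion model is a generative model. It defines a process that gradually adds Gaussian noise to a clean image until it becomes pure noise, then learns to reverse that process and recover a coherent image from noise. In the video, diffusion models are used to generate images, and on certain tasks they surpass generative adversarial networks (GANs).

💡Gaussian noise

Gaussian noise is statistical noise whose probability density function is a normal distribution, the bell curve. In a diffusion model, Gaussian noise is added step by step to corrupt the original image until it becomes unrecognizable static, a sample of pure noise.

💡Generative adversarial networks (GANs)

A GAN is a deep learning model made up of a generator and a discriminator, used to generate new data instances that resemble real data. In the video, diffusion models are starting to rival GANs and in some cases surpass them on certain tasks.

💡Conditional generation

Conditional generation means that a generative model takes extra conditioning information into account, such as a text description or a class label, so that the generated data matches the condition. In the video, diffusion models are used to convert text descriptions into images.

💡Markov chain

A Markov chain is a mathematical model of transitions between states in a random process, where the probability of the next state depends only on the current state. In a diffusion model, the forward diffusion process is modeled as a Markov chain: the distribution at each time step depends only on the sample from the previous time step.

💡Variational autoencoders (VAEs)

A VAE is a generative model with an encoder and a decoder. The encoder maps input data to a distribution over a latent space, and the decoder reconstructs data from the latent space. The training objective of diffusion models is similar to that of VAEs: both are trained by maximizing the evidence lower bound.

💡Variational lower bound

The variational lower bound, also called the evidence lower bound, is an objective function for training generative models. It provides a lower bound on the marginal log-likelihood of the data. Diffusion models are trained by maximizing this bound.

💡KL divergence

The KL divergence, or Kullback-Leibler divergence, measures the difference between two probability distributions. In the generic variational lower bound it encourages the approximate posterior to be similar to the prior on the latent variables. In the rearranged diffusion objective, each KL term compares a learned reverse step with the forward-process posterior conditioned on x0.

💡Sampling

Sampling means drawing samples from the distribution the model has learned. In a diffusion model, sampling starts from the pure noise distribution and follows the learned reverse process step by step to produce a clean data sample.

💡Autoencoder

An autoencoder is a neural network that compresses input data into a low-dimensional representation with an encoder, then reconstructs the output with a decoder. In a diffusion model the forward process is fixed, but the reverse process plays a role similar to the decoder, reconstructing data from the latent variables.

💡Inpainting

Inpainting is the task of filling in missing or damaged parts of an image or other data. Diffusion models do well at inpainting and can be trained to fill holes or missing regions in an image.

💡Probability flow

The probability flow ODE arises from the continuous-time formulation of diffusion models. It allows the log-likelihood to be approximated by numerical integration, which goes beyond the lower bound used for training.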

Highlights

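Diffusion models are an emerging approach in generative modeling and have been especially successful in image generation.

On certain tasks, diffusion models have surpassed other kinds of generative models such as generative adversarial networks (GANs).

The basic idea is to define a process that gradually adds Gaussian noise, then learn to reverse it and recover a clean image from a noise image.

The forward diffusion process is designed as a Markov chain in which the distribution at each step depends only on the sample from the previous step.

The reverse process is trained to produce a trajectory back to the data manifold, resulting in a reasonable sample.

The training objective does not maximize the likelihood directly; it maximizes a lower bound, the variational lower bound or evidence lower bound.

A diffusion model can be viewed as a latent variable generative model: the forward process is analogous to an encoder producing latents from data, and the reverse process to a decoder producing data from latents.

Only a single network needs to be trained, unlike in a VAE, where two networks are trained jointly.

Each reverse step is parameterized as a Gaussian, so the rearranged objective compares pairs of Gaussians in closed form, which helps reduce variance during training.

Diffusion models can sample conditionally, for example generating images from a class label or a text description.

With special training (classifier-free guidance), the diffusion model itself can guide sampling without relying on a second network.

Diffusion models have had success with inpainting, with better results coming from a model fine-tuned specifically for the task.

Compared with other generative models, diffusion models are limited by the slow Markov chain at sampling time, in contrast to the single forward pass of GANs.

The continuous-time formulation of diffusion models gives rise to a probability flow ODE, which approximates the log-likelihood by numerical integration.

Denoising diffusion models are closely connected to score matching models, which generate samples with a Markov chain guided by a gradient.

Diffusion models are gaining momentum, and their progress has been exciting to see.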

Transcripts

play00:00

imagine we take an image and add a bit

play00:02

of gaussian noise to it

play00:04

then do this again

play00:05

if we repeat this enough times

play00:07

eventually we'll have an unrecognizable

play00:09

picture of static a sample of pure noise

play00:13

now what if we could figure out how to

play00:15

undo this process

play00:17

that is start from a noise image

play00:19

gradually remove the noise and end up

play00:21

with a coherent image

play00:24

this is the basic idea behind diffusion

play00:26

models an approach gaining traction in

play00:28

generative modeling they've had success

play00:31

particularly in the domain of image

play00:32

generation and they are starting to

play00:34

rival and in some cases surpass other

play00:37

kinds of generative models you may be

play00:38

familiar with on certain tasks

play00:41

for example recent diffusion models have

play00:43

outperformed generative adversarial

play00:44

networks known as gans in perceptual

play00:47

quality metrics and they've also shown

play00:49

impressive performance in various

play00:50

conditional settings such as converting

play00:52

text descriptions to images

play00:55

inpainting

play00:56

and manipulation

play00:59

in this video we'll try to understand

play01:00

the basic mechanism behind diffusion

play01:02

models and how they can be adapted to

play01:04

different generative settings

play01:07

we'll start with a sample from some

play01:09

target data distribution like an image

play01:11

from a training set

play01:13

let's call this x0

play01:16

now let's define a forward diffusion

play01:18

process that gradually adds noise to the

play01:20

image over big t time steps

play01:23

our model will be tasked with starting

play01:25

at x big t and undoing this noise

play01:27

through what we'll call the reverse

play01:29

process

play01:31

the forward process which we'll denote

play01:33

with q takes the form of a markov chain

play01:36

where the distribution at a particular

play01:37

time step only depends on the sample

play01:40

from the immediately previous step

play01:42

so we can write out the distribution of

play01:44

corrupted samples conditioned on the

play01:46

initial data point x0 as the product of

play01:48

successive single step conditionals

play01:52

in the case of continuous data each

play01:54

transition is parameterized as a

play01:56

diagonal gaussian

play01:58

beta t here is the variance at a

play02:00

particular time step t
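
Written out, the factorization and the single-step Gaussian described here take the following form. The notation follows the DDPM paper named later in the video and is an assumption about what the slides show; $\mathbf{I}$ is the identity matrix.

```latex
q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1}),
\qquad
q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\right)
```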

play02:03

typically these variances are treated as

play02:05

hyperparameters and follow a fixed

play02:07

schedule for a particular training run

play02:09

beta generally increases with time and

play02:11

is restricted to be between zero and one

play02:14

meaning that this coefficient radical

play02:17

one minus beta t

play02:18

will likewise be non-zero but less than

play02:21

one bringing the mean of each new

play02:23

gaussian closer to zero

play02:27

in the limit as t approaches infinity

play02:30

q will approach a gaussian centered at

play02:32

zero with identity covariance losing all

play02:35

information about the original sample

play02:38

in practice the total number of steps

play02:40

big t is on the order of a thousand

play02:44

using a large albeit finite number of

play02:46

steps allows us to set the individual

play02:48

variances beta t to be very small while

play02:52

still approximately maintaining the same

play02:54

limiting distribution
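
As a rough illustration of the limiting behaviour just described, the sketch below runs the forward chain on one-dimensional toy data. The linear schedule and its endpoints (1e-4 to 0.02), the step count, and the helper names are assumptions chosen for illustration, not values taken from the video.

```python
import numpy as np

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Variances beta_1..beta_T, increasing linearly (endpoints are assumed)."""
    return np.linspace(beta_start, beta_end, num_steps)

def run_forward_chain(x0, betas, rng):
    """Apply q(x_t | x_{t-1}) step by step and return the final sample x_T."""
    x = np.asarray(x0, dtype=np.float64)
    for beta in betas:
        noise = rng.standard_normal(x.shape)
        # Shrink the previous sample slightly, then add a little fresh noise.
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise
    return x

rng = np.random.default_rng(0)
betas = linear_beta_schedule(1000)
x0 = np.full(20_000, 3.0)              # toy "data": every sample starts at 3.0
xT = run_forward_chain(x0, betas, rng)

# Factor that multiplies x0 in the mean of x_T; close to zero after 1000 steps.
signal_scale = np.sqrt(np.prod(1.0 - betas))
print(xT.mean(), xT.var(), signal_scale)
```

With these settings the empirical mean and variance of `xT` should land near 0 and 1, even though every chain started at 3.0 and each individual step added very little noise.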

play02:56

but why do we want to use a small step

play02:57

size what's the benefit

play03:00

well it means that learning to undo the

play03:02

steps of the forward process won't be

play03:04

too difficult

play03:05

let's consider a simple case in one

play03:07

dimension

play03:08

suppose we were given the distribution

play03:10

of a forward process sample at time t

play03:12

minus one and it resembled a mixture of

play03:15

gaussians with two modes

play03:17

we then observe x t

play03:19

and want to infer the posterior

play03:21

distribution over x t minus one

play03:24

that is we'd like to determine where did

play03:26

the chain likely come from in order to

play03:29

arrive at x t

play03:31

what was the previous step of the chain

play03:34

if the noise step that is q of x t given

play03:37

x t minus 1 is allowed to be large then

play03:40

we will be quite uncertain about the

play03:42

location of x t minus 1. who knows where

play03:45

we jumped from

play03:47

but if the forward noise step is

play03:48

restricted to be small there is much

play03:50

less ambiguity about x t minus 1.

play03:53

we could then be justified in modeling

play03:55

the posterior of the forward step that

play03:57

is q of x t minus 1 given x t

play04:01

with a unimodal gaussian

play04:03

eliminating the contribution from the

play04:05

mode to the right

play04:06

and in fact it can be shown

play04:08

theoretically that in the limit of

play04:09

infinitesimal step sizes the true

play04:11

reverse process will have the same

play04:13

functional form as the forward process

play04:17

so diffusion models leverage this

play04:18

observation parameterizing each learned

play04:21

reverse step to also be a unimodal

play04:23

diagonal gaussian

play04:25

aside from the sample at time t the

play04:27

model also takes t as input in order to

play04:30

account for the forward process variance

play04:32

schedule different time steps are

play04:33

associated with different noise levels

play04:35

and the model can learn to undo these

play04:37

individually

play04:39

like the forward process the reverse

play04:41

process is set up as a markov chain

play04:43

and we can write out the joint

play04:45

probability of a sequence of samples as

play04:47

a product of conditionals and the

play04:49

marginal probability of x big t

play04:53

so what is p of x big t here exactly

play04:56

well it's the same as q of x big t the

play05:00

pure noise distribution
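
In symbols, the reverse chain described above can be written as follows. The names $\mu_\theta$ and $\Sigma_\theta$ for the outputs of the learned reverse step are assumed notation, borrowed from the DDPM paper.

```latex
p_\theta(x_{0:T}) = p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t),
\qquad
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right),
\qquad
p(x_T) = \mathcal{N}(x_T;\ 0,\ \mathbf{I})
```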

play05:02

so at inference time in order to

play05:03

actually generate a sample we start from

play05:05

a gaussian and begin sampling from the

play05:07

learned individual steps of the reverse

play05:09

process p of x t minus 1 given x t until

play05:14

producing an x0
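
The sampling procedure just described can be sketched as a short loop. Here `reverse_mean` is a hypothetical stand-in for a trained network, `sigmas` are assumed fixed per-step standard deviations, and skipping the noise on the final step is a common convention rather than something stated in the video.

```python
import numpy as np

def sample_reverse_chain(reverse_mean, sigmas, shape, rng):
    """Draw x_T from N(0, I), then sample x_{T-1}, ..., x_0 one step at a time.

    reverse_mean(x_t, t) returns the mean of p(x_{t-1} | x_t); it is a
    hypothetical placeholder for a trained model. sigmas[t-1] is the
    (assumed fixed) standard deviation of the reverse step at time t.
    """
    num_steps = len(sigmas)
    x = rng.standard_normal(shape)                 # x_T: pure noise
    for t in range(num_steps, 0, -1):              # t = T, T-1, ..., 1
        mean = reverse_mean(x, t)
        # No noise is added on the last step (assumed convention).
        noise = rng.standard_normal(shape) if t > 1 else np.zeros(shape)
        x = mean + sigmas[t - 1] * noise
    return x                                        # x_0

# Toy stand-in "model": pull the sample a little toward 2.0 at every step.
toy_mean = lambda x, t: x + 0.1 * (2.0 - x)

rng = np.random.default_rng(0)
x0 = sample_reverse_chain(toy_mean, sigmas=np.full(200, 0.05), shape=(1000,), rng=rng)
print(x0.mean())
```

The toy mean function only makes the loop runnable; a real model would be a neural network that takes both the noisy sample and the time step.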

play05:17

okay great so we've defined these

play05:18

forward and reverse diffusion processes

play05:21

the forward process is designed to

play05:23

essentially push a sample off the data

play05:25

manifold turning it into noise and the

play05:27

reverse process is trained to produce a

play05:29

trajectory back to the data manifold

play05:31

resulting in a reasonable sample

play05:34

but what objective will we actually be

play05:36

optimizing is it some maximum likelihood

play05:39

objective where we directly maximize the

play05:41

density assigned to x0 by the model

play05:44

well not exactly

play05:46

if we try to calculate p of x0 we see

play05:49

that we have to marginalize over all the

play05:51

possible trajectories all the ways we

play05:53

could have arrived at x0 when starting

play05:56

from a noise sample

play05:57

this unfortunately is intractable

play06:00

but it turns out we can maximize a lower

play06:03

bound

play06:04

to do this let's view x1 through x big t

play06:07

as latent variables and x0 as an

play06:09

observed variable

play06:11

allowing us to interpret a diffusion

play06:12

model as a kind of latent variable

play06:14

generative model

play06:16

if we think back to another latent

play06:18

variable model you may be familiar with

play06:20

variational autoencoders commonly known

play06:22

as vaes

play06:24

we might get a hint about our training

play06:25

objective

play06:27

as a quick reminder in a vae we have an

play06:30

encoder that produces a distribution

play06:31

over latents z given a data input x

play06:35

and a decoder that reconstructs the data

play06:37

by producing a distribution over data x

play06:40

given a latent input z

play06:42

so we can think of the forward process

play06:44

in diffusion models as analogous to the

play06:46

encoder producing latents from data

play06:49

and the reverse process as analogous to

play06:51

the decoder producing data from latents

play06:55

now unlike a vae encoder the forward

play06:57

process here is typically fixed

play07:00

it's the reverse process that we focus

play07:02

solely on learning

play07:03

this means that only a single network

play07:05

needs to be trained unlike in a vae

play07:07

where two networks are trained jointly

play07:11

so we can now borrow the basic training

play07:12

objective used by vaes and a number of

play07:15

other latent variable models

play07:17

when we have a model with observations x

play07:19

and latent variable z

play07:21

we can derive what's called the

play07:23

variational lower bound also known as

play07:25

the evidence lower bound

play07:27

a lower bound on the marginal log

play07:29

likelihood p theta of x

play07:32

we won't walk through the full

play07:33

derivation here but the end result is a

play07:35

likelihood term also known as a

play07:37

reconstruction term subtracted by a kl

play07:40

divergence term

play07:41

the likelihood term encourages the model

play07:43

to maximize the expected density

play07:45

assigned to the data

play07:47

while the kl divergence encourages the

play07:49

approximate posterior q z given x to be

play07:52

similar to the prior on the latent

play07:54

variable p of z
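
For reference, the bound described here in words is the standard evidence lower bound for a model with observation $x$ and latent variable $z$:

```latex
\log p_\theta(x) \;\ge\;
\mathbb{E}_{q(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
\;-\; D_{\mathrm{KL}}\!\left(q(z \mid x)\,\|\,p_\theta(z)\right)
```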

play07:56

as we saw earlier x0 will serve as the

play07:59

observation in the diffusion model

play08:01

framework while x1 through big t will

play08:04

take the place of the latent variable z

play08:06

here

play08:07

let's substitute these in

play08:10

alright now let's simplify a bit

play08:14

we can expand the kl divergence to

play08:16

combine the two terms into a single

play08:18

expectation

play08:28

and finally we can refactor the chain

play08:30

probabilities into their individual

play08:32

steps
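
The video shows these steps on screen without reading them out. A reconstruction under the substitution $x \to x_0$, $z \to x_{1:T}$, using the Markov factorizations of $q$ and $p_\theta$, would be:

```latex
\begin{aligned}
\log p_\theta(x_0)
&\ge \mathbb{E}_{q(x_{1:T} \mid x_0)}\!\left[\log p_\theta(x_0 \mid x_{1:T})\right]
   - D_{\mathrm{KL}}\!\left(q(x_{1:T} \mid x_0)\,\|\,p_\theta(x_{1:T})\right) \\
&= \mathbb{E}_{q(x_{1:T} \mid x_0)}\!\left[\log \frac{p_\theta(x_{0:T})}{q(x_{1:T} \mid x_0)}\right] \\
&= \mathbb{E}_{q(x_{1:T} \mid x_0)}\!\left[\log p(x_T)
   + \sum_{t=1}^{T} \log \frac{p_\theta(x_{t-1} \mid x_t)}{q(x_t \mid x_{t-1})}\right]
\end{aligned}
```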

play08:35

now there's a nice property of the

play08:36

forward process q that we didn't

play08:38

touch on earlier any arbitrary step of

play08:41

the forward process can be sampled

play08:43

directly in closed form

play08:45

this is just because the sum of

play08:47

independent gaussian steps is still a

play08:49

gaussian

play08:51

so at training time any term of this

play08:53

objective can be obtained without having

play08:55

to simulate an entire chain
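
A minimal sketch of this closed-form sampling, checked against running the single steps explicitly. The symbol `alpha_bars` (the running product of 1 - beta) follows DDPM-paper notation, and the linear schedule is an assumption for illustration.

```python
import numpy as np

def q_sample(x0, t, alpha_bars, noise):
    """Sample x_t ~ q(x_t | x_0) in one shot (t is 1-indexed).

    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise,
    where alpha_bar_t is the product of (1 - beta_s) for s = 1..t.
    """
    a_bar = alpha_bars[t - 1]
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * noise

betas = np.linspace(1e-4, 0.02, 1000)      # assumed linear schedule
alpha_bars = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
x0 = np.full(50_000, 1.5)
t = 300
x_direct = q_sample(x0, t, alpha_bars, rng.standard_normal(x0.shape))

# Reference: apply the first t single steps one after another.
x_chain = x0.copy()
for beta in betas[:t]:
    x_chain = np.sqrt(1.0 - beta) * x_chain + np.sqrt(beta) * rng.standard_normal(x0.shape)

print(x_direct.mean(), x_chain.mean(), x_direct.var(), x_chain.var())
```

The two sets of samples should agree in mean and variance up to sampling error, which is what lets training jump straight to any time step.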

play08:58

likewise we can optimize this objective

play09:00

by randomly sampling pairs of x t minus

play09:02

one and x t

play09:04

and maximizing the conditional density

play09:06

assigned by the reverse step to x t

play09:09

minus one

play09:10

however because different trajectories

play09:12

may visit different samples at time t

play09:14

minus one on the way to hitting xt

play09:17

the setup can have high variance

play09:19

limiting training efficiency

play09:22

to help with this we can rearrange the

play09:23

objective as follows
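
The rearranged objective appears on screen rather than in the narration. Assuming it matches the decomposition in the DDPM paper, written here as a lower bound to be maximized, it reads:

```latex
\mathbb{E}_{q}\!\Big[
  -\,D_{\mathrm{KL}}\!\left(q(x_T \mid x_0)\,\|\,p(x_T)\right)
  \;-\; \sum_{t=2}^{T} D_{\mathrm{KL}}\!\left(q(x_{t-1} \mid x_t, x_0)\,\|\,p_\theta(x_{t-1} \mid x_t)\right)
  \;+\; \log p_\theta(x_0 \mid x_1)
\Big]
```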

play09:26

let's examine each component

play09:28

p of x big t is fixed it's just the

play09:31

start of the reverse process the pure

play09:33

noise distribution

play09:35

and as we saw earlier the whole forward

play09:36

process q is also treated as fixed

play09:40

so we just have to worry about these two

play09:41

terms to the right

play09:45

here we have a sum of kl divergences

play09:47

each between a reverse step and a

play09:50

forward process posterior conditioned on

play09:52

x0

play09:54

one can prove with bayes rule that when

play09:56

we treat the original sample x0 as known

play09:59

like it is during training these q terms

play10:02

are actually just gaussians

play10:04

since the reverse step is already

play10:05

parameterized as a gaussian each kl

play10:08

divergence now is simply comparing two

play10:10

gaussians and can be evaluated in closed

play10:13

form
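
As a sketch of the two ingredients mentioned here, the code below computes the Gaussian posterior of the forward process given x0 and the closed-form KL divergence between two diagonal Gaussians. The posterior formulas are the ones given in the DDPM paper; the function names and the linear schedule in the usage lines are assumptions.

```python
import numpy as np

def forward_posterior(x0, xt, t, betas, alpha_bars):
    """Mean and variance of q(x_{t-1} | x_t, x_0) for t >= 2 (t is 1-indexed)."""
    beta_t = betas[t - 1]
    a_bar_t = alpha_bars[t - 1]
    a_bar_prev = alpha_bars[t - 2]
    coef_x0 = np.sqrt(a_bar_prev) * beta_t / (1.0 - a_bar_t)
    coef_xt = np.sqrt(1.0 - beta_t) * (1.0 - a_bar_prev) / (1.0 - a_bar_t)
    mean = coef_x0 * x0 + coef_xt * xt
    var = (1.0 - a_bar_prev) / (1.0 - a_bar_t) * beta_t
    return mean, var

def diag_gaussian_kl(mean_q, var_q, mean_p, var_p):
    """KL( N(mean_q, var_q) || N(mean_p, var_p) ), summed over dimensions."""
    mean_q = np.asarray(mean_q, dtype=np.float64)
    mean_p = np.asarray(mean_p, dtype=np.float64)
    var_q = np.broadcast_to(var_q, mean_q.shape)
    var_p = np.broadcast_to(var_p, mean_q.shape)
    per_dim = 0.5 * (np.log(var_p / var_q)
                     + (var_q + (mean_q - mean_p) ** 2) / var_p
                     - 1.0)
    return per_dim.sum()

betas = np.linspace(1e-4, 0.02, 1000)      # assumed linear schedule
alpha_bars = np.cumprod(1.0 - betas)
post_mean, post_var = forward_posterior(np.zeros(3), np.ones(3), 500, betas, alpha_bars)
print(post_mean, post_var)
```

During training, `post_mean` and `post_var` would serve as the target that the reverse step is compared against with `diag_gaussian_kl`.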

play10:14

this helps reduce variance in the

play10:16

training process because instead of

play10:18

aiming to reconstruct monte carlo

play10:20

samples

play10:21

the targets for the reverse step become

play10:22

the true posteriors of the forward

play10:24

process given x0

play10:27

there are a couple different ways we

play10:28

could imagine implementing the reverse

play10:30

step p theta in the paper denoising

play10:33

diffusion probabilistic models

play10:35

ddpm for short

play10:37

the authors elect to set the reverse

play10:39

process variances to time specific

play10:41

constants as they found learning them to

play10:43

lead to unstable training and lower

play10:45

quality samples

play10:47

so the reverse step network is solely

play10:49

tasked with learning the means

play10:52

they then suggest a reparameterization

play10:54

that aims to have the network predict

play10:55

the noise that was added rather than the

play10:57

gaussian mean

play10:59

first we can rewrite sampling from an

play11:01

arbitrary forward step by using an

play11:03

auxiliary noise variable epsilon

play11:05

epsilon here has a constant distribution

play11:08

independent of the forward time step t

play11:10

and the reverse step model can be

play11:12

designed to simply predict this epsilon

play11:15

the authors also found that a simpler

play11:17

version of the variational bound that

play11:19

discards the term weights that appear in

play11:21

the original bound led to better sample

play11:23

quality

play11:24

so compared to the original variational

play11:26

lower bound their objective down-weights

play11:28

steps that have very small noise at

play11:30

early time steps of the forward process

play11:33

allowing training to focus on more

play11:35

challenging greater noise steps
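
A minimal sketch of the simplified noise-prediction objective: pick a random time step per example, noise the clean sample in closed form, and score the model's noise prediction with an unweighted squared error. `eps_model` is a hypothetical callable standing in for the network, and the schedule is assumed.

```python
import numpy as np

def simple_loss(eps_model, x0_batch, alpha_bars, rng):
    """Monte Carlo estimate of the unweighted noise-prediction loss on one batch."""
    batch_size = x0_batch.shape[0]
    num_steps = len(alpha_bars)
    t = rng.integers(1, num_steps + 1, size=batch_size)    # one time step per example
    eps = rng.standard_normal(x0_batch.shape)                # the noise to be predicted
    # Reshape alpha_bar_t so it broadcasts over the non-batch dimensions.
    a_bar = alpha_bars[t - 1].reshape(-1, *([1] * (x0_batch.ndim - 1)))
    x_t = np.sqrt(a_bar) * x0_batch + np.sqrt(1.0 - a_bar) * eps
    eps_pred = eps_model(x_t, t)
    return np.mean((eps - eps_pred) ** 2)

alpha_bars = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))   # assumed schedule
rng = np.random.default_rng(0)
x0_batch = rng.standard_normal((256, 8))

# Placeholder model that always predicts "no noise"; its loss is about E[eps^2] = 1.
zero_model = lambda x_t, t: np.zeros_like(x_t)
loss = simple_loss(zero_model, x0_batch, alpha_bars, rng)
print(loss)
```

In a real training loop this value would be minimized by gradient descent on the parameters of `eps_model`, which requires an automatic differentiation framework not shown here.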

play11:38

like other generative frameworks

play11:40

diffusion models can be made to sample

play11:42

conditionally given some variable of

play11:44

interest like a class label or a

play11:46

sentence description

play11:48

one way to do this is to just feed the

play11:50

conditioning variable y as an additional

play11:52

input during training

play11:54

in theory the model should learn to use

play11:56

y as a helpful hint about what it should

play11:58

be reconstructing in practice some work

play12:01

has shown that further guiding the

play12:02

diffusion process with a separate

play12:04

classifier can help

play12:06

in this setup we take a trained

play12:07

classifier and push the reverse

play12:09

diffusion process in the direction of

play12:11

the gradient of the target label

play12:13

probability with respect to the current

play12:15

noise image
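
One way to sketch this guidance step is to shift the mean of the reverse step by the scaled classifier gradient. The sketch assumes the common formulation that uses the gradient of the log-probability and multiplies it by the step variance. `toy_grad_log_prob` is a hypothetical stand-in for the gradient of a real classifier.

```python
import numpy as np

def classifier_guided_mean(mean, var, grad_log_prob, scale=1.0):
    """Shift the reverse-step mean along the classifier gradient.

    grad_log_prob is the gradient of log p(y | x_t) with respect to x_t.
    """
    return mean + scale * var * grad_log_prob

def toy_grad_log_prob(x, target):
    """Gradient of log N(x; target, I) with respect to x.

    Hypothetical stand-in for a trained classifier's gradient.
    """
    return target - x

x_t = np.zeros(4)
mean = np.zeros(4)                 # pretend output of the reverse-step network
target = np.ones(4)                # toy "class" the sample is pushed toward
guided = classifier_guided_mean(mean, var=0.1,
                                grad_log_prob=toy_grad_log_prob(x_t, target),
                                scale=2.0)
print(guided)
```

A larger `scale` pushes samples harder toward the target label, typically trading diversity for fidelity.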

play12:16

and we can do this not just with single

play12:18

word labels but also with higher

play12:20

dimensional text descriptions as well

play12:23

of course one drawback of this technique

play12:25

is the reliance upon a second network

play12:28

an alternative approach eliminates this

play12:29

reliance instead using special training

play12:32

of the diffusion model itself to guide

play12:34

the sampling

play12:35

in the paper classifier free diffusion

play12:37

guidance the conditioning label y is set

play12:40

to a null label with some probability

play12:42

during training

play12:44

then at inference time the reconstructed

play12:46

samples are artificially pushed further

play12:49

towards the y conditional direction and

play12:51

away from the null label
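
The two halves of this recipe can be sketched as follows: random label dropping during training, and an extrapolated noise prediction at inference. The weighting convention `(1 + w) * eps_cond - w * eps_uncond` is the one used in the classifier-free guidance paper; the function names and the `null_label` argument are hypothetical.

```python
import numpy as np

def maybe_drop_label(y, null_label, p_uncond, rng):
    """Training time: replace the label with the null label with probability p_uncond."""
    return null_label if rng.random() < p_uncond else y

def cfg_eps(eps_cond, eps_uncond, w):
    """Inference time: push the prediction toward the conditional direction.

    w = 0 recovers the plain conditional prediction; larger w moves further
    away from the unconditional (null label) prediction.
    """
    return (1.0 + w) * eps_cond - w * eps_uncond

eps_cond = np.array([1.0, 0.5])      # pretend model output given the label y
eps_uncond = np.array([0.0, 0.5])    # pretend model output given the null label
print(cfg_eps(eps_cond, eps_uncond, w=1.0))
```

Both predictions come from the same network, evaluated once with the label and once with the null label, which is why no separate classifier is needed.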

play12:53

even though no new information is being

play12:55

given to the model they found this to

play12:57

produce higher quality samples under

play12:59

human evaluation compared to classifier

play13:01

guidance

play13:04

inpainting is another conditional

play13:05

generation problem where diffusion

play13:07

models have had success

play13:08

the naive way to perform inpainting

play13:10

with diffusion models is to take a model

play13:12

trained in the standard way and at

play13:14

inference time replace known regions of

play13:17

an image with a sample from the forward

play13:18

process after each reverse step
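
The naive procedure can be sketched like this. `reverse_step` is a hypothetical function that draws x_{t-1} from a model trained in the standard way, `mask` is 1 where pixels are known, and the schedule values are assumed.

```python
import numpy as np

def inpaint_naive(reverse_step, known, mask, alpha_bars, rng):
    """Naive inpainting: overwrite known pixels after every reverse step.

    reverse_step(x_t, t) returns a sample of x_{t-1} (hypothetical model).
    mask is 1.0 where the image is known and 0.0 where it must be filled in.
    """
    num_steps = len(alpha_bars)
    x = rng.standard_normal(known.shape)
    for t in range(num_steps, 0, -1):
        x = reverse_step(x, t)                     # x is now at noise level t-1
        if t > 1:
            a_bar = alpha_bars[t - 2]              # alpha_bar at level t-1
            noise = rng.standard_normal(known.shape)
            known_noised = np.sqrt(a_bar) * known + np.sqrt(1.0 - a_bar) * noise
        else:
            known_noised = known                   # level 0: the clean known pixels
        x = mask * known_noised + (1.0 - mask) * x
    return x

alpha_bars = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 50))   # assumed schedule
rng = np.random.default_rng(0)
known = np.arange(16.0).reshape(4, 4)
mask = np.zeros((4, 4))
mask[:, :2] = 1.0                                  # left half of the image is known
toy_reverse_step = lambda x, t: 0.9 * x            # placeholder for a trained model
result = inpaint_naive(toy_reverse_step, known, mask, alpha_bars, rng)
```

Because the model only ever sees the noised copy of the known region at each step, it never gets the full clean context.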

play13:21

now this works okay but can lead to edge

play13:24

artifacts

play13:25

the model is not being made aware of the

play13:27

full surrounding context only a hazy

play13:30

version of it

play13:32

instead better results come from

play13:33

fine-tuning a model specifically for

play13:35

this task

play13:36

we can randomly remove sections of

play13:38

training images and have the model

play13:40

attempt to fill them in conditioned on

play13:42

the full clear context

play13:45

we can compare diffusion models to some

play13:46

other prominent generative models

play13:49

for sampling tasks diffusion models are

play13:51

somewhat limited by the slow markov

play13:53

chain

play13:54

this contrasts for example with gans

play13:56

which can generate images in a single

play13:58

forward pass

play13:59

ongoing work aims to speed up sampling

play14:02

in diffusion models

play14:04

as we saw earlier diffusion models allow

play14:06

us to calculate a variational lower

play14:08

bound on the log likelihood similar to

play14:10

vaes

play14:11

in practice this lower bound can be

play14:13

quite good and even competitive on

play14:15

density estimation benchmarks which have

play14:17

long been dominated by autoregressive

play14:19

models

play14:21

going beyond lower bounds a continuous

play14:23

time formulation of diffusion models can

play14:25

give rise to what's called a probability

play14:27

flow ode

play14:29

this enables approximating log

play14:30

likelihood via numerical integration
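
The video does not write the ODE out. In the score-based SDE literature, for a forward diffusion $dx = f(x,t)\,dt + g(t)\,dw$, the probability flow ODE and the resulting likelihood formula are usually given as below; $f$, $g$, and the learned drift $\tilde{f}_\theta$ (the ODE drift with the score replaced by a network estimate) are notation assumed from that literature.

```latex
\frac{dx}{dt} = f(x, t) - \tfrac{1}{2}\, g(t)^2\, \nabla_x \log p_t(x),
\qquad
\log p_0\big(x(0)\big) = \log p_T\big(x(T)\big) + \int_0^T \nabla \cdot \tilde{f}_\theta\big(x(t), t\big)\, dt
```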

play14:34

there's a close connection between

play14:35

denoising diffusion models and what are

play14:37

called score matching models and often

play14:40

these are now grouped together into a

play14:41

single class of models score here refers

play14:44

to the gradient of the log of the target

play14:47

probability density with respect to the

play14:49

data

play14:50

a score network is trained to estimate

play14:52

this value

play14:53

then a markov chain is set up to

play14:55

actually produce samples from the learned

play14:57

distribution

play14:58

guided by this gradient

play15:00

well it turns out the score can actually

play15:02

be shown to be equivalent to the noise

play15:04

that's predicted in the denoising

play15:06

diffusion objective up to a scaling

play15:08

factor

play15:09

so we can think of undoing the noise in

play15:11

a diffusion model approximately as

play15:13

trying to follow the gradient of the

play15:15

data log density
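
A short derivation shows where the scaling factor comes from. It uses the closed-form forward marginal and the DDPM-style symbols $\bar\alpha_t = \prod_{s \le t}(1-\beta_s)$, $\epsilon$, $\epsilon_\theta$, and $s_\theta$, none of which are spelled out in the narration.

```latex
\begin{aligned}
q(x_t \mid x_0) &= \mathcal{N}\!\left(x_t;\ \sqrt{\bar\alpha_t}\,x_0,\ (1-\bar\alpha_t)\mathbf{I}\right),
\qquad x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon \\
\nabla_{x_t} \log q(x_t \mid x_0)
&= -\frac{x_t - \sqrt{\bar\alpha_t}\,x_0}{1-\bar\alpha_t}
 = -\frac{\epsilon}{\sqrt{1-\bar\alpha_t}}
\quad\Longrightarrow\quad
s_\theta(x_t, t) \approx -\frac{\epsilon_\theta(x_t, t)}{\sqrt{1-\bar\alpha_t}}
\end{aligned}
```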

play15:18

diffusion models are really gaining

play15:19

momentum and it's been exciting to see

play15:21

their progress

play15:22

check out the links in the description

play15:24

to learn more

play15:25

thanks for watching
