Giulio Biroli - Generative AI and Diffusion Models: a Statistical Physics Analysis

RoMaDS
3 Oct 2023 · 58:14

Summary

TLDR: This talk presents recent research on diffusion models, exploring the progress of generative AI and its interpretation through statistical physics. Diffusion models sit at the state of the art of image and text generation, which makes them a particularly interesting topic. The talk explains in detail how diffusion models work, in particular the backward (generative) process and its physical properties, and discusses how the dimension of the data and the number of data points affect the model's performance. It also touches on the theoretical background of diffusion models and on the mathematical results available about their accuracy, and closes with several important open problems that researchers are currently working on.

Takeaways

  • 🌟 Diffusion models are extremely interesting, and many insights from physics apply to them.
  • 🔍 Diffusion models are a major breakthrough in generative AI, delivering state-of-the-art results in image and text generation.
  • 📈 A diffusion model gradually turns images into white noise and generates new images by learning to turn white noise back into images.
  • 🔄 The diffusion process consists of a forward process and a backward process, and their time scales matter.
  • 🎯 Learning the score function plays the central role in training a diffusion model.
  • 📚 In theory, a diffusion model with a sufficiently accurate approximation of the score function can approximate the data distribution well.
  • 📈 The balance between the data dimension and the number of data points is crucial, and the model's performance differs across the different time scales of the diffusion.
  • 🧠 When the data distribution contains several classes or phases, a diffusion model may struggle to capture the relative weights of the classes correctly.
  • 🔧 Training a diffusion model hinges on key factors such as the relation between data dimension and amount of data, and the choice of approximation class.
  • 🌐 Diffusion models are applied to high-dimensional data, where the interplay between dimension and number of data points is central.
  • 🚀 Research on diffusion models keeps evolving, moving toward harder problems and real applications.

Q & A

  • What is a diffusion model?

    - A diffusion model is a method used to generate images and text; it is behind many of the recent breakthrough results in generative AI.

  • Why are diffusion models attracting so much attention?

    - Because they can generate high-quality images and text, a capability recognized as valuable in many fields.

  • How does a diffusion model work?

    - A diffusion model starts from a noisy image and progressively removes the noise, eventually producing a clean image.

  • What is mathematically appealing about diffusion models?

    - Their mathematical appeal lies in their simplicity: the model is comparatively easy to understand and implement, yet produces effective results.

  • What steps are needed to train a diffusion model?

    - Training requires understanding the time reversal of the noising process and learning the score function. The score function is typically approximated from data using deep learning techniques.

  • How is the quality of the images generated by a diffusion model evaluated?

    - Usually visually; one can also evaluate sharpness, resolution, and how realistic the images look.

  • What are the challenges in using diffusion models?

    - Challenges include the dimensionality of the data, the amount of data, and training the model properly. In particular, processing high-dimensional data is computationally expensive and requires suitable approximation techniques.

  • What are the application areas of diffusion models?

    - They are used in many areas, including image generation, text generation, speech synthesis, and data restoration and augmentation. They are also used in creative fields, for example producing artwork and discovering new material.

  • What impact is expected as research on diffusion models advances?

    - Higher-quality image and text generation is expected, as well as better handling of missing or noisy data, with potential contributions to societal challenges such as medicine and rapid response to natural disasters.

  • How do diffusion models compare with other generative models?

    - As generative models, they stand out for particularly high-quality generation of images and audio. They also require relatively little pre- and post-processing, which keeps the structure simple and the computational cost manageable.

  • What should one pay attention to when implementing a diffusion model?

    - Choosing an appropriate dataset, designing an appropriate network architecture, and tuning the learning rate and other parameters all require particular care. Using appropriate regularization to avoid over- and under-fitting is also important.

Outlines

00:00

🤖 Introduction to diffusion models

In this segment the speaker introduces diffusion models and explains why they are so interesting, stressing in particular how much can be done from a physics point of view. Diffusion models are presented as a recent breakthrough in generative AI that produces state-of-the-art results in image and text generation. The speaker outlines how diffusion models work, focusing on their simplicity and elegance.

05:03

🧠 The mathematics and training of diffusion models

This segment explains the mathematical background and the training procedure in detail. The speaker introduces the idea of time reversal from a physics perspective and explains how a diffusion model reconstructs images from white noise. He also discusses how the probability distribution of the generated data approximates that of the training data, and how deep learning techniques are used to find the parameters that minimize the corresponding loss.

10:04

📈 Theoretical results and the role of dimension

This segment reviews existing theoretical results on diffusion models and the role of the data dimension. Drawing on recent work, the speaker presents bounds showing how close the model can get to the data distribution when the score is well approximated. He points out, however, that the dimension of the data barely appears in these results, and that the relation between the dimension and the amount of data needed remains an important open problem.

15:06

🌐 The competition between dimension and the number of data

This segment analyses the competition between the data dimension and the number of data points. The speaker shows how a diffusion model behaves for high-dimensional Gaussian data and discusses, through the relation between number of data and dimension, how much data is needed and in which regimes the model works correctly. This analysis provides an important viewpoint for understanding and improving diffusion models.

20:08

🧬 Data distributions and symmetry breaking

This segment examines data distributions with broken symmetry. Using the simplest such example, an Ising (Curie-Weiss) model, the speaker shows how symmetry breaking between different data configurations arises. He considers a system with two states whose weights differ slightly and studies how the diffusion model reconstructs these states. The analysis gives a deeper understanding of the generative process of diffusion models and of its limitations.

25:10

🕒 Conclusions on the competition between time and dimension

This final segment presents conclusions about the interplay between time and data dimension. The speaker discusses two different time regimes and explains how they relate to the generative process and to symmetry breaking. He also addresses the accuracy and the limitations of diffusion models through the competition between dimension and number of data points, and closes with future research directions and practical applications.

Keywords

💡diffusion models

Diffusion models are statistical models used to generate images and text. The video describes them as a key recent breakthrough, used since about 2020 in state-of-the-art image-generation applications. A diffusion model turns images into white noise and generates new images by learning to reverse that noising process.

💡generative AI

Generative AI is the branch of artificial intelligence designed to produce new data and content. The video describes it as having brought major breakthroughs in image and text generation, with progress accelerating over the last few years.

💡score function

The score function is the gradient of the logarithm of the noisy data distribution; it carries the information needed to remove the noise and return to the original data. The video explains that the score function defines the drift of the time-reversed diffusion equation, providing the force that pushes the process back toward the data distribution.

💡stochastic dynamics

Stochastic dynamics are mathematical descriptions of systems whose motion contains random components. The video explains that diffusion models use stochastic (Langevin) equations to carry the data to white noise and back.

💡machine learning

Machine learning is the set of techniques by which computer systems extract knowledge from data and improve their performance. The video stresses that machine learning is used to learn the score function from data, which is what makes diffusion models practical.

💡high-dimensional data

High-dimensional data are data described by a very large number of variables, with correspondingly complex structure. The video discusses how diffusion models handle high-dimensional data and how the dimension affects their performance.

💡equilibrium thermodynamics

Equilibrium thermodynamics studies the laws governing systems in thermal and mechanical equilibrium. The video explains how diffusion models are connected to thermodynamic ideas (the original 2015 proposal was framed in terms of non-equilibrium thermodynamics) and how these ideas help in understanding them.

💡time reversal

Time reversal refers to reversing the flow of time in order to recover earlier states of a system. The video explains that diffusion models use a time reversal of the noising process to reconstruct data from noise.

💡approximation

Approximation means solving a problem by simplifying it when the exact solution is out of reach. In the video, approximation is used to estimate the score function, and the quality of that approximation controls the accuracy of the diffusion model.

💡deep nets

Deep nets are artificial neural networks with many hidden layers, capable of processing data at a high level of abstraction. The video explains that deep nets are used to approximate the score function and discusses how their architecture affects the performance of diffusion models.

💡theoretical results

Theoretical results are rigorous statements or predictions derived from mathematical or physical models. The video discusses how such results help in understanding and improving diffusion models, in particular through the role of the data dimension and the accuracy of the score approximation.

Highlights

Diffusion models are a breakthrough in generative AI, particularly in image and text generation.

Diffusion models transform images into white noise and learn to reverse the process, generating new images from noise.

The beauty of diffusion models lies in their simplicity, which is also a key factor in their success.

The role of the score function in diffusion models is crucial as it allows the model to learn how to go backward in time.

Machine learning techniques are used to approximate the score function, enabling the generation of new data.

The study explores the role of dimension and the number of data in diffusion models, particularly in high dimensions.

Theoretical results on diffusion models are scarce, indicating a vast area for potential research and discovery.

The concept of effective temperature is introduced, relating to the noise added to the system during the diffusion process.

The importance of the early stages of the backward process in capturing the correct probability distribution is discussed.

The potential role of symmetry breaking in diffusion models and its impact on the generation process is highlighted.

The study finds that the number of data and dimension have a competitive relationship in the performance of diffusion models.

The role of approximation classes and network architecture in the performance of diffusion models is questioned.

The potential applications of diffusion models in areas such as image completion and copyright issues are mentioned.

The importance of stopping the diffusion process at the right time to avoid collapsing back to the original data is discussed.

The study concludes that diffusion models can be considered good depending on what is being observed or asked of the model.

Future research directions include understanding the role of the exact score and the competition between the number of data and dimension.

Transcripts

[00:10] All right. Today I would like to tell you about very recent work done in collaboration with Marc Mézard on diffusion models. I find diffusion models very, very interesting: I think there is a lot to do, especially from physics. What I want to do today is to give you an introduction, and I will also go into the details of the math, because it is very simple (that is also the beauty of it), and then tell you what we found.

[00:49] Generative AI is, I think, one of the real breakthroughs of the last years, with impressive results in image and text generation. Diffusion models are the method used in the state of the art for image generation: before, people used GANs, but starting from 2020 diffusion models are really the ones that are used. Here is an example of images generated by diffusion models, and we will see what these diffusion models are. I am sure many of you have played with DALL-E or its equivalent from Google; these are applications in which diffusion models are used for text-to-image generation, and here are three pictures I created with one of them just yesterday, to show you what they can do.

[01:45] So what I want to do next is really to explain what diffusion models are, and they are something very simple. Let me introduce a little notation. You start with a set of images, call them a^μ: each one is a vector in dimension N, and μ runs from one to P, where P is the number of data points. What the model does is the following. You take a Langevin equation, initialized at time t = 0 to the value corresponding to the image, and you run one of the simplest possible Langevin equations (I know there are mathematicians here and that "dX" is not rigorously defined, but informally there is no problem, as we do in physics). When you run this equation from that initial condition, at long times it converges to independent, identically distributed Gaussians with mean zero and variance one for each component of the vector.

[03:00] Visually, you start from a nice image, you run this equation, and you transform the image into white noise. You also know exactly what the probability P_t(x) is at time t: if you know the probability distribution of the data, you simply integrate the equation, and what you find is that P_t is the data distribution convolved with a Gaussian. So you know P_t(x) at any given time, and what this Gaussian does is noising: it puts noise on the image.
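
To make the forward noising step concrete, here is a minimal numerical sketch (my own illustration, not code shown in the talk). It assumes the simplest Ornstein-Uhlenbeck convention for the Langevin equation, dx = -x dt + sqrt(2) dW; the toy "image" and all variable names are invented for the example.

```python
import numpy as np

# Sketch of the forward noising process of a diffusion model (illustration only):
# the Langevin equation dx = -x dt + sqrt(2) dW is run starting from a data vector a.
# At long times every component becomes an i.i.d. Gaussian of mean 0 and variance 1.

rng = np.random.default_rng(0)
N = 1000                        # dimension (e.g. number of pixels)
a = rng.uniform(-1, 1, N)       # stand-in for one data point / image
dt, T = 1e-3, 5.0

x = a.copy()
for _ in range(int(T / dt)):    # Euler-Maruyama integration of dx = -x dt + sqrt(2) dW
    x += -x * dt + np.sqrt(2 * dt) * rng.standard_normal(N)
print("after T = 5:  mean ~", round(x.mean(), 3), " variance ~", round(x.var(), 3))

# The marginal at time t is known in closed form: x_t = a e^{-t} + sqrt(1 - e^{-2t}) z,
# i.e. P_t is the data distribution convolved with a Gaussian of variance 1 - e^{-2t}.
t = 5.0
x_closed = a * np.exp(-t) + np.sqrt(1 - np.exp(-2 * t)) * rng.standard_normal(N)
print("closed form:  mean ~", round(x_closed.mean(), 3), " variance ~", round(x_closed.var(), 3))
```

Both prints give essentially a mean of 0 and a variance of 1: the starting "image" has been turned into white noise, exactly as described above.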

[03:34] Now, what the diffusion model learns is how to go backward in time. You have transformed images into white noise; if you learn how to go back, that is, how to transform white noise into an image, then generating a new image is easy: you just draw a white noise, you let it evolve backwards, and you get a new image. It looks miraculous, but this is really what these models are doing. Let me show you a little more precisely how they do it.

[04:08] The idea is the following: how do you go backward in time? In physics we learn about time reversal; this time reversal is a bit different from the one we usually do in physics, but it is a generic property of stochastic equations. If you know P_t(x), and here we do know it, then the gradient of its logarithm with respect to x is what is called the score function. If you use this score function to define a new Langevin equation, the backward-in-time Langevin equation, then integrating it really takes you backward: it transforms white noise into the original distribution P_0(a), and this is an exact result. The problem is that in general you do not know the score function; but if you knew it, it would allow you to go back in time and transform white noise into data.

[05:02] The idea, then, is to use machine learning techniques to try to learn this score function; if you learn it well enough, you can generate new data. For someone like me, who has worked for a long time on stochastic dynamics and Langevin equations, this simply means that you start from white noise and you learn what forces act on this kind of particle, forces that push the white noise around until it becomes, say, the face of a man or the picture of a dog.
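
The backward step can be sketched in the same spirit (again my own illustration, under the same OU convention and one common form of the reverse-time SDE). Here the "data" are a single 1D Gaussian whose exact score is known, so one can check that white noise is indeed carried back to the data distribution; all parameters are made up.

```python
import numpy as np

# Sketch of generation by time reversal (illustration only).
# Forward convention: dx = -x dt + sqrt(2) dW, so for data ~ N(mu0, s0^2) one has
# P_t = N(mu0 e^{-t}, s0^2 e^{-2t} + 1 - e^{-2t}) and the score is known exactly.

rng = np.random.default_rng(1)
mu0, s0 = 2.0, 0.5
T, dt, n = 5.0, 1e-3, 10000

def score(x, t):
    var_t = s0**2 * np.exp(-2 * t) + 1.0 - np.exp(-2 * t)
    return -(x - mu0 * np.exp(-t)) / var_t

x = rng.standard_normal(n)       # start from white noise at time T
t = T
while t > dt:
    # reverse-time SDE (one common convention): dx = (x + 2*score) dtau + sqrt(2) dW
    x += (x + 2.0 * score(x, t)) * dt + np.sqrt(2 * dt) * rng.standard_normal(n)
    t -= dt

print("generated mean, std:", x.mean(), x.std())   # should be close to mu0 = 2.0, s0 = 0.5
```

With a learned score in place of the exact one, this backward integration is exactly the "forces that turn white noise into a face" picture described above.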

[05:38] Now, this idea of going backward in time and using it as a generative model was introduced in 2015, in a paper by physicists, at a time when the boundary with machine learning was still populated by physicists; the title of that paper referred to non-equilibrium thermodynamics, and we will come back to some ideas about out-of-equilibrium thermodynamics at the end. It was proposed in 2015 and remained somewhat dormant until around 2020, when it really took off, I think because people understood how to learn the score: the 2015 paper proposed the idea, and starting from 2020 people understood how to learn the score from data, and then it started to be used everywhere. For a nice review you can look at the one by Yang and coworkers, from 2022-2023. So let me now tell you how you learn the score.

[06:47] The idea is the following. You have to learn this function, which gives the forces you have to use, and it is a function in high dimension. If you have data, it is in principle a regression problem: you parameterize the score, typically with a deep network, and this parameterized function should be as close as possible to the true one. For the moment, let us define the population loss: forget the data for now and assume you have the exact P_t(x), the probability distribution of the noised data. You construct a parameterization s_θ of the score in terms of parameters θ, and you want to minimize the squared difference; typically some kind of deep net with a particular architecture is used.

[07:36] You can work a little on this population loss. Develop the square: you get one term which is s_θ squared, a cross term, and the square of the true score, which contains no parameters, so you do not care about it. In the cross term, the gradient of log P_t appears divided by P_t and multiplied by P_t, so the two factors of P_t cancel, and you are left with the integral over x of s_θ(x) times the gradient of P_t. The gradient of P_t is easy: P_t is a convolution with a Gaussian, so differentiating with respect to x acts on the Gaussian and brings down a factor proportional to x minus a times the exponential decay factor. At the end you obtain a loss which is an expectation over the whole stochastic process, over x and over the data a, of the squared difference between s_θ(x) and this denoising term built from x minus a times the exponential.

[08:37] This is now very easy to turn into a problem of empirical minimization: once you have the data, for each image you run the Langevin dynamics, so you have one trajectory per image (you could use several, but let us keep it simple). Then, for each data point and each trajectory, you minimize this empirical loss: you find the parameters θ that minimize it, and these give you an approximation of the score. You do this for a fixed set of times, discretizing the Langevin dynamics in small steps; once you have the score, you can run your backward dynamics and generate. In principle it is very simple; of course there is the discretization, and there is also the question of how well all of this really works.

what I want to do now uh before telling

play09:42

you our result is to tell you what is

play09:45

known from the theoretical point of and

play09:48

actually what is interesting is very

play09:50

very little this time so actually many

play09:52

physicist that work on on this m

play09:54

learning problem start introduction say

play09:56

that it's a very po problem but there is

play09:58

very

play09:59

known especially in deep Nets well there

play10:02

is little more but I mean there is a

play10:04

long tradition math and computer science

play10:06

on this and actually on this case there

play10:09

is actually very little know meaning

play10:11

that even ma computer there are not many

play10:13

work and so the state of the art let's

play10:17

say theoretical math results have been

play10:21

obtained young researcher the and also

play10:25

another one is called who Ox for and

play10:28

this these are the kinds of

play10:30

result they have found so very formally

play10:33

what they tell you is that but if you

play10:34

find an approximation such that s of X

play10:37

is close enough to the score to the true

play10:40

score if the distribution of data is

play10:42

regular enough then you can find posi

play10:46

constant that the probability

play10:49

distribution the probity distribution

play10:51

that you get model is close enough to

play10:54

the distribution of the data and close

play10:57

enough so this is to variation distance

play11:00

for the ones we know otherwise just a

play11:02

distance and you see that this is less

play11:04

than different Factor so big T is the

play11:07

time at which you run the L Dynamics

play11:10

because well you cannot run it at

play11:12

infinite time so you know that you start

play11:14

from the data you run the Dynamics at

play11:16

certain point you stop and then you go

play11:18

backward in time so this tell you that

play11:20

you have to go long enough time but this

play11:24

is this is more and then actually you

play11:27

see that when you run enough in time you

play11:29

may have a problem because so you can

play11:32

actually keep this if you take an

play11:34

approximation which is good enough and

play11:36

if you take a time in the is so could

play11:41

good that this is a very

play11:44

bad but actually if you do additional

play11:47

additional hypothesis on the data

play11:49

distribution then this becomes a power

play11:51

so it's it's not as bad as okay so this

play11:55

is the kind of result that they have and

play11:58

well if you look at this result there is

play11:59

clearly one or at least or maybe two

play12:02

elephant in the room so the first thing

play12:04

is we all know that one big problem what

play12:07

makes the problem difficult is that the

play12:08

data are very high dimension which is

play12:11

what mat discussed and then there is the

play12:13

course of dimensionality and then you

play12:15

should discuss how many parameters you

play12:17

need and here well the dimension is not

play12:20

is not is not here actually hi you all

play12:22

this Con so while there are really many

play12:25

important questions are open

play12:27

and can uh can help to make progress and

play12:33

so well let me tell you just just there

play12:36

is no selection well the first more I

play12:39

think the most important one is what is

play12:41

the RO of the dimensional the data in

play12:43

all this business and then if you try to

play12:46

understand this clearly this is also

play12:47

related to with how many data uh sorry

play12:52

it's more the RO of the dimension of the

play12:54

data and then how much data you need

play12:56

actually to have a good diffusion model

play12:58

I will also try to tell you what does it

play13:00

mean I can say what is a good diffusion

play13:03

model it's not a question it's a good

play13:06

diffusion diffusion model will produce

play13:08

an approximation of the probability

play13:10

distribution this are probability

play13:11

distribution high dimension and while

play13:14

what is a good approximation of the

play13:16

which in my Dimension is not it's not a

play13:17

tri question and then I think there is

play13:20

another question that I would not

play13:21

address at all is what is the role of

play13:23

the approximation class so why they use

play13:26

what they call a unit they use some kind

play13:28

of

play13:29

deep net to

play13:32

uh to approximate the the score and of

play13:35

course one should should ask I mean what

play13:37

is the role of the number of parameter

play13:39

that are used what is the for why the

play13:41

foror is important which are exactly the

play13:44

kind of questions that M addressed but I

play13:48

mean we want should address also in this

play13:50

context all right so what I want to tell

[13:52] What I want to tell you in the following is what we did, which is a first study of the role of the dimension, and also of the number of data, for diffusion models. Since there is essentially nothing on the role of the dimension, we can start with the simplest model. What is the simplest model? Gaussian data. Gaussian data are of course simple, but here we take them high dimensional, and this will be the first example we study. So N is the dimension of the data, P the number of data points, with the same notation as before, and we consider the limit in which both the dimension and the number of data are very large. Each data vector is a Gaussian vector with mean zero and a certain covariance C_0; C_0 is an N-by-N matrix, and as N goes to infinity we assume that the density of eigenvalues of C_0 converges to some well-defined function.

[15:02] Now let us apply the idea of the model. Again, you take a certain number of data points and you run the forward equation. In this case P_t(x), the distribution of x at time t, is the convolution of the initial Gaussian with a Gaussian, so P_t(x) is itself Gaussian. Its logarithm is a quadratic function of x; taking the derivative gives the score, so the exact score is linear in x: it has the form of a matrix W(t) acting on x, and you can compute W(t) exactly. [Audience question: how do you define the dimension of the data, for real data?] For an image, I just take the number of pixels, possibly times the number of colour channels; the most naive definition possible. When t is equal to zero, W is just the inverse of the covariance of the data, and when t goes to infinity the data contribution drops out and you are left with the identity: at long times the system is very simple, and the score is very simple.

play17:01

data so what you should do and please

play17:05

well ask question if there is something

play17:06

which is not because I think I have less

play17:09

than to keep in minutes so

play17:14

yeah when you learn in practice when

play17:17

they learn T do they learn different

play17:20

networks with completely different for

play17:22

every time no it's the same

play17:27

same

play17:29

so the uh yes the is like a variable in

play17:32

the network yeah yeah actually they

play17:35

don't even do it they don't even do this

play17:38

time this time this time they just throw

play17:41

at random the different

play17:46

times yes

play17:49

exactly yes well sometimes I mean again

play17:52

maybe this again you have time so here

play17:55

you see here I Define it at a given time

play17:57

what they do they Define

play17:59

is over Theory maybe with some waying so

play18:03

they they they wait more the initial

play18:05

time than the time and then they just

play18:08

minimize in one

play18:11

row okay so in this case C data let's

play18:14

work out what what they do in practice

play18:17

so this we know in this case lucky we

play18:19

know what is the expression of the exact

play18:21

score this you don't know images in this

play18:25

case we know so we can EXA we can say

play18:27

okay the score I know that it's linear

play18:29

so what I should do I should use the

play18:31

linear score and I should try to get

play18:33

from the data what is the Matrix W that

play18:35

I have to reuse so how would you do it

play18:38

well you take this you put you plug it

play18:40

in the empirical loss and then you use

play18:43

exactly the form of the I told you well

play18:46

you see so this is lar in X you just

play18:49

develop you differentiate with to W and

play18:52

at the end in this case you get an exact

play18:55

expression for the W the empirical W

play18:58

that you get from the data and the

play19:00

empirical W what you can express it in

play19:02

term of matrices which are nothing else

play19:05

that matrices that tells you coari

play19:08

empirical coari of the process so for

play19:10

example this Matrix D well this you st

play19:13

over so for each data you have a

play19:15

trajectory and so what you're doing here

play19:17

you are summing over the T trajectory x

play19:20

i m XJ M so this is nothing as that the

play19:23

empirical

play19:25

ciance

play19:26

at of the

play19:29

process and then M similar is a memory

play19:33

memory Matrix so if you get D you get M

play19:35

while you put here and this is your

play19:38

empirical estimation of w now the

play19:41

question is well this is what you get

play19:44

empirically this is actually a lucky

play19:47

case because you know even what is the

play19:49

exact score so there is no problem of

play19:51

approximation plus now the question is

play19:54

if you are in high dimension how much

play19:56

data I need actually to get that this W

play19:59

is

play20:00

equal to this W which is the exact

play20:03

one now now you can see that you can

play20:07

start to so the question that you can

play20:09

ask what is the RO of the dimension how

play20:11

many data one needs to get a good

play20:12

diffusion model and you can start to see

play20:15

how the dimension can play a role and

play20:17

the reason is that imagine that you're

play20:19

in fixed Dimension if you're in fixed

play20:21

Dimension and this Matrix C if you want

play20:24

to estimate it in a good way when you

play20:27

just take very large and then since this

play20:30

for different piece of different

play20:32

trajectory are just independent R of

play20:34

variable so you just use the central

play20:35

imerial this will converge it will

play20:38

converge to the correct Co variance and

play20:41

if you converge to the correct variance

play20:43

then say for and then you get get back

play20:45

the true Matrix now the problem is is

play20:48

the dimension is large this is a very

play20:52

large Dimension is a very large Matrix

play20:55

and you know that well for very large

play20:56

Matrix one has to be careful so if you

play20:58

look at this this is nothing else than a

play21:00

wish Matrix for people that like

play21:03

matrices and in case of wish matrices is

play21:05

known that if the dimension of the

play21:08

Matrix is of the same order of P of

play21:11

number of data

play21:14

which then I mean this empirical is

play21:17

quite different from from the true M so

play21:19

you can you can start to feel the

play21:21

dimension will play a role and there

play21:22

will be a competition between Dimension

play21:24

and the number of so let's see so at the

play21:27

end in this

play21:28

it's nothing else that the Rand the

play21:30

problem to understand the competition

play21:32

between Dimension and number of

play21:35

parameters

play21:38

yes are

play21:40

not and yes

play21:50

yeah yeah yeah are com from the

play21:56

same not for what going to start in

play21:59

general the process in yet but for what

play22:04

I'm going to T so the different reges

play22:05

now it's not important but you're right

play22:07

I want to stud to study the backward the

play22:11

backward diffusion in principle yes

play22:14

isation all right so now what is this

play22:16

mat Theory problem and I want just to

play22:18

give you the key key idea that allow you

play22:21

to understand what's going on so let's

play22:23

let's focus again on this Matrix and so

play22:25

the let me tell you exactly what I told

play22:28

for if I consider just the element

play22:31

I then when P

play22:34

isar well I know that at a certain point

play22:37

this will convert to the good value CT

play22:40

which is the

play22:41

truear and then if p is not INF there

play22:44

will be a flation what is the order of

play22:46

this flation just

play22:51

one now what I can do I can rescale

play22:55

actually this one root of p i rescale it

play22:58

this way we say I we write it by square

play23:01

root of in the denominator the square

play23:03

root of n in the numerator and so it

play23:06

means that this I WR

play23:11

as which

play23:14

is I hope it's clear I just put this one

play23:17

square root of n on top and so there is

play23:18

a square root of n on the bottom now the

play23:21

interesting thing is that if I write

play23:23

this way now the Matrix DT is C The

play23:26

Matrix that I'm interested in

play23:28

that I would like to to have as an

play23:30

estimation and then I have an error and

play23:32

this error is root ofid by P typ Matrix

play23:35

R now the Matrix R has the correct

play23:38

scaling in large dimension in such a way

play23:40

that the density of

play23:42

Val so if you for example if you think

play23:45

about

play23:46

the matrices to getc

play23:51

of one you must have value as the

play23:54

elements

play23:56

areal so if I write it this way now well

play24:01

this is a problem that has been studied

play24:04

a lot in recent years with for simple

play24:06

matrices so you take a deterministic mat

play24:09

C of T and this R is the mat for

play24:13

the which is not the case

play24:16

Cas then this is called by matx model

play24:20

and there been a lot of works this from

play24:22

physics and then from math that tell you

play24:25

that well if p is much larger than n

play24:28

the density of values of B is exactly

play24:32

the same of theity of Val of when goity

play24:36

but the vectors are completely different

play24:39

if p is much larger than n s so if you

play24:42

have much much more data than Dimension

play24:45

so Dimension Square then really you have

play24:48

a convergence between e and C meaning

play24:51

that vales of c are the same value of D

play24:54

are to sub terms and Vector are also

play24:57

oriented exactly the good way okay so

play25:00

this is what happens for the de Matrix

play25:02

Plus goe in our case if you see what C I

play25:05

mean it's we are in a different case

play25:07

because this is a more wish Matrix and

play25:09

this is a bit more complicated but

play25:11

that's the idea so what we have that the

play25:13

matx that we want to estimate is

play25:16

corrupted by random Matrix times

play25:18

something which is root of n/ p and we

play25:21

can use the same kind of arent that we

play25:23

used for the model so cutting on story

play25:28

short so what you get in this case is

play25:30

that you find now in the generative

play25:33

diffusion prion high diens you can find

play25:36

three reges so what the diffusion model

play25:38

do it does is that it's going to

play25:41

generate gion data this is clear because

play25:44

it's start from G data then the Lan

play25:46

equation gives you g trajectory then you

play25:49

go back again with the G trory and so it

play25:52

will be G and so will be mean zero and

play25:57

the

play25:58

coari and the question is how these

play26:00

coari resemble the true coari of that

play26:03

you have AAL Z and so what you find is

play26:05

that if the number of data is much less

play26:09

than Dimension then I the model is

play26:11

clearly wrong this Co matx has nothing

play26:14

to do with two Matrix just diffusion

play26:17

model just

play26:18

prodense now there is the second regime

play26:21

which is the number of dat is much

play26:23

larger than Dimension but is less than a

play26:26

dimension Square

play26:28

then you get the same of values of this

play26:30

Matrix that's zero but different vectors

play26:34

so what this means is that if you look

play26:36

at the generative process if you look at

play26:39

quantities like this x x/ n so this

play26:43

quantity which is nothing else that the

play26:45

trace is related to the trace of C which

play26:47

is related to the integral of the

play26:50

spectral of the lens values by Lambda so

play26:53

this would be correctly reproduced so

play26:55

the C the global strength of flation

play26:59

will be correctly reproduced but if you

play27:01

want to know what is the direction which

play27:03

have the flation so Vector then it could

play27:05

be wrong and also the interesting thing

play27:09

which is more for mathematician that was

play27:11

suggested to us

play27:13

from anous refere is that the so you can

play27:17

look at the distance between High

play27:19

probity

play27:20

distribution and these distance have

play27:22

been discussed a lot in recent years

play27:24

especially in connection with optimal

play27:27

trans

play27:28

and what it is distance in this case you

play27:32

find in this distance between this High

play27:36

dimensional and high

play27:39

dimension process vanish where from Z

play27:43

here now let's look at the third regime

play27:46

the third regime which number of par is

play27:48

very large is larger than the dimension

play27:50

Square in this case really the

play27:52

generative process is good Vector values

play27:55

are good so everything the direction is

play27:58

correct and in this case what you get is

play28:00

that not only

play28:01

the zero but also the total variation

play28:04

distance which is a very restri it's

play28:06

very requiring distancing Andor

play28:09

distribution in high dimension Al thises

play28:12

so I think what is interesting here is

play28:14

that first it's a very simple model but

play28:15

you start to see first how the dimension

play28:19

play a role and how this is in

play28:21

competition with the number of data and

play28:25

you can also see that actually when you

play28:27

say Genera models is good well it

play28:30

depends what looking for so if you're

play28:33

looking to this kind of variable then

play28:34

it's good but if you're looking really

play28:37

to the direction of equation so this reg

play28:41

is yes yeah and the time doesn't play

play28:45

any I stop before forgetting complet the

play28:49

initial condition that yes so in this

play28:51

case we didn't look at the time but this

play28:55

is just what I'm going to do in the next

play28:57

SL but you're right here in this we just

play29:00

went time or very Lou

play29:03

we but that's an important isue

play29:08

but

play29:10

yes even if vectors wrong still if you

play29:14

are looking at any direction looking at

play29:18

the vant direction you already

play29:21

have to take a direction yes Direction

play29:25

random yes yeah yes so do you have a

play29:29

good VAR in all

play29:34

Direction I mean yeah if you take a

play29:36

random

play29:38

Direction it's more less AAL to all all

play29:41

vect so It's Tricky I mean you pick the

play29:43

direction if you pick the direction and

play29:46

you take it up random then you will be

play29:47

good I agree but if you want to know

play29:50

what are the important

play29:53

direction

play29:55

that even you

play30:04

have I good I

play30:11

can I don't think so I mean it's one VOR

play30:13

is really

play30:17

completely de compos of all the other

play30:20

ones it's just because well yeah I mean

play30:23

they're very sensible to peration so

play30:26

they

[30:28] All right. That was the simple case I mentioned, but we wanted to go a bit beyond it and consider a distribution of data with more structure. What kind of probability distributions in high dimension can one take? The natural ones are those you study in physics, especially the ones that have a phase transition: you can have many models in mind, Ising models, liquids, any standard physics distribution, and we want to be in the low-temperature phase, in which the symmetry is broken. What this means is that the configurations we give to the diffusion model are of different kinds: for a magnetic system, some will be positively magnetized and some negatively magnetized; we transform everything into white noise, and the system should then be able to generate some configurations that are positive and some that are negative. In our heads there was also the idea that the different phases could correspond to different classes: if I have dogs and cats and I transform them into white noise, the model has to be able to generate both classes. Just to illustrate, here is a very simple one-dimensional example: you have two peaks, you apply the forward diffusion, and they merge into a single one; the system has to be able to go back and break the symmetry. This is what the diffusion model must do, and what we ask is how it does it, now in very high dimension.
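
The one-dimensional cartoon can be simulated directly. The sketch below (my own illustration; the centres, widths and the 1/3 vs 2/3 weights are chosen to mirror the Curie-Weiss discussion that follows) uses the exact score of a two-Gaussian mixture and the same reverse-time convention as in the earlier sketches, and checks that the backward process recreates both classes with the right relative weights.

```python
import numpy as np

# Two "classes" in one dimension: a mixture of Gaussians at +/- mu with weights 1/3, 2/3.
# With the exact score and a long enough forward time T, the backward process must
# both recreate the two peaks and reproduce their statistical weights.

rng = np.random.default_rng(5)
w = np.array([1 / 3, 2 / 3])          # class weights
mu = np.array([-2.0, 2.0])            # class centres
s0 = 0.2                              # width of each class
T, dt, n = 6.0, 1e-3, 5000

def score(x, t):
    var_t = s0**2 * np.exp(-2 * t) + 1.0 - np.exp(-2 * t)
    means = mu * np.exp(-t)
    logp = -0.5 * (x[:, None] - means) ** 2 / var_t + np.log(w)   # per-component log weight
    logp -= logp.max(axis=1, keepdims=True)
    r = np.exp(logp)
    r /= r.sum(axis=1, keepdims=True)                             # responsibilities
    return -(x - r @ means) / var_t                               # exact mixture score

x = rng.standard_normal(n)            # white noise at time T
t = T
while t > dt:
    x += (x + 2.0 * score(x, t)) * dt + np.sqrt(2 * dt) * rng.standard_normal(n)
    t -= dt

print("fraction generated in the right-hand class:", np.mean(x > 0))   # close to 2/3
```

The point made in the rest of the talk is that this weight information is fixed very early in the backward dynamics, so an imprecise score (or too short a forward time) spoils the weights even when the individual classes look fine.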

[32:29] Following the physics tradition, the first thing to do is to look at the simplest model with a phase transition, which is the Curie-Weiss model: Ising spins, which are plus or minus one, fully connected, with everyone interacting with everyone else. If the temperature is low enough there are two states, one with positive magnetization and one with negative magnetization. The thing we want to play with is a magnetic field, which we take very small, scaled with the dimension N, because we want the weights of the two states, plus and minus, not to be equal as in the usual case, but to be, say, one third and two thirds. The reason is the following: imagine that when you train a diffusion model there are more images of one class than of the other; is the diffusion model actually able to capture this kind of information, the statistical weights of the different classes? Our suspicion was that maybe you always get the images right, you always see cats and dogs that look fine, but the weights with which you get cats and dogs may be wrong, or at least much harder to get right, and we wanted to see whether this is the case; that is why we take this field h scaled with N, where N is, as always, the dimension. The other thing we want to know, since this case is exactly solvable, is how the diffusion model actually generates the symmetry breaking.

[34:29] The model is simple enough to be solved completely. Here we do not play the same game as before: for the moment we do not work with a fixed number of data, we just look at the exact score, that is, a diffusion model that works perfectly, and in this case the score can be computed exactly. I will not give you its expression, but let me tell you what happens at the beginning of the backward process: you start with white noise, you start running the diffusion backward, and the question is what kind of Langevin equation you get and how it manages to reconstruct the configurations. Consider the sum over i of the x_i divided by N: dividing by N gives a magnetization-like variable of order one, and for this variable you can write down the evolution, which is simple enough.

[35:41] You can compute the effective potential for this variable exactly, and there are clearly two different regimes, depending on whether √N times the exponential factor e^(-t) is small or large; the system passes from one regime to the other during the backward process. In the first regime, the potential felt by the magnetization is an inverted parabola, and because of the field h it is not centred at zero but slightly shifted, to the right in this case since h is positive. When you start the backward process the magnetization sits somewhere near the top and starts to diffuse; trajectories that fall on one side keep moving in that direction, those on the other side in the other direction, and since the parabola is slightly shifted to the right, more trajectories commit to one side than to the other. This is how the system starts to give different weights to the two states. When √N e^(-t) is much larger than one, the potential is different: the first thing you see is that h is no longer there, so there is no more information about the weights in the backward diffusion; instead there are two wells, with some trajectories going down one well and some down the other, and the barrier between them has become very large, so a trajectory never goes back. The symmetry has been broken, and the parameter controlling this symmetry breaking is precisely √N e^(-t). We can also show, with this kind of equation and this potential, that with the exact score the system reconstructs the weights exactly. But the important point is that to reconstruct the weights correctly the model has to work very well at the very beginning of the backward process: that is when everything is decided, both the symmetry breaking and the weights of the different classes in the generative process.

play38:11

can one be a more General than this and

play38:13

the answer is yes so actually you can do

play38:17

this you can get this result in any kind

play38:20

of statis physics model so you don't

play38:22

need

play38:25

to the particle models it's and the ni

play38:29

thing is and the way to do it is the

play38:31

follow so here is the way which I write

play38:33

the backward process is just inage of

play38:37

equation here is the score now the very

play38:40

interesting thing about the score and

play38:42

this is also why I think all this is

play38:44

related to how thermodynamics and also

play38:47

all this idea which came from Jo is

play38:52

that if you look at what it is a there a

play38:55

people part which do ex and then what

play38:59

here is this is derivative of the free

play39:02

energy with respect to external mag

play39:04

field so this is the free energy

play39:09

of you know the distribution at the

play39:12

beginning then you the energy and you

play39:15

the energy as function of the external

play39:16

mag if you know this energy then you

play39:19

know the score and what you have to do

play39:20

you take the derivative of free energy

play39:22

and you evaluate with an external

play39:24

magnetic field which is just equal to X

play39:29

expon the of course that don't

play39:35

know then you know what is the equation

play39:39

the that all to

play39:41

go now why this is helpful is because

play39:44

now if we study a case in which we have

play39:47

symmetry breaking and we have weights

play39:50

which are slightly different between one

play39:52

phase and the other and we are at the

play39:55

very beginning of the back process so

play39:57

the external magnetic field is very

play39:58

small you see

play40:00

expon is very large so we can actually

play40:04

write the exponential minus v i can th

play40:07

over over different states that you have

play40:11

and then while the the external magnetic

play40:13

field is very small it's like we

play40:15

breaking the symmetry so you have th

play40:17

over I of c and this is the

play40:21

magnetization

play40:24

alation of the

play40:27

of and this is something is completely

play40:31

always do it

play40:33

it which is the case at the beginning of

play40:35

the process Now using this expression

play40:38

you in here and you have a general

play40:40

expression of the scope at the very

play40:41

beginning of the back process and again

play40:46

you find something which is a

play40:47

generalization that I told you before

play40:49

this is going to be a some of Al of the

play40:51

weights want to reconstruct exponential

play40:54

of new Alx new Al X is the projection of

play40:57

the vector X over the direction of the

play41:00

magnetization that you have in state

play41:02

Alpha and then you have this factor

play41:04

which is

play41:06

exp so just cutting the long story short

play41:09

is similar to what we had before so you

play41:11

will see that the weight matter and you

play41:13

really construct the way that when root

play41:15

of n exponential minus is more one and

play41:18

when this Factor becomes very large what

play41:21

then symmetry is broken and the system

play41:24

has committed then you have lost any

play41:26

information about the Ws so in this

play41:28

example if you think about this this a

play41:30

very L Dimension example which just one

play41:33

dimension but what it means is that at

play41:35

the very beginning the system will have

play41:37

this pH transition the PA so the system

play41:39

will commit some will go this way

play41:41

another will go this way and once this

play41:44

region which

play41:46

correspond is then the system is Comm

play41:50

you never go back you go on this way on

play41:52

this way so this clearly tells us that

play41:57

as was asking is it's very important

play42:01

when you stop the L equation in the

play42:03

forward process if you stop it too uh

play42:07

not not far enough you are in trouble if

play42:09

you stop it here well probably it's

play42:11

going to be very B for the generation

play42:13

process and what is the time in which

play42:15

this depends on the dimension of the

play42:18

data so well the question is what about

play42:21

images that all this work for images so

play42:24

for for images we don't know what is

play42:26

probability distribution of of a but one

play42:29

can do

play42:31

experiment and so actually there's been

play42:33

done some experiments by the

play42:37

BNS so what what he did

play42:40

is I took 10 and if they consider two

play42:44

classes of 10 in principle each class is

play42:47

represented with one

play42:50

with classes that you can construct an

play42:53

artificial data set in which in one

play42:56

class is is there I don't know a little

play42:58

bit more than the other class and then

play43:00

you try to see whether the diffus model

play43:02

is able to first to get back correctly

play43:05

the images and second to get correctly

play43:07

the way

play43:08

they and the answer is well he he really

play43:11

saw exactly what without doing any

play43:15

analys in term of Dimension but he

play43:17

clearly saw that the system is able

play43:19

actually to reconstruct Imaging which

play43:20

are not bad but to get the weight

play43:22

correctly you have to go to long time

play43:24

and to be extremely precise in

play43:26

estimation of

play43:28

score very long time then there is

[43:31] Then there is another piece of numerical evidence, coming from a very recent paper, from June 2023. What they do is exactly what I told you: they run the forward process for some time and then go back, and they measure a quantity (you cannot really see it here, but it does not matter) which is low if the system is working well and high if it is not; it is one of the standard metrics used to quantify how close the distribution of the generator is to the distribution of the training data. On the x-axis, which you may not be able to read, is the time at which they stop the forward process, from very long times on one side to very short times on the other. What they observe is that there is a certain time below which the system becomes very bad (you stopped too early), while beyond it things are fine; and here you can see examples of the quality of the generated images for the different stopping times.

[44:51] This is indirect evidence, not direct, but it is evidence that there is a specific time scale at which things really change. They also did something else, because they too were motivated by the idea of symmetry breaking: they computed the potential of the backward process along the direction given by a linear interpolation between an image of one class and an image of the other. The potential of the backward process is in principle a function of very many variables, but you can always look at it along one direction, and what they claim is that the time scale they find is precisely the time scale over which this potential goes from two wells to a single well; remember that what I showed you before was the inverted version of this potential.

play45:46

[Exchange with the audience about the figure, partly inaudible.] Yes, only for very long times; if you go to very long times it's not... Yes, this one... yeah, this one, right... it is going down, yes. Do we know why it goes down there? I don't know whether that is related to some kind of overfitting; honestly, I don't know. I know that there are problems related to that. I think that part is probably what you see when you stop at a very early time, because the time axis is inverted — here is long time, here is short time — and the dip there is probably just noise, I'm sorry.

sorry they they claim that actually this

play47:14

will be the the potential I show you

play47:16

before but it's inted and also their

play47:18

case just one Dimensions it's much more

play47:21

complicated things but they claim to

play47:23

observe this kind of symmetry breaking

play47:24

that I told you in reality I'm not sure

play47:27

that it I it's what you can do if you

play47:30

numerically but I'm not sure that it's

play47:32

exactly at the same time that we have

play47:34

here but it's it's some evidence that

play47:36

they

play47:39
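(Again purely as an illustration, and not the paper's actual computation: for a mixture of two Gaussians one can evaluate the "potential" −log p_t(x) along the line joining the two cluster means and watch the two wells merge into a single well as the diffusion time grows — a toy version of the symmetry-breaking time scale discussed here.)

```python
import numpy as np

mu, sigma2 = 1.0, 0.05
xs = np.linspace(-2.5, 2.5, 101)              # coordinate along the interpolation direction

def potential(x, t):
    """-log p_t(x) for a symmetric two-Gaussian mixture noised up to time t."""
    means = np.array([mu, -mu]) * np.exp(-t)
    var = sigma2 * np.exp(-2 * t) + 1.0 - np.exp(-2 * t)
    p = 0.5 * (np.exp(-(x - means[0]) ** 2 / (2 * var)) +
               np.exp(-(x - means[1]) ** 2 / (2 * var))) / np.sqrt(2 * np.pi * var)
    return -np.log(p)

for t in [0.05, 0.3, 1.0, 3.0]:
    U = potential(xs, t)
    barrier = U[len(xs) // 2] - U.min()        # height of the midpoint above the deepest well
    shape = "two wells" if barrier > 1e-3 else "single well"
    print(f"t={t:4.2f}  barrier at the midpoint: {barrier:7.4f}  -> {shape}")
```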

have all right so now I can conclude so

play47:42

what I wanted to do today is to present

play47:45

the first study of high dimensional

play47:48

diffusion model and the two message May

play47:52

is one is that clearly we see that there

play47:54

is a two different time regimes is the

play47:57

time regime related to I think imry

play47:59

breaking and probably IM the formation

play48:02

of different classes and here it's

play48:04

important if you don't to get the

play48:05

weights right so getting the weight of

play48:07

the different classes would be for

play48:08

example very important for all

play48:10

discussion thess in machine learning and

play48:13

then actually once the system is

play48:14

committed has committed and then then

play48:18

just each one of class or each phase if

play48:21

you have SC physics model so we were

play48:23

able to study the simp case which is

play48:25

dimensional G the competition between

play48:28

number of data

play48:30

and and dimension and they also discuss

play48:34

what does it mean that the diffusion

play48:35

model is good well we have SE that well

play48:38

it depends what you ask are some some

play48:40

some regim in which diffusion model can

play48:42

be considered to be good if you look at

play48:43

some observables but then if you want to

play48:46

be extremely then you have have a very

play48:48

large number of of data now there are I

play48:52

Now, I will just give three perspectives, which are three things we are working on, but I think there are many, many more. The first is that in all my analysis I always knew the exact score; in the Curie–Weiss case I then used empirical risk minimization to get an approximation of this exact score. But for images you don't know the exact score, so there is clearly a discussion about what happens if you use an approximation of the exact score which may not be good enough, and about why the approximations that have actually been used — this kind of architecture that I showed you, a deep network that goes down and then up again — work well. So there is a discussion to have, and things to think about: what are the good approximation classes, what are the good architectures, and why are they good given the data that we have — which is related to what was discussed before.
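(As a minimal sketch of what "learning an approximation of the score" means in practice: a small multilayer perceptron trained with the standard denoising score-matching objective on toy one-dimensional data. This is an illustrative stand-in for the U-Net-like architectures used on images, not the setup of any specific paper.)

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
data = torch.cat([ 1.0 + 0.2 * torch.randn(7000, 1),      # toy two-cluster dataset,
                  -1.0 + 0.2 * torch.randn(3000, 1)])      # class weights 0.7 / 0.3

net = nn.Sequential(nn.Linear(2, 64), nn.SiLU(),
                    nn.Linear(64, 64), nn.SiLU(),
                    nn.Linear(64, 1))                      # s_theta(x, t)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    x0 = data[torch.randint(len(data), (256,))]
    t = torch.rand(256, 1) * 4.0 + 1e-3                    # random diffusion times
    sigma = torch.sqrt(1 - torch.exp(-2 * t))
    eps = torch.randn_like(x0)
    xt = x0 * torch.exp(-t) + sigma * eps                  # forward (OU) noising
    target = -eps / sigma                                  # conditional score of p(x_t | x_0)
    pred = net(torch.cat([xt, t], dim=1))
    loss = ((sigma ** 2) * (pred - target) ** 2).mean()    # sigma^2 weighting = noise-prediction loss
    opt.zero_grad(); loss.backward(); opt.step()

print("trained; score estimate at x=0, t=0.5:", net(torch.tensor([[0.0, 0.5]])).item())
```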

Then, in this context, it would be nice to discuss the competition between the number of data and the dimension in cases which are more difficult than just the Gaussian. The easiest would be to repeat the analysis for simple distributions: in the Curie–Weiss case I just used the exact score, but of course I can play the same game — I can use empirical risk minimization to fit the parameters and ask how much data I need, at large dimension, in order for the diffusion model to work well. Then one can go a little bit beyond, to models which are better suited for statistics.
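(A rough numerical illustration of this data-versus-dimension competition, under simplifying assumptions: the data are a d-dimensional Gaussian, so the exact score is known, and the "learned" score is replaced by the score of the empirical measure built from n samples. The scalings discussed in the talk are not reproduced here; the sketch only shows that the empirical score needs more data as the dimension grows.)

```python
import numpy as np

rng = np.random.default_rng(1)
t, sigma0_sq = 0.5, 4.0
delta = 1.0 - np.exp(-2 * t)                  # variance injected by the forward noise
gamma = sigma0_sq * np.exp(-2 * t) + delta    # total variance of p_t for N(0, sigma0_sq * I) data

def empirical_score(x, data):
    """Score of (1/n) sum_a N(x ; x_a e^{-t}, delta * I), evaluated at the rows of x."""
    centers = data * np.exp(-t)
    d2 = (x ** 2).sum(1, keepdims=True) - 2 * x @ centers.T + (centers ** 2).sum(1)
    logw = -d2 / (2 * delta)
    logw -= logw.max(axis=1, keepdims=True)
    w = np.exp(logw)
    w /= w.sum(axis=1, keepdims=True)
    return (w @ centers - x) / delta

for d in [2, 20]:
    x_test = rng.normal(scale=np.sqrt(gamma), size=(300, d))   # test points drawn from p_t
    exact = -x_test / gamma                                    # exact score of the Gaussian
    for n in [10, 100, 1000, 10000]:
        data = rng.normal(scale=np.sqrt(sigma0_sq), size=(n, d))
        err = np.mean((empirical_score(x_test, data) - exact) ** 2)
        print(f"d={d:3d}  n={n:6d}  mean squared score error: {err:9.5f}")
```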

And then one can actually go towards realistic applications, to real problems. There are realistic applications — for example inpainting, which is the case in which you know only a part of the image and you want your diffusion model to create an image which fits well with just that part. This can be discussed, and there is a very nice relationship with point-to-set correlations in statistical physics.
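(A minimal sketch of inpainting with a diffusion model, on a toy two-dimensional Gaussian where the exact score is known: one coordinate is "observed" and the other is generated, using the simple replacement heuristic of re-noising the known part and overwriting it at every backward step. This is a common approximation, not an exact conditional sampler and not the specific method of any paper mentioned here.)

```python
import numpy as np

rng = np.random.default_rng(2)
C = np.array([[1.0, 0.9], [0.9, 1.0]])        # data covariance: strongly correlated coordinates
x1_obs, T, n_steps, n = 1.5, 4.0, 800, 5000
dt = T / n_steps

def score(x, t):
    """Exact score of the noised Gaussian p_t = N(0, C e^{-2t} + (1 - e^{-2t}) I)."""
    Ct = C * np.exp(-2 * t) + (1 - np.exp(-2 * t)) * np.eye(2)
    return -x @ np.linalg.inv(Ct).T

x = rng.standard_normal((n, 2))               # start the backward process from noise
for i in range(n_steps):
    t = T - i * dt
    x += (x + 2 * score(x, t)) * dt + np.sqrt(2 * dt) * rng.standard_normal((n, 2))
    # replacement step: overwrite the known coordinate with its re-noised observation
    t_next = max(t - dt, 1e-4)
    x[:, 0] = x1_obs * np.exp(-t_next) + np.sqrt(1 - np.exp(-2 * t_next)) * rng.standard_normal(n)

print("generated x2 | x1=1.5 : mean", x[:, 1].mean(), " var", x[:, 1].var())
print("exact conditional     : mean", 0.9 * x1_obs, " var", 1 - 0.9 ** 2)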

And then there is the copyright problem that was discussed. Here, clearly, for applications — if you talk to people from DeepMind — what they really don't want is that your diffusion model recreates an image that has been used for training, because then there is a huge problem with copyright. And it's true that, if you think about it, the only thing you know is the set of data, so it could be that the original distribution is just delta functions exactly localized on the data; if your model were perfect, it would reconstruct the data it was trained on. But it doesn't do that — well, actually, in some cases it does — so understanding when it's going to do it and when it's not is, I think, a very interesting problem in which people are really very, very interested.
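(A toy illustration of the memorization point made above, under stated assumptions: use the exact score of the empirical measure — a sum of delta functions on the n training points, smoothed by the forward noise — and integrate the deterministic, probability-flow version of the backward dynamics all the way down to t ≈ 0; the generated samples then collapse onto the training data themselves.)

```python
import numpy as np

rng = np.random.default_rng(3)
train = rng.normal(size=(8, 2))               # the "dataset": 8 training points in 2D

def empirical_score(x, t):
    """Exact score of the empirical measure convolved with the forward kernel."""
    delta = 1.0 - np.exp(-2 * t)
    centers = train * np.exp(-t)
    logw = -((x[:, None, :] - centers[None]) ** 2).sum(-1) / (2 * delta)
    logw -= logw.max(axis=1, keepdims=True)
    w = np.exp(logw)
    w /= w.sum(axis=1, keepdims=True)
    return (w @ centers - x) / delta

T, n_steps, n = 5.0, 2000, 1000
dt = T / n_steps
x = rng.standard_normal((n, 2))               # start the backward flow from pure noise
for i in range(n_steps):
    t = T - i * dt
    x += (x + empirical_score(x, t)) * dt     # probability-flow ODE, integrated backward in time

dist = np.linalg.norm(x[:, None, :] - train[None], axis=-1).min(axis=1)
fresh = rng.standard_normal((n, 2))
dist_fresh = np.linalg.norm(fresh[:, None, :] - train[None], axis=-1).min(axis=1)
print("mean distance to the nearest training point (generated)       :", round(dist.mean(), 3))
print("mean distance to the nearest training point (fresh Gaussians) :", round(dist_fresh.mean(), 3))
```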

[Question] Can you interpret the p(x) that you get as an effective model with a temperature? It's like adding temperature to the model; so when you say that the time is not enough, is it because you are still below the critical temperature, and you would have to go up into the paramagnetic phase to recover the whole distribution correctly?

No, I don't think it can really be seen as a temperature, because in a sense, up to times which scale like log N, the system always remains in the broken phase. We are putting noise on top of it, but you are not putting the noise in the way you would with temperature — I think it's a different kind of noise. There might actually be an interpretation in terms of coarse-graining more than temperature: if you add noise which depends on the scale, then adding the noise would be more like coarse-graining. So in this case I would think of it more as a coarse-graining than as a change of temperature.

[Question, partly inaudible] In any case, the critical time that you need — when you have, for example, a simple mixture of two Gaussians, can we relate the optimal time to the distribution? [The rest of the exchange is largely inaudible; a further question concerns the inpainting application.]

Because what you do here is that you learn the diffusion model on images, and then what you want is the following: I give you just a small piece of an image, and you should generate a new image which matches well with this small piece — but you don't want to reconstruct the training image which contains this piece. So again, you don't want to reproduce the data; you want a new generative model, if you like, that produces images which match well with this given part.

[Question] You told us about the time scale related to the dimension; you may also have a time scale to do with when the diffusion around each data point has blurred it enough. — Yes, I think so. The time at which everything is blurred — I think this time scale is more related to the copyright problem: it's the time at which you realize that your distribution just makes an average over the data points, and it's a small time, so the two are really different.

[Question] Thanks. Besides the mathematics, can you give us some intuition on why the first stages of the backward process are so crucial for guessing the right probability distribution? Because when you measure the score, I get the feeling that if you measure for long enough, the end of the forward process — which is the beginning of the backward process — is just asymptotically Gaussian. How does it contain so much information that, when you go backwards, you would miss it?

I think the reason is that if you think in terms of the data — think of a Gaussian process, say — there are some directions in which you have very strong correlations, and if you add noise these are the ones that disappear last. I think it's also what you see in the images: you first start to see the overall form of the image, and then all the details arrive. So this big-picture structure, which carries a very strong component of the signal, is the one that disappears last in the forward process, and therefore the one that the backward process has to reconstruct first.
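(A quick numerical check of this intuition, assuming the Ornstein–Uhlenbeck forward process and Gaussian data: a direction of the data with variance λ keeps a signal-to-noise ratio λ e^{−2t} / (1 − e^{−2t}), so the strongly correlated "big picture" directions are the last to be washed out forward in time, hence the first ones the backward process must commit to.)

```python
import numpy as np

# toy spectrum: one strongly correlated direction, one average, one weak detail direction
eigenvalues = [25.0, 1.0, 0.04]
for t in [0.5, 1.0, 2.0, 3.0]:
    snr = [lam * np.exp(-2 * t) / (1 - np.exp(-2 * t)) for lam in eigenvalues]
    print(f"t={t:3.1f}  signal-to-noise per direction: " + "  ".join(f"{s:8.3f}" for s in snr))
```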

[Question] Is this also related to the fact that you mentioned, I think at the beginning, that when you learn the score with the network you put more weight on the last times — or on the first?

No, actually — what they do, they don't think about it in these terms. What they do is that they play with the weight at the beginning and not at the end; and the reason they play with the weight at the beginning is that they don't want to have this copy problem, they don't want to collapse back to the original data. At the other end they don't do anything special, but in principle one might be interested in getting the end of the process very accurately.

Yes — but then, for example, they don't ask this question; I think there has been no paper that asked it. Typically you see that the system is good because, well, the images are good, and they don't think, for example, about getting the weights right. If you had in mind that you want to get the weights right, then you would want to put a lot of precision at the beginning.
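(To make concrete where a "weight on different times" enters, here is a sketch of the time-weighted denoising score-matching loss; the two weighting functions are purely illustrative, hypothetical choices, not those of any specific paper: one suppresses very small times, which discourages fitting — and copying — individual data points, while the other emphasizes large times, which is what you would do if you cared about getting the class weights right at the start of the backward process.)

```python
import numpy as np

def dsm_loss(score_model, data, t_grid, lam, n_mc=256, rng=np.random.default_rng(0)):
    """Time-weighted denoising score-matching loss for a 1D score model s(x, t)."""
    total = 0.0
    for t in t_grid:
        sigma = np.sqrt(1.0 - np.exp(-2 * t))
        x0 = rng.choice(data, size=n_mc)
        eps = rng.standard_normal(n_mc)
        xt = x0 * np.exp(-t) + sigma * eps                 # forward noising up to time t
        resid = score_model(xt, t) + eps / sigma           # residual w.r.t. the conditional score
        total += lam(t) * np.mean(resid ** 2)
    return total / len(t_grid)

# two illustrative weighting functions (hypothetical choices, not from a specific paper)
lam_skip_small_t = lambda t: 1.0 - np.exp(-2 * t)          # ~0 at small t: discourage memorization
lam_emphasize_large_t = lambda t: np.exp(t)                # focus on the start of the backward process

data = np.concatenate([ 1.0 + 0.2 * np.random.default_rng(1).standard_normal(700),
                       -1.0 + 0.2 * np.random.default_rng(2).standard_normal(300)])
crude_model = lambda x, t: -x                              # deliberately crude score (single Gaussian guess)
t_grid = np.linspace(0.05, 4.0, 40)
print("loss with small-t suppression :", dsm_loss(crude_model, data, t_grid, lam_skip_small_t))
print("loss emphasizing large t      :", dsm_loss(crude_model, data, t_grid, lam_emphasize_large_t))
```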


Related Tags
diffusion models, machine learning, theory explanation, image generation, text conversion, data analysis, physics, statistical mechanics, dimension competition, approximation methods