Advanced Theory | Neural Style Transfer #4

Aleksa Gordić - The AI Epiphany
17 Apr 2020 | 22:05

Summary

TLDR: This video presents the advanced theory behind neural style transfer, starting with the famous 2012 ImageNet classification challenge and the introduction of AlexNet. It then covers ZFNet and VGG, which built on and refined AlexNet's architecture. An important step was the paper 'Understanding Deep Image Representations by Inverting Them', which introduced the first approach for reconstructing images from deep codes. From that grew the Deep Dream algorithm and, eventually, neural style transfer. The video discusses the follow-up algorithms that improve speed, quality, and flexibility, and shows how research in this direction evolved, including applications to videos, 3D models, and audio. It also highlights the challenges the community still has to solve, and closes with a fun fact about an AI-generated artwork that was auctioned for almost half a million dollars.

Takeaways

  • 🎨 The development of neural style transfer algorithms began in 2012 with the ImageNet classification challenge and the introduction of convolutional neural networks (CNNs) with the AlexNet architecture.
  • 📈 AlexNet marked a turning point in image classification: it significantly outperformed existing methods and demonstrated the effectiveness of CNNs.
  • 🔍 Research after AlexNet focused on improving CNN architectures; ZFNet and VGG further explored the design space that AlexNet had opened up.
  • 👀 An important step toward understanding CNNs was the paper 'Visualizing and Understanding Convolutional Networks', which visualized the image structures that trigger particular feature maps.
  • 🖼 The 2014 paper 'Understanding Deep Image Representations by Inverting Them' was a seminal contribution on reconstructing input images from deep feature maps and paved the way for the Deep Dream algorithm.
  • 🎭 The neural style transfer algorithm combines image reconstruction from deep codes with texture synthesis in order to merge the content of one image with the style of another.
  • 🚀 Follow-up algorithms aim to increase speed, improve quality, and raise flexibility in the number of styles that can be transferred.
  • 🌟 Some of the most notable improvements rely on instance normalization and conditional instance normalization to boost quality and flexibility.
  • 🎭 Control over the style transfer process (spatial control, color control, and scale control) lets artists and developers influence the output of the network or algorithm.
  • 🌐 Neural style transfer has been extended to other media, including 3D models, photorealistic images, audio, and stereoscopic content, showing how versatile and adaptable the technique is.

Q & A

  • What is the basic concept behind neural style transfer?

    -The basic concept of neural style transfer is to preserve the visual structure of a content image while transferring the color and texture characteristics of a style image onto it.

  • What was the seminal 2012 work that pushed research toward neural networks?

    -The seminal 2012 work was the introduction of convolutional neural networks with an architecture called AlexNet, which beat all competitors in the ImageNet classification challenge.

  • What is the name of the method that makes the network see more of whatever it already detects in certain feature maps?

    -That method is called Deep Dream. It exploits the pareidolia effect by maximizing feature-map responses via gradient ascent, producing psychedelic-looking images.

  • What is the significance of Gram matrices in style transfer?

    -Gram matrices capture the correlations between different feature maps, which is essential for transferring the style of one image onto another (see the short code sketch after this Q&A list).

  • How does Johnson's method differ from the original method of Gatys et al.?

    -Johnson's method optimizes the weights of an image transformation network rather than the pixels in image space. It uses the same loss function as Gatys et al., but it is much faster and supports only a single style.

  • What is the significance of instance normalization in neural style transfer?

    -Instance normalization improves both the quality and the flexibility of style transfer by computing the normalization statistics (mean and variance) for each feature map of each individual example, instead of pooling them over the whole mini-batch.

  • How can several styles be combined in neural style transfer?

    -Several styles can be supported by giving each style its own set of normalization parameters (beta and gamma), which are applied to the content image's feature maps; this makes it possible to produce stylized images in different styles and even to interpolate between them.

  • Which technique is used to achieve spatial control in neural style transfer?

    -Spatial control is achieved with segmentation masks that assign particular regions of the content image to style features from the style image; morphological operators such as erosion are used in feature space to blend the transitions between regions smoothly.

  • How can the colors be controlled in neural style transfer?

    -Color can be controlled by converting the images into a color space in which color and intensity (luminance) are separated, performing the style transfer on the luminance channel, and then combining the result with the color channels of the content image.

  • What is the main difference between classic and photorealistic neural style transfer?

    -In photorealistic neural style transfer only the color information is transferred onto the content image, whereas classic neural style transfer also transfers texture and structural characteristics of the style image.
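
To make the Gram-matrix idea concrete, here is a minimal PyTorch-style sketch (my own illustration, not code from the video) of how such a matrix is typically computed from one layer's feature map:

```python
import torch

def gram_matrix(feature_map: torch.Tensor) -> torch.Tensor:
    # feature_map: (batch, channels, height, width) activations from one VGG layer
    b, c, h, w = feature_map.shape
    features = feature_map.view(b, c, h * w)               # flatten the spatial dimensions
    gram = torch.bmm(features, features.transpose(1, 2))   # (b, c, c) channel-to-channel correlations
    return gram / (c * h * w)                              # normalize (one common convention)
```
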

Outlines

00:00

🖥️ The evolution of neural style transfer

This section introduces the foundations of neural style transfer and the history that began in 2012. It covers AlexNet's landmark result in the ImageNet classification challenge, where convolutional neural networks (CNNs) brought an enormous improvement in image classification. It then walks through the line of research that built on AlexNet, including efforts to understand CNNs and the visualization of feature maps in the paper 'Visualizing and Understanding Convolutional Networks'. It also mentions 'Understanding Deep Image Representations by Inverting Them', which showed how to reconstruct input images from deep feature maps and was important for the development of Deep Dream and neural style transfer.

05:03

🎨 Advances in texture synthesis

This section covers the paper 'Texture Synthesis Using Convolutional Neural Networks', which used CNNs to generate texture images. It emphasizes how this work built on earlier ideas of accumulating summary statistics over filter responses. The use of Gram matrices and the role of the VGG networks, which explored network depth and kernel size, are also mentioned. Finally, it describes how these separate lines of research were combined into the final neural style transfer algorithm, which blends the content of one image with the style of another.

10:04

🔍 Improvements and control in style transfer

This section focuses on improvements to neural style transfer, especially in speed and flexibility. Two independent papers, by Johnson and by Ulyanov, proposed faster style transfer by optimizing the weights of an image transformation network instead of the pixels in image space. The introduction of instance normalization is discussed as the key to better quality and flexibility. The section also covers control over the style transfer process, such as spatial control, color control, and control over the scale of brush strokes.

15:07

🌟 New directions in style transfer

This section covers approaches that enable unlimited flexibility by implementing style transfer without per-style learnable parameters. It discusses how normalization statistics can be used to achieve style transfer without training a separate model for each style. The challenges of evaluating and comparing different methods are addressed, as is the need to disentangle style representations in order to gain more control over perceptual factors.

20:09

🚀 The future of neural style transfer

The final section emphasizes the breadth of applications of neural style transfer, from 3D models to audio and stereoscopic content. It points to open research challenges such as the lack of standardized evaluation and of disentangled style representations. An interesting aside is the interplay between art and technology, highlighted by the sale of an artwork that was not created with a neural style transfer method (it was generated by a GAN) yet still fetched a high auction price. The section ends by inviting viewers to enjoy the video and look out for more content.

Keywords

💡Neural Style Transfer

Neural style transfer is an AI technique that transfers the visual style characteristics of one image onto another image. The video treats it as its central topic, showing how the technique was developed and improved in research. One example from the script is the mention of the 'seminal work by Gatys and his colleagues', which opened up a new research direction.

💡AlexNet

AlexNet is a deep neural network architecture that achieved its breakthrough in the 2012 ImageNet classification challenge. In the video, AlexNet is described as the catalyst for the interest in convolutional neural networks (CNNs), which laid the groundwork for much later work, including neural style transfer.

💡Feature Maps

Feature maps are intermediate outputs of CNNs that capture information about the structure and properties of images. The video explains how researchers visualized the image structures that trigger particular feature maps in order to better understand how CNNs work.

💡VGG

VGG is a CNN model that became known for exploring network depth and convolution kernel size beyond AlexNet and ZFNet. In the video, VGG is mentioned as a continuation of the research into improving image encoding and processing with CNNs.

💡Deep Dream

Deep Dream is an algorithm that produces psychedelic images by maximizing the activations of feature maps in CNNs. The video describes Deep Dream as one of the first applications of the techniques that later contributed to neural style transfer.

💡Texture Synthesis

Texture synthesis is a procedure for transferring texture characteristics from one image to another. The video presents it as the second main ingredient that contributed to the neural style transfer algorithm.

💡Instance Normalization

Instance normalization is a technique for normalizing the distribution of feature maps in CNNs on a per-sample basis. The video describes it as a key concept that improves the quality and flexibility of neural style transfer algorithms.

💡Spatial Control

Spatial control refers to the ability to choose which region of an image a particular style is applied to. The video presents it as one of the algorithmic improvements that give more control over the result of the style transfer.

💡Color Control

Color control is the ability to steer the color appearance of a style transfer result. The video describes it as a method that lets you keep the colors of either the content image or the style image in the final output.

💡Scale Control

Scale control refers to controlling the size or granularity of the structures used in style transfer. The video explains it as a way to influence how the style characteristics (for example, coarse versus fine brush strokes) are applied to the content image.

💡Distribution Alignment

Distribution alignment is the process of aligning the distributions of neural activations between images in order to achieve style transfer. The video presents this as a deeper insight showing that style transfer is, at its core, a matter of matching distributions.

Highlights

Introduction to the advanced theory of neural style transfer and its development since 2012.

In the 2012 ImageNet classification challenge, AlexNet achieved a breakthrough using a convolutional neural network architecture.

ZFNet and VGGNet built on AlexNet, further optimizing by exploring network depth and convolution kernel size.

The 2013 paper 'Visualizing and Understanding Convolutional Networks' used visualizations to help understand the feature maps inside CNNs.

Work from 2014 on reconstructing images from deep features by inversion laid the foundation for the Deep Dream algorithm and neural style transfer.

The Deep Dream algorithm maximizes feature responses at chosen layers via gradient ascent, creating psychedelic images.

The second main ingredient of neural style transfer is texture synthesis, which exploits the rich feature representations of the VGG network to create textures.

Neural style transfer combines image reconstruction from deep features (the content part) with texture synthesis (the style part).

Johnson's method speeds up neural style transfer by optimizing an image transformation network rather than the pixels.

Ulyanov's work improved the flexibility and quality of neural style transfer through instance normalization.

Conditional instance normalization allows style transfer with multiple styles, providing greater flexibility.

Spatial control allows different styles to be applied to specific regions of an image.

Color control allows choosing the color channels from either the content image or the style image.

Scale control allows combining the coarse details and fine details of different style images.

Open challenges in neural style transfer include quantitative evaluation, representation disentangling, and the lack of a standard benchmark image set.

Neural style transfer has been extended to 3D models, audio, and stereoscopic images and videos.

The evolution of neural style transfer shows an interesting dynamic between art and technology.

Future development and challenges of neural style transfer include improving the flexibility and quality of the algorithms.

Transcripts

00:00

In the last couple of videos we saw the basic theory of neural style transfer, and we saw my implementation of the seminal work by Gatys and his colleagues. Now we're going to put the whole thing into a broader perspective and see how it all came to be, as well as all the follow-up work that came afterwards, because it basically opened up a whole new research direction. So in this video we're going to cover the advanced neural style transfer theory, and the story starts in 2012.

00:36

There was this now already famous ImageNet classification challenge, and in 2012 a new method was proposed using convolutional neural networks, with an architecture called AlexNet, and it basically smashed all the competitors, both in 2012 and compared with the previous years. For example, in 2011 Sánchez and Perronnin devised a method using really heavy mathematics, Fisher vectors and whatnot, and that method was roughly twice as bad as AlexNet even though it used so much math. You can see the quantum leap in 2012, where the error on the classification challenge went from 25.8 all the way down to 16.4. We're going to focus on these three nets in this video.

01:32

So AlexNet basically sparked a huge interest in CNNs, and ZFNet and VGG essentially explored the combinatorial space that was already set by AlexNet. People were generally interested in how CNN-like architectures worked, so there was this awesome paper in 2013 titled 'Visualizing and Understanding Convolutional Networks', by the same people who devised ZFNet. They created these really cool visualizations that helped us better see which image structures tend to trigger certain feature maps. You can see here on the screen that in the top left corner this feature map gets triggered by dog faces in the input images; on the top right you can see that this feature map really likes round objects; and in the bottom right you can see that this feature map particularly likes spiral objects. VGG came afterwards and pretty much improved upon AlexNet and ZFNet by exploring the depth of the network and the size of the convolution kernel. But we still did not quite understand how these deep codes work, or better said, what they have learned.

03:03

So in 2014 this seminal work came along, titled 'Understanding Deep Image Representations by Inverting Them'. It was the first paper to propose this method of reconstructing the input image from a deep code, from feature maps, something we already know from previous videos. Let me just reiterate: we do the optimization in image space, on a noise image. You can see, for conv1 for example on the top left, that we get a really detailed reconstruction if we try to invert the feature maps from those shallow layers. But if you go into deeper layers of the net and try to invert those codes, we get something like conv4, the image underneath, which is more abstract: it still keeps the semantics of the image, but the concrete details are getting lost.

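As a rough sketch of that inversion idea (assuming a hypothetical helper `vgg_features(img, layer)` that returns the activations of the named VGG layer; this is not the paper's exact procedure), reconstructing an image from its deep code looks like this:

```python
import torch

def reconstruct_from_features(target_img, vgg_features, layer='conv4_2', steps=300, lr=0.05):
    """Recover an image whose deep code matches that of target_img."""
    with torch.no_grad():
        target_code = vgg_features(target_img, layer)          # the "deep code" we want to invert

    recon = torch.randn_like(target_img, requires_grad=True)   # start from a noise image
    optimizer = torch.optim.Adam([recon], lr=lr)

    for _ in range(steps):
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(vgg_features(recon, layer), target_code)
        loss.backward()                                         # optimize pixels, not weights
        optimizer.step()
    return recon.detach()
```
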
04:08

This work was pretty much the inception point for the creation of the Deep Dream algorithm by the Google folks. If you're not already familiar with it, Deep Dream gives you all of these psychedelic-looking images by exploiting what is called the pareidolia effect. What it does, at a high level, is this: for certain feature maps the network just says, hey, whatever you see, give me more of it. The implementation is as simple as maximizing the feature response at a certain layer by doing gradient ascent instead of gradient descent, and that's equivalent to saying "give me more of what you see". But more important for our story, it was the first main ingredient for the inception of the neural style transfer algorithm.

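Before moving on to the second ingredient, here is a rough sketch of that "give me more of what you see" step (same hypothetical `vgg_features` helper as above; not Google's implementation):

```python
import torch

def deep_dream_step(img, vgg_features, layer='conv4_3', lr=0.02):
    """One gradient-ascent step: amplify whatever the chosen layer already responds to."""
    img = img.clone().requires_grad_(True)
    activation = vgg_features(img, layer)
    score = activation.norm()            # how strongly the layer "sees" its patterns
    score.backward()                     # gradient ASCENT: move in the +gradient direction
    with torch.no_grad():
        img += lr * img.grad / (img.grad.abs().mean() + 1e-8)   # normalized update step
    return img.detach()
```
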
04:57

The second main ingredient also came from the creator of NST, Gatys, in the work titled 'Texture Synthesis Using Convolutional Neural Networks'. Here he exploited the rich feature representation of the VGG network to create these awesome textures that you can see on the screen, and this is basically the same thing we did in the last video. It is important to appreciate that this work also did not come out of the blue: the conceptual framework of building up summary statistics over certain filter responses was already in place. For example, Portilla and Simoncelli, instead of using the filter responses of a VGG net, used the filter responses of a linear filter bank, and instead of using a Gram matrix that captures the correlations between feature maps, they used a carefully chosen set of summary statistics.

05:50

Finally, combining the previous work of reconstructing images from deep codes, which basically gives us the content portion of the stylized image, with Gatys' texture synthesis work, where conceptually transferring style is equivalent to transferring texture and which gives us the style portion of the stylized image, we get the final algorithm. It's just interesting how connecting a couple of dots created a surge of research in this new direction.

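For completeness, the combined objective can be sketched as follows (illustrative layer names and weights, reusing the `gram_matrix` and `vgg_features` helpers sketched earlier; not the authors' code):

```python
import torch

def nst_loss(gen_img, content_img, style_img, vgg_features,
             content_layer='conv4_2',
             style_layers=('conv1_1', 'conv2_1', 'conv3_1', 'conv4_1'),
             alpha=1.0, beta=1e4):
    """Content term (feature reconstruction) plus style term (Gram matching)."""
    content_loss = torch.nn.functional.mse_loss(
        vgg_features(gen_img, content_layer),
        vgg_features(content_img, content_layer))

    style_loss = 0.0
    for layer in style_layers:
        g_gen = gram_matrix(vgg_features(gen_img, layer))
        g_style = gram_matrix(vgg_features(style_img, layer))
        style_loss = style_loss + torch.nn.functional.mse_loss(g_gen, g_style)

    return alpha * content_loss + beta * style_loss   # minimized over the pixels of gen_img
```
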
06:26

Lots of follow-up work came after the original algorithm was devised back in 2015, and what is interesting is that there is this relationship that all of these algorithms have converged to: a three-way trade-off between speed, quality, and flexibility in the number of styles that the algorithm can produce. It will become clear what that means a bit later in the video. The original algorithm was pretty high quality, with infinite flexibility in the sense that you can transfer any style, but really slow.

07:01

So let's see how we can improve the speed portion. The main idea is this: instead of running the optimization algorithm, let's just pass the image through a feed-forward net and get the stylized image out. There were basically two independent papers implementing this idea back in March 2016, by Johnson and by Ulyanov, and I'll show Johnson's method here because it's conceptually simpler, a bit higher quality, but a bit lower speed. The method goes like this: we're optimizing the image transform net's weights, and not the pixels in image space. The loss is the same as in Gatys' work, and it gets defined by a deep loss network, which I always find really interesting, and the model gets trained on the MS COCO dataset. So we iterate through images in the dataset; the style target is fixed, while the content loss is specific to every single image in the dataset, and by doing so the net learns to apply that one style to arbitrary input images.

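A hedged sketch of that training setup follows; `transform_net`, `coco_loader`, `vgg_features`, and `gram_matrix` are placeholders for the pieces described above, not the actual code from the paper:

```python
import torch

def train_transform_net(transform_net, vgg_features, style_img, coco_loader,
                        style_layers=('conv1_1', 'conv2_1', 'conv3_1', 'conv4_1'),
                        content_layer='conv4_2', style_weight=1e5, epochs=2, lr=1e-3):
    """Train a feed-forward net for ONE fixed style; the content target changes per image."""
    optimizer = torch.optim.Adam(transform_net.parameters(), lr=lr)
    with torch.no_grad():  # the style Gram targets are fixed for the whole training run
        style_grams = {l: gram_matrix(vgg_features(style_img, l)) for l in style_layers}

    for _ in range(epochs):
        for content_batch in coco_loader:             # iterate over MS COCO images
            optimizer.zero_grad()
            stylized = transform_net(content_batch)   # a single feed-forward pass

            content_loss = torch.nn.functional.mse_loss(
                vgg_features(stylized, content_layer),
                vgg_features(content_batch, content_layer))

            style_loss = 0.0
            for l in style_layers:
                g = gram_matrix(vgg_features(stylized, l))               # (B, C, C)
                style_loss = style_loss + torch.nn.functional.mse_loss(
                    g, style_grams[l].expand_as(g))                      # broadcast fixed target

            (content_loss + style_weight * style_loss).backward()
            optimizer.step()
```
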
08:03

Let's see how it ranks against the three-way trade-off. It's still the fastest implementation out there; it's got the lowest possible flexibility, since it supports only one style; and for the quality you can see the graphs here. Let's focus on the leftmost graph, because it's for the lowest-resolution input image. The intersection of the blue and green curves defines the point where the loss is the same for the two methods, Gatys' and Johnson's, and we can see that happens around the 80th iteration of the L-BFGS optimizer, which means the quality of this method is the same as Gatys' after 80 L-BFGS iterations.

08:42

Now, there's a reason I mentioned Ulyanov here: he basically unlocked the quality and flexibility for these feed-forward methods by introducing the concept of instance normalization. Instance normalization is really similar to batch normalization, which was devised a year and a half earlier by Christian Szegedy and Sergey Ioffe from Google, and the only difference is the space over which we calculate the statistics. Whereas batch normalization uses a particular feature map from every single training example, instance normalization uses just a single feature map. You can see it in the image here, where the spatial dimensions of the feature map, H and W, are collapsed into a single dimension and N is the mini-batch size: instance normalization uses just a single training example to figure out those statistics. And let me reiterate, when I say statistics I mean finding the mean and variance of the distribution and using those to normalize it, making it unit variance and zero mean, and then applying those affine parameters, the betas and gammas, to keep the original representation power of the network.

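The difference is easiest to see in terms of the axes over which the statistics are computed (a small PyTorch sketch of the two variants, not the library implementations):

```python
import torch

def batch_norm_stats(x):
    # x: (N, C, H, W). BatchNorm pools statistics over the whole mini-batch AND space:
    return x.mean(dim=(0, 2, 3)), x.var(dim=(0, 2, 3))      # one (mean, var) per channel

def instance_norm_stats(x):
    # InstanceNorm uses only a single example's single feature map (per sample, per channel):
    return x.mean(dim=(2, 3)), x.var(dim=(2, 3))             # shape (N, C)

def instance_normalize(x, gamma, beta, eps=1e-5):
    # gamma, beta: (C,) affine parameters that restore the representation power
    mean, var = instance_norm_stats(x)
    x_hat = (x - mean[..., None, None]) / torch.sqrt(var[..., None, None] + eps)
    return gamma[None, :, None, None] * x_hat + beta[None, :, None, None]
```
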
09:53

Now, if you've never heard about batch normalization, this probably sounded like gibberish, and I'd usually suggest reading the original paper, but this time the paper is really big and the visualizations are really bad, so I'd rather suggest a Medium or Towards Data Science blog post, and I'll link some of those in the description. Let's see these normalization layers in action: if you apply them in the generator network we get these results, and in the bottom row you can see that instance normalization achieves greater quality. As I already mentioned, instance normalization unlocked both greater quality and bigger flexibility.

10:32

The first paper to exploit the greater flexibility was the conditional instance normalization paper, and they achieved 32 styles, where that wasn't a hard limit, it's just that the number of parameters grows linearly if you want to add more styles. The main idea goes like this: we do the same thing as in instance normalization, that is, normalize the distribution, making it unit variance and zero mean, and then instead of using a single pair of betas and gammas, every single style has its own pair associated with it. This simple idea enables us to create multiple stylized images using multiple styles, and you can see here that by interpolating those different styles we can get a whole continuous space of new stylized images. It's really surprising that using only two parameters per feature map we can define a completely new style.

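A minimal sketch of that idea, written as a conditional instance normalization layer (my own illustration, not the paper's code):

```python
import torch
import torch.nn as nn

class ConditionalInstanceNorm2d(nn.Module):
    """One (gamma, beta) pair per style; the normalization itself is shared."""
    def __init__(self, num_channels: int, num_styles: int):
        super().__init__()
        self.norm = nn.InstanceNorm2d(num_channels, affine=False)   # just whitens the features
        self.gamma = nn.Parameter(torch.ones(num_styles, num_channels))
        self.beta = nn.Parameter(torch.zeros(num_styles, num_channels))

    def forward(self, x, style_id: int):
        x = self.norm(x)                                   # zero mean, unit variance per feature map
        g = self.gamma[style_id][None, :, None, None]      # pick this style's scale ...
        b = self.beta[style_id][None, :, None, None]       # ... and shift
        return g * x + b
```

Interpolating between two styles then simply amounts to interpolating their (gamma, beta) pairs before applying them.
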
11:33

So we've seen some really high-quality methods like the original Gatys method, we've seen some really fast methods like Johnson's method, and we've seen some semi-flexible methods like this conditional instance normalization. So what else do we want from our NST algorithm? The answer is control. You usually don't have control over what the network or the algorithm outputs, and you want to control things like space, in the sense of which region of the image the style gets applied to; you want to control whether you take the color from the content image or from the style image; and you also want to have control over which brush strokes to use on the coarse scale and which ones to use on the fine scale.

12:21

Let's take a look at spatial control a bit more deeply. The idea goes like this: let's take the sky region of the style image, which is defined by the black pixels of its corresponding segmentation mask (you can see it in the top right corner of the image), and apply it to the sky region of the content image, which is defined by the black pixels of its own segmentation mask. And this time let's take the whole of this second style image and apply it to the non-sky region of the content image, which is defined by the white pixels of its segmentation mask. This mixing of styles doesn't really happen in image space by just combining those images with segmentation masks; it happens in feature space, where morphological operators such as erosion are used to get the regions nicely blended together.

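One hedged way to sketch such a region-guided style loss (masks are downsampled to the feature-map resolution; this is a simplification of the feature-space blending with erosion described above):

```python
import torch
import torch.nn.functional as F

def masked_gram(feature_map, mask):
    # mask: (1, 1, H_img, W_img) binary region mask; resize it to the feature-map resolution
    b, c, h, w = feature_map.shape
    m = F.interpolate(mask, size=(h, w), mode='nearest')
    masked = feature_map * m                                   # keep only this region's activations
    flat = masked.view(b, c, h * w)
    return torch.bmm(flat, flat.transpose(1, 2)) / (m.sum() * c + 1e-8)

def spatial_style_loss(gen_feats, regions):
    """regions: list of (style_feats, style_mask, content_mask) triples, one per image region."""
    loss = 0.0
    for style_feats, style_mask, content_mask in regions:
        loss = loss + F.mse_loss(masked_gram(gen_feats, content_mask),
                                 masked_gram(style_feats, style_mask))
    return loss
```
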
13:09

When it comes to color control, first of all, why would we need it? Well, sometimes you just get an output image which you don't like, like this one. So what can we do about it? One method is this: you take the content and style images and transform them into some color space where the color information and the intensity information are separable. Then you take the luminance components of the style and content images, you do the style transfer on those, and then you take the color channels from the content image and just concatenate them with the output, and you get the final image, the one you see under (d).

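A hedged sketch of the recombination step, using a plain linear RGB to YCbCr transform (the exact color space used in the paper may differ):

```python
import torch

# Standard ITU-R BT.601 luma/chroma weighting, without the usual chroma offset.
# The offset cancels here because both images go through the same linear transform.
_RGB2YCBCR = torch.tensor([[ 0.299,     0.587,     0.114   ],
                           [-0.168736, -0.331264,  0.5     ],
                           [ 0.5,      -0.418688, -0.081312]])
_YCBCR2RGB = torch.inverse(_RGB2YCBCR)

def to_ycbcr(img):   # img: (3, H, W), values in [0, 1]
    return torch.einsum('ij,jhw->ihw', _RGB2YCBCR, img)

def to_rgb(ycbcr):
    return torch.einsum('ij,jhw->ihw', _YCBCR2RGB, ycbcr)

def keep_content_colors(stylized_rgb, content_rgb):
    """Luminance comes from the stylized result, chroma from the original content image."""
    stylized = to_ycbcr(stylized_rgb)
    content = to_ycbcr(content_rgb)
    combined = torch.cat([stylized[:1], content[1:]], dim=0)   # [Y_stylized, Cb_content, Cr_content]
    return to_rgb(combined).clamp(0.0, 1.0)
```
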
13:51

Now, controlling scale is really simple. What you do is you take the fine-scale brushstrokes from one painting, combine them with the coarse-scale angular geometric shapes from another style image, and you produce a new style image, and then you just use that one in a classic NST procedure to get the image under (e). Just be aware that this is something useful to have.

14:16

Now for the fun part. Up until now we considered those Gram matrices to be some kind of natural law, like we had to match them in order to transfer style. As the paper 'Demystifying Neural Style Transfer' shows, matching those Gram matrices is nothing but minimizing the maximum mean discrepancy (MMD) with a polynomial kernel. That means we can use other kernels, like a linear or a Gaussian kernel, to achieve style transfer, and as it says right there, this reveals that neural style transfer is intrinsically a process of distribution alignment of the neural activations between images. Which means we basically just need to align those distributions in order to transfer a style, and there are various ways to do that.

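To make that claim slightly more precise (this is my paraphrase of the result in the Demystifying paper, not something shown on screen): if $f_i$ and $s_j$ denote the per-pixel feature vectors (the columns of the generated and style feature maps at one layer), with $M$ spatial positions each, the Gram-based style loss at that layer equals, up to a constant factor, the squared maximum mean discrepancy with the second-order polynomial kernel $k(x, y) = (x^\top y)^2$:

$$
\mathrm{MMD}^2 \;=\; \frac{1}{M^2}\sum_{i,i'} k(f_i, f_{i'}) \;+\; \frac{1}{M^2}\sum_{j,j'} k(s_j, s_{j'}) \;-\; \frac{2}{M^2}\sum_{i,j} k(f_i, s_j),
$$

and swapping in a linear or Gaussian kernel gives alternative style losses that align the same distributions in a different way.
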
15:00

One way that will be important for us is using normalization statistics, and we already saw a hint of that in the conditional instance normalization paper. Now, this work took it even further and achieved infinite flexibility, i.e., it can transfer any style whatsoever, and it does that the following way. You take the content image, you pass it through the feed-forward net, take a specific feature map, and normalize it by finding its mean and variance. Then you do the same thing for the style image: you find the same feature map, you find its mean and variance, and you just take those mean and variance parameters and apply them to the content feature map. You pass it through the decoder and you get the stylized image. So no learnable parameters this time: you just pass two parameters and you achieve style transfer.

Let's see it once more on the screen: you take the content feature map, you normalize it by finding its mean and variance, then you find the mean and variance of the style image's feature map, you reapply them to the content feature map, and you get a stylized image. Now, the good thing is that those affine parameters, those betas and gammas, don't need to be learned anymore; in batch normalization and also in instance normalization we had to learn them, and in conditional instance normalization too, but here they are not learnable parameters. The bad thing is that the decoder still has to be trained, so you need a finite set of style images on which to train this decoder, and that means it won't perform as well for unseen style images.

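The core operation, usually referred to as adaptive instance normalization (AdaIN), can be sketched in a few lines (the encoder and decoder are assumed to exist; this is not the authors' code):

```python
import torch

def adaptive_instance_norm(content_feat, style_feat, eps=1e-5):
    """Re-scale the normalized content features with the style image's channel statistics."""
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps

    normalized = (content_feat - c_mean) / c_std    # strip the content image's own statistics
    return s_std * normalized + s_mean              # impose the style image's statistics

# Inference is then a single pass; only the decoder ever gets trained:
# stylized = decoder(adaptive_instance_norm(encoder(content_img), encoder(style_img)))
```
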
16:39

The follow-up work fixed this problem, achieving truly infinite flexibility in the sense of the number of styles it can apply, although you had to sacrifice the quality a little bit, as per the three-way trade-off. The method works like this: you just train a simple image reconstruction autoencoder and you insert this WCT block, so let's see what it does. It does whitening on the content features by basically figuring out the eigendecomposition of those content features and applying a couple of linear transformations, ending up with a Gram matrix that's uncorrelated, meaning only the values on the diagonal are ones and everything else is zero. Don't dwell on the maths here, just try to follow along. Then you find the same eigendecomposition, but this time for the style features, apply a couple of linear transformations, and you end up with content features that have the same Gram matrix as the style image, which is what we always did, but this time without any learning, without any training.

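A hedged sketch of the whitening-and-coloring step on one layer's (channels x pixels) feature matrix (not the authors' implementation, which additionally blends the result with the original content features):

```python
import torch

def whiten_and_color(content_feat, style_feat, eps=1e-5):
    """content_feat, style_feat: (C, H*W) matrices of flattened feature maps."""
    def _decompose(feat):
        mean = feat.mean(dim=1, keepdim=True)
        centered = feat - mean
        cov = centered @ centered.t() / (feat.shape[1] - 1)   # (C, C) covariance
        eigvals, eigvecs = torch.linalg.eigh(cov)              # symmetric eigendecomposition
        return centered, mean, eigvals.clamp_min(eps), eigvecs

    c_centered, _, c_vals, c_vecs = _decompose(content_feat)
    _, s_mean, s_vals, s_vecs = _decompose(style_feat)

    # Whitening: remove the content image's own correlations (covariance becomes the identity).
    whitened = c_vecs @ torch.diag(c_vals.rsqrt()) @ c_vecs.t() @ c_centered
    # Coloring: impose the style image's correlations, then its mean.
    colored = s_vecs @ torch.diag(s_vals.sqrt()) @ s_vecs.t() @ whitened
    return colored + s_mean
```
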
17:48

So we saw the evolution of methods for static images, and in parallel people were devising new methods for videos. The only additional problem that these methods have to solve is how to keep the temporal coherence between frames. As you can see here, we've got some original frames and the style image, and if you just do the naive thing, applying NST to every single frame independently, there's going to be a lot of flickering, a lot of inconsistency between different frames, because every single time you run it the image ends up in a different local optimum. Whereas when you apply it with the temporal constraint, you can see that the images are much smoother and more consistent between frames. The way we achieve this temporal consistency is the following: we take this green frame, the previous frame, and we forward-warp it using the optical flow information, and we take this red frame, the next frame, and we penalize it for deviating from the warped green frame at those locations, on those pixels, where the optical flow was stable, i.e., where no disocclusion was happening. This method used the slow optimization procedure, so needless to say it was painfully slow, and the follow-up work did the same thing as for static images: it transferred this problem into a feed-forward network.

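A hedged sketch of that temporal term; the flow field and the occlusion/validity mask are assumed to come from an external optical-flow estimator, and the warp is written as a bilinear backward warp, which is the common way to implement it:

```python
import torch
import torch.nn.functional as F

def temporal_loss(stylized_t, stylized_prev_warped, occlusion_mask):
    """Penalize the current stylized frame for deviating from the flow-warped previous one,
    but only on pixels where the optical flow was reliable (mask == 1, no disocclusion)."""
    diff = (stylized_t - stylized_prev_warped) ** 2
    return (occlusion_mask * diff).sum() / (occlusion_mask.sum() + 1e-8)

def warp(frame, flow):
    """Warp a frame with a dense optical-flow field using bilinear grid sampling."""
    n, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
    grid = torch.stack((xs, ys), dim=-1).float()[None].to(frame)   # (1, H, W, 2), pixel coords
    grid = grid + flow.permute(0, 2, 3, 1)                         # shift by the flow vectors
    grid[..., 0] = 2.0 * grid[..., 0] / (w - 1) - 1.0              # normalize to [-1, 1]
    grid[..., 1] = 2.0 * grid[..., 1] / (h - 1) - 1.0
    return F.grid_sample(frame, grid, align_corners=True)
```
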
19:11

I've actually used this very same model at the beginning of the series to create the very first clip; it's called ReCoNet, and this is the clip we created. In the beginning it's the simple, inconsistent version, and then it becomes temporally consistent and smooth.

19:25

So we saw neural style transfer for static images and we saw it for videos, and the thing with this NST field is that people are trying to apply it everywhere. We've got NST for stereoscopic images and videos that can be used in VR; there is this concept of transferring style to 3D models, where the artist just needs to paint this sphere in the top right corner and it gets transferred to the 3D model; there is photorealistic neural style transfer, where you basically only transfer the color information onto the content image; there is also neural style transfer for audio, style-aware content losses, and so on.

20:08

I thought it would be really valuable to share how this whole field evolved, how people were trying to connect various dots and build upon each other's work, because this is what research looks like. The field is nowhere near having all problems sorted out; there are still a lot of challenges out there. One of those is evaluation: there still doesn't exist a numerical method which can help us compare different NST methods and say, hey, this one is better, that one is worse. We are still using side-by-side subjective visual comparisons and user studies to figure out which method is better. Also, there is no standard benchmark image set, meaning everybody is using their own style images and their own content images, and it's kind of hard to wrap your head around that and compare different methods. Another challenge is representation disentangling: we saw some efforts at controlling perceptual factors like scale, space, and color, but it would be really nice if we had some latent space representation where we could just tweak a certain dimension and get each of those perceptual factors to change the way we want it to change.

21:19

I want to wrap this up with a fun fact. Did you see the image on the right there? Could you guess that it was sold at an auction? Could you guess the price tag? It was almost half a million dollars, and the thing is, it wasn't even created by a neural style transfer algorithm, it was created by GANs; you can see the equation down there. It's just so interesting that we are living in an age where there is this fascinating dynamic going on between art and tech, and I think it's really a great time to be alive. I really hope you found this video valuable. If you're new to my channel, consider subscribing and sharing. So stay tuned and see you in the next video.

[Music]
