Advanced Theory | Neural Style Transfer #4
Summary
TLDR: This video presents the advanced theory of neural style transfer, starting with the famous 2012 ImageNet classification challenge and the introduction of AlexNet. It then covers ZF Net and VGG, which built on and refined AlexNet's architecture. An important step was the paper 'Understanding Deep Image Representations by Inverting Them', which introduced the first approach for reconstructing images from deep codes. From this grew the Deep Dream algorithm and, finally, neural style transfer. The video discusses the various algorithms that improve speed, quality, and flexibility, and shows how research in this direction continued, including applications to video, 3D models, and audio. It also highlights the challenges the community still has to solve, and closes with a fun fact about an AI-generated artwork that was auctioned for almost half a million dollars.
Takeaways
- 🎨 The development of neural style transfer algorithms began in 2012 with the ImageNet classification challenge and the introduction of convolutional neural networks (CNNs) via the AlexNet architecture.
- 📈 AlexNet marked a turning point in image classification, significantly outperforming existing methods and demonstrating the effectiveness of CNNs.
- 🔍 Research after AlexNet focused on improving CNN architectures; ZF Net and VGG further explored the combinatorial design space that AlexNet had opened up.
- 👀 An important step toward understanding CNNs was the paper 'Visualizing and Understanding Convolutional Networks', which revealed the visual structures in images that trigger particular feature maps.
- 🖼 The 2014 paper 'Understanding Deep Image Representations by Inverting Them' was a groundbreaking contribution on reconstructing input images from deep feature maps, and it led to the development of the Deep Dream algorithm.
- 🎭 The neural style transfer algorithm combines image reconstruction from deep codes with texture synthesis, merging the style of one image with the content of another into a new image.
- 🚀 Follow-up algorithms aim to increase speed, improve quality, and raise flexibility in the number of transferable styles.
- 🌟 Some of the most innovative improvements involve instance normalization and conditional instance normalization to increase quality and flexibility.
- 🎭 Control over the style transfer process, including spatial control, color control, and scale control, lets artists and developers influence the output of the network or algorithm.
- 🌐 Neural style transfer has been extended to other media, including 3D models, photorealism, audio, and stereoscopic content, showing how versatile and adaptable the technique is.
Q & A
What is the basic concept behind neural style transfer?
-The basic concept of neural style transfer is to preserve the visual structure of a content image while transferring the color and texture characteristics of a style image.
What was the groundbreaking 2012 work that steered research toward neural networks?
-The groundbreaking 2012 work was the introduction of convolutional neural networks with the AlexNet architecture, which outperformed all competitors in the ImageNet classification challenge.
What is the method called that makes the network see more of whatever it already recognizes in certain feature maps?
-This method is called Deep Dream. It exploits the pareidolia effect by maximizing feature map responses through gradient ascent, producing psychedelic images.
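The gradient-ascent idea above can be sketched in a few lines. This is a toy stand-in, not the actual Deep Dream code: the "network" here is a single random linear layer and the step size and iteration count are made up for illustration.

```python
import numpy as np

# Toy Deep Dream sketch: the "network" is one random linear layer W, and we
# ascend the gradient of the feature response 0.5 * ||W x||^2 so the input
# comes to show "more of" whatever the layer already responds to.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))       # hypothetical layer weights
x = rng.standard_normal(16) * 0.01     # start from a near-blank "image"

def feature_energy(x):
    f = W @ x
    return 0.5 * float(f @ f)

e0 = feature_energy(x)
for _ in range(20):
    grad = W.T @ (W @ x)   # analytic gradient of the response w.r.t. x
    x = x + 0.01 * grad    # gradient ASCENT (+), not descent (-)
e1 = feature_energy(x)     # the response has grown: the layer "sees more"
```

A real Deep Dream run does the same ascent on an actual CNN layer's activations, usually with autograd instead of a hand-derived gradient.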
What is the significance of Gram matrices in style transfer?
-Gram matrices capture the correlations between different feature maps, which is essential for transferring the style of one image onto another.
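As a concrete sketch of that correlation structure (shapes and the normalization constant here are illustrative; papers differ on the exact normalization):

```python
import numpy as np

# Gram matrix sketch: for C feature maps of size H x W, flatten each map to
# a row and take all pairwise dot products. Entry (i, j) measures how
# strongly feature maps i and j co-activate -- this is what "style" matches.
def gram_matrix(features):
    # features: (C, H, W) activations from one layer
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return (f @ f.T) / (h * w)   # normalize by the number of positions

rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 8, 8))
G = gram_matrix(feats)
# G is (4, 4), symmetric, with each map's "energy" on the diagonal
```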
How does Johnson's method differ from the original method of Gatys et al.?
-Johnson's method optimizes the weights of an image transformation network instead of the pixels in image space. It uses the same loss function as Gatys et al., but it is much faster and supports only a single style.
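The shift from "optimize the pixels" to "optimize a transform's weights" can be illustrated with a deliberately tiny stand-in: an affine map trained by SGD against a fixed quadratic "style" target plus a content term. None of this is Johnson's actual architecture or loss; it only shows the train-once / stylize-in-one-pass structure.

```python
import numpy as np

# Toy feed-forward "style transfer": train an affine transform (W, b) once
# over many random "content" samples against a fixed style target, then
# stylize new inputs with a single forward pass. Purely illustrative.
rng = np.random.default_rng(0)
style_target = rng.standard_normal(4)   # fixed "style statistics"
W, b = np.eye(4), np.zeros(4)

def sgd_step(W, b, x, lr=0.01):
    y = W @ x + b
    # "style" term pulls toward the target; a small "content" term keeps
    # the output close to this particular input
    grad_y = (y - style_target) + 0.1 * (y - x)
    return W - lr * np.outer(grad_y, x), b - lr * grad_y

for _ in range(2000):                   # a "dataset" of random contents
    W, b = sgd_step(W, b, rng.standard_normal(4))

x_new = rng.standard_normal(4)          # unseen "content image"
y_new = W @ x_new + b                   # stylized in one forward pass
```

After training, the new input's output sits much closer to the style target than the input itself, without any per-image optimization.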
What is the significance of instance normalization in neural style transfer?
-Instance normalization improves the quality and flexibility of style transfer by computing the statistics (mean and variance) separately for each feature map of each example, rather than aggregating them across the whole mini-batch.
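A minimal sketch of the difference in which axes the statistics are computed over (shapes are illustrative): batch norm averages over (N, H, W) per channel, while instance norm averages over (H, W) only, separately for every sample and channel.

```python
import numpy as np

# Instance normalization: per-sample, per-channel statistics.
def instance_norm(x, eps=1e-5):
    # x: (N, C, H, W)
    mean = x.mean(axis=(2, 3), keepdims=True)   # one mean per (sample, channel)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 8, 8)) * 5 + 10
y = instance_norm(x)
# every (sample, channel) slice is now ~zero-mean and ~unit-variance
```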
How can multiple styles be combined in neural style transfer?
-Multiple styles can be supported by giving each style its own set of normalization parameters (beta and gamma), which are applied to the normalized content feature maps to produce stylized images in different styles.
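A sketch of conditional instance normalization along those lines (all names and shapes here are illustrative, not from a specific paper's code): one shared normalization, but picking a style index selects which (gamma, beta) pair re-scales and re-shifts the normalized features.

```python
import numpy as np

# Conditional instance normalization: per-style affine parameters.
def cond_instance_norm(x, gammas, betas, style_idx, eps=1e-5):
    # x: (C, H, W); gammas, betas: (num_styles, C)
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    g = gammas[style_idx][:, None, None]   # this style's scale per channel
    b = betas[style_idx][:, None, None]    # this style's shift per channel
    return g * x_hat + b

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))
gammas = rng.standard_normal((32, 3))   # e.g. 32 styles, as in the paper's demo
betas = rng.standard_normal((32, 3))
out0 = cond_instance_norm(x, gammas, betas, 0)
out1 = cond_instance_norm(x, gammas, betas, 1)
# same content features, two different "styles" of output
```

Interpolating between two styles' (gamma, beta) pairs gives the continuous blends mentioned in the video.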
Which technique is used to achieve spatial control in neural style transfer?
-Spatial control is achieved with segmentation masks that assign style features from the style image to particular regions of the content image; morphological operators such as erosion are used to blend the transitions between regions.
How can color be controlled in neural style transfer?
-Color can be controlled by converting the images into a color space where color and intensity information are separated, performing style transfer on the luminance channel, and then recombining the result with the color channels of the content image.
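A sketch of that luminance trick: move the image into a space where intensity and color are separate channels, run style transfer on the luminance only, then put the content image's color channels back. The RGB-to-YIQ matrix below is the standard NTSC one; the "stylized luminance" here is a random stand-in for an actual NST output.

```python
import numpy as np

# Standard NTSC RGB -> YIQ matrix: Y is luminance, I and Q carry color.
RGB2YIQ = np.array([[0.299, 0.587, 0.114],
                    [0.596, -0.274, -0.322],
                    [0.211, -0.523, 0.312]])

def rgb_to_yiq(img):   # img: (H, W, 3)
    return img @ RGB2YIQ.T

def yiq_to_rgb(img):
    return img @ np.linalg.inv(RGB2YIQ).T

def luminance_only_transfer(content_rgb, stylized_luminance):
    yiq = rgb_to_yiq(content_rgb)
    yiq[..., 0] = stylized_luminance    # replace Y, keep content's I and Q
    return yiq_to_rgb(yiq)

rng = np.random.default_rng(0)
content = rng.random((4, 4, 3))
fake_stylized_y = rng.random((4, 4))    # stand-in for an NST result on Y
out = luminance_only_transfer(content, fake_stylized_y)
# out carries the stylized brightness but the content image's colors
```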
What is the main difference between classic and photorealistic neural style transfer?
-In photorealistic neural style transfer only the color information of the style image is transferred onto the content image, whereas classic neural style transfer also transfers texture and structural characteristics of the style image.
Outlines
🖥️ Evolution of neural style transfer
This paragraph introduces the foundations of neural style transfer and the history that began in 2012. It covers AlexNet's landmark result in the ImageNet classification challenge, where convolutional neural networks (CNNs) brought an enormous improvement in image classification. The sequence of research building on AlexNet is discussed, as well as understanding CNNs and visualizing feature maps through the paper 'Visualizing and Understanding Convolutional Networks'. The paper 'Understanding Deep Image Representations by Inverting Them' is also mentioned, which showed a way to reconstruct input images from deep feature maps and was important for the development of Deep Dream and neural style transfer.
🎨 Advances in texture synthesis
This paragraph covers the use of CNNs in the paper 'Texture Synthesis Using Convolutional Neural Networks', which produced impressive texture images. It emphasizes how this work builds on earlier concepts of computing summary statistics over filter responses. The use of Gram matrices is discussed, along with the VGG networks that explored network depth and kernel size. Finally, it describes how several lines of research were connected into the final neural style transfer algorithm, which blends the content of one image with the style of another.
🔍 Improvements and control in style transfer
This paragraph focuses on improvements to neural style transfer, particularly in speed and flexibility. Two independent papers, by Johnson and by Ulyanov, are mentioned that propose a faster style transfer method by optimizing the weights of a transformation network instead of the pixels in image space. The introduction of instance normalization is treated as the key to improving the quality and flexibility of these methods. Control over the style transfer process is also discussed, such as spatial control, color control, and controlling the scale of brush strokes.
🌟 New directions in style transfer
This section covers innovative approaches that enable essentially unlimited flexibility by implementing style transfer without learnable per-style parameters. It discusses the use of normalization statistics and how they can achieve style transfer without training separate models per style. The challenges of evaluating and comparing different methods are raised, as well as the need to disentangle style representations to gain more control over perceptual factors.
🚀 Future of neural style transfer
The final paragraph emphasizes the variety of applications of neural style transfer in areas such as 3D models, audio, and stereoscopic content. It points to open research challenges such as the lack of standardized evaluation and of disentangled style representations. An interesting point is the interaction between art and technology, highlighted by the sale of an artwork that was created not by a neural style transfer method but by a GAN, yet still fetched a high auction price. The paragraph ends with an invitation to viewers to enjoy the video and look forward to more content.
Keywords
💡Neural Style Transfer
💡AlexNet
💡Feature Maps
💡VGG
💡Deep Dream
💡Texture Synthesis
💡Instance Normalization
💡Spatial Control
💡Color Control
💡Scale Control
💡Distribution Alignment
Highlights
Introduction to the advanced theory of neural style transfer and its development since 2012.
In the 2012 ImageNet classification challenge, AlexNet achieved a breakthrough using a convolutional neural network architecture.
ZF Net and VGG built on AlexNet, further optimizing by exploring network depth and kernel size.
The 2013 paper 'Visualizing and Understanding Convolutional Networks' used visualizations to help understand the feature maps inside CNNs.
Work from 2014 reconstructed images from deep features by inverting them, laying the groundwork for the Deep Dream and neural style transfer algorithms.
The Deep Dream algorithm maximizes feature responses at a chosen layer via gradient ascent, creating psychedelic images.
The second main ingredient of neural style transfer is texture synthesis, which exploits the rich feature representations of the VGG network to create textures.
Neural style transfer combines image reconstruction from deep features (the content part) with texture synthesis (the style part).
Johnson's method speeds up neural style transfer by optimizing an image transformation network instead of the pixels.
Ulyanov's instance normalization improved the flexibility and quality of neural style transfer.
Conditional instance normalization allows style transfer with multiple styles, providing greater flexibility.
Spatial control allows applying different styles to specific regions of an image.
Color control allows choosing the color channels from either the content image or the style image.
Scale control allows combining coarse details and fine details from different style images.
Challenges for neural style transfer include evaluation, representation disentanglement, and the lack of a standard benchmark image set.
Neural style transfer has been extended to 3D models, audio, and stereoscopic images and videos.
The evolution of neural style transfer illustrates an interesting dynamic between art and technology.
Future developments and challenges include improving the algorithms' flexibility and quality.
Transcripts
In the last couple of videos we saw the basic theory of neural style transfer, and we saw my implementation of the seminal work by Gatys and his colleagues. Now we're going to take a broader perspective and see how it all came to be, as well as all the follow-up work that came afterwards, because it basically opened up a whole new research direction. So in this video we're going to cover the advanced neural style transfer theory, and the story starts in 2012. There was this now already famous ImageNet classification challenge, and in 2012 a new method was proposed using convolutional neural networks, with an architecture called AlexNet, and it basically smashed all the competitors, both in 2012 and compared to all the previous years. For example, in 2011 Sánchez and Perronnin devised a method using really heavy mathematics, Fisher vectors and whatnot, and that method was twice as bad as AlexNet even though it used so much math. You can see the quantum leap in 2012, where the error on the classification challenge went from 25.8% all the way down to 16.4%. We're going to focus on three nets in this video: AlexNet basically sparked a huge interest in CNNs, and ZF Net and VGG basically just explored the combinatorial space that was already set by AlexNet.
People were generally interested in how CNN-like architectures worked, so there was this awesome paper in 2013 titled 'Visualizing and Understanding Convolutional Networks', by the same people who devised ZF Net. They created these really cool visualizations that helped us better see which image structures tend to trigger certain feature maps. You can see here on the screen that the feature map in the top left corner gets triggered when there are dog faces in the input images; on the top right you can see a feature map that really likes round objects; and on the bottom right a feature map that particularly likes spiral objects. VGG came afterwards and pretty much improved upon AlexNet and ZF Net by exploring the depth of the network and the size of the convolutional kernel. But we still did not quite understand how these deep codes work, or, better said, what they have learned. So in 2014 a seminal work came along, titled 'Understanding Deep Image Representations by Inverting Them'.
It was the first paper to propose this method of reconstructing the input image from a deep code, from feature maps, something we already know from previous videos. Let me just reiterate: we do the optimization in the image space, on a noise image, and you can see, for example on the top left image (conv1), that we get a really detailed reconstruction if we try to invert the feature maps from the shallow layers. But if you go into deeper layers of the net and try to invert those codes, we get something like the conv4 image underneath: it's more abstract, it still keeps the semantics of the image, but the concrete details are getting lost. This work was pretty much the inception point for the creation of the Deep Dream algorithm by the Google folks. If you're not already familiar with it, Deep Dream gives you all of these psychedelic-looking images by exploiting what is called the pareidolia effect. What it does at a high level is this: the network sees something in certain feature maps, and it just says, hey, whatever you see, give me more of it. The implementation is as simple as maximizing the feature response at a certain layer by doing gradient ascent (and not gradient descent), which is equivalent to saying "give me more of what you see". But more important for our story, it was the first main ingredient for the inception of the neural style transfer algorithm. The second main ingredient also came from the creator of NST, Gatys, in the work titled 'Texture Synthesis Using Convolutional Neural Networks'. Here he exploited the rich feature representations of the VGG network to create these awesome textures that you can see on the screen, and this is basically the same thing we did in the last video. It is important to appreciate that this work also did not come out of the blue: the conceptual framework of building up summary statistics over certain filter responses was already in place. For example, Portilla and Simoncelli, instead of using the filter responses of the VGG net, used the filter responses of a linear filter bank, and instead of using a Gram matrix that captures the correlations between feature maps, they used a carefully chosen set of summary statistics. Finally, combining the previous work of reconstructing images from deep codes, which gives us the content portion of the stylized image, with Gatys's texture synthesis work, where conceptually transferring style is equivalent to transferring texture and which gives us the style portion of the stylized image, we get the final algorithm. It's just interesting how connecting a couple of dots created a surge of research in this new direction.
Lots of follow-up work came after the original algorithm was devised back in 2015, and what is interesting is that all of these algorithms come down to a three-way trade-off between speed, quality, and flexibility in the number of styles that the algorithm can produce; it will become clear what that means a bit later in the video. The original algorithm was pretty high quality, with infinite flexibility in the sense that you can transfer any style, but really slow. So let's see how we can improve the speed portion. The main idea is this: instead of running the optimization algorithm, let's just pass the image through a feed-forward pass and get the stylized image out. There were basically two independent papers implementing this idea back in March 2016, by Johnson and by Ulyanov, and I'll show Johnson's method here because it's conceptually simpler, a bit higher quality, but a bit lower speed. The method goes like this: we're optimizing the weights of this image transform net, and not the pixels in the image space. The loss is the same as in Gatys's work, and it gets defined by the deep net, which I always find really interesting. It gets trained on the MS COCO dataset: we iterate through images in the dataset, and the style target is fixed while the content loss is specific to every single image in the dataset. By doing so, the net learns to minimize the style loss on arbitrary input images. Let's see how it ranks against the three-way trade-off: it's still the fastest implementation out there, it's got the lowest possible flexibility (it supports only one style), and for the quality you can see the graphs here. Let's focus on the leftmost graph, because it's for the lowest-resolution input image: the intersection of the blue and green curves defines the point where the loss is the same for the two methods, Gatys's and Johnson's, and we can see that happens around the 80th iteration of the L-BFGS optimizer, which means that the quality of this method is the same as Gatys's after 80 L-BFGS iterations. Now, there's a reason I mentioned Ulyanov here.
He basically unlocked the quality and flexibility for these feed-forward methods by introducing the concept of instance normalization. Instance normalization is really similar to batch normalization, which was devised a year and a half before by Christian Szegedy and Sergey Ioffe from Google, and the only difference is the space over which we calculate the statistics. Whereas batch normalization uses a particular feature map from every single training example in the batch, instance normalization uses just a single feature map. You can see it on the image here, where the spatial dimensions of the feature map, H and W, are collapsed into a single dimension and N is the mini-batch size: instance normalization uses just a single training example to figure out those statistics. Let me reiterate: when I say statistics, I mean finding the mean and variance of the distribution and using those to normalize the distribution, making it unit variance and zero mean, and later applying those affine parameters, the betas and gammas, to keep the original representational power of the network. Now, if you've never heard about batch normalization, this probably sounded like rubbish, and I'd usually suggest reading the original paper, but this time the paper is really long and the visualizations are really bad, so I suggest reading either a Medium blog or a Towards Data Science blog, and I'll link some of those in the description. Let's see these normalization layers in action: if you apply them in the generator network, we get these results, and in the bottom row you can see that instance normalization achieves greater quality. As I already mentioned, instance normalization unlocked greater quality and bigger flexibility.
The first paper to exploit the greater flexibility was the conditional instance normalization paper, which achieved 32 styles; that wasn't a hard limit, it's just that the number of parameters grows linearly as you add more styles. The main idea goes like this: we do the same thing as in instance normalization, i.e., normalize the distribution, making it unit variance and zero mean, and then, instead of using a single pair of betas and gammas as the affine parameters, every single style has its own pair associated with it. This simple idea enables us to create multiple stylized images using multiple styles, and you can see here that by interpolating those different styles we can get a whole continuous space of new stylized images. It's really surprising that using only those two parameters per feature map we can define a completely new style.
new style so we've seen some really
high-quality methods like the original
Geddes method we've seen some really
fast methods like Johnson's method and
we've seen some semi flexible methods
like these this conditional instance
normalization so what else do we want
from our MST algorithm and the answer is
control you usually don't have the
control of what what the network or the
algorithm outputs and you want to
control stuff like like space in the
sense we slightly apply to which portion
of the which region of the image and
then you want to control whether you
take the color from the continuity or
take it from the style image and you
want to also have control over which
brush strokes to to use on the coarse
scale and which ones to use on the fine
scale so let's take a look at the
Let's take a look at spatial control a bit deeper. The idea goes like this: take the sky region of the style image, which is defined by the black pixels of its corresponding segmentation mask (you can see it in the top right corner of the image), and apply it to the sky region of the content image, which is defined by the black pixels of its corresponding segmentation mask. Then take this other style image as a whole and apply it to the non-sky region of the content image, which is defined by the white pixels of its corresponding segmentation mask. This mixing of styles doesn't actually happen in the image space by just combining those images with segmentation masks; it happens in the feature space, where they use morphological operators such as erosion to get the regions nicely blended together.
When it comes to color control, first: why? Well, sometimes you just get an output image you don't like, like this one. How can we fix that? One method is this: you take the content and style images and transform them into some color space where the color information and the intensity information are separable. You take the luminance components of the style and content images, do the style transfer there, then take the color channels from the content image and just concatenate those with the output, and you get the final image.
That's the image you see under (d). Now, controlling scale is really simple. What you do is take the fine-scale brush strokes from one painting, combine them with the coarse-scale angular geometric shapes from another style image, and produce a new style image; then you just use that one in a classic NST procedure to get the image under (e). Just be aware that this is something useful to have. And now for the fun part.
Up until now we considered those Gram matrices to be some kind of natural law, as if we had to match them in order to transfer style. But as the paper 'Demystifying Neural Style Transfer' shows, matching those Gram matrices is nothing but minimizing the maximum mean discrepancy (MMD) with a polynomial kernel. That means we can use other kernels, like a linear or a Gaussian kernel, to achieve style transfer. As it says right here, "this reveals that neural style transfer is intrinsically a process of distribution alignment of the neural activations between images", which means we basically just need to align those distributions in order to transfer a style, and there are various ways to do that.
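The distribution-alignment view can be sketched numerically. Treating each spatial position's feature vector as a sample, the squared MMD between two sample sets is mean k(x,x') + mean k(y,y') - 2·mean k(x,y); with the polynomial kernel k(a,b) = (a·b)², this is (up to scale) the Gram-matrix loss. The data below is random, just to show the statistic separating two distributions:

```python
import numpy as np

# Biased (V-statistic) estimate of squared MMD between sample sets X and Y.
def mmd2(X, Y, kernel):
    # X: (n, d), Y: (m, d) -- activation vectors from two images
    return kernel(X, X).mean() + kernel(Y, Y).mean() - 2 * kernel(X, Y).mean()

def poly_kernel(A, B):
    # k(a, b) = (a . b)^2 -- the kernel under which MMD matches Gram matrices
    return (A @ B.T) ** 2

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
Y_same = rng.standard_normal((50, 4))         # same distribution as X
Y_shifted = rng.standard_normal((50, 4)) + 3  # clearly different distribution

d_same = mmd2(X, Y_same, poly_kernel)
d_far = mmd2(X, Y_shifted, poly_kernel)
# d_far is much larger: the shifted activations are "another style"
```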
One way that will be important for us involves those normalization statistics, and we already saw a hint of that in the conditional instance normalization paper. This next work took it even further and achieved "infinite" flexibility, i.e., it can transfer any style, and it does that the following way. You take the content image, pass it through the feed-forward net, take a specific feature map, and normalize it by finding its mean and variance. Then you do the same thing for the style image: you find the same feature map, you find its mean and variance, and you just take those mean and variance parameters and apply them to the content feature map. You pass it through the decoder and you get the stylized image. So, no learnable affine parameters this time; you just pass two statistics per feature map and you achieve style transfer. Let's see it once more on the screen: you take the content feature map, normalize it by finding its mean and variance, then find the mean and variance for the style image, reapply them to the content feature map, and you get a stylized image.
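The normalize-then-reapply step just described is adaptive instance normalization (AdaIN), and it fits in a few lines (shapes are illustrative; a real pipeline would run this on encoder activations and then decode):

```python
import numpy as np

# AdaIN sketch: normalize each content feature map to zero mean / unit
# variance, then re-scale and re-shift it with the STYLE feature map's own
# mean and std. The style statistics ARE the affine parameters -- nothing
# is learned here.
def adain(content_feat, style_feat, eps=1e-5):
    # both: (C, H, W) activations from the same encoder layer
    c_mean = content_feat.mean(axis=(1, 2), keepdims=True)
    c_std = content_feat.std(axis=(1, 2), keepdims=True)
    s_mean = style_feat.mean(axis=(1, 2), keepdims=True)
    s_std = style_feat.std(axis=(1, 2), keepdims=True)
    normalized = (content_feat - c_mean) / (c_std + eps)
    return s_std * normalized + s_mean

rng = np.random.default_rng(0)
content = rng.standard_normal((3, 8, 8)) * 2 + 1
style = rng.standard_normal((3, 8, 8)) * 5 - 4
out = adain(content, style)
# per channel, out now carries the style's mean and std
```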
Now, the good thing is that those affine parameters, the betas and gammas, don't need to be learned anymore: in batch normalization and in instance normalization we had to learn them, and in conditional instance normalization too, but here they are not learnable parameters. The bad thing is that the decoder still has to be trained, so you need a finite set of style images upon which to train it, and that means it won't perform as well on unseen style images. The follow-up work fixed this problem, achieving truly infinite flexibility in the number of styles it can apply, although you had to sacrifice the quality a little bit, as per the three-way trade-off.
The method works like this: you just train a simple image-reconstruction autoencoder, and you insert this WCT (whitening and coloring transform) block. Let's see what it does. It does whitening on the content features by basically figuring out the eigendecomposition of those content features and applying a couple of linear transformations, ending up with a Gram matrix that's uncorrelated, meaning only the values on the diagonal are ones and everything else is zero. Don't dwell on the math here, just try to follow along. Then you find the same eigendecomposition, but this time for the style features, apply a couple of linear transformations, and you end up with content features that have the same Gram matrix as the style image, which is what we always did, but this time without any learning, without any training.
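The whitening-and-coloring step can be sketched directly with numpy's eigendecomposition (feature shapes here are illustrative stand-ins for real encoder activations):

```python
import numpy as np

# WCT sketch: whitening removes the correlations from the content features
# (identity covariance), then coloring re-imposes the style features'
# covariance -- matching the style's second-order statistics with no
# training at all, only linear algebra.
def wct(content_feat, style_feat, eps=1e-8):
    # both: (C, N) -- C channels, N spatial positions
    fc = content_feat - content_feat.mean(axis=1, keepdims=True)
    fs = style_feat - style_feat.mean(axis=1, keepdims=True)

    # whitening: fc_hat has identity (sample) covariance
    cov_c = fc @ fc.T / (fc.shape[1] - 1)
    dc, ec = np.linalg.eigh(cov_c)
    fc_hat = ec @ np.diag(1.0 / np.sqrt(dc + eps)) @ ec.T @ fc

    # coloring: give the whitened features the style covariance instead
    cov_s = fs @ fs.T / (fs.shape[1] - 1)
    ds, es = np.linalg.eigh(cov_s)
    colored = es @ np.diag(np.sqrt(np.maximum(ds, 0.0))) @ es.T @ fc_hat
    return colored + style_feat.mean(axis=1, keepdims=True)

rng = np.random.default_rng(0)
content = rng.standard_normal((4, 200))
style = np.diag([1.0, 2.0, 3.0, 4.0]) @ rng.standard_normal((4, 200))
out = wct(content, style)   # content layout, style second-order statistics
```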
So we saw the evolution of methods for static images, and in parallel people were devising new methods for videos. The only additional problem these methods have to solve is how to keep temporal coherence between frames. As you can see here, we've got some original frames and the style image, and if you just do the naive thing of applying NST to every single frame independently, there's going to be a lot of flickering, a lot of inconsistency between frames, because every single time you run it the image ends up in a different local optimum. Whereas when you apply it with the temporal constraint, you can see that the frames are much smoother and more consistent. The way we achieve this temporal consistency is the following: we take the previous frame (green) and forward-warp it using the optical flow information, and we take the next frame (red) and penalize it for deviating from the warped green frame at those pixels where the optical flow was stable, i.e., where no disocclusion was happening. This method used the slow optimization procedure, so needless to say it was painfully slow, and the follow-up work did the same thing as for static images: it transferred this problem into a feed-forward network. I've actually used this very same model in the beginning of the series to create the very first clip; it's called ReCoNet, and this is the clip we created: in the beginning it's temporally inconsistent, and then it becomes temporally consistent and smooth.
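The temporal penalty just described can be sketched as a masked squared difference (the flow warp and the disocclusion mask are assumed given here; computing them is a separate optical-flow problem):

```python
import numpy as np

# Temporal-consistency penalty for video style transfer: penalize the
# current stylized frame for deviating from the flow-warped previous frame,
# but only where the flow is reliable (no disocclusion), via a 0/1 mask.
def temporal_loss(curr_frame, prev_frame_warped, valid_mask):
    # all arrays: (H, W); valid_mask is 1 where flow was stable, else 0
    diff = (curr_frame - prev_frame_warped) ** 2
    n_valid = valid_mask.sum()
    return float((valid_mask * diff).sum() / max(n_valid, 1))

rng = np.random.default_rng(0)
prev_warped = rng.random((8, 8))
mask = np.ones((8, 8))
mask[:, :2] = 0.0                       # pretend the left edge is disoccluded

coherent = prev_warped + 0.01 * rng.standard_normal((8, 8))
flickery = rng.random((8, 8))           # independently stylized frame
# the loss flags the flickery frame as temporally inconsistent
```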
So we saw neural style transfer for static images and for videos, and the thing with this NST field is that people are trying to apply it everywhere. We've got NST for stereoscopic images and videos that can be used in VR; there is the concept of transferring style to 3D models, where the artist just needs to paint the sphere in the top right corner and it gets transferred to the 3D model; there is photorealistic neural style transfer, where you basically only transfer the color information onto the content image; and there is also neural style transfer for audio, a style-aware content loss, etc., etc. I thought it would be really valuable to share how this whole field evolved, how people were trying to connect various dots and build upon each other's work, because this is what research looks like.
The field is nowhere near having all problems sorted out; there are still a lot of challenges out there. One of those is evaluation: there still doesn't exist a numerical method that can help us compare different NST methods and say, hey, this one is better, that one is worse. We are still using side-by-side subjective visual comparisons and different user studies to figure out which method is better. Also, there is no standard benchmark image set, meaning everybody is using their own style images and their own content images, and it's kind of hard to wrap your head around that and compare different methods. Another challenge is representation disentanglement: we saw some efforts, like controlling perceptual factors such as scale, space, and color, but it would be really nice if we had some latent-space representation where we could just tweak a certain dimension and get each of those perceptual factors changing the way we want them to change.
I want to wrap this up with a fun fact. Did you see the image on the right there? It was sold at an auction; could you guess the price tag? It was almost half a million dollars. And the thing is, it wasn't even created by a neural style transfer algorithm; it was created by a GAN, and you can see the equation down there. It's just so interesting that we are living in an age where there is this fascinating dynamic going on between art and tech, and I think it's really a great time to be alive. I really hope you found this video valuable. If you're new to my channel, consider subscribing and sharing, so stay tuned and see you in the next video!