Michał Kudelski (TCL): Inpainting using Deep Learning: from theory to practice
Summary
TLDR: The speaker from TCL Research Europe introduces their AI project focused on image and video inpainting using deep learning, a technique to reconstruct lost or deteriorated parts of visual media. Applications include restoring old photos, scene editing, and even uncensoring animations. The talk covers the use of partial convolutions, challenges in training, and practical issues like batch normalization and high-resolution inpainting. The presentation concludes with sample results and an invitation to learn more about TCL's innovative projects.
Takeaways
- 📍 The speaker is from TCL Research Europe, a new R&D center focusing on AI methods, particularly in computer vision for smart devices like TVs and smartphones.
- 🎨 'Inpainting' is the process of reconstructing lost or deteriorated parts of images or videos, which is the main topic of the presentation.
- 🤖 Deep learning, specifically partial convolutions, is the approach used in the speaker's project for image inpainting, which is more advanced than traditional methods.
- 🔍 The project's practical applications include restoring old photos, automatic scene editing, and even uncensoring images, demonstrating the versatility of inpainting.
- 🛠️ Training data for inpainting models can be obtained from existing databases or by generating random masks to simulate missing parts of images.
- 🌟 The architecture of the inpainting model is based on an encoder-decoder structure, with partial convolutions accounting for missing data in the input image.
- 🔧 The model's loss function is a combination of several elements, including pixel-wise loss, perceptual loss, style loss, and total variation loss, each contributing to the quality of the inpainted output.
- 🚀 Challenges in inpainting include issues with batch normalization due to varying mask sizes and the increased computational demand of high-resolution images.
- 🔍 Solutions to these challenges include training with diversified masks, using instance normalization, or removing normalization layers altogether.
- 🔄 The speaker also discusses the potential of using adversarial losses and a loss function called ID-MRF (implicit diversified Markov random field) to improve the realism and diversity of inpainted images.
- 📈 TCL Research Europe is actively working on advancing inpainting technology, with a focus on practical applications and overcoming technical hurdles for real-world use.
Q & A
What is TCL Research Europe and what is its primary focus?
-TCL Research Europe is a new R&D center established by TCL in Warsaw. It primarily focuses on AI methods, specifically in the area of computer vision, as TCL is a major manufacturer of Smart TVs and smartphones.
What is the concept of 'inpainting' in the context of the presented project?
-Inpainting refers to the process of reconstructing lost or deteriorated parts of images or videos. It involves using an input image with a mask indicating the missing parts, and then reconstructing those parts based on the surrounding context.
Why is the topic of inpainting considered interesting and important?
-Inpainting is considered interesting due to its applications in various fields such as restoring old photos and videos, automatic scene editing, retouching, denoising, and even entertainment like uncensoring Japanese animations. It was also a topic at the prestigious NIPS conference, indicating its significance in the AI community.
What is the role of deep learning in the inpainting project presented?
-Deep learning is used to build an inpainting model that can effectively reconstruct missing image parts. It is based on a recent paper introducing partial convolutions, which is a technique that takes into account the masks indicating missing areas during the convolution process.
What are partial convolutions and how do they differ from traditional convolutions?
-Partial convolutions are a modification of traditional convolutions that account for missing data by multiplying the input patch with a mask before performing the convolution. This means that during the convolution, only the pixels outside of the mask are considered, and the mask is updated after each layer to reflect the reconstructed pixels.
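To make the mechanics concrete, here is a minimal PyTorch sketch of a partial convolution layer. It is an illustrative re-implementation of the idea described above, not the reference code from the paper; the class name, arguments, and the single-channel mask convention are assumptions made for this example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartialConv2d(nn.Module):
    """Convolution that ignores masked (missing) pixels and re-normalizes."""

    def __init__(self, in_ch, out_ch, kernel_size, stride=1, padding=0):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding, bias=False)
        # Fixed all-ones kernel, used only to count valid pixels under each window.
        self.register_buffer("ones", torch.ones(1, 1, kernel_size, kernel_size))
        self.stride, self.padding = stride, padding
        self.window = kernel_size * kernel_size

    def forward(self, x, mask):
        # mask: (B, 1, H, W) with 1 = known pixel, 0 = hole. Zero out holes first.
        out = self.conv(x * mask)
        with torch.no_grad():
            valid = F.conv2d(mask, self.ones, stride=self.stride, padding=self.padding)
        # Re-normalize by the fraction of valid pixels in each window, so activations
        # stay at the same level irrespective of the mask size.
        out = out * (self.window / valid.clamp(min=1.0))
        # Mask update: a window that saw at least one real pixel yields a valid output.
        new_mask = (valid > 0).float()
        return out * new_mask, new_mask

# Usage sketch: one partial-convolution layer on an image with a rectangular hole.
pconv = PartialConv2d(3, 64, kernel_size=7, stride=2, padding=3)
img = torch.randn(1, 3, 256, 256)
mask = torch.ones(1, 1, 256, 256)
mask[:, :, 100:150, 80:200] = 0
features, updated_mask = pconv(img, mask)
```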
What are some practical issues encountered during the inpainting project?
-Some practical issues include problems with batch normalization due to varying mask sizes, difficulties with high-resolution inpainting due to increased computational cost, and challenges with reconstructing detailed textures at higher resolutions.
How can batch normalization issues be addressed in the inpainting model?
-Batch normalization issues can be addressed by using techniques such as freeze training, where batch normalization layers are frozen after initial training, allowing the model to adapt to different mask sizes during fine-tuning. Other methods include using instance normalization or removing batch normalization layers altogether.
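As a rough illustration of the freezing step, the sketch below (PyTorch, assuming the model exposes its encoder as `model.encoder` — a naming assumption, not the speaker's code) puts the encoder's batch-normalization layers into evaluation mode and stops updating their parameters before fine-tuning:

```python
import torch.nn as nn

def freeze_encoder_batchnorm(model: nn.Module) -> None:
    """Freeze BatchNorm layers in the encoder for the fine-tuning phase."""
    for m in model.encoder.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.eval()                      # stop updating running mean / variance
            for p in m.parameters():
                p.requires_grad = False   # freeze the learnable scale and shift

# Note: model.train() switches BatchNorm layers back to training mode, so call this
# again (or override train()) at the start of each fine-tuning epoch.
```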
What are some approaches to handle high-resolution inpainting challenges?
-To handle high-resolution inpainting challenges, one can reduce model size, optimize the model for inference, use quantization techniques, or leverage specialized hardware like DSP processors. Additionally, increasing the receptive fields of the model or using architectures with different receptive field sizes can help improve results.
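One of the simplest of these optimizations, dropping inference from FP32 to FP16 precision, can be sketched as follows. The three-layer network is only a toy stand-in for an inpainting model, and a CUDA GPU with half-precision support is assumed:

```python
import torch
import torch.nn as nn

# Toy stand-in for a fully convolutional inpainting network (RGB + mask = 4 input channels).
model = nn.Sequential(
    nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1),
).eval()

x = torch.randn(1, 4, 1024, 1024)  # high-resolution input: image concatenated with its mask

if torch.cuda.is_available():
    model, x = model.half().cuda(), x.half().cuda()  # halve precision to cut memory and time

with torch.no_grad():
    y = model(x)
```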
What is the significance of mask generation in the inpainting process?
-Mask generation is crucial as it defines the areas of the image that need to be inpainted. Specialized masks can be generated using techniques like semantic segmentation or object detection to focus on specific elements like faces or objects, which can be useful for automatic scene editing or fine-tuning the model for specific applications.
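For the random-mask case, a common recipe is to draw free-form strokes. The sketch below is purely illustrative (the stroke counts, lengths, and thicknesses are arbitrary, not the settings from the paper the speaker cites); specialized masks would instead come from a segmentation, detection, or landmark model.

```python
import numpy as np
import cv2

def random_stroke_mask(h=512, w=512, max_strokes=8):
    """Generate a binary training mask made of random brush strokes (1 = hole)."""
    mask = np.zeros((h, w), np.uint8)
    for _ in range(np.random.randint(1, max_strokes + 1)):
        x, y = np.random.randint(0, w), np.random.randint(0, h)
        for _ in range(np.random.randint(5, 20)):       # random walk of line segments
            angle = np.random.uniform(0, 2 * np.pi)
            length = np.random.randint(10, 60)
            thickness = np.random.randint(5, 25)
            x2 = int(np.clip(x + length * np.cos(angle), 0, w - 1))
            y2 = int(np.clip(y + length * np.sin(angle), 0, h - 1))
            cv2.line(mask, (x, y), (x2, y2), 1, thickness)
            x, y = x2, y2
    return mask
```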
Can you provide an example of how the inpainting model can be applied to facial images?
-The inpainting model can be trained on facial images to reconstruct missing parts of faces realistically. It can also be used for facial retouching, such as smoothing out wrinkles or removing imperfections, resulting in a retouched and more aesthetically pleasing facial image.
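At inference time the "composite image" idea from the talk applies here as well: only the hole is taken from the network and every untouched pixel stays original, which is what keeps retouched faces recognizable. A minimal sketch, with placeholder tensor names:

```python
import torch

def composite(output: torch.Tensor, original: torch.Tensor, hole: torch.Tensor) -> torch.Tensor:
    """Keep original pixels outside the hole; use the model prediction inside it.

    output, original: (B, 3, H, W) images; hole: (B, 1, H, W), 1 inside the masked region.
    """
    return hole * output + (1.0 - hole) * original
```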
Outlines
📚 Introduction to AI Research and Inpainting at TCL
The speaker introduces TCL Research Europe, a new R&D center in Warsaw focusing on AI, specifically computer vision for applications like Smart TVs and smartphones. The main topic, 'inpainting,' is presented as a process to reconstruct missing or deteriorated parts of images or videos using AI methods. The speaker outlines the talk's structure, which includes explaining inpainting, showcasing a deep learning approach, discussing practical issues, and presenting results.
🎨 Deep Learning Approach to Image Inpainting
This paragraph delves into the specifics of using deep learning for inpainting. The speaker discusses the advantages of deep learning over traditional methods, particularly in handling complex tasks like reconstructing faces or objects. The architecture of the model is described, emphasizing the use of partial convolutions that take into account the masks indicating missing areas. The process of training the model, including the importance of mask diversity and the structure of the encoder-decoder model, is explained.
🔍 Advanced Architectures and Loss Functions in Inpainting
The speaker presents various advanced architectures for inpainting, including those from Adobe and recent NIPS conferences. Different approaches like multi-column convolutional neural networks and the use of specialized convolutions to increase receptive fields are discussed. The paragraph also covers the composition of the loss function, which includes pixel-wise loss, perceptual loss, style loss, and total variation loss, highlighting their roles in optimizing the model's performance.
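Written out, the combination amounts to a weighted sum. The weight symbols below are placeholders rather than the paper's values, since the talk stresses that the weights must be re-tuned for one's own data and feature extractor:

```latex
L_{\text{total}} = \lambda_{\text{valid}}\, L_{\text{valid}}
                 + \lambda_{\text{hole}}\, L_{\text{hole}}
                 + \lambda_{\text{perc}}\, L_{\text{perceptual}}
                 + \lambda_{\text{style}}\, L_{\text{style}}
                 + \lambda_{\text{tv}}\, L_{\text{tv}}
```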
🛠️ Practical Challenges in Inpainting
The speaker addresses practical issues encountered during the project, such as problems with batch normalization due to varying mask sizes and the challenges of training at different resolutions. Solutions like freeze training, using diversified masks, and considering alternative normalization techniques are suggested. The discussion also touches on the removal of batch normalization layers to avoid artifacts and color coherence issues.
🖼️ High-Resolution Inpainting and Its Challenges
This paragraph focuses on the challenges of high-resolution inpainting, including increased computational demands and memory consumption. Strategies to address these issues, such as model optimization, quantization, and leveraging mobile device hardware like DSP units, are presented. The speaker also discusses the 'big mask problem,' where high-resolution images require reconstructing many more pixels, and suggests increasing receptive fields and using multi-stream models as potential solutions.
🔧 Enhancing Inpainting with Advanced Techniques
The speaker discusses methods to improve inpainting results, particularly at higher resolutions. Techniques such as training on high-resolution images, using adversarial loss to avoid artifacts, and combining inpainting with super-resolution are suggested. The importance of detailed textures in high-resolution inpainting is highlighted, and the potential for post-processing to blend original and reconstructed patches for realism is explored.
🎭 Applications and Future of Inpainting Technology
The final paragraph showcases sample results of the inpainting model, demonstrating its effectiveness in removing unwanted objects and reconstructing faces realistically. The speaker emphasizes the potential applications of inpainting in automatic scene editing and face retouching. The paragraph concludes with a summary of the importance of inpainting, the journey from research to practical application, and an invitation for interested individuals to engage with TCL Research Europe.
Keywords
💡Inpainting
💡Deep Learning
💡Computer Vision
💡Partial Convolution
💡Mask
💡Loss Function
💡Normalization
💡High-Resolution
💡Generative Adversarial Networks (GANs)
💡Receptive Field
💡Mask Generation
Highlights
Introduction of TCL Research Europe, a new R&D center focusing on AI methods, specifically in computer vision.
Inpainting is the process of reconstructing lost or deteriorated parts of images or videos.
Inpainting can be used for restoring old photos, automatic scene editing, and denoising.
Deep learning is used in inpainting to capture high-level semantics of images, unlike traditional methods.
The importance of training data and the use of partial convolutions in the inpainting model.
Different architectures for inpainting, including encoder-decoder models and multi-column convolutional neural networks.
Loss functions used for optimizing the inpainting model, including pixel loss, perceptual loss, and total variation loss.
The use of adversarial loss and generative adversarial networks for refining inpainting results.
Practical issues encountered during the inpainting project, such as artifacts caused by batch normalization.
Approaches to address high-resolution inpainting challenges, including model optimization and the use of DSP processors.
The problem of detailed textures in high-resolution inpainting and methods to improve realism.
The generation of specialized masks for inpainting using techniques like semantic segmentation and object detection.
Sample results showcasing the effectiveness of the inpainting model in removing objects and reconstructing faces.
The potential of inpainting for entertainment, such as uncensoring images and revealing details in animations.
The need for further improvement in inpainting models to address artifacts and enhance detail reconstruction.
Summary emphasizing the value of inpainting, the journey from research to practical application, and the ongoing projects at TCL Research Europe.
Transcripts
hello everyone as we have heard I'm from
TCL research Europe which is a new R&D
center we started in August here in
Warsaw and we are mostly focusing on the
AI actually only on the AI methods and
mostly in the area of computer vision
because TCL is a is a big manufacturer
Chinese manufacturer of Smart TVs and
smartphones as well and I would like to
present you one of our one of our
projects namely inpainting and the plan
is simple so first I will tell you in
simple words what inpainting is and I
will try to show you why it is
interesting then I will I will show you
one sample approach based on deep
learning deep learning this is the one
that we are building on in our project I
will also mention about some other
approaches and modifications possible
modifications then I will also say a few
words about some practical issues that
we encountered during during the project
and then I will show some sample
results and summarize at the end so let
me start with what inpainting actually is so the answer is quite simple it is the process of reconstructing lost or deteriorated parts of images or videos
so like in this example we have an input
image then we put some masks on it and
we are trying to reconstruct the missing
parts of the image basing on the
neighborhood so that's more or less the
topic and why it's interesting so the
first answer is it was at the recent NIPS conference NIPS is the top AI conference so that is an answer in itself if it's at NIPS then okay it has to be interesting but believe me or not there are some other reasons there are other useful applications so for example
it can be used to restore old old photos
or videos like here we have some defects
on on a photo we can put a mask on it
and then try to restore what the photo should look like without the defect another application an obvious one is automatic scene editing
automatic scene editing
and retouching so for example we have
some photos with with objects we want to
remove the objects so we put put the
mask on objects and then we have a clear
photo without without the objects on it
there are also some other applications
like inpainting can be used for denoising as well so as a kind of a side effect the inpainting results tend to
be smooth even if we even if we put
noisy input and there are works
working on it and trying to figure out what the mask should look like to achieve good denoising results but we will not focus on that also it can be used for compression so here are
some interesting interesting results so
from only 5% of the pixels of course if we choose those pixels in a smart way we are able to reconstruct the whole image so yes it can clearly be used for compression here also I was
considering to remove it but we have a
weekend after all so let's also talk
about entertainment so there was a recent model published which does uncensoring of images in particular we can do uncensoring of Japanese animations like here and reveal some interesting details out of censored images so ok now I hope you
are convinced that this is an important
problem so let me let me start with
describing how we can solve it with deep learning our baseline approach is based on a quite recent paper introducing partial convolutions so I will tell about these partial convolutions and describe the whole pipeline of training an inpainting model but let me start with the answer to the question why deep learning so there exist many classical methods to do inpainting for example exemplar-based inpainting or patch-based methods
and there are also commercial solutions
like adobe of course it's working on it
they work pretty well but also they have
they have some problems so first of all
it's hard to be accepted to NIPS if you don't do deep learning so this is one reason to do this with deep learning if you want to go to NIPS but again there are other reasons
as well so traditional traditional
methods they usually work well for
specific tasks like for example
background inpainting when you can just
simply repeat some patches from the
neighborhood to reconstruct the missing
part and they have problems with let's
say hallucinating the the missing
content if we are talking about
challenging tasks like complex objects
or faces for example and deep learning
in contrast does quite well because it
also captures some high level semantics
of images and for example here you can
see this is output from our model where
we we are able to reconstruct face
realistically so if we if you use
traditional methods then probably this
would not look like a face anymore okay
so how to how to do this step by step
first of all we need training data it's
quite this is a good news it's quite
simple to get the data because you can
use any photos actually so we can use
any existing databases like ImageNet, Places and so on or any kind of photos
the the the simplest option is to simply
generate some random masks like this one
and try to learn to restore the missing
parts of the of the of the images one
important thing here to mention is that
masks do matter so for example in the
original paper that I mentioned they proposed a way to create diversified masks because they need to be as diversified as possible during training so they have different shapes they cover different areas of the image and so on and also it is
also well worth considering to use some
specialized masks like masks put on some
face landmarks or on objects I will
mention it later as well so when we have masks we have the training data we have images and what we need is a model so
this is one architecture that we are building on it's quite popular it's based on U-Net which is an encoder-decoder based architecture used for example for image segmentation with many successes the difference here is that instead of using normal convolutions we are using so-called partial convolutions that take into account also
masks I will talk about it later in the
next slide so it more or less looks like
that in the encoder part we get an image then we use a strided convolution so the image during the convolution is downscaled then we have batch normalization and we have let's say another layer here another layer here and again strided convolutions so going through the encoder we are decreasing the resolution of the image and adding more feature maps to it and then in the decoder phase we do the upscaling here we don't use any kind of transposed convolution or deconvolution we just use a simple upscaling based on the nearest neighbor approach and then we do the partial convolution again we also have
the skip connections which can be quite
important in the case of inpainting because in particular in the last layer our model can produce the output based on the whole processed and reconstructed image here and also it can take and map the original pixels from the original image in the area which is outside of the mask so that's more or
less how the actual architecture looks
like let me tell you a few words about
this partial convolution because it's
quite quite simple idea so actually it's
like it's it's a simple convolution but
before doing the convolution we are
multiplying our input patch of the image with the mask so everywhere where the mask is we are setting the pixels to zeros and then we are doing the convolution so we are only considering the pixels outside of the mask and then we are doing the normalization because convolution is based on sums so if we are removing some elements then we need a normalization component here which just puts our activations back to the same level irrespective of the mask size so that's the difference from a normal convolution and also one
important thing is that the mask is also
updated so after each layer so after
each layer we are updating the mask if in one layer we reconstructed some pixels so when from the point of view of a given pixel our receptive field was covering some real pixels in the previous layer not the mask then we are able to calculate some activation and then we are updating this removing the mask from this pixel so we are treating the reconstructed information as normal information in the subsequent layers so our mask is shrinking from layer to layer and then it usually disappears in the encoder part okay so
that's that's how how this partial
convolution works okay I also want to mention some other architectures here from different papers
there are two recent approaches one from
Adobe and one from the recent NIPS
conference so you can see some
modifications here like this
architecture consists of two two parts
first there's the encoder and decoder
part which performs some coarse
reconstruction based only on only on
let's say per pixel reconstruction error
and then there is another part of the
network which performs refinement using
some adversarial loss and the generative adversarial network framework so this is one possible
extension here another one this is
called a multi-column convolutional neural network and we have
different several different streams and
they operate with different filter sizes
so they have different receptive fields
they take the same input and then then
at some point they they are combined
with each other and then the decoding is
common for all the streams and
also there are some other layers used like for example the dilated convolution here which is a modification of convolution with increased receptive fields without going into details because as we will see later receptive fields are crucial in the problem of inpainting so okay that was about the architecture so what else do we need of course we need a loss function to optimize our model parameters the loss function in the original paper is composed of many elements
the first one is based on a simple per pixel loss so a per pixel reconstruction error and here we are considering two elements one is calculated inside the mask and another one outside of the mask so these are two per pixel loss components we also have something like
perceptual loss which looks at two
images like in this part the ground truth image and the output image but not in a pixel space but in a higher level feature space so we are extracting some features from a pre-trained model like the VGG-16 model for example and here we are calculating this part comparing these features for the two outputs taking the L1 norm and summing over these three layers here in this case and also we do this not only for our output image but also for the so-called composite image which is composed of the reconstructed masked area and original pixels put around so like here in the whole formulation we put more attention to the inside-of-mask reconstruction error and also
there is a similar style loss which is
similar to the perceptual loss but before taking the L1 norm we are performing an autocorrelation using a Gram matrix and then after the autocorrelation we do more or less the same with some normalization factor depending on the size of our feature map taken from VGG the number of channels and the width of our feature map and these two components perceptual loss and style loss are used also in other problems like style transfer for example and they are more in line with human perception than the simple reconstruction error from the
previous component and the last
component here is the total variation loss which is also quite popular in other applications it is a kind of penalty for non smooth output so we are calculating it in the area P which is our mask slightly enlarged by a dilation operation and here we want the output to be smooth inside the mask and on the boundary between the mask and the original image so the total loss looks something like this so this is a simple weighted sum of all of these components these are
weights taken directly from the paper but you have to keep in mind that they depend on many factors so for example of course on your data and on the model that you use to compute the perceptual and style loss so you have to actually tune the weights to your particular problem and just monitor the contribution of each loss component during the training of course there are some other loss components possible which we are working on right now and trying to add to our pipeline so as I mentioned already the adversarial loss
can be helpful here like in this
pipeline in this pipeline we are training two discriminators one local looking at the mask and one global looking at the whole picture and they are trained to distinguish between original images and images generated by our inpainting model and they are trained together with the generator in a standard generative adversarial network setup so this loss can be quite
helpful and another kind of loss is the ID-MRF loss introduced in a recent NIPS paper which is the implicit diversified Markov random field loss so the name is quite impressive but it's quite interesting as well so without going into details the idea is
that our reconstructed patches should be
similar to the nearest neighbors of these patches in the original image so we are taking a patch
we are looking for some nearest
neighbour in a feature space so in a
higher-level feature space and then we
want our reconstructed output like grass in this case to look like the real grass around and also the loss is constructed in a way that this is the diversified part of the name so
we don't want one patch from the from
the original image to be repeated many
times we want to look for different
patches around all similar but all
different and we want our output to be realistic and also diversified not a simple repeating pattern so we are
also adding this to our model right now
okay so that's more or less the whole pipeline so then having these components data model and loss we train with standard SGD-style algorithms like Adam for example the problem is that with this architecture training time is quite long so on the whole ImageNet data for example it takes a week to train a reasonable model on a single GPU machine so let me come right now to
some practical issues I would like to
share with you here so the first one is
with batch normalization in general masks cause some problems with batch normalization because various mask sizes in general affect activation distributions and you can observe it as several problems so for example you can observe I'm not sure if you are able to see it but this kind of artifacts so in the place of masks you see some non-smoothness and some kind of artifacts here so this is an example of a batch normalization related artifact and
our problem is that actually our model
treats the boundaries of the image also
as masks and there is a problem if you
train the model in a lower resolution
like 500 by 500 pixels for example and
then as it is a fully convolutional
model you can apply it to a high
resolution but then when the model is
processing the inner part of the image the middle part then it gets somewhat different activations because it is
used to seeing a boundary around and
here there is no boundary there is still
image so the activations slightly differ
and in this extreme case when we do the
reconstruction without a mask with a
empty mask we see that here are some
problems with with normalization also
are visible so what we can do about it
first of all and this was proposed in the original paper that I mentioned we can use two-phase training so first we do the training with batch normalization and then we freeze the batch normalization layers the learnable trainable parameters of the normalization in the encoder part and then we do the fine tuning with batch normalization frozen so the model can just adapt itself to the different activations coming from different masks that's one technique then we observe that also using diversified masks and mask sizes including also empty masks can help
with this then of course you can replace
standard batch normalization with some other normalizations like instance normalization for example which does normalization not on batches but on single images this could help but we haven't tried it yet but there are some papers showing that maybe this could be a good direction and also it can make sense to remove batch normalization layers altogether because of all of these problems and also
some other problems with with color
coherence mentioned in many many papers
some recent papers remove batch normalization completely and it can make sense also because usually with these kinds of models we are training on small batches so because the model size is huge we are training on a single GPU we can train using a batch size of 4 for example where the benefits of batch normalization are not that visible so it also works without batch normalization actually quite well now I would like to tell you about
several issues which are related to high
resolution inpainting now what is high resolution inpainting most of the papers claim that they actually do high resolution so they call 512 by 512 pixels a high resolution because the first works on inpainting were on much smaller images like 64 by 64 pixels for example but if you are a smartphone or smart TV manufacturer like TCL for you high resolution is at least this one and some problems appear because moving from this resolution to this resolution even though we are only changing it by a factor of two in a single dimension then we have 4 times longer prediction time
because it is proportional to the number
of pixels and also the memory
consumption is bigger so the first problem is with CPU and memory especially on mobile devices and what can we do about it
of course we can reduce the model size
and train smaller models we can optimize
the model for inference for example we
can use quantization techniques and move from higher precision to lower precision in our calculations during prediction we can also optimize the critical parts of the inference code and in our R&D centre we also have a group working on it so optimizing the convolutions and so on for mobile devices and we are also extensively trying to verify the possibility of launching our models on mobile devices using their GPU and DSP digital signal processing units
so for example Qualcomm claims that you
can get up to eight times speed-up using a DSP processor so we are trying this but you have to know that it's not that simple actually to use this DSP even if we are the phone manufacturer we need to use some developer boards it's not that simple to just run it on a normal phone and test it yes and whenever possible probably you should do the inpainting in lower resolution so you should play with some crops and rescaling techniques and maybe super-resolution in your pipeline just to avoid high resolution because it's just expensive and not only is it expensive but it's also difficult
so another problem related to to higher
resolution is the big mask problem we call it the big mask problem because when we have the same picture the same image like here and we try to remove this mountain there is a mountain here actually in this lower resolution it's much simpler and it works better than in the case of a higher resolution because here we need to reconstruct many more pixels and it becomes really difficult for a model
so how can we help this basically we
need to increase the receptive fields of
our models we can achieve this by
increasing the size of the convolutional
filters or increasing the number of
layers but again this is expensive and
or we can use some other kinds of modifications of convolutional layers like as I mentioned the dilated convolution or we can play with architectures I also mentioned this so we can use an initial coarse part of the network and then a refining network or this multi-stream model with different sizes of receptive fields from
small to bigger ones and the last
problem related to high resolution I'm
talking about this high resolution
because it's really important from the
practical point of view if you want to
apply it within your product is the detailed textures issue so what looks nice in a lower resolution as I mentioned most models most publications show results in this resolution then becomes unacceptable if you move to the high resolution so
like here we are reconstructing
reconstructing this part this part of
the image and we clearly see the
difference between the reconstructed
level of details and the texture around
so somehow we need to address it address
it as well so first of all we should
train on higher resolution images at
least on crops of high resolution images
of course and then right now we are
playing as I mention a lot with
different with different loss functions
for example this adversarial loss is
quite promising here because you know
the discriminator trying to distinguish
between real and generated photos it
should learn somehow to detect this
these artifacts these patterns here
inside and then our generator in
generative adversarial training should
learn to fool the discriminator so it
should avoid this kind of patterns so we
believe that this kind of loss can help also this ID-MRF-like loss seems to be a good idea to improve things here and also as I mentioned we can combine inpainting with some other techniques like super resolution for example so we can use either super resolution as a post-processing step or we can build super resolution into our model specialized for the inpainting that's one idea and
after all if if nothing helps then we
can do some post-processing and it is
also post-processing similar to
traditional techniques so then after finishing inpainting we can somehow analyze our patches and search for some similar patches around and maybe try to blend the original high resolution patches with our reconstructed patches to make it more
realistic
okay the last issue I want to mention is the issue of mask generation so in fact you may need some special kind of masks and you can use many techniques like semantic segmentation object matting salient object detection or facial landmark detection to generate some kind of specialized masks on objects or on particular elements of faces and what can you use it for of course for automatic scene editing it would be a
nice feature if you don't need to draw a
mask you just point an object and it
disappears from your photo so it's quite
quite obvious and also during training
you can use these smart masks to let's say make your training more in line with the business application so if you want to remove objects with your model in your business application then you can use these masks at least to fine-tune your model and similarly in the case of faces if you want to do the inpainting and face retouching removing some defects or wrinkles on faces then probably you don't need to train your model to reconstruct eyes and nose because that's much more difficult and maybe people don't want their eyes reconstructed because then they don't look that similar to them so smart
masks can be also helpful okay let me
come to some some examples some sample
results of our inpainting model so here
are two examples we have a nice scene
here on the Left we want to remove some
people and some buildings from that
scene because we don't like them and
this is the the output generated by our
model it looks pretty nice
just remember it's in low resolution so it's 512 by 512 here you have the Lewandowski family you also have Klara here and you can just remove her from the photo if you don't like it if you prefer Lewandowski without Klara for example and this is actually a nice example showing the benefit of the deep learning approach because if you do the same with a classical approach some strange things happen because usually the classical approach tries to get some patches from the neighborhood and what you can see I don't have an example here but you can see a third leg of Lewandowski for example in this place so that also shows how it works
and as I'm not planning to sell it to
you right now so I also show some more
difficult and not that beautiful results
so we are still working on improving this like here we are removing the lamp and something and the result differs from the neighborhood details around as I mentioned and
here we are removing a big object a
table in a quite complex scene and we
get something like this so when you look
your first look may may say okay it's
quite okay there is a floor and and so
on but when you look closer you will
you'll see some strange artifacts and
also this chair here is not
reconstructed perfectly so still there
is a there is a big big place for
improvement here and okay some face
examples face inpainting examples actually it really works well so we trained the face model on celebrity photos and as you can see the reconstruction is really nice so we can reconstruct complex semantic parts of faces like the nose and eyes in a realistic way so this is the original and this is the inpainted one just looking at this image and also we can use
this model to to do the face retouching
like in this case we are smoothing the
area under the eyes and removing some
wrinkles and we get a smooth celebrity
face out of your face so that's that's
the idea okay let me summarize quickly
so inpainting is a cool and useful topic and it can be solved with deep learning as I showed and it's worth remembering that there's always a long way from the initial results in a paper to production if you want to actually make a function for a smartphone gallery for example and also I would like you to remember that we are doing some pretty cool projects in TCL Research Europe so if you are interested don't hesitate to visit our webpage or contact me directly or we have a stand one floor up from here where you can talk to us during the breaks as well
ok thank you very much
[Applause]