SDXL Local LORA Training Guide: Unlimited AI Images of Yourself
Summary
TLDR: This video tutorial walks through training your own LoRA (Low-Rank Adaptation) model to generate personalized images with Stable Diffusion XL. It covers installing the required software, preparing image datasets, setting up training parameters, and evaluating the resulting LoRA files. The tutorial emphasizes carefully curating training images, adjusting hyperparameters, and using techniques like BLIP captioning to improve model performance. With this knowledge, viewers can create customized AI image models tailored to their specific needs, whether for personal or professional use.
Takeaways
- 🤖 The video provides a tutorial on training a LoRA (Low-Rank Adaptation) model for Stable Diffusion XL, which allows generating personalized images using a smaller fine-tuned model file.
- 📂 To train a LoRA model, you need a dataset of high-quality images showcasing various angles, lighting conditions, and expressions of the subject you want to generate.
- 🔧 The tutorial walks through installing and configuring the Kohya SS software, which provides a user interface for training LoRA models.
- 📝 The instance prompt used for training should be a celebrity or object that has existing data in Stable Diffusion XL, rather than a random string.
- 🖼️ Regularization images, which are varied and high-resolution images of the class you're training (e.g., men or women), are essential to prevent model overfitting.
- ⚙️ Key training parameters include learning rate, network rank (which affects detail and file size), and the choice of optimizer, such as Adafactor.
- 🕰️ Training a LoRA model can take several hours, depending on the number of source images and the available GPU resources.
- 🔍 Once trained, the LoRA files can be used with Stable Diffusion XL to generate personalized images by including the appropriate prompt trigger.
- 📊 The tutorial demonstrates using the XYZ Plot feature to visualize and compare the outputs of different LoRA files, revealing a trade-off between flexibility and precision.
- 💬 The video encourages viewers to experiment with their own models and provide feedback or ask questions in the comments section.
Q & A
What is the core topic of the video script?
-The core topic of the video script is a tutorial on how to train a LoRA (Low-Rank Adaptation) model to instruct Stable Diffusion, a generative AI model, to generate images of specific people or objects.
What software is used for training the LoRA model?
-The video script mentions using software called 'Kohya SS' as the user interface to set up the parameters and train the LoRA model.
What is the importance of the 'instance prompt' when training a LoRA model?
-The 'instance prompt' is crucial as it provides guidance to the model on what to create. Using a celebrity or object with many existing images in Stable Diffusion XL as the instance prompt can lead to better results than using a random string of characters.
How many images are typically required for training a decent LoRA model?
-According to the script, a decent LoRA model can be trained with as few as 10 images, but the author tends to use anywhere from 10 to 20 images for typical training.
What is the purpose of 'regularization images' in the training process?
-Regularization images help prevent model overfitting. Hundreds of varied, high-resolution images representing the class of images being trained (e.g., men or women) should be used for regularization.
What is the purpose of 'blip captioning' in the training process?
-BLIP captioning uses AI to scan the training images, generate text captions describing them, and write a text file of associated keywords next to each image. This helps Stable Diffusion understand the context and keywords related to the training data.
How does the 'network rank' parameter affect the trained LoRA model?
-A higher network rank value increases the detail retained in the trained LoRA model, resulting in better color, lighting, and overall detail. However, it also increases the size of the generated LoRA files.
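A rough way to reason about these two knobs is sketched below. It assumes the common LoRA convention that the learned update is scaled by alpha divided by rank, and it assumes file size grows linearly with rank, using the transcript's figure of about 1.7 GB at rank 256 as the reference point. Both function names are illustrative and not part of Kohya SS.

```python
def lora_scale(network_alpha: float, network_rank: int) -> float:
    """Scaling factor applied to the LoRA update (assumed: alpha / rank)."""
    return network_alpha / network_rank


def estimate_file_size_gb(network_rank: int,
                          reference_rank: int = 256,
                          reference_size_gb: float = 1.7) -> float:
    """Rough size estimate, assuming file size grows linearly with rank."""
    return reference_size_gb * network_rank / reference_rank


# The two rank/alpha pairs mentioned in the transcript.
for rank, alpha in [(256, 1), (32, 16)]:
    print(f"rank={rank:>3} alpha={alpha:>2} "
          f"scale={lora_scale(alpha, rank):.4f} "
          f"approx size={estimate_file_size_gb(rank):.2f} GB")
```

The size estimate is only a back-of-the-envelope figure; actual file sizes depend on which layers are trained and the save precision.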
How can the trained LoRA model be used with Stable Diffusion image generation software?
-To use the trained LoRA model in Stable Diffusion image generation software like Automatic1111, the user should select the Stable Diffusion XL base 1.0 model, add the trained LoRA file(s) to the prompt, and include the specific keyword trigger set during training.
What is the purpose of the 'XYZ plot' functionality mentioned in the script?
-The 'XYZ plot' functionality allows the user to generate a series of images side by side, each using a different trained LoRA file. This helps compare the results and find the best balance between flexibility and precision for the desired application.
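The search-and-replace behavior behind this comparison can be sketched in a few lines. This is a simplified imitation of the Prompt S/R axis, not the image generator's own code, and the file stems (`my-lora-000001` and so on) are hypothetical placeholders for whatever names your training run produced.

```python
def lora_tag(file_stem: str, weight: float = 1.0) -> str:
    """Build a prompt tag in the <lora:name:weight> form."""
    return f"<lora:{file_stem}:{weight:g}>"


def prompt_sr_variants(prompt: str, x_values: list[str]) -> list[str]:
    """Mimic Prompt S/R: the first value is the search term, and each
    value replaces it in turn, giving one prompt per grid column."""
    search = x_values[0]
    if search not in prompt:
        raise ValueError("the first X value must appear in the prompt")
    return [prompt.replace(search, value) for value in x_values]


# Hypothetical file stems for ten saved epochs.
stems = [f"my-lora-{epoch:06d}" for epoch in range(1, 11)]
tags = [lora_tag(stem) for stem in stems]

# Comma-separated text to paste into the "X values" field.
x_values_field = ", ".join(tags)

# The prompt keeps only the first LoRA tag; the others are swapped in.
prompt = f"close portrait of tom cruise man {tags[0]}"
variants = prompt_sr_variants(prompt, tags)
for variant in variants:
    print(variant)
```

Keeping the seed fixed while only the LoRA tag changes is what makes the resulting grid a fair side-by-side comparison.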
What is the typical tradeoff between flexibility and precision when selecting a trained LoRA file?
-According to the script, the LoRA files trained earlier in the process (lower numbers) tend to provide more flexibility, allowing for more artistic freedom and variation from the training data. In contrast, the later LoRA files (higher numbers) produce results with higher precision and closer resemblance to the original training images.
Outlines
🤖 Setting Up Kohya SS for Stable Diffusion XL Training
This paragraph provides step-by-step instructions for installing and setting up Kohya SS, a user interface for training Stable Diffusion XL models. It covers cloning the Kohya SS repository, running the setup, configuring the environment, and launching the GUI application.
🖼️ Preparing Images and Captions for Model Training
The paragraph explains how to source and prepare images for training a custom Stable Diffusion XL model. It discusses the importance of having a diverse set of images with varying lighting, facial expressions, and backgrounds. It also covers the use of BLIP captioning to generate text descriptions of the images, which aids in the training process.
⚙️ Configuring Training Parameters for Stable Diffusion XL
This paragraph covers the training parameters for Stable Diffusion XL within the Kohya SS interface, including batch size, epochs, learning rates, optimizer, resolution, and network rank. It provides guidance on selecting appropriate values for these parameters based on the available hardware and desired model performance.
🔍 Evaluating and Selecting the Best Trained Model
The final paragraph demonstrates how to evaluate and compare the trained LoRA files using the XYZ plot feature. It explains how to generate a grid of images, one per LoRA file, and analyze the trade-off between flexibility and precision. The paragraph suggests strategies for selecting the most suitable LoRA file based on the desired balance of these factors.
Keywords
💡Stable Diffusion XL
💡LoRA (Low-Rank Adaptation)
💡Kohya SS
💡Training Data
💡Prompts
💡VRAM
💡Network Rank
💡BLIP Captioning
💡Seed
💡XYZ Plot
Highlights
Stability AI released Stable Diffusion XL, a generative AI model that can generate stunning images of just about anything.
Demonstrates how to train a LoRA (Low-Rank Adaptation) to instruct Stable Diffusion on how an object, person, or anything else should look.
Explains how to install and set up Kohya SS, a user interface for training and setting up parameters for LoRA models.
Highlights the importance of sourcing a variety of high-resolution images with different lighting, facial expressions, and backgrounds for training.
Recommends using a celebrity or well-known person as the instance prompt to guide the model with existing data in Stable Diffusion XL.
Explains the use of regularization images to prevent model overfitting and the need for varied, high-resolution images of the class being trained.
Demonstrates how to perform BLIP captioning to add context and keywords to the training images.
Provides guidance on setting various training parameters, such as batch size, epochs, learning rate, and network rank.
Explains the trade-off between model flexibility and precision based on the epoch or LoRA file used.
Shows how to load and use the trained LoRA files in Automatic1111 or other Stable Diffusion image generators.
Demonstrates how to generate images using the trained LoRA files and compare the results side by side using the XYZ plot script.
Offers insights on finding the right balance between model flexibility and precision based on the LoRA file chosen.
Encourages viewers to try training their own models and share their experiences or questions in the comments.
Provides a step-by-step guide on installing dependencies, setting up the environment, and configuring training parameters for LoRA model training.
Emphasizes the importance of having a gaming PC with a capable GPU to train LoRA models efficiently.
Transcripts
Stability AI released Stable Diffusion XL, a generative AI model that can generate stunning images of just about anything. Today I'm going to show you how to train a LoRA, or Low-Rank Adaptation. It's a small file that can be trained to instruct Stable Diffusion on how an object, person, or really anything should look. You can find hundreds of pre-trained LoRAs on Civitai for everything from animals to people and even not-safe-for-work content. But what if you want to train your own LoRA to create images of yourself, or really anyone else for that matter? If you have a gaming PC, you can probably train your own model to produce high-quality, stunning images just like these. To get started, we're going to install a piece of software called Kohya SS. Let's get into it.

Kohya SS provides a user interface for you to train and set up the parameters for your own models. To get started, if you have a Windows machine, you're going to need Python, Git, and Visual Studio installed. If you're already running Stable Diffusion or any other generative AI tools on your system, you probably have these installed already. If not, check out one of my other tutorial videos that step you through the process.

The first step is going to your command prompt; typing CMD will fire that off. Then go to the directory where you want to install Kohya. Make sure you've got plenty of drive space here; it is going to be pretty intensive. From there, we're just going to copy the git clone command from the Kohya install directions. That's going to clone the repo to a directory called kohya_ss. Once that's done, change to the kohya_ss directory and run the setup.bat file. Since we're performing a new installation, we're going to select option one. This part's going to take just a few minutes. It's installing a whole bunch of files and dependencies, so just sit back and relax.

Once that's done, it's going to ask you which compute environment you're running: this machine or Amazon AWS. Select this machine. If you have a multi-CPU or a multi-GPU system, you can select one of those options; otherwise, no distributed training. Do you want to run your training on CPU only? Absolutely not; it'd be terribly slow, especially if you have a good GPU. Do you wish to optimize your script with torch dynamo? No. Do you want to use DeepSpeed? No.
Which GPUs, by ID, should be used for training on this machine? Select all, which is the default. Now it's going to ask you if you want to run fp16 or bf16, and this is going to depend on your GPU. If you have an RTX 30 or 40 series GPU, you're going to select bf16. If you have an older GPU in your system, you're going to select fp16.

At this point the installation is done, so you can either go to your kohya_ss directory and double-click on the gui.bat file, or select option five if you still have your command prompt open. As you can see, that's going to start an entirely new command prompt. It's going to load everything you need in order to start the GUI, as you can see here on the right-hand side of the screen. Go and close that old command prompt at this time; we're done with that.

At this point it's time to source some images to train your model with. The important thing here is you want a lot of different variations of lighting, facial expression, and backgrounds. This is going to make the model more flexible in the end. For example, if you wanted to train a model for Margot Robbie, you might go to Google Images and perform a Google image search. It's important to get really high-resolution images, so I tend to go to Tools and then filter by size for Large.
You also don't want images that have multiple people in them; just something else to keep in mind. In my case, I just broke out my phone and took a whole bunch of selfies around the house and outside, with a bunch of different facial expressions and lighting environments, to get a good mix of pictures. As far as the number of images is concerned, you can really train a decent model with as few as 10 images. I tend to get anywhere from 10 to 20 for my typical training.

Normally at this point, if you've trained a Stable Diffusion checkpoint model before, you know that you'd do image cropping, which sets all the images to a fixed size. With Stable Diffusion XL training that's actually unnecessary, and in fact you're going to get better results if you don't do that.

Now we'll go back to the UI for Kohya. We're going to open the LoRA tab. If you happen to be one of my Patreon subscribers, I've got this JSON file that has all the configurations that you need in order to start training your model. It's called SDXL Kohya SS LoRA config, and it's set up for an RTX 3090; that's the GPU I've got running in this machine. At this point you could just get going; you wouldn't have any additional configuration to do. But of course I'm going to step you through everything, so even if you aren't one of my Patreon subscribers you can still do this start to finish.

Under LoRA you're going to see this Tools section. Click on that, and then click on DreamBooth/LoRA Folder Preparation. This used to be under a tab called Deprecated, but newer versions have properly moved this under the Tools section.

The first thing we're going to fill out here is the instance prompt. This is super important, and most people get this wrong. Most tutorials out there are going to tell you to use a random string of characters or something super unique in order to train your model, but what this really does is give you less flexibility and worse results. In fact, even if you're training a model of yourself, you want to use a celebrity, or someone else, or some other object that has a lot of images already in Stable Diffusion XL, so it knows what to create. It has sort of guidance parameters, if you will. So what I like to do is go to a celebrity lookalike site and just drag and drop one of your images there. This is going to tell you which celebrity you somewhat resemble, or that your character somewhat resembles. That's a really good starting place to train your model. In my case, I'm actually going to type Tom Cruise.

For the class prompt, I'm going to type man, because that's what I'm training. If you were training a cat or a dog or a woman, you'd set that as the class prompt. For training images, this is where you're going to set the directory where you saved all of the images that you collected for your character earlier.
Now on to regularization images. These help prevent model overfitting, and you're going to want hundreds of images here that represent the class of images you're trying to train; in our case, men. These need to be varied and they need to be very high resolution. In fact, I've already got these uploaded to my Patreon for both men and women. If you're not one of my Patreon subscribers, that's okay; you can find these datasets online, or you could even create one yourself.

For repeats, I always go with 20. This is the number of times that each image is going to be trained in the model. The final thing you want to do is set the destination training directory. This is where all of your output data, including the LoRA files created by the training, is going to end up. Now we just click the button to copy info to the Folders tab.
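The folder preparation step can be pictured with the sketch below. It assumes the layout commonly produced by the Kohya SS folder preparation tool, where the image folder name encodes the repeat count, instance prompt, and class prompt; the function name `prepare_folders` and the regularization repeat count of 1 are illustrative assumptions, so check the directories your own installation creates.

```python
from pathlib import Path
import tempfile


def prepare_folders(dest: Path, instance_prompt: str,
                    class_prompt: str, repeats: int) -> dict[str, Path]:
    """Create an assumed Kohya-style training layout under `dest`."""
    layout = {
        # Training images: "<repeats>_<instance prompt> <class prompt>"
        "img": dest / "img" / f"{repeats}_{instance_prompt} {class_prompt}",
        # Regularization images: "1_<class prompt>" (repeat count assumed)
        "reg": dest / "reg" / f"1_{class_prompt}",
        # Trained LoRA files and training logs
        "model": dest / "model",
        "log": dest / "log",
    }
    for folder in layout.values():
        folder.mkdir(parents=True, exist_ok=True)
    return layout


if __name__ == "__main__":
    # Demo in a throwaway directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        created = prepare_folders(Path(tmp), "tom cruise", "man", 20)
        for name, path in created.items():
            print(name, "->", path.relative_to(tmp))
```

With the values used in the video, the training images would end up in a folder named `20_tom cruise man`.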
Now when you go back to Training and Folders, everything's already pre-filled. The only thing you want to make sure that you update is the model output name. In my case, I'm going to use my name, which is Brian Lovett, and I'm going to add a hyphen and Tom Cruise. This way, every single one of the files for this LoRA is going to have my name, so I know what subject I was training, and then it's going to have Tom Cruise, which is what I know I need to use for my prompt.

Now we need to head over to the Utilities tab. We're going to have to do some captioning; specifically, we're going to click on BLIP Captioning. BLIP captioning just uses artificial intelligence to scan the images, look them over, and then create a text file that has all the keywords that are associated with how the images look. It's how you get Stable Diffusion to understand the context and the words and keywords that are associated with each of the images that you're using as your training data.

Go and select your source images directory. Make sure the file extension is .txt, and for the prefix to add to the BLIP caption, we're going to go ahead and use the celebrity name that we had earlier; in this case, Tom Cruise. Go and click on Caption Images, and then if you load up that command prompt, you're going to see that it's going through and starting to caption each individual image. Once it's done, go back to your training image directory and you're going to see that you have the image and then this text caption file next to it.
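Before training, it can be worth confirming that every image really has a caption file and that each caption starts with the chosen prefix. The check below is a minimal sketch using only the standard library; the function name `check_captions` and the set of image extensions are assumptions for illustration, not part of Kohya SS.

```python
from pathlib import Path
import tempfile

# Image types to look for (an assumed, non-exhaustive set).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def check_captions(image_dir: Path, prefix: str,
                   caption_ext: str = ".txt") -> list[str]:
    """Return a list of problems found; an empty list means all is well."""
    problems = []
    for image in sorted(image_dir.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        # The caption file shares the image's name, with the caption extension.
        caption_file = image.with_suffix(caption_ext)
        if not caption_file.exists():
            problems.append(f"{image.name}: missing caption file")
            continue
        text = caption_file.read_text(encoding="utf-8").strip()
        if not text.lower().startswith(prefix.lower()):
            problems.append(f"{image.name}: caption lacks prefix {prefix!r}")
    return problems


if __name__ == "__main__":
    # Demo with placeholder files in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "selfie1.jpg").write_bytes(b"")
        (folder / "selfie1.txt").write_text(
            "tom cruise a man in a blue shirt is taking a selfie",
            encoding="utf-8")
        (folder / "selfie2.jpg").write_bytes(b"")
        print(check_captions(folder, "tom cruise"))
```

An empty result means each image has a caption beginning with the prefix, which is the state you want before starting a run.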
When you load that, you're going to see that it has Tom Cruise, or whatever celebrity name you chose to use, and it says "a bald man sitting in a room with a lamp above him". That's not bad, but you could add some additional context here that's just going to help with the training. So you could say "wearing a gray polo shirt with two buttons". Load up another one here and it says "Tom Cruise a man in a blue shirt is taking a selfie". Not bad, but just take a few minutes and go through here and add any additional little contextual elements that you want to add to each of these images, just to give it a little more detail about what's going on. Once you're done editing all the text files, make sure that you select them all, copy, and then paste them into your source image directory so that they're there with the initial training images that you're going to use for this.

Now we're going to jump into the meaty part. We're going to go to the LoRA training parameters. If you're using my config file, everything's already set up for you; otherwise, let's go through each of these.

Train batch size: I usually leave this at one. This is the number of images it's going to train at one given time. A higher number is going to use more VRAM, but it will speed up the training. I just happen to leave it at one.

For epoch, this is basically a way to split up the training. Remember how we set 20 repeats earlier for each image? If we leave epoch set to one, it means we're going to train 20 steps for each source image and be done. If we have 10 source images, we're going to train 200 steps. Now if we set this to 10 epochs, we'll train 2,000 steps, and so on. I typically just set this to 10. The other thing I do is save every N epochs; I set this to one. This means we're going to end up with 10 LoRA files at the end of this. We're going to be able to go through and pick which one looks the best. There's a trade-off between flexibility and precision that I'll talk about a little bit later, and I'll show you how to find the best model that you get.

For caption extension, make sure that's set to .txt. That's the same thing we used when we did the BLIP captioning earlier.
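The step arithmetic described above can be written out as a small helper. This is a sketch of the calculation as the speaker states it (images times repeats times epochs, divided by batch size); it deliberately ignores regularization images, and the function name is illustrative.

```python
import math


def total_steps(num_images: int, repeats: int, epochs: int,
                batch_size: int = 1) -> int:
    """Training steps = ceil(images * repeats / batch size) * epochs."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


# The two cases from the transcript: 10 images, 20 repeats.
print(total_steps(10, 20, epochs=1))
print(total_steps(10, 20, epochs=10))
```

Saving every epoch with 10 epochs then yields one LoRA file per 200 steps in this example.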
For mixed precision and save precision, if you're running an RTX 3090 like I am, or a 40 series GPU, you're going to set that to bf16 for both; otherwise, fp16. I don't touch the number of CPU threads per core, and then I do make sure that I check both cache latents and cache latents to disk. This is just going to speed things up a little bit more.

For the learning rate scheduler, we're going to select constant, and then the optimizer is Adafactor. Now it's really important that if you select Adafactor, you've got to copy these extra optimizer parameters from the description of the video. You need scale_parameter=False, relative_step=False, and warmup_init=False.

Learning rate: man, there is a lot of information about this online. I trained probably 30 different LoRA files just trying different settings, and I found that it really doesn't make that big a difference. Having a slightly higher learning rate just means there's going to be a little bit more difference between the different epochs, or the LoRA files that are generated, and you might find that you have a higher-quality, more trained model at a lower epoch. In my case I typically go with 0.00003, and I suggest you do the same.
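The three extra Adafactor settings named above (scale parameter, relative step, and warm-up initialization, all false) are typically entered as a single space-separated string of `key=value` pairs. The sketch below shows one way such a string could be turned into keyword arguments; it is an illustration using only the standard library, not the parsing code Kohya SS itself uses.

```python
import ast


def parse_optimizer_args(extra_args: str) -> dict:
    """Turn 'key=value key=value' into a dict of Python values."""
    kwargs = {}
    for item in extra_args.split():
        key, _, value = item.partition("=")
        # literal_eval safely converts "False" to False, "0.1" to 0.1, etc.
        kwargs[key] = ast.literal_eval(value)
    return kwargs


EXTRA_ARGS = "scale_parameter=False relative_step=False warmup_init=False"

adafactor_kwargs = parse_optimizer_args(EXTRA_ARGS)
adafactor_kwargs["lr"] = 0.00003  # learning rate suggested in the video
print(adafactor_kwargs)
```

Disabling the relative step is what lets Adafactor honor an explicit learning rate instead of computing its own schedule, which is why these flags accompany a fixed rate and a constant scheduler.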
For learning rate warm-up, we leave that at zero. Max resolution should be 1024 by 1024; this is the default resolution for Stable Diffusion XL. You can save a little bit of VRAM here: if you don't have an RTX 3090 or a 4090 that has 24 GB of VRAM, you could set something like 768 by 768. It's going to save on VRAM and let you train a little bit more efficiently, but the trade-off is that the images that are generated are going to be a little bit lower quality.

Enable buckets: make sure this is selected. This is very important. It ensures that you don't have to crop your images. It doesn't matter what resolution, vertical and horizontal, they are; it's going to take those in and use them just fine.

For both text encoder learning rate and U-Net learning rate, you're going to set both of those to 0.00003, just like we set the learning rate to earlier. Check the box for no half VAE.

Then network rank; this one's a little bit more interesting. Network rank increases the detail retained in the model, but it also increases the size of the LoRA file that's generated. Higher numbers here are going to have more detail, better color, better lighting. So I usually go with 256 for the network rank and one for the network alpha. Just be aware that every one of the LoRA files generated by this, and there are going to be 10 for this particular run, is going to be about 1.7 GB in size. If you don't have much VRAM or you want smaller files, you could train with something like 32 for network rank and 16 for network alpha. It's going to be a little bit lower quality model, but that might be okay depending on what you're training and what you're after.

We're going to scroll back up to the top and click on Advanced. Scroll down and make sure that you check gradient checkpointing. Cross attention should be set to xformers, and then "don't upscale bucket resolution": just leave it how it is.

Once you click on Start Training, you can go ahead and pull back open that command prompt window. It's going to show you the progress. In my case, since I'm doing a video right now, I can't run this at the same time because it'll use too much memory, but it uses about 20 gigabytes of VRAM and it's going to take about 10 hours because I have 40 files.
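For reference, the settings walked through in this part of the transcript can be collected in one place. The dictionary below is only a summary sketch: the key names are illustrative and are not guaranteed to match the field names in the Kohya SS JSON config, and the text encoder and U-Net rates are set equal to the main learning rate, as the speaker indicates.

```python
import json

LEARNING_RATE = 0.00003  # value suggested in the video

# Key names are illustrative; compare against your own exported config.
training_config = {
    "train_batch_size": 1,
    "epoch": 10,
    "save_every_n_epochs": 1,
    "caption_extension": ".txt",
    "mixed_precision": "bf16",      # fp16 on older GPUs
    "save_precision": "bf16",       # fp16 on older GPUs
    "cache_latents": True,
    "cache_latents_to_disk": True,
    "lr_scheduler": "constant",
    "optimizer": "Adafactor",
    "learning_rate": LEARNING_RATE,
    "lr_warmup": 0,
    "max_resolution": "1024,1024",  # e.g. "768,768" to save VRAM
    "enable_bucket": True,
    "text_encoder_lr": LEARNING_RATE,
    "unet_lr": LEARNING_RATE,
    "no_half_vae": True,
    "network_dim": 256,             # network rank; 32 for smaller files
    "network_alpha": 1,             # 16 when using rank 32
    "gradient_checkpointing": True,
    "xformers": True,
}

print(json.dumps(training_config, indent=2))
```

Writing the values down this way makes it easier to compare runs when you change one setting at a time, such as resolution or network rank.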
If you have 10 images that you're training on, it should take about a third of that time. Keep in mind, too, that if you turn down the resolution or change some of those other settings that I mentioned earlier, you can get this to run on about 12 GB of VRAM, although 16 is probably preferable.

Once all of that's done, it's time to load this up in Automatic1111 or whatever other Stable Diffusion image generator software you use. The first thing we want to do is make sure we select Stable Diffusion XL base 1.0, and then we're going to figure out a prompt to insert. I like to go over to Civitai, just find a random image that I think looks cool, and use that as sort of a baseline. So we'll go ahead and select the prompt here and paste it into the prompt. Then at the very end, we're going to go down to this LoRA section. We're going to find Brian Lovett Tom Cruise, we're going to find the first one, and we're just going to go 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
That's right, we're going to select all 10 LoRA files, and I'll show you why. You can see that that adds each of the LoRA files up here to the prompt. We're going to go ahead and right-click those and click on Copy, and then we're going to delete all but the first one. Now the really important thing is we need our keyword trigger here for our LoRA. So where it says "close portrait of a man", I'm going to delete "a" and I'm going to say "close portrait of Tom Cruise man", since that's the prompt trigger that we set when we trained this model. I'm going to crank up the sampling steps to about 30.
Then we're just going to generate an image. That's not too bad for a first attempt, but I'd really like something that's facing forward, a little bit easier to see, so I'm going to go ahead and generate another one. Also be sure to set your resolution to 1024 by 1024. This is a pretty good result, a nice-looking image. This is from the first LoRA that we trained.

Now I'm going to show you how you can see the comparison side by side of all 10 of your LoRA files. Down here, we're going to select this sort of recycle symbol. That's going to set the seed to this image's seed, so that every single one of the images we generate here in a minute is going to use the same seed. For Script, in the dropdown we're going to select XYZ plot, and for the X type we're going to select Prompt S/R. Now remember I had you copy all those LoRA files that we had up in the prompt earlier? You're going to paste those into the X values over here, and in between each one you're going to put a comma, until all of them have a comma except for the very last LoRA file. What it's going to do is look for this very first LoRA file and then replace that in the prompt with the different LoRA files for every single image generation. So when we click on Generate, it's going to actually generate 10 images horizontally along the X axis, and each one's going to use a different LoRA file. You'll see here in a minute.

All right, that takes just a few minutes, but as you can see, it produces all the images side by side. It's a really cool way to just take a look at all the different LoRA files and the images that they generate. The other cool thing here is you can sort of think about this as a continuum from the left to the right. The LoRA files on the right-hand side are going to produce really high-quality results, really close to the original images. You can even see that elements of the shirt sort of change; it looks more like the gray polo that I was wearing as it gets more to the right. On the opposite end of the spectrum, on the left here, you're going to get LoRA files that are really flexible. So you might be able to get more artistic freedom if you wanted to create an anime version of the person, or some sort of crazy hair, or something else that didn't really exist in the training data. You might have better luck using one of the LoRA files from farther to the left than those farther to the right. For me, I typically find a good balance of flexibility and precision is somewhere around LoRA three or four for my own personal uses.

Let me know what you find when you train your own models. Also let me know in the comments below if you have any questions or anything else I can help out with. Otherwise, I'm Brian Lovett, this is All Your Tech AI. We'll catch you next time. Thank you so much.