SDXL Local LoRA Training Guide: Unlimited AI Images of Yourself

All Your Tech AI
2 Jan 2024 · 17:09

Summary

TLDR: This video tutorial guides you through training your own LoRA (Low-Rank Adaptation) model to generate personalized images with Stable Diffusion XL. It covers installing the required software, preparing image datasets, setting training parameters, and evaluating the resulting LoRA files. The tutorial emphasizes carefully curating training images, adjusting hyperparameters, and leveraging techniques like BLIP captioning to improve model performance. With this knowledge, viewers can create customized AI image models tailored to their specific needs, whether for personal or professional use.

Takeaways

  • 🤖 The video provides a tutorial on training a LoRA (Low-Rank Adaptation) model for Stable Diffusion XL, which allows generating personalized images using a smaller fine-tuned model file.
  • 📂 To train a LoRA model, you need a dataset of high-quality images showcasing various angles, lighting conditions, and expressions of the subject you want to generate.
  • 🔧 The tutorial walks through installing and configuring the Kohya SS software, which provides a user interface for training LoRA models.
  • 📝 The instance prompt used for training should be a celebrity or object that has existing data in Stable Diffusion XL, rather than a random string.
  • 🖼️ Regularization images, which are varied and high-resolution images of the class you're training (e.g., men or women), are essential to prevent model overfitting.
  • ⚙️ Key training parameters include learning rate, network rank (which affects detail and file size), and optimizer settings like AdaFactor.
  • 🕰️ Training a LoRA model can take several hours, depending on the number of source images and the available GPU resources.
  • 🔍 Once trained, the LoRA files can be used with Stable Diffusion XL to generate personalized images by including the appropriate prompt trigger.
  • 📊 The tutorial demonstrates using the XYZ Plot feature to visualize and compare the outputs of different LoRA files, revealing a trade-off between flexibility and precision.
  • 💬 The video encourages viewers to experiment with their own models and provide feedback or ask questions in the comments section.
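
For quick reference, here is a sketch of the key training settings demonstrated in the video, gathered into one place. Values are as stated in the tutorial; treat them as starting points for an RTX 3090-class GPU, not the only valid configuration:

```python
# Key Kohya SS settings from the video (sketch; adjust for your GPU and dataset).
sdxl_lora_config = {
    "train_batch_size": 1,
    "epochs": 10,
    "save_every_n_epochs": 1,       # yields 10 LoRA checkpoints to compare
    "repeats": 20,                  # per-image repeats set during folder prep
    "mixed_precision": "bf16",      # fp16 on GPUs older than RTX 30 series
    "lr_scheduler": "constant",
    "optimizer": "Adafactor",
    "learning_rate": 3e-5,          # also used for text encoder and U-Net
    "max_resolution": "1024,1024",  # 768,768 to save VRAM
    "network_rank": 256,
    "network_alpha": 1,
    "enable_buckets": True,         # avoids having to crop images
    "gradient_checkpointing": True,
    "cross_attention": "xformers",
    "no_half_vae": True,
}
```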

Q & A

  • What is the core topic of the video script?

    -The core topic of the video script is a tutorial on how to train a LoRA (Low-Rank Adaptation) model to instruct Stable Diffusion, a generative AI model, to generate images of specific people or objects.

  • What software is used for training the LoRA model?

    -The video script mentions using software called 'Kohya SS' as the user interface to set up the parameters and train the LoRA model.

  • What is the importance of the 'instance prompt' when training a LoRA model?

    -The 'instance prompt' is crucial as it provides guidance to the model on what to create. Using a celebrity or object with many existing images in Stable Diffusion XL as the instance prompt can lead to better results than using a random string of characters.

  • How many images are typically required for training a decent LoRA model?

    -According to the script, a decent LoRA model can be trained with as few as 10 images, but the author tends to use anywhere from 10 to 20 images for typical training.

  • What is the purpose of 'regularization images' in the training process?

    -Regularization images help prevent model overfitting. Hundreds of varied, high-resolution images representing the class of images being trained (e.g., men or women) should be used for regularization.

  • What is the purpose of 'blip captioning' in the training process?

    -BLIP captioning uses AI to scan the training images, generate text captions describing them, and create a text file of keywords associated with each image. This helps Stable Diffusion understand the context and keywords related to the training data.

  • How does the 'network rank' parameter affect the trained LoRA model?

    -A higher network rank value increases the detail retained in the trained LoRA model, resulting in better color, lighting, and overall detail. However, it also increases the size of the generated LoRA files.

  • How can the trained LoRA model be used with Stable Diffusion image generation software?

    -To use the trained LoRA model in Stable Diffusion image generation software like Automatic1111, the user should select the Stable Diffusion XL base 1.0 model, add the trained LoRA file(s) to the prompt, and include the specific keyword trigger set during training.

  • What is the purpose of the 'XYZ plot' functionality mentioned in the script?

    -The 'XYZ plot' functionality allows the user to generate a series of images side by side, each using a different trained LoRA file. This helps compare the results and find the best balance between flexibility and precision for the desired application.

  • What is the typical tradeoff between flexibility and precision when selecting a trained LoRA file?

    -According to the script, the LoRA files trained earlier in the process (lower numbers) tend to provide more flexibility, allowing for more artistic freedom and variation from the training data. In contrast, the later LoRA files (higher numbers) produce results with higher precision and closer resemblance to the original training images.

Outlines

00:00

🤖 Setting Up Kohya SS for Stable Diffusion XL Training

This paragraph provides step-by-step instructions for installing and setting up Kohya SS, a user interface for training Stable Diffusion XL LoRA models. It covers downloading the Kohya SS repository, running the setup, configuring the environment, and launching the GUI application.

05:00

🖼️ Preparing Images and Captions for Model Training

The paragraph explains how to source and prepare images for training a custom Stable Diffusion XL model. It discusses the importance of having a diverse set of images with varying lighting, facial expressions, and backgrounds. It also covers the use of BLIP captioning to generate text descriptions of the images, which aids in the training process.

10:01

⚙️ Configuring Training Parameters for Stable Diffusion XL

This paragraph delves into the various training parameters for Stable Diffusion XL within the Kya SS interface. It covers settings such as batch size, epochs, learning rates, optimization algorithms, resolution, and network ranks. The paragraph provides guidance on selecting appropriate values for these parameters based on the available hardware and desired model performance.

15:03

🔍 Evaluating and Selecting the Best Trained Model

The final paragraph demonstrates how to evaluate and compare the trained Stable Diffusion XL models using the XYZ plot feature. It explains how to generate a grid of images using different trained models (Loras) and analyze the trade-off between flexibility and precision. The paragraph suggests strategies for selecting the most suitable trained model based on the desired balance of these factors.

Keywords

💡Stable Diffusion XL

Stable Diffusion XL is a generative AI model capable of producing high-quality images from textual prompts. It is an extension of the original Stable Diffusion model, with improved capabilities for generating intricate and detailed visuals. The video focuses on training a custom model variation called a 'LoRA' (Low-Rank Adaptation) using Stable Diffusion XL, allowing users to fine-tune the model for specific subjects or styles. This is highlighted throughout the script as the main topic being demonstrated.

💡LoRA (Low-Rank Adaptation)

A LoRA (Low-Rank Adaptation) is a type of fine-tuning technique used with large language models or generative AI models like Stable Diffusion XL. It involves training a small set of parameters that can be combined with the base model to produce customized outputs tailored to specific domains or styles. In the context of the video, the process of training a LoRA using personal images is explained in detail, enabling users to create custom models for generating images of themselves or any other subject.

💡Kohya SS

Kohya SS is a software application mentioned in the video that provides a user interface for training and configuring LoRA models with Stable Diffusion XL. It simplifies the process of setting up the training parameters, preparing the data, and generating the final LoRA files. The video walks through the installation and setup of Kohya SS as a crucial step in the LoRA training process.

💡Training Data

Training data refers to the set of images used to fine-tune the Stable Diffusion XL model through the LoRA training process. The video emphasizes the importance of gathering a diverse set of high-quality images of the subject, varying in lighting conditions, facial expressions, and backgrounds. Having a comprehensive training data set is crucial for creating a flexible and accurate LoRA model capable of generating realistic and varied outputs.

💡Prompts

Prompts are textual descriptions used as input for generative AI models like Stable Diffusion XL. In the context of the video, prompts are used to guide the model in generating images based on the trained LoRA. The video demonstrates how to construct effective prompts by combining elements from existing images with the trained LoRA file name. Prompts play a vital role in controlling the output of the generative model and achieving desired results.

💡VRAM

VRAM (Video Random Access Memory) is a type of memory specifically designed for handling graphics processing tasks. In the video, VRAM is mentioned as a crucial factor in determining the training parameters and resolution settings for the LoRA training process. Models like Stable Diffusion XL are computationally intensive and require substantial VRAM resources, especially when working with high-resolution images or larger training datasets.

💡Network Rank

Network Rank is a training parameter mentioned in the video that controls the level of detail and complexity retained in the trained LoRA model. Higher Network Rank values result in more intricate and detailed outputs, capturing nuances in color, lighting, and textures. However, this also increases the file size of the generated LoRA. The video recommends finding a balance between Network Rank and available VRAM resources to achieve the desired level of detail.

💡BLIP Captioning

BLIP captioning is a process mentioned in the video that uses artificial intelligence to analyze the training images and generate textual captions describing the content and context of each image. These captions are then used as additional input during the LoRA training process, helping the model better understand the semantic relationships between the visual elements and textual descriptions. BLIP captioning enhances the model's ability to interpret and generate relevant outputs based on the prompts.

💡Seed

In the context of generative AI models like Stable Diffusion XL, a seed is a numeric value used to initialize the random number generator responsible for generating the output images. By setting a specific seed value, the video demonstrates how to consistently reproduce the same image across multiple generations, allowing for side-by-side comparisons of the different trained LoRA models. Controlling the seed is crucial for evaluating and comparing the performance of various LoRA configurations.

💡XYZ Plot

The XYZ Plot is a visualization technique demonstrated in the video, which allows for side-by-side comparisons of multiple generated images using different LoRA models. By setting the X-axis to represent the different trained LoRA files, the video showcases how a single input prompt can generate varying outputs based on the applied LoRA. This visual comparison helps users analyze the trade-offs between flexibility and precision across the trained LoRA models, aiding in the selection of the most suitable model for their needs.

Highlights

Stability AI released Stable Diffusion XL, a generative AI model that can generate stunning images of just about anything.

Demonstrates how to train a LoRA (Low-Rank Adaptation) to instruct Stable Diffusion on how an object, person, or anything else should look.

Explains how to install and set up Kohya SS, a user interface for training LoRA models and setting up their parameters.

Highlights the importance of sourcing a variety of high-resolution images with different lighting, facial expressions, and backgrounds for training.

Recommends using a celebrity or well-known person as the instance prompt to guide the model with existing data in Stable Diffusion XL.

Explains the use of regularization images to prevent model overfitting and the need for varied, high-resolution images of the class being trained.

Demonstrates how to perform BLIP captioning to add context and keywords to the training images.

Provides guidance on setting various training parameters, such as batch size, epochs, learning rate, and network rank.

Explains the trade-off between model flexibility and precision based on the epoch or LoRA file used.

Shows how to load and use the trained LoRA files in Automatic1111 or other Stable Diffusion image generators.

Demonstrates how to generate images using the trained LoRA files and compare the results side by side using the XYZ plot script.

Offers insights on finding the right balance between model flexibility and precision based on the LoRA file chosen.

Encourages viewers to try training their own models and share their experiences or questions in the comments.

Provides a step-by-step guide on installing dependencies, setting up the environment, and configuring training parameters for LoRA model training.

Emphasizes the importance of having a gaming PC with a capable GPU to train LoRA models efficiently.

Transcripts

00:00

Stability AI released Stable Diffusion XL, a generative AI model that can generate stunning images of just about anything. Today I'm going to show you how to train a LoRA, or low-rank adaptation: a small file that can be trained to instruct Stable Diffusion on how an object, a person, or really anything else should look. You can find hundreds of pre-trained LoRAs on Civitai for everything from animals to people and even not-safe-for-work content. But what if you want to train your own LoRA to create images of yourself, or really anyone else for that matter? If you have a gaming PC, you can probably train your own model to produce high-quality, stunning images just like these. To get started, we're going to install a piece of software called Kohya SS. Let's get into it.

Kohya SS provides a user interface for training your own models and setting up the parameters. To get started on a Windows machine, you'll need Python, Git, and Visual Studio installed. If you're already running Stable Diffusion or any other generative AI tools on your system, you probably have these already; if not, check out one of my other tutorial videos that steps you through the process. The first step is opening your command prompt (typing CMD will fire that off) and changing to the directory where you want to install Kohya. Make sure you've got plenty of drive space here, because it is going to be pretty intensive. From there, copy the git clone command from the Kohya install directions; that clones the repo into a directory called kohya_ss. Once that's done, change into the kohya_ss directory and run the setup.bat file. Since we're performing a new installation, select option one. This part takes just a few minutes; it's installing a whole bunch of files and dependencies, so just sit back and relax.

Once that's done, it asks which compute environment you're running: this machine or Amazon AWS. Select this machine. If you have a multi-CPU or multi-GPU system you can select one of those options; otherwise choose no distributed training. Do you want to run your training on CPU only? Absolutely not; it would be terribly slow, especially if you have a good GPU. Do you wish to optimize your script with Torch Dynamo? No. Do you want to use DeepSpeed? No. Which GPUs (by ID) should be used for training on this machine? Select all, which is the default. Now it asks whether you want to run fp16 or bf16, and this depends on your GPU: if you have an RTX 30 or 40 series GPU, select bf16; if you have an older GPU, select fp16. At this point the installation is done, so you can either go to your kohya_ss directory and double-click the gui.bat file, or select option five if you still have your command prompt open. As you can see, that starts an entirely new command prompt and loads everything you need to start the GUI, shown on the right-hand side of the screen. Go ahead and close that old command prompt; we're done with that.
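
To summarize the command-prompt steps above, here is a minimal sketch. It assumes the commonly used bmaltais/kohya_ss repository; if the install directions you're following point at a different URL, use that instead.

```
:: Run from a drive with plenty of free space (Windows command prompt).
git clone https://github.com/bmaltais/kohya_ss.git
cd kohya_ss
setup.bat
:: Choose option 1 (install), answer the environment questions as described above,
:: then launch the GUI with gui.bat (or choose option 5 from the setup menu).
gui.bat
```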

02:57

At this point it's time to source some images to train your model with, and the important thing here is that you want a lot of different variations of lighting, facial expression, and backgrounds. This is going to make the model more flexible in the end. For example, if you wanted to train a model of Margot Robbie, you might go to Google Images and perform an image search. It's important to get really high-resolution images, so I tend to go to Tools and then filter by size for Large. You also don't want images that have multiple people in them, just something else to keep in mind. In my case, I just broke out my phone and took a whole bunch of selfies of myself around the house and outside, with a bunch of different facial expressions and lighting environments, to get a good mix of pictures. As far as the number of images is concerned, you can really train a decent model with as few as 10; I tend to use anywhere from 10 to 20 for my typical training. Now, normally at this point, if you've trained a Stable Diffusion checkpoint model before, you know you'd do image cropping, which sets all the images to a fixed size. With Stable Diffusion XL training that's actually unnecessary, and in fact you're going to get better results if you don't do it.

04:03

Now we'll go back to the Kohya UI and open the LoRA tab. If you happen to be one of my Patreon subscribers, I've got a JSON file with all the configuration you need in order to start training your model; it's called "SDXL Kohya SS LoRA config" and it's set up for an RTX 3090, the GPU I've got running in this machine. At this point you could just get going with no additional configuration to do, but of course I'm going to step you through everything, so even if you aren't one of my Patreon subscribers you can still do this start to finish.

Under LoRA you're going to see a Tools section. Click on that, and then click on Dreambooth LoRA folder preparation. This used to be under a tab called Deprecated, but newer versions have properly moved it under the Tools section. The first thing to fill out here is the instance prompt. This is super important, and most people get this wrong. Most tutorials out there tell you to use a random string of characters or something super unique to train your model, but what that really does is give you less flexibility and worse results. In fact, even if you're training a model of yourself, you want to use a celebrity or some other subject that already has a lot of images in Stable Diffusion XL, so it knows what to create; it has sort of guidance parameters, if you will. What I like to do is go to a celebrity-lookalike site and drag and drop one of my images there. It tells you which celebrity you (or your character) somewhat resemble, and that's a really good starting place to train your model. In my case, I'm actually going to type Tom Cruise. For the class prompt, I'm going to type "man", because that's what I'm training; if you were training a cat, a dog, or a woman, you'd set that as the class prompt instead. For training images, this is where you set the directory where you saved all the images you collected for your subject earlier.

Now on to regularization images. These help prevent model overfitting, and you're going to want hundreds of images here that represent the class of images you're trying to train, in our case men. They need to be varied, and they need to be very high resolution. In fact, I've already got these uploaded to my Patreon for both men and women; if you're not one of my Patreon subscribers, that's okay, you can find these datasets online or you could even create one yourself. For repeats, I always go with 20; this is the number of times each image is going to be trained into the model. The final thing is to set the final destination training directory, where all of your output data, including the LoRA files created by the training, will end up. Now we just click the button to copy the info over to the Folders tab.
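
For reference, the folder-preparation tool lays out the training directories using Kohya's convention of repeats, instance prompt, and class prompt in the folder name. A sketch of what the destination directory typically looks like after this step (paths are illustrative, not verbatim from the video):

```
training_dir/
├── img/
│   └── 20_Tom_Cruise man/   (20 repeats, instance prompt, class prompt: your photos + captions)
├── reg/
│   └── 1_man/               (regularization images for the class)
├── model/                   (trained LoRA files land here)
└── log/
```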

06:37

Now when you go back to Training and Folders, everything's already pre-filled. The only thing you want to make sure you update is the model output name. In my case, I'm going to use my name, Brian Lovett, then a hyphen, then Tom_Cruise. That way every single one of the files from this LoRA run has my name, so I know what subject I was training, and it has Tom Cruise, which is what I know I need to use in my prompt.

Now we need to head over to the Utilities tab to do some captioning, specifically BLIP captioning. BLIP captioning uses artificial intelligence to scan the images, look them over, and create a text file with all the keywords associated with how each image looks. It's how you get Stable Diffusion to understand the context and the words and keywords associated with each of the images you're using as training data. Go and select your source images directory, make sure the file extension is .txt, and for the prefix to add to the BLIP caption, use the celebrity name from earlier, in this case Tom_Cruise. Click on "Caption images", and if you load up that command prompt you'll see it going through and captioning each individual image. Once it's done, go back to your training image directory and you'll see each image with a text caption file next to it. Load one and you'll see it has Tom_Cruise (or whatever celebrity name you chose) and says something like "a bald man sitting in a room with a lamp above him". That's not bad, but you could add some additional context here that will help with the training, so you could say "wearing a gray polo shirt with two buttons". Load up another one and it says "Tom_Cruise, a man in a blue shirt is taking a selfie". Not bad, but just take a few minutes, go through here, and add any additional little contextual elements to each of these captions, just to give it a little more detail about what's going on. Once you're done editing all the text files, select them all, copy, and paste them into your source image directory so that they're there with the initial training images you're going to use.
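
Here is a sketch of what one finished caption file might contain after BLIP captioning plus a manual edit. The caption file shares its name with the image, for example a hypothetical IMG_0001.txt sitting next to IMG_0001.jpg:

```
Tom_Cruise, a bald man sitting in a room with a lamp above him, wearing a gray polo shirt with two buttons
```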

08:46

Now we're going to jump into the meaty part: the LoRA training parameters. If you're using my config file, everything's already set up for you; otherwise, let's go through each of these. Train batch size: I usually leave this at one. This is the number of images trained at any one time; a higher number uses more VRAM but will speed up training. I just happen to leave it at one. Epoch is basically a way to split up the training. Remember how we set 20 repeats earlier for each image? If we leave epoch set to one, it means we train 20 steps for each source image and we're done; if we have 10 source images, we train 200 steps. If we set this to 10 epochs, we'll train 2,000 steps, and so on. I typically just set this to 10. The other thing I do is set "Save every N epochs" to one. This means we end up with 10 LoRA files at the end, and we'll be able to go through and pick which one looks best; there's a trade-off I'll talk about a little bit later between flexibility and precision, and I'll show you how to find the best model you get.
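
The step arithmetic is worth making explicit. A minimal sketch of the math described above (Kohya's actual step count can differ slightly, for example when regularization images are in play):

```python
# Total training steps = images x repeats x epochs / batch size (sketch).
images = 10      # source photos
repeats = 20     # "Repeats" set during folder preparation
epochs = 10      # "Epoch" training parameter
batch_size = 1   # "Train batch size"

steps_per_epoch = images * repeats // batch_size
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # -> 200 2000
```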

play09:50

caption extension make sure that's set

play09:52

to. txt that's the same thing we used

play09:54

when we did the blip captioning earlier

play09:56

for mix precision and save Precision if

play09:58

you're if you're running an RTX 3090

play10:00

like I am or a 40 series GPU you're

play10:03

going to set that to bf16 for both

play10:05

otherwise fp16 I don't touch the number

play10:08

of CPU threads per core and then I do

play10:11

make sure that I check both cach latence

play10:14

and cash latence to disk this is just

play10:15

going to speed things up a little bit

play10:17

more for the learning rate scheduler

play10:19

we're going to select constant and then

play10:21

Optimizer is Ada Factor now it's really

play10:24

important that if you select Ada a

play10:25

factor you've got a copy from the

play10:27

description in the video these extra

play10:30

optimized parameters you need scale

play10:32

parameter false relative step false and

play10:35

warm-up initialization false learning
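
If you're typing those flags in by hand, they go into the optimizer extra arguments field. A sketch of what that field should contain (the names match Adafactor's standard options):

```
scale_parameter=False relative_step=False warmup_init=False
```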

10:38

Learning rate: man, there is a lot of information about this online, and I trained probably 30 different LoRA files just trying different settings. I found that it really doesn't make that big a difference. A slightly higher learning rate just means there's going to be a little more difference between the epochs, that is, between the LoRA files that are generated, and you might find you get a higher-quality, more fully trained model at a lower epoch. In my case, I typically go with 0.00003, and I suggest you do the same. For learning rate warm-up, we leave that at zero.

play11:14

1,24 this is the default resolution for

play11:17

stable diffusion XL now you can save a

play11:20

little bit of vram here if you don't

play11:21

have an RTX 390 or a 4090 that has 24 GB

play11:25

of vram you could set something like 768

play11:28

by 768 it's going to save on vram and

play11:32

let you train a little bit more

play11:33

efficiently but the trade-off is the

play11:35

images that are generated are going to

play11:37

be a little bit lower quality enable

play11:39

buckets make sure this is selected this

play11:41

is very important this ensures that you

play11:43

don't have to crop your images it

play11:45

doesn't matter what resolution vertical

play11:47

and horizontal they are it's going to

play11:49

take those in and use them just fine for

play11:51

both text encoder learning rate and unet

play11:54

learning rate you're going to set both

play11:55

of those to

play11:57

0.3 just like we set the learning rate

play12:00

to earlier check the box for no half vae

play12:03

and then Network rank this one's a

play12:05

little bit more interesting Network rank

play12:07

increases the detail retained in the

play12:09

model but it also increases the size of

play12:11

the Lura file that's generated higher

play12:14

numbers here are going to have more

play12:15

detail better color better lighting so I

play12:17

usually go with 256 for the network Rank

play12:20

and one for the network Alpha but just

play12:23

be aware that every one of your Lura

play12:25

files that's generated by this and

play12:26

there's going to be 10 for this

play12:28

particular run are going to be about 1.7

play12:31

GB in size now if you don't have much

play12:33

vram or you want smaller files you could

play12:36

train it something like 32 for Network

play12:38

Rank and 16 for Network Alpha it's going

play12:41

to be a little bit lower quality model

play12:43

but that might be okay depending on what

play12:45

you're training and what you're getting

play12:46

after we're going to scroll back up to
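
As a rough rule of thumb, LoRA file size scales approximately linearly with network rank, which makes the trade-off easy to estimate. A sketch under that assumption (actual sizes also depend on which modules are adapted and the save precision):

```python
# Estimate LoRA file size from network rank, assuming linear scaling (sketch).
size_at_rank_256_gb = 1.7  # observed in the video at rank 256
for rank in (256, 128, 64, 32):
    est_gb = size_at_rank_256_gb * rank / 256
    print(f"rank {rank}: ~{est_gb:.2f} GB")  # rank 32 -> ~0.21 GB
```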

12:48

We're going to scroll back up to the top and click on Advanced, scroll down, and make sure you check gradient checkpointing. Cross attention should be set to xformers, and leave "Don't upscale bucket resolution" as it is. Once you click on Start training, you can pull that command prompt window back open and it will show you the progress. In my case, since I'm recording a video right now, I can't run this at the same time because it would use too much memory, but it uses about 20 gigabytes of VRAM, and it's going to take about 10 hours because I have 40 files. If you have 10 images that you're training on, it should take about a third of that time. Keep in mind, too, that if you tune down the resolution or change some of those other settings I mentioned earlier, you can get this to run in about 12 GB of VRAM, although 16 is probably preferable.

13:34

Once all of that's done, it's time to load this up in Automatic1111 or whatever other Stable Diffusion image-generation software you use. The first thing we want to do is make sure we select Stable Diffusion XL base 1.0, and then we're going to figure out a prompt to insert. I like to go over to Civitai, find a random image that I think looks cool, and use its prompt as sort of a baseline. We'll select the prompt, paste it into the prompt box, and then at the very end go down to the LoRA section. We're going to find the Brian Lovett Tom Cruise files, find the first one, and just go 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. That's right, we're going to select all 10 LoRA files, and I'll show you why. You can see that adds each of the LoRA tags up in the prompt. Right-click those, click copy, and then delete all but the first one. Now, the really important thing is that we need our keyword trigger for our LoRA: where the prompt says "close portrait of a man", I'm going to delete "a" and say "close portrait of Tom_Cruise man", since that's the prompt trigger we set when we trained this model. I'm going to crank the sampling steps up to about 30, and then we'll just generate an image. That's not too bad for a first attempt, but I'd really like something facing forward that's a little easier to see, so I'm going to generate another one. Also, be sure to set your resolution to 1024 by 1024. This is a pretty good result, a nice-looking image, and it's from the first LoRA we trained.
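
Putting the pieces together, a finished prompt might look like the sketch below. It assumes Automatic1111's standard <lora:filename:weight> tag syntax and a hypothetical epoch-one file name; your model output name and the baseline prompt you copied from Civitai will differ:

```
close portrait of Tom_Cruise man, studio lighting, detailed face, photorealistic <lora:brianlovett-Tom_Cruise-000001:1>
```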

15:00

Now I'm going to show you how to see a side-by-side comparison of all 10 of your LoRA files. Down here, we're going to select this sort of recycle symbol; that sets the seed to this image's seed, so that every single one of the images we generate here in a minute uses the same seed. For Script, in the dropdown we're going to select "X/Y/Z plot", and for the X type we're going to select "Prompt S/R". Now remember, I had you copy all those LoRA tags we had up in the prompt earlier; paste those into the X values field over here, and in between each one put a comma, until all of them have a comma except for the very last LoRA file. What the script does is look for that very first LoRA tag in the prompt and replace it with each of the different LoRA tags, one per image generation. So when we click on Generate, it's actually going to generate 10 images horizontally along the X axis, each one using a different LoRA file; you'll see here in a minute.
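
Prompt S/R is a search-and-replace: the first entry is the text to find in the prompt, and each subsequent entry is substituted in its place for one image. A sketch of the X values field with hypothetical file names (continue the pattern through all ten entries, with no comma after the last):

```
<lora:brianlovett-Tom_Cruise-000001:1>, <lora:brianlovett-Tom_Cruise-000002:1>, <lora:brianlovett-Tom_Cruise-000003:1>
```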

15:56

All right, that takes just a few minutes, but as you can see, it produces all the images side by side. It's a really cool way to take a look at all the different LoRA files and the images they generate. The other cool thing here is that you can sort of think about this as a continuum from left to right. The LoRA files on the right-hand side produce really high-quality results that stay really close to the original images; you can even see elements of the shirt change to look more like the gray polo I was wearing as it gets more to the right. On the opposite end of the spectrum, on the left, you're going to get LoRA files that are really flexible, so you might be able to get more artistic freedom. If you wanted to create an anime version of the person, or some sort of crazy hair, something else that didn't really exist in the training data, you might have better luck using one of the LoRA files from farther to the left than those farther to the right. For me, I typically find a good balance of flexibility and precision somewhere around LoRA three or four for my own personal uses. Let me know what you find when you train your own models, and also let me know in the comments below if you have any questions or anything else I can help out with. Otherwise, I'm Brian Lovett, this is All Your Tech AI, and we'll check you next time. Thank you so much.


Related Tags
AI, Tutorial, Stable Diffusion, Generative AI, Image Generation, Customization, Self-Portraits, Technical Guide, Machine Learning, Kohya SS