Finetuning Flux Dev on a 3090! (Local LoRA Training)
Summary
TLDR: In this video, the speaker explains how to train the Flux Dev model using Ostris's AI Toolkit. He stresses the importance of having 24 GB of VRAM and walks through setting up the working directory, preparing the YAML file, and installing the dependencies. Viewers learn how to configure hyperparameters and datasets and how to generate test images. He also explains the model's learning cycles and gives advice for avoiding overfitting, concluding that patience and experimentation are essential when training models.
Takeaways
- 💻 Make sure you have at least 24 GB of VRAM, ideally on a graphics card like the 3090 Ti, to run Ostris's AI Toolkit.
- 📂 Organize your files into dedicated directories, such as a 'work' directory with a 'train' subfolder that gathers everything you need.
- ⌨️ Follow the setup steps line by line to install the dependencies from the command prompt.
- 📑 The YAML file drives the hyperparameter configuration; copy the example 'train_lora_flux_24gb.yaml' and edit it for your own run (see the sketch after this list).
- 🖼️ Prepare your dataset with matching images and caption text files, and configure regular safetensors checkpoints during training.
- 🎨 The model can be trained on different styles, here an anime style, with samples generated at regular step intervals so you can evaluate progress and adjust parameters.
- ⚙️ The tool supports images of different sizes through 'buckets', which makes training more efficient.
- 🚀 Enable 'shuffle tokens' and 'gradient checkpointing' to make the most of limited VRAM.
- 🔑 Set up a `.env` file with your Hugging Face key so the tool can download the models it needs.
- ⏳ Training can take around 4 hours, and you will see the model cycle through learning and regressing, which calls for patience and several adjustments.
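For quick reference, here is a minimal sketch of the handful of YAML fields the video changes. The field names follow ostris's example config as I recall it (a fuller sketch appears in the Transcripts section below), so verify against your own copy of the file:

```yaml
# Abridged -- only the fields discussed above.
config:
  name: "fantasma_anime"        # name of the final .safetensors file
  process:
    - network:
        linear: 32              # rank; keep linear and linear_alpha equal
        linear_alpha: 32
      save:
        save_every: 200         # checkpoint and sample every 200 steps
        max_step_saves_to_keep: 4
      train:
        steps: 3000
        gradient_checkpointing: true
```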
Q & A
What hardware is required to train Flux Dev with Ostris's AI Toolkit?
-A graphics card with 24 GB of VRAM, such as a 3090 Ti, is recommended so the training process runs properly on a local machine.
Why does the user need to set up a YAML file?
-The YAML file holds the hyperparameters for the run, such as the model name, output folders, and performance settings. It is essential for customizing and optimizing the training process.
What are the main hyperparameters to adjust in the YAML file?
-The main ones include the name of the final file, the output folder, the network rank and alpha ('linear' and 'linear_alpha'), how often safetensors checkpoints are saved, and the dataset configuration, among others.
What is the benefit of using images of different sizes during training?
-Training on images of varied sizes exposes the model to different aspect ratios, which improves its flexibility, lets it generate images at multiple resolutions, and makes training more efficient.
Why does the user mention enabling 'gradient checkpointing'?
-Gradient checkpointing saves memory during training at the cost of some compute, which helps users with limited VRAM stay within their hardware's limits.
Why does the user skip the initial sampling steps?
-Skipping the first samples saves time and resources, since they add little value at the very start of training. He prefers to generate samples at regular intervals later on.
How does the user manage disk space during training?
-The tool is configured to keep a limited number of safetensors checkpoints (four in this example), deleting the oldest ones to save disk space.
Why is it important to set up a Hugging Face token?
-The token is needed to download the models and other resources hosted on Hugging Face, which the training run depends on.
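A minimal sketch of that file's contents, using the `HF_TOKEN` variable name shown in the video (the token value is a placeholder; generate a 'read' token in your Hugging Face account settings):

```
HF_TOKEN=hf_your_read_token_here
```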
What should you do to resume an interrupted training run?
-Training can be resumed from the most recent checkpoint; the tool picks up from the last save, and this behavior can be configured in the YAML file.
What challenges come up across the model's learning cycles?
-The user observes that the model cycles through learning and regressing, picking up characteristics and then losing them again. He recommends not stopping training too early, to avoid losing nuance.
Outlines
🖥️ Introduction to training Flux Dev with Ostris's AI Toolkit
This section introduces the process of training Flux Dev with Ostris's AI Toolkit. It stresses the need for 24 GB of VRAM, noting that a 3090 Ti is used for this project. The author walks through his preparation, including creating a 'train' working directory, then proceeds line by line through the setup and the available tools. He pauses some of the longer tasks to keep the walkthrough smooth for viewers.
📝 Configuring and editing the YAML file for training
This section explains how to configure the YAML file. The author details the important settings, such as renaming the file and changing the output folders, and adjusts hyperparameters like the network rank and alpha. Other changes include how often safetensors checkpoints are saved and how the dataset is prepared. The section closes with notes on enabling features such as token shuffling and multi-resolution training.
🔐 Setting up the Hugging Face key for training
This section explains how to configure a Hugging Face key so the tool can fetch the models needed for training. The author creates a '.env' file in the root of the AI Toolkit folder and inserts his Hugging Face secret key, which lets the environment download and use the pretrained models. He adds a note about the adjustment needed on Windows to make the run command work correctly.
🚀 Starting and monitoring the training run
This section describes starting the training run once the environment is configured. The author explains that he has already downloaded the required models to skip a long initial wait. He highlights the value of training on multiple image sizes and resolutions for model flexibility. Finally, he shows progress after 200 steps, noting that the model is starting to pick up the visual style but still has a long way to go.
🎨 Intermediate results and training advice
In this final section, the author shows the results after 600 steps of training, pointing out the clear shift toward the target style. He recommends letting training run for several thousand steps to refine the results, noting that even long runs did not overfit in his experience. He closes by promising a more detailed follow-up video analyzing the learning cycles with side-by-side comparisons, and notes that the model will be released in a few days once final testing is done.
Keywords
💡VRAM
💡Command prompt
💡Dependencies
💡YAML file
💡Hyperparameters
💡SDXL model
💡Model training
💡Trigger word
💡Shuffle tokens
💡Hugging Face
Highlights
Ensure you have 24GB of VRAM, preferably using a 3090 Ti for optimal training performance.
Set up a working directory and populate it with necessary files, using a dedicated 'train' folder.
Open command prompt and navigate to the right directory to start setting up the training environment.
Edit the YAML file located in the 'config' folder to set hyperparameters and customize training settings.
Change key YAML settings like output folder, rank, dimension, and sample image generation intervals.
Set up the environment by adding your Hugging Face token for model access, saved in the '.env' file.
Adjust the Python command for compatibility with your system, such as removing the '3' from 'python3' on Windows (see the command sketch after this list).
Ensure bucket resizing is enabled to accommodate images of different aspect ratios during training.
Monitor initial steps closely, as the model starts to adapt to the anime-style visuals within 200 steps.
At 600 steps, the model demonstrates significant progress in learning style and coloration.
Flux models can learn and regress in cycles, so don't stop the training too early or you may miss nuances.
For style models, consider extending training toward 3,000+ steps; a character run taken to 3,400 steps was still not overfit, so under-training is the more common risk.
Use the checkpoint-saving feature wisely, setting up sample generations every 200 steps to monitor progress.
Flexibility with different sizes and aspect ratios improves the overall quality of the training outputs.
The entire training process may take about 4 hours, and results should be carefully tested afterward.
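As referenced above, the launch command after the Windows tweak looks roughly like this; 'run.py' is the entry point in ostris's repository, and the config file name here is the one created in the video:

```
python run.py config/fantasma_anime.yaml
```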
Transcripts
Okay, today we're going to talk about how to train Flux Dev using Ostris's AI Toolkit. Just make sure you can follow the requirements here; it's really important that you have 24 gigs of VRAM. I'm running this on a 3090 Ti, so it can be handled pretty well on a local machine, but not on one that doesn't have enough VRAM. There are a lot of cool tools here too; we're just going to focus on this one today. Next we're going to open up the command prompt, as you can see here, and point it at the right directory, so we'll go through that step by step. First, I have everything in a 'work' directory, which is sort of my overarching one, and inside it I've set up a folder called 'train' that I'm going to populate and pull everything into, as you can see here.
Then we're just going to go through the setup line by line, as you can see. There might be a better way to do this; I'm not really a coder, so I'm just going to do it in the way that's most intuitive for me. Some of these steps might take a little longer, in which case I'll pause and come back to it; it won't look like anything happened, but that's a little easier on you guys. Yeah, this one's taking a bit more time, but all in all it goes pretty fast. These last ones will take some time, so I'll definitely be pausing during those. All right, here we go: we're going to let these dependencies install, and I'm just speeding forward to the future, running this as well. It's all coming together now, and everything is pulling in, so that's good. It's obviously important to make sure all these dependencies are set up the correct way. Now it should be ready to get started, but there are a couple more things we have to do to set up the training, so we'll do that next. In particular, we're going to focus on setting up the YAML file with the settings.
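The video follows the README in ostris's ai-toolkit repository line by line rather than reading the commands aloud. As a hedged sketch, the setup steps there look roughly like the following; check the current README, since the exact PyTorch install line and requirements may have changed:

```
git clone https://github.com/ostris/ai-toolkit.git
cd ai-toolkit
git submodule update --init --recursive
python -m venv venv
REM On Windows; use "source venv/bin/activate" on Linux/macOS
venv\Scripts\activate
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
```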
Okay, now we're going to set up that YAML file, which holds really all of the detail we need to give in terms of hyperparameters. You'll find an example in the config folder called 'train_lora_flux_24gb.yaml'; copy that one and make a new version of it. I'm going to rename it to reflect what I'm training: a Fantasma Anime LoRA, based on a LoRA I've trained previously for SDXL, so we're going to call it that. Then we're going to put it into the config folder, so let me move it over (oops, a little too far) and drop it right in. Great. We're going to go ahead and edit it, going through line by line. First and foremost, I'm going to get rid of some of the extra comment text, though it might be useful for you.
You do need to change the name to the file name you want, so go ahead and do that first. I could call it the same thing, but in this case I'm calling it 'fantasma_anime', and that's going to be the name of the final safetensors file. We'll delete some more of this text; there are also some things we don't really want to change. I do want to change my output folder: 'output' is the base folder, and I'm just adding a subfolder, which it will populate. I'm going to uncomment this line so that we get those performance stats later; I like them, but it's up to you whether you keep them or not. We're keeping the CUDA device the same, and we're going to change the trigger word. I don't know whether the trigger word makes a difference; I haven't seen a massive one, but the model definitely recognizes the trigger word, so it's worth investigating. Next, 'linear' and 'linear_alpha' are your LoRA rank and alpha; I'm changing them both to 32, and it's good to keep them the same number. The 16-bit save dtype is fine as it is. I'm going to use every 200 steps here: basically I want the safetensors file to save, and some sample images to generate, every 200 steps. I'm fine with four: every time there are more than four safetensors files, it deletes the oldest.
Scrolling down a little: you want to make sure you prep your dataset correctly, and this is really good information to have. Basically, you just need the text prompt captions to be in a text file with the same file name as the image for them to be recognized; that's really standard, the same as most training programs. I set up a folder for this that I'm going to point to here, and that's fine. A lot of this stuff we don't really need to change, because it's kind of inherent to the training, unless you're super advanced or just trying to figure things out. I am going to turn on shuffle tokens; I didn't actually try this last time, but I'm really interested in trying it this time, so I'm going to do that.
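To make the caption convention concrete, here is a sketch of a dataset folder in the layout described above (the folder and file names are made up for illustration):

```
train/dataset/
├── image_001.png
├── image_001.txt   <- caption describing image_001.png
├── image_002.jpg
├── image_002.txt
└── ...
```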
Caching latents to disk is useful and something you want to keep the same. Multiple resolutions are great; this actually solves a lot of the issues we had with SDXL only being able to train at one size, and one thing I like about this trainer is that you can train on different sizes. As for the suggested step count, I don't think 4,000 is needed for this one; I'm going to go for 3,000, though I do think people under-train this model quite a bit. There's no text encoder training here, and I haven't seen any negative effects from that, as you would with SDXL. Let's see; I'm going to keep it balanced there, and clean that up. Great.
gradient
checkpointing
yeah I do not have a ton of vram so I'm
going to do
that noise scheduler
yep uh keep all that the same I'm not
messing with thear rate right now but
you could try out different learning
rates I do want to skip the samples
because it takes up a lot of time in the
beginning and I don't necessarily need
them it's good though if you're trying
to compare the like the distance that
your training has
gone so I'm going to do
that uh we'll keep that all the
same let's see here all this looks fine
all important to keep I'm not going to
actually found that I didn't need to do
the low vam mode even though I am right
had 24 gigs found that actually was
working just fine so I'm not turning
that
on you can though if you're if you're
concerned um also you can restart from
your last training uh stop if you need
to this is the thing I want to do every
200 just like I have for my
uh my save I don't want my save to be
off balance with my other
stuff and then I'm going to keep these
as the prompts I do want some prompts
um I again I just don't know if the
trigger makes a difference at this stage
it very well might um but I haven't seen
a significant difference one way or
another I've seen a little difference
but not one that's like super
significant um but I'll I'll keep the
trigger in a couple of these because
it'll be nice to see the difference and
compare them and everything else is just
about the same I'll add my name into the
metadata but otherwise this is like very
straightforward I'm I might change the
prompts for the sample images if I was
training like a character or something
like that for for this I I'll keep it
the same so go ahead here yeah this
looks good um we'll go ahead and save
this and this one's good to go okay and
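Pulling the edits from this walkthrough together, here is an abridged sketch of roughly what the finished config might look like. The structure and field names follow ostris's example 'train_lora_flux_24gb.yaml' as best I can reconstruct them, and the paths, trigger word, and prompt are placeholders, so diff this against your own copy rather than pasting it in wholesale:

```yaml
# Abridged sketch -- compare against your copy of train_lora_flux_24gb.yaml.
job: extension
config:
  name: "fantasma_anime"            # name of the final .safetensors file
  process:
    - type: "sd_trainer"
      training_folder: "output/fantasma_anime"
      device: "cuda:0"
      trigger_word: "fantasma"      # placeholder; its effect was minor in this run
      network:
        type: "lora"
        linear: 32                  # rank; kept equal to alpha
        linear_alpha: 32
      save:
        dtype: float16
        save_every: 200             # checkpoint every 200 steps
        max_step_saves_to_keep: 4   # oldest checkpoint deleted beyond four
      datasets:
        - folder_path: "path/to/your/dataset"
          caption_ext: "txt"        # caption file shares the image's file name
          shuffle_tokens: true
          cache_latents_to_disk: true
          resolution: [512, 768, 1024]  # multi-size buckets
      train:
        batch_size: 1
        steps: 3000                 # lowered from the 4,000 example default
        gradient_checkpointing: true
        train_text_encoder: false
        noise_scheduler: "flowmatch"
        optimizer: "adamw8bit"
        lr: 1e-4
      model:
        name_or_path: "black-forest-labs/FLUX.1-dev"
        is_flux: true
        quantize: true
      sample:
        sample_every: 200           # matches the save interval
        skip_first_sample: true     # skip the slow initial samples
        prompts:
          - "a city skyline erupting in color, anime style"  # placeholder
```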
Okay, there is one more really important step before we really dive in: we need to set things up so we can pull models from Hugging Face. I'll show you how to set this up, though obviously I'm not going to show you my secret key. You just want to create a new file and put it in the root, which is the ai-toolkit folder, and save it as '.env' (with 'All files' selected as the file type); that's all you need to call it. Then you take this line, 'HF_TOKEN=your key here', open the file, and drop it in. Replace the end bit after the equals sign with a 'read' token from Hugging Face (I'm not going to show you mine, obviously), then save it. Since it's already in the root, the tool will be able to pull the models for you.
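If you prefer the command line, a minimal sketch of creating that file on Windows (the token value is a placeholder):

```
cd ai-toolkit
echo HF_TOKEN=hf_your_read_token_here> .env
```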
Okay, now we're going to set up our model to start training. I did have to change a couple of things here, so I'll show you what I did. This is all set up and we have an environment active, so we're in good shape, but I'm going to open an empty notepad so we can look at the command and make some small adjustments, because I noticed (maybe it's just a Windows thing) that I had to change it a little for it to run. By the way, we do want the config file's name for the call we're about to make, so that's good to know. Drop the command in: we're going to get rid of the '3' after 'python', because for some reason 'python3' wasn't recognized, and in this case I also needed to fix the YAML extension. Then you just put in the name of the YAML file you made before, and you're good to go; it's pretty straightforward.
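Putting that together, the launch command looks roughly like this; 'run.py' is the entry point in ostris's repo, and the config file name is the one created earlier:

```
REM From the ai-toolkit folder, with the virtual environment active (Windows)
python run.py config/fantasma_anime.yaml
```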
Now, this is where you would hit Run. I actually have it set up separately as another instance, because I had been training earlier and I just don't want to run through all of the model loading and downloading again; it takes a while. When you run it for the first time it's going to take a little bit of time, but don't worry about it. I'm going to show it from where my model is set up, which will look just like starting from scratch; I'm explaining that so you're not confused. All righty then: you can see my fresh environment here. I set it up, activated the virtual environment, dropped in the command we just put together, and now it's starting everything up. Normally the first run is a very long initial process, because it pulls in the Flux model and all of the other models it needs to train. In this case I'm saving myself a bit of a headache because I've already done that once before; it still downloads things like shard files, but generally most of what I need is already set up. It's loading the checkpoint pieces and starting to put everything together, so we'll watch for a moment, though it's mostly just general startup. One nice thing is that it buckets images of different sizes, so you can use different-sized images, which I think is super awesome; I love that Ostris has included that, and I'm so happy I don't have to crop every image. This is also how some of these newer models are trained to begin with, so being able to show images of different aspect ratios makes a really big difference in the overall training process. Other than that, I'm going to give it a little pause, and here we go: it has started, and these samples are all going to save as well.
Here you can see the actual dataset it's pulling from. Actually, I should show you what we're looking at in terms of images: if you take a look here, I'll make these extra large. This is the style I'm training, just to give you a point of reference; it's kind of an anime style. I really like how anime comes out on Flux, with a lot of color-explosive elements. Now we're going to jump ahead and take a look at 200 steps in, because I think it's a great visualization of how quickly things train and where we're at. As you can see, we're generating some example images here; they're in the folder we talked about earlier, so let me pull those up. They went right into the output folder we looked at, where the tool created the folders we need, so now we can see them under 'samples'. This is our first stop: you can see it's starting to pick up a little of that cartoony style, some of the coloration, certainly the clouds and the explosions, but it hasn't really learned much yet. It's a good example of how quickly Flux does learn; with that in mind, it still has quite a way to go, and I find that people often stop a bit too early. So I'll show you one more stop; I probably won't go all the way to the end in this video, but will do a second video for that part.
Okay, now we're jumping further into the future: this is at 600 steps. You can see it's already generated a few different sets of samples, and what I really want to highlight here is how much it has adopted the style. I typically let this run to 3,000, though, because it does go through cycles, so I definitely recommend taking your time with it. Don't get too ahead of yourself: you can stop earlier than you should and find you've actually lost a lot of nuance. I'm going to do a second, probably nicer, video, because I was just trying to get this one out quickly, but basically what I've observed is that the model goes through cycles of learning and regressing. I've done a few trainings now that stopped at different steps, and I've watched how it seems to pick up a lot of superficial details early on, then starts to look a little overfit, then forgets a bunch of details, and then does the same thing over and over again. With Ostris's help I was able to take a training all the way to 3,400 steps and it was still looking really nice; it wasn't overfit. That was for a character, and I think a style would be even more flexible in some ways, so keep in mind that you can go pretty far with it. Anyway, you'll find the files here, and they get saved at whatever step interval you've set in your YAML; mine is 200. Once it gets to five files, because I set four as the limit, it deletes one of the older ones, which is a nice way to save on space as well. I had thought about doing this video all the way to the end, but I think those observations on training deserve a separate, focused video with some side-by-side examples, so I'm going to work on that; it'll probably be better than this one, since I don't usually edit the videos myself (it's usually Timothy on our team who does a lot of the video editing). I really wanted to get this out quickly, because I know people are trying to pick up training as quickly as possible, and I thought it would be nice to have a little guide; you should be able to adapt it in most ways. Anyway, I appreciate you guys following along, and I'm excited to train more. As you can see, we're only just above 600 steps here, but I will release this model in a couple of days. It's quite a bit of time and energy; I'm guessing this will take about 4 hours to train in total, and then I'll want to test it out. You'd set it up just as you'd set up any model, really. So thank you guys so much, and have a great one.