Fine-tuning Tiny LLM on Your Data | Sentiment Analysis with TinyLlama and LoRA on a Single GPU
Summary
TL;DR: In this tutorial video, Vin shows how to fine-tune the TinyLlama language model on a custom cryptocurrency news dataset. He covers preparing the data, setting the correct parameters for the tokenizer and model, training the model efficiently with LoRA in a Google Colab notebook, evaluating model performance, and running inference with the fine-tuned model. The goal is to predict the subject and sentiment of new crypto articles. With only about 40 minutes of training, the fine-tuned TinyLlama model achieves promising results - around 79% subject accuracy and over 90% sentiment accuracy.
Takeaways
- 📚 Vin explains the process of fine-tuning the TinyLlama model on a custom dataset, beginning with dataset preparation and proceeding through training to evaluation.
- 🔧 Key steps include setting up tokenizer and model parameters, using a Google Colab notebook, and evaluating the fine-tuned model on a test set.
- 🌐 The tutorial includes a complete text guide and a Google Colab notebook link, available in the MLExpert bootcamp section for Pro subscribers.
- 🤖 TinyLlama is preferred over larger models like 7B-parameter models due to its smaller size, faster inference and training speed, and suitability for older GPUs.
- 📈 Fine-tuning is essential for improving model performance, especially when prompt engineering alone doesn't suffice, and for adapting the model to specific data or privacy needs.
- 📊 For dataset preparation, a minimum of 1,000 high-quality examples is recommended, and consideration of task type and token count is crucial.
- 🔍 The tutorial uses the 'Crypto News+' dataset from Kaggle, focusing on sentiment and subject classification of cryptocurrency news.
- ⚙️ Vin demonstrates using Hugging Face's datasets library and tokenizer configuration, emphasizing the importance of a padding token in avoiding repetitive generations.
- 🚀 The training process uses LoRA (Low-Rank Adaptation) to train a small adapter model on top of the base TinyLlama model.
- 📝 Evaluation results show high accuracy in predicting subjects and sentiments from the news dataset, validating the effectiveness of the fine-tuning process.
Q & A
What model is used for fine-tuning in the video?
-The TinyLlama model, a 1.1-billion-parameter model trained on over 3 trillion tokens.
What techniques can be used to improve model performance before fine-tuning?
-Prompt engineering can be used before fine-tuning to try to improve model performance. This involves crafting the prompts fed into the model more carefully without changing the model itself.
How can LoRA be used during fine-tuning?
-LoRA trains only a small model, called an adapter, on top of a large model like TinyLlama. This reduces memory requirements during fine-tuning.
What dataset is used for fine-tuning in the video?
-A cryptocurrency news dataset containing titles, text, sentiment labels, and subjects for articles is used.
How can the dataset be preprocessed?
-The data can be split into train, validation, and test sets. The distributions of labels can be analyzed to check for imbalances. A template can be designed for formatting the inputs.
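A minimal sketch of such a stratified split, assuming a pandas DataFrame `df` with a `subject` label column (the column name is illustrative):

```python
# Minimal sketch: stratified train/validation/test split with scikit-learn.
# `df` is assumed to be a pandas DataFrame with a `subject` label column.
from sklearn.model_selection import train_test_split

# Carve out the test set first, stratifying on the label so that class
# proportions stay the same across splits.
train_val_df, test_df = train_test_split(
    df, test_size=0.1, stratify=df["subject"], random_state=42
)
# Split the remainder into train and validation sets the same way.
train_df, val_df = train_test_split(
    train_val_df, test_size=0.1, stratify=train_val_df["subject"], random_state=42
)
```

Stratifying on the label keeps the class proportions of each split close to those of the full dataset, which is what the distribution plots in the video verify.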
What accuracy is achieved on the test set?
-An accuracy of 78.6% is achieved on subject prediction on the test set. An accuracy of 90% is achieved on sentiment analysis on the test set.
How can the fine-tuned model be deployed?
-The adapter model can be merged into the original TinyLlama model and pushed to the Hugging Face Hub. From there it can be deployed behind an API for inference in production.
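A hypothetical sketch of that deployment step, assuming the adapter and tokenizer were saved to `experiments/model` and using a placeholder Hub repository name:

```python
# Hypothetical sketch: merge the LoRA adapter into the base model and push
# the result to the Hugging Face Hub. Paths and repo names are placeholders.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# The tokenizer saved during training already contains the added pad token.
tokenizer = AutoTokenizer.from_pretrained("experiments/model")

base = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T",
    torch_dtype=torch.float16,
)
# Match the embedding size to the tokenizer (a pad token was added before training).
base.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=8)

model = PeftModel.from_pretrained(base, "experiments/model")
merged = model.merge_and_unload()  # fold the adapter weights into the base weights

merged.push_to_hub("your-username/tinyllama-crypto-news")  # placeholder repo id
tokenizer.push_to_hub("your-username/tinyllama-crypto-news")
```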
What batch size is used during training?
-A batch size of 4 is used with gradient accumulation over 4 iterations to simulate an effective batch size of 16.
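A sketch of those training arguments with Hugging Face `TrainingArguments`; values mentioned in the video are used where known, the rest are illustrative:

```python
# Sketch of the training arguments: batch size 4 with 4 gradient-accumulation
# steps (effective batch size 16), fp16, plain AdamW, constant schedule, 1 epoch.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="experiments",            # placeholder output directory
    per_device_train_batch_size=4,       # 4 examples per forward/backward pass
    gradient_accumulation_steps=4,       # 4 x 4 = effective batch size of 16
    num_train_epochs=1,
    optim="adamw_torch",                 # regular AdamW, no quantized optimizer
    fp16=True,                           # float16 training, no quantization
    lr_scheduler_type="constant",
    learning_rate=2e-4,                  # illustrative; not stated in the video
    evaluation_strategy="steps",
    eval_steps=100,                      # illustrative eval/logging cadence
    logging_steps=100,
    save_strategy="epoch",
)
```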
How are only the model completions used to calculate loss?
-A special collator is used that sets the labels for all tokens before the completion template to -100 to ignore them in the loss calculation.
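A sketch of that collator using TRL's `DataCollatorForCompletionOnlyLM`, assuming a `tokenizer` is already loaded; the response-template string is a hypothetical stand-in for the video's actual template:

```python
# Sketch: mask everything before the response template so the loss is
# computed only on the completion tokens.
from trl import DataCollatorForCompletionOnlyLM

response_template = "### Prediction:"  # hypothetical marker used in the prompts
# Encoding the template ourselves (without special tokens) avoids the
# token-boundary mismatches that can make the collator fail to find it.
response_template_ids = tokenizer.encode(response_template, add_special_tokens=False)
collator = DataCollatorForCompletionOnlyLM(response_template_ids, tokenizer=tokenizer)
# In batches produced by this collator, every label up to the template is set
# to -100, which the loss function ignores.
```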
How can the model repetitions be reduced?
-The repeated subject and sentiment lines could be removed from the completion template to improve quality.
Outlines
🚀 Fine-tuning a Tiny Language Model on Custom Data
Vin introduces a tutorial on fine-tuning a tiny language model (LLM) on a custom dataset, starting from data preparation through training and evaluation, using a Google Colab notebook. He highlights the advantages of smaller models like TinyLlama over larger models for faster inference and training, and the significance of fine-tuning for improved performance on specific tasks. The tutorial includes a step-by-step guide for MLExpert Pro subscribers, emphasizing the need for high-quality data and the process of selecting and preparing the dataset for fine-tuning.
📊 Preparing and Understanding Your Dataset for Fine-tuning
This section delves into dataset preparation, focusing on selecting tasks and ensuring data quality. Vin uses a cryptocurrency news dataset from Kaggle, detailing the process of creating training, validation, and test splits. He emphasizes the importance of stratified sampling to maintain representative label distributions across splits and discusses handling class imbalance. The dataset includes sentiment, subjectivity, and subject labels for news articles, serving as the basis for training the tiny LLM to accurately predict the sentiment and subject of news articles.
🔧 Setting Up Tokenizer and Model Configuration
Vin explains the setup process for the tokenizer and model configuration, including adding a padding token and resizing the token embeddings of the TinyLlama model. He discusses the importance of correct padding in avoiding repetition and the use of GPU capabilities (such as Flash Attention 2) during training. The section also covers how to fit the data within the model's context window using a specific template, and the preparation steps for using the model with LoRA (Low-Rank Adaptation), highlighting the efficiency benefits of training small adapters.
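A sketch of that tokenizer/model setup, following the steps in this section; the pad-token string is illustrative, and the model id refers to the 3-trillion-token TinyLlama checkpoint:

```python
# Sketch of the tokenizer and model setup with a new pad token.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.add_special_tokens({"pad_token": "<PAD>"})  # "<PAD>" is illustrative
tokenizer.padding_side = "right"

model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
# Grow the embedding matrix to cover the new token; padding the vocab size to
# a multiple of 8 keeps tensor shapes GPU-friendly.
model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=8)
model.config.pad_token_id = tokenizer.pad_token_id
```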
⚙️ Applying LoRA and Training the Model
This part focuses on applying LoRA to fine-tune the TinyLlama model, targeting specific model layers for adaptation and discussing the configuration for efficient training. Vin shares insights on optimizing training parameters, like batch size and learning rate, and introduces training on completions to improve model performance. He provides a detailed walkthrough of setting up training arguments and using a data collator that focuses the loss calculation on the completion part of the model output.
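A sketch of the LoRA configuration this section describes (rank and alpha of 128, targeting the attention and MLP linear layers of the Llama architecture), assuming `model` is the TinyLlama model loaded earlier:

```python
# Sketch of the LoRA configuration: rank/alpha 128, small dropout,
# targeting the self-attention and MLP linear layers.
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=128,                    # rank of the low-rank update matrices
    lora_alpha=128,           # scales the update without changing the LR value
    lora_dropout=0.05,        # small dropout on the adapter (illustrative value)
    bias="none",
    task_type="CAUSAL_LM",    # causal language modeling task
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # self-attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP linear layers
    ],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # ~101M trainable params, roughly 8.4%
```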
📝 Training Insights and Evaluation Techniques
Vin shares his training insights, noting the effectiveness of using a smaller batch size with gradient accumulation for better training dynamics. He outlines the training process, including optimizer choices and the rationale behind training setup decisions. The section also covers model evaluation strategies, demonstrating how to test the fine-tuned model's performance on the dataset and analyze results for both subject and sentiment prediction accuracy using confusion matrices and accuracy calculations.
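A sketch of that evaluation, assuming a predictions DataFrame `pred_df` with true and predicted label columns (column names are illustrative):

```python
# Sketch of the evaluation: rough accuracy plus a confusion matrix.
from sklearn.metrics import accuracy_score, confusion_matrix

subject_acc = accuracy_score(pred_df["subject"], pred_df["predicted_subject"])
sentiment_acc = accuracy_score(pred_df["sentiment"], pred_df["predicted_sentiment"])
print(f"subject accuracy:   {subject_acc:.1%}")    # ~78.6% in the video
print(f"sentiment accuracy: {sentiment_acc:.1%}")  # just over 90% in the video

# Confusion matrix for the subject labels, ready for a heatmap.
labels = sorted(pred_df["subject"].unique())
cm = confusion_matrix(pred_df["subject"], pred_df["predicted_subject"], labels=labels)
```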
🎯 Achieving Accurate Predictions and Model Deployment
The final section showcases the fine-tuned model's ability to accurately predict news subjects and sentiments, with examples demonstrating its performance. Vin discusses the potential for discrepancies between model predictions and dataset labels, suggesting the model's predictions might sometimes be more accurate. He concludes by outlining plans for deploying the model in production, emphasizing the significance of model fine-tuning in achieving high accuracy and the upcoming tutorial on model deployment and API integration.
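A sketch of inference with the merged model, mirroring the video's pipeline setup; the prompt shown is a hypothetical article formatted with an illustrative version of the training template:

```python
# Sketch of inference with the merged model via a text-generation pipeline.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model=merged,            # merged model from the deployment sketch above
    tokenizer=tokenizer,
    max_new_tokens=16,       # the completion is only two short lines
    return_full_text=False,  # print just the generated completion
)

# Hypothetical article, formatted with an illustrative version of the template.
prompt = (
    "### Title: Bitcoin breaks through $45,000\n"
    "### Text: The largest cryptocurrency rallied on Tuesday...\n"
    "### Prediction:\n"
)
print(pipe(prompt)[0]["generated_text"])
```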
Keywords
💡fine-tuning
💡TinyLlama
💡dataset preparation
💡tokenization
💡LoRA adapters
💡sentiment analysis
💡subject classification
💡deployment
💡model evaluation
💡Google Colab
Highlights
Introduction to fine-tuning TinyLlama on a custom dataset using the Google Colab free tier.
Advantages of choosing TinyLlama over larger language models for faster inference and training.
Importance of fine-tuning for improving model performance on specific tasks.
Guidance on dataset preparation and the need for high-quality examples.
Using TinyLlama for multiple tasks, showcasing versatility in application.
Crypto News+ dataset example to demonstrate fine-tuning on real-world data.
Detailed process of tokenizer and model preparation for training.
Utilizing LoRA (Low-Rank Adaptation) for efficient fine-tuning.
Strategies for managing GPU memory limitations during model training.
Fine-tuning model performance by adjusting the LoRA configuration parameters.
Introduction to training with completion collators for focused learning.
Techniques for achieving lower training loss and effective model evaluation.
Saving and reloading fine-tuned models for inference.
Demonstrating the fine-tuned model's accuracy in predicting news subjects and sentiments.
Future directions on deploying the fine-tuned model for production and API integration.
Transcripts
Hey everyone, my name is Vin, and in this video we're going to have a look at how you can fine-tune a tiny LLM on your own dataset. We're going to start with preparing the dataset for training, then look at what parameters you need to set to get your tokenizer and model ready for training, along with the LoRA setup. Then we're going to train the model within a Google Colab notebook on the free tier. Finally, we're going to load the trained model and run an evaluation on a test set to see whether or not the fine-tuned model is doing a good job. Let's get started.

If you want to follow along, there will be a complete text tutorial along with a link to a Google Colab notebook for this video. This will be available within the bootcamp section of MLExpert, under "Fine-tuning TinyLlama on a custom dataset". It's available for MLExpert Pro subscribers, so if you want to support my work and get access to this, please subscribe to MLExpert Pro. Thanks!
So, what do you need in order to fine-tune a tiny LLM? First, we're going to go through why you might want to choose a tiny LLM over something like a Llama 7-billion-parameter model. Then we're going to look at why you would need to do some fine-tuning, then at the checkpoints you need to cover in order to choose and prepare your dataset, and finally I'm going to give you some tips for fine-tuning a tiny LLM using LoRA.

So, why a tiny LLM? First and most importantly, these models are relatively small compared to regular large language models such as 7-billion-parameter models like Mistral or Llama 2. Tiny LLMs are usually something like TinyLlama, the one we're going to use in this video, or others like Phi and Phi-2, which are at the limits of what I would call a tiny LLM. Another important thing about tiny LLMs is that you can do much faster inference with them, and the training itself can be a lot faster compared to what you'd get with a relatively larger LLM; you can even use older GPUs to train these types of models. Finally, even though these models are tiny, some of them are still trained on very high-quality data, such as Phi and Phi-2, or trained on a lot of tokens, such as TinyLlama, which saw more than 3 trillion tokens in its training dataset.
Why would you want to do some fine-tuning? Well, first you can try starting with some prompt engineering, and if that works for you and the benchmark performance of your model is relatively good, then try to stick with just prompt engineering. But if you want to increase the performance of your model, and you have enough data to do it, fine-tuning is a very good approach to getting much better performance out of your tiny LLM. In the general case, tiny LLMs are not as powerful as 7-billion-plus-parameter models such as Llama 2 or Mistral, and not even close to ChatGPT, GPT-4, or GPT-4 Turbo. So if you want a much smaller model that still performs relatively well on your benchmark and your tasks, you will likely need to do some fine-tuning.

Another good thing about fine-tuning is that it reduces the number of tokens you need to pass into the input as the prompt: you might just pass in your data with a much smaller template instead of some larger prompt, and this will make your inference time even faster. Of course, you might also have data that is private to you or your company; when you fine-tune your own models, you don't have to expose that data to the outside world, so this is another positive of the fine-tuning approach.
And how should you prepare your data? As a general rule of thumb, I would suggest more than a thousand examples of high-quality data. Preferably, you'd have humans look through the data to get a feel for its quality: when you have good-quality data, your fine-tuned LLMs are going to be much, much better than if you have some shady data points. You also have to think about what type of task you're solving. In this video I'm going to show you that we're going to use the LLM for two different tasks, which is very convenient: in the past, if you had to solve multiple tasks, you essentially had to train multiple models, or a single model with multiple heads, one per prediction. In the era of LLMs, we can just say that we want two outputs: one will be the sentiment of the news, and the other will be the subject of the cryptocurrency news article, which is the dataset we're going to use. You also have to look at how many tokens you need for the input and the output, check your model's maximum context width, and decide whether you'll be able to fit the inputs and outputs within the context window. Then you have to think of a template that will work well for preparing your own data.
your own data the data set that we're
going to use is uh these Crypton news
plus that are available on KGO and it
says that there are Crypton news
articles containing title text and
sentiment analysis of course the
sentiment analysis is going to be Essen
probably predicted from some model so
the labels might not be perfect but
still this is a real world example of
what you might have and uh here is the
Crypton news data for year over a year
21 to
23 structured format including title
text Source subject and sentiment
analysis and this is the example of data
that you get you have a class for the
sentiment polarity and subjectivity and
of course you have this subject and all
of those are going to be accompanied
with the text and a title from the news
and this is just the first paragraph of
the article and this is the title of the
I have a Google Colab notebook in which I've loaded the Crypto News data. I took the original CSV file and created a stratified split between train, validation, and test sets. Here is the training data frame, and you'll see the split between the training, validation, and test examples; we still have a lot of data. You'll also see that I've kept the subject, and I've split the sentiment into a couple of columns, which will be a bit easier to work with than the original dataset.

Other than that, I'm going to show you the splits between train, validation, and test, and you can see that the stratified sampling has worked wonders for us. The train, validation, and test sets for each subject (Bitcoin, altcoin, blockchain, Ethereum, NFT, and DeFi) are all split with pretty much the same frequencies as in the training set. You'll also see that we have a very large bias towards Bitcoin, altcoin, and blockchain examples, which is something you might not want in your dataset, but this is the real world. You can of course use techniques such as oversampling, undersampling, etc. to fight this, but for this fine-tuning example I'm going to stick with the original distributions.

This is the subject that we're going to try to predict, and then we have the sentiment; again, thanks to the stratified sampling, the distribution is essentially kept the same as in the training set. You'll see that we have positive, neutral, and negative sentiments, and that the data is somewhat skewed towards neutral and positive news, while negative news is much, much rarer, so keep that in mind as well. Finally, this is the subjectivity score, something we're not going to predict, but I've shown it so you can get a feel for its distribution.
The first thing I'm doing here to preprocess the dataset is to create it from pandas using the Hugging Face datasets library: I create a dictionary with the train, validation, and test subsets.
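A minimal sketch of this step, assuming the pandas DataFrames `train_df`, `val_df`, and `test_df` from the earlier split:

```python
# Minimal sketch: wrap the pandas splits in a Hugging Face DatasetDict.
from datasets import Dataset, DatasetDict

dataset = DatasetDict(
    {
        "train": Dataset.from_pandas(train_df),
        "validation": Dataset.from_pandas(val_df),
        "test": Dataset.from_pandas(test_df),
    }
)
```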
Then I'm going to load the tokenizer for the model we're going to use; in our case this is the TinyLlama model, and I'm taking the latest checkpoint that is not a chat model, trained on 3 trillion tokens. I'm going to set a padding token (pad token) for the tokenizer: here you see that I'm getting the tokenizer for the model, then adding this special pad token, and then setting the padding side to "right". After loading the model itself, I'm going to resize the token embeddings to get the new token embedding count, since I've added a token to the tokenizer, and I'm expanding the size to a multiple of eight. You'll see that we've added the padding token, that the tokenizer now shows all the available tokens, and that this is the new token we've added. This is very important: if you don't have correct padding within the training set, your model will tend to repeat the last couple of words or tokens it generates, so this really helps with model repetition.

Another thing to note here is that if you're using a GPU capable of Flash Attention 2, I would strongly suggest you turn it on; but since I'm using the T4 GPU available on the free tier of Google Colab, I'm commenting that out. So this is essentially how you load the model and the tokenizer.
Next, we're going to make sure the number of tokens fits within the context window of our TinyLlama model, which has 2,048 tokens of context width. For our example, I'm going to create this format or template; it's something I chose to use, not a standard. I set the title, the text, and then the prediction for the news article, and within the prediction I have the subject and then the sentiment. To see how many tokens we're going to need, I count the number of tokens in each example after formatting it with this template, and you'll see that the counts are much, much lower than the maximum limit of 2,048: we're going to need at most about 200 tokens for the input. So the context window shouldn't be a problem at all; our examples are very tiny compared to what the TinyLlama model can handle.
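A sketch of that token-count check, assuming the `tokenizer` and `dataset` from earlier and a hypothetical `format_example` helper that applies the prompt template to one row:

```python
# Sketch: count tokens per formatted example to confirm everything fits
# within TinyLlama's 2,048-token context window.
token_counts = [
    len(tokenizer(format_example(row))["input_ids"])
    for row in dataset["train"]
]
print(max(token_counts))  # well under 200 tokens in the video
```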
While you can fine-tune TinyLlama in full, 1.1-billion-parameter models are still not small by any means, even if the name is TinyLlama. So if you have a single GPU, for example the T4 that we're going to use within the Google Colab notebook, you might have a hard time fitting this model into the GPU and fine-tuning it on its own. In our case I'm going to show you how you can use LoRA to fine-tune the tiny LLM, and this will even allow us to increase the batch size we use to train the model.

One important thing to note is that with LoRA, you essentially train just a small model, called an adapter, on top of the original model: you still have to load the original model into memory, and then you create a smaller set of matrices of parameters and fine-tune just those. When you're training models such as Llama 7B, you might train roughly 1% of the parameters or even less; but if you do that with a tiny LLM, you'll get only something like 1 to 10 million trainable parameters, which in the general case wouldn't be enough. Of course, this depends on the task at hand, but as a general starting point I would recommend something like 100 million parameters, which is a great start that you can tweak from there. To get something like that for TinyLlama, we're going to increase the rank of the LoRA config to 128, which will give us roughly 8.5% of the original model's parameters for training; then I'm also going to increase the LoRA alpha, in order to scale the learning rate without changing its value, and again I'm going to set that number to 128.
To start with the training, I set the pad token ID on the model, and then on the model config, to the tokenizer's pad token ID. Then I look at the model config to double-check that the pad token ID has been set properly, which it has. Next we look at the model architecture, which tells us where we need to apply the LoRA scaling, i.e. the LoRA target modules. In my config right here you'll see that I'm targeting the self-attention layers and then the MLP ones; these are the linear layers and the self-attention layers you can see, and I'm essentially targeting all of them.

For the rank of the matrix, a shout-out to the Trelis Research YouTube channel, where I saw that he targets tiny LLMs with a much higher number of parameters; thanks to that, I learned that you actually need to increase the number of parameters, i.e. the rank of the LoRA matrices, in order to fine-tune tiny LLMs much better. Here I set the rank of the matrix and the LoRA alpha (to scale the learning rate) both to 128, and I apply a small dropout to the LoRA layers. This is the new adapter model; I then say that this is a causal language modeling task via the task type, and get the PEFT model on top of the original TinyLlama model by applying the LoRA config. You'll see that we're actually targeting roughly 100 million (101 million) parameters for training, about 8.4% of the model's parameters trainable with LoRA.
Next, I'm going to show you how you can train just on the completions; this is something a colleague of mine showed me, thank you for that. Instead of using the whole text for training, what you want is to calculate the loss only on the completion part of the example: I'm going to ignore the part that changes within the dataset, and to calculate the loss I'm going to take only the completion tokens, to measure how well the model is performing. This drastically reduces the loss values you'll see. Keep in mind that if you're training for a task like the one here, predicting completions, this type of collator does a great job; but if you're training something like an assistant for chats, etc., this might not be a good use case for the data collator, so keep that in mind.

In our case, I use the prediction line as the response template: I encode it and pass the response template token IDs to the collator, along with the tokenizer. We tokenize the template ourselves because, without that, the collator appeared to fail, at least for me. Then I take a single example and tokenize it, in order to show you the labels this collator adds. When I create the data collator and get the next batch from it, you'll see that we now have input IDs, an attention mask, and a new field called labels. If you look through the batch labels, you'll see that everything before the template has been given an ID of -100, which means "ignore these tokens": only the tokens after the template are going to be used for the calculation of the loss.

Since we get a bit of repetition with the subject and sentiment in the template, you could probably improve this: get rid of the repeated lines and just print those two lines. That would probably be much better than what we have right now, and your LoRA model would perform even better; this is an exercise, if you want to do it.
Then, for the training arguments, I'm going to use a batch size of four, but I'm going to multiply that by four to get an effective batch size of 16 using gradient accumulation. What this does is pass only four examples through the GPU at a time, but the results are accumulated over four iterations of those four-example batches, and the gradient step is then calculated on top of that. This appears to help with the training: on this single GPU it gave me much lower losses, so it appears to be helping.

Then I'm going to be using a regular AdamW (Adam with the weight-decay fix) from the torch optimizers. We're not using any quantized optimizer, since we're going to use fp16 (floating-point 16) training for this one; we don't need quantization for these tiny language models. Training is very fast and appears to be very stable, with very good results, so no quantization on this part. I'm also going to use a constant scheduler type; this is a bit redundant, since we're not using any warm-up here. Another important thing is that I'm going to train for just one epoch; of course, you might want to train for multiple epochs, depending on the dataset size that you have. I've trained this for roughly 40 minutes, I believe, and if you train for longer you might actually get better results with these tiny LLMs, so it might be worth experimenting with that. Those are essentially the training arguments that we have.
Then I'm going to define the format-prompts function, which takes a batch of examples and applies the template format we've seen so far; this is what creates the formatted batch for us. And this is the trainer that I'm going to use: I pass in the model, the training arguments, then the training and validation sets, the tokenizer, a max sequence length (which can be increased, but in our case that's not needed), then the formatting function, and finally the data collator that trains only on the completions.
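A sketch of that trainer setup with TRL's `SFTTrainer`, assuming a TRL version that accepts `tokenizer` and `max_seq_length` directly; `format_prompts` and `args` are the formatting function and training arguments described above:

```python
# Sketch of the SFTTrainer setup with completion-only training.
from trl import SFTTrainer

trainer = SFTTrainer(
    model=model,
    args=args,                          # TrainingArguments from earlier
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    tokenizer=tokenizer,
    max_seq_length=512,                 # illustrative; inputs are ~200 tokens
    formatting_func=format_prompts,     # applies the prompt template to a batch
    data_collator=collator,             # completion-only loss masking
)
trainer.train()
```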
This is essentially the output of the training, and you can see that the model is actually performing very well. This is the evaluation loss from the TensorBoard logs: we start with a relatively high value of 0.15, then after 600 steps it's at 0.11, then below 0.10. In roughly 26 minutes of training we get this far down, which is really good. You can check the tutorial for the full outline, but this is my training loss without any smoothing, and you can see it is again generally decreasing. You might argue that we're going to hit a plateau right here, but I would say the training went really well. In this table you can see the training loss and the validation loss, and they're fairly similar; the validation loss is actually a bit better in the later iterations, which is surprising, but it's within the realm of what you might get, since the training set is much larger than the validation set, so it might just be randomness.

Then, to get this model saved, I use the trainer's model save_pretrained, and within the same folder I save the tokenizer as well, with the proper configuration.
To try out our model, at this point I've restarted the Google Colab notebook. What I did here was load the base model in float16 and then apply the PEFT model on top of it, from the same trained folder, and then merge the PEFT model into the original model. This also loads the tokenizer, which was correctly saved: you can see right here that we have a padding token, the correct padding side, and the correct pad token ID. After that I set the pad token ID on the model and the model config again, just in case. Now we can use our fine-tuned model as a regular Hugging Face Transformers model.

I create a pipeline for text generation, passing in the model, the tokenizer, and the maximum number of new tokens; this is going to be only 16, since we already know our model will produce a very small number of tokens for the completion. Then I format the example for prediction: I take the title and the text from the example, and add the prediction header without the prediction itself. I reduce the verbosity of the logging and then look at 10 examples. Note that this is the complete text from the example, and I'm calling the format-for-prediction function with the example itself and outputting the prediction, so the original subject or sentiment is not passed into the model.
So this is the first example: "Binance research report reviews..." etc. The subject from the original data point is NFT, and the sentiment is positive. In the model's prediction you see a duplicate of the sentiment line; this is relatively common and we're going to address it in a bit, but the subject appears to be correct here, and the sentiment appears to be positive as well. Let's look at another one: subject altcoin, sentiment positive; the prediction is altcoin, positive again, so it is correct. Then subject Ethereum, sentiment positive; again, both appear to be exactly correct. Next: altcoin, positive, but the predicted sentiment was negative. Let's look at the title: "Coinbase COO calls for regulation of centralized crypto entities. The demise of FTX has set back crypto by years, and this disaster is likely to steer regulators into action." So the label sentiment is positive, but I wouldn't exactly agree with that label; you can decide on your own, but I think our model is actually predicting a better sentiment than the one in the labels, which is very interesting.

Let's look at another one: altcoin, positive, correct again. Now, the subject here is altcoin, but the model says Bitcoin, with positive sentiment from both. "Bitcoin's price prediction as BTC breaks through..." etc.; the article is clearly about Bitcoin, yet the label is altcoin. Yeah, our model is performing very well indeed. So it looks like the labels are not exactly perfect, but our model seems to be doing a good job even though the dataset is not of that high a quality. You can go through a lot of examples and see for yourself.
Next, I'm going to do something a bit different: I'm going to extract predictions for the complete test set. This is 1,242 examples and took about 10 minutes, and these are the predictions: the title, the text, the true subject, the true sentiment, the predicted subject, and the predicted sentiment; this is the data frame that we get. Then I calculate a very rough accuracy for the subject, which according to this calculation is 78.6%. Of course, you might want to go through some examples and see for yourself whether the model is actually better than the labels. This is a heat map, or confusion matrix, of the predicted subjects against the real values; you can see some overlap between blockchain and altcoin right here, but nothing really major.

For the true subject versus the predicted subject, let's get an example: "AI optimizing crypto exchange functions: artificial intelligence tools are providing...". The true subject is Bitcoin, but the predicted subject is blockchain. At least from the first couple of words, it appears that our model is again performing better than the labels, but I might be wrong; go over the title and the text for some examples on your own.

Next, for the sentiment, we have exactly the same calculation, and you'll see that this time we get just a tiny bit over 90% accuracy on the test set, which is really impressive for such a small dataset. Again, this is the confusion matrix, and we're going to look at some examples. "Bad news is good news: Bitcoin plays with USD. Bitcoin reaches its highest target in nearly seven..." etc.; here the label sentiment is positive, while our prediction is neutral, and I would agree that the label here is better than what we got from the model. Then: "...promised me 100 in Bitcoin. Is it possible that Coinbase CEO..." etc.; the label is neutral and our prediction is negative. I'm not sure; I'd have to see the full title and text for this one. But even if these labels are correct, 90% is very good for such a small training run.
So this is it for this video. You now know how to fine-tune a tiny LLM on your own dataset, how to set up the LoRA configuration correctly for that, how to save the model after training, and how to load the fine-tuned adapter on top of the original model and do some inference with it. In the next video I'm going to show you how you can take the adapter model, fuse or merge it into the original model, push that to a Hugging Face Hub repository, and then deploy the model in production behind an API, where we're going to run inference on a real-world example. Thanks for watching, guys; please like, share, and subscribe, and also join the Discord channel that I'm going to link in the description below. I'll see you in the next one. Bye!