Fine-tuning Tiny LLM on Your Data | Sentiment Analysis with TinyLlama and LoRA on a Single GPU

Venelin Valkov
29 Jan 2024 · 31:41

Summary

TL;DR: In this tutorial video, Venelin shows how to fine-tune the TinyLlama language model on a custom cryptocurrency news dataset. He covers preparing the data, setting the correct parameters for the tokenizer and model, training the model efficiently with LoRA in a Google Colab notebook, evaluating model performance, and running inference with the fine-tuned model. The goal is to predict the subject and sentiment of new crypto articles. With only about 40 minutes of training, the fine-tuned TinyLlama model achieves promising results: around 79% subject accuracy and over 90% sentiment accuracy.

Takeaways

  • 📚 Venelin explains the process of fine-tuning the TinyLlama model on a custom dataset, beginning with dataset preparation and proceeding through training to evaluation.
  • 🔧 Key steps include setting up tokenizer and model parameters, using a Google Colab notebook, and evaluating the fine-tuned model on a test set.
  • 🌐 The tutorial includes a complete text guide and a Google Colab notebook link, available in the MLExpert Bootcamp section for Pro subscribers.
  • 🤖 TinyLlama is preferred over larger models such as 7B-parameter ones because of its smaller size, faster inference and training, and suitability for older GPUs.
  • 📈 Fine-tuning is essential for improving model performance, especially when prompt engineering alone doesn't suffice, and for adapting the model to specific data or privacy needs.
  • 📊 For dataset preparation, a minimum of 1,000 high-quality examples is recommended, and consideration of task type and token count is crucial.
  • 🔍 The tutorial uses the 'Crypto News+' dataset from Kaggle, focusing on sentiment and subject classification of cryptocurrency news.
  • ⚙️ Venelin demonstrates using Hugging Face's datasets library and tokenizer configuration, emphasizing that the padding token must be set correctly to avoid repetitive generations (see the sketch after this list).
  • 🚀 The training process uses LoRA (Low-Rank Adaptation) to train a small adapter on top of the frozen base TinyLlama model.
  • 📝 Evaluation results show high accuracy in predicting subjects and sentiments from the news dataset, validating the effectiveness of the fine-tuning process.
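
As a rough sketch of the tokenizer setup (the checkpoint name and the padding choice below are assumptions; the video's exact settings may differ):

```python
from transformers import AutoTokenizer

# Assumed TinyLlama checkpoint on the Hugging Face Hub.
MODEL_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Llama-style tokenizers ship without a dedicated padding token, so one is
# assigned explicitly; a poorly chosen pad token can lead to repetitive output.
tokenizer.pad_token = tokenizer.unk_token  # assumption: the video may pick a different token
tokenizer.padding_side = "right"
```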

Q & A

  • What model is used for fine-tuning in the video?

    -The TinyLlama model, which is a 1.1 billion parameter model trained on around 3 trillion tokens.
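
For reference, a minimal sketch of loading the base model on a single GPU (the 4-bit quantization setup is an assumption; the notebook may load it differently):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

MODEL_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed checkpoint name

# 4-bit quantization keeps the 1.1B-parameter model well within the memory
# of a single consumer GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
)
```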

  • What techniques can be used to improve model performance before fine-tuning?

    -Prompt engineering can be used before fine-tuning to try to improve model performance. This involves crafting the prompts fed into the model more carefully without changing the model itself.
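
A hypothetical example of what such a prompt could look like for this task (the wording is illustrative, not taken from the video):

```python
# Illustrative zero-shot prompt: prompt engineering means iterating on this
# text while the model's weights stay unchanged.
prompt = (
    "Read the cryptocurrency news article below and reply with its subject "
    "and its sentiment (positive, neutral, or negative).\n\n"
    "Title: Bitcoin ETF approval sparks rally\n"
    "Text: ...\n\n"
    "Subject:"
)
```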

  • How can LoRA be used during fine-tuning?

    -LoRA allows only a small set of weights, called an adapter, to be trained on top of a large model like TinyLlama. This reduces memory requirements during fine-tuning.
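
A minimal LoRA setup with the PEFT library might look like this; the rank, alpha, and target modules are assumed values, not necessarily the ones used in the video:

```python
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,                      # adapter rank (assumed value)
    lora_alpha=32,             # scaling factor (assumed value)
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)

# Only the small adapter matrices are trainable; the base weights stay frozen.
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```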

  • What dataset is used for fine-tuning in the video?

    -A cryptocurrency news dataset containing the title, text, sentiment label, and subject of each article is used.
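
A quick sketch of loading and inspecting the data (the file and column names are hypothetical; the actual Kaggle CSV may differ):

```python
import pandas as pd

# Hypothetical file name for the Kaggle download.
df = pd.read_csv("cryptonews.csv")

# Each row should carry an article title, its text, a sentiment label, and a subject.
print(df.columns.tolist())
print(df.head())
```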

  • How can the dataset be preprocessed?

    -The data can be split into train, validation, and test sets. The distributions of labels can be analyzed to check for imbalances. A template can be designed for formatting the inputs.
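
A sketch of that preprocessing, with assumed split sizes and an assumed prompt template:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("cryptonews.csv")  # hypothetical file name, as above

# Assumed split proportions; the video's exact split may differ.
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
train_df, val_df = train_test_split(train_df, test_size=0.1, random_state=42)

# Check the label distributions for class imbalance.
print(train_df["subject"].value_counts(normalize=True))
print(train_df["sentiment"].value_counts(normalize=True))

# Hypothetical template used to format each example as a prompt plus completion.
def format_example(row):
    return (
        f"### Title: {row['title']}\n"
        f"### Text: {row['text']}\n"
        f"### Subject: {row['subject']}\n"
        f"### Sentiment: {row['sentiment']}"
    )
```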

  • What accuracy is achieved on the test set?

    -An accuracy of 78.6% is achieved on subject prediction and 90% on sentiment analysis on the test set.
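
One simple way those numbers could be computed once the model's generations have been parsed into predicted labels (the prediction lists below are hypothetical):

```python
def accuracy(predictions, references):
    """Exact-match accuracy between predicted and reference label strings."""
    correct = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return correct / len(references)

# `predicted_subjects` and `predicted_sentiments` are assumed to come from an
# earlier generation loop over the test set.
print(f"subject accuracy:   {accuracy(predicted_subjects, test_df['subject'].tolist()):.1%}")
print(f"sentiment accuracy: {accuracy(predicted_sentiments, test_df['sentiment'].tolist()):.1%}")
```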

  • How can the fine-tuned model be deployed?

    -The adapter can be merged into the original TinyLlama model and the result pushed to the Hugging Face Hub. It can then be deployed behind an API for inference in production.
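
A hedged sketch of that merge-and-publish step (the adapter path and repository name are placeholders):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed base checkpoint

# Reload the base model and fold the trained LoRA adapter into its weights.
base = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float16)
merged = PeftModel.from_pretrained(base, "path/to/adapter").merge_and_unload()

# Push the merged model and tokenizer to the Hub under a placeholder repo name.
merged.push_to_hub("your-username/tinyllama-crypto-news")
AutoTokenizer.from_pretrained(MODEL_ID).push_to_hub("your-username/tinyllama-crypto-news")
```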

  • What batch size is used during training?

    -A batch size of 4 is used with gradient accumulation over 4 iterations to simulate an effective batch size of 16.
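
In Hugging Face TrainingArguments terms that could look roughly like this; only the batch-size settings come from the video, the rest are assumed values:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="tinyllama-crypto-news",  # hypothetical output directory
    per_device_train_batch_size=4,       # batch size of 4...
    gradient_accumulation_steps=4,       # ...accumulated over 4 steps -> effective batch of 16
    learning_rate=2e-4,                  # assumed value
    num_train_epochs=1,                  # assumed value
    logging_steps=10,
)
```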

  • How is the loss calculated only on the model's completions?

    -A special collator is used that sets the labels for all tokens before the completion template to -100 to ignore them in the loss calculation.
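
One library that implements this is TRL, via DataCollatorForCompletionOnlyLM; the response template string below is an assumption and must match the prompt format used during training:

```python
from trl import DataCollatorForCompletionOnlyLM

# Tokens before the response template get label -100, so the loss is computed
# only on the completion (the subject and sentiment the model must produce).
collator = DataCollatorForCompletionOnlyLM(
    response_template="### Subject:",  # assumed; must match the training template
    tokenizer=tokenizer,               # the tokenizer loaded earlier
)
```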

  • How can repetition in the model's output be reduced?

    -The repeated subject and sentiment lines could be removed from the completion template to improve quality.
