A Practical Introduction to Large Language Models (LLMs)
Summary
TL;DR: In this data science series, Shah introduces large language models (LLMs), emphasizing their vast parameter counts and emergent properties like zero-shot learning. He explains the shift from supervised to self-supervised learning, highlighting next-word prediction as the core training task. Shah outlines three practical levels of LLM use: prompt engineering, model fine-tuning, and building your own LLM. The series aims to make LLMs accessible, with future videos covering APIs, open-source solutions, and practical applications.
Takeaways
- 😀 Shah introduces a new data science series focused on large language models (LLMs) and their practical applications.
- 🔍 The series will cover beginner-friendly introductions to LLMs, practical aspects, APIs, open-source solutions, fine-tuning, and building LLMs from scratch.
- 🗣 Large language models, like ChatGPT, are advanced chatbots that can generate human-like responses to queries.
- 📏 'Large' in LLM refers to the vast number of model parameters, ranging from tens to hundreds of billions, which define the model's functionality.
- 🌟 A key qualitative feature of LLMs is 'emergent properties', such as zero-shot learning, which allows models to perform tasks without explicit training for those tasks.
- 🔄 The shift from supervised learning to self-supervised learning in LLMs has been significant, with self-supervised learning relying on the structure within the data itself.
- 🔮 The core task of LLMs is next word prediction, which they learn through exposure to massive amounts of text data, allowing them to understand word associations and context.
- 🛠 Three levels of working with LLMs are discussed: prompt engineering (using LLMs out of the box), model fine-tuning (adjusting model parameters for specific tasks), and building your own LLM.
- 💻 Prompt engineering can be done through user interfaces like ChatGPT or programmatically via APIs and libraries such as the OpenAI Python API or Hugging Face Transformers.
- 🔧 Model fine-tuning involves taking a pre-trained LLM and updating its parameters using task-specific examples, often resulting in better performance for specific use cases.
- 🏗 For organizations with specific needs, building a custom LLM may be necessary, involving data collection, pre-processing, model training, and deployment.
Q & A
What is the main focus of Shah's new data science series?
-The main focus of Shah's new data science series is to discuss large language models (LLMs) and their practical applications.
What is the difference between a large language model and a smaller one?
-Large language models differ from smaller ones in two main aspects: quantitatively, they have many more model parameters, often tens to hundreds of billions; qualitatively, they exhibit emergent properties like zero-shot learning that smaller models do not.
What is zero-shot learning in the context of large language models?
-Zero-shot learning refers to the capability of a machine learning model to complete a task it was not explicitly trained to do, showcasing an emergent property of large language models.
How does self-supervised learning differ from supervised learning in the context of large language models?
-In self-supervised learning, models are trained on a large corpus of data without manual labeling, using the inherent structure of the data to define labels. This contrasts with supervised learning, which requires manually labeled examples for training.
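The answer above can be made concrete with a toy sketch: in self-supervised next-word training, the "labels" are never hand-annotated; each position in a sentence automatically yields a (context, next-word) training pair. The helper below is illustrative, not part of any real training library.

```python
# Toy illustration of self-supervised labeling: the targets come from
# the inherent structure of the text itself, not from manual annotation.

def make_training_pairs(text):
    """Split text into (context, target) pairs for next-word prediction."""
    words = text.split()
    pairs = []
    for i in range(1, len(words)):
        context = " ".join(words[:i])  # everything seen so far
        target = words[i]              # the word the model must predict
        pairs.append((context, target))
    return pairs

for context, target in make_training_pairs("listen to your heart"):
    print(f"{context!r} -> {target!r}")
```

A single unlabeled sentence thus produces several supervised-looking examples for free, which is why this paradigm scales to massive corpora.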
What is the core task that large language models are trained to do?
-The core task that large language models are trained to do is next word prediction, where they predict the probability distribution of the next word given the previous words.
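Written out, the next-word objective described above is the standard autoregressive factorization of a sequence's probability:

```latex
P(x_n \mid x_{n-1}, x_{n-2}, \ldots, x_1)
\qquad\text{so that}\qquad
P(x_1, \ldots, x_N) = \prod_{n=1}^{N} P(x_n \mid x_{<n})
```

Here $x_n$ denotes the $n$-th token and $x_{<n}$ the tokens that precede it.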
What are the three levels of working with large language models mentioned by Shah?
-The three levels of working with large language models mentioned by Shah are: 1) Prompt Engineering, 2) Model Fine-tuning, and 3) Building your own Large Language Model.
What is meant by prompt engineering in the context of large language models?
-Prompt engineering refers to using a large language model out of the box, without altering any model parameters, and crafting prompts to elicit desired responses.
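Since prompt engineering changes no model parameters, all the work happens in the input text. The sketch below builds a few-shot prompt for language classification (the example task from the video); the prompt format is an illustrative assumption, not any particular API's required format.

```python
# A minimal prompt-engineering sketch: no weights are touched, we only
# craft the text sent to the model. The template below is hypothetical.

def build_few_shot_prompt(examples, query):
    """Assemble labeled examples plus a new query into one prompt string."""
    lines = ["Classify the language of each sentence."]
    for text, label in examples:
        lines.append(f"Sentence: {text}\nLanguage: {label}")
    lines.append(f"Sentence: {query}\nLanguage:")
    return "\n\n".join(lines)

prompt = build_few_shot_prompt(
    [("Hello, how's it going?", "English"), ("Hola, ¿cómo estás?", "Spanish")],
    "Listen to your heart",
)
print(prompt)
```

The resulting string would then be pasted into a chat interface or sent through an API; the model's completion after the final "Language:" is the answer.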
How does model fine-tuning in large language models work?
-Model fine-tuning involves adjusting at least one internal model parameter of a pre-trained large language model to optimize its performance for a specific task using task-specific examples.
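The parameter-update idea in this answer can be shown at toy scale: start from a "pre-trained" weight and nudge it with gradient descent on task-specific examples. This is a deliberately tiny sketch; real LLM fine-tuning updates billions of weights with frameworks such as PyTorch, but the principle is the same.

```python
# Toy fine-tuning sketch: adjust a single "pre-trained" parameter w
# toward task-specific data by minimizing squared error (w*x - y)^2.

pretrained_w = 0.5                                    # stand-in for a pre-trained weight
task_examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # task data: y = 2x

w = pretrained_w
lr = 0.05
for _ in range(200):                    # a few passes of gradient descent
    for x, y in task_examples:
        grad = 2 * (w * x - y) * x      # d/dw of (w*x - y)^2
        w -= lr * grad

print(round(w, 3))  # converges toward the task-optimal value 2.0
```

Starting from pre-trained weights rather than random ones is exactly why fine-tuning needs far fewer task-specific examples than training from scratch.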
Why might an organization choose to build its own large language model?
-An organization might choose to build its own large language model for security reasons, to customize training data, or to have full ownership and control over the model for commercial use.
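The build-your-own pipeline (collect a corpus, preprocess it into tokens, train on next-word statistics) can be mimicked at toy scale with a bigram model trained by counting. Real LLMs learn these statistics with neural networks over billions of tokens; this counting version is only a sketch of the data flow.

```python
from collections import Counter, defaultdict

# Toy "build your own language model": corpus -> tokens -> next-word stats.
corpus = [
    "listen to your heart",
    "listen to your gut",
    "listen to your parents",
]

# Preprocessing: lowercase and whitespace-tokenize each document,
# then count which word follows which (a bigram model).
counts = defaultdict(Counter)
for doc in corpus:
    tokens = doc.lower().split()
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1

def next_word_distribution(word):
    """Probability distribution over the next word, given one word of context."""
    total = sum(counts[word].values())
    return {w: c / total for w, c in counts[word].items()}

print(next_word_distribution("your"))  # 'heart', 'gut', 'parents' each 1/3
```

Even this trivial model exhibits the behavior described in the video: given "your", it returns a probability distribution over plausible next words rather than a single answer.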
What resources does Shah recommend for further exploration of large language models?
-Shah recommends the companion blog post on Towards Data Science and a GitHub repository for more details, example code, and further exploration of large language models.
Outlines
🚀 Introduction to Large Language Models
Shah introduces a new data science series focused on large language models (LLMs), emphasizing their practical applications. The video aims to provide a beginner-friendly introduction to LLMs, explaining their significance and how they differ from traditional language models. Shah highlights the impressive capabilities of LLMs, such as ChatGPT, which can generate human-like responses. The key distinguishing features of LLMs are their vast number of parameters and emergent properties like zero-shot learning, which allows them to perform tasks without explicit training. The video sets the stage for future discussions on practical aspects, including using APIs, open-source solutions, fine-tuning, and building LLMs from scratch.
🔍 Deep Dive into Large Language Models
This paragraph delves deeper into the workings of large language models, contrasting them with traditional machine learning models. It explains the self-supervised learning paradigm used to train LLMs, which involves predicting the next word in a sequence based on the context. The process is described as an auto-regression task, where the model learns to predict the probability distribution of the next word given the previous words. Shah emphasizes the importance of context in language modeling and how a single word change can significantly alter the model's output. The paragraph also outlines the three levels of working with LLMs: prompt engineering, model fine-tuning, and building your own LLM. Each level requires varying degrees of technical expertise and computational resources.
🛠 Practical Applications of Large Language Models
Shah discusses the practical applications of large language models, focusing on three levels of engagement: prompt engineering, model fine-tuning, and building your own LLM. Prompt engineering involves using LLMs without altering their parameters, either through intuitive interfaces like ChatGPT or programmatically via APIs and libraries. Model fine-tuning adjusts the model parameters for specific tasks, building on the pre-trained model's capabilities. Shah mentions techniques like low-rank adaptation (LoRA) and reinforcement learning from human feedback (RLHF). For organizations with specific needs and security concerns, building a custom LLM might be the best approach. The paragraph concludes with a call to action for viewers to subscribe for more content and engage with the series through comments and suggestions.
Keywords
💡Large Language Models (LLMs)
💡Parameters
💡Zero-Shot Learning
💡Self-Supervised Learning
💡Prompt Engineering
💡Fine-Tuning
💡Autoregression
💡Emergent Properties
💡ChatGPT
💡Hugging Face Transformers
Highlights
Introduction to a new data science series focusing on large language models (LLMs).
Description of three levels of working with LLMs: prompt engineering, model fine-tuning, and building a custom LLM.
Definition of a large language model and its distinguishing properties.
Quantitative aspect of LLMs: the vast number of model parameters.
Qualitative aspect of LLMs: emergent properties like zero-shot learning.
Comparison between supervised learning and self-supervised learning in LLMs.
Explanation of next word prediction as the core task of LLMs.
Importance of context in language modeling demonstrated through example.
Introduction to prompt engineering as the most accessible way to use LLMs.
Discussion on using LLMs through user interfaces like ChatGPT.
Mention of programmatic access to LLMs via APIs and open-source libraries.
Overview of model fine-tuning as a way to customize LLMs for specific tasks.
Explanation of the process of fine-tuning a pre-trained LLM for a specific use case.
Introduction to the concept of building a custom LLM for organizations with specific needs.
Discussion on the potential of LLMs for various applications and the importance of understanding the technology.
Invitation to subscribe for future videos in the series and to engage with the content through comments and suggestions.
Transcripts
everyone I'm Shah and I'm back with a
new data science Series in this new
series I'm going to be talking about
large language models and how to use
them in practice
in this video I will give a beginner
friendly introduction to large language
models and describe three levels of
working with them in practice future
videos in this series will discuss
various practical aspects of large
language models things like using
OpenAI's Python API using open-source
solutions like the Hugging Face
Transformers library how to fine-tune
large language models and of course how
to build a large language model from
scratch if you enjoyed this content
please be sure to like subscribe and
share with others and if you have any
suggestions for me to include in this
series please share those in the
comments section below and so with that
let's get into the video so to kick off
the video series in this video I'm going
to be giving a practical introduction to
large language models and this is meant
to be very beginner friendly and high
level and I'll leave more technical
details and example code for future
videos and blogs in this series so a
natural place to start is what is a
large language model or llm for short so
I'm sure most people are familiar with
ChatGPT however if you are enlightened
enough to not keep up with news cycles
and tech hype and all this kind of stuff
ChatGPT is essentially a very
impressive and advanced chatbot so if
you go to the ChatGPT website you can
ask it questions like what's a large
language model and it will generate a
response very quickly like the one that
we are seeing here and that is really
impressive like if you were ever on AOL
Instant Messenger also called AIM you
know back in the early 2000s or in the early
days of the internet there were chatbots
then there have been chatbots for
a long time but this one feels different
like the text is very impressive and it
almost feels human-like a question you
might have when you hear the term large
language model is what makes it large
what's the difference between a large
language model and a not large language
model and this was exactly the question
I had when I first heard the term and so
one way we can put it is that large
language models are a special type of
language model but what makes them so
special and I'm sure there's a lot that
can be said about large language models
but to keep things simple I'm going to
talk about two distinguishing properties
the first quantitative and the second
qualitative so first quantitatively
large language models are large they
have many many more model parameters
than past language models and so these
days this is anywhere from tens to
hundreds of billions of parameters the
model parameters are numbers that Define
how the model will take an input and
generate the output so it's essentially
the numbers that Define the model itself
okay so that's a quantitative
perspective of what distinguishes large
language models from not large language
models but there's also this qualitative
perspective and these so-called emergent
properties that start to show up when
language models become large and so
emergent properties is the language used
in this paper cited below A Survey of
Large Language Models available on
arXiv really great beginner's guide I
recommend it but essentially what this
term means is there are properties in
large language models that do not appear
in smaller language models and so one
example of this is zero shot learning
one definition of zero shot learning is
the capability of a machine learning
model to complete a task it was not
explicitly trained to do so while this
may not sound super impressive to us
very smart and sophisticated humans this
is actually a major innovation in how
these state-of-the-art machine learning
models are developed so to see this we
can compare the old state-of-the-art
Paradigm to this new state-of-the-art
paradigm the old way and not too long
ago we can say like about five ten years
ago the way the high performing best
machine learning models were developed
was strictly through supervised learning
what this would typically look like is
you would train a model on thousands if
not millions of labeled examples and so
what this might have looked like is you
have some input text like hello hola
how's it going and so on and so
forth and you take all these examples
and you manually assign a label to each
example here we're labeling the language
so English Spanish so on and so you can
imagine that this would take a
tremendous amount of human effort to get
thousands if not millions of high
quality examples so let's compare this
to the more recent Innovation with large
language models who use a different
Paradigm they use so-called
self-supervised learning so what that
looks like in the context of large
language models is you train a very
large model on a very large Corpus of
data and so what this can look like is
if you're trying to build a model that
can do language classification instead
of painstakingly generating this labeled
data set you can just take a corpus of
English text and a corpus of Spanish
text and train a model in a
self-supervised way so in contrast to
supervised learning self-supervised
learning does not require manual
labeling of each example in your data
set the so-called labels or targets for
the model are actually defined from the
inherent structure of the data or in
this context of the text so you might be
thinking to yourself how does this
self-supervised learning actually work
and so one of the most popular ways that
this is done is the next word prediction
Paradigm so suppose we have this text
listen to your and we want to predict
what the next word would be but clearly
there's not just one word that can go
after the string of words there are
actually many words you can put after
this text and it would make sense in
this next word prediction Paradigm what
the language model is trying to do is to
predict the probability distribution of
the next word given the previous
words what this might look like is
listen to your heart might be the most
probable next word but another likely
word could be gut or listen to your body
or listen to your parents and listen to
your grandma and so this is essentially
the core task that these large language
models are trained to do and the way the
large language model will learn these
probabilities is that it'll see so many
examples in this massive Corpus of text
that is trained on and it has a massive
number of internal parameters so it can
efficiently represent all the different
statistical associations with different
words and an important Point here is
that context matters if we simply added
the word don't to the front of this
string here and it changed it to don't
listen to your then this probability
distribution could look entirely
different because just by adding one
word before this sentence we completely
change the meaning of the sentence and
so to put this a bit more mathematically
and I promise this is the most technical
thing in this video this is an example
of a auto regression task so Auto
meaning self regression meaning you're
trying to predict something so what this
notation means is what is the
probability of the nth word or more
technically the nth token given the
preceding tokens so n minus 1 n minus
2 n minus 3 and so on and so
forth and so if you really want to boil
everything down this is the core task
most large language models are doing and
somehow through this very simple task of
predict the next word we get this
incredible performance from tools like
chat GPT and other large language models
so now with that Foundation said
hopefully you have a decent
understanding of what large language
models are and how they work and a
broader context for them now let's talk
about how we can use these in practice
here I will talk about three levels in
which we can use large language models
these three levels are ordered by the
technical expertise and computational
resources required the most accessible
way to use large language models is
prompt engineering next we have model
fine tuning and then finally we have
build your own large language model so
starting from level one prompt
engineering here I have a pretty broad
definition of prompt engineering here I
Define it as just using an llm out of
the box so more specifically not
touching any of the model parameters so
of these tens of billions or hundreds of
billions of parameters that Define the
model we're not going to touch any of
them we're just going to leave them as
is here I'll talk about two ways we can
do this one is the easy way and I'm sure
is the way that most people in the world
have interacted with large language
models which is using things like ChatGPT
these are like intuitive user
interfaces they don't require any code
and they're completely free anyone can
just go to the ChatGPT website type in
a prompt and it'll spit out a response
so while this is definitely the easiest
way to do it it is a bit restrictive in
that you have to go to their website
this doesn't really scale well if you're
trying to build a product or service
around it but for a lot of use cases
this is actually super helpful so for
applications where the easy way doesn't
cut it there is the less easy way which
is using things like the OpenAI API or
the Hugging Face Transformers library
and these tools provide ways to interact
with large language models
programmatically so essentially using
Python in the case of the OpenAI API
instead of typing your request in the
ChatGPT user interface you can send it
over to OpenAI using Python and their
API and then you will get a response
back of course their API is not free so
you have to pay per API call another way
we can do this is via open source
solutions one of which is the Hugging
Face Transformers library which gives
you easy access to open source large
language models so it's free and you can
run these models locally so no need to
send your potentially proprietary or
confidential information to a third
party like OpenAI so future videos of
the series will dive into all these
different aspects I'll talk about the
OpenAI API what it is how it works share
example code I'll dive into the Hugging
Face Transformers library same situation
what the heck is it how does it work and
then sharing some python example code
there I'll also do a video talking about
prompt engineering more generally how
can we create prompts to get good
responses from large language models and
so while prompt engineering is the most
accessible way to work with large
language models just working with a
model out of the box may give you
sub-optimal performance on a specific
task or use case or the model has really
good performance but it's massive it has
like a hundred billion parameters so
question might be is there a way we can
use a smaller model but kind of tweak it
in a way to have good performance on our
very narrow and specific use case and so
this brings us to level two which is
model fine tuning which here I Define as
adjusting at least one internal model
parameter for a particular task and so
here there are just generally two steps
one you get a pre-trained large language
model maybe from OpenAI or maybe an
open-source model from the Hugging Face
Transformers library and then you update
the model parameters given task specific
examples kind of going back to the
supervised learning versus
self-supervised learning the pre-trained
model is going to be a self-supervised
model so it will be trained on this
simple word prediction task but in step
two here's where we're going to do
supervised learning or even
reinforcement learning to tweak the
model parameters for a specific use case
and so this turns out to work very well
models like ChatGPT you're not working
with the raw pre-trained model the model
that you are interacting with in ChatGPT
is actually a fine-tuned model
developed using reinforcement learning
and so a reason why this might work is
that in doing this self-supervised task
and doing the word prediction the base
model this pre-trained large language
model is learning useful representations
for a wide variety of tasks so in a
future video I will dive in more deeply
into fine tuning techniques popular one
is low-rank adaptation or LoRA for
short and then another popular one is
reinforcement learning from human
feedback or RLHF and of course there is
a third step here you'll deploy your
fine-tuned large language model to do
some kind of service or you know use it
in your day-to-day life and you'll
profit somehow and so my sense is
between prompt engineering and model
fine tuning you can probably handle 99%
of large language model use cases and
applications however if you're a large
organization large Enterprise and
security is a big concern so you don't
want to use open source models or you
don't want to send data to a third party
via an API and maybe you want your large
language model to be very good at a
relatively specific set of tasks you
want to customize the training data in a
very specific way and you want to own
all the rights have it for commercial use
all this kind of stuff then it can make
sense to go one step further beyond
model fine-tuning and build your own large
language model and so here I Define it
as just coming up with all the model
parameters so I'll just talk about how
to do this at a very high level here and
I'll leave technical details for a
future video in the series first we need
to get our data and so what this might
look like is you'll get a book Corpus a
Wikipedia Corpus and a python Corpus and
so this is billions of tokens of text
and then you will take that and
pre-process it refine it into your
training data set and then you can take
the training data set and do the model
training through self-supervised
learning and then out of that comes the
pre-trained large language model so you
can take this as your starting point for
level two and go from there and so if
you enjoyed this video and you want to
read more be sure to check out the blog
in towards data science there I share
some more details that I may have missed
in this video this series is both a
video and blog Series so each video will
have an Associated blog and there will
also be tons of example code on the
GitHub repository the goal of the
series is to really just make
information about large language models
much more accessible I really do think
this is the technological innovation of
our time and there's so many
opportunities for potential use cases
applications products services that can
come out of large language models and
that's something that I want to support
I think we'll be better off if more
people understand this technology and
are applying it to solving problems so
with that be sure to hit the Subscribe
button to keep up with future videos in
the series if you have any questions or
suggestions for other topics I should
cover in this series please drop those
in the comments section below and as
always thank you so much for your time
and thanks for watching