GPT2 implemented in Excel (Spreadsheets-are-all-you-need) at AI Tinkerers Seattle
Summary
TL;DR: In this presentation, Isan showcases a personal side project called 'Spreadsheets Are All You Need'. He opens with a short Python script that runs GPT-2, then shows a spreadsheet that re-implements the same model entirely in spreadsheet functions, with no code and no API calls, letting users interact with and understand the mechanics of a Transformer directly. The project, which he describes as both a teaching tool and a fascinating exploration of AI's inner workings, provides insights into the attention mechanism and the Chain of Thought prompting technique. Isan also shares the challenges of implementing such a model in a spreadsheet and offers resources for further exploration.
Takeaways
- 👨‍💻 The speaker, Isan, works at an edge compute platform that powers 4% of the internet and is open to discussing AI and Edge projects.
- 🚀 Isan has a personal side project called 'Spreadsheets Are All You Need', demonstrating the use of spreadsheets for complex tasks like running AI models.
- 🐍 The demo baseline is roughly 40-50 lines of Python using Hugging Face Transformers to run GPT-2 small with a very short prompt at zero temperature (a hedged sketch of such a script follows this list).
- 📊 Isan has created a spreadsheet that implements GPT-2 without any API calls or Python, using only spreadsheet functions.
- ⏱️ The spreadsheet's computation is resource-intensive and can take about a minute to process, with a warning against running it on a Mac due to potential UI lockups.
- 🛠️ The spreadsheet serves as a teaching tool, analogous to the way computer architecture courses help understand system building and programming.
- 🎥 Isan is creating a series of videos that walk through every step of the GPT-2 implementation, focusing on the inference pass.
- 🔍 The spreadsheet allows for hands-on exploration of the Transformer model, including attention mechanisms and the Chain of Thought prompting technique.
- 🤖 The project provides visceral insight into why the landmark Transformer paper was titled 'Attention Is All You Need': tokens only interact with each other once per layer, at the attention mechanism.
- 📈 The speaker shares the 'weights' tab of the spreadsheet, which contains a massive amount of data, including all 124 million parameters of GPT-2.
- 🚫 The project has limitations: the context is capped at 10 tokens because of how the weight matrices are laid out, and expanding it would mean rearranging the whole sheet by hand, since it was not built programmatically.
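A minimal sketch of the Python baseline described above (the talk does not show its exact code, so this assumes the standard Hugging Face Transformers API): GPT-2 small with greedy decoding, the deterministic equivalent of temperature zero.

```python
# Hedged sketch, not the speaker's exact script: GPT-2 small via
# Hugging Face Transformers, greedy decoding (zero-temperature equivalent).
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # "gpt2" = GPT-2 small (124M params)
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "Mike is quick. He moves"
inputs = tokenizer(prompt, return_tensors="pt")

# do_sample=False selects the argmax token at each step (greedy decoding).
output = model.generate(**inputs, max_new_tokens=1, do_sample=False)
print(tokenizer.decode(output[0]))  # the talk's expected completion: "... quickly"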
Q & A
What is Isan's day job?
-Isan works at an edge compute platform that powers 4% of the internet.
What is the purpose of Isan's personal side project?
-Isan's personal side project aims to demonstrate and teach how Transformer models like GPT-2 work, using a unique tool: a spreadsheet that implements GPT-2 without any API calls or Python code.
How does Isan's spreadsheet project work?
-The spreadsheet project uses Excel functions to implement GPT-2, allowing users to input prompts and receive outputs without writing any code. It's designed to be an educational tool for understanding the Transformer model and its mechanisms.
What are the benefits of using a spreadsheet to teach about GPT-2?
-Using a spreadsheet to teach about GPT-2 gives both non-developers and developers a tangible, approachable way to understand the model's architecture and functionality. Users can visually track the flow of information and make changes to see their effects.
What is the 'Chain of Thought' prompting technique mentioned in the script?
-Chain of Thought prompting is a technique where the AI is prompted to reason through a problem step by step. Anthropomorphically, it resembles a person thinking aloud; the talk's more technical grounding is that the extra tokens give the model more vectors to compute against and more passes through the attention mechanism.
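As a hedged illustration (these prompts are hypothetical, not taken from the talk), the contrast between a direct prompt and a Chain of Thought prompt might look like this:

```python
# Hypothetical prompts contrasting direct prompting with Chain of Thought.
# The CoT variant invites a step-by-step completion, giving the model more
# token positions to compute over before it commits to an answer.
direct_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of 3 balls each. "
    "How many balls does he have now?\nA:"
)
cot_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of 3 balls each. "
    "How many balls does he have now?\nA: Let's think step by step."
)
```

In the talk's framing, each extra reasoning token is another vector flowing through the model and another pass through the attention mechanism before the final answer token is produced.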
What issues did Isan encounter while implementing GPT-2 in a spreadsheet?
-Isan faced challenges such as the spreadsheet's large size causing the Mac UI to lock up randomly, and the difficulty of implementing byte pair encoding and positional embeddings within Excel's limitations.
How can one access Isan's spreadsheet project?
-The spreadsheet project can be accessed at spreadsheets-are-all-you-need.ai, where users can watch the videos, download the spreadsheet, and report bugs or ask questions.
What is the significance of the attention mechanism in Transformers?
-The attention mechanism in Transformers allows the model to focus on different parts of the input sequence when generating each output element. It helps the model to handle long-range dependencies and understand the context better, which is crucial for tasks like text understanding and generation.
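A minimal NumPy sketch of the mechanism the spreadsheet lays out cell by cell: single-head scaled dot-product attention with a causal mask (the query/key/value matrices are assumed to be already projected; variable names are illustrative):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))  # subtract max for numerical stability
    return e / e.sum(axis=-1, keepdims=True)

def causal_attention(Q, K, V):
    """Single-head scaled dot-product attention over (seq_len, d_head) arrays."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])        # pairwise token-to-token affinities
    # Causal ("triangle") mask: each token may attend only to itself and
    # earlier positions, as in the spreadsheet's masked attention step.
    mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    weights = softmax(np.where(mask, -1e9, scores))  # rows sum to 1, e.g. "he" -> "Mike"
    return weights @ V
```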
How does the spreadsheet demonstrate the 'attention is all you need' concept?
-The spreadsheet visually shows how the attention mechanism works by allowing users to see how tokens interact with each other at each layer. It provides a clear demonstration of how the model attends to different parts of the input, which is a core concept in Transformer models.
What was Isan's experience with running GPT-2 from source?
-Isan found running GPT-2 from source to be a challenging experience due to the complexity of the process and the difficulty in getting a working environment set up, particularly with TensorFlow 1.x.
What is the size of the spreadsheet in terms of parameters and file size?
-The spreadsheet contains all 124 million parameters of GPT-2 and is 1.5 GB in the Excel binary format (for scale, the raw weights alone at 4 bytes per float32 parameter come to roughly 0.5 GB).
Outlines
🚀 Introduction to AI and Edge Computing
The speaker, Isan, introduces himself and his work at an edge compute platform that powers 4% of the internet, inviting those with AI- and Edge-related projects to connect. The main focus of the talk, however, is a personal project called 'Spreadsheets Are All You Need.' He starts from a baseline: a short Python script running GPT-2 small at zero temperature on the prompt 'Mike is quick. He moves', chosen because its completion ('quickly') is obvious, making the model's prediction easy to verify.
📊 Spreadsheet Implementation of GPT-2
Isan demonstrates a spreadsheet that implements GPT-2 without any API calls or Python code, using only spreadsheet functions. He explains the process of running the model within Excel, including the need to manually trigger recalculation because the computation takes about a minute. He warns against running it on a Mac due to UI lockups caused by a threading bug, noting the same sheet never locked up on a PC. He positions the tool as an educational resource, akin to a computer architecture course, for understanding the underlying mechanisms of Transformers and LLMs. He also discusses the benefits of this approach: it is accessible to non-developers, and it offers developers practical insights, including a deeper understanding of concepts like the attention mechanism in Transformers.
🔍 Deep Dive into GPT-2's Architecture in Spreadsheet
Isan continues to explore the intricacies of GPT-2's architecture within the spreadsheet. He walks through the various components, such as the attention mechanism, residual connections, and multi-layer perceptron, explaining how they function and interact. He also offers a technical explanation for why Chain of Thought prompting works: it gives the model more vectors and more passes through attention. Finally, he shares the practical pain points, such as the difficulty of running GPT-2 from source, the 10-token context limit of the Excel implementation, and the need to manually rearrange matrices to expand it.
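For orientation, here is a hedged NumPy sketch of what one of the spreadsheet's twelve blocks computes, in GPT-2's pre-norm ordering (parameter names are illustrative, not the spreadsheet's; attention is shown single-headed for brevity):

```python
import numpy as np

def layer_norm(x, g, b, eps=1e-5):
    mu, var = x.mean(-1, keepdims=True), x.var(-1, keepdims=True)
    return g * (x - mu) / np.sqrt(var + eps) + b

def gelu(x):
    # GPT-2 uses the tanh approximation of GELU in its MLP
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def causal_attention(q, k, v):
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(np.triu(np.ones(scores.shape, dtype=bool), k=1), -1e9, scores)
    w = np.exp(scores - scores.max(-1, keepdims=True))
    return (w / w.sum(-1, keepdims=True)) @ v      # softmax over visible positions

def transformer_block(x, p):
    """One block: LN -> attention -> residual, then LN -> MLP -> residual.
    x: (seq_len, 768); p: dict of this block's weights (names are illustrative).
    GPT-2 small actually splits attention into 12 heads; one is shown here."""
    h = layer_norm(x, p["ln1_g"], p["ln1_b"])
    x = x + causal_attention(h @ p["wq"], h @ p["wk"], h @ p["wv"]) @ p["wo"]
    h = layer_norm(x, p["ln2_g"], p["ln2_b"])
    h = gelu(h @ p["fc_w"] + p["fc_b"])            # up-projection: 768 -> 3072
    return x + h @ p["proj_w"] + p["proj_b"]       # down-projection: 3072 -> 768
```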
Keywords
💡Edge Computing
💡AI
💡Transformers
💡GPT-2
💡Spreadsheets
💡Inference
💡Attention Mechanism
💡Chain of Thought Prompting
💡Positional Encoding
💡Layer Normalization
💡Residual Connections
Highlights
Isan works at an edge compute platform that powers 4% of the internet.
Isan's personal side project is called 'Spreadsheets are all you need'.
The project involves using 40-50 lines of Python code with Hugging Face Transformers and GPT-2 small at zero temperature.
Isan demonstrates a spreadsheet that implements GPT-2 without any API calls or Python, using only spreadsheet functions.
The spreadsheet requires manual recalculation due to its computational intensity.
Running the spreadsheet on a Mac may cause the UI to lock up due to threading issues.
The project serves as a teaching tool for understanding Transformers and LLMs.
Isan is creating videos that walk through every step of the GPT-2 implementation for the inference pass.
The spreadsheet allows for a hands-on, approachable understanding of AI models, even for non-developers.
Isan explains the attention mechanism in Transformers and how information flows through the model.
Chain of Thought prompting is explained technically: the model gets more space, more vectors to compute against, and more passes through the attention mechanism.
The spreadsheet contains the entire GPT-2 model, including all 124 million parameters.
The spreadsheet is 1.5 GB in size and is hosted as a GitHub release because the file is too large to upload otherwise.
Isan shares his experience of trying to run GPT-2 from source and the challenges he faced.
The project aims to provide a visual and interactive learning experience for computer science concepts related to AI.
Isan's project has the potential to demystify AI models and make them more accessible to a wider audience.
The spreadsheet includes detailed components such as token embeddings, positional embeddings, and attention values.
Isan's project showcases the complexity and depth of AI models in a tangible and understandable way.
Transcripts
Hi everyone, I'm Isan. My day job is actually working at an edge compute platform that powers 4% of the internet, so if you've got something that you think AI and Edge can help out with, let me know, I'd like to talk to you. But today is a personal side project I call Spreadsheets Are All You Need.

So let me just cut to the punchline here. Let's move here, let's blow this up. Can you guys see that? Yeah? Okay. It's about 40-50 lines of Python code, just Hugging Face Transformers running GPT-2 small, using a very short prompt at zero temperature. So I'm going to put in a really simple prompt; once it decides to process... there we go: "Mike is quick. He moves." And I like this prompt because it's small and the completion is really obvious, right? What would you expect about Mike, knowing that he's quick? Well, that he moves quickly.
So now let me go over here. This is a spreadsheet that also implements all of GPT-2 small: no API calls, no Python, entirely in spreadsheet functions. [Audience: What?!] So here's what we're going to do: we are going to push this button. You see where it is here? That's where your predicted token is going to come in. It's not doing it right now because it takes so long to run; I turned off automatic recalculation. There's a mode in Excel where you've got to push a button to recalculate. So I'm going to hit this button, and then we should get it, quickly, right here. You guys ready? Yeah? Okay, here we go.
Okay, now you can see at the bottom, I don't know if you can see way in the back, it's like calculating, calculating, calculating. This is going to take about a minute, just to warn you guys. And by the way, do not run this on a Mac. It is so big that the Mac UI will lock up on you for a minute, randomly, at times. There's no reason it should do that; somebody messed up on the threading. [Audience: You're running it on a Mac?] I am running it on a Mac. Actually, after making this whole thing work, for months, dealing with it randomly stopping on me for a minute at a time, I tried it on my wife's PC and it never locked up. So somebody messed up on the implementation. There you go.
Okay, so: good God, why would anyone do this? Besides just being a masochist. Well, a couple of reasons. Obviously you're not going to run production workloads on this, right? It's really a teaching tool. If you've had formal computer science training, there's a class usually called computer architecture or computer organization: they start with circuits, they build a NAND gate, then logic gates, then an ALU, and then go all the way to a microprocessor. Even if you're not going into chip design, it gives you a really good grounding when you're actually programming and building systems on top of those processors. What I'm trying to do with the spreadsheet is the same thing, but for Transformers and LLMs.
So I'm creating a bunch of videos, there are only two so far, where I'm walking through every step of the GPT-2 implementation, at least for the inference pass: from byte pair encoding all the way through the layer norms, the residual connections, the multi-headed attention, and the multi-layer perceptron, and getting the logits up to the prediction, step by step.
And I think there are two benefits. One is that, as a non-developer, this is really approachable; there's something visceral about being able to play with it. But even as a developer I think it's useful; I'll just give you one example. Personally: oh my God, I now know why they called it "Attention Is All You Need." What else are you going to call it? When you watch the information flow, especially in a spreadsheet, of everything going through the Transformer, it's really crazy: you type all these tokens in, and they only talk to each other once at each layer, at the attention mechanism. You can actually take the diffs: you create an additional tab in the spreadsheet and just see how they change, and make changes, and you're like, it's amazing, there's only one time they talk to each other.
That seems a little theoretical, but a practical example would be Chain of Thought prompting, which you guys have probably heard of. We can think of that anthropomorphically: oh yeah, as a person, if you asked me to think aloud, I might reason better. But if you really want to ground it, the more technically satisfying theory, the one I kind of subscribe to, that some people believe, and that makes a lot of sense when you see the information flow, is that what you're really doing is, one, giving it more space, more vectors to compute against, but you're also giving it more hits at that attention mechanism, more passes at it.
Okay, so that's, I think, my one-minute timer, or even less. [Host: You can keep going.] Okay, so that's it, I'll wrap up. If you want to download the spreadsheet, go to spreadsheets-are-all-you-need.ai, where you can see the videos, download the spreadsheet, and let me know if you find bugs or have questions. Thank you.

[Audience: I want to see the weights tab so bad!] Oh, you want to see the weights tab? Oh my God, okay. So, first of all:
okay so okay so first of all okay so
you've got the prompt to tokens that's
here these are some random constants
this is where whoops this is where I
actually do bite pair and coating inside
a spreadsheet that's the hardest thing
to do inside this thing because it's not
matte mold it's all con cats but then
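For readers following along, byte pair encoding is repeated string lookup-and-merge rather than matrix math, which is why it maps so awkwardly onto spreadsheet formulas. A minimal sketch of the same step in Python, assuming the standard Hugging Face tokenizer:

```python
from transformers import GPT2Tokenizer

# BPE merges bytes into subword tokens by string operations, not matrix
# multiplies -- hence all CONCATs rather than MMULTs in the spreadsheet.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
ids = tokenizer.encode("Mike is quick. He moves")
print(ids)                                   # the subword token ids
print(tokenizer.convert_ids_to_tokens(ids))  # the merged subword strings
```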
But then you get here: this is where we convert them to embeddings. So here are your text embeddings, and then here are your positional embeddings. In one of the videos there's a really great demo: you change one of the tokens so that it's duplicated between the top and the bottom, and you can see that the two are identical here, but they're not identical here, because the positional encoding has changed them.
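A hedged sketch of what those two steps compute (array names and token ids are illustrative): each token id indexes a row of the token-embedding matrix, and a per-position row from the positional-embedding matrix is added on top, which is exactly why duplicated tokens stop being identical after this step.

```python
import numpy as np

vocab_size, n_ctx, d_model = 50257, 1024, 768  # GPT-2 small dimensions
wte = np.random.randn(vocab_size, d_model)     # token embeddings (learned in the real model)
wpe = np.random.randn(n_ctx, d_model)          # positional embeddings (also learned, not sinusoidal)

token_ids = [101, 202, 101]                    # illustrative ids; positions 0 and 2 repeat a token
x = wte[token_ids] + wpe[np.arange(len(token_ids))]

# The duplicated token has identical rows before positions are added...
print(np.allclose(wte[token_ids][0], wte[token_ids][2]))  # True
# ...but different rows after adding the positional embeddings:
print(np.allclose(x[0], x[2]))                            # False
```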
Then here are the blocks, and there are 12 of these; this is block zero. Each of these has... whoa, there's a lot here. Oh, this is what happens when you run in Parallels: it randomly decides to start scrolling. [Audience: Wait, is that it running?] That's not it running, that's it scrolling. Yeah, welcome to running in a VM; that's also part of why it's slow. I don't know if you can see the 16 here: there are 16 steps you can follow along, each layer inside one block. So here's the residual connection, here are your attention values, here's the linear projection of that, let's see, there's the residual connection, there's the layer normalization.
And look, I'll click on this: this is all spreadsheet. There's an MMULT right there that's massively long; this was a serious amount of time spent. Oh yeah, this is your attention matrix in here. So: "Mike", "is", "quick", "he", "moves", right? And then you can see it reference... oh no, that's not the multi-head, this is the softmax, this is what I was going for. So: "Mike is quick. He", and this is one head, "moves", right? And you can see it actually looking at itself. So this would be the "Mike" here, and you can see "he" is looking at "Mike": see, that's a 0.73, so it's referencing that. But keep in mind this is one head. And there's a really good OpenAI paper where they actually ask GPT-4 to explain parts of GPT-2, and I think it'd actually be really interesting to put that into the spreadsheet and take a look.
But anyway, you're asking about the weights. So here's your 11th block; one block feeds into the other. This formula basically calculates, from which block it is, which weight to use. Do you see these names right here? There is a version of this name for every single weight. So this is the layer norm, this is the predicted token, this is your IDs-to-tokens, this is your triangle mask, right, for causal attention. This is the most ridiculous thing I've ever done. And this is your positional encoding. By the way, one of the things I was doing was asking ChatGPT to tell me about the architecture of GPT while I was building it, and it gets some things wrong: it told me the positional encodings were sinusoids, even though they're learned. I was like, no, you're wrong, and then it finally apologized. But it was really helpful most of the time.
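The Hugging Face checkpoint exposes an analogous per-weight naming scheme, so a hedged way to see the inventory that the spreadsheet's named ranges mirror is simply to enumerate it:

```python
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")
# One named tensor per weight per block, mirroring the spreadsheet's names,
# e.g. transformer.h.11.attn.c_attn.weight with shape (768, 2304).
for name, tensor in model.named_parameters():
    print(name, tuple(tensor.shape))

# Summing the element counts recovers the figure quoted in the talk: ~124M.
print(sum(t.numel() for t in model.parameters()))
```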
So then, okay, here's your attention weights. Remember, I counted these at one point, but there is weight matrix after weight matrix after weight matrix, all the way down: all 124 million parameters are in here. This sheet is 1.5 GB in the Excel binary format, not the XML format, so I couldn't just put it anywhere. Like, where do I post this thing? It's hosted right now as a release on GitHub, because I couldn't upload it as just an Excel file; it was too big. And then the other problem is that it's really limited: 10, uh, tokens. If I want to expand it, I have to rearrange the whole matrix. [Audience: Did you make it programmatically?] Wait, what? I did not. After I did it, I was like, oh, I should have done this programmatically. You know in that movie The Matrix, when you're looking at the numbers? That's what it felt like.
I really started playing around in June. Oh, the other problem: I tried running GPT-2 from source. Don't do that. It's written in TensorFlow 1, and it's really hard to get a working environment; there's a Colab notebook you've got to use. Yeah, you've got to use that, but not on a Mac. Anyway, I should wrap up.

[Host: Okay, that was amazing. I'm just going to do a couple of quick things, first of all...]