Training Your Own AI Model Is Not As Hard As You (Probably) Think
Summary
TL;DR: This video outlines a practical approach to training specialized AI models, offering a more efficient and cost-effective alternative to large, off-the-shelf models like GPT-3 and GPT-4. The speaker shares their experience with pre-existing models and the challenges they faced, which led to the decision to train a custom model. They detail the process of breaking down the problem, identifying the right model type, generating high-quality training data, and leveraging tools like Google's Vertex AI for training and deployment. The result is a faster, cheaper, and more customizable solution tailored to specific use cases, emphasizing the value of specialized AI models over general ones.
Takeaways
- 🤖 Custom AI models can be easier to train than expected with basic development skills.
- 🚫 Using off-the-shelf large models like GPT-3 and GPT-4 can be slow, expensive, and difficult to customize.
- 🔍 Breaking down a problem into smaller pieces is crucial for training a specialized AI model.
- 🛠️ Before building an AI, explore if the problem can be solved with existing models or plain code.
- 💡 Training a large model is costly and time-consuming, and may not be necessary for all problems.
- 📈 Small and specialized models can yield faster, cheaper, and more predictable results tailored to specific use cases.
- 📚 Generating high-quality example data is essential for training an effective AI model.
- 🔍 Object detection models can be repurposed for novel use cases, such as identifying elements in design files.
- 🛑 Quality assurance of the training data is critical to ensure the accuracy of the AI model.
- 🌐 Tools like Google's Vertex AI can simplify the process of training AI models without extensive coding.
- 🔧 Combining specialized AI models with plain code can create a robust and efficient solution for complex problems.
Q & A
Why did the speaker find using an off-the-shelf large model like GPT-3 or GPT-4 unsuitable for their needs?
-The speaker found using an off-the-shelf large model unsuitable because the results were disappointing, slow, expensive, unpredictable, and difficult to customize for their specific use case.
What were the benefits of training their own AI model according to the speaker's experience?
-Training their own AI model resulted in over 1,000 times faster and cheaper outcomes, more predictable and reliable results, and greater customizability compared to using a large, pre-existing model.
What is the first step the speaker suggests when trying to solve a problem with AI?
-The first step is to break down the problem into smaller pieces and explore if it can be solved with a pre-existing model to understand its effectiveness and potential for replication by competitors.
Why did the speaker's attempt to use a pre-existing model for converting Figma designs into code fail?
-The attempt failed because the model was unable to handle raw JSON data from Figma designs and produce accurate React components, resulting in highly unpredictable and often poor outcomes.
What is the main drawback of training a large AI model according to the speaker?
-The main drawbacks are the high cost and time required to train a large model, the complexity of generating the necessary data, and the long iteration cycles that can take days for training to complete.
What approach does the speaker recommend if a pre-existing model does not work well for a specific use case?
-The speaker recommends trying to solve as much of the problem without AI, breaking it down into discrete pieces that can be addressed with traditional code, and then identifying specific areas where a specialized AI model could be beneficial.
What are the two key things needed to train a specialized AI model according to the script?
-The two key things needed are identifying the right type of model and generating lots of example data to train the model effectively.
How did the speaker generate example data for training their specialized AI model?
-The speaker wrote a simple crawler that uses a headless browser to pull up websites, evaluate JavaScript to identify images and their bounding boxes, and programmatically generate a large amount of training data.
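As an illustration of that crawling approach (my sketch, not the speaker's actual code), a headless-browser crawler could be built with Playwright. The JavaScript snippet, the `label` name, and the example structure are assumptions for illustration.

```python
# Sketch of a headless-browser crawler that records image bounding boxes
# as object-detection training examples. Illustrative only.

# JavaScript evaluated in the page: collect one bounding rect per <img>.
COLLECT_IMAGE_RECTS_JS = """
() => Array.from(document.querySelectorAll('img')).map(img => {
  const r = img.getBoundingClientRect();
  return { x: r.x, y: r.y, width: r.width, height: r.height };
})
"""

def rects_to_example(url, rects, label="image"):
    """Turn raw DOM rects into one training example, dropping empty boxes."""
    boxes = [
        {"label": label,
         "x_min": r["x"], "y_min": r["y"],
         "x_max": r["x"] + r["width"], "y_max": r["y"] + r["height"]}
        for r in rects
        if r["width"] > 0 and r["height"] > 0  # skip hidden/zero-size images
    ]
    return {"url": url, "boxes": boxes}

def crawl(url, screenshot_path="page.png"):
    """Open the page headlessly, screenshot it, and extract image boxes.

    Requires `pip install playwright` and `playwright install chromium`.
    """
    from playwright.sync_api import sync_playwright  # imported lazily
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        page.screenshot(path=screenshot_path)  # the image the model trains on
        rects = page.evaluate(COLLECT_IMAGE_RECTS_JS)
        browser.close()
    return rects_to_example(url, rects)
```

Running `crawl` across a list of public URLs would yield one screenshot plus labeled boxes per page, which is the raw material for the dataset described above.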
What is the importance of data quality in training an AI model as emphasized in the script?
-Data quality is crucial because the quality of the AI model is entirely dependent on it. The speaker manually verified and corrected the bounding boxes in the generated examples to ensure high-quality training data.
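A small automated check can catch many bad examples before the manual pass. This sketch is my illustration, not the speaker's tooling; it flags boxes that are inverted, degenerate, or fall outside the image, so a human only has to review the suspicious examples.

```python
def box_errors(box, img_width, img_height):
    """Return a list of problems with one bounding box (empty list = OK).

    `box` is assumed to be a dict with pixel coords x_min/y_min/x_max/y_max.
    """
    errors = []
    if box["x_min"] >= box["x_max"] or box["y_min"] >= box["y_max"]:
        errors.append("degenerate or inverted box")
    if box["x_min"] < 0 or box["y_min"] < 0:
        errors.append("negative coordinate")
    if box["x_max"] > img_width or box["y_max"] > img_height:
        errors.append("box extends past the image")
    return errors

def filter_examples(examples):
    """Split examples into (clean, needs_review) for the manual QA pass."""
    clean, needs_review = [], []
    for ex in examples:
        bad = any(box_errors(b, ex["width"], ex["height"]) for b in ex["boxes"])
        (needs_review if bad else clean).append(ex)
    return clean, needs_review
```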
How did the speaker utilize Google's Vertex AI in the process of training their AI model?
-The speaker used Vertex AI for uploading and verifying the training data, choosing the object detection model type, and training the model without needing to write code, utilizing Vertex AI's built-in tools and UI.
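For programmatic upload, Vertex AI image object detection datasets accept a JSONL import file where each line points at an image in Cloud Storage and lists normalized (0 to 1) bounding boxes. The field names below reflect my understanding of that import schema; verify them against Google's current Vertex AI documentation before relying on them.

```python
import json

def to_vertex_jsonl_line(gcs_uri, boxes, img_width, img_height, label="image"):
    """Build one JSONL line for a Vertex AI object-detection dataset import.

    Pixel boxes are normalized to the 0-1 range the import format expects.
    Field names are my best understanding of the schema; double-check
    against the current Vertex AI docs.
    """
    return json.dumps({
        "imageGcsUri": gcs_uri,  # e.g. "gs://my-bucket/screenshots/page.png"
        "boundingBoxAnnotations": [
            {
                "displayName": label,
                "xMin": b["x_min"] / img_width,
                "xMax": b["x_max"] / img_width,
                "yMin": b["y_min"] / img_height,
                "yMax": b["y_max"] / img_height,
            }
            for b in boxes
        ],
    })
```

Writing one such line per screenshot produces a file the dataset import can consume once the file and the images have been copied to a Cloud Storage bucket.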
What role did an LLM play in the final step of the speaker's AI model pipeline?
-An LLM was used for the final step of customizing the code, making adjustments and providing new code with small changes based on the baseline code generated by the specialized models.
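The final customization step could look roughly like this sketch. The prompt wording and model name are placeholders, and the call follows the OpenAI Python client (v1+); none of this is the speaker's actual implementation.

```python
def build_customization_prompt(baseline_code, instruction):
    """Compose chat messages asking an LLM for a small edit to baseline code."""
    return [
        {"role": "system",
         "content": "You edit code. Return only the full updated file, no prose."},
        {"role": "user",
         "content": f"Apply this change: {instruction}\n\n```tsx\n{baseline_code}\n```"},
    ]

def customize_code(baseline_code, instruction, model="gpt-4o"):
    """Send baseline code plus an instruction to an LLM (makes a network call).

    Requires `pip install openai` and an OPENAI_API_KEY in the environment;
    the model name is a placeholder.
    """
    from openai import OpenAI  # imported lazily to keep the module importable
    client = OpenAI()
    resp = client.chat.completions.create(
        model=model,
        messages=build_customization_prompt(baseline_code, instruction),
    )
    return resp.choices[0].message.content
```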
Outlines
🤖 Custom AI Model Training Over Large Pre-built Models
The speaker discusses the advantages of training a custom AI model over using large, pre-built models like OpenAI's GPT-3 and GPT-4. They share their experience where using a large language model (LLM) was slow, expensive, unpredictable, and difficult to customize. Instead, they trained a smaller, specialized model that was faster, cheaper, more predictable, and customizable. The speaker emphasizes the importance of breaking down a complex problem into smaller, manageable pieces and suggests exploring pre-existing models first before considering custom model training. They also touch on the challenges of training large models, such as cost, time, and data availability.
🛠️ Building a Specialized AI Model for Image Identification
The speaker outlines the process of creating a specialized AI model for identifying images within a Figma design and converting them into code. They describe using an object detection model to locate specific types of objects in an image, which in this case are groups of vectors that should be compressed into a single image for code generation. The speaker details the importance of generating high-quality example data, using a simple crawler with a headless browser to create training data. They also discuss the necessity of manually verifying and correcting the data to ensure the model's accuracy. The use of Google's Vertex AI for training the model without coding is highlighted, along with the steps to upload data, verify it, and train the model. The speaker concludes with the successful deployment of the model and its integration into a tool for generating responsive, pixel-perfect code.
🚀 Leveraging AI and Plain Code for a Robust Toolchain
The speaker wraps up by summarizing the process of integrating specialized AI models with plain code to create a powerful toolchain. They advocate for testing an LLM for specific use cases and, if it falls short, relying on plain code solutions where possible. For areas where AI is necessary, they suggest finding or training a specialized model and generating custom data. The speaker also mentions using an LLM for the final step of code customization, despite its drawbacks, because it offers the best solution for that particular task. They invite viewers to explore a detailed blog post for further insights and express excitement for the innovative engineering projects that can be built using this approach.
Keywords
💡AI Model
💡LLM (Large Language Model)
💡Customization
💡Figma Design
💡Object Detection Model
💡Data Generation
💡Quality Assurance (QA)
💡Google's Vertex AI
💡Bounding Boxes
💡Confidence Threshold
💡Code Generation
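The "Confidence Threshold" idea above is simple to apply in code: keep only predictions whose score clears a cutoff. The 0.2 value comes from the video; the parallel-lists prediction shape below is a generic assumption, not Vertex AI's exact response format.

```python
def filter_predictions(boxes, scores, threshold=0.2):
    """Keep only detections whose confidence clears the threshold.

    `boxes` and `scores` are parallel lists, as object-detection endpoints
    commonly return them; the exact response shape varies by service.
    """
    return [box for box, score in zip(boxes, scores) if score >= threshold]
```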
Highlights
Training your own AI model can be easier and more cost-effective than using large off-the-shelf models.
Using an off-the-shelf large language model (LLM) like OpenAI's GPT-3 and GPT-4 for specific tasks resulted in slow, expensive, and unpredictable outcomes.
Training a specialized model can yield over 1,000 times faster and cheaper results with better customization.
Breaking down a complex problem into smaller, solvable parts is a recommended approach before considering AI solutions.
Pre-existing models may not be effective for specific use cases, necessitating the training of a custom model.
Large models are not always the best approach due to high training costs and long iteration cycles.
Attempting to solve as much of the problem without AI can lead to innovative and efficient traditional code solutions.
Identifying the right type of model and generating lots of example data are key to training a successful AI model.
Object detection models can be repurposed for novel use cases, such as identifying elements in a design for code generation.
Public data and tools like Google's Vertex AI can be utilized to generate and verify high-quality training data.
The quality of the model is entirely dependent on the quality of the training data.
Google's Vertex AI provides tools for uploading, verifying, and tweaking training data without custom code.
Training an AI model on Vertex AI can be done with minimal cost and without specialized hardware.
Specialized models can be more effective for specific tasks like image identification and layout hierarchy building.
Plain code is often the fastest, cheapest, and most reliable solution where applicable.
LLMs can be effectively used in specific steps of a pipeline, such as making adjustments to baseline code.
Creating a custom toolchain allows for control and optimization of the entire process from design to code.
Testing an LLM for a specific use case is recommended for exploratory purposes before investing in custom models.
For a detailed guide on training specialized AI models, refer to the latest post on the Builder.io blog.
Transcripts
Training your own AI model is a lot easier than you probably think. I'll show you how to do it with only basic development skills, in a way that, for us, yielded wildly faster, cheaper, and better results than using an off-the-shelf large model like those provided by OpenAI.

But first, why not just use an LLM? In our experience, we tried to apply an LLM to our problem, like OpenAI's GPT-3 and GPT-4, but the results were very disappointing for our use case. It was incredibly slow, insanely expensive, highly unpredictable, and very difficult to customize. So instead, we trained our own model. It wasn't as hard as we anticipated, and because our models were small and specialized, the results were over 1,000 times faster and cheaper. They not only served our use case better, but were more predictable, more reliable, and of course far more customizable.

So let's break down how you can train your own specialized AI model like we did. First, you need to break down your problem into smaller pieces. In our case, we wanted to take any Figma design and automatically convert it into high-quality code. To break this problem down, we first explored our options. The first one I'd suggest you always try is basically what I suggested not to do: see if you can solve your problem with a pre-existing model. If you find this effective, it can allow you to get a product to market faster and test on real users, as well as understand how easy this might be for competitors to replicate. And ultimately, if you find this works well for you but some of those drawbacks I mentioned become a problem, such as cost, speed, or customization, you could train your own model on the side and keep refining it until it outperforms the LLM you tried first.

But in many cases, you might find that these popular general-purpose models just don't work well for your use case at all. In our case, we tried feeding it Figma designs as raw JSON data and asking for React components out the other side, and frankly, it did awful. We also tried GPT-4V, taking screenshots of Figma designs and asking for code out the other side, and similarly, the results were highly unpredictable and often terribly bad.

So if you can't just pick up and use a model off the shelf, now we need to explore what it would look like to train our own. A lot of people have the intuition: let's just make one big, giant model where the input is the Figma design and the output is the fully finished code. We'll just supply millions of Figma designs with millions of code snippets and we'll be done; the AI model will solve all our problems. The reality is a lot more nuanced than that. First, training a large model is extremely expensive. The larger it is and the more data it needs, the more costly it is to train. Large models also take a lot of time to train, so as you iterate and make improvements, your iteration cycles can be days at a time waiting for training to complete. And even if you can afford that amount of time and expense, and have the expertise needed to build these large, complicated custom models, you may not have any way to generate all the data you need anyway. If you can't find this data on the open web, are you really going to pay thousands of developers to hand-code millions of Figma designs into React, or any other framework, let alone all the different styling options like Tailwind versus Emotion versus CSS modules? It just becomes an impossibly complex problem to solve, and a super-duper model that just does everything for us is probably not the right approach here, at least not today.

When you run into problems like this, I would highly recommend swinging the pendulum to the complete other end and trying as hard as you can to solve as much of this problem as possible without AI whatsoever. That forces you to break the problem down into lots of discrete pieces that you can write normal, traditional code for, and see how far that takes you. In my experience, however far you think you can get, with some iteration and creativity you can get a lot farther than you think. When we tried to break this problem down into just plain code, we realized there were a few different specific problems we had to solve. In our findings, at least two of the five problems were really easy to just solve with code; where we hit challenges was in those other three areas. So let's take that first step, identifying images, and cover how we can train our own specialized model to solve this use case.

You really only need two key things to train your own model these days: first, identify the right type of model, and second, generate lots of example data. In our case, we found that a very common type of model people train is an object detection model, which can take an image and return bounding boxes for where it found specific types of objects; in the classic example, locating the three cats in a photo. So we asked ourselves: could we train this on a slightly novel use case, taking a Figma design as an image? A design uses hundreds of vectors throughout, but for a website or mobile app, certain groups of those should really be compressed into one single image. If we can identify where those images should be, we can compress each group into one image and generate the code accordingly.

That leads us to step two: we need to generate lots of example data and see if training this model will work out for our use case. We thought, wait a second, could we derive this data from somewhere public and free, just like tools like OpenAI did, where they crawl through tons of public data on the web and GitHub and use that as the basis of their training? Ultimately, we realized yes. We wrote a simple crawler that uses a headless browser to pull up a website and then evaluate some JavaScript on the page to identify where the images are and what their bounding boxes are, which was able to generate a lot of training data for us really quickly.

Now, keep in mind one critical thing: the quality of your model is entirely dependent on the quality of your data. So out of the hundreds of examples we generated, we manually went through and verified that every single bounding box was correct, and used a visual tool to correct them any time they weren't. In my experience, this can become one of the most complex areas of machine learning: building your own tools to generate, QA, and fix data, to ensure that your dataset is as immaculate as possible so that your model has the highest-quality information to go off of.

Now, in the case of this object detection model, luckily we used Google's Vertex AI, which has that exact tooling built in. In fact, Vertex AI is how we uploaded all that data and trained the model without even needing to write code at all. All you need to do is go to the Vertex AI section of the Google Cloud console, go to Datasets, and hit Create. We then choose that we're using an object detection model and hit Create, and now you just need to upload your data. You can do it manually by selecting files from your computer and then use their visual tool to outline the areas that matter, which is a huge help, since we don't have to build that ourselves. Or, in our case, because we generated all of our data programmatically, we can just upload it to Google Cloud in their import format, where you provide a path to an image and then list out the bounding boxes of the objects you want to identify. Then, back in Google Cloud, you can manually verify or tweak your data as much as you need.

Once your dataset is in shape, all we need to do is train our model. I used all the default settings here, and I used the minimum number of training hours. This is the one piece that will cost you some money; in this case, the minimum amount of training needed cost about $60. Now, that's a lot cheaper than buying your own GPU and letting it run for hours or days at a time, but if you don't want to pay a cloud provider, training on your own machine is still an option. There are a lot of nice Python libraries that are not complicated to learn where you can do this too. Once you hit start, training in our case took about three real-world hours. Then you can find your training results and deploy your model, which in this case I've already done. That can take a couple of minutes, and then you'll have an API endpoint that you can send an image to and get back a set of bounding boxes with their confidence levels. We can also use the UI here as well.

So, to test it out, in Figma I'm just going to take a screen grab of a portion of this Figma file, because I'm lazy, and upload it to the UI to test. And there we go: we can see it did a decent job, though there are some mistakes here. But there's something important to know: this UI is showing all possible images regardless of confidence. When I take my cursor and hover over each area that has high confidence, these are spot-on, these are perfect. The strange ones are the ones down here with really low confidence; I mean, these are just wrong, but that works as expected. The API even lets you specify that it should only return results above a certain confidence threshold; looking at this, I think we want a threshold of at least 0.2.

And there you have it: with a specialized model, we can run wildly faster and cheaper. When we broke down our problem, we found that for image identification, a specialized model was a much better solution. For building the layout hierarchy, we similarly made our own specialized model. For styles and basic code generation, plain code was a perfect solution. And don't forget, plain code is always the fastest, the cheapest, the easiest to test, the easiest to debug, and the most predictable; wherever you can use it, absolutely just do that. Then finally, to allow people to customize their code (name things better, use different libraries than we already support), we used an LLM for the final step. Now that we're able to take a design and produce baseline code, LLMs are very good at taking basic code and making adjustments, giving you new code with small changes back. So despite all my complaints about LLMs, and the fact that I still hate how slow and costly that step is in this pipeline, it was and continues to be the best solution for that one specific piece.

And now, when we bring all that together and launch the Builder.io Figma importer, all I need to do is click "generate code." We rapidly run through those specialized models and launch into the Builder visual editor, where we've converted that design into responsive, pixel-perfect code that we can output as high-quality React, Qwik, Vue, etc. code, and even change options to use popular styling frameworks like Tailwind, doing all this super cool AI magic that you can just copy and paste right into your code base. And luckily, because we created this entire toolchain, all of that is in our control.

And that's it. To quickly recap: I would always recommend testing an LLM for your use case, just for exploratory purposes. But if it's not hitting the mark, write plain old code as much as you possibly can, and where you hit bottlenecks, see if you can find a specialized type of model that you can train, generating your own data and using a product like Vertex AI or many others. Create your own robust, incredible toolchain to wow your users with exciting feats of engineering that they've maybe never seen before. For a more detailed breakdown of everything I just showed you here, check out my latest post on the Builder.io blog, and I can't wait to see what you go and build.