Training Your Own AI Model Is Not As Hard As You (Probably) Think

Steve (Builder.io)
22 Nov 2023 · 10:23

Summary

TL;DR: This video outlines a practical approach to training specialized AI models, offering a more efficient and cost-effective alternative to large, off-the-shelf models like GPT-3 and GPT-4. The speaker shares their experience with using pre-existing models and the challenges faced, leading to the decision to train a custom model. They detail the process of breaking down the problem, identifying the right model type, generating high-quality training data, and leveraging tools like Google's Vertex AI for training and deployment. The result is a faster, cheaper, and more customizable solution tailored to specific use cases, emphasizing the value of specialized AI models over general ones.

Takeaways

  • πŸ€– Custom AI models can be easier to train than expected with basic development skills.
  • 🚫 Using off-the-shelf large models like GPT-3 and GPT-4 can be slow, expensive, and difficult to customize.
  • πŸ” Breaking down a problem into smaller pieces is crucial for training a specialized AI model.
  • πŸ› οΈ Before building an AI, explore if the problem can be solved with existing models or plain code.
  • πŸ’‘ Training a large model is costly and time-consuming, and may not be necessary for all problems.
  • πŸ“ˆ Small and specialized models can yield faster, cheaper, and more predictable results tailored to specific use cases.
  • πŸ“š Generating high-quality example data is essential for training an effective AI model.
  • πŸ” Object detection models can be repurposed for novel use cases, such as identifying elements in design files.
  • πŸ›‘ Quality assurance of the training data is critical to ensure the accuracy of the AI model.
  • 🌐 Tools like Google's Vertex AI can simplify the process of training AI models without extensive coding.
  • πŸ”§ Combining specialized AI models with plain code can create a robust and efficient solution for complex problems.

Q & A

  • Why did the speaker find using an off-the-shelf large model like GPT-3 or GPT-4 unsuitable for their needs?

    -The speaker found using an off-the-shelf large model unsuitable because the results were disappointing, slow, expensive, unpredictable, and difficult to customize for their specific use case.

  • What were the benefits of training their own AI model according to the speaker's experience?

    -Training their own AI model resulted in over 1,000 times faster and cheaper outcomes, more predictable and reliable results, and greater customizability compared to using a large, pre-existing model.

  • What is the first step the speaker suggests when trying to solve a problem with AI?

    -The first step is to break down the problem into smaller pieces. The speaker also suggests first checking whether a pre-existing model can solve it, since that gets a product to market faster, allows testing on real users, and reveals how easily competitors could replicate it.

  • Why did the speaker's attempt to use a pre-existing model for converting Figma designs into code fail?

    -The attempt failed because the model was unable to handle raw JSON data from Figma designs and produce accurate React components, resulting in highly unpredictable and often poor outcomes.

  • What are the main drawbacks of training a large AI model according to the speaker?

    -The main drawbacks are the high cost and time required to train a large model, the complexity of generating the necessary data, and the long iteration cycles that can take days for training to complete.

  • What approach does the speaker recommend if a pre-existing model does not work well for a specific use case?

    -The speaker recommends trying to solve as much of the problem without AI, breaking it down into discrete pieces that can be addressed with traditional code, and then identifying specific areas where a specialized AI model could be beneficial.
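The recommended decomposition can be sketched as a small pipeline in which plain code handles the deterministic steps and models are slotted in only where needed. This is a hypothetical illustration, not the speaker's actual code; every function name and the design structure are made-up stand-ins:

```python
# Hypothetical sketch of the hybrid pipeline described in the video: plain
# code where possible, specialized models only where needed, an LLM last.

def extract_styles(design):          # plain code: deterministic, fast, cheap
    return {"fills": design.get("fills", [])}

def detect_images(design):           # stand-in for a trained object-detection model
    return design.get("vector_groups", [])

def build_layout(design, images):    # stand-in for a second specialized model
    return {"children": images, "styles": extract_styles(design)}

def customize_with_llm(baseline_code, instructions):
    # Placeholder: the real step would send baseline code plus the
    # user's instructions to an LLM and return its edited code.
    return baseline_code + f"  # customized: {instructions}"

def design_to_code(design, instructions=""):
    images = detect_images(design)
    layout = build_layout(design, images)
    baseline = f"<div data-children={len(layout['children'])} />"
    return customize_with_llm(baseline, instructions) if instructions else baseline
```

For example, `design_to_code({"vector_groups": [1, 2]})` runs the whole chain and returns a (fake) baseline snippet, while passing `instructions` exercises the final LLM step.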

  • What are the two key things needed to train a specialized AI model according to the script?

    -The two key things needed are identifying the right type of model and generating lots of example data to train the model effectively.

  • How did the speaker generate example data for training their specialized AI model?

    -The speaker wrote a simple crawler that uses a headless browser to pull up websites and evaluate JavaScript on each page to identify images and their bounding boxes, programmatically generating a large amount of training data.
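A minimal sketch of that data-generation step, assuming a headless browser (e.g. Playwright or Puppeteer) has already captured a page screenshot and a page-evaluated script has returned the pixel rectangle of each image element. The function and field names here are hypothetical:

```python
# Convert raw pixel rects (as a browser's getBoundingClientRect would
# report them) into one training example with normalized bounding boxes,
# the form object-detection tooling generally expects (coordinates in [0, 1]).

def rects_to_annotations(image_path, rects, page_width, page_height):
    boxes = []
    for r in rects:
        # Skip zero-area elements: they only add noise to the dataset.
        if r["width"] <= 0 or r["height"] <= 0:
            continue
        boxes.append({
            "label": "image",
            "xMin": max(r["x"] / page_width, 0.0),
            "yMin": max(r["y"] / page_height, 0.0),
            "xMax": min((r["x"] + r["width"]) / page_width, 1.0),
            "yMax": min((r["y"] + r["height"]) / page_height, 1.0),
        })
    return {"imagePath": image_path, "boundingBoxes": boxes}

example = rects_to_annotations(
    "shots/page1.png",
    [{"x": 100, "y": 50, "width": 200, "height": 100},
     {"x": 0, "y": 0, "width": 0, "height": 40}],  # degenerate rect, dropped
    page_width=1000, page_height=800,
)
```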

  • What is the importance of data quality in training an AI model as emphasized in the script?

    -Data quality is crucial because the quality of the AI model is entirely dependent on it. The speaker manually verified and corrected the bounding boxes in the generated examples to ensure high-quality training data.
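The "verify every bounding box" step can be partly automated: programmatically flag examples whose boxes cannot be right, so a human with a visual tool only reviews the suspicious ones. This is an illustrative sketch with assumed field names, not the speaker's tooling:

```python
# Flag bounding boxes that fail basic sanity checks: coordinates outside
# [0, 1], inverted min/max, or an implausibly tiny area.

def suspicious_boxes(example):
    bad = []
    for i, b in enumerate(example["boundingBoxes"]):
        in_range = all(0.0 <= b[k] <= 1.0 for k in ("xMin", "yMin", "xMax", "yMax"))
        ordered = b["xMin"] < b["xMax"] and b["yMin"] < b["yMax"]
        big_enough = (b["xMax"] - b["xMin"]) * (b["yMax"] - b["yMin"]) > 1e-4
        if not (in_range and ordered and big_enough):
            bad.append(i)
    return bad

example = {"boundingBoxes": [
    {"xMin": 0.1, "yMin": 0.1, "xMax": 0.4, "yMax": 0.3},   # fine
    {"xMin": 0.5, "yMin": 0.2, "xMax": 0.5, "yMax": 0.2},   # zero area
    {"xMin": -0.2, "yMin": 0.0, "xMax": 0.3, "yMax": 1.2},  # out of range
]}
```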

  • How did the speaker utilize Google's Vertex AI in the process of training their AI model?

    -The speaker used Vertex AI for uploading and verifying the training data, choosing the object detection model type, and training the model without needing to write code, utilizing Vertex AI's built-in tools and UI.
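Because the data was generated programmatically, it can be written straight to an import file for upload. The sketch below follows the general shape of Vertex AI's image object detection import format (one JSON object per line, normalized coordinates), but treat the exact field names as assumptions and confirm against the current Vertex AI documentation:

```python
import json

# Write programmatically generated examples to a JSONL import file.

def to_import_line(gcs_uri, boxes):
    return json.dumps({
        "imageGcsUri": gcs_uri,
        "boundingBoxAnnotations": [
            {"displayName": b["label"],
             "xMin": b["xMin"], "xMax": b["xMax"],
             "yMin": b["yMin"], "yMax": b["yMax"]}
            for b in boxes
        ],
    })

def write_import_file(path, examples):
    with open(path, "w") as f:
        for gcs_uri, boxes in examples:
            f.write(to_import_line(gcs_uri, boxes) + "\n")

line = to_import_line("gs://my-bucket/page1.png",
                      [{"label": "image", "xMin": 0.1, "xMax": 0.3,
                        "yMin": 0.0625, "yMax": 0.1875}])
```

The `gs://my-bucket/...` path is a placeholder; images must live in a Cloud Storage bucket the dataset can read.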

  • What role did an LLM play in the final step of the speaker's AI model pipeline?

    -An LLM was used for the final step of customizing the code, making adjustments and providing new code with small changes based on the baseline code generated by the specialized models.
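The LLM step only needs a narrow contract: baseline code in, small targeted edits out. The actual model call is deliberately left out here (the video names no provider API); this hypothetical sketch just shows how such a request might be shaped:

```python
# Build a constrained customization prompt: give the LLM working baseline
# code and ask for the requested change only, nothing else.

def build_customization_prompt(baseline_code, request):
    return (
        "You will be given working baseline code generated from a design.\n"
        "Apply ONLY the requested change and return the full updated code.\n"
        f"Requested change: {request}\n"
        "Baseline code:\n"
        f"{baseline_code}"
    )

prompt = build_customization_prompt(
    "export function Hero() { return <div>...</div>; }",
    "rename the component to Banner and use styled-components",
)
```

Keeping the prompt this constrained is what makes the slow, costly LLM step tolerable: it edits known-good code instead of generating everything from scratch.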

Outlines

00:00

πŸ€– Custom AI Model Training Over Large Pre-built Models

The speaker discusses the advantages of training a custom AI model over using large, pre-built models like OpenAI's GPT-3 and GPT-4. They share their experience where using a large language model (LLM) was slow, expensive, unpredictable, and difficult to customize. Instead, they trained a smaller, specialized model that was faster, cheaper, more predictable, and customizable. The speaker emphasizes the importance of breaking down a complex problem into smaller, manageable pieces and suggests exploring pre-existing models first before considering custom model training. They also touch on the challenges of training large models, such as cost, time, and data availability.

05:01

πŸ› οΈ Building a Specialized AI Model for Image Identification

The speaker outlines the process of creating a specialized AI model for identifying images within a Figma design and converting them into code. They describe using an object detection model to locate specific types of objects in an image, which in this case are groups of vectors that should be compressed into a single image for code generation. The speaker details the importance of generating high-quality example data, using a simple crawler with a headless browser to create training data. They also discuss the necessity of manually verifying and correcting the data to ensure the model's accuracy. The use of Google's Vertex AI for training the model without coding is highlighted, along with the steps to upload data, verify it, and train the model. The speaker concludes with the successful deployment of the model and its integration into a tool for generating responsive, pixel-perfect code.

10:02

πŸš€ Leveraging AI and Plain Code for a Robust Toolchain

The speaker wraps up by summarizing the process of integrating specialized AI models with plain code to create a powerful toolchain. They advocate for testing an LLM for specific use cases and, if it falls short, relying on plain code solutions where possible. For areas where AI is necessary, they suggest finding or training a specialized model and generating custom data. The speaker also mentions using an LLM for the final step of code customization, despite its drawbacks, because it offers the best solution for that particular task. They invite viewers to explore a detailed blog post for further insights and express excitement for the innovative engineering projects that can be built using this approach.


Keywords

πŸ’‘AI Model

An AI model, in the context of the video, refers to a machine learning system designed to process information and make predictions or decisions based on that information. The video discusses training a custom AI model for a specific task, which is more efficient and cost-effective than using a generic, large-scale model like those provided by OpenAI.

πŸ’‘LLM (Large Language Model)

LLM stands for Large Language Model, which is a type of AI model designed to process and generate human-like text. The video mentions using LLMs like OpenAI's GPT-3 and GPT-4, but found them to be slow, expensive, and difficult to customize for their specific use case.

πŸ’‘Customization

Customization, in this video, refers to the ability to tailor an AI model to fit specific needs or requirements of a particular application. The speaker emphasizes the importance of customization for their project, as it allows for a more predictable, reliable, and specialized solution.

πŸ’‘Figma Design

Figma is a cloud-based interface design and collaboration tool. In the video, the speaker describes a project where they aimed to automatically convert Figma designs into high-quality code, highlighting the need to break down complex problems into smaller, more manageable pieces.

πŸ’‘Object Detection Model

An object detection model is a type of AI model that can identify and locate objects within an image. The video describes using such a model to process Figma designs as images and identify specific elements that need to be compressed into a single image for code generation.

πŸ’‘Data Generation

Data generation is the process of creating or collecting data that can be used to train an AI model. The video discusses the importance of generating high-quality example data for training their specialized AI model, using a tool that crawls websites to identify images and their bounding boxes.

πŸ’‘Quality Assurance (QA)

Quality Assurance (QA) in the context of AI model training refers to the process of verifying and ensuring the accuracy and quality of the data used for training. The video emphasizes manually checking and correcting the data to ensure the model learns from the most accurate information possible.

πŸ’‘Google's Vertex AI

Google's Vertex AI is a machine learning platform that provides tools for training and deploying AI models. The video describes using Vertex AI to upload data, train their object detection model, and deploy it as an API endpoint without needing to write any code.

πŸ’‘Bounding Boxes

Bounding boxes are rectangular frames used in computer vision to outline and locate objects within an image. In the video, bounding boxes are used to identify specific elements within Figma designs so that they can be processed by the AI model for code generation.

πŸ’‘Confidence Threshold

A confidence threshold is a value used to determine the accuracy of an AI model's predictions. The video mentions setting a confidence threshold for their model to ensure that only high-confidence predictions are returned, which helps in filtering out incorrect or less reliable results.
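Applying such a threshold is a one-liner; this minimal sketch (with made-up prediction data) mirrors the speaker's observation that boxes above roughly 0.2 confidence were spot-on while the low-confidence ones were noise:

```python
# Keep only object-detection predictions at or above a confidence threshold.

def filter_predictions(predictions, threshold=0.2):
    return [p for p in predictions if p["confidence"] >= threshold]

preds = [
    {"box": (0.1, 0.1, 0.4, 0.3), "confidence": 0.97},
    {"box": (0.5, 0.6, 0.9, 0.9), "confidence": 0.88},
    {"box": (0.0, 0.0, 0.1, 0.1), "confidence": 0.04},  # noise, dropped
]
kept = filter_predictions(preds)
```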

πŸ’‘Code Generation

Code generation refers to the process of automatically creating code from a different type of input, such as a design file. The video discusses using a combination of specialized AI models and plain code to generate high-quality, responsive code from Figma designs.

Highlights

Training your own AI model can be easier and more cost-effective than using large off-the-shelf models.

Using an off-the-shelf large language model (LLM) like OpenAI's GPT-3 and GPT-4 for specific tasks resulted in slow, expensive, and unpredictable outcomes.

Training a specialized model can yield over 1,000 times faster and cheaper results with better customization.

Breaking down a complex problem into smaller, solvable parts is a recommended approach before considering AI solutions.

Pre-existing models may not be effective for specific use cases, necessitating the training of a custom model.

Large models are not always the best approach due to high training costs and long iteration cycles.

Attempting to solve as much of the problem without AI can lead to innovative and efficient traditional code solutions.

Identifying the right type of model and generating lots of example data are key to training a successful AI model.

Object detection models can be repurposed for novel use cases, such as identifying elements in a design for code generation.

Public data and tools like Google's Vertex AI can be utilized to generate and verify high-quality training data.

The quality of the model is entirely dependent on the quality of the training data.

Google's Vertex AI provides tools for uploading, verifying, and tweaking training data without custom code.

Training an AI model on Vertex AI can be done with minimal cost and without specialized hardware.

Specialized models can be more effective for specific tasks like image identification and layout hierarchy building.

Plain code is often the fastest, cheapest, and most reliable solution where applicable.

LLMs can be effectively used in specific steps of a pipeline, such as making adjustments to baseline code.

Creating a custom toolchain allows for control and optimization of the entire process from design to code.

Testing an LLM for a specific use case is recommended for exploratory purposes before investing in custom models.

For a detailed guide on training specialized AI models, refer to the latest post on the Builder.io blog.

Transcripts

00:00

Training your own AI model is a lot easier than you probably think. I'll show you how to do it with only basic development skills, in a way that for us yielded wildly faster, cheaper, and better results than using an off-the-shelf large model like those provided by OpenAI. But first, why not just use an LLM? In our experience, we tried to apply an LLM to our problem, like OpenAI's GPT-3 and GPT-4, but the results were very disappointing for our use case: it was incredibly slow, insanely expensive, highly unpredictable, and very difficult to customize. So instead we trained our own model. It wasn't as hard as we anticipated, and because our models were small and specialized, they were over 1,000 times faster and cheaper, and they not only served our use case better but were more predictable, more reliable, and of course far more customizable. So let's break down how you can train your own specialized AI model like we did.

00:53

First, you need to break down your problem into smaller pieces. In our case, we wanted to take any Figma design and automatically convert it into high-quality code. To break this problem down, we first explored our options. The first one I'd suggest you always try is basically what I suggested not to do, which is to see if you can solve your problem with a pre-existing model. If you find this effective, it can allow you to get a product to market faster, test on real users, and understand how easy this might be for competitors to replicate. If you find it works well for you but some of those drawbacks I mentioned become a problem, such as cost, speed, or customization, you could train your own model on the side and keep refining it until it outperforms the LLM you tried first. But in many cases you might find that these popular general-purpose models just don't work well for your use case at all. In our case, we tried feeding it Figma designs as raw JSON data and asking for React components out the other side, and it frankly did awful. We also tried GPT-4V, taking screenshots of Figma designs and asking for code out the other side, and similarly the results were highly unpredictable and often terribly bad.

02:00

So if you can't just pick up and use a model off the shelf, we need to explore what it would look like to train our own. A lot of people have the intuition: let's just make one big giant model where the input is the Figma design and the output is the fully finished code. We'll just supply millions of Figma designs with millions of code snippets and we'll be done; the AI model will solve all our problems. The reality is a lot more nuanced than that. First, training a large model is extremely expensive; the larger it is and the more data it needs, the more costly it is to train. Large models also take a lot of time to train, so as you iterate and make improvements, your iteration cycles can be days at a time waiting for training to complete. And even if you can afford that time and expense and have the expertise needed to make these large, complicated custom models, you may not have any way to generate all the data you need anyway. If you can't find this data on the open web, are you really going to pay thousands of developers to hand-code millions of Figma designs into React or any other framework, let alone all the different styling options like Tailwind versus Emotion versus CSS modules? It just becomes an impossibly complex problem to solve, and a super-duper model that just does everything for us is probably not the right approach here, at least not today.

03:11

When you run into problems like this, I would highly recommend swinging the pendulum to the complete other end and trying as hard as you can to solve as much of the problem as possible without AI whatsoever. That forces you to break the problem down into lots of discrete pieces that you can write normal, traditional code for, and to see how far that gets you. In my experience, however far you think you can get, with some iteration and creativity you can get a lot farther than you think. When we tried to break this problem down into just plain code, we realized there were a few specific problems we had to solve. In our findings, at least two of the five problems were really easy to solve with code; where we hit challenges was in the other three areas.

03:53

So let's take that first step, identifying images, and cover how we can train our own specialized model to solve this use case. You really only need two key things to train your own model these days: first, identify the right type of model, and second, generate lots of example data. In our case, we found a very common type of model that people train: an object detection model, which takes an image and returns bounding boxes for where it found specific types of objects, in this case locating the three cats. So we asked ourselves: could we train this on a slightly novel use case, which is to take a Figma design as an image? A design uses hundreds of vectors throughout, but for a website or mobile app, certain groups of those should really be compressed into one single image. If the model can identify where those image regions are, we can compress each group into one and generate the code accordingly.

04:42

That leads us to step two: we need to generate lots of example data and see if training this model accordingly will work out for our use case. We thought, wait a second, could we derive this data from somewhere public and free, just like tools such as OpenAI did when they crawled tons of public data on the web and GitHub and used that as the basis of their training? Ultimately we realized yes: we wrote a simple crawler that uses a headless browser to pull up a website and then evaluates some JavaScript on the page to identify where the images are and what their bounding boxes are, which was able to generate a lot of training data for us really quickly.

05:17

Now keep in mind one critical thing: the quality of your model is entirely dependent on the quality of your data. So out of the hundreds of examples we generated, we manually went through and verified that every single bounding box was correct, and used a visual tool to correct it any time it wasn't. In my experience, this can become one of the most complex areas of machine learning: building your own tools to generate, QA, and fix data, to ensure that your dataset is as immaculate as possible so that your model has the highest-quality information to go off of.

05:51

Now in the case of this object detection model, luckily, we used Google's Vertex AI, which has that exact tooling built in. In fact, Vertex AI is how we uploaded all that data and trained the model without even needing to write code at all. All you need to do is go to the Vertex AI section of the Google Cloud console, go to Datasets, and hit Create. We then choose that we're using an object detection model and hit Create, and now you just need to upload your data. You can do it manually by selecting files from your computer and then use their visual tool to outline the areas that matter, which is a huge help, since we don't have to build that ourselves. Or, in our case, because we generated all of our data programmatically, we can just upload it to Google Cloud in their format, where you provide a path to an image and then list out the bounding boxes of the objects you want to identify. Then, back in Google Cloud, you can manually verify or tweak your data as much as you need.

06:44

Once your dataset is in shape, all we need to do is train our model. I used all the default settings here and the minimum amount of training hours. This is the one piece that will cost you some money; in this case, the minimum amount of training needed costs about $60. Now, that's a lot cheaper than buying your own GPU and letting it run for hours or days at a time, but if you don't want to pay a cloud provider, training on your own machine is still an option; there are a lot of nice Python libraries that are not complicated to learn where you can do this too. Once you hit Start Training, in our case it took about three real-world hours. Then you can find your training results and deploy your model, which in this case I've already done; that can take a couple of minutes. Then you'll have an API endpoint to which you can send an image and get back a set of bounding boxes with their confidence levels. We could also use the UI here as well.

07:33

So to test it out, in Figma I'm just going to take a screen grab of a portion of this Figma file, because I'm lazy, and upload it to the UI to test. And there we go: we can see it did a decent job, though there are some mistakes here. But there's something important to know: this UI is showing all possible images regardless of confidence. When I take my cursor and hover over each area that has high confidence, these are spot-on, these are perfect, look at that. And the strange ones are the ones down here with really low confidence; I mean, these are just wrong, but that works as expected. The API even lets you specify to only return results above a certain confidence threshold; looking at this, I think we want a threshold of at least 0.2.

08:14

And there you have it: with a specialized model, we can run it wildly faster and cheaper. When we broke down our problem, we found that for image identification, a specialized model was a much better solution. For building the layout hierarchy, we similarly made our own specialized model. For styles and basic code generation, plain code was a perfect solution, and don't forget: plain code is always the fastest, the cheapest, the easiest to test, the easiest to debug, and the most predictable; whenever you can use it, absolutely just do that. And then finally, to allow people to customize their code, name it better, or use different libraries than we already support, we used an LLM for the final step. Now that we're able to take a design and produce baseline code, LLMs are very good at taking basic code and making adjustments, giving you new code with small changes back. So despite all my complaints about LLMs, and the fact that I still hate how slow and costly that step is in this pipeline, it was and continues to be the best solution for that one specific piece.

09:15

And now, when we bring all that together and launch the Builder.io Figma importer, all I need to do is click Generate Code. We will rapidly run through those specialized models and launch into the Builder visual editor, where we've converted that design into responsive, pixel-perfect code that we can output as high-quality React, Qwik, Vue, etc., and even change options to use popular styling frameworks like Tailwind, doing all this super cool AI magic, and you can just copy and paste it right into your code base. And luckily, because we created this entire toolchain, all of that is in our control.

09:48

And that's it. To quickly recap: I would always recommend testing an LLM for your use case, just for exploratory purposes. But if it's not hitting the mark, write plain old code as much as you possibly can, and where you hit bottlenecks, see if you can find a specialized type of model that you can train, generating your own data and using a product like Vertex AI or many others. Then create your own robust, incredible toolchain to wow your users with exciting feats of engineering that they maybe have never seen before. For a more detailed breakdown of everything I just showed you here, check out my latest blog post on the Builder.io blog. I can't wait to see what you go and build.


Related Tags
AI Training, Custom Models, Figma to Code, Object Detection, Code Generation, ML Optimization, Data Quality, Vertex AI, Model Customization, Tech Innovation