Don’t Build AI Products The Way Everyone Else Is Doing It

Steve (Builder.io)
8 Nov 2023 · 12:52

Summary

TLDR: This video advocates a strategic approach to building AI products that are unique, valuable, and efficient. It critiques the common practice of simply wrapping existing large language models such as ChatGPT, pointing to the lack of differentiation, high costs, and slow performance that result. Instead, it proposes a toolchain approach that combines specialized AI models with traditional code for targeted problem-solving: first explore the problem space with standard programming, then introduce AI models only for the specific challenges that are hard to solve with conventional code. By owning and continuously improving these custom models, companies can build differentiated, cost-effective, and high-performing AI solutions tailored to their needs.

Takeaways

  • 🔑 Don't just wrap existing AI models like ChatGPT; build your own custom AI toolchain for a differentiated, valuable, and fast product.
  • ⚠️ Using large pre-trained models like ChatGPT is risky as they can be easily copied, are expensive to run, slow, and difficult to customize.
  • 🧩 Break down complex problems into smaller parts that can be solved with specialized AI models combined with traditional code.
  • 🔍 Explore the problem space using normal programming practices first, then identify areas that require specialized AI models.
  • 📊 Generate your own training data creatively, e.g., using web scraping or other techniques, to train custom AI models.
  • 🎯 Train specialized AI models for specific tasks using off-the-shelf tools like Google's Vertex AI.
  • 🧱 Connect multiple specialized AI models with traditional code to create the final product (see the sketch after this list).
  • ⚡ Custom AI toolchains can be faster, more reliable, cheaper, and more differentiated than using large pre-trained models.
  • 🔄 Continuously improve your AI models by incorporating user feedback and new data.
  • 🔐 Owning your AI models allows for better control, privacy, and customization compared to relying on third-party models.
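
Concretely, the toolchain pattern above amounts to a pipeline of plain code with a few narrow model calls. The sketch below is only an illustration under assumed names (designToCode, models.detectImageRegions, and models.refineOutput are hypothetical, not Builder.io's actual API):

```javascript
// Hypothetical sketch of the toolchain pattern: plain, deterministic code
// drives the pipeline, and specialized models are injected and called only
// at the few points where ordinary code falls short.
async function designToCode(design, models) {
  // 1. Plain code: map each design layer to a rough HTML-like node.
  const nodes = design.layers.map((layer) => ({
    tag: layer.text ? 'span' : 'div',
    text: layer.text ?? null,
    bounds: layer.bounds,
  }));

  // 2. Specialized model: which groups of layers should become one image?
  //    (e.g., a custom-trained object detection model)
  const imageRegions = await models.detectImageRegions(design.screenshot);

  // 3. Plain code again: drop nodes covered by a detected region and
  //    replace each region with a single <img> node.
  const merged = nodes.filter(
    (node) => !imageRegions.some((region) => contains(region, node.bounds))
  );
  imageRegions.forEach((region) => merged.push({ tag: 'img', bounds: region }));

  // 4. Optional narrow LLM step (a nice-to-have that can be disabled).
  return models.refineOutput ? models.refineOutput(merged) : merged;
}

// Small geometry helper used by the plain-code steps.
function contains(outer, inner) {
  return (
    inner.x >= outer.x &&
    inner.y >= outer.y &&
    inner.x + inner.width <= outer.x + outer.width &&
    inner.y + inner.height <= outer.y + outer.height
  );
}
```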

Q & A

  • What is the main issue with building AI products by simply wrapping other models like ChatGPT?

    -The main issue is that this approach does not create differentiated technology. It's easy for competitors to copy and replicate, putting the product at risk of being commoditized.

  • What are the other major problems with relying solely on large language models like ChatGPT?

    -Other major problems include high costs of running large and complex models, slow performance for applications that require instant responses, and limited customizability despite fine-tuning.

  • How did the speaker's company approach building their Visual Copilot product?

    -Instead of relying on a single large language model, they created their own toolchain by combining a fine-tuned LLM with other technologies and custom-trained models for specific tasks.

  • Why did the speaker recommend not using AI initially when building an AI product?

    -The speaker recommended exploring the problem space using normal programming practices first, to determine which areas truly require specialized AI models. This approach avoids building overly complex models from the start.

  • How did the speaker's company generate data to train their object detection model?

    -They used Puppeteer to automate opening websites, taking screenshots, and mapping the locations of images on each page. This produced the paired input (screenshot) and output (image coordinates) data needed to train the object detection model (a sketch of this technique appears after this Q&A list).

  • What are the advantages of owning and training your own models, according to the speaker?

    -Owning and training your own models allows for faster improvements, lower costs, better privacy control, and the ability to meet specific customer requirements that pre-trained models may not address.

  • What advice did the speaker give for building AI products?

    -The speaker advised using AI for as little as possible, and instead relying on normal code combined with specialized AI models for critical areas. This approach aims to create faster, more reliable, and more cost-effective products.

  • How does the speaker's approach differ from the common perception of how AI products are built?

    -The speaker's approach differs from the misconception that AI products are built using a single, large model that handles all inputs and outputs. Instead, the speaker advocates for a toolchain of specialized models combined with regular code.

  • What example did the speaker use to illustrate the toolchain approach?

    -The speaker used the example of self-driving cars, which are not built using a single AI brain, but rather a toolchain of specialized models for tasks like computer vision, predictive decision-making, and natural language processing, combined with regular code.

  • What advice did the speaker give for companies with strict privacy requirements?

    -For companies with strict privacy requirements, the speaker suggested that owning and controlling the entire technology stack lets you hold your models to a high privacy bar, and even lets customers plug in their own in-house or enterprise language models.
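
As a rough sketch of that data-generation step (assumptions: Node.js with the puppeteer package installed, a fixed viewport, and simple file outputs; this is illustrative, not Builder.io's actual script), a Puppeteer run can capture the screenshot as the input and the bounding boxes of <img> tags as the labels:

```javascript
// Sketch: generate object-detection training pairs from live web pages.
// Input data: a screenshot of the page. Output data: <img> bounding boxes.
const puppeteer = require('puppeteer');
const fs = require('fs');

async function capture(url, outPrefix) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setViewport({ width: 1280, height: 800 });
  await page.goto(url, { waitUntil: 'networkidle2' });

  // The screenshot becomes the model's input image.
  await page.screenshot({ path: `${outPrefix}.png` });

  // The on-screen location of every <img> element becomes the label data.
  const boxes = await page.$$eval('img', (imgs) =>
    imgs.map((img) => {
      const { x, y, width, height } = img.getBoundingClientRect();
      return { x, y, width, height };
    })
  );
  fs.writeFileSync(`${outPrefix}.json`, JSON.stringify(boxes, null, 2));

  await browser.close();
}

// Example usage: one screenshot + one label file per crawled URL.
capture('https://example.com', 'example');
```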

Outlines

00:00

🚫 Don't Follow the Crowd: Build Unique AI Products

The video script emphasizes the importance of building unique and valuable AI products instead of simply wrapping existing models like ChatGPT. It highlights the risks of following the crowd, such as lack of differentiation, high costs, poor performance, and ease of replication. The narrator suggests creating custom toolchains by combining fine-tuned language models, specialized AI models, and traditional code to build faster, cheaper, and more reliable products.

05:01

🧱 The Modular Approach: Building AI Products Like LEGO Blocks

The script dispels the misconception that AI products are built using a single, all-encompassing model. Instead, it advocates for a modular approach, where various specialized models are combined with traditional code to create the final product. The example of self-driving cars is used to illustrate this concept, where multiple models for computer vision, decision-making, and natural language processing are connected through code to achieve the desired functionality.

10:01

🔨 The Builder.io Approach: Blending Code and AI Models

The script outlines the approach Builder.io took in developing its Visual Copilot product: break down the problem, solve as much as possible with traditional programming, and then incorporate specialized AI models only for the specific tasks that are difficult to achieve with code alone. The process includes techniques like object detection, data generation through web scraping, and combining multiple models to produce responsive, customizable code from design inputs.


Keywords

💡Differentiated AI Products

The video emphasizes the importance of creating AI products that are unique and distinguishable from competitors. It critiques the common approach of simply wrapping existing language models like ChatGPT, which makes products easily replicable and undifferentiated. Instead, the video advocates building specialized AI models and toolchains tailored to specific use cases, making the product more valuable and difficult to copy.

💡Cost Optimization

One major issue highlighted is the high cost associated with running large language models for AI products. The video argues that most use cases do not require models trained on the entire internet, leading to unnecessary expenses. It suggests optimizing costs by building custom, specialized models trained only on relevant data for the specific use case, resulting in a more affordable and sustainable product.

💡Performance Optimization

The video identifies performance as another major challenge when using large language models, which can be unacceptably slow for certain applications. It gives the example of the Visual Copilot product, where using an LLM for design-to-code conversion resulted in multi-minute response times and a poor user experience. The proposed solution is to build custom models optimized for speed and tailored to the specific task at hand.

💡Custom Model Training

A central recommendation in the video is to train custom AI models for specific tasks, rather than relying solely on pre-trained language models. It highlights the availability of tools and services that enable developers without machine learning expertise to train their own models using relevant data. This approach allows for greater customization, differentiation, and optimization of AI products.

💡Toolchain Approach

The video advocates for a toolchain approach to building AI products, where multiple specialized models are combined with traditional code to create the final solution. It argues against the misconception that advanced AI products are powered by a single, monolithic model. Instead, it suggests breaking down the problem into smaller components, using traditional code where possible, and integrating custom AI models only for specific tasks that cannot be easily solved with code alone.

💡Incremental Development

The video emphasizes an incremental approach to building AI products, starting with traditional programming practices and introducing AI components only when necessary. It provides the example of a self-driving car, which was not developed overnight but rather through a layered approach, gradually adding more advanced AI capabilities over time. This approach allows for iterative improvement and easier management of complexity.

💡Data Generation

To train custom AI models, the video highlights the importance of generating relevant training data. It suggests creative approaches, such as using web scraping and automation tools to gather and process data from the internet. The example provided is using Puppeteer to automate web browsing, capture screenshots, and extract image locations to train an object detection model for a visual co-pilot product.

💡Ownership and Control

By building custom AI models and toolchains, the video argues that companies can maintain ownership and control over their AI technology. This addresses concerns around privacy, data handling, and the ability to continuously improve and adapt the models based on user feedback and evolving requirements. It contrasts this with the limited control and dependence on third-party models when simply wrapping existing language models.

💡Rapid Iteration

One of the advantages of owning custom AI models, as highlighted in the video, is the ability to rapidly iterate and improve them. When user feedback or issues arise, the models can be quickly updated and enhanced, with improvements shipped daily during the beta phase. This level of agility and responsiveness is more challenging when relying solely on third-party models over which the company has limited control.

💡Minimizing AI Usage

Counterintuitively, the video recommends using AI as little as possible when building products. It suggests starting with traditional code and introducing AI components only when specific problems cannot be solved efficiently with code alone. This approach is proposed to leverage the speed, reliability, determinism, and ease of debugging offered by traditional code, while selectively utilizing the strengths of AI models for critical tasks.

Highlights

The vast majority of AI products being built right now are just wrappers over other models, like calling ChatGPT over an API, which makes them easy to copy and not differentiated.

Using large language models like ChatGPT can be costly, as they are incredibly large and complex, which makes them expensive to run.

Large language models are painfully slow for applications that need the entire response before proceeding, like generating code from a design specification.

Large language models cannot be customized much, even with fine-tuning, which can lead to poor quality results for specific use cases.

The solution is to create a tool chain that combines a fine-tuned language model with other technology and custom-trained models for specific tasks.

Most advanced AI products are built as a tool chain of several specialized models connected with normal code, rather than a single super-intelligent model.

The recommended approach is to explore the problem space using normal programming practices first, to determine what areas need specialized models.

Break down the problem and solve as much as possible with normal code before introducing AI models for specific tasks that are difficult to solve with code.

Generate data for training custom models by using creative methods like web scraping or automating processes.

Use AI models for as little as possible, and only for critical areas where normal code is insufficient, as code is faster, more reliable, and easier to manage.

Owning the models allows for constant improvement and rapid iteration based on user feedback and new data.

This approach provides control over privacy, customization, and integration with other systems or models.

The magic comes from the small but critical areas where AI models are used, combined with normal code for the rest of the system.

AI products should be built in layers, like self-driving cars, with incremental additions of AI capabilities over time.

The end result, like Visual Copilot, is a fast, low-cost, and valuable product that is difficult for competitors to copy.

Transcripts

00:00

If you want to build AI products that are unique, valuable, and fast, don't do what everybody else is doing. I'll show you what to do instead. The vast majority of AI products being built right now are just wrappers over other models, for instance basically just calling ChatGPT over an API. And while that's incredibly easy, you send natural language in and get natural language out, and it can do some really cool things, there are some major problems with this approach that people are running into, and there's a solution for them that I'll show you. The first major issue is this is not differentiated technology. If you've noticed that one person creates a chat-with-a-PDF app, then another dozen people do too, and then OpenAI builds that into ChatGPT directly, that's because nobody there actually built something differentiated. They used a simple technique with a pre-trained model, which anyone can copy in a very short period of time. When building a product whose unique value proposition is some type of advanced AI technology, it's a very risky position to be so easy to copy. Now, of course, there's a whole spectrum here. If you're on the right side of the spectrum, where all you made was a button that sends something to ChatGPT and gets a response back that you show to your end users, where ChatGPT basically did all the work, you're at the highest risk. On the other end, if you actually built some substantial technology and LLMs like OpenAI's only assisted with a small but crucial piece, then you may be in a better position, but you're still going to run into two other major issues.

01:25

The first major issue you'll run into is cost. The best part of a large language model is its broad versatility, but it achieves this by being incredibly large and complex, which makes it incredibly costly to run. As an example, GitHub Copilot is losing money per user: charging $10 but on average costing $20 just on API calls, and some users cost GitHub up to $80. And the worst part is you probably don't need such a large model. Your use case probably doesn't need a model trained on the entirety of the internet, 99.9% of which covers topics that have nothing to do with your use case. So while the ease of this approach might be tempting, you could run into the common issue where what your users want to pay is less than what it costs to run your service on top of large language models.

02:14

But even if you're the rare case where the cost economics might work out okay for you, you're still going to hit one more major issue: LLMs are painfully slow. Now, this isn't a huge problem for all applications. For use cases like ChatGPT, where you can read one word at a time anyway, it isn't the worst thing. But for applications that are not about streaming text, where nobody is going to read the output word for word but instead waits on the entire response before the next step in the flow can be taken, this can be a big problem. For instance, when we started building our Visual Copilot product, where we wanted one button click to turn any design into high-quality code, one of the approaches we explored was using an LLM for the conversion. One of the key issues was that it took forever: if you need to pass an entire design spec into an LLM and get an entire new representation out, token by token, it was taking literally minutes to give us a reply, which was just not viable. And because the representation returned by the LLM is not what a human would ultimately see, the loading state was just a spinner, and it was horrific.

03:21

But if, for some reason, performance is still not an issue for you, and your users do not care about having a slow and expensive product that's easy for your competitors to copy, you'll still likely hit one other major issue at some point: LLMs cannot be customized that much. Yes, they all support fine-tuning, and fine-tuning can incrementally help the model get closer to what you need. But in our case, we tried using fine-tuning to provide Figma designs and get code out the other side, and no matter how many examples we gave the model, it hardly seemed to get any smarter at all. What we were left with was something slow, expensive, and incredibly poor quality, and that's where we realized we had to take a different approach.

04:02

What did we find we had to do instead? We had to create our own toolchain. In this case, we combined a fine-tuned LLM, a whole lot of other technology, and a custom-trained model. And this is not necessarily as hard as you might think. These days you don't have to be a data scientist or have a PhD in machine learning to train your own model; any moderately experienced developer can now do it. What this allows you to build is something that is way faster, way more reliable, far cheaper, and far more differentiated, so you won't have to worry about copycat products or open-source clones spawning overnight either.

04:38

And this isn't just a theory. Most, if not all, advanced AI products are built in a way like this. A lot of people have a major misconception about how AI products are built: they often think all the core tech is handled by one super-smart model, trained with tons of inputs to give exactly the right output. For instance, with self-driving cars, I've seen a lot of people have the impression that there's one giant model that takes in all the different inputs, like cameras, sensors, GPS, and so on, crunches them through the smart AI, and out comes the action on the other side, such as "turn right." But this could not be farther from the truth. That car driving itself is not one big AI brain, but a whole toolchain of several specialized models, all connected with normal code: models for computer vision to find and identify objects, predictive decision-making to anticipate the actions of others, or natural language processing for understanding voice commands. All of these specialized models, combined with tons of just normal code and logic, create the end result that you see. Now, keep in mind autonomous vehicles are a highly complex example that includes many more models than I'm even showing here; for building your own product, you won't need something nearly this complex, especially to start. Remember, self-driving cars didn't spawn overnight. My 2018 Prius is capable of parking itself, stopping automatically when too close to an object, and many other things, using little to no AI. Over time, more and more layers were added to do more and more advanced things, like correcting lane departure or eventually making entire decisions to drive from one place to another. But like all software, these things are built in layers, one on top of the next.

06:24

The way we built Visual Copilot is a way I would highly recommend you explore for your own AI solutions. It's a very simple but counterintuitive approach: the most important thing is don't use AI to start. You need to explore the problem space using normal programming practices first, to even determine which areas need a specialized model, because, remember, making super models is generally not the right approach. We don't want to just send tons of Figma data into a model and get finished code out the other side; that would be an insanely complex problem to solve with just one model. And when you factor in all the different frameworks we support, plus styling options and customizations, it would get insane to retrain this model with all that different data, and it would likely become so complex, slow, and expensive that our product probably would have never shipped in the first place. Instead, we looked at the problem and asked: how can we solve this without AI, and how far can we get before it becomes impossible without the types of specialized decision-making AI is best at? So we broke the problem down: we need to convert each of these nodes into things we can represent in code, like HTML nodes for the web; we need to understand what is an image, what is a background, what is a foreground; and, most importantly, we need to make this responsive, because this only works if what we import becomes fully responsive for all screen sizes automatically. Then we started looking at more complex examples and realized there are many cases where many, many layers need to be turned into one image. We started writing hand-coded logic to say: if a set of items is in a vertical stack, that should probably be a flex column, and if groups are side by side, they should probably be a flex row. We got as far as we could, creating all these different types of sophisticated algorithms to automatically transform designs into responsive code, before we started hitting limits. And in my experience, wherever you think the limit is, it's probably actually a lot further.

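As a simplified sketch of the kind of hand-coded layout heuristic described above (a rough illustration, not Builder.io's actual implementation; the 0.8 overlap threshold is an arbitrary assumption), the vertical-stack versus side-by-side check might look like this:

```javascript
// Simplified layout heuristic: given sibling groups with bounding boxes,
// decide whether they read as a vertical stack (flex column) or a row.
function inferFlexDirection(groups) {
  if (groups.length < 2) return 'column';

  // Sort by vertical position and check whether each group starts below
  // the previous one (little to no vertical overlap = a vertical stack).
  const byTop = [...groups].sort((a, b) => a.y - b.y);
  const stackedVertically = byTop.every(
    (g, i) => i === 0 || g.y >= byTop[i - 1].y + byTop[i - 1].height * 0.8
  );
  if (stackedVertically) return 'column';

  // Otherwise, run the same check horizontally for the side-by-side case.
  const byLeft = [...groups].sort((a, b) => a.x - b.x);
  const sideBySide = byLeft.every(
    (g, i) => i === 0 || g.x >= byLeft[i - 1].x + byLeft[i - 1].width * 0.8
  );
  return sideBySide ? 'row' : 'column';
}

// Example: two boxes next to each other should come out as a flex row.
console.log(
  inferFlexDirection([
    { x: 0, y: 0, width: 100, height: 50 },
    { x: 120, y: 0, width: 100, height: 50 },
  ])
); // -> 'row'
```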
play08:23

at a certain point you'll find some

play08:25

things are just near impossible to do

play08:27

with normal code for example

play08:29

automatically detecting which of these

play08:31

layers should turn into one image is

play08:33

something that our eyes are really good

play08:36

at understanding but not necessarily

play08:38

normal imperative code in our case we

play08:40

wrote all this in JavaScript now lucky

play08:42

for us training your own object

play08:44

detection model is not that hard for

play08:47

example products like Google's vertex AI

play08:50

has a range of common types of models

play08:52

that you can easily train yourself one

play08:54

of which is object detection I can

play08:56

choose that with a guey and then prepare

play08:58

data and just upload it as a file for a

play09:00

wellestablished typee of model like this

play09:03

all it comes down to is creating the

play09:04

data now where things get interesting is

09:06

Now, where things get interesting is finding creative ways of generating the data you need. One awesome, massive, free resource for generating data is simply the internet. One way we explored approaching this is using Puppeteer to automate opening websites in a web browser. We can then take a screenshot of the site, traverse the HTML to find the image tags, and use the locations of those images as the output data and the screenshot of the web page as the input data. Now we have exactly what we need: a source image and the coordinates of where all the sub-images are, to train this AI model.

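From there, each screenshot-and-coordinates pair can be written as one line of an import file for a managed object-detection trainer. The sketch below follows the general shape of Vertex AI's documented JSONL format for image object detection, but treat the field names and the gs:// path as assumptions to verify against the current docs rather than a guaranteed schema:

```javascript
// Sketch: convert captured (screenshot, image-bounding-box) pairs into
// JSONL training rows for a managed object-detection service.
// Field names mirror Vertex AI's documented image object-detection import
// format, but verify against the current docs before relying on them.
const fs = require('fs');

function toTrainingRow(gcsImageUri, boxes, viewport) {
  return JSON.stringify({
    imageGcsUri: gcsImageUri,
    boundingBoxAnnotations: boxes.map((b) => ({
      displayName: 'image', // single label: "this region is one image"
      // Coordinates are normalized to [0, 1] relative to the screenshot.
      xMin: b.x / viewport.width,
      xMax: (b.x + b.width) / viewport.width,
      yMin: b.y / viewport.height,
      yMax: (b.y + b.height) / viewport.height,
    })),
  });
}

// Example: one captured page becomes one JSONL line in the dataset file.
// (The bucket path below is a placeholder, not a real upload location.)
const viewport = { width: 1280, height: 800 };
const boxes = JSON.parse(fs.readFileSync('example.json', 'utf8'));
fs.appendFileSync(
  'dataset.jsonl',
  toTrainingRow('gs://my-bucket/example.png', boxes, viewport) + '\n'
);
```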
09:42

So while in Figma this, which should be one image, is many layers, our object detection model can take the pixels, identify that this rectangle should be one image, and we can compress it into one and use it as part of our code generation. Using these techniques, where we fill in the unknowns with specialized AI models and piece multiple models together, is how we're able to produce end results like this: I can just select this, hit generate code, launch into Builder, and get a completely responsive website out the other side, with high-quality code that you can customize yourself, supporting a wide variety of frameworks and options. And it's all incredibly fast, because all of our models are specially built just for this purpose; incredibly low-cost to provide (we offer a generous free tier); and ultimately really valuable for our customers, saving them lots of time.

10:33

And the best part is, this is only the beginning, because one of the best parts of this approach, as opposed to just wrapping somebody else's model, is that we completely own the models, so we can constantly improve them. If you're fully dependent on someone else's model, like OpenAI's, there's no guarantee it's going to get smarter, faster, or cheaper for your use case, and your ability to control that with prompt engineering and fine-tuning is severely limited. But because we own our own model, we're making drastic improvements every day. When new designs come in that don't import well, which still happens as we're in beta, we look at user feedback, find areas to improve, and improve at a rapid cadence, shipping improvements every single day.

11:12

And we never have to worry about a lack of control. For instance, we started talking to some very large and very privacy-focused companies about being early beta customers, and one of the first pieces of feedback was that they're not able to use OpenAI, or any products using OpenAI, because of their privacy requirements and the need to make sure their data never goes into systems they don't allow. In our case, because we control the entire technology, we can hold our models to an extremely high privacy bar, and the LLM step can either be disabled, because it's purely a nice-to-have, or companies can plug in their own LLM, which might be a completely in-house-built model, a fork of Llama 2, their own enterprise instance of OpenAI, or something else entirely.

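One hypothetical way to structure that kind of control is to put the optional LLM step behind a small provider interface, so it can be disabled entirely or pointed at a customer-supplied endpoint. Everything below (the endpoint shape, the prompt, the output field) is an assumed illustration, not Builder.io's actual architecture:

```javascript
// Hypothetical sketch: the optional LLM step behind a pluggable interface,
// so it can be disabled or pointed at a customer's own model.
const providers = {
  // Disabled: the LLM step is a nice-to-have, so pass results through.
  none: async (code) => code,

  // Customer-supplied endpoint: an in-house model, a Llama 2 fork, an
  // enterprise LLM instance, etc. The URL and payload shape are examples.
  custom: (endpoint, apiKey) => async (code) => {
    const res = await fetch(endpoint, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ prompt: `Refine this code:\n${code}` }),
    });
    const data = await res.json();
    return data.output ?? code; // fall back to the unrefined code
  },
};

// Usage: pick the provider per customer based on their privacy requirements.
async function finalizeCode(code, customerConfig) {
  const refine = customerConfig.llmEndpoint
    ? providers.custom(customerConfig.llmEndpoint, customerConfig.apiKey)
    : providers.none;
  return refine(code);
}
```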
11:54

So if you want to build AI products, I would highly recommend taking a similar approach. As strange as it sounds, don't use AI for as long as possible. When you start finding extremely specific problems that normal coding doesn't solve well but well-established AI models can, start generating your own data and training your own models, using the wide variety of tools you can find off the shelf. Connect your model, or multiple models, to your code at only the small points where they're needed. And I want to emphasize this: use AI for as little as possible, because at the end of the day, normal, plain code is some of the fastest, most reliable, most deterministic, easiest-to-debug, easiest-to-fix, easiest-to-manage, and easiest-to-test code you will ever have. But the magic will come from the small but critical areas you use AI models for. If you'd like to learn more about this topic, you can see more in my latest blog post on the Builder.io blog. Thanks for watching, and I can't wait to see what you build.


Related Tags
AI Products, Specialized Models, Toolchains, Differentiation, Performance, Cost Efficiency, Product Development, Innovation, Customization, Technology