Don’t Build AI Products The Way Everyone Else Is Doing It
Summary
TLDR: This video script advocates a strategic approach to building AI products that are unique, valuable, and efficient. It critiques the common practice of simply wrapping existing large language models like ChatGPT, highlighting issues like lack of differentiation, high costs, and slow performance. Instead, it proposes a toolchain approach, combining specialized AI models with traditional coding for targeted problem-solving. This method involves first exploring the problem space using standard programming, then introducing AI models only for specific challenges that are difficult to solve with conventional code. By owning and continuously improving these custom models, companies can create differentiated, cost-effective, and high-performing AI solutions tailored to their needs.
Takeaways
- 🔑 Don't just wrap existing AI models like ChatGPT; build your own custom AI toolchain for a differentiated, valuable, and fast product.
- ⚠️ Using large pre-trained models like ChatGPT is risky as they can be easily copied, are expensive to run, slow, and difficult to customize.
- 🧩 Break down complex problems into smaller parts that can be solved with specialized AI models combined with traditional code.
- 🔍 Explore the problem space using normal programming practices first, then identify areas that require specialized AI models.
- 📊 Generate your own training data creatively, e.g., using web scraping or other techniques, to train custom AI models.
- 🎯 Train specialized AI models for specific tasks using off-the-shelf tools like Google's Vertex AI.
- 🧱 Connect multiple specialized AI models with traditional code to create the final product.
- ⚡ Custom AI toolchains can be faster, more reliable, cheaper, and more differentiated than using large pre-trained models.
- 🔄 Continuously improve your AI models by incorporating user feedback and new data.
- 🔐 Owning your AI models allows for better control, privacy, and customization compared to relying on third-party models.
Q & A
What is the main issue with building AI products by simply wrapping other models like ChatGPT?
-The main issue is that this approach does not create differentiated technology. It's easy for competitors to copy and replicate, putting the product at risk of being commoditized.
What are the other major problems with relying solely on large language models like ChatGPT?
-Other major problems include high costs of running large and complex models, slow performance for applications that require instant responses, and limited customizability despite fine-tuning.
How did the speaker's company approach building their Visual Co-Pilot product?
-Instead of relying on a single large language model, they created their own toolchain by combining a fine-tuned LLM with other technologies and custom-trained models for specific tasks.
Why did the speaker recommend not using AI initially when building an AI product?
-The speaker recommended exploring the problem space using normal programming practices first, to determine which areas truly require specialized AI models. This approach avoids building overly complex models from the start.
How did the speaker's company generate data to train their object detection model?
-They used Puppeteer to automate opening websites, taking screenshots, and mapping the locations of images on the page. This generated the input and output data needed to train the object detection model.
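The annotation step of this pipeline can be sketched in plain JavaScript. The sketch below assumes the screenshot and image positions come from Puppeteer (`page.screenshot()` and `elementHandle.boundingBox()` return pixel coordinates); the function name and the record shape are illustrative, loosely modeled on the normalized 0–1 coordinates object-detection tools like Vertex AI expect, not an exact import format.

```javascript
// Convert pixel bounding boxes (as returned by Puppeteer's
// elementHandle.boundingBox()) into normalized object-detection
// annotations in the 0-1 coordinate range. The record shape and
// names here are illustrative, not an exact Vertex AI schema.
function toAnnotations(screenshotPath, viewport, imageBoxes) {
  return {
    imageUri: screenshotPath,
    boundingBoxes: imageBoxes.map((box) => ({
      label: "image",
      xMin: box.x / viewport.width,
      yMin: box.y / viewport.height,
      xMax: (box.x + box.width) / viewport.width,
      yMax: (box.y + box.height) / viewport.height,
    })),
  };
}

// Example: one <img> at (100, 200) sized 300x150 in a 1000x1000 viewport.
const record = toAnnotations(
  "screenshots/example.png",
  { width: 1000, height: 1000 },
  [{ x: 100, y: 200, width: 300, height: 150 }]
);
// record.boundingBoxes[0].xMin → 0.1, xMax → 0.4
```

Running this over thousands of scraped pages yields screenshot/coordinate pairs: exactly the input and output data the speaker describes needing for training.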
What are the advantages of owning and training your own models, according to the speaker?
-Owning and training your own models allows for faster improvements, lower costs, better privacy control, and the ability to meet specific customer requirements that pre-trained models may not address.
What advice did the speaker give for building AI products?
-The speaker advised using AI for as little as possible, and instead relying on normal code combined with specialized AI models for critical areas. This approach aims to create faster, more reliable, and more cost-effective products.
How does the speaker's approach differ from the common perception of how AI products are built?
-The speaker's approach differs from the misconception that AI products are built using a single, large model that handles all inputs and outputs. Instead, the speaker advocates for a toolchain of specialized models combined with regular code.
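In code, this toolchain pattern is ordinary function composition: each specialized model is one narrow step, and plain code handles everything between them. A minimal sketch with stubbed-out model calls standing in for real inference (all names are hypothetical):

```javascript
// Toolchain sketch: specialized "models" wired together with plain
// code. The model functions are stubs standing in for real inference
// calls; all names here are hypothetical.
function detectImages(design) {
  // Stand-in for an object-detection model.
  return design.nodes.filter((n) => n.kind === "picture");
}

function inferLayout(design) {
  // Stand-in for layout analysis: deterministic, plain code.
  return design.nodes.length > 1 ? "flex" : "block";
}

function generateCode(design, images, layout) {
  // Plain code assembling the final output from the models' results.
  const tags = design.nodes
    .map((n) => (images.includes(n) ? `<img id="${n.id}">` : `<div id="${n.id}"></div>`))
    .join("");
  return `<div style="display:${layout}">${tags}</div>`;
}

// The "pipeline" is just normal code calling each step in order.
function designToCode(design) {
  const images = detectImages(design);
  const layout = inferLayout(design);
  return generateCode(design, images, layout);
}

const html = designToCode({
  nodes: [
    { id: "hero", kind: "picture" },
    { id: "body", kind: "text" },
  ],
});
// html → '<div style="display:flex"><img id="hero"><div id="body"></div></div>'
```

Because each step is independent, any one of them can be swapped from hand-written logic to a trained model (or back) without touching the rest of the chain.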
What example did the speaker use to illustrate the toolchain approach?
-The speaker used the example of self-driving cars, which are not built using a single AI brain, but rather a toolchain of specialized models for tasks like computer vision, predictive decision-making, and natural language processing, combined with regular code.
What advice did the speaker give for companies with strict privacy requirements?
-For companies with strict privacy requirements, the speaker suggested that owning and controlling the entire technology stack allows for holding models to a high privacy bar, or even allowing companies to plug in their own in-house or enterprise language models.
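Making the LLM step pluggable or skippable is ordinary dependency injection: the pipeline depends only on a tiny interface, so a customer can supply a hosted API client, an in-house model, or nothing at all. A minimal sketch (the interface and names are illustrative, not the speaker's actual design):

```javascript
// Pluggable-LLM sketch: the pipeline depends only on an async
// complete() function, so any provider (hosted API, fork of Llama 2,
// enterprise instance) can be plugged in, or the step disabled.
// All names here are illustrative.
async function refineCode(rawCode, llm) {
  if (!llm) {
    // The LLM step is a nice-to-have: skip it when no provider is allowed.
    return rawCode;
  }
  return llm.complete(`Clean up this code:\n${rawCode}`);
}

// An in-house "provider" only needs to implement complete().
const inHouseLlm = {
  complete: async (prompt) =>
    `// refined\n${prompt.split("\n").slice(1).join("\n")}`,
};

// Usage: same pipeline, with or without an LLM in the loop.
refineCode("<div></div>", null).then(console.log);       // step disabled
refineCode("<div></div>", inHouseLlm).then(console.log); // stub provider
```

This is what lets the speaker's team hold the rest of the stack to a strict privacy bar while leaving the LLM slot under the customer's control.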
Outlines
🚫 Don't Follow the Crowd: Build Unique AI Products
The video script emphasizes the importance of building unique and valuable AI products instead of simply wrapping existing models like ChatGPT. It highlights the risks of following the crowd, such as lack of differentiation, high costs, poor performance, and ease of replication. The narrator suggests creating custom toolchains by combining fine-tuned language models, specialized AI models, and traditional code to build faster, cheaper, and more reliable products.
🧱 The Modular Approach: Building AI Products Like LEGO Blocks
The script dispels the misconception that AI products are built using a single, all-encompassing model. Instead, it advocates for a modular approach, where various specialized models are combined with traditional code to create the final product. The example of self-driving cars is used to illustrate this concept, where multiple models for computer vision, decision-making, and natural language processing are connected through code to achieve the desired functionality.
🔨 The Builder.io Approach: Blending Code and AI Models
The script outlines the approach taken by Builder.io in developing their Visual Co-Pilot product. It involves breaking down the problem, solving as much as possible with traditional programming practices, and then incorporating specialized AI models for specific tasks that are difficult to achieve with code alone. The process includes techniques like object detection, data generation through web scraping, and combining multiple models to produce responsive and customizable code from design inputs.
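The hand-coded layout heuristics described in the script, where vertical stacks become flex columns and side-by-side groups become flex rows, can be sketched as a simple geometric check. This is a toy version under assumed box geometry; the real design-to-code rules are far more involved.

```javascript
// Toy version of the layout heuristic from the script: siblings
// stacked vertically suggest a flex column; siblings sitting side by
// side suggest a flex row. Real rules are far more involved.
function inferFlexDirection(boxes) {
  const byY = [...boxes].sort((a, b) => a.y - b.y || a.x - b.x);
  // Stacked vertically: each box starts below where the previous ends.
  const stacked = byY.every(
    (box, i) => i === 0 || box.y >= byY[i - 1].y + byY[i - 1].height
  );
  if (stacked) return "column";
  // Side by side: sort by x and check horizontal separation.
  const byX = [...boxes].sort((a, b) => a.x - b.x);
  const sideBySide = byX.every(
    (box, i) => i === 0 || box.x >= byX[i - 1].x + byX[i - 1].width
  );
  return sideBySide ? "row" : "unknown";
}

// Two boxes stacked vertically → "column".
inferFlexDirection([
  { x: 0, y: 0, width: 100, height: 50 },
  { x: 0, y: 60, width: 100, height: 50 },
]);

// Two boxes side by side → "row".
inferFlexDirection([
  { x: 0, y: 0, width: 100, height: 50 },
  { x: 120, y: 0, width: 100, height: 50 },
]);
```

The point of the section: rules like this carry you surprisingly far with plain code, and the trained models only fill in the cases (like which layers form one image) where such rules break down.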
Mindmap
Keywords
💡Differentiated AI Products
💡Cost Optimization
💡Performance Optimization
💡Custom Model Training
💡Toolchain Approach
💡Incremental Development
💡Data Generation
💡Ownership and Control
💡Rapid Iteration
💡Minimizing AI Usage
Highlights
The vast majority of AI products being built right now are just wrappers over other models, like calling ChatGPT over an API, which makes them easy to copy and not differentiated.
Using large language models like ChatGPT can be costly, as they are incredibly large and complex, which makes them expensive to run.
Large language models are painfully slow for applications that need the entire response before proceeding, like generating code from a design specification.
Large language models cannot be customized much, even with fine-tuning, which can lead to poor quality results for specific use cases.
The solution is to create a tool chain that combines a fine-tuned language model with other technology and custom-trained models for specific tasks.
Most advanced AI products are built as a tool chain of several specialized models connected with normal code, rather than a single super-intelligent model.
The recommended approach is to explore the problem space using normal programming practices first, to determine what areas need specialized models.
Break down the problem and solve as much as possible with normal code before introducing AI models for specific tasks that are difficult to solve with code.
Generate data for training custom models by using creative methods like web scraping or automating processes.
Use AI models for as little as possible, and only for critical areas where normal code is insufficient, as code is faster, more reliable, and easier to manage.
Owning the models allows for constant improvement and rapid iteration based on user feedback and new data.
This approach provides control over privacy, customization, and integration with other systems or models.
The magic comes from the small but critical areas where AI models are used, combined with normal code for the rest of the system.
AI products should be built in layers, like self-driving cars, with incremental additions of AI capabilities over time.
The end result, like Visual Co-Pilot, is a fast, low-cost, and valuable product that is difficult for competitors to copy.
Transcripts
if you want to build AI products that
are unique valuable and fast don't do
what everybody else is doing I'll show
you what to do instead the vast majority
of AI products being built right now are
just wrappers over other models for
instance basically just calling chat GPT
over an API and while that's incredibly
easy you send natural language in and
get natural language out and it can do
some really cool things there are some
major problems with this approach that
people are running into and there's a
solution for them that I'll show you the
first major issue is this is not
differentiated technology if you've
noticed that one person creates a chat
with a PDF app and then another dozen
people do too and then OpenAI builds
that into chat GPT directly That's
because nobody there actually built
something differentiated they use a
simple technique with a pre-trained
model which anyone can copy in a very
short period of time when building a
product whose unique value proposition
is some type of advanced AI technology
it's a very risky position to be so easy
to copy now of course there's a whole
spectrum here if you're on the
right side of the spectrum where all you
made was a button that sends something
to chat GPT and gets a response back
that you showed to your end users where
chat GPT basically did all the work
you're at the highest risk here on the
other end if you actually built some
substantial technology and LLMs like
OpenAI's only assisted with a small but
crucial piece then you may be in a
better position but you're still going
to run into two other major issues the
first major issue you'll run into is
cost the best part of a large language
model is their broad versatility but
they achieve this by being incredibly
large and complex which makes them
incredibly costly to run as an example
co-pilot is losing money per user
charging $10 but on average costing $20
just on API calls and some users cost
GitHub up to $80 and the worst part is
you probably don't need such a large
model your use case probably doesn't
need a model trained on the entirety of
the internet 99.9% of which will be
covering topics that have nothing to do
with your use case so while the ease of
this approach might be tempting you
could run into this common issue where
what your users want to pay is less than
what it costs to run your service on top
of large language models but even if
you're the rare case where the cost
economics might work out okay for you
you're still going to hit one more major
issue llms are painfully slow now this
isn't a huge problem for all
applications for instance for use cases
like chat GPT where you can read one
word at a time anyway this isn't the
worst thing but for applications that
are not about streaming text where
nobody is going to be reading it word
for word but instead waiting on the
entire response before the next step in
the flow can be taken this can be a big
problem for instance when we started
building our visual co-pilot product
where we wanted one button click to turn
any design into high-quality code one of
the approaches we explored was using an
llm for the conversion but one of the
key issues was it took forever because
if you need to pass an entire design
spec into an llm and get an entire new
representation out token by token it was
taking literally minutes to give us a
reply which was just not viable and
because the representation returned by
the llm is not what a human would see
the loading state was just a spinner and
it was horrific but if for some reason
performance is still not even an issue
to you and for some reason your users do
not care about having a slow and
expensive product that's easy for your
competitors to copy you'll still likely
hit at some point one other major issue
which is llms cannot be customized that
much yes they all support fine-tuning
and fine-tuning can incrementally help
the model get closer to what you need
but in our case we tried using fine
tuning to provide figma designs and get
code out the other side but no matter
how many examples we gave the model it
did not seem to get any smarter at all
and what we were left with was
something slow expensive and Incredibly
poor quality and that's where we
realized we had to take a different
approach what did we find we had to do
instead we had to create our own tool
chain in this case we combined a
fine-tuned llm a whole lot of other
technology and a custom trained model
and this is not necessarily as hard as
you might think these days you don't
have to be a data scientist or a PhD in
machine learning to train your own model
any moderately experienced developer Now
can do it what this can allow you to
build is something that is way faster
way more reliable far cheaper and far
more differentiated so you won't have to
worry about copycat products or open
source clones spawning overnight either
and this isn't just a theory most if not
all advanced AI products are built in a
way like this a lot of people have a
major misconception about how AI
products are built I've seen that they
often think that all the core Tech is
handled by one super smart model where
they trained it with tons of inputs to
give exactly the right output for
instance for self-driving cars I've seen
a lot of people have the impression that
there's this giant model that takes in
all these different inputs like cameras
sensors GPS Etc it crunches it through
the smart Ai and then out comes the
action on the other side such as turn
right but this could not be farther from
the truth that car driving itself is not
one big AI brain but instead a whole
tool chain of several specialized models
all connected with normal code such as
models for computer vision to find and
identify objects and predictive
decision-making to anticipate the
actions of others or natural language
processing for understanding voice
commands all of these specialized models
combined with tons of just normal code
and logic creates the end result that
you see now keep in mind autonomous
vehicles is a highly complex example
that include many more models than I'm
even showing here but for building your
own product you won't need something
nearly this complex especially to start
remember self-driving cars didn't
spawn overnight my 2018 Prius is capable
of parking itself stopping automatically
when too close to an object and many
other things using little to no AI over
time more and more layers were added to
do more and more advanced things like
correcting lane departure or eventually
making entire decisions to drive from
one place to another but like all
software these things are built in
layers one on top of the next the way we
build visual co-pilot is a way I would
highly recommend you explore for your
own AI Solutions it's a very simple but
counterintuitive approach the most
important thing is don't use AI to start
you need to explore the problem space
using normal programming practices first
to even determine what areas need a
specialized model because remember
making super models is generally not the
right approach we don't want to just
send tons of figma data into a model and
get finished code out the other side
that would be an insanely complex
problem to solve with just one model
and when you factor in all the
different Frameworks we support and
styling options and customizations this
would just get insane to retrain this
model with all this different data and
it would likely become so complex slow
and expensive that our product probably
would have never shipped in the first
place instead what we did is we looked
at the problem and said well how can we
solve this without Ai and how far can we
get before it just gets impossible
without the types of specialized
decision-making AI is best at so we broke
the problem down and said okay we need
to convert each of these nodes to things
we can represent in code like HTML nodes
for the web we need to understand what
is an image what is a background what is
a foreground and most importantly how to
make this responsive because this only
works if what we import becomes fully
responsive for all screen sizes
automatically then we started looking at
more complex examples and realized there
are many cases where many many layers
need to be turned into one image we
started writing hand-coded logic to say
if a set of items is in a vertical stack
that should probably be a flex column
and if groups are side by side they
should probably be a flex row and we got
as far as we could creating all these
different types of sophisticated
algorithms to automatically transform
designs to responsive code before we
started hitting limits and in my
experience wherever you think the limit
is it's probably actually a lot further
at a certain point you'll find some
things are just near impossible to do
with normal code for example
automatically detecting which of these
layers should turn into one image is
something that our eyes are really good
at understanding but not necessarily
normal imperative code in our case we
wrote all this in JavaScript now lucky
for us training your own object
detection model is not that hard for
example products like Google's Vertex AI
has a range of common types of models
that you can easily train yourself one
of which is object detection I can
choose that with a GUI and then prepare
data and just upload it as a file for a
well-established type of model like this
all it comes down to is creating the
data now where things get interesting is
finding creative ways of generating the
data you need one awesome massive free
resource for generating data is simply
the internet and so one way we explored
approaching this is using Puppeteer to
automate opening websites in a web
browser we can then take a screenshot of
the site and we can Traverse the HTML to
find the image tags we can then use the
location of the images as the output
data and the screenshot of the web page
as the input data and now we have
exactly what we need a source image and
coordinates of where all the sub images
are to train this AI model so while in
Figma this which should be one image is
many layers our object detection model
can take the pixels identify that this
rectangle should be one image we can
compress it into one and use it as part
of our code gen using these techniques
where we fill in the unknowns with
specialized AI models and piecing multiple
together is how we're able to produce
end results like this where I can just
select this hit generate code launch
into Builder and get a completely
responsive website out the other side
with high-quality code that you can
customize yourself completely supporting
a wide variety of Frameworks and options
and it's all incredibly fast because all
of our models are specially built just
for this purpose incredibly low cost to
provide we provide a generous free tier
and ultimately really valuable for our
customers to save them lots of time and
the best part is this is only the
beginning because one of the best parts
of this approach as opposed to just
wrapping somebody else's model is we
completely own the models so we can
constantly improve them if you're fully
dependent only on someone else's model
like open AI there's no guarantee it's
going to get smarter faster or cheaper
for your use case and your ability to
control that with prompt engineering and
fine-tuning is severely limited but when
we own our own model we're making
drastic improvements every day when new
designs come in that don't import well
which still happens as we're in beta we
look at user feedback we find areas to
improve and we improve at a rapid
Cadence shipping improvements every
single day and we never have to worry
about a lack of control for instance we
started talking to some very large and
very privacy focused companies to be
early beta customers and one of the
first pieces of feedback was they're not
able to use OpenAI or any products
using OpenAI because of their privacy
requirements and the need to make sure
their data never goes into systems that
they don't allow in our case because we
control the entire technology we can
hold our models to an extremely high
privacy bar and for the llm step it can
either be disabled because it's purely a
nice to have or we're allowing companies
to plug in their own llm which might be
a completely in-house built model a fork
of llama 2 their own Enterprise instance
of open AI or something else entirely so
if you want to build AI products I would
highly recommend taking a similar
approach as strange as it sounds don't
use AI as long as possible when you
start finding extremely specific
problems that normal coding doesn't
solve well but well-established AI models
can start generating your own data and
training your own models using a wide
variety of tools that you can find off
the shelf connect your model or multiple
models to your code at only the small
points that they're needed and I want to
emphasize this use AI for as little as
possible because at the end of the day
normal plain code is some of the fastest
most reliable most deterministic most
easy to debug easy to fix easy to manage
and easy to test code you will ever have
but the magic will come from the small
small but critical areas you use AI
models for if you'd like to learn more
about this topic you can see more on my
latest blog post on the Builder.io blog
thanks for watching and I can't wait to
see what you build